Joscha Bach: https://twitter.com/Plinz/status/1529013919682994176
Bragging rights are in constant flux, it would seem. As to whether these multimodal AI models do anything to address the criticism around resource utilization and bias, little is known at this point, but based on what is known, the answers seem to be “probably not” and “sort of,” respectively. And what about the actual intelligence part? Let’s look under the hood for a moment.
OpenAI notes that “DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called ‘diffusion,’ which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image.”
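Stripped of the trained neural networks, the process OpenAI describes boils down to a start-from-noise, refine-step-by-step loop. Here is a minimal toy sketch of that structure in Python; the `toy_denoiser` and the fixed target pattern are illustrative stand-ins, not anything from DALL-E 2’s actual implementation.

```python
import numpy as np

# Toy illustration of diffusion-style sampling: start from random dots and
# repeatedly apply a "denoiser" that nudges the pattern toward an image.
# Real models use a trained neural network here; this target and schedule
# are made up purely to show the shape of the loop.

rng = np.random.default_rng(0)
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0  # the "image" the toy denoiser knows how to recover

def toy_denoiser(x, t):
    # Predict a slightly less noisy image: blend toward the target,
    # more aggressively as the remaining noise level t decreases.
    return x + (1.0 - t) * 0.2 * (target - x)

x = rng.normal(size=(8, 8))   # a pattern of random dots
steps = 50
for i in range(steps):
    t = 1.0 - i / steps       # noise level goes from 1 down toward 0
    x = toy_denoiser(x, t)

print(np.round(x, 1))         # the random dots have drifted toward the target
```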
Google notes that their “key discovery is that generic LLMs (e.g. T5), pre-trained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model”.
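The recipe Google describes can be sketched with off-the-shelf tools: take a frozen, text-only pre-trained encoder and use its output embeddings as the conditioning signal for an image model. The sketch below uses Hugging Face’s T5 classes with the small `t5-small` checkpoint as a stand-in (Imagen used a far larger T5-XXL encoder); the diffusion model that would consume these embeddings is omitted.

```python
# Minimal sketch: encode a prompt with a frozen, text-only pre-trained LLM
# encoder, producing embeddings an image diffusion model would condition on.
# This is not Imagen's code; t5-small is a small stand-in checkpoint.
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")
encoder.eval()  # frozen: the text encoder is not trained further

prompt = "a horse riding an astronaut"
tokens = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    text_embeddings = encoder(**tokens).last_hidden_state

# Shape: (1, sequence_length, hidden_size). An image diffusion model would
# cross-attend to these embeddings at every denoising step.
print(text_embeddings.shape)
```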
While Imagen seems to rely heavily on LLMs, the process is different for DALL-E 2. However, both OpenAI’s and Google’s people, as well as independent experts, claim that those models show a form of “understanding” that overlaps with human understanding. The MIT Technology Review went as far as to call the horse-riding astronaut, the image that has become iconic for DALL-E 2, a milestone in AI’s journey to make sense of the world.
Gary Marcus, however, remains unconvinced. Marcus, a scientist, best-selling author, and entrepreneur, is well known in AI circles for his critiques of a number of topics, including the nature of intelligence and what’s wrong with deep learning. He was quick to point out deficiencies in both DALL-E 2 and Imagen, and to engage in public dialogue, including with people from Google.
Marcus shares his insights in an essay aptly titled “Horse rides astronaut.” His conclusion is that expecting those models to be fully sensitive to semantics that depends on syntactic structure (an astronaut riding a horse is not a horse riding an astronaut) is wishful thinking, and that the inability to reason is a general failure point of modern machine learning methods and a key place to look for new ideas.
Last but not least, in May 2022, DeepMind announced Gato, a generalist AI model. As ZDNet’s own Tiernan Ray notes, Gato is a different kind of multimodal AI model. Gato can work with multiple kinds of data to perform multiple kinds of tasks, such as playing video games, chatting, writing compositions, captioning pictures, and controlling a robotic arm stacking blocks.
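What makes that possible, at a high level, is that Gato serializes every modality into tokens in a single flat sequence, so one sequence model can be trained on all tasks at once. The sketch below illustrates only that serialization idea; the vocabulary ranges and toy tokenizers are assumptions for illustration, not DeepMind’s actual scheme.

```python
# Toy sketch of the generalist-model idea: map text, image patches, and
# actions into disjoint token-id ranges so they can share one sequence.
# These ranges and tokenizers are illustrative assumptions, not Gato's.
from typing import List

TEXT_VOCAB = 32_000        # token ids 0..31_999 for text
IMAGE_PATCH_BASE = 32_000  # ids for discretized image patches
ACTION_BASE = 48_000       # ids for discretized game/robot actions

def tokenize_text(words: List[str]) -> List[int]:
    return [hash(w) % TEXT_VOCAB for w in words]  # toy stand-in tokenizer

def tokenize_image(patches: List[int]) -> List[int]:
    return [IMAGE_PATCH_BASE + p for p in patches]

def tokenize_actions(actions: List[int]) -> List[int]:
    return [ACTION_BASE + a for a in actions]

# One episode: a caption, an observation, and the actions taken all land in
# the same sequence, so a single model is trained across all the tasks.
sequence = (
    tokenize_text(["stack", "the", "red", "block"])
    + tokenize_image([17, 4, 203])
    + tokenize_actions([2, 2, 5])
)
print(sequence)
```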
As Ray also notes, Gato does a so-so job at a lot of things. However, that did not stop people on the DeepMind team that built Gato from exclaiming that “The Game is Over! It’s about making these models bigger, safer, compute efficient, faster at sampling, smarter memory, more modalities.”
Language, goals, and the market power of the few
So where does all of that leave us? Hype, metaphysical beliefs, and enthusiastic outbursts aside, the current state of AI should be examined with sobriety. While the models released in the last few months are truly impressive feats of engineering, sometimes capable of producing amazing results, the intelligence they point to is not really artificial.
Human intelligence is behind the impressive engineering that generates those models. It is human intelligence that has built models that are getting better and better at what Alan Turing’s foundational paper, “Computing Machinery and Intelligence,” called “the imitation game,” which has come to be known popularly as “the Turing test.”
As Emily Tucker, Executive Director of the Center on Privacy & Technology (CPT) at Georgetown Law, writes, Turing replaced the question “can machines think?” with the question of whether a human can mistake a computer for another human.
Turing does not offer the latter question in the spirit of a helpful heuristic for the former; he does not say that he thinks these two questions are versions of one another. Rather, he expresses the belief that the question “can machines think?” has no value, and appears to affirmatively hope for a near future in which it is in fact very difficult, if not impossible, for human beings to ask themselves the question at all.
In some ways, that future may be fast approaching. Models like Imagen and DALL-E break when presented with prompts that require human-like intelligence to process. For most intents and purposes, however, those may be considered edge cases. What the DALL-Es of the world are able to generate is on par with the work of the most skilled artists.
The question then is: what is the purpose of it all? As a goal in itself, spending the time and resources something like Imagen requires in order to generate cool images at will seems rather misplaced.
Seeing this as an intermediate goal towards the creation of “real” AI may be more justified, but only if we are willing to subscribe to the notion that doing the same thing at an ever-bigger scale will somehow lead to different outcomes.