How Transformers Seem to Mimic Parts of the Brain

Understanding how the brain organizes and accesses spatial information — where we are, what’s around the corner, how to get there — remains an exquisite challenge. The process involves recalling an entire network of memories and stored spatial data from tens of billions of neurons, each connected to thousands of others. Neuroscientists have identified key elements such as grid cells, neurons that map locations. But going deeper will prove tricky: It’s not as though researchers can remove and study slices of human gray matter to watch how location-based memories of images, sounds and smells flow through and connect to each other.

Artificial intelligence offers another way in. For years, neuroscientists have harnessed many types of neural networks — the engines that power most deep learning applications — to model the firing of neurons in the brain. In recent work, researchers have shown that the hippocampus, a structure of the brain critical to memory, is basically a special kind of neural net, known as a transformer, in disguise. Their new model tracks spatial information in a way that parallels the inner workings of the brain. They’ve seen remarkable success.

“The fact that we know these models of the brain are equivalent to the transformer means that our models perform much better and are easier to train,” said James Whittington, a cognitive neuroscientist who splits his time between Stanford University and the lab of Tim Behrens at the University of Oxford.

Studies by Whittington and others hint that transformers can greatly improve the ability of neural network models to mimic the sorts of computations carried out by grid cells and other parts of the brain. Such models could push our understanding of how artificial neural networks work and, even more likely, how computations are carried out in the brain, Whittington said.

“We’re not trying to re-create the brain,” said David Ha, a computer scientist at Google Brain who also works on transformer models. “But can we create a mechanism that can do what the brain does?”

Transformers first appeared five years ago as a new way for AI to process language. They are the secret sauce in those headline-grabbing sentence-completing programs like BERT and GPT-3, which can generate convincing song lyrics, compose Shakespearean sonnets and impersonate customer service representatives.

Transformers work using a mechanism called self-attention, in which every input — a word, a pixel, a number in a sequence — is always connected to every other input. (Other neural networks connect inputs only to certain other inputs.) But while transformers were designed for language tasks, they’ve since excelled at other tasks such as classifying images — and now, modeling the brain.
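
In code, that all-to-all comparison amounts to a few matrix products. The following is a minimal single-head self-attention sketch in NumPy; the weight matrices, sizes and variable names are illustrative stand-ins, not taken from any of the models discussed here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: every input (a row of X) is scored
    against every other input, so all inputs interact directly."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # similarity of every pair of inputs
    weights = softmax(scores, axis=-1)       # each input's attention over all inputs
    return weights @ V                       # each output mixes every input

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                  # 5 inputs, 8 features apiece
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)          # shape (5, 8)
```

Because the score matrix compares every row with every other row, no pair of inputs is ever out of reach, which is the all-to-all connectivity described above.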

In 2020, a group led by Sepp Hochreiter, a computer scientist at Johannes Kepler University Linz in Austria, used a transformer to retool a powerful, long-standing model of memory retrieval called a Hopfield network. First introduced 40 years ago by the Princeton physicist John Hopfield, these networks follow a general rule: Neurons that are active at the same time build strong connections with each other.
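
That rule can be captured in a toy Hopfield network. The sketch below, which uses made-up six-neuron binary patterns rather than anything from the published work, stores memories by strengthening connections between co-active neurons and then lets a corrupted cue settle back onto the nearest stored pattern.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hopfield's learning rule: neurons that are active at the same
    time build strong connections with each other."""
    W = sum(np.outer(p, p) for p in patterns) / len(patterns)
    np.fill_diagonal(W, 0)  # no neuron connects to itself
    return W

def recall(W, state, steps=10):
    """Memory retrieval: repeatedly update neurons until the state
    settles onto a stored pattern."""
    for _ in range(steps):
        state = np.sign(W @ state)
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = hebbian_weights(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])  # pattern 0 with its last neuron flipped
print(recall(W, noisy))                 # settles on [1, -1, 1, -1, 1, -1]
```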

Hochreiter and his collaborators, noting that researchers have been looking for better models of memory retrieval, saw a connection between how Hopfield networks retrieve memories and how transformers perform attention. They upgraded the Hopfield network, essentially turning it into a transformer. That change allowed the model to store and retrieve more memories because of more effective connections, Whittington said. Hopfield himself, together with Dmitry Krotov at the MIT-IBM Watson AI Lab, proved that a transformer-based Hopfield network was biologically plausible.
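
The equivalence they noticed can be stated compactly: if the network’s neurons take continuous values and retrieval happens in one step, recalling a memory from a probe is exactly a softmax attention over the stored patterns. The sketch below uses random made-up memories and an illustrative sharpness parameter beta; it shows the correspondence rather than reproducing the published model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def modern_hopfield_recall(patterns, probe, beta=4.0):
    """One-step retrieval in a transformer-style Hopfield network:
    the stored patterns act as keys and values, the probe acts as
    the query, and recall is a softmax attention over memories."""
    weights = softmax(beta * patterns @ probe)  # attention over stored memories
    return weights @ patterns                   # weighted recall

rng = np.random.default_rng(1)
patterns = rng.choice([-1.0, 1.0], size=(20, 16))  # 20 stored memories
probe = patterns[3] + 0.5 * rng.normal(size=16)    # noisy cue for memory 3
recalled = modern_hopfield_recall(patterns, probe)
print(np.argmax(patterns @ recalled))              # 3: memory 3 is recovered
```

This sharper, softmax-weighted recall is the “more effective connections” Whittington credits for the upgraded network’s larger memory capacity.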

Then, earlier this year, Whittington and Behrens helped further tweak Hochreiter’s approach, modifying the transformer so that instead of treating memories as a linear sequence — like a string of words in a sentence — it encoded them as coordinates in higher-dimensional spaces. That “twist,” as the researchers called it, further improved the model’s performance on neuroscience tasks. They also showed that the model was mathematically equivalent to models of the grid cell firing patterns that neuroscientists see in fMRI scans.

“Grid cells have this kind of exciting, beautiful, regular structure, with striking patterns that are unlikely to pop up at random,” said Caswell Barry, a neuroscientist at University College London. The new work showed how transformers replicate exactly those patterns observed in the hippocampus. “They recognized that a transformer can figure out where it is based on previous states and how it’s moved, and in a way that’s keyed into traditional models of grid cells.”

Other recent work suggests that transformers could advance our understanding of other brain functions as well. Last year, Martin Schrimpf, a computational neuroscientist at the Massachusetts Institute of Technology, analyzed 43 different neural net models to see how well they predicted measurements of human neural activity as reported by fMRI and electrocorticography. Transformers, he found, are the current leading neural networks, predicting almost all the variation found in the imaging.

And Ha, along with fellow computer scientist Yujin Tang, recently designed a model that intentionally sends large amounts of data through a transformer in a random, unordered way, mimicking how the human body transmits sensory observations to the brain. Their transformer, like our brains, could successfully handle a disordered flow of information.
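
The property their model relies on can be sketched in a few lines: self-attention with no positional encodings treats its inputs as an unordered set, so shuffling the observations leaves a pooled output unchanged. The function and dimensions below are hypothetical illustrations, not Ha and Tang’s actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(X, Wq, Wk, Wv):
    """Attention with no positional encoding, then mean-pooling:
    the result is identical for any ordering of the input rows."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    out = softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V
    return out.mean(axis=0)

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))                  # 10 sensory observations
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
shuffled = X[rng.permutation(10)]             # the same observations, scrambled
same = np.allclose(attention_pool(X, Wq, Wk, Wv),
                   attention_pool(shuffled, Wq, Wk, Wv))
print(same)                                   # True: order doesn't matter
```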

“Neural nets are hard-wired to accept a particular input,” said Tang. But in real life, data sets often change quickly, and most AI doesn’t have any way to adjust. “We wanted to experiment with an architecture that could adapt very quickly.”

Despite these signs of progress, Behrens sees transformers as just a step toward an accurate model of the brain — not the end of the quest. “I’ve got to be a skeptic neuroscientist here,” he said. “I don’t think transformers will end up being how we think about language in the brain, for example, even though they have the best current models of sentences.”

“Is this the most efficient basis to make predictions about where I am and what I will see next? If I’m honest, it’s too soon to tell,” said Barry.

Schrimpf, too, noted that even the best-performing transformers are limited, working well for words and short phrases, for example, but not for larger-scale language tasks like telling stories.

“My sense is that this architecture, this transformer, puts you in the right space to understand the structure of the brain, and can be improved with training,” said Schrimpf. “This is a good direction, but the field is super complex.”
