Studying the human mind is tough. You can ask people how they think, but they often don’t know. You can scan their brains, but the tools are blunt. You can damage their brains and watch what happens, but they don’t take kindly to that. So even a task as supposedly simple as the first step in reading—recognizing letters on a page—keeps scientists guessing.
Now, psychologists are using artificial intelligence (AI) to probe how our minds actually work. Marco Zorzi, a psychologist at the University of Padua in Italy, used artificial neural networks to show how the brain might “hijack” existing connections in the visual cortex to recognize the letters of the alphabet, as he and colleagues reported last month in Nature Human Behaviour. Zorzi spoke with Science about the study and about his other work. This interview has been edited for brevity and clarity.
Q: What did you learn in your study of letter perception?
A: We first trained the model on patches of natural images, such as trees and mountains. That knowledge then becomes a vocabulary of basic visual features that the network uses to learn about letter shapes. This idea of “neural recycling” has been around for some time, but as far as I know this is the first demonstration where you actually gain in performance: We saw better letter recognition in a model that trained on natural images than in one that didn’t. Recycling makes learning letters much faster than in the same network without recycling. It gives the network a head start.
Q: How does the training work?
A: It uses “unsupervised” learning. After pretraining on the natural images, we feed the neural network unlabeled images of letters. The goal is simply to build an internal model of the data, to find the latent structure. It’s called “generative” because it’s generating patterns from the top down. It uses the knowledge it has learned to interpret the new incoming sensory information.
Later, a simpler algorithm learns to put letter labels on that network’s outputs. This one uses “supervised” learning—we tell it when it’s right and wrong—but most of the work was done by the unsupervised algorithm.
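To make the two stages concrete, here is a minimal sketch in Python. It is not the model from the study. It assumes a single restricted Boltzmann machine trained with one-step contrastive divergence as the unsupervised, generative stage, and a plain perceptron as the "simpler algorithm" that attaches labels. The 3x3 toy letters, the class `TinyRBM`, the helper names, and all hyperparameters are hypothetical, and the pretraining on natural images is left out.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class TinyRBM:
    """Toy restricted Boltzmann machine with binary units, trained by CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        # Bottom-up pass: image -> hidden features.
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        # Top-down (generative) pass: hidden features -> reconstructed image.
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_step(self, v0, lr=0.1):
        # One contrastive-divergence update; no label is involved.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])

def predict(weights, f):
    # Linear readout: pick the class with the highest score (last weight is a bias).
    scores = [sum(w[k] * f[k] for k in range(len(f))) + w[-1] for w in weights]
    return scores.index(max(scores))

def train_readout(features, labels, n_classes, epochs=20):
    # Supervised stage: multi-class perceptron, corrected only when it is wrong.
    n_feat = len(features[0])
    weights = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for f, y in zip(features, labels):
            guess = predict(weights, f)
            if guess != y:
                for k in range(n_feat):
                    weights[y][k] += f[k]
                    weights[guess][k] -= f[k]
                weights[y][-1] += 1.0
                weights[guess][-1] -= 1.0
    return weights

rng = random.Random(0)
PROTOTYPES = [
    [1, 0, 0, 1, 0, 0, 1, 1, 1],  # hypothetical 3x3 "L"
    [1, 1, 1, 0, 1, 0, 0, 1, 0],  # hypothetical 3x3 "T"
]

def noisy(pattern, flip=0.05):
    # Copy a prototype, flipping each pixel with a small probability.
    return [float(1 - p) if rng.random() < flip else float(p) for p in pattern]

labels = [k % 2 for k in range(40)]
images = [noisy(PROTOTYPES[y]) for y in labels]

# Stage 1: unsupervised learning on unlabeled images.
rbm = TinyRBM(n_visible=9, n_hidden=4, rng=rng)
for _ in range(50):
    for v in images:
        rbm.cd1_step(v)

# Stage 2: supervised readout on the learned features.
features = [rbm.hidden_probs(v) for v in images]
readout = train_readout(features, labels, n_classes=2)
accuracy = sum(predict(readout, f) == y for f, y in zip(features, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

The labels are touched only in the second stage, which mirrors the division of labor described above: the unsupervised network builds the representation, and the readout merely names what it finds there.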
Q: Why focus on unsupervised learning, which is much less common in AI?
A: With supervised learning, you are assuming that you have a teacher providing the correct label at each learning event. Think about how we humans learn. This very rarely happens.
Supervised learning is a feed-forward, bottom-up approach, unlike the top-down approach of unsupervised learning. There are a lot of feedback connections in the brain. Moreover, there is intrinsic activity in the brain, which is one of the more interesting findings of the last 20 years or so in neuroimaging. It’s not generated by sensory stimuli. Intrinsic activity can only come from activating neurons in high layers and then propagating this activity back and forth around the network. It can be described as a form of “dreaming” or “imagery.” When combined with sensory activity, top-down feedback leads to interpretation of the input. For example, if a written word is partially blocked, readers can fill in what they don’t see based on what they expect.
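The fill-in effect can be illustrated with a much simpler system than the generative network in the study. The sketch below swaps in a Hopfield-style associative memory: it stores patterns with a Hebbian rule that uses no labels, then uses recurrent feedback to complete an input whose last two "pixels" are hidden. The two stored patterns and the function names `train_hebbian` and `recall` are hypothetical, and the synchronous update is a simplification.

```python
def train_hebbian(patterns):
    # Unsupervised Hebbian storage: units that are active together get linked.
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, steps=5):
    # Recurrent feedback: each unit takes the sign of its summed input.
    # A unit with zero input keeps its current value.
    state = list(state)
    for _ in range(steps):
        new_state = []
        for i, row in enumerate(W):
            field = sum(w * s for w, s in zip(row, state))
            if field > 0:
                new_state.append(1)
            elif field < 0:
                new_state.append(-1)
            else:
                new_state.append(state[i])
        if new_state == state:  # settled on a stable pattern
            break
        state = new_state
    return state

# Two hypothetical stored "words", coded as +1/-1 pixels.
WORD_A = [1, 1, 1, 1, -1, -1, -1, -1]
WORD_B = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_hebbian([WORD_A, WORD_B])

# WORD_A with its last two pixels blocked (0 means "not seen").
occluded = [1, 1, 1, 1, -1, -1, 0, 0]
completed = recall(W, occluded)
print(completed)  # [1, 1, 1, 1, -1, -1, -1, -1]
```

Here the network's stored expectations restore the blocked pixels, in the same spirit as a reader filling in a partially covered word.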
The other advantage of unsupervised learning is that since there is no assigned task, knowledge is not tied to a specific application. It’s easy to learn a new task by using this higher-level knowledge. An example is that learning what numbers mean is later applied to learning arithmetic.
Q: The part of your network trained on natural images was still more responsive to images of real letters than to made-up ones. Does that mean real letters somehow resemble nature?
A: Yes, this is one explanation. There’s this hypothesis that has been around for some time that the shapes of symbols across all writing systems have been culturally selected to better match the statistics of our visual environment. You can think about this in terms of the type of shapes needed to better suit brains trained on nature.
Q: What else have you learned about human cognition?
A: We know that babies and animals can compare numbers of objects even without labels. We found that deep unsupervised learning on images containing different numbers of objects yields this visual number sense in a neural network. It was the first study using deep learning for cognitive modeling.
With neural networks, you have a learning algorithm. You can try to map the learning trajectory of the network onto human developmental data. Take something like learning to read. If you have a computer model that learns to read, you may also try to understand atypical learning, as in dyslexia.
Q: What have you found about dyslexia?
A: There’s a huge debate. What is the core deficit? People have looked at phonological, visual, and attentional deficits. We tested these hypotheses in a computer model of reading development. In a study that has not been published, we observed that unless you assume dyslexia is caused by more than one deficit, there’s no way to explain the diversity seen in real dyslexic children. This approach is heading toward building personalized models of individuals and using the simulations to predict the outcomes of interventions.
Q: Could simulating the brain like this also improve AI?
A: I think so. Bringing in more constraints from the information we have about the brain and how people learn can give us some new ideas on how to explore new learning solutions.