Proteins are the minions of life, working alone or together to build, manage, fuel, protect, and eventually destroy cells. To function, these long chains of amino acids twist and fold and intertwine into complex shapes that can be slow, even impossible, to decipher. Scientists have dreamed of simply predicting a protein’s shape from its amino acid sequence, an ability that would open a world of insights into the workings of life. “This problem has been around for 50 years; lots of people have broken their head on it,” says John Moult, a structural biologist at the University of Maryland, Shady Grove. But a practical solution is now within researchers’ grasp.
Several months ago, in a result hailed as a turning point, computational biologists showed that artificial intelligence (AI) could accurately predict protein shapes. That group describes its approach online in Nature today. Meanwhile, David Baker and Minkyung Baek at the University of Washington, Seattle, and their colleagues present their AI-based structure prediction approach online in Science. Their method works not just on simple proteins but also on complexes of proteins.
Baker and Baek’s method and computer code have been available for weeks, and the team has already used them to model more than 4500 protein sequences submitted by other researchers. Savvas Savvides, a structural biologist at Ghent University, had tried six times to model a problematic protein. He says Baker and Baek’s program, called RoseTTAFold, “paved the way to a structure solution.”
In fall of 2020, DeepMind, a U.K.-based AI company owned by Google, wowed the field with its structure predictions in a biennial competition. Called Critical Assessment of Protein Structure Prediction (CASP), the competition uses structures newly determined using laborious lab techniques such as x-ray crystallography as benchmarks. DeepMind’s program, AlphaFold2, did “really extraordinary things [predicting] protein structures with atomic accuracy,” says Moult, who organizes CASP.
But for many structural biologists, AlphaFold2 was a tease: “Incredibly exciting but also very frustrating,” says David Agard, a structural biophysicist at the University of California, San Francisco. In mid-June, 3 days after the Baker lab posted its RoseTTAFold preprint, Demis Hassabis, DeepMind’s CEO, tweeted that AlphaFold2’s details were under review at a publication and the company would provide “broad free access to AlphaFold for the scientific community.” Nature has now rushed to publish that paper to coincide with the Science paper. “It is appropriate that it is not coming out after ours, as our work is really based on their advances,” Baker says.
DeepMind’s 30-minute presentation at CASP had been enough to inspire Baek to develop her own approach. Like AlphaFold2, it uses AI’s ability to discern patterns in vast databases of examples, generating ever more informed and accurate iterations as it learns. When given a new protein to model, RoseTTAFold proceeds along multiple “tracks.” One compares the protein’s amino acid sequence with all similar sequences in protein databases. Another predicts pairwise interactions between amino acids within the protein, and a third compiles the putative 3D structure. The program bounces among the tracks to refine the model, using the output of each one to update the others. DeepMind’s approach involves just two tracks.
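The back-and-forth among tracks can be illustrated with a toy sketch. It is not RoseTTAFold’s code and contains no neural network: the function names, the update rules, and the blending weights are all invented for illustration. It shows only the control flow described above, in which a per-residue sequence track, a pairwise track, and a 3D coordinate track are updated in turn, each drawing on the latest output of the others.

```python
# Toy illustration of a three-track refinement loop.
# NOT the RoseTTAFold implementation: every name and update rule
# below is a hypothetical stand-in chosen to keep the example small.
import math


def update_seq(seq, pair):
    # 1D track: blend each residue's feature with the mean of its pair scores.
    n = len(seq)
    return [0.5 * seq[i] + 0.5 * sum(pair[i]) / n for i in range(n)]


def update_pair(seq, pair, coords):
    # 2D track: blend old pair scores with sequence features and 3D proximity.
    n = len(seq)
    new_pair = []
    for i in range(n):
        row = []
        for j in range(n):
            closeness = 1.0 / (1.0 + math.dist(coords[i], coords[j]))
            row.append((pair[i][j] + seq[i] * seq[j] + closeness) / 3.0)
        new_pair.append(row)
    return new_pair


def update_coords(coords, pair):
    # 3D track: nudge each residue toward the pair-score-weighted
    # average position of all residues.
    n = len(coords)
    new_coords = []
    for i in range(n):
        total = sum(pair[i]) or 1.0  # avoid dividing by zero
        target = [
            sum(pair[i][j] * coords[j][k] for j in range(n)) / total
            for k in range(3)
        ]
        new_coords.append(
            tuple(0.9 * c + 0.1 * t for c, t in zip(coords[i], target))
        )
    return new_coords


def refine(seq, pair, coords, rounds=3):
    # Cycle through the tracks, feeding each one the others' latest output.
    for _ in range(rounds):
        seq = update_seq(seq, pair)
        pair = update_pair(seq, pair, coords)
        coords = update_coords(coords, pair)
    return seq, pair, coords
```

Calling `refine` on a short made-up chain, for example four residues laid out along a line with all pair scores starting at zero, returns updated versions of all three representations. In the real program each update is a learned neural network layer rather than a fixed formula.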
Gira Bhabha, a cell and structural biologist at New York University School of Medicine, says both methods work well. “Both the DeepMind and Baker lab advances are phenomenal and will change how we can use protein structure predictions to advance biology,” she says. A DeepMind spokesperson wrote in an email, “It’s great to see examples such as this where the protein folding community is building on AlphaFold to work towards our shared goal of increasing our understanding of structural biology.”
But AlphaFold2 solved the structures of only single proteins, whereas RoseTTAFold has also predicted complexes, such as the structure of the immune molecule interleukin-12 latched onto its receptor. Many biological functions depend on protein-protein interactions, says Torsten Schwede, a computational structural biologist at the University of Basel. “The ability to handle protein-protein complexes directly from sequence information makes it extremely attractive for many questions in biomedical research.”
Baker concedes that AlphaFold2’s structures are more accurate. But Savvides says the Baker lab’s approach better captures “the essence and particularities of protein structure,” such as identifying strings of atoms sticking out of the sides of the protein—features key to interactions between proteins. Last year, AlphaFold2 needed a lot of computing power to work, more than RoseTTAFold. “Now, it seems they’ve accelerated their method since CASP14, and it’s now comparable to RoseTTAFold,” Baek says.
On 1 June, Baker and Baek began to challenge their method by asking researchers to send in their most baffling protein sequences. Fifty-six head scratchers arrived in the first month, all of which now have predicted structures. Agard’s group sent in an amino acid sequence with no known similar proteins. Within hours, his group got a protein model back “that probably saved us a year of work,” Agard says. Now, he and his team know where to mutate the protein to test ideas about how it functions.
Because Baek and Baker’s group has released its computer code on the web, others can improve on it; the code has been downloaded 250 times since 1 July. “Many researchers will build their own structure prediction methods upon Baker’s work,” says Jinbo Xu, a computational structural biologist at the Toyota Technological Institute at Chicago. Hassabis says DeepMind’s computer code is now also open source. As a result of both groups’ work, progress should now be swift, Moult says: “When there’s a breakthrough like this, 2 years later, everyone is doing it as well if not better than before.”