New universe of miniproteins is upending cell biology and genetics

Mice put human runners to shame. Despite taking puny strides, the rodents can log 10 kilometers or more per night on an exercise wheel. But the mice that muscle biologist Eric Olson of the University of Texas Southwestern Medical Center in Dallas and colleagues unveiled in 2015 stood out. On a treadmill, the mice could scurry up a steep 10% grade for about 90 minutes before faltering, 31% longer than other rodents. Those iron mice differed from counterparts in just one small way—the researchers had genetically altered the animals to lack one muscle protein. That was enough to unleash superior muscle performance. "It's like you've taken the brakes off," Olson says.

Just as startling was the nature of the crucial protein. Muscles house some gargantuan proteins. Dystrophin, a structural protein whose gene can carry mutations that cause muscular dystrophy, has more than 3600 amino acids. Titin, which acts like a spring to give muscles elasticity, is the biggest known protein, with more than 34,000 amino acids. The protein disabled in the mice has a paltry 46. Although researchers have probed how muscles work for more than 150 years, they had completely missed the huge impact this tiny protein, called myoregulin, has on muscle function.

Olson and his colleagues weren't the only ones to be blindsided by Lilliputian proteins. As scientists now realize, their initial rules for analyzing genomes discriminated against identifying those pint-size molecules. Now, broader criteria and better detection methods are uncovering minuscule proteins by the thousands, not just in mice, but in many other species, including humans. "For the first time, we are about to explore this universe of new proteins," says biochemist Jonathan Weissman of the University of California, San Francisco.

Biologists are just beginning to delve into the functions of those molecules, called microproteins, micropeptides, or miniproteins. But their small size seems to allow them to jam the intricate workings of larger proteins, inhibiting some cellular processes while unleashing others. Early findings suggest microproteins bolster the immune system, control destruction of faulty RNA molecules, protect bacteria from heat and cold, dictate when plants flower, and provide the toxic punch for many types of venom. "There's probably going to be small [proteins] involved in all biological processes. We just haven't looked for them before," says biochemist Alan Saghatelian of the Salk Institute for Biological Studies in San Diego, California.

Small proteins also promise to revise the current understanding of the genome. Many appear to be encoded in stretches of DNA—and RNA—that were not thought to help build proteins of any sort. Some researchers speculate that the short stretches of DNA could be newborn genes, on their way to evolving into larger genes that make full-size proteins. Thanks in part to small proteins, "We need to rethink what genes are," says microbiologist and molecular biologist Gisela Storz of the National Institute of Child Health and Human Development in Bethesda, Maryland.

Despite the remaining mysteries, scientists are already testing potential uses for the molecules. One company sells insecticides derived from small proteins in the poison of an Australian funnel-web spider. And a clinical trial is evaluating an imaging agent based on another minute protein in scorpion venom, designed to highlight the borders of tumors so that surgeons can remove them more precisely. Many drug companies are now searching for small proteins with medical potential, says biochemist Glenn King of the University of Queensland in St. Lucia, Australia. "It's one of the most rapidly growing areas."

Other short amino acid chains, often called peptides or polypeptides, abound in cells, but they are pared-down remnants of bigger predecessors. Myoregulin and its diminutive brethren, in contrast, are born small. How tiny they can be remains unclear. Fruit flies rely on a microprotein with 11 amino acids to grow normal legs, and some microbes may crank out proteins less than 10 amino acids long, notes microbial genomicist Ami Bhatt of Stanford University in Palo Alto, California. But even the largest small proteins don't measure up to average-size proteins such as alpha amylase, a 496–amino-acid enzyme in our saliva that breaks down starch.

Few small proteins came to light until recently because of a criterion for identifying genes set about 20 years ago. When scientists analyze an organism's genome, they often scan for open reading frames (ORFs), which are DNA sequences demarcated by signals that tell the cell's ribosomes, its protein-making assembly lines, where to start and stop. In part to avoid a data deluge, past researchers typically excluded any ORF that would yield a protein smaller than 100 amino acids in eukaryotes or 50 amino acids in bacteria. In yeast, for example, that cutoff limited the list of ORFs to about 6000.
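To make the effect of that size cutoff concrete, here is a minimal sketch of an ORF scan in Python. It is an illustration only, not the pipeline any of the researchers used: the function name `find_orfs`, the toy sequence, and the choice to scan only the forward strand with ATG as the sole start signal are all assumptions made for the example.

```python
# Illustrative ORF scan (hypothetical helper, forward strand only).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: only the canonical start codon is considered


def find_orfs(dna, min_amino_acids):
    """Return (start, end, n_amino_acids) for each ORF that meets the cutoff.

    An ORF here runs from the first ATG in a reading frame to the next
    in-frame stop codon; the stop codon is not counted as an amino acid.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_amino_acids = (i - start) // 3
                if n_amino_acids >= min_amino_acids:
                    orfs.append((start, i + 3, n_amino_acids))
                start = None
    return orfs


# Toy sequence: a 46-codon ORF (the size of myoregulin) followed, in a
# different reading frame, by a 120-codon ORF. The bases are made up.
short_orf = "ATG" + "GCT" * 45 + "TAA"
long_orf = "ATG" + "GCT" * 119 + "TGA"
genome = short_orf + "CC" + long_orf

strict = find_orfs(genome, min_amino_acids=100)   # traditional eukaryotic cutoff
relaxed = find_orfs(genome, min_amino_acids=5)    # relaxed cutoff
print(len(strict), len(relaxed))  # 1 2
```

With the 100-amino-acid threshold the 46-codon ORF is silently dropped; lowering the threshold recovers it, at the cost of admitting many more candidates in a real genome.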

Relaxing that criterion reveals that cells carry vastly more ORFs. Earlier this year, Stanford postdoc Hila Sberro Livnat, Bhatt, and colleagues trawled genome fragments from the microbes that inhabit four parts of the human body, including the gut and skin. By searching for small ORFs that could encode proteins between five and 50 amino acids long, the researchers identified about 4000 families of potential microproteins. Almost half resemble no known proteins, but the sequence for one small ORF suggested that a corresponding protein resides in ribosomes—a hint that it could play some fundamental role. "It's not just genes with esoteric functions that have been missed" when scientists overlooked small ORFs, Bhatt says. "It's genes with core functions."

Other cells also house huge numbers of short ORFs—yeast could make more than 260,000 molecules with between two and 99 amino acids, for example. But cells almost certainly don't use all those ORFs, and some of the amino acid strings they produce may not be functional. In 2011, after finding more than 600,000 short ORFs in the fruit fly genome, developmental geneticist Juan Pablo Couso of the University of Sussex in Brighton, U.K., and colleagues tried to whittle down the number. They reasoned that if a particular ORF had an identical or near-identical copy in a related species, it was less likely to be genomic trash. After searching another fruit fly's genome and analyzing other evidence that the sequences were being translated, the group ended up with a more manageable figure of 401 short ORFs likely to yield microproteins. That would still represent a significant fraction of the insects' protein repertoire—they harbor about 22,000 full-size proteins.
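The conservation test described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the peptide sequences are invented, `identity` compares equal-length sequences position by position instead of performing a real alignment, and the 0.9 threshold for "near-identical" is an arbitrary choice. It also leaves out the separate evidence of translation that the group considered.

```python
# Illustrative conservation filter (hypothetical names and sequences).

def identity(a, b):
    """Fraction of matching positions; 0.0 if lengths differ (no alignment)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def conserved_orfs(candidates, related_species_peptides, threshold=0.9):
    """Keep candidate names that have a near-identical copy in the relative."""
    kept = []
    for name, peptide in candidates.items():
        if any(identity(peptide, other) >= threshold
               for other in related_species_peptides):
            kept.append(name)
    return kept


# Made-up peptides predicted from short ORFs in one species...
candidates = {
    "orf_a": "MAAYLDPTGQY",   # identical copy exists in the relative
    "orf_b": "MKLVTSSWQPR",   # one substitution in the relative
    "orf_c": "MGGHHNNCCDD",   # no counterpart
}
# ...and from a related species.
related = ["MAAYLDPTGQY", "MKLVTSAWQPR", "MPPPPPPPPPP"]

print(conserved_orfs(candidates, related))  # ['orf_a', 'orf_b']
```

The idea is that a random, nonfunctional ORF is unlikely to survive intact in two species, so the filter discards most of the candidates while keeping those more likely to matter.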

Weissman and colleagues found microproteins a second way, through a method they invented to broadly determine which proteins cells are making. To fashion any protein, a cell first copies a gene into messenger RNA. Then ribosomes read the mRNA and string together amino acids in the order it specifies. By sequencing mRNAs attached to ribosomes, Weissman and his team pinpoint which ones cells are actually turning into proteins and where on the RNAs a ribosome starts to read. In a 2011 Cell study, he and his team applied that ribosome profiling method, also called Ribo-seq, to mouse embryonic stem cells and discovered the cells were making thousands of unexpected proteins, including many that would fall below the 100–amino-acid cutoff. "It was quite clear that the standard understanding had ignored a large universe of proteins, many of which were short," Weissman says.

Saghatelian and his colleagues adopted a third approach to discover a trove of microproteins in our own cells. The researchers used mass spectrometry, which involves breaking up proteins into pieces that are sorted by mass to produce a distinctive spectrum for each protein. Saghatelian, his then-postdoc Sarah Slavoff, and colleagues applied the method to protein mixtures from human cells and then subtracted the signatures of known proteins. That approach revealed spectra for 86 previously undiscovered tiny proteins, the smallest just 18 amino acids long, the researchers reported in 2013 in Nature Chemical Biology.
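The subtraction step can be pictured with a small Python sketch. It is a deliberate simplification: real mass spectrometry compares measured spectra against predicted ones, whereas this example stands in for each signature with a plain peptide string and treats a fragment as "known" if it occurs inside any known protein. The function name and every sequence are hypothetical.

```python
# Illustrative "subtract the known proteins" step (hypothetical data).

def novel_fragments(detected, known_proteins):
    """Return detected fragments that match no part of any known protein."""
    return [
        fragment
        for fragment in detected
        if not any(fragment in protein for protein in known_proteins)
    ]


# Made-up reference proteins and made-up detected fragments.
known_proteins = ["MKTAYIAKQRQISFVK", "MSDNELLQ"]
detected = ["AYIAKQ", "GGWPLT", "DNELL"]

print(novel_fragments(detected, known_proteins))  # ['GGWPLT']
```

Whatever survives the subtraction is a candidate for a previously unannotated protein and still needs independent confirmation.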

Being small limits a protein's capabilities. Larger proteins fold into complex shapes suited for a particular function, such as catalyzing chemical reactions. Proteins smaller than about 50 to 60 amino acids probably don't fold, says chemist Julio Camarero of the University of Southern California in Los Angeles. So they probably aren't suited to be enzymes or structural proteins.

However, their diminutive size also opens up opportunities. "They are tiny enough to fit into nooks and crannies of larger proteins that function as channels and receptors," Olson says. Small proteins often share short stretches of amino acids with their larger partners and can therefore bind to and alter the activity of those proteins. Bound microproteins can also shepherd bigger molecules to new locations—helping them slip into cell membranes, for instance.

Because of their attraction to larger proteins, small proteins may give cells a reversible way to switch larger proteins on or off. In a 2016 study in PLOS Genetics, plant developmental biologist Stephan Wenkel of the University of Copenhagen and colleagues genetically altered Arabidopsis plants to produce extra amounts of two small proteins. The plants normally burst into flower when the days are long enough, but when they overproduced the two microproteins, their flowering was postponed. The small proteins caused that delay by blocking a hefty protein called CONSTANS that triggers flowering. They tether CONSTANS to other inhibitory proteins that shut it down. "A cell uses things that help it survive. If a short protein does the job, that's fine," Saghatelian says.

Those jobs include other key tasks. In 2016, Slavoff, Saghatelian, and colleagues revealed that human cells manufacture a 68–amino-acid protein they named NoBody that may help manage destruction of faulty or unneeded mRNA molecules. NoBody's name reflects its role in preventing formation of processing bodies (P-bodies), mysterious clusters in the cytoplasm where RNA breakdown may occur. When the protein is missing, more P-bodies form, thus boosting RNA destruction and altering the cell's internal structure. "It shows that small proteins can have massive effects in the cell," Slavoff says.

Muscles appear to depend on a variety of microproteins. During embryonic development, individual muscle cells merge into fibers that power contraction. The 84–amino-acid protein myomixer teams up with a larger protein to bring the cells together, Olson's team reported in 2017 in Science. Without it, embryonic mice can't form muscles and are almost transparent.

Later in life, myoregulin steps in to help regulate muscle activity. When a muscle receives a stimulus, cellular storage depots spill calcium, triggering the fibers to contract and generate force. An ion pump called SERCA then starts to return the calcium to storage, allowing the muscle fibers to relax. Myoregulin binds to and inhibits SERCA, Olson's team found. The effect limits how often a mouse's muscles can contract—perhaps ensuring that the animal has muscle power in reserve for an emergency, such as escaping a predator. Another small protein, DWORF, has the opposite effect, unleashing SERCA and enabling the muscle to contract repeatedly.

Even extensively studied organisms such as the intestinal bacterium Escherichia coli harbor unexpected small proteins that have important functions. Storz and her team reported in 2012 that a previously undiscovered 49–amino-acid protein called AcrZ helps the microbe survive some antibiotics by stimulating a pump that expels the drugs.

And the venom produced by a variety of organisms—including spiders, centipedes, scorpions, and poisonous mollusks—teems with tiny proteins. Many venom components disable or kill by blocking the channels for sodium or other ions that are necessary for transmission of nerve impulses. Small proteins "hit these ion channels with amazing specificity and potency," King says. "They are the major components of venoms and are responsible for most of the pharmacological and biological effects."

Australia's giant fish-killing water bug, for instance, doesn't just rely on sharp claws and lancelike mouthparts to subdue prey. It injects its victims with a brew of more than 130 proteins, 15 of which have fewer than 100 amino acids, King and colleagues reported last year.

Unlike hulking proteins such as antibodies, microproteins delivered by pill or injection may be able to slip into cells and alter their functions. Captopril, the first of a class of drugs for high blood pressure known as angiotensin-converting enzyme inhibitors, was developed from a small protein in the venom of a Brazilian pit viper. But the drug, which the Food and Drug Administration approved for sale in the United States in 1981, was discovered by chance, before scientists recognized small proteins as a distinct group. So far, only a few microproteins have reached the market or clinical trials.

Cancer researchers are trying to capitalize on a microprotein in the poison of the deathstalker scorpion (Leiurus quinquestriatus) of Africa and the Middle East. The molecule has a mysterious attraction to tumors. By fusing it to a fluorescent dye, scientists hope to illuminate the borders of brain tumors so that surgeons can safely cut out the cancerous tissue. "It lights up the tumor. You can see the margins and if there are any metastases," King says. A clinical trial is now evaluating whether the dual molecule can help surgeons remove brain tumors in children.

How important small proteins will be for medicine is still unknown, but they have already upended several biological assumptions. Geneticist Norbert Hübner of the Max Delbrück Center for Molecular Medicine in Berlin and colleagues found dozens of new microproteins in human heart cells. The group traced them to an unexpected source: short sequences within long noncoding RNAs, a variety that was thought not to produce proteins. After identifying 169 long noncoding RNAs that were probably being read by ribosomes, Hübner and his team used a type of mass spectrometry to confirm that more than half of them yielded microproteins in heart cells, a result reported earlier this year in Cell.

The DNA sequences for other tiny proteins also occur in unconventional locations. For example, some lie near the ORFs for bigger proteins. Researchers previously thought those sequences helped manage the production of the larger proteins, but rarely gave rise to proteins themselves. Some coding sequences for recently discovered microproteins are even nested within sequences that encode other, longer proteins.

Those genomic surprises could illuminate how new genes arise, says evolutionary systems biologist Anne-Ruxandra Carvunis of the University of Pittsburgh in Pennsylvania. Researchers had thought most new genes emerge when existing genes duplicate or fuse, or when species swap DNA. But to Carvunis, microproteins suggest protogenes can form when mutations create new start and stop signals in a noncoding portion of the genome. If the resulting ORF produces a beneficial protein, the novel sequences would remain in the genome and undergo natural selection, eventually evolving into larger genes that code for more complex proteins.

In a 2012 study, Carvunis, who was then a postdoc in the lab of Marc Vidal at the Dana-Farber Cancer Institute in Boston, and colleagues found that yeast translate more than 1000 short ORFs into proteins, implying that these sequences are protogenes. In a new study, Carvunis and her team tested whether young ORFs can be advantageous for cells. They genetically altered yeast to boost output of 285 recently evolved ORFs, most of which code for molecules that are smaller than the standard protein cutoff or just over it. For almost 10% of the proteins, increasing their levels enhanced cell growth in at least one environment. The results, posted on the preprint server bioRxiv, suggest these sequences could be on their way to becoming full-fledged genes, Carvunis says.

Slavoff still recalls being astonished when, during her interview for a postdoc position with Saghatelian, he asked whether she would be willing to go hunting for small proteins. "I had never thought that there could be this whole size of proteins that was dark to us until then."

But the bet paid off—she now runs her own lab that is searching for microproteins. Recently, she unleashed some of her postdocs and graduate students on one of the most studied organisms, the K12 strain of E. coli. The team soon uncovered five new microproteins. "We are probably only scratching the surface," she says.