No shared language? No problem! People across cultures understand clues from ‘vocal charades’

One of the hardest questions for evolutionary linguists is why humans speak at all. When people don’t share a language, they quickly resort to using their hands, rather than their voices: It’s easier to mime “drink” than it is to make a noise that sounds like drinking. Those gestures, over time, can easily blossom into full-fledged sign languages. “If gesture is good enough for language,” says Aleksandra Ćwiek, a linguistics Ph.D. student at the Leibniz-Centre General Linguistics, “why the hell do we talk?”

In a new study, Ćwiek and her colleagues help answer that question: People from very different cultures can understand nonlinguistic vocal clues better than expected by chance, they find. Speakers of 28 languages could all successfully guess meanings in a charades-like game in which other people expressed words like “water” using vocal sounds but no language.

The study bolsters a growing argument that vocal sounds, like gestures, can be “iconic”—mimicking some part of the idea they’re trying to convey—says Gareth Roberts, an evolutionary linguist at the University of Pennsylvania who wasn’t involved in the new work. For example, whereas a gesture for “eat” might mimic chewing food, a vocalization could mimic the noises of chewing itself—and unlike the word “eat,” both convey something about the act of eating. Smaller studies have shown similar effects, but “the true contribution of this [new] study lies in its scale,” Roberts says.

For their “vocal charades” experiment, the researchers used recordings from a previous study, in which mostly English speakers came up with vocalizations for 30 words, including actions like “cut,” nouns like “child,” and more abstract ideas like “this” and “that.” The speakers invented their sounds without using actual words or other linguistic conventions (such as saying “num num” to mean “eat”).

For some words, such as “sleep,” the strategy was obvious: make a snoring sound. But for others, the best tactic wasn’t at all clear. For “fruit,” some people made a thumping noise, like a dropped apple hitting the ground. Others went for a crunching, slurping noise. And for an abstract word such as “good,” people often made a noise that changed in pitch from low to high; for “bad,” many made a noise whose pitch went from high to low.

Then, Ćwiek and her colleagues asked a different group of 843 people to match the recorded sounds to their correct meanings. They tested speakers of 25 languages, including some that are closely related to English—like German and Swedish—and some that aren’t, like Chinese and Armenian.

They found that participants could guess general meanings surprisingly well. Each correct meaning was presented along with five incorrect options, so guessing at random would give participants a one-in-six, or roughly 17%, chance of being right. But on average, people across all languages guessed correctly 65% of the time, the researchers report this week in Scientific Reports. That’s enough to show participants often understood the clues, the researchers say.

Some words were easier than others: Participants nearly always correctly guessed the sound for “sleep.” But guesses for the more abstract “that” and “gather” scraped in only just above chance. English speakers were correct 74% of the time, suggesting a shared culture helps, says senior author Marcus Perlman, a linguist at the University of Birmingham. But the lowest score, for Thai speakers, was 52%—still far above chance.

To cast the cultural net even wider, the researchers also tested participants in communities that seldom use written language. This included speakers of three additional languages in Vanuatu, French Guiana, and Brazil. Rather than asking people to choose the written word that matched the clue they heard, the researchers asked them to choose pictures, which limited the test to concepts that could be shown in a photograph.

Again, people were surprisingly good at the task. Participants were correct at least 34% of the time (compared with the 8% expected by chance), with Daakie speakers from Vanuatu getting nearly half the answers right. “It’s cool to think that we … can communicate meaning just with the sound of our voice,” Perlman says. “People don’t just make meaningless sounds.”

It’s a neat study, says Limor Raviv, an evolutionary linguist at the Dutch-speaking Free University of Brussels who wasn’t involved with the work. Although it’s simple, it challenges an old and central idea in linguistics: that there’s no relationship between the sounds that make up a word and the meaning of that word. For instance, there’s nothing about the word “cat” that is obviously connected to the animal. But this study adds to a growing pile of evidence that iconicity in speech isn’t limited to just the rare case of onomatopoeia, like “meow.”

If vocalization, like gesture, can convey meaning without being part of a language, it could have played a role in the emergence of early linguistic systems, Perlman says. The finding makes it possible for linguists to start to explore how vocalization and gesture might have worked in tandem in the evolution of language, Raviv says, rather than arguing about which came first: “It makes the mystery of the shift from gesture to spoken language obsolete.”