No matter what language they speak, people try to make themselves understood with a minimum of effort.

All languages have evolved to have this in common

Have you ever wondered why you say “The boy is playing Frisbee with his dog” instead of “The boy dog his is Frisbee playing with”?  You may be trying to give your brain a break, according to a new study. An analysis of 37 widely varying tongues finds that, despite the apparent great differences among them, they share what might be a universal feature of human language: All of them have evolved to make communication as efficient as possible.

Earth is a veritable Tower of Babel: Up to 7000 languages are still spoken across the globe, belonging to roughly 150 language families. And they vary widely in the way they put sentences together. For example, the three major building blocks of a sentence, subject (S), verb (V), and object (O), can come in three different orders. English and French are SVO languages, whereas German and Japanese are SOV languages; a much smaller number, such as Arabic and Hebrew, use the VSO order. (No well-documented languages start sentences or clauses with the object, although some linguists have jokingly suggested that Klingon might do so.)

Yet despite these different ways of structuring sentences, previous studies of a limited number of languages have shown that they tend to limit the distance between words that depend on each other for their meaning. Such “dependency” is key if sentences are to make sense.

For example, in the sentence “Jane threw out the trash,” the word “Jane” is dependent on “threw”—it modifies the verb by telling us who was doing the throwing, just as we need “trash” to know what was thrown, and “out” to know where the trash went. Although “threw” and “trash” are three words away from each other, we can still understand the sentence easily.

But we might have more trouble understanding a sentence like “Jane threw the old trash sitting in the kitchen out,” because now “threw” and “out” are eight words apart instead of one. We can shorten that dependency distance, and make the sentence clearer, by changing it to read “Jane threw out the old trash sitting in the kitchen.”
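The dependency distances described above can be made concrete with a few lines of code. The following is a toy sketch in Python; the head assignments are hand-annotated for the article’s two example sentences, not produced by a real parser, and real analyses use richer annotation schemes.

```python
# Toy illustration of dependency length: each word records the index of its
# head (the word it depends on), and the length of one dependency is the
# distance between the two positions in the sentence.

def total_dependency_length(words, heads):
    """Sum of |position of word - position of its head| over all words.

    `heads` maps each word index to its head index; the root (the main
    verb) points to itself and contributes zero.
    """
    return sum(abs(i - h) for i, h in enumerate(heads))

# "Jane threw out the trash"
short_sent = ["Jane", "threw", "out", "the", "trash"]
short_heads = [1, 1, 1, 4, 1]  # Jane->threw, out->threw, the->trash, trash->threw

# "Jane threw the old trash sitting in the kitchen out"
long_sent = ["Jane", "threw", "the", "old", "trash",
             "sitting", "in", "the", "kitchen", "out"]
long_heads = [1, 1, 4, 4, 1, 4, 5, 8, 6, 1]  # same relations; "out" is now 8 words from "threw"

print(total_dependency_length(short_sent, short_heads))  # 6
print(total_dependency_length(long_sent, long_heads))    # 20
```

Moving “out” next to “threw” is exactly the kind of choice that drives the second sentence’s total down toward the first’s.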

[Figure: Sentences A and B are the same length and use the same words, as do C and D, but the dependency lengths of the second sentence in each pair are longer. Credit: R. Futrell et al., PNAS (2015)]

These observations have led some linguists to hypothesize that all of the world’s languages reduce the distance between dependent words, a tendency called dependency length minimization (DLM). Yet the most comprehensive previous studies of this trend covered only seven languages. Although most of them showed at least some evidence for DLM, the support for it in German was weak, raising doubts about whether DLM really is a universal feature of human language.

To try to resolve the question, a team led by Richard Futrell, a linguist at the Massachusetts Institute of Technology in Cambridge, analyzed 37 languages from 10 different language families to see how much they minimized dependency lengths over what would be expected by chance. In addition to major languages such as English, German, French, and Spanish, the database also included ancient Greek, Arabic, Basque, Tamil, and Telugu, one of India’s classical languages. For most of the languages, the researchers used written prose from newspapers, novels, and blogs, although for ancient Greek and Latin they relied on poetry. They crunched thousands of sentences using software designed to measure dependency lengths.
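The comparison against chance can be illustrated with a simplified sketch. This is my simplification, not the authors’ actual procedure (which constrains random orders to well-formed linearizations of the dependency tree): shuffle the words of a sentence while keeping its dependency relations fixed, and compare the observed total dependency length with the average over random orders.

```python
# Simplified chance baseline: reshuffle word order, keep the dependencies,
# and average the resulting total dependency lengths over many trials.
import random

def dep_length(positions, heads):
    # positions[i] is where word i sits in the current ordering;
    # the root points to itself and contributes zero.
    return sum(abs(positions[i] - positions[h]) for i, h in enumerate(heads))

def random_baseline(n_words, heads, trials=10000, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    total = 0
    order = list(range(n_words))
    for _ in range(trials):
        rng.shuffle(order)
        positions = [0] * n_words
        for place, word in enumerate(order):
            positions[word] = place
        total += dep_length(positions, heads)
    return total / trials

# "Jane threw out the trash", with the same hand-annotated heads as before
heads = [1, 1, 1, 4, 1]
observed = dep_length(list(range(5)), heads)
print(observed, random_baseline(5, heads))  # the observed total falls below the chance average
```

A language exhibits DLM to the extent that, across many sentences, the observed totals sit systematically below this kind of chance baseline.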

The results, published online today in the Proceedings of the National Academy of Sciences, demonstrate that all 37 languages, including German, minimize dependency lengths to degrees greater than expected by chance. Nevertheless, the team found wide variations in the extent of DLM. Thus Italian, Indonesian, and Irish showed high degrees of minimization, whereas Japanese, Korean, and Turkish showed much less. In general, SOV languages like German tend to have longer dependency lengths, although this is not a hard-and-fast rule.

Just why these variations exist is a topic for future research, the authors say. But they point out that German and many other SOV languages employ a linguistic device called “case marking,” a modification of key words in a sentence that makes it easier to distinguish the subject from the object. For example, whereas English speakers must say either “John kisses Mary” or “Mary kisses John” to know who is kissing whom, in Japanese one can say “John Mary kiss” because the case marking will make it clear. (English, an SVO language that generally does not use case marking, nevertheless retains vestiges of it from its Germanic Old English origins: We say “He threw the ball to her” rather than “He threw the ball to she” to make it absolutely clear who is the subject and who is the object.)

Limiting dependency length is advantageous, Futrell says, because convoluted sentences require more memory processing—and thus more energy—for both listeners and speakers who are trying to understand and be understood. Thus it makes sense that short dependency lengths became a universal feature in human language. “As language users, we have a choice of many ways of expressing ourselves,” Futrell says. “What languages don’t do is force you” into inefficient and energy-wasting use of memory stores.

The new work is a “major advance” because “it shows that DLM is a property of languages in general,” says David Temperley, a cognitive scientist at the University of Rochester (U of R) in New York. Nevertheless, he stops short of concluding that it is a “universal” or “hard-wired” feature of language, rather than a strategy that humans have developed over time to make themselves better understood. Florian Jaeger, a psycholinguist also at U of R, agrees. Jaeger says that the current paper, along with other recent research, shows that although “the bias towards efficiency is a strong factor in explaining” common features of the world’s languages, “finding a potentially universal pattern does not necessarily” mean that it is “genetically encoded.”