Speak or write in English, and the world will hear you. Speak or write in Tamil or Portuguese, and you may have a harder time getting your message out. Now, a new method for mapping how information flows around the globe identifies the best languages to spread your ideas far and wide. One hint: If you’re considering a second language, try Spanish instead of Chinese.
The study was spurred by a conversation about an untranslated book, says Shahar Ronen, a Microsoft program manager whose Massachusetts Institute of Technology (MIT) master’s thesis formed the basis of the new work. A bilingual Hebrew-English speaker from Israel, he told his MIT adviser, César Hidalgo (himself a Spanish-English speaker), about a book written in Hebrew whose translation into English he wasn’t yet aware of. “I was able to bridge a certain culture gap because I was multilingual,” Ronen says. He began thinking about how to create worldwide maps of how multilingual people transmit information and ideas.
Ronen and co-authors from MIT, Harvard University, Northeastern University, and Aix-Marseille University tackled the problem by describing three global language networks based on bilingual tweeters, book translations, and multilingual Wikipedia edits. The book translation network maps how many books are translated from each language into others. For example, the Hebrew book, translated from Hebrew into English and German, would be represented by lines pointing from a node of Hebrew to nodes of English and German. That network is based on 2.2 million translations of printed books published in more than 1000 languages. As in all of the networks, the thickness of the lines represents the number of connections between nodes. For tweets, the researchers used 550 million tweets by 17 million users in 73 languages. In that network, if a user tweets in, say, Hindi as well as in English, the two languages are connected. To build the Wikipedia network, the researchers tracked edits that editors made in up to five languages, carefully excluding bots.
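To make the tweet-based construction concrete, here is a minimal Python sketch of the idea described above: two languages are linked whenever one user writes in both, and the weight of the link (the "thickness of the line") is the number of such users. This is not the authors' code. The function name, the language codes, and the toy user data are all hypothetical, and the study's real pipeline presumably involved filtering steps not described in this article.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(user_languages):
    """Count, for each unordered language pair, how many users use both.

    user_languages maps a user ID to the languages that user tweets in.
    Returns a Counter keyed by alphabetically sorted (lang_a, lang_b) tuples.
    """
    edge_weights = Counter()
    for languages in user_languages.values():
        # Deduplicate and sort so each pair has a single canonical key.
        for pair in combinations(sorted(set(languages)), 2):
            edge_weights[pair] += 1
    return edge_weights


# Hypothetical toy data, not taken from the study.
users = {
    "user_a": ["hi", "en"],        # tweets in Hindi and English
    "user_b": ["en", "es"],
    "user_c": ["en", "hi"],
    "user_d": ["nl", "en", "de"],  # a trilingual user adds three links
    "user_e": ["zh"],              # a monolingual user adds no links
}

network = build_cooccurrence_network(users)
for (lang_a, lang_b), weight in sorted(network.items()):
    print(f"{lang_a} - {lang_b}: {weight}")
```

The book translation network differs in that its links have a direction (from the original language to the target language), so a sketch of it would count ordered pairs rather than unordered ones.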
In all three networks, English has the most transmissions to and from other languages and is the most central hub, the team reports online today in the Proceedings of the National Academy of Sciences. But the maps also reveal “a halo of intermediate hubs,” according to the paper, such as French, German, and Russian, which serve the same function at a different scale.
In contrast, some languages with large populations of speakers, such as Mandarin, Hindi, and Arabic, are relatively isolated in these networks. This means that fewer communications in those languages reach speakers of other languages. Meanwhile, a language like Dutch—spoken by 27 million people—can be a disproportionately large conduit, compared with a language like Arabic, which has a whopping 530 million native and second-language speakers. This is because the Dutch are very multilingual and very online.
The network maps show what is already widely known: If you want to get your ideas out, you can reach a lot of people through the English language. But the maps also show how speakers in disparate languages benefit from being indirectly linked through hub languages large and small. On Twitter, for example, ideas in Filipino can theoretically move to the Korean-speaking sphere through Malay, whereas the most likely path for ideas to go from Turkish to Malayalam (spoken in India by 35 million people) is through English. These networks are revealed in detail at the study’s website.
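The idea of indirect links through hub languages can be illustrated with a small path search. The Python sketch below uses breadth-first search to find the route with the fewest hops between two languages. It is a simplification under stated assumptions: the article speaks of the "most likely" path, which would depend on link weights that are ignored here, and the handful of links in `toy_links` is a hypothetical miniature built from the two examples in the paragraph above, not the study's data.

```python
from collections import deque


def shortest_path(links, source, target):
    """Return the fewest-hop path between two languages, or None if none exists.

    links is an iterable of (language, language) pairs, treated as undirected.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Sort neighbors so results are deterministic.
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical miniature network based on the article's two examples.
toy_links = [
    ("Filipino", "Malay"),
    ("Malay", "Korean"),
    ("Turkish", "English"),
    ("English", "Malayalam"),
    ("English", "Malay"),
]

print(shortest_path(toy_links, "Filipino", "Korean"))
print(shortest_path(toy_links, "Turkish", "Malayalam"))
```

A weighted search (for example, one that prefers thicker links) would be closer to the "most likely path" the researchers describe, but it would need the actual connection counts.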
The authors note that the users they studied, whom they consider elite because—unlike most people in the world—they are literate and online, do not represent all the speakers of a language. However, “the elites of global languages have a disproportionate amount of power and responsibility, because they are tacitly shaping the way in which distant cultures see each other—even if this is not their goal,” Hidalgo says. When conflict in Ukraine flared this past summer, most people in the world learned about it through news stories originally written in English and then translated to other languages. In this case, “any implicit bias or angle taken by the English media will color the information about the conflict that is available to many non-English speakers,” Hidalgo says.
The networks potentially offer guidance to governments and other language communities that want to change their international role. “If I want my national language to be more prominent, then I should invest in translating more documents, encouraging more people to tweet in their national language,” Ronen says. “On the other side, if I want our ideas to spread, we should pick a second language that’s very well connected.”
For non-English speakers, the choice of English as second or third language is an obvious one. For English speakers, the analysis suggests it would be more advantageous to choose Spanish over Chinese—at least if they’re spreading their ideas through writing.
The problem of measuring the relative status of the world’s languages “is a very tricky one, and often very hard to get good data about,” says Mark Davis, the president and co-founder of the Unicode Consortium in Mountain View, California, which does character encoding for the world’s computers and mobile devices. “Their perspective on the problem is interesting and useful.”
Cultural transmission happens in spoken language too, points out William Rivers, the executive director of the nonprofit Joint National Committee for Languages and the National Council for Languages and International Studies in Garrett Park, Maryland. Data on interactions in, say, the souks of Marrakech, where people speak Arabic, Hassaniya, Moroccan Arabic, French, Tashelhit, and other languages, are impossible to get but important in cultural transmission, he says. He adds that “as the Internet has become more available to more people around the world, they go online in their own languages.” When they do, they now know how to connect to other languages and move their ideas, too.