New Twitter Study Discovers “Global Superdialects”

Do you say sneakers, gym shoes, or trainers? Soda, pop, or fizzy drink? Your choice has a lot to do with where you’re from. Certain terms vary by region, and it should be possible to get a good picture of regional differences in vocabulary by searching for these terms on Twitter and plotting where they come from using geolocation data.

As MIT Technology Review reports, a new study did just that for variable terms in Spanish. As expected, terms known to distinguish various dialects of Spanish mapped well, in tweets, to the areas they are commonly associated with. For example, the map above shows that a computer is called a computadora in Mexico, an ordenador in Spain and a computador in Chile. The different terms for car—auto, carro, coche, concho, and movi—are also mapped. The size of the dots corresponds to the number of tweets with that term.
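The counting step behind a map like this can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the `(place, text)` tweet records, the `sample_tweets` list, and the function names are all hypothetical, and real geolocation data would come as coordinates rather than country names.

```python
import re
from collections import Counter, defaultdict

# Variants for the concept "computer", taken from the example in the article.
VARIANTS = {"computadora", "ordenador", "computador"}

def count_variants_by_place(tweets, variants=VARIANTS):
    """Return {place: Counter({variant: number_of_tweets})}.

    `tweets` is an iterable of (place, text) pairs; this record shape is an
    assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for place, text in tweets:
        # Lowercase and split into word tokens (\w also matches accented letters).
        words = set(re.findall(r"\w+", text.lower()))
        for variant in variants & words:
            counts[place][variant] += 1
    return dict(counts)

def dominant_variant(counts):
    """Pick the most frequent variant in each place."""
    return {place: c.most_common(1)[0][0] for place, c in counts.items()}

# Made-up sample data, for illustration only.
sample_tweets = [
    ("Mexico", "Mi computadora nueva es muy rápida"),
    ("Mexico", "Se descompuso la computadora"),
    ("Spain", "He comprado un ordenador"),
    ("Spain", "Mi ordenador va lento"),
    ("Chile", "Necesito un computador para la clase"),
]

counts = count_variants_by_place(sample_tweets)
print(dominant_variant(counts))
```

The per-place tweet counts play the role of the dot sizes on the map: the more tweets that use a term in a place, the larger the dot.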

But researchers Bruno Gonçalves and David Sánchez also found something unexpected when they combined the data for all the words. The tweets fell into two main dialect groups, and these were divided not by region but by population density. There were two “superdialects”: one in dense urban centers, and another in smaller towns and rural areas. The rural areas “keep a larger number of characteristic items and native words,” while cities, more subject to the forces of globalization, tend toward “dialect unification, smoothing possible lexical differences.” The urban superdialect is a less differentiated, international Spanish, while the rural superdialect is more varied and less subject to international leveling, even though everyone in the study is using Twitter.
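One way to picture how places can be grouped by vocabulary rather than by geography is to describe each place by how often it uses each variant and then let a clustering routine split the places into two groups. The article does not describe the researchers' procedure, so the sketch below is an assumption throughout: it uses a small k-means routine as a stand-in, and the place names and usage shares are invented toy values.

```python
import math

def kmeans(points, k=2, iterations=20):
    """Tiny k-means. `points` maps a place name to a list of usage shares."""
    names = sorted(points)
    # Farthest-first start: begin with one point, then repeatedly add the
    # point that lies farthest from the centroids chosen so far.
    centroids = [list(points[names[0]])]
    while len(centroids) < k:
        farthest = max(
            names,
            key=lambda n: min(math.dist(points[n], c) for c in centroids),
        )
        centroids.append(list(points[farthest]))

    assignment = {}
    for _ in range(iterations):
        # Assign every place to its nearest centroid.
        assignment = {
            n: min(range(k), key=lambda i: math.dist(points[n], centroids[i]))
            for n in names
        }
        # Move each centroid to the mean of the places assigned to it.
        for i in range(k):
            members = [points[n] for n in names if assignment[n] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Invented toy data: share of tweets using
# [computadora, ordenador, carro, coche] in four hypothetical places.
usage = {
    "city_A": [0.50, 0.50, 0.50, 0.50],
    "city_B": [0.55, 0.45, 0.50, 0.50],
    "town_C": [1.00, 0.00, 1.00, 0.00],
    "town_D": [0.90, 0.10, 0.95, 0.05],
}

groups = kmeans(usage, k=2)
print(groups)
```

In this toy data the two cities mix variants evenly while the two towns each stick to one form, so the routine places the cities in one group and the towns in the other regardless of where they sit on a map.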

We speak differently not just because we live in different places, but because we live differently. This is something sociolinguists have known for a long time. Advances in techniques for analyzing the huge amount of language data on Twitter offer new ways to look at how our lives influence our language.

The original paper is here.