4 Ideas From Linguistics to Help You Appreciate Arrival

Spoiler Warning: If you haven't seen Arrival and plan to soon, you might want to save this article for after.

The most exciting thing about Denis Villeneuve’s new sci-fi space-encounter movie isn’t the aliens or the spaceships or the worldwide panic they bring on. It’s the fact that the hero is a linguistics professor!

It’s nice to feel that your seemingly esoteric field is actually the key to saving humankind. Even better if a film about it can get more people interested in the science of language structure. The film’s linguist, Louise Banks, played by Amy Adams, is charged with figuring out the language of the aliens who have landed on Earth. She needs to do this in order to find out what they want.

How would one go about decoding a language that nobody knows? Field linguists—those who go out into the world to analyze little-known languages—have developed techniques for doing this kind of thing. The filmmakers consulted with McGill University linguist Jessica Coon, who herself has worked in the field on native languages of Mexico and Canada.

The problem of interpreting an unfamiliar language becomes a lot harder when dealing with creatures that don’t share our human bodies or articulators, much less a common frame of reality or physical environment, but that’s no reason not to start with the basics of linguistic communication that we do have a handle on. Here are four important concepts from linguistics that help Dr. Banks do the job she needs to do in Arrival.


At one point Colonel Weber (played by Forest Whitaker) asks Dr. Banks why she’s wasting time with a list of simple words like eat and walk when their priority is to find out what the purpose of the aliens’ visit is. A good field linguist knows you can’t just jump to abstract concepts like purpose without establishing the basics first. But what are the basics?

For decades, linguists have used variations on the Swadesh list, a list of basic concepts first put together in the 1950s by linguist Morris Swadesh. They include concepts like I and you, one and many, as well as objects and actions in the observable world like person, blood, fire, eat, sleep, and walk. They were chosen to be as universal as possible, and they can be indicated by pointing or pantomime or pictures, which makes it possible to ask for their words before proper linguistic question-asking has been figured out. Though the movie’s heptapods likely don’t share most of our universal, earth-bound concepts, it’s as good a place to start as any.


It might seem that the most important question to focus on when trying to analyze an unknown language is "what does this mean?" For a linguist, however, the most important question is "what are the units?" That is not because meaning is unimportant, but because, while you can have meaning without language, you cannot have language without units. A sigh is meaningful, but not linguistic. It is not composed of discrete units; it conveys an overall feel.

The concept of discreteness is one of the basic design features of human language. Linguistic utterances are patterns of combinations of smaller, meaningless units (sounds, or in the movie’s case, parts of ink blots) that reoccur in other utterances in different combinations with different meanings. When Dr. Banks sits down to analyze the circular ink blots the heptapods have thrown out, she marks up specific parts of them. She is not viewing them as analog, holistic pictures of meaning, but as compositions of parts, and she expects those parts to occur in other ink blots.


The concept of the minimal pair is crucial for figuring out what the units of a specific language are. An English speaker will say that car, whether it's pronounced with a regular r or a rolled r, means the same thing (even if the rolled r sounds a bit strange). A Spanish speaker will say that caro means something different with a rolled r (caro "expensive" vs. carro "car"). The rolled r in English is just a different pronunciation of the same unit. In Spanish, it’s a different unit.

A minimal pair is a pair of words that differ in meaning because one sound has changed. The existence of a minimal pair shows that the differing sound is a crucial element of the language’s structure. In one scene in the movie, Dr. Banks notes that two ink blots are exactly the same except for a little hook on the end. That’s how she knows the hook does something important. With that knowledge, she can put it in the known inventory of units for heptapod, and look for it in other utterances.
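The minimal-pair test can be sketched in a few lines of Python. This is only an illustration, assuming the utterances have already been segmented into sequences of units; the toy lexicon, its unit symbols, and the function name `minimal_pairs` are made up for the example, and real fieldwork is far messier.

```python
from itertools import combinations


def minimal_pairs(lexicon):
    """Find entries that differ in exactly one unit and also differ in meaning.

    `lexicon` maps a tuple of units (for example, sounds) to a gloss.
    Returns a list of (form_a, form_b, (unit_a, unit_b)) triples.
    """
    pairs = []
    for (a, gloss_a), (b, gloss_b) in combinations(lexicon.items(), 2):
        # Only same-length forms with different meanings can qualify.
        if len(a) != len(b) or gloss_a == gloss_b:
            continue
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            pairs.append((a, b, diffs[0]))
    return pairs


# Hypothetical toy data: Spanish tap "r" versus rolled "rr".
lexicon = {
    ("k", "a", "r", "o"): "expensive",
    ("k", "a", "rr", "o"): "car",
    ("k", "a", "s", "a"): "house",
}

for form_a, form_b, contrast in minimal_pairs(lexicon):
    print("".join(form_a), "vs.", "".join(form_b), "contrast:", contrast)
```

Only caro and carro qualify here, because casa differs from each of them in two positions. The single contrasting unit is what a linguist would add to the inventory and then look for in other utterances.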


The linguistic current running through the heart of the movie is a version of what’s come to be known as the Sapir-Whorf hypothesis, most simply explained as the idea that the language you speak influences the way you think. This idea is controversial, since it has been demonstrated that languages do not restrict or constrain what people are able to perceive. However, a milder version of the theory holds that language can lay down default ways of categorizing experience that are easily shaken off if required.

We see the extreme version of Sapir-Whorf played out in the way that the perceptive abilities of Dr. Banks are completely transformed by the act of her learning the heptapod language. Her conception of time is altered by language.

The origins of the Sapir-Whorf hypothesis trace back to an analysis by Benjamin Whorf of the concept of time in the Native American language Hopi. He argued that where the linguistic devices of European languages express time as a continuum from past to present to future, with time units like days, weeks, and years conceived of as objects, the Hopi language distinguishes only between the experienced and the not experienced, and does not conceive of stretches of time as objects. There are no days in Hopi, only the return of the sun.

Whorf’s analysis has been challenged by later Hopi scholars, but it is clear that the language does handle the idea of linguistic tense in a way that is difficult to grasp for speakers of European languages. Assuming that that means we live in a different reality with respect to time is taking things way too far. But who ever said the world of fiction wasn’t allowed to take things too far?

If you find the real ideas behind the movie intriguing, or just want to get more familiar with the exciting world of linguist-heroes, check out this collection of real-world resources listed by Gretchen McCulloch.

iStock // Ekaterina Minaeva
Man Buys Two Metric Tons of LEGO Bricks; Sorts Them Via Machine Learning

Jacques Mattheij made a small, but awesome, mistake. He went on eBay one evening and bid on a bunch of bulk LEGO brick auctions, then went to sleep. Upon waking, he discovered that he was the high bidder on many, and was now the proud owner of two tons of LEGO bricks. (This is about 4400 pounds.) He wrote, "[L]esson 1: if you win almost all bids you are bidding too high."

Mattheij had noticed that bulk, unsorted bricks sell for something like €10/kilogram, whereas sets are roughly €40/kg and rare parts go for up to €100/kg. Much of the value of the bricks is in their sorting. If he could reduce the entropy of these bins of unsorted bricks, he could make a tidy profit. While many people do this work by hand, the problem is enormous—just the kind of challenge for a computer. Mattheij writes:

There are 38000+ shapes and there are 100+ possible shades of color (you can roughly tell how old someone is by asking them what lego colors they remember from their youth).
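For a rough sense of scale, the figures quoted above can be run through some back-of-the-envelope arithmetic. The sketch below assumes, unrealistically, that the entire two-ton lot could be sold at each quoted rate, and it treats the "38000+" and "100+" counts as exact. The variable names are illustrative only.

```python
KG_PER_METRIC_TON = 1000
LB_PER_KG = 2.20462  # pounds per kilogram, rounded

mass_kg = 2 * KG_PER_METRIC_TON
mass_lb = mass_kg * LB_PER_KG

# Approximate prices quoted in the article, in euros per kilogram.
# The rare-parts figure is an "up to" price, so it is a ceiling.
price_eur_per_kg = {"unsorted": 10, "sets": 40, "rare parts": 100}

# Hypothetical value of the whole lot if all of it sold at each rate.
value_eur = {kind: mass_kg * price for kind, price in price_eur_per_kg.items()}

# Naive product of the quoted counts; not every shape exists in every color.
shape_color_combinations = 38_000 * 100

print(f"{mass_kg} kg is about {mass_lb:.0f} lb")
for kind, value in value_eur.items():
    print(f"{kind}: EUR {value:,}")
print(f"roughly {shape_color_combinations:,} shape/color combinations")
```

Even under these crude assumptions, the gap between the unsorted and sorted rates shows why the sorting itself carries most of the value.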

In the following months, Mattheij built a proof-of-concept sorting system using, of course, LEGO. He broke the problem down into a series of sub-problems (including "feeding LEGO reliably from a hopper is surprisingly hard," one of those facts of nature that will stymie even the best system design). After tinkering with the prototype at length, he expanded it into a surprisingly complex arrangement of conveyer belts (powered by a home treadmill), various pieces of cabinetry, and "copious quantities of crazy glue."

Here's a video showing the current system running at low speed:

The key part of the system was running the bricks past a camera paired with a computer running a neural net-based image classifier. That allows the computer (when sufficiently trained on brick images) to recognize bricks and thus categorize them by color, shape, or other parameters. Remember that as bricks pass by, they can be in any orientation, can be dirty, can even be stuck to other pieces. So having a flexible software system is key to recognizing—in a fraction of a second—what a given brick is, in order to sort it out. When a match is found, a jet of compressed air pops the piece off the conveyer belt and into a waiting bin.

After much experimentation, Mattheij rewrote the software (several times in fact) to accomplish a variety of basic tasks. At its core, the system takes images from a webcam and feeds them to a neural network to do the classification. Of course, the neural net needs to be "trained" by showing it lots of images, and telling it what those images represent. Mattheij's breakthrough was allowing the machine to effectively train itself, with guidance: Running pieces through allows the system to take its own photos, make a guess, and build on that guess. As long as Mattheij corrects the incorrect guesses, he ends up with a decent (and self-reinforcing) corpus of training data. As the machine continues running, it can rack up more training, allowing it to recognize a broad variety of pieces on the fly.
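The guess-then-correct loop described above can be sketched as follows. This is not Mattheij's code, which the article does not show. The `guess` and `review` callables are hypothetical stand-ins for the neural network and the human reviewer, and the image names and labels are invented.

```python
def label_with_review(images, guess, review):
    """Build labeled training data from machine guesses plus human corrections.

    guess(image) returns the classifier's predicted label.
    review(image, predicted) returns the label the human accepts, which
    equals `predicted` whenever the guess was right.
    Returns (labeled_examples, number_of_corrections).
    """
    labeled = []
    corrections = 0
    for image in images:
        predicted = guess(image)
        final = review(image, predicted)
        if final != predicted:
            corrections += 1  # the human had to step in
        labeled.append((image, final))
    return labeled, corrections


# Toy stand-ins: a "classifier" that always says brick_2x4, and a
# reviewer who looks up the true label.
truth = {"img1": "brick_2x4", "img2": "plate_1x2", "img3": "brick_2x4"}


def toy_guess(image):
    return "brick_2x4"


def toy_review(image, predicted):
    return truth[image]


labeled, corrections = label_with_review(list(truth), toy_guess, toy_review)
print(labeled)
print("corrections needed:", corrections)
```

In a real system the labeled examples would be fed back into training, so the share of guesses needing correction should shrink over successive batches.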

Here's another video, focusing on how the pieces move on conveyer belts (running at slow speed so puny humans can follow). You can also see the air jets in action:

In an email interview, Mattheij told Mental Floss that the system currently sorts LEGO bricks into more than 50 categories. It can also be run in a color-sorting mode to bin the parts across 12 color groups. (Thus at present you'd likely do a two-pass sort on the bricks: once for shape, then a separate pass for color.) He continues to refine the system, with a focus on making its recognition abilities faster. At some point down the line, he plans to make the software portion open source. You're on your own as far as building conveyer belts, bins, and so forth.
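The two-pass idea (shape first, then color) amounts to binning the same parts twice with different keys. A minimal sketch follows, assuming each part has already been recognized and is represented as a dictionary with hypothetical `shape` and `color` fields.

```python
from collections import defaultdict


def sort_pass(parts, key):
    """Run one pass: drop each part into the bin chosen by `key`."""
    bins = defaultdict(list)
    for part in parts:
        bins[key(part)].append(part)
    return dict(bins)


def two_pass_sort(parts):
    """First bin by shape, then re-run each shape bin through a color pass."""
    by_shape = sort_pass(parts, key=lambda p: p["shape"])
    return {
        shape: sort_pass(group, key=lambda p: p["color"])
        for shape, group in by_shape.items()
    }


# Invented sample parts for illustration.
parts = [
    {"shape": "brick_2x4", "color": "red"},
    {"shape": "plate_1x2", "color": "blue"},
    {"shape": "brick_2x4", "color": "blue"},
    {"shape": "brick_2x4", "color": "red"},
]

bins = two_pass_sort(parts)
for shape, colors in bins.items():
    for color, group in colors.items():
        print(shape, color, len(group))
```

Reusing one pass function with a different key mirrors running the same machine in a different mode rather than building a second one.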

Check out Mattheij's writeup in two parts for more information. It starts with an overview of the story, followed up with a deep dive on the software. He's also tweeting about the project (among other things). And if you look around a bit, you'll find bulk LEGO brick auctions online—it's definitely a thing!

One Bite From This Tick Can Make You Allergic to Meat

We like to believe that there’s no such thing as a bad organism, that every creature must have its place in the world. But ticks are really making that difficult. As if Lyme disease wasn't bad enough, scientists say some ticks carry a pathogen that causes a sudden and dangerous allergy to meat. Yes, meat.

The Lone Star tick (Amblyomma americanum) mostly looks like your average tick, with a tiny head and a big fat behind, except the adult female has a Texas-shaped spot on its back—thus the name.

Unlike other American ticks, the Lone Star feeds on humans at every stage of its life cycle. Even the larvae want our blood. You can’t get Lyme disease from the Lone Star tick, but you can get something even more mysterious: the inability to safely consume a bacon cheeseburger.

"The weird thing about [this reaction] is it can occur within three to 10 or 12 hours, so patients have no idea what prompted their allergic reactions," allergist Ronald Saff, of the Florida State University College of Medicine, told Business Insider.

What prompted them was STARI, or southern tick-associated rash illness. People with STARI may develop a circular rash like the one commonly seen in Lyme disease. They may feel achy, fatigued, and fevered. And their next meal could make them very, very sick.

Saff now sees at least one patient per week with STARI and a sensitivity to galactose-alpha-1,3-galactose (more commonly known as alpha-gal), a sugar molecule found in mammalian meat such as pork, beef, and lamb. Several hours after eating, patients’ immune systems overreact to alpha-gal, with symptoms ranging from an itchy rash to throat swelling.

Even worse, the more times a person is bitten, the more likely it becomes that they will develop this dangerous allergy.

The tick’s range currently covers the southern, eastern, and south-central U.S., but even that is changing. "We expect with warming temperatures, the tick is going to slowly make its way northward and westward and cause more problems than they're already causing," Saff said. We've already seen that occur with the deer ticks that cause Lyme disease, and 2017 is projected to be an especially bad year.

There’s so much we don’t understand about alpha-gal sensitivity. Scientists don’t know why it happens, how to treat it, or if it's permanent. All they can do is advise us to be vigilant and follow basic tick-avoidance practices.

[h/t Business Insider]