5 Behind-the-Scenes Secrets of How the Dictionary is Made

The dictionary is such a comforting, authoritative presence in our culture that it sometimes feels as if it’s always existed. But a dictionary must be made. How does a dictionary come to be? The answer is also the title of a new book by Merriam-Webster editor Kory Stamper: Word by Word.

Word by Word: The Secret Life of Dictionaries is a captivating inside look at the job of dictionary making from both a personal and historical angle. Stamper begins with the interview that landed her an entry-level position at America’s oldest dictionary maker and takes the reader along as she learns the ropes and works her way up, the process sometimes turning her original love of words on its head in ways she didn’t expect. Here are five behind-the-scenes secrets she shares about how the dictionary is made.


Dictionary entries require a million editorial decisions to be made about everything from font size and part-of-speech abbreviations to how to structure a definition. It can take a while to get comfortable with a set of guidelines, so it’s better to start with H or one of the other letters toward the middle of the alphabet that doesn’t have as many words attached to it. If you work to the end from the middle, then tackle the beginning of the alphabet last, those letters will be “as close to stylistic perfection as possible,” Stamper writes, and it’s good to have the cleanest copy up front because “back in the days of yore when dictionaries were actually reviewed, reviewers would inevitably start looking at definitions in the first chunk of the book.”


While people tend to go to the dictionary to look up long, technical, or academic words, those aren’t particularly difficult for the definers. It’s the little words, “the ones that no one ever notices,” according to Stamper, that are the hardest to sort out. What do “but,” “like,” and “as” mean? What parts of speech are they? These are the questions that chain lexicographers to their desks in mental agony. Once you read about the whole exhausting month Stamper toiled on one little word, “take,” you will never take the little words for granted again.


It will also change the way you see matchbooks, shampoo bottles, barf bags, and anything else with text on it. Dictionaries must keep up with the language, and lexicographers are trained to notice not just new words, but emerging new uses of old words. They don't just sit around looking through great literature and scientific journals; every piece of text in the culture is another bit of evidence to consider. What sort of uses are “quick” and “cook” in the phrase “quick cook steel cut oats”? Evolving new uses? A good lexicographer cannot resist the impulse to clip out and file away the relevant piece of the oatmeal canister on which the phrase appears for future reference.


While anyone setting out to be a dictionary editor would benefit from having a firm grounding in grammar and etymology, it's not a requirement: Editors come from many backgrounds and fields not necessarily related to the study of language. What is a requirement is a quality so elusive yet specific that we had to borrow a German word for it. Sprachgefühl is a “feeling for language” or, Stamper writes,

"the odd buzzing in your brain that tells you that ‘planting the lettuce’ and ‘planting misinformation’ are different uses of ‘plant,’ the eye twitch that tells you that ‘plans to demo the store’ refers not to a friendly instructional stroll on how to shop but to a little exuberance with a sledgehammer. Not everyone has sprachgefühl, and you don’t know if you are possessed of it until you are knee-deep in the swamp of it."

Sprachgefühl can abandon even the best editors at times, putting them in an odd, confused state of “verbal fatigue” where they are driven to take a break and go searching for another human to confirm that they do, indeed, speak English.


A dictionary entry doesn’t just include a concise definition and information about pronunciation and grammar; it also gives example sentences that show how the word is typically used. Finding just the right sentence is an art, and much more difficult than you might think. In addition to finding the best, most straightforwardly boring illustration of a particular meaning, you must also be able to access your inner 12-year-old in order to detect any hint of double entendre. Examples like “I think we should do it!” or “That’s a big one” must be purged; to know that, you must view them through your naughtiest filter. Mistakes do happen sometimes. An example sentence at “cut” in the Merriam-Webster middle-school dictionary reads “Cheese cuts easily.” Stamper looks on the bright side: “I hope that it has given much joy to countless fart-obsessed middle schoolers and perhaps even convinced them that dictionaries are, if not cool, at least not boring and stupid.”

Find out more about the very cool, very not boring inside world of dictionary making in Word by Word.

iStock // Ekaterina Minaeva
Man Buys Two Metric Tons of LEGO Bricks; Sorts Them Via Machine Learning

Jacques Mattheij made a small, but awesome, mistake. He went on eBay one evening and bid on a bunch of bulk LEGO brick auctions, then went to sleep. Upon waking, he discovered that he was the high bidder on many, and was now the proud owner of two metric tons of LEGO bricks. (This is about 4,400 pounds.) He wrote, "[L]esson 1: if you win almost all bids you are bidding too high."

Mattheij had noticed that bulk, unsorted bricks sell for something like €10/kilogram, whereas sets are roughly €40/kg and rare parts go for up to €100/kg. Much of the value of the bricks is in their sorting. If he could reduce the entropy of these bins of unsorted bricks, he could make a tidy profit. While many people do this work by hand, the problem is enormous—just the kind of challenge for a computer. Mattheij writes:

There are 38000+ shapes and there are 100+ possible shades of color (you can roughly tell how old someone is by asking them what lego colors they remember from their youth).
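
To put rough numbers on that, here is a back-of-the-envelope sketch in Python that uses only the figures quoted above. It assumes, unrealistically, that an entire two-ton lot could be sold at a single per-kilogram price and that every shape could come in every color. The function names are made up for illustration.

```python
# Approximate prices quoted in the article, in euros per kilogram.
PRICE_PER_KG = {"bulk": 10, "sets": 40, "rare": 100}

# Two metric tons, expressed in kilograms.
LOT_KG = 2000


def lot_value(category: str, kg: float = LOT_KG) -> float:
    """Value of `kg` kilograms if all of it sold at one category's price."""
    return PRICE_PER_KG[category] * kg


def label_space(shapes: int = 38_000, colors: int = 100) -> int:
    """Upper bound on distinct shape/color combinations (assumes all exist)."""
    return shapes * colors


if __name__ == "__main__":
    for category in PRICE_PER_KG:
        print(f"{category}: about {lot_value(category):,.0f} euros")
    print(f"shape/color combinations: up to {label_space():,}")
```

Even under these crude assumptions, the gap between the unsorted and sorted prices is what makes the effort of sorting worthwhile.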

In the following months, Mattheij built a proof-of-concept sorting system using, of course, LEGO. He broke the problem down into a series of sub-problems (including "feeding LEGO reliably from a hopper is surprisingly hard," one of those facts of nature that will stymie even the best system design). After tinkering with the prototype at length, he expanded it into a surprisingly complex arrangement of conveyer belts (powered by a home treadmill), various pieces of cabinetry, and "copious quantities of crazy glue."

Here's a video showing the current system running at low speed:

The key part of the system was running the bricks past a camera paired with a computer running a neural net-based image classifier. That allows the computer (when sufficiently trained on brick images) to recognize bricks and thus categorize them by color, shape, or other parameters. Remember that as bricks pass by, they can be in any orientation, can be dirty, can even be stuck to other pieces. So having a flexible software system is key to recognizing—in a fraction of a second—what a given brick is, in order to sort it out. When a match is found, a jet of compressed air pops the piece off the conveyer belt and into a waiting bin.

After much experimentation, Mattheij rewrote the software (several times in fact) to accomplish a variety of basic tasks. At its core, the system takes images from a webcam and feeds them to a neural network to do the classification. Of course, the neural net needs to be "trained" by showing it lots of images, and telling it what those images represent. Mattheij's breakthrough was allowing the machine to effectively train itself, with guidance: Running pieces through allows the system to take its own photos, make a guess, and build on that guess. As long as Mattheij corrects the incorrect guesses, he ends up with a decent (and self-reinforcing) corpus of training data. As the machine continues running, it can rack up more training, allowing it to recognize a broad variety of pieces on the fly.
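
The loop described above can be sketched in a few lines of Python. This is not Mattheij's code: the neural network is replaced by a toy classifier that simply memorizes labels, and the names `MemorizingClassifier`, `run_batch`, and `review` are hypothetical. It only illustrates the idea of guessing, correcting the wrong guesses, and feeding the result back in as training data.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Example = Tuple[Hashable, str]


class MemorizingClassifier:
    """Toy stand-in for a neural net: remembers the last label seen per feature."""

    def __init__(self, default: str = "unknown") -> None:
        self.default = default
        self.memory: Dict[Hashable, str] = {}

    def predict(self, feature: Hashable) -> str:
        return self.memory.get(feature, self.default)

    def train(self, examples: Iterable[Example]) -> None:
        for feature, label in examples:
            self.memory[feature] = label


def run_batch(
    model: MemorizingClassifier,
    features: Iterable[Hashable],
    review: Callable[[Hashable, str], Optional[str]],
) -> Tuple[List[Example], int]:
    """Guess a label for each feature, apply human corrections, then retrain.

    `review` returns a corrected label, or None to accept the guess.
    Returns the labeled examples and the number of corrections made.
    """
    corpus: List[Example] = []
    corrections = 0
    for feature in features:
        guess = model.predict(feature)
        fixed = review(feature, guess)
        label = guess if fixed is None else fixed
        if label != guess:
            corrections += 1
        corpus.append((feature, label))
    # The reviewed batch becomes training data for the next run.
    model.train(corpus)
    return corpus, corrections
```

In this toy version, a second batch containing the same features needs no corrections, which is the self-reinforcing effect the article describes, although a real image classifier would improve far more gradually.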

Here's another video, focusing on how the pieces move on conveyer belts (running at slow speed so puny humans can follow). You can also see the air jets in action:

In an email interview, Mattheij told Mental Floss that the system currently sorts LEGO bricks into more than 50 categories. It can also be run in a color-sorting mode to bin the parts across 12 color groups. (Thus at present you'd likely do a two-pass sort on the bricks: once for shape, then a separate pass for color.) He continues to refine the system, with a focus on making its recognition abilities faster. At some point down the line, he plans to make the software portion open source. You're on your own as far as building conveyer belts, bins, and so forth.
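
As a rough illustration of that two-pass idea, the Python sketch below bins pieces by shape and then bins each shape group by color. The `Piece` type and the category names are invented for the example; the real machine works on camera images, not tidy labels.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Piece(NamedTuple):
    shape: str  # hypothetical shape category
    color: str  # hypothetical color group


def sort_pass(pieces: Iterable[Piece], key: str) -> Dict[str, List[Piece]]:
    """One trip down the belt: drop each piece into the bin named by `key`."""
    bins: Dict[str, List[Piece]] = defaultdict(list)
    for piece in pieces:
        bins[getattr(piece, key)].append(piece)
    return dict(bins)


def two_pass_sort(pieces: Iterable[Piece]) -> Dict[str, Dict[str, List[Piece]]]:
    """First pass sorts by shape; second pass sorts each shape bin by color."""
    by_shape = sort_pass(pieces, "shape")
    return {shape: sort_pass(group, "color") for shape, group in by_shape.items()}


if __name__ == "__main__":
    mixed = [
        Piece("brick", "red"),
        Piece("plate", "blue"),
        Piece("brick", "blue"),
        Piece("brick", "red"),
    ]
    for shape, colors in two_pass_sort(mixed).items():
        for color, group in colors.items():
            print(shape, color, len(group))
```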

Check out Mattheij's writeup in two parts for more information. It starts with an overview of the story, followed up with a deep dive on the software. He's also tweeting about the project (among other things). And if you look around a bit, you'll find bulk LEGO brick auctions online—it's definitely a thing!

Cs California, Wikimedia Commons // CC BY-SA 3.0
How Experts Say We Should Stop a 'Zombie' Infection: Kill It With Fire

Scientists are known for being pretty cautious people. But sometimes, even the most careful of us need to burn some things to the ground. Immunologists have proposed a plan to burn large swaths of parkland in an attempt to wipe out disease, as The New York Times reports. They described the problem in the journal Microbiology and Molecular Biology Reviews.

Chronic wasting disease (CWD) is a gruesome infection that’s been destroying deer and elk herds across North America. Like bovine spongiform encephalopathy (BSE, better known as mad cow disease) and Creutzfeldt-Jakob disease, CWD is caused by misfolded, contagious little proteins called prions. Although it's been half a century since CWD was first discovered, scientists are still scratching their heads about how it works, how it spreads, and whether, like BSE, it could someday infect humans.

Paper co-author Mark Zabel, of the Prion Research Center at Colorado State University, says animals with CWD fade away slowly at first, losing weight and starting to act kind of spacey. But "they’re not hard to pick out at the end stage," he told The New York Times. "They have a vacant stare, they have a stumbling gait, their heads are drooping, their ears are down, you can see thick saliva dripping from their mouths. It’s like a true zombie disease."

CWD has already been spotted in 24 U.S. states. Some herds are already 50 percent infected, and that number is only growing.

Prion illnesses often travel from one infected individual to another, but CWD’s expansion was so rapid that scientists began to suspect it had more than one way of finding new animals to attack.

Sure enough, it did. As it turns out, the CWD prion doesn’t go down with its host-animal ship. Infected animals shed the prion in their urine, feces, and drool. Long after the sick deer has died, others can still contract CWD from the leaves they eat and the grass in which they stand.

As if that’s not bad enough, CWD has another trick up its sleeve: spontaneous generation. That is, it doesn’t take much damage to twist a healthy prion into a zombifying pathogen. The illness just pops up.

There are some treatments, including immersing infected tissue in an ozone bath. But that won't help when the problem is literally smeared across the landscape. "You cannot treat half of the continental United States with ozone," Zabel said.

And so, to combat this many-pronged assault on our wildlife, Zabel and his colleagues are getting aggressive. They recommend a controlled burn of infected areas of national parks in Colorado and Arkansas—a pilot study to determine if fire will be enough.

"If you eliminate the plants that have prions on the surface, that would be a huge step forward," he said. "I really don’t think it’s that crazy."

[h/t The New York Times]