Why the Pronunciation of GIF Really Can Go Either Way

Image: iStock

How is language evolving on the internet? In this series on internet linguistics, Gretchen McCulloch breaks down the latest innovations in online communication.

Do you pronounce gif with a hard g as in get or with a soft g as in gem? It's a question that people won't stop arguing about on the internet.

But why are we so confused? Why is each camp so passionate about being right? And is there in fact a right way to pronounce gif?

Sure, the creator of the gif, Steve Wilhite, prefers a soft g, and sure, gif originated as an acronym for graphics interchange format, but inventors aren't always good at naming (the zipper was originally called the "clasp locker"), and acronyms aren't always pronounced like their roots (the "a" in NATO isn't the same as the "a" in Atlantic). In truth, language is far more democratic.

So Michael Dow, a linguistics professor at Université de Montréal, decided to investigate the question a different way, and I talked with him about his findings. The idea is that people decide how to pronounce a new word based on its resemblance to words they're already familiar with. So we can all agree on how to pronounce snapchat because it's made up of the familiar words snap and chat, and we don't have any problems with blog because it rhymes with frog, log, slog, and so on, but we have no idea how to pronounce doge because there aren't any other common English words that end in -oge.

The problem with gif isn't the back half—we already know how to pronounce if. The problem is the front half: Does the i make the g soft or not? It's clearly not an absolute yes or no—there are English words in both categories: gift has a hard g before i, whereas gin has a soft g before i. What matters is the frequency. So Dow looked at a large corpus of 40,000 unique words with their frequency and pronunciation taken from The English Lexicon Project. Of these words, how many were like gift (hard g) and how many were like gin (soft g)?

Dow found 105 words in the corpus that had "gi" somewhere in their spelling, not counting variations on the same word, like gift/gifts or geography/geographical. At first glance, it looks like the gin group wins: there were 68 "gi" words that were pronounced with a soft g as in gin, but only about half as many (37) that were pronounced with a hard g as in gift.

Case closed? Not so fast. Although there are more soft g words, they don't get used as often. The least common word in the entire list was "tergiversate," which I had to look up—it apparently means "make conflicting or evasive statements; equivocate," and it's pronounced with a soft g. Rounding out the bottom eight are some more soft "gi" words you probably don't use every day: "gimcrack," "excogitate," "elegiac," "flibbertigibbet," "corrigible," "gibbet," and "giblet." Hard "gi" words don't show up until the ninth and tenth least common: "muggins" and "girt."

By contrast, the most-used words tend to be pronounced with a hard g: Dow found that hard "gi" words were used around 10,000 times overall in the corpus, whereas soft "gi" words were used only 4,000 times. And three of the five most frequent words have a hard g: "give" (#1), "begin" (#2), and "girl" (#4), compared to the soft g in "magic" (#3) and "engine" (#5). "Give" in particular is extraordinarily common: it's used almost four times as much as the next most common word, "begin."

So to know what expectations we bring to an unfamiliar "gi" word, we need to balance two facts: there are about twice as many soft g words, but we use the hard g words more than twice as often. When Dow ran a calculation based on log frequency, which weighs both factors, the hard g words and the soft g words came out almost exactly the same.

And it doesn't matter what else we take into consideration. Want to compare only words that begin with "gi," to avoid the potential confounds of "magic" or "begin"? Again, when we take all factors into consideration, Dow found that they were the same.

Want to compare only monosyllables, and avoid "giant" or "forgive"? Yep, still the same.
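To see why a log-frequency measure can pull the two camps toward each other, here is a minimal sketch. It is not Dow's actual calculation or data: the word counts below are invented for illustration, and summing log10 of each word's count is just one common way of damping the influence of a few very frequent words.

```python
import math

# Hypothetical (word, corpus count) pairs, invented for illustration only.
# They are NOT taken from the English Lexicon Project.
hard_g = [("give", 4000), ("begin", 1000), ("girl", 900)]
soft_g = [("magic", 950), ("engine", 800), ("giant", 300),
          ("ginger", 100), ("giblet", 2), ("gibbet", 1)]

def type_count(words):
    """How many distinct words are in the group."""
    return len(words)

def token_count(words):
    """How many times the group's words are used in total."""
    return sum(count for _, count in words)

def log_weight(words):
    """Sum of log10(count + 1): each word counts, but rare words count less."""
    return sum(math.log10(count + 1) for _, count in words)

for name, measure in [("types", type_count),
                      ("tokens", token_count),
                      ("log-weighted", log_weight)]:
    ratio = measure(hard_g) / measure(soft_g)
    print(f"{name}: hard/soft ratio = {ratio:.2f}")
```

With these made-up numbers, counting distinct words favors soft g and counting raw uses favors hard g, while the log-weighted ratio lands between the two and closer to an even split.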

In other words, when you see a new word starting with "gi," your previous exposure to "gi" words is basically telling you to flip a coin—it's just as likely that you'll decide to pronounce it with a hard g as with a soft g. And you'll never find an overwhelming enough piece of counter-evidence to get you to change your mind. Which probably means we'll be fighting the gif pronunciation war for generations to come.

Man Buys Two Metric Tons of LEGO Bricks; Sorts Them Via Machine Learning
Image: iStock // Ekaterina Minaeva

Jacques Mattheij made a small, but awesome, mistake. He went on eBay one evening and bid on a bunch of bulk LEGO brick auctions, then went to sleep. Upon waking, he discovered that he was the high bidder on many, and was now the proud owner of two metric tons of LEGO bricks. (That's about 4400 pounds.) He wrote, "[L]esson 1: if you win almost all bids you are bidding too high."

Mattheij had noticed that bulk, unsorted bricks sell for something like €10/kilogram, whereas sets are roughly €40/kg and rare parts go for up to €100/kg. Much of the value of the bricks is in their sorting. If he could reduce the entropy of these bins of unsorted bricks, he could make a tidy profit. While many people do this work by hand, the problem is enormous—just the kind of challenge for a computer. Mattheij writes:

There are 38000+ shapes and there are 100+ possible shades of color (you can roughly tell how old someone is by asking them what lego colors they remember from their youth).
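The arithmetic behind the appeal of sorting, and behind the size of the problem, can be sketched in a few lines. The per-kilogram prices and the shape and color counts are the rough figures quoted above. Treating the entire two-ton haul as sellable at a single price is a simplifying assumption made here, not something Mattheij claims.

```python
# Approximate prices quoted in the article, in euros per kilogram.
BULK_EUR_PER_KG = 10
SETS_EUR_PER_KG = 40
RARE_EUR_PER_KG = 100

def resale_value(kg, eur_per_kg):
    """Value of a pile of bricks if all of it sold at one price (a simplification)."""
    return kg * eur_per_kg

haul_kg = 2000  # two metric tons

bulk_value = resale_value(haul_kg, BULK_EUR_PER_KG)
sets_value = resale_value(haul_kg, SETS_EUR_PER_KG)

# Lower bound on distinct shape/color pairings, from "38000+" and "100+".
shape_color_combinations = 38_000 * 100

print(f"Unsorted bulk: about {bulk_value:,} euros")
print(f"At set prices: about {sets_value:,} euros")
print(f"Shape/color combinations: at least {shape_color_combinations:,}")
```

Even under this crude assumption, the gap between the bulk price and the sorted price is a factor of four, while the number of possible shape and color pairings runs into the millions.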

In the following months, Mattheij built a proof-of-concept sorting system using, of course, LEGO. He broke the problem down into a series of sub-problems (including "feeding LEGO reliably from a hopper is surprisingly hard," one of those facts of nature that will stymie even the best system design). After tinkering with the prototype at length, he expanded it into a surprisingly complex arrangement of conveyer belts (powered by a home treadmill), various pieces of cabinetry, and "copious quantities of crazy glue."

Here's a video showing the current system running at low speed:

The key part of the system was running the bricks past a camera paired with a computer running a neural net-based image classifier. That allows the computer (when sufficiently trained on brick images) to recognize bricks and thus categorize them by color, shape, or other parameters. Remember that as bricks pass by, they can be in any orientation, can be dirty, can even be stuck to other pieces. So having a flexible software system is key to recognizing—in a fraction of a second—what a given brick is, in order to sort it out. When a match is found, a jet of compressed air pops the piece off the conveyer belt and into a waiting bin.

After much experimentation, Mattheij rewrote the software (several times in fact) to accomplish a variety of basic tasks. At its core, the system takes images from a webcam and feeds them to a neural network to do the classification. Of course, the neural net needs to be "trained" by showing it lots of images, and telling it what those images represent. Mattheij's breakthrough was allowing the machine to effectively train itself, with guidance: Running pieces through allows the system to take its own photos, make a guess, and build on that guess. As long as Mattheij corrects the incorrect guesses, he ends up with a decent (and self-reinforcing) corpus of training data. As the machine continues running, it can rack up more training, allowing it to recognize a broad variety of pieces on the fly.
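The guess-then-correct loop described above can be sketched as follows. This is not Mattheij's software: a toy nearest-centroid classifier stands in for the neural network, the two-number "features" (imagined here as a brick's width and length in studs) stand in for webcam images, and every name in the code is hypothetical.

```python
from collections import defaultdict

class CentroidClassifier:
    """Toy stand-in for the neural net: picks the label whose average
    feature vector is closest to the piece being classified."""

    def __init__(self):
        self.examples = defaultdict(list)  # label -> list of feature tuples

    def train(self, features, label):
        self.examples[label].append(features)

    def predict(self, features):
        if not self.examples:
            return None  # nothing learned yet, so no guess

        def distance(label):
            vectors = self.examples[label]
            centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.examples, key=distance)

def run_pieces(classifier, pieces, review):
    """Guess each piece, let a reviewer confirm or fix the guess, and keep
    the reviewed label as new training data. Returns the number of fixes."""
    corrections = 0
    for features in pieces:
        guess = classifier.predict(features)
        label = review(features, guess)  # the human in the loop
        if label != guess:
            corrections += 1
        classifier.train(features, label)
    return corrections

# Hypothetical pieces and their true labels, standing in for the human reviewer.
truth = {
    (2.0, 4.0): "brick_2x4",
    (1.0, 1.0): "plate_1x1",
    (2.1, 3.9): "brick_2x4",
    (0.9, 1.1): "plate_1x1",
}

clf = CentroidClassifier()
corrections = run_pieces(clf, list(truth), lambda features, guess: truth[features])
print(f"Corrections needed: {corrections} of {len(truth)} pieces")
```

In this toy run the reviewer has to step in only for the first piece of each kind. After that the guesses are right, which is the self-reinforcing effect the article describes, though at a vastly smaller scale.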

Here's another video, focusing on how the pieces move on conveyer belts (running at slow speed so puny humans can follow). You can also see the air jets in action:

In an email interview, Mattheij told Mental Floss that the system currently sorts LEGO bricks into more than 50 categories. It can also be run in a color-sorting mode to bin the parts across 12 color groups. (Thus at present you'd likely do a two-pass sort on the bricks: once for shape, then a separate pass for color.) He continues to refine the system, with a focus on making its recognition abilities faster. At some point down the line, he plans to make the software portion open source. You're on your own as far as building conveyer belts, bins, and so forth.

Check out Mattheij's writeup in two parts for more information. It starts with an overview of the story, followed up with a deep dive on the software. He's also tweeting about the project (among other things). And if you look around a bit, you'll find bulk LEGO brick auctions online—it's definitely a thing!

How Experts Say We Should Stop a 'Zombie' Infection: Kill It With Fire
Image: Cs California, Wikimedia Commons // CC BY-SA 3.0

Scientists are known for being pretty cautious people. But sometimes, even the most careful of us need to burn some things to the ground. Immunologists have proposed a plan to burn large swaths of parkland in an attempt to wipe out disease, as The New York Times reports. They described the problem in the journal Microbiology and Molecular Biology Reviews.

Chronic wasting disease (CWD) is a gruesome infection that’s been destroying deer and elk herds across North America. Like bovine spongiform encephalopathy (BSE, better known as mad cow disease) and Creutzfeldt-Jakob disease, CWD is caused by damaged, contagious little proteins called prions. Although it's been half a century since CWD was first discovered, scientists are still scratching their heads about how it works, how it spreads, and whether, like BSE, it could someday infect humans.

Paper co-author Mark Zabel, of the Prion Research Center at Colorado State University, says animals with CWD fade away slowly at first, losing weight and starting to act kind of spacey. But "they’re not hard to pick out at the end stage," he told The New York Times. "They have a vacant stare, they have a stumbling gait, their heads are drooping, their ears are down, you can see thick saliva dripping from their mouths. It’s like a true zombie disease."

CWD has already been spotted in 24 U.S. states. Some herds are already 50 percent infected, and that number is only growing.

Prion illnesses often travel from one infected individual to another, but CWD’s expansion was so rapid that scientists began to suspect it had more than one way of finding new animals to attack.

Sure enough, it did. As it turns out, the CWD prion doesn’t go down with its host-animal ship. Infected animals shed the prion in their urine, feces, and drool. Long after the sick deer has died, others can still contract CWD from the leaves they eat and the grass in which they stand.

As if that’s not bad enough, CWD has another trick up its sleeve: spontaneous generation. That is, it doesn’t take much damage to twist a healthy prion into a zombifying pathogen. The illness just pops up.

There are some treatments, including immersing infected tissue in an ozone bath. But that won't help when the problem is literally smeared across the landscape. "You cannot treat half of the continental United States with ozone," Zabel said.

And so, to combat this many-pronged assault on our wildlife, Zabel and his colleagues are getting aggressive. They recommend a controlled burn of infected areas of national parks in Colorado and Arkansas—a pilot study to determine if fire will be enough.

"If you eliminate the plants that have prions on the surface, that would be a huge step forward," he said. "I really don’t think it’s that crazy."

[h/t The New York Times]
