How Do Computers Understand Speech?

More and more, we can get computers to do things for us by talking to them. A computer can call your mother when you tell it to, find you a pizza place when you ask for one, or write out an email that you dictate. Sometimes the computer gets it wrong, but a lot of the time it gets it right, which is amazing when you think about what a computer has to do to turn human speech into written words: turn tiny changes in air pressure into language. Computer speech recognition is very complicated and has a long history of development, but here, condensed for you, are the 7 basic things a computer has to do to understand speech.

1. Turn the movement of air molecules into numbers.


Sound comes into your ear or a microphone as changes in air pressure, a continuous sound wave. The computer records a measurement of that wave at one point in time, stores it, and then measures it again. If it waits too long between measurements, it will miss important changes in the wave. To get a good approximation of a speech wave, it has to take a measurement at least 8,000 times a second, but it works better if it takes one 44,100 times a second. This process is known as digitization, and those two rates are written as 8 kHz and 44.1 kHz.
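To make the measuring step concrete, here is a minimal Python sketch. The names `sample_wave`, `tone`, and `to_int16` are made up for illustration, and a pure 440 Hz tone stands in for real speech. Rounding to 16-bit integers is one common way of storing measurements, assumed here rather than taken from any particular recognizer.

```python
import math

def sample_wave(wave, sample_rate_hz, duration_s):
    """Measure a continuous wave at evenly spaced instants."""
    n_samples = round(sample_rate_hz * duration_s)
    return [wave(n / sample_rate_hz) for n in range(n_samples)]

def tone(t, freq_hz=440.0):
    """A pure tone standing in for the air-pressure wave (illustrative only)."""
    return math.sin(2 * math.pi * freq_hz * t)

def to_int16(samples):
    """Round each measurement to a 16-bit integer (an assumed storage format)."""
    return [round(s * 32767) for s in samples]

# The same 10 milliseconds of sound, measured at the two rates mentioned above.
low_rate = sample_wave(tone, 8000, 0.01)
high_rate = sample_wave(tone, 44100, 0.01)
print(len(low_rate), len(high_rate))
print(to_int16(low_rate)[:5])
```

The higher rate yields more than five times as many numbers for the same stretch of sound, which is why it tracks fast changes in the wave more closely.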

2. Figure out which parts of the sound wave are speech.

When the computer takes measurements of air pressure changes, it doesn't know which ones are caused by speech, and which are caused by passing cars, rustling fabric, or the hum of hard drives. A variety of mathematical operations are performed on the digitized sound wave to filter out the stuff that doesn't look like what we expect from speech. We kind of know what to expect from speech, but not enough to make separating the noise out an easy task.
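One of the simplest operations of this kind is to compare the loudness of short chunks against a cutoff. The sketch below does only that, with a hand-picked threshold and made-up function names. It also shows why the task is hard: a passing car is loud too, so loudness alone would let it through, and real systems rely on more than this.

```python
def frame_energy(samples, frame_len):
    """Average squared amplitude of each consecutive, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies

def speech_frames(samples, frame_len=80, threshold=0.01):
    """Flag frames whose energy exceeds a hand-picked threshold (an assumption)."""
    return [e > threshold for e in frame_energy(samples, frame_len)]

# Toy signal: a quiet hum, then a louder burst, then the quiet hum again.
quiet = [0.01 * ((-1) ** n) for n in range(160)]
loud = [0.5 * ((-1) ** n) for n in range(160)]
flags = speech_frames(quiet + loud + quiet)
print(flags)
```

Only the two frames in the middle are flagged as candidate speech; the quiet frames on either side fall below the cutoff.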

3. Pick out the parts of the sound wave that help tell speech sounds apart.


A sound wave from speech is actually a very complex mix of multiple waves coming at different frequencies. The particular frequencies—how they change, and how strongly those frequencies are coming through—matter a lot in telling the difference between, say, an "ah" sound and an "ee" sound. More mathematical operations transform the complex wave into a numerical representation of the important features.
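A standard way to find out which frequencies are in a chunk of sound, and how strong each one is, is the discrete Fourier transform. The sketch below is a deliberately naive version written for readability, not the transform any specific recognizer uses. The toy signal and the function names are assumptions for illustration.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags

def strongest_frequency(frame, sample_rate_hz):
    """Frequency in Hz of the bin with the largest magnitude."""
    mags = magnitude_spectrum(frame)
    peak_bin = max(range(len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate_hz / len(frame)

# Toy chunk: a strong 500 Hz wave mixed with a weaker 1500 Hz wave,
# measured 8,000 times a second for 20 milliseconds (160 measurements).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 500 * t / sample_rate)
         + 0.3 * math.sin(2 * math.pi * 1500 * t / sample_rate)
         for t in range(160)]
print(strongest_frequency(frame, sample_rate))
```

The list returned by `magnitude_spectrum` is one simple example of a "numerical representation" of a chunk: each entry says how strongly one frequency is coming through.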

4. Look at small chunks of the digitized sound one after the other and guess what speech sound each chunk shows.

There are about 40 speech sounds, or phonemes, in English. The computer has a general idea of what each of them should look like because it has been trained on a bunch of examples. But not only do the characteristics of these phonemes vary with different speaker accents, they change depending on the phonemes next to them—the 't' in "star" looks different than the 't' in "city." The computer must have a model of each phoneme in a bunch of different contexts for it to make a good guess.
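As a rough illustration of guessing with context-dependent models, the sketch below compares each chunk's features to a handful of stored templates and picks the closest one. Everything in it is hypothetical: the two-number features, the template values, and the context labels are invented, and real systems use statistical models trained on many examples rather than a nearest-template lookup.

```python
import math

# Hypothetical feature templates, keyed by (phoneme, context).
# The two 't' entries stand for the different 't' sounds in "star" and "city".
TEMPLATES = {
    ("t", "after s"): (0.2, 0.9),
    ("t", "between vowels"): (0.6, 0.4),
    ("ah", "any"): (0.9, 0.1),
}

def guess_phoneme(features):
    """Return (phoneme, context, distance) for the closest template."""
    (phoneme, context), template = min(
        TEMPLATES.items(), key=lambda item: math.dist(features, item[1])
    )
    return phoneme, context, math.dist(features, template)

# Three chunks of sound, each already reduced to two made-up feature numbers.
chunks = [(0.25, 0.85), (0.85, 0.15), (0.55, 0.45)]
guesses = [guess_phoneme(c)[:2] for c in chunks]
print(guesses)
```

Keeping a separate template for each context is what lets the first and third chunks both come out as 't' even though their features look quite different.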

5. Guess possible words that could be made up of those phonemes.

The computer has a big list of words that includes the different ways they can be pronounced. It makes guesses about what words are being spoken by splitting up the string of phonemes into strings of permissible words. If it sees the sequence "hang ten," it shouldn't split it into "hey, ngten!" because "ngten" won't find a good match in the dictionary.
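The splitting step can be sketched as a search over a pronunciation list. The tiny lexicon and the simplified phoneme symbols below are assumptions for illustration; a real word list is far larger and uses a proper phonetic alphabet.

```python
# Hypothetical mini-lexicon: word -> simplified phoneme sequence.
LEXICON = {
    "hang": ("h", "a", "ng"),
    "ten": ("t", "e", "n"),
    "ha": ("h", "a"),
    "tent": ("t", "e", "n", "t"),
}

def segmentations(phonemes):
    """Return every way to split the phoneme sequence into lexicon words."""
    if not phonemes:
        return [[]]  # one valid way to split nothing: no words
    results = []
    for word, pron in LEXICON.items():
        if tuple(phonemes[:len(pron)]) == pron:
            # This word matches the front; try to split whatever is left.
            for rest in segmentations(phonemes[len(pron):]):
                results.append([word] + rest)
    return results

print(segmentations(["h", "a", "ng", "t", "e", "n"]))
```

Starting with "ha" leaves "ng t e n" behind, which matches nothing in the list, so that split is dropped and only "hang ten" survives.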

6. Determine the most likely sequence of words based on how people actually talk.

There are no word breaks in the speech stream. The computer has to figure out where to put them by finding strings of phonemes that match valid words. There can be multiple guesses about what English words make up the speech stream, but not all of them will make good sequences of words. "Water gaslight four brick vast?" could be just as good a guess as "What do cats like for breakfast?" if words are the only consideration. The computer applies models of how likely one word is to follow another in order to determine which word string is the best guess. Some systems also take into account other information, like dependencies between words that are not next to each other. But the more information you want to use, the more processing power you need.
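A model of how likely one word is to follow another is often called a bigram model. The sketch below scores two candidate word strings with one; the probabilities are invented for illustration, and the tiny floor value for unseen word pairs is a simplification assumed here.

```python
import math

# Made-up probabilities P(word | previous word); "<s>" marks the sentence start.
BIGRAM = {
    ("<s>", "what"): 0.2, ("what", "do"): 0.3, ("do", "cats"): 0.05,
    ("cats", "like"): 0.2, ("like", "for"): 0.1, ("for", "breakfast"): 0.05,
    ("<s>", "water"): 0.01,
}
UNSEEN = 1e-6  # assumed floor probability for word pairs not in the table

def log_score(words):
    """Sum of log probabilities of each word given the one before it."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += math.log(BIGRAM.get((prev, word), UNSEEN))
        prev = word
    return total

candidates = [
    "what do cats like for breakfast".split(),
    "water gaslight four brick vast".split(),
]
best = max(candidates, key=log_score)
print(" ".join(best))
```

Both strings are made of valid words, but pairs like "water gaslight" are almost never seen together, so the breakfast question wins. Adding logarithms instead of multiplying probabilities keeps the numbers from shrinking toward zero on long sentences.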

7. Take action.

Once the computer has decided which guesses to go with, it can take action. In the case of dictation software, it will print the guess to the screen. In the case of a customer service phone line, it will try to match the guess to one of its pre-set menu items. In the case of Siri, it will make a call, look up something on the Internet, or try to come up with an answer to match the guess. As anyone who has used speech recognition software knows, mistakes happen. All the complicated statistics and mathematical transformations might not prevent "recognize speech" from coming out as "wreck a nice beach," but for a computer to pluck either one of those phrases out of the air is still pretty incredible.

'Lime Disease' Could Give You a Nasty Rash This Summer

A cold Corona or virgin margarita is best enjoyed by the pool, but watch where you’re squeezing those limes. As Slate illustrates in a new video, there’s a lesser-known “lime disease,” and it can give you a nasty skin rash if you’re not careful.

When lime juice comes into contact with your skin and is then exposed to UV rays, it can cause a chemical reaction that results in phytophotodermatitis. It looks a little like a poison ivy reaction or sun poisoning, and some of the symptoms include redness, blistering, and inflammation. It’s the same reaction caused by a corrosive sap on the giant hogweed, an invasive weed that’s spreading throughout the U.S.

"Lime disease" may sound random, but it’s a lot more common than you might think. Dermatologist Barry D. Goldman tells Slate he sees cases of the skin condition almost daily in the summer. Some people have even reported receiving second-degree burns as a result of the citric acid from lime juice. According to the Mayo Clinic, the chemical that causes phytophotodermatitis can also be found in wild parsnip, wild dill, wild parsley, buttercups, and other citrus fruits.

To play it safe, keep your limes confined to the great indoors or wash your hands with soap after handling the fruit. You can learn more about phytophotodermatitis by checking out Slate’s video below.

[h/t Slate]

Why Eating From a Smaller Plate Might Not Be an Effective Dieting Trick 

It might be time to rewrite the diet books. Israeli psychologists have cast doubt on the widespread belief that eating from smaller plates helps you control food portions and feel fuller, Scientific American reports.

Past studies have shown that this mind trick, called the Delboeuf illusion, influences the amount of food that people eat. In one 2012 study, participants who were given larger bowls ended up eating more soup overall than those given smaller bowls.

However, researchers from Ben-Gurion University of the Negev in Israel concluded in a study published in the journal Appetite that the effectiveness of the illusion depends on how empty your stomach is. The team of scientists studied two groups of participants: one that ate three hours before the experiment, and another that ate one hour prior. When participants were shown images of pizzas on serving trays of varying sizes, the group that hadn’t eaten in several hours was more accurate in assessing the size of pizzas. In other words, the hungrier they were, the less likely they were to be fooled by the different trays.

However, both groups were equally tricked by the illusion when they were asked to estimate the size of non-food objects, such as black circles inside of white circles and hubcaps within tires. Researchers say this demonstrates that motivational factors, like appetite, affect how we perceive food. The findings also dovetail with the results of an earlier study, which concluded that overweight people are less likely to fall for the illusion than people of a normal weight.

So go ahead and get a large plate every now and then. At the very least, it may save you a second trip to the buffet table.

[h/t Scientific American]
