
How Do Computers Understand Speech?

More and more, we can get computers to do things for us by talking to them. A computer can call your mother when you tell it to, find you a pizza place when you ask for one, or write out an email that you dictate. Sometimes the computer gets it wrong, but a lot of the time it gets it right, which is amazing when you consider what the task really involves: turning tiny changes in air pressure into written language. Computer speech recognition is complicated and has a long history of development, but here, condensed for you, are the 7 basic things a computer has to do to understand speech.

1. Turn the movement of air molecules into numbers.



Sound comes into your ear or a microphone as changes in air pressure, a continuous sound wave. The computer records a measurement of that wave at one point in time, stores it, and then measures it again. If it waits too long between measurements, it will miss important changes in the wave. To get a usable approximation of a speech wave, it has to take a measurement at least 8,000 times a second, and it works better if it takes one 44,100 times a second. This process is known as digitization, or sampling, at 8 kHz or 44.1 kHz.
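
To make that concrete, here is a minimal Python sketch (using NumPy, with an arbitrary 440 Hz test tone standing in for a speech wave; the tone and the one-second duration are illustrative assumptions, not anything the article specifies) of what sampling 8,000 or 44,100 times a second actually produces: nothing more than a long list of numbers.

```python
import numpy as np

# A minimal sketch of digitization: "measure" a continuous 440 Hz tone
# (an arbitrary stand-in for a speech wave) at two common sample rates.
def digitize(frequency_hz=440.0, sample_rate_hz=8000, duration_s=1.0):
    # The moments at which the air pressure is measured
    times = np.arange(0, duration_s, 1.0 / sample_rate_hz)
    # The measurements themselves: just an array of numbers
    return np.sin(2 * np.pi * frequency_hz * times)

telephone_quality = digitize(sample_rate_hz=8000)    # 8 kHz
cd_quality = digitize(sample_rate_hz=44100)          # 44.1 kHz
print(len(telephone_quality), len(cd_quality))       # 8000 vs. 44100 numbers per second
```

The only difference between the two versions is how many measurements describe the same second of sound.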

2. Figure out which parts of the sound wave are speech.

When the computer takes measurements of air pressure changes, it doesn't know which ones are caused by speech, and which are caused by passing cars, rustling fabric, or the hum of hard drives. A variety of mathematical operations are performed on the digitized sound wave to filter out the stuff that doesn't look like what we expect from speech. We kind of know what to expect from speech, but not enough to make separating the noise out an easy task.
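
A very rough sketch of that idea, assuming the simplest possible criterion (how loud each short chunk of the wave is), might look like the following. Real systems use far more sophisticated statistical filtering, and the frame length and threshold below are invented for illustration.

```python
import numpy as np

def speech_like_frames(samples, frame_len=400, threshold=0.01):
    """Flag chunks of a digitized wave (NumPy array) whose energy suggests speech.

    A toy stand-in for voice-activity detection: keep only frames that
    are loud enough to plausibly be speech.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = float(np.mean(frame ** 2))  # quiet hum and hiss score low
        flags.append(energy > threshold)
    return flags
```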

3. Pick out the parts of the sound wave that help tell speech sounds apart.



A sound wave from speech is actually a very complex mix of multiple waves coming at different frequencies. The particular frequencies—how they change, and how strongly those frequencies are coming through—matter a lot in telling the difference between, say, an "ah" sound and an "ee" sound. More mathematical operations transform the complex wave into a numerical representation of the important features.
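
As an illustration, here is a bare-bones sketch of that step in Python with NumPy: chop the digitized wave into short, overlapping frames and measure how strongly each frequency is present in each one. Real recognizers go further (mel filterbanks, cepstral coefficients), and the 16 kHz sample rate and 25-millisecond frames below are common but assumed values.

```python
import numpy as np

def frame_spectra(samples, sample_rate_hz=16000, frame_len=400, hop=160):
    """Turn a digitized wave (NumPy array) into per-frame frequency strengths.

    Each ~25 ms frame is converted, via a Fourier transform, into the
    strength of every frequency it contains; the result is essentially
    a spectrogram, the raw material for the features recognizers use.
    """
    window = np.hanning(frame_len)
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len] * window
        spectra.append(np.abs(np.fft.rfft(frame)))  # how strongly each frequency comes through
    return np.array(spectra)  # shape: (number of frames, number of frequency bins)

bin_frequencies_hz = np.fft.rfftfreq(400, d=1.0 / 16000)  # which bin is which frequency
```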

4. Look at small chunks of the digitized sound one after the other and guess what speech sound each chunk shows.

There are about 40 speech sounds, or phonemes, in English. The computer has a general idea of what each of them should look like because it has been trained on a bunch of examples. But the characteristics of these phonemes not only vary with different speaker accents, they also change depending on the phonemes next to them: the 't' in "star" looks different from the 't' in "city." The computer must have a model of each phoneme in a bunch of different contexts to make a good guess.
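
One of the simplest ways to picture that guessing step is a nearest-template classifier: average the feature vectors of labeled examples of each phoneme, then ask which average a new chunk of sound most resembles. This is only a sketch of the idea; modern systems use far richer statistical and neural models and keep separate models for context-dependent variants of each phoneme.

```python
import numpy as np

def train_phoneme_templates(feature_vectors, phoneme_labels):
    """Average the labeled training examples of each phoneme into one template."""
    templates = {}
    for label in set(phoneme_labels):
        examples = [f for f, l in zip(feature_vectors, phoneme_labels) if l == label]
        templates[label] = np.mean(examples, axis=0)
    return templates

def guess_phoneme(chunk_features, templates):
    """Guess which phoneme a chunk of sound shows: the closest template wins."""
    return min(templates, key=lambda p: np.linalg.norm(chunk_features - templates[p]))
```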

5. Guess possible words that could be made up of those phonemes.

The computer has a big list of words that includes the different ways they can be pronounced. It makes guesses about what words are being spoken by splitting up the string of phonemes into strings of permissible words. If it sees the sequence "hang ten," it shouldn't split it into "hey, ngten!" because "ngten" won't find a good match in the dictionary.
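
Here is a toy sketch of that lookup, with a made-up three-entry pronunciation dictionary (real lexicons hold tens of thousands of entries, many with several pronunciations each). Splits whose pieces never match a dictionary entry, like "ngten," simply die out.

```python
# A tiny, made-up pronunciation dictionary mapping phoneme sequences to words.
LEXICON = {
    ("HH", "AE", "NG"): "hang",
    ("T", "EH", "N"): "ten",
    ("HH", "EY"): "hey",
}

def word_sequences(phonemes):
    """Return every way to split a phoneme sequence into dictionary words."""
    if not phonemes:
        return [[]]
    results = []
    for cut in range(1, len(phonemes) + 1):
        word = LEXICON.get(tuple(phonemes[:cut]))
        if word is not None:  # splits with no dictionary match (like "ngten") die out here
            for rest in word_sequences(phonemes[cut:]):
                results.append([word] + rest)
    return results

print(word_sequences(["HH", "AE", "NG", "T", "EH", "N"]))  # [['hang', 'ten']]
```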

6. Determine the most likely sequence of words based on how people actually talk.

There are no word breaks in the speech stream. The computer has to figure out where to put them by finding strings of phonemes that match valid words. There can be multiple guesses about what English words make up the speech stream, but not all of them will make good sequences of words. "What do cats like for breakfast?" could be just as good a guess as "water gaslight four brick vast?" if words are the only consideration. The computer applies models of how likely one word is to follow the next in order to determine which word string is the best guess. Some systems also take into account other information, like dependencies between words that are not next to each other. But the more information you want to use, the more processing power you need.
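
A toy version of that word-sequence scoring is a bigram model: add up (in log space) the probability of each word given the one before it. The probabilities below are invented purely for illustration; real systems estimate them from enormous amounts of text.

```python
import math

# Toy bigram probabilities: how likely each word is to follow the previous one.
BIGRAM_PROB = {
    ("what", "do"): 0.1, ("do", "cats"): 0.01, ("cats", "like"): 0.05,
    ("like", "for"): 0.02, ("for", "breakfast"): 0.03,
    ("water", "gaslight"): 1e-7, ("gaslight", "four"): 1e-7,
    ("four", "brick"): 1e-6, ("brick", "vast"): 1e-7,
}
UNSEEN = 1e-9  # tiny fallback probability for word pairs never seen together

def sentence_score(words):
    """Log-probability of a word sequence under the toy bigram model."""
    return sum(math.log(BIGRAM_PROB.get(pair, UNSEEN)) for pair in zip(words, words[1:]))

guess_a = "what do cats like for breakfast".split()
guess_b = "water gaslight four brick vast".split()
print(sentence_score(guess_a) > sentence_score(guess_b))  # True: guess A is the better sentence
```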

7. Take action.

Once the computer has decided which guesses to go with, it can take action. In the case of dictation software, it will print the guess to the screen. In the case of a customer service phone line, it will try to match the guess to one of its pre-set menu items. In the case of Siri, it will make a call, look up something on the Internet, or try to come up with an answer to match the guess. As anyone who has used speech recognition software knows, mistakes happen. All the complicated statistics and mathematical transformations might not prevent "recognize speech" from coming out as "wreck a nice beach," but for a computer to pluck either one of those phrases out of the air is still pretty incredible.
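
For the phone-line case, that final step can be as simple as matching keywords in the recognized text against pre-set menu items. The departments and keywords below are invented for illustration; real systems map recognized text to intents with more robust matching.

```python
# A minimal sketch of the customer-service case: match the recognized text
# against pre-set menu items.
MENU = {
    "billing": ["bill", "payment", "charge"],
    "technical support": ["internet", "router", "not working"],
    "operator": ["agent", "person", "representative"],
}

def route_call(recognized_text):
    text = recognized_text.lower()
    for department, keywords in MENU.items():
        if any(keyword in text for keyword in keywords):
            return department
    return "main menu"  # no match: play the menu again

print(route_call("there's a charge on my bill I don't understand"))  # billing
```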

Feeling Anxious? Just a Few Minutes of Meditation Might Help

Some say mindfulness meditation can cure anything. It might make you more compassionate. It can fix your procrastination habit. It could ward off germs and improve health. And it may boost your mental health and reduce stress, anxiety, depression, and pain.

New research suggests that for people with anxiety, mindfulness meditation programs could be beneficial after just one session. According to Michigan Technological University physiologist John Durocher, who presented his work during the annual Experimental Biology meeting in San Diego, California on April 23, meditation may be able to reduce the toll anxiety takes on the heart in just one session.

As part of the study, Durocher and his colleagues asked 14 adults with mild to moderate anxiety to participate in an hour-long guided meditation session that encouraged them to focus on their breathing and awareness of their thoughts.

The week before the meditation session, the researchers had measured the participants' cardiovascular health (through data like heart rate and the blood pressure in the aorta). They evaluated those same markers immediately after the session ended, and again an hour later. They also asked the participants how anxious they felt afterward.

Other studies have looked at the benefits of mindfulness after extended periods, but this one suggests that the effects are immediate. The participants showed a significant reduction in anxiety after the single session, an effect that lasted up to a week afterward. The session also reduced stress on their arteries. Mindfulness meditation "could help to reduce stress on organs like the brain and kidneys and help prevent conditions such as high blood pressure," Durocher said in a press statement, helping protect the heart against the negative effects of chronic anxiety.

But other researchers have had a more cautious outlook on mindfulness research in general, and especially on studies as small as this one. In a 2017 article in the journal Perspectives on Psychological Science, a group of 15 different experts warned that mindfulness studies aren't always trustworthy. "Misinformation and poor methodology associated with past studies of mindfulness may lead public consumers to be harmed, misled, and disappointed," they wrote.

But one of the reasons that mindfulness can be so easy to hype is that it is such a low-investment, low-risk treatment. Much like dentists still recommend flossing even though there are few studies demonstrating its effectiveness against gum disease, it’s easy to tell people to meditate. It might work, but if it doesn't, it probably won't hurt you. (It should be said that in rare cases, some people do report having very negative experiences with meditation.) Even if studies have yet to show that it can definitively cure whatever ails you, sitting down and clearing your head for a few minutes probably won't hurt.

Scientists Use a CT Scanner to Give Whales a Hearing Test

It's hard to study how whales hear. You can't just give the largest animals in the world a standard hearing test. But it's important to know, because noise pollution is a huge problem underwater. Loud sounds generated by human activity like shipping and drilling now permeate the ocean, subjecting animals like whales and dolphins to an unnatural din that interferes with their ability to sense and communicate.

New research presented at the 2018 Experimental Biology meeting in San Diego, California suggests that the answer lies in a CT scanner designed to image rockets. Scientists in San Diego recently used a CT scanner to scan an entire minke whale, allowing them to model how it and other whales hear.

Many whales rely on their hearing more than any other sense. Sound travels fast underwater and carries across long distances, allowing whales to sense both predators and potential prey over the vast territories these animals inhabit; toothed whales even use sonar-like echolocation to probe the environment around them. Hearing is key to communicating with other whales, too.

A CT scan of two halves of a dead whale (image: Ted Cranford, San Diego State University)

Human technology, meanwhile, has made the ocean a noisy place. The propellers and engines of commercial ships create chronic, low-frequency noise that’s within the hearing range of many marine species, including baleen whales like the minke. The oil and gas industry is a major contributor, not only because of offshore drilling, but due to seismic testing for potential drilling sites, which involves blasting air at the ocean floor and measuring the (loud) sound that comes back. Military sonar operations can also have a profound impact; so much so that several years ago, environmental groups filed lawsuits against the U.S. Navy over its sonar testing off the coasts of California and Hawaii. (The environmentalists won, but the new rules may not be much better.)

Using the CT scans and computer modeling, San Diego State University biologist Ted Cranford predicted the ranges of audible sounds for the fin whale and the minke. To do so, he and his team scanned the body of an 11-foot-long minke whale calf (euthanized after being stranded on a Maryland beach in 2012 and preserved) with a CT scanner built to detect flaws in solid-fuel rocket engines. Cranford and his colleague Peter Krysl had previously used the same technique to scan the heads of a Cuvier’s beaked whale and a sperm whale to generate computer simulations of their auditory systems [PDF].

To save time scanning the minke calf, Cranford and the team ended up cutting the whale in half and scanning both parts. Then they digitally reconstructed it for the purposes of the model.

The scans, which assessed tissue density and elasticity, helped them visualize how sound waves vibrate through the skull and soft tissue of a whale’s head. According to models created with that data, minke whales’ hearing is sensitive to a larger range of sound frequencies than previously thought. The whales are sensitive to higher frequencies beyond those of each other’s vocalizations, leading the researchers to believe that they may be trying to hear the higher-frequency sounds of orcas, one of their main predators. (Toothed whales and dolphins communicate at higher frequencies than baleen whales do.)

Knowing the exact frequencies whales can hear is an important part of figuring out just how much human-created noise pollution affects them. By some estimates, according to Cranford, the low-frequency noise underwater created by human activity has doubled every 10 years for the past half-century. "Understanding how various marine vertebrates receive and process low-frequency sound is crucial for assessing the potential impacts" of that noise, he said in a press statement.
