How Do Computers Understand Speech?

More and more, we can get computers to do things for us by talking to them. A computer can call your mother when you tell it to, find you a pizza place when you ask for one, or write out an email that you dictate. Sometimes the computer gets it wrong, but a lot of the time it gets it right, which is amazing when you think about what a computer has to do to turn human speech into written words: turn tiny changes in air pressure into language. Computer speech recognition is very complicated and has a long history of development, but here, condensed for you, are the 7 basic things a computer has to do to understand speech.

1. Turn the movement of air molecules into numbers.


Sound comes into your ear or a microphone as changes in air pressure, a continuous sound wave. The computer records a measurement of that wave at one point in time, stores it, and then measures it again. If it waits too long between measurements, it will miss important changes in the wave. To get a good approximation of a speech wave, it has to take a measurement at least 8000 times a second, but it works better if it takes one 44,100 times a second. This process is otherwise known as digitization, here at a sampling rate of 8 kHz or 44.1 kHz.
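To make the idea of taking measurements concrete, here is a minimal Python sketch. It is only an illustration: the function names and the 440 Hz test tone are made up for this example, and a real microphone measures air pressure rather than evaluating a formula.

```python
import math

def sample_wave(wave, sample_rate_hz, duration_s):
    """Measure a continuous wave (a function of time in seconds) at evenly spaced instants."""
    n_samples = round(sample_rate_hz * duration_s)
    return [wave(n / sample_rate_hz) for n in range(n_samples)]

def tone(t):
    """Hypothetical stand-in for a sound wave: a pure 440 Hz tone."""
    return math.sin(2 * math.pi * 440 * t)

# One hundredth of a second of sound, measured at the two rates mentioned above.
samples_8k = sample_wave(tone, 8000, 0.01)
samples_44k = sample_wave(tone, 44100, 0.01)
print(len(samples_8k), len(samples_44k))
```

The higher rate simply produces more numbers for the same stretch of sound, which is why it captures faster changes in the wave.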

2. Figure out which parts of the sound wave are speech.

When the computer takes measurements of air pressure changes, it doesn't know which ones are caused by speech, and which are caused by passing cars, rustling fabric, or the hum of hard drives. A variety of mathematical operations are performed on the digitized sound wave to filter out the stuff that doesn't look like what we expect from speech. We kind of know what to expect from speech, but not enough to make separating the noise out an easy task.
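One very crude way to separate likely speech from quiet background is to compare the loudness of short chunks against a cutoff. The Python sketch below shows that idea only. It is not the filtering a real recognizer uses, and the chunk length, the cutoff, and the sample values are all assumptions chosen for illustration.

```python
def frame_energies(samples, frame_len):
    """Average squared amplitude of each consecutive, non-overlapping chunk."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def speech_frames(samples, frame_len, threshold):
    """Flag each chunk whose energy is above the (assumed) cutoff."""
    return [energy > threshold for energy in frame_energies(samples, frame_len)]

# Made-up data: a quiet hum, a louder burst, then the quiet hum again.
quiet = [0.01, -0.01] * 4
loud = [0.5, -0.5] * 4
flags = speech_frames(quiet + loud + quiet, frame_len=8, threshold=0.01)
print(flags)
```

A loud passing car would fool this check just as easily as a voice, which is one reason the real task is hard.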

3. Pick out the parts of the sound wave that help tell speech sounds apart.


A sound wave from speech is actually a very complex mix of multiple waves coming at different frequencies. The particular frequencies—how they change, and how strongly those frequencies are coming through—matter a lot in telling the difference between, say, an "ah" sound and an "ee" sound. More mathematical operations transform the complex wave into a numerical representation of the important features.
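One of the standard mathematical operations for pulling a complex wave apart into its frequencies is the discrete Fourier transform. The Python sketch below applies a plain, unoptimized version to a made-up signal that mixes a 100 Hz wave with a weaker 300 Hz wave. The signal, the sampling rate, and the function name are assumptions for illustration, and real systems go on to compute more refined features than this.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Strength of each frequency component, using a plain discrete Fourier transform."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(total) / n)
    return magnitudes

sample_rate = 1000  # measurements per second (assumed)
n = 100             # number of measurements, so each slot covers 10 Hz
signal = [
    math.sin(2 * math.pi * 100 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 300 * t / sample_rate)
    for t in range(n)
]

mags = dft_magnitudes(signal)
# Pick the two strongest slots and convert their positions back to frequencies in Hz.
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
freqs = sorted(k * sample_rate / n for k in strongest)
print(freqs)
```

The two frequencies that were mixed together come back out as the strongest components, along with how strongly each one is coming through.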

4. Look at small chunks of the digitized sound one after the other and guess what speech sound each chunk shows.

There are about 40 speech sounds, or phonemes, in English. The computer has a general idea of what each of them should look like because it has been trained on a bunch of examples. But not only do the characteristics of these phonemes vary with different speaker accents, they change depending on the phonemes next to them—the 't' in "star" looks different than the 't' in "city." The computer must have a model of each phoneme in a bunch of different contexts for it to make a good guess.
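As a toy picture of guessing a phoneme from a chunk, the Python sketch below compares a chunk's features against a few stored examples and picks the closest one. Everything here is hypothetical: the two-number features, the stored values, and the context labels are invented for illustration, and real recognizers rely on statistical models trained on many recordings rather than a handful of fixed points.

```python
import math

# Hypothetical feature "templates": one per phoneme in a given context.
# The numbers are invented purely for illustration.
TEMPLATES = {
    ("t", "after s"): (0.2, 0.9),         # like the 't' in "star"
    ("t", "between vowels"): (0.6, 0.3),  # like the 't' in "city"
    ("ah", "any"): (0.9, 0.1),
}

def guess_phoneme(features):
    """Return the (phoneme, context) whose template is closest to the chunk's features."""
    return min(TEMPLATES, key=lambda key: math.dist(features, TEMPLATES[key]))

guess = guess_phoneme((0.25, 0.8))
print(guess)
```

Keeping separate entries for the same phoneme in different contexts is what lets the two kinds of 't' be told apart.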

5. Guess possible words that could be made up of those phonemes.

The computer has a big list of words that includes the different ways they can be pronounced. It makes guesses about what words are being spoken by splitting up the string of phonemes into strings of permissible words. If it sees the sequence "hang ten," it shouldn't split it into "hey, ngten!" because "ngten" won't find a good match in the dictionary.
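Splitting a string of phonemes into permissible words can be sketched in Python as a search that tries every dictionary word at the start of the string and keeps only the splits that use up the whole string. The phoneme symbols and the three-word dictionary below are a hypothetical toy, and the short entry "ha" is included only so that a dead-end split exists.

```python
# Hypothetical toy pronunciation dictionary: phoneme sequence -> word.
LEXICON = {
    ("HH", "AE"): "ha",
    ("HH", "AE", "NG"): "hang",
    ("T", "EH", "N"): "ten",
}

def segment(phonemes, lexicon):
    """Return every way to split the phoneme sequence into dictionary words."""
    if not phonemes:
        return [[]]  # nothing left over: this split worked
    results = []
    for end in range(1, len(phonemes) + 1):
        word = lexicon.get(tuple(phonemes[:end]))
        if word is not None:
            for rest in segment(phonemes[end:], lexicon):
                results.append([word] + rest)
    return results

splits = segment(["HH", "AE", "NG", "T", "EH", "N"], LEXICON)
print(splits)
```

The split that starts with "ha" is thrown away because the leftover sounds match nothing in the dictionary, so only "hang ten" survives.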

6. Determine the most likely sequence of words based on how people actually talk.

There are no word breaks in the speech stream. The computer has to figure out where to put them by finding strings of phonemes that match valid words. There can be multiple guesses about what English words make up the speech stream, but not all of them will make good sequences of words. "What do cats like for breakfast?" could be just as good a guess as "water gaslight four brick vast?" if words are the only consideration. The computer applies models of how likely one word is to follow the next in order to determine which word string is the best guess. Some systems also take into account other information, like dependencies between words that are not next to each other. But the more information you want to use, the more processing power you need.
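A model of how likely one word is to follow another can be sketched in Python as a table of word-pair probabilities. The probabilities below are invented for illustration, as is the small fallback value for pairs the table has never seen. A real system would estimate such numbers from large amounts of text.

```python
import math

# Hypothetical probabilities that the second word follows the first.
BIGRAM_PROB = {
    ("what", "do"): 0.2,
    ("do", "cats"): 0.01,
    ("cats", "like"): 0.1,
    ("like", "for"): 0.05,
    ("for", "breakfast"): 0.02,
}
UNSEEN_PROB = 1e-6  # assumed small probability for word pairs not in the table

def log_score(words):
    """Sum of log probabilities over each pair of neighboring words."""
    pairs = zip(words, words[1:])
    return sum(math.log(BIGRAM_PROB.get(pair, UNSEEN_PROB)) for pair in pairs)

candidates = [
    ["what", "do", "cats", "like", "for", "breakfast"],
    ["water", "gaslight", "four", "brick", "vast"],
]
best = max(candidates, key=log_score)
print(" ".join(best))
```

Adding logarithms instead of multiplying the raw probabilities keeps the numbers from shrinking toward zero on long sentences.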

7. Take action.

Once the computer has decided which guesses to go with, it can take action. In the case of dictation software, it will print the guess to the screen. In the case of a customer service phone line, it will try to match the guess to one of its pre-set menu items. In the case of Siri, it will make a call, look up something on the Internet, or try to come up with an answer to match the guess. As anyone who has used speech recognition software knows, mistakes happen. All the complicated statistics and mathematical transformations might not prevent "recognize speech" from coming out as "wreck a nice beach," but for a computer to pluck either one of those phrases out of the air is still pretty incredible.

What It’s Like to Write an Opera About Dinosaurs

There are many challenges that face those writing the lyrics to operas, but figuring out what can rhyme with dinosaur names isn’t often one of them. Yet wrangling multisyllabic, Latin- and Greek-derived names of prehistoric creatures into verse was an integral part of Eric Einhorn’s job as the librettist behind Rhoda and the Fossil Hunt, a new, family-friendly opera currently running at the American Museum of Natural History in New York City.

Created by On Site Opera, which puts on operas in unusual places (like Madame Tussauds Wax Museum) across New York City, in conjunction with the Lyric Opera of Chicago and the Pittsburgh Opera, Rhoda and the Fossil Hunt follows the true story of Rhoda Knight and her grandfather, the famous paleoartist Charles R. Knight.

Knight worked as a freelance artist for the American Museum of Natural History from 1896 until his death in 1953, creating images of extinct species that paved the way for how we imagine dinosaurs even now. He studied with taxidermists and paleontology experts and was one of the first to paint dinosaurs as flesh-and-blood creatures in natural habitats rather than fantastical monsters, studying their bones and creating sculptural models to make his renderings as accurate as contemporary science made possible.

In the 20-minute opera, singers move around the museum’s Hall of Saurischian Dinosaurs, performing among skeletons and even some paintings by Knight himself. Einhorn, who also serves as the director of On Site Opera and stage director for the opera, wrote the libretto based on stories about the real-life Rhoda—who now goes by Rhoda Knight Kalt—whom he met with frequently during the development process.

Soprano Jennifer Zetland (Rhoda) sings in front of a dinosaur skeleton at the American Museum of Natural History.
AMNH // R. Mickens

“I spent a lot of time with Rhoda just talking about her childhood,” he tells Mental Floss, gathering anecdotes that could be worked into the opera. “She tells this great story of being in the museum when they were unpacking the woolly mammoth,” he says. “And she was just there, because her grandfather was there. It’s being at the foot of greatness and not even realizing it until later.”

But there was one aspect of Rhoda’s childhood that proved to be a challenge in terms of turning her story into a performance. “Unfortunately, she was a really well-behaved kid,” Einhorn says. “And that doesn't really make for a good opera.”

Knight Kalt, who attended the opera’s dress rehearsal, explains that she knew at the time that if she misbehaved, she wouldn’t be allowed back. “I knew that the only way I could be with my grandfather was if I was very quiet,” she says. “Sometimes he would stand for an hour and a half discussing a fossil bone and how he could bring that alive … if I had interrupted then I couldn't meet him [at the museum anymore].”

Though Knight Kalt was never an artist herself, in the fictionalized version of her childhood (which takes place when Rhoda is 8), she looks around the museum for the missing bones of the dinosaur Deinocheirus so that her grandfather can draw them. The Late Cretaceous dino, first discovered in 1965, almost didn't make it into the show, though. In the first draft of the libretto, Rhoda was searching the museum for Zhenyuanlong suni, a relatively new dinosaur species found in China and first unveiled in 2015, but the five-syllable name proved impossible to rhyme or sing.

Rhoda Knight Kalt stands next to the head of a dinosaur.
Rhoda Knight Kalt
Shaunacy Ferro

But Einhorn wanted to feature a real dinosaur discovery in the opera. A paleontologist at the museum, Carl Mehling, suggested Deinocheirus. “There are two arms hanging right over there,” Einhorn says, gesturing across the Hall of Saurischian Dinosaurs, “and until [recently] the arms were the only things that had ever been discovered about Deinocheirus.” Tying the opera back to an actual specimen in the museum—one only a few feet away from where the opera would be staged—opened up a whole new set of possibilities, both lyrically and otherwise. “Once we ironed that out, we knew we had good science and better rhyming words.”

As for Knight Kalt, she says the experience of watching her childhood unfold in operatic form was a little weird. “The whole story makes me laugh,” she says. But it was also a perfectly appropriate way to honor her grandfather. “He used to sing while he was painting,” she says. “He loved the opera.”

Rhoda and the Fossil Hunt will be performed at the American Museum of Natural History on Fridays, Saturdays, and Sundays until October 15. Performances are free with museum admission but require a reservation. The opera will later travel to the Lyric Opera of Chicago and the Pittsburgh Opera.

11 Buoyant Facts About Humpback Whales

Humpback whales are some of the most intelligent animals on the planet. Hunted almost to extinction during the 19th and early 20th centuries, their populations are slowly recovering, and now they’re a favorite sight for whale-watchers. Here are 11 facts you might not have known about the mysterious marine giants, who are known for their acrobatics and for sidling right up alongside boats to get a good look at their human observers.

1. THEY’RE LONGER THAN A SCHOOL BUS.

North American school buses max out at about 45 feet long. Female humpback whales—which are larger than males—can be up to 60 feet long, and their pectoral fins alone can be 15 feet long. At birth, humpbacks weigh around 1 ton, doubling in size during their first year of life and eventually reaching up to 40 tons.

2. THEY HAVE HUGE MOUTHS.

In keeping with the rest of their bodies, their mouths are huge—their tongue alone is the size of a small car. But the opening to their throat is only about the size of a grapefruit, according to the Hawaiian Islands Humpback Whale National Marine Sanctuary, so they can’t swallow large prey. Instead, they eat krill, small fish, and plankton. They can eat up to a ton of food per day, according to the 2015 documentary Humpback Whales.

3. THOSE BUMPS ARE HAIR FOLLICLES.

Each of the distinctive bumps along a humpback’s head holds a single hair that the whale uses to sense the environment around it. These hairs help the whale glean information about water temperature and quality.

4. THEIR FLUKES ARE LIKE FINGERPRINTS.

Like human fingerprints, humpback tails can be used to identify individuals. The pigmentation and scarring on their flukes are unique, and scientists document these markings to keep track of certain whales that they see repeatedly during their research trips.

5. THEY LIVE A LONG TIME, BUT NOT AS LONG AS MANY OTHER WHALES.

Most humpback whales make it into their 60s, but scientists estimate that they may live up to 80 years. Still, that’s nothing compared to bowhead whales, a species whose oldest known individuals have lived to be 200 years old.

6. THEY HAVE THE LONGEST MIGRATIONS OF ANY MAMMAL.

Each year, humpbacks migrate from their feeding grounds in cold waters toward warm breeding areas—Alaskan whales head to Hawaii, while Californian whales head to Mexico and Costa Rica, and Australian whales migrate to the Southern Ocean. These biannual journeys can involve distances of up to 5000 miles, which is officially the longest known migration of any mammal on earth.

The fastest documented migration of a humpback whale was observed in 1988, when a humpback traveled from Sitka, Alaska, to Hawaii in just 39 days, or possibly less, depending on how soon it left Alaskan waters after the researchers first sighted it. That’s a journey of about 2750 miles point to point.

7. THEY HAVE BEEN KNOWN TO DEFEND OTHER SPECIES FROM ORCAS.

In 2009, marine ecologist Robert Pitman watched two humpback whales rescue a seal from a group of orcas that were pursuing it. The seal ended up on one of the humpbacks’ chests, and when it began to fall off, the whale even nudged it back on with a flipper, indicating that it was an intentional act of altruism. Though it’s not entirely clear why they would do so, it appears to be an offensive response on the part of the humpbacks, who may intervene whenever they hear killer whales fighting, whether one of their own is involved or not.

8. ONLY THE MALES SING.

Their songs may have made the species famous, but not every humpback sings. It’s strictly a male behavior, and plays an important part in courtship displays. There’s plenty of mystery that still surrounds the science of whale songs, but in 2013, researchers discovered that it’s a group activity that involves even sexually immature males. Both young and mature whales sing in chorus, giving the immature whales a lesson in singing and courtship behavior, and helping older whales amplify their songs to draw females to the area from afar. Other research has found that these songs change over time, and whales learn them much like a human learns a new song, bit by bit.

9. BREACHING IS LIKE YELLING.

Though humpbacks are famous for their songs, that’s not the only way they communicate. Scientists only recently discovered that breaching—when whales jump up into the air, crashing back down into the water—is a way to keep in touch with far-away friends. Humpbacks leap higher and more often than other whales, and while spectacular to witness, the moves come at a cost: It takes a lot of energy, especially when the whales are fasting. But after 200 hours observing humpbacks migrating past the Australian coast, a team from the University of Queensland found that the whales were more likely to breach when the nearest group of other humpbacks was more than two and a half miles away, and that they were more likely to do so when it was windy out. It appears that breaching is a way to communicate over long distances when there is a lot of competing noise.

10. THEIR SONGS ARE INCREDIBLY COMPLEX …

Humpback songs aren’t just showy. They have their own grammar, and their songs are hierarchical, like sentences. In human language, this means that the meaning of sentences depends on the clauses within them and the words within them. In 2006, mathematical analysis found that humpbacks use phrases, too. They also remix their tunes, tweaking and changing them over time, often combining new and old melodies. Humpback songs have even been visualized as sheet music.

11. … AND HELPED END WHALING.

Researchers estimate that prior to the whaling boom of the 19th and 20th centuries, there were around 112,000 humpbacks in the North Atlantic alone, but that by the time commercial whaling was banned in the region in 1955, there were fewer than 1000 individuals left. Between 1947 and the 1970s, the USSR alone killed an estimated 338,000 humpbacks, falsifying data it was required to submit to the International Convention for the Regulation of Whaling to disguise the illegal magnitude of its hunting operation. It has been called “one of the greatest environmental crimes of the 20th century.”

While the populations have grown and humpbacks have been taken off the endangered species list, some estimates put the worldwide humpback population at only 40 percent of what it was before the whaling era. Whaling was banned throughout the rest of the world in 1966, though Norway, Iceland, and Japan still practice it.

Roger Payne, one of the scientists who first discovered that humpbacks sing songs, later became instrumental in pushing to protect the species in the 1960s. In 1970, he released his recording of humpback songs as a record, which remains the best-selling nature recording in history. In 1972, the songs were played at a Greenpeace meeting, and ended up galvanizing a new movement: Save the Whales. “It certainly was a huge factor in convincing us that the whales were an intelligent species here on planet Earth and actually made music, made art, created an aesthetic,” as former Greenpeace director Rex Weyler told NPR in 2014. The campaign gained traction with other organizations, too, and helped lead to the International Whaling Commission’s 1982 whaling ban.
