Computer Scientists Are Using Reddit to Teach Machines Sarcasm

Even humans sometimes misread sarcastic comments (especially online and in text messages), so imagine how hard deciphering sarcasm must be for a robot. Sarcasm is a complex cognitive process—understanding it means not only figuring out the meaning of the words, but examining the context and intent of the speaker. Luckily, there’s Reddit.

In order to help train artificial intelligence programs in natural language processing, Princeton University computer scientists recently scraped together a huge dataset of sarcastic remarks from self-tagged comments on Reddit, according to their paper posted on arXiv.org.

Reddit is a treasure trove of data on sarcasm because users themselves identify their comments as sarcasm, so there’s no room for misclassification—we know that remark was definitely made sarcastically, because the person already said so. On the site, users end sarcastic statements with the marker “/s” to prevent confusion, since it can be hard to read sarcasm without facial expressions, tone of voice, or other in-person contextual clues.

The Self-Annotated Reddit Corpus consists of 1.3 million sarcastic remarks from the social media site, which the researchers say is 10 times more than any other training dataset for sarcastic language. The corpus also contains non-sarcastic remarks, for a total of 500 to 600 million Reddit comments. The comments pulled include only those from users who have employed the “/s” tag in their posts, meaning that they are familiar with and use the tag, so their posts are less likely to contain unmarked examples of sarcasm.

Future artificial intelligence and natural-language processing researchers can now make use of this dataset to teach machines sarcasm, creating a future in which Siri can talk back to us.

[h/t Boing Boing]

Why Is 'Colonel' Spelled That Way?

English spelling is bizarre. We know that. From the moment we learn about silent “e” in school, our innocent expectations that sound and spelling should neatly match up begin to fade away, and soon we accept that “eight” rhymes with “ate,” “of” rhymes with “love,” and “to” sounds like “too” sounds like “two.” If we do sometimes briefly pause to wonder at these eccentricities, we quickly resign ourselves to the fact that there must be reasons—stuff about history and etymology and sound changing over time. Whatever. English. LOL. Right? It is what it is.

But sometimes English takes it a step too far, does something so brazen and shameless we can’t just let it slide. That’s when we have to throw our shoulders back, put our hands on our hips and ask, point blank, what is the deal with the word “colonel”?

“Colonel” is pronounced just like “kernel.” How did this happen? From borrowing the same word from two different places. In the 1500s, English borrowed a bunch of military vocabulary from French, words like cavalerie, infanterie, citadelle, canon, and also, coronel. The French had borrowed them from the Italians, then the reigning experts in the art of war, but in doing so, had changed colonello to coronel.

Why did they do that? Through a common process called dissimilation: when two instances of the same sound occur close to each other in a word, people tend to change one of them to something else. Here, the first “l” was changed to “r.” The opposite happened with the Latin word peregrinus (pilgrim), in which the first “r” was changed to an “l” (it is now peregrino in Spanish and pellegrino in Italian; English inherited the “l” version in pilgrim).

After the dissimilated French coronel made its way into English, late-16th-century scholars started producing English translations of Italian military treatises. Under the influence of the originals, people started spelling it “colonel.” By the middle of the 17th century, the spelling had standardized to the “l” version, but the “r” pronunciation was still popular (it later lost a syllable, turning kor-o-nel into ker-nel). Both pronunciations were in play for a while, and adding to the confusion was the mistaken idea that “coronel” was etymologically related to “crown”; a colonel was sometimes translated as “crowner” in English. In fact, the root is colonna, Italian for column.

Meanwhile, French switched back to “colonel,” in both spelling and pronunciation. English throws its shoulders back, puts its hands on its hips and asks, how boring is that?

Beyond “Buffalo buffalo”: 9 Other Repetitive Sentences From Around The World

Famously, in English, it’s possible to form a perfectly grammatical sentence by repeating the word buffalo (and every so often the place name Buffalo) a total of eight times: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo essentially means “buffalo from Buffalo, New York, who intimidate other buffalo from Buffalo, New York, are themselves intimidated by buffalo from Buffalo, New York.” But repetitive or so-called antanaclastic sentences and tongue twisters like these are by no means unique to English—here are a few in other languages that you might want to try.

1. “LE VER VERT VA VERS LE VERRE VERT” // FRENCH

This sentence works less well in print than Buffalo buffalo, of course, but it’s all but impenetrable when read aloud. In French, le ver vert va vers le verre vert means “the green worm goes towards the green glass,” but the words ver (worm), vert (green), vers (towards), and verre (glass) are all homophones pronounced “vair,” with a vowel similar to the E in “bet” or “pet.” In fact, work the French heraldic word for squirrel fur, vair, in there somewhere and you’d have five completely different interpretations of the same sound to deal with.

2. “CUM EO EO EO EO QUOD EUM AMO” // LATIN

Eo can be interpreted as a verb (“I go”), an adverb ("there," "for that reason"), and an ablative pronoun (“with him” or “by him”) in Latin, each with an array of different shades of meaning. Put four of them in a row in the context cum eo eo eo eo quod eum amo, and you’ll have a sentence meaning “I am going there with him because I love him.”

3. “MALO MALO MALO MALO” // LATIN

An even more confusing Latin sentence is malo malo malo malo. On its own, malo can be a verb (meaning “I prefer,” or “I would rather”); an ablative form of the Latin word for an apple tree, malus (meaning “in an apple tree”); and two entirely different forms (essentially meaning “a bad man,” and “in trouble” or “in adversity”) of the adjective malus, meaning evil or wicked. Although the lengths of the vowels differ slightly when read aloud, put all that together and malo malo malo malo could be interpreted as “I would rather be in an apple tree than a wicked man in adversity.” (Given that the noun malus can also be used to mean “the mast of a ship,” however, this sentence could just as easily be interpreted as, “I would rather be a wicked man in an apple tree than a ship’s mast.”)

4. “FAR, FÅR FÅR FÅR?” // DANISH

Far (pronounced “fah”) is the Danish word for father, while får (pronounced like “for”) can be used both as a noun meaning “sheep” and as a form of the Danish verb få, meaning “to have.” Far får får får? ultimately means “father, do sheep have sheep?”, to which the reply could come får får ikke får, får får lam, meaning “sheep do not have sheep, sheep have lambs.”

5. “EEEE EE EE” // MANX

Manx is the Celtic-origin language of the Isle of Man, which has close ties to Irish. In Manx, ee is both a pronoun (“she” or “it”) and a verb (“to eat”), a future tense form of which is eeee (“will eat”). Eight letter Es in a row ultimately can be divided up to mean “she will eat it.”

6. “COMO COMO? COMO COMO COMO COMO!” // SPANISH

Como can be a preposition (“like,” “such as”), an adverb (“as,” “how”), a conjunction (“as”), and a verb (a form of comer, “to eat”) in Spanish, which makes it possible to string together dialogues like this: Como como? Como como como como! Which means “How do I eat? I eat like I eat!”

7. “Á Á Á Á Á Á Á.” // ICELANDIC

Á is the Icelandic word for river; a form of the Icelandic word for ewe, ær; a preposition essentially meaning “on” or “in;” and a derivative of the Icelandic verb eiga, meaning “to have,” or “to possess.” Should a person named River be standing beside a river and simultaneously own a sheep standing in or at the same river, then that situation could theoretically be described using the sentence Á á á á á á á in Icelandic.

8. “MAI MAI MAI MAI MAI” // THAI

Thai is a tonal language that uses five different tones or patterns of pronunciation (rising, falling, high, low, and mid or flat) to differentiate between the meanings of otherwise seemingly identical syllables and words: glai, for instance, can mean both “near” and “far” in Thai, just depending on what tone pattern it’s given. Likewise, the Thai equivalent of the sentence “new wood doesn’t burn, does it?” is mai mai mai mai mai—which might seem identical written down, but each syllable would be given a different tone when read aloud.

9. “THE LION-EATING POET IN THE STONE DEN” // MANDARIN CHINESE

Mandarin Chinese is another tonal language, the nuances of which were taken to an extreme level by Yuen Ren Chao, a Chinese-born American linguist and writer renowned for composing a bizarre poem entitled "The Lion-Eating Poet in the Stone Den." When written in its original Classical Chinese script, the poem appears as a string of different characters. But when transliterated into the Roman alphabet, every one of those characters is nothing more than the syllable shi:

Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.
Shì shíshí shì shì shì shī.
Shí shí, shì shí shī shì shì.
Shì shí, shì Shī Shì shì shì.
Shì shì shì shí shī, shì shǐ shì, shǐ shì shí shī shìshì.
Shì shí shì shí shī shī, shì shíshì.
Shíshì shī, Shì shǐ shì shì shíshì.
Shíshì shì, Shì shǐ shì shí shì shí shī.
Shí shí, shǐ shí shì shí shī shī, shí shí shí shī shī.
Shì shì shì shì.

The only difference between each syllable is its intonation, which can be either flat (shī), rising (shí), falling (shì) or falling and rising (shǐ); you can hear the entire poem being read aloud here, along with its English translation.
