Computer Scientists Are Using Reddit to Teach Machines Sarcasm


Even humans sometimes misread sarcastic comments (especially online and in text messages), so imagine how hard deciphering sarcasm must be for a robot. Sarcasm is a complex cognitive process—understanding it means not only figuring out the meaning of the words, but examining the context and intent of the speaker. Luckily, there’s Reddit.

In order to help train artificial intelligence programs in natural language processing, Princeton University computer scientists recently scraped together a huge dataset of sarcastic remarks from self-tagged comments on Reddit, according to their paper posted on

Reddit is a treasure trove of data on sarcasm because users themselves identify their comments as sarcasm, so there’s no room for misclassification—we know that remark was definitely made sarcastically, because the person already said so. On the site, users end sarcastic statements with the marker “/s” to prevent confusion, since it can be hard to read sarcasm without facial expressions, tone of voice, or other in-person contextual clues.

The Self-Annotated Reddit Corpus consists of 1.3 million sarcastic remarks from the social media site, which the researchers say is 10 times more than any other training dataset for sarcastic language. The corpus also contains non-sarcastic remarks, for a total of 500 to 600 million Reddit comments. Comments were pulled only from users who have employed the “/s” tag in their posts, meaning that they are familiar with the tag and use it, so their posts are less likely to contain unmarked examples of sarcasm.
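To make the self-tagging idea concrete, here is a minimal Python sketch of how comments might be labeled from the “/s” marker and limited to authors who have used it at least once. It is an illustration under stated assumptions, not the researchers' actual pipeline: the function names, the (author, body) input shape, and the rule that the tag must sit at the very end of a comment are all hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: a comment counts as self-tagged only when "/s" stands alone
# at the very end, so a URL that happens to end in "/s" is not matched.
TAG_RE = re.compile(r"(?:^|\s)/s\s*$")


def is_tagged(body: str) -> bool:
    """Return True if the comment ends with the "/s" sarcasm marker."""
    return bool(TAG_RE.search(body))


def strip_tag(body: str) -> str:
    """Remove a trailing "/s" marker so the label does not leak into the text."""
    return TAG_RE.sub("", body).rstrip()


def build_corpus(comments: List[Tuple[str, str]]) -> List[Tuple[str, bool]]:
    """Label (author, body) pairs as sarcastic or not.

    Only authors who have used the tag at least once are kept, on the
    reasoning that their untagged comments are less likely to be
    unmarked sarcasm.
    """
    tag_users = {author for author, body in comments if is_tagged(body)}
    return [
        (strip_tag(body), is_tagged(body))
        for author, body in comments
        if author in tag_users
    ]


if __name__ == "__main__":
    sample = [
        ("alice", "Oh great, another Monday /s"),
        ("alice", "I like tea"),
        ("bob", "Nice weather"),  # bob never uses the tag, so he is dropped
    ]
    for text, sarcastic in build_corpus(sample):
        print(sarcastic, text)
```

In this toy sample, both of alice's comments are kept (one labeled sarcastic, one not), while bob's comment is discarded because he has never used the tag.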

Future artificial intelligence and natural-language processing researchers can now make use of this dataset to teach machines sarcasm, creating a future in which Siri can talk back to us.

[h/t Boing Boing]


From Farts to Floozy: These Are the Funniest Words in English, According to Science

Fart. Booty. Tinkle. Wiener. We know these words have the ability to make otherwise mature individuals laugh, but how? And why? Is it their connotations of puerile activities? Is it the sound they make? And if an underlying structure can be found to explain why people find them humorous, can we then objectively determine a word funnier than bunghole?

Chris Westbury, a professor of psychology at the University of Alberta, believes we can. With co-author Geoff Hollis, Westbury recently published a paper ("Wriggly, Squiffy, Lummox, and Boobs: What Makes Some Words Funny?") online in the Journal of Experimental Psychology: General. The two analyzed an existing list of 4997 funny words compiled by the University of Warwick and assessed by 800 survey participants, whittling down the collection to the 200 words that participants found funniest. Westbury wanted to see how a word's phonology (sound), spelling, and meaning influenced whether people found it amusing, and to test incongruity theory, the idea that the more a word subverts expectations, the funnier it gets.

In an email to Mental Floss, Westbury said that a good example of incongruity theory is a video of an orangutan being duped by a magic trick. While the orangutan isn't responding to a word, it is clearly tickled by the subversion of its own expectations.

With incongruity theory in mind, Westbury was able to generate various equations that attempted to predict whether a person would find a single word amusing. He separated the words into categories—insults, sexual references, party terms, animals, names for body parts, and profanity. Among those examined: gobble, boogie, chum, oink, burp, and turd.

Upchuck topped one chart, followed by bubby and boff, the latter a slang expression for sexual intercourse. Another equation found that slobbering, puking, and fuzz were reliable sources of amusement. Words with the letters j, k, and y also scored highly, and the vowel sound /u/ appeared in 20 percent of words the University of Warwick study deemed funny, like pubes, nude, and boobs.

In the future, Westbury hopes to examine word pairs for their ability to amuse. The smart money is on fart potato to break the top five.

[h/t Live Science]