Why reCAPTCHA is Good for Humanity

Last week we talked about KittenAuth, a novel CAPTCHA system used to differentiate between humans and spambots -- by using pictures of kittens. Today let's take a look at reCAPTCHA, the system in use by this very blog. What does it do, and why is it good for humanity?

What's a CAPTCHA?

First let's review the term CAPTCHA. It's a loose acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart." The idea is to force humans to do a (relatively) simple task, like reading a few words presented in an image and typing them into the form -- but this trick only works if the task is hard for computers (ahem, spambots) to do.

CAPTCHA systems are used on forms all over the web in order to cut down on spam form submissions. If you've ever run a blog, you'll know that legions of spambots are crawling the web, submitting every form they find -- so having a CAPTCHA on the form drastically reduces form spam. However, in most CAPTCHA systems the text you type in is meaningless, purposely scrambled text. reCAPTCHA is different.

What's Different About reCAPTCHA?

reCAPTCHA was born when Luis von Ahn, an assistant professor at Carnegie Mellon, realized that millions of people were spending time typing meaningless words into forms. Why not turn this word-decipherment into useful work that helped with some common goal? What if there was a set of words (as images) that needed to be viewed and deciphered by humans? It turns out that book scanning projects (including the Internet Archive) have just this problem: when scanning a print book into a computer -- particularly an old book in poor condition -- some words can't be deciphered automatically by Optical Character Recognition (OCR) software, and need a human to figure them out. In order to get a good text-only copy of a scanned book, lots of human attention is needed.

So reCAPTCHA is conceptually simple: take the words the OCR software can't read and put them in front of human users. If multiple users independently type the same text for the same hard-to-read word, reCAPTCHA can safely assume that it has been properly deciphered, and feed that word back into the book scanning project, slotting it into its associated book. Thus, text that is by definition difficult or impossible for a computer to accurately scan has been deciphered by humans -- and the humans doing the work generally don't even know it!
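To make the agreement idea concrete, here is a minimal Python sketch. It is not reCAPTCHA's actual code: the function name consensus, the lowercase normalization, and the threshold of three matching answers are all assumptions chosen for illustration.

```python
from collections import Counter

def consensus(transcriptions, min_votes=3):
    """Return the reading most users agree on, or None if agreement is too weak.

    min_votes is a hypothetical threshold, not a documented reCAPTCHA value.
    """
    # Ignore case and stray whitespace so "Knight " and "knight" count together.
    normalized = [t.strip().lower() for t in transcriptions if t.strip()]
    if not normalized:
        return None
    text, votes = Counter(normalized).most_common(1)[0]
    return text if votes >= min_votes else None
```

With this sketch, three users typing "knight" (in any capitalization) settle the word, while two users who disagree leave it undecided.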

Yeah, But...

There's one technical catch -- what's to stop people from typing in random gibberish as "decipherment" of the words? Given that reCAPTCHA by definition doesn't know the correct decipherment of its subject words, how can it judge whether you've gotten it right? To solve this problem, reCAPTCHA presents two words together: one unknown and one known (the latter meaning a word for which reCAPTCHA already has a good decipherment). You have to get the known word correct, and the unknown word is (as described above) compared with other users' decipherments to eventually determine whether it's correct. There's also an audio variant for users with visual impairment, in which they listen to spoken language and convert it to written text.
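The two-word check could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the real service: the function check_challenge, the in-memory votes store, and the word IDs are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown word ID -> tally of typed guesses.
votes = defaultdict(Counter)

def normalize(text):
    # Compare answers without regard to case or surrounding whitespace.
    return text.strip().lower()

def check_challenge(known_answer, typed_known, unknown_id, typed_unknown):
    """Pass the user only if the known word is right.

    The guess for the unknown word is recorded only when the known word
    matches, so random gibberish never reaches the tally.
    """
    if normalize(typed_known) != normalize(known_answer):
        return False  # failed the known word; discard the other guess
    guess = normalize(typed_unknown)
    if guess:
        votes[unknown_id][guess] += 1
    return True
```

A user who gets the known word right contributes one vote toward the unknown word; a user who gets it wrong is rejected and contributes nothing.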

So next time you fill out a reCAPTCHA form when commenting on a Mental Floss blog post, remember: you're helping to digitize books!

Further reading: Carnegie Mellon press release, Wikipedia page, reCAPTCHA project site.

Samsung Is Making a Phone You Can Fold in Half

The iPhone vs. Galaxy war just intensified. Samsung is pulling out all the stops and developing a foldable phone dubbed Galaxy X, which it plans to release next year, according to The Wall Street Journal.

It would seem the rumors surrounding a mythical phone that can fold over like a wallet are true. The phone, which has been given the in-house code name “Winner,” will have a 7-inch screen and be a little smaller than a tablet but thicker than most other smartphones.

Details are scant and subject to change at this point, but the phone is expected to have a smaller screen on the front that will remain visible when the device is folded. Business Insider published Samsung patents back in May showing a phone that can be folded into thirds, but the business news site noted that patents often change, and some are scrapped altogether.

The Galaxy Note 9 is also likely to be unveiled soon, as is a $300 Samsung speaker that's set to rival the Apple HomePod.

The Galaxy X will certainly be a nifty new invention, but it won’t come cheap. The Wall Street Journal reports the phone will set you back about $1500, which is around $540 more than Samsung’s current most expensive offering, the Galaxy Note 8.

Why a Readily Available Used Paperback Is Selling for Thousands of Dollars on Amazon

At first glance, getting hold of a copy of One Snowy Knight, a historical romance novel by Deborah MacGillivray, isn't hard at all. You can get the book, which originally came out in 2009, for a few bucks on Amazon. And yet according to one seller, a used copy of the book is worth more than $2600. Why? As The New York Times reports, this price disparity has more to do with the marketing techniques of Amazon's third-party sellers than it does with the market value of the book.

As of June 5, a copy of One Snowy Knight was listed by a third-party seller on Amazon for $2630.52. By the time the Times wrote about it on July 15, the price had jumped to $2800. That listing has since disappeared, but a seller called Supersonic Truck still has a used copy available for $1558.33 (plus shipping!). And it's not even a rare book: it was reprinted in July.

The Times found similar listings for secondhand books that cost hundreds if not thousands of dollars more than their market price. Those retailers might not even have the book on hand—but if someone is crazy enough to pay $1500 for a mass-market paperback that sells for only a few dollars elsewhere, that retailer can make a killing by simply snapping it up from somewhere else and passing it on to the chump who placed an order with them.

Not all the prices for used books on Amazon are so exorbitant, but many still defy conventional economic wisdom, with sellers offering used copies of books that are cheaper to buy new. You can get a new copy of the latest edition of One Snowy Knight for $16.99 from Amazon with Prime shipping, but there are third-party sellers asking $24 to $28 for used copies. If you're not careful, how much you pay can depend simply on which listing you click first, on the assumption that used-book prices don't vary much. In the case of One Snowy Knight, there are different listings for different editions of the book, so you might not realize that there's a cheaper version available elsewhere on the site.

An Amazon product listing offers a mass-market paperback book for $1558.33.
Screenshot, Amazon

Even looking at reviews might not help you find the best listing for your money. People tend to buy products with the most reviews, rather than the best reviews, according to recent research, but the site is notorious for retailers gaming the system with fraudulent reviews to attract more buyers and make their way up the Amazon rankings. (There are now several services that will help you suss out whether the reviews on a product you're looking at are legitimate.)

For more on how Amazon's marketplace works—and why its listings can sometimes be misleading—we recommend listening to this episode of the podcast Reply All, which has a fascinating dive into the site's third-party seller system.

[h/t The New York Times]


