Original image
iStock / Public Domain

The Surprisingly Devious History of CAPTCHA

Life in the Information Age changes so fast and so frequently that we often don’t even notice. Take, for example, the CAPTCHA system of internet user authentication, which became ubiquitous, then kind of sinister, then began to fade away.

The word CAPTCHA is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.” The original system was developed in the early 2000s by engineers at Carnegie Mellon University. The team, led by Luis von Ahn (who calls himself "Big Lou"), wanted to find a way to filter out the overwhelming armies of spambots pretending to be people. 

They devised a program that would display some form of garbled, warped, or otherwise distorted text that a computer couldn’t possibly read, but a human could make out. All a user had to do was type the text in a box, and access was theirs.
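
The challenge-and-response idea described above can be sketched in a few lines of Python. This is only an illustration, not the Carnegie Mellon team's code: the names `new_challenge` and `is_correct` are made up here, the alphabet and length are assumptions, and the step that actually warps the text into a hard-to-read image is left out.

```python
import secrets
import string

# Assumed alphabet and length; real CAPTCHAs varied widely.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length: int = 6) -> str:
    """Pick the random text that would be distorted and shown as an image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_correct(expected: str, typed: str) -> bool:
    """Grant access only if the typed text matches, ignoring case and stray spaces."""
    return typed.strip().upper() == expected.upper()
```

A site would call `new_challenge()` when it renders the form, keep the answer on the server, and call `is_correct()` on whatever the visitor types.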

The program was wildly successful. CAPTCHA became a ubiquitous tool and an accepted part of the internet user experience. 

Unfortunately, the designers overlooked one very human trait: a need to get paid. Before too long, spam-sponsored CAPTCHA farms were popping up all over the internet, especially in poor countries, offering workers money to solve CAPTCHA boxes by the thousands.   

Even with these spam farms, CAPTCHA was a solid product. But the engineers weren’t satisfied. Millions of people were voluntarily translating nonsensical images into text, which seemed, to von Ahn, like a waste of perfectly good free labor. 

Speaking to The New York Times in 2011, von Ahn remembered thinking, “Can we do something useful with this time?”

After some more tinkering, reCAPTCHA was born and implemented on sites all over the internet. The general user experience was pretty much the same: type the letters and numbers you see onscreen. But rather than randomized words, reCAPTCHA asked users to transcribe images of real words and numbers taken from archival texts. Computers are pretty good at reading old documents, but smeary ink and damaged paper can make some words hard for software to read. Fortunately for von Ahn, humans can still read those words just fine.

They started with the archives of The New York Times, then sold the technology to Google, which began using it to transcribe old books. That’s right: you have likely worked for free for Google and The New York Times. Those grainy images of old-timey text are real words from real pages.

Von Ahn was pleased with the new version and confident that reCAPTCHA was here to stay. “We’ll be going for a long time,” he told the Times. “There’s a lot of printed material out there.”

But, as we said, this is the Information Age. Most of the programs and online behaviors that we take for granted today will be extinct in a few years, and the CAPTCHA dynasty is no exception.

In 2014, a Google analysis found that artificial intelligence could crack even the most complex CAPTCHA and reCAPTCHA images with 99.8 percent accuracy, rendering the programs useless as security devices. 

In their place, Google unveiled the now-familiar “No CAPTCHA reCAPTCHA” system, which relies not on a user’s ability to decipher text, but on their online behavior prior to the security checkpoint. While a user is on a page, an invisible algorithm is monitoring how they interact with the content to determine if they’re human or robot.

Then, at the checkpoint itself, users are asked to confirm a single statement: “I am not a robot.” 

If the program believes you’re a human, all you have to do is check the box and move on. If you’re suspected of spambot tendencies, checking the box will open up a new challenge, like identifying all the kittens in a photo array.
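
The checkpoint logic described above amounts to a simple branch. The sketch below is a guess at the shape of that decision, not Google's implementation: how behavior is actually scored is not public, so `behavior_score`, `HUMAN_THRESHOLD`, and the `checkpoint` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real scoring and threshold are not public.
HUMAN_THRESHOLD = 0.7


@dataclass
class Visitor:
    # Assumed scale: 0.0 looks like a bot, 1.0 looks like a human.
    behavior_score: float


def checkpoint(visitor: Visitor, box_checked: bool) -> str:
    """Decide what happens at the "I am not a robot" checkbox."""
    if not box_checked:
        return "waiting"
    if visitor.behavior_score >= HUMAN_THRESHOLD:
        return "pass"
    # Suspicious visitors get a follow-up task, such as picking out kittens.
    return "image_challenge"
```

The point of the design is that most people only ever see the first two branches; the extra puzzle is reserved for visitors whose earlier behavior looked automated.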

The arms race between internet security experts and spambots may never end. In time, No CAPTCHA reCAPTCHA will be outsmarted, then replaced. And when that happens, pay attention.

Original image
iStock // Ekaterina Minaeva
Man Buys Two Metric Tons of LEGO Bricks; Sorts Them Via Machine Learning

Jacques Mattheij made a small, but awesome, mistake. He went on eBay one evening and bid on a bunch of bulk LEGO brick auctions, then went to sleep. Upon waking, he discovered that he was the high bidder on many, and was now the proud owner of two tons of LEGO bricks. (This is about 4400 pounds.) He wrote, "[L]esson 1: if you win almost all bids you are bidding too high."

Mattheij had noticed that bulk, unsorted bricks sell for something like €10/kg, whereas sets are roughly €40/kg and rare parts go for up to €100/kg. Much of the value of the bricks is in their sorting. If he could reduce the entropy of these bins of unsorted bricks, he could make a tidy profit. While many people do this work by hand, the problem is enormous, which makes it just the kind of challenge suited to a computer. Mattheij writes:

"There are 38000+ shapes and there are 100+ possible shades of color (you can roughly tell how old someone is by asking them what lego colors they remember from their youth)."
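
To put the price gap mentioned above in rough numbers, here is a back-of-the-envelope calculation in Python. It uses only the approximate per-kilogram prices quoted in this article and assumes, unrealistically, that an entire two-ton haul could be sold at a single price; the `value_eur` helper is just for illustration.

```python
LB_PER_KG = 2.20462  # pounds per kilogram

# Approximate prices quoted in the article, in euros per kilogram.
PRICE_EUR_PER_KG = {"unsorted": 10, "sets": 40, "rare": 100}


def value_eur(mass_kg: float, category: str) -> float:
    """Value of a pile of bricks if all of it sold at one category's price."""
    return mass_kg * PRICE_EUR_PER_KG[category]


haul_kg = 2 * 1000  # two metric tons

print(round(haul_kg * LB_PER_KG))      # weight in pounds
print(value_eur(haul_kg, "unsorted"))  # value if resold as-is
print(value_eur(haul_kg, "sets"))      # ceiling if it all fetched the set price
```

Under those assumptions, two tons would be worth about €20,000 unsorted and no more than €80,000 at the set price, which is why the sorting itself is where the money is.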

In the following months, Mattheij built a proof-of-concept sorting system using, of course, LEGO. He broke the problem down into a series of sub-problems (including "feeding LEGO reliably from a hopper is surprisingly hard," one of those facts of nature that will stymie even the best system design). After tinkering with the prototype at length, he expanded it into a surprisingly complex system of conveyer belts (powered by a home treadmill), various pieces of cabinetry, and "copious quantities of crazy glue."

A video accompanying the original article shows the current system running at low speed.

The key part of the system is running the bricks past a camera paired with a computer running a neural net-based image classifier. That allows the computer (when sufficiently trained on brick images) to recognize bricks and thus categorize them by color, shape, or other parameters. Remember that as bricks pass by, they can be in any orientation, can be dirty, can even be stuck to other pieces. So having a flexible software system is key to recognizing, in a fraction of a second, what a given brick is, in order to sort it out. When a match is found, a jet of compressed air pops the piece off the conveyer belt and into a waiting bin.

After much experimentation, Mattheij rewrote the software (several times in fact) to accomplish a variety of basic tasks. At its core, the system takes images from a webcam and feeds them to a neural network to do the classification. Of course, the neural net needs to be "trained" by showing it lots of images, and telling it what those images represent. Mattheij's breakthrough was allowing the machine to effectively train itself, with guidance: Running pieces through allows the system to take its own photos, make a guess, and build on that guess. As long as Mattheij corrects the incorrect guesses, he ends up with a decent (and self-reinforcing) corpus of training data. As the machine continues running, it can rack up more training, allowing it to recognize a broad variety of pieces on the fly.
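
The guess-then-correct loop described above might look something like the following. This is a sketch of the general idea rather than Mattheij's software: the `guess` and `review` callables stand in for the neural network and the human checker, and images are represented as plain strings.

```python
from typing import Callable, List, Tuple

Image = str  # stand-in for real image data
Label = str


def collect_training_data(
    images: List[Image],
    guess: Callable[[Image], Label],
    review: Callable[[Image, Label], Label],
) -> Tuple[List[Tuple[Image, Label]], int]:
    """Let the machine label each image, then let a human fix the mistakes."""
    corpus = []
    corrections = 0
    for image in images:
        predicted = guess(image)
        confirmed = review(image, predicted)  # human keeps or overrides the guess
        if confirmed != predicted:
            corrections += 1
        corpus.append((image, confirmed))
    return corpus, corrections
```

Every pass adds labeled examples whether the guess was right or wrong, so the human only has to spend effort on the misses, and the number of corrections should shrink as the classifier is retrained on the growing corpus.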

A second video focuses on how the pieces move on conveyer belts (running at slow speed so puny humans can follow) and shows the air jets in action.

In an email interview, Mattheij told Mental Floss that the system currently sorts LEGO bricks into more than 50 categories. It can also be run in a color-sorting mode to bin the parts across 12 color groups. (Thus at present you'd likely do a two-pass sort on the bricks: once for shape, then a separate pass for color.) He continues to refine the system, with a focus on making its recognition abilities faster. At some point down the line, he plans to make the software portion open source. You're on your own as far as building conveyer belts, bins, and so forth.
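
The two-pass approach mentioned above is easy to picture in code. In this sketch each part is a small dictionary with `shape` and `color` keys, which is an assumption made for illustration; the real machine works from camera images, and `sort_pass` and `two_pass_sort` are hypothetical names.

```python
from collections import defaultdict


def sort_pass(parts, key):
    """One trip down the conveyer belt: drop each part into the bin for its key."""
    bins = defaultdict(list)
    for part in parts:
        bins[key(part)].append(part)
    return dict(bins)


def two_pass_sort(parts):
    """Sort by shape first, then run each shape bin through again by color."""
    by_shape = sort_pass(parts, key=lambda p: p["shape"])
    return {
        shape: sort_pass(group, key=lambda p: p["color"])
        for shape, group in by_shape.items()
    }
```

Running the second pass on each shape bin separately keeps the number of physical bins needed at any one time down to the larger of the two category counts, rather than their product.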

Check out Mattheij's writeup in two parts for more information. It starts with an overview of the story, followed up with a deep dive on the software. He's also tweeting about the project (among other things). And if you look around a bit, you'll find bulk LEGO brick auctions online—it's definitely a thing!

Original image
© Nintendo
Nintendo Will Release an $80 Mini SNES in September

Retro gamers rejoice: Nintendo just announced that it will be launching a revamped version of its beloved Super Nintendo Entertainment System console, which will allow kids and grown-ups alike to play classic 16-bit games in high definition.

The new SNES Classic Edition, a miniature version of the original console, comes with an HDMI cable to make it compatible with modern televisions. It also comes pre-loaded with a roster of 21 games, including Super Mario Kart, The Legend of Zelda: A Link to the Past, Donkey Kong Country, and Star Fox 2, an unreleased sequel to the 1993 original.

“While many people from around the world consider the Super NES to be one of the greatest video game systems ever made, many of our younger fans never had a chance to play it,” Doug Bowser, Nintendo's senior vice president of sales and marketing, said in a statement. “With the Super NES Classic Edition, new fans will be introduced to some of the best Nintendo games of all time, while longtime fans can relive some of their favorite retro classics with family and friends.”

The SNES Classic Edition will go on sale on September 29 and retail for $79.99. Nintendo reportedly only plans to manufacture the console “until the end of calendar year 2017,” which means that the competition to get your hands on one will likely be stiff, as anyone who tried to purchase an NES Classic last year will well remember.

In November 2016, Nintendo released a miniature version of its original NES system, which sold out pretty much instantly. After selling 2.3 million units, Nintendo discontinued the NES Classic in April. In a statement to Polygon, the company has pledged to “produce significantly more units of Super NES Classic Edition than we did of NES Classic Edition.”

Nintendo has not yet released information about where gamers will be able to buy the new console, but you may want to start planning to get in line soon.