Getty Images

5 Stats That Might Mean Less Than You Think

We're living in the golden age of statistics for sports geeks. If you want to estimate how many wins Wilt Chamberlain accounted for in the 1963-1964 season, you can do that. If you want to know whether people with the initial K strike out more often, you can do that, too. Many number crunchers believe the old stats we grew up with aren't actually very helpful. Like these:

1. RBI

Because it's such a team sport, baseball is chock-full of metrics that give, at best, an incomplete picture of a player's performance. Take Runs Batted In, still one of the chief offensive statistics. Sure, the all-time RBI list is a who's who of the best hitters (Hank Aaron, Babe Ruth and Barry Bonds are the top three), but critics have long argued that it's a poor way to judge how good an individual hitter is. In The Sabermetric Manifesto, David Grabiner wrote that RBIs are not "meaningless, only incomplete" because they don't measure a batter's full offensive production. Outside of a solo home run, the only way to record an RBI is to have someone on base ahead of you, so even if you put, say, Hank Aaron onto the Miami Marlins, he just wouldn't get many chances.

2. Pitcher Wins

But the RBI debate pales in comparison to some of baseball's long-hated pitching stats. When Felix Hernandez won the Cy Young award in 2010 despite recording just 13 wins (by contrast, last year's two winners each had 20 wins), it was hailed as a victory for the stat geeks. Among his achievements, Hernandez led the American League in ERA, quality starts, and fewest hits per nine innings, and was second in strikeouts and in walks plus hits per inning pitched (WHIP). In short, he excelled at just about everything except wins, which most saw as a reflection of the putrid Seattle Mariners team he pitched for. A great pitcher on a bad team might look unimpressive in the win column simply because his team doesn't score.

3. Points Per 48 Minutes

Of course, the statistical noise goes well beyond baseball. Take basketball, where minutes played is still an oft-tallied and much-discussed statistic, often used to project what a player's scoring would be over a full 48-minute game. But as Hall of Famer Charles Barkley has said, the only reason to consider what somebody would have done in 48 minutes is that they weren't good enough to play all 48 in the first place.
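For readers who want to see the arithmetic, here is a minimal sketch of a per-48-minute projection. The function name and the two players' numbers are made up for illustration; they are not real box scores. It shows Barkley's point in miniature: a starter and a little-used reserve can end up with the same projected figure.

```python
def per_48(points: float, minutes: float) -> float:
    """Scale a player's scoring to a full 48-minute game."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return points * 48 / minutes


# Hypothetical players: same scoring rate, very different workloads.
starter = per_48(points=24, minutes=36)
reserve = per_48(points=8, minutes=12)

print(f"starter: {starter:.1f} points per 48")
print(f"reserve: {reserve:.1f} points per 48")
```

Both players project to the same number, even though only one of them actually stayed on the floor for three quarters of the game.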

4. Passer Rating

Even football, with its multimillion-dollar fantasy business based entirely on statistics, has struggled to come up with a set of advanced metrics of its own. That's highlighted by the ultra-confusing quarterback passer rating, measured on a scale from 0 to 158.3. Even the NFL has admitted the stat only rates passers, not total quarterbacks. Passer rating doesn't fully account for rushing plays, the offense the QB plays in, or his overall record, and the stats "do not reflect leadership, play-calling and other intangible factors," according to the NFL's own site. And while passer rating has its defenders (Sports Illustrated's Kerry Byrne has pointed out that it does correlate closely with winning percentage), ESPN has worked to replace it with its Total QBR stat, which the network says incorporates the context of each play to better account for the quarterback's contribution. But Total QBR has its own critics, who call it too confusing and say it doesn't weigh scores based on the situation.
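The odd 158.3 ceiling makes more sense once the arithmetic is laid out. The article doesn't spell out the formula, so what follows is a sketch of the NFL passer rating as it is commonly published: four components (completion rate, yards per attempt, touchdown rate, interception rate), each capped between 0 and 2.375, then averaged and scaled. The function and parameter names are our own.

```python
def passer_rating(completions: int, attempts: int, yards: int,
                  touchdowns: int, interceptions: int) -> float:
    """NFL-style passer rating, following the commonly published formula."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(value: float) -> float:
        # Each component is limited to the range 0 to 2.375.
        return max(0.0, min(value, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)       # completion rate
    b = clamp((yards / attempts - 3) * 0.25)            # yards per attempt
    c = clamp(touchdowns / attempts * 20)               # touchdown rate
    d = clamp(2.375 - interceptions / attempts * 25)    # interception rate

    return (a + b + c + d) / 6 * 100


# A stat line that maxes out all four components.
perfect = passer_rating(completions=20, attempts=20, yards=300,
                        touchdowns=5, interceptions=0)
print(round(perfect, 1))
```

Four components capped at 2.375 each, divided by 6 and multiplied by 100, top out at about 158.3. Note that nothing in the inputs covers rushing, sacks, or game situation, which is the gap the critics point to.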

5. Time of Possession

Even one of the most valued NFL stats, time of possession, is being challenged. The argument went that the team that had the ball longest was dominating on offense and "controlling the ball." But that's given way recently to a number with even more bearing on the game: points scored. New Philadelphia Eagles coach Chip Kelly, who prides himself on his fast-paced offense, recently took aim at the traditional time of possession figures, saying it was really "how much time can the other team waste?"

"Most games, we lose the time of possession, but it's how many snaps do you face?" Kelly said. "And I think in both [preseason] games we've played, we've played more snaps than our other team."

Across all sports, there are traditional metrics that may not ultimately carry much value. Fans have questioned hockey's shots on goal numbers, pointing out that shots that don't go in still don't count for anything. People talk about serve speed in tennis, but that ultimately doesn't reflect how a player reacts or plays on the surface. Golf's "Driving Accuracy Percentage" is supposed to measure the percentage of tee shots that land in the fairway, but doesn't really measure how errant a missed shot is or how it impacts performance.

But with so much attention on slicing and dicing every shot of every game, there's a good chance that every questionable stat will be tweaked and refined and replaced with something else. That is, until something more advanced comes along to replace them.

Game Changers: Real. Sports. Data.

A recurring web series on sports and big data, featuring industry experts and social commentary. For more information, visit

iStock // Ekaterina Minaeva
Man Buys Two Metric Tons of LEGO Bricks; Sorts Them Via Machine Learning
May 21, 2017

Jacques Mattheij made a small, but awesome, mistake. He went on eBay one evening and bid on a bunch of bulk LEGO brick auctions, then went to sleep. Upon waking, he discovered that he was the high bidder on many, and was now the proud owner of two tons of LEGO bricks. (This is about 4400 pounds.) He wrote, "[L]esson 1: if you win almost all bids you are bidding too high."

Mattheij had noticed that bulk, unsorted bricks sell for something like €10/kilogram, whereas sets are roughly €40/kg and rare parts go for up to €100/kg. Much of the value of the bricks is in their sorting. If he could reduce the entropy of these bins of unsorted bricks, he could make a tidy profit. While many people do this work by hand, the problem is enormous—just the kind of challenge for a computer. Mattheij writes:

There are 38000+ shapes and there are 100+ possible shades of color (you can roughly tell how old someone is by asking them what lego colors they remember from their youth).

In the following months, Mattheij built a proof-of-concept sorting system using, of course, LEGO. He broke the problem down into a series of sub-problems (including "feeding LEGO reliably from a hopper is surprisingly hard," one of those facts of nature that will stymie even the best system design). After tinkering with the prototype at length, he expanded it into a surprisingly complex arrangement of conveyer belts (powered by a home treadmill), various pieces of cabinetry, and "copious quantities of crazy glue."

Here's a video showing the current system running at low speed:

The key part of the system was running the bricks past a camera paired with a computer running a neural net-based image classifier. That allows the computer (when sufficiently trained on brick images) to recognize bricks and thus categorize them by color, shape, or other parameters. Remember that as bricks pass by, they can be in any orientation, can be dirty, can even be stuck to other pieces. So having a flexible software system is key to recognizing—in a fraction of a second—what a given brick is, in order to sort it out. When a match is found, a jet of compressed air pops the piece off the conveyer belt and into a waiting bin.

After much experimentation, Mattheij rewrote the software (several times in fact) to accomplish a variety of basic tasks. At its core, the system takes images from a webcam and feeds them to a neural network to do the classification. Of course, the neural net needs to be "trained" by showing it lots of images, and telling it what those images represent. Mattheij's breakthrough was allowing the machine to effectively train itself, with guidance: Running pieces through allows the system to take its own photos, make a guess, and build on that guess. As long as Mattheij corrects the incorrect guesses, he ends up with a decent (and self-reinforcing) corpus of training data. As the machine continues running, it can rack up more training, allowing it to recognize a broad variety of pieces on the fly.
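The guess-then-correct loop described above can be sketched in a few lines. To be clear, this is not Mattheij's code: a toy nearest-centroid classifier stands in for his neural network, two-number tuples stand in for webcam images, and every name here (`NearestCentroid`, `run_batch`, `ask_human`, the part labels) is hypothetical. The point is only the shape of the loop: the machine guesses, a human fixes the wrong guesses, and every labeled piece joins the training data.

```python
from collections import defaultdict


class NearestCentroid:
    """Toy stand-in for the neural net: pick the label with the closest mean."""

    def __init__(self):
        self.examples = defaultdict(list)  # label -> list of feature tuples

    def add(self, features, label):
        self.examples[label].append(features)

    def predict(self, features):
        if not self.examples:
            return None  # nothing learned yet, so no guess

        def squared_distance(label):
            rows = self.examples[label]
            centroid = [sum(col) / len(rows) for col in zip(*rows)]
            return sum((f - c) ** 2 for f, c in zip(features, centroid))

        return min(self.examples, key=squared_distance)


def run_batch(model, pieces, ask_human):
    """Guess each piece, let a human correct it, and keep the labeled example."""
    corrections = 0
    for features in pieces:
        guess = model.predict(features)
        label = ask_human(features, guess)  # returns the guess if it was right
        if label != guess:
            corrections += 1
        model.add(features, label)
    return corrections


# Hypothetical features (length, width in studs) and a scripted "human".
truth = {(2, 4): "brick_2x4", (1, 2): "plate_1x2"}
ask_human = lambda features, guess: truth[features]

model = NearestCentroid()
first_pass = run_batch(model, [(2, 4), (1, 2)], ask_human)
second_pass = run_batch(model, [(2, 4), (1, 2), (2, 4)], ask_human)
print(first_pass, second_pass)
```

In this toy run the human has to correct both pieces on the first pass and none on the second, which is the self-reinforcing effect the article describes, just at a vastly smaller scale.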

Here's another video, focusing on how the pieces move on conveyer belts (running at slow speed so puny humans can follow). You can also see the air jets in action:

In an email interview, Mattheij told Mental Floss that the system currently sorts LEGO bricks into more than 50 categories. It can also be run in a color-sorting mode to bin the parts across 12 color groups. (Thus at present you'd likely do a two-pass sort on the bricks: once for shape, then a separate pass for color.) He continues to refine the system, with a focus on making its recognition abilities faster. At some point down the line, he plans to make the software portion open source. You're on your own as far as building conveyer belts, bins, and so forth.

Check out Mattheij's writeup in two parts for more information. It starts with an overview of the story, followed up with a deep dive on the software. He's also tweeting about the project (among other things). And if you look around a bit, you'll find bulk LEGO brick auctions online—it's definitely a thing!
