The IRS's Favorite Mathematical Law

When it comes to catching tax cheats, the IRS has more than just federal law on its side. The agency’s arsenal also includes a mathematical truth known as Benford’s law. Armed with this law, the IRS can sniff out falsified returns just by looking at the first digit of numbers on taxpayers’ forms.

While most Americans wouldn’t put it past the IRS to use black magic, the truth behind Benford’s law is far from mystical. In 1938, GE physicist Frank Benford undertook a comprehensive study of numbers and how they occur. His findings mirrored the discovery of American astronomer Simon Newcomb, who had undertaken similar research in 1881. Benford found that when it came to naturally or socially generated data, the distribution of the first digit in a series of numbers is not uniform.

In analyzing more than 20,000 numbers from a variety of sources (numbers from a newspaper, population figures, American League baseball stats), Benford found that a whopping 30 percent of the numbers in his sample had one as their first digit. The numeral two turned up in the first position about 18 percent of the time, and three about 12 percent.

There’s an intuitive explanation for what Benford observed. In the number set 0 to 99, 11 percent of the numbers start with 1, and 11 percent start with each of the digits 2 through 9. In the number set 0 to 199, over half of the numbers start with 1, and less than 6 percent start with each of the digits 2 through 9. In the number set 0 to 299, 37 percent start with 1, 37 percent start with 2, and each of the digits 3 through 9 leads about 3.7 percent of the numbers. The same thing happens at every order of magnitude: as a range grows, the smaller digits are always the first to pick up new numbers. Over a data set that spans many scales, the distribution of leading digits therefore follows a predictable pattern. The bigger the digit, the less likely it is to come first.
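The counting argument above can be checked directly. The short Python sketch below tallies leading digits for the three ranges mentioned; the function name `leading_digit_shares` is made up for this illustration, and it assumes, as the paragraph does, that 0 counts toward the size of each set even though it has no leading digit from 1 to 9.

```python
from collections import Counter


def leading_digit_shares(upper):
    """Percent of the set 0..upper whose first digit is each of 1-9."""
    # Tally the first character of every positive integer in the range.
    counts = Counter(str(n)[0] for n in range(1, upper + 1))
    total = upper + 1  # the set 0..upper contains upper + 1 numbers
    return {d: 100 * counts[str(d)] / total for d in range(1, 10)}


for upper in (99, 199, 299):
    shares = leading_digit_shares(upper)
    print(f"0 to {upper}:", {d: round(s, 1) for d, s in shares.items()})
```

Running it shows the digit 1 taking 11 percent of 0 to 99 and 55.5 percent of 0 to 199, matching the figures in the text.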

How does the IRS use this distribution? Many folks who happen to fudge a bit on their tax returns or expense reports call attention to their creativity by using too many dollar amounts that start with an eight or nine (the least common digits found in the first position) and not enough that start with a one. Savvy CPAs know what to look for, and many computer systems that tabulate figures are also programmed to catch any suspicious strings of numbers.
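To make the idea concrete, here is a minimal Python sketch of that kind of first-digit check. It is not the IRS's actual procedure, which the article does not describe. The expected shares use the standard statement of Benford's law, log10(1 + 1/d) for leading digit d; the helper names `first_digit` and `digit_gaps` and the sample amounts are invented for illustration.

```python
import math
from collections import Counter

# Benford's expected share for each leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """First nonzero digit of a dollar amount, or None if it rounds to zero."""
    # Format to cents, then strip leading zeros and the decimal point.
    text = f"{abs(amount):.2f}".lstrip("0.")
    return int(text[0]) if text else None


def digit_gaps(amounts):
    """Observed share minus Benford share for each leading digit 1-9."""
    digits = [d for d in map(first_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("no nonzero amounts to analyze")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) - BENFORD[d] for d in range(1, 10)}


# Hypothetical expense report that leans heavily on eights and nines.
sample = [912.40, 875.00, 89.99, 940.10, 812.00,
          97.50, 130.25, 860.00, 905.75, 99.00]
gaps = digit_gaps(sample)
for d in range(1, 10):
    print(d, f"{gaps[d]:+.3f}")
```

A large positive gap for 8 or 9 alongside a negative gap for 1 is the pattern described above. A list of ten amounts is far too small to prove anything; a real screen would need many more figures and a proper statistical test.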