A Crash Course in Wikipedia Vandalism
Reader Johnny Cat wrote in to ask about which Wikipedia entries have the highest incidences of false information in them. "I'm aware that almost everything there, from Applebee's to Zorro, has errors every day," he wrote, "but something in my gut tells me there are certain topics that just attract bad submitters."
Johnny Cat, and you, will probably be as surprised as I was scrolling down Wikipedia's List of Most Vandalized Pages, because there doesn't seem to be any method to the madness of wiki vandalism as far as subject matter is concerned. Among the victims of "exceptionally high vandalism" are the entries for Jack London, baseball, Halo 2, Harry Potter, piano, home improvement and buttocks. The commonality among some of the most vandalized entries seems to be that they cover recent major news events, topics that are currently, or have been, subjects of controversy, or subjects that are simply popular and often read.
Back up. What is wiki vandalism in the first place?
Wikipedia defines it as any "addition, removal, or change of content made in a deliberate attempt to compromise the integrity of Wikipedia," which can come in a variety of flavors, such as...
Blanking: Removing all or significant parts of a page's content without any reason, or replacing entire pages with nonsense.
Page creation: Creating new pages with the intent of malicious behavior, like blatant advertising pages, personal attack pages and hoaxes.
Page lengthening: Adding large amounts of bad-faith content in order to make the page's load time abnormally long, or even to make the page impossible to load without crashing the browser.
Spam: Adding external links to non-notable or irrelevant sites or sites that have some relationship to the subject matter, but advertise or promote in the user's interest.
Silly vandalism: Adding profanity, graffiti, random characters or other nonsense to entries or creating nonsensical and non-encyclopedic pages.
Image vandalism: Uploading shock images, inappropriately placing explicit images on pages, or using images in other disruptive ways.
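For the programmers in the audience, a few of these flavors can be roughly spotted just by comparing a page before and after an edit. The sketch below is only an illustration and not how Wikipedia's own tools work: the function name classify_edit and every threshold in it are assumptions made up for this example.

```python
import re

# Runs of four or more identical letters or punctuation marks, e.g. "!!!!".
# The pattern and the minimum run length are assumptions for illustration.
REPEATED_CHARS = re.compile(r"([A-Za-z!?])\1{3,}")


def classify_edit(old_text, new_text,
                  blank_ratio=0.2, growth_factor=10, min_huge_size=100_000):
    """Return a list of rough vandalism labels for one edit.

    All thresholds are hypothetical defaults, not values used by Wikipedia.
    """
    labels = []
    old_len, new_len = len(old_text), len(new_text)

    # Blanking: the page shrank to a small fraction of its former size.
    if old_len > 0 and new_len < blank_ratio * old_len:
        labels.append("blanking")

    # Page lengthening: the page became both much larger and very large.
    if new_len > growth_factor * max(old_len, 1) and new_len > min_huge_size:
        labels.append("page lengthening")

    # Silly vandalism: the edit introduced new runs of repeated characters.
    old_runs = len(REPEATED_CHARS.findall(old_text))
    new_runs = len(REPEATED_CHARS.findall(new_text))
    if new_runs > old_runs:
        labels.append("silly vandalism")

    return labels


if __name__ == "__main__":
    page = "The piano is a keyboard instrument. " * 50
    print(classify_edit(page, ""))                 # page wiped out
    print(classify_edit(page, page + " POOP!!!!")) # graffiti appended
```

Size and pattern checks like these only catch the loud stuff. Spam, hoaxes and subtle factual changes would sail right past them, which is why human editors still do much of the work.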
Once the damage is done, how long does it take to fix?
In the interest of science, Wikipedia user Colonel Chaos vandalized featured articles, the entries that are considered the cream of the Wikipedia crop. Since Wikipedia employs software created to help find easy-to-spot vandalism (like "Your mom!" or "POOP!!!!"), the Colonel engaged in slightly more complex vandalism of three types: Complete Nonsense, where passages of completely irrelevant prose were inserted into articles; Grave Factual Inaccuracy, where material was changed or inserted in such a way that it would be obvious to the average reader or editor of Wikipedia that the material was untrue (e.g., that Martin Sheen discovered hydrochloric acid by mixing potatoes with salt and invented Agent Orange for the purpose of dissolving gold); and Factual Inaccuracy, where articles were changed slightly so that a reader would need some knowledge of the topic in order to spot the inaccuracy (e.g., the article on Norman Borlaug was changed from "Between 1965 and 1970, wheat yields nearly doubled in Pakistan and India" to "Between 1968 and 1975, wheat yields nearly tripled in Pakistan and India").
The average response times for these changes were 11.5 hours for Complete Nonsense, 9.25 hours for Grave Inaccuracy and 57.4 minutes for Factual Inaccuracy. Colonel Chaos notes that for featured articles, which rotate on Wikipedia's main page and are heavily viewed, a reversion time of 10 minutes would be more appropriate.
Here are some highlights from the study:
Is there any way to stop this madness?
Well, there was the plan to simply let vandals run amok on the entry for chickens. By sacrificing this article—"Dudes already know about chickens. Ladies also already know about chickens. Does an encyclopedia really need an article about nature's tastiest birds?"—it was hoped that the rest of Wikipedia would be spared. The plan, like the bird, never really got off the ground.
Then there's WikiScanner, created by Daniel Erenrich and Virgil Griffith, which allows users to trace the source of anonymous edits to Wikipedia entries. It uses the IP address of the anonymous user (which Wikipedia logs) to identify the owner of the computer network from which the edits were made. In the past, the tool has exposed insiders at Diebold Election Systems, Exxon and the CIA covertly deleting or changing information that was unflattering to their organizations. If you can't stop a vandal, you can at least pull back the curtain of anonymity.
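The core idea of matching an IP address to a network owner is simple enough to sketch in a few lines. What follows is a minimal illustration of that idea, not WikiScanner's actual code: the ownership table, the organization names and the functions owner_of and attribute_edits are all hypothetical, and the address ranges are ones reserved for documentation rather than real networks.

```python
import ipaddress

# Hypothetical table mapping network ranges to their registered owners.
# These ranges are reserved for documentation and belong to no real group.
NETWORK_OWNERS = {
    "192.0.2.0/24": "Example Corp",
    "198.51.100.0/24": "Example Agency",
}


def owner_of(ip):
    """Return the owner of the network containing ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for cidr, owner in NETWORK_OWNERS.items():
        if addr in ipaddress.ip_network(cidr):
            return owner
    return None


def attribute_edits(edits):
    """Group anonymous edits by network owner.

    edits is a list of (page_title, ip_address) pairs, standing in for
    the edit log entries that record an anonymous editor's IP address.
    """
    by_owner = {}
    for page, ip in edits:
        owner = owner_of(ip) or "unknown network"
        by_owner.setdefault(owner, []).append(page)
    return by_owner


if __name__ == "__main__":
    log = [
        ("Voting machine", "192.0.2.77"),
        ("Oil spill", "192.0.2.12"),
        ("Chicken", "203.0.113.5"),
    ]
    for owner, pages in attribute_edits(log).items():
        print(owner, "->", pages)
```

A lookup like this identifies the network an edit came from, not the person at the keyboard, so it can show that an edit originated inside an organization without saying who there made it.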