Report Finds Microsoft Excel Causes Errors in 20 Percent of Genomics Studies

Microsoft Excel, that ubiquitous tool for data crunching, has been playing an unexpected role in the scientific world. The program has been screwing with data in genomics studies. A new report in the journal Genome Biology estimates that around 20 percent of scientific papers published in leading genome-focused journals that include gene lists from Excel contain errors due to the program’s default autocorrect settings, Slate reports.

The problem is that several genes have symbols that look a lot like dates. Excel tends to convert gene symbols like SEPT2 (Septin 2) and MARCH1 (Membrane Associated Ring-CH-Type Finger) into what it thinks is proper date form, turning them into 2-Sep and 1-Mar. In some cases, SEPT2 became “2006/09/02.”

"Inadvertent gene symbol conversion is problematic because these supplementary files are an important resource in the genomics community that are frequently reused," the paper’s authors write. They reviewed the supplementary gene list Excel files from 18 journals, examining studies published between 2005 and 2015—Excel’s gene-typo issue was first reported in 2004—for date formatting within lists of genes. The analysis was performed by a program that flagged supplementary materials that seemed to be lists of genes, then searched them for date formatting. Out of more than 35,000 supplementary files, they confirmed 987 files with gene errors that were published as part of 704 studies.

Overall, 19.6 percent of papers in the 18 journals contained gene name errors caused by Excel’s autocorrect function, but some journals were worse than others. High-impact journals, typically the most respected outlets to publish research in, actually had more affected gene lists, which the researchers speculate may be because studies published in these journals are more likely to have larger and more numerous data sets.
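The screening step the authors describe, searching likely gene lists for date formatting, can be illustrated with a minimal sketch. This is not the researchers' actual program: the pattern, the `find_date_like` helper, and the sample column are assumptions made for illustration, covering only the date forms mentioned in this article.

```python
import re

# Assumed patterns (not from the paper's code): the date forms cited in the
# article, such as "2-Sep", "1-Mar", and "2006/09/02".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|\d{{4}}/\d{{2}}/\d{{2}})$",
    re.IGNORECASE,
)

def find_date_like(values):
    """Return the entries in a gene column that look like converted dates."""
    return [v for v in values if DATE_LIKE.match(v.strip())]

# Hypothetical gene column containing three converted symbols.
column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "2006/09/02", "SEPT2"]
print(find_date_like(column))  # ['2-Sep', '1-Mar', '2006/09/02']
```

An intact symbol such as SEPT2 passes untouched, while anything already rewritten into a date is flagged for a human to check.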

The highest proportion of gene lists with errors (more than 20 percent) came from the journals Nucleic Acids Research, Genome Biology, Nature Genetics, Genome Research, Genes and Development, and Nature; conversely, the journals Molecular Biology and Evolution, Bioinformatics, DNA Research, and Genome Biology and Evolution showed errors in less than 10 percent of genomics papers.

While this isn’t the worst scientific error to end up in a journal, since it’s pretty clear that 2006/09/02 isn’t a gene symbol, it’s also fairly disturbing that this many papers could make it through the editing process without anyone noticing that they contained lists of nonexistent genes.

The researchers highlight Google Sheets as a potential alternative to Excel because it doesn’t suffer from the same symbol-date mix-up, and it seems that when Sheets documents are opened in other programs like Excel, the data is protected from Excel’s default autocorrection. They suggest that journal editors and reviewers look out for these errors by pasting gene name lists into blank files and sorting them so that any mistakenly inserted dates become apparent.
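The sorting check suggested for editors and reviewers can be mimicked outside a spreadsheet. The sketch below is only an illustration: the `sort_for_review` helper and the sample list are hypothetical, and it assumes a converted date begins with a digit while an intact gene symbol begins with a letter.

```python
def sort_for_review(symbols):
    """Sort a gene list so entries that start with a digit come first.

    Assumption for this sketch: converted dates such as "1-Mar" start with
    a digit, while intact gene symbols start with a letter.
    """
    return sorted(symbols, key=lambda s: (not s[:1].isdigit(), s.upper()))

# Hypothetical gene list with two converted entries.
genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "SEPT9", "MARCH2"]
for entry in sort_for_review(genes):
    print(entry)
```

After sorting, the suspicious entries sit together at the top of the list, where a reviewer can spot them at a glance.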

[h/t Slate]

DNA Analysis of Loch Ness Could Reveal the Lake's Hidden Creatures

Stakeouts, sonar studies, and a 24-hour video feed have all been set up in an effort to confirm the existence of the legendary Loch Ness Monster. Now, the Associated Press reports that an international team of scientists will use DNA analysis to learn what's really hiding in the depths of Scotland's most mysterious landmark.

The team, led by Neil Gemmell, who researches evolutionary genetics at the University of Otago in New Zealand, will collect 300 water samples from various locations and depths around the lake. The water is filled with microscopic DNA fragments that animals leave behind as they swim, mate, eat, poop, and die, and if Nessie is a resident, she's sure to leave bits of herself floating around as well.

After extracting the DNA from the organic material found in the water samples, the scientists plan to sequence it. The results will then be compared to the DNA profiles of known species. If there's evidence of an animal that's not normally found in the lake, or an entirely new species, the researchers will hopefully spot it.

Gemmell is a Nessie skeptic, and he says the point of the project isn't necessarily to discover new species. Rather, he wants to create a genetic profile of the lake while generating some buzz around the science behind it.

If the study goes according to plan, the database of Loch Ness's inhabitants should be complete by 2019. And though the results likely won't include a long-extinct plesiosaur, they may offer insights into the invasive species that now call the lake home.

[h/t AP]

Scientists Figure Out Why Roses Don't Smell as Good as They Used To

Roses are red, violets are blue, but they just don't smell like they used to.

A team of 40 international researchers has successfully mapped an heirloom rose's genome and learned where the bud's color and scent come from—and how to tweak those traits to yield a more fragrant flower. Historically, rose breeders have opted for pretty petals over pleasant perfumes, and as a result, the rose's natural scent has faded over time, according to Science News.

The study, published in the journal Nature Genetics, reports that some of the genes of the "Old Blush" pink China rose cancel each other out, "with some turning on to brew a scent component while others shut down manufacture of anthocyanin pigments needed for rosy petals," Science News reports. The researchers also found 22 new biochemical steps in the production of terpenes, the volatile organic compounds key to the rose's perfume. With a better understanding of the complex relationship between color and scent, breeders of roses and other plants could start producing flowers without sacrificing one trait for the other.

"The big challenge is you need to know what to edit," Todd Mockler, a plant researcher who was not involved with the rose study, tells The New York Times. "You can't just randomly start editing. You have to know what to target. The only way to know that is to have a genome sequence."

The rose is most closely related to the strawberry plant, but it also has family ties with the apple and pear. Given that modern roses contain a blend of genes from between eight and 20 different species, mapping the genome was no small feat. It took researchers eight years to complete this study, according to the BBC. And while it's not the first time the rose genome has been mapped, this new analysis is far more comprehensive.

The sunflower has a similarly complex genetic code, and scientists mapped its genome last year, work that should aid future researchers and flower breeders.

[h/t BBC]
