Errors recording the sex of human tissue samples, or cell contamination, may explain why nearly half of RNA data sets studied don’t seem to match the sex noted in the data set.

© BSIP SA/Alamy Stock Photo

Sex problems? Researchers find ‘widespread’ mislabeling of the sex of human samples

What if scientists don’t really know what’s in their vials and lab dishes? A research team has analyzed dozens of data sets from human genomics studies and found that nearly half of them have a sexual identity problem: they’re labeled as coming from a male but the data suggest they must be from a female, or vice versa. These mix-ups, likely due to accidental mislabeling of the data at some point, but possibly also from cell contamination in the original samples, could have untold effects on the validity of comparisons in genomics experiments conducted worldwide, according to the group, which last week posted its findings on bioRxiv, a site for preprints that have not yet been formally peer reviewed.

The disputed data sets describe a tissue’s transcriptome—the array of messenger RNAs (mRNAs) produced when genes in cells turn on to manufacture a protein. Although much work has been done in recent years to reduce errors in studies of RNA transcriptomes, computational biologist Lilah Toker and her colleagues at the University of British Columbia, Vancouver, in Canada, kept noticing errors in how samples were labeled after they performed routine quality checks of data sets. “At some point we were wondering if this is just because we are doing so much data analysis, or is it actually something much more widespread,” Toker says.

Toker and her colleagues then examined the transcriptomes from 70 publicly available data sets for human tissue samples, trying to corroborate the sex of the tissues by looking for mRNAs from male- or female-specific genes. They found discrepancies between the labeled sex and the mRNA results in 32 out of the 70 data sets.
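For readers curious what such a check might look like in practice, here is a minimal sketch in Python. It is not the group's actual pipeline. The marker genes (XIST, expressed mainly in female cells, and the Y-linked RPS4Y1 and KDM5D), the `margin` cutoff, the function names, and the sample values are all assumptions chosen for illustration.

```python
# Illustrative marker genes (an assumption, not taken from the article):
# XIST is expressed mainly in female cells; RPS4Y1 and KDM5D sit on the Y chromosome.
FEMALE_MARKERS = ("XIST",)
MALE_MARKERS = ("RPS4Y1", "KDM5D")


def mean_expression(expression, genes):
    """Average expression over the marker genes present in the sample."""
    values = [expression[g] for g in genes if g in expression]
    return sum(values) / len(values) if values else None


def infer_sex(expression, margin=1.0):
    """Return 'female', 'male', or 'ambiguous' from log-scale expression values.

    `margin` is a hypothetical cutoff: the two marker groups must differ
    by at least this much before a call is made.
    """
    female = mean_expression(expression, FEMALE_MARKERS)
    male = mean_expression(expression, MALE_MARKERS)
    if female is None or male is None:
        return "ambiguous"
    if female - male >= margin:
        return "female"
    if male - female >= margin:
        return "male"
    return "ambiguous"


def find_mismatches(samples):
    """List sample IDs whose labeled sex contradicts the inferred sex."""
    mismatched = []
    for sample_id, (labeled_sex, expression) in samples.items():
        inferred = infer_sex(expression)
        if inferred != "ambiguous" and inferred != labeled_sex:
            mismatched.append(sample_id)
    return mismatched


# Made-up values for illustration only.
samples = {
    "s1": ("female", {"XIST": 9.5, "RPS4Y1": 2.1, "KDM5D": 1.8}),
    "s2": ("male", {"XIST": 9.1, "RPS4Y1": 2.0, "KDM5D": 2.2}),  # label disagrees
    "s3": ("male", {"XIST": 2.3, "RPS4Y1": 9.0, "KDM5D": 8.4}),
}
print(find_mismatches(samples))  # ['s2']
```

Samples in which neither marker group clearly dominates are reported as ambiguous rather than flagged, since a weak signal could reflect contamination or low-quality data rather than a labeling error.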

Amanda Capes-Davis, based in Sydney, Australia, and chair of the International Cell Line Authentication Committee, agrees that mislabeling of cell samples in transcriptomic studies is a serious issue, particularly when using so-called cell lines, which have been grown in lab dishes for extended periods, from years to decades. However, she questions the accuracy of using sex-specific genetic tests as an authentication tool because many male cell lines lose their Y chromosomes while being cultured, which could obscure their true origin. A Nature study published last year, she notes, looked at nearly 2000 cell lines and found that hundreds originally labeled as male appeared instead to be female based on sex-specific gene tests, whereas just 10 labeled as female appeared to be male. The contrast is so large, says Capes-Davis, that Y chromosome loss is the most likely explanation.

Nonetheless, too many labs do not perform basic authentication checks on their cells before experimenting with them, Capes-Davis says. Toker and her colleagues argue that the simple sex identity genetic tests they conducted should become standard in any lab. “I hope people would just pay more attention to what they’re analyzing,” Toker concludes.