Privacy takes a hit in genetic databases


It has often been said, at least on this page, that the big data future will depend a great deal on how well we protect privacy today. Well, privacy, and by association big data, took a big hit in the world of genetics this week, as another potentially serious loophole was identified. It puts at risk the identities of people who contribute their DNA sequences to research projects.

In reaction to a new study, the United States National Institute of General Medical Sciences, part of the National Institutes of Health, removed some data from public view. Nature magazine reported on the study, published in Science this week by Yaniv Erlich, a human geneticist at the Whitehead Institute for Biomedical Research. It showed that identities were even more vulnerable than earlier research had indicated.

It confirmed that the identity of a study participant could be gleaned from public genetic data by someone who already knows that participant's genetic makeup.

The new study also shows that it is possible to identify male participants by cross-referencing research data about a participant and his DNA sequence with information posted on genetic genealogy and public records databases.
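For readers who want to see the general shape of such a linkage, here is a minimal Python sketch. It is not the method used in the study; it only illustrates the idea of joining one data set to another. Everything in it is an assumption made up for illustration: the marker names (M1 to M4), the surnames, the record IDs, the match threshold, and the choice of birth year and state as the research metadata.

```python
from dataclasses import dataclass


@dataclass
class GenealogyEntry:
    """Hypothetical genealogy-site entry: a surname plus a marker profile."""
    surname: str
    markers: dict


@dataclass
class PublicRecord:
    """Hypothetical public record with a few demographic fields."""
    surname: str
    birth_year: int
    state: str
    record_id: str


def shared_markers(a, b):
    """Count markers that appear in both profiles with the same value."""
    return sum(1 for name, value in a.items() if b.get(name) == value)


def candidate_surnames(profile, genealogy, min_shared):
    """Surnames whose genealogy profile shares at least min_shared markers."""
    return {
        entry.surname
        for entry in genealogy
        if shared_markers(profile, entry.markers) >= min_shared
    }


def cross_reference(profile, birth_year, state, genealogy, public_records,
                    min_shared=3):
    """Narrow public records using candidate surnames plus research metadata."""
    surnames = candidate_surnames(profile, genealogy, min_shared)
    return [
        record
        for record in public_records
        if record.surname in surnames
        and record.birth_year == birth_year
        and record.state == state
    ]


# Toy data only; none of these values come from a real database.
research_profile = {"M1": 13, "M2": 24, "M3": 14, "M4": 11}
genealogy = [
    GenealogyEntry("Alpha", {"M1": 13, "M2": 24, "M3": 14, "M4": 10}),
    GenealogyEntry("Beta", {"M1": 12, "M2": 22, "M3": 14, "M4": 11}),
]
public_records = [
    PublicRecord("Alpha", 1950, "UT", "record-001"),
    PublicRecord("Alpha", 1972, "UT", "record-002"),
    PublicRecord("Beta", 1950, "UT", "record-003"),
]

matches = cross_reference(research_profile, 1950, "UT", genealogy, public_records)
print([record.record_id for record in matches])
```

In this toy example neither source names the participant on its own. The genealogy entries only suggest a surname, and the public records only list people; it is the combination with the research metadata that narrows the field to a single record.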

Eric Schadt of Mount Sinai Hospital in New York City told Nature that removing the data was not a solution and suggested being more up front with participants about the inability to protect privacy completely. "We should ensure that the most appropriate legislation is in place to protect participants from being exploited in any way," he said.

Erlich's team used this cross-referencing technique to discover the identities of five men whose genomes were sequenced and released as part of the 1000 Genomes Project. These men had also participated in a project that studied Mormon families from Utah. The team was also able to discover the identities of the men's male and female relatives, and notified the National Institutes of Health without releasing the information.

