Taking steps to prevent jigsaw re-identification in genomic research

25 Mar, 2014

B0004877 DNA double helix and sequencing output

Four major UK research funders have released their joint response to a statement from the Expert Advisory Group on Data Access (EAGDA) on the issue of the re-identifiability of participants from genomic research studies. Natalie Banner, Policy Officer at the Wellcome Trust, explains why this issue is becoming increasingly important.

When individuals participate in consented research studies, researchers are duty bound to protect their confidentiality as far as possible. Much genomic research relies on using large-scale aggregate datasets, which are anonymised, to establish which genes may be associated with different diseases. If any potentially identifiable data are made available for other researchers to access, there are strict legal restrictions in place stipulating how the data can and cannot be used. Such safeguards are designed to reduce the risk of participants being re-identified from research data, which would be a breach of confidentiality.

With advances in bioinformatics, an explosion in the volume of data, and an increasing push towards sharing data, there are growing possibilities for linking datasets in order to seek out and answer new research questions. To take just one example, linking data on the incidence of particular diseases, genomic markers, and socioeconomic indicators may reveal new insights into the complex relationship between aspects of health, genomics and environment.

However, linking datasets raises a challenging ethical and practical issue for participant confidentiality: the risk of jigsaw re-identification. With only one or two pieces of information, very little can be tied to a particular individual and the possibility of actually identifying a person within the mass of aggregated data is remote. But when data from multiple sources are available, they may, in certain circumstances, allow a more complete picture of an individual to be pieced together. This could result in some confidential information being linked to an identifiable person. A paper on genomics published in Science last year demonstrated how it could be technically possible to link data in this way. The authors developed a complex methodology that involved linking open access genetic sequence data, information from publicly available genealogy databases that link surnames with specific genetic markers on the Y-chromosome, and public demographic records. The team successfully triangulated the identities and genomes of up to 50 participants from a research study, the 1000 Genomes Project. Importantly, while the participants themselves had given consent for their data to be openly and freely used, linking a genome to an individual has implications for their biological relatives as well, and for future generations with whom they will share genetic characteristics.
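To make the jigsaw idea concrete, here is a minimal sketch in Python. It is not the method used in the Science paper. All records, names and field names (`birth_year`, `region`, `surname_hint`) are invented for illustration, and the hypothetical `link` function simply counts how many public records match each research record on a chosen set of attributes. A research record counts as "re-identified" only when exactly one public record matches.

```python
from collections import defaultdict

# Synthetic, hypothetical data: none of this comes from a real study.
research_records = [
    {"record_id": "R1", "birth_year": 1950, "region": "North", "surname_hint": "Smith"},
    {"record_id": "R2", "birth_year": 1950, "region": "North", "surname_hint": "Jones"},
    {"record_id": "R3", "birth_year": 1962, "region": "South", "surname_hint": "Smith"},
]

public_records = [
    {"name": "Person A", "birth_year": 1950, "region": "North", "surname_hint": "Smith"},
    {"name": "Person B", "birth_year": 1950, "region": "North", "surname_hint": "Jones"},
    {"name": "Person C", "birth_year": 1962, "region": "South", "surname_hint": "Smith"},
    {"name": "Person D", "birth_year": 1962, "region": "South", "surname_hint": "Smith"},
    {"name": "Person E", "birth_year": 1950, "region": "South", "surname_hint": "Jones"},
]


def link(research, public, keys):
    """Return {record_id: name} for research records that match exactly
    one public record on every attribute listed in `keys`."""
    # Index the public records by the chosen combination of attributes.
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])

    unique_matches = {}
    for record in research:
        candidates = index.get(tuple(record[k] for k in keys), [])
        # Only a single candidate amounts to a re-identification.
        if len(candidates) == 1:
            unique_matches[record["record_id"]] = candidates[0]
    return unique_matches


one_piece = link(research_records, public_records, ["birth_year"])
three_pieces = link(
    research_records, public_records, ["birth_year", "region", "surname_hint"]
)
print(one_piece)
print(three_pieces)
```

In this toy example, a single attribute matches several people for every record, so nobody is singled out. Once three attributes are combined, two of the three records match exactly one person each. The third still matches two people, which illustrates why re-identification tends to succeed only in specific circumstances.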

Although the method used by the authors succeeded only in a highly specific set of circumstances, the paper alerted funders, researchers and institutions to the technical possibility that anonymised genomic data could, in principle, be subject to re-identification through linkage with other data sources. It’s important to realise that for the purposes of biomedical research, linking any research data to a name is neither necessary nor desirable: researchers want to know how different genes, diseases, environmental factors and so on relate to one another, not who has what condition. To continue the jigsaw analogy, they are much more interested in finding many pieces of the same shape or size from lots of different individuals than they are in putting together a complete picture of a single person. But if the risk of building such a picture is there, we need to mitigate it as strongly as possible.

In light of this, EAGDA conducted research last year into the issue of identifiability in UK research studies involving or linking to genomic data, seeking to establish whether there was a risk of participants being re-identified and steps that funders and study leaders could take to mitigate this risk. This culminated in a series of recommendations to EAGDA’s funders, the MRC, ESRC, CRUK and the Wellcome Trust. These recommendations centre around issues of consent, risk assessment, controlling access to data and enforcing sanctions against anyone found to have deliberately attempted to re-identify individuals from research data.

It would be naïve to presume that data can ever be 100% secure: there are going to be risks, but we believe that with good governance and management, and constant vigilance for the kinds of issues EAGDA has alerted us to, these can be managed. Given the recent public concerns over access to and the use of primary care records through the Government’s scheme, it has never been more important for biomedical and health researchers to be transparent in how and why they use data, upfront to participants and their families about the risks involved and robust in the governance systems they use to control access to research data.

At the Wellcome Trust we’re continuing to work with EAGDA and the other funders to mitigate the risks of re-identification from the data our researchers generate and analyse. Our work cannot proceed without the generous participation in research from individuals all over the UK and beyond: it is our duty to push the boundaries of medical research whilst protecting and respecting their confidentiality as far as we can.

You can read the statement and response on the EAGDA section of the Wellcome Trust website. 

Image credits: (Top) Peter Artymiuk, Wellcome Images, (Lower right) Adrian Cousins, Wellcome Images

2 Comments
  1. Michael Vanzieleghem
    25 Mar, 2014 12:46 pm

    What an excellent paper. While it may be difficult to triangulate to individual genetic data sources just now, it may become much easier as information technology grows. There are those who might wish to exploit this information source with deep enough pockets to focus on such an effort, both good guys and bad guys…

  2. 11 Apr, 2015 3:55 pm

    Researchers, Scientists, Institutions and Funders participating in EAGDA are failing in that they have not had ANY dialogue with citizen scientists in the Genetic Genealogy community, researchers in the Anthropology community, nor with a panel of lay persons representing the participants in health genetics research studies.

    Many of these senior level Genetic Genealogists have MDs, PhDs, etc. and are every bit as skilled as many Population Geneticists and Research Scientists I have encountered. The Genetic Genealogy community in no way seeks to delve into health genetics … ancient ancestry genetics prior to 1400AD is the bag of Ancient Ancestry Genetic Genealogists I represent.

    The Genetic Genealogy community in no way seeks to use various databases and analytical processes to re-identify ‘specific individuals’ in a research study. The attitude of those on EAGDA is a condescending one similar to a PhD Professional Astronomer saying to a citizen scientist Astronomer: “only I can look at the stars … you don’t have as much academic training as me … so go see Dr. Mark Thomas at UCL and take up Astrology”

    The above blog EAGDA posting states: “Our work cannot proceed without the generous participation in research from individuals all over the UK and beyond ….” Well, those providing their ‘generous participation in health research’ are beginning to rebel in substantial numbers because of the closed-minded attitude of those at EAGDA and funders such as the Wellcome Trust. Here’s an actual example.

    A March 30, 2015 blog reply from the Wellcome Trust concerning the “two clear research aims” of the PoBI (People of the British Isles) study is very telling:

    Wellcome Trust says: “The POBI study itself was set up to look at the patterns of differences in people’s genetic make-up around the UK with two clear aims: to contribute to our understanding of genetics in health and disease, and to shed light on ANCIENT MIGRATIONS WITHIN THE BRITISH ISLES.”

    On the first aim of “contributing to our understanding of genetics in health and disease”, I assert that this particular PoBI study in Nature contributed practically nothing. Was there even one mention of a disease and its genetic linkage in this particular PoBI study? The answer is a clear No. Was there even one mention of a higher genetic based Odds Ratio for a disease such as diabetes in any of the 17 PoBI clusters? Again, the answer is a clear No. The general public knows that there are many factors other than genetics affecting health and disease, such as: environmental – lifestyle – dietary – economic – etc. For the Wellcome Trust not to mention these other factors is shameful and misleading. I contend that the Wellcome Trust Board of Governors should reallocate their priorities and focus more on providing the poor in the British Isles charitable funds to purchase additional fresh fruits and vegetables to supplement their diet. But, we know that ain’t going to happen because it would starve all the PhDs on funding from the Wellcome Trust. Well, I am of the opinion to serve the greater good and to starve those PhDs.

    On the second aim to “shed light on ancient migrations within the British Isles”, I contend that citizen scientists in the Genetic Genealogy community and the Anthropology community are much ahead of the PoBI researchers in this aspect.

    First of all there are many specialties within the Genetic Genealogy community that the Wellcome Trust cares little about.

    The Genetic Genealogy community can be segmented into two distinct groups: (1) ‘Genetic Genealogy Recent Ancestry’ concerns the largest group seeking to fill in ‘their own’ family trees over the past 600 years (1400AD-2000AD). This period covers about their last 20 generations of ancestors. For those of us with British Ancestry in the USA, we need to research our ancestors born in the USA as well as in the British Isles.

    The second distinct group: (2) ‘Genetic Genealogy Ancient Ancestry’ concerns a much smaller citizen scientist group seeking to learn more about the migratory patterns and geographic origins of their ancestors from about 5000BC to 1400AD. This 6400 year period covers about 220 generations. Unlike the recent PoBI study which looked at only Autosomal DNA, we also look at uni-parental Y-DNA unique to men which is passed from father to son as well as mt-DNA which a mother passes to both female and male children. These uni-parental Y-DNA and mt-DNA markers / SNPs / STRs / CNV / etc do not recombine as does Autosomal DNA and are thus much easier to trace via advanced NGS (Next Generation Sequencing) DNA tests which many in a group such as Y-DNA Haplogroup R1b-L371 investigate.

    This particular Hg is ‘Ancient British’ from NW Wales and dates to about 3100BC.
    The PoBI study fails as it did not seek to verify some of its Autosomal DNA Cluster age narratives with Y-DNA and mt-DNA evidence.

    So, on the second aim to “shed light on ancient migrations within the British Isles”, I contend that those in the citizen scientist group concerned with ‘Genetic Genealogy Ancient Ancestry’ in the British Isles have far surpassed those in the academic community who did research on the US$5 million PoBI study.

    The PoBI researchers could and should release anonymized macro level Cluster data to bonded qualified researchers in the ‘Genetic Genealogy Ancient Ancestry’ community. None of this would violate ‘individual participant’ confidentiality.

    Until this is done, I call on members of the public to refrain from participating in any research funded or sponsored by the Wellcome Trust or with any organization participating in EAGDA. Journalists will soon be all over this story and let the chips fall where they may.
