Come Together, Right Now!

Abstract

One of the greatest challenges facing the genetics community today is the question of how human genetic variability affects disease and response to treatment. Traditionally, this question has been addressed on a macro level via a gene-by-gene approach. But as technology continues to advance and tools like next-generation sequencing become more widely available, the significance of variants can be assessed in a more systematic way. With the advent of these technologies, data interpretation, as opposed to production and annotation, has become the rate-limiting step in research. It may soon be as quick and cost-effective to sequence the entire genome as it is to order a single-gene test, and so the challenge will continue to grow.

To understand the role that variants play in disease and response to treatment, many clinical laboratories have created in-house databases to catalog the variant data obtained through their testing services. The same activity takes place in many academic laboratories as they strive to associate genotypes with phenotypes. With each laboratory creating its own unique database, the rich resource of variation data remains fragmented and incomplete. The need for aggregating clinical-grade data on genetic variability was one of the focal points of discussion at the Institute of Medicine Roundtable on Translating Genomic-Based Research for Health workshop held on July 19 and 20, 2011. Much discussion centered on identifying the key requirements for creating an ideal clinical-grade variant database or federation of databases. A curation process to ensure data quality, a common set of terms, and standardized mechanisms for updating classifications as new data become available were among the factors identified as key requirements. There was also recognition that such a system should allow for the integration of clinical data with genomic data. Further, a transparent, standardized set of rules for classifying the significance of variants and ways to alert health-care professionals when the significance of a variant changes are critical.

The roundtable also discussed the larger ecosystem changes that would be needed to support comprehensive clinical-grade variant aggregation and thereby improve our understanding of the role that genetic variability plays in disease and response to therapy. There was a consensus on the need for greater collaboration between health-care settings and clinical laboratories, both to link the genetic data with clinical data and to ensure that the genetic data are integrated into patient care. The need for changes in scientific publishing practices was also discussed because the literature base is currently limited by the inability to publish “negative” research results. Lastly, there was a discussion of where such a comprehensive variant database should live in order to get the needed widespread buy-in from the entire genetics community. Some felt that it would need to be housed with a stable, trusted intermediary with a track record of supporting broad data sharing, such as the National Institutes of Health, while others identified a role for international bodies such as the recently formed International Rare Diseases Research Consortium in creating truly comprehensive databases.

This is not a new conversation. Currently, a number of initiatives are underway to create widely accessible, comprehensive databases to catalog clinical-grade variant data. The Human Variome Project strives to catalyze the reduction in human disease in the 21st century by facilitating the establishment and maintenance of standards, systems, and infrastructure for the worldwide collection and sharing of all genetic variations affecting human disease. The Leiden Open (source) Variation Database is designed to provide a flexible, freely available tool for gene-centered collection and display of DNA variations. The National Center for Biotechnology Information and the European Bioinformatics Institute are collaborating to create a stable reference genome and a new database, ClinVar, to collate Single Nucleotide Polymorphism Database (dbSNP) entries from a known clinical source. Likewise, MutaDATABASE is a publicly available, open-access database established by the not-for-profit MutaDATABASE Foundation that provides standardized information on variants. While there is still a need for discussion of the relative merits and limitations of each of these approaches, the more urgent need is for the discussion to continue around some of the larger systems-level problems that threaten to hamper collaborative initiatives as a whole.

We as a community will need to continue to discuss how we will encourage and support broad data sharing across all stakeholder groups, including academic researchers, clinical laboratories, disease advocacy organizations, and private industry. Without widespread data sharing, these types of initiatives will consistently fail to thrive. At times the obstacles are not data systems but the Health Insurance Portability and Accountability Act (HIPAA) and other privacy requirements, as well as the limited time physicians have to fill out paperwork. These are major inhibitors of the ability to collect the phenotype information that integrates with the genomic data, which leaves the genomic information less useful than it could be for shedding light on variant interpretation. Resolving these questions remains a high priority if we are to create shared resources for addressing questions that are too big for any one individual or institution to address in isolation.