Abstract
The use of sequencing technologies has greatly expanded in both research and clinical settings. The generation of voluminous datasets has raised several issues regarding data sharing and access. Current regulations require clinical laboratories and some research laboratories to provide access to test data, including sequencing data, directly to patients upon request. There is some controversy over whether this access right may be somewhat broader, encompassing research data as well (a question beyond the scope of this article). It is clear that in the research setting, deposition of sequencing data into public or private databases often occurs, although little information exists about the return of data files to research participants (in contrast to the extensive deliberations regarding return of results). Thus, further consideration of the issue of access to data files is warranted as well as more effort to understand both patients' and research participants' use of the data.
The use of sequencing in clinical practice is generally for diagnostic purposes or to inform treatment for the benefit of a given patient, whereas sequencing in research is used to create generalizable knowledge for populations with no necessary benefit to individual participants. Despite these differences, there are some parallels to consider with respect to the use of WGS/WES. One key issue that has been heightened by the increased use of WGS/WES is the issue of incidental or secondary findings (Jaitovich Groisman and Godard, 2016). Similar to other testing platforms such as microarrays and comparative genomic hybridization that look across the genome for particular variants or structural anomalies, WGS/WES will potentially uncover variants unrelated to the primary clinical indication for testing or the research focus. Often referred to as incidental or secondary findings, some of these findings may be clinically significant and actionable. Recommendations regarding the return of incidental or secondary findings that are considered clinically actionable to patients undergoing clinical sequencing were published in 2013 (Green et al., 2013) and updated in 2017 (Kalia et al., 2017).
In research, though, the issue is more complex with respect to anticipated findings (those linked to the study goals) and secondary findings, largely due to a lack of obligation of researchers to participants to return results, the quality of results, and the lack of support/staff to plan and appropriately report potentially clinically significant findings (Johns et al., 2014; Klitzman et al., 2014). Several studies have confirmed research participants' interest in accessing research results generated through genomic research (anticipated or incidental) (Allen et al., 2014; Strong et al., 2014). Likewise, several studies of researchers' views, scholarly opinion, and institutional and professional organizational positions are generally supportive of the practice of offering participants the option to learn of findings deemed clinically useful (Fernandez et al., 2013; Klitzman et al., 2013; Knoppers et al., 2013; Senecal et al., 2015; Christensen et al., 2017). The clinical benefit of participant access to research results may be quite low (Johns et al., 2017). However, despite participant interest and support from the community, the return of research results does not appear to occur routinely (Heaney et al., 2010).
A second issue looming on the horizon for sequencing relates to access to raw data files. Similar to the issue of incidental or secondary findings, this issue has already been addressed for clinical-based sequencing. In 2014, U.S. regulations were finalized that require clinical testing laboratories subject to the Clinical Laboratory Improvement Amendments of 1988 (CLIA) and laboratories subject to the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule to make available data from tests maintained in a patient's designated record set (Centers for Medicare and Medicaid Services et al., 2014; Department of Health and Human Services, 2016). The new data access policy was developed to align with other changes in health practice, such as electronic medical records, personalized medicine, and broader support for patient engagement in healthcare decisions, and to serve various other policy objectives. Thus, patients or a person designated by the patient may request raw sequencing data. A recent study of clinical sequencing laboratories reported that the prevalence of data requests from patients is low (<10%), with more requests received from providers (10-50%) (O'Daniel et al., 2017). In the experience of GeneDx, a US-based clinical laboratory that performs WES, a review of their records between September 2015 and May 2016 shows that the laboratory received 343 data requests for sequencing data files over a 7-month period (average of 49 requests/month). At least eight requests were known to have been submitted on a patient's behalf, as the data file was either requested to be sent directly to the patient or initially sent to the physician, who subsequently requested that the data be forwarded directly to the patient.
In the clinical setting, patients (or clinicians) may desire access to WES/WGS sequence data for a variety of reasons. WES/WGS has been reported to result in a diagnosis or positive finding in 20-30% of patients (Yang et al., 2013, 2014; Lee et al., 2014; Retterer et al., 2015), leaving the majority of patients without a diagnosis. Inconsistencies between laboratories' interpretation and reporting of variants may lead to a different result (Martin et al., 2015; Amendola et al., 2016; Pepin et al., 2016; Van Driest et al., 2016). Thus, one primary reason to request sequencing data files is to obtain a second opinion. While some laboratories offer an option for reanalysis, some patients may wish to submit their data to a different clinical laboratory. In genetics and other specialties, some second opinions have led to reclassification or a different diagnosis from the original test interpretation (Lysack et al., 2013; Middleton et al., 2014; Faas, 2015; Khazai et al., 2015; Meyer et al., 2015; Zhu et al., 2016). However, more than half (57%) of the 343 data requests mentioned above were submitted to GeneDx before completion of sequencing. This may simply be due to the convenience of completing all of the forms at once during a clinic visit when the clinician and family members are present or simply because the option to request the data exists and patients and providers "can."
In addition, patients may want to share their sequence data for research purposes, related or unrelated to their clinical indication. Sites such as DNAland (https://dna.land) and openSNP (https://opensnp.org) currently accept raw sequencing data files from consumers of direct-to-consumer testing companies. Similarly, patients may choose to share their test report (or the specific mutations identified) through databases such as the Free the Data consortium (www.free-the-data.org), which supports open access to genetic data and fosters research, or the NIH ClinVar database (www.ncbi.nlm.nih.gov/clinvar/intro). Some patients may request the sequence data files to determine if they qualify for a drug trial (Might and Wilsey, 2014). One study reported that those who shared their personal genomic data obtained through direct-to-consumer testing did so primarily to learn more about themselves or to contribute to research (Haeusermann et al., 2017).
Last, some patients may want to search through the data on their own and have greater autonomy to choose what additional types of information they desire to learn (Brothers et al., 2017). Many variants are unreported in clinical WES/WGS as they are not known to be associated with the clinical indication for which the test was ordered, are of unknown clinical significance, or are outside the scope of the initial analysis. Thus, patients may be interested in evaluating the data for health risks unrelated to the current clinical indication, carrier status, or pharmacogenetic drug response status.
The current research environment suggests that we should anticipate that some participants will also desire access to their sequencing data files. Several factors together contribute to this forecast: the trends toward greater participant engagement and patient-centered research, greater public awareness about genetics and genomics, continued use of WES/WGS and other omics technologies in research, and the involvement of thousands of research participants (and likely millions worldwide) in numerous large-scale, longitudinal cohort studies. A few studies have reported that some research participants are indeed interested in accessing the raw data from a genomic research study (Middleton et al., 2015; Sanderson et al., 2016). The upcoming large-scale U.S. initiative All of Us (originally named the Precision Medicine Initiative) has outlined two participant-centered principles that relate to data access: respecting participant preferences and empowering participants through access to information (Interagency Working Group, 2015).
While policies requiring data sharing have been developed and implemented, enabling researchers worldwide access to genomic and other types of data, research participants do not typically have access to their own data. The lack of participant access may be due to a number of reasons. The first may be due to the quality of the data. Clinically related results that are returned to research participants should be confirmed in a CLIA-certified or an accredited clinical laboratory. However, most research sequencing is not performed in a clinical laboratory; thus, research sequencing data may contain a higher number of errors due to lack of standard operating protocols, appropriately trained laboratorians, analytical validation issues, and limited quality assurances and controls. For sequencing performed in a research laboratory, would it be acceptable to share a data file with participants if they are informed about the quality of the data? In some cases, sequencing for a research protocol is performed in a CLIA-based sequencing laboratory (e.g., a study to ascertain the benefit of using WGS or WES to identify the cause of a disease in undiagnosed individuals). In this situation, would it be a double standard if an undiagnosed (affected) individual undergoes clinical sequencing and has access to their sequencing file, whereas an undiagnosed affected individual who opts to enroll in a research study and undergoes the same type of sequencing test performed in an accredited laboratory does not have access to the sequencing file?
If we assume that the quality of the sequencing data is equivalent for research participants and patients (meaning that sequencing is performed in a CLIA-certified laboratory), are there additional considerations for research participants that would warrant a different policy for access to sequencing data files than is currently available for patients undergoing clinical WGS/WES? Research participants may be interested in accessing their sequencing data files for many of the same reasons that a patient would. While research and clinical practice serve very different purposes, some have recognized (although may not support) that the division between research and clinical practice has blurred (Miller et al., 2008; Kass et al., 2013; Berkman et al., 2014), giving rise to discussions such as returning research results, specifically those that are clinically actionable. Researchers' obligation or duty to look for clinically significant results is not supported (Gliwa and Berkman, 2013; Ross and Reiff, 2013), while consensus exists for the practice of offering participants the option of learning about clinically significant results discovered in research (Gliwa et al., 2016).
While data files for other types of clinical tests performed as part of a research protocol (e.g., standard blood tests to determine eligibility) may be accessible to participants, the provision of raw sequencing data generated from WES/WGS testing raises some issues that are not comparable with other clinical tests. For WES, the volume of sequence data is incomparable with other clinical tests, including on average 100,000 variants in 20,000 genes, of which an estimated 5000 variants are of potential clinical relevance (Retterer et al., 2015). In addition to how to handle and map the vast amount of sequence variants, research participants will inevitably need help with identifying good tools to analyze the data and present it in an understandable manner. Programs such as Promethease (https://promethease.com), LiveWello (https://livewello.com), and other groups enable participants to upload and analyze their own sequence data. Displaying and communicating such complex voluminous information will also be challenging for a public with diverse literacy levels.
If sequencing files are made accessible to patients and potentially research participants, we suggest that further study is needed in three major areas: (1) how data are utilized; (2) potential risks associated with data access; and (3) the burden and responsibilities on research laboratories and investigators. First, a better understanding of how patients/research participants intend to use the data would help inform development of tools and resources for patients and participants to enable appropriate analysis of data and awareness of potential benefits and harms. Without such educational resources, they may not understand what programs are available to analyze sequencing data and the types of information that may be revealed.
While patients/research participants may gain some benefit in learning of potential disease susceptibility through their own analysis of the data, a number of potential risks exist with self-analysis. The information learned may be incorrect and, without the support of an expert provider, may cause anxiety; lead to misdiagnosis, false reassurance, or discrimination; and impose costs on the patient and/or family members, as well as society (McGuire and Burke, 2011). Furthermore, patients/participants may not be aware of potential risks of uploading their data (Shabani and Borry, 2015), the variable levels of analytical functions available, the lack of oversight of publicly available programs, or that these programs are not intended for use in medical management (Pabinger et al., 2014). Some of these analytical tools may not be HIPAA compliant, and information regarding the storage of sequence data (and for how long) and when the software was last updated to reflect the current evidence for interpretation may not be disclosed. The implications for relatives should also be considered since WGS/WES is often performed on parents and the child (trios) and, in some cases, siblings to inform clinical interpretation and distinguish between de novo and inherited variants. Individuals may be unaware of the privacy risks that raw data files pose for themselves or family members (Gymrek et al., 2013; Shi and Wu, 2017). Efforts are ongoing to strengthen privacy protection for genetic information in the research setting (Terry, 2016). In addition, while providing participants with access to sequence data could be viewed as a benefit to participation in research, it may potentially be considered coercive given that sequencing is currently not accessible directly to the public without physician involvement and likely costs hundreds of dollars out of pocket.
Last, in addition to consideration of the implications of participants' direct access to their sequencing data, it is also important to consider the impact of the policy on researchers and research funding. Clinical laboratories can charge a data access fee to cover the costs of retrieving, storing, and sending the sequencing files, typically in the range of $50-$100, but there is little precedent in the research setting. If no fee is charged to cover labor and supplies in either the clinical or research setting, individuals may be more likely to request the files because they are available at no cost, or simply out of a desire to obtain all available data/information for potential future use or just for information (Middleton et al., 2015). This effect is anticipated, as studies of research participants, the general public, genomic researchers, genetic health professionals, and other health professionals revealed that the majority of these individuals would want to receive their raw data if they participated in a genomic research study (Middleton et al., 2015; Sanderson et al., 2016). Some research laboratories may feel an ethical responsibility to assist participants with the complexity of this information, which may extend beyond the current expertise or resources of the laboratory. However, charging a fee could potentially limit participant access to data files.
In conclusion, access to sequencing data files by research participants will likely become an important issue warranting careful consideration and new policies. This new policy is not specific to genetic data generated from a clinical test, and therefore testing laboratories may face similar requests as patient/participant awareness grows or a specific use for the data develops. Policy makers can build upon the related discussion of returning results, as applicable, but greater attention to participant education will likely be needed to promote informed decision-making regarding participants' access and management of their own (or child's) sequencing file and to minimize potential harm that may come from data mishandling, inaccurate interpretations, and lack of expert support.
Footnotes
Author Disclosure Statement
Bethany Friedman and Gabi Richard are employees of GeneDx, Inc., a wholly owned subsidiary of Opko Health, Inc. No other competing financial interests exist.
