Abstract

Luke Hickey, Pacific Biosciences
It's an exciting time for researchers focused on solving rare diseases. Thanks to new tools and methods, scientists are better equipped than ever to identify these diseases, determine the biological mechanisms underlying them, and pursue the development of new treatments.
The biggest contributor to this progress has been the improvement of DNA sequencing technology. As sequencing platforms have become more affordable and more accurate, scientists have found it feasible to apply them to studies of rare disease. Exome or even whole-genome sequencing, once out of reach for all but the best-funded labs, is now inexpensive enough to be used routinely.
With genome-wide data, researchers have made tremendous strides in discovering disease-causing variants. More than 7,000 rare and Mendelian diseases have been identified, and many of them still have unknown biological causes. New diseases are identified every year. While genome-wide exploration has already shown promise, there is much more work to be done to provide answers for these diseases—and the countless others yet to be identified.
As clinical research teams have deployed sequencing tools to understand rare diseases, they have dramatically increased the diagnostic yield. Solve rates now run between 25% and 50%, a significant improvement over pre-sequencing rates. But clearly, too many people still experience the diagnostic odyssey so common among rare disease patients. Further innovations in DNA sequencing may offer a way to help.
Hunting pathogenic variants
Next-generation sequencing (NGS) tools are characterized by two common traits: the massively parallel reactions that allow them to produce a huge amount of data, and the short-read data type they generate. The length of any individual read typically ranges from 50 base pairs to 350 base pairs; these reads are then mapped to a reference or stitched together in silico to yield a useful assembly. Short-read sequencers are very good at detecting single nucleotide variants (SNVs) or indels that are less than 10 bases long. Since a substantial number of rare diseases are caused by SNVs (e.g., non-synonymous heterozygous variants occurring in the exon coding regions of genes), NGS platforms have been quite useful for solving these cases.
Unfortunately, short-read sequencing technologies struggle to detect larger variants, which explains at least some of the gap between known rare diseases and those that have been solved with NGS data. Short reads containing non-unique sequence will map to many places in the genome, leading to assembly errors. For rare diseases associated with repetitive regions, such as repeat expansion disorders like ALS and Fragile X, mapping ambiguity and expansion length prevent researchers from getting a clear view of the region of interest. In addition, some variants are so large they cannot be fully spanned by short reads alone. These structural variants, which have proven to be causative for numerous rare diseases, must be sequenced completely, with breakpoints spanned and mapped, in order for variant-calling algorithms to identify them correctly.
An alternative approach comes from long-read sequencing technology, which can produce highly accurate reads that are tens of kilobases long. With this length, any read is more likely to contain unique sequence for unambiguous mapping, and reads are sufficient to capture even large variants in a single, uninterrupted readout. Scientists have been applying this newer tool to rare disease and have been able to find answers where previous efforts—even whole-genome sequencing studies using short reads—have failed.
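The link between read length and unambiguous mapping can be illustrated with a toy example. The sketch below is a simplification under stated assumptions: the genome is a made-up 66-base string containing a CGG repeat, reads are error-free, and exact string matching stands in for a real aligner. The sequences and the function names are hypothetical and chosen only for illustration.

```python
def count_mapping_positions(genome: str, read: str) -> int:
    """Count exact-match start positions of `read` in `genome` (overlaps allowed)."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def fraction_uniquely_mapped(genome: str, read_length: int) -> float:
    """Fraction of all error-free reads of `read_length` that match exactly one position."""
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    unique = sum(1 for read in reads
                 if count_mapping_positions(genome, read) == 1)
    return unique / len(reads)


# Hypothetical toy genome: unique flanking sequence around a CGG repeat tract.
LEFT_FLANK = "ATGCTAGCTTACGGATCA"
REPEAT = "CGG" * 10
RIGHT_FLANK = "TTGACCATGAGTCAAGCT"
TOY_GENOME = LEFT_FLANK + REPEAT + RIGHT_FLANK

# A "short" read that falls entirely inside the repeat tract.
short_read = REPEAT[:12]
# A "long" read that spans the whole repeat plus part of both flanks.
long_read = TOY_GENOME[10:55]

short_hits = count_mapping_positions(TOY_GENOME, short_read)
long_hits = count_mapping_positions(TOY_GENOME, long_read)

print("short read candidate positions:", short_hits)
print("long read candidate positions:", long_hits)
print("uniquely mapped fraction, 12-base reads:",
      fraction_uniquely_mapped(TOY_GENOME, 12))
print("uniquely mapped fraction, 40-base reads:",
      fraction_uniquely_mapped(TOY_GENOME, 40))
```

In this toy setting, the read drawn from inside the repeat matches several positions, while the read that reaches into both flanks matches only one. Real aligners score inexact matches and real genomes contain far longer repeats, so the numbers here carry no biological meaning. Only the trend does.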
Solving rare disease
One of the earliest demonstrations of how long-read sequencing could help rare disease research came from Stanford University scientists seeking an answer for a patient who had undergone more than eight years of genetic analysis without any results to confirm clinicians' best guess at a diagnosis: Carney complex, which results in lifelong growth of benign tumors. By applying long-read sequencing, they identified a de novo deletion of more than 2 kilobases in the gene linked to Carney complex [1]. The deletion was pathogenic and, after all those years, provided an answer to the patient and his family.
Another example, from scientists in Japan, involved a family with two siblings suffering from a rare and progressive type of epilepsy. No answers had been found via short-read exome sequencing and other approaches. With long-read sequencing, though, scientists were able to focus on structural variants. They found more than 17,000 of them in the genome, but quickly homed in on a homozygous 12.4 kilobase deletion that was later confirmed as pathogenic [2]. The mutation was located in a gene known to be associated with this form of epilepsy and specifically in a region characterized by GC-rich sequence, which can be challenging for short-read sequencers to capture.
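Narrowing thousands of structural variant calls down to a handful of candidates is, at its core, a filtering problem. The article does not describe the criteria the team in Japan used, so the sketch below is purely illustrative. It assumes a recessive model (the variant must be homozygous in every affected sibling), a list of genes already associated with the phenotype, and exact breakpoint matching between samples. The `SVCall` record, the sample names, the gene labels, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class SVCall:
    """One structural variant call for one sample (hypothetical record layout)."""
    sample: str
    sv_type: str            # e.g. "DEL", "INS"
    chrom: str
    start: int
    end: int
    genotype: str           # "0/1" heterozygous, "1/1" homozygous
    gene: Optional[str]     # overlapping gene, if any

    @property
    def length(self) -> int:
        return self.end - self.start


Locus = Tuple[str, int, int, Optional[str]]


def shared_homozygous_deletions(calls: Iterable[SVCall],
                                affected: Set[str],
                                candidate_genes: Set[str],
                                min_length: int = 50) -> Set[Locus]:
    """Return deletions that are homozygous in every affected sample
    and overlap one of the candidate genes."""
    per_sample = {sample: set() for sample in affected}
    for call in calls:
        if (call.sample in per_sample
                and call.sv_type == "DEL"
                and call.genotype == "1/1"
                and call.gene in candidate_genes
                and call.length >= min_length):
            per_sample[call.sample].add(
                (call.chrom, call.start, call.end, call.gene))
    if not per_sample:
        return set()
    return set.intersection(*per_sample.values())


# Made-up calls for two affected siblings; none of these are real data.
calls = [
    SVCall("sibling_1", "DEL", "chr1", 100_000, 112_400, "1/1", "GENE_A"),
    SVCall("sibling_2", "DEL", "chr1", 100_000, 112_400, "1/1", "GENE_A"),
    SVCall("sibling_1", "DEL", "chr1", 500_000, 500_300, "0/1", "GENE_A"),
    SVCall("sibling_2", "INS", "chr2", 200_000, 200_900, "1/1", "GENE_B"),
    SVCall("sibling_1", "DEL", "chr3", 300_000, 301_000, "1/1", "GENE_C"),
    SVCall("sibling_2", "DEL", "chr3", 300_000, 301_000, "1/1", "GENE_C"),
]

candidates = shared_homozygous_deletions(
    calls,
    affected={"sibling_1", "sibling_2"},
    candidate_genes={"GENE_A", "GENE_B"},
)
print(candidates)
```

A production pipeline would typically compare breakpoints with some tolerance, consult population frequency data, and consider other inheritance models. Those steps are omitted here to keep the idea visible.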
These cases illustrate a larger trend: many scientists have now solved rare diseases using single-molecule, real-time (SMRT) sequencing, a long-read technology. Repeat expansion disorders have been a noteworthy class, as these are perhaps the diseases least likely to be solved with short reads. From Fragile X syndrome to myotonic dystrophy and several types of ataxia, long-read data have been used to generate important new insights about their causative genetic mechanisms [3–7]. Long-read data have also been applied to more complex forms of pathogenic variants, such as retrotransposon insertions associated with X-linked dystonia-parkinsonism or large triplications linked to Temple syndrome [8].
Scaling up
Recently, scientific teams have begun implementing long-read sequencing in larger-scale efforts to solve rare diseases. The European SOLVE-RD consortium, for instance, is working to sequence 500 whole genomes with SMRT sequencing tools to find pathogenic variants for cases that have proven intractable. In the U.S., NIH's Clinical Sequencing Exploratory Research program has adopted this approach to improve the solve rate for rare pediatric diseases. Through this program, researchers at the HudsonAlpha Institute for Biotechnology are sequencing the genomes of hundreds of children who have unexplained intellectual and developmental disabilities.
These and other large-scale efforts will undoubtedly allow scientists to make real progress in solving rare diseases and to improve our understanding of disease-causing variants in general. As new answers emerge, they will support improved methods for diagnosing and treating rare disease using long-read whole-genome sequencing, further expanding the number of people who benefit from this exciting new technology.
