Abstract

St. Jude Children’s Research Hospital, home of the largest public repository of pediatric cancer genomics data in the world, is now offering that data free of charge to any researcher who wants to use it, along with tools it has designed to aid in cancer research. St. Jude Cloud launched April 16 with the help of two collaborators: Microsoft, through its Azure cloud, and platform provider DNAnexus.
St. Jude Cloud will allow researchers to conduct novel research with their own data or with St. Jude’s data, and will allow investigators to collaborate on the cloud without incurring the expense of building an infrastructure capable of handling the vast amount of complex data inherent in genomic research, according to Jinghui Zhang, Ph.D., chair of the Department of Computational Biology at St. Jude Children’s Research Hospital. On the first day of the launch, more than 2,000 researchers worldwide signed up to use the service, from countries including Australia, China, France, Germany, and the U.S., she said.
St. Jude Cloud should take pediatric cancer research—and even adult cancer research—to a whole new level, Zhang said. Leveraging cloud computing is important because it keeps all the data in one place, without different copies of the data being downloaded by researchers all over the world. In addition to saving infrastructure expenses for researchers, it also will save time. Without the use of the cloud, downloading all of St. Jude’s data takes up to a month, she said.
Jinghui Zhang, Ph.D., chair of the Department of Computational Biology at St. Jude, will also head St. Jude Cloud. Zhang said she hopes other organizations will follow the St. Jude lead and make other large genomic data sets freely available to any researchers who want to use them.
Several scientists have told Zhang since the launch that they want to apply St. Jude’s tools to analyze data in ways that St. Jude hasn’t yet. “That’s exactly what we want,” she said. “We’re thrilled to see it.”
Zhang, a computational biologist who heads the St. Jude Cloud project, has spent her career conducting integrative analysis of large-scale, multi-dimensional genomic data to help understand and cure diseases like rare childhood cancer. She said St. Jude’s dream for the project is that other organizations may follow suit and be willing to share their data freely as well. Rare diseases, in particular, need more data to find cures. A by-product of the work, she said, is that discoveries about pediatric cancer usually lead to findings with implications for treating adult cancer. St. Jude wants its data and tools to attract a variety of experts: not just cancer researchers but also those outside the field, such as computational analysts, who will approach the research from different perspectives.
On St. Jude Cloud, researchers will be able to access whole-genome data from more than 700 paired tumor/germline samples for common and rare pediatric cancers, sequenced as part of the St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project. The interactive data-sharing platform allows scientists to explore more than 5,000 whole-genome, 5,000 whole-exome and 1,200 RNA-seq datasets from more than 5,000 pediatric cancer patients and survivors. St. Jude expects to make 10,000 whole-genome sequences available on St. Jude Cloud by next year.
According to St. Jude, the data on St. Jude Cloud is accessible by disease, publication, and curated dataset. A tool created by St. Jude, called PeCan data explorer, allows researchers to drill down into the samples. Researchers will also have access to a genomic visualization engine developed by St. Jude and a unique data browser that “allows frictionless navigation through the genome, including coding and non-coding regions.”
The goal was to make the platform “truly useful to regular researchers,” Zhang said. “Nothing like this is available in the world for regular researchers with no computational skills.”
Zhang said both DNAnexus and Microsoft were selected for their unique skillsets and expertise—particularly in privacy and security. “Data security on the cloud is extremely important, and we did not have the expertise to deal with this ourselves,” she said. “Privacy is our number one concern.”
Researchers who apply to use St. Jude Cloud must consent to a series of federally mandated privacy protocols. “We believe that with the data being centralized on the cloud, it will provide a better way of monitoring it,” Zhang added.
Microsoft has extensive experience in both the cloud and genomics. Microsoft’s cloud, Azure, was launched in 2010. “We understand the complexities of large-scale genomics data and are proud to say we’ve processed half a petabyte of data for St. Jude Cloud to date,” said Geralyn Miller, director of Microsoft Genomics. “Microsoft has been involved in genomics for 12 years, with partners that include UC Santa Cruz Genomics Institute, Stanford Center for Genomics and Personalized Medicine, University Medical Center Hamburg–Eppendorf, and the University of Washington. Partners like DNAnexus, Curoverse, BC Platforms, and WuXi NextCODE have deployed platforms on Microsoft Azure to help manage, process and share genomic and biomedical data.”
“The sheer scale of genomics data requires technology that can help researchers harness data in a more secure way,” Miller said. “Microsoft Azure is uniquely positioned to help with this as it offers scale, efficiency, and data-analysis capabilities researchers need to manage and analyze massive datasets. By augmenting researchers, it in turn helps institutes and organizations advance their work all while meeting stringent data use, security, and privacy requirements.”
DNAnexus, another genomics heavyweight, has created the global network for genomic and biomedical data, operating in North America, Europe, Asia-Pacific (including China), South America, and Africa. In 2015, DNAnexus was awarded a research and development contract by the FDA’s Office of Health Informatics to build precisionFDA, an open source platform for community sharing of genomic information. DNAnexus also provided the platform for the Regeneron Genetics Center.
“The whole point of the [genomic] analytics is to take these monstrously huge files and make them into something useable,” said Richard Daly, CEO of DNAnexus. “What makes genomics data useful is to mix genomic and phenotypic data. This can only really be done on the cloud right now.” According to Daly, the ability to operate on the cloud with more and larger datasets helps to increase research insights and provide more opportunities for cures.
St. Jude Children’s Research Hospital is making its pediatric cancer genomic data freely available via St. Jude Cloud with the hope of significantly accelerating pediatric cancer research.
“The amazing thing, the most notable thing” is the truly visionary work of St. Jude, which offers care to children for free, along with its “incredible research,” Daly noted. “You can’t visit St. Jude without getting recruited to their mission. We’re really excited to be a part of this, not because it’s an important technological advancement, but you have to love the mission. This is special.” Daly is particularly impressed that St. Jude is making its data freely available to further cancer research. “This is unique,” he said.
About 7,500 patients are seen annually at Memphis, Tenn.-based St. Jude. Most are treated on a continuing outpatient basis and are part of ongoing research programs, according to the hospital. St. Jude has treated children from all 50 states and from around the world. Patients at St. Jude are referred by a physician, and nearly all have a disease currently under study and are eligible for a clinical trial.
