The Coming of Age for Big Data in Systems Radiobiology, an Engineering Perspective

Abstract

As high-throughput approaches in biological and biomedical research are transforming the life sciences into information-driven disciplines, modern analytics platforms for big data have started to address the needs for efficient and systematic data analysis and interpretation. We observe that radiobiology is following this general trend, with -omics information providing unparalleled depth into the biomolecular mechanisms of radiation response, an approach defined as systems radiobiology. We outline the design of computational frameworks and discuss the analysis of big data in low-dose ionizing radiation (LDIR) responses of the mammalian brain. Following successful examples and best practices of approaches for the analysis of big data in life sciences and health care, we present the needs and requirements for radiation research. Our goal is to raise awareness in the radiobiology community about the new technological possibilities that can capture complex information and execute data analytics on a large scale. The production of large data sets from genome-wide experiments (quantity) and the complexity of radiation research with multidimensional experimental designs (quality) will necessitate the adoption of the latest information technologies. The main objective is to translate research results into applied clinical and epidemiological practice and to understand the responses of biological tissues to LDIR in order to define new radiation protection policies. We envisage a future where multidisciplinary teams include data scientists, artificial intelligence experts, DevOps engineers, and of course radiation experts to fulfill the augmented needs of the radiobiology community, accelerate research, and devise new strategies.

Introduction

The field of biomolecular radiation research has been undergoing a significant transformation in recent times, with high-throughput technologies, including genomics, epigenomics, transcriptomics, and proteomics, used to address questions about the response of biological systems to radiation at the molecular level.¹ This intense activity has generated novel insights into the mechanisms with which organisms respond to high-dose or low-dose ionizing radiation (LDIR).² LDIR has a particular significance for our present lifestyles across global communities, as it involves facets of human health, radioprotection, and safety on a wider scale.³

The amount of LDIR we receive today has been increasing, due to a number of factors including medical diagnosis,⁴ radiation-based therapies,⁵ increased air travel,⁶ and long-distance effects of nuclear accidents.^7,8 The significance of LDIR research extends beyond Earth,⁹ as health effects on astronauts due to long-term exposure to cosmic radiation will become of paramount importance for future space flights,^10,11 and ultimately space colonization. The greatest risks involving radiation, especially in outer space environments, relate to cancer and damage to the central nervous system (CNS),^12,13 both topics of intense interest.

The study of LDIR effects on the CNS is a genuine challenge due to the inherent difficulties of studying brain and behavior during or after exposure: brain research cannot be invasive, behavior is affected at the time of the study, and molecular signatures must be translated into phenotypes and behavioral patterns.¹⁴ Thus, despite significant progress in understanding the effects of LDIR on various cell lines in vitro and their health implications,¹⁵ these studies need to be extended to the brain and the CNS, where knowledge of LDIR impact remains limited.

Consequently, it is not surprising that the number of publications on LDIR effects on various tissues and/or cell lines generally, and on the brain specifically, has been constantly increasing. High-quality data sets obtained by genome-wide experiments have started to transform the field of LDIR research into a genomics subdiscipline, giving rise to typical big data issues. These data form the basis upon which a systems approach can integrate the genome-wide structure and function of LDIR response, in the same manner as genomics feeds into systems biology.¹⁶ Therefore, we can now contemplate the birth of systems radiobiology, where high-throughput experiments provide the structure of the components and the systems approach offers appropriate frameworks for the analysis and simulation of the dynamics of biological responses.

The production of genomics data has been characterized as a “four-headed beast”: acquisition, storage, distribution, and analysis.¹⁷ Evidently, it is not straightforward to address all these challenges within a single, monolithic technological framework. Instead, an apt combination of contemporary approaches in data science can advance computational activities so that research results can be translated into applied clinical and epidemiological practice.

Various technologies relating to data production and storage have been on the rise in recent years. Large complex data sets are inherently difficult to navigate, and both cloud storage and special programming interfaces (e.g., software development kits) remain the methods of choice for big data analysis and distribution. The analysis step is probably one of the most demanding problems we are faced with; this facet also affects LDIR research, especially in its ever-increasing genomics context. Software solutions for data analytics provided by academic efforts and commercial companies typically involve cloud-based systems while software development is shifting toward the use of artificial intelligence and related algorithmic approaches.

Herein, we provide an overview of the current state of LDIR brain research and analyze specific challenges for big data analysis and interpretation, connected to the growth of systems radiobiology, following a more familiar interplay of bioinformatics and systems biology, as well as legacy, epidemiology, environment, and biomaterial data sets.¹⁸ Elements of this exposition might also be relevant to radiomics (large-scale radiobiology with medical imaging), regardless of dose or organ, thus impacting not only radiation safety but also the clinical realm and therapy.

Literature Growth

Hundreds of radiobiology studies took place in the past half-century or so (Fig. 1). Until recently, these studies were primarily gene-centric and tissue-specific in that they measured effects of LDIR on the responses of a handful of genes, usually in a single tissue or, typically, a cell line.¹⁹ This is exemplified by the limited number of genes known to be affected in those detailed studies. For instance, in a literature review that is by no means exhaustive, we have identified 40 genes across vertebrates (mostly model organisms, and human) whose expression has been shown to respond to low-dose radiation in a range of experimental setups, with varying measurements of dose and amounts of radiation (Table 1). These invaluable studies form the basis upon which important steps were made toward a better understanding of radiation responses in biological systems, and mammalian species in particular.²⁰ From a methodological point of view, these findings might form a key resource for genomics experiments, as they represent a gold standard, which will need to be corroborated by high-throughput studies. In other words, the gene-centric experiments provide an estimate of the coverage of the genome-wide response, as the genes they report would need to be detected in those high-throughput experiments.

FIG. 1.

Total numbers of publications per year in PubMed®. Blue line (left y-axis): total number of publications; green bars (right y-axis): publications related to LDIR effects on the brain. LDIR, low-dose ionizing radiation.

Table 1.

A list of 40 genes that have been studied in low-dose ionizing radiation experiments affecting the vertebrate brain

Gene	Species	Dose	Pmid
GFAP	Rat	0.0129 cGy/kg	1886997
ICAM-1	Mouse	2 Gy	7558951
ATase	Mouse	0.015 Gy/min	8472332
N-CAM	Rat	15 cGy	8691035
ANXA5	Human	<50 cGy	15498930
CASP3	Human	<50 cGy	15498930
p53	Human	<50 cGy	15498930
AKT	Mouse	0.5 Gy	15555557
CaMKII	Mouse	0.5 Gy	15555557
CREB	Mouse	0.5 Gy	15555557
ERK1	Mouse	0.5 Gy	15555557
ERK2	Mouse	0.5 Gy	15555557
p42	Mouse	0.5 Gy	15555557
Fe65	Mouse	0.5 or 1 Gy	17121854
p53	Mouse	0.5 or 1 Gy	17121854
Ku70	Zebrafish	50 cGy	17630212
XRCC6	Zebrafish	50 cGy	17630213
SOD2	Mouse	2, 10, 50 cGy	21056117
HSP 5	Mouse	1 Gy	21319302
PGK1	Mouse	1 Gy	21319302
TA1	Mouse	1 Gy	21319302
Arc	Mouse	50 or 200 cGy	21958477
CD-1	Mouse	50 or 200 cGy	21958477
IL-1RA	Mouse	50 or 200 cGy	21958477
TNF-a	Mouse	50 or 200 cGy	21958477
53BP1	Mouse	100 mGy	22947398
DNA-PKcs	Mouse	100 mGy	22947398
Ku70	Mouse	100 mGy	22947398
Apbb1	Mouse	100 mGy	23908553
Il1a	Mouse	100 mGy	23908553
Lrp1	Mouse	100 mGy	23908553
ICAM-1	Mouse	10 cGy	24937778
CaMKII	Mouse	350 and 500 mGy	25265567
GAP-43	Mouse	350 and 500 mGy	25265567
Syp	Mouse	350 and 500 mGy	25265567
CA1	Rat	1 Gy	25537960
CA1	Mouse	1 Gy	25621896
INaP	Mouse	1 Gy	25621896
CREB	Mouse	0.1 or 0.5 Gy	25807253
CCR2	Mouse	≤2 Gy	26042591
Iba1	Mouse	≤2 Gy	26042591
CREB	Mouse	0.1, 0.5, or 2.0 Gy	26420666
Rac1	Mouse	0.1, 0.5, or 2.0 Gy	26420666
RhoGDI	Mouse	0.1, 0.5, or 2.0 Gy	26420666
CD95	Human	250–500 mSv	26695909
TERF1	Human	250–500 mSv	26695909
TERF2	Human	250–500 mSv	26695909

The list has been generated from a quasi-systematic review of the literature and contains 47 records (genes CA1, CaMKII, ICAM-1, Ku70, and p53 are listed twice and CREB is listed three times, thus corresponding to 7 additional records). Column names: “gene” refers to the gene name reported (not always using a unique nomenclature), “species” is the model species or human, “dose” is given as reported, and “pmid” is the PubMed® identifier for the corresponding publication. The list is sorted by pmid, which roughly corresponds to chronological order and therefore to the time of discovery for a specific gene.
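The record and gene counts given in the table note can be checked with a few lines of code. The following is a minimal sketch, assuming the gene column of Table 1 is copied exactly as reported; names are not normalized, so synonyms are counted as distinct entries, and the variable names are illustrative.

```python
from collections import Counter

# Gene column of Table 1, copied in table order (one entry per record).
GENES = [
    "GFAP", "ICAM-1", "ATase", "N-CAM", "ANXA5", "CASP3", "p53", "AKT",
    "CaMKII", "CREB", "ERK1", "ERK2", "p42", "Fe65", "p53", "Ku70",
    "XRCC6", "SOD2", "HSP 5", "PGK1", "TA1", "Arc", "CD-1", "IL-1RA",
    "TNF-a", "53BP1", "DNA-PKcs", "Ku70", "Apbb1", "Il1a", "Lrp1", "ICAM-1",
    "CaMKII", "GAP-43", "Syp", "CA1", "CA1", "INaP", "CREB", "CCR2",
    "Iba1", "CREB", "Rac1", "RhoGDI", "CD95", "TERF1", "TERF2",
]

counts = Counter(GENES)

# Genes that appear in more than one record, with their record counts.
repeated = {gene: n for gene, n in counts.items() if n > 1}

# Records beyond the first occurrence of each gene.
additional_records = sum(n - 1 for n in repeated.values())

print(f"records: {len(GENES)}")
print(f"distinct gene names: {len(counts)}")
print(f"repeated: {repeated}")
print(f"additional records: {additional_records}")
```

A check of this kind becomes more useful once gene names are mapped to a single nomenclature, since the table reports names as they appear in each publication.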

A cursory review of the literature reveals growth in both general LDIR research and brain-oriented studies, at least in the absolute numbers of relevant publications (Fig. 1). For the particular analysis shown here, we have searched the literature following the guidelines of the PRISMA protocol (“Preferred Reporting Items for Systematic Reviews and Meta-Analyses”) to collect all relevant publications.^21,22 The text mining search was used in conjunction with two search engines, PubTator²³ and Correlation Engine.^24,* One potential future plan might be the deployment of a text mining suite for literature scans, so that the contents always remain updated, a general issue of wider relevance. This step can be performed with minimal effort, by executing a semi-automated scan of the literature. Text mining technology might also be used to create relationship links for molecules and other terms recorded in data warehouses, such as BRIDE.²⁵ Additional experimental data will be continually contributed, once relevant studies are published in peer-reviewed journals. We have discovered 39 articles related to LDIR effects on the brain (Fig. 1). Of these, a total of 32 articles contain gene-related information, a number that is still low but expected to rise quickly in the near future.

Another common trend shared with the general genomics literature is that, increasingly, LDIR-related articles have started to contain extensive supplementary archives, with large data sets of genomics, transcriptomics, or proteomics experiments. In contrast to previous years, where publications addressed a small number of genes and their responses, genome-wide experiments provide significant information-rich results that need to be deposited to open access archives—these have not been used too widely in the LDIR field (e.g., FigShare, etc.). At present, most of this information is available via specialized databases for radiation research, such as StoreDB,²⁶ a major effort that records complex and complete information for radiobiology experiments.

Software Architectures

Herein, we provide a short perspective of big data frameworks for systems radiobiology throughout the data cycle; most of these features are arguably relevant for other types of -omics and systems biology research. We remark that, as useful as these frameworks can be for data science in general and computational genomics in particular, they also offer significant opportunities for radiobiology research, as the nature of radiation experiments may push their boundaries in terms of data representation, complexity, and computational efficiency. These frameworks are designed to ensure minimum goals for computation, namely the integration of heterogeneous data, the acceleration of analysis and reproducibility, the application of the latest analysis tools, and knowledge transformation into applications. The components of an idealized framework are shown in Figure 2, addressing the needs for knowledge extraction from complex data sets.²⁷ Indeed, LDIR experiments produce heterogeneous and multidimensional data sets; one of their hallmarks is that measurements may be coupled with other complex parameters or metadata, for example, developmental stage, tissue specificity, and standardized behavioral experiments in animals, recording intricate patterns of movement and interaction “phenotypes.” Cloud computing and tools such as data lakes will help us manage these complex heterogeneous data beyond traditional storage and analytical tools, which can no longer provide the agility and flexibility required to deliver relevant analyses. We describe the modules of this idealized framework below.

FIG. 2.

An outline of an idealized big data analytics framework architecture, also relevant for big data applications in systems radiobiology.

The first module, “data ingestion,” primarily addresses the discovery phase where decisions are made about those data sets from public databases that need to be integrated with experimental results. A strict software requirement is to extract data from several heterogeneous resources (e.g., web, ftp, REST application programming interfaces [APIs] etc.) and provide support for both structured and unstructured data formats.
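As a rough illustration of this requirement, the sketch below classifies a data source by its access protocol and by whether its format looks structured. The function name, the extension list, and the example URIs are hypothetical and do not belong to any existing framework; a production system would inspect content rather than rely on file extensions.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: a small, illustrative set of extensions treated as structured.
STRUCTURED_EXTENSIONS = {".csv", ".tsv", ".json", ".xml"}


@dataclass(frozen=True)
class IngestionPlan:
    uri: str
    protocol: str     # e.g. "https", "ftp", or "file" for local paths
    structured: bool  # True if the extension suggests a record-based format


def plan_ingestion(uri: str) -> IngestionPlan:
    """Decide how a source would be fetched and how its content is classed."""
    parsed = urlparse(uri)
    protocol = parsed.scheme or "file"  # no scheme: treat as a local file
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return IngestionPlan(uri, protocol, suffix in STRUCTURED_EXTENSIONS)


# Hypothetical sources covering the access routes named in the text.
for source in (
    "ftp://ftp.example.org/pub/expression.tsv",
    "https://api.example.org/v1/samples.json?dose=0.5",
    "notes/lab_notebook.pdf",
):
    print(plan_ingestion(source))
```

Separating the planning step from the actual transfer keeps the decision logic testable without network access.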

The second module, “store,” concerns storage and is typically referred to as a “data lake.” Data lakes represent a new concept for the big data domain.²⁸ Using a data lake architecture, it is possible to gather massive amounts of raw data from different sources in a central location. Data lakes provide storage optimized for big data analytics workloads. They differ from data warehouses in that they store unprocessed data, for which aims are not defined and maintenance is minimal, whereas data warehouses store processed data with defined aims and higher maintenance costs.^† Data can be stored in either structured or unstructured form. While it is relatively easy to deposit raw data into a data lake, it can be much harder to remain aware of all content details for downstream analysis.²⁹ Therefore, a data catalog is required, along with other components, to provide a metadata management system for searching and understanding the features of all available information.³⁰ To push the analogy a bit further, it is sometimes argued that without a sophisticated metadata management system, there is always a danger of turning a data lake into a “data swamp.”³¹
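The relationship between raw storage and the data catalog can be sketched in a few lines. The class below is a toy model under stated assumptions: it uses a local temporary directory in place of cloud storage, a single JSON file as the catalog, and made-up payloads and metadata fields. It is not the design of any particular data lake product.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class MiniDataLake:
    """Toy data lake: raw bytes stored unchanged, described by a JSON catalog."""

    def __init__(self, root: Path):
        self.raw_dir = root / "raw"
        self.catalog_path = root / "catalog.json"
        self.raw_dir.mkdir(parents=True, exist_ok=True)
        if not self.catalog_path.exists():
            self.catalog_path.write_text("[]")

    def deposit(self, payload: bytes, **metadata) -> str:
        """Store the payload as-is and register its metadata in the catalog."""
        key = hashlib.sha256(payload).hexdigest()  # content-addressed key
        (self.raw_dir / key).write_bytes(payload)
        catalog = json.loads(self.catalog_path.read_text())
        catalog.append({"key": key, "bytes": len(payload), **metadata})
        self.catalog_path.write_text(json.dumps(catalog, indent=2))
        return key

    def search(self, **criteria) -> list:
        """Return catalog entries whose metadata match all given criteria."""
        catalog = json.loads(self.catalog_path.read_text())
        return [entry for entry in catalog
                if all(entry.get(k) == v for k, v in criteria.items())]

    def fetch(self, key: str) -> bytes:
        return (self.raw_dir / key).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    lake = MiniDataLake(Path(tmp))
    payload = b"protein\tfold_change\nexample\t1.4\n"  # made-up content
    key = lake.deposit(payload, assay="proteomics",
                       tissue="hippocampus", dose_mGy=500)
    lake.deposit(b"free-text lab notes", assay="notes", tissue="cortex")
    hits = lake.search(assay="proteomics", dose_mGy=500)
    restored = lake.fetch(hits[0]["key"])
```

Without the catalog, the stored objects are only opaque hashes, which is a small-scale version of the “data swamp” problem: the data are present but cannot be found or interpreted.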

The “provision” phase is an additional multithreaded layer, concerning data security and user access restrictions to a data lake. It is implemented through user authorization, data access control on different levels, and encryption capabilities to secure sensitive data, with standard protocols.
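Access control on different levels can be illustrated with a deny-by-default policy table. This is a minimal sketch: the roles, sensitivity labels, and access levels are hypothetical, and encryption and user authentication are deliberately left out.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2  # WRITE implies READ because levels are ordered


# Hypothetical policy: role -> {data set sensitivity label -> granted level}.
POLICY = {
    "analyst": {"public": Access.READ, "restricted": Access.NONE},
    "curator": {"public": Access.WRITE, "restricted": Access.READ},
    "admin": {"public": Access.WRITE, "restricted": Access.WRITE},
}


def is_allowed(role: str, label: str, needed: Access) -> bool:
    """Deny by default: unknown roles or labels are granted no access."""
    granted = POLICY.get(role, {}).get(label, Access.NONE)
    return granted >= needed


print(is_allowed("curator", "restricted", Access.READ))
print(is_allowed("analyst", "restricted", Access.READ))
print(is_allowed("guest", "public", Access.READ))
```

Keeping the policy as data rather than code means that access rules can be reviewed and changed without touching the analysis modules.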

The third module, “analytics,” evidently relates to the downstream data interpretation needs. In the analysis phase, we can use analytics engines such as Hadoop³² and Spark.³³ Well-established tools and techniques such as machine learning and network analysis may be applied to search for patterns among heterogeneous data sets. The cloud's capability for scalability with the on-demand use of computation-intensive clusters reduces processing times, while usage costs remain low as the pay-as-you-go charging model is usually applied.
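The programming model behind engines such as Hadoop can be shown on a single machine. The sketch below runs the map, shuffle, and reduce phases in one process over the first six rows of Table 1; it does not use the Hadoop or Spark APIs, and the function names are illustrative. On a cluster, the same phases would be distributed across nodes.

```python
from collections import defaultdict
from itertools import chain

# First six rows of Table 1 as (gene, species) pairs.
RECORDS = [
    ("GFAP", "Rat"), ("ICAM-1", "Mouse"), ("ATase", "Mouse"),
    ("N-CAM", "Rat"), ("ANXA5", "Human"), ("CASP3", "Human"),
]


def map_phase(record):
    """Emit one (species, 1) pair per record."""
    _gene, species = record
    yield species, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


mapped = chain.from_iterable(map_phase(record) for record in RECORDS)
records_per_species = dict(
    reduce_phase(key, values) for key, values in shuffle(mapped).items()
)
print(records_per_species)
```

Because each map call sees one record and each reduce call sees one key, both phases can be parallelized without changing the logic.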

The final module, “applications,” allows end users to interact with the enriched data sets in various ways, including batch or interactive queries, web APIs, and multipurpose notebooks. Finally, the research results can be presented clearly with web visualization dashboards and notebooks, such as Apache Zeppelin.³⁴

The implementation of this framework represents an optimal integrated approach for systems radiobiology and genomics, and beyond. The framework is batch-oriented for data processing, while it is possible to add appropriate modules to perform streaming analysis and real-time analytics. As these systems are not readily available, at least in the academic realm, we must admit that they represent a sort of wish list arising from our past joint efforts to address the data analytics phase of the Cerebrad project.^‡ Such efforts have been under way to link together various relevant archives for the field of radiobiology.^35,36

This data integration phase for Cerebrad included data from genomics, transcriptomics, proteomics, tissue specificity, developmental stages, dose and time of exposure, as well as standardized behavioral experiments with mice (Karapiperis et al., in preparation). Specifically, the total number of experiments addressing molecular mechanisms involved a range of exposure from 24 hours to 6 months, a range of doses from 0.1 to 1 Gy, for pre- and postnatal developmental stages, accurate tissue profiling for various brain tissues (hippocampus, cerebellum, cortex), and behavioral tests. The total number of genome-wide measurements was therefore of the order of 100 different profiles, with a varying degree of efficacy and success, given the limitations of in vivo studies and other factors. Yet, the data size for those results was of the order of gigabytes (∼10 GB); in future studies, and at the same cost, these data are expected to be of the order of terabytes (1000 GB), that is, 100 × larger—which will raise serious challenges for data analysis and interpretation.

Data Analysis

Indeed, data availability for LDIR effects on the brain, and other tissues or cell lines, has also been increasing dramatically. High-quality -omics and next-generation sequencing experiments under various conditions and time–dose combinations produce high volumes of data and new challenges for interpretation through the use of techniques based on big data analytics. Data sets from current experiments can be integrated with publicly available data resources to enrich existing results and obtain further novel insights related to this domain. All this information from several data sources can be represented as a network graph.³⁷

As an example, we provide a representation of proteomics experiments on the mouse brain,³⁸ enriched with additional information. The present case shows that irradiation with a single dose of 500 mGy may cause developmental neurotoxic effects in both male and female mice, manifested by a lack of, or reduced, capacity to habituate to an unfamiliar environment (behavioral phenotype). Moreover, a dose of 350 mGy seems to be a tentative threshold for induction of this type of neurotoxicity. For demonstration purposes and to reduce the complexity of visualization, we created three data-enriched networks with Cytoscape.³⁹ For the three doses shown (20, 500, and 1000 mGy, 24 hours after exposure), we have added information for gene and chromosome identity, and phenotypes (including those describing disease) (Fig. 3). Obviously, several data layers can be added as we see fit, for example, metabolic pathways, gene ontology terms, expression levels, tissue specificity, and others. The specific example illustration is original in the sense that phenotypes and chromosome information are added to the standard attributes (gene, dose, and time, fixed here at 24 hours) and is intended to sketch out how data integration and visualization can be coupled to aid interpretation of large-scale data, a common theme in data-intensive fields such as genomics.⁴⁰ This approach can help us understand hard-gained costly results and implicit relationships hidden in relevant connections, through exploratory data analysis.⁴¹

FIG. 3.

Visualization of three states, representing the experimental results discussed in Data Analysis section. Genes are depicted by red circles, and the chromosomes in which those genes are found are depicted by yellow circles, phenotypes are depicted by blue circles, while disease phenotypes are marked by open blue triangles. For instance, in the simple 20 mGy state, there are four networks covering six genes, which belong to four chromosomes (the top networks contain two genes each on the same chromosome, whereas the bottom simple networks with only two nodes contain genes with no association with known phenotypes or diseases). A more complex picture emerges at the 500- and 1000-mGy states: DOIDs are provided for disease phenotypes in these complex networks. For more information, please see Data Analysis section. DOIDs, disease ontology identifiers.

It is evident that certain genes influenced by exposure to 500 and 1000 mGy are known to be associated with certain human diseases, in contrast to those responding at 20 mGy, where there are no such cases (Fig. 3). Three genes are associated with diseases at the dose of 500 mGy: Cox6a1 (12q24), implicated in Charcot–Marie–Tooth disease recessive intermediate D⁴² (DOID:0110203); Tubb3 (16q24), involved in congenital fibrosis of the extraocular muscles⁴³ (DOID:0080143); and Chkb (22q13), implicated in megaconial-type congenital muscular dystrophy⁴⁴ (DOID:0110632). At 1000 mGy, there are four relevant genes: Cox6a1 (as above, corroborating this result; DOID:0110203); Aldh5a1 (6p22), associated with epilepsy,⁴⁵ among other diseases (DOID:1826); Gdi1 (Xq28), involved in nonsyndromic X-linked intellectual disability⁴⁶ (DOID:0050776); and Acta1 (1q42), associated with nemaline myopathy 3⁴⁷ (DOID:0110927).
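The disease associations listed above can be encoded as a small graph to show how such networks support simple queries. The sketch below is an illustration only: it covers just the disease-associated genes named in the text (not the full networks of Fig. 3), uses plain dictionaries rather than Cytoscape, and the function and variable names are assumptions.

```python
from collections import defaultdict

# (dose in mGy, gene, locus, disease ontology identifier), as given in the text.
ASSOCIATIONS = [
    (500, "Cox6a1", "12q24", "DOID:0110203"),
    (500, "Tubb3", "16q24", "DOID:0080143"),
    (500, "Chkb", "22q13", "DOID:0110632"),
    (1000, "Cox6a1", "12q24", "DOID:0110203"),
    (1000, "Aldh5a1", "6p22", "DOID:1826"),
    (1000, "Gdi1", "Xq28", "DOID:0050776"),
    (1000, "Acta1", "1q42", "DOID:0110927"),
]


def build_network(dose):
    """Undirected adjacency map linking each gene to its locus and disease."""
    graph = defaultdict(set)
    for record_dose, gene, locus, doid in ASSOCIATIONS:
        if record_dose != dose:
            continue
        for neighbour in (locus, doid):
            graph[gene].add(neighbour)
            graph[neighbour].add(gene)
    return graph


genes_by_dose = {
    dose: {gene for d, gene, _, _ in ASSOCIATIONS if d == dose}
    for dose in (500, 1000)
}

# Genes with a disease association at both doses.
shared = genes_by_dose[500] & genes_by_dose[1000]

print(sorted(shared))
print(len(build_network(500)), "nodes at 500 mGy")
print(len(build_network(1000)), "nodes at 1000 mGy")
```

Additional layers, such as pathways or expression levels, would enter as further node types attached to the same gene nodes.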

The above findings provide a basis upon which further biological investigations can be performed, thus extending our understanding of LDIR effects in a highly precise manner. This short account sketches how one can drill into big data of high complexity for systems radiobiology, and the knowledge extraction steps that help us focus on specific targets and their associated networks. To qualify for a big data perspective, readers need to imagine the above example multiplied 100-fold, with dozens of experimental conditions (time–dose), and multiple hundreds of genes, coupled with expression and variation information.

Conclusions

Certain individual molecules represented by genes or proteins that have been discovered by high-throughput biology experiments can serve as specific biomarkers for LDIR response in mammalian tissues, including the brain. The ultimate goal of molecular and computational systems biology is to generate objective detailed models of complex associations at the molecular level that can be correlated with the phenotypes under consideration, for health and disease management. For complex phenotypes, single-molecule patterns might indicate possible involvement of genes or proteins in a particular condition, but they might not suffice to provide the required specificity and robust validation elements for diagnosis, prognosis, and/or monitoring. We thus need to maintain a view toward a systems radiobiology approach and prepare for a phase of intense use of relevant tools and algorithms to analyze emerging large data sets. Future developments for big data analytics in radiobiology omics will require intelligent solutions and rules that establish a framework for openly sharing data resources on a large scale, akin to similar efforts.⁴⁸ Data resources such as StoreDB will have a central role to play in those efforts.²⁶

Despite progress on many fronts, key challenges remain. As mentioned above, we draw from brain radiobiology specifically—where difficulties are associated with the limited range of possible experiments and the analysis of animal behavior. Even if we suppose an explosion of useful data and research results, the implementation of better reporting processes can be a significant obstacle. Open science standards—first and foremost—will need to be adopted, with seamless information flow from publications and their data supplements to open data collections and databases.⁴⁹ Requirements for appropriate minimal information standards and wider community efforts should be encouraged, facilitating reproducibility and comparison across different experimental designs.⁵⁰ Finally, dedicated software platforms with some characteristics outlined above can be deployed and made available for the scientific community.

It is important to emphasize that many processes, tools, and techniques are already in a mature phase that facilitates and indeed encourages exploitation of big data approaches. Thus, systems radiobiology research teams should consider a closer association with data scientists, as the field is both amenable to and in need of high-performance computing and big data analytics. Commercial services currently offered by some of the largest cloud providers (Amazon,⁵¹ Google,⁵² IBM,⁵³ Microsoft⁵⁴) are indeed simplifying future implementations of the idealized framework presented here, and of similar incarnations and designs. In addition, implementation and operations will require knowledge of DevOps to ensure the proper functioning of all framework modules and to deliver analytical solutions for scientists on demand. These envisaged developments will necessarily have to follow progress elsewhere, in fields as diverse as radiation biology, systems biology, bioinformatics, and software engineering. For the latter, surprisingly, there is also a constant need for open science, that is, open access, open data, and open source.⁵⁵

It is somewhat paradoxical that some of the above issues have been raised before, in the data-intensive field of genomics, in particular data integration and re-annotation, prompted years ago.⁵⁶ This is indeed encouraging, as there is vast experience that can be readily adopted,⁴⁰ in the specific case of systems radiobiology. In conclusion, we believe that the field of radiation research and safety is now entering a new big data phase, where some of the considerations above will need to be endorsed widely, for a successful future.

Footnotes

Authors' Contributions

All authors have contributed toward the generation of data, analysis of results, and writing the article; they have approved the submitted version and endorsed the submission. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the article, and in the decision to publish the results.

Acknowledgments

We thank all Cerebrad partners for their input, suggestions, and feedback.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This work has been supported by the collaborative European project Cerebrad (Grant Agreement No. 295552), within the 7th EU framework programme, Nuclear Fission and Radiation Protection. C.A.O. acknowledges support by the project Elixir-Gr, implemented under the Action “Reinforcement of the Research & Innovation Infrastructure,” funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

Cite this article as: Karapiperis C, Chasapi A, Angelis L, Scouras ZG, Mastroberardino PG, Tapio S, Atkinson MJ, Ouzounis CA (2021) The coming of age for big data in systems radiobiology, an engineering perspective. Big Data 9:1, 63–71, DOI: 10.1089/big.2019.0144.

Abbreviations Used

References

Morgan

, Bair

. Issues in low dose radiation biology: The controversy continues. A perspective. Radiat Res. 2013; 179:501–510.

Tang

, Loke

. Molecular mechanisms of low dose ionizing radiation-induced hormesis, adaptive responses, radioresistance, bystander effects, and genomic instability. Int J Radiat Biol. 2015; 91:13–27.

Thompson

. Unmasking the truth: The science and policy of low-dose ionizing radiation. Bull At Sci. 2012; 68:44–50.

Beausejour Ladouceur

, Lawler

, Gurvitz

, et al. Exposure to low-dose ionizing radiation from cardiac procedures in patients with congenital heart disease: 15-year data from a population-based longitudinal cohort. Circulation. 2016; 133:12–20.

Yang

, Kong

, Wang

, et al. Low-dose ionizing radiation induces direct activation of natural killer cells and provides a novel approach for adoptive cellular immunotherapy. Cancer Biother Radiopharm. 2014; 29:428–434.

De Angelis

, Caldora

, Santaquilani

, et al. Radiation exposure of civilian airline crew members and associated biological effects due to the atmospheric ionizing radiation environment. Phys Med. 2001; 17 Suppl 1:258–260.

Kinoshita

, Sueki

, Sasa

, et al. Assessment of individual radionuclide distributions from the fukushima nuclear accident covering central-east Japan. Proc Natl Acad Sci U S A. 2011; 108:19526–19529.

Butler

. Radioactivity spreads in Japan. Nature. 2011; 471:555–556.

Nelson

, Simonsen

, Huff

. 2016. Risk of acute (in-flight) or late central nervous system effects from radiation exposure. Available online at https://humanresearchroadmap.nasa.gov/Evidence/reports/CNS.pdf (last accessed September 21, 2020).

10.

Parihar

, Allen

, Tran

, et al. What happens to your brain on the way to Mars. Sci Adv. 2015; 1: e1400256.

11.

Parihar

, Allen

, Caressi

, et al. Cosmic radiation exposure and persistent cognitive dysfunction. Sci Rep. 2016; 6:34774.

12.

Dicello

. The impact of the new biology on radiation risks in space. Health Phys. 2003; 85:94–102.

13.

Cucinotta

, Schimmerling

, Wilson

, et al. Uncertainties in estimates of the risks of late effects from space radiation. Adv Space Res. 2004; 34:1383–1389.

14.

Eriksson

, Buratovic

, Fredriksson

, et al. Neonatal exposure to whole body ionizing radiation induces adult neurobehavioural defects: Critical period, dose—Response effects and strain and sex comparison. Behav Brain Res. 2016; 304:11–19.

15.

Dauer

, Brooks

, Hoel

, et al. Review and evaluation of updated research on the health effects associated with low-dose ionising radiation. Radiat Prot Dosim. 2010; 140:103–136.

16.

Salomaa

, Jourdain

, Kreuzer

, et al. Multidisciplinary European low dose initiative: An update of the MELODI program. Int J Radiat Biol. 2017; 93:1035–1039.

17.

Stephens

, Lee

, Faghri

, et al. Big data: Astronomical or genomical?. PLoS Biol. 2015; 13: e1002195.

18.

Schofield

, Kulka

, Tapio

, Grosche

. Big data in radiation biology and epidemiology; an overview of the historical and contemporary landscape of data and biomaterial archives. Int J Radiat Biol. 2019; 95:861–878.

19.

Park

, Kwon

, Lee

, et al. Mapping the research trends on the biological effects of radiation less than 100 msv: A bibliometric analysis for 30 years publication. Int J Radiat Biol. 2019; 95:527–536.

20.

Haley

, Wang

, Wanzer

, et al. Past and future work on radiobiology mega-studies: A case study at argonne national laboratory. Health Phys. 2011; 100:613–621.

21.

Wong

. Technologies for integrating biological data. Brief Bioinform. 2002; 3:389–404.

22.

Moher

, Liberati

, Tetzlaff

, Altman

. Preferred reporting items for systematic reviews and meta-analyses: The prisma statement. PLoS Med. 2009; 6: e1000097.

23.

NCBI. Pubtator. 2019. Available online at https://www.ncbi.nlm.nih.gov/research/pubtator/ (last accessed September 21, 2020).

24. NextBio. Correlation engine. 2019. Available online at http://www.nextbio.com/b/nextbio.nb (last accessed September 21, 2020).

25. Karapiperis, Kempf, Quintens, et al. Brain radiation information data exchange (BRIDE): Integration of experimental data from low-dose ionising radiation research for pathway discovery. BMC Bioinformatics. 2016; 17:212.

26. Store_db. 2019. Available online at https://www.storedb.org/store_v3/ (last accessed September 21, 2020).

27. Fayyad, Piatetsky-Shapiro, Smyth. From data mining to knowledge discovery in databases. AI Mag. 1996; 17:37–54.

28. Khine, Wang. Data lake: A new ideology in big data era. In: Proceedings of 4th Annual International Conference on Wireless Communication and Sensor Network, Wuhan, China: WCSN2017, 2018.

29. Sadiku MNO, Olaleye, Musa. Data lakes: A primer. Adv Res Comp Sci Soft Eng. 2019; 9:47–50.

30. Grossman. Data lakes, clouds, and commons: A review of platforms for analyzing and sharing genomic data. Trends Genet. 2019; 35:223–234.

31. Hai, Geisler, Quix. Constance: An intelligent data lake system. In: SIGMOD ’16 Proceedings of the 2016 International Conference on Management of Data. 2016. pp. 2097–2100.

32. Taylor. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinformatics. 2010; 11 Suppl 12: S1.

33. Apache. Apache Spark™: Lightning-fast cluster computing. 2019. Available online at https://spark.apache.org/ (last accessed September 21, 2020).

34. Apache. Apache Zeppelin. 2019. Available online at https://zeppelin.apache.org/ (last accessed September 21, 2020).

35. Tapio, Schofield, Adelmann, et al. Progress in updating the European radiobiology archives. Int J Radiat Biol. 2008; 84:930–936.

36. Morioka, Blyth, Imaoka, et al. Establishing the Japan-Store House of Animal Radiobiology Experiments (J-SHARE), a large-scale necropsy and histopathology archive providing international access to important radiobiology data. Int J Radiat Biol. 2019; 95:1372–1377.

37. Ma'ayan, Rouillard, Clark, et al. Lean big data integration in systems biology and systems pharmacology. Trends Pharmacol Sci. 2014; 35:450–460.

38. Kempf, Casciati, Buratovic, et al. The cognitive defects of neonatally irradiated mice are accompanied by changed synaptic plasticity, adult neurogenesis and neuroinflammation. Mol Neurodegener. 2014; 9:57.

39. Shannon, Markiel, Ozier, et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13:2498–2504.

40. Yamada, Okada, Wang, et al. Interpretation of omics data analyses. J Hum Genet. 2020. DOI: 10.1038/s10038-020-0763-5.

41. Batch, Elmqvist. The interactive visualization gap in initial exploratory data analysis. IEEE Trans Vis Comput Graph. 2018; 24:278–287.

42. Tamiya, Makino, Hayashi, et al. A mutation of COX6A1 causes a recessive axonal or mixed form of Charcot-Marie-Tooth disease. Am J Hum Genet. 2014; 95:294–300.

43. Tischfield, Baris, Wu, et al. Human TUBB3 mutations perturb microtubule dynamics, kinesin interactions, and axon guidance. Cell. 2010; 140:74–87.

44. Sher, Aoyama, Huebsch, et al. A rostrocaudal muscular dystrophy caused by a defect in choline kinase beta, the first enzyme in phosphatidylcholine biosynthesis. J Biol Chem. 2006; 281:4938–4948.

45. Gupta, Polinsky, Senephansiri, et al. Seizure evolution and amino acid imbalances in murine succinate semialdehyde dehydrogenase (SSADH) deficiency. Neurobiol Dis. 2004; 16:556–562.

46. Bianchi, Farisello, Baldelli, et al. Cognitive impairment in Gdi1-deficient mice is associated with altered synaptic vesicle pools and short-term synaptic plasticity, and can be corrected by appropriate learning training. Hum Mol Genet. 2009; 18:105–117.

47. Nguyen, Joya, Kee, et al. Hypertrophy and dietary tyrosine ameliorate the phenotypes of a mouse model of severe nemaline myopathy. Brain. 2011; 134:3516–3529.

48. Amann, Baichoo, Blencowe, et al. Toward unrestricted use of public genomic data. Science. 2019; 363:350–352.

49. Boulton. Reproducibility: International accord on open data. Nature. 2016; 530:281.

50. Taylor, Field, Sansone, et al. Promoting coherent minimum reporting guidelines for biological and biomedical investigations: The MIBBI project. Nat Biotechnol. 2008; 26:889–896.

51. Amazon. Amazon Web Services (AWS): Cloud computing services. 2019. Available online at https://aws.amazon.com/ (last accessed September 21, 2020).

52. Google. Google cloud computing, hosting services & APIs, Google Cloud Platform. 2019. Available online at https://cloud.google.com/ (last accessed September 21, 2020).

53. Chen, Argentinis, Weber. IBM Watson: How cognitive computing can be applied to big data challenges in life sciences research. Clin Ther. 2016; 38:688–701.

54. Microsoft. Microsoft Azure cloud computing platform & services. 2019. Available online at https://azure.microsoft.com/ (last accessed September 21, 2020).

55. Méndez-Fernández, Graziotin, Wagner, Seibold. Open science in software engineering. arXiv. 2019; 1904.06499.

56. Ouzounis, Karp. The past, present and future of genome-wide re-annotation. Genome Biol. 2002; 3:COMMENT2001.