Artificial Intelligence as Accelerator for Genomic Medicine and Planetary Health

Abstract

Genomic medicine has made important strides over the past several decades, but as new insights and technologies emerge, the applications of genomics in medicine and planetary health continue to evolve and expand. An important grand challenge is harnessing and making sense of the genomic big data in ways that best serve public and planetary health. Because human health is inextricably intertwined with the health of planetary ecosystems and nonhuman animals, genomic medicine is in need of high throughput bioinformatics analyses to harness and integrate human and ecological multiomics big data. It is in this overarching context that artificial intelligence (AI), particularly machine learning and deep learning, offers enormous potentials to advance genomic medicine in a spirit of One Health. This expert review offers an analysis of the rapidly emerging role of AI in genomic medicine, including its current drivers, levers, opportunities, and challenges. The scope of AI applications in genomic medicine is broad, ranging from efficient and automated data analysis to drug repurposing and precision medicine, as with its challenges such as veracity of the big data that AI sorely depends on, social biases that the AI-driven algorithms can introduce, and how best to incorporate AI with human intelligence. The road ahead for AI in genomic medicine is complex and arduous and yet worthy of cautious optimism as we face future pandemics and ecological crises in the 21st century. Now is a good time to think about the role of AI in genomic medicine and planetary health.

Introduction

Although big data remain a cornerstone of medical research, there is a growing recognition that it is the meaningful use of big data—clinically, environmentally, and socially—that creates value for health and society. New analytics and sense-making tools that can harness big data for various applications have, therefore, recently emerged, epitomized with the expectations such as “digital transformation of medicine through machine learning” or “artificial intelligence redefining healthcare” (Arga, 2020; Özdemir and Hekim, 2018).

A complete understanding of the genome, its functions, and its links to disease is still a long way off. To address this critical bottleneck, artificial intelligence (AI) techniques, particularly machine learning and deep learning, offer great promise by providing novel predictive tools not inherent in traditional techniques for optimizing key problems in genomic medicine and planetary health (Leung et al., 2016; Lin and Wu, 2021).

Guiding clinical care of individuals and provision of personalized/precision medicine that optimize diagnostic or therapeutic decisions in tandem call for genomic medicine and planetary health (Arga and Sinha, 2021; Shendure et al., 2019). The main goal of high-throughput sequencing efforts, such as whole genome or whole-exome sequencing, often through next-generation sequencing (NGS) technologies, is to identify genetic variations, understand their associations with diseases or potential therapies, and in this way accelerate the annotation process and advance genomic medicine (Shendure et al., 2019).

Increasing efforts are being made to curate population-wide genome sequence data sets and elucidate the relationships of genomic data with phenotype and clinical outcomes. Integrative approaches that incorporate other types of multiomics data sets (e.g., transcriptomics, proteomics, lipidomics, and metabolomics) are also emerging (Karczewski and Snyder, 2018). Such integrative analyses of multiomics and clinical data make the concept of personalized/precision medicine possible in clinics, but also presents complex computational and sociological challenges. In addition, advances in biotechnology are enabling the rapid generation of highly tailored data sets to test new hypotheses. Single-cell resolution NGS and targeted gene editing tools to study specific gene disorders are the most recent examples (Way and Greene, 2018).

Human health is intimately linked to the health of other organisms and the shared environment and cannot be considered separately from planetary health, as envisioned by the concept of One Health (OIE, 2020). Human impact on climate and environment has become the dominant factor in the 21st century, we are facing pandemics and ecological crises due to the degradation of natural systems and disruption of environmental conditions and habitats. This leads to hard-to-solve vital problems such as climate change, zoonotic diseases, antibiotic resistance, food safety, vector-borne diseases, environmental pollution, and other health threats that affect humans, animals, and the environment together.

Consequently, it is clear that holistic approaches such as the One Health perspective and genomic medicine are at the center of the solution space. For example, in addition to understanding transmission cycles, elucidating mechanisms to prevent and mitigate transmission could be useful for future risk conditions associated with emerging zoonotic diseases, as in the case of COVID-19.

As a result of all these increasing concerns and developments, there is a growing need for computational approaches that can handle the analysis of complex, heterogeneous, and high-dimensional data sets and provide accurate solutions in a fast and cost-effective manner. In addition, diverse data, including molecular expression profiles, physiological measurements, medical imaging, and other clinically and environmentally relevant information, can be integrated into data sets, and some data types can also span many dimensions (Leung et al., 2016; Madabhushi and Lee, 2016). These aspects make the process more complex than could be covered by conventional computational approaches that use predefined rules.

The encouraging thing is that AI algorithms train and learn from data, create their own rules, and improve with experience in a continuous learning process. This innate advantage, not inherent in conventional statistical methods, allows AI to enable new discoveries, such as unrecognized patterns or hard-to-detect relationships, in these data sets without requiring explicit rules to perform these tasks. For these reasons, AI techniques are rapidly gaining attention in genomic medicine and planetary health.

Current and Emerging Applications of AI in Genomic Medicine

AI methods have already been applied to almost every aspect of the genomic data analysis pipeline, from presequencing to interpretation, and recent advances have focused on enabling improvements in existing techniques. These efforts offer a number of potential benefits that can help accelerate discovery in genomic medicine, either by directly facilitating the phases of clinical genomic analysis or by improving understanding of health- and disease-related genomic variation (Ho et al., 2019; Leung et al., 2016; Sundaram et al., 2018). In line with developments in recent years, machine learning and deep learning are helping to better identify variants in the genome, but understanding the significance of these variations remains a challenge. Therefore, most emerging activities in machine learning and deep learning within genomics are occurring in the analysis and interpretation phase (Dias and Torkamani, 2019).

Although previous efforts have achieved certain standards in genomic data analysis pipelines (Ko et al., 2020), there are still major challenges and limitations that need to be solved, including identifying hard-to-detect variants, predicting phenotypic effects of variants, handling extremely large amounts of genomic sequence data, and integrating genomic data with other relevant information, such as other molecular or clinical data. In contrast, machine learning and deep learning have provided accurate algorithms for identifying somatic and copy number variants (Ho et al., 2019; Sahraeian et al., 2019), and more precise AI tools have been developed for predicting the effects of genetic variants, such as the resulting effects on proteins or key molecular processes such as transcription, signal transduction, and metabolism (Dias and Torkamani, 2019; Sundaram et al., 2018).

Facilitated by increased computational power in recent years, the resurgence of deep learning has created new opportunities for analyzing massive data sets such as medical imaging data and electronic health records (EHRs). Significant progress has also been made in developing deep learning-driven tools for extracting clinical data from EHRs and integrating with genomic data sets (Esteva et al., 2019; Norgeot et al., 2019).

AI applications have also accelerated discovery in genomic medicine by improving our understanding of the relationship between genomic variation and diseases or disorders. These include studies to investigate cancer cell development and survival and to elucidate the genetic changes that drive tumor formation and growth (Sahraeian et al., 2019; Sundaram et al., 2018), to integrate and analyze genomic data along with other types of molecular and clinical data (Karczewski and Snyder, 2018; Zitnik et al., 2019), and to improve the efficiency and accuracy of gene editing techniques such as CRISPR (Kim et al., 2018; Leenay et al., 2019; Shen et al., 2018).

The popularity of and the growing investments in machine learning and deep learning methods for integrative multiomics data analysis are noteworthy and remain relevant to the current and future design of innovation ecosystems driven by AI (Fig. 1). To better elucidate the molecular mechanisms under the diseases or the causal relationships between environmental exposure and planetary health, integrative multiomics approach is a preferred strategy (Arga, 2019; Crandall et al., 2020; Gov and Arga, 2016; Gulfidan et al., 2021; Hasin, et al., 2017; Karczewski and Snyder, 2018; Koh and Hwang, 2019; Turanli et al., 2018).

FIG. 1.

Technological dimensions in designing innovation systems that can harness the full potential of AI in genomic medicine and planetary health. AI, artificial intelligence.

For example, the emergence of invasive species and the effects of climate change have particular implications for the planetary health and, therefore, a holistic approach to understanding how and why pathogenesis occurs under changing environmental conditions is critical for effective disease management (Crandall et al., 2020). Given the limitations of single-level omics data sets in comprehensively interpreting specific biological phenomena, studies are combining data from multiple omics technologies with clinical data and environmental monitors to find accurate solutions quickly and cost-effectively, and machine learning techniques are becoming increasingly important in analyzing these heterogeneous data sets (Ozer et al., 2020).

The potential to apply machine learning to all phases of genome-based drug development has motivated many pharmaceutical companies to invest resources in this area. Machine learning is being applied to genomic and other omics data sets for a variety of purposes, including disease subtype definition, disease biomarker identification, target discovery, drug repurposing, and drug response prediction (Ekins et al., 2019; Koromina et al., 2019; Turanli et al., 2021; Vamathevan et al., 2019; Zafeiris et al., 2018). Many large pharmaceutical companies have AI-focused research and development (R&D) initiatives or collaborations underway (Baldoni, et al., 2020).

Limitations and Grand Challenges

The application of AI has not yet led to significantly improved outcomes in genomic medicine, and the potential to explore genomic data sets remains largely untapped (Topol, 2019). To drive progress, several interrelated issues must be addressed, including data quality and accessibility, bias, replicability, infrastructure, and uncertainties in privacy, security, regulation, and clinical governance (Fig. 2).

FIG. 2.

Beyond the laboratory: societal, critical governance, and policy dimensions for AI-driven innovation ecosystems.

The accuracy of an AI technique is highly dependent on the reliability and quality of the training data. Therefore, data credibility is the primary bottleneck. The performance of AI algorithms is affected by the amount and quality of data used for development or training, so optimized access to high-quality genomic and health data is essential. However, big data not only increase the data volume but also of the errors and noise (Arga, 2019). Data sets are noisy, poorly annotated, and generally unstructured. Fundamental to curating reliable and high-quality training data sets are standards (universal protocols) for data quality, curation, interoperability, and accessibility.

In addition, monitoring and eliminating sources of bias within training data sets are also challenges that must be resolved to demonstrate validity and clinical utility. There are biases in the data sets that limit the representation of heterogeneity in society (Demirci et al., 2021). Some populations are underrepresented in data sets used to develop or train AI algorithms. Algorithmic biases may also arise due to the availability of data, the way data are processed and combined, the way questions are formulated, and pre-existing biases in society.

Replicability and reproducibility of results are another problem (Hutson, 2018). Replication and reproduction of the methods and results of AI tools are typically difficult and can be even more challenging in genomic data analysis, given the complexity of the various steps (i.e., presequencing, sequencing, data processing, analysis, and interpretation) in the data analysis pipeline. Implementation of a flexible scalable data processing infrastructure would streamline this process. It is even more challenging in multiomics R&D, the integration and interpretation of diverse large-scale omics data in a way that provides biological and clinical insights (Arga, 2019; Turanli et al., 2019). The difficulties often arise from the size and multiple different formats of the data sets generated at different omics levels on different platforms. It is important to ensure that the differences observed in the samples before integration are due to biological variability and are not a technical artifact embedded in the data.

Of course, these issues, in turn, pose challenges for privacy, security, and safety measures (Char et al., 2018; Özdemir, 2019a; Topol, 2019). Security is one of the most controversial aspects of AI applications. Along with the issue of genomics, concerns about security, confidentiality, and ethical use reach the highest levels in society. In addition, there are uncertainties about the regulatory status of AI algorithms used in clinical genomics. How algorithms are certified, how they are used for clinical decision making and patient management, liability for innocuous use, and compliance with transparency and explanation requirements within national regulations are unclear. These concerns need to be effectively managed and addressed to remove the risk of hindering the use and implementation of these technologies. Deciphering legal and regulatory implications and establishing accountability would help curb malicious or discriminatory use of AI.

Interwoven with nearly all of these considerations is the sociological importance of building (patient and health care professional) trust in AI systems (Özdemir, 2020). Given the magnitude of the interrelationships, it is unlikely that these issues can be addressed in isolation and entail the need for efforts to acquire expertise and foster collaboration. AI-driven science is also similar to other forms of science as well in that it requires consideration of democratic representation in technology and innovation to address, for example, historical power asymmetries, climate justice, and gender parity. Although ethical and legal dimensions are important, critical governance of the AI suite of technologies also require thinking about the politics of technology and innovation: who funds and designs AI studies, applications, to what ends, using which means? (Fig. 2).

Applying AI to genomics requires skills, expertise, data, and resources from multiple disciplines. No single discipline has a monopoly on the subject, but there is a significant need for interdisciplinary collaboration. Moreover, there may be a limit to the extent to which tacit human qualities and contexts can be mimicked or constructed by AI, pointing to the indispensable need for human intelligence, which can be defined as “a collection of contextual tacit knowledge about human values, responsibility, empathy, intuition, or care for another living being that cannot be readily described or performed by algorithms” (Özdemir, 2019b). Considering that it is not cutting-edge technologies but people's data-driven and value-laden decisions that are transforming society, developing rational ways to integrate AI with human intelligence is a must.

Conclusions

The number and range of applications for AI in genomic medicine are growing rapidly. AI has yet to transform genomic medicine, but it is contributing to important incremental improvements in the quality and accuracy of predictions along the genomic analysis pipeline. Despite its enormous potential, major challenges remain to be overcome if AI is to meet the high expectations placed on it to transform genomic medicine. The road ahead for AI in genomic medicine is complex and arduous and yet worthy of cautious optimism as we face future pandemics and ecological crises in the 21st century.

Footnotes

Acknowledgments

The views expressed are the personal opinions of the authors only. The scholarships under the YOK 100/2000 Doctoral Fellowship Program and 2211/A National PhD Scholarship Program under The Scientific and Technological Research Council of Turkey (TUBITAK) provided to Gizem Gulfidan are greatly acknowledged.

Author Disclosure Statement

The authors declare they have no conflicting financial interests.

Funding Information

No funding was received in support of this article.

Abbreviations Used

References

Arga

. (2019). Interview with Prof. K. Yalçın Arga: a pioneer of multi-omics science and health care innovation. OMICS, 23, 460–462.

Arga

. (2020). COVID-19 and the futures of machine learning. OMICS, 24, 512–514.

Arga

, and Sinha

. (2021). Recent developments in cancer systems biology: lessons learned and future directions. J Personal Med, 11, 271–273.

Baldoni

, Begoli

, Kusnezov

, and MacWilliams

. (2020). Solving hard problems with AI: dramatically accelerating drug discovery through a unique public-private partnership. J Commercial Biotechnol, 25, 42–48.

Char

, Shah

, and Magnus

. (2018). Implementing machine learning in health care – addressing ethical challenges. N Engl J Med, 378, 981–983.

Crandall

, Gold

, Jiménez-Gasco

MdM

, Filgueiras

, and Willett

. (2020). A multi-omics approach to solving problems in plant disease ecology. PLoS One, 15, e0237975.

Demirci

, Darendeliler

, Poyrazoglu

, et al. (2021). Monogenic childhood diabetes: dissecting clinical heterogeneity by next-generation sequencing in maturity-onset diabetes of the young. OMICS, 25, 431–449.

Dias

, and Torkamani

. (2019). Artificial intelligence in clinical and genomic diagnostics. Genome Med, 11, 70.

Ekins

, Puhl

, Zorn

, et al. (2019). Exploiting machine learning for end-to-end drug discovery and development. Nat Mater, 18, 435–441.

10.

Esteva

, Robicquet

, Ramsundar

, et al. (2019). A guide to deep learning in healthcare. Nat Med, 25, 24–29.

11.

Gov

, and Arga

. (2016). Integrative cooperation and hierarchical operation of microRNA and transcription factor crosstalk in human transcriptional regulatory network. IET Syst Biol, 10, 219–228.

12.

Gulfidan

, Beklen

, Sinha

, et al. (2021). Differential protein interactome in esophageal squamous cell carcinoma offers novel systems biomarker candidates with high diagnostic and prognostic performance. OMICS, 25, 495–512.

13.

Hasin

, Seldin

, and Lusis

. (2017). Multi-omics approaches to disease. Genome Biol, 18, 83.

14.

DSW

, Schierding

, Wake

, et al. (2019). Machine learning SNP based prediction for precision medicine. Front Genet, 10, 267.

15.

Hutson

. (2018). Artificial intelligence faces reproducibility crisis. Science, 359, 725–726.

16.

Karczewski

, and Snyder

. (2018). Integrative omics for health and disease. Nat Rev Genet, 19, 299–310.

17.

Kim

, Min

, Song

, et al. (2018). Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat Biotechnol, 36, 239–241.

18.

, Kim

, Cho

, et al. (2020). Bioinformatics services for analyzing massive genomic datasets. Genomics Inform, 18, e8.

19.

Koh

, and Hwang

. (2019). Multi-omics approaches for understanding environmental exposure and human health. Mol Cell Toxicol, 15, 1–7.

20.

Koromina

, Pandi

, and Patrinos

. (2019). Rethinking drug repositioning and development with artificial intelligence, machine learning, and omics. OMICS, 23, 539–548.

21.

Leenay

, Aghazadeh

, Hiatt

, et al. (2019). Large dataset enables prediction of repair after CRISPR-Cas9 editing in primary T cells. Nat Biotechnol, 37, 1034–1037.

22.

Leung

MKK

, Delong

, Alipanahi

, et al. (2016). Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE, 104, 176–197.

23.

Lin

, and Wu

. (2021). Digital transformation in personalized medicine with artificial intelligence and the internet of medical things. OMICS [Epub ahead of print]; http://doi.org/10.1089/omi.2021.0037

24.

Madabhushi

, and Lee

. (2016). Image analysis and machine learning in digital pathology: challenges and opportunities. Med Image Anal, 33, 170–175.

25.

Norgeot

, Glicksberg

, and Butte

. (2019). A call for deep-learning healthcare. Nat Med, 25, 14–15.

26.

OIE. (2020). One health. https://www.oie.int/en/forthe-media/onehealth/ Last viewed on September 1, 2021.

27.

Ozer

, Sarica

, and Arga

. (2020). New machine learning applications to accelerate personalized medicine in breast cancer: rise of the support vector machines. OMICS, 24, 241–246.

28.

Özdemir

. (2019a). The big picture on the “AI Turn” for digital health: the internet of things and cyber-physical systems. OMICS, 23, 308–311.

29.

Özdemir

. (2019b). Not all intelligence is artificial: data science, automation, and AI meet HI. OMICS, 23, 67–69.

30.

Özdemir

. (2020). Genomics, the internet of things, artificial intelligence and society. In: Applied Genomics and Public Health. Patrinos

, ed. San Diego, CA: Academic Press, 275–284.

31.

Özdemir

, and Hekim

. (2018). Birth of Industry 5.0: making sense of big data with artificial intelligence, “The Internet of Things” and next-generation technology policy. OMICS, 22, 65–76.

32.

Sahraeian

SME

, Liu

, Lau

, et al. (2019). Deep convolutional neural networks for accurate somatic mutation detection. Nat Commun, 10, 1041.

33.

Shen

, Arbab

, Hsu

, et al. (2018). Predictable and precise template-free CRISPR editing of pathogenic variants. Nature, 563, 646–651.

34.

Shendure

, Findlay

, and Snyder

. (2019). Genomic medicine-progress, pitfalls, and promise. Cell, 177, 45–57.

35.

Sundaram

, Gao

, Padigepati

, et al. (2018). Predicting the clinical impact of human mutation with deep neural networks. Nat Genet, 50, 1161–1170.

36.

Topol

. (2019). High-performance medicine: the convergence of human and artificial intelligence. Nat Med, 25, 44–56.

37.

Turanli

, Altay

, Boren

, et al. (2021). Systems biology based drug repositioning for development of cancer therapy. Semin Cancer Biol, 68, 47–58.

38.

Turanlı

, Karagoz

, Bidkhori

, et al. (2019). Multi-omic data interpretation to repurpose sub-type specific drug candidates for breast cancer. Front Genet, 10, 420.

39.

Turanli

, Karagoz

, Gulfidan

, Sinha

, Mardinoglu

, and Arga

. (2018). A network-based cancer drug discovery: from integrated multi-omics approaches to precision medicine. Curr Pharm Des, 24, 32.

40.

Vamathevan

, Clark

, Czodrowski

, et al. (2019). Applications of machine learning in drug discovery and development. Nat Rev Drug Discov, 18, 463–477.

41.

Way

, and Greene

. (2018). Bayesian deep learning for single-cell analysis. Nat Methods, 15, 1009–1010.

42.

Zafeiris

, Rutella

, and Ball

. (2018). An artificial neural network integrated pipeline for biomarker discovery using Alzheimer's disease as a case study. Comput Struct Biotechnol J, 16, 77–87.

43.

Zitnik

, Nguyen

, Wang

, et al. (2019). Machine learning for integrating data in biology and medicine: principles, practice, and opportunities. Inf Fusion, 50, 71–91.