Abstract
Omics data are multidimensional, heterogeneous, and high throughput. Robust computational methods and machine learning (ML)-based models offer new prospects to accelerate the data-to-knowledge trajectory. Deep learning (DL) is a powerful subset of ML inspired by brain structure and has created unprecedented momentum in bioinformatics and computational biology research. This article provides an overview of the current DL models applied to multi-omics data for both the beginner and the expert user. Additionally, COVID-19 will continue to impact planetary health as a pandemic and an endemic disease, with genomic and multi-omic pathophysiology. DL offers, therefore, new ways of harnessing systems biology research on COVID-19 diagnostics and therapeutics. Herein, we discuss, first, the statistical ML algorithms and essential deep architectures. Then, we review DL applications in multi-omics data analysis and their intersection with COVID-19. Finally, challenges and several promising directions are highlighted going forward in the current era of COVID-19.
Introduction
Multi-omics studies are vital to unpack the systems biology of COVID-19 (Soko et al., 2022). Because omics science is composed of big data, dimension reduction methods have been widely applied in data analysis (Meng et al., 2016). Machine learning (ML) and artificial intelligence (AI) offer new prospects at the intersection of COVID-19 and multi-omics systems science. Deep learning (DL) is a subset of ML that is poised to accelerate bioinformatics and computational biology research concerning COVID-19 and life sciences broadly.
DL is currently a rapidly developing frontier of ML and AI applications in the field of omics systems science. DL brings about an end-to-end mechanism through a multilayer network structure without human intervention (Su et al., 2020). DL techniques display robust performance in many real-world applications ranging from criminology to health care (Zhang et al., 2019).
Because COVID-19 is here to stay as an endemic disease after the pandemic course, unraveling its multi-omics basis through robust computational approaches is timely and bears significance for planetary health. This article provides an overview of the current DL models applied to multi-omics data for both the beginner and the expert user. We discuss, first, the commonly used ML algorithms and DL architectures, such as support vector machine (SVM), logistic regression (LR), random forest (RF), deep neural network (DNN), convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM), autoencoder (AE), and generative adversarial network (GAN). Next, we highlight the recent DL models applied to omics data and the relevant literature at the intersection of multi-omics and COVID-19. We conclude with an overview of the challenges and promising directions going forward in the current era of COVID-19.
Common ML and DL Models
The data analysis algorithms in ML aim to exploit meaningful correlations in data and to make predictions from historical data (Abiodun et al., 2018). As shown in Figure 1, ML models are divided into the following four major types (Pugliese et al., 2021): (i) supervised learning, in which outputs for new input data are predicted based on prelabeled input data; (ii) unsupervised learning, which is applied to unlabeled and unclassified data with the primary purpose of discovering and analyzing patterns in unsorted data (Pugliese et al., 2021); (iii) semi-supervised learning, a hybrid approach for labeled and unlabeled samples that eliminates the need for numerous labels; and (iv) reinforcement learning, which is suitable for sequential decision-making in which an agent attempts to reach a target in an uncertain and potentially complex situation (Mahmud et al., 2018).

Figure 1. Approaches in machine learning. Boxes that correspond to real-world health care applications include their references: Disease Prediction (Bowler et al., 2022), DNA methylation regression (Tian et al., 2019), Biomarker Gene Clustering (Pal et al., 2007), Pathway's pattern discovery (Huang et al., 2014), Microarray dimension reduction (Aziz et al., 2017), Human activity recognition (Zahin et al., 2019), Privacy-preserving clustering (Huang et al., 2016), Diagnostic clinical concept (Moeinizade et al., 2022), Robotic-assisted surgery (Kassahun et al., 2016).
In contrast to statistical ML methods, where features are manually engineered and supplied to the model, deep models extract features automatically without human intervention. Consequently, large data sets are required for effective feature extraction and meaningful correlation (Pirmoradi et al., 2020). The common models are summarized briefly below.
Random forest
RF (Schonlau and Zou, 2020) is a supervised ML algorithm that combines multiple decision trees to handle classification problems by majority vote and regression problems by averaging the trees' outputs. The algorithm employs an ensemble of trees in which each tree is grown from a random subset of the original set of attributes.
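The bootstrap-and-vote idea can be sketched in a few lines. The example below is a minimal illustration only, assuming NumPy is available and substituting depth-one trees (decision stumps) for full decision trees; the function names and the toy data are hypothetical and are not taken from Schonlau and Zou (2020).

```python
import numpy as np

def fit_stump(X, y, feature_ids):
    """Fit a depth-one tree using only the features in feature_ids."""
    majority = np.bincount(y).argmax()
    # Fallback: predict the majority class if no valid split exists.
    best = (np.inf, feature_ids[0], np.inf, majority, majority)
    for j in feature_ids:
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            l_lab, r_lab = np.bincount(left).argmax(), np.bincount(right).argmax()
            errors = np.sum(left != l_lab) + np.sum(right != r_lab)
            if errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, X):
    j, t, l_lab, r_lab = stump
    return np.where(X[:, j] <= t, l_lab, r_lab)

def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                     # bootstrap sample
        cols = rng.choice(d, size=n_features, replace=False)  # random attribute subset
        forest.append(fit_stump(X[rows], y[rows], cols))
    return forest

def predict_forest(forest, X):
    votes = np.stack([predict_stump(s, X) for s in forest])   # (n_trees, n_samples)
    # Majority vote across trees for each sample.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy data: features 0 and 1 carry the label, feature 2 is pure noise.
y = np.array([0] * 10 + [1] * 10)
X = np.column_stack([y, y, np.random.default_rng(1).normal(size=20)])
forest = fit_forest(X, y)
pred = predict_forest(forest, X)
accuracy = np.mean(pred == y)
```

Because every tree sees a different bootstrap sample and attribute subset, individual trees may disagree; the vote is what stabilizes the final prediction.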
Logistic regression
LR is an adaptation of linear regression for a binomial outcome variable. It can be an effective model for binary classification problems, estimating the relationship between the outcome and a set of independent variables that may range from categorical to continuous (Shipe et al., 2019).
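As a minimal sketch of how such a model is fitted, the example below passes a linear score through the sigmoid function and minimizes the mean log-loss by gradient descent. It assumes NumPy; the learning rate, step count, and toy data are illustrative choices rather than recommendations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_steps=200):
    """Fit weights and bias by gradient descent on the mean log-loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)          # predicted P(y = 1 | x)
        w -= lr * (X.T @ (p - y)) / n   # gradient of the log-loss w.r.t. w
        b -= lr * np.mean(p - y)        # gradient w.r.t. the bias
    return w, b

# One continuous predictor, binary outcome.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = fit_logistic(X, y)
probs = sigmoid(X @ w + b)
labels = (probs > 0.5).astype(int)
```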
Support vector machine
SVM is a robust algorithm capable of recognizing subtle patterns in complex data sets. It comes in two types, namely the support vector classifier (SVC) and support vector regression (SVR) (Nayak et al., 2015). The SVC is divided into three categories. First, the maximum-margin SVC establishes a linear decision boundary (a hyperplane) between two classes, chosen to maximize the margin so that the classifier generalizes well and predicts accurately. Many real-world problems, however, are not linearly separable and cannot be solved with a strictly linear boundary. Two further approaches address this: the soft-margin SVC, which tolerates some margin violations, and the kernel trick, which allows nonlinear boundaries to be formed (Suzumura et al., 2017). Kernel-based models are common for handling noisy data (Yu et al., 2021). SVR, in contrast, is used for regression problems ranging from linear to nonlinear and relies on an alternative loss function (Parbat and Chakraborty, 2020).
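The kernel trick can be illustrated on the classic XOR problem, which no straight line can separate. The sketch below assumes NumPy and, for brevity, trains a kernel perceptron rather than a full SVM solver; it is meant only to show how a radial basis function (RBF) kernel yields a nonlinear boundary while the learner touches the data solely through kernel values. The function names and the value of `gamma` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def fit_kernel_perceptron(K, y, n_epochs=10):
    """Learn one coefficient per training point using only kernel values."""
    alpha = np.zeros(len(y))
    for _ in range(n_epochs):
        for i in range(len(y)):
            score = np.sum(alpha * y * K[:, i])
            if y[i] * score <= 0:       # misclassified (or undecided) point
                alpha[i] += 1.0
    return alpha

# XOR labels: the two classes are not linearly separable.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
K = rbf_kernel(X, X)
alpha = fit_kernel_perceptron(K, y)
pred = np.sign((alpha * y) @ K)
```

An SVM would additionally choose the coefficients to maximize the margin; the way the kernel enters the decision function is the same.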
Deep neural network
The DNN model is a deep artificial neural network that contains an input layer, an output layer, and multiple hidden layers, which enable it to model complicated and nonlinear relationships. Each layer consists of neurons that receive inputs from the preceding layer, compute an output map, and pass it on to the next layer (Sze et al., 2017).
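The layer-to-layer flow described above amounts to repeated matrix multiplication followed by a nonlinearity. The following minimal sketch assumes NumPy, a rectified linear unit (ReLU) activation on hidden layers, and hand-picked weights chosen only so that the result is easy to check by hand.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Pass x through (W, b) pairs; ReLU on hidden layers, linear output."""
    for W, b in layers[:-1]:
        x = relu(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

layers = [
    (np.eye(2), np.zeros(2)),                     # hidden layer: 2 -> 2
    (np.array([[2.0], [3.0]]), np.array([0.5])),  # output layer: 2 -> 1
]
# Hidden activations are relu([1, -2]) = [1, 0], so the output is 2*1 + 3*0 + 0.5.
out = forward(np.array([1.0, -2.0]), layers)
```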
Convolutional neural network
CNN was inspired by a neurobiological model of the visual cortex of the human brain. The visual cortex comprises a map of local receptive fields that decrease granularity in the anterior of the cortex. It contains two types of cells: simple and complex cells. Simple ones respond to primitive patterns, whereas complex cells receive information from simple cells and recognize the complex form (Min et al., 2017). CNNs operate on high-dimensional inputs such as images, videos, and three-dimensional (3D) medical images. The basic model consists of three layers: convolution, pooling, and fully connected (Albaradei et al., 2021), in which the output of each layer could be an input for the next one (Min et al., 2017). A convolution activation layer, the first critical component, performs most of the calculations and is responsible for identifying features of input data through several small filters (Ravì et al., 2016).
A pooling layer, located between convolution layers, receives feature maps and reduces their size while conserving critical characteristics. Finally, a multilayer perceptron (MLP) network makes up the fully connected layer. This layer takes the input vector from the previous stage to explore global feature interactions (Su et al., 2020) and predicts the class of the data.
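The three building blocks (convolution, pooling, and a fully connected layer) can be traced on a tiny input. The sketch below assumes NumPy, a single 2x2 filter, and fixed all-ones weights, so it shows only the data flow and the shrinking feature map, not a trained network.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation, the 'convolution' used in most DL libraries."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (odd trailing rows/columns are dropped)."""
    h, w = fmap.shape[0] // 2, fmap.shape[1] // 2
    return fmap[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

def fully_connected(features, W, b):
    return features @ W + b

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((2, 2))
fmap = np.maximum(conv2d(image, kernel), 0.0)   # convolution + ReLU: 5x5 -> 4x4
pooled = max_pool2x2(fmap)                      # pooling: 4x4 -> 2x2
scores = fully_connected(pooled.ravel(), np.ones((4, 2)), np.zeros(2))
```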
Recurrent neural network
RNN analyzes sequential or time series data with a cyclic connection structure (Min et al., 2017). RNNs evaluate the current state's output depending on the previous steps and thus have a memory-like property, whereas traditional DNNs treat inputs and outputs as independent. Short-term memory allows RNNs to remember information from prior states (Mahmud et al., 2018). There are two inputs to this network, the present and the recent past, which combine to determine the current state. RNNs have applications in bioinformatics, such as generating medical image descriptions and modeling deoxyribonucleic acid (DNA) sequences (Su et al., 2020).
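The combination of "the present and the recent past" can be written as a single recurrence. The sketch below assumes NumPy, a tanh activation, and hand-picked scalar weights; it shows that two sequences ending in the same input still produce different final states when their earlier inputs differ.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Return the hidden state after each step of the input sequence xs."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in xs:
        # The new state mixes the present input with the previous state.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)

W_x = np.array([[1.0]])
W_h = np.array([[0.5]])
b = np.zeros(1)
seq_a = np.array([[1.0], [0.0]])   # signal at step 1, nothing at step 2
seq_b = np.array([[0.0], [0.0]])   # nothing at either step
final_a = rnn_forward(seq_a, W_x, W_h, b)[-1]
final_b = rnn_forward(seq_b, W_x, W_h, b)[-1]
```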
Long short-term memory
The LSTM network is an advanced version of the RNN (Sze et al., 2017) that overcomes its weaknesses. The major impediments to RNNs are the vanishing gradient issue and the resulting inability to handle long sequences of data and long-term dependencies (Yu et al., 2019). These dependencies make the training process difficult and can also cause gradient explosion problems, which LSTM could overcome. As a result, LSTMs can bridge long time lags of unknown size between significant events (Ravì et al., 2016).
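For reference, one common formulation of a single LSTM step is given below. The notation is an assumption introduced here and is not taken from the cited works: x_t is the input, h_t the hidden state, c_t the cell state, \sigma the logistic function, and \odot the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of c_t gives gradients a more direct path across time steps than the repeated squashing of a plain RNN, which is why information can persist over long lags.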
Deep belief network
A deep belief network is a powerful probabilistic generative model that combines stacked layers of latent variables with restricted Boltzmann machine (RBM) networks. Training involves two steps: layer-by-layer training refers to the unsupervised training of each RBM; fine-tuning refers to applying backpropagation on a small labeled data set, after the unsupervised training, to perform classification and other tasks (Sohn, 2021).
Autoencoder
AE is one of the most influential unsupervised neural networks (Yu et al., 2019); it attempts to reduce dimensionality and reconstruct the most crucial input features (Sarker, 2021). In the network structure, the encoder compresses the input, and the bottleneck controls the flow of information and permits only highly ranked information to pass. The decoder is the part of the network that decompresses the represented knowledge. It reconstructs the data passed through the bottleneck as a mirror image of the encoder and generates data with the same distribution as the input. In this regard, a small bottleneck decreases the risk of overfitting, but an extremely small one prevents useful information from passing through (Pirmoradi et al., 2020). Denoising, detecting anomalies in time series, detecting network penetration, and generating new data are some applications of AEs (Su et al., 2020).
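The effect of bottleneck size can be seen with a linear autoencoder. As a simplifying assumption, the sketch below does not train by backpropagation; it uses the closed-form optimum of a linear AE, which spans the same subspace as principal component analysis and is obtained here with a singular value decomposition in NumPy. The synthetic data are generated from two hidden factors, so a bottleneck of size two reconstructs them almost perfectly, whereas a bottleneck of size one loses information.

```python
import numpy as np

def fit_linear_autoencoder(X, bottleneck):
    """Closed-form linear encoder/decoder from the top principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:bottleneck]          # rows span the bottleneck subspace

def encode(X, mean, components):
    return (X - mean) @ components.T      # compress to the bottleneck

def decode(Z, mean, components):
    return Z @ components + mean          # mirror image of the encoder

rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))         # two hidden factors
X = latent @ rng.normal(size=(2, 6))      # six observed features

def reconstruction_error(bottleneck):
    mean, comps = fit_linear_autoencoder(X, bottleneck)
    X_hat = decode(encode(X, mean, comps), mean, comps)
    return float(np.mean((X - X_hat) ** 2))

err_wide = reconstruction_error(2)    # bottleneck matches the true latent size
err_narrow = reconstruction_error(1)  # bottleneck too small: information is lost
```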
Generative adversarial networks
GAN is a contemporary architecture introduced by Goodfellow et al. It has been used for semi-supervised and unsupervised learning. GAN consists of two competing networks: a generator and a discriminator (Goodfellow et al., 2020). As an unsupervised technique, the generator model tries to generate new examples from random noise Z. The generator has no access to the original data and therefore learns to generate fake data through interaction with the discriminator (Alqahtani et al., 2021). The discriminator, in contrast, receives both fake and real data and predicts whether each sample is real or fake. The networks are trained until the discriminator is fooled approximately half the time.
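The competition between the two networks is usually written as a minimax objective. In the notation below (an assumption introduced here), G is the generator, D the discriminator, p_data the distribution of real samples, and p_z the distribution of the noise z.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

For a fixed generator with sample distribution p_g, the optimal discriminator is D^*(x) = p_data(x) / (p_data(x) + p_g(x)). When the generator matches the data distribution, D^*(x) = 1/2 everywhere, which corresponds to the discriminator being fooled about half the time.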
Single Omics Data Analysis Using DL
Over the past few years, DL has gained a strong foothold in many precision medicine domains and addressed different challenges with high-throughput omics data. Omics studies begin with genomics, which assesses the genome and the genetic variants related to complex diseases using DNA sequencing and microarray data (Fig. 2a and Table 1) (Kaur et al., 2021). Notwithstanding rapid advancements in sequencing technologies, precisely calling variants in the genome is difficult. In this regard, Poplin et al. (2018) applied a CNN-based method, DeepVariant, which can learn statistical likelihoods to call genetic variation in aligned next-generation sequencing (NGS) reads. Because the learned model generalizes across genome builds and mammalian species, it allows nonhuman sequencing projects to benefit from a wealth of human data.

Figure 2. Application of DL techniques for omics data analyses.
Table 1. Leading Deep Learning Models for Multi-Omics Data Analyses
Abbreviations: 3D, three-dimensional; AE, autoencoder; CGAN, conditional GAN; CNN, convolutional neural network; CNV, copy number variation; CoA, coarctation of the aorta; Cox-PH, Cox proportional hazard; DDA, data-dependent acquisition; DIA, data-independent acquisition; DL, deep learning; DNA, deoxyribonucleic acid; DNN, deep neural network; EPIs, enhancer–promoter interactions; GAN, generative adversarial network; HRMAS, high-resolution magic angle spinning; LSTM, long short-term memory; MLP, multilayer perceptron; NAS, network architecture search; NMR, nuclear magnetic resonance; RF, random forest; RNA, ribonucleic acid; RNN, recurrent neural network; SVM, support vector machine; VAE, variational autoencoder; WGAN-GP, Wasserstein GAN gradient penalty.
In the same direction, a new design termed missense variant pathogenicity prediction (MVP) predicts the pathogenicity of missense variants through supervised learning. MVP uses ResNet, a deep residual neural network model introduced for computer vision, and improves on former models in prioritizing pathogenic missense variants (Qi et al., 2021).
Genome-wide association studies (GWAS) are an approach for identifying genetic variants associated with specific traits and diseases (Uffelmann et al., 2021). GWAS investigate the genomes of a large cohort to assess millions of single nucleotide polymorphisms (SNPs). Demontis et al. (2019) employed GWAS for attention deficit hyperactivity disorder (ADHD) prediction. GWAS is a practical method, but its effectiveness is reduced by insignificant SNPs, and it is sensitive to the choice of p value significance threshold (Sun et al., 2020). To this end, Liu et al. (2021) introduced a CNN-based binary classifier to predict ADHD based on SNPs. Additionally, the researchers nominated a number of potential risk genes for ADHD. The authors used three SNP sets selected with different p values, and the final results indicated that the set with the lower p value achieved the most accurate classification.
With the growth of massive genetic data, such as SNPs and high-quality time-to-event phenotypes, the application of DL approaches to survival prediction needs more attention. To this end, a survival framework compared three prediction models applied to GWAS data: the proposed DNN, random survival forest, and least absolute shrinkage and selection operator (LASSO).
A major component of disease is the disruption of transcription factor (TF) binding. TFs are proteins that control the transcription rate of genetic information by binding to specific DNA sequences (Zeng et al., 2020b). In this regard, MTTFsite offers a practical framework for predicting transcription factor binding sites (TFBSs) without labeled data using a multitasking algorithm (Zhou et al., 2019). Considering that some cell types have more labeled data than others, this framework employed a shared-private model to leverage the labeled data available. A shared CNN learns features common to all cell types, and a private CNN learns features specific to each cell type. The same article also proposed a new gene expression prediction technique termed TFChrome, which combines the TFBSs predicted by MTTFsite with histone modification features. It can predict gene expression for labeled and unlabeled cell types. TFChrome outperformed the method employing only histone modification features.
Multi-Omics Integration Using DL
A complex disease can be caused by cascades of modifications at different omics levels, such as gene expression dysregulation, DNA methylation, and proteomic changes. In this context, multi-omics integration combines multiple omics techniques to capture these biological aspects jointly. High-throughput NGS technologies have driven unprecedented growth in multi-omics data, which allows the molecular mechanisms of multiple diseases to be analyzed simultaneously (Nicora et al., 2020).
Chaudhary et al. (2018) proposed a DL model that successfully used ribonucleic acid (RNA) and miRNA sequencing data, methylation data, and clinical information to predict subgroups in liver cancer. The authors claimed that this was the first attempt to use DL methods to integrate multi-omics data sets for hepatocellular carcinoma (HCC), the most common type of liver cancer. They selected an AE architecture to integrate heterogeneous transcriptomics and epigenomics data; the bottleneck layer of the AE generates new features from the omics data. In addition, a supervised classifier, SVM, was used to distinguish healthy individuals from patients.
Francescatto et al. (2018) combined integrative network fusion, a bioinformatics technique incorporating similarity network fusion (SNF), with ML. The proposed framework identifies integrated multi-omics biomarkers based on predictive profiling and considers a new strategy to integrate genomics and transcriptomics data. RF and linear support vector machine (LSVM) classifiers were applied at three levels to achieve final biomarkers. First, they trained on direct concatenation multi-omics data. Then, the model learned the high-ranked feature rendered by SNF. Eventually, it learned the intersection outcomes of two previous steps and ranked features again to integrate multi-omics biomarkers.
Integrated multi-omics studies can significantly assess the complex molecular underpinnings of diseases, improving the biological interpretability of results and characterizing disease mechanisms and pathways at a detailed molecular level. Park et al. (2020) designed a DL model that merges DNN and Bayesian models to predict Alzheimer's disease. The authors used heterogeneous omics data sets, namely large-scale gene expression and DNA methylation data. Since both genetic and epigenetic alterations have an impact on gene expression regulation, Seal et al. (2020) established a system based on multi-omics integration. The authors employed a deep denoising autoencoder to extract features and an MLP to estimate gene expression from genetic and epigenetic data. The effectiveness of the DL method was demonstrated on data from The Cancer Genome Atlas (TCGA) liver HCC study.
A research team reported multi-omics late integration (Sharifi-Noghabi et al., 2019), which benefits from the RBM, to integrate epigenomics and transcriptomics data and predict drug response. The model includes feed-forward encoding subnetworks, one per omics data type, and each subnetwork takes its own data type and encodes it into a learned feature space. Each subnetwork is built from a fully connected layer that outputs the learned features for the corresponding omics type. The framework then applies late integration, concatenating the learned features into a single multi-omics representation.
Another deep framework, survival analysis learning with multi-omics neural networks (SALMON), integrates gene expression data (mRNA and miRNA) and cancer biomarkers to explore co-expression modules relevant to breast cancer (Huang et al., 2019). Before being input to the DNN, the original data are converted to an eigengene matrix inferred from co-expression network analysis, which significantly decreases the dimension of the feature space. SALMON places a Cox proportional hazard (Cox-PH) regression after the neural network layers. Thus, the high-ranked features selected by the neural networks, together with extra information such as age, tumor mutation burden, and copy number burden, are sent to the Cox-PH model to estimate survival probabilities.
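For readers unfamiliar with Cox-PH output layers, the standard formulation is sketched below. The notation is an assumption made here for illustration and is not taken from Huang et al. (2019), whose exact loss may differ: \eta_i denotes the risk score that the network produces for patient i, t_i the observed time, and \delta_i = 1 if the event was observed (0 if censored).

```latex
h(t \mid x_i) = h_0(t)\, \exp(\eta_i),
\qquad
\ell(\theta) = \sum_{i \,:\, \delta_i = 1}
  \Big[ \eta_i - \log \sum_{j \,:\, t_j \ge t_i} \exp(\eta_j) \Big]
```

Training maximizes the partial log-likelihood \ell(\theta) (equivalently, minimizes its negative), so the baseline hazard h_0(t) never needs to be specified.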
Omics-Based COVID-19 Diagnosis Using AI
COVID-19 is becoming endemic, so understanding the genetic characteristics of the virus and developing accurate identification techniques are imperative. AI has afforded new insights into the evolution of COVID-19. Because multi-omics analysis provides a comprehensive perspective on COVID-19, a two-level disease prediction model was implemented. The framework first distinguishes patients from control individuals and then predicts the severity of the disease. For the negative versus positive classification, several ML models such as SVM, RF, LR, k-nearest neighbors (KNN), RUS, and MLP were applied to single omics data types, namely DNA methylation and RNA-Seq. The severity prediction was then conducted with two strategies. In the first, the authors applied the aforementioned ML models to single omics data, including DNA methylation, proteomics, and metabolomics data. In the second, they applied an AE neural network to multi-omics data containing transcriptomics, proteomics, metabolomics, and lipidomics. The authors showed that multi-omics analysis outperformed single-type omics data (Liu et al., 2023).
Randhawa et al. (2020a) identified an intrinsic genomic signature of the COVID-19 virus using a combined ML with digital signal processing (MLDSP) model and its improved version, MLDSP-GUI, applying a decision tree algorithm as a supervised approach to classify a large data set of viral genomic sequences within a few minutes. Subsequently, Spearman's rank correlation coefficient analysis was used for the final evaluation (Randhawa et al., 2020b).
Studies have indicated that the response to the coronavirus differs between individuals, ranging from asymptomatic to critical. Additionally, confounding factors such as age, sex, and comorbidities make it difficult to identify mediators of COVID-19. Moreover, the molecular mechanisms in patients without comorbidities were unexplored. Therefore, researchers performed a range of multi-omics profiling on such a cohort to detect a critical gene for COVID-19 severity using an ensemble of ML, DL, quantum annealing, and structural causal modeling techniques (Carapito et al., 2022).
As genomic sequencing is a leading component in monitoring COVID-19, several studies report the use of AI for genome sequence analysis. Ahmed and Jeon (2022) comparatively studied the basic patterns of the genome sequences of COVID-19 and other viruses, namely SARS, MERS, and Ebola, and built an AI-based model for prediction. First, they analyzed the DNA sequences and then, using visualization, examined the nucleotide information and the genome sequence length for the four diseases. Finally, an SVM classified the genome sequences of COVID-19 and the other viruses.
Another hybrid algorithm combined a CNN with a bidirectional LSTM for two purposes. The first is to correctly classify SARS-CoV-2 sequences among coronaviruses. The second is to classify genome sequences to indicate candidate regulatory motifs. The model was evaluated extensively across various hyperparameters, different neural network models, and data sets, and the proposed model outperformed the alternatives (Whata and Chimedza, 2021).
Drug repositioning, also termed drug repurposing, provides new remedial prospects for known drugs and detects cures for untreated conditions (Jarada et al., 2020). This approach delivers a promising path for accelerating drug discovery and treatment strategies for COVID-19. To this aim, a network-based DL methodology named CoV-KGE (Zeng et al., 2020a) integrated Amazon Web Services (AWS) supercomputing resources and a DL framework. In the proposed model, the knowledge graph embedding (KGE) model RotatE trains vectors for entities to infer various relations from a global network of biomedical relationships (Percha and Altman, 2018) and from DrugBank, a comprehensive and free online resource including information about drugs (Wishart et al., 2018). After thoroughly investigating drug connections, gene expression, biological pathways, and proteins, the researchers identified repurposable drugs for COVID-19.
Alvarsson et al. (2016) developed a large-scale ligand-based virtual screening approach named ChemAI for drug discovery to discover new therapies for COVID-19. The authors applied a DNN model to the ZINC database, a collection of chemical compounds intended to represent biologically relevant chemical space, and to the DrugBank database to suggest a screening library of potential inhibitors of SARS-CoV-2. A similar study was introduced by Karki et al. (2021), which also employs the ZINC and DrugBank databases. It is a rapid screening method based on neural networks, named SSnet, merged with the Smina docking algorithm (Koes et al., 2013) to characterize protein–ligand interactions. The outcomes of this method expand the libraries of probable high-affinity compounds and offer possible clues for new drug discovery.
A group of researchers analyzed immune system protein interactome networks, scRNA sequencing, and AI networks to identify potential therapeutic targets for drug repurposing against COVID-19. They assessed immune system proteins to discover probable therapeutic targets that are significantly overexpressed in individuals with several immunopathologies. Then, a fully connected DNN, optimized with grid search, achieved the best classification of drug activity and was used to identify several approved drugs with the most favorable characteristics (Lopez-Cortes et al., 2021).
Another leading field in COVID-19 research is protein structure prediction, since nonsynonymous mutations may change the function of the resulting protein (Tiwari and Mishra, 2020). In this regard, Senior et al. (2020) developed AlphaFold, a model that trains a CNN to accurately predict the distances between pairs of residues, which reveal more detail about the structure than contact predictions. The same group developed AlphaFold2, a modern evolutionary ML approach quite different from AlphaFold. This bioinformatic tool predicts the 3D coordinates of all atoms of a protein from its amino acid sequence and multiple sequence alignments. The proposed network has had an extraordinary impact on protein structure prediction and extensively surpasses other methods (Jumper et al., 2021).
Challenges Faced by the Omics Data Analysis and DL
Even though omics technologies have collected massive amounts of biological data and have demonstrated many characteristics of primitive biological processes, analyzing these complex data remains a challenging task. A critical challenge is finding a suitable method that simultaneously investigates and extracts biological interactions among data (Grapov et al., 2018).
Omics data
Owing to some inherent characteristics of omics data, their analysis and interpretation face several problems in extracting meaningful insights, which are listed below.
Curse of dimensionality
This problem occurs when the number of features is much larger than the number of samples (Zhang et al., 2019). Developing a model that discovers significant patterns is then challenging because there are too few observations to train it correctly. Various methods exist to overcome this limitation, including feature selection, feature extraction, regularization (Kaur et al., 2021), and data augmentation (Wei et al., 2022).
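As one simple instance of feature selection, the sketch below keeps only the features with the highest variance across samples. It assumes NumPy; the function name, the toy matrix, and the choice of variance as the ranking criterion are illustrative, and real analyses often use supervised or model-based criteria instead.

```python
import numpy as np

def select_top_variance(X, k):
    """Keep the k features (columns) with the largest variance across samples."""
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:k])   # indices in original order
    return X[:, keep], keep

# Rows are samples, columns are features; real omics matrices have far more columns.
X = np.array([
    [1.0, 10.0, 5.0, 0.0],
    [1.0, 20.0, 5.1, 3.0],
    [1.0, 30.0, 4.9, 6.0],
])
X_reduced, kept = select_top_variance(X, k=2)
```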
Heterogeneous data
Heterogeneity refers to circumstances in which populations, samples, or results are statistically different. Omics data are usually obtained from various samples and thus have different data types and variable formats (Mirza et al., 2019). For example, two persons with the same cancer may have different biomarkers underlying their disease. As a result, identifying and prioritizing cancer-related genes is challenging. ML approaches such as clustering attempt to tackle this problem.
Missing value
Missing data problems occur when some values (observations or gene values) are not recorded or are lost for reasons such as low sensitivity or noise. Because collecting omics data is intrinsically noisy, missing data are common (Momeni et al., 2020). The available solutions fall into two main categories: imputation and removal of the affected data (Mirza et al., 2019).
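The two strategies can be contrasted on a small matrix. The sketch below assumes NumPy, encodes missing entries as NaN, and uses column-mean imputation as the simplest possible imputation rule; more elaborate imputation methods exist.

```python
import numpy as np

def impute_column_mean(X):
    """Replace each missing entry with the mean of its column's observed values."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def drop_incomplete_rows(X):
    """Remove every sample (row) that has at least one missing value."""
    return X[~np.isnan(X).any(axis=1)]

X = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 8.0]])
imputed = impute_column_mean(X)      # keeps all three samples
complete = drop_incomplete_rows(X)   # keeps only the two complete samples
```

Imputation preserves the sample size at the cost of introducing estimated values, whereas removal keeps only observed values but shrinks an often already small data set.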
Deep learning
Although DL technologies have been applied successfully in a wide range of areas, several challenges exist, as mentioned subsequently.
Large volumes of data
Deep models require big data to avoid overfitting. If the data set is too small, deep models may perform worse than statistical ML algorithms. Like the human brain, an artificial neural network needs much data and experience to learn precisely and infer knowledge well. For instance, the small number of samples in gene expression microarray data sets can cause problems in data analysis with deep models. An efficient way to address this issue is to generate new data, which can be done in two ways. The first is to employ GAN models to synthesize new data based on the existing data distribution. The second is to apply slight transformations to the data, such as image rotation or flipping (Albaradei et al., 2021).
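The second strategy, slight transformations of existing samples, is straightforward for image-like data. The sketch below assumes NumPy and a tiny 2x2 array standing in for an image; the function name is hypothetical.

```python
import numpy as np

def augment_image(image):
    """Return the original image plus flipped and rotated copies."""
    return [
        image,
        np.fliplr(image),   # horizontal flip
        np.flipud(image),   # vertical flip
        np.rot90(image),    # 90-degree counterclockwise rotation
    ]

image = np.array([[1, 2], [3, 4]])
augmented = augment_image(image)
```

Each transformed copy keeps the label of the original sample, so one labeled image yields four training examples here.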
Overfitting
When a model is trained for too long or its design is too complex, it can fit not only the training data but also the noise in omics data. Such a model will not predict new data well, and its performance will decrease significantly (Pirmoradi et al., 2020). Data augmentation, dropout, feature selection, early stopping, and a well-designed model are used to address overfitting.
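Of these remedies, early stopping is the easiest to express in code. The sketch below is framework-independent and assumes that a validation loss is recorded after every epoch; the function name, the `patience` parameter, and the loss values are illustrative.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:      # no improvement for `patience` epochs
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
best_epoch, stop_epoch = early_stopping_epoch(val_losses)
```

In this toy sequence, training halts at epoch 4 and the weights from epoch 2 would be kept; the later improvement at epoch 6 is never reached, which illustrates the trade-off involved in choosing the patience.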
Hyperparameter tuning
Unlike parameters, which are determined automatically during learning, hyperparameters must be defined manually before learning starts. Although not all hyperparameters affect the learning process equally, finding appropriate values for them is of paramount importance. Notably, because hyperparameters are sensitive and small changes can substantially alter the results, using optimization algorithms to set them accurately is highly recommended (Min et al., 2017).
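Grid search is the simplest such optimization strategy: every combination of candidate values is evaluated, and the best one is kept. The sketch below uses only the Python standard library; `toy_validation_score` is a hypothetical stand-in for training a model and returning its validation score, and the candidate values are arbitrary.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate every combination in grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(learning_rate, hidden_units):
    # Hypothetical stand-in for "train the model, return validation accuracy".
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(hidden_units - 64) / 1000

grid = {"learning_rate": [0.001, 0.01, 0.1], "hidden_units": [32, 64, 128]}
best_params, best_score = grid_search(toy_validation_score, grid)
```

The number of evaluations grows multiplicatively with each added hyperparameter, which is one reason tuning contributes to the computational cost of deep models.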
The black box nature
The behavior of DL algorithms has generally not been established by rigorous mathematical proof. DL models are usually developed through a trial-and-error process, and their internal logic is hidden. Hence, the underlying principles remain imperceptible even for optimal solutions (Kaur et al., 2021).
Computational complexity
Deep models are more complex than statistical ML and traditional models. As a result, they have a longer training time. The deep models have numerous parameters and hyperparameters for tuning, and adjusting them should be performed in complex processes. Thus, they require heavy mathematical computations and robust systems (Kaur et al., 2021).
Conclusions
AI has provided opportunities for bioinformatics scientists to improve machines' abilities. In this context, DL is a promising approach for analyzing complex and heterogeneous data. Moreover, omics technologies comprehensively describe biological molecules and have broad applications; they can be applied to clarify the relationship between multi-omics data and COVID-19 outcomes. Therefore, we focused on various ML-based techniques for the diagnosis of COVID-19 based on omics data. In this survey, we first discussed widely used ML algorithms and DL architectures. Then, we summarized leading DL techniques applied to single-omics and multi-omics data and to COVID-19 analysis. Finally, we highlighted several potential challenges. The survey is intended for both beginner and expert users interested in incorporating AI with omics data, and it can expand computer researchers' insights into omics data analysis.
Future work is required to establish comprehensive ensemble models for multi-omics integration and to employ diverse methods in combination to obtain better predictive performance. Since multi-omics data suffer from the curse of dimensionality, developing dimension reduction approaches is a promising future direction. For instance, hybrid feature selection, as a novel technique, can combine the advantages of multiple models. Moreover, data augmentation, currently a very active area of AI, can compensate for small sample sizes relative to a very large feature space by generating realistic synthetic samples. The pathophysiology of the coronavirus and its clinical phenotypes is still an open issue. Therefore, more robust computational models for investigating the intersection of COVID-19 and multi-omics data are essential to unravel this complicated network.
Footnotes
Acknowledgments
The authors thank the anonymous reviewers and the editor for their valuable and constructive suggestions to improve the article.
Authors' Contributions
B.J.: Conceptualization (lead), writing—original draft (lead), formal analysis (lead), writing—review and editing (equal). H.T.: Conceptualization (supporting), writing—review and editing (equal). A.R.: Methodology (lead), original draft (supporting), writing—review and editing (equal).
Author Disclosure Statement
The authors declare they have no conflicting financial interests.
Funding Information
No funding was received for this article.
