Transforming cutting-edge healthcare: Emerging trends in metabolomics and drug design using artificial intelligence and big data methodologies

Abstract

Artificial intelligence is a disruptive field that is transforming healthcare technology by analyzing clinical workflows, sharpening diagnostics, and improving precision medicine. The goal of this review is to identify the key features influencing the adoption of artificial intelligence methodologies in advanced solutions for metabolomics and drug design. For clinical and translational settings, the review examines legal and ethical principles to highlight the significance of omics analysis and drug design in the application of artificial intelligence tools in healthcare, their real-world uses, and the difficulties tied to societal and regulatory issues. The real-world effects of artificial intelligence on researchers and technicians can guide tailored strategies for improving high-dimensional data analysis in metabolomics and drug design. As artificial intelligence methodologies continue to evolve, efforts must be directed toward structured frameworks that uphold human oversight and engagement to optimize the utility of artificial intelligence algorithms and big data methodologies. The key contributions of this study are a comprehensive overview of cutting-edge artificial intelligence methodologies and software programs in metabolomics and drug design, and critical perspectives that can shape future directions in the development of algorithmic approaches bridging metabolomics and drug design.

Keywords

artificial intelligence, big data, precision medicine, drug design, software

Introduction

Precision medicine is a paradigm-shifting approach that processes the vast data of individual variations based on patients’ genetics and lifestyles to detect diseases and manage lifelong health.¹ As foundation tools, artificial intelligence technologies can identify complex biological, chemical, pharmacological associations of features in multi-omics data which represent patients’ individual variations²; features can represent metabolites, biochemical variables or environmental exposures, and interactions of features can indicate co-existing elements, activators or inhibitors. Artificial intelligence and big data methodologies that explain interrelated structures in real-world data should be developed and extended. Artificial intelligence algorithms in the analysis of high dimensional data can illustrate the interactions of metabolites and furthermore metabolic pathways related to the complex mechanisms used to predict targeted outcomes such as absorption, distribution, metabolism, excretion, and toxicity.³ Analyzing interactions of features in high dimensional data is challenging since big data includes large volume of data with high complexities.⁴ Effective analysis and valid statistical inference of high dimensional biomedical data followed by significant interpretation of analysis results should be devised. The application of feature engineering with statistical analysis of high dimensional features to predict outcomes can be significantly considered. Artificial intelligence techniques with statistical analysis can help to analyze unknown or complex interactions of biomedical features in a dimensionality reduction framework. Available artificial intelligence tools have broadly been applied to genomic data,^5,6 but rarely to metabolomics for drug design. Precision medicine requires well-organized data collection and processing in metabolomics and drug design. 
As a fundamental tool with the profiling of small molecules, metabolomics has been used to obtain biomedical information and illustrate the properties of collected and processed metabolomics data.⁷

Applying artificial intelligence methodologies to high-dimensional metabolomics data with complex structures can build predictive models for data analysis with biomedical interpretation.⁸ Artificial intelligence in drug design supports effective and efficient solutions by identifying the interactions of features and predicting various outcomes in pharmacological and pharmaceutical science, including target identification, drug repurposing, disease diagnosis, market analysis and clinical studies.⁹ Recent studies have concentrated on artificial intelligence methods in high-dimensional biomedical data modeling through extensive computational and technological resources.¹⁰ Artificial intelligence techniques are applied in biological systems to learn from data without specific prior knowledge and to build reproducible and efficient models.¹¹ To effectively analyze high-dimensional biomedical data whose complexity arises from biological, chemical or pharmacological factors, artificial intelligence methods can be expanded to facilitate supervised, unsupervised, and reinforcement learning.¹² The choice of artificial intelligence methodology depends on the type and complexity of the biomedical data and on the efficiency of the learning algorithm. Researchers conduct classification or regression with dimensionality reduction, clustering and feature selection to improve hypothesis-driven systems and evaluate various big data technologies.¹³ In data preprocessing, traditional statistical methods begin with a hypothesis based on a probabilistic distribution of the data, while artificial intelligence methods develop predictive models directly from the data without requiring rigid assumptions about probabilistic distributions. Traditional statistical techniques work well with low-dimensional datasets where the number of observations exceeds the number of features, but artificial intelligence methods can be used to handle complex high-dimensional data such as genomics, transcriptomics, proteomics, metabolomics, texts or medical imaging. In the analysis of latent interactions of features in biomedical data, traditional statistical models are highly interpretable with respect to the clinical associations between features, whereas artificial intelligence models can provide better predictive accuracy and model efficiency, possibly with lower interpretability in decision-making processes (Table 1).¹⁴

Table 1.

Artificial intelligence algorithms in biomedical applications.

Learning category	Key algorithms	Applications
Supervised Learning	Decision trees, Random forest, K-nearest neighbors, Regression, Neural networks (Convolutional Neural Networks, Graph Neural Networks, etc.), Support vector machine, Boosting	Disease classification (e.g. heart disease,¹⁵ diabetes,¹⁶ cancer¹⁷), biomedical text classification,^18,19 identification of biomarkers,²⁰ medical image analysis for lesion segmentation,^21–23 Convolutional neural networks in metabolomics,²⁴ Graph neural networks for molecular property prediction,²⁵ AI-based metabolic pathway modeling²⁶
Unsupervised Learning	K-Means Clustering, Hierarchical Clustering, Gaussian Mixture Models, Principal Component Analysis, t-SNE (t-distributed stochastic neighbor embedding), UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction), Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs)	Identification of cell types,^27,28 detection of genetic ancestry,²⁹ extracting rare cell subpopulations with immune responses,³⁰ disease diagnosis or staging with image analysis,^31–33 Generative models (GANs, VAEs) for drug design^34,35
Reinforcement Learning	Q-Learning, SARSA (State-Action-Reward-State-Action)	Disease classification,³⁶ Healthcare resource scheduling^37,38

Metal toxicity can arise from environmental exposure to heavy metals such as cadmium or lead, causing deleterious effects on the function of the biological system.³⁹ To examine metabolomics affected by heavy metals, many studies are conducted in biological model systems.^40,41 This study proposes the utilization of machine learning algorithms to identify the effect of heavy metals on the metabolites in biological entities. The algorithms estimate the association of strongly interacting metabolites and classify the exposure to cadmium or lead.⁴² The data analysis results can be corroborated against metabolomic associations reported in the medical literature, with artificial intelligence models used for high-throughput analysis in the biological model system.⁴³ The interaction of metabolites in high-dimensional metabolomics can occur in the presence of common environmental exposures such as trace elements.⁴⁴ The use of high-dimensional metabolomics data in biology is still at a nascent stage. Because large-scale data have accumulated, big data analytics tools are strongly expected to improve the classification of target health outcomes in biology. Big data analytics in metabolomics strengthens the development of classification models that can effectively and efficiently determine whether strongly interacting metabolites can estimate a health outcome.⁴⁵ Depending on the structures of the metabolomics data and the classification model, biological mechanisms can be thoroughly investigated.
This study explores the network of metabolites related to the effect of environmental exposures such as trace elements; the network is formed using machine learning algorithms with statistical pre-processing of the metabolomics data.⁴⁶ The advantage of developing and utilizing big data analytics tools is the ability to visualize the associations of interconnected metabolites and classify the existence of environmental exposures in metabolomics.⁷ Big data analytics tools enable biomedical researchers to identify biological pathways and, possibly, to understand the biological mechanisms of metabolism.⁴⁷ Studies on metabolomics have been implemented in order to develop existing artificial intelligence methodologies for application to complex networks of metabolites related to health conditions.⁴⁸ Extensively analyzing high-dimensional datasets with complex structures requires computational and statistical methods to investigate the biological pathways of metabolomic data. Missing values in biomedical data may have multiple causes, such as insufficient or unrelated answers to survey inquiries or numerous experiments with inconsistent results from the complex data.⁴⁹ When a network model is assumed and built for complex but sparse biomedical data, extensive datasets with complex structures can decrease the computational efficiency of data analysis in practice.⁵⁰ Below, an example is shown regarding the applications of artificial intelligence methodologies, including principal component analysis and network analysis based on LASSO, for the analysis of zebrafish metabolomics, which is possibly affected by environmental exposures such as metal toxicity. This example demonstrates that, to analyze large and complex biomedical data (e.g. zebrafish metabolites) and predict a specific outcome (e.g. exposure level), not only a single artificial intelligence method (e.g. PCA) but also additional methods including network analysis can be utilized to obtain an improved classification result. Various software programs which can solve such problems are listed in the later sections.

As shown in Figure 1, principal component analysis alone does not clearly separate exposed zebrafish because the classified groups are tightly clustered. In this case, network analysis, which identifies the interactions of metabolites, can be used in addition, as shown in Figure 2.
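One common way to build a LASSO-based undirected network is node-wise regression (neighborhood selection): each metabolite is regressed on all the others with an L1 penalty, and an edge is kept when a coefficient survives. The source does not state which LASSO variant was used, so the following is only a minimal sketch under that assumption. The abundance values, the penalty `lam`, and all function names are hypothetical, and the zebrafish data are not reproduced here.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; the core operation of LASSO."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    X is a list of rows whose columns are centered; y is centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            scale = 0.0
            for i in range(n):
                # Prediction from every feature except j.
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                scale += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (scale / n) if scale else 0.0
    return beta


def lasso_network(data, lam):
    """Return undirected edges (i, j) kept when either node-wise fit selects the other node."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    edges = set()
    for target in range(p):
        others = [j for j in range(p) if j != target]
        X = [[row[j] for j in others] for row in centered]
        y = [row[target] for row in centered]
        for j, b in zip(others, lasso_cd(X, y, lam)):
            if b != 0.0:
                edges.add((min(target, j), max(target, j)))
    return sorted(edges)


# Hypothetical abundances: metabolite 1 tracks metabolite 0, metabolite 2 is unrelated.
m0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
m1 = [2.0 * v for v in m0]
m2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
samples = [list(row) for row in zip(m0, m1, m2)]
edges = lasso_network(samples, lam=0.5)
print(edges)
```

In this toy case only the pair of co-varying metabolites is connected. A larger `lam` would prune more edges, which is how the penalty controls network sparsity.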

Figure 1.

Classification of exposed and not exposed metabolites of zebrafish using principal component analysis. PCA failed to separate exposed vs. non-exposed metabolites, indicating high feature interdependence; network analysis resolved connectivity and exposure patterns.

Figure 2.

Interactions of metabolites in zebrafish models. LASSO method was applied to build networks of metabolites in an undirected graphical model.

Effectively and efficiently utilizing appropriate artificial intelligence techniques with network analysis and statistical inference is important because big data are large and include complex interactions of factors. Thus, this review surveys the major artificial intelligence methodologies applied in metabolomics studies that can also be applied to biomedical data in drug design, with different levels of predictive accuracy or model performance. Alongside the model performance and efficacy of artificial intelligence methods in drug design, major challenges and limitations including data security, legal regulations, and ethical issues will be discussed to optimize big data analytics in biomedical sciences and healthcare.⁵¹

In drug design with metabolomics studies, alterations in metabolic patterns of biomedical data can reveal mechanisms of absorption, distribution, metabolism, excretion or toxicity.⁵² Recent studies using metabolomics to identify environmental toxicity biomarkers show how a potentially harmful chemical entity can be removed during the development process. These studies demonstrate that environmental exposure to various toxicants, including heavy metals, can lead to changes in oxidative stress responses or in amino acid and lipid storage processes. The main goal of the research is depicted in Figure 3.

Figure 3.

How AI integrates with omics data, network analysis, and clinical decision-making.

The search strategy of this review was a comprehensive literature search of relevant articles in the Google Scholar, PubMed, and Web of Science electronic databases. Most of the included publications date from 2010 to 2025. The database search terms included “big data”, “metabolomics”, and “drug design”, which are Medical Subject Headings (MeSH) terms included in the keyword search.

The objective of this review is to convey a comprehensive overview of cutting-edge artificial intelligence models and software programs for metabolomics and drug design with perspectives on challenges and future research directions. This review will organize the current literature to offer researchers possibilities in the most recent trends and advancements in application of artificial intelligence in metabolomics and drug design.

Artificial intelligence techniques

In biomedical research, supervised learning covers machine learning techniques that identify approximated functions linking input data to a continuous or categorical output through computational and statistical validation. Labeled training data can be used to evaluate a collection of training observations to derive a generalized prediction function in text or imaging data. Unsupervised learning analyzes biomedical data sets without labels, involving clustering, feature extraction, and dimensionality reduction.⁵³ Reinforcement learning methods involve rewards or penalties by leveraging big data gathered from the agent’s environment to increase the reward or reduce the risk. A robust tool for training artificial intelligence models can improve automation or the effectiveness of complex systems in biomedical research. Analytics of high-dimensional data in healthcare can be utilized on structured, semi-structured, or unstructured information.⁵⁴ Structured data in orderly format is systematically organized and easily accessible by software applications while unstructured data mainly consists of text or imaging elements. Semi-structured data is not stored in a relational database like structured data is, but metadata pertains to information regarding large-scale data which specifies key details about the data. Regarding data preprocessing, this review will address data management in large-scale healthcare analytics by emphasizing the significance of data quality and standardization with data governance frameworks and standardized protocols for data collection and reliability.⁵⁵

Supervised learning

The decision tree is a non-parametric supervised learning technique applied to both classification and regression; it analyzes each instance by traversing from the root down to a leaf node.⁵⁶ Decision trees can be used to identify drug metabolism and pharmacokinetic mechanisms for new chemicals. Since chemical data including metabolism and pharmacokinetic mechanisms involve complex structures of biological or chemical entities, building a decision tree of appropriate size is crucial to avoid overfitting and to save processing time in big data analytics.⁵⁷ A decision tree can be grown until it perfectly classifies the training outcomes in metabolomics and drug data, but early stopping during model building can be important for obtaining an optimized artificial intelligence model.⁵⁸
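The effect of tree size can be illustrated with a small single-feature tree whose growth is capped by a depth limit. This is a minimal sketch, assuming Gini impurity as the split criterion (the source does not name one). The descriptor values, the labels, and the `max_depth` parameter are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best split, or (None, parent impurity)."""
    best = (None, gini(ys))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


def build_tree(xs, ys, depth=0, max_depth=2):
    """Grow a tree on one feature; stop early at max_depth or when the node is pure."""
    majority = Counter(ys).most_common(1)[0][0]
    if depth >= max_depth or gini(ys) == 0.0:
        return {"leaf": majority}
    threshold, _ = best_split(xs, ys)
    if threshold is None:
        return {"leaf": majority}
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": build_tree([x for x, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([x for x, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, x):
    """Traverse from the root to a leaf and return the leaf label."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical descriptor values with a binary metabolic outcome.
xs = [1, 2, 3, 7, 8, 9]
ys = ["slow", "slow", "slow", "fast", "fast", "fast"]
tree = build_tree(xs, ys, max_depth=2)
print(tree["threshold"], predict(tree, 2), predict(tree, 8))
```

Lowering `max_depth` yields a smaller tree that trains faster and is less prone to memorizing noise, at the cost of a coarser fit.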

A random forest algorithm implements prediction or classification across various domains of application.⁵⁹ The random forest algorithm trains multiple decision trees in parallel on subsamples of the dataset and determines the prediction by majority voting or averaging over the trees. The random forest algorithm can be used in GC-MS analysis of various clinical metabolomic datasets for biomarker selection, with applications to metabolic pathways of glycolysis, microbial-host metabolism, and drug gene expression.⁶⁰ Since random forest implements bagging over decision trees, the process of sampling over complex high-dimensional datasets requires efficient computation with proper levels of decision trees.⁶¹
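The bagging and majority-voting steps can be sketched with one-split trees (stumps) trained on bootstrap samples. This is a simplified illustration, not a full random forest: it uses a single feature and omits random feature subsetting. The data, `n_trees`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-split classifier on (feature, label) rows by minimizing training errors."""
    fallback = Counter(y for _, y in rows).most_common(1)[0][0]
    # Default: predict the majority label everywhere (threshold at +infinity).
    best = (len(rows) + 1, float("inf"), fallback, fallback)
    values = sorted({x for x, _ in rows})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = Counter(y for x, y in rows if x <= t)
        right = Counter(y for x, y in rows if x > t)
        l_lab = left.most_common(1)[0][0]
        r_lab = right.most_common(1)[0][0]
        errors = (sum(left.values()) - left[l_lab]) + (sum(right.values()) - right[r_lab])
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return best[1:]


def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (sampling rows with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(l_lab if x <= t else r_lab for t, l_lab, r_lab in forest)
    return votes.most_common(1)[0][0]


# Hypothetical metabolite intensities for controls and cases.
rows = [(1, "control"), (2, "control"), (3, "control"),
        (7, "case"), (8, "case"), (9, "case")]
forest = fit_forest(rows)
print(predict_forest(forest, 2), predict_forest(forest, 8))
```

Because each tree is fitted independently, the loop in `fit_forest` could be run in parallel. The number of trees trades computation against the stability of the vote.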

K-nearest neighbors is a type of instance-based learning referred to as a lazy learning algorithm.⁶² The k-nearest neighbors algorithm fixes the number of neighbors k and predicts each data point from the majority vote of its k closest neighbors, where closeness is measured by a similarity metric over all data points. Since metabolomics datasets contain missing values and nonlinear dependencies among features, data imputation by the k-nearest neighbors method can accurately reconstruct and complete metabolomics data with generalized modeling.⁶³ The k-nearest neighbors method can be used to identify molecular connectivity to filter potentially problematic absorption, distribution, metabolism, or excretion properties. However, as an instance-based method, the k-nearest neighbors algorithm has a computational time that grows with the size of the biomedical data and the number of neighbors, resulting in high computational complexity with biomedical entities.⁶⁴
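The imputation idea can be sketched as follows: a missing entry is replaced by the mean of that feature over the k most similar samples, with similarity computed on the features both samples have observed. This is one simple variant chosen for illustration. The abundance table and the function name `knn_impute` are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows."""

    def distance(a, b):
        # Root mean squared difference over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors are the other rows that have feature j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            new_row[j] = sum(nearest) / len(nearest) if nearest else None
        completed.append(new_row)
    return completed


# Hypothetical metabolite abundance table; None marks a missing measurement.
table = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [1.0, 2.0, 5.0],
    [9.0, 9.0, 20.0],
]
completed = knn_impute(table, k=2)
print(completed[1])
```

Every imputed value requires a distance to every other row, so the cost grows with both the number of samples and the number of features.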

The regression method fits a linear function relating the outcome variable to one or more explanatory variables, producing the optimal straight line.⁶⁵ Among regression methods, LASSO and ridge regression effectively construct learning models with high-dimensional features and simplify the model through regularization that accounts for multicollinearity.⁶⁶ Specific continuous outcomes can be predicted by jointly modelling biomedical entities, considering the interactions of metabolites in the network together with LASSO. LASSO can be used to predict absorption, distribution, metabolism, excretion or toxicity properties with explainability, since it selects the significant molecular substructures or chemical properties that affect predictive accuracy.⁶⁷ However, in recent studies, random forest or boosting methods outperform LASSO in terms of prediction accuracy. As a supervised learning method, regression can be used to identify pathway features to build metabolic networks in pathway modeling.²⁶
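For reference, the two regularized objectives can be written out explicitly. The notation is an assumption made for illustration: y is the vector of n outcomes, X the n-by-p feature matrix, β the coefficient vector, and λ ≥ 0 the penalty strength. The 1/(2n) scaling is one common convention and varies between texts and software.

```latex
\hat{\beta}^{\text{LASSO}} = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 .
```

The L1 penalty can set coefficients exactly to zero, which is why LASSO performs feature selection, whereas the squared L2 penalty of ridge regression only shrinks coefficients toward zero.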

The architecture of a neural network comprises modelling functions and learning steps applied to examples from biomedical research.⁶⁸ Neural network methods can be applied to define characteristic metabolites that discriminate geographical or biological differences between entities and to detect biomarkers for the prediction of outcomes.⁶⁹ In drug design studies, the dependent variables of drug discovery can be discrete outcomes such as permeability or continuous outcomes such as IC₅₀ values, predicted from molecular structures.⁷⁰ A neural network consists of nodes and edges, and a complex network structure can cause inefficiencies in big data analytics, as with random forest. As a type of deep neural network, the convolutional neural network can improve interpretability in the analysis of metabolomics by utilizing spectral metabolomics profiles in biomedical research driven by artificial intelligence.²⁴ The graph neural network can be utilized to analyze molecular structures over bonds and connections among fragments, increasing the prediction accuracy of molecular properties.²⁵
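How a graph neural network passes information over bonds can be sketched with a single round of sum aggregation. This is a deliberately stripped-down illustration: real graph neural networks apply learned weights and nonlinearities at each round, which are omitted here. The atom features (one-hot carbon/oxygen) and the ethanol heavy-atom graph are chosen only as a small example.

```python
def message_passing_round(features, bonds):
    """One round: each atom's new vector is its own vector plus the sum of its neighbours'."""
    neighbours = {atom: [] for atom in features}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for atom, vec in features.items():
        agg = list(vec)
        for nb in neighbours[atom]:
            agg = [x + y for x, y in zip(agg, features[nb])]
        updated[atom] = agg
    return updated


def readout(features):
    """Sum the atom vectors into one molecule-level vector."""
    return [sum(col) for col in zip(*features.values())]


# Ethanol heavy atoms C-C-O with one-hot features [is_carbon, is_oxygen].
atoms = {0: [1, 0], 1: [1, 0], 2: [0, 1]}
bonds = [(0, 1), (1, 2)]
updated = message_passing_round(atoms, bonds)
molecule_vector = readout(updated)
print(updated, molecule_vector)
```

After one round each atom vector already encodes its bonded neighbours. Stacking more rounds lets information travel further across the molecule before the readout feeds a property predictor.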

A support vector machine is a non-parametric approach in machine learning for modeling regression and classification tasks with high-dimensional data.⁷¹ It seeks a separating hyperplane that preserves the maximum distance from the nearest training data points of any class, as a wider margin is linked to a lower generalization error for the classifier. A support vector machine is used to investigate disease diagnosis from alterations in metabolite concentrations in biofluids, or in pharmacokinetic studies to classify the variations between two clinical groups.⁷² In large biomedical data sets, the utilization of a support vector machine may require long training times with high computational cost.
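The margin idea can be made concrete for a linear classifier f(x) = w·x + b with labels in {-1, +1}. The sketch below does not train a support vector machine. It only evaluates two hand-picked hyperplanes, both hypothetical, to show that several hyperplanes can separate the same data while differing in margin width 2/||w||.

```python
import math


def decision_value(w, b, x):
    """Signed score of the linear classifier f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the hyperplanes f(x) = +1 and f(x) = -1, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def hinge_loss(w, b, points, labels):
    """Mean hinge loss max(0, 1 - y * f(x)); zero when every point is outside the margin."""
    losses = [max(0.0, 1.0 - y * decision_value(w, b, x)) for x, y in zip(points, labels)]
    return sum(losses) / len(losses)


# Hypothetical two-metabolite concentrations for two clinical groups.
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]
labels = [-1, -1, 1, 1]

w_wide, b_wide = (0.5, 0.5), -3.0      # nearest points lie exactly on the margin
w_narrow, b_narrow = (1.0, 1.0), -6.0  # also separates the groups, with a narrower margin

print(hinge_loss(w_wide, b_wide, points, labels), margin_width(w_wide))
print(hinge_loss(w_narrow, b_narrow, points, labels), margin_width(w_narrow))
```

Both candidates reach zero hinge loss, and the maximum-margin principle prefers the first because its margin is wider.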

Gradient Boosting is a method of ensemble learning that implements the integration of analytical models. Extreme Gradient Boosting, XGBoost, is a gradient boosting method that provides estimations during the search for the best model.⁷³ XGBoost with metabolic profiles can be applied to conduct comparative analyses on cases and controls by identifying biomarkers for specific diseases.⁷⁴ Boosting methods can be applied for a reliable framework in target identification of drug design. Since boosting is computationally intensive in the analysis of biomedical data with high complexity, flexibility in allocating resources is required.⁷⁵
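The sequential nature of boosting can be sketched with regression stumps fitted to the residuals of the current ensemble under squared-error loss. This is a plain gradient boosting illustration, not XGBoost, which adds regularization and further optimizations. The data, `n_rounds`, and `learning_rate` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split regressor on a single feature."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return {"base": base, "rate": learning_rate, "stumps": stumps}


def predict(model, x):
    total = sum(lm if x <= t else rm for t, lm, rm in model["stumps"])
    return model["base"] + model["rate"] * total


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical dose levels and a continuous response.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
baseline = fit_boosting(xs, ys, n_rounds=0)
model = fit_boosting(xs, ys, n_rounds=50)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

Each round depends on the residuals of the previous one, so the rounds cannot be run independently. This sequential dependence is one reason boosting is computationally demanding on large data.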

Unsupervised learning

K-means clustering effectively provides reliable results when data points are grouped into a cluster, minimizing the distance between each data value and the centroid.⁷⁶ The K-means algorithm randomly assigns the k centroids and subsequently allocates each data point to the closest cluster. K-means clustering can be applied to classify metabolites using a data-driven approach focused on distinct biological profiles and cluster the composition of drugs related to these biological traits.⁷⁷ With a random choice of centroids of clusters, the classification outcomes may, however, vary in biomedical data.
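The dependence on the initial centroids can be demonstrated with a minimal implementation of the standard assign-and-update iteration. In practice the initial centroids are drawn at random. Two fixed initializations are used here so that the outcome is reproducible. The points and both initializations are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, n_iter=20):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    # Inertia: total squared distance of each point to its closest centroid.
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return sorted(centroids), inertia


# Hypothetical two-metabolite profiles forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

good_centroids, good_inertia = kmeans(points, [(0, 0), (10, 10)])
bad_centroids, bad_inertia = kmeans(points, [(0, 1), (1, 0)])
print(good_centroids, good_inertia)
print(bad_centroids, bad_inertia)
```

The second initialization settles into a poor local optimum even though the groups are clearly separated. A common safeguard is to run several initializations and keep the solution with the lowest inertia.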

Hierarchical clustering can build a hierarchy of clusters represented as a tree structure in an agglomerative or divisive way.⁷⁸ Hierarchical clustering can be used to cluster mass spectra for metabolomics data which visualizes the identification of specific biological profiles and data-driven approach for screening of biological drugs.⁷⁹ Hierarchical clustering is sensitive to outliers and preprocessing biomedical data to remove extreme data values is critical.
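The agglomerative (bottom-up) variant can be sketched in a few lines: every value starts as its own cluster, and the two closest clusters are merged until the requested number remains. Single linkage is assumed here as the merge criterion, since the source does not specify one. The one-dimensional peak positions are hypothetical.

```python
def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering of 1-D values."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical peak positions from mass spectra.
peaks = [100.0, 100.2, 100.1, 250.0, 250.3, 400.0]
groups = agglomerative(peaks, n_clusters=3)
print(groups)
```

The isolated value at 400.0 ends up as its own cluster. This shows how a single extreme value can consume one of the requested clusters, which is why outlier handling before clustering matters.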

Gaussian Mixture Models assign each data value a probability of being classified to each cluster.⁸⁰ Classifying data values using Gaussian Mixture Models reinforces the identification of diseases with various biomedical conditions from the molecular level to human level. Since generating the precise prediction of drug results is a complex and difficult process, the utilization of Gaussian Mixture Models can be expanded to thoroughly monitor patients’ overall symptoms by different time groups.⁸¹ Similarly with K-means clustering, the initialization of Gaussian Mixture Models may cause the failure to converge. If the true biomedical data distribution is non-normal, prediction or classification using Gaussian Mixture Models will underperform.⁸²
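The soft assignment can be sketched for a one-dimensional mixture: given component weights, means, and standard deviations, the probability that a value belongs to each component follows from Bayes' rule. Only this assignment step (the E-step of the usual fitting procedure) is shown, and parameter estimation is omitted. All parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def responsibilities(x, weights, means, sds):
    """Posterior probability that x was generated by each mixture component."""
    densities = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(densities)
    return [d / total for d in densities]


# Hypothetical two-component mixture of a biomarker level.
weights = [0.5, 0.5]
means = [0.0, 4.0]
sds = [1.0, 1.0]

midway = responsibilities(2.0, weights, means, sds)
near_first = responsibilities(0.0, weights, means, sds)
print(midway, near_first)
```

A value midway between the two means is split evenly between the components, whereas a hard clustering method would have to commit it to one group.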

Principal component analysis is used to transform a set of correlated variables into a set of uncorrelated variables known as principal components.⁸³ Principal component analysis acts as a feature extraction technique that reduces the dimensionality of datasets by maximizing the variance of projections, which is equivalent to minimizing reconstruction errors. Principal component analysis conducts partial classification of metabolites, and its loadings can be monitored to detect the more important metabolites in the analysis.⁸⁴ Since the principal components, rather than the original features, are used to understand biomedical data, direct selection or interpretation of significant features is not possible.
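The role of the loadings can be sketched by extracting the first principal component from a small table. Power iteration on the sample covariance matrix is used here for brevity and is an implementation choice of this sketch. It assumes the starting vector is not orthogonal to the leading component. The abundance values are hypothetical.

```python
import math


def first_principal_component(rows, n_iter=100):
    """Return (loadings, explained variance ratio) of the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # starting vector for power iteration
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    total = sum(cov[j][j] for j in range(p))
    return v, variance / total


# Hypothetical abundances: metabolite 1 is twice metabolite 0, metabolite 2 varies little.
table = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 1.0],
]
loadings, explained = first_principal_component(table)
print(loadings, explained)
```

The largest absolute loading points to the metabolite that contributes most to the component. The component itself is still a weighted mixture of all three metabolites, which is the interpretability limitation noted above.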

T-SNE (t-distributed stochastic neighbor embedding) implements a statistical method for visualizing big data by embedding data points into a low-dimensional map and can be utilized to analyze metabolite profiles and metabolic pathways.⁸⁵ Due to the complexity of biomedical data, t-SNE not only requires high computational time but also produces non-deterministic results with challenging interpretability.⁸⁶

UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) can be used for visualization with nonlinear dimensionality reduction.⁸⁷ UMAP can be applied to low-dimensional metabolites without bias across multi-omics contexts, identifying potential drug leads or improving model generalizability. With complex high-dimensional biomedical data, UMAP may generate non-existent clusters by determining clusters which may not be close to each other.⁸⁸

In GANs, the generator produces synthetic samples and the discriminator distinguishes generated samples from real samples. In VAEs, an encoder embeds data into latent representations and a decoder reconstructs the data from re-sampled representations, minimizing the reconstruction error during the training process.⁸⁹ GANs are used to generate the structures of molecules in drug-target interaction.³⁴ Improved VAEs can detect the structural interaction of molecules and generate molecules with improved properties more effectively.³⁵
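The two training objectives can be stated compactly in their standard textbook forms. The notation is an assumption for illustration: G and D are the generator and discriminator, p_data and p_z the data and noise distributions, q_φ(z | x) the encoder, p_θ(x | z) the decoder, and p(z) the latent prior.

```latex
\min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]

\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_{\phi}(z \mid x)}\big[\log p_{\theta}(x \mid z)\big] - \mathrm{KL}\big(q_{\phi}(z \mid x) \,\|\, p(z)\big)
```

The first expression is the adversarial game between generator and discriminator. The second is the evidence lower bound maximized by a VAE, whose first term corresponds to reconstruction quality and whose second term keeps the latent representations close to the prior.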

Reinforcement learning

Q-learning is a model-free, off-policy reinforcement learning algorithm that optimizes reward based on Markov decision processes, and SARSA is an on-policy reinforcement learning algorithm in which agents learn optimal policies with temporal-difference updates.⁹⁰ In real-time applications, reinforcement learning can be applied to analyze electronic health records in health care systems as targeted nodes.⁹¹ Reinforcement learning algorithms are utilized to generate new chemical structures, overcoming the complexity of chemical space exploration while avoiding massive computing.⁹²
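The off-policy versus on-policy distinction comes down to which next-state value enters the temporal-difference target. The sketch below applies one update of each kind to the same transition. The states, the action names `wait` and `treat`, the reward, and the step-size and discount values are all hypothetical.

```python
from collections import defaultdict


def q_learning_update(q, s, a, reward, s_next, actions, alpha, gamma):
    """Off-policy: bootstrap from the best action available in the next state."""
    target = reward + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


actions = ["wait", "treat"]


def make_table():
    """Action-value table where only one next-state action has a non-zero estimate."""
    q = defaultdict(float)
    q[(1, "treat")] = 2.0
    return q


# Same transition for both methods: state 0, action "treat", reward 1.0, next state 1.
q_off = make_table()
q_learning_update(q_off, 0, "treat", 1.0, 1, actions, alpha=0.5, gamma=0.5)

q_on = make_table()
sarsa_update(q_on, 0, "treat", 1.0, 1, "wait", alpha=0.5, gamma=0.5)  # agent next chooses "wait"

print(q_off[(0, "treat")], q_on[(0, "treat")])
```

Q-learning credits the transition with the value of the best follow-up action, whereas SARSA credits it with the value of the action the current policy actually took, so the two estimates differ whenever the agent explores.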

Personalized drug design and precision medicine

The aim of precision and personalized medicine with artificial intelligence technologies is to supplement healthcare systems which store and process demographic and biomedical traits of patients.⁹³ Overall factors related to a single patient include both biomedical and demographic factors related to substantial genotypes and phenotypes to maintain health and handle diseases. With the patients’ lifestyles and specificity in pathology, managing patients’ illness is as crucial as the disease process regarding diagnostic and therapeutic methodologies.⁹⁴ Clinical specialists have referred to disease-specific guidelines to standardize healthcare for patient care. Advances in genetic testing for the identification of subtypes of diseases or drug responses have led to a more stable customization of clinical treatment.⁹⁵

Personalized drug design processes and utilizes the genetic, environmental and clinical information of patients to expand molecular insights for the detection and analysis of diseases such as HIV, with clinical decision support to improve preventive health care strategies and treatment outcomes.⁹⁶ Information from the human genome helps biomedical researchers and healthcare providers develop optimized care strategies for phases of a disease, covering the long but effective path from scientific breakthroughs in the lab to the application of innovative medical skills to human biology in clinical and translational sciences.⁹⁷ Using a patient’s genes, environments and lifestyles in precision medicine allows for the detection of latent risk factors for disease susceptibility, the forecasting of responses to drug treatments, and the evaluation of the likelihood of side effects after pharmacological intervention.⁹⁸

Precision medicine methodologies have been devised to optimize clinical assessments and treatment outcomes for clinicians and practitioners.⁹⁹ Diagnostic tests are arranged to extract the optimal treatments based on molecular or cellular data. Wearable sensors in precision medicine have led to findings related to tailored interventions for older patients dealing with syndromes and morbidities.¹⁰⁰ Collaborative efforts in precision medicine can greatly accelerate progress of personalized drug design with artificial intelligence methods and medical skills including chemotherapies.¹⁰¹ The analysis of biomedical data with complex structures and high dimensionality on molecular and human levels can be very challenging. The effective and efficient analysis of high-dimensional biomedical data can be implemented with large network analysis. The network analysis can improve the process for suitable medications to match the dosage and timing for delivery or minimize adverse medical effects.¹⁰²

As shown in Table 2, optimizing network analysis for high-dimensional data in precision medicine has been broadly developed in biomedical sciences. The application of artificial intelligence techniques with network analysis can be developed to complete the comprehensive analysis of big data targeting personalized drug design.

Table 2.

Software for network analysis.

Software	Functions	Link
1¹⁰³	Visualization and comparative analysis with neural network models	https://github.com/ElsevierSoftwareX/SOFTX-D-24-00280
2¹⁰⁴	A python library for network analysis	https://github.com/damianfraszczak/netcenlib
3¹⁰⁵	A temporal network analysis software package	https://github.com/ElsevierSoftwareX/SOFTX-D-22-00139
4¹⁰⁶	A temporal neural network tool for causality analysis	https://github.com/ElsevierSoftwareX/SOFTX-D-20-00005
5¹⁰⁷	Tensor-network package for optimization on graphs	https://github.com/ElsevierSoftwareX/SOFTX-D-25-00078

Application of artificial intelligence in metabolomics and drug design

AI in metabolite biomarker discovery

Detecting significant biomarkers with metabolomic studies of metabolic alterations and external biomedical factors can give insights into the diagnoses of complex diseases. Input datasets for metabolite biomarker discovery include clinical data such as age, gender, or BMI and the metabolites with metabolite abundance.¹⁰⁸ The databases for biomarker discovery include MarkerDB, a resource for molecular biomarkers with consistency and applicability.¹⁰⁹ Artificial intelligence models can help researchers to classify diagnosed or undiagnosed cases with accurate statistical analysis to identify metabolic patterns.¹¹⁰ The application of artificial intelligence models can detect biomarkers and targets for successful precision medicine.

AI-based metabolic pathway modeling

Metabolic pathways can be identified with the utilization of artificial intelligence methodologies such as ensemble modeling where multiple modeling algorithms are included to analyze and predict features of metabolic pathways.¹¹¹ Input data for the metabolic pathway modeling include molecular fingerprints or concentration values of metabolites. The database KEGG includes pathway maps with information about molecular interaction in functional and systems biology.¹¹² Large-scale artificial intelligence models simulate the networks of metabolic alterations to analyze metabolic pathways. Analyzing metabolic pathways encourages researchers to discover important biochemical reactions and metabolic mechanisms.¹¹³

AI for drug target identification

Researchers can identify drug targets through data-driven, system-level inference that addresses biological complexity and limited scalability.¹¹⁴ Relevant data include ligands and proteins stored and curated in databases such as DrugBank or ChEMBL.^115,116 Patterns in omics data are used to track specific targets or regulatory networks.¹¹⁷ Biomedical data analysis with virtual screening can identify binding sites in target proteins.¹¹⁸ Identifying drug targets together with knowledge of their biological activity can then inform pharmacokinetic studies.

AI for ADMET prediction

Pharmacokinetic properties are characterized by the processes of absorption, distribution, metabolism, excretion, and toxicity (ADMET) in drug discovery and development. Key target outcomes include human intestinal absorption or Caco-2 permeability for absorption, blood-brain barrier permeability for distribution, cytochrome P450 interaction or metabolic stability for metabolism, total clearance for excretion, and hERG channel inhibition or hepatotoxicity for toxicity.¹¹⁹ Input data for ADMET prediction are chemical structures represented in the Simplified Molecular Input Line Entry System (SMILES) and chemical descriptors such as molecular weight or the number of hydrogen bond donors and acceptors.¹²⁰ Databases that provide SMILES include PubChem, DrugBank, and ChEMBL.^115,116,121 Artificial intelligence and big data methodologies can examine high-dimensional biomedical data, including SMILES, while optimizing model performance and testing the generalizability and reproducibility of models on standardized datasets.
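To make the descriptor-type inputs named above concrete, the following Python sketch applies Lipinski's rule of five, a widely used heuristic for oral absorption, to precomputed descriptors. The rule of five is not discussed in this review and is swapped in here only as a familiar, transparent stand-in for a learned ADMET model. The `Descriptors` class is hypothetical, the aspirin values are approximate, and the second compound is invented. In practice the descriptors would be computed from SMILES with a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    name: str
    smiles: str               # empty string when no structure is specified
    molecular_weight: float   # g/mol
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float              # octanol-water partition coefficient


def rule_of_five_violations(d: Descriptors) -> List[str]:
    """List which of Lipinski's four criteria a compound violates."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular weight > 500")
    if d.h_bond_donors > 5:
        violations.append("hydrogen bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("hydrogen bond acceptors > 10")
    if d.log_p > 5:
        violations.append("logP > 5")
    return violations


def passes_filter(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention: tolerate at most one violation."""
    return len(rule_of_five_violations(d)) <= max_violations


# Approximate descriptor values for aspirin; the second entry is invented.
aspirin = Descriptors("aspirin", "CC(=O)OC1=CC=CC=C1C(=O)O", 180.16, 1, 4, 1.2)
large = Descriptors("hypothetical_large_molecule", "", 720.0, 7, 12, 6.1)

for compound in (aspirin, large):
    print(compound.name, passes_filter(compound), rule_of_five_violations(compound))
```

A rule-based filter like this covers only a narrow part of absorption. The learned models surveyed in this section aim to predict the wider set of ADMET endpoints from the same kinds of inputs.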

Integration of metabolomics in drug discovery pipelines

Current drug discovery pipelines include natural product-based pipelines and interdisciplinary research that incorporates metabolomics studies.^122,123 The Human Metabolome Database (HMDB) is one of the largest and most comprehensive collections of human metabolites and metabolism data.¹²⁴ Artificial intelligence methodologies combined with statistical analysis enable efficient curation of the complex metabolomics data in HMDB, and integrating metabolomics into drug discovery is expected to yield important findings for drug design and development.⁴⁸ As one open-source pipeline for this integration, the NP³ MS Workflow offers an effective approach to extracting significant features from complex mixtures in untargeted metabolomics.¹²⁵ Investigating metabolic networks at different molecular levels can improve the analysis of selected biomarkers or targets related to drug reactions and disease diagnosis in drug discovery.^126,127

Challenges of artificial intelligence in healthcare

Practical implementation challenges such as data security and legal regulation of clinical workflows can critically affect artificial intelligence research for healthcare applications. As shown in Table 3, data protection regulations such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA) highlight the importance of privacy-preserving technologies for protecting patient data.¹²⁸

Table 3.

Data protection regulations.

Data protection regulations	What they do
GDPR	Protects all personal data of people in the EU
CCPA	Protects the personal data of California residents held by for-profit businesses that meet specified thresholds, such as annual revenue
HIPAA	Protects protected health information within the U.S. healthcare industry
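One simple privacy-preserving technique of the kind these regulations motivate is pseudonymization of patient identifiers before data are shared for analysis. The Python sketch below replaces each identifier with a keyed hash (HMAC-SHA256) from the standard library. The record layout and identifier strings are hypothetical. Keyed hashing is pseudonymization rather than anonymization, and whether it satisfies any particular regulation is a legal question that this example does not address.

```python
import hashlib
import hmac
import secrets


def pseudonymize(patient_id: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest that stands in for the raw identifier."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# The key must be stored separately from the shared data. It is generated
# on the fly here purely for demonstration.
key = secrets.token_bytes(32)

# Hypothetical records with made-up identifiers and values.
records = [
    {"patient_id": "hypothetical-0001", "metabolite_A": 5.1},
    {"patient_id": "hypothetical-0002", "metabolite_A": 3.0},
]

# Shared copy: the raw identifier is replaced by its token.
shared = [
    {"pid_token": pseudonymize(r["patient_id"], key),
     "metabolite_A": r["metabolite_A"]}
    for r in records
]
for row in shared:
    print(row["pid_token"][:12], row["metabolite_A"])
```

Because the same key always maps an identifier to the same token, records for one patient can still be linked across tables, while anyone without the key cannot recompute the mapping from candidate identifiers.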

While artificial intelligence in healthcare can enhance precision medicine and personalized drug design, the use of large-scale biomedical data is exposed to algorithmic bias and data vulnerability. Algorithmic bias can produce disparities for minority groups: when artificial intelligence is applied to non-representative data, predictions involving racial, gender, or socioeconomic factors can be biased or inaccurate, so responsible deployment requires attention to algorithmic fairness.¹²⁹ Explainable artificial intelligence platforms with user-friendly interfaces should be introduced so that clinicians and policy makers can understand how stable and safe the integration of artificial intelligence with biomedical data analysis is in healthcare systems. In clinical decision making, issues of algorithmic bias identified in the ethical examination of profiling methods that use synthetic data can be mitigated by considering ethical aspects that encompass both algorithmic bias and privacy.¹³⁰ Maintaining data diversity, handling resource limitations, and controlling algorithmic bias must be considered along with the difficulty of merging AI-driven systems into biomedical workflows. Artificial intelligence technologies can function effectively and appropriately across groups in society only if they address issues stemming from health professional practices that have traditionally been biased, with inaccurate documentation resulting in imbalanced predictions that unfairly affect certain groups.¹³¹
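One way to make the fairness concern above measurable is to compare a model's sensitivity (recall) across demographic groups. The Python sketch below computes per-group recall and the gap between the best and worst served group, a quantity often called the equal opportunity difference. The labels, predictions, and group names are invented for illustration, and this is only one of several fairness metrics that could be chosen.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Recall (true positive rate) computed separately for each group."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                true_pos[group] += 1
    # Groups with no positive cases are omitted to avoid dividing by zero.
    return {g: true_pos[g] / positives[g] for g in positives}


def recall_gap(recalls):
    """Difference between the highest and lowest per-group recall."""
    return max(recalls.values()) - min(recalls.values())


# Made-up labels and predictions for two groups of six patients each.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
groups = ["A"] * 6 + ["B"] * 6

recalls = recall_by_group(y_true, y_pred, groups)
gap = recall_gap(recalls)
print(recalls, gap)
```

Here the model detects three of four true cases in group A but only one of four in group B, a gap of 0.5 that aggregate accuracy would hide. A gap of this kind signals that the training data or the model deserves closer review before deployment.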

Healthcare data confidentiality can be assessed in its ethical, legal, and technical aspects within various regulatory systems. International regulations can flexibly reflect local variations in cutting-edge technological approaches to reducing security vulnerabilities. Notable variations in data privacy arise because regulations were developed in different regions at different times. For example, North America confronts HIPAA enforcement issues, while Europe is bound by GDPR.¹³² Under these regulations, standardizing the demographic traits of human populations across healthcare systems has been complicated by the wide range of pharmaceutical products overseen by the Food and Drug Administration. Nonprofit organizations have developed standardized metrics to assist with the growing regulatory requirements for care units and hospital operations.¹³³ Apart from its well-known patient confidentiality component, HIPAA includes several provisions that support standardization and privacy across the many distinct claim forms in circulation.¹³⁴ Existing frameworks in healthcare systems often fail to cover artificial intelligence technologies that implement real-time learning and validation processes in clinical practice.¹³⁵ An extensive examination of the relationship between artificial intelligence and regulatory systems can help resolve ethical or legal conflicts and support equitable and secure medication.¹³⁶

Limitations and future directions

Potential research directions and innovative studies have been proposed to advance the incorporation of artificial intelligence in pharmacology, clinical trials, and healthcare services.¹³⁷ Studies of artificial intelligence in drug discovery investigate how methods such as machine learning and generative artificial intelligence can be applied within drug development.¹³⁸ Biomedical researchers may concentrate on virtual screening, de novo molecular design, and forecasting drug-target interactions to speed up the discovery of new therapeutic agents.

Artificial intelligence methodologies applied to clinical research can supply a range of algorithms to develop trial designs, simplify patient recruitment, and establish frameworks for interdisciplinary research.¹³⁹ Techniques for using real-world evidence can be examined through electronic health records, wearable devices, and mobile health applications, in combination with clinical trial processes and high-quality biomedical and clinical data.

As shown in Table 4, web servers and tools for analyzing large-scale metabolite data are available to biomedical researchers. These tools can be used to analyze high-dimensional metabolite data, including complex interactions among metabolites. With real-time monitoring and predictive analysis in healthcare delivery, biomedical and clinical research emphasizes artificial intelligence algorithms that estimate rates of patient decline and hospital readmission, so that proactive measures and customized care provision can be assessed.¹⁴⁰ Artificial intelligence methodologies are used to analyze multi-omics data extensively through approaches that integrate various types of omics data, including genomics, transcriptomics, proteomics, and metabolomics.¹⁴¹ Artificial intelligence techniques can flourish when they are grounded in an understanding of disease mechanisms and responses to drugs. Identifying biomarkers in complex biological data and optimizing treatment plans based on individual genotypes or phenotypes are robust and promising ways to combine artificial intelligence methodologies with healthcare systems.¹⁴²

Table 4.

Software for metabolomics analysis.

Software	Functions	Link
1¹⁴³	Enhancing biological sequences data integration with real-time identifier translation	https://github.com/ElsevierSoftwareX/SOFTX-D-24-00554
2¹⁴⁴	A quick and simple gene set enrichment analysis and visualization tool	https://github.com/michalbukowski/dge-ontology
3¹⁴⁵	A platform for large-scale processing, storage and analysis of metabolomics data	https://methos.cebitec.uni-bielefeld.de/
4¹⁴⁶	A suite of metabolomics data analysis tools	https://pypi.python.org/pypi/secimtools
5¹⁴⁷	A web server for metabolomic data analysis and interpretation	https://www.metaboanalyst.ca

As shown in Table 5, various drug design tools can be considered for the analysis of absorption, distribution, metabolism, excretion, and toxicity, and their use can be extended to large-scale data analysis.

Table 5.

Software for drug design.

Software	Functions	Link
1¹⁴⁸	Drug design with target-aware molecule generation through a chemical language model	https://github.com/SigmaGenX/TamGen
2¹⁴⁹	Modern AI–driven generative molecule design	https://github.com/MolecularAI/REINVENT4
3¹⁵⁰	Integrating graph and sequence models for predicting drug-target binding affinity	https://github.com/zhuziguang/GS-DTA
4¹⁵¹	Integrating multi-modal deep learning with pocket-drug graphs for drug-target binding affinity prediction	https://github.com/zhc-moushang/MMPD-DTA
5¹⁵²	Planning of phase II/III drug development programs with optimal sample size allocation and Go/No-go decision rules in R	https://github.com/ElsevierSoftwareX/SOFTX-D-24-00273

Healthcare text mining methods have broadened the insights available from unstructured clinical data, medical publications, and biomedical information produced by patients.¹⁵³ Interactively building natural language processing models for clinical decision support can improve the efficiency of coding and documentation while supporting population health analysis through refined information retrieval and knowledge discovery in healthcare environments.¹⁵⁴

Conclusion

Future directions for applying artificial intelligence methodologies in precision medicine are crucial to improving healthcare services. Studies in biomedical sciences focus on achieving more precise and efficient artificial intelligence algorithms, enhancing data quality and accessibility, and resolving ethical and privacy issues. The primary challenge in establishing artificial intelligence methodologies in clinical environments that rely on medical documentation is to provide effective, efficient, and fair tools for practitioners and healthcare professionals within the healthcare system. The application of artificial intelligence and statistical methods to convert data into insights will have a great impact on the healthcare industry, potentially replacing a significant portion of the tasks performed by clinicians. Preserving human factors in medicine, together with algorithmic fairness and transparency, can improve the likelihood of suitable treatment decisions. Digital health is transforming the conventional medical hierarchy into an equitable collaboration between patients and healthcare providers. Disruptive artificial intelligence technologies have the potential to strengthen healthcare systems through big data analytics. Given the high volume of text, signals, and images in healthcare systems, the next decade of biomedical research may use and govern artificial intelligence with a focus on multimodal learning from increasingly complex data. The next generation of researchers will need appropriately established ethics and algorithmic fairness to analyze big data effectively and efficiently with artificial intelligence tools. By prioritizing data privacy and security in relation to patients’ experiences, researchers should transform biomedical sciences into a lively and evolving environment with healthy artificial intelligence.

Footnotes

ORCID iDs

Sangjin Kim

Donggeun Kim

Juyong Ko

Xin Zan

Kin Lok Wong

Yujing Mao

Lakhwinder Kaur

Hyeonjun Nam

Jai Woo Lee

Ethical considerations

Because this study did not involve humans or animals, and the underlying data were obtained from public databases, ethical approvals and informed consent were not required.

Author contributions

Conceptualization, S.K. and J.W.L.; Writing — original draft preparation, S.K. and J.W.L.; Writing — review and editing, All co-authors; Visualization, S.K. and J.W.L.; Supervision and Funding acquisition, J.W.L.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article is financially supported by the 2026 College of Public Policy at Korea University. This work was supported by the National Research Foundation of Korea (NRF) grant (BK21 FOUR (Education & Research Center for Leading Talent in Data Science-Based Future Health Society, Korea University, Sejong)) funded by the Korea government (MOE).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Gray, Kross, Renfrew, et al. Precision Medicine in Lifestyle Medicine: The Way of the Future? Am J Lifestyle Med 2019; 14(2): 169–186. https://doi.org/10.1177/1559827619834527

2. Yetgin. Revolutionizing multi-omics analysis with artificial intelligence and data processing. Quantitative Biology 2025: e70002. https://doi.org/10.1002/qub2.70002

3. Yin, et al. Artificial Intelligence in Pharmaceutical Sciences. Engineering 2023; 27: 37–69. https://doi.org/10.1016/j.eng.2023.01.014

4. Chakraborty, Bhattacharya, Pal, et al. From machine learning to deep learning: Advances of the recent data-driven paradigm shift in medicine and healthcare. Current Research in Biotechnology 2024; 7: 100164. https://doi.org/10.1016/j.crbiot.2023.100164

5. Weerarathna, Kamble, Luharia. Artificial Intelligence Applications for Biomedical Cancer Research: A Review. Cureus 2023; 15(11): e48307, PMID: 38058345; PMCID: PMC10697339. https://doi.org/10.7759/cureus.48307

6. Steward, Parker APJ, Minassian, et al. Genome annotation for clinical genomic diagnostics: strengths and weaknesses. Genome Med 2017; 9: 49. https://doi.org/10.1186/s13073-017-0441-1

7. Qiu, Cai, Yao, et al. Small molecule metabolites: discovery of biomarkers and therapeutic targets. Signal Transduct Target Ther 2023; 8(1): 132. https://doi.org/10.1038/s41392-023-01399-3

8. Shenouda, Senthilkumar, Mourad, et al. Artificial intelligence to investigate metabolomics data for precision medicine. Metabolomics 2026; 22: 29. https://doi.org/10.1007/s11306-026-02401-z

9. Paul, Sanap, Shenoy, et al. Artificial intelligence in drug discovery and development. Drug Discov Today 2021; 26(1): 80–93. https://doi.org/10.1016/j.drudis.2020.10.010

10. Rahnenführer, De Bin, Benner, et al. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023; 21: 182. https://doi.org/10.1186/s12916-023-02858-y

11. Goshisht. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega 2024; 9: 9921–9945. https://doi.org/10.1021/acsomega.3c05913

12. Alhumaidi, Dermawan, Kamaruzaman, et al. The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review. JMIR Med Inform 2025; 13: e68898. https://doi.org/10.2196/68898

13. Ahmed, Alam MSB, Hassan, et al. Deep learning modelling techniques: current progress, applications, advantages, and challenges. Artif Intell Rev 2023; 56: 13521–13617. https://doi.org/10.1007/s10462-023-10466-8

14. Sharma, Lysenko, Jia, et al. Advances in AI and machine learning for predictive medicine. J Hum Genet 2024; 69(10): 487–497. https://doi.org/10.1038/s10038-024-01231-y

15. Khemphila, Boonjing. Heart Disease Classification Using Neural Network and Feature Selection. In: 2011 21st International Conference on Systems Engineering. Las Vegas, NV, USA, 2011, pp. 406–409. https://doi.org/10.1109/ICSEng.2011.80

16. Jelinek, Muteir, Al-Aubaidy. Hierarchical random forest model, inflammation and oxidative stress as predictors of the atherogenic index of plasma and diabetes progression. Sci Rep 2025; 15: 35381. https://doi.org/10.1038/s41598-025-19289-9

17. Zelli, Manno, Compagnoni, et al. Classification of tumor types using XGBoost machine learning model: a vector space transformation of genomic alterations. J Transl Med 2023; 21(1): 836. https://doi.org/10.1186/s12967-023-04720-4

18. Raj, Namdeo, Singh, et al. Identification and prioritization of disease candidate genes using biomedical named entity recognition and random forest classification. Comput Biol Med 2025; 192(Pt B): 110320. https://doi.org/10.1016/j.compbiomed.2025.110320

19. Wang, Zhu. Named Entity Recognition from Biomedical Text Using SVM. In: 2011 5th International Conference on Bioinformatics and Biomedical Engineering. Wuhan, China, 2011, pp. 1–4. https://doi.org/10.1109/icbbe.2011.5779984

20. Hartman, Scott, Karlsson, et al. Interpreting biologically informed neural networks for enhanced proteomic biomarker discovery and pathway analysis. Nat Commun 2023; 14: 5359. https://doi.org/10.1038/s41467-023-41146-4

21. Naeem, Yang, Saleem, et al. A hybrid approach for accurate skin lesion segmentation using LEDNet and Swin-UMamba. Sci Rep 2026; 16: 5415. https://doi.org/10.1038/s41598-026-38056-y

22. Drukker, Sahiner, et al. MIDRC-MetricTree: a decision tree-based tool for recommending performance metrics in artificial intelligence-assisted medical image analysis. J Med Imaging (Bellingham) 2024; 11(2): 024504. https://doi.org/10.1117/1.JMI.11.2.024504

23. Steenwijk, Pouwels, Daams, et al. Accurate white matter lesion segmentation by k nearest neighbor classification with tissue type priors (kNN-TTPs). Neuroimage Clin 2013; 3: 462–469. https://doi.org/10.1016/j.nicl.2013.10.003

24. Wang, Liu, Shi, et al. An interpretable CNN model for NMR-based whole metabolomic profiling of sepsis. Results in Engineering 2025; 28: 107077. https://doi.org/10.1016/j.rineng.2025.107077

25. Panapitiya, Gao, Maupin, et al. FragNet: A Graph Neural Network for Molecular Property Prediction with Four Levels of Interpretability. J Am Chem Soc 2026; 148(9): 9930–9950. https://doi.org/10.1021/jacs.5c22620

26. Basher Ar, McLaughlin, Hallam. Metabolic pathway inference using multi-label classification with rich pathway features. PLOS Computational Biology 2020; 16(10): e1008174. https://doi.org/10.1371/journal.pcbi.1008174

27. Wang, Yuan, et al. Accurate identification of single-cell types via correntropy-based Sparse PCA combining hypergraph and fusion similarity. J Appl Stat 2024; 52(2): 356–380. https://doi.org/10.1080/02664763.2024.2369955

28. Peyvandipour, Shafi, Saberian, et al. Identification of cell types from single cell data using stable clustering. Sci Rep 2020; 10: 12349. https://doi.org/10.1038/s41598-020-66848-3

29. Elhaik. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Sci Rep 2022; 12: 14683. https://doi.org/10.1038/s41598-022-14395-4

30. Wang, Feng, et al. scCAD: Cluster decomposition-based anomaly detection for rare cell identification in single-cell expression data. Nat Commun 2024; 15: 7561. https://doi.org/10.1038/s41467-024-51891-9

31. Monisha, Suresh, Rashmi. Artificial Intelligence Based Skin Classification Using GMM. J Med Syst 2018; 43(1): 3. https://doi.org/10.1007/s10916-018-1112-5

32. Vikas, Makkapati, Bogireddy, et al. Advancements in Lung Cancer Diagnosis: A Comprehensive Study on the Role of PCA, LDA, and t-SNE in Deep Learning Frameworks. In: 2024 Asian Conference on Communication and Networks (ASIANComNet). Bangkok, Thailand, 2024, pp. 1–7. https://doi.org/10.1109/ASIANComNet63184.2024.10811058

33. Dadu, Satone, Kaur, et al. Application of Aligned-UMAP to longitudinal biomedical studies. Patterns (N Y) 2023; 4(6): 100741. https://doi.org/10.1016/j.patter.2023.100741

34. Ramesh, Rao, Moudgalya, et al. GAN Based Approach for Drug Design. In: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). Pasadena, CA, USA, 2021, pp. 825–828. https://doi.org/10.1109/ICMLA52953.2021.00136

35. Nguyen, Karolak. Transformer graph variational autoencoder for generative molecular design. Biophys J 2025; 124(22): 3867–3875. https://doi.org/10.1016/j.bpj.2025.01.022

36. Mishra, Shrivastava, Bhushan. Disease diagnosis and control by reinforcement learning techniques: a systematic literature review. Discov Computing 2025; 28: 129. https://doi.org/10.1007/s10791-025-09539-9

37. Han, Yan, et al. Reinforcement learning for healthcare operations management: methodological framework, recent developments, and future research directions. Health Care Manag Sci 2025; 28(2): 298–333. https://doi.org/10.1007/s10729-025-09699-6

38. Wang, et al. Optimal scheduling in cloud healthcare system using Q-learning algorithm. Complex Intell Systems 2022; 8(6): 4603–4618. https://doi.org/10.1007/s40747-022-00776-9

39. Jomova, Alomar, Nepovimova, et al. Heavy metals: toxicity and human health effects. Arch Toxicol 2025; 99(1): 153–209. https://doi.org/10.1007/s00204-024-03903-2

40. Salaudeen, Vuanghao. Metabolomics Approach in Environmental Studies: Methodologies, Application and Challenges. Critical Reviews in Analytical Chemistry 2025: 1–17. https://doi.org/10.1080/10408347.2025.2521734

41. Nzabanita, Shen, Grist, et al. Heavy metal concentrations in feathers and metabolomic profiles in Pacific black ducks (Anas superciliosa) from Southeastern Australia. Environmental Toxicology and Chemistry 2025; 44(1): 92–102. https://doi.org/10.1093/etojnl/vgae004

42. Wan, Yang, Zhao, et al. Proteomic signatures and predictive modeling of cadmium-associated anxiety in middle-aged and elderly populations: an environmental exposure association study. J Transl Med 2025; 23: 499. https://doi.org/10.1186/s12967-025-06466-7

43. Bartel, Krumsiek, Theis. Statistical methods for the analysis of high-throughput metabolomics data. Comput Struct Biotechnol J 2013; 4: e201301009. https://doi.org/10.5936/csbj.201301009

44. Maitre, Bustamante, Hernández-Ferrer, et al. Multi-omics signatures of the human early life exposome. Nat Commun 2022; 13(1): 7024. https://doi.org/10.1038/s41467-022-34422-2

45. Kim, Lee, Hur, et al. Metabolomics and nutrient intake reveal metabolite–nutrient interactions in metabolic syndrome: insights from the Korean Genome and Epidemiology Study. Nutr J 2025; 24: 128. https://doi.org/10.1186/s12937-025-01189-3

46. Lee, Zhou, Moen, et al. Prediction of an outcome using NETwork Clusters (NET-C). Comput Biol Chem 2021; 90: 107425. https://doi.org/10.1016/j.compbiolchem.2020.107425

47. Acosta, Falcone, Rajpurkar, et al. Multimodal biomedical AI. Nat Med 2022; 28: 1773–1784. https://doi.org/10.1038/s41591-022-01981-2

48. Nafie, Abu-Elsaoud, Diab. A comprehensive review on computational metabolomics: Advancing multiscale analysis through in-silico approaches. Comput Struct Biotechnol J 2025; 27: 3191–3215. https://doi.org/10.1016/j.csbj.2025.07.016

49. Stahlmann, Reitsma, Zapf. Missing values and inconclusive results in diagnostic studies - A scoping review of methods. Stat Methods Med Res 2023; 32(9): 1842–1855. https://doi.org/10.1177/09622802231192954

50. Fan, Han, Liu. Challenges of Big Data Analysis. Natl Sci Rev 2014; 1(2): 293–314. https://doi.org/10.1093/nsr/nwt032

51. Kant, Deepika, Roy. Artificial intelligence in drug discovery and development: transforming challenges into opportunities. Discov. Pharm. Sci 2025; 1: 7. https://doi.org/10.1007/s44395-025-00007-3

52. Zhang, Tang. Drug metabolism in drug discovery and development. Acta Pharm Sin B 2018; 8(5): 721–732. https://doi.org/10.1016/j.apsb.2018.04.003

53. Marini, Marchesin, Wodzinski, et al. Multimodal representations of biomedical knowledge from limited training whole slide images and reports using deep learning. Med Image Anal 2024; 97: 103303. https://doi.org/10.1016/j.media.2024.103303

54. Jiang, Zhi, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol 2017; 2(4): 230–243, PMID: 29507784; PMCID: PMC5829945. https://doi.org/10.1136/svn-2017-000101

55. Bernardi, Alves, Crepaldi, et al. Data Quality in Health Research: Integrative Literature Review. J Med Internet Res 2023; 25: e41446. https://doi.org/10.2196/41446

56. Luna-Benoso, Martínez-Perales, Morales-Rodríguez ÚS, et al. A New Classification Model Using a Decision Tree Generated from Hyperplanes in Dimensional Space. Applied Artificial Intelligence 2024; 38(1): 2426377. https://doi.org/10.1080/08839514.2024.2426377

57. Bach, Bridges. A decision tree approach for the application of drug metabolism and kinetic studies to in vivo and in vitro toxicological and pharmacological testing. Arch Toxicol Suppl 1985; 8: 173–188. https://doi.org/10.1007/978-3-642-69928-3_27

58. Sirocchi, Biancucci, Donati, et al. Exploring machine learning for untargeted metabolomics using molecular fingerprints. Computer Methods and Programs in Biomedicine 2024; 250: 108163. https://doi.org/10.1016/j.cmpb.2024.108163

59. Zhao, Lee C-D, Chen, et al. Research on the Prediction Application of Multiple Classification Datasets Based on Random Forest Model. In: 2024 IEEE 6th International Conference on Power, Intelligent Computing and Systems (ICPICS). Shenyang, China, 2024, pp. 156–161. https://doi.org/10.1109/ICPICS62053.2024.10795875

60. Chen, Cao, Zhang, et al. Random forest in clinical metabolomics for phenotypic discrimination and biomarker selection. Evid Based Complement Alternat Med 2013; 2013: 298183. https://doi.org/10.1155/2013/298183

61. Sun, Wang, et al. An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Systems with Applications 2024; 237(Part B): 121549. https://doi.org/10.1016/j.eswa.2023.121549

62. Song, Liang, et al. An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing 2017; 251: 26–34. https://doi.org/10.1016/j.neucom.2017.04.018

63. Yang, Zhu, et al. Research on the application of dynamic weighted KNN with preprocessing based on a normal distribution in metabolomics data imputation. Computational Biology and Chemistry 2026; 121: 108804. https://doi.org/10.1016/j.compbiolchem.2025.108804

64. Jayatilake SMDAC, Ganegoda. Involvement of Machine Learning Tools in Healthcare Decision Making. J Healthc Eng 2021; 2021: 6679512–6679520. https://doi.org/10.1155/2021/6679512

65. Silhavy, Prokopova. Analysis and selection of a regression model for the Use Case Points method using a stepwise approach. Journal of Systems and Software 2017; 125: 1–14. https://doi.org/10.1016/j.jss.2016.11.029

66. Zhao, Yuan, Liao, et al. Interpreting LASSO regression model by feature space matching analysis for spatio-temporal correlation based wind power forecasting. Applied Energy 2025; 380: 124954. https://doi.org/10.1016/j.apenergy.2024.124954

67. Schapin, Majewski, Varela-Rial, et al. Machine learning small molecule properties in drug discovery. Artificial Intelligence Chemistry 2023; 1(2): 100020. https://doi.org/10.1016/j.aichem.2023.100020

68. Woessner, Anjum, Salman, et al. Identifying and training deep learning neural networks on biomedical-related datasets. Briefings in Bioinformatics 2024; 25(Supplement_1): bbae232. https://doi.org/10.1091/bib/bbae232

69. Wang. Deep neural network-based biostatistical analysis for disease marker screening. Sci Rep 2026; 16(1): 4021. https://doi.org/10.1038/s41598-025-34115-y

70. Sliwoski, Kothiwale, Meiler, et al. Computational methods in drug discovery. Pharmacol Rev 2013; 66(1): 334–395. https://doi.org/10.1124/pr.112.007336

71. Cervantes, Garcia-Lamont, Rodríguez-Mazahua, et al. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020; 408: 189–215. https://doi.org/10.1016/j.neucom.2019.10.118

72. Mahadevan, Shah, Marrie, et al. Analysis of metabolomic data using support vector machines. Anal Chem 2008; 80(19): 7562–7570. https://doi.org/10.1021/ac800954c

73. Qinghe, Wen, Boyan, et al. Optimised extreme gradient boosting model for short term electric load demand forecasting of regional grid system. Sci Rep 2022; 12: 19282. https://doi.org/10.1038/s41598-022-22024-3

74. Guan, et al. Construction of the XGBoost model for early lung cancer prediction based on metabolic indices. BMC Med Inform Decis Mak 2023; 23: 107. https://doi.org/10.1186/s12911-023-02171-x

75. Kumar, Gupta, Kim. AI-driven drug discovery using a context-aware hybrid model to optimize drug-target interactions. Sci Rep 2025; 15: 35719. https://doi.org/10.1038/s41598-025-19593-4

76. Jaganathan. Optimizing weighted k-means clustering with gradient-based methods. Systems Science & Control Engineering 2025; 13(1): 2550755. https://doi.org/10.1080/21642583.2025.2550755

77. Zou, Holmes, Nicholson, et al. Automatic Spectroscopic Data Categorization by Clustering Analysis (ASCLAN): A Data-Driven Approach for Distinguishing Discriminatory Metabolites for Phenotypic Subclasses. Anal Chem 2016; 88(11): 5670–5679. https://doi.org/10.1021/acs.analchem.5b04020

78. Tokuda, Comin, Costa Lda F. Revisiting agglomerative clustering. Physica A: Statistical Mechanics and its Applications 2022; 585: 126433. https://doi.org/10.1016/j.physa.2021.126433

79. Meinicke, Lingner, Kaever, et al. Metabolite-based clustering and visualization of mass spectrometry data using one-dimensional self-organizing maps. Algorithms Mol Biol 2008; 3: 9. https://doi.org/10.1186/1748-7188-3-9

80. Belciug, Gabriel Iliescu. Deep learning and Gaussian Mixture Modelling clustering mix. A new approach for fetal morphology view plane differentiation. Journal of Biomedical Informatics 2023; 143: 104402. https://doi.org/10.1016/j.jbi.2023.104402

81.

Okyere

Bravo-Merodio

, et al. Drug response in the era of precision medicine: A methodological review. Comput Struct Biotechnol J 2025; 27: 5503–5520. https://doi.org/10.1016/j.csbj.2025.11.067

82.

Liu

Wang

, et al. Health Care Provider Clustering Using Fusion Penalty in Quasi-Likelihood. Biom J 2024; 66(6): e202300185. https://doi.org/10.1002/bimj.202300185

83.

Krishnan

Dutta

. A Study of Effectiveness of Principal Component Analysis on Different Data Sets. In: 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC). Coimbatore, India, 2017, pp. 1–6. https://doi.org/10.1109/ICCIC.2017.8524329

84.

Yamamoto

Fujimori

Sato

, et al. Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis. BMC Bioinformatics 2014; 15: 51. https://doi.org/10.1186/1471-2105-15-51

85.

Cheng

Wang

Xia

. Supervised t-distributed stochastic neighbor embedding for data visualization and classification. INFORMS J Comput 2021; 33(2): 419–835. https://doi.org/10.1287/ijoc.2020.0961

86.

Lee

Park

Kim

, et al. Artificial intelligence on biomedical signals: technologies, applications, and future directions. Med-X 2024; 2: 25. https://doi.org/10.1007/s44258-024-00043-1

87.

Trozzi

Wang

Tao

. UMAP as a Dimensionality Reduction Tool for Molecular Dynamics Simulations of Biomacromolecules: A Comparison Study. J Phys Chem B 2021; 125(19): 5022–5034. https://doi.org/10.1021/acs.jpcb.1c02081

88.

Castellano-Escuder

Zachman

Han

, et al. GAUDI: interpretable multi-omics integration with UMAP embeddings and density-based clustering. Nat Commun 2025; 16: 5771. https://doi.org/10.1038/s41467-025-60822-1

89.

Cao

Tan

. Generative artificial intelligence: a historical perspective. National Science Review 2025; 12(5): nwaf050. https://doi.org/10.1093/nsr/nwaf050

90.

Wang

Y-H

T-HS

Lin

C-J

, et al. The combination of Sarsa algorithm and Q-learning. Engineering Applications of Artificial Intelligence 2013; 26(9): 2184–2193. https://doi.org/10.1016/j.engappai.2013.06.016

91.

Park

Lee

, et al. Reinforcement learning-based expanded personalized diabetes treatment recommendation using South Korean electronic health records. Expert Systems with Applications 2022; 206: 117932. https://doi.org/10.1016/j.eswa.2022.117932

92.

J Wang

Wang

, et al. ClickGen: Directed exploration of synthesizable chemical space via modular reactions and reinforcement learning. Nat Commun 2024; 15: 10127. https://doi.org/10.1038/s41467-024-54456-y

93.

Mathur

Sutton

. Personalized medicine could transform healthcare. Biomed Rep 2017; 7(1): 3–5. https://doi.org/10.3892/br.2017.922, Epub 2017 Jun 2. PMID: 28685051; PMCID: PMC5492710.

94.

Fahim

Hasani

Kabba

, et al. Artificial intelligence in healthcare and medicine: clinical applications, therapeutic advances, and future perspectives. Eur J Med Res 2025; 30(1): 848. https://doi.org/10.1186/s40001-025-03196-w

95.

Yang

Sun

, et al. Personalized Drug Therapy: Innovative Concept Guided With Proteoformics. Mol Cell Proteomics 2024; 23(3): 100737. https://doi.org/10.1016/j.mcpro.2024.100737, Epub 2024 Feb 13. PMID: 38354979; PMCID: PMC10950891.

96.

Sah

Elshaikh

Shalabi

, et al. Role of artificial intelligence and personalized medicine in enhancing HIV management and treatment outcomes. Life 2025; 15(5): 745. https://doi.org/10.3390/life15050745

97.

Schork

Beaulieu-Jones

Liang

, et al. Exploring human biology with N-of-1 clinical trials. Camb Prism Precis Med 2023; 1: e12. https://doi.org/10.1017/pcm.2022.15, Epub 2023 Jan 10. PMID: 37255593; PMCID: PMC10228692.

98.

Alsaedi

Ogasawara

Alarawi

, et al. AI-powered precision medicine: utilizing genetic risk factor optimization to revolutionize healthcare. NAR Genomics and Bioinformatics 2025; 7(2): lqaf038. https://doi.org/10.1093/nargab/lqaf038

99.

Baird

Westphalen

Blum

, et al. How can we deliver on the promise of precision medicine in oncology and beyond? A practical roadmap for action. Health Sci Rep 2023; 6(6): e1349, PMID: 37359405; PMCID: PMC10286856. https://doi.org/10.1002/hsr2.1349

100.

Rony

MKK

Parvin

Wahiduzzaman

, et al. Challenges and Advancements in the Health-Related Quality of Life of Older People. Advances in Public Health 2024; 2024: 8839631. https://doi.org/10.1155/2024/8839631

101.

Tsimberidou

Fountzilas

Nikanjam

, et al. Review of precision cancer medicine: Evolution of the treatment paradigm. Cancer Treat Rev 2020; 86: 102019. https://doi.org/10.1016/j.ctrv.2020.102019, Epub 2020 Mar 31. PMID: 32251926; PMCID: PMC7272286.

102.

Gonzalez

Rao

Bailey

, et al. Precision Dosing: Public Health Need, Proposed Framework, and Anticipated Impact. Clin Transl Sci 2017; 10(6): 443–454. https://doi.org/10.1111/cts.12490, Epub 2017 Aug 10. PMID: 28875519; PMCID: PMC5698804.

103.

Sanaullah

Waßmuth

Ulrich

, et al. 0: Enhanced visualization and comparative analysis for neural network models. SoftwareX 2025; 29: 102006. https://doi.org/10.1016/j.softx.2024.102006

104.

Frąszczak

. NetCenLib: A comprehensive Python library for network centrality analysis and evaluation. SoftwareX 2024; 26: 101699. https://doi.org/10.1016/j.softx.2024.101699

105.

Badie-Modiri

Kivelä

. Reticula: A temporal network and hypergraph analysis software package. SoftwareX 2023; 21: 101301. https://doi.org/10.1016/j.softx.2022.101301

106.

Rossi

Murari

Martellucci

, et al. NetCausality: A time-delayed neural network tool for causality detection and analysis. SoftwareX 2021; 15: 100773. https://doi.org/10.1016/j.softx.2021.100773

107.

Śmierzchalski

Dziubyna

Jałowiecki

, et al. SpinGlassPEPS.jl: Tensor-network package for Ising-like optimization on quasi-two-dimensional graphs. SoftwareX 2025; 31: 102257. https://doi.org/10.1016/j.softx.2025.102257

108.

Castelli

Rosati

Moguet

, et al. Metabolomics for personalized medicine: the input of analytical chemistry from biomarker discovery to point-of-care tests. Anal Bioanal Chem 2022; 414(2): 759–789. https://doi.org/10.1007/s00216-021-03586-z

109.

Wishart

Bartok

Oler

, et al. MarkerDB: an online database of molecular biomarkers. Nucleic Acids Res 2021; 49(D1): D1259–D1267. https://doi.org/10.1093/nar/gkaa1067

110.

Baser

Samayoa

Yapar

, et al. Artificial Intelligence in Identifying Patients With Undiagnosed Nonalcoholic Steatohepatitis. J Health Econ Outcomes Res 2024; 11(2): 86–94. https://doi.org/10.36469/001c.123645

111.

Helmy

Smith

Selvarajoo

. Systems biology approaches integrated with artificial intelligence for optimized metabolic engineering. Metab Eng Commun 2020; 11: e00149. https://doi.org/10.1016/j.mec.2020.e00149, Epub 2020 Oct 9. Erratum in: Metab Eng Commun. 2021 Oct 28;13:e00186. doi: 10.1016/j.mec.2021.e00186.

112.

Kanehisa

Furumichi

Sato

, et al. KEGG: biological systems database as a model of the real world. Nucleic Acids Res 2025; 53(D1): D672–D677. https://doi.org/10.1093/nar/gkae909

113.

Zhang

Wan

Wang

, et al. Artificial intelligence and computational methods in human metabolism research: A comprehensive survey. J Pharm Anal 2025; 15(8): 101437. https://doi.org/10.1016/j.jpha.2025.101437

114.

Zhang

Liu

S-X

Tao

, et al. Artificial Intelligence Tools for Drug Target Discovery Research: Database, Tools, Applications, and Challenges. Chemistry – A European Journal 2026; 32(2): e03240. https://doi.org/10.1002/chem.202503240

115.

Wishart

Craig

Guo

, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 2006; 34(suppl_1): D668–D672. https://doi.org/10.1093/nar/gkj067

116.

Gaulton

Bellis

Bento

, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 2012; 40(Database issue): D1100–D1107. https://doi.org/10.1093/nar/gkr777

117.

Moreno-Risueno

Busch

Benfey

. Omics meet networks - using systems approaches to infer regulatory networks in plants. Curr Opin Plant Biol 2010; 13(2): 126–131. https://doi.org/10.1016/j.pbi.2009.11.005

118.

Kasam

Tautermann

, et al. Computational method to identify druggable binding sites that target protein-protein interactions. J Chem Inf Model 2014; 54(5): 1391–1400. https://doi.org/10.1021/ci400750x

119.

Juan Pablo Betancourt Arango Rodriguez

DYM

Cruz

Taborda Ocampo

. In silico evaluation of pharmacokinetic properties and molecular docking for the identification of potential anticancer compounds. Computational Biology and Chemistry 2026; 120(Part 1): 108626. https://doi.org/10.1016/j.compbiolchem.2025.108626

120.

Pathan

Raza

Sahu

, et al. AI-powered approaches in molecular modeling and ADMET prediction. Medicine in Drug Discovery 2025; 28: 100223. https://doi.org/10.1016/j.medidd.2025.100223

121.

Wang

Xiao

Suzek

, et al. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 2009; 37(suppl_2): W623–W633. https://doi.org/10.1093/nar/gkp456

122.

Muthuraj

Chandrasekaran

. Nature meets machine: the AI renaissance in natural product drug discovery. Nat Prod Bioprospect 2026; 16(1): 37. https://doi.org/10.1007/s13659-025-00589-6

123.

Chelliah

Vijayalakshmi

Karuvelan

, et al. Revitalizing natural products for sustainable drug discovery: Innovations bridging food bioscience and green therapeutic development. Sustainable Chemistry and Pharmacy 2025; 46: 102082. https://doi.org/10.1016/j.scp.2025.102082

124.

Wishart

Tzur

Knox

, et al. HMDB: the Human Metabolome Database. Nucleic Acids Res 2007; 35(Database issue): D521–D526. https://doi.org/10.1093/nar/gkl923

125.

Bazzano

de Felicio

Alves

LFG

, et al. NP³ MS Workflow: An Open-Source Software System to Empower Natural Product-Based Drug Discovery Using Untargeted Metabolomics. Anal Chem 2024; 96(19): 7460–7469. https://doi.org/10.1021/acs.analchem.3c05829

126.

Danzi

Pacchiana

Mafficini

, et al. To metabolomics and beyond: a technological portfolio to investigate cancer metabolism. Sig Transduct Target Ther 2023; 8: 137. https://doi.org/10.1038/s41392-023-01380-0

127.

Lee

Moen

Punshon

, et al. An Integrated Gaussian Graphical Model to evaluate the impact of exposures on metabolic networks. Comput Biol Med 2019; 114: 103417. https://doi.org/10.1016/j.compbiomed.2019.103417

128.

Dave

Agrawal

. A Comparative Study with GDPR, HIPAA, CCPA, PIPEDA and DPDPA. 2025 IEEE International Conference on Computer, Electronics, Electrical Engineering & their Applications (IC2E3). Srinagar Garhwal, 2025, pp. 1–6. https://doi.org/10.1109/IC2E365635.2025.11167394

129.

Norori

Aellen

, et al. Addressing bias in big data and AI for health care: A call for open science. Patterns (N Y) 2021; 2(10): 100347. https://doi.org/10.1016/j.patter.2021.100347

130.

Pham

. Ethical and legal considerations in healthcare AI: innovation and policy for safe and fair use. R Soc Open Sci 2025; 12(5): 241873. https://doi.org/10.1098/rsos.241873

131.

Cross

Choma

Onofrey

. Bias in medical AI: Implications for clinical decision-making. PLOS Digit Health 2024; 3(11): e0000651. https://doi.org/10.1371/journal.pdig.0000651

132.

Bradford

Aboy

Liddell

. International transfers of health data between the EU and USA: a sector-specific approach for the USA to ensure an 'adequate' level of protection. J Law Biosci 2020; 7(1): lsaa055. https://doi.org/10.1093/jlb/lsaa055

133.

Dunbar

Keyes

Browne

. Determinants of regulatory compliance in health and social care services: A systematic review using the Consolidated Framework for Implementation Research. PLOS ONE 2023; 18(4): e0278007. https://doi.org/10.1371/journal.pone.0278007

134.

McGraw

Mandl

. Privacy protections to encourage use of health-relevant digital data in a learning health system. NPJ Digit Med 2021; 4: 2. https://doi.org/10.1038/s41746-020-00362-8

135.

FAIR-AI Consortium Wells

Nguyen

McWilliams

, et al. A practical framework for appropriate implementation and review of artificial intelligence (FAIR-AI) in healthcare. NPJ Digit Med 2025; 8(1): 514. https://doi.org/10.1038/s41746-025-01900-y

136.

Mennella

Maniscalco

De Pietro

, et al. Ethical and regulatory challenges of AI technologies in healthcare: A narrative review. Heliyon 2024; 10(4): e26297. https://doi.org/10.1016/j.heliyon.2024.e26297

137.

Bassey

Daniel

Okesina

, et al. Transformative Role of Artificial Intelligence in Drug Discovery and Translational Medicine: Innovations, Challenges, and Future Prospects. Drug Design, Development and Therapy 2025; 19: 7493–7502. https://doi.org/10.2147/DDDT.S538269

138.

Seo

Lee

. Applications of Big Data and AI-Driven Technologies in CADD (Computer-Aided Drug Design). Methods Mol Biol 2024; 2714: 295–305, PMID: 37676605. https://doi.org/10.1007/978-1-0716-3441-7_16

139.

Askin

Burkhalter

Calado

, et al. Artificial Intelligence Applied to clinical trials: opportunities and challenges. Health Technol (Berl) 2023; 13(2): 203–213. https://doi.org/10.1007/s12553-023-00738-2, Epub 2023 Feb 28. PMID: 36923325; PMCID: PMC9974218.

140.

Dixon

Sattar

Moros

, et al. Unveiling the Influence of AI Predictive Analytics on Patient Outcomes: A Comprehensive Narrative Review. Cureus 2024; 16(5): e59954, PMID: 38854327; PMCID: PMC11161909. https://doi.org/10.7759/cureus.59954

141.

Chen

Wang

Pan

, et al. Applications of multi-omics analysis in human diseases. MedComm (2020) 2023; 4(4): e315, PMID: 37533767; PMCID: PMC10390758. https://doi.org/10.1002/mco2.315

142.

Alum

. AI-driven biomarker discovery: enhancing precision in cancer diagnosis and prognosis. Discov Onc 2025; 16: 313. https://doi.org/10.1007/s12672-025-02064-7

143.

Sola

Ayala

Pulido

, et al. Gintegrator: Enhancing biological sequences data integration with real-time identifier translation. SoftwareX 2025; 29: 102041. https://doi.org/10.1016/j.softx.2025.102041

144.

Bukowski

Wladyka

. DGE-ontology: A quick and simple gene set enrichment analysis and visualisation tool. SoftwareX 2024; 28: 101899. https://doi.org/10.1016/j.softx.2024.101899

145.

Tzanakis

Nattkemper

Niehaus

, et al. MetHoS: a platform for large-scale processing, storage and analysis of metabolomics data. BMC Bioinformatics 2022; 23: 267. https://doi.org/10.1186/s12859-022-04793-w

146.

Kirpich

Ibarra

Moskalenko

, et al. SECIMTools: a suite of metabolomics data analysis tools. BMC Bioinformatics 2018; 19(1): 151. https://doi.org/10.1186/s12859-018-2134-1

147.

Xia

Psychogios

Young

, et al. MetaboAnalyst: a web server for metabolomic data analysis and interpretation. Nucleic Acids Res 2009; 37(Web Server issue): W652–W660. https://doi.org/10.1093/nar/gkp356

148.

Xia

Deng

, et al. TamGen: drug design with target-aware molecule generation through a chemical language model. Nat Commun 2024; 15(1): 9360. https://doi.org/10.1038/s41467-024-53632-4

149.

Loeffler

Tibo

, et al. Reinvent 4: Modern AI–driven generative molecule design. J Cheminform 2024; 16: 20. https://doi.org/10.1186/s13321-024-00812-5

150.

Luo

Zhu

, et al. GS-DTA: integrating graph and sequence models for predicting drug-target binding affinity. BMC Genomics 2025; 26: 105. https://doi.org/10.1186/s12864-025-11234-4

151.

Wang

Zhang

Shao

, et al. MMPD-DTA: Integrating Multi-Modal Deep Learning with Pocket-Drug Graphs for Drug-Target Binding Affinity Prediction. J Chem Inf Model 2025; 65(3): 1615–1630. https://doi.org/10.1021/acs.jcim.4c01528

152.

Cepicka

Sauer

Kirchner

, et al. drugdevelopR: Planning of phase II/III drug development programs with optimal sample size allocation and Go/No-go decision rules in R. SoftwareX 2025; 30: 102066. https://doi.org/10.1016/j.softx.2025.102066

153.

Wieland-Jorna

Kooten

Verheij

, et al. Natural language processing systems for extracting information from electronic health records about activities of daily living. A systematic review. JAMIA Open 2024; 7(2): ooae044. https://doi.org/10.1093/jamiaopen/ooae044

154.

Eguia

Sánchez-Bocanegra

Vinciarelli

, et al. Clinical Decision Support and Natural Language Processing in Medicine: Systematic Literature Review. J Med Internet Res 2024; 26: e55315. https://doi.org/10.2196/55315