Abstract
Introduction:
Early cancer detection can lead to improved outcomes and a shift toward a prevention model. This review explores the role of artificial intelligence (AI) in early cancer detection, focusing on its application in various clinical settings.
Methods:
A literature search was conducted using databases such as PubMed, Scopus, Web of Science, and IEEE Xplore, focusing on studies that applied AI in early cancer detection. The search, covering articles up to November 2023, included studies discussing clinical applications and trials. Studies were categorized based on AI techniques, cancer types, and clinical application stages.
Results:
AI in oncology, particularly machine learning (ML) and deep learning, has shown promise in enhancing early cancer detection, including breast, lung, prostate, skin, gastrointestinal, and other cancers. These technologies have been applied to various data types, including electronic health records, medical imaging, and biomarkers such as genomics and proteomics. AI/ML has numerous indications throughout the continuum of oncology care, including screening, detection, classification, characterization, segmentation, and monitoring. AI has contributed to numerous early clinical applications in major cancer types such as breast, lung, skin, and prostate cancer, including early symptom detection, clinical decision support, radiology, pathology, medical image and video analysis, biomarker testing including liquid biopsies, detection of recurrence, remote monitoring, and risk stratification.
Conclusion:
AI may advance early cancer detection, and holds promise of revolutionizing health care and diagnosis processes. However, data privacy, ethical considerations, and potential biases require addressing. Implementing quality assurance frameworks and standardization initiatives is vital for the future quality and adoption of AI models in oncology. Future AI applications will include integrating multimodal models, interpretable models, and digital twin technologies, ensuring model transparency and generalizability.
Introduction
Although the United States pays more for medical care per capita than any other country,1,2 health care in the United States has unsustainably high costs, poor outcomes, and worsening health disparities. 3 Chronic diseases, including cardiovascular disease, cancer, and diabetes, cause ∼70% of deaths in the United States and account for ∼75% of expenditures.3,4 Modern medicine has traditionally focused on sick care, using advanced and costly interventions after the disease has manifested itself. Precision cancer treatments primarily target cancers when they are further along in both stage and cumulative mutational burden. However, shifting toward a prevention model would help forestall disease development and reduce morbidity. Figure 1 describes health care's current and possible future state (as described by Hood and Price 5 ).

Early detection refers to catching disease symptoms when there are early signals of transition from wellness to disease. 4 Early detection identifies cancer with fewer mutations and fewer perturbed cellular pathways. 6 For most patients, an earlier diagnosis at an earlier stage means a higher chance of positive outcomes such as a cure, less need for surgery, radiation, or systemic therapies, and a higher chance of maintaining a good quality of life after cancer treatment.7–9
Our ability to optimize cancer prevention may require deep knowledge of our genomes and our phenome (the result of the interaction of our genome with lifestyle and environment), together with digital health measurements of our gut microbiome and blood analytes (including proteins, metabolites, and other molecules), combined with data derived from smartphones, watches, smart rings, and other wearable devices that track heart rate, body temperature, respiration, activity level, calories consumed and burned, sleep, menstrual cycles, blood sugar, hormone balances, and more.5,10 Early studies combining these data are promising but not yet conclusive.
Screening programs can improve the early detection of cancer, such as for breast, prostate, cervical, lung, colorectal cancers, and others.7,11 However, in unselected patients where a standardized “one-size-fits-all” screening paradigm is utilized, there are trade-offs between overdiagnosis, overtreatment, lead time, false positives, costs, and true screening benefits. 12 Proper patient risk stratification and selection are needed to personalize cancer screening paradigms.
The burgeoning field of artificial intelligence (AI) has revolutionized various aspects of health care, particularly in cancer detection. Both early cancer detection and AI are undergoing significant transformations. This evolution underscores the critical role of AI in transforming oncological care, offering a potential new paradigm in early diagnosis and treatment strategies.
Computational methodologies in ML have improved significantly and are beginning to provide new tools and better ways to detect cancer earlier, such as the ability to read and detect cancer in radiology and pathology images. These advances have led to Food and Drug Administration (FDA) approvals of AI-enhanced diagnostic tools, which aim to improve early detection rates and thereby reduce the mortality and morbidity associated with late-stage cancer. In this review, we discuss several representative early applications of AI in the early detection of cancer.
Methods
We employed rigorous criteria for selecting studies to ensure the reliability and relevance of our findings. Studies were chosen based on their methodological robustness, the novelty of AI techniques used, and their direct applicability to early cancer detection. We focused on peer-reviewed articles (original articles and review articles) published in the past 10 years to capture the most recent advancements in the field. We searched the PubMed, Scopus, Web of Science, and IEEE Xplore databases for articles published up to November 2023.
Keywords used in the search included “artificial intelligence,” “machine learning,” “deep learning,” “cancer detection,” “early diagnosis,” “computational oncology,” and “clinical applications,” with the requirement that the clinical studies included focused on early cancer detection. Both MeSH terms and free-text terms were employed to broaden the search scope. We also manually searched the reference lists of key articles to identify additional studies.
Studies were selected based on the following inclusion criteria: (1) original research articles, review articles, and meta-analyses published in English; (2) studies focusing on the use of AI in early cancer detection; and (3) studies that discuss clinical applications or clinical trials. Exclusion criteria were as follows: (1) articles not in English; (2) studies focusing on late-stage cancer diagnosis or treatment; and (3) studies without a clear focus on AI applications in oncology. Pertinent websites or commentaries were included. We categorized studies based on the AI technique used (e.g., machine learning [ML] and deep learning [DL]), the type of cancer, and the stage of clinical application.
Results
Applications of AI in early cancer detection
AI/ML has numerous indications throughout the continuum of oncology care. The clinical context begins with screening and ends with ongoing monitoring/surveillance (Table 1). Early applications of AI in early cancer detection have included early symptom detection, radiology, pathology, medical images and videos, biomarker testing, recurrence detection, remote monitoring, and risk stratification, among other emerging applications (Fig. 2).

Fig. 2. Clinical applications of AI in early cancer detection include early symptom detection, radiology, pathology, medical images and videos, biomarker testing, recurrence detection, remote monitoring, and risk stratification. AI, artificial intelligence; CDS, clinical decision support; ctDNA, circulating tumor DNA; DL, deep learning; EHR, electronic health record; ML, machine learning.
Table 1. Artificial Intelligence/Machine Learning Has Numerous Indications Throughout the Continuum of Oncology Care
Screening and early symptom detection
AI can pinpoint at-risk populations and inform targeted screening initiatives by detecting nuanced patterns and correlations within multimodal data. For instance, AI's advanced analytical capabilities can discern trends and cancer case clusters from extensive data sets, enabling public health entities to optimize their screening and preventive strategies.13,14 As natural language processing algorithms improve, electronic health record (EHR) data can be mined with AI tools to predict the likelihood of developing cancer based on a patient's medical history, genetics, and lifestyle factors. This allows for early intervention and personalized screening plans.
AI-powered chatbots and virtual assistants can help individuals monitor and recognize early cancer symptoms, prompting them to seek medical attention sooner. AI can analyze patient data, including genetic information, family history, and lifestyle factors, to assess cancer risk and recommend regular screenings for high-risk individuals. AI systems can assist oncologists in making treatment decisions by analyzing vast amounts of patient data and recommending personalized treatment plans based on the latest medical research.
AI has the potential to directly assist in cancer diagnosis by triggering investigations or referrals based on clinical parameters in individuals undergoing screening. Numerous studies have shown the potential benefits of cancer screening in terms of early detection and reduced mortality rates. However, even in diseases such as breast cancer, where screening programs are well-established with routine mammography and/or magnetic resonance imaging (MRI),15,16 ongoing debates persist regarding how to select the right patients and balance the risks and benefits. Concerns have arisen about the one-size-fits-all approach, which does not align well with the principles of personalized medicine.15,17
For example, the U.S. Preventive Services Task Force (USPSTF) recommends annual low-dose computed tomography (LDCT) for lung cancer screening in adults aged 50–80 years who have a 20+ pack-year smoking history and who currently smoke or have quit within the past 15 years. 18 In 2021, only ∼6% of those eligible were screened, possibly due to a lack of access to screening centers, poor documentation of smoking status, insufficient time during provider visits, and other socioeconomic factors. 19
Investigators recently trained a convolutional neural network (CNN) model that incorporated minimal EHR data elements (age, sex, and current smoking status) plus chest X-ray data from >10,000 patients in two major clinical trials: the Prostate, Lung, Colorectal, and Ovarian (PLCO) trial and the National Lung Screening Trial (NLST). 20 Interestingly, the CNN predicted 12-year lung cancer incidence better than the Centers for Medicare & Medicaid Services (CMS) eligibility criteria (area under the curve [AUC] 0.755 vs. 0.634).
Similarly, Gould et al. developed an ML model using nonimaging EHR data to predict lung cancer. 21 The model, trained on a data set of 6505 lung cancer patients and 189,597 controls, achieved an AUC of 0.86, outperforming the PLCO Cancer Screening Trial criteria in predicting lung cancer within 9–12 months.21,22 This study demonstrated that AI can enhance the assessment of routine clinical data to better identify patients for lung cancer screening programs. It suggests that AI's role in improving patient selection for screening could be a valuable approach to achieving early diagnosis of lung cancer in the future.
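The AUC comparisons quoted above set a continuous model risk score against a binary eligibility rule. The following is a minimal sketch of how such a comparison can be computed, using the rank-based definition of AUC (the probability that a randomly chosen case scores higher than a randomly chosen control). The function name and the toy scores and labels are invented for illustration and are not data from the cited studies.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outranks a random negative.

    Ties count as half a win, which is what lets a binary rule be scored too.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 1 = developed lung cancer, 0 = did not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_risk = [0.90, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05]  # continuous score
rule_eligible = [1, 1, 0, 1, 1, 0, 0, 0]  # binary screening-eligibility rule

model_auc = auc(model_risk, labels)
rule_auc = auc(rule_eligible, labels)
print(f"model AUC = {model_auc:.3f}, rule AUC = {rule_auc:.3f}")
```

Because a binary rule produces many ties, its AUC is limited by how well a single cut-point separates cases from controls, which is one reason a graded risk score can discriminate better.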
AI may also help predict the risk of future pancreatic cancer, a low-incidence but high-mortality diagnosis. Placido et al. used the time sequence of disease events to predict cancer risk and assessed the ability to do so as the interval between the end of the disease trajectory used for risk prediction and cancer occurrence increased. 23 They investigated 6 million patients from the Danish National Patient Registry (24,000 pancreatic cancer cases) and cross-applied their models to 3 million patients from U.S. Veterans Affairs.
They trained an ML model, CancerRiskNet, on the sequence of disease codes from the EHR, including International Classification of Diseases (ICD) diagnosis and billing codes in clinical histories, and tested the prediction of pancreatic cancer within 36 months. Their model had an AUC of 0.88 for the Danish cohort, whereas cross-application to the United States Veterans Affairs cohort yielded a lower AUC of 0.71. Retraining was needed to improve performance in that cohort (AUC = 0.78). This is an excellent example of a prediction–surveillance selection process in a real-world population.
This study used real-world data such as ICD disease codes and billing codes, which is both a strength and a weakness. Such studies will benefit from incorporating more expansive data beyond disease codes, such as medications, laboratory results, clinical notes, and abdominal imaging (such as computed tomography [CT] or MRI), as well as population-wide germline genetic profiles and health records from general practitioners and, in the future, patient-provided information about their health state from wearable devices.
Other studies on pancreatic cancer have also utilized ML models to sift through expansive EHRs using sophisticated pattern recognition and natural language processing to flag individuals exhibiting potential cancer indicators, those predisposed based on established risk factors, or particular health metrics linked to cancer. ML algorithms can be deployed to scrutinize specific health markers, such as newly developed hyperglycemia—a potential harbinger of pancreatic cancer—as well as other data such as age, body mass index, change in body mass index, smoking, use of proton pump inhibitors, antidiabetic medications, and hemoglobin A1c, cholesterol, hemoglobin, creatinine, and alkaline phosphatase levels.24,25
These models can also evaluate data from health surveys to stratify individuals into risk categories, thereby enriching the pool of individuals screened for pancreatic cancer, increasing the yield and cost-effectiveness of early-stage screenings.25,26 Such stratification ensures a more focused and efficient allocation of screening resources, potentially catching malignancies at a more treatable stage.
The development of AI-enabled chatbot-based symptom checker apps in health care has been gaining traction. 27 These apps aim to provide users with potential diagnoses and assist in self-triaging using AI-driven human-like conversations. However, more research is needed to explore their functionalities and user experiences. Early studies suggest that current apps fall short of replicating the diagnostic process experienced during an in-person medical visit.27,28
Users have reported that these apps lack in several areas: they do not adequately support a comprehensive medical history review, they have limitations in accepting flexible symptom input, the questions asked by the apps are often not easily understandable, and there is a lack of coverage for a wide range of diseases and user demographics. These studies suggest the need to enhance their conversational design and include features that address these gaps, aiming to improve these AI-driven health care tools' overall effectiveness and user experience.
Radiology and pathology for early diagnosis
Medical imaging broadly includes radiology, pathology, images and videos from procedures such as endoscopies, and skin cancer images. This section covers representative examples from major cancer disease sites.
Breast cancer
The USPSTF currently recommends that women aged 50–74 at average risk of developing breast cancer undergo biennial screening mammography. 29 Significant issues with screening include false positive and negative rates, overdiagnosis and overtreatment, psychological impact, and cost, among other harms and benefits.15,30 Improved breast cancer risk models are needed to detect early-stage breast cancers sooner while also reducing overtreatment.
Yala et al. 31 and Brentnall et al. 32 from Massachusetts Institute of Technology incorporated both imaging and nonimaging risk factors, such as age and hormonal factors. When the nonimaging factors were unavailable in the data, their algorithm would impute the expected values based on the image. Their model takes as input the standard views of a mammogram and processes them through four modules. An image encoder is followed by an aggregator that combines information across views of the entire mammogram. Based on this rich representation of the mammogram, risk factors are then predicted and used if these values need to be imputed.
Finally, an additive hazard layer predicts a patient's risk of breast cancer for each year for the next 5 years. Training and testing data were multi-institutional and multinational. Impressively, they found that 41.5% of patients who would develop cancer within 5 years were identified as high risk by their model, compared with only 22.9% by the more commonly utilized Tyrer-Cuzick model (p < 0.001).
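One way to read the additive hazard layer described above is as a layer that adds a non-negative increment for each successive year to a baseline score, so that the predicted cumulative risk can never decrease from one year to the next. The sketch below illustrates only that idea. The function name, the use of softplus to keep increments non-negative, and all numbers are assumptions made for illustration and are not the published implementation.

```python
import math


def cumulative_risk(base_logit, yearly_hazard_logits):
    """Return monotone cumulative risks for years 1..K.

    base_logit: baseline score (hypothetical model output).
    yearly_hazard_logits: raw per-year outputs; softplus turns each one
    into a non-negative increment (an assumption of this sketch).
    """
    risks = []
    running = base_logit
    for raw in yearly_hazard_logits:
        increment = math.log1p(math.exp(raw))  # softplus, always >= 0
        running += increment
        risks.append(1.0 / (1.0 + math.exp(-running)))  # sigmoid to a probability
    return risks


# Hypothetical outputs for one patient over a 5-year horizon.
risks = cumulative_risk(-4.0, [-1.0, -0.5, 0.0, -2.0, -1.5])
for year, risk in enumerate(risks, start=1):
    print(f"risk within {year} year(s): {risk:.3f}")
```

Building the yearly predictions by accumulation avoids the inconsistency that could arise if each year were predicted independently, for example a 3-year risk lower than the 2-year risk.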
Investigators from Google curated a large representative data set from the United Kingdom and a large enriched data set from the United States. 33 Their model resulted in an absolute reduction of 5.7% and 1.2% (United States and United Kingdom cohorts, respectively) in false positives and 9.4% and 2.7% in false negatives. The AI system outperformed six human radiologists, where the AUROC for the AI system was greater than the AUROC for the average radiologist by an absolute margin of 11.5%.
Given the high false positive and negative rates for screening mammography, mammograms may be double-read to reduce errors. 34 The AI system maintained noninferior performance in the double-read setting and surprisingly reduced the workload of the second reader by 88%. These results significantly impact resource availability and utilization in radiology, particularly in environments with limited access to radiologists and double-read workflows.
Most recently, Lang et al. conducted a randomized controlled trial to evaluate the clinical safety of an AI-supported mammography screening protocol compared with standard double reading by radiologists. 35 The trial involved women aged 40–80 years undergoing mammography screening at four sites in Sweden. Participants were randomly assigned to either AI-supported screening or standard double reading without AI (Fig. 3). The AI system provided a malignancy risk score, guiding the radiologists in determining whether a screening examination required single or double reading.

Fig. 3. Participants were randomly assigned to either AI-supported screening or standard double reading without AI.
The study aimed to assess secondary outcome measures, including cancer detection rate, recall rate, false positive rate, positive predictive value (PPV) of recall, type of cancer detected, and screen-reading workload. The findings revealed that AI-supported screening resulted in a similar cancer detection rate compared with standard double reading, meeting the predefined safety threshold. The recall rates and false positive rates were comparable between the two groups, whereas the PPV of recall was slightly higher in the AI-supported group. Notably, there was a 44.3% reduction in screen-reading radiologist workload.
Another study by Dembrower et al. compared double reading by one radiologist plus AI with standard-of-care double reading by two radiologists, as well as single reading by AI alone and triple reading by two radiologists plus AI. 36 The findings revealed that double reading by one radiologist plus AI was noninferior for cancer detection compared with standard double reading by two radiologists (Fig. 4). Similarly, single reading by AI alone and triple reading by two radiologists plus AI were also noninferior to standard double reading by two radiologists.

Fig. 4. This study compared double reading by one radiologist plus AI with standard-of-care double reading by two radiologists, as well as single reading by AI alone and triple reading by two radiologists plus AI. The findings revealed that double reading by one radiologist plus AI was noninferior for cancer detection compared with standard double reading by two radiologists.
Notably, replacing one radiologist with AI resulted in a 4% higher cancer detection rate, which met the noninferiority criterion. The study suggests that AI has the potential for controlled implementation in screening mammography, which would involve risk management and real-world follow-up of performance.
Numerous other studies are evaluating the ability of innovative methods such as transformers and CNNs to improve the performance of AI algorithms in reading multiview mammograms.37–40
Lung cancer
More than 1.5 million lung nodules are identified in patients in the United States each year.41,42 Although modern technology has led to innovations in imaging, biopsy, and bronchoscopic techniques, these approaches are expensive, often invasive, and may fail to distinguish between benign and malignant nodules. As previously described, LDCT for lung cancer screening reduces lung cancer mortality, although most eligible people are not being screened. 43 The NLST and NELSON trials showed that LDCT screening in those aged 50+ with 20 pack-year smoking history can result in a 20–24% decrease in lung cancer mortality.44,45
Ardila et al. from Google recently published a DL algorithm to enhance lung cancer screening. 46 Current lung cancer screening is frequently subject to variable interpretation by different graders and to high rates of false positives and negatives.18,47,48 The investigators created an end-to-end algorithm that integrates the entire screening process into a single workflow, from nodule identification to region of interest (ROI) detection to classification and risk prediction (Fig. 5).

Fig. 5. A comprehensive representation of an end-to-end clinical workflow incorporating AI, exemplified by the model from Ardila et al. 46 This process initiates with the acquisition of imaging data, specifically a low-dose CT scan utilized for lung cancer screening in this example. Subsequently, the AI algorithm undertakes the critical task of detecting and segmenting the ROI. After this, the extracted ROI undergoes a rigorous classification algorithm, culminating in applying an activation function. This sequence concludes with the computation of the outcome variable, in this case, the risk prediction for lung cancer, demonstrating the workflow's capability in harnessing AI for advanced clinical diagnostics. CT, computed tomography; ROI, region of interest.
Their DL model utilized current and prior CT scans to assess lung cancer risk. It demonstrated state-of-the-art performance, with an AUC of 94.4%, validated on 6716 cases from the NLST and further confirmed on an independent set of 1139 clinical patients. The model's efficacy was compared with that of radiologists through two reader studies. Notably, the model surpassed the performance of six radiologists in scenarios where prior CT images were unavailable, showing significant reductions in false positives (11% reduction) and false negatives (5% reduction).
When previous CT scans were available, the model's performance was comparable with that of the radiologists. This model's capability to enhance the accuracy and consistency of lung cancer screenings could lead to broader adoption and improved screening practices globally. This development is particularly significant considering the current low rates of lung cancer screening worldwide.
Another group of investigators from MIT (Mikhael et al. 41 ) similarly hypothesized that a DL model assessing the entire volumetric LDCT data could be built to predict individual risk without requiring additional demographic or clinical data. Radiologists annotated suspicious lesions on NLST LDCTs using MD.ai, 49 although they had numerous exclusions for lack of follow-up data, thick slices (no thin slice images), and data load failure.
They first extracted features from the LDCTs with a pretrained three-dimensional (3D) ResNet-18 encoder, and then computed a global feature vector for the volume through a max pooling layer and an attention-guided pooling layer. The resulting vectors were concatenated and passed through a hazard layer to produce a cumulative probability of developing lung cancer within 6 years. They trained the same algorithm architecture five times and averaged those risk predictions to create their new algorithm called Sybil.
Bounding box annotations of visible cancer nodules were used to guide the model's attention during training but were not used during testing. Surprisingly, Sybil performed exceptionally well, with an AUC of 0.86–0.92 for predicting lung cancer at 1 year in 3 multinational and multi-institutional cohorts. Sybil can run in the background at a radiology reading station when LDCT images are available, without inputting demographic or other clinical data and without requiring radiologists to annotate areas of interest. Interestingly, LDCT can be used to predict clinical factors such as smoking duration, weight, race, smoking status, sex, family history, and age. 41
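The pooling and ensembling steps described above can be illustrated on toy data. In this sketch, a scan is reduced to a few per-location feature vectors. Max pooling and attention-weighted pooling each produce one global vector, and the two are concatenated. Averaging the risk outputs of several trained models is shown at the end. All function names, shapes, and numbers are illustrative assumptions. This is not the Sybil code.

```python
import math


def max_pool(features):
    """Element-wise maximum over locations (features: list of equal-length vectors)."""
    return [max(column) for column in zip(*features)]


def attention_pool(features, attention_logits):
    """Softmax the logits over locations, then take the weighted mean of the vectors."""
    peak = max(attention_logits)
    exps = [math.exp(a - peak) for a in attention_logits]  # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v for w, v in zip(weights, column)) for column in zip(*features)]


def ensemble_average(per_model_risks):
    """Average the per-year risk curves produced by several trained models."""
    n_models = len(per_model_risks)
    return [sum(year_values) / n_models for year_values in zip(*per_model_risks)]


# Three hypothetical locations, two features each.
features = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.3]]
attention_logits = [0.0, 0.0, 2.0]  # the third location receives most of the weight

# Concatenate both pooled summaries into one global vector for the volume.
global_vector = max_pool(features) + attention_pool(features, attention_logits)

# Hypothetical 1-year and 2-year risks from five training runs of one architecture.
five_model_risks = [[0.01, 0.03], [0.02, 0.04], [0.01, 0.05], [0.03, 0.04], [0.03, 0.04]]
mean_risk = ensemble_average(five_model_risks)
```

Averaging several runs of the same architecture is a common way to reduce run-to-run variance in the final risk estimate.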
With such improvements in the performance of screening methodologies, the general diagnostic workflow for screening studies may soon evolve. For standard screening procedures such as mammograms and LDCT, a threshold for patient risk, risk prediction, and confidence scores could be set a priori. If the risk is low and confidence is high, then the AI algorithm's analysis could serve as the primary read, with a physician providing a secondary review. Conversely, if risk is high or confidence in the AI evaluation is low, those cases would be escalated for physician review to supplement the AI evaluation (Fig. 6).

Fig. 6. Triaging the diagnostic workflow. For standard screening procedures such as mammograms and low-dose CT imaging, a threshold for patient risk, risk prediction, and confidence scores could be set a priori. If the risk is low and confidence is high, then the AI algorithm's analysis could serve as the primary read, with a physician providing a secondary review. Conversely, if risk is high or confidence in the AI evaluation is low, those cases would be escalated for physician review to supplement the AI evaluation.
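The triage logic outlined above amounts to a simple rule applied to two numbers per study. The following is a minimal sketch of that rule. The function name, the route labels, and the default thresholds are hypothetical placeholders, and any real thresholds would need to be set and validated clinically.

```python
def triage(risk, confidence, risk_threshold=0.05, confidence_threshold=0.90):
    """Route one screening study based on AI risk and AI confidence.

    Both thresholds are hypothetical and would be fixed a priori in practice.
    """
    if risk < risk_threshold and confidence >= confidence_threshold:
        # Low risk and high confidence: AI read is primary, physician reviews second.
        return "ai_primary_read_physician_secondary_review"
    # High risk or low confidence: escalate to a physician.
    return "escalate_for_physician_review"


# Hypothetical (risk, confidence) pairs for three studies.
for risk, confidence in [(0.01, 0.95), (0.20, 0.95), (0.01, 0.50)]:
    print(risk, confidence, triage(risk, confidence))
```

Note that the rule is deliberately asymmetric: either a high risk or a low confidence alone is enough to escalate a case.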
Chen and colleagues mapped radiomics features of sub-centimeter solid nodules (SCSNs) to hematoxylin and eosin (H&E) stained pathological images. 50 Radiomics features were extracted from preoperative noncontrast chest CT images. A least absolute shrinkage and selection operator (LASSO) regression model was used for radiomics feature selection and radiomics signature construction.
Lung adenocarcinoma was significantly associated with an irregular margin and lobulated shape in the training and external validation sets. Their radiomics signature consisted of 22 features associated with lung adenocarcinomas of SCSNs, with AUCs of the combined model being 0.808–0.885. 50 Such studies will provide more granular and interpretable information to clinicians evaluating these DL models' outputs.
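LASSO builds a signature by shrinking the coefficients of uninformative features to exactly zero, so that the surviving features form the signature. The sketch below shows this on a tiny invented data set using coordinate descent with soft thresholding. It uses a squared-error objective without an intercept for brevity, which is an assumption of this example and may differ from the model used in the cited study. The data, the penalty value, and the function names are all hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, returning exactly 0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


# Toy standardized features (columns are mutually orthogonal by construction).
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
# Outcome driven by feature 0, with a very weak contribution from feature 1.
y = [2.05, -1.95, 1.95, -2.05]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("weights:", weights, "selected features:", selected)
```

In this toy case only the strongly associated feature survives the penalty. In practice the penalty strength is usually chosen by cross-validation.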
Although an LDCT scan can be used for screening, diagnostic CT scans and more advanced imaging modalities such as positron emission tomography-computed tomography (PET-CT) scans are commonly used for staging and surveillance. Biomarkers of non-small cell lung cancer (NSCLC), such as mutational status of the Epidermal Growth Factor Receptor (EGFR), ALK, ROS, PD-L1, and next-generation sequencing, direct systemic therapy decisions, such as cytotoxic chemotherapy, tyrosine kinase inhibitors, immune checkpoint inhibitors, and others.
Currently, mutational analysis must be done on biopsy tissue and cannot be done noninvasively. Mu and colleagues recently used unsupervised hierarchical clustering of DL features to identify significant associations between DL expression patterns and EGFR mutation, stage, smoking status, histology, and sex. 51 The EGFR-Deep Learning Score (EGFR-DLS) distribution varied among subgroups based on EGFR mutation status and histology type.
There were notable differences between adenocarcinoma and squamous cell carcinoma for EGFR-wild-type patients. EGFR-DLS was significantly and positively associated with longer progression-free survival (PFS) in patients treated with EGFR-tyrosine kinase inhibitors (TKIs). In contrast, EGFR-DLS was significantly and negatively associated with higher durable clinical benefit, reduced hyperprogression, and longer PFS among patients treated with immune checkpoint inhibitors (ICIs). Thus, the EGFR-DLS provides a noninvasive method for precise quantification of EGFR mutation status in patients with non-small cell lung cancer (NSCLC), which is promising for identifying NSCLC patients sensitive to EGFR-TKI or ICI treatments. 51
Prostate cancer
ML-based studies on the early detection of prostate cancer (PC) have increased significantly in recent years. 52 Numerous studies have focused on the use of AI in biopsy-based detection of PC, MRI-guided PC detection, CT scan-based PC detection, transrectal ultrasound-guided PC detection, 3D pathology-based PC detection, and genomics-based and proteomics-based PC detection. 52
Gleason grading is the most common system for grading PC. Urological pathologists typically conduct grading during examination of prostate biopsy cores. 53 The Gleason score ranges from 6 to 10, with lower scores representing slower-growing and biologically less aggressive disease. 54 Nagpal et al. developed a DL model using 112 million image patches from 1226 slides annotated by pathologists and tested it on 331 slides from 331 patients to improve Gleason scoring of PC slides obtained during prostatectomies. Their model achieved an accuracy of 0.70, higher than that of visual assessment by pathologists. 55
Furthermore, another study utilized an AI-based algorithm trained on 698 biopsies, achieving 100% sensitivity in validation. 56 Cross-validation and data set diversity were emphasized for reliable predictions. Another study employed CNNs to enhance Gleason pattern recognition and grading, training on 96 tissue specimens but noting the need for larger data sets. 57 A more extensive analysis of 838 biopsies showed that AI assistance improved pathologists' efficiency and reduced result inconsistencies, demonstrating the potential benefits of AI in PC diagnosis. 58
Recent advances have allowed DL techniques to be applied to advanced imaging modalities such as multiparametric MRI, 59 which has emerged as an alternative to transrectal ultrasound-guided biopsies.60,61 For example, in a retrospective study by Schelb et al. comparing clinical assessment with a U-Net DL system, both methods showed similar performance in detecting significant PC using T2-weighted and diffusion 3T MRI. 62 The study involved 312 men, with U-Net achieving a sensitivity of 96% and a specificity of 31% at a probability threshold of ≥0.22, comparable with Prostate Imaging-Reporting and Data System (PI-RADS) cutoffs.
The combination of PI-RADS and U-Net increased the PPV to 67%, demonstrating the effectiveness of integrating DL in clinical diagnostics for PC. Other studies have also shown improvement in detection when used with PI-RADS.63,64 Various CNN architectures, such as Alexnet, VGG, ResNet, and Googlenet, have also been developed to enhance prediction accuracy. 65 In medical imaging, including prostate multiparametric MRI scan images, data augmentation may be necessary to create an ideal data set for CNN networks such as VGG, leading to improved PC diagnosis, detection, and segmentation. 66
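The sensitivity, specificity, and PPV figures reported above all follow from counting true and false calls at a chosen probability threshold. The sketch below shows that calculation. The threshold of 0.22 echoes the value mentioned in the text, but the probabilities, labels, and function names are invented for illustration and are not data from the cited study.

```python
def confusion_counts(probabilities, labels, threshold):
    """Count true/false positives and negatives at a probability threshold."""
    tp = fp = tn = fn = 0
    for prob, label in zip(probabilities, labels):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def screening_metrics(probabilities, labels, threshold):
    """Return sensitivity, specificity, and PPV (None where undefined)."""
    tp, fp, tn, fn = confusion_counts(probabilities, labels, threshold)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "ppv": tp / (tp + fp) if (tp + fp) else None,
    }


# Hypothetical model outputs: 1 = clinically significant cancer, 0 = none.
probabilities = [0.90, 0.60, 0.30, 0.25, 0.40, 0.10, 0.05, 0.50]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

metrics = screening_metrics(probabilities, labels, threshold=0.22)
print(metrics)
```

A low threshold of this kind favors sensitivity at the cost of specificity, which matches the pattern described for the U-Net system.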
Similarly, improved technology has now allowed for whole pathology slide imaging, where high-resolution digital images are created from scanned pathology slides.67,68 First, the whole slide images are segmented, then divided into tissue patches called tiling, and then undergo image adaptation, region of interest extraction, classification, and the calculation of a confidence score. Predefined thresholds and levels of confidence can be set before analysis. After each step, the data are passed through a DL algorithm such as a CNN. 69
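The tiling step can be sketched on a small grid standing in for a whole slide image. In this example, the slide is cut into non-overlapping square patches, and patches that are mostly background are discarded before any classifier would see them. The function name, the background convention (pixel value 0), and the tissue-fraction cutoff are assumptions made for illustration. Real pipelines work on far larger, multi-resolution images.

```python
def tile_image(image, tile_size, min_tissue_fraction=0.5, background_value=0):
    """Split a 2D pixel grid into non-overlapping tiles and keep tissue-rich ones.

    Returns a list of (row, col, tile), where row and col locate the tile's
    top-left corner. Edge regions smaller than tile_size are skipped.
    """
    tiles = []
    n_rows, n_cols = len(image), len(image[0])
    for r in range(0, n_rows - tile_size + 1, tile_size):
        for c in range(0, n_cols - tile_size + 1, tile_size):
            tile = [row[c:c + tile_size] for row in image[r:r + tile_size]]
            tissue_pixels = sum(1 for row in tile for px in row if px != background_value)
            if tissue_pixels / (tile_size * tile_size) >= min_tissue_fraction:
                tiles.append((r, c, tile))
    return tiles


# Toy 4x4 "slide": 0 is background, nonzero values are tissue.
slide = [
    [0, 0, 5, 6],
    [0, 0, 7, 8],
    [1, 0, 9, 9],
    [0, 0, 9, 0],
]

tiles = tile_image(slide, tile_size=2)
kept_positions = [(r, c) for r, c, _ in tiles]
print("kept tiles at:", kept_positions)
```

Filtering background tiles early keeps the downstream classifier focused on tissue and reduces the number of patches to process.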
In 2021, the FDA authorized the first AI-based software to detect PC, called Paige Prostate. 70 Paige Prostate resulted in the identification of four additional patients whose diagnoses were upgraded from benign/suspicious to malignant. In addition, this AI-based test provided an estimated 65.5% reduction in the diagnostic time for the material analyzed.
Interestingly, investigators are also working on nondestructive 3D computational pathology based on entire prostate biopsy cores, which may allow for volumetric segmentation of pathology tissues compared with a two-dimensional (2D) pathology approach. 71 The risk assessment for individuals with low- to intermediate-risk PC showed better results when considering the 3D glandular attributes of the cancer biopsies as opposed to the comparable 2D features. 71
Investigators across multiple institutions recently published a new foundation model called Virchow. 72 This deep neural network foundation model comprises 632 million parameters, is designed explicitly for computational pathology, and leverages self-supervised learning based on a training data set of 1.5 million H&E-stained whole slide images encompassing diverse tissue groups. This data set size is orders of magnitude larger than previous efforts.
Virchow surpasses state-of-the-art systems when subjected to various downstream tasks, including tile-level pan-cancer detection, subtyping, and slide-level biomarker prediction, demonstrating its superior performance on internal data sets drawn from the same population as the pretraining data and external public data sets. Virchow achieves a balanced accuracy of 93% for pan-cancer tile classification, with impressive AUCs of 0.983 for predicting colon microsatellite instability status and 0.967 for predicting breast CDH1 status.
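For readers less familiar with the two metrics reported here, the sketch below shows how balanced accuracy and AUC are commonly computed. The labels, predictions, and scores are hypothetical and chosen only to show why balanced accuracy is informative when classes are imbalanced. The AUC function uses the pairwise-ranking definition, which is convenient for small examples but not how large evaluations are usually implemented.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced example: 2 positive tiles, 8 negative tiles.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.35]

plain_accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In this toy case, plain accuracy is 0.9 even though half of the positive tiles are missed, whereas balanced accuracy is 0.75, which better reflects performance on the minority class.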
These substantial performance improvements underscore the critical importance of pretraining on extensive pathology image data sets and suggest that even more significant performance enhancements can be achieved by pretraining on even larger data sets. This approach holds great potential for various high-impact applications, particularly in scenarios with limited training data, such as drug outcome prediction. 72
As detailed below, AI is also being used in the genomic and proteomic-based detection of PC and other cancers. Traditionally, prostate-specific antigen-level testing has aided in risk stratification, diagnosis, and surveillance of PC.73,74 Within the last several years, many new biomarkers have been introduced for PC.75–77 However, further multiomics integration with genomic, transcriptomic, metabolomic, radiomic, and pathomic data will likely drive further development, including improved performance prediction, discovery of new PC biomarkers, and enhanced PC detection and monitoring/surveillance.78–82
Video and image analysis for early detection
The integration of AI in analyzing medical images and videos from non-radiology and non-pathology domains is increasingly recognized for its potential in early cancer detection. A recent study in cervical cancer aimed to develop a DL-based algorithm for automatically detecting cervical precancer and cancer in women, particularly in low-resource settings where screening is lacking. 83
Using a longitudinal cohort of 9406 women in Costa Rica and 60,000 digitized cervical images, the algorithm demonstrated high accuracy (AUC = 0.91) in identifying precancerous and cancerous cases, outperforming traditional cervigram interpretation and cytology by human experts. A single visual screening round for women aged 25–49 could identify more than half of precancer cases while referring only a small percentage for further evaluation.
Health workers can use a cell phone or similar camera for this type of cervical screening. Other studies have similarly reported the benefits of DL-based models for automated visual evaluation during colposcopy.84–86 This advancement is particularly promising for improving cancer care in low-resource settings, where access to skilled specialists is limited, thereby expanding the reach of life-saving early detection methods.
DL is also being utilized during endoscopies, such as colonoscopy. 87 Colonoscopy is the gold standard for colorectal cancer screening, although 25% of colorectal neoplasia may be missed at screening colonoscopy.88,89 AI has been shown to improve real-time polyp detection, differentiate between benign and malignant lesions, and increase the adenoma detection rate (ADR).90,91 A recent systematic review and meta-analysis of 33 trials showed impressive improvements with AI during colonoscopy, with an incidence rate ratio of 1.39 for ADR using AI-aided systems. 92
Future efforts will focus on the ability of AI algorithms to classify polyps into premalignant versus non-premalignant, a process called computer-aided diagnosis. 93 Cost-effectiveness, long-term benefits, and reduction of overdiagnosis are all future research areas for this technology. Nevertheless, this tool may ultimately become a co-pilot for endoscopists.
For skin cancers, AI algorithms employ dermoscopic images to discern malignant melanomas from benign moles with high accuracy, sometimes on par with expert dermatologists. Esteva, Thrun, and their team at Stanford created an AI tool for diagnosing skin cancer based on 129,450 clinical images. 94 The complete taxonomy contained >2000 diseases and was organized based on conditions' visual and clinical similarity. The model takes in an image of a skin lesion (e.g., melanoma), and the image is sequentially warped into a probability distribution over clinical classes of skin disease using the Google Inception v3 CNN architecture pretrained on the ImageNet data set (1.28 million images over 1000 generic object classes).
The authors then fine-tuned the network on their data set of 129,450 skin lesions comprising 2032 diseases. They calculated the probability of an inference class by summing the probabilities of the training classes according to that taxonomy structure. The investigators could visualize the high-dimensional data on the last hidden layer of the CNN, with different clusters of disease categories. Their model achieved performance on par with 21 board-certified dermatologists.
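The summation over the taxonomy can be illustrated with a minimal sketch. The two-level taxonomy, class names, and probability values below are hypothetical and greatly simplified. The actual taxonomy comprised more than 2000 diseases, and this code is not the authors' implementation.

```python
# Hypothetical mini taxonomy: coarse inference class -> fine-grained training classes.
TAXONOMY = {
    "malignant melanocytic": ["amelanotic melanoma", "lentigo melanoma"],
    "benign melanocytic": ["blue nevus", "halo nevus"],
}


def inference_probabilities(training_probs, taxonomy):
    """Sum the probabilities of the training classes under each inference class."""
    return {
        parent: sum(training_probs.get(child, 0.0) for child in children)
        for parent, children in taxonomy.items()
    }


# Hypothetical CNN output over the fine-grained training classes (sums to 1).
training_probs = {
    "amelanotic melanoma": 0.125,
    "lentigo melanoma": 0.25,
    "blue nevus": 0.5,
    "halo nevus": 0.125,
}

coarse = inference_probabilities(training_probs, TAXONOMY)
```

With these values, the malignant class receives 0.375 and the benign class 0.625. Training on fine-grained classes while reporting at a coarser level lets the network learn from detailed labels without requiring the clinical decision to be made at that granularity.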
However, <5% of their data set pertained to patients with darker skin. Recent systematic analysis suggests ethnicity data are available for only 1.3% of images in publicly available skin data sets. 95 State-of-the-art dermatology AI models exhibit substantial limitations on diverse dermatology images, particularly for dark skin tones and uncommon diseases. 96 Fine-tuning AI models on diverse data sets can close the performance gap between light and dark skin tones.
These findings identify important weaknesses and biases in dermatology AI that should be addressed for reliable application to diverse patients and diseases. 97 More prospective trials and external validation of these models are required to ensure generalizability. However, the continuous refinement of these AI systems, coupled with growing databases of medical images, is paving the way for more nuanced and widespread early cancer detection capabilities.
Liquid biopsies
Currently, screenings are organ-specific and limited to five cancer types (breast, lung, colon, cervical, and prostate) for the general population. 98 These tests, however, do not provide information about other cancer types in different body parts. Multi-cancer early detection tests could detect >50 different types of cancer through a single blood draw, including many cancers that currently lack screening methods. 99
Ideally, such a test would be used in patients at elevated risk, detect a maximal number of cancers, have a low false positive rate, limit overdiagnosis, localize the tissue of origin, and be easy to use.99,100 Cancer detection through a blood sample is possible because tumor cells shed DNA fragments into the bloodstream. These fragments, known as cell-free DNA (cfDNA), originate from the genome, which contains all the instructions for cellular functions.
Other biomarkers can also be detected in the blood, such as circulating tumor cells (CTCs) or tumor nucleic acids. 101 Peripheral blood can be sampled to detect these changes during screening, after treatments, and over time. By sequencing and analyzing these cfDNA fragments, it is possible to determine whether cells in the body are normal or have undergone malignant changes by recognizing differences in DNA methylation patterns in cfDNA. 102
ML algorithms are adept at sifting through the vast and complex data produced by liquid biopsies, identifying disease signatures with higher accuracy and efficiency. These advanced computational methods enhance the detection of cancer-specific signatures from minimal samples, such as blood, by effectively analyzing molecular data, which includes the genetic and epigenetic information of CTCs and DNA. For instance, the Galleri test evaluates >1 million methylation sites in cfDNA and applies ML and pattern recognition to identify abnormal methylation patterns that could signal the presence of cancer. Both supervised and unsupervised approaches have been described.103–105
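The importance of a low false positive rate for a screening test, noted earlier in this section, follows from simple arithmetic: when a disease is uncommon in the screened population, even a small false positive rate can outnumber true positives. The sketch below computes the positive predictive value (PPV) from sensitivity, specificity, and prevalence. All input values are hypothetical and do not describe the performance of any specific test.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule: true positives divided by all positive results."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical screening scenario: 1% prevalence, 50% sensitivity.
ppv_by_specificity = {
    spec: positive_predictive_value(sensitivity=0.5, specificity=spec, prevalence=0.01)
    for spec in (0.95, 0.99, 0.995)
}

for spec, ppv in ppv_by_specificity.items():
    print(f"specificity={spec:.3f} -> PPV={ppv:.3f}")
```

Under these assumed inputs, raising specificity from 95% to 99.5% moves the PPV from roughly 9% to roughly 50%, which illustrates why specificity dominates the design of population-level screening tests.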
Ethical considerations
Although AI's potential in early cancer detection is immense, it is imperative to address the associated ethical challenges and biases. Future research must prioritize the development of ethically sound and unbiased AI systems. This includes implementing robust data privacy measures, ensuring representativeness in data sets to avoid biased outcomes, protecting autonomy, promoting transparency and explainability in AI algorithms, and ensuring inclusiveness and equity.106–108 Addressing these concerns is crucial for the responsible and equitable implementation of AI in health care.
Future directions
Future applications of AI in early cancer detection will continue to build on the applications described in this review. Such models will include integration into clinical decision support, remote monitoring with the Internet of Medical Things, multimodal models, interpretable and explainable models, and digital twin technologies.
Models built on a single data type may overlook vital predictive interactions within complex biological systems and among interdependent data sources, such as epigenomics, proteomics, radiomics, pathomics, or clinical information.109,110 These multidimensional multiomic models may better describe the tumor landscape and improve diagnostic precision compared with using a single data type alone.
For instance, most current models focus on histology or genomics, but there is a growing interest in integrating these data sources to develop joint image-omic prognostic models. A recent model from the Broad Institute integrates pathology slides and molecular data to stratify patients into low- and high-risk groups across 14 cancer types. 111 This integration and identification of explainable morphological and molecular descriptors can predict outcomes and identify features linked to prognosis.
Future models will also need to be interpretable and explainable. Elmarakeby et al. developed a biologically informed DL model that stratifies patients with PC by treatment resistance and evaluates molecular drivers of treatment resistance. 112 This model processes genomic changes in PC and then determines and ranks the importance of different mutations in explaining the outcome, thereby communicating more transparency about the roles of these genes.
Finally, digital twin technologies may link a physical model, such as a patient with a specific cancer, to a virtual counterpart. 113 This innovative approach leverages data from multimodal sources such as EMRs, imaging, pathology, multiomics, wearable sensors, and other data sources to simulate personalized patient care inside a computer (Fig. 7). Such an approach may help to improve diagnostic accuracy, prediction, and risk stratification, and even test outcomes of therapies in silico through sophisticated data analysis and ML.114,115 This technology could help identify those at risk of developing cancer.

Fig. 7. Digital twin technologies may link a physical model, such as a patient with a specific cancer, to a virtual counterpart. This approach leverages data from multimodal sources such as EMRs, imaging, pathology, multiomics, wearable sensors, and other data sources to simulate personalized patient care inside a computer.
Conclusion
AI's integration in early cancer diagnosis signifies a paradigm shift in health care, offering transformative potential in analyzing complex data from diverse modalities. This technology, however, should complement, not replace, medical expertise. Successfully integrating AI into medicine requires a robust multidisciplinary approach. Collaborative efforts between clinicians, computer scientists, statisticians, ethicists, and other stakeholders are essential to ensure the safe, effective, and ethical application of AI technologies.
This collaboration will foster innovation while addressing the complex challenges of data privacy, accurate data curation, ethical considerations, storage costs, inherent biases (particularly in under-represented diverse demographics), and technological advancements in health care. The quality of future studies and the adoption of models can be improved by implementing quality assurance frameworks, such as Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence, 116 and by standardizing radiomic feature values as suggested by the image biomarker standardization initiative.
In addition, establishing disease-specific “gold standard” test sets would enable clinicians to compare and evaluate different models more effectively. Addressing these issues through quality assurance frameworks and standardized methods is crucial. Despite these hurdles, AI's role in health care is expanding, with its promise in early cancer detection being particularly notable. The field's rapid growth highlights its significant potential in revolutionizing patient care and diagnosis processes.
This review highlights significant advancements in AI-driven methodologies for early cancer detection, emphasizing their precision and potential for integration into clinical practice. Our findings demonstrate that AI applications not only improve diagnostic accuracy but also contribute to personalized treatment planning, ultimately enhancing patient outcomes.
Footnotes
Authors' Contributions
Conceptualization by N.G.T., A.L.B., S.M., P.O., A.W., and D.K. Methodology by N.G.T. and A.L.B. Software, visualization, and project administration by N.G.T. Validation by A.L.B., A.P.D., and A.G.R. Formal analysis by N.G.T., A.L.B., and C.D. Investigation by N.G.T., S.J.F., and P.O. Resources by C.D. and A.L.B. Data curation by N.G.T., A.P.D., A.L.B., A.G.R., S.J.F., and C.D. Writing—original draft preparation by N.G.T., A.P.D., and A.L.B. Writing—review and editing by N.G.T., A.P.D., A.L.B., A.W., D.K., A.G.R., S.M., S.J.F., P.O., and C.D. Supervision by N.G.T. and C.D.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
