Abstract
The explosive growth of artificial intelligence (AI) marks a pivotal shift in colorectal cancer (CRC) screening, offering an avenue to bolster early detection and heighten the quality of patient care. This thorough review charts the progression of AI and its branches, machine learning (ML) and deep learning (DL), with an emphasis on their application within colonoscopy to aid in recognizing and diagnosing CRC. Presently, AI applications in colonoscopy can be divided into two classes: computer-aided detection (CADe) and computer-aided diagnosis (CADx). In randomized controlled trials, CADe has shown favorable outcomes, significantly increasing adenoma detection rate (ADR) and adenomas per colonoscopy (APC) while decreasing adenoma miss rate (AMR). For instance, a meta-analysis by Hassan et al. involving 4,354 participants demonstrated that CADe notably improved ADR (36.6% vs. 25.2%; relative risk [RR]: 1.44, 95% confidence interval [CI]: 1.27–1.62, p < 0.01) and APC (0.58 vs. 0.36; RR: 1.70, 95% CI: 1.53–1.89, p < 0.01). Conversely, Wei et al.’s review of real-world studies with 11,660 patients showed a statistically significant but clinically minimal improvement in ADR with CADe (36.3% vs. 35.8%; RR: 1.13, 95% CI: 1.01–1.28, p = 0.04) and no notable differences in APC, highlighting the need for further research to understand the factors affecting these mixed results. AI in colonoscopy is not limited to spotting polyps; it can also aid in estimating polyp size, evaluating the quality of bowel preparation, and appraising CRC risk, streamlining the entire screening process. Nevertheless, the adoption of AI in CRC screening encounters several obstacles. The presence of false positives, concerns over data privacy, inherent biases in training data, and susceptibility to cyber threats are matters that warrant vigilant consideration.
Furthermore, there are ethical and regulatory dilemmas regarding AI’s role in health care, issues of transparency, accountability for diagnostic errors, and the potential for reducing physicians’ diagnostic expertise that need to be resolved. Economic barriers, such as the lack of defined reimbursement methods for AI applications in endoscopy, stand as challenges too. Looking forward, the advancement of AI in CRC screening requires deep-seated collaboration among various fields, enhancements in medical training programs, and initiatives to ensure fair access to AI, averting health care disparities.
Introduction
Artificial intelligence (AI) refers to the capabilities of machines to exhibit intelligence akin to that of humans, such as learning and problem-solving. 1 This concept of AI emerged at the 1956 Dartmouth Conference, thanks to visionaries like McCarthy and colleagues, and has seen rapid advancements since then. 2 In the realm of medicine, AI involves sophisticated computational methods to interpret complex medical data, predict health outcomes, and support clinical decisions, potentially transforming health care through enhanced diagnostics and treatment plans.
Professionals in health care might encounter confusion regarding the distinctions between AI, machine learning (ML), and deep learning (DL). ML is a set of methods that automatically detect patterns in data to anticipate future events or assist in decision-making under uncertainty. Its goal is to perform tasks by analyzing vast amounts of data using algorithms, reducing the need for explicit programming. While ML can be remarkably effective, it often depends on substantial data inputs and sometimes requires human oversight. 3
DL, a subset of ML, enables computers to collect, analyze, and interpret data swiftly and autonomously, forming learning models from large datasets. DL is realized through artificial neural networks, which are inspired by the brain’s neuronal hierarchy. It shows great promise in health care for functions such as detecting lesions and drafting initial medical reports, leveraging supercomputing advancements for accelerated neural network computation 3 (Fig. 1).

Fig. 1. Relationship diagram of AI, ML, and DL.
This article will explore the application of AI in the screening of colorectal cancer (CRC), highlighting its potential to enhance early detection and improve patient care outcomes.
Overview of Colorectal Cancer
CRC poses a global public health concern, characterized by significant incidence and mortality rates. By 2030, it is expected that there will be an additional 2.2 million new CRC cases and 1.1 million cancer-related deaths. 4 In the United States, CRC is the second leading cause of cancer-related deaths. However, there has been a notable decrease in both incidence and the death rate over recent decades, thanks largely to effective screening programs that can remove precancerous polyps and catch the disease early on.5–7
Screening for CRC is unique among cancers in that it has been proven to reduce mortality rates for both average-risk men and women. There are various screening tests available, each with its own advantages and drawbacks. 8 The U.S. Preventive Services Task Force recommends a range of CRC screening options, including the high-sensitivity guaiac fecal occult blood test, stool DNA tests, CT colonography, flexible sigmoidoscopy, and colonoscopy.9–11
Colonoscopy is regarded as the gold standard screening tool for CRC. 12 The adenoma detection rate (ADR), the proportion of colonoscopies in which at least one adenoma is found, is an indicator of the efficacy of colonoscopy, with higher rates indicating a better outcome. The benchmark for ADR is set at 25% or more, aiming for over 30% in men and over 20% in women.13,14 Despite its effectiveness, colonoscopy’s reliability can be compromised by the varying skill levels of the physicians performing the procedure, which can result in missed lesions. Miss rates can be as high as 26% for adenomas, 9% for advanced adenomas, and 27% for serrated lesions. 15 Importantly, a higher ADR is associated with a lower risk of CRC and related deaths after a colonoscopy: every 1% increase in ADR correlates with a 3–6% decrease in the risk of CRC. 16
The accuracy of CRC screening can be compromised by various challenges, such as the incomplete examination of the colon, endoscopists missing adenomas that are difficult to spot because of their size, shape, or location, or not allowing enough time for a thorough inspection.17,18 These factors can lead to an increased risk of undetected polyps. To mitigate these issues, it is critical for patients to be examined by highly proficient endoscopists. Introducing AI into the CRC screening process has the potential to augment the skill level of endoscopists, helping them to identify subtle polyps that might otherwise be missed, thereby enhancing the effectiveness of the screening and improving outcomes for patients.
AI in Colonoscopy
AI applications in colonoscopy can be divided into two classes as follows: computer-aided detection (CADe) and computer-aided diagnosis (CADx). CADe systems apply ML algorithms to spot precancerous lesions during a colonoscopy, which helps to standardize detection across different practitioners and improve the overall quality of the screening process. Meanwhile, CADx systems take on the role of characterizing and differentiating the lesions that have been detected, performing what is known as optical biopsies and potentially bypassing the need for traditional histopathological tissue examination.
CADe
Most of the research on CADe consists of randomized controlled trials (RCTs), which have shown favorable outcomes, indicating a significant increase in ADR and adenomas per colonoscopy (APC) and a corresponding decrease in adenoma miss rate (AMR). A meta-analysis of five RCTs by Hassan et al., which included 4,354 participants, showed that ADR was notably greater in the CADe group than in the non-CADe group (36.6% vs. 25.2%; relative risk [RR]: 1.44, 95% confidence interval [CI]: 1.27–1.62, p < 0.01). APC was also higher in the CADe group than in the control group (0.58 vs. 0.36; RR: 1.70, 95% CI: 1.53–1.89, p < 0.01). 19
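To make these endpoints concrete, the following minimal Python sketch computes ADR, APC, and a crude (unadjusted) relative risk with a log-scale 95% CI from per-arm counts. The counts are hypothetical and chosen only to mirror the reported percentages; they are not the trial data, and the pooled RR reported by Hassan et al. comes from a meta-analytic model rather than from this simple two-arm calculation. All function and variable names are illustrative.

```python
import math


def detection_rate(patients_with_adenoma, n_patients):
    """ADR: share of colonoscopies in which at least one adenoma was found."""
    return patients_with_adenoma / n_patients


def adenomas_per_colonoscopy(total_adenomas, n_patients):
    """APC: total number of adenomas found divided by number of colonoscopies."""
    return total_adenomas / n_patients


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Crude relative risk of arm A vs. arm B with a Wald CI on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts (assumption): 1,000 patients per arm.
cade = {"n": 1000, "with_adenoma": 366, "adenomas": 580}
control = {"n": 1000, "with_adenoma": 252, "adenomas": 360}

rr, lower, upper = relative_risk(
    cade["with_adenoma"], cade["n"], control["with_adenoma"], control["n"]
)
print(f"ADR: {detection_rate(cade['with_adenoma'], cade['n']):.1%} "
      f"vs {detection_rate(control['with_adenoma'], control['n']):.1%}")
print(f"APC: {adenomas_per_colonoscopy(cade['adenomas'], cade['n']):.2f} "
      f"vs {adenomas_per_colonoscopy(control['adenomas'], control['n']):.2f}")
print(f"Crude RR: {rr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

A CI that excludes 1 indicates a statistically significant difference in detection between the two arms under this simple model.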
Three tandem studies have compared colonoscopies with AI to colonoscopies without AI, showing a significant reduction in AMR when utilizing CADe. Moreover, nonpolypoid and right-sided adenomas, often overlooked during colonoscopy, were less likely to be missed when CADe was used.20–22
Miss rates for flat adenomas and serrated polyps are higher than for polypoid adenomas, with sessile serrated lesions (SSLs) disproportionately contributing to postcolonoscopy CRCs. 23 SSLs were often excluded from primary endpoint analyses in previous CADe trials because they are difficult to detect;24–26 however, these tandem studies included SSLs in the primary endpoint analysis. In the Glissen Brown et al. study, the miss rate of SSLs was significantly lower in the CADe-first group than in the high-definition white-light colonoscopy-first group (7.14% vs. 42.11%; p = 0.0482). In addition, in the Kamba et al. study, the miss rate of SSLs in the CADe-first group was significantly lower than in the standard colonoscopy-first group (13.0% vs. 38.5%, p = 0.03). 20
In contrast, Wei et al. found no significant benefit from CADe when they evaluated its use in community-based, nonacademic practices. The study included adults aged 45 and older undergoing screening or low-risk surveillance colonoscopy, enrolling 769 patients in total (382 without CADe and 387 with CADe). The use of CADe resulted in no significant difference in APC compared with non-CADe (0.73 vs. 0.67, p = 0.496). CADe did not improve the identification of serrated polyps per colonoscopy (0.08 vs. 0.08, p = 0.965), but it did increase the detection of nonadenomatous nonserrated polyps (0.90 vs. 0.51, p < 0.0001). There was no difference in ADR between the non-CADe and CADe groups for screening colonoscopies (34.6% vs. 34.3%, p = 1.000) or for surveillance procedures (43.9% vs. 40.0%, p = 0.654). CADe was associated with decreased adenomas per extraction (APE) in all colonoscopies (44.8 vs. 56.8 APE, p < 0.001) and in screening colonoscopies (43.0 vs. 57.8 APE, p < 0.001). 27 Overall, CADe did not significantly increase adenoma detection. The increased detection of polyps in the CADe group was mainly due to nonadenomatous nonserrated polyps, leading to a lower APE. The lack of significant benefit from CADe in this study could be due to various factors. Experienced community-based endoscopists might already be proficient at detecting adenomas visible to the CADe system, limiting its additional value. Some endoscopists may ignore small lesions highlighted by CADe, considering them clinically unimportant, which reduces the potential benefit of the system. In addition, endoscopists in community practices might be frustrated by false positives from CADe systems, leading to missed true-positive lesions. Finally, the relatively small size of the study raises the possibility of type II statistical errors, particularly in measuring outcomes with small differences between groups. 27
A recent systematic review by Wei et al. examined the impact of CADe in real-world colonoscopy practice and included 12 studies (10 fully published studies and 2 abstracts) with a total of 11,660 patients. It found a statistically significant but clinically minimal improvement in ADR with CADe versus without CADe (ADR 36.3% vs. 35.8%; RR: 1.13, 95% CI: 1.01–1.28, p = 0.04). When the two abstracts were excluded, the improvement in ADR was no longer statistically significant. Subanalyses demonstrated a statistically significant increase in ADR with CADe in the six prospective studies (37.3% vs. 35.2%, RR: 1.15, 95% CI: 1.01–1.32), but not in the six retrospective studies (35.7% vs. 36.2%, RR: 1.12, 95% CI: 0.92–1.36). No notable differences in APC were observed between studies with and without CADe among the six applicable studies that included adequate data on APC 28 (Table 1).
Table 1. Summary of 12 studies of CADe in real-world settings
Five of the studies examined the GI Genius (Medtronic) system, involving a total of 6,892 participants. The analysis showed no significant difference in ADR with GI Genius compared to without it (RR: 0.96, 95% CI: 0.85–1.07, p = 0.42). Similarly, in three studies, where data allowed for APC comparison, there was no significant difference in APC with or without GI Genius (rate ratio: 0.94, 95% CI: 0.82–1.08, p = 0.37). This finding is likely not specific to GI Genius, as many CADe platforms exhibit similar performance and usability features. 28
According to the authors, one cause for the variation between outcomes in real-world clinical practice and RCT settings is that, in the real world, procedures are carried out by a diverse group of endoscopists, each with their own level of expertise and practice patterns, whereas RCTs include colonoscopies with good bowel preparation performed by expert endoscopists with documented withdrawal times. In addition, the absence of blinding for endoscopists is a significant consideration in evaluating new technologies. The awareness of being observed (Hawthorne effect) might significantly impact performance in RCTs. Although real-world studies cannot entirely eliminate the Hawthorne effect, they might be less subject to unconscious bias favoring CADe. 28
The authors highlighted that it would be premature to dismiss the potential of AI to enhance colonoscopy quality based on current mixed results from RCTs and real-world settings. Instead, further research is needed to understand the underlying factors affecting these outcomes.
A recent comprehensive systematic review and network meta-analysis of all RCTs by Aziz et al. evaluated the effectiveness of AI versus various endoscopic techniques intended to enhance ADRs, including distal attachment devices, dye-based/virtual chromoendoscopy, water-assisted methods, and balloon-assisted technologies. The review included a total of 94 RCTs involving 61,172 patients (average age 59.1 ± 5.2 years, with 45.8% being female) and analyzed 20 different study interventions. The network meta-analysis revealed that AI significantly improved ADRs compared to the following methods: dye-based chromoendoscopy (RR: 1.22, CI: 1.06–1.40), endocap (RR: 1.32, CI: 1.17–1.50), autofluorescence imaging (RR: 1.33, CI: 1.06–1.66), endoring (RR: 1.30, CI: 1.10–1.52), endocuff (RR: 1.19, CI: 1.04–1.35), endocuff vision (RR: 1.26, CI: 1.13–1.41), full-spectrum endoscopy (RR: 1.40, CI: 1.19–1.65), flexible spectral imaging color enhancement (RR: 1.26, CI: 1.09–1.46), high-definition (RR: 1.41, CI: 1.28–1.54), linked color imaging (RR: 1.21, CI: 1.08–1.36), narrow band imaging (RR: 1.33, CI: 1.18–1.48), water immersion (RR: 1.47, CI: 1.19–1.82), and water exchange (RR: 1.22, CI: 1.06–1.42). 29
AI was found to significantly enhance APC compared with high-definition (HD) colonoscopy, although the improvement was not as marked as it was for ADRs compared with other endoscopic methods. This may be due to the inclusion of patients undergoing procedures for reasons other than screening or surveillance in studies comparing AI with HD colonoscopy. Previous retrospective studies have shown that both ADR and APC are significantly lower in patients undergoing diagnostic colonoscopy compared with those undergoing screening or surveillance colonoscopy. Therefore, ADR and APC calculations should be limited to screening or surveillance colonoscopy. Future research focusing exclusively on screening or surveillance could better showcase AI’s potential in increasing APC compared with other techniques. In addition, AI did not improve the serrated polyp detection rate (SPDR) compared with other methods; however, only three AI studies reported SPDR. 29
In conclusion, while RCTs demonstrate the potential of AI to improve colonoscopy quality by increasing ADR and APC and reducing AMR, real-world studies show mixed results. The possible advantages of AI in improving colonoscopy should not be overlooked; instead, further investigation is necessary to address the variability in outcomes and optimize the integration of AI technologies in clinical practice.
CADx
Optical diagnosis (OD) predicts the histology of a polyp from its endoscopic features. The American Society for Gastrointestinal Endoscopy in 2011 recommended two OD-based strategies to decrease risks and costs associated with the endoscopic removal of small polyps. 30 The first strategy, “leave-in-situ,” is suitable for small hyperplastic polyps in the rectosigmoid region and is acceptable when the negative predictive value (NPV) for detecting diminutive rectosigmoid adenomas exceeds 90%. 30 The second strategy, “resect and discard,” uses OD to determine the histological characteristics of diminutive polyps, followed by their resection and disposal without histopathological evaluation. For the “resect and discard” approach to be considered valid, the concordance between the surveillance intervals determined by OD and those established through pathological examination must be at least 90%. 31
The effectiveness of OD is often constrained, particularly when it is conducted by endoscopists without specialized training, which can impede its widespread adoption.31–33 CADx has gained attention as a promising approach to improve accuracy and uniformity in OD.34,35
When applied to unaltered colon polyp video sequences, AI models demonstrate accurate differentiation between adenomas and hyperplastic polyps in diminutive colorectal polyps. The model achieves an overall accuracy of 94%, sensitivity of 98%, and specificity of 83%. In addition, the positive predictive value (PPV) is 90%, and NPV is 97%. 36
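The diagnostic measures quoted above all derive from a 2×2 table of predicted versus histology-confirmed labels. As a minimal sketch, the Python snippet below computes them from raw counts and checks an NPV against the 90% leave-in-situ threshold described earlier. The counts are hypothetical and are not the data from the cited study; the function names are illustrative only.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return standard diagnostic accuracy measures from 2x2 table counts.

    tp: adenomas called adenoma           fn: adenomas called hyperplastic
    tn: hyperplastic called hyperplastic  fp: hyperplastic called adenoma
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of adenomas correctly identified
        "specificity": tn / (tn + fp),  # share of hyperplastic polyps correctly identified
        "ppv": tp / (tp + fp),          # probability an "adenoma" call is correct
        "npv": tn / (tn + fn),          # probability a "hyperplastic" call is correct
    }


def meets_npv_threshold(npv, threshold=0.90):
    """Check an NPV against the 90% threshold cited for the leave-in-situ strategy."""
    return npv >= threshold


# Hypothetical counts (assumption): 100 adenomas and 100 hyperplastic polyps.
metrics = diagnostic_metrics(tp=98, fp=17, tn=83, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
print("Meets leave-in-situ NPV threshold:", meets_npv_threshold(metrics["npv"]))
```

Note that PPV and NPV depend on the prevalence of adenomas in the sample, so values from one cohort do not transfer directly to another with a different case mix.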
Multiple studies have been conducted in clinical settings to assess the effectiveness of CADx in the optical biopsy of colorectal polyps. In a prospective, single-center study in Italy, Hassan et al. demonstrated the practicality of CADx, with successful diagnoses in 291 out of 295 cases (98.6%) and an NPV of 97.6% for rectosigmoid lesions measuring 5 mm or less. The study indicated that the real-time performance of CADx surpassed the 90% NPV threshold, supporting the leave-in-situ strategy and suggesting that polypectomies and associated costs could be reduced by 44.4%. For the resect-and-discard approach, the CADx system accurately predicted postpolypectomy surveillance intervals in 95.9% of cases (95% CI: 89.8–98.4%) under American guidelines and in 95.6% (95% CI: 90.8–98.0%) under European guidelines. 37
In a prospective cohort study by Rondonotti et al. conducted at four endoscopy centers in Italy, AI-assisted OD achieved an NPV of 91.0% for small rectosigmoid polyps, which is in line with the Preservation and Incorporation of Valuable endoscopic Innovations (PIVI)-1 criteria for a leave-in-situ approach. For the resect-and-discard approach, the AI system fulfilled the PIVI-2 requirements. The study found that nonexperts had a lower accuracy rate (82.3%; 95% CI: 76.4–87.3%) in AI-assisted OD compared to experts (91.9%; 95% CI: 88.5–94.5%). Nonetheless, the gap between nonexperts and experts decreased over time as nonexperts gained more experience using the system. Experts were defined as endoscopists who had completed specialized training, routinely performed optical diagnoses, and were subject to regular audits and assessments. 38
A multicenter study by Barua et al. compared a CADx system with standard visual inspection of small rectosigmoid polyps. The study included 892 polyps, 359 of which were neoplastic. Sensitivity for the diagnosis of neoplastic polyps was 90.4% (95% CI: 86.8–93.1) with CADx versus 88.4% (95% CI: 84.3–91.5) with standard visual inspection (p = 0.33). Specificity was 85.9% (95% CI: 82.3–88.8) with CADx versus 83.1% (95% CI: 79.2–86.4) with standard visual inspection. The proportion of polyp assessments made with high confidence was 92.6% (95% CI: 90.6–94.3) with CADx versus 74.2% (95% CI: 70.9–77.3) with standard visual inspection. There was an observed increment of 1.3% for NPV and 3.1% for PPV with CADx.
Although AI did not notably increase sensitivity, it did improve specificity and the clinicians’ confidence in their diagnoses. CADx also met the PIVI criteria with an NPV over 90%, supporting the “leave-in-situ” strategy. 39
In a prospective study by Li et al. at four large tertiary referral centers, the NPV for diminutive rectosigmoid polyps was 80.6% for CADx and 83.3% for endoscopists, with both failing to meet the PIVI-1 threshold of 90%. For the “resect-and-discard” strategy, neither CADx nor endoscopists met the 90% agreement threshold according to U.S. guidelines, but both surpassed it as per European guidelines, with CADx at 97.5% and endoscopists at 97.1%. Endoscopists outperformed CADx in overall diagnostic sensitivity (70.3% [95% CI: 65.7–74.7] vs. 61.8% [95% CI: 56.9–66.5], p < 0.001) and accuracy (75.2% [95% CI: 71.7–78.4] vs. 71.6% [95% CI: 68.0–75.0], p = 0.023) for neoplastic polyps, although CADx exhibited greater specificity. Concordance between CADx and endoscopists led to improved diagnostic accuracy, indicating that CADx could enhance endoscopist performance by acting as a secondary reviewer to assist in accurate polyp diagnosis. 40
Despite these positive outcomes, the efficacy of AI in the resect-and-discard and leave-in-situ strategies remains inconclusive. The role of CADx as an adjunctive reviewer instead of a stand-alone diagnostic instrument seems to yield greater advantages. However, this has been challenged in a recent study by Djinbachian et al. The authors conducted a randomized noninferiority trial comparing the accuracy of an autonomous AI system versus a human-assisted AI (AI-H). Patients were allocated into two groups: the autonomous AI group, which relied solely on the CADx system for OD of all colorectal polyps, and the AI-H group, where endoscopists made the diagnoses after reviewing real-time CADx assessments. Using pathology as the gold standard, the accuracy of OD was 77.2% (95% CI: 69.7–84.7) in the autonomous AI group and 72.1% (95% CI: 65.5–78.6) in the AI-H group (p = 0.86). The autonomous AI demonstrated significantly better alignment with pathology-based surveillance intervals than the AI-H group, with 91.5% (95% CI: 86.9–96.1) versus 82.1% (95% CI: 76.5–87) (p = 0.016). The study showed that OD using autonomous AI is not inferior in accuracy to the AI-H approach.
While both autonomous AI and AI-H had relatively low accuracy rates for OD, the autonomous AI was more consistent with pathology-based surveillance intervals. Only the autonomous AI system achieved the PIVI-1 threshold of 90% agreement with surveillance intervals. The observed high rate of diagnostic disagreement, potentially leading to suboptimal outcomes, could be due to endoscopists’ skepticism toward AI or their perceived supremacy over AI evaluation, which might lead them to discount AI’s suggestions. 41
CADx systems can also enhance decision-making in the management of larger polyps, potentially improving outcomes in both resection strategies and referral processes. Nemoto et al. developed a CADx system that can identify endoscopically treatable early-stage CRCs using standard nonmagnified white-light endoscopy images from over 1,000 early-stage CRCs. The CADx system demonstrated a high specificity of 94.4% (95% CI: 91.3–96.6) and an accuracy of 87.3% (95% CI: 83.7–90.4). Its diagnostic performance matched that of experts and was better than that of trainees, making it a promising tool for trainees and potentially an alternative to expert physicians. 42
Yao et al. developed the first CADx system to accurately predict deep invasion of large sessile colorectal polyps, achieving an accuracy of 90.4%, which was comparable to that of experts and better than that of both senior and junior endoscopists. Notably, junior endoscopists’ diagnostic accuracy improved significantly, from 75.4% to 85.3%, when assisted by CADx (p = 0.002). 43
Other Applications in CRC
AI has the potential to significantly improve both quality and quantity measures in CRC screening processes. AI can provide more accurate polyp measurement, aid in identifying the anatomical landmarks that confirm successful cecal intubation, automate the timing of the withdrawal phase, and support a more thorough evaluation of bowel preparation and mucosal surface area, all of which contribute to improved efficiency and workflow in CRC screening. 44
For example, the ENDOANGEL system can assess the cleanliness of the colon in real time. It has been trained on both still images and videos from colonoscopies, achieving an accuracy rate for evaluating bowel preparation that ranges from 89.4% to 93.3%. This level of accuracy exceeds that of human endoscopists. 45
Gong et al. developed a CADe system designed to monitor the speed of the colonoscope withdrawal in real time. It automatically records the withdrawal time and provides alerts to the endoscopist if the colonoscope slips. 46
Accurate measurement of polyp size is crucial for choosing appropriate polypectomy techniques and setting surveillance intervals. The virtual scale endoscope (VSE; SCALE EYE, by Fuji) is a new endoscopic imaging technology that allows a virtual measurement scale to be superimposed during endoscopy. A study by Djinbachian et al. compared the accuracy of VSE against the existing methods using endoscopic rulers (ER) and biopsy forceps (BF). In a preclinical trial, six endoscopists conducted 60 randomized measurements each using VSE, ER, and BF, totaling 360 measurements. The study found that VSE was significantly more accurate at 82.7% (95% CI: 80.8–84.8) than both BF at 78.9% (95% CI: 76.2–81.5) (p = 0.02) and ER at 78.4% (95% CI: 76.2–81.5) (p = 0.006). 47
CAD has the potential to improve diagnostic capability of computed tomography colonography (CTC). A study using the Haralick texture analysis with CTC demonstrated enhanced classification (area under the curve [AUC]: 0.74–0.85) in differentiating neoplastic from non-neoplastic lesions. 48 Another study integrated CAD with CTC, increasing tumor detection sensitivity with decreasing sphericity and suggesting CAD’s usefulness in detecting morphologically flat nonpolypoidal cancer. 49
Colon capsule endoscopy (CCE) is a minimally invasive CRC screening method, particularly useful for patients with incomplete colonoscopy or contraindications for sedation. However, manual reading of CCE images for polyp detection poses a risk of error. This prompted a study by Blanes-Vidal et al., who developed a deep learning-based algorithm that achieved 96.4% accuracy, 97.1% sensitivity, and 93.33% specificity for automated polyp detection, with further matching of polyps between CCE and colonoscopy based on size, location, and morphology. 50
AI can also be applied in personalized medicine, particularly in assessing patients’ risk of developing CRC. A few studies have indicated its utility in this area. For instance, an ML algorithm known as ColonFlag was evaluated for detecting early-stage CRC by analyzing data from the Kaiser Permanente Northwest Region tumor registry. This analysis incorporated variables such as gender, age, and complete blood count (CBC) from a sample of the insured population in the United States. The study involved 17,095 subjects (900 with CRC and 16,195 controls without CRC) and extracted patient demographics and CBC results. ColonFlag demonstrated greater sensitivity in identifying CRC in both older and younger age brackets compared to using hemoglobin levels alone, both immediately after and six months following the CBC test. It was notably more sensitive in detecting CRC within the first 180 days (39.9%) than in the 181–360-day period (27.4%) before a CRC diagnosis was confirmed. Importantly, ColonFlag was more sensitive in pinpointing CRC cases in the broader 40–89 age range than in the targeted 50–75-year-old screening demographic. 51
Researchers used electronic medical record data from Israel and the United Kingdom to develop an AI model named MeScore for predicting high CRC risk based on a person’s CBC, age, and sex. In Israel, they split the data into two parts as follows: one to create the model and another to test it. The U.K. data were used as a separate test to ensure that the model worked outside of Israel. MeScore performed well; it exhibited an area under the curve (AUC) of 0.82 ± 0.01 and 0.81 for Israeli and U.K. validation sets, respectively. It was more accurate than just using age, with 88% and 94% accuracy in the two countries at a set level of sensitivity. When MeScore was combined with guaiac fecal occult blood test, detection of CRC in the Israeli group improved by 115% (from 170 to 365 in 63,847 individuals). 52
In a separate study, Kinar et al. analyzed MeScore’s performance in predicting CRC risk in average-risk individuals, demonstrating odds ratios of 10.9 and 21.8 for specific MeScore cutoffs. The study highlighted the potential of ML algorithms like MeScore in identifying high-risk patients for screening colonoscopy. 53
Other Potential Benefits
The use of AI in colonoscopy, specifically with CADe systems, has the promise to improve the precision of detecting polyps. Integrating AI technology into standard colonoscopy practices has the capacity to improve ADR, leading to better identification and treatment of precancerous lesions before they turn cancerous. Thus, AI is expected to enhance the quality of life for patients over their lifetime while reducing health care costs due to fewer needs for further colon cancer treatments and follow-ups. Despite the initial expenses associated with acquiring and maintaining AI technology, there is a potential reduction in overall costs per case with its adoption. 54
AI can ensure consistent quality in OD, thereby expediting the adoption of leave-in-situ and resect-and-discard strategies. This, in turn, will lead to a substantial reduction in the cost of colonoscopy due to reduced expenditures on unnecessary polypectomies and pathological examinations.55–57
A study conducted by Mori et al. across Japan, Norway, England, and the United States explored the cost implications of using CADx alongside a diagnose-and-leave strategy for diminutive colorectal polyps. The research revealed potential cost savings ranging from US$34 to US$125 per colonoscopy, depending on the country. These savings could have significant financial implications, with potential total annual reimbursement savings of up to US$149.2 million, US$12.4 million, US$1.1 million, and US$85.2 million for colonoscopies covered by public health insurances in Japan, England, Norway, and the United States, respectively. 58
Areia et al. demonstrated that using AI detection tools in screening colonoscopy is a cost-saving strategy that significantly reduces CRC incidence and mortality. Researchers conducted a Markov model microsimulation on a hypothetical cohort of 100,000 individuals aged 50–100 years in the United States. The primary analysis compared screening colonoscopy with and without AI every 10 years from ages 50 to 80, assuming a 60% uptake rate. Secondary analyses modeled a once-in-a-lifetime colonoscopy at age 65. Compared to no screening, AI-assisted screening colonoscopy reduced CRC incidence by 48.9% and mortality by 52.3%, yielding incremental gains of 4.8% and 3.6%, respectively, over standard colonoscopy. AI detection tools decreased the discounted costs per screened individual from $3,400 to $3,343, saving $57 per person. Secondary analyses of once-in-a-lifetime colonoscopy showed similar results. On a population level, using AI detection in screening colonoscopy annually prevents an additional 7,194 CRC cases and 2,089 related deaths, resulting in an annual saving of $290 million. 59
Hassan et al. evaluated the cost-utility of GI Genius in a high-risk Italian population. A 1-year cycle cohort Markov model was created to simulate disease progression in a cohort of Italian individuals, aged 50 years, who tested positive on the fecal immunochemical test and underwent colonoscopy with or without the AI system. Detection rates for adenoma or CRC were specific to each technique, and costs were estimated from the perspective of the Italian National Health Service. The results showed that the GI Genius system was dominant compared to standard colonoscopy, preventing 155 CRC cases (a reduction of 2.7%) and 77 CRC-related deaths (a reduction of 2.8%) and improving quality of life by 0.027 quality-adjusted life-years. Although there was an increase in screening costs (+€10.50) and adenoma care costs (+€3.53), these were offset by savings in CRC care costs (–€28.37), resulting in a total saving of €14.34 per patient. Probabilistic sensitivity analysis confirmed the cost-effectiveness of the AI system with almost an 80% probability. 60
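The per-patient cost components reported above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' model: the variable and function names are illustrative assumptions, and only the euro amounts and the quality-adjusted life-year gain are taken from the text. It also applies the usual cost-utility definition of dominance, under which a strategy is dominant when it both costs less and yields more health benefit than its comparator.

```python
# Per-patient cost differences (EUR), AI-assisted vs. standard colonoscopy,
# as reported in the text. Positive = extra cost, negative = saving.
cost_deltas_eur = {
    "screening": 10.50,
    "adenoma_care": 3.53,
    "crc_care": -28.37,
}

# Quality-adjusted life-year gain reported in the text.
qaly_gain = 0.027


def net_cost_change(deltas):
    """Sum the per-category cost differences; a negative result is a net saving."""
    return round(sum(deltas.values()), 2)


def is_dominant(cost_change, qaly_change):
    """A strategy is dominant if it is cheaper and more effective than the comparator."""
    return cost_change < 0 and qaly_change > 0


net = net_cost_change(cost_deltas_eur)
print(f"Net cost change per patient: EUR {net:+.2f}")
print(f"Dominant strategy: {is_dominant(net, qaly_gain)}")
```

The additional screening and adenoma care costs sum to €14.03, which is more than offset by the €28.37 saved on CRC care, giving the reported net saving of €14.34 per patient.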
AI has the potential to democratize health care, ensuring that high-quality services are accessible to all. By providing doctors with advanced AI tools, even those in regions without specialized services can deliver care that stands up to the expertise found in better-resourced settings. With ongoing issues like limited resources and the scarcity of highly trained doctors, AI could be the key to enabling physicians in less advantaged areas to provide care that matches the quality of that given by experts. In the research by Waljee et al., an ML algorithm was created using factors such as age, sex, race, body mass index, complete blood count, and complete metabolic panel blood tests to predict luminal gastrointestinal cancers. This algorithm showed commendable accuracy in sub-Saharan Africa, achieving an AUC of 0.75. 61
Moreover, integrating computer vision algorithms can support and enhance clinical expertise in low-resource settings where there is a shortage of adequately trained professionals. For example, sub-Saharan Africa faces a shortage of trained pathologists, especially for gastrointestinal histopathology, which is crucial for cancer diagnosis. A specialized computer vision algorithm was developed to evaluate histopathology, aiding in the diagnosis and classification of CRC. Through cross-validation, this technology demonstrated notable accuracy with an AUC of 0.76. 61
Challenges and Limitations
CADe systems have faced some criticism, particularly concerning their potential to extend the duration of procedures and the frequency of distracting false positive identifications of polyps, which are often caused by misinterpretations of normal colon features like folds, fecal matter, debris, and bubbles.
The issue of false positives is a concern, especially regarding the fatigue they may cause to proceduralists. A survey in 2023 revealed that over 80% of gastroenterologists were concerned about the high number of false positive alerts generated by commercially available CADe systems. 62 Interestingly, the actual impact of false positives on withdrawal time seems minimal, with 91% of false positive events lasting under half a second. 63 In a post hoc analysis of an RCT, Hassan et al. found a high rate of overall false positives—27.3 per colonoscopy—but noted that only 5.7% of these necessitated extra examination time, averaging 4.8 s per occurrence, leading to a trivial 1% increase in total withdrawal time. 64
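The figures from the post hoc analysis can be combined to estimate the extra examination time per procedure. The sketch below is a rough check under stated assumptions rather than a reproduction of the original analysis: the names are illustrative, and the implied withdrawal time is back-calculated from the rounded 1% figure, so it is only approximate and is not a value reported in the source.

```python
# Figures as reported in the post hoc analysis cited in the text.
false_positives_per_colonoscopy = 27.3
fraction_needing_extra_time = 0.057   # 5.7% of false positives
seconds_per_examined_event = 4.8


def extra_withdrawal_seconds(fp_count, fraction, seconds_each):
    """Expected extra examination time per colonoscopy caused by false positives."""
    return fp_count * fraction * seconds_each


extra_seconds = extra_withdrawal_seconds(
    false_positives_per_colonoscopy,
    fraction_needing_extra_time,
    seconds_per_examined_event,
)

# Back-calculation (an assumption, not a reported value): if the extra time
# corresponds to roughly a 1% increase, the withdrawal time is about 100x larger.
reported_relative_increase = 0.01
implied_withdrawal_minutes = extra_seconds / reported_relative_increase / 60

print(f"Extra time per colonoscopy: {extra_seconds:.1f} s")
print(f"Implied withdrawal time: about {implied_withdrawal_minutes:.0f} min")
```

Roughly 1.6 false positives per colonoscopy require a closer look, adding about 7.5 s in total, which is consistent with the reported increase of about 1% in withdrawal time.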
Tang et al. investigated the potential of diminishing false positives by utilizing water exchange colonoscopy. This method enhanced mucosal clarity and reduced the common distractions caused by bubbles and fecal debris. It notably improved the detection of additional polyps with CADe compared to without it (30.1% vs. 12.3%, p = 0.001). 65
Given the rapid evolution of AI technologies and the associated developments, it is likely that their integration into everyday clinical practice is imminent. However, the real-world application of AI is complex and raises many ethical and regulatory issues.
The implementation of AI algorithms, particularly DL-based ones, necessitates large datasets for initial training, ongoing validation, and adjustment. To ensure the models’ applicability, these datasets ought to be amassed from diverse geographical locations. This necessitates a clear framework for data governance in AI, focusing on ownership and management. 66 Patients rightfully express concerns about access to their data, its use for ancillary purposes, the extent of data anonymization, and the risk of commercialization. Conventional models of informed consent might not suffice for AI development as they could limit data access and introduce uncertainties about its future use, possibly skewing algorithm training. 67
In addition, there is the risk of data being manipulated through cyberattacks, which poses a threat to patient safety and could lead to fraudulent reimbursement claims. 68
To address these privacy concerns, there is a movement toward the use of secure, cloud-based platforms for the management of electronic health records. Despite this, significant hurdles remain in terms of feasibility and the need for coordinated efforts to standardize data across various health care institutions. 67
Training data for algorithms may possess intrinsic biases if it fails to reflect the population’s diversity accurately. This concern is particularly acute in health care, where marginalized groups or those with rare diseases may be underrepresented in training data. 69 It is essential to maintain transparency in the datasets and clarity in the algorithms’ decision-making processes to address and reduce these risks effectively. A key goal is to avoid the “black box” problem of AI models that operate without clear explanations of their decision-making processes. The field of “Explainable AI” has developed to promote transparency in clinical decision-support systems.70,71
Legal questions also arise as AI begins to influence clinical decisions, particularly regarding liability for medical errors. Identifying responsibility is critical when an AI-derived conclusion is incorrect. In addition, there is the phenomenon of automation bias, where clinicians may favor AI recommendations even if they are incorrect, potentially amplifying errors. 72 The increased reliance on AI decision support can also lead to the risk of de-skilling physicians. 73
Another obstacle is the issue of reimbursement. As of now, there is no established Current Procedural Terminology code for the use of AI in endoscopy, making it more challenging to implement, particularly in private practices. Therefore, it is crucial to carry out detailed and high-quality cost-effectiveness evaluations of these AI devices across health care systems with various reimbursement policies before they are adopted into standard clinical practice. Addressing this issue might involve developing multifunctional AI systems that can perform several tasks using a single device. 74
Future Direction
The integration of AI into CRC screening represents a transformative shift that demands extensive investment and collaboration among government bodies, academic institutions, and the private sector. The incorporation of AI is set to redefine the roles of clinicians and the nature of patient interactions. Although forecasting the full extent of AI’s impact is challenging, it is evident that health care professionals will need to adapt to these technologies. This adaptation includes a revision of medical education curricula to encompass foundational AI concepts and an awareness of its limitations, particularly in specialized environments such as endoscopy suites. It is also essential for clinicians to have the means to record their doubts or disagreements with AI-generated insights, which could involve integrating AI-generated recommendations into patient medical records as a reference.
AI in CRC is often regarded as a supportive tool that enhances polyp detection and characterization for less experienced gastroenterologists. However, the aspiration should extend beyond mere supplementation to achieve widespread implementation in health care settings. The objective is to prevent the emergence of disparities in the quality of care provided by AI-enhanced facilities versus those without such technology, which could lead to a bifurcated health care system. To circumvent such inequality, it is crucial to develop and implement strategies that encourage the adoption of proven AI applications across the health care spectrum.
Footnotes
Authors’ Contributions
M.E.Z. and S.A.G. contributed equally to this work. Conceptualization: M.E.Z. and S.A.G.; Software: M.E.Z. and S.A.G.; Validation: M.E.Z. and S.A.G.; Investigation: M.E.Z. and S.A.G.; Resources: M.E.Z. and S.A.G.; Data Curation: M.E.Z. and S.A.G.; Writing – Original Draft Preparation: M.E.Z. and S.A.G.; Writing – Review & Editing: M.E.Z. and S.A.G.; Visualization: M.E.Z. and S.A.G.; Supervision: M.E.Z. and S.A.G.; Project Administration: Nonapplicable; Formal Analysis: Nonapplicable; Methodology: Nonapplicable; Funding Acquisition: Nonapplicable. Both authors have read and agreed to the published version of the article.
Author Disclosure Statement
M.E.Z. has nothing to disclose. S.A.G. is a consultant for Olympus, Medtronic, Iterative Scopes, Microtech, Cook, and Fujifilm.
Funding Statement
M.E.Z. and S.A.G. have no funding information to declare.
