Artificial intelligence for triaging of breast cancer screening mammograms and workload reduction: A meta-analysis of a deep learning software

Abstract

Objective

Deep learning (DL) has shown promising results for improving mammographic breast cancer diagnosis. However, the impact of artificial intelligence (AI) on the breast cancer screening process has not yet been fully elucidated in terms of potential workload reduction. We aim to assess if AI-based triaging of breast cancer screening mammograms could reduce the radiologist's workload with non-inferior sensitivity.

Methods

PubMed, EMBASE, Cochrane Central, and Web of Science databases were systematically searched for studies that evaluated AI algorithms for computer-aided triage of breast cancer screening mammograms. We extracted data from homogeneous studies and performed a proportion meta-analysis with a random-effects model to examine the radiologist's workload reduction (the proportion of low-risk mammograms that could theoretically be ruled out from human assessment) and the software's sensitivity for breast cancer detection.

Results

Thirteen studies were selected for full review, and three studies that used the same commercially available DL algorithm were included in the meta-analysis. In the 156,852 examinations included, the threshold of 7 was identified as optimal. With these parameters, radiologist workload decreased by 68.3% (95% CI 0.655–0.711, I² = 98.76%, p < 0.001), while achieving a sensitivity of 93.1% (95% CI 0.882–0.979, I² = 83.86%, p = 0.002) and a specificity of 68.7% (95% CI 0.684–0.723, I² = 97.5%, p < 0.01).

Conclusions

The deployment of DL computer-aided triage of breast cancer screening mammograms reduces the radiology workload while maintaining high sensitivity. Although the implementation of AI remains complex and heterogeneous, it is a promising tool to optimize healthcare resources.

Keywords

Artificial intelligence, breast cancer, cancer screening, deep learning, meta-analysis

Introduction

Breast cancer is the leading cause of cancer death among women worldwide.¹ To identify the disease at an early stage and reduce mortality, breast cancer screening programs have been established in many countries. Such programs generate a substantial volume of mammograms and a correspondingly heavy interpretation workload. In 2022, about 39 million annual mammography procedures were reported in the USA, placing considerable pressure on imaging professionals.²

Workforce shortage in healthcare is a global issue, with added impact in low-resource settings. In addition, radiologists and technicians specialized in breast cancer screening are becoming increasingly scarce. For instance, a study revealed that Mexico had fewer than 300 breast radiologists to interpret mammograms for the 14 million eligible women.³ Even in areas like Europe,⁴ where double reading of mammograms is standard, and the USA,⁵ personnel shortages have been predicted. Alternative approaches, such as computer-aided screening, are being pursued to support breast cancer screening programs. However, no studies have proven that conventional computer-aided detection methods directly increase screening performance and cost effectiveness or impact workload. The lack of evidence has also made it impractical for computer systems to be used for stand-alone reading in mammography screening.⁶

The development of techniques such as deep learning (DL) convolutional neural networks is revolutionizing the field of artificial intelligence (AI), and automation of some cognitively intense activities is becoming a reality. Well-known examples include self-driving vehicles and advanced speech recognition tools. In medical imaging, DL has shown promising results in several applications including image segmentation, cancer detection, characterization, classification, and monitoring.⁷ There are currently more than 10 algorithms for mammographic interpretation authorized by the US Food and Drug Administration, particularly for use as clinical decision support systems.⁸

Recent research has revealed that DL-based systems have the potential to identify breast cancer on mammography with diagnostic rates comparable to those of radiologists alone.⁹ Compared to breast cancer detection rates, workload improvement has received considerably less attention, and how these systems should be used in real-time workflows remains unclear. In this context, the purpose of this systematic review and meta-analysis is to investigate whether AI-based triaging of breast cancer screening mammograms could effectively reduce radiologist workload without compromising sensitivity.

Methods

Protocol and search strategy

The systematic review and meta-analysis were performed in line with recommendations from the Cochrane Collaboration and the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) 2020 statement guidelines.¹⁰ The meta-analysis protocol was registered on PROSPERO on 02 July 2022 (PROSPERO ID: CRD42022341374).

We systematically searched PubMed, EMBASE, Cochrane Central Register of Controlled Trials, and Web of Science databases for studies that met the eligibility criteria published from inception to June 2022. The search strategy consisted of (“artificial intelligence” OR “AI” OR “deep learning”) AND (“breast cancer” OR “breast neoplasm” OR “breast microcalcification” OR “breast”) AND (“screening” OR “triage” OR “CADt” OR “computer-aided triage” OR “computer aided triage”) AND (“mammography” OR “mammograms” OR “mammogram” OR “mammographic” OR “DM”) and was conducted by two different authors (DX and CAC), who had no disagreements. The last search of all databases was conducted on 23 March 2023. No language restrictions were applied to the search.
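As an illustration only, the Boolean structure of the search strategy above (terms joined by OR within each concept group, groups joined by AND) can be assembled programmatically. The sketch below is a minimal example; the names `TERM_GROUPS` and `build_query` are hypothetical, straight quotes are used in place of the typographic quotes printed above, and database-specific field tags or syntax are not handled.

```python
# Term groups copied from the search strategy described in the text.
TERM_GROUPS = [
    ["artificial intelligence", "AI", "deep learning"],
    ["breast cancer", "breast neoplasm", "breast microcalcification", "breast"],
    ["screening", "triage", "CADt", "computer-aided triage", "computer aided triage"],
    ["mammography", "mammograms", "mammogram", "mammographic", "DM"],
]


def build_query(groups):
    """OR the quoted terms within each group, then AND the groups together."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(TERM_GROUPS)
print(query)
```

Keeping the terms in a list of groups makes it straightforward to log exactly which string was submitted to each database.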

Data extraction

Two authors (DX and CAC) independently extracted baseline characteristics and outcome data based on predefined search criteria. Information collected from studies included first author, year of publication, study design, country of patient recruitment, patient enrollment, technical specifications, reference standard, DL algorithm, and the number of true positives, true negatives, false positives, and false negatives. If more than one threshold was investigated, we extracted data from the approach with the most available information. In case of disagreements, a third author was consulted (IAM).

Inclusion and exclusion criteria

Inclusion in this meta-analysis was limited to studies that met all the following eligibility criteria: (1) retrospective, prospective, randomized, or non-randomized studies; (2) evaluating AI-based triaging of breast cancer screening mammograms; and (3) reporting any of the outcomes of interest. We excluded papers with overlapping populations, understood as derived from overlapping study locations and recruitment periods. The outcomes of interest were (1) workload reduction, (2) sensitivity, and (3) specificity.

Study quality assessment

The revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool¹¹ was used to assess the quality and potential bias of studies included in the meta-analysis by two independent reviewers (CAC and IAM). Conflicts were resolved with discussion and involvement of the third author (DX).

Statistical analysis

A proportion meta-analysis using a DerSimonian-Laird random-effects model with 95% confidence intervals (95% CI) was carried out to evaluate the radiologist's workload reduction. The meta-analysis was performed using OpenMeta-Analyst, an open-source, cross-platform software for advanced meta-analysis.¹²

Meta-analysis was performed if a minimum of three papers used the same commercially available algorithm, with results from the same threshold available for data extraction, to estimate sensitivity and specificity. We contacted authors to obtain additional data, if necessary. Workload reduction was defined as the proportion of low-risk mammograms that could be theoretically ruled out from a human's assessment.
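For readers unfamiliar with random-effects pooling of proportions, the following Python sketch shows one common form of the DerSimonian-Laird estimator. It is not the OpenMeta-Analyst implementation. It assumes raw (untransformed) proportions with binomial variances, whereas the transformation actually used in the analysis is not stated here. The function name `dersimonian_laird` and the example counts are hypothetical and are not data from the included studies.

```python
import math


def dersimonian_laird(events, totals):
    """Pool proportions with a DerSimonian-Laird random-effects model.

    Assumption: raw proportions with variance p * (1 - p) / n.
    Proportions of exactly 0 or 1 would give zero variance and are not handled.
    """
    p = [e / n for e, n in zip(events, totals)]
    v = [pi * (1 - pi) / n for pi, n in zip(p, totals)]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1 / vi for vi in v]
    p_fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)

    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, p))
    df = len(p) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return {"pooled": pooled, "ci": ci, "tau2": tau2, "q": q}


# Hypothetical counts: exams triaged as low risk / all exams, per study.
result = dersimonian_laird([600, 7200, 1300], [1000, 10000, 2000])
print(result)
```

When the between-study variance is estimated as zero, the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled proportion.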

Pooled sensitivity and specificity for included studies with a 95% CI were obtained using a random-effects analysis and forest plots were constructed using MetaDTA.¹³ Summary receiver-operating characteristic (SROC) curves using the bivariate method were constructed to display the summary point and the area under the curve (AUC) was calculated. The diagnostic odds ratio (DOR) was computed with the 95% CI. The inconsistency index (I²) was calculated to assess heterogeneity between studies.
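The per-study quantities behind these summary measures can be sketched as follows. This is an illustration under stated assumptions: it computes sensitivity, specificity, and the DOR for a single 2×2 table, with a Wald interval on the log scale, and derives I² from Cochran's Q. It does not reproduce the bivariate model fitted in MetaDTA. The function names and the example counts are hypothetical.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, and DOR (with a 95% Wald CI) for one 2x2 table.

    Assumes all four cells are non-zero; no continuity correction is applied.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dor = (tp * tn) / (fp * fn)
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - 1.96 * se_log_dor)
    upper = math.exp(math.log(dor) + 1.96 * se_log_dor)
    return sensitivity, specificity, dor, (lower, upper)


def i_squared(q, k):
    """Inconsistency index I^2 (%) from Cochran's Q across k studies."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical counts, not taken from any included study.
sens, spec, dor, dor_ci = diagnostic_accuracy(tp=90, fp=300, fn=10, tn=700)
print(sens, spec, dor, dor_ci)
print(i_squared(q=10.0, k=3))
```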

Results

Study selection and description

The initial search yielded 1003 studies. After duplicate removal and exclusion through title and abstract reading, we reviewed 14 full articles and included 11 papers (Figure 1) in qualitative synthesis (Table 1),¹⁴,¹⁵⁻²⁶ and three papers in quantitative synthesis (Figures 2 to 4).

Figure 1.

PRISMA 2020 flow diagram of literature search and selection.

Figure 2.

Workload reduction of 68.3% using a Transpara® score of 7 for triaging of breast cancer screening mammograms.

Figure 3.

Sensitivity of 93.1% using a Transpara® score of 7 for triaging of breast cancer screening mammograms.

Figure 4.

Specificity of 68.7% using a Transpara® score of 7 for triaging of breast cancer screening mammograms.

Table 1.

Baseline characteristics of included studies.

Study	Larsen 2022¹⁴	Lauritzen 2022¹⁸	Raya-Povedano 2021¹⁷	Yi 2021¹⁵	Balta 2020¹⁶	Dembrower 2020¹⁹
Country	Norway	Denmark	Spain	USA	Germany	Sweden
Algorithm architecture	Unknown (commercial name: ScreenPoint Transpara 1.7.0)	Unknown (commercial name: ScreenPoint Transpara 1.7.0)	Unknown (commercial name: Transpara 1.6.0)	DeepCAT	Unknown (commercial name: Transpara 1.6.0)	ResNet-34 (commercial name: Lunit 5.5.0)
Mammography machine vendors	Siemens Mammomat Inspiration	Siemens Mammomat Inspiration	Hologic Selenia Dimensions	NA	Siemens Mammomat Inspiration and Hologic Selenia Dimensions	Hologic
Single center or multicenter training	Multicenter	Multicenter	Single center	Multicenter	Single center	Single center
Internal or external data	External	External	External	External	External	External
Triage goal	Triage of normal cases	Triage of normal cases	Triage of normal cases to single or double reading	Triage for all cases	Triage of normal cases to single reading	Triage of normal cases
Ground truth	HP and FU, >2 years	HP and FU, >2 years	HP and review by four readers	HP, no FU	HP, no FU	HP and FU, >2 years
Test threshold	9.13, 9.43, 10	5	7	0.1	7	Bottom 60% AI score
Sample size	122,969	114,421	15,987	1878	18,015	75,534
Age, years	Range: 50–69	59 ± 6^a	58 ± 6^a	NA	NA	53.6 (15.4)^b
Workload definition	Examinations selected not to be interpreted by radiologists	Percentage of mammograms read by the AI system only, corresponding to normal or suspicious mammograms	Screening workload was defined as the number of readings, and an estimate in hours was computed using the average reading time per examination originally reported in this cohort (21): 25 s for a DM examination	Any normal predictions would be excluded from radiologist review	Number of mammogram readings performed by reader 1 and reader 2	AI cancer detector assessed all screening examinations as a single reader without radiologists
Workload reduction	11,638 negative screening results out of 12,383 (threshold of 10)	62.6%	71.5% (double reading); 72.4% (single reading)	53%	32.6%	>60%
Performance	77.9% of the cancers had the highest AI score of 10 (86.8% screen detected and 44.9% of interval cancers)	AI sensitivity of 69.7% and specificity of 98.6%; human sensitivity of 70.8% and specificity of 98.1%	AI sensitivity: 69%; human sensitivity: 67.3%	DeepCAT recommended low priority for 315 images (53%), of which none contained a malignant mass	Cancer detection rate would have remained the same, recall rate would have decreased by 11.8% (from 5.35% to 4.79%), and screen reading workload would have decreased by 32.6%.	When including 60%, 70%, or 80% of women with the lowest AI scores in the “no radiologist” stream, the proportion of screen-detected cancers that would have been missed was 0, 0.3% (95% CI 0.0–4.3), or 2.6% (1.1–5.4), respectively

Study	McKinney 2020²⁰	Rodriguez-Ruiz 2020²¹	Kyono 2020²²	Yala 2019²³	Kyono 2018²⁴
Country	UK and USA	Sweden, UK, Netherlands, USA, Italy, Spain, and Austria	USA	USA	UK
Algorithm architecture	ResNet (V2 50 and V1 50), MobileNetV2 and RetinaNet	Unknown (commercial name: Transpara 1.4.0)	Inception-ResNetV2 and multitask learning (commercial name: AURA)	ResNet-18^c	Inception-ResNetV2 and multi-task learning (commercial name: MAMMO)
Mammography machine vendors	Hologic, GE and Siemens	Hologic, GE, Siemens and Philips	NA	Selenia Dimensions and Selenia; Hologic (Bedford, MA, USA)	NA
Single center or multicenter training	Multicenter	Multicenter	Multicenter	Single center	Multicenter
Internal or external data	Internal	External	External	Internal	External
Triage goal	Triage of normal cases	Triage of normal cases to single or double reading	Triage of normal cases	Triage of normal cases	Triage for all cases
Ground truth	HP and FU, >1 year	HP and review by three readers of 2D and DBT images	HP and review by three readers of 2D and DBT images	HP and FU, >1 year	HP and review by three readers of 2D and DBT images
Test threshold	Negative predictive value >99.0%	10	Negative predictive value >99.0%	Minimum probability score of a radiologist true-positive assessment on the validation	Least patients seen by radiologist without adversely affecting radiologist's FPR or FNR
Sample size	25,856	2892	2000	264,657	8162
Age, years	NA	NA	Range, 40–73	57.8 ± 10.9^a	Range, 40–73
Workload definition	Second reader assessed mammograms (when the decision of the AI system agreed with that of the first reader)	Number of necessary radiologists’ assessments	What percentage of mammograms would need to be read by a radiologist	Triage cancer-free mammograms from the workflow	Overall number of mammograms a radiologist would need to read
Workload reduction	87.98%	44%	34%	19.3%	42.8%
Performance	AI sensitivity of 66.6% and specificity of 96.26%; human sensitivity of 67.39% and specificity of 96.24%	AI sensitivity of 81.4% and specificity of 75.2%; human sensitivity of 81.5% and specificity of 69.9%	At a negative predictive value of 0.99, the proposed model was able to identify 34% (95%CI: 25–43%) and 91% (95%CI: 88–94%) of the negative mammograms for test sets with a cancer prevalence of 15% and 1%	AI sensitivity of 90.1% and specificity of 94.2%; human sensitivity of 90.6% and specificity of 93.5%	AI sensitivity of 76.92% and specificity of 95.14%; human sensitivity of 76.92% and specificity of 95.02%

DBT: digital breast tomosynthesis; FU: follow-up; HP: histopathologic findings; FNR: false-negative rate; FPR: false-positive rate.

^aNumbers are means ± standard deviations.

^bNumbers are medians, with interquartile ranges in parentheses.

^cCode available at https://github.com/yala/OncoNet_public.

All included studies were retrospective and conducted in the USA or Europe. Study designs varied broadly, especially in the chosen test threshold. A total of six algorithms were used; the code for three of the six is open-source, and one¹⁵ was still being developed and tested. Five of the 11 papers used the Transpara® system, software that highlights suspicious findings and rates each with a score between 1 and 100. A proprietary conversion table generates an examination score from 1 to 10 based on the most suspicious finding in the examination, with higher scores suggesting a higher likelihood that a visible cancer is present on the mammogram.

Algorithm performance was compared to reader performance for all included studies. All studies included histopathologic examination as ground truth.

Accuracy of AI for breast cancer screening and workload reduction

Three studies¹⁴,¹⁶,¹⁷ that used the same commercially available DL software (Transpara®) provided sufficient data to be included in the meta-analysis to assess workload reduction, sensitivity, and specificity. To estimate sensitivity and specificity, a threshold of 7 was used (approximately 70% according to the device specifications); that is, examinations with a score of 7 or lower were considered likely normal and those with a score of 8 or higher were considered positive. The cut-off was chosen based on previous research indicating that replacing double reading with single reading for these very likely normal cases would not reduce screening sensitivity by more than 5%.¹⁴ A total of 156,852 examinations were evaluated.
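The thresholding rule described above can be expressed as a short sketch. The code below is illustrative only: it assumes examination scores are integers from 1 to 10, the function names `triage` and `workload_reduction` are hypothetical, and the example scores are made up rather than drawn from any included study.

```python
THRESHOLD = 7  # exam scores at or below this value are treated as likely normal


def triage(scores, threshold=THRESHOLD):
    """Split exam scores into a likely-normal stream and a flagged stream."""
    likely_normal = [s for s in scores if s <= threshold]
    flagged = [s for s in scores if s > threshold]
    return likely_normal, flagged


def workload_reduction(scores, threshold=THRESHOLD):
    """Proportion of exams that could theoretically be ruled out from human reading."""
    likely_normal, _ = triage(scores, threshold)
    return len(likely_normal) / len(scores)


# Made-up examination scores for ten screening exams.
example_scores = [1, 3, 7, 8, 10, 2, 5, 9, 4, 6]
normal, flagged = triage(example_scores)
print(len(normal), len(flagged), workload_reduction(example_scores))
```

Raising the threshold moves more examinations into the likely-normal stream, which increases the workload reduction but may lower sensitivity, so the two outcomes have to be read together.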

Workload reduction, diagnostic accuracy and heterogeneity

By using the AI algorithm with a score of 7, the radiologist workload significantly decreased by 68.3% (95% CI 0.655–0.711, p < 0.001, I² = 98.76%, Figure 2). The pooled sensitivity was 93.1% (95% CI 0.882–0.979) (Figure 3) and the pooled specificity was 68.7% (95% CI 0.659–0.715) (Figure 4). There was statistically significant heterogeneity for sensitivity (I² = 83.86%, p < 0.01) and specificity (I² = 98.75%, p < 0.01). The pooled SROC curve with the bivariate approach yielded an AUC of 1.0 and the DOR for the studies was 15.6 (95% CI 5.7–25.5, Figure 5).

Figure 5.

Hierarchical summary receiver-operating characteristic (HSROC) curves using the bivariate approach yielded an area under the curve of 1.0, and the diagnostic odds ratio for the studies was 15.6.

Quality assessment

Concerning risk of bias, in the “patient selection” domain, all studies were at unclear risk of bias. In the “index test” domain, two studies were at low risk of bias, and one was unclear. In terms of “reference standard” and “flow and timing,” all studies were regarded as low risk of bias. Regarding applicability concerns, in the “patient selection” domain, all studies raised low concern. In the “index test” domain, two studies raised low concern, and one was unclear. Regarding the reference standard, all studies showed low applicability concerns (Figure 6).

Figure 6.

Summary of risk of bias and applicability concerns.

Discussion

Our comprehensive review of previous research suggests that incorporating computer-aided screening algorithms into the mammography screening workflow may have a significant impact in terms of workload reduction. Evaluation of multiple studies showed that incorporating the Transpara® software with a test threshold of 7 resulted in a workload decrease of 68%. If applied to clinical practice, nearly 7 in 10 screening mammograms could potentially be assessed solely by the AI algorithm, whether as a second reader or a stand-alone interpreter. Importantly, the sensitivity and specificity of the screening programs are not compromised by these techniques. By using the AI algorithm with an optimized lesion detection threshold, the achieved sensitivity was 93.1%, indicating a false-negative rate of 6.9%. As a comparator, the false-negative rate of the current standard of care for screening mammograms is 12.5%.²⁵

Vulnerable populations suffer disproportionately from a lack of resources, and AI could potentially be applied to narrow this gap. AI triage systems could expand screening programs, decrease turnaround time, and reduce the cost of breast cancer screening in low-resource settings, providing vulnerable populations with more equitable access. AI-enabled triaging is highly scalable. In an adjusted screening workflow (i.e. using machine-only reading of cases assigned to be normal as an alternative to single or double reading), algorithms performed a high volume of analyses and tasks without a negative impact on performance. A challenge in the field is that there is no agreement on the acceptable “missed cases” rate for a machine-only mammogram triaging system, which has led to high heterogeneity (I² above 80%) among study designs and chosen thresholds. Additionally, there is the ethical question of how to manage a cancer diagnosis missed by a computer algorithm. Patients would need to be informed that their images were being read solely by a machine, which may also result in dissatisfaction. There is still much work to be done to address the appropriate implementation of AI-aided breast cancer screening across clinical practices.

Caution should be taken during the clinical implementation of AI products for breast image triage, as the relative improvement in performance of machine learning (ML) algorithms may be overestimated. Even though the three studies included in the quantitative analysis were rated as low risk regarding applicability concerns on QUADAS-2, the risk of bias in ML studies may be underestimated. Underreporting of demographic information, including the resource setting and ethnic diversity of the data used to train and test ML software, could limit generalizability to diverse patient populations. Furthermore, since all studies are retrospective and non-randomized, risk of bias was deemed unclear in the “patient selection” domain. These and other common pitfalls in AI research have been extensively highlighted by a previous analysis by our group looking into study designs and bias of ML algorithms in breast oncology.²⁶

Limitations of the current study include the lack of access to raw data to reproduce or examine findings from the published articles, despite efforts to reach out to all corresponding authors. Even though half of the algorithms are available on open-source platforms, not all papers provided training weights or the criteria for the chosen cut-off point at which algorithm performance was determined, preventing the development of a deployable model and restricting reproducibility and transparency. The process of selecting studies for inclusion could be a limitation, since it is not always possible to identify whether a study is suitable for inclusion solely by reading the abstract. The studies used to obtain the sensitivity, specificity, and AUC were all single center studies, and the sample was heavily influenced by Larsen 2022¹⁴ due to its large sample size. Additional evaluations are needed to understand the broader impact of these AI tools in settings of variable resource availability and diverse populations. Our analysis was performed using data from three studies that all used Transpara® but in two different versions, and the scores differ slightly between versions. This could potentially affect the results. Our findings suggest a potential benefit from adopting Transpara® into a breast imaging workflow, but many other ML algorithms are commercially available for breast imaging triage (CMTriage, HealthMammo, Saige-Q, CogNet QmTriage). These AI products are not included in this analysis because there are no studies assessing their performance in the literature. It may be beneficial, both for providers and patients, to have performance data from these algorithms published and analyzed to ensure safe and practical adoption into health systems.

To our knowledge, this is the first meta-analysis of AI-based triaging of breast cancer screening mammograms applied to radiologists’ workload reduction. A previous systematic review²⁷ assessed seven studies evaluating computer-aided triage. Since then, there has been a large number of publications focused on triage. No prospective studies have yet been published. Our findings show the need to conduct well-planned prospective trials comparing different breast screening programs. To replicate the group imbalance in screening, these prospective studies should contain realistic case proportions, with readers of various expertise engaging with ML algorithm outputs inside the clinical workflow. This will make it possible to evaluate reader performance as well as technological viability, reading time, reader acceptance, and impact on performance. Prospective studies examining the use of ML for mammographic screening are now being conducted in the UK, Norway, Sweden, China, and Russia, and results are awaited.²⁸⁻³⁰

AI-enabled triaging of screening breast mammograms could reduce radiology workload and potentially be deployed to decrease healthcare disparities. ML can triage screening breast mammograms at a pace that is unattainable for human readers without compromising performance. Prospective data are necessary to evaluate the interaction between human readers and algorithms, their impact on reader performance and patient outcomes, and to validate algorithm thresholds.

Footnotes

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Debora Xavier

Maxwell Lloyd

Felipe Batalini

References

1. Sung, Ferlay, Siegel, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209–249.

2. MQSA National Statistics | FDA, https://web.archive.org/web/20220827193238/https://www.fda.gov/radiation-emitting-products/mqsa-insights/mqsa-national-statistics (accessed August 27, 2022).

3. Torres-Mejía, Smith, Carranza-Flores, et al. Radiographers supporting radiologists in the interpretation of screening mammography: A viable strategy to meet the shortage in the number of radiologists. BMC Cancer 2015;15:410. https://doi.org/10.1186/s12885-015-1399-2.

4. Rimmer. Radiologist shortage leaves patient care at risk, warns royal college. Br Med J 2017;359:j4683.

5. Wing, Langelier. Workforce shortages in breast imaging: impact on mammography utilization. Am J Roentgenol 2009;192:370–378.

6. Lehman, Wellman, Buist, et al. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med 2015;175:1828–1837.

7. Farina, Nabhen, Dacoregio, et al. An overview of artificial intelligence in oncology. Future Science OA 2022;8:FSO787. https://www.future-science.com/doi/10.2144/fsoa-2021-0074 (accessed August 27, 2022).

8. American College of Radiology Data Science Institute, AI Central, August 27, 2022, https://web.archive.org/web/20220827173022/https://aicentral.acrdsi.org/.

9. Phang, Park, et al. Deep neural networks improve radiologists’ performance in breast cancer screening. IEEE Trans Med Imaging 2020;39:1184–1194.

10. Higgins JPT, Thomas, Chandler, et al. (eds). Cochrane handbook for systematic reviews of interventions. Cochrane 2022. https://training.cochrane.org/handbook (accessed August 28, 2022).

11. Whiting, Rutjes, Westwood, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–536.

12. OpenMeta-Analyst: Open-Source, Cross-Platform Software for Advanced Meta-Analysis | Colloquium Abstracts, https://abstracts.cochrane.org/2010-keystone/openmeta-analyst-open-source-cross-platform-software-advanced-meta-analysis (accessed August 28, 2022).

13. Patel, Cooper, Freeman, et al. Graphical enhancements to summary receiver operating characteristic plots to facilitate the analysis and reporting of meta-analysis of diagnostic test accuracy data. Res Synth Methods 2021;12:34–44.

14. Larsen, Aglen, Lee, et al. Artificial intelligence evaluation of 122 969 mammography examinations from a population-based screening program. Radiology 2022;303:502–511. Epub 2022 Mar 29. PMID: 35348377; PMCID: PMC9131175.

15. Yi, et al. DeepCAT: deep computer-aided triage of screening mammography. J Digit Imaging 2021;34:27–35.

16. Balta, Rodriguez-Ruiz, Mieskes, et al. Going from double to single reading for screening exams labeled as likely normal by AI: what is the impact? In: Bosmans, Marshall, VanOngeval (eds) Proceedings of SPIE: 15th International Workshop on Breast Imaging. vol. 11513. Bellingham, Wash: International Society for Optics and Photonics, 2020, 115130D.

17. Raya-Povedano, Romero-Martín, Elías-Cabot, et al. AI-based strategies to reduce workload in breast cancer screening with mammography and tomosynthesis: a retrospective evaluation. Radiology 2021;300:57–65.

18. Lauritzen, Rodríguez-Ruiz, von Euler-Chelpin, et al. An artificial intelligence-based mammography screening protocol for breast cancer: outcome and radiologist workload. Radiology 2022;304:41–49. Epub 2022 Apr 19. PMID: 35438561.

19. Dembrower, Wåhlin, Liu, et al. Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study. Lancet Digit Health 2020;2:e468–e474.

20. McKinney, Sieniek, Godbole, et al. International evaluation of an AI system for breast cancer screening. Nature 2020;577:89–94.

21. Rodriguez-Ruiz, Lång, Gubern-Merida, et al. Can AI serve as an independent second reader of mammograms? A simulation study. In: Bosmans, Marshall, VanOngeval (eds) 15th International Workshop on Breast Imaging. vol. 11513. Bellingham, Wash: International Society for Optics and Photonics, 2020, 115130O.

22. Kyono, Gilbert, van der Schaar. Improving workflow efficiency for mammography using machine learning. J Am Coll Radiol 2020;17:56–63.

23. Yala, Schuster, Miles, et al. A deep learning model to triage screening mammograms: a simulation study. Radiology 2019;293:38–46.

24. Kyono, Gilbert, van der Schaar. MAMMO: A deep learning solution for facilitating radiologist-machine collaboration in breast cancer diagnosis. (arXiv. October 2018); 30. https://doi.org/10.48550/arXiv.1811.02661

25. Limitations of Mammograms | How Accurate Are Mammograms?, https://www.cancer.org/cancer/breast-cancer/screening-tests-and-early-detection/mammograms/limitations-of-mammograms.html (accessed November 11, 2022).

26. Corti, Cobanaj, Marian, et al. Artificial intelligence for prediction of treatment outcomes in breast cancer: systematic review of design, reporting standards, and bias. Cancer Treat Rev 2022;108. https://doi.org/10.1016/j.ctrv.2022.102410.

27. Hickman, Woitek, et al. Machine learning for workflow applications in screening mammography: systematic review and meta-analysis. Radiology 2022;302:88–104.

28. NHSX. Mia mammography intelligent assessment, https://transform.england.nhs.uk/ai-lab/explore-all-resources/understand-ai/mia-mammography-intelligent-assessment/ (accessed October 8, 2023).

29. ClinicalTrials.gov. Development of Artificial Intelligence System for Detection and Diagnosis of Breast Lesion Using Mammography, https://clinicaltrials.gov/study/NCT03708978#publications (accessed October 8, 2022).

30. ClinicalTrials.gov. Experiment on the use of innovative computer vision technologies for analysis of medical images in the Moscow healthcare system, https://clinicaltrials.gov/ct2/show/NCT04489992 (accessed October 8, 2022).