Can Artificial Intelligence Accurately Detect Urinary Stones? A Systematic Review

Abstract

Objectives:

To systematically review the performance of artificial intelligence (AI) in detecting urinary stones.

Methods:

A PROSPERO-registered (CRD473152) systematic search of Scopus, Web of Science, Embase, and PubMed databases was performed to identify original research articles pertaining to AI stone detection or measurement, using search terms (“automatic” OR “machine learning” OR “convolutional neural network” OR “artificial intelligence” OR “detection” AND “stone volume”). Risk-of-bias (RoB) assessment was performed according to the Cochrane RoB tool, the Joanna Briggs Institute Checklist for nonrandomized studies, and the Checklist for Artificial Intelligence in Medical Imaging (CLAIM).

Results:

Twelve studies were selected for the final review, including three multicenter and nine single-center retrospective studies. Eleven studies completed at least 50% of the CLAIM checkpoints and only one presented a high RoB. All included studies aimed to detect kidney (5/12, 42%), ureter (2/12, 16%), or urinary (5/12, 42%) stones on noncontrast computed tomography (NCCT), and 42% also intended to automate stone measurement. Two studies addressed the distinction of stones from vascular calcifications. All studies trained machine learning networks with internal validation, but only one provided an external validation. Trained networks achieved stone detection, with sensitivity, specificity, and accuracy rates ranging from 58.7% to 100%, 68.5% to 100%, and 63% to 99.95%, respectively. Detection Dice score ranged from 83% to 97%. A high correlation between manual and automated stone volume (r = 0.95) was noted. Differentiating distal ureteral stones from phleboliths seemed feasible.

Conclusions:

AI processes can achieve automated urinary stone detection from NCCT. Further studies should couple urinary stone detection with phlebolith distinction, provide an external validation, and include cases with anatomical abnormalities and urologic foreign bodies (ureteral stents and nephrostomy tubes).

Introduction

Kidney stone disease (KSD) is a frequent urologic condition affecting 10% of the population in developed countries.¹ According to a climate change-based predictive model, KSD prevalence is estimated to reach 30% by 2050 in warm areas of the United States. Moreover, KSD has a significant economic impact on health care systems because of its interventional management rate (25%) and its single (50%) or multiple (10%) recurrence rates.^1,2 Acute renal colic (ARC) is the most common urologic emergency, causing 120,000 visits per year (1% of total emergency visits).

According to the National Institute for Health and Care Excellence (NICE) guidelines, a noncontrast computed tomography (NCCT) or an ultrasound (US) has to be performed within 24 hours after the initial visit.³ This 24-hour delay is mainly explained by the lack of human resources to analyze NCCT or US in radiologic emergency departments. Therefore, automatic stone detection could help patients, radiologists, and urologists obtain an etiologic diagnosis at the initial stage of ARC and improve the quality of ARC care.⁴

Artificial intelligence (AI) was born in the early 1950s with Turing's machine and his question “can machines think?”⁵ AI is a scientific field that includes machine learning (ML), which aims to train a machine for specific tasks. Once training is completed, the machine autonomously executes the learned task. Deep learning (DL) is an ML subcategory that is particularly efficient at learning from and detecting text or images, and it has recently been spreading among medical applications. Researchers can now easily access DL networks for computer vision of medical images. Four learning methods are frequently used for algorithm or network training: supervised, unsupervised, self-supervised, and reinforcement learning.⁶ In unsupervised learning, ML models are given unlabeled data and allowed to discover patterns without any explicit instruction or human supervision.

In self-supervised ML methods, the model generates its own supervision signal from auxiliary tasks such as predicting missing parts of the input data. Reinforcement learning learns to make decisions by interacting with the environment: the model receives feedback in the form of rewards or penalties based on its actions and adjusts its policy to maximize the reward over time. Finally, in supervised learning, the preferred method for DL, the network is given both input and output data; during training, it defines the rules linking them. In the field of urinary stones, DL has been proposed for surgical outcome prediction and for urinary stone detection and characterization, from which ARC and nonacute interventional management of KSD could benefit. Indeed, the surgical planning of endourologic procedures should include stone location and size, according to national and international guidelines.^3,7

Among other characteristics, an accurate stone burden estimation obtained without human intervention could help urologists make better decisions.

This systematic review aimed to assess AI performance in detecting urinary stones.

Methods

This study was conducted in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) checklist and PROSPERO registered (CRD473152).⁸

Search strategy

A systematic review of the literature was carried out on September 4, 2023, using the Scopus, Web of Science, Embase, and PubMed databases. No time limit was applied to the literature search and every publication was considered for screening without restriction.

The search terms (“automatic” OR “machine learning” OR “convolutional neural network” OR “artificial intelligence” OR “detection” AND “stone volume”) were used. Reference lists of selected articles were checked manually for eligible additional articles.

Inclusion and exclusion criteria

Inclusion criteria were: (1) stone volume (SV), stone detection, or stone distinction from other anatomical structures such as vascular or nonvascular calcifications as the main topic of the article; (2) automation or AI process (supervised learning, ML, convolutional neural network) as the main method; (3) full text available for screening and analysis of the methodology or data; and (4) English-written publications only. Original studies as well as conference abstracts on in vivo studies were considered. Systematic reviews, editorials, and letters were not considered. Exclusion criteria were: (1) manual measurement of SV or manual stone detection only; (2) linear measurement of stones only; (3) ambiguous report of results such as the absence of accuracy on stone detection or SV; and (4) publications in any language other than English.

Data extraction

Two authors (F.P. and D.S.) extracted data independently using a standardized-item form. Conflicts were resolved by selective analysis and consensus. Included studies were assessed for study characteristics and relevant outcomes. Primary outcomes of interest were automation or AI processes and results in either in vitro or in vivo studies. Imaging modalities were also extracted when available, as well as patients' demographics.

Quality assessment: risk-of-bias

A risk-of-bias (RoB) assessment for nonrandomized studies was undertaken independently by two authors (F.P. and D.S.) using the validated Joanna Briggs Institute Critical Appraisal Checklist for nonrandomized studies.⁹ The ROBVIS tool was used for a graphical representation, assuming the randomization bias was not applicable in all studies.¹⁰ Finally, each included study was assessed against the Checklist for Artificial Intelligence in Medical Imaging (CLAIM).¹¹ Conflicts were resolved by consensus. A comprehensive description of the RoB assessment can be found in Supplementary Figure S1.

Statistical analysis

Considering the heterogeneity of the study outcomes and the lack of comparative trials, a meta-analysis was not performed. The structure of the article was decided based upon the consensus of authors.

Results

Literature search

The search identified 94 studies through the database searches and 1 additional study from reference lists, of which 12 studies were selected for the final review.^{12–23} All studies were retrospective, including three multicenter^{15,19,20} and nine single-center^{12–14,16–18,21–23} studies (Table 1). The selection process is detailed in Figure 1.

FIG. 1.

PRISMA flow diagram. PRISMA = Preferred Reporting Items for Systematic Review and Meta-Analysis.

Table 1.

Demographic and Imaging Characteristics

Author (year)	Objective	Study design	Participants/samples	Imaging modality: NCCT station	Imaging modality: major NCCT protocol (when available)
Mukherjee et al. (2023)²³	To automate the measurement and tracking of kidney stones on NCCT	Retrospective single-center cohort study	113 symptomatic kidney stone patients between 2006 and 2019, followed by ultralow-dose NCCT between 2017 and 2019 (259 NCCT) No validation data	Unique	Tension: 120 kV Intensity: “Smart mA modulation” Number of electrons: NA Slice thickness: 2.5 mm Interval reconstruction: 1.25 mm
Kim et al. (2023)²²	To automate the detection of urinary stones on NCCT	Retrospective single-center cohort study	469 patients with 487 NCCT, surgically treated between 2015 and 2020. After duplicates and kidney inserted foreign body removal, 410 NCCT included	Multiple (8)	Tension: NA Intensity: NA Number of electrons: NA Slice thickness: 2–2.5 mm Interval reconstruction: 0.39–0.98 mm
Park et al. (2023)²¹	To automate the detection of ureteral stones on NCCT	Retrospective single-center cohort study	150 stone cases and 150 control cases	NA	NA
Elton et al. (2022)²⁰	To automate the detection and volume of kidney stones on NCCT	Retrospective multicenter study	Training and test data set: 1186 colonography CT from 3 centers (91 patients with kidney stones, 89 without kidney stones) Validation data set: 12,351 colonography CT from a single center	Multiple	Tension: 120 kV Intensity: NA Number of electrons: NA Slice thickness: 1.25–3 mm Interval reconstruction: 0.75–1 mm
Babajide et al. (2022)¹⁹	To automate the detection and characteristics of urinary stones on NCCT	Retrospective multicenter study	94 patients (children and adults) from a pediatric and adult center with kidney stones between 2007 and 2020	NA	Tension: NA Intensity: NA Number of electrons: NA Slice thickness: 2 mm Interval reconstruction: NA
Caglayan et al. (2022)¹⁸	To automate the detection of kidney stones on NCCT	Retrospective single-center cohort study	455 patients with NCCT for kidney stones between January 2016 and January 2020 Kidney stones: 405 patients No stones: 50 patients	Unique	Tension: 120 kV Intensity: 100–200 mAs Number of electrons: NA Slice thickness: ≤1.25 mm Interval reconstruction: 2 mm
Li et al. (2022)¹⁷	To automate the detection of kidneys and stones on NCCT	Retrospective single-center cohort study	260 CT scans were selected for data annotations. 209 scans with stones and 51 scans without stones.	NA	Tension: NA Intensity: NA Number of electrons: NA Slice thickness: 0.42–2.50 mm Interval reconstruction: NA
Jendeberg et al. (2021)¹⁶	To automate the distinction between distal ureteral stones and pelvic phleboliths on NCCT	Retrospective single-center cohort study	341 patients, 267 (calculi) vs 217 (phleboliths) Between 2012 and 2014, 1824 NCCT from acute colic patients in emergency department	Multiple (2)	Tension: 120 kV Intensity: 70 mAs Number of electrons: NA Slice thickness: 0.6–1 mm Interval reconstruction: NA
Cui et al. (2021)¹⁵	To automate kidney stone detection and scoring according to S.T.O.N.E. nephrolithometry.	Retrospective multicenter study	625 NCCT from February 2018 to April 2019	NA	NA
Parakh et al. (2019)¹⁴	To automate the detection and characteristics of urinary stones on NCCT	Retrospective single-center cohort study	540 patients with suspected urolithiasis condition and NCCT between January and October 2016. 535 NCCT scans (S1: n = 289; S2: n = 246) included	Multiple (2)	Single- and dual-energy NCCT: Tension: 100–120 kV Intensity: 21–120 mA Number of electrons: NA Slice thickness: 2–5 mm Interval reconstruction: NA
Längkvist et al. (2018)¹³	To automate the detection of urinary stones on NCCT	Retrospective single-center cohort study	465 clinically acquired NCCT scans of patients suffering from suspected renal colic.	Multiple (3)	Tension: 120 kV Intensity: NA Number of electrons: NA Slice thickness: NA Interval reconstruction: NA
Lee et al. (2010)¹²	To differentiate urinary stone and vascular calcifications on NCCT	Retrospective single-center cohort study	56 patients, with confirmed ureter stones by NCCT on preureteroscopy design from May 2003 to August 2003	Unique	Tension: 120 kV Intensity: automated modulation Number of electrons: NA Slice thickness: 5 mm Interval reconstruction: NA

NA = nonavailable; NCCT = noncontrast computed tomography.

Quality assessment: RoB

RoB assessment was applicable to all included studies. Only one study had a high RoB,²¹ with all other studies having moderate RoB (Supplementary Fig. S1a, b). According to the CLAIM checklist, 6 and 11 studies completed at least 80% and 50% of the 42 checkpoints, respectively (Supplementary Fig. S1c). The high RoB study was the only study not reaching 50% of CLAIM checkpoints.²¹

Study design, participants

Among the included studies, all were retrospective and three were multicenter (25%) (Table 1). Studies focused on various stone locations: kidney (5/12, 42%), ureter (2/12, 16%), or both (5/12, 42%). The main objective was always described clearly but differed among studies: stone detection (12/12, 100%), stone characterization (maximum diameter, volume, density, complexity) (5/12, 42%), and stone distinction from a differential diagnosis (2/12, 16%). Regarding the last two articles, one intended to automate the distinction of distal ureteral stones from phleboliths,¹⁶ whereas the other aimed to distinguish urinary stones from vascular calcifications.¹² The description of included patients and NCCT also varied significantly among the publications. Regarding the population description, a control group (without urinary stones) was included in four studies (33%).

Patients were selected from a stone former cohort in 10/12 (83%) studies and from an “another purpose” NCCT cohort in 1/12 (8%) studies.²⁰ Six studies included adult stone formers,^{12,14,15,17,18,22} while a single one included pediatric stone formers.¹⁹ Two studies included ARC patients.^13,16 Mukherjee et al. included symptomatic kidney stone patients followed by ultralow-dose NCCT (Table 1),²³ while Elton et al.'s study used colonography computed tomography with incidental stones.²⁰ In the last one (Park et al.), we did not find details on the cohort constitution.²¹

NCCT protocol

The imaging modalities were heterogeneously described among included studies. Authors reported at least the number of NCCT stations in eight publications.^{12–14,16,18,20,22,23} No details on NCCT protocols were found for two studies,^15,21 whereas Babajide et al. reported only the slice thickness (Table 1).¹⁹ Among NCCT characteristics, the slice thickness was the most reported (nine studies), followed by tube tension (seven studies). The remaining characteristics were the modulation/intensity +/− rotation duration and interval reconstruction rate. The radiation dose was not reported in any study.

AI method

The Methods section of each included study provided a detailed presentation of the AI process used for network training (Table 2). Network names were systematically cited, along with whether the networks were pretrained, newly designed, or reshaped. Data set splitting was described in 10 studies. In all studies that did not include an external validation, the data set splitting used “validation” or “test” labels for internal validation. Elton et al. conducted an external validation, whereas the other included publications reported only an internal validation.²⁰ Supervised training was the chosen method for DL network training in all studies. An automated NCCT annotation was done in two studies, in which an automated kidney segmentation was followed by a threshold-based segmentation.^20,23 One study aimed to define a composite method to differentiate vascular calcifications from stones (shape feature and texture criteria).¹²
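As an illustration of the data set splitting reported in the included studies, the following minimal Python sketch deals a set of scans into five folds and rotates them through training, validation, and test roles in a 3:1:1 ratio, the scheme reported by Kim et al. for 410 NCCT (Table 2). The function names, the random seed, and the integer scan identifiers are assumptions made for illustration and do not come from any included study.

```python
import random


def make_folds(case_ids, k=5, seed=0):
    """Shuffle case identifiers and deal them into k folds of near-equal size."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def rotate_splits(folds):
    """Yield (train, validation, test) lists, rotating the role of each fold."""
    k = len(folds)
    for i in range(k):
        test = folds[i]
        validation = folds[(i + 1) % k]
        # All remaining folds are pooled into the training set.
        train = [case
                 for j, fold in enumerate(folds)
                 if j not in (i, (i + 1) % k)
                 for case in fold]
        yield train, validation, test


# Hypothetical identifiers standing in for the 410 scans of Kim et al.
folds = make_folds(range(410), k=5)
splits = list(rotate_splits(folds))
train, validation, test = splits[0]
print(len(train), len(validation), len(test))  # 246 82 82
```

When several scans belong to the same patient, dealing patients rather than scans into folds would avoid the same patient appearing in both the training and the test data.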

Table 2.

Artificial Intelligence Methods and Outcomes

Author (year)	Method: data preprocessing	Method: training	Method: validation	Endpoints: training	Endpoints: validation
Mukherjee et al. (2023)²³	nnU-Net ML algorithm training for automatic kidney segmentation followed by 130 HU threshold-based stone segmentation Max SV accepted: 250 mm³ Min SV accepted: 3 mm³ Ground truth definition: correction by Human (3 radiologists) screening of automatically detected stones	Kidney segmentation: available data sets (Beyond the Cranial Vault, Multi-Modality Abdominal Multi-Organ Segmentation Challenge 2022, the 2021 Kidney and Kidney Tumor Segmentation Challenge and FLARE 2021 and 2022)	Kidney segmentation: 20 NCCT samples from FLARE database Stones: comparison between ML and human	NA	Kidney segmentation: Dice coefficient = 0.968 ± 0.030 Stone segmentation: Scan with stone: 228/233, sensitivity 97.8% [96.0–99.7] Stone detection: 726/830, sensitivity 87.5% [85.2–89.7] Median and mean false positives per scan: 0 [0, 1] and 0.72 Manual and automated SV concordance: 0.995 [0.993–0.996]
Kim et al. (2023)²²	Manual stone segmentation by two urologists using Seg3D, version 2.5.1 (Ground truth Definition) Data set splitting: fold ratio of training, validation, and test sets = 3:1:1 (82 NCCT per fold)	3 ML models (axial, coronal, sagittal, ensemble and additive models): U-Net++ with 100 epochs and 12 batch size, data augmentation, stochastic gradient descent optimizer. Overfitting avoided during training by checking validation loss improvement every 10 epochs	Internal validation by cross-validation (k-fold = 5)	1. 2D models: Stone detection: axial model sensitivity of 88.92% and a PPV of 85.92% superior to coronal and sagittal ones SV: SV accuracy in axial model was 87.56%, >4% higher to coronal and sagittal ones 2. Ensemble model: False positive rate (0.34 per patient) two times lower than axial model 3. Additive model: Sensitivity (90.97%), SV accuracy (88.44%) No external validation
Park et al. (2023)²¹	1. Fast R-CNN for urinary tract detection and segmentation 2. Watershed method for stone detection 3. Overlapping of the two methods retained as a stone No ground truth definition	Combined deep learning method: R-CNN and watershed method for object detection 10-fold cross-validation: k-fold (10) splitting data: 9:1	Internal validation by cross-validation (k-fold = 10)	Stone detection: Sensitivity 0.90 Specificity 0.91 Accuracy 0.84 No external validation
Elton et al. (2022)²⁰	Stone segmentation using a calcium-based scoring tool with a 130 HU-threshold, min volume of 3 mm³ by two radiologists (Ground truth Definition) Data set splitting: train (n = 90), test (n = 90) External validation 6185 NCCT	Kidney segmentation with a trained 3D U-Net Image denoising (2 methods) and 130 HU-Threshold segmentation Training to classify kidney stones from false positive with data augmentation, Adam optimizer and 8 batch size. Training stopped on validation of F1 score plateau	External validation Database 6185 NCCT F1 score calculation on 400 NCCT	Stone detection (SV >27 mm³/diameters >3.7 mm): sensitivity of 0.91 at a false positive rate of <0.05 per scan. Automated vs manual SV measurement correlation r = 0.95 (SV difference: 0.31 ± 0.92).	Stone detection AUC of 0.95 with a sensitivity of 0.88 and specificity of 0.91 (6/7 of the E4 stones (85%) and 11/14 (78%) of the E3 stones)
Babajide et al. (2022)¹⁹	Manual NCCT annotation by two urologists and two researchers (three times), using MRIcroGL image slicing program Data set splitting: Training: 37 NCCT from adult database Validation: seven NCCT from adult database Pediatric and adult database: inter- and intraobserver variability analysis	Pretrained brain ML MRI segmentation algorithm: U-shaped Network and CNN, 1000 epochs	Internal validation (7 NCCT)	1. ML: detection sensitivity and specificity: 100%. mean SV segmented by the algorithm was smaller than that of the human reviewers (744 vs 589 voxels, p = 0.5), with Dice score: 0.66 ± 0.16. 2. Inter- and intraobserver variability: Interobserver reliability: renal pelvis width (0.30), ureter diameter (0.46), transverse stone length (0.72), anteroposterior stone length (0.78), craniocaudal stone length (0.87), stone Location (0.95). Intraobserver reliability: renal pelvis width (0.97), ureter diameter (0.97), transverse stone length (0.81), anteroposterior stone length (0.98), and craniocaudal stone length (1.00).	NA
Caglayan et al. (2022)¹⁸	Manual annotation by two radiologists without any details Data set splitting by size (0–1, 1–2, >2 cm)	Pretrained xResNet50 CNN with cross-entropy loss, 35 epochs, Adam optimizer Training in three axes separately	On training data No external validation	1. Accuracy on test data: Group 1 (0–1 cm): 78% in axial section, 63% in coronal section, and 85% in sagittal section Group 2 (1–2 cm): 78% in axial section, 72% in coronal section, and 89% in sagittal section Group 3 (>2 cm): 70% in axial section, 64% in coronal section, and 93% in sagittal section. 2. Other Statistics: Group 1 (0–1 cm): Positive predictive value 75.0%, 78.0%, 82.0%; negative predictive value 82.0%, 48.0%, 88.0%, sensitivity 80.4%, 60.0%, 87.2%, specificity 75.9%, 68.5%, 80.0% Group 2 (1–2 cm): Positive predictive value 74.0%, 80.0%, 92.0%, negative predictive value 62.0%, 64.0%, 86.0%, sensitivity 66.1%, 68.9%, 86.7%, specificity 70.4%, 76.1%, 91.4% Group 3 (>2 cm): Positive predictive value 76.0%, 94.0%, 94.0%, negative predictive value 64.0%, 34.0%, 92.0%, sensitivity 67.8%, 58.7%, 92.1%, specificity 72.7%, 85.0%, 93.8%	NA
Li et al. (2022)¹⁷	Manual annotation by kidney and stone segmentation (3DSlicer) by 30 students, correction by instructors and validation by 10 radiologists (Ground truth definition) Image normalization (slice thickness) Data set splitting: training and test sets in a 7:3 ratio with 5-fold cross-validation.	3D UNet, Res U-Net, SegNet, DeepLabV3+ and UNETR training to select the best one, with Adam optimizer, 8 Batch size, Dice and cross entropy combination for loss function, 200 epochs	Internal validation using cross-validation	Network: Kidney Stone Dice, Kidney Dice, Specificity, Sensitivity, Accuracy 1. SegNet 75.42%, 95.50%, 99.96%, 97.50%, 99.94% 2. DeepLabV3 + 41.09%, 65.56%, 99.75%, 70.91%, 99.59% 3. 3D U-Net 77.63%, 96.70%, 99.97%, 97.20%, 99.96% 4. UNETR 61.92%, 77.14%, 99.82%, 82.02%, 99.72% 5. Res U-Net 79.83%, 95.81%, 99.97%, 96.61%, 99.95% In subgroup analysis, Res U-Net for each stone size
Jendeberg et al. (2021)¹⁶	Manual annotation by radiologist (Ground truth definition) Data set splitting: Training: 217 (stone) +167 (Phleboliths) NCCT Test: 50 (stone) +50 (Phleboliths) NCCT	Newly designed 2,5-CNN, Data augmentation Comparison between CNN and seven radiologists in stone-phleboliths distinction, and semiquantitative method	Internal validation	CNN sensitivity, specificity and accuracy of 94%, 90%, and 92% and an AUC of 0.95. CNN: higher accuracy than the mean radiologist accuracy (92% vs 86%, p = 0.03) Semi-quantitative method: low accuracy for both attenuation (0.56 [0.45–0.68]) and volume (0.52 [0.41–0.64])
Cui et al. (2021)¹⁵	Manual image annotation by two radiologists (detection, stone characteristics, scoring) Kidney and 250 HU-threshold stone segmentation Data set splitting: segmentation data set (n = 178) classification data set (n = 314) test set for S.T.O.N.E. grading (n = 133)	Pretrained 3D U-Net with 5-fold cross-validation	Internal validation	Kidney segmentation: training data set mean cross-validation Dice for the whole kidney and renal sinus segmentation procedure was 0.97 ± 0.01 and 0.94 ± 0.01; test data set, Dice for the whole kidney and renal sinus segmentation algorithm was 0.97 ± 0.01 and 0.93 ± 0.01, respectively Hydronephrosis: training data set: mean cross-validation testing accuracy, AUC, sensitivity, and specificity of 90.3%, 0.96, 89.0%, and 91.2%, respectively; test data set: AUC for the classification algorithm was 0.97 (95% CI 0.94–0.99). The accuracy, sensitivity, and specificity were 91.9% (95% CI 87.6–95.0), 86.5% (95% CI 74.2–94.4), and 93.4% (95% CI 88.8–96.5), respectively Stone detection algorithm reached a sensitivity of 95.9% (236/246) and a PPV of 98.7% (236/239). Automated and manual stone segmentation results demonstrated a mean Dice of 0.83.
Parakh et al. (2019)¹⁴	Manual annotation by radiologist (stone present or absent), compared with NCCT report Data set splitting: Training 435 NCCT (stone absent: n = 206; stone present: n = 229) Validation: 60 NCCT (n = 30 from each scanner) randomly selected Test: 100 NCCT (n = 5 from each scanner) randomly selected	ImageNet, GrayNet pretrained CNN models with segmentation (30 epochs, minibatch stochastic gradient descent, 64 batch size)	Internal validation	From both scanners database, AUC for GrayNet-SB (0.954) was higher than ImageNet-SB (0.936) and Random-SB (0.925). GrayNet: sensitivity 94% [87.4, 100], specificity 96.0% [90.6, 100], PPV 95.9% [90.3, 100], NPV 94.1% [87.7, 100], accuracy 95% [90.7, 99.3]
Längkvist et al. (2018)¹³	Manual annotation using connected components and HU thresholding segmentation Data set splitting: Training 80% of data set (349 NCCT) Test: 20% of data set (88 NCCT)	CNN using data augmentation, minibatch SGD, 100 batch size, Overfitting avoided during training by checking validation loss improvement every 10 epochs	Internal validation	Sensitivity of 100% and an average of 2.68 false-positives
Lee et al. (2010)¹²	Manual image annotation by two radiologists: Shape features (dispersions, convex hull depth, and lobulation count) Internal texture of a lesion (edge density, skewness, DHV, and GLCM) No data set splitting	ANN training with sigmoid function	Internal validation	Dispersions showed a statistical difference between ureter stones and vascular calcifications (p < 0.05). For the internal texture features, skewness and DHV showed statistical differences between ureter stones and vascular calcifications (p < 0.05). ANN performance: AUC value was 0.85 for the shape parameters and 0.88 for the texture parameters.

ANN = artificial neural network; AUC = area under the curve; CI = confidence interval; CNN = convolutional neural network; DHV = difference histogram variation; GLCM = gray-level co-occurrence matrix moment; ML = machine learning; SGD = stochastic gradient descent; SV = stone volume.

The other studies ensured the ground truth definition by a manual annotation, that is, stone segmentation, of each NCCT certified by at least one radiologist or urologist. The network compilation and training characteristics, as well as data augmentation or methods for handling overfitting, were described in six studies.^{13,14,17,18,20,22}

Outcomes and statistics

All included studies presented results from a described statistical method (Table 2). However, statistical contents were heterogeneous, with various calculations or criteria: Dice score, F1 score, sensitivity, specificity, false positives, accuracy, positive predictive value, negative predictive value, concordance, or correlation. For studies using a first step of kidney segmentation followed by threshold-based segmentation, excellent performances were reported: 0.968 Dice score with a manual-automated SV concordance of 0.995 [0.993–0.996] in Mukherjee et al.'s work, and sensitivity/specificity of 88% and 91%, respectively, with 0.95 correlation between manual and automated segmentation in the external validation cohort for Elton et al.^20,23 Finally, in Cui et al.'s study, the kidney segmentation Dice was 0.97 with an excellent concordance.¹⁵ One-step stone segmentation achieved stone detection and SV Dice scores of 0.79 and 0.66, respectively.^17,19
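Because the included studies report different statistics, it may help to recall how the most common ones derive from the same confusion counts. The Python sketch below uses the standard definitions; the function names are hypothetical, and the worked example reuses the counts given in Table 2 for Cui et al. (236 of 246 stones detected, 239 detections in total). The voxel sets in the Dice example are invented for illustration only.

```python
def detection_metrics(tp, fp, fn, tn=None):
    """Compute detection statistics from confusion counts.

    tn is optional because per-stone analyses often have no true negatives.
    """
    metrics = {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
    if tn is not None:
        metrics["specificity"] = tn / (tn + fp)
        metrics["npv"] = tn / (tn + fn)
        metrics["accuracy"] = (tp + tn) / (tp + fp + fn + tn)
    return metrics


def dice(mask_a, mask_b):
    """Dice overlap between two segmentations given as sets of voxel indices."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


# Counts reported in Table 2 for Cui et al.: TP = 236, FN = 10, FP = 3.
cui = detection_metrics(tp=236, fp=3, fn=10)
print(round(100 * cui["sensitivity"], 1), round(100 * cui["ppv"], 1))  # 95.9 98.7

# Invented voxel sets: 2 shared voxels out of 4 manual and 3 automated voxels.
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
automated = {(0, 0, 1), (0, 1, 1), (0, 1, 2)}
overlap = dice(manual, automated)  # 2 * 2 / (4 + 3)
```

Computed voxel by voxel on binary masks, the Dice score and the F1 score are the same quantity, which is one reason why both appear in segmentation studies.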

For stone detection, trained networks reported sensitivity and specificity rates ranging from 58.7% to 100%^{13–23} and 68.5% to 100%,^{14,16–21} respectively. According to the clinical context, sensitivity ranged from 66.1% to 97.5%, 88%, 94% to 100%, and 87.5% to 88% in studies that included adult stone formers, pediatric stone formers, suspected ARC patients, and nonstone formers, respectively. In Park et al.'s study, a 90% sensitivity rate was reported but no clinical context was given. Studies that focused on suspected acute colic patients also included control cases without stones.^13,16 In three studies, eligibility criteria considered stone size limits.^18,20,23

Using the two-step segmentation methods, Mukherjee et al. limited the stone size to 3 to 250 mm³, while Elton et al. considered stones to be greater than 3 mm³, providing similar stone detection sensitivities (87%–88%). Caglayan and colleagues divided cases according to the axial diameter (0–10, 10–20, and >20 mm), showing higher sensitivities in larger stones.¹⁸
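The second step of the two-step approach can be sketched as follows, assuming that a kidney mask is already available from a separately trained segmentation network. The 130 HU threshold and the 3 to 250 mm³ volume limits are those reported for Mukherjee et al. in Table 2; the function name, the nested-list data layout, the inclusive threshold, and the 6-connectivity rule are assumptions made for this illustration, not the authors' implementation.

```python
from collections import deque

# Face-sharing (6-connected) neighbor offsets in (z, y, x) order.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def segment_stones(hu, kidney_mask, spacing_mm, threshold_hu=130.0,
                   min_mm3=3.0, max_mm3=250.0):
    """Return candidate stones as dicts holding their voxels and volume.

    hu and kidney_mask are nested lists indexed [z][y][x]; spacing_mm is the
    (z, y, x) voxel size in millimeters.
    """
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    nz, ny, nx = len(hu), len(hu[0]), len(hu[0][0])
    # Keep voxels inside the kidney mask at or above the attenuation threshold.
    bright = {(z, y, x)
              for z in range(nz) for y in range(ny) for x in range(nx)
              if kidney_mask[z][y][x] and hu[z][y][x] >= threshold_hu}
    stones, seen = [], set()
    for start in sorted(bright):
        if start in seen:
            continue
        # Grow one connected component with a breadth-first search.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            z, y, x = queue.popleft()
            component.append((z, y, x))
            for dz, dy, dx in NEIGHBORS:
                nb = (z + dz, y + dy, x + dx)
                if nb in bright and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # Discard components outside the accepted volume range.
        volume = len(component) * voxel_mm3
        if min_mm3 <= volume <= max_mm3:
            stones.append({"voxels": component, "volume_mm3": volume})
    return stones


# Toy 4 x 4 x 4 volume with 1 mm isotropic voxels (invented values).
size = 4
hu = [[[30.0] * size for _ in range(size)] for _ in range(size)]
kidney = [[[z < 3 for _ in range(size)] for _ in range(size)] for z in range(size)]
for y in (1, 2):
    for x in (1, 2):
        hu[1][y][x] = 900.0   # 4-voxel stone, 4 mm^3
hu[0][3][3] = 400.0           # isolated 1 mm^3 speck, below the 3 mm^3 limit
hu[3][0][0] = 1200.0          # bright voxel outside the kidney mask
stones = segment_stones(hu, kidney, spacing_mm=(1.0, 1.0, 1.0))
print(len(stones), stones[0]["volume_mm3"])  # 1 4.0
```

Restricting the threshold to the kidney mask is what keeps bone and other dense structures out of the candidate list, which also illustrates why the approach does not transfer directly to ureteral stones.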

Considering accuracy, six studies reported stone detection rates of 63% to 99.95%.^{14,16–18,21,22} Overall Dice score ranged from 0.83 to 0.97.^15,17 Regarding the quantitative comparison between AI-automated and manual SV measurements, five studies included such comparisons.^{19,20,22–24} Overall, the correlation between network-generated and manual SV (ground truth) ranged from 88.44% to 99.5%.^20,22,23 Moreover, a 0.31 ± 0.92 mm³ SV difference was reported in Elton et al.'s study, with higher SV by manual measurements according to Babajide et al.^19,20
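The two kinds of SV comparison mentioned above, a correlation coefficient and a mean ± standard deviation difference, can be computed as in the following minimal Python sketch. The function names are hypothetical, and the stone volumes are invented for illustration; they are not data from any included study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_sd_difference(automated, manual):
    """Mean and sample standard deviation of automated minus manual values."""
    diffs = [a - m for a, m in zip(automated, manual)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean, sd


# Invented stone volumes in mm^3, for illustration only.
manual_sv = [12.0, 45.0, 80.0, 150.0, 230.0]
automated_sv = [11.0, 47.0, 78.0, 155.0, 226.0]

r = pearson_r(manual_sv, automated_sv)
mean_diff, sd_diff = mean_sd_difference(automated_sv, manual_sv)
print(f"r = {r:.3f}, difference = {mean_diff:.2f} ± {sd_diff:.2f} mm^3")
```

A high correlation alone does not exclude a systematic offset between the two measurements, which is why reporting the difference alongside r, as Elton et al. did, is informative.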

Furthermore, a distinction has to be made according to the NCCT annotation method used in these studies (one-step stone segmentation or two-step kidney segmentation followed by 130 HU-threshold segmentation). The latter was associated with better AI-manual SV concordance/correlation than a one-step stone segmentation, even if we can legitimately consider 88.4% as a clinically impactful correlation for SV estimation. However, the two-step annotation method was only feasible on kidney stones and consequently could not include ureteral stones.

Kidney and ureteral stone detections were associated with similar outcomes, but distinct methods were available to achieve the ground truth definition (NCCT annotation). For kidney stones, an automated kidney segmentation followed by a threshold-based segmentation was achievable, which could not be transposed to ureteral stones because of the absence of contrast agent injection in NCCT. A single study aimed to differentiate distal ureteral stones from phleboliths.¹⁶ Jendeberg and colleagues' DL method was associated with a sensitivity, specificity, and accuracy of 94%, 90%, and 92%, respectively, and a higher accuracy than the mean radiologist accuracy (92% vs 86%, p = 0.03).

Discussion

NCCT and stone diagnosis

NCCT technologic characteristics

NCCT is a routine imaging modality for both initial diagnosis and follow-up of ARC. Having overtaken US as the first-line imaging for suspected ARC, except in young people, children, and pregnant women, a low-dose NCCT has to be offered urgently (within 24 hours of presentation), according to NICE guidelines.³ Although scrolling through NCCT images is a common task for urologists, knowledge of NCCT protocols is not widespread within the urology community. Overall, NCCT protocols include multiple parameters such as irradiation dose, intensity and tube tension, and slice thickness.

As the most clinically relevant parameter, NCCT irradiation dose refers to the ALARA principle (“As Low As Reasonably Achievable”), that is, obtaining the best possible information with the safest parameters and the lowest radiation exposure.²⁵ In other words, ALARA means avoiding exposure to radiation that does not have a direct benefit, even if the dose is small. Therefore, detailing the irradiation dose (standard, low, or ultralow dose) in the NCCT protocol seems mandatory for any clinical or preclinical study.

To better understand intensity and tube tension, a historical point of view has to be developed. Using a rotating X-ray source on one side and a detector on the other side of the patient, NCCT is based on tissue attenuation to visualize organs in three dimensions. Initially reported by Wilhelm Conrad Röntgen in 1895, X-rays are created from electric current and acceleration between cathode and anode in a tube, with two main characteristics: intensity (of the current, mA) and tension (between cathode and anode, kV).²⁶

The last parameter is the slice thickness, which can refer to two distinct entities. The first is the detector slice thickness, which describes the size of the individual components of the detector array and corresponds to the thickness of the thin-slice series. For example, if NCCT was acquired using a detector slice thickness of 2 mm, an image with voxel size less than 2 mm along the z-axis cannot be generated. In contrast, the reconstruction slice thickness determines the voxel depth of the multiplanar reconstructions, that is, how much data are included in a single slice. Commonly, slice thickness refers to the reconstruction slice thickness. Slice thickness ultimately determines the trade-off in image quality between spatial resolution (how clearly small changes in the image can be differentiated) and image noise (the standard deviation of the image). Thus, increasing slice thickness will decrease both spatial resolution and image noise.

Overall, our research identified two publications that did not report details about the NCCT protocol. The described characteristics also varied among studies, with slice thickness reported in 75% of cases. Tension and intensity were reported less often.

Impact of NCCT protocols on KSD

Low-dose NCCT has been proposed and achieved good performances in stone detection and measurements (sensitivity and specificity of 99% and 94%, respectively), by reducing intensity and exposure duration (mA and mAs, respectively).²⁷ Automated current modulation, as in Lee et al.'s or, more recently, Mukherjee et al.'s studies, adapts the current to the tissue attenuation to avoid information loss in low-dose NCCT.^12,23 Brisbane et al. acknowledged low-dose NCCT for stone diagnosis or follow-up except in case of body mass index >30 kg/m² because of tissue attenuation.²⁸ Moreover, image quality and accuracy tend to decrease with reduced tube current, but in reasonable proportions for stone detection. For instance, NCCT using 140, 100, 60, 30, 15, and 7.5 mAs settings resulted in 98%, 97%, 97%, 96%, 98%, and 97% sensitivity, and 83%, 83%, 83%, 86%, 80%, and 84% specificity, respectively, for small stone detection (3–7 mm) in cadaveric ureters.²⁹

These results are consistent with a recent animal study confirming that mA (intensity or dose) reduction is feasible for stones without losing information.³⁰ For its part, decreasing tube tension results in contrast enhancement (vascular/calcification/bones) and higher global attenuation, but a 100 to 120 kV tension setting is efficient for calcium-based structure analysis in abdominal NCCT.³¹

Lastly, low-dose NCCT frequently includes greater slice thickness, but even ultralow-dose NCCT can avoid slice thickness increase and consequently small stone misdiagnosis with an adequate protocol.²³

In summary, although describing the NCCT protocol is required for qualitative analysis in research, low kVp and low-intensity settings seem acceptable for stone detection and quantification. The only parameter that should be considered mandatory to report is the slice thickness, which should not exceed 2 mm to avoid missing small stones. Moreover, a network trained with various NCCT protocols could present better external validity.

AI efficiency in stone detection

Turing introduced AI in the well-known essay “The Imitation Game” in 1950.⁵ Replacing the question “Can a machine think?” by the different steps of task learning, as close as possible to what a human mind could do, Turing acknowledged for the first time the concepts of “rules,” “stores,” and “control,” still used in current AI experiments. Thus, with almost infinite possible combinations, AI can automate tasks for which a model or network is trained with input and output material. Training consists of defining the rules to find a path from given native data to annotated data. That being said, AI can achieve one action in several ways, as shown in the present review. In the field of stone detection, various methods and networks have been described for data preprocessing (annotation), with two predominating segmentation methods: kidney segmentation followed by threshold-based segmentation or direct stone segmentation.

These methods presented similar outcomes, but only the second one can achieve an efficient ureteral stone segmentation in NCCT. Indeed, in Park et al., although the Fast R-CNN was able to segment the urinary tract, a second method (“watershed”) was needed to reach a detection rate of only 84%.²¹ Furthermore, segmentation seems to outperform other detection methods, with a simple quantification process. A segment is a three-dimensional region of interest made of voxels, whose volume can also be recorded in cubic millimeters (mm³). Although SV is not currently the gold-standard measurement in international guidelines (which rely on stone maximum diameter [SMD]), SV seems more accurate to estimate the stone burden, especially for irregular or complex stone shapes.^3,7,32 Moreover, obtaining the maximum linear dimension (that is, SMD) and the maximum density (HU) from a segment is feasible in daily practice with several free, user-friendly imaging software packages.^33–35
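To make the link between a segment and SV concrete, the sketch below counts the voxels of a binary stone mask and multiplies by the volume of one voxel. The mask layout ([z][y][x] nested lists), the function name, and the spacing values are hypothetical and chosen only for illustration; real software would read the voxel spacing from the image metadata.

```python
# Minimal sketch: stone volume (mm^3) from a binary segmentation mask.
# Assumption: mask is indexed [z][y][x] and spacing_mm is (dz, dy, dx) in mm.


def stone_volume_mm3(mask, spacing_mm):
    """Count stone voxels and multiply by the volume of a single voxel."""
    dz, dy, dx = spacing_mm
    n_voxels = sum(bool(v) for plane in mask for row in plane for v in row)
    return n_voxels * dz * dy * dx


# Hypothetical 2-slice mask with 5 stone voxels.
toy_mask = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0]],
]
# 1 mm slice thickness, 0.5 x 0.5 mm in-plane pixels (illustrative values).
volume = stone_volume_mm3(toy_mask, (1.0, 0.5, 0.5))
print(f"SV = {volume} mm^3")
```

With these toy values, each voxel measures 0.25 mm³, so the five labeled voxels give 1.25 mm³. The same voxel count with thicker slices would yield a larger and coarser volume estimate, which is one reason slice thickness matters for SV.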

Our review found 2D and 3D networks with distinct architectures, with better outcomes reported for 3D ones in stone detection.²² When comparing several 3D networks, Li et al. reported the Res U-Net as the most efficient for kidney stones. Recently, Elton et al. were the first to report a trained network for kidney stone detection with a proper external validation, that is, using data from another center that had never been seen by the network during training.²⁰ Four studies used cross-validation to improve internal validation and increase the training data set.^15,17,21,22

The cross-validation method consists of splitting the data set multiple times (k times, that is, k-fold cross-validation) and running multiple training phases, with the validation fold occupying a different position in the data set each time. At the end of each cycle (training and validation), performance metrics are calculated to assess how the network is performing. It aims to demonstrate that the network performance is not due to the random split of data and that the network would perform similarly in real-world conditions, but it is less robust than an external validation.³⁶
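The splitting logic of k-fold cross-validation can be sketched as follows. This is a simplified illustration with a hypothetical helper name, not the procedure of any reviewed study: it does not shuffle the data, and in practice the split would likely be made at the patient level so that scans from one patient never appear in both the training and validation folds.

```python
# Minimal sketch of k-fold cross-validation index splitting (no shuffling).


def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Each sample is used exactly once for validation across the k cycles.
for fold, (train, validation) in enumerate(k_fold_splits(10, 5), start=1):
    print(f"fold {fold}: train on {len(train)} samples, validate on {validation}")
```

A network would be trained and evaluated once per fold, and the metrics averaged across folds. All folds still come from the same data source, which is why this remains an internal validation.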

Transfer learning represents another technical aspect of AI processes. In addition to input and output data, a network and its architecture are required to achieve an efficient automated task.³⁶ Researchers have two options: build their own network or reuse a network that has previously been trained for a similar task. The first option is more challenging than the second because it requires advanced coding skills. In contrast, the second is easier to implement, as ML has spread in the research community with a large offer of pretrained networks for segmentation or image classification. Our review recorded 10 studies that chose to use pretrained networks.^{12,14,15,17–23} The two remaining publications either focused on a new task (phlebolith-distal ureteral stone distinction, 2021) or were the first to attempt urinary stone detection (2018).^13,16

To date, no algorithm has been validated for both urinary (kidney and ureter) stone detection and distinction between distal ureteral stones and phleboliths. Regarding this last task, a single study showed promising results (sensitivity, specificity, and accuracy of 94%, 90%, and 92% and a higher accuracy than the mean radiologist accuracy [92% vs 86%, p = 0.03]) but without external validation.¹⁶ Furthermore, most studies excluded complex cases such as the presence of foreign bodies (nephrostomy tubes, ureteral stents, hip prosthesis) or phleboliths. Indeed, solitary kidneys, atrophic kidneys, renal anomalies, calcified renal masses, renovascular calcifications, regional lymph node calcifications, metallic implants, pigtail ureteral catheters, percutaneous nephrostomy catheters, and artifacts were excluded from Caglayan et al.'s study.¹⁸ Therefore, integrating these trained networks in the clinical decision-making process does not seem feasible at the moment.

Moreover, the studies' primary objectives differed in the location of the stone to be detected, in quantification, and in distinction from phleboliths or vascular calcifications, as stated earlier. Furthermore, a single study conducted a proper external validation. Thus, to further advance in integrating trained neural network models in daily practice, further studies should focus on both ureteral and kidney stone detection, coupled with phlebolith distinction. Using a pretrained 3D network with both internal and external validations seems adequate for this purpose.

Design, outcomes, and statistics

As part of the heterogeneity in the reviewed data, participants, outcomes, and statistics varied among studies. First, the NCCT selection and database screening involved nonkidney stone formers in some studies, as shown in Elton et al.'s study with colonography CTs, for example.²⁰ The presence of a control group without urinary stones was reported in only six studies.^{14,16–18,20,21} In case of ureteral stones, the great variability of the stone location can justify the absence of a control group: all images showing an “empty” ureter without stones can be considered control images. In contrast, kidney stones' location varies only slightly. Therefore, a control group without stone appears mandatory, but was described only for studies that used a direct stone segmentation.^17,18,20 As shown previously, the two remaining studies conducted a two-stage kidney stone segmentation method, explaining why the authors judged it unnecessary to include a control group.^15,23

Our research also found high variability in the clinical data provided: no data,^12,13,21 clinical data (including age, gender, or urinary abnormality or variation),^{14–16,18,19,22,23} and stone characteristics (SMD, density [HU], or SV).^{14–20,22,23} Clinical and stone data are essential in AI studies, as in clinical trials, for both outcome analysis and generalization. Therefore, providing demographic and stone characteristics appears mandatory for further studies. Similarly, NCCT details were partially lacking in a non-negligible number of included studies, and NCCT protocols showed a high degree of heterogeneity. Nevertheless, AI-network performances differed within an acceptable range, although most studies conducted only an internal validation, which lowers the reliability of their findings for clinical practice.

Furthermore, the data annotation method varied among studies, without any detail on how the bone window was defined (manually or with a preset) for stone segmentation. Although a consensus has been reached on using the bone window for stone measurements on NCCT to avoid overestimation, it has been shown that a manual bone window is reliable and accurate.^37–39

Heterogeneity also lay in the outcomes and statistics. Among the 12 included studies, we recorded 11 different statistical criteria. On one hand, standard probability tests such as sensitivity, specificity, and positive and negative predictive values were frequently reported. They are common measures to assess the performance of diagnostic tests or classification models, such as ML networks. On the other hand, the Dice score, also known as the Sorensen-Dice coefficient or F1 score, is a statistical measure used to assess the similarity or overlap between two sets or groups, ranging from 0 (no overlap) to 1 (total similarity). It is commonly used in various fields, including image segmentation, natural language processing, and information retrieval, to evaluate the agreement or similarity between two sets of data.⁴⁰
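Viewed as an overlap between two sets, the Dice score can be sketched as twice the size of the intersection divided by the sum of the two set sizes. In the illustration below, the sets hold hypothetical voxel coordinates labeled as stone by the network and by the annotator; returning 1.0 when both sets are empty is a convention assumed here, not one stated in the reviewed studies.

```python
# Minimal sketch: Dice score as overlap between predicted and annotated voxels.


def dice_coefficient(predicted, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|), ranging from 0 to 1."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0  # assumed convention: two empty segmentations agree
    return 2 * len(predicted & truth) / (len(predicted) + len(truth))


# Hypothetical (z, y, x) voxel coordinates; three voxels are shared.
annotated = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
network = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 2, 2)}
print(dice_coefficient(network, annotated))  # 2 * 3 / (4 + 4) = 0.75
```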

Accuracy differs from Dice by counting true negatives among the correct predictions (percentage of correct detection), whereas the Dice score ignores true negatives and is lowered only by false positives and false negatives.⁴¹ The Dice score could represent the best criterion to analyze AI network performances in the field of image segmentation and was recorded in four studies.^15,17,19,23 Table 3 summarizes the encountered metrics and provides an ML-oriented definition for each.

Table 3.

Statistics in Artificial Intelligence Segmentation Models

Metrics	Formula	Definition
Se (recall)	Se = TP/(TP + FN)	Proportion of actual positive cases correctly identified by the model
Sp	Sp = TN/(TN + FP)	Proportion of actual negative cases correctly identified by the model
PPV (precision)	PPV = TP/(TP + FP)	Proportion of positive predictions that are actually true positives
NPV	NPV = TN/(TN + FN)	Proportion of negative predictions that are actually true negatives
Acc	Acc = (TP + TN)/(TP + TN + FP + FN)	Proportion of correct predictions out of all data
Dice score (or F1 score)	Dice = 2 × TP/(2 × TP + FN + FP)	Degree of similarity or overlap between predictions and ground truth values

Acc = accuracy; NPV = negative predictive value; PPV = positive predictive value; Se = sensitivity; Sp = specificity.
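The standard definitions of these metrics can be computed directly from the four confusion-matrix counts, as in the minimal sketch below. The function name and the counts are hypothetical; they were chosen to mimic a segmentation task in which background (true negative) voxels vastly outnumber stone voxels.

```python
# Minimal sketch: diagnostic metrics from confusion-matrix counts
# (TP, FP, TN, FN), using the standard definitions.


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the metric is undefined."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp, fp, tn, fn):
    return {
        "Se": _ratio(tp, tp + fn),                    # sensitivity (recall)
        "Sp": _ratio(tn, tn + fp),                    # specificity
        "PPV": _ratio(tp, tp + fp),                   # precision
        "NPV": _ratio(tn, tn + fn),                   # negative predictive value
        "Acc": _ratio(tp + tn, tp + tn + fp + fn),    # accuracy
        "Dice": _ratio(2 * tp, 2 * tp + fn + fp),     # Dice / F1 score
    }


# Hypothetical counts: 10 true stone voxels among 1000 voxels.
metrics = diagnostic_metrics(tp=8, fp=2, tn=988, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy reaches 0.996 because it is dominated by the 988 true negatives, whereas the Dice score is 0.8. This illustrates why Dice is often preferred for segmentation tasks in which the target occupies a small fraction of the image.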

Clinical implications and surgical planning

Our literature review intended to report the current evidence on automated urinary stone detection on NCCT. This radiologic classification task, among other oncologic and nononcologic imaging interpretations, can benefit from AI and supervised learning, with a direct improvement for patients. Involving AI processes in the radiology field does not aim to and will not replace radiologists, but radiologists with AI will provide better interpretations compared with radiologists without AI.^42,43 In the initial stage of ARC, automated stone detection would help to improve patients' path from emergency departments to urology clinics in several ways. First, radiologists will have more time to look for what is usually missed on NCCT when the focus is on finding the stone (the “unreported data”), such as urologic (anatomical variation, clots, small renal mass, vascular abnormalities) and non-urologic findings.

Moreover, the decision to inject contrast agent for nonstone-related ARC will be facilitated by the time gained on standard cases. Therefore, an etiologic diagnosis will be given to a higher proportion of patients, even in case of nonstone-related ARC. Then, a greater number of NCCTs can be interpreted in the same amount of time, which would increase emergency NCCT access for patients, currently limited by human resources.³ Regarding this last aspect, we acknowledge that the new potential human limit for NCCT access in case of ARC would be the reasonable number of NCCTs a radiographer can perform. A recent clinical audit of ureteral stone care pathways conducted by the British Association of Urological Surgeons (BAUS) reported that female patients had lower access to NCCT performed within 24 hours of ARC presentation (13% vs 7.3% for men [chi-squared p = 0.01]).⁴ We can reasonably expect AI-aided imaging to help reduce such discrepancies in access. Finally, an AI-detected stone could trigger an automated clinic appointment with the urology department, which should improve follow-up. Recent publications have demonstrated a threefold reduction in the time between referral and treatment after the creation of a dedicated ARC clinic.⁴⁴ Consequently, an automated method for stone detection could reduce this waiting time even further, although it will eventually be bounded by the available human resources and by the reasonable delay allowed for spontaneous passage or medical expulsive therapy (2–4 weeks).⁷

A step further in integrating AI into clinical practice could consist of large language models generating automated radiology reports for radiologists. A recent experiment using ChatGPT has shown promising results in generating accurate reports.⁴⁵ However, the authors emphasized some incorrect statements, missed relevant medical information, and potentially harmful passages. AI integration, instead of practitioner replacement, will solve the responsibility issue inherent in AI misdiagnosis or unreported data, but special attention has to be given to ethical principles for the application of AI to health care and urology.⁴⁶

Finally, automated stone detection will facilitate endourologic procedure planning through automated stone burden estimation. From NCCT segments, SV is easily accessible in daily practice, and a lithotripsy duration can be accurately calculated from it for flexible ureteroscopy.^32,35,47,48 After being applied to stone diagnosis and quantification, AI networks will extend to surgical planning and continue to improve patients' care.

Conclusion

AI and DL processes can detect and measure urinary stones from NCCT. Currently, trained networks do not meet all requirements for stone detection on NCCT: ureteral and kidney stones, and phlebolith-distal ureteral stone distinction. Further studies are needed, providing an external validation to generalize the presented results, and including complex cases with anatomical abnormalities and frequent urologic foreign bodies (ureteral stent and nephrostomy tubes). Stone detection on NCCT represents the future management for early emergency and urologic planning stages of urolithiasis.

Availability of Data and Materials

The data sets used and analyzed during this study are available from the corresponding author upon reasonable request.

Research Involving Human Participants or Animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Footnotes

Authors' Contributions

F.P., D.S.: Conceptualization, methodology, data collection and analysis, writing—original draft, and writing—review and editing. H.C.-S., Y.P., C.A., V.A., S.C., S.A.: Writing—review and editing.

Author Disclosure Statement

F.P. has declared consultancy for Dornier MedTech. D.S. has declared educational work with Olympus, Storz, and Cook. S.A. has declared educational work with Storz. The other authors declare that they have no conflict of interest.

Funding Information

EUSP Scholarship of the European Association of Urology (FPT) (grant number: 2023-002). French Association of Urology Research Grant (FPT).

Supplementary Material

Supplementary Figure S1

References

1. Stamatelou, Goldfarb. Epidemiology of kidney stones. Healthcare (Basel), 2023; 11(3):424; doi: 10.3390/healthcare11030424

2. Brikowski, Lotan, Pearle. Climate-related increase in the prevalence of urolithiasis in the United States. Proc Natl Acad Sci U S A, 2008; 105(28):9841–9846; doi: 10.1073/pnas.0709652105

3. Recommendations | Renal and Ureteric Stones: Assessment and Management | Guidance | NICE. NICE; 2019. Available from: https://www.nice.org.uk/guidance/ng118/chapter/Recommendations#diagnostic-imaging [Last accessed: February 7, 2023].

4. Finch, Calvert, Fowler, et al. Enabling national improvement in quality of care for renal colic. BJU Int, 2023; 131(5):602–610; doi: 10.1111/bju.15936

5. Turing. I.—Computing machinery and intelligence. Mind, 1950; LIX(236):433–460; doi: 10.1093/mind/LIX.236.433

6. Chollet. Deep Learning with Python. Manning Publications Co: Shelter Island, NY; 2018.

7. Türk, Petřík, Sarica, et al. EAU guidelines on interventional treatment for urolithiasis. Eur Urol, 2016; 69(3):475–482; doi: 10.1016/j.eururo.2015.07.041

8. Liberati, Altman, Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. PLoS Med, 2009; 6(7):e1000100; doi: 10.1371/journal.pmed.1000100

9. JBI Critical Appraisal Tools | JBI. Available from: https://jbi.global/critical-appraisal-tools [Last accessed: October 16, 2023].

10. Risk of bias tools—Robvis (visualization tool). Available from: https://www.riskofbias.info/welcome/robvis-visualization-tool [Last accessed: October 31, 2023].

11. Mongan, Moy, Kahn. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A guide for authors and reviewers. Radiol Artif Intell, 2020; 2(2):e200029; doi: 10.1148/ryai.2020200029

12. Lee, Kim, Hwang, et al. Differentiation of urinary stone and vascular calcifications on non-contrast CT images: An initial experience using computer aided diagnosis. J Digit Imaging, 2010; 23(3):268–276; doi: 10.1007/s10278-009-9181-0

13. Längkvist, Jendeberg, Thunberg, et al. Computer aided detection of ureteral stones in thin slice computed tomography volumes using convolutional neural networks. Comput Biol Med, 2018; 97:153–160; doi: 10.1016/j.compbiomed.2018.04.021

14. Parakh, Lee, et al. Urinary stone detection on CT images using deep convolutional neural networks: Evaluation of model performance and generalization. Radiol Artif Intell, 2019; 1(4):e180066; doi: 10.1148/ryai.2019180066

15. Cui, Sun, Ma, et al. Automatic detection and scoring of kidney stones on noncontrast CT images using S.T.O.N.E. nephrolithometry: Combined deep learning and thresholding methods. Mol Imaging Biol, 2021; 23(3):436–445; doi: 10.1007/s11307-020-01554-0

16. Jendeberg, Thunberg, Lidén. Differentiation of distal ureteral stones and pelvic phleboliths using a convolutional neural network. Urolithiasis, 2021; 49(1):41–49; doi: 10.1007/s00240-020-01180-z

17. Li, Xiao, Liu, et al. Deep segmentation networks for segmenting kidneys and detecting kidney stones in unenhanced abdominal CT images. Diagnostics, 2022; 12(8):1788; doi: 10.3390/diagnostics12081788

18. Caglayan, Horsanali, Kocadurdu, et al. Deep learning model-assisted detection of kidney stones on computed tomography. Int Braz J Urol, 2022; 48(5):830–839; doi: 10.1590/S1677-5538.IBJU.2022.0132

19. Babajide, Lembrikova, Ziemba, et al. Automated machine learning segmentation and measurement of urinary stones on CT scan. Urology, 2022; 169:41–46; doi: 10.1016/j.urology.2022.07.029

20. Elton, Turkbey, Pickhardt, et al. A deep learning system for automated kidney stone detection and volumetric segmentation on noncontrast CT scans. Med Phys, 2022; 49(4):2545–2554; doi: 10.1002/mp.15518

21. Park, Eun S-J, Na. Development and evaluation of urolithiasis detection technology based on a multimethod algorithm. Int Neurourol J, 2023; 27(1):70–76; doi: 10.5213/inj.2346070.035

22. Kim, Song, Park, et al. Deep-learning segmentation of urinary stones in noncontrast computed tomography. J Endourol, 2023; 37(5):595–606; doi: 10.1089/end.2022.0722

23. Mukherjee, Lee, Elton, et al. Fully automated longitudinal assessment of renal stone burden on serial CT imaging using deep learning. J Endourol, 2023; 37(8):948–955; doi: 10.1089/end.2023.0066

24. Cui, Tan, Christiansen, et al. The utility of automated volume analysis of renal stones before and after shockwave lithotripsy treatment. Urolithiasis, 2021; 49(3):219–226; doi: 10.1007/s00240-020-01212-8

25. CDC. ALARA—As Low As Reasonably Achievable. 2022. Available from: https://www.cdc.gov/nceh/radiation/alara.html [Last accessed: October 30, 2023].

26. Busch. Claims of priority—The scientific path to the discovery of X-rays. Z Für Med Phys, 2023; 33(2):230–242; doi: 10.1016/j.zemedi.2022.12.002

27. Niemann, Kollmann, Bongartz. Diagnostic performance of low-dose CT for the detection of urolithiasis: A meta-analysis. AJR Am J Roentgenol, 2008; 191(2):396–401; doi: 10.2214/AJR.07.3414

28. Brisbane, Bailey, Sorensen. An overview of kidney stone imaging techniques. Nat Rev Urol, 2016; 13(11):654–662; doi: 10.1038/nrurol.2016.154

29. Jellison, Smith, Heldt, et al. Effect of low dose radiation computerized tomography protocols on distal ureteral calculus detection. J Urol, 2009; 182(6):2762–2767; doi: 10.1016/j.juro.2009.08.042

30. Talso, Emiliani, Froio, et al. Low-dose CT scan in stone detection for stone treatment follow-up: Is there a relation between stone composition and radiation delivery? Study on a porcine-kidney model. Minerva Urol Nefrol, 2019; 71(1):63–71; doi: 10.23736/S0393-2249.18.03265-4

31. Dion, Berger, Hélie, et al. Dose reduction at abdominal CT imaging: Reduced tension (kV) or reduced intensity (mAs)? [in French]. J Radiol, 2004; 85(4 Pt 1):375–380; doi: 10.1016/s0221-0363(04)97596-8

32. Panthier, Doizi, Illoul, et al. Developing free three-dimensional software for surgical planning for kidney stones: Volume is better than diameter. Eur Urol Focus, 2021; 7(3):589–590; doi: 10.1016/j.euf.2020.06.003

33. Anonymous. Horos Project—Free DICOM Medical Image Viewer. 2020. Available from: https://horosproject.org/ [Last accessed: January 28, 2020].

34. Fedorov, Beichel, Kalpathy-Cramer, et al. 3D Slicer as an image computing platform for the quantitative imaging network. Magn Reson Imaging, 2012; 30(9):1323–1341; doi: 10.1016/j.mri.2012.05.001

35. Ziemba, Li, Gurnani, et al. A user-friendly application to automate CT renal stone measurement. J Endourol, 2018; 32(8):685–691; doi: 10.1089/end.2018.0326

36. Anonymous. Keras U-Net Starter—LB 0.277. 2019. Available from: https://kaggle.com/keegil/keras-u-net-starter-lb-0-277 [Last accessed: October 23, 2019].

37. Eisner, Kambadakone, Monga, et al. Computerized tomography magnified bone windows are superior to standard soft tissue windows for accurate measurement of stone size: An in vitro and clinical study. J Urol, 2009; 181(4):1710–1715; doi: 10.1016/j.juro.2008.11.116

38. Soomro, Hammad Ather, Salam. Comparison of ureteric stone size, on bone window versus standard soft-tissue window settings, on multi-detector non-contrast computed tomography. Arab J Urol, 2016; 14(3):198–202; doi: 10.1016/j.aju.2016.06.006

39. Peyrottes, Chicaud, Fourniol, et al. Clinical reproducibility of the stone volume measurement: A “Kidney Stone Calculator” study. J Clin Med, 2023; 12(19):6274; doi: 10.3390/jcm12196274

40. Anonymous. Understanding Dice Coefficient. Available from: https://kaggle.com/code/yerramvarun/understanding-dice-coefficient [Last accessed: October 30, 2023].

41. Anonymous. The Basics of Classifier Evaluation: Part 1. 2015. Available from: https://www.svds.com/the-basics-of-classifier-evaluation-part-1/ [Last accessed: October 30, 2023].

42. Cacciamani, Sanford, Chu, et al. Is artificial intelligence replacing our radiology stars? Not yet! Eur Urol Open Sci, 2023; 48:14–16; doi: 10.1016/j.euros.2022.09.024

43. Hoppe, Rueckel, Dikhtyar, et al. Implementing artificial intelligence for emergency radiology impacts physicians' knowledge and perception: A prospective pre- and post-analysis. Invest Radiol, 2024; 59(5):404–412; doi: 10.1097/RLI.0000000000001034

44. Cullen, Kum, Scott, et al. Introduction of a dedicated colic clinic reduces referral to treatment times in patients managed expectantly with acute ureteric colic: A quality improvement project. BMJ Open Qual, 2023; 12(3):e002168; doi: 10.1136/bmjoq-2022-002168

45. Jeblick, Schachtner, Dexl, et al. ChatGPT makes medicine easy to swallow: An exploratory case study on simplified radiology reports. Eur Radiol, 2023. [Epub ahead of print]; doi: 10.1007/s00330-023-10213-1

46. Cacciamani, Chen, Gill, et al. Artificial intelligence and urology: Ethical considerations for urologists and patients. Nat Rev Urol, 2024; 21(1):50–59; doi: 10.1038/s41585-023-00796-1

47. Panthier, Kutchukian, Ducousso, et al. How to estimate stone volume and its use in stone surgery: A comprehensive review. Actas Urol Esp (Engl Ed), 2024; 48(1):71–78; doi: 10.1016/j.acuroe.2023.08.009

48. Panthier, Traxer, Yonneau, et al. Evaluation of a free 3D software for kidney stones' surgical planning: “Kidney stone calculator” a pilot study. World J Urol, 2021; 39(9):3607–3614; doi: 10.1007/s00345-021-03671-z
