Clinical Applications of Machine Learning for Urolithiasis and Benign Prostatic Hyperplasia: A Systematic Review

Abstract

Introduction:

Previous systematic reviews related to machine learning (ML) in urology often overlooked the literature related to endourology. Therefore, we aim to conduct a more focused systematic review examining the use of ML algorithms for the management of benign prostatic hyperplasia (BPH) or urolithiasis. In addition, we are the first group to evaluate these articles using the Standardized Reporting of Machine Learning Applications in Urology (STREAM-URO) framework.

Methods:

Searches of MEDLINE, Embase, and the Cochrane CENTRAL databases were conducted from inception through July 12, 2021. Keywords included those related to ML, endourology, urolithiasis, and BPH. Two reviewers screened the citations that were eligible for title, abstract, and full-text screening, with conflicts resolved by a third reviewer. Two reviewers extracted information from the studies, with discrepancies resolved by a third reviewer. The data collected were then qualitatively synthesized by consensus. Two reviewers evaluated each article according to the STREAM-URO checklist with discrepancies resolved by a third reviewer.

Results:

After identifying 459 unique citations, 63 articles were retained for data extraction. Most articles consisted of tabular (n = 32) and computer vision (n = 23) tasks. The two most common problem types were classification (n = 40) and regression (n = 12). In general, most studies utilized neural networks as their ML algorithm (n = 36). Among the 63 studies retrieved, 58 were related to urolithiasis and 5 focused on BPH. The urolithiasis studies were designed for outcome prediction (n = 20), stone classification (n = 18), diagnostics (n = 17), and therapeutics (n = 3). The BPH studies were designed for outcome prediction (n = 2), diagnostics (n = 2), and therapeutics (n = 1). On average, the urolithiasis and BPH articles met 13.8 (standard deviation 2.6) and 13.4 (standard deviation 4.1) of the 26 STREAM-URO framework criteria, respectively.

Conclusions:

The majority of the retrieved studies effectively helped with outcome prediction, diagnostics, and therapeutics for both urolithiasis and BPH. While ML shows great promise in improving patient care, it is important to adhere to the recently developed STREAM-URO framework to ensure the development of high-quality ML studies.

Introduction

Artificial intelligence (AI) involves testing and training computerized algorithms that aim to simulate human cognitive functions such as problem solving and learning. The applications of AI within the medical field include but are not limited to the diagnosis, management, and outcome prediction of health conditions. Machine learning (ML) is a subtype of AI that utilizes dynamic algorithms to analyze complex patterns and problems to then generate useful predictive outputs. ML can be categorized into supervised, unsupervised, and reinforcement learning approaches.¹ A supervised algorithm refers to one that is trained on a prelabeled dataset and is designed to solve classification or regression problems.²

In contrast, an unsupervised algorithm does not rely on the labeling of data when generating predictions, as it learns to recognize patterns from the input data on its own.² Unsupervised algorithms are often used for clustering problems, which relate to the grouping of data based on their similarities and differences. Finally, reinforcement learning operates by trial and error to fine-tune an algorithm's parameters so that it can achieve its designated goal. A more detailed explanation of these learning approaches can be found in Supplementary Data S1.
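The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch. The stone diameters, labels, and function names below are hypothetical and serve only to contrast a model trained on prelabeled data (a nearest-centroid classifier) with one that groups unlabeled data on its own (one-dimensional k-means with two clusters); neither is drawn from the included studies.

```python
from statistics import mean

def nearest_centroid_fit(values, labels):
    """Supervised: learn one centroid per label from prelabeled data."""
    return {lab: mean(v for v, l in zip(values, labels) if l == lab)
            for lab in set(labels)}

def nearest_centroid_predict(centroids, value):
    """Assign the label whose centroid is closest to the new value."""
    return min(centroids, key=lambda lab: abs(centroids[lab] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means).

    Assumes the input contains at least two distinct values.
    """
    c0, c1 = min(values), max(values)
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0, c1 = mean(g0), mean(g1)
    return g0, g1

# Hypothetical stone diameters (mm) and spontaneous passage labels.
sizes = [2.0, 3.0, 4.0, 8.0, 9.0, 10.0]
passed = ["yes", "yes", "yes", "no", "no", "no"]

model = nearest_centroid_fit(sizes, passed)        # uses the labels
prediction = nearest_centroid_predict(model, 3.5)  # classify a new stone
clusters = two_means(sizes)                        # ignores the labels
```

The supervised model needs the `passed` labels to learn, whereas the clustering step recovers two groups from the diameters alone.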

ML has been widely adopted within the field of urology to help with the diagnosis, outcome prediction, and management of urologic conditions.^3,4 The number of studies utilizing ML to advance the field of urology is increasing. Given this recent surge, it is especially important for clinicians to better understand the fundamentals of the technology and learn how it can be applied to their clinical practices. Previous systematic reviews related to the application of AI in urology have been conducted.^5–7 However, these reviews were either nonexhaustive or overlooked the literature related to endourology. Therefore, we sought to conduct a more focused systematic review examining the use of ML algorithms specifically for patients with benign prostatic hyperplasia (BPH) and urolithiasis. In addition, we are the first group to evaluate the quality of these articles using the newly developed Standardized Reporting of Machine Learning Applications in Urology (STREAM-URO) framework.⁸

Methods

We conducted a systematic review according to a prespecified protocol, with reporting according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.

Search strategy

The search strategy was developed with the help of an experienced librarian (Lucy Kiester) and reviewed by a clinical expert (N.B.). References were identified through searches of MEDLINE, Embase, and the Cochrane CENTRAL databases from inception through July 12, 2021. Keywords and Medical Subject Heading terms searched included those related to ML, endourology, urolithiasis, and BPH. Additional information related to the search strategy used can be found in the Supplementary Data S1.

Study selection

Original research articles published in peer-reviewed journals discussing the application of ML for urolithiasis or BPH were included without any language restrictions. Studies were excluded if they (1) did not address the application of ML for urolithiasis or BPH; (2) were related to prostate cancer; (3) were reviews, case reports, commentaries, or conference proceedings; or (4) were not published in the English language. Studies related to prostate cancer were specifically excluded as there is often diagnostic confusion with BPH, and these were beyond the scope of this review. Two reviewers independently screened the citations that were considered eligible for title, abstract, and full-text screening, with conflicts resolved by a third reviewer.
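The dual-reviewer screening rule described above can be sketched as follows. This is only an illustration of the decision logic (agreement stands, disagreement goes to a third reviewer); the citation identifiers, decisions, and the function name `resolve_screening` are hypothetical.

```python
def resolve_screening(reviewer_a, reviewer_b, reviewer_c):
    """Combine two independent screening decisions per citation.

    When both reviewers agree, their decision stands; when they disagree,
    the third reviewer's decision is used to resolve the conflict.
    """
    final = {}
    for citation_id, decision_a in reviewer_a.items():
        decision_b = reviewer_b[citation_id]
        if decision_a == decision_b:
            final[citation_id] = decision_a
        else:
            final[citation_id] = reviewer_c[citation_id]
    return final

# Hypothetical decisions for three citations.
a = {"c1": "include", "c2": "exclude", "c3": "include"}
b = {"c1": "include", "c2": "include", "c3": "exclude"}
c = {"c2": "exclude", "c3": "include"}  # consulted only for conflicts

decisions = resolve_screening(a, b, c)
```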

Data extraction

Data related to general study and population characteristics were collected. In addition, data related to the ML algorithms used and their applications were retrieved. The complete list of information extracted can be found in the Supplementary Data S1. Two independent reviewers were responsible for extracting information from each included study. Any discrepancies in data extraction were resolved with the help of a third reviewer. In addition, two reviewers well-versed in ML (X.H.L. and W.X.L.) reviewed and assessed the data extracted from the included articles. These reviewers have experience with developing ML algorithms and have published ML studies in the past.⁹ The data collected were then qualitatively synthesized by consensus.

STREAM-URO assessment

Each retained article was evaluated according to the STREAM-URO framework, which is a 26-item checklist designed to promote and ensure the development of standardized and high-quality studies within the urologic community.⁸ Two independent reviewers were responsible for grading each included study. Discrepancies in grading were resolved with a third reviewer.
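As an illustration of how checklist results of this kind can be summarized, the sketch below computes the mean and standard deviation of the number of items met per study. The scores are hypothetical, and the use of the sample standard deviation is an assumption; the text does not state which estimator was used.

```python
from statistics import mean, stdev

TOTAL_ITEMS = 26  # number of items in the STREAM-URO checklist

def checklist_summary(items_met_per_study):
    """Return (mean, sample standard deviation) of checklist items met."""
    for n in items_met_per_study:
        if not 0 <= n <= TOTAL_ITEMS:
            raise ValueError(f"items met must lie in 0..{TOTAL_ITEMS}, got {n}")
    return mean(items_met_per_study), stdev(items_met_per_study)

# Hypothetical number of items met by five studies.
scores = [12, 14, 16, 10, 17]
average, spread = checklist_summary(scores)
```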

Results

Study selection

The initial search identified a total of 615 references. After removing all duplicate references, 459 unique citations remained. From this list, 93 articles remained following the initial title/abstract screening. Following the full-text review, 63 articles were retained for data extraction (Fig. 1).

FIG. 1.

PRISMA flow diagram of study selection. PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
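The counts reported for each stage of study selection imply the number of records removed at each step. The short calculation below derives these values by subtraction; the exclusion counts are arithmetic consequences of the reported totals and are not stated separately in the text.

```python
# Counts reported in the study selection paragraph.
identified = 615
unique_citations = 459
after_title_abstract = 93
included = 63

# Records removed at each stage, derived by subtraction.
duplicates_removed = identified - unique_citations
excluded_title_abstract = unique_citations - after_title_abstract
excluded_full_text = after_title_abstract - included
```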

Study characteristics

Among the 63 articles retrieved, most studies consisted of tabular (n = 32) and computer vision (n = 23) tasks. Other tasks included signal processing (n = 5), natural language processing (NLP; n = 2), and time series modeling (n = 1). The two most common problem types were classification (n = 40) and regression (n = 12). Other problem types encountered were segmentation (n = 5), object detection (n = 4), and entity recognition (n = 2). In general, most studies utilized neural networks (NN) as their ML algorithms (n = 36). Alternative algorithms included support vector machines (SVMs; n = 6), linear models (n = 2), nearest neighbors (n = 1), ensemble learning (n = 1), boosting (n = 1), and decision trees (n = 1), among many others (n = 15).

Moreover, among the 63 studies retrieved, 58 were related to urolithiasis, and 5 to BPH. With regard to the clinical applications of the studies related to urolithiasis, 20 were designed for outcome prediction (Table 1), 18 were related to stone classification (Table 2), 17 studies aided with diagnostics (Table 3), and 3 focused on therapeutics (Table 4). Among the studies related to BPH, two aimed to help with outcome prediction, two with diagnostics, and one with treatment (Table 5).
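A simple consistency check confirms that each breakdown reported in the two preceding paragraphs sums to its stated total. The dictionary keys are shorthand labels chosen for this sketch; the counts are those given in the text.

```python
TOTAL_STUDIES = 63

task_categories = {"tabular": 32, "computer vision": 23,
                   "signal processing": 5, "NLP": 2, "time series": 1}
problem_types = {"classification": 40, "regression": 12, "segmentation": 5,
                 "object detection": 4, "entity recognition": 2}
algorithms = {"NN": 36, "SVM": 6, "linear models": 2, "nearest neighbors": 1,
              "ensemble learning": 1, "boosting": 1, "decision trees": 1,
              "other": 15}
urolithiasis = {"outcome prediction": 20, "stone classification": 18,
                "diagnostics": 17, "therapeutics": 3}
bph = {"outcome prediction": 2, "diagnostics": 2, "treatment": 1}

def tally(groups):
    """Sum the per-category study counts."""
    return sum(groups.values())

breakdowns_consistent = (
    tally(task_categories) == TOTAL_STUDIES
    and tally(problem_types) == TOTAL_STUDIES
    and tally(algorithms) == TOTAL_STUDIES
    and tally(urolithiasis) + tally(bph) == TOTAL_STUDIES
)
```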

Table 1.

Applications of Machine Learning for Outcome Prediction of Kidney Stone Disease

First author	Task category	Problem type	ML algorithm	Sample size	Study objective	Evaluation metrics	Multiple models	Algorithm names	Improvement over RS
SFS following SWL
Choo¹⁰	Computer Vision	Classification	Decision Trees	791	Predict SFS	Sensitivity: 95.9% Specificity: 85.8% Accuracy: 92.3% AUC: 0.951	—	—	—
Mannil¹¹	Computer Vision	Regression	Other	34	Predict SFS	Sensitivity: 94% Specificity: 59% AUC: 0.84	Yes	Linear regression, SMOreg	—
Mannil¹⁷	Computer Vision	Regression	Other	51	Predict SFS	3D TA + BMI Sensitivity: 76% Specificity: 76% AUC: 0.8 3D TA + Initial stone size Sensitivity: 76% Specificity: 76% AUC: 0.81 3D TA + Skin to stone distance Sensitivity: 65% Specificity: 72% AUC: 0.81	Yes	NN, RF, SMOreg, kNN, decision trees	Yes
Gomha¹²	Tabular	Classification	NN	984	Predict SFS	ANN Sensitivity: 77.9% Specificity: 75% Accuracy: 77.7% LR Sensitivity: 100% Specificity: 0% Accuracy: 93.2%	Yes	MLP classifier, LR	Yes
Xu¹³	Tabular	Classification	NN	1174	Predict SFS	Accuracy: 75.3% AUC: 0.935	—	—	—
Seckiner¹⁴	Tabular	Classification	NN	203	Predict SFS	Accuracy: 88.7%	—	—	—
Yang¹⁵	Tabular	Classification	Ensemble Learning	358	Predict SFS	SFS (Accuracy) RF: 86% XGBoost: 87.5% LightGBM: 88% SFS (Sensitivity) RF: 74% XGBoost: 75% LightGBM: 78% SFS (Specificity) RF: 92% XGBoost: 93% LightGBM: 92% SFS (AUC) RF: 85% XGBoost: 84% LightGBM: 85% One-session success (Accuracy) RF: 78% XGBoost: 77.4% LightGBM: 77% One-session success (Sensitivity) RF: 81% XGBoost: 80% LightGBM: 79% One-session success (Specificity) RF: 75% XGBoost: 75% LightGBM: 74% One-session success (AUC) RF: 78% XGBoost: 77% LightGBM: 77%	Yes	RF, XGBoost, LightGBM	—
Poulakis¹⁶	Tabular	Classification	NN	701	Predict SFS	Accuracy: 92% AUC: 0.936	—	—	—
SFS following PCNL
Shabaniyan²⁹	Tabular	Classification	Other	254	Predict SFS	Stent placement Accuracy: 94.8% Blood transfusion Accuracy: 85.2% Prediction of cases Accuracy: 95.0%	Yes	Linear Models, Nearest Neighbors, NN, SVM	—
Hameed³⁰	Tabular	Classification	Other	100	Predict SFS	Accuracy: 81% AUC: 0.81	Yes	LR, SVM, decision trees, RF, K-Means	—
Aminsharifi¹⁸	Tabular	Classification	NN	454	Predict SFS	SFS Sensitivity: 83% Accuracy: 82.8% Need for repeat PCNL Sensitivity: 97% Accuracy: 97.7% Need for SWL Sensitivity: 98% Accuracy: 98.2% Need for TUL Sensitivity: 92% Accuracy: 92.5% Need for stent insertion Sensitivity: 81% Accuracy: 81.1% Blood transfusion Sensitivity: 85% Accuracy: 85.8%	—	—	—
Aminsharifi¹⁹	Computer Vision	Classification	SVMs	146	Predict SFS	SFS Sensitivity: 92% Specificity: 91.8% Need for repeat PCNL Sensitivity: 97% Specificity: 95.1% Need for SWL Sensitivity: 82% Specificity: 80% Need for ureteroscopy and stent insertion Sensitivity: 91% Specificity: 91.1% Blood transfusion Sensitivity: 89% Specificity: 83%	—	—	Yes
SFS following multiple endourologic procedures
Kadlec²⁰	Tabular	Regression	NN	382	Predict SFS	Sensitivity: 75.3% Specificity: 60.4% Accuracy: 69.6% AUC: 0.749	—	—	—
Prediction of infection
Liu²¹	Tabular	Classification	Other	322	Predict infection	Sensitivity: 96.1% Specificity: 100% Accuracy: 99.1% AUC: 0.981	Yes	LR, lasso, SVM, RF, XGBoost	Yes
Spontaneous stone passage
Cummings²³	Tabular	Classification	NN	181	Predict SFS	Accuracy: 76%	—	—	—
Dal Moro²⁴	Tabular	Classification	Other	402	Predict SFS	Sensitivity: 84.5% Specificity: 86.9%	Yes	NN, Linear Models, SVMs	—
Solakhan²²	Tabular	Classification	NN	192	Predict SFS	Stone passage Accuracy: 99.1%	—	—	—
Parekattil²⁵	Tabular	Classification	Linear Model	384	Predict SFS	Passage outcome Accuracy: 86% Passage duration Accuracy: 87%	—	—	—
Stone fragmentation following SWL
Goyal²⁶	Tabular	Regression	NN	276	Predict SFS	Power MVRA: COC = 0.0195 ANN: COC = 0.8343 Number of shocks MVRA: COC = 0.5726 ANN: COC = 0.9329	—	—	—
Moorthy²⁷	Tabular	Classification	NN	120	Predict SFS	Sensitivity: 80.7% Specificity: 98.4% Accuracy: 90%	—	—	—
Hamid²⁸	Tabular	Regression	NN	82	Predict SFS	Predictability: 75%	—	—	—
Unique outcomes
Nguyen⁹	Tabular	Classification	Other	3206	Estimate HRQoL	Out-of-sample (DL) AUC: 0.592 Out-of-sample (gradient boosting) AUC: 0.70 In-sample set AUC: 0.92 Validation set AUC: 0.71	Yes	Boosting, NN	—

3D = three-dimensional; ANN = artificial neural network; AUC = area under the curve; BMI = body mass index; COC = coefficient of correlation; DL = deep learning; HRQoL = health-related quality of life; kNN = k-nearest neighbors; LightGBM = light gradient boosting method; LR = logistic regression; ML = machine learning; MLP = multilayer perceptron; MVRA = multivariate regression analysis; NN = neural networks; PCNL = percutaneous nephrolithotomy; RF = random forest; RS = reference standard; SFS = stone-free status; SMOreg = sequential minimal optimization regression; SVM = support vector machine; SWL = shockwave lithotripsy; TA = texture analysis; TUL = transurethral lithotripsy; XGBoost = extreme gradient boosting trees.

Table 2.

Applications of Machine Learning for the Classification of Kidney Stones

First author	Task category	Problem type	ML algorithm	Sample size	Study objective	Evaluation metrics	Multiple models	Algorithm names	Improvement over RS
Kazemi³¹	Tabular	Classification	Other	936	Classifying kidney stone type	Accuracy: 97.1% AUC: 0.996	Yes	Naive Bayes, SVM, NN, NLP, Boosting, Decision trees	—
Zheng³³	Tabular	Regression	Other	1198	Detecting infectious stones	Training set AUC: 0.898 Validation set 1 AUC: 0.832 Validation set 2 AUC: 0.825 Validation set 3 AUC: 0.812	Yes	Linear Models, Decision Trees, SVMs	—
Dussol³²	Tabular	Regression	NN	215	Classifying kidney stone type	Discriminant analysis Sensitivity: 66.4% Specificity: 87.5% Accuracy: 75.8% ANN Sensitivity: 62.2% Specificity: 89.6% Accuracy: 74.4%	Yes	Discriminant analysis, NN	Yes
Zhang³⁴	Computer Vision	Classification	SVM	45	Classifying UA vs non-UA stones	Sensitivity: 100% Specificity: 96.9% AUC: 0.984	—	—	Yes
Estrade³⁵	Computer Vision	Object detection	NN	347	Classifying pure vs mixed stones	Pure stones Sensitivity: 98% Specificity: 100% Accuracy: 99% AUC: 0.98 Mixed stones Sensitivity: 91% Specificity: 99% Accuracy: 95% AUC: 0.93	—	—	—
Kuzmanovski³⁷	Tabular	Classification	NN	12	Classifying whewellite, weddellite and UA stones	ANN (SEP) Whewellite: 0.043 Weddellite: 0.051 UA: 0.019 PLS (SEP) Whewellite: 0.069 Weddellite: 0.077 UA: 0.056 PCR (SEP) Whewellite: 0.070 Weddellite: 0.084 UA: 0.058	—	—	Yes
Black³⁸	Computer Vision	Classification	NN	127	Classifying kidney stone type	UA Sensitivity: 94.1% Specificity: 97.8% AUC: 0.97 COM Sensitivity: 90.5% Specificity: 97.62% AUC: 0.96 Struvite Sensitivity: 71.4% Specificity: 91.84% AUC: 0.99 Cystine Sensitivity: 75% Specificity: 98.31% Brushite Sensitivity: 85.7% Specificity: 96.4% AUC: 0.95	—	—	—
Kuzmanovski³⁶	Tabular	Classification	NN	58	Classifying whewellite, weddellite and carbonate apatite stones	Whewellite RMSE: 0.035 Weddellite RMSE: 0.064 Carbonate apatite RMSE: 0.078	—	—	—
Fitri³⁹	Computer Vision	Classification	NN	30	Classifying calcium, UA, and mixed stones	Overall accuracy: 99.6%	—	—	—
Bejan⁴³	NLP	Entity recognition	Other	400	Mining kidney stone composition from EHR	COM Sensitivity: 90.2% PPV: 94.9% COD Sensitivity: 83.3% PPV: 93.8% Hydroxyapatite Sensitivity: 90.9% PPV: 90.9% Brushite Sensitivity: 80% PPV: 100% UA Sensitivity: 100% PPV: 87.5% Struvite Sensitivity: 100% PPV: 100%	—	—	—
Grose Hokamp⁴⁰	Computer Vision	Classification	NN	200	Predict the main component of pure and mixed stones	Per voxel-based analysis Accuracy: 91.1%	—	—	—
Kriegshauser⁴¹	Tabular	Classification	Other	38	Classifying kidney stone types	Distinguishing UA from non-UA Accuracy: 100% Distinguishing three non-UA subtypes Accuracy: 88%	Yes	SVM, RandomTree, ANN, NB Tree, Decision Tree	Yes
Kahani⁴²	Computer Vision	Classification	Other	6	Classifying kidney stone types	Overall Accuracy: 96% ± 2%	Yes	SVM, Decision tree, Discriminant analysis, kNN	—
Cui⁴⁴	Signal processing	Classification	Other	135	Classifying kidney stone types	kNN (Euclidean) Accuracy: 96.3% kNN (Cosine) Accuracy: 94.8% kNN (Minkowski) Accuracy: 94.8% SVM (Linear) Accuracy: 96.3% SVM (Polynomial Kernel) Accuracy: 96.3% SVM (RBF) Accuracy: 88.1%	Yes	SVM, kNN	—
Blanco⁴⁵	Computer Vision	Classification	NN	215	Classifying kidney stone types	Overall accuracy: 94.4%	—	—	Yes
Sacli⁴⁶	Signal Processing	Classification	Nearest Neighbors	105	Classifying kidney stone types	kNN (overall): Accuracy: 98.2% Sensitivity: 98.0% Specificity: 98.6% Precision: 98.8% ANN (overall): Accuracy: 98.1% Sensitivity: 98.0% Specificity: 98.6% Precision: 98.8%	—	—	—
Volmer⁴⁷	Signal Processing	Regression	NN	261	Kidney stone composition analysis	Training set RMSE: 1.5% Validation set RMSE: 2.3%	—	—	Yes
Volmer⁴⁸	Signal Processing	Regression	NN	36	Kidney stone composition analysis	Training set RMSE: 1.842	—	—	—

COD = calcium-oxalate dihydrate; COM = calcium-oxalate monohydrate; EHR = electronic health records; NB Tree = Naive-Bayes Tree; NLP = natural language processing; PCR = principal component regression; PLS = partial least squares; PPV = positive predictive value; RBF = radial basis function; RMSE = root mean square error; SEP = standard error of prediction; UA = uric acid.

Table 3.

Applications of Machine Learning for the Diagnosis of Kidney Stones

First author	Task category	Problem type	ML algorithm	Sample size	Study objective	Evaluation metrics	Multiple models	Algorithm names	Improvement over RS
Selvarani⁴⁹	Computer Vision	Classification	SVM	250	Stone detection on US	Accuracy: 98.8% False acceptance rate: 1.8 False rejection rate: 3.3	—	—	Yes
Divya Krishna⁵⁰	Computer Vision	Classification	SVMs	508	Stone detection on US	Sensitivity: 100% Specificity: 96.82% Accuracy: 98.14%	Yes	SVM, NN	—
Langkvist⁵¹	Computer Vision	Object detection	NN	465	Stone detection on CT	Sensitivity: 100% Specificity: 97% AUC: 0.997	—	—	—
Cui⁵²	Computer Vision	Segmentation, Classification	NN	566	Stone detection and scoring on CT	STONE scoring Sensitivity: 86.5% Specificity: 93.4% Accuracy: 91.9% AUC: 0.97 Stone detection Sensitivity: 95.9% PPV: 98.7%	—	—	—
Fernandez⁵³	Computer Vision	Segmentation	NN	1851	Detection of calcium phosphate deposit plugs on endoscopic imaging	Jaccard-cross-entropy loss score: 0.174 Intersection over union score: 0.900	—	—	—
Lee⁵⁴	Computer Vision	Object detection	NN	112	Differentiate stones from phleboliths	AUC: 0.88	—	—	—
De Perrot⁵⁵	Computer Vision	Segmentation, Dimensionality Reduction	Boosting	412	Differentiate stones from phleboliths	Sensitivity: 91.7% Specificity: 78.3% Accuracy: 85.1% AUC: 0.902	—	—	—
Jendeberg⁵⁶	Computer Vision	Classification	NN	100	Differentiate stones from phleboliths	Sensitivity: 94% Specificity: 90% Accuracy: 92% AUC: 0.95	—	—	Yes
Chen⁵⁷	Tabular	Regression	Other	277	Predicting stone development	LR Sensitivity: 83% Specificity: 56% AUC: 0.74 Super learner Sensitivity: 80% Specificity: 60% AUC: 0.69 RF Sensitivity: 46% Specificity: 72% AUC: 0.64 LogitBoost Sensitivity: 59% Specificity: 66% AUC: 0.63 Decision tree Sensitivity: 46% Specificity: 65% AUC: 0.61	Yes	Linear Model, Decision Tree, Boosting, Stacking	No
Eken⁵⁸	Tabular	Classification	NN	227	Predicting renal colic	Sensitivity: 94.9% Specificity: 78.4% AUC: 0.867	Yes	MLP, GA, LR	Yes
Chiang⁶¹	Tabular	Classification	NN	256	Predicting stone development	Accuracy: 89%	—	—	—
Chen⁵⁹	Tabular	Regression	Other	38,597	Predicting stone development	Kidney stones vs other GU diseases Sensitivity: 76% Specificity: 71% AUC: 0.80 Kidney stones vs other conditions Sensitivity: 90% Specificity: 81% AUC: 0.92 Kidney stones vs acute localized pain Sensitivity: 81% Specificity: 82% AUC: 0.86	Yes	Linear Model, Decision Tree, Boosting	Yes
Caudarella⁶³	Tabular	Classification	NN	80	Predicting risk of stone recurrence	Sensitivity: 97.1% Specificity: 82.2% Accuracy: 88.8%	—	—	Yes
Tanthanuch⁶²	Tabular	Classification	NN	168	Predicting stone development	Accuracy: 100%	—	—	—
Jungmann⁶⁰	NLP	Entity recognition	SVMs	1714	Identify suspected cases of stone disease	Calculus F1 score: 0.98 Renal Calculus F1 score: 0.88 Ureteral Calculus F1 score: 0.89 Obstructive uropathy F1 score: 0.94	—	—	Yes

GA = genetic algorithm; GU = genitourinary; US = ultrasound.
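Segmentation studies in Table 3 report an intersection over union (IoU, also called the Jaccard index) score. The sketch below shows how this overlap measure is computed for two masks represented as sets of pixel coordinates; the masks are hypothetical and the function name `intersection_over_union` is chosen for illustration.

```python
def intersection_over_union(predicted_pixels, reference_pixels):
    """IoU (Jaccard index) of two masks given as collections of (row, col)."""
    predicted, reference = set(predicted_pixels), set(reference_pixels)
    union = predicted | reference
    if not union:
        return 1.0  # two empty masks are treated as identical
    return len(predicted & reference) / len(union)

# Hypothetical masks: 2 shared pixels out of 5 distinct pixels overall.
predicted_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference_mask = {(0, 1), (1, 1), (1, 2)}
iou = intersection_over_union(predicted_mask, reference_mask)
```

A score of 1.0 indicates perfect overlap between the predicted and reference regions, and 0.0 indicates no overlap.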

Table 4.

Applications of Machine Learning for the Treatment of Kidney Stones

First author	Task category	Problem type	ML algorithm	Sample size	Study objective	Evaluation metrics	Multiple models	Algorithm names	Improvement over RS
Chen⁶⁴	Time Series Modeling	Classification	NN	8583	Plan optimal automated SWL protocols	Power level Accuracy: 98% Precision: 98.8% Recall: 98.0% F1: 0.980 Shock rate Accuracy: 98.1% Precision: 96.3% Recall: 95.7% F1: 0.960	Yes	NN, SVMs, Decision Tree	No
Li⁶⁵	Tabular	Classification	NN	312	Predict optimal stone localizing method during PCNL	No standard metric reported	Yes	NN, Multiple variable regressions	Yes
Muller⁶⁶	Computer Vision	Segmentation, Classification	NN	23,212	Improve accuracy of SWL shocks	Accuracy: 63.9% Sensitivity: 56.0% Specificity: 74.7% PPV: 75.3% NPV: 55.2% Youden's J: 30.7% No-information rate: 58.0% Cohen's κ: 0.2931	—	—	Yes

NPV = negative predictive value.
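Two of the less familiar metrics in Table 4, Youden's J and Cohen's kappa, can be related to the sensitivity and specificity reported in the same row. The sketch below assumes that the prevalence of the positive class equals the reported no-information rate (58.0%); this is an assumption made for illustration and is not stated in the table. Under that assumption, the reported values are approximately reproduced.

```python
def youden_j(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

def kappa_from_rates(sensitivity, specificity, prevalence):
    """Cohen's kappa from sensitivity, specificity, and class prevalence."""
    true_pos = sensitivity * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - prevalence) - true_neg
    observed = true_pos + true_neg              # observed agreement (accuracy)
    predicted_pos = true_pos + false_pos
    expected = (prevalence * predicted_pos
                + (1.0 - prevalence) * (1.0 - predicted_pos))  # chance agreement
    return (observed - expected) / (1.0 - expected)

# Sensitivity and specificity as reported; prevalence is ASSUMED to equal
# the reported no-information rate.
j = youden_j(0.560, 0.747)
kappa = kappa_from_rates(0.560, 0.747, 0.580)
```

Kappa corrects the observed accuracy for the agreement expected by chance, which is why a model with 63.9% accuracy on an imbalanced task can still have a kappa below 0.3.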

Table 5.

Applications of Machine Learning Related to Benign Prostatic Hyperplasia

First author	Task category	Problem type	ML algorithm	Sample size	Study objective	Evaluation metrics	Multiple models	Algorithm names	Improvement over RS
Khalid⁶⁷	Computer Vision	Object detection	NN	59	Detect BPH on histopathologic specimens	Sensitivity: 84% Specificity: 93%	—	—	—
Shatalova⁶⁹	Signal Processing	Classification	NN	120	Predict BPH surgery complications	NN Class 1 Sensitivity: 84% Specificity: 93% DE: 88% Class 2 Sensitivity: 80% Specificity: 89% DE: 87% Class 3 Sensitivity: 82% Specificity: 92% DE: 86% LR model Class 1 Sensitivity: 77% Specificity: 75% DE: 70% Class 2 Sensitivity: 85% Specificity: 82% DE: 83% Class 3 Sensitivity: 92% Specificity: 88% DE: 91%	Yes	NN, LR	Yes
Djavan⁷⁰	Tabular	Classification	NN	397	Predict factors leading to high risk of BOO	Sensitivity: 82% Specificity: 77% Accuracy: 79%	—	—	—
Habes⁶⁸	Computer Vision	Regression	SVMs	53	Measure prostate volume	Spearman's rank correlation coefficient ρ: 0.965 Mean difference: −0.05 mL (95% CI −3.8 to 3.7 mL)	—	—	No
Sethi⁷¹	Computer Vision	Segmentation	Linear Model	100	Detect histologic changes secondary to dutasteride	Year 2 biopsies AUC: 0.79 Year 4 biopsies AUC: 0.97 Total AUC: 0.79 Accuracy: 76%	—	—	Yes

BOO = bladder outlet obstruction; BPH = benign prostatic hyperplasia; CI = confidence interval; DE = diagnostic efficiency of the decision rule.

Urolithiasis

Outcome prediction

Among the 58 urolithiasis studies, the largest group (n = 20) aimed to predict outcomes. Outcomes included stone-free status (SFS), infection, spontaneous stone passage, kidney stone fragmentation, and the health-related quality of life (HRQoL) of stone patients. A detailed description of these studies can be found in Table 1.

Eight studies helped predict the SFS of patients following shockwave lithotripsy (SWL).^10–17 These studies used parameters including patient age, stone location, stone volume, stone length, and Hounsfield units to build their algorithms. Among these studies, only one group incorporated stone texture analyses in their algorithm to help determine SFS.^11,17 The accuracy of the models used was as high as 99% in predicting SFS.¹⁴ The study that achieved this accuracy level developed an artificial neural network (ANN) algorithm using data extracted from 203 patients who presented for SWL. In addition to these eight studies, two other studies aimed to predict SFS following percutaneous nephrolithotomy (PCNL). Both of these studies were conducted by Aminsharifi and colleagues.^18,19

In their initial study, they designed an ML algorithm to predict SFS and other postoperative complications following PCNL. The algorithm was found to predict SFS with an accuracy of 82.8%. Following these promising results, the group then validated the accuracy of their algorithm and compared its performance with two widely used nomograms for the prediction of SFS post-PCNL: the Guy's Stone Score and the Clinical Research Office of the Endourological Society nomogram. The authors concluded that the predictive performance of their ML-based algorithm was better than that of both nomograms. Finally, one study predicted SFS and other postoperative outcomes following SWL, ureteroscopy, or PCNL using an ANN.²⁰ The results were compared with those of traditional statistical methods and showed that ANNs were superior.

One study applied ML models to help with the prediction of infection. The authors used an extreme gradient boosting algorithm to identify patients with obstructed hydronephrosis at high risk of developing pyonephrosis. This model achieved this task with an accuracy, sensitivity, and specificity of 99%, 96%, and 100%, respectively.²¹
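Accuracy, sensitivity, and specificity are all derived from the same confusion matrix. The sketch below computes them from counts of true and false positives and negatives. The counts are illustrative only: they were chosen so that the rounded metrics match those quoted above for a cohort of 322 patients (Table 1), and they are not taken from the study itself.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),             # true positive rate
        "specificity": tn / (tn + fp),             # true negative rate
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "ppv": tp / (tp + fp),                     # positive predictive value
    }

# Illustrative (hypothetical) counts consistent with the rounded figures.
metrics = diagnostic_metrics(tp=74, fn=3, fp=0, tn=245)
rounded = {name: round(value * 100, 1) for name, value in metrics.items()}
```

Because these metrics depend on the balance between classes, a high accuracy alone does not guarantee that both sensitivity and specificity are high.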

The prediction of spontaneous stone passage using ML methods was attempted by four different groups.^22–25 These studies utilized either ANNs or SVMs to help with this task. Of note, the most recent algorithm displayed an accuracy of 99% in estimating spontaneous stone passage rate.²² The authors highlight stone size, body weight, pain score, serum C-reactive protein levels, and erythrocyte sedimentation rate as criteria that were superior to others in predicting spontaneous stone passage.

Another outcome that researchers often attempted to predict was stone fragmentation following SWL. Three studies attempted this task using an ANN.^26–28 The first group conducted a pilot study in 2003 and showed that their algorithm effectively identified patients who were unlikely to benefit from SWL.²⁸ However, they emphasize that their limited sample size prevents them from making final recommendations. A more recent study verified this outcome on a larger sample and showed that their ANN could predict whether a stone is fragmentable using noncontrast CT imaging.²⁷ They report that the chance of misclassifying a nonfragmentable stone was 2.6%.

However, their algorithm was not tested on the fragmentation of multiple stones or of stones larger than 2 cm. Finally, the study conducted by Goyal and colleagues compared the use of an ANN and multivariate regression analysis to predict renal stone fragmentation.²⁶ Using a coefficient of correlation, they showed that the ANN was superior to multivariate regression analysis in predicting renal stone fragmentation by SWL.
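A coefficient of correlation between predicted and observed values is one way to compare two regression models on the same data. The sketch below computes the Pearson correlation coefficient; whether the cited study used the Pearson form is an assumption, and the observed values and model outputs are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Hypothetical observed numbers of shocks and predictions from two models.
observed = [1000, 1500, 2000, 2500, 3000]
model_a = [1100, 1400, 2100, 2400, 3100]   # tracks the observations closely
model_b = [2000, 1800, 2100, 1900, 2200]   # tracks them only loosely

r_a = pearson_r(observed, model_a)
r_b = pearson_r(observed, model_b)
```

The model whose predictions correlate more strongly with the observed values would be judged the better predictor under this criterion.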

There were two studies that developed a decision support system to help patients and clinicians select an appropriate surgical treatment for the management of their kidney stones. Both studies focused on predicting surgical outcomes following PCNL. One of these systems was specifically designed for the use of PCNL to treat large stones.²⁹ Using multiple ML models, this system predicted SFS, surgical complications, and the need for ancillary surgical procedures. The algorithms used provided an accuracy of 95% in predicting the need for a blood transfusion or retreatment, and an accuracy of 85% in determining the need for stent placement post-PCNL. The other decision support system focused on predicting SFS following PCNL of staghorn stones to help urologists counsel their patients appropriately when selecting a treatment option. Within this system, the algorithm with the best accuracy in predicting SFS (81%) was the random forest (RF) classifier.³⁰

Additional studies predicted unique outcomes with the help of ML. For instance, a study conducted by Nguyen and colleagues estimated the HRQoL of kidney stone patients using clinical data retrieved from the Wisconsin Stone Quality-of-Life (WISQOL) questionnaire.⁹ Their model effectively predicted stone patients' HRQoL using clinical information, with an area under the curve (AUC) of 0.79 and 0.83 for patients within the lowest and highest quintiles of their HRQoL stratification, respectively.
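The AUC can be interpreted as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it directly from that definition by comparing every positive-negative pair; the scores are hypothetical.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical model scores for cases with and without the outcome.
with_outcome = [0.9, 0.8, 0.4]
without_outcome = [0.7, 0.3, 0.2]
auc = auc_from_scores(with_outcome, without_outcome)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.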

Classification of stone type

A common application of ML for urolithiasis was in helping with the classification of stones. The studies retrieved (n = 18) showed that there are multiple ML algorithms that can help with stone classification, including NN, NLP, SVM, and computer vision, among others. A detailed list of the studies using ML for the classification of stones can be found in Table 2.

Three studies were designed to predict kidney stone type. One study used a combination of ML algorithms (ensemble learning) and another used an ANN. The study evaluating the use of ensemble learning achieved this task with an accuracy of 97.1%.³¹ The study utilizing ANNs also evaluated the use of discriminant and logistic regression analysis. This study focused on predicting the risk of developing calcium oxalate stones specifically. Their results showed that ANNs were not superior to the classical statistical analyses used.³² The remaining study, conducted by Zheng and coworkers, extracted over 1000 radiomic features from CT scans to develop and validate a nomogram designed to identify infectious kidney stones. This nomogram achieved this task with an AUC of 0.842.³³

Three studies focused on distinguishing between two specific stone classes. For instance, a study carried out by Zhang and coworkers differentiated uric acid and nonuric acid stones.³⁴ This study did so by evaluating the texture features of the stones using CT texture analysis. The data collected were then used to train an SVM classifier, which led to a diagnostic accuracy of 88% and 92% for uric acid and nonuric acid stones, respectively. Another study focused on identifying and distinguishing between pure and mixed kidney stones using a deep convolutional neural network (CNN) that was trained on intraoperative endoscopic images.³⁵

The authors used the unique physical features of the two stone types to train the NN. The algorithm's accuracy was higher than 87% for both the pure and mixed stones. The third study aimed to quantitatively differentiate stones composed of whewellite, weddellite, and carbonate apatite using ANN.³⁶ The authors of this study achieved this goal by measuring the infrared spectra of the different stone types.

Other classification studies aimed to distinguish between multiple classes of kidney stones.³⁷ One study attempted this task using a deep learning computer vision algorithm trained on digital camera images of stones. The overall weighted sensitivity in predicting stone type was 85%. However, this varied for each stone type; the highest sensitivity was for uric acid stones (94%), and the lowest was for brushite stones (71%).³⁸ While the previous study used digital camera images of stones, a study led by Fitri and colleagues used a CNN trained on microCT images to classify the different classes of kidney stones. This group achieved this task with an accuracy of 99% and a classification error of 1.2%. However, the classification groups were broader and only consisted of uric acid, calcium, and mixture stones.³⁹
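An overall weighted sensitivity combines per-class sensitivities in proportion to the number of cases in each class, so that common stone types contribute more than rare ones. The sketch below contrasts this with a simple unweighted mean. The class names, sensitivities, and case counts are hypothetical, and weighting by class size is an assumption about how such a figure is typically obtained.

```python
def weighted_sensitivity(per_class):
    """Average per-class sensitivity, weighted by the number of cases.

    per_class maps a class name to (sensitivity, number_of_cases).
    """
    total_cases = sum(n for _, n in per_class.values())
    return sum(sens * n for sens, n in per_class.values()) / total_cases

def unweighted_sensitivity(per_class):
    """Simple mean of per-class sensitivities (every class counts equally)."""
    return sum(sens for sens, _ in per_class.values()) / len(per_class)

# Hypothetical classes: one common with high sensitivity, one rare with low.
classes = {"common type": (0.9, 50), "rare type": (0.7, 10)}
weighted = weighted_sensitivity(classes)
unweighted = unweighted_sensitivity(classes)
```

When class sizes differ, the weighted figure is pulled toward the performance on the most common class, which is why per-class values should be reported alongside it.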

Two studies used dual-energy CT as an imaging modality to train their model. The first study, conducted by Grose Hokamp and coworkers, used an NN to predict the main component of pure and mixed kidney stones with an overall accuracy of 91%.⁴⁰ The second study applied multiple algorithms, including SVMs, RandomTree, ANN, and naive Bayes tree, to distinguish uric acid from nonuric acid stones with an accuracy of 100%. Once distinguished, the algorithm subclassified these nonuric acid stones with an accuracy of 88%.⁴¹ Other imaging modalities used in the automated classification of stones included dual-energy kidney, ureter, and bladder (DEKUB) X-ray imaging. In that study, a mean accuracy of 96% was achieved in classifying stones using linear discriminant analysis.⁴²

Interestingly, only one article applied an NLP algorithm to extract stone composition information from electronic health records. This algorithm provided a positive predictive value (PPV) >87.5% for all the possible stone compositions. The authors explain that most of the false positives were due to the mislabeling of urinary uric acid mentions as uric acid stones.⁴³ Other methods such as Raman spectroscopy, hyperspectral imaging, infrared spectroscopy, and using the microwave dielectric properties of stones were also used in conjunction with ML tools to classify the different types of kidney stones.⁴⁴⁻⁴⁸

Diagnostics

The identification and diagnosis of kidney stones was another common theme among the studies retrieved. These studies varied with respect to the imaging modality and ML algorithm used. A detailed list of the studies related to the applications of ML for the diagnosis of kidney stones can be found in Table 3.

Selvarani and Rajendran, as well as Divya Krishna and colleagues, used data retrieved from ultrasound imaging systems to train and test their algorithms. The former used a metaheuristic SVM classifier to enhance the quality of ultrasound images when detecting kidney stones.⁴⁹ This methodology led to an accuracy of 98.8% in detecting kidney stones. The study led by Divya Krishna and colleagues also used SVMs. However, the goal of the study was not specific to stones, as it aimed to identify any kidney abnormality on ultrasound, such as cysts or stones.⁵⁰ Nevertheless, this algorithm achieved its goal with a similar accuracy of 98.14%.

Two studies developed an algorithm specific to CT imaging. The model used by Langkvist and coworkers is notable as it was the first of its kind to develop a CNN algorithm for the detection of stones using three-dimensional data. In addition, it also specifically focused on the detection of ureteral stones which are more challenging to detect than kidney stones. This algorithm allowed for the detection of ureteral stones on CT scans with a sensitivity of 100% and a false positive rate of 2.7 per CT scan.⁵¹

Cui and coworkers developed an algorithm to both detect and grade the severity of kidney stones using the S.T.O.N.E. (stone size, tract length, obstruction, number of involved calices, and essence/stone density) scoring system. The algorithm combined CNNs with thresholding methods and achieved stone detection with a sensitivity of 95.9% and a PPV of 98.7%.⁵² The last study that focused on the detection of kidney stones was unique in that it described the development of a deep learning algorithm that identified stone precursors, such as plaque and plugs, on video endoscopic data.⁵³
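The detection studies above report sensitivity, PPV, and false positives per scan, all of which derive from the same raw detection counts. The following is a minimal Python sketch of those definitions. The function name and the counts are hypothetical, chosen only for illustration, and do not come from any cited study.

```python
def detection_metrics(tp, fn, fp, n_scans):
    """Summarize a stone detector from raw counts.

    tp: stones correctly detected
    fn: stones missed
    fp: detections that were not stones
    n_scans: number of CT scans read
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of true stones that were found
        "ppv": tp / (tp + fp),           # share of detections that were true stones
        "fp_per_scan": fp / n_scans,     # average false alarms per scan
    }

# Hypothetical counts (assumption for illustration only).
metrics = detection_metrics(tp=95, fn=5, fp=2, n_scans=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and PPV answer different questions, so a detector tuned to miss no stones can still generate a clinically relevant number of false alarms per scan.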

An issue that is often encountered when diagnosing kidney stones on imaging is the misdiagnosis of pelvic phleboliths as ureteral stones. In this review, three studies focused on differentiating kidney stones from phleboliths. Lee and associates carried out this assessment on CT images using an ANN. This was achieved with an AUC of 0.85 for the shape and 0.88 for the texture parameters.⁵⁴ In comparison, the study conducted by De Perrot and colleagues differentiated phleboliths from kidney stones with an accuracy of 85.1% and an AUC of 0.90 using a combination of radiomics and ML.⁵⁵ The third study, led by Jendeberg and coworkers, focused specifically on distinguishing distal ureteral stones from phleboliths using a CNN. Its evaluation metrics were superior to those of the previous two, with an accuracy of 92% and an AUC of 0.95.⁵⁶
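The AUC values quoted for these studies can be read as the probability that a randomly chosen stone receives a higher model score than a randomly chosen phlebolith. The following is a minimal Python sketch of that pairwise definition, assuming hypothetical model scores. The function name and the numbers are illustrative and are not drawn from the cited studies.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier scores (higher = more likely a stone).
stone_scores = [0.9, 0.8, 0.6, 0.4]
phlebolith_scores = [0.7, 0.3, 0.2, 0.1]

auc_value = pairwise_auc(stone_scores, phlebolith_scores)
print(auc_value)
```

Here 14 of the 16 stone and phlebolith pairs are ranked correctly, giving an AUC of 0.875. Unlike accuracy, this value does not depend on the decision threshold chosen, which is one reason the two metrics can rank studies differently.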

In addition, this review included studies that predicted stone development. Chen and coworkers compared the ability of multivariate logistic regression and statistical ML methods to predict the development of large kidney stones using laboratory test results and detailed patient demographics. Their logistic regression model was superior to all other ML models, with a sensitivity of 83% and a specificity of 56%.⁵⁷ Eken and associates compared ANN, genetic algorithms, and logistic regression analysis using data such as patients' relevant medical history and clinical signs related to urolithiasis. Their ANN model was found to be the best model for predicting urolithiasis, with a sensitivity and specificity of 94.9% and 78.4%, respectively.⁵⁸

Another group predicted the development of urolithiasis by developing a multidimensional algorithm based on statistical and ML models. The finalized algorithm included data related to patient demographics and known clinical diagnoses. After testing multiple models, this group found that their stepwise-selected model performed best, with an AUC, sensitivity, and specificity as high as 0.90, 90%, and 82%, respectively. When applying ML techniques, the authors did not observe a significant increase in performance.⁵⁹ Jungmann and coworkers proposed a unique method to identify suspected cases of urolithiasis using NLP. The authors used this algorithm to identify keywords related to stone disease in free-text radiology reports.⁶⁰ The last group predicted the incidence of stone disease using both discriminant analysis and ANN.⁶¹ This study showed that the ML approach was superior to discriminant analysis in classifying participants known to have stone disease.

Additional studies helped in predicting the location and risk of stone recurrence. One study predicted the presence of upper urinary tract stones using an ANN with an accuracy of 100% on a testing sample of 68 records.⁶² This model included data related to patients' history of kidney stone development, the presence of nephrocalcinosis on imaging, and biochemical data from urine cultures and 24-hour urine assays for citrate. Finally, regarding the risk of stone recurrence, one study predicted the 5-year recurrence rate of kidney stones using an ANN model.⁶³ The input data for this model consisted of different serum and urine electrolyte levels. The algorithm accurately predicted the recurrence of stone disease 89% of the time.

Therapeutics

ML-based algorithms can also be used for therapeutic purposes in the treatment of kidney stones. There were three studies identified within this review that aimed to improve the treatment of urolithiasis with the help of ML.⁶⁴⁻⁶⁶ Two of these studies discussed SWL and one was related to PCNL. The detailed characteristics related to these studies can be found in Table 4.

One of the studies related to SWL developed a deep learning model designed to automate SWL treatment plans according to baseline patient characteristics.⁶⁴ The authors showed that their model was on par with physician planning of SWL. The other study related to SWL built a CNN to improve shocking accuracy.⁶⁶ Given that this was a pilot study with a small sample size, the results presented require further validation. Nevertheless, the authors showed that their algorithm improved the operator hit rate from 55.2% to 75.3%. Finally, the study related to PCNL predicted the optimal kidney stone localizing method using an ANN.⁶⁵ The authors showed that B-mode ultrasonography with X-ray was recommended for the localization of small renal stones, whereas the localization of simple and large stones only required one of the two methods, with X-ray being the ideal method.

Benign prostatic hyperplasia

This review found five studies examining the use of ML in the management of BPH. The detailed characteristics related to these studies can be found in Table 5.

Among the five studies retrieved, two aimed to help with the diagnosis of BPH. The first study used a computer vision-based system to diagnose BPH on histopathologic specimens with an accuracy of 93%.⁶⁷ The second study examined the use of an SVM algorithm to measure prostatic volume on magnetic resonance imaging.⁶⁸ This algorithm predicted prostate volumes that were in accordance with planimetry. Two other studies aimed to help with outcome prediction related to BPH and its management. Shatalova and coworkers used an NN to predict the risk of complications secondary to BPH surgery using the electrical resistance of biologically active points. This was achieved with a diagnostic sensitivity of 84% and a specificity of 93%.⁶⁹

Djavan and associates designed an NN to predict the risk of symptomatic progression in patients with bladder outlet obstruction and to determine the factors that put patients at highest risk of disease progression.⁷⁰ The group highlighted that prostate-specific antigen, transition zone volume, and obstructive symptom score were associated with a high risk of disease progression. The remaining study evaluated BPH medical management. This study detected subtle histologic effects attributed to dutasteride treatment using a computer vision approach. The authors then retrieved features associated with patients' degree of responsiveness and developed a histologic score to determine whether a patient would respond well to dutasteride.⁷¹ Overall, this model was able to distinguish nontreated histologic prostate tissue from dutasteride-treated prostate tissue with an accuracy of 76%.

STREAM-URO assessment

Overall, the articles met 13.7 (standard deviation [SD] 2.7) of the 26 items included in the STREAM-URO assessment. On average, the urolithiasis and BPH articles met 13.8 (SD 2.6) and 13.4 (SD 4.1) of the 26 STREAM-URO framework criteria, respectively.⁸ Of the 63 articles, 62 (98%) met the background, objective, and label criteria of the framework. Only 3 of the 63 articles (4.8%) included an assessment of bias. In addition, only 23 articles (37%) compared their ML models to a reference standard. A detailed view of the STREAM-URO assessment can be found in Table 6.
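The overall figures can be cross-checked against the subgroup figures with simple arithmetic. The following is a minimal Python sketch that uses only the group sizes, means, and counts quoted in this section. The function names are illustrative. Because the subgroup means are already rounded to one decimal place, the recomputed overall mean is expected to agree with the reported 13.7 only to within rounding.

```python
def pooled_mean(groups):
    """Sample-size-weighted mean of several (n, mean) groups."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total

# (number of articles, mean STREAM-URO items met) as reported in the text.
groups = [(58, 13.8), (5, 13.4)]   # urolithiasis, BPH
overall_mean = pooled_mean(groups)

bias_share = share(3, 63)          # articles with a bias assessment
rs_share = share(20 + 3, 63)       # articles comparing to a reference standard

print(round(overall_mean, 2), round(bias_share, 1), round(rs_share, 1))
```

The pooled value from the rounded subgroup means is about 13.77, and the two shares come to about 4.8% and 36.5%, consistent with the reported 4.8% and 37%.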

Table 6.

Standardized Reporting of Machine Learning Applications in Urology Grading of All Included Articles

STREAM-URO criteria	Number of kidney stone studies meeting criteria, out of 58 studies (%)	Number of BPH studies meeting criteria, out of 5 studies (%)
Title: Identify the report as a ML application to a specific urologic question. If applicable, state whether DL was used	45 (78%)	3 (60%)
Background: Describe the urologic problem and rationale for implementing ML models	58 (100%)	4 (80%)
Objective: Clearly state what the proposed ML model(s) aims to address with respect to study population and outcome	58 (100%)	4 (80%)
Problem: State whether the study is a supervised or unsupervised, classification or regression problem	26 (45%)	4 (80%)
Source of data: Describe how the dataset was obtained (e.g., single/multicenter or local/national database) and the study period	49 (84%)	3 (60%)
Eligibility criteria: Specify all criteria for inclusion/exclusion of patients and features, and provide rationale	33 (57%)	2 (40%)
Label: Define the label of interest and how it was assessed	58 (100%)	4 (80%)
Data abstraction: Describe the methods used to develop the final dataset, with consideration of the following: Feature abstraction, Handling of missing data, Feature engineering, and Removal of features (e.g., clinical intuition, principal component analysis, recursive feature elimination, or correlation analysis)	23 (40%)	3 (60%)
Data splitting: Outline the RS that will serve as the baseline for comparison for the study (e.g., existing models from the literature or regression model using the same features)	43 (74%)	4 (80%)
RS: Outline the RS that will serve as the baseline for comparison for the study (e.g., existing models from the literature or regression model using the same features)	20 (34%)	3 (60%)
Model selection: Describe the ML model(s) and version(s) used	51 (88%)	4 (80%)
Hyperparameter tuning: Specify all model hyperparameters that were optimized, search space for hyperparameter tuning, and evaluation metric(s) used to optimize parameters	20 (34%)	3 (60%)
Model evaluation: List the evaluation metrics used to assess performance and clinical utility, including the justification for selection	53 (91%)	4 (80%)
Cohort characteristics: Provide the sample size and summary statistics of the training, validation (if used), and testing cohorts, including incidence of the label of interest	37 (64%)	2 (40%)
Model specification: Present the final ML model and specify the final panel of features included and hyperparameters tuned	25 (43%)	3 (60%)
Model evaluation: Compare evaluation metrics for the ML model(s) and RS	20 (34%)	3 (60%)
Bias assessment: Compare evaluation metrics for the ML model(s) and RS when stratified by relevant factors such as age group, gender, ethnicity, or socioeconomic status, to identify subgroups that benefit, are not helped at all, or harmed by the models	3 (5%)	0 (0%)
Limitations: Discuss the limitations of the ML model(s), with consideration of the data, features, model(s), and/or biases	39 (67%)	2 (40%)
Critical analysis: Describe the main findings of the study, including the following: New predictors of the label of interest identified using ML, Strengths of the ML model(s) compared to the current models in the urologic literature, Why the ML model(s) performed better/worse than what is currently available	37 (64%)	3 (60%)
Clinical utility: Describe how the ML model(s) can be applied to urologic practice, with respect to the potential to improve patient care, clinical decision-making, and/or efficiency	55 (95%)	5 (100%)
Disclosures: Disclose all financial relationships, sources of funding, and potential conflicts of interest	45 (78%)	4 (80%)

DL = deep learning; ML = machine learning; RS = reference standard; STREAM-URO = Standardized Reporting of Machine Learning Applications in Urology.

Discussion

Within the field of endourology (specifically urolithiasis and BPH), ML is applied to help with the prediction of many outcomes. The studies retrieved in this review demonstrated excellent evaluation metrics that often outperformed clinicians, traditional statistical analyses, or other validated nomograms.¹⁸⁻²⁰,²⁹,³⁰ In general, most of these studies were limited by their single-institution data pool and retrospective study design. In addition, only one of the included studies was validated on an external dataset.²⁵ External validation on a diverse dataset is important to ensure that a newly developed model is free of bias that may be inherent in the dataset used to develop the model.

After having designed a NN, Parekattil and colleagues validated their model on an external dataset from six different institutions. The testing performed on the initial design institution dataset was found to have a prediction accuracy of 86% for stone passage and 87% for stone passage duration.⁷² In comparison, the external validation of the algorithm led to an accuracy of 88% and 80% in predicting stone passage and stone passage duration, respectively.

This review highlighted the different ways that ML can be applied to help with the classification, diagnosis, recurrence risk prediction, and treatment of kidney stones. Overall, for the classification of kidney stones, the ML algorithms developed used the stones' inherent properties (texture, morphology, infrared spectra, and microwave dielectric properties) to build accurate systems that have the potential to be faster and more affordable than traditional stone analysis.³⁵,⁴⁴⁻⁴⁸ With regard to the diagnosis of kidney stones, this review showed that ML can be applied to different imaging modalities and effectively help diagnose kidney stones.

While ML was effective in diagnosing kidney stones, studies reported that individual stone measurements were underestimated in comparison to manual assessments.⁵²,⁵⁶ This is a factor that should be considered when building new ML algorithms. In this review, there were also studies that helped in predicting the development and risk of recurrence of stones. Key variables used for these predictions included hypertension, older age, calcium oxalate supersaturation, log-transformed protein percentage, a history of stones, the presence of nephrocalcinosis, and urine culture results. These algorithms have the potential to reduce the number of unnecessary radiographic tests for kidney stones in the acute care setting. Only three studies examined the application of ML for the treatment of kidney stones, two of which were published in the last year.⁶⁴⁻⁶⁶ These studies highlighted the potential that ML applications can have in providing personalized treatment plans.⁶⁴⁻⁶⁶

The application of ML for the management of BPH is also a novel and understudied field. In this review, only five studies were retrieved, of which two aimed to help with outcome prediction of BPH, two targeted the diagnosis of BPH, and one helped with the treatment of BPH.⁶⁷⁻⁷¹ Neither of the two studies aiming to aid with the diagnosis of BPH used ultrasound, which is the only imaging modality recommended in certain society guidelines for patients undergoing surgical therapy for BPH.⁷³

The STREAM-URO framework aims to help urologists develop a better understanding of how to appropriately conduct standardized ML studies. In this review, on average, roughly half of the STREAM-URO criteria were not met by the kidney stone and BPH articles. Only three of the studies included within this review were published after the release of the STREAM-URO criteria, so few studies would have been able to consult the STREAM-URO framework.¹³,³⁰,³³ This finding emphasizes the importance of authors adhering to the recently developed STREAM-URO framework to promote the development of high-quality ML studies. In this review, almost all articles omitted a bias assessment. Only three articles evaluated and compared the algorithm's metrics when stratified by factors such as age, gender, ethnicity, or socioeconomic status. This is especially important as ML algorithms may lack generalizability across diverse populations.⁵

For instance, studies have shown that the performance of ML algorithms may vary according to race.⁷⁴ Therefore, it is especially important to evaluate ML algorithms in race subgroups to prevent the creation of disparities in care when implementing ML tools. Another frequently missed item was the reference standard. While demonstrating the feasibility of ML models has its purpose, it is important for investigators to compare these models with a reference standard to evaluate whether ML models are truly superior to the current standard of care.⁸ Reference standards can take the form of existing models, nomograms, or traditional regression models that use similar features. Ultimately, these head-to-head comparisons can allow investigators to advance the field of ML.
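The bias assessment and reference standard comparison discussed above can be illustrated with a small stratified calculation. The following is a minimal Python sketch, assuming a hypothetical dataset in which each record carries a subgroup, a true label, an ML prediction, and a reference standard prediction. All field names, group names, and values are invented for illustration and do not come from any included study.

```python
from collections import defaultdict

def sensitivity(pairs):
    """pairs: list of (true_label, predicted_label), with 1 = disease present."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

def stratified_sensitivity(records, pred_key):
    """Sensitivity of the predictions stored under pred_key, per subgroup."""
    by_group = defaultdict(list)
    for r in records:
        by_group[r["group"]].append((r["label"], r[pred_key]))
    return {g: sensitivity(pairs) for g, pairs in by_group.items()}

def rec(group, label, ml, rs):
    """Build one hypothetical patient record."""
    return {"group": group, "label": label, "ml": ml, "rs": rs}

# Hypothetical cohort: two subgroups, four positives and two negatives each.
records = (
    [rec("A", 1, 1, 1)] * 3 + [rec("A", 1, 1, 0)] + [rec("A", 0, 0, 0)] * 2
    + [rec("B", 1, 1, 1)] * 2 + [rec("B", 1, 0, 1)] + [rec("B", 1, 0, 0)]
    + [rec("B", 0, 0, 0)] * 2
)

ml_by_group = stratified_sensitivity(records, "ml")
rs_by_group = stratified_sensitivity(records, "rs")
print("ML model:", ml_by_group)
print("Reference standard:", rs_by_group)
```

In this toy cohort both approaches detect 6 of the 8 positive cases overall, yet the ML model's sensitivity is 1.0 in subgroup A and 0.5 in subgroup B, whereas the reference standard is 0.75 in both. Pooled metrics alone would not reveal this disparity.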

Finally, other poorly represented items related to the technical aspects of ML. Most studies did not describe the methods used to develop the final dataset and did not present the final ML model with the list of features and hyperparameters used. However, as urologists become accustomed to the STREAM-URO framework, they may develop a better understanding of how to appropriately conduct standardized ML studies and address the limitations identified in this study.

Limitations

This review helped identify common limitations in the literature discussing the applications of ML for both BPH and urolithiasis. First, only one study was validated on an external dataset. Therefore, one should be cautious when interpreting the data presented in these studies, as the lack of external validation hinders the generalizability of their results. External validation of ML-based models is limited within health care, as it is difficult to ensure uniform data collection because electronic medical records and physician documentation vary across institutions.⁷⁵

In addition, within urology, there is poor insight into ML “black-box” models in comparison to statistical approaches. Therefore, this lack of insight has the potential to perpetuate biases if the ML models are left unchecked. Finally, the lack of standardized study design and reporting of results found within the retrieved studies prevented quantitative analyses from being carried out in this review. Fortunately, a framework designed specifically for ML studies within urology was recently published.⁸ This framework provides guidelines for investigators within the urologic community to help promote the development of high-quality ML studies and address the limitations revealed in this review.

Conclusions

This systematic review highlighted the important role that ML can have within the field of endourology. Studies retrieved within this review effectively helped with outcome prediction, disease classification, diagnostics, and therapeutics for both urolithiasis and BPH. While ML shows great promise in improving patient care within the field of endourology, it is important for investigators to adhere to the recently developed STREAM-URO framework designed to promote the development of high-quality ML studies.

Footnotes

Acknowledgments

We thank Lucy Kiester for helping with the development of this project's search strategy. We acknowledge that the abstract related to this project was presented at the 2022 Northeastern Section of American Urological Association annual meeting and published in the Canadian Urological Association Journal (doi: https://doi.org/10.5489/cuaj.8071).

Authors' Contributions

Study concept and design: D.B., X.H.L., W.X.L., D.-D.N., and N.B. Acquisition of data: D.B., X.H.L., W.X.L., A.A., C.D., A.G., D.-D.N., and J.C.C.K. Analysis and interpretation of data: D.B., X.H.L., W.X.L., A.A., C.D., A.G., D.-D.N., and J.C.C.K. Drafting of the article: D.B. Critical revision of the article for important intellectual content: D.B., X.H.L., W.X.L., A.A., C.D., A.G., D.-D.N., J.C.C.K., B.C., D.S.E., K.C.Z., Q.-D.T., and N.B. Obtaining funding: None. Supervision: B.C., D.S.E., K.C.Z., Q.-D.T., and N.B. Other: None.

Author Disclosure Statement

All other authors report no relevant conflicts of interest.

Funding Information

No funding was received for this article.

Supplementary Material

Supplementary Data S1

References

Obermeyer

, Emanuel

. Predicting the future—Big data, machine learning, and clinical medicine. N Engl J Med, 2016; 375:1216.

Hashimoto

, Rosman

, Rus

, et al. Artificial intelligence in surgery: Promises and perils. Ann Surg, 2018; 268:70.

Bertolo

, Hung

, Porpiglia

, et al. Systematic review of augmented reality in urological interventions: The evidences of an impact on surgical outcomes are yet to come. World J Urol, 2020; 38:2167–2176.

Lucas

, Liem

, Savci-Heijink

, et al. Toward automated in vivo bladder tumor stratification using confocal laser endomicroscopy. J Endourol, 2019; 33:930–937.

Checcucci

, De Cillis

, Granato

, et al. Applications of neural networks in urology: A systematic review. Curr Opin Urol, 2020; 30:788–807.

Shah

, Naik

, Somani

, et al. Artificial intelligence (AI) in urology—Current use and future directions: An iTRUE study. Turk J Urol, 2020; 46:S27–S39.

Hameed

, Shah

, Naik

, et al. The ascent of artificial intelligence in endourology: A systematic review over the last 2 decades. Curr Urol Rep, 2021; 22:1–18.

Kwong

, McLoughlin

, Haider

, et al. Standardized reporting of machine learning applications in urology: The STREAM-URO framework. Eur Urol Focus, 2021; 7:672–682.

Nguyen

, Luo

, Lu

, et al. Estimating the health-related quality of life of kidney stone patients: Initial results from the Wisconsin Stone Quality of Life Machine-Learning Algorithm (WISQOL-MLA). BJU Int, 2021; 128:88–94.

10.

Choo

, Uhmn

, Kim

, et al. A prediction model using machine learning algorithm for assessing stone-free status after single session shock wave lithotripsy to treat ureteral stones. J Urol, 2018; 200:1371–1377.

11.

Mannil

, von Spiczak

, Hermanns

, et al. Prediction of successful shock wave lithotripsy with CT: A phantom study using texture analysis. Abdom Radiol (NY), 2018; 43:1432–1438.

12.

Gomha

, Sheir

, Showky

, et al. Can we improve the prediction of stone-free status after extracorporeal shock wave lithotripsy for ureteral stones? A neural network or a statistical model?. J Urol, 2004; 172:175–179.

13.

, Zhou

, Jia

, et al. Prediction of proximal ureteral stones clearance after shock wave lithotripsy using an artificial neural network. Urol J, 2021; 24:491–496.

14.

Seckiner

, Seckiner

, Sen

, et al. A neural network-based algorithm for predicting stone—Free status after ESWL therapy. Int Braz J Urol, 2017; 43:1110–1114.

15.

Yang

, Hyon

, Na

, et al. Machine learning prediction of stone-free success in patients with urinary stone after treatment of shock wave lithotripsy. BMC Urol, 2020; 20:88.

16.

Poulakis

, Dahm

, Witzsch

, et al. Prediction of lower pole stone clearance after shock wave lithotripsy using an artificial neural network. J Urol, 2003; 169:1250–1256.

17.

Mannil

, von Spiczak

, Hermanns

, et al. Three-dimensional texture analysis with machine learning provides incremental predictive information for successful shock wave lithotripsy in patients with kidney stones. J Urol, 2018; 200:829–836.

18.

Aminsharifi

, Irani

, Pooyesh

, et al. Artificial neural network system to predict the postoperative outcome of percutaneous nephrolithotomy. J Endourol, 2017; 31:461–467.

19.

Aminsharifi

, Irani

, Tayebi

, et al. Predicting the postoperative outcome of percutaneous nephrolithotomy with machine learning system: Software validation and comparative analysis with Guy's Stone score and the CROES nomogram. J Endourol, 2020; 34:692–699.

20.

Kadlec

, Ohlander

, Hotaling

, et al. Nonlinear logistic regression model for outcomes after endourologic procedures: A novel predictor. Urolithiasis, 2014; 42:323–327.

21.

Liu

, Wang

, Tang

, et al. Machine learning-assisted decision-support models to better predict patients with calculous pyonephrosis. Transl Androl Urol, 2021; 10:710–723.

22.

Solakhan

, Seckiner

. A neural network-based algorithm for predicting the spontaneous passage of ureteral stones. Urolithiasis, 2020; 48:527–532.

23.

Cummings

, Boullier

, Izenberg

, et al. Prediction of spontaneous ureteral calculous passage by an artificial neural network. J Urol, 2000; 164:326–328.

24.

Dal Moro

, Abate

, Lanckriet

, et al. A novel approach for accurate prediction of spontaneous passage of ureteral stones: Support vector machines. Kidney Int, 2006; 69:157–160.

25.

Parekattil

, Kumar

, Hegarty

, et al. External validation of outcome prediction model for ureteral/renal calculi. J Urol, 2006; 175:575–579.

26.

Goyal

, Kumar

, Trivedi

, et al. A comparative study of artificial neural network and multivariate regression analysis to analyze optimum renal stone fragmentation by extracorporeal shock wave lithotripsy. Saudi J Kidney Dis Transpl, 2010; 21:1073–1080.

27.

Moorthy

, Krishnan

. Prediction of fragmentation of kidney stones: A statistical approach from NCCT images. Can Urol Assoc J, 2016; 10:E237–E240.

28.

Hamid

, Dwivedi

, Singh

, et al. Artificial neural networks in predicting optimum renal stone fragmentation by extracorporeal shock wave lithotripsy: A preliminary study. BJU Int, 2003; 91:821–824.

29.

Shabaniyan

, Parsaei

, Aminsharifi

, et al. An artificial intelligence-based clinical decision support system for large kidney stone treatment. Australas Phys Eng Sci Med, 2019; 42:771–779.

30.

Hameed

BMZ

, Shah

, Naik

, et al. Application of artificial intelligence-based classifiers to predict the outcome measures and stone-free status following percutaneous nephrolithotomy for staghorn calculi: Cross-validation of data and estimation of accuracy. J Endourol, 2021; 20:20.

31.

Kazemi

, Mirroshandel

. A novel method for predicting kidney stone type using ensemble learning. Artif Intell Med, 2018; 84:117–126.

32.

Dussol

, Verdier

, Le Goff

, et al. Artificial neural networks for assessing the risk of urinary calcium stone among men. Urol Res, 2006; 34:17–25.

33.

Zheng

, Yu

, Batur

, et al. A multicenter study to develop a non-invasive radiomic model to identify urinary infection stone in vivo using machine-learning. Kidney Int, 2021; 12:12.

34.

Zhang

, Sun

, Shi

, et al. Uric acid versus non-uric acid urinary stones: Differentiation with single energy CT texture analysis. Clin Radiol, 2018; 73:792–799.

35.

Estrade

, Daudon

, Richard

, et al. Towards automatic recognition of pure & mixed stones using intraoperative endoscopic digital images. BJU Int, 2021; 16:234–242.

36.

Kuzmanovski

, Trpkovska

, Soptrajanov

, et al. Determination of the composition of human urinary calculi composed of whewellite, weddellite and carbonate apatite using artificial neural networks. Anal Chim Acta, 2003; 491:211–218.

37.

Kuzmanovski

, Zografski

, Trpkovska

, et al. Simultaneous determination of composition of human urinary calculi by use of artificial neural networks. Fresenius J Anal Chem, 2001; 370:919–923.

38.

Black

, Law

, Aldoukhi

, et al. Deep learning computer vision algorithm for detecting kidney stone composition. BJU Int, 2020; 125:920–924.

39.

Fitri

, Haryanto

, Arimura

, et al. Automated classification of urinary stones based on microcomputed tomography images using convolutional neural network. Phys Med, 2020; 78:201–208.

40.

Grose Hokamp

, Lennartz

, Salem

, et al. Dose independent characterization of renal stones by means of dual energy computed tomography and machine learning: An ex-vivo study. Eur Radiol, 2020; 30:1397–1404.

41.

Kriegshauser

, Paden

, He

, et al. Rapid kV-switching single-source dual-energy CT ex vivo renal calculi characterization using a multiparametric approach: Refining parameters on an expanded dataset. Abdom Radiol (NY), 2018; 43:1439–1445.

42.

Kahani

, Hariri Tabrizi

, Kamali-Asl

, et al. A novel approach to classify urinary stones using dual-energy kidney, ureter and bladder (DEKUB) X-ray imaging. Appl Radiat Isot, 2020; 164:109267.

43.

Bejan

, Lee

, Xu

, et al. Performance of a natural language processing method to extract stone composition from the electronic health record. Urology, 2019; 132:56–62.

44.

Cui

, Zhao

, Zhang

, et al. Analysis and classification of kidney stones based on Raman spectroscopy. Biomed Opt Express, 2018; 9:4175–4183.

45.

Blanco

, Lopez-Mesas

, Serranti

, et al. Hyperspectral imaging based method for fast characterization of kidney stone types. J Biomed Opt, 2012; 17:076027.

46.

Sacli

, Aydinalp

, Cansiz

, et al. Microwave dielectric property based classification of renal calculi: Application of a kNN algorithm. Comput Biol Med, 2019; 112:103366.

47.

Volmer

, de Vries

, Goldschmidt

. Infrared analysis of urinary calculi by a single reflection accessory and a neural network interpretation algorithm. Clin Chem, 2001; 47:1287–1296.

48.

Volmer

, Wolthers

, Metting

, et al. Artificial neural network predictions of urinary calculus compositions analyzed with infrared spectroscopy. Clin Chem, 1994; 40:1692–1697.

49.

Selvarani

, Rajendran

. Detection of renal calculi in ultrasound image using meta-heuristic support vector machine. J Med Syst, 2019; 43:300.

50. Divya Krishna, Akkala, Bharath, et al. Computer aided abnormality detection for kidney on FPGA based IoT enabled portable ultrasound imaging system. IRBM, 2016; 37:189–197.

51. Langkvist, Jendeberg, Thunberg, et al. Computer aided detection of ureteral stones in thin slice computed tomography volumes using Convolutional Neural Networks. Comput Biol Med, 2018; 97:153–160.

52. Cui, Sun, Ma, et al. Automatic detection and scoring of kidney stones on noncontrast CT images using S.T.O.N.E. nephrolithometry: Combined deep learning and thresholding methods. Mol Imaging Biol, 2021; 23:436–445.

53. Fernandez, Korinek, Camp, et al. Automatic detection of calcium phosphate deposit plugs at the terminal ends of kidney tubules. Healthc Technol Lett, 2019; 6:271–274.

54. Lee, Kim, Hwang, et al. Differentiation of urinary stone and vascular calcifications on non-contrast CT images: An initial experience using computer aided diagnosis. J Digit Imaging, 2010; 23:268–276.

55. De Perrot, Hofmeister, Burgermeister, et al. Differentiating kidney stones from phleboliths in unenhanced low-dose computed tomography using radiomics and machine learning. Eur Radiol, 2019; 29:4776–4782.

56. Jendeberg, Thunberg, Liden. Differentiation of distal ureteral stones and pelvic phleboliths using a convolutional neural network. Urolithiasis, 2021; 49:41–49.

57. Chen, Prosperi, Bird, et al. Analysis of factors associated with large kidney stones: Stone composition, comorbid conditions, and 24-H urine parameters—A machine learning-aided approach. SN Compr Clin Med, 2019; 1:597–602.

58. Eken, Bilge, Kartal, et al. Artificial neural network, genetic algorithm, and logistic regression applications for predicting renal colic in emergency settings. Int J Emerg Med, 2009; 2:99–105.

59. Chen, Bird, Ruchi, et al. Development of a personalized diagnostic model for kidney stone disease tailored to acute care by integrating large clinical, demographics and laboratory data: The diagnostic acute care algorithm - kidney stones (DACA-KS). BMC Med Inform Decis Mak, 2018; 18:72.

60. Jungmann, Kampgen, Mildenberger, et al. Towards data-driven medical imaging using natural language processing in patients with suspected urolithiasis. Int J Med Inform, 2020; 137:104106.

61. Chiang, Chiang, Chen, et al. Prediction of stone disease by discriminant analysis and artificial neural networks in genetic polymorphisms: A new method. BJU Int, 2003; 91:661–666.

62. Tanthanuch, Tanthanuch. Prediction of upper urinary tract calculi using an artificial neural network. J Med Assoc Thai, 2004; 87:515–518.

63. Caudarella, Tonello, Rizzoli, et al. Predicting five-year recurrence rates of kidney stones: An artificial neural network model. Arch Ital Urol Androl, 2011; 83:14–19.

64. Chen, Zeng, Seltzer RGN, et al. Automated generation of personalized shock wave lithotripsy protocols: Treatment planning using deep learning. JMIR Med Inform, 2021; 9:e24721.

65. , Liu, Zhang, et al. Discrimination analysis of B-mode ultrasonography and X-ray on the percutaneous nephrolithotomy localization of urinary stones: A prospective, controlled study. Int J Emerg Med, 2016; 9:2261–2268.

66. Muller, Abildsnes, Ostvik, et al. Can a dinosaur think? Implementation of artificial intelligence in extracorporeal shock wave lithotripsy. Eur Urol Open Sci, 2021; 27:33–42.

67. Khalid, Syed, Shah SSH. Machine learning approaches for the histopathological diagnosis of prostatic hyperplasia. Ann Clin Anal Med, 2020; 11:425–428.

68. Habes, Bahr, Schiller, et al. New technique for prostate volume assessment. World J Urol, 2014; 32:1559–1564.

69. Shatalova, Filist, Korenevskiy, et al. Application of fuzzy neural network model and current-voltage analysis of biologically active points for prediction post-surgery risks. Comput Methods Biomech Biomed Engin, 2021; 24:1504–1516.

70. Djavan, Fong, Harik, et al. Longitudinal study of men with mild symptoms of bladder outlet obstruction treated with watchful waiting for four years. Urology, 2004; 64:1144–1148.

71. Sethi, Sha, Kumar, et al. Computer vision detects subtle histological effects of dutasteride on benign prostate. BJU Int, 2018; 122:143–151.

72. Parekattil, White, Moran, et al. A computer model to predict the outcome and duration of ureteral or renal calculous passage. J Urol, 2004; 171:1436–1439.

73. Nickel, Aaron, Barkin, et al. Canadian Urological Association guideline on male lower urinary tract symptoms/benign prostatic hyperplasia (MLUTS/BPH): 2018 Update. Can Urol Assoc J, 2018; 12:303.

74. Nayan, Salari, Bozzo, et al. Predicting survival after radical prostatectomy: Variation of machine learning performance by race. Prostate, 2021; 81:1355–1364.

75. Chen, Remulla, Nguyen, et al. Current status of artificial intelligence applications in urology and their potential to influence clinical practice. BJU Int, 2019; 124:567–577.
