The Evaluation of Computer Vision-Based Automated Performance Metrics for Endoscopic Kidney Stone Surgery

Abstract

Background:

The assessment of surgical competency is essential for clinical training and patient safety. No objective, real-time tools exist to evaluate competency during endoscopic stone surgery. We sought to apply endoscopic computer vision models to define automated performance metrics (APMs) from videos of flexible ureteroscopy.

Materials and Methods:

We assessed three APMs for endoscopic treatment of kidney stones, including percentage of frames without stone visibility, screen occupancy by stone, and frame-to-frame change in stone occupancy. Surgical videos of a surgeon performing either stone localization or stone ablation were recorded. Using our previously validated computer vision model for endoscopic stone segmentation, APMs were compared between experts (fellowship-trained endourologists) and trainees.

Results:

Forty-six videos, including 28 of stone localization and 18 of stone laser ablation, were analyzed from nine surgeons (three experts and six trainees). During stone localization, trainee videos had a higher percentage of frames without visible stone (27% vs 4%, p < 0.01) and lower screen occupancy by stone (5% vs 14%, p = 0.03) compared with expert videos. During laser ablation, trainee videos had a higher frame-to-frame change in stone occupancy (3% vs 2%, p < 0.01) compared with expert videos.

Conclusions:

APMs from computer vision methods differ between expert and trainee surgical videos of endoscopic kidney stone treatment. These metrics could be used to objectively assess surgical skill and track skill acquisition.

Introduction

Ureteroscopy has been utilized since the 1980s and remains a central tool in the armamentarium for management of urinary calculi, as well as other benign and malignant urologic conditions.^1,2 Evaluating and facilitating surgical competency of flexible ureteroscopy (fURS) is critical during urologic surgical training because of the high prevalence of stone disease. However, efforts to improve skill evaluation and acquisition in the operating room are inherently limited by time constraints and patient safety concerns. In addition, many of the current methods of surgical evaluation are subjective.

Feedback to trainees usually takes the form of verbal communication from the attending surgeon, either in real time or after the case, and is inherently subjective. This is particularly true for endoscopic kidney stone surgery, where there are few ways to evaluate competency and provide feedback objectively. Widely used flexible endoscopes provide no kinematic data (unlike robotic platforms), and manual review of endoscopic video recordings has not been able to distinguish endoscopic surgical expertise.³ In contrast, several automated performance metrics (APMs) have been identified in robotic surgery and provide an objective method to assess surgical performance.^4,5 Validated, objective, real-time tools to evaluate surgical expertise and provide feedback to trainees during endoscopic stone surgery are needed both to foster patient safety and to improve surgical outcomes.

We have previously demonstrated that computer vision models (i.e., an application of machine learning that automatically analyzes image data) can accurately and automatically segment stones in endoscopic surgical videos.⁶ These models not only track kidney stones in real time during surgery but also provide an objective way to evaluate surgical skill from videos of kidney stone treatment. In this study, we sought to define APMs that may help distinguish surgical expertise. Specifically, we performed a pixel-based analysis of stone segmentation and assessed performance using novel APMs for both expert and trainee videos during endoscopic kidney stone surgery.

Materials and Methods

Study design and video collection

After institutional review board approval, we performed a computer vision-based analysis of prospectively collected surgical videos of fURS for renal stones recorded between January 2022 and October 2022. We collected 50 surgical videos and identified the associated surgeons. Surgeons were categorized as “expert” (n = 3, fellowship-trained endourologist, case volume of >100 fURS per year) or “trainee” (n = 6, postgraduate year 3–4 resident, <100 fURS per year). We compared the expert and trainee videos for two specific tasks: (1) stone localization and (2) stone laser ablation (Fig. 1). For each included case, the expert and the trainee were randomized to perform either the localization or the ablation task (i.e., one task was performed by the expert endourologist and one by the trainee; Fig. 1).

For the study, stone localization was standardized to include systematic evaluation of the entire collecting system from upper to lower pole along with observation of any stones in the collecting system. If a stone was seen early in the process, the entire collecting system still had to be inspected before the task was considered complete. For stone ablation, we analyzed the first 20 seconds after laser activation. Stone ablation was performed with dusting settings (starting at 0.3 J and 50 Hz) using a 200-μm holmium laser fiber with pulse modulation (MOSES®). Before beginning stone ablation, the surgeon was permitted to move the stone into an easily accessible calix. Continuous pressurized irrigation was delivered using either a pressure bag or a fluid management system. When a trainee was assigned to either task, the expert was permitted to take control of the scope and confirm complete collecting system evaluation (after the stone localization task) or complete stone treatment (after the stone ablation task). All surgeons were aware that the videos were being recorded but were not aware of the purpose of this study.

We also reviewed postoperative imaging associated with each video for stone-free status. We defined stone free as having no residual stone fragments seen on postoperative renal ultrasound. To limit cost and radiation exposure, renal ultrasound is obtained routinely at our institution around 6 weeks postoperatively to evaluate for residual stone fragments and hydronephrosis.

FIG. 1.

Experimental protocol. Expert and trainee surgeons were assigned to stone localization or ablation tasks. Computer vision models were applied to each video to determine and compare pixel-based, automated performance metrics.

Karl Storz Flex XC digital ureteroscopes (1920 × 1080 pixels) were used for all cases, and all surgeons were right-handed. A 10/12 Fr access sheath was routinely placed into the ipsilateral ureter at the beginning of each case. Patients with anatomical abnormalities of their collecting systems were excluded from analysis. All videos were visually reviewed for quality, and frames were extracted at 30 frames per second (fps).

Data extraction and analyses

Data on stone size, location, density, and volume were manually extracted from preoperative CT imaging by a single surgeon. We applied our previously validated computer vision model for automated kidney stone segmentation and performed a pixel-based analysis of each surgical video.^6,7 This model is trained and optimized for kidney stone segmentation during endoscopic kidney stone surgery and has been deployed in real time in the operating room. For each task, we compared trainee and expert videos on three APMs. As there are no previously described APMs for endoscopic urologic surgery using video input, we assessed three metrics focused on the efficiency and stability of stone visualization during tasks:

1. Loss of Stone Visibility: We reported the percentage of frames without automatically identified stone (i.e., without visible stone).

2. Screen Occupancy by Stone: We computed the percentage of pixels identified as stone per frame. We reported the median percentage over the entire dataset (i.e., median screen occupancy by stone).

3. Frame-to-Frame Change in Stone Occupancy: We computed the percent change in screen occupancy by stone between consecutive frames for each task.
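The three metrics defined above can be sketched in a few lines of Python. This is a minimal illustration and not the study's implementation: it assumes each frame's segmentation is available as a binary mask (rows of 0/1 values), it treats frame-to-frame change as the absolute difference in occupancy (in percentage points) between consecutive frames, and the names `stone_occupancy` and `compute_apms` are hypothetical. The optional `stone_only_changes` flag restricts the change metric to consecutive frames that both contain stone, which is one possible reading of how the localization task was handled.

```python
from statistics import median


def stone_occupancy(mask):
    """Percentage of pixels labelled as stone in one binary mask (rows of 0/1)."""
    total_pixels = sum(len(row) for row in mask)
    stone_pixels = sum(sum(row) for row in mask)
    return 100.0 * stone_pixels / total_pixels


def compute_apms(masks, stone_only_changes=False):
    """Compute the three APMs from a sequence of per-frame binary masks."""
    occupancy = [stone_occupancy(m) for m in masks]

    # 1. Loss of stone visibility: share of frames with no stone pixels.
    loss_pct = 100.0 * sum(1 for o in occupancy if o == 0) / len(occupancy)

    # 3. Frame-to-frame change: absolute difference between consecutive frames
    #    (assumption: percentage points, not relative change).
    pairs = list(zip(occupancy, occupancy[1:]))
    if stone_only_changes:
        pairs = [(a, b) for a, b in pairs if a > 0 and b > 0]
    changes = [abs(b - a) for a, b in pairs]

    return {
        "loss_of_visibility_pct": loss_pct,
        # 2. Screen occupancy by stone: median over all frames.
        "median_occupancy_pct": median(occupancy),
        "median_frame_change_pct": median(changes) if changes else None,
    }


# Toy example with four 2 x 2 masks (hypothetical data).
masks = [
    [[0, 0], [0, 0]],  # no stone visible
    [[1, 0], [0, 0]],  # 25% occupancy
    [[1, 1], [0, 0]],  # 50% occupancy
    [[1, 1], [1, 0]],  # 75% occupancy
]
apms = compute_apms(masks)
```

In practice the masks would come from the segmentation model at the native 1920 × 1080 resolution; the toy masks here only keep the arithmetic easy to follow.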

Statistical analyses

Stone characteristics, loss of stone visibility, and screen occupancy by stone were compared between experts and trainees using Fisher’s exact and chi-square tests for categorical variables and Wilcoxon rank sum tests for continuous variables. Only segments of video that included stone were used to evaluate the frame-to-frame change in stone occupancy during the stone localization task. All statistical analyses were performed using R version 3.4.3 (R Foundation for Statistical Computing, Vienna, Austria).
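For readers unfamiliar with the Wilcoxon rank sum test, the sketch below shows its mechanics in Python using a normal approximation with a tie correction. It is illustrative only: the analyses in this study were run in R, whose `wilcox.test` may use an exact method or a continuity correction, so p-values can differ slightly. The function names and the per-video values are hypothetical and are not study data.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank sum test (normal approximation, no continuity correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for sample x

    # Tie correction to the variance of U.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical
        return u1, 1.0

    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical per-video loss-of-visibility percentages (not study data).
trainee = [30, 25, 40, 10, 35]
expert = [2, 5, 0, 8, 3]
u_stat, p_value = rank_sum_test(trainee, expert)
```

With small samples and no ties, an exact test is generally preferable to this approximation.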

Results

Video and stone characteristics

Fifty separate fURS videos of kidney stone treatment were recorded and reviewed. Four videos were excluded from analysis because of poor video quality, leaving 46 videos for analysis. Specifically, 28 videos (14 expert and 14 trainee videos) of the stone localization task and 18 videos (9 expert and 9 trainee videos) of the stone ablation task were analyzed. Stone sidedness, location, size, and density did not differ significantly between trainee and expert videos for either task (Table 1).

Table 1.

Characteristics of Stones Treated in the Surgical Videos Based on Surgeon Expertise (i.e., Trainee vs Expert)

	Stones treated in trainee videos	Stones treated in expert videos	Total	p-Value
Stone localization task	N = 14	N = 14	N = 28
Stone sidedness, N (%)				1.0
Left	13 (93)	12 (86)	25 (89)
Right	1 (7)	2 (14)	3 (11)
Stone location, N (%)				1.0
Renal pelvis	1 (7)	1 (7)	2 (7)
Upper pole	3 (21)	4 (29)	7 (25)
Interpolar	1 (7)	1 (7)	2 (7)
Lower pole	9 (64)	8 (57)	17 (61)
Median HU density (IQR)	882.5 (737–998)	939 (703–1175)	885 (700–1125)	0.67
Stone volume, mm³ (IQR)	94.2 (75–231)	105.6 (36–291)	103	1.0
Stone free, N (%)	7 (50)	7 (50)	14 (50)	1.0
Stone ablation task	N = 9	N = 9	N = 18
Stone sidedness, N (%)				1.0
Left	6 (67)	7 (78)	13 (72)
Right	3 (33)	2 (22)	5 (28)
Stone location, N (%)				0.66
Renal pelvis	0 (0)	2 (22)	2 (11)
Upper pole	3 (33)	4 (45)	7 (39)
Interpolar	1 (11)	1 (11)	2 (11)
Lower pole	5 (56)	2 (22)	7 (39)
Median HU density (IQR)	700 (635–885)	900 (700–1200)	797 (635–998)	0.40
Stone volume, mm³ (IQR)	158 (75–224)	105 (70–253)	131.8 (75–246)	0.89
Stone free, N (%)	3 (33)	4 (44)	7 (39)	0.63

N reflects number of videos analyzed for each task for trainees and experts, respectively.

HU = Hounsfield units.
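As an illustrative check of the categorical comparisons in Table 1, the sketch below recomputes two of the tabulated p-values in Python with the standard library only. The study used R, and the text does not state which test was applied to which row, so the pairing of test and row here is an assumption; the function names are hypothetical. Counts are taken directly from Table 1.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table, 1 df."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Localization task, stone sidedness: trainee 13 left / 1 right, expert 12 / 2.
p_sidedness = fisher_exact_two_sided(13, 1, 12, 2)  # about 1.0

# Ablation task, stone free: trainee 3 yes / 6 no, expert 4 yes / 5 no.
chi2_sf, p_stone_free = chi_square_2x2(3, 6, 4, 5)  # about 0.63
```

Both values agree with the corresponding entries in Table 1 (1.0 and 0.63).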

Computer vision analysis of stone localization and ablation tasks

On average, the stone localization task took 72 ± 31 seconds (± standard deviation [SD]) for trainees and 58 ± 17 seconds for experts (p < 0.01). For this task, 2163 frames were analyzed from trainee surgeon videos and compared with 1731 frames from expert surgeon videos (Table 2). Loss of stone visibility was greater in trainee videos compared with expert videos (27% vs 4%, p < 0.01). In addition, trainee videos had a significantly lower screen occupancy by stone compared with expert videos during the localization task (5% vs 14%, p = 0.03). Frame-to-frame change in stone occupancy did not differ between trainee and expert videos (median 1% in both groups, p = 0.7).

Table 2.

Frame-Based Comparison of Trainee and Expert Videos During the Stone Localization and Stone Ablation Tasks Using Stone Segmentation Model

	Trainee	Expert	p-Value
Localization task (total number of frames)	N = 2163	N = 1731
Loss of stone visibility,^a N (%)	592 (27)	79 (4)	<0.01
Screen occupancy by stone,^b median percentage (IQR)	5 (0–11)	14 (5–28)	0.03
Frame-to-frame change in stone occupancy,^c median percentage (IQR)	1 (0–2)	1 (0–3)	0.7
Stone ablation task (total number of frames)	N = 5202	N = 5202
Loss of stone visibility, N (%)	428 (8)	359 (7)	0.99
Screen occupancy by stone, median percentage (IQR)	22 (8–34)	13 (6–22)	0.11
Frame-to-frame change in stone occupancy, median percentage (IQR)	3 (1–6)	2 (1–4)	<0.01

^a Loss of stone visibility is defined as the proportion of frames with no stone seen out of all frames per task.

^b Screen occupancy by stone is defined as the percentage of total pixels on screen occupied by stone for each frame.

^c Frame-to-frame change in stone occupancy is defined as the percent change in screen occupancy by stone between consecutive frames.

For the stone ablation task, 5202 frames were analyzed from both trainee and expert videos, corresponding to the first 20 seconds of stone ablation. There were no differences in loss of stone visibility or screen occupancy by stone between trainee and expert videos during stone ablation. However, during the stone ablation task, we observed greater frame-to-frame change in stone occupancy in trainee videos compared with expert videos (3% vs 2%, p < 0.01). A representative plot of screen occupancy by stone over time for two videos (one trainee and one expert) is shown in Figure 2. In addition, individual metrics for trainees and experts can be seen in Supplementary Table S3, with comparisons of loss of stone visibility and stone occupancy for individual surgeons depicted in Supplementary Figures S3 and S4.

FIG. 2.

Example of trainee (red) vs expert (blue) observation of kidney stone as measured by percentage of frame occupied by stone. Corresponding frames are shown on lower panel. (A) The raw ureteroscopy image. (B) The ureteroscopy image and a blue overlay of the stone segmentation as identified by our computer vision model. The plot illustrates instances of loss of stone visibility and differences in movement of the stone between frames (i.e., frame-to-frame change in stone occupancy) between the expert and trainee videos.
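A per-frame occupancy trace like the one plotted in Figure 2 can also be summarized as discrete episodes of lost stone visibility. The sketch below is a hypothetical post-processing step, not part of the reported analysis: it assumes the trace is a list of per-frame occupancy percentages and converts runs of zero-occupancy frames into start times and durations using the 30 fps extraction rate described in the Methods. The name `loss_episodes` and the example trace are illustrative.

```python
def loss_episodes(occupancy_pct, fps=30.0):
    """Return (start_seconds, duration_seconds) for each run of no-stone frames."""
    episodes = []
    start = None  # index of the first frame in the current no-stone run
    for i, occupancy in enumerate(occupancy_pct):
        if occupancy == 0 and start is None:
            start = i
        elif occupancy > 0 and start is not None:
            episodes.append((start / fps, (i - start) / fps))
            start = None
    if start is not None:  # trace ends while the stone is still out of view
        episodes.append((start / fps, (len(occupancy_pct) - start) / fps))
    return episodes


# Hypothetical trace: stone visible, lost for 3 frames, regained, lost again.
trace = [12, 10, 0, 0, 0, 9, 11, 0, 0]
episodes = loss_episodes(trace, fps=30.0)
```

Reporting the number and length of such episodes alongside the overall percentage would distinguish one long loss of view from many brief ones.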

Discussion

In this study, we demonstrate an application of computer vision-based analysis to evaluate APMs in both expert and trainee surgical videos depicting fURS for the treatment of renal stones. Each APM differed between expert and trainee videos in at least one task. An automated computer vision-based analysis of endoscopic stone treatment could be helpful as an objective feedback tool and as a way to chart changes in trainee efficiency over time. Given the time limitations imposed on academic surgeons and trainees alike, these automated objective feedback tools could help supplement traditional methods of feedback.

The ability to adequately navigate the collecting system and observe renal stones depends on hand–eye coordination, memory, and spatial reasoning. Surgical experience impacts these factors and likely underlies the differences seen in the evaluated APMs. For example, during stone localization, the trainee videos showed more frequent loss of stone visibility than expert videos (27% vs 4%), whereas no difference was seen during ablation. This is likely because of less efficient motion and scope control while surveying the collecting system. Moreover, during the stone localization task, analysis of the videos revealed that experts observed the stones more completely than trainees, with a median screen occupancy by stone of 14% and 5%, respectively. Less observation of stones by trainees compared with experts may reflect less scope control or greater uncertainty in understanding stone position within the surgical anatomy.

In addition, we assessed the frame-to-frame change in stone occupancy during both the localization and ablation tasks. The frame-to-frame change in stone occupancy reflects endoscopic motion arising from movement of either the ureteroscope or the stone. Although there was no difference in frame-to-frame change in stone occupancy during stone localization, we found a greater frame-to-frame change in stone occupancy in trainee compared with expert videos during stone ablation. This likely represents less hand–eye coordination during laser ablation of kidney stones for trainees. This is expected, as trainees may have more difficulty maintaining scope position when trying to treat a stone or may cause more stone motion during active lasering compared with experts. An example of frame-to-frame change in stone occupancy can be seen in Figure 2. Other factors such as respiratory movement, surgical anatomy, laser modality, and laser settings may impact stone motion during treatment. Future work is aimed at evaluating how these factors may impact surgical technique and efficiency of stone ablation for both trainees and experts. We chose to analyze only the first 20 seconds of stone ablation to allow for a direct comparison of dusting technique between groups and to avoid potential confounding variables such as alternate treatment strategies (e.g., dusting vs pop-dusting) toward the end of stone ablation. Although restricting the analysis to the first 20 seconds of ablation allows for a more direct comparison of dusting technique between trainees and experts, the results may not be as applicable to other techniques of stone ablation.

Efforts to improve training for endoscopic surgical techniques predominantly focus on ex vivo simulation to supplement the intraoperative training experience. Multiple endoscopic simulators are currently available, and simulator-based training is associated with subsequent improved surgical performance in patients.^8,9 However, ex vivo simulators are limited. They often lack surgical realism, are associated with additional costs, and are limited in applicability for skill acquisition in more advanced learners.^8,10 There is a need for training tools that objectively evaluate competency during endoscopic urologic procedures in real time. In this study, we describe an objective real-time tool for evaluating expertise during endoscopic procedures and define specific APMs that distinguish surgical expertise. This tool is software based and could easily be included in training programs as an adjunct to surgical video review. Furthermore, this computer vision model could provide a performance metric to aid in assessing completion of stone ablation during dusting.
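To make the idea of real-time use concrete, the following Python sketch keeps a sliding window of recent per-frame occupancy values and updates two of the APMs as each new frame arrives. It is a hypothetical design and not the deployed system: the class name `RollingAPM`, the 90-frame window (about 3 seconds at 30 fps), and the use of absolute percentage-point differences for frame-to-frame change are all assumptions made for illustration.

```python
from collections import deque
from statistics import median


class RollingAPM:
    """Sliding-window APMs updated one frame at a time (illustrative only)."""

    def __init__(self, window_frames=90):
        # deque with maxlen discards the oldest value automatically.
        self.occupancy = deque(maxlen=window_frames)

    def update(self, occupancy_pct):
        """Add the newest frame's stone occupancy and return windowed metrics."""
        self.occupancy.append(occupancy_pct)
        window = list(self.occupancy)
        changes = [abs(b - a) for a, b in zip(window, window[1:])]
        return {
            "loss_of_visibility_pct": 100.0 * sum(o == 0 for o in window) / len(window),
            "median_frame_change_pct": median(changes) if changes else 0.0,
        }


# Hypothetical stream of per-frame occupancy values with a 3-frame window.
tracker = RollingAPM(window_frames=3)
history = [tracker.update(value) for value in [0, 10, 20, 40]]
```

A short window responds quickly to a lost view but is noisy; a longer window is steadier but slower to reflect a change, so the window length would need to be tuned for any feedback application.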

The use of machine learning-based tools to evaluate surgical performance has not been described for endoscopic stone surgery. However, these technologies have been applied to laparoscopic–robotic procedures. For example, Hung and colleagues previously found that APMs based on kinematic surgical data could reliably distinguish between trainee and expert surgeons performing steps of a robotic prostatectomy.⁴ Several other studies have demonstrated the use of machine learning methods to identify objective measures, such as task-evoked pupillary responses and substitch handling during vesicourethral anastomosis, to distinguish robotic surgeon expertise during live procedures.^11,12 Application of our models from a training perspective may clarify the learning curve for endoscopic stone surgery and provide actionable steps for improvement for trainees. In addition, such models could eventually provide real-time intraoperative feedback for endoscopic surgeons. The concept of automated segmentation is broadly applicable to other operations and could be used for tasks such as identification of surgical anatomy and pathology or image-guided procedures.

Beyond education, there is an association between APMs and surgical outcomes in robotic surgery.^3,13 For instance, Lee and coworkers previously demonstrated that APMs captured during radical prostatectomy by the Intuitive Surgical System, such as camera path length or instrument idle time, could delineate risk of positive surgical margins.¹³ Our findings demonstrate that APMs can also be automatically identified during endoscopic kidney stone surgery.

We did find that experts more closely observed the stone during the stone localization task, and these APMs correlated with quicker time to completion of the task. Limited postoperative imaging was performed in this study, with most patients having a renal ultrasound done within 6 weeks. Although we observed no differences in overall stone-free rate between groups, this study was not specifically designed to assess stone-free rates, and this finding is limited by the low specificity of renal ultrasound. Further work is needed to assess whether the APMs identified in this study correlate with objective postoperative markers of quality such as stone-specific outcomes (e.g., stone-free rate on cross-sectional imaging), efficiency of stone ablation, or complications. Specifically, work is underway utilizing computer vision models to aid in real-time identification of sufficiently small dust that correlates with subsequent stone-free status.¹⁴

The findings of this study must be interpreted in the context of some limitations. This was a relatively small, single-institution study at a tertiary referral center and, therefore, may not be applicable to all clinical settings. Only three expert surgeons and six trainees were evaluated, and all were right-handed. This does not provide enough diversity to evaluate the broad variety in surgical skill within both groups on a larger scale or in different clinical settings. In addition, the accuracy of our computer vision models is limited, particularly during active stone treatment, as small stone debris generated during treatment may decrease accuracy of segmentation.⁶ We plan on including more surgical videos demonstrating debris to “robustify” future versions of our model. We also plan on incorporating models that remove visual impairments from images to mask debris during endoscopic surgery.^15,16

In this study, we were unable to control for the type of anesthesia used during each case, introducing a potential confounding variable, as differing degrees of respiratory motion could impact the difficulty of stone observation. Additional videos from a broader set of patients, clinical settings, surgeon experience, and scope types could improve future computer vision models. Future work will involve obtaining videos from multiple other institutions, including trainees of differing experience levels and different surgeon-performed tasks. It will be vital to obtain videos across the spectrum of resident experience to help maximize the impact on applications to surgical training. Compiling these videos will help increase the robustness of our metrics, as well as help associate them with surgical outcomes, including stone-free rate. Experimental design will need to account for the potential impact of each task on stone-free rate. Despite these limitations, we defined and evaluated APMs for endoscopic kidney stone surgery by evaluating differences between expert and trainee surgical videos. Future efforts will aim to improve and validate these APMs, as well as evaluate how they correlate with clinical outcomes.

Conclusions

Using an automated computer vision-based analysis, we evaluated APMs from surgical videos of endoscopic kidney stone surgery. Differences between expert and trainee videos were seen during both stone localization and stone treatment. Augmentation of this model with a broader set of videos could facilitate creation of enhanced educational and real-time feedback tools for fURS.

Footnotes

Authors’ Contributions

J.C.: Writing, data analysis, and data collection. D.L.: Writing, data collection, and data analysis. C.F.: Data collection. T.K.: Data analysis and editing. I.O.: Supervision, conceptualization, and editing. N.K.: Data analysis, conceptualization, editing, and supervision.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

- Summer Scholarship from the Endourology Society, 2022 (C.F.)

- VISE physician in residence program (N.K.)

- NIH R21 1R21DK133742-01a1 (N.K. and I.O.)

- Training Program for Innovative Engineering Research in Surgery and Intervention Project Number 3T32EB021937-08S1 (D.L.).

References

1. Abdelshehid, Ahlering, Chou, et al. Comparison of flexible ureteroscopes: Deflection, irrigant flow and optical characteristics. J Urol, 2005; 173(6):2017–2021; doi: 10.1097/01.ju.0000158139.65771.0a

2. Dretler, Watson, Parrish, Murray. Pulsed dye laser fragmentation of ureteral calculi: Initial clinical experience. J Urol, 1987; 137(3):386–389; doi: 10.1016/s0022-5347(17)44043-2

3. Conti, Brubaker, Chung, et al. Crowdsourced assessment of ureteroscopy with laser lithotripsy video feed does not correlate with trainee experience. J Endourol, 2019; 33(1):42–49; doi: 10.1089/end.2018.0534

4. Hung, Chen, Jarc, Hatcher, Djaladat, Gill. Development and validation of objective performance metrics for robot-assisted radical prostatectomy: A pilot study. J Urol, 2018; 199(1):296–304; doi: 10.1016/j.juro.2017.07.081

5. Trinh, Mingo, Vanstrum, et al. Survival analysis using surgeon skill metrics and patient factors to predict urinary continence recovery after robot-assisted radical prostatectomy. Eur Urol Focus, 2022; 8(2):623–630; doi: 10.1016/j.euf.2021.04.001

6. Setia, Stoebner, Floyd, Lu, Oguz, Kavoussi. Computer vision enabled segmentation of kidney stones during ureteroscopy and laser lithotripsy. J Endourol, 2023; 37(4):495–501; doi: 10.1089/end.2022.0511

7. Stoebner, Lu, Hong, Kavoussi, Oguz. Segmentation of kidney stones in endoscopic video feeds. Medical Imaging, 2022; doi: 10.48550/arXiv.2204.14175

8. Brunckhorst, Aydin, Abboudi, et al. Simulation-based ureteroscopy training: A systematic review. J Surg Educ, 2015; 72(1):135–143; doi: 10.1016/j.jsurg.2014.07.003

9. Schout, Ananias, Bemelmans, et al. Transfer of cysto-urethroscopy skills from a virtual-reality simulator to the operating room: A randomized controlled trial. BJU Int, 2010; 106(2):226–231; discussion 231; doi: 10.1111/j.1464-410X.2009.09049.x

10. Ahmed, Jawad, Abboudi, et al. Effectiveness of procedural simulation in urology: A systematic review. J Urol, 2011; 186(1):26–34; doi: 10.1016/j.juro.2011.02.2684

11. Chen, Liang, Nguyen, Liu, Hung. Machine learning analyses of automated performance metrics during granular sub-stitch phases predict surgeon experience. Surgery, 2021; 169(5):1245–1249; doi: 10.1016/j.surg.2020.09.020

12. Nguyen, Chen, Marshall, et al. Using objective robotic automated performance metrics and task-evoked pupillary response to distinguish surgeon expertise. World J Urol, 2020; 38(7):1599–1605; doi: 10.1007/s00345-019-02881-w

13. Lee, Ma, Pham, et al. Machine learning to delineate surgeon and clinical factors that anticipate positive surgical margins after robot-assisted radical prostatectomy. J Endourol, 2022; 36(9):1192–1198; doi: 10.1089/end.2021.0890

14. Maciolek, Lu, Oguz, Kavoussi. Automated analysis of stone dust during ureteroscopy to predict stone free status using computer vision models. Abstract. J Urol, 2024; 211(5S):e552.

15. Liu W, Hou, Duan, Qiu. End-to-end single image fog removal using enhanced cycle consistent adversarial networks. IEEE Trans on Image Process, 2020; 29:7819–7833; doi: 10.1109/TIP.2020.3007844

16. Lin C-Y, Tao, Xu A-S, Kang L-W, Akhyar. Sequential dual attention network for rain streak removal in a single image. IEEE Trans on Image Process, 2020; 29:9250–9265; doi: 10.1109/TIP.2020.3025402