Abstract
Objective:
To assess the diagnostic accuracy and intra-observer agreement of endoscopic stone recognition (ESR) compared with formal stone analysis.
Introduction:
Stone analysis is a cornerstone in the prevention of stone recurrence. Although X-ray diffraction (XRD) and infrared spectroscopy are the recommended techniques for reliable formal stone analysis, formal analysis is not always possible, takes time, and is costly. ESR could be an alternative, as it would give immediate information on stone composition.
Materials and Methods:
Fifteen endourologists predicted stone composition based on 100 videos from ureterorenoscopy. Diagnostic accuracy was evaluated by comparing the prediction from visual assessment with stone analysis by XRD. After 30 days, the videos were reviewed again in a random order to assess intra-observer agreement.
Results:
The median diagnostic accuracy for calcium oxalate monohydrate was 54% in questionnaire 1 (Q1) and 59% in questionnaire 2 (Q2), whereas calcium oxalate dihydrate had a median diagnostic accuracy of 75% in Q1 and 50% in Q2. The diagnostic accuracy for calcium hydroxyphosphate was 10% in Q1 and 13% in Q2. The median diagnostic accuracy for calcium hydrogen phosphate dihydrate and calcium magnesium phosphate was 0% in both questionnaires. The median diagnostic accuracy for magnesium ammonium phosphate was 20% in Q1 and 40% in Q2. The median diagnostic accuracy for uric acid was 22% in both questionnaires. Finally, there was a diagnostic accuracy of 60% in Q1 and 80% in Q2 for cystine. The intra-observer agreement ranged between 45% and 72%.
Conclusion:
Diagnostic accuracy of ESR is limited and intra-observer agreement is below the threshold of acceptable agreement.
Introduction
Urolithiasis is a common medical condition worldwide. Prevalence ranges from 5% to 14% in Europe, 7% to 13% in the United States, and 1% to 5% in Asia and these numbers keep rising. 1
The most common types of urolithiasis can be divided into eight groups: calcium oxalate monohydrate (COM), calcium oxalate dihydrate (COD), calcium hydroxyphosphate (CHP), calcium hydrogen phosphate dihydrate (CHPD), calcium magnesium phosphate (CMP), magnesium ammonium phosphate (MAP), uric acid (UA), and cystine. Calcium oxalate stones (COD and COM) represent around 70% to 80% of all urolithiasis, whereas calcium phosphate (CaPh) stones (CHP, CHPD, and CMP) represent around 15%, UA stones 8%, cystine stones 1% to 2%, and MAP stones 1%. 2 The reported incidence of pure stones varies greatly between studies, ranging from 5% to 59%. 3 –7
Knowing the stone's composition leads to a better etiological approach to treat and prevent urolithiasis. If a stone sample is available, then X-ray diffraction (XRD) and infrared spectroscopy (IRS) are the recommended techniques to accurately identify stone composition and its relative proportions. 2 However, with the advent of dusting techniques used to break up stones, a sample is not always available to send for analysis. In 1993, Daudon and colleagues described a stone classification based on the morphoconstitutional aspects of urolithiasis and their possible pathophysiology. 8 –10 Morphoconstitutional assessment has an essential role in the etiological diagnosis of urolithiasis and although this classification is based on the appearance of ex vivo, dry urolithiasis, Estrade and colleagues showed that endoscopic stone recognition (ESR) is feasible. 11
Nevertheless, this was a single-surgeon study and intra-observer agreement was not assessed. Therefore, the present international multicenter study evaluates whether endourologists can accurately predict stone composition based on visual appearance during endoscopic stone treatment when compared with formal stone analysis, and whether their assessment is reproducible.
Materials and Methods
Ethics
This study was performed according to the ethical standards described in the 1964 Declaration of Helsinki and its later amendments and was approved by the local ethics committee (W20_212 #20.245 on May 14, 2020).
Study design
In this international multicenter study, 15 endourologists assessed stone composition based on the endoscopic appearance of urolithiasis in videos to evaluate diagnostic accuracy and intra-observer agreement. The videos were embedded in a questionnaire on an encrypted web-based platform (Data Management System; T&S Innovations, Maarssen, the Netherlands:

Questionnaires' interface. Color images are available online.
The raters were instructed to view the videos on a laptop or stand-alone computer with a screen of at least 13″ in a quiet environment to avoid disturbance. Optimal video quality could be obtained by adjusting the size of the browser screen.
At the beginning of questionnaire 1 (Q1), information about the raters' experience (years practicing urology, years practicing stone treatment, and number of procedures per year) was collected. Then, each video was played in a loop and could be viewed an unlimited number of times. After assessment, the raters determined whether it was a pure or mixed stone. When raters decided on a pure composition, they subsequently had to identify the component. If they opted for a mixed composition, they had to identify the main and secondary component, as well as a tertiary component if deemed applicable. The exact numbers and relative proportions of the included urolithiasis as well as any other clinical information were unknown to the raters. Finally, they were asked to score the video quality on a 3-point Likert scale (1 = low, 2 = moderate, 3 = high) for each video. 12
After a wash-out period of 30 days, the raters assessed the same videos in a random order in a second questionnaire (Q2) to assess intra-observer agreement. 12
Data acquisition
All videos were prospectively collected during consecutive endoscopic stone treatment with a digital flexible ureterorenoscope (FLEX-XC®; Karl Storz, Tuttlingen, Germany) at a tertiary referral center (Amsterdam UMC, the Netherlands). The procedures were recorded with RVC Clinical Assistant® (RVC, Amersfoort, the Netherlands). After surgery, the recordings were screen captured with Snagit™ (TechSmith, Okemos, MI) and edited with Apple iMovie V.10.2.3. (Apple, Inc., Cupertino, CA) to obtain qualitative full HD (1080p) videos of 5 to 15 seconds.
Stones were treated with a combination of dusting (20 Hz–0.2 J) and fragmentation (10 Hz–1 J) with a VersaPulse® PowerSuite™ 100W laser (Lumenis Ltd., Borehamwood, United Kingdom) in the majority of the cases. However, some of the included stones were also extracted in toto. The extracted stone fragments were collected and sent to the Department of Clinical Chemistry of the Erasmus MC (Rotterdam, the Netherlands) for formal stone analysis with XRD to analyze the stone's composition. XRD was performed according to the general guidelines. First, the extracted fragments were pulverized into dust, after which monochromatic X-rays were used to identify the constituents of the sample. 13
The main component, as well as secondary and tertiary components if applicable, was determined. Pure stones were defined as stones consisting of only one composition on formal stone analysis with XRD. Not all stone compositions and relative proportions are equally common. The respective prevalence in this study reflects the prevalence in an academic clinical practice, as data from consecutive procedures were collected. This is comparable with studies describing the prevalence in France, Norway, and the United States. 3,6,7
Statistical analysis
After consulting our institution's statistician, we concluded that a power analysis to determine the sample size was not possible because of the lack of publications on this topic. Based on practical grounds and on a similar diagnostic study by Freund and colleagues, we decided to include 100 videos. 12
SPSS V.28 (IBM Corp, Armonk, NY) was used to perform the statistical analysis. Figures and tables were created with Microsoft® Excel (Microsoft Corp., Redmond, WA).
Descriptive analysis was used to assess rater experience and quality of the videos as a median (range) for the whole group.
The diagnostic accuracy for each stone composition was assessed by calculating the concordance of the rater's visual assessment for the main component with the main chemical composition identified with XRD. Results are described as a median (range) for the whole group. In addition, diagnostic accuracy for pure and mixed stones was calculated.
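The concordance calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function names, rater labels, and toy data are hypothetical, and the range is shown here as a minimum and maximum across raters.

```python
from statistics import median


def accuracy_for_component(predicted, reference, component):
    """Share of videos whose XRD main component is `component` and for
    which the rater predicted that same main component."""
    relevant = [i for i, ref in enumerate(reference) if ref == component]
    if not relevant:
        raise ValueError(f"no videos with main component {component!r}")
    correct = sum(1 for i in relevant if predicted[i] == component)
    return correct / len(relevant)


def median_and_range(values):
    """Median with (minimum, maximum) across raters."""
    return median(values), (min(values), max(values))


# Hypothetical toy data: XRD main component for six videos.
xrd = ["COM", "COM", "COD", "UA", "COM", "COM"]

# Hypothetical main-component predictions of three raters for the same videos.
ratings = {
    "rater_a": ["COM", "COD", "COD", "UA", "COM", "UA"],
    "rater_b": ["COM", "COM", "COM", "COM", "COM", "CHP"],
    "rater_c": ["UA", "COD", "COD", "UA", "COM", "CHP"],
}

com_accuracy = [accuracy_for_component(p, xrd, "COM") for p in ratings.values()]
com_median, com_range = median_and_range(com_accuracy)
print(com_accuracy, com_median, com_range)
```

Only videos with the given XRD main component enter the denominator, so a composition represented by a single video can only yield 0% or 100% per rater.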
The intra-observer agreement was defined as the percentage of cases where the raters predicted the same main component in both questionnaires. Acceptable agreement was defined as at least 80% agreement. 12,14
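As a minimal sketch of this definition, simple percentage agreement for one rater could be computed as below. The function name, the constant, and the example answers are hypothetical, and the sketch assumes the Q2 answers have already been re-aligned to the Q1 video order, since Q2 was presented in a random order.

```python
def percent_agreement(first, second):
    """Percentage of videos with the same predicted main component
    in both questionnaires (inputs must be in the same video order)."""
    if len(first) != len(second):
        raise ValueError("both questionnaires must cover the same videos")
    if not first:
        raise ValueError("no videos to compare")
    same = sum(1 for a, b in zip(first, second) if a == b)
    return 100 * same / len(first)


# Threshold for acceptable agreement as stated in the text.
ACCEPTABLE_AGREEMENT = 80.0

# Hypothetical answers of one rater for five videos.
q1 = ["COM", "COD", "UA", "CHP", "COM"]
q2 = ["COM", "COM", "UA", "MAP", "COM"]

agreement = percent_agreement(q1, q2)
acceptable = agreement >= ACCEPTABLE_AGREEMENT
print(agreement, acceptable)
```

Note that this measure counts a repeated wrong answer as agreement; it describes reproducibility, not correctness.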
Spearman's rank correlation coefficient was calculated for the experience of each rater and the diagnostic accuracy.
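For illustration, Spearman's rank correlation can be sketched in plain Python as the Pearson correlation of the ranks, with tied values sharing their average rank. The helper names and the experience and accuracy values below are hypothetical; the study itself used SPSS.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: years of endourological experience and
# overall diagnostic accuracy for five raters.
years = [5, 8, 10, 12, 18]
accuracy = [0.40, 0.35, 0.45, 0.38, 0.42]

rho = spearman_rho(years, accuracy)
print(round(rho, 3))
```

Because only ranks are used, the coefficient is insensitive to the exact spacing of years or accuracies, which suits small samples with skewed experience.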
The influence of video quality was assessed by comparing the diagnostic accuracy for all videos with the diagnostic accuracy for videos rated as high or moderate quality. A Mann–Whitney U-test was applied to determine statistical significance between these groups.
A two-sided p-value ≤0.05 was considered statistically significant.
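The video-quality comparison described above can be sketched as a Mann–Whitney U calculation. This is a simplified illustration, not the SPSS implementation: the p-value uses a normal approximation without tie or continuity correction, and all names and accuracy values are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: each pair with a > b counts 1,
    each tied pair counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def two_sided_p_normal(u, n_a, n_b):
    """Two-sided p-value from the normal approximation
    (assumption: no tie or continuity correction)."""
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = abs(u - mean_u) / sd_u
    return erfc(z / sqrt(2))


# Hypothetical per-rater accuracies: all videos vs. low-quality videos omitted.
all_videos = [0.40, 0.35, 0.45, 0.38, 0.42]
without_low_quality = [0.41, 0.36, 0.47, 0.39, 0.44]

u_stat = mann_whitney_u(all_videos, without_low_quality)
p_value = two_sided_p_normal(u_stat, len(all_videos), len(without_low_quality))
significant = p_value <= 0.05
print(u_stat, round(p_value, 3), significant)
```

For samples this small, an exact p-value would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.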
Results
Rater experience
At the time of assessment, the raters had been in practice for a median of 10 years (5–20 years), with a median endourological experience of 8 years (5–18 years) and an average caseload of ∼120 ureteroscopies for stone treatment per year (70–300 procedures).
Diagnostic accuracy
The diagnostic accuracy for each chemical composition is presented in Table 1 and as simple boxplots in Figure 2.

Simple boxplot for correctly predicted composition per chemical component. CHP = calcium hydroxyphosphate; CHPD = calcium hydrogen phosphate dihydrate; CMP = calcium magnesium phosphate; COD = calcium oxalate dihydrate; COM = calcium oxalate monohydrate; MAP = magnesium ammonium phosphate; Q1 = questionnaire 1; Q2 = questionnaire 2; UA = uric acid. Color images are available online.
Median (Range) of the Diagnostic Accuracy
CHP = calcium hydroxyphosphate; CHPD = calcium hydrogen phosphate dihydrate; CMP = calcium magnesium phosphate; COD = calcium oxalate dihydrate; COM = calcium oxalate monohydrate; MAP = magnesium ammonium phosphate; Q1 = questionnaire 1; Q2 = questionnaire 2; UA = uric acid.
The median diagnostic accuracy for stones with COM as main component (n = 39) was 54% (44%) in Q1 and 59% (61%) in Q2. Stones with COD as main component (n = 4) had a median diagnostic accuracy of 75% (100%) in Q1 and 50% (75%) in Q2.
Stones with CHP as main component (n = 30) had a median diagnostic accuracy of 10% (43%) in Q1 and 13% (53%) in Q2. The median diagnostic accuracy for CHPD as main component (n = 8) was 0% (38%) in Q1 and 0% (50%) in Q2. One stone had CMP as main component; the median diagnostic accuracy was 0% (100%) in both questionnaires.
The median diagnostic accuracy for stones with MAP as main component (n = 4) was 20% (80%) in Q1 and 40% (60%) in Q2.
The median diagnostic accuracy for UA as main component (n = 9) was 22% (45%) in both questionnaires.
Finally, stones with cystine as main component (n = 5) revealed a median diagnostic accuracy of 60% (100%) in Q1 and 80% (80%) in Q2.
A sub-analysis for pure stones (n = 33) was performed. Results are shown in Table 2 and Figure 3.

Simple boxplot for correctly predicted composition per chemical component in pure stones. Color images are available online.
Median (Range) of the Diagnostic Accuracy in Pure Stones
A sub-analysis for mixed stones (n = 67) was also performed. Results are presented in Table 3 and Figure 4.

Simple boxplot for correctly predicted stone composition per chemical component in mixed stones. Color images are available online.
Median (Range) of the Diagnostic Accuracy in Mixed Stones
Additional analysis for subgroups (diagnostic accuracy for pure and mixed stones, calcium (COM, COD, CHP, CHPD, and/or CMP) and noncalcium (MAP, UA, and/or cystine) stones, calcium oxalate (COM and/or COD) and calcium phosphate (CHP, CHPD and/or CMP) stones, as well as for infection (CHP and/or MAP) stones) is presented in Supplementary Appendix SA1. 15 Supplementary Appendix SA2 shows the results per rater.
Intra-observer agreement
The median intra-observer agreement calculated as simple percentage agreement was 56% (27%) (Fig. 5). This is below the predefined threshold of 80% to classify as acceptable agreement. 12,14 The Cohen's kappa coefficient was 0.461 (SE 0.015, p < 0.001). Hence, the overall intra-observer agreement was moderate.
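Unlike simple percentage agreement, Cohen's kappa corrects for agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from the marginal frequencies of each answer. A minimal sketch for a single rater follows; the function name and answers are hypothetical, and how the raters were pooled for the reported overall coefficient is not stated in the text.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Cohen's kappa for two sets of categorical answers on the same items."""
    if len(first) != len(second) or not first:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(first)
    observed = sum(1 for a, b in zip(first, second) if a == b) / n
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement from the marginal frequencies of each label.
    expected = sum(counts_first[label] * counts_second[label]
                   for label in counts_first) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical answers of one rater for ten videos in Q1 and Q2.
q1 = ["COM", "COM", "COM", "COM", "COM", "COD", "COD", "UA", "UA", "UA"]
q2 = ["COM", "COM", "COM", "COD", "UA", "COD", "COM", "UA", "UA", "COM"]

kappa = cohens_kappa(q1, q2)
print(round(kappa, 3))
```

In this toy example the observed agreement is 60% and the chance agreement 38%, so kappa is noticeably lower than the raw percentage suggests.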

Intra-observer agreement for stone composition.
Experience and diagnostic accuracy
Experience with endourology was not correlated with a higher diagnostic accuracy. Specifics can be found in Supplementary Appendix SA3.
Video quality
The overall median video quality based on the 3-point Likert scale for Q1 was 2.2 (1.6–2.8) and 2.1 (1.4–2.9) for Q2. No statistically significant difference was found in diagnostic accuracy when videos rated as low quality were omitted (Supplementary Appendix SA4).
Discussion
According to the EAU guidelines, medical treatment and prevention of urolithiasis is based on basic metabolic evaluation and reliable stone-analysis. XRD or IRS are recommended for stone analysis. 2 While corroborating this recommendation, a recent international consensus meeting emphasized the importance of visual identification of stone morphology to avoid sampling error. 16
The present study demonstrates the diagnostic limitations of ESR. COM, COD, and cystine were correctly recognized in at least 50% of the cases. In contrast, CHP, CHPD, CMP, MAP, and UA all had a median diagnostic accuracy of 40% or less. Further, intra-observer agreement was below the threshold of acceptable agreement.
The ESR could be beneficial to preserve an etiological approach in the treatment and prevention of urolithiasis. This approach might not be possible when postoperative analysis of stone dust provides insufficient information. 17 Further, there could be a difference between the composition of the stone surface and core. The composition of the surface represents recent stone formation, and stone treatment with laser will unavoidably lead to loss of information on the etiology of the stone. 11
Moreover, Keller and colleagues described that high-frequency dusting with the Thulium fiber laser can lead to changes in the infrared spectra, thus possibly biasing stone analysis through IRS. 17,18 ESR could aid not only the postoperative prevention of urolithiasis; the intraoperative treatment strategy (laser settings, acceptance of residual fragments, postoperative policy, and further direct medical management) might also be influenced by the composition. Laser settings, for example, will depend on the hardness of the urolithiasis. Even though high-power lasers can destroy any type of urolithiasis, harder stones will require more power. ESR can help the surgeon to choose the right settings to treat the urolithiasis. 18,19
Another possible intraoperative influence of ESR is the choice to accept residual fragments. Endourologists might accept small residual fragments when they are treating a calcium oxalate stone, whereas they will not accept residual fragments when they are treating infection stones. This is because the risk of recurrence is higher in patients with residual fragments after the treatment of infection stones than for other stones. 2,20
Daudon and colleagues developed a classification system to summarize the principal etiological causes for urolithiasis formation based on their morphoconstitutional aspects that has been a guide for urologists all over the world. 8,10 This classification system, however, was based solely on the subjective assessment of the macroscopic appearance of ex vivo, dry urolithiasis. Estrade and colleagues evaluated the feasibility of ESR for commonly encountered urolithiasis.
This study had one experienced urologist evaluate 399 stones during endoscopic stone treatment and found a concordance between microscopy and IRS of 81.6%. Diagnosis was confirmed for COM, COD, UA, CHP–MAP association, and CHPD stones. 11 In contrast to our study, this was a single-surgeon study, and they did not evaluate the intra-observer agreement.
Estrade's group also evaluated whether the recognition of stone compositions could be learned. In their study, 15 urologists-in-training were asked to recognize and describe the morphology of nine stones before and after expert coaching. They concluded that it was possible to train urologists to recognize stones based on their visual appearance. 21 Our study tried to bypass this learning curve by having experienced endourologists perform the assessment. Further, we did not find that experience was correlated with a higher accuracy in the prediction of stone composition.
This might be due to the high level of experience of the participating endourologists, and the involvement of less experienced raters may lead to another conclusion. Further, it is still possible that dedicated training on visual identification of stone composition might lead to better results. Further research is needed to confirm these hypotheses.
Similar to our study, Sampogna and colleagues described a limited diagnostic accuracy for ESR. Their study, in which 32 urologists predicted the composition of 20 stones based on videos on YouTube, showed a diagnostic accuracy of 69.8% for COD and 78.1% for cystine, compared with 75% (Q1) and 50% (Q2) for COD and 60% (Q1) and 80% (Q2) for cystine in our study. However, their result for COM was lower, with a diagnostic accuracy of 41.8% vs 54% (Q1) and 59% (Q2) in our study. 22
Black and colleagues were the first to describe a deep learning computer vision algorithm to recognize the composition of pure urolithiasis in 2020. They applied a neural network on digital photographs of 63 ex vivo urolithiasis (surface and section) and concluded that deep learning can be used to detect the composition of the most frequent urolithiasis. The neural network correctly predicted 94% of the UA, 90% of the COM, 86% of the MAP, 71% of the CHPD, and 75% of the cystine stones. 23 Similarly, following up on their previous studies, Estrade and colleagues also trained a neural network to predict stone composition. 11,17
In contrast to the study performed by Black and colleagues, in vivo endoscopic images of urolithiasis were used in the study by Estrade and colleagues. Their neural network analyzed pure COM, COD, and UA stones, as well as mixed COM–COD and COM–UA stones. They achieved high diagnostic accuracy for the surface of pure UA (98%) and pure COM (91% surface and 94% section). Mixed COM–COD and COM–UA had a lower, but still high sensitivity. 17
Although different techniques, such as convolutional neural networks, dual-energy CT, and ESR, have been studied to determine stone composition, none are as good as the recommended techniques for reliable formal stone analysis, XRD and IRS. Nonetheless, the results achieved with artificial intelligence are promising. More research and big data are needed to further improve this technique.
The present study has limitations. A major limitation is the limited number of some stone types, which limits the power of this study. Moreover, the fact that only a fraction of the stone was analyzed could lead to sample bias. This is especially true in mixed stones and is unfortunately inherent to the current technique of formal stone analysis in stones that were not extracted and sent for analysis in toto. It has been shown that the diagnostic accuracy of IRS on stone fragments (80%) is lower than the diagnostic accuracy of the combination of microscopic assessment and IRS of entire stones (95%). 24
Further, Krambeck and colleagues 25 showed that there is a significant variability in the reporting of mixed stones when analyzed with IRS. 21 Sutor had similar results with regard to XRD. 26 As many stones these days are treated with a combination of fragmenting and dusting, the fragments sent for analysis may not be representative of the entire stone. One could send dust for formal stone analysis; however, the diagnostic accuracy of IRS for stone dust is even lower than for stone fragments (60% vs 80%). 24 A possible solution would be to start with visual identification to define what portion of the stone should be sent for formal stone analysis. 16
Nonetheless, for this to work, urologists should be trained to visually identify different stone compositions and send the respective fragments for formal stone analysis. A way to bypass the difficulties of visual identification by urologists is the use of micro-CT. This technique was first described in 2001 by Cleveland and colleagues and has been shown to adequately recognize different stone compositions or regions of interest. 27 –33 As micro-CT has used formal stone analysis (IRS) as a benchmark to evaluate its specificity and sensitivity, additional formal analysis, such as IRS or XRD, is still necessary. Nonetheless, this technique can provide information about which regions of a stone should be further investigated. Further, Williams and colleagues stated that micro-CT is much easier than visual identification in the recognition of regions of interest. 34
In mixed stones, the difference in percentage composition between primary, secondary, and, if applicable, tertiary components can vary. The lower the percentage of the primary component, the more difficult it might be to correctly predict the primary composition. However, it is unknown what the minimal percentage should be for ideal ESR. This study included stones where the primary component made up at least 40% of the stone.
Further, even though the videos were carefully selected to optimally visualize the stones, longer video fragments with different angles as well as images of both surface and sections of the stone may lead to a higher diagnostic accuracy. 12 Nevertheless, our dataset included mostly surface videos and prior studies have shown that images of stone surfaces are easier to correctly recognize. 11,21
The strength of this study lies in the international, multicenter design with videos rated as moderate to high quality in a controlled questionnaire and a second assessment after 30 days to assess intra-observer agreement. Further, the raters were blinded to additional clinical information to evaluate a baseline diagnostic accuracy of ESR and avoid bias of the raters. Clinical information might increase the diagnostic accuracy even more; however, further research is needed to confirm this hypothesis. 12
Conclusion
The visual appearance of urolithiasis during endoscopy did not allow accurate prediction of stone composition. The diagnostic accuracy was limited, and the intra-observer agreement was low.
Authors' Contributions
M.M.E.L.H.: Protocol/project development, data collection or management, data analysis, and article writing/editing.
S.J.M.S.: Article writing/editing.
D.M.D.B.: Protocol/project development, data analysis, and article writing/editing.
H.W.: Protocol/project development, data collection or management, and article writing/editing.
J.E.F.: Protocol/project development, article writing/editing.
O.J.W.: Data collection or management, article writing/editing.
A.P.: Data collection or management, article writing/editing.
A.S.: Data collection or management, article writing/editing.
B.K.S.: Data collection or management, article writing/editing.
T.E.Ş.: Data collection or management, article writing/editing.
E.E.: Data collection or management, article writing/editing.
L.B.D.: Data collection or management, article writing/editing.
L.V.: Data collection or management, article writing/editing.
M.T.: Data collection or management, article writing/editing.
M.D.: Article writing/editing.
O.T.: Article writing/editing.
P.K.: Data collection or management, article writing/editing.
S.D.: Data collection or management, article writing/editing.
T.T.: Data collection or management, article writing/editing.
T.T.: Data collection or management, article writing/editing.
N.H.: Article writing/editing.
H.P.B.: Article writing/editing.
J.B.: Data collection or management, article writing/editing.
G.M.K.: Protocol/project development, data collection or management, article writing/editing.
Informed Consent
Informed consent was not obtained from individual participants included in the study as consent is not required as long as information is anonymized, and the submission does not include images that may identify the person. Therefore, our institution waived the need for informed consent.
Footnotes
Author Disclosure Statement
M.M.E.L.H. is a consultant for Coloplast; S.J.M.S. declares no conflicts of interest; D.M.D.B. declares no conflicts of interest; H.W. declares no conflicts of interest; and J.E.F. declares no conflicts of interest. O.J.W. is a consultant for Boston Scientific, Coloplast, EMS, and Ambu, has received research funding from Coloplast and EMS, and has undertaken education for Boston Scientific, Coloplast, EMS, Ambu, and Olympus. A.P. declares no conflicts of interest; A.S. declares no conflicts of interest; B.K.S. is a consultant for Lumenis, Coloplast, Boston Scientific, and Pusen; T.E.Ş. declares no conflicts of interest; E.E. declares no conflicts of interest; L.B.D. declares no conflicts of interest; L.V. declares no conflicts of interest; M.T. declares no conflicts of interest; M.D. declares no conflicts of interest; O.T. declares no conflicts of interest; P.K. declares no conflicts of interest; S.D. declares no conflicts of interest; T.T. declares no conflicts of interest; T.T. declares no conflicts of interest; N.H. declares no conflicts of interest; H.P.B. declares no conflicts of interest; J.B. is a consultant for Coloplast; and G.M.K. is a consultant for Boston Scientific, EMS, Coloplast, and Olympus.
Research involved human participants and/or animals: This study was reviewed by and approved by the ethical commission of our hospital under reference W20_212 #20.245 on May 14, 2020.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Appendix SA1
Supplementary Appendix SA2
Supplementary Appendix SA3
Supplementary Appendix SA4
