Abstract
Objectives:
To validate the AutoMated UroLithiasis Evaluation Tool (AMULET) software for kidney stone volumetry and to compare its performance with standard clinical practice.
Materials and Methods:
Maximum diameter and volume of 96 urinary stones were measured as the reference standard by three independent urologists. The same stones were positioned in an anthropomorphic phantom, and CT scans were acquired with standard settings. Three independent radiologists blinded to the reference values took manual measurements of the maximum diameter and automated measurements of maximum diameter and volume. An “expected volume” was calculated from the manual diameter measurements using the formula for spheres:
$$V_{\text{expected}} = \frac{4}{3}\pi\left(\frac{d}{2}\right)^{3} = \frac{\pi d^{3}}{6}$$
where d is the manually measured maximum diameter.
Results:
Ninety-six stones were analyzed in the study. We had initially aimed to assess 100. Nine were replaced during data acquisition because they crumbled, and four had to be excluded because the automated measurement failed. Mean reference maximum diameter was 13.3 mm (5.2–32.1 mm). Correlation coefficients among all measured outcomes were compared. The correlations of the manual and automatic diameter measurements with the reference were 0.98 and 0.91, respectively (p < 0.001). Mean reference volume was 1200 mm³ (10–9000 mm³). The correlations of the “expected volume” and the automatically measured volume with the reference were 0.95 and 0.99, respectively (p < 0.001).
Conclusions:
Patients' kidney stone burden is usually assessed according to maximum diameter. However, as most stones are not spherical, this entails a potential bias. Automated stone volumetry is possible and significantly more accurate than diameter-based volumetric calculations. To avoid bias in clinical trials, size should be measured as volume. However, automated diameter measurements are not as accurate as manual measurements.
Introduction
Acceptance of the maximum diameter as a size measurement implies a direct correlation between a stone's diameter and its actual volume. However, most stones are not spherical, and there is no evidence that their maximum diameter correlates with their volume. A further problem with using diameter as the size parameter is that some physicians measure it on axial CT slices and others on coronal slices, which further reduces the accuracy of diameter compared with volume. 8
Some authors recommend partially resolving this shortcoming by calculating volume with ellipsoid formulas, but because these are complex, they are seldom used. 9
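To illustrate why shape matters, the following Python sketch compares the sphere estimate with the standard ellipsoid volume formula, V = (π/6) · L · W · H, where L, W, and H are the three orthogonal axis lengths. The specific ellipsoid formulas recommended in reference 9 are not reproduced here, and the stone dimensions are hypothetical.

```python
import math


def sphere_volume(d_mm: float) -> float:
    """Volume in mm^3 of a sphere with diameter d_mm (pi * d^3 / 6)."""
    return math.pi * d_mm ** 3 / 6


def ellipsoid_volume(length_mm: float, width_mm: float, height_mm: float) -> float:
    """Volume in mm^3 of an ellipsoid with full axis lengths L, W, H (pi/6 * L * W * H)."""
    return math.pi / 6 * length_mm * width_mm * height_mm


# Hypothetical stones that share a 10 mm maximum diameter but differ in shape.
round_stone = ellipsoid_volume(10, 10, 10)     # identical to the sphere estimate
elongated_stone = ellipsoid_volume(10, 7, 7)   # same maximum diameter, thinner

print(f"sphere estimate from 10 mm: {sphere_volume(10):.0f} mm^3")
print(f"round stone:     {round_stone:.0f} mm^3")
print(f"elongated stone: {elongated_stone:.0f} mm^3")
print(f"volume ratio:    {round_stone / elongated_stone:.2f}")
```

Both hypothetical stones have a 10 mm maximum diameter, yet their volumes differ by a factor of about 2, which a diameter-only description cannot capture.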
Nowadays, low-dose noncontrast-enhanced CT (NCCT) and dual-energy CT (DECT) are widely used to diagnose and measure stones. 2,10,11 Software solutions for automated stone volumetry on CT scans acquired with DECT are available. 12 We recently acquired the new AutoMated UroLithiasis Evaluation Tool (AMULET) software, which requires only that the radiologist click on the stone of interest to obtain its maximum diameter and volume. The objective of this study was to validate this new software and compare its performance with standard clinical practice.
Materials and Methods
Study design
This study was approved by the University of Freiburg Ethics Committee and conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. Patient informed consent was waived due to the study's design. The data acquisition took place between August 2015 and November 2015.
Our aim was to measure 100 human and animal kidney stones of different compositions (calcium oxalate, brushite, cystine, and uric acid) with diameters of at least 5 mm. Nine stones crumbled during the measurements and were replaced, so that a total of 109 stones were measured. The crumbled stones were excluded from further analysis. Four additional stones had to be excluded because the automated measurement delivered no data. The software does not state the reason for this; most likely, the algorithm did not converge because of the specific geometry of the calculus (personal communication from Siemens Healthcare). The user has no way to influence the convergence of the algorithm; if no result is displayed, this must be accepted.
Manual measurement
Initially, three urologists (with 3–5 years of experience in urology) independently assessed the maximum diameter with a digital sliding caliper and the volume with a water displacement/overflow method (Fig. 1A, B): cylindrical test tubes accommodating different stone sizes were filled with a fixed amount of water, the stone was placed inside the tube, and its volume was determined from the rise of the water surface. These measurements were accepted as reference values because this methodology appeared to provide the most precise values.
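A minimal Python sketch of how a displacement reading translates into a volume, assuming a cylindrical tube of known inner diameter (the tube size and level rises below are hypothetical; if a graduated tube is read directly in mL, 1 mL corresponds to 1000 mm³). The reference value is taken as the mean of the three raters' readings, as described under Statistical analyses.

```python
import math
from statistics import mean


def displaced_volume_mm3(tube_inner_diameter_mm: float, level_rise_mm: float) -> float:
    """Volume of water displaced in a cylindrical tube: pi * (D / 2)^2 * rise."""
    radius_mm = tube_inner_diameter_mm / 2
    return math.pi * radius_mm ** 2 * level_rise_mm


def reference_value(readings: list[float]) -> float:
    """Reference standard: mean of the independent raters' readings."""
    return mean(readings)


# Hypothetical example: a tube with 10 mm inner diameter, three raters' level readings.
rises_mm = [12.5, 12.8, 12.6]
volumes_mm3 = [displaced_volume_mm3(10.0, rise) for rise in rises_mm]
print([round(v) for v in volumes_mm3])
print(round(reference_value(volumes_mm3)))
```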

Figure 1. Experiments.
CT scans
The same stones were then positioned in a radiologic phantom consisting of a 560 × 363 × 128 mm water bath (representing a human abdomen) with human vertebrae placed centrally. Stones were placed on 3 cm-high, nonradiopaque polyvinyl chloride podiums so that they were completely surrounded by water (Fig. 1C, D). This method corresponds to previously published models. 13
Standard CT scans were acquired at low-dose settings (100 kVp, 55 mAs, CTDIvol 2.3 mGy) for axial and coronal reconstructions with 3 mm slice thickness and slice spacing (kernel: I31f). Dual-energy scans were acquired at 100 kVp/172 mAs and Sn 140 kVp/122 mAs (CTDIvol 12.94 mGy) for high-resolution axial dual-energy reconstructions with 0.75 mm slice thickness and slice spacing (kernel: Q30f). These settings correspond to the standard settings for urolithiasis diagnostics (Somatom Flash; Siemens, Forchheim, Germany). 14
Three radiologists (with 2–6 years of experience) independently took manual measurements of maximum diameter in standard CT reconstructions and automated measurements of maximum diameter and volume using the AutoMated UroLithiasis Evaluation Tool (
Measurement rounds were done at least 4 weeks apart to avoid bias. The radiologists were blinded to the reference measurements at all times.
Based on the manual measurements of maximum diameter d (mm), we calculated an “expected volume” (mm³) using the formula for spheres:
$$V_{\text{expected}} = \frac{4}{3}\pi\left(\frac{d}{2}\right)^{3} = \frac{\pi d^{3}}{6}$$
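The formula for spheres, V = πd³/6 with d the maximum diameter, can be written as a short Python function. The second part of this sketch shows why errors in the calculated volume grow with stone size: because volume scales with the cube of the diameter, a relative diameter error e becomes a relative volume error of (1 + e)³ − 1, roughly three times larger for small e. The error values used are hypothetical.

```python
import math


def expected_volume_mm3(max_diameter_mm: float) -> float:
    """'Expected volume' in mm^3: a sphere whose diameter equals the measured maximum diameter."""
    return math.pi * max_diameter_mm ** 3 / 6


# For orientation only: sphere estimate for the mean reference diameter (13.3 mm).
print(f"{expected_volume_mm3(13.3):.0f} mm^3")

# Volume scales with d^3, so a relative diameter error e becomes (1 + e)^3 - 1 in volume.
for diameter_error in (0.05, 0.10, 0.20):  # hypothetical measurement errors
    volume_error = (1 + diameter_error) ** 3 - 1
    print(f"diameter +{diameter_error:.0%} -> expected volume +{volume_error:.0%}")
```

For example, a 10% overestimate of the diameter inflates the expected volume by about 33%.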
Statistical analyses
Inter-rater agreement was calculated separately for the urologists' and the radiologists' measurements using the intraclass correlation coefficient.
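The text does not specify which ICC form was used. As an assumption, the following Python sketch computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater) from a subjects × raters table via the usual ANOVA mean squares. R packages such as irr and psych report this and other ICC forms.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the value that rater j recorded for subject (stone) i.
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Worked example from Shrout and Fleiss (1979): 6 subjects rated by 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"{icc_2_1(example):.2f}")  # 0.29
```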
Mean values of the three urologists' measurements were used as our reference standard.
Differences between the reference values and the values measured by the radiologists were displayed in Bland–Altman difference plots, with lines indicating the mean difference and the upper and lower limits at ±2 standard deviations.
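The quantities drawn in such plots can be computed as below. This Python sketch assumes that differences are taken as reference minus measured value (matching the “Δ” panels), uses the sample standard deviation, and places limits at twice the SD as described here (many implementations use 1.96 instead). The x-coordinate of each plotted point is taken as the mean of the two methods; the input values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference: list[float], measured: list[float]):
    """Return (bias, lower_limit, upper_limit, points) for a Bland-Altman plot.

    Differences are reference - measured; limits are bias +/- 2 sample SDs.
    points holds (average of both methods, difference) pairs for plotting.
    """
    if len(reference) != len(measured):
        raise ValueError("reference and measured must have the same length")
    diffs = [r - m for r, m in zip(reference, measured)]
    averages = [(r + m) / 2 for r, m in zip(reference, measured)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator); needs at least 2 pairs
    return bias, bias - 2 * sd, bias + 2 * sd, list(zip(averages, diffs))


# Hypothetical diameters (mm): reference vs an automated method.
reference = [5.2, 8.0, 12.5, 20.1]
automated = [5.0, 7.6, 11.9, 18.8]
bias, lower, upper, points = bland_altman(reference, automated)
print(f"bias {bias:.2f} mm, limits {lower:.2f} to {upper:.2f} mm")
```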
Correlations between reference values and those the radiologists measured were calculated with Pearson's correlation coefficient. A p-value <0.05 was considered to indicate statistically significant results. Statistics were run with R version 3.2.4.
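A minimal Python sketch of Pearson's correlation coefficient and the t statistic used to test whether it differs from zero (t = r·√((n − 2)/(1 − r²)) on n − 2 degrees of freedom). The function names are illustrative; the exact p-value would come from the t distribution, for example via R's cor.test() or scipy.stats.pearsonr.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's product-moment correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired samples of equal length (n >= 3)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r: float, n: int) -> float:
    """Statistic for H0: rho = 0; compare against a t distribution with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# With n = 96 stones, a correlation of r = 0.91 gives t of about 21.3 on 94 df.
print(f"t = {t_statistic(0.91, 96):.1f} on {96 - 2} df")
```

A high correlation alone does not show that two methods agree, which is what the Bland–Altman plots address.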
Results
Out of 100 initially planned stones, 9 crumbled after the reference measurements and could not be processed for the radiologic measurements; they were replaced by new stones. The automated stone volume measurement failed in four cases. It remained unclear which factors contributed to this failure, as these stones differed in size, shape, composition, and position in the phantom. The smallest stone for which the measurement failed had a diameter of 7 mm. One of the four stones was composed of urate; the others were calcium oxalate stones. Thus, the measurements of 96 stones were included in the final analysis.
The reference measurements yielded a mean maximum diameter of 13.3 mm (range: 5.2–32 mm) and a mean volume of 1200 mm³ (range: 10–9000 mm³). The inter-rater correlation for the reference measurements was 0.99 for both diameter and volume.
The manual measurements of maximum diameter in standard CT reconstructions showed an inter-rater correlation of 0.96. The correlation with the reference value was 0.98 for the manually measured maximum diameter and 0.91 for the automated maximum diameter measured in DECT reconstructions (p = 0.001). Figure 2 shows Bland–Altman difference plots comparing the reference diameter with the manually and the automatically measured diameters. Note that the discrepancy increases with stone size. The automated measurement tends to underestimate the maximum diameter.

Figure 2. Stone diameter. The middle dashed line in each Bland–Altman difference plot shows the mean difference between the reference diameter and the measured value; the upper and lower dashed lines show the limits at ±2 standard deviations. Left: differences between the reference diameter and the diameter measured manually on CT scans. Right: differences between the reference diameter and the automated diameter measurement on CT scans (Δ = difference).
The mean automated radiologic volume was 1058 mm³ (range: 14–7690 mm³), and the mean calculated volume based on the radiologic diameter was 2252 mm³ (range: 22–18,817 mm³). The inter-rater correlation was >0.99 for automated radiologic stone volumetry and 0.91 for the volume calculated from the radiologic diameter.
The correlation between the reference value and the automated radiologic volume was 0.99, significantly better than the correlation between the reference values and the volumes calculated from manual diameter measurements (0.95, p = 0.001). Figure 3 shows Bland–Altman difference plots comparing the reference volume with the calculated and the automatically measured volumes. Errors in the calculated volume grow with increasing stone size, resulting in a tendency to overestimate stone volume.

Figure 3. Stone volume. The middle dashed line in each Bland–Altman difference plot shows the mean difference between the reference volume and the measured value; the upper and lower dashed lines show the limits at ±2 standard deviations. Left: differences between the reference volume and the calculated (“expected”) stone volume based on stone diameter, calculated with the formula for spheres:
$$V_{\text{expected}} = \frac{4}{3}\pi\left(\frac{d}{2}\right)^{3} = \frac{\pi d^{3}}{6}$$
Real patient data
For one patient who required laparoscopic removal of a kidney stone because of anatomy unfavorable for ESWL, URS, or PCNL (a growth-restricted individual with severe scoliosis and bone deformation), we were able to correlate the CT data with the ex vivo stone size. Part of this patient's CT scan is shown in Figure 4. The mean automated volume was 67.1 mm³, and the manually measured volume was 63 mm³.

Figure 4. AMULET software. The AMULET software in clinical practice: the radiologist clicks on the stone of interest, and the software instantly displays the most important information, including maximum and minimum diameter and volume. AMULET = AutoMated UroLithiasis Evaluation Tool.
Discussion
Stone size is a relevant factor when deciding whether a kidney stone requires therapy and, if so, which treatment modality is ideal for the specific case. 15 It is an important prognostic parameter for treatment success and duration of surgery, and even complication rates appear to depend on stone size. 6 In trials comparing two different treatment modalities, mean stone sizes should be similar in both study arms. Most available studies use maximum diameter as the size parameter. 16–18 More recent studies and nomograms refer to stone surface area. 4 However, volume should enable the most exact description of stone burden. For example, stone volume as measured by NCCT is the strongest predictor of stone-free status after ESWL and is more significant than stone diameter. 12 Three-dimensional volumetry of kidney stones on NCCT has been described in a technical note, but the software was not validated against alternative volume measurements in a large series, and the authors used only calcium oxalate stones. 19
In another publication, automated CT volumetry of kidney stones was described for measuring stone growth during active surveillance in vivo, but the authors did not validate it with ex vivo measurements. 20
We demonstrated that volume assessment is readily feasible with CT and that the results correlate well with manual reference measurements. The measurements of three independent radiologists confirmed the reproducibility of these results.
Most investigators use maximum or cumulative maximum diameter as the size parameter, which implicitly assumes a close correlation between diameter and actual stone burden.
We demonstrated, however, that the correlation between maximum diameter and actual volume is in fact not very close. For example, our series included stones with the same maximum diameter but volumes differing by a factor of 2 to 3 (Supplementary Table S1; Supplementary Data are available online at
To illustrate this, we calculated an expected volume from the maximum diameter and showed its poor agreement with the actual volume. In light of these data, it is quite possible that urolithiasis trials relying on maximum diameter might have delivered misleading results. Stone volume is not directly proportional to stone diameter, and stone diameter cannot be used as an approximation of volume. Studies using stone diameter as an inclusion or exclusion criterion risk comparing patients with very different actual stone burdens and thus delivering misleading results.
The use of maximum diameter or (in rare cases) a surface plane is attributable to the former use of plain abdominal radiography as a diagnostic tool for stone status. Before CT scans became widely used for stone diagnostics, stone volumetry was impossible. Today, low-dose NCCT is a standard diagnostic tool, and we should aim for the standardized use of volumetric tools, at least in clinical trials addressing urolithiasis. 15
Interestingly, the automated maximum diameter underestimated stone size compared with both the reference and the manually measured maximum diameter. Apparently, the maximum diameter algorithm calculates very conservative values. This configuration of the algorithm may reflect an intention to avoid overestimating the calculus' size and thereby prompting unnecessary therapies. However, the sometimes substantial underestimation of maximum diameter observed in our study could also have unfavorable therapeutic consequences in a clinical setting. The combination of excellent automated volume assessment and underestimated automated diameters shows that the software urgently needs to be optimized. The faulty maximum diameter measurements do not diminish our assessment of the excellent volumetry performance.
This study has some limitations. So far, the software tool we used is compatible only with DECT. It would be interesting to know whether results differ when automated measurements are taken on standard-dose CT as opposed to low-dose techniques with reconstruction; however, we did not investigate this and therefore cannot comment on it based on the available data. Our dual-energy evaluations were performed on CT data acquired with dose settings in the normal range and should thus not be considered low-dose studies. Therefore, in our clinical algorithm, we first perform a low-dose study and then perform the dual-energy analysis only over the area where a calculus was detected in the low-dose study. In this way, we combine maximum radiation protection with maximum information about stone burden.
If one kidney contains multiple stones, the radiologist has to click on each stone and add up the individual volumes.
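Summing per-stone volumes is straightforward; the Python sketch below does so for stones idealized as spheres (a hypothetical simplification, with hypothetical diameters) and also shows why cumulative maximum diameter is a poor proxy for total burden: kidneys with the same cumulative diameter can differ several-fold in total volume.

```python
import math


def sphere_volume_mm3(d_mm: float) -> float:
    """Volume of an idealized spherical stone of diameter d_mm."""
    return math.pi * d_mm ** 3 / 6


def total_burden(diameters_mm: list[float]) -> tuple[float, float]:
    """Return (cumulative diameter in mm, summed volume in mm^3) for one kidney."""
    return sum(diameters_mm), sum(sphere_volume_mm3(d) for d in diameters_mm)


# Hypothetical kidneys that all have a cumulative diameter of 10 mm.
for stones in ([10.0], [5.0, 5.0], [2.5, 2.5, 2.5, 2.5]):
    cumulative_d, volume = total_burden(stones)
    print(f"{stones}: cumulative diameter {cumulative_d:.1f} mm, total volume {volume:.0f} mm^3")
```

The three hypothetical kidneys share a 10 mm cumulative diameter, but their total volumes are roughly 524, 131, and 33 mm³.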
Future software developments should also support standard CT. It should become possible to draw a region of interest around the kidney and obtain automated stone volumetry for the entire kidney.
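The segmentation algorithm used by AMULET is not described here, so the following is only a generic Python sketch of how ROI-based volumetry for a whole kidney could work: voxels inside a user-drawn region of interest whose attenuation exceeds a threshold are grouped into face-connected (6-connected) components, and each component's voxel count times the voxel volume gives one stone volume. The function name, the 200 HU threshold, and the voxel volume are hypothetical; in practice, voxel volume would follow from pixel spacing and slice spacing.

```python
from collections import deque

# 6-connectivity: neighbours that share a face with the current voxel.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def stone_volumes_in_roi(hu, roi, threshold_hu=200.0, voxel_mm3=1.0):
    """Return one volume (mm^3) per connected group of stone voxels inside the ROI.

    hu  -- 3D nested list hu[z][y][x] of attenuation values (HU)
    roi -- 3D nested list of booleans, True inside the (hypothetical) kidney ROI
    """
    depth, rows, cols = len(hu), len(hu[0]), len(hu[0][0])

    def is_stone(z, y, x):
        return roi[z][y][x] and hu[z][y][x] >= threshold_hu

    visited = set()
    volumes = []
    for z in range(depth):
        for y in range(rows):
            for x in range(cols):
                if (z, y, x) in visited or not is_stone(z, y, x):
                    continue
                # Breadth-first flood fill collects one connected stone.
                visited.add((z, y, x))
                queue = deque([(z, y, x)])
                n_voxels = 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    n_voxels += 1
                    for dz, dy, dx in NEIGHBOURS:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < rows and 0 <= nx < cols
                                and (nz, ny, nx) not in visited
                                and is_stone(nz, ny, nx)):
                            visited.add((nz, ny, nx))
                            queue.append((nz, ny, nx))
                volumes.append(n_voxels * voxel_mm3)
    return volumes


# Toy 4 x 4 x 4 volume: background 30 HU with two separate "stones".
hu = [[[30.0] * 4 for _ in range(4)] for _ in range(4)]
hu[0][0][0] = hu[0][0][1] = 900.0   # two-voxel stone
hu[3][3][3] = 600.0                 # one-voxel stone
roi = [[[True] * 4 for _ in range(4)] for _ in range(4)]
volumes = stone_volumes_in_roi(hu, roi, threshold_hu=200.0, voxel_mm3=0.5)
print(volumes, sum(volumes))  # [1.0, 0.5] 1.5
```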
Notably, volume alone does not adequately describe a stone. When deciding on treatment modalities, other factors, such as the stone's exact location or its shape, should be considered; these are incorporated in prognostic nomograms such as Guy's stone score or S.T.O.N.E. 3,4 Nevertheless, volume should become part of the standard description of stones and patient groups in clinical trials to avoid bias and misleading results.
A genuine validation series for intracorporeal stones in real patients is lacking. However, such a validation is hardly feasible, because most stones >5 mm need to be fragmented before extraction. For this study, we used a phantom that mimics the abdomen of a normal adult and approximates it in terms of X-ray attenuation. Therefore, we do not expect results in clinical studies to differ significantly from those of our phantom study. However, further clinical studies are needed to verify this.
Automated stone volumetry is feasible and delivers precise results. There is no direct correlation between stone diameter and stone volume, and this lack of correlation can cause clinical investigations to deliver inaccurate findings. New software tools should be developed to facilitate standardized automated volumetry and to integrate volume into the description of stones and patient groups.
Footnotes
Acknowledgments
The study was supported by institutional funding (University Medical Center Freiburg); no external or industrial funding was received.
Ethical Approval
481/15, Ethics Committee of the University Medical Center Freiburg. Study register no.: DRKS00009403.
Authors' Contributions
K.W. and J.N. designed the study, performed experiments and statistics, and wrote the article. S.H., D.S., and F.A. carried out urologic experiments and took measurements. J.N., M.B., and B.F. conducted radiologic experiments and took measurements. A.H. helped to design the study and organized stone material. A.M., M.S., and M.L. supervised the project and helped with its design. All authors critically revised the article and approved the final version. They agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Author Disclosure Statement
The authors declare that no competing financial interests exist.
