Abstract
Background:
A 24-hour urine collection is central to the metabolic evaluation and prevention of nephrolithiasis. Despite its widespread use, methodological inconsistencies in data reporting and analysis limit the reliability, reproducibility, and clinical utility of findings. We aim to review and evaluate how 24-hour urine parameters are reported, analyzed, and interpreted in nephrolithiasis research.
Materials and Methods:
We conducted a methodological review of 264 studies involving 450,624 patients with nephrolithiasis. Data extraction covered urine collection protocols, parameter reporting, units used, statistical methods, and missing data handling.
Results:
Retrospective cohort studies comprised 42.8% of included articles; cross-sectional and prospective cohort studies made up 25.0% and 19.3%, respectively. Only 6.8% (18/264) of studies reported power calculations, and 40.9% (108/264) provided reference ranges. Calcium was the most frequently reported parameter (93.9%), followed by citrate (87.5%), oxalate (86.7%), and uric acid (84.1%). Supersaturation indices were reported in 44.3% of studies. Reporting formats varied: continuous units were used in 94.7% for supersaturation, 91.7% for calcium/creatinine ratio, and 77.9% for calcium. Most common units were mg/day (e.g., calcium: 78.9%) and mmol/day (e.g., sodium: 56.9%). Regarding statistical analysis, 72.3% of studies used t-tests or Mann–Whitney U tests, 39.4% used chi-square tests, and only 21.6% used multivariate regression. Missing data were handled by complete case analysis in 71.2% of studies, while 27.7% did not report their approach.
Conclusions:
There is significant variability in how 24-hour urine data are reported and analyzed across nephrolithiasis studies. This inconsistency undermines evidence synthesis, limits external validation, and obstructs the integration of advanced tools like artificial intelligence. We recommend creating a standardized reporting checklist to improve the rigor, reproducibility, and clinical relevance of future research.
Introduction
Nephrolithiasis is a common and recurring condition, with prevalence reaching 13% in some regions. Its incidence is increasing, likely due to climate, social, metabolic, dietary, and environmental factors.1 A 24-hour urine collection is a key step in metabolic assessment and stone prevention for individuals with recurrent or high-risk stones. It serves as a diagnostic tool to evaluate urinary solutes and their levels, enabling the detection of metabolic issues such as hypercalciuria. These results support targeted dietary, pharmacologic, and lifestyle interventions.2,3 The American Urological Association (AUA) and European Association of Urology (EAU) highlight the importance of performing 24-hour urine collections when appropriate; however, they do not specify standardized reporting formats or analysis protocols for interpreting the results.3,4
Despite the widespread use of 24-hour urine testing, there is significant methodological variation across studies in how 24-hour urine test results are handled in analytical models. These differences include, but are not limited to, the use of different measurement units (e.g., mg/day, mmol/day, or normalized to creatinine), inconsistent sample collection procedures, varying reference ranges based on different population demographics, and different ways of presenting data, such as raw values vs supersaturation indices.5–7 Discrepancies also exist in analytical approaches, mainly in the use of different regression models and inadequate adjustment for confounders like diet and hydration. Ultimately, this could affect the interpretation of the findings.8 Additionally, many studies do not adequately report how missing data are managed, which could introduce bias and limit the generalizability and reproducibility of the results.9
These inconsistencies highlight the need for a systematic, unified methodology that yields more reliable and applicable data. Variability in data reporting and analysis has important implications for both clinical practice and research. Clinically, inconsistently reported results can lead to misclassification of metabolic risk profiles, thus influencing dietary advice and other clinical decisions.8,10 Scientifically, the lack of standardized methodology hampers evidence synthesis and impedes the universal application of clinical guidelines.6,11 As a result, in the absence of harmonized practices in data collection, reporting, and statistical analysis, nephrolithiasis research remains fragmented. While many studies have explored the link between 24-hour urine chemistry and stone formation, none have systematically evaluated how these data are reported or analyzed within the nephrolithiasis literature.8,10,11 This gap is especially concerning given the central role of 24-hour urine testing in stone prevention. Therefore, this review aims to examine reporting and analysis inconsistencies, guide future standardization, and improve research quality.
Methods
Study design
We conducted a methodological review to assess how studies involving 24-hour urine collections in nephrolithiasis report parameters, units, data classification, statistical tests, and handling of missing data.
Eligibility criteria
We included original studies that reported 24-hour urine collection in patients with nephrolithiasis, including first-time stone formers, recurrent stone formers, and post-surgical patients. Eligible study designs comprised observational studies (prospective and retrospective cohorts, cross-sectional studies, and case–control studies) and interventional studies (randomized controlled trials). We excluded reviews, editorials, conference abstracts without full text, and studies unrelated to nephrolithiasis or urinary parameter analysis.
Information sources and search strategy
We searched PubMed, Scopus, and Web of Science using relevant search terms such as “24-hour urine,” “nephrolithiasis,” “urinary calcium,” “urinary citrate,” “hyperoxaluria,” “uric acid,” and “statistical analysis” up to January 2025. Additionally, we manually screened the reference lists of relevant reviews and included articles to ensure comprehensive coverage.
Study selection
Three reviewers (M.G.D., M.R., H.M.) independently screened titles and abstracts using Rayyan. After initial screening, we retrieved full texts for potentially eligible studies and assessed them according to predefined inclusion criteria. An additional reviewer (M.S.) resolved disagreements between reviewers.
Data extraction
Two reviewers independently extracted data from each study using a Google Form. We collected bibliographic details, study design, population type, sample size, number of groups, and whether a power calculation was reported. We recorded urine collection methodology, including the number of collections, compliance measures, and the presence of reference ranges. For each urinary parameter, we extracted whether it was reported, the units used, and whether values were presented as continuous, binary, or both. We documented descriptive statistics, statistical tests (e.g., t-test, analysis of variance [ANOVA], regression models), and how missing data were handled.
Outcomes and data synthesis
The primary outcome of this review was the degree of variability in the reporting and analysis of 24-hour urine parameters across studies. Specifically, we assessed the frequency of reported parameters, the measurement units used, the classification format (binary vs continuous), the types of statistical tests and models applied, and the handling of missing data. We summarized patterns across the included studies using descriptive statistics (frequencies and percentages).
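As a minimal sketch of this kind of descriptive summary, the snippet below converts study counts into percentages rounded to one decimal place. The helper name `percentage` and the dictionary layout are illustrative assumptions; the counts (18 and 108 of 264 studies) are the ones reported in the abstract.

```python
def percentage(count: int, total: int, digits: int = 1) -> float:
    """Return count/total as a percentage rounded to `digits` decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, digits)


TOTAL_STUDIES = 264

# Counts taken from the abstract: studies reporting each item.
reported = {"power calculation": 18, "reference ranges": 108}

# Frequency and percentage for each item, as in a descriptive summary table.
summary = {item: percentage(n, TOTAL_STUDIES) for item, n in reported.items()}

for item, pct in summary.items():
    print(f"{item}: {reported[item]}/{TOTAL_STUDIES} ({pct}%)")
```

Reporting the numerator and denominator alongside each percentage, as the loop does, lets readers recompute the figures.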
Results
Descriptive summary of included studies
A total of 264 studies, encompassing 450,624 participants, were included after screening 1416 studies (Fig. 1). Retrospective cohort studies were the most common design (42.8%), followed by cross-sectional (25.0%). The median sample size was 125.5 participants. Most studies involved comparisons between two groups. Interventional studies represented 23.9% of the total. Urine collection was performed once in 48.1% of studies, while multiple collections were less frequently reported. Only 6.8% of studies included a sample size calculation, and 40.9% reported normal reference ranges for urinary parameters. Among these, only 45.1% of studies provided explicit numerical cutoffs or defined biochemical abnormalities (e.g., hypercalciuria), and none of the mixed-age studies specified whether separate ranges were applied for adults and children. Missing data were primarily addressed through complete case analysis (71.2%), with limited use of multiple imputation (0.8%) or sensitivity analysis (0.4%). A substantial proportion (27.7%) did not report how missing data were handled (Table 1).

Fig. 1. Flowchart of included studies.
Table 1. Characteristics of Included Studies
Demographic and anthropometric variables were variably reported: body weight was included in 28.4% of studies, body mass index (BMI) in 52.7%, gender in 94.7%, and participant age in 95.5%. Most studies (74.6%) enrolled adult participants (≥18 years), whereas pediatric cohorts accounted for 13.3% and mixed-age studies for 8.0%; 2.7% did not specify the participant age range.
Reporting rates of 24-hour urine parameters
The reporting of individual 24-hour urine parameters varied considerably across the 264 included studies. Calcium was the most frequently reported parameter, included in 93.9% of studies, followed by citrate (87.5%), oxalate (86.7%), and uric acid (84.1%). Urine volume and pH were reported in 80.3% and 74.2% of studies, respectively. Electrolytes such as sodium (66.7%), magnesium (60.2%), phosphate (59.1%), and potassium (48.5%) were reported less consistently. Supersaturation indices were included in 44.3% of studies, whereas calcium/creatinine ratio and urinary protein were rarely reported, appearing in only 10.2% and 6.1% of studies, respectively. Additional parameters such as urinary ammonium (23.1%), sulfate (28.0%), urea nitrogen (18.9%), and chloride (26.1%) were also reported infrequently (Fig. 2).

Fig. 2. Reporting rates of 24-hour urine parameters.
Reporting conditions and analytical methodology
Dietary control, laboratory methods, and demographic reporting varied substantially across the 264 included studies. Most collections were performed under self-selected diets (73.2%), while only 18.8% of studies utilized fully controlled diets. A minority (7.6%) implemented partially modified or mixed protocols (e.g., standardized advice or inpatient hospital diets), and 1.9% did not specify the dietary conditions. Methods for urinary analyte measurement were described in 54.2% of studies, most commonly involving colorimetric, enzymatic, or chromatographic assays. Commercial laboratories performed the analyses in 39.9% of reports, whereas 58.9% of studies conducted assays within institutional laboratories.
Reporting format for 24-hour urine parameters: binary vs continuous
Twenty-four-hour urine parameters were primarily reported using continuous measures, such as milligrams or millimoles per day. Continuous reporting was most prevalent for supersaturation indices (94.7%), calcium/creatinine ratio (91.7%), phosphate (90.3%), sodium (90.1%), and pH (89.1%). Calcium, oxalate, citrate, and uric acid were also predominantly reported in continuous format, ranging from 75.0% to 79.2% of studies. A smaller proportion of studies presented these parameters in binary format (e.g., high/low or above/below threshold), with binary-only reporting ranging from 3.2% to 10.1% depending on the parameter. Additionally, 1.6%–14.9% of studies used both continuous and binary formats for the same parameter (Fig. 3).

Fig. 3. Reporting format of 24-hour urine parameters.
Measurement units used for reported 24-hour urine parameters
Reporting units for 24-hour urine parameters varied. Milligrams per 24 hours was the predominant unit for calcium, oxalate, citrate, and uric acid (78.9%, 77.9%, 79.2%, and 62.4%, respectively), whereas mmol/24 h was used for sodium (56.9%), potassium (49.2%), magnesium (27.6%), and oxalate (22.1%). Milliequivalents per 24 hours was used for sodium (31.9%) and potassium (39.3%) and less often for citrate, uric acid, and magnesium. Grams per 24 hours was used mainly for uric acid (16.0%) and phosphate (20.0%). Urine volume was reported mostly in mL/24 h (94.2%), and the calcium/creatinine ratio in mg/g (62.5%) (Table 2).
Table 2. Measurement Units Used for 24-Hour Urine Parameters
Statistical reporting methods used for 24-hour urine parameters
Most studies reported 24-hour urine parameters using mean and standard deviation, especially for calcium (72.3%), oxalate (72.1%), citrate (70.2%), uric acid (70.9%), sodium (74.5%), and other electrolytes, with similar patterns seen across parameters. The second most common method was median and interquartile range, used in 6.8%–10.5% of studies, depending on the parameter. A smaller number of studies presented mean or median values along with ranges, such as mean (range) and median (range), but these formats were less frequent, ranging from 0.7% to 3.6%. Some studies reported only the central tendency without measures of variability, such as mean alone, while others provided ranges only or used categorical frequencies to describe value distribution. Notably, urine parameters like potassium, phosphate, and supersaturation indices showed more variation in reporting, relying more on ranges or categorical methods (Table 3).
Table 3. Statistical Reporting Methods for 24-Hour Urine Parameters
IQR = interquartile range.
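The two most common summary formats described above can give different impressions of the same data. The following sketch, using Python's standard `statistics` module, computes both for a small set of urinary calcium values; the values are hypothetical and chosen only to include one high outlier.

```python
import statistics

# Hypothetical 24-hour urinary calcium values (mg/day) with one high outlier.
calcium_mg_day = [120, 150, 180, 200, 210, 260, 480]

# Format 1: mean and sample standard deviation.
mean = statistics.mean(calcium_mg_day)
sd = statistics.stdev(calcium_mg_day)

# Format 2: median and interquartile range (quartiles by linear interpolation).
median = statistics.median(calcium_mg_day)
q1, _, q3 = statistics.quantiles(calcium_mg_day, n=4, method="inclusive")

print(f"mean (SD): {mean:.1f} ({sd:.1f}) mg/day")
print(f"median (IQR): {median} ({q1}-{q3}) mg/day")
```

In this skewed sample the mean lies above the median, which is one reason a study's choice of summary format matters when results are pooled or compared.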
Statistical tests and regression models used in 24-hour urine parameters
Various statistical methods were used to analyze 24-hour urine parameters across the included studies. The most common tests were Student’s t-test or Mann–Whitney U test, reported in 72.3% of studies. Chi-square or Fisher’s exact tests were used in 39.4% of studies, while ANOVA and correlation analyses (Pearson or Spearman) were each employed in 23.1%. The Kruskal–Wallis test was rarely used (6.8%). Regression models were less frequently applied; linear regression appeared in 19.3% of studies, multivariate regression in 21.6%, and logistic regression in 14.4%. Only 2.7% of studies used Cox proportional hazards models (Table 4).
Table 4. Statistical Tests and Regression Models Used in 24-Hour Urine Parameter Analyses
Discussion
This methodological review aimed to identify inconsistencies in how 24-hour urine collections are reported and analyzed in nephrolithiasis research. Its goal is to guide future efforts toward standardization and improve research quality, considering the essential role of 24-hour urine testing in preventing stones. The main finding is that, despite the common use of 24-hour urine testing, there is significant variation in methodology across studies. These differences highlight the need for a systematic, unified approach to ensure more reliable and applicable data, especially given the variability in reporting and analysis. The review included 264 original studies with a total of 450,624 participants. Retrospective cohort studies were the most common type (42.8%), followed by cross-sectional studies (25.0%). The median sample size was 125.5 participants, and most studies compared two groups.
Twenty-four-hour urine collection is a key component in the metabolic evaluation and prevention of kidney stones (nephrolithiasis).12 It provides vital data on urinary solutes and their levels, helping to identify metabolic abnormalities such as hypercalciuria.1 These findings are crucial for guiding personalized treatment and prevention strategies, including dietary, medication, and lifestyle changes.2,3 Despite its importance and widespread use, major urological associations like the AUA and EAU have not established comprehensive guidelines for standardized reporting formats or analytical methods for interpreting these results.3,4 This lack of a unified approach leads to significant variation in methodology across studies, ultimately affecting the usefulness and broader applicability of research and clinical care. A key aspect of this inconsistency is the variety of measurement units used to report urinary parameters. Studies often present results using different units, such as milligrams per day (mg/day), millimoles per day (mmol/day), or values normalized to creatinine. For example, common stone-risk indicators like calcium, oxalate, citrate, and uric acid are frequently reported in milligrams per 24 hours. This unit diversity creates a clear obstacle to comparing findings across different studies and complicates efforts to synthesize evidence for meta-analyses or systematic reviews.
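To illustrate what harmonizing units involves, the sketch below converts between mg/day, mmol/day, and mEq/day for a few simple ions. The molar masses and valences are standard chemistry values rather than figures from the reviewed studies, and the function names are hypothetical. Oxalate, citrate, and uric acid are left out on purpose, because the appropriate molar mass depends on the chemical form a laboratory reports.

```python
# Approximate molar mass (g/mol) and ionic valence for each analyte.
# Standard chemistry values; not taken from the reviewed studies.
ANALYTES = {
    "calcium": (40.08, 2),
    "sodium": (22.99, 1),
    "potassium": (39.10, 1),
    "magnesium": (24.31, 2),
}


def mg_to_mmol(analyte: str, mg_per_day: float) -> float:
    """Convert a daily excretion from mg/day to mmol/day."""
    molar_mass, _ = ANALYTES[analyte]
    return mg_per_day / molar_mass


def mmol_to_mg(analyte: str, mmol_per_day: float) -> float:
    """Convert a daily excretion from mmol/day to mg/day."""
    molar_mass, _ = ANALYTES[analyte]
    return mmol_per_day * molar_mass


def mmol_to_meq(analyte: str, mmol_per_day: float) -> float:
    """Convert mmol/day to mEq/day by multiplying by the ionic valence."""
    _, valence = ANALYTES[analyte]
    return mmol_per_day * valence


# Example: express 250 mg/day of calcium in the other two units.
calcium_mmol = mg_to_mmol("calcium", 250)
calcium_meq = mmol_to_meq("calcium", calcium_mmol)
print(f"250 mg/day calcium = {calcium_mmol:.2f} mmol/day = {calcium_meq:.2f} mEq/day")
```

Conversions like these are only possible when a study states its units explicitly, and creatinine-normalized values cannot be converted back without the creatinine excretion itself.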
Beyond unit discrepancies, inconsistencies are also evident in urine collection methods and how their adequacy is assessed. While guidelines recommend that collections reflect patients’ typical diet and daily activity to capture relevant stone-forming risk factors, adherence to such protocols varies.13–15 Some experts suggest that initial evaluations should include at least two collections (one on a workday and another on a non-working day) to account for environmental and dietary fluctuations.13,15 However, the optimal number of 24-hour urine collections remains debated.12,16 Studies have documented significant day-to-day variability in urine chemistries; for instance, nearly one in three patients may show a 30% or greater difference in calcium or volume between two consecutive samples.12 Even when collected on back-to-back days, variability can remain substantial, often driven by diet and environmental influences.12,17 Evidence indicates that relying on a single collection may misclassify stone risk in a considerable proportion of patients: one study found that 17.1%–47.6% experienced a shift from normal to abnormal values, or vice versa, across 11 urinary parameters, potentially altering clinical decisions.12,17,18 Conversely, other studies have found no statistically significant differences in mean values between consecutive samples, leading to differing conclusions about the need for multiple collections.16,19 Despite ongoing controversy, both international consensus statements and the Canadian Urological Association guidelines recommend obtaining two 24-hour urine collections during the initial metabolic evaluation of patients with nephrolithiasis.15,20
The adequacy of a 24-hour urine collection is mainly evaluated through creatinine excretion, as creatinine, a waste product of muscle metabolism, is produced at a relatively stable rate, and its total excretion over 24 hours should be consistent.21 However, definitions of an “adequate” collection vary. A common method involves assessing creatinine per kilogram within specific reference ranges, usually 15–20 mg/kg for females and 18–24 mg/kg for males.22,23 This approach, however, has notable limitations, as it can be affected by BMI, muscle mass, and age, which may lead to misclassification of adequacy.22–24 Studies have shown that more than half of 24-hour urine collections may fall outside these creatinine per kilogram reference ranges.22 Women are more likely to have inadequate samples, possibly due to anatomical and mechanical factors.22 Another method for assessing adequacy compares total 24-hour urine creatinine values between consecutive collections, where differences exceeding an arbitrary threshold (e.g., 10%–20%) might indicate an inadequate collection. However, some studies suggest that even larger differences, up to 40%, could still be acceptable, highlighting the need for clearer guidelines.12,18,22 The lack of standardized reference ranges in this approach further complicates clinical interpretation. In pediatric populations, researchers have used the percentage difference in urine creatinine between two samples as a practical indicator of adequacy.17 While no method is perfect, comparing day-to-day creatinine excretion may provide a more reliable measure of collection quality than relying solely on absolute values.12,13 Notably, even though creatinine plays a central role in assessing collection adequacy, only 10.2% of studies in our review reported the calcium/creatinine ratio (Table 2).
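The two adequacy checks described above can be sketched as follows. The reference ranges are those quoted in the text; the function names are hypothetical, and the percent-difference formula (absolute difference relative to the mean of the two collections) is one assumed definition, since studies may compute it differently.

```python
# Creatinine reference ranges (mg per kg body weight per day) quoted in the text.
CREATININE_RANGE_MG_PER_KG = {"female": (15.0, 20.0), "male": (18.0, 24.0)}


def creatinine_per_kg(creatinine_mg_day: float, weight_kg: float) -> float:
    """24-hour creatinine excretion normalized to body weight."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return creatinine_mg_day / weight_kg


def within_reference_range(creatinine_mg_day: float, weight_kg: float, sex: str) -> bool:
    """Check 1: is creatinine per kg inside the sex-specific range?"""
    low, high = CREATININE_RANGE_MG_PER_KG[sex]
    return low <= creatinine_per_kg(creatinine_mg_day, weight_kg) <= high


def percent_difference(first_mg_day: float, second_mg_day: float) -> float:
    """Check 2: absolute difference between two collections, as a percentage
    of their mean (an assumed definition, not one fixed by the literature)."""
    mean = (first_mg_day + second_mg_day) / 2
    return 100 * abs(first_mg_day - second_mg_day) / mean


# Hypothetical patient: 70 kg male with two consecutive collections.
print(within_reference_range(1400, 70, "male"))
print(round(percent_difference(1400, 1150), 1))
```

Because the acceptable difference between collections is variously given as 10%, 20%, or as much as 40%, the sketch returns the raw percentage rather than a pass/fail verdict and leaves the cutoff to the study protocol.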
Urinary parameter reference ranges vary with demographics, age, and laboratory standards, complicating interpretation.23 Traditional hypercalciuria thresholds (≥250 mg/day for women, ≥300 mg/day for men) are increasingly questioned, as risk may be continuous rather than dichotomous. Definitions based on arbitrary thresholds often do not reflect true pathology, because risk can increase within the normal range.8,25 Studies differ in data presentation (some use continuous measures, others binary formats) and employ inconsistent analytical methods, with inadequate adjustment for confounders like diet and hydration.8 Missing data handling is also poorly addressed: most studies (71.2%) used complete case analysis, few employed advanced techniques such as multiple imputation (0.8%) or sensitivity analysis (0.4%), and 27.7% did not report their approach, reducing transparency and robustness. Overall, differences in ranges, presentation, methods, and data handling make interpretation difficult and emphasize the need for standardization. Only a minority of studies described how normal ranges were established or whether they accounted for age-related differences, potentially affecting comparability between adult and pediatric populations. The lack of consistent reporting on how reference values are derived underscores the need for age- and sex-specific normalization in future studies.
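The loss of information from binary reporting can be shown with a short sketch. It applies the traditional hypercalciuria thresholds quoted above to three hypothetical patients; the function name and patient values are illustrative assumptions.

```python
# Traditional hypercalciuria thresholds (mg/day) quoted in the text.
HYPERCALCIURIA_THRESHOLD_MG_DAY = {"female": 250, "male": 300}


def is_hypercalciuric(calcium_mg_day: float, sex: str) -> bool:
    """Dichotomize 24-hour urinary calcium at the sex-specific threshold."""
    return calcium_mg_day >= HYPERCALCIURIA_THRESHOLD_MG_DAY[sex]


# Hypothetical patients: (label, sex, 24-hour urinary calcium in mg/day).
patients = [("A", "female", 249), ("B", "female", 251), ("C", "female", 400)]

for label, sex, calcium in patients:
    status = "hypercalciuric" if is_hypercalciuric(calcium, sex) else "normal"
    print(f"Patient {label}: {calcium} mg/day -> {status}")
```

Patients A and B differ by 2 mg/day yet fall into different categories, while B and C differ by 149 mg/day and share one. A study that reports only the binary label discards this distinction, which is why reporting the continuous value alongside any threshold-based category helps later synthesis.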
The lack of standardized methods in 24-hour urine studies hinders the development of strong, evidence-based findings in nephrolithiasis. Variations in data collection, reporting, statistical analysis, and how missing data are handled limit comparison between studies, impede replication, and reduce the applicability of results. These methodological issues also obstruct external validation, multicenter trials, and the integration of advanced tools like artificial intelligence, which depend on structured, high-quality data.
To enhance comparability and reproducibility across future studies, standardized reporting of 24-hour urine data is warranted. At minimum, studies should specify the dietary conditions under which urine was collected (controlled vs self-selected), the analytical methods and laboratory source used, and the reference ranges applied. Reporting of participant demographics (age, sex, BMI, body weight) should also be standardized, with particular attention to studies that include both adult and pediatric participants, which should clearly distinguish age-specific reference ranges and analyses. Adoption of these elements into a unified reporting checklist would substantially improve methodological transparency and facilitate meta-analytic synthesis. Our methodological review, including 264 studies and over 450,000 participants, is the first to thoroughly examine how 24-hour urine data are managed in nephrolithiasis research. It uncovers significant variation in units, thresholds, reference ranges, and analytic methods. However, the review is limited because it depends on reported results, so undocumented practices (such as collection protocols, unit conversions, or imputation techniques) remain unknown, which might underestimate the true variability. Due to differences among studies, a meta-analytic synthesis was not possible.
We conclude that standardization is urgently needed. We call upon the urology research community to establish a consensus-based reporting checklist for 24-hour urine studies, analogous to existing biomarker and diagnostic reporting frameworks. Such a tool would strengthen reproducibility, enhance patient care, and facilitate the adoption of emerging data-driven technologies in stone disease research.
Conclusions
This methodological review is the first comprehensive analysis of the widespread inconsistencies in how 24-hour urine studies in nephrolithiasis are reported. This variability impairs research replication, undermines evidence-based medicine, and hinders the integration of artificial intelligence and big data into trials. The urological community must collaborate to create a universal checklist for reporting these parameters. Such efforts will improve research reliability, reproducibility, and patient care.
Authors’ Contributions
M.G.D.: Conceptualization, methodology, formal analysis, data curation, writing—original draft, visualization, and project administration. A.N.A.-F. and A.H.: Validation, data curation, and writing—review and editing. M.R.: Methodology, investigation, data curation, and writing—review and editing. M.A.A.-O.: Software and writing—review and editing. H.M.: Resources, investigation, and writing—review and editing. B.A.B.I. and H.H.: Investigation, visualization, and writing—review and editing. A.T.A.-S.: Investigation and writing—review and editing. T.M.: Visualization and writing—review and editing. M.S.: Supervision, validation, writing—review and editing, and project administration. All authors approved the final version of the article.
Footnotes
Acknowledgments
The authors would like to thank Dr. Jamie Landman and Dr. Roshan Patel for their critical review.
Author Disclosure Statement
The authors declare no conflicts of interest.
Funding Information
No funding was received.
Data Availability
The data that support the findings of this study are available upon request from the corresponding author.
