Abstract
Background
Large language models (LLMs) are increasingly used to generate clinical recommendations, but whether they can translate biliary guidelines into safe procedural triage remains uncertain. We evaluated next-generation LLMs for determining the indication for endoscopic retrograde cholangiopancreatography (ERCP) in suspected choledocholithiasis and estimated the simulated workflow impact of their triage errors.
Methods
We conducted a cross-sectional in silico diagnostic accuracy study from May 14 to May 18, 2026. One hundred locked synthetic vignettes were assigned reference-standard labels based on American Society for Gastrointestinal Endoscopy (ASGE) and European Society of Gastrointestinal Endoscopy (ESGE) guidelines: 45 ERCP-indicated and 55 nonindicated cases. GPT-5.5, Gemini 3.0 Pro, and Claude 4 Opus were each queried with the same zero-shot prompt at a temperature of 0.0. Outcomes included accuracy, sensitivity, specificity, kappa, error phenotype, and simulated delay from under-triage.
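The outcome measures listed above all derive from a 2 × 2 table of each model's decision against the reference standard. The minimal Python sketch below computes them, assuming binary outputs (ERCP recommended or not) and Cohen's kappa for agreement; the `Confusion` class and the `wilson_ci` and `summarize` functions are hypothetical names for illustration. The abstract does not name its confidence-interval method, but the Wilson score interval used here reproduces all three reported accuracy intervals (for example, 96 of 100 correct gives 90.2% to 98.4%).

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2 x 2 table: model decision vs. reference standard."""
    tp: int  # ERCP-indicated, model recommended ERCP
    fn: int  # ERCP-indicated, model did not recommend ERCP (under-triage)
    fp: int  # nonindicated, model recommended ERCP (over-triage)
    tn: int  # nonindicated, model did not recommend ERCP

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def summarize(c: Confusion) -> dict[str, float]:
    n = c.n
    accuracy = (c.tp + c.tn) / n
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    # Chance agreement from the reference-standard and model marginals.
    ref_pos, ref_neg = (c.tp + c.fn) / n, (c.fp + c.tn) / n
    pred_pos, pred_neg = (c.tp + c.fp) / n, (c.fn + c.tn) / n
    p_e = ref_pos * pred_pos + ref_neg * pred_neg
    kappa = (accuracy - p_e) / (1 - p_e)  # Cohen's kappa
    ci_low, ci_high = wilson_ci(c.tp + c.tn, n)
    return {
        "accuracy": accuracy,
        "ci_low": ci_low,
        "ci_high": ci_high,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Counts inferred from the abstract for Claude 4 Opus (see note below).
    claude = Confusion(tp=36, fn=9, fp=7, tn=48)
    for name, value in summarize(claude).items():
        print(f"{name}: {value:.3f}")
```

For illustration, the Claude 4 Opus counts are inferred from the abstract (45 indicated cases, 9 under-triage errors, 16 total errors) under the assumption that every remaining error was an over-triage. With those inferred counts the sketch gives kappa ≈ 0.68 and a Wilson interval of about 75.6% to 89.9%, consistent with the reported values.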
Results
GPT-5.5 achieved the highest accuracy (96.0%; 95% CI, 90.2%-98.4%), followed by Gemini 3.0 Pro (90.0%; 95% CI, 82.6%-94.5%) and Claude 4 Opus (84.0%; 95% CI, 75.6%-89.9%). Agreement with the reference standard was almost perfect for GPT-5.5 (kappa = 0.92) and substantial for Gemini 3.0 Pro (kappa = 0.80), but lower for Claude 4 Opus (kappa = 0.68). GPT-5.5 was significantly more accurate than Claude 4 Opus (McNemar P = .004). Claude 4 Opus produced the most under-triage errors (n = 9) and the largest cumulative simulated delay (163.8 hours per 100 vignettes; Kruskal-Wallis P = .007).
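The paired comparison above uses McNemar's test, which depends only on the discordant vignettes: those one model answered correctly and the other did not. The overall error counts (4 for GPT-5.5 and 16 for Claude 4 Opus) fix the difference between the two discordant counts at 12 but not the counts themselves, so the P value depends on how much the two models' errors overlap. The minimal sketch below implements the exact (binomial) form of the test; the abstract does not state whether the exact or chi-square form was used, and the function name `mcnemar_exact` is hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the discordant counts.

    b: vignettes model A classified correctly and model B incorrectly
    c: vignettes model A classified incorrectly and model B correctly
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant vignette is equally likely to favor either model,
    # so min(b, c) ~ Binomial(n, 0.5); double the lower tail for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical splits, both consistent with 4 vs. 16 total errors.
    print(f"b=14, c=2: P = {mcnemar_exact(14, 2):.4f}")
    print(f"b=12, c=0: P = {mcnemar_exact(12, 0):.4f}")
```

With hypothetical discordant counts b = 14 and c = 2 (both models wrong on the same 2 vignettes), the exact test gives P ≈ .004; if GPT-5.5's errors were a strict subset of Claude 4 Opus's (b = 12, c = 0), it would give P ≈ .0005. The abstract reports neither split.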
Conclusion
Next-generation LLMs can approximate guideline-based ERCP triage, but clinically meaningful differences between models emerge when errors are weighted by procedural delay and safety. GPT-5.5 showed the most balanced profile. Conservative under-triage, in which ERCP is withheld despite a guideline indication, remains the key hazard and warrants clinician supervision.
