Abstract

We are writing to commend the recent publication by Kun-peng Li, Li Wang, and colleagues1 titled “Enhanced Artificial Intelligence in Bladder Cancer Management: A Comparative Analysis and Optimization Study of Multiple Large Language Models.” This study is a timely and valuable contribution to the evolving intersection of artificial intelligence (AI) and oncology, particularly in the specialized domain of bladder cancer (BLCA) management. The authors’ meticulous approach to evaluating and optimizing large language models (LLMs) offers a compelling framework that we believe warrants recognition and further consideration by the academic and clinical communities.
The study’s design is notably robust. The authors developed a comprehensive set of 100 clinical questions derived from established guidelines, covering critical aspects of BLCA such as epidemiology, diagnosis, treatment, prognosis, and follow-up. We find this breadth particularly commendable, as it allows a holistic assessment of LLMs’ capabilities across the range of questions clinicians actually face. By testing six prominent models, namely Claude-3.5-Sonnet, ChatGPT-4.0, Grok-beta, Gemini-1.5-Pro, Mistral-Large-2, and generative pre-trained transformer (GPT)-3.5-Turbo, across multiple trials and validating responses against clinical guidelines and expert consensus, the authors provide a rigorous comparative analysis. The results, which highlight Claude-3.5-Sonnet’s leading accuracy (89.33 ± 1.53%) and GPT-3.5-Turbo’s initial underperformance (74.33 ± 3.06%), offer a clear benchmark for future research in this field.
What we find especially praiseworthy is the authors’ innovative two-phase training optimization of GPT-3.5-Turbo, which ultimately raised its accuracy from a baseline of 74.33% to 100%. This achievement not only underscores the potential of strategic refinement to overcome LLMs’ limitations but also provides a practical roadmap for enhancing AI tools in specialized medical applications. We appreciate that the authors present this success as having broader implications for health care AI, suggesting that targeted optimization could bridge performance gaps in other domains of oncology and beyond. Their conclusion, that initial limitations need not be permanent barriers, is both insightful and encouraging for researchers seeking to integrate AI into clinical practice.
While the study’s focus on BLCA is a strength, we respectfully suggest that its findings could inspire similar investigations across other cancer types, further expanding the applicability of LLMs in oncology. The transparency of the methodology and the emphasis on clinical relevance make this work a model for future studies. We also applaud the collaborative effort of the research team, whose diverse expertise is reflected in the study’s comprehensive scope and high-quality execution.
We believe this article is a significant step forward in demonstrating the transformative potential of AI in health care. We congratulate Li, Wang, and their co-authors on their thoughtful and impactful research, and we encourage the Journal of Endourology to regard this commentary as a testament to the study’s merit. It is our hope that this work will stimulate further exploration and refinement of LLMs, ultimately benefiting clinicians and patients alike.
Footnotes
Authors’ Contributions
R.M. and R.S. provided critical comments on methodological aspects. R.M. and S.K. wrote and edited the draft.
Author Disclosure Statement
The authors report no conflicts of interest.
Funding Information
No funding was received.
