Beyond ChatGPT: It Is Time to Focus More on Specialized Medical LLMs

Abstract

Dear Editor,

We have read with interest the recent articles published in the Journal of Endourology on the application of ChatGPT in the medical field. As social science researchers specializing in medical artificial intelligence (AI), we believe that, despite ChatGPT's current popularity across many domains, medical professionals should not focus too narrowly on this particular technology. Instead, more attention and discussion should be directed toward specialized medical large language models (LLMs).

Firstly, ChatGPT's underlying technical logic relies on the generative pre-trained transformer (GPT) mechanism. This mechanism uses the transformer architecture to learn from large amounts of unannotated text through self-supervised learning, and then generates contextually relevant text by repeatedly predicting the next word from the preceding context. This method of generating text is therefore based primarily on language pattern prediction rather than on direct fact retrieval. Although ChatGPT-4 has shown better performance than ChatGPT-3.5 in answering medical questions,¹ this improvement only reflects its enhanced language generation capability, which allows it to produce answers that seem more plausible. These answers are not necessarily grounded in accurate medical knowledge, and the accuracy of its output remains a challenge.²

Secondly, ChatGPT's training data come mainly from public sources, including online articles, books, and Wikipedia.³ These sources are not designed specifically for the medical field and may contain errors or incomplete information. The broad, nonspecialized nature of these data limits the model's ability to handle complex medical issues. Although such data may serve as a basis for a general frequently asked questions (FAQ) repository,⁴ their use in highly specialized medical domains may produce misleading answers, with serious consequences in some practical applications.

To use artificial intelligence more effectively and responsibly in medical practice, future research and development should focus on specialized medical LLMs rather than on ChatGPT. These models should be trained on authoritative medical databases, with rigorous human validation to ensure the accuracy and completeness of the data. Specialized medical LLMs can provide more precise and contextually relevant medical advice, significantly enhancing the quality of patient care and medical education.⁵ By combining open-access datasets with retrieval-augmented generation (RAG) techniques, which supply the model with relevant documents retrieved from a knowledge source at the time it answers, specialized medical LLMs can cover domain-specific knowledge more thoroughly and give more reliable and relevant responses. Only through such specialized development can AI realize its full potential in the medical field.

References

1. Şahin, Genç, Doğan, et al. Evaluating the performance of ChatGPT in urology: A comparative study of knowledge interpretation and patient guidance. J Endourol, 2024; doi: 10.1089/end.2023.0413

2. Kiriakedis, Duty, Chase, et al. Using ChatGPT-4 to analyze 24-hour urine results and generate custom dietary recommendations for nephrolithiasis. J Endourol, 2024; doi: 10.1089/end.2024.0055

3. Thompson. What's in my AI? Available from: https://lifearchitect.ai/whats-in-my-ai/ [Last accessed: June 3, 2024].

4. Javid, Bhandari, Parameshwari, et al. Evaluation of ChatGPT for patient counseling in kidney stone clinic: A prospective study. J Endourol, 2024; 38(4):377–383; doi: 10.1089/end.2023.0571

5. Connors, Gupta, Khusid, et al. Evaluation of the current status of artificial intelligence for endourology patient education: A blind comparison of ChatGPT and Google Bard against traditional information resources. J Endourol, 2024; doi: 10.1089/end.2023.0696