Abstract

Dear Editor,
We would like to comment on “Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.” 1 In the study, 24 simulated patient queries covering seven vascular surgery disease domains were posed to OpenAI’s ChatGPT-3.5 and Google Bard. In addition to readability scoring, six vascular surgery faculty members with extensive expertise assessed the responses for accuracy, completeness, and appropriateness. According to the findings, responses from ChatGPT were judged to be more thorough and accurate than those from Bard. Furthermore, the majority of ChatGPT responses were judged appropriate, whereas several reviewers found a sizable percentage of Bard responses to be incorrect.
A methodological weakness of the study was its reliance on the subjective evaluations of a small number of specialists to assess the AI responses. This may have introduced bias or inconsistency into the grading. The absence of a systematic evaluation framework may also have reduced the reliability of the results. To make the findings more robust, future research would benefit from a larger and more varied panel of reviewers to evaluate AI-generated responses.
Another weakness is the small number of reviewers and simulated patient queries, which may limit how broadly the findings can be applied. Furthermore, the study did not investigate whether AI performance differed according to the complexity of the questions or the type of disease. To give a more thorough understanding of AI capabilities in vascular surgery, future research might examine these issues.
Future research could take new directions by refining the standards for evaluating AI responses and adding more metrics to rate relevance and quality. To determine which AI models are best suited to a given clinical application, it could also be helpful to assess how well they perform across a range of medical specialties. The accuracy, comprehensiveness, and suitability of AI algorithms for delivering medical information could be further improved by monitoring them continuously and modifying them in response to feedback from specialists and end users.
Footnotes
Author contributions
HP partially contributed to ideas, writing, analyzing, and approval of the manuscript and VW partially contributed to ideas, supervision, and approval of the manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
