Chatting together: Using AI chatbots to improve diagnostic excellence

Abstract

The day she was turned away by a dismissive emergency room doctor, Susan Sheridan turned to ChatGPT for the first time. The Idaho woman typed in her symptoms: “facial droop, facial pain, dental work.”¹ The program immediately suggested a few diagnoses, including Bell's palsy. Armed with this possibility, she returned to the emergency room. A new doctor confirmed the diagnosis and suggested that shingles, triggered by the dental work, may have played a role. She was treated with steroids and antivirals and recovered rapidly.

The importance of diagnostic excellence is indisputable.² Diagnosis plays a central role in medicine and health, influencing prevention, treatment, and recovery. Early diagnosis can detect a disease while curative treatment is still available, and accurate diagnosis can make treatment more efficient and improve patient outcomes.

More and more, patients across the country are going online and beginning to use artificial intelligence (AI) to help them diagnose their ailments. A poll from the Kaiser Family Foundation found that 17% of U.S. adults say they use AI chatbots (computer programs that use AI to simulate conversation with humans) at least once a month to find health information and advice.³ That figure was higher (25%) among adults under age 30.

Although this advance seems to have appeared de novo, it is more evolutionary than revolutionary. Beginning in 1970, Jack D. Myers, a University of Pittsburgh physician widely regarded as one of the best diagnosticians of his time, worked with computer scientist Harry Pople to develop one of the first computer programs designed to extend a doctor's abilities.⁴ This so-called “expert system,” INTERNIST-I, was modeled on Myers's own reasoning and decision-making steps. The first version could match 3,550 symptoms with more than 500 diseases. A later system, Quick Medical Reference (QMR), scanned a database of information drawn from medical research reports.⁵

Other systems have followed, such as DXplain,⁶ VisualDx,⁷,⁸ and Isabel.⁹ Isabel converts a patient's presenting clinical features into a list of relevant diseases, or into triage advice based on responses to a limited set of questions. It has been trained for over two decades on how diseases present, using both evidence-based knowledge and feedback from practicing clinicians, and it now uses machine-learning-based technology to generate its list of likely diseases.⁹

In parallel with developments in technology aimed primarily at clinicians, patients have increasingly researched their symptoms or conditions before arriving at the clinic. This behavior has drawn pushback from clinicians already overburdened by patient queries; there is evidence that the onslaught of online patient messages is a major driver of physician burnout. “Don’t confuse your Google search with my medical degree” is a meme that has struck a chord among many clinicians, a retort to patients who challenge the assessments of medical experts. This sentiment was magnified by the erroneous and conflicting information and advice that circulated during the COVID-19 pandemic.¹⁰ Now, physicians may also feel that their authority is being challenged when confronted with patient questions and demands.

This gradual evolution in how patients seek information was punctuated emphatically in 2022 by the arrival of ChatGPT. Since then, a growing number of patients have been using it and related AI tools to research their symptoms and conditions.

ChatGPT and similar chatbots are built on large language models (LLMs): very large deep neural networks that are pretrained on vast amounts of text and are capable of generating language. They can mimic human conversation and return detailed responses to questions almost instantly.

AI chatbots are capable of impressive diagnostic feats.¹¹,¹² In an early study, GPT-4 reached the correct diagnosis in 39% of NEJM case challenges.¹³ This compared favorably with the performance of expert clinicians. Anecdotally, ChatGPT diagnosed a four-year-old boy with pain, teeth grinding, and a dragging leg whose condition had stumped 17 specialists over three years. The correct diagnosis was tethered cord syndrome caused by occult spina bifida, which was surgically corrected.¹⁴

The medical knowledge encoded in GPT-4 may be used for tasks that include diagnosis and treatment. When given questions about a patient's presentation, GPT-4 can provide responses that may help clinicians address the problem at hand. The system is interactive, and the user can ask follow-up questions to reach an answer.¹⁵

On the other hand, LLMs have demonstrated shortcomings in specific clinical applications.¹⁶ A study by Chen et al. showed that despite producing coherent-sounding responses, LLMs can provide inaccurate information.¹⁷ When asked for cancer treatment recommendations, ChatGPT-3.5 yielded unreliable results, mixing incorrect recommendations in with correct ones in about a third of cases. A review by Fraser et al. concluded that unsupervised patient use of ChatGPT for diagnosis and triage is not recommended without improvements in accuracy and extensive clinical evaluation.¹⁸

Despite these limitations, patients armed with recommendations from AI chatbots now arrive at their physician visits with greater certainty about their diagnosis and what they need. The information they have obtained can appear to provide a complete picture of the diagnosis and treatment options. However, some of that information may be incomplete or misleading.

Clinicians, already overburdened at work, tend to be ambivalent about this new information and these new hypotheses. In their view, patients' suggestions often fail to take important contextual information into account or are inconsistent with current practice. On the other hand, a good physician knows that it is important to know what you don’t know. In my own primary care practice, I am smarter as a physician because I am surrounded by AI: in this instance, the actual intelligence of my clinical colleagues. I also use electronic search tools in the clinic, including AI chatbots, to help me think and, when making a diagnosis, to consider possibilities I may have overlooked.

What should we do now? The best AI chatbots will inevitably become more reliable. As AI features are increasingly incorporated into internet search engines, browsers, and platforms such as Bing, Google, Safari, and Facebook, people will not need to deliberately log on to dedicated AI chatbots to use them. They may even be unaware that they are using them.

At least three courses of action could be useful: training patients, training providers, and training together. The overarching goal would be for AI and other sources of information to improve patients’ ability to manage their own health and to improve how patients work with doctors. Ideally, this would be done without adding extra time to a patient encounter.

Train patients

Patients should be encouraged and empowered to educate themselves about their ailments. They should also understand that before treating themselves, they should nearly always discuss the situation with a clinician. Each individual's case has its own variations and subtleties, and the right treatment is not something an AI chatbot can necessarily provide.

Patients also need to be made aware of the tools' limitations. Although chatbots can be a valuable source of information and education, they can also lead to overconfidence and even harm. To be most useful, they need to be used with care: how questions are entered matters, and the results should be understood as preliminary and viewed with an open mind.

Patients could be directed to instructions on how to use LLMs to help them diagnose and manage their own conditions. They could be directed to reliable tools demonstrated to provide valid and unbiased information. They could also be given examples of how to pose questions and of the kinds of personal information that lead to the most informative responses.

Train providers

There is a human impulse to push back when challenged. However, dismissing patients' suggestions, or even their demands, can make them feel rejected and disrespected. This can reduce a patient's trust in their provider and threaten the therapeutic alliance.

Physicians can be trained to communicate more effectively with patients about the use of AI chatbots. They can acknowledge the frustration of feeling unwell and not knowing why or what to do. In addition, they can embrace the fact that patients know more about what is going on with their own bodies, and they can express understanding and respect for the patient's impulse to find information. At the same time, they can reinforce the shared goal of not causing harm and explain that they are unwilling to prescribe a course of action that lacks evidence of effectiveness and safety. Suggesting the compromise of “let's figure this out together” should be acceptable to all but the most insistent patients.

Train together

A third, more radical approach is for providers and patients to explore the brave new world of chatbots together. When a patient presents a potential diagnosis or suggests a treatment, the clinician might respond: “There is so much information out there to deal with. Let's see what we can find out.” The physician might then turn the computer screen so the patient can see it and initiate a search. Ideally, this practice should strengthen the partnership between patients and health professionals and increase the likelihood of effective care and desired outcomes.

There are potential barriers to these strategies, but they are surmountable. The first and most problematic may be time, which is already in perilously short supply for practicing clinicians. However, AI technology could be applied to summarize information in the electronic record and to automate or reassign tasks that currently compete for clinician time.¹⁹ A second barrier is the prevailing culture that views clinicians as the guardians of information and the decision makers. Yet including patients as partners in their own care benefits both patients and clinicians. A third barrier is market failure, which allows low-quality tools to exist and distorts the value of the information available to patients to aid in diagnosis. Regulation and certification of specific tools would be needed to identify those that provide safe and valid information and advice.

AI tools will inevitably become smarter and less fallible. They are also certain to become a more ubiquitous element of the ecology of health care practice, one that everyone will need to adapt to. Patients and providers are currently confronted by a firehose of information relevant to their health and conditions. For patients using chatbots and other AI tools, this nearly boundless stream is less apparent; they are served a more finite portion to consider. The situation is more difficult for clinicians, who must reconcile potential diagnoses and treatment suggestions with their own knowledge and experience. Adopting the compromise of “let's figure this out together” could be the most manageable strategy. This will require directed efforts to educate and inform patients, train clinicians, change practice culture, and make time for training together.

References

1. Rosenbluth. Dr. Chatbot will see you now. New York Times. 2024. https://www.nytimes.com/2024/09/11/health/chatbots-health-diagnosis-treatments.html

2. Balogh, Miller, Ball. Improving diagnosis in health care. Washington, DC: National Academies of Sciences, Engineering, and Medicine. National Academies Press, 2015. https://www.ncbi.nlm.nih.gov/books/NBK338596/ (accessed 6 October 2024).

3. Presiado, Montero, Lopes, et al. KFF Health Misinformation Tracking Poll: artificial intelligence and health information. KFF. 2024. https://www.kff.org/health-misinformation-and-trust/poll-finding/kff-health-misinformation-tracking-poll-artificial-intelligence-and-health-information/ (accessed 6 October 2024).

4. Burkhart. Dr. Jack Myers, 84, a pioneer in computer-aided diagnoses. New York Times. 1998. https://www.nytimes.com/1998/02/22/us/dr-jack-myers-84-a-pioneer-in-computer-aided-diagnoses.html (accessed 6 October 2024).

5. Myers. The background of INTERNIST-I and QMR. In: Blum, Duncan (eds) A history of medical informatics. New York: ACM Press, 1990, pp. 427–433.

6. Barnett, Cimino, Hupp, et al. DXplain. An evolving diagnostic decision-support system. JAMA 1987; 258: 67–74.

7. Vardell, Bou-Crick. VisualDx: a visual diagnostic decision support tool. Med Ref Serv Q 2012; 31: 414–424.

8. David, Chira, Eells, et al. Diagnostic accuracy in patients admitted to hospitals with cellulitis. Dermatol Online J 2011; 17: 1.

9. Bond, Schwartz, Weaver, et al. Differential diagnosis generators: an evaluation of currently available computer programs. J Gen Intern Med 2012; 27: 213–219.

10. Boyle. Physicians learn to partner with Dr. Google. AAMCNews. 2023. https://www.aamc.org/news/physicians-learn-partner-dr-google (accessed 6 October 2024).

11. Lee, Bubeck, Petro. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med 2023; 388: 1233–1239.

12. Levine, Tuwani, Kompa, et al. The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study. Lancet Digit Health 2024; 6: e555–e561.

13. Kanjee, Crowe, Rodman. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 2023; 330: 78–80.

14. Holohan. A boy saw 17 doctors over 3 years for chronic pain. ChatGPT found the diagnosis. Today. 2023. https://www.today.com/health/mom-chatgpt-diagnosis-pain-rcna101843 (accessed 6 October 2024).

15. Wilhelm, Roos, Kaczmarczyk. Large language models for therapy recommendations across 3 clinical specialties: comparative study. J Med Internet Res 2023; 25: e49324.

16. Barile, Margolis, Cason, et al. Diagnostic accuracy of a large language model in pediatric case studies. JAMA Pediatr 2024; 178: 313–315.

17. Chen, Kann, Foote, et al. Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol 2023; 9: 1459–1462.

18. Fraser, Crossland, Bacher, et al. Comparison of diagnostic and triage accuracy of Ada Health and WebMD symptom checkers, ChatGPT, and physicians for patients in an emergency department: clinical data analysis study. JMIR Mhealth Uhealth 2023; 11: e49995.

19. Momenaei, Mansour, Kuriyan, et al. ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management. Curr Opin Ophthalmol 2024; 35: 205–209. Epub 2024 Feb 7. PMID: 38334288.