Abstract
This short report evaluated the accuracy and quality of information provided by ChatGPT regarding the use of complementary and integrative medicine for cancer. Using the QUality Evaluation Scoring Tool, a panel of 12 reviewers assessed ChatGPT's responses to 8 questions. The study found that ChatGPT provided moderate-quality responses that were relatively unbiased and not misleading. However, the chatbot's inability to reference specific scientific studies was a significant limitation. Patients with cancer should not rely on ChatGPT for clinical advice until further systematic validation. Future studies should examine how patients perceive ChatGPT's information and its impact on communication with health care professionals.
Introduction
Patients with cancer often turn to complementary and integrative medicine (CIM) to manage their symptoms or treatment-related toxicities. 1 The internet remains one of the most common sources of information regarding CIM. 2 However, misinformation and falsified claims on the internet, such as inaccurate information on the use of CIM for cancer, remain a concerning problem. 3
The development of ChatGPT, a language model that was released to the public by OpenAI in 2022, has generated considerable interest in various fields such as health care. 4 Using artificial intelligence technology, this chatbot can interact in a conversational manner and is capable of performing various tasks, such as answering questions and coding, through training on data from the internet. 5 Studies have examined the use of ChatGPT to provide information on cancer, such as cancer symptoms and general misconceptions, but not on the use of complementary or integrative medicine. 6,7
In a recent editorial, Cramer posed an intriguing question: “But the changes in society toward an information society are not leaving medicine unscathed. But do they also affect CIM, where direct human contact is still so much more important? I think so.” 8 It is becoming inevitable that the rise of artificial intelligence will affect CIM in terms of research and delivery of care. In response to this editorial, this brief report aimed to evaluate the accuracy and quality of information provided by ChatGPT on CIM use for cancer.
Methods
On March 30, 2023, the authors presented ChatGPT (GPT-3, February 13 version) with eight questions covering various aspects of CIM, including the indications of CIM modalities (Q1: Can I use acupuncture for general cancer pain or pain caused by taking anastrozole? Q2: Is omega-3 supplement helpful for peripheral neuropathy caused by chemo?), drug interactions (Q3: Can I take St. John's wort supplements when taking irinotecan? Q4: Can I take green tea supplements when taking gefitinib?), potential adverse effects (Q5: Will ginseng affect my sleep? Q6: Will ginseng damage my liver if I am a cancer patient?), and special precautions/controversies (Q7: I have estrogen-receptor positive breast cancer. Can I take soy products? Q8: Can I receive acupuncture if I have lymphedema?) (Table 1). These questions reflect common concepts and frequently asked questions on popular CIM websites, such as the American Cancer Society and the Memorial Sloan Kettering Cancer Center websites.
Table 1. ChatGPT Responses to the Questions on Integrative Oncology
Table footnotes: NA was assigned when reviewers provided a score <2 in “attribution” (see Table 2 for scoring details). IQR, interquartile range; NA, not applicable; QUEST, QUality Evaluation Scoring Tool.
A panel of 12 reviewers, which comprised pharmacists, integrative medicine practitioners, oncologists, public health professionals, and cancer survivors, evaluated the responses from ChatGPT using the QUality Evaluation Scoring Tool (QUEST). 9 The QUEST is a seven-item questionnaire that has been shown to be a reliable tool to evaluate online information. 9 The authors modified the QUEST criteria to remove “authorship” because chatbot responses are not authored. The remaining 6 items were “attribution,” “type of study,” “conflict of interest,” “currency,” “complementarity,” and “tone” (Table 2), with a total score of 26 points.
Table 2. Modified QUality Evaluation Scoring Tool Criteria for Evaluating Online Information
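The arithmetic behind the 26-point total can be illustrated with a minimal sketch. The item weights below are an assumption: they follow the weighting of the original QUEST instrument with “authorship” removed and are not stated in this report, although they do reproduce the 26-point maximum. Treating a “type of study” rating of NA as zero points is likewise an assumption made for illustration, and the names `QUEST_ITEMS`, `max_total`, and `total_score` are hypothetical.

```python
# Assumed (maximum raw score, weight) per item of the modified QUEST.
# The weights are NOT given in this report; they are an assumption
# chosen to be consistent with the stated 26-point maximum.
QUEST_ITEMS = {
    "attribution": (3, 3),
    "type_of_study": (2, 1),
    "conflict_of_interest": (2, 3),
    "currency": (2, 1),
    "complementarity": (1, 1),
    "tone": (2, 3),
}


def max_total():
    """Highest achievable weighted total under the assumed weights."""
    return sum(max_raw * weight for max_raw, weight in QUEST_ITEMS.values())


def total_score(raw_scores):
    """Weighted total for one reviewer's ratings of one response.

    raw_scores maps item name to a raw score, or to None for NA.
    NA is counted as zero points (an assumption of this sketch).
    """
    total = 0
    for item, (max_raw, weight) in QUEST_ITEMS.items():
        score = raw_scores.get(item)
        if score is None:
            continue
        if not 0 <= score <= max_raw:
            raise ValueError(f"{item} score {score} outside 0..{max_raw}")
        total += score * weight
    return total


# Hypothetical ratings from a single reviewer (illustration only).
example_rating = {
    "attribution": 1,
    "type_of_study": None,  # NA because attribution < 2
    "conflict_of_interest": 2,
    "currency": 0,
    "complementarity": 1,
    "tone": 2,
}
print(total_score(example_rating), "out of", max_total())
```

Under these assumptions, the attribution, conflict-of-interest, and tone items dominate the total, so a response without specific references cannot score highly even when it is unbiased and balanced in tone.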
The inter-rater agreement was evaluated using Fleiss' kappa, a statistic that measures the agreement between two or more raters (κ > 0.75 indicates excellent agreement, κ of 0.40 to 0.75 indicates fair to good agreement, and κ < 0.40 indicates poor agreement). 10 Statistical analyses were conducted using R 4.1.2 (R Foundation, Vienna, Austria) with the “irr” package (v0.84.1). 11
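For readers unfamiliar with the statistic, the following is a minimal sketch of the standard Fleiss' kappa calculation together with the interpretation thresholds given above. It is not the authors' analysis, which was run in R with the “irr” package; the function names and the small count table are hypothetical and serve only as an illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")
    n_categories = len(counts[0])

    # Proportion of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among rater pairs, per subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)


def interpret(kappa):
    """Agreement label using the thresholds stated in the Methods."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


# Hypothetical counts: 4 subjects, 3 raters, 3 score categories.
example_counts = [
    [3, 0, 0],
    [0, 2, 1],
    [1, 1, 1],
    [0, 0, 3],
]
kappa = fleiss_kappa(example_counts)
print(round(kappa, 3), interpret(kappa))
```

Because the expected agreement is derived from the overall category proportions, kappa can be modest even when raters give similar scores, if most ratings fall into a single category.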
To assess the accuracy of the content, the authors also manually consolidated corresponding information from clinical guidelines and well-established databases for each question and presented them alongside the chatbot responses. The reviewers were asked to highlight any potential misinformation or falsified claims in the responses by ChatGPT.
Results
Table 1 shows the responses of ChatGPT to the eight questions presented on March 30, 2023. The median total scores rated by the 12 reviewers were the lowest at 13 points (out of 26 points) for the questions on the interaction between St. John's wort and irinotecan (Q3), adverse reactions of ginseng (Q5), and special precautions on the use of acupuncture for lymphedema (Q8). The scores were the highest at 16 points for the questions on the indication of a CIM modality (Q2) and special precaution/controversies on the use of soy products for patients with breast cancer (Q7). The inter-rater agreement for most questions was fair to good and ranged from a κ of 0.43 (lowest) for Q8 to 0.62 (highest) for Q7.
The distribution of scores for each criterion of the QUEST was relatively consistent across the eight questions. The reviewers generally provided high ratings for ChatGPT responses regarding providing unbiased information (“conflict of interest,” median score: 2/2), encouraging the patient–physician relationship (“complementarity,” median score: 1/1), and having a balanced or neutral tone (“tone,” median score: 1–2/2). However, the scores were low when referencing or presenting scientific studies in the responses (“attribution,” median score: 0–1/3), specifying the type of study or evidence (generally not applicable when reviewers provided a score <2 in “attribution”), and dating the evidence or references (“currency,” median score: 0/2).
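The summary statistics reported here (median scores, with interquartile ranges in Table 1) can be computed as in the sketch below. The twelve scores are hypothetical values for a single question, not the study data, and the quartile convention (the “inclusive” method of Python's `statistics.quantiles`) is an assumption that may differ from the one used by the authors.

```python
import statistics


def summarize(scores):
    """Median and quartiles of one question's total scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "median": statistics.median(scores),
        "q1": q1,
        "q3": q3,
    }


# Hypothetical total scores from 12 reviewers (illustration only).
example_scores = [13, 13, 16, 10, 13, 16, 13, 10, 13, 16, 13, 13]
summary = summarize(example_scores)
print(summary["median"], (summary["q1"], summary["q3"]))
```

The median and interquartile range are used rather than the mean and standard deviation because the QUEST items are ordinal and the panel is small.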
In terms of accuracy of information, none of the reviewers identified any explicit misinformation, although two reviewers commented that the team's consolidated responses from clinical guidelines/literature could not be meaningfully compared with the chatbot responses because the latter were vague in nature.
The ChatGPT responses were generated anew each time. Therefore, the authors extracted the responses to the same set of questions 1 month after the initial extraction and compared them (Supplementary Table S1). Generally, there were no major changes to the responses. The authors also used Google to search for answers to the eight questions, and the corresponding results are provided in Supplementary Table S2. Generally, the Google results were shorter than those generated by ChatGPT. The Google results typically originate from a single source, unlike ChatGPT, which summarizes information from multiple sources. It is also worth noting that the answers obtained from Google are less straightforward to comprehend because they simply quote the words directly from the websites.
Discussion
Overall, this study showed that ChatGPT provided responses of moderate quality to the questions regarding integrative oncology. The responses were relatively unbiased and did not provide falsified claims or misinformation. Particularly, ChatGPT acknowledged the importance of health care professionals in evaluating the patient's individual situation and providing general guidance on the safety and appropriateness of CIM modalities. Many responses included statements such as “If you are considering/interested in …, it is important to talk to your healthcare provider first to ensure that it is safe for you to try ….” However, in reality, patients often fail to disclose their CIM use to physicians, 12 and a tool such as ChatGPT may further discourage communication on CIM use between patients and health care professionals.
Notably, the modest agreement among reviewers on the quality of information for certain questions indicates that readers may have different interpretations of the response provided. For example, Fleiss' kappa was the lowest for the question on the use of acupuncture for lymphedema, which remains a controversial topic because of limited evidence in the literature. Future studies should examine how patients interpret and perceive the information provided by ChatGPT, particularly when the responses are not definitive. It is also important to understand patients' downstream actions and tendencies to seek professional advice after using this chatbot.
One of the disadvantages of ChatGPT is its inability to refer to specific scientific studies or literature. When mentioning scientific evidence, ChatGPT often relies on vague statements, such as “several/some studies have suggested that …” without providing details on the type of study or any specific study from which the information was derived. This finding is expected because ChatGPT is supposed to provide simplified, succinct, and aggregated responses that laypersons can understand. Even if ChatGPT provides reference sources or lists citations, it can generate fake citations or summarize incorrect sources. 13,14 This error presents a considerable limitation because verifying the information or evaluating the contemporaneity of the references or evidence without access to the underlying sources is not possible.
In some cases, ChatGPT appears to fabricate convincing responses that contain factual errors or incorrect data when summarizing information about the effectiveness of therapies. 5 Integrative oncology emphasizes evidence-informed cancer care that uses multiple approaches. Decisions on which approach to use should incorporate patients' values, along with practitioners' clinical experience and current research evidence for the approaches. 15 Therefore, ChatGPT cannot be substituted for an integrative oncology consultation to determine the optimal modalities for a particular patient. Until ChatGPT can provide reliable reference sources and undergo more systematic validation, patients with cancer should be cautioned against relying on it for clinical advice on integrative oncology.
Patients should seek information about CIM in cancer from reputable sources. General information about integrative medicine can be obtained from various sources, such as textbooks, journals, and most commonly, reputable websites. In contrast to ChatGPT, these sources undergo rigorous review processes and are typically authored by professionals with expertise in the field. While ChatGPT can offer quick access to information and engage in interactive conversations, it may not offer the same level of expertise and depth of information as those trusted sources. When finding information on the internet, patients can look for sources from government agencies, universities, hospitals, or health organizations.
The type of source can be verified by examining the letters at the end of the web address. For example, the URLs of government websites end with
This study has a few limitations. First, this study included only eight questions, which may not be representative of how ChatGPT provides information on integrative oncology; however, the authors believe that these questions still serve the study purpose of investigating the quality of information provided by ChatGPT and identifying potential areas for future research. Second, the authors applied the QUEST, which may not be well suited to evaluate chatbot responses. However, to date, no quality assessment tool has been developed specifically for assessing chatbot responses. Because the information provided by ChatGPT originates from online health information found on the internet, the authors consider that the QUEST was a reasonable option for this exploratory study. Third, the level of inter-rater agreement may be influenced, to some extent, by the heterogeneity of the panel, which included a diverse range of health care professionals and patients.
The authors included patients and different health care professionals as participants of this study, considering that ChatGPT is freely accessible and used by various stakeholders involved in cancer care. In addition, the overall fair-to-good level of agreement observed in most questions suggested that there was no significant divergence among participants regarding their assessment of the quality of information generated by ChatGPT. Furthermore, the version of ChatGPT evaluated was trained on data before 2021, and how this chatbot can keep up to date with new information and continue to provide accurate responses is unclear. Notably, a newer version of ChatGPT (GPT-4) had been released at the time of preparing this article, but it requires a subscription. 16 Future research should systematically evaluate updated models of the chatbot and its performance with different questions related to integrative oncology and cancer in general.
Conclusions
In conclusion, this study showed that ChatGPT can provide responses of moderate quality to questions regarding the use of CIM for cancer. The information from ChatGPT is relatively unbiased and not falsified. However, this chatbot's inability to refer to specific scientific studies or literature presents a major limitation. Importantly, patients with cancer should be reminded not to rely on ChatGPT for clinical advice until it undergoes more systematic validation. Future research should examine how patients perceive the information provided by ChatGPT and its effect on communication between patients and health care professionals.
Footnotes
Authors' Contributions
C.S.L.: Conceptualization, methodology, formal analysis, investigation, data curation, writing—original draft, writing—review and editing. R.H., H.K.K., K.Z., T.N.L., C.P.L., W.L.L., C.L.W., Y.M.L., H.H-.F.L., and V.C-.H.C.: Conceptualization, investigation, writing—review and editing. Y.T.C.: Conceptualization, methodology, formal analysis, investigation, supervision, writing—review and editing.
Ethics Approval
This study was approved by the Survey and Behavioural Research Ethics Committee of the Chinese University of Hong Kong (reference no. SBRE-22-0723).
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Table S1
Supplementary Table S2
References
