The Scientific Knowledge of Bard and ChatGPT in Endocrinology, Diabetes, and Diabetes Technology: Multiple-Choice Questions Examination-Based Performance

Abstract

Background:

The present study aimed to investigate the knowledge level of Bard and ChatGPT in the areas of endocrinology, diabetes, and diabetes technology through a multiple-choice question (MCQ) examination format.

Methods:

Initially, a 100-MCQ bank was established for endocrinology, diabetes, and diabetes technology. The MCQs were drawn from physiology and medical textbooks and from academic examination pools in these areas. The study team members reviewed the MCQ contents to ensure that they were relevant to endocrinology, diabetes, and diabetes technology. Fifty MCQs were from endocrinology, and 50 were from diabetes and diabetes technology. The knowledge level of Google’s Bard and ChatGPT was then assessed with this MCQ-based examination.

Results:

In the endocrinology examination section, ChatGPT obtained 29 marks (correct responses) of 50 (58%), and Bard obtained the same score of 29 of 50 (58%). In the diabetes and diabetes technology section, ChatGPT obtained 23 marks of 50 (46%), and Bard obtained 20 marks of 50 (40%). Overall, across the entire examination, ChatGPT obtained 52 marks of 100 (52%), and Bard obtained 49 marks of 100 (49%). ChatGPT thus obtained slightly more marks than Bard. However, neither ChatGPT nor Bard achieved the satisfactory score of at least 60% in endocrinology or in diabetes and diabetes technology.

Conclusions:

The overall MCQ-based performance of ChatGPT was slightly better than that of Google’s Bard. However, neither ChatGPT nor Bard achieved satisfactory scores in endocrinology or in diabetes and diabetes technology. The study indicates that Bard and ChatGPT have the potential to assist medical students and faculty in academic medical education settings, but both artificial intelligence tools need more updated information in the fields of endocrinology, diabetes, and diabetes technology.

Keywords

Google’s Bard, ChatGPT, knowledge, intellect level, endocrinology, diabetes technology

Introduction

Google’s Bard and OpenAI’s ChatGPT are two leading artificial intelligence (AI) chatbots that are designed to generate and disseminate information. Bard is powered by LaMDA and is trained on a large data set to generate realistic results. ChatGPT is a generative pre-trained transformer chatbot.^1,2 Google’s Bard and ChatGPT are advanced AI tools that can swiftly gather, interpret, and provide information on a variety of topics. Both are used to provide information and generate article drafts, which makes them valuable tools for research, answering exam questions, and other tasks.^3,4

Google’s Bard and ChatGPT have recently had a significant impact on academia and research, and offer exciting possibilities for students, faculty, and educators. However, Google’s Bard and ChatGPT also pose threats to the traditional framework of research and education, including by reducing critical thinking skills.^5,6 Different viewpoints exist regarding the optimal use of ChatGPT in research, education, and health care, with some ambiguity surrounding its acceptability and ideal uses. Nevertheless, there is a notable scarcity of literature that evaluates the proficiency of Google’s Bard and ChatGPT specifically in the fields of endocrinology, diabetes, and diabetes technology.^7,8

The most recent literature highlights the role of ChatGPT in basic and clinical medical sciences,⁹ and its use in diabetes technology.¹⁰ Moreover, there is great debate about Google’s Bard and ChatGPT and their potential applications in medical sciences. However, the endocrinology and diabetes literature lacks assessments of the knowledge level of AI. The present study aimed to investigate the knowledge of Bard and ChatGPT in endocrinology and diabetes through a multiple-choice question (MCQ) examination format.

Research Methodology

Study Design and Settings

The current cross-sectional study was carried out in July 2023 in the Department of Physiology, College of Medicine, King Saud University, and the Diabetes Research Institute, Mills-Peninsula Medical Center, San Mateo, CA, USA.

Establishing MCQ Bank

Initially, the research team members prepared an MCQ bank based on questions from various textbooks, including Guyton and Hall’s Textbook of Medical Physiology; First Aid USMLE Step 1; First Aid USMLE 2; AMBOSS Step 1; Kumar and Clark’s Clinical Medicine; and university examination pools. The MCQs and answer key were carefully checked to ensure that all the questions were relevant to the subject contents. Each question was scenario-based with four sub-stems and had a single correct answer. The MCQ scenario was modified where necessary without changing the meaning of the question or the answer key.

Selection of MCQs Examination

In this study, a total of 100 MCQs in endocrinology, diabetes, and diabetes technology were selected from the MCQ pool (Table 1): 50 MCQs from endocrinology and 50 MCQs from diabetes and diabetes technology. In endocrinology, the MCQs were based on the physiology and pathophysiology of hormones, normal hormone levels, and the diagnosis and management of various endocrine disorders. Similarly, the MCQs from diabetes and diabetes technology were based on the pathophysiology of type 1 and type 2 diabetes mellitus, gestational diabetes mellitus, normal and abnormal levels of blood glucose, hemoglobin A1C (HbA1c), epidemiology of diabetes, complications, and management. To assess the knowledge level of Google’s Bard and ChatGPT, the questions were entered manually one by one, and a fresh Bard or ChatGPT session was started for each entry to avoid memory retention bias.

Table 1.

Distribution of MCQs in Endocrinology, Diabetes, and Diabetes Technology.

Subject area	Number of MCQs
Endocrinology	50
Diabetes and diabetes technology	50
Total	100

Abbreviation: MCQ, multiple-choice question.

MCQs and Higher Cognitive Functions

Worldwide, universities and examination bodies frequently use MCQs to assess knowledge in various examinations. Multiple-choice questions are commonly used in medical education as instruments to encourage learning. Medical schools and licensing testing agencies around the world administer MCQ-based exams.¹¹

Multiple-choice questions play a pivotal role in evaluating cognitive capacities, critical thinking skills, and problem-solving abilities. They provide a broad assessment of critical thinking and of higher cognitive functions.^12,13 Multiple-choice questions examine the test-taker’s ability to connect concepts and analyze evidence across numerous contexts, and so provide a robust framework for assessing the application of knowledge.^12,13

Exclusion of MCQs

MCQs were carefully checked to ensure that no answers, related content, or explanations could be indexed on search engines. In addition, we removed any test questions that had visual components, such as clinical images, graphs, and illustrations from the study.

Data Collection

The MCQs were manually entered into Google’s Bard and ChatGPT, and each tool’s responses were recorded. The first response obtained was used as the final response; we did not use the “regenerate response” option. Each response was scored as 0 (incorrect) or 1 (correct) against a pre-determined answer key.
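The binary scoring rule described above can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions: the function name `score_responses`, the question identifiers, and the three sample items are hypothetical and are not the study's actual questions or tooling, and the study itself recorded and scored responses manually.

```python
from typing import Dict


def score_responses(responses: Dict[str, str], answer_key: Dict[str, str]) -> Dict[str, int]:
    """Score each question 1 (correct) or 0 (incorrect) against the answer key.

    A missing response is scored 0. Comparison ignores case and surrounding spaces.
    """
    return {
        qid: int(responses.get(qid, "").strip().upper() == correct.strip().upper())
        for qid, correct in answer_key.items()
    }


# Hypothetical three-question illustration (not items from the study).
answer_key = {"Q1": "B", "Q2": "D", "Q3": "A"}
first_responses = {"Q1": "b", "Q2": "C", "Q3": "A"}  # first response only, no regeneration

marks = score_responses(first_responses, answer_key)
total = sum(marks.values())
print(marks, f"{total}/{len(answer_key)}")
```

Summing the 0/1 marks over a section gives the number of correct responses that the Results report as marks.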

Ethical Approval and Statistical Analysis

The MCQs were gathered and constructed using databases and textbooks, and this study did not involve any animal or human subjects; hence, ethical approval was not required. The data were thoroughly scrutinized, and the findings were documented and analyzed. The marks obtained by Bard and ChatGPT were compared and presented as numbers and percentages (%).

Results

The knowledge of Google’s Bard and ChatGPT was assessed on individual MCQs in endocrinology, diabetes, and diabetes technology. The performance assessment was based on a comprehensive set of standardized test questions. A total of 100 MCQs were selected: 50 in endocrinology and 50 in diabetes and diabetes technology combined (Table 1).

In the endocrinology examination section, ChatGPT obtained 29 marks (correct responses) of 50 (58%), and Bard obtained the same score of 29 marks of 50 (58%). In the diabetes and diabetes technology section, ChatGPT obtained 23 marks of 50 (46%) and Bard obtained 20 marks of 50 (40%). Across the entire set of MCQs, ChatGPT obtained 52 marks of 100 (52%) and Bard obtained 49 marks of 100 (49%). The required score was set at 60%, and neither Bard nor ChatGPT achieved it in endocrinology or in diabetes and diabetes technology (Table 2, Figure 1). The pattern of MCQs and the responses of Bard and ChatGPT are presented in Figures 2–3.

Table 2.

Comparison of Marks Obtained by Bard and ChatGPT in Multiple-Choice Question Examination in Endocrinology, Diabetes, and Diabetes Technology.

Distribution of MCQs	Marks obtained by ChatGPT (n = 100 MCQs)	Marks obtained by Bard (n = 100 MCQs)
Endocrinology (50 MCQs)	29/50 (58%)	29/50 (58%)
Diabetes and diabetes technology (50 MCQs)	23/50 (46%)	20/50 (40%)
The total score obtained (100 MCQs)	52/100 (52%)	49/100 (49%)

Abbreviations: ChatGPT, chat generative pre-trained transformer; MCQ, multiple-choice question.
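As a quick arithmetic check on Table 2, the section marks can be re-tallied into percentages and overall totals. The sketch below uses only the marks reported in the table; the variable and function names (`table2`, `percent`, `summarize`) are illustrative assumptions, not part of the study's analysis.

```python
# Marks reported in Table 2, as (correct, total) per section and tool.
table2 = {
    "ChatGPT": {
        "Endocrinology": (29, 50),
        "Diabetes and diabetes technology": (23, 50),
    },
    "Bard": {
        "Endocrinology": (29, 50),
        "Diabetes and diabetes technology": (20, 50),
    },
}


def percent(correct: int, total: int) -> float:
    """Percentage of correct responses."""
    return 100 * correct / total


def summarize(sections: dict) -> dict:
    """Return the percentage per section plus the overall percentage."""
    rows = {name: percent(c, t) for name, (c, t) in sections.items()}
    correct = sum(c for c, _ in sections.values())
    total = sum(t for _, t in sections.values())
    rows["Total"] = percent(correct, total)
    return rows


summary = {tool: summarize(sections) for tool, sections in table2.items()}
for tool, rows in summary.items():
    for name, pct in rows.items():
        print(f"{tool:8s} {name:34s} {pct:.0f}%")
```

The recomputed values agree with the table: 58% and 46% (52% overall) for ChatGPT, and 58% and 40% (49% overall) for Bard.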

Figure 1.

Marks obtained by Bard and ChatGPT in endocrinology, diabetes, and diabetes technology MCQ-based examination.

Figure 2.

Google’s Bard and ChatGPT responses on MCQs in diabetes-allied complications.

Figure 3.

Google’s Bard and ChatGPT responses on MCQs in laboratory reports for the management of diabetes mellitus.

Discussion

Google’s Bard and ChatGPT have been the subject of much attention from the public, students, academicians, researchers, and the science community. These language models can provide swift and articulate answers to questions across a wide range of disciplines. They are also useful tools for enhancing scientific knowledge, generating essays, and providing explanations.¹⁴ There is a debate about the knowledge and intelligence levels of Google’s Bard and ChatGPT. The literature lacks studies that assess and compare the knowledge level of Google’s Bard and ChatGPT in the fields of endocrinology, diabetes, and diabetes technology. In the present study, overall, ChatGPT obtained 52 marks of 100 (52%) and Bard obtained 49 marks of 100 (49%) (Table 2, Figure 1). Both Bard and ChatGPT attempted all the questions but did not achieve satisfactory scores in endocrinology, diabetes, and diabetes technology.

To our knowledge, this is the first study to compare these two AI chatbots, ChatGPT and Bard, head-to-head on factual knowledge in the areas of endocrinology, diabetes, and diabetes technology. Evaluating knowledge on two different AI tools at the same time lends more validity and reliability to the examination contents as well as to the assessment of the AI tools.

Recently, Meo et al⁹ conducted a study and reported that ChatGPT obtained 74% marks in basic medical sciences and 70% marks in clinical medical sciences, with an overall combined score of 72% in both basic and clinical medical sciences. Similarly, Duong et al¹⁴ found that ChatGPT achieved a score of 68.2%, and Gilson et al¹⁵ found that ChatGPT received 44%, 42%, 64.4%, and 57.8% in various examinations. In the present study, overall, in the endocrinology/diabetes/diabetes technology examination, ChatGPT obtained 52% marks and Bard obtained 49% marks. In addition to these studies, Beaulieu-Jones et al¹⁶ found that ChatGPT-4 correctly answered 71% and 68% of MCQs. In another study, Mihalache et al¹⁷ reported that ChatGPT answered 46% of the questions correctly, and Antaki et al¹⁸ reported that ChatGPT achieved 55.8% scores in basic and clinical science and 42.7% on the ophthalmology questions. In another study, Friederichs et al¹⁹ found that ChatGPT obtained 65.5% correct marks in the examination.

For a better understanding of the scientific knowledge of Bard and ChatGPT, it is vital to know the passing scores in international examinations, such as the United States Medical Licensing Examination (USMLE) conducted for worldwide physicians. The USMLE program provides a recommended pass or fail outcome on all step examinations. Although the percentage of correctly answered items required to pass varies by step, examinees must answer approximately 60% of items correctly to achieve a passing score.²⁰ Similarly, for the Medical Council of Canada Qualifying Examination (MCCQE), a passing score is between 60% and 70% of the questions.²¹ In the present study, overall, in the endocrinology/diabetes/diabetes technology examination, ChatGPT obtained 52% and Bard obtained 49% marks. Both Bard and ChatGPT attempted all the questions; however, with scores of about 50%, neither demonstrated a satisfactory score. This shows that AI still needs more updates in knowledge to achieve a passing score and reach the level of human knowledge.
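To make the gap to the approximately 60% pass mark concrete, the following sketch counts how many additional correct answers each tool would have needed on the 100-MCQ examination. It is a simple illustration that assumes a fixed 60% cut-off, as used in this study; actual licensing examinations set their passing standards differently, and the names `PASS_PERCENT` and `shortfall` are hypothetical.

```python
PASS_PERCENT = 60  # assumed fixed cut-off, matching the required score used in this study


def shortfall(correct: int, total: int, pass_percent: int = PASS_PERCENT) -> int:
    """Number of additional correct answers needed to reach the pass mark (0 if already met)."""
    needed = (pass_percent * total + 99) // 100  # integer ceiling of pass_percent% of total
    return max(0, needed - correct)


# Overall marks reported in this study (correct responses out of 100 MCQs).
overall = {"ChatGPT": 52, "Bard": 49}
gaps = {tool: shortfall(marks, 100) for tool, marks in overall.items()}
for tool, gap in gaps.items():
    print(f"{tool}: {gap} more correct answers needed to reach {PASS_PERCENT}%")
```

Under this assumption, ChatGPT fell 8 correct answers short of the cut-off and Bard fell 11 short.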

The ground-breaking AI technology opens a slew of new avenues for inquiry, laying the groundwork for the knowledge required to explain underlying medical logic. More research is needed to further study and evaluate the capabilities of Google’s Bard and ChatGPT in tackling medical-reasoning queries. As technology progresses, new medical educational approaches could be developed that fully utilize the potential of Google’s Bard and ChatGPT in various domains of medical sciences, including endocrinology, diabetes, and diabetes technology.

Study Strengths and Limitations

A strength of this study is that it assesses the capacity of Google’s Bard and ChatGPT to demonstrate knowledge and answer questions about endocrinology, diabetes, and diabetes technology, which is highly essential. Google’s Bard and ChatGPT may eventually provide a simple tool to present appropriate knowledge, which is an acute need at this time. This study has some limitations, however. The study is based on a small set of MCQs, and the knowledge level was not assessed in real-world student settings. The MCQs were scenario-based only and did not include images.

Conclusions

Google’s Bard and ChatGPT demonstrated some knowledge of endocrinology/diabetes/diabetes technology on a range of MCQs, but neither reached the required score of 60%. ChatGPT and Google’s Bard obtained similar scores. These findings suggest that ChatGPT and Bard could be used to help medical professionals and academicians better understand endocrinology/diabetes/diabetes technology; however, these chatbots will need to analyze more data from various sources, such as the medical literature, medical textbooks, websites, databases, and social media, to perform well enough to pass a licensing examination. Finally, Google’s Bard and ChatGPT provided clear and consistent explanations for their answers, demonstrating a satisfactory degree of understanding.

Footnotes

Acknowledgements

The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number IFKSUOR3-51-1.

Abbreviations

AI, artificial intelligence; Bard, AI-powered chatbot tool; ChatGPT, chat generative pre-trained transformer; MCCQE, Medical Council of Canada Qualifying Examination; MCQ, multiple-choice question; USMLE, United States Medical Licensing Examination.

Author Contributions

S.A.M. contributed to research design, writing, and editing the manuscript; A.A.A., T.A.K., and A.S.M. contributed to literature review, checking, and analysis; D.C.K. critically reviewed the manuscript. All authors have read and approved the manuscript.

Ethical Statement

Humans and animals were not engaged; hence, ethical approval was not required.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Deputyship for Research and Innovation, Ministry of Education, Saudi Arabia (IFKSUOR3-51-1).

ORCID iDs

Sultan Ayoub Meo

Abdulelah Adnan AbuKhalaf

Anusha Sultan Meo

David C. Klonoff

References

1. Rahaman Md, Ahsan MMT, Anjum, Rahman Md, Rahman. The AI race is on! Google’s Bard and OpenAI’s ChatGPT head-to-head: an opinion article. SSRN Electr J. 2023. doi:10.2139/SSRN.4351785.

2. Salvagno, Taccone, Gerli. Correction to: can artificial intelligence help for scientific writing? Crit Care. 2023;27(1):99. doi:10.1186/s13054-023-04390-0.

3. Hutson. Could AI help you to write your next paper? Nature. 2022;611(7934):192-193. doi:10.1038/d41586-022-03479-w.

4. Ram, Verma. Artificial intelligence AI-based Chatbot Study of ChatGPT, Google AI Bard and Baidu AI. World J Adv Eng Technol Sci. 2023;8(1):258-261. doi:10.30574/WJAETS.2023.8.1.0045.

5. Aydın. Google Bard generated literature review: metaverse. 2023. https://papers.ssrn.com/abstract=4454615.

6. Rahman, Watanabe. ChatGPT for education and research: opportunities, threats, and strategies. Appl Sci. 2023;13:5783. doi:10.3390/app13095783.

7. King. The future of AI in medicine: a perspective from a Chatbot. Ann Biomed Eng. 2022;51:291-296. doi:10.1007/s10439-022-03121-w.

8. Hosseini, Gao, Liebovitz, et al. An exploratory survey about using ChatGPT in education, healthcare, and research [published online ahead of print April 3, 2023]. medRxiv. 2023. doi:10.1101/2023.03.31.23287979.

9. Meo, Al-Masri, Alotaibi, Meo MZS, Meo MOS. ChatGPT knowledge evaluation in basic and clinical medical sciences: multiple choice question examination-based performance. Healthcare (Basel). 2023;11(14):2046. doi:10.3390/healthcare11142046.

10. Huang, Yeung, Kerr, Klonoff. Using ChatGPT to predict the future of diabetes technology. J Diabetes Sci Technol. 2023;17(3):853-854. doi:10.1177/19322968231161095.

11. Palmer, Devitt. Assessment of higher order cognitive skills in undergraduate education: modified essay or multiple-choice questions? BMC Med Educ. 2007;7:49. doi:10.1186/1472-6920-7-49.

12. Mingo, Chang, Williams. Undergraduate students’ preferences for constructed versus multiple-choice assessment of learning. Innov Higher Educ. 2018;43:143-152.

13. Bhayana, Krishna, Bleakney. Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations. Radiology. 2023;307(5):e230582.

14. Duong, Solomon. Analysis of large-language model versus human performance for genetics questions. Eur J Hum Genet [published online ahead of print May 29, 2023]. doi:10.1038/s41431-023-01396-8.

15. Gilson, Safranek, Huang, et al. How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. doi:10.2196/45312.

16. Beaulieu-Jones, Shah, Berrigan, Marwaha, Lai, Brat. Evaluating capabilities of large language models: performance of GPT4 on surgical knowledge assessments [published online ahead of print July 24, 2023]. medRxiv. doi:10.1101/2023.07.16.23292743.

17. Mihalache, Popovic, Muni. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. 2023;141(6):589-597. doi:10.1001/jamaophthalmol.2023.1144.

18. Antaki, Touma, Milad, El-Khoury, Duval. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmol Sci. 2023;3(4):100324. doi:10.1016/j.xops.2023.100324.

19. Friederichs, März. ChatGPT in medical school: how successful is AI in progress testing. Med Educ Online. 2023;28(1):2220920.

20. USMLE. Scoring & score reporting. https://www.usmle.org/bulletin-information/scoring-and-score-reporting. Accessed July 27, 2023.

21. Outlines of MCCQE part 1 exam. https://www.aceqbank.com/mccqe-part-1-exam-outline-2021/. Accessed July 27, 2023.