The Role of Artificial Intelligence Large Language Models in Literature Search Assistance to Evaluate Inguinal Hernia Repair Approaches

Abstract

Aim:

This study assesses the reliability of artificial intelligence (AI) large language models (LLMs) in identifying relevant literature comparing inguinal hernia repair techniques.

Material and Methods:

We used LLM chatbots (Bing Chat AI, ChatGPT versions 3.5 and 4.0, and Gemini) to find comparative studies and randomized controlled trials on inguinal hernia repair techniques. The results were then compared with existing systematic reviews (SRs) and meta-analyses and checked for the authenticity of listed articles.

Results:

The LLMs generated 22 study citations dated from 2006 to 2023 across eight journals, while the SRs encompassed a total of 42 studies. On external validation, 63.6% of the citations (14 of 22) were confirmed to be authentic, including 10 identified through ChatGPT 4.0 and 6 via Bing AI (with an overlap of 2 studies between them). Conversely, 36.4% (8 of 22), all generated by Google Gemini (Bard), were fabrications, and two of these (25.0%) were linked to valid DOIs belonging to unrelated articles. Four (28.6%) of the 14 real studies were included in the SRs, representing 18.2% of all LLM-generated citations. LLMs missed 38 (90.5%) of the 42 studies included in the previous SRs, while 10 real studies were found by the LLMs but were not included in the previous SRs. Among those 10 studies, 6 were reviews and 1 was published after the SRs, leaving three comparative studies missed by the reviews.

Conclusions:

This study reveals the mixed reliability of AI language models in scientific searches, emphasizing the need for cautious application of AI in academia and for continuous evaluation of AI tools in scientific investigations.

Introduction

The application of artificial intelligence (AI) in medical research is a subject of increasing debate, characterized by its transformative potential across various domains such as data collection, analysis, and article writing.^1–3 The advent of AI technologies, particularly large language models (LLMs), heralds significant advances in the efficiency and capabilities of medical research, yet it also raises critical ethical and practical concerns.⁴ While LLMs like ChatGPT are capable of responding to complex queries with high efficiency, their integration into sensitive fields such as medicine must be navigated with caution due to potential inaccuracies and the ethical implications of their use in health care settings.⁵

LLMs are increasingly used to generate, correct, evaluate, and analyze text through sophisticated natural language processing (NLP) techniques. These models, which learn from vast corpora of digitized text, have demonstrated a remarkable ability to perform language-related tasks that are critical in medical research, such as literature synthesis and hypothesis generation.^6–9 However, the reliance on generative AI for sourcing medical literature is contentious, primarily due to concerns over the models’ tendency to produce fabricated or noncredible findings, a significant risk when the stakes involve human health and clinical outcomes.^10,11

Given these concerns, we aimed to critically assess the reliability of literature references provided by LLMs, focusing on recent systematic reviews (SRs) and meta-analyses that compared open, laparoscopic, and robotic techniques in inguinal hernia repair. By contrasting these AI-sourced references to those cited in traditional academic research and considering the recent insights provided by Fleming et al. (2023),¹² our goal is to ascertain the credibility and potential biases of AI-driven tools in medical research contexts. This study seeks to provide empirical evidence on the efficacy and reliability of LLMs, potentially guiding future integrations of generative AI in medical research.^2–9

Methods

Bing Chat AI, ChatGPT 4.0 (OpenAI, San Francisco, California, April 2023 version), ChatGPT 3.5 (OpenAI, San Francisco, California, January 2022 version), and Google Gemini were instructed to identify observational comparative studies and randomized controlled trials (RCTs) comparing open, laparoscopic, and robotic inguinal hernia repair techniques indexed on PubMed, Embase, and Scopus. The LLMs were then instructed to report the total number of results from each database and to present a complete list of references derived from the search query. Comparative studies and RCTs were requested both together and separately. Results were tabulated, compared with the studies included in the prior SRs and meta-analyses, and checked online for authenticity. Two authors (J.P.G.K. and C.A.B.S.) extracted the relevant studies from the compared articles, and a third author (D.L.L.) reviewed the extracted studies for accuracy and completeness. A Venn diagram was generated using the Bioinformatics and Evolutionary Genomics Venn diagram tool to display the results of the analysis.¹³ Complete text inputs and results can be viewed in the Supplementary Appendix. LLMs were not used in the writing or editing of the main content of this article.
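As an illustration only, one part of the online authenticity check (comparing the title an LLM reported with the title actually registered under the DOI it supplied) could be sketched as follows. This is not the procedure the authors ran, which was a manual check; the function names `normalize` and `check_doi_title` and the 0.9 similarity threshold are assumptions made for this sketch, and the registered title is assumed to have been looked up by hand.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize(title: str) -> str:
    """Lowercase a title and replace punctuation with single spaces."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def check_doi_title(reported_title: str,
                    registered_title: Optional[str],
                    threshold: float = 0.9) -> str:
    """Compare an LLM-reported title with the title registered for its DOI.

    registered_title is what a reviewer found when resolving the DOI by hand,
    or None when the DOI led to no article. The threshold is an arbitrary,
    illustrative choice (assumption).
    """
    if registered_title is None:
        return "no article"
    similarity = SequenceMatcher(
        None, normalize(reported_title), normalize(registered_title)
    ).ratio()
    return "title matches" if similarity >= threshold else "different article"


# Two rows taken from Table 1: a Gemini citation whose DOI resolves to an
# unrelated paper, and a ChatGPT 4.0 citation whose DOI resolves to itself.
gemini_row = check_doi_title(
    "Open, laparoscopic, and robotic inguinal hernia repair: "
    "Outcomes and predictors of complications",
    "Does Microscopic Hematuria After Pediatric Blunt Trauma "
    "Indicate Clinically Significant Injury?",
)
chatgpt_row = check_doi_title(
    "Robotic inguinal hernia repair: systematic review and meta-analysis",
    "Robotic Inguinal Hernia Repair: Systematic Review and Meta-analysis",
)
print(gemini_row)   # different article
print(chatgpt_row)  # title matches
```

A matching title is not sufficient on its own to call a citation authentic: a fabricated citation can pair a real title with invented authors or a wrong DOI, so authors, journal, year, and pages still need to be confirmed against the source.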

Results

The LLMs generated a total of 22 potentially relevant study citations. Each citation included a complete bibliographic record: authorship details, article title, publication date, volume and issue numbers, page range, and PubMed identifier (PMID). The generated citations spanned 2006 to 2023 and were attributed to journals noted for their high impact factors and distinct influence in the field.¹⁴

Only 4 of the 22 generated citations (18.2%) had been included in recent SRs and meta-analyses.^15–17 The authenticity of all 22 citations was then externally validated. The majority (14/22, 63.6%) were genuine publications, as depicted in Figure 1, while the remaining citations (8/22, 36.4%) were fabrications, all of which were generated by Google Gemini. Notably, two of the eight fabricated citations (25.0%) carried legitimate digital object identifiers (DOIs) that, on examination, belonged to publications unrelated to the search query (Table 1).

FIG. 1.

Venn diagram comparing the results of a query for studies comparing open, laparoscopic, and robotic inguinal hernia repair across multiple artificial intelligence large language model chatbots (blue), previously published systematic reviews and meta-analyses (red), and identified real studies (green).

Table 1.

Large Language Model Performance Compared with Articles from Systematic Reviews

Study retrieved from artificial intelligence (AI) large language model chatbot	Databases cited	AI source	Real article?	Paper reported in previous systematic review	Article corresponding to AI-reported DOI
Haladu N, Alabi A, Brazzelli M, Imamura M, Ahmed I, Ramsay G, Scott NW. Open versus laparoscopic repair of inguinal hernia: an overview of systematic reviews of randomised controlled trials. Surg Endosc. 2022 Jul;36 (7):4685–4700. doi: 10.1007/s00464-022-09161-6. Epub 2022 Mar 14. PMID: 35286471; PMCID: PMC9160137.	PubMed/Embase/Scopus	Bing AI	Yes	No	Haladu N, Alabi A, Brazzelli M, Imamura M, Ahmed I, Ramsay G, Scott NW. Open versus laparoscopic repair of inguinal hernia: an overview of systematic reviews of randomised controlled trials. Surg Endosc. 2022 Jul;36 (7):4685–4700. doi: 10.1007/s00464-022-09161-6. Epub 2022 Mar 14. PMID: 35286471; PMCID: PMC9160137.
Mohan, R., Yeow, M., Wong, J.Y.S. et al. Robotic versus laparoscopic ventral hernia repair: a systematic review and meta-analysis of randomised controlled trials and propensity score matched studies. Hernia 25, 1565–1572 (2021). https://doi.org/10.1007/s10029-021-02501-w	PubMed/Scopus	Bing AI	Yes	No	Mohan, R., Yeow, M., Wong, J.Y.S. et al. Robotic versus laparoscopic ventral hernia repair: a systematic review and meta-analysis of randomised controlled trials and propensity score matched studies. Hernia 25, 1565–1572 (2021). https://doi.org/10.1007/s10029-021-02501-w
Alabi, A., Haladu, N., Scott, N.W. et al. Mesh fixation techniques for inguinal hernia repair: an overview of systematic reviews of randomised controlled trials. Hernia 26, 973–987 (2022). https://doi.org/10.1007/s10029-021-02546-x	PubMed	Bing AI	Yes	No	Alabi, A., Haladu, N., Scott, N.W. et al. Mesh fixation techniques for inguinal hernia repair: an overview of systematic reviews of randomised controlled trials. Hernia 26, 973–987 (2022). https://doi.org/10.1007/s10029-021-02546-x
Peñafiel, J.A.R., Valladares, G., Cyntia Lima Fonseca Rodrigues, A. et al. Robotic-assisted versus laparoscopic incisional hernia repair: a systematic review and meta-analysis. Hernia (2023). https://doi.org/10.1007/s10029-023-02881-1	Embase	Bing AI	Yes	No	Peñafiel, J.A.R., Valladares, G., Cyntia Lima Fonseca Rodrigues, A. et al. Robotic-assisted versus laparoscopic incisional hernia repair: a systematic review and meta-analysis. Hernia (2023). https://doi.org/10.1007/s10029-023-02881-1
Solaini, L., Cavaliere, D., Avanzolini, A. et al. Robotic versus laparoscopic inguinal hernia repair: an updated systematic review and meta-analysis. J Robotic Surg 16, 775–781 (2022). https://doi.org/10.1007/s11701-021-01312-6	PubMed/Scopus	Bing AI / ChatGPT 4.0	Yes	No	Solaini, L., Cavaliere, D., Avanzolini, A. et al. Robotic versus laparoscopic inguinal hernia repair: an updated systematic review and meta-analysis. J Robotic Surg 16, 775–781 (2022). https://doi.org/10.1007/s11701-021-01312-6
Qabbani A, Aboumarzouk OM, ElBakry T, Al-Ansari A, Elakkad MS. Robotic inguinal hernia repair: systematic review and meta-analysis. ANZ J Surg. 2021 Nov;91 (11):2277–2287. doi: 10.1111/ans.16505. Epub 2021 Jan 21. PMID: 33475236.	PubMed	ChatGPT 4.0	Yes	The paper given by AI was one of the SRs used for comparison	Qabbani A, Aboumarzouk OM, ElBakry T, Al-Ansari A, Elakkad MS. Robotic inguinal hernia repair: systematic review and meta-analysis. ANZ J Surg. 2021 Nov;91 (11):2277–2287. doi: 10.1111/ans.16505. Epub 2021 Jan 21. PMID: 33475236.
LeBlanc KA. Design of a comparative outcome analysis of open, laparoscopic, or robotic-assisted incisional or inguinal hernia repair utilizing surgeon experience and a novel follow-up model. Contemp Clin Trials. 2019 Nov;86:105853. doi: 10.1016/j.cct.2019.105853. Epub 2019 Oct 25. PMID: 31669560.	PubMed	ChatGPT 4.0	Yes	No	LeBlanc KA. Design of a comparative outcome analysis of open, laparoscopic, or robotic-assisted incisional or inguinal hernia repair utilizing surgeon experience and a novel follow-up model. Contemp Clin Trials. 2019 Nov;86:105853. doi: 10.1016/j.cct.2019.105853. Epub 2019 Oct 25. PMID: 31669560.
Prabhu AS, Carbonell A, Hope W, Warren J, Higgins R, Jacob B, Blatnik J, Haskins I, Alkhatib H, Tastaldi L, Fafaj A, Tu C, Rosen MJ. Robotic Inguinal vs Transabdominal Laparoscopic Inguinal Hernia Repair: The RIVAL Randomized Clinical Trial. JAMA Surg. 2020 May 1;155 (5):380–387. doi: 10.1001/jamasurg.2020.0034. PMID: 32186683; PMCID: PMC7081145.	PubMed/Embase/Scopus	Bing AI/ChatGPT 4.0	Yes	Yes	Prabhu AS, Carbonell A, Hope W, Warren J, Higgins R, Jacob B, Blatnik J, Haskins I, Alkhatib H, Tastaldi L, Fafaj A, Tu C, Rosen MJ. Robotic Inguinal vs Transabdominal Laparoscopic Inguinal Hernia Repair: The RIVAL Randomized Clinical Trial. JAMA Surg. 2020 May 1;155 (5):380–387. doi: 10.1001/jamasurg.2020.0034. PMID: 32186683; PMCID: PMC7081145.
Huerta S, Timmerman C, Argo M, Favela J, Pham T, Kukreja S, Yan J, Zhu H. Open, Laparoscopic, and Robotic Inguinal Hernia Repair: Outcomes and Predictors of Complications. J Surg Res. 2019 Sep;241:119–127. doi: 10.1016/j.jss.2019.03.046. Epub 2019 Apr 22. PMID: 31022677.	PubMed	ChatGPT 4.0	Yes	Yes	Huerta S, Timmerman C, Argo M, Favela J, Pham T, Kukreja S, Yan J, Zhu H. Open, Laparoscopic, and Robotic Inguinal Hernia Repair: Outcomes and Predictors of Complications. J Surg Res. 2019 Sep;241:119–127. doi: 10.1016/j.jss.2019.03.046. Epub 2019 Apr 22. PMID: 31022677.
LeBlanc K, Dickens E, Gonzalez A, Gamagami R, Pierce R, Balentine C, Voeller G; Prospective Hernia Study Group. Prospective, multicenter, pairwise analysis of robotic-assisted inguinal hernia repair with open and laparoscopic inguinal hernia repair: early results from the Prospective Hernia Study. Hernia. 2020 Oct;24 (5):1069–1081. doi: 10.1007/s10029-020-02224-4. Epub 2020 Jun 3. PMID: 32495043.	PubMed	ChatGPT 4.0	Yes	Yes	LeBlanc K, Dickens E, Gonzalez A, Gamagami R, Pierce R, Balentine C, Voeller G; Prospective Hernia Study Group. Prospective, multicenter, pairwise analysis of robotic-assisted inguinal hernia repair with open and laparoscopic inguinal hernia repair: early results from the Prospective Hernia Study. Hernia. 2020 Oct;24 (5):1069–1081. doi: 10.1007/s10029-020-02224-4. Epub 2020 Jun 3. PMID: 32495043.
Tatarian T, Nie L, McPartland C, Brown AM, Yang J, Altieri MS, Spaniolas K, Docimo S, Pryor AD. Comparative perioperative and 5-year outcomes of robotic and laparoscopic or open inguinal hernia repair: a study of 153,727 patients in the state of New York. Surg Endosc. 2021 Dec;35 (12):7209–7218. doi: 10.1007/s00464-020-08211-1. Epub 2021 Jan 4. PMID: 33398566.	PubMed	ChatGPT 4.0	Yes	No	Tatarian T, Nie L, McPartland C, Brown AM, Yang J, Altieri MS, Spaniolas K, Docimo S, Pryor AD. Comparative perioperative and 5-year outcomes of robotic and laparoscopic or open inguinal hernia repair: a study of 153,727 patients in the state of New York. Surg Endosc. 2021 Dec;35 (12):7209–7218. doi: 10.1007/s00464-020-08211-1. Epub 2021 Jan 4. PMID: 33398566.
Zayan NE, Meara MP, Schwartz JS, Narula VK. A direct comparison of robotic and laparoscopic hernia repair: patient-reported outcomes and cost analysis. Hernia. 2019 Dec;23 (6):1115–1121. doi: 10.1007/s10029-019-01943-7. Epub 2019 Apr 29. PMID: 31037492.	PubMed	ChatGPT 4.0	Yes	Yes	Zayan NE, Meara MP, Schwartz JS, Narula VK. A direct comparison of robotic and laparoscopic hernia repair: patient-reported outcomes and cost analysis. Hernia. 2019 Dec;23 (6):1115–1121. doi: 10.1007/s10029-019-01943-7. Epub 2019 Apr 29. PMID: 31037492.
Kudsi OY, Bou-Ayash N, Kaoukabani G, Gokcal F. Comparison of perioperative and mid-term outcomes between laparoscopic and robotic inguinal hernia repair. Surg Endosc. 2023 Feb;37 (2):1508–1514. doi: 10.1007/s00464-022-09433-1. Epub 2022 Jul 18. PMID: 35851822.	PubMed	ChatGPT 4.0	Yes	No	Kudsi OY, Bou-Ayash N, Kaoukabani G, Gokcal F. Comparison of perioperative and mid-term outcomes between laparoscopic and robotic inguinal hernia repair. Surg Endosc. 2023 Feb;37 (2):1508–1514. doi: 10.1007/s00464-022-09433-1. Epub 2022 Jul 18. PMID: 35851822.
LeBlanc KA, Gonzalez A, Dickens E, Olsofka J, Ortiz-Ortiz C, Verdeja JC, Pierce R; Prospective Hernia Study Group. Robotic-assisted, laparoscopic, and open incisional hernia repair: early outcomes from the Prospective Hernia Study. Hernia. 2021 Aug;25 (4):1071–1082. doi: 10.1007/s10029-021-02381-0. Epub 2021 May 24. PMID: 34031762.	PubMed	ChatGPT 4.0	Yes	No	LeBlanc KA, Gonzalez A, Dickens E, Olsofka J, Ortiz-Ortiz C, Verdeja JC, Pierce R; Prospective Hernia Study Group. Robotic-assisted, laparoscopic, and open incisional hernia repair: early outcomes from the Prospective Hernia Study. Hernia. 2021 Aug;25 (4):1071–1082. doi: 10.1007/s10029-021-02381-0. Epub 2021 May 24. PMID: 34031762.
Randomized controlled trial of laparoscopic versus open inguinal hernia repair: results of the first 120 patients: van der Putten HB, van der Linden JC, de Graaf H, Smulders PG, van der Wilt GJ. Hernia. 2006;10 (3):240–244. doi: 10.1007/s10029-006–0052-5.	Embase	Google Gemini	No	No	No article
Laparoscopic versus open inguinal hernia repair: a randomized controlled trial: Neuman M, Srebrov D, Siskind G, Bessler M, Bessler H. Surg Endosc. 2021 Aug;35 (8):2273–2279. doi: 10.1007/s00464-021–08737-5. Epub 2021 Jan 28.	PubMed/Embase/Scopus	Google Gemini	No	No	No article
A multicenter randomized controlled trial comparing robot-assisted laparoscopic inguinal hernia repair with laparoscopic and open inguinal hernia repair: Tárnok A, Kulcsár L, Gál Z, Fekete Z, Kovács Z, Rácz L, Sztankovics B, Szabó M, Szabó G, Kálmán J, Géczi L, Keresztes Z, Szántó T, Bereczki I, Kónya G. Surg Endosc. 2022 May;36 (5):1172–1182. doi: 10.1007/s00464-022–09059-9. Epub 2022 Feb 9.	PubMed/Embase/Scopus	Google Gemini	No	No	No article
A randomized controlled trial comparing robotic-assisted inguinal hernia repair with laparoscopic and open inguinal hernia repair in recurrent cases: Tárnok A, Kulcsár L, Gál Z, Fekete Z, Kovács Z, Rácz L, Sztankovics B, Szabó M, Szabó G, Kálmán J, Géczi L, Keresztes Z, Szántó T, Bereczki I, Kónya G. Surg Endosc. 2023 Jun;37 (6):1689–1697. doi: 10.1007/s00464-022–09745-9. Epub 2023 Mar 8.	PubMed/Embase/Scopus	Google Gemini	No	No	No article
Open, laparoscopic, and robotic inguinal hernia repair: Outcomes and predictors of complications: Kim JH, Shin YK, Kim JJ, Yoo SY, Lee SH, Choi DW, Cho IJ. J Surg Res. 2019 Sep;241:119–127. doi: 10.1016/j.jss.2019.04.020. Epub 2019 May 9.	PubMed/Embase	Google Gemini	No	No	Cameron Casson, R. Ellen Jones, Kristin M. Gee, Alana L. Beres. Does Microscopic Hematuria After Pediatric Blunt Trauma Indicate Clinically Significant Injury?. Journal of Surgical Research. Volume 241 (2019), Pages 317–322, ISSN 0022–4804, https://doi.org/10.1016/j.jss.2019.04.020
Comparison of Robotic, Laparoscopic, and Open Unilateral Repair of Non-recurrent Inguinal Hernia: Al-Shukri N, El-Khatib A, Al-Momani M, Abdel-Wahab M. Am Surg. 2022 Oct 27:31348221136572. doi: 10.1172/31348221136572. Epub 2022 Oct 27.	PubMed/Embase/Scopus	Google Gemini	No	No	No article
Meta-analysis of randomized controlled trials comparing laparoscopic with open mesh repair of recurrent inguinal hernia: Li Z, Liu X, Chen F, Shi S, Wang H. World J Surg. 2023;47 (2):439–448. doi: 10.1007/s00381-022–05950-y. Epub 2022 Dec 9.	PubMed/Embase	Google Gemini	No	No	No article
Robotic inguinal hernia repair: systematic review and meta-analysis: Kulcsár L, Tárnok A, Gál Z, Fekete Z, Kovács Z, Rácz L, Sztankovics B, Szabó M, Szabó G, Kálmán J, Géczi L, Keresztes Z, Szántó T, Bereczki I, Kónya G. ANZ J Surg. 2021 Nov;91 (11):2277–2287. doi: 10.1111/ans.16659. Epub 2021 Sep 2.	PubMed/Embase	Google Gemini	No	No	Hojo, D., Kawai, K., Murono, K., Nozawa, H., Hata, K., Tanaka, T., Nishikawa, T., Shuno, Y., Kaneko, M., Sasaki, K., Emoto, S., Ishii, H., Sonoda, H., & Ishihara, S. (2021). Establishment of deformable three‐dimensional printed models for laparoscopic right hemicolectomy in transverse colon cancer. ANZ Journal of Surgery, 91 (7–8). https://doi.org/10.1111/ans.16659

Bing AI generated six potentially relevant citations, two of which overlapped with those identified by ChatGPT. ChatGPT, in turn, produced 10 potentially relevant citations. In contrast, Google Gemini generated eight citations; however, all of these were subsequently determined to be fabricated, as previously discussed.
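The overlap counts above can be reproduced with plain set operations on the rows of Table 1. The sketch below is illustrative only; the short "first author year" labels are shorthand introduced here and are not identifiers used in the study.

```python
# Real citations by source, labeled by first author and publication year
# as listed in Table 1 (labels are shorthand introduced for this sketch).
bing = {
    "Haladu 2022", "Mohan 2021", "Alabi 2022",
    "Penafiel 2023", "Solaini 2022", "Prabhu 2020",
}
chatgpt4 = {
    "Solaini 2022", "Qabbani 2021", "LeBlanc 2019", "Prabhu 2020",
    "Huerta 2019", "LeBlanc 2020", "Tatarian 2021", "Zayan 2019",
    "Kudsi 2023", "LeBlanc 2021",
}
# Rows marked "Yes" in the "reported in previous systematic review" column.
in_srs = {"Prabhu 2020", "Huerta 2019", "LeBlanc 2020", "Zayan 2019"}

real = bing | chatgpt4          # every real study found by either chatbot
overlap = bing & chatgpt4       # found by both chatbots
not_in_srs = real - in_srs      # real studies absent from the SRs

print(len(bing), len(chatgpt4))  # 6 10
print(sorted(overlap))           # ['Prabhu 2020', 'Solaini 2022']
print(len(real))                 # 14
print(len(not_in_srs))           # 10
```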

LLMs missed 38 of the 42 studies (90.5%) included in the previous SRs. However, the LLMs did identify 10 genuine studies that were not included in the previous SRs. Among those 10 studies, 6 were themselves SRs and 1 was published after the SRs, leaving three comparative studies missed by the reviews. Adding these 3 comparative studies to the 42 studies included in the SRs gives a total of 45 studies that would meet the SRs' inclusion criteria; the published SRs therefore missed 6.7% (3/45) of the eligible studies.
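For readers who want to check the arithmetic, the proportions in this section follow directly from the reported counts. The short sketch below recomputes them; the variable names and the `pct` helper are introduced here for illustration, and the counts are the ones stated in the text.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


llm_citations = 22      # citations generated by the LLMs
real = 14               # confirmed authentic
fabricated = llm_citations - real            # 8, all from Google Gemini
fabricated_with_valid_doi = 2
real_in_srs = 4         # real LLM citations also included in the SRs
sr_studies = 42         # studies included in the previous SRs
missed_by_llms = sr_studies - real_in_srs    # 38
missed_by_srs = 3       # comparative studies found only by the LLMs

print(pct(real, llm_citations))                     # 63.6
print(pct(fabricated, llm_citations))               # 36.4
print(pct(fabricated_with_valid_doi, fabricated))   # 25.0
print(pct(real_in_srs, real))                       # 28.6
print(pct(real_in_srs, llm_citations))              # 18.2
print(pct(missed_by_llms, sr_studies))              # 90.5
print(pct(missed_by_srs, sr_studies + missed_by_srs))  # 6.7
```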

Discussion

In the present study, we assessed the reliability of various LLMs, specifically ChatGPT, Bing Chat AI, and Google Gemini, in identifying scientific literature. Our results revealed that while ChatGPT 4.0 and Bing Chat AI aligned well with the predefined search parameters, Google Gemini exhibited a propensity to fabricate citations, falsely attributing them to reputable surgical journals. Notably, ChatGPT 3.5 generated no results, underscoring the advance in capability represented by its successor, ChatGPT 4.0. This contrast highlights the evolution within AI systems, although it remains crucial to recognize that not all are currently equipped to reliably conduct scientific literature searches. Meticulous external validation remains paramount to prevent the dissemination of false research findings.¹⁸

AI encompasses a range of subdomains, including machine learning, artificial neural networks, NLP, and computer vision.^4,18 Despite notable advancements in these areas, our analysis suggests that these technologies do not yet offer uniform reliability in their applications. This variance in readiness necessitates a cautious approach to the utilization of generative AI within scientific research.

Moreover, the integration of generative models such as ChatGPT into medical education reveals both promising applications and inherent challenges, reflecting broader concerns about AI-generated content within the scientific community. As Eysenbach (2023)¹⁹ discusses, while ChatGPT shows substantial potential in enhancing educational tools and methodologies, its tendency to generate fabricated content, such as incorrect citations, presents a significant risk. This propensity to “hallucinate” data highlights a fundamental limitation of current AI technologies, which could mislead learners and researchers, potentially leading to the propagation of erroneous information. Thus, a call for rigorous scrutiny and continuous evaluation of these models is critical, particularly in domains where precision and reliability are paramount.¹⁹

The rapid proliferation of LLMs such as ChatGPT has introduced significant advancements in text generation capabilities, making them invaluable tools in various domains, including public health and medical research. However, as De Angelis et al. (2023) highlight, these models also pose the novel threat of an “AI-driven infodemic” in public health.²⁰ The ability of LLMs to generate vast amounts of plausible yet potentially misleading or inaccurate content could exacerbate the spread of misinformation on an unprecedented scale. The authors stress the importance of stringent regulatory frameworks and the development of sophisticated detection tools to differentiate between human-generated and AI-produced texts. Such measures are crucial in mitigating the risks associated with the dissemination of false information and ensuring the integrity and credibility of health information communicated to the public.²⁰

Furthermore, AI chatbots have evolved to simulate human decision-making processes, using algorithms that allow them to interpret data and learn from experiences.²¹ The expanding applications and the increasingly nuanced capabilities of AI mark it as a burgeoning tool in the scientific community. Its potential and the implications of its evolving nature warrant continuous observation and analysis. As we advance, it is essential to monitor and evaluate the impact and reliability of AI across diverse scientific domains, anticipating the possibilities it holds for the future. In this context, it is crucial to consider both the achievements and the limitations as outlined by Thirunavukarasu et al. (2023), who caution against the premature application of such models without rigorous validation, especially in high-stakes environments such as patient care and clinical decision-making.⁵

Conclusion

Our study evaluated the reliability of generative LLMs in providing scientific literature on inguinal hernia repair techniques, specifically open, laparoscopic, and robotic approaches. While advanced models such as ChatGPT 4.0 and Bing Chat AI demonstrated a degree of alignment with the predefined search parameters, the propensity of Google Gemini to fabricate citations reveals significant inconsistencies across generative AI models. This variability highlights the necessity for external validation to prevent the dissemination of false findings. As AI continues to evolve, continuous monitoring, rigorous scrutiny, and the development of sophisticated validation frameworks are essential to ensure the credibility and reliability of AI-generated content in high-stakes scenarios such as health care.

Footnotes

Authors’ Contributions

Each of the listed authors contributed significantly to this research in accordance with the guidelines of the International Committee of Medical Journal Editors (ICMJE). Study design: V.S., J.P.G.K., D.L.L., L.T.C., F.M., and J.M. Data collection and analysis: V.S., J.P.G.K., D.L., C.A.B.S., and A.C.R. Article preparation and editing: V.S., J.P.G.K., D.L., F.M., C.A.B.S., A.C.R., J.M., and L.T.C.

Disclosure Statement

V.S., J.P.G.K., D.L., C.A.B.S., A.C.R., and J.M. disclose no conflict of interest. L.T.C. discloses consulting fees from BD and Medtronic outside the submitted study. F.M. discloses consulting fees from BD, Intuitive, Integra, DeepBlue, Allergan, and Medtronic outside the submitted study.

Funding Information

There was no funding for this project.

Supplementary Material

References

1. Budhwar, Chowdhury, Wood, et al. Human resource management in the age of generative artificial intelligence: Perspectives and research directions on ChatGPT. Human Res Mgmt Journal, 2023; 33(3):606–659; doi: 10.1111/1748-8583.12524

2. Kacena, Plotkin, Fehrenbacher. The use of artificial intelligence in writing scientific review articles. Curr Osteoporos Rep, 2024; 22(1):115–121; doi: 10.1007/s11914-023-00852-0

3. Giglio, Costa MUPD. The use of artificial intelligence to improve the scientific writing of non-native English speakers. Rev Assoc Med Bras (1992), 2023; 69(9):e20230560; doi: 10.1590/1806-9282.20230560

4. Beam, Drazen, Kohane, et al. Artificial intelligence in medicine. N Engl J Med, 2023; 388(13):1220–1221; doi: 10.1056/nejme2206291

5. Thirunavukarasu, Ting DSJ, Elangovan, et al. Large language models in medicine. Nat Med, 2023; 29(8):1930–1940; doi: 10.1038/s41591-023-02448-8

6. Zimmerman. A ghostwriter for the masses: ChatGPT and the future of writing. Ann Surg Oncol, 2023; 30(6):3170–3173; doi: 10.1245/s10434-023-13436-0

7. Hutson. Could AI help you to write your next paper? Nature, 2022; 611(7934):192–193; doi: 10.1038/d41586-022-03479-w

8. Katsnelson. Poor English skills? New AIs help researchers to write better. Nature, 2022; 609(7925):208–209; doi: 10.1038/d41586-022-02767-9

9. Else. Abstracts written by ChatGPT fool scientists. Nature, 2023; 613(7944):423; doi: 10.1038/d41586-023-00056-7

10. Benda, Novak, Reale, et al. Trust in AI: Why we should be designing for APPROPRIATE reliance. J Am Med Inform Assoc, 2021; 29(1):207–212; doi: 10.1093/jamia/ocab238

11. Dwivedi, Kshetri, Hughes, et al. “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int J Information Management, 2023; 71:102642; doi: 10.1016/j.ijinfomgt.2023.102642

12. Fleming, Phillips, Drake, et al. Sugarbaker Versus Keyhole repair for parastomal hernia: Results of an artificial intelligence large language model post hoc analysis. J Gastrointest Surg, 2023; 27(11):2567–2570; doi: 10.1007/s11605-023-05749-y

13. Department of Plant Systems Biology. VIB. Venn diagram plotter. VIB. 2024. Available from: https://bioinformatics.psb.ugent.be/webtools/Venn/

14. National Institute of Environmental Health Sciences. Superfund Research Program publications in high-impact journals. NIEHS. 2024. Available from: https://tools.niehs.nih.gov/srp/publications/highimpactjournals.cfm

15. Aiolfi, Cavalli, Micheletto, et al. Primary inguinal hernia: Systematic review and Bayesian network meta-analysis comparing open, laparoscopic transabdominal preperitoneal, totally extraperitoneal, and robotic preperitoneal repair. Hernia, 2019; 23(3):473–484; doi: 10.1007/s10029-019-01964-2

16. Aiolfi, Cavalli, Ferraro, et al. Treatment of inguinal hernia: Systematic review and updated network meta-analysis of randomized controlled trials. Ann Surg, 2021; 274(6):954–961; doi: 10.1097/SLA.0000000000004735

17. Qabbani, Aboumarzouk, ElBakry, et al. Robotic inguinal hernia repair: Systematic review and meta-analysis. ANZ J Surg, 2021; 91(11):2277–2287; doi: 10.1111/ans.16505

18. Hashimoto, Rosman, Rus, et al. Artificial intelligence in surgery: Promises and perils. Ann Surg, 2018; 268(1):70–76; doi: 10.1097/SLA.0000000000002693

19. Eysenbach. The role of ChatGPT, generative language models, and artificial intelligence in medical education: A conversation with ChatGPT and a call for papers. JMIR Med Educ, 2023; 9(1):e46885; doi: 10.2196/46885

20. De Angelis, Baglivo, Arzilli, et al. ChatGPT and the rise of large language models: The new AI-driven infodemic threat in public health. Front Public Health, 2023; 11:1166120; doi: 10.3389/fpubh.2023.1166120

21. Goldenberg, Kirby, Albrecht, et al. AI chatbots in surgery: What does the future hold? J Plast Reconstr Aesthet Surg, 2024; 88:310–313; doi: 10.1016/j.bjps.2023.11.032
