Evaluating ChatGPT’s Ability to Provide Patient Care Information in Foot and Ankle Surgery: A Comparative Analysis

Abstract

Background. Patients increasingly use the Internet and artificial intelligence (AI) platforms such as ChatGPT for medical information, raising concerns about the accuracy and clinical depth of AI-generated content. This study evaluated the reliability and clinical utility of ChatGPT (GPT-3.5 and GPT-4.0) for common foot and ankle conditions compared with patient education materials from the American Orthopaedic Foot & Ankle Society (AOFAS) FootCareMD.

Methods. Between January 20 and 26, 2025, standardized prompts were used to query GPT-3.5 and GPT-4.0 across 15 common foot and ankle conditions. ChatGPT responses were compared with AOFAS FootCareMD content based on the number of symptoms, risk factors, and treatment options provided. Two fellowship-trained foot and ankle orthopaedic surgeons independently evaluated response accuracy, categorizing outputs as <50%, 50% to 74%, 75% to 99%, or 100% accurate. Paired t-tests were used for statistical comparisons, and inter-rater reliability was assessed using Cohen’s weighted kappa.

Results. GPT-4.0 generated significantly more symptoms than AOFAS content (P = .015). In contrast, GPT-3.5 listed significantly fewer treatment options than both AOFAS and GPT-4.0 (P = .042). When addressing surgical management, both ChatGPT versions frequently provided vague or incomplete information. GPT-3.5 referenced surgery without procedural detail in 53% of responses, while GPT-4.0 lacked detailed surgical explanations or omitted them entirely in 80% of responses. Overall accuracy ratings were high, with 77% of responses judged as 75% to 99% accurate and only 3.4% rated below 50% accuracy. However, inter-rater agreement between the surgeons was poor for responses labeled as 100% accurate (κ = −0.02), highlighting subjectivity in grading AI-generated medical content.

Conclusion. ChatGPT effectively provides general information on the causes and symptoms of foot and ankle conditions, and GPT-4.0 offers more comprehensive treatment discussions than GPT-3.5. Nevertheless, its limited depth and specificity regarding surgical options restrict its clinical usefulness. Until further improvements are made, AI-generated content should serve as a supplement to, rather than a replacement for, expert-reviewed patient education resources.
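
For readers unfamiliar with the agreement statistic named in the Methods, the following is a minimal sketch of Cohen's weighted kappa for two raters on the four ordinal accuracy categories. It is not the authors' analysis code: the abstract does not state the weighting scheme, so linear weights are an assumption, and the function name `weighted_kappa` and all ratings shown are hypothetical illustrations rather than study data.

```python
from collections import Counter

# The four ordinal accuracy categories named in the Methods.
CATEGORIES = ["<50%", "50-74%", "75-99%", "100%"]


def weighted_kappa(rater_a, rater_b, categories=CATEGORIES):
    """Cohen's kappa with linear weights (an assumed weighting scheme)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint counts and each rater's marginal counts.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Linear disagreement weight: 0 on the diagonal, 1 at opposite extremes.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_disagree = sum(weight(i, j) * observed[(i, j)] / n for i, j in cells)
    exp_disagree = sum(
        weight(i, j) * (marg_a[i] / n) * (marg_b[j] / n) for i, j in cells
    )
    if exp_disagree == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - obs_disagree / exp_disagree


# Hypothetical ratings for four responses (not data from the study).
surgeon_1 = ["75-99%", "75-99%", "100%", "100%"]
surgeon_2 = ["75-99%", "100%", "75-99%", "100%"]
print(weighted_kappa(surgeon_1, surgeon_2))
```

In this hypothetical example the two raters agree on half of the responses, yet kappa is 0 because that level of agreement is exactly what their marginal rating frequencies would produce by chance. A value near or below zero, such as the κ = −0.02 reported above, therefore indicates agreement no better than chance even when both raters assign mostly high accuracy grades.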

Level of Evidence: Level III Case Control Study

Keywords

artificial intelligence, ChatGPT, foot and ankle conditions, health information accuracy
