ChatGPT, DeepSeek and Google AI: Are They Acceptable Sources of Health Information for Patients with Patellofemoral Instability?

Abstract

Background:

Patellofemoral instability (PFI) is common in the adolescent population. Patients and families increasingly use online sources to seek information about various health conditions, including PFI. The quality of online health information related to PFI has not been evaluated and may be inconsistent or misleading.

Hypothesis/Purpose:

The purpose of our study was to critically appraise the quality of PFI educational material generated by three common AI tools (ChatGPT 4o, DeepSeek-R1 and Google Gemini 2.5 Pro).

Methods:

Based on existing literature and online resources, 14 questions commonly asked by PFI patients were formulated. The keywords expected in each answer were defined a priori (Table 1). The answers provided by the three AI tools were evaluated by two independent raters using the DISCERN score (16 questions addressing the clarity, balance, and content of the information; maximum score 80), the Global Quality Scale (GQS; maximum 5 points) and the keyword inclusion percentage. The readability of each answer was evaluated using the Flesch-Kincaid Grade Level, the Flesch Reading Ease Score and the reading level. Statistical analysis was performed.
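As an illustration of two of the measures above, the following is a minimal Python sketch, not the scoring procedure used in this study. It assumes that keyword inclusion is scored as a case-insensitive substring match and that word, sentence and syllable counts are supplied by a separate readability tool. The function names and the example keywords are hypothetical. The two readability formulas are the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level equations.

```python
def keyword_inclusion_pct(answer: str, keywords: list[str]) -> float:
    """Percentage of expected keywords that appear in an answer.

    Assumption: a keyword counts as included if it occurs anywhere in the
    answer as a case-insensitive substring. The study's raters may have
    applied a different (e.g. manual) matching rule.
    """
    if not keywords:
        raise ValueError("keywords must not be empty")
    text = answer.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return 100.0 * hits / len(keywords)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula (higher = easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical answer and keyword list, for illustration only.
    answer = "Physical therapy and bracing are common first-line options."
    expected = ["physical therapy", "bracing", "surgery", "MPFL reconstruction"]
    print(keyword_inclusion_pct(answer, expected))

    # Hypothetical counts: 100 words, 5 sentences, 150 syllables.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
```

Passing the counts in as arguments keeps the sketch independent of any particular syllable-counting heuristic, which is where readability tools most often disagree.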

Results:

The mean DISCERN scores for ChatGPT, DeepSeek and Google AI were 54.3 (±6.0), 57.6 (±2.9) and 53.2 (±2.9), respectively, corresponding to overall ‘good’ quality of content (Table 2). The DISCERN score for DeepSeek was significantly higher than that for Google AI (p=0.02). The mean GQS points were 4.5 (±0.5), 4.3 (±0.4) and 4.2 (±0.5), respectively, corresponding to overall ‘good’ quality of content and flow. There were no significant differences in GQS points between the three tools (p=0.4). The mean keyword inclusion percentages were 74.7%, 71.4% and 82.7%, respectively, with no significant differences between them (p=0.51). The readability of the content was at ‘college’ level for all three AI tools, with no significant differences in Flesch Reading Ease Score (p=0.50), Flesch-Kincaid Grade Level (p=0.12) or reading level (p=0.69).

Conclusion:

The online PFI-related information provided by ChatGPT, DeepSeek and Google AI for patients and families is of good (but not excellent) quality overall. DeepSeek had a slightly higher DISCERN score than Google AI. The readability of the content was around college level for all three AI tools, which is higher than the 6th-grade level currently recommended by the NIH for ease of reading and interpretation. The current PFI-related online health information is considered acceptable, though there is room for improvement.