Abstract
Objectives
This study compares the effectiveness of ChatGPT and Google Gemini in creating educational materials for patients with varicose veins and deep vein thrombosis of the lower extremities.
Methods
In this cross-sectional study, we used ChatGPT and Google Gemini to generate patient education materials for the two conditions. The materials were evaluated based on word count, sentence length, ease of understanding (using the Flesch-Kincaid calculator), similarity (analyzed with Quillbot), and reliability (assessed with a modified DISCERN score). Statistical analysis was performed using the unpaired t-test and Pearson correlation coefficient.
Results
The analysis found no significant differences between the materials produced by ChatGPT and Google Gemini regarding readability, word count, sentence length, or reliability. Correlation analysis showed a positive relationship in ease scores between the two tools, while reliability scores were negatively correlated. However, these correlations were not statistically significant.
Conclusion
ChatGPT and Google Gemini are equally effective in creating educational materials for patients with varicose veins and deep vein thrombosis of the lower extremities.
Introduction
Patient education plays a crucial role in disease management. By increasing disease awareness, it helps patients identify symptoms early, follow treatment regimens, and collaborate more effectively with healthcare providers, ultimately improving outcomes and reducing complications.1–3 Artificial intelligence (AI) tools are increasingly being used for patient education, thanks to their ability to efficiently integrate medical information.4,5 Unlike traditional search engines, AI can offer personalized and easily accessible health guidance. Although AI tools like ChatGPT and Google Gemini are widely used for medical information generation,6,7 their reliability and effectiveness in vascular disease education have not been systematically assessed.
Methods
A one-week cross-sectional study was conducted in February 2025. Since no human participants were involved, ethical approval was not required. The study focused on two common vascular diseases: varicose veins and deep vein thrombosis of the lower extremities. Two artificial intelligence tools, ChatGPT 3.5 and Google Gemini, were used to generate patient education materials. The prompts provided to both tools were: “Compile educational materials for patients with varicose veins of the lower extremities” and “Compile educational materials for patients with deep vein thrombosis of the lower extremities.” The generated materials were collected in Microsoft Word documents. The Flesch-Kincaid calculator was used to obtain word count, sentence count, ease score, and grade level.8 The originality of the content was evaluated using the Quillbot plagiarism tool,9 and the modified DISCERN score was applied to assess the reliability of the information. This tool includes five questions that evaluate content quality, accuracy, and reliability on a five-point scale.10 Two vascular surgeons independently assessed the texts, with the average of their scores serving as the final result. Final data were exported to Microsoft Excel and analyzed using R version 4.3.2. Comparisons between the materials generated by ChatGPT and Google Gemini were performed using the unpaired t-test, with a p-value <.05 considered statistically significant. The Pearson correlation coefficient was used to evaluate the relationship between ease and reliability scores.
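For readers who wish to reproduce the readability metrics, the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas can be sketched in Python as follows. This is an illustrative sketch, not the online calculator used in this study: the function names are hypothetical, and the vowel-group syllable counter is a rough heuristic, so values may differ somewhat from those of dedicated tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text: str) -> dict:
    """Compute word/sentence counts, Flesch Reading Ease, and Flesch-Kincaid grade."""
    # Sentences are split on runs of terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        "words": len(words),
        "sentences": len(sentences),
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        # Flesch Reading Ease: higher values indicate easier text.
        "ease_score": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }
```

Calling `readability(text)` on each generated document would yield the same kinds of metrics compared in this study (word count, sentence count, average words per sentence, average syllables per word, ease score, and grade level).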
Results
Table 1. Characteristics of responses generated by ChatGPT and Google Gemini.
Additionally, no significant differences were found between the two models in similarity percentage (63.52 vs 52.14, p = .1913) or reliability scores (2.32 vs 2.32, p = 1.0). The materials also showed no significant differences in grade level (12.85 vs 10.28, p = .1965) or ease score (33.28 vs 35.42, p = .8025) (see Table 1).
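The group comparisons above rely on the unpaired t-test. A minimal sketch of the test statistic is given below, assuming the pooled-variance (Student) form; the study does not state whether equal variances were assumed, and the analysis itself was run in R rather than Python. The function name and the input values in the usage line are hypothetical placeholders, not study data.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Each group needs at least two values.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical placeholder scores for two groups (NOT data from this study).
t_stat, dof = unpaired_t([10.0, 12.0, 14.0], [11.0, 15.0, 16.0])
```

The two-sided p-value is then obtained from the t distribution with `dof` degrees of freedom, for example with a statistics package.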
Correlation between patient education materials produced by ChatGPT and Google Gemini: ease score and reliability score comparison.
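The Pearson correlation coefficient used for this comparison can be computed as in the following sketch. It is an illustration of the standard formula only: the function name is hypothetical, and the paired inputs would be the corresponding scores for the two tools.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cross / (spread_x * spread_y)
```

Values near +1 or -1 indicate a strong linear relationship, but with few paired observations even large coefficients may not reach statistical significance, as was the case here.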
Discussion
Our findings showed no significant differences in readability metrics (word count, sentence count, average words per sentence, average syllables per word, grade level, and ease score) between ChatGPT and Google Gemini. This suggests that both mainstream large language models (LLMs) offer a similar baseline for creating patient education brochures, with comparable text structure and readability. Clinicians and health educators can expect similar initial material with essential readability features, regardless of the model used.
Although both models in this study performed similarly in terms of readability, the potential of AI technology for personalized health education is vast. LLMs can adapt vocabulary, sentence structure, and content depth by analyzing patient data, such as age, cultural background, education level, and comorbidities, to create highly tailored educational materials.11 This ability to customize and adjust readability is vital for enhancing patient health literacy and improving adherence to disease management, ensuring that the information aligns with the patient’s comprehension and needs.12,13 Future research should explore how to maximize this core capability of LLMs in clinical practice.
When evaluating content quality, ChatGPT and Google Gemini showed no significant differences in similarity percentage or reliability scores, indicating that the two models perform comparably as information sources. However, it is important to note that LLMs are trained on vast amounts of internet text data, which may include inaccurate or inappropriate information, potentially leading to misleading content.14 For example, a previous study comparing Google Gemini and ChatGPT on status epilepticus content found that both models received a DISCERN score of only 3.11 Similarly, in our study, both models achieved a reliability score of just 2.32. This underscores the limitations of even advanced LLMs in generating professional medical information, highlighting the need for caution.
Although the patient education guidelines generated by ChatGPT in this study showed a high average similarity percentage, we do not consider this plagiarism. Rather, it reflects the inherent function of LLMs. These models generate content by learning from large volumes of public information, often resulting in overlap with existing materials, especially for basic medical knowledge in patient education. However, it is essential to minimize similarity with existing literature to maintain the credibility of medical education. This is crucial for ensuring academic ethical compliance and protecting institutional reputation.15 Even when high similarity arises from shared general knowledge, it should be minimized whenever possible. Future research should focus on strategies to balance LLM efficiency with the need for originality. This could involve refined prompt engineering or the integration of plagiarism detection tools to identify and address potentially problematic “high similarity” content.
Our research found a strong positive correlation in ease scores between patient education materials generated by ChatGPT and Google Gemini. In contrast, a moderate negative correlation was observed for reliability scores. However, neither correlation was statistically significant. The positive correlation in ease scores suggests that both models face similar challenges or use similar strategies to optimize readability in patient education materials. The negative correlation in reliability, though not significant, may indicate that the models occasionally sacrifice completeness or depth to enhance comprehensibility, which could indirectly affect reliability. This underscores the need to balance ease of understanding with reliability when using LLMs for patient education materials. Over-simplification should be avoided to prevent distortion or omission of key information. Future research should explore this potential trade-off further, using refined evaluation tools or user experience testing.
We recommend the cautious use of LLMs in medical settings. LLMs can serve as a starting point for patient education materials or initial drafts for clinician-authored handbooks. Their efficiency, ease of understanding, and ability to personalize content can help save healthcare professionals valuable time. However, for promotional materials used directly in clinics or information related to critical medical decisions, LLM-generated content must undergo rigorous review and validation by clinical professionals. Despite LLMs’ strong readability performance, their potential limitations in reliability and similarity to existing literature require manual verification prior to publication. This ensures the information is accurate, original, and aligned with the latest medical guidelines, thus preventing misinformation and preserving the professionalism and credibility of healthcare services.
Limitations
This study focused on two conditions: varicose veins and deep vein thrombosis of the lower extremities, which may limit the generalizability of the findings. Future research should explore a broader range of diseases to provide a more comprehensive assessment. Additionally, we analyzed only two AI tools, restricting the scope of our comparison.
Conclusion
This study found no significant differences between ChatGPT and Google Gemini in terms of average ease score, reliability, or the quality of patient education materials for varicose veins and deep vein thrombosis of the lower extremities. Moreover, there was no statistically significant correlation in the ease or reliability scores generated by these two tools. This suggests that both AI systems produce similar outcomes when creating patient education materials for specific diseases. Further research is needed to assess additional AI platforms and expand the range of diseases studied, particularly newer ones. Most importantly, the generated content should be compared with the latest medical guidelines to ensure that patients receive accurate, up-to-date information.
Footnotes
Ethical approval
Since no human participants were involved, ethical approval was not required.
Author contributions
Zhoupeng Wu designed the present study; Jing Huang wrote the manuscript and collected data; Zhoupeng Wu supervised the study to ensure it was conducted correctly and revised the manuscript. Manuscript revision and final approval of the version to be published.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Guarantor
Zhoupeng Wu
Disclosure of relationships and activities
All authors have completed the disclosure of relationships and activities for the manuscript.
