Assessing the understandability, actionability, reliability, and readability of ChatGPT-4o in providing patient education on urinary incontinence

Abstract

Objective

This study assesses ChatGPT-4o’s responses to common patient inquiries regarding urinary incontinence (UI), a condition that significantly impacts quality of life but often goes untreated due to low healthcare-seeking behavior. The evaluation focuses on four key metrics: understandability, actionability, reliability, and readability.

Materials and Methods

In this non-human subject qualitative study, 13 patient-focused questions derived from AUA/SUFU and EAU guidelines were posed to ChatGPT-4o in Turkish. The questions were categorized into four themes: Definition, Diagnosis, Management, and Surgical Considerations. Three blinded experts (a urogynecologist, a urologist, and a pelvic floor physiotherapist) independently evaluated the responses using the Patient Education Materials Assessment Tool (PEMAT) for understandability and actionability and the modified DISCERN (mDISCERN) tool for reliability. Readability was measured using the Çetinkaya–Uzun formula, which was designed specifically for Turkish text. Statistical analysis included descriptive statistics and the Intraclass Correlation Coefficient (ICC) to determine inter-rater reliability.

Results

In evaluating ChatGPT-4o’s performance in urinary incontinence education, the experts showed strong agreement in their assessments: inter-rater reliability was 0.80 (95% CI: 0.70–0.91) for PEMAT and 0.82 (95% CI: 0.70–0.91) for mDISCERN. The AI’s responses were consistently highly understandable, particularly when explaining diagnoses (peak score of 94.4%), yet they were notably less actionable, meaning that they often failed to provide clear, practical steps for patients to follow. This gap was most evident in surgical considerations, which were the least actionable domain at 68.2%. The overall reliability of the content was rated as “fair” across all categories, with surgical information being the most reliable. Most responses were classified as “difficult,” requiring a university-level education to comprehend, with surgery-related topics being the most linguistically complex.

Conclusion

While ChatGPT-4o yields comprehensible health information, its limited actionability and high linguistic complexity pose barriers to patients with lower health literacy.

Keywords

urinary incontinence, patient education, artificial intelligence, ChatGPT-4o

Introduction

Modern healthcare paradigms prioritize patient empowerment through the acquisition of self-management skills. Effective communication is a prerequisite for the health-seeking behaviors necessary to treat conditions like urinary incontinence (UI)^1,2. Historically, traditional educational tools—like patient leaflets and TV broadcasts—served as primary mediums for delivering health information. However, since the advent of the World Wide Web and platforms such as YouTube, many patients increasingly turn to the internet to access health information independently.³ Patient education strategies, such as structured counseling, group sessions, multimedia resources, and digital tools, are vital to improve awareness and reduce stigma around sensitive conditions like UI. Greater education is directly linked with earlier health-seeking behavior and adherence to treatment.

In regions like Türkiye, where UI education is often opportunistic and unstructured, this digital shift is particularly pronounced. Generative AI, specifically Large Language Models (LLMs) like ChatGPT, offers a new frontier for delivering tailored, empathetic health information. However, current online resources, such as YouTube, frequently lack the clarity or actionable guidance required for effective patient use. Given that only approximately one-quarter of women with UI obtain medical advice, evaluating whether AI can bridge this information gap is a clinical necessity.

AI-powered chatbots have been adopted across specialties to provide health guidance, including in chronic disease management, mental health support, telemedicine triage, and perioperative care.⁴ ChatGPT, developed by OpenAI and based on the Generative Pre-trained Transformer (GPT) architecture, has evolved significantly since the inception of GPT in 2018.⁵ With advancements through GPT-2, GPT-3, and GPT-4, the tool now supports real-time, multimodal interaction via GPT-4o, enabling text, voice, and image understanding.⁶ These capabilities make ChatGPT a valuable asset in patient-centered healthcare: it can deliver accessible, tailored, and empathetic information, thereby enhancing trust, reducing anxiety, and improving comprehension.^5–7 Previous studies suggest that, when thoughtfully integrated into clinical contexts, ChatGPT can improve health literacy and support shared decision-making.^8,9

While some researchers have emphasized the potential application of ChatGPT in the field of urogynecology, such as its comparable accuracy and completeness in counseling for pelvic floor surgery,¹⁰ its practicality remains uncertain. In the context of UI, studies show that online platforms like YouTube often host content that is poorly understandable and lacking actionable guidance—e.g., 87.5% of English-language incontinence videos were deemed not easily understandable or actionable for viewers.¹¹ Other studies have shown that much of the publicly available online information regarding UI is incomplete, inaccurate, or conveyed at an understandability level too high for the average patient.¹² Consequently, high-quality UI information online is scarce, which may deter women from seeking professional care. Epidemiological data indicate that only around one quarter of women with UI obtain medical advice or treatment.¹³ This is a major concern, as UI significantly diminishes quality of life.¹⁴ Beyond physical discomfort, UI can cause psychological distress such as shame and poor self-image, and has even been associated with increased mortality in older adults.¹⁴

Despite the growing interest in digital health resources, there is still a clear research gap regarding the understandability and actionability of educational materials on UI, particularly those generated by AI. Given the rising popularity of ChatGPT, it is essential to evaluate the reliability of AI-generated content in the context of UI diagnosis and management. This study aimed to investigate how ChatGPT-4o counsels and guides women regarding UI, with a specific focus on the understandability, actionability, reliability, and readability of the information provided to the general Turkish public, as evaluated by health experts.

Materials and methods

I. Study design and selection process

The present study was deemed exempt from ethical approval by the Bezmialem Vakif University Institutional Review Board, as it involved a non-human subject qualitative design. Initially, 25 patient questions regarding urinary incontinence were derived from patient materials based on the guidelines of the American Urological Association/Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction (AUA/SUFU) and the European Association of Urology (EAU).^15–18 After 12 queries were excluded, a list of 13 UI-related prompts remained (Figure 1; Appendix 1).

Figure 1.

Flowchart of the selection process for patient questions on urinary incontinence.

The questions were categorized into four primary themes:

1. Definition: “Q1-What does urinary incontinence mean?”, “Q2-What are the different types of urinary incontinence?”

2. Diagnosis: “Q3-Why do I leak urine when coughing and sneezing?”, “Q4-Why do I leak urine when I feel an urgent need to urinate and can’t reach the bathroom?”

3. Management: “Q5-Where should I attend to for my urinary incontinence complaints?”, “Q6-Can urinary incontinence be treated?”, “Q7-What are the treatment options for urinary incontinence?”, “Q8-What are the non-surgical treatment options for urinary incontinence?”, “Q9-What does pelvic floor training mean?”, “Q10-What does bladder training mean?”

4. Surgical Considerations: “Q11-Should I consider surgery for stress urinary incontinence?”, “Q12-Is sling surgery a good option for me?”, “Q13-What kind of complications are associated with sling surgery?”

II. Evaluation methodology

On June 25, 2025, all three researchers, who were blinded to each other, accessed ChatGPT-4o and entered the prompt “I am a patient and I want information regarding involuntary leakage of urine”, written in Turkish. The researchers then posed the 13 patient questions about UI listed above. Following the ChatGPT-4o interaction, the generated responses were evaluated by three clinical experts: a urogynecologist (A.F.G.K.), a physiotherapist (B.C.), and a urologist (A.I.). All 13 prompts were assessed using the PEMAT and modified DISCERN tools. Validated Turkish versions of both scales were employed. In addition, the principal researcher measured readability using the Çetinkaya-Uzun readability formula, which is tailored for the Turkish language.

III. Scale and formulas used for evaluation of content

a. The PEMAT instrument

The Patient Education Materials Assessment Tool (PEMAT) was employed to systematically evaluate the understandability and actionability of the patient education materials included in this study. The PEMAT is a validated instrument designed to determine whether health-related content can be easily comprehended and acted upon by individuals with varying levels of health literacy. The instrument consists of two subscales: understandability, measuring the clarity, word choice, visual layout, and organization of the content; and actionability, assessing whether users can identify and implement recommended health behaviors. Each item is rated as “agree” (1), “disagree” (0), or “not applicable,” and subscale scores are calculated as percentages by dividing the total score by the total possible score, excluding items marked “N/A.” This is done separately for the Understandability section (17 items) and the Actionability section (7 items). A higher percentage means the material is easier to understand or act on. This methodology enables objective appraisal of the materials’ effectiveness in communicating essential health information in an accessible and user-centered manner. (Appendix 2).
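The scoring rule described above can be illustrated with a minimal Python sketch. It assumes each item is coded 1 for “agree”, 0 for “disagree”, and `None` for “not applicable”; the function name `pemat_percentage` and the example ratings are hypothetical and are not taken from the study data.

```python
from typing import Optional, Sequence


def pemat_percentage(ratings: Sequence[Optional[int]]) -> float:
    """Return a PEMAT subscale score as a percentage.

    Each rating is 1 ("agree"), 0 ("disagree"), or None ("not applicable").
    N/A items are excluded from both the total score and the total
    possible score, as described in the text.
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one item must be applicable")
    if any(r not in (0, 1) for r in applicable):
        raise ValueError("ratings must be 0, 1, or None")
    return 100.0 * sum(applicable) / len(applicable)


# Hypothetical ratings for the 7 actionability items (not study data):
# 4 "agree", 2 "disagree", 1 "not applicable".
example_actionability = [1, 1, 0, 1, None, 1, 0]
print(f"{pemat_percentage(example_actionability):.1f}%")
```

In this illustration, four of the six applicable items are rated “agree”, so the subscale score is 4/6, or about 66.7%. The same function applies to the 17-item understandability subscale.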

b. Modified DISCERN instrument (mDISCERN)

The DISCERN instrument is a standardized, validated tool designed to assess the quality of written consumer health information. It was developed to guide both healthcare professionals and health consumers, and it consists of three parts. The first part is used to assess the reliability of the information, while the second and third parts determine the overall quality. The modified DISCERN (mDISCERN) tool has previously been utilized to evaluate the reliability of responses.¹⁹ The mDISCERN tool includes only the first part of the original tool (Appendix 3). Each item was rated on a 5-point Likert scale ranging from 1 (“not at all”) to 5 (“definitely yes”), with an optional “not applicable” (N/A) response that is excluded from scoring. The total mDISCERN score was categorized as follows: poor (8–15 points), fair (16–31 points), and good (32–40 points).
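The categorization of total mDISCERN scores can be sketched as follows. The cut-offs are those given above; the function name `mdiscern_category` is hypothetical, and the handling of non-integer mean scores (for example, a mean of 15.5 across raters falls in the lower band) is an assumption of this sketch, since the text defines the bands only for whole-number totals.

```python
def mdiscern_category(total: float) -> str:
    """Map a total mDISCERN score (8 to 40) to its reliability band.

    Bands from the text: poor (8-15), fair (16-31), good (32-40).
    Assumption: a non-integer mean that falls between two bands is
    assigned to the lower band.
    """
    if not 8 <= total <= 40:
        raise ValueError("total mDISCERN score must lie between 8 and 40")
    if total < 16:
        return "poor"
    if total < 32:
        return "fair"
    return "good"


# Illustrative totals only.
for score in (12, 24, 36):
    print(score, mdiscern_category(score))
```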

c. Çetinkaya-Uzun readability formula

The Çetinkaya-Uzun Readability Formula is a widely used tool to assess how easy or difficult a Turkish text is to read. It calculates a readability score based on two key metrics: the average number of syllables per word and the average number of words per sentence. The formula is as follows:

Readability Score = 198.825 − (40.175 × Average Syllables per Word) − (2.610 × Average Words per Sentence)

To apply it, one must first count the total number of syllables, words, and sentences in the text. After calculating the averages, these values are entered into the formula to generate a numerical score. According to the scale developed by Çetinkaya and Uzun, a score above 80 indicates the text is very easy to read (suitable for primary school level), a score between 60 and 80 is easy to moderate, 50 to 60 is considered moderately difficult (often suitable for high school readers), and below 50 suggests the text is difficult and may require university-level reading skills.
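The calculation described above can be sketched in Python. The coefficients and cut-offs are taken exactly as stated in this section. Everything else is an assumption made for illustration: the function names, the treatment of scores that fall exactly on a boundary (50, 60, or 80), the naive sentence and word splitting, and the syllable counter, which relies on the rule that each Turkish syllable contains exactly one vowel. This is not the Excel tool used in the study.

```python
import re

# Assumption: one vowel per syllable in Turkish, so vowels are counted.
TURKISH_VOWELS = set("aeıioöuüâîû")


def count_syllables(word: str) -> int:
    """Count syllables in a Turkish word by counting its vowels."""
    return sum(1 for ch in word.lower() if ch in TURKISH_VOWELS)


def readability_score(avg_syllables_per_word: float,
                      avg_words_per_sentence: float) -> float:
    """Apply the formula with the coefficients given in the text."""
    return (198.825
            - 40.175 * avg_syllables_per_word
            - 2.610 * avg_words_per_sentence)


def readability_band(score: float) -> str:
    """Map a score to the bands in the text (boundary handling assumed)."""
    if score > 80:
        return "very easy"
    if score >= 60:
        return "easy to moderate"
    if score >= 50:
        return "moderately difficult"
    return "difficult"


def score_text(text: str) -> float:
    """Score raw text using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and sentence")
    syllables = sum(count_syllables(w) for w in words)
    return readability_score(syllables / len(words),
                             len(words) / len(sentences))


# Illustrative averages only (not study data).
score = readability_score(2.5, 10)
print(round(score, 2), readability_band(score))
```

With an average of 2.5 syllables per word and 10 words per sentence, the score is 198.825 − 100.4375 − 26.1 = 72.2875, which falls in the “easy to moderate” band.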

Statistical analysis

After the three researchers evaluated the content generated by ChatGPT-4o, descriptive statistics were used to summarize the data. Means and standard deviations were reported for the total instrument scores. The readability score was calculated by the principal investigator using a custom Excel tool. Inter-rater reliability among the evaluators was assessed using the Intraclass Correlation Coefficient (ICC) based on a two-way random-effects model with absolute agreement.
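For readers unfamiliar with this coefficient, the following Python sketch computes a two-way random-effects, absolute-agreement ICC from a table of ratings (rows are rated items, columns are raters). It implements the single-rater form, commonly written ICC(2,1); the text does not state whether the single-rater or the average-rater form was used, so that choice is an assumption, as are the function name and the example data.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    scores[i][j] is the rating given to item i by rater j.
    """
    n = len(scores)               # number of rated items
    k = len(scores[0])            # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 items and raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-item mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ratings: 4 items scored by 3 raters (not study data).
ratings = [
    [24, 22, 25],
    [28, 27, 29],
    [20, 21, 19],
    [30, 29, 31],
]
print(round(icc_2_1(ratings), 3))
```

Because absolute agreement is required, systematic differences between raters (the between-rater mean square) lower the coefficient, unlike consistency-type ICCs.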

Results

a. PEMAT scores: The mean PEMAT I understandability scores across all prompts were notably high, ranging from 83.8% to 94.4%. In terms of domain-specific evaluation, the combined mean of “diagnosis” questions was the highest at 94.4% (±7.9), whilst the combined mean of “definition” questions was the lowest at 89.9% (±7.3) (Table 1). The actionability subscale (PEMAT II) yielded lower scores than PEMAT I. The combined mean of surgery-related questions reached a nadir of 68.2% (±18.0). In particular, Q11 (“Should I consider surgery for stress urinary incontinence?”) and Q12 (“Is sling surgery a good option for me?”) received the lowest actionability scores, both at 60.7%. The highest actionability score (100%) was assigned to Q5 (“Where should I attend to for my urinary incontinence complaints?”). The inter-rater reliability for PEMAT was an ICC of 0.80 (95% CI: 0.70–0.91).

Table 1.

Domain specific PEMAT scores.

	PEMAT I understandability Score (%) mean (standard deviation)	PEMAT II actionability Score (%) mean (standard deviation)
Q1-What does urinary incontinence mean?	88.4% (± 8.2)	75.0% (±3.5)
Q2-What are the types of urinary incontinence?	91.4% (±6.8)	91.7% (±11.8)
COMBINED MEAN OF “DEFINITION” QUESTIONS	89.9% (± 7.3)	83.4% (± 12.5)
Q3-Why do I leak urine when coughing and sneezing?	94.4% (±7.9)	83.3% (±23.6)
Q4- Why do I leak urine when I feel an urgent need to urinate and can’t reach the bathroom?	94.4% (±7.9)	91.7% (±11.8)
COMBINED MEAN OF “DIAGNOSIS” QUESTIONS	94.4% (± 7.9)	87.5% (± 18.9)
Q5-Where should I attend to for my urinary incontinence complaints?	89.3% (±7.6)	100.0% (±0)
Q6-Can urinary incontinence be treated?	91.5% (±6.4)	60.7% (±10.5)
Q7- What are the treatment options for urinary incontinence?	83.8% (±13.6)	69.0% (±22.1)
Q8-What are non-surgical treatment options for urinary incontinence?	83.8% (±13.6)	69.0% (±22.1)
Q9-What does pelvic floor training mean?	92.1% (±6.3)	74.6% (±18.4)
Q10-What does bladder training mean?	92.1% (±6.3)	69.0% (±22.1)
COMBINED MEAN OF “MANAGEMENT” QUESTIONS	88.8% (± 9.5)	73.7% (± 21.3)
Q11-Should I consider surgery for stress urinary incontinence?	92.1% (±6.3)	60.7% (±10.5)
Q12-Is sling surgery a good option for me?	94.9% (±7.3)	60.7% (±10.5)
Q13- What kind of complications are associated with sling surgery?	86.1% (±10.4)	83.3% (±23.6)
COMBINED MEAN OF “SURGICAL CONSIDERATIONS” QUESTIONS	91.0% (± 8.3)	68.2% (± 18.0)
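As an arithmetic check, the combined understandability means in Table 1 can be reproduced from the per-question means. The sketch below assumes that each combined mean is the unweighted average of the per-question means in its domain; the combined standard deviations cannot be recovered this way because they depend on the individual rater scores. The variable names are illustrative.

```python
from statistics import mean

# Per-question PEMAT I understandability means (%) copied from Table 1.
understandability = {
    "Definition": [88.4, 91.4],
    "Diagnosis": [94.4, 94.4],
    "Management": [89.3, 91.5, 83.8, 83.8, 92.1, 92.1],
    "Surgical Considerations": [92.1, 94.9, 86.1],
}

# Assumption: combined mean = unweighted mean of the per-question means.
domain_means = {
    domain: round(mean(values), 1)
    for domain, values in understandability.items()
}

for domain, value in domain_means.items():
    print(f"{domain}: {value}%")
```

Under this assumption the results agree with the combined means reported in Table 1 (89.9%, 94.4%, 88.8%, and 91.0%).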

b. mDISCERN scores: All four domains were evaluated as “fair” in terms of reliability by the mDISCERN instrument. The highest domain-specific mean reliability score, 27.53 (±4.63), was for “surgical” questions, and the lowest, 22.8 (±2.92), was for “definition” questions (Table 2). The inter-rater reliability for mDISCERN was good, with an ICC of 0.82 (95% CI: 0.70–0.91).

Table 2.

Domain Specific mDISCERN Scores.

	mDISCERN I-reliability mean (standard deviation)
Q1-What does urinary incontinence mean?	22.3 (± 3.3)
Q2-What are the types of urinary incontinence?	23.3 (±2.6)
COMBINED MEAN OF “DEFINITION” QUESTIONS	22.8 (±2.92)
Q3-Why do I leak urine when coughing and sneezing?	24 (± 3.6)
Q4- Why do I leak urine when I feel an urgent need to urinate and can’t reach the bathroom?	23.7 (± 4.5)
COMBINED MEAN OF “DIAGNOSIS” QUESTIONS	23.85 (±3.94)
Q5-Where should I attend to for my urinary incontinence complaints?	24 (± 3.6)
Q6-Can urinary incontinence be treated?	23.7 (± 3.9)
Q7- What are the treatment options for urinary incontinence?	26.7 (± 2.6)
Q8-What are non-surgical treatment options for urinary incontinence?	28 (± 3.3)
Q9-What does pelvic floor training mean?	27.3 (± 1.9)
Q10-What does bladder training mean?	27 (± 1.4)
COMBINED MEAN OF “MANAGEMENT” QUESTIONS	26.95 (±3.34)
Q11-Should I consider surgery for stress urinary incontinence?	29.3 (± 2.6)
Q12-Is sling surgery a good option for me?	29 (±2.9)
Q13- What kind of complications are associated with sling surgery?	24.3 (±6.1)
COMBINED MEAN OF “SURGICAL CONSIDERATIONS” QUESTIONS	27.53 (±4.63)

c. Readability scores: Only two responses (Q2 and Q9) were classified as “moderately easy to read” according to the Çetinkaya-Uzun readability index. Two other responses (Q1 and Q8) were rated as “moderately difficult” to read. The remaining nine responses were categorized as “difficult” and required university-level reading comprehension. The only domain classified as “moderately easy to read” was the “definition” domain (Table 3; Figure 2).

Table 3.

Domain specific Çetinkaya-Uzun readability scores.

	Çetinkaya-Uzun readability index scores
Q1-What does urinary incontinence mean?	53.68
Q2-What are the types of urinary incontinence?	72.67
COMBINED SCORE OF DEFINITION QUESTIONS	63.18
Q3-Why do I leak urine when coughing and sneezing?	32.21
Q4- Why do I leak urine when I feel an urgent need to urinate and can’t reach the bathroom?	48.78
COMBINED SCORE OF DIAGNOSIS QUESTIONS	40.50
Q5-Where should I attend to for my urinary incontinence complaints?	39.97
Q6-Can urinary incontinence be treated?	48.50
Q7- What are the treatment options for urinary incontinence?	49.09
Q8-What are non-surgical treatment options for urinary incontinence?	56.98
Q9-What does pelvic floor training mean?	60.28
Q10-What does bladder training mean?	29.41
COMBINED SCORE OF MANAGEMENT QUESTIONS	47.37
Q11-Should I consider surgery for stress urinary incontinence?	29.70
Q12-Is sling surgery a good option for me?	26.26
Q13- What kind of complications are associated with sling surgery?	43.49
COMBINED SCORE OF SURGERY RELATED QUESTIONS	33.15

Figure 2.

Heat map of question domains (dark green indicates higher scores, red indicates lower scores).

Discussion

The principal finding of this study is that ChatGPT-4o demonstrated high understandability when addressing common patient inquiries about UI. The pattern of high understandability combined with low actionability appears to be a cross-linguistic problem. Culture-specific adjustments, such as using the Çetinkaya-Uzun index and addressing local specialist titles, are essential for accurately assessing AI utility. The “diagnosis” domain achieved the highest understandability (94.4%), effectively explaining symptoms such as leakage during coughing or urgency. The lowest score was in the “definition” domain, at 89.9%. Similarly, Chen et al. previously evaluated ChatGPT-3.5 and reported the following overall understandability scores: 95.2% for definitions, 92.9% for management, 88.1% for diagnosis, and 81.4% for surgery-specific questions.¹⁸ Other studies used Likert scales rather than PEMAT percentages to measure how well the information was communicated. Rotem et al. approached 37 urogynecologists, who rated ChatGPT’s responses.²⁰ They found a high level of comprehensiveness, with an average score of 4.0 out of 5. Approximately 74% of experts gave favorable ratings (a score of 4 or higher) for how well the information was conveyed. Barbosa-Silva et al. did not use a percentage score but noted that ChatGPT’s ability to engage in human-like conversation significantly simplifies complex medical terminology, making it more accessible and digestible for the average user.²¹

Additionally, in the present study ChatGPT-4o performed less favorably on actionability, with scores ranging from 60.7% to 100%. In comparison, Chen et al. reported a far more critical finding, an actionability score of only 18%.¹⁸ This suggests a major failure in motivating readers to take specific medical steps or seek professional help, despite the information being relatively easy to understand.

In terms of reliability, our results indicated that ChatGPT-4o showed “fair” reliability. Correspondingly, Rotem et al. also reported positive reliability, with global experts giving ChatGPT an average accuracy score of 3.9 out of 5.²⁰ Barbosa-Silva et al. investigated how reliably ChatGPT answered 14 frequently asked questions (FAQs) regarding female urinary incontinence (UI) compared with established scientific guidelines.²¹ The largest group of answers (6 out of 14) was classified as “more correct than incorrect”. Only one response, regarding bladder retraining, was rated as completely “correct” by the experts. Ultimately, the investigation highlighted a concern regarding inconsistency in the AI’s accuracy and noted that almost all answers failed to provide the full content expected based on medical guidelines.

In our study, readability assessments using the Çetinkaya–Uzun index indicated that most responses required university-level reading comprehension, reinforcing the need to address health literacy disparities, particularly for underserved populations. The study by Cao et al. used the Simple Measure of Gobbledygook (SMOG) index as one of its primary tools to evaluate and compare the readability of responses from DeepSeek and ChatGPT-4.0.¹⁹ The SMOG index was employed to calculate the years of education required to understand the generated text, finding that ChatGPT-4.0’s responses were primarily at an “undergraduate college” level while DeepSeek’s were closer to a “high school” level. Chen et al. also utilized the SMOG index to evaluate the readability of ChatGPT-3.5’s responses to 11 frequently asked questions regarding female stress urinary incontinence.¹⁸ The question regarding surgical considerations (“Should I think about surgery for SUI?”) received the highest SMOG score of 15.3, indicating it was the most linguistically complex response. Overall, Chen et al. highlighted a significant readability barrier, noting that while the information was often accurate and understandable in structure, the advanced reading level remains a major area for improvement to make AI tools more accessible to the general public.

To make AI tools like ChatGPT more useful for women with urinary incontinence, several improvements are needed. First, responses should be based more closely on established medical guidelines to increase reliability. Second, the language should be simplified to match the reading level of the general public, while still remaining accurate. Third, information should include clear, practical steps that patients can follow, such as how to do pelvic floor exercises or when to see a doctor. Fourth, ChatGPT should provide references and explain its answers more transparently to build trust. Finally, future development should involve input from specialists and patients to make sure the information is accurate, culturally appropriate, and relevant to real-life care. These steps would make AI-generated information more actionable and supportive in guiding women to seek professional help for UI.

A limitation of this study is its reliance on a single ChatGPT query for each response. Since the model produces different outputs with each prompt, variability across responses is inevitable. ChatGPT occasionally strayed from the prompt’s intent, offering generic or abbreviated explanations, which echoes prior research underscoring its lack of clinical depth and up-to-date evidence citations. Another limitation is the relatively small number of domain experts consulted, which may have introduced subjectivity and limited the generalizability of the findings. Additionally, the use of Turkish rather than English may hinder translation and generalization to other populations worldwide. Finally, AI-generated responses may be misleading and pose potential risks.²²

Conclusion

While ChatGPT demonstrates a degree of informational integrity, many of its outputs in this study contained technical language and complex phrasing. This may limit accessibility for individuals with lower levels of health literacy. To ensure effective patient education, particularly in sensitive areas such as urinary incontinence,^23,24 further refinement of AI-generated content is necessary. Importantly, the role of healthcare professionals remains critical in interpreting, contextualizing, and validating this information to support informed decision-making and optimal patient outcomes.

Supplemental material

Supplemental material - Assessing the understandability, actionability, reliability, and readability of ChatGPT-4o in providing patient education on urinary incontinence

Supplemental material for Assessing the understandability, actionability, reliability, and readability of ChatGPT-4o in providing patient education on urinary incontinence by Ayse Filiz Gokmen Karasu, Betul Cinar, Melda Kuyucu, Abdullah Ilktac, and Tural Ismayilov in DIGITAL HEALTH.


Footnotes


Ethical considerations

The present study was deemed exempt from ethical approval by the Bezmialem Vakif University Institutional Review Board, as it involved a non-human subject qualitative design.

Consent to participate

The present study did not involve any patient data; therefore, patient consent was not sought.

Author contributions

Ayse Filiz Gokmen Karasu: Project development, data collection, data analysis, and manuscript writing. Betul Cinar: Project development, data collection, manuscript writing. Melda Kuyucu: Project development, data collection, manuscript writing. Abdullah Ilktac: Project development, data collection, manuscript writing. Tural Ismayilov: Manuscript writing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article. This research was self-funded by the researchers.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Data are available upon request. In addition, ChatGPT produces output that is publicly available.

Trial registration

Our present study does not require a clinical trial registration because it involved a non-human subject qualitative design.

Permission to reproduce material from other sources

ChatGPT is a publicly available chatbot. The data is publicly reproducible.

Supplemental material

Supplemental material for this article is available online.

References

1. Ting, Huicai, Kudelati, et al. Exploring the dynamics of self-efficacy, resilience, and self-management on quality of life in type 2 diabetes patients: a moderated mediation approach from a positive psychology perspective. PLoS One 2025; 20(1): e0317753. https://doi.org/10.1371/journal.pone.0317753

2. Cole, Sannidhi, Jadotte, et al. Using motivational interviewing and brief action planning for adopting and maintaining positive health behaviors. Prog Cardiovasc Dis 2023; 77: 86–94. https://doi.org/10.1016/j.pcad.2023.02.003

3. Lin, Duan, et al. Internet health information–seeking trend of urinary incontinence in mainland China: infodemiology study. JMIR Form Res 2025; 9: e55670. https://doi.org/10.2196/55670

4. Laymouna, Lessard, et al. Roles, users, benefits, and limitations of chatbots in health care: rapid review. J Med Internet Res 2024; 26: e56930. https://doi.org/10.2196/56930

5. Shojaei, Khosravi, Jafari, et al. ChatGPT utilization within the building blocks of healthcare services: a mixed-methods study. Digit Health 2024; 10: 20552076241297059. https://doi.org/10.1177/20552076241297059

6. OpenAI. GPT-4o. Wikipedia, 2024 [cited 2025 Jul 5]. Available from: https://en.wikipedia.org/wiki/GPT-4o

7. Zhao, Zhang, Zhou, et al. Revolutionizing patient education with GPT-4o: a new approach to preventing surgical site infections in total hip arthroplasty. Int J Surg 2025; 111(1): 1571–1575. https://doi.org/10.1097/JS9.0000000000002023

8. Amin, Mayes, Khosla, et al. Assessing the efficacy of large language models in health literacy: a comprehensive cross-sectional study. Yale J Biol Med 2024; 97(1): 17–27. https://doi.org/10.59249/ZTOZ1966

9. Unadkat, Abdulwadood, Hiredesai, et al. ChatGPT 4.0's efficacy in the self-diagnosis of non-traumatic hand conditions. J Hand Microsurg 2025; 17(3): 100217. https://doi.org/10.1016/j.jham.2025.100217

10. Johnson, Bradley, Kenne, et al. Evaluation of ChatGPT for pelvic floor surgery counseling. Urogynecology 2024; 30(3): 245–250. https://doi.org/10.1097/UGA.0000000000000131

11. Baran. YouTube videos as an information source about urinary incontinence. J Gynecol Obstet Hum Reprod 2021; 50(10): 102197. https://doi.org/10.1016/j.jogoh.2021.102197

12. Özkent, Kılınç. Female urinary incontinence on TikTok and YouTube: is online video content sufficient? Int Urogynecol J 2023; 34: 2775–2781. https://doi.org/10.1007/s00192-023-05607-0

13. Wang, Jin, et al. Urinary incontinence in pregnant women and its impact on health-related quality of life. Health Qual Life Outcomes 2022; 20: 13. https://doi.org/10.1186/s12955-022-01920-2

14. Soysal, Veronese, Ippoliti, et al. The impact of urinary incontinence on multiple health outcomes: an umbrella review of meta-analyses of observational studies. Aging Clin Exp Res 2023; 35(3): 479–495. https://doi.org/10.1007/s40520-022-02257-2

15. Lightner, Gomelsky, Souter, et al. Diagnosis and treatment of overactive bladder (non-neurogenic) in adults: AUA/SUFU guideline amendment. J Urol 2019; 202: 558–563. https://doi.org/10.1097/JU.0000000000000309

16. Kobashi, Vasavada, Loshchichak, et al. Updates to surgical treatment of female stress urinary incontinence (SUI): AUA/SUFU guideline. J Urol 2023; 209: 1091–1098. https://doi.org/10.1097/JU.0000000000003435

17. Nambiar, Bosch, Cruz, et al. EAU guidelines on assessment and nonsurgical management of urinary incontinence. Eur Urol 2018; 73: 596–609. https://doi.org/10.1016/j.eururo.2017.12.031

18. Chen, Jacob, Hwang, et al. AUA guideline committee members determine quality of artificial intelligence-generated responses for female stress urinary incontinence. Urol Pract 2024; 11: 693–698. https://doi.org/10.1097/UPJ.0000000000000577

19. Cao, Hao, Zhang, et al. Battle of artificial intelligence: a comprehensive comparative analysis of DeepSeek and ChatGPT for urinary incontinence-related questions. Front Public Health 2025; 13: 1605908. https://doi.org/10.3389/fpubh.2025.1605908

20. Rotem, Zamstein, Rottenstreich, et al. The future of patient education: a study on AI-driven responses to urinary incontinence inquiries. Int J Gynecol Obstet 2024; 167(3): 1004–1009. https://doi.org/10.1002/ijgo.15751

21. Barbosa-Silva, Driusso, Ferreira, et al. Exploring the efficacy of artificial intelligence: a comprehensive analysis of ChatGPT's accuracy and completeness in addressing urinary incontinence queries. Neurourol Urodyn 2025; 44(1): 153–164. https://doi.org/10.1002/nau.25603

22. Wang, Wan, et al. Applications and concerns of ChatGPT and other conversational large language models in health care: systematic review. J Med Internet Res 2024; 26: e22769. https://doi.org/10.2196/22769

23. LaPier, Jericevic, Lang, et al. Predictors of care-seeking behavior for treatment of urinary incontinence in women. Urogynecology 2024; 30(3): 352–362. https://doi.org/10.1097/SPV.0000000000001491

24. Pizzol, Demurtas, Celotto, et al. Urinary incontinence and quality of life: a systematic review and meta-analysis. Aging Clin Exp Res 2021; 33(1): 25–35. https://doi.org/10.1007/s40520-020-01712-y
