Abstract
Introduction
Patient information leaflets (PILs) from the British Association of Urological Surgeons (BAUS) are commonly used to communicate with patients about surgical procedures, but previous studies have highlighted that they are too difficult for some patients to read. BAUS PILs have since been rewritten using guidelines that emphasize readability.
Objectives
To identify if the readability of BAUS PILs has changed compared to historical versions.
Methods
Current BAUS PILs (published 2020-2025) were compared with historical PILs with similar titles from 2014 to 2016 using a custom Python script.
Results
Readability scores improved by a statistically significant (P < .001) but small amount (FKGL 8.83 vs 8.67; SMOG 11.86 vs 11.71; FRE 57.06 vs 57.47), and readability remains suboptimal. The revised PILs had significantly fewer long sentences (10.11% vs 6.84%; P < .0001), fewer sentences that used the “passive voice” (30.50% vs 16.22%; P < .0001), and were shorter (1875.5 words vs 1726 words; P = .004).
Conclusion
While there have been small improvements in urology PILs for patients, they remain too difficult for many patients, and more work is needed to improve readability.
Patient Summary
We looked at the patient information leaflets (PILs) that the British Association of Urological Surgeons (BAUS) provides, and how easy they are to read. We compared the most recent versions with those that were available before 2016 and found that there has only been a small improvement in how easy they are to read.
Introduction
Patient information leaflets (PILs) are commonly used to improve patient understanding, particularly about surgery. A recent “review of reviews” demonstrated that PILs can improve patient understanding and satisfaction. 1 Unfortunately, they are often not understandable by a significant proportion of patients, as they are written at a relatively high reading grade.2–5 Indeed, one-third of people over 65 years old in the UK struggle to understand PILs. 6 The NHS standard for health content for patients is a reading age of 9–11, although the 11–14 years level can be acceptable for complex or technical content. 7
British urologists commonly supplement the surgical consent process with British Association of Urological Surgeons (BAUS) PILs. Most of the PILs explain the operation, its risks, and the expected patient recovery. A small number also provide advice on non-surgical treatments, prevention of disease, or lifestyle modification to alleviate or prevent conditions. A 2013 study 5 showed that BAUS PILs had poor readability and performed worse than leaflets produced by patient.co.uk (a generic UK patient organization), although they were shorter. In 2015, BAUS information leaflets were found to be too difficult for many patients to understand. 4 Similar issues have been identified with urology PILs from American and European organisations.8,9
The readability of printed or online information can be assessed using readability tools, such as the Simple Measure of Gobbledygook (SMOG), Flesch Reading Ease (FRE), and Flesch-Kincaid Grade Level (FKGL),10–12 which combine variables such as sentence length and the number of syllables per word to generate a readability score or reading grade. These tools have previously been validated against texts tested on readers of different reading ages. The variables they measure correlate with the perceived difficulty of texts, and with each other. 13
These tools have been widely used to assess the readability of PILs, 2 although they were not designed specifically for healthcare content, and some question their validity.14,15 Furthermore, many aspects of readability are not captured by such tools, which correlate with readability rather than measure it directly. They also do not capture any non-text-based aspects of readability, such as layout or use of images, which have been shown to be important for the readability of PILs.15–18
Despite these limitations, readability tools do correlate with perceived readability,13,19 and allow for a more cost-effective estimate of the readability of text compared to direct feedback from patients. The formulaic nature of these tools also makes analyzing large datasets of PILs feasible by automating the analysis using a computer program.
Long sentences are also important to assess, as longer sentences are more difficult to understand. 20 The specific cut-off for a “long” sentence varies, but many use 25 words.21,22 Furthermore, use of the “active voice” (rather than “passive voice”) is widely recommended for patient communication,7,23,24 although there is mixed evidence about whether it actually improves understanding. 25
Following the 2015 study, BAUS revised its PILs using style guides from the Information Standard, 7 the Patient Information Forum, 23 and the Plain English Campaign, 24 intending to increase their readability. However, no study has assessed the readability of BAUS PILs since 2015. 4 The aim of this study is therefore to compare historical BAUS information leaflets with the most recent versions, which have been written specifically with readability in mind, assessing both sets of PILs on readability indices, use of long sentences and “active voice,” and their overall length.
Methods
Study Design and Overview
A matched-pairs study design was used, comparing historical and current PILs (matched for clinical content). PILs were acquired in PDF form from websites, and then processed using a custom computer program to help isolate the relevant text (sanitization). The custom computer program then calculated readability scores and other outcomes (eg, text length) for each document. The outcomes for the historical and current groups were compared, and the statistical significance of each comparison was calculated.
Acquisition of Patient Information Leaflets
All current BAUS PILs were downloaded from the BAUS website 26 in February and March 2025. Historical PILs were not available from BAUS, and so they were acquired by accessing the “Wayback Machine” from the Internet Archive 27 in October and November 2024. The historical BAUS PILs were published between 2014 and 2016. PILs were matched by the author based on file name, title, and contents to the same (or very similar) subject areas.
Sanitization of the Data
The PILs (accessible in PDF form) were processed using a custom Python (Python 3.12.3) program created for this study. Leaflets were converted into plain text to allow readability analysis. PILs published since 2021 often include tables, making conversion from PDF to plain text inconsistent. PDFs were therefore converted into markdown files (a common filetype with simple text formatting), using the module “pymupdf4LLM” (version 0.0.27), which allowed more reliable processing of tabular data. This added some markdown-specific annotations, such as “#” for titles. The markdown documents were then imported into Python. A sample of the PILs converted to markdown was checked against the original PDFs for errors. Sentence structure was well preserved, with the main differences being formatting, and header and footer elements being added to the main text.
The “regex” module (version 2024.9.11) was used to identify specific patterns of text (regular expression matching). Multiple irrelevant text strings, such as artefacts from the conversion to markdown, were removed or corrected, along with headers and footers. Email or postal addresses, telephone numbers, and periods unrelated to sentence endings (such as those in “e.g.”) were also removed. The program was modified iteratively until errors were reduced to a minimal level.
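As an illustration of this kind of pattern-based clean-up, the following is a minimal sketch using Python's built-in `re` module rather than the “regex” module used in the study. The patterns, the abbreviation table, and the function name `sanitize` are hypothetical and are not taken from the study's repository.

```python
import re

# Hypothetical patterns for illustration; the study's actual expressions
# are not reproduced in the text.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\b0\d{2,4}[ -]?\d{3,4}[ -]?\d{3,4}\b")  # UK-style numbers
MD_HEADING = re.compile(r"^#+\s*", flags=re.MULTILINE)       # "#" title markers
MD_EMPHASIS = re.compile(r"[*_]{1,3}")                       # bold/italic markers
# Periods in abbreviations would otherwise be read as sentence endings.
ABBREVIATIONS = {"e.g.": "eg", "i.e.": "ie", "etc.": "etc"}


def sanitize(markdown_text: str) -> str:
    """Remove contact details and markdown artefacts from leaflet text."""
    text = EMAIL.sub("", markdown_text)
    text = PHONE.sub("", text)
    text = MD_HEADING.sub("", text)
    text = MD_EMPHASIS.sub("", text)
    for abbreviation, replacement in ABBREVIATIONS.items():
        text = text.replace(abbreviation, replacement)
    # Collapse the gaps left behind by removed strings.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()
```

Contact details are removed before the emphasis markers so that underscores inside email addresses do not break the address pattern.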
Some subheadings were written as sentences but did not end with punctuation, such as “Questions you may wish to ask.” For the purposes of analysis, these were treated as individual sentences. After script processing, the text of a small sample of PILs was inspected thoroughly and was judged to be almost entirely correctly preserved. All processed PIL texts were then scanned briefly for significant errors, which were corrected by iterative improvement of the sanitization script. Some errors arose from “bugs” in the modules used, and these were corrected with minor manual modifications. These bugs have since been corrected in updated modules.
On assessing the PILs, bulleted lists and tables inconsistently had periods at the end of lines. From a readability perspective, it was thought that the bulleted lists and tables represented isolated text and should be treated as new sentences. Therefore, further Python text processing was performed to add punctuation to relevant bulleted or tabulated lists, to increase the consistency of text readability analysis.
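The bullet-handling step described above could be sketched as follows. This is an assumed, simplified version that handles bulleted and numbered lines only (table rows are not covered), and the name `punctuate_list_items` is hypothetical.

```python
import re

# A line counts as a list item if it starts with a bullet mark or "1." style number.
BULLET = re.compile(r"^\s*(?:[-*•]|\d+\.)\s+")
TERMINAL_PUNCTUATION = (".", "!", "?")


def punctuate_list_items(text: str) -> str:
    """Add a period to list items that lack terminal punctuation."""
    lines = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if BULLET.match(stripped) and not stripped.endswith(TERMINAL_PUNCTUATION):
            stripped += "."
        lines.append(stripped)
    return "\n".join(lines)
```

With this change, each list item is counted as its own sentence by a sentence splitter that relies on terminal punctuation, instead of being merged into one very long sentence.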
Readability Formulae
Multiple readability formulae are available to assess printed or online text. We used the FRE, FKGL, and SMOG formulae, as they have been widely used, including in the 2015 PILs study, 4 and SMOG has been suggested as the most reliable for health-related content 28 (see Appendix for formulae). A higher score for FKGL and SMOG implies less readable text, whereas for FRE, a lower score implies more difficulty.
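For readers unfamiliar with the three formulae, the sketch below computes them from raw counts using their standard published coefficients. It is an illustration only and is not the modified textstat code used in the study; the function names are hypothetical.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRE: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: approximate US school grade; higher scores indicate harder text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG: polysyllables are words of 3 or more syllables."""
    return 1.043 * math.sqrt(polysyllables * 30 / sentences) + 3.1291
```

All three depend only on sentence length and syllable counts, which is why the sentence splitting and syllable counting steps have such a direct effect on the scores.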
Calculation of Readability
Readability analysis was then performed using a modified version of the textstat (v0.74) Python module, which has been widely used to calculate readability in previous studies. 29
Some modifications were made to the program. Firstly, the precision of the formulae results was increased to 2 decimal places. Secondly, during the early stages of writing the program, it was discovered that syllables of less common medical words were not correctly counted (eg, “ureteroscopy” was counted as two syllables rather than six). As such, textstat was modified to use the module “syllapy” (version 0.72) to count syllables, as this was found to be substantially more accurate in assessing medical words not included in standard dictionaries. This was tested on common urological terms and was found to correctly identify syllables in all tested terms.
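One way to test a syllable counter against known terms is sketched below. The vowel-group counter is a deliberately naive stand-in (it is neither textstat's nor syllapy's method), and the reference counts other than “ureteroscopy” are assumptions added for illustration.

```python
import re


def naive_syllable_count(word: str) -> int:
    """Count runs of vowels; a crude heuristic used here only as a stand-in."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


# Hand-labelled syllable counts. "ureteroscopy" (six) is from the text;
# the other entries are illustrative assumptions.
REFERENCE_COUNTS = {"ureteroscopy": 6, "cystoscopy": 4, "catheter": 3, "urine": 2}


def find_miscounts(counter, reference):
    """Return {word: (expected, counted)} for every word the counter gets wrong."""
    miscounts = {}
    for word, expected in reference.items():
        counted = counter(word)
        if counted != expected:
            miscounts[word] = (expected, counted)
    return miscounts
```

Any counter with the same call signature can be passed to `find_miscounts`, so the same reference list can be reused to compare alternative syllable-counting methods.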
Calculation of Other Outcomes
For other document properties, the “textstat” module was also used. Other outcomes calculated included the total number of sentences, total word count, number of sentences over 25 words long, and number of long words (3 syllables or more). The number of polysyllabic words per sentence and the percentage of long sentences in each document were also calculated.
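The long-sentence outcome can be illustrated with a short sketch. The sentence splitter below is a simple assumption (split after “.”, “!” or “?”) and is not the splitting logic used by textstat; the function names are hypothetical.

```python
import re

LONG_SENTENCE_WORDS = 25  # sentences with more words than this count as "long"


def split_sentences(text: str) -> list:
    """Split on whitespace that follows terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def long_sentence_percentage(text: str, threshold: int = LONG_SENTENCE_WORDS) -> float:
    """Percentage of sentences containing more than `threshold` words."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    long_count = sum(1 for s in sentences if len(s.split()) > threshold)
    return 100 * long_count / len(sentences)
```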
To assess the use of “passive voice,” the module “PassivePy” 30 was used, which has been shown to have a 97% agreement rate with human-rated examples. 30
Statistical Analysis
Most PILs existed as similarly titled versions with two dates: a historical date (2014-2016) and a recent date (2021-2024). In a few cases, no corresponding historical or recent PIL could be identified (mostly due to novel procedures, procedures becoming obsolete, or the combination or splitting of PILs). PILs without a historical or recent matched pair were excluded from the sample.
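In this study, matching was done by the author using file name, title, and contents. Purely as an illustration of the pairing and exclusion logic, a hypothetical automated first pass based on normalized titles might look like the sketch below; it is not the method used here.

```python
import re


def normalize_title(title: str) -> str:
    """Lower-case the title and reduce punctuation to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def pair_leaflets(historical_titles, current_titles):
    """Return (pairs, unmatched) based on identical normalized titles."""
    historical = {normalize_title(t): t for t in historical_titles}
    current = {normalize_title(t): t for t in current_titles}
    shared = sorted(historical.keys() & current.keys())
    pairs = [(historical[key], current[key]) for key in shared]
    unmatched = sorted(
        [historical[key] for key in historical.keys() - current.keys()]
        + [current[key] for key in current.keys() - historical.keys()]
    )
    return pairs, unmatched
```

Leaflets in `unmatched` would still need manual review, since a split or merged leaflet may have a counterpart under a different title.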
For PILs where a historical and a recent version both existed, we compared the readability of the two versions. The results were then processed partly externally using Microsoft Office Excel 2021 (v16.97) and partly within the custom program using the “pandas” module (v2.23), and then analyzed statistically using the “scipy” (scientific Python, v1.15) module.
Distribution normality was assessed using the scipy module, which demonstrated that many of the distributions did not meet normality assumptions. Therefore, the non-parametric Wilcoxon signed rank test was used for all comparisons. Effect size was calculated using the “pingouin” module (v0.5.5), and graphs were produced using the “plotly” module (v 6.7.0).
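To show what the Wilcoxon signed rank test computes for paired scores, the following is a pure-Python sketch. It is not scipy's implementation: it uses the normal approximation without continuity or tie-variance corrections, so its P values will differ from scipy's, particularly for small samples. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(before, after):
    """Return (W, two-sided P) for paired samples, using a normal approximation."""
    # Zero differences carry no information about direction and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (statistic - mean) / sd  # never positive, because statistic is the smaller sum
    p_value = min(1.0, 2 * NormalDist().cdf(z))
    return statistic, p_value
```

Because the test ranks the within-pair differences, it uses the matched structure of the data (each historical PIL against its own current version) rather than comparing the two groups as independent samples.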
The code used to produce this work is available in an online repository. 31
Results
A total of 311 PIL-like documents were identified on the BAUS website and the archived version on the Wayback Machine (including both historical and current PILs). Of these, 18 PILs were excluded due to not being produced by BAUS, or being mainly a form for patient completion rather than a PIL. A further 57 PILs were excluded due to missing a matching historical or recent PIL.
This left 118 pairs of PILs. The majority of the PILs discussed surgical operations, risks, and recovery, although 19 pairs of PILs (16%) covered non-surgical topics (tables listing which PILs were compared and which were excluded are available in the online repository 31 ).
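The sample flow reported above can be checked with simple arithmetic, using only the counts given in the text (the variable names are illustrative):

```python
# Counts as reported in the Results.
identified = 311          # PIL-like documents found
excluded_not_pil = 18     # not produced by BAUS, or mainly a form
excluded_unmatched = 57   # no matching historical or recent PIL

remaining_documents = identified - excluded_not_pil - excluded_unmatched
pairs = remaining_documents // 2

non_surgical_pairs = 19
non_surgical_percentage = round(100 * non_surgical_pairs / pairs)

print(remaining_documents, pairs, non_surgical_percentage)
```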
For the readability analyses (see Table 1), the FRE median scores increased from 57.06 in the historical group to 57.98 in the current group, the FKGL decreased from 8.83 to 8.67, and the SMOG score decreased from 11.86 to 11.71 (Figure 1). All of these changes were highly significant (P ≤ .0002).

Figure 1. SMOG versus FKGL. Point size represents the text length of the PIL.
Table 1. Statistical Comparison of Current and Historical PILs.
Degrees of freedom (n−1) = 117 for all analyses.
The current PILs were significantly shorter (mean 1875.5 words vs 1726.0 words; P = .004), as shown in Figure 2. There was also a significant reduction in average sentence length (14.3 vs 13.9; P < .0001) and the number of polysyllabic words per sentence (2.33 vs 2.26; P < .0001), as shown in Figure 3. The proportion of sentences using the “passive voice” (shown in Figure 4) fell from 30.5% to 16.22%, which was statistically significant (P < .0001). The proportion of sentences over 25 words long fell from 10.11% to 6.84%, which was statistically significant (P < .0001). Supplemental Table 1 describes the major layout changes between different years of publication.

Figure 2. Word count by group. Overlapping areas of the groups on the histogram are shown in purple.

Figure 3. Number of sentences per 100 words versus number of polysyllabic words. Point size represents text length. Overlapping areas of the groups on the histogram are shown in purple.

Figure 4. Passive sentence use.
Discussion and Conclusions
Discussion
Brief Summary of Findings
This study has demonstrated that revision of the BAUS urology PILs has resulted in a statistically significant but very small improvement in readability. Other factors that could impact on readability (PIL length, proportion of long sentences, and use of the “passive voice”) were also improved to a larger degree in current PILs.
Strengths and Limitations
This study has significant strengths. The use of a custom Python program allowed for more robust sanitization of the data, including the exclusion of text or punctuation, likely resulting in increased reliability of readability analyses. The use of open-source modules allowed problems to be identified and corrected, which may have otherwise gone undetected. The custom Python program also permitted sentence-level assessment of the use of the “passive voice,” which otherwise would not have been feasible.
The study used a matched pairs design, comparing like with like, with only a relatively small proportion of PILs being excluded from the analysis due to a lack of a matching PIL. Consequently, the analysis included the majority of BAUS PILs, which are used widely in UK-based urology healthcare. Some bias, however, might have been introduced, as the excluded unmatched PILs are more likely to cover novel or discontinued procedures, and the readability of these more complex texts may differ slightly. Furthermore, it is worth noting that BAUS PILs are only written in English. A significant number of patients do not speak English as a primary language, which will alter how readable these PILs are for them, and highlights the importance of ensuring good readability for those using English as a second language.
Automated text extraction from PDF is a recognized problem. 32 By iterating on the program, the vast majority of issues were accounted for. A sample of PILs was checked, and the few remaining errors were minor and infrequent, and unlikely to affect the interpretation of the results.
Some degree of the difference in readability between the groups may be due to the greater use of tables to summarize the risks of procedures in the current PILs group, which would produce shorter sentences than the lists that are used in the historical groups.
The study measured a number of variables that are indicative of readability. However, it did not include other factors that are known to influence people's use and understanding of printed information, such as layout, headings and other navigation tools, font size, and use of diagrams and images.
Of note, these attributes did change significantly between years and would have a significant impact on readability (Supplemental Table 1), and would not be captured by this study. Particular changes of note which are likely to have improved patient understanding, readability, and experience include greater use of images/diagrams specific to the subject area (rather than generic images), better text layout in the form of tables or bullet points, and the use of a “key information” section at the start which provides a summary.
Furthermore, the traditional readability formulae are relatively simplistic and may not accurately capture the readability of the text. Usability testing, in which a sample of patients or non-patients is tested on their understanding of information content, can provide a more direct measure of the quality and inclusivity of PILs, which would capture aspects of readability that this study does not. This would be another useful direction for future work, where a direct comparison between the current and historical PILs could be made by a cohort of urology patients.
What This Study Adds
This study provides an update on the readability of the PILs created by BAUS, and shows that readability has improved, albeit only slightly, over the last decade. This assessment is also more robust than previous studies, as it sanitizes the text before analysis and provides a more accurate assessment of readability, particularly regarding the SMOG score. This is also the first study that could be identified that quantitatively assesses the use of the passive voice in PILs, and it demonstrates that, in this domain, PILs have significantly improved.
The limited change in readability scores (<3% difference in all scores) is relatively consistent with other studies on readability, which have previously demonstrated no improvement over time in readability scores.33,34 One reason for the limited change in readability may be due to a limit to how much medical texts can be simplified. Most of the readability formulas rely on the average sentence length and the number of polysyllabic words as variables. There is often a trade-off between sentence length and the use of polysyllabic words. For example, to try to reduce the polysyllabic word count, the difficult 5-syllable word “epididymis” could be written as “the tube next to the testicle where sperm mature.” However, this change increases the word count by 8 and so increases sentence length. A pragmatic compromise (commonly used in the current PILs group) is to define the medical term in simple language when first used, and then continue to use the medical term later in the text. Such a change, while improving understanding of the text without overtly increasing sentence length, would, however, not lead to a change in the readability scores.
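The “<3%” figure can be reproduced from the median scores reported in the Results. The short sketch below uses those reported values; the helper name `percent_change` is illustrative.

```python
def percent_change(historical: float, current: float) -> float:
    """Absolute change as a percentage of the historical score."""
    return 100 * abs(current - historical) / historical


# Median scores (historical, current) as reported in the Results.
median_scores = {
    "FKGL": (8.83, 8.67),
    "SMOG": (11.86, 11.71),
    "FRE": (57.06, 57.98),
}

changes = {name: percent_change(old, new) for name, (old, new) in median_scores.items()}
for name, change in changes.items():
    print(f"{name}: {change:.2f}% change")
```

All three changes fall between 1% and 2% of the historical score, which is consistent with the stated upper bound.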
One important observed difference compared to the 2015 study 4 was that the SMOG score for the historical PILs was much lower in the current study (11.86, compared to 14 in the 2015 study). This represents a difference between a reading level above secondary school education and a reading age of 16–17 years. This difference may be a result of our improved sanitization of the text compared to the original paper. Sanitizing text has been shown to reduce variability between different calculators and a manual calculation of the readability outcomes, 35 and avoids errors such as incorrect sentence length calculation caused by punctuation that does not mark the end of a sentence. The previous study performed only limited text sanitization.
Overall, this study has demonstrated a statistically significant but very small change between historical and current PILs in reading level according to commonly used readability formulae. This highlights an area where the patient experience is still suboptimal and could be further improved by increasing the readability of PILs. This improvement in readability would also enhance patients' understanding of surgical procedures, which is particularly important for informed consent. Further work is therefore recommended to ensure the style guides BAUS used for writing the new PILs are followed to aid patient understanding.
Conclusions
Overall, this study has demonstrated that there has been a statistically significant, although small, improvement in the readability of BAUS urology PILs. There has, however, been a greater improvement in other measures that affect readability, such as the use of long sentences and the use of the passive voice. Additionally, the study has shown that the estimated reading age of the information leaflets is likely lower than previously thought, due to better data sanitization before automated analysis. Doctors can feel more confident in their use of BAUS PILs than previously suggested, although the PILs may require further revisions to be ideal for patient experience and understanding.
Practice Implications
Writers of urology PILs should attempt to improve readability further in future iterations. Multiple tools have been created to try to improve readability,36,37 and their use should be considered by those producing PILs, along with formal usability feedback from the target audience, to assess factors impacting understanding that are not captured by readability formulae and other metrics. 38 This is equally applicable to PILs in other specialties, where similar problems with readability occur. 2
Supplemental Material
Supplemental material, sj-docx-1-jpx-10.1177_23743735261449660 for The Readability of British Association of Urology Society Patient Information Leaflets: Have We Done Enough for Patients to Understand? by Henry St Aubyn Bilton, BMedSci, MBChB, MSc, Peter Knapp, BA, PhD, and Gui Tran, BSc, MBChB, PhD in Journal of Patient Experience
Footnotes
Declaration of Generative AI and AI-Assisted Technologies in the Writing Process
The authors did not use generative AI or AI-assisted technologies in the development of this manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
Appendix
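Readability formulae used (standard published forms):
FRE = 206.835 − 1.015 × (total words / total sentences) − 84.6 × (total syllables / total words)
FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59
SMOG = 1.0430 × √(number of polysyllables × 30 / total sentences) + 3.1291
(where polysyllables refers to words of 3 or more syllables).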
References
