Development of a Disease-Specific Ureteral Calculus Patient Reported Outcome Measurement Instrument

Abstract

Introduction:

Patient reported outcome measures (PROMs) are powerful instruments to assess the impact of a disease on health from the patient's perspective. We describe the process of designing, testing, and validating the Cambridge Ureteral Stone PROM (CUSP).

Materials and Methods:

Patients recently diagnosed with ureteral stones were approached for participation in focus groups, structured interviews, and test–retest validation studies. Statistical tests included Cronbach's alpha for internal consistency, Spearman's and Pearson's correlation coefficients for test–retest validity, one-sided permutation tests of equality of means to compare patients with controls, and Spearman's correlation coefficients between domains for discriminant validity.

Results:

Forty-three patients participated in the development of the CUSP. Twenty-two patients were involved in the focus groups and structured interviews, and a further 21 participated in the prospective test–retest study. Patients' comments were grouped into seven broad health domains: pain, fatigue, sleep disturbance, work and daily activities, anxiety, gastrointestinal (GI) symptoms, and urinary symptoms. Items were selected from established PROM platforms to form the draft instrument (dCUSP), which was then used for test–retest validation and item reduction. All domains except GI symptoms had high Cronbach's alpha estimates (>0.8). Large Spearman's (>0.76) and Pearson's (>0.83) correlation estimates were obtained for test–retest validity, suggesting that answers were reliable over the time period tested. The estimates of Spearman's correlation coefficient between each pair of domains ranged from 0.17 to 0.78, and the upper bounds of the corresponding 95% confidence intervals were all smaller than 0.95, suggesting that each domain measures something different. The tests of equality of mean scores between the control (n = 25) and patient groups were significant for every domain, suggesting that CUSP successfully discriminated patients with ureteral stones from controls.

Conclusion:

CUSP is a patient-derived ureteral stone PROM, which can be used to measure ureteral stone disease health outcomes from the patient's point of view.

Introduction

Outcomes from surgery have traditionally been based on surgeon-reported outcomes, such as estimated blood loss, infection rates, and length of hospital stay. Recently, growing awareness of the need to assess healthcare in a multidimensional way, shifting the focus from outputs to outcomes, and of the need for better qualitative assessment has led to increasing interest in assessing outcomes from the patient's perspective. Patient reported outcome measures (PROMs) are an effective way to do this.¹ PROMs are completed by patients before and after treatment and can be a powerful measure of health and treatment success. Broad clinical outcomes, such as mortality rates, reveal very little about the overall quality of care.² PROMs are distinct from patient reported experience measures, which focus instead on the experience of care, such as being treated with dignity or clinic waiting times.¹

The private hospital group Bupa, now owned by Spire Healthcare, has routinely collected PROMs since the 1990s. The National Health Service (NHS) PROMs program, initiated in 2009 by the Department of Health (now NHS England), has required the routine measurement of PROMs for all NHS patients in England before and after surgery for four specific procedures (groin hernia repair, hip replacement, knee replacement, and varicose vein surgery),³ and there is considerable interest in extending measurement to all elective surgery.² There are two general types of PROMs: generic instruments (e.g., EQ-5D and SF-36), which measure quality of life indicators,⁴,⁵ and disease-specific instruments (e.g., the International Prostate Symptom Score (I-PSS)⁶ and the urethral stricture PROM⁷), which measure specific symptom indicators to assess the impact of the disease in question.

PROMs have been used in urology,⁷,⁸ for example the Ureteral Stent Symptom Questionnaire,⁹ and the Wisconsin Stone Quality of Life Questionnaire (WISQOL) has recently been validated in kidney stone formers.¹⁰⁻¹² However, a systematic review of the literature did not identify a PROM specific to ureteral calculi, one of the commonest urological emergencies. In this study, we describe the process of designing, testing, and validating a disease-specific ureteral stone PROM, the Cambridge Ureteral Stone PROM (CUSP).

Materials and Methods

Stage 1

Patients recently diagnosed with or treated for ureteral stones were identified at the stone multidisciplinary team meeting and stone clinic of a tertiary referral stone unit (Cambridge University Hospitals NHS Trust) and approached for participation in the development of CUSP. NHS Research Ethics Committee approval was granted (REC 14/N1/1111). Patients with kidney stones, ureteral stents, urinary tract infections, or pre-existing lower urinary tract conditions were excluded. All patients were given an information sheet explaining the basis of the research and provided written informed consent. Recruited patients were invited to attend either a focus group or a structured interview session.¹³,¹⁴ Each focus group included at least three clinicians/research team members (moderator, assistant moderator, and transcriber). To facilitate the session, the moderator used a standard set of open-ended questions devised by the researchers, such as:

(1) What impact has having a stone had on your general health?

(2) What symptoms related to the stone did you find bothersome?

(3) What impact has having a stone had on your general day-to-day activities?

(4) What impact has having a stone had on your work performance?

(5) What impact has having a stone had on your home life?

(6) What other issues do you feel are important and that have not been addressed?

These open-ended questions also formed the basis of the one-to-one structured interviews. All focus group and structured interview sessions were audio-recorded, and each patient's comments were then transcribed and anonymized. Content analysis of the transcribed comments identified recurring patient reported “themes.” These were tabulated to show how frequently each was reported and grouped into appropriate broad health domains, such as “urinary symptoms,” “pain,” and “sleep related.” Additional structured interviews were conducted to ascertain whether the themes adequately encompassed the spectrum of experiences and impact of ureteral stone disease, until no new comments were given (“content saturation”).
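The tabulation step amounts to a frequency count of coded comments rolled up into domains. The sketch below is only an illustration: the theme labels and the theme-to-domain mapping are hypothetical, not the study's coding frame (the observed statements and their frequencies are those shown in Fig. 2).

```python
from collections import Counter

# Hypothetical coding frame: in the study, themes emerged from content analysis
# of the transcripts; these labels are illustrative only.
THEME_TO_DOMAIN = {
    "loin pain": "Pain",
    "pain on movement": "Pain",
    "tiredness": "Fatigue",
    "woken at night": "Sleep disturbance",
    "time off work": "Work and daily activities",
    "worry about recurrence": "Anxiety",
    "nausea": "GI symptoms",
    "urinary frequency": "Urinary symptoms",
}


def tabulate(coded_comments):
    """Count how often each theme was reported, then roll counts up to domains."""
    theme_counts = Counter(coded_comments)
    domain_counts = Counter()
    for theme, count in theme_counts.items():
        domain_counts[THEME_TO_DOMAIN[theme]] += count
    return theme_counts, domain_counts


# Each element is one anonymized, transcribed comment after it has been coded.
comments = ["loin pain", "nausea", "loin pain", "tiredness",
            "pain on movement", "urinary frequency"]
themes, domains = tabulate(comments)
print(domains.most_common(1))  # [('Pain', 3)]
```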

Stage 2

The themes identified in Stage 1 then formed the basis of question (item) selection for the first draft of CUSP. Themes were weighted according to the frequency of the comments: for example, pain was the most frequently reported concern and was allocated eight questions, whereas less frequently reported domains, such as urinary symptoms and gastrointestinal (GI) symptoms, were allocated three questions each. A pragmatic approach was used to select questions from previously reported and well-validated PROMs (PROMIS Health Organization and FACIT.org).
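One way to turn comment frequencies into item counts is proportional allocation with a per-domain minimum. This is a purely illustrative sketch: the comment counts, the three-item minimum, and the largest-remainder rounding are all assumptions, not the procedure used to build CUSP.

```python
import math


def allocate_items(domain_counts, total_items, min_items=3):
    """Split total_items across domains in proportion to comment frequency.

    Every domain first receives min_items; the remaining items are shared
    in proportion to the counts, using largest-remainder rounding so the
    allocation always sums to total_items.
    """
    spare = total_items - min_items * len(domain_counts)
    if spare < 0:
        raise ValueError("total_items is too small for the per-domain minimum")
    n_comments = sum(domain_counts.values())
    quotas = {d: spare * c / n_comments for d, c in domain_counts.items()}
    alloc = {d: min_items + math.floor(q) for d, q in quotas.items()}
    leftover = total_items - sum(alloc.values())
    # Give the remaining items to the domains with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda d: quotas[d] - math.floor(quotas[d]),
                          reverse=True)
    for d in by_remainder[:leftover]:
        alloc[d] += 1
    return alloc


# Hypothetical comment counts per domain (illustrative, not the Fig. 2 data).
counts = {"Pain": 60, "Fatigue": 30, "Sleep disturbance": 20,
          "Work and daily activities": 12, "Anxiety": 10,
          "GI symptoms": 8, "Urinary symptoms": 8}
allocation = allocate_items(counts, total_items=29)
print(allocation)
```

With these made-up counts the allocation differs from the published one (the study used eight Pain items, for example), so the sketch should be read only as an illustration of frequency weighting.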

A representative subgroup of patients (n = 10) who had participated in the focus groups and structured interviews was then invited to review the draft CUSP, to test comprehension, and to comment on whether it accurately and faithfully represented the issues they had raised. Duplicated or redundant items were removed.

Stage 3

The revised CUSP instrument was then evaluated for reproducibility with test–retest validation and for discriminant validity. Cronbach's alpha was used to assess internal consistency, that is, the strength of the relationship between the questions within each domain of the PROM. Pearson's and Spearman's correlation coefficients were used to evaluate the level of association between participants' scores measured at the two time points. We also assessed whether the domains (pain, fatigue, etc.) were distinct from each other, to ensure that each domain measured something different.¹⁵ To do so, we estimated Spearman's correlation coefficient between the patients' scores for each pair of domains and derived the corresponding 95% percentile confidence intervals by means of a nonparametric bootstrap with R = 10,000 replicates.¹⁶
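The bootstrap percentile interval can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' analysis code: patients are resampled as (domain A, domain B) score pairs, Spearman's rho is computed as Pearson's correlation of tie-averaged ranks, percentiles are read from the sorted replicates without interpolation, and the scores are invented. The final check mirrors the criterion used in the Results (an upper bound below 0.95).

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable has no spread
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_percentile_ci(x, y, reps=10_000, level=0.95, seed=2024):
    """Nonparametric bootstrap: resample patients (score pairs) with replacement."""
    rng = random.Random(seed)
    n = len(x)
    replicates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(rho):  # drop degenerate resamples with no spread
            replicates.append(rho)
    replicates.sort()
    tail = (1 - level) / 2
    lo = replicates[int(tail * (len(replicates) - 1))]
    hi = replicates[int((1 - tail) * (len(replicates) - 1))]
    return lo, hi


# Hypothetical per-patient average scores for two domains (not study data).
fatigue = [1.2, 2.0, 3.4, 2.8, 4.0, 1.6, 3.0, 2.2, 4.6, 1.0]
sleep = [1.5, 2.5, 3.0, 3.25, 4.5, 1.0, 2.75, 2.0, 4.0, 1.25]
rho = spearman(fatigue, sleep)
lo, hi = bootstrap_percentile_ci(fatigue, sleep)
print(f"rho = {rho:.2f}, 95% CI {lo:.2f} to {hi:.2f}, distinct: {hi < 0.95}")
```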

Finally, we tested the ability of CUSP to discriminate patients from controls by comparing, for each domain, the mean CUSP scores of the 21 patients with ureteral stones with those of 25 controls (without ureteral stone disease). Because of the relatively small sample size per group, we used one-sided nonparametric permutation tests with R = 10,000 permutations to test the equality of means between groups for each domain.¹⁶
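A one-sided permutation test of equality of means can be sketched as follows. The data are hypothetical, the seed is arbitrary, and the add-one p-value convention is an assumption: it is common for Monte Carlo tests, but the source does not state which convention was used.

```python
import random
from statistics import fmean


def one_sided_permutation_test(patients, controls, reps=10_000, seed=2024):
    """Test H0: equal means against H1: mean(patients) > mean(controls).

    Group labels are shuffled at random; the p-value is the share of
    relabelled data sets whose difference in means is at least as large
    as the observed one.
    """
    rng = random.Random(seed)
    observed = fmean(patients) - fmean(controls)
    pooled = list(patients) + list(controls)
    n_patients = len(patients)
    as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_patients]) - fmean(pooled[n_patients:])
        if diff >= observed:
            as_extreme += 1
    # (count + 1) / (reps + 1) keeps the Monte Carlo p-value above zero.
    return observed, (as_extreme + 1) / (reps + 1)


# Hypothetical domain scores for six patients and six controls (not study data).
patient_scores = [4, 5, 5, 4, 5, 4]
control_scores = [1, 1, 2, 1, 2, 1]
diff, p = one_sided_permutation_test(patient_scores, control_scores, reps=2000)
print(f"difference in means = {diff:.3f}, one-sided p = {p:.4f}")
```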

Figure 1 is a flow diagram presenting the developmental process of CUSP.

FIG. 1.

Flow diagram demonstrating the developmental process of CUSP. CUSP = Cambridge Ureteral Stone Patient reported outcome measure.

Results

Stage 1

Twenty-two patients who had recently presented acutely with ureteral stones participated in three focus groups (n = 11) or structured interviews (n = 11). There were 4 female and 18 male patients, aged 34 to 78 years (median 54.5 years). Fifteen patients were married or had partners, 4 were divorced, and 3 were single. Twenty patients were Caucasian, 1 was of mixed race, and 1 was of Asian ethnicity. Six patients were educated to Bachelor's degree or higher, 9 to Advanced (“A”) level, and 7 to General Certificate of Secondary Education standard. Ureteral stone size ranged from 3 to 11 mm (median 5 mm). Eleven of these patients underwent shockwave lithotripsy (SWL), 6 received medical expulsive therapy, and 5 underwent ureteroscopy (URS) and laser fragmentation for their ureteral stones. Patients attended the focus groups or structured interviews between 2 and 4 weeks after diagnosis, either while awaiting or after treatment. In all, 218 patient reported impact comments were transcribed and tabulated to identify recurring reports and comments (Fig. 2). These were then grouped into seven broad health domains: Pain, Fatigue, Work and Daily Activities, Sleep disturbance, Anxiety, GI symptoms, and Urinary Symptoms.

FIG. 2.

Impact statements reported by patients in the focus groups and structured interviews, and the frequency of occurrence of the statements.

Stage 2

Items were selected for the seven broad health domains identified in Stage 1. Pain was the most frequently reported concern and was represented by eight items; Fatigue was the next most common and was allocated five items, followed by four items for Sleep disturbance and three items for each of Work and Daily activities, Anxiety, GI symptoms, and Urinary symptoms. A pragmatic approach was used to select and adopt previously well-validated questions from established PROM instruments (PROMIS Health Organization and FACIT.org). Each of the resulting 29 items consisted of a statement followed by a 5-level Likert scale response, so total scores on this draft version of CUSP (dCUSP) could range from 29 to 145. Patients took between 2 and 3 minutes to complete dCUSP.
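The item allocation and score range can be checked with a short scoring sketch. It assumes that each Likert response is coded 1 to 5 (consistent with the stated 29 to 145 range) and that a domain score is the mean of its items; the latter is an assumption suggested by the per-domain average scores reported for Stage 3, not a published scoring rule. The function name `score_dcusp` is hypothetical.

```python
# Items per domain in the draft instrument (dCUSP), as listed in Table 1.
ITEMS_PER_DOMAIN = {
    "Pain": 8,
    "Fatigue": 5,
    "Sleep disturbance": 4,
    "Work and daily activities": 3,
    "Anxiety": 3,
    "GI symptoms": 3,
    "Urinary symptoms": 3,
}
LIKERT_MIN, LIKERT_MAX = 1, 5  # assumed numeric coding of the 5-level scale


def score_dcusp(responses):
    """Return (total score, per-domain mean) for one completed dCUSP.

    `responses` maps each domain name to that domain's item responses.
    """
    for domain, n_items in ITEMS_PER_DOMAIN.items():
        answers = responses[domain]
        if len(answers) != n_items:
            raise ValueError(f"{domain}: expected {n_items} items, got {len(answers)}")
        if any(not LIKERT_MIN <= a <= LIKERT_MAX for a in answers):
            raise ValueError(f"{domain}: responses must lie in {LIKERT_MIN}..{LIKERT_MAX}")
    total = sum(sum(responses[d]) for d in ITEMS_PER_DOMAIN)
    domain_means = {d: sum(responses[d]) / n for d, n in ITEMS_PER_DOMAIN.items()}
    return total, domain_means


n_items = sum(ITEMS_PER_DOMAIN.values())
print(n_items, n_items * LIKERT_MIN, n_items * LIKERT_MAX)  # 29 29 145
```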

Stage 3

The dCUSP instrument was then used in a prospective pilot test on 21 patients with radiologically confirmed ureteral calculi. There were 6 female and 15 male patients, aged 20 to 67 years (median 50 years). Ureteral stone size ranged from 3 to 11 mm (median 5 mm). Seven patients were undergoing SWL treatment, 3 were awaiting or had undergone URS, and 11 were being treated conservatively. The patients completed dCUSP on two occasions, at least an hour apart. Cronbach's alpha estimates and dimension reduction using principal component analysis suggested that, for all domains except GI symptoms, the questions within each domain measured the same underlying construct (Table 1). The low Cronbach's alpha estimate for GI symptoms was mainly due to the use of a “reverse scale” in one of the three questions in that domain. Removing the question or reversing its scale yielded a Cronbach's alpha of only 0.52, likely because some patients read the reversed scale correctly while others did not. In view of the ambiguity of the construct and the relatively low importance of the domain compared with the others, GI symptoms was removed from the final CUSP instrument. Of note, Cronbach's alpha estimates for all domains apart from GI symptoms were larger than 0.7, which is often used as the lowest acceptable limit.
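The effect of an unreversed item on internal consistency can be illustrated with a short Cronbach's alpha sketch. It uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score) and reverse-scores an item on a 1 to 5 scale as 6 − x; the toy responses are hypothetical, not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    `items` is a list of k lists; each inner list holds one item's responses,
    with respondents in the same order across items.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(person) for person in zip(*items)]  # each respondent's sum score
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def reverse_score(item, low=1, high=5):
    """Flip a reverse-worded item so that all items point in the same direction."""
    return [low + high - x for x in item]


# Hypothetical responses from five patients to a three-item domain; the third
# item is worded in the opposite direction to the other two.
q1 = [1, 2, 3, 4, 5]
q2 = [1, 2, 3, 4, 5]
q3_reversed_wording = [5, 4, 3, 2, 1]
print(cronbach_alpha([q1, q2, q3_reversed_wording]))  # -3.0
print(cronbach_alpha([q1, q2, reverse_score(q3_reversed_wording)]))  # items now agree, alpha near 1
```

Reverse-scoring only helps if every respondent reads the reversed item the same way, which is the difficulty described above for the GI symptoms domain.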

Table 1.

Statistical Analyses of Draft Cambridge Ureteral Stone Patient Reported Outcome Measure

Domain	Number of questions	Cronbach's alpha, estimate	Cronbach's alpha, S.E.	Test–retest Pearson correlation	Test–retest Spearman correlation
Pain	8	0.978	0.007	0.944	0.952
Fatigue	5	0.945	0.021	0.952	0.967
Work and daily activities	3	0.957	0.016	0.919	0.898
Sleep disturbance	4	0.896	0.038	0.845	0.834
Anxiety	3	0.922	0.030	0.965	0.946
Gastrointestinal symptoms	3	0.516	0.178	0.832	0.764
Urinary symptoms	3	0.805	0.076	0.972	0.936

For test–retest validity, Spearman's and Pearson's correlation estimates were used. Large values were obtained when analyzing patients' average scores at time points 1 and 2 for each domain (Table 1 and Fig. 3), suggesting that the answers were reliable over the time period tested.
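The test–retest calculation for one domain can be sketched as follows. The scores are hypothetical, the function names are illustrative, and Spearman's rho is computed as Pearson's r on tie-averaged ranks (a standard definition); no handling is included for a domain whose scores have zero variance.

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tie_averaged_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def retest_correlations(time1, time2):
    """Pearson's r and Spearman's rho between two administrations.

    time1[i] and time2[i] are the same patient's domain score on each occasion.
    """
    if len(time1) != len(time2):
        raise ValueError("each patient needs a score at both time points")
    r = pearson(time1, time2)
    rho = pearson(tie_averaged_ranks(time1), tie_averaged_ranks(time2))
    return r, rho


# Hypothetical Pain-domain averages for six patients at time 1 and time 2.
t1 = [3.5, 2.0, 4.25, 1.5, 3.0, 2.75]
t2 = [3.25, 2.25, 4.5, 1.5, 3.0, 2.5]
r, rho = retest_correlations(t1, t2)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```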

FIG. 3.

Average scores of participants at times 1 and 2 for each domain in dCUSP. dCUSP = draft CUSP.

We assessed the ability of CUSP to discriminate between groups by comparing the average scores of the control and patient groups for each domain of CUSP. Figure 4 shows box plots of participants' scores by group and domain, together with the mean score per group (pink dots). The mean scores of patients were systematically larger than those of controls, and the medians (thick black lines) led to the same conclusion. For the domain “Anxiety,” the median score of patients was only slightly higher than that of the control group, but because the distribution of patients' scores was more skewed, the group means differed more clearly. Also, for all domains, the spread of scores in the patient group appeared larger than in the control group. Table 2 reports the mean difference between patient and control group scores per domain (mean of the patient group minus mean of the control group), the 95% confidence intervals (CI) for the difference in means, and the p-values of the one-sided permutation tests of equality of means.

FIG. 4.

Box plots of participants' scores per condition (control versus patients) for each domain of CUSP (pink dot, mean; thick black line, median; ○, outliers).

Table 2.

Statistical Analyses of the Difference Between the Patient and Control Group Scores for Each Domain of Cambridge Ureteral Stone Patient Reported Outcome Measure

Domain	Difference in means, estimate (patients minus controls)	95% CI, low	95% CI, high	One-sided permutation test, p-value
Pain	1.8415	1.1974	2.4965	<0.0001
Fatigue	1.0978	0.4929	1.7019	0.0006
Work and daily activities	1.0804	0.4941	1.7022	0.0003
Sleep disturbance	0.8838	0.2657	1.4862	0.0036
Anxiety	0.6279	0.1911	1.1048	0.0035
Urinary symptoms	0.9308	0.4629	1.4571	<0.0001

CI = confidence interval.

All tests were statistically significant, and no CI contained 0. We therefore conclude that CUSP can successfully discriminate patients with ureteral stones from controls in every domain.

Finally, we assessed whether each construct measured something different from the others by estimating Spearman's correlation coefficient between each pair of domains (Table 3) and the corresponding 95% bootstrap percentile CI (Table 4). Given that the estimated coefficients ranged between 0.17 and 0.78 and that the upper bounds of the corresponding 95% CI were all below 0.95, we concluded that each domain measured something different.

Table 3.

Spearman's Correlation Coefficients Between Patients' Average Scores for Each Pair of Domains

	Pain	Fatigue	Work and activities	Sleep disturbance	Anxiety	Urinary symptoms
Pain	1.00	0.45	0.36	0.52	0.56	0.17
Fatigue	0.45	1.00	0.71	0.78	0.63	0.48
Work and activities	0.36	0.71	1.00	0.59	0.44	0.55
Sleep disturbance	0.52	0.78	0.59	1.00	0.56	0.32
Anxiety	0.56	0.63	0.44	0.56	1.00	0.28
Urinary symptoms	0.17	0.48	0.55	0.32	0.28	1.00

Table 4.

95% Bootstrap Percentile Confidence Intervals for Spearman's Correlation Coefficient Between Each Pair of Domains

Domain	Paired with	95% CI, low	95% CI, high
Fatigue	Pain	0.057	0.740
Work and activities	Pain	−0.108	0.788
Sleep disturbance	Pain	0.097	0.813
Anxiety	Pain	0.141	0.810
Urinary symptoms	Pain	−0.289	0.613
Work and activities	Fatigue	0.343	0.915
Sleep disturbance	Fatigue	0.503	0.915
Anxiety	Fatigue	0.228	0.932
Urinary symptoms	Fatigue	0.001	0.868
Sleep disturbance	Work and activities	0.182	0.843
Anxiety	Work and activities	−0.022	0.822
Urinary symptoms	Work and activities	0.079	0.873
Anxiety	Sleep disturbance	0.165	0.821
Urinary symptoms	Sleep disturbance	−0.157	0.699
Urinary symptoms	Anxiety	−0.221	0.730


Discussion

Donald Berwick eloquently wrote in 1997 that “The ultimate measure by which to judge the quality of a medical effort is whether it helps patients (and their families) as they see it. Anything done in health care that does not help a patient or family is, by definition, waste, whether or not the professions and their associations traditionally hallow it.”¹⁷ PROMs are valuable tools from which clinicians and healthcare providers can gain important insights into the effects of medical and surgical interventions specifically from the patient's point of view. It is now widely recognized that classical clinical outcomes alone provide little information about what is relevant to patients. For example, a surgeon may declare an operation a success, but this means very little if the patient reports little benefit following surgery.²,¹⁸ As a case in point, data from the NHS PROMs program showed that, using a generic PROM such as the EQ-5D, almost half (49.8%) of patients reported improvement in their overall health following groin hernia repair, whereas 88.4% reported improvement following hip replacement. Using a disease-specific PROM (the Oxford Hip Score) before and after treatment showed an even more impressive national improvement rate of 96.5% in hip replacement patients¹⁹ (http://digital.nhs.uk/catalogue/PUB30036). Therefore, while a generic PROM is useful for measuring universal health issues, disease-specific PROMs, such as the Oxford Hip Score, enable measurement of the effect of treatment on specific aspects of the disease in question.

Although the urology community has recognized the value of PROMs and embraced them as a means to benchmark treatment outcomes, there remains a paucity of urological disease-specific PROMs. The most widely used example is the I-PSS, which is used in standard practice to inform clinicians of the impact of prostatic symptoms on patients before, during, and after treatment (medical or surgical).⁶ A PROM for urethral stricture surgery has also been successfully validated and is used internationally in several centers.²⁰⁻²² More recently, exciting progress in PROMs has been made in the area of kidney stones. The WISQOL is a PROM for patients with a history of kidney stones. Its external validation in a multicenter study in North America showed that, although its results correlated with the generic SF-36v2 (Pearson correlation 0.56, p < 0.001), the WISQOL was superior in discriminating between patients with and without current kidney stones.¹⁰,¹¹ However, WISQOL has yet to be assessed for test–retest validity and clinical use.¹⁰ Joshi and colleagues have also reported on the development of a ureteral stent symptom questionnaire to evaluate the impact of different types of ureteral stents.⁹

Ureteral calculi are among the commonest urological emergencies. Various treatment modalities, such as SWL, ureteroscopic fragmentation, and medical expulsive therapy, are available. Although there are broad clinical guidelines, treatment allocation is variable and depends on clinical judgment, access, and patient factors. Current evidence for each of these modalities is largely based on clinical outcome measures. Our objective was to produce a disease-specific ureteral stone PROM that could aid in treatment outcome assessment. In this article, we have described the process from initially identifying the need for a new instrument through item generation, item consolidation and reduction, and test–retest validation. We have highlighted the importance of embedding patient involvement throughout the design and refinement stages, enabling the development of a truly patient-derived PROM.

There are three main limitations to our study. First, our patient cohort was relatively small, and no distinction was made between first-time and recurrent stone formers, so the cohort may not be representative of the general population with ureteral stone disease. CUSP (Fig. 5) is currently being used in a national multicenter study, the results of which will inform on variation by demographics, previous history of stone disease, and geography. Second, we selected items from previously validated PROMs available on public platforms rather than designing our own questions. This was a pragmatic decision, based on the need to work within the constraints of the available resources; creating a questionnaire from scratch would have required psychometric testing and validation beyond the scope of our study. Third, to complete the validation of CUSP, it is necessary to demonstrate that the instrument is sensitive to change as part of clinical validation studies. This is currently in progress as part of the U.K. national Therapeutic Interventions for Stones of the Ureter (TISU) study, and we eagerly await the results (expected circa 2019).

FIG. 5.

The Cambridge Ureteral Stone PROM. PROM = patient reported outcome measure.

Patient involvement at every stage of the development of CUSP (Fig. 1) is a major strength of the tool and gives us confidence in its fidelity and utility. CUSP could provide clinicians with important insights into the impact of ureteral stone disease on patients and will enable, for the first time, quantitative comparison of qualitative outcomes across management options for ureteral stones. Ongoing validation will confirm whether CUSP accomplishes its original aim.

Conclusion

We describe the detailed process of designing a PROM that is truly patient derived and encourage other clinicians and researchers to look beyond standard clinical indices of success and focus on ways to measure outcome from the patient's point of view.

Footnotes

Acknowledgments

The authors would like to thank Ms. Angela Cottrell, Mr. Andrew Dickinson, and Prof. Sam McClinton and all the patients who participated in this study.

Author Disclosure Statement

O.J.W. is in receipt of grants/research support from Porges Coloplast; is in receipt of honoraria or consultation fees from Boston Scientific, EMS, and Porges Coloplast; participates in company sponsored speakers' bureau from Boston Scientific, EMS, Porges Coloplast, and Olympus; and is a stock shareholder of Uroscreen Ltd.


References

1. Black. Patient reported outcome measures could help transform healthcare. BMJ, 2013; 346:f167.

2. Timmins. NHS goes to the PROMS. BMJ, 2008; 336:1464–1465.

3. Devlin, Parkin, Browne. Patient-reported outcome measures in the NHS: New methods for analysing and reporting EQ-5D data. Health Econ, 2010; 19:886–905.

4. Hurst, Kind, Ruta, Hunter, Stubbings. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness and reliability of EuroQol (EQ-5D). Br J Rheumatol, 1997; 36:551–559.

5. Brazier, Harper, Jones, et al. Validating the SF-36 health survey questionnaire: New outcome measure for primary care. BMJ, 1992; 305:160–164.

6. Bosch, Hop, Kirkels, Schroder. The International Prostate Symptom Score in a community-based sample of men between 55 and 74 years of age: Prevalence and correlation of symptoms with age, prostate volume, flow rate and residual urine volume. Br J Urol, 1995; 75:622–630.

7. Jackson, Sciberras, Mangera, et al. Defining a patient-reported outcome measure for urethral stricture surgery. Eur Urol, 2011; 60:60–68.

8. Tran, Yip, Uveili, Biers, Thiruchelvam. Patient reported outcome measures in male incontinence surgery. Ann R Coll Surg Engl, 2014; 96:521–525.

9. Joshi, Newns, Stainthorpe, MacDonagh, Keeley Jr., Timoney. Ureteral stent symptom questionnaire: Development and validation of a multidimensional quality of life measure. J Urol, 2003; 169:1060–1064.

10. Penniston, Nakada. Development of an instrument to assess the health related quality of life of kidney stone formers. J Urol, 2013; 189:921–930.

11. Penniston, Antonelli, Viprakasit, et al. Validation and reliability of the Wisconsin stone quality of life questionnaire. J Urol, 2017; 197:1280–1288.

12. Penniston, Nakada. Use of the WISQOL questionnaire. J Endourol, 2017; 31:420.

13. Rothrock, Kaiser, Cella. Developing a valid patient-reported outcome measure. Clin Pharmacol Ther, 2011; 90:737–742.

14. Wiering, de Boer, Delnoij. Patient involvement in the development of patient-reported outcome measures: A scoping review. Health Expect, 2017; 20:11–23.

15. Hair, Black, Babin, Anderson. Multivariate Data Analysis: A Global Perspective, 7th ed. Upper Saddle River, NJ: Prentice Hall, 2009, p. 680.

16. Davison, Hinkley. Bootstrap Methods and Their Application. Cambridge, UK: Cambridge University Press, 1997, sections 4.3–5.3.

17. Berwick. Medical associations: Guilds or leaders? BMJ, 1997; 314:1564–1565.

18. Meredith, Emberton, Devlin. What value is the patient's experience of surgery to surgeons? The merits and demerits of patient satisfaction surveys. Ann R Coll Surg Engl, 1993; 75(3 Suppl):72–73.

19. Lim, Harris, Dawson, Beard, Fitzpatrick, Price. Floor and ceiling effects in the OHS: An analysis of the NHS PROMs data set. BMJ Open, 2015; 5:e007765.

20. Puche-Sanz, Martin-Way, Flores-Martin, et al. Psychometric validation of the Spanish version of the USS-PROM questionnaire for patients who undergo anterior urethral surgery. Actas Urol Esp, 2016; 40:322–327.

21. Jackson, Chaudhury, Mangera, et al. A prospective patient-centred evaluation of urethroplasty for anterior urethral stricture using a validated patient-reported outcome measure. Eur Urol, 2013; 64:777–782.

22. Lucas, Koff, Rosito, Berger, Bortolini, Neto. Assessment of satisfaction and quality of life using self-reported questionnaires after urethroplasty: A prospective analysis. Braz J Urol, 2017; 43:304–310.