Abstract
Introduction
The Patient-rated Tennis Elbow Evaluation (PRTEE) enables quantitative rating by the patient of pain and functional impairment associated with tennis elbow or lateral elbow tendinopathy. When used as an outcome measure in trials of therapies, a minimum clinically important difference (MCID) value is required to interpret trial outcomes. This study aimed to calculate the MCID for a sample of patients diagnosed with lateral elbow tendinopathy (LET).
Methods
The PRTEE was used as an outcome measure with participants in a trial of a novel therapy for LET. It was administered at baseline and after treatment, three weeks later. Score changes were compared with patient-rated global change scores using receiver operating characteristic (ROC) curve analysis. MCID values were calculated for two different criteria of clinically important difference and the effects of baseline symptom severity on the MCID were investigated.
Results
Data were available from 57 participants, with PRTEE scores in the range 13–81/100. For clinical significance defined as ‘a little better’ the MCID for the total PRTEE score was 7/100 or 22% of baseline score. For clinical significance defined as ‘much better’ or ‘completely recovered’, the MCID was 11/100 or 37% of baseline score. The MCID value was higher for a subgroup with greater baseline severity.
Conclusions
Substantial changes in the PRTEE scores are required before they can be considered clinically significant. Clinically significant change varies according to the baseline score. The instrument may be less sensitive to change when used by people who are symptomatic in their non-dominant arm.
Introduction
Tennis elbow, also known as lateral elbow tendinopathy or lateral epicondylalgia, 1 is a common musculoskeletal disorder that can cause significant pain and disability. Various assessment instruments are used by clinicians and researchers for monitoring the progress of the disorder and the effectiveness of its treatment. A combination of measures is commonly employed, addressing physical variables such as pain and strength, functional limitations and psychosocial factors. The Patient-rated Tennis Elbow Evaluation (PRTEE) is an instrument that has been developed specifically for use with this disorder, 2 and is increasingly being employed in research. 3–8 It takes the form of a 15-item questionnaire, with five items addressing pain and 10 concerned with functional deficit. For each item, the respondent uses a 0–10 numerical scale to rate the average pain or difficulty they have experienced over the previous week while carrying out various activities that are commonly painful in tennis elbow. The marking system ensures pain and function are weighted equally in the total score. Higher scores represent greater severity and the maximum score is 100. 9 Its particular strengths are its simplicity and shortness, and its specificity to tennis elbow. Prior to its development, the only available patient-rated instruments were more generic, such as the Disabilities of the Arm, Shoulder and Hand (DASH) Questionnaire 10 and variants of the SF-36 Quality of Life Questionnaire. 11
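The scoring described above can be sketched as follows. This is a minimal illustration, assuming the conventional marking scheme in which the five pain items are summed (0–50) and the ten function items are summed and halved (0–50), so that each subscale contributes equally to a total out of 100. The function name `prtee_scores` and the input layout are hypothetical; the user manual remains the authority on scoring.

```python
def prtee_scores(pain_items, function_items):
    """Return (pain, function, total) PRTEE scores.

    Assumed marking scheme (illustrative):
      pain     = sum of the 5 pain items (0-50)
      function = sum of the 10 function items, halved (0-50)
      total    = pain + function (0-100)
    """
    if len(pain_items) != 5 or len(function_items) != 10:
        raise ValueError("expected 5 pain items and 10 function items")
    for rating in list(pain_items) + list(function_items):
        if not 0 <= rating <= 10:
            raise ValueError("each item is rated on a 0-10 scale")
    pain = sum(pain_items)
    # Halving the function sum gives pain and function equal weight.
    function = sum(function_items) / 2
    return pain, function, pain + function
```

Under this assumed scheme, a respondent who rates every item at 10 obtains the maximum total of 100, with 50 points from each subscale.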
The measurement properties of the PRTEE have been evaluated in several studies. It has been validated by findings of moderate to excellent correlation with more established scales such as the DASH and SF-36, 12–16 although a recent study found only poor to moderate correlation between individual items in the PRTEE and other comparable measures, including visual analogue pain ratings and items on the EuroQol Scale. 17 Several studies using English and other language versions of the PRTEE with people with tennis elbow have concluded that its test–retest reliability is excellent, with correlation coefficients for the total score and the pain and function subscores being greater than 0.9 in most cases. 12–16,18 Unfortunately the credibility of several of these studies is undermined by the use of short test–retest intervals – between 30 minutes and two days. Such intervals run the risk of introducing bias, since respondents are likely to remember their previous scores. Calculated reliability indices may thereby be inflated. A more recent study used a four-week test–retest period and calculated intraclass correlation coefficient values of 0.81 for the pain subscore and 0.76 for the total score, 17 still reasonable although not excellent. Such values are probably more reflective of reliability in clinical trials using the PRTEE, where intertest intervals of several weeks are common.
The responsiveness of the PRTEE – its capacity to detect change – has received little attention in the literature, although it is a key property in interpreting score changes. 19
Two dimensions of responsiveness are of interest. The minimum detectable change represents the smallest change in score that likely reflects true change rather than measurement error alone. 20 Knowledge of this quantity is helpful in establishing sample sizes required for clinical trials. It can be calculated from reliability data, 21 but has not been reported in published PRTEE reliability studies. The minimum clinically important difference (MCID) is the minimum change that would be interpreted as meaningful and worthwhile. 22 Knowing the MCID value for a particular instrument helps both clinicians and researchers to judge whether treatment is having a useful effect on the variables of interest. In the evaluation of patient-rated measures like the PRTEE, it is the patient's opinion that should be given most weight in defining ‘meaningful and worthwhile’. 23 There are various approaches to the determination of MCID values but an anchor-based method, involving comparison of scores on the measure of interest with those provided by a different but related measure, has been recommended. 23,24 Global change ratings are commonly adopted as the anchor referent because they can readily be used to define clinical significance, but this approach has not previously been employed to determine the responsiveness of the PRTEE. We therefore conducted a study with the aim of determining the MCID of the PRTEE when used as an outcome measure with a sample of individuals with tennis elbow. Since MCID values may vary according to baseline status on the variables of interest, we also investigated whether this was the case when using the PRTEE.
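To illustrate how a minimum detectable change might be derived from reliability data, the sketch below uses one commonly used formulation: the standard error of measurement is taken as SD × √(1 − ICC), and the 95% minimum detectable change as 1.96 × √2 × SEM. The choice of formula and the function names are assumptions made for illustration; they are not taken from the PRTEE reliability studies, and other formulations exist.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), assuming ICC is a test-retest coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimum_detectable_change(sd, icc, z=1.96):
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives the 95% confidence level.

    The sqrt(2) term reflects measurement error on both occasions
    (baseline and follow-up) contributing to a change score.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)
```

With a hypothetical between-subject SD of 10 points and an ICC of 0.75, the SEM would be 5 points; lower reliability or greater score variability both increase the change needed before it can be distinguished from measurement error.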
Methods
This study formed part of a prospective trial of microcurrent, a subsensory electric current applied to promote analgesia and tissue healing. The trial involved the application of four different forms of microcurrent to compare their effectiveness (manuscript in preparation), and was approved by the authors' Institutional Ethics Committee. Participants were recruited via publicity in local offices, nearby sports clubs and local media, and all provided informed consent before enrolment. Eligible participants were over 18 years of age, able to complete the questionnaire, experiencing lateral elbow pain for at least three months with no significant change in the previous month and had a clinical diagnosis of tennis elbow made by a physiotherapist. The diagnosis was based on a history of lateral elbow pain exacerbated by gripping activities and tenderness at the lateral epicondyle coupled with lateral elbow pain on at least one of the following: resisted middle finger extension, resisted wrist extension or the chair-lift test. 4,25 This was supplemented by sonographic assessment of the common extensor tendon, conducted by the physiotherapist who had been trained for the purpose. 26 Tendinopathy was confirmed by the presence of thickening, hypoechoic areas, fibrillar disruption or calcification in the tendon. 27,28 Concomitant upper limb disorders did not exclude participation so long as the clinical and sonographic signs of tennis elbow were present.
Participants were instructed in the use of the PRTEE following the guidance provided in the user manual. 9 Minor changes were made to the wording of some questions to enhance comprehensibility in the British context. In Section 2, the words ‘coffee’ and ‘milk’ were removed, ‘pants’ was replaced by ‘trousers’, and ‘washcloth or wet towel’ was replaced by ‘wet cloth’. For the purposes of this study, participants were asked to complete the questionnaire twice, three weeks apart, during which time they received treatment for their tennis elbow. The treatment consisted of daily application of microcurrent to the lateral elbow via adherent electrodes. At second assessment participants were also asked to rate any change in their condition since first assessment, using a six point global change score (GCS), with the terms ‘much worse’, ‘a little worse’, ‘unchanged’, ‘a little better’, ‘much better’, ‘completely recovered’ as its descriptors. For analytical purposes, these were converted to scores of −2, −1, 0, 1, 2 and 3, respectively. Demographic and medical history data were also collected at baseline.
Data analysis
Absolute and percentage changes in PRTEE total and subscale scores between the assessments were calculated. These were compared with corresponding GCS values graphically and using the Spearman rank correlation coefficient, since the GCS is an ordinal scale. This was done to establish whether scores on the two scales were related, which is necessary if the GCS is to be used as the external comparator for the PRTEE scores. 23 The MCID for the instrument was then determined using the GCS as an anchor and determining the group average change in PRTEE score corresponding to two different definitions of clinical significance: GCS = 1 (‘a little better’) and GCS = 2 (‘much better’). The specificity and sensitivity of the change score for these classifications were examined using a receiver operating characteristic (ROC) curve analysis, and the MCID was estimated for the best match of sensitivity and specificity. 29 Area under the curve (AUC) analysis was conducted to gauge how well the PRTEE discriminates between those whose score change is clinically significant and those for whom it is not. To investigate the potential effect of baseline severity on the MCID, separate analyses were conducted on subgroups defined by a baseline total PRTEE score of <40 and ≥40. This cut-off was selected post hoc to ensure reasonably sized subgroups for this sample. All statistical tests were conducted using SPSS 17 (SPSS Inc, Chicago, IL, USA) with significance set at P ≤ 0.05.
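The anchor-based ROC procedure can be sketched in a few lines. The analysis in this study was run in SPSS; the code below is only an illustrative re-expression, and it assumes that score changes are supplied as reductions (baseline minus follow-up, so larger values mean greater improvement) and that the anchor has already been dichotomised (for example, GCS ≥ 2 versus lower). All function names are hypothetical.

```python
def _split(improvement, improved):
    """Separate reductions by anchor group (improved vs not improved)."""
    positives = [x for x, y in zip(improvement, improved) if y]
    negatives = [x for x, y in zip(improvement, improved) if not y]
    if not positives or not negatives:
        raise ValueError("both anchor groups must be non-empty")
    return positives, negatives


def roc_points(improvement, improved):
    """List (cut-off, sensitivity, specificity) for each observed reduction.

    A participant is classified as improved when reduction >= cut-off.
    """
    positives, negatives = _split(improvement, improved)
    points = []
    for cutoff in sorted(set(improvement)):
        sensitivity = sum(x >= cutoff for x in positives) / len(positives)
        specificity = sum(x < cutoff for x in negatives) / len(negatives)
        points.append((cutoff, sensitivity, specificity))
    return points


def balanced_cutoff(improvement, improved):
    """Cut-off at which sensitivity and specificity are closest to equal."""
    return min(roc_points(improvement, improved),
               key=lambda point: abs(point[1] - point[2]))


def auc(improvement, improved):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives, negatives = _split(improvement, improved)
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

Here `balanced_cutoff` returns the candidate MCID together with its sensitivity and specificity, and `auc` gives the probability that a randomly chosen participant who improved on the anchor shows a larger score reduction than one who did not.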
Results
Sixty people diagnosed with tennis elbow underwent baseline assessment. Of these, one withdrew before second assessment and two had missing questionnaires due to administrative error, leaving data from 57 people for analysis. Their baseline characteristics are presented in Table 1. PRTEE subscale and total scores covered a wide range of the possible score values. Eleven participants were symptomatic in their non-dominant arm. Five of these reported bilateral symptoms and had baseline total scores in the range 34–76/100, representing a range of severities.
Table 1. Baseline characteristics of participants
PRTEE, Patient-rated Tennis Elbow Evaluation
At second assessment, GCSs indicated that three participants judged their condition to have deteriorated, 18 said it was unchanged, 19 reported it was a little better and 17 said it was much better. Figure 1 is a box and whisker plot showing the median and interquartile range of change in PRTEE total scores corresponding to each GCS. The scores of the two outliers were checked and found correct. As the plot illustrates, the spread of score changes increased with higher GCS values; however, the spread of percentage score changes varied much less between GCS scores. Absolute and percentage changes in pain, function and total scores all showed significant (P < 0.001) moderate correlation with GCSs, with Spearman's rho between −0.54 and −0.66, the highest values being for the total score changes. The negative correlations are a consequence of the way score changes are calculated: by subtraction of baseline from follow-up values. Clinical improvements therefore corresponded to negative change values. Changes in pain and function subscale scores were significantly correlated (Spearman's rho = 0.63, P < 0.01).
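The sign convention described above can be made concrete with a short sketch. It computes change as follow-up minus baseline, so that improvement yields negative values, and then computes Spearman's rho as the Pearson correlation of average ranks. The function names and the data in the usage note are hypothetical and are not drawn from the study.

```python
import math


def score_changes(baseline, follow_up):
    """Absolute and percentage change, computed as follow-up minus baseline."""
    if baseline <= 0:
        raise ValueError("percentage change needs a positive baseline score")
    absolute = follow_up - baseline
    return absolute, 100.0 * absolute / baseline


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

For example, a hypothetical participant moving from 40 to 29 has a change of −11 points (−27.5%). Pairing such negative changes with positive GCS values produces a negative rho, even though the two measures agree that the condition has improved.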
Figure 1. Change in Patient-rated Tennis Elbow Evaluation (PRTEE) total score between assessments for each global change score
Table 2 presents data from the ROC curve analysis, conducted using the two definitions of clinically significant change and estimating a cut-off value for each score where sensitivity and specificity are approximately equal.
Table 2. Area under curve (AUC) analysis and cut-off values for PRTEE score changes
PRTEE, Patient-rated Tennis Elbow Evaluation; GCS, global change score
AUC values greater than 0.8 are considered to indicate excellent discriminatory capacity, 22 and all scale and subscale changes met this criterion except for percentage changes in the function subscale score, which approached it. As would be expected, larger absolute and percentage changes in scores are required to meet the stricter criterion for the MCID, with score changes of 35–40% of the baseline value being required to be considered clinically significant. Table 3 presents data from the subgroup analysis, with cut-off values for total score where clinical significance is defined as GCS = 2. Both absolute and percentage changes in total score had AUC values greater than 0.8.
Table 3. MCID values for subgroups with different PRTEE total scores at baseline
PRTEE, Patient-rated Tennis Elbow Evaluation; MCID, minimum clinically important difference
A number of participants volunteered comments about use of the questionnaire. A common comment was that it was problematic to score ‘difficulty’ and ‘pain’ as separate entities because the difficulty was seen as entirely a consequence of pain rather than, say, weakness; so they were inclined to see the two subscales as addressing the same question. Some also reported that, since they were either symptomatic on the non-dominant side or had learned to use the non-symptomatic limb for activities identified in the questionnaire, they did not feel limited in the activities specified.
Discussion
Since pain and functional limitations are the most common consequences of tennis elbow, it is reasonable to expect changes in their PRTEE scores to correlate with the individual's overall rating of the condition. A correlation threshold of 0.30–0.35 between a patient-rated outcome and an anchor has been recommended for estimating MCIDs, 23 and this criterion was met in this study. However, the correlation is only moderate, suggesting that factors other than those addressed by the questionnaire are involved in the patient's global impression. This interpretation is supported by the relatively large changes in PRTEE scores that are required before the individual considers that significant improvement has occurred. It may also be relevant that more than 20% of participants in this study either were symptomatic in the non-dominant arm or reported compensatory use of the unaffected arm. A consequence of this is that changes in the condition may not have been accompanied by commensurate changes in rating of those particular items by these individuals, in which case the instrument is less sensitive to change for this subgroup. The five participants who reported bilateral symptoms had a variety of GCSs and inspection of their data suggested that these were commensurate with their PRTEE score changes. Hence, the questionnaire appeared valid for use with this subgroup.
The difficulty that was reported by some participants in distinguishing ratings of pain and difficulty attempting tasks is reflected in the significant correlation between changes in the pain and function subscale scores. Pain and function scores were also significantly correlated at each time point. This suggests that respondents largely equated the two constructs and casts some doubt on the value of modelling them as distinct metrics in a patient-rated measure. It may be that separate rating of functional problems is worthwhile when weakness is a significant feature of the presentation.
The primary function of the MCID is to inform interpretation of group mean values, for example in clinical trials of treatment effectiveness. This study indicates that substantial changes in group mean PRTEE scores are required before a clinically significant change can be assumed to have occurred, even using the more liberal criterion of ‘a little better’. The more stringent criterion of ‘much better’ requires falls of the order of 35–40% of baseline scores to be confident that a desirable level of improvement has occurred. Where a study reports a statistically significant difference between two treatments, but the actual change in the more effective treatment group mean PRTEE score is less than 35%, the value of the treatment remains open to question.
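As a rough aid to applying these thresholds, the sketch below checks a group mean change against the total-score MCID values reported in this study (7 points or 22% for ‘a little better’; 11 points or 37% for ‘much better’). Reporting the absolute and percentage criteria separately, rather than combining them, is an assumption of this illustration, as are the function and variable names.

```python
# Total-score MCID estimates reported in this study.
MCID_THRESHOLDS = {
    "a little better": {"points": 7.0, "percent": 22.0},
    "much better": {"points": 11.0, "percent": 37.0},
}


def mcid_met(mean_baseline, mean_follow_up):
    """Report which MCID criteria a group mean reduction satisfies.

    Returns, for each definition of clinical significance, whether the
    absolute ("points") and percentage ("percent") thresholds are reached.
    """
    if mean_baseline <= 0:
        raise ValueError("mean baseline score must be positive")
    reduction = mean_baseline - mean_follow_up
    percent = 100.0 * reduction / mean_baseline
    return {
        label: {
            "points": reduction >= limits["points"],
            "percent": percent >= limits["percent"],
        }
        for label, limits in MCID_THRESHOLDS.items()
    }
```

A hypothetical group whose mean score falls from 50 to 38 would meet the 11-point criterion but not the 37% criterion, which illustrates why the baseline score matters when interpreting a given absolute change.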
The subgroup analysis confirms that the absolute score change considered clinically significant depends on the baseline severity of the condition. Subgroups with milder symptoms require considerably smaller PRTEE score changes than those with more severe presentations in order to consider that significant improvements have occurred. However, percentage changes in scores that are regarded as clinically significant are similar between severity subgroups. This finding is consistent with studies of numerical pain scales, which have concluded that changes of 30–35% of baseline score are clinically significant. 30,31
The minimum detectable difference (MDD), described in the Introduction as the minimum detectable change, can be determined using test–retest reliability data, 21 although most reliability studies have not reported it for the PRTEE. Calculations using data from three studies 13,16,18 suggest that the MDD for the total score lies in the range 8–12 points, meaning that the score must change by at least this much to be confidently interpreted as real, and not attributable to random error alone. These values exceed the liberal MCID value for milder presentations, and so suggest that the test–retest reliability of the instrument may limit its use for detecting clinically significant change in this group.
There were several potential limitations in this study. The changes made to the wording of some items in the questionnaire may have influenced its measurement properties. Indeed, this was the intention: we felt that the relatively minor changes would enhance understanding and consistent interpretation of the questions. This might have produced different findings than if the standard version had been used, although we suspect that the differences would be minor. The study sample size was not determined prospectively because the investigation drew data from a broader pilot study. This was a pragmatic pilot trial, whose liberal inclusion criteria mean that concomitant upper limb disorders may have contributed to the elbow pain, and so influenced PRTEE scores. In this sense, the questionnaire was being applied to a broader population than its name suggests. However, we would argue that since tennis elbow often presents clinically with other upper quadrant problems, testing the questionnaire with this broader population assesses its value in a realistic context. Several other studies evaluating the measurement properties of the PRTEE have not excluded participants with common upper limb co-morbidities such as radiculopathy or radial nerve involvement, 13,15,17 and it has also been used with different groups altogether, for instance those undergoing arthroplasty. 32
Objections have been raised regarding the retrospective use of global change measurements as anchors for determining responsiveness. 33 Recall bias may mean that change scores correlate with present status, and the validity and reliability of a subjectively-rated GCS is very difficult to evaluate. Empirically derived data in a variety of clinical contexts suggest that the method does have external validity, 34 but comparisons with MCID values derived using other methods are required to build confidence in the interpretation of PRTEE score changes.
It has been argued that responsiveness studies should be prospective, based on reasonable assumptions about whether and what kind of change is expected in the sample. 35 Although this analysis was retrospective, it used data from a study in which such differences were expected. The hypothesis of the trial, based upon previous studies, was that some types of microcurrent would be more effective than others. Nevertheless the results of this analysis require corroboration by other, prospectively designed studies.
Conclusions
This study has provided estimates of the changes in PRTEE scores that are required to judge whether significant changes have occurred in symptoms experienced by patients with tennis elbow. For a group using the questionnaire, a mean PRTEE score reduction of at least 11 points – or an improvement of 37% on the mean baseline score – is necessary to consider that a substantial improvement has taken place. Using a less stringent criterion, falls of seven points or 22% of the baseline score can be interpreted as indicating a limited but meaningful improvement. The value of separate ratings for pain and function in this instrument, when used with this population, is open to question. Until this matter is resolved, the total score may be of value in estimating symptom severity. The questionnaire may have lower sensitivity to change when used with respondents who are symptomatic in the non-dominant arm.
Footnotes
Acknowledgement
The study was funded by the University of Hertfordshire.
