Abstract
Introduction
Lateral epicondylitis (LE) is a common, often chronic injury that affects a person's ability to grip and perform manual tasks, including activities of daily living and leisure. Treatment of LE is controversial, and there have been no standardized tools with which to measure outcomes in trials investigating new or existing therapies for LE. The Patient-Rated Tennis Elbow Evaluation (PRTEE) was designed to fill this gap, but has not yet been validated to the same standard as other, similarly developed tools.
Methods
Sixty subjects with LE who participated in a double-blind randomized controlled trial were measured with respect to the PRTEE, ‘overall pain’, ‘resting pain’, ‘pain during sleep’, ‘pain at its worst’ and ‘pain at its least’, and quality of life. All tools were administered at baseline, four and eight weeks.
Results
The PRTEE showed poor item-specific validity with measures of similar constructs. The intraclass correlation coefficient for test–retest reliability over four weeks was 0.76. The mean change in PRTEE score differed significantly between the group that experienced clinically significant change and the group that experienced virtually no change (t = −2.96, df = 30, P = 0.006).
Discussion
The PRTEE shows questionable discriminant ability due to its moderate test–retest reliability and possibly due to low convergent validity with other measures of similar constructs. The PRTEE appears to be sensitive to change, but the margin of difference between a clinically relevant change and ‘no change’ is very small.
Introduction
Lateral epicondylitis (LE) is a common ailment of the elbow. It is characterized by pain around the lateral epicondyle, which worsens with resisted wrist extension, and gripping and twisting motions of the hand. Despite its common occurrence, there are very few outcome tools that have been validated for LE. The visual analogue scale has been the outcome used most often in LE intervention studies, but is usually used only for pain, with no reflections on function or disability. While pain is the critical diagnostic criterion, other dimensions of LE have not usually been investigated (e.g. impact on quality of life, activities of daily living, ability to perform common tasks).
The Patient-Rated Tennis Elbow Evaluation (PRTEE) was designed to fill a gap in outcome measures for LE. 1 It measures three dimensions: pain, function with the affected arm and usual activities.
The PRTEE consists of 15 items. All responses are rated on a visual numeric scale (VNS). This differs from the visual analogue scale (VAS) in that it is an ordinal scale as opposed to a continuous one. Respondents are asked to circle the number that best describes the situation or condition stated in the question. The numbers on the VNS are placed 1 cm apart from one another. The range of possible values is from 0 to 10, where 0 represents ‘no pain’ or ‘no difficulty’ and 10 represents ‘worst pain imaginable’ or ‘unable to do’, depending on the subscale (pain versus function/activities). The measurement tool is scored as the mean of all the items. Subscores for each dimension are scored as the mean of all the items in each particular dimension. Higher scores indicate higher pain and/or higher dysfunction.
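The scoring rule described above can be sketched in a few lines. This is a minimal illustration rather than an official scoring routine: the function name, the dimension labels and the way the 15 items are split across dimensions in the example are assumptions made for illustration, since the text states only the total number of items and that scores are means.

```python
def prtee_scores(items_by_dimension):
    """Return (total_score, subscores) for one respondent.

    items_by_dimension maps a dimension name to its list of 0-10 ratings.
    The total is the mean of all 15 items; each subscore is the mean of
    the items in that dimension, as described in the text.
    """
    all_items = []
    subscores = {}
    for dimension, ratings in items_by_dimension.items():
        for rating in ratings:
            # The visual numeric scale only allows whole numbers from 0 to 10.
            if not (isinstance(rating, int) and 0 <= rating <= 10):
                raise ValueError(f"invalid rating {rating!r} in {dimension}")
        subscores[dimension] = sum(ratings) / len(ratings)
        all_items.extend(ratings)
    if len(all_items) != 15:
        raise ValueError("the PRTEE has 15 items in total")
    return sum(all_items) / len(all_items), subscores


# Hypothetical respondent; the split of items per dimension is illustrative.
example = {
    "pain": [4, 2, 6, 7, 3],
    "function": [5, 5, 4, 6, 3, 2],
    "usual_activities": [1, 4, 5, 3],
}
total, subs = prtee_scores(example)
print(total, subs)
```

Higher values returned by this sketch correspond to higher pain and/or dysfunction, in line with the direction of the scale.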
The PRTEE has been tested for test–retest reliability over a one-week period in a population of unilaterally affected individuals with LE, with a minimum three-week duration of symptoms. 1 The intraclass correlation coefficient (ICC) for the total score was 0.89. The PRTEE subsequently underwent additional validity testing, whereby it showed good internal validity 2 ; however, test–retest reliability was assessed by Pearson correlation coefficient, which is not considered an appropriate measure of reliability. An assessment of sensitivity of the PRTEE showed good standardized response means and effect sizes, but did not comment on subjects' clinical status. 3 Therefore, while the PRTEE appears to change from a mathematical point of view, it is unclear whether these changes are corroborated by clinical change.
The purposes of this study were (a) to describe the PRTEE total and subscale scores for a population of individuals with LE over the course of eight weeks during a randomized controlled trial investigating the effectiveness of extracorporeal shockwave therapy (ESWT) on previously untreated LE and to compare these values with those of Overend et al. 1 ; (b) to investigate item-specific convergent validity of specific items of the PRTEE with other measures; (c) to estimate the test–retest reliability of the PRTEE over a clinically relevant period of time that reflects the time frames of primary endpoints of interventional studies on LE; and (d) to estimate the responsiveness to change of the PRTEE.
Methods
Ethics approval for this study was obtained from the Conjoint Health Research Ethics Board at the University of Calgary. Subjects were recruited through physicians' offices and through poster and e-mail campaigns within the City of Calgary as part of a randomized controlled trial evaluating the effectiveness of ESWT over eight weeks in the treatment of previously untreated LE. 4 Subjects were included in the study if they presented with lateral elbow pain with a duration of not less than three weeks and no greater than one year, and had had no prior therapy for their LE. Exclusion criteria were contraindications for ESWT, the presence of bony or articular pathology (determined by X-ray examination), traumatic injury to the elbow, being a Worker's Compensation Board claimant and being an elite athlete (varsity, provincial and national team athletes). Subjects who were bilaterally affected were excluded from this study due to the small number of bilaterally affected subjects in the sample. They were not excluded from the randomized controlled trial.
Subjects were randomly allocated to receive sham or active ESWT once a week for three weeks. All subjects also received a wrist extensor stretching program, regardless of treatment allocation.
Demographic data (age, gender, height, weight, handedness, duration of symptoms) were collected from all subjects. Pain was measured for each affected elbow using a series of 10 cm VAS to evaluate overall elbow pain, resting pain, pain during sleep, pain during their main activity, pain at its worst and pain at its least. The left-hand anchors were labelled ‘No pain’ and the right-hand anchors were labelled ‘Worst pain imaginable’. Quality of life was assessed using the EQ5D quality-of-life instrument. 5
All tools and measures were administered at baseline, four weeks and eight weeks post study enrolment. The PRTEE was administered at baseline and all follow-ups. The order of presentation of the measures was as follows: VAS for overall pain, VAS for resting pain, VAS for pain during sleep, VAS for pain during main activity, VAS for pain at its worst, VAS for pain at its least, the EQ5D, and lastly the PRTEE. Subjects were not permitted to look back at previous questionnaires once they had moved onto the next questionnaire.
All subjects were classified as either having experienced clinically relevant change or not having experienced clinically relevant change at the eight-week endpoint. Clinically relevant change was defined by two criteria, both of which had to be met: (1) at least a 50% reduction in the overall VAS pain score and (2) a post-treatment overall VAS pain score of no more than 4.0 cm.
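The two criteria can be expressed as a small classification rule. The sketch below is illustrative only: the function name is hypothetical, and the handling of a baseline score of zero (treated as not classifiable as improved) is an assumption that the text does not address.

```python
def clinically_relevant_change(baseline_cm, week8_cm):
    """Apply the two criteria to overall pain VAS scores (in cm, 0-10).

    Criterion 1: at least a 50% reduction from baseline.
    Criterion 2: an eight-week score of no more than 4.0 cm.
    """
    if baseline_cm <= 0:
        # Assumption: with no baseline pain, a percentage reduction is undefined.
        return False
    reduction = (baseline_cm - week8_cm) / baseline_cm
    return reduction >= 0.5 and week8_cm <= 4.0


# Hypothetical subjects
print(clinically_relevant_change(6.0, 2.5))  # both criteria met
print(clinically_relevant_change(9.0, 4.5))  # 50% reduction, but still above 4.0 cm
print(clinically_relevant_change(3.0, 2.0))  # below 4.0 cm, but only a 33% reduction
```

Requiring both criteria means that a large relative improvement is not counted as clinically relevant if substantial pain remains at eight weeks.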
Statistical methods
Descriptive statistics were calculated for each subsection and the total score of the PRTEE at baseline for comparison with the values reported by Overend et al. 1 These were further subdivided into scores by gender and by Overend et al.'s 1 definition of subacute (symptoms for 3 weeks to 6 months) and chronic (symptoms greater than 6 months). No other rationale for the subdivision of subacute and chronic was provided by Overend et al. 1
Bland and Altman plots 6 were generated for items in the PRTEE and corresponding VAS questions to investigate the level of agreement between those specific items on the PRTEE and other questionnaires. Specifically, the VAS for pain at rest, the VAS for pain at its worst and the VAS for pain at its least were directly compared. The VAS scores were rounded to the nearest whole number to create direct corresponding values to the PRTEE (e.g. VAS values of 3.50–4.49 cm were rounded to a corresponding PRTEE value of 4).
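The rounding rule in the example above (3.50–4.49 cm maps to 4) is a round-half-up rule. A minimal sketch follows; the function name is hypothetical. It is worth noting because Python's built-in `round` rounds exact halves to the nearest even integer, so it would not reproduce the stated rule for values such as 4.5.

```python
import math


def vas_to_whole_number(vas_cm):
    """Round a VAS measurement (cm) to the nearest whole number, halves up."""
    return math.floor(vas_cm + 0.5)


# Hypothetical VAS measurements in cm
for value in (3.50, 4.49, 4.5, 0.2):
    print(value, vas_to_whole_number(value))
```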
Bland and Altman plots are used to illustrate the mean difference between two measurement scores (of different tools or different administrations of the same tool) as well as the ‘limits of agreement’ of the differences between the two scores. The Bland and Altman plot is generated by plotting the difference between two scores against the average of the two scores, with the assumption that the arithmetic average of the two scores represents any given individual's ‘true’ value with respect to the construct being measured. Confidence intervals (CIs) are also calculated around each mean difference at the 95% level.
The limits of agreement are described by two numbers, an upper limit and a lower limit. They are determined by calculating two standard deviations above and below the mean difference, respectively. Since 95% of the differences will lie within these limits of agreement (provided the differences are normally distributed), the limits of agreement provide an estimate of the extent of agreement (or disagreement) between two scores. If the differences that lie between the limits of agreement are not indicative of clinically or practically important differences, it can be said that there is a strong agreement between the two scores. If the scores are derived from different tools, the tools could be used interchangeably. If the scores are derived from the same tool, but from different administrations of the tool (either in time or method), the Bland and Altman plot can be used as evidence towards establishing the reliability of the tool between the different administrations. 6
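The calculation behind a Bland and Altman plot can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the example scores are hypothetical, the limits use two standard deviations as described above, and the 95% CI around the mean difference is omitted because it requires a t-distribution quantile that the Python standard library does not provide.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return plot coordinates, mean difference and limits of agreement.

    Each point is (average of the two scores, difference between them).
    The limits of agreement are the mean difference plus or minus two
    standard deviations of the differences.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must cover the same subjects")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    avgs = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return {
        "points": list(zip(avgs, diffs)),
        "mean_difference": mean_diff,
        "lower_limit": mean_diff - 2 * sd_diff,
        "upper_limit": mean_diff + 2 * sd_diff,
    }


# Hypothetical paired scores: a PRTEE item and the rounded VAS for four subjects
result = bland_altman([4, 5, 6, 7], [3, 5, 5, 8])
print(result["mean_difference"], result["lower_limit"], result["upper_limit"])
```

Whether the resulting limits are acceptable is a clinical judgement, not a statistical one: the limits are compared with the size of difference that would matter in practice.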
Responses from the categorical section of EQ5D were categorized into binary values: 0 for responses of ‘I have no problems with self-care’, and 1 for responses of ‘I have some problems with…’ and ‘I am unable to…’. Plots were created to compare the self-care question of the PRTEE with the self-care question of the EQ5D; and to compare the average score of the remaining three usual activities questions in the PRTEE (household work, work or main activity, and recreation or sporting activities) with the usual activities question of the EQ5D.
Test–retest reliability was calculated using ICCs 7 and Bland and Altman plots. Subjects were classified as not having changed clinically if their overall pain VAS changed by less than 1 cm. Only subjects who met this criterion were included in the reliability analysis. ICCs were calculated for 0–4-week reliability. Reliability was tested only for the total PRTEE score and the pain subscore, because the criterion for identifying individuals who had negligible change in their LE was based solely on a pain scale.
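For readers unfamiliar with the ICC, the sketch below shows one common form, the two-way random-effects, absolute-agreement, single-measurement coefficient, often written ICC(2,1). The choice of this form is an assumption made for illustration; the text does not state which ICC model was used. The function name and example scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per occasion."""
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of administrations (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication
    ms_subjects = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_occasions = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_occasions - ms_error) / n
    )


# Hypothetical PRTEE totals at week 0 and week 4 for five stable subjects
stable = [[4.2, 4.0], [6.1, 5.5], [2.8, 3.4], [7.0, 6.2], [5.3, 5.1]]
print(round(icc_2_1(stable), 2))
```

Unlike a Pearson correlation, this coefficient is lowered by a systematic shift between administrations, which is why it is preferred for reliability.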
Responsiveness to change was investigated based on the clinically relevant change criteria. Change in PRTEE total scores and subscores was calculated for individuals who met the criteria and compared with the change in scores for individuals who had less than a 1 cm change on the overall pain VAS with an unpaired t-test. Additionally, effect sizes and standardized response means were calculated for those subjects who met the criteria for clinically relevant change as well as for those who exhibited negligible change in their VAS score (change of <1 cm over 8 weeks in the overall pain VAS).
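The responsiveness statistics named above can be sketched as follows, using their conventional definitions: the effect size divides the mean change by the standard deviation of the baseline scores, and the standardized response mean divides it by the standard deviation of the change scores. The function names and data are hypothetical, change is taken as baseline minus follow-up so that improvement is positive, and the pooled-variance (Student) form of the unpaired t-test is an assumption, as the text does not state which variant was used.

```python
import math
from statistics import mean, stdev, variance


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def unpaired_t(changes_a, changes_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(changes_a), len(changes_b)
    pooled_var = (
        (na - 1) * variance(changes_a) + (nb - 1) * variance(changes_b)
    ) / (na + nb - 2)
    t = (mean(changes_a) - mean(changes_b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )
    return t, na + nb - 2


# Hypothetical PRTEE totals for four subjects at baseline and eight weeks
baseline = [6, 7, 5, 8]
week8 = [3, 4, 3, 4]
print(effect_size(baseline, week8), standardized_response_mean(baseline, week8))

# Hypothetical change scores: 'no change' group versus 'relevant change' group
print(unpaired_t([1, 2, 3], [4, 5, 6]))
```

With the 'no change' group passed first, a larger change in the 'relevant change' group yields a negative t statistic.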
Results
Sixty subjects were recruited for the randomized controlled trial. Seven bilaterally affected subjects were excluded from this validation study, leaving a sample of 53 unilaterally affected individuals. Demographic data for these subjects can be found in Table 1.
Table 1. Demographic data for all 53 subjects
Mean values are presented, with standard deviations provided in parentheses. Median duration of symptoms is presented with interquartile range in parentheses due to the skewed distribution of duration of symptoms
Baseline comparison with previous reported values
Mean PRTEE scores for the pain and function subsections as well as the total score are presented, by gender, in Tables 2 and 3. Mean PRTEE scores for the pain and function subsections as well as the total score are presented in Tables 4 and 5 by the subacute/chronic classification. The values reported by Overend et al. 1 for each classification are also provided for comparison. Thirteen females were classified as subacute, and eight females were classified as chronic. Eighteen males were classified as subacute, and 14 males were classified as chronic.
Table 2. Mean PRTEE scores by subsection and by gender
PRTEE = Patient-Rated Tennis Elbow Evaluation
Table 3. Mean PRTEE scores by subsection and by gender as reported by Overend et al. 1
PRTEE = Patient-Rated Tennis Elbow Evaluation
Standard deviations are provided in round parentheses and 95% confidence intervals are provided in square brackets
Table 4. Mean PRTEE scores by subsection and by subacute/chronic classifications
PRTEE = Patient-Rated Tennis Elbow Evaluation
Table 5. Mean PRTEE scores by subsection and by subacute/chronic classifications as reported by Overend et al. 1
PRTEE = Patient-Rated Tennis Elbow Evaluation
Standard deviations are provided in round parentheses and 95% confidence intervals are provided in square brackets
Item-specific convergent validity with VASs
Figures 1–3 show the Bland and Altman plots for the level of agreement between each of three PRTEE pain items and the corresponding VAS: the item asking respondents to rate their pain ‘at rest’ (Figure 1), the item asking them to rate their pain ‘at its worst’ (Figure 2) and the item asking them to rate their pain ‘at its least’ (Figure 3). Corresponding quantitative values for the Bland and Altman plots are shown in Table 6.

Figure 1. Bland and Altman plot for the difference between the PRTEE item for ‘pain at rest’ and the VAS for ‘pain at rest’ versus the average of the PRTEE item and VAS for ‘pain at rest’. Larger dots indicate that more than one individual scored the same difference and average values. Dots are sized in direct proportion to the number of individuals with the same values. PRTEE = Patient-Rated Tennis Elbow Evaluation; VAS = visual analogue scale

Figure 2. Bland and Altman plot for the difference between the PRTEE item for ‘pain at its worst’ and the VAS for ‘pain at its worst’ versus the average of the PRTEE item and VAS for ‘pain at its worst’. Larger dots indicate that more than one individual scored the same difference and average values. Dots are sized in direct proportion to the number of individuals with the same values. PRTEE = Patient-Rated Tennis Elbow Evaluation; VAS = visual analogue scale

Figure 3. Bland and Altman plot for the difference between the PRTEE item for ‘pain at its least’ and the VAS for ‘pain at its least’ versus the average of the PRTEE item and VAS for ‘pain at its least’. Larger dots indicate that more than one individual scored the same difference and average values. Dots are sized in direct proportion to the number of individuals with the same values. PRTEE = Patient-Rated Tennis Elbow Evaluation; VAS = visual analogue scale
Table 6. Quantitative values for Bland and Altman plots between PRTEE pain items and VAS pain scores
PRTEE = Patient-Rated Tennis Elbow Evaluation; VAS = visual analogue scale
Item-specific convergent validity with the EQ5D
Subjects who responded on the EQ5D that they had ‘…no problems with self-care’ had a range of responses from 0 to 8 points on the self-care item of the PRTEE. The median PRTEE score was 1 with an interquartile range of 0–2. Ten percent of these individuals scored 4 or higher on the PRTEE item, despite indicating on the EQ5D that they had no problems with self-care. Only one subject responded that he/she had ‘…some problems with self-care’. This subject scored 5 on the PRTEE self-care item.
Subjects who responded on the EQ5D that they had ‘…no problems with usual activities’ scored in a range of 0–9.67 on the average of the three remaining usual activities items (‘household work’, ‘work or main activity’ and ‘recreation’) on the PRTEE. The median PRTEE average was 3 with an interquartile range of 2–4.3. Subjects who responded they had ‘…some problems with usual activities’ also scored in a range of 0–9.67 on the average of the remaining three usual activities items on the PRTEE, but a higher number of subjects scored in the 8–9.67 range than in the ‘…no problems with usual activities’ group. The median PRTEE average was 5.5 with an interquartile range of 3–6.67. Fifteen percent of subjects who reported having ‘…some problems with usual activities’ scored less than or equal to 1 on the average of the three usual activity items of the PRTEE.
Test–retest reliability
The Bland and Altman plot for test–retest reliability of the PRTEE total score over four weeks is presented in Figure 4, and for the PRTEE pain subscale alone in Figure 5. Corresponding ICCs and quantitative values for the Bland and Altman plots are reported in Table 7.

Figure 4. Bland and Altman plot of test–retest reliability of total PRTEE score over four weeks. PRTEE = Patient-Rated Tennis Elbow Evaluation

Figure 5. Bland and Altman plot of test–retest reliability for the PRTEE pain subscale from week zero to week four. PRTEE = Patient-Rated Tennis Elbow Evaluation
Table 7. Intraclass correlation coefficients and quantitative values for Bland and Altman plots for PRTEE test–retest reliability at four weeks (n = 18)
PRTEE = Patient-Rated Tennis Elbow Evaluation; ICC = intraclass correlation coefficient
Responsiveness to change
Subjects who were classified as having experienced clinically relevant change at the eight-week endpoint were compared, with respect to their PRTEE total score, with subjects who had less than a 1 cm change in the overall pain VAS between baseline and eight weeks. The mean change in PRTEE total score between week 0 and week 8 was 1.0 cm (SD = 1.8 cm) for subjects whose overall pain VAS had changed by less than 1 cm, and 3.1 cm (SD = 2.0 cm) for subjects who were classified as treatment successes. The mean changes in the two groups differed significantly (t = −2.96, df = 30, P = 0.006). Effect sizes and standardized response means are reported in Table 8.
Table 8. Effect sizes and standardized response means for the PRTEE total score for subjects who experienced less than 1 cm of change on the overall VAS and subjects who met criteria for clinically relevant change
PRTEE = Patient-Rated Tennis Elbow Evaluation; VAS = visual analogue scale
Discussion
In comparison to the sample tested by Overend et al. 1 the sample in this study was similar in age, weight and height. The study sample at baseline, as a whole, was also very similar to Overend et al.'s 1 sample in terms of mean PRTEE total score, PRTEE pain subscore and PRTEE function subscore. This similarity was preserved when the PRTEE total score and subscores were examined by gender and by subacute/chronic classification. This indicates that the PRTEE is likely to capture the correct constructs in terms of quantifying pain, function and difficulty in performing usual activities and/or that in general individuals with LE tend to respond consistently on the PRTEE.
The PRTEE items for ‘pain at rest’, ‘pain at its worst’ and ‘pain at its least’ had poor to moderate correspondence with the corresponding VAS items tested in this study. Despite the strong possibility of recall bias (as the VAS questionnaire was completed minutes before the PRTEE, separated only by the EQ5D), the limits of agreement for pain items on the PRTEE and the corresponding VAS questions were often more than 2 cm on either side of the mean difference (Figures 1–3). It is possible that the use of the VNS, which labels each centimetre with a number, may unduly affect the responses of users of the PRTEE. That is, while the PRTEE presents the VNS as a 10 cm scale with equidistant numbers at each centimetre, respondents may not perceive these numbers as being equidistant with respect to their perception of pain. Therefore, while a respondent may place a vertical line at 2 cm on a numberless VAS, the same pain may correspond to a subjective rating of 5 when the respondent is presented with a list of numbers from 0 to 10, despite the fact that the number 2 is 2 cm away from the left-hand anchor.
The PRTEE items in the usual activities section fared poorly against the EQ5D. The respondents who answered that they had no problems with self-care or usual activities rated themselves between 0 and 8 or 0 and 6.3 on the corresponding PRTEE item respectively. One possibility for this disagreement is that the EQ5D is presented as a general health questionnaire and not an LE-specific questionnaire, and that, on the whole, individuals affected with LE do not generally consider themselves as having ‘some problems’ unless asked specifically about tasks requiring their affected elbow. This is supported by the lack of change in the EQ5D thermometer score despite significant change in pain status as measured by the VAS in this sample. 4
The PRTEE showed moderate test–retest reliability over four weeks when a stable clinical condition was defined by a change of less than 1 cm on the VAS for overall elbow pain (ICC = 0.76). The ICC for the total score was lower than that for the pain subscore (ICC = 0.81), which can be explained by the multidimensional nature of the PRTEE versus the unidimensional nature of the single VAS. Previous studies on the reliability of the PRTEE administered the questionnaire on consecutive days, which does not account for recall bias; the high ICCs reported in those studies may therefore reflect the short time interval between administrations. Further investigation into the longer-term stability of PRTEE scores in subjects whose LE does not change clinically with respect to pain and grip strength should be considered. This was not possible in this study due to the already small number of individuals (n = 12) who were considered as not having had symptomatic changes over four weeks with respect to the overall pain VAS.
The PRTEE pain subscale showed only a moderately higher test–retest reliability than the total score in subjects who exhibited very little change in the overall pain VAS (ICC = 0.81). The limits of agreement were at least 2 points on either side of the mean difference between the two scores. That is, an individual who was considered as not having experienced a change in their LE could have scored up to 2 points higher or lower than their previous score on the PRTEE pain subscale. This suggests that a change of 2 points on the pain subscale of the PRTEE may not be indicative of a clinical change of an individual's LE.
Overall, the PRTEE appears to be sensitive to clinically relevant change, though the margin that distinguishes relevant change from no change is quite small. A change of 2 points may not indicate a clinically relevant change in LE status, while a change of 3 points appears to be indicative of a clinically significant change in LE status. This is supported by the moderate to high effect sizes and standardized response means of the total PRTEE scores regardless of whether individuals were classified as having experienced clinically relevant change or as a group on the whole. On the individual level, it is difficult to determine whether the PRTEE is indeed a useful clinical tool to ‘objectively’ assess clinical change over time for any given patient, due to the wide range of responses. It is possible that an individual can score very similarly from baseline to eight weeks on the PRTEE but score very differently on other measures of pain such as a VAS. It seems that if the PRTEE is to be used as a valid tool for interventional studies on LE, then its test–retest reliability needs improvement in order to better distinguish individuals who have had significant change in LE status from those who have had none.
There are two main limitations to this study: (1) The true stability of symptoms in subjects who were classified as not having experienced any clinical change was not ascertainable. The assumption was made that subjects experiencing less than a 1 cm change on the overall pain VAS did not have significant changes in their LE status. Given that the subjects in this study were all part of a randomized controlled trial investigating a therapy for LE, it is possible (though unlikely) that the PRTEE was sensitive enough to detect changes in LE status such that subjects who were classified as having ‘no change’ actually did experience substantial changes, thereby lowering the test–retest ICC. (2) It is not known whether the VAS is truly a standard against which the PRTEE should be compared. It is possible that the PRTEE may be more sensitive to change, and possibly more reliable than the VAS, and that it is the PRTEE which should be the standard against which the VAS should be compared. The VAS, however, is a well-studied and validated tool for the measurement of ‘continuum’-type constructs, pain being one of its most common measured variables. 8 The lack of concordance between the two tools is nonetheless a concern, possibly for the validity of both tools in assessing LE status and change in LE symptoms.
The PRTEE appears to have content validity in that the items appear to be relevant to characteristics of LE, although some items may be affected more by lifestyle or environment than by an individual's difficulty in performing specific tasks. However, it may be beneficial to change from using a VNS to a VAS for each item, and to remove the temporal aspect of the usual activities section so that difficulty can be measured without relying on the recall of which activities were performed before the onset of pain.
Conclusion
The mean scores for the PRTEE and its subscales were reported in this study. The mean scores calculated in this study were very similar to those reported by Overend et al. 1 as a whole, as well as when divided by gender and subacute/chronic classifications. The PRTEE shows questionable discriminant validity due to its moderate test–retest reliability, and does not have high convergent validity with other measures of similar, if not identical, constructs. The PRTEE exhibited moderate test–retest reliability (ICC = 0.76 for the total score and 0.81 for the pain subscale) in subjects who had not changed clinically from one test to the other, but appears to be sensitive to change in LE, with a change of 3 points in the total PRTEE indicating a clinically significant change in LE status. However, the difference between a change in the PRTEE score that indicates significant clinical change and one that indicates no clinical change is very small (i.e. a change of 2 points may not indicate a substantial change, while a change of 3 points is indicative of substantial change). As such, it is not recommended that the PRTEE be used as an outcome measure in studies evaluating interventions for LE until further development and validation of this tool have been performed.
Footnotes
Acknowledgements
The authors wish to thank the Alberta Provincial Canadian Institutes of Health Research Training Program in Bone and Joint Health for their educational funding support.
