Abstract
Introduction
Goniometry is a common measure of range of motion and may be assessed by different therapists and goniometers. To date, there is limited psychometric data on active and passive range of motion measurements of individual thumb joints. The purpose of this study was to analyze inter-rater and inter-instrument reliability of passive and active flexion goniometric measures of thumb joints in healthy adults.
Methods
A within-subjects psychometric design was utilized. Two raters each used two goniometers (Baseline™ Flexion-Hyper Extension and Baseline™ 180 Degree Digit) to measure each participant’s (n = 48) thumb carpometacarpal, metacarpophalangeal, and interphalangeal flexion range of motion. Inter-rater and inter-instrument reliability and stability were evaluated through use of intraclass correlation coefficient, standard error of the measurement, and minimal detectable change test statistics.
Results
Inter-rater reliability was poor for carpometacarpal flexion and good-to-excellent for metacarpophalangeal and interphalangeal flexion. Between-rater error ranged from 3.9 to 6.3 degrees for active measurements and from 3.9 to 7.9 degrees for passive measurements. Error was generally less when using the Baseline™ 180 Degree Digit goniometer. Inter-instrument reliability was excellent for all joints.
Discussion
These findings validate the concerns that thumb goniometry inter-rater reliability may differ in clinical and non-clinical populations, support further study in clinical populations, and support a common assumption that the same rater should test the same client with the same goniometer to minimize measurement error. When compared to the Baseline™ Flexion-Hyper Extension Goniometer, the Baseline™ 180 Degree Digit had higher repeatability across raters. Further research on within-rater reliability is required as is study on clinical populations.
Introduction
The thumb contributes to a variety of functional movements of the hand especially when performing activities of daily living (ADLs). 1 ADLs may be limited by restrictions in thumb range of motion (ROM). 2 A wide arc of movement (i.e. ROM) of the thumb joints is necessary for participating in a range of daily activities that demand holding, grasping, and pinching.1,3
When ADLs are limited by thumb symptomology, therapists often use a battery of measures, including ROM assessments, to comprehensively evaluate thumb function, the goniometer being the most commonly used tool for assessing thumb ROM.4–6 According to a recent practice pattern survey, 7 64% of hand therapists report that persons with thumb pain comprise a quarter of their caseload and, of those therapists who treat persons with thumb pain, 85% or more use carpometacarpal (CMC), metacarpophalangeal (MP), and interphalangeal (IP) goniometry as an outcome measure. These practice patterns support that this tool is often used with clients with thumb symptomology and therefore should be psychometrically sound so as to ensure that therapists’ interpretations are accurate. Specifically, because thumb ROM is likely to be measured several times across time, often by different therapists, these measurements should be repeatable amongst raters. 8
Unlike the numerous reliability studies on shoulder, elbow, wrist, and finger goniometry,9–16 there are few published on the thumb.8,17,18 de Kraker et al. 8 evaluated the repeatability of thumb CMC palmar abduction measurements in healthy hands. Bhavana et al. 17 evaluated the repeatability of thumb CMC, MP, and IP goniometric measures in a sample of persons with thumb CMC osteoarthritis (OA); however, they go on to assert that additional studies should be performed in asymptomatic participants. Although Barakat et al. 18 report thumb goniometry reliability findings in asymptomatic persons, their exploration was limited by a small sample (n = 10), non-standardized procedures, and averaging the findings of repeatability analyses across numerous joints rather than reporting the repeatability of individual joint measures. Moreover, given that (i) as many as 19 goniometer types are reported to be used when assessing hand function; 16 (ii) goniometers with varying increments of measurement (i.e. 1 vs. 2 vs. 5 degrees) have been shown to have, at best, moderate agreement when assessing finger ROM; 6 and (iii) there are no published studies to date on the topic, an exploration of the impact of goniometer type on thumb ROM measurement agreement is also justified.
The purposes of this study were to investigate (i) the inter-rater and (ii) the inter-instrument reliability of two commonly used goniometers (where increments of measurement are 1 or 2 degrees) when measuring active and passive flexion of the thumb CMC, MP, and IP joints in healthy adults.
Methods
Design
A psychometric study design was used to test the inter-rater and inter-instrument reliability of goniometers when measuring active and passive thumb flexion ROM of the dominant hand at the CMC, MP, and IP joints.
Participants
Volunteers were recruited via convenience sampling from a university campus in the USA. Adults 18 years or older who reported having healthy functioning thumbs were eligible to participate. Those who had a prior injury to their thumb which resulted in limited AROM and/or pain upon moving their thumb were excluded from participation. According to Walter et al.’s 19 proposed formula for sample size determination, 30 subjects were needed for sufficient statistical power (Beta = 0.20, alpha = 0.05). This study was approved by the university’s institutional review board (IRB # 1406M51384).
Raters
Two occupational therapy student raters were instructed on general goniometric techniques and were trained on thumb-specific measurements by a licensed and experienced occupational therapy educator. In addition, the student raters’ performance was later reviewed by a certified hand therapist for each goniometer type used. The decision to permit trained therapy students to participate as raters for this study is supported as being reasonable given that in several published studies on this topic, experienced and inexperienced therapy raters did not report dissimilar goniometric findings.10,20,21 Furthermore, in a separate study, 22 although inter-rater reliability (IRR) of upper limb goniometric measures improved over the course of several years of practice, the level of agreement was generally no better if not less than this study’s findings and the published IRR of finger MP and IP joint goniometry 23 is less than is presented in this study.
Instruments
The Baseline™ Flexion-Hyper Extension, 24 henceforth referred to as “Clear”, and the Baseline™ 180 Degree Digit, 25 henceforth referred to as “Black”, plastic goniometers were chosen for this study (Figure 1). The “Clear” goniometer measures from 30 degrees hyperextension to 120 degrees flexion in one degree increments, while the “Black” goniometer measures from 40 degrees hyperextension to 110 degrees flexion in two degree increments. Prior to this investigation, the inter-instrument reliability of these two instruments when taking thumb flexion measurements had not been studied.
Figure 1. Baseline plastic flexion-hyper extension (clear) and Baseline 180° Digit (black) goniometers.
Procedures
All participants were assigned sequentially with Rater 1 measuring even-numbered participants first and Rater 2 measuring odd-numbered participants first. Sequential assignment was also given to the goniometers with Rater 1 always starting with the black goniometer and Rater 2 always starting with the clear goniometer. Each rater measured each participant’s thumb twice, one trial with the clear goniometer and one trial with the black goniometer. Non-rating research staff recorded each rater’s findings in an effort to keep raters blinded to one another’s findings. In being consistent with the American Society of Hand Therapists' 2015 Clinical Assessment Recommendations, 26 as advocated by Adams et al. 27 and as frequently used in practice,16,17 raters measured each thumb joint on the dorsum of the participant’s dominant hand with the wrist and forearm in mid position, measuring the CMC, MP, and IP joints individually. Raters measured the MP joint with the IP joint extended to decrease the likelihood of passive insufficiency of the extensor pollicis longus. Raters measured the IP joint with the MP joint flexed to decrease effects of potential oblique retinacular (Landsmeer’s) ligament tightness, which can limit IP joint flexion. Raters first measured AROM then PROM of each joint with one goniometer, then switched to the second goniometer and followed the same order. There were 24 total measurements taken per participant between both goniometers and both raters. After a preliminary review of the data, there were four outliers for one rater consisting of two CMC measurements on two participants, which were considered procedural errors by those recording the data. These four CMC joint measurements were re-measured for each participant using both goniometers, by the same rater and these values supplanted the original scores.
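The measurement count and ordering described above can be enumerated in a short Python sketch. The identifiers (`JOINTS`, `measurement_plan`, `first_rater`, `goniometer_order`) are hypothetical names introduced here for illustration. The joint order CMC, MP, IP within each goniometer is an assumption; the text states only that active preceded passive measurement at each joint.

```python
from itertools import product

# Factor levels taken from the Procedures text; identifiers are illustrative.
RATERS = ("rater_1", "rater_2")
JOINTS = ("CMC", "MP", "IP")        # order within a goniometer is assumed
MOTIONS = ("active", "passive")     # active was measured before passive


def first_rater(participant_number):
    """Rater 1 measured even-numbered participants first, Rater 2 odd-numbered."""
    return "rater_1" if participant_number % 2 == 0 else "rater_2"


def goniometer_order(rater):
    """Rater 1 always started with the black goniometer, Rater 2 with the clear."""
    return ("black", "clear") if rater == "rater_1" else ("clear", "black")


def measurement_plan(participant_number):
    """List every measurement for one participant in the described order."""
    lead = first_rater(participant_number)
    rater_sequence = (lead,) + tuple(r for r in RATERS if r != lead)
    plan = []
    for rater in rater_sequence:
        for goniometer, joint, motion in product(
            goniometer_order(rater), JOINTS, MOTIONS
        ):
            plan.append((rater, goniometer, joint, motion))
    return plan


if __name__ == "__main__":
    plan = measurement_plan(participant_number=7)
    print(len(plan))   # 2 raters x 2 goniometers x 3 joints x 2 motions
    print(plan[0])
```

Under these assumptions, each participant contributes 2 × 2 × 3 × 2 = 24 measurements, matching the total reported in the text.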
Data analysis
Data analysis was performed with the Statistical Package for the Social Sciences (SPSS) version 22 (IBM, Armonk, New York). Descriptive data including mean (M) and standard deviation (SD) were calculated for each joint, rater, and instrument. The reliability between instruments and raters was measured three ways: (1) intraclass correlation coefficients (ICC)2,1, a combined index of both correspondence and agreement, were determined; (2) the standard error of the measure (SEM), or the smallest amount of difference which is above the threshold of error, was computed to reflect the stability of the measurement between raters and instruments; and (3) the minimal detectable change (MDC), a more conservative measure of measurement stability, was also determined. 28 An ICC value of <0.40 indicates poor reliability, 0.40–0.75 indicates fair to good reliability, and >0.75 indicates excellent reliability. 29
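The study's analysis was run in SPSS, so the following Python sketch is not the authors' code. It illustrates one common way to obtain an ICC of this type, assuming that (ICC)2,1 refers to the two-way random-effects, absolute-agreement, single-measure form described by Shrout and Fleiss. The function names `icc_2_1` and `classify_icc` are hypothetical, and the classification simply restates the cut-offs given above.

```python
def icc_2_1(scores):
    """ICC(2,1) from a participants-by-raters table (list of equal-length rows).

    Assumes the two-way random-effects, absolute-agreement, single-measure form.
    """
    n = len(scores)        # number of participants (rows)
    k = len(scores[0])     # number of raters or instruments (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 rows and 2 equal-length columns")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def classify_icc(icc):
    """Apply the interpretation bands stated in the Data analysis section."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


if __name__ == "__main__":
    # Made-up flexion angles (degrees) for five participants and two raters.
    example = [[52, 55], [60, 58], [47, 50], [65, 66], [55, 51]]
    value = icc_2_1(example)
    print(round(value, 3), classify_icc(value))
```

Because this form penalizes systematic offsets between raters as well as random disagreement, two raters who differ by a constant number of degrees will score below 1 even if they rank participants identically.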
Results
Participants
Table 1. Demographic characteristics of participants (n = 48).
IRR
Table 2. Inter-rater reliability for active range of motion of thumb flexion: descriptive data, intraclass correlation coefficients (ICC)2,1, standard error of the measurement (SEM), and minimal detectable change (MDC95) for comparing two raters using two different goniometers.
CMC: carpometacarpal; MP: metacarpophalangeal; IP: interphalangeal; ICC: intraclass correlation coefficient; SEM: standard error of the measure = SD × √(1-ICC); MDC95: minimal detectable change = 1.96 × (SEM × √2); MDC% = MDC Percent Change = (MDC/Mean of all observations per measure) × 100.
Table 3. Inter-rater reliability for passive range of motion of thumb flexion: descriptive data, intraclass correlation coefficients (ICC)2,1, standard error of the measurement (SEM), and minimal detectable change (MDC95) for comparing two raters using two different goniometers.
CMC: carpometacarpal; MP: metacarpophalangeal; IP: interphalangeal; ICC: intraclass correlation coefficient; SEM: standard error of the measure = SD × √(1-ICC); MDC95: minimal detectable change = 1.96 × (SEM × √2); MDC% = MDC Percent Change = (MDC/Mean of all observations per measure) × 100.
Table 4. Inter-instrument reliability for active range of motion of thumb flexion: descriptive data, intraclass correlation coefficients (ICC)2,1, standard error of the measurement (SEM), and minimal detectable change (MDC95) comparing two raters using two different goniometers.
CMC: carpometacarpal; MP: metacarpophalangeal; IP: interphalangeal; ICC: intraclass correlation coefficient; SEM: standard error of the measure = SD × √(1-ICC); MDC95: minimal detectable change = 1.96 × (SEM × √2); MDC% = MDC percent change = (MDC/Mean of all observations per measure) × 100.
Table 5. Inter-instrument reliability for passive range of motion of thumb flexion: descriptive data, intraclass correlation coefficients (ICC)2,1, standard error of the measurement (SEM), and minimal detectable change (MDC95) for comparing two raters using two different goniometers.
CMC: carpometacarpal; MP: metacarpophalangeal; IP: interphalangeal; ICC: intraclass correlation coefficient; SEM: standard error of the measure = SD × √(1-ICC); MDC95: minimal detectable change = 1.96 × (SEM × √2); MDC% = MDC percent change = (MDC/Mean of all observations per measure) × 100.
The response stability of these 12 measures (i.e. SEM) ranged from 3.0 to 7.9 degrees: passive IP flexion when using the black goniometer demonstrating the least amount of between-rater change attributable to measurement error (i.e. SEM = 3 degrees) and passive flexion of the MP joint with the clear goniometer displaying the most (i.e. SEM = 7.9 degrees). In all cases except active MP flexion, the SEM was lowest when taking flexion measurements with the black goniometer. When compared to the 5 degrees of measurement error associated with finger IP ROM measurements, 30 the measurement error associated with thumb IP goniometry was as little if not less while thumb CMC and MP measurement error generally trended towards being 1–2 degrees more. However, for two or more therapists to be 95% confident that, when each reporting on a single patient’s progress, the differences in their measurements are not due to chance, a change of 8.4 to 21.8 degrees is necessary (i.e. minimal detectable change or MDC95). The MDC%, an expression of the MDC as a percent of the mean of each measure, illustrated that in all but one measure (i.e. active MP flexion), the black goniometer produces less measurement error relative to the average maximum available flexion ROM. It should be noted, however, that for both instruments, the inter-rater error approached or exceeded 100% of the average maximum flexion ROM of the CMC.
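The relationship between the SEM, MDC95, and MDC% values discussed above follows directly from the formulas given in the table notes. The Python sketch below restates them; the function names are hypothetical, and which SD enters the SEM (for example, a pooled SD across raters) is an assumption, since the text does not specify it.

```python
import math


def sem(sd, icc):
    """Standard error of the measure: SD x sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("this sketch expects an ICC between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem_value):
    """Minimal detectable change at 95% confidence: 1.96 x SEM x sqrt(2)."""
    return 1.96 * sem_value * math.sqrt(2.0)


def mdc_percent(mdc_value, mean_of_observations):
    """MDC expressed as a percentage of the mean of all observations."""
    return mdc_value / mean_of_observations * 100.0


if __name__ == "__main__":
    # SEM values at the two ends of the range reported in the text.
    for reported_sem in (3.0, 7.9):
        print(reported_sem, round(mdc95(reported_sem), 1))
```

Applying `mdc95` to the rounded SEM endpoints gives values close to, but not exactly, the reported 8.4 and 21.8 degrees; the small gaps are consistent with the published SEMs having been rounded to one decimal place. The MDC% figures additionally depend on the mean flexion at each joint, which is why a joint with a small arc of motion, such as the CMC, can show an MDC% near or above 100 even when its SEM in degrees is unremarkable.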
After reviewing the data, large discrepancies between PROM and AROM were noted. Additional analysis indicated that across both raters and instruments, passive and active measurements differed significantly within participants at the MP, IP, and CMC joints (F(1,47) > 24.2, p < .001). Of particular interest is that, in normal hands, passive and active measurements differed the most when measuring the MP joint with the black goniometer (mean difference = 12.5 degrees, SD = 6.9). The differences between passive and active IP (mean difference = 6.9, SD = 4.4) and CMC (mean difference = 3.6, SD = 1.6) measurements were notably and significantly less.
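As a rough illustration of a within-participant passive versus active comparison, the Python sketch below computes the mean and SD of the paired differences and a paired test statistic. It is a simplification, not the authors' analysis: it treats one joint, one rater, and one goniometer at a time, whereas the reported analysis pooled across raters and instruments. It relies on the fact that, for a single within-participant factor with two levels, the repeated-measures F statistic on (1, n − 1) degrees of freedom equals the square of the paired t statistic. The function name `paired_comparison` and the sample values are hypothetical.

```python
import math
import statistics


def paired_comparison(active, passive):
    """Summarize within-participant passive-minus-active differences."""
    if len(active) != len(passive) or len(active) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")

    diffs = [p - a for a, p in zip(active, passive)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD of the differences
    if sd_diff == 0:
        raise ValueError("differences are constant; test statistic undefined")

    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "t": t_stat,
        "F": t_stat ** 2,                  # equals F(1, n - 1) for two levels
        "df": (1, n - 1),
    }


if __name__ == "__main__":
    # Made-up MP flexion angles (degrees) for four participants.
    active_mp = [50, 55, 60, 52]
    passive_mp = [60, 63, 75, 60]
    print(paired_comparison(active_mp, passive_mp))
```

With 48 participants the degrees of freedom would be (1, 47), matching the form of the statistic reported above.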
Inter-instrument reliability
Inter-instrument reliability was excellent for both raters across all measures. For AROM (Table 4) and PROM (Table 5) of the CMC joint and IP joint, the reliability was excellent (ICC > 0.75). There was better inter-instrument agreement for both raters for passive MP and IP flexion measures than active. However, active CMC flexion measurements proved to be more consistent than were passive. Across raters, the averages of all active measurements and five out of six passive measurements were larger when taken using the black goniometer (five out of six active measurements and six out of six passive measures statistically significantly larger). Inter-instrument active and passive flexion measurement error was largest for the MP joint; however, when the MDC was normalized according to maximum mean flexion scores (i.e. MDC%), the CMC measurements proved to have the greatest percent error, whereas the IP measurements possessed the least. With the exception of the SEM for passive MP flexion for rater 2 (SEM = 5.6), the inter-instrument mean differences and SEM were less than 5 degrees for AROM and PROM, which is less than the 5 degrees of measurement error associated with finger IP ROM measurements. 30
Discussion
This study offers an exploration of the reliability of thumb flexion measurements in unimpaired hands. Our findings support that, in normal hands and across the two goniometers used in this study, the IRR of thumb MP joint flexion goniometry is excellent and, in all cases except when using the clear goniometer to measure active flexion, thumb IP flexion is as well. Across both goniometer types and both active and passive measurements, CMC flexion goniometry IRR was poor. It should be noted, however, that the IRR of passive flexion measurements through use of the black goniometer was approaching a “fair” rating (i.e. ICC = 0.40–0.75). These poor ratings may have been due to poorly visualized bony landmarks and thus inconsistent placement of the goniometer at the joint’s center. This propensity is recognized by Burr et al. 6 who noted that placement is critical to IRR of finger IP goniometry. These findings differ from those in persons with thumb CMC OA 17 in that the reliability of the IP and MP joints was generally higher in those without known impairments; however, CMC flexion reliability was notably higher in those with CMC OA. Given that the ICC is dependent upon response variability, 28 a likely explanation for these differences is a more homogeneous arc of ROM observed in the healthy CMC joint relative to those with various degrees of CMC pathology. The higher MP and IP flexion IRR in those with healthy thumbs could be explained by the concomitant occurrence of MP and IP joint OA and joint enlargement or deformity in those with CMC OA; however, Bhavana et al. do not include data on those with concomitant thumb impairments. Moreover, these differences could also be explained by differences in measurement technique. Bhavana et al. took individual joint flexion measurements with all joints compositely flexed, whereas we attempted to control the effects of joint postures proximal and/or distal to the joint being measured to control for passive insufficiency, etc.
Given these findings, there are notable differences in the repeatability of thumb CMC, MP, and IP flexion between those with and without known thumb CMC OA. For those without known thumb impairments, IP and MP flexion goniometry appear to be relatively more repeatable across raters than they are among those with CMC OA. Additionally, our CMC flexion IRR findings validate concerns that the repeatability of thumb goniometry may differ between those with and without thumb symptomology 17 and reinforce the need for thumb goniometry IRR studies in other symptomatic populations. Additional research is necessary to measure the intra-rater (i.e. test–retest) reliability of CMC flexion goniometric measurements as well as the intra-rater reliability and IRR of extension measurements.
Contrary to the findings of Lewis et al., 23 who explored the IRR of finger MP and IP goniometry, thumb flexion PROM measurements were generally no less repeatable across raters than were the AROM measurements. This could be as a result of what Lewis et al. admit to be insufficient standardization in that some therapists were taking passive IP measurements in the intrinsic plus position and others were not – whereas, in our study, all participants were tested in a standardized fashion as previously described. The thumb’s relatively better passive MP flexion repeatability may be explained by its anatomy in that it is uniaxial and the finger MP is biaxial. 31 As a result, one might expect more between-rater differences in the direction of passive force application (i.e. into some abduction and flexion by one rater and into some adduction and flexion by another) in a biaxial joint than one which produces only flexion extension. These differences need to be explored in clinical populations where IP and MP flexion deficits are expected.
In all measures but active MP flexion, the IRR was highest when using the black goniometer. The length of the moveable arm on the clear goniometer was longer than that of the moveable arm on the black goniometer, which may have contributed to the discrepancies noted. This longer arm may pose more challenges when measuring IP flexion of the thumb given the more dorsally rounded morphology of the thumb distal phalanx relative to the fingers. 32
Relatedly, but as an aside, it is often presumed that “if AROM is significantly less than PROM there is a problem with how the underlying structures are functioning” (p. 148); 33 however, our findings reveal large differences in thumb MP passive and active measurements in healthy and relatively youthful hands. This notably larger PROM at this joint might be attributable to active thenar insufficiency due to muscle bulk or to hyperlaxity in a predominantly female sample. These findings suggest that some caution should be taken when assuming weakness is at fault when thumb MP PROM exceeds AROM by 5+ degrees. 30 Additional study is needed.
This study also explored the within-rater repeatability of thumb flexion measurements when taken using two goniometers with differing designs. According to our findings, there is an excellent agreement (ICC > 0.75) between the clear and black goniometers. However, when using these two instruments interchangeably, a therapist can only be 90% confident that improvements in thumb flexion ROM are due to something other than chance when they exceed, depending on the joint and type of measurement, between 2.4 and 5.6 degrees (i.e. SEM). Although the black goniometer displayed superior IRR, there is now evidence to support the practice of using the same instrument so as to reduce error when taking thumb flexion measurements, preferably the black goniometer. These findings are limited to these two goniometers and those with healthy thumbs and should also be explored in clinical populations where thumb ROM deficits are experienced.
Raters were occupational therapists in training, which may give some therapists pause about the generalizability of these findings. However, it should be acknowledged that the repeatability of goniometric findings across raters may be more dependent upon proper training than on clinical experience 10 and, in the case of this study, student raters received intentional training on thumb goniometry and direct supervision.
Additionally, the sample population was generally quite homogenous in terms of age and gender. The mean age of participants was 25, which is likely not representative of the mean age of hand therapy patients and 77% of the participants were female. Including more males and those over the age of 40 would have provided a more accurate representation of the general population.
This study focused on the inter-rater and inter-instrument reliability of thumb flexion goniometry in persons with asymptomatic thumbs. Additional research is needed on the IRR of thumb extension measurements as well as the test–retest reliability of thumb flexion/extension goniometry. Additional study in clinical populations with limited thumb mobility is also needed.
Conclusions
These findings validate concerns that thumb goniometry IRR may differ in clinical and non-clinical populations and help to justify further study in additional clinical populations. This was also a first glance at the IRR of passive thumb flexion measurements; although passive MP and IP joint goniometry had higher IRR than in the comparable finger joints in healthy hands, additional study is warranted in clinical populations. The inter-instrument reliability for all joints of the thumb was found to be excellent and thus the black and clear goniometers could be used interchangeably in healthy hands. However, because there is also some error associated with use of multiple goniometer types, this error can be easily mitigated through consistent use of a single device; further study is needed among those with conditions affecting the thumb. Lastly, the sizes of SEM and MDC of these measurements, combined with those of Bhavana et al., 17 would also appear to justify the pursuit of novel approaches to quantifying thumb flexion ROM when more than one rater is involved.
Footnotes
Acknowledgements
The authors thank Victoria Glader and Abbey Turgeon for their extensive contributions and dedication to the completion of this research study and all of the participants involved with this study.
Contributorship
CM, VM, KC and AK researched literature and conceived the study. CM, VM, KC and AK were involved in protocol development, data analysis, and interpretation. VM, KC, and AK were involved in gaining ethical approval and patient recruitment. AK and KC wrote the first draft of the manuscript. CM and VM reviewed and edited the second draft of the manuscript and CM reviewed, revised, and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Informed consent
Written informed consent was obtained from all subjects before the study.
Ethical approval
Ethical approval for this study was obtained from the University of Minnesota (USA)’s Institutional Review Board (IRB # 1406M51384).
Guarantor
CM.
