Abstract
A “free hand” real-time ultrasound method is commonly applied to measure transversus abdominis. Potentially, this increases transversus abdominis measurement error due to uncontrolled variability in probe-to-skin force, inclination, and roll, particularly for novice examiners. This single-group repeated-measures reliability study compared the intra-rater reliability of transversus abdominis thickness and activation measurement by a novice examiner between free hand and a standardized probe force device method. The examiner captured ultrasound videos of transversus abdominis in a single session in healthy participants (n = 33). Free hand ultrasound featured uncontrolled probe force, inclination, and roll, while probe force device method ultrasound standardized these parameters. Images of transversus abdominis at rest and contracted were measured and transversus abdominis activation (TrA-C) calculated. Intraclass correlation coefficient, coefficient of variation, standard error of measurement, and worthwhile differences were calculated. The probe force device method resulted in greater reliability (intraclass correlation coefficient = 0.75–0.96) and lower measurement error (coefficient of variation = 8.89–28.7%) compared to free hand (intraclass correlation coefficient = 0.63–0.93; coefficient of variation = 6.52–29.4%). Reliability was good for all measurements except free hand TrA-C, which was moderate. TrA-C had the lowest reliability, followed by contracted thickness of the transversus abdominis, with resting thickness of the transversus abdominis being highest. Worthwhile differences were lower using a probe force device method versus free hand for resting and contracted thickness of the transversus abdominis, and similar for TrA-C. Standardization using probe force device method ultrasound to measure transversus abdominis improved intra-rater reliability in a novice examiner. Use of a probe force device method is recommended to improve reliability through reduced sources of measurement error. Probe force device method intra- and inter-rater reliability in examiners of varying experience, in clinical populations, and to visualize other structures merits exploration.
Introduction
Real-time ultrasound (US) imaging is used by physiotherapists to visualize and measure muscle architectural changes.1,2 It has been used extensively to research abdominal muscle activation in healthy and low back pain (LBP) groups. Some advantages of US are its relatively low cost, high portability, and the absence of ionizing radiation exposure for recipients. 3 US correlates well with intramuscular electromyography (EMG) activity (R² = 0.87), 4 making it a noninvasive alternative. Furthermore, US is not subject to the cross talk experienced with surface EMG.5,6 This supports its application for observing thickness and activation changes in deeper abdominal muscles such as transverse abdominis (TrA). 7
Along with the diaphragm, pelvic floor, and lumbar musculature, TrA stabilizes the lumbopelvic region.8,9 TrA dysfunction has been observed in individuals with LBP.10,11 Moreover, those with persistent LBP exhibit inappropriate modulation of TrA activation (TrA-C) across different postures, 12 and reduced TrA thickness during the abdominal draw-in maneuver (ADIM), which selectively activates TrA.13,14 Using US, physiotherapists can observe and record these measurement and activation deficits, and use them as outcome measures when retraining TrA motor control in LBP populations. 15
Physiotherapists using US currently implement a “free hand” (FH) method. However, there is no external means of controlling probe–patient pressure, inclination (side to side tilt), and roll (forward and backward tilt) using this method, leading to variability in these probe parameters. This constitutes a potential source of TrA measurement error,16–18 and theoretically reduces the reliability of US TrA measurement by clinicians and researchers. This matters because a clinical measurement must be reliable if it is to be useful. 16 Costa et al. 16 and Whittle et al. 18 suggested that reliability of TrA measurement using US in both healthy and LBP groups has been suboptimal due to poor study designs and a lack of standardization of imaging procedures. Measurement of TrA-C has demonstrated the lowest reliability when compared to resting thickness of the transverse abdominis (RTrA) and contracted thickness of the transverse abdominis (CTrA), and only a few studies have reported reliability of TrA-C, despite its clinical importance.16,18
Experienced examiners using FH US have shown greater reliability compared to novice examiners using FH US to measure muscle thickness 19 ; Ferreira et al. 17 found that errors made by novice examiners were greater during the process of data acquisition rather than image analysis. Specifically, novice examiners may demonstrate increased variability in the pressure applied through the probe. 17 Variations in probe force, inclination, and roll alter the location and shape of structures in the resulting image.2,20 Thus, potential increased variability of probe to skin pressure by novice US examiners using current FH US methods is particularly problematic.
Previous attempts have been made to standardize probe position,14,15,21 with one recent pilot study reporting high intra-rater reliability of an experienced examiner using a “probe force device” method (PFDM) to measure RTrA, CTrA, and TrA-C. 22 Thus, the capability of such a standardized US TrA measurement method to improve reliability of a novice examiner is indicated. The aims of this study were to report measurement reliability of RTrA, CTrA, and TrA-C in a novice examiner, within and between FH US and a standardized PFDM. It was hypothesized that greater intra-rater reliability would be found with a PFDM compared to FH.
Methods
Study design
This single-group repeated-measures reliability study was conducted in a tertiary clinical education facility between October 2016 and March 2017, and was registered with the Australian New Zealand Clinical Trials Registry (ACTRN12616001340426p). Ethical approval was granted by the institutional Human Research Ethics Committee (H6729). Participants gave written informed consent and the rights of participants were protected.
Inclusion and exclusion criteria
Eligible participants were aged 18–60 years. Exclusion criteria included LBP in the preceding 12 months of significant enough severity to limit work or recreational activities, pregnancy in the preceding 12 months, inability to tolerate testing procedures, and allergies to adhesive tapes.
Examiner
The novice examiner screened potential participants for inclusion and exclusion criteria, and conducted US and TrA measurements under blinded conditions. The novice examiner had no previous US experience. Prior to data collection, the novice examiner completed a 2-hour training session with a physiotherapist with six years' experience in US-TrA measurement. This comprised a demonstration of FH imaging, PFDM, identification of landmarks, and preparation and use of the PFD and US machine. The novice examiner then completed two independent imaging and measurement trials, each of 2 hours' duration.
Experimental equipment, outcome measures, and procedure
Participants' age (years), height (cm), and weight (kg) were measured and recorded using standard protocols. 23 A US scanner in movie mode (GE Healthcare Venue 40 MSK; General Electric Company; Wauwatosa, WI, USA) with a 3.1 MHz curved array and 4C-SC model 533374596 abdominal probe (65 mm × 15 mm footprint) was used to acquire video images of participants' RTrA and CTrA.
For all US video imaging, participants were positioned on a treatment plinth in supine crook lying with their abdomen exposed. Eight videos were captured per participant, during one US session. Participants remained on the treatment plinth for the duration of video capture (n = 8). Arm position was kept consistent. Lower limb position was standardized using a goniometer (hips 45° flexion, knees 90° flexion). There were two “video sessions.” The order of US video capture per participant for each “video session” was RTrA FH, then CTrA FH, RTrA PFDM, and finally CTrA PFDM (n = 4). There was an interval of 10–15 minutes between “video session 1” and “video session 2” (Figure 1).
Flow chart of the real-time US session. FH: free hand; PFDM: probe force device method.
Prior to imaging, a participant familiarization session was conducted which included, for both FH and PFDM, a RTrA imaging breathing protocol22,24 (Table 1) and training to selectively achieve CTrA using the abdominal draw-in maneuver (ADIM). 25 Participants were instructed to contract with the same intensity each time. Fifty percent of maximum was recommended as voluntary contractions greater than this intensity can lead to ultrasound image distortion. 13 US probe positioning can be seen in Figure 2.
PFD attached to real-time US imaging probe, marker, and placement.
A single breathing cycle protocol during imaging of RTrA and CTrA. ADIM: abdominal drawing in maneuver; CTrA: contracted thickness of the transverse abdominis; RTrA: resting thickness of the transverse abdominis.
The acquisition of US videos was conducted according to methods described in a previous study. 22 These methods were used for both FH and PFDM imaging. However, during FH US the examiner was blinded to the real-time visual display of probe force, inclination, and roll from the LabVIEW virtual instrument link on the laptop computer. 26 This blinding ensured that these parameters remained uncontrolled during FH imaging. For the PFDM, the novice examiner viewed the real-time visual display of force, inclination, and roll for the duration of video capture to standardize probe–skin orientation.
Still image extraction
US videos were imported into VideoPad® (NCH Software, 2016). The novice examiner extracted one still image from each imported video (n = 8). Still RTrA images from all videos were captured at end expiratory phase, and CTrA images during ADIM. For FH, the novice examiner selected still images (n = 4) based on the highest quality visualization of TrA available (Figure 3). For PFDM, still images were extracted based on PFD data output using quantitative matching methods described in a previous study. 22
Sample resting image (left) and contracted image (right). EO: obliquus externus abdominis; IO: obliquus internus abdominis; TrA: transverse abdominis.
To eliminate recall bias, each extracted still image was assigned a five-digit random number prior to measurement. Images were re-identified once all measurements were complete. Images were calibrated and measured (mm) using ImageJ Software (National Institutes of Health, Version 1.51k). A mark was made in the center of the muscle tissue 20 mm from the proximal fascial join. Two additional marks were made 10 mm medial and lateral to this point. The ImageJ angle tool was used to mark a perpendicular line from the superior aspect of the inferior hyperechoic fascial border of the TrA to the inferior border of its superior hyperechoic fascial line, and this distance was measured (mm) and recorded. Similar to previous studies, three measurements were taken of each image.14,27,28 The mean of these three measurements was calculated to represent the TrA thickness measurement. TrA-C was calculated as
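Activation measures of this kind are commonly expressed as the percentage change in thickness from rest. The sketch below assumes that definition, (CTrA − RTrA) / RTrA × 100, together with the averaging of three measurements per image described above. The formula, the function names, and the sample values are illustrative assumptions and are not taken from the study data.

```python
from statistics import mean


def mean_thickness(measurements_mm):
    """Average the three thickness measurements (mm) taken on one still image."""
    if len(measurements_mm) != 3:
        raise ValueError("expected three measurements per image")
    return mean(measurements_mm)


def tra_activation_percent(rest_mm, contracted_mm):
    """Assumed definition: percentage change in thickness from rest."""
    if rest_mm <= 0:
        raise ValueError("resting thickness must be positive")
    return (contracted_mm - rest_mm) / rest_mm * 100.0


# Hypothetical measurements (mm) for one participant, one method
rtra = mean_thickness([3.0, 3.2, 3.1])
ctra = mean_thickness([4.9, 5.0, 5.1])
print(f"RTrA {rtra:.2f} mm, CTrA {ctra:.2f} mm, "
      f"TrA-C {tra_activation_percent(rtra, ctra):.1f}%")
```

Because the activation value is a ratio of two measured thicknesses, error in either RTrA or CTrA carries through to TrA-C.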
Sample size calculation and data analysis
Based on a priori power calculation (G*Power Software, Version 3.1.9.2), a sample size of 34 was sufficient to provide greater than 80% of statistical power at an α level of .05 for the key dependent variables.
Statistical analysis was conducted in SPSS software, version 24 (SPSS Inc., Chicago, IL, USA). The measures of central tendency and dispersion were reported as mean ± standard deviation (SD). A two-way (time × probe) repeated measures analysis of variance (ANOVA) was conducted to determine systematic bias of all included measurements. If an interaction, time, and/or probe effect was found, differences were located with Bonferroni-corrected pairwise comparisons. The intra-rater reliability and measurement error of six TrA measurement conditions were assessed using the intraclass correlation coefficient (ICC, SPSS 2-way mixed, with 95% confidence intervals (CIs)) and coefficient of variation (CV) with associated 95% CI. The six conditions were FH RTrA, FH CTrA, FH TrA-C, PFDM RTrA, PFDM CTrA, and PFDM TrA-C. ICC values were interpreted as good, moderate, and poor for values greater than 0.75, between 0.51 and 0.75, and less than 0.5, respectively. 31
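As an illustration of the reliability statistic, the sketch below computes a two-way mixed ICC from the ANOVA mean squares. The text does not state whether the single-measures or average-measures form, or the consistency or absolute-agreement type, was selected in SPSS, so the single-measures consistency form, ICC(3,1), is an assumption here. The function names, the handling of values that fall exactly on a band boundary, and the sample data are likewise hypothetical.

```python
def icc_consistency_single(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `scores` holds one row per participant; each row contains that
    participant's value in every session (two sessions in this design).
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into participants, sessions, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def classify_icc(icc):
    """Bands from the text; treatment of boundary values is an assumption."""
    if icc > 0.75:
        return "good"
    if icc > 0.5:
        return "moderate"
    return "poor"


# Hypothetical thickness values (mm): [session 1, session 2] per participant
data = [[3.1, 3.3], [4.0, 3.8], [2.7, 2.9], [3.6, 3.5], [4.4, 4.6]]
icc = icc_consistency_single(data)
print(f"ICC = {icc:.2f} ({classify_icc(icc)})")
```

The consistency form ignores a constant shift between sessions, whereas an absolute-agreement form would penalize it, so the choice affects how systematic bias shows up in the coefficient.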
Intra-individual CV and standard error of measurement (SEM) with associated 95% CI were examined for all measurements. A CV of less than 15% was considered acceptable. 32 CV was calculated to allow for appraisal across different conditions, while SEM was reported to allow comparison with previous studies. SEM was calculated as
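A widely used expression for the standard error of measurement is SEM = SD × √(1 − ICC). The sketch below assumes that form, and computes intra-individual CV as each participant's SD across the two measurements divided by their mean, averaged over participants. Both definitions, the function names, and the sample values are assumptions for illustration and may differ from the exact calculations used in the study.

```python
from math import sqrt
from statistics import mean, stdev


def intra_individual_cv(session1, session2):
    """Mean of per-participant CVs (%), each from that participant's two values."""
    cvs = []
    for a, b in zip(session1, session2):
        cvs.append(stdev([a, b]) / mean([a, b]) * 100.0)
    return mean(cvs)


def sem_from_icc(sd, icc):
    """Assumed form: SEM = SD * sqrt(1 - ICC), in the units of the measurement."""
    return sd * sqrt(1.0 - icc)


# Hypothetical thickness values (mm) from two sessions
s1 = [3.1, 4.0, 2.7, 3.6, 4.4]
s2 = [3.3, 3.8, 2.9, 3.5, 4.6]
print(f"CV = {intra_individual_cv(s1, s2):.1f}%")
print(f"SEM = {sem_from_icc(stdev(s1 + s2), 0.90):.2f} mm")
```

CV is unitless, which is what allows comparison across RTrA, CTrA, and TrA-C, while SEM stays in the units of the original measurement.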
Results
Participant characteristics
A total of 37 participants were enrolled in the study. Four were later excluded, due to unusable images (n = 1), failed force data capture (n = 1), and unexplained erroneous measurement data input (n = 1). Included participants (n = 33) were all right-handed (age 29.7 ± 10.4 years; height 169.2 ± 8.8 cm; weight 69.2 ± 13.3 kg; BMI 24.1 ± 3.6 kg/m²), and included 9 males and 24 females.
Systematic bias
Mean ± standard deviation with intra-class correlation coefficient (ICC; 95% confidence interval (CI)), intra-rater coefficient of variation (CV; 95% CI), measurement bias with 95% limits of agreement (LOA), worthwhile differences (WDs), and standard error of measurement (SEM; 95% CI) between the two images for free hand (FH) and probe force device method (PFDM) and between FH and PFDM.
TrA: transverse abdominis.
Reliability
The intra-rater ICC values for each of the six TrA measurement conditions between “video session 1” and “video session 2” ranged from 0.63 to 0.96, with RTrA exhibiting greater ICC values than CTrA and TrA-C for both FH and PFDM (Table 2). Furthermore, the intra-rater measurement error expressed as CV between “measurement 1” and “measurement 2” ranged from 6.52 to 29.4%, with RTrA generating lower CV values than CTrA and TrA-C. Similarly, the intra-rater ICC and CV values between FH and PFDM ranged from 0.77 to 0.95 and 5.90 to 27.6%, respectively, with greater ICC values and lower CV values for RTrA than CTrA and TrA-C. When reliability measures were compared between FH and PFDM, the ICC values were improved for all included measurements (i.e. RTrA, CTrA, and TrA-C), from 0.63–0.93 to 0.75–0.96, respectively, and CV values were lower, ranging from 8.89–28.7% to 6.52–29.4%, respectively.
Based on the one-sample t-test, differences between “measurement 1” and “measurement 2” were significantly different from zero for FH-CTrA, FH-TrA-C, and PFDM-RTrA (p < .05); thus, measurement bias and associated LOA were not computed for these parameters. However, no significant differences were found for FH-RTrA, PFDM-CTrA, or PFDM-TrA-C, or for any of the included measures between FH and PFDM (p > .05). Accordingly, the measurement biases between “measurement 1” and “measurement 2” for FH RTrA, PFDM CTrA, and PFDM TrA-C were 0.012, 0.07, and −0.04, respectively. In addition, the measurement biases for RTrA, CTrA, and TrA-C between FH and PFDM were 0.07, −0.05, and −0.06, respectively. Bland–Altman plots for bias and associated 95% LOA for differences between “measurement 1” and “measurement 2”, and for differences between FH and PFDM, are presented in Figure 4.
The WDs for measurements obtained using FH and PFDM between “measurement 1” and “measurement 2” ranged from 8.6 to 37.6%, with smaller WDs using a PFDM than FH (Table 2). The SEM was lower using a PFDM than FH for all measurements and ranged from 0.22 to 0.67 mm for RTrA and CTrA; SEM for TrA-C was 26 and 34% for PFDM and FH, respectively (Table 2).
Bland–Altman plots for bias and associated 95% LOA for TrA at rest, during contraction and percentage activation using FH and PFDM between the two images (left-hand column) and between FH and PFDM (right-hand column). FH: free hand; PFDM: probe force device method; TrA: transverse abdominis.
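For readers less familiar with the Bland–Altman approach, bias is the mean of the paired differences and the 95% LOA are conventionally taken as bias ± 1.96 × SD of those differences. The sketch below follows that standard convention; the direction of subtraction, the function name, and the sample values are assumptions rather than details reported in the study.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower LOA, upper LOA) for paired measurements.

    Differences are taken as first minus second (an assumed convention).
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical resting thickness values (mm) from two measurements
m1 = [3.1, 4.0, 2.7, 3.6, 4.4]
m2 = [3.3, 3.8, 2.9, 3.5, 4.6]
bias, lower, upper = bland_altman(m1, m2)
print(f"bias {bias:.2f} mm, 95% LOA {lower:.2f} to {upper:.2f} mm")
```

A bias close to zero with narrow limits indicates that repeated measurements agree closely, which complements the ICC by expressing agreement in the units of measurement.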
Discussion
To the authors' knowledge, this is the first study to report intra-rater reliability of a novice examiner to measure RTrA, CTrA, and TrA-C using a PFDM, and compare this with a FH US method. This study showed that the novice examiner exhibited greater intra-rater reliability of TrA measurement using a PFDM compared to FH. Similarly, the lower SEM exhibited by the novice examiner using the PFDM indicated reduced levels of measurement error. 32 Previously, pilot intra-rater reliability of a single examiner using the same PFDM was reported in a clinical population. 22 Results were superior, but this was not surprising given the examiner was experienced in TrA US measurement.
TrA-C demonstrated the lowest reliability compared to RTrA and CTrA for both FH and PFDM, which is consistent with findings of a recent systematic review. 16 These results are unsurprising because reliability of TrA-C is calculated as a function of RTrA and CTrA, which augments measurement error. 35 Further, the good reliability for RTrA and CTrA measurement using both FH and PFDM may indicate that for clinical utility a FH method may be considered acceptable. Nevertheless, justification for applying a FH TrA US method by novice examiners in future research would seem debatable, given the results of this study which demonstrated that reliability of measurement can be improved using a PFDM.
A previous study which also attempted to standardize probe position using a foam block 14 reported lower intra-rater reliability compared to this study. This would suggest that the PFDM may be superior, particularly given that the lower reliability reported with the foam block was demonstrated by an experienced examiner. However, such inference is currently unsupported due to the previous study's smaller sample, and acknowledged limitations which prevent true comparability with the current study.
Using the FH method with a novice examiner, Djordjevic et al. 35 reported higher ICCs and lower SEM for CTrA and TrA-C compared to both PFDM and FH techniques in the current study. The divergent findings between the current study and that by Djordjevic et al. 35 may be related to differences in the level of training, blinding methods, and sample size. The novice examiner in Djordjevic et al.'s 35 study had more US training compared to the current study, and the number of participants was approximately 30% greater than the sample size of the current study. In addition, unlike the current study, Djordjevic et al. 35 used on-screen calipers, suggesting that examiner measurement was not blinded. It is possible that the greater US training and potential measurement bias in the study by Djordjevic et al. 35 contributed to lower measurement errors. Given the inconsistent definitions of novice raters, established classifications of examiner experience in US are indicated to allow for better comparability across studies.
TrA-C has greater clinical utility when compared to absolute measurements of RTrA and CTrA16,18 as it is reflective of function, providing insight into active spinal and sacroiliac joint stability, and is therefore important to physiotherapy practice.36,37 Standardizing force, inclination, and roll improved reliability of TrA-C in a novice examiner compared to a FH method. The results have demonstrated that this method produced higher ICC values, potentially due to the PFDM reducing measurement bias. In the current study, standardization of probe force, inclination, and roll using a PFDM likely generated a more similar image on subsequent measurements. Variations in the angle or pressure through the probe during FH US may have altered the resistance the participant had to contract against and/or their perception of contraction intensity, therefore resulting in increased variability of FH CTrA. Additionally, contact forces between the probe and the patient can deform the tissue itself, particularly in soft areas of the body such as the abdomen. 38 Ishida and Watanabe 21 found that increased inward ultrasound probe pressure decreased TrA thickness. Therefore, controlling for inward pressure limits these potential sources of measurement error, which is evidenced by the results of this study.
Notwithstanding, the degree of measurement error using the FH method in this study remains a concern, because large differences will be required to demonstrate meaningful change in response to clinical trials. Standardizing force, inclination, and roll using a PFDM in novice examiners has potential to minimize intra-individual variability, stabilize US measurements across testing periods, and enhance methodological rigor for future research. This is particularly apparent for the improved TrA-C reliability demonstrated in this study. In addition to enhancing reliability, PFDM may be useful for training novice examiners, although further research is warranted to determine whether PFDM improves reliability between separate examiners (i.e. inter-rater reliability).
To the authors' knowledge, this was the first study to report on reliability and WD of TrA measurement using a PFDM with a novice examiner. This has provided updated recommendations for the degree of change necessary to be clinically significant as a result of a future intervention. This is important, as reported reliability and WDs in previous FH US studies have been conflicting.27,35,38 Furthermore, the findings of this study suggest that past interventional research with large measurement error and low reliability where probe position was not standardized should be interpreted with caution. Finally, the acceptable reliability measures obtained by a novice examiner with a PFDM suggest that a wider range of practitioners could use US and generate accurate measurements.
Limitations and future research
The sample size of this study compared to those reported previously27,39 represents a strength of this research, but there were some limitations. Only intra-rater reliability was reported from one novice examiner. Additionally, the study population were healthy individuals. Hence, results cannot be extrapolated to all novice examiners, experienced examiners, or clinical populations. As such, future research should examine both intra- and inter-rater reliability with PFDM between examiners of similar, or varying, real-time ultrasound experience in clinical populations. The utility of the PFDM as a novice examiner-training tool is also of interest. Future studies should examine PFDM during functional tasks, as greater probe motion occurs during dynamic maneuvers 24 and is a challenge for consistent probe position and contact force. 24 Until measures of force, inclination, and roll are standard in US machines, the levels of improved reliability demonstrated in this study cannot be fully realized.
Conclusion
Standardization using PFDM US to measure TrA improved intra-rater reliability in a novice examiner. Use of a PFDM is recommended to improve reliability through reduced sources of measurement error. Clinicians should consider the importance of standardization of US probe position, and use caution when interpreting past research findings where FH US was applied. Methodological rigor in future research and clinical practice may be enhanced through PFDM US TrA measurement. Intra- and inter-rater reliability of this PFDM in examiners of varying experience, in clinical populations, and to visualize other structures merits exploration.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethics Approval
Study approval was obtained from the Human Research Ethics Committee of James Cook University (JCUH6729), 18 September 2016.
Guarantor
CAF
Contributors
VLK and CAF researched literature and conceived the study. VLK, CAF, and KD designed the methodology. VLK and CAF conducted data collection. VLK and KD conducted the data analysis. VLK, CAF, and KD jointly interpreted the data. VLK wrote the first draft of the manuscript. VLK, CAF, and KD reviewed and approved the final version of the manuscript.
Acknowledgments
Not applicable
