Abstract
Introduction
Lower extremity injuries are common throughout sport [1–6]. Non-contact injuries have been shown to account for 34%–69% of injuries in longitudinal studies completed in elite English youth football [1, 4]. It is, therefore, important to understand the factors that cause these non-contact injuries. Executing athletic activity safely requires multiple components of physical performance: balance and proprioception, speed and agility, aerobic and anaerobic conditioning, muscle flexibility, strength, power and functional movement patterns [7–9]. There is significant overlap between these components; however, measuring them individually through functional performance testing allows sports medicine professionals to identify potential injury risk [9].
Poor balance has been shown to increase the risk of non-contact injuries in football [10]. Balance testing is, therefore, an essential tool for assessing injury risk. Balance is central to many football-specific movement patterns, including landing from a header and striking the ball [11]. It is important to note that balance is not a stand-alone component of performance, but an integral part of athletic performance that overlaps considerably with other athletic abilities, such as power and agility [9, 12–14]. It is suggested that as power and agility improve, balance ability must adapt to control this development through functional movement patterns, resulting in better functional performance [9].
In order to truly gauge an athlete’s functional balance ability, reliable and valid balance tests must be conducted [9]. The current gold standard balance protocol is the ‘Y-Balance Test’ for the lower quarter (YBT-LQ) [10, 15]. The test was originally developed from the Star Excursion Balance Test (SEBT) with the aim to objectify the pass/fail criteria. Recent research [15] has demonstrated the YBT-LQ has ‘good’ to ‘excellent’ intra-rater reliability, with an intra-class correlation coefficient (ICC) ranging from 0.85–0.91. For inter-rater reliability the correlation coefficient was ‘excellent’ 0.99–1.00. Plisky et al.’s [15] work demonstrates that common errors made during the SEBT are effectively minimised using an instrumented device. These errors include whether the reach foot can bear weight, how this is objectified and whether movement of the stance foot is allowed. This research illustrates that by objectifying measures, using instrumented apparatus, correlation coefficients for both intra- and inter-rater reliability are increased. Previous research [16–18] shows that when the original SEBT is used, without such objective scoring, the ICC is impaired. The same authors report varying reliability of the SEBT with an ICC ranging from 0.67–0.97.
In recent years, the YBT-LQ has been considered one of the most reliable tests of dynamic balance in athlete testing and, as such, is used widely as a screening tool in elite sport. That said, factors other than reliability influence whether a test is the most appropriate. In elite youth football, testing is generally conducted on the whole squad at specific times in the year, as well as for individuals following injury. Functional performance testing, including balance tests, must therefore be easy to set up, require minimal equipment and be quick to complete. The YBT-LQ, however, requires a specific piece of equipment, which may not always be available to sports medicine professionals. It is also reported [15] to take 15 minutes per athlete to complete. Although the test is reliable, these requirements compromise the applicability of the YBT-LQ as a balance test in the sporting setting. In addition, when conducting functional performance testing in the sporting environment, it is imperative that the tests used are functionally appropriate to the movement patterns required in the sport. Although the YBT-LQ has been shown to be a reliable measure of dynamic balance, the movement pattern required during assessment is not applicable to most sports.
The modified BASS test for dynamic balance (BASS) is also widely used in sport to assess an individual’s balance competency [19]. One key weakness of the BASS test is that the set-up is time consuming: ten 2 cm × 2.5 cm markers must be placed in the correct formation, which requires measuring equipment and reduces the applicability to sport [19]. Additionally, the markers are a set distance apart and are not normalized for leg length, meaning nine-year-old youth athletes must jump the same distance as eighteen-year-old youth athletes. The BASS test also requires the participant to keep their hands on their hips throughout each trial, which reduces its sport specificity and therefore its appropriateness.
Berati, Adibpour and Rajaee [19] reported the fail criteria for the BASS test. In theory, the criteria against which the BASS test assesses balance are appropriate; in practice, however, they could be seen as too subjective and difficult to score accurately in real time. An observer must count five seconds every time the participant lands whilst scoring them against eight different criteria during landing and balancing, ten times per trial per participant. This may be difficult for a single observer and may require two in certain cases. It could be suggested that this is one factor behind the lower reliability of the BASS test compared with the YBT-LQ: its ICC values are 0.70–0.74 [19], although these are still classed as good.
The Balance Error Scoring System (BESS) test has similar limitations. The test requires a series of tandem, double and single leg balance tasks on firm and foam surfaces [20]. Again, the testing has limited sport specificity and assesses the ability of the participant’s somatosensory, visual and vestibular systems to maintain static positions for prolonged periods of time with their hands on their hips [20]. However, the BESS test does assess balance on different surfaces, which create instability and increase postural sway [21].
From the evidence above, there is a necessity for balance tests to not only be reliable, but also functional, sport-specific, quick, easy to set-up and easy to objectively measure. To cater to these demands the research team developed a new balance protocol using the existing criteria: The Functional Balance Protocol (FBP). The aim of the FBP was to effectively and reliably assess functional balance during football movement patterns, with a series of quick tests that require minimal set-up and equipment. Long term, the goal is to provide a new “gold standard” of testing for balance in sport. The current study aims to assess both intra- and inter-rater reliability of the FBP, to justify its use as a balance screening tool in sport.
Methods
Sample
The study included 10 healthy male subjects, aged between 18 and 40, who participated in sporting activity on a weekly basis. Participants were excluded if they were not moderately active or had any existing injuries or conditions that may affect their balance. A poster campaign was used to recruit volunteers for the study.
Ethical approval and informed consent
Ethical approval was granted by the STEMH ethics committee at the University of Central Lancashire (UCLan). Participants gave their informed consent after reading an information sheet and being briefed on the research procedure. Additionally, subjects were made aware of their right to withdraw from the study until the point of anonymization.
Procedures
All testing was conducted in a movement analysis laboratory at the University of Central Lancashire (UCLan). Upon arrival, participants read and signed an informed consent sheet and then changed into shorts, t-shirt and trainers. The testing order was randomized for each participant to reduce the risk of any learning effects distorting results.
The functional balance protocol
The FBP was created by the research team, based on the criteria from various gold standard balance tests, to provide sports clubs with a functional, sport-specific test that is easy to set up and quick to administer. It was designed to incorporate three different take-off mechanisms that mirror sporting movement patterns: jumping, running and lateral jockeying when defending. Testing was conducted in a movement analysis laboratory on a hard, smooth, non-slip surface. A 100 cm strip of 5 cm wide Hypafix Tape was used to mark the take-off point for participants. A metronome was set at 60 beats per minute (bpm) to help researchers count during the balance phase of each task. Finally, two video cameras were set up to film each trial, one providing a frontal view and the other a lateral view. The cameras were used for retrospective video analysis in order to assess intra-rater reliability. Before conducting each task, the researcher explained the test to the participant, and testing commenced once the participant understood the test completely. All three tests in the FBP are measured against the same criteria: touchdowns, foot adjustments, postural correction and the time for which the subject balanced. Touchdowns are defined as any part of the body, other than the landing foot, touching the floor during the five second balance period. Foot adjustments occur when the foot is adjusted to help correct balance; this includes a ‘hop’, ‘shuffle’ or heel/forefoot raise. Finally, postural correction is defined as the glenohumeral joint moving into >90 degrees of abduction and/or the midline deviating due to hip rotation or sway. It is important to note that, of these criteria, only a touchdown ends a trial; both foot adjustments and postural correction are permitted throughout the five second balance phase.
Vertical jump
The participants started in a bilateral stance on the take-off line and were instructed to jump as high as possible, using their arms to help. Subjects were told which leg to land on, according to the testing order, and to hold their landing position for five seconds. These five seconds were counted by a researcher in time with the metronome in the background. The participants completed two practice trials on each leg, before three experimental trials on each leg. During the five second landing phase, the role of the two researchers was to time, observe and record the balance scores in line with the criteria. The timing role alternated between the two researchers for each participant. The subject’s balance mechanisms were recorded using the FBP score sheet (available from authors upon request).
Forward jump
The participant started approximately two metres behind the take-off line; the subject adjusted this distance during their practice trials to allow a natural run-up. All subjects began with their feet together before commencing a two-step run-up to the take-off line. The participant aimed to jump as far as possible and to land on the opposite leg to their take-off leg, in the same fashion as described for the vertical jump. Two practice trials followed by three experimental trials were carried out on each leg. Balance mechanisms were recorded in the same way, against the same criteria, as for the vertical jump.
Lateral jockey to jump
Each subject began approximately two metres behind the take-off line, as in the forward jump. For this task the participant performed a two-step lateral jockey before jumping from the take-off line. This task also required the subject to turn their body 90 degrees in the air in order to land facing the frontal video camera as in both previous tasks. Similarly to the forward jump, participants landed on the opposite leg to their take-off leg. Again two practice trials and three experimental trials were conducted and balance mechanisms were recorded using the FBP score sheet.
Scoring and retrospective video analysis
The FBP was scored retrospectively using the FBP scoring flowchart (available from authors). Participants scored 0–6 points for each trial, depending on the balance mechanisms employed. For each participant the highest score achievable is 108. For ease of use in sport, a ‘traffic light’ system accompanies the raw score for an individual. The aim of this is to allow easy identification of poor performance, so balance development programmes can be implemented as quickly as possible. Seventy-two hours after the real-time trials had been scored, both researchers reviewed videos of all trials for each participant and re-scored each individual. During this retrospective video analysis researchers did not have access to the real-time scores, in order to reduce the risk of bias. As a result, each participant received four scores out of 108; real-time researcher one, real-time researcher two, retrospective video analysis researcher one, retrospective video analysis researcher two.
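The maximum of 108 is consistent with the protocol structure described above: three tasks, two legs, three experimental trials per leg and up to six points per trial. The following is a minimal sketch of that arithmetic, assuming practice trials are not scored; the names `TASKS`, `LEGS`, `max_total_score` and `total_score` are illustrative and do not come from the FBP score sheet or flowchart.

```python
from typing import Dict, Sequence, Tuple

# Protocol structure as described in the Methods; identifiers are illustrative.
TASKS = ("vertical jump", "forward jump", "lateral jockey to jump")
LEGS = ("left", "right")
TRIALS_PER_LEG = 3        # experimental trials only (assumption: practice trials unscored)
MAX_POINTS_PER_TRIAL = 6  # each trial is scored 0-6


def max_total_score() -> int:
    """Highest achievable FBP total for one participant."""
    return len(TASKS) * len(LEGS) * TRIALS_PER_LEG * MAX_POINTS_PER_TRIAL


def total_score(trial_scores: Dict[Tuple[str, str], Sequence[int]]) -> int:
    """Sum the trial scores, keyed by (task, leg), into one overall total."""
    total = 0
    for task in TASKS:
        for leg in LEGS:
            scores = trial_scores[(task, leg)]
            if len(scores) != TRIALS_PER_LEG:
                raise ValueError(f"expected {TRIALS_PER_LEG} trials for {task}, {leg}")
            if any(not 0 <= s <= MAX_POINTS_PER_TRIAL for s in scores):
                raise ValueError("each trial score must lie between 0 and 6")
            total += sum(scores)
    return total
```

How an individual trial maps onto a 0–6 score is determined by the scoring flowchart, which is not reproduced here, so the sketch takes trial scores as given.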
Data analysis
Each researcher’s real-time and retrospective video analysis scores were used to calculate intraclass correlation coefficients (ICC) for intra-rater reliability. Both researchers’ real-time scores were used for real-time inter-rater reliability. Similarly, both researchers’ retrospective video analysis scores were used for retrospective video analysis inter-rater reliability. All data were analysed using IBM’s Statistical Package for the Social Sciences (SPSS) version 22.0 (SPSS, Inc., Chicago, IL) using a two-way mixed effects model with absolute agreement.
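For readers without access to SPSS, an absolute-agreement ICC can be sketched from the two-way ANOVA mean squares. The example below is a minimal illustration, not the authors' analysis: it assumes the single-measures form, since the text does not state whether single or average measures were reported, and the function name `icc_absolute_agreement` and the scores used to call it are hypothetical.

```python
from typing import List, Sequence


def icc_absolute_agreement(scores: Sequence[Sequence[float]], average: bool = False) -> float:
    """Two-way, absolute-agreement ICC from a subjects x raters table.

    Rows are participants and columns are raters (or rating occasions).
    Returns the single-measures coefficient by default, or the
    average-measures coefficient when average=True.
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of raters or occasions
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means: List[float] = [sum(row) / k for row in scores]
    col_means: List[float] = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for rows (participants), columns (raters) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    if average:
        return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )
```

Because the model uses absolute agreement, a systematic offset between raters lowers the coefficient even when the raters rank participants identically, which is the behaviour wanted when raw totals are to be compared across observers.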
Results
The overall and broken down mean scores, standard deviations and “traffic light banding” for live scoring can be seen in Table 1. Retrospective video analysis overall and broken down mean scores, standard deviations and “traffic light bandings” can be seen in Table 2.
Results of live scoring FBP test
Results of retrospective video analysis scoring FBP test
The results show the FBP has ‘excellent’ inter-rater reliability for both real-time analysis and retrospective video analysis. Real-time analysis produced an intra-class correlation coefficient (ICC) value of 0.950, whereas retrospective video analysis produced one of 0.929 (Table 3). The ICC values for intra-rater reliability varied between observers, although both were ‘excellent’: Observer 1 produced a value of 0.917 and Observer 2 a value of 0.864 (Table 3).
Correlation values of the FBP for inter- and intra-rater reliability, during real-time and retrospective video analysis
Discussion
The inter-rater reliability of the FBP was excellent both in real time and with retrospective video analysis, with correlation coefficient values of 0.950 and 0.929, respectively (Table 3). Given that this is the first study to evaluate the reliability of the FBP, there is no directly comparable literature; however, other dynamic balance tests have been assessed. Plisky et al. [15] demonstrated that the YBT-LQ also produced ‘excellent’ inter-rater reliability in real time, with a correlation coefficient value of 0.99–1.00. This illustrates a similarity in reliability between the FBP and the YBT-LQ. The FBP values are likely to reflect several factors. Firstly, the fail criteria were limited to three balance mechanisms, allowing observers to spend more time watching and less time scoring the individual during real-time analysis; the scoring burden was a limitation of the modified BASS test. Secondly, the fail criteria were objective and defined, which may have reduced discrepancies between observers. Furthermore, the objectivity of the FBP means that inter-rater reliability during retrospective video analysis was not dissimilar to that of real-time scoring (0.929 compared with 0.950). This creates a further opportunity in sport, where testing time is limited: athletes could be filmed during testing and analysed retrospectively. This adds video cameras, which may not always be available, to the set-up requirements; however, it would allow swift and continuous testing of large participant groups.
The FBP utilises a flow chart scoring system that allows the clinician to categorise each participant into a ‘traffic light’ colour band. This allows the clinician to visually interpret each participant’s balance capability, with red being ‘poor’ and green being ‘excellent’. These traffic light categories can be seen in Table 4. As well as the colour system each participant is given an overall score which is a cumulative total of their limb scores for each of the 3 components of the test. When reporting the results for each participant, the colour and score is used. For example, participant x would be reported as ‘Yellow 81’. The advantage of this is that the clinician has an objective points total to use for comparison against future or past tests to monitor the participant’s progression or regression of balance. The system of scoring allows a breakdown of the overall score to look at limb differences which allows comparison of dominant versus non-dominant limbs. Further, this may be useful after lower limb injury as comparison can be drawn against the participant’s baseline measure, indicating whether further rehabilitation programmes are needed.
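The reporting format described above (colour plus total, with a per-limb breakdown) can be sketched as follows. The band boundaries in `EXAMPLE_BANDS` are hypothetical placeholders chosen only so the example runs; they are not the values in Table 4, and the number of colour bands is likewise an assumption. The function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL lower bounds for each colour band. These are placeholders,
# NOT the boundaries from Table 4; substitute the published bands before use.
EXAMPLE_BANDS: List[Tuple[int, str]] = [(0, "Red"), (60, "Yellow"), (90, "Green")]


def colour_band(total: int, bands: Sequence[Tuple[int, str]]) -> str:
    """Return the colour of the highest band whose lower bound is <= total."""
    if not 0 <= total <= 108:
        raise ValueError("FBP totals lie between 0 and 108")
    label = None
    for lower, colour in sorted(bands):
        if total >= lower:
            label = colour
    if label is None:
        raise ValueError("no band covers this score")
    return label


def report(total: int, bands: Sequence[Tuple[int, str]]) -> str:
    """Format a result as colour plus raw score, for example 'Yellow 81'."""
    return f"{colour_band(total, bands)} {total}"


def limb_totals(trial_scores: Dict[Tuple[str, str], Sequence[int]]) -> Dict[str, int]:
    """Sum trial scores per landing leg, keyed by (task, leg), to compare limbs."""
    totals: Dict[str, int] = {}
    for (_task, leg), scores in trial_scores.items():
        totals[leg] = totals.get(leg, 0) + sum(scores)
    return totals
```

Keeping the raw total alongside the colour preserves the information needed to compare a participant against their own baseline, while the per-limb totals support the dominant versus non-dominant comparison.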
Traffic light classification system
Interestingly, the current study highlighted discrepancies in intra-rater reliability between observers. One observer produced an ICC value for intra-rater reliability of 0.864, whereas the other produced one of 0.917 (Table 3). Previous studies investigating the reliability of balance protocols have found similar variability in intra-rater reliability. Plisky et al. [15] found that the YBT-LQ produced ICC values between 0.850 and 0.910. Similarly, Bressel, Yonker, Kras, and Heath [22] found that the Balance Error Scoring System (BESS) produced values between 0.780 and 0.960. One possible explanation for the difference in intra-rater reliability found in the current study is that postural correction is difficult to define and, therefore, to objectify. Although attempts were made to include a definition, one observer may have perceived movements in real time as postural correction that they did not perceive as such during retrospective video analysis, or vice versa. Such errors could potentially be eliminated through the use of portable accelerometers, which provide objective data relating to postural movements. Future research could address the effectiveness of portable accelerometers during the FBP; however, the purpose of the current reliability study was to assess the effectiveness of the test for pitch-side or clinical use with minimal expense and equipment, so that it can be used throughout all sports. It could also be argued that foot adjustments are more visible during retrospective video analysis than in real time because of the angles available to the observer: during retrospective video analysis, observers saw the trials from both front-on and side-on. Despite the variance, an intra-class correlation coefficient value of 0.864 reflects ‘excellent’ intra-rater reliability [23].
During the data collection process, further advantages of the FBP were identified. Firstly, the set-up time is minimal, due to the lack of equipment required. This benefits practitioners in two ways: they do not need to invest in equipment in order to carry out the protocol, and they are able to test clients, athletes or patients quickly. The objectivity of the FBP is also an advantage, as it allows sports medicine professionals to use values as baseline scores, to identify risk with future testing or as return-to-play benchmarks following injury. This identification of specific risk may, in turn, inform preventative intervention programmes.
Limitations of the current study should be noted. During data collection, subjects were asked to jump as high or as far as possible, depending on the task, whilst still being able to land safely. This was to prevent shorter or smaller jumps being conducted in order to make balancing easier. It is difficult to objectify whether subjects produced near maximal jumps or not and therefore results may have been skewed by varying power outputs. As a potential solution, future research could compare jump heights and distances against the subject’s baseline counter movement and forward jump scores.
The study was androcentric, as it recruited only males. As a result, findings cannot be applied to females. Additionally, the sample size was small; future research should assess the reliability of the FBP in a larger sample, which would help improve the generalizability of results. The test is not fully functional to elite sport, as athletes rarely have to stick a single leg landing; however, compared with the current gold standards, it provides an objective measure of dynamic balance using sport-specific movements. Finally, it is not yet known whether the test can detect clinically significant change for use as a post-injury assessment tool. Future research should aim to identify whether the test is sensitive enough to detect alterations in balance scores following an intervention.
Conclusion
This study aimed to identify the inter- and intra-rater reliability of the FBP, a new balance screening protocol for sport. The results demonstrated that the FBP carries ‘excellent’ reliability (Table 3). This research brings a fresh debate to the forefront of the functional performance testing literature: what is the ‘gold standard’ of balance testing? The research supplies evidence to suggest that a functional protocol with objective fail criteria, combining elements of current “gold standard” balance tests with minimal equipment and a short testing duration, may be a more appropriate method of testing balance. The results provide sports medicine professionals with an alternative balance screening protocol that may be easier to use, requires minimal time and equipment, and is sport-specific as well as having ‘excellent’ reliability.
Conflict of interest
The authors have no conflict of interest to report.
