Longitudinal Improvement in Balance Error Scoring System Scores among NCAA Division-I Football Athletes

Abstract

The Balance Error Scoring System (BESS) is a commonly used concussion assessment tool. Recent studies have questioned the stability and reliability of baseline BESS scores. The purpose of this longitudinal prospective cohort study is to examine differences in yearly baseline BESS scores in athletes participating on an NCAA Division-I football team. NCAA Division-I freshman football athletes were videotaped performing the BESS test at matriculation and after 1 year of participation in the football program. Twenty-three athletes were enrolled in year 1 of the study, and 25 athletes were enrolled in year 2. Those athletes enrolled in year 1 were again videotaped after year 2 of the study. The paired t-test was used to assess for change in score over time for the firm surface, foam surface, and the cumulative BESS score. Additionally, inter- and intrarater reliability values were calculated. Cumulative errors on the BESS significantly decreased from a mean of 20.3 at baseline to 16.8 after 1 year of participation. The mean number of errors following the second year of participation was 15.0. Inter-rater reliability for the cumulative score ranged from 0.65 to 0.75. Intrarater reliability was 0.81. After 1 year of participation, there is a statistically and clinically significant improvement in BESS scores in an NCAA Division-I football program. Although additional improvement in BESS scores was noted after a second year of participation, it did not reach statistical significance. Football athletes should undergo baseline BESS testing at least yearly if the BESS is to be optimally useful as a diagnostic test for concussion.

Introduction

The Sport Concussion Assessment Tool (SCAT) is a series of tests used to aid the clinician in concussion evaluation. When feasible, baseline assessment of these tests is recommended, which includes an assessment of balance. The Balance Error Scoring System (BESS) test is a clinical tool used in the standardized concussion evaluation and is considered a reliable and valid means of evaluating balance in athletes with suspected concussion.¹ The BESS is also used in assessing for balance impairments in athletes before return to play post-concussion by comparing the number of errors made on the test at baseline to the number of errors made at the time of potential injury. Although the BESS is a widely used clinical tool, questions exist regarding its applicability to the broad spectrum of all athletes across age, sex, sport, injury history, and testing environment.^2,3

Recent studies call into question the stability and reliability of baseline BESS scores over time. Burk and colleagues reported improvement in BESS scores during a 90-day intercollegiate season in female athletes, and Mulligan and colleagues reported that the learning effect of repeat BESS testing in college-aged adults persisted after 4 weeks.^4,5 Aside from the athlete's performance on this testing, the reliability of the scoring of the exam has also been questioned. A study evaluating inter-rater reliability found that even those experienced and trained in scoring the BESS do not exhibit clinically significant reliability in scoring it.⁶ Because multiple variables may affect BESS scores, we hypothesize that baseline BESS performance is a moving target, making the BESS a progressively insensitive marker of concussion. The aim of this study was to evaluate for change in baseline BESS scores in an NCAA Division-I football program; inter- and intrarater reliability were secondary outcomes.

Methods

Institutional review board approval was obtained for this prospective cohort study. In year 1 of the study, 23 incoming freshman football athletes were videotaped performing baseline BESS testing before initiating organized football activities. Testing was performed on both firm and foam surfaces with double, single, and tandem stances on each surface as outlined in previous works.³ All athletes in this study had been cleared for full participation in athletic activities by the yearly pre-participation examination. None of the athletes had suffered a previous concussion as documented on their entrance pre-participation evaluation. Information regarding an individual athlete's concussion history during the year-long study was maintained by the medical team overseeing the football athletes. After 1 calendar year of participation in organized football activities, the same 23 athletes were videotaped performing the BESS test.

In year 2 of the study, 25 incoming freshman football athletes were videotaped in the same manner as those athletes in year 1 of the study. Of note, 3 of these athletes had suffered a previous concussion as documented on their entrance pre-participation evaluation. Additionally, 21 of the athletes from year 1 (2 athletes had left the team after year 1) were videotaped after year 2 in the football program. No athletes had evidence of concussion on their pre-participation examination preceding either year.

The videos were de-identified and randomized. De-identification included removing the athletes' names and identifying information from the videos. The faces of the athletes remained visible in the videos, given that viewing an athlete's eyes is necessary to properly score the BESS test. Four certified athletic trainers and three senior athletic training students (now certified athletic trainers), none of whom were affiliated with the football team or the study subjects, were blinded to the purpose of the study. These individuals underwent standardized physician-guided training on scoring the BESS. Performed by the authors, this training included viewing practice videos, providing feedback, and performance-based testing. Each reviewer scored every video independently and was blinded to the scores assigned by the other reviewers. The paired t-test was used to assess for change in score over time for the firm surface, foam surface, and the cumulative BESS score. Inter-rater reliability was calculated using the intraclass correlation coefficient. To assess intrarater reliability, 10% of all the previously scored videos (i.e., upon matriculation, after year 1, and after year 2) were selected using a random number generator. The athletic trainers individually scored these videos in the same manner as the previous videos.
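The paired comparison described above can be illustrated with a minimal Python sketch. It assumes each athlete contributes one score at each of two time points. The function names (`paired_t_test`, `t_two_sided_p`, `t_critical`) and the example scores are hypothetical; they are not the study's code or data. The p value is obtained by numerically integrating the Student t density so that only the standard library is needed.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p value: 1 minus the area under the density on [-|t|, |t|].

    The area on [0, |t|] is computed with Simpson's rule (steps must be even).
    """
    a = abs(t_stat)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return max(0.0, 1.0 - 2.0 * half_area)


def t_critical(df, alpha=0.05):
    """Critical value whose two-sided tail probability is alpha (bisection)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t_test(first, second, alpha=0.05):
    """Paired t-test on the differences (second - first)."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    df = n - 1
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    t_stat = mean_diff / se
    margin = t_critical(df, alpha) * se
    return {
        "mean_diff": mean_diff,
        "ci": (mean_diff - margin, mean_diff + margin),
        "t": t_stat,
        "p": t_two_sided_p(t_stat, df),
    }


# Hypothetical cumulative error counts for five athletes (not study data).
baseline = [20, 18, 25, 22, 19]
after_one_year = [17, 16, 21, 21, 15]
print(paired_t_test(baseline, after_one_year))
```

A negative mean difference here means fewer errors on the second test, matching the sign convention of the differences reported in the Results.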

Results

Cumulative errors on the BESS significantly decreased from a mean of 20.3 at baseline to 16.8 after 1 year of participation (95% confidence interval [CI] = −5.27 to −1.82; p < 0.01). Individually, the firm (95% CI = −2.35 to −0.38; p < 0.01) and foam (95% CI = −3.66 to −0.71; p < 0.01) scores significantly decreased (Table 1).

Table 1.

BESS Scores at Baseline (First Test) and after 1 Year of Participation (Second Test)

	First test mean (n = 48)	Second test mean (n = 48)	First test–second test mean difference (95% CI)	p value
All athletes
Cumulative BESS score	20.3 SD 6.1	16.8 SD 6.4	−3.55 (−5.27 to −1.82)	0.0001
Cumulative firm score	5.0 SD 4.0	3.6 SD 3.2	−1.36 (−2.35 to −0.38)	0.0079
Cumulative foam score	15.4 SD 4.0	13.2 SD 4.4	−2.18 (−3.66 to −0.71)	0.0047

BESS, Balance Error Scoring System; CI, confidence interval; SD, standard deviation.
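As a rough consistency check on Table 1 (a sketch, not part of the original analysis), the standard error and t statistic implied by a reported confidence interval can be recovered. This assumes the interval is the usual two-sided 95% t interval on n = 48 paired differences, giving 47 degrees of freedom and a critical value of about 2.012. The symbols for the mean difference, the interval limits L and U, and SE are introduced here for illustration only. For the cumulative score:

```latex
\[
\mathrm{SE} = \frac{U - L}{2\, t_{0.975,\,47}}
\approx \frac{-1.82 - (-5.27)}{2 \times 2.012} \approx 0.857,
\qquad
t = \frac{\bar{d}}{\mathrm{SE}} \approx \frac{-3.55}{0.857} \approx -4.14 .
\]
```

With 47 degrees of freedom, a t statistic of this size corresponds to a two-sided p value on the order of 10⁻⁴, in line with the reported p = 0.0001.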

In the 21 athletes evaluated following the second year in the program, additional improvement in BESS scores was observed for the cumulative score as well as both the foam and firm surfaces individually. However, this did not reach statistical significance (Table 2).

Table 2.

BESS Scores at Baseline (Y1), after 1 Year of Participation (Y2), and after 2 Years of Participation (Y3)

	Y1 mean (n = 23)	Y2 mean (n = 23)	Y3 mean (n = 21)	Y1–Y2 mean diff (95% CI)	p value	Y2–Y3 mean diff (95% CI)	p value	Y1–Y3 mean diff (95% CI)	p value
Athletes enrolled in year 1 of the study
Cumulative BESS score	19.0 SD 6.3	15.5 SD 5.1	15.0 SD 4.6	−3.46 (−6.33 to −0.59)	0.02	−0.86 (−3.93 to 2.22)	0.57	−3.96 (−7.10 to −0.82)	0.02
Cumulative firm score	4.2 SD 3.8	3.1 SD 2.2	2.6 SD 2.0	−1.12 (−2.7 to 0.50)	0.17	−0.65 (−1.92 to 0.62)	0.30	−1.65 (−3.30 to −0.02)	0.048
Cumulative foam score	14.8 SD 4.9	12.4 SD 4.2	12.4 SD 3.7	−2.34 (−4.76 to 0.07)	0.057	−0.20 (−2.60 to 2.19)	0.86	−2.31 (−4.97 to 0.34)	0.08

BESS, Balance Error Scoring System; diff, difference; CI, confidence interval; SD, standard deviation.

Intraclass correlation coefficient for inter-rater reliability for the cumulative BESS score was 0.75 for videos obtained at the beginning of year 1, 0.65 for those obtained at the end of year 1 and beginning of year 2, and 0.75 for those obtained after year 2 (Table 3). Intrarater reliability was 0.81.

Table 3.

Inter-Rater Reliability of BESS Scores at Times Across the Study

	Intraclass correlation coefficient for inter-rater reliability
	Beginning of year 1 (baseline)	End of year 1/beginning of year 2	End of year 2
Cumulative BESS score	0.75	0.65	0.75
Cumulative firm score	0.82	0.66	0.84
Cumulative foam score	0.73	0.61	0.65

BESS, Balance Error Scoring System.
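The intraclass correlation coefficients in Table 3 summarize how closely independent raters agree on the same videos. The source does not state which ICC form was used, so the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), computed from the standard two-way ANOVA mean squares. The function name `icc_2_1` and the example ratings are illustrative and are not the study's code or data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows, one row per video, each holding one score
    per rater (every rater scores every video).
    """
    n = len(ratings)      # number of rated videos
    k = len(ratings[0])   # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for videos (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores for 6 videos rated by 4 raters (not study data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

In this example the raters rank the videos similarly but differ systematically in how many errors they count, which lowers an absolute-agreement ICC; a consistency-type ICC would ignore such rater offsets and come out higher.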

Among the 23 athletes enrolled in year 1 of the study, 1 sustained two concussions during the first year, and 2 additional athletes each sustained one concussion. During year 2 of the study, 1 athlete who had sustained a concussion in the previous year was diagnosed with another concussion, and 1 additional athlete sustained one concussion. Among those enrolled in year 2, 3 athletes sustained one concussion each, and 1 athlete sustained two concussions.

Discussion

The baseline BESS score is commonly used on the sideline and in the athletic training room as part of the clinical assessment of concussion. We report a statistically and clinically significant improvement in BESS scores during the first year of participation in an NCAA Division-I football program and continued improvement, though not reaching statistical significance, during year 2 in the program. Given that an athlete's expected performance on the BESS improves over time, the test becomes a progressively insensitive marker for concussion, amounting to a moving target when evaluating athletes for concussion. A sideline evaluator could assess a concussed athlete who produces a better BESS score than the athlete's documented baseline, which may have been collected weeks or months before the concussion evaluation. How often an athlete's baseline BESS score should be reassessed needs to be established and remains a focus of future research.

The Sport Concussion Assessment Tool – 2nd Edition (SCAT2) utilized a modified BESS test (testing on the firm surface only); a past study on normative data in collegiate athletes reports modified BESS scores.⁷ The Sport Concussion Assessment Tool – 3rd Edition (SCAT3) and The Sport Concussion Assessment Tool – 5th Edition (SCAT5) offer the option of adding the foam surface for further assessment of balance, which improves the sensitivity of balance testing.^2,8 This study shows a statistically significant decrease in the number of errors made in the cumulative score, on the firm surface, and on the foam surface after 1 year of participation. Additionally, between the end of year 1 of participation and the end of year 2, further improvement (a decreased number of errors) was noted, although it fell short of statistical significance. The clinically meaningful change of the BESS test is unknown. However, it is safe to assert that if an athlete is improving on the test over time, the sensitivity of the test worsens. Inter-rater and intrarater reliability in this study are consistent with the reliability between experienced scorers as reported by Finnoff and colleagues.⁶

In this study, the number of cumulative errors at baseline was higher than previously published normative data in adults.⁹ The reason for the higher number of errors compared to published normative data is uncertain, although several factors may contribute. These factors may include the different cohorts studied and variability in intrarater and inter-rater reliability in the studies. In addition, the athletes at matriculation are transitioning from high school to college. High school male athletes make more errors on the BESS than college-age male athletes.¹⁰

It is important to remember that balance testing is only one aspect of the assessment that clinicians may utilize when evaluating athletes for concussion. It is not possible to definitively determine whether athletes purposefully perform poorly on baseline balance testing for secondary gain (e.g., avoiding a diagnosis of concussion during the athletic season). Thus, patterns of performance must be interpreted by the clinician.¹¹ The athletes in this study were not advised as to the purpose of the baseline BESS testing, and no other objective evidence of “sandbagging” was appreciated by the authors.

Limitations of this study include a relatively small sample size and moderate inter-rater reliability. There are also previously studied limitations on BESS scores, including individual variables such as age of the athlete,⁵ learning effects of repeat testing,^5,12,13 and level of fatigue of the athlete.¹⁴ Eight subjects sustained concussions during the study. It is possible that the concussion itself, increased exposure to the BESS test, or proprioceptive training associated with participation in the football program affected the results. However, our study is not sufficiently powered to detect such effects. Further investigation is warranted to evaluate the effect of additional exposure to the BESS, diagnosed concussions, and the proprioceptive training associated with football program participation. These considerations are part of the authors' ongoing data collection and analysis.

The authors also aim to evaluate additional factors that may impact longitudinal changes in BESS scores, including the effects of multiple seasons of Division-I football participation, player position, player demographics, and concussion history. Further understanding of how BESS performance changes over time will clarify the need to modify the BESS test and/or alter the frequency with which repeat baseline testing is performed.

In conclusion, this study shows a statistically and clinically significant improvement after 1 year of participation and continued improvement (though not reaching statistical significance) in BESS scores between the first year and second year of participation in an NCAA Division-I football program. In addition, the cumulative number of errors in this cohort was higher than previously published norms. Therefore, NCAA Division-I football athletes should undergo baseline BESS testing at least yearly if the BESS is to be utilized as a diagnostic test for concussion.

Footnotes

Acknowledgments

The authors thank the student-athletes for participating in this study. The authors also thank Matt Doyle, MS, LAT, ATC; Angela Frady, LAT, ATC, OTC; Johnnie James, MS, LAT, ATC; Megan Probasco, MA, LAT, ATC; Jenna Nagle, LAT, ATC; Jessica Novack, LAT, ATC; Nicholas Rains, LAT, ATC; Doug West, PhD, LAT, ATC, CSCS; Barney Graff, MS, ATC; and Russell Haynes, MS, LAT, ATC, for assistance with data collection.

Author Disclosure Statement

No competing financial interests exist.

References

1. McCrory, Meeuwisse, Aubry, Cantu, Dvořák, Echemendia, Engebretsen, Johnston, Kutcher, Raftery, Sills, Benson, Davis, Ellenbogen, Guskiewicz, Herring S.A., Iverson, Jordan, Kissick, McCrea, McIntosh, Maddocks, Makdissi, Purcell, Putukian, Schneider, Tator, and Turner (2013). Consensus statement on concussion in sport—the 4th International Conference on Concussion in Sport held in Zurich, November 2012. Clin. J. Sport Med., 23, 89–117.

2. Guskiewicz K.M., Register-Mihalik, McCrory, McCrea, Johnston, Makdissi, Dvorák, Davis, and Meeuwisse (2013). Evidence-based approach to revisiting the SCAT2: introducing the SCAT3. Br. J. Sports Med., 47, 289–293.

3. Guskiewicz K.M. (2001). Postural stability assessment following concussion: one piece of the puzzle. Clin. J. Sport Med., 11, 182–189.

4. Burk J.M., Munkasy B.A., Joyner A.B., and Buckley T.A. (2013). Balance error scoring system performance changes after a competitive athletic season. Clin. J. Sport Med., 23, 312–317.

5. Mulligan I.J., Boland M.A., and McIlhenny C.V. (2013). The balance error scoring system learned response among young adults. Sports Health, 5, 22–26.

6. Finnoff J.T., Peterson V.J., Hollman J.H., and Smith (2009). Intrarater and interrater reliability of the Balance Error Scoring System (BESS). PM R, 1, 50–54.

7. Zimmer, Marcinak, Hibyan, and Webbe (2015). Normative values of major SCAT2 and SCAT3 components for a college athlete population. Appl. Neuropsychol. Adult, 22, 132–140.

8. Echemendia R.J., Meeuwisse, McCrory, Davis G.A., Putukian, Leddy, Makdissi, Sullivan S.J., Broglio S.P., Raftery, Schneider, Kissick, McCrea, Dvorak, Sills A.K., Aubry, Engebretsen, Loosemore, Fuller, Kutcher, Ellenbogen, Guskiewicz, Patricios, and Herring (2017). The Sport Concussion Assessment Tool 5th Edition (SCAT5): Background and rationale. Br. J. Sports Med., 51, 848–850.

9. Iverson G.L., and Koehle M.S. (2013). Normative data for the balance error scoring system in adults. Rehabil. Res. Pract., 2013, 846418.

10. Covassin, Elbin R.J., Harris, Parker, and Kontos (2012). The role of age and sex in symptoms, neurocognitive performance, and postural stability in athletes after concussion. Am. J. Sports Med., 40, 1303–1312.

11. Schatz, and Glatts (2013). “Sandbagging” baseline test performance on ImPACT, without detection, is more difficult than it appears. Arch. Clin. Neuropsychol., 28, 236–244.

12. Valovich T.C., Perrin D.H., and Gansneder B.M. (2003). Repeat Administration Elicits a Practice Effect With the Balance Error Scoring System but Not With the Standardized Assessment of Concussion in High School Athletes. J. Athl. Train., 38, 51–56.

13. Valovich McLeod T.C., Perrin D.H., Guskiewicz K.M., Shultz S.J., Diamond, and Gansneder B.M. (2004). Serial administration of clinical concussion assessments and learning effects in healthy young athletes. Clin. J. Sport Med., 14, 287–295.

14. Wilkins J.C., Valovich McLeod T.C., Perrin D.H., and Gansneder B.M. (2004). Performance on the Balance Error Scoring System Decreases After Fatigue. J. Athl. Train., 39, 156–161.