Abstract
Kutcher and Giza suggested incorporating levels of certainty in concussion diagnosis decisions. These guidelines were based on clinical experience rather than objective data. Therefore, we combined data-driven optimization with predictive modeling to identify which athletes are unlikely to have concussion and to classify remaining athletes as having possible, probable, or definite concussion with diagnostic certainty. We developed and validated our framework using data from the Concussion Assessment, Research, and Education (CARE) Consortium. Acute concussions had assessments at <6 h (n = 1085) and 24–48 h post-injury (n = 1413). Normal performances consisted of assessments at baseline (n = 1635) and the time of unrestricted return to play (n = 1345). We evaluated the distribution of acute concussions and normal performances across risk categories and identified inter-class and intra-class differences in demographics, time-of-injury characteristics, the Standard Assessment of Concussion (SAC), Sport Concussion Assessment Tool (SCAT) symptom assessments, and Balance Error Scoring System (BESS). Our algorithm accurately classified concussions as probable or definite (sensitivity = 91.07–97.40%). Definite and probable concussions had higher SCAT symptom scores than unlikely and possible concussions (p < 0.05). Definite concussions had lower SAC and higher BESS scores (p < 0.05). Baseline to post-injury change scores for the SAC, SCAT symptoms, and BESS were significantly different between acute possible and probable concussions and normal performances (p < 0.05). There were no consistent patterns in demographics across risk categories, although a greater proportion of concussions classified as unlikely were reported immediately compared with definite concussions (p < 0.05). Although clinical interpretation is still needed, our data-driven approach to concussion risk stratification provides a promising step toward evidence-based concussion assessment.
Introduction
Concussion, a type of traumatic brain injury, is an important public health issue that has been associated with potential long-term health consequences. 1 Accurate diagnosis and proper post-injury management of concussion are pragmatic steps for mitigating possible consequences. 2 However, the clinical diagnosis of concussion is challenging for many reasons, 3,4 especially in the sporting environment, which requires rapid injury assessment and injury management decisions.
Currently, no single diagnostic marker or clinical assessment can identify concussion with perfect accuracy. Existing guidelines therefore call for the evaluation of concussion along multiple domains, including symptom presentation, neurocognitive status, and a physical examination. 5–7 Previous studies analyzing such multi-dimensional testing batteries for acute concussion assessment have found that symptom evaluations were the most sensitive component of these batteries. 8–12 However, an over-reliance on symptom presentation for concussion diagnosis is troublesome for several reasons. First, symptoms may be under-reported or go completely unrecognized. 13–15 Second, common symptoms (e.g., headache, dizziness, fatigue) that are indicative of concussion are not necessarily specific to concussion. Finally, rapid changes in symptom and neurocognitive presentation within the acute stages of concussion result in highly variable clinical presentation across patients. 16,17
To address these challenges, Kutcher and Giza recommended incorporating diagnostic certainty into the assessment of concussion. 4 That is, rather than a binary diagnosis paradigm (i.e., concussion or no concussion), Kutcher and Giza suggested that concussion diagnosis should be relayed across a spectrum of risk categories (e.g., possible, probable, and definite concussion), with each category reflecting the degree to which a concussion diagnosis is certain. Similar risk-based categories have been used for classifying diagnosis decisions for other diseases, including multiple sclerosis, 18 Alzheimer's disease, 19 and diabetes. 20 Compared with traditional binary diagnosis, risk-based diagnosis frameworks account for the evolution of the injury over time and allow for more flexibility in the post-injury management of concussion. Specifically, incorporating certainty in the assessment of concussion can help to determine whether an athlete should be managed as if he or she had a concussion, ultimately improving the quality of patient care. However, the guidelines developed by Kutcher and Giza were based on clinical experience rather than objective data.
Therefore, the goal of our study is to create a data-driven modeling framework to identify concussed and non-concussed athletes who are unlikely to have a concussion and classify the remaining athletes as having possible, probable, or definite concussion, with each category reflecting increasing diagnostic certainty. Although experienced clinicians may be able to quickly synthesize the likelihood of concussion and ultimately identify a post-injury management plan for athletes, our data-driven framework provides a more objective approach, which can ultimately benefit those clinicians who may be inexperienced in managing concussion. We then aim to validate our framework to identify how athletes are distributed across each risk classification and identify differences in demographics, time-of-injury characteristics, and standard assessment scores among athletes under each risk classification.
Methods
Study population and design
To develop our methodology for classifying athletes as having unlikely, possible, probable, or definite concussion, we used data from the Concussion Assessment, Research, and Education (CARE) Consortium. 21 The CARE Consortium defines concussion as “a change in brain function following a force to the head, which may be accompanied by temporary loss of consciousness, but is identified in awake individuals with measures of neurologic and cognitive dysfunction.” 22 These acquired data contain 33,271 player-seasons collected during the 2014–2018 academic years from 29 National Collegiate Athletic Association (NCAA) universities and military service academies. Player-season data were collected across male (57.58%) and female (42.42%) participants from 27 sports, including 19.8% from football, 12.1% from cross country/track, and 9.6% from soccer. The data include 24,561 athletes with pre-season baseline evaluations and 1950 concussions across 1755 athletes. For student-athletes (hereafter referred to simply as athletes) who were diagnosed with concussion by the local institution's medical staff (e.g., team physicians and athletic trainers), additional post-injury data were collected within 6 h of the injury (<6 h), 24–48 h post-injury (24–48 h), when he or she was identified as asymptomatic, when he or she was cleared for unrestricted return to play (RTP), and 6 months post-RTP. We note that some athletes may not have completed a post-injury assessment at every time point. Therefore, there is some missingness in the data, leading to unequal sample sizes across study cohorts. However, these imbalances are not large enough to significantly affect the methodologies employed in this research. All participants provided written consent that was approved by their local institutional review board and the United States Army Human Research Protection Office.
Sample selection
In our analysis, we focused on the time points at baseline, <6 h, 24–48 h, and unrestricted RTP. We only included baseline data that could be matched with post-injury data. The assessments at <6 h and 24–48 h were denoted “acute concussion” and those from baseline and unrestricted RTP were denoted “normal performance.” We consider those from the unrestricted RTP time point to demonstrate a normal performance because the subjects had been cleared for RTP by each institution's local medical staff. We analyzed data from <6 h and 24–48 h separately.
Study variables
For each participant in the study data, we obtained demographic information along with raw scores on the Standard Assessment of Concussion (SAC), Sport Concussion Assessment Tool (SCAT) symptom survey, and the Balance Error Scoring System (BESS) at baseline. For those diagnosed with concussion, we obtained time-of-injury characteristics along with raw scores for SAC, SCAT symptoms, and BESS scores at each post-injury evaluation time point. We computed the change score for these athletes by subtracting the raw score at baseline from the raw score at each post-injury time point. A positive change score indicated an increase in the measure compared with baseline, whereas a negative change score indicated a decrease compared with baseline. We also filled missing data elements using multiple imputation by chained equations. 23 We describe our study variables in more detail subsequently.
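As a minimal sketch of the change-score definition above (post-injury raw score minus baseline raw score), the following Python snippet applies it to one athlete. The athlete's values and the dictionary keys are hypothetical and chosen only for illustration; they are not taken from the CARE data.

```python
def change_score(post_injury_raw, baseline_raw):
    """Change score = raw score at the post-injury time point minus raw score at baseline."""
    return post_injury_raw - baseline_raw


# Hypothetical athlete; these numbers are invented for illustration only.
baseline = {"SAC": 28, "SCAT_severity": 2, "SCAT_symptoms": 1, "BESS": 10}
post_24_48h = {"SAC": 25, "SCAT_severity": 20, "SCAT_symptoms": 9, "BESS": 15}

# One change score per measure, keyed by the same (hypothetical) names.
changes = {name: change_score(post_24_48h[name], baseline[name]) for name in baseline}
print(changes)
```

In this illustration the SAC change score is negative (a lower total than at baseline), whereas the symptom and BESS change scores are positive (more symptoms and more balance errors than at baseline).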
Demographic information
We aimed to identify differences in each risk classification by age, sex, and the number of previous concussions. Previous studies have suggested that younger athletes, females, and those with greater concussion history are at increased risk for concussion. 24–29
Time-of-injury characteristics
In our analysis, we included whether the athlete experienced loss of consciousness (LOC), post-traumatic amnesia (PTA), or retrograde amnesia (RGA); whether the athlete was removed from play immediately; and whether the injury was reported immediately, as these variables have been suggested to impact concussion risk. 30–33
SAC
The SAC is a neurocognitive assessment that measures orientation, immediate memory, concentration, and delayed recall. 34 In our analysis, we focused on the SAC total score and change score, both of which summarize the SAC assessment.
SCAT symptom survey
Symptom presentation has been shown, in numerous studies, to be highly associated with acute concussion. 8–12 The SCAT symptom evaluation includes 22 symptoms, each of which is rated on a scale of 0–6 based on severity. 35 In our analysis, we included the total symptom severity and the total number of symptoms, in addition to their respective change scores.
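The two SCAT summary measures can be sketched as follows, assuming the usual scoring in which the total number of symptoms counts the items rated above 0 and the total severity sums all 22 ratings. The function name and the example ratings are hypothetical.

```python
def scat_symptom_scores(ratings):
    """Return (total number of symptoms, total symptom severity) for 22 ratings of 0-6."""
    if len(ratings) != 22:
        raise ValueError("the SCAT symptom evaluation has 22 items")
    if not all(isinstance(r, int) and 0 <= r <= 6 for r in ratings):
        raise ValueError("each symptom is rated on an integer scale of 0-6")
    total_symptoms = sum(1 for r in ratings if r > 0)  # items endorsed at all
    total_severity = sum(ratings)                      # sum of all ratings
    return total_symptoms, total_severity


# Hypothetical response: four symptoms endorsed, the other 18 rated 0.
example_ratings = [2, 2, 1, 1] + [0] * 18
print(scat_symptom_scores(example_ratings))
```

Under this scoring, the total number of symptoms ranges from 0 to 22 and the total severity from 0 to 132.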
BESS
The BESS is a physical examination that measures postural stability by assessing the number of “movement errors” committed by an athlete while attempting to hold different stances. 36 Balance has been noted to be affected by concussion, and we included the BESS total score (across all six stances) and change score in our analysis.
Data analysis
Our overall framework for classifying unlikely, possible, probable, or definite concussion is summarized in Figure 1. To create and evaluate our models, we divided our post-injury data into a training set and a validation set. The training set consisted of all data collected between January 23, 2014 and November 29, 2016, whereas the validation data consisted of all data collected after that date (i.e., November 30, 2016 to October 2, 2017). The CARE Consortium protocol for concussion diagnosis along with assessments performed at baseline and post-injury remained unchanged during this period and, therefore, rater drift, if there was any, was minimal. We used our training set to develop the models to determine which athletes should be classified under each risk category. Then, we applied our models to the validation data to evaluate and analyze our framework. We describe each of the steps in our methodology in more detail subsequently.
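The date-based split described above can be sketched in a few lines of Python. The cutoff date is taken from the text; the record layout and the `assessment_date` field name are assumptions made for this illustration.

```python
from datetime import date

# Last date assigned to the training set, as stated in the text.
TRAIN_END = date(2016, 11, 29)


def temporal_split(records, date_key="assessment_date"):
    """Split records into (training, validation) by assessment date."""
    training = [r for r in records if r[date_key] <= TRAIN_END]
    validation = [r for r in records if r[date_key] > TRAIN_END]
    return training, validation


# Hypothetical records; identifiers are invented for illustration.
records = [
    {"athlete_id": "A1", "assessment_date": date(2014, 1, 23)},
    {"athlete_id": "A2", "assessment_date": date(2016, 11, 29)},
    {"athlete_id": "A3", "assessment_date": date(2016, 11, 30)},
    {"athlete_id": "A4", "assessment_date": date(2017, 10, 2)},
]
training, validation = temporal_split(records)
print([r["athlete_id"] for r in training], [r["athlete_id"] for r in validation])
```

Splitting by date rather than at random means the validation set contains only assessments made after every training assessment, which mimics applying the models prospectively.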

Illustration of methodological framework for developing data-driven models that were used to classify athletes as having unlikely, possible, probable, or definite concussion based on certainty of acute concussion. CART, classification and regression tree; ADASYN, adaptive synthetic sampling.
Model calibration
Using a randomly chosen 40% of the training data, we created a logistic regression model for estimating risk scores. For any athlete, the risk score is a scalar between 0 and 1, where greater risk scores indicate higher likelihood of acute concussion. We used a previously published and validated multivariate logistic regression model (i.e., the raw score model) to estimate risk scores associated with athletes at <6 h and 24–48 h. 12 We used the raw score models because change scores could not be computed for baseline data and may not always be available for acute concussion assessments in clinical settings. Because time-of-injury characteristics were not available for baselines but were part of these previous models, we assumed, for this logistic regression analysis only, that for baseline data injuries were reported immediately and participants were removed from play immediately.
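To make the risk score concrete, the sketch below shows how a logistic regression model maps assessment values to a number between 0 and 1. The coefficient values, intercept, and feature names are hypothetical placeholders and are not the fitted values of the published raw score model.

```python
import math


def risk_score(features, coefficients, intercept):
    """Logistic model: sigmoid of the intercept plus a weighted sum of the features."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# HYPOTHETICAL coefficients, for illustration only (not the published model).
coefficients = {
    "reported_immediately": -0.5,
    "removed_immediately": 0.4,
    "sac_raw": -0.05,
    "scat_severity_raw": 0.15,
    "bess_raw": 0.02,
}
intercept = -1.0

# A baseline assessment: as in the text, the two time-of-injury indicators
# are assumed to equal 1 because they are not observed at baseline.
baseline_features = {
    "reported_immediately": 1,
    "removed_immediately": 1,
    "sac_raw": 28,
    "scat_severity_raw": 2,
    "bess_raw": 10,
}
print(risk_score(baseline_features, coefficients, intercept))
```

With a positive coefficient on symptom severity, as assumed here, raising the severity while holding the other inputs fixed raises the risk score.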
With the remaining 60% of the training data, we determined risk score thresholds to identify unlikely and definite concussions. We first applied our logistic regression models to this subset of training data to obtain risk scores for each athlete. Then, we used these risk scores as the input for a previously developed data-driven optimization algorithm to determine risk score thresholds (Garcia, G.-G.P., Lavieri, M.S., Jiang, R., McCrea, M., McAllister, T.W., Broglio, S.P., and CARE Consortium Investigators. Data-driven stochastic optimization approaches to determine decision boundaries for medical diagnosis. Submitted for publication). This algorithm identifies an upper and lower risk threshold by maximizing sensitivity and specificity while limiting false-positive and false-negative rates. Athletes with risk scores below the lower threshold represent those whose concussion probability was low, and who therefore would be identified as having unlikely concussion. Similarly, athletes with risk scores above the upper threshold are most likely to have a concussion and would be classified as having a definite concussion. In designing these thresholds, we favored higher sensitivity over lower false-positive rates and lower false-negative rates over higher specificity.
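The optimization algorithm itself is not reproduced here. As a deliberately simplified stand-in, and not the authors' stochastic optimization method, the sketch below picks two thresholds by exhaustive search over observed scores. The lower threshold is pushed as high as possible while the share of concussions falling below it stays within a tolerated false-negative rate. The upper threshold is pushed as low as possible while the share of normal performances above it stays within a tolerated false-positive rate. The function name, tolerances, and scores are all hypothetical.

```python
def choose_thresholds(concussed_scores, normal_scores, max_fnr, max_fpr):
    """Simplified threshold search (a stand-in, NOT the published algorithm)."""
    candidates = sorted(set(concussed_scores) | set(normal_scores))

    def fnr_below(t):
        # Share of concussions that would be labelled "unlikely" (score < t).
        return sum(s < t for s in concussed_scores) / len(concussed_scores)

    def fpr_above(t):
        # Share of normal performances that would be labelled "definite" (score > t).
        return sum(s > t for s in normal_scores) / len(normal_scores)

    lower = max(t for t in candidates if fnr_below(t) <= max_fnr)
    upper = min(t for t in candidates if fpr_above(t) <= max_fpr)
    if lower > upper:
        # Nearly separable classes: collapse to a single cut point, which
        # still satisfies both rate limits.
        lower = upper = (lower + upper) / 2
    return lower, upper


# Hypothetical risk scores, invented for illustration.
concussed = [0.2, 0.5, 0.6, 0.8, 0.9]
normal = [0.05, 0.1, 0.15, 0.3, 0.55]
print(choose_thresholds(concussed, normal, max_fnr=0.0, max_fpr=0.0))
```

Scores between the two returned thresholds are the ambiguous cases that a second-stage model would then need to sort into possible and probable.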
After determining the upper and lower risk score thresholds, we identified athletes in the training set with risk scores between the unlikely and definite thresholds and used them to determine how athletes should be classified as possible or probable, as these cases could not be easily distinguished by our logistic regression model. We categorized these cases using a classification and regression tree (CART) 37 analysis. CART is a non-parametric statistical modeling technique that produces a decision tree for prediction and is capable of handling categorical variables and continuous variables. Compared with other predictive modeling methods (e.g., generalized linear models), CART is advantageous in its interpretability and ability to model highly non-linear relationships between variables. Because of the higher proportion of normal performances to acute concussions in our data, we applied adaptive synthetic sampling (ADASYN) to mitigate data imbalance issues before creating a CART. 38 Additionally, for this CART, we restricted the resulting decision tree to include only variables that were available for all time points. That is, the resulting decision tree did not include time-of-injury characteristics and change scores for the SAC, SCAT symptom assessments, and the BESS, as they were not available for baseline data. Athletes who were predicted to have acute concussions by this CART were classified as having probable concussions, whereas those who were predicted to be normal performances were classified as having possible concussions.
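To illustrate the core step of CART for a binary outcome, the sketch below searches one variable for the cut point that minimizes the weighted Gini impurity of the two resulting groups. It covers a single split only. Recursive tree growth, pruning, and the ADASYN rebalancing step are omitted, and the data are hypothetical (1 = acute concussion, 0 = normal performance).

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (cut point, weighted impurity) of the best single split on one variable."""
    pairs = list(zip(values, labels))
    best_cut, best_impurity = None, gini(labels)  # start from "no split"
    distinct = sorted(set(values))
    # Candidate cut points are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_cut, best_impurity = cut, impurity
    return best_cut, best_impurity


# Hypothetical symptom severity scores and outcomes, invented for illustration.
severity = [0, 1, 2, 3, 8, 10, 15, 20]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_split(severity, outcome))
```

A full tree repeats this search over every candidate variable within each resulting group until a stopping rule is met, which yields the kind of readable threshold rules described above.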
Model validation
To implement our models, we applied our logistic regression models to the validation data to obtain risk scores for each athlete. Then, we compared these risk scores to the upper and lower thresholds we generated using our optimization algorithm in the model calibration phase. Athletes with risk scores below the lower threshold were classified as having unlikely concussions, whereas athletes with risk scores above the upper threshold were classified as having definite concussions. We then applied our CART to any athlete with a risk score between these thresholds to classify them as having a possible or probable concussion.
Model evaluation
After implementing our models on the validation data, we performed additional analysis to evaluate the performance of our classification framework. The goals of this analysis were to (1) analyze how our models classified acute concussions and normal performance throughout each risk category and (2) identify inter-class differences (i.e., differences across different risk classifications) and intra-class differences (i.e., differences within the same risk classification) in demographics, time-of-injury characteristics, and standard assessment scores for acute concussions and normal performances among the risk classifications.
To achieve the first goal, we determined the percentage of acute concussions and normal performances within each risk classification at both <6 h and 24–48 h. Ideally, data captured in the acute post-injury state should place the athlete in greater risk classifications (i.e., definite or probable), whereas data captured at baseline should place the athlete in lower risk classifications (i.e., unlikely or possible). We compared the distribution of acute concussions and normal performances using the Kolmogorov–Smirnov test. A significant p value for this test implies that the distribution of acute concussions and normal performances among the risk classifications is dissimilar.
Because our diagnosis scheme consisted of four risk categories instead of two, we also computed a modified sensitivity and specificity. Our modified computation was founded in recommendations by Kutcher and Giza who indicated that probable and definite concussions should be managed as concussions, whereas possible concussions should be managed based on clinical judgment. Further, we assume that unlikely concussions are managed as non-concussions. Therefore, we provide a sensitivity range where the lower bound reflects the proportion of acute concussions that are correctly classified as probable and definite, and the upper bound reflects the proportion of acute concussions correctly classified as possible, probable, and definite. We also provide a range for specificity, where the lower bound reflects a situation in which no patients with possible concussions are treated as non-concussed, and where the upper bound reflects a situation in which all patients with possible concussions are treated as non-concussed. In practice, the true sensitivity and specificity should fall between these bounds, depending on how possible concussions are managed.
To achieve the second goal, we first identified inter-class differences in the study variables across each risk classification for acute concussions and normal performances using analysis of variance (ANOVA) tests with Tukey's post-hoc comparisons. For example, we determined if athletes with acute concussions who were classified as having a probable concussion had any differences in SAC, SCAT symptoms, or BESS compared with those with acute concussions who were classified as having definite concussions. Next, using Student's t test, we identified intra-class differences in the study variables between acute concussions and normal performances within each risk classification. All models were created and analyzed using Python 3.5.2 (Python Software Foundation, Beaverton, OR).
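For readers less familiar with the intra-class comparison, the sketch below computes the pooled-variance (Student's) two-sample t statistic and its degrees of freedom using only the Python standard library. The two samples are hypothetical. Converting the statistic to a p value requires the t distribution, which is left to a statistics library.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical scores for two groups within one risk classification.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t_stat, dof = student_t(group_a, group_b)
print(t_stat, dof)
```

The pooled form assumes both groups share a common variance; when that assumption is doubtful, Welch's unequal-variance version is the usual alternative.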
Results
In Table 1, we summarize the study data at each time point with respect to the study variables. Across all time points, there were significant differences between training and validation data in height (p = 0.0082–0.047), weight (p = 0.012–0.047), and number of previous concussions (p < 0.001 for all). There were also significant differences in age at baseline (p = 0.0012) and 24–48 h (p = 0.021) and the proportion of males with unrestricted RTP (p = 0.013). Among post-injury assessments, there were significant differences between SAC raw scores at baseline (p = 0.00085), <6 h (p = 0.038), and at unrestricted RTP (p = 0.0038), SCAT total symptoms raw score at unrestricted RTP (p = 0.027), and BESS raw score at <6 h (p = 0.048) and 24–48 h (p = 0.00098).
Data Characteristics of Training and Validation Set with Respect to Each Time Point
Change score at a time point is computed as: raw score at time point - raw score at baseline.
Significantly different from validation data at same time point based on Student's t test (p < 0.01).
Significantly different from validation data at same time point based on Student's t test (p < 0.05).
The variable was not available for baseline data.
RTP, return to play; LOC, loss of consciousness; PTA, post-traumatic amnesia; RGA, retrograde amnesia; SAC, Standard Assessment of Concussion; SCAT, Sport Concussion Assessment Tool; BESS, Balance Error Scoring System; NA, variable not available for baseline data.
Multivariate logistic regression
The model variables and corresponding coefficient values for the multivariate logistic regression models at <6 h and 24–48 h are shown in Table 2. At <6 h, all variables were significant except for whether the injury was reported immediately (p = 0.16), SAC raw score (p = 0.13), and BESS raw score (p = 0.080). At 24–48 h, all variables were significant except for SAC raw score (p = 0.23) and BESS raw score (p = 0.94).
Multivariate Logistic Regression Coefficients at <6 h and 24–48 h
SE, standard error; SAC, Standard Assessment of Concussion; SCAT, Sport Concussion Assessment Tool; BESS, Balance Error Scoring System; NA, variable not included in this model.
Classifying unlikely, possible, probable, and definite concussion
We obtained risk score thresholds after applying the training data to our optimization algorithm. At <6 h, the lower threshold was 0.047 and the upper threshold was 0.33. At 24–48 h, the lower threshold was 0.07 and the upper threshold was 0.46. The CARTs we developed for <6 h and 24–48 h are shown in Figure 2.

Classification tree for determining possible and probable concussions at <6 h
We now provide an example to illustrate how these risk thresholds and CART can be used to determine whether an athlete should be classified as having an unlikely, possible, probable, or definite concussion.
Consider the case of a 19-year-old female athlete who is being assessed for acute concussion 24–48 h after injury. She did not report the injury immediately and, in her post-injury assessments, obtained total scores of 30 and 12 on the SAC and BESS, respectively. On the SCAT symptom assessment, she reported four total symptoms with a total severity of 6. Using the logistic regression model for 24–48 h, her risk score is equal to 0.36. Because her risk score is less than the upper threshold of 0.46 and greater than the lower threshold of 0.07 at 24–48 h, she is not classified as having a definite or unlikely concussion. To determine if she has a possible or probable concussion, one would refer to the CART for 24–48 h. Because her SCAT symptom severity raw score is not 0, her SAC raw score is >25, and her SCAT total symptoms raw score is >1, she would be classified as having a probable concussion.
To provide an additional example, consider the case of a 21-year-old male athlete who was assessed for concussion within 6 h of a suspected injury. His injury was not reported immediately and he was not removed from play immediately. His SAC raw score and BESS raw score were 24 and 12, respectively. He also reported one symptom with a severity of 1. Based on these values, his risk score is equal to 0.22. Because his risk estimate is between the lower and upper thresholds of 0.047 and 0.33 at <6 h, respectively, then he must either have a possible or a probable concussion. Because his SCAT symptom severity raw score is ≤4, the CART analysis at <6 h would classify him as having a possible concussion.
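The decision logic in these two examples can be summarized in a short Python sketch. The thresholds are those reported above. The CART outcome is passed in as a boolean because the full trees appear only in Figure 2 and are not reproduced here. The function and parameter names are hypothetical.

```python
# Lower and upper risk score thresholds reported in the text.
THRESHOLDS = {"<6 h": (0.047, 0.33), "24-48 h": (0.07, 0.46)}


def classify(risk, time_point, cart_predicts_concussion=None):
    """Map a risk score (and, if needed, the CART prediction) to a risk category."""
    lower, upper = THRESHOLDS[time_point]
    if risk < lower:
        return "unlikely"
    if risk > upper:
        return "definite"
    if cart_predicts_concussion is None:
        raise ValueError("a CART prediction is needed between the thresholds")
    return "probable" if cart_predicts_concussion else "possible"


# First example: risk score 0.36 at 24-48 h; the CART predicts acute concussion.
print(classify(0.36, "24-48 h", cart_predicts_concussion=True))
# Second example: risk score 0.22 at <6 h; the CART predicts normal performance.
print(classify(0.22, "<6 h", cart_predicts_concussion=False))
```

The CART is consulted only when the risk score falls between the two thresholds, so scores outside that band are classified from the logistic regression output alone.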
Distribution of acute concussions and normal performances
The distribution of acute concussions and normal performances within each risk classification is shown in Table 3. At <6 h, 434 (80.52%) of acute concussions were classified as definite concussion whereas only 14 (2.60%) were classified as unlikely concussion. Among the remaining acute concussions, 31 (5.75%) were classified as possible concussion and 60 (11.13%) were classified as probable concussion. When the <6 h algorithm was applied to normal performance data (i.e., baseline and unrestricted RTP), 696 (46.00%), 526 (34.77%), 189 (12.49%), and 102 (6.74%) were classified as unlikely, possible, probable, and definite concussion respectively. With the 24–48 h algorithm, 522 (75.22%) acute concussions were classified as definite concussion whereas 21 (3.03%) were classified as unlikely concussion. There were 41 (5.91%) and 110 (15.85%) acute concussions classified as possible and probable concussion, respectively. Among the normal performances, 714 (47.19%), 397 (26.24%), 309 (20.42%), and 95 (6.15%) were classified as unlikely, possible, probable, and definite concussion, respectively. With both <6 h and 24–48 h algorithms, the distributions among risk classifications were different between acute concussions and normal performances based on the Kolmogorov–Smirnov test (p < 0.001). Additionally, the distribution of baselines and unrestricted RTP across the risk classifications was also significantly different at both <6 h and 24–48 h (p < 0.001).
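As a rough illustration of what the Kolmogorov–Smirnov comparison measures, the sketch below computes the two-sample statistic, that is, the largest gap between the two cumulative distributions over the four ordered risk categories. It uses the <6 h counts reported above. It does not compute a p value, and because the categories produce heavy ties, it should be read as a descriptive approximation rather than a reproduction of the reported test.

```python
from itertools import accumulate


def ks_statistic(counts_a, counts_b):
    """Largest absolute gap between two cumulative distributions over ordered categories."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    cdf_a = [c / total_a for c in accumulate(counts_a)]
    cdf_b = [c / total_b for c in accumulate(counts_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Counts at <6 h in the order unlikely, possible, probable, definite (from the text).
acute_counts = [14, 31, 60, 434]
normal_counts = [696, 526, 189, 102]
d_stat = ks_statistic(acute_counts, normal_counts)
print(round(d_stat, 4))
```

The largest gap occurs after the probable category: about 93% of normal performances but only about 19% of acute concussions fall in the unlikely, possible, or probable categories.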
Distribution of Acute Concussions and Normal Performances among Risk Classifications at <6 h and 24–48 h
Distributions of acute concussions and normal performances within risk classifications are significantly different at p < 0.001 using Kolmogorov–Smirnov test.
Distributions of unrestricted return to play (RTP) and baseline time points within risk classifications are significantly different at p < 0.01 using Kolmogorov–Smirnov test.
Using our modified calculation for sensitivity and specificity, we obtained a sensitivity range of 91.65–97.40% with the <6 h algorithm and 91.06–97.00% with the 24–48 h algorithm. We also obtained a specificity range of 46.00–80.77% with the <6 h algorithm and 47.19–73.43% with the 24–48 h algorithm, respectively.
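The modified sensitivity and specificity ranges can be recomputed from the counts in each risk classification. The sketch below does so for the <6 h algorithm using the counts reported above. The function name and dictionary layout are assumptions made for this illustration.

```python
def modified_ranges(acute, normal):
    """Return ((sens_low, sens_high), (spec_low, spec_high)) as proportions."""
    n_acute, n_normal = sum(acute.values()), sum(normal.values())
    # Lower bound: only probable and definite are managed as concussions.
    sens_low = (acute["probable"] + acute["definite"]) / n_acute
    # Upper bound: possible concussions are also managed as concussions.
    sens_high = (acute["possible"] + acute["probable"] + acute["definite"]) / n_acute
    # Lower bound: only unlikely is managed as non-concussed.
    spec_low = normal["unlikely"] / n_normal
    # Upper bound: possible concussions are also managed as non-concussed.
    spec_high = (normal["unlikely"] + normal["possible"]) / n_normal
    return (sens_low, sens_high), (spec_low, spec_high)


# Counts at <6 h, taken from the text.
acute_lt6h = {"unlikely": 14, "possible": 31, "probable": 60, "definite": 434}
normal_lt6h = {"unlikely": 696, "possible": 526, "probable": 189, "definite": 102}

sensitivity, specificity = modified_ranges(acute_lt6h, normal_lt6h)
print([round(100 * v, 2) for v in sensitivity])
print([round(100 * v, 2) for v in specificity])
```

The two bounds cannot be attained together: managing every possible concussion as a concussion yields the upper sensitivity bound and the lower specificity bound, and the reverse choice yields the opposite pair.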
As an ancillary analysis, we repeated our analysis without the unrestricted RTP data (data not shown). The resulting logistic regression model, risk score thresholds, and CART models led to a distribution with a sensitivity of 89.42–98.52% and a specificity of 23.21–71.60% at <6 h. At 24–48 h, the sensitivity and specificity ranged from 85.73% to 95.67% and from 41.52% to 71.60%, respectively.
Inter-class differences
The inter-class differences for acute concussions and normal performances are shown in Tables 4 and 5, respectively. Among acute concussions, all mean raw and change scores for SCAT symptom assessments of unlikely and possible concussions are significantly different from definite concussions at both <6 h and 24–48 h (p < 0.001 for all). Among the SAC and BESS at <6 h, only the SAC change score is not significantly different between definite and unlikely concussions. In contrast, at 24–48 h, only the BESS raw score is significantly different between definite and unlikely concussions (p = 0.021). Possible and probable concussions are significantly different in SCAT total symptoms raw score at <6 h and 24–48 h (p = 0.0027–0.0082). They are also significantly different in SAC change score (p = 0.013), SAC raw score (p < 0.001), and SCAT total symptoms change score (p = 0.012) at 24–48 h.
Comparison of Study Variables for Acute Concussions Classified as Unlikely, Possible, Probable, and Definite Concussion at <6 h and 24–48 h
Change score at a time point is computed as: raw score at time point - raw score at baseline.
Significantly different (p < 0.05) from normal performances in the same risk classification and time point based on Student's t test.
Significantly different (p < 0.05) from definite concussion at the same time point based on Tukey's post-hoc pairwise comparisons.
Significantly different (p < 0.05) from probable concussion at the same time point based on Tukey's post-hoc pairwise comparisons.
LOC, loss of consciousness; PTA, post-traumatic amnesia; RGA, retrograde amnesia; SAC, Standard Assessment of Concussion; SCAT, Sport Concussion Assessment Tool; BESS, Balance Error Scoring System.
Comparison of Study Variables for Normal Performances Classified as Unlikely, Possible, Probable, and Definite Concussion At <6 h and 24–48 h
Change score at a time point is computed as: raw score at time point - raw score at baseline.
Significantly different (p < 0.05) from acute concussions in the same risk classification and time point based on Student's t test.
Variable not available for baseline data.
Significantly different (p < 0.05) from unlikely concussion at the same time point based on Tukey's post-hoc pairwise comparisons.
Significantly different (p < 0.05) from possible concussion at the same time point based on Tukey's post-hoc pairwise comparisons.
LOC, loss of consciousness; PTA, post-traumatic amnesia; RGA, retrograde amnesia; SAC, Standard Assessment of Concussion; SCAT, Sport Concussion Assessment Tool; BESS, Balance Error Scoring System.
For normal performances, the mean raw scores for the SAC, SCAT symptom severity, SCAT total symptoms, and the BESS among definite and probable concussions were significantly different from those for unlikely concussion (p < 0.001 for all), except for SAC raw score at 24–48 h. At <6 h and 24–48 h, possible and probable concussions were significantly different in SCAT symptom severity raw score and SCAT total symptoms raw score (p < 0.001 for all). At 24–48 h, possible and probable concussions were also significantly different in SAC raw score and BESS raw score (p < 0.001 for all).
Intra-class differences
The intra-class differences are highlighted in Tables 4 and 5. Among those classified as having possible concussions, acute concussions and normal performances are significantly different in SCAT symptom severity (p < 0.001 at <6 h, p = 0.016 at 24–48 h) and SCAT total symptoms raw score (p < 0.001 at <6 h, p = 0.0019 at 24–48 h). There are also significant differences in SAC change scores (p = 0.0026) and raw scores (p = 0.046) between acute concussions and normal performances classified as possible concussion at 24–48 h. Among probable concussions at <6 h and 24–48 h, acute concussions and normal performances are significantly different in change scores for SCAT symptom severity (p = 0.0012–0.0077), SCAT total symptoms (p = 0.0093 at <6 h, p < 0.001 at 24–48 h), and BESS (p = 0.0074 at <6 h, p < 0.001 at 24–48 h). They are also significantly different in SCAT symptom severity raw score at <6 h (p < 0.001) and SCAT total symptoms raw score at 24–48 h (p < 0.001).
To illustrate how these intra-class differences can be used to inform clinical decision making, we revisit the examples from the Classifying unlikely, possible, probable, and definite concussion subsection. Consider the first athlete (19-year-old female) and suppose that her change scores for the SAC, SCAT symptom severity, SCAT total symptoms, and BESS are 0, 6, 4, and −5, respectively. Based on the intra-class differences identified in this study for probable concussions at 24–48 h, there were significant differences between acute concussions and normal performances for the SCAT symptom severity change score, SCAT total symptoms change score, SCAT total symptoms raw score, and the BESS change score. Comparing this athlete's assessments with the mean values for probable concussions presented in Tables 4 and 5, we find that the athlete's situation is more comparable to acute concussion in terms of change scores for the SCAT symptom severity and total number of symptoms. Conversely, her situation is more comparable to the normal performances in terms of the BESS change score. Following the conservative decision-making approaches that are recommended for concussion management, one could treat this athlete as if she had an acute concussion.
Now consider the 21-year-old male athlete and suppose that his change scores for the SAC and BESS were 0 and 5, respectively, and that his SCAT symptom severity and total symptoms both decreased by 4 compared with baseline. Based on the intra-class differences identified in this study for the possible concussion group at <6 h, acute concussions and normal performances differed significantly in the SCAT symptom severity and SCAT total symptoms raw scores. Comparing this athlete's values on these measures (one symptom reported with a severity of 1) with the mean values obtained in our analysis, we find that this athlete's situation more closely resembles the normal performances within the possible concussion group, despite the low SAC raw score and high BESS change score. These results could indicate that additional assessments should be performed to confirm that this athlete is not concussed.
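The comparison made in both examples can be sketched in code. The following is a minimal illustration under stated assumptions, not the procedure used in the study: for each measure, it reports whether an athlete's value lies closer to the mean for acute concussions or to the mean for normal performances within the same risk category. The function name `closer_group` and every group mean below are hypothetical placeholders; the actual means are those reported in the tables. Only the athlete's change scores (6, 4, and −5) are taken from the first example.

```python
from typing import Dict, Tuple


def closer_group(
    athlete: Dict[str, float],
    group_means: Dict[str, Tuple[float, float]],
) -> Dict[str, str]:
    """Label, per measure, which group mean the athlete's value is nearer to.

    group_means maps measure -> (acute concussion mean, normal performance mean).
    Equal distances are labeled "tie".
    """
    labels = {}
    for measure, value in athlete.items():
        acute_mean, normal_mean = group_means[measure]
        dist_acute = abs(value - acute_mean)
        dist_normal = abs(value - normal_mean)
        if dist_acute < dist_normal:
            labels[measure] = "acute"
        elif dist_normal < dist_acute:
            labels[measure] = "normal"
        else:
            labels[measure] = "tie"
    return labels


# Change scores of the first athlete, as given in the text.
athlete = {
    "scat_severity_change": 6,
    "scat_total_change": 4,
    "bess_change": -5,
}

# HYPOTHETICAL group means (acute, normal), for illustration only.
# They are NOT the values reported in the study's tables.
hypothetical_means = {
    "scat_severity_change": (8.0, 1.0),
    "scat_total_change": (5.0, 0.5),
    "bess_change": (2.0, -3.0),
}

labels = closer_group(athlete, hypothetical_means)
for measure, label in labels.items():
    print(f"{measure}: closer to {label}")
```

A nearest-mean rule of this kind ignores the spread of each group, so it is best read as a screening aid that organizes the comparison rather than as a decision rule.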
Discussion
Kutcher and Giza proposed a risk-based classification framework for diagnosing acute concussion developed from clinical experience. 4 Compared to traditional binary diagnosis, this framework allows the assessment of acute concussion to reflect the physician's diagnostic certainty. Further, taking this approach allows the injury diagnosis to evolve as the injury evolves and more information becomes available. However, although these authors provided clinical guidelines for each risk classification, they did not provide specific criteria with respect to commonly recommended and implemented concussion assessment tools. In this research, we designed and evaluated a novel data-driven method for classifying athletes evaluated for acute concussion as having either unlikely, possible, probable, or definite concussion. The major contributions of our research are as follows. First, we develop an objective and data-driven framework that stratifies acute concussion assessment by diagnostic certainty. These risk categories lay the foundation for guiding post-injury management decisions. Second, we identify key characteristics that can be used to differentiate between acute concussions and normal performances in each risk category. Third, we provide additional, quantitative support for the value of a multi-dimensional battery, the use of change scores in acute concussion assessment, and the potential implications for several demographic factors and time-of-injury characteristics in acute concussion assessment.
The variables used in our logistic regression and CART models are parts of standard concussion assessment batteries, which supports the use of our framework in sporting environments. To our knowledge, we are the first to combine predictive modeling techniques (i.e., logistic regression and CART) and optimization algorithms to classify athletes into concussion risk categories. Erring in the direction of minimizing false negatives, our framework classified most acute concussions (91.07–91.65%) into the higher risk categories (i.e., probable and definite concussion) and most normal performances (73.43–80.77%) into the lower risk categories (i.e., unlikely and possible concussion). Additionally, few acute concussions were classified as unlikely concussion (2.60–3.03%) and few normal performances were classified as definite concussion (6.15–6.74%).
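The proportions reported above can be tabulated with a short routine. The sketch below assumes, for illustration only, that each risk category corresponds to a band of a model's predicted probability of concussion separated by three ascending cutoffs. The cutoff values, the probabilities, and the names `risk_category` and `category_shares` are hypothetical; in the study, category boundaries come from the optimization framework rather than from fixed values such as these.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

CATEGORIES = ("unlikely", "possible", "probable", "definite")


def risk_category(p: float, cutoffs: Tuple[float, float, float]) -> str:
    """Map a predicted probability to a risk category using ascending cutoffs."""
    low, mid, high = cutoffs
    if p < low:
        return "unlikely"
    if p < mid:
        return "possible"
    if p < high:
        return "probable"
    return "definite"


def category_shares(
    probs: Sequence[float], cutoffs: Tuple[float, float, float]
) -> Dict[str, float]:
    """Return the fraction of assessments that fall in each risk category."""
    counts = Counter(risk_category(p, cutoffs) for p in probs)
    return {c: counts[c] / len(probs) for c in CATEGORIES}


# HYPOTHETICAL cutoffs and predicted probabilities for ten acute concussions.
CUTOFFS = (0.2, 0.5, 0.8)
acute_probs = [0.10, 0.30, 0.60, 0.70, 0.90, 0.95, 0.85, 0.55, 0.65, 0.99]

shares = category_shares(acute_probs, CUTOFFS)
# Share of acute concussions placed in the higher risk categories.
higher_risk_share = shares["probable"] + shares["definite"]
print(shares, higher_risk_share)
```

Applying the same function to the normal performances and summing the unlikely and possible shares gives the corresponding proportion for the lower risk categories.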
Our most important finding was that athletes classified as having definite concussion had lower SAC, higher SCAT symptom, and higher BESS scores than athletes in the other risk categories. In comparing these risk groups, definite concussions exhibited noticeably more symptoms and greater symptom severity than the other risk categories, whereas the unlikely concussions exhibited mean symptom severities and mean total symptoms close to 0. Definite concussions also had much higher BESS raw scores and lower SAC scores than unlikely concussions. These findings demonstrate the ability of our framework to separate the “easy” cases from the “hard” cases, and are consistent with previous research demonstrating that symptoms are typically the most sensitive to acute concussion. 8 –12 Our findings also provide support for the utility of using neurocognitive assessments and postural control measures for acute concussion assessment, as demonstrated by previous research. 8,26,39 –46
However, among those classified as possible or probable concussions, raw scores for SCAT symptom severity and total symptoms are significantly lower for acute concussions in the possible concussion group compared with all baselines (p < 0.01 for both measures at <6 h and 24–48 h using Student's t test). Additionally, there are no significant differences in the SCAT symptom severity or total symptoms between acute concussions and normal performances in the probable risk category. These findings demonstrate the difficulty in identifying all acute concussions using symptom raw scores alone. Fortunately, there were some significant differences between acute concussions and normal performances in the possible and probable risk categories for change scores in the SAC, SCAT symptom severity, SCAT total symptoms, and BESS. This result suggests that change scores, which require baseline assessments, have added value when evaluating possible and probable concussions, and it is an important finding regarding the utility of the baseline assessment.
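To make the role of the baseline concrete, the sketch below computes change scores and a pooled-variance (Student's) two-sample t statistic for two groups. It assumes that a change score is the post-injury value minus the baseline value, which matches the signs used in the athlete examples. All scores are hypothetical, as are the function names `change_scores` and `student_t`. The sketch stops at the test statistic and degrees of freedom; obtaining a p value requires the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def change_scores(baseline: Sequence[float], post: Sequence[float]) -> List[float]:
    """Change score per athlete, assumed here to be post-injury minus baseline."""
    if len(baseline) != len(post):
        raise ValueError("baseline and post must have the same length")
    return [p - b for b, p in zip(baseline, post)]


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# HYPOTHETICAL symptom severity scores, for illustration only.
acute_change = change_scores(baseline=[2, 0, 3, 1], post=[8, 5, 9, 6])
normal_change = change_scores(baseline=[2, 1, 3, 0], post=[1, 1, 2, 1])

t_stat, dof = student_t(acute_change, normal_change)
print(acute_change, normal_change, round(t_stat, 2), dof)
```

Because each change score subtracts the athlete's own baseline, two athletes with the same post-injury raw score can end up on opposite sides of the comparison, which is the added information the paragraph above attributes to baseline testing.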
In our analysis, we also sought to identify differences in athlete demographics and time-of-injury characteristics across and within risk classifications. There were statistically significant differences in age, sex, and number of previous concussions between acute concussions and normal performances within some risk categories. For example, among those classified as having definite concussions, the athletes were, on average, older than those providing normal performances at both <6 h and 24–48 h. Outside of age, there were no other consistent demographic differences across risk categories. For time-of-injury variables among acute concussions, a larger proportion of those classified as having unlikely concussion reported the injury immediately and were removed from play immediately compared with those who were classified as having definite concussions. This result suggests that those athletes who were removed from play immediately or assessed immediately after injury may have been in the earliest stages of an evolving injury wherein neurocognitive declines, increased symptoms, and worsening postural control emerge over time. 16,17,47 However, we note that very few acute concussions were classified as unlikely concussion, and because of this small sample size, this point may require further investigation.
We also found that baselines comprised most normal performances within the probable and definite risk categories. Despite varying parameter settings to balance the sensitivity and specificity of our results, we were unable to substantially reduce the proportion of baselines in these upper risk categories. This finding may be the result of performance differences between baseline and unrestricted RTP time points. Specifically, the baseline data showed lower SAC scores, higher SCAT symptom assessment scores, and higher BESS scores than unrestricted RTP data (p < 0.001 for all measures and in both training and validation sets). As a result, our logistic regression model categorized the baseline performance of some athletes into the higher risk categories. The performance discrepancy between baseline and unrestricted RTP time points is consistent with previous studies 12,16,48 –50 and may be attributed to comorbidities 51 or learning effects from multiple assessments prior to return to play. 41,52 Future work may be able to address this shortcoming by incorporating individual items from the SAC, SCAT symptom assessments, and the BESS instead of using total scores. Regardless, this finding highlights the need for clinicians to interpret the administered assessments in the context of the injury, such as an observed mechanism, and differentiate from other injuries and conditions with similar signs and symptoms. 1,4,33,53
To account for clinical judgment in our methodology, we used a modified range-based computation for sensitivity and specificity, which provided a 6% sensitivity increase and >34% specificity increase in possible concussion management. Furthermore, the sensitivity of our algorithm mirrors those reported in previous studies (80.0–100.0%) evaluating concussion testing batteries for acute concussion assessment. 8,9,39,54 Methodological differences between our study and these studies, including the test battery assessments, account for some of the differences in results. For example, both Broglio and colleagues 39 and Resch and colleagues 9 used the Sensory Organization Test (SOT) for balance assessment instead of the BESS. Whereas both the SOT and BESS reveal similar post-concussion trends in postural control deficits, 40 the SOT has less clinical applicability given its size and cost. Additionally, the diagnostic criteria differed greatly across studies. McCrea and colleagues, 8 Broglio and colleagues, 39 and Putukian and colleagues 54 used different measures of significant change to indicate concussion whereas Resch and colleagues 9 used both predictive discriminant analyses and clinical interpretation guidelines. In comparison, we paired a data-driven optimization framework with predictive modeling methods (i.e., logistic regression and CART) to classify athletes into risk categories. By using predictive modeling methods, we were able to simultaneously incorporate demographic information and time-of-injury characteristics, along with SAC, SCAT symptoms, and BESS results. Finally, the concussed sample used in the present study (n = 1085 for <6 h and n = 1413 for 24–48 h) is much larger than those in the aforementioned studies (n = 32–166).
From a clinical perspective, previous studies have discussed the importance and value of taking a heterogeneous and targeted approach to concussion management. 55 –57 Because the focus of this study was on identifying acute concussion, it does not address injury heterogeneity by accounting for potential concussion subtypes or clinical profiles. Nevertheless, our work lays the foundation for doing so using clustering or clinically determined approaches.
Our study is not without limitations. First, we acknowledge that our framework does not provide a recommendation for post-injury management for athletes classified in each risk category. These post-injury decisions are beyond the scope of our study and are an important topic for future research. Even so, clinicians can still benefit from knowing the degree of certainty in a diagnosis decision before determining the next course of action. Second, our study treats all concussions in the CARE data as true concussions, regardless of the medical staff certainty. Therefore, there is the possibility that our models were trained and validated on athletes who were not actually concussed but were labeled so. Third, the differences between our training and validation data in important clinical measures such as PTA or LOC may have caused differences in the presentation of concussion between those two groups, potentially explaining some of the prediction errors in our models. Determining training and validation data using random subset selection instead of by a time-based cutoff could lead to a more homogeneous division in data and, ultimately, improved modeling results. Fourth, because our study data only included athletes 18–22 years of age, we cannot directly apply our results to populations beyond this group. Therefore, future studies should focus on other population groups, such as those engaged in youth sports and professional athletes, to determine the generalizability of our results beyond our study population. Fifth, we were limited in our ability to include change scores and time-of-injury characteristics in our models, as these measures were not available for baseline data. Finding ways to incorporate such variables in future analysis may improve our results. Further, our analysis focused on the SAC, SCAT symptom assessments, and BESS. Data limitations precluded our ability to include assessments such as the SOT, computer-based neurocognitive testing, the King–Devick test, and/or the Vestibular/Ocular Motor Screening Assessment, which have shown promise in other investigations. 58 –61 Finally, as there is no gold standard for concussion diagnosis, we did not have a comparative mechanism for our results.
The objective, algorithmic approach we proposed and developed for risk-based classification of athletes undergoing acute concussion assessment extends the original framework proposed by Kutcher and Giza. 4 By applying predictive modeling and optimization methods, our work provides a promising first step in taking an evidence-based approach to acute concussion assessment stratification. Although the clinical examination remains the gold standard for concussion diagnosis, the models we have designed and analyzed have the potential to provide valuable decision support for clinicians.
Footnotes
Acknowledgments
The material is based on work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE 1256260. This publication was made possible, in part, with support from the Grand Alliance CARE Consortium, funded, in part, by the NCAA and the Department of Defense (DoD). The United States Army Medical Research Acquisition Activity, 820 Chandler Street, Fort Detrick MD 21702-5014 is the awarding and administering acquisition office. This work was supported by the Office of the Assistant Secretary of Defense for Health Affairs through the Psychological Health and Traumatic Brain Injury Program under Award No. W81XWH-14-2-0151. Opinions, interpretations, conclusions, and recommendations are those of the author(s) and are not necessarily endorsed by the Department of Defense (DoD) (Defense Health Program [DHP] funds). We thank April Marie (Reed) Hoy, MS, ATC (Azusa Pacific University), Joseph B. Hazzard Jr., EdD, ATC (Bloomsburg University), Louise A. Kelly, PhD (California Lutheran University), Justus D. Ortega, PhD (Humboldt State University), Nicholas Port, PhD (Indiana University), Margot Putukian, MD (Princeton University), Gerald McGinty, DPT and Jonathan C. Jackson, PhD (United States Air Force Academy), Kenneth L. Cameron, PhD, MPC, ATC (United States Military Academy), Christopher Giza, MD (University of California Los Angeles), Holly J. Benjamin, MD (University of Chicago), Thomas Buckley, EdD, ATC and Thomas W. Kaminski, PhD, ATC (University of Delaware), James R. Clugston, MD, MS (University of Florida), Julianne D. Schmidt, PhD, ATC (University of Georgia), Louis A. Feigenbaum, DPT, ATC (University of Miami), James T. Eckner, MD, MS (University of Michigan), Kevin M. Guskiewicz, PhD, ATC and Jason P. Mihalik, PhD, ATC (University of North Carolina), Jessica Dysart Miles, PhD, ATC (University of North Georgia), Scott Anderson, ATC (University of Oklahoma), Christina L. Master, MD (University of Pennsylvania), Anthony P. Kontos, PhD and Micky Collins, PhD (University of Pittsburgh), Sara P.D. Chrisman, MD, MPH (University of Washington), Alison Brooks, MD, MPH (University of Wisconsin), Steven Rowson, PhD (Virginia Tech), Christopher M. Miles, MD and Laura J. Lintner, DO (Wake Forest University), and Brian H. Dykhuizen, MS, ATC, LAT (Wilmington College).
Author Disclosure Statement
No competing financial interests exist.
