Identification and Connectomic Profiling of Concussion Using Bayesian Machine Learning

Abstract

Accurate early diagnosis of concussion is useful to prevent sequelae and improve neurocognitive outcomes. Early after head impact, concussion diagnosis may be doubtful in persons whose neurological, neuroradiological, and/or neurocognitive examinations are equivocal. Such individuals can benefit from novel accurate assessments that complement clinical diagnostics. We introduce a Bayesian machine learning classifier to identify concussion through cortico-cortical connectome mapping from magnetic resonance imaging in persons with quasi-normal cognition and without neuroradiological findings. Classifier features are generated from connectivity matrices specifying the mean fractional anisotropy of white matter connections linking brain structures. Each connection's saliency to classification was quantified by training individual classifier instantiations using a single feature type. The classifier was tested on a discovery sample of 92 healthy controls (HCs; 26 females, age μ ± σ: 39.8 ± 15.5 years) and 471 adult mTBI patients (158 females, age μ ± σ: 38.4 ± 5.9 years). Results were replicated in an independent validation sample of 256 HCs (149 females, age μ ± σ: 55.3 ± 12.1 years) and 126 patients with concussion (46 females, age μ ± σ: 39.0 ± 17.7 years). Classifier accuracy exceeds 99% in both samples, suggesting robust generalizability to new samples. Notably, 13 bilateral cortico-cortical connection pairs predict diagnostic status with accuracy exceeding 99% in both discovery and validation samples. Many such connection pairs are between prefrontal cortex structures, fronto-limbic and fronto-subcortical structures, and occipito-temporal structures in the ventral (“what”) visual stream. This and related connectivity form a highly salient network of brain connections that is particularly vulnerable to concussion. 
Because these connections are important in mediating cognitive control, memory, and attention, our findings explain the high frequency of cognitive disturbances after concussion. Our classifier was trained and validated on concussed participants with cognitive profiles very similar to those of HCs. This suggests that the classifier can complement current diagnostics by providing independent information in clinical contexts where patients have quasi-normal cognition but where concussion diagnosis stands to benefit from additional evidence.

Introduction

Traumatic brain injury (TBI) is a physical impact to the head associated with structural and functional disruptions to brain tissue. Patients with mild TBI (mTBI) may be at risk of both neurodegenerative conditions, such as Alzheimer's disease, and accelerated brain aging, as evidenced by these patients' larger differences between chronological and biological brain ages.^1–4 The most current diagnostic criteria for mTBI require a non-penetrating force to the head resulting in one or more of the following: more than one clinical sign (e.g., loss of consciousness, post-traumatic amnesia, altered mental state), more than two symptoms within 72 h (e.g., altered cognition, physical symptoms), as well as more than one clinical examination or laboratory finding (e.g., balance or cognitive impairment), or neuroimaging evidence of intracranial abnormalities, according to Silverberg and colleagues.⁵

Diagnosis of mTBI, whether acute (less than 48 h after injury) and/or subacute (2 days to 2 weeks after injury), is important to prevent sequelae and improve neurocognitive outcomes.⁶ However, mTBI is a heterogeneous condition that can be challenging to diagnose acutely when: 1) early clinical signs or acute symptoms (e.g., confusion, difficulty concentrating) are equivocal or insufficiently specific; and 2) there is delay in the onset of such symptoms.⁷ Moderate-to-severe TBI is ruled out by the absence of focal lesions or injuries apparent on acute computed tomography (CT) or magnetic resonance imaging (MRI).⁸ Concussion, a condition held by 93.8% of medical professionals to be synonymous with mTBI,⁵ is sometimes considered to include all forms of mTBI, including repetitive head injury and complicated mTBI.⁹ Here and throughout, concussion is used interchangeably with mTBI to refer to head injuries meeting standard criteria for mTBI according to Silverberg and colleagues,⁵ in the absence of evidence for anatomic intracranial injury.

Conventional concussion diagnosis uses standardized clinical assessments. The Glasgow Coma Scale (GCS), classifying consciousness levels, is frequently used to diagnose TBI and designate its severity. The GCS holds diagnostic power for moderate-to-severe TBI¹⁰ but is not sensitive to the symptoms and mental alterations, such as confusion, attention, and concentration problems, seen in concussion.⁷ Cognitive tests assessing language, memory, and executive functioning can be used as evidence of concussion.¹¹ However, many neurocognitive batteries have low diagnostic power distinguishing concussed patients from neurologically healthy controls (HCs).^12,13 Variability in diagnosis also stems from potential subjectivity in the interpretation of clinical guidelines,¹⁴ leading to ∼50%-90% of concussion cases going without formal diagnosis at hospital admission.¹⁵

The standard MRI protocols for suspected brain injuries include T1- or T2-weighted anatomic scans, primarily providing radiologists with evidence of brain pathology to differentiate concussion from moderate-to-severe TBI.¹⁶ However, the absence of identifiable lesions on T1- or T2-weighted MRIs does not rule out concussion. For example, traumatic axonal injury (TAI) is not always detectable by such scans.¹⁷ In these and other cases, diffusion weighted imaging (DWI) can assess concussion's impact on structural connectivity.^3,17–26 TAI-related white matter (WM) disruption mapped using DWI-estimated diffusion tensors can be a major contributing factor to poor cognitive outcome.²⁷

Tensors enable quantification of fractional anisotropy (FA), a surrogate measure of WM integrity defined as the directional coherence of water molecules in axonal bundles.^28,29 Some studies suggest that, compared with HCs, FA is typically lower in concussed patients within WM structures such as the corona radiata, cingulum, superior longitudinal and uncinate fasciculi, cortico-spinal tract, and corpus callosum.^22,30 A shift from voxel-wise analysis towards whole-brain connectomics has improved understanding of how concussion affects macroscale neural networks.^21,23,25,31 The integrity of WM connectivity, as conveyed by the mean FA of WM bundles between brain structures, is diminished in patients with acute concussion relative to HCs.³² Further, FA-derived structural connectivity measures can distinguish the connectomes of concussed patients from those of HCs.³³ Thus, given concussed patients' expected lack of neuroradiological findings on T1- or T2-weighted MRIs, analyzing DWI-derived connectome features is a reasonable strategy to identify concussion-related structural brain abnormalities.

This study introduces a Bayesian machine learning (ML) classifier to identify subacute concussion in the absence of neuroradiological MRI findings on T1- or T2-weighted scans. To produce classifier features, our workflow generates connectivity matrices specifying the mean FAs of WM connections between brain structures. Entries in these matrices become features for ML classifiers trained to detect concussion in a discovery sample that includes both concussed and HC participants and then evaluated in an independent validation sample. In both samples, our classifier identifies concussion with accuracy above 99% in the absence of neuroradiological MRI findings, evidencing its ability to differentiate between concussed and healthy brains in typical cases where diagnosis may be unclear.

Methods

Participants

This study was conducted in accordance with the US Code of Federal Regulations (45 CFR 46), the Declaration of Helsinki, and with approval from the Institutional Review Board of the University of Southern California. Concussion was defined based on: an acute GCS score of 13-15 at the time of the initial clinical examination; a loss of consciousness shorter than 30 min; and post-traumatic amnesia lasting less than 24 h. Study inclusion criteria for concussed participants included the availability of MRIs acquired 14 days ±4 days (subacutely) post-injury. Additionally, concussed participants had no history of brain injury prior to the concussion accounting for their inclusion in the study group. Participants satisfying these criteria and who were both able and willing to provide written informed consent were invited to participate. Excluded were participants with pre-traumatic histories of clinical neurological disease, psychiatric disorder, or drug/alcohol abuse. For healthy controls, study inclusion criteria included no history of TBI or concussion within the preceding 12 months, as well as the ability to provide written informed consent.

To train and validate classifiers, two samples were studied: a discovery (training) sample and a validation (testing) sample. The discovery sample comprised HCs and participants diagnosed with mTBI by the Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) Consortium, including 471 mTBI participants and 92 HCs (Table 1). The ages of participants in this sample ranged from 17 to 83 years. The discovery sample was an adult cohort to exclude the effects of microstructural WM changes associated with early childhood development. The validation sample comprised 126 participants diagnosed with concussion, whose data are available at the University of Southern California (henceforth labeled as the USC subsample, n = 126) and 256 HCs from the Human Connectome Project (HCP subsample).³⁴ The ages of participants in this sample ranged from 12 to 83 years. The validation sample involved primarily an adult cohort, although some younger participants were included to ensure classifier robustness to age as a potential confound. Linear regressions to partial out the statistical effect of age are included at later stages of analysis. Because training and testing samples were completely distinct, non-overlapping, and independent, data leakage is not a concern.

Table 1.

Demographics of Discovery and Validation Samples

	Discovery		Validation
Measure	Uninjured	Injured	Uninjured	Injured
Sample size	92	471	256	126
Males	66	313	107	80
Females	26	158	149	46
Age (years)
Mean	39.8	38.4	55.3	39.0
Standard deviation	15.5	5.9	12.1	17.7
Injury type
Accidental fall		123
Vehicular accident		427
Other accidents		21
Undetermined if injury was intentional or accidental		5
Homicide or injury purposely inflicted by other persons		20
Hand preference
Left-handed	4	48	26
Right-handed	86	407	223
Both hands	2	13	0
Race
American Indian or Alaska Native	1	3	1
Asian	6	21	22
Black or African American	13	78	47
Native Hawaiian or Other Pacific Islander	0	1	0
White	69	356	167
More than one race	0	0	16
Ethnicity
Hispanic or Latino	85	21	24
Not Hispanic or Latino	385	71	231
Education level
High school graduate or lower	18	143	25
Bachelor's degree or other undergraduate education	64	262	139
Graduate degree	9	61	85

Blank entries represent unknown or missing data.

Cognitive assessments

Cognitive assessments were summarized to calculate average cognitive profiles of participants in each injured sample (discovery and validation), in each of which different tests were available. Here and throughout, the “injured” designation refers to patients with formal diagnoses of mTBI. The discovery sample had three tests for both study group participants and HCs. The first is the Rey Auditory Verbal Learning Task (RAVLT),³⁵ evaluating verbal memory and learning ability through immediate and delayed recall of a list of 15 unrelated words. The second is the Trail-Making Test (TMT),³⁶ quantifying visuomotor function, perceptual-scanning, and cognitive flexibility, with longer times indicating worse performance. The third test is the processing speed index of the Wechsler Adult Intelligence Scale (WAIS),³⁷ consisting of two timed tasks (the coding and symbol tasks, 120 sec each). These tasks together index processing speed and visual motor coordination.

The validation sample had the Brief Test of Adult Cognition by Telephone (BTACT)^38,39 available for injured participants. The BTACT is a short phone-based assessment with six subtests assessing cognitive function implicated in pathology and aging: 1) episodic verbal memory, immediate recall (EVMI); 2) episodic verbal memory, delayed recall (EVMD) of words on a 15-item list; 3) working memory span (WMS, assessed using a backward digit span task); 4) inductive reasoning (IR; assessed using a number series completion task); 5) processing speed (PS; measured using a backward counting task); and 6) verbal fluency (VF; assessed using a category fluency task). To situate injured participants in the validation sample relative to the uninjured (HC) population, their cognitive scores were compared against those of 4179 HCs from the Midlife in the United States (MIDUS) study.⁴⁰ For all participants in this study, cognitive assessments were performed within 3 days of imaging.

First-order statistics (i.e., mean, standard deviation) of cognitive scores were calculated for injured participants and HCs in SPSS version 28. Independent-sample t-tests were used to compare mean differences between injured and uninjured participants in the discovery and validation samples separately (i.e., discovery-injured vs. discovery-uninjured; validation-injured vs. validation (MIDUS)-uninjured). Whenever Levene's test statistic was significant, unequal variances were assumed. Effect sizes of the differences between groups were assessed using Cohen's d, where effect sizes |d| ≤ 0.19 are negligible, 0.20 ≤ |d| ≤ 0.49 are small, 0.50 ≤ |d| ≤ 0.79 are medium, 0.80 ≤ |d| ≤ 1.20 are large, and |d| > 1.20 are very large.⁴¹ Multiple comparisons were accounted for using a false discovery rate (FDR) correction.⁴²
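The group comparisons described above can be illustrated from summary statistics alone. The following is a minimal sketch, not the SPSS procedure used in the study: it assumes Cohen's d is computed with the pooled standard deviation, that the unequal-variance test is Welch's t-test, and that the FDR correction is the Benjamini-Hochberg step-up procedure. All function names are hypothetical.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d based on the pooled standard deviation of two groups."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def effect_label(d):
    """Map |d| to the qualitative labels used in the text."""
    a = abs(d)
    if a < 0.20:
        return "negligible"
    if a < 0.50:
        return "small"
    if a < 0.80:
        return "medium"
    if a <= 1.20:
        return "large"
    return "very large"

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def benjamini_hochberg(pvals, alpha=0.05):
    """Booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest rank k with p_(k) <= (k / m) * alpha
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected

# RAVLT immediate recall, injured vs. uninjured (means, SDs and Ns from Table 2).
d_ravlt = cohens_d(9.53, 3.17, 457, 10.48, 3.06, 88)
print(round(d_ravlt, 3), effect_label(d_ravlt))
```

With the RAVLT immediate-recall summary statistics from Table 2, the pooled-SD formula gives d ≈ -0.301, in agreement with the tabulated value.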

Imaging in the discovery sample

For both injured and uninjured participants, MRIs were selected from the TRACK-TBI Consortium, which has standardized protocols for acquisition of imaging data (https://www.tracktbi.ucsf.edu). T1-weighted images were acquired in three dimensions with a multi-echo magnetization-prepared rapid gradient-echo sequence. DWIs were acquired with a multi-slice single-shot spin echo echo-planar pulse sequence (64 gradient directions; voxel size = 2.7 mm × 2.7 mm × 2.7 mm; maximum b = 1300 sec/mm²; eight volumes with b = 0 sec/mm²). Standardized echo and repetition times were not available for TRACK-TBI subjects across study sites, but all MRIs were acquired on approved 3 Tesla (3T) magnetic resonance scanners. Further, an isotropic diffusion phantom developed by the National Institute of Standards and Technology, together with a traveling volunteer, ensured standardization of DTI measures across imaging sites.⁴³ Scans additionally underwent quality control to ensure compliance with necessary protocols for inclusion in the TRACK-TBI Consortium.

Imaging in the validation sample

The MRIs of injured participants in the validation sample were acquired on a 3T Prisma MAGNETOM Trio TIM scanner (20-channel head coil, Siemens Corporation, Erlangen, Germany). T1-weighted images were acquired in three dimensions using a magnetization-prepared rapid gradient-echo sequence (voxel size = 1.0 mm × 1.0 mm × 1.0 mm; repetition time (TR) = 1.95 sec; echo time (TE) = 2.98 msec; inversion time (TI) = 0.9 sec). DWIs were acquired axially in 64 gradient directions (voxel size = 2.73 mm × 2.73 mm × 2.7 mm; TR = 8.3 sec; TE = 72 msec; maximum b = 1300 sec/mm²; one volume with b = 0 sec/mm²). For HCs, MRIs were selected from the HCP-Aging repository (https://www.humanconnectome.org/study/hcp-lifespan-aging), acquired on a 3T PRISMA scanner (32-channel head coil, Siemens Corporation, Erlangen, Germany). T1-weighted MRIs were acquired in three dimensions with a multi-echo magnetization-prepared rapid gradient-echo sequence (voxel size = 0.8 mm × 0.8 mm × 0.8 mm; TR = 2.5 sec; TE = 2.22 msec; TI = 1 sec). DWIs were acquired axially in 98 gradient directions (voxel size = 1.5 mm × 1.5 mm × 1.5 mm; TR = 3.23 sec; TE = 89.20 msec; maximum b = 1300 sec/mm²; seven volumes with b = 0 sec/mm²).

Image processing

T1-weighted MRIs were pre-processed and segmented automatically using Freesurfer 6.0 (http://surfer.nmr.mgh.harvard.edu)⁴⁴ with default parameters and a standard protocol described elsewhere.⁴⁵ Non-cortical structures were stripped using a hybrid-watershed deformation process, image intensities were normalized, and volumes were registered into Talairach space. Segmentation followed the Destrieux parcellation scheme, containing 165 structures: 74 cortical and eight subcortical in each hemisphere, as well as one brainstem structure.⁴⁶

DWIs were processed using 3DSlicer 4.11 (https://www.slicer.org) and DTIPrep 0.1.1 (https://www.nitrc.org/projects/dtiprep), as detailed elsewhere.²² Skull-stripped DWIs and b₀ images were registered to T1-weighted volumes using the BRAINSFit module of 3DSlicer.⁴⁷ Any DWI volume with poor registration to the corresponding T1-weighted volume was corrected with user supervision using a transformation matrix estimated from the registration between the b₀ volume and the original T1-weighted volume. Unscented Kalman filter (UKF) two-tensor tractography was performed in 3DSlicer using whole-brain seeding, with default parameters described elsewhere⁴⁸ and five seeds per voxel. UKF tractography is a deterministic tractography approach that fits two tensors at each step.⁴⁹ The UKF algorithm utilizes previous tracking positions to direct model estimation and to improve tracking. FA was calculated from UKF-derived tensors at each voxel. Tractograms were co-registered onto anatomical segmentations of the brain.
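For readers unfamiliar with how FA is obtained from a tensor, the standard definition in terms of the three tensor eigenvalues can be sketched as follows. This is an illustration of the textbook single-tensor formula only; it is not the UKF two-tensor implementation in 3DSlicer, and the function name is hypothetical.

```python
import math

def fractional_anisotropy(l1, l2, l3):
    """FA of a diffusion tensor computed from its three eigenvalues."""
    mean = (l1 + l2 + l3) / 3.0
    num = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    den = l1**2 + l2**2 + l3**2
    if den == 0.0:
        return 0.0  # no diffusion signal: define FA as zero
    return math.sqrt(1.5 * num / den)

# Isotropic diffusion gives FA = 0; diffusion along a single axis gives FA = 1.
print(fractional_anisotropy(1.0, 1.0, 1.0))
print(fractional_anisotropy(1.0, 0.0, 0.0))
```

FA therefore ranges from 0 (equal diffusion in all directions) to 1 (diffusion confined to one direction), which is why lower values are read as reduced directional coherence.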

Streamlines shorter than 1.5 cm were discarded because they were more likely to be spurious. The quality of participants' imaging data was verified by a statistical quality control process described below. To reduce heterogeneity due to differences in protocols, scanners, and software used across samples, all DWIs were harmonized using the ComBat-GAM pipeline.⁵⁰

Connectivity calculation

Connectivity matrices specifying the mean FAs of connections between all pairs of brain structures were constructed using purpose-built software.⁵ Let $M$ and $N$ be the number of brain structures and subjects, respectively. The connectivity matrix $C$ has scalar entries $c_{ijk}$, where $i = 1, \dots, M$, $j = 1, \dots, M$, and $k = 1, \dots, N$. Each connectivity matrix vector $c_{ij}$ of length $N$ describes, for subjects $1, \dots, N$, the mean FA of tractography streamlines between structures $i$ and $j$. For each pair $i$ and $j$, we computed the fraction of subjects for whom $c_{ijk} \neq 0$, i.e., who had a WM connection between $i$ and $j$. For each $c_{ij}$, the interquartile range (IQR) of mean FAs was calculated, as were the lower, middle, and upper quartiles ($Q_1$, $Q_2$, and $Q_3$, respectively). We attempted to exclude participants with $c_{ijk} < Q_1 - 1.5 \times \mathrm{IQR}$ or $c_{ijk} > Q_3 + 1.5 \times \mathrm{IQR}$, but none satisfied either of these inequalities.
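The per-connection prevalence and the interquartile-range screen described above can be sketched for a single vector $c_{ij}$. This sketch makes two assumptions that the text does not specify: quartiles are computed only over subjects in whom the connection is present (nonzero), and the "inclusive" quantile method of Python's statistics module is used. Function names are hypothetical.

```python
import statistics

def connection_prevalence(values):
    """Fraction of subjects with a nonzero mean FA for one connection."""
    return sum(1 for v in values if v != 0) / len(values)

def iqr_outliers(values):
    """Indices of subjects whose nonzero mean FA lies outside the 1.5 x IQR fences."""
    present = [v for v in values if v != 0]
    q1, _q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [k for k, v in enumerate(values) if v != 0 and (v < low or v > high)]

# Hypothetical mean FA values for one connection across six subjects (0.0 = absent).
c_ij = [0.40, 0.42, 0.44, 0.46, 0.0, 0.95]
print(connection_prevalence(c_ij))
print(iqr_outliers(c_ij))
```

In this toy vector the connection is present in five of six subjects, and the last subject's value lies above the upper fence.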

To reduce spurious connectivity due to noise and tractography artifacts, we removed outlier participants' information from $C$. Specifically, connections present in fewer than 80% of participants in each sample (discovery-injured, discovery-uninjured, validation-injured, validation-uninjured) were removed because such connections were more likely to be artifactual and less likely to be sampled adequately. Within each sample, connections available in over 80% of participants were retained. This threshold was obtained by comparing the overlap of (A) the set of features with individual classification accuracies above 95% with (B) the set of features for which connectivity entries were available in over 80% of participants. During this comparison, the proportion of connections in (A) was computed as a function of the connection availability percentage in (B). A threshold of 80% for (B) was found to maximize the feature membership of (A). Nonzero entries left in each vector $c_{ij}$ were normalized by dividing them by their maximum value across subjects. To account for demographic variable effects on $c_{ij}$,^19,51 linear regressions were implemented to partial out the statistical effects of age, sex, and both their first- and second-order interactions (independent variables) on FA (dependent variable).
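The three feature-preparation steps described above (availability filtering, maximum normalization, and partialling out covariates) can be sketched in plain Python. This is an illustrative sketch under stated assumptions rather than the study's software: the availability rule is read as "present in at least 80% of every group", and the regression includes only age, sex, and their product as a simplification of the interaction terms mentioned in the text. All function names and data are hypothetical.

```python
def keep_connection(groups, threshold=0.80):
    """True if the connection is nonzero in at least `threshold` of every group."""
    return all(sum(1 for v in vals if v != 0) / len(vals) >= threshold
               for vals in groups.values())

def normalize_by_max(values):
    """Divide nonzero entries by the maximum across subjects; zeros stay zero."""
    peak = max(values)
    return [v / peak if v != 0 else 0.0 for v in values]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def residualize(y, covariates):
    """Residuals of y after least-squares regression on an intercept plus covariates."""
    X = [[1.0] + list(row) for row in covariates]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]

# Hypothetical subjects: age in years, sex coded 0/1, plus the age x sex product.
ages = [20, 30, 40, 50, 60, 70]
sexes = [0, 1, 0, 1, 1, 0]
covs = [(a, s, a * s) for a, s in zip(ages, sexes)]
fa = [0.50 - 0.001 * a for a in ages]  # FA that depends on age only
print(residualize(fa, covs))
```

Because the toy FA values are an exact linear function of age, the residuals are numerically zero, which is what "partialling out" the covariates should produce in that case.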

Machine learning classification and interpretability

To classify participants as injured or uninjured, we trained 25 standard supervised ML classifiers available in MATLAB's Classification Learner app, using five-fold cross-validation to alleviate overfitting. For each classifier, in addition to predictive accuracy (i.e., the percentage of participants whose diagnostic status was identified correctly), we calculated sensitivity (i.e., true positive rate, TPR), specificity (i.e., true negative rate, TNR), precision (i.e., positive predictive value, PPV), and negative predictive value (NPV). From the set of trained classifiers, we chose the Gaussian naïve Bayesian classifier for further analysis based upon its predictive accuracy and overall suitability to the computational problem. This classifier was applied to the validation sample, where its classification measures were calculated.
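The five performance measures named above follow directly from the counts of a two-class confusion matrix, with "injured" as the positive class. The sketch below uses hypothetical counts chosen only to match the discovery sample's class sizes (471 injured, 92 HCs); it does not reproduce any result of the study, and the function name is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Percent accuracy, TPR, TNR, PPV and NPV from confusion-matrix counts."""
    def pct(num, den):
        return 100.0 * num / den if den else None  # None when the ratio is undefined
    return {
        "accuracy": pct(tp + tn, tp + fn + tn + fp),
        "TPR": pct(tp, tp + fn),  # sensitivity
        "TNR": pct(tn, tn + fp),  # specificity
        "PPV": pct(tp, tp + fp),  # precision
        "NPV": pct(tn, tn + fn),
    }

# Hypothetical outcome: one injured participant and one HC misclassified.
metrics = classification_metrics(tp=470, fn=1, tn=91, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting TNR and NPV alongside accuracy matters here because the classes are unbalanced: accuracy alone is dominated by the larger class.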

In choosing between classifiers that yielded sufficiently high accuracy, the following characteristics were considered. Two naïve Bayesian classifiers were tested: a Gaussian and a kernel classifier. The Gaussian classifier assumes that features are normally distributed, but the kernel classifier makes no such assumptions, being more flexible in its ability to model nonlinear class boundaries. Compared with Bayesian classifiers, both standard and ensemble decision trees perform similarly well and are more parsimonious. However, they are also more vulnerable to overfitting and more reliant upon features with low predictive power.⁵² Decision trees are less suited to our setting due to the relatively unbalanced nature of our samples (discovery sample: 471 injured participants and 92 HCs; validation sample: 126 injured participants and 256 HCs). Both naïve Bayesian classifiers trained here are scalable and robust to overfitting,⁵³ but the Gaussian variant is better for features consisting of continuous variables, such as FA.⁵⁴ By contrast, kernel classifiers are typically more suitable for categorical or discrete data.⁵⁴

To rank connections according to their classification utility (saliency), we trained distinct instantiations of the optimal model which only included a pair of bilateral features consisting of a single connection and its contralateral homolog. For example, to quantify the classification saliency of hippocampo-amygdalar connectivity, we included both: 1) the connection between the right hippocampus and the right amygdala; and 2) the connection between the left hippocampus and the left amygdala. We opted for this type of bilateral saliency analysis to alleviate the potential confounds of asymmetric injury effects on neurocircuitry. Inclusion of bilateral feature pairs contributes to reducing such confounds because it allows study of the bilateral saliency of a connection to the classifier rather than the unilateral asymmetry of injury effects on that connection.
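The bilateral saliency analysis can be illustrated with a small Gaussian naïve Bayes model restricted to two features, one per hemisphere. The sketch below is written from the standard definition of the classifier and is not MATLAB's implementation; the data are synthetic, the column layout (columns 0 and 1 form an informative pair, columns 2 and 3 an uninformative pair) is invented for illustration, and accuracy is measured on a separate synthetic test set rather than by five-fold cross-validation.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means and feature variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, yi in zip(X, y) if yi == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model

def predict(model, x):
    """Label with the highest log posterior, assuming independent Gaussian features."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            for v, m, s2 in zip(x, means, variances))
    return max(model, key=log_posterior)

def pair_accuracy(train, test, left, right):
    """Test accuracy of a model trained on one left/right pair of connections."""
    (X_train, y_train), (X_test, y_test) = train, test
    def pick(X):
        return [[row[left], row[right]] for row in X]
    model = fit_gaussian_nb(pick(X_train), y_train)
    hits = sum(predict(model, x) == t for x, t in zip(pick(X_test), y_test))
    return hits / len(y_test)

def make_sample(rng, n_per_class):
    """Synthetic mean FA values: only columns 0 and 1 differ between classes."""
    X, y = [], []
    for label, mu in (("injured", 0.40), ("control", 0.50)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, 0.02), rng.gauss(mu, 0.02),
                      rng.gauss(0.45, 0.05), rng.gauss(0.45, 0.05)])
            y.append(label)
    return X, y

rng = random.Random(0)
train, test = make_sample(rng, 100), make_sample(rng, 100)
acc_informative = pair_accuracy(train, test, left=0, right=1)
acc_uninformative = pair_accuracy(train, test, left=2, right=3)
print(acc_informative, acc_uninformative)
```

Ranking every bilateral pair by this kind of single-pair accuracy gives a saliency ordering: pairs whose mean FA separates the classes score near 1, while pairs carrying no class information score near chance.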

To provide model interpretability, we computed classifier sensitivity and specificity when it was trained using each bilateral pair (left and right homologs) of features. Both measures were evaluated as indicators of classifier performance to ensure that our findings were sensible. We sought to confirm that connections deemed most salient for classification exhibited mean FAs that differed significantly between classes (injured vs. uninjured). This step helped to ensure that high classification accuracy was not due to artifactual or erroneous mean FA differences between classes that might confound findings. For confirmation, the empirical probability distribution functions (PDFs) of mean FA were calculated for each discovery sample class and for each connection. The null hypothesis of no group difference in mean FA PDFs was tested using the Kolmogorov-Smirnov (KS) goodness of fit test. Its test statistic quantified whether the mean FA PDF of the most useful connections differed between classes. We hypothesized that this difference was significant for connections with highest individual predictive accuracy and nonsignificant for those with lowest individual accuracy.
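The distributional comparison described above can be sketched as a two-sample Kolmogorov-Smirnov test. This sketch assumes the two-sample form of the test (comparing the injured and uninjured distributions directly) and uses the asymptotic Kolmogorov series for the p-value, which is an approximation; the function names and example values are hypothetical.

```python
import math

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value for a two-sample KS statistic."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series is numerically 1 in this range
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, s))

# Hypothetical mean FA values for one connection in each class.
injured = [0.38, 0.40, 0.41, 0.42, 0.43, 0.39]
controls = [0.48, 0.50, 0.51, 0.52, 0.49, 0.47]
d = ks_statistic(injured, controls)
print(d, ks_pvalue(d, len(injured), len(controls)))
```

A statistic near 1 indicates almost no overlap between the two classes' mean FA distributions for that connection, and a statistic near 0 indicates that the distributions are nearly identical.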

Results

Demographic description of samples

In the discovery sample, 20.64% of injuries were due to accidental falls, 71.64% were due to vehicular accidents, and 7.72% were due to other causes (Table 1). The validation sample's injury mechanisms are unknown. Most participants were right-handed (discovery-uninjured: 93.48%, discovery-injured: 86.97%, validation-uninjured: 89.56%). Participants were predominantly white (discovery-uninjured: 77.53%, discovery-injured: 77.56%, validation-uninjured: 66.01%) and non-Hispanic (discovery-uninjured: 81.91%, discovery-injured: 77.17%, validation-uninjured: 90.59%). The majority of participants had a bachelor's or higher degree (discovery-uninjured: 80.22%, discovery-injured: 69.31%, validation-uninjured: 89.96%). The validation-injured sample's handedness, race/ethnicity, and level of education were not available.

Cognitive and imaging profiles

Consistent with their diagnoses and with our definition of concussion, all injured participants lack neuroradiological findings. Within the discovery sample, injured participants recalled significantly fewer words (9.53 ± 3.17 on average, compared with 10.48 ± 3.06 by HCs) from the RAVLT immediately after learning (Table 2; Fig. 1). However, Cohen's d is small (d = -0.301; Table 2).⁴¹ There are no other differences between injured patients and HCs in the discovery sample after FDR correction. More than 93% of injured participants have cognitive scores within the HC normal range (i.e., z > -1.96). In other words, most injured participants' cognitive scores are not meaningfully distinguishable from HCs.
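The "within the HC normal range" percentage can be illustrated by standardizing each injured participant's score against the HC mean and standard deviation and counting z-scores above -1.96. The sketch assumes that higher scores indicate better performance; the example scores are hypothetical, while the HC mean and standard deviation are the RAVLT immediate-recall values from Table 2.

```python
def percent_within_normal(scores, hc_mean, hc_sd, cutoff=-1.96):
    """Percentage of scores whose z-score relative to HCs exceeds the cutoff."""
    z = [(s - hc_mean) / hc_sd for s in scores]
    return 100.0 * sum(1 for v in z if v > cutoff) / len(z)

# Hypothetical immediate-recall scores for four injured participants.
example_scores = [9, 10, 3, 12]
print(percent_within_normal(example_scores, hc_mean=10.48, hc_sd=3.06))  # → 75.0
```

Only the score of 3 falls below the cutoff here (z ≈ -2.44), so three of the four hypothetical participants count as within the normal range.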

FIG. 1.

Comparison of cognitive scores between injured (concussion/mild traumatic brain injury) and uninjured participants. Cognitive scores for injured (orange) and uninjured (blue) subjects. Error bars denote standard errors, and asterisks denote comparisons that are significantly different after false discovery rate correction. (A) Scores (y-axes) in the discovery sample. These include: 1) the number of words correctly recalled on the Rey auditory verbal learning task (RAVLT); 2) time in seconds on the Trail-Making Test (TMT) A and B; and 3) the standardized score on the Wechsler Adult Intelligence Scale processing speed test (WAIS PS). (B) Scores (y-axes) in the validation sample. These include the numbers of correct items on the following tasks: 1) episodic verbal memory immediate (EVMI) and episodic verbal memory delayed (EVMD); 2) working memory span (WMS); 3) inductive reasoning (IR); 4) verbal fluency (VF); and 5) processing speed (PS).

Table 2.

Means, Standard Deviations, and Independent Sample t-tests Comparing Injured to Uninjured Cognitive Scores on Available Tests. Cognitive Scores in the Injured Validation Subsample Were Compared With Those of HCs (Uninjured) in MIDUS

Test	dx	N	μ	σ	t	df	p	μ_Δ	σ_Δ	d	% ^a
Discovery sample
RAVLT IR	injured	457	9.53	3.17	-2.585	543	0.010	-0.95	0.37	-0.301	93
RAVLT IR	uninjured	88	10.48	3.06
RAVLT DR	injured	456	9.22	3.29	-1.826	542	0.068	-0.70	0.38	-0.213	96
RAVLT DR	uninjured	88	9.92	3.33
TMT A	injured	456	30.39	15.07	2.298	540	0.022	3.94	1.72	0.270	100
TMT A	uninjured	86	26.45	11.75
TMT B	injured	453	74.38	40.29	1.227	537	0.220	5.75	4.68	0.144	100
TMT B	uninjured	86	68.63	37.23
WAIS PS	injured	454	99.88	16.15	-0.136	538	0.892	-0.26	1.89	-0.016	97
WAIS PS	uninjured	86	100.14	15.37
Validation sample
EVMI^b	injured	110	3.820	2.72	-10.955	113	<0.001	-2.862	0.26	-1.242	61
EVMI^b	uninjured	4493	6.680	2.29
EVMD	injured	110	2.230	2.34	-8.428	4384	<0.001	-2.137	0.25	-0.814	100
EVMD	uninjured	4276	4.360	2.63
WMS	injured	110	3.780	1.58	-8.028	4604	<0.001	-1.188	0.15	-0.775	100
WMS	uninjured	4496	4.960	1.53
IR^b	injured	110	1.450	1.30	-6.166	117	<0.001	-0.780	0.13	-0.481	100
IR^b	uninjured	4512	2.230	1.63
PS^b	injured	110	28.451	13.34	-6.399	113	<0.001	-8.219	1.28	-0.697	85
PS^b	uninjured	4479	36.670	11.75
VF^b	injured	110	13.066	7.03	-8.152	113	<0.001	-5.516	0.68	-0.891	82
VF^b	uninjured	4498	18.582	6.17

^a Percentage of injured patients whose cognitive score was within the normal range of HCs (i.e., whose z-scores were above the left-tailed two-sigma cutoff of -1.96 for the listed task).

^b The statistic of Levene's test for equality of variances between injured and uninjured cognitive scores was significant, so equal variances were not assumed.

dx, diagnosis (injured or uninjured); N, sample size; μ, mean; σ, standard deviation; t, independent samples t-value; df, degrees of freedom; p, two-tailed significance value; μ_Δ, mean difference; σ_Δ, standard error of the difference; d, Cohen's d. RAVLT IR, Rey auditory verbal learning task immediate recall; RAVLT DR, RAVLT delayed recall; TMT A, trail-making test trial A time; TMT B, TMT trial B time; WAIS PS, Wechsler adult intelligence scale processing speed (standardized).

In the validation sample, injured participants' average BTACT scores are significantly lower than those of HCs (Table 2; Fig. 1). For instance, on both immediate and delayed recall tasks, the average injured participant remembered roughly half the number of words (EVMI: 3.82 ± 2.72; EVMD: 2.23 ± 2.34) that HCs did (EVMI: 6.68 ± 2.29; EVMD: 4.36 ± 2.63) and could retain about one fewer digit in the working memory task (injured: 3.78 ± 1.58; uninjured: 4.96 ± 1.53). Injured participants were able to name, on average, 13.07 ± 7.03 animals/fruit in 60 seconds, while HCs could name 18.58 ± 6.17. A small effect (d = -0.481) of injury diagnosis on IR scores was observed, where the negative sign indicates that injured participants' mean scores are lower than HCs'. All other effect sizes are medium to large (d = -0.697 to d = -0.891) except for EVMI (where the effect is very large, d = -1.242). Despite injured patients having significantly lower average scores relative to HCs, individual scores are largely indistinguishable, in a statistical sense, from those of HCs (Table 2). Specifically, no injured participant falls below the two-sigma cut-off (z < -1.96) for “poor” EVMD, WMS, or IR scores. Only for a small minority of injured participants are PS and VF z-scores below this cut-off (15% and 18%, respectively), and fewer than half (39%) have EVMI z-scores below the cut-off.

ML classification

No significant effect of age, sex, and/or their interaction on mean FA was found (p > 0.05, corrected). Of 25 supervised probabilistic classifiers tested, 17 could be trained on the discovery sample. Eight classifiers (the support vector machine and discriminant models, both linear and quadratic) were discarded from further analysis because they failed to converge. We also tested k-nearest neighbor classifiers (fine, medium, coarse, cosine, cubic, weighted, and subspace ensemble varieties), logistic regression, boosted trees ensemble, and subspace discriminant ensemble classifiers. These yielded 83.7% predictive accuracy as they incorrectly classified all HCs as injured participants.
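The 83.7% figure follows from the class sizes of the discovery sample alone. As a worked check, assuming every one of the 471 injured participants is labeled correctly and every one of the 92 HCs is labeled as injured:

```latex
\[
\text{accuracy} = \frac{TP + TN}{TP + FN + TN + FP}
                = \frac{471 + 0}{471 + 0 + 0 + 92}
                = \frac{471}{563} \approx 83.7\%,
\qquad
\text{TNR} = \frac{TN}{TN + FP} = \frac{0}{92} = 0\%.
\]
```

Such a classifier reaches an apparently high accuracy only because injured participants make up most of the sample, while its specificity is zero.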

Two naïve (Gaussian and kernel) Bayesian classifiers achieved 100% predictive accuracy in the discovery sample (Table 3). For reasons discussed in the Machine learning classification and interpretability section, the Gaussian naïve Bayesian classifier was selected for further analysis. This classifier could identify injured participants in the validation sample with accuracy above 99%.

Table 3.

Classification Performance Metrics on the Discovery Sample for a Selected Subset of Well-Performing Approaches, Expressed in Percentages

Classifier	Accuracy	TPR	TNR	PPV	NPV
Naïve Bayesian classifiers
Kernel	100.00	100.00	100.00	100.00	100.00
Gaussian	100.00	100.00	100.00	100.00	100.00
Decision trees
Fine	99.50	100.00	96.74	99.37	100.00
Medium	99.50	100.00	96.74	99.37	100.00
Coarse	99.50	100.00	96.74	99.37	100.00
Ensemble methods
Bagged	99.82	99.79	100.00	100.00	98.91
RUS boosted	99.64	99.79	98.91	99.79	98.91

TPR, true positive rate (i.e., sensitivity); TNR, true negative rate (i.e., specificity); PPV, positive predictive value (i.e., precision); NPV, negative predictive value (i.e., the proportion of negative classifications that are correct).
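The five metrics in Table 3 can all be derived from the four counts of a confusion matrix, as in the Python sketch below. The counts are an assumption: they were chosen to be consistent with the discovery sample sizes (471 injured, 92 HCs) and with the RUS boosted row, and they are not reported in the text.

```python
def safe_ratio(numerator, denominator):
    """Return numerator/denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def confusion_metrics(tp, fn, tn, fp):
    """Compute the Table 3 metrics from the four confusion-matrix counts."""
    return {
        "accuracy": safe_ratio(tp + tn, tp + fn + tn + fp),
        "TPR": safe_ratio(tp, tp + fn),  # sensitivity
        "TNR": safe_ratio(tn, tn + fp),  # specificity
        "PPV": safe_ratio(tp, tp + fp),  # precision
        "NPV": safe_ratio(tn, tn + fn),
    }


# Assumed counts: 471 injured (1 missed) and 92 HCs (1 false alarm).
metrics = confusion_metrics(tp=470, fn=1, tn=91, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}")
```

Returning `None` for an undefined ratio is a design choice that matters when a sample contains only one diagnostic group, in which case half of these metrics cannot be computed.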

Interpretability

Our analysis singles out brain connections that are particularly sensitive and specific to concussion as defined here (Table 4). Notably, 13 bilateral cortico-cortical connection pairs form classification features predicting diagnostic status with accuracy higher than 99% in both discovery and validation samples. These connection pairs link frontal lobes to limbic, temporal, parietal, and occipital structures (Fig. 2). The cortical structures linked by these 13 connections are displayed on the cortex (Fig. 3A), visualized using a connectogram (Fig. 2),^21,25 and displayed using streamlines (Fig. 4). One connection pair with high predictive accuracy links the occipital lobes to the subcallosal aspects of the ipsilateral limbic lobes (Fig. 2). The two connection pairs with the highest predictive accuracy link (A) the fusiform gyrus to the ipsilateral hippocampus and (B) the pericallosal sulcus to the ipsilateral precentral sulcus (Table 4). To further contextualize the connections most representative of concussion, those that predict diagnostic status with accuracy above 95% in both discovery and validation samples are visualized in Supplementary Figure S1. A notable proportion of such connections link the frontal lobe to subcortical structures and to the parietal lobe, both ipsilaterally and contralaterally.

FIG. 2.

Connectogram of classification features that best facilitate identification of injury status. These features are part of a highly salient network of brain connections, i.e., connections with above 99% accuracy in both discovery and validation samples. The outermost ring encodes brain structures arranged by lobe (fr—frontal; ins—insula; lim—limbic; tem—temporal; par—parietal; occ—occipital; nc—non-cortical; bs—brain stem; CeB—cerebellum). Lobes and structures are ordered from anterior to posterior according to their center-of-mass locations within the right-anterior-superior (RAS) brain coordinate system. Each lobe has a unique dominant color scheme with distinct hues for each constituent structure. The color of each structure on the cortical plot of Figure 3B matches the corresponding color on the outer ring of the connectogram. A complete description of color schemes is available in our previous publication on the connectogram.²⁵ Connectogram links encode the mean fractional anisotropy (FA) of streamlines forming the respective connection between any two structures. Links shaded in red denote tractography-resolved pathways in the lower tercile of the mean FA distribution (lowest mean FAs), links in green are pathways with mean FAs in the middle tercile, and links in blue are in the top tercile (highest mean FA).
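The tercile-based color coding of connectogram links can be sketched in Python as follows. The mean FA values are hypothetical, and the use of `statistics.quantiles` with its default interpolation is an assumption, since the caption does not state how tercile boundaries were computed.

```python
import statistics


def tercile_colors(mean_fa_values):
    """Map each mean FA value to the link color of its tercile."""
    # Two cut points split the distribution into three roughly equal bins.
    lower_cut, upper_cut = statistics.quantiles(mean_fa_values, n=3)
    colors = []
    for fa in mean_fa_values:
        if fa <= lower_cut:
            colors.append("red")    # lowest mean FA
        elif fa <= upper_cut:
            colors.append("green")  # middle tercile
        else:
            colors.append("blue")   # highest mean FA
    return colors


# Hypothetical mean FA values for six connections (not study data).
example_fa = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
print(tercile_colors(example_fa))
```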

FIG. 3.

Brain structures linked to other parts of the brain by connections with optimal discriminative ability to identify injury status. (A) Brain structures involved in connections that are optimal for classification. From left to right, across the top row, the views are left lateral, right lateral, frontal, and superior. From left to right, across the bottom row, the views are medial right, medial left, posterior, and inferior. (B) Legend of structure colors. Each gyral or sulcal structure is painted in the same color used to represent it on the outer edge of the connectogram, see Figure 2.

FIG. 4.

Streamline representations of white matter connections whose mean fractional anisotropy (FA) values have the highest classification accuracy to identify injury status. Structures connected by these streamlines are shown in Figure 3A, and the connections are visualized in the connectogram in Figure 2. All magnetic resonance images are displayed in radiological convention (the left-hand side of the reader is the right-hand side of the subject). Streamlines are colored according to mean streamline orientation: green streamline trajectories primarily span the sagittal plane (i.e., they proceed from the anterior to the posterior part of the brain); blue streamlines are along the coronal plane (i.e., inferior to superior); red streamlines are along the transverse plane (i.e., left to right). (A) Streamline bundles with orientations largely along the coronal plane linking the posterior-dorsal part of the cingulate gyrus to the precentral sulcus, the superior part of the superior frontal gyrus to the parahippocampal gyrus, the superior frontal gyrus to the lateral occipito-temporal sulcus, the superior frontal sulcus to the caudate nucleus, and the pericallosal sulcus to the superior part of the precentral sulcus. (B) Streamline bundles with orientations largely along the sagittal plane linking the fusiform gyrus to the hippocampus, the supramarginal gyrus to the inferior frontal sulcus, the superior parietal lobule to the central sulcus, the inferior occipital gyrus and sulcus to the lingual gyrus, the posterior ramus of the lateral sulcus to the inferior part of the precentral sulcus, and the inferior part of the precentral sulcus to the inferior temporal sulcus.

Table 4.

Bayesian Classification Accuracies of Single Features Conveying the Mean FA of Cortico-Cortical Connectivity Between Two Regions

		Accuracy		Sensitivity
Region 1	Region 2	Discovery	Validation	Discovery	Validation
Pericallosal sulcus	Precentral sulcus, superior part	100.00%	100.00%	100.00%	100.00%
Fusiform gyrus	Hippocampus	99.47%	100.00%	100.00%	100.00%
Inferior occipital gyrus and sulcus	Lingual gyrus	99.29%	100.00%	100.00%	100.00%
Superior frontal gyrus	Parahippocampal gyrus	99.29%	100.00%	100.00%	100.00%
Superior frontal sulcus	Circular insular sulcus, superior part	99.29%	100.00%	100.00%	100.00%
Posterior ramus of the lateral sulcus	Precentral sulcus, inferior part	99.47%	99.48%	99.15%	100.00%
Superior parietal lobule	Central sulcus	99.12%	100.00%	100.00%	100.00%
Cingulate gyrus and sulcus, middle-anterior part	Precentral sulcus, inferior part	99.82%	99.21%	98.51%	100.00%
Superior frontal gyrus	Lateral occipito-temporal sulcus	99.12%	100.00%	100.00%	100.00%
Precentral sulcus, inferior part	Inferior temporal sulcus	99.12%	100.00%	100.00%	100.00%
Supramarginal gyrus	Inferior frontal sulcus	99.12%	100.00%	100.00%	100.00%
Cingulate gyrus, posterior-dorsal part	Precentral sulcus, superior part	99.30%	99.21%	98.94%	99.20%
Superior frontal sulcus	Caudate nucleus	99.29%	99.21%	100.00%	98.41%

FA, fractional anisotropy.

Mean FA histograms were compared across groups for each bilateral connection pair. For the 13 connection pairs with highest predictive accuracy, a two-sample KS test rejected the null hypothesis that the mean FAs of concussed and HC subjects were from the same continuous distribution (p < 0.05; Fig. 5; Supplementary Fig. S2-S4). This confirms that brain features enabling our classifier's high discriminant ability reflect genuine diffusion properties (mean FA) of connections on which classification relies. It also suggests that the classifier's accuracy is not simply due to an overfit model. By contrast, for the connections with lowest prediction accuracy, the null hypothesis of the KS test was not rejected (p > 0.05; Fig. 6).
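For readers unfamiliar with the two-sample KS test, the Python sketch below computes its test statistic, i.e., the largest vertical distance between the empirical cumulative distribution functions of two samples. The mean FA values are hypothetical, and the sketch stops short of a p-value; in practice, an established routine such as `scipy.stats.ks_2samp` would typically be used to obtain one.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d_max = 0.0
    # The gap can only change at observed values, so checking them suffices.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


# Hypothetical mean FA values (not study data).
hc_fa = [0.49, 0.50, 0.51, 0.52, 0.53]
injured_fa = [0.38, 0.39, 0.40, 0.41, 0.42]
print(ks_statistic(hc_fa, injured_fa))  # fully separated samples
print(ks_statistic(hc_fa, hc_fa))       # identical samples
```

A statistic near 1 indicates almost no overlap between the two distributions, which is the pattern expected for a connection whose mean FA alone classifies participants almost perfectly.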

FIG. 5.

Comparison of mean fractional anisotropy (FA) for representative connections with near-ideal accuracy. Red bars depict injured participants, while blue bars depict uninjured participants. (A) Discovery sample. (B) Validation sample. In both (A) and (B), histograms of mean FA are depicted for three representative connections whose individual classification accuracy is near-ideal (superior part of precentral sulcus to pericallosal sulcus, accuracy >99%, left column; hippocampus to fusiform gyrus, accuracy >99%, middle column; inferior occipital gyrus and sulcus to lingual gyrus, accuracy >99%, right column). See also Table 4.

FIG. 6.

Comparison of mean fractional anisotropy (FA) for representative connections of both near-ideal and poor accuracy. Red bars depict injured participants, while blue bars depict uninjured participants. (A) Discovery sample. (B) Validation sample. In both (A) and (B), histograms of mean FA are depicted for two representative connections: one whose individual classification accuracy is near-ideal (superior frontal sulcus to the caudate nucleus, accuracy >99%, left column) and one whose accuracy is poor (middle frontal gyrus to the caudate nucleus, accuracy = 9.95%, right column). See also Table 4.

A total of 1265 connections were identified with confidence, and their sensitivity and specificity were tallied. The sum of TPR and TNR is above 1.65 in both discovery and validation samples for 49 connections (3.87% of all connections). Other studies have suggested that tests with accuracies above 90% are “excellent” for diagnosis, whereas accuracies between 75% and 90% have “good” diagnostic value.^55,56 By this measure, 279 connections (22.06%) have excellent accuracies in both discovery and validation samples, while 174 (13.75%) connections have good accuracies in both discovery and validation samples (Supplementary Fig. S5). A total of 812 connections (64.19%) have discovery or validation accuracies below 75%, suggesting that these features would be diagnostically suitable neither on their own nor as part of a diagnostic classifier or panel (Supplementary Fig. S5). Together, these findings confirm our appraisal of the Bayesian classification and lend credibility to the premise that our analysis is based on both direct quantitative measures of brain water diffusion and sound inferences.
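The tally can be reproduced with a short Python sketch. The percentages use the counts reported in the text. The tiering rule, which grades a connection by the lower of its two accuracies, is an assumption about how connections that fall in different tiers across samples were handled.

```python
def diagnostic_tier(discovery_acc, validation_acc):
    """Assign a tier from the lower of the two accuracies (in percent)."""
    worst = min(discovery_acc, validation_acc)
    if worst > 90:
        return "excellent"
    if worst >= 75:
        return "good"
    return "poor"


def percent_of_total(count, total=1265):
    """Share of all identified connections, in percent (two decimals)."""
    return round(100 * count / total, 2)


# Counts reported in the text.
for tier, count in {"excellent": 279, "good": 174, "poor": 812}.items():
    print(tier, percent_of_total(count))

# Two rows of Table 4 and one hypothetical poorly performing connection.
print(diagnostic_tier(100.00, 100.00),
      diagnostic_tier(99.29, 99.21),
      diagnostic_tier(60.0, 95.0))
```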

Discussion

Significance

Neuropsychological assessment for concussion typically occurs weeks to months after injury due to the perceived mildness of neuropsychological symptoms in the sub/acute phase of injury.⁸ In the subacute phase, many concussion victims experience persistent cognitive deficits, including memory difficulties, impulsivity, and impaired information processing.⁵⁷ Misdiagnosed patients with sub/acute concussion miss necessary treatments and psychoeducation, increasing the risk of recovery complications.⁵⁸ Our classifier is trained on MRIs of patients without neuroradiological findings, thus being particularly appealing for clinical cases where imaging does not corroborate a concussion diagnosis. Our classifier is less relevant for subacute concussions without CT findings but with MRI findings, as a lack of CT findings does not guarantee a lack of MRI findings.

Some participants exhibit cognitive symptoms allowing unambiguous concussion diagnosis. Hypothetically, our classifier may only be useful in the presence of such symptoms. On average, the injured discovery sample includes participants who exhibit deficits in EVMI but are otherwise not impaired significantly compared with HCs. Injured participants in the validation sample exhibit, on average, deficits in working and episodic verbal memory, slowed processing, impaired inductive reasoning, and limited verbal fluency. Nevertheless, injured participants' individual cognitive scores are rarely outside the normal range established in HCs.

Specifically, no injured participant scores significantly lower than HCs on TMT-A, TMT-B, EVMD, WMS, or IR. At most, 39% of injured participants score lower than HCs on EVMI (Table 2). Thus, injured participants in both samples have cognitive profiles representative of routinely observed concussion syndromes. However, a majority of injured participants have cognitive profiles difficult to distinguish from those of HCs, corroborating reports of difficulties in distinguishing individual injured patients from HCs based solely on cognitive scores.⁵⁹ The training and validation of our classifier involved injured participants with cognitive profiles similar to those of HCs. This suggests its applicability to persons with quasi-normal cognition but whose concussion diagnosis can benefit from further insights.

Comparison to previous research

We successfully distinguish injured participants from HCs with accuracy above 99%. Of 25 supervised classifiers tested, the naïve Bayesian classifier demonstrates the highest classification accuracy and generalizability to an external independent validation sample with acquisition and pre-processing pipelines different from those of the discovery set. No other MRI-based classifier known to us has higher predictive accuracy. For example, Vergara and colleagues⁶⁰ trained classifiers to identify subacute mTBI from resting-state functional MRIs (maximum accuracy: 84.1%) or diffusion MRIs (maximum accuracy: 75.5%). Italinna and colleagues⁶¹ developed an ML classifier that used resting-state magnetoencephalography in mTBI patients to achieve a classification accuracy of 79%. These classifiers, in contrast to ours, failed to reach the high sensitivity and specificity thresholds expected in clinical settings. Nevertheless, because biomarkers and classifiers with (quasi) errorless classification rates are rare in biomedical sciences, we emphasize the need for further independent validation, both retrospective and prospective.

Interpretability

A small pool of features was uniquely useful for subacute injury identification. WM connections most useful to our classifier constitute a network of brain connectivity particularly sensitive to the neurological sequelae of head impacts.

Many connections idiosyncratic to concussion involve the prefrontal cortex (PFC), important in cognitive control of emotion and behavior,⁶² where it exerts top-down influence on subcortical and limbic structures like the amygdala.⁶³ Disturbances of this top-down control frequently manifest themselves as anxiety or depression, whose severities are mediated by damage to fronto-limbic WM connections.⁶⁴ Socio-emotional dysregulation and impulsivity can occur after concussion due to fronto-subcortical and fronto-limbic dysconnectivity.⁶⁵ The prominence that the classifier assigns to connections between the PFC and both subcortical and limbic regions may explain the high incidence of affective disorders (23% of adults within 4 years of injury)⁶⁶ and behavioral disturbances such as aggression (34% of adults within 6 months of injury)⁶⁷ following concussion.

Many connections with near-ideal classification accuracy involve the superior frontal gyrus. This voluminous structure spans the superior extremity of dorsolateral PFC along the anteroposterior axis, where shear and strain forces are often strong during injury. The connection between the superior frontal sulcus and the caudate nucleus spans a notable portion of frontal WM, where connectivity from superficial frontal areas travels to deeper subcortical regions. Thus, it is unsurprising to find these and similar structures among the connections with high concussion classification accuracy.

Somatosensory and somatomotor areas are prominent in our network of connections salient to concussion. Specifically, connectivity between inferior precentral and inferior temporal sulci is involved in nociceptive and somatomotor memory circuits, and both are often affected by concussion.^68,69 These connections originate/terminate along the boundary between the somatomotor and somatosensory cortices; their high sensitivity to concussion may reflect cortical dynamics of post-traumatic pain syndromes. Primary somatosensory cortex and several supplementary motor areas localize to the central sulcus and the superior parietal lobule, respectively. The significant traumatic disruption of connections involving these areas may explain the frequency of post-concussion complaints pertaining to sensorimotor integration and balance problems.⁷⁰

Many connections salient to concussion identified here belong to the ventral (“what”) visual pathway subserving object recognition and storage of visual and long-term memory. This pathway includes the fusiform gyrus, the lateral occipito-temporal sulcus, and the inferior temporal sulcus. After injury, persons with concussion exhibit impaired object recognition through the ventral visual processing stream.⁷¹ Connectivity between the superior frontal gyrus and both the parahippocampal gyrus and the lateral occipito-temporal sulcus also links attention to memory systems.⁷² Similarly, connections linking the hippocampus to the (ipsilateral) fusiform gyrus are involved in visual processing and memory.⁷³ Both structures are vulnerable to concussion partly due to their locations in the brain.⁷³ Disruption of connections between these structures may partly explain memory symptoms common after concussion,⁷³ as seen in our patients' lower-than-normal RAVLT, EVMI/EVMD, and WMS scores.

Connectivity between the superior part of the precentral sulcus (dorsolateral PFC) and the pericallosal sulcus (limbic lobe) perfectly predicts concussion status in the validation sample. Kim and colleagues⁷⁴ found that WM tracts innervating the pericallosal sulcus exhibit altered connectivity properties in preclinical Alzheimer's disease. Because pericallosal and precentral areas mediate cognitive functions affected by concussion and Alzheimer's disease,⁷⁵ the connectivity identified here may suggest neurocognitive parallels between these conditions⁷⁶ that future studies should examine.

Reproducibility and generalizability

Reproducible classification with accuracy above 95% is difficult in medicine and life sciences and therefore warrants skepticism. To strengthen trust in our classifier, we compared mean FAs between groups across all bilateral pairs of connectivity features, including those either most or least salient to the classifier. Because FA quantifies water diffusion measured by MRI, we reasoned that this analysis could relate abstract features identified by our classifier to concrete measures of connectivity. Unsurprisingly, our results indicate that connections salient to the classifier exhibit significant mean FA differences between diagnostic groups. Conversely, connections with poor saliency have nonsignificant group differences in mean FA. These results strengthen the premise that the classifier relies on physical alterations that concussion produces in connectivity features rather than on spurious properties of WM streamline bundles.

In this study, 50 features (∼4%) achieve classification accuracies above 95%. Since 13 features were used for classification, this suggests that accurate binary classification of concussion is possible based on connection pairs other than those used here, assuming that such classification relies on this pool of highly salient features.⁷⁷ Because the pool is considerably larger than the number of features used for classification, our diagnostic strategy is robust to numerical analysis choices pertaining to optimization scheme, hyperparameters, etc. Overfitting is more likely when the salient feature pool is small relative to the number of classifier features, because a paucity of salient features suggests poor class separation in the eigenspace of the discriminant function. The fact that our salient feature pool is ∼four times larger than the number of features used for classification alleviates concerns that our classifier relies on overfit features, or that our accuracy may be irreproducible or ungeneralizable.

Classifiers like ours require testing on MRIs from both patients with subacute concussion without radiological findings and matched HCs, which makes it difficult to identify samples suitable for this study. The validation sample comprises two subsamples from distinct studies where participants were imaged on the same scanner type but using slightly different scan parameters. For this reason, the validation sample's classification accuracy is partly affected by the statistical interaction between diagnostic status and scan protocol. In the USC validation subsample, which lacks HCs, only the true positive (100%) and false negative (0%) rates could be calculated; the false positive and true negative rates are undefined. Similarly, in the HCP validation subsample, the true negative rate is 100% and the false positive rate is 0%; the true positive and false negative rates are undefined because this subsample lacks concussed participants.

While data harmonization via the ComBat-GAM pipeline was performed to minimize the effects of different scanners, we appreciate that this confound still exists to a certain degree. However, we expect this confound to have only modest effects on classification accuracy for several reasons. Firstly, the classifier—despite being trained on the discovery sample—performed exceptionally on the validation sample, as evidenced by the latter's near-ideal classification accuracies. The USC and HCP subsamples are independent, and the classifier's classification rates are excellent in each distinct subsample and in the (combined) validation sample, suggesting genuine discriminative power. Secondly, the demonstration of high classification ability despite potential confounds is desirable, providing evidence of model generalizability. Nevertheless, data from other independent samples can further validate our approach and evince its generalizability. Such testing should also appraise how different brain segmentations, tractography, connectomic analyses, etc., affect results. Finally, although classification accuracy is exceptional for both the discovery and validation samples, the trustworthiness of this metric depends on sample size and population variance. Thus, our results require replication in larger samples capturing this variance.

Comparison to clinical and cognitive measures

Routine methods of diagnosing concussion rely on the ability to measure GCS scores, loss of consciousness, and post-traumatic amnesia within 30 min of injury. However, typical delays in clinical assessment or delayed effects of many injuries prevent an estimated 50-90% of concussion cases from being identified early enough to reduce the chance of sequelae.^7,78 Further, concussion symptoms are often misattributed to alcohol intoxication, whose relationship to GCS remains unclear.⁷⁹ Because 35-50% of suspected patients with concussion may be intoxicated during injury,⁸⁰ developing novel diagnostic protocols less vulnerable to intoxication status is crucial. As detailed by Peixoto and colleagues in patients with acute polytrauma, cognitive assessments often lead to missed concussion diagnoses (60.9% of cases) if other injuries produce symptoms similar to those of concussion.⁸¹

The neuropsychological assessments used in this study to identify sequelae indicative of brain injury are reportedly related to cognitive outcome.^7,13,78 For example, whereas the RAVLT is often used to test verbal memory and learning in patients,⁸² Callahan and colleagues⁸³ argued that it also assesses global cognitive functions in the medical rehabilitation setting. On average, in both discovery and validation samples, participants with subacute concussion had significantly poorer verbal memory than HCs, suggesting that tests such as the RAVLT could differentiate between these groups. However, in practice, the RAVLT has substantially lower accuracy (69%) than our classifier (>99%) when used in the context of concussion diagnosis.⁸⁴ Similarly, 93% of concussed patients had normal RAVLT scores. The RAVLT also has low accuracy in predicting recovery for moderate-to-severe TBI patients.⁷⁸

The BTACT, designed to assess typical aging,³⁹ is often used to track cognition after concussion, although it has low criterion validity in sub/acute concussion.⁸⁵ These authors contend that the BTACT's best utility is detecting TBI sequelae, not differentiating between participants according to diagnostic status. We observed significant deficits in BTACT performance, on average, in participants with subacute concussion compared with MIDUS HCs. However, most concussed patients had BTACT scores within the normal range (ranging from 100% on EVMD, WMS, and IR, to 61% on EVMI; Table 2). Unsurprisingly, the BTACT was insufficient for sensitive and specific differentiation between concussion and HC. These limitations of routine neuropsychological diagnostic assessments for concussion highlight the need for a novel, robust and independent source of clinical and scientific evidence such as ours.

Limitations

In the validation sample, the average age of concussed participants differs significantly from that of HCs (Table 1). This discrepancy was partially mitigated by regressing out age, sex, and their interactions from classifier features.²² In future studies, the classifier should be tested on additional independent concussion samples with participants matched on age, sex, and other demographic variables (e.g., years of education, socioeconomic status, and race) known to affect functional outcomes after concussion.⁸⁶ Further, classifier robustness to age- and sex-related effects should be studied. Classification accuracy should also be investigated in the presence of additional sources of diagnostic information such as the GCS, T2-weighted MRI, CT, and measures of functional disability, cognition, and consciousness level.
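As a minimal illustration of what regressing out age, sex, and their interaction can look like, the Python sketch below residualizes one feature. It relies on the fact that, for a binary sex covariate, a linear model with age, sex, and age-by-sex terms yields the same fitted values as a separate straight line fitted to each sex group. The data are hypothetical, the function names are illustrative, and the study's actual regression procedure may differ.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y as a function of x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residualize(feature, age, sex):
    """Remove age, sex, and age-by-sex effects from one feature."""
    residuals = [0.0] * len(feature)
    for group in set(sex):
        idx = [i for i, s in enumerate(sex) if s == group]
        slope, intercept = fit_line([age[i] for i in idx],
                                    [feature[i] for i in idx])
        for i in idx:
            residuals[i] = feature[i] - (intercept + slope * age[i])
    return residuals


# Hypothetical ages, sexes, and mean FA values (not study data).
age = [20, 30, 40, 50, 25, 35, 45, 55]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
mean_fa = [0.481, 0.469, 0.461, 0.449, 0.471, 0.449, 0.431, 0.409]
adjusted = residualize(mean_fa, age, sex)
print([round(r, 4) for r in adjusted])
```

The residuals, rather than the raw values, would then serve as classifier features, so that any remaining group separation cannot be attributed to linear effects of these covariates.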

Conclusion

We introduced a novel approach for identifying concussion in the absence of neuroradiological findings using ML of connectomic brain features. Our classifier leverages a network of highly salient connectivity features to achieve near-ideal classification. In the discovery sample, our protocol identifies subacute concussion with high sensitivity and specificity even in patients with quasi-normal cognitive scores. The need for methods like ours increases as the incidence of concussion rises within aging populations. These findings could also help to identify novel biomarkers more accurately reflecting subacute concussion's impact on the connectome.

Transparency, Rigor, and Reproducibility Summary

This study was not formally registered because it is a retrospective study of a sample of convenience. The analysis plan was not formally pre-registered, but the team member with primary responsibility for the analysis certifies that the analysis plan was pre-specified. A sample size of 950 subjects was planned based on: 1) availability of diffusion-weighted imaging in the TRACK-TBI, HCP, and USC subsample repositories; as well as 2) the existence of necessary data to perform UKF tractography. Five subjects' data failed quality controls; therefore, the final sample size was 945 subjects (92 healthy control subjects and 471 adult mTBI subjects for the discovery sample; 256 healthy control subjects and 126 concussion subjects for the validation sample). Imaging quality control decisions and analyses were performed by investigators who were aware of participants' relevant characteristics. Imaging data were collected using multiple scanners according to data collection protocols for the TRACK-TBI and HCP repositories. Imaging data were preprocessed using Freesurfer and all imaging sets were analyzed at the same time. All equipment and software used to perform acquisition and analysis are widely available from open and commercial sources. The key inclusion criteria are established standards. Outliers were defined based on the interquartile range of their mean FAs, and missing data were handled through removal of corresponding connectivity features from consideration by the classifier. Multiple comparisons were accounted for using an FDR correction. This report includes documentation of internal replication. MRI data are publicly available from HCP (https://www.humanconnectome.org) and TRACK-TBI (https://tracktbi.ucsf.edu/transforming-research-and-clinical-knowledge-tbi). Analytic code used to conduct the analyses presented in this study is not available in a public repository. It may be available by emailing the corresponding author as of September 29, 2023.
The authors agree to provide the full content of the manuscript on request by contacting the corresponding author.
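The summary above mentions an FDR correction without naming the procedure. As a hedged illustration, the Python sketch below implements the Benjamini-Hochberg step-up procedure, which is one common way of controlling the FDR; whether this specific procedure was used in the study is an assumption. The p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for ten connections (not study data).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values))
```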

Footnotes

Acknowledgments

The authors thank Paul Bogdan for useful discussions.

Authors' Contributions

Authors' contributions include conceptualization and design (AI), data curation (BJH, PEI, AMD, JZ, NFC, NNC), formal analysis (BJH, AMD, JZ, NFC, NNC), funding acquisition (AI), investigation (BJH, PEI, AMD, JZ), methodology (BJH, AMD, NFC, NNC, AI), project administration (AI), resources (AI), software (BJH, AMD, NFC, NNC), supervision (AI), validation (BJH), visualization (BJH, PEI, NFC), writing of the original draft (BJH, PEI, AMD, JZ) and writing review and editing (BJH, PEI, AMD, JZ, NNC, AI).

Data Availability

MRI data are publicly available from HCP (https://www.humanconnectome.org) and TRACK-TBI (https://tracktbi.ucsf.edu/transforming-research-and-clinical-knowledge-tbi).

Funding Information

A.I. gratefully acknowledges the support from the National Institutes of Health under grants R01 NS 100973, R01 AG 082201, and R01 AG 079957, from the US Department of Defense (DoD) under award W81-XWH-1810413, from the Leonard Davis School of Gerontology under a Hanson-Thorell Research Scholarship, from an anonymous donor family, from the Undergraduate Research Associate Program (URAP), the Provost's Undergrad Research Fellowship (PURF), and the Center for Undergraduate Research in Viterbi Engineering (CURVE) at the University of Southern California.

Author Disclosure Statement

No competing financial interests exist.

Supplementary Material

Supplementary Figure S1

Supplementary Figure S2

Supplementary Figure S3

Supplementary Figure S4

Supplementary Figure S5

References

1. Cole, Leech, Sharp, et al. Prediction of brain age suggests accelerated atrophy after traumatic brain injury. Ann Neurol, 2015; 77(4):571–581; doi: 10.1002/ana.24367

2. Dams-O'Connor, Gibbons, Landau, et al. Health problems precede traumatic brain injury in older adults. J Am Geriatr Soc, 2016; 64(4):844–848; doi: 10.1111/jgs.14014

3. Irimia, Maher, Chaudhari, et al. Acute cognitive deficits after traumatic brain injury predict Alzheimer's disease-like degradation of the human default mode network. Geroscience, 2020; 42(5):1411–1429; doi: 10.1007/s11357-020-00245-6

4. Imms, Chui, Irimia. Alzheimer's disease after mild traumatic brain injury. Aging, 2022; 14(13):5292–5293; doi: 10.18632/aging.204179

5. Silverberg, Iverson, Group ABISI, et al. The American Congress of rehabilitation medicine diagnostic criteria for mild traumatic brain injury. Arch Phys Med Rehabil, 2023; 104(8):1343–1355; doi: 10.1016/j.apmr.2023.03.036

6. Irimia, Van Horn. Systematic network lesioning reveals the core white matter scaffold of the human brain. Front Hum Neurosci, 2014; 8:51; doi: 10.3389/fnhum.2014.00051

7. McCrea MA. Mild traumatic brain injury and postconcussion syndrome: The new evidence base for diagnosis and treatment. American Academy of Clinical Neuropsychology Workshop Series. Oxford University Press: Oxford, U.K.; 2008.

8. Ruff, Iverson, Barth, et al. Recommendations for diagnosing a mild traumatic brain injury: A National Academy of Neuropsychology education paper. Arch Clin Neuropsychol, 2009; 24(1):3–10; doi: 10.1093/arclin/acp006

9. Mayer, Quinn, Master. The spectrum of mild traumatic brain injury: a review. Neurology, 2017; 89(6):623–632; doi: 10.1212/wnl.0000000000004214

10. Lu, Li, Tu, et al. Predicting long-term outcome after traumatic brain injury using repeated measurements of Glasgow Coma Scale and data mining methods. J Med Syst, 2015; 39(2):14; doi: 10.1007/s10916-014-0187-x

11. Calvillo, Irimia. Neuroimaging and psychometric assessment of mild cognitive impairment after traumatic brain injury. Front Psychol, 2020; 11:1423; doi: 10.3389/fpsyg.2020.01423

12. Cairncross, Gindwani, Rita Egbert, et al. Criterion validity of the brief test of adult cognition by telephone (BTACT) for mild traumatic brain injury. Brain Inj, 2022; 36(10-11):1228–1236; doi: 10.1080/02699052.2022.2109744

13. Iverson. Complicated vs uncomplicated mild traumatic brain injury: Acute neuropsychological outcome. Brain Inj, 2006; 20(13-14):1335–1344; doi: 10.1080/02699050601082156

14. Prince, Bruhns. Evaluation and treatment of mild traumatic brain injury: The role of neuropsychology. Brain Sci, 2017; 7(8):105; doi: 10.3390/brainsci7080105

15. McCrea, Nelson, Guskiewicz. Diagnosis and management of acute concussion. Phys Med Rehabil Cli, 2017; 28(2):271–286; doi: 10.1016/j.pmr.2016.12.005

16. Lee, Newberg. Neuroimaging in traumatic brain imaging. NeuroRx, 2005; 2(2):372–383; doi: 10.1602/neurorx.2.2.372

17. Yeh F-C, Irimia, de Almeida Bastos, et al. Tractography methods and findings in brain tumors and traumatic brain injury. Neuroimage, 2021; 245:118651; doi: 10.1016/j.neuroimage.2021.118651

18. Palacios, Owen, Yuh, et al. The evolution of white matter microstructural changes after mild traumatic brain injury: a longitudinal DTI and NODDI study. Sci Adv, 2020; 6(32):eaaz6892; doi: 10.1126/sciadv.aaz6892

19. Imms, Clemente, Cook, et al. The structural connectome in traumatic brain injury: a meta-analysis of graph metrics. Neurosci Biobehav R, 2019; 99:128–137; doi: 10.1016/j.neubiorev.2019.01.002

20. Irimia, Goh S-YM, Wade, et al. Traumatic brain injury severity, neuropathophysiology, and clinical outcome: insights from multimodal neuroimaging. Front Neurol, 2017; 8:530; doi: 10.3389/fneur.2017.00530

21. Irimia, Chambers, Torgerson, et al. Patient-tailored connectomics visualization for the assessment of white matter atrophy in traumatic brain injury. Front Neurol, 2012; 3:10; doi: 10.3389/fneur.2012.00010

22. Robles, Dharani, Rostowsky, et al. Older age, male sex, and cerebral microbleeds predict white matter loss after traumatic brain injury. Geroscience, 2022; 44(1):83–102; doi: 10.1007/s11357-021-00459-2

23. Irimia, Wang, Aylward, et al. Neuroimaging of structural pathology and connectomics in traumatic brain injury: toward personalized outcome prediction. Neuroimage-Clin, 2012; 1(1):1–17; doi: 10.1016/j.nicl.2012.08.002

24. Irimia, Fan, Chaudhari, et al. Mapping cerebral connectivity changes after mild traumatic brain injury in older adults using diffusion tensor imaging and Riemannian matching of elastic curves. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA; 2020; pp. 1690–1693; doi: 10.1109/ISBI45749.2020.9098476

25. Irimia, Chambers, Torgerson, et al. Circular representation of human cortical networks for subject and population-level connectomic visualization. NeuroImage, 2012; 60(2):1340–1351; doi: 10.1016/j.neuroimage.2012.01.107

26. Irimia, Van Horn, Vespa. Cerebral microhemorrhages due to traumatic brain injury and their effects on the aging human brain. Neurobiol Aging, 2018; 66:158–164; doi: 10.1016/j.neurobiolaging.2018.02.026

27. Catani, Ffytche. The rises and falls of disconnection syndromes. Brain, 2005; 128(10):2224–2239; doi: 10.1093/brain/awh622

28. Jones, Knösche, Turner. White matter integrity, fiber count, and other fallacies: The do's and don'ts of diffusion MRI. Neuroimage, 2013; 73:239–254; doi: 10.1016/j.neuroimage.2012.06.081

29. Jones. Challenges and limitations of quantifying brain connectivity in vivo with diffusion MRI. Imaging Med, 2010; 2(3):341–355; doi: 10.2217/IIM.10.21

30. Hulkower, Poliak, Rosenbaum, et al. A decade of DTI in traumatic brain injury: 10 years and 100 articles later. Am J Neuroradiol, 2013; 34(11):2064–2074; doi: 10.3174/ajnr.A3395

31. Irimia, Chambers, Wang, et al. Systematic connectomic analysis of white matter atrophy associated with severe traumatic brain injury. J Neurotrauma, 2012; 29(10):A8–A9.

32. Iraji, Chen, Wiseman, et al. Connectome-scale assessment of structural and functional connectivity in mild traumatic brain injury at the acute stage. Neuroimage-Clin, 2016; 12:100–115; doi: 10.1016/j.nicl.2016.06.012

33. Mitra, Shen, Ghose, et al. Statistical machine learning to identify traumatic brain injury (TBI) from structural disconnections of white matter networks. Neuroimage, 2016; 129:247–259; doi: 10.1016/j.neuroimage.2016.01.056

34. Van Essen, Ugurbil, Auerbach, et al. The Human Connectome Project: a data acquisition perspective. Neuroimage, 2012; 62(4):2222–2231; doi: 10.1016/j.neuroimage.2012.02.018

35. Rey. Rey auditory verbal learning test (RAVLT). PUF: Paris; 1964.

36. Reitan. Validity of the Trail Making Test as an indicator of organic brain damage. Percept Motor Skill, 1958; 8(3):271–276.

37. Wechsler. Wechsler Adult Intelligence Scale–Fourth Edition (WAIS–IV). NCS Pearson: San Antonio, TX; 2008.

38. Tun, Lachman. Telephone assessment of cognitive function in adulthood: The Brief Test of Adult Cognition by Telephone. Age Ageing, 2006; 35(6):620–632; doi: 10.1093/ageing/afl095

39. Lachman, Agrigoroaei, Tun, et al. Monitoring cognitive functioning: Psychometric properties of the brief test of adult cognition by telephone. Assessment, 2014; 21(4):404–417; doi: 10.1177/1073191113508807

40. Brim, Ryff, Kessler. How healthy are we?: A national study of well-being at midlife. Prev Chronic Dis, 2004; 1(3):A12.

41. Cohen. Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates: Hillsdale, NJ; 1988.

42. Benjamini, Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B Met, 1995; 57(1):289–300; doi: 10.1111/j.2517-6161.1995.tb02031.x

43. Palacios, Yuh, Mac Donald, et al. Diffusion tensor imaging reveals elevated diffusivity of white matter microstructure that is independently associated with long-term outcome after mild traumatic brain injury: a TRACK-TBI study. J Neurotrauma, 2022; 39(19-20):1318–1328; doi: 10.1089/neu.2021.0408

44. Fischl. FreeSurfer. NeuroImage, 2012; 62(2):774–781; doi: 10.1016/j.neuroimage.2012.01.021

45. Fischl, Dale. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc Natl Acad Sci U S A, 2000; 97(20):11050–11055; doi: 10.1073/pnas.200033797

46. Destrieux, Fischl, Dale, et al. Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 2010; 53(1):1–15; doi: 10.1016/j.neuroimage.2010.06.010

47. Johnson, Harris, Williams. BRAINSFit: Mutual information rigid registrations of whole-brain 3D images, using the insight toolkit. Insight J, 2007; 57(1):1–10; doi: 10.54294/hmb052

48. O'Donnell, Suter, Rigolo, et al. Automated white matter fiber tract identification in patients with brain tumors. Neuroimage Clin, 2017; 13:138–153; doi: 10.1016/j.nicl.2016.11.023

49. Malcolm, Shenton, Rathi. Filtered multitensor tractography. IEEE T Med Imaging, 2010; 29(9):1664–1675; doi: 10.1109/tmi.2010.2048121

50. Pomponio, Erus, Habes, et al. Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan. NeuroImage, 2020; 208:116450; doi: 10.1016/j.neuroimage.2019.116450

51. Amgalan, Maher, Imms, et al. Functional connectome dynamics after mild traumatic brain injury according to age and sex. Front Aging Neurosci, 2022; 14:852990; doi: 10.3389/fnagi.2022.852990

52. Farid, Zhang, Rahman, et al. Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks. Expert Syst Appl, 2014; 41(4):1937–1946; doi: 10.1016/j.eswa.2013.08.089

53. Rossi. Bayesian non- and semi-parametric methods and applications. Princeton University Press: Princeton, NJ; 2014.

54. Ahuja, Chug, Gupta, et al. Classification and clustering algorithms of machine learning with their applications. In: Nature-Inspired Computation in Data Mining and Machine Learning. Springer: New York, NY; 2020; pp. 225–248.

55. Ray, Manach, Riou, et al. Statistical evaluation of a biomarker. J Am Soc Anesth, 2010; 112(4):1023–1040; doi: 10.1097/ALN.0b013e3181d47604

56. Power, Fell, Wright. Principles for high-quality, high-value testing. Evid Based Med, 2013; 18(1):5–10; doi: 10.1136/eb-2012-100645

57. Rabinowitz, Levin. Cognitive sequelae of traumatic brain injury. Psychiatr Clin North Am, 2014; 37(1):1–11; doi: 10.1016/j.psc.2013.11.004

58. Powell, Ferraro, Dikmen, et al. Accuracy of mild traumatic brain injury diagnosis. Arch Phys Med Rehab, 2008; 89(8):1550–1555; doi: 10.1016/j.apmr.2007.12.035

59. de Freitas Cardoso, Faleiro, De Paula, et al. Cognitive impairment following acute mild traumatic brain injury. Front Neurol, 2019; 10:198; doi: 10.3389/fneur.2019.00198

60. Vergara, Mayer, Damaraju, et al. Detection of mild traumatic brain injury by machine learning classification using resting state functional network connectivity and fractional anisotropy. J Neurotrauma, 2017; 34(5):1045–1053; doi: 10.1089/neu.2016.4526

61. Itälinna, Kaltiainen, Forss, et al. Detecting mild traumatic brain injury with MEG, normative modelling and machine learning. medRxiv, 2022; 2022.09.29.22280521; doi: 10.1101/2022.09.29.22280521

62. Nejati, Majdi, Salehinejad, et al. The role of dorsolateral and ventromedial prefrontal cortex in the processing of emotional dimensions. Sci Rep, 2021; 11(1):1–12; doi: 10.1038/s41598-021-81454-7

63. van der Horn, Liemburg, Aleman, et al. Brain networks subserving emotion regulation and adaptation after mild traumatic brain injury. J Neurotrauma, 2016; 33(1):1–9; doi: 10.1089/neu.2015.3905

64. Smith. Mild traumatic brain injury and psychiatric illness. BC Med J, 2006; 48(10):510.

65. Wood, Worthington. Neurobehavioral abnormalities associated with executive dysfunction after traumatic brain injury. Front Behav Neurosci, 2017; 11:195; doi: 10.3389/fnbeh.2017.00195

66. Delmonico, Theodore, Sandel, et al. Prevalence of depression and anxiety disorders following mild traumatic brain injury. PM R, 2022; 14(7):753–763; doi: 10.1002/pmrj.12657

67. Tateno, Jorge, Robinson. Clinical correlates of aggressive behavior after traumatic brain injury. J Neuropsych Clin N, 2003; 15(2):155–160; doi: 10.1176/jnp.15.2.155

68. Ofoghi, Dewey, Barlow. A systematic review of structural and functional imaging correlates of headache or pain after mild traumatic brain injury. J Neurotrauma, 2020; 37(7):907–923; doi: 10.1089/neu.2019.6750

69. , Zhang. Nociceptive memory in the brain: cortical mechanisms of chronic pain. J Neurosci, 2011; 31(38):13343–13345; doi: 10.1523/JNEUROSCI.3279-11.2011

70. Campbell, King, Parrington, et al. Central sensorimotor integration assessment reveals deficits in standing balance control in people with chronic mild traumatic brain injury. Front Neurol, 2022; 13:897454; doi: 10.3389/fneur.2022.897454

71. Alnawmasi, Chakraborty, Dalton, et al. The effect of mild traumatic brain injury on the visual processing of global form and motion. Brain Inj, 2019; 33(10):1354–1363; doi: 10.1080/02699052.2019.1641842

72. Ward, Schultz, Huijbers, et al. The parahippocampal gyrus links the default-mode cortical network with the medial temporal lobe memory system. Hum Brain Mapp, 2014; 35(3):1061–1073; doi: 10.1002/hbm.22234

73. Henke, Buck, Weber, et al. Human hippocampus establishes associations in memory. Hippocampus, 1997; 7(3):249–256; doi: 10.1002/(SICI)1098-1063(1997)7:3<249::AID-HIPO1>3.0.CO;2-G

74. Kim, Adluru, Chung, et al. Multi-resolution statistical analysis of brain connectivity graphs in preclinical Alzheimer's disease. Neuroimage, 2015; 118:103–117; doi: 10.1016/j.neuroimage.2015.05.050

75. Mahoney, Chowdhury, Ngo, et al. Mild traumatic brain injury results in significant and lasting cortical demyelination. Front Neurol, 2022; 13:854396; doi: 10.3389/fneur.2022.854396

76. Rostowsky, Irimia. Acute cognitive impairment after traumatic brain injury predicts the occurrence of brain atrophy patterns similar to those observed in Alzheimer's disease. GeroScience, 2021; 43(4):2015–2039; doi: 10.1007/s11357-021-00355-9

77. Balayla. Information threshold, Bayesian inference and decision-making. arXiv Preprint, 2022; 2206.02266; doi: 10.48550/arXiv.2206.02266

78. Green, Colella, Hebert, et al. Prediction of return to productivity after severe traumatic brain injury: investigations of optimal neuropsychological tests and timing of assessment. Arch Phys Med Rehabil, 2008; 89(12 Suppl):S51–S60; doi: 10.1016/j.apmr.2008.09.552

79. Kushner. Strategies to avoid a missed diagnosis of co-occurring concussion in post-acute patients having a spinal cord injury. Neural Regen Res, 2015; 10(6):859; doi: 10.4103/1673-5374.158329

80. Stuke, Diaz-Arrastia, Gentilello, et al. Effect of alcohol on Glasgow Coma Scale in head-injured patients. Ann Surg, 2007; 245(4):651–655; doi: 10.1097/01.sla.0000250413.41265.d3

81. Peixoto, Buchanan, Nahas. Missed emergency department diagnosis of mild traumatic brain injury in patients with chronic pain after motor vehicle collision. Pain Physician, 2023; 26(1):101.

82. Peaker, Stewart. Rey's Auditory Verbal Learning Test—a review. In: Developments in Clinical and Experimental Neuropsychology. (Crawford JR, Parker DM. eds.) Springer: Boston, MA; 1989; pp. 219–236

83. Callahan, Johnstone. The clinical utility of the Rey Auditory-Verbal Learning Test in medical rehabilitation. J Clin Psychol Med Settings, 1994; 1(3):261–268; doi: 10.1007/bf01989627

84. Guilmette, Rasile. Sensitivity, specificity, and diagnostic accuracy of three verbal memory measures in the assessment of mild brain injury. Neuropsychology, 1995; 9(3):338–344; doi: 10.1037/0894-4105.9.3.338

85. Cairncross, Gindwani, Rita Egbert, et al. Criterion validity of the brief test of adult cognition by telephone (BTACT) for mild traumatic brain injury. Brain Inj, 2022; 36(10-11):1228–1236; doi: 10.1080/02699052.2022.2109744

86. Mushkudiani, Engel, Steyerberg, et al. Prognostic value of demographic characteristics in traumatic brain injury: results from the IMPACT study. J Neurotrauma, 2007; 24(2):259–269; doi: 10.1089/neu.2006.0028
