Abstract
The existing gold standard for diagnosing a suspected previous mild traumatic brain injury (mTBI) is clinical interview. But it is prone to bias, especially for parsing the physical versus psychological effects of traumatic combat events, and its inter-rater reliability is unknown. Several standardized TBI interview instruments have been developed for research use but have similar limitations. Therefore, we developed the Virginia Commonwealth University (VCU) retrospective concussion diagnostic interview, blast version (VCU rCDI-B), and undertook this cross-sectional study aiming to 1) measure agreement among clinicians' mTBI diagnosis ratings, 2) using clinician consensus develop a fully structured diagnostic algorithm, and 3) assess accuracy of this algorithm in a separate sample. Two samples (n=66; n=37) of individuals within 2 years of experiencing blast effects during military deployment underwent semistructured interview regarding their worst blast experience. Five highly trained TBI physicians independently reviewed and interpreted the interview content and gave blinded ratings of whether or not the experience was probably an mTBI. Paired inter-rater reliability was extremely variable, with kappa ranging from 0.194 to 0.825. In sample 1, the physician consensus prevalence of probable mTBI was 84%. Using these diagnosis ratings, an algorithm was developed and refined from the fully structured portion of the VCU rCDI-B. The final algorithm considered certain symptom patterns more specific for mTBI than others. For example, an isolated symptom of “saw stars” was deemed sufficient to indicate mTBI, whereas an isolated symptom of “dazed” was not. The accuracy of this algorithm, when applied against the actual physician consensus in sample 2, was almost perfect (correctly classified=97%; Cohen's kappa=0.91). In conclusion, we found that highly trained clinicians often disagree on historical blast-related mTBI determinations. 
A fully structured interview algorithm was developed from their consensus diagnosis that may serve to enhance diagnostic standardization for clinical research in this population.
Introduction
Mild TBI (mTBI) or concussion is, by far, the most common category of TBI during OEF/OIF/OND deployment, accounting for over 80% of cases. 5 Although the injury is mild in nomenclature, up to 20% of those sustaining an mTBI will develop postconcussion syndrome (PCS), a condition of chronic symptoms that may include cognitive impairments and detrimental effects on psychosocial functioning. 6,7 In contrast to more severe TBI, mTBI is uniquely difficult to diagnose. The diagnosis of any severity of TBI centers on identifying an initial period of alteration of consciousness (AOC) with or without frank loss of consciousness (LOC). 8 Various pathophysiological processes are hypothesized to be responsible for TBI-induced AOC, including diffuse axonal injury. 9 With mTBI, the initial AOC period can be minutes or less and usually does not include LOC, so its determination is more complex than in severe TBI, where amnesia lasts over 1 week and often includes an initial LOC period. The briefer the AOC period, the more challenging it becomes to distinguish it from the absence of AOC. Once the brief AOC period resolves, the diagnosis of mTBI may easily be missed, as shown by a recent study of emergency department patients, where less than half of those who sustained mTBI by study criteria actually received a documented diagnosis. 10 Further, imaging may provide confirmatory evidence in moderate-to-severe TBI, but, by definition, conventional computerized tomography (CT) is normal in mTBI. 8
Varying diagnostic criteria for mTBI exist, including the VA/DoD Common Definition of mTBI (
As an additional confounder, traumatic events, such as battlefield blast exposure, may result in a perception of AOC strictly on the basis of an acute stress response. 13 Shock, fear, horror, or adrenaline surge may cloud the sensorium or even lead to repressed memory. There are no reliable means to differentiate between symptoms involving impaired awareness that are caused by severe stress versus mTBI, so differential diagnosis is problematic. 14 This problem may be heightened over time, because during recall of trauma reactions, people with severe psychological disturbance overestimate the symptoms that they had in the acute phase. 13 The residual effects of the psychological versus physical brain trauma also cannot be easily distinguished as PTSD and PCS both lack objective neurological findings and exhibit nonspecific and overlapping symptoms. 15 They may also coexist, as demonstrated by one study showing that 40% of U.S. military personnel reported acute PTSD symptoms after an mTBI 16 and another that 42% of OEF/OIF veterans with a history of mTBI reported persistent PTSD symptoms. 17 Because of these issues, some have suggested that the more objective construct of post-traumatic amnesia (PTA) is preferred over softer symptoms of AOC, such as “dazed,” in diagnosing suspected combat mTBI, 13 but this approach alone would fail to identify the substantial numbers of mTBI without PTA.
The inherent challenges of combat mTBI detection have prompted extensive research efforts in the quest for more objective diagnostics. These efforts have centered on biomarker and imaging substrates of TBI, with less attention placed on refinement of the gold-standard clinical assessment. Although validated structured symptom measures and mental status examinations exist to assist diagnosis in acute settings, such as athletic sidelines, 18 there is usually a time lag before formal acute medical evaluation. The examiner must therefore determine the existence or nonexistence of an initial TBI-based AOC period solely from the patient's recall of the symptom experience. Regardless of current symptoms, the interviewer must ascertain whether a period of AOC occurred immediately after the earlier injury force.
Given these difficulties, the scientific literature lacks standard sample selection criteria for the combat mTBI population. Most studies have relied on self-classified concussion or screening instruments, such as those mentioned by Hoge and colleagues, 16 or unstructured postacute clinical evaluations, none of which have proven diagnostic accuracy for mTBI. Screening instruments of any type cannot be relied upon without additional diagnostic steps on positive screens. For example, it has been shown that patients often report illogical, or even frankly contradictory, responses to AOC items on TBI screening questionnaires, such as endorsing LOC but denying a memory gap. 19 Unstructured interviews, which could potentially vet these types of responses, are limited by the degree of examiner thoroughness, experience, expertise, and bias in question formatting and response interpretation. In research settings, using unstructured interview to diagnose mTBI has the further problems of poor transparency and questionable inter-rater reliability. It may be owing to these methodological limitations that published studies to date have been unable to disentangle all the potential risk factors and infer a major causative role of mTBI in PCS among veterans and military SMs.
The standardization and transparency of mTBI diagnostics for research could be advanced by implementing a valid, structured interview. Formal structured interviews have been developed, validated, and used extensively in other conditions, such as those in the mental health arena, where they are considered the gold standard for diagnostic accuracy and against which shorter self-administered questionnaires are typically assessed psychometrically. 20 –22 The structured interview tools developed to date for postacute TBI diagnosis have significant shortcomings when applied for mTBI, whether from blast or other causes. The Ohio State University TBI Identification Method (OSU TBI-ID) is the most widely used structured interview designed for retrospective identification of TBI. 23 However, sound inter-rater reliability has only been reported when the case definition of TBI was “knocked out or unconscious,” probably because the other AOC symptoms are less specific for TBI. This lack of proven reliability for TBI without LOC is an important limitation concerning use of the OSU TBI-ID for diagnosing mTBI because available prospective data in athletic populations show that 80% or more of mTBI cases do not have LOC. 24,25
Other retrospective TBI interview instruments have been developed and reported. The Boston Assessment of TBI Lifetime (BAT-L) 26 is a semistructured interview developed for administration by a doctoral-level neuropsychologist. The BAT-L is described as a “preliminary screen” using a forensic approach with open-ended questioning, as previously described by Vanderploeg and colleagues. 27 The Brief Traumatic Brain Injury Screen is a self-report tool for “probable” TBI and problems and symptoms that may be associated with TBI. 28 This interview consists of a series of primarily open-ended questions with vetting of responses left to judgment of the interviewer, either a masters-level psychologist or trained staff member. The Traumatic Brain Injury Questionnaire is a semistructured interview with 12 closed-ended (Y/N) response items assessing for a possible TBI incident, followed by an open-ended interview of the incident(s) identified. 29 Donnelly and colleagues 30 described a semistructured interview for TBI diagnosis against which the Veterans Affairs TBI Screening Tool was assessed for diagnostic accuracy, but did not report on psychometric properties of the interview instrument. These semistructured instruments not only generally lack published reliability data, but, as with the OSU TBI-ID, their sensitivity and specificity also have not been measured against a meaningful gold-standard diagnosis, such as an antecedent thorough acute clinical assessment.
In summary, the retrospective diagnosis of mTBI is challenging, especially for battlefield blast, where its effects are difficult to separate from those of an acute stress reaction. Unstructured clinical interview is the existing gold standard, but it is susceptible to bias and has no published inter-rater reliability data, which limits its use for research. Existing interview and questionnaire tools for detecting mTBI are generally only semistructured, so they do not totally eliminate potential bias and are of unproven diagnostic accuracy. Therefore, we sought to examine inter-rater reliability of clinicians' mTBI diagnoses made using identical interviewee blast experience information gathered during semistructured interview. The interview tool, the Virginia Commonwealth University (VCU) retrospective Concussion Diagnostic Interview-blast version (VCU rCDI-B), was developed as part of an overarching epidemiological study of military blast exposures. The second, and more central, aim was to develop an automated interpretive algorithm to pair with the fully structured component of the interview to create a highly standardized and transparent tool for retrospective diagnosis of blast-related mTBI. To achieve this, we separated the sample into an initial algorithm development sample and a second, smaller sample to test the algorithm's accuracy against a physician consensus diagnosis.
Methods
Participants
Participants for the overarching ongoing epidemiological study were recruited through letters, advertisements, and from ambulatory health care clinics at the Hunter Holmes McGuire VA Medical Center Polytrauma Rehabilitation Center (Richmond, VA) and at several nearby large military bases (Fort Lee Army Base [Prince George County, VA], Marine Corps Base (MCB) Camp Lejeune [Onslow County, NC], and MCB Quantico [Prince William County, VA]). Inclusion criteria were SM or veteran with one or more blast experiences within the past 2 years while deployed in OIF/OEF. Given that the intended population was those at high risk for physical effects of blast, “blast experience” was defined as reporting any of the following symptoms or effects during or shortly after exposure to blast or explosion: feeling dazed, confused, seeing stars, headaches, dizziness, irritability, memory gap (not remembering injury or injury period), hearing loss, abdominal pain, shortness of breath, being struck by debris, knocked over or down, knocked into or against something, having one's helmet damaged, or being medically evacuated. Severe and moderate TBI were the only exclusion criteria and were defined as: more than 30 min in coma, brain bleeding or blood clot (abnormal brain CT scan), or none of the first 24 or more hours after event can be remembered (PTA >24 h). The current study contained two separate samples that were both derived from the overarching study sample. Sample 1, intended to develop the VCU rCDI-B diagnostic algorithm, consisted of the first 66 consecutively consented and enrolled participants who completed baseline study procedures after our semistructured interview was added to the protocol. Sample 2, intended to cross-validate the algorithm, consisted of the 37 subsequent subjects completing baseline.
Procedure
Blast experience interview
After identifying their self-determined worst (or only) blast experience, each participant was administered the interview (VCU-rCDI-B) by a research coordinator. The VCU-rCDI-B is a combined open-ended and fully structured interview developed by one of the researchers (W.C.W.) in order to provide an in-depth assessment of a subject's blast experience and the variables that formulate the AOC construct used to diagnose mTBI. Because the fully structured component was untested, we included the open-ended portion to provide contextual and supplementary information that may not be captured by the structured interview alone. The interview was designed to be administered in 15–30 min, and the structured portion probed the subject on the description of the event and experiences, the recollection of the event, the injury mechanism, consciousness, all potential symptoms that might indicate immediate AOC, and outcome of the event. The interview included a structural quality assurance check intended to minimize false-negative responses to the detailed, but potentially abstract, questioning on amnesia. If the interviewee reports memory of the (blast) event and denies any memory gap before or after the event, then the interviewee is asked to verify that they had continuous memory of the event and the immediately surrounding period of time; if continuous memory is denied, then the memory items are administered again before moving on with the interview. A second check, intended to minimize false positives, is triggered when LOC is endorsed: the interviewee is asked whether a witness verified it.
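The amnesia quality assurance check can be pictured as a small piece of control flow. The Python sketch below is illustrative only and is not part of the VCU rCDI-B: the field names (`remembers_event`, `gap_before`, `gap_after`) and the two callables (`ask_memory_items`, `confirm_continuous`) are hypothetical, and limiting the re-administration to a single repeat is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MemoryResponses:
    """Hypothetical container for the structured amnesia items."""
    remembers_event: bool
    gap_before: bool
    gap_after: bool


def administer_memory_items(
    ask_memory_items: Callable[[], MemoryResponses],
    confirm_continuous: Callable[[], bool],
) -> MemoryResponses:
    """Ask the memory items, applying the false-negative check from the text."""
    responses = ask_memory_items()
    no_amnesia_reported = (
        responses.remembers_event
        and not responses.gap_before
        and not responses.gap_after
    )
    # The verification question is only asked when no amnesia was reported.
    if no_amnesia_reported and not confirm_continuous():
        # Continuous memory was denied, so the memory items are asked again
        # (ASSUMPTION: one repeat only).
        responses = ask_memory_items()
    return responses
```

In this sketch the verification question is never reached when any memory gap has already been endorsed, which mirrors the stated purpose of the check: catching missed amnesia rather than re-probing reported amnesia.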
Interviewer training consisted of providing coordinators time to familiarize themselves with the instrument, emphasizing the need to exactly adhere to the questions, embedded scripts, and decision trees, and conducting several practices with mock patients. The unstructured portion of the interview was administered first and was followed by the fully structured portion. For the unstructured portion, the coordinator asked and wrote down the responses to the following query: “On the screening form, we asked you to identify your worst blast event which you described as.. Today, I would like you to tell me in as much detail as possible what happened to you and what you felt.” [Interviewer instructions: Make sure to get a clear narrative about events leading up to the blast, information about the blast event, and information about what happened after the blast including what he/she experienced physically and emotionally].
The entire VCU rCDI-B is provided in Appendix A (supplementary material is available online at
Primary outcome measures
The primary outcome measure was the five physicians' independent diagnoses of whether the participant probably did or did not sustain an mTBI during the interviewed event. All physicians were board certified or eligible in physical medicine and rehabilitation and considered themselves TBI experts by virtue of training and experience. Two were investigators for the overarching study. Physicians were each separately provided with deidentified copies of the subject's VCU-rCDI-B responses and were instructed to use their best clinical judgment to determine the diagnosis based on the aforementioned CDC definition of mTBI. In cases of diagnostic uncertainty, they were asked to use their best judgment to make a choice of mTBI versus no mTBI using the legal definition of probability as a reference (greater or less than 50% probability). They independently reviewed and interpreted the interview data and all were blinded to any other subject information or the other physician ratings.
Sample 1 (n=66) ratings procedure and algorithm development
In sample 1, we aimed to examine inter-rater reliability of not only the yes/no TBI determination, but also ordinal TBI likelihood scale ratings, so we instructed the physicians to also rate their level of certainty on each diagnosis. Specifically, they were asked to rate their yes versus no determination as being either of high or low certainty using 90% or higher confidence as the anchor for high certainty. Thus, the following clinician rating was generated for each participant with the TBI likelihood ordinal scale ranking denoted in parentheses:
(>=2) Yes; most likely is a TBI
(3) TBI with high certainty; at least 90% confidence subject had TBI,
(2) TBI with low certainty; subject most likely had TBI but less than 90% confident,
(<=1) No; most likely is not a TBI
(1) No TBI with low certainty; subject most likely did not have TBI but less than 90% confident,
(0) No TBI with high certainty; at least 90% confidence subject did not have TBI.
For the purposes of determining the physician consensus rating, only the dichotomous scale of most likely yes versus most likely no was used. A consensus rating was defined as the majority physician diagnosis (at least three of five physicians) on this scale. That is, if >=3 of 5 physicians independently determined that a subject certainly or most likely had an mTBI, then the physician consensus rating was positive; if >=3 of 5 ratings were certainly not or most likely not, then the consensus rating was negative for TBI. Thus, the consensus rating for each participant was a compilation of the independent, blinded individual physician interpretations and constituted the gold standard that the algorithm development described in the Results section sought to match.
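The consensus rule can be written in a few lines. The following Python sketch assumes each physician's rating is stored as an integer on the 0–3 likelihood scale described above; the function names are illustrative and not taken from the study.

```python
from typing import Sequence


def is_probable_tbi(rating: int) -> bool:
    """Collapse the 0-3 likelihood scale: 2 or 3 means 'most likely a TBI'."""
    if rating not in (0, 1, 2, 3):
        raise ValueError(f"rating must be 0, 1, 2, or 3, got {rating!r}")
    return rating >= 2


def consensus_positive(ratings: Sequence[int]) -> bool:
    """Majority rule over five independent ratings (at least 3 of 5 positive)."""
    if len(ratings) != 5:
        raise ValueError("exactly five physician ratings are expected")
    positives = sum(is_probable_tbi(r) for r in ratings)
    return positives >= 3
```

Because the number of raters is odd and the scale is dichotomized first, a tie cannot occur, so every participant receives a consensus rating.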
Sample 2 (n=37) ratings procedure
A blinded physician rating protocol similar to sample 1 was employed for sample 2 again using five physicians; four of the five from sample 1 and one other physician. As in sample 1, the raters independently classified each participant as positive for mTBI using a cutpoint of 50% probability of mTBI, and this dichotomous scale was used to determine the consensus rating with >=3 physicians considered the consensus. Differing from sample 1, a TBI likelihood scale was not used.
Statistical analysis
The distributions of the demographic and blast experience characteristics for each sample were summarized with frequency counts and percentages for categorical variables and medians and interquartile ranges (IQRs) for continuous variables. Medians and IQRs were chosen because they are better descriptors of the center and spread of severely skewed continuous variables.
The distributions of the physician diagnoses of mTBI for each sample of subjects were examined and summarized for both the ordinal scales and the binary outcome scales (Probably mTBI, Probably Not mTBI). Simple and weighted kappa statistics were computed to assess the reliability, or agreement, between each pair of physicians on the binary and ordinal rating scales, respectively. Values of kappa close to 1 indicate agreement and values close to 0 indicate agreement no better than chance. The kappa statistics were interpreted using the schematic recommended by Landis and Koch. 31 Kappa values, simple or weighted, less than 0 are considered to be indicative of no agreement, those ranging from 0.00 to 0.20 indicate slight agreement, 0.21–0.40 indicates fair agreement, 0.41–0.60 indicates moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.00 almost perfect agreement. Fleiss-Cohen weights were used for the computation of the weighted kappa statistic, and asymptotic standard errors (ASEs) and 95% confidence intervals (CIs) were computed for each statistic.
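For readers who want to reproduce this kind of pairwise agreement analysis, the Python sketch below computes simple and weighted kappa for two raters and maps the result onto the Landis and Koch bands listed above. It is a minimal illustration, not the study's analysis code: it assumes ratings are coded as integers 0 to k-1, it takes Fleiss-Cohen weights to be the usual quadratic weights, 1 - (i - j)^2 / (k - 1)^2, and it omits the ASEs and CIs.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int],
                n_categories: int, weighted: bool = False) -> float:
    """Cohen's kappa for two raters; quadratic weights if weighted=True."""
    n, k = len(a), n_categories
    if n == 0 or n != len(b) or k < 2:
        raise ValueError("need two equal-length, non-empty rating lists and k >= 2")
    # Observed joint proportions of (rater A category, rater B category).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        if not weighted:
            return 1.0 if i == j else 0.0
        return 1.0 - ((i - j) ** 2) / ((k - 1) ** 2)

    p_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    p_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def landis_koch_label(kappa: float) -> str:
    """Verbal label for a kappa value using the bands given in the text."""
    if kappa < 0:
        return "no agreement"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

With the binary scale, `n_categories=2` and `weighted=False` give the simple kappa; with the four-level likelihood scale, `n_categories=4` and `weighted=True` give the weighted version, in which a disagreement between adjacent levels is penalized less than one between the extremes.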
The final VCU-rCDI-B automated diagnostic algorithm was compared to the physician consensus in sample 2 using the following parameters: correct classification rate, positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity. Owing to these estimates being close to the upper bound, a Wilson 95% CI was reported with each of these quantities. 32 Last, Cohen's simple kappa was also calculated to better understand the predictive ability of the algorithm. A test of chance agreement between the algorithm and physician consensus for each of the kappa statistics was performed at the 0.05 level.
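The Wilson interval is preferred here because, unlike the simple Wald interval, it does not collapse to zero width when a proportion equals 0 or 1. The Python sketch below shows the standard Wilson score formula and how the five accuracy parameters follow from a 2×2 table. The function names are illustrative, and any counts passed to `diagnostic_summary` in an example would be hypothetical rather than study data.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, Tuple


def wilson_interval(successes: int, n: int,
                    confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def _rate(k: int, n: int) -> Tuple[float, Tuple[float, float]]:
    """Point estimate with its Wilson interval (validates k and n first)."""
    interval = wilson_interval(k, n)
    return k / n, interval


def diagnostic_summary(tp: int, fp: int, fn: int,
                       tn: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Accuracy parameters of a test against a reference standard."""
    return {
        "correct_classification": _rate(tp + tn, tp + fp + fn + tn),
        "sensitivity": _rate(tp, tp + fn),
        "specificity": _rate(tn, tn + fp),
        "ppv": _rate(tp, tp + fp),
        "npv": _rate(tn, tn + fn),
    }
```

As a purely arithmetic illustration, a proportion of 6 out of 6 has a Wald interval of zero width, whereas `wilson_interval(6, 6)` returns a lower bound of roughly 0.61, which conveys how little a perfect score in a small group actually establishes.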
Results
Description of sample characteristics
The demographic, military, and blast exposure characteristics for categorical variables are summarized in Table 1 for both samples. Overall, both samples were very similar regarding their characteristics, except that sample 2 had comparatively more African American subjects (by percentage).
Median age of participants was 24 years (IQR=21–27) and 23 years (IQR=22–26) for samples 1 and 2, respectively. Sample 1 participants were evaluated at a median of 9.1 months (IQR=6.3–9.7) after their self-described worst blast experience for which they were interviewed. Nominally, sample 2 participants had greater elapsed time, with evaluation occurring a median of 13.2 months (IQR=7.7–22.5) after their index blast experience. Sample 2 also tended to have more blast experiences, with almost half (49%) having three or more blast experiences, compared to only 34% for sample 1 (Table 1). None of these differences reached statistical significance.
Distribution of physician diagnoses by sample
Distributions of the physician diagnoses of mTBI are summarized separately by physician for both samples in Table 2. The four-level multinomial outcome for sample 1 is shown along with the binary outcome (Probably mTBI, Probably Not mTBI) for both samples. The proportion diagnosed positive for probable mTBI by each physician ranged from 58% to 93% in sample 1 and from 59% to 86% in sample 2. The proportion of subjects diagnosed with probable mTBI using physician consensus designation was 85% in sample 1 and 84% in sample 2.
Inter-rater reliability of physician diagnoses
Inter-rater reliability was first assessed using the binary scales. As displayed in Table 3, sample 1 simple kappa values for the 10 physician pairs ranged from 0.314 to 0.615 and sample 2 kappa values for the 10 pairs ranged from 0.194 to 0.825.
Weighted kappa values for the likelihood of TBI scale (high certainty not, probably not, probably TBI, and high certainty TBI) used in sample 1 are shown in Table 4, with the 10 pairs ranging from 0.283 to 0.641.
Virginia Commonwealth University retrospective Concussion Diagnostic Interview-blast version automated algorithm development and description
The framework for the automated diagnostic algorithm was based on the first author's own mTBI clinical interview and decision-making method for identifying TBI-induced AOC, which was refined over more than 20 years of experience. Once a potential concussive event is identified, this method is a two-step hierarchical process: first probing and vetting the PTA construct, then probing and vetting other potential AOC symptoms if PTA was not already ruled in. The conceptual basis for the item-level decision-tree rules within the algorithm was that diagnosticians would consider some symptoms and symptom combinations to be more specific than others for AOC caused by TBI, and that some participants would give illogical or contradictory responses. To form the first draft set of decision-tree rules, we inspected the contrasting patterns of item responses for those who were physician consensus positive versus negative for mTBI. This first draft set of algorithm-tree rules was then refined through trial-and-error adjustments until the number of correct classifications relative to the consensus diagnosis was maximized. Through this process, we found that the majority of the physician group did, in fact, give precedence to patterns of responses that showed clear evidence of PTA and less credence to illogical PTA patterns, such as remembering the blast but not remembering the time before and after. The majority also appeared to judge “dazed” as the non-PTA symptom least specific for AOC resulting from blast-related mTBI, especially when it was instantaneous (<30 sec) in duration. Further, the majority judged each non-PTA AOC symptom (dazed, confused, and saw stars) to be less specific when it was endorsed as lasting over 24 h, perhaps viewing it as nonorganic symptom aggrandizement.
We also found that including the “head struck” item in the algorithm provided a very slight improvement in correct classification; perhaps in questionable cases, the raters were swayed by evidence that blunt head trauma accompanied the blast event. Other items in the interview, such as the blast distance and directionality, did not enhance the correct classification rate. The final best-fitting algorithm built from sample 1 is displayed in Figures 1 and 2. It generated a correct classification rate of 61 of 66 (92%) with two false positives and three false negatives against the actual consensus ratings.

Figure 1. Diagnostic algorithm step 1, determination of traumatic brain injury (TBI) with post-traumatic amnesia (PTA). LOC, loss of consciousness.

Figure 2. Diagnostic algorithm step 2, determination of traumatic brain injury (TBI) without post-traumatic amnesia (PTA).
TBI ratings from the final diagnostic algorithm are shown in Table 5. It classified 55 (83%) subjects as positive for Probable mTBI in sample 1 and 30 (81%) subjects in sample 2, proportions very similar to physician consensus.
Table 6 shows the performance of the diagnostic algorithm compared to the physician consensus, the proxy gold standard, within sample 2, the cross-validation sample.
The VCU rCDI-B automated algorithm achieved near-perfect prediction in comparison with the physician consensus, given that the algorithm and consensus agreed for 97% of participants (95% CI, 86, 100). Cohen's kappa was 0.91 (ASE=0.09; 95% CI, 0.73, 1.00), reflecting almost perfect agreement. The other measures of agreement also reflect this near-perfect prediction: Sensitivity and specificity of the algorithm were 0.97 (95% CI, 0.84, 0.99) and 1.00 (95% CI, 0.61, 1.00), respectively, and the PPV and NPV were 1.00 (95% CI, 0.89, 1.00) and 0.86 (95% CI, 0.49, 0.97), respectively.
Discussion
As noted earlier, published TBI interview instruments have unknown diagnostic accuracy for previous mTBI, so unstructured clinician interview remains the most widely accepted gold standard method. 33,34 Nevertheless, such freestyle mTBI interviews are susceptible to interviewer differences, including bias, and they lack transparency, both of which limit interpretability when used in research. There is also a complete absence of inter-rater reliability data of unstructured mTBI interviews. Veterans Health Administration (VHA) administrative data suggest that reliability is weak given extreme intersite variability in the proportion of positive versus negative mTBI diagnosis determinations made during comprehensive clinical evaluation of TBI screen positives (VA intranet site not accessible to public; comparable public Internet site can be found at
In the current study of physician ratings of a combined structured and unstructured interview, the range of probable mTBI diagnoses within both samples was very wide (sample 1 range, 58–93%; sample 2 range, 59–86%). Strength of pair-wise inter-rater reliability (e.g., kappa coefficients) was highly variable, ranging all the way from slight (κ=0.19) to almost perfect (κ=0.82). On the mTBI likelihood ordinal scale, paired agreement measures that accounted for magnitude (e.g., weighted kappa coefficients) had similarly wide ranges (sample 1, κ=0.283–0.641). These levels of agreement involving experts with extensive training and experience in mTBI are less than ideal and echo the VHA administrative data. Although methodology limitations exist, our data suggest that individual “clinical judgments” in determining a historical mTBI after blast are not reliable enough for research use as the proxy gold standard.
In an effort to increase standardization of the mTBI interview and determination process for research purposes, we developed a fully structured, transparent, and automated algorithm from our mTBI interview. The interview itself probed for all potential symptoms that might indicate immediate AOC. It also included two important quality assurance structural features; the first was intended to minimize false-negative responses to the detailed, but potentially abstract, questioning on amnesia, and the second was intended to minimize false positives for self-reported LOC.
The final diagnostic algorithm represents the study clinicians' collective interpretation of the interview data. In essence, it provides a clinician group consensus on an operational definition of historical blast-related mTBI. The algorithm incorporates the various amnesia symptom items to check for logical consistency with PTA, such that certain combinations are considered nonphysiological for mTBI (such as remembering the blast, having a retrograde memory gap, and not having an antegrade memory gap). The algorithm also weighs the relative importance of other AOC symptoms in recognition that they have differing specificity for mTBI after blast. For example, the clinician raters considered the AOC symptom “dazed” as least specific for mTBI. Historically, dazed has been considered a controversial symptom for indicating TBI-induced AOC to the extent that published standardized diagnostic criteria differ on its inclusion. 8 In our algorithm, “dazed” must be accompanied by either “confused” or “saw stars” in order to diagnose a blast-related mTBI, whereas a stand-alone symptom of “confused” or “saw stars” indicates that an mTBI was sustained if the person also endorsed that their head was struck. The endorsement of “saw stars” is conventionally construed as a patient's concrete portrayal of altered consciousness, but the study physician(s) may have also construed it as a transient focal neurological sign. Finally, a minimum and maximum time frame was used to define physiological duration of AOC resulting from mTBI, such that participants who reported a <30 sec duration or still having the symptom many months after blast were considered negative for that symptom.
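To make the non-PTA symptom rules in this paragraph concrete, the Python sketch below encodes them as a small function. It is a rough illustration built only from the prose here and is not the published algorithm, which is defined by Figures 1 and 2. Several points are assumptions: the symptom keys and function names are invented, the upper duration bound is taken as 24 h from the Results description, and the handling of “confused” together with “saw stars” (without “dazed”) is a guess, because the text does not state it.

```python
from typing import Dict, Optional, Set

MIN_SECONDS = 30             # durations under 30 s are not counted (from the text)
MAX_SECONDS = 24 * 60 * 60   # ASSUMPTION: upper bound taken as 24 h

AOC_SYMPTOMS = {"dazed", "confused", "saw_stars"}  # hypothetical item keys


def valid_symptoms(durations: Dict[str, Optional[float]]) -> Set[str]:
    """Endorsed non-PTA symptoms whose duration lies in the accepted window.

    `durations` maps a symptom key to its duration in seconds, or to None
    when the symptom was not endorsed.
    """
    return {
        name for name, secs in durations.items()
        if name in AOC_SYMPTOMS
        and secs is not None
        and MIN_SECONDS <= secs <= MAX_SECONDS
    }


def probable_mtbi_without_pta(durations: Dict[str, Optional[float]],
                              head_struck: bool) -> bool:
    """Sketch of the step applied when PTA has not been established."""
    symptoms = valid_symptoms(durations)
    if len(symptoms) >= 2:
        # "Dazed" plus another symptom is positive per the text; treating
        # "confused" plus "saw stars" the same way is an ASSUMPTION.
        return True
    if symptoms in ({"confused"}, {"saw_stars"}):
        # A stand-alone "confused" or "saw stars" needs a head strike.
        return head_struck
    # No valid symptom, or isolated "dazed".
    return False
```

Keeping the duration window and the combination rules in separate functions reflects the point made later in the Discussion: if a future operational definition changes one rule, stored item-level responses can be rescored without re-interviewing.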
The lack of sound inter-rater reliability found on the interpretation of the interview content highlights the importance of this study's development and preliminary validation of a novel, fully structured interview and algorithm. The combined interview and algorithm tool may permit better diagnostic transparency and standardization for clinical research in a population where definitive diagnostics have proven elusive. The rigorous methodology to develop the algorithm included blinding of physician ratings. The structure of the interview, including no follow-up questions on the open-ended portion, served to remove all interviewer bias during symptom history probing and gathering. The results should best generalize to settings where individuals with a high probability of having sustained a previous blast-related TBI are evaluated, such as VA Polytrauma clinics. The definition of a blast experience used as the inclusion criterion in this study was similar to the OEF/OIF TBI screen used by the VHA, and the distribution of positive mTBI diagnoses in our study is similar to some regional administrative data from the VHA comprehensive TBI evaluations (VA intranet site not publicly accessible; a comparable public Internet site is found at
It is unknown whether these findings will generalize beyond blast-related mTBI populations. They may not be applicable to other settings of possible mTBI where there is a lower chance of psychological trauma accompanying the traumatic force. For example, the symptom of “dazed” may have greater specificity for clinician-determined mTBI incurred during athletic or recreational activities. Nevertheless, even if the pattern and combination of endorsed symptoms turn out to be different across trauma settings, researchers may find it advantageous to gather information using the VCU-rCDI-B given its tight standardization. The VCU-rCDI-B can readily be adapted for blunt mechanisms of mTBI by simply substituting “____” (event) for “blast” (event). Although testing for diagnostic accuracy in a nonblast event population is recommended, we have developed the VCU-rCDI-G (general version), which can be provided upon request.
The primary advantage of using this instrument, as compared to unstructured or existing semistructured interviews, will likely be in research settings, especially multi-center studies where a high degree of standardization is crucial. Even if future research indicates that the operational definition of blast-related mTBI should differ from our clinician consensus, collecting the data with this tool will facilitate data reanalysis using that more valid definition. For example, if it is subsequently determined that “saw stars” is 100% specific for AOC from mTBI regardless of whether “head (was) struck,” then the algorithm could simply be adjusted and the data readily reanalyzed.
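The reanalysis scenario can be illustrated as follows. This is a minimal sketch covering only the AOC symptom rules, under the assumption that item-level responses are stored per participant; the record layout, the `saw_stars_needs_head_strike` switch, and the four records themselves are hypothetical and are not study data.

```python
def classify(record, saw_stars_needs_head_strike=True):
    """Apply the AOC symptom rules to one stored interview record."""
    symptoms = record["symptoms"]
    supported = "dazed" in symptoms or record["head_struck"]
    if "saw stars" in symptoms:
        # Under the revised definition, "saw stars" alone is sufficient.
        if supported or not saw_stars_needs_head_strike:
            return True
    if "confused" in symptoms and supported:
        return True
    return False


# Made-up records for illustration only.
records = [
    {"id": "A", "symptoms": {"saw stars"}, "head_struck": False},
    {"id": "B", "symptoms": {"dazed"}, "head_struck": True},
    {"id": "C", "symptoms": {"confused"}, "head_struck": False},
    {"id": "D", "symptoms": {"dazed", "saw stars"}, "head_struck": False},
]

original = {r["id"]: classify(r) for r in records}
revised = {r["id"]: classify(r, saw_stars_needs_head_strike=False)
           for r in records}

# Participants whose classification changes under the revised definition.
changed = [pid for pid in original if original[pid] != revised[pid]]
print(changed)
```

Because the stored responses are item-level rather than a single clinician judgment, only the rule changes between the two passes; the data are not re-collected.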
The VCU-rCDI-B may also be valuable to some clinicians, clinician extenders, and trainees who have minimal expertise in TBI evaluations and could be useful for administrative purposes, such as estimating prevalence. Experienced clinicians are less likely to fully adopt a highly structured interview, which may be perceived as less personal, more time-consuming, and disruptive to the “art” of interview. Nevertheless, they may find certain elements of this instrument useful to incorporate into their own interview process. It is also important to note that the study algorithm and its resulting operational definition of historical blast-related mTBI were based on the rating clinicians' interpretation of the interview responses. This consensus method of “collective wisdom” served to negate the threat of inadequate inter-rater reliability, but it did not negate the threat of collective bias. Furthering the concern for collective bias is that, although they all trained at differing institutions, the physician raters all practiced at the same institution at the time of the study. This and other internal validity threats could potentially be surmounted with a prospective study using immediate assessments to serve as the gold standard, although the combat theater setting presents an imposing logistical challenge for such a study.
Another limitation of this study was the small sample size, a total of 103 participants split into two samples. CIs are included to permit readers to draw their own conclusions about this limitation. Also, the physician raters did not directly interview the participants and thus had to rely only on a completely open-ended interview in combination with the fully structured portion. The process did not permit the “art” of the interview, with adjusting and shaping of questions based on the interviewee's responses and the interviewer's immediate interpretation. To permit this would have biased the interviewee by “priming” him or her for subsequent interviewers. The previously mentioned threat of recall bias could have been further confounded by symptom exaggeration from unknown secondary gain factors. Though the algorithm minimized this to some extent by the aforementioned vetting of illogical responses, we did not specifically assess this with a validated symptom falsification measure. Last, in interpreting the inter-rater agreement data in this study, the inherent limitations of the kappa and other agreement coefficients should be considered.37 These include biases owing to the unbalanced proportion of consensus mTBI-positive versus consensus mTBI-negative participants and varying patient and residual disease characteristics within our samples.
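The sensitivity of kappa to unbalanced prevalence can be shown with a short worked example. The sketch below computes Cohen's kappa from a 2×2 agreement table using the standard formula, kappa = (p_o − p_e) / (1 − p_e); the two tables are invented for illustration and are not the study's rating data.

```python
def cohens_kappa(table):
    """Cohen's kappa for two raters and two categories.

    table = [[a, b], [c, d]], where a = both raters positive,
    d = both negative, and b, c = the two kinds of disagreement.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from each rater's marginal positive/negative rates.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Both tables have 90% raw agreement (made-up counts, 100 cases each).
balanced = [[45, 5], [5, 45]]      # about half the cases rated positive
unbalanced = [[85, 5], [5, 5]]     # about 90% of the cases rated positive

print(round(cohens_kappa(balanced), 3))     # 0.8
print(round(cohens_kappa(unbalanced), 3))   # 0.444
```

With identical raw agreement, kappa falls from 0.8 to about 0.44 when most cases are positive, because the chance-expected agreement rises from 0.50 to 0.82.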
Conclusions
Expert physicians had widely varying degrees of agreement for their blinded ratings of the VCU-rCDI-B, a combined structured and unstructured interview for determining historical mTBI after blast exposure. To minimize the influence of inter-rater unreliability and maximize transparency and standardization, we developed a diagnostic algorithm from the fully structured portion of the VCU-rCDI-B based on maximum fit with the consensus of the clinician ratings. By this algorithm, an individual was positive for probable previous blast-related mTBI if the endorsed memory gap pattern was consistent with the physiological construct of PTA, or if a witness-observed LOC was endorsed, or if certain combinations of other AOC symptoms were endorsed. Illogical memory gap patterns, illogical symptom durations, or having the stand-alone symptom “dazed” were deemed not indicative of AOC resulting from mTBI. The final algorithm had near-perfect agreement vis-à-vis the proxy gold-standard clinician consensus ratings in a small cross-validation sample. The primary advantage of the VCU-rCDI-B, as compared to unstructured or existing semistructured interviews, will likely be in research settings, especially multi-center studies. Additional testing of the psychometric properties of the algorithm component within independent groups of clinicians, and comparison studies with other existing instruments, are recommended.
Footnotes
Acknowledgments
This study was supported by a grant from the US Army Medical Research & Materiel Command, Congressionally Directed Medical Research Program (CDMRP; grant no.: W91ZSQ8118N6200001); Epidemiological Study of Mild Traumatic Brain Injury Sequelae Caused by Blast Exposure during Operations Iraqi Freedom and Enduring Freedom.
Author Disclosure Statement
No competing financial interests exist.
