Abstract
We aimed to assess the reliability and validity of the Therapy Intensity Level scale (TIL) for intracranial pressure (ICP) management. We reviewed the medical records of 31 patients with traumatic brain injury (TBI) in two European intensive care units (ICUs). The ICP TIL was derived over a 4-day period for 4-h (TIL4) and 24-h epochs (TIL24). TIL scores were compared with historical schemes for TIL measurement, with each other, and with clinical variables. TIL24 scores in ICU patients with TBI were compared with two control groups: patients with extracranial trauma necessitating intensive care (Trauma_ICU; n = 20) and patients with TBI not needing ICU care (TBI_WARD; n = 19), to further determine the discriminative validity of the TIL for ICP-related ICU interventions. Interrater and intraobserver agreement were excellent for TIL4 and TIL24 (Cohen κ: 0.98–0.99; intraclass correlation coefficient: 0.99–1; p < 0.0005). The mean ± standard deviation (SD) TIL24 in the ICU TBI cohort was significantly higher than in the Trauma_ICU patients and the TBI_WARD patients (8.2 ± 3.2 vs. 2.2 ± 0.9 and 0.1 ± 0.1, respectively; p < 0.005 for both comparisons). Correlations between the TIL scale scores and historical TIL scores, between TIL24 and the Glasgow Coma Scale, and between a range of TIL metrics and summary measures of ICP over the 4-day period, were all highly significant (p < 0.01). The results were consistent with the expected direction. A linear mixed effect analysis, accounting for within-subjects repeated measures, showed strong correlation between TIL4 and 4-h ICP (p < 0.0000005). The TIL scale is a reliable measurement instrument with a high degree of validity for assessing the therapeutic intensity level of ICP management in patients with TBI.
Introduction
There has been a growing desire to use the intensity of ICP-directed therapy as an alternative biomarker in this context. 3–5 Many different therapies may be used for the control of ICP, often simultaneously. This poses major difficulties in clinical TBI research, because the individual effect of a study therapy can be obscured by adjustments in any or all other therapies. 4 Integration of all known and relevant ICP-directed treatments into a single summary score could therefore be useful in conducting research studies, allowing better comparison between management approaches and outcome variables between centers and countries.
In 1987, Maset and associates 6 proposed a Therapy Intensity Level scale (TIL) to assess the intensity of ICP-lowering management with a 15-point scale. This scale has since been used in several trials as a secondary outcome despite important limitations, 5,7–11 such as showing a ceiling effect (i.e., scoring at maximal levels whenever barbiturates are given), not including a complete range of interventions, and being labor-intensive because of the need for hourly assessments.
A novel approach to assessing TIL, which sought to address some of these issues, was developed as part of the Interagency Common Data Elements scheme. 12 The summary score was designed to be consistent with the pediatric intensity level of therapy (PILOT) scale, proposed by Shore and colleagues. 4 The novel adult TIL has been broadly accepted by the neurotrauma research community, but has not as yet been subjected to validation. Such analysis is important to confirm that it effectively documents therapeutic intensity of ICP-directed measures, rather than a diagnosis of TBI, injury severity, non-ICP specific intensive care unit (ICU) procedures, or clinical outcome. Further, such validation would need to address its consistency across repeated measurements.
In the current study, we aimed to assess the reliability and validity of the TIL for ICP management.
Methods
TIL scale
Individual ICP-targeting therapies were assigned a score based on published estimates of their relative efficacy and risks of morbidity. 12 The TIL includes eight ICP-treatment modalities, termed items (Table 1). We calculated the following TIL scale scores:
• TIL4, the numerical summary TIL scale score for every 4-h epoch.
• TIL24, a daily TIL score based on the highest score in each item per day, to provide a metric of the maximal therapeutic intensity for ICP management for the day.
• TILmax, the highest TIL24 score in the assessed 4-day period.
• TILmean, the mean of the four TIL24 scores in the assessed 4-day period.
Table 1. The scheme for Therapy Intensity Level assessment, based on Maas et al., 12 minimally adapted.
ICP, intracranial pressure; CPP, cerebral perfusion pressure; CSF, cerebrospinal fluid.
Initial problems in the pilot phase that were subsequently addressed were:
(*) Conversions between kPa and mm Hg for PaCO2 were ambiguous because of “rounding up” errors. We consequently decided to base our calculations on mm Hg, which resulted in less ambiguous cutoffs.
(#) For the 4-h assessments, we used 0.33 g/kg/4h for mannitol and 0.05 g/kg/4h for hypertonic saline to assign a score value. In the pilot phase of the study, however, these cutoffs were calculated from the 24-h thresholds (inconsistently) by individual raters. In addition, because of lack of clarity in scoring instructions, some cases scored maximally in this category for the 4-h assessment, but were wrongly not scored as maximal for the 24-h assessment, because the total dose of hyperosmolar agent did not exceed thresholds when averaged over 24 h. In a revised version, we explicitly stated that if the dose of hyperosmolar agent exceeded a given threshold in any 4-h epoch, the same score should apply to the 24-h period in which that 4-h epoch was contained.
For certain calculations—e.g., correlation analysis between the different TIL scale scores (i.e. TIL4 vs. TIL24, TILmean, TILmax)—we used a processed value and calculated a mean TIL4 per day or per 4-day period.
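The relationship between the four summary scores can be illustrated with a minimal sketch. It assumes that each 4-h epoch is stored as a mapping from item name to points; the item names and point values below are hypothetical placeholders and are not taken from Table 1.

```python
from statistics import mean

def til4(epoch):
    """TIL4: sum of the item scores recorded for one 4-h epoch."""
    return sum(epoch.values())

def til24(day):
    """TIL24: for each item take the highest score reached in any epoch
    of the day, then sum those per-item maxima."""
    items = set().union(*(epoch.keys() for epoch in day))
    return sum(max(epoch.get(item, 0) for epoch in day) for item in items)

def summarize(days):
    """Return TIL24 per day, TILmax, TILmean and the mean TIL4 per day."""
    daily = [til24(day) for day in days]
    return {
        "TIL24": daily,
        "TILmax": max(daily),
        "TILmean": mean(daily),
        "TIL4_daily_mean": [mean([til4(e) for e in day]) for day in days],
    }

# Hypothetical item names and points (assumptions, not the published scheme).
quiet = {"sedation": 1, "hyperosmolar": 0, "csf_drainage": 0}
bolus = {"sedation": 1, "hyperosmolar": 3, "csf_drainage": 0}
drain = {"sedation": 1, "hyperosmolar": 0, "csf_drainage": 2}

day1 = [quiet, bolus, quiet, drain, quiet, quiet]  # six 4-h epochs
day2 = [quiet] * 6
result = summarize([day1, day2])
```

Because TIL24 takes the per-item maximum over the day, a hyperosmolar score earned in a single 4-h epoch carries through to the 24-h score, which matches the revised scoring rule; the daily mean of TIL4 is lower because it averages over quieter epochs.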
Patients and data acquisition
Data from patients with TBI who were admitted to the neurocritical care unit of the University Hospital Groningen (n = 16) and Addenbrooke's Hospital Cambridge (n = 32) from May 2012 until December 2013 were collected and screened. Patients in the latter cohort were part of an existing approved research study (29 REC 97/291), while use of data from patients in the former cohort was permitted after review by the local medical ethics committee, which waived the need for informed consent. Only patients older than 16 years, with at least 4 consecutive days of continuous high-resolution ICP monitoring starting within 48 h of the incident, were initially included for the ICU stratum (TBI_ICU). Thirty-one patients fulfilled these criteria.
All ICU patients in the TBI group were sedated, intubated, and mechanically ventilated. Both centers used (different) protocol-based ICP management strategies. We collected patients' demographics and baseline clinical data for the prognostic assessment using the extended International Mission for Prognosis and Analysis of Clinical Trials in TBI (IMPACT) scores. 13 Clinical outcome was measured in most patients with TBI at 6 months using the Glasgow Outcome Scale (GOS). 14–17 In nine patients (none of whom had severe TBI), a favorable outcome was charted at an earlier time point (at the time of clinical or research follow-up), and patients did not return for further follow-up. These patients were assumed to have carried their favorable outcome out to the 6-month follow-up point.
For the calculation of TIL, medical and surgical interventions for managing intracranial hypertension were extracted from clinical records, including amount of cerebrospinal fluid (CSF) drained, treatment of fever, and whether or not an active cooling protocol was in place. Points for surgical interventions were assigned to the period when the intervention took place and included in every successive period onward. We additionally extracted dosages of all sedative and vasoactive drugs, neuromuscular blockers, hyperosmolar agents, and barbiturate administration.
PaCO2 values were directly obtained from measurements made during each 4-h epoch in the majority of instances. In fewer than 10% of epochs (73 of >800 epochs), however, no contemporaneous PaCO2 value was available within the 4-h period. In such instances, the PaCO2 was derived from the end-tidal CO2 (etCO2) measurement, using the nearest PaCO2 value to correct for the PaCO2-etCO2 gradient, and making the assumption that this had not changed.
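The substitution described above amounts to a one-line correction, sketched here under the stated assumption that the PaCO2-etCO2 gradient is unchanged since the nearest paired measurement. The function name and the example values are hypothetical.

```python
def estimate_paco2(etco2_now, paco2_ref, etco2_ref):
    """Estimate PaCO2 (mm Hg) from a current end-tidal CO2 reading.

    paco2_ref and etco2_ref are the nearest paired measurements; their
    difference (the PaCO2-etCO2 gradient) is assumed to be unchanged.
    """
    gradient = paco2_ref - etco2_ref
    return etco2_now + gradient

# Hypothetical values: nearest blood gas 38 mm Hg with etCO2 34 mm Hg,
# current etCO2 32 mm Hg, so the estimate adds the 4 mm Hg gradient.
estimated = estimate_paco2(etco2_now=32.0, paco2_ref=38.0, etco2_ref=34.0)
```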
A range of summary metrics of ICP were calculated for correlations:
• ICP4, mean ICP within 4-h period
• ICP24, mean ICP within 24-h period
• ICPmax, highest ICP24 in the assessed 4-day period
• ICPmean, mean of the four ICP24 values in the assessed 4-day period
Two control groups were defined, selected from patients admitted to Addenbrooke's Hospital in Cambridge between November 2012 and April 2015. Patients with extracranial trauma admitted to the Neurocritical Care Unit were randomly selected, screened, and included if TBI could be reasonably excluded based on history, examination, and neuroimaging findings (Trauma_ICU, n = 20). In most of these patients, we had a reliable post-injury Glasgow Coma Scale (GCS) of 15 and normal neuroimaging.
In a minority (Table 2), severe extracranial injury meant that we were either unable to obtain a post-resuscitation GCS, because the patient was intubated for cardiorespiratory instability, or received substantial doses of opioids or ketamine for analgesia before a reliable GCS could be recorded. None of these patients had any TBI-directed therapy, and all had neuroimaging and a subsequent clinical course that excluded any significant TBI. A second group consisted of patients with mild/moderate TBI directly admitted to a ward for observation and treatment of extracranial trauma (n = 19), and who needed no ICP-specific therapies.
TBI_ICU: patients with traumatic brain injury (TBI) and elevated intracranial pressure in need of care in the intensive care unit (ICU).
Trauma_ICU: patients with extracranial trauma in need of intensive care.
TBI_WARD: patients with mild/moderate TBI not needing intracranial pressure (ICP)-directed therapy or ICU admission.
GOS: Glasgow Outcome Scale.
IMPACT: Outcome prediction from the International Mission for Prognosis and Analysis of Clinical Trials in TBI (IMPACT) scheme (Mort: mortality; UO: unfavorable outcome; both calculated from the extended IMPACT calculation [core model + CT + Lab]).
GCS: Glasgow Coma Scale; data and age shown are mean ± standard deviation or frequency and percentage; (-) not assessed.
In three patients with extracranial trauma, no reliable pre-sedation GCS was available;
for 10 patients, 6-month GOS data were not available at the time of data analysis, but because favorable outcome was achieved at an earlier follow-up point, this was assumed to have been maintained subsequently for the purposes of this analysis.
Fisher exact test.
p-value and Bonferroni corrected p-value for TBI_ICU vs. all other groups.
TIL: Therapy Intensity Level scale.
Demographic data from all three groups were compared using the chi-square test for categorical variables and analysis of variance with the Bonferroni post hoc test for continuous variables. Whenever the criteria for a chi-square test were not met, the Fisher exact test was used instead.
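As an illustration of the fallback from the chi-square test to the Fisher exact test, the sketch below handles the 2 × 2 case only. The rule that the chi-square criteria fail when any expected cell count is below 5 is a common rule of thumb and an assumption here; the text does not state which criteria were applied, and the function names are hypothetical.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence for an r x c table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]

def needs_fisher(table, minimum=5):
    """Assumed rule of thumb: any expected count below `minimum`."""
    return any(e < minimum for row in expected_counts(table) for e in row)

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table with the same margins that is no more likely
    # than the observed one (small tolerance for float comparison).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

table = [[3, 1], [1, 3]]  # hypothetical counts
p_value = fisher_exact_2x2(3, 1, 1, 3) if needs_fisher(table) else None
```

Comparing a categorical variable across all three groups would need the r × c generalization of the exact test, which this sketch does not attempt.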
Reliability assessment
We assessed interrater and intrarater reliability in a random subset of 10 patients with TBI. TIL4 and TIL24 were calculated independently by two blinded investigators (PZ, JLG; interrater). After a washout interval of 3 months and blinded to the initial TIL scale scores, one investigator (JLG, intrarater) repeated the measurement. We calculated the Cohen κ and intraclass correlation coefficients (ICC) to better compare our data with that of the literature. 4
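For readers unfamiliar with the agreement statistic, a minimal sketch of the unweighted Cohen κ follows. It treats each TIL score as a category; whether an unweighted or a weighted κ was used is not stated in the text, so the unweighted form is an assumption, and the ratings shown are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen kappa for two raters scoring the same epochs."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of epochs on which the raters agree exactly.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)

# Hypothetical TIL4 scores from two raters for eight epochs.
scores_rater_1 = [0, 1, 2, 2, 3, 3, 4, 4]
scores_rater_2 = [0, 1, 2, 2, 3, 4, 4, 4]
kappa = cohen_kappa(scores_rater_1, scores_rater_2)
```

In this example the raters agree on 7 of 8 epochs (0.875) against a chance agreement of 0.21875, giving κ = 0.84.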
Validity assessment
In accordance with the recommendations from the Consensus-based Standards for the Selection of Health Measurement Instruments taxonomy 18 and clinimetric literature, 19 we evaluated content validity, criterion validity and construct validity (assessed by convergent, discriminant, and discriminative validity).
Content validity
This is “the degree to which the content of an instrument is an adequate reflection of the construct to be measured.” 19 It is a subjective measure of how appropriate the items seem to a set of experts. This is not quantified with statistics, and given the derivation and acceptance of our TIL scheme by experts, this was assumed to exist.
Criterion validity
This is “the degree to which the scores of a measurement instrument are an adequate reflection of a gold-standard.” 19 We compared the TIL scale with the grading system suggested by Maset and associates 6 (TIL_Maset), mindful of its limitations. We calculated TIL4 and TIL24 in a subset of TBI_ICU patients and tested our hypothesis of a positive correlation of moderate to strong magnitude using the Spearman rho.
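The rank correlation used here can be sketched as the Pearson correlation of average ranks, which handles the tied scores that integer scales such as the TIL produce. This is a generic illustration, not the SPSS routine used in the study, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

A perfectly monotonic pairing returns 1 (or −1 if reversed) regardless of the spacing between scores, which is why the statistic suits scales whose point values are ordinal rather than interval.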
Construct validity
This was quantified to evaluate our expectations regarding how the measurement instrument related to known parameters. 19 We evaluated construct validity by assessing convergent (assessing positive or negative correlations with similar constructs), discriminant (assessing for correlations with measurement instruments measuring different constructs), and discriminative (assessing for ability to differentiate between known groups) validity as follows:
Convergent validity
This was evaluated by testing our expectations of a negative correlation of moderate to strong magnitude between TILmean and TILmax and the GCS, and of a positive correlation of moderate magnitude between TIL4/TIL24/TILmax/TILmean and ICP4/ICP24/ICPmax/ICPmean. In addition, we expected positive correlations of strong magnitude between TIL subtypes: TIL24 vs. TIL4 (daily mean), TILmean vs. TIL4 (averaged over 4 days), TILmean vs. TILmax, and TILmax vs. TIL4 (averaged over 4 days). The TIL scale quantifies therapeutic intensity and is not intended to predict clinical outcome. Any clear correlations between outcome and the TIL scale are therefore not expected.
Discriminant validity
To assess discriminant validity, we therefore hypothesized that there is no correlation between the TILmean and TILmax and outcome (GOS, IMPACT) in the TBI_ICU group alone, and at best a weak negative correlation in the combined TBI group (TBI_ICU and TBI_WARD), in agreement with the PILOT study. 4 For most of the variables used to assess convergent validity, simple nonparametric statistical tests were used (Spearman rho). This approach was not appropriate, however, for examining the relationship between TIL4 and ICP4, because the multiple estimates of ICP in each individual were not independent. We therefore additionally used linear mixed effects (LME) regression techniques to examine this relationship, using the lme4 package (v. 1.1-8) in R (v. 3.2.1; R Foundation for Statistical Computing, Vienna, Austria).
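A random-intercepts specification of the kind reported in the Results can be written out as a sketch. It assumes ICP4 as the response and TIL4 as the fixed effect; the text does not state which variable was modeled as the response, and the symbols below are introduced only for illustration.

```latex
\mathrm{ICP4}_{ij} = \beta_0 + \beta_1\,\mathrm{TIL4}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here i indexes patients and j indexes 4-h epochs within a patient, β1 is the population-level association being tested, and the patient-specific intercept u_i absorbs the correlation among repeated measurements from the same individual. In lme4 formula notation, and with hypothetical column names, this would correspond to `ICP4 ~ TIL4 + (1 | patient)`; adding random slopes would give `ICP4 ~ TIL4 + (1 + TIL4 | patient)`.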
Discriminative validity
This was assessed by testing the hypothesis that the TIL scale can accurately discriminate cases (the TBI_ICU cohort) from controls (the TBI_WARD and Trauma_ICU cohorts) by comparing TILmean and TILmax between groups using the Kruskal-Wallis test with follow-up testing for pairwise comparison with adjusted p values.
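The omnibus comparison can be illustrated with a minimal sketch of the Kruskal-Wallis H statistic with tie correction. It is a generic implementation rather than the SPSS procedure used in the study; the closed-form p value holds only for three groups (2 degrees of freedom) and relies on the usual chi-square approximation. The group values shown are hypothetical.

```python
from collections import Counter
from math import exp

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with tie correction for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Average 1-based rank of each distinct value (ties share the mean).
    first = {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
    counts = Counter(pooled)
    rank = {v: first[v] + (counts[v] - 1) / 2 for v in counts}
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("H is undefined when all values are identical")
    return h / correction

def p_value_three_groups(h):
    """Chi-square survival function for 2 df, which equals exp(-h / 2)."""
    return exp(-h / 2)

# Hypothetical, fully separated groups of three observations each.
h_stat = kruskal_wallis_h([7, 8, 9], [4, 5, 6], [1, 2, 3])
p_approx = p_value_three_groups(h_stat)
```

Pairwise follow-up comparisons with adjusted p values, as used in the study, are a separate step that this sketch does not cover.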
Statistical analysis
All statistical analyses, other than the LME analysis, were undertaken using IBM SPSS Statistics, v. 22. Where multiple analyses were undertaken using the same pairs of data, or their derivatives, we applied a Bonferroni correction to our p values.
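The Bonferroni correction mentioned above can be sketched as multiplying each p value by the number of comparisons in the family and capping the result at 1. How the comparisons were grouped into families in the study is not specified, so the grouping and the p values in the example are assumptions.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p values for one family of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical family of three correlated analyses.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```

In this example only the first comparison remains below 0.05 after adjustment, which illustrates how a nominally significant result can fail to survive correction.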
Results
Patient demographics, along with GCS, IMPACT, GOS (where appropriate), and the different TIL scale scores are shown in Table 2. Data were collected from patients in each of the three groups (total n = 70). The groups did not differ in terms of age or sex.
The instrument showed high intrarater reliability for both TIL24 and TIL4 measurements. An assessment of interrater reliability resulted in a Cohen κ of 0.981 with an ICC of 0.999 (p < 0.0005) for the TIL4, and perfect agreement for the TIL24 (Table 3).
ICC, intraclass correlation coefficient; TIL, Therapy Intensity Level scale.
Reliability assessment with Cohen κ and ICC (aimed for agreement) after pilot phase testing with subsequent rulebook optimization.
Validity metrics for our TIL score are shown in Table 4, Figures 1 and 2, and Supplementary Figure 1. The TIL24 and TIL4 showed moderate correlations with the corresponding historical scores (TIL_Maset, showing criterion validity), and with the GCS and ICP (showing convergent validity). The direction and strength of these correlations were all in keeping with our a priori predictions.

Correlation between the Therapy Intensity Level (TIL) scale score and intracranial pressure (ICP).

Construct, discriminative validity of the Therapy Intensity Level (TIL) scale. Kruskal-Wallis test, independent samples. Boxplot shows TILmean.
TIL, Therapy Intensity Level scale; GCS, Glasgow Coma Scale; ICP, intracranial pressure; GOS, Glasgow Outcome Scale; TBI, traumatic brain injury; ICU, intensive care unit; UO, unfavorable outcome; IMPACT, International Mission for Prognosis and Analysis of Clinical Trials in TBI.
Correlation coefficients regarding criterion and construct validity. The results were all in agreement with the predictions (*) made before data analysis.
A random-intercepts linear mixed effects model, grouped by patient and fitted to correct for the nonindependence of multiple measurements of TIL4 and ICP4 within patients, showed a significant positive association (p < 0.0000005). This association remained significant (p < 0.0005) when both random intercepts and random slopes were included, providing strong evidence for an underlying population-level relationship between these parameters.
Patients in the TBI_ICU stratum showed significantly higher TILmean and TILmax values than the two control cohorts (TILmean ± SD: 8.2 ± 3.2 vs. 2.2 ± 0.9 and 0.1 ± 0.1; TILmax ± SD: 9.9 ± 3.7 vs. 3.4 ± 1.4 and 0.2 ± 0.4, for groups TBI_ICU, Trauma_ICU and TBI_WARD, respectively). A Kruskal-Wallis test comparing multiple independent samples showed H(2) = 60.55, p < 0.0005 for TILmean and H(2) = 59.39, p < 0.0005 for TILmax. Pairwise comparisons with adjusted p values showed significant differences between all groups for TILmean and TILmax: TBI_ICU vs. Trauma_ICU (p < 0.005), TBI_ICU vs. TBI_WARD (p < 0.0005), and Trauma_ICU vs. TBI_WARD (p = 0.007). The correlations between TIL and predicted or observed outcome did not survive Bonferroni correction for multiple comparisons.
The TIL4 and TIL24 showed high correlation within our scheme, showing that the daily measure of TIL (TIL24) was an acceptable summary metric of ICP therapy intensity. The TILmax showed strong correlation with TILmean and less strong, but still highly significant correlations with the TIL4 (mean over 4 days), suggesting that abstraction of the higher intensity interventions performed in each 24-h period still provided an acceptable (albeit less faithful) measure of the TIL (Supplementary Figures 2 and 3 and Supplementary Table 1).
Discussion
We show that the TIL scale score can be obtained retrospectively in patients with TBI and that it has excellent inter- and intrarater reliability with minimal measurement error, both for 4-h and 24-h assessments. These results are in agreement with the PILOT study, 4 which showed an ICC of 0.91 for interrater and 0.94 for intrarater reliability.
We judge the content validity of the TIL scale to be very high, because it was based on the consensus article on standardizing data collection in TBI 12 developed by an international expert panel.
Criterion validity requires that the test of interest be judged against a “gold standard” for assessing the same variable or concept. A suitable gold standard is less likely to exist for more abstract concepts. In this case, we compared our TIL scale against the “best available alternative” comparator, the TIL grading system described by Maset and coworkers, 6 despite its recognized limitations and lack of formal validation. We found a high correlation between these two variables, as expected, which supports criterion validity as part of the iterative process of validation. Although we had no patients in our cohort who received metabolic suppression with barbiturates or other anesthetic agents, our scoring system would (self-evidently) not suffer from the ceiling effects that have been seen to be a problem with the TIL_Maset, even if this intervention was deployed. 7,10,11
Convergent validity was demonstrated by showing a correlation of expected magnitude and direction between TIL vs. GCS and TIL vs. ICP. These correlations were broadly in agreement with corresponding figures in the PILOT study. Our expectation of only a moderate to strong correlation between GCS and TIL was based on the fact that some patients (e.g., those with diffuse axonal injury) who present with low GCS may experience no problems with intracranial hypertension, and hence achieve low TIL scores, even with prolonged ICU stays. Critically, the relationship between TIL and ICP was retained with the application of a linear mixed effects modeling approach, which accounted for the repeated measures of ICP within individuals, thus ensuring the reliability of our results.
We show that our TIL scale can accurately discriminate between different treatment groups in intensive care (TBI_ICU vs. Trauma_ICU), and between TBI patients treated within and outside the ICU environment (TBI_ICU vs. TBI_WARD), suggesting that the items that it includes are specific for the ICU management of TBI (which is prominently targeted at intracranial hypertension). One drawback of our Trauma_ICU population was the fact that a reliable GCS could not be obtained pre-sedation in a minority, although the overall clinical course in these patients excluded a significant TBI. In any case, if these patients did have an undetected mild TBI, this confound would work against the discriminative validity of the TIL.
Additional evidence supporting its discriminant validity comes from the demonstration that the negative correlation between TIL and outcome (using a dichotomized GOS) is present but weak in the combined group of TBI patients (n = 40 with mild to severe TBI), a finding that is concordant with the PILOT study results, whereas there was no correlation between TIL and outcome within the TBI_ICU cohort. The former relationship reflects ICP-lowering therapies in the TBI_ICU group that are absent in the TBI_WARD patients, while the absence of this relationship in the TBI_ICU group on its own suggests that TIL itself does not correlate with outcome, because successful treatment of refractory intracranial hypertension (with very high TIL scores) can be associated with good outcomes.
The choice of 4-h or 24-h epochs was arbitrary. Our results show no difference in reliability or validity between these two approaches, with strong and statistically significant correlations between the two TIL scores. In addition, the correlations that we show for demonstrating validity apply equally to both scores. Concerns about practicability, burden of data capture, and time investment may favor the use of the TIL24 in many settings. Where the research question requires this (e.g., evaluation of a new pharmaceutical approach to lowering ICP), however, a more frequent assessment of TIL may be justified. On the other hand, we also showed less strong, but still reasonable correlation between the TILmax and the TIL4. Given that the TILmax is relatively easy to abstract from clinical notes, this may provide a reasonable alternative in resource-limited research settings.
The positive correlation of TIL4 with 4-h ICP values is interesting. While ICP control generally remained acceptable throughout the range of TIL, this correlation implies that the actual ICP value rose slightly as therapy was progressively intensified. This perhaps suggests that clinicians balance risks and benefits in a pragmatic manner, and when applying more aggressive therapies, accept slightly higher ICP values. These data provide insights into clinician behavior that merit further investigation.
While the TIL score that we describe has many pragmatic benefits, it also has limitations. The TIL scale per se is arbitrary. Point assignment to each TIL item reflects a presumed weighting of interitem differences in therapy intensity that is impossible to determine objectively with the limited evidence available. This is inevitable, however, and reflects the very nature of any data reduction exercise.
Another limitation of this present study is the fact that it is a two-center, retrospective study, analyzing a specific—and small—population in a specific context that may not be truly representative.
In an initial pilot assessment of the scheme (data not shown), we obtained a less satisfactory κ value of 0.455 in a 4-h interrater reliability assessment. A review of our internal rules for assigning points showed the possibility of ambiguity. A revision of these rules was therefore undertaken before implementing the study (see footnote to Table 1). This experience shows that it is critical to ensure the availability of a precise and unambiguous protocol for implementation of a TIL scale. In addition, a cautionary note is merited: the results from the interrater reliability assessment may, in theory, be contaminated by practice effects.
Conclusion
We have presented evidence that the TIL scale is a reliable measurement with a high degree of validity for assessing TIL of ICP management in patients with TBI in an ICU. Further studies are warranted, ideally prospective and across heterogeneous centers in a larger population with different preferential therapeutic approaches, to evaluate the generalizability of this measurement instrument.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
References
