Dear Editor,
A majority of A-J & L's comments on this preliminary phase I pilot study 1 have merit, and we thank them for their insight. With respect to our choice of psychometric tests, there are hundreds of such tests, and most researchers by habit, training, or experience have a preferred general battery that they modify as needed for particular diagnoses. Our choice of instruments was guided mainly by our experience testing hundreds of patients with dozens of different chronic neurological conditions who have been treated with this hyperbaric oxygen therapy (HBOT) 1.5 ATA protocol over the past 23 years. That large cohort includes TBI patients in whom we have demonstrated positive treatment effects using the majority of the psychometric tests we employed in this pilot study. This experience, our desire to maximize enrollment and compliance with a short, widely applicable test battery in previously multiply tested subjects, our familiarity with these instruments, and the decision not to gamble with the unknown magnitude of treatment effects on new instruments in these combined diagnoses dictated the test battery. We also attempted to choose instruments that are well known, are among the most widely used, measure several different forms of memory, and have recently updated norms. For these reasons we chose the Wechsler Memory Scale, 4th ed. (WMS-IV) to measure memory instead of the Hopkins Verbal Learning Test-Revised 2 and the Brief Visuospatial Memory Test-Revised. 3 We were also not convinced that the alternate forms of these latter tests truly addressed test/retest effects. Our concerns were recently reinforced by Duff, 4 who pointed out that, although the Hopkins Verbal Learning Test-Revised 5 has six alternate forms, not all of those forms appear to be comparable. If the forms are not comparable, they cannot address practice effects.
Regarding the use of the Wechsler Abbreviated Scale of Intelligence™ (WASI) as an alternate test for the Wechsler Adult Intelligence Scale, 4th ed. (WAIS-IV), we agree with A-J & L's criticisms and note that there are criticisms for either choice we could have made. A-J & L acknowledge the challenge of reducing practice effects in repeat testing and the desire for alternate forms. We share these sentiments. As there is no alternate form for the WAIS-IV we were faced with either having a practice effect by repeating the WAIS-IV, or trying an alternate, non-ideal short IQ test (WASI) that has no practice effect with the WAIS-III, 6 our test of choice when the study was conceived. We chose the WASI option, as the lesser of two evils.
A-J & L also recommended reporting pre/post subtest scores for the WAIS-IV and WASI full-scale IQ (FSIQ) measurement, while arguing that the subtests in the two measures are different and therefore not equivalent. We do not understand this reasoning. We did not attempt a comparison of subtest scores (WASI vs. WAIS-IV) because the subtests are different and not comparable. We also did not do so for reasons of economy. The original draft manuscript reporting all of the data included 27 tables, figures, and graphs. This was condensed to the present 17, which the Journal of Neurotrauma generously allowed.
Thank you for questioning whether we used the WMS-III or WMS-IV. We thought this was obvious in both text and tables where we specify our use of the WMS-IV on four occasions (abstract, Table 1, p.173, Table 5). We simply shortened the description to “working memory,” as that was the construct being measured.
Economy was also one of the reasons we chose not to report Reliable Change Scores (RCS) in our tables as recommended by A-J & L. The RCS is the mean change divided by the standard error of that mean change. It is, in effect, the t value from which the p value of a paired Student t test is calculated; therefore, for reasonably large sample sizes, RCS above or below the value 2 can be considered to be equivalent to significance or non-significance, respectively (the 0.05 α level). The decision to include or exclude the RCS in a summary table is based on a trade-off between the desire to minimize clutter in what was already a very voluminous set of tables, quantity of data, and statistics, and the desire to present the most intuitively interpretable summary statistics. We chose the latter. A mean change having an RCS of 10 is more “impressive” than a mean change having an RCS of 4, even though both of them would have their significance displayed as “p<0.001” in a summary table. In constructing our tables, we decided that p values would provide suitable indication of significance of the changes, maintain consistency in the reporting of all of our data, and prevent an unnecessary expansion of the tables to accommodate RCS values.
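The relationship described above, in which the RCS is the mean change divided by the standard error of that mean change and so coincides with the paired t statistic, can be sketched in a few lines of Python. This is a minimal illustration only: the function name `paired_change_statistic` and the pre/post scores are hypothetical and are not the study's data or analysis code.

```python
import math
from statistics import mean, stdev


def paired_change_statistic(pre, post):
    """Return mean change divided by the standard error of the mean change.

    This is the paired t statistic computed from pre/post scores.
    The name and interface are hypothetical, for illustration only.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post must be equal-length with at least 2 subjects")
    # Per-subject change scores (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    # Standard error of the mean change: sample SD of the changes / sqrt(n).
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    if standard_error == 0:
        raise ValueError("all change scores are identical; statistic is undefined")
    return mean(diffs) / standard_error


# Made-up scores for five hypothetical subjects (not study data).
pre_scores = [90, 95, 100, 105, 110]
post_scores = [94, 97, 105, 108, 116]
print(f"mean change / SE = {paired_change_statistic(pre_scores, post_scores):.2f}")
```

A value well above 2 corresponds to a small p value on the paired t test, which is why reporting the p value alone conveys whether a change is significant but not how far beyond the threshold it lies.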
Unfortunately, some of A-J & L's other comments and recommendations missed the mark and seemed illogical. For example, A-J & L suggested on the one hand that the Phase I study 1 reported more change from the treatment than could be substantiated, yet on the other hand noted the failure to achieve a statistically significant improvement in the T.O.V.A. reaction time measure. A-J & L then suggest that we include additional PCS-specific reaction time/processing speed measures to “show change over time with improvement in neuronal functioning.” In other words, there is too much improvement on our outcome measures collectively, yet no improvement on 6 of 21 measures, one of which measures reaction time/processing speed, so we should choose additional or alternate measures of reaction time/processing speed to show improvement? We do not understand this. Further, we do not see the logic in A-J & L's selecting one variable out of the 21 reported after seeing all 21 p values, attributing a special ad hoc property to this one variable (such as special resistance to test/retest effects), and concluding that the non-significance of this one variable constitutes a convincing counterargument against our overall conclusions. Rather, it is an example (in the reverse direction) of the same type of multiplicity that A-J & L have criticized in our analysis. (See our subsequent response to their comment on multiplicity adjustments.)
Additional comments that seemed to miss the point or were confusing were the comment on Green's Word Memory Test (WMT) and, in the same paragraph, the comment on the subjects' “…subjective rating of his or her memory functioning and objective performance on neurocognitive measures.” A-J & L note that Green's WMT was not given at the post-test, and the veterans may not have given a full effort. However, if the veterans had given a substandard effort on the post-test, then the amount of change on most of the measures would have been substantially reduced. In fact, the suggestion is that the amount of change being reported is too large to be believable without the possible explanation of practice effects and placebo effects. (As A-J & L noted, we omitted reporting the results of Green's WMT in the article; however, we addressed this in our response to the Wortzel et al. Letter to the Editor; all subjects passed the composite Green score).
The comment on the subjects' estimation of their memory is a parsing of our statement in the mentioned paragraph and is taken out of context and continuity with the preceding paragraph. In the preceding paragraph we stated that the FSIQ increase as a global measure of cognitive function was consistent with the subjects' self-reported 40% cognitive improvement. We then stated that memory and frontal lobe function (simple sustained attention, working memory, and more complex attention), not memory alone, as A-J & L state, improved from what would appear to be “average” or “normal” levels to what the subjects considered to be more their “normal” levels. We were describing a self-reported change in cognition that corresponds to a variety of cognitive measures, not just memory. We respect A-J's work in this subject matter, but the references that A-J & L cite describe a body of literature much more complicated than their simple statement, the nuances of which cannot be dissected here because of space limitations. The Sawrie reference, 7 for example, describes subjective and objective memory decline after anterior temporal lobectomy for seizures. Sawrie reported that objective verbal memory decline post-surgery did not predict self-reported memory decline; however, self-reported improvement in memory was consistent with objective improvement in memory. In the Pearman study 8 on healthy young adults, subjective (and derivatively, objective) health was related to objective memory performance and was the sole predictor of self-rated memory ability. Our veterans were in poor health, reported that they experienced subjective improvement in their health and quality of life, as measured by physical, social, and cognitive measures of health on the Modified Perceived Quality of Life Questionnaire, objectively improved their cognition and brain blood flow, and reported that they experienced improved cognition.
Our subjects' claim of self-reported improvement in cognition seems to be supported by both the data and the two A-J & L references that directly address the issues pertinent to our subjects' estimation of their improvements in cognition.
A-J & L wondered about “…the potential contribution of subject selection in the reported findings,” that is, the inclusion of 2 moderate TBI subjects with the 13 mild TBI subjects. Because of the small sample size, we did not have the ability to answer this question; however, in our future analysis of the subsequently recruited 14 additional subjects, we will have 3 additional moderate TBI subjects for a total of 5 out of the 29 treated subjects. We might be able to answer A-J & L's question with subgroups of 5 and 24 subjects.
A-J & L have said that we did not apply suitable corrections to control the “familywise error” rate to some conventional pre-specified level (such as 5%) on our cognitive measures. We would point out that this was not a late-phase clinical trial in which final definitive inferences were going to be drawn and for which we needed to hold the probability of even one “false positive” significance conclusion among the 21 variables analyzed to the pre-specified α level. In this study, as in all exploratory studies, all conclusions are tentative, and p values are generally considered descriptive, not inferential. Furthermore, even though there is an ∼66% chance that random fluctuations alone (in the absence of any true HBOT efficacy) could cause at least one of the 21 variables to have p<0.05, there is only about one chance in 1,000,000,000,000,000 (1 × 10^15, a quadrillion) that random fluctuations alone could cause 15 of the 21 variables to have p<0.05. If we combined this figure with the chance of random fluctuations explaining the associated imaging findings, the probability of a chance explanation for all of our findings would be many orders of magnitude smaller.
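The arithmetic behind these two figures can be sketched with a simple binomial calculation. The sketch rests on an assumption that is not stated in the text: that the 21 tests behave as independent trials, each with a 0.05 chance of a false positive when there is no true effect. Correlated cognitive measures would change the exact numbers. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, alpha: float) -> float:
    """P(at least k of n tests reach p < alpha) under the null hypothesis.

    ASSUMPTION: the n tests are independent, so the count of false
    positives follows a Binomial(n, alpha) distribution.
    """
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )


N_VARIABLES = 21  # number of cognitive variables analyzed
ALPHA = 0.05      # per-test significance level

# Chance that random fluctuation alone yields at least one p < 0.05.
p_at_least_one = prob_at_least(1, N_VARIABLES, ALPHA)
# Chance that random fluctuation alone yields 15 or more p < 0.05.
p_at_least_fifteen = prob_at_least(15, N_VARIABLES, ALPHA)

print(f"P(at least 1 of 21 significant)  = {p_at_least_one:.3f}")
print(f"P(at least 15 of 21 significant) = {p_at_least_fifteen:.2e}")
```

Under the independence assumption, the first probability comes out near 0.66 and the second on the order of 10^-15, in line with the figures quoted above.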
We stand corrected on the Benson and Hartz 9 and Concato 10 references. Thank you for pointing out this error. Our study was prospective and, therefore, higher in the hierarchy of evidence than a retrospective case series, but it is not consonant with Benson and Hartz's and Concato's definitions of observational study.
A-J & L dispute our claim that “There is no effective treatment for the combined diagnoses of PCS and PTSD.” They cite Mittenberg 11 and Ponsford 12 as evidence for our “ungrounded statement.” We would ask A-J & L to recheck these references and their statement. Both the Mittenberg and Ponsford articles report efforts to prevent, not treat, (persistent) PCS at 6 or 3 months, respectively, in mild TBI patients who were initially evaluated in hospital emergency departments and given early intervention in the emergency department or during the first week post-discharge, respectively. Neither of these articles addresses treatment, nor PCS or persistent PCS nearly 3 years after injury, as in our subjects, nor the combination of (persistent) PCS with PTSD. A-J & L further state that “symptom management in combination with soldier and family education is effective in the majority of cases.” Nearly 3 years post injury? We are not familiar with literature that supports this, A-J & L provided none, and it is inconsistent with the recent Congressional Budget Office report showing very high persistence of veterans with TBI (∼70%), PTSD (∼80%), and TBI/PTSD (∼100%) in the Veterans Administration (VA) system 4 years after entry for their diagnoses. 13 As a result, we reaffirm our statement that there is no effective treatment for the combined diagnoses of PCS and PTSD. In fact, if one employs the definition of evidence-based medicine, 14 we tender that our small preliminary phase I data are the best evidence-based medicine at this time for the combined diagnoses of mild-moderate blast-induced PCS and PTSD.
A-J & L also state that “…it may be unrealistic to find a single treatment for both conditions…It is also important to acknowledge that while (sic) sometimes comorbid PTSD and mTBI are distinct pathological entities and thus may require the application of multiple treatment methodologies.” Both of these statements are true and have nothing to do with our original claim; however, it appears that we may have found a treatment for both conditions and we recommend that it (HBOT) should be delivered in combination with other therapies. For all of the abovementioned reasons we reaffirm our claim.
A-J & L argue that the sum of their comments is so strong that the “study results leave little meaningful behavioral data to support the effectiveness of the examined treatment.” Our article is lengthy with many components: symptoms, physical examination, quality of life questionnaires, affective measures, cognitive measures, and functional brain imaging. Taken or published individually, most of these can be faulted, as can additional components of our study: the design, combination of diagnoses, length of time to treatment, effects of psychoactive drugs, confounding by affective components, test/retest effects, placebo effects, and others. However, the study is published as a composite and should be critiqued as such. To take selective components of the study with which one finds fault and argue that the results of the study are negated is an insufficient and incomplete analysis of our preliminary data. A-J & L omitted commenting on one of the critical pieces of our study, the well-validated functional imaging modality and analysis, which, when taken in conjunction with all of the rest of the multicomponent data, led us to the conclusions we stated in our study.
As a result, we repeat our conclusions that our data cannot be explained by placebo effects (or test/retest effects) and are more consistent with the known biological wound-healing effects of HBOT on chronic brain wounding. Most notably, the widespread improvements in brain blood flow to what we suggested were the microscopic and macroscopic wounds of mild-moderate TBI, particularly in the white matter, are consistent with the treatment effects of HBOT on the widespread areas of damage (wounds) in the white matter of veterans with mild blast and non-blast TBI that were recently demonstrated by high angular resolution diffusion imaging (HARDI) MRI. 15 Similar subtle white matter lesions detected by diffusion tensor imaging MRI 16 in blast-exposed veterans have been found to correlate with PTSD in the absence of a “clinical diagnosis of mild TBI.” The authors stated that these findings suggested a role for subclinical TBI in the genesis of PTSD, thus underscoring the seemingly inextricable intertwining of blast-induced persistent PCS and PTSD. As we stated in our article, given this intertwining, we appear to be treating both PCS and PTSD simultaneously. Moreover, given brain blood flow-metabolism coupling in chronic brain injury 17,18 and blood flow-function relationships, 19,20 the demonstrated improvements in brain blood flow are strongly supportive of and consistent with the symptom, physical examination, cognitive, and quality of life improvements experienced by our subjects, rather than the multiplicity of purported flaws in our design and methodologies.
In conclusion, we thank A-J & L for the tone and content of their letter and will incorporate some of the changes they have recommended. There is little argument that alternate forms should be used in cognitive testing where possible, and most neuropsychologists lament the dearth of more robust and clinically meaningful measures with alternate forms. Much of the conversation about practice effects and placebo effects will become moot with the addition of a randomly assigned control group and double-blind experimental conditions. This is a phase I pilot study, and further exploration is planned to include control groups, blinded conditions, and, certainly, the consideration of measures with alternate forms.
