Abstract
Introduction
In recent years, progress in cognitive neuroscience, neurorehabilitation research, and neuroimaging has led to dramatic advances in our understanding of how the brain reorganizes in the setting of stroke and other forms of focal brain injury. These discoveries have, in turn, paved the way for the use of noninvasive neuromodulation technologies, such as transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS), which can potentially be employed to create focal, persistent neuroplastic changes in brain activity. Noninvasive brain stimulation has been explored as a potential adjunctive treatment for a variety of post-stroke deficits, including aphasia, one of the most common and debilitating cognitive sequelae of stroke. Aphasia can incapacitate all modes of human communication, including language production, language comprehension, reading, and writing. Encouragingly, over the course of the last decade, a growing body of evidence has supported the use of noninvasive brain stimulation (NIBS) approaches to enhance long-term recovery in persons with aphasia (PWA; Barwood et al., 2013; Marangolo et al., 2013; Medina et al., 2012; Shah-Basak et al., 2015; Waldowski, Seniów, Leśniak, Iwański, & Członkowska, 2012).
Depending on the stimulation parameters employed, both TMS and tDCS can be applied in ways that are understood to have either excitatory or inhibitory effects on underlying brain areas. Reflecting this, the specific TMS or tDCS approaches taken in treatment studies of PWA vary. Generally, these differences in approach are driven by different theoretical accounts of the likely neural mechanisms that underlie recovery from aphasia. To date, treatment approaches using NIBS are based on one (or more) of three putative mechanisms. The first of these is that spared regions of the damaged left hemisphere can be recruited to subserve language functions, leading to studies in which excitatory stimulation is applied to these perilesional areas (Baker, Rorden, & Fridriksson, 2010; Cornelissen et al., 2003; Karbe et al., 1998; Khedr et al., 2014; Ohyama et al., 1996; Szaflarski, Allendorfer, Banks, Vannest, & Holland, 2013; Warburton, Price, Swinburn, & Wise, 1999; You, Kim, Chun, Jung, & Park, 2011). The second model posits that right hemisphere homologues of lesioned left hemisphere regions can function in a compensatory manner, suggesting that facilitative stimulation of the intact hemisphere could enhance recovery (Musso et al., 1999; Thulborn, Carpenter, & Just, 1999; Tillema et al., 2008; Peter E. Turkeltaub et al., 2012). Finally, a frequently invoked third account is that activity in specific sites in the right hemisphere may hinder left hemisphere recovery and that inhibiting such sites could be beneficial (Barwood et al., 2011; Medina et al., 2012; Naeser et al., 2005; P. E. Turkeltaub, Messing, Norise, & Hamilton, 2011; Weiduschat et al., 2011).
Studies in PWA involving TMS and tDCS have reported improvement in a variety of language functions, ranging from better accuracy in picture naming (e.g., Baker et al., 2010; Barwood et al., 2013; Flöel et al., 2011; Medina et al., 2012; Naeser et al., 2005; Thiel et al., 2013; You et al., 2011) to self-perceived improvement among patients in the ability to communicate (Rubi-Fessen et al., 2015; Szaflarski et al., 2013) after both forms of NIBS. However, the degree to which PWA improve in response to one brain stimulation technique compared to the other is largely unknown. Direct comparison of TMS to tDCS in PWA is challenging, not only because of methodological and mechanistic differences between the two techniques, but also because of the heterogeneity of the parameters employed in the studies that have used each technology, as well as the clinical variability in stroke patients in these studies with respect to lesion location, size, chronicity, and aphasia symptoms. Nonetheless, the growing number of studies supporting the efficacy of both of these techniques in the treatment of aphasia after stroke suggests that careful quantitative comparisons of TMS and tDCS are feasible and potentially clinically useful.
It is important to note that while TMS and tDCS are often discussed together under the category of NIBS, they differ in both mechanistic and practical ways. TMS entails the focused generation of rapidly fluxing magnetic fields, which penetrate the skull and generate suprathreshold electrical currents that depolarize underlying cortical neurons. By contrast, conventional tDCS is administered by delivering low-intensity subthreshold electric currents using saline-soaked sponges that are placed on the patient’s scalp. These currents are thought to induce incremental shifts in neuronal membrane resting potentials that change neuronal firing rates, but are insufficient to trigger action potentials. The excitatory or inhibitory effects of TMS are dependent on the frequency at which it is administered: repetitive TMS (rTMS) delivered at frequencies <5 Hz decreases excitability of affected cortical areas, while rTMS at higher frequencies increases excitability. While tDCS can also induce excitatory or inhibitory effects on cortical activity, these effects are believed to be dependent on electrode polarity. The resting membranes of neurons near the cathode are thought to become incrementally hyperpolarized and hence less excitable, while those near the anode depolarize slightly and thus become more excitable. Recent evidence, however, suggests that this relationship between anodal- (atDCS) and cathodal-tDCS (ctDCS) and changes in cortical excitability may not be so straightforward (Batsikadze, Moliadze, Paulus, Kuo, & Nitsche, 2013; Jacobson, Koslowsky, & Lavidor, 2012), especially in brains damaged by stroke (Shah-Basak et al., 2015).
In terms of choosing the most useful brain stimulation therapy for PWA, practical differences between TMS and tDCS, such as precision, portability, and ease of use, may prove to be as important as mechanistic differences. Because conventional tDCS employs large electrodes, conventional direct current stimulation affords much lower spatial resolution compared to TMS. Moreover, localization of brain areas using tDCS is imprecise (Datta et al., 2009; Datta, Truong, Minhas, Parra, & Bikson, 2012). Whereas the 10–20 EEG International System for scalp electrode placement is typically used to guide tDCS, TMS is often implemented using neuronavigational systems that incorporate structural MRI data in order to enable precision at the level of several millimeters. However, this precision comes at a price; TMS is cumbersome, expensive, and can require bulky auxiliary equipment. By contrast, tDCS is small and portable. TMS is also considerably more difficult to pair with concurrent behavioral therapies, as subjects are not free to move their heads during stimulation and are generally required to sit still. tDCS allows more flexibility in this regard, and can be delivered at the same time as speech and language therapies, which can potentially further enhance the benefits of the treatment.
Recently published meta-analyses that have assessed NIBS treatment effects in PWA have focused on either rTMS or tDCS, except one report that analyzed studies employing inhibitory subtypes of both rTMS and tDCS, targeted to homotopic language areas (Elsner, Kugler, Pohl, & Mehrholz, 2013; Otal, Olma, Flöel, & Wellwood, 2015; Ren et al., 2014). However, responses to both rTMS and tDCS can be highly variable, with as many as 30% of subjects experiencing excitatory effects in the setting of inhibitory stimulation protocols, or vice versa (Hamada et al., 2008; Krause & Cohen Kadosh, 2014; López-Alonso, Cheeran, Río-Rodríguez, & Fernández-Del-Olmo, 2014; Wassermann, 2002). Not only do models predict substantial differences in current flow between individual brains, particularly those with stroke damage (Datta et al., 2009; Shah-Basak et al., 2015), but also circuit-based mechanisms for their effects remain poorly understood. These limitations interfere with purely hypothesis-driven selection of stimulation parameters. Thus, to constrain meta-analyses of the therapeutic effects in PWA to studies targeting only one cortical region, or using only inhibitory (or excitatory) subtypes of rTMS and tDCS may be premature, especially since therapeutic effects remain theoretically plausible for both excitatory and inhibitory rTMS or tDCS, as applied to any of several cortical sites relevant to language. This study, therefore, presents a systematic review of all treatment studies using TMS and tDCS in PWA, without regard to stimulation subtype or cortical target, and compares overall effects between these modalities. Because rTMS exogenously triggers a spatially precise, suprathreshold physiological response in language-specific areas, whereas tDCS facilitates ongoing endogenous synaptic activity over broad regions of cortex, we hypothesized that our meta-analyses would reveal greater therapeutic benefit of TMS compared to tDCS. 
Secondly, because variability in responses to rTMS and tDCS is frequently attributed to differences in patients’ clinical characteristics, as well as experimental design, we explored these factors in two subanalyses which parsed studies according to stroke chronicity (subacute, chronic, and mixed) or trial design (within-subject, between-subject, and crossover trials).
Methods
Literature search strategy
Two reviewers (FG and JP) carried out independent literature searches to identify potential treatment studies of rTMS or tDCS in post-stroke aphasia. The following databases were used to conduct electronic searches to identify relevant studies: PubMed, ScienceDirect, Cochrane Central Register of Controlled Trials, Embase, Journals@Ovid and clinicaltrials.gov. The search terms were “aphasia OR language disorders OR anomia OR linguistic disorders AND stroke AND transcranial magnetic stimulation OR transcranial direct current stimulation”. The searches were limited to human studies written in English and published between January 1960 and October 2014.
Inclusion/exclusion criteria
Table 1 provides detailed criteria for identifying studies for the meta-analyses. In keeping with the main objective of this review, we included all studies that carried out treatment using rTMS or tDCS in stroke PWA, regardless of the trial (or experimental) design of the study. Studies that implemented between-subject or randomized controlled (RCT) design, cross-over trials, and within-subject or pre-post trials were all included. Since picture naming is one of the most frequently used batteries for assessing improvement in language abilities after treatment with rTMS or tDCS (Shah, Szaflarski, Allendorfer, & Hamilton, 2013), we included studies that reported raw scores or changes in picture naming accuracy. Picture naming accuracy reflects the number of correctly articulated names of objects, displayed to patients as line drawings (DeLeon et al., 2007). A positive change in accuracy, comparing post-stimulation performance to pre-stimulation performance, indicates an increase in the number of correct responses and therefore suggests improvement. The comparison of performance depends on the experimental design of the study. For instance, accuracies are either compared to patients’ own pre-treatment performance in within-subject study designs, or they are compared against a separate group of patients who receive sham/control treatment in between-subject study designs. In cross-over trials, the same subjects undergo both the sham and the real treatment and changes in accuracy relative to baseline are compared between conditions. In incomplete crossover studies, a subset of subjects receives only real stimulation, while a subset receives sham stimulation first followed by real stimulation. The comparison is within-subject in the former subset (i.e., assessing post-stimulation performance relative to subject’s baseline performance). The comparison in the latter subset is also within-subject but it is relative to subject’s performance after the sham stimulation.
We placed no restrictions on stroke characteristics or types of aphasia in our inclusion criteria for treatment studies. We also did not restrict inclusion of studies based on brain stimulation parameters. Thus, studies that provided atDCS, ctDCS, or both were all included, and studies that implemented rTMS of different kinds, including theta burst stimulation (TBS), were included. However, we excluded those studies in which: a) fewer than 3 stimulation sessions were administered per patient (e.g., Chieffo et al., 2014; S. Y. Lee, Cheon, Yoon, Chang, & Kim, 2013), or b) stimulation was provided to different sites across sessions (e.g., Naeser et al., 2011), because there is little evidence to suggest that single sessions or very few sessions of stimulation can translate to long-term benefits. In line with the notion that repeated sessions of stimulation are necessary to elicit therapeutic effects, most treatment studies have implemented protracted regimens involving 1–3 weeks of exposure to rTMS or tDCS. Only those studies that administered stimulation (rTMS or tDCS) in daily sessions (3–5 days/week) and kept the cortical target(s) constant throughout the treatment period were included. No restrictions were placed on the duration or time point of speech and language therapy (SLT) in these studies. While most tDCS studies provided SLT concurrent with stimulation (Baker et al., 2010; Flöel et al., 2011; Jung, Lim, Kang, Sohn, & Paik, 2011; Kang, Kim, Sohn, Cohen, & Paik, 2011; You et al., 2011), rTMS studies provided SLT after stimulation was completed. The requirement that patients hold still during rTMS delivery and the noise produced by the device both presented likely obstacles to concurrent language therapy during rTMS (Heiss et al., 2013; Kakuda, Abo, Momosaki, & Morooka, 2011; Khedr et al., 2014; Seniów et al., 2013).
Studies that were published without English language translation (e.g., J. H. Lee et al., 2007) were excluded. We also excluded publications of pilot studies (e.g., Barwood et al., 2011; Thiel et al., 2013; Waldowski et al., 2012; Weiduschat et al., 2011) where pilot data were also included in later publications, either with increased recruitment (e.g., Heiss et al., 2013; Seniów et al., 2013) or extended follow-up evaluations (e.g., Barwood et al., 2013). In such cases we only analyzed the follow-up studies. Lastly, because both rTMS and tDCS are relatively new techniques, several studies that were published in the early 2000s were case reports, pilot, or proof-of-concept studies, which were excluded from the analysis. Our inclusion criteria permitted only those studies that provided treatment with either rTMS or tDCS in ≥4 patients.
In summary, the following inclusion criteria were applied in the evaluation of studies: 1) the patients were adults diagnosed with aphasia due to stroke; 2) the number of participants in the study was ≥4; 3) the outcome measures included picture naming accuracy before and after brain stimulation; and 4) the number of stimulation sessions was ≥3. Those that seemed applicable from their title or abstract were obtained in full-text format and underwent further scrutiny.
Qualitative assessment
The methodological quality of all included studies was assessed using the Downs and Black (D&B) tool (Downs & Black, 1998). D&B is a 27-item checklist validated for both RCTs and cohort/non-RCTs (Bastani & Jaberzadeh, 2012; Saunders, Soomro, Buckingham, Jamtvedt, & Raina, 2003). It allows for assessment with respect to different sub-scales that include ratings for: 1) reporting (out of 11): is sufficient information provided in the study to make an unbiased judgment about the study findings?; 2) external validity (out of 3): can study findings be generalized to the population from which the sample patients are derived?; 3) bias (out of 7): assesses for measurement bias in the intervention and the outcome; 4) confounding (out of 6): assesses for selection bias; 5) power (out of 5): assesses whether the study has sufficient power to detect an effect. These sub-scores outline the methodological strengths and weaknesses of included treatment studies (Downs & Black, 1998); the maximum score for the D&B checklist is 31. The higher the D&B score, the better the methodological quality of a study. One reviewer (JP) rated all 27 items in the D&B quality checklist for each study.
Data extraction and comparisons
For each included study, patient population, trial design, and stimulation paradigm were extracted by two authors (PSB and JP). Specifically, we extracted the: 1) patients’ mean time since stroke, 2) mean age, 3) group sample sizes, 4) characteristics of patients’ stroke and their aphasia, 5) trial design, 6) rTMS or tDCS administration procedures, 7) site(s) of stimulation, and 8) outcome measures (including but not limited to the picture naming accuracy).
Standardized mean difference (SMD), a point estimate of the treatment effect (Faraone, 2008), and its 95% confidence intervals (CI) were computed for each study using the Comprehensive Meta Analysis® (CMA; version 2.2.050, 2009) software. To compute SMDs, we extracted descriptive statistics, in particular those related to the changes in picture naming accuracies with treatment. Data extraction was largely dependent on the amount of data reported and the trial design implemented in each included study. If adequate information was not provided in the textual description of the results, or if the results were unclear, we extracted the means and standard deviations (SD) or standard errors (SE) from published figures within these studies. We used the freely available Plot Digitizer software (Huwaldt, 2013) for this purpose. If standard errors were reported, we converted them into standard deviations using the formula SD = SE × √n, where n is the sample size.
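The conversion from standard error to standard deviation can be illustrated with a minimal Python sketch. It assumes that the reported SE is the standard error of a mean computed from n patients; the function name `se_to_sd` and the numbers used are hypothetical and are not taken from any included study.

```python
import math


def se_to_sd(se: float, n: int) -> float:
    """Convert the standard error of a mean into a standard deviation."""
    if n < 1:
        raise ValueError("n must be a positive sample size")
    # SD = SE * sqrt(n), the inverse of SE = SD / sqrt(n)
    return se * math.sqrt(n)


# Hypothetical example: SE of 2.0 reported for a group of 16 patients
print(se_to_sd(2.0, 16))  # 8.0
```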
For SMD computations in between-subject trial designs, we extracted separate data for treated and control groups. For Barwood et al. (2013) and Seniów et al. (2013), we extracted sample sizes, means and SDs of measurements taken pre- and post-stimulation, and correlation between pre- and post-stimulation values. For Heiss et al. (2013) and You et al. (2011), we extracted the mean difference in naming performance (post- compared to pre-treatment), SD of those differences, and correlation between treated and control difference values. The same values for the sham group in You et al. (2011) were used in the comparisons with the a-tDCS and c-tDCS values, which were analyzed separately (see Fig. 3b). For Khedr et al. (2014) we extracted means for pre- and post-stimulation, sample sizes, and the F-statistic for differences. Polanowska, Lesniak, Seniow, Czepiel, and Czlonkowska (2013) only reported medians, inter-quartile ranges, and non-parametric statistics, so we obtained mean and SD scores for each group pre- and post-stimulation from the authors in order to keep the meta-analysis consistent. However, we were not able to obtain pre-vs-post correlation values or parametric p- or F-statistics. Because the difference in pre-stimulation values was very small between groups (see Figs. 3a-3c) and, more importantly, small relative to the magnitude of the difference in post-stimulation values between groups, we used each group’s post-stimulation mean and SD along with the sample size for SMD calculation. For all studies with between-subject trial designs, other than Polanowska et al. (2013), we standardized the SMD by the change score SD.
For crossover trials such as Baker et al. (2010), Kang et al. (2011), and Volpato et al. (2013), we extracted the mean and SD of differences in performance (post- minus pre- stimulation) for real and sham treatment conditions, the sample sizes corresponding to matched comparisons, and the between-condition correlation values. For Flöel et al. (2011), we extracted a mean and SD of the percent changes from pre-stimulation performance for a-tDCS, c-tDCS, and sham conditions. Two SMDs were calculated, as if a-tDCS and c-tDCS were tested in two different crossover trials, but the same sham values were used for both comparisons (Fig. 3b). In partial crossover trials like the one implemented by our group (Medina et al., 2012 using unpublished picture naming data), half of the patients underwent real rTMS treatment and half underwent sham treatment before they received real rTMS treatment. In this case, different analysis methods were applied to the two patient cohorts. For subjects who received sham and then real treatment, extraction was the same as for other crossover trials, except that the sample size reflected the size of the cohort, not the whole study. Subjects who only received real TMS were analyzed as a within-subject comparison using the difference in means between their baseline (or pre-) and post-stimulation accuracies, cohort sample size, and two-tailed paired-groups p-value.
For within-subject tDCS trials, such as Jung et al. (2011) and Santos et al. (2013), we extracted sample size, means and SD for performance pre-stimulation and post-stimulation, and correlation between pre- and post-stimulation performance values. We then calculated SMDs using similar methods as for crossover trials, except that pre-stimulation data was assigned for the control condition. For other within-subject trials, such as Abo et al. (2012) and Kakuda et al. (2011), we extracted the mean and SD of performance differences (post- minus pre-stimulation), sample size, and the correlation between pre- and post-stimulation values. For Szaflarski et al. (2011) (and the real TMS-only cohort from Medina et al., 2012) we extracted the mean pre- and post-stimulation values, the sample size, and two-tailed paired-groups p-value for SMD calculation.
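To make the role of the pre/post correlation concrete, the sketch below shows one standard way to obtain a change-score SD from pre- and post-stimulation SDs and their correlation, and then an SMD standardized by that change-score SD. This is an illustration only: the exact formulas applied internally by the CMA software are not described here, and the function names and input values are hypothetical.

```python
import math


def change_sd(sd_pre: float, sd_post: float, r: float) -> float:
    """SD of post-minus-pre change scores, given both SDs and their correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2.0 * r * sd_pre * sd_post)


def smd_change(mean_pre: float, mean_post: float,
               sd_pre: float, sd_post: float, r: float) -> float:
    """Mean change (post minus pre) divided by the SD of the change scores."""
    return (mean_post - mean_pre) / change_sd(sd_pre, sd_post, r)


# Hypothetical naming accuracies: mean 10 before, 14 after, SD 5 at both
# time points, and a pre/post correlation of 0.5
print(round(smd_change(10.0, 14.0, 5.0, 5.0, 0.5), 3))  # 0.8
```

A higher pre/post correlation shrinks the change-score SD and therefore inflates an SMD standardized in this way, which is why the correlation had to be extracted alongside the means and SDs.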
Meta-analyses
Two meta-analyses, one for rTMS treatment studies and the other for tDCS treatment studies, were performed by pooling SMDs computed for each included study. These SMDs for changes in picture naming accuracies were pooled using random effects models. Typically, inter-study heterogeneity is quantified using the I² and Q statistics, which determine the model most appropriate for pooling data in a meta-analysis. If I² exceeds 50%, heterogeneity is considered substantial and a random effects model is used; otherwise, a fixed effects model is used (Bastani & Jaberzadeh, 2012). We posited that clinical factors (stroke chronicity, anatomical lesion sizes and locations, type of aphasia) and stimulation parameters differ in ways that could significantly impact the treatment effects (Borenstein, Hedges, & Rothstein, 2007). Therefore, though we quantified and report the heterogeneity, we used random effects models for both meta-analyses, regardless of the indication from the I² and Q statistics.
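For readers unfamiliar with random effects pooling, the following Python sketch shows how per-study SMDs and their variances can be combined while also yielding the Q and I² heterogeneity statistics. It uses the DerSimonian-Laird estimator of between-study variance, which is an assumption made for illustration; the pooling in this study was carried out in CMA, and the function name and inputs below are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random effects model.

    Returns (pooled effect, standard error, Q, I2 in percent, tau2).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]                       # fixed effects weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                          # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # share of variation beyond chance
    w_star = [1.0 / (v + tau2) for v in variances]         # random effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, q, i2, tau2


# Hypothetical SMDs and variances for two studies
pooled, se, q, i2, tau2 = random_effects_pool([0.0, 1.0], [0.1, 0.1])
print(round(pooled, 3), round(se, 3), round(q, 3), round(i2, 1), round(tau2, 3))
```

When the estimated between-study variance is zero, the random effects weights reduce to the fixed effects weights, so choosing the random effects model regardless of I² is the more conservative option.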
Lastly, to interpret the degree of treatment effects, scales recommended in Higgins and Green (2008) were used: if SMD <0.4 the effect size is considered small, if SMD is between 0.4 and 0.7 the effect is considered moderate, and if SMD >0.7 the effect is considered large (Bastani & Jaberzadeh, 2012).
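The interpretation scale above can be written as a small helper. This is a sketch only: the function name `interpret_smd` is hypothetical, and taking the absolute value (so that the sign encodes only the direction of the effect) is an assumption not stated in the text.

```python
def interpret_smd(smd: float) -> str:
    """Label an SMD as small (<0.4), moderate (0.4 to 0.7), or large (>0.7)."""
    magnitude = abs(smd)  # assumption: the sign only indicates direction
    if magnitude < 0.4:
        return "small"
    if magnitude <= 0.7:
        return "moderate"
    return "large"


# Hypothetical effect sizes
for value in (0.3, 0.5, 0.9):
    print(value, interpret_smd(value))
```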
Results
rTMS
Selection of studies
A total of 173 records were identified using our search criteria (Fig. 1a). After removing duplicates, other publication types, like case studies (e.g., Cotelli et al., 2011; Cotelli et al., 2014; Hamilton et al., 2010; Martin et al., 2009; Martin et al., 2014; Naeser et al., 2011; Vuksanovic et al., 2015) and non-rTMS and non-stroke studies, 17 full-text articles were identified and scrutinized for eligibility; we also removed studies that provided stimulation in fewer than 3 sessions per week and/or studies in which different sites of stimulation were targeted across multiple sessions (e.g., Chieffo et al., 2014; Kindler et al., 2012; Naeser et al., 2011). Out of the 17 articles, 8 articles that included preliminary treatment effects were removed (e.g., Barwood et al., 2012; Barwood et al., 2011; Thiel et al., 2013; Waldowski et al., 2012; Weiduschat et al., 2011). In place of these articles, we included their most recently published versions with updated data reports (e.g., Barwood et al., 2013; Heiss et al., 2013; Seniów et al., 2013). We identified 1 study that was published in a language other than English and was excluded (J. H. Lee et al., 2007). A total of 8 studies with 143 patients met our inclusion criteria and were included in the final analysis.
Qualitative assessment
D&B quality ratings for rTMS treatment studies are provided in Table 2a. The mean total score for 8 studies was 23.7 ± 4.46 (out of 31). The overall rating score was the lowest for studies that implemented a within-subject trial design (19.7 ± 3.21), while RCTs scored the highest (26.7 ± 3.30). Within-subject studies scored especially low on internal validity measures, scoring 3.7 ± 0.58 (out of 7) and 1.7 ± 0.58 (out of 6) for bias and confounding scores, respectively. All within-subject trials were implemented in studies on chronic PWA, whereas RCTs were more frequently implemented in subacute PWA (except Barwood et al., 2013).
Characteristics of included studies
Table 3a provides a summary from all included rTMS studies of the patient population, trial design and stimulation paradigms.
Heiss et al. (2013; n = 29) and Seniów et al. (2013; n = 40) carried out relatively large RCTs and both scored high on our D&B quality assessments (20 and 30, respectively). In both studies, the active stimulation site was the right hemispheric PTr. The stimulation intensity in both studies was identical: 90% of each individual patient’s resting motor threshold (rMT). Significant improvement on a global severity measure of aphasia (AAT) in the real group was found in Heiss et al. (2013) compared to the control group, while Seniów et al. (2013) did not find any measurable differences between the real and the sham groups. The latter study did, however, report improvement in a subgroup of patients who suffered from severe aphasia in the real compared to the sham group. Interestingly, the duration of rTMS delivery was greater in Seniów et al. (2013) (15 days for 30 minutes per day), the study that did not find treatment effects of rTMS, compared to Heiss et al. (2013) (10 days for 20 minutes per day; Heiss et al., 2013; Seniów et al., 2013).
Barwood et al. (2013; n = 12) is the only RCT conducted in chronic PWA to date. All other RCTs to our knowledge have been conducted in subacute PWA. In this study, significant increases in naming, expressive language, and auditory comprehension were found two months after the end of rTMS in the real group as compared to sham. The rTMS paradigm was similar to Heiss et al. (2013) whereby the right PTr was stimulated at low-frequency for 10 days, 20 minutes per day, and at 90% of rMT (Barwood et al., 2013; Heiss et al., 2013).
All previously mentioned RCTs employed a unilateral rTMS paradigm with only minor differences in stimulation duration (20 or 30 minutes), or in the use of comparison groups (vertex or sham). Khedr et al. (2014) is one of the first groups to examine a novel dual-hemispheric, dual-rTMS protocol in PWA. Patients received low frequency rTMS over the right Broca’s area and high frequency (20 Hz) rTMS over the left, lesioned Broca’s area over a period of 10 days; these patients also received speech and language therapy following dual-rTMS (Khedr et al., 2014). Overall aphasia severity improved, based on several language measures including naming, repetition, fluency, and comprehension.
Improved discourse production and picture naming (unpublished data) were found using a crossover design in chronic patients in Medina et al. (2012). A site-finding protocol was adopted in this study whereby 1 Hz rTMS was delivered over multiple right hemispheric IFG sites to determine a site that triggered the greatest improvement. Protracted 1 Hz rTMS over 10 days (one session per day) was then delivered to that site. Notably, for 9 out of 10 patients, the optimal site was found within the right PTr (Medina et al., 2012).
Three studies that employed within-subject trial designs were all proof-of-concept studies to examine novel treatment approaches using rTMS. In one study by Abo et al. (2012), the goal was to examine a novel site-finding approach for the rTMS treatment. In this study, stimulation sites were identified by monitoring fMRI activation patterns during a language task in patients with language deficits (Abo et al., 2012). Improvement in spontaneous speech was reported in non-fluent PWA, and in fluent patients, auditory comprehension was improved; both benefits lasted at least 4 weeks after the discontinuation of stimulation. Another study by Szaflarski et al. (2011) examined whether facilitating recruitment of perilesional areas using an excitatory, iTBS protocol over the damaged left frontal areas can induce language recovery in PWA (Szaflarski et al., 2011). In this study, improved performance on a semantic fluency task was observed, as well as a trend toward better self-reported communication abilities. A third study using a novel rTMS approach was conducted by Kakuda et al. (2011), who delivered stimulation at two different frequencies within a single rTMS session. Patients were primed with 6 Hz rTMS for 10 minutes before the application of low frequency/1 Hz rTMS for 20 minutes over right frontal sites. A marked improvement in language performance was observed in these patients (Kakuda et al., 2011).
Pooled analysis of the treatment effects
Eight studies with 143 patients were included in the pooled analysis for picture naming accuracy using a random effects model. The formal statistical test of heterogeneity was non-significant (I² = 17%, Q(8) = 9.64; p = 0.291). However, based on the discussion of study characteristics above and the evidence presented in Table 3a, a great deal of heterogeneity across studies exists with respect to clinical factors and stimulation parameters. Therefore, we believe that this heterogeneity warranted the use of a random effects model, despite statistical support for the fixed effects model.
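The reported I² can be checked against the reported Q statistic and its degrees of freedom using the usual definition, I² = (Q − df) / Q, truncated at zero. The short sketch below does this arithmetic; the function name `i_squared` is hypothetical.

```python
def i_squared(q: float, df: int) -> float:
    """I2 in percent: the share of total variation attributed to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values reported above: Q = 9.64 with 8 degrees of freedom
print(round(i_squared(9.64, 8)))  # 17
```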
The meta-analysis revealed a statistically significant and moderate SMD of 0.448 (95% CI = [0.23, 0.66]; p < 0.001) in favor of the rTMS treatment (Fig. 2a).
Because of the qualitative differences across studies (Table 2a), a subgroup analysis with respect to trial design (between-subject, crossover, within-subject) was conducted. This pooled analysis revealed statistically significant SMDs for between- and within-subject designs, while a trend toward significance was observed for the crossover trials. For between-subject trials, a moderate to large SMD of 0.704 (95% CI = [0.31, 1.10]; p < 0.001) was found, while for within-subject trials, a small SMD of 0.292 (95% CI = [0.10, 0.48]; p = 0.002) was found. For the crossover trial, a large (albeit non-significant) SMD of 1.248 (95% CI = [–0.19, 2.68]; p = 0.088) was found (Fig. 2b).
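The relationship between the point estimates, confidence intervals, and p-values reported above can be illustrated with a short sketch. It assumes that the 95% CIs are symmetric normal-approximation intervals (estimate ± 1.96 × SE); because the published bounds are rounded, the recovered p-values agree with the reported ones only approximately. The function name `p_from_ci` is hypothetical.

```python
import math


def p_from_ci(estimate: float, lower: float, upper: float, z_crit: float = 1.96):
    """Recover SE, z, and a two-sided p-value from an estimate and its 95% CI."""
    se = (upper - lower) / (2.0 * z_crit)       # half-width divided by 1.96
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p-value
    return se, z, p


# Crossover subgroup reported above: SMD 1.248, 95% CI [-0.19, 2.68]
se, z, p = p_from_ci(1.248, -0.19, 2.68)
print(round(se, 3), round(z, 2), round(p, 3))
```

An interval that spans zero, as for the crossover subgroup, corresponds to a two-sided p-value above 0.05, whereas the between-subject interval [0.31, 1.10] excludes zero and yields p < 0.001 under the same assumption.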
We also carried out a second subgroup analysis by stroke chronicity to compare rTMS treatment effects in chronic versus subacute patients. Of note, 3/4 studies that implemented a between-subject trial design included subacute patients, while crossover and within-subject trial designs were implemented exclusively in chronic patients. Pooled analysis revealed statistically significant SMDs for both chronic and subacute patients. A small SMD of 0.348 (95% CI = [0.14, 0.56]; p = 0.001) was found for chronic patients, while a moderate SMD of 0.667 (95% CI = [0.24, 1.09]; p = 0.002) was found for subacute patients (Fig. 2c). Although these findings are greatly confounded by trial designs employed in different chronicity groups, they appear to suggest that rTMS may be more effective subacutely than in the chronic phases of stroke recovery.
Publication bias
The funnel plot in Fig. 4 (left side; “rTMS”) reveals significant asymmetry about the combined effect size (Begg’s test: Tau = 0.44, p = 0.048; Egger’s intercept = 1.62, p = 0.002) in the rTMS studies included in this meta-analysis. To investigate the magnitude of publication bias, we employed the Duval and Tweedie (2000) trim and fill method. The estimated number of missing studies was found to be 4, using the random effects model, the same model as in the original meta-analysis. This adjustment for bias changed the point estimate of overall effect size from 0.448 to 0.354 (95% CI = [0.12, 0.58]), a small yet meaningful SMD. The classic fail-safe N to exceed p = 0.05 was calculated as 54, meaning that there would have to be 6 missing studies for every one reported to nullify this effect. Thus, although the magnitude of effect size changed from moderate to small after adjusting for publication bias, the overall conclusion of the meta-analysis, in favor of rTMS treatment, did not change.
tDCS
Selection of studies
A total of 342 records were identified using our search criteria. After eliminating duplicates, other publication types, like case studies (e.g., Fiori et al., 2011; Vestito, Rosellini, Mantero, & Bandini, 2014) and non-tDCS and non-stroke studies, 17 full-text articles were identified and scrutinized for eligibility; studies that provided stimulation in fewer than 3 sessions per week were also removed (e.g., S. Y. Lee et al., 2013). Out of the 17 articles, 8 articles were removed because they did not use picture naming accuracy as one of their outcome measures (e.g., Fridriksson, Richardson, Baker, & Rorden, 2011; Marangolo et al., 2014; Marangolo et al., 2013; Marangolo et al., 2011; Vines, Norton, & Schlaug, 2011). We could not extract data from 1 of the 9 studies considered at this stage (Saidmanesh, Pouretemad, Amini, Nilipor, & Ekhtiari, 2012). A total of 8 studies with 140 patients met all our inclusion criteria and were included in the final analysis.
Qualitative assessment
D&B quality ratings for tDCS treatment studies are provided in Table 2b. The mean total score for 8 studies was 25.4 ± 3.85 (out of 31). While the overall rating score was the lowest for studies that implemented within-subject trial design (23.5 ± 3.54), RCTs scored the highest (28.5 ± 0.71). All crossover tDCS trials were conducted in chronic PWA, and all RCTs were conducted in subacute PWA.
Characteristics of included studies
You et al. (2011) and Polanowska et al. (2013) implemented between-subject designs in subacute PWA. Of 21 patients in the You et al. (2011) study, 7 patients received a-tDCS centered on left STG (Wernicke’s), 7 received c-tDCS on right STG and 7 received sham stimulation. Stimulation intensity in You et al. (2011) was 2 mA and all patients underwent SLT during stimulation. Significant improvement was reported across all patient groups, including the group that received sham stimulation, based on a global aphasia severity scale (WAB-AQ). Auditory verbal comprehension was the only ability in which significantly greater improvement was observed in patients who received right c-tDCS compared to those who received left a-tDCS or sham stimulation (You et al., 2011). In another between-subject study by Polanowska et al. (2013), one group received 1 mA a-tDCS over Broca’s area (left inferior frontal gyrus; n = 14), while the other group (n = 10) received sham stimulation over the same site. The duration of stimulation in both groups was 10 minutes per day over 15 days. Both groups also received offline SLT for 45 minutes after stimulation ended. Polanowska et al. (2013) did not find significant differences between groups in naming accuracy or reaction time with a-tDCS (Polanowska et al., 2013). Non-significant treatment effects in this study, in particular compared to You et al. (2011), may be due to shorter duration of tDCS (only 10 minutes compared to 30 minutes in You et al., 2011), lower stimulation intensity (1 mA compared to 2 mA), and/or the timing of SLT (during versus after treatment). Of note, a-tDCS over language areas in the damaged, left hemisphere in both studies was found to be ineffective, while c-tDCS of contralesional, right STG in You et al. (2011) yielded significant benefits (Polanowska et al., 2013; You et al., 2011).
A set of four studies that employed a crossover design were all conducted in chronic PWA. Baker et al. (2010) is the first of these 4 studies to have reported significant improvement in naming accuracy in 10 patients after 1 mA a-tDCS to intact left frontal areas, as compared to sham stimulation in the same patients for the same duration. In this study, the stimulation site within the left hemisphere was determined using fMRI activation patterns. The benefit after a-tDCS lasted at least 1 week after the treatment ended (Baker et al., 2010). In comparison, Kang et al. (2011) applied c-tDCS over right frontal areas with the goal of reducing deleterious right hemispheric activity. Picture naming accuracy improved in 10 patients in this study, compared to sham stimulation (Kang et al., 2011). SLT was provided in both Baker et al. (2010) and Kang et al. (2011) during stimulation. Flöel et al. (2011), in comparison, provided both a-tDCS and c-tDCS over the right temporo-parietal cortex, in conjunction with anomia training, and compared effects with sham. A-tDCS, but not c-tDCS, of right hemispheric areas exhibited greater and longer-lasting improvement (up to 2 weeks) in naming ability, as compared to sham. In this study, excitatory stimulation of the non-dominant homotopic areas was found to more reliably enhance effects of anomia training than inhibiting them. Lastly, Volpato et al. (2013) also investigated the treatment benefits of a-tDCS in 8 chronic patients. Anodal stimulation administered over left frontal areas at an intensity of 2 mA for 10 days was compared to sham stimulation. Patients did not receive any form of SLT in this study. No significant improvement based on picture naming accuracy or reaction time was reported after a-tDCS as compared to sham. The authors noted that a lack of online SLT in the treatment regimen may explain non-significant findings with a-tDCS in their patients (Volpato et al., 2013).
Two studies employed a within-subject design. Santos et al. (2013) explored whether c-tDCS of the intact, right motor cortex (2 mA, 10 days, 20 minutes per day) could improve language performance in 19 chronic PWA. Comparing post-stimulation to pre-stimulation performance, significant improvement was found in naming accuracy, verbal fluency, and simple phrase comprehension (Santos et al., 2013). Jung et al. (2011) investigated treatment effects of 1 mA c-tDCS over right frontal areas in 10–15 daily sessions (20 minutes per session) in 37 subacute patients. SLT was provided during and after c-tDCS for a total of 30 minutes in these patients. The main goal in this study was to explore factors associated with better response to tDCS treatment. The authors found significant improvement in WAB-AQ scores post-c-tDCS, particularly in patients exhibiting less severe aphasia (Jung et al., 2011). Stroke type was also found to predict response to tDCS, as patients with hemorrhagic stroke improved significantly more than patients with ischemic stroke. Similar to rTMS, the 2 studies that employed a within-subject design each explored a novel approach: one study examined whether tDCS over brain regions distant from language areas, not those typically stimulated in aphasia studies, could provide benefits, and the other examined the factors that could predict and/or explain individual variability in response to tDCS treatment (Jung et al., 2011; Santos et al., 2013).
Pooled analysis of the treatment effects
Eight studies with 140 patients were included in the pooled analysis of picture naming accuracy using a random effects model (Heterogeneity: I2 = 0%, Q(9) = 7.43; p = 0.592).
The meta-analysis revealed a statistically significant and moderate SMD of 0.395 (95% CI = [0.28, 0.51]; p < 0.001) in favor of tDCS treatment (Fig. 3a).
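The relationship between the reported Q statistic and I2 can be made explicit with a short sketch. It assumes the standard Higgins definition of I2, in which negative values are truncated to zero; the function name `i_squared` is illustrative.

```python
def i_squared(q, df):
    """Higgins' I^2 (in percent) from Cochran's Q and its degrees of freedom.

    I^2 = max(0, (Q - df) / Q) * 100; negative values are truncated to 0.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Values reported in the text: Q(9) = 7.43.
print(f"I^2 = {i_squared(7.43, 9):.0f}%")  # prints "I^2 = 0%"
```

Because Q is smaller than its degrees of freedom here, the observed dispersion does not exceed what sampling error alone would produce, which is why I2 is reported as 0%.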
Because of the differences in quality of studies, as noted above (Table 2b), we carried out subgroup analysis by trial design (between-subject, crossover, within-subject). This pooled analysis revealed a statistically significant and moderate SMD of 0.490 (95% CI = [0.30, 0.68]; p < 0.001) for within-subject trials, whereas a significant but small SMD of 0.336 (95% CI = [0.18, 0.49]; p < 0.001) was found for crossover trials, and a non-significant and small SMD of 0.283 (95% CI = [–0.27, 0.83]; p = 0.31) was found for between-subject trials (Fig. 3b).
The observed differences in SMDs could also have stemmed from differences in stroke chronicity of patients across trials. As noted above, among studies included in the meta-analysis, 2 implemented a between-subject trial design, both of which were conducted in subacute patients, while crossover and within-subject trial designs were largely conducted in chronic patients. The effects of trial design and stroke chronicity can be considered nested. In an attempt to disambiguate these effects, a second subgroup analysis by stroke chronicity was performed. This analysis revealed a statistically significant but small SMD of 0.320 (95% CI = [0.17, 0.47]; p < 0.001) for studies in chronic patients, while for studies in subacute patients, the SMD of 0.283 (95% CI = [–0.27, 0.83]; p = 0.31) was small and non-significant (Fig. 3c).
Publication bias
The funnel plot in Fig. 4 (right side; “tDCS”) reveals that the distribution was symmetrical about the combined effect size for the tDCS studies included in this meta-analysis (Begg’s test: Tau = 0.11, p = 0.33; Egger’s intercept = –0.08, p = 0.45). The Duval and Tweedie (2000) trim and fill method estimated the number of missing studies to be 0. Therefore, no adjustment for bias was made, and the point estimate was the same as in the original meta-analysis. The classic fail-safe N to exceed p = 0.05 was calculated as 72, meaning that 72 unreported null studies would be required to render the effect non-significant.
rTMS versus tDCS
We used the fixed effects model to compare the estimates of SMD obtained from the 2 independent meta-analyses of rTMS and tDCS treatment studies, both of which were run using the random effects model; this analysis was conducted using the metafor package (Viechtbauer, 2010) in RStudio (2015). While larger effects were observed with rTMS than with tDCS (b = 0.053, SE = 0.125), the difference between estimates was not significant (z = 0.423, p = 0.672). We did not perform a similar comparison of estimates with respect to trial design and stroke chronicity because of the small number of studies in each category.
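The comparison of two independent pooled estimates amounts to a Wald-type z-test on their difference. The sketch below uses the pooled SMDs reported earlier (0.448 for rTMS, 0.395 for tDCS) together with the reported standard error of 0.125; it is a simplified stand-in for the fixed effects comparison run in metafor, not a reproduction of it, and the function names are illustrative.

```python
import math

def se_of_difference(se_a, se_b):
    """Standard error of the difference between two independent estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

def compare_estimates(est_a, est_b, se_diff):
    """Wald z-test for the difference between two independent estimates."""
    diff = est_a - est_b
    z = diff / se_diff
    # Two-sided p-value for a standard normal test statistic.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p

# Pooled SMDs from the text: rTMS = 0.448, tDCS = 0.395.
# Reported SE of the difference: 0.125.
diff, z, p = compare_estimates(0.448, 0.395, 0.125)
print(f"difference = {diff:.3f}, z = {z:.3f}, p = {p:.3f}")
```

When only the separate standard errors of the two pooled estimates are available, `se_of_difference` gives the denominator of the test under the assumption that the two meta-analyses share no patients.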
Discussion
The systematic review and meta-analyses in our study provide compelling evidence that both TMS and tDCS can be effective in the treatment of post-stroke PWA. Our findings replicate the outcomes from other recently published meta-analyses, which also favor rTMS and/or tDCS compared to sham or control treatments in PWA (Otal et al., 2015; Ren et al., 2014). Although the overall treatment effect was larger for rTMS than for tDCS, this difference was not statistically significant. Nevertheless, a further assessment of comparative advantage between the 2 widely used NIBS modalities in PWA is warranted in large-scale clinical trials. TDCS is a less expensive and more flexible mode of NIBS, but its spatial precision and intensity of delivery are quite limited. TMS, on the other hand, is highly precise and delivers stimulation at a higher intensity than tDCS, but it comes with a proportionally higher price tag, bulkier equipment, and limited flexibility, particularly with respect to pairing stimulation with concurrent behavioral or language therapies. We believe that further critical evaluation of the utility of these two brain stimulation approaches for treating PWA in clinical practice ought to combine data about their relative efficacy with evidence related to the feasibility of their use and benefit-to-cost analyses. Furthermore, specific treatment effects may depend on the characteristics of patients’ strokes and aphasia symptoms; this could also inform which neuromodulation technique would be best suited to treat PWA.
In our subanalysis by stroke chronicity, we found that while rTMS was effective in both chronic and subacute populations of PWA, tDCS was effective in the chronic, but not in the subacute populations (Polanowska et al., 2013; You et al., 2011). This finding is potentially consistent with the notion that neuroplastic changes in reorganizing language networks are dynamic in the acute and subacute stages after stroke, but stabilize over months and years after stroke (Saur et al., 2006). TDCS likely facilitates changes in performance via diffuse incremental reinforcing effects on widely distributed networks of neurons. It may be the case that during the acute and subacute phases after stroke, dynamic networks of brain regions are not yet sufficiently reorganized and stably recruited for language tasks in order to benefit from subtle exogenous reinforcement.
However, one major caveat to our subanalysis of chronicity effects is that this effect cannot be readily disentangled from the potential effect of trial design. Within-subject and crossover trial designs were more frequently employed in chronic PWA, while trial designs in subacute PWA were invariably between-subject designs/RCTs. The bias toward certain trial designs based on chronicity is perhaps unsurprising. Insofar as language performance in PWA in the chronic phase is generally stable (and in fact notoriously difficult to treat), the use of within-subject/crossover designs is more frequent, wherein patients’ pre-treatment performance can be compared meaningfully with their own post-treatment performance. By contrast, because persons with subacute aphasia generally exhibit some spontaneous recovery over time, between-subject designs are generally needed in order to reliably show any long-term effects of treatment in these patients. In addition, owing to the difficulty in recruiting patients with chronic language deficits after they have left the hospital or rehabilitation care, studies with larger sample sizes and those that were RCTs/between-subject designs included subacute patients, with only a few exceptions (e.g., Barwood et al., 2013). We therefore conducted a second subgroup analysis by trial design to disambiguate contributions of trial design to the treatment effects. The observed benefits with tDCS in chronic PWA could be a product of the use of crossover and within-subject trial designs in these studies, and when examined using between-subject designs/RCTs the effectiveness of tDCS was not significant. In contrast, rTMS was found effective in both within- and between-subject trial designs.
Picture naming accuracy was the primary outcome measure in our meta-analyses. As recently noted by Turkeltaub (2015), the value of this measure in determining NIBS efficacy has been scrutinized. Most of the earlier proof-of-concept studies used sensitive neuropsychological measures, such as picture naming, to examine the safety and first-order effectiveness of NIBS in aphasia (Turkeltaub, 2015). Whether improvement in picture naming translates to improvement in everyday communication has become a topic of interest only in recent years. A trend towards improvement in self-reported communication abilities has been reported after rTMS treatment (Szaflarski et al., 2013). In a recent randomized controlled trial, anodal tDCS of the motor cortex improved not only naming abilities but also functional communication, as measured by the Communicative Effectiveness Index and Partner Communication Questionnaire (Meinzer et al., 2016). A series of studies by Marangolo and colleagues (2011, 2013a,b, 2014) have demonstrated improvements in informative speech (Marangolo et al., 2013b), verb retrieval (Marangolo et al., 2013a), and word production and speech cohesion (Marangolo et al., 2014) after anodal tDCS. Emerging evidence therefore suggests that NIBS treatment can improve complex aspects of language and operational communication in patients with aphasia. Moving forward, we strongly advocate for the use of clinically relevant outcome measures, such as those reported by patients themselves or their caregivers, to further validate the functional efficacy and also estimate the comparative advantage of different NIBS aphasia treatments (Shah et al., 2013; Shah-Basak and Hamilton, 2016). As the current state of affairs indicates, the field is progressing, moving beyond neuropsychological tests, and focusing on ways to optimize stimulation parameters to make a real difference in stroke survivors’ abilities to communicate.
D&B ratings of qualitative assessment in rTMS and tDCS studies clearly distinguished between within-subject/crossover and RCT designs. In particular, internal validity was poorer in within-subject designs than in RCTs/between-subject designs. This is not surprising given the possibility of carryover or practice effects of repeated testing, which cannot be appropriately controlled for in within-subject and crossover trials. In addition, it is challenging to disentangle the effects of rTMS or tDCS that are above and beyond the effects of SLT in these trial designs. One caveat to our meta-analyses is that they also cannot reliably separate the contribution of tDCS or rTMS from that of SLT or carryover effects. For instance, the within-subject tDCS studies included in our analysis (Jung et al., 2011; Santos et al., 2013) were distinct in 3 important aspects: 1) the stimulation site (right frontal vs. right motor cortex), 2) the stimulation intensity (1 mA vs. 2 mA) and 3) the use of concurrent SLT (30 minutes vs. none provided). While both studies individually reported significant improvement in language measures, the meta-analysis (Fig. 3b) suggested larger effects in Jung et al. (2011) than in Santos et al. (2013). However, we are unable to determine whether these benefits stem from the differences in tDCS parameters or from the use of concurrent SLT.
Publication bias in the meta-analysis of rTMS studies was found to be significant. Sample sizes in proof-of-concept studies, like the ones investigating treatment effects of a novel TMS approach (Kakuda et al., 2011), tended to be small. The likelihood of these studies being accepted for publication is greater if the findings are statistically significant and exhibit large effect sizes. This may have mediated the observed asymmetry between sample and effect sizes as represented in the funnel plot, whereby 4 studies were imputed to be ‘missing’ from our analysis (bottom left in Fig. 4; Duval, 2005; Duval & Tweedie, 2000). Nevertheless, adjusting for this bias in our analysis did not impact our overall conclusions about the efficacy of rTMS treatment in PWA. Furthermore, inclusion of poor-quality studies, preferential selection of studies based on the quality of reports and publication language, and inherent heterogeneity in patient characteristics and stimulation paradigms could have also resulted in the observed publication bias in rTMS studies (Souza, Pileggi, & Cecatti, 2007).
Mounting evidence suggests that size and location of the stroke, as well as stroke type (hemorrhagic, infarction), and severity of language deficits at treatment may critically predict the magnitude of induced benefit from a particular stimulation paradigm (low-frequency rTMS/c-tDCS or high-frequency rTMS/a-tDCS). For example, patients with more severe aphasia and patients with more anterior damage responded better to rTMS treatment in Seniów et al. (2013), and Jung et al. (2011) found that less severe aphasia and/or hemorrhagic strokes were associated with greater improvements in language deficits after tDCS treatment. Because these factors are rarely captured or accounted for in the context of stratifying response to treatment with NIBS, further research will be needed to shed light on the use of specific stimulation paradigms based on location/size of stroke damage, as well as the role of baseline aphasia severity and type of language deficits.
At least 3 other meta-analysis studies investigating the effects of rTMS and/or tDCS in PWA have been published in the last 3 years (Elsner et al., 2013; Otal et al., 2015; Ren et al., 2014). Our study is unique in 3 important ways: 1) the scope of prior meta-analyses is limited to examining treatment effects of either rTMS (Ren et al., 2014) or tDCS (Elsner et al., 2013) techniques, or studies that adopted a specific stimulation paradigm (SLT and inhibitory stimulation using 1 Hz rTMS and cathodal tDCS; Otal et al., 2015). Subanalyses by stroke characteristics or by experimental designs were not conducted in any of these studies and, therefore, our analysis seizes a missed opportunity to understand how treatment effects may be confounded by factors seemingly unrelated to the use of NIBS in PWA. 2) Reported findings for rTMS in Otal et al. (2015) and Ren et al. (2014) may be inaccurate because of the inclusion of studies with overlapping sets of patients, as both these studies included Seniów et al. (2013), Waldowski et al. (2012), Weiduschat et al. (2011), Thiel et al. (2013), and Heiss et al. (2013); only the most recently published studies (e.g., Heiss et al., 2013; Seniów et al., 2013) should have been included to avoid inappropriately weighting the effect sizes. Lastly, 3) we adopted a more conservative approach by implementing a random effects model for all effect size estimations because of the inherent heterogeneity in included studies.
Finally, although our study and other published meta-analyses are in agreement on the effectiveness of rTMS in improving language deficits in PWA (Otal et al., 2015; Ren et al., 2014), our findings for tDCS are in contrast to the results of other meta-analyses (Elsner et al., 2013; Otal et al., 2015). Elsner et al. (2013) did not find enhanced benefits of tDCS compared to what SLT could already provide in PWA. The effect size found in that study was small (SMD = 0.31) by the Higgins and Green (2008) standard. Otal et al. (2015) also reported non-significant effects with c-tDCS but with a moderate effect size (SMD = 0.42). In both of those studies, the inclusion/exclusion criteria were quite different from those in our analysis: in Elsner et al. (2013) with regard to a minimum number of subjects, requirements for at least 3 stimulations at any given site, and trial design, and in Otal et al. (2015) with regard to the delivery of SLT in both active and sham conditions and the restriction to studies that applied c-tDCS (or 1 Hz rTMS) on the contralesional, non-language-dominant hemisphere. The total N in Elsner et al. (2013) (in only 5 studies) was 54, compared with 140 in this analysis; in addition to excluding the two large within-subject trials analyzed here, that meta-analysis was also released prior to gaining access to data from 2 other larger studies that were included in our analysis (e.g., Baker et al., 2010; Polanowska et al., 2013). Otal et al. (2015) included 3 c-tDCS studies that met their inclusion criteria, with a total N of only 32. Given that there are substantial differences in methodologies, it is difficult to isolate the source of disagreement on the effectiveness of tDCS in PWA.
In conclusion, mechanisms of language system plasticity in PWA remain to be fully elucidated. Consequently, the specific effects of NIBS on the neural processes underlying aphasia recovery are also not yet well understood. It is important to recognize that these limitations in our current understanding constrain the utility of purely hypothetical model-driven selection of stimulation parameters. Consistent with this notion, we observed overall beneficial effects of NIBS in PWA in groupings of rTMS and tDCS studies that were heterogeneous with respect to sites of stimulation, presumed effects on cortical activity (excitation vs. inhibition), and theorized neural mechanisms. Our findings suggest that it may be premature to limit studies and analyses of NIBS techniques in aphasia to only inhibitory (or excitatory) subtypes of rTMS and tDCS, or to stimulation protocols that target only a single cortical region. Moving forward, large-scale and high quality RCTs with high internal and external validity are needed to more fully characterize the comparative efficacy of rTMS and tDCS, and also to reveal the factors and parameters that directly and indirectly influence response to these treatments in PWA.
Acknowledgments
The current work was supported by NIDCD RO1 DC012780-01A1, the Dana Foundation, and the Robert Wood Johnson Foundation. The authors would like to thank Dr. Jose Torres at New York University for his initial involvement in data extraction.
