Abstract
Objectives:
To compare the benefits and harms of exercise and complementary and alternative medicine (CAM) treatments with those of second-generation antidepressants (SGAs) for major depressive disorder (MDD).
Design:
Systematic review and meta-analysis.
Settings:
Outpatient clinics.
Subjects:
Adults, aged 18 years and older, with MDD receiving an initial treatment attempt with SGA.
Interventions:
Any CAM or exercise intervention compared with an SGA.
Outcome measures:
Treatment response, remission, change in depression rating, adverse events, treatment discontinuation, and treatment discontinuation due to adverse events.
Results:
We found 22 randomized controlled trials for direct comparisons and 127 trials for network meta-analyses, including trials of acupuncture, omega-3 fatty acids, S-adenosyl methionine, St. John's wort, and exercise. For most treatment comparisons, we found no differences between treatment groups for response and remission. However, the risk of bias of these studies led us to conclude that the strength of evidence for these findings was either low or insufficient. The risk of treatment harms and treatment discontinuation attributed to adverse events was higher for selective serotonin reuptake inhibitors than for St. John's wort.
Conclusions:
Although we found little difference in the comparative efficacy of most CAM therapies or exercise and SGAs, the overall poor quality of the available evidence base tempers any conclusions that we might draw from those trials. Future trials should incorporate patient-oriented outcomes, treatment expectancy, depressive severity, and harms assessments into their designs; antidepressants should be administered over their full dosage ranges; and larger trials using methods to reduce sampling bias are needed.
Introduction
Use of complementary and alternative medicines (CAM) among the general U.S. adult population is common 4 and increases among those with chronic medical conditions. In a nationally representative sample of U.S. adults 5 with self-reported depression or anxiety, more than 50% reported using CAM therapies. As a result, several professional organizations have published statements or practice guidelines about CAM use for MDD, including the American Psychiatric Association (APA), 6 the Canadian Network for Mood and Anxiety Treatments (CANMAT), 7 and the Department of Veterans Affairs. 8 Although recommendations from these organizations support the use of St. John's wort for mild-to-moderate depression, recommendations for other CAM therapies are more equivocal because of a paucity of high-quality evidence.
Pharmacotherapy remains the most common intervention for patients with MDD. However, more than 60% of patients experience at least one adverse effect during treatment, often leading to treatment cessation. 9 Further, ∼70% of patients with MDD do not achieve remission after initial pharmacologic treatment. 10 Therefore, both patients and providers may wish to consider other treatment options, including the combination of antidepressants with CAM therapies. Previously, we reported on the comparative effectiveness of SGAs versus psychological, complementary, and exercise treatments for MDD. 11–13 Here, we present our full findings on the comparative benefits and harms of CAM and exercise interventions with SGAs for MDD. In addition, we report on the comparisons among CAM and exercise interventions themselves.
Materials and Methods
Search strategy
Detailed methods and search strategy are available in the full report on the use of nonpharmacological treatments for patients with MDD. 12
Briefly, we searched MEDLINE®, EMBASE, the Cochrane Library, AMED (Allied and Complementary Medicine Database), PsycINFO, and CINAHL (Cumulative Index to Nursing and Allied Health Literature) for studies published from January 1990 through September 2015. For comparisons of benefits, we included only randomized controlled trials (RCTs); for comparisons of harms, we also included nonrandomized studies (i.e., nonrandomized controlled trials, cohort studies, and case-control studies). We used a combination of Medical Subject Headings terms and keywords for CAM interventions. To find unpublished studies, we searched
Study selection, data abstraction, and quality assessment
Two trained team members independently reviewed all abstracts and full-text articles by using predefined inclusion and exclusion criteria. Our population of interest included adult outpatients of all races and ethnicities with MDD, which was consistent with the Diagnostic and Statistical Manual, Fourth edition (DSM-IV) or diagnostic criteria. We used a structured data abstraction form. Trained reviewers initially abstracted trial data, which a senior reviewer reviewed for accuracy and completeness. Two independent reviewers assessed trial risk of bias by using the Cochrane Risk of Bias tool 14 (rated as low, medium, or high). Two reviewers independently graded the strength of evidence based on the guidance established for the Evidence-based Practice Center program of the U.S. Agency for Healthcare Research and Quality. 15 Grades reflect the confidence that the estimate of an outcome of interest is close to the true effect and are rated as high, moderate, low, or insufficient. At each step, disagreements were resolved by consensus or by involving a third reviewer.
Analysis plan
Because we were aware of the dearth of studies directly comparing CAM interventions, we planned to conduct network meta-analyses, which took a larger evidence base into account. Network meta-analysis allows for the estimation of comparative treatment effects across trials based on a common comparator such as placebo or a standard treatment. In the absence of head-to-head trials, such indirect comparisons allow for pooling of results to better visualize the available evidence base. When available, direct comparisons using meta-analysis are typically preferred and often represent higher strength of evidence compared with indirect comparisons. However, evidence suggests that network meta-analyses agree with head-to-head trials if component studies are similar and treatment effects are expected to be consistent in patients in different trials. 16
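The idea of estimating a comparative effect through a common comparator can be illustrated with the simplest case, an adjusted indirect comparison of two treatments that were each tested against placebo. The sketch below is a minimal illustration under that assumption; it is not the hierarchical model used in this review, and the function names and trial counts are hypothetical.

```python
import math


def log_rr_and_se(events_a, n_a, events_b, n_b):
    """Log relative risk of arm A versus arm B and its large-sample standard error."""
    log_rr = math.log((events_a / n_a) / (events_b / n_b))
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return log_rr, se


def indirect_comparison(log_rr_ac, se_ac, log_rr_bc, se_bc, z=1.96):
    """Indirect estimate of A versus B through the common comparator C.

    On the log scale the indirect effect is the difference of the two
    direct effects, and the variances add.
    Returns (RR, lower 95% limit, upper 95% limit).
    """
    log_rr_ab = log_rr_ac - log_rr_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return (
        math.exp(log_rr_ab),
        math.exp(log_rr_ab - z * se_ab),
        math.exp(log_rr_ab + z * se_ab),
    )


# Hypothetical counts (responders / randomized), not taken from any included trial.
log_ac, se_ac = log_rr_and_se(60, 100, 40, 100)  # treatment A vs. placebo
log_bc, se_bc = log_rr_and_se(50, 100, 40, 100)  # treatment B vs. placebo
rr_ab, lower, upper = indirect_comparison(log_ac, se_ac, log_bc, se_bc)
print(f"Indirect RR, A vs. B: {rr_ab:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

Because the variances of the two direct estimates add, an indirect estimate is always less precise than either direct comparison, which is one reason direct evidence is usually preferred when it exists.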
We conducted network analyses with a hierarchical frequentist approach by using random effects models. 17,18 We included all placebo- and active-controlled RCTs that were homogeneous in study populations and outcome assessments and were part of a connected network. We built on a database of relevant RCTs from a previous report on the comparative efficacy and safety of second-generation antidepressants. 19 We included double-blinded RCTs of at least 6 weeks' duration. For interventions for which double blinding was not possible or not performed, we required that outcome assessors were blinded. For network meta-analyses, we excluded studies conducted only in participants who were older than 55 years of age.
Our primary outcome measure was response to treatment on the Hamilton Depression Rating Scale (HAM-D), which was defined as a 50% improvement of scores from baseline. We chose this outcome because most studies used the HAM-D and reported data on response to treatment. We recalculated response rates for each study by using the number of all randomized patients as the denominator to reflect a true intent-to-treat (ITT) analysis. With this approach, we attempted to correct variations in results of modified ITT analyses encountered in individual studies.
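The recalculation described above can be sketched as follows. This is a minimal illustration that assumes response means a reduction of at least 50% from the baseline HAM-D score and that participants without endpoint data are counted as non-responders; the function names and numbers are hypothetical.

```python
def is_responder(baseline_score, endpoint_score):
    """True if the HAM-D score improved by at least 50% from baseline (assumed threshold)."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline_score - endpoint_score) / baseline_score >= 0.5


def itt_response_rate(responders, randomized):
    """Response rate with all randomized participants as the denominator."""
    if randomized <= 0:
        raise ValueError("number randomized must be positive")
    if not 0 <= responders <= randomized:
        raise ValueError("responders must lie between 0 and the number randomized")
    return responders / randomized


# Hypothetical trial arm: 120 randomized, 90 completers, 45 responders.
completers_rate = 45 / 90                 # what a completers analysis would report
itt_rate = itt_response_rate(45, 120)     # recalculated intent-to-treat rate
print(f"completers: {completers_rate:.3f}, ITT: {itt_rate:.3f}")
```

With high attrition, the two denominators diverge, so a completers analysis can report a noticeably higher response rate than the ITT recalculation for the same arm.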
The data provided information on the probability of the response of treatment j out of K possible treatments in study i (p_ij). We applied a generalized linear model with random effects. The logit for the random effects model 16,17,20 can be expressed as:
where all
We fit all models by using PROC GLIMMIX in SAS version 9.3 (SAS Institute, Cary, NC), specifying a binomial likelihood and logit link function. For ease of interpretation, we present the relative risks and 95% confidence intervals (CI) of outcomes of interest for all possible comparisons among our treatments of interest. For all network meta-analyses, we conducted sensitivity analyses with and without high risk-of-bias studies, but we report findings with high risk-of-bias studies because sensitivity results were similar to the full results.
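For orientation, a generic random-effects model with a binomial likelihood and logit link for arm-level data can be written as below. This is a sketch of a common formulation, not necessarily the exact specification fitted here; the symbols r_ij, n_ij, μ_i, β_j, u_ij, and τ² are assumptions introduced for illustration.

```latex
\begin{aligned}
r_{ij} &\sim \mathrm{Binomial}(n_{ij},\, p_{ij}), \\
\operatorname{logit}(p_{ij}) &= \log\frac{p_{ij}}{1 - p_{ij}} = \mu_i + \beta_j + u_{ij},
\qquad u_{ij} \sim \mathcal{N}(0,\, \tau^{2}), \\
\beta_{1} &= 0 \quad \text{(reference treatment)}.
\end{aligned}
```

In this formulation, r_ij is the number of responders among n_ij randomized participants receiving treatment j in study i, μ_i is a study-specific baseline, β_j is the effect of treatment j relative to the reference, and u_ij captures between-study heterogeneity. The contrast β_j − β_k is the log odds ratio of response for treatment j versus treatment k, which can then be converted to a relative risk for presentation.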
Results
Our searches for the full report identified 8,317 citations; we included 22 RCTs for direct comparisons and 127 trials for network meta-analyses (Fig. 1). We included CAM trials of acupuncture, omega-3 fatty acids, S-adenosyl methionine, and St. John's wort (Hypericum perforatum [L.]); we also included trials of exercise. No studies of meditation, mindfulness-based therapies, or yoga met our inclusion criteria. For all included trials of direct comparison, the SGA was a selective serotonin reuptake inhibitor (SSRI). For network meta-analyses, we found trials of both SSRIs and serotonin-norepinephrine reuptake inhibitors.

Fig. 1. PRISMA diagram for CAM and exercise treatment of major depressive disorder. CAM, complementary and alternative medicine; KQ, key question; MA, meta-analysis; PRISMA, preferred reporting items for systematic reviews and meta-analyses; SR, systematic review.
For direct comparisons, we identified 20 RCTs (22 articles) including 2,600 participants comparing a CAM therapy with an SSRI and 2 RCTs (4 articles) including 309 participants comparing exercise with an SSRI. Half of the trial comparisons were with fluoxetine; other SSRIs included sertraline (5 trials), paroxetine (2), citalopram (2), and escitalopram (1). Most trials made comparisons with moderate or low doses of the SSRIs, but no trial used the full, approved range of antidepressant dosage. All trials enrolled participants exclusively from outpatient settings and excluded patients who had additional Axis I disorders, high suicidal risk, progressive medical diseases, or who used psychotherapy, electroconvulsive therapy, or psychotropic medications. Most participants had moderate-to-severe depression as measured by the HAM-D. 21 Treatment durations ranged from 6 to 16 weeks. Trials were conducted in a variety of countries, including Brazil (1 trial), Canada (1), China (3), Denmark (1), Germany (5), Iran (1), Sweden (1), and the United States (7). Table 1 presents study characteristics, main outcomes, and risk-of-bias ratings for all 22 CAM and exercise trials that we identified.
Total number of randomized participants in relevant arms of trial.
Acupuncture and exercise dose recorded as number of sessions.
Response and remission are measured on the HAM-D.
The Chen et al. trial had a substantial overlap of participants (n = 105) with the Qu et al. trial.
Very little information provided on randomization procedures and analytic methods.
High differential attrition; completers analysis.
Trial included two active electroacupuncture groups, with different sets of points, designed to treat depression.
Fluoxetine versus EPA versus fluoxetine + EPA. p-values are for fluoxetine versus EPA, fluoxetine versus combination, and EPA versus combination, respectively.
Unclear randomization methods; high attrition; completers analysis.
High attrition.
For dichotomous outcomes (e.g., response and remission), we rated the risk of bias for these trials as medium because dropouts were counted as remission failures.
High attrition, unclear randomization methods.
Not included in meta-analyses because it is a reanalysis of Fava et al. 34
High attrition, unclear randomization methods.
Not included in response and remission meta-analyses because of the age of the trial population (60–80 years).
Completers analysis.
DHA, docosahexaenoic acid; EA, electroacupuncture; EPA, eicosapentaenoic acid; HAM-D, Hamilton Depression Rating Scale; MA, manual acupuncture; mg/day, milligram per day; N, number; NR, not reported; NS, reported as not significant; SAMe, S-adenosyl-L-methionine; SGA, second-generation antidepressants.
Treatment benefits and comparative effectiveness
Acupuncture
Three trials compared acupuncture monotherapy with an SSRI. Response rates were similar for interventions in two trials 22,23 and were not reported for the third. 24 We found no statistically significant differences in treatment response rates from network meta-analysis (RR 1.25; 95% CI: 0.71–2.2). The CI, however, encompassed clinically relevant differences between treatments.
Two trials compared acupuncture plus SSRI combination therapy with SSRI monotherapy. One trial reported a higher treatment response rate for combination manual acupuncture (70% vs. 42%, p = 0.004) and electroacupuncture (70% vs. 42%, p = 0.004) compared with paroxetine alone, but no differences in remission. 25 The other trial reported no difference in response rate between combination acupuncture plus fluoxetine versus sham acupuncture plus fluoxetine. 26
In head-to-head comparisons using network meta-analyses, acupuncture was superior only to omega-3 fatty acids (RR 2.45; 95% CI: 1.20–5.03); its response rates were not statistically different from those of any other intervention.
Omega-3 fatty acids
One trial compared omega-3 fatty acid monotherapy with an SSRI. 27 It reported no differences in treatment response. However, based on network meta-analyses, we found higher response rates for patients treated with an SSRI (RR 1.96; 95% CI: 1.26–3.05) compared with those taking omega-3 fatty acids.
Two trials compared a combination of omega-3 fatty acid plus either fluoxetine or citalopram with SSRI monotherapy. For both trials, participants treated with combination therapy were more likely to benefit than those treated with either SSRI or omega-3 fatty acid monotherapy (combination treatment vs. citalopram: 44% vs. 18% for remission, p-value not reported 28 ; combination treatment vs. fluoxetine: 81% vs. 50% for treatment response, p = 0.005, EPA monotherapy vs. fluoxetine: 56% vs. 50%, p = 0.43). 27
In head-to-head comparisons using network meta-analyses, omega-3 fatty acids were inferior to St. John's wort (RR 0.41; 95% CI: 0.25–0.66) and SGAs (RR 0.51; 95% CI: 0.32–0.80). We found no significant differences between omega-3 fatty acids and other interventions.
S-adenosyl methionine
We identified one trial comparing S-adenosyl-L-methionine (SAMe) monotherapy with escitalopram alone. 29 It reported no significant differences between interventions for treatment response and remission. Similarly, we found no differences in response rates from network meta-analysis (RR 1.22; 95% CI: 0.66–2.26). We did not identify any trials comparing SAMe combination therapy with SGA monotherapy. In head-to-head comparisons using network meta-analyses, we found no significant differences for any comparisons with SAMe.
St. John's wort
Twelve trials (1,806 participants) compared St. John's wort with various SSRIs. 30–41 Trials used a variety of commercially available standardized extracts (e.g., LI-160, WS5570, Ze117, STW3, Calmigen, Iperisan, Swiss herbal remedies), which were most often standardized to 0.12%–0.28% hypericin; doses ranged from 300 mg to 1,800 mg of the standardized extract daily. Based on the HAM-D, most participants had severe depression.
Overall, treatment response, remission, and magnitude of change on the HAM-D scale were similar between participants treated with St. John's wort and those receiving an SSRI (Fig. 2). Sensitivity analysis using dose of the SSRI or treatment duration showed no statistical difference between the SSRIs and St. John's wort. Sensitivity analysis stratified by St. John's wort preparation demonstrated a larger benefit in treatment response relative to an SSRI for Ze 117 39 than for other St. John's wort preparations (RR 0.66; 95% CI: 0.51–0.87), but this preparation was used in only a single trial. When stratifying by study country of origin, we found no statistical difference in estimates between studies conducted in Germany (RR 0.90; 95% CI: 0.76–1.06) and those conducted in non-German countries (RR 1.07; 95% CI: 0.85–1.33). We did not find any trials comparing combination therapy with SGA monotherapy.

Fig. 2. St. John's wort versus second-generation antidepressants: response, remission, and change in HAM-D.
In head-to-head comparisons using network meta-analyses, St. John's wort was superior to omega-3 fatty acids (RR 2.44; 95% CI: 1.52–3.93) and SGAs (RR 1.24; 95% CI: 1.05–1.48). Response rates were also higher for St. John's wort than for exercise (RR 2.37; 95% CI: 0.98–5.75), although this difference did not reach statistical significance. We found no differences between St. John's wort and other interventions.
Exercise
We identified two trials comparing aerobic exercise monotherapy with SSRIs. 42,43 Neither trial found a statistically significant difference in remission rates between interventions. One trial compared a combination of exercise plus sertraline with sertraline monotherapy. 42 It reported no differences in treatment response. In network meta-analysis comparing exercise with SGAs, we found no significant difference in treatment response rate (RR 0.53; 95% CI: 0.22–1.26).
In head-to-head comparisons using network meta-analyses, exercise had lower response rates compared with St. John's wort (RR 0.42; 95% CI: 0.17–1.02), although the difference did not quite reach statistical significance. We found no significant differences between exercise and all other interventions.
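The same comparison is reported in both directions in this section (St. John's wort versus exercise, and exercise versus St. John's wort). A relative risk and its confidence limits for the reverse direction are the reciprocals of the originals, with the limits swapped. The short sketch below checks this for the reported values; the helper name is hypothetical.

```python
def invert_rr(rr, lower, upper):
    """Relative risk and CI for the reverse comparison (B vs. A from A vs. B).

    Taking reciprocals flips the order of the limits, so the original upper
    limit becomes the new lower limit.
    """
    return 1 / rr, 1 / upper, 1 / lower


# St. John's wort vs. exercise as reported: RR 2.37 (95% CI: 0.98-5.75)
rr, lower, upper = invert_rr(2.37, 0.98, 5.75)
print(round(rr, 2), round(lower, 2), round(upper, 2))  # 0.42 0.17 1.02

# An interval that contains 1 is not statistically significant at the 5% level.
print(lower < 1 < upper)  # True
```

The inverted values match the exercise versus St. John's wort estimate (RR 0.42; 95% CI: 0.17–1.02), and the interval includes 1 in both directions.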
Harms and treatment discontinuation
Few trials adequately assessed differences in harms; no trial reported harms data by using a validated scale. Most trials combined spontaneous patient-reported adverse events with a regular clinical examination. Rarely did authors report whether adverse events were prespecified and defined. No trial was designed to assess specific adverse events as primary outcomes.
The risk of treatment harms was higher for SSRIs than for both acupuncture (RR 3.96; 95% CI: 3.40–4.62, 21 trials, 3,128 participants) and St. John's wort (RR 1.19; 95% CI: 1.05–1.34, 8 trials, 1,427 participants) (Fig. 3). Treatment harms did not differ between combination acupuncture plus SGA and SGA alone. Evidence was absent or inadequate to draw conclusions about the risk of harms for other CAM treatments or exercise.

Fig. 3. Comparison of overall risk of harms of second-generation antidepressants with CAM interventions. Strength of evidence rated as high, moderate, low, or insufficient based on the Agency for Healthcare Research and Quality (AHRQ) guidance. CI, confidence interval; SGA, second-generation antidepressant.
The risk of treatment discontinuation because of adverse events (Fig. 4) was higher for SGAs than for both exercise (RR 21.0; 95% CI: 1.19–269.0) and St. John's wort (RR 1.70; 95% CI: 1.12–2.60). For comparisons of other monotherapies or combination treatments for which we had at least one trial, we did not find any differences in treatment discontinuation between intervention groups (Fig. 5).

Fig. 4. Comparison of rates of treatment discontinuation because of adverse events for SSRIs and CAM interventions. Strength of evidence rated as high, moderate, low, or insufficient based on the Agency for Healthcare Research and Quality guidance.

Fig. 5. Comparison of overall discontinuation rates from SSRIs with other CAM interventions. Strength of evidence rated as high, moderate, low, or insufficient based on the Agency for Healthcare Research and Quality (AHRQ) guidance. CAM, complementary and alternative medicine; CI, confidence interval; SAMe, S-adenosyl-L-methionine; SOE, strength of evidence; SSRI, selective serotonin reuptake inhibitor.
Discussion
Main findings of the study
For most treatment comparisons of CAM interventions or exercise with SGAs, we found no differences between treatment groups for response and remission. However, with the exception of St. John's wort, the risk of bias of these studies was either medium (50% of studies) or high (50% of studies); this finding led us to conclude that the strength of evidence for these findings was either low or insufficient. For St. John's wort, although our meta-analysis included only trials of medium or low risk of bias, we concluded that the strength of evidence was low because the comparisons with antidepressants were done with either moderate or low doses of the antidepressants. For all CAM therapies, we concluded that more evidence of a higher quality was needed to adequately assess the benefits and harms of these treatments compared with antidepressants.
Both Canadian (CANMAT) and American (APA) guidelines support the use of omega-3 fatty acid supplementation for patients with MDD based on evidence of modest efficacy versus placebo and low risk of harms. 6,7 However, a recent systematic review concluded that the evidence to support efficacy was weak; any benefit was likely to be small and not clinically meaningful. 44 Similarly, these guidelines agree about the positive benefit of using either exercise or St. John's wort for treating patients with mild-to-moderate MDD. For all three interventions, however, we concluded that the current strength of evidence did not support a strong recommendation for their use in managing patients with MDD. Given the low risk of harms associated with the CAM treatments or exercise, clinicians may choose to use them for patients with strong interest in such therapies, but they should be cautious about prolonged treatment trials for individual patients when benefit is equivocal, especially given the demonstrated and comparable efficacy of SGAs and cognitive behavioral therapy. 13,19
For acupuncture and SAMe, evidence is not yet sufficient to recommend a treatment trial in lieu of treatments with proven efficacy. For acupuncture, treatment guidelines and evidence from high-quality systematic reviews concur that the evidence of benefit is insufficient, although the risk of harms appears to be very low. 6–8,45–47 The Canadian guideline recommends SAMe as treatment only after failure of an initial proven therapy, whereas the American guideline calls for more studies to determine efficacy and does not recommend its use. Our results do not support the use of either of these therapies for first-line treatment of MDD.
Contrary to findings for treatment response and remission, we found greater risk of harms for certain SSRIs than for either acupuncture or St. John's wort. For both comparisons, the strength of evidence was sufficient to conclude that clinicians could have confidence that differences in harms exist. Further, patients were more likely to discontinue SSRIs than St. John's wort because of treatment-specific adverse events. Given the demonstrated superiority of St. John's wort to placebo, clinicians may find it reasonable to consider an initial treatment trial in patients seeking to use this supplement. However, St. John's wort is well known to interact with a variety of other medications, so its use should be guided by practitioners with a good understanding of its herb-drug interactions.
Limitations of the evidence base
We encountered numerous methodologic shortcomings in this evidence base. Among the more important ones were unclear randomization methods, high loss to follow-up, small sample sizes, and inadequate dosing for the SSRIs used in the comparative studies. These shortcomings most often led to low strength of evidence ratings, which tempered most of our findings. Although many of the treatments we evaluated may, indeed, be beneficial for managing patients with MDD, or at least patients with mild and perhaps moderate MDD, the quality of the current evidence base precludes any estimation of comparative efficacy with well-demonstrated interventions for MDD.
Risk of harms was often not adequately assessed. Most trials did not employ an objective instrument or scale to measure harms, and few trials used a systematic approach for evaluating them. Even for trials that did assess harms, the methods were often poorly described, precluding a thorough assessment of whether the approaches were adequate and unbiased. Further, most trials were small and of a short duration; these factors limited the validity and generalizability of their harms assessments.
Treatment expectancy (i.e., patients expecting a positive outcome) was rarely factored into trial designs or results. Expectancy may play a large role in determining outcomes for CAM interventions, especially in countries where a treatment is commonly used (e.g., acupuncture in China or St. John's wort in Germany). We found only a single study, a re-analysis of the U.S. Hypericum Depression Trial, 52 that considered the role of expectancy. 53 Interestingly, those researchers concluded that treatment expectancy was more strongly associated with outcomes than the treatment that patients actually received. Treatment expectancy may also play an important role for trials of acupuncture. 11
Next steps
Considering the various shortcomings that we identified in the evidence base, we suggest the following recommendations for investigators leading future trials: 1. Study outcomes should include patient-centered measures such as functional capacity, quality of life, and comparative harms; 2. Treatment expectancy and depressive severity should be considered treatment effect modifiers; 3. Comparative studies should incorporate appropriate dose escalation protocols extending over the full range of antidepressant dosages; 4. Additional CAM therapies with evidence of positive treatment efficacy for MDD, such as yoga and meditation, should be included in future comparative studies.
Conclusions
Although we found little difference in the comparative efficacy of most CAM therapies and SGAs for treating patients with MDD, the overall poor quality of the available evidence base tempers any conclusions that we might draw from those trials. An important exception is that SSRIs may lead to more adverse events and treatment cessation when compared with acupuncture or St. John's wort. Although this evidence base does not provide definitive answers about the comparative benefits and harms of these interventions, clinicians with knowledge of the safety profile of St. John's wort may want to consider a trial of its use for the initial treatment of MDD in patients with a strong inclination to use CAM therapy.
Acknowledgments
The authors would like to thank Aysegul Gozu, MD, MPH, from the U.S. Agency for Healthcare Research and Quality, for support and advice throughout the project. They also express their appreciation to RTI colleagues Meera Viswanathan, PhD, Director of the RTI International–University of North Carolina Evidence-based Practice Center, for dedicated encouragement and leadership and to Loraine Monroe, for exceptional document preparation efforts. They also extend their gratitude to Irma Klerings, MA, from Danube University, Krems, Austria, for literature searches. This study was supported by Contract 290-2012-00008i from the U.S. Agency for Healthcare Research and Quality to RTI International.
Author Disclosure Statement
This project was funded under Contract No. HHSA290201200008i from the Agency for Healthcare Research and Quality, the U.S. Department of Health and Human Services. The authors of this article are responsible for its content. Statements in the article should not be construed as endorsement by the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services. AHRQ retains a license to display, reproduce, and distribute the data and the report from which this article was derived under the terms of the agency's contract with the authors.
This topic was nominated by the American College of Physicians and selected by AHRQ for systematic review by an EPC. A representative from AHRQ served as a Contracting Officer's Technical Representative, provided technical assistance during the conduct of the full evidence report, and provided comments on the draft versions of the full evidence report. AHRQ did not directly participate in the literature search, determination of study eligibility criteria, data analysis or interpretation, or preparation, review, or approval of the article for publication.
