Abstract
Objectives:
This study aimed to identify the factors that might have contributed to conflicting findings on the efficacy of acupuncture for tension-type headache (TTH) by systematically reviewing relevant randomized controlled trials.
Methods:
Thirteen (13) databases were searched from their inception until August 2010. There were no restrictions on language or year of publication. Included studies were randomized controlled trials that compared real with sham acupuncture, selected patients according to the International Headache Classification, and reported headache days. Meta-analyses and subgroup analyses were undertaken to compare the effects of real and sham acupuncture interventions and the effects of acupuncture with various needling techniques and treatment modes.
Results:
Forty-three (43) studies were retrieved for further assessment from 120 potential studies. Finally, five studies of high methodological quality were included in this review. The pooled standard mean difference (SMD) of the included studies showed no statistically significant difference between real and sham acupuncture (−0.31; 95% confidence interval [CI] −0.72 to 0.09); however, heterogeneity among the studies was high (I2=81%). Subgroup analyses reduced heterogeneity and showed that electro-acupuncture (SMD −1.60; 95% CI −2.33 to −0.88) was more efficacious than manual acupuncture (SMD −0.13; 95% CI −0.41 to 0.14), that 30-minute needle retention (SMD −0.46; 95% CI −0.87 to −0.06) was better than no needle retention (SMD 0.45; 95% CI −0.11 to 1.01), and that twice-a-week treatment (SMD −0.46; 95% CI −0.87 to −0.06) was better than once-a-week treatment (SMD 0.45; 95% CI −0.11 to 1.01).
Conclusions:
Acupuncture stimulation mode, needle retention, and treatment frequency could be important factors contributing to the outcome of acupuncture for TTH. Further studies are warranted to determine treatment parameters to ensure effective translation of RCT outcomes of acupuncture for patients with TTH.
Introduction
Acupuncture is a complex intervention. Its active ingredients are not well defined. 13,14 A recent Delphi study of expert opinions on the essential components of quality acupuncture treatment has identified 14 domains with 26 items, including needle stimulation mode, treatment duration, frequency of treatment, practitioner training, and trial monitoring. 15
To improve the quality of efficacy research on acupuncture for TTH, there is a need to identify factors contributing to the inconsistent outcomes (for instance, the adequacy and administration of trial acupuncture treatments). This systematic review aimed to identify the factors contributing to conflicting findings by conducting a meta-analysis and subgroup analyses comparing the effects of real and sham acupuncture on TTH. Specifically, this review explored the effect of factors considered to have a direct impact on the quality of acupuncture trials, as described elsewhere. 15 These factors are the mode of acupuncture, needle manipulation, frequency of treatment, and number of trial centers, together with other factors identified through qualitative assessment of the included studies.
Methods
Search strategy
Thirteen (13) major databases were searched, comprising four leading Chinese databases (CNKI, CQVIP, Wanfang database, and CBM) and nine English databases (PubMed, Embase, CINAHL, ProQuest, Cochrane Library, Acubriefs, ScienceDirect, SCOPUS, and Informit), up to August 2010. There were no restrictions on year or language of publication. The search terms were the following:
RCTs, randomized controlled trials.
Selection criteria
Included studies (1) were randomized or quasirandomized controlled trials (RCTs or quasi-RCTs); (2) had adult patients with TTH diagnosed according to International Headache Society (IHS) criteria or the Ad Hoc Committee's criteria; (3) reported headache days as an outcome measurement; (4) employed invasive acupuncture needling; and (5) needled acupoints, Ashi points, and/or trigger/tender points. Studies were excluded if (1) they did not have a sham acupuncture control group; (2) point injections were used, because it is difficult to tell whether the efficacy came from the medication or from acupuncture itself; (3) dry needling was employed, because its theory and practice are not in accordance with traditional acupuncture; or (4) results for participants with TTH were not reported separately from those for participants with other types of headache, such as migraine.
Outcome measure
The outcome measure was headache days 16 at the end of the treatment and follow-ups.
Data extraction
Two (2) review authors (XYH, LD) independently extracted data from eligible trials. Major characteristics, including methods, participants, interventions, and outcome measures, were recorded for analysis. Other information, including treatment protocols and selection criteria, was also extracted for study comparisons. Disagreements were resolved by discussion and consultation with a third reviewer (ZZ).
Assessment of methodological quality
The methodological quality of each study was assessed by 2 reviewers (XYH and LD) using the Jadad Scale, 17 the Internal Validity Scale (IVS), 18 and the Oxford Pain Validity Scale (OPVS). 19 Additionally, allocation concealment was ranked as A: adequate, B: unclear, C: inadequate, and D: not used. 20 A study with a score of 3 or more points on the Jadad Scale was considered high quality. Similarly, for the IVS and OPVS, studies reaching 60% of the total score were regarded as high quality. 21
Data analysis and synthesis
Review Manager 5.1 was used for meta-analysis. Heterogeneity was assessed using the I2 statistic; I2 values of 25%, 50%, and 75% indicated low, moderate, and high heterogeneity, respectively. 22 A random-effects model was used if significant heterogeneity (I2 ≥50%) among the trials was detected. For continuous data, the weighted mean difference or standard mean difference (SMD) was used. When different scales were used for assessing one outcome measure, the SMD was calculated.
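To illustrate the heterogeneity assessment described above, the following Python sketch computes Cochran's Q, I2, and a random-effects pooled estimate from per-study effect sizes and standard errors. It is not the authors' analysis, which was run in Review Manager 5.1. The function name `inverse_variance_meta` and the input values are hypothetical, and the DerSimonian-Laird estimator of between-study variance is an assumption made for this sketch.

```python
import math


def inverse_variance_meta(effects, std_errors):
    """Pool study effects; return Q, I2 (percent), tau2, and pooled estimates.

    effects and std_errors are per-study effect sizes (e.g. SMDs) and their
    standard errors. Illustrative helper, not part of any library.
    """
    df = len(effects) - 1
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)

    # Fixed-effect (inverse-variance) pooled estimate.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))

    # I2 = (Q - df) / Q, truncated at zero and expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each study's own variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))

    return {"Q": q, "I2": i2, "tau2": tau2, "fixed": fixed,
            "random": pooled, "random_se": pooled_se}


# Hypothetical inputs, chosen only to show the mechanics.
result = inverse_variance_meta([0.0, 1.0], [0.1, 0.1])
print(f"Q={result['Q']:.1f}, I2={result['I2']:.0f}%, tau2={result['tau2']:.2f}")
```

With these two deliberately discordant hypothetical studies, I2 comes out well above the 75% threshold for high heterogeneity, so the rule stated above would select the random-effects model.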
Subgroup analysis
When heterogeneity was high, qualitative data were examined to identify the potential sources. Subgroup analyses were then conducted to verify the sources. 20
Results
Search result and eligibility
Five (5) of 120 trials involving 838 TTH participants 3–5,23,24 were selected, and the remaining were excluded because they reported invalid outcome measures, were duplications, had invalid interventions, or did not use IHS diagnostic criteria (see the online Supplementary Table).

Flowchart of study selection. RCTs, randomized controlled trials.
Study characteristics
The characteristics of the five trials are listed in Table 2. Three (3) 5,24,25 were conducted in Germany, and the remaining two 3,23 were from the United Kingdom and Australia, respectively. The sample size ranged from 40 to 409, with three multicenter trials 5,24,25 having a larger sample size. There were more females than males in the studies. The average age of the participants ranged from 30 to 50 years. 3,5,23,25
IHS, International Headache Society; CGI, Clinical Global Impressions; NHP, Nottingham health profile; SIP, sickness impact profile; ICD, International Classification of Diseases; D-S, von Zerssen Depression Scale; ELQ, Everyday-Life-Questionnaire; PTH, Mechanical pain threshold; VAS, visual analogue scale; ICHD, International Headache Classification; GHQ, General Health Questionnaire; PPT, pressure pain thresholds; IVS, Internal Validity Scale; CTTH, Chronic Tension-type Headache; HDI, Headache disability index; SES, Schmerzempfindungs-Skala; OPVS, Oxford Pain Validity Scale; ETTH, Episodic Tension-type Headache; LS, Life-Quality scale; ADS, depression scale (Allgemeine Depressionsskala); RCT, randomized controlled trial; SF-36-item, Short Form Quality of Life Survey; SF-12, 12-item Short-Form Health Survey.
Quality of the trials
Scores of Jadad, IVS, and OPVS are presented in Table 2. Based on the Jadad scores, all five studies were of high methodological quality. There was a high correlation between Jadad and IVS scores (r=0.97, p=0.006), but not between either Jadad and OPVS or IVS and OPVS. There was no correlation between methodological quality and treatment effect.
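As a rough consistency check on the reported correlation, the sketch below converts a Pearson r into a two-sided p value for a sample of five studies. It assumes the correlation was computed across the five included trials (n=5, so 3 degrees of freedom) and uses the closed-form Student t distribution for 3 degrees of freedom. The function name is hypothetical and the authors' actual software and method are not stated in the text.

```python
import math


def pearson_p_value_n5(r):
    """Two-sided p value for a Pearson correlation r with n = 5 (df = 3).

    Uses t = r * sqrt(df / (1 - r^2)) and the closed-form CDF of the
    Student t distribution with 3 degrees of freedom.
    """
    df = 3
    t = abs(r) * math.sqrt(df / (1.0 - r ** 2))
    x = t / math.sqrt(df)
    # CDF of t with 3 df: 1/2 + (x / (1 + x^2) + atan(x)) / pi, x = t / sqrt(3)
    cdf = 0.5 + (x / (1.0 + x ** 2) + math.atan(x)) / math.pi
    return 2.0 * (1.0 - cdf)


p_jadad_ivs = pearson_p_value_n5(0.97)
print(f"p = {p_jadad_ivs:.4f}")
```

Under these assumptions the result rounds to the p=0.006 reported for the Jadad and IVS correlation. With only five data points, a correlation must be very strong to reach significance.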
Characteristics of acupuncture treatments
Details of acupuncture treatments are presented in Table 3. One (1) study 3 used electroacupuncture (EA) and selected distal points only, whereas the rest 5,23 –25 chose manual acupuncture (MA) using both local and distal points.
Rating on Rationale for Treatment:
A: rationale for treatment has been clearly stated in text.
B: rationale for treatment can be assumed through the text.
C: rationale for treatment can be assumed through the text but not clear.
D: didn't mention the rationale for design of the treatment protocol at all.
Four (4) studies 3,5,23,24 had 30-minute needle retention with de qi sensation, whereas one applied 15 seconds of needling stimulation without intention to obtain de qi or needle retention. 25
Treatment sessions ranged from 8 to 15 over 4–8 weeks, with weekly or twice-a-week treatment. Three (3) studies applied pulse and tongue diagnosis or Chinese medicine (CM) diagnostic questionnaires. 3,5,24 All trials except one 25 reported the number and training background of practitioners. Four (4) studies intended to achieve individualized treatment through semistructured protocols, 3,5,23,24 and one used fixed points for all patients. 25
A comparison of sham acupuncture is provided in Table 4. All studies used nonacupuncture points and applied the same needle retention time as that in the real acupuncture treatment. Three (3) 3,5,24 avoided using any points on the head. Three (3) 3,5,24 used an invasive method, by means of superficial insertion at nonpoints without de qi. The other two studies 23,25 used a noninvasive method.
Efficacy of acupuncture in headache days
All five studies 3,5,23 –25 were included in the meta-analysis. SMD was calculated because one study reported headache days per week 23 ; two studies 3,25 reported headache days per month; and the other two studies 5,24 reported headache days per 4 weeks.
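The reason an SMD allows pooling across reporting periods can be sketched as follows. The function below computes a bias-corrected SMD (Hedges' g) and an approximate standard error from group means, standard deviations, and sample sizes. The function name and all trial values are hypothetical, and the small-sample correction and standard-error approximation are the commonly documented forms, assumed here rather than taken from the article.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, se): bias-corrected SMD of treatment versus control."""
    n = n_t + n_c
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    # Small-sample correction factor (assumed form: 1 - 3 / (4N - 9)).
    g = d * (1.0 - 3.0 / (4.0 * n - 9.0))
    # Approximate standard error of g (assumed form).
    se = math.sqrt(n / (n_t * n_c) + g ** 2 / (2.0 * (n - 3.94)))
    return g, se


# Hypothetical trial reporting headache days per week.
weekly = hedges_g(3.0, 1.5, 30, 3.5, 1.5, 30)
# The same hypothetical data rescaled to headache days per 4 weeks.
four_weekly = hedges_g(12.0, 6.0, 30, 14.0, 6.0, 30)
print(weekly, four_weekly)
```

Because the mean difference is divided by the pooled standard deviation, a pure change of unit leaves g unchanged, which is why the two calls agree. Real trials reporting over different periods differ in more than units, so this shows only the scale-invariance property.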
Results are presented in Figure 2. Real and sham acupuncture groups were not statistically different at any time point with regard to headache days. High heterogeneity (p=0.0004; I2=81%) was identified at the end of the treatment. Heterogeneity was low in the analyses of the short-term and long-term follow-up periods (I2=33% and I2=30%, respectively).

Forest plot: Acupuncture versus sham acupuncture, outcome: Headache days. SD, standard deviation; CI, confidence interval.
Subgroup analyses and investigation of heterogeneity
Subgroup analyses were conducted to examine the source of heterogeneity according to qualitative data of the features of acupuncture treatments (Table 5).
SMD, standard mean difference; CI, confidence interval; EA, electroacupuncture; MA, manual acupuncture; N/A, not applicable.
Type of acupuncture (EA versus MA)
When four 5,23 –25 MA studies were compared with one EA study, there was a statistically significant subgroup difference favoring EA. The overall heterogeneity was moderate (I2=57%) (Fig. 3A). After the removal of White's study, which had no needle retention, the heterogeneity was reduced. The subgroup difference remained statistically significant (χ2=12.69, p=0.0004) (Fig. 3B).

Exploration of the sources of heterogeneity.
Needle retention (30 minutes versus no retention)
Four (4) studies 3,5,23,24 with 30-minute needle retention with de qi sensation were compared with one study with 15 seconds of needling. 25 There was high heterogeneity (I2=78%) (Fig. 3C). After the removal of the EA study, heterogeneity was reduced (I2=0%). Among the remaining four MA studies, there was a statistically significant subgroup difference between studies with 30-minute retention and the study with no needle retention (χ2=5.70, p=0.02), favoring longer needle retention (Fig. 3D).
Frequency of treatment (twice a week vs. once a week)
Four (4) studies 3,5,23,24 with twice-a-week treatment sessions were compared with one study with weekly treatment. 25 This subgroup analysis involved exactly the same studies as the comparison of needle retention, and the same result was obtained (Figs. 3C and 3D).
Number of study centers (multicenter versus single site)
Three (3) multicenter trials were compared with two single-site studies. There was no statistically significant difference (χ2=1.60, p=0.21). Heterogeneity was high between the two single-site studies (I2=88%) because one used EA and the other used MA. Removal of the one study with short needle retention did not change the results (Figs. 3E and 3F).
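The p values attached to the subgroup χ2 statistics in this section can be reproduced with a short sketch. It assumes each test for subgroup differences compares two subgroups and therefore has 1 degree of freedom, in which case the upper-tail probability reduces to a complementary error function. The function name is hypothetical.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Chi-square statistics as reported in the subgroup analyses.
for label, chi2 in [("EA vs MA", 12.69),
                    ("needle retention", 5.70),
                    ("number of centers", 1.60)]:
    print(f"{label}: chi2={chi2:.2f}, p={chi2_p_value_df1(chi2):.4f}")
```

Under the 1-degree-of-freedom assumption, the three values round to the reported p=0.0004, p=0.02, and p=0.21.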
A comparison of two multicenter trials
A detailed comparison was conducted of the two multicenter studies 5,24 (Table 6). Information about the trial methods of the two studies was extracted from another two publications. 26,27 They were similar in their design but differed in outcomes, with one study showing significantly fewer headache days in the real acupuncture group than in the sham group 24 (SMD −0.31; 95% CI −0.51 to −0.12) and the other reporting no group difference 5 (SMD −0.10; 95% CI −0.42 to 0.21).
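How the two confidence intervals above map onto "significant" and "not significant" can be shown with a small sketch that recovers the standard error from a 95% CI and derives a z statistic and two-sided p value. It assumes the intervals are symmetric normal-approximation intervals; the function name is hypothetical.

```python
import math


def z_test_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Return (se, z, p) recovered from a point estimate and its 95% CI."""
    # A symmetric 95% CI spans 2 * z_crit standard errors.
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    # Two-sided p value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# SMDs and 95% CIs as reported for the two multicenter studies.
study_24 = z_test_from_ci(-0.31, -0.51, -0.12)
study_5 = z_test_from_ci(-0.10, -0.42, 0.21)
print(f"study 24: se={study_24[0]:.3f}, z={study_24[1]:.2f}, p={study_24[2]:.4f}")
print(f"study 5:  se={study_5[0]:.3f}, z={study_5[1]:.2f}, p={study_5[2]:.4f}")
```

The first interval excludes zero and yields p below 0.05, whereas the second includes zero and yields p above 0.05, matching the description in the text. The reported limits are rounded, so the recovered values are approximate.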
CM, Chinese Medicine; TCM, Traditional Chinese Medicine.
The two studies differed in treatment protocol and administration. In Endres' study, protocol adherence was enforced with trial monitors visiting the centers regularly; in Melchart's study, this was not described. In that study, one center did not use two out of three mandatory points as the points were considered unnecessary by the trial practitioners at the center. Data from the center were excluded from the final outcome. Nevertheless, the incident highlighted the importance of protocol adherence and trial monitoring.
Discussion
Results from the meta-analysis showed no statistically significant difference between real and sham acupuncture on headache days. Subgroup analyses indicated that mode of acupuncture stimulation, duration of needle retention, and frequency of treatment could be factors contributing to the lack of difference. In clinical practice, a typical acupuncture treatment involves 20–30 minutes of needle retention, repeated de qi sensation or stimulation, and daily or twice-weekly treatment. This review supports the importance of these factors. Trials with acupuncture treatment that neglect these factors might have provided suboptimal treatments, leading to a lack of difference between real and sham acupuncture.
The impact of the number of trial centers on the outcome of acupuncture treatment for TTH is not clear. Through qualitative analysis of two multicenter trials, it was further identified that adherence to treatment protocol could be another key contributing factor for multicenter trials.
A comparison with other reviews
Among four existing meta-analyses comparing the effectiveness of real with sham acupuncture intervention, 1,4,10,28,29 only two used headache days as one of the outcome measures. 1,10 The latest Cochrane review, by Linde and colleagues, found that real acupuncture was significantly better than sham acupuncture in reducing headache days (WMD −1.56; 95% CI −3.02 to −0.10), whereas the review by Davis reported no difference between the two interventions, the same as ours.
The Linde review restricted trials to those with an observation period of 8 weeks or more after randomization. As a result, it included four of the five studies selected for the current review and excluded one EA study with a 6-week treatment period. This EA study was included in the current review because it met our criteria, and a 6-week observation period was considered sufficient to detect the short-term effect of acupuncture. The Davis review included the same five studies as the current review. The EA study increased the heterogeneity of the meta-analysis, resulting in no difference between real and sham acupuncture. The Linde review reported small heterogeneity (I2=13%), whereas we found large heterogeneity (I2=81%).
The main aim of this review was to use subgroup analyses to explore the source of heterogeneity, which was not done in either of the two reviews. Subgroup analyses are frequently used to extract valuable information from RCTs. 30 –32 They have also been applied in acupuncture systematic reviews to investigate and interpret heterogeneity. 33 To ensure that only high-quality studies were included, this review employed three commonly used scales (i.e., Jadad, IVS, and OPVS), considering selection bias, performance bias, and attrition bias, as well as the sample size and outcome measures used in pain studies. All five studies were found to be of high reporting and methodological quality.
Adequacy of acupuncture treatment and reporting of treatment details
Adequacy of acupuncture treatment is as important as methodological quality when considering the validity of the conclusions. 34 –37 However, the adequacy and quality of treatment were seldom assessed in the existing systematic reviews of acupuncture. 34 A recent study using the Delphi method 15 identified contributing factors as recommended by experts. Mode of acupuncture, frequency of treatment, and needle manipulation, identified in the current review, were also recommended by experts as key contributing factors.
Stimulation mode: EA versus MA
In this review, the study with EA produced a better result in headache reduction than the MA studies did. EA has been found by other researchers to be effective in reducing pain. 38,39 To date, no study has directly compared EA with MA for TTH. However, comparisons have been made in other health conditions, and EA was better than MA in treating patients with tennis elbow, 40 knee osteoarthritis, 41 and fibromyalgia syndrome. 33
When measured with functional magnetic resonance imaging, EA produced a more widespread signal increase in the brain than MA did. 42 In addition, a greater analgesic effect was found when comparing EA with MA or sham acupuncture. 43
Needle retention and treatment frequency
In most acupuncture clinical trials for headache, 20–30-minute treatments with more than 10 minutes of needle retention per session were applied. 44 –48 However, standards or guidelines for needle retention time are not currently available for TTH treatment. A study of cerebral palsy reported that the group with needle retention of 30 minutes had better results than the group with 5-minute needle retention. 49 One recent study also found that EA of 15–20 minutes was better than EA of 5 or 10 minutes. 50
Another factor is the frequency of treatment. One (1) study in this review applied weekly treatment, which was regarded as being insufficient to produce an analgesic effect. 51 Short needle retention time together with inadequate acupuncture treatment frequency (e.g., weekly 5-minute treatment) could underestimate the treatment effect. 52
Other factors identified
Other potential factors identified through qualitative analyses of the data in this review are de qi, point selection, CM diagnosis, background of the trial acupuncturist, and adherence to trial protocol. Owing to insufficient data, subgroup analyses were not conducted for these factors.
De qi is regarded as an essential part of acupuncture and a predictor of a positive outcome of acupuncture treatment. 51 A strong correlation between de qi and analgesic effect was found in healthy humans. 53
Studies included in this review had two categories of point selection (i.e., fixed or semistandardized point selection), like many other acupuncture studies. 54 –59 However, a fixed, predetermined acupuncture treatment protocol is not recommended for headache clinical trials, 60 given the variety of headache presentations.
Individualized or semistandardized point selection requires standard CM diagnosis of TTH. Variations in these studies indicate a need to establish the CM differentiation diagnosis of TTH before further trials are conducted.
Multicenter studies inevitably encounter more problems than single-center trials, especially in nonpharmacological studies. As described previously, delivery of intervention could vary from one center to another even when the same protocol is used. In addition, the quality of nonpharmacological treatment largely relies on the skills and experience of the trial practitioners. As a result, skilled practitioners and reporting and assessment on protocol adherence are imperative. 61 Careful monitoring is necessary for ensuring the credibility of the results. 62
Strengths and limitations
This review is among the first studies to explore the factors contributing to inconsistent findings from systematic reviews of acupuncture for TTH. Unlike other systematic reviews, it does not aim to assess the effect and safety of acupuncture for TTH. As a result, only sham-controlled trials were included, and studies comparing real acupuncture with no acupuncture or other active treatments were not. Authors of two studies with insufficient data 23,25 were contacted, but no reply was obtained from them. Also, Chinese literature was systematically searched from the leading Chinese databases, but none met the selection criteria.
The main limitation of this review is the small sample size, with only five studies meeting the selection criteria. As a result, some potential factors, such as obtaining de qi during the treatment, CM diagnosis, and point selection, could not be explored. Furthermore, the conclusion of the current review is limited by the weakness of subgroup analyses, which help identify the source of heterogeneity but do not provide definite answers as studies that directly compare the factors do. For instance, EA was found to be better than MA in the current review, but a definite conclusion can only be drawn when EA and MA are compared directly in a study.
A consensus on quality acupuncture needs to be reached. The Delphi study by Smith and colleagues has provided the first step. The present review illustrates the importance of some factors. The factors identified from the Delphi study and this review should be assessed via planned subgroup analyses in systematic reviews of acupuncture and via purposely designed trials in patients and healthy humans.
Implications and Conclusions
Stimulation mode, needle retention, and treatment frequency are important factors contributing to the outcome of acupuncture treatment for TTH. For clinical practice, the ideal acupuncture treatment protocol for TTH could be EA with 30-minute needle retention and twice-weekly treatment. Further studies comparing EA with MA and comparing individualized treatment with a standard formula are needed. Researchers also need to consider the number of trial sites, practitioner training, and protocol adherence. To ensure that systematic reviews and trials of acupuncture are of high quality, consensus on quality acupuncture treatment and administration needs to be established with evidence-based approaches.
Footnotes
Disclosure Statement
No competing financial interests exist.
References
