Abstract
Objectives: The authors systematically reviewed the outcomes and methodological quality of 24 Internet addiction (IA) treatment outcome studies in China. Method: The authors used 15 attributes from the quality of evidence scores to evaluate 24 outcome studies. These studies came from both English and Chinese academic databases from 2000 to 2010. Results: Among the 15 attributes, only sequence generation and intention-to-treat were reported by more than 50% of the 24 studies. None of the studies contained treatment adherence ratings or collateral reports. Cognitive behavior therapy combined with family therapy or group therapy emerged as possibly efficacious treatments. Conclusions: More rigorously designed studies, accompanied by transparent reporting of methods and findings, are needed to identify promising IA treatments.
Internet addiction (IA; also called pathological Internet use, excessive computer and video game playing, Internet overuse, problematic Internet use, etc.) has become a widespread problem among youth ages 9–23. According to a 2009 national Harris poll survey with a randomly selected sample of American youth ages 8–18, about 8% of video game players exhibited pathological patterns of play (Gentile, 2009), which is also considered a type of IA. In a 2005 survey in Germany, among 323 children from 11 to 14 years of age, 9.3% of the children fulfilled all criteria for excessive computer and video game playing (Gruesser, Thalemann, Albrecht, & Thalemann, 2005). In Australia, excessive computer playing corresponding to addictive behavior was found in 12.3% of school-age children, adolescents, and emerging adults (Batthyany, Muller, Benker, & Wolfling, 2009). This public health problem might be even worse in China. According to the China Youth Internet Association’s 2009 Report on Youth IA Disorder, around 14.3% of youth ages 13–17 and around 15.6% of youth ages 18–23 in China with computer access had IAs (China Youth Internet Association, 2010).
Although there are many cross-sectional studies on the correlates of IA and some review articles, most articles only discuss promising treatments in passing (Brezing, Derevensky, & Potenza, 2010; Chou, Condron, & Belland, 2005; Huang, Li, & Tao, 2010; Young, 2009a, 2009b). Thus, there is no comprehensive review of outcomes of different IA treatments, and there is no review article that focuses on the methodological quality of such studies. Chou, Condron, and Belland (2005) systematically reviewed research published before 2005 on IA definitions, assessments, risk factors, and addictive potentials. However, only two empirical treatment studies, which used cognitive–behavior therapy (CBT) and motivational enhancement therapy, were included in that review. Widyanto and Griffiths (2006) also reviewed IA research, but their article did not contain detailed information about treatment outcome studies. These two reviews suggest that cognitive behavioral therapy has been used to treat IA, but with so few studies, conclusions about its effectiveness are premature. In addition, Young (2009) describes a cognitive behavioral therapy model that includes behavior therapy and self-monitoring (i.e., a daily Internet use log) to treat IA. However, that article is only a description of potential IA treatments, not an empirical study.
Although few outcome studies exist in the United States, many outcome studies have been conducted in China. Therefore, this review article aims to systematically review IA treatment outcome studies published in English and Chinese language journals that focused on IA treatment in China from 2000 to 2010. Our primary objective is to evaluate the methodological quality of different treatment outcome studies. Our secondary objective is to find out which treatments are empirically supported. Based on evidence-based practice requirements, this systematic review on IA outcome studies will help social workers identify the most effective treatments for IA. Further, by focusing on methodological attributes of these studies, we will identify those practices with the largest effect sizes from the highest quality studies.
Definitions of IA vary considerably among different scholars. Some definitions, and the resulting measures that have been developed, like Young’s (2004) IA scales, flow from the conceptualizations of substance use disorders and pathological gambling in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; Fisher, 1994; Skoric, Teo, & Neo, 2009; Young, 2004). Similar to these problems, IA is conceptualized as a problem where frequent use leads to significant impairment in one’s life (i.e., urges to play, ignoring personal responsibilities, and loved ones). According to Block (2008), IA appears to be a common disorder that merits inclusion in DSM-V, and conceptually the diagnosis is a compulsive-impulsive spectrum disorder that involves excessive online and/or computer use, withdrawal, and tolerance. Although a thorough discussion of diagnostic issues surrounding IA is outside of the scope of this article, when we refer to IA here, we include both problematic Internet and computer use (i.e., video game) that has been diagnosed with the best available, empirically validated measures.
Some of the most commonly used scales in China are Young’s (1995–2004) diagnostic questionnaires. Young developed four questionnaires, based on pathological gambling measurement, called the IA scales (IAS). They range from 7 to 20 items (IAS-7, IAS-8, IAS-10, and IAS-20). Example items include “do you feel the need to use the Internet with increasing amounts of time in order to achieve desired satisfaction?” and “have you repeatedly made unsuccessful efforts to control, cut back, or stop Internet use?” The Chinese Internet Addiction scales (CIAS 1998 and CIAS 2005) are based on Young’s (1995–2004) scales (Yang, Zheng, & Ruan, 2004; Zhang, 2009). The CIAS 1998 includes four factors: tolerance, compulsive behavior and time, withdrawal symptoms, and related mental health problems (Yang et al., 2004). The CIAS 2005 was used in a large epidemiological study in China, called the Chinese Adolescent IA Report 2005 (Zhang, 2009). China developed its own diagnostic criteria for IA in late 2008 and currently uses them in treatment settings (Liu, Wang, & Zhuang, 2008). The criteria are as follows: (a) more than 6 hr a day of Internet use; (b) this heavy Internet use persists for more than 3 months; (c) social, study, and communication skills dysfunctions; (d) dependence symptoms (e.g., a strong desire and impulse) are present; and (e) withdrawal (e.g., uncomfortable, easily angered, attention deficit, sleep disorder without Internet use).
In addition to these criteria developed in China, there is evidence that IA is associated with attention deficits, bad academic performance, depression, self-injury, hostility, and violent behaviors (Batthyany et al., 2009; Bernardi & Pallanti, 2009; Bioulac, Arfi, & Bouvard, 2008; Caplan, Williams, & Yee, 2009; Chan & Rabinowitz, 2006; Frolich, Lehmkuhl, & Dopfner, 2009; Gentile, 2009; Ha, Yu, Park, & Lim, 2009; Hwang, Cheong, & Feeley, 2009; Kim, Namkoong, Ku, & Kim, 2008; Ko, Yen, Chen, Yeh, & Yen, 2009; Lam, Peng, Mai, & Jing, 2009; Skoric et al., 2009; Sun, Ma, Bao, Chen, & Zhang, 2008). Therefore, researchers sometimes use the Symptoms Checklist 90 (SCL-90; Derogatis, 1994; Wang, Wang, & Ma, 1999) to evaluate the treatment outcomes of IA besides the IAS and CIAS in hospitals and clinical settings.
Method
Study Selection
An extensive literature search was conducted both in English academic databases (social sciences citation index [SSCI], Social Services Abstracts, Social Work Abstracts, PsycINFO with PsycARTICLES) and Chinese academic databases (WanFang Database, China national knowledge infrastructure [CNKI] Chinese academic journal web databases, WeiPu databases) for studies published from 2000 to May 2010. Keywords such as “Internet addiction, pathological Internet use, excessive Internet use, problematic Internet use, video game addiction” were used in the search, combined with the keywords “treatment” or “intervention” or “evaluation”. Based on this search, we selected studies that (a) reported posttreatment outcomes, (b) included participants ages 9–23 who were treated for an IA, (c) were conducted in China, and (d) for which we could obtain the full text report. Review papers and case study reports are not included in this review. We chose a large age range because we preferred to include as many studies as possible on IA treatment, and studies done in clinical settings in China usually had a large age range. Using these criteria, we located 24 studies, which are evaluated in this article.
Selection of Methodological Criteria
Methodological quality attributes of IA outcome studies in this article were established based on a recent review of adolescent substance abuse trials (Becker & Curry, 2008). Becker and Curry reported whether 14 methodological attributes were present in adolescent substance abuse treatment trials. These attributes are consistent with the Consolidated Standards of Reporting Trials (CONSORT statement; Moher, Schulz, & Altman, 2001). The study attributes that were coded included objective, sample size, power, outcome, random sequence generation (i.e., how randomization was achieved), allocation concealment, active comparison, baseline data, manualized treatment (1 = yes, 0 = no), treatment adherence rating, collateral report, collection of an objective measure besides self-report (e.g., collateral report; 1 = yes, 0 = no), intention-to-treat (ITT) analysis, and blind assessment. In our study, we also coded whether the follow-up data were collected at least 30 days after the study treatments ended. Thus, we used 15 attributes to evaluate the methodological strength of the 24 IA treatment outcome studies (see Table 1).
Attributes of Methodological Quality
Rating Process
The first and second authors independently reviewed the 24 studies based on the 15 attributes. After that, κ coefficients were used to assess the reliability of each attribute. Discrepancies were resolved through discussion with the third author. All three authors engaged in a discussion until consensus was reached. Following Becker and Curry’s (2008) article, we also used a composite quality of evidence score (QES) to indicate the number of methodological attributes each study met. We coded each attribute 0 (not met or unclear) or 1 (met), except for the ITT analysis item, where scores ranged from 0 to 2 (0 = treated case analysis, 1 = available case analysis, and 2 = full ITT analysis).
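As an illustration of the reliability check described above, the following is a minimal sketch of Cohen's κ for two raters who each code one attribute across a set of studies. The function name `cohen_kappa` and the example codes are hypothetical and are not the review's actual ratings.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items that both raters coded identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical 0/1 codes for one attribute across 10 studies (illustrative only).
rater_1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

In this made-up example the raters agree on 8 of 10 studies, but because agreement expected by chance is .52, κ is noticeably lower than the raw agreement rate.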
We report QES scores separately for three different study designs found in the literature, as some QES items do not apply to some designs. The first type of design is a one-group pretest and posttest design, called a “preexperimental design” study. The second is the nonrandomized comparison groups design. The third is the experimental design (Rubin & Babbie, 2011). For preexperimental design studies, some attributes (e.g., random sequence generation, allocation concealment, active comparison, and baseline data) did not apply and were coded as 0 for such studies. After summing items, the range of the total QES is 0–16 for experimental design studies, 0–15 for nonrandomized comparison group design studies, and 0–12 for preexperimental design studies.
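The scoring scheme above can be sketched in a few lines. This is an illustrative reading, not the authors' scoring procedure: the attribute identifiers are hypothetical names for the 15 attributes, and the set of attributes treated as not applicable to nonrandomized designs (random sequence generation only) is an assumption chosen so that the ceiling matches the reported 0–15 range.

```python
# Hypothetical identifiers for the 15 attributes named in the text.
ATTRIBUTES = [
    "objective", "sample_size", "power", "outcome",
    "random_sequence_generation", "allocation_concealment",
    "active_comparison", "baseline_data", "manualized_treatment",
    "treatment_adherence_rating", "collateral_report",
    "objective_measure", "itt_analysis", "blind_assessment",
    "follow_up_30_days",
]

NOT_APPLICABLE = {
    "experimental": set(),
    # Assumption: only random sequence generation is excluded here.
    "nonrandomized": {"random_sequence_generation"},
    "preexperimental": {
        "random_sequence_generation", "allocation_concealment",
        "active_comparison", "baseline_data",
    },
}


def max_score(attribute):
    """ITT analysis is coded 0-2; every other attribute is coded 0-1."""
    return 2 if attribute == "itt_analysis" else 1


def qes(codes, design):
    """Sum attribute codes; non-applicable attributes contribute 0."""
    total = 0
    for attribute in ATTRIBUTES:
        if attribute in NOT_APPLICABLE[design]:
            continue
        value = codes.get(attribute, 0)  # unreported attributes count as 0
        if not 0 <= value <= max_score(attribute):
            raise ValueError(f"invalid code for {attribute}: {value}")
        total += value
    return total


def qes_ceiling(design):
    """Highest QES a study of the given design could obtain."""
    return sum(max_score(a) for a in ATTRIBUTES
               if a not in NOT_APPLICABLE[design])


for design in NOT_APPLICABLE:
    print(design, qes_ceiling(design))
```

Under these assumptions the ceilings come out to 16, 15, and 12 for experimental, nonrandomized, and preexperimental designs, matching the ranges stated above.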
Effect Size Calculation Process
We calculated the effect sizes of the studies mainly by using Cohen’s d (Cohen, 1988). The 24 studies primarily used the IAS or SCL-90 as outcome measurement tools. On these two tools, higher scores indicate more severe IA or health problems, respectively. For studies that used only the SCL-90, we chose the depression factor as the representative outcome to calculate effect size, since depression was the most common correlate of IA disorders in China. If the studies used continuous outcome measures and there were control groups, we calculated the effect size by subtracting posttest control group outcomes from those of the experimental treatment group and dividing by the standard deviation of the controls. Similarly, for studies with continuous outcome measures and preexperimental designs, we calculated the effect size by subtracting pretest values of the dependent measures from posttreatment values and dividing by the standard deviation of the pretest data wave. Finally, for studies using dichotomous outcome indicators, we used risk ratios as estimates of the effect size. Because posttest scores and treatment group scores were smaller than the pretest scores and control group scores, respectively, the effect sizes were initially expressed as negative values. We therefore transformed negative values to positive ones so that higher positive values indicate larger effect sizes in favor of the treatment group (i.e., comparison group design) or posttest wave (i.e., preexperimental designs). In order to prevent outliers from exerting too much influence on analyses, we winsorized the effect sizes when comparing the effectiveness of single treatments and multimodal treatments (Erceg-Hurley & Mirosevich, 2008). In this analysis, we also transformed negative effect size scores to positive scores for better interpretation.
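The calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names and numbers are hypothetical, the sign change is implemented as a simple reversal (so that a reduction in symptoms becomes a positive value), and winsorizing is shown in its basic form of replacing the k most extreme values at each end with the nearest retained value. Note that a standardized difference divided by the control group's standard deviation is often labeled Glass's delta rather than Cohen's d.

```python
def controlled_effect_size(mean_treatment, mean_control, sd_control):
    """Posttest treatment minus control, scaled by the control SD.

    The sign is reversed so that lower symptom scores in the
    treatment group yield a positive effect size (an assumption
    about how the sign transformation was carried out).
    """
    raw = (mean_treatment - mean_control) / sd_control
    return -raw


def pre_post_effect_size(mean_pre, mean_post, sd_pre):
    """Posttest minus pretest, scaled by the pretest SD, sign reversed."""
    raw = (mean_post - mean_pre) / sd_pre
    return -raw


def risk_ratio(events_treatment, n_treatment, events_control, n_control):
    """Risk of the outcome in the treatment group relative to control."""
    return (events_treatment / n_treatment) / (events_control / n_control)


def winsorize(values, k=1):
    """Replace the k lowest and k highest values with their nearest
    retained neighbors, limiting the influence of outliers."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2k < len(values)")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical numbers for illustration only.
d_controlled = controlled_effect_size(40.0, 50.0, 10.0)
d_pre_post = pre_post_effect_size(60.0, 45.0, 10.0)
rr = risk_ratio(5, 20, 10, 20)
trimmed = winsorize([0.1, 1.0, 1.5, 2.0, 14.5], k=1)
print(d_controlled, d_pre_post, rr, trimmed)
```

With these illustrative inputs, an extreme value such as 14.5 is pulled in to the next-highest value (2.0) instead of being dropped, which keeps the number of studies constant while reducing the outlier's influence on the group mean.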
Results
Methodological Attributes Across Studies
The κ values for each of the 15 methodological attributes are described in Table 2. Except for 2 of the 15 attributes, outcome (κ = .57) and active comparison (κ = .52), we reached moderate or high agreement (κ values ranging from .63 to 1.00). The average κ value across the attributes is .79, indicating substantial agreement between the two raters, in accord with Landis and Koch’s (1977) benchmark. Discrepancies were resolved through communication with the third author, who had advanced clinical and research experience in outcome study evaluation. The two raters ultimately reached unanimous agreement on each attribute in order to calculate the QES.
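For readers unfamiliar with the benchmark cited above, the Landis and Koch (1977) bands can be written as a small lookup. The function name `landis_koch_label` is hypothetical; the cut points are the commonly quoted ones from that benchmark.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement band."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# The average kappa reported in the text.
average_label = landis_koch_label(0.79)
print(average_label)
```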
Design Characteristics of Internet Use Disorder Outcome Studies
Note. ITT = intention-to-treat analysis; QES = quality of evidence scores; SD = standard deviation.
The reviewed studies are classified into three categories. We examined how each type of study meets the 15 attributes separately because some criteria are not applicable to all studies. Over 50% of the studies were methodologically strong in two attributes: sequence generation and ITT analysis. However, less than 50% of the studies were strong in the following 13 attributes: objective, baseline data, sample size, power, outcome, active comparison, allocation concealment, manualized treatment, treatment adherence rating, collateral report, objective measure, blind assessment, and 30-day follow-up.
Most frequently reported attributes. Random sequence generation is one of the most frequently mentioned attributes. It is only applicable to the experimental design studies, as opposed to preexperimental studies and nonrandomized comparison group design studies, which do not need to carry out this process. Half of the experimental design studies (n = 6) mentioned that they implemented a process for generating a random sequence. For example, these studies indicated that they used either a random number table or statistical software to generate random numbers in order to ensure that each participant had an equal chance of receiving the intervention. For the remaining half of the studies, we cannot tell how the randomization was achieved because they did not describe the process explicitly.
The next attribute reported frequently is ITT analysis. Five of the eight preexperimental studies applied true ITT analyses. The remaining three either used treated case analysis (n = 2) or available case analysis (n = 1). Half of the nonrandomized comparison design studies used ITT analyses, and the other half used treated case analyses (n = 2). For the experimental design studies, 66.67% of the studies applied ITT analyses (n = 8), whereas two used available case analyses, and two used treated case analyses. Since sample attrition is inevitable in intervention studies, dealing with missing data in the analysis is a challenge in program evaluation. ITT analysis, in which all participants allocated to the study are included in the analysis, regardless of whether they received the treatment or not, has been established to produce less bias in estimating the true program effect than that produced by other analysis methods, such as available case analysis and treated case analysis (Freedman, 2005). The application of this method in these studies is an indication of researchers’ efforts to address the estimation bias of the treatment.
Least frequently reported attributes. Two of the least frequently reported attributes were treatment adherence ratings and collateral reports. None of the studies reported these two characteristics. These studies did not use scales, checklists, or rating forms to monitor the process of treatment implementation, in the absence of which we can determine neither whether the participants complied with the treatments nor whether the therapists implemented the models as intended. Consequently, it is difficult to tell what treatments were provided. Similarly, none of the studies used collateral reports. Therefore, there is no control of potential bias in self-reported outcomes.
The next least frequently reported attributes included baseline data, sample size, allocation concealment, and blind assessment. Fewer than 50% of the experimental design studies reported these four attributes. Nonrandomized comparison group design studies ignored these four attributes completely. None of the preexperimental design studies reported the baseline data, sample size, and blind assessment. Five of the 12 experimental studies provided information concerning baseline demographics and clinical characteristics of participants in both experimental groups and control groups. For example, they provided t tests or chi-square tests for the demographic characteristics and participants’ Internet use behavior at baseline, through which we established whether there was preexisting selection bias between the two groups. Only one experimental study explicitly presented its power calculation formula to show how the sample size was determined. Two experimental studies specifically mentioned that they collected their baseline data before they randomized the intervention group and control group or that they collected the baseline data without knowing who was in which group. Similarly, two studies conducted the follow-up assessment by means of a treatment-blind evaluator.
Approximately a quarter of the studies reported the following characteristics: objective, manualized treatment, and objective measure. Only one of the eight preexperimental studies and one of the 12 experimental studies specified their study objectives and hypotheses. The lack of information about hypotheses indicates that the remaining studies did not articulate which outcomes were the most important. Few preexperimental studies (n = 1) and experimental studies (n = 2) mentioned using standardized training manuals to guide their treatments, whereas most simply used self-designed manuals or no manual at all. In terms of the measurement of the outcome, less than a quarter of the studies in any design category used objective measures as opposed to self-report scales. The final block of infrequently encountered attributes included power, outcome, active comparison, and follow-up assessment longer than 30 days. Half of the preexperimental studies (n = 4) achieved adequate statistical power for their design with at least 30 participants for the pretests and posttests. However, studies with comparison groups were less likely to achieve adequate statistical power because at least 71 subjects per condition were needed for active comparison, and 27 subjects per condition were required for passive comparison (Kazdin & Bass, 1989). Half of the preexperimental studies (n = 4) and half of the nonrandomized studies (n = 2) established primary and secondary outcome measures, while only two experimental studies specified which outcome was more important. One quarter of the nonrandomized comparison group studies (n = 1) used active comparison involving psychotherapy to treat IA for the control group, and 7 of the 12 experimental studies applied active comparison such as medication or exercise therapy.
Finally, two preexperimental studies conducted a follow-up assessment after 30 days to track the lasting effects of the treatment, whereas 4 of the 12 experimental studies did so.
Quality of Evidence and Effect Sizes
Among the 24 studies, the most commonly used treatments for IA are exercise programs (EP), CBT, electroacupuncture (EA), family therapy (FT), group-based treatment (GT), motivational interviewing (MI), and psychotropic medications (M). As indicated in Table 3, more than half of the studies have multiple treatments. Except for the studies by Du, Jiang, and Vance (2010) and Pan and Dai (2010), the remaining 22 studies all have statistically significant effects favoring the treatment. The overall mean effect size is 1.89, indicative of very large effects. Some of the studies that use CBT, M, and FT have extremely high effect sizes (> 5), such as Shen (2008), Yang, Shao, and Zheng (2005), and Shao, Yang, Luo, and Zheng (2004). However, most of those studies have low quality scores (< 5), which means that the quality of the evidence is low.
Effect Sizes and Methodological Quality Scores for Chinese Internet Addiction Studies
Note. CBT = cognitive behavior therapy; EA = electroacupuncture; EBT = electroencephalographic biofeedback treatment; EP = exercise program; FT = family therapy; GT = group-based treatment; HE = health education; M = medication; MI = motivational interview; MT = military training; SBT = solution-focused brief therapy; VS = voluntary service.
CBT treatments combined with medications were the most frequently used therapies in the 24 studies. The QES scores for these studies ranged from 0 to 5 (mean = 3.4). The average effect size of CBT treatments combined with medications is 3.93. There are three studies that used only EP as the treatment (shown in Table 4). The average effect size of the three studies is 1.47, while the mean QES is 2.7. Two studies used only CBT. For one such study with a high QES (i.e., 10), the effect size is .13 (not statistically significant). However, for the lower quality study (QES = 2), the effect size is 14.5. Two studies used EA. This was a unique treatment in China, which uses electric stimulation of the body to reduce the desire to use the Internet. Since 2009, this method has been suspended in China because of unclear negative consequences (Wei sheng bu jiao ting dian ji zhi liao wang yin [Chinese Ministry of Health ban on electric stimulus therapy for Internet addiction disorder], 2009). One study used EBT, which is a new biofeedback machine treatment. However, the findings showed that biofeedback machine therapy did not have statistically significant effects on IA. Combinations of CBT, GT, FT, or other treatments have average effect sizes ranging from 1.27 to 5.86, but the methodological quality for these studies is around 4. Based on Tables 3 and 4, when there are extremely large effect sizes (> 5), as in Studies 18, 20, and 23, the QES falls below 2.
Quality of Evidence Scores and Effect Size by Type of Treatment
Note. CBT = cognitive behavior therapy; EA = electroacupuncture; EBT = electroencephalographic biofeedback treatment; EP = exercise program; FT = family therapy; GT = group-based treatment; HE = health education; M = medication; MI = motivational interview; MT = military training; QES = quality of evidence scores; SBT = solution-focused brief therapy; VS = voluntary service.
Single Treatments and Multimodal Treatments
We compared effect sizes of studies that used single treatments with studies that used multiple treatments. The average effect size for single treatments is 1.77, and the average effect size for multiple treatments is 2.3. However, the t test is not statistically significant (t = .39). Using Chambless and Hollon’s (1998) criteria, most of the treatments we reviewed would be considered possibly efficacious, with no replication studies for any treatment we reviewed.
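A comparison of this kind can be sketched as an independent-samples t statistic. The text does not say which variant of the t test was used, so the sketch below assumes Welch's unequal-variance version; the function name `welch_t` and the effect sizes are hypothetical and do not reproduce the review's data or its reported t value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se_sq_a = variance(sample_a) / n_a
    se_sq_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical winsorized effect sizes (illustrative only).
multimodal = [1.0, 2.0, 3.0, 4.0]
single = [0.5, 1.5, 2.5]
t_stat, dof = welch_t(multimodal, single)
print(round(t_stat, 2), round(dof, 2))
```

The resulting statistic would then be compared against the t distribution with the computed degrees of freedom; with samples this small, even a sizable mean difference can fail to reach significance.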
Discussion and Applications to Social Work
The primary objective of the study was to review the published outcome studies on IA in the Chinese context. Following the procedures of Becker and Curry (2008), we evaluated 24 IA treatment studies. Our main finding was that only 2 of the 15 attributes were reported by more than 50% of the three types of study designs. Although random sequence generation is a necessity for experimental studies, only half of them mentioned how they generated a random sequence, and the other half did not indicate how they randomized participants. In addition, the remaining methodological criteria were poorly met, indicating that most research did not follow established reporting guidelines for clinical trials (Moher et al., 2001).
None of the studies contained treatment adherence ratings or collateral reports, which indicates that researchers paid minimal attention to these areas. Only the experimental studies reported baseline data, sample size, allocation concealment, and blind assessment. Few studies indicated which outcomes were primary. Similarly, few studies used manualized treatments. Replication by outside investigators, a criterion for establishing a treatment as empirically supported (Chambless & Hollon, 1998), would be difficult in the absence of treatment manuals. The lack of objective measures also threatened the reliability of the outcomes. Finally, the small sample sizes of these studies were concerning, as inadequate statistical power limits our ability to detect true differences if they are indeed present.
Our secondary objective was to determine which treatment was more effective. However, based on low methodological quality scores here and the limited number of studies, this was a difficult task. Thus, we have only described trends in this study, and have not completed formal moderation analyses. In future studies, the QES criteria could be explored as moderators of treatment effects. However, due to the limited number of studies, there was not enough statistical power in this study to enable such an analysis. Another approach is to find studies with high methodological quality, and then to see whether such studies have high effect sizes. In Becker and Curry’s (2008) study, the median QES was 7, reflecting the relatively higher maturity of the evidence base for that population. By contrast, in our study, only 1 of the 24 studies examined had a QES larger than 7, and this study had a low effect size of .13 without statistically significant differences between treatment conditions. Most of the studies in this review article had a QES lower than 5. Because of this finding, it is important that future trials use more rigorous designs and adhere to established standards of clinical trial reporting. According to Chambless and Hollon (1998), at least two repeated rigorous treatment studies are needed to identify a promising treatment. Using Chambless and Hollon’s (1998) criteria, most of the treatments we reviewed would be designated as possibly efficacious, since no replication studies existed for any treatment we reviewed and power was too low in most cases to detect moderate effect sizes. Rigorous designs and reporting can help researchers and social workers to identify whether studies are rigorously replicated. Rigorous design and reporting will also help prevent biased estimation of effect sizes for IA treatments in the future.
There is evidence that inadequate methodological approaches and reporting are associated with overestimation of treatment effects, and that failure to consider the quality of methodology and reporting limits the ability to detect potentially inflated treatment estimates, identify sources of bias, and inform best practices in the field (Juni, Altman, & Egger, 2001; Moyer & Finney, 2005).
There were some limitations to this study. First, we reported low κ values with respect to two attributes, outcome and active comparison, indicating a relatively low interrater reliability for some characteristics. There were high discrepancies initially for these two characteristics. For example, one rater believed the outcome was presented only if the study differentiated which outcome was primary and which was secondary. However, the other rater was more generous in coding the outcome as “1” as long as the study presented two or more outcomes in which at least one pertained to IA behavior. Similarly, one rater believed the study applied active comparison only if the comparison group had received an evidence-based intervention to treat Internet addiction, whereas the other rater applied a broader rule and coded a “1” if the comparison group received some treatment, no matter whether it was evidence-based or not. Despite the low interrater reliability for these two attributes, we observed modest or high κ values for the rest of the attributes. In addition, for these two attributes and other attributes having κ values lower than one, unanimous agreement was achieved through dialogue with an experienced colleague.
It was also possible that studies were more rigorous than we have estimated, as we could only score studies based on their reporting of methodological attributes. Thus, it was possible that we underestimated the QES scores for some studies. Whenever the authors did not explicitly mention an attribute in the article, we treated the study as not having that attribute, although the authors may simply not have reported it. The research conventions in the context of Chinese academic language might divert the publication format away from the American academic conventions even further. For example, many articles were between one and four pages in length, which is a highly condensed publication format.
The limited number of studies and inconsistent outcome measurement tools constrained our ability to fulfill the secondary objective of this review. Thus, we were limited by our small sample size of 24 studies. In addition, these 24 studies used different outcome measurement tools to report how people recover from IA. Some of the studies only reported changes in symptoms, such as depression or anxiety, without reporting changes in IA criteria. It was problematic to compare effect sizes calculated based on different outcome measurement tools.
Finally, we acknowledge that it is as yet unclear how vulnerable and impoverished youth are affected by IA. For example, Eamon’s (2004) analysis of a nationally representative sample found that poor youth had less computer access and were less likely to use computers for nonacademic purposes. Future clinical research should explicitly report participant demographic characteristics so social workers know if vulnerable youth are affected by IA and represented in these studies.
Given that few of the methodological standards were met, researchers are encouraged to include as many essential elements of design as possible and contribute to establishing reporting guidelines for intervention studies. As the classic experimental design is characterized by criteria such as sequence generation, allocation concealment, and baseline data, these attributes should be reported when they have been implemented. Some criteria such as using objective measures, establishing primary outcome measures, having adequate power, and using manualized treatment, are easier to meet than other criteria; therefore, researchers should report them. For those attributes, including treatment adherence rating, collateral report, objective measure, and blind assessment, as they are embedded in more rigorous designs and are difficult to be fulfilled, researchers might consider including them while designing their studies. The bottom line is that when implementing a design, researchers should not neglect to report them. Otherwise, the audience will assume that the study has not been conducted rigorously.
The overall QES of the 24 studies is relatively low (Mean = 3.58, SD = 2.12, Median = 4, Range = 0–10), suggesting that experimental studies on IA treatment need substantial improvement. Because experimental designs are weighted heavily in determining whether practices are empirically supported, it is discouraging that the QES for experimental studies in this review are not higher. Indeed, they were lower than those of randomized trials in other reviews (Becker & Curry, 2008), indicating that methodological quality is relatively weak in the area of IA intervention studies. The lack of scientifically established evidence hampers researchers’ abilities to inform practitioners of promising treatments for IA. Therefore, we suggest that researchers apply more rigorous experimental designs, because only a randomized controlled experimental design can set up a counterfactual model to determine true treatment effectiveness (Shadish, Cook, & Campbell, 2002).
The second implication of this study is that outcome studies in China need a more uniform reporting style. The CONSORT statement (Moher et al., 2001) is lauded as the gold standard for transparency in clinical trial reporting in the United States. Chinese scholars could develop similar reporting criteria or follow the CONSORT criteria to enhance the quality of studies and their reporting.
Finally, one implication for social work practice pertains to how social workers choose the best evidence about IA treatments. In short, this study highlights that social workers should not choose treatments based only on simple counts of how many studies have found positive effects for a specific treatment. As we have seen in this study, not all clinical trials are equally rigorous. Thus, criteria for determining whether a particular treatment is empirically supported should include a more sophisticated weighting scheme to account for studies with low methodological quality, which could inflate effect sizes. Specifically, social workers should continue to receive training in being good consumers of research, with emphasis on identifying the attributes of studies that are associated with higher methodological quality. Although the literature is limited with regard to the most promising IA treatments, the studies reviewed here with a QES equal to or greater than 4 (i.e., the median) have relatively higher methodological quality.
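One way to picture such a weighting scheme is a mean effect size in which each study counts in proportion to its quality score. The sketch below is only an illustration under stated assumptions: weighting directly by QES is a hypothetical choice, not a method used in this review, the function name `quality_weighted_mean` is invented, and the three studies are made-up values for a single imagined treatment. Conventional meta-analysis would weight by inverse variance, possibly with quality as a moderator.

```python
def quality_weighted_mean(effect_sizes, quality_scores):
    """Mean effect size with each study weighted by its quality score."""
    if len(effect_sizes) != len(quality_scores):
        raise ValueError("effect_sizes and quality_scores must have equal length")
    total_weight = sum(quality_scores)
    if total_weight == 0:
        raise ValueError("at least one study needs a nonzero quality score")
    weighted_sum = sum(d * q for d, q in zip(effect_sizes, quality_scores))
    return weighted_sum / total_weight


# Hypothetical studies of one treatment: large effects come from weak designs.
effects = [2.0, 1.5, 0.5]   # Cohen's d per study (invented)
qes = [2, 3, 8]             # quality of evidence score per study (invented)

unweighted = sum(effects) / len(effects)
weighted = quality_weighted_mean(effects, qes)

print(round(unweighted, 2), round(weighted, 2))
```

In this invented case the quality-weighted mean is smaller than the simple mean, because the largest effects come from the lowest-quality studies. A simple count of positive findings would hide that pattern.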
In terms of IA treatment, the most popular treatment is CBT. Unfortunately, replication studies are needed to determine conclusively whether CBT is a promising treatment, since one study had a low QES and a more rigorous trial found a weak, statistically nonsignificant effect (d = .13). Given this evidence, social workers can treat CBT only as a possibly efficacious treatment. CBT combined with medication seems like a promising therapy, with a mean effect size of 3.93 and a mean QES of 3.4. On closer examination, however, we found that five of the CBT plus medication studies used the SCL-90 as the outcome measure and did not report actual changes in IA symptoms. Although these studies used medications that target depression, it would also be informative to know whether these medications work for IA symptoms. Future trials could use 2 × 2 designs to partition the effects of medications and cognitive behavioral treatments. EPs had a mean effect size of 1.47 and a mean QES of 2.7. The effect size was high; however, the quality of the studies was low. Taking effect size inflation into consideration, EP should be viewed somewhat skeptically and studied further in more rigorous trials. Electroencephalographic biofeedback treatment did not have statistically significant effects (d = .29), so social workers should proceed with caution with this treatment until additional studies support its efficacy. CBT combined with group therapy and military training had a QES of 7 and an effect size of 1.27. This treatment combined relatively high methodological quality with a large effect size and was therefore the most promising treatment we reviewed in this study. CBTs combined with either group therapy or FT had a mean QES of 4 and a mean effect size of 1.65. The average effect size was high, but these studies were only average in terms of methodological quality.
Relative to other treatments we studied, these seem promising, with the general caveat that greater specification of these treatments and more rigorously designed studies are needed. Based on this review, possibly efficacious treatments for IA are cognitive behavior therapies that are combined with group therapy, military training, or FT.
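The 2 × 2 design suggested above for separating medication and CBT effects can be sketched with simple arithmetic on the four cell means. The function `factorial_effects` and the cell means below are hypothetical and serve only to show how main effects and an interaction are partitioned; a real trial would estimate these with a two-way ANOVA or regression model and report uncertainty.

```python
def factorial_effects(cell_means):
    """Main effects and interaction from 2 x 2 cell means.

    cell_means maps (medication, cbt) booleans to the mean outcome score.
    """
    neither = cell_means[(False, False)]
    med_only = cell_means[(True, False)]
    cbt_only = cell_means[(False, True)]
    both = cell_means[(True, True)]

    # Main effect: average change from adding one factor, across levels of the other.
    medication_effect = ((med_only - neither) + (both - cbt_only)) / 2
    cbt_effect = ((cbt_only - neither) + (both - med_only)) / 2
    # Interaction: does the medication effect differ with versus without CBT?
    interaction = (both - cbt_only) - (med_only - neither)
    return medication_effect, cbt_effect, interaction


# Hypothetical mean IA scale scores at posttest (lower = fewer symptoms).
means = {
    (False, False): 60.0,  # neither treatment
    (True, False): 55.0,   # medication only
    (False, True): 48.0,   # CBT only
    (True, True): 40.0,    # medication plus CBT
}

med, cbt, inter = factorial_effects(means)
print(med, cbt, inter)
```

With these invented means, CBT accounts for a larger share of the improvement than medication. A design that tests only the combined treatment against a control group cannot make that distinction.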
Footnotes
Acknowledgments
The authors would like to thank Mary Keegan Eamon and Jun Sung Hong for their helpful comments on an earlier draft of this article.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The authors received no financial support for the research, authorship, and/or publication of this article.
