Abstract
COVID-19 has highlighted the need for evidence-based behavioural health interventions that can be delivered remotely. This article provides within-group effect size benchmarks for randomised controlled trials of Internet-based Acceptance and Commitment Therapy for the treatment of adults with anxiety and depression. Effect sizes were calculated using the Glass approach, adjusted using Hedges' g, then aggregated to produce separate benchmarks for measures of anxiety and depression. These benchmarks can be used by community-based treatment providers to evaluate the effectiveness of their web-based Acceptance and Commitment Therapy intervention and to determine whether it should be continued, modified for the unique needs of their client population and practice setting, or discontinued.
Introduction
COVID-19 has brought with it new challenges for behavioural health providers working with clients experiencing symptoms of depression and anxiety.1–4 Significant changes to health services delivery methods have been mandated across the country. Most non-emergency healthcare related services quickly converted from face-to-face appointments to online meetings, presenting new challenges for those with limited Internet access, low Internet quality or low levels of technology literacy.5–8 Behavioural health providers, many of whom had never engaged in a telehealth session prior to COVID-19, were required to do so to comply with the need to social distance and limit potential exposure to the virus. At the same time, prolonged social isolation, fear concerning virus exposure, grief related to loss of loved ones, economic setbacks and the inability to celebrate significant life milestones in usual ways have all increased susceptibility to symptoms of depression and anxiety in the general population, resulting in an increasing need for mental health services.1,2,4,9,10 This uptick in need for services, coupled with decreased ability to engage clients by traditional means, has left many providers looking for new and innovative ways to engage in client care, and to continue providing services for existing clients.
Technology-enhanced service delivery methods, such as web-based therapy and mobile device applications, have offered behavioural health providers a means of engaging clients in mental health services in a safe and socially distant way. Although telehealth and online therapy are not new, with early adopters using these intervention strategies for the past 10 years or more, these service delivery methods have become more attractive to a larger group of providers since March 2020.5,6 Evidence-based intervention approaches such as cognitive behavioural therapy (CBT) and Acceptance and Commitment Therapy (ACT) were successfully transported from a face-to-face delivery format to a fully online format prior to COVID-19, facilitating service access and helping to engage clients who may be reluctant to engage in traditional outpatient services.11–15 However, many providers, particularly those newer to technology-enhanced modes of service delivery, may be left wondering how effective these services are with the specific client populations that they serve. One way for community-based providers to evaluate the impact of technology-enhanced modes of service delivery is through the use of benchmarks.
What is benchmarking?
Benchmarking is an approach that allows for comparison of outcome data obtained from real-world settings against outcomes resulting from controlled clinical trials. 16 It is a process through which outcomes of randomised clinical trials (RCTs) are aggregated to determine the overall ‘impact’ of a given intervention under ideal circumstances. 17 Benchmarks provide tools for community-based providers to engage in the final step of the process of evidence-based practice,18–21 by allowing them to compare the outcomes of their interventions, particularly interventions that have been adapted for a particular client group or mode of delivery, with outcomes of RCTs.22,23 This information, in turn, helps providers to determine whether the outcomes that they are getting are similar to those found in clinical trials, and thus whether they should (a) continue the intervention ‘as is’; (b) modify the intervention further in the hope of improving its effectiveness; or (c) discontinue the intervention and implement another.
Benchmarks have been established for face-to-face evidence-based interventions for depression,24–30 anxiety, 31 trauma17,32,33 and for diagnostically heterogeneous groups. 34 However, there are currently no available benchmarks in the empirical literature for evidence-based interventions for anxiety or depression delivered in a technology-enhanced format. Accordingly, this article presents within-group effect size benchmarks for web-delivered ACT for use by behavioural health service providers working with clients experiencing mild to moderate symptoms of depression and/or anxiety.
What is ACT?
ACT is a ‘third wave’ therapy 35 that can be applied to a wide variety of behavioural health concerns including depression and anxiety. 36 It is based on relational frame theory, 37 which comprises three key tenets: (a) our thinking and the language that we use are a particular type of learned behaviour; (b) our thoughts alter the effects of other behavioural processes; and (c) contextual features of a situation regulate our thoughts and their functions. 38 Psychopathology is thought to result from psychological inflexibility and can manifest in symptoms such as experiential or emotional avoidance, cognitive fusion, impulsivity or inaction and a lack of clarity concerning one’s values. Therapeutic benefit is achieved through the process of understanding the functions of our cognitions rather than simply looking at their content.
Hayes and colleagues35,38 assert that there are six core interrelated and overlapping processes within ACT which facilitate psychological flexibility: (a) acceptance; (b) cognitive defusion; (c) being present; (d) self as context; (e) clarification of values; and (f) committed action. ACT differs from traditional cognitive behavioural therapies in that, rather than seeking to change one’s emotions or cognitions, it encourages one to first simply observe, then actively embrace them (rather than trying to fight them, avoid them, or feel guilty about them) through the practice of self-acceptance and mindfulness via a series of structured exercises. 39 The overall goal is to develop acceptance of unwanted private experiences (thoughts and feelings) which are out of personal control and to commit to actions in one’s life that are consistent with one’s personal values. 40 The typical ACT intervention can be delivered individually or in a small group format, either as a series of sequential sessions or as a full-day workshop. 41 To date there are over 2000 research studies related to ACT outcomes, including over 280 RCTs and 40 systematic reviews and meta-analyses. 42 There is substantial research to support the effectiveness of technology-enhanced ACT to address symptoms of depression and anxiety.43–45
Methods
Study identification and selection
An initial search of Internet databases was conducted in spring 2020 to locate existing meta-analyses and systematic reviews on the efficacy of the ACT intervention delivered via technology-enhanced means, such as therapist-guided or self-guided web-based delivery or a mobile device application. This search included the following databases: Google Scholar, Web of Science, Medline, Academic Search Complete, ERIC, Social Work Abstracts, SocINDEX with Full Text, Social Sciences Full Text, PubMed, PsycARTICLES, PsycINFO, Psychology and Behavioral Sciences Collection, Health and Psychosocial Instruments and the Cochrane and Campbell libraries. Search terms included: ‘ACT’, ‘Acceptance and Commitment Therapy’, ‘Internet’, ‘Web’ or ‘technology’, ‘depression’ and ‘anxiety’, and combinations of these terms with ‘treatment’, ‘systematic review’ and ‘meta-analysis’. Three systematic reviews12–14 were identified focusing exclusively on ACT delivered in a technology-enhanced format for the treatment of depressive or anxiety symptoms.
An initial screening of each article identified in the three systematic reviews was conducted, yielding a total of 25 RCTs using some form of the ACT intervention that was enhanced by technology. A follow-up targeted search for articles published about a technology-based ACT intervention was also conducted, using the same databases as the original search, to identify additional articles that might meet the inclusion criteria. Thirteen additional studies were located, yielding a total of 38 studies that were reviewed in their entirety to make preliminary determinations on whether they met inclusion criteria. Studies were included if they: (a) enrolled only adults aged 18 years and older; (b) reported results in English or had an English translation available; (c) delivered the intervention by some technology-enhanced means; (d) had a sample in which at least two-thirds of participants had clinically significant symptoms of depression or anxiety; (e) used an RCT design; (f) delivered the full ACT intervention, comprising six component processes, in a multi-session format (rather than a one-time workshop); (g) reported at least one standardised outcome measure of depression and/or anxiety; and (h) reported pretest and posttest means and standard deviations (SDs) for each group on standardised measures. Studies of participants with co-occurring physical health conditions were then excluded from our analysis, as mental health symptomatology was the focus of this work. Authors 1 and 3 independently evaluated whether each article met the inclusion criteria. There was initial disagreement about inclusion of one article. This discrepancy was resolved via a joint reexamination of the full text article by authors 1, 2 and 3.
Six of the 38 studies reviewed met all inclusion criteria. Thirty-two studies were excluded for the following reasons (which are not mutually exclusive): Three were eliminated as they were not RCTs, four focused on behavioural health symptoms that were not anxiety or depression, one did not report any standardised measures of mental health outcomes, two presented data from follow-up studies, 12 were based on studies where participants had co-occurring physical health conditions, such as chronic pain, seven studies included a majority of participants that, while experiencing mental health symptoms, did not meet the criteria for a mental health diagnosis, and four did not deliver the full ACT intervention. Figure 1 depicts the article selection process.

Figure 1. Article identification, review and selection process.
Data extraction process
Authors 1 and 2 independently recorded data from each of the included articles, and they reached an interrater agreement rate of 100% on data extraction. All data on study measures were self-reported and an intent-to-treat analysis was used for all included studies. The following information was recorded for each study: sample size of each group (ACT and control), authors and year of publication, type of control condition, percentage of participants who were female, length of intervention, mean age of participants, country where the study was conducted, language of the intervention materials and percentage of attrition in each of the groups.
All calculations were conducted using IBM’s SPSS Statistical Software Version 26. A total of 559 participants were included in these benchmark calculations: 283 were randomly assigned to the waitlist condition and 276 received the web-based ACT intervention. Five of the six studies (83.3%) were conducted in Scandinavia and, as such, ethnicity data were not reported. Only one study (16.7%) used English-language materials. Women made up the majority (76.9%) of the total sample. The mean age of participants ranged from 20.5 to 53.4 years. The mean number of modules for the ACT intervention was 7.5 (SD = 2.1). Dropout rates ranged from 5.0% to 20.0% for the ACT intervention group and from 0% to 23.1% for the waitlist group. All of the studies reported measures of depression, and 83.3% reported measures of anxiety. A detailed outline of the sample characteristics and intervention methods for each study is included in Appendix 1.
Next, baseline scores and their SDs, along with post-intervention scores and their SDs, for the depression and anxiety measures were recorded. Within-group effect sizes were calculated according to Glass’s delta approach. 46 This approach divides the difference between the experimental and control group means by the control group SD. For within-group effect sizes, the difference between the pretest and posttest means is divided by the pretest SD. This calculation is done for both the experimental and control groups.47–49 As was done for prior benchmarking studies,17,26–28,32,33 follow-up data were not included, as these benchmarks are intended to guide practice decision-making in non-research settings that typically do not conduct post-termination follow-up assessments. To account for the small sample sizes of some of the included studies, each effect size d 50 was adjusted using the Hedges’ g formula recommended by Wilson and the Campbell Collaboration, 51 in which the effect size is multiplied by a small-sample correction factor.
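For reference, the two steps described above can be written compactly as follows. This is a sketch under stated assumptions: the correction factor is the form commonly attributed to Hedges for small samples, N is taken to be the size of the group being evaluated, and the sign convention (pretest minus posttest, so that symptom reduction is positive) is inferred from the positive benchmarks reported in the Results.

$$
d = \frac{M_{\text{pre}} - M_{\text{post}}}{SD_{\text{pre}}}, \qquad g = d\left(1 - \frac{3}{4N - 9}\right)
$$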
The individual study effect sizes were then aggregated across studies as follows. 53
First, the variance of the individual study effect sizes was estimated as follows:
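A commonly used variance estimate for a pre-post effect size, written with the symbols described in this section, is sketched below. It is an assumed form and may differ from the exact expression used for these benchmarks; with r = 0.5 it reduces to 1/n + g²/(2n).

$$
\hat{v}_{ij} = \frac{2(1 - r)}{n_{ij}} + \frac{g_{ij}^{2}}{2\,n_{ij}}
$$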
In the formula, j indicates the experimental or control group; i indexes the individual study within each group; n is the sample size of each group within each individual study; and r is the best available estimate of the population correlation between pretest and posttest scores (ρ, rho). Because r is usually not reported in original studies, a fixed value of either 0.7 54 or 0.5 17 is commonly used. In this study, we calculated and compared the benchmark effect sizes with r set to 0, 0.5 and 0.7 to determine whether changing this estimate would have any meaningful impact on our benchmark calculations. No significant differences were found among these three values for r; thus, the value of r was set at 0.5 for all subsequent calculations.
Next, the variance and effect size were used to estimate the fixed benchmark effect size across studies using the following formula:
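A standard fixed-effect aggregate that matches this description weights each study's effect size by the inverse of its estimated variance. The notation below is an assumption introduced for illustration, with \hat{v}_{ij} denoting the estimated variance of g_{ij}.

$$
g_{B_j} = \frac{\sum_{i} w_{ij}\, g_{ij}}{\sum_{i} w_{ij}}, \qquad w_{ij} = \frac{1}{\hat{v}_{ij}}
$$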
Confidence intervals for each benchmark were then calculated, as the aggregated effect size is a point estimate of the effect size in the population. First, the standard error (SE) for each group was calculated for each benchmark by taking the square root of one over the sum of the inverse variance weights for Hedges’ g:
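Written out from the verbal description, with w_{ij} standing for the inverse variance weight of study i in group j (symbol names are illustrative), the standard error would be:

$$
SE_{g_{B_j}} = \sqrt{\frac{1}{\sum_{i} w_{ij}}}
$$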
Next, the 95% confidence interval for each benchmark effect size (gB) was calculated using the following formula:
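Assuming the usual normal approximation, in which 1.96 is the critical value for a 95% interval, the confidence interval takes this form:

$$
g_{B} \pm 1.96 \times SE_{g_{B}}
$$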
In addition, minimum and maximum benchmark effect sizes are reported to show the range of effect sizes, because with a relatively small sample of studies (k = 6) and potentially skewed distributions, confidence interval estimates could exceed the minimum or maximum effect size.
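The full sequence of calculations can be sketched in a few lines of Python for readers who want to reproduce the logic outside SPSS. This is an illustrative sketch, not the procedure used to produce Table 1: the correction factor and variance expression are assumed standard forms, the function names are hypothetical, and the input values are made up rather than taken from the included trials.

```python
import math

def hedges_g_within(mean_pre, mean_post, sd_pre, n):
    """Within-group Glass-style effect size with a small-sample correction."""
    d = (mean_pre - mean_post) / sd_pre      # positive value = symptom reduction
    correction = 1 - 3 / (4 * n - 9)         # ASSUMED form of the Hedges correction
    return d * correction

def variance_g(g, n, r=0.5):
    """ASSUMED variance of a pre-post effect size; r is the pre-post correlation."""
    return 2 * (1 - r) / n + g ** 2 / (2 * n)

def benchmark(studies, r=0.5):
    """Inverse-variance weighted (fixed-effect) aggregate with a 95% CI.

    `studies` is a list of (mean_pre, mean_post, sd_pre, n) tuples for one arm.
    """
    gs = [hedges_g_within(*s) for s in studies]
    ws = [1 / variance_g(g, s[3], r) for g, s in zip(gs, studies)]
    g_b = sum(w * g for w, g in zip(ws, gs)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    return {
        "g_B": g_b,
        "SE": se,
        "CI": (g_b - 1.96 * se, g_b + 1.96 * se),
        "min": min(gs),
        "max": max(gs),
    }

# Made-up illustrative inputs, NOT data from the included trials
hypothetical = [(20.0, 12.0, 8.0, 40), (18.0, 11.5, 7.5, 55), (22.0, 15.0, 9.0, 30)]
result = benchmark(hypothetical)
print(result)
```

Passing a different value of `r` to `benchmark` (for example 0 or 0.7) gives a quick way to repeat the kind of sensitivity check described above.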
Results
Table 1 displays the aggregated one-group effect sizes along with information on central tendency and dispersion for depression and anxiety. The aggregated within-group effect sizes are 0.84 for depression and 0.85 for anxiety. The referent aggregated within-group effect sizes for the control groups are much smaller, 0.42 for depression and 0.32 for anxiety respectively.
Table 1. Aggregate within-group effect-size estimates for web-based Acceptance and Commitment Therapy (ACT) addressing depression and anxiety.
CI: confidence interval; gB: adjusted effect size; k: number of studies; SEg: standard error; WL: waitlist control.
Case example
Practitioners may use these results to guide clinical decision-making in the following way. Since the emergence of COVID-19, a clinician at a community-based healthcare agency has been delivering the ACT telehealth intervention to clients who are experiencing symptoms of depression and anxiety. The clinician administered standardised measures of anxiety and depression prior to initiating the telehealth intervention (pretest), then administered the same anxiety and depression measures midway through treatment (posttest). To determine whether the intervention has thus far exhibited levels of effectiveness similar to those found in clinical trials, the clinician would first calculate the mean and SD of the pretest and posttest scores for this group of clients. From there, the clinician can obtain the effect size for their client group by dividing the difference between the pretest and posttest means by the pretest SD. The clinician finds (for example) that the effect size for this group of clients is 0.75 for symptoms of anxiety and 0.35 for symptoms of depression. Based on the aggregated results of the RCTs included in this review, it appears that the intervention had the expected impact on their clients’ anxiety symptoms, since the effect size achieved for this group’s anxiety treatment (0.75) is similar to that of those who received the ACT telehealth intervention as part of a clinical trial (0.85).
However, the obtained effect size for this group’s depressive symptoms (0.35) more closely approximates the aggregated effect size for depression obtained from the ‘no treatment control group’ in clinical trials (0.42) than the aggregated effect size for the treatment group (0.84). This indicates that, moving forward, the clinician may want to consider modifying the ACT telehealth intervention to improve its impact on depressive symptomatology, or perhaps supplementing this intervention with another that focuses on depressive symptoms. The clinician would then administer the standardised measures again at the end of treatment to obtain the effect size for their treatment group. This will help the clinician to determine whether the intervention performed as expected, particularly if they adapted or modified the ACT telehealth intervention in some way to better serve their unique client group.
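The comparison in this case example can be sketched in Python as follows. The benchmark values are those reported in the Results; the function names, the use of the sample SD, and the client scores in the usage lines are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean, stdev

# Aggregated within-group benchmarks reported in the Results section
BENCHMARKS = {
    "depression": {"act": 0.84, "waitlist": 0.42},
    "anxiety": {"act": 0.85, "waitlist": 0.32},
}

def within_group_effect_size(pre_scores, post_scores):
    """(pretest mean - posttest mean) / pretest SD; sample SD is an assumption."""
    return (mean(pre_scores) - mean(post_scores)) / stdev(pre_scores)

def closer_benchmark(effect_size, outcome):
    """Return the arm ('act' or 'waitlist') whose benchmark is nearer the obtained value."""
    marks = BENCHMARKS[outcome]
    return min(marks, key=lambda arm: abs(marks[arm] - effect_size))

# Effect sizes from the case example
print(closer_benchmark(0.75, "anxiety"))      # act
print(closer_benchmark(0.35, "depression"))   # waitlist

# Made-up client scores, for illustration only
pre = [10, 14, 18, 22]
post = [8, 11, 13, 16]
es = within_group_effect_size(pre, post)
print(round(es, 2), closer_benchmark(es, "anxiety"))
```

A nearest-benchmark rule is only a rough screening aid; the minimum and maximum values and the confidence intervals in Table 1 remain relevant when the client group is small.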
Discussion
This article provides aggregated within-group effect sizes from published RCTs on web-delivered ACT for the treatment of adults with symptoms of mild to moderate anxiety and/or depression. These benchmarks can provide community-based practitioners with a way to evaluate whether their web-delivered ACT intervention is exhibiting effectiveness at levels similar to those found in randomised clinical trials. This information can be used to guide decision-making around whether the intervention and/or its delivery method need to be further modified or adapted to ensure that it is impacting the target symptoms of depression and/or anxiety.
Practitioners who calculate their own within-group effect size for web-based ACT can compare their results to the data in Table 1 that correspond to the targeted issue. In addition, practitioners may wish to compare their within-group effect size to the aggregate effect sizes of the control groups as well. The more closely their web-based ACT recipients’ effect size approximates the corresponding web-based ACT recipients’ effect size in Table 1, the greater the grounds for optimism that the intervention is performing acceptably. If their obtained effect sizes more closely approximate the control group effect sizes, then they may wish to consider making some modifications to the intervention or discontinuing it. If small sample sizes are used for comparison, it may also be beneficial to consult the minimum and maximum values that are included in Table 1, as well as the pre-post effect sizes reported for individual studies in Appendix 1. For example, if the provider is implementing web-based ACT with a group of clients who are in their 20s, they may want to consult the average effect sizes found for studies with samples that have a similar mean age.
Limitations
As with any study, there are some limitations to this work. The majority of these studies were conducted outside of the USA with a non-English speaking sample. Although the ACT intervention has been translated and validated for use in 13 languages besides English, 41 these benchmarks may not be as applicable to community-based samples in the USA that are more ethnically and culturally heterogeneous. All of the studies on which our benchmarks were calculated contained samples that were predominantly female. Thus, one should be cautious in using these benchmarks to evaluate ACT with male clients. Although our benchmarks are calculated from a small number of studies meeting our specific criteria, other works using meta-analytic techniques to aggregate effect sizes of evidence-based interventions have also included six or fewer studies.17,27,32,33,42 The goal of this work was not to establish the efficacy of the remote ACT intervention, which has been done previously, but rather to construct a tool to be used by community-based practitioners to easily compare the outcomes of their technology-enhanced ACT interventions with the outcomes in RCTs. Accordingly, although meta-analytic methods were used, the research team did not search for unpublished papers for inclusion or evaluate the rigour of included studies.
Application to practice
These benchmarking tools may be particularly useful during the current public health crisis, when many providers who are skilled at face-to-face service provision have to adapt their intervention methods to support remote (online) client interactions due to COVID-19. Accordingly, many practitioners may be using a web-based format for their ACT intervention for the first time. Knowing what types of outcomes are usual and expected for the web-based ACT intervention can help to address providers’ concern about using technology-enhanced intervention methods. This can facilitate a conversation regarding the potential barriers or challenges to successful remote service provision within an agency or an individual provider’s office. These conversations should also explore whether the way that they currently provide ACT should be modified or potentially replaced by a different intervention to better serve the needs of the unique populations that they serve.
The expansion of remote delivery methods of evidence-based interventions not only addresses the logistical challenges associated with the provision of behavioural health services safely in the age of COVID-19, but also may have a significant impact on health equity by having the ability to reach potential clients who may be reluctant to present for traditional outpatient therapy for any number of reasons. By first establishing the efficacy of technology-enhanced evidence-based intervention approaches for the treatment of symptoms of mild to moderate depression and anxiety,43–45 then by establishing benchmarks that providers can use to help guide their clinical decision making, researchers can play a pivotal role in expanding the availability and uptake of evidence-based behavioural health interventions.
Now more than ever, behavioural health providers need flexibility in the way they adapt and/or deliver evidence-based interventions due to the current ongoing public health crisis. These benchmarks offer providers an easy-to-use tool, one that does not require a control group, with which they can evaluate their use of the web-based ACT intervention. The process of evidence-based practice necessitates exploration of potential innovations and adaptations that make sense for one’s unique practice setting or client population. Accordingly, these benchmarks can be used to allow for further adaptation of the web-based ACT intervention; for example, by varying the number of required modules used to cover the six core concepts or delivering the intervention via face-to-face telehealth sessions, rather than using asynchronous web-based delivery. Benchmarks provide community-based behavioural health providers with tools needed to objectively evaluate their practice outcomes.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
