Abstract
This study demonstrates how published intervention outcomes can be used to create benchmarks against which the results of a wellness program case study can be compared. Decision makers can then apply such comparisons when adopting wellness programs and evaluating their relative effectiveness. This case study assessed outcomes from Transtheoretical Model (TTM) computer-tailored interventions (CTIs) on 6 behaviors over a 5-year period. Results were compared with outcomes from a series of TTM randomized controlled trials and a representative review of workplace wellness interventions. The case study included 6544 employees, their spouses, and adult dependents who participated in a multicomponent CTI that assessed health risks and provided tailored feedback. Case study results were compared with 26 outcomes from 14 randomized controlled trials of TTM-based CTIs and with results from a published review of worksite-based wellness programs. The outcomes of the dissemination study were comparable to the average results of the TTM-based randomized controlled trials on stress and depression but exceeded the averages on smoking, healthy eating, fruit and vegetable consumption, and exercise by 16.4% to 44.8%. The dissemination study also exceeded the average results of the workplace wellness interventions by margins ranging from 89.3% to 7 times. The comparisons applied in this project represent a demanding test of the effectiveness of case studies. Length of treatment and choice of treatments are factors that may have contributed to the above-average outcomes. (Population Health Management 2013;16:373–380)
Introduction
Excess risks also lead to excess costs. Modifiable health risks, such as tobacco use, depression, stress, and overweight status, are associated with short-term increases both in the likelihood of incurring health expenditures and in the magnitude of those expenditures. 12 In large worksite samples, employees' excess risk factors have predicted incremental increases in pharmaceutical, overall medical, and disability costs. 13–15 Changes in multiple behaviors are associated with changes in medical costs averaging about $2000 per year. 13 Consequently, targeting change in multiple risk behaviors offers the potential to increase health benefits, maximize disease prevention and management, and reduce health care costs.
Until recently, there was little programmatic research demonstrating that multiple health risk behaviors could be changed simultaneously within high-risk populations. For example, Goldstein et al 16 reviewed the literature on multiple health behavior change (MHBC) interventions in primary care and concluded that too few studies were available to review and that there were large gaps in the field's knowledge base. In response to such gaps, the recent National Institutes of Health (NIH) summary report on the Science of Behavior Change 17 identified simultaneously changing multiple behaviors as a top priority for all of NIH. In recent years, funding initiatives from NIH and foundations have increased research on multiple behavior change. 18 For example, significant impacts on multiple health behaviors have been achieved by applying Computer-Tailored Interventions (CTIs) based on the Transtheoretical Model (TTM) to treat multiple behaviors simultaneously. 19–22 CTIs based on full TTM tailoring include assessments and feedback on stage of change, the pros and cons of changing, self-efficacy, and each of the 10 processes of change relevant to the individual's stage of change. The CTIs conducted by the authors most often included 3 interactions over a 6- to 12-month period. What has not been studied is how effective these MHBC interventions are when disseminated in real-world programs that lack the controls and expert resources of randomized trials. This innovative case study assessed the long-term outcomes of empirically supported TTM CTIs for multiple behavior change in a “real-world” setting. The CTIs were disseminated to employees and their spouses/dependents through a large, multisite national company using participation-based incentives that produced a 92% participation rate.
This research draws heavily on the ambitious work of the national Taskforce for Community Preventive Services (CHES), which is developing bodies of evidence for recommendations, predictions, evaluations, and innovations in population-based health promotion programs. 23 The Taskforce has made a compelling case that clinically based treatments may be able to rely solely on randomized clinical trials, but population-based interventions need a broader range of evidence, including case studies with concurrent comparison groups, case studies with multiple pre- and/or post-measures, and pre- and post-measurement studies without comparison groups; the last of these is actually the most common type of study. 24,25 The Taskforce also relies on a broad range of criteria to evaluate the strength of evidence for different types of interventions, including the number of studies, the consistency of the direction of results, the magnitude of the effects and the consistency of that magnitude, the type of study design, and the quality of study implementation. These criteria have been applied by a number of chapters and teams in the Taskforce that develop bodies of evidence for specific purposes, including single behaviors such as smoking, 26 sun exposure, 27 or exercise, 28 and specific diseases such as diabetes. 29
In their research, the authors drew on the body of evidence recently developed by Soler et al for a broader range of behaviors (eg, smoking, healthy eating, fruit and vegetable intake, exercise) treated by the most common types of intervention used with employee populations. 25 These interventions were (1) health risk assessments with feedback (HRAF) repeated at least twice and (2) HRAF plus at least 1 additional health promotion program (HRAF plus). Only HRAF plus interventions have produced sufficient evidence for multiple behaviors. Results from this review were used as a benchmark representing the average effects of one of the 2 most common interventions used in population-based health promotion, for 4 health risk behaviors (smoking, healthy eating, fruit and vegetable intake, and exercise) in real-world settings.
The authors also generated a body of evidence from their own best-practice CTIs based on the TTM. This evidence was restricted to population-based randomized trials that included 26 outcomes on 6 behaviors (smoking, healthy eating, fruit and vegetable intake, exercise, stress management, and depression prevention). This body of evidence is more homogeneous than that of Soler et al 25 in type of study design (randomized controlled trials [RCTs]) and type of intervention (TTM CTIs), but more heterogeneous in target populations, which included employees, primary care patients, parents of high school students participating in health promotion programs, and more representative populations. The authors' body of evidence and that of Soler et al did, however, rely on a common outcome criterion: the percentage of participants who progressed from being at risk for a behavior at baseline to not at risk at follow-up according to national or consensus criteria, such as changing from cigarette smoking to total abstinence.
Purpose
The purpose of this study is to advance bodies of evidence that can be used to benchmark population-based health promotion programs, particularly case studies that lack concurrent comparison groups. A real-world dissemination case study is used to demonstrate how the CHES criteria can be used first for prediction and evaluation, and ultimately for innovation, to produce a new benchmark for best practices. In addition to the CHES criteria of consistency of results across studies, magnitude of effects, and type of design and implementation, the authors add 2 criteria. The first is the percentage of a population participating in a program, because the goal of population programs is to maximize intervention impact, defined as the percentage of the population participating multiplied by the magnitude of the effects (efficacy). 30 Historically, RCTs relied only on efficacy as the measure of success. However, a smoking cessation program with 30% efficacy (abstinence) that reaches only 5% of a population affects only 1.5% of the problem (0.05×0.30). An intervention that reaches 75% of a population with only 20% efficacy produces a 15% impact (0.75×0.20), a 10 times greater impact than the treatment with 50% greater efficacy.
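The impact arithmetic in the smoking example can be written as a one-line calculation. The sketch below is illustrative only: the function name `population_impact` is hypothetical, and it assumes reach and efficacy are both expressed as proportions between 0 and 1.

```python
def population_impact(reach: float, efficacy: float) -> float:
    """Return intervention impact: reach (proportion of the population
    participating) multiplied by efficacy (proportion of participants who succeed)."""
    if not (0.0 <= reach <= 1.0 and 0.0 <= efficacy <= 1.0):
        raise ValueError("reach and efficacy must be proportions between 0 and 1")
    return reach * efficacy


# The two smoking cessation programs described in the text.
narrow_program = population_impact(reach=0.05, efficacy=0.30)
broad_program = population_impact(reach=0.75, efficacy=0.20)

print(f"{narrow_program:.1%}")  # 1.5%
print(f"{broad_program:.1%}")   # 15.0%
print(round(broad_program / narrow_program, 1))  # 10.0
```

Because impact is a product, a large gain in reach can outweigh a moderate loss in efficacy, which is why the authors treat participation as a criterion in its own right.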
The second criterion added was replication of intervention outcomes across multiple health risk behaviors. Changing multiple behaviors also increases intervention impact, especially in populations with multiple risk behaviors, who are at greater risk. The authors predicted that, on each of 6 behaviors, the results of the dissemination case study would be consistent with the average outcomes of their 14 RCTs, which were conducted with heterogeneous populations but used relatively consistent interventions (TTM CTIs). See Table 1 for a description of the trials included. The authors also expected that, on 4 behaviors, the effects in the case study would be larger than the average effects produced by the more commonly used but more heterogeneous HRAF plus interventions evaluated across more heterogeneous study designs. 25 Finally, based on the results of both Soler et al and their RCTs, the authors predicted that differences in outcomes between behaviors would be substantial and would be greater than the differences within the same behavior between the case study and their RCTs.
Methods
Sample
Data were collected from 6544 employees, spouses, and covered adult dependents of a national company from October 2005 through December 2010 as part of their wellness program delivered by Quality Health Solutions, Inc. Employees, both union and nonunion, worked at 1 of 30 sites across the country. The sample was 52.1% (n=3411) male; 84% (n=5500) were white, 10.2% (n=668) were black or African American, and 4.2% were Hispanic/Latino. The average age of participants was 43.57 years (standard deviation [SD]=10.4), and the average body mass index of the sample was 28.06 (SD=6.3).
Employees and covered spouses or dependents were offered incentives to participate in their company's wellness program. Incentives consisted of increased company contributions to insurance premiums for participation in the worksite wellness program. Requirements for participation evolved over the course of the dissemination, ranging from completion of at least 1 Health Risk Intervention (HRI) and biometrics per calendar year to completion of an HRI and biometrics plus multiple interactions with online LifeStyle Management programs or telephonic coaching.
Intervention
The self-directed HRI assessed chronic conditions, preventive screenings, health care utilization, quality of life, and 10 health behaviors. At the end of the assessment, stage-tailored feedback was provided and links to the appropriate online behavior change programs became accessible. Whereas other programs only assess current health risks (typically called a health risk assessment), this HRI provides an assessment, gives participants immediate feedback on their particular health risks, and recommends strategies they could use to begin progressing toward action. All assessments and feedback provided in the HRI were based on the TTM. Once participants completed the HRI, they received links to the LifeStyle Management programs most relevant to them based on their HRI responses. Participants also were eligible for telephonic coaching based on the principles of the TTM and motivational interviewing. Participants were included in the study if they had completed an HRI during the first 3 years of the dissemination (T1=Year 1, 2, or 3) and a follow-up HRI during the last 3 years of the dissemination (T2=Year 4, 5, or 6), with a minimum of 2 years between sessions. Data were taken from each individual's first session and most recent session. On average, participants completed 3.58 (SD=0.38) HRIs, and the average interval between the first and most recent HRI was 47.61 months (SD=14.5). In addition, results were reviewed from the authors' previous RCTs that used similar TTM interventions and outcome measures, with multiple studies for each behavior, to serve as comparisons for the results of this real-world dissemination case study.
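The inclusion rule above (first HRI in Years 1 to 3, most recent HRI in Years 4 to 6, at least 2 years apart) can be sketched as a filter over session records. This is a hypothetical illustration, not the authors' code: the `HRISession` record, its fields, and the assumption that each session already carries a program year are invented for the example, and the 2-year minimum is approximated as 24 calendar months.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class HRISession:
    """One completed Health Risk Intervention (hypothetical record layout)."""
    participant_id: str
    completed_on: date
    program_year: int  # 1-6; assumed to be precomputed for each session


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def select_t1_t2(sessions, min_gap_months=24):
    """Return {participant_id: (T1 session, T2 session)} for eligible participants.

    T1 is the participant's first HRI and must fall in program Years 1-3;
    T2 is the most recent HRI and must fall in Years 4-6, at least
    min_gap_months after T1.
    """
    by_person = {}
    for s in sessions:
        by_person.setdefault(s.participant_id, []).append(s)

    pairs = {}
    for pid, person_sessions in by_person.items():
        person_sessions.sort(key=lambda s: s.completed_on)
        first, last = person_sessions[0], person_sessions[-1]
        if (first.program_year in (1, 2, 3)
                and last.program_year in (4, 5, 6)
                and months_between(first.completed_on, last.completed_on) >= min_gap_months):
            pairs[pid] = (first, last)
    return pairs
```

Intermediate sessions are ignored, mirroring the text's use of only the first and most recent HRI for each participant.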
Measures
For each behavior, public health criteria (where applicable) were used to classify participants as “at risk.” Stage of change relative to each public health criterion was assessed using the same definitions for all behaviors:
• Precontemplation—not meeting criteria and not planning to meet criteria in the next 6 months,
• Contemplation—not meeting criteria but planning to meet criteria in the next 6 months,
• Preparation—not meeting criteria but planning to meet criteria in the next 30 days,
• Action—meeting criteria for less than 6 months,
• Maintenance—meeting criteria for more than 6 months.
Participants were classified as at risk for a behavior if they were not meeting predefined health criteria. These individuals were in pre-Action stages (ie, Precontemplation, Contemplation, or Preparation). Those meeting health criteria were “not at risk” and were in the Action or Maintenance stages. The criteria used for each behavior are listed below.
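The stage definitions and the at-risk rule above can be sketched as a small classifier. This is an illustrative sketch under stated assumptions: the `intention` strings are a hypothetical encoding of the planning question, the function names are invented, and a person meeting criteria for exactly 6 months is assigned to Action because Maintenance is defined as "more than 6 months."

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    PRECONTEMPLATION = "PC"
    CONTEMPLATION = "C"
    PREPARATION = "PR"
    ACTION = "A"
    MAINTENANCE = "M"


PRE_ACTION = {Stage.PRECONTEMPLATION, Stage.CONTEMPLATION, Stage.PREPARATION}


def classify_stage(meets_criteria: bool,
                   months_meeting: Optional[float] = None,
                   intention: str = "none") -> Stage:
    """Assign a stage of change for one behavior.

    intention is a hypothetical encoding of the planning item:
    "within_30_days", "within_6_months", or "none".
    """
    if meets_criteria:
        if months_meeting is None:
            raise ValueError("months_meeting is required when criteria are met")
        # Assumption: exactly 6 months counts as Action, not Maintenance.
        return Stage.MAINTENANCE if months_meeting > 6 else Stage.ACTION
    if intention == "within_30_days":
        return Stage.PREPARATION
    if intention == "within_6_months":
        return Stage.CONTEMPLATION
    if intention == "none":
        return Stage.PRECONTEMPLATION
    raise ValueError(f"unrecognized intention: {intention!r}")


def is_at_risk(stage: Stage) -> bool:
    """At risk means not yet meeting criteria, ie, any pre-Action stage."""
    return stage in PRE_ACTION
```

For example, `classify_stage(False, intention="within_30_days")` returns `Stage.PREPARATION`, which `is_at_risk` reports as at risk.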
Healthy eating
Healthy eating was defined as reducing calorie intake and limiting unhealthy nutrients by reducing dietary fat intake. 21
Exercise
The definition of regular exercise has changed over time as leading organizations have modified their criteria. The current definition is 150 minutes a week of moderate-intensity activity, 75 minutes a week of vigorous-intensity activity, or an equivalent combination of the two. 21,31
Fruit and vegetable consumption
The current recommendation for fruit and vegetable consumption consists of eating at least 4½ cups of fruits and vegetables daily. 21,31
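The two quantitative criteria above (exercise and fruit and vegetable intake) can be expressed as simple threshold checks. In this sketch the function names are hypothetical, and the rule that 1 minute of vigorous activity counts as 2 minutes of moderate activity is an assumption (it is the conversion that makes 75 vigorous minutes equivalent to 150 moderate minutes); the instruments used in the study may have scored combinations differently.

```python
def meets_exercise_criterion(moderate_min_per_week: float,
                             vigorous_min_per_week: float) -> bool:
    """150 moderate-equivalent minutes per week, counting vigorous minutes double
    (assumed 2:1 conversion)."""
    moderate_equivalent = moderate_min_per_week + 2 * vigorous_min_per_week
    return moderate_equivalent >= 150


def meets_fruit_veg_criterion(cups_per_day: float) -> bool:
    """At least 4.5 cups of fruits and vegetables per day."""
    return cups_per_day >= 4.5


print(meets_exercise_criterion(0, 75))    # True: vigorous only
print(meets_exercise_criterion(90, 30))   # True: 90 + 2 * 30 = 150
print(meets_exercise_criterion(100, 20))  # False: 100 + 2 * 20 = 140
print(meets_fruit_veg_criterion(4.0))     # False
```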
Smoking
Point prevalence smoking abstinence was assessed by asking about current smoking and intention to quit if currently smoking. 32 –35
Stress management
Stress management was defined as effectively managing stress in healthy ways, including exercising, seeking social support, and using relaxation techniques. 36
Depression
Depression prevention is defined as effectively practicing strategies to prevent or reduce depression, including controlling negative thinking, engaging in healthy and pleasant activities, practicing stress management, exercising, and getting professional help when needed. 37
Analysis
For each behavior (healthy eating, exercise, fruit and vegetable intake, smoking cessation, stress management, and depression prevention), participants who were at risk at T1 were selected, and the percentage who reached criteria at T2 (ie, moved to Action or Maintenance) was calculated. Table 2 presents the T1 stage distributions for each behavior.
A, Action; C, Contemplation; M, Maintenance; PC, Precontemplation; PR, Preparation.
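The outcome calculation described above (select participants at risk at T1, then count how many are in Action or Maintenance at T2) can be sketched as follows. The input format, a list of (T1 stage, T2 stage) codes per participant using the abbreviations in the Table 2 legend, is an assumption, and the demonstration data are synthetic: they are constructed only to match the healthy eating counts reported in the Results (1482 of 3370 at-risk participants), not drawn from the study.

```python
from collections import Counter

PRE_ACTION = {"PC", "C", "PR"}  # at risk: not meeting criteria
AT_CRITERIA = {"A", "M"}        # not at risk: meeting criteria


def t1_stage_distribution(stage_pairs):
    """Count participants in each T1 stage (the kind of summary shown in Table 2)."""
    return Counter(t1 for t1, _ in stage_pairs)


def progression_rate(stage_pairs):
    """Return (n_at_risk, n_progressed, proportion) for one behavior."""
    at_risk_t2 = [t2 for t1, t2 in stage_pairs if t1 in PRE_ACTION]
    if not at_risk_t2:
        raise ValueError("no participants were at risk at T1")
    progressed = sum(1 for t2 in at_risk_t2 if t2 in AT_CRITERIA)
    return len(at_risk_t2), progressed, progressed / len(at_risk_t2)


# Synthetic data sized to match the reported healthy eating counts.
pairs = ([("PC", "A")] * 1482             # at risk at T1, meeting criteria at T2
         + [("C", "PR")] * (3370 - 1482)  # at risk at T1, still at risk at T2
         + [("M", "M")] * 500)            # not at risk at T1, so excluded
n_at_risk, n_progressed, rate = progression_rate(pairs)
print(n_at_risk, n_progressed, f"{rate:.1%}")  # 3370 1482 44.0%
```

Participants already meeting criteria at T1 never enter the denominator, so the rate describes movement out of risk rather than overall prevalence at T2.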
Pooled results from the authors' RCTs of TTM interventions served as the first benchmark for the case study. Because the RCTs and the case study used similar TTM-tailored interventions and the same outcome measure, the pooled trial outcomes could be compared directly with the success rates from this real-world dissemination.
The pooled results from the Soler et al systematic review 25 also were used as comparisons to benchmark findings beyond TTM-based RCTs. The median outcomes and interquartile interval (IQI) were included for the 4 behaviors that the study by Soler and associates had in common with the authors' dissemination case study. These outcomes in the Soler et al evidence base allowed the authors to benchmark their case study results against the average outcomes of the most commonly used population-based health promotion programs.
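The percentage comparisons reported in the Results appear to be relative differences, that is, the gap between 2 rates expressed as a proportion of the benchmark rate. The sketch below reproduces several of the reported figures from the rates given in the text under that assumption; the function name is hypothetical.

```python
def percent_higher(case_rate: float, benchmark_rate: float) -> float:
    """Relative difference: how much higher case_rate is, as a % of benchmark_rate."""
    if benchmark_rate <= 0:
        raise ValueError("benchmark_rate must be positive")
    return (case_rate / benchmark_rate - 1) * 100


# (case study %, benchmark %) pairs taken from the Results.
comparisons = {
    "healthy eating vs TTM average": (44.0, 32.3),
    "exercise vs TTM average": (55.3, 47.5),
    "fruit and vegetables vs TTM average": (50.4, 34.8),
    "smoking vs TTM average": (33.7, 23.7),
    "smoking vs Soler et al median": (33.7, 17.8),
}
# Prints 36.2, 16.4, 44.8, 42.2, and 89.3 (% higher), in that order.
for label, (case, benchmark) in comparisons.items():
    print(f"{label}: {percent_higher(case, benchmark):.1f}% higher")
```

Under the same reading, the healthy eating comparison with the Soler et al median (44.0% vs 5.4%) gives about 715% higher, which is consistent with the "7 times greater" figure reported for that behavior.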
Results
Healthy eating
The Soler et al review 25 included results on dietary fat reduction for 13 arms from 11 studies. The majority of results favored the interventions, with a median decrease of 5.4% in the proportion of employees with high-risk fat intake (IQI=1.8% to 21.9%). The authors had 6 RCTs available that used national criteria for dietary fat and/or calorie reduction. 19–21,38–40 Across those studies, progression to criteria ranged from 20.2% to 47.5%, with an average of 32.3% progressing to criteria, which is 5 times higher than that found in the Soler et al review (Fig. 1). Of the 3370 participants in the case study who were not meeting the criteria for healthy eating at T1, 44.0% (n=1482) progressed to meeting the criteria at T2. This rate is 36.2% higher than the average across TTM studies and 7 times greater than that found in the Soler et al review. 25

Fig. 1. Percent progressing to criteria for healthy eating from a review of worksite wellness programs, 6 TTM trial arms, their average, and a TTM case study. TTM, Transtheoretical Model.
Exercise
Figure 2 shows that 5 arms from 4 TTM-based RCTs targeted exercise, with an average of 47.5% of participants reaching criteria (range, 43.3% to 57.3%). 21,31,38,41 The Soler et al review 25 reported a median increase of 15.3% (IQI=8.3% to 37.2%) across 15 study arms from 12 studies (Fig. 2). Of the 2580 participants in the case study who were in pre-Action for exercise at T1, 55.3% (n=1427) progressed to the Action/Maintenance stages at T2. Both the TTM average and the case study show more than 2 times the amount of change found in the Soler et al review, with the case study producing 16.4% more change than the TTM trial average.

Fig. 2. Percent progressing to criteria for exercise from a review of worksite wellness programs, 5 TTM trial arms, their average, and a TTM case study. TTM, Transtheoretical Model.
Fruit and vegetable intake
Two TTM-based RCTs reported outcomes on fruit and vegetable intake. One included a full TTM-tailored intervention, 42 whereas the other used only stage-matched messages. 21 On average, 34.8% progressed to Action or Maintenance across the 2 RCTs (Fig. 3). Soler et al 25 did not report the percentage of participants reaching criteria because the median increase was so small (0.09 servings/day, IQI=−0.07 to 0.17). Of the 3012 participants in the case study who were in pre-Action for fruit and vegetable intake at T1, 50.4% (n=1518) progressed to the Action/Maintenance stages at T2, which is 44.8% greater than the average of the 2 TTM trials.

Fig. 3. Percent progressing to criteria for fruit and vegetable intake from a review of worksite wellness programs, 2 TTM trial arms, their average, and a TTM case study. TTM, Transtheoretical Model.
Smoking cessation
The Soler et al review 25 included 30 arms from 24 studies with a median quit rate of 17.8% (IQI=11.1% to 22.6%). Eight TTM intervention studies targeted smoking cessation. Cessation rates ranged from 21.1% to 25.6%, with an average of 23.7% of participants quitting (Fig. 4). 19,20,31,33,35,39,40,43 This average is 33.1% higher than that found in the Soler et al review. 25 Of the 840 participants in the case study who were in pre-Action for smoking cessation at T1, 33.7% (n=283) progressed to the Action/Maintenance stages at T2, which is 42.2% higher than the TTM average and 89.3% higher than the Soler et al review. 25

Fig. 4. Percent progressing to criteria for smoking cessation from a review of worksite wellness programs, 8 TTM trial arms, their average, and a TTM case study. TTM, Transtheoretical Model.
Stress management
Four arms of 3 studies reported outcomes for TTM interventions on stress management. 31,36,41 Between 60.0% and 74.9% of participants moved from at risk to not at risk following the interventions, with an average of 68.4% meeting stress management criteria (Fig. 5). Of the 874 participants in the case study who were in pre-Action for stress management at T1, 67.8% (n=593) progressed to the Action/Maintenance stages at T2.

Fig. 5. Percent progressing to criteria for stress management from 4 TTM trial arms, their average, and a TTM case study. TTM, Transtheoretical Model.
Depression prevention and management
One TTM-based RCT reported outcomes on depression prevention and management. 37 This study found 66.7% of participants moved from at risk to not at risk following the intervention. Of the 764 participants in the case study who were in pre-Action for depression prevention at T1, 68.0% (n=522) progressed to the Action/Maintenance stages at T2 (Fig. 6).

Fig. 6. Percent progressing to criteria for depression prevention from a TTM trial and a TTM case study. TTM, Transtheoretical Model.
Discussion
The CHES Taskforce applied its multiple criteria to evaluate the validity and generalizability of results. 23 According to its consistency criterion, if results are not consistently in the same direction (eg, posttest greater than pretest or treatment group greater than comparison group), it should be assumed that the intervention would not be effective in all populations; the greater the consistency, the greater the confidence in the results. The RCT results show consistency not only in the direction of effects but also in their magnitude: although the level of change differs across behaviors, it is much more consistent within each behavior. The results of the present case study were consistently in the same direction as the authors' RCT results in 2 ways: (1) posttests were consistently greater than pretests across all 6 behaviors; and (2) the results of the case study and the average of multiple RCTs were consistently greater than the average results of the frequently used HRAF plus treatments for 4 different behaviors. 25
According to the magnitude-of-effects criterion, the CHES Taskforce concluded that the greater the magnitude of the effects, the greater the confidence in the validity of the results. The magnitude of the effects in the present case study was greater than that in the Soler et al evidence base for all 4 of the common behaviors. It was also larger than the average in the randomized population trials for 4 behaviors. For the other 2 behaviors, the absolute magnitude of success on a population basis (>60%) was large and consistent across the case study and the RCTs.
If this research had been limited to a single body of evidence for benchmarking, namely that of Soler et al, 25 each of the results of the present case study likely would have been judged an outlier. In terms of magnitude, the long-term outcomes of the case study exceeded the median results for the 4 comparable behaviors in the Soler et al review by about 2 to 7 times. In contrast, when benchmarked against the body of evidence from population-based randomized trials, the results of the present case study exceeded the average results on 4 behaviors by only about one sixth to one half and were nearly equal on the 2 affective behaviors. These dramatic differences in magnitude show the importance of having multiple bodies of evidence for benchmarking outcomes from case studies and other types of intervention research designs. With 2 benchmarks, comparisons can indicate whether a new study produces less than average, average, or greater than average results, matches best practice, or sets a new standard or benchmark for 1 or more behaviors. Special attention could be paid to studies that surpass current best practices to identify intervention factors, such as ongoing interventions or participants' choice of target behaviors, that could be tested in new RCTs designed to produce breakthroughs under controlled conditions.
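The benchmark categories listed above can be sketched as a simple decision rule. This is a hypothetical illustration, not a procedure from the study: it assumes a new result is compared with a benchmark's average and with its best trial arm, and the 5% relative tolerance used to call a result "average" or "best practice" is an arbitrary assumption. The inputs in the usage lines are the TTM averages, the best arms (top of each reported range), and the case study rates given in the Results.

```python
def classify_against_benchmark(result: float, average: float, best: float,
                               tolerance: float = 0.05) -> str:
    """Place a new outcome relative to one body of evidence.

    tolerance is a hypothetical relative margin around the benchmark values.
    """
    if result > best:
        return "new standard (exceeds best practice)"
    if result >= best * (1 - tolerance):
        return "best practice"
    if result > average * (1 + tolerance):
        return "greater than average"
    if result >= average * (1 - tolerance):
        return "average"
    return "less than average"


# (case study %, TTM average %, best TTM arm %) from the Results.
print(classify_against_benchmark(33.7, 23.7, 25.6))  # smoking: new standard
print(classify_against_benchmark(44.0, 32.3, 47.5))  # healthy eating: greater than average
print(classify_against_benchmark(67.8, 68.4, 74.9))  # stress management: average
```

Running the same rule against a second body of evidence (for example, the Soler et al medians) would give the paired comparison the text recommends; the tolerance would need to be justified for any real use.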
Judged on these criteria, the authors' population-based RCTs produced much larger effects than the studies in the Soler et al review, 25 and the larger effects were consistent across multiple behaviors. These RCTs are also more consistent in the types of interventions, the theory driving the interventions, and the quality of implementation, owing to computer-based and evidence-based tailoring. The RCT designs all meet the CHES criteria for the strongest design, and their average participation rates are higher, supporting greater population impact. Applying the CHES criteria thus seems to support the Taskforce's conclusion that the stronger a body of evidence is across multiple criteria, the stronger the support it provides.
The authors recommend that, within the CHES approach, the bar for benchmarking standards continue to be raised, not only for case studies but for all types of designs in population health promotion. Raising the bar for benchmarking may be the clearest way to keep advancing the impact of health promotion programs. Benchmarking against the best available evidence is a much riskier test of an intervention than merely requiring that a treatment group outperform a control group or some other concurrent comparison group.
A recent survey found that a major barrier preventing leading companies from adopting health promotion programs is their perceived inability to evaluate the programs' effects on health risk behaviors. 43 This study demonstrates how published intervention outcomes can be used to determine the relative success of a wellness program in a case study, an approach that decision makers can apply when adopting wellness programs and evaluating their relative effectiveness.
Footnotes
Author Disclosure Statement
Drs Johnson, Prochaska, Paiva, and Prochaska, Ms Fernandez, and Ms DeWees declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors of this article have financial relationships with Pro-Change Behavior Systems, Inc. and Quality Health Solutions, Inc., both for-profit companies that create and distribute the programs described in the case study. The authors received no financial support for the research, authorship, and/or publication of this article.
