Abstract
This editorial describes recent randomized controlled trials of worksite wellness interventions and argues that fidelity to intervention designs should be contingent on careful consideration of internal and external validity. A China-based hypertension management study, which achieved impressive outcomes across 60 workplaces using a comprehensive approach, is contrasted with the traditional wellness practices employed in other randomized controlled trials conducted in America. Why studies with negative findings receive more media and professional scrutiny than studies with positive findings is discussed. Three reasons are posited for why bad is stronger than good when it comes to capturing attention. Adoption of new evidence is discussed, along with what health promotion professionals can do to advance best practices by considering adoption as an ongoing process.
It was a series of studies by Andrea Foote and John Erfurt in the 1970s and 1980s that first captured my imagination about the workplace as an advantageous environment for delivering and studying health promotion initiatives and programs. They published a series of well-designed studies that produced impressive population-level health improvements. Though they studied cardiovascular risk interventions in both industrial and community settings, it was their research into hypertension management at the workplace that I recall being particularly impressive, given their emphasis on treatment and control at the worksite at a time when detection and referral were the norm. 1 In one randomized controlled trial (RCT) of several intervention approaches delivered over 3 years, 56 to 62 percent of the hypertensive employees in their programs had blood pressure readings below 140/90 mm Hg. Of employees in the control group, only 21 percent had readings below 140/90 mm Hg. 2 In the decades since, there have been relatively few randomized controlled studies of the effectiveness of health promotion initiatives at the workplace. Of those, studies with bad results are more likely to garner media attention than those with good results. More on why bad is stronger than good later.
Foote and Erfurt came immediately to mind when I reviewed a workplace hypertension management RCT by Wang and colleagues published in 2020 in the Journal of the American Medical Association. 3 It’s noteworthy that Wang did not cite Foote’s research given the similarities between their interventions and methods, although Foote studied American worksites, whereas Wang’s study was conducted in 60 workplaces across 20 urban regions in China with a program that spanned a full 2 years and reached over 4,000 participants. Blood pressure control was a primary aim of the China study, but general health practices were also measured and affected. And quite similar to Foote’s results, 66 percent of the intervention group achieved blood pressure control, which was significantly higher than the 44 percent control rate found in the control group. What’s more, the intervention group reduced excessive salt consumption by 32 percent and achieved an 18 percent reduction in alcohol use, and participants reported 23 percent lower rates of perceived stress.
Not only were the results reported in Wang’s study an impressive health improvement feat; replicating the intervention and achieving significant outcomes across such a wide range of workplaces also deserves wide recognition and emulation. Sadly, I found that only one media outlet covered the study, and it was a paywall-protected trade publication. 4 In contrast, if you input something like “worksite wellness did not work” into a search engine, you will find pages of critiques in popular print that feature negative or modest results from a single study and make sweeping generalizations about the ineffectiveness of wellness programs writ large. Two RCTs from recent years are responsible for most of the one-sided news of late about the shortcomings of worksite wellness, and I have previously critiqued them on these pages. 5
One study conducted at the University of Illinois found no improvements in 37 of 39 outcomes 6 after “the program had run its course,” according to one journalist. 7 The journalist neglected to note, however, that this program was sparsely used by employees and ran for just 7 months. And rather than report on the program’s effects on attendees, the study examined a plethora of time-over-time metrics collected as part of university-wide health risk assessment campaigns. It is an all too common case of over-assessing and under-intervening. Program planners who offer opt-in health education classes sandwiched between mass health screenings are, in effect, using their budgets primarily to track the progressive health decline of their population.
The author of another RCT conducted at BJ’s Wholesale Club concluded that employers should be cautious about presuming a return on investment from their wellness programs. 8 Given that those who opted in to BJ’s programs attended, on average, only 1.3 learning modules, it’s a conclusion utterly divorced from what health promotion professionals would expect from such poorly attended programs. The BJ’s and Illinois programs were RCTs where individual employees were the unit of analysis, and most would characterize their interventions as “traditional wellness,” meaning the employer held health screening events, then targeted those found to be at risk and invited them to attend group classes or schedule health coaching sessions. I’ve discussed the pros and cons of traditional wellness previously and would simply reiterate here that evaluations of traditional wellness programs should focus on the benefits that accrue to participants, something that doesn’t require the complexity or expense of a population-wide randomized controlled trial. 9 In contrast to traditional wellness, the China hypertension study’s intervention was based on “best practices” in that its initiatives were designed around the socio-ecological model, meaning the investigators instituted environmental, cultural and policy changes for the whole population alongside programs offered at the individual level. Further, they employed a clustered group randomized controlled trial, rather than individual-level analysis only, so they could assess the effects of their efforts at the organizational level. So why is it that a study with a comprehensive approach and impressive results received nary a media mention, whereas studies of ineffective programs made national news?
Why Bad Is Stronger Than Good in Media Reports
I’d suggest there are 3 primary reasons for the imbalanced attention to unsuccessful compared to successful studies, not just in the health promotion field but in health and medical science generally. The first reason is simply human nature. The bromide in journalism that “if it bleeds it leads” is trite but well accepted. Disinformation trolls who disparage wellness programs and mock wellness professionals rely on our tendency to “rubberneck.” That is, we know we should not slow traffic to get a look at that gruesome accident, but we can’t resist. Though the nightly news slips in a positive human-interest story in the last minute of its broadcasts, it’s merely their penance for having fed us 29 minutes of urban violence, political chaos and natural disasters. They know we’re evolutionarily wired to pay attention to threats. Real science is plodding and incremental and builds knowledge via trial and error, but bloggers and journalists alike know that outlier findings that go against the grain will attract more readers than stories affirming consensus thinking.
The second reason behind the imbalanced focus on bad results touches on a deeper issue than journalists seeking attention-grabbing headlines. That is, we all lean toward a negativity bias. There are several psychological constructs relating to this phenomenon. For example, the “positive-negative asymmetry effect” holds that we spend much more time psychologically processing a negative event than a positive one. Whether this tendency is innate or learned, or whether psychology researchers have merely spent more time studying pain and sorrow than goodness and bliss, is debatable. But the fact remains that researchers agree that the psychological impact of bad events outweighs that of good events. For a fascinating review of the strength of negatively valenced events versus positively valenced events, read “Bad is Stronger than Good” by Baumeister and colleagues. 10 Their research implies that, for every negative study cited in the press or for each misleading post by trolls, the health promotion profession likely needs to counter with tenfold more accurate stories to mitigate the impact. Not a fair fight, but still worth waging. As Baumeister notes: “This is not to say that bad will always triumph over good, spelling doom and misery for the human race. Rather, good may prevail over bad by superior force of numbers: Many good events can overcome the psychological effects of a single bad one. When equal measures of good and bad are present, however, the psychological effects of bad ones outweigh those of the good ones.” 10
The Specificity-Generalizability Paradox
I had the pleasure of hosting the authors of the China hypertension study on a webinar sponsored by The Health Enhancement Research Organization (HERO), where I serve as a Senior Fellow. It is publicly available on HERO’s web site in the webinar archives. 11 Toward the end of the webinar we focused on internal and external validity questions and discussed cultural differences between our countries as they may influence intervention effectiveness. I asked co-investigator Zuqui Zhang, who presently works at a large health system in the United States, whether he considered it easier to have delivered such a comprehensive program in China. “The quick answer is yes,” said Zhang, “China is a collectivism culture where America is an individualism culture.” Zhang described how participation in the program offerings occurred on company time, and how supervisors and leaders encouraged participation because “they know it will help employee health and reduce the loss of employee talent.” He went on to discuss his experiences working in the United States, saying “it is not easy to inspire employees to participate,” and he felt that leadership teams in America do not have sufficient incentives to encourage participation.
I was also honored to host authors of the Illinois study to present their findings at a HERO conference and in a webinar. At each venue they dutifully pointed out that their findings may not be generalizable to other organizations. Nevertheless, the title for the initial publication of their findings was, “What Do Workplace Wellness Programs Do? Evidence from the Illinois Workplace Wellness Study.” 12 It’s a title that denotes generalizability and panders to the impulses of journalists who tend to treat one study as definitive. After reading the Illinois manuscript I said a more fitting title would have been: “What did our wellness program do?” It’s a sentiment shared by Goetzel in his critique of the Illinois study. 13 Dr. Goetzel also summoned many experts to ask, “Do worksite health promotion programs work?” Or, essentially, what is the external validity of research on worksite wellness to date? Their answer? “It depends.” 14 Not exactly an eye-grabbing headline, but true.
I’d speculate that the third cause behind the inordinate attention given to studies with weak findings is a common tendency to assume that one study, good or bad, reflects on the efficacy of an entire discipline. Preeminent health promotion scientist Dr. Larry Green offered a terrific lecture on 3 research paradoxes, all of which relate to how study findings should be interpreted and adopted, or not, in the real world. 15 Of the 3 paradoxes, the “specificity-generalizability paradox” is the one most relevant to the China, Illinois and BJ’s studies. This paradox holds that the more specific an intervention is to one setting or context, the less generalizable the findings may be to other settings. How should outcomes from a university population, with sternly autonomous faculty members, inform the program planning at an auto plant such as where Foote and Erfurt wrought such amazing results? How do participation rates of employees in a Wholesale Club, with their strictures on accounting for work hours, compare to participation rates of employees in a tech firm with flexible scheduling?
A less nuanced example than comparing outcomes between worksite wellness initiatives, but every bit as instructive, is to consider the dramatic difference in the control of the Coronavirus between America and China. 16,17 It is apparent that the authoritarian aspects of China’s culture confer an advantage in adherence to rules of epidemic management. Few would argue that other countries could match China’s level of success in controlling a pandemic if doing so depended on fidelity to China’s cultural norms. The question, then, is not merely whether a study achieved good or bad results, but rather what we can learn from a study’s successes or failures that can inform future research and improve upon our current best practices. As Dr. Green explains (with the colloquialisms attendant to speaking rather than writing): “What I’m arguing and campaigning on, if I may, in the spirit of full disclosure, is that we really ought to be approaching adaptation as a process. A process by which we can make the changes—the lack of fidelity, with all of its moral overtones. If we can make that more scientific. Not make a special intervention less scientific by applying it in settings in which it wasn’t tested in the first place.”
In this regard, results from studies should start a conversation about which features of the study setting are uncommon and which are readily replicable. We should be judicious about study conclusions and distinguish adoption recommendations based on facts from those based on a researcher’s confirmation bias or a journalist’s opinion. 18 Instead of simply observing what worked, we should be examining the differences between individuals who benefited and those who did not. And rather than thinking of studies as discrete events with a finish line, we need to ask how we can sustain wins over time in ways that meet the needs of increasingly diverse populations far and wide.
