Abstract

Many health care providers, payers, and employers offer several programs designed to improve health and worker productivity. The 2022 Employer Health Benefits Survey from the Kaiser Family Foundation 1 reports that “85% offer workers one or more wellness programs, such as programs to help them stop smoking or lose weight, or programs that offer lifestyle and behavioral coaching.” These programs are offered in addition to insurance and benefit programs and complement other financial investments in the workforce.
To profile the performance of many programs, one can apply a consistent framework that compares them with each other in terms of their reach (penetration), implementation, participation, and effectiveness (PIPE). Pronk 2 developed this PIPE framework; it has been applied in many studies over the past 2 decades.
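The PIPE framework is often summarized as a single impact score. As a minimal sketch, assuming the multiplicative form of that score (each of the 4 categories expressed as a proportion between 0 and 1, with the overall score taken as their product), a program's PIPE score could be computed as follows. The class name, the operational definitions in the comments, and the example values are hypothetical and are not drawn from any study cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeMetrics:
    """PIPE category values for one program, each a proportion in [0, 1] (assumed scale)."""

    penetration: float     # assumed: share of the eligible population reached
    implementation: float  # assumed: share of planned program components delivered
    participation: float   # assumed: share of those reached who take part
    effectiveness: float   # assumed: share of participants who achieve the intended outcome

    def __post_init__(self) -> None:
        # Reject values outside [0, 1], since the product below assumes proportions.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def pipe_score(m: PipeMetrics) -> float:
    """Multiplicative PIPE score: the product of the 4 category proportions."""
    return m.penetration * m.implementation * m.participation * m.effectiveness


# Hypothetical coaching program, for illustration only.
coaching = PipeMetrics(penetration=0.8, implementation=0.9, participation=0.25, effectiveness=0.5)
print(round(pipe_score(coaching), 3))  # 0.09
```

Because the categories multiply, the overall score can never exceed the lowest of the 4 proportions, which is why a single weak category can dominate the result.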
Most reports, however, do not include enough information to thoroughly vet the value of health promotion programs and other investments. For example, Aziz et al 3 found that, of 38 diabetes management programs reporting results in any of the PIPE performance categories, only 3 reported on all 4 of them. Those studies, by Mensink et al 4 and Bo et al 5, are referenced in this point of view.
Another useful way to report program value is to tell a wide-ranging story describing who health promotion programs and other investments are meant to serve; the physical, mental, and behavioral risks these people have; how well the programs engage people; how well the programs are operating; how the programs influence access to care, quality of care, health care utilization, and expenditures; how participants feel about the usefulness of the programs; and what other financial or clinical results the programs deliver.
Ozminkowski and Serxner 6 summarized these aspects of program value in a conceptual article for employers. Wells et al 7 provided an example from a wellness program for beneficiaries with Medicare supplement coverage.
These frameworks can be combined to provide a comprehensive evaluation of many programs simultaneously. Viewing many programs through the same rubric can illustrate the value of each relative to the others. The combined framework can also be used to estimate the overall value of program investments.
Illustrating the Combined PIPE/Storytelling Framework
Rows in Table 1 show how several storytelling metrics fit into the PIPE framework. Columns illustrate how the results obtained for these metrics can be arrayed for each program.
Table 1. A Consistent Framework for Profiling Multiple Programs
Note: PIPE = penetration, implementation, participation, and effectiveness.
The framework and structure of this table can guide the production of reporting dashboards and detailed reports that use a wide array of visualization techniques suited to different learning styles.
Issues and Concluding Thoughts
Several items should be kept in mind when applying this framework.
PIPE scores may differ across subgroups
For example, PIPE scores can be stratified by age, gender, location, type of risk, chronic condition, and so on. Stratifying helps find pockets of success or failure that would otherwise be masked by focusing only on everyone who qualifies for or is served by the program. Do not keep or kill a program based on a single PIPE score; the variation among scores for population subgroups provides insights about where improvements can be made.
Similarly, although PIPE scores are often created from 1 metric in each of the 4 PIPE categories, creating several versions of the PIPE score based upon different storytelling metrics in each category may point out additional areas of success or need for improvement.
Not every metric is relevant for every program
For example, economists and business leaders may be interested in the financial metrics. Others may focus more on health status, access to health care, utilization, program satisfaction, worker productivity, or other metrics. For employers, providers, and payers who have programs addressing a variety of goals, any PIPE/storytelling table is likely to have many blanks. Reflecting upon these blanks, along with other cell entries, can help guide the strategy for many programs at once.
Low PIPE scores are not necessarily bad
Aziz et al 3 show a wide range of performance across all the PIPE categories. Often just 1 or 2 categories, such as low participation or implementation difficulties, lead to low overall scores. These weak categories should be investigated before conclusions are drawn about the program as a whole.
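Because a multiplicative PIPE score is pulled down by its weakest category, one simple way to decide which categories to investigate first is to rank them and ask how much the overall score would change if only one of them improved. The sketch below assumes the product form of the score; the function names and the program values are hypothetical.

```python
from math import prod

PIPE_CATEGORIES = ("penetration", "implementation", "participation", "effectiveness")


def pipe_score(metrics: dict[str, float]) -> float:
    """Multiplicative PIPE score: the product of the 4 category proportions."""
    return prod(metrics[c] for c in PIPE_CATEGORIES)


def weakest_categories(metrics: dict[str, float], k: int = 2) -> list[str]:
    """Return the k lowest-scoring categories, lowest first."""
    return sorted(PIPE_CATEGORIES, key=lambda c: metrics[c])[:k]


def what_if(metrics: dict[str, float], category: str, new_value: float) -> float:
    """PIPE score if only one category were changed to new_value."""
    return pipe_score({**metrics, category: new_value})


# Hypothetical program that performs well except on participation.
program = {"penetration": 0.9, "implementation": 0.9, "participation": 0.1, "effectiveness": 0.9}
print(weakest_categories(program, k=1))                  # ['participation']
print(round(pipe_score(program), 4))                     # 0.0729
print(round(what_if(program, "participation", 0.5), 4))  # 0.3645
```

In this made-up case the program looks weak overall, yet 3 of its 4 categories are strong; raising participation alone would multiply the score fivefold.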
Viewing the range of PIPE scores across many programs can help set expectations about program impact that might otherwise be unrealistic
In general, expectations about the impact of health-related interventions should be similar to expectations for other human resources programs or financial investments. 8 Putting all programs under the same lens will illuminate successes and failures that can guide future investment.
Every report, visualization, extrapolation, and insight carries risks of unintended biases
Some biases can be avoided, and some cannot, even in well-conducted quasi-experimental or randomized studies. Bias can be minimized by using a scientifically sound research design to guide reporting. For example, although not shown in the table, columns could be added for results from relevant, similar comparison groups of people who are not exposed to the programs being reported on. Contrasting the results for these people with the results for similar program participants will produce better estimates of program impact. 8 Methods to design and conduct studies that may support causal inferences are described in Pearl et al 9 and Morgan and Winship. 10
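One common way to contrast participants with a comparison group is a difference-in-differences estimate: the change in an outcome among participants minus the change among non-participants over the same period. The sketch below is only an illustration of that technique, not the method of the cited studies. It assumes the comparison group is similar to participants and would have followed the same trend without the program; the outcome (annual sick days) and all values are hypothetical.

```python
from statistics import mean


def diff_in_diff(part_pre, part_post, comp_pre, comp_post) -> float:
    """Difference-in-differences estimate of program impact on a mean outcome.

    Assumes the comparison group would have changed like the participants
    in the absence of the program (parallel trends).
    """
    participant_change = mean(part_post) - mean(part_pre)
    comparison_change = mean(comp_post) - mean(comp_pre)
    return participant_change - comparison_change


# Hypothetical annual sick days per person, before and after the program year.
participants_pre = [6, 8, 7, 5]
participants_post = [5, 6, 6, 3]
comparison_pre = [6, 7, 8, 7]
comparison_post = [6, 7, 7, 6]

naive = mean(participants_post) - mean(participants_pre)
estimate = diff_in_diff(participants_pre, participants_post, comparison_pre, comparison_post)
print(naive)     # -1.5
print(estimate)  # -1.0
```

Here the naive before-and-after change for participants (-1.5 days) would overstate the program's effect, because the comparison group also improved by 0.5 days over the same period. Standard errors, covariate adjustment, and checks of the parallel-trends assumption are left out of this sketch.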
Some programs are available to everyone, whereas others are meant only for people with certain problems. The process described here, which involves structured storytelling based on several relevant metrics and a consistent reporting framework, can be used to make insightful comparisons for all of them.
