Put All Your Health Investments Under the Same Lens

Abstract

Many health care providers, payers, and employers offer several programs designed to improve health and worker productivity. The 2022 Employer Health Survey from the Kaiser Family Foundation¹ reports that “85% offer workers one or more wellness programs, such as programs to help them stop smoking or lose weight, or programs that offer lifestyle and behavioral coaching.” These come in addition to insurance and benefit programs and complement other financial investments in the workforce.

To profile the performance of many programs, one can apply a consistent framework that compares them with each other in terms of their reach (penetration), implementation, participation, and effectiveness (PIPE). Pronk² developed this PIPE framework; it has been applied in many studies over the past 2 decades.

However, most reports do not include enough information to thoroughly vet the value of health promotion programs and other investments. For example, Aziz et al³ found that, of 38 diabetes management programs reporting results in any of the PIPE performance categories, only 3 reported on all 4. These include the studies by Mensink et al⁴ and Bo et al⁵ referenced in this point of view.

Another useful way to report program value is to tell a wide-ranging story describing:

who health promotion programs and other investments are meant to serve;

the physical, mental, and behavioral risks these people have;

how well these programs engage people;

how well programs are operating;

how programs influence access to care, quality of care, health care utilization, and expenditures;

how program participants feel about the utility of these programs; and

what other financial or clinical results these programs deliver.

Ozminkowski and Serxner⁶ summarized these aspects of program value in a conceptual article for employers. Wells et al⁷ provided an example from a wellness program for beneficiaries with Medicare supplement coverage.

These frameworks can be combined to provide a comprehensive evaluation of many programs simultaneously. Viewing many programs using the same rubric can illustrate the value of each, relative to the value of the others. It can also be used to provide an estimate of the overall value of program investments.

Illustrating the Combined PIPE/Storytelling Framework

Rows in Table 1 show how several storytelling metrics fit into the PIPE framework. Columns illustrate how the results for these metrics can be arrayed for each program.

Table 1. A Consistent Framework for Profiling Multiple Programs

PIPE category	“Telling the Right Story” category	Metric examples and advice	Pronk²	Mensink et al⁴	Bo et al⁵
Penetration (P)	Who We Serve	Total qualified members	16,968	6108	1877
		Percent of qualified members reached	98%	100%	100%
	The Physical, Mental, and Behavioral Risks They Have or Have Screened Positive For	Percent screened positive for each risk (eg, high blood pressure, high cholesterol, smoking, anxiety, depression, poor exercise, and/or eating habits)	Poor exercise habits	Poor exercise and eating habits	Poor exercise and eating habits
	The Health Conditions They Have	(Report percent diagnosed with most prevalent and/or most costly conditions)	Diabetes	Diabetes	Diabetes
Implementation (I)	Operating Metrics	(Select metrics related to fidelity of the interventions and report percent of these executed as planned)	85%	27%	23%
Participation (P)		(Develop common definitions of engagement and report engagement rates by program and modality of engagement)
		Percent who complete program	17%	2%	18%
Effectiveness (E)	Quality of Care, Utilization, Expenditure Metrics	Percent of care gaps closed among those who engage	45%	48%	75%
		Percent adherence to selected evidence-based quality-of-care metrics
		Percentages of emergency department visits, hospitalizations, and/or readmissions avoided
		Percentages of low-value services avoided
		(Select other relevant effectiveness metrics for all programs if not reflected above and report percentages of members reaching those metrics)
	Access to Care	Percentage reporting adequate or better access to doctors, specialists, hospitals, etc
	Health Status	Percent reporting perceived health status as excellent or good (or other relevant metrics)
	Satisfaction	Percent rating program as excellent or very good
	Financial Metrics	Percent with positive return on investment—eg, percent of participants whose medical/Rx expenditures decline during/after engagement by an amount that is greater than the per-case cost of purchasing/operating/administering the program
		Insurance program renewal percentage
		Medical loss ratio (ie, percent of insurance premiums spent on care in year)
		Cost of purchasing and operating the program per member per month
		Monetary savings or losses per member per month
		Net present value (inflation-adjusted and discounted benefits minus costs)
PIPE Values for Each Group of Members, or for Each Metric Example of Interest		(Report several PIPE scores by taking desired percentages from above rows (1 from each PIPE category) and multiplying them together)
Overall PIPE Value for Each Program Separately		(Then report the weighted average of the above PIPE metrics as an overall effectiveness measure for each program separately)	98% × 85% × 17% × 45% = 7%	100% × 27% × 2% × 48% = <1%	100% × 23% × 18% × 75% = 3%
Summary PIPE Value for All Programs Taken Together		(To obtain a summary PIPE value for all programs taken together, report the weighted average of the overall PIPE values from each separate program above. In our example of these 3 programs, the summary PIPE score is about 5%.)

PIPE: penetration, implementation, participation, and effectiveness.
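The arithmetic in the last rows of Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own calculation: it uses the rounded percentages printed in the table, and it assumes that the summary value is weighted by the number of qualified members, because the table does not state which weights were used. The names `programs`, `pipe_score`, and `summary_pipe` are hypothetical.

```python
from math import prod

# Rounded percentages copied from Table 1, expressed as fractions.
programs = {
    "Pronk": {
        "members": 16968, "penetration": 0.98, "implementation": 0.85,
        "participation": 0.17, "effectiveness": 0.45,
    },
    "Mensink et al": {
        "members": 6108, "penetration": 1.00, "implementation": 0.27,
        "participation": 0.02, "effectiveness": 0.48,
    },
    "Bo et al": {
        "members": 1877, "penetration": 1.00, "implementation": 0.23,
        "participation": 0.18, "effectiveness": 0.75,
    },
}

PIPE_KEYS = ("penetration", "implementation", "participation", "effectiveness")


def pipe_score(program):
    """Multiply one fraction from each of the 4 PIPE categories."""
    return prod(program[key] for key in PIPE_KEYS)


def summary_pipe(all_programs):
    """Average the program scores, weighted by qualified members.

    ASSUMPTION: member counts are used as weights; the table does not
    say which weights the summary value is based on.
    """
    total_members = sum(p["members"] for p in all_programs.values())
    weighted_sum = sum(p["members"] * pipe_score(p) for p in all_programs.values())
    return weighted_sum / total_members


scores = {name: pipe_score(p) for name, p in programs.items()}
summary = summary_pipe(programs)

for name, score in scores.items():
    print(f"{name}: {score:.1%}")
print(f"Summary across programs: {summary:.1%}")
```

With these rounded inputs, the sketch gives roughly 6.4%, 0.3%, and 3.1% for the 3 programs and about 4.6% for the summary, in line with the "about 5%" reported in the table. The first figure is slightly below the 7% shown in Table 1, which may reflect unrounded inputs in the original calculation.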

The framework and structure of this table can guide the production of reporting dashboards and supporting detail, using a wide array of visualization techniques that address many different learning styles.

Issues and Concluding Thoughts

Several items should be kept in mind when applying this framework.

PIPE scores may differ for many subgroups

For example, PIPE scores can be stratified by age, gender, location, type of risk, chronic condition, etc. This will help find pockets of success or failure that would otherwise be masked by focusing only on the entire group that qualifies for or is served by the program. Do not keep or kill a program based on just a single PIPE score; the variation among scores for population subgroups provides insights about where improvements can be made.

Similarly, although PIPE scores are often created from 1 metric in each of the 4 PIPE categories, creating several versions of the PIPE score based upon different storytelling metrics in each category may point out additional areas of success or need for improvement.
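A stratified view of this kind might be sketched as follows. The subgroup labels and every number in the example are hypothetical and chosen only for illustration; none come from the studies cited here. The member-weighted overall value is likewise an assumption about how subgroup scores could be rolled up.

```python
from math import prod

# HYPOTHETICAL subgroup values for a single program, for illustration only.
# Each tuple is (penetration, implementation, participation, effectiveness).
subgroups = {
    "age 18-44": {"members": 400, "pipe": (0.95, 0.80, 0.30, 0.50)},
    "age 45-64": {"members": 500, "pipe": (0.98, 0.80, 0.15, 0.45)},
    "age 65+": {"members": 100, "pipe": (0.90, 0.80, 0.05, 0.40)},
}

# PIPE score per subgroup: the product of the 4 category fractions.
by_group = {name: prod(group["pipe"]) for name, group in subgroups.items()}

# Member-weighted average across subgroups (an assumed roll-up rule).
total_members = sum(group["members"] for group in subgroups.values())
overall = sum(
    group["members"] * by_group[name] for name, group in subgroups.items()
) / total_members

best = max(by_group, key=by_group.get)
worst = min(by_group, key=by_group.get)

for name, score in by_group.items():
    print(f"{name}: {score:.1%}")
print(f"Overall: {overall:.1%} (best: {best}, worst: {worst})")
```

In this made-up example, the overall score of about 7% hides a range from roughly 1% to 11% across age groups, which is the kind of variation a single program-level score would mask.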

Not every metric is relevant for every program

For example, economists and business leaders may be interested in the financial metrics. Others may focus more on health status, access to health care, utilization, program satisfaction, worker productivity, or other metrics. For employers, providers, and payers who have programs addressing a variety of goals, any PIPE/storytelling table is likely to have many blanks. Reflecting upon these blanks, along with other cell entries, can help guide the strategy for many programs at once.

Low PIPE scores are not necessarily bad

Aziz et al³ show a wide range of performance across all the PIPE categories. Often it is just 1 or 2 categories that lead to low overall scores, such as low participation or implementation difficulties. These should be investigated.
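One simple way to start such an investigation is to identify which PIPE category has the lowest value for each program. The sketch below does this with the rounded percentages from Table 1; the function name `weakest_category` is hypothetical, and picking the single lowest value is only one possible screening rule.

```python
# Rounded percentages copied from Table 1, expressed as fractions.
programs = {
    "Pronk": {
        "penetration": 0.98, "implementation": 0.85,
        "participation": 0.17, "effectiveness": 0.45,
    },
    "Mensink et al": {
        "penetration": 1.00, "implementation": 0.27,
        "participation": 0.02, "effectiveness": 0.48,
    },
    "Bo et al": {
        "penetration": 1.00, "implementation": 0.23,
        "participation": 0.18, "effectiveness": 0.75,
    },
}


def weakest_category(program):
    """Return the PIPE category with the lowest fraction for one program."""
    return min(program, key=program.get)


weakest = {name: weakest_category(p) for name, p in programs.items()}

for name, category in weakest.items():
    print(f"{name}: lowest category is {category} ({programs[name][category]:.0%})")
```

For all 3 programs in Table 1, the lowest category is participation (the percent who complete the program), which suggests where follow-up questions might begin.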

Viewing the range of PIPE scores across many programs can help set expectations about program impact that might otherwise be unrealistic

In general, program impact expectations for health-related interventions should be like expectations for other human resources programs or financial investments.⁸ Putting all programs under the same lens will illuminate successes and failures that can guide future investment.

Every report, visualization, extrapolation, and insight carries risks of unintended biases

Some biases can be avoided, and some cannot, even in well-conducted quasi-experimental or randomized studies. Bias can be minimized by using a scientifically sound research design to guide reporting. One example not shown in the table would be to add columns to include results obtained from relevant, similar comparison groups of people who are not exposed to the programs you are reporting on. Contrasting results from these people with results from similar program participants will produce better estimates of program impact.⁸ Methods to design and conduct studies that may produce causal inferences are described in Pearl et al⁹ and Morgan and Winship.¹⁰
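As a rough illustration of the comparison-group idea, the sketch below contrasts an outcome rate among participants with the rate in a comparison group. The participant rate of 45% is the effectiveness entry for the first program in Table 1; the comparison-group rate of 30% is hypothetical and was invented for this example. A simple difference like this supports a causal reading only if the two groups are truly similar, which is what the research designs cited above are meant to address.

```python
def net_effect(participant_rate, comparison_rate):
    """Difference in outcome rates between participants and a comparison group."""
    return participant_rate - comparison_rate


def rate_ratio(participant_rate, comparison_rate):
    """Ratio of the participant outcome rate to the comparison-group rate."""
    return participant_rate / comparison_rate


participant_rate = 0.45  # from Table 1: care gaps closed among those who engage
comparison_rate = 0.30   # HYPOTHETICAL rate for a similar, unexposed group

difference = net_effect(participant_rate, comparison_rate)
ratio = rate_ratio(participant_rate, comparison_rate)

print(f"Difference in rates: {difference:.0%}")
print(f"Rate ratio: {ratio:.2f}")
```

Under these assumed numbers, only about 15 percentage points of the 45% would be attributed to the program, because 30% of care gaps would have closed anyway.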

Some programs are available to everyone, whereas others are meant only for people with certain problems. The process described here, which involves structured storytelling based on several relevant metrics and a consistent reporting framework, can be used to make insightful comparisons for all of them.

References

1. Kaiser Family Foundation. Section 12: Health Screening and Health Promotion and Wellness Programs - 10020. KFF, 2022.

2. Pronk NP. Designing and evaluating health promotion programs: simple rules for a complex issue. Dis Manage Health Outcomes, 2003; 11:149–157.

3. Aziz, Absetz, Oldroyd, Pronk, Oldenburg. A systematic review of real-world diabetes prevention programs: learnings from the last 15 years. Implement Sci, 2015; 10:172.

4. Mensink, Feskens, Saris, De Bruin, Blaak. Study on lifestyle intervention and impaired glucose tolerance Maastricht (SLIM): preliminary results after one year. International Journal of Obesity and Related Metabolic Disorders: J Inter Assoc Study Obesity, 2003; 27:377–384.

5. Bo, Ciccone, Baldi, et al. Effectiveness of a lifestyle intervention on metabolic syndrome. A randomized controlled trial. J Gen Intern Med, 2007; 22:1695–1703.

6. Ozminkowski, Serxner. Tell the Right Story with Your Program Reporting Processes. Corporate Wellness Magazine (corporatewellnessmagazine.com). Accessed June 3, 2023.

7. Wells, Ozminkowski, McGinn, et al. Incorporating reporting efforts to manage and improve health and wellness programs. Popul Health Manag, 2017; 20:181–188.

8. Ozminkowski RJ. The development, implementation, and evaluation of corporate wellness programs. In: Burke, Richardsen, eds. Corporate Wellness Programs: Linking Employee and Organizational Health, Chapter 15. Northampton, MA: Edward Elgar Publishing, Inc., 2014:322–346.

9. Pearl, Glymour, Jewell. Causal Inference in Statistics. Chichester, West Sussex, UK: John Wiley & Sons Ltd., 2016.

10. Morgan, Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research, 2nd Edition. Cambridge, UK: Cambridge University Press, 2015.