Abstract
Medicare uses a pay-for-performance program to reimburse hospitals. One of the key input measures in the performance formula is patient satisfaction with hospital care. Physicians and hospitals, however, have raised concerns about the questions related to patient satisfaction with pain management during hospitalization. They report feeling pressured to prescribe opioids to alleviate pain and boost satisfaction survey scores for higher reimbursements. Such overprescription of opioids has been cited as a cause of the current opioid crisis in the United States. Because of these concerns, Medicare stopped using pain management questions as inputs in its payment formula. The authors collected multiyear data from six diverse data sources, employed propensity score matching to obtain comparable groups, and estimated difference-in-difference models to show that, in fact, pain management was the only measure to improve in response to the pay-for-performance system. No other input measure showed significant improvement. Thus, removing pain management from the formula may weaken the effectiveness of the Hospital Value-Based Purchasing Program at improving patient satisfaction, which is one of the key goals of the program. The authors suggest two divergent paths for Medicare to make the program more effective.
In 1998, the Institute of Medicine formed the Committee on Quality of Health Care in America to develop a strategy for improving health care. This committee prepared two reports that have driven many of the changes in health care over the past two decades. The first report (Institute of Medicine 2000) was aimed at improving the safety of U.S. health care. The second report (Committee on Quality of Health Care in America 2001) outlined a framework for improving the quality of health care (Lindsay 2017). It identified the physician and hospital payment system as a major cause of quality problems in health care and a barrier to health reform. In the Medicare program, clinicians had incentives to focus on doing more rather than doing better. Since this report, the Centers for Medicare & Medicaid Services (CMS) has gradually moved toward a value-based pay-for-performance (P4P) system that requires hospitals to evaluate and demonstrate the effectiveness of their service delivery (Lee et al. 2017).
In 2010, as part of the Patient Protection and Affordable Care Act, CMS introduced the Hospital Value-Based Purchasing (HVBP) Program. It linked Medicare payments directly to patient care delivery and perceived quality measures. The program’s purpose was to reduce costs and improve health care quality. To do so, Medicare imposed reimbursement penalties or provided reimbursement bonuses on the basis of a hospital’s annual quality measures and actual health care outcomes in prior years (Lee et al. 2017). The program went into effect in fiscal year 2013 and is mandatory for all acute-care hospitals, public and private, in the United States, except hospitals in Maryland, which operate under a different all-payer model. Under the HVBP Program, Medicare withholds a percentage of reimbursements (starting at 1% in 2013 and increasing by 0.25 percentage points each year to reach the target of 2% in 2017) from hospitals that do not perform well on a set of prespecified health care quality measures. Hospitals that do perform well receive reimbursement bonuses. The program is budget neutral: the total amount paid out in rewards equals the total amount collected in penalties. In 2018, the HVBP funding pool held an estimated $1.9 billion (Lee et al. 2017).
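The withhold schedule described above is simple enough to state as a function. The sketch below is a minimal Python illustration; it assumes the rate stays at the 2% target after 2017 and treats years before 2013 as having no withhold, and the function name is hypothetical.

```python
def hvbp_withhold_rate(fiscal_year: int) -> float:
    """Share of Medicare reimbursements withheld under HVBP in a fiscal year.

    Schedule from the text: 1% in FY2013, rising by 0.25 percentage points
    per year to the 2% target in FY2017. Holding at 2% afterward is an
    assumption of this sketch.
    """
    if fiscal_year < 2013:
        return 0.0  # program not yet in effect
    rate = 0.01 + 0.0025 * (fiscal_year - 2013)
    return min(rate, 0.02)  # cap at the 2% target


for fy in range(2013, 2019):
    print(fy, f"{hvbp_withhold_rate(fy):.2%}")
```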
Over the years, the program’s emphasis gradually shifted from process-based toward outcome-based quality measures. In the first year of HVBP, 70% of the measures were process related, whereas the program now rewards or penalizes hospitals on the basis of their performance in multiple domains of care, including clinical processes, clinical outcomes (e.g., 30-day mortality rates), cost efficiency (e.g., cost per discharge), and patient satisfaction (Figueroa et al. 2016). The evidence for the effectiveness of this program in improving the specified quality measures, however, is mixed.
Patient satisfaction, which carries a weight of 25% in the HVBP payment formula, is measured through the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey, the first national, standardized, publicly reported survey of patients’ experience of hospital care. Although the survey was incorporated into the HVBP Program in 2012, its data have been collected since 2006 and publicly reported since 2008 (Tefera, Lehrman, and Conway 2016).
The HCAHPS survey asks patients about their recent hospital stay and attempts to score their overall experience and eight specific dimensions of their experience of care. One of these dimensions is patients’ perception of the quality of pain management during hospitalization. Research has shown that managing patient expectations about pain during and after surgical procedures can reduce patients’ distress, reduce their signs and symptoms, and improve their functional status. It can also produce positive emotional outcomes for patients, such as reduced anxiety and depression and an increased sense of well-being (Glowacki 2015). However, many health care providers have expressed concerns about the survey’s pain management questions, arguing that they wrongly equated pain management with the prescription of a painkiller (Lowes 2016). They reported feeling pressured to prescribe opioids to boost their hospital’s survey scores and, in turn, their hospital’s reimbursements. The American Hospital Association was among several prominent health care associations asking CMS to stop considering the HCAHPS pain management questions when calculating payments under the HVBP Program (Dickson 2016). According to this school of thought, incentivizing aggressive pain management has contributed to the overprescribing of opioids in the United States and to the country’s broader struggle with opioid addiction and overdose (Hall Render 2016).
In response to these concerns, and to remove any perceived incentive to prescribe opioids, CMS announced in July 2016 that the HCAHPS pain management questions would no longer be used in calculating HVBP payments. CMS has, however, stressed that robust pain control is an appropriate part of routine inpatient care, and it is conducting research to determine whether the HCAHPS survey is indeed associated with the opioid epidemic. Depending on the findings, it may develop new questions to reinclude the pain dimension in the HVBP calculation in the future (Hall Render 2016).
In this article, we study the effectiveness of the HVBP Program at improving patient satisfaction. Most of the existing studies in this domain fail to account for the wide heterogeneity of the more than 3,000 HVBP hospitals when comparing them with a small control group of fewer than 50 non-HVBP hospitals in Maryland. Furthermore, a large number of studies depend on only one year of data to observe changes in quality measures. We address both these limitations by collecting data over multiple years and employing propensity score matching to obtain a matched treatment group of HVBP hospitals before comparing them with the control group of hospitals in Maryland.
We integrated multiyear data from several diverse publicly available data sources. We then used a difference-in-difference estimation framework to determine whether the HVBP Program actually led to improvement in patient satisfaction in the treatment group of hospitals compared with the control group. Our findings show that the only dimension of patient satisfaction that improved significantly was patient experience with pain management during hospitalization. Removing this very measure from the penalty and bonus calculation thus may weaken the effectiveness of the HVBP Program at improving patient satisfaction, which is one of the key goals of the program.
We suggest two divergent paths for CMS to address this issue. One is that CMS could reinclude pain management in the HVBP payment formula. To address the potential association between these questions and opioid prescriptions, it could separately track opioid prescriptions at each hospital. Alternatively, CMS should consider completely removing patient satisfaction measures from the HVBP Program. Doing so would allow hospitals to focus their resources and attention back on clinical processes and outcomes. This might also deliver cost savings for both participating hospitals and CMS by eliminating the costs of administering the survey and analyzing responses from more than 3 million patients every year.
Background
The Committee on Quality of Health Care in America’s (2001) report identified the physician and hospital payment system as a significant cause of quality problems in health care and a barrier to health reform in the United States. This report spurred CMS to move away from a volume-based payment system, in which physicians and hospitals were reimbursed on the basis of the volume of services provided, toward a value-based P4P system, in which they are rewarded for providing high-quality care and penalized for providing low-quality care (Jha 2017). In 2003, CMS tested the Premier Hospital Quality Incentive Demonstration, a P4P pilot project involving more than 200 hospitals, which provided financial incentives to participating hospitals that performed well on quality and cost measures (Damberg et al. 2014). In 2005, it launched the Hospital Compare database, which publicly reported process measures of hospital quality and was later extended to include clinical outcomes such as mortality rates.
The HVBP Program
CMS introduced the HVBP Program in 2010 to improve health care quality and reduce costs. The program went into effect in 2013 and is mandatory for all acute-care hospitals in the United States except those in Maryland. Under this program, Medicare imposes reimbursement penalties or provides reimbursement bonuses on the basis of a hospital’s performance on a set of predefined quality measures. Medicare withholds a percentage of reimbursements from hospitals that do not perform well and distributes this money as a performance bonus to hospitals that perform well on its quality measures.
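The budget-neutral mechanics can be illustrated with a toy example. This is a minimal sketch, not the CMS formula: it assumes every hospital has the same share withheld and that the pool is paid back in proportion to each hospital’s TPS times its base payment. CMS describes its own mechanism as a linear exchange function, whose parameters are not reproduced here. The hospital names, payments, and scores are hypothetical. A positive net adjustment is a bonus and a negative one is a penalty.

```python
from dataclasses import dataclass


@dataclass
class Hospital:
    name: str
    base_payment: float  # hypothetical base Medicare payments (USD)
    tps: float           # Total Performance Score, 0-100


def net_adjustments(hospitals, withhold_rate=0.02):
    """Withhold the same share from every hospital, then return the pool in
    proportion to TPS x base payment (a simplifying assumption)."""
    pool = sum(h.base_payment * withhold_rate for h in hospitals)
    weights = [h.tps * h.base_payment for h in hospitals]
    total_weight = sum(weights)
    result = {}
    for h, w in zip(hospitals, weights):
        earned_back = pool * w / total_weight
        withheld = h.base_payment * withhold_rate
        result[h.name] = earned_back - withheld  # > 0 bonus, < 0 penalty
    return result


hospitals = [
    Hospital("A", 100e6, 70.0),
    Hospital("B", 50e6, 30.0),
    Hospital("C", 80e6, 45.0),
]
adj = net_adjustments(hospitals)
for name, amount in adj.items():
    print(f"{name}: {amount:+,.0f}")
# Budget neutrality: bonuses and penalties cancel (up to rounding error).
print("sum of adjustments:", round(sum(adj.values()), 6))
```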
The Total Performance Score (TPS), which is used as the basis for calculating the reimbursement bonus or penalty, comprises four dimensions of health care delivery: clinical processes, clinical outcomes (e.g., 30-day mortality rates), cost efficiency (e.g., cost per discharge), and patient satisfaction. Half of the score is based on clinical measures, with clinical outcomes contributing 40% of the total score and clinical processes contributing 10%. The other half comes equally from cost efficiency and patient satisfaction (i.e., patient experience of care), which contribute 25% each. Each hospital in the program receives two scores on each of the four dimensions: one for achievement (the hospital’s own performance compared with the 50th percentile of all hospitals’ performance) and one for improvement (the hospital’s performance compared with its own performance in the previous period). The higher of the two scores on each dimension is used in calculating the TPS, which is a weighted average of the four dimensions.
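The TPS calculation described above can be sketched as follows. The domain weights come from the text; representing each domain’s achievement and improvement scores on a common 0-100 scale and taking the higher of the two at the domain level are simplifying assumptions (official CMS scoring assigns points measure by measure, which is not reproduced here). The scores are hypothetical.

```python
# Domain weights as described in the text.
DOMAIN_WEIGHTS = {
    "clinical_process": 0.10,
    "clinical_outcomes": 0.40,
    "efficiency": 0.25,
    "patient_experience": 0.25,
}


def total_performance_score(achievement, improvement):
    """Weighted average of the better of achievement vs. improvement per domain.

    Both arguments map domain name -> score on a common 0-100 scale
    (an illustrative assumption).
    """
    return sum(
        weight * max(achievement[domain], improvement[domain])
        for domain, weight in DOMAIN_WEIGHTS.items()
    )


# Hypothetical hospital: strong achievement on outcomes,
# strong improvement on efficiency and patient experience.
achievement = {"clinical_process": 60, "clinical_outcomes": 80,
               "efficiency": 40, "patient_experience": 30}
improvement = {"clinical_process": 50, "clinical_outcomes": 20,
               "efficiency": 55, "patient_experience": 70}
score = total_performance_score(achievement, improvement)
print(f"TPS = {score:.2f}")
```

Taking the maximum rewards either high absolute performance or rapid improvement, so a low-performing hospital can still earn points by improving on its own baseline.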
Whereas the other dimensions of care delivery are measured with objective clinical and cost data, patient satisfaction is obtained from the HCAHPS survey. The survey consists of 32 questions and is administered to a random sample of adult inpatients between 48 hours and 6 weeks after discharge from short-term, acute-care hospitals (Tefera et al. 2017). It is the first national, standardized, publicly reported survey of patients’ experience of hospital care.
Patient Satisfaction
Patient satisfaction has been widely studied by marketing scholars. Focusing on its potential antecedents, scholars have uncovered the role of demographic variables. For example, Mummalaneni and Gopalakrishna (1995) found that patients’ occupation, education, gender, and income were strongly associated with their satisfaction with health care services. This indicates that hospitals may not have full control over all determinants of patients’ perception of the care they receive. Using Australian patient data, Rundle-Thiele and Russell-Bennett (2010) found that patients who visited more frequently reported higher satisfaction. Treating health care as a service, scholars have also explored patient satisfaction from a service quality point of view. In a study of South Korean health care consumers, Choi et al. (2005) found that service quality was a more important determinant of patient satisfaction than value proposition. Exploring the role of various service quality dimensions, they found that physician concern, staff concern, and convenience of care were the most important determinants of outpatient satisfaction. The HCAHPS survey asks about most of these determinants.
Other marketing scholars have explored the benefits of patient satisfaction to hospitals. For example, patient satisfaction has been strongly associated with loyalty to the hospital as well as net positive word of mouth (Chang, Tseng, and Woodside 2013; Fisk et al. 1990; Meesala and Paul 2018). Using a large data set of 15,000 patients, Nelson et al. (1992) found that patient satisfaction was associated with the financial strength of the hospital as measured by its earnings, net revenues, and return on assets. Raju and Lonial (2002) found that patient-perceived quality could explain up to 30% of the variation in hospital profitability.
Despite this recognized importance of patient satisfaction, its inclusion in the HVBP Program has led to a vigorous debate in policy circles. Advocates for inclusion of patient satisfaction contend that it measures critical components of care that only patients can report, such as whether pain was addressed effectively and whether patients received clear communication from physicians and nurses. This makes it an essential measure of how well a health care system functions. In a setting where patients should be the primary focus, the content of their experiences can help clinicians better mobilize around their needs. This builds trust in the health care system from the perspective of the patient and promotes collaborative practices between clinicians and patients (Chatterjee, Tsai, and Jha 2015). Prior studies in multiple health care settings have shown that poor patient satisfaction with the health care system is associated with slower recovery from illness and a lower likelihood of compliance with prescribed treatment regimens. Consequently, suboptimal patient experience has important implications not only for the health of patients but also for health care costs, which increase when patients use more health care services because of poor recovery and noncompliance (Chatterjee et al. 2012). When patients have a better experience, they are more likely to comply with treatments, return for follow-up appointments, and engage with the health care system by seeking appropriate care (Chatterjee, Tsai, and Jha 2015).
Critics of the patient satisfaction dimension in the HVBP Program argue that it drives physicians to focus on the wrong priorities, so that hospitals end up behaving like hotels. Using patient satisfaction as a metric shifts provider attention away from delivering technically effective care toward fulfilling patient expectations and demands (Chatterjee, Tsai, and Jha 2015). Because satisfying patients can conflict with clinical practice guidelines, higher patient satisfaction may in fact be associated with more inpatient admissions, higher overall health care costs, and increased mortality. For example, providing a prescription may result in a satisfied patient but may increase the cost of care and contribute to problems such as antibiotic resistance and the opioid crisis (Lindsay 2017).
Pain Management Under HVBP and the Opioid Crisis
Within the broad criticism of including patient satisfaction in the HVBP Program, one item in particular has come under harsh scrutiny. The HCAHPS survey asks patients about their recent hospital stay and attempts to score eight dimensions of the care they received. One of these dimensions is patients’ perception of the quality of pain management during hospitalization.
In 2016, approximately 100 million people in the United States suffered from pain, of whom 9 million to 12 million reported chronic pain. Others reported short-term pain from injuries, diseases, or medical procedures (Stoicea et al. 2019). Failing to manage patient expectations about pain during and after surgical procedures can lead to poorer clinical and psychological outcomes for patients. Patients in pain also have negative perceptions of the health care they receive. Egbert et al. (1964) reported that patients who received pain education required 50% fewer narcotics during hospitalization and were discharged sooner than patients who did not (Glowacki 2015).
In 1996, the American Pain Society labeled pain the “fifth vital sign” and developed a national quality improvement program emphasizing measurable patient outcomes of effective pain management, such as decreased length of stay, reduced hospital costs, and increased patient satisfaction (Glowacki 2015). In 2000, the Joint Commission on Accreditation of Healthcare Organizations released new pain management standards that asserted that pain control was a patient’s right, identified pain management as a perceived gap in clinician education and training, encouraged an aggressive approach to pain assessment, and emphasized safe pain management (Chidgey, McGinigle, and McNaull 2019). The commission established that both acute and chronic pain were major causes of patient dissatisfaction in the U.S. health care system (Glowacki 2015).
In 2010, the HVBP Program instituted reforms that included financial incentives for higher patient satisfaction scores. Patient satisfaction is strongly associated with patients’ perceptions of how the signs and symptoms of their condition are managed. Patients are more likely to be dissatisfied if they perceive a lack of validation of their pain experience or negative attitudes from their providers (Glowacki 2015). The HCAHPS survey contained three questions focused on pain management. Some physicians expressed concern that the questions wrongly equated pain management with the prescription of a painkiller (Lowes 2016). These questions placed pressure on hospital staff to prescribe more opioids to achieve higher scores on the survey (Hall Render 2016). Furthermore, patients complete the survey during a period when many are filling postdischarge opioid prescriptions. This timing could also inadvertently incentivize providers to overprescribe opioids at discharge to ensure satisfactory ratings (Lee et al. 2017). Although pain management may constitute only a small part of the survey, respondents do not necessarily separate out the parts of the experience with which they were unhappy. If they were in pain and the hospital did not give them a painkiller despite their request, they may conclude that the hospital did not take good care of them, which can affect their responses to the entire HCAHPS survey (Tefera and Lehrman 2016). Thus, many physicians said that they felt pressured to overprescribe opioids to boost their hospital’s survey scores and, in turn, their hospital’s reimbursements.
The opioid crisis
The current opioid crisis started taking shape in the 1990s. From the late 1990s until 2012, the number of opioid prescriptions written each year in the United States rose steadily to an annual peak of 225 million (Chidgey, McGinigle, and McNaull 2019). The Centers for Disease Control and Prevention reports that deaths attributable to prescription opioids more than tripled in the United States during the 1999–2014 period (Dickson and Blesch 2016; Jena, Goldman, and Karaca-Mandic 2016). Approximately 6% of the U.S. population aged 15 to 64 reported some type of opioid abuse in 2015, and more than 42,000 people died of opioid overdose in 2016 alone (Stoicea et al. 2019; Volkow et al. 2019).
Prescribing opioids at discharge from an acute hospitalization represents an important but underdescribed potential avenue through which patients may develop long-term opioid use. Use of opioids during and shortly after hospitalization is warranted in some clinical settings (e.g., for patients undergoing surgery). Opioids are “powerful pain-reducing medications” that, administered at appropriate doses, are effective not only at eliminating pain but also at preventing its recurrence in long-term recovery scenarios (Stoicea et al. 2019). Failure to manage pain appropriately in such cases may delay discharge from the hospital, interfere with postoperative rehabilitation, and adversely affect patients’ quality of life. However, opioid use is also associated with both short- and long-term risks, including dependence (Jena, Goldman, and Karaca-Mandic 2016).
Overprescription of opioids has been frequently identified as a major cause of the current opioid crisis (Chidgey, McGinigle, and McNaull 2019). Overprescribing has been attributed to misinformation and outside pressure from both pharmaceutical companies and accreditation bodies such as the Joint Commission on Accreditation of Healthcare Organizations. Caught between regulatory requirements aimed at eliminating pain and aggressive marketing campaigns along with a shift in cultural beliefs about pain control, physicians became unwitting accomplices in the opioid crisis (Chidgey, McGinigle, and McNaull 2019). In fact, the Promoting Responsible Opioid Prescribing Act of 2016 suggested that the pain management measure in the HCAHPS survey could have incentivized both greater inpatient use of opioids and greater prescribing of opioids at the time of discharge (Jena, Goldman, and Karaca-Mandic 2016).
However, another school of thought holds that the evidence on the link between the HCAHPS survey and opioid prescribing is inconclusive. For example, Lee et al. (2017) found no correlation between postoperative opioid prescribing and scores on the HCAHPS pain measures. A coalition that included several pain-medicine societies, such as the American Pain Society and the American Academy of Pain Medicine, lobbied CMS to retain the three questions, at least until better ones were drafted. They warned that, in the absence of conclusive evidence, eliminating the pain-related questions would be a step back for proper pain management and would deprive researchers of valuable data that could improve it (Lowes 2016).
CMS has also defended its decision to include patients’ perception of pain management in the HCAHPS survey and, consequently, in the HVBP Program. Historical data show that the sharp increase in opioid prescriptions in the mid-1990s coincided with the American Pain Society’s conceptualization of “Pain as the 5th Vital Sign” and with the pharmaceutical industry’s campaign to portray opioid prescribing as safe, reasonable, and effective for chronic pain while downplaying the risks of opioid dependence, abuse, and overdose. The crisis thus began years before the HCAHPS survey was launched in 2006. There was no noticeable acceleration in opioid prescriptions in 2006 or in 2008, when public reporting of hospital scores began (Tefera et al. 2017).
Regarding the use of the HCAHPS survey, CMS is not aware of any empirical evidence that physicians prescribe opioids to inpatients with the intention of obtaining better scores on the pain management questions, or that patients who receive opioids rate their hospital experience more positively than those who do not (Tefera and Lehrman 2016). Nothing in the survey suggests that opioids are a preferred way to control pain. In fact, good nurse and physician communication, which is critical from the patient’s perspective, is strongly associated with better HCAHPS scores (Tefera, Lehrman, and Conway 2016). There is no evidence that experience with pain management dominates patients’ overall assessment of their hospital experience. Moreover, the way the HCAHPS survey contributes to HVBP makes the impact of the pain management dimension on a hospital’s overall payment negligible: it is one of eight equally weighted dimensions of patient satisfaction and determines less than one-tenth of 1% of the hospital’s total payment (Tefera and Lehrman 2016). In addition, patients diagnosed with substance abuse disorders are excluded from HVBP scoring (Dickson and Blesch 2016).
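The “less than one-tenth of 1%” figure can be checked with a rough upper bound. This back-of-the-envelope calculation assumes the full 2% withhold, the 25% patient-experience weight in the TPS, and eight equally weighted patient experience dimensions; it ignores the interplay of achievement and improvement scores.

```latex
\underbrace{0.02}_{\text{withhold}}
\times \underbrace{0.25}_{\text{patient-experience weight}}
\times \underbrace{\tfrac{1}{8}}_{\text{pain dimension}}
= 0.000625 = 0.0625\% < 0.1\%
```

Under these assumptions, the pain dimension governs at most about 0.06% of the payments subject to the withhold.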
Nonetheless, bowing to sustained criticism from health care providers, CMS announced in July 2016 that the HCAHPS pain management questions would not be considered in HVBP, to remove any perceived incentive to prescribe opioids. Given the complexity of an issue that straddles two national challenges (inadequate pain management and opioid overprescribing) and the need for additional research, CMS decided to continue surveying patients about pain management and to provide participating hospitals with this valuable patient feedback. However, the pain dimension results are no longer part of the HVBP calculation (Tefera et al. 2017).
Empirical Evidence for the Effectiveness of the HVBP Program
The HVBP Program was instituted with the intention of improving health care outcomes and patient experience and reducing costs. However, evidence about the effectiveness of the program to achieve these goals is mixed.
A study comparing HCAHPS survey data from 2008 and 2009 found improvements in all measures of patient experience except doctors’ communication (Elliott et al. 2010). Staff responsiveness and whether patients received discharge information saw the largest improvements. Westbrook, Babakus, and Grant (2014) used factor analysis to show that all dimensions of the HCAHPS survey except discharge information significantly influenced patient satisfaction; however, their study was based on data from only two hospitals. A study using difference-in-difference estimation found that participating hospitals did not show significant improvement in any of the quality measures (Ryan et al. 2015). Some studies have compared hospitals participating in the HVBP Program with various control groups to determine whether the program made a relative difference in the quality of health care they deliver. A study comparing participating hospitals with critical access hospitals and hospitals in Maryland (these two categories of hospitals are not required to participate in HVBP) found no improvement in clinical outcomes as measured by 30-day mortality rates (Figueroa, Horneffer, and Jha 2016). Another study comparing the participating hospitals with Maryland’s critical-care hospitals found no significant differences between the two groups in the improvement of clinical processes and patient experience (Ryan et al. 2017). Papanicolas et al. (2017) found only moderate improvement in patient experience among HVBP hospitals, and even this improvement had occurred mostly before the intervention period.
Several other studies have compared groups of hospitals on the basis of some underlying characteristics and demonstrated that whereas one group shows an improvement, the other does not. For example, Jha et al. (2008) found substantial differences in the patients’ experiences across different geographical regions, which they attributed to the style of caregiving and organizational leadership. A study using data from 2009 to 2011 found that hospitals catering largely to older, White, female patients who underwent relatively fewer procedures did better under the program (Johnston et al. 2015). These hospitals were predominantly nonteaching, smaller, urban hospitals owned by the government or religious organizations. Another study using data from 2014 compared penalty or reward status of safety-net hospitals (i.e., hospitals that are legally required to provide health care regardless of patients’ insurance status) with other hospitals and found that safety-net hospitals were more likely to be penalized under the HVBP Program (Gilman et al. 2015; Joynt, Zuckerman, and Epstein 2017).
Many of these studies suffer from two limitations that could have biased their results. First, several of them used data from a single year only, which is not sufficient to capture the evolving dynamics in the processes and outcomes of health care quality; multiple years of data are needed to capture such changes. Second, most studies do not account for heterogeneity among HVBP hospitals when comparing them with a small control group. Comparing more than 3,000 HVBP hospitals, which have a broad range of hospital and geographic characteristics, with a small group of fewer than 50 hospitals, all located in Maryland, can lead to biased results. Ideally, one should first obtain a matched sample of treatment hospitals (i.e., hospitals participating in HVBP) before comparing them with the control group, so as to minimize the role of hospital characteristics in any observed changes in the quality of health care delivery.
In this study, we address both these limitations. We employ multiple years of data for model estimation and use propensity score matching to obtain a matched treated group of HVBP hospitals to compare with a control group of hospitals in Maryland.
Methodology
Data
We integrated multiyear data from six large, diverse, publicly available data sources. Patient satisfaction data are from HCAHPS; clinical process and clinical outcome data are from two separate data sets available on the Medicare website; cost efficiency data are from the CMS Hospital Inpatient Prospective Payment System (IPPS); hospital characteristics are from the CMS Impact Files; and demographic data are from the 2010 U.S. census.
The main variables of interest, related to patient satisfaction in the HCAHPS survey, are obtained from the Hospital Compare data for 2011 to 2015 available on the CMS website. All short-term, acute-care, nonspecialty hospitals are required to participate in the survey. The survey is administered after discharge to a random sample of adult inpatients, creating standardized, publicly reported measures that allow fair comparisons of patient experience across hospitals nationwide. The nine HCAHPS measures derived from the survey and reported on the Hospital Compare website assess the quality of nurses’ and physicians’ communication, responsiveness of hospital staff to patient needs, quality of pain management, communication about medications, information provided at discharge, cleanliness and quietness of patient rooms, and overall rating (Lindsay 2017). The survey is administered by hospitals or their contracted vendors, who send the data to CMS, which validates, analyzes, and publicly reports the results. The scores that CMS reports reflect hospital-level patient experience over a 12-month period (Tefera et al. 2017). The survey is widely used: every day it is administered to more than 31,000 patients across 4,100 participating hospitals. After ineligible patients are removed, the survey has a 30% response rate, which translates to 8,500 completed surveys daily. Meta-analyses have established that the survey does not suffer from nonresponse bias. Because HCAHPS adjusts for patient characteristics, the data provide statistically valid results that may help inform patients’ choice of hospital and drive quality improvement at the hospital level. The official HCAHPS scores reported on the CMS Hospital Compare website are based on 3.1 million completed surveys each year (Tefera, Lehrman, and Conway 2016).
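As a quick consistency check, the daily and annual survey figures above can be reconciled. The implied number of eligible patients is derived here rather than reported in the text, and the 365-day year is an assumption.

```python
sampled_per_day = 31_000    # patients surveyed daily (from the text)
completed_per_day = 8_500   # completed surveys daily (from the text)
response_rate = 0.30        # response rate among eligible patients (from the text)

# Implied eligible patients per day: derived, not reported in the text.
eligible_per_day = completed_per_day / response_rate
annual_completed = completed_per_day * 365

print(f"implied eligible per day: {eligible_per_day:,.0f}")
print(f"implied eligible share of sample: {eligible_per_day / sampled_per_day:.0%}")
print(f"completed surveys per year: {annual_completed:,}")
```

The annual total of roughly 3.1 million matches the figure reported for the official HCAHPS scores.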
We obtained clinical process and clinical outcome data from the Medicare website (medicare.gov). The data set “Complications and Deaths—Hospital” provides the clinical outcomes evaluated by the HVBP Program: 30-day mortality rates for pneumonia, heart attack, and heart failure patients. The data set “Hospital Value-Based Purchasing (HVBP)—Clinical Care Domain Scores” provides clinical process scores. Cost efficiency data are obtained from the Hospital IPPS, which summarizes each hospital’s overall cost and total number of discharges; we calculated cost per discharge from these two figures. Hospital characteristics such as the number of beds, number of employees, case mix index, number of discharges, and location data are obtained from the CMS Impact Files. According to CMS, the impact files are “generally prepared in the summer preceding the Federal fiscal year and are based on the best data available at the time. The files are used in estimating payment impacts of various policy changes to the IPPS proposed and finalized in the Federal Register” (https://www.nber.org/research/data/centers-medicare-medicaid-services-cms-impact-file-hospital-ipps). Demographic data were obtained from the 2010 U.S. census (census.gov) and matched to each hospital using a 10-mile radius around its zip code.
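The text does not specify exactly how census data were linked to each hospital’s 10-mile radius. One plausible approach, sketched here under stated assumptions, is to take zip code centroids, keep those within 10 miles of the hospital by great-circle (haversine) distance, and compute a population-weighted average. The field names (`lat`, `lon`, `population`, `median_income`), the coordinates, and the figures are hypothetical; the cost-per-discharge helper simply divides total cost by discharges as described above.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def demographics_within_radius(hospital_latlon, zip_rows, radius_miles=10.0):
    """Population-weighted mean of a zip-level attribute within the radius.

    zip_rows: list of dicts with hypothetical keys 'lat', 'lon',
    'population', and 'median_income' (zip centroids from census data).
    """
    lat, lon = hospital_latlon
    nearby = [z for z in zip_rows
              if haversine_miles(lat, lon, z["lat"], z["lon"]) <= radius_miles]
    total_pop = sum(z["population"] for z in nearby)
    if total_pop == 0:
        return None  # no populated zip centroid within the radius
    return sum(z["median_income"] * z["population"] for z in nearby) / total_pop


def cost_per_discharge(total_cost, discharges):
    """Hospital cost divided by its number of discharges."""
    return total_cost / discharges


# Hypothetical zip centroids at roughly 0, 5, and 31 miles from the hospital.
zip_rows = [
    {"lat": 39.29, "lon": -76.61, "population": 10_000, "median_income": 50_000},
    {"lat": 39.36, "lon": -76.61, "population": 30_000, "median_income": 70_000},
    {"lat": 39.74, "lon": -76.61, "population": 20_000, "median_income": 90_000},
]
income = demographics_within_radius((39.29, -76.61), zip_rows)
print(f"population-weighted median income within 10 miles: {income:,.0f}")
```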
To evaluate the effectiveness of the HVBP Program while also overcoming the limitations of existing research, we took several steps. We obtained data from 2011 to 2015; 2015 was the last full year before CMS announced that pain management questions would no longer be used in HVBP calculations. Using multiple years of data enables us to capture improvement in various quality measures.
Propensity Score Matching
We used propensity score matching to obtain a group of treatment hospitals (HVBP Program participants) that matches the group of control hospitals (which do not participate in the HVBP Program and are all located in Maryland) on a set of diverse variables.
Previous research has shown that various measures of hospital performance may be correlated with such factors as hospital characteristics and socioeconomic characteristics in the hospital’s vicinity. For example, patients of different races or ethnicities tend to rate their satisfaction with a hospital very differently (Weech-Maldonado et al. 2003). Even aggregate patient characteristics such as gender ratio, household income, and health status significantly affect the satisfaction rating of hospitals (Haviland et al. 2005; Weech-Maldonado et al. 2003). Clinical outcome measures such as mortality rates are significantly higher at for-profit hospitals (Hartz et al. 1989) and at major teaching hospitals and significantly lower at large urban hospitals (Keeler et al. 1992). Thus, comparing all the HVBP hospitals, which are heterogeneous with respect to these characteristics, with a small, geographically concentrated control group can lead to biased findings. Propensity score matching makes the two groups comparable on the observed characteristics used in the matching, so differences in outcomes between them can more credibly be attributed to the treatment group’s participation in the HVBP Program.
We used hospitals in Maryland as a control group because Maryland does not participate in the HVBP Program. The Medicare waiver (codified in Section 1814[b] of the Social Security Act) exempted Maryland from the IPPS and the Outpatient Prospective Payment System and allowed the state to set its own rates for these services. Because of this long-standing waiver for its own rate-setting system, Maryland’s hospitals are exempt from the Medicare HVBP Program and operate under the Maryland Quality-Based Reimbursement Program, which is predominantly based on process measures. Maryland is thus the most natural choice for a control group.
The data set contains 45 hospitals from Maryland. However, five of these hospitals did not meet the minimum data requirement that CMS sets for valid results (at least 100 completed patient surveys), so we used the remaining 40 Maryland hospitals as our control group. We used nearest-neighbor propensity score matching to obtain a treatment group of 40 HVBP hospitals comparable to the hospitals in the control group. To avoid problems of endogeneity, we based this matching on a set of characteristics that are not subject to change due to participation in the HVBP Program. These included hospital ownership (government owned, voluntary nonprofit, or proprietary), geolocation (large urban, other urban, or rural), and socioeconomic characteristics within a 10-mile radius of the hospital (White population, Black population, Hispanic population, number of males and females, and average household income).
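The matching step described above can be sketched in two parts: a logistic regression of HVBP participation on the matching covariates yields each hospital’s propensity score, and a greedy 1:1 nearest-neighbor pass without replacement then pairs each Maryland hospital with the closest unused HVBP hospital. This is a hedged, dependency-free illustration, not the authors’ implementation; in practice a package such as R’s MatchIt would typically be used. The function names, the gradient-ascent fitting, the standardization of covariates, and matching without replacement are all our assumptions, since the text does not specify them.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Center and scale each covariate column so gradient steps are comparable."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [max((sum((v - m) ** 2 for v in c) / len(c)) ** 0.5, 1e-12)
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Maximize the logistic log-likelihood of P(HVBP = 1 | x) by batch gradient ascent."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity_scores(X, treated):
    """Estimated probability of HVBP participation for every hospital."""
    Z = standardize(X)
    w = fit_logistic(Z, treated)
    return [sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], zi))) for zi in Z]


def greedy_nearest_neighbor(control_scores, pool_scores):
    """1:1 matching without replacement: each control hospital, in turn, takes the
    closest still-unused HVBP hospital on the propensity score.
    Requires at least as many pool hospitals as control hospitals."""
    available = set(range(len(pool_scores)))
    pairs = []
    for c, sc in enumerate(control_scores):
        best = min(available, key=lambda t: abs(pool_scores[t] - sc))
        pairs.append((c, best))
        available.remove(best)
    return pairs
```

Here X would hold the ownership and location dummies and the 10-mile-radius socioeconomic variables for all hospitals, and treated would be 1 for HVBP hospitals and 0 for Maryland hospitals; the resulting scores are then split by group before matching. The order in which control hospitals are processed can change a greedy match, which is the weakness the Robustness Check section returns to.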
As Figure 1 shows, the control group’s propensity scores overlap considerably better with those of the matched treatment group of HVBP hospitals than with those of all HVBP hospitals. Figure 2 compares the three groups of hospitals from 2011 to 2015 on clinical outcomes (30-day mortality rates), cost efficiency (cost per discharge), clinical process (conformance quality), and three dimensions of patient satisfaction (overall experience, nurse communication, and pain experience). This comparison illustrates the bias that can afflict findings from studies that compare all HVBP hospitals with a control group of hospitals.

Figure 1. Propensity scores before and after matching.

Figure 2. Comparison of quality measures between Maryland hospitals, matched HVBP hospitals, and all HVBP hospitals.
Table 1 presents a detailed comparison of various characteristics of the treatment and control groups, as well as of all HVBP hospitals, in fiscal year 2011. We performed t-tests to determine whether the mean values of variables differed significantly between the matched treatment group and the control group, and between the entire treatment group and the control group. Results suggest that hospitals in the matched treatment group are more similar to hospitals in the control group than are hospitals in the entire treatment group. This further underscores the importance of obtaining a matched group of hospitals before making a comparison with a control group of hospitals. Doing so helps reduce selection bias and strengthen causal arguments. Table 2 shows descriptive statistics for the performance measures used by the HVBP Program for the three groups (control group, the entire treatment group, and the matched treatment group).
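For a single covariate, the balance test described above can be sketched as a two-sample t statistic. The text does not say whether equal variances were assumed, so this sketch uses Welch’s unequal-variance version as an assumption; it also adds the standardized mean difference, a common balance diagnostic that the article does not report. For p-values, scipy.stats.ttest_ind with equal_var=False computes the same Welch statistic.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled SD (root of the average variance)."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd
```

Unlike the t statistic, the standardized mean difference does not grow with sample size, which is why it is often preferred for judging covariate balance after matching.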
Table 1. Results of t-Tests Comparing Control Group of Maryland Hospitals with Matched Treatment Group of HVBP Hospitals and All HVBP Hospitals (Fiscal Year 2011).
a Values are in thousands.
Table 2. Descriptive Statistics of Performance Measures (FY 2011).
a Values are from 0 to 100.
b Values are in hundreds.
Model Estimation and Results
We estimated the effect of the HVBP Program with a difference-in-difference model using the following specification:
where
We estimated this model for each performance measure to compare both the matched group of HVBP hospitals and the set of all HVBP hospitals with the control group. All the models are identified.
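Because the estimating equation is not reproduced in this text, the following is only a generic illustration of the difference-in-difference logic, not the authors’ specification. A common two-way form is y(h,t) = α(h) + λ(t) + β·[HVBP(h) × Post(t)] + γ·X(h,t) + ε(h,t), where β captures the change in HVBP hospitals relative to control hospitals after the program began. In the simplest case (two groups, a pre and a post period, no covariates), β reduces to the difference between the two groups’ mean pre-to-post changes, which the sketch computes. The row layout and the first_post_year cutoff are hypothetical parameters.

```python
from statistics import mean


def did_estimate(panel, first_post_year):
    """Two-group, pre/post difference-in-difference on hospital-year rows.

    panel: iterable of (hospital_id, year, treated, outcome), treated in {0, 1}.
    Returns (treated post mean - treated pre mean) - (control post mean - control pre mean).
    All four group-period cells must be nonempty.
    """
    cells = {(g, p): [] for g in (0, 1) for p in (0, 1)}
    for _hospital_id, year, treated, outcome in panel:
        cells[(int(treated), int(year >= first_post_year))].append(outcome)
    m = {cell: mean(values) for cell, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
```

Estimating the full model with covariates and hospital-clustered standard errors requires a regression package; the cell-mean version only makes the identifying comparison concrete.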
Results from both estimations show that patient perception of pain management is the only quality measure that showed a consistent, significant improvement in HVBP hospitals (βmatched = 1.46, p < .01; βall = .77, p < .05; see Table 3). In the matched samples, no other quality measure differs significantly between the two groups of hospitals.
Table 3. Results from Difference-in-Difference Model Estimation.
**p < .05, ***p < .01 (two-tailed).
Overall, our results suggest that, of the four broad quality measures used in the HVBP Program, clinical processes, clinical outcomes, and cost efficiency showed no significant improvement compared with the control group of Maryland hospitals, which did not participate in the program. Within patient satisfaction, too, the only factor that showed significant improvement was patient perception of pain management during hospitalization.
Parallel Trends Assumption in Difference-in-Difference Framework
The difference-in-difference framework assumes that, in the absence of treatment, the average change in the response variable would have been the same for both the treatment and control groups (parallel trends). As Ashenfelter (1978) pointed out, one concern in a difference-in-difference study is that there is often a “dip” in the outcome (e.g., earnings, employment) in the period before treatment. In job-training evaluations, for example, people who have just lost their jobs enter the treatment group, whereas people who keep their jobs remain in the control group. A pretreatment “dip” or “trend” that is unique to the treated units would bias the estimates. To test this assumption, we applied the method proposed by Autor (2003), which includes leads and lags of the treatment in the estimation framework. If the coefficient on an indicator for a period before treatment is significant, the units that are about to become treated were already changing at a different rate, which signals a violation of the parallel trends assumption. The estimation framework is specified as follows:
where
and other covariates and error terms are the same as in the main model.
We estimated this model to compare the matched group of HVBP hospitals with the control group. The results in Table 4 are consistent with the parallel trends assumption: the coefficient on the lag variable is not statistically significant.
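The lead-and-lag approach of Autor (2003) replaces the single treatment-by-post term with indicators for each year relative to adoption. A minimal sketch of how those indicators could be built follows. The adoption year, the number of leads and lags, the binning of the last lag, and the lead_k/lag_k naming (leads before adoption, lags after, as in Autor’s usage) are our assumptions; the labels in Table 4 may follow a different convention. Treated-hospital years earlier than the furthest lead form the omitted reference period, and control hospitals get zeros throughout.

```python
def event_time_dummies(year, treated, adoption_year, leads=1, lags=2):
    """Indicator columns for years relative to adoption (hypothetical helper).

    lead_k = 1 for a treated unit in year adoption_year - k (before treatment);
    lag_0  = 1 in the adoption year; lag_k = 1 in adoption_year + k, with the
    last lag binned to cover all later years. Control units get all zeros.
    """
    row = {}
    rel = year - adoption_year
    for k in range(leads, 0, -1):
        row[f"lead_{k}"] = int(bool(treated) and rel == -k)
    row["lag_0"] = int(bool(treated) and rel == 0)
    for k in range(1, lags + 1):
        hit = rel == k if k < lags else rel >= k
        row[f"lag_{k}"] = int(bool(treated) and hit)
    return row
```

A significant coefficient on a lead column would indicate that treated hospitals were already diverging before adoption, which is the violation the test looks for.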
Table 4. Results for Parallel Trends Assumption Testing.
**p < .05 (two-tailed).
Robustness Check
We conducted a robustness check for our findings. We obtained matched groups of HVBP hospitals using other propensity score matching methods and reestimated our models using these matched groups. The nearest-neighbor matching method is a “greedy” method, in which the closest control unit for each treatment unit is chosen one at a time, without trying to minimize the global distance measure. Thus, one could argue that the matched group of hospitals may still differ significantly from the control group in underlying characteristics.
We used two other propensity score matching methods, optimal matching and genetic matching, to obtain matched groups of hospitals and reestimated our model. The optimal matching method locates the matched units with the smallest average absolute distance across all the matched pairs. It can be particularly useful when there may not be an appropriately matched control unit for a treatment unit. Genetic matching, in contrast, is a general multivariate matching method that automates the process of finding good matches. It is a generalization of propensity score and Mahalanobis distance matching. The idea is to use a genetic search algorithm to find a set of weights for the covariates that maximizes the balance between matched treatment and control units. The main advantage of this method is that it optimizes covariate balance directly. We used the same set of characteristics as in the nearest-neighbor method to match the groups in optimal and genetic matching. Next, we used the two matched treatment groups along with the control group of hospitals in Maryland to repeat the difference-in-difference analysis using the same set of covariates. The results (Table 5) consistently show that the only measure that differs significantly between the two groups is patient perception of pain management.
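The difference between greedy nearest-neighbor matching and optimal matching is easiest to see on a toy example. In the sketch below, with made-up propensity scores, greedy matching lets the first control hospital claim the pool hospital that the second one needed, whereas an exhaustive search minimizes the total absolute distance across all pairs. The brute-force search is for illustration only; for realistic sizes the same assignment problem can be solved with scipy.optimize.linear_sum_assignment. Genetic matching (for example, GenMatch in R’s Matching package) searches over covariate weights instead and is not shown.

```python
from itertools import permutations


def greedy_match(control, pool):
    """Greedy 1:1 matching: each control unit, in order, takes the nearest unused pool unit."""
    available = list(range(len(pool)))
    pairs = []
    for c, sc in enumerate(control):
        best = min(available, key=lambda t: abs(pool[t] - sc))
        pairs.append((c, best))
        available.remove(best)
    return pairs


def optimal_match(control, pool):
    """Exhaustive 1:1 matching minimizing total absolute distance (small inputs only)."""
    best_pairs, best_cost = None, float("inf")
    for perm in permutations(range(len(pool)), len(control)):
        cost = sum(abs(control[c] - pool[t]) for c, t in enumerate(perm))
        if cost < best_cost:
            best_pairs, best_cost = list(enumerate(perm)), cost
    return best_pairs


def total_distance(pairs, control, pool):
    return sum(abs(control[c] - pool[t]) for c, t in pairs)


control_scores = [0.50, 0.40]  # made-up Maryland propensity scores
pool_scores = [0.45, 0.60]     # made-up HVBP propensity scores
greedy = greedy_match(control_scores, pool_scores)
optimal = optimal_match(control_scores, pool_scores)
print(greedy, round(total_distance(greedy, control_scores, pool_scores), 2))    # [(0, 0), (1, 1)] 0.25
print(optimal, round(total_distance(optimal, control_scores, pool_scores), 2))  # [(0, 1), (1, 0)] 0.15
```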
Table 5. Results from Difference-in-Difference Model Estimation Using Matched Treated HVBP Hospitals vs. Control Group.
**p < .05, ***p < .01 (two-tailed).
Discussion
In this article, we studied the effectiveness of the HVBP Program at improving patient satisfaction. We collected data over multiple years and employed propensity score matching to obtain a matched treatment group of HVBP hospitals to compare with the control group of hospitals in Maryland. We then used a difference-in-difference estimation framework to determine whether the HVBP Program actually led to improvement in patient satisfaction at the treatment group of hospitals compared with the control group. Our findings show that the only dimension of patient satisfaction that improved significantly was patient experience with pain management during hospitalization. The other components of the payment formula (clinical processes, clinical outcomes, and cost efficiency) showed no significant improvement under the HVBP Program. These findings are broadly consistent with several other studies that have failed to show any improvement in quality measures after the introduction of HVBP.
Drawing on the results of our study, we suggest two divergent paths for CMS to follow. First, CMS could reinclude pain management in the HVBP payment formula. In fact, the redesigned pain management questions that CMS used in 2019 seem suitable for reinclusion in the formula: they have no apparent link to prescription of a painkiller.² These questions can remove any perceived pressure on physicians to prescribe opioids and allow them to choose the best option for a patient in their particular situation. The best option can be nonpharmaceutical, a nonopioid pharmaceutical, or even an opioid (Tefera and Lehrman 2016). In addition, to eliminate any potential association between even these new questions and opioid prescriptions, CMS could separately track opioid prescriptions at each hospital. Given that the rates of fatalities due to opioid overdose vary markedly by state (Volkow et al. 2019), a one-size-fits-all decision to remove the pain management questions may not be optimal.
Alternatively, CMS should consider removing patient satisfaction measures from the HVBP Program entirely. Doing so would enable hospitals to refocus their resources and attention on clinical processes and outcomes. It may also deliver cost savings for both participating hospitals and CMS by eliminating the costs of administering the survey and analyzing responses from more than 3 million patients each year. Critics have argued that the HVBP Program lacks the design features of a successful P4P program, which should focus on a small number of high-value measures that motivate clinicians to engage in good practices and should be simple enough for hospitals and clinicians to know how they are doing. Clinical outcomes and patients’ functional status are good choices for measures to include or retain in the payment formula (Jha 2017). Given the program’s ineffectiveness at improving almost any health measure, CMS could also raise the stakes for hospitals by increasing the performance penalty or bonus to 5% to 10% of a hospital’s total Medicare payments. That may be one way to focus hospitals’ attention on improving health measures (Jha 2017).
Limitations
Although our study helps explore the effectiveness of the HVBP Program and the importance of pain management questions in the HCAHPS survey, it has a few limitations. First, HCAHPS measures are subjective by nature and may not be able to differentiate between actual differences in care delivery and differences in patients’ expectations and perceptions. Second, the control group is small and contains only 40 hospitals. Although several scholars have demonstrated that group size does not necessarily affect propensity score matching (Dehejia and Wahba 2002; Pirracchio, Resche-Rigon, and Chevret 2012; Stone and Tang 2013), future research might consider an alternate set of hospitals, such as critical access hospitals, as the control group. Third, our analysis is at the hospital level, although almost all the data on health measures exist at the patient level. Future research should use analytics tools to conduct this analysis at the patient level to obtain deeper insights into the role of HVBP in improving health measures and in aggravating the opioid crisis. Fourth, CMS has published prescription data on opioids from 2013 to 2016. We did not integrate this data set in our study, but future research should determine whether these data bear out the suspected link between scores on pain management in the HCAHPS survey and opioid overprescription. Finally, one of our policy recommendations is to remove the patient satisfaction measure from HVBP because it showed no improvement other than in pain management perception. However, doing so may negatively affect reimbursements of hospitals that have a higher “achievement” score on patient satisfaction, even though they may not have improved their performance on this measure. Given that the ultimate objective of the HVBP Program is to push hospitals to improve their quality of health care, there is an argument for this course of action.
However, one could argue that these hospitals should be rewarded if they are performing better when compared with other hospitals. Future research should explore the issue of “achievement” versus “improvement” scores because different measures in the HVBP formula may serve the ultimate objective of the program better using one score or the other.
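The “achievement” versus “improvement” distinction can be made concrete with a sketch of the HVBP point formulas as we understand them from CMS’s published scoring rules for higher-is-better measures: achievement points compare a hospital with national thresholds and benchmarks, improvement points compare it with its own baseline, and the measure earns the higher of the two. This is an illustration only, not the authors’ analysis; rounding conventions, lower-is-better measures, and domain weighting have changed across fiscal years and are ignored here, and all inputs in the example are made up.

```python
def achievement_points(score, threshold, benchmark):
    """0-10 points for a higher-is-better measure relative to national cut points."""
    if score >= benchmark:
        return 10.0
    if score < threshold:
        return 0.0
    return 9 * (score - threshold) / (benchmark - threshold) + 0.5


def improvement_points(score, baseline, benchmark):
    """0-9 points for improvement over the hospital's own baseline-period score."""
    if score <= baseline:
        return 0.0
    if score >= benchmark:
        return 9.0  # capped; such a hospital earns 10 achievement points anyway
    raw = 10 * (score - baseline) / (benchmark - baseline) - 0.5
    return min(max(raw, 0.0), 9.0)


def measure_points(score, baseline, threshold, benchmark):
    """HVBP-style measure score: the higher of achievement and improvement points."""
    return max(achievement_points(score, threshold, benchmark),
               improvement_points(score, baseline, benchmark))


# Made-up example: a high achiever that slipped vs. a low achiever that improved.
high_but_declining = measure_points(88, baseline=89, threshold=70, benchmark=90)
low_but_improving = measure_points(65, baseline=55, threshold=70, benchmark=90)
print(round(high_but_declining, 2), round(low_but_improving, 2))  # 8.6 2.36
```

In this example, a hospital that slipped from 89 to 88 still earns 8.6 points through achievement alone, while a hospital that improved from 55 to 65 earns only about 2.4. This is the tension described in the limitation above.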
Footnotes
Special Issue Guest Coeditors
Brennan Davis, Dhruv Grewal, and Steve Hamilton
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
