Abstract
Traditionally, risk-adjustment models do not address the characteristics of minority populations, such as race or socioeconomic status. This study aimed to evaluate the added value of place-based social determinants on risk-adjustment models in explaining health care costs and utilization. Statewide commercial claims from the Maryland Medical Care Database were used, including 1,150,984 Maryland residents aged 18 to 63 with ≥6 months of enrollment in 2013 and 2014. Area Deprivation Index (ADI) was assigned to individuals through zip code. The authors examined the addition of ADI to predictive models of concurrent and prospective costs and utilization; linear regression was adopted for costs and logistic regression for utilization markers. Performance measures included R2 for costs (total, pharmacy, and medical costs) and the area under the curve (AUC) for utilization (being in the top 5% of health care users, having any hospitalization, having any emergency room [ER] visit, having any avoidable ER visit, and having any readmission). All performance measures were derived from a bootstrapping analysis with 200 iterations. Study subjects were ∼48% male with a mean age of ∼41 years. Adding ADI to the demographics or claims-based models generally did not improve performance except in predicting the probability of having any ER or any avoidable ER visit; for example, AUC of avoidable ER visits increased significantly from .610 to .613 when using ADI rank deciles in claims-based models. Future research should focus on patients with a higher need for social services, assess more granular place-based determinants (eg, Census block group), and evaluate the added value of individual social variables.
Introduction
Traditionally, risk-adjustment models have relied on individual-level variables to predict and explain health care costs and utilization. Individual-level data enable risk models to be used for risk-adjusted payment, 1 high-risk identification, 2 resource allocation, 3 and various other population health management tasks. Individual-level variables range from interviews and questionnaires to insurance claims data, 1 –4 electronic health records (EHRs), 5 –7 and vital signs. 7 However, these current models do not address the characteristics of minority populations (ethnic/racial minorities or low socioeconomic status [SES] subpopulations, for example) because they are based on clinical factors.
The impact of individual-level SES variables on health care costs and utilization has been explored to some extent, but these studies usually have very small sample sizes because such information comes from small surveys and is not available on a large scale 8,9 ; therefore, using aggregated information from publicly available sources to construct community-level SES measures may be an alternative method to improve risk stratification and population health management efforts. 10 –12
Place-based determinants of health (ie, characteristics of the neighborhoods of patients' residence) are powerful drivers of morbidity, mortality, and future well-being, yet they mostly remain outside the conventional medical care delivery systems. 13 Indeed, modifiable place-based exposures, along with behavioral factors, play a significant role in 60% of preventable deaths in the United States. 14 Despite the United States having the highest per capita medical expenditures of any country, limited investments in these nonmedical services might be a factor in US health indicators lagging behind other high-income nations. 15 Investment in place-based determinants could improve health outcomes and reduce the cost of services. 16 For instance, a reduction of preventable hospitalization among residents of low-income neighborhoods in the United States to the level of those living in high-income neighborhoods would lead to ∼500K fewer hospitalizations per year, saving $3.6 billion in hospitalization costs annually. 16
Several studies have evaluated the linkage between place-based determinants and health care outcomes. One study showed that, after controlling for patient-level characteristics, veterans residing in census tracts with a higher neighborhood socioeconomic status (NSES) index had lower odds of hospitalization. 17,18 Another study showed that patients living in high-poverty neighborhoods were 24% more likely than others to be readmitted, after adjusting for demographics and clinical conditions. 19 However, few studies, if any, have investigated the added value of place-based determinants on the performance of claims-based risk-adjustment models in predicting health care costs and utilization. One study showed that when added to clinical and utilization variables derived from EHRs, place-based determinants did not improve predictive performance for any health or utilization outcome. 20
This study adopted the Area Deprivation Index (ADI) as the instrument for place-based determinants, to evaluate its added value on claims-based risk-adjustment models predicting a range of utilization and cost outcomes. It compared the performance of these models in predicting health care costs (ie, total costs, medical costs, pharmacy costs) and utilization (ie, being in the top 5% of health care users, having any hospitalization, having any emergency room [ER] visit, having any avoidable ER visit, and having any readmission within 30 days of discharge) with and without variables representing place-based determinants of health for privately insured Maryland residents.
Methods
This study employs a 2-year retrospective design using 2013 independent variables to predict concurrent (2013) and prospective (2014) outcomes. This study was approved by the institutional review board of the Johns Hopkins University.
Data source
The data set for this study is the Maryland Medical Care Database (MMCD) provided by the Maryland Health Care Commission (MHCC). 21 MHCC is the Maryland regulatory agency that plans for health system needs, and MMCD contains claims data for Maryland residents and nonresidents who have a private insurance contract in Maryland, including patient enrollment/demographics, professional, institutional, and pharmacy claims. In 2016 the Supreme Court of the United States ruled that the Federal Employee Retirement Income Security Act of 1974 preempts statutes of states requiring reporting, which allowed employer-sponsored health insurers to opt out of submitting data to MHCC 22 ; therefore, claims data from MHCC may not be complete starting in late 2015 (ie, a drop of ∼2 million individuals in the number of self-insured enrollees from 2014 to 2015 in the data set). 22 Thus, the analyses in this study were restricted to claims data from 2013 and 2014 only.
Study subjects
The research team started with 8,403,458 enrollees who had any record in the data set between 2012 and 2016. The team applied the following inclusion criteria and derived the eligible sample of 1,168,204 (13.90%) members who were: (1) between ages 18 and 63 years in 2013: 5,690,026 (67.71%); (2) with known sex: 8,403,458 (100.00%); (3) with Maryland residence in both 2013 and 2014 (the determination of the residence through zip code will be explained in the following section): 2,885,147 (34.33%); (4) with ≥6 months of medical enrollment in 2013 and 2014: 3,606,675 (42.92%); and (5) with ≥6 months of pharmacy enrollment in 2013 and 2014: 2,083,768 (24.70%). Furthermore, the analyses were restricted to zip codes with ≥1000 residents to exclude nonresidential areas and potentially skewed sampling of populations residing in a zip code; hence, the final sample size was further narrowed to 1,150,984 subjects.
Zip code assignment
To attach geographic-based data, it is necessary to identify a patient's residential location. MMCD contained the zip code as the smallest geographic unit, which is common in claims data (as a Health Insurance Portability and Accountability Act-limited variable). Residential zip code can be extracted and attached to a patient in various ways from claims data. The research team chose 2 different methods for this extraction and assignment of residential zip code. In both methods, the time period of analysis was 1 calendar year.
The most frequently occurring residential zip code based on claims data
Zip code of residence is often supplied on claims in addition to enrollment. This can indicate the patient's zip code of residence at the time of service. Pharmacy claims did not contain zip codes, so only physician and facility claims were used. Using the zip code at time of service from the claims, the research team counted the number of distinct dates of service for each zip code and chose the non-missing zip code with the highest number of service dates. This was assigned as the most frequently occurring zip code. Patients who were enrolled but had no services in a year would not be assigned a zip code.
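The counting rule described above can be sketched in a few lines of Python. This is only an illustration: the record layout `(patient_id, service_date, zip_code)` and the tie-breaking rule are assumptions made for the example, not details reported for the MMCD.

```python
from collections import defaultdict


def most_frequent_zip(claims):
    """Map each patient to the zip code seen on the most distinct service dates.

    `claims` is an iterable of (patient_id, service_date, zip_code) tuples.
    This layout is hypothetical and is not the MMCD schema.
    """
    dates = defaultdict(lambda: defaultdict(set))
    for patient_id, service_date, zip_code in claims:
        if zip_code:  # ignore claims with a missing zip code
            dates[patient_id][zip_code].add(service_date)

    result = {}
    for patient_id, by_zip in dates.items():
        # Assumption: ties go to the smaller zip code string, because the
        # text does not state a tie-breaking rule.
        result[patient_id] = min(by_zip, key=lambda z: (-len(by_zip[z]), z))
    return result


claims = [
    ("p1", "2013-01-05", "21201"),
    ("p1", "2013-01-05", "21201"),  # same date, counted once
    ("p1", "2013-03-10", "21201"),
    ("p1", "2013-06-01", "21044"),
    ("p2", "2013-02-02", None),     # missing zip: p2 gets no assignment
]
zip_by_patient = most_frequent_zip(claims)
```

Patients without any usable claim, such as `p2` here, are simply absent from the result, which mirrors the statement that enrolled patients with no services would not be assigned a zip code by this method.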
The most recent residential zip code in a year based on enrollment data
Residential zip code is supplied on enrollment records. These records usually specify an enrollment period with beginning and ending dates of enrollment. Using these records, the team selected the zip code from the record with a non-missing zip code and the latest (most recent) beginning date of enrollment. The team chose to use the begin date because the end date is sometimes left missing or set to an arbitrarily large value, such as December 31, 2099, to indicate that a patient's enrollment has not yet terminated. The team assigned this as the most recent residential zip code for the year.
The team first assigned an individual's zip code based on the most frequently occurring zip code; if such zip code was missing, the most recent zip code based on enrollment would be used.
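As a rough sketch of the enrollment-based method and the fallback order, the following Python assumes enrollment records of the form `(patient_id, begin_date, zip_code)` with ISO-formatted date strings; both the layout and the function names are hypothetical.

```python
def most_recent_zip(enrollments):
    """Map each patient to the zip code on the record with the latest begin date.

    `enrollments` is an iterable of (patient_id, begin_date, zip_code) tuples,
    where begin_date is an ISO string such as "2013-07-01" (so that string
    comparison matches chronological order). Records with a missing zip code
    are skipped.
    """
    latest = {}
    for patient_id, begin_date, zip_code in enrollments:
        if not zip_code:
            continue
        if patient_id not in latest or begin_date > latest[patient_id][0]:
            latest[patient_id] = (begin_date, zip_code)
    return {pid: zip_code for pid, (_, zip_code) in latest.items()}


def assign_zip(frequent, recent):
    """Prefer the claims-based zip code; fall back to the enrollment-based one."""
    merged = dict(recent)
    merged.update(frequent)  # claims-based values overwrite the fallback
    return merged


enrollments = [
    ("p1", "2013-01-01", "21044"),
    ("p2", "2012-07-01", "20850"),
    ("p2", "2013-07-01", "21401"),
    ("p3", "2013-01-01", None),
]
frequent = {"p1": "21201"}  # e.g. output of the claims-based method
assigned = assign_zip(frequent, most_recent_zip(enrollments))
```

In this toy input, `p1` keeps the claims-based zip code, `p2` falls back to the most recent enrollment record, and `p3` remains unassigned.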
Independent variable: ADI
ADI is a validated measure of community disadvantage calculated at multiple geographical levels. 23 ADI allows for rankings of neighborhoods by the socioeconomic disadvantage in a region of interest. ADI includes factors for the domains of income, education, employment, and housing condition; ADI can be used to inform health delivery and policy, especially for patients in the most disadvantaged neighborhoods.
The research team calculated ADI through the following steps: (1) 17 ADI grouped variables were composed from 71 independent census variables using the 2013 American Community Survey 5-year estimates; (2) the 17 weighted ADI components were summed into total scores using the Singh et al methodology 24 ; and (3) total scores were scaled to have a mean of 100 and a standard deviation of 20. ADI raw scores were constructed for 52 states and territories (ie, continental states, Alaska, Hawaii, Puerto Rico) at the zip code level. Then, ADIs were sorted and ranked according to their values across the nation, and the resulting national percentiles, rather than within-state ranks, were assigned to zip codes in Maryland to capture more socioeconomic variation.
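The scaling and ranking steps can be illustrated with a short Python sketch. It starts from already-summed raw scores (the weighting of the 17 components is not reproduced), uses the population standard deviation, and assigns percentiles by a simple nearest-rank rule; these choices, the zip codes, and the scores are all assumptions for illustration.

```python
import math
import statistics


def standardize_adi(raw_scores, target_mean=100.0, target_sd=20.0):
    """Rescale raw scores to the target mean and standard deviation."""
    values = list(raw_scores.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population SD
    return {z: target_mean + target_sd * (s - mean) / sd
            for z, s in raw_scores.items()}


def national_percentiles(scores):
    """Assign each zip code a percentile rank from 1 to 100 (nearest rank)."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {z: max(1, math.ceil(100 * (i + 1) / n))
            for i, z in enumerate(ordered)}


def decile(percentile):
    """Collapse a 1-100 percentile into a 1-10 decile."""
    return (percentile - 1) // 10 + 1


# Hypothetical raw scores keyed by zip code
raw = {"20001": 12.0, "21201": 30.0, "21044": 18.0,
       "10001": 24.0, "99501": 36.0}
scaled = standardize_adi(raw)
pct = national_percentiles(scaled)
```

Ranking is done over all zip codes in the input before any state subset is taken, which is what makes the percentile a national rank.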
Outcomes
The research team generated 3 types of costs and 5 utilization markers in the concurrent (2013) and prospective (2014) calendar year at the individual level. Annual costs included total costs, medical costs (non-pharmacy costs), and pharmacy costs. Annual total cost was the sum of paid and out-of-pocket amounts derived from both medical and pharmacy claims, while annual medical and pharmacy costs were derived from their respective claims. All costs were truncated at the respective top .5% of all positive costs to minimize the impact of outliers. The 5 utilization markers were being in the top 5% of health care users, having any hospitalization, having any ER visit, having any avoidable ER visit, and having any readmission within 30 days of discharge. The avoidable ER visit was defined as having the type of ER visit flagged as “nonemergent,” “emergent, primary care treatable,” or “emergent, ED needed, potentially avoidable.” 25 All outcomes were calculated using the Johns Hopkins Adjusted Clinical Groups (ACG) System. 25
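One way to read the truncation rule is as capping each cost at the threshold that separates the top .5% of positive costs from the rest. The sketch below follows that reading; the capping interpretation, the nearest-rank threshold, and the function name are assumptions, since the exact procedure is not spelled out.

```python
def cap_costs(costs, top_share=0.005):
    """Cap costs at the threshold below the top `top_share` of positive costs.

    Zero costs are left untouched and do not influence the threshold.
    """
    positive = sorted(c for c in costs if c > 0)
    if not positive:
        return list(costs)
    n_capped = round(top_share * len(positive))  # how many values get capped
    cap = positive[len(positive) - n_capped - 1] if n_capped else positive[-1]
    return [min(c, cap) for c in costs]


# Hypothetical data: 10 members with no cost, 1000 with costs 1..1000
costs = [0.0] * 10 + [float(c) for c in range(1, 1001)]
capped = cap_costs(costs)
```

With 1000 positive costs, the 5 largest values are pulled down to the 995th smallest value, while all other costs, including zeros, are unchanged.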
Control variables
The team included 2 demographic variables (ie, age, sex) and 1 morbidity measure in the models as control variables. Age was divided into the following 5 categories: 18–24, 25–34, 35–44, 45–54, and 55–63 years; age was restricted to 63 in 2013 given that people aged 64 in 2013 would enter Medicare in 2014 and their claims information would not be represented in MMCD. The morbidity measure used is the DxRx-PM score, derived from the ACG system. 25,26 The ACG system provides various measures of an individual's morbidity using diagnosis and/or pharmacy claims. The DxRx-PM score is a comprehensive diagnosis-based predicted score constructed from various ACG morbidity metrics, including age group, sex, selected diagnosis-based morbidity markers (ACGs and ADGs [Adjusted Diagnosis Groups]), National Drug Code-based morbidity markers (RxMGs [Pharmacy-Defined Morbidity Groups]), a pregnancy without delivery indicator, hospital dominant markers (factors associated with ≥50% of hospital admissions in the next year), a medically frail indicator, and selected chronic disease markers. The DxRx-PM score has been demonstrated to be a valid measure of morbidity. 5,7,25
Statistical analyses
The team first described the characteristics of the study sample, and then built regression models to explore the added value of including ADI in the models.
For descriptive analyses, the team presented the mean and standard deviation for continuous variables, and percentages for binary/categorical variables. In regression analyses, the team built 2 base models: a demographics-only model and a demographics plus morbidity model. Three different sets of ADI variables were added separately to the base models to evaluate their impact on outcomes: (1) ADI national percentile as a continuous variable; (2) ADI national decile as a categorical variable; and (3) five components of the ADI that together contribute more than 90% of its variation (ie, median family income, income disparity, median home value, median gross rent, median monthly mortgage). In total, the team constructed 2 base models and 6 ADI-enhanced models for each outcome. Linear regression was adopted for the cost outcomes, which is the standard approach in various studies and has been shown to produce similar performance relative to more advanced statistical methods. 1,27 –29 R2 and mean absolute prediction error (MAPE) were compared between the base and the ADI-enhanced models. 1,5,7 For the utilization outcomes, the team employed logistic regression and compared the area under the curve (AUC) and Akaike's Information Criterion (AIC) between the base and ADI-enhanced models. All performance measures and 95% confidence intervals (CIs) were derived from a bootstrapping analysis with 200 iterations.
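To make the bootstrapping step concrete, the following Python sketch computes an AUC and a percentile-based 95% CI from 200 resamples. It resamples individuals with fixed predicted scores rather than refitting the model in every iteration, and it uses simulated data; whether the original analysis refit the models is not stated, so both choices are assumptions.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties = 1/2)."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_iter=200, seed=0):
    """Percentile 95% CI of the AUC; the input must contain both classes."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[k] for k in idx]
        if 0 < sum(y) < n:  # skip resamples that contain a single class
            stats.append(auc(y, [scores[k] for k in idx]))
    stats.sort()
    return (stats[round(0.025 * (n_iter - 1))],
            stats[round(0.975 * (n_iter - 1))])


# Simulated outcome (about 30% prevalence) and a weakly informative score
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(500)]
scores = [0.3 * y + rng.random() for y in labels]
point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
```

The same resampling loop could wrap R2, MAPE, or AIC instead of AUC; comparing the intervals of a base model and an ADI-enhanced model is one simple way to judge whether a difference is distinguishable from sampling noise.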
MAPE is the average of the absolute values of the residuals across all observations, and is often used as a comparable measure across different types of costs; this value was divided by the corresponding average cost and presented as a percentage. The smaller the MAPE, the better the model. AIC estimates the relative quality of a model by balancing its goodness of fit (the maximized likelihood) against the number of parameters it uses, and is designed to pick the model whose probability distribution is closest to the true distribution. 30,31 The smaller the AIC, the better the model.
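Both measures are short to write down. The sketch below expresses MAPE as a percentage of the mean observed cost, as described above, and uses the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximized likelihood; the function names and the example values are illustrative only.

```python
import math


def mape_percent(actual, predicted):
    """Mean absolute prediction error as a percentage of the mean actual cost."""
    n = len(actual)
    mean_abs_error = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    return 100 * mean_abs_error / (sum(actual) / n)


def bernoulli_log_likelihood(labels, probs):
    """Log-likelihood of binary outcomes under predicted probabilities."""
    return sum(math.log(p if y else 1 - p) for y, p in zip(labels, probs))


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical costs and predictions
example_mape = mape_percent([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])

# Hypothetical logistic model with 3 parameters
ll = bernoulli_log_likelihood([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.1])
example_aic = aic(ll, n_params=3)
```

Because the penalty term grows with the number of parameters, adding 9 decile indicators for ADI lowers the AIC only if the gain in log-likelihood exceeds 9.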
Results
Among 1.15 million study subjects, ∼48% were male and the mean age was 41.2 years; more than one quarter were between 45–54 years old while less than 14% were between 18–24 years old (Table 1). The rescaled ACG DxRx-PM score was slightly greater than 1, suggesting this sample had higher comorbidity than the general population. The average ADI national rank percentile was 47.3, suggesting that on average the sample lived in areas slightly more advantaged than that of the average American. Less than 9% lived in the bottom 10% socioeconomic areas while ∼11% lived in the top 10% socioeconomic areas. Average total health care costs in both years were about $2500, of which ∼$1600 were medical costs and ∼$900 were pharmacy costs. Percentages of hospitalizations, ER visits, avoidable ER visits, and 30-day readmissions are shown in Table 1.
Table 1. Characteristics of the Study Sample
ACG, Adjusted Clinical Groups; ADI, Area Deprivation Index.
Base model 1 included age and sex only and explained very little of any type of cost, concurrently or prospectively. The model's R2 was between .02 (for pharmacy costs) and .04 (for medical costs) in both years. Adding ADI national rank percentiles (continuous variable), ADI national rank deciles (categorical variable), or the 5 components of ADI to the model had minimal impact on the R2. Base model 2 included age, sex, and the DxRx-PM score and explained costs better concurrently than prospectively. For example, the R2 for total costs decreased from .643 (95% CI: .639–.648) concurrently to .330 (95% CI: .326–.334) prospectively. Again, adding ADI national rank percentiles, ADI national rank deciles, or the 5 components of ADI to the model had minimal impact on the R2 (Table 2). Similarly, adding the 3 variants of ADI variables did not improve the performance of base model 1 or 2 for any type of concurrent or prospective cost as measured by MAPE (Table 3).
Table 2. Impact of Adding Area Deprivation Index Variables on Risk-Adjustment Models (Measured by Adjusted R2 and Its 95% Confidence Intervals)
ADI, Area Deprivation Index.
Table 3. Impact of Adding Area Deprivation Index Variables on Risk-Adjustment Models (Measured by Adjusted Mean Absolute Prediction Error and Its 95% Confidence Intervals)
ADI, Area Deprivation Index.
Concurrently and prospectively, both base models predicted being in the top 5% of health care users, having any inpatient service, or having any 30-day readmission better than having any ER visit or any avoidable ER visit. For the former 3 utilization markers, the AUC of base model 1 ranged between .60 and .65 and that of base model 2 between .73 and .99. For the 2 ER visit-related markers, the AUC of base model 1 was between .55 and .60 and that of base model 2 between .60 and .65. Adding ADI national rank deciles (categorical variable) had the highest impact while adding ADI national rank percentiles (continuous variable) had the smallest impact on all 5 utilization markers in both models. Adding ADI had a higher impact on the 2 ER visit markers than on the other 3 utilization markers. For example, adding ADI national rank deciles to base model 2 statistically significantly increased the AUC for predicting any prospective avoidable ER visit from .610 (95% CI: .609–.612) to .613 (95% CI: .611–.615), while doing so had no impact on predicting any prospective hospitalization (AUC of base model 2: .729, 95% CI: .726–.732; AUC of base model 2 with ADI: .729, 95% CI: .726–.732) (Table 4). Similar patterns were found for AIC (Table 5).
Table 4. Impact of Adding Area Deprivation Index Variables on Risk-Adjustment Models (Measured by Area Under the Curve and Its 95% Confidence Intervals)
ADI, Area Deprivation Index; ER, emergency room; IP, inpatient hospitalization.
Table 5. Impact of Adding Area Deprivation Index Variables on Risk-Adjustment Models (Measured by Akaike's Information Criterion and Its 95% Confidence Intervals)
ADI, Area Deprivation Index; ER, emergency room; IP, inpatient hospitalization.
Discussion
This study examined the added value of place-based determinants of health on demographics and claims-based risk adjustment models in explaining health care costs and utilization, using privately insured Maryland residents as an example. This study found that adding ADI to the demographics or claims-based risk adjustment models generally does not improve model performance across outcomes, except for predicting the probability of having any ER visit or having any avoidable ER visit.
Various studies have examined the differences in health care utilization of people residing in regions with various neighborhood characteristics. 32,33 For instance, Hu et al assessed various socioeconomic factors influencing readmissions within 30 days after discharge in an urban teaching hospital and showed that those patients living in high-poverty neighborhoods had 24% more readmission risk than other patients. 19 Also, Nagasako et al showed that SES data would narrow down the range of observed variations in readmission rates. 34 However, these studies differed from the present study, which focused on the impact of neighborhood characteristics on the performance of risk-adjustment models.
The only similar study the research team could identify did not find neighborhood characteristics contributing to risk prediction beyond demographic (eg, age, race) and clinical data in the EHR. 20 However, the present study incorporated risk factors from claims data instead of EHR data. In addition, this study examined a broad range of outcomes including 3 types of costs and 5 types of utilization. Furthermore, this study included 1.15 million people from Maryland, a much larger sample than the 90K participants from a single county in the earlier study. Thus, the present study adds considerably to the limited literature on the impact of incorporating place-based determinants of health into risk-adjustment models.
Several composite measures have been developed to address the impact of place-based determinants of health on utilization. For instance, the Social Vulnerability Index developed by the Centers for Disease Control and Prevention addresses the capacity of different neighborhoods in case of natural disasters. 35 The NSES Index addresses SES characteristics of veterans' neighborhoods and their impact on mortality and other health outcomes. 36 Although overlaps exist among these composite measures (eg, similar census variables), the present study found ADI to be the most comprehensive measure that includes variables ranging from SES to housing issues of a neighborhood and is relatively easy to construct from publicly available census data. Furthermore, developing ADI's national rank helps to compare neighborhoods across the country at the most granular level, providing a higher level of variation for analysis.
Multiple challenges should be addressed to systematically identify and respond to the place-based determinants of health that have the highest impact on health care utilization. Most US health care systems lack analytic frameworks to incorporate information about patients' social and behavioral risks and neighborhood characteristics into clinical or organizational decision-making. They also lack access to place-based determinants of health data sources in the electronic databases for each subdomain of social and behavioral factors on a patient level, for subpopulations, or for neighborhoods at large. 37
The only significant impact of adding ADI, representing place-based determinants, on the performance of risk-adjustment models was on predicting whether an individual had an ER visit, especially an avoidable ER visit. It can be hypothesized that patients living in neighborhoods with lower SES were less likely to be able to afford seeking health care in the regular setting, given the lack of comprehensive health insurance and the cost burden; therefore, they were more likely to use the ER as their regular source of health care. Besides, the actual impact of adding place-based determinants was on “avoidable” ER visits, representing a visit with conditions or symptoms that can be treated in a regular outpatient visit; thus, it further showed that the utilization of ER services for patients in neighborhoods with low SES was a replacement for regular health care services. 38
One possible explanation for the lack of impact of adding place-based determinants to risk-adjustment models is that the denominator in the study included patients with varying degrees of risk for health care utilization. For healthy subjects who generally do not require health care services, bringing in more information would yield limited improvement in the ability of risk-adjustment models to predict their health care costs and utilization. However, for those with higher risks, more information may lead to better prediction of their health care costs and utilization. Future research should focus on high-risk patients and explore the usefulness of additional variables, such as those representing place-based determinants, in predicting their health care costs and utilization.
This study had several limitations. First, the research team was only able to identify a patient's residence at the zip code level. Using zip code-level data potentially masked a larger impact of ADI on outcomes, considering that each zip code included a population with a wide range of ADI. Using data available in smaller geographic units, such as a census tract or block group, may improve the performance of claims-based risk adjustment models in explaining/predicting outcomes. Second, the team used ADI to represent place-based determinants in a region, while other place-based indicators could be deemed viable substitutes. Even though the team did not find a high impact of ADI in this study, other studies that adopt different place-based determinants of health, study designs, or data sets may yield different findings. Future studies should assess place-based determinants that are more relevant to health care utilization and costs (eg, distance to health care providers). Third, although a statistically significant improvement was found in models predicting ER visits using ADI, the improvement was small and may not carry clinical implications. Fourth, the team did not include individual-level SES in the study given the lack of such information in the claims data. Other studies should compare the impact of individual-level and neighborhood SES on the performance of risk-adjustment models. 9 Fifth, the team only had access to commercial claims data while the availability of EHRs (and the possibility to extract social determinants of health from EHR free text 39,40 ) is increasing and can potentially be included in future studies. Sixth, the team did not account for the possibility that a patient's zip code might change within a year.
Lastly, these results are only applicable to commercial populations while people with more social determinants of health challenges are usually enrolled in Medicare or Medicaid; further studies focusing on Medicare or Medicaid enrollees would provide more information on the impact of social determinants of health on the performance of risk-adjustment models.
Conclusion
Adding ADI to claims-based risk-adjustment models slightly improves their ability to predict the probability of having any ER visit or having any avoidable ER visit, but does not improve the prediction of costs or other utilization outcomes. Future research on risk-adjustment models should focus on patients with a higher need for social services (eg, Medicaid patients), assess more granular place-based determinants (eg, census block group), and evaluate the added value of individual social variables instead of a composite index such as ADI.
Footnotes
Authors' Note
The data that support the findings of this study are available from the Maryland Health Care Commission, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. However, data are available from the authors upon reasonable request and with permission of the Maryland Health Care Commission.
Author Disclosure Statement
This manuscript has been prepared by faculty and staff at The Johns Hopkins University (JHU). The manuscript also references the Adjusted Clinical Groups (ACG) system. JHU holds the copyright to the ACG System and receives royalties from the global distribution of the ACG system. The authors are members of a group of researchers who develop and maintain the ACG System with support from JHU. Dr. Chang is a paid part-time consultant in Monument Analytics, a health care consultancy whose clients include the life sciences industry as well as plaintiffs in opioid litigation. This arrangement has been reviewed and approved by Johns Hopkins University in accordance with its conflict of interest policies.
Funding Information
No funding was received for this study.
