Abstract
Health plans develop predictive models to predict key clinical events (eg, admissions, readmissions, emergency department visits). The authors developed predictive models of admissions and readmissions for a quality improvement organization with many large government and private health plan clients. Its membership and authorization data were used to develop models predicting 2019 inpatient stays, and 2019 readmissions following 2019 admissions, based on patients' age and sex, diagnoses identified and procedures requested in 2018 authorizations, and 2018 admission authorizations. In addition to testing multivariate models, risk scores were calculated for admission and readmission for all patients in the model. The admissions model (C = 0.8491) is much more accurate than the readmissions model (C = 0.6237). Measures of risk score central tendency and skewness indicate that the vast majority of members had little risk of hospitalization in 2019; the mean (standard deviation) was 0.042 (0.074), and the median was 0.018. These risk scores can be used to identify members at risk of admission and to support proactive risk management (eg, design of health management programs). Different risk thresholds can be used to identify different subsets of members for follow-up, depending on overall strategy and available resources. This model development project was novel in employing authorization data rather than utilization data. Advantages of authorization data are their timeliness, and the fact that they are sometimes the only data available, but disadvantages of authorization data are that authorized services are not always actually performed, and diagnoses are often “rule out” rather than final diagnoses.
Introduction
Predictive modeling of hospitalizations and rehospitalizations: purpose
Health plans often develop predictive models to predict and, if possible, manage key clinical events (eg, admissions, readmissions, emergency department [ED] visits). In this paper, the authors report on their effort to develop predictive models of admissions and readmissions for a quality improvement organization with many large government and private health plan clients.
Literature
There is a rich literature on predictive modeling of hospital admissions. The current research builds on earlier research by 3 of the current authors (Crawford, McAna, and Novinger), 1 which focused on Medicaid members and found that the strongest predictors of hospitalization were age >65 years, prior inpatient stays, and higher Charlson comorbidity index scores. The C statistics obtained generally ranged from 0.79 to 0.81, but some reached 0.86. The results specifically showed that increased risk scores predict hospitalization. McAna and colleagues concluded that they could target Medicaid members with a high probability of future avoidable hospitalizations for case management and other interventions.
Given the highly advanced state of the art of predictive modeling, the most outstanding recent article is one by the Society of Actuaries 2 that addresses the predictive accuracy of the commercial claims-based risk scoring models available in the marketplace. Another study is especially relevant to the present work: the retrospective cohort study performed by Lemke and colleagues 3 using a US health plan claims database, including annual person-level files with demographic, morbidity, and utilization measures, to develop and validate a model using separate subsets of the data. These authors obtained risk factors and outcome variables from administrative claims data using the Adjusted Clinical Group (ACG) system, and fitted models using multivariate logistic regression. Among their results were that 3.2% of patients had a hospitalization during a 1-year period, and 20% of patients who had been hospitalized during the previous year were rehospitalized. Effect sizes of risk factors were modest, with odds ratios generally <1.5; however, odds ratios >1.5 were found for some variables: age >80 years, ≥3 prior hospitalizations, ≥3 ED visits, some of the 20 ACG morbidity categories tested in the model, and some of the 40 diseases included: high impact neoplasms, bipolar disorder, cerebral palsy, chromosomal anomalies, cystic fibrosis, and hemolytic anemia. ACG hospitalization model performance was good, with a C statistic of 0.80, compared to a C statistic of 0.75 for a model based on prior hospitalization, and 0.78 for a model based on the Charlson comorbidity index. These authors concluded that “A validated population-based predictive model for hospital risk estimates individual risk for future hospitalization. The model could be useful to health plans and care managers.”
However, it is noteworthy that the authors of the present study could find no literature on predictive modeling of hospital admissions using authorization data.
There is considerable recent literature on predictive modeling of hospital readmissions, focusing on a wide range of topics. Examples include a study by Jamei et al 4 predicting all-cause risk of 30-day hospital readmission using artificial neural networks; a literature review by Jeffery et al 5 drawing implications for nurses' roles in population health management; and a study by Ahn et al 6 predicting re-presentations to the ED. Kansagara and colleagues 7 sought to review readmission risk prediction models and assess their performance and capacity for clinical or administrative use, given interest in predicting hospital readmission risk to identify which patients could benefit from care transition interventions. In 2016, Zhou et al conducted a similar systematic review and found the performance of existing models to be inconsistent. They found the results to be strongly affected by both the specific definitions of the outcomes and the data sources used to develop the models. 8
Tabak and associates 9 developed an early readmission risk predictive model using automated clinical data available at admission, and then evaluated an administrative data-enhanced model by adding principal and secondary diagnoses and other variables. Their results were that, starting with a basic model with a C statistic just under 0.70 (0.697), adding administrative variables to the model yielded an improvement (to a C statistic of 0.722). Tabak et al concluded that automated clinical data can generate a readmission risk score early during a hospitalization that can be used to facilitate early care transitions.
Also, Smith and colleagues 10 performed a secondary analysis to determine clinical and patient-centered factors predicting nonelective 90-day readmissions of patients discharged with diabetes mellitus, congestive heart failure, and/or chronic obstructive pulmonary disease (COPD). Their results were that, of 1378 patients discharged, 23.3% were readmitted; after controlling for hospital and intervention status, risk of readmission increased if the patient had more hospitalizations and ED visits in the prior 6 months, higher blood urea nitrogen, lower mental health function, a diagnosis of COPD, and increased satisfaction with access to emergency care assessed on the index hospitalization. Smith and colleagues concluded that both clinical and patient-centered factors identifiable at discharge are related to nonelective readmission, and that these factors identify high-risk patients and provide guidance for future interventions.
Additionally, Nguyen and coauthors 11 conducted an observational cohort study to develop an all-cause readmissions risk-prediction model incorporating electronic health record (EHR) data. They concluded that, although including EHR data from the full hospital stay modestly improves prediction of 30-day readmissions, many other factors influence readmissions that they could not measure or account for.
As was the case for admissions, the authors of the present study could find no literature on predictive modeling of hospital readmissions using authorization data.
Study aim
The purpose of this research was to develop models to predict inpatient stays in 2019 and, following inpatient stays in 2019, readmissions in 2019, based on demographic characteristics of patients (ie, age, sex), admission authorizations in 2018, and diagnoses identified and procedures requested on authorization submissions in 2018. These predictions took the form of risk scores/probabilities of the events of interest for each individual whose data were included in the model.
Methods
The population studied included members covered by plan contracts administered by Kepro, an organization that provides services to enhance the outcomes of government and commercial health care programs, in 2018 who had a procedure/service requested during 2018 and whose data were available in Kepro's utilization review database. To be included in the population, members had to have active plan eligibility during 2018 and have at least 1 utilization management record within Kepro's medical management application. Kepro's Health Intelligence team provided data for 211,317 distinct members across 24 contracts.
Variables
The 2 outcome variables of interest were hospitalizations in 2019 and 30-day readmissions in 2019 given an initial admission in 2019. Candidate predictor variables in the model included age, sex, hospitalization authorization in 2018, and 20 diagnoses/comorbidities and 17 procedures identified in 2018.
Data source, variables, and data analysis
To guide the design and testing of the predictive models, the initial analytic data file was split randomly into 2 equal sized files: the training sample and the test sample.
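A random split into 2 equal-sized samples of this kind can be sketched as follows. This is a minimal Python illustration, not the authors' SAS code; the function name `split_train_test`, the seed value, and the member identifiers are hypothetical.

```python
import random


def split_train_test(member_ids, seed=2019):
    """Randomly split identifiers into two halves (sizes differ by at most 1)."""
    shuffled = list(member_ids)            # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical member identifiers, for illustration only.
member_ids = [f"M{i:03d}" for i in range(11)]
train_ids, test_ids = split_train_test(member_ids)
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the split reproducible, so the same members land in the training and test samples on every run.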
The 2 main models – admissions and readmissions – were designed and tested using logistic regression analyses. Stepwise selection was used to identify the most useful explanatory variables. Hospitalization risk scores were generated. All analyses were performed using SAS Release 9.4 (SAS Institute Inc., Cary, NC). The model was specified in a prospective manner: SAS PROC LOGISTIC provides the option to produce probabilities (risk scores) for each individual, and the procedure can also save the parameter estimates generated to a separate data set.
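To make the link between a fitted logistic regression and an individual risk score concrete, the following is a minimal Python sketch. It is not the authors' SAS code and it omits stepwise selection; the toy data, the `encode` helper, the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.2, n_iter=5000):
    """Fit intercept and coefficients by batch gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def risk_score(beta, x):
    """Predicted probability of the event for one feature vector."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def encode(age, female, prior_auth):
    """Hypothetical encoding: age centered at 50 and scaled to decades, plus two flags."""
    return [(age - 50) / 10.0, float(female), float(prior_auth)]


# Hypothetical toy rows: (age, female, admission authorization last year, admitted this year)
rows = [
    (25, 1, 0, 0), (30, 0, 0, 0), (30, 0, 0, 1), (45, 1, 0, 0),
    (50, 0, 1, 1), (50, 0, 1, 0), (65, 0, 1, 1), (70, 1, 1, 0),
    (70, 1, 1, 1), (80, 0, 0, 1), (55, 1, 0, 0), (35, 0, 1, 0),
]
X = [encode(a, f, p) for a, f, p, _ in rows]
y = [outcome for _, _, _, outcome in rows]

beta = fit_logistic(X, y)
scores = [risk_score(beta, x) for x in X]
print([round(s, 3) for s in scores])
```

Each score is a probability between 0 and 1, which is the form in which the risk scores are reported in this paper.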
Twenty diagnoses were categorized using the CCS Grouper 12 and were tested in the model (Supplementary Table S1). Seventeen procedures were included and tested in the model (online Supplementary Table S1). Assessment of the methodology against the TRIPOD (transparent reporting of multivariable prediction model for individual prognosis or diagnosis) criteria shows strong adherence to these criteria. 13
The authors affiliated with the Jefferson College of Population Health received approval of this research project from the Thomas Jefferson University Institutional Review Board.
Results
Description of key variables
Table 1 shows frequency distributions for the demographic variables (age and sex), plan type, and admission and readmission. Results are reviewed here for the training sample; those for the test sample are virtually identical. The mean age of sample members was approximately 43 years. The sample was almost evenly split between males and females. Medicaid beneficiaries constitute close to 80% of all members in this study population; the balance are Commercial members. The “prevalence” of hospitalizations in this population was more than 4%, and the prevalence of readmissions was less than 1%.
Table 1. Frequency Distributions, Training Sample and Test Sample
sd, standard deviation.
Prediction of hospitalizations
Regarding the first predictive model, that of admission in 2019 by selected predictors in 2018 (Table 2), several observations are relevant: females are significantly less likely overall to be hospitalized (OR = 0.867; 95% confidence interval = 0.812-0.925; P < .0001); each year of age significantly increases a member's risk of hospitalization (P < .0001); and hospitalization authorization in 2018 significantly increases a member's risk of hospitalization in 2019 (P < .0001).
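For readers less familiar with how an odds ratio and its confidence limits relate to a logistic regression coefficient, the following Python sketch may help. It assumes a Wald-type interval with z = 1.96, which the paper does not state explicitly; the coefficient and standard error are backed out from the reported OR and limits rather than taken from the authors' output, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower limit, upper limit) for a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Values implied by the reported OR = 0.867 with 95% limits 0.812 and 0.925,
# assuming a Wald interval (an assumption, not stated in the paper).
beta_female = math.log(0.867)
se_female = (math.log(0.925) - math.log(0.812)) / (2 * 1.96)

or_female, lower, upper = odds_ratio_ci(beta_female, se_female)
print(round(or_female, 3), round(lower, 3), round(upper, 3))
```

An interval that lies entirely below 1, as here, corresponds to a statistically significant reduction in the odds of the outcome.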
Table 2. Predictive Model: Logistic Regression of Admission in 2019 by Selected Predictors in 2018
Many (14 out of the 20) 2018 diagnoses tested in the model statistically significantly increase the risk of hospitalization in 2019: COPD and bronchiectasis; spondylosis, intervertebral disc disorders, other back; paralysis; congestive heart failure; other nervous system disorders; other connective tissue disease; other lower respiratory disease; nonspecific chest pain; osteoarthritis; other gastrointestinal disorders; other ear and sense organ disorders; other bone disease and musculoskeletal deformities; diabetes mellitus with complications; and urinary tract infections.
Similarly, 7 of the 17 2018 procedures tested in the model statistically significantly increase the risk of hospitalization in 2019: ground mileage; durable medical equipment; oxygen concentrator; portable gaseous O2; humidifier heated used with positive airway pressure; nursing care in home; and ultrasound therapy.
As indicators of the overall predictive power of the model, the Wald chi-square test of the global null hypothesis (beta = 0) is highly statistically significant (chi-square = 6768.79, df = 25, P < .0001), and the C statistic is 0.8491.
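The C statistic can be read as the proportion of all (event, non-event) pairs in which the member who had the event received the higher risk score, with ties counted as one half. The following Python sketch computes it directly from that definition; the scores and outcomes are hypothetical, and the pairwise loop is meant for illustration rather than for data sets of the size analyzed here.

```python
def c_statistic(scores, outcomes):
    """Concordance: share of (event, non-event) pairs ranked correctly; ties count 1/2."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical risk scores and observed outcomes (1 = admitted).
scores = [0.02, 0.05, 0.10, 0.30, 0.04, 0.60]
outcomes = [0, 0, 1, 1, 0, 0]
print(c_statistic(scores, outcomes))  # 6 of 8 pairs are concordant -> 0.75
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking.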
The admissions “confusion matrix” (data not shown) indicates that the test set results and training set results are virtually identical, and there is a far higher proportion of false positives (21.7% in the test set) than false negatives (0.9%) and a far higher proportion of true negatives (74.2%) than true positives (3.2%).
Prediction of readmissions
Regarding the second predictive model, that of rehospitalization in 2019 (given an initial hospitalization in 2019) by selected predictors in 2018 (Table 3), several observations are relevant: whereas age group is no longer a statistically significant predictor (in contrast to the admissions model), females are significantly less likely than males to be rehospitalized (OR = 0.842; 95% confidence interval = 0.726-0.976; P = .0227). Rehospitalization authorization in 2018 significantly increases a member's risk of rehospitalization in 2019 (P < .0001).
Table 3. Predictive Model: Logistic Regression of Readmission in 2019 by Selected Predictors in 2018
Only 5 of the 20 2018 diagnoses tested in the model statistically significantly increase the risk of rehospitalization in 2019: COPD and bronchiectasis; congestive heart failure; osteoarthritis; pneumonia (except that caused by tuberculosis or sexually transmitted disease); and urinary tract infections.
And 4 of the 17 2018 procedures tested in the model statistically significantly increase the risk of rehospitalization in 2019: oxygen concentrator; manual therapy ≥1 regions; therapeutic activities; and neuromuscular reeducation.
To summarize the predictive power and accuracy of the model, the Wald chi-square test of the global null hypothesis (beta = 0) is highly statistically significant (chi-square = 147.23, df = 14, P < .0001); the C statistic of 0.6237 is far lower than the 0.8491 obtained for the hospitalization model.
The readmissions confusion matrix (data not shown) indicates that the test set results and training set results are very similar, and there is a higher proportion of false positives (28.1% in the test set) than false negatives (9.9%) and a far higher proportion of true negatives (50.4%) than true positives (11.5%). In short, as was true for 2019 admissions, there are relatively few 2019 readmissions to predict in these data. As expected from the C statistics seen for the 2 models, the relative differences seen among the confusion matrix categories for the readmissions model were not as striking as those seen for the admissions model.
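The test-set percentages reported for the 2 confusion matrices can be converted into the more familiar summary measures of sensitivity, specificity, and predictive value. The Python sketch below does this arithmetic; the derived measures are not reported by the authors, they inherit the rounding of the published percentages, and the classification threshold behind the matrices is not stated in the text.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive standard summary measures from the four confusion matrix cells."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual events that were flagged
        "specificity": tn / (tn + fp),  # share of non-events correctly not flagged
        "ppv": tp / (tp + fp),          # share of flagged members who had the event
        "npv": tn / (tn + fn),          # share of unflagged members without the event
    }


# Cell percentages as reported for the test sets.
admissions = confusion_metrics(tp=3.2, fp=21.7, tn=74.2, fn=0.9)
readmissions = confusion_metrics(tp=11.5, fp=28.1, tn=50.4, fn=9.9)

for name, metrics in (("admissions", admissions), ("readmissions", readmissions)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Under these assumptions the admissions model has a sensitivity of about 0.78 and a specificity of about 0.77, versus about 0.54 and 0.64 for the readmissions model, which is in line with the gap between the 2 C statistics.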
Hospitalization risk scores
The admission risk scores calculated are highly positively skewed, with a mean (standard deviation) of 0.042 (0.074) and a median of 0.018, meaning that the vast majority of members have little risk of hospitalization in 2019 (Table 4).
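A mean well above the median is the usual signature of positive skew. The Python sketch below shows how these summary measures can be computed for a set of risk scores; the scores are hypothetical and are not drawn from the study data, and the skewness formula used is the population (Fisher-Pearson) coefficient, which may differ from the one reported by SAS.

```python
import statistics


def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


# Hypothetical risk scores: most members are low risk, a few are high risk.
scores = [0.005, 0.01, 0.01, 0.015, 0.02, 0.02, 0.03, 0.05, 0.12, 0.40]

print("mean:", round(statistics.mean(scores), 3))
print("median:", round(statistics.median(scores), 3))
print("skewness:", round(skewness(scores), 2))
```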
Table 4. Frequency Distribution of Risk Scores, Grouped
Discussion
Assessment of this methodology against the TRIPOD criteria shows strong adherence to these criteria. The two models tested were those of admissions in 2019 and of 2019 30-day readmissions given an initial admission in 2019. The predictive power of the admissions model (C = 0.8491) is much greater than that of the readmissions model (C = 0.6237). These results, using authorization data, are generally similar to those using utilization data to predict hospitalizations 2 and readmissions. 7
The admission risk scores calculated are highly positively skewed, with a mean (standard deviation) of 0.042 (0.074) and a median of 0.018, indicating that the vast majority of members had little risk of hospitalization in 2019. These risk scores may be used, as desired, to identify members at risk of admission, and thus to support proactive risk management programs (eg, health/disease management programs).
Contributions
This model development project was innovative in employing authorization data rather than utilization data, the latter being more commonly used. Among the advantages of authorization data are their timeliness, and the simple fact that they are sometimes the only data available. On the other hand, there are disadvantages to using authorization data: authorized services are not always actually performed, and diagnoses are often “rule out” diagnoses rather than final diagnoses. Addressing the question of the greater timeliness of authorization data, traditional medical claims-based risk identification tools must rely on claim files that are limited by a number of time lags, including the lags between the incurred date of the claim, receipt of the claim by the payer, the adjudication period, and the typically monthly submission of a paid claim file to the party performing data analysis based on that claim file. The date of authorization of prospective services obviously predates the date the service was actually incurred and thus eliminates some of these potential lags.
Applications
Kepro maintains numerous reporting and analytics tools for tracking members' care. Merging data across disparate data sources is a common problem in health care analytics. Implementing live member tracking using predictive models can provide care management teams with the data necessary to improve member care. The results of this study will be implemented both in stand-alone models and added to key indicators within existing predictive models of acuity. Using the models discussed here, member risk scores will be generated and updated daily, and trended over time to view member history and events. In addition to generating risk scores, Kepro plans to use the results of the model to identify high-risk members, determine if they are currently on case management, and make recommendations for case management where warranted. Summary member risk score analysis and individualized member profiles will be available to care management teams and clinical staff.
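The triage step described above (identify high-risk members, check whether they are already on case management, and recommend the rest) can be sketched in Python as follows. This is an illustrative assumption about how such a rule might look, not a description of Kepro's systems; the function name, threshold, capacity limit, and member data are hypothetical.

```python
def recommend_for_case_management(risk_by_member, on_case_management,
                                  threshold, capacity=None):
    """Return IDs of members at or above `threshold` who are not already on
    case management, highest risk first, optionally limited to `capacity`."""
    candidates = [
        (member, risk)
        for member, risk in risk_by_member.items()
        if risk >= threshold and member not in on_case_management
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    ids = [member for member, _ in candidates]
    return ids if capacity is None else ids[:capacity]


# Hypothetical risk scores and current case management roster.
risk_by_member = {"A": 0.01, "B": 0.35, "C": 0.08, "D": 0.52, "E": 0.21}
on_case_management = {"D"}

print(recommend_for_case_management(risk_by_member, on_case_management, threshold=0.20))
print(recommend_for_case_management(risk_by_member, on_case_management,
                                    threshold=0.05, capacity=1))
```

Lowering the threshold widens the pool of members flagged for follow-up, while the capacity limit caps the list at what a care management team can realistically handle.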
Limitations
The key limitations of these analyses include those already mentioned regarding the disadvantages of authorization data, and also the limit to generalizability inherent in use of only a single database, however large. Additionally, it is hard to assess comprehensively the value of a predictive model based on authorization data without being able to compare it to a predictive model based on utilization data. While such a comparison was not possible in the current research, such a comparative analysis would represent an important contribution to this field.
Conclusions
The objectives of this research were to develop models to predict admissions in 2019 and, given 2019 admissions, readmissions in 2019, based on demographic characteristics of patients (ie, age, sex), diagnoses identified and procedures requested on authorization requests in 2018, and admission in 2018. These predictions took the form of risk scores/probabilities of hospitalizations and rehospitalizations for all individuals whose data were included in the model.
Assessment of this methodology against the TRIPOD criteria showed strong adherence to these criteria.
The predictive power of the model of admissions in 2019 (C = 0.8491) is much greater than that of the model of 2019 readmissions (C = 0.6237).
The admission risk scores that were calculated are highly positively skewed, with a mean (standard deviation) of 0.042 (0.074) and a median of 0.018, meaning that the vast majority of members have little risk of hospitalization in 2019. These risk scores may be used, as desired, to identify members at risk of admission and thus for proactive risk management programs (eg, health/disease management programs). Different risk thresholds could be used to identify subsets of members for follow-up, depending on strategy and available resources.
Footnotes
Authors' Contributions
Dr. Crawford: Project conception, literature review, data analysis and interpretation, manuscript writing. Mr. Novinger: Project conception. Mr. Dominick: Project conception. Ms. Heckert: Data management. Dr. McAna: Project conception, literature review, data analysis and interpretation, manuscript writing.
Author Disclosure Statement
The authors declare that there are no conflicts of interest.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Table S1
References
