Abstract

US health care payment policy is shifting from fee-for-service to more capitated or value-based payments. At the same time, technology is increasingly becoming a part of health care delivery. As a result, health care providers, health care systems, and health plans are relying more and more on population health management algorithms to identify potentially high-cost, high-utilization patients and target these patients in efforts to improve outcomes or reduce health care costs. 1 Algorithms use a variety of data sources, including medical claims data, electronic health records (EHRs), and individual or neighborhood-level social determinants of health data to identify patients at high risk for hospitalization or death. Once the algorithm identifies a cohort of patients at greater risk for hospitalization or death, the provider or other stakeholders can act, including increasing outreach to these patients, involving care coordinators, or encouraging patients to participate in disease management programs.
These algorithms have become more popular, yet we believe concerns about data quality could have major implications for the validity and performance of these algorithms in clinical practice. First, missing data are an issue. Claims data are limited and often do not include diagnosis information. Relevant data may be missing from the patient's medical record. The patient history may not be available if the patient recently moved, if long-term records are limited, or if pediatric records are not available. EHR systems are not always well integrated, so data from a primary care appointment, a recent hospitalization, an urgent care visit, a vaccination at a pharmacy, or any medical care on a trip out of state may not be available. Additionally, many important pieces of data are not being collected, including many social risk factors that greatly influence health, such as housing status. For example, when Stanford tried to use an algorithm to allocate COVID-19 vaccines among staff, the algorithm did not include data on the actual COVID-19 exposure of those staff, resulting in prioritization of older staff with little exposure. 2
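One way to make the missing-data concern concrete is to measure, field by field, how often a value is simply absent before any risk model is built. The following is a minimal sketch under stated assumptions: the records, field names, and values are hypothetical and do not come from any real claims or EHR system.

```python
# Hypothetical patient records; None marks a value that was never captured.
# Field names and values are illustrative assumptions, not a real schema.
records = [
    {"age": 71, "diagnosis_code": "E11.9", "housing_status": None},
    {"age": 45, "diagnosis_code": None, "housing_status": None},
    {"age": 63, "diagnosis_code": "I10", "housing_status": "stable"},
    {"age": 58, "diagnosis_code": None, "housing_status": None},
]


def missing_fraction(rows, fields):
    """Return the share of rows in which each field is absent or None."""
    return {
        field: sum(1 for row in rows if row.get(field) is None) / len(rows)
        for field in fields
    }


missingness = missing_fraction(records, ["age", "diagnosis_code", "housing_status"])
for field, share in missingness.items():
    print(f"{field}: {share:.0%} missing")
```

In this toy data set the social risk field is missing far more often than the clinical fields. A tally like this shows only that values are absent; it cannot show why, or whether the values that are present are correct.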
A second concern is data quality; where data are present, they may not be correct. Information may be out-of-date and not reflect recent test results. Recent studies have shown providers often enter incorrect data in EHRs 3 and data fields such as problem lists are not accurate. 4 In many cases, providers select templates that include default text for a negative review of systems or a normal physical exam. Copy-paste errors, where social histories are copied forward without being validated, or histories are used from previous visits, can make the timing of an event difficult to verify (eg, if the history repeatedly reports the pain began “last week”). Typos, misclicks, or otherwise “honest mistakes” that are meaningfully different from the patient's true health status are likely present in many medical records.
A third issue is availability of information. Even if all relevant data for these algorithms are present and correct, they still may be inaccessible to these algorithms; they could be held in free-text notes or other non-accessible formats, such as scanned PDFs or image files, and not in structured fields used by algorithms. Studies that have extracted this free-text information have shown statistically significant improvements in risk prediction, 5 showing that algorithms without free-text data are not as accurate as they could be.
Lastly, there may be problems with the algorithms themselves. Algorithms typically are developed based only on existing data, so they cannot account for unmeasured factors (such as many social determinants of health). In a recent infamous study, 6 an algorithm that predicted future care needs using claims data produced biased recommendations conditional on race, even though race was not explicitly factored into calculations, because previous health care utilization and costs were lower for Black patients than White patients with similar health risk factors. The algorithm recommended services at a lower rate for high-risk Black patients, further compounding existing disparities in the health care system. Furthermore, many algorithms are biased toward measured health care data elements; for example, providers mostly measure white blood cell counts if they suspect a patient has an infection, so the implications of test results are based on a biased subset of patients. 7 Machine learning and artificial intelligence approaches also are becoming increasingly popular. Machine learning is a type of artificial intelligence in which algorithms are developed automatically from a large body of data without explicit programming. These approaches also are limited by available data and may not produce clinically useful output. 8 Furthermore, these algorithms often are less transparent than other approaches, making them more difficult to troubleshoot and determine sources of bias.
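The proxy problem described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the algorithm examined in the cited study: the patients, groups, costs, and the use of a chronic condition count as a stand-in for health need are all hypothetical.

```python
# Hypothetical patients: illness burden is identical across the two groups,
# but group "B" generates lower prior costs for the same burden
# (for example, because of barriers to accessing care).
patients = [
    {"id": 1, "group": "A", "chronic_conditions": 5, "prior_cost": 10000},
    {"id": 2, "group": "B", "chronic_conditions": 5, "prior_cost": 6000},
    {"id": 3, "group": "A", "chronic_conditions": 3, "prior_cost": 7000},
    {"id": 4, "group": "B", "chronic_conditions": 3, "prior_cost": 4000},
    {"id": 5, "group": "A", "chronic_conditions": 1, "prior_cost": 2000},
    {"id": 6, "group": "B", "chronic_conditions": 1, "prior_cost": 1000},
]


def flag_top(rows, key, n):
    """Return the ids of the n patients ranked highest on the given field."""
    ranked = sorted(rows, key=lambda row: row[key], reverse=True)
    return {row["id"] for row in ranked[:n]}


# Ranking on prior cost (the proxy) versus ranking on illness burden.
flagged_by_cost = flag_top(patients, "prior_cost", 2)
flagged_by_need = flag_top(patients, "chronic_conditions", 2)

print("Flagged using cost:", sorted(flagged_by_cost))
print("Flagged using illness burden:", sorted(flagged_by_need))
```

Ranking on cost selects patients 1 and 3, both from group A, and passes over patient 2, who has the same illness burden as patient 1. Group membership never enters the calculation, yet the output differs by group because the proxy does.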
Given potential problems with frequently used data and algorithms, concerns arise about the accuracy of risk prediction. Misclassification in risk prediction caused by data problems can have real-world consequences. We group misclassifications into 2 categories: cases where risk prediction algorithms overestimate patients' risk (relatively lower risk patients deemed higher risk), and cases where algorithms underestimate it (relatively higher risk patients deemed lower risk). Assuming a patient population whose members are either “high risk” or “low risk” for adverse outcomes, misclassification by a simple classification algorithm results in 2 major problems: overutilization or underutilization (Table 1).
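The two categories of misclassification can be expressed as a simple tally of true versus predicted risk. The sketch below assumes, as the text does, a binary high-risk or low-risk status; the function name and the example pairs are hypothetical.

```python
from collections import Counter


def classify_outcome(true_high_risk, predicted_high_risk):
    """Label one patient by comparing true risk with the algorithm's flag."""
    if predicted_high_risk and not true_high_risk:
        return "overestimated"   # low-risk patient flagged: possible overutilization
    if true_high_risk and not predicted_high_risk:
        return "underestimated"  # high-risk patient missed: possible underutilization
    return "correct"


# Hypothetical (true_high_risk, predicted_high_risk) pairs for five patients.
pairs = [(True, True), (False, True), (True, False), (False, False), (False, True)]

counts = Counter(classify_outcome(truth, flag) for truth, flag in pairs)
print(dict(counts))
```

In practice the true risk status is not observed when the algorithm runs, so a tally like this is possible only in retrospective validation against later outcomes.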
Potential Problems from Risk Misclassification
These data problems could cause overutilization of health care services in patients who are actually at low risk for hospitalization. For example, a patient with a relatively high number of documented comorbid conditions would seemingly be at higher risk for hospitalization or death, leading to recommended interventions such as case manager interventions. However, that patient may have mitigating social factors (eg, high socioeconomic status, dedicated caretakers) that reduce overall risk (unmeasured data). The patient could even be flagged as high risk because some conditions were incorrectly added to the EHR (incorrect data). These patients may be unlikely to benefit from extra outreach, and they could possibly even be harmed by overuse from extra testing, extra medications, or even extra costs associated with these services. Furthermore, resources and time of the care coordinator would be better spent working with patients who truly are high risk.
Another important problem would be the underutilization of health care services in patients who are actually high risk. Patients could have fewer clinical risk factors but more social risk (data showing this social risk could be missing or in free-text notes). Patients could have incorrect data in their EHR for the aforementioned reasons (such as a default negative review of systems or physical exam), or missing data from a recent hospitalization at an out-of-network hospital, so they do not appear to be high risk. Without this flag from the algorithm, these patients are unlikely to get the additional disease management or care coordination outreach they need and may be more likely to suffer adverse consequences as a result.
It is not clear that the data available allow algorithms to classify patients' risks correctly. The rating from the algorithm may give a false sense of confidence around a patient's risk status. As the use of such tools increases for population health management, we suggest that both designers of the algorithms and their users tread carefully. Algorithm designers should be aware of potential downsides of different sources of data when using them to inform algorithms. In a review of risk prediction models using EHR data, nearly half of the models did not consider the impact of missing data on the algorithm's performance. 9 Clinical data may be incorrectly assumed to be “gold standard,” even though data entry practices vary across institutions and providers. Processes for testing algorithms may need additional validation through clinical judgment. Researchers need to consider how algorithms are used in real-world settings using less-than-perfect data. Similarly, clinical users and policy makers should be circumspect about using these data. Although an algorithm's outputs may be helpful at a population level, we worry that health data are not yet documented accurately enough to allow skipping the clinical judgment step. Although humans have biases and make mistakes, algorithms are only as good as the underlying data. For now, data quality and missingness issues mean we cannot yet fully automate risk stratification.
Footnotes
Authors' Contributions
Drs. Predmore and Fischer conceived or designed the work; drafted the article; critically reviewed and revised the article; and provided final approval of the version to be published.
Authors' Disclosure Statement
The authors declare that there are no conflicts of interest.
Funding Information
No funding was received for this article.
