A Comprehensive Index for Predicting Risk of Anemia from Patients' Diagnoses

Abstract

This article demonstrates how time-dependent, interacting, and repeating risk factors can be used to create more accurate predictive medicine. In particular, we show how the emergence of anemia can be predicted from medical history within electronic health records. We used the Veterans Affairs Informatics and Computing Infrastructure database to examine a retrospective cohort of 9,738,838 veterans over an 11-year period. Using International Classification of Diseases, Version 9 codes organized into 25 major diagnostic categories, we measured progression of disease by examining changes in risk over time, interactions in risk of combinations of diseases, and elevated risk associated with repeated hospitalization for the same diagnostic category. The maximum risk associated with each diagnostic category was used to predict anemia. The accuracy of the model was assessed using a validation cohort. Age and several diagnostic categories significantly contributed to the prediction of anemia. The largest contributors were health status (β = −1075, t = −92, p < 0.001) and diseases of the endocrine (β = −1046, t = −87, p < 0.001), hepatobiliary (β = −1043, t = −72, p < 0.001), kidney (β = −1125, t = −111, p < 0.001), and respiratory systems (β = −1151, t = −89, p < 0.001). The AUC for the additive model was 0.751 (confidence interval 74.95%–75.26%). The magnitude of the AUC suggests that the model may assist clinicians in determining which patients are likely to develop anemia. The procedures used for examining changes in risk factors over time may also be helpful in other predictive medicine projects.

Introduction

This article demonstrates how large-scale redundant models with thousands of overlapping variables can be used for predictive medicine. In modeling health hazards, investigators typically use Cox regression or artificial intelligence models to relate risk factors to the probability of an event.¹ Typically, a linear combination of risk factors is included in the analysis, whereas changes in risk factors over time and repeated treatments for the same diagnosis are ignored. In a sample of 20 recently published articles using Cox regression, 80% used fewer than 30 variables, 5% explored the interaction among the variables, 0% examined change in risk factors over time, and only 25% reported the fit of the model to the data.² Naturally, when the input to a predictive model is limited, model accuracy drops and its use to guide clinical practice is not reasonable. Predictive medicine calls for methods that can produce more accurate models by allowing time-dependent, interacting, and repetitive variables as risk factors. In this article, we show, by way of an example, how this can be done. To demonstrate, we focus on predicting anemia.

Anemia is a common, but often underappreciated, diagnosis affecting millions of Americans.³ The public health importance of this disease is particularly relevant given the aging population.⁴ Adequate identification and risk stratification of anemia is a central issue in clinical decision making. Several indices for predicting anemia exist, but they have mainly been developed for, and assessed in, very specific populations, such as post-partum women, patients receiving chemotherapy, patients with hepatitis C, or patients with end-stage renal disease.⁵⁻¹⁰ Furthermore, the small sample sizes and the selective number of risk factors investigated limit the generalizability of these models and their direct use in clinical care. In contrast, this article develops a model for predicting the presence of anemia from the patient's current medical history, as coded within the patient's medical diagnoses.

In our model, anemia is predicted from medical history. There are more than 14,000 diagnoses within the International Classification of Diseases (ICDs) Version 9 that can be used to describe the patient's medical history. These diagnoses are used to measure progression of risk for anemia within 25 major diagnostic categories, which are organized into body systems.¹¹ Then, progression of risk factors within body systems is used to predict anemia. In this article, we report the percentage of variation in the presence of anemia explained by our model.

Materials and Methods

Model development

The general conceptualization of the problem is depicted in Figure 1. Electronic health records report encounters with the patient. Each encounter has a diagnosis. We use these diagnoses to examine change in risk for anemia. Statisticians refer to these events as “risk factors” or “covariates.” Covariates or risk factors are calculated over some baseline period, which is an interval of time before the presence of anemia. The objective is to estimate the anemia-free time of the patient, starting from the end of the baseline period (i.e., point b in Fig. 1).

FIG. 1.

Depiction of risk of anemia over time for purposes of model development.

Patients' health statuses can change dynamically after each encounter. Therefore, it is reasonable to assume that covariates or risk factors change during the baseline period and may depend on time. Furthermore, covariates are present for various lengths of time during the baseline period, creating different exposure times to risk factors. For example, in patients without anemia at the end of the baseline period, time b, the prediction model should give an assessment of the probability of anemia in the interval from b + 1 to a, based on all covariate and exposure time information available at times before b. In Figure 1, two covariates (diagnoses) R1 and R2 are shown occurring at times r1 and r2. These covariates are used to predict the time the patient is anemia free, or the time to a censoring event.

The traditional approach to predicting anemia over time is to develop a model of the cumulative hazard faced by the patient over time. In this approach, the patient shown in Figure 1 has hazard R1 for b − r1 time periods and hazard R2 for b − r2 time periods. Investigators have proposed several different ways to model this, including Cox regression with time-dependent covariates, multistate models, accelerated failure models, proportional odds models, and additive hazard models.¹² Among these approaches, we chose to use the additive hazard model because of its simplicity.¹³ In this model, the conditional hazard is defined as

\begin{align*} h(t \vert \boldsymbol{X}) = \boldsymbol{X}^{\mathrm{T}} \boldsymbol{\beta}(t) + h_{0}(t), \end{align*}

where X is the vector of covariates, X^T is its transpose, β(t) is the vector of regression coefficients at time t, and h_0(t) is the background hazard at time t (t = 0 corresponds to the end of the baseline interval). Each element X_i of the covariate vector corresponds to a single diagnostic category i. The risk factor X_i is the maximum of the risk factors within a given category:

\begin{align*} X_{i} = \mathrm{Maximum} \left\{ X_{i,1}, X_{i,2}, \ldots, X_{i,j} \right\}, \end{align*}

where X_{i,j} is the jth risk factor within category i during the baseline period.

In scoring progression of risk, we chose to focus on the condition with the maximum risk, as opposed to average risk across conditions. We chose this approach because we are most concerned about the worst condition that the patient had. Furthermore, we chose maximum, as opposed to latest risk, on the assumption that later conditions may be lower and may mask an earlier higher risk. Thus, we are assuming that lower later risk is not a true reduction of risk and what matters is the worst condition in the patient's history, no matter how much earlier it is reported.

For example, if a person has diseases A, B, and C within a particular body system, then the disease with the highest risk factor is selected for the covariate. This could be A, B, or C and is independent of when the diseases occurred in the baseline period. As these risk factors are calculated at different time periods and reflect different encounters, the covariate vector represents the progression of diseases within different body systems.
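The maximum-within-category rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `category_covariates`, the category labels, and the scores are hypothetical and do not come from the study data.

```python
from collections import defaultdict


def category_covariates(scored_diagnoses):
    """Return X_i = max_j X_{i,j} for each diagnostic category i.

    scored_diagnoses: iterable of (category, score) pairs observed during
    the baseline period. The order (timing) of the pairs is irrelevant.
    """
    covariates = defaultdict(float)  # a category starts at 0.0 before any diagnosis
    for category, score in scored_diagnoses:
        covariates[category] = max(covariates[category], score)
    return dict(covariates)


# Hypothetical patient: diseases A, B, and C in one body system, one in another.
history = [
    ("circulatory", 0.2),  # disease A
    ("circulatory", 0.7),  # disease B, the worst condition in this body system
    ("circulatory", 0.4),  # disease C, diagnosed later but with lower risk
    ("digestive", 0.3),
]
covariates = category_covariates(history)
print(covariates)
```

Because only the maximum is kept, a later and milder diagnosis (disease C here) does not lower the covariate for its body system.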

Source of data

The Department of Veterans Affairs Informatics and Computing Infrastructure (VINCI) is a research database designed to provide investigators with the data and statistical tools needed to complete research studies in the Veterans Affairs (VA) population.¹⁴ VINCI houses data from all Veterans Health Administration medical centers nationwide, representing the most complete repository of administrative and medical records in the United States. Using these data, we examined a retrospective cohort over an 11-year timeframe (January 1, 2004–August 12, 2015). This study was approved by the Washington, D.C. Veterans Affairs Medical Center Institutional Review Board.

Inclusion criteria

Subjects were included if they had at least 1 year of encounters within the VA healthcare system, were born after 1915 and before 1974, and were alive in 2003. The total number of patients who met these criteria was 9.7 million. We selected the patients who had at least 1 year of follow-up, which reduced the number of patients to 5,392,267. Of those with at least 1 year of follow-up, 1,103,533 had at least one diagnosis of anemia.

Dependent variable

The dependent variable was the number of anemia-free days starting from the first report of any hospitalization or outpatient diagnosis. Subjects were identified as having anemia by the ICD 9 codes listed in Appendix 1. Causes of anemia included iron, folate, and B12 deficiencies, renal insufficiency, anemia of chronic inflammation (formerly termed anemia of chronic disease), and unexplained anemia. The independent variables were diagnoses either in the hospital or in an ambulatory setting. These independent variables were used to stage progression of risk of anemia.

Censored data

For patients who did not have a diagnosis of anemia, the number of anemia-free days was censored by either death or date of last reported visit.
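A minimal sketch of how the dependent variable and its censoring might be derived for one patient is given below. The function name, argument names, and dates are hypothetical. The text does not say which date takes precedence when both a death date and a last visit are recorded, so the sketch assumes the death date is used when available.

```python
from datetime import date


def anemia_free_days(start, anemia_date=None, death_date=None, last_visit=None):
    """Return (days, event_observed) for one patient.

    start: date of the first reported hospitalization or outpatient diagnosis.
    If anemia was diagnosed, the event is observed. Otherwise the patient is
    censored at death (assumed to take precedence) or at the last visit.
    """
    if anemia_date is not None:
        return (anemia_date - start).days, True
    censor_date = death_date if death_date is not None else last_visit
    if censor_date is None:
        raise ValueError("a censored patient needs a death date or a last visit date")
    return (censor_date - start).days, False


# Hypothetical patients.
observed = anemia_free_days(date(2004, 1, 1), anemia_date=date(2004, 3, 1))
censored = anemia_free_days(date(2004, 1, 1), last_visit=date(2005, 1, 1))
print(observed, censored)
```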

Risk factors/covariates

In this article we rely on patients' diagnoses and age to anticipate future occurrences of anemia. In version 9 of the ICDs, there are more than 14,000 classifications, creating a high number of variables to use in predicting the emergence of anemia. Investigators usually reduce the number of variables before constructing predictive models by using ridge or other similar statistical methods.¹⁵ These statistical methods of dimension reduction focus on variables that are both prevalent and predictive of the targeted disease. Thus, they ignore variables that are rare but also predictive. To include rare predictors, we take a different approach and do not reduce the number of variables. Instead, we examine the hazard of anemia for all patients' diagnoses, and report these within different body systems.

For each patient, the baseline point b (Fig. 1) is randomly selected in time between the starting point of the data and the diagnosis of anemia at a or the end of the data if there is no anemia diagnosis. This selection occurs once for each patient. The average length of the follow-up period was 2,574.28 days (standard deviation of 1309.03 days). For simplicity, all risk factors that occur before the baseline point b are assumed to occur at b. For example, if a patient has risk factor R1 at time r1 and risk factor R2 at time r2, both of which occur before the selected baseline b (as in Fig. 1), then, for the purpose of estimating the anemia hazard rate given one of the risk factors, the anemia-free time is the difference between the time of diagnosis of anemia and the baseline, that is, a − b. Any risk factor occurring after the baseline point b is ignored. For example, in Figure 1, the risk factor occurring at time r3 is ignored.
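The baseline selection described above can be illustrated with the following sketch. It assumes that days are integer offsets from the start of the data, that b is drawn uniformly (the text says only "randomly"), and that a risk factor recorded exactly on day b counts as occurring before the baseline. All names and numbers are hypothetical.

```python
import random


def select_baseline(start_day, end_day, rng):
    """Draw the baseline point b once, uniformly in [start_day, end_day).

    end_day is the day of the anemia diagnosis (a), or the end of the data
    when the patient has no anemia diagnosis.
    """
    return rng.randrange(start_day, end_day)


def apply_baseline(risk_factor_days, baseline, anemia_day):
    """Keep risk factors recorded up to b and return the anemia-free time a - b."""
    kept = {name for name, day in risk_factor_days.items() if day <= baseline}
    return kept, anemia_day - baseline


# Hypothetical patient mirroring Figure 1: R1 and R2 precede b, R3 follows it.
risk_factor_days = {"R1": 100, "R2": 400, "R3": 900}
anemia_day = 1200  # point a

b_random = select_baseline(0, anemia_day, random.Random(0))  # one draw per patient

# A fixed baseline is used here so that the example is reproducible.
kept, free_days = apply_baseline(risk_factor_days, baseline=600, anemia_day=anemia_day)
print(sorted(kept), free_days)
```

With b = 600, the risk factor R3 is ignored and both R1 and R2 are treated as occurring at b, so each is paired with the same anemia-free time of a − b = 600 days.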

The hazard rate associated with a risk factor is assumed to be constant in time. As shown in Figure 1, this assumption may not be reasonable when the baseline period is a long interval or when early and late risk factors are not averaged across patients to be approximately occurring at the same time. Diseases that occurred fewer than 30 times in the database were ignored, as accurate estimates could not be made for these diseases. We calculate the discrete hazard rate, that is, the daily probability of anemia, based on the inverse of the average number of days until the first anemia diagnosis:

\begin{align*} \mathrm{hazard}(\mathrm{diagnosis}) = \frac{1}{\text{average number of days until first anemia diagnosis}}. \end{align*}

This method of calculation is computationally faster and, under the assumption of constant hazard over time, provides the same estimates as repeatedly counting the number of patients who have anemia for the first time after each time period (Appendix 2).
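The hazard calculation can be sketched as follows. The rationale is that, under a constant daily probability p of a first anemia diagnosis, the waiting time is geometrically distributed with mean 1/p, so the inverse of the average waiting time estimates p. The function name and the sample values are hypothetical, and the sketch assumes its input already holds the number of days to first anemia for each patient with the diagnosis.

```python
from statistics import mean

MIN_CASES = 30  # diagnoses seen fewer than 30 times are skipped, as stated in the text


def daily_hazard(days_to_anemia):
    """Return 1 / (average days until first anemia diagnosis), or None if too rare."""
    if len(days_to_anemia) < MIN_CASES:
        return None
    return 1.0 / mean(days_to_anemia)


# Hypothetical diagnosis observed in 30 patients, each with 100 anemia-free days.
common = daily_hazard([100] * 30)
# Hypothetical diagnosis observed in only 10 patients: no estimate is produced.
rare = daily_hazard([50] * 10)
print(common, rare)
```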

Progression of risk within diagnostic categories

The diagnoses were further classified into diagnostic categories. The Agency for Healthcare Research and Quality has organized diagnoses into major diagnostic categories.¹¹ The categories are formed by dividing all possible diagnoses into 25 mutually exclusive areas, so that each area corresponds to a single organ system or etiology. These smaller clinically meaningful categories are more useful statistically for the purposes of the development of this model than are individual ICD-9 codes. The 25 major diagnostic categories include diseases and disorders of the nervous system; eye; ear, nose, and throat; respiratory system; circulatory system; digestive system; hepatobiliary system and pancreas; musculoskeletal system and connective tissue; skin, subcutaneous tissue, and breast; endocrine, nutritional and metabolic diseases and disorders; kidney and urinary tract; male reproductive system; female reproductive system; pregnancy, childbirth and the puerperium; newborns and other neonates; diseases and disorders of the blood, blood-forming organs, immunological disorders; myeloproliferative diseases and disorders; infectious and parasitic diseases; mental diseases and disorders; alcohol/drug use; injuries, poisoning, and toxic effects of drugs; burns; multiple significant trauma; and human immunodeficiency virus infections. The Agency for Healthcare Research and Quality has classified miscellaneous diagnoses into “other” or “factors influencing health status” categories. The health status category refers to visits that affect the health of patients but are not treatment for a diagnosis or injury. Examples include visits for donating organs, visits for discussing homelessness, and visits for immunization. Within each category, diagnoses were arranged from lowest to highest hazard rate for the development of anemia. As there were no cases in the database under the major diagnostic category of multiple significant trauma, this category did not contribute to the model.

Repeated diagnoses were recognized as separate events and were included in the relevant diagnostic category. Often, patients are seen multiple times for the same diagnosis. In the hospital setting, these diagnoses may indicate a treatment-resistant illness, for example, when a patient is repeatedly admitted for cancer that is unresponsive to treatment. Sometimes, repeated diagnoses are used to indicate continued treatment for the same diagnosis, for example, repeated visits for chemotherapy. In an outpatient setting, repeated diagnoses may signal follow-up on a diagnosis, as with a patient who is repeatedly seen for diabetes.

To report progression of risk within a diagnostic category, the hazard rates were normalized to range from 0.0 to 1.0 and reported in steps of 10% progression toward a maximum risk of 1.0 within each body system. Patients with no diagnosis within a body system were assigned a score of 0, so the normalization was performed by dividing the hazard associated with each disease by the maximum hazard within the body system.
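The normalization can be sketched as below. The maximum hazard of 0.0074 is the circulatory-system value reported in Table 3, while the second hazard value is hypothetical and chosen only so that it lands near 40% of the maximum. Rounding to the nearest 10% step is an assumption, since the text does not state how scores were assigned to steps.

```python
def normalize_hazards(hazards):
    """Divide each daily hazard by the maximum hazard within the body system."""
    max_hazard = max(hazards.values())
    return {diagnosis: hazard / max_hazard for diagnosis, hazard in hazards.items()}


def progression_step(score):
    """Assign a normalized score to the nearest 10% step (assumed rounding rule)."""
    return round(score * 10) / 10


# Daily hazards within one body system; only the maximum (0.0074) is from Table 3.
circulatory = {
    "2nd cardiogenic shock": 0.0074,
    "1st ruptured abdominal aortic aneurysm": 0.0030,  # hypothetical value
}
scores = normalize_hazards(circulatory)
steps = {diagnosis: progression_step(score) for diagnosis, score in scores.items()}
print(steps)
```

A patient with no diagnosis in the body system is simply scored 0 and needs no normalization.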

Overall risk of anemia

The risk for anemia was calculated by regressing the number of anemia-free days on the standardized scores within each body system. The higher the number of predicted days to anemia, the lower the risk for anemia; the lower the number of predicted days, the higher the risk.

Results

Table 1 describes the demographic characteristics of the patients included. As expected in the veteran population, patients are mostly male.

Table 1.

Demographic characteristics of the 9,738,838 veterans examined

	Percentage of cases	No. of cases
Dead during study period	22.52	2,193,398
Gender
Male	90.90	8,852,569
Female	9.10	885,768
Missing	0.01	501
Race
White	65.96	2,159,959
Unknown	18.20	595,870
Black	14.81	484,998
Asian	0.58	18,997
American Indian or Alaskan Native	0.39	12,618
Other	0.07	2135
Marital status
Married	53.65	5,225,132
Never married, divorced, separated, or single	29.20	2,843,695
Missing	8.70	847,048
Widow or widower	8.45	822,963
Year of birth (surrogate for age)
1915–1920	2.26	220,225
1920s	16.21	1,579,005
1930s	17.51	1,704,986
1940s	27.50	2,678,189
1950s	19.31	1,880,855
1960s	13.07	1,273,400
1970–1975	4.14	403,178

To give the reader a sense of how daily hazard rates for each diagnosis were calculated, Table 2 provides the top 10 diagnoses that had the shortest number of days to first reported anemia. For these 10 diagnoses, the number of days from first diagnosis listed until a diagnosis of anemia is relatively short (82–179 days).

Table 2.

Top 10 diagnoses that precede anemia

	Anemia-free days (standard deviation)	No. of patients
Malignant ascites	82 (172.07)	645
High-grade myelodysplastic syndrome lesions	83 (198.07)	87
Acute leukemia of unspecified cell type W/O remission	111 (373.86)	164
Malignant pleural effusion	125 (216.53)	1882
Hepatorenal syndrome	153 (448.01)	1108
Secondary malignant neoplasm of brain and spinal cord	157 (306.72)	6066
Encounter for palliative care	157 (352.19)	30,421
Spinocerebellar disease	162 (310.14)	310
Acute and chronic neoplasm-related pain	173 (350.63)	5804
Secondary malignant neoplasm of adrenal gland	179 (395.34)	2113

Table 3 illustrates how hazards associated with different diagnoses were arranged within major diagnostic categories. To make the table easy to understand, we report only three major diagnostic categories, and within these categories we report select diagnoses. The columns in Table 3 show the various diagnostic categories (e.g., diseases in the circulatory system). The first column reports the standardized score; this represents increasing hazard rates of anemia, normalized between 0.0 and 1.0. For example, in the circulatory system, “2nd cardiogenic shock” is assigned a score of 1.0, because it corresponds to the worst-case daily hazard rate (0.0074) for anemia among all diagnoses within the circulatory system. Similarly, “1st ruptured abdominal aortic aneurysm” is assigned a score of 0.4, representing a hazard rate that is ∼40% of that for cardiogenic shock. When a patient presents with multiple diagnoses within the same diagnostic category, the score is assumed to be the one associated with the worst diagnosis. For example, within the circulatory system, a patient may present with first and second “metabolic cardiomyopathy,” in which case the score for the second occasion of the disease is used.

Table 3.

Examples of body systems and risk progression within each

Risk progression	Circulatory system	Digestive system	Endocrine, nutrition, metabolic system
Minimum 0%	1st 941.1, erythema caused by 1st degree burn not otherwise specified	1st 997.4, retained cholelithiasis after cholecystectomy	1st 244, benign neoplasm pituitary
+10%	2nd 941, burn of face, head and neck	1st 151.1, perforation of intestine	1st 783.3, vitamin A deficiency not otherwise specified
+20%	1st 996.02, lower extremity embolism	1st 151.3, malignant neoplasm stomach not otherwise specified
+30%	1st 428.2, acute pericarditis in diseases classified elsewhere	1st 197.4, malignant neoplasm abdomen	1st 194.9, malignant neoplasm endocrine not otherwise specified
+40%	1st 459.2, ruptured abdominal aortic aneurysm	1st 197.6, secondary malignant neoplasm peritoneum	4th 262, other severe malnutrition
+50%	2nd 425.7, metabolic cardiomyopathy	3rd 197.8, secondary malignant neoplasm peritoneum	1st 275.01, hereditary hemochromatosis
+60%	1st 785.5, shock not otherwise specified		1st 276.69, other fluid overload
+70%	3rd 427.5, cardiac arrest	6th 151.9, malignant neoplasm stomach not otherwise specified	1st 198.7, secondary malignant neoplasm of adrenal
+80%			4th 198.7, secondary malignant neoplasm of adrenal
+90%	8th 428.23, acute on chronic systolic heart failure
Maximum 100%	2nd 785.51, cardiogenic shock	1st 529.8, epistaxis	2nd 198.7, secondary malignant neoplasm of adrenal
Maximum daily hazard rate	0.0074	0.0081	0.0081

Repeated hospitalizations for the same diagnosis are scored separately.

The combined impact of diagnostic categories in predicting anemia-free days is represented in Table 4. Age and nearly all diagnostic categories have a statistically significant relationship with the number of anemia-free days. The negative coefficient for most variables confirms that progression within various body systems brings about fewer days to anemia. The variables with positive coefficients suggest that these variables increase time to anemia beyond what is reported for the intercept. The magnitude of the coefficients for each body system is presented. Age has a large impact (coefficient, β = −1572, standard t-statistic, t = −199, p < 0.001). Several major diagnostic categories also have a large impact independent of the effects of aging; the largest are health status (β = −1075, t = −92, p < 0.001), endocrine (β = −1046, t = −87, p < 0.001), hepatobiliary (β = −1043, t = −72, p < 0.001), kidney (β = −1125, t = −111, p < 0.001), and respiratory diseases (β = −1151, t = −89, p < 0.001). The coefficient in the regression is interpreted as the change in the average number of days until anemia is diagnosed if a patient has the worst disease within the body system. Thus going from normal to a worst-case disease in the endocrine category hastens the arrival of anemia by 1,046/365 = 2.87 years. Similarly, changing from normal to the worst disease in the hepatobiliary, kidney, and respiratory diagnostic categories hastens the diagnosis of anemia by 2.87, 3.08, and 3.15 years, respectively. Although the impact was less, neoplastic (β = −772, t = −33, p < 0.001) and nervous (β = −769, t = −60, p < 0.001) diseases also hasten the diagnosis of anemia by 2.11 and 2.10 years, respectively. All other categories shortened the time to the diagnosis of anemia by less than 1 year. The regression equation explains 20% of variation in days to anemia for more than 800,000 cases.

Table 4.

Impact of body systems on anemia-free days

Variable	Estimate	Standard error	Z-value	Pr(>|z|)
Intercept	2724.898	6.682	407.789	<2e-16^***
Age	−1572.916	7.869	−199.875	<2e-16^***
Male	112.684	5.979	18.846	<2e-16^***
Other diagnoses	−299.026	11.056	−27.047	<2e-16^***
Burns	−143.034	88.049	−1.624	0.104
Circulatory	−431.312	9.438	−45.699	<2e-16^***
Digestive	−302.583	10.242	−29.544	<2e-16^***
Ear, nose, mouth, throat	−301.394	4.716	−63.907	<2e-16^***
Endocrine	−1046.295	11.986	−87.293	<2e-16^***
Eye	−74.117	12.946	−5.725	1.03e-08^***
Female reproductive	−236.808	49.303	−4.803	1.56e-06^***
Health status	−1075.756	11.662	−92.246	<2e-16^***
Hepatobiliary	−1043.726	14.443	−72.263	<2e-16^***
HIV	−229.717	19.654	−11.688	<2e-16^***
Infectious	−354.703	9.797	−36.204	<2e-16^***
Injury	136.811	32.4	4.223	2.42e-05^***
Kidney	−1125.864	10.138	−111.056	<2e-16^***
Male reproductive	68.722	6.440	10.671	<2e-16^***
Mental illness	−349.4	9.450	−36.972	<2e-16^***
Musculoskeletal	−300.034	10.578	−28.365	<2e-16^***
Neoplasia	−772.983	23.720	−32.587	<2e-16^***
Nervous	−769.263	12.817	−60.021	<2e-16^***
Newborns	121.549	176.125	0.690	0.490
Prenatal	160.932	96.667	1.665	0.096
Respiratory	−1151.210	12.854	−89.560	<2e-16^***
Skin	−386.245	10.346	−37.331	<2e-16^***
Substance use	−72.343	5.781	−12.514	<2e-16^***

Significance codes: 0 “^***”.

Residual standard error: 1027 on 809033 degrees of freedom.

Multiple R squared: 0.1971, Adjusted R squared: 0.1971, F statistic 7640 on 26 and 809033 degrees of freedom p value <2.2e-16.

The accuracy of the model was tested on a single holdout set consisting of 20% of the data specifically set aside for validation testing. The area under the receiver operating characteristic (ROC; Fig. 2) curve for the prediction of the development of anemia is 0.751 (confidence interval [CI] 74.95%–75.26%). This was calculated as follows: For each patient, the predicted number of anemia-free days was estimated from the regression model. The actual number was obtained from the data. The ROC curve was generated by varying the threshold for predicted time to onset of anemia. A given threshold T corresponds to a single point on the curve, where the x-axis denotes the false positive rate (fraction of patients who are predicted to get anemia within T days but do not have anemia at this point) and the y-axis denotes the true positive rate (fraction of patients who are predicted to get anemia within T days and do so).

FIG. 2. ROC curve for the predictive index.

Use of the anemia risk index

To use the index to predict the number of days to the development of anemia for a specific patient, first score the single attributes associated with the patient's age, gender, and major disease categories, and then use the additive or the multiplicative parameters to obtain one overall score for the patient. For example, consider a 70-year-old male patient with “streptococcal septicemia” and “secondary malignant neoplasm of liver.” Age 70 corresponds to a standardized score of 0.5 and male gender corresponds to a score of 1. Streptococcal septicemia corresponds to a score of 0.4 in the major disease category of infectious and parasitic diseases. Secondary malignant neoplasm of liver corresponds to a score of 0.7 in the hepatobiliary and pancreas category. All other body systems have no relevant diagnoses, so these are scored as 0 and can be ignored. The predicted number of anemia-free days for this patient is

\begin{align*}
\text{Predicted anemia-free days} &= \text{Intercept} + \beta_{age}\,\text{age} + \beta_{mal}\,\text{male} + \beta_{Inf}\,\text{infectious} + \beta_{hep}\,\text{hepatobiliary} \\
&= 2725 - 1572 \times 0.5 - 112 \times 1.0 - 354 \times 0.4 - 1043 \times 0.7 \\
&\approx 955 \ \text{days}
\end{align*}

For this patient, the model predicts that anemia will occur 955 days from baseline, the point at which the patient's medical history was assessed. The smaller the number, the sooner anemia is predicted to occur. A negative prediction indicates that anemia is predicted to have occurred before the baseline diagnoses.
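The worked example can be reproduced with a few lines of code. The sketch below uses the rounded coefficients quoted in the example (intercept 2725, age −1572, male −112, infectious −354, hepatobiliary −1043); the names `INTERCEPT`, `COEFFICIENTS`, and `predicted_anemia_free_days` are illustrative and do not come from the study's software.

```python
# Rounded coefficients as quoted in the worked example in the text.
INTERCEPT = 2725.0
COEFFICIENTS = {
    "age": -1572.0,
    "male": -112.0,
    "infectious": -354.0,
    "hepatobiliary": -1043.0,
}


def predicted_anemia_free_days(scores):
    """Additive index: intercept plus coefficient times score for each attribute.

    Attributes that are absent from `scores` are treated as scored 0.
    """
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in scores.items())


# 70-year-old male with streptococcal septicemia and secondary liver malignancy.
patient = {"age": 0.5, "male": 1.0, "infectious": 0.4, "hepatobiliary": 0.7}
days = predicted_anemia_free_days(patient)
print(round(days))  # 955
```

A patient with no scored attributes would receive the intercept alone, that is, 2725 predicted anemia-free days.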

Discussion

We created a prediction model for the development of anemia using a very large, nationally representative database of U.S. veterans. Anemia was positively associated with age, which is consistent with findings of a higher prevalence in older adults from other large studies.¹⁶ The etiologies associated with the quickest onset of anemia in this prediction model were respiratory, kidney, endocrine, and hepatobiliary diseases. Almost all diagnostic categories have a statistically significant relationship with the number of anemia-free days, with the exception of burns, newborns, and prenatal visits. It was surprising that so many diagnostic categories made a significant contribution to the emergence of anemia. This is likely because of the size of the database.¹⁷ In massive databases, even small effect sizes can be statistically significant. Age, health status, and diseases of the respiratory, kidney, endocrine, and hepatobiliary systems all had large effect sizes.

Few studies have looked at the time to development of anemia, but all diagnoses have biological plausibility in their contributions to the development of anemia.^18–20 The differential in time courses for the development of anemia for the various disease states may be reflective of different or multiple mechanisms, including blood loss, dampening of hormonally trophic environments as with erythropoietin or thyroid hormone deficiency,²¹ hypoxia-sensing abnormalities,^22,23 increased inflammatory markers and hepcidin levels,²⁴ and nutritional deficiencies.²⁵

This article provides a method for early detection of anemia. The model's area under the ROC curve was 0.751 (CI 74.95%–75.26%). This accuracy level is similar to 0.79 for Centor's clinical prediction rule for streptococcal pharyngitis,²⁶ 0.62 to 0.86 for prostate-specific antigen testing,²⁷ and 0.67 to 0.82 for screening mammography.²⁸

Early recognition of anemia is particularly important given the association between anemia and adverse health outcomes. Anemia has been linked with increased mortality, functional and cognitive decline, and hospitalization.^29–32 Directing targeted treatment to the underlying pathophysiology may ameliorate these outcomes. Therefore, our study adds to the literature in that clinicians will be able to use this model to anticipate the potential development of anemia, and where appropriate, institute closer monitoring, or take preventative or therapeutic action.

Our study has a number of notable strengths. A considerable strength is the large sample size. The sample is also representative of veterans across the geographic United States. The large sample size and multiple sites improve internal data reliability. Another strength is that the model examines risk across all encounters and allows for progression of risk factors during the baseline period.

Although the index performs well at predicting anemia, the study has a number of limitations. The accuracy of the model could be improved. Some of the factors that may affect the accuracy include limitations inherent to the retrospective study design, including the use of the electronic health record. For example, anemia could have been defined more precisely using detailed laboratory data, which were not available for most patients. The use of diagnostic codes to define anemia and comorbid conditions may have resulted in an underestimation of their prevalence. Previous studies have shown assignment of diseases by ICD-9 codes to have accuracy between 72% and 83% in principal and comorbid diagnosis extraction.³³ However, the key laboratory variable used for cohort definition (e.g., hemoglobin) is part of routine panels that are measured in most patients receiving healthcare, and therefore it is unlikely that a significant proportion of actively enrolled veterans would have been excluded.

Another limitation to consider when using data from electronic health records is the possible confounding that occurs. In particular, we used data obtained during the course of clinical practice (without randomization) and, therefore, selection bias is possible. The regression adjusts for a significant number of potential confounders, but we cannot rule out the presence of residual confounding.

The study population consisted mostly of male patients; hence, the results may not apply to female patients. For instance, there were insufficient cases in the database to add female reproductive disorders as a major diagnostic category to the overall prediction model. Salive et al. found that age was significantly associated with anemia, with a stronger effect in men than in women, suggesting an interaction between age and gender.³⁴ In patients with myocardial infarction, Tsujita et al. found that mortality was significantly higher in men with versus without anemia (4.6% vs. 1.8% at 30 days, p = 0.003; 8.9% vs. 3.0% at 1 year, p < 0.0001), but not in women (5.3% vs. 3.6% at 30 days, p = 0.42; 7.5% vs. 5.9% at 1 year, p = 0.54).³⁵ Therefore, our model may not accurately predict onset of anemia in female patients.

Conclusions

This article proposes a two-step model for predictive medicine. In the first step, the risk factors are assessed using maximum risk during the baseline period within each diagnostic category. In the second step, the hazard associated with each risk factor is assessed. Despite the limitations noted, we have shown that this two-step model has relatively high accuracy on a holdout validation set. The similarity of the accuracy of the proposed approach to screening tools currently in clinical practice suggests that the model may be used to predict the development of anemia within electronic health records. Future studies are needed to verify the accuracy of the prognostic index in a prospective design and in clinical settings.

Footnotes

Acknowledgment

This material is the result of work supported with resources and the use of facilities at the Veterans Affairs Medical Center in Washington, DC.

Author Disclosure Statement

M.G.T., F.A., J.F.S., and C.H. have no real or potential commercial associations that might create a conflict of interest in connection with this article.

Cite this article as: Tuck MG, Alemi F, Shortle JF, Avramovic S, Hesdorffer C (2017) A comprehensive index for predicting risk of anemia from patients' diagnoses. Big Data 5:1, 42–52, DOI: 10.1089/big.2016.0073.

Appendix 1

The following International Classification of Diseases, Ninth Revision (ICD-9) codes were used to identify anemia in the data.

249.00 250.00 250.01 790.2 790.21 790.22 790.29 791.5 791.6 V45.85 V53.91 V65.46 249.01 249.10 249.11 249.20 249.21 249.30 249.31 249.40 249.41 249.50 249.51 249.60 249.61 249.70 249.71 249.80 249.81 249.90 249.91 250.02 250.03 250.10 250.11 250.12 250.13 250.20 250.21 250.22 250.23 250.30 250.31 250.32 250.33 250.40 250.41 250.42 250.43 250.50 250.51 250.52 250.53 250.60 250.61 250.62 250.63 250.70 250.71 250.72 250.73 250.80 250.81 250.82 250.83 250.90 250.91 250.92 250.93

Appendix 2

To help with understanding of our method of calculating discrete hazard rates, note the example in Table 5. The data are right censored, which can occur if a patient dies before onset of anemia or the last recorded doctor visit occurs before first anemia.

In this example, patient 1 was diagnosed with anemia 5 days after first occurrence of hypertension. Similarly, patient 3 was diagnosed with anemia 3 days after first report of hypertension. Patient 2's last visit occurred before a diagnosis of anemia—either because the patient died or the recorded doctor visits end. Patient 2 has 8 days (0 to 7) of recorded data without getting anemia starting from the first day of diagnosis with hypertension. Similarly, patient n has 6 days of recorded data without a diagnosis of anemia.

To define notation, let m_i be the index of the last day of data for patient i. If the patient is diagnosed with anemia, then m_i is the number of days until a diagnosis of anemia (e.g., m₁ = 5 in the example); otherwise m_i is the index of the last day of data (e.g., m₂ = 7 in the example). Let x_ij ≡ 1 if patient i is first diagnosed with anemia on day j and 0 otherwise, where j = 0, 1, 2, …, m_i. Note that x_ij = 0 for j = 0, 1, …, m_i−1 and x_ij can be either 1 or 0 for j = m_i. That is, each row contains all 0's except for the last element, which can be either 0 or 1. Let

\begin{align*} y_i \equiv \sum_{j=0}^{m_i} x_{ij}. \quad\quad\quad {\rm Eq.}\ ({\rm B}.1) \end{align*}

That is, y_i = 1 if patient i is diagnosed with anemia, and 0 otherwise.

From these data, the daily hazard rate can be estimated. Let h(j) be the hazard rate of an anemia diagnosis on day j, that is, the probability that a patient is diagnosed with anemia on day j, given that the patient is alive on day j and has not yet been diagnosed with anemia on day j−1 (in this discussion, “day j” refers to the jth day since the first diagnosis of hypertension for a particular patient). An estimate for h(j) is the number of 1's in column j divided by the number of elements in column j. This can be quantified as follows. The number of elements in column j is

\begin{align*} n_j \equiv \sum_{i=1}^{m} 1_{\{m_i \ge j\}}, \quad\quad\quad {\rm Eq.}\ ({\rm B}.2) \end{align*}

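where 1_{m_i ≥ j} is an indicator function equal to 1 if m_i ≥ j and equal to 0 otherwise. An estimate for h(j) is then

\begin{align*} \hat h(j) \equiv \frac{\sum_{i=1}^{m} x_{ij}}{n_j}, \quad\quad\quad {\rm Eq.}\ ({\rm B}.3) \end{align*}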

where it is assumed that x_ij = 0 for j > m_i; that is, the gray boxes in Table 5 count as 0's in the sum.

Now, if we make the additional assumption that the hazard rate is constant as a function of time, h(j) = h, a constant, for all j, then an estimate for the constant hazard rate is the total number of 1's in the data divided by the total number of data points. The total hazard rate can be computed in two ways, either by summing the data elements row wise or by summing column wise. Using a row-wise sum, an estimate for the hazard rate is

\begin{align*} \hat h \equiv \frac{\sum_{i=1}^{m} y_i}{\sum_{i=1}^{m} (m_i + 1)}. \quad\quad\quad {\rm Eq.}\ ({\rm B}.4) \end{align*}

In the previous example, using the data for patients 1–3, the numerator is 2 (the number of patients who are diagnosed with anemia) and the denominator is 6 + 8 + 4 = 18 (the sum of the number of data points in each row). Alternatively, the hazard rate can be estimated by summing column wise:

\begin{align*} \hat h \equiv \frac{\sum_{j=0}^{T} \sum_{i=1}^{m} x_{ij}}{\sum_{j=0}^{T} n_j}, \quad\quad\quad {\rm Eq.}\ ({\rm B}.5) \end{align*}

where T is the maximum column index over all the rows (T = max_i{m_i}). In the example given in Table 5 (using only data for patients 1–3), the numerator is 2 and the denominator is 3 + 3 + 3 + 3 + 2 + 2 + 1 + 1 = 18.
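The hazard estimates in this appendix can be checked numerically on the three-patient example. The sketch below is an illustration under stated assumptions: each patient is stored as a hypothetical pair (m_i, y_i), where m_i is the index of the last day of data and y_i is 1 if anemia was diagnosed on that day, and the function names are not taken from the study's code.

```python
# Patients 1-3 from the example: (m_i, y_i).
# Patient 1: anemia on day 5; patient 2: censored after day 7; patient 3: anemia on day 3.
patients = [(5, 1), (7, 0), (3, 1)]


def column_counts(data):
    """n_j: number of patients with data on day j, for j = 0..T."""
    last_day = max(m for m, _ in data)
    return [sum(1 for m, _ in data if m >= j) for j in range(last_day + 1)]


def column_events(data):
    """Number of first anemia diagnoses on day j, for j = 0..T."""
    last_day = max(m for m, _ in data)
    return [sum(1 for m, y in data if y == 1 and m == j) for j in range(last_day + 1)]


def daily_hazard(data):
    """Estimated h(j): diagnoses on day j divided by patients at risk on day j."""
    return [e / n for e, n in zip(column_events(data), column_counts(data))]


def constant_hazard_row_wise(data):
    """Total diagnoses divided by the sum of row lengths (m_i + 1)."""
    return sum(y for _, y in data) / sum(m + 1 for m, _ in data)


def constant_hazard_column_wise(data):
    """Total diagnoses divided by the sum of column counts n_j."""
    return sum(column_events(data)) / sum(column_counts(data))


print(column_counts(patients))            # [3, 3, 3, 3, 2, 2, 1, 1]
print(constant_hazard_row_wise(patients))  # 2/18
```

Both constant-hazard estimates divide the same two diagnoses by the same 18 data points, so they agree, as the row-wise and column-wise sums in the text require.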

References

1. Cruz, Wishart. Applications of machine learning in cancer prediction and prognosis. Cancer Inform. 2007; 2:59–77.

2. PubMed. 2017. Available online at www.ncbi.nlm.nih.gov/pubmed/ (last accessed February 27, 2017).

3. Guralnik, Eisenstaedt, Ferrucci, et al. Prevalence of anemia in persons 65 years and older in the United States: Evidence for a high rate of unexplained anemia. Blood. 2004; 104:2263–2268.

4. Beghe, Wilson, Ershler. Prevalence and outcomes of anemia in geriatrics: A systematic review of the literature. Am J Med. 2004; 116 Suppl 7A:3S–10S.

5. Martinez-Martinez, Escandell-Montero, Barbieri, et al. Prediction of the hemoglobin level in hemodialysis patients using machine learning techniques. Comput Methods Programs Biomed. 2014; 117:208–217.

6. Ogawa, Furusyo, Nakamuta, et al. Clinical milestones for the prediction of severe anemia by chronic hepatitis C patients receiving telaprevir-based triple therapy. J Hepatol. 2013; 59:667–674.

7. Dranitsaris, Clemons, Verma, et al. Chemotherapy-induced anaemia during adjuvant treatment for breast cancer: Development of a prediction model. Lancet Oncol. 2005; 6:856–863.

8. Chen, Peng, Lai, et al. An index to predict ribavirin-induced anemia in Asian patients with chronic genotype 1 hepatitis C. Hepat Mon. 2015; 15:e2714–8.

9. Ampuero, Del Campo, Rojas, et al. Role of ITPA and SLC28A2 genes in the prediction of anaemia associated with protease inhibitor plus ribavirin and peginterferon in hepatitis C treatment. J Clin Virol. 2015; 68:56–60.

10. Allary, Soubirou, Michel, et al. An individual scoring system for the prediction of postpartum anaemia. Ann Fr Anesth Reanim. 2013; 32:e1–e7.

11. Agency for Healthcare Research and Quality. 2012. HCUP CCS fact sheet. Healthcare cost and utilization project (HCUP). Available online at www.hcup-us.ahrq.gov/toolssoftware/ccs/ccsfactsheet.jsp (last accessed November 9, 2015).

12. van Houwelingen, Putter. Comparison of stopped Cox regression with direct methods such as pseudo-values and binomial regression. Lifetime Data Anal. 2015; 21:180–196.

13. Aalen. A model for non-parametric regression analysis of counting processes. In: Klonecki, Kozek, Roskinski (Eds.): Mathematical statistics and probability theory. New York: Springer-Verlag, 1980, pp. 1–25.

14. VA Informatics and Computing Infrastructure (VINCI). 2014. Available online at www.hsrd.research.va.gov/for_researchers/vinci/default.cfm (last accessed November 2015).

15. Hastie, Tibshirani, Friedman. The elements of statistical learning: Data mining, inference, and prediction, 2nd ed. Stanford: Springer, 2010.

16. Patel. Epidemiology of anemia in older adults. Semin Hematol. 2008; 45:210–217.

17. Sinha, Hripcsak, Markatou. Large datasets in biomedicine: A discussion of salient analytic issues. J Am Med Inform Assoc. 2009; 16:759–767.

18. Noskova, Lishchinskaia, Parfenov, et al. Risk of development of clinical and pathogenetic features of anemia on the background of basic therapy of inflammatory bowel disease. Eksp Klin Gastroenterol. 2011; 12–17.

19. Piron, Loo, Gothot, et al. Cessation of intensive treatment with recombinant human erythropoietin is followed by secondary anemia. Blood. 2001; 97:442–448.

20. Aguirre, Juaristi, Alba Alvarez, et al. In vitro and in vivo studies of murine erythropoietic recovery after treatment with cyclophosphamide. Sangre (Barc). 1999; 44:182–187.

21. Artunc, Risler. Serum erythropoietin concentrations and responses to anaemia in patients with or without chronic kidney disease. Nephrol Dial Transplant. 2007; 22:2900–2908.

22. Erslev, Caro, Miller, Silver. Plasma erythropoietin in health and disease. Ann Clin Lab Sci. 1980; 10:250–257.

23. Wang, Semenza. Purification and characterization of hypoxia-inducible factor 1. J Biol Chem. 1995; 270:1230–1237.

24. Nemeth, Rivera, Gabayan, et al. IL-6 mediates hypoferremia of inflammation by inducing the synthesis of the iron regulatory hormone hepcidin. J Clin Invest. 2004; 113:1271–1276.

25. Nutritional anaemias. Report of a WHO scientific group. World Health Organ Tech Rep Ser. 1968; 405:5–37.

26. Centor, Witherspoon, Dalton, et al. The diagnosis of strep throat in adults in the emergency room. Med Decis Making. 1981; 1:239–246.

27. Punglia, D'Amico, Catalona, et al. Effect of verification bias on screening for prostate cancer by measurement of prostate-specific antigen. N Engl J Med. 2003; 349:335–342.

28. Pisano, Gatsonis, Hendrick, et al. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med. 2005; 353:1773–1783.

29. Mindell, Moody, Ali, Hirani. Using longitudinal data from the Health Survey for England to resolve discrepancies in thresholds for haemoglobin in older adults. Br J Haematol. 2013; 160:368–376.

30. Chaves, Semba, Leng, et al. Impact of anemia and cardiovascular disease on frailty status of community-dwelling older women: The Women's Health and Aging Studies I and II. J Gerontol A Biol Sci Med Sci. 2005; 60:729–735.

31. den Elzen, Willems, Westendorp, et al. Effect of anemia and comorbidity on functional status and mortality in old age: Results from the Leiden 85-plus Study. CMAJ. 2009; 181:151–157.

32. Culleton, Manns, Zhang, et al. Impact of anemia on hospitalization and mortality in older adults. Blood. 2006; 107:3841–3846.

33. Zeng, Goryachev, Weiss, et al. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: Evaluation of a natural language processing system. BMC Med Inform Decis Mak. 2006; 6:3–0.

34. Salive, Cornoni-Huntley, Guralnik, et al. Anemia and hemoglobin levels in older persons: Relationship with age, gender, and health status. J Am Geriatr Soc. 1992; 40:489–496.

35. Tsujita, Nikolsky, Lansky, et al. Impact of anemia on clinical outcomes of patients with ST-segment elevation myocardial infarction in relation to gender and adjunctive antithrombotic therapy (from the HORIZONS-AMI trial). Am J Cardiol. 2010; 105:1385–1394.