Multiple Imputation of Missing Race/Ethnicity Information in the National Assisted Reproductive Technology Surveillance System

Abstract

Background:

Missing race/ethnicity data are common in many surveillance systems and registries, which may limit complete and accurate assessments of racial and ethnic disparities. Centers for Disease Control and Prevention's National Assisted Reproductive Technology (ART) Surveillance System (NASS) has a congressional mandate to collect data on all ART cycles performed by fertility clinics in the United States and provides valuable information on ART utilization and treatment outcomes. However, race/ethnicity data are missing for many ART cycles in NASS.

Materials and Methods:

We multiply imputed missing race/ethnicity data using variables from NASS and additional zip code-level race/ethnicity information in U.S. Census data. To evaluate imputed data quality, we generated training data by imposing missing values on known race/ethnicity under the missing at random assumption, imputed the imposed missing values, and examined the relationship between race/ethnicity and the rate of stillbirth per pregnancy.

Results:

The distribution of imputed race/ethnicity was comparable to the reported one, with the largest difference of 0.53% for non-Hispanic Asian. Our imputation procedure was well calibrated and correctly identified 89.91% (standard error = 0.18) of known race/ethnicity values on average in training data. Compared to complete-case analysis, using multiply imputed data reduced bias of parameter estimates (the range of bias for stillbirth per pregnancy across race/ethnicity groups was 0.02%–0.18% for imputed data analysis, versus 0.04%–0.66% for complete-case analysis) and yielded narrower confidence intervals.

Conclusions:

Our results underscore the importance of collecting complete race/ethnicity information for ART surveillance. However, when missingness exists, multiply imputed race/ethnicity can improve the accuracy and precision of health outcomes estimated across racial/ethnic groups.

Introduction

Numerous studies have documented racial/ethnic differences in assisted reproductive technology (ART) utilization and outcomes in the United States.^1–7 ART utilization rates are lower among non-Hispanic (NH) Black, Hispanic, and American Indian/Alaska Native women.^1,2 Moreover, NH Black, Hispanic, and NH Asian women have lower live birth rates following ART compared to NH White women, even after adjusting for covariates such as age, body mass index (BMI), cause of infertility, and number of embryos transferred.^3,4 Other studies have documented racial/ethnic differences in perinatal outcomes of ART treatments, including increased risk of low birthweight and preterm birth.^5,7 However, one common limitation of these studies is the large percentage of missing information on race/ethnicity, which may unfavorably impact analysis to obtain valid statistical inferences.⁸

Missing race/ethnicity is common in many national surveillance systems and registries. The degree of missingness varies across data sources.^9,10 Long et al. reported a range of race/ethnicity missingness from 9% to 45% in Veterans Health Administration registry and survey data based on published articles.⁹ Centers for Disease Control and Prevention's (CDC) National ART Surveillance System (NASS) is not immune from missing race/ethnicity information with race/ethnicity missingness over 30%.^1,6 Missing race/ethnicity information for ART patients limits complete and accurate reporting of racial and ethnic differences in ART access and treatment outcomes.

Traditional complete-case analysis that relies only on subjects without missing race/ethnicity may result in biased estimates because the missing completely at random assumption, that is, the assumption that missing race/ethnicity does not depend on any observed and unobserved data, is commonly violated.⁸ Moreover, removing subjects with missing race/ethnicity may also decrease statistical testing power because of the reduced sample size.¹¹ Therefore, it is necessary to consider different approaches to address race/ethnicity missingness to obtain valid statistical inferences. Multiple imputation (MI) has been proposed as a possible statistical tool to address missing data explicitly and to obtain valid statistical inferences.^12,13 Under the missing at random assumption (MAR), that is, the assumption that missing race/ethnicity only depends on observed data, MI methods impute missing race/ethnicity values using observed variables that are associated with race/ethnicity.⁸

MI generates multiple datasets to reflect the added uncertainty of the missing data, but because all subjects are included in each replicated dataset, the statistical power to detect significant differences generally increases.¹¹ Each dataset is analyzed separately, and final estimates are obtained using common combination rules.¹⁴ MI techniques have been widely used in many applications,^15,16 and they can be implemented with common statistical software (e.g., SAS [SAS Institute, Inc., Cary NC], STATA [StataCorp LLC., College Station TX], SUDAAN [RTI International, Research Triangle NC], R [The Free Software Foundation; https://www.fsf.org/]).
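The combination step can be illustrated with a short sketch. It assumes that the "common combination rules" are the standard rules usually attributed to Rubin (pooled mean, within-imputation variance, between-imputation variance); the function name `pool_rubin` and the input numbers are hypothetical and are not taken from NASS.

```python
import math

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled point estimate
    within = sum(variances) / m                              # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between                   # total variance
    return q_bar, math.sqrt(total)

# Hypothetical estimates from m = 4 imputed datasets (illustrative values only).
est = [0.50, 0.54, 0.52, 0.56]
var = [0.0009, 0.0009, 0.0009, 0.0009]
pooled, pooled_se = pool_rubin(est, var)
print(round(pooled, 2), round(pooled_se, 4))
```

The pooled standard error is larger than any single-dataset standard error whenever the imputed datasets disagree, which is how the added uncertainty of the missing data enters the final estimate.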

The objective of this study is to multiply impute race/ethnicity under the MAR assumption in NASS and evaluate the operating characteristics of estimates including the similarity of distributions, correctly predicted values, and bias of estimated associations between race/ethnicity and ART treatment outcomes that rely on multiply imputed data.^17–19

Materials and Methods

Data sources

Our study comprised data from NASS version 2.0 collected from January 1, 2016, through December 31, 2018, which include information on patient demographics, obstetrical and medical history, parental infertility diagnosis, clinical parameters of the ART procedure, and information regarding resultant pregnancies and births.²⁰ Clinics report patient race as binary information for White, Black or African American, Asian, Native Hawaiian or other Pacific Islander, and American Indian or Alaska Native. Patient ethnicity is reported as Hispanic or NH. Using these variables, we constructed a race/ethnicity categorical variable with seven mutually exclusive groups: NH White, NH Black, NH Asian, Hispanic, NH Native Hawaiian or other Pacific Islander, NH American Indian or Alaska Native, and two or more races (when more than one race is selected).²¹ Race/ethnicity was considered missing if either race or ethnicity was missing. Most patients with missing race/ethnicity (∼98%) had both race and ethnicity missing.
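A minimal sketch of how the seven-category variable could be derived from the binary race indicators and the ethnicity field is given below. The field names and the function `combine_race_ethnicity` are hypothetical, and two rules are assumptions not stated in the text: Hispanic ethnicity takes precedence over any reported race, and a record with no race indicator selected is treated as missing.

```python
RACE_LABELS = {
    "white": "NH White",
    "black": "NH Black",
    "asian": "NH Asian",
    "nhpi": "NH Native Hawaiian or other Pacific Islander",
    "aian": "NH American Indian or Alaska Native",
}

def combine_race_ethnicity(race_flags, hispanic):
    """Return one of seven categories, or None when race/ethnicity is missing.

    race_flags: dict mapping a race key to True/False, or None if not reported.
    hispanic:   True, False, or None if ethnicity was not reported.
    """
    if race_flags is None or hispanic is None:
        return None                      # missing if either component is missing
    if hispanic:
        return "Hispanic"                # assumption: ethnicity takes precedence
    selected = [key for key in RACE_LABELS if race_flags.get(key)]
    if len(selected) > 1:
        return "Two or more races"
    if len(selected) == 1:
        return RACE_LABELS[selected[0]]
    return None                          # assumption: no race selected counts as missing

print(combine_race_ethnicity({"asian": True}, False))
print(combine_race_ethnicity({"white": True, "black": True}, False))
```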

According to the Fertility Clinic Success Rate and Certification Act of 1992 (Public Law No.102-493, October 24, 1992), all U.S. fertility clinics that perform ART are required to report information about each ART cycle to the CDC every year.²² NASS is a deidentified national database, in which every observation represents a cycle, and one patient can have multiple cycles. As a result, patients may have different values for their reported race/ethnicity across cycles. To obtain consistent imputed race/ethnicity values across cycles from the same patient, we imputed race/ethnicity at the patient level.

To improve the imputation model, we linked the U.S. Census (2010) zip code-level population distributions for each of the seven race/ethnicity categories²³ to NASS data through patient's residential zip code.

Imputation variables

It is generally recommended to use as many variables as possible to predict imputed values.²⁴ In this effort, all variables were transformed to patient level. We used the values of the following variables as they were reported earliest, that is, either at the time of the patient's first cycle or the successive cycle(s) if they were not reported at the first cycle: patient's race/ethnicity, patient's and partner's age, patient's weight and height (which were used to compute patient's BMI), any prior pregnancies (yes/no), number of prior pregnancies, number of prior births, and number of prior ART cycles. We converted the following cycle-level binary variables to a single indicator variable if it was reported for any cycle: infertility diagnoses, smoking before treatment, intracytoplasmic sperm injection (ICSI), oocyte/embryo banking, preimplantation genetic testing (PGT), and stillbirth.

Partner's race was used if it was ever reported; otherwise, if donor sperm was used, we recorded the donor's race. For the following variables, we used the largest value ever reported: number of eggs retrieved, number of embryos transferred, infant birthweight, and number of infants born. Pregnancy outcome was consolidated across cycles using the following hierarchy: multiple birth, singleton birth, miscarriage, transferred but results unknown, not transferred, egg/embryo banking or transfer unknown. Cycle outcome was consolidated based on the following hierarchy: term birth (≥37 weeks), late preterm birth (32–36 weeks), early preterm birth (28–31 weeks), very early preterm birth (<28 weeks), miscarriage, not pregnant, and no transfer.
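The four consolidation rules described above (earliest reported value, ever reported, largest value, and hierarchy) can be sketched as small helper functions. The record layout and function names are hypothetical, the cycles are assumed to be sorted by date, and the hierarchy is assumed to give precedence to the category listed first.

```python
def first_reported(cycles, key):
    """Earliest non-missing value across a patient's cycles (sorted by date)."""
    for cycle in cycles:
        if cycle.get(key) is not None:
            return cycle[key]
    return None

def ever_reported(cycles, key):
    """1 if a binary variable was reported as true for any cycle, else 0."""
    return int(any(cycle.get(key) for cycle in cycles))

def largest_reported(cycles, key):
    """Largest non-missing value ever reported, or None if never reported."""
    values = [cycle[key] for cycle in cycles if cycle.get(key) is not None]
    return max(values) if values else None

def by_hierarchy(cycles, key, hierarchy):
    """First category in `hierarchy` observed in any cycle (assumed precedence)."""
    seen = {cycle.get(key) for cycle in cycles}
    for level in hierarchy:
        if level in seen:
            return level
    return None

PREGNANCY_HIERARCHY = [
    "multiple birth", "singleton birth", "miscarriage",
    "transferred but results unknown", "not transferred",
    "egg/embryo banking or transfer unknown",
]

# Hypothetical patient with two cycles.
cycles = [
    {"age": None, "icsi": 0, "eggs": 8, "outcome": "miscarriage"},
    {"age": 34, "icsi": 1, "eggs": 12, "outcome": "singleton birth"},
]
patient = {
    "age": first_reported(cycles, "age"),
    "icsi": ever_reported(cycles, "icsi"),
    "eggs": largest_reported(cycles, "eggs"),
    "outcome": by_hierarchy(cycles, "outcome", PREGNANCY_HIERARCHY),
}
print(patient)
```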

To build the imputation model, we selected variables for inclusion in the model using the strategy proposed by van Buuren and Groothuis-Oudshoorn.²⁵ In short, we included variables that could be used to examine associations between race/ethnicity and clinical outcomes. In addition, we included variables that were correlated with either race/ethnicity missingness or reported race/ethnicity. χ² tests were used to identify variables that are marginally correlated with either race/ethnicity missingness or reported race/ethnicity. Variables that are significantly associated with the race/ethnicity missingness or reported race/ethnicity (with p-values <0.0001) were included as predictors in the imputation model.

Variables with missingness above 45% were excluded from the imputation procedure. This strategy resulted in 32 variables that were included in the imputation model. As suggested by Silva et al.,¹⁹ we also included nine additional variables in the imputation model: patients' resident state as well as the 2010 U.S. Census reported proportions of Whites, Blacks, Asians, Latinos, Native Hawaiian/other Pacific Islanders, American Indians/Alaska Natives, other race, and mixed race in the patients' zip code.
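The screening logic can be illustrated for the simplest case, a binary covariate cross-tabulated against the missingness indicator (a 2×2 table, 1 degree of freedom). This is only a sketch under that simplifying assumption: the actual tests against the seven-level race/ethnicity variable have more degrees of freedom, and the function names and counts below are hypothetical.

```python
import math

def chi2_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 df)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def keep_as_predictor(table, frac_missing, alpha=1e-4, max_missing=0.45):
    """Apply the two screening rules: missingness cap, then significance."""
    if frac_missing > max_missing:
        return False
    _, p_value = chi2_2x2(table)
    return p_value < alpha

# Hypothetical counts: rows = covariate yes/no, columns = race/ethnicity missing yes/no.
table = [[30, 10], [10, 30]]
print(chi2_2x2(table)[0], keep_as_predictor(table, frac_missing=0.10))
```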

Imputation procedures

The MI models were implemented using SAS PROC MI with the fully conditional specification (FCS) procedure (SAS Institute, Inc., 2015). Besides race/ethnicity, the following covariates also had a high proportion of missing values: BMI (14.63%), smoking before treatment (11.05%), sperm source (24.38%), sperm source race/ethnicity (27.67%), partner age in years (39.23%), ICSI (17.73%), and PGT (10.08%). The FCS procedure can impute missing values for race/ethnicity and for the covariates listed above simultaneously. The discriminant function method (a generalization of Fisher's linear discriminant method²⁶) was used for imputing nominal categorical variables, and linear regression models were used to impute continuous variables.²⁷ To address the possibility that some of the conditional models for missing variables are complex, we supplemented the imputation models with all two-way interactions between the selected 32 variables that were significantly associated with race/ethnicity (with p-values <0.05).

Each missing variable was imputed 20 times, resulting in 20 complete datasets. After imputation of the missing data, we compared the race/ethnicity distributions of complete-case or only reported race/ethnicity data (denoted as dataset R), imputed race/ethnicity data (denoted as dataset I), and reported and imputed race/ethnicity data (denoted as dataset R+I).

Imputed data evaluation

To evaluate the performance of the race/ethnicity imputation procedure, we created a patient-level training dataset with nearly complete race/ethnicity data. The race/ethnicity missingness varied by clinic and the median race/ethnicity missingness across all 490 ART clinics (total 413,025 patients) in NASS was 11.43% with a range of 0% (no missing) to 100% (all missing). The training dataset included 244 clinics with reported race/ethnicity completeness rates above 88.57% (i.e., the missingness <11.43%). These 244 clinics comprised 138,384 patients (33.50% of the total patient population). Of these, 6,182 patients with missing information on race/ethnicity were excluded. This resulted in a final training dataset that comprised 132,202 patients (95.53% of 138,384) with reported race/ethnicity (known race/ethnicity data, denoted as dataset K^t, where superscript “t” denotes training data).
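The construction of the training dataset can be sketched as follows: compute each clinic's race/ethnicity missingness, keep clinics below the median, and then drop the remaining patients with missing race/ethnicity. The record layout, the function name `build_training_set`, and the toy data are hypothetical.

```python
from statistics import median

def build_training_set(patients):
    """patients: list of dicts with 'clinic' and 'race_eth' (None when missing)."""
    by_clinic = {}
    for patient in patients:
        by_clinic.setdefault(patient["clinic"], []).append(patient)
    # Clinic-level proportion of patients with missing race/ethnicity.
    missingness = {
        clinic: sum(p["race_eth"] is None for p in group) / len(group)
        for clinic, group in by_clinic.items()
    }
    cutoff = median(missingness.values())
    kept_clinics = {c for c, rate in missingness.items() if rate < cutoff}
    training = [
        p for p in patients
        if p["clinic"] in kept_clinics and p["race_eth"] is not None
    ]
    return training, cutoff

def make_clinic(name, n_known, n_missing):
    known = [{"clinic": name, "race_eth": "NH White"} for _ in range(n_known)]
    missing = [{"clinic": name, "race_eth": None} for _ in range(n_missing)]
    return known + missing

# Toy data: four clinics with missingness 0%, 25%, 50%, and 100%.
toy = (make_clinic("A", 4, 0) + make_clinic("B", 3, 1)
       + make_clinic("C", 2, 2) + make_clinic("D", 0, 4))
training, cutoff = build_training_set(toy)
print(len(training), cutoff)
```

In the toy data the median missingness is 37.5%, so clinics A and B are retained and the one patient with missing race/ethnicity in clinic B is excluded.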

Next, to mimic the missingness pattern of the original NASS dataset, we removed race/ethnicity information for ∼32% of the patients in the training dataset (complete-case data). The process to sample a training dataset was repeated 50 times. For each of the 50 replications, selection of patients with removed race/ethnicity was based on a model for the probability of missing race/ethnicity developed in the original data, assuming MAR missingness.

Formally, let M_i be an indicator that is equal to 1 if race/ethnicity is missing for patient i and equal to 0 otherwise. In addition, let X_i be the vector of all the covariates used in the imputation model for patient i. We estimated $P(M_i = 1 \mid X_i)$ using a logistic regression model, $\operatorname{logit} P(M_i = 1 \mid X_i) = \beta_0 + X_i^{\top}\beta$, where $\beta_0$ and $\beta$ are a set of unknown parameters. For patient i, we estimated the predicted probability of missing race/ethnicity based on the maximum likelihood estimates of $\beta_0$ and $\beta$, and using these estimates, we calculated $\hat{p}_i = \hat{P}(M_i = 1 \mid X_i)$, and independently sampled M_i from a Bernoulli distribution with probability $\hat{p}_i$ to decide if an individual had missing race/ethnicity. This resulted in an average of 42,450 patients (32.10% of 132,202) across the 50 replications that were sampled to have missing race/ethnicity information. Within each of the 50 replications, the fully conditional imputation algorithm was used to generate plausible values for the imposed missing race/ethnicity values (denoted as dataset I^t).
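The sampling step can be sketched in a few lines, assuming the logistic model has already been fitted. The coefficient values, the names `missing_probability` and `impose_missingness`, and the simulated covariates are all hypothetical; only the mechanism (a predicted probability per patient followed by an independent Bernoulli draw) follows the description above.

```python
import math
import random

def missing_probability(x, beta0, beta):
    """Predicted probability of missingness from a fitted logistic model."""
    eta = beta0 + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))

def impose_missingness(covariates, beta0, beta, rng):
    """Draw one Bernoulli missingness indicator per patient."""
    return [
        1 if rng.random() < missing_probability(x, beta0, beta) else 0
        for x in covariates
    ]

rng = random.Random(2018)
# Hypothetical covariates and coefficients; with beta = 0 every patient
# has the same probability, here set to 0.32 through the intercept.
covariates = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20000)]
beta0 = math.log(0.32 / 0.68)
flags = impose_missingness(covariates, beta0, [0.0, 0.0], rng)
print(sum(flags) / len(flags))
```

With nonzero coefficients the probability varies with the observed covariates, which is what makes the imposed missingness MAR rather than missing completely at random.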

To examine the calibration of the imputation procedure under MAR, race/ethnicity distributions were compared between the training dataset where reported missing race/ethnicity values were artificially imposed and then imputed (denoted as dataset R + I^t) and the known race/ethnicity values (dataset K^t).

We also evaluated the performance of the imputed data in an analysis of stillbirth rates by race/ethnicity. For this analysis, we used a subsample of the training dataset with cycles that resulted in pregnancy. Following a procedure similar to that adopted by Zhang et al.,²⁸ we replicated the evaluation process 50 times and conducted 20 imputations for each replicate. Within each replicate, we compared the estimates of stillbirths per pregnancy as well as risk ratios using a logistic regression model, in which race/ethnicity was the independent variable. We implemented this analysis with SUDAAN's PROC RLOGISTIC under the adjusted risk ratio option, where NH White is the reference for each racial/ethnic group.

The analysis was performed on the known race/ethnicity data (denoted as dataset K^ct, where superscript “ct” denotes pregnant cycles in training data), only reported race/ethnicity data with artificially imposed missing values excluded (denoted as dataset R^ct), imputed race/ethnicity data (denoted as dataset I^ct), and reported and imputed race/ethnicity data (denoted as dataset R + I^ct). The known race/ethnicity data (dataset K^ct) were used as the gold standard. We used SUDAAN's option MI_COUNT = 20 in the PROC statement to obtain combined estimates across 20 imputed datasets in each replicate, and averaged the point estimates, standard errors (SE), and the upper and lower 95% confidence bounds across the 50 replicates.

In addition, we conducted a sensitivity analysis to assess the impacts of possible violation of the MAR assumption on the quality of imputed data. In this sensitivity analysis, we excluded the variable stillbirth from the imputation model and examined the possible biases when estimating the associations between stillbirth and race/ethnicity.

Results

Missing data pattern by year

Figure 1 shows that the average proportion of patient-level race/ethnicity missingness in NASS across the years was 36.0%, varying from 42.2% in 2004 to 32.1% in 2018.

FIG. 1.

Trends in the proportions of patients with missing race/ethnicity in the U.S. National Assisted Reproductive Technology Surveillance System, 2004–2018.

Imputed and observed race/ethnicity distribution

The associations between race/ethnicity groups as well as the missingness indicator and the covariates used in the imputation model are depicted in Table 1. All associations were statistically significant (p < 0.0001).

Table 1.

Patient and Cycle Characteristics by Patient Race/Ethnicity, U.S. National Assisted Reproductive Technology Surveillance System, 2016–2018

Patient and cycle characteristics and outcomes	Patient race/ethnicity
Patient and cycle characteristics and outcomes	Non-Hispanic White	Non-Hispanic Black	Non-Hispanic Asian	Hispanic or Latino	Native Hawaiian or other Pacific Islander	American Indian or Alaska Native	Two or more races	Missing race/ethnicity
Age, years, mean	35.11	36.99	36.61	36.17	36.28	35.35	35.81	36.28
BMI, kg/m², %
<18.50	1.98	0.65	5.12	1.22	2.42	1.14	1.56	2.26
18.50–24.99	46.21	23.51	56.09	34.47	38.58	36.75	47.84	43.99
25.00–29.99	21.58	29.30	16.67	26.28	22.09	29.49	20.52	19.63
30.00–34.99	10.57	18.70	5.21	13.97	14.37	9.26	9.71	8.88
35.00–39.99	5.61	9.51	1.46	6.11	5.45	6.13	4.88	4.21
≥40.00	2.92	5.26	0.38	3.02	3.03	2.28	2.60	2.14
Missing	11.13	13.07	15.08	14.93	14.07	14.96	12.88	18.88
Any prior pregnancies, %
Yes	52.98	59.89	50.86	57.27	55.37	46.15	55.22	47.95
No	47.02	40.11	49.14	42.73	44.63	53.85	44.78	52.05
No. of prior pregnancies, mean	1.08	1.39	1.02	1.30	1.09	1.00	1.17	1.00
No. of prior births, mean	0.50	0.52	0.38	0.65	0.49	0.46	0.47	0.44
Smoking before treatment, %
Yes	2.08	1.71	1.08	1.60	1.82	1.57	2.13	1.27
No	91.56	92.41	90.19	91.22	95.31	95.87	94.86	79.68
Missing	6.36	5.89	8.74	7.18	2.87	2.56	3.01	19.05
Source of sperm, %
Partner	70.31	69.63	76.26	74.29	72.62	77.36	65.87	67.08
Donor	5.50	7.66	3.79	5.58	4.39	3.70	7.06	5.55
Mixed	0.10	0.11	0.05	0.07	—	—	—	0.13
N/A or missing	24.09	22.60	19.89	20.06	23.00	21.51	26.86	27.24
Sperm source race/ethnicity, %
Non-Hispanic White	73.28	7.54	14.16	19.10	29.05	18.52	40.05	3.40
Non-Hispanic Black	1.03	64.99	0.73	2.50	2.87	4.42	10.75	0.33
Non-Hispanic Asian	1.43	0.70	64.92	1.80	39.18	4.42	16.94	1.46
Hispanic or Latino	1.90	1.47	1.14	56.66	4.24	5.13	4.62	0.51
Other	0.15	0.23	0.11	0.16	—	41.17	1.35	0.04
N/A or missing	22.21	25.07	18.95	19.78	24.51	26.35	26.29	94.26
Partner age, years, %
<25	0.20	0.16	0.07	0.31	—	—	0.26	0.17
25–30	5.82	3.84	2.78	5.74	4.84	4.84	4.73	3.53
31–33	9.67	6.12	6.99	8.32	7.72	6.84	8.21	6.72
34–36	12.14	8.56	12.20	11.27	9.08	14.10	11.90	9.63
37–39	10.63	9.63	12.42	11.33	11.80	12.54	10.75	9.48
40–45	12.54	15.79	18.25	15.35	18.00	13.96	13.92	12.73
>45	6.30	12.08	11.67	8.22	10.74	8.55	7.74	7.54
N/A for donor sperm used	5.01	7.10	3.57	5.25	4.08	4.27	6.34	5.30
Missing	37.69	36.73	32.05	34.21	33.74	34.76	36.16	44.90
Infertility diagnosis,^a %
Diminished ovarian reserve	25.89	32.01	34.39	30.58	29.65	22.51	26.70	28.30
Endometriosis	8.13	6.25	5.96	6.75	11.50	5.98	7.27	5.41
Male infertility	30.90	29.07	24.47	28.71	33.13	25.07	28.05	25.14
Ovulation disorders including PCOS	15.50	11.87	12.21	14.43	15.43	17.66	12.21	11.42
Tubal factor: All	9.49	26.19	9.25	20.98	19.06	11.68	12.21	8.98
Tubal: hydrosalpinx	0.91	3.10	1.24	1.64	2.57	2.71	1.51	0.94
Tubal: ligation	1.37	2.85	0.50	6.68	3.03	2.71	1.97	1.15
Tubal: other	7.37	20.75	7.76	13.03	13.92	6.55	9.04	7.05
Uterine factor	5.18	15.85	7.17	7.89	7.72	7.41	5.87	5.18
Unexplained	11.27	6.26	11.75	8.04	8.17	12.96	9.97	12.81
Other	21.95	20.10	24.97	21.64	20.27	23.93	25.09	22.82
No. of prior ART cycles, mean	0.77	0.70	0.91	0.64	1.13	0.83	0.85	0.78
Total No. of cycles performed, mean	2.11	1.96	2.19	1.94	2.11	2.12	2.11	2.02
No. of eggs retrieved, mean	11.34	10.71	10.47	10.09	9.78	10.89	11.75	10.61
No. of embryos transferred, mean	1.18	1.24	1.02	1.26	1.27	1.38	1.10	1.07
ICSI ever used, %
Yes	71.89	70.17	74.86	72.38	73.22	74.22	71.06	68.17
No	10.79	11.81	9.95	12.51	9.23	7.69	8.42	12.37
Missing	17.32	18.02	15.18	15.11	17.55	18.09	20.52	19.46
Oocyte/embryo banking ever used, %	41.74	35.32	55.69	39.17	41.15	39.60	45.04	49.26
Preimplantation genetic testing used, %
Yes	32.93	21.83	47.80	28.44	28.74	29.63	31.95	40.74
No	57.95	67.08	42.39	63.05	62.03	59.26	53.61	47.89
Missing	9.12	11.09	9.81	8.51	2.23	11.11	14.44	11.37
No. of infants born, %
0	47.21	61.69	57.89	56.33	54.46	58.40	54.08	51.22
1	45.68	31.85	37.28	36.24	37.22	36.18	39.74	42.13
2	6.95	6.36	4.73	7.23	8.17	5.13	6.13	6.50
3+	0.16	0.10	0.09	0.20	—	0.28	—	0.15
Cycle outcome, %
No transfer	16.99	18.51	26.84	18.01	16.94	17.66	22.75	22.44
Not pregnant	23.72	34.92	24.68	30.73	29.50	32.48	23.95	22.59
Miscarriage	0.27	0.43	0.45	0.58	0.76	0.85	0.62	0.37
Very early preterm birth (<28 weeks)	6.87	9.77	6.61	8.17	9.23	7.98	7.90	6.51
Early preterm birth (28–31 weeks)	0.94	1.12	0.79	1.07	1.36	0.85	1.09	0.87
Late preterm birth (32–36 weeks)	7.81	7.24	5.65	7.85	8.32	7.55	7.06	7.18
Term birth (≥37 weeks)	43.40	28.01	34.98	33.59	33.89	32.62	36.62	40.04
Pregnancy outcome, %
Multiple birth	3.11	2.82	1.71	3.32	3.63	2.42	2.18	2.70
Singleton birth	17.66	12.99	11.97	14.75	13.77	15.24	15.53	16.11
Miscarriage	4.20	4.63	3.17	4.51	3.33	4.84	4.42	3.77
Transferred, but results unknown	22.30	26.22	18.08	24.07	23.60	23.36	21.71	19.16
Not transferred	11.48	14.83	11.49	12.64	15.13	13.53	11.38	10.11
Egg/embryo banking or transfer unknown	41.24	38.51	53.57	40.71	40.54	40.60	44.78	48.14
Stillbirth, %	0.32	0.77	0.24	0.46	0.76	0.28	0.57	0.28
Infant birth weight, mean	3231.24	2943.03	3072.91	3084.94	2976.30	3087.54	3179.02	3187.33

Chi-squared tests for race/ethnicity and all variables listed in the table were significant with p-value <0.0001.

^a Infertility diagnosis categories are not mutually exclusive.

ART, assisted reproductive technology; BMI, body mass index; ICSI, intracytoplasmic sperm injection; PCOS, polycystic ovary syndrome.

Table 2 compares distributions of patient race/ethnicity between three datasets in full NASS data: dataset R for only reported data with 34.70% missing race/ethnicity excluded (complete-case analysis), dataset I for imputed race/ethnicity data, and dataset R + I for reported and imputed race/ethnicity data. The differences in proportions of race/ethnicity between patients in dataset R and patients in dataset I ranged from 0.15% to 1.52%. We then compared the racial/ethnic distributions between dataset R and dataset R+I, and differences were smaller, ranging from 0.05% to 0.53%. In both comparisons, the largest differences were observed for NH Asian (1.52% and 0.53%, respectively) and NH White patients (0.83% and 0.29%, respectively).

Table 2.

Distribution of Patient Race/Ethnicity Before and After Multiple Imputation, National Assisted Reproductive Technology Surveillance System, 2016 to 2018

Patient race/ethnicity	Reported race/ethnicity data (dataset R) ^a (N = 269,697 patients), % (95% CI)	Imputed race/ethnicity data (dataset I) ^b (N = 143,328 patients), % (95% CI)	Reported and imputed race/ethnicity data (dataset R+I) ^b (N = 413,025 patients), % (95% CI)
Non-Hispanic White	63.78 (63.59–63.96)	62.95 (62.59–63.32)	63.49 (63.32–63.66)
Non-Hispanic Black	7.56 (7.46–7.66)	7.17 (6.97–7.38)	7.43 (7.33–7.52)
Non-Hispanic Asian	18.76 (18.61–18.91)	20.28 (19.95–20.62)	19.29 (19.14–19.44)
Hispanic or Latino	8.69 (8.58–8.79)	8.15 (7.94–8.37)	8.50 (8.40–8.60)
Native Hawaiian or Other Pacific Islander	0.25 (0.23–0.26)	0.59 (0.54–0.63)	0.36 (0.34–0.38)
American Indian or Alaska Native	0.26 (0.24–0.28)	0.41 (0.33–0.49)	0.31 (0.28–0.34)
Two or more races	0.71 (0.68–0.75)	0.44 (0.29–0.68)	0.62 (0.55–0.69)

^a Records with missing race/ethnicity (34.70%) excluded (complete-case analysis).

^b Results obtained using 20 replicates.

CI, confidence interval.
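As a consistency check on Table 2, the dataset R+I percentages should be close to a sample-size-weighted average of the dataset R and dataset I percentages. The sketch below uses only the values printed in Table 2; the helper name `combine` is hypothetical, and small discrepancies are expected because the published percentages are rounded.

```python
N_R, N_I = 269_697, 143_328   # patients in dataset R and dataset I (Table 2)

reported = {"NH White": 63.78, "NH Black": 7.56, "NH Asian": 18.76,
            "Hispanic": 8.69, "NHPI": 0.25, "AIAN": 0.26, "Two or more": 0.71}
imputed = {"NH White": 62.95, "NH Black": 7.17, "NH Asian": 20.28,
           "Hispanic": 8.15, "NHPI": 0.59, "AIAN": 0.41, "Two or more": 0.44}
published = {"NH White": 63.49, "NH Black": 7.43, "NH Asian": 19.29,
             "Hispanic": 8.50, "NHPI": 0.36, "AIAN": 0.31, "Two or more": 0.62}

def combine(p_reported, p_imputed):
    """Weighted average of two percentages by the number of patients."""
    return (N_R * p_reported + N_I * p_imputed) / (N_R + N_I)

combined = {g: combine(reported[g], imputed[g]) for g in reported}
for group, value in combined.items():
    print(f"{group}: computed {value:.2f}, published {published[group]:.2f}")
```

Because the imputed patients make up about 35% of the total, any difference between dataset R and dataset I shrinks to roughly one third of its size in dataset R+I, which matches the pattern of 1.52% versus 0.53% for NH Asian.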

Evaluation of imputed race/ethnicity

Table 3 describes the results of our model validation analysis using training data: dataset K^t for known race/ethnicity data, dataset R^t for only reported race/ethnicity data with artificially imposed missing values excluded (complete-case analysis), dataset I^t for imputed race/ethnicity data, and dataset R + I^t for reported and imputed race/ethnicity data, where superscript “t” denotes training data. Differences in race/ethnicity proportions between dataset K^t (known race/ethnicity in training data) and dataset R^t (only reported race/ethnicity in training data) ranged from 0.0% to 2.03% (averaged across 50 replicates). The largest differences were observed for patients who were NH White (2.03%) and NH Asian (1.70%). Differences in race/ethnicity proportions between dataset K^t (known race/ethnicity in training data) and dataset I^t (imputed race/ethnicity in training data) ranged from 0.01% (SE = 0.05) to 1.06% (SE = 0.29) (averaged across 50 replicates).

Table 3.

Validation of Race/Ethnicity Distribution Before and After Multiple Imputation Using Training Dataset Obtained from 244 Clinics with Less Than 11% Missingness (a Total of 50 Replicates Performed and Each Replicate Used 20 Imputed Datasets)

Patient race/ethnicity	Training dataset with known race/ethnicity data (dataset K^t) ^a (N = 132,202 patients), % (95% CI)	Training dataset with reported race/ethnicity data (dataset R^t) ^b (N = 89,752 patients), % (95% CI)	Training dataset with imputed race/ethnicity data (dataset I^t) ^c,d (N = 42,450 patients), % (95% CI)	Training dataset with reported and imputed race/ethnicity data (dataset R + I^t) ^d (N = 132,202 patients), % (95% CI)
Non-Hispanic White	64.72 (64.46–64.97)	66.75 (66.43–67.06)	63.66 (63.04–64.28)	65.67 (65.37–65.97)
Non-Hispanic Black	7.61 (7.47–7.76)	7.83 (7.65–8.01)	7.66 (7.42–7.91)	7.77 (7.63–7.92)
Non-Hispanic Asian	16.68 (16.48–16.88)	14.98 (14.75–15.23)	16.69 (16.34–17.04)	15.58 (15.38–15.78)
Hispanic or Latino	9.82 (9.66–9.98)	9.31 (9.11–9.50)	10.02 (9.75–10.29)	9.56 (9.40–9.72)
Native Hawaiian or Other Pacific Islander	0.22 (0.20–0.25)	0.21 (0.18–0.24)	0.21 (0.11–0.42)	0.21 (0.16–0.27)
American Indian or Alaska Native	0.27 (0.24–0.30)	0.23 (0.20–0.27)	0.41 (0.36–0.47)	0.30 (0.27–0.33)
Two or more races	0.69 (0.64–0.73)	0.69 (0.64–0.75)	1.35 (0.97–1.93)	0.92 (0.77–1.11)

^a Each patient in the training dataset K^t has known race/ethnicity; superscript “t” denotes training data.

^b Race/ethnicity of an average of 32% of patients in the training data was imposed as missing; in dataset R^t, patients whose race/ethnicity was imposed as missing were excluded (complete-case analysis).

^c Race/ethnicity of an average of 32% of patients in the training data was imposed as missing and then imputed (dataset I^t).

^d Results obtained using 50 replicates and each replicate used 20 imputed datasets.

Between dataset K^t and dataset I^t, the largest differences were observed for patients who were NH White (1.06%) and two or more races (0.66%). Differences in race/ethnicity proportions between dataset K^t (known race/ethnicity in training data) and dataset R + I^t (reported and imputed race/ethnicity in training data) ranged from 0.01% (SE = 0.02) to 1.10% (SE = 0.05) (averaged across 50 replicates). The largest differences were observed for patients who were NH Asian (1.10%) and NH White (0.95%).

The average proportion of correctly imputed race/ethnicity values compared to the known race/ethnicity values across 50 replicates and 20 imputed datasets was 89.91% (range 89.23%–90.62%) across all race/ethnicity groups, 94.76% (range 94.33%–95.20%) for NH White, 90.78% (range 89.57%–91.88%) for NH Black, 82.13% (range 80.95%–83.50%) for NH Asian, 80.97% (range 79.45%–82.50%) for Hispanic, 33.28% (range 22.39%–45.23%) for NH Native Hawaiian or other Pacific Islander, 75.37% (range 71.33%–81.12%) for NH American Indian or Alaska Native, and 40.03% (range 33.21%–49.94%) for two or more races.
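The proportion of correctly imputed values can be computed overall and within each known race/ethnicity group, as in the sketch below. The function name `proportion_correct` and the toy labels are hypothetical; in the study this quantity was averaged over 50 replicates and 20 imputed datasets.

```python
from collections import defaultdict

def proportion_correct(known, imputed):
    """Overall and per-group share of imputed values that match the known value."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for truth, guess in zip(known, imputed):
        totals[truth] += 1
        hits[truth] += int(truth == guess)
    overall = sum(hits.values()) / len(known)
    by_group = {group: hits[group] / totals[group] for group in totals}
    return overall, by_group

# Toy example with one imputed dataset.
known = ["NH White", "NH White", "NH White", "NH Asian", "NH Asian", "Hispanic"]
imputed = ["NH White", "NH White", "NH Asian", "NH Asian", "NH White", "Hispanic"]
overall, by_group = proportion_correct(known, imputed)
print(overall, by_group)
```

Grouping by the known value explains why small groups can have low accuracy while the overall figure stays high: the overall proportion is dominated by the largest groups.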

Table 4 describes the results of our analysis of stillbirth rates (averaged across 50 replicates) by race/ethnicity in the subgroup of the training dataset with 80,068 cycles that resulted in pregnancy for 4 datasets: dataset K^ct for known cycle race/ethnicity data, dataset R^ct for only reported cycle race/ethnicity data with artificially imposed missing values excluded (complete-case analysis), dataset I^ct for imputed cycle race/ethnicity data, and dataset R + I^ct for reported and imputed cycle race/ethnicity data. The superscript “ct” denotes pregnant cycles obtained from patients in training data. The largest differences in stillbirth rates between dataset K^ct and dataset R^ct (52,038 cycles) were observed for Native Hawaiian or other Pacific Islander (2.35% vs. 1.69%, respectively), and American Indian or Alaska Native (0.52% vs. 0.17%, respectively).

Table 4.

Stillbirth Rates per Pregnancy and Risk Ratios by Race/Ethnicity Before and After Multiple Imputation Using Training Dataset with Cycles Resulting in Pregnancy Obtained from 244 Clinics with Less Than 11% Missingness (a Total of 50 Replicates Performed and Each Replicate Used 20 Imputed Datasets)

Patient race/ethnicity	Training dataset with known race/ethnicity pregnant cycle data (dataset K^ct) ^a (N = 80,068 cycles)		Training dataset with reported race/ethnicity pregnant cycle data (dataset R^ct) ^b (N = 52,038 cycles)		Training dataset with imputed race/ethnicity pregnant cycle data (dataset I^ct) ^c,d (N = 28,030 cycles)		Training dataset with reported and imputed race/ethnicity pregnant cycle data (dataset R + I^ct) ^d (N = 80,068 cycles)
Patient race/ethnicity	Stillbirth % (95% CI)	Stillbirth risk ratio (95% CI, SE)	Stillbirth % (95% CI)	Stillbirth risk ratio (95% CI, SE)	Stillbirth % (95% CI)	Stillbirth risk ratio (95% CI, SE)	Stillbirth % (95% CI)	Stillbirth risk ratio (95% CI, SE)
Non-Hispanic White	0.52 (0.47–0.59)	Reference	0.42 (0.36–0.49)	Reference	0.78 (0.66–0.92)	Reference	0.54 (0.48–0.61)	Reference
Non-Hispanic Black	1.75 (1.42–2.15)	3.34 (2.63–4.24, 0.41)	1.62 (1.24–2.11)	3.88 (2.85–5.28, 0.61)	1.47 (1.01–2.15)	1.89 (1.25–2.86, 0.40)	1.57 (1.26–1.95)	2.90 (2.27–3.71, 0.36)
Non-Hispanic Asian	0.45 (0.34–0.59)	0.85 (0.64–1.15, 0.13)	0.41 (0.28–0.59)	0.97 (0.65–1.46, 0.20)	0.46 (0.28–0.77)	0.60 (0.35–1.02, 0.16)	0.43 (0.32–0.58)	0.79 (0.57–1.09, 0.13)
Hispanic or Latino	0.99 (0.78–1.25)	1.89 (1.46–2.46, 0.25)	0.81 (0.58–1.12)	1.93 (1.34–2.78, 0.36)	1.24 (0.87–1.76)	1.59 (1.08–2.35, 0.32)	0.97 (0.76–1.23)	1.79 (1.37–2.33, 0.24)
Native Hawaiian or Other Pacific Islander	2.35 (0.89–6.10)	4.50 (1.70–11.93, 2.24)	1.69 (0.44–7.05)	4.02 (1.04–17.51, 2.89)	Not shown^e	Not shown^f	2.50 (0.61–10.38)	4.62 (1.14–20.47, 3.32)
American Indian or Alaska Native	0.52 (0.07–3.62)	1.00 (0.14–7.10, 1.00)	0.17 (0.12–5.85)	2.06 (0.29–14.62, 2.06)	Not shown^e	Not shown^f	0.47 (0.07–3.28)	0.87 (0.12–6.20, 0.87)
Two or more races	1.35 (0.64–2.80)	2.58 (1.23–5.43, 0.98)	1.16 (0.42–3.17)	2.76 (1.00–7.74, 1.44)	Not shown^e	Not shown^f	1.21 (0.53–2.75)	2.24 (0.97–5.23, 0.95)

^a Each patient in the subsample of training dataset K^ct has a known race/ethnicity; superscript “ct” denotes pregnant cycles obtained from patients in training data.

^b Race/ethnicity of an average of 32% of patients in the training data was imposed as missing; in dataset R^ct, pregnant cycles from patients whose race/ethnicity was imposed as missing were excluded (complete-case analysis).

^c Race/ethnicity of an average of 32% of patients in the training data was imposed as missing and then imputed; dataset I^ct comprises pregnant cycles from patients whose race/ethnicity was imposed as missing and then imputed.

^d Results obtained using 50 replicates and each replicate used 20 imputed datasets.

^e Cells with values 1–4 of stillborn infants are suppressed to protect confidentiality.

^f Imprecise estimates with SE >5.

SE, standard error.
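Each replicate in Table 4 draws on 20 imputed datasets, so every reported estimate is a pooled quantity. This section does not spell out the pooling step. The sketch below assumes that the standard combining rules of Rubin (reference 14) were applied; the function name `pool_rubin` and the toy inputs are hypothetical and are not NASS values.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool point estimates and squared SEs from m imputed datasets."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance
    return q_bar, math.sqrt(total)


# Hypothetical log risk ratios and squared SEs from m = 3 imputed datasets
log_rr = [1.05, 1.10, 1.00]
se_sq = [0.0144, 0.0150, 0.0139]
pooled, pooled_se = pool_rubin(log_rr, se_sq)
print(round(pooled, 3), round(pooled_se, 3))
```

Ratio measures such as the stillbirth risk ratio are usually pooled on the log scale and exponentiated afterwards, which is why the toy inputs are log risk ratios.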

For stillbirth risk ratios, the largest differences were observed for American Indian or Alaska Native (1.00, 95% CI 0.14–7.10 in dataset K^ct vs. 2.06, 95% CI 0.29–14.62 in dataset R^ct) and for NH Black (3.34, 95% CI 2.63–4.24 in dataset K^ct vs. 3.88, 95% CI 2.85–5.28 in dataset R^ct). In dataset I^ct (28,030 cycles), three race/ethnicity groups (Native Hawaiian or other Pacific Islander, American Indian or Alaska Native, and two or more races) had a small number of stillbirth cases (between 1 and 4), and their estimates were suppressed to protect confidentiality. For the other race/ethnicity groups, the largest differences in stillbirth rates between dataset K^ct (known cycle race/ethnicity in training data) and dataset I^ct (imputed cycle race/ethnicity in training data) were observed for NH Black (1.75% vs. 1.47%, respectively) and NH White (0.52% vs. 0.78%, respectively).

For stillbirth risk ratios, the parameters were imprecisely estimated for three race/ethnicity groups (i.e., Native Hawaiian or other Pacific Islander, American Indian or Alaska Native, and two or more races) because of very small sample sizes and the rarity of the event. For other race/ethnicity groups, the largest differences were observed for NH Black (3.34, 95% CI 2.63–4.24 in dataset K^ct vs. 1.89, 95% CI 1.25–2.86 in dataset I^ct) and for Hispanic or Latino (1.89, 95% CI 1.46–2.46 in dataset K^ct vs. 1.59, 95% CI 1.08–2.35 in dataset I^ct). The largest differences in stillbirth rates per pregnancy between dataset K^ct (known cycle race/ethnicity in training data) and dataset R + I^ct (reported and imputed cycle race/ethnicity data in training data) were observed for NH Black (1.75% vs. 1.57%, respectively) and for the Native Hawaiian or other Pacific Islander group (2.35% vs. 2.50%, respectively).

For stillbirth risk ratios, the largest differences were observed for NH Black (3.34, 95% CI 2.63–4.24, in dataset K^ct vs. 2.90, 95% CI 2.27–3.71 in dataset R + I^ct) and for two or more races (2.58, 95% CI 1.23–5.43 in dataset K^ct vs. 2.24, 95% CI 0.97–5.23 in dataset R + I^ct).
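The way the SE of each risk ratio relates to its confidence interval is not stated in this section. One reading that fits the reported numbers is a Wald interval built on the log scale, with the SE moved back to the ratio scale by the delta method. The sketch below checks that reading against the NH Black row of Table 4; the function `rr_se_from_ci` is a hypothetical helper, and the interval construction is an assumption rather than the documented method.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def rr_se_from_ci(rr, lower, upper):
    """Approximate SE of a risk ratio from its 95% CI, assuming a log-scale Wald interval."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * Z975)
    return rr * se_log  # delta method: SE(RR) is roughly RR * SE(log RR)


# NH Black row of Table 4: (risk ratio, lower limit, upper limit, reported SE)
rows = {
    "K^ct": (3.34, 2.63, 4.24, 0.41),
    "R^ct": (3.88, 2.85, 5.28, 0.61),
    "I^ct": (1.89, 1.25, 2.86, 0.40),
    "R + I^ct": (2.90, 2.27, 3.71, 0.36),
}
for name, (rr, lo, hi, reported) in rows.items():
    print(name, round(rr_se_from_ci(rr, lo, hi), 2), reported)
```

For these four cells the recomputed SE agrees with the reported SE to two decimals, which is consistent with the assumed construction but does not prove it.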

Table 4 also demonstrates that the analysis based on dataset R + I^ct (reported and imputed cycle race/ethnicity data in training data) generally yielded smaller SE and narrower interval estimates than analysis based only on dataset R^ct (only reported cycle race/ethnicity in training data). Furthermore, compared to the analysis based on dataset K^ct (known cycle race/ethnicity in training data), the biases for stillbirth per pregnancy across different race/ethnicity groups were generally smaller for dataset R + I^ct with a range of 0.02%–0.18% than those obtained for dataset R^ct with a range of 0.04%–0.66%, and those obtained for dataset I^ct with a range of 0.01%–0.28%. Similarly, the range of biases for the risk ratio of stillbirth per pregnancy for the different race/ethnicity groups was smaller for dataset R + I^ct with a range of 0.06–0.44 compared to dataset R^ct with a range of 0.04–1.06, and the dataset I^ct with a range of 0.25–1.45.
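The bias ranges quoted above for the risk ratio can be reproduced from Table 4 as absolute differences from the dataset K^ct estimates. The sketch below does this arithmetic; the dictionary layout and the helper `bias_range` are illustrative choices, while the numbers are copied from the table.

```python
# Stillbirth risk ratios from Table 4 (NH White is the reference group and is omitted).
# None marks cells that are not shown for dataset I^ct.
risk_ratio = {
    "NH Black": {"K": 3.34, "R": 3.88, "I": 1.89, "R+I": 2.90},
    "NH Asian": {"K": 0.85, "R": 0.97, "I": 0.60, "R+I": 0.79},
    "Hispanic or Latino": {"K": 1.89, "R": 1.93, "I": 1.59, "R+I": 1.79},
    "Native Hawaiian or Other Pacific Islander": {"K": 4.50, "R": 4.02, "I": None, "R+I": 4.62},
    "American Indian or Alaska Native": {"K": 1.00, "R": 2.06, "I": None, "R+I": 0.87},
    "Two or more races": {"K": 2.58, "R": 2.76, "I": None, "R+I": 2.24},
}


def bias_range(table, dataset, reference="K"):
    """Smallest and largest absolute difference from the reference dataset."""
    diffs = [
        round(abs(row[dataset] - row[reference]), 2)
        for row in table.values()
        if row[dataset] is not None
    ]
    return min(diffs), max(diffs)


for dataset in ("R", "I", "R+I"):
    print(dataset, bias_range(risk_ratio, dataset))
# R (0.04, 1.06)
# I (0.25, 1.45)
# R+I (0.06, 0.44)
```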

The sensitivity analysis with stillbirth excluded from the imputation model showed larger biases in stillbirth per pregnancy between K^ct and R + I^ct for Native Hawaiian or other Pacific Islander (2.35% vs. 1.27%, respectively) and two or more races (1.35% vs. 0.94%, respectively) than the biases observed when stillbirth was included in the imputation model (Table 4). Similarly, the biases in stillbirth risk ratios were larger than those from the imputation model that included stillbirth, both for Native Hawaiian or other Pacific Islander (4.50, 95% CI 1.70–11.93 in dataset K^ct vs. 2.33, 95% CI 0.60–9.99 in dataset R + I^ct) and for two or more races (2.58, 95% CI 1.23–5.43 in dataset K^ct vs. 1.72, 95% CI 0.71–4.23 in dataset R + I^ct).

Discussion

To overcome the large proportion of missing race/ethnicity values in NASS, we used MI to estimate race/ethnicity values. The evaluation and testing of the imputed race/ethnicity information based on simulated data demonstrated a high degree of accuracy and applicability to the analysis of ART outcomes. Imputation was performed at the patient level to avoid inconsistencies in imputed race/ethnicity across ART cycles for the same patient, which occurred in previous efforts performed at the cycle level.¹,²⁹ In addition to including NASS variables in the imputation model, we also included zip code-level information on racial/ethnic population distributions from the 2010 U.S. Census data to improve the model's performance.
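The patient-level approach described above can be pictured as two steps: collapse cycle records into one record per patient before imputation, then copy the single imputed value back onto every cycle of that patient. The sketch below illustrates this under assumed field names (`patient_id`, `age`, `stillbirth`) and assumed aggregates; the actual NASS variables and aggregation rules are not given in this section.

```python
from collections import defaultdict


def to_patient_level(cycles):
    """Collapse cycle records into one record per patient."""
    grouped = defaultdict(list)
    for cycle in cycles:
        grouped[cycle["patient_id"]].append(cycle)
    patients = {}
    for pid, recs in grouped.items():
        patients[pid] = {
            "n_cycles": len(recs),
            "max_age": max(r["age"] for r in recs),
            "any_stillbirth": any(r["stillbirth"] for r in recs),
        }
    return patients


def attach_race(cycles, patient_race):
    """Copy the single patient-level value onto every cycle of that patient."""
    return [{**c, "race_ethnicity": patient_race[c["patient_id"]]} for c in cycles]


# Hypothetical cycle records
cycles = [
    {"patient_id": "p1", "age": 34, "stillbirth": False},
    {"patient_id": "p1", "age": 35, "stillbirth": True},
    {"patient_id": "p2", "age": 41, "stillbirth": False},
]
patients = to_patient_level(cycles)
# Stand-in for one imputed dataset: one value per patient
imputed = {"p1": "Hispanic or Latino", "p2": "NH White"}
cycles_with_race = attach_race(cycles, imputed)
```

Because each patient receives exactly one imputed value per imputed dataset, two cycles from the same patient cannot disagree within that dataset.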

We evaluated the proposed imputation procedure under MAR. Our evaluation showed that this imputation procedure correctly predicted 89.91% of known race/ethnicity values on average across 50 replicates. This is comparable to a similar imputation approach in which the correct prediction rate was ∼81%.³⁰ Accuracy by race/ethnicity group showed that imputation was more accurate for large race/ethnicity groups than for small ones, which may increase the bias of parameter estimates for small race/ethnicity groups when using imputed data. However, analyses combining observed and imputed data generally resulted in smaller biases than analyses using only the observed data.
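The accuracy figure above comes from comparing imputed values with known values that were deliberately masked. A minimal sketch of that bookkeeping follows. It assumes that the correct-imputation rate is averaged over the imputed datasets within a replicate and that the standard error is the standard error of the mean across replicates; neither detail is stated in this section, and all names and toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def correct_rate(known, imputed, masked):
    """Share of masked patients whose imputed category equals the known one."""
    hits = sum(known[i] == imputed[i] for i in masked)
    return hits / len(masked)


def replicate_accuracy(known, imputed_sets, masked):
    """Average the correct-imputation rate over the imputed datasets of one replicate."""
    return mean(correct_rate(known, imp, masked) for imp in imputed_sets)


def summarize(replicate_rates):
    """Mean across replicates and the standard error of that mean."""
    n = len(replicate_rates)
    return mean(replicate_rates), stdev(replicate_rates) / math.sqrt(n)


# Toy replicate: six patients, three of them masked, two imputed datasets
known = ["W", "B", "A", "H", "W", "W"]
masked = [1, 3, 4]
imputed_sets = [
    ["W", "B", "A", "W", "W", "W"],
    ["W", "W", "A", "H", "W", "W"],
]
rate = replicate_accuracy(known, imputed_sets, masked)
avg, se = summarize([0.90, 0.89, 0.91])
```

Only the masked positions enter the rate, so patients whose race/ethnicity was never hidden do not inflate the accuracy.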

In our study, the largest difference across race/ethnicity groups between the observed data and the observed and imputed race/ethnicity data was less than 1% (0.53%). This shows that the distributions of race/ethnicity groups in the observed and the observed and imputed datasets are relatively similar; however, even small differences may influence estimates of certain associations. We examined this by estimating stillbirth rates for each race/ethnicity group using the training data with 32% imposed as missing. Compared to complete-case (only observed) analysis, the observed and imputed race/ethnicity data analysis reduced the bias of estimates of stillbirth rate for each race/ethnicity group except for the NH Black group and yielded narrower confidence intervals (CI). This shows that the analysis using imputed data improves estimates of the association between race/ethnicity and a relevant outcome. These results underscore the importance of collecting complete race/ethnicity information for ART surveillance. However, when missingness exists, using multiply imputed race/ethnicity data has better operating characteristics than complete-case analysis.

This study has several limitations. First, the imputation models and the evaluation method assumed that race/ethnicity is missing at random. If this assumption is violated, the results may be biased in an unknown direction.³¹ In NASS data, the proportion of missing race/ethnicity varies by clinic, with some clinics reporting race/ethnicity for all patients (0% missingness) and some not reporting it for any patients (100% missingness), which could violate the missing at random assumption. When a variable that is important for the analysis is excluded from the imputation model, one may expect larger bias, as shown in our analysis with stillbirth excluded from the imputation model. Thus, following the general rule of thumb to include as many variables as possible in the imputation model is important to reduce possible biases and make the MAR assumption more plausible.³²

Specifically, it is important to include variables that are associated with the missingness mechanisms, the imputed variables, and variables for possible downstream analyses. Another limitation is that data are collected for each cycle, so imputed race/ethnicity may be inconsistent across cycles for the same patient if imputation is performed at the cycle level. To overcome this limitation and reduce computational burden, we transformed cycle-level variables to patient-level variables. However, this transformation may reduce the predictive ability of the model. Future studies could examine whether other forms of aggregation and other variables improve the imputation procedure. In addition, the clinics in the validation sample may not be representative of all ART clinics because their race/ethnicity missingness was less than the median, suggesting better overall data quality. Thus, our validation results may not be generalizable to patients at all ART clinics.

Moreover, in this study, we grouped patients with more than one race reported into the two or more races group. Further dividing this group into distinct subgroups may help researchers and policymakers better understand the experiences of the various subgroups within this heterogeneous group.

Conclusions

Multiply imputed race/ethnicity obtained using the proposed procedure under the MAR assumption correctly imputed race/ethnicity for 89.91% of values imposed as missing, on average, and generally reduced the bias of estimates of stillbirth prevalence compared to complete-case analysis in the validation sample. Generating multiple datasets with imputed race/ethnicity in NASS enables researchers to examine relationships between race/ethnicity and other variables with higher precision and accuracy. Continued efforts aimed at enhancing complete collection of race/ethnicity information, including collecting race/ethnicity at the patient level rather than the cycle level, could improve data quality in public health surveillance systems such as NASS and empower researchers and policymakers with necessary data to document racial and ethnic disparities and promote health equity.

Footnotes

Acknowledgments

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention or the CDC Foundation.

Authors' Contributions

Y.Z.: Conceptualization, Methodology, Data curation, Formal analysis, and Writing—Original draft.

D.M.K.: Conceptualization and Writing—Review and editing.

K.J.L.: Methodology, Data curation, Formal analysis, Validation, and Writing—Review and editing.

C.D.: Methodology, Formal analysis, Validation, and Writing—Review and editing.

A.K.Y.: Methodology, Validation, and Writing—Review and editing.

R.G.: Conceptualization, Methodology, and Writing—Review and editing.

All authors read and approved the final article.

Ethics Approval

Epidemiological research using NASS data is approved by the Institutional Review Board at the CDC.

Availability of Data and Materials

The datasets generated and/or analyzed during the current study are not publicly available because they are protected under Assurance of Confidentiality.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

Support for this research was provided by Open Philanthropy through a grant to the CDC Foundation.

References

1. Dieke, Zhang, Kissin, et al. Disparities in assisted reproductive technology utilization by race/ethnicity, United States, 2014: A Commentary. J Women's Health (Larchmt), 2017; 26(6):605–608; doi: 10.1089/jwh.2017.6467

2. Shapiro, Darmon, Barad, et al. Effect of race and ethnicity on utilization and outcomes of assisted reproductive technology in the USA. Reprod Biol Endocrinol, 2017; 15(1):44; doi: 10.1186/s12958-017-0262-5

3. Kotlyar, Simsek, Seifer. Disparities in ART live birth and cumulative live birth outcomes for Hispanic and Asian women compared to White Non-Hispanic women. J Clin Med, 2021; 10(12):2615; doi: 10.3390/jcm10122615

4. Seifer, Simsek, Wantman, et al. Status of racial disparities between black and white women undergoing assisted reproductive technology in the US. Reprod Biol Endocrinol, 2020; 18(1):113; doi: 10.1186/s12958-020-00662-4

5. Humphries, Chang, Humm, et al. Influence of race and ethnicity on in vitro fertilization outcomes: Systematic review. Am J Obstet Gynecol, 2016; 214(2):212.e1–212.e17; doi: 10.1016/j.ajog.2015.09.002

6. Crawford, Joshi, Boulet, et al. Maternal racial and ethnic disparities in neonatal birth outcomes with and without assisted reproduction. Obstet Gynecol, 2017; 129(6):1022–1030; doi: 10.1097/AOG.0000000000002031

7. Fujimoto, Luke, Brown, et al. Racial and ethnic disparities in assisted reproductive technology outcomes in the United States. Fertil Steril, 2010; 93(2):382–390; doi: 10.1016/j.fertnstert.2008.10.061

8. Little RJA, Rubin. Statistical Analysis with Missing Data. John Wiley & Sons, Inc.: Hoboken, NJ; 2002.

9. Long, Bamba, Ling, et al. Missing race/ethnicity data in Veterans Health Administration based disparities research: A systematic review. J Health Care Poor Underserved, 2006; 17(1):128–140; doi: 10.1353/hpu.2006.0029

10. Smith, Iyer, Langer-Gould, et al. Health plan administrative records versus birth certificate records: Quality of race and ethnicity information in children. BMC Health Serv Res, 2010; 10:316; doi: 10.1186/1472-6963-10-316

11. Zha, Harel. Power calculation in multiply imputed data. Stat Pap, 2021; 62(3):533–559; doi: 10.1007/s00362-019-01098-8

12. Grundmeier, Song, Ramos, et al. Imputing missing race/ethnicity in pediatric electronic health records: Reducing bias with use of U.S. Census location and surname data. Health Serv Res, 2015; 50(4):946–960; doi: 10.1111/1475-6773.12295

13. Ma, Zhang, Lyman, et al. The HCUP SID imputation project: Improving statistical inferences for health disparities research by imputing missing race data. Health Serv Res, 2018; 53(3):1870–1889; doi: 10.1111/1475-6773

14. Rubin DB. Multiple Imputation for Nonresponse in Surveys, vol. 81. John Wiley & Sons, Inc.: New York; 2004.

15. Chang, Deng, Jiang, et al. Multiple imputation for analysis of incomplete data in distributed health data networks. Nat Commun, 2020; 11(1):5467; doi: 10.1038/s41467-020-19270-2

16. Moore, Hanley, Lavoie, et al. Evaluating the validity of multiple imputation for missing physiological data in the national trauma data bank. J Emerg Trauma Shock, 2009; 2(2):73–79; doi: 10.4103/0974-2700.44774

17. van Buuren. Flexible Imputation of Missing Data, 2nd ed. Chapman and Hall, CRC Press: New York; 2018.

18. Azur, Stuart, Frangakis, et al. Multiple imputation by chained equations: What is it and how does it work? Int J Methods Psychiatr Res, 2011; 20(1):40–49; doi: 10.1002/mpr.329

19. Silva, Trivedi, Gutman. Developing and evaluating methods to impute race/ethnicity in an incomplete dataset. Health Serv Outcomes Res Methodol, 2019; 19(1):1–21; doi: 10.1007/s10742-019-00200-9

20. Centers for Disease Control and Prevention. National ART Surveillance. Available from: https://www.cdc.gov/art/nass/index.html [Last accessed: June 30, 2023].

21. National Center for Education Statistics. Definitions for New Race and Ethnicity Categories. Available from: https://nces.ed.gov/ipeds/report-your-data/race-ethnicity-definitions [Last accessed: June 30, 2023].

22. Centers for Disease Control and Prevention. 2018 Assisted Reproductive Technology Fertility Clinic Success Rates Report. U.S. Department of Health and Human Services: Atlanta, GA; 2020.

23. National Bureau of Economic Research. Census SF1 ZIP Code Tabulation Area (ZCTA) Data—2010 from Summary File 1. Available from: https://www.nber.org/research/data/census-sf1-zip-code-tabulation-area-zcta-data-2010 [Last accessed: June 30, 2023].

24. Meng XL. Multiple-imputation inferences with uncongenial sources of input. Stat Sci, 1994; 9(4):538–558; doi: 10.1214/ss/1177010269

25. van Buuren, Groothuis-Oudshoorn. Mice: Multivariate imputation by chained equations in R. J Stat Softw, 2011; 45(3):1–67; doi: 10.18637/jss.v045.i03

26. Fisher RA. The use of multiple measurements in taxonomic problems. Ann Eugen, 1936; 7(2):179–188.

27. Berglund, Heeringa. Multiple Imputation of Missing Data Using SAS. SAS Institute, Inc.: Cary, NC; 2014.

28. Zhang, Rose, Zhang, et al. Multiple imputation of missing race and ethnicity in CDC COVID-19 case-level surveillance data. Int J Stat Med Res, 2022; 11:1–11; doi: 10.6000/1929-6029.2022.11.01

29. Schirmer 3rd, Kulkarni, Zhang, et al. Ovarian hyperstimulation syndrome after assisted reproductive technologies: Trends, predictors, and pregnancy outcomes. Fertil Steril, 2020; 114(3):567–578; doi: 10.1016/j.fertnstert.2020.04.004

30. Xue, Harel, Aseltine Jr. Imputing race and ethnic information in administrative health data. Health Serv Res, 2019; 54(4):957–963; doi: 10.1111/1475-6773.13171

31. Jakobsen, Gluud, Wetterslev, et al. When and how should multiple imputation be used for handling missing data in randomized clinical trials—A practical guide with flowcharts. BMC Med Res Methodol, 2017; 17(1):162; doi: 10.1186/s12874-017-0442-1

32. Rubin DB. Multiple imputation after 18+ years. J Am Stat Assoc, 1996; 91(434):473–489; doi: 10.2307/2291635