The Reliability of Patient-Reported Pregnancy Outcome Data

Abstract

Pregnancy and neonatal outcome information is frequently used in disease management to evaluate the cost-effectiveness of prenatal interventions and for other research and reporting activities. The purpose of this study was to determine if a telephone interview process is a reliable methodology for collecting pregnancy outcomes.

High-risk patients from a large maternal–fetal medicine practice who received outpatient preterm labor management services from January 1996 to June 2001 were identified. Patient-reported pregnancy outcome data for 285 mothers and 478 infants were collected via a telephone interview by a perinatal nurse and compared to pregnancy outcome data abstracted from the maternal and infant hospital records. Overall, concordance and/or Kappa coefficients between maternal report and the medical record were high for delivery date (96.4%), birth weight within 100 grams (88.9%), Cesarean delivery (99.0%, Kappa = 0.98), and high-level nursery admission (91.2%, Kappa = 0.82). Mothers of both singleton and multiple gestations accurately reported pregnancy outcome information.

A telephone interview with a skilled nurse can be a reliable methodology for collection of valuable clinical and research data related to pregnancy outcome. Data collected in this manner and maintained in a database may be used with a high level of confidence by health care providers, payers, and researchers. (Population Health Management 2010;13:27–32)

Introduction

Information relative to pregnancy and neonatal outcome events is important to health care providers, payers, and researchers. Health care providers use this information to assess a patient's health history and the risk status of subsequent pregnancies. Payers use these data to measure the effectiveness of disease management programs and clinical interventions designed to impact pregnancy outcomes and health care costs. Researchers utilize pregnancy and neonatal outcomes in epidemiological and other studies. The accuracy of this information is essential for the reporting of various program and treatment results, validity of research, and in clinical decision making regarding the management of subsequent pregnancies.

Pregnancy and neonatal outcomes data may be collected from medical records, insurance claims, or directly from patients. The hospital medical record is often considered to be the most accurate data source. Because fewer than 2% of US hospitals have comprehensive electronic records systems,¹ timely retrieval and identification of the required information from inpatient paper medical records often is not realistic for providers in physicians' offices or other outpatient settings. Abstraction of data from the medical chart requires health care knowledge and may be difficult due to inconsistencies in record organization and forms utilized.² Although the medical record is assumed to be the gold standard, recording errors can occur and medical criteria may vary from hospital to hospital.³,⁴ Health Insurance Portability and Accountability Act privacy requirements may also limit medical record availability. Because of these issues, the evaluation of insurance claims data has been proposed as a method of data collection. The issues with the use of claims data involve the general lack of timeliness and completeness of clinical information needed to assess program and treatment outcomes, as claims data are collected for payment purposes. In addition, key outcome parameters such as actual infant birth weight are not available from claims data sets.

As a consequence of the limitations associated with utilization of medical records and insurance claims data, pregnancy and neonatal outcome data are frequently obtained from maternal report using personal interviews or questionnaires, though the reliability of this data collection method is often questioned. The purpose of this study was to determine if a telephone interview process is a reliable methodology for collecting pregnancy and neonatal outcomes data.

Methods

The population for this study included women with high-risk pregnancies who received outpatient preterm labor surveillance and management services through Matria Healthcare (now Alere) and prenatal care from physicians in a large maternal–fetal medicine practice in Phoenix, Arizona from January 1996 through June 2001. Alere provides comprehensive disease management services for the condition of pregnancy through their Women's and Children's Health division. These services include pregnancy risk assessment, patient education, obstetrical case management, and specialized condition-management outpatient programs for high-risk conditions such as gestational diabetes, preterm labor, gestational hypertension, coagulation disorders, and hyperemesis. Collection and assessment of maternal biometric condition-specific data, as well as scheduled and as-needed nursing interaction, are integral components of each outpatient program. Information collected from each patient and/or her health care provider includes medical and obstetrical history, current pregnancy diagnoses, hospital utilization, objective biometric data specific to her enrolled outpatient program, subjective patient assessments, and physician and nursing interventions.

For this study, women with singleton, twin, and triplet gestations who were enrolled for specialized condition-management services and who had complete pregnancy and neonatal outcome information in the database were eligible for inclusion. At the initiation of outpatient perinatal services, all patients provided written informed consent to receive care and to allow data from their clinical records to be used anonymously for research and reporting purposes. Delivery information and neonatal outcomes data were collected from the patients at 2–6 weeks post delivery by skilled perinatal nurses via a telephone interview using a structured data collection tool. During the interview process, the patient was asked to recall specific information related to her delivery such as delivery date, delivery type (vaginal or cesarean), infant sex and weight(s), nursery utilization (regular/rooming-in, or if a “higher-level” nursery was required such as Special Care Nursery, Intermediate Level II Nursery, or Level III Neonatal Intensive Care Unit [NICU]), infant length of stay in each nursery, infant need for ventilatory assistance, and specific neonatal complications (ie, hypoglycemia, respiratory distress syndrome, necrotizing enterocolitis, shoulder dystocia, sepsis, intraventricular hemorrhage, meconium aspiration, or congenital anomalies). The patient-reported pregnancy outcomes data were entered into a relational database linking all maternal and newborn information. If the infant remained hospitalized at the time of initial pregnancy outcome collection, follow-up calls were made to complete the neonatal outcome record. Once data entry was designated complete, the record was archived to the database and no further changes were made.

To perform the present study, 2 qualified perinatal research nurses, who were blinded to the data collected during the telephone interview process, obtained maternal delivery and neonatal nursery inpatient records for the population described previously. Electronic medical records were not utilized at the study institutions; therefore, paper charts for the mothers and infants were requested from the medical records department at the delivery hospital. If the maternal and/or neonatal hospital medical records could not be obtained from the hospital, the patient was excluded. Pregnancy and neonatal outcome information was abstracted from the hospital medical record using the identical data collection tool utilized in the patient interview process. Thirteen standard source documents in the medical record were identified at the beginning of the study from 3 participating hospitals and used throughout the data abstraction process to maintain consistency and accuracy. Source documents included the maternal-fetal database record, labor and delivery record, neonatal birth and transition record, newborn assessment form, physician progress notes and orders, and nursery flow sheets. If the data in the preferred source document were inconsistent with other data sources in the medical record, the research nurses used the data with the best agreement throughout the medical record. Periodic audits were conducted to validate the accuracy of the data being abstracted.

Analysis

Patient-reported outcomes data collected via a structured telephone interview and documented in the outpatient record were compared to data abstracted from the medical record review and analyzed for the following key perinatal indicators: delivery date and type, sex, neonatal birth weight, high-level nursery admission (NICU, special care nursery, or intermediate care), high-level nursery length of stay, neonatal ventilator use, and neonatal complications. Delivery date was used in conjunction with estimated date of delivery to determine gestational age at delivery. Delivery type (vaginal vs. cesarean section) is an important cost driver and measure of maternal/neonatal morbidity. Neonatal birth weight was used to determine low birth weight, very low birth weight, or macrosomia status for Healthcare Effectiveness Data and Information Set (HEDIS) reporting and is an indirect measure of morbidity. High-level nursery, which includes NICU, special care, or intermediate care admission rate, is an important cost driver and indirect measure of neonatal morbidity. Length of nursery stay and neonatal need for assisted ventilation are both important cost drivers and important indirect measures of neonatal morbidity.
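The derivation of gestational age from the delivery date and the estimated date of delivery (EDD) can be sketched as follows. This is a minimal illustration and not the study's actual computation: it assumes the usual convention that the EDD corresponds to 40 weeks (280 days) of gestation, and the function names and dates are hypothetical.

```python
from datetime import date

# Assumption: the EDD marks 40 completed weeks (280 days) of gestation.
TERM_DAYS = 280


def gestational_age_days(edd: date, delivery: date) -> int:
    """Gestational age at delivery in days, counted back from the EDD."""
    return TERM_DAYS - (edd - delivery).days


def as_weeks_and_days(days: int) -> str:
    """Format a day count as completed weeks plus remaining days."""
    weeks, remainder = divmod(days, 7)
    return f"{weeks}w{remainder}d"


if __name__ == "__main__":
    # Hypothetical dates: delivery 35 days before the EDD.
    ga = gestational_age_days(edd=date(2000, 6, 15), delivery=date(2000, 5, 11))
    print(ga, as_weeks_and_days(ga))
```

Under this convention, an error of a few days in the recalled delivery date shifts the derived gestational age by the same number of days, which is why exact agreement on delivery date matters for this indicator.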

The level of agreement (concordance) between the patient-reported telephone interview data and the data abstracted from the hospital medical records was determined for each data element and reported as the percent of records with concordance between sources. Agreement between the two sources was also assessed using Cohen's Kappa coefficient with a corresponding 95% confidence interval. Kappa is a measurement of agreement that is corrected for the amount of agreement expected by random chance alone. A Kappa of 0.20 or less reflects slight agreement, 0.21–0.40 reflects fair agreement, 0.41–0.60 reflects moderate agreement, 0.61–0.80 substantial agreement, and greater than 0.80 almost perfect agreement.⁵,⁶ Because a low Kappa value may reflect a low prevalence of that trait in a population and not lack of agreement,⁷⁻⁹ both the proportion of positive agreement (documented as yes in both sources) and proportion of negative agreement (documented as no in both sources) were calculated for elements pertaining to neonatal complications; the sum of the two is designated as “concordance” and reported as a percentage.
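For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of Cohen's Kappa for paired yes/no items, together with the agreement categories quoted above. It is illustrative only: the function names are hypothetical, the study's statistical software is not described, and the sketch does not compute confidence intervals.

```python
from typing import Sequence, Tuple


def cohen_kappa(report: Sequence[bool], record: Sequence[bool]) -> Tuple[float, float]:
    """Return (observed agreement, Cohen's Kappa) for paired yes/no items."""
    if len(report) != len(record) or len(report) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(report)
    # Observed agreement: share of pairs where both sources say the same thing.
    observed = sum(a == b for a, b in zip(report, record)) / n
    # Chance agreement: product of the marginal "yes" rates plus "no" rates.
    p_report = sum(report) / n
    p_record = sum(record) / n
    expected = p_report * p_record + (1 - p_report) * (1 - p_record)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 100%")
    return observed, (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Map Kappa to the categories used in the text."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

With made-up counts for a rare event (2 yes/yes, 90 no/no, and 4 discordant pairs in each direction out of 100), observed agreement is 92% but Kappa is only about 0.29, which illustrates why raw concordance and Kappa can diverge when prevalence is low.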

Results

Initially, a convenience sample of 435 maternal records was chosen from the outpatient database for the study period. A total of 150 records were excluded because of incomplete data in the maternal or neonatal record in either the Alere database or the hospital record, and loss of follow-up for neonatal records secondary to transfer of infants to other hospitals. After exclusions, data from 159 mothers with 159 singleton infants, 59 mothers with 118 twin infants, and 67 mothers with 201 triplet infants were analyzed for a total of 285 maternal and 478 neonatal records. Characteristics of the overall study population included a mean maternal age of 29.1 ± 6.3 years; 38.9% were nulliparous, 73.0% were married, 6.0% were cigarette smokers, 15.1% had a cerclage, and 28.8% had a history of preterm delivery. The mean gestational age at the start of outpatient services was 24.4 ± 5.3 weeks, and the mean gestational age at delivery was 35.5 ± 2.6 weeks. Maternal characteristics are presented by gestation type in Table 1.

Table 1.

Maternal Characteristics of Women by Gestation Type (n = 285)

Characteristic	Singletons n = 159	Twins n = 59	Triplets n = 67
Maternal age (years)	27.5 ± 6.1	30.5 ± 6.7	31.8 ± 5.1
Married (%)	62.3%	74.6%	97.0%
Smoker (%)	8.2%	6.8%	0%
Nulliparous (%)	27.7%	42.4%	62.7%
History of preterm delivery (%)	47.8%	6.8%	3.0%
Cerclage (%)	23.9%	6.8%	1.5%
GA start outpatient services (weeks)	25.8 ± 6.0	25.1 ± 3.2	20.4 ± 2.5
GA at delivery (weeks)	36.6 ± 2.3	34.6 ± 2.4	33.4 ± 2.1

Data presented as mean ± SD or percentage as indicated.

GA = gestational age.

Overall, exact concordance between the patient-reported data and the hospital medical record was 96.4% for delivery date, 99% (Kappa = 0.98) for delivery route (classified as either cesarean section or vaginal), and 98.1% (Kappa = 0.96) for infant sex, which reflects an almost perfect level of agreement for these data elements. Birth weight (used to determine low birth weight, very low birth weight, or macrosomia status) was recalled within 100 grams by 89% of women and within 200 grams by 95% of women. The mean difference between the records for birth weight in grams was 15.7 ± 103.2. There was 98% concordance for both low birth weight (Kappa = 0.95) and very low birth weight (Kappa = 0.90) babies and 100% concordance for birth weight >4000 grams. In the patient-reported data, maternal recall of birth weight was recorded in pounds/ounces and then converted to grams by calculations programmed in the computer system. Birth weight in the hospital medical records was recorded in either pounds/ounces or in grams. Several hospital records had a discrepancy in birth weight between the labor and delivery record, the neonatal birth and transition record, and the newborn assessment form; therefore, the birth weight documented most consistently was used. For the purposes of our study, all birth weights were converted to grams. Although concordance was high, this conversion from pounds to grams may account for some of the differences in agreement.
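The conversion and comparison steps described above can be sketched as follows. This is an assumption-laden illustration, not the calculation programmed in the study's system: the function names are hypothetical, the thresholds are those shown in Tables 2 through 4 (<1500 g, <2500 g, >4000 g), and the conversion factor is the standard definition of the avoirdupois ounce.

```python
from typing import Dict

GRAMS_PER_OUNCE = 28.349523125  # standard definition of the avoirdupois ounce


def lb_oz_to_grams(pounds: int, ounces: float) -> float:
    """Convert a weight recalled in pounds and ounces to grams."""
    return (pounds * 16 + ounces) * GRAMS_PER_OUNCE


def weight_flags(grams: float) -> Dict[str, bool]:
    """Birth weight indicators using the thresholds in Tables 2-4."""
    return {
        "very_low_birth_weight": grams < 1500,
        "low_birth_weight": grams < 2500,
        "macrosomia": grams > 4000,
    }


def within_tolerance(reported_g: float, record_g: float, tolerance_g: float = 100.0) -> bool:
    """True when recall and record differ by no more than the tolerance."""
    return abs(reported_g - record_g) <= tolerance_g


if __name__ == "__main__":
    # Hypothetical case: mother recalls 5 lb 8 oz; chart records 2500 g.
    recalled = lb_oz_to_grams(5, 8)
    print(round(recalled, 1), within_tolerance(recalled, 2500.0))
    print(weight_flags(recalled)["low_birth_weight"], weight_flags(2500.0)["low_birth_weight"])
```

In this hypothetical case the two weights agree to within about 5 grams, yet the converted value falls just under the 2500-gram threshold while the charted value does not. Unit conversion near a cutoff can therefore change a category even when the weights themselves agree closely.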

Regarding nursery utilization, 91% (Kappa = 0.82) of women were able to correctly report if their infants were admitted to a high-level nursery, defined as 8 hours or longer in the Special Care, Intermediate, or NICU nursery. Although many of these babies had extended lengths of stay and were transferred between units, 92% (Kappa = 0.84) of mothers were able to accurately recall their infant's hospital length of stay within 7 days. Overall, maternal recall of neonatal need for assisted ventilation showed substantial agreement with the medical record (92% concordance [Kappa = 0.67]). Level of agreement between the maternal report and the medical record remained consistent for the identified key perinatal indicators when stratified by singleton, twin, and triplet gestation (Tables 2, 3, 4).

Table 2.

Pregnancy Outcomes for Singleton Infants (n = 159)

Parameter	Concordance	Kappa (95% CI)
Cesarean delivery	98.6%	0.97 (0.94, 1.00)
Infant sex	97.9%	0.96 (0.93, 0.99)
HLN admission	90.6%	0.81 (0.74, 0.88)
HLN <7 days	91.2%	0.81 (0.74, 0.88)
Ventilator use	94.4%	0.71 (0.58, 0.84)
BW <1500 gms	97.9%	0.91 (0.84, 0.98)
BW <2500 gms	98.2%	0.96 (0.93, 0.99)
BW >4000 gms	100%	—

BW = birth weight; CI = confidence interval; HLN = high-level nursery; gms = grams.

Table 3.

Pregnancy Outcomes for Twin Infants (n = 118)

Parameter	Concordance	Kappa (95% CI)
Cesarean delivery	100%	—
Infant sex	98.5%	0.97 (0.92, 1.00)
HLN admission	90.4%	0.78 (0.66, 0.90)
HLN <7 days	92.8%	0.85 (0.76, 0.94)
Ventilator use	89.6%	0.67 (0.51, 0.83)
BW <1500 gms	97.6%	0.87 (0.72, 1.00)
BW <2500 gms	97.6%	0.91 (0.81, 1.00)
BW >4000 gms	100%	—

BW = birth weight; CI = confidence interval; HLN = high-level nursery; gms = grams.

Table 4.

Pregnancy Outcomes for Triplet Infants (n = 201)

Parameter	Concordance	Kappa (95% CI)
Cesarean delivery	100%	—
Infant sex	98.5%	0.97 (0.91, 1.00)
HLN admission	95.5%	0.87 (0.72, 1.00)
HLN <7 days	95.5%	0.88 (0.75, 1.00)
Ventilator use	83.6%	0.54 (0.31, 0.77)
BW <1500 gms	100%	—
BW <2500 gms	97.1%	0.82 (0.57, 1.00)
BW >4000 gms	100%	—

BW = birth weight; CI = confidence interval; HLN = high-level nursery; gms = grams.

The overall incidence of severe neonatal complications was low; thus overall concordance and positive and negative agreement are reported (Table 5). Respiratory distress syndrome was the most commonly reported complication at 6.9% (33/478), hypoglycemia was second at 2.1% (10/478), and congenital anomalies were reported for 1.9% (9/478) of infants. Even less common were sepsis at 0.8% (4/478), necrotizing enterocolitis at 0.6% (3/478), and intraventricular hemorrhage at 0.2% (1/478). There were no reports of shoulder dystocia or meconium aspiration. Overall, concordance and negative agreement for assessed neonatal complications were over 90% for all data elements except for respiratory distress syndrome, for which concordance and negative agreement were 80.7% and 81.3%, respectively. Discrepancies between what the patient reported and information in the medical record were most frequent for reports of hypoglycemia and congenital anomaly.

Table 5.

Agreement for Assessed Neonatal Complications

Complication	Patient-reported incidence (n)	Overall concordance	Positive agreement	Negative agreement
Hypoglycemia	10	93.3%	20%	93.6%
Respiratory distress syndrome	33	80.7%	72.7%	81.3%
Necrotizing enterocolitis	3	99.8%	66.6%	100%
Shoulder dystocia	0	99.6%	N/A	99.6%
Sepsis	4	95.4%	75%	95.6%
Intraventricular hemorrhage	1	97.7%	100%	97.7%
Meconium aspiration	0	100%	N/A	100%
Congenital anomaly	9	93.7%	55.5%	94.5%
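The paper does not state the denominators behind the columns of Table 5. As one possible reading, offered only as an assumption, the sketch below computes overall concordance across all infants, the share of patient-reported "yes" answers confirmed by the medical record, and the share of patient-reported "no" answers confirmed by the medical record. The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional


def agreement_summary(yes_yes: int, yes_no: int, no_yes: int, no_no: int) -> Dict[str, Optional[float]]:
    """Summarize agreement from a 2x2 table of patient report vs. record.

    yes_yes: patient yes, record yes    yes_no: patient yes, record no
    no_yes:  patient no,  record yes    no_no:  patient no,  record no
    """
    total = yes_yes + yes_no + no_yes + no_no
    if total == 0:
        raise ValueError("at least one record is required")
    reported_yes = yes_yes + yes_no
    reported_no = no_yes + no_no
    return {
        # Share of all records on which the two sources agree.
        "overall": (yes_yes + no_no) / total,
        # Assumed reading: confirmed share of patient-reported positives
        # (None when no positives were reported, shown as N/A in a table).
        "positive": yes_yes / reported_yes if reported_yes else None,
        # Assumed reading: confirmed share of patient-reported negatives.
        "negative": no_no / reported_no if reported_no else None,
    }
```

For example, with hypothetical counts of 6, 2, 10, and 182, overall concordance is 94%, positive agreement is 75%, and negative agreement is about 94.8%. This shows how a rare complication can have high overall concordance while agreement among reported positives remains modest.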

Discussion

In this study, we have shown that, during a telephone interview with a skilled nurse, women with high-risk pregnancies are able to accurately report many pregnancy and neonatal outcome events with a substantial to almost perfect level of consistency with their medical records. The present study is different from previous reports in that we studied only women at high risk for preterm delivery who received outpatient perinatal services and we examined the recall of mothers having multiple infants.

Our data are consistent with other studies for recall of birth weight, method of delivery, and delivery date. In a study that compared maternal recall with the clinical record, Lederman and Paxton¹⁰ reported 100% concordance on infant's birth date and method of delivery, with no significant differences in mean birth weight. In general, they found that reported data were more complete than data in the clinical records. Hewson and Bennett³ interviewed 397 low-risk primiparous women to compare their responses with the medical record and found 100% agreement for method of delivery. In the present study concordance did not reach 100%, but was greater than 96% for birth date, delivery route, and infant sex. We attribute these differences to possible keystroke or recording errors, which may have occurred during the patient interview and/or data entry process.

Our results and others have shown accuracy of maternal recall for infant birth weight. Olson et al¹¹ assessed the validity and reliability of maternally reported birth information for 754 women and found excellent correlation between the medical chart and the interview data for birth weight (r = 0.98, Kappa = 0.91). O'Sullivan et al¹² validated the accuracy of parental recall of birth weights for 649 children and found it was accurate across social classes up to 16 years after delivery, with 75% of recalled birth weights within 50 grams and 85% within 100 grams of that recorded in hospital records. Seidman et al¹³ examined the accuracy of maternal recall of birth weight for 880 children and found a similar rate of 75% congruence within 100 grams. They concluded that, within defined limitations, maternal recall of birth weight is sufficiently accurate for clinical and epidemiological use. In the present study, women were able to provide information allowing for accurate identification of infant low birth weight, very low birth weight, and macrosomia—all conditions related to increased neonatal morbidity.

Unlike the key indicators of birth weight and method of delivery, there is very little data in the literature about mothers' recall of infant high-level nursery admission, nursery length of stay, or ventilator use. Githens et al¹⁴ conducted a phone survey to assess the recall of 102 women and found an overall 89% agreement for information related to maternal risk factors, prenatal care, and maternal and infant complications. They examined recall of problems encountered during the first 2 weeks postpartum and found a 95% agreement (Kappa = 0.90) for the response “infant admitted to NICU.” In the present study, we examined the mothers' ability to report if their infant(s) was/were admitted to a high-level nursery such as Special Care, Intermediate, or NICU. Recall of this important cost driver fell into the almost perfect category with a Kappa of 0.82 and 91% agreement. Differences between maternal recall and the medical record for high-level nursery admission may be related to the definition of an admission (length of stay longer than 8 hours) used in the data abstraction process, whereas the outpatient record did not report hours of nursery stay, and differences in terminology and definition of high-level nursery.

Another important outcome measure that reflects morbidity and affects cost is length of nursery stay. In the present study of high-risk and multiple gestation pregnancies, for which one would expect extended lengths of stay, there was concordance of 92% (Kappa = 0.82) for length of nursery stay within 7 days. Recall of the need for mechanical ventilation was substantial, with 92% concordance and a Kappa of 0.67. Differences here may be related to the inability of some mothers to distinguish between intubation, continuous positive airway pressure, and supplemental oxygen.

Concordance and negative agreement between patient-reported data and the medical record were over 90% for 7 of 8 specific neonatal complications. Thus, mothers generally are able to accurately report the absence of neonatal complications. Positive agreement was somewhat mixed. Discrepancies between maternal reports and the medical record may be related to the inability of some mothers to adequately understand the medical terminology used to describe their infants' condition during the NICU stay. In addition, physicians or nurses may use different phrases and terms to describe a condition to the mother rather than providing a specific diagnosis. This may make the reporting of a specific diagnosis at a later time difficult. Earlier studies have shown that parents correctly report only 50%–62% of diagnoses assigned to infants during a NICU stay.¹⁵,¹⁶ Caution should be exercised when utilizing patient-reported data pertaining to specific neonatal diagnoses. The authors suggest utilization of data pertaining to high-level nursery admission, nursery length of stay, and the need for ventilatory assistance as surrogate measures of neonatal morbidity when using patient-reported data. Another important consideration is the timing of data collection. In the present study, data were collected primarily within 2–6 weeks of delivery. Accuracy of maternal recall of neonatal events months or years after delivery was not addressed in the current study design.

There are advantages and disadvantages to any data collection methodology. Utilizing medical records to obtain information for clinical or research purposes is not always feasible; thus, frequently the patient is relied upon as the source of medical history information in the physician's office or other outpatient settings. Telephone interviews are frequently conducted to obtain health information for patient screening or disease management programs. The reliability of information obtained from patients is a limitation in that it may be impacted by many factors such as comprehension, recall, culture, and environment.¹⁷ Broader implementation of electronic health records may provide faster access to a patient's medical information, reduce health care costs, improve quality of care, and promote evidence-based medicine, though the findings from this investigation show that general pregnancy outcome and neonatal data can be obtained from patients with a high level of accuracy. These results may have been influenced by the high level of patient care received throughout the latter stage of pregnancy as well as the use of skilled perinatal nurses to conduct the telephone interviews.

In this study, we sought to determine if a telephone interview with a skilled perinatal nurse is a reliable methodology to collect valuable pregnancy outcome information. We examined maternal recall for key pregnancy and neonatal outcome indicators used by physicians, researchers, and payers. We conclude that women who have high-risk pregnancies, including those with twin and triplet gestations, are able to report pregnancy and neonatal events with a substantial to almost perfect level of consistency compared with information documented in the hospital medical record. Pregnancy outcomes data collected in this manner and maintained in a database may be utilized with a high level of confidence for reporting and research purposes.

Footnotes

Author Disclosure Statement

Ms. Desch, Ms. Istwan, Ms. Rhea, Ms. Collins, and Dr. Stanziano are all employees of Alere (formerly Matria Healthcare). Dr. Elliott is in private practice and is on the Speaker's Bureau for Alere.

No outside funding was received by any party for performing this study.

Presented at the Forum ’08, sponsored by the DMAA: The Care Continuum Alliance, on November 23–25, 2008 in Hollywood, Florida.

References

1. Jha, DesRoches, Campbell, et al. Use of electronic health records in U.S. hospitals. N Engl J Med, 2009; 360:1628–1638.

2. Marks, Lee, Slezak, Berger, Patel, Johnson. Agreement between insurance claim and self-reported hospital and emergency room utilization data among persons with diabetes. Dis Manag, 2003; 6:199–205.

3. Hewson, Bennett. Childbirth research data: Medical records or women's reports? Am J Epidemiol, 1987; 125:484–491.

4. Joffe, Grisso. Comparison of ante-natal hospital records with retrospective interviewing. J Biosoc Sci, 1985; 17:113–119.

5. Cohen. A coefficient of agreement for nominal scales. Educ Psychol Meas, 1960; 20:37–46.

6. Landis, Koch. The measurement of observer agreement for categorical data. Biometrics, 1977; 33:159–175.

7. Feinstein, Cicchetti. High agreement but low kappa: I. The problem of two paradoxes. J Clin Epidemiol, 1990; 43:543–549.

8. Cicchetti, Feinstein. High agreement but low kappa: II. Resolving the paradoxes. J Clin Epidemiol, 1990; 43:551–558.

9. Yawn, Suman, Jacobsen. Maternal recall of distant pregnancy events. J Clin Epidemiol, 1998; 51:399–405.

10. Lederman, Paxton. Maternal reporting of prepregnancy weight and birth outcome: Consistency and completeness compared with the clinical record. Matern Child Health J, 1998; 2:123–126.

11. Olson, Shu, Ross, Pendergrass, Robison. Medical record validation of maternally reported birth characteristics and pregnancy-related events: A report from the Children's Cancer Group. Am J Epidemiol, 1997; 145:58–67.

12. O'Sullivan, Pearce, Parker. Parental recall of birth weight: How accurate is it? Arch Dis Child, 2000; 82:202–203.

13. Seidman, Slater, Ever-Hadani, Gale. Accuracy of mothers' recall of birthweight and gestational age. Br J Obstet Gynaecol, 1987; 94:731–735.

14. Githens, Glass, Sloan, Entman. Maternal recall and medical records: An examination of events during pregnancy, childbirth, and early infancy. Birth, 1993; 20:136–141.

15. Simons CJR, Ritchie, Mullett, Liechty. Parental recall of infant medical complications and its relationship to delivery method and education level. J Dev Behav Pediatr, 1986; 7:355–360.

16. Simons CJR, Ritchie, Mullett. Parents' perceptions of medical diagnoses and related issues for their high-risk infants. J Pediatr Health Care, 1998; 12:118–124.

17. Redelmeier, Tu, Schull, Ferris, Hux. Problems for clinical judgement: 2. Obtaining a reliable past medical history. CMAJ, 2001; 164:809–813.