Abstract
Background:
Depressive and anxiety disorders in the postpartum period cause significant suffering for women. State public health officials across the country use the Centers for Disease Control and Prevention (CDC)-sponsored Pregnancy Risk Assessment Monitoring System (PRAMS) to assess health behaviors and conditions, including depression and anxiety, that occur around the time of pregnancy. The purpose of the present study was to validate two to three items that could be included on the PRAMS questionnaire to detect depression and anxiety among postpartum women in a surveillance system.
Methods:
A comprehensive set of 16 depression and anxiety items was developed and tested in a final sample of 1077 postpartum women, 353 of whom completed Structured Clinical Interview for DSM-IV (SCID) interviews to determine the presence of a major depressive episode (MDE) and generalized anxiety disorder (GAD). Regression analyses reduced candidate items to 5 each for MDE and GAD. Responses were scored on a 5-point scale ranging from never (1) to always (5), and 2-item and 3-item combinations of these items were examined for their psychometric properties as indicators of MDE and GAD.
Results:
Item sets varied in their psychometric properties. The combination of depressed mood, felt hopeless, and slowed down > 9 (out of a possible total of 15) yielded the highest positive predictive value (PPV=60%) and estimated MDE prevalence most accurately (24.4% vs. 25.4% true prevalence). The combination of felt panicky, felt restless, and problems sleeping > 9 estimated GAD prevalence most accurately (20.2% vs. 15.7% true prevalence) and had high specificity (83%).
Conclusions:
Depression and anxiety can be detected using very few items, which makes assessment feasible in surveillance systems, such as PRAMS, and in primary care settings that have severe limits on time for depression and anxiety screening.
Introduction
Depressive and anxiety disorders are disabling conditions that are common among women of childbearing age. 1,2 Postpartum depression (PPD) in particular is a serious mental health problem. 1 The period prevalence of depression over the first 3 months postpartum is approximately 19% for major and minor depression and 7% for major depression alone. 3 PPD may persist for many months, and even after successful treatment, many women experience relapse or recurrence. 4,5 Deleterious effects extend to the offspring and may cause delays in socioemotional and cognitive development and increased risk for internalizing and externalizing disorders and major depression. 6,7
Anxiety disorders in the perinatal period are less studied than depression, but it is becoming clear that they are as prevalent as major depression. Generalized anxiety disorder (GAD) has a prevalence rate of about 7% at 6 months postpartum; it is frequently comorbid with depression and leads to significant social impairment. 2
Consequences of PPD and generalized anxiety have led state public health departments to survey their prevalence to provide accurate and contemporary information for public health planning. The Pregnancy Risk Assessment Monitoring System (PRAMS) 8 is one of the primary tools used by state public health departments in the United States to assess risk factors for adverse maternal and infant outcomes. PRAMS, initiated in 1987, is an ongoing state-based and population-based surveillance system designed to monitor selected self-reported maternal behaviors and experiences that occur before, during, and after pregnancy among women who deliver a liveborn infant. PRAMS is administered by the Centers for Disease Control and Prevention (CDC) National Center for Chronic Disease Prevention and Health Promotion, Division of Reproductive Health, in collaboration with state health departments.
Since its inception, PRAMS has expanded from 6 to 40 states and New York City. Collectively, PRAMS represents approximately 78% of all live births in the United States. Prior to 2009, there were no questions that assessed PPD or anxiety in the PRAMS Core Questionnaire.
The goals of this study were to develop and evaluate 2–3 self-reported items assessing depression and 2–3 self-reported items assessing generalized anxiety that would provide the most accurate estimate of the prevalence of these conditions among postpartum women.
Materials and Methods
The research consisted of three phases, and phase III is our focus. Briefly, in phase I (M.W. O'Hara, unpublished observations), a total of 44 candidate items were developed by experts in the assessment of PPD and anxiety (the authors, M.W.O., S.S., D.W.). The aim was to develop a diverse array of items that together provided a reasonably comprehensive assessment of the full range of depressive and generalized anxiety symptoms. Focus groups consisting of postpartum women were conducted to assess the format and meaningfulness of all items.
In phase II, a convenience sample of 1123 postpartum women completed a questionnaire containing questions on 20 candidate items and several additional validated screening instruments assessing depression and anxiety. Convergent and discriminant analyses led to the refinement and reduction from 20 items to 16 candidate items; specifically, 6 items were dropped and 2 new items were added to improve coverage of anxiety.
This report focuses on phase III, during which the sensitivity, specificity, positive predictive value (PPV), and estimated prevalence of combinations of items were assessed to identify the items that produced the most accurate measures of depression and generalized anxiety among postpartum women. All research procedures reported here were approved by the University of Iowa Institutional Review Board, and all subjects provided informed consent before their participation in this research.
Participants and procedure
The study sample included 1077 postpartum women recruited through two means. Of the 1077 study participants, 885 women gave birth between December 2004 and July 2007 and were recruited through infant birth records. Research staff mailed invitations and questionnaires to all women who delivered a baby in four rural and urban counties in Iowa. Women completed the questionnaire, on average, 21 weeks postpartum (range 1–56 weeks after delivery). To increase the racial diversity of the sample, a convenience sample of 192 nonwhite postpartum women was recruited in the same manner at maternal and child health centers in Iowa and Michigan.
To assess a major depressive episode (MDE) and GAD, a subset (n=475) of the 1077 women was invited to participate in clinical interviews by phone. Those invited included all minority women who completed questionnaires, all women who scored ≥ 13 (indicating probable depression) on the Edinburgh Postnatal Depression Scale (EPDS), 12 and every fifth woman scoring < 13 on the EPDS. A total of 353 of the 475 eligible women (74%) completed clinical assessments administered by master's level research assistants. The clinical assessment included the MDE and GAD modules of the Structured Clinical Interview for DSM-IV (SCID) 13 and the Hamilton Rating Scale for Depression (HRSD). 14
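The invitation rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (`id`, `minority`, `epds`) are hypothetical, and the text does not say how women scoring < 13 were ordered before every fifth one was taken, so enrollment order is assumed here.

```python
def select_for_interview(participants):
    """Return ids of women invited to the clinical interview.

    Rule from the text: all minority women, all women with EPDS >= 13,
    and every fifth woman with EPDS < 13.
    Assumption: "every fifth" is counted in list (enrollment) order
    among the remaining low-scoring women.
    """
    invited = []
    low_scorers_seen = 0
    for p in participants:
        if p["minority"] or p["epds"] >= 13:
            invited.append(p["id"])
        else:
            low_scorers_seen += 1
            if low_scorers_seen % 5 == 0:
                invited.append(p["id"])
    return invited


# Hypothetical toy data: woman 2 is a minority participant, woman 3 scores 15.
participants = [
    {"id": i, "minority": i == 2, "epds": 15 if i == 3 else 6}
    for i in range(1, 13)
]
invited = select_for_interview(participants)
```

Because high scorers are oversampled under this rule, prevalence in the interviewed subsample is not expected to match prevalence in the full questionnaire sample.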
Measures
The self-administered questionnaire included demographic questions (age, race/ethnicity, education, income, marital status), the 16 candidate items, three depression scales, and one anxiety scale. The three depression scales were the Beck Depression Inventory (BDI), 15 the General Depression Scale of the Inventory of Depression and Anxiety Symptoms (IDAS-GD), 16,17 and the EPDS. 12 The anxiety scale was the Beck Anxiety Inventory (BAI). 18 The interview-based measures included the HRSD 14 and the SCID. 13
All the depression measures and the anxiety measure have been validated for use with postpartum women. For each of the 16 candidate items (Appendix, supplemental material available online at
The BDI 15 consists of 21 items, each of which contains four descriptive statements (scored 0–3) that reflect increasing levels of severity of each symptom. For each item, respondents choose the option that best characterizes how they have been feeling during the past week, including today. The BAI 18 assesses 21 affective and somatic symptoms of anxiety on a 4-point scale. For the purposes of this study, the 6-item Subjective subscale (BAI-Subj) was used because its items most clearly represent general anxiety symptoms. 18 Respondents indicate to what extent they have been bothered by each symptom during the past week, including today.
The EPDS 12 has 10 items, each of which contains four descriptive statements (scored 0–3) that reflect increasing levels of severity of each symptom. Respondents are asked to indicate the answer that comes closest to describing how they have been feeling in the past 7 days. It has shown good reliability and validity across a large number of studies. 12 The IDAS-GD 16,17 contains 20 items that ask about depression symptoms over the past 2 weeks and are rated on a 5-point scale (1, not at all, to 5, extremely).
The HRSD 14 contains 17 items rated on 3-point (0–2) to 5-point (0–4) scales and covers experiences in the past week. Each scale point is associated with a descriptive statement reflecting increasing severity. Twenty randomly selected cases were used to examine interrater reliability, which was excellent (intraclass correlation=0.99). The SCID 13 MDE and GAD modules were used in this study. The interrater reliabilities based on 20 randomly selected cases for the MDE and GAD modules were kappa=0.80 and kappa=1.00, respectively.
A depression composite for self-report measures was computed by converting the BDI, EPDS, and IDAS-GD severity scores to z-scores, adding them, and dividing by 3.
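As a minimal sketch of how such a composite could be computed, the following standardizes each scale and averages the three z-scores per woman. The function names and the toy scores are hypothetical, and the text does not state whether the sample or population standard deviation was used; the population form is assumed here.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores (population SD is an assumption)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def depression_composite(bdi, epds, idas_gd):
    """Average the three z-scores for each participant."""
    standardized = [z_scores(scale) for scale in (bdi, epds, idas_gd)]
    return [sum(triple) / 3 for triple in zip(*standardized)]


# Hypothetical scores for three women on each scale.
bdi = [5, 10, 15]
epds = [2, 6, 10]
idas_gd = [30, 40, 50]
composite = depression_composite(bdi, epds, idas_gd)
```

Standardizing first keeps the IDAS-GD, which has a much wider raw range than the EPDS, from dominating the average.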
Statistical analyses
For the outcomes MDE (SCID-based), HRSD, and the depression composite, backward and forward stepwise regressions were used to identify 4–5 candidate items to evaluate further. Initially, backward stepwise regression was used to identify which of the 16 items were independently associated at a significance level of p≤0.001 with each of the three depression outcomes. Because the final measure needed to be very brief, forward stepwise regressions were undertaken with the items identified in the backward regressions to identify 4–5 candidate items that accounted for the most variance in each outcome (MDE [SCID-based], the HRSD, and the depression composite). The same approach was taken with GAD and the BAI. Once the smaller set of items was identified, prevalence estimates, sensitivity, specificity, PPV, and Youden's J 19 were calculated for a variety of item sets (separately for depression and anxiety) for the sample of women who completed the SCID interview. Youden's J is a statistic that was developed to provide a measure of overall performance of a screening test and to allow comparisons between tests. As a consequence, we were able to compare the performance of the various combinations of 2 and 3 items as indicators of MDE and GAD in our sample. Finally, receiver operating characteristic (ROC) analyses were undertaken, which yielded area under the curve (AUC) for each set of items. As a comparison to the performance of the candidate items, sensitivity, specificity, and PPV for various thresholds for the EPDS, BDI, and the IDAS-GD scales also were calculated. ROC analyses were undertaken for these scales as well. All analyses were conducted using SPSS version 19.
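The screening statistics named above can be illustrated with a short sketch. It sums 1–5 item responses, applies a "greater than threshold" rule, and compares the result with an interview diagnosis. The function name, the returned keys, and the toy responses are hypothetical; Youden's J is computed as sensitivity + specificity − 1, and the estimated prevalence is taken to be the proportion screening positive. The analyses in this study were run in SPSS, not with this code.

```python
def screening_metrics(item_responses, diagnoses, threshold):
    """Compare a summed item score (> threshold) with a reference diagnosis."""
    tp = fp = fn = tn = 0
    for responses, has_disorder in zip(item_responses, diagnoses):
        screen_positive = sum(responses) > threshold
        if screen_positive and has_disorder:
            tp += 1
        elif screen_positive:
            fp += 1
        elif has_disorder:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "youden_j": sensitivity + specificity - 1,
        "estimated_prevalence": (tp + fp) / n,  # proportion screening positive
        "true_prevalence": (tp + fn) / n,       # proportion diagnosed
    }


# Hypothetical responses to three items (each scored 1-5) for ten women;
# the first four have the diagnosis.
item_responses = [
    (5, 4, 3), (4, 3, 3), (4, 4, 3), (3, 3, 2), (4, 3, 3),
    (3, 2, 2), (2, 2, 1), (3, 3, 3), (2, 2, 2), (1, 2, 1),
]
diagnoses = [True] * 4 + [False] * 6
metrics = screening_metrics(item_responses, diagnoses, threshold=9)
```

The sketch assumes at least one diagnosed case, one non-case, and one screen-positive woman; otherwise the ratios are undefined.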
Results
Demographic characteristics of the samples are reported in Table 1. In both the entire sample and the interview subsample, the participants were largely well educated, married, and Caucasian, and approximately 43%–46% had delivered their first child.
M, mean; SD, standard deviation.
Means and standard deviations (SD) of depression and anxiety scales and candidate PRAMS items are reported in Table 2. Rates of moderate to severe depression, based on the EPDS, BDI, and IDAS-GD ranged from 11% to 16%. Approximately 16% of women reported at least moderate levels of anxiety on the BAI-Subj. The candidate items with the highest prevalence included felt overwhelmed, low energy, slowed down, problems sleeping, and felt tense. The least reported symptoms were self-harm, felt fearful, felt hopeless, felt panicky, loss of interest, and poor appetite.
Thresholds for cutoff scores are Edinburgh Postnatal Depression Scale (EPDS)>12; Beck Depression Inventory (BDI)>18; Beck Anxiety Inventory-Subjective Subscale (BAI-Subj)>6; Inventory of Depression and Anxiety Symptoms-General Depression Scale (IDAS-GD)>54. These levels represent a moderate severity of depression or anxiety. For Pregnancy Risk Assessment Monitoring System (PRAMS) candidate items, the % represents a report that the symptom was experienced often or always (4 or 5 on 5-point scale) during the postpartum period.
SCID, Structured Clinical Interview for DSM-IV.
Results from the different regression models predicting depression identified 5 items (depressed mood, felt overwhelmed, felt restless, felt hopeless, and slowed down) that were independently associated with at least one depression outcome (p<0.01) (Table 3). In addition, loss of interest was retained in the smaller set of items because it has been included in the PRAMS questionnaire in the past as a state optional question (along with depressed mood). The item self-harm was not included for ethical reasons because providing timely and appropriate follow-up and care is not possible in a surveillance system. Five candidate items were independently associated with GAD or the BAI-Subj (p<0.01): felt panicky, felt restless, felt fearful, problems sleeping, and worrying (Table 3).
HRSD, Hamilton Rating Scale for Depression.
Significance of this term is p=0.089.
Significance of this term is p=0.033.
Significance of this term is p=0.102.
R²Δ=R-square change for each step. For logistic regressions (SCID-based major depressive episode [MDE] and generalized anxiety disorder [GAD] diagnosis), the Cox and Snell R² was used. All regression models were significant at p<0.001. With three exceptions, all R²Δ were significant, p<0.01.
When examining different combinations of items (Table 4), there were no significant differences in the statistical measure Youden's J, as all 95% confidence intervals (CI) overlapped (data not shown). In addition, there were no significant differences among item sets with respect to AUC; all item sets exceeded 0.800, which indicates excellent performance. 20 Differences were found in sensitivity, specificity, and PPV among the different combinations of items. For example, a combined score > 9 for depressed mood, felt hopeless, and slowed down produced the highest values for specificity (87%) and PPV (60%), and the estimated prevalence (24.3%) came close to the true prevalence of MDE (25.4%). A combined score of > 6 for depressed mood, felt hopeless, and slowed down produced the highest sensitivity (95%), but this cutoff score substantially overestimated prevalence (62%). The two depression items included in the PRAMS standard questionnaire from 2004 to 2008, depressed mood and loss of interest at the level of often (4) or always (5) yielded 63% sensitivity, 83% specificity, a PPV of 55%, and an estimated prevalence of 28.8%.
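As a rough consistency check on how these quantities relate, PPV and the screen-positive rate (the estimated prevalence) can be derived from sensitivity, specificity, and true prevalence. The sketch below plugs in the values reported above for the depressed mood and loss of interest pair (63% sensitivity, 83% specificity, 25.4% true prevalence). The function name is hypothetical, and the inputs are rounded published figures, so only approximate agreement should be expected.

```python
def ppv_and_screen_positive_rate(sensitivity, specificity, prevalence):
    """Derive PPV and the proportion screening positive from test accuracy."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    screen_positive = true_positive_rate + false_positive_rate
    return true_positive_rate / screen_positive, screen_positive


ppv, estimated_prevalence = ppv_and_screen_positive_rate(0.63, 0.83, 0.254)
print(f"PPV = {ppv:.1%}, estimated prevalence = {estimated_prevalence:.1%}")
```

With these inputs the PPV comes out near 56% and the estimated prevalence near 28.7%, close to the reported 55% and 28.8%.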
AUC, area under the curve; Dep, depressed mood; Hope, hopelessness; LI, loss of interest; Panic, felt panicky; PPV, positive predictive value; Rest, restlessness; Slow, slowed down.
As a comparison to the performance of the candidate items, various thresholds for the EPDS, BDI, and IDAS-GD scales are reported in Table 4. The highest sensitivity (93%) but highest estimated prevalence (57% compared to the true prevalence of 25.4%) among the scales occurred at an IDAS-GD threshold of > 38. The highest specificity (81%–82%), highest PPV (55%–56%), and closest estimated prevalence (31%–32%) to the true prevalence (25%) were observed when the EPDS threshold was > 12 or the BDI threshold was > 14.
With respect to GAD, there were no significant differences among item sets in AUC from the ROC analyses, which ranged from 0.780 to 0.830 (acceptable to excellent), or using different thresholds within item sets in terms of Youden's J. However, threshold levels and combinations of items influenced sensitivity, specificity, and PPV. Two sets of items (a 2-item and a 3-item combination) yielded essentially the same performance with respect to specificity and PPV. Both sets of items included felt panicky and problems sleeping. The 3-item set also included felt restless. Using a threshold of panic and sleep > 6 yielded 87% specificity and 42% PPV and a prevalence of 20%, which compared relatively well to the true prevalence of 16%. Using a threshold of > 4, felt panicky and problems sleeping yielded a sensitivity of 86% but a prevalence of 46%, almost three times the true prevalence. Using a threshold of > 5, felt panicky and felt restless yielded a good balance of relatively high sensitivity (75%) and specificity (77%) but a prevalence of 30.4%, about double the true prevalence of GAD. Using a BAI-Subj > 4 threshold yielded a sensitivity of 76% and a prevalence of 36%. Using a threshold of BAI-Subj > 6 yielded an 82% specificity, a 35% PPV, and a prevalence of 23%.
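For readers less familiar with AUC, it can be read as the probability that a randomly chosen woman with the diagnosis scores higher on the item set than a randomly chosen woman without it, with ties counted as one half. The sketch below computes the empirical AUC in that rank-based (Mann-Whitney) form. The function name and toy scores are hypothetical, and this is not the SPSS procedure used in the study.

```python
def empirical_auc(scores, diagnoses):
    """Empirical AUC: P(case score > non-case score) + 0.5 * P(tie)."""
    cases = [s for s, d in zip(scores, diagnoses) if d]
    non_cases = [s for s, d in zip(scores, diagnoses) if not d]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in non_cases
    )
    return wins / (len(cases) * len(non_cases))


# Hypothetical summed item scores; the first three women have the diagnosis.
scores = [8, 6, 5, 7, 4, 3, 5]
diagnoses = [True, True, True, False, False, False, False]
auc = empirical_auc(scores, diagnoses)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the 0.780 to 0.830 range above is judged.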
The item felt restless was common to well-performing 2-item sets for identifying MDE (depressed mood and felt restless) and GAD (felt panicky and felt restless). The combination of depressed mood, felt restless, and felt panicky had relatively good performance in identifying both MDE and GAD (Tables 4 and 5).
Except for BAI-Subj, only two-item and three-item combinations for which Youden's J≥0.400 are displayed.
Fear, Felt fearful; Sleep, problems sleeping; Worry, worrying.
Discussion
In this study, we sought to develop two to three questions to estimate depression prevalence and separate questions to estimate generalized anxiety prevalence for use on PRAMS, a state-based surveillance system of postpartum women. The ideal set of questions would have high sensitivity, specificity, and PPV. Because the prevalence of depression and anxiety is relatively low (even though clinically significant), high specificity is particularly important to achieve high positive predictive values and precise estimates of prevalence. This is particularly important because prevalence estimates for PPD and generalized anxiety will drive public expenditures in states that use the PRAMS.
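The dependence of PPV on specificity at low prevalence follows from a standard identity (Bayes' rule). The symbols below are notation introduced here for illustration and do not appear in the original analysis: $Se$ is sensitivity, $Sp$ is specificity, and $p$ is the true prevalence.

```latex
\[
\mathrm{PPV} \;=\; \frac{Se \cdot p}{Se \cdot p \;+\; (1 - Sp)\,(1 - p)},
\qquad
\text{estimated prevalence} \;=\; Se \cdot p \;+\; (1 - Sp)\,(1 - p).
\]
```

When $p$ is small, $(1 - p)$ is close to 1, so the false-positive term $(1 - Sp)(1 - p)$ is governed almost entirely by specificity. Even a modest loss of specificity then lowers PPV and inflates the estimated prevalence.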
The findings of this research led to several recommendations. We found that a number of combinations of 2–3 items performed as well as or better than existing scales with higher numbers of questions with respect to PPV and estimating true prevalence. For example, using the 3 items, depressed mood, felt hopeless, and slowed down, with a cutoff for scores > 9 yielded the highest specificity and PPV, closely estimated the true prevalence, and had higher specificity and PPV than the EPDS, the BDI, and the IDAS. These features are very important in a surveillance context, such as that represented by PRAMS.
Sensitivity was higher for all the previously validated measures (EPDS, BDI, and IDAS) than the shorter scales developed through this research. Consequently, longer measures, such as the EPDS, BDI, and IDAS, may be preferable to use in clinical settings (obstetrics-gynecology, family medicine), where a two-stage approach is feasible, first, to identify potentially depressed women and then to provide a more intensive assessment of those who screen positive. With respect to anxiety detection, several combinations of items performed equally well. With respect to PPV and estimation of true prevalence, the 2-item combination of felt panicky and problems sleeping performed very well. The 2-item combination of felt panicky and felt restless had a good balance of sensitivity and specificity.
Finally, one set of 3 items (depressed mood, felt restless, and felt panicky) performed reasonably well in identifying both depressed and anxious women. The items depressed mood and felt panicky were the prime predictors of MDE and GAD, respectively, and felt restless was a good indicator of both. This combination of these 3 items could be considered in clinical screening and surveillance contexts in which there is a severe constraint on number of items but a need to identify women at risk for both depressive and anxiety disorders.
There are limitations to the work reported here. Subjects represented a convenience sample; however, the interviewed subsample of women was quite diverse with respect to race and ethnicity. As a consequence, the findings of this research should have some generalizability to populations living outside of the State of Iowa. Nevertheless, it will be important to cross-validate the performance of the item sets and their thresholds for use both in surveillance, as in the case of PRAMS, and in clinical screening in primary care.
There was not always a good match between the time frame of the candidate items (since your baby was born) and the SCID assessments (past month), raising the possibility that time since delivery may modify the association between responses to candidate items and MDE and GAD diagnoses. This possibility was examined, and the association between time since delivery and a diagnosis of MDE or GAD was very weak (r=0.06 in both cases, p>0.29). This result reflects in part the well-known phenomenon that individuals who complete mood questionnaires meant to reflect extended periods of time are influenced considerably by their current mood state. 21,22 The implication of this finding in the context of PRAMS is that episodes of depression that end long before women complete the PRAMS questionnaire may be missed. However, PRAMS participants, on average, complete the survey within 3–4 months of childbirth, which lessens the possibility that clinically significant episodes of MDE occurring early in the postpartum period will be missed. It would be valuable in future research to document precisely the timing of episodes of major depression and generalized anxiety in the postpartum period relative to the timing of the administration of the PRAMS questionnaire.
Maternal depression and anxiety, particularly in the postpartum period, are significant public health issues. The impacts of these disorders on women's health and well-being are well documented, 1,2 as are the long-term negative effects for infants exposed to maternal depression. 7 It is important that state public health officials have tools to determine the prevalence of depression and anxiety among mothers of infants. The most common mechanism for health surveillance among women who have recently given birth is PRAMS. The findings of our study have provided, for the first time, performance measures for a set of items that reflect depression and anxiety among women who have recently given birth.
The findings of our study also support the use of 2-item and 3-item screening scales to identify women at risk for postpartum depressive and anxiety disorders in primary care settings when there is a premium on time, such that the use of longer scales is not feasible. The item sets tested here have an advantage relative to the commonly used EPDS 12 in that they can be completed more quickly and do not contain British idioms that may not be familiar to many American women.
Acknowledgments
We acknowledge the financial support of the Centers for Disease Control and Prevention (MM-0822, S. Stuart, PI). We thank Sarah Mott, B.S., for her assistance in data management.
Disclosure Statement
No competing financial interests exist.