Abstract
This study aimed to examine the psychometric properties of the PROMIS depression, anxiety, and anger item banks in a large Australian population-based sample. The study tested for unidimensionality; evaluated invariance across age, gender, and education; assessed local independence; and tested item bank scores as an indicator for clinical criteria. In addition, equivalence of the 7-day time frame against an alternative 30-day time frame was assessed. A sample of 3,175 Australian adults was recruited into the study through online advertising. All three item banks showed strong evidence of unidimensionality and parsimony, with no items showing local dependence. All items were invariant across age, gender, and education. The item banks were accurate in detecting clinical criteria for major depressive disorder, generalized anxiety disorder, and panic disorder, although legacy measures designed for this purpose sometimes performed marginally better. Responses to the 30-day time frame were highly consistent with the original 7-day time frame. The study provided support for the validity of the PROMIS emotional distress item banks as measures of depression, anxiety, and anger in the Australian population, supporting the generalizability of the measures. The time frame chosen for assessing mental health outcomes using these item banks should be based on pragmatic considerations.
More rapid and precise assessment of mental and physical health may lead to improved health outcomes. The Patient-Reported Outcomes Measurement Information System (PROMIS®), a collaborative initiative funded by the National Institutes of Health, has developed self-reported outcome measures for assessing a broad range of health-related constructs (Cella et al., 2007). Using item response theory (IRT) methods, the PROMIS initiative has developed and calibrated three item banks assessing emotional distress: depression, anxiety, and anger (Pilkonis et al., 2011). These item banks offer a flexible approach to the measurement of emotional distress, and have been used to develop new static and adaptive scales that are highly efficient and precise (Choi, Reise, Pilkonis, Hays, & Cella, 2010; Pilkonis et al., 2011). Item banks provide more thorough assessment of health constructs than single measures, as they build from a systematic approach to pool items together before successive rounds of revision and reduction (Batterham, Brewer, et al., 2015; Pilkonis et al., 2011). Through this process, only the most informative, content valid and psychometrically robust items are retained. Item banks also enable linkages to be made between measures used in population and clinical settings (Pilkonis et al., 2011). In particular, they can be used to form common metrics that link a network of existing scales, allowing easy comparison across the different measures used in these settings. However, evidence for their validity across the populations in which these measures are to be used is vital. In addition, as the item banks may be used for multiple purposes within population-based and clinical settings, an examination of the utility of different time frames is also warranted.
Psychometric Properties of the PROMIS Emotional Distress Item Banks
Few independent studies have assessed the validity of the PROMIS emotional distress item banks outside the United States. International testing of the item banks is important as scales may perform differently as a function of culture, health systems, and demographic compositions. One European study has examined the performance of the PROMIS depression measures in a clinical sample (Vilagut et al., 2015), reporting excellent measurement properties. To our knowledge, no independent population-based test of the psychometric properties of the PROMIS emotional distress measures has occurred. Identifying whether the item banks form distinct unidimensional constructs, testing whether items remain accurate across subgroups of the population, and assessing parsimony in an independent sample from outside the United States would provide further evidence for the use of the PROMIS emotional distress item banks in population settings.
In addition, the PROMIS initiative has focused on continuous constructs of health as opposed to categorical constructs. This focus is valuable and has resulted in highly flexible measures that suit a number of settings, including community settings and patients with physical and/or mental health problems. However, the field of psychiatry tends to value mental health scales on the basis of how well they distinguish a specific disorder, while better measurement can inform the clinical utility of diagnostic categorizations (First et al., 2004). The PROMIS measures have not been tested for use as clinical screeners. Therefore, the development of empirically derived cut-points and testing the PROMIS depression and anxiety tools for screening major depressive disorder (MDD), generalized anxiety disorder (GAD), and panic disorder (PD) holds promise of yielding additional support for their use.
Time Frame in Assessment of Mental Health
In assessing mental health symptoms, time frame is an important consideration. Choosing an appropriate time period may depend on a number of factors, including the frequency with which symptoms occur, consistency with clinical definitions, consideration of recall biases, and the periods in which change might be observed (Watson, 1988). Specifically, based on Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5; American Psychiatric Association, 2013) criteria, MDD requires symptoms to occur over a 2-week time period, while GAD requires longer term symptoms with 6-month duration. Mental health scales typically adopt similar time periods to clinical definitions. Time period may be particularly sensitive to less common symptoms. For example, anxiety symptoms such as heart palpitations might occur episodically, such that an individual may report absence of such symptoms in the past week but more frequently in the past month. Recall biases place upper limits on the time period, as individuals often forget or fail to report distant events including depressive episodes (Wells & Horwood, 2004). Consideration of realistic time frames for change may also influence the selection of a time period. For example, a scale that assesses symptoms over the past year may not be useful in a clinical setting where change over weeks may be of interest.
A small body of research has tested the effects of time period on self-report mental health data. The focus has primarily been on short- versus long-term measurement of affect. Findings have tended to suggest that time frame has little systematic effect on mood measurement, with little impact on the validity of affect measures (Luhmann, Hawkley, Eid, & Cacioppo, 2012; Watson, 1988). Endorsement of particular items may be higher in the long term, which may reflect a greater degree of emotionality in long term recall (Ready, Robinson, & Weinberger, 2006; Terry, Stevens, & Lane, 2005) or simply more time for noticing less common feelings or behaviors. It is important to quantify the impact of different time frames to increase the utility of scales such as the PROMIS item bank for use in a range of settings with differing duration requirements.
The Present Study
This study aimed to test the psychometric properties of the PROMIS emotional distress item banks (depression, anxiety, anger) in an Australian population-based sample. The study consisted of four elements: (1) testing whether the items in the banks fit a unidimensional structure; (2) evaluating invariance across age, gender, and education; (3) identifying items that may be redundant (locally dependent); and (4) testing whether items from the banks can be used to accurately identify individuals who meet clinical criteria for major depressive disorder or generalized anxiety disorder. In addition, two versions of each item bank (the original 7-day recall period and a 30-day recall period) were administered in a subset of participants to test the influence of time frame on responses to the items.
Method
Participants and Procedure
Respondents were recruited from the online social media website Facebook using a series of advertisements targeted to all Australian adults aged 18 years or older during August to December 2014. The target population of Facebook users aged 18 years or older was 8.8 million, representing approximately 45% of the total Australian population in that age bracket. A total of 39,945 users clicked on the advertisements. Of these, 10,082 adults consented to participate in the survey, with 5,011 (50%) completing the survey.
As the survey covered a broad range of disorders with lengthy scales for each, participants were given the option of completing a brief form of the survey (three versions, each ~30 minutes), or a full form (three versions with different item orders, each ~60 minutes). For consistency across analyses, the present study only used the subgroup of participants who completed the full version of the survey (n = 3,175). A subset of these were administered the 7-day versions of the depression (n = 1,066), anxiety (n = 1,075) and anger (n = 1,034) item banks. The survey was conducted online using LimeSurvey software, with data stored on a secure server at the Australian National University, Canberra. The study received ethics approval from the Australian National University Human Research Ethics Committee (Protocol No. 2013/509).
Measures
Participants completed the PROMIS item banks, legacy scales assessing each mental health condition and a checklist of DSM-5 criteria. Other measures focusing on a range of other mental health problems, functioning and service use were assessed but not included in the present study.
PROMIS item banks for depression, anxiety, and anger consist of 28, 29, and 29 items, respectively, each rated on a 5-point categorical response scale from never to always (Pilkonis et al., 2011).
Legacy scales used to assess depression, generalized anxiety and panic disorder were the 9-item Patient Health Questionnaire (PHQ-9; Spitzer, Kroenke, & Williams, 1999), the 7-item Generalized Anxiety Disorder scale (GAD-7; Spitzer, Kroenke, Williams, & Lowe, 2006), and the Panic Disorder Screener (PADIS; Batterham, Mackinnon, & Christensen, 2015), respectively. Symptom scores on these scales can range from 0 to 27, 0 to 21, and 0 to 13, respectively, with higher scores indicating greater symptom severity. These scales have previously been shown to be accurate in screening for risk of disorder and had high internal consistency in the present sample (Cronbach’s α = .93, .92, and .90 for the PHQ-9, GAD-7, and PADIS, respectively).
The DSM-5 symptom checklist was developed by the authors as a proxy for clinical diagnosis. The checklist queried respondents about the presence or absence of symptoms based directly on DSM-5 definitions for each disorder of interest. There were 15 items used to assess MDD (including items to exclude hypomania), 14 for GAD and 21 for PD. Each item reflected a single DSM-5 criterion for the disorder of interest (e.g., for GAD: “During the past six months, have you found it difficult to control your worries?”), although some criteria were probed across multiple questions and additional items were used to exclude alternative diagnoses. The DSM-5 symptom checklist is available online as Supplemental Material (https://journals-sagepub-com.web.bisu.edu.cn/doi/suppl/10.1177/1073191116685809). The checklist was designed along similar principles to the electronic version of the Mini International Neuropsychiatric Interview (MINI; Zbozinek et al., 2012) in terms of structure (binary and categorical self-report items with conditional skip logic) and response burden. However, the checklist used in the current study was developed independently from the MINI, nonproprietary and based on DSM-5 rather than DSM-IV (American Psychiatric Association, 1994) criteria.
Background characteristics including age group, gender, and educational attainment were also assessed.
Psychometric Properties of Item Banks
Examination of the psychometric properties of the PROMIS item banks using IRT proceeded in four stages, reflecting assumptions about the composition of the item banks. First, the unidimensional structure of each item bank was tested using confirmatory factor analysis (CFA), on the basis of limited information weighted least squares with mean and variance adjustment. This estimator is suitable for the analysis of categorical/ordinal data using polychoric correlations. Good model fit was determined using established cut-off values of ≥0.95 for comparative fit index and Tucker–Lewis index and values ≤0.08 for root mean square error of approximation based on the large sample size and under the assumption that these statistics are scaling-corrected (Hu & Bentler, 1998, 1999). Items with a factor loading on a single factor solution <0.4 were identified as poorly fitting (Brown, 2015).
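The decision rules in this stage can be written down compactly. The following is a minimal sketch in Python, assuming the fit indices and standardized loadings have already been exported from the modelling software; the class and function names are illustrative and are not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    """Scaling-corrected fit statistics for one single-factor CFA."""
    cfi: float
    tli: float
    rmsea: float


def meets_fit_criteria(fit, cfi_min=0.95, tli_min=0.95, rmsea_max=0.08):
    """Apply the cut-offs stated in the text (CFI/TLI >= 0.95, RMSEA <= 0.08)."""
    return fit.cfi >= cfi_min and fit.tli >= tli_min and fit.rmsea <= rmsea_max


def poorly_fitting_items(loadings, min_loading=0.4):
    """Return the names of items whose single-factor loading is below 0.4."""
    return [name for name, lam in loadings.items() if lam < min_loading]


# Hypothetical loadings, used only to show the call pattern.
example_loadings = {"item_a": 0.82, "item_b": 0.35, "item_c": 0.61}
flagged = poorly_fitting_items(example_loadings)
```

Keeping the thresholds as keyword arguments makes it straightforward to re-run the same check under stricter cut-offs.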
The second stage involved testing whether the items exhibited local independence, that is, the assumption that items correlate with one another only through their shared relationship with the latent variable. To identify locally dependent items, modification indices of residual correlations were inspected. Modification indices associated with significant residual correlations between item pairs were identified and the content across the item pairs was inspected. Items with expected parameter change (fully standardized) ≥0.2 were identified as violating the assumption of local independence (Whittaker, 2012).
The third stage involved testing whether the underlying factor structure and the item response parameters estimated by the model were invariant on the basis of age, gender, and education. Evidence of invariance is a necessary prerequisite for group comparisons. Three key sociodemographic variables were tested using a multigroup CFA approach: age (18-55 years vs. 56+ years), gender (males vs. females), and education level (post–high school education vs. high school or lower). This approach proceeded in a series of steps by fitting nested models that estimated increasing levels of measurement invariance separately across age, gender, and education level. Specifically, nested configural (freely varying across groups) and scalar (constrained loadings and thresholds) models were compared using a chi-square difference test implemented by the DIFFTEST function in Mplus. A nonsignificant chi-square test (p > .05) would imply that the scalar model does not provide a significant decrement in model fit and all the items can be assumed invariant. Items with expected parameter change (fully standardized) ≥0.2 were identified as demonstrating noninvariance across age, gender or education (Whittaker, 2012). Data were then compared to the original U.S. calibration data for the PROMIS item banks (Pilkonis et al., 2011) accessed through Harvard Dataverse (https://dataverse.harvard.edu), with invariance by country tested using identical criteria. Demographic differences between the two samples were also assessed based on chi-square statistics.
The item banks were calibrated using the two-parameter graded response model suitable for ordinal data (Muthén & Muthén, 2010) by estimating a unidimensional CFA with a full information robust maximum likelihood estimator and a logit link function. This method takes into account the statistical similarities between IRT and factor analysis for ordered categorical items when converting item factor loadings and thresholds to the respective IRT discrimination and difficulty parameters (Takane & De Leeuw, 1987). As the sample was not representative of the general population, with higher levels of psychopathology and overrepresentation of females, a weighting scheme was applied to the data for the derivation of normative IRT estimates. This scheme was designed to make the sample more representative of the general population in terms of age, gender, and psychopathology distributions (Batterham, Sunderland, Carragher, Calear, Mackinnon, et al., 2016). The weighting scheme used Australian data on the population prevalence of anxiety, affective, and substance use disorders, accounting for comorbidity between these disorder categories in each age and gender group. Data on mental disorders were obtained from the 2007 Australian National Survey of Mental Health and Wellbeing (Slade, Johnston, Oakley Browne, Andrews, & Whiteford, 2009), while data on age and gender distributions of the general population were obtained from Australian Bureau of Statistics population estimates (Australian Bureau of Statistics, 2016).
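To make the calibration model concrete, the sketch below computes category response probabilities under a graded response model and converts a logit-link factor loading and thresholds to IRT discrimination and difficulty. It assumes a latent factor with mean 0 and variance 1 and no 1.7 scaling constant, in which case discrimination equals the loading and each difficulty equals threshold divided by loading. This is one common parameterization and not necessarily the exact one used in the study; function names and parameter values are hypothetical.

```python
import math


def loading_to_irt(loading, thresholds):
    """Convert logit-link factor parameters to IRT parameters.

    Assumes factor mean 0, factor variance 1, and no scaling constant,
    so a = loading and b_k = threshold_k / loading.
    """
    a = loading
    b = [tau / loading for tau in thresholds]
    return a, b


def grm_category_probs(theta, a, difficulties):
    """Graded response model: probability of each ordered response category.

    `difficulties` must be in increasing order; with four difficulties the
    function returns five probabilities (e.g., never ... always).
    """
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in difficulties]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item: loading 2.0 with thresholds -2, 0, 2, 4 on the logit scale.
a, b = loading_to_irt(2.0, [-2.0, 0.0, 2.0, 4.0])
probs_at_mean = grm_category_probs(0.0, a, b)
```

A higher discrimination value makes the category probabilities change more sharply around each difficulty, which is why highly discriminating items contribute more measurement precision.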
Finally, item banks were compared with DSM-5 criteria for MDD, GAD, and PD to test whether the item banks can provide robust screening for these specific disorders. The accuracy of the PROMIS items in identifying caseness for MDD, GAD, and PD (area under the receiver operating characteristic curve [AUC], sensitivity and specificity) was compared with the accuracy of the legacy scales (PHQ-9, GAD-7, PADIS). For comparability of length, existing short forms of the PROMIS depression and anxiety item banks (Pilkonis et al., 2011) were also tested against clinical criteria. Youden indices were used to identify optimal cut points where none were available.
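The screening statistics named above can be illustrated with a short, dependency-free sketch. It assumes a raw score per respondent and a binary caseness indicator, and treats scores at or above the cut point as screen-positive. The data are hypothetical and the AUC is computed with the rank-based (Mann–Whitney) formulation, which may differ from the routine used in the study.

```python
def sensitivity_specificity(scores, cases, cut):
    """Sensitivity and specificity when score >= cut is screen-positive."""
    tp = sum(1 for s, c in zip(scores, cases) if c and s >= cut)
    fn = sum(1 for s, c in zip(scores, cases) if c and s < cut)
    tn = sum(1 for s, c in zip(scores, cases) if not c and s < cut)
    fp = sum(1 for s, c in zip(scores, cases) if not c and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cut_point(scores, cases):
    """Return (cut, J) maximizing J = sensitivity + specificity - 1.

    Ties are resolved in favour of the lowest cut point.
    """
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, cases, cut)
        j = sens + spec - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def auc(scores, cases):
    """Probability that a random case outscores a random non-case (ties = 0.5)."""
    pos = [s for s, c in zip(scores, cases) if c]
    neg = [s for s, c in zip(scores, cases) if not c]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical raw scores and DSM-5 caseness flags (not study data).
scores = [30, 45, 50, 60, 70, 85, 90, 100]
cases = [False, False, False, True, False, True, True, True]
cut, j = youden_cut_point(scores, cases)
```

The Youden index weights sensitivity and specificity equally; a screening context that penalizes missed cases more heavily might justify a lower cut point than the one it selects.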
Comparison of Time Frame
The 7- and 30-day time frame questions were compared using three approaches. First, correlations between the 7- and 30-day items were calculated, to assess whether correlations between equivalent items were at least as great as correlations between nonequivalent items. Second, mean scores on each item were compared to identify whether there were systematically higher responses in the 30-day versions. Third, standardized factor loadings from confirmatory factor analyses were descriptively compared to identify whether items had a similar influence on the underlying construct in the 7- versus 30-day version of the item bank.
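The first two comparisons can be sketched as follows, assuming each respondent has paired responses to the 7- and 30-day versions of an item or bank. The functions compute a Pearson correlation and a paired-samples t statistic; the paired t-test is an assumption about how the mean comparison was carried out, and the data shown are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def paired_t(seven_day, thirty_day):
    """Paired t statistic and degrees of freedom for 30-day minus 7-day scores."""
    diffs = [b - a for a, b in zip(seven_day, thirty_day)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical paired responses on a 1-5 scale (not study data).
seven = [1, 2, 3, 4, 5]
thirty = [2, 2, 4, 5, 5]
r = pearson_r(seven, thirty)
t_stat, df = paired_t(seven, thirty)
```

A high correlation together with a positive t statistic corresponds to the pattern in which respondents keep their rank order but endorse items more strongly over the longer recall period.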
IRT analyses were conducted using Mplus version 7.2 (Muthén & Muthén, 2013). Remaining analyses were conducted using SPSS v23.0 (IBM Corp., Chicago, IL).
Results
Sample Characteristics
Table 1 displays the sociodemographic and mental health characteristics of the sample. There were elevated levels of psychopathology in the sample and females were over-represented compared to the general population. The weighting scheme reduced these biases, and was applied in the estimation of normative IRT parameters.
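The general idea behind such a weighting scheme can be illustrated with simple post-stratification, in which each respondent is weighted by the ratio of the population share to the sample share of their cell. The sketch below is a simplified illustration with a single hypothetical stratifying variable; the scheme used in the study crossed age, gender, and disorder status and accounted for comorbidity, and the proportions shown are not study data.

```python
def poststratification_weights(sample_counts, population_props):
    """Weight per cell = population proportion / sample proportion."""
    n = sum(sample_counts.values())
    return {
        cell: population_props[cell] / (count / n)
        for cell, count in sample_counts.items()
    }


# Hypothetical cells: one group is overrepresented in the sample.
sample_counts = {"group_a": 800, "group_b": 200}
population_props = {"group_a": 0.5, "group_b": 0.5}
weights = poststratification_weights(sample_counts, population_props)

# After weighting, each cell contributes its population share of the total.
weighted_n = {cell: sample_counts[cell] * w for cell, w in weights.items()}
```

Cells that are overrepresented receive weights below 1 and underrepresented cells receive weights above 1, so the weighted sample size is unchanged while its composition matches the population.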
Table 1. Characteristics of the Complete Sample (N = 3,175).
Note. PD = panic disorder; MDD = major depressive disorder; GAD = generalized anxiety disorder.
Unidimensionality, Local Dependence, and Invariance of Item Banks
All three item banks showed strong evidence of unidimensionality. Single factors accounted for 71%, 63%, and 58% of the variance in depression, anxiety, and anger, respectively. All items had significant loadings, with the depression item loadings >0.78, anxiety item loadings >0.61, and anger item loadings >0.63. The CFAs for the depression, anxiety, and anger item banks had adequate fit, meeting criteria for scaling-corrected estimates of root mean square error of approximation (0.06, 0.06, and 0.05, respectively), comparative fit index (0.98, 0.97, and 0.95, respectively), and Tucker–Lewis index (0.98, 0.96, and 0.95, respectively). Across all three item banks, no modification indices were associated with a fully standardized expected parameter change ≥0.2, indicating that no items had local dependence.
Items showed invariance by age, gender and education. Although there were significant differences in fit between configural and scalar models, all items had a fully standardized expected parameter change <0.07 for age (18-55 years vs. 56+ years), <0.03 for gender (males vs. females), and <0.08 for education level (post–high school education vs. high school or lower), indicating negligible noninvariance.
The IRT parameters for the items in each bank are provided in Table 2, based on data weighted to the Australian population. Each item showed high discrimination, with estimates ranging from 1.29 to 5.34, and nonoverlapping difficulty parameters (overlapping difficulty parameters would indicate redundant response choices). The parameters were consistent with those reported in the original calibration of the item banks (Pilkonis et al., 2011), with higher discrimination parameters for the depression bank, followed by the anxiety and anger banks. However, in formal comparisons for invariance with the U.S. calibration sample, some items showed substantial noninvariance. Table 2 presents items ordered identically to the U.S. ordering, that is, by order of discrimination in the U.S. sample (Pilkonis et al., 2011). In the depression bank, items related to hopelessness (e.g., I felt helpless, I felt that I wanted to give up on everything) discriminated relatively better than depression, sadness, and unhappiness items in the Australian sample compared with the United States, with significant noninvariance identified between the two samples on some of these items as noted in the Table. Some items reflecting isolation or loneliness also demonstrated substantive noninvariance. Anxiety items were largely consistent with the U.S. calibration, with no items demonstrating noninvariance. In the anger item bank, a number of items showed noninvariance between Australia and the United States, although these items did not fit into distinct categories. In general, estimates of discrimination and thresholds were of the same magnitude in this sample as in the original calibration sample.
Table 2. IRT Estimates Based on Weighted Data for the Three Item Banks.
Indicates items with substantial noninvariance (SEPC > 0.2) between the Australian and U.S. calibration samples.
Despite these similarities, there were also significant differences in the compositions of the two samples. The U.S. sample had underrepresentation of younger adults (19% aged 35 or younger in the U.S. sample that completed one or more of the PROMIS emotional distress scales vs. 32% in the weighted Australian sample) and overrepresentation of older adults (46% vs. 33% in the U.S. vs. Australian samples aged 56 or older; χ2 = 286.7, df = 5, p < .001). The U.S. sample also had slight overrepresentation of females (55% vs. 51% in the U.S. and Australian samples; χ2 = 15.5, df = 1, p < .001), and underrepresentation of people with no high school qualification (2% vs. 11% in the U.S. vs. Australian samples; χ2 = 305.1, df = 3, p < .001). Based on T scores, scored on the basis of IRT estimates from the U.S. calibration sample with M = 50 and SD = 10 in the U.S. general population, the weighted Australian sample had comparable scores on the depression (M = 51.0, SD = 10.8), anxiety (M = 48.6, SD = 10.2), and anger (M = 46.6, SD = 9.5) short form scales, although due to cross-national noninvariance, the T scores may not reflect the underlying population distributions in Australia.
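For reference, the T-score metric is a linear rescaling of the latent trait estimate. In the expression below, the symbol for the trait estimate is our notation and is assumed to be expressed on the U.S. calibration metric (mean 0, SD 1):

```latex
T = 50 + 10\,\hat{\theta}
\qquad\Longleftrightarrow\qquad
\hat{\theta} = \frac{T - 50}{10}
```

Under this rescaling, the weighted Australian means of 51.0, 48.6, and 46.6 correspond to mean trait estimates of 0.10, -0.14, and -0.34 SD units relative to the U.S. general population mean, subject to the caveat about cross-national noninvariance.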
Precision of the short forms, full item banks, and legacy scales across the spectrum of symptom severity was assessed using information curves. Figures 1 and 2 indicate that the full item banks had good precision across the spectrum from approximately two SD below the population mean up to four SD above the mean. This is to be expected due to the larger number of items. However, comparing the seven-item short forms and the legacy scales of 7 to 9 items (PHQ-9, GAD-7), it is evident that the short forms had greater overall precision across the spectrum, particularly in the upper tail where risk of disorder is highest.

Figure 1. Information curves for the full PROMIS depression item bank (28 items), short form PROMIS depression item bank (7 items), and PHQ-9 (9 items).

Figure 2. Information curves for the full PROMIS anxiety item bank (29 items), short form PROMIS anxiety item bank (7 items), and GAD-7 (7 items).
Accuracy in Screening for Clinical Criteria
Total raw scores (summed responses of all items in the bank) for the depression and anxiety PROMIS item banks (original 7-day versions) were compared to existing screeners for MDD, GAD, and PD against clinical criteria. Table 3 presents the AUC and sensitivity and specificity at prescribed cut points for the PROMIS item banks, short forms, and legacy scales (PHQ-9, GAD-7), compared with DSM-5 caseness for depression, GAD, and panic disorder. The AUC for the PROMIS depression bank in assessing MDD was significantly poorer than the AUC for the PHQ-9 (χ2 = 8.2, df = 1, p = .004). However, there was no significant difference in the performance of the PROMIS anxiety bank in assessing clinical GAD relative to the GAD-7 (χ2 = 2.3, df = 1, p = .133). In assessing PD, the performance of the PROMIS anxiety bank was not significantly different from the PADIS (χ2 = 1.07, df = 1, p = .301) or the GAD-7 (χ2 = 3.79, df = 1, p = .052) in identifying PD. A raw (sum) score ≥87 on the PROMIS depression item bank had 89% sensitivity and 83% specificity for MDD. A raw score ≥66 on PROMIS anxiety had 83% sensitivity and 78% specificity for GAD, while a raw score ≥74 on the same item bank had 82% sensitivity and 76% specificity for PD.
Table 3. Performance of Item Banks, Short Forms, and Legacy Scales Against DSM-5 Clinical Criteria.
Note. DSM-5 = Diagnostic and Statistical Manual of Mental Disorders, 5th edition; AUC = area under the curve; PROMIS® = Patient-Reported Outcomes Measurement Information System; PHQ = Patient Health Questionnaire; GAD = Generalized Anxiety Disorder scale.
Performance of existing PROMIS short forms was also assessed as they are of comparable length to the legacy scales. Although the 8-item short form PROMIS depression measure was accurate in identifying MDD caseness, it remained significantly less accurate than the PHQ-9 (χ2 = 10.5, df = 1, p = .001). The 7-item short form PROMIS anxiety measure performed well and not significantly less accurately than the GAD-7 in detecting GAD (χ2 = 3.27, df = 1, p = .071). Furthermore, the short form PROMIS anxiety measure was accurate in detecting PD, with operating characteristics not significantly different from the PADIS (χ2 = 1.8, df = 1, p = .180) or the GAD-7 (χ2 = 1.75, df = 1, p = .185) in detecting PD.
Time Frame Comparison
The correlational analysis demonstrated strong associations between 7- and 30-day versions of the PROMIS items, with correlations ranging from 0.79 to 0.90 for depression items, 0.84 to 0.89 for anxiety items, and 0.71 to 0.82 for anger items. Each 7-day item had a stronger correlation with the identical 30-day item than with any other item, with the single exception of the depression item “I felt fearful,” which had a correlation of 0.80 with both “I felt fearful” and “I felt frightened” using the 30-day timeframe.
Raw scores on the 30-day versions of the item banks were consistently higher than on the 7-day versions, indicating systematically greater endorsement of all 30-day items. Depression 30-day raw scores (M = 68.3, SD = 31.4) were significantly greater than 7-day raw scores (M = 63.5, SD = 31.9, t = 22.8, df = 2151, p < .001), as were anxiety raw scores (M = 62.8, SD = 25.4 vs. M = 59.1, SD = 26.0, t = 16.2, df = 1168, p < .001) and anger raw scores (M = 57.4, SD = 21.7 vs. M = 50.6, SD = 21.6, t = 24.3, df = 1095, p < .001).
CFAs indicated that both timeframe versions of each item bank performed similarly, with item loadings within ±0.05 for each item, and comparable proportion of variance explained by a single factor (70% vs. 73% for depression 30- and 7-day; 61% for both anxiety versions; and 56% for both anger versions).
Discussion
The findings indicate that the PROMIS emotional distress item banks provided robust measures of depression, anxiety, and anger in an Australian population sample. The measures demonstrated evidence of unidimensionality, local independence, and invariance across age, gender, and education. These findings provide support for the validity of the PROMIS item banks in measuring emotional problems in the general population, providing confidence that the item banks are appropriate for use outside the United States and in samples that are diverse in terms of age, gender, and education. Some items were shown to exhibit substantive cross-national variations. In particular, hopelessness-related items tended to discriminate better in the Australian sample than more general depression/sadness items, relative to the original U.S. calibration sample, while items reflecting loneliness or isolation also performed differently in Australia and the United States. Similarly, some of the anger items showed noninvariance, although there was no consistent category of anger items that displayed noninvariance between countries. The observed differences may be due to cultural differences in the understanding of terms or in the expression of emotional states (Lange, Thalbourne, Houran, & Lester, 2002). Alternatively, the differences may reflect variation in the sampling methodology, with the original PROMIS calibration study using online market research panels supplemented with clinical subgroups (Pilkonis et al., 2011). The compositions of the two samples reflected the different sampling approaches, with differences in age, gender, and education, which may have been the source of the cross-national noninvariance. Notwithstanding these differences, the overall consistency of the item banks with the original calibration, and performance relative to legacy scales, provides confidence that they deliver a robust assessment of the constructs of depression, anxiety, and anger in the general population.
The item banks also performed well as indicators for clinical caseness, with high sensitivity and specificity in detecting MDD, GAD, and PD. The legacy scales (particularly PHQ-9) tended to have significantly stronger operating characteristics in detecting clinical caseness than the PROMIS item banks or short forms. However, items from the legacy scales were designed specifically to assess DSM-IV criteria, so it is unsurprising that they performed incrementally better than the PROMIS measures in detecting disorder. Nevertheless, information curves indicated that the PROMIS item banks and short forms tended to provide more precise measurement across the spectrum of symptom severity. The PROMIS measures may therefore have greater precision in capturing severity, while providing an accurate indicator of clinical caseness and need for clinical intervention. Further exploration of item banks specifically for panic disorder and other anxiety disorders may be warranted (Batterham, Sunderland, Carragher, & Calear, 2016).
The comparison of time frames indicated that corresponding items are rated comparably regardless of the time frame, with the only clear difference that longer time frames result in higher levels of item endorsement. These findings are consistent with previous research in the mental health domain (e.g., Watson, 1988) and with other PROMIS item banks (Lai, Cook, Stone, Beaumont, & Cella, 2009), and suggest that either of these time frames is likely to provide an accurate estimate of symptoms across the severity spectrum. Consequently, the choice of the most appropriate time frame may be based on pragmatic considerations. These considerations may include the period over which assessment is conducted (particularly for repeated assessments), the period where change may feasibly be expected to occur, and consistency with clinical definitions of psychopathology. Nevertheless, researchers and clinicians should be aware that symptoms that occur infrequently may not be endorsed as commonly when using a briefer time frame relative to a longer time frame.
This study provided unique data on the psychometric properties of the PROMIS emotional distress item banks in a large non–U.S. population-based adult sample. It also demonstrated that the item banks can be used to identify whether an individual is likely to meet criteria for MDD, GAD, or PD, and that the time frame used to assess symptoms has minimal impact on the properties of the measures. However, there were some limitations that could not be addressed in the study. The sample consisted of self-selected Facebook users, who were not representative of the general population in terms of gender or psychopathology. Traditional population recruitment tends to suffer from similar problems with representativeness of samples (Batterham, 2014). Although the weighting scheme largely accounted for overrepresentation of females and people with mental health problems, further testing in other population-based samples would provide additional confidence in the validity of the measures. In addition, the standard used to assess clinical criteria was a self-report checklist, due to the practical and financial constraints involved in conducting a large-scale population-based assessment. The checklist systematically assessed all criteria for MDD, GAD, and PD based on DSM-5, similar to other tools such as the MINI (Zbozinek et al., 2012). Nevertheless, self-report of DSM-5 criteria may not be consistent with clinician-administered measures, so clinician diagnosis would provide a more rigorous standard for diagnostic comparison. Further investigation of the reliability of the measures over time may provide additional support for their use.
The present study provided further evidence that the PROMIS emotional distress item banks provide robust measures of depression, anxiety, and anger in the general population. These measures also provide a satisfactory indication of whether an individual is likely to meet clinical criteria and therefore may serve dual purposes of both assessing emotional health on a continuum and assessing risk of clinical states. Finally, the time frame comparison provided evidence that changing the time frame of a mental health measure does not typically affect the psychometric properties of the measure, although symptom endorsement will typically be more common over longer periods. This finding suggests that the time frame selected in assessing mental health outcomes should be based on pragmatic considerations.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: PJB, MS, and ALC are supported by National Health and Medical Research Council (NHMRC) Fellowships 1083311, 1052327, and 1013199, respectively. The study was funded by NHMRC Project Grant 1043952.
Supplemental Material
Supplemental material for this article is available online.
