Abstract
There is evidence that the major anxiety and depressive disorders could reflect a single underlying internalizing factor. In a group of 1,031 clinic-referred children, this study examined support for this factor and used the two-parameter logistic model to examine the item response theory properties of the disorders in it. For the set of anxiety and depressive disorders, confirmatory factor analysis supported a one-factor model. The two-parameter logistic model analysis indicated that all the internalizing disorders in this factor were strong discriminators of the internalizing dimension. They also measured more of the internalizing dimension, and with greater precision, in the upper half of the trait continuum. There was also support for the convergent validity of the internalizing dimension: it had large-to-medium effect size correlations with the internalizing scores of other measures. The implications of the findings for clinical practice and clinical classification are discussed.
The fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association [APA], 1994) and its text revision, DSM-IV-TR (APA, 2000), place separation anxiety disorder (SAD), social phobia (SOP), specific phobia (SPP), panic disorder (PD), agoraphobia (AG), generalized anxiety disorder (GAD), obsessive-compulsive disorder (OCD), and posttraumatic stress disorder (PTSD) in a single group of anxiety disorders. The unipolar depressive disorders of dysthymia (DYS) and major depressive disorder (MDD) are in a separate group. The current DSM-5 (APA, 2013) groups SAD, SOP, SPP, PD, AG, and GAD together as anxiety disorders. OCD is placed in a separate group called obsessive-compulsive and related disorders, and PTSD in a group called trauma- and stressor-related disorders. DSM-5 combines DYS and chronic MDD into a single disorder called persistent depressive disorder, which, together with MDD, forms the separate group of depressive disorders. Although DSM-IV, DSM-IV-TR, and DSM-5 treat the different anxiety and depressive disorders as distinct categories, there is evidence that these disorders are related to a single underlying internalizing factor (Clark & Watson, 1991; Krueger, 1999; Mineka, Watson, & Clark, 1998). The current study tested the applicability of the one-factor model to the internalizing disorders, and used item response theory (IRT) to ascertain the properties of the disorders in this factor, in a group of clinic-referred children.
To date, many studies involving adults (e.g., Clark & Watson, 2006; Krueger, 1999; Mineka et al., 1998; for a meta-analysis, see Krueger & Markon, 2006), and adolescents and children (e.g., Higa-McMillan, Smith, Chorpita, & Hayashi, 2008; Lahey et al., 2008; Lewinsohn, Zinbarg, Seeley, Lewinsohn, & Sack, 1997; for a meta-analysis, see Angold, Costello, & Erkanli, 1999) have shown higher-than-chance levels of comorbidity among the anxiety and depressive disorders. To explain this, Mineka et al. (1998) proposed a hierarchical model with a single higher order general negative affectivity/distress factor that is common to all the anxiety and depressive disorders (similar to the negative affectivity factor in the tripartite model proposed by Clark & Watson, 1991), and lower order specific factors for the different depressive and anxiety disorders.
Although studies with adults (Beesdo-Baum et al., 2009; Seeley, Kosty, Farmer, & Lewinsohn, 2011) and with adolescents and children (Higa-McMillan et al., 2008; Lahey et al., 2004; Lahey et al., 2008; Trosper, Whitton, Brown, & Pincus, 2012) have reported multiple-factor models for internalizing disorders and symptoms, the correlations between the latent factors in these models have generally been high, suggesting that a one-factor model could adequately represent the internalizing disorders. Consistent with this, existing studies have supported a hierarchical model with a single internalizing latent factor (Beesdo-Baum et al., 2009; Krueger, 1999; Slade & Watson, 2006; Vollebergh et al., 2001; Watson, 2005; see Krueger & Markon, 2006, for a meta-analysis), as well as a one-factor model in adults (Krueger, Caspi, Moffitt, & Silva, 1998; Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007) and adolescents (Gomez, Vance, & Gomez, 2014).
To date, at least three studies have used IRT to examine how the anxiety and depressive disorders map onto the internalizing dimension (Gomez & Vance, 2015; Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007). IRT is model based, and although there are many IRT models, all of them describe the relationship between the response to an item and the latent trait the item measures (Embretson & Reise, 2000). A commonly used IRT model for binary responses is the two-parameter logistic model (2-PLM), which was used in all three previous studies in this area (Gomez & Vance, 2015; Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007). As in most IRT models, the 2-PLM generates a graph called the item characteristic curve (ICC) for each item, showing the probability of a positive response to the item as a function of the underlying trait. For each item, the model estimates a difficulty (β) and a discrimination (α) parameter. The difficulty parameter indicates the point on the latent trait scale at which a person has a .5 probability of endorsing, or responding positively to, the item; items with higher difficulty values represent higher levels of the latent trait. The discrimination parameter indicates how well an item discriminates between people with different levels of the underlying trait (Steinberg & Thissen, 1995); higher values mean better discrimination between levels of the trait and, consequently, a stronger association with the latent construct. The 2-PLM can also generate the item information function (IIF), the test information function (TIF), and the standard error of measurement of the TIF.
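The ICC described above has a simple closed form in the 2-PLM. The sketch below writes it as a Python function, without the scaling constant D ≈ 1.7 that some texts include; the function name and the item parameters are hypothetical and are not estimates from this study.

```python
import math


def icc_2pl(theta: float, alpha: float, beta: float) -> float:
    """Probability of a positive response (here, a disorder being diagnosed)
    under the 2-PLM: 1 / (1 + exp(-alpha * (theta - beta)))."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


# Hypothetical parameters for a single item (not estimates from this study).
alpha, beta = 1.5, 1.0
for theta in (-1.0, 0.0, 1.0, 2.0):
    print(f"theta = {theta:+.1f}   P(positive) = {icc_2pl(theta, alpha, beta):.3f}")
```

At θ = β the probability is exactly .5, which is the sense in which β locates the item on the trait continuum; a larger α makes the curve rise more steeply around that point.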
The IIF indicates the precision with which an item measures the latent trait at different levels of the trait continuum, whereas the TIF indicates the precision with which all the items together measure the latent trait at those levels. The standard error of measurement of the TIF indexes the imprecision of measurement along the trait continuum. The 2-PLM can also compute a latent trait score (theta) for each participant, based on that participant's specific pattern of responses to the set of items in the model.
The studies by Krueger and Finger (2001) and McGlinchey and Zimmerman (2007) focused predominantly on adults, whereas the study by Gomez and Vance (2015) focused on adolescents. Krueger and Finger included MDD, GAD, SOP, simple phobia (SIP, a DSM-III-R diagnosis comparable to SPP in DSM-IV/DSM-IV-TR), PD, AG, and DYS. McGlinchey and Zimmerman included MDD, SOP, PD/AG, SPP, and GAD. Gomez and Vance included SAD, SOP, SPP, PD, AG, GAD, OCD, PTSD, DYS, and MDD. All three studies found that every disorder included was a strong discriminator of the underlying internalizing dimension (high α values), and measured more of the internalizing dimension in the upper half of the trait spectrum than in the lower half (β values above the mean of the latent trait). The TIF also showed more precision in the upper half of the internalizing trait spectrum than in the lower half (higher TIF values above the mean), peaking at around one standard deviation (1 SD) above the mean. In the Krueger and Finger and the McGlinchey and Zimmerman studies, participants' internalizing trait scores correlated positively and almost perfectly with the number of internalizing disorders diagnosed, and were positively associated with several measures of social burden. In the Gomez and Vance study, the internalizing trait scores also correlated positively and almost perfectly with the number of internalizing disorders diagnosed, strongly and positively with the internalizing scores of other measures, and either weakly positively or negatively with the externalizing scores of other measures. Thus, there is support for the external validity of the internalizing dimension in both adults and adolescents.
Although the general findings across past studies in this area are highly comparable, there were also some differences. Most notably, relative to the other disorders, the difficulty value of GAD was one of the highest in the study by McGlinchey and Zimmerman (2007), whereas it was among the lowest in the studies by Krueger and Finger (2001) and Gomez and Vance (2015). Similarly, DYS had the lowest difficulty value in the Gomez and Vance study but the highest in the Krueger and Finger study. McGlinchey and Zimmerman (2007) attributed such differences in IRT properties across studies to the different sets of disorders and the different frequencies of comorbidity in the samples examined.
Although the notion of a broad internalizing factor has an extensive history in research on childhood (Cicchetti & Toth, 1991), no study to date has examined support for a single internalizing factor for the major internalizing DSM-IV-TR disorders, or the IRT properties of these disorders, in children. There are reasons to suspect that these properties could differ from those reported for adults and adolescents. Existing data indicate developmental differences in the onset and prevalence of anxiety and depressive disorders. Among the anxiety disorders, SAD and SPP usually emerge in early childhood, with most cases of SAD diagnosed before 5 years of age and most cases of SPP diagnosed before 8 years of age. SOP and OCD usually emerge during late childhood to early adolescence, with most cases of SOP diagnosed before 12 years of age and most cases of OCD diagnosed before 14 years of age. Although present during childhood, GAD is more often diagnosed from early adolescence, and AG, PD, and PTSD have very low prevalence in childhood (Beesdo-Baum & Knappe, 2012). Both depressive disorders, DYS and MDD, are more often diagnosed from early adolescence onward (Goldman, 2012). Such developmental differences in onset and prevalence could influence IRT estimates because, when IRT models are applied to psychopathology data, they conflate prevalence with thresholds. More specifically, the less prevalent disorders, being the more “difficult” indicators, will tend to have relatively higher difficulty parameter values, whereas the more prevalent disorders, being the less “difficult” indicators, will tend to have relatively lower difficulty parameter values. Given the differences in the prevalence of the anxiety and depressive disorders in children compared with older age groups, the IRT properties of the internalizing disorders in children could differ from those in older groups, and therefore need to be evaluated directly.
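The link between prevalence and difficulty can be made concrete. If the latent trait is assumed to follow a standard normal distribution (an assumption of this sketch, as is the trapezoidal integration), the expected prevalence of a disorder under the 2-PLM is its ICC averaged over that distribution. The hypothetical helpers below compute that average and invert it by bisection, showing that rarer disorders receive higher difficulty values and that, for the same prevalence, a flatter slope places the difficulty further above the mean.

```python
import math


def icc_2pl(theta, alpha, beta):
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def marginal_prevalence(alpha, beta, n_points=2001, lo=-8.0, hi=8.0):
    """Expected proportion with the disorder when theta ~ N(0, 1),
    approximated with the trapezoidal rule over [lo, hi]."""
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        theta = lo + i * step
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        density = math.exp(-0.5 * theta ** 2) / math.sqrt(2.0 * math.pi)
        total += weight * icc_2pl(theta, alpha, beta) * density
    return total * step


def difficulty_for_prevalence(prevalence, alpha, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection for the beta that yields the target prevalence.
    Prevalence decreases monotonically as beta increases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if marginal_prevalence(alpha, mid) > prevalence:
            lo = mid  # still too prevalent: the item must be more "difficult"
        else:
            hi = mid
    return (lo + hi) / 2.0


# Same slope, different prevalences (hypothetical values).
for prevalence in (0.40, 0.10, 0.03):
    print(prevalence, round(difficulty_for_prevalence(prevalence, alpha=1.5), 2))

# Same prevalence, different slopes.
for alpha in (2.0, 1.0):
    print(alpha, round(difficulty_for_prevalence(0.10, alpha), 2))
```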
There were three major aims in the current study. The first was to examine support for a single internalizing factor in a large group of clinic-referred children, primarily to ascertain whether a unidimensional IRT model could be applied to the set of disorders in the study. The study included the same set of DSM-IV/DSM-IV-TR internalizing disorders examined by Gomez and Vance (2015) in adolescents: SAD, SOP, SPP, PD, AG, GAD, OCD, PTSD, DYS, and MDD. The second aim was to use the 2-PLM to examine the IRT properties of the internalizing disorders. The third aim was to examine the convergent and discriminant validities of the internalizing dimension by correlating participants' internalizing trait scores (obtained through the 2-PLM analysis) with other internalizing and externalizing scale scores. Related to this, we examined the incremental validity of the internalizing trait score over the internalizing raw score across all the disorders in the IRT model.
Method
Participants
The data for all participants were obtained from the archives of the Academic Child Psychiatry Unit (ACPU) of the Royal Children’s Hospital, Melbourne, Australia. The ACPU is an outpatient psychiatric unit that provides services for children and adolescents with behavioral, emotional, and learning problems. Referrals are generally from other medical services, schools, and social and welfare organizations. For the current study, we used the records of children aged 6 to 12 years who were referred between 2004 and 2010 and had been interviewed for clinical diagnosis. In all, there were 1,031 children, with a mean age of 9.04 years (SD = 1.97).
Table 1 shows the sociodemographic characteristics and clinical diagnoses of the participants. There were more than twice as many males as females. Most fathers were employed, and most mothers were either employed or engaged in home duties. About two thirds of participants had mothers and fathers who had attended at least secondary school, and most were from families with an income of less than $50,000 per year. Externalizing disorders were highly prevalent, with around 79% and 69% of participants having attention-deficit/hyperactivity disorder and oppositional defiant disorder/conduct disorder, respectively. Among the internalizing disorders, GAD, SPP, and DYS were among the most prevalent, at around 49%, 35%, and 34%, respectively. SAD and SOP were also relatively common, whereas PD, PTSD, and AG were relatively rare. Of those with an anxiety disorder, 54% had a depressive disorder, and of those with a depressive disorder, 87.9% had an anxiety disorder. Also, of those with a depressive disorder, 80% had both DYS and MDD.
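The two comorbidity rates above are conditional percentages computed in opposite directions, which is why they differ (54% vs. 87.9%). The sketch below shows the computation on 0/1 diagnosis indicators; the function name and the eight-child data are hypothetical and are not the study data.

```python
def conditional_rate(condition, outcome):
    """Percentage of cases with `condition` present who also have `outcome` present.
    Both arguments are equal-length lists of 0/1 indicators, one entry per child."""
    with_condition = [o for c, o in zip(condition, outcome) if c == 1]
    if not with_condition:
        raise ValueError("no cases with the condition")
    return 100.0 * sum(with_condition) / len(with_condition)


# Hypothetical indicators for eight children (not the study data).
any_anxiety = [1, 1, 1, 0, 1, 0, 1, 1]
any_depressive = [1, 0, 1, 0, 1, 1, 0, 1]
print(conditional_rate(any_anxiety, any_depressive))  # % of anxious children also depressed
print(conditional_rate(any_depressive, any_anxiety))  # % of depressed children also anxious
```

When one group of disorders is more common than the other, the rate conditioned on the rarer group is typically the larger of the two.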
Demographic Characteristics of Participants (N = 1,031).
Measures
Anxiety Disorders Interview Schedule for Children
The Anxiety Disorders Interview Schedule for Children (ADISC-IV; Silverman & Albano, 1996) is a semistructured interview based on the DSM-IV/DSM-IV-TR diagnostic system (APA, 2000). Although the ADISC-IV was designed primarily to facilitate the diagnosis of the major childhood internalizing disorders, it can also be used to diagnose other major childhood disorders. The ADISC-IV guideline is that the child be given a diagnosis of every disorder whose diagnostic criteria are met; ADISC-IV diagnoses therefore do not apply the hierarchical, exclusionary rules outlined in DSM-IV. There are parent and child versions of the ADISC-IV. All diagnoses reported in this study were based on parent interviews, because there is evidence of poor agreement between diagnoses derived from the child and parent versions of the ADISC-IV (Grills & Ollendick, 2003), and because clinical interviews of children can lead to unreliable diagnosis of anxiety disorders (Edelbrock, Costello, Dulcan, Kalas, & Conover, 1985). The parent version of the ADISC-IV has sound psychometric properties (Silverman, Saavedra, & Pina, 2001), with good-to-excellent test–retest reliability over a 7- to 14-day interval; kappa values for parent interviews ranged from 0.65 to 1.00 (Silverman et al., 2001).
Child Behavior Checklist/6-18 and Teacher Report Form
The Child Behavior Checklist/6-18 (CBCL) and the Teacher Report Form (TRF; Achenbach & Rescorla, 2001) are two of the measures in the Achenbach System of Empirically Based Assessment (Achenbach & Rescorla, 2001). The CBCL, completed by parents, has 113 items, and the TRF, completed by teachers, has 120 items. Both are used to rate children between 6 and 18 years of age. Respondents rate the degree or frequency of the behavior described in each item on a scale of 0 (not true), 1 (somewhat or sometimes true), or 2 (very true or often true). The standard rating period is 6 months for the CBCL and 2 months for the TRF. The CBCL and the TRF have excellent psychometric properties (Achenbach & Rescorla, 2001). Both include scales for various behavioral and emotional problem syndromes, and both also provide two broad scores, for internalizing and for externalizing behavior problems. The total raw scores on these two broad scales were used to examine the convergent and discriminant validities of the internalizing dimension.
Procedure
The study was approved by the Royal Children’s Hospital (RCH) ethics committee as part of the ACPU’s comprehensive examination of children and adolescents referred for psychological problems. Each legal guardian and participant provided informed written consent for any data they provided to be used in future research studies. This is a standard part of the ACPU assessment procedure.
All children and their parents participated in separate interviews and testing sessions, with breaks, over 2 days. Information was also obtained from teachers using various checklists and questionnaires. In all cases, parental consent forms were completed prior to the assessment. The data collected covered a comprehensive demographic, medical (primarily neurological and endocrinological), educational, psychological, familial, and social assessment of the child and the child’s family. All psychological data were collected by research assistants, who were clinical psychology students working under the supervision of two registered clinical psychologists. Before collecting data, the research assistants received extensive supervised training from the registered psychologists, which for the ADISC-IV-P included observing the psychologists administer it. The research assistants began administering the ADISC-IV only after they had attained competence in its administration, as assessed by the registered psychologists. Interrater reliability for the diagnoses made by the research assistants and the psychologists was adequate (κ = .88). Standard procedures were used to administer all measures. Where necessary (approximately 5% of the sample), researchers read the items to participants. Approximately 95% of the parent ADISC-IV interviews involved mothers only; the rest involved fathers only or both parents together. A consultant child psychiatrist independently reviewed the categorical data from the parent ADISC-IV and determined the clinical diagnoses.
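The interrater agreement reported above (κ = .88), like the ADISC-IV test–retest kappas, is a kappa statistic: observed agreement corrected for the agreement expected by chance. The sketch below assumes Cohen's kappa for two raters; the source does not say which software computed it, and the function name and rating vectors are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgments of the same cases:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if chance == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - chance) / (1 - chance)


# Hypothetical diagnoses (1 = present) for ten children from two raters.
assistant = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
psychologist = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(round(cohens_kappa(assistant, psychologist), 3))
```

Here the raters agree on 9 of 10 cases, but because both mostly record "absent", chance agreement is already .54, so kappa is well below the raw 90% agreement.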
Statistical Procedures
The one-factor model, that is, unidimensionality of the internalizing disorders (an assumption of the 2-PLM), was examined using confirmatory factor analysis (CFA) for ordered-categorical data, as recommended by Reeve et al. (2007) and Hill et al. (2007). Unidimensionality is supported when the model fits well and the factor loadings are significant and substantial. All CFA models were run in Mplus Version 7.11 (Muthen & Muthen, 1998-2013) with the weighted least squares means and variance adjusted (WLSMV) χ2 estimator. Like other χ2 values, WLSMV χ2 values are inflated by large sample sizes, so model fit was also examined using the root mean square error of approximation (RMSEA) and the comparative fit index (CFI), following the guidelines of Hu and Bentler (1998): RMSEA values ≤ .06 indicate good fit, values > .06 to .08 moderate fit, values > .08 to .10 marginal fit, and values > .10 poor fit; CFI values ≥ .95 indicate good fit, values > .90 and < .95 acceptable fit, and values < .90 poor fit. Where necessary, nested models were compared using the criteria ΔCFI > .01 and ΔRMSEA > .015 in absolute value (Chen, 2007; Cheung & Rensvold, 2002).
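The fit bands and the nested-model rule above are simple threshold checks. The sketch below encodes them as stated in the text; the function names are hypothetical, a CFI of exactly .90 (which the stated bands leave unassigned) is treated as poor, and reading the Δ criteria as absolute differences is an assumption. The usage lines apply them to the RMSEA and CFI values reported in the Results.

```python
def rmsea_band(rmsea):
    # Hu and Bentler (1998) bands as stated in the text.
    if rmsea <= 0.06:
        return "good"
    if rmsea <= 0.08:
        return "moderate"
    if rmsea <= 0.10:
        return "marginal"
    return "poor"


def cfi_band(cfi):
    if cfi >= 0.95:
        return "good"
    if cfi > 0.90:
        return "acceptable"
    return "poor"  # includes exactly .90 (an assumption)


def nested_models_differ(fit_a, fit_b, cfi_cut=0.01, rmsea_cut=0.015):
    """fit_a, fit_b: (rmsea, cfi) tuples for two nested models.
    Returns (differ?, delta_rmsea, delta_cfi), with deltas computed as a - b."""
    d_rmsea = fit_a[0] - fit_b[0]
    d_cfi = fit_a[1] - fit_b[1]
    differ = abs(d_cfi) > cfi_cut and abs(d_rmsea) > rmsea_cut
    return differ, round(d_rmsea, 3), round(d_cfi, 3)


print(rmsea_band(0.056), cfi_band(0.928))                 # 10-disorder model
print(nested_models_differ((0.056, 0.928), (0.036, 0.972)))
print(nested_models_differ((0.028, 0.983), (0.020, 0.992)))
```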
This study used Item Response Theory for Patient-Reported Outcomes (IRTPRO) Version 3.1 (Cai, du Toit, & Thissen, 2011) to perform the 2-PLM analyses. For each disorder, the following IRT parameters were examined: the ICC (graphically), α, β, and the IIF (graphically). For the overall internalizing dimension, the TIF was also examined graphically. In the ICC, IIF, and TIF graphs, the x-axis is the trait (θ) scale from −3.00 to 3.00, with M = 0 and SD = 1. Participants' latent trait scores (theta), based on their specific patterns of endorsement across the set of disorders, were computed using the expected a posteriori (EAP) method (Bock & Aitkin, 1981).
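EAP scoring takes the mean of the posterior distribution of θ given a child's response pattern. The sketch below assumes a standard normal prior and 61 equally spaced quadrature points on [−4, 4]; IRTPRO's actual quadrature settings are not stated in the source, and the function name and item parameters are hypothetical.

```python
import math


def icc_2pl(theta, alpha, beta):
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def eap_score(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """EAP estimate of theta for one response pattern.

    responses: list of 0/1 values, one per item (1 = disorder present).
    items: list of (alpha, beta) pairs in the same order.
    """
    step = (hi - lo) / (n_points - 1)
    numerator = denominator = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta ** 2)  # unnormalized N(0, 1) prior
        for x, (a, b) in zip(responses, items):
            p = icc_2pl(theta, a, b)
            weight *= p if x == 1 else 1.0 - p  # multiply in the 2-PLM likelihood
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


items = [(2.0, 0.3), (1.0, 2.0)]  # hypothetical (alpha, beta) pairs
for pattern in ([0, 0], [1, 0], [0, 1], [1, 1]):
    print(pattern, round(eap_score(pattern, items), 3))
```

Because the weighted sum of α over endorsed items is the sufficient statistic for θ in the 2-PLM, the pattern [1, 0] (endorsing the more discriminating item) scores higher than [0, 1], even though the second item is the more difficult one. Two children with the same number of diagnoses can therefore receive different trait scores.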
Because the 2-PLM is model based, model–data fit must be tested. Item-level fit was examined using the S-χ2 item-fit statistic provided by IRTPRO for each item. S-χ2 indexes the similarity between the model-predicted and the empirical (observed) response frequencies for each response category, with a statistically significant value indicating poor fit. Because this statistic is sensitive to large sample sizes, the significance cutoff was set at p = .001 (Stone & Zhang, 2003). Overall model fit was examined using the M2 limited-information goodness-of-fit statistic, its associated p value, and the RMSEA (Cai, Maydeu-Olivares, Coffman, & Thissen, 2006), all of which IRTPRO provides on request. M2 tests the null hypothesis of perfect model–data fit in the population, with smaller values indicating better fit. Because M2 is also sensitive to large sample sizes, p < .001 was again used to infer statistical significance. The RMSEA values reported by IRTPRO are interpreted as in CFA, with values of about .06 or less indicating good fit.
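An RMSEA can be derived from M2 and its degrees of freedom. The sketch below assumes the commonly used form √(max(M2 − df, 0)/(N·df)); whether IRTPRO uses N or N − 1 in the denominator is not stated in the source, and the function names are hypothetical. With the values reported in the Results (M2(27) = 47.3, N = 1,031) it gives about .027, consistent with the reported .03.

```python
import math


def rmsea_from_m2(m2, df, n):
    """RMSEA from a limited-information M2 statistic (assumed form):
    sqrt(max(M2 - df, 0) / (n * df)). Some software may use n - 1 instead of n."""
    return math.sqrt(max(m2 - df, 0.0) / (n * df))


def significant_misfit(p_value, alpha=0.001):
    # With large samples, the text treats only p < .001 as evidence of misfit.
    return p_value < alpha


print(round(rmsea_from_m2(47.3, 27, 1031), 3))
print(significant_misfit(0.009))
```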
Besides unidimensionality, the 2-PLM assumes local independence, which implies that associations between items are caused only by the underlying latent trait. Local independence was examined using the standardized local dependence (LD) χ2 statistic provided by IRTPRO for each item pair; LD χ2 values greater than 10 are generally considered large and indicative of likely LD (Cai et al., 2011). The output of the one-factor CFA model was also used: local independence is supported when no residual correlation exceeds .20 (Morizot, Ainsworth, & Reise, 2007) and when no modification index (MI) for an error covariance is abnormally large compared with the others (Hill et al., 2007).
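The two numeric local-dependence criteria can be applied mechanically to every item pair. The sketch below flags pairs whose standardized LD χ2 exceeds 10 or whose residual correlation exceeds .20 in absolute value (treating the residual criterion as an absolute value is an assumption). The MI criterion is a comparative judgment and is not coded. The function name and the item labels A, B, and C are hypothetical.

```python
def flag_local_dependence(ld_chi2, residual_corr, ld_cut=10.0, resid_cut=0.20):
    """Return the item pairs that exceed either local-dependence criterion.

    ld_chi2, residual_corr: dicts keyed by (item_i, item_j) tuples.
    """
    flagged = set()
    for pair, value in ld_chi2.items():
        if value > ld_cut:
            flagged.add(pair)
    for pair, value in residual_corr.items():
        if abs(value) > resid_cut:
            flagged.add(pair)
    return sorted(flagged)


# Hypothetical statistics for three items (not the study's values).
ld_chi2_example = {("A", "B"): 14.2, ("A", "C"): 1.3, ("B", "C"): 3.0}
residual_example = {("A", "B"): 0.08, ("A", "C"): -0.23, ("B", "C"): 0.05}
print(flag_local_dependence(ld_chi2_example, residual_example))
```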
To examine the convergent validity of the internalizing dimension, the internalizing trait scores (obtained through the 2-PLM analysis) were correlated with the raw total scores of the CBCL and TRF internalizing scales. To examine discriminant validity, they were also correlated with the raw total scores of the CBCL and TRF externalizing scales. For each association involving the CBCL and TRF, the incremental validity of the internalizing trait scores relative to the internalizing raw scores (that is, the number of diagnoses endorsed) was examined by computing partial correlations between the raw scores and the CBCL and TRF scale scores, controlling for the internalizing trait scores. Significant partial correlations were taken as support for the incremental validity of the internalizing trait scores.
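A first-order partial correlation can be computed from the three pairwise Pearson correlations: r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The sketch below implements that formula with the standard library only; the source does not name the software used, and the function names and data are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: raw number of diagnoses, a checklist scale score, and trait scores.
raw = [0, 1, 1, 2, 2, 3, 4]
scale = [5, 9, 8, 14, 12, 20, 22]
trait = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.0]
print(round(partial_correlation(raw, scale, trait), 3))
```

The same value is obtained by regressing x and y on z and correlating the two sets of residuals, which is a convenient cross-check.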
Results
One-Factor Model (Unidimensionality) for the Internalizing Disorders
The fit of the one-factor model with the 10 disorders was WLSMV χ2(35) = 148.16, p < .001; RMSEA = .056; CFI = .928. Based on the Hu and Bentler (1998) guidelines, the RMSEA indicated good fit, whereas the CFI indicated only acceptable fit. Thus, the fit of this model, and therefore the support for its unidimensionality, was mixed. This model also showed an extremely large MI for the error covariance between DYS and MDD (MI = 70.82). A revised model with these error variances correlated yielded WLSMV χ2(34) = 78.46, p < .001; RMSEA = .036; CFI = .972, which was significantly better than the initial model, ΔRMSEA = .020 and ΔCFI = −.044. Given this, DYS and MDD were collapsed into a new variable reflecting the presence of DYS, MDD, or both (D/MD). The fit of the CFA with D/MD and the other eight original disorders was WLSMV χ2(27) = 48.96, p < .01; RMSEA = .028; CFI = .983, with both the RMSEA and CFI indicating good fit. A revised model with the error variances of OCD and PTSD correlated yielded WLSMV χ2(26) = 36.38, ns; RMSEA = .020; CFI = .992. This revised model did not differ from the nine-disorder model without the OCD–PTSD error covariance, ΔRMSEA = .008 and ΔCFI = −.009. Thus, unlike the model with all 10 disorders, the model with nine disorders showed good support for unidimensionality. The factor loadings for the nine disorders in this model were all significant (p < .001) and salient (ranging from .45 to .80). Table 2 presents the tetrachoric correlation matrix for the nine disorders; all correlations were significant and positive (p < .001).
Tetrachoric Correlations for Nine Unipolar Mood and Anxiety Diagnoses (N = 1,031).
Note. SAD = separation anxiety disorder; SOP = social phobia; SPP = specific phobia; PD = panic disorder; AG = agoraphobia; GAD = generalized anxiety disorder; OCD = obsessive-compulsive disorder; PTSD = posttraumatic stress disorder; D/MD = dysthymia/major depressive disorder. p < .001 for all correlations.
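A tetrachoric correlation estimates the correlation between two latent normal variables that underlie a pair of binary diagnoses. Mplus estimates it by maximum likelihood; the sketch below instead uses the classical cosine-pi approximation, cos(π / (1 + √(ad/bc))), which is only a quick approximation and not the method used in this study. The function name and cell counts are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation for a 2 x 2 table:
        a = both diagnoses present,  b = first present only,
        c = second present only,     d = both absent.
    """
    if b * c == 0:
        return 1.0   # no discordant cases in one off-diagonal cell
    if a * d == 0:
        return -1.0  # no concordant cases in one diagonal cell
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


print(round(tetrachoric_cos_pi(25, 25, 25, 25), 3))  # independence: about 0
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 3))  # cos(pi / 5), about 0.809
```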
Item Parameter Estimates for the Internalizing Disorders
Given the good support for the unidimensionality of the CFA model with D/MD and the other eight original disorders, this nine-disorder model was used for the 2-PLM analysis. Before examining the IRT parameters, we examined local independence and model–data fit for this model. The standardized LD χ2 statistics provided by IRTPRO for the item pairs ranged from 0.0 to 4.2. In the CFA of this model, the highest residual correlation was .14, and the remaining residual correlations ranged from .00 to .10. In addition, the MIs for the error covariances in this CFA model were all relatively small (< 12.66). These findings indicate good support for local independence in the model used in the 2-PLM. Table 3 shows the S-χ2 item-fit statistic for each item from the IRTPRO calibration. With the exception of D/MD, the S-χ2 statistics for all disorders indicated satisfactory fit (i.e., p > .001). At the model level, the M2 value was not significant at p = .001, M2(27) = 47.3, p = .009, and the RMSEA was .03. These findings can be interpreted as evidence of good model–data fit. Overall, therefore, there was support for local independence and model–data fit for the 2-PLM tested.
Two-Parameter Logistic Item Response Model Parameter Estimates and S-χ2 Statistics.
Note. IRT = item response theory; SAD = separation anxiety disorder; SOP = social phobia; SPP = specific phobia; PD = panic disorder; AG = agoraphobia; GAD = generalized anxiety disorder; OCD = obsessive-compulsive disorder; PTSD = posttraumatic stress disorder; D/MD = dysthymia/major depressive disorder; df = degrees of freedom.
Table 3 provides the discrimination and difficulty parameters for the nine diagnostic indicators in the 2-PLM, and Figure 1 shows their ICCs. Although they varied widely, the discrimination values for all diagnoses were high, ranging from 0.92 to 2.27, indicating that each disorder discriminated well along the underlying internalizing dimension. In increasing order of discrimination, the disorders were SPP, OCD, PTSD, D/MD, SOP, SAD, PD, AG, and GAD, with values of 0.92, 0.95, 0.96, 1.04, 1.27, 1.63, 1.82, 2.09, and 2.27, respectively. The difficulty values also varied, ranging from 0.33 to 2.34; the values for GAD, D/MD, SPP, SOP, SAD, OCD, PD, PTSD, and AG were 0.33, 0.65, 0.78, 1.04, 1.09, 1.75, 1.86, 2.33, and 2.34, respectively.

Item characteristic curve (ICC; indicated as Curve 1 in figure) and item information function (dotted) of the internalizing disorders.
Figure 2 shows the TIF for all the disorders together in the internalizing factor. TIF values were relatively low up to about the mean trait level and relatively high from the mean onward, peaking at around +2 SD. Figure 1 also shows the IIFs for the different disorders. GAD had relatively high information between −0.5 SD and +1.5 SD from the mean. PD and AG, and to a lesser degree SOP and SAD, had relatively high information between roughly +0.5 SD and +1.5 to +2 SD from the mean. In contrast, SPP, OCD, PTSD, and D/MD had relatively low information across the entire trait spectrum.
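The TIF can be roughly reconstructed from the parameter estimates reported above (Table 3), using the standard 2-PLM information formula α²P(1 − P) summed over the nine disorders. This is only an approximation from rounded estimates, and the curve plotted by IRTPRO may differ in detail; the helper names are hypothetical.

```python
import math

# (alpha, beta) estimates as reported for the nine-disorder 2-PLM (Table 3).
PARAMS = {
    "GAD": (2.27, 0.33), "D/MD": (1.04, 0.65), "SPP": (0.92, 0.78),
    "SOP": (1.27, 1.04), "SAD": (1.63, 1.09), "OCD": (0.95, 1.75),
    "PD": (1.82, 1.86), "PTSD": (0.96, 2.33), "AG": (2.09, 2.34),
}


def item_information(theta, alpha, beta):
    p = 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))
    return alpha ** 2 * p * (1.0 - p)


def tif(theta):
    # Test information: sum of the nine item information functions.
    return sum(item_information(theta, a, b) for a, b in PARAMS.values())


grid = [round(-3.0 + 0.05 * k, 2) for k in range(121)]  # theta from -3.00 to 3.00
peak = max(grid, key=tif)
print(f"TIF peaks near theta = {peak:+.2f} (TIF = {tif(peak):.2f}, "
      f"SE = {1 / math.sqrt(tif(peak)):.2f})")
for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}   TIF = {tif(theta):.2f}")
```

Each disorder contributes most information at θ = β, so the high-discrimination disorders with above-mean difficulties (PD, AG, SAD) dominate the upper half of the continuum, while GAD contributes most near the mean.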

Test information function and its standard error (dotted line) for the internalizing dimension.
Convergent and Discriminant Validities of the Internalizing Dimension
Correlation analysis indicated that the IRT-derived trait scores were significantly correlated with the internalizing scale scores of the CBCL (r = .69, p < .001) and the TRF (r = .27, p < .001). They were also significantly correlated with the externalizing scale scores of the CBCL (r = .16, p < .001) and the TRF (r = .10, p < .05). Based on the guidelines of Cohen (1992; correlations of .1 to < .3 = small, .3 to < .5 = medium, and ≥ .50 = large), the correlation with the number of diagnoses was large, and the correlations with the internalizing scale scores of the CBCL and the TRF were of large and small effect size, respectively. In contrast, the correlations with the externalizing scale scores of the CBCL and the TRF were small and negligible, respectively. These findings can be interpreted as supporting the convergent and discriminant validities of the internalizing latent dimension. Partial correlations (controlling for the internalizing trait scores) between the total internalizing raw scores and the internalizing and externalizing scale scores of the CBCL and TRF were all nonsignificant (all rs ≤ .05), thereby not supporting the incremental validity of the internalizing trait scores.
Discussion
One aim of the current study was to examine whether there was support for a one-factor model for the following DSM-IV/DSM-IV-TR internalizing disorders in a group of clinic-referred children: SAD, SOP, SPP, PD, AG, GAD, OCD, PTSD, DYS, and MDD. The model with these 10 disorders showed mixed fit, with an extremely high MI for the error covariance between DYS and MDD. A revised model in which DYS and MDD were collapsed into a new variable (D/MD), reflecting DYS, MDD, or both, together with the other eight disorders, showed good fit, and the loadings for all nine disorders were significant and salient. Consistent with this, existing data for adolescents for the same 10 internalizing disorders (Gomez et al., 2014), and for adults with various sets of internalizing disorders (Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007), have supported a unidimensional factor.
The study then used the 2-PLM to examine the IRT properties of the disorders in the model with D/MD and the other eight internalizing disorders. All the disorders generally had high discrimination values, and their difficulty values ranged from 0.33 to 2.34. The discrimination values indicated that all the disorders were relatively strong discriminators of the internalizing dimension, and the difficulty values indicated that all the disorders measured more of the internalizing dimension above its mean. The TIF values were also higher in the upper half (above the mean) of the internalizing trait continuum, indicating that the disorders taken together provide more measurement precision (or reliability) in the upper half of this continuum than in the lower half. All these general findings were as hypothesized and are consistent with previous IRT studies involving adults (Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007) and adolescents (Gomez & Vance, 2015).
Despite these general findings, there were noteworthy differences across the disorders. Based on the guidelines proposed by Baker (2001), in which a discrimination value <0.64 is low, 0.65 to 1.34 is moderate, 1.35 to 1.69 is high, and >1.69 is perfect (interpreted here as very high), our findings indicate moderate values for SPP, OCD, PTSD, D/MD, and SOP; a high value for SAD; and very high values for PD, AG, and GAD. In absolute terms, the difficulty values for GAD, D/MD, SPP, SOP, SAD, OCD, PD, PTSD, and AG were 0.33, 0.65, 0.78, 1.04, 1.09, 1.75, 1.86, 2.33, and 2.34, respectively, from the mean. It is important to note that the discrimination and difficulty parameters are not completely independent, because the discrimination value of an item influences its difficulty value. Because the item difficulty indicates the location on the construct continuum where the probability of a disorder reaches 50%, that location may lie further to the right on the continuum as a consequence of a lower slope. Thus, comparisons of the difficulty (threshold) locations across disorders need to take the slope parameters into account. As already noted, SPP, OCD, PTSD, D/MD, and SOP had comparable discrimination values, as did PD, AG, and GAD. For SPP, OCD, PTSD, D/MD, and SOP, the difficulty values in increasing order were D/MD, SPP, SOP, OCD, and PTSD. For PD, AG, and GAD, the difficulty values in increasing order were GAD, PD, and AG. In relation to the IIF, GAD had relatively high information between −0.5 SD and +1.5 SD from the mean. PD and AG, and to a lesser degree SOP and SAD, had relatively high information between roughly −0.5 SD and +1.5/+2 SD from the mean, whereas SPP, OCD, PTSD, and D/MD had relatively low information across the entire trait spectrum.
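Two points in this paragraph lend themselves to a short check. First, the ordering of difficulty values within each discrimination group follows directly from the values listed above. Second, the dependence of the 50% location on the slope can be seen in the slope-intercept form of the 2-PLM, logit P = aθ + c, where b = −c/a: holding the intercept c fixed, a smaller slope moves b further from zero. The Python below is a sketch under stated assumptions: the intercept value is hypothetical, `discrimination_band` uses the Baker (2001) cut-offs as quoted in the text (values in the small gaps between the quoted bands are assigned to the lower band, an assumption), and the difficulty values are those reported above. All function names are illustrative.

```python
def discrimination_band(a: float) -> str:
    """Baker (2001) bands as quoted in the text; 'perfect' is read as 'very high'.

    Assumption: values in the gaps between quoted bands go to the lower band.
    """
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a <= 1.69:
        return "high"
    return "very high"


def difficulty_from_intercept(c: float, a: float) -> float:
    """In logit P = a*theta + c, P = .5 where theta = b = -c/a."""
    return -c / a


# Difficulty values reported in the text (absolute distance from the mean).
DIFFICULTY = {"GAD": 0.33, "D/MD": 0.65, "SPP": 0.78, "SOP": 1.04, "SAD": 1.09,
              "OCD": 1.75, "PD": 1.86, "PTSD": 2.33, "AG": 2.34}
MODERATE = ["SPP", "OCD", "PTSD", "D/MD", "SOP"]
VERY_HIGH = ["PD", "AG", "GAD"]


def order_by_difficulty(group):
    """Sort disorders within a discrimination group by increasing difficulty."""
    return sorted(group, key=DIFFICULTY.__getitem__)


print(order_by_difficulty(MODERATE))   # ['D/MD', 'SPP', 'SOP', 'OCD', 'PTSD']
print(order_by_difficulty(VERY_HIGH))  # ['GAD', 'PD', 'AG']

# Hypothetical intercept held fixed: halving the slope doubles b.
print(difficulty_from_intercept(-1.5, 2.0), difficulty_from_intercept(-1.5, 1.0))  # 0.75 1.5
```

The last line illustrates why the text compares difficulty locations only among disorders with comparable discrimination values.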
Taken together, the discrimination parameter values for the different disorders indicate that PD, AG, GAD, and SAD are relatively stronger discriminators of the internalizing dimension than SPP, OCD, PTSD, D/MD, and SOP. The difficulty values indicate that, among SPP, OCD, PTSD, D/MD, and SOP, D/MD is better able than the other disorders to represent the internalizing dimension, followed by SPP, SOP, OCD, and PTSD. Similarly, among PD, AG, and GAD, GAD is better able than the other disorders to represent the internalizing dimension, followed by PD and AG. Considered together, the IIF values indicate that GAD, PD, AG, SOP, and SAD are relatively more reliable for measuring the internalizing dimension (from around the mean trait level to around 1.5/2 SD above the mean), whereas SPP, OCD, PTSD, and D/MD have relatively low reliability for measuring the internalizing dimension across its entire spectrum. It is important to note that these are the reliabilities of the different internalizing disorders for measuring the internalizing dimension, and not their own individual reliabilities. For the latter, several studies involving parent ratings on scales of internalizing symptoms based on DSM-IV-TR have demonstrated acceptable internal consistency (Ebesutani, Bernstein, Nakamura, Chorpita, & Weisz, 2010; Spence, 1997). For example, Ebesutani et al. reported the following internal consistency values: MDD = 0.83, SAD = 0.83, SOP = 0.80, GAD = 0.88, PD = 0.80, and OCD = 0.84.
There are some noteworthy similarities and differences between the discrimination and difficulty parameter findings of the current study and those of previous studies (Gomez & Vance, 2015; Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007), and between the IIF findings of the current study and those of Gomez and Vance. Krueger and Finger and McGlinchey and Zimmerman did not report IIF results in their articles. Like this study, all previous studies found relatively high discrimination values for GAD (Gomez & Vance, 2015; Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007). Also like this study, Gomez and Vance found moderate values for SOP, OCD, and PTSD, and very high values for PD, AG, and GAD. However, whereas the current study found moderate and high values for SPP and SAD, respectively, Gomez and Vance found high and moderate values for SPP and SAD, respectively. While the current study found moderate values for D/MD, Gomez and Vance found very high values for DYS and MDD. In relation to difficulty parameter values, like the findings in this study, Gomez and Vance found that GAD is better able to represent the internalizing dimension from around 0.5 SD above the mean trait level; the depressive disorders, SPP, and SOP from around 1 SD above the mean; OCD and PD from slightly below 1.75 SD above the mean; and PTSD and AG from slightly below 2.25 SD above the mean. Like this study and the study by Gomez and Vance, Krueger and Finger also found relatively low difficulty values for GAD. However, GAD had one of the highest values in the study by McGlinchey and Zimmerman.
Additionally, whereas the current study found that SAD is better able to represent the internalizing dimension from around 1 SD above the mean, Gomez and Vance found that SAD does so from around 2 SD above the mean. In relation to the IIF, like the findings in the current study, Gomez and Vance found that GAD had relatively high IIF values from around −1 SD to around 2 SD from the mean, that PD and AG contributed relatively high information, and that SPP, PTSD, and OCD contributed relatively low information. However, unlike the findings in the current study, Gomez and Vance found relatively high information for the depressive disorders, and low information for SOP and SAD. Thus, with the exception of the depressive disorders, SOP, and SAD, the findings for children and adolescents are somewhat comparable.
When the findings for the discrimination parameters in this and previous studies are taken together, it could be speculated that the anxiety disorders have comparable ability to discriminate the internalizing dimension across children, adolescents, and adults, and that in all these groups GAD is a relatively better discriminator than the other disorders. Additionally, the depressive disorders appear to have less discrimination ability in children than in adolescents. In relation to the difficulty parameters, the current and past studies indicate that GAD measured more of the internalizing dimension at around the mean trait level in children, adolescents, and adults. Also, with the exception of SAD, the internalizing disorders have comparable representation of the internalizing dimension across the trait spectrum in children and adolescents. Across the current study and the Gomez and Vance (2015) study, the IIF results indicate that, compared with adolescents, the depressive disorders are relatively less reliable indicators of the internalizing dimension in children, whereas SOP and SAD are relatively more reliable indicators in children.
The current study also examined the convergent and discriminant validities of the internalizing latent trait factor. Correlation analyses indicated that participants' trait scores had a large effect size correlation with the number of internalizing diagnoses; large and small effect size correlations with the CBCL and TRF internalizing scales, respectively; and small and negligible effect size correlations with the CBCL and TRF externalizing scales, respectively. These findings support the convergent and discriminant validities of the internalizing latent dimension. The support for the convergent validity of the internalizing latent factor is consistent with the findings of past studies in this area (Gomez & Vance, 2015; Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007). Thus, there was also support for the external validity of the internalizing dimension.
The findings in this study and previous studies in this area (Gomez & Vance, 2015; Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007) have implications for clinical practice. In relation to diagnosis, the close associations between the anxiety and depressive disorders found in these studies highlight the need for a comprehensive evaluation of all the internalizing disorders to better understand psychopathology in children, adolescents, and adults. The findings of this study and those of Gomez and Vance and of Krueger and Finger, showing that GAD has high discrimination values and can represent the internalizing dimension with high reliability at least from the mean trait level of the internalizing dimension, raise the possibility that GAD could be an important disorder to focus on in individuals with multiple comorbid anxiety and depressive disorders. Indeed, GAD could be useful when diagnosing, treating, and monitoring changes in children who present with a wide range of internalizing disorders. In relation to treatment, the findings suggest that treatment of anxiety and depressive disorders in children may need to focus on general distress rather than on the individual disorders. In this respect, recently developed transdiagnostic treatment approaches for anxiety and depressive disorders in children and adolescents (e.g., Ehrenreich-May & Bilek, 2012) and adults (e.g., McEvoy & Nathan, 2007) would be valuable. In brief, transdiagnostic approaches focus on common factors that produce symptoms in related classes of disorders, such as anxiety and depressive disorders, thereby addressing common concerns or disorders within an individual (McEvoy, Nathan, & Norton, 2009).
The findings in this study also have implications for clinical classification. First, support for the one-factor model in this and previous studies in this area (Gomez & Vance, 2015; Krueger & Finger, 2001; McGlinchey & Zimmerman, 2007) suggests that, from a psychometric viewpoint, the internalizing disorders could be grouped together in one broad overarching category of emotional disorders (Watson, 2005). In this respect, it is also worth noting that the anxiety and depressive disorders that form the internalizing dimension share many genetic, familial, and environmental risk factors (Kendler, 1996; Kessler et al., 2005), as well as cognitive-affective, interpersonal, and behavioral maintaining factors (Harvey, Watkins, Mansell, & Shafran, 2004). Second, this study found mixed fit when DYS and MDD were considered as separate disorders in the one-factor model, but good fit for a revised one-factor model in which DYS and MDD were collapsed to reflect DYS, MDD, or both. This supports the approach taken in DSM-5, in which DSM-IV DYS and chronic MDD are combined into a single disorder because of the inability to find scientifically meaningful differences between them (APA, 2013). We believe that the DSM-5 approach is appropriate from a psychometric viewpoint, at least in children.
Finally, there are limitations in the study that need to be considered. First, around 79% and 69% of the participants had attention-deficit/hyperactivity disorder and oppositional defiant disorder/conduct disorder, respectively. As this comorbidity was not controlled, it is uncertain whether it influenced the findings. Second, as this study examined clinic-referred children, the findings may not generalize to depressive and anxiety disorders in children from the general community. Third, all the participants were from the same clinic, which may constitute an additional source of bias. Fourth, the sample was predominantly male, which may also have biased the findings. Fifth, although children with internalizing disorders were the target of analysis, the information about these disorders was derived from parent interviews rather than from the children themselves. This may have influenced the parameter estimates, given that, as already discussed, there were some notable differences between the current study and the Gomez and Vance (2015) study, which relied on interviews with the adolescents themselves about their internalizing symptoms. Given these limitations, there is a need for cross-validation of the findings. However, given the general comparability of the current and past findings, it could be argued that there is some degree of stability in the IRT properties of the internalizing disorders across the lifespan.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
