Abstract
Key concerns about the psychometric properties of the 25-item version of the Strengths and Difficulties Questionnaire (SDQ) have consistently been raised in the literature. The present study aimed to examine the meaningfulness of an alternative model to the SDQ in which 7 problematic items are excluded. French-speaking parents of 262 boys and 263 girls aged 6 to 16 years completed the SDQ. Confirmatory factor analyses (CFAs) supported a new, reduced, and psychometrically sound version of the SDQ, called the SDQ-R, that displayed good factorial validity, construct validity, reliability, and multi-group invariance across gender. Overall, the attractive features of the SDQ-R make it a promising instrument for quickly screening emotional and behavioral problems in children and adolescents.
Introduction
The Strengths and Difficulties Questionnaire (SDQ; R. Goodman, 1997) is one of the most widely used screening and research tools in child and adolescent mental health practice. It consists of 25 items covering five scales relating to emotional symptoms, peer problems, conduct problems, hyperactivity, and prosocial behavior.
One central and recurrent issue concerns the psychometric properties of the SDQ in its current 25-item version. Regarding the factorial validity of the SDQ, many studies reported only a “modest” and/or “questionable” fit between observed data and models, regardless of which factorial structure was tested (Rønning, Handegaard, Sourander, & Mørch, 2004; Van Leeuwen, Meerschaert, Bosmans, De Medts, & Braet, 2006). Further difficulties have been documented at the item level. For instance, a number of authors reported less than acceptable loadings (i.e., <.40) of several items on their intended factor (e.g., “Steals from home and school,” “Picked on or bullied,” or “Kind to younger children”; Capron, Thérond, & Duyme, 2007; D’Acremont & Van der Linden, 2008; Rønning et al., 2004; Van Leeuwen et al., 2006). Others observed items with unjustified cross-loadings (e.g., “Often unhappy”; Dickey & Blumberg, 2004; Mojtabai, 2006; Percy, McCrystal, & Higgins, 2008). Concerns bearing on the construct validity of the 25-item version of the SDQ have also been raised.
In terms of content, some items are semantically distant from, or disproportionate to, the other items of their scale, so they fail to reflect the same underlying latent dimension. For example, unlike the other items of the Emotional symptoms scale, the item “Has headaches or stomach aches” might be considered as referring to a physiological problem rather than to an emotional one. Likewise, the item “Fights a lot or bullies others” might be considered as both disproportionate to and inherently different from the other items of the Conduct problems scale because “bullies” is a very strong term and it is the only one that includes the others. Other items are susceptible to socially desirable responding (e.g., “Steals from home and school”) or are redundant within one dimension (e.g., “Restless, overactive, cannot stay still for long” and “Constantly fidgeting and squirming” within the hyperactivity dimension). In parallel, reliability estimates (Cronbach’s α coefficients) for the Conduct and Peer problems scales were regularly lower than sufficient (i.e., <.70 and even <.60; Capron et al., 2007; Van Leeuwen et al., 2006).
Accordingly, the aim of the present study was to examine the meaningfulness of an alternative model to the current SDQ in which problematic items are excluded, so as to propose a new, reduced, and psychometrically sound version of the SDQ (named SDQ-R). Based on the aforementioned evidence from the literature, a set of seven problematic items was excluded from the model: “Often unhappy” and “Has headaches or stomach aches” (from the Emotional symptoms scale), “Picked on or bullied” (from the Peer problems scale), “Steals from home and school” and “Fights a lot or bullies others” (from the Conduct problems scale), “Constantly fidgeting and squirming” (from the Hyperactivity scale), and “Kind to younger children” (from the Prosocial scale). In this study, we focused on the parent version of the SDQ as it is the one most widely used for research purposes (Stone et al., 2013).
Method
Participants, Material, and Procedure
A sample of parents of 720 children attending the public junior secondary school located in the French-speaking region of Switzerland (Fribourg) was asked to complete a French translation of the 25-item SDQ. The informant scored each item on a 3-point Likert-type scale to indicate how well the item applied to the target participant. The options were 0 (not true), 1 (somewhat true), and 2 (certainly true). The response rate was 73%, resulting in a final sample of 525 children (263 girls, 262 boys) aged from 6 to 16 years (M = 11.94, SD = 2.93).
Statistical Analysis
All analyses were performed using Mplus (version 7.3; Muthén & Muthén, 1998-2012). As the items of the SDQ are ordered-categorical measures, we used the weighted least squares mean and variance adjusted (WLSMV) estimator, which is specifically designed for models involving ordinal, non-normally distributed data and which performs well with moderate sample sizes such as ours (Brown, 2006). Inspection of the data revealed only 20 missing values (about 0.15% of the SDQ item scores). The WLSMV–Pairwise Deletion (PD) estimator was used to handle them.
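Because each SDQ item has three ordered response options, categorical CFA treats it as a coarse reading of a continuous latent response cut at two thresholds. The sketch below illustrates that idea only: it derives the two thresholds from the marginal proportions of a single item under a standard normal latent response. The response counts are hypothetical, not taken from the study, and the function name is an illustrative assumption rather than part of Mplus.

```python
from statistics import NormalDist


def ordinal_thresholds(category_counts):
    """Thresholds implied by the observed counts of one ordinal item.

    Assumes a standard normal latent response variable. An item with
    k ordered categories yields k - 1 thresholds, so a 3-point item
    (0, 1, 2) yields two.
    """
    total = sum(category_counts)
    if total <= 0:
        raise ValueError("category_counts must contain at least one response")
    standard_normal = NormalDist()
    thresholds = []
    cumulative = 0
    for count in category_counts[:-1]:
        cumulative += count
        # inv_cdf requires a proportion strictly between 0 and 1, so
        # empty lowest or highest categories are not supported here.
        thresholds.append(standard_normal.inv_cdf(cumulative / total))
    return thresholds


# Hypothetical counts for "not true", "somewhat true", "certainly true".
hypothetical_counts = [300, 150, 75]
tau_1, tau_2 = ordinal_thresholds(hypothetical_counts)
print(f"threshold 1 = {tau_1:.3f}, threshold 2 = {tau_2:.3f}")
```

In a fitted model the thresholds are estimated together with the other parameters, so this univariate calculation should be read as an illustration of what a threshold represents, not as a reproduction of the reported analysis.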
Results and Discussion
CFA of the SDQ-R: Testing for Psychometric Properties of the SDQ-R
According to the cutoff values recommended by Hu and Bentler (1999), fit indices indicated an acceptable to good fit for the SDQ-R model, WLSMV χ2(df) = 335.41(125), comparative fit index (CFI) = .925, Tucker–Lewis index (TLI) = .908, root mean square error of approximation (RMSEA) = .057, weighted root mean square residual (WRMR) = 1.178. More importantly, all fit indices (except for RMSEA) showed a significant improvement compared with the classic 25-item version of the SDQ, for which these indices failed to demonstrate an acceptable fit, WLSMV χ2(df) = 692.69(265), CFI = .878, TLI = .862, RMSEA = .055, WRMR = 1.377.
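As a rough check on how the reported RMSEA values relate to the chi-square statistics, the following sketch applies the common single-group formula RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))) with N = 525. This textbook formula is an assumption here; the exact computation Mplus applies to the WLSMV-adjusted chi-square may differ slightly (for example, by using N rather than N − 1).

```python
import math


def rmsea(chi_square, df, n):
    """Single-group RMSEA from a chi-square statistic (textbook formula)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # The numerator is floored at zero so that models with
    # chi-square below df receive an RMSEA of 0.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Chi-square values and degrees of freedom as reported in the text.
rmsea_sdq_r = rmsea(335.41, 125, 525)
rmsea_sdq_25 = rmsea(692.69, 265, 525)
print(f"SDQ-R: {rmsea_sdq_r:.3f}; 25-item SDQ: {rmsea_sdq_25:.3f}")
```

Under these assumptions both results land close to the reported values of .057 and .055, which helps explain why RMSEA is the one index that does not favor the reduced model: it rewards parsimony per degree of freedom, and the 25-item model has more than twice as many.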
Table 1 presents the SDQ-R model. All standardized factor loadings were equal to or higher than .40 (and 15 of the 18 standardized loadings were above .50). Along with the good overall fit indices of the model, this pattern of factor loadings provided evidence of the factorial validity of the SDQ-R. This result represents an original contribution to the psychometric literature on the SDQ because the SDQ-R offers a clean five-factor structure consisting of the five originally hypothesized subscales, whereas many previous studies using the parent SDQ (e.g., Van Leeuwen et al., 2006) or another version of the French SDQ (e.g., Capron et al., 2007; D’Acremont & Van der Linden, 2008) did not find such a clean structure (A. Goodman, Lamping, & Ploubidis, 2010).
Table 1. Standardized Factor Loadings (Standard Errors of Estimates), Cross-Scale Correlations, and Reliability of the SDQ-R Factors.
Note. Standard errors of estimates are given in parentheses. Reliability is given in brackets. SDQ-R = reduced version of the Strengths and Difficulties Questionnaire.
The response categories for these items were reversed before the analyses.
p < .01.
As shown in Table 1, correlations between the five latent factors revealed (a) a set of medium-sized correlations among all difficulties scales (except for the high correlation between conduct problems and hyperactivity, which provides additional support for the idea that conduct and hyperactivity disorders are often comorbid; see A. Goodman et al., 2010) and (b) strongly negative correlations between the Prosocial behavior scale and all of these problem-oriented scales except the Emotional symptoms scale. This is fully in line with the findings of previous studies of the parent SDQ (e.g., Stone et al., 2013; in the Netherlands) and of the self-report and teacher versions of the French SDQ (Capron et al., 2007; D’Acremont & Van der Linden, 2008). As such, this result provides an indication of the construct validity of the SDQ-R.
Reliability estimates based on the rho coefficient (all ≥.70) supported the reliability of all the SDQ-R scales. This is an important finding because most previous studies using the parent SDQ (e.g., McCrory & Layte, 2012) or another version of the French SDQ (Capron et al., 2007) have reported less than acceptable reliability estimates.
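For readers unfamiliar with rho, a common form of the composite reliability coefficient for a factor with standardized loadings λ_i is ρ = (Σλ_i)² / ((Σλ_i)² + Σ(1 − λ_i²)). The sketch below implements that form under the assumptions of a congeneric one-factor scale with uncorrelated residuals; whether the study used exactly this variant is not stated in the text, and the loadings in the example are hypothetical, not those of Table 1.

```python
def composite_reliability(standardized_loadings):
    """Composite reliability (rho) from standardized factor loadings.

    Assumes one factor per scale, uncorrelated residuals, and a fully
    standardized solution, so each residual variance is 1 - loading**2.
    """
    if not standardized_loadings:
        raise ValueError("at least one loading is required")
    if any(abs(loading) >= 1.0 for loading in standardized_loadings):
        raise ValueError("standardized loadings must lie strictly between -1 and 1")
    true_score_variance = sum(standardized_loadings) ** 2
    residual_variance = sum(1.0 - loading ** 2 for loading in standardized_loadings)
    return true_score_variance / (true_score_variance + residual_variance)


# Hypothetical loadings for a four-item scale (illustration only).
hypothetical_loadings = [0.55, 0.62, 0.70, 0.74]
print(f"rho = {composite_reliability(hypothetical_loadings):.3f}")
```

Unlike Cronbach's α, this coefficient does not require equal loadings across items, which is one reason it is often preferred for short scales whose items load unevenly on their factor.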
Multi-Group CFA of the Reduced SDQ: Testing for Multi-Group Invariance Across Gender
As outlined by Brown (2006), a preliminary CFA was first conducted separately for each group to ensure that the SDQ-R was acceptable in both groups. All fit indices indicated an acceptable fit for boys (CFI = .904, TLI = .882, RMSEA = .067, WRMR = 1.072) and an acceptable to good fit for girls (CFI = .946, TLI = .934, RMSEA = .045, WRMR = .917). With this prerequisite met, we tested for multi-group invariance across gender. To assess invariance, we used not only the corrected chi-square difference test (using the DIFFTEST option in Mplus) but also the decrease in CFI criterion first proposed by Cheung and Rensvold (2002). In the initial step, we tested configural invariance, which requires only the same number of factors and the same pattern of factor loadings across groups (as such, no equality constraints were imposed across gender). Table 2 shows the goodness-of-fit statistics for each model, the WLSMV χ2 DIFFTEST, and the CFI decrease. The configural invariance model showed acceptable to good fit indices. In particular, RMSEA for this model was .056 and fell below the .093 cutoff, indicating configural invariance. In the next step, weak factorial invariance was tested by constraining the factor loadings to be equal across groups. Again, the fit indices indicated an acceptable to good model fit. Moreover, the corrected chi-square difference test (DIFFTEST) showed no significant difference from the previous model and the CFI increased by .010 (and, thus, fell below the .009 cutoff of CFI decrease), both indicating weak factorial invariance. In the next analysis, the two thresholds of each item were constrained to be equal across groups to test for strong factorial invariance. Results showed acceptable to good fit indices for this model. Moreover, although the DIFFTEST was significant (p = .002), the decrease in CFI was .008 (and fell below the .0082 cutoff), suggesting strong factorial invariance.
In the next step, we tested for, and found, strict factorial invariance by adding an across-group equality constraint on the item residual variances. Indeed, all fit indices showed an acceptable to good model fit, the DIFFTEST against the previous model was non-significant, and the CFI increased by .004. Because this is one of the few studies to have examined (and evidenced) strict factorial (measurement) invariance of the SDQ (here the SDQ-R) parent version (see Stone et al., 2013, for details), it adds to the literature on the SDQ. Having established strict invariance across gender, we were in a position to validly compare the factor variances, covariances, and latent means across boys and girls (Sass, 2011). We then added a constraint on the equality of factor variances to test for variance invariance. Once again, model fit was acceptable to good, the DIFFTEST was non-significant, and the CFI increased by .002 (and, thus, fell below the .006 cutoff of CFI decrease), indicating variance invariance. Subsequently, we added an across-group equality constraint on the factor covariances, and factor covariance invariance was supported by a good model fit, a non-significant DIFFTEST, and a .014 increase in CFI. Finally, to test for factor mean invariance, factor means were constrained to be equal across groups (i.e., fixed to zero in both groups). All fit indices indicated an acceptable to good fit. However, as can be seen from both the significant DIFFTEST and the decrease in CFI of .009 (above the .006 cutoff), factor mean invariance did not hold: Some latent means differed between boys and girls. Inspection of modification indices revealed that the means of the Emotional symptoms, Hyperactivity, Peer problems, and Prosocial behavior scales differed across groups.
For each of them, the group mean difference was associated with a small to moderate effect size correlation (r), with 0 not included in the 95% confidence interval (CI). Specifically, girls had higher scores than boys for emotional symptoms, M = .18, r (95% CI) = .15 [.07, .23], and prosocial behavior, M = .77, r (95% CI) = .18 [.10, .26], and lower scores than boys for hyperactivity, M = −.17, r (95% CI) = −.10 [−.19, −.02], and peer problems, M = −.11, r (95% CI) = −.13 [−.21, −.04]. This set of latent mean differences across gender is congruent with the observed mean differences across gender consistently shown in France (Capron et al., 2007) as well as in the international literature on the SDQ (e.g., Rønning et al., 2004, in Norway). More importantly, because these mean differences across gender were found at the latent level, they can be regarded as “real” differences rather than the result of measurement non-invariance of the SDQ-R. As such, this study represents an important contribution to the research on emotional and behavioral problems in boys and girls at school.
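The text does not state how the 95% CIs around the effect size correlations were obtained. One standard option, assumed here purely for illustration, is the Fisher z transformation with standard error 1 / sqrt(N − 3); with N = 525 it gives intervals close to those reported for the two largest positive effects.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Confidence interval for a correlation via the Fisher z transformation.

    This is an assumed method, not necessarily the one used in the study.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                      # Fisher z of the correlation
    standard_error = 1.0 / math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = math.tanh(z - critical * standard_error)
    upper = math.tanh(z + critical * standard_error)
    return lower, upper


# Effect size correlations reported for emotional symptoms and
# prosocial behavior, with the full sample size of 525.
for label, r in [("emotional symptoms", 0.15), ("prosocial behavior", 0.18)]:
    lower, upper = correlation_ci(r, 525)
    print(f"{label}: r = {r:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the reported correlations are rounded to two decimals, small discrepancies in the last digit of an interval bound are to be expected when recomputing from the rounded values.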
Table 2. Fit Indices and Invariance Across Gender for the Multi-Group CFAs of the SDQ-R.
Note. Each WLSMV χ2 DIFFTEST compares two consecutive models (unconstrained vs. constrained). CFI decrease = CFI(unconstrained model) − CFI(constrained model). CFAs = confirmatory factor analyses; SDQ-R = Strengths and Difficulties Questionnaire-Reduced; WLSMV = weighted least squares mean and variance adjusted; CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation; WRMR = weighted root mean square residual.
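The CFI decrease defined in the note above can be turned into a simple decision rule, sketched below. The CFI values and the pairing of cutoffs with steps are hypothetical placeholders, since the body of Table 2 is not reproduced here; only the definition of the decrease and the direction of the comparison follow the text.

```python
def cfi_decrease(cfi_unconstrained, cfi_constrained):
    """CFI decrease = CFI(unconstrained model) - CFI(constrained model)."""
    return cfi_unconstrained - cfi_constrained


def supports_invariance(cfi_unconstrained, cfi_constrained, cutoff):
    """True when the CFI decrease stays below the chosen cutoff.

    An increase in CFI yields a negative decrease, which is below any
    positive cutoff and is therefore read as supporting invariance.
    """
    return cfi_decrease(cfi_unconstrained, cfi_constrained) < cutoff


# Hypothetical sequence of nested models:
# (step, CFI of less constrained model, CFI of more constrained model, cutoff).
hypothetical_steps = [
    ("weak (loadings)", 0.930, 0.940, 0.009),
    ("strong (thresholds)", 0.940, 0.932, 0.0082),
    ("factor means", 0.950, 0.941, 0.006),
]

for step, cfi_free, cfi_fixed, cutoff in hypothetical_steps:
    decrease = cfi_decrease(cfi_free, cfi_fixed)
    verdict = "supported" if supports_invariance(cfi_free, cfi_fixed, cutoff) else "not supported"
    print(f"{step}: CFI decrease = {decrease:+.3f} (cutoff {cutoff}) -> {verdict}")
```

In the study this criterion was read alongside the DIFFTEST rather than in place of it, so a rule of this kind covers only one of the two pieces of evidence considered at each step.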
This study has, however, some limitations. First, further aspects of validation need to be examined before the SDQ-R is used as a research tool for assessing child mental health problems. In particular, future studies should examine the construct validity of the SDQ-R more thoroughly. This could be done through multitrait-multimethod matrix (MTMM) analyses, a method for assessing the convergent and discriminant validity of the SDQ-R (A. Goodman et al., 2010). Second, only French-speaking parent ratings were used. To allow for widespread use of the SDQ-R, future research should replicate the present findings using self-report and teacher ratings from countries where languages other than French are spoken. Third, although the sample size was adequate for performing our analyses, future studies could include larger samples. This would allow for cross-validation as well as additional investigations such as tests of multi-group invariance and of mean differences across age. Such analyses were beyond the scope of the present study but would be both an interesting and highly relevant focus for future research.
Despite these limitations, this study addressed some key concerns of the literature on the SDQ by providing evidence of the meaningfulness of an alternative model to the current SDQ, and ultimately proposing a new, reduced, and psychometrically sound version of the SDQ named SDQ-R. As such, the SDQ-R has several attractive features that make it a promising instrument for quickly screening emotional and behavioral problems in children and adolescents.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
