ADHD and oppositional defiant disorder (ODD) are common psychiatric disorders in childhood and adolescence (worldwide pooled prevalences of 3.4%, 95% confidence interval [CI] = [2.6, 4.5] for ADHD and 3.6%, 95% CI = [2.8, 4.7] for ODD; Polanczyk, Salum, Sugaya, Caye, & Rohde, 2015). For the diagnosis of both disorders, the International Classification of Diseases (10th edition, ICD-10; World Health Organization, 1993) and the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association, 2013) require the presence of functional impairment, which may be defined as “the extent of restriction in a child’s ability to perform important daily life activities including physical, social, and personal activities due to their health condition or to specific symptoms” (Palermo et al., 2008, p. 984). In contrast to the concept of health-related quality of life, which refers to an individual’s subjective perception of the impact of a physical or mental disease on his or her life, functional impairment is thought of as relatively objective (Coghill, Danckaerts, Sonuga-Barke, Sergeant, & ADHD European Guidelines Group, 2009; Danckaerts et al., 2010). Symptoms, functional impairment, and health-related quality of life are often interrelated, but not the same (i.e., more severe symptoms are not necessarily equivalent to higher functional impairment; for example, Angold, Costello, Farmer, Burns, & Erkanli, 1999; Danckaerts et al., 2010).
Often, functional impairment rather than the presence of symptoms is a reason for seeking treatment (see Epstein & Weiss, 2012). The assessment of functional impairment might be useful for verifying the need for treatment, treatment planning, and the measurement of treatment success (Winters, Collett, & Myers, 2005). Due to their high practical relevance, measures of functional impairment should be considered in psychotherapy research in addition to measures of symptoms (e.g., Becker, Chorpita, & Daleiden, 2011).
Instruments assessing functional impairment can be divided into global, multidimensional, and domain-specific types (e.g., Winters et al., 2005). Compared with global rating scales, which provide an overall score indicating functional impairment across several domains, multidimensional scales cover different domains of functioning. They have the advantage that they include more discrete and observable variables than global scales and, as a result, are less susceptible to rater bias (Bates, Furlong, & Green, 2006), more sensitive to change over time, and of greater usefulness for prevention and treatment planning (Canino, Costello, & Angold, 1999). However, they are less cost- and time-efficient (e.g., Winters et al., 2005). Domain-specific instruments concentrate on a single domain of functioning (e.g., school performance) and are thus not able to cover variability across different domains (e.g., impairment might be visible at school, but not at home; Epstein & Weiss, 2012). Moreover, generic measures covering impairment across different disorders can be differentiated from disorder-specific scales, which assess impairment related to a specific disorder (Danckaerts et al., 2010; Epstein & Weiss, 2012). Disorder-specific instruments may be more sensitive to treatment effects but are not suitable for comparing different disorders (Danckaerts et al., 2010).
The Weiss Functional Impairment Rating Scale (WFIRS; Canadian Attention Deficit Hyperactivity Disorder Resource Alliance [CADDRA], 2011) is a multidimensional, disorder-specific instrument, which was originally designed for the assessment of impairment in children and adolescents diagnosed with ADHD. It is available in a self-report and a parent version. The WFIRS is a commonly used instrument in research and practice (e.g., Hantson et al., 2012; see Gajria et al., 2015). However, its psychometric properties have rarely been examined. Three studies (one performed by the authors of the WFIRS, one Chinese study, and one international study) together found satisfactory internal consistencies for the WFIRS–Parent Report (WFIRS-P) total scale and subscales (α > .70), test–retest reliability, and factorial validity; moderate discriminant validity with respect to symptom and quality of life measures; and moderate convergent validity with other measures of functioning (CADDRA, 2011; Gajria et al., 2015; Qian, Du, Qu, & Wang, 2011).
The current study analyzed the psychometric properties of a German adaptation of the WFIRS-P in a clinical sample of children with externalizing behavior disorders aged 4 to 12 years. We generally consider “externalizing behavior disorders” as a group of disorders including ADHD, ODD, and conduct disorder (CD). However, the sample used in this study comprised only children with ADHD and ODD.
The first aim was to examine the factor structure of the WFIRS-P by confirmatory factor analysis. In contrast to previous articles on the psychometric properties of the WFIRS-P, special emphasis was put on the examination of a bifactor model. Bifactor models can help to answer the question of whether it is useful to calculate a total score and/or subscale scores of an instrument (Olatunji, Ebesutani, & Kim, 2015; Reise, Morizot, & Hays, 2007). As a bifactor model is consistent with the hypothesized factor structure of a general construct of functional impairment (total scale) and specific group factors (subscales), which additionally reflect item variance (see Reise et al., 2007), we supposed that it would provide an adequate fit to the data. Second, the reliability of the WFIRS-P total scale and subscales was assessed. Third, the divergent validity of the WFIRS-P was examined by considering its associations with a measure of ADHD symptoms, the Symptom Checklist for Attention-Deficit/Hyperactivity Disorder (FBB-ADHS; Doepfner, Goertz-Dorten, & Lehmkuhl, 2008), and with a measure of ODD symptoms, the ODD scale of the Symptom Checklist for Oppositional Defiant Disorder and Conduct Disorder (FBB-SSV; Doepfner et al., 2008). Because symptomatology and functional impairment are related, but not equal, constructs, we expected moderate correlations between the WFIRS-P total and subscale scores on one hand, and the total and subscale scores of the symptom checklists on the other hand.
Method
Data were collected in the course of two randomized controlled trials, which examined the efficacy of a telephone-assisted self-help program conducted via written materials and telephone consultations (Dose et al., 2016; Hautmann & Doepfner, 2015). Both studies were registered at ClinicalTrials.gov (Identifier Study 1: NCT01350986; Identifier Study 2: NCT01660425) and approved by the Medical Ethical Committee of the University Hospital of Cologne. All participants provided written informed consent.
Measures
WFIRS-P
The WFIRS-P (CADDRA, 2011) is a parent-rated questionnaire, which was originally designed to assess functional impairment in children diagnosed with ADHD. For this study, we extended its use to externalizing behavior disorders in general. The original version of the WFIRS-P consists of 50 items that can be aggregated to six scales: (a) Family (10 items, for example, “having problems with brothers and sisters”), (b) Learning and School (10 items, for example, “needs extra help at school”), (c) Life Skills (10 items, for example, “problems getting ready for bed”), (d) Child’s Self-Concept (three items, for example, “my child feels bad about himself or herself”), (e) Social Activities (seven items, for example, “problems getting along with other children”), and (f) Risky Activities (10 items, for example, “breaking or damaging things”). For the German adaptation used in this study, we excluded the scale for the assessment of risky activities as, in our opinion, this scale covers symptoms of disruptive behavior disorders rather than functional impairment. The remaining 40 items were rated on a 4-point Likert-type scale ranging from 0 (never or not at all) to 3 (very often or very much), with higher scores indicating higher impairment. In the original version, parents have the option to answer “not applicable” if an item is not appropriate for their child. For the German adaptation, we adopted the following procedure: If a problem was not applicable for their specific situation, parents were requested to rate the item as “0” to avoid missing values. However, Items 17 (“receives detentions [during or after school]”) and 20 (“receives grades that are not as good as his or her ability”) could still be rated as “not applicable” because these items were not reasonably answerable for children attending preschool. Moreover, the wording of the items was changed slightly: Sentences all starting with “My child’s behavioral problems lead to . . .” were formulated to ensure that parents rated impairment caused by their child’s behavioral problems and not impairment due to other reasons.
Subscale scores and a total score can be computed by averaging the associated item scores. For clinical purposes, domains with at least two items scored “2”, one item scored “3”, or a mean score > 1.5 can be considered impaired (CADDRA, 2011).
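The scoring rule described above can be illustrated with a minimal sketch. The item-to-subscale layout (Items 1-10, 11-20, 21-30, 31-33, 34-40) is an assumption inferred from the reported item counts and the item numbers cited in the text, not a published scoring key; the function names are hypothetical, and "not applicable" answers are represented as None and simply left out of the averages.

```python
from statistics import mean

# Assumed item layout of the 40-item German adaptation (hypothetical:
# inferred from the reported item counts, not taken from a scoring key).
SUBSCALES = {
    "Family": range(1, 11),
    "Learning and School": range(11, 21),
    "Life Skills": range(21, 31),
    "Child's Self-Concept": range(31, 34),
    "Social Activities": range(34, 41),
}


def score_domain(ratings):
    """Return (mean score, impaired flag) for one domain.

    ratings: list of item ratings 0-3; None marks "not applicable".
    """
    valid = [r for r in ratings if r is not None]
    if not valid:
        return None, False
    domain_mean = mean(valid)
    # Clinical rule cited from CADDRA (2011): at least two items scored 2,
    # one item scored 3, or a mean score above 1.5.
    impaired = (
        sum(1 for r in valid if r == 2) >= 2
        or any(r == 3 for r in valid)
        or domain_mean > 1.5
    )
    return domain_mean, impaired


def score_wfirs_p(responses):
    """responses: dict mapping item number (1-40) to a rating or None."""
    subscale_results = {
        name: score_domain([responses.get(i) for i in items])
        for name, items in SUBSCALES.items()
    }
    answered = [r for r in responses.values() if r is not None]
    total_mean = mean(answered) if answered else None
    return subscale_results, total_mean


# Hypothetical respondent: two school items rated 2, Item 17 not applicable.
example = {i: 0 for i in range(1, 41)}
example[11] = 2
example[12] = 2
example[17] = None
subscales, total = score_wfirs_p(example)
print(subscales["Learning and School"], total)
```

In this hypothetical case only the Learning and School domain is flagged, because it is the only domain with two items rated 2.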
Symptom Checklists for Attention-Deficit/Hyperactivity Disorder (FBB-ADHS) and for Oppositional Defiant Disorder and Conduct Disorder (FBB-SSV), parent rating
The FBB-ADHS (German: “Fremdbeurteilungsbogen fuer Aufmerksamkeitsdefizit-/Hyperaktivitaetsstoerungen”; Doepfner et al., 2008) and the FBB-SSV (German: “Fremdbeurteilungsbogen fuer Stoerungen des Sozialverhaltens”; Doepfner et al., 2008) are symptom checklists for the assessment of ADHD symptoms or symptoms of ODD and CD, respectively, according to the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV; American Psychiatric Association [APA], 1994) and ICD-10. The FBB-ADHS consists of 20 items each covering a specific symptom behavior (e.g., “has problems in organizing tasks or activities”) that can be aggregated to two subscales (Inattention and Hyperactivity/Impulsivity) and a total score calculated by averaging the associated items. From the FBB-SSV, only the ODD scale was used in this study (nine items). The items are rated with regard to their severity on a 4-point Likert-type scale ranging from 0 (not at all) to 3 (very much); higher scores indicate higher symptom severity. All subscale scores and the FBB-ADHS total score have shown satisfactory internal consistency (α > .80) and factorial validity (Doepfner et al., 2008; Erhart, Doepfner, Ravens-Sieberer, & BELLA Study Group, 2008; Goertz-Dorten, Ise, Hautmann, Walter, & Doepfner, 2014). Also, in the sample used in this study, the ODD scale of the FBB-SSV (α = .90), the total score of the FBB-ADHS (α = .91), and the subscales of the FBB-ADHS (Inattention: α = .87, Hyperactivity/Impulsivity: α = .91) demonstrated high internal consistency.
Participants and Recruitment
Participants were recruited between May 2011 and November 2013 within two studies evaluating a telephone-assisted self-help program for parents of children with ADHD or ODD (Dose et al., 2016; Hautmann & Doepfner, 2015). Recruitment was achieved by sending written information to registered child psychiatrists and pediatricians, child guidance offices, and sociopsychiatric service centers all over Germany by mail, and by promoting the study on the Internet. In addition, some institutions were consulted personally to introduce the program. Questionnaires were sent to the families and returned by mail. Where possible, any missing data were collected by telephone. In one of the studies, the presence of functional impairment in at least one of the domains captured by the WFIRS-P was an inclusion criterion. To define functional impairment, we followed the instructions for the WFIRS-P (see Measures section). However, all available WFIRS-P pretest data were used for the current analyses (regardless of whether the respective family took part in the study afterward). We chose this procedure to increase the sample size for the analyses and to increase the variance in the item scores by also including families with low ratings.
Participants were parents of children diagnosed with ADHD and/or ODD. In most of the cases (89%), the child’s biological mother completed the questionnaire. In the first study, WFIRS-P pretest data were collected for 147 children aged 4 to 11 years (M = 7.76, SD = 1.94; 78% boys) with ADHD and/or ODD; 82 (56%) of these children met the DSM-IV criteria for both ADHD and ODD, 29 (20%) met the criteria for ADHD only, and 36 (24%) for ODD only. In this study, 104 of the children attended school, 41 preschool, and two neither preschool nor school. In this subsample, 47 children (32%) received medical treatment (40 of them received methylphenidate, two atomoxetine, two amphetamine, and three free fatty acid supplementation). In the second study, WFIRS-P pretest data were available from parents of 117 schoolchildren aged 6 to 12 years (M = 9.79, SD = 1.60; 82% boys). In this subsample, all children had been diagnosed with ADHD by a pediatrician or psychiatrist and were receiving methylphenidate medication (professionals were advised to enroll only children meeting these inclusion criteria). In this sample, data on the presence of an ODD diagnosis were not systematically collected. However, considering the parents’ ratings on the ODD scale of the FBB-SSV, 58 (50%) children in this subsample showed ODD symptoms in the clinical range (stanine value ≥ 8).
The total clinical sample comprised parents of 264 children with ADHD and/or ODD aged 4 to 12 years (M = 8.66, SD = 2.06; 80% male). Most of the children lived with both their biological parents (64%), 19% lived in single-parent families, and 11% with one parent and his or her partner. The remaining children lived with grandparents, foster parents, or adoptive parents. The mean age of parents was 38.83 years (SD = 7.19; five missing values); 96% of the participants were female. Following the International Standard Classification (Organisation for Economic Co-Operation and Development [OECD], 1999), the participants in this study showed an average of 12.29 educational years (SD = 2.83; seven missing values), which is comparable with the German “Abitur” (A-levels). In most of the families (98%), German was the language spoken at home (which was also the language of assessment).
Data Analysis
All analyses were performed using SPSS 22 or Mplus (for the confirmatory factor analyses). All analyses were performed in the complete sample (with fewer values available for some items of the Learning and School scale) and in a subsample of all children attending school from both studies so as to examine whether the results are biased by the fewer values available for children attending preschool. Analysis of missing values in the complete sample revealed no missing values for items of the scales Life Skills, Child’s Self Concept, and Social Activities and only one missing value for the second item (“causing problems between parents”) of the Family scale. Regarding the Learning and School scale, 43 parents chose the “not applicable” option for Item 17, and 46 did so for Item 20 (possibly parents of children not attending school as well as parents of children who had just started school and had not received grades up to the time of data collection). One missing value occurred for Item 19 (“misses classes or is late for school”). For the FBB-ADHS, there were no missing values; for the ODD scale of the FBB-SSV, there was only one missing value for one item. Due to their low numbers, missing values were not replaced.
Confirmatory factor analyses were performed to examine alternative factor structures of the WFIRS-P. Four different models were evaluated and compared (see Figure 1). First, a unidimensional model suggesting a single common factor of functional impairment that influences all items was examined (Model I). This model assumes that individual differences in all item scores are caused by individual differences in the common factor of functional impairment and implies that higher scores on the general factor are associated with higher scores on all items (see Brunner, Nagy, & Wilhelm, 2012). Second, a first-order correlated-factors model with five factors according to the a priori expected scale structure (Family, Learning and School, Life Skills, Child’s Self Concept, Social Activities) was considered (Model II). The model implies that each first-order factor influences a subset of items; higher scores on a first-order factor are associated with higher scores on the items that are influenced by the factor (see Brunner et al., 2012). Each item was specified to load on a certain factor, and the factors were allowed to correlate freely. For identification purposes, the first item per factor was chosen as the reference variable for the factor scale (as is the default in Mplus) and its factor loading was fixed to one. Third, a second-order model was specified with five first-order factors according to the WFIRS-P subscales and a second-order, overall impairment factor that explains the correlations between the first-order factors (Model III; see Reise, Moore, & Haviland, 2010). In contrast to the preceding models, this model suggests both a general construct of functional impairment and specific constructs according to the WFIRS-P subscales (cf. Brunner et al., 2012). The general, second-order factor is supposed to influence the first-order factors, which, in turn, influence particular item subsets. 
To identify this model, the loading of one item in each first-order factor and one loading in the second-order factor were fixed to one. Fourth, a five-factor bifactor model was evaluated (Model IV). In the bifactor model, one general factor (e.g., functional impairment) is supposed to account for the variance in the item scores. Moreover, additional group factors are specified to further explain variance in item subsets that cannot be attributed to the general factor (see Chen, West, & Sousa, 2006; Reise et al., 2010). The general factor and the group factors do not correlate and compete equally to explain variance (see Ebesutani, McLeish, Luberto, Young, & Maack, 2014), that is, all covariances among factors in this model were fixed to zero. As we theoretically assumed that both the total scale and the subscales account for variance in the item scores, we expected the bifactor model to provide the best fit to the data.

Figure 1. Possible alternative factor structures underlying the Weiss Functional Impairment Rating Scale–Parent Report, which were examined by the use of confirmatory factor analysis.
Because of the use of a 4-point Likert-type scale, item scores were considered ordered categorical data. Thus, the robust weighted least squares with mean and variance adjustment (WLSMV) estimator, which uses polychoric correlations, was used for model estimation (Brown, 2006; Muthén & Muthén, 1998-2015). For the handling of missing data (i.e., the two actual missing values and the cases in which an item was considered to be “not applicable” in the complete sample), the default procedure for WLSMV in Mplus was used (pairwise present analysis; Muthén & Muthén, 1998-2015).
To evaluate the model fit, several fit indices were considered. The chi-square test was used as the absolute fit index (Brown, 2006). Because the chi-square test is highly dependent on sample size and, as a result, models tend to be rejected in large samples although they are actually acceptable, the chi-square value was considered relative to its degrees of freedom (χ2/df; Schermelleh-Engel, Moosbrugger, & Mueller, 2003). The ratio should be as small as possible; χ2/df values of up to two indicate a “good” model fit, and values between two and three indicate an “acceptable” model fit (Schermelleh-Engel et al., 2003). The comparative fit index (CFI) and the root mean square error of approximation (RMSEA) were regarded as additional goodness of fit indices. A CFI close to .95 (Hu & Bentler, 1999) and an RMSEA < .08 (Browne & Cudeck, 1992) indicate an acceptable model fit. Following Kline (1994), factor loadings of λ > .30 were considered acceptable.
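As a minimal sketch, the fit criteria named above can be written as a small helper. The function name, the "poor" label for ratios above three, and the reporting of a CFI shortfall (instead of a hard cutoff, because "close to .95" is not a sharp threshold) are assumptions made for illustration only.

```python
def summarize_fit(chi2_df_ratio, cfi, rmsea):
    """Summarize model fit against the cutoffs cited in the text."""
    if chi2_df_ratio <= 2:
        ratio_label = "good"
    elif chi2_df_ratio <= 3:
        ratio_label = "acceptable"
    else:
        ratio_label = "poor"  # label assumed; the text defines no category here
    return {
        "chi2_df": ratio_label,
        "rmsea_acceptable": rmsea < 0.08,
        # Distance below the .95 benchmark; 0.0 means the benchmark is met.
        "cfi_shortfall": round(max(0.0, 0.95 - cfi), 3),
    }


# Values as reported for the bifactor model in the Results section.
print(summarize_fit(chi2_df_ratio=1.75, cfi=0.93, rmsea=0.05))
```

Reporting the CFI shortfall leaves the judgment of what counts as "close to .95" to the reader.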
For comparison of nested models, the chi-square difference test integral to Mplus was used (Muthén & Muthén, 1998-2015). The unidimensional and the second-order model are both nested within the less restricted first-order correlated-factors model and the bifactor model; the bifactor model is the least restricted model (see Brunner et al., 2012; Reise, 2012; Reise et al., 2010). If the result of the chi-square difference test is significant, the null hypothesis of equal model fit of the compared models is rejected and the less restricted model should be retained. In case of a non-significant result, the more restricted model does not show a significantly worse data fit than the other model and, as a result, should be favored (Schermelleh-Engel et al., 2003).
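The decision rule for the difference test can be sketched as follows. Under WLSMV, the difference between two chi-square values is not itself chi-square distributed, which is why a corrected statistic from the software is needed; the sketch below therefore only converts an already corrected Δχ2 value and its degrees of freedom into a p value. The function name is hypothetical, and the closed-form series used here applies to integer degrees of freedom only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    #         * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def prefer_less_restricted(delta_chi2, delta_df, alpha=0.01):
    """True if the less restricted model fits significantly better."""
    return chi2_sf(delta_chi2, delta_df) < alpha


# Corrected difference values as reported in the Results section.
for label, d_chi2, d_df in [("I vs. II", 569.10, 10),
                            ("III vs. II", 25.58, 5),
                            ("III vs. IV", 213.52, 35)]:
    print(label, prefer_less_restricted(d_chi2, d_df))
```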
In classic test theory, Cronbach’s alpha is often used to determine a scale’s reliability. However, alpha is not recommended for estimating the reliability of hierarchically structured constructs as in second-order or bifactor models, as alpha does not take into account that subscales might not have any substantial impact on the item scores once the influence of the general scale is controlled for (Brunner et al., 2012; Reise et al., 2007). One could, for example, find high internal consistency for a certain subscale that is actually due to a general construct underlying the items of this specific subscale as well as the items of other subscales (Reise et al., 2007). Hence, model-based reliability estimates for the WFIRS-P total score and subscales are provided using omega (ω) statistics. Omega estimates the proportion of variance in a scale score that is due to all constructs underlying that score (i.e., in a bifactor model, the proportion of variance accounted for by the total scale and the respective subscales taken together). In addition, omega hierarchical can be computed to estimate the proportion of variance accounted for only by the general factor (ωH) or by a specific factor (ωS) in a hierarchically structured model (e.g., Brunner et al., 2012; Reise, 2012; Zinbarg, Revelle, Yovel, & Li, 2005). ωH was computed to estimate the proportion of variance attributable to the general factor in all item scores and, moreover, in item subsets that belong to the subscales. Omega statistics were computed using Microsoft Excel.
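The omega statistics were computed in a spreadsheet; the following is only a minimal sketch of the same kind of calculation. It assumes standardized loadings from a bifactor model with uncorrelated factors, so that each item's residual variance equals 1 minus its squared general and specific loadings. The loadings and subscale names are invented for illustration, and the function name is hypothetical.

```python
def bifactor_omegas(loadings):
    """Compute omega, omega hierarchical, and omega subscale.

    loadings: dict mapping a subscale name to a list of
    (general_loading, specific_loading) pairs, one pair per item.
    """
    per_scale = {}
    sum_g_total = 0.0
    specific_part_total = 0.0
    residual_total = 0.0
    for name, pairs in loadings.items():
        sum_g = sum(g for g, _ in pairs)
        sum_s = sum(s for _, s in pairs)
        residual = sum(1 - g * g - s * s for g, s in pairs)
        variance = sum_g ** 2 + sum_s ** 2 + residual
        per_scale[name] = {
            "omega": (sum_g ** 2 + sum_s ** 2) / variance,
            "omega_h": sum_g ** 2 / variance,  # general factor only
            "omega_s": sum_s ** 2 / variance,  # specific factor only
        }
        sum_g_total += sum_g
        specific_part_total += sum_s ** 2
        residual_total += residual
    variance_total = sum_g_total ** 2 + specific_part_total + residual_total
    total = {
        "omega": (sum_g_total ** 2 + specific_part_total) / variance_total,
        "omega_h": sum_g_total ** 2 / variance_total,
    }
    return total, per_scale


# Hypothetical standardized loadings, not values from this study.
demo = {
    "Scale A": [(0.6, 0.4), (0.5, 0.5), (0.7, 0.3)],
    "Scale B": [(0.4, 0.6), (0.3, 0.7), (0.5, 0.5)],
}
total_omegas, scale_omegas = bifactor_omegas(demo)
print(total_omegas)
print(scale_omegas)
```

Within each subscale, ω equals the sum of ωH and ωS under these assumptions, which mirrors the partition of variance into general and specific sources described above.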
Nevertheless, Cronbach’s alpha was computed to allow for comparison with results from previous studies. According to Nunnally (1978), the internal consistency of a satisfactory test or scale needs to be at least .70. In addition, part–whole corrected item–scale correlations, also used in classic test theory, were examined with regard to the total scale and the subscales. Item–scale correlations of .30 ≤ rit ≤ .50 were considered moderate; item–scale correlations of > .50 were regarded as high (Bortz & Doering, 2006).
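For readers who want to reproduce these classical indices, a minimal sketch is given below. The function names and the rating matrix are hypothetical, and the sketch assumes complete data (respondents with "not applicable" answers would have to be excluded or handled beforehand).

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item ratings per respondent (complete cases only)."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def corrected_item_scale_r(rows, j):
    """Part-whole corrected correlation: item j vs. sum of the remaining items."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)


# Hypothetical ratings of five respondents on a three-item scale.
ratings = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
    [1, 0, 1],
]
print(cronbach_alpha(ratings))
print([corrected_item_scale_r(ratings, j) for j in range(3)])
```

Removing the item from its own scale score before correlating avoids inflating the coefficient, which matters most for short scales such as the three-item Child’s Self-Concept scale.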
To further evaluate the dimensionality of the WFIRS-P, the explained common variance (ECV) was computed as an index of unidimensionality (e.g., Reise, 2012). The ECV equals “the ratio of variance explained by the general factor divided by the variance explained by the general plus the group factors” (Reise, 2012, p. 687). High ECV values indicate that most of the explained common variance is accounted for by the general factor (Reise, 2012).
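The quoted definition translates directly into a ratio of summed squared loadings. The sketch below assumes standardized loadings from a bifactor model in which every item loads on the general factor and on exactly one group factor; the loadings are invented and the function name is hypothetical.

```python
def explained_common_variance(general_loadings, specific_loadings):
    """ECV: general-factor variance over general plus group-factor variance."""
    general = sum(l * l for l in general_loadings)
    specific = sum(l * l for l in specific_loadings)
    return general / (general + specific)


# Hypothetical loadings for four items, not values from this study.
ecv = explained_common_variance([0.6, 0.5, 0.7, 0.4], [0.4, 0.5, 0.3, 0.6])
print(round(ecv, 2))
```

An ECV near .50, as with equal general and specific loadings, means that the general factor and the group factors explain roughly equal shares of the common variance.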
To examine the divergent validity of the WFIRS-P, correlations of its total scale and subscale scores with the FBB-ADHS total score and subscale scores and with the ODD scale score of the FBB-SSV as measures of symptoms were examined using Pearson’s correlation coefficients (r). In addition, the multiple correlation (R) between the ODD scale score and the FBB-ADHS total score on one hand and the WFIRS-P total score on the other hand was considered.
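With two predictors, the multiple correlation can be obtained from the three pairwise Pearson correlations. The sketch below uses the standard two-predictor formula; the input correlations are hypothetical (the pairwise values from this study are not reproduced here), and the function name is an assumption.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors.

    r_y1, r_y2: correlations of y with predictor 1 and predictor 2
    r_12: correlation between the two predictors (must not be +/-1)
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical correlations: impairment with ADHD symptoms, impairment with
# ODD symptoms, and the two symptom scores with each other.
print(round(multiple_r(0.50, 0.45, 0.40), 2))
```

Squaring R gives the proportion of variance in the impairment score that is shared with the two symptom scores combined.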
Results
Confirmatory Factor Analysis
In the complete sample, the overall model fit of the unidimensional model (Model I) was low, indicating that this model is not suitable for the data. The first-order correlated-factors model (Model II), the second-order model (Model III), and the bifactor model (Model IV) all provided a satisfactory fit for the data (see Table 1).
Table 1. Confirmatory Factor Analyses Comparing Alternative Models of the Weiss Functional Impairment Rating Scale–Parent Report (Estimator: WLSMV).
Note. Sample size: n = 264; WLSMV = robust weighted least squares with mean and variance adjustment estimator, χ2 = empirical χ2 value, df = degrees of freedom, p = empirical significance value, CFI = comparative fit index, RMSEA = root mean square error of approximation, CI = confidence interval, Δχ2 = corrected difference between χ2 values of two competing models for difference testing.
Δχ2 test significant at the 1% level.
In the first-order correlated-factors model (Model II), all standardized factor loadings were significant and ranged from λ = .31 to λ = .94. This model showed significantly better model fit than the unidimensional model (Model I; Δχ2 = 569.10, df = 10, p < .01). The model-implied correlations between the factors estimated by Mplus were all significant and moderate (.32 ≤ r ≤ .63), suggesting that the factors share a certain amount of common variance.
The second-order model (Model III) showed a similar CFI and RMSEA to Model II (see Table 1). The standardized loadings of the first-order factors on the second-order factor were all significant and ranged from λ = .56 to λ = .79. A second-order model is not able to model the data better than a first-order solution with freely intercorrelated factors. The aim of a second-order model is to reproduce the correlations between the factors in the first-order model with fewer freely estimated parameters (Brown, 2006). Then, one must examine whether the higher-order factor leads to a significant reduction in model fit relative to the first-order solution (Brown, 2006). The results of the chi-square difference test indicated that this was the case (i.e., that the first-order model fitted the data significantly better than the second-order model; Δχ2 = 25.58, df = 5, p < .01). Brunner et al. (2012) state that first- and second-order models may further be compared by regarding “the residual correlations Δr that are computed as the difference between the model-implied correlations among the first-order constructs [in the second-order model] and the corresponding correlations in the first-order factor model” (p. 808). In our analyses, these residual correlations ranged from −.10 to .07. According to Brunner et al. (2012), the low residual correlations in combination with an overall satisfactory fit of the second-order model support the theoretical assumption of an underlying hierarchically structured construct although the chi-square difference test was significant (i.e., support the second-order model).
On a descriptive level, the bifactor model (Model IV) provided slightly better fit indices than the alternative models (χ2/df = 1.75, CFI = .93, RMSEA = .05; see Table 1). The results of the chi-square difference test indicated that the bifactor model fitted the data significantly better than the second-order model (Model III; Δχ2 = 213.52, df = 35, p < .01). With the exception of Items 17 (“receives detentions [during or after school]”) and 18 (“suspended or expelled from school/preschool”) of the Learning and School scale, all items loaded significantly on the general factor (see Table S1 of the online supplement). Standardized loadings on the general factor ranged from λ = .13 to λ = .74; the loadings of Items 1 (Family scale), 13, 17, 18, 19 (Learning and School scale), 21, 23, 25, and 30 (Life Skills scale) on the general factor were generally less than or around .30 (range = .13-.31). The low loadings of these items indicate that the general construct has no substantial impact on these items (see Chen, Hayes, Carver, Laurenceau, & Zhang, 2012). Most items showed substantial loadings on their specific group factor. However, it is worth noting that Items 5, 10 (Family scale), 21, 27, 28, 29, 30 (Life Skills scale), 35, and 37 (Social Activities scale) showed loadings of <.30 on their respective specific group factor. With the exception of Items 16 and 20, the items of the Learning and School scale loaded higher on their specific group factor than on the general factor, although loadings on the general factor were also significant and substantial with the exception of Items 17 and 18. This indicates that the specific construct of impairment at school has more impact on these items than impairment in general. A higher loading on the specific factor compared with that on the general factor could also be observed for several items of the Family scale, the Life Skills scale, and the Social Activities scale. 
For Items 21 (“excessive use of TV, computer, or video games”) and 30 (“has trouble taking medication, getting needles, or visiting the doctor/dentist”) of the Life Skills scale, loadings were low on the general as well as on the specific factor.
In the subsample of schoolchildren, similar results emerged. The first-order correlated-factors model (Model II), the second-order model (Model III), and the bifactor model (Model IV) all provided a satisfactory fit to the data, the bifactor model showing a slightly better fit than the alternative models. The factor loadings in this subsample were similar to those observed in the complete sample. Thus, we may conclude that the “missing values” for children not attending school had no substantial influence on the results.
Reliability
The first-order correlated-factors model (Model II), the second-order model (Model III), and the bifactor model (Model IV) provided an adequate fit to the data. However, bifactor models have the advantage over other approaches that they help to distinguish variance attributable to the general construct from variance attributable to the subscales (e.g., Brunner et al., 2012; Reise, 2012). Hence, reliability estimates were computed based on the bifactor model. Omega (amount of variance accounted for by the total scale and the respective subscales taken together) was .96 for the total scale and ranged from .80 to .92 for the subscales (see Table 2). Omega hierarchical for the general factor was .78 with regard to the total scale and ranged from .25 to .63 with regard to the specific scales. The omega hierarchical that displays the amount of variance in item subsets explained by the specific factors ranged from .29 to .66. It is worth noting that for the Learning and School scale, the specific factor accounted for far more variance than the general factor. For the other subscales, the general scale accounted for more variance than the subscales, but the subscales still accounted for a substantial amount of variance. The ECV value based on the bifactor model was .48, which indicates that the general factor and the specific group factors contribute to the explanation of common item variance in the sample. Results were very similar when only the subsample of schoolchildren was considered (ω = .96 for the total score and ω = .81-.93 for the subscales; ωH = .79 for the total score and ωH = .26-.62 for the subscales; ωS = .30-.66).
Table 2. Descriptive Statistics, Internal Consistencies, Part–Whole Corrected Item–Scale Correlations, Factor Loadings, and Omega Statistics Based on the Bifactor Model for the Weiss Functional Impairment Rating Scale–Parent Report Subscales and Total Scale.
Note. Overall sample size N = 264. Sample sizes for entries marked with a superscript: (a) n = 217, (b) n = 263, (c) n = 218. M = mean (items rated on a 4-point Likert-type scale ranging from 0 to 3), SD = standard deviation; α = Cronbach’s alpha (internal consistency), rit = range of part–whole corrected item–scale correlations, ω = omega (amount of variance accounted for by the total scale and the respective subscales taken together), ωH = omega hierarchical (amount of variance accounted for by the total scale), ωS = omega hierarchical subscale (amount of variance accounted for by the subscale). The different sample sizes are due to missing values.
Cronbach’s alpha was .92 for the total scale in the complete sample. Coefficients for the subscales ranged from .73 to .87 (see Table 2). Item–subscale correlations were mostly moderate to high (rit = .32–.78; see Table 2). Unsatisfactory item–subscale correlations emerged for Items 21 (“excessive use of TV, computer, or video games”; rit = .23), 29 (“needs more medical care”; rit = .27), and 30 (“has trouble taking medication, getting needles, or visiting the doctor/dentist”; rit = .25) of the Life Skills scale. Neither the deletion of a single item nor the simultaneous deletion of all three items led to an improvement in internal consistency.
When considering only the schoolchildren, Cronbach’s alpha was also .92 for the total scale and ranged from .74 to .88 for the subscales. Again, item–subscale correlations were > .30 except those for Items 21, 29, and 30 of the Life Skills scale.
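The classical indices reported above can be sketched as follows. This is a minimal illustration, assuming complete data in a respondents-by-items array; the function names and the toy ratings (on the 0 to 3 response scale) are hypothetical and do not reproduce the study data.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)


def corrected_item_total(items):
    """Part-whole corrected correlations: each item vs. the sum of the others."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])


def alpha_if_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    items = np.asarray(items, dtype=float)
    return np.array([
        cronbach_alpha(np.delete(items, j, axis=1))
        for j in range(items.shape[1])
    ])


# Hypothetical ratings: six respondents, four items.
ratings = np.array([
    [0, 1, 0, 1],
    [1, 1, 2, 0],
    [2, 2, 1, 3],
    [3, 2, 3, 2],
    [1, 0, 1, 1],
    [2, 3, 2, 2],
])
alpha = cronbach_alpha(ratings)
r_it = corrected_item_total(ratings)
alpha_drop = alpha_if_deleted(ratings)
```

An item is a candidate for removal when its corrected correlation is low and the corresponding entry of `alpha_drop` exceeds `alpha`; in the present data, no such improvement was observed for Items 21, 29, and 30.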
Correlations With the FBB-ADHS and the FBB-SSV (ODD Scale)
In the complete sample, all correlations between the WFIRS-P total and subscale scores on the one hand and the FBB-ADHS total and subscale scores and the ODD scale score on the other were significant at the 5% level and mostly low to moderate (.15 ≤ r ≤ .56; see Table 3). In the subsample of schoolchildren, correlations in the same range were found. The multiple correlation of the WFIRS-P total score with the FBB-ADHS total score and the ODD scale score was .63 (R² = .40) in both samples.
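For readers who wish to see how a multiple correlation follows from bivariate correlations, the sketch below computes R² from the correlations of each predictor with the criterion and the correlations among the predictors. The function name and the input values are hypothetical; they are not the coefficients from Table 3.

```python
import numpy as np


def multiple_r_squared(r_xy, r_xx):
    """Squared multiple correlation R^2 = r_xy' * inv(R_xx) * r_xy.

    r_xy: correlations of each predictor with the criterion.
    r_xx: correlation matrix among the predictors.
    """
    r_xy = np.asarray(r_xy, dtype=float)
    r_xx = np.asarray(r_xx, dtype=float)
    return float(r_xy @ np.linalg.solve(r_xx, r_xy))


# Hypothetical values: two symptom scores predicting an impairment score.
r_xy = [0.55, 0.50]                 # predictor-criterion correlations
r_xx = [[1.00, 0.45], [0.45, 1.00]]  # predictor intercorrelation
r_squared = multiple_r_squared(r_xy, r_xx)
multiple_r = r_squared ** 0.5
```

The reported values are consistent with this relation: a multiple correlation of .63 corresponds to R² = .63² ≈ .40, that is, about 40% shared variance between symptoms and functional impairment.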
Correlations Between the WFIRS-P and the FBB-ADHS and the FBB-SSV (ODD Scale).
Note. WFIRS-P = Weiss Functional Impairment Rating Scale–Parent Report; FBB-ADHS = Symptom Checklist for Attention-Deficit/Hyperactivity Disorder; FBB-SSV = Symptom Checklist for Oppositional Defiant and Conduct Disorder; ODD = oppositional defiant disorder; n = sample size. All correlation coefficients were significant at the 5% level (not adjusted). The different sample sizes are due to missing values.
Discussion
This study examined the psychometric properties of the German adaptation of the WFIRS-P (CADDRA, 2011) in a clinical sample of children aged 4 to 12 years with externalizing behavior disorders. This is, to our knowledge, the first article to concentrate on the psychometric properties of a German version of the WFIRS-P and, additionally, to examine a bifactor structure of this rating scale. In general, the results support the scale construction, and the reliability and divergent validity of the WFIRS-P, and are thus in line with previous studies on its psychometric properties in other languages (CADDRA, 2011; Gajria et al., 2015; Qian et al., 2011).
Although the first-order correlated-factors model (Model II), the second-order model (Model III), and the bifactor model (Model IV) all provided a satisfactory fit to the data, the confirmatory factor analyses tend to support a bifactor model, with a general factor accounting for common item variance and independent specific group factors accounting for additional variance in item subsets. For four subscales (Family, Life Skills, Child’s Self Concept, Social Activities), omega hierarchical statistics indicated that the general factor accounted for a substantial amount of variance in item scores, with additional contributions from the specific factors. In addition, omega values revealed that the total scale and the subscales taken together accounted for a substantial proportion of item variance. As expected, this supports the interpretability of the subscales as well as the general score for these scales, which is further supported by the ECV value of .48. However, most of the items of the Learning and School scale loaded more strongly on their specific factor than on the general factor, indicating that the items of this scale are more likely to reflect a specific construct (see Chen et al., 2012). This conclusion is supported by the omega hierarchical statistics, which reveal that the general factor accounts for only about a quarter of the variance in this scale’s items whereas the specific factor accounts for about two thirds of item variance. One possible reason for the differing results for the Learning and School scale compared with the other subscales may be that not all items of this scale capture aspects that are directly observable for parents, or that behavior (and thus, impairment) at school is actually different from that in other areas of life. Although Cronbach’s α and ωH of the total score remain high when the items of the Learning and School scale are excluded, we recommend that they remain in the computation of the total score. First, most of them also show substantial loadings on the general factor, and second, retaining them facilitates interpretation and comparability with results from international studies using the WFIRS-P.
In previous studies, confirmatory factor analyses showed that models with correlated factors fit the data better when the Learning and School scale was divided into two subscales, namely, Learning and Behavior (CADDRA, 2011; Gajria et al., 2015). In our sample, results for models assuming six first-order factors or six specific group factors (in a bifactor model) were similar to those for the corresponding solutions with five factors and are presented in the supplement of this article (see Table S2 of the online supplement). Hence, these analyses do not clearly favor either the five-factor or the six-factor solution.
A closer analysis of the bifactor model showed that some items contributed more to the general construct of functional impairment than to a specific subscale whereas other items behaved in the opposite way. Some items of the Family, the Life Skills, and the Social Activities scales had low, partly even non-significant loadings on their specific group factor. These results are partly in line with those reported by Gajria et al. (2015). The authors found that, for a first-order model with correlated factors, some items of the Life Skills scale had no substantial loadings on their specific group factor.
To ensure comparability with results from international studies using the WFIRS-P in other languages and as the overall fit of the bifactor model was satisfactory, we decided to keep the problematic items within the respective scales. However, for practical use of the WFIRS-P, it should be kept in mind that some items of the Life Skills scale in particular provide only limited information beyond the general construct of functional impairment. Nevertheless, they might be of great value for treatment planning if considered at the item level as they comprise very specific problem behaviors in different situations that might form a useful starting point for interventions.
Within a classical test theory framework, internal consistencies and most of the part–whole corrected item–scale or item–subscale correlations were satisfactory and provided further support for the reliability of the WFIRS-P subscales and total scale. Exceptions were Items 21, 29, and 30 of the Life Skills scale, which had already shown weak psychometric properties in the examination of the bifactor model. The fact that some subscales showed high internal consistencies but a comparatively lower ωS might point to the presence of an underlying general construct of functional impairment, which causes the item coherence. Once the influence of this factor is controlled for, the subscales explain a lower proportion of variance (Brunner et al., 2012; Reise et al., 2007).
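The pattern described above, high internal consistency combined with a low ωS, can be reproduced in a small sketch that derives both coefficients from the model-implied correlation matrix of a single subscale. The sketch assumes standardized loadings, orthogonal factors, and uncorrelated residuals; the function name and the loadings are hypothetical and chosen only to make the point visible.

```python
import numpy as np


def alpha_and_omega_s(general, specific):
    """Alpha and omega-S for one subscale from standardized bifactor loadings.

    The model-implied correlation between two items of the subscale is
    g_i * g_j + s_i * s_j; item variances are fixed to 1.
    """
    g = np.asarray(general, dtype=float)
    s = np.asarray(specific, dtype=float)
    corr = np.outer(g, g) + np.outer(s, s)
    np.fill_diagonal(corr, 1.0)

    k = len(g)
    score_var = corr.sum()  # variance of the unit-weighted subscale score
    alpha = k / (k - 1) * (1.0 - np.trace(corr) / score_var)
    omega_s = s.sum() ** 2 / score_var
    return alpha, omega_s


# Hypothetical subscale: five items dominated by the general factor.
alpha, omega_s = alpha_and_omega_s([0.7] * 5, [0.3] * 5)
```

With these illustrative loadings, alpha is about .87 while ωS is only about .14: the items cohere mainly because they share the general factor, not because of what is unique to the subscale.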
As expected, the WFIRS-P subscales showed divergent validity from ADHD and ODD symptoms. The low-to-moderate correlations between the WFIRS-P scale scores and the FBB-ADHS and ODD scale scores, as well as the moderate multiple correlation between the WFIRS-P total score on one hand and the FBB-ADHS total score and the ODD scale score on the other, indicate that although the constructs measured by these instruments have something in common, they are nevertheless not the same (i.e., the WFIRS-P provides information other than symptom information).
The WFIRS-P was originally designed to measure functional impairment in children and adolescents with ADHD. In this study, we extended its use to children with ODD. However, when we excluded the 36 children who only met the criteria for ODD from the analyses, the results were very similar to those for the original sample. The confirmatory factor analyses produced similar fit indices, internal consistencies and item–subscale correlations were similar to those in the original sample, and correlations of the WFIRS-P subscale and total scores with the FBB-ADHS subscale and total scores and with the FBB-SSV ODD scale remained low to moderate. Thus, we concluded that the extended use of the WFIRS-P in children with externalizing behavior disorders in general had no substantial impact on the results.
The study has several limitations. The criterion measure to assess validity was limited to ADHD and ODD symptomatology. Future studies should examine divergent validity from other concepts (e.g., measures of quality of life). Moreover, the convergent validity of this German adaptation with other measures of functional impairment has yet to be assessed.
Conclusion
Despite some limitations of the study and poor psychometric quality of a few items, the overall results of this study suggest that the WFIRS-P is a reliable and valid instrument for assessing functional impairment in children with externalizing behavior disorders. The adequate fit of a bifactor model comprising a general factor and five independent specific group factors corresponds with the a priori expected scale structure. Reliability estimates based on the bifactor model and the factor loadings on the general and specific factors support the interpretability of both the subscales and a general scale. Reliability indices based on the classical test theory approach (internal consistencies and part–whole corrected item–subscale correlations) are also satisfactory. Moreover, the WFIRS-P showed adequate divergent validity from measures of ADHD and ODD symptoms. Further studies should try to clarify the role of the Learning and School scale, examine convergent validity with other measures of functional impairment as well as divergent validity from measures of quality of life, and consider a wider age range.
Footnotes
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Manfred Doepfner received consulting income and research support from Lilly, Medice, Shire, Janssen Cilag, Novartis, and Vifor. He received royalties from treatment manuals, books, and psychological tests published by Guilford, Hogrefe, Enke, Beltz, and Huber.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Data for the analyses presented in this article were collected within two studies. The first study was supported by the German Research Foundation (grant number DO 620/5-1). The second study was supported by Shire Pharmaceuticals Development Ltd. (grant number IST-DEU-000199 / SPD544-603).
