Abstract
The purpose of this research was to determine whether child, parent, and teacher characteristics such as sex, socioeconomic status (SES), parental depressive symptoms, the number of years of teaching experience, number of children in the classroom, and teachers’ disciplinary self-efficacy predict deviations from maternal ratings in a multitrait-multimethod (MTMM) confirmatory factor analysis. The study included 978 families from the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development. Results indicated that teachers with more disciplinary self-efficacy, more teaching experience, and more children in their classrooms generally rated their students’ behavior in a manner more consistent with ratings completed by the students’ mothers. In addition, fathers who reported more symptoms of depression rated their children’s behavior in a manner less consistent with ratings completed by mothers. Finally, the perspectives of mothers generally deviated more from the perspectives of fathers and teachers when the child being rated was a boy.
Many research studies have examined the differences and similarities in the way parents, teachers, and other informants rate child behavior problems (Achenbach, McConaughy, & Howell, 1987; Duhig, Renk, Epstein, & Phares, 2000). Notably, in a meta-analysis of 119 studies, Achenbach et al. (1987) found that the average correlation for behavior ratings completed by different types of informants was .28, and the average correlation between self-ratings and the responses of other informants was .22. Such results highlight the unique information gained from different informants, but obtaining different results from different raters may also make assessment results difficult to interpret.
Although some of the variation in ratings among different informants may be due to the setting in which the raters interact with the child (Achenbach et al., 1987), variation in behavior ratings may also be due to rater effects. Research examining rater discrepancies suggests that mother ratings are more divergent from ratings of other informants and may be negatively biased when mothers endorse depressive (Boyle & Pickles, 1997; Fergusson, Lynskey, & Horwood, 1993) and anxious (Chilcoat & Breslau, 1997) symptoms on self-reports. In addition, rater discrepancies may be in part due to attribution bias because observers may be more likely to attribute behavior problems to the child’s disposition, whereas children may be more likely to attribute their own behavior to contextual variables (De Los Reyes & Kazdin, 2005).
Methodological Approaches to Rater Discrepancy Analysis
A common methodological approach to studying rater discrepancy is to examine correlations between different raters. Such studies provide information on the degree to which different raters agree, but analyses can also be broken down to determine whether moderators are present. Previous studies have examined the differences in correlations across a multitude of variables and found differences in correlations across age (Grills & Ollendick, 2003), sex (Kolko & Kazdin, 1993), socioeconomic status (SES; Duhig et al., 2000), and race (Youngstrom, Loeber, & Stouthamer-Loeber, 2000) among other variables. The results of such studies have been aggregated to provide researchers and clinicians with guidelines for interpreting rater discrepancies (Achenbach et al., 1987; Duhig et al., 2000; Smith, 2007). Teacher characteristics are notably missing from research on rater discrepancies (Smith, 2007); however, teachers with more years of experience and larger classroom sizes may be less biased when completing behavior ratings because they are more familiar with what constitutes average behavior. In addition, low disciplinary self-efficacy predicts depersonalization (Brouwers & Tomic, 2000), which might manifest as bias when teachers complete behavior rating scales.
In contrast to correlational studies that provide information regarding the circumstances under which raters are more or less likely to agree, Fergusson et al. (1993) examined correlates of error variances in latent variable structural equation models. They created a maternal depression latent variable using two measures of maternal depression and allowed it to correlate with a latent child behavior variable measured by rating scales completed by mothers, fathers, and teachers. The authors included a path, and in a separate analysis a correlation, from the latent maternal depression variable to the error variance of the mother-rated child behavior variable. The authors were able to study how maternal depression relates to both child behavior and error in mother-rated measures of child behavior and found that maternal depression was positively related to a propensity for mothers to over-report child behavior problems. This approach provided an advantage over studies that examine moderators of correlation coefficients because it examined correlates of error variance, or the variance that is not accounted for by the latent variable.
The analysis of correlates of error variance, although an improvement beyond correlational studies, can be improved by more completely modeling measurement error. Multitrait-multimethod (MTMM) analyses allow researchers to study method effects, or the degree to which the methodology affects variance in the measure, as well as the trait of interest (Campbell & Fiske, 1959). In an MTMM study of the Child Behavior Checklist (CBCL) among Russian youth, Grigorenko, Geiser, Slobodskaya, and Francis (2010) correlated age and sex with method factors. They found that for certain behaviors, as children age, their self-ratings increasingly deviate from their mothers’ ratings of their behavior and that girls’ self-ratings were more likely to deviate from their mothers’ ratings than boys’ self-ratings. In addition, Quilty, Oakman, and Risko (2006) found that avoidance motivation, or neuroticism, was associated with the method effect factor for negatively worded items and thus endorsement of negatively worded items on the Rosenberg Self-Esteem Scale. These approaches to the study of rater discrepancies shift the focus of inquiry from moderators of correlations between different informants to the predictors of rater-specific bias.
The Current Study
This research study will replicate and extend previous findings from correlational studies examining rater discrepancies within the MTMM framework and will focus on variables that predict deviations from a referent rater. In addition, this study will add to the existing body of literature on rater discrepancies by including teacher and classroom characteristics in the analysis to better understand how they relate to behavior rating scores. Accordingly, the purpose of this research is to answer whether maternal and paternal depressive symptoms, the child’s sex, and SES predict whether ratings completed by teachers and fathers deviate from maternal ratings, and whether teacher and classroom characteristics, such as the number of years of teaching experience, number of children in the classroom, and self-efficacy predict whether ratings completed by teachers deviate from maternal ratings.
Method
Participants
This study used data from Phase III of the National Institute of Child Health and Human Development (NICHD) Study of Early Child Care and Youth Development (SECCYD). Beginning in 1991, the NICHD-SECCYD recruited 1,364 families from 10 states across the United States. Data for Phase III were gathered between 2000 and 2004, by which time 1,061 families remained in the study. Children were included in the study if their mother, father, or step-father completed the behavior rating scales and were excluded from the study if they were missing data on all dependent variables. The total sample for this study includes 978 families. The demographic data for participants are shown in Table 1.
Table 1. Sample Demographics for Sex, Race, Hispanic Status, and Mother’s Education (n = 978).
Note. GED® = General Education Development.
Instruments
Child Behavior Checklist
Internalizing and externalizing behavior problems were measured using the Child Behavior Checklist (CBCL; Achenbach, 1991) while the child was in sixth grade. The child’s parents rated their child’s behavior on 122 items, each item on a 3-point scale (0 = not true of the child to 2 = very true of the child), and the child’s teacher rated the child’s behavior on 116 items on the same 3-point scale; however, only the 58 items that were common to both the teacher and parent forms were used in this study. The Externalizing scale is composed of two subscales that measure delinquent and aggressive behavior, and the Internalizing scale is composed of three subscales that measure withdrawal, somatization, and anxiety/depression. Evidence for this hierarchical factor structure is provided in previous research (Konold, Walthall, & Pianta, 2004). Coefficient alphas for the Internalizing and Externalizing scales were computed for all participants in the data set and ranged from .85 to .88 and from .89 to .94, respectively.
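The coefficient alphas reported for each scale can in principle be reproduced from item-level responses. The following is a minimal sketch, assuming complete responses arranged as one list of item scores per respondent; the function name and the toy responses are hypothetical and are not drawn from the NICHD-SECCYD data set.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses to four items on a 3-point scale (0, 1, 2).
toy = [
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 1, 2],
    [0, 1, 0, 0],
    [2, 1, 2, 2],
]
alpha = cronbach_alpha(toy)
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why long, homogeneous scales such as those used here tend to produce high values.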
Disruptive Behavior Disorders Rating Scale
Behavior problems associated with attention-deficit hyperactivity disorder (ADHD) were measured with the ADHD Score from the Disruptive Behavior Disorders Rating Scale (DBD; Pelham, Evans, Gnagy, & Greenslade, 1992) while the child was in sixth grade. The child’s parents and teacher rated the child on 26 questions, which were on a 4-point scale (0 = not at all, 1 = just a little, 2 = pretty much, 3 = very much), adapted from the diagnostic criteria for ADHD from the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV; American Psychiatric Association, 1994). Coefficient alphas for the ADHD score were computed for all participants in the data set and were .93 for mothers, .94 for fathers, and .95 for teachers.
Social Skills Rating System
The Social Skills Rating System (SSRS; Gresham & Elliott, 1990) is a measure of a child’s socially acceptable behaviors that enable effective interpersonal interactions. The child’s parents rated the child on 38 items measuring self-control, assertiveness, responsibility, and cooperation; however, only the 30 items measuring self-control, assertiveness, and cooperation were used in this study. The child’s teacher also rated the child on 30 items measuring self-control, assertiveness, and cooperation while the child was in sixth grade. All items were constructed with a 3-point scale (0 = never, 1 = sometimes, and 2 = very often), and again, only the 21 items that were common to both the teacher and parent forms were used in this study. For the present study, items were reverse-coded so higher scores would indicate less socially desirable behavior and would correlate positively with the CBCL and DBD. Evidence for this hierarchical factor structure is provided in previous research (Van Horn, Atkins-Burnett, Karlin, Ramey, & Snyder, 2007). Coefficient alphas for the Total Social Skills Deficits scale were computed for all participants in the data set and were .87 for mothers, .89 for fathers, and .94 for teachers.
Teacher Questionnaire
The Teacher Questionnaire (NICHD Early Child Care Research Network, 2005) was used to gather information regarding each child’s sixth-grade teacher and classroom. Data from the Teacher Questionnaire that were used in this study included teacher reports of the number of years of teaching experience and the number of children in the classroom.
Teacher Self-Efficacy Scale
The sixth-grade teacher’s disciplinary self-efficacy was measured with the Teacher Self-Efficacy Scale (NICHD Early Child Care Research Network, 2005). The scale is composed of 21 items, 3 of which are designed to measure how much teachers perceive they discipline students effectively on a 9-point scale (1 = nothing to 9 = a great deal). Coefficient alpha for the Disciplinary Self-Efficacy scale was .74 in this sample.
Center for Epidemiological Studies Depression Scale
The child’s mother’s and father’s depressive symptoms were measured with the Center for Epidemiological Studies Depression Scale (CES-D) while the child was in sixth grade. The child’s parents rated themselves on 20 items measuring behaviors and symptoms associated with depression on a 4-point scale (1 = less than once a week to 4 = 5-7 days a week). Coefficient alphas for the CES-D scores were computed for all participants in the data set and were .72 for mothers and .71 for fathers.
Demographic variables
Other demographic variables in the study included the child’s sex and the family’s income-to-needs ratio. The income-to-needs ratio is computed by dividing the family’s income by the poverty threshold (NICHD Early Child Care Research Network, 2005).
Analysis
Rater discrepancies were analyzed with a multiple indicator correlated trait–correlated method minus one (CT-C[M − 1]) model (Eid, Lischetzke, Nussbeck, & Trierweiler, 2003; Grigorenko et al., 2010). This type of MTMM model includes trait factors for each measured trait but one fewer method factor than methods used. It also requires multiple indicators for each trait–method combination. For example, internalizing behaviors rated by fathers must be measured at least twice rather than just once. In this study, each trait was summed into two scores similar to Grigorenko et al. (2010); however, the scores were summed so subscales would be equally represented in each half. Because each trait–method combination was measured by two indicators, method factors were trait specific. For example, social skills measures completed by fathers loaded onto a method factor for father-rated social skills, with that method factor distinct from other method factors for father-rated behaviors. In addition, in CT-C(M − 1) models, one rater serves as the reference method to which other raters are compared. For this study, mothers were chosen as the reference method; thus, method factors only for father and teacher ratings were included in the model for each trait. Similar to other MTMM models, each of the six ADHD scales (i.e., two mother, two father, and two teacher) loaded onto an ADHD factor, and each of the six Internalizing, Externalizing, and Social Skills Deficits scales loaded onto Internalizing, Externalizing, and Social Skills Deficits factors, respectively. Half of the analyzed model is shown in Figure 1 because the full model, with all method and reference factors, was too large to include. After the CT-C(M − 1) model was analyzed, predictor variables were added to the model to determine the degree to which they predicted method effects. 
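The construction of the two half-scale indicators described above can be illustrated with a short sketch. It assumes one simple balancing rule, alternating items within each subscale, which is only one of several ways to represent subscales equally; the item labels and function names are hypothetical, and the rule actually used in this study is not specified beyond the description above.

```python
def split_into_halves(items_by_subscale):
    """Assign items to two halves, alternating within each subscale."""
    half_a, half_b = [], []
    for items in items_by_subscale.values():
        for position, item in enumerate(items):
            # Alternating within a subscale keeps that subscale's share of
            # the two halves as equal as its item count allows; a subscale
            # with an odd item count gives its extra item to the first half.
            (half_a if position % 2 == 0 else half_b).append(item)
    return half_a, half_b


def half_scores(responses, half_a, half_b):
    """Sum one respondent's item responses within each half."""
    return (sum(responses[i] for i in half_a),
            sum(responses[i] for i in half_b))


# Hypothetical item labels for two Externalizing subscales.
subscales = {
    "delinquent": ["d1", "d2", "d3", "d4"],
    "aggressive": ["a1", "a2", "a3", "a4", "a5", "a6"],
}
half_a, half_b = split_into_halves(subscales)
all_ones = {item: 1 for items in subscales.values() for item in items}
scores = half_scores(all_ones, half_a, half_b)
```

Each half then serves as one of the two indicators for a given trait–rater combination.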
Paths from the maternal and paternal depressive symptoms, income-to-needs ratio, and child sex variables to each of the method and trait factors were specified. Paths from the teacher-related variables to each of the trait and teacher-related method factors were also specified.
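The structure described above can be summarized in equation form. The following is a sketch under simplifying assumptions rather than the exact specification estimated in this study: the symbols are introduced here for illustration, intercepts are omitted, and a single trait factor per construct is assumed. Here i indexes the two half-scale indicators, j the trait, and k the rater; T_j is the trait factor, defined by the mother as reference method; M_jk is the trait-specific method factor for fathers or teachers; and X_p denotes the predictor variables.

```latex
\begin{align}
Y_{ij,\text{mother}} &= \lambda^{T}_{ij,\text{mother}}\, T_{j} + E_{ij,\text{mother}} \\
Y_{ijk} &= \lambda^{T}_{ijk}\, T_{j} + \lambda^{M}_{ijk}\, M_{jk} + E_{ijk},
  \qquad k \in \{\text{father}, \text{teacher}\} \\
T_{j} &= \sum_{p} \beta_{jp}\, X_{p} + \zeta^{T}_{j} \\
M_{jk} &= \sum_{p} \gamma_{jkp}\, X_{p} + \zeta^{M}_{jk}
\end{align}
```

Under this notation, a nonzero path coefficient on a method factor indicates that the predictor is related to the part of a father's or teacher's rating that is not shared with the mother's rating.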

Figure 1. CT-C(M − 1) model of social skills scores from the SSRS and internalizing from the CBCL as rated by teachers, mothers, and fathers.
Several fit indices were used to determine how well the proposed models fit the data: the Comparative Fit Index (CFI), Tucker–Lewis Index (TLI), Standardized Root Mean Square Residual (SRMR), and Root Mean Square Error of Approximation (RMSEA). A CFI and TLI greater than .95, an SRMR less than .08, and an RMSEA less than .05 suggested good model fit, whereas CFI and TLI values between .90 and .95 suggested adequate model fit (Hu & Bentler, 1998, 1999). Mplus Version 6.12 (Muthén & Muthén, 2010) was used for all analyses. Missing data were handled with maximum likelihood procedures.
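The cutoffs above can be written as a small decision rule. This is a minimal sketch with a hypothetical function name. It uses only the thresholds stated in the text, so SRMR and RMSEA values that miss the "good" cutoff are labeled "above cutoff" rather than assigned an adequacy range that the text does not define. The example values are those reported for the baseline model in the Results.

```python
def classify_fit(cfi, tli, srmr, rmsea):
    """Label each fit index using the cutoffs stated in the text."""
    def incremental(value):
        # CFI and TLI: above .95 is good, .90 to .95 is adequate.
        if value > 0.95:
            return "good"
        if value >= 0.90:
            return "adequate"
        return "below cutoff"

    return {
        "CFI": incremental(cfi),
        "TLI": incremental(tli),
        # Only a "good" cutoff is stated for SRMR and RMSEA.
        "SRMR": "good" if srmr < 0.08 else "above cutoff",
        "RMSEA": "good" if rmsea < 0.05 else "above cutoff",
    }


baseline = classify_fit(cfi=0.969, tli=0.957, srmr=0.053, rmsea=0.052)
```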
Results
The CT-C(M − 1) model generally fit the data well, with fit indices indicating adequate (RMSEA = .052, 90% CI [.048, .056]) to good fit (CFI = .969, TLI = .957, SRMR = .053). Correlations among traits (ADHD, Internalizing, Externalizing, and Social Skills Deficits) were all significant (p < .001) and ranged from .400 to .687. Correlations between method effects for fathers were also significant (p < .001) and ranged from .441 to .818, indicating that fathers who overestimate behavior problems compared with maternal ratings in one domain tend to overestimate behavior problems compared with maternal ratings in other domains as well. The same pattern of correlations held for teachers, with correlations ranging from .492 to .792. Correlations between father and teacher method effects for the same trait indicated that fathers and teachers share similar views that differ from those of mothers with regard to social skills deficits (r = .156, p = .002), externalizing behaviors (r = .251, p < .001), and ADHD symptoms (r = .251, p < .001). Consistency coefficients, or the proportion of variance in the indicators due to maternal ratings (Eid et al., 2003), were higher for father ratings (.210–.411) than for teacher ratings (.075–.127), indicating more convergence between ratings completed by fathers and mothers than between ratings completed by teachers and mothers. Method-specificity coefficients, or the proportion of variance in the indicators that is rater-specific (Eid et al., 2003), were generally higher for teachers (.512–.834) than for fathers (.457–.601), indicating that more of the variance in teacher ratings than in father ratings is rater-specific. Trait and method standardized factor loadings as well as consistency and method-specificity coefficients are reported in Table 2.
Table 2. Standardized Factor Loadings for Trait and Method Factor and Consistency, Method-Specificity, and Reliability Coefficients.
Note. All trait, method, and residual factor loadings were significant (p < .001). Consistency coefficients are equal to the squared trait factor loadings, method-specificity coefficients are equal to the squared method factor loadings, and reliability coefficients are equal to the sum of consistency and method-specificity coefficients. SSRS = Social Skills Rating System; DBD = Disruptive Behavior Disorders Rating Scale; ADHD = attention-deficit hyperactivity disorder.
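The relations among the coefficients defined in the note above can be expressed as a short calculation. This is a minimal sketch: the function name is hypothetical, and the loadings in the example are illustrative values, not entries from Table 2.

```python
def variance_components(trait_loading, method_loading=0.0):
    """Decompose one standardized indicator's variance.

    For reference-method (mother) indicators there is no method factor,
    so method_loading stays at its default of 0.
    """
    consistency = trait_loading ** 2
    method_specificity = method_loading ** 2
    reliability = consistency + method_specificity
    # Small tolerance guards against floating-point rounding.
    if reliability > 1.0 + 1e-9:
        raise ValueError("squared standardized loadings cannot sum to more than 1")
    return {
        "consistency": consistency,
        "method_specificity": method_specificity,
        "reliability": reliability,
        "error": 1.0 - reliability,
    }


# Hypothetical standardized loadings for one teacher-rated indicator.
teacher = variance_components(trait_loading=0.30, method_loading=0.85)
```

With these illustrative loadings, most of the reliable variance is rater-specific, which is the general pattern the text describes for teacher ratings.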
Predictors of Mother–Teacher Discrepancies
Predictors of method and trait effects were added to the CT-C(M − 1) model. This model fit the data well (CFI = .969, TLI = .952, SRMR = .032, RMSEA = .043, 90% CI [.040, .047]), and several variables were predictive of the method factors for teachers. When the child being rated was a boy, teacher ratings deviated more from mother ratings of social skills deficits (β = −.272, p < .001), externalizing behaviors (β = −.231, p < .001), and ADHD symptoms (β = −.268, p < .001). Teacher ratings of social skills deficits also deviated more from mother ratings when mothers endorsed more symptoms of depression (β = .089, p = .020), and teacher ratings of ADHD symptoms deviated more from mother ratings when children came from families with lower income-to-needs ratios (β = −.114, p = .003).
Teacher and classroom characteristics were also predictive of method effects for teachers. Ratings completed by teachers with more disciplinary self-efficacy were more consistent with maternal ratings with regard to social skills deficits (β = −.177, p < .001) and ADHD symptoms (β = −.078, p = .021). In addition, teachers with more experience were more likely to rate a student’s behavior similarly to the student’s mother when rating social skills deficits (β = −.090, p = .010), internalizing behaviors (β = −.085, p = .023), and externalizing behaviors (β = −.081, p = .021). Finally, teachers with more children in their classrooms were more likely to rate student behavior in a more consistent manner with their students’ mothers with regard to internalizing behaviors (β = −.091, p = .015) and ADHD symptoms (β = −.112, p = .001).
Predictors of Mother–Father Discrepancies
Several variables were also predictive of the method factor for fathers. Fathers were more likely to report discrepant results compared with maternal ratings with regard to social skills deficits (β = −.147, p < .001), externalizing (β = −.154, p < .001), and ADHD-related behavior (β = −.134, p = .001) when their child was a boy. Fathers who endorsed depressive symptoms were also more likely to report discrepant results compared with maternal ratings with regard to social skills deficits (β = .270, p < .001), internalizing (β = .415, p < .001), externalizing (β = .284, p < .001), and ADHD-related behavior (β = .251, p < .001).
Predictors of Trait Effects
Paths from predictor variables to latent trait variables, which represent maternal ratings, were also analyzed. Results suggested that children whose mothers and fathers score higher on measures of depressive symptoms receive higher mother ratings for social skills deficits, internalizing behaviors, externalizing behaviors, and ADHD symptoms. In addition, children from families with a higher income-to-needs ratio demonstrate fewer social skills deficits, externalizing behaviors, and ADHD-related behaviors, as rated by mothers. Boys were more likely than girls to demonstrate social skills deficits and ADHD symptoms according to maternal ratings. Finally, teacher disciplinary self-efficacy predicted internalizing behavior ratings completed by mothers. The path coefficients from predictor variables to latent trait and method variables are reported in Tables 3 and 4.
Table 3. Standardized Coefficients for Predictor Variables on Method Variables.
Note. SSRS = Social Skills Rating System; ADHD = attention-deficit hyperactivity disorder.
*p < .05. **p < .01.
Table 4. Standardized Coefficients for Predictor Variables on Trait Variables.
Note. ADHD = attention-deficit hyperactivity disorder.
*p < .05. **p < .01.
Discussion
The purpose of this research was to determine whether parent, child, teacher, and classroom characteristics predict rater discrepancies. Results indicated that ratings completed by fathers who reported more symptoms of depression were less consistent with ratings completed by mothers. Maternal depression, however, was less predictive of rater discrepancy and only predicted social skills rating discrepancies between mothers and teachers. A family’s income relative to their needs was also less predictive of rater discrepancy than other variables. Child sex was predictive of rater discrepancy, with results indicating that the perspectives of mothers generally deviated more from the perspectives of fathers and teachers when the child being rated was a boy. The most novel findings of this study, however, concerned the effects of teacher and classroom characteristics on rater discrepancy: teachers with more disciplinary self-efficacy, more teaching experience, and more children in their classrooms generally rated their students’ behavior in a manner more consistent with ratings completed by the students’ mothers.
Contrary to expectations and “distortion-claim studies” that purport that depressed mothers are more likely to over-report depressive symptoms in children due to cognitive distortions associated with their own depression (Richters, 1992), maternal depressive symptoms were generally not predictive of rater discrepancies. Maternal depression was only predictive of discrepancies between mothers and teachers when rating social skills and, importantly, was not predictive of discrepancies between mothers and fathers or mothers and teachers when rating internalizing behaviors. Maternal depressive symptoms, however, did predict all trait factors, which represented maternal ratings due to the specification of mothers as the reference method, indicating that mothers who endorse depressive symptoms are more likely to rate their children as having more behavior problems. These ratings, however, were generally consistent with those of other raters. Thus, when behavior ratings differ between raters and the child’s mother demonstrates symptoms of depression, it should not be assumed that the mother’s ratings are discrepant due to the depressive symptoms.
A related unexpected finding was that paternal depressive symptoms predicted all discrepancies between maternal and paternal ratings. This relation between paternal depressive symptoms and rater discrepancy coupled with the general lack of a relation between maternal depressive symptoms and rater discrepancy may reflect the effect of spousal depression on husbands and wives. While increases in depressive symptoms in husbands may lead to increases in depressive symptoms in their wives over time, depressive symptoms in husbands do not appear to be significantly influenced by the depressive symptoms of their wives (Kouros & Cummings, 2010). Because depressive symptoms may influence the way a parent rates a child’s behavior (Boyle & Pickles, 1997), paternal depressive symptoms may be more influential regarding the way in which both parents rate their children than maternal depressive symptoms.
This study provided information regarding how child, parent, and teacher characteristics predict discrepancies; however, the question of whether rater discrepancies are due to rater response style or the unique context in which the rater interacts with the child is still unclear because true behavior may change depending on context (Kraemer et al., 2003). For example, a teacher with low disciplinary self-efficacy is likely to rate a child’s behavior differently from how the child’s mother would rate it. The difference in ratings might be due to the child engaging in more problematic behavior in the teacher’s classroom because of a lack of rule enforcement. Alternatively, the difference in ratings might be due to the relation between teacher self-efficacy and depersonalization (Brouwers & Tomic, 2000) and might represent rater bias. Accordingly, rater discrepancies may be due to response style or bias, the context in which the rater interacts with the child, or a combination of the two.
This study was limited to predictor variables available in the data set. Although variables such as parental depression and teacher disciplinary self-efficacy may be important predictors of method effects, other variables not included in the NICHD-SECCYD data set such as the rater’s attribution bias (De Los Reyes & Kazdin, 2005) might also contribute to measured scores. In addition, of the 978 participants in this study, 830 were White and only 56 were Hispanic; thus, the results might not be generalizable to all racial, ethnic, or cultural groups (Achenbach, 2011). Another limitation of the study, as with all non-experimental research, is that the validity of the results of this study is dependent on how well the model approximates reality. Paths, correlations, and variables could be added to or deleted from the model, which might lead to different results; accordingly, the results of the study are dependent on the assumption that the model tested provides a better approximation to reality than other possible models.
Future research regarding inter-rater differences in behavior ratings might focus on moderators of CT-C(M − 1) models, additional informants, and additional variables. Future research that uses moderators in multiple indicator CT-C(M − 1) models would show how trait-specific method effect factor loadings vary across different levels of sex, race, and other moderators. Also, future studies with similar methodologies might determine what predicts method effects that arise from self-ratings. Finally, future research might include other predictor variables, such as rater locus of control and self-esteem.
This study provides researchers and practitioners with important information regarding the ways in which parent and teacher characteristics predict scores on behavior rating scales and adds to the literature on rater discrepancy in two important ways. First, it provides information regarding teacher and classroom characteristics that predict discrepancies in ratings between teachers and mothers. Second, it uses advanced methodologies to reevaluate the degree to which parental characteristics predict rater discrepancies. Accordingly, the results of this research clarify and extend previous findings regarding inter-rater discrepancies for those interpreting behavior rating scales.
Footnotes
Acknowledgements
We are grateful to the two anonymous reviewers of an earlier version of the manuscript. Their guidance greatly improved the manuscript. Any remaining errors, of course, are our fault, not theirs.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
