Abstract
Keywords
Introduction
The Importance of Early Diagnosis
Education research indicates that early identification of emotional and/or behavioral problems can help to minimize the long-term harm of mental health disorders and reduce the overall health care burden and costs (Aos, Lieb, Mayfield, Miller, & Pennucci, 2004). Detection efforts are particularly critical during the early educational years, when students are most amenable to change in behavioral, social, and academic arenas, and before students at risk of emotional and behavioral disorders (EBD) and children with autism spectrum disorder (ASD) experience negative outcomes within and beyond the school setting (Landrum, Tankersley, & Kauffman, 2003; Lane, 2003; Volkmar, Lord, Bailey, Schultz, & Klin, 2004; Wagner, Kutash, Duchnowski, Epstein, & Sumi, 2005). Flanagan, Bierman, and Kam (2003) examined the prevalence of child problem profiles at school entry based on behavior problems (e.g., aggression, hyperactivity–inattention, and prosocial skill deficits) and investigated the predictive utility of screening. Their findings showed that educators who observe different aspects of children’s behavior during their lessons were able to identify young children at high risk of school adjustment problems related to attention, conduct, learning, and mood with a high degree of accuracy. The results also indicated that aggressive, hyperactive–inattentive, and low prosocial behaviors were interdependent, yet each made a unique contribution to the prediction of later school difficulties in the behavioral, academic, and social adjustment domains.
Given the costs of EBD to students themselves, their families, and society as a whole, it is not surprising that reducing the incidence of EBD through systematic screening and comprehensive intervention efforts is a growing area of interest in educational research (Kauffman & Landrum, 2009; Lane, 2007; Nelson, Babyak, Gonzalez, & Benner, 2003).
The Role of Physical Educators in the Assessment Procedure
Results of observational studies suggest that explicit behavioral symptoms can be systematically observed during standardized play procedures (Mol Lous, Wit, De Bruyn, & Riksen-Walraven, 2002). Physical education (PE) lessons and group play situations provide a unique opportunity to observe a child moving, interacting with his or her peers, cooperating, or just being on his or her own. PE teachers spend a lot of time with the children, have the flexibility to work with them, and can observe their behaviors in several ways (e.g., structured lessons or free play situations) and in several settings (inside or outside the classroom, on the playground, or in the schoolyard). Given that evidence of externalizing and/or internalizing symptoms can be obtained in multiple active situations, and that a number of behavioral symptoms can be observed during PE classes and team games (Kashani, Allan, Beck, Bledsoe, & Reid, 1997), PE teachers have the skills and the opportunity to distinguish between maladaptive and typical age-related motor behaviors among their students.
PE teachers have the knowledge and the skills to focus on the “warning signs” of atypical motor behaviors, which provide useful information about the development of school-age children. However, only a few instruments use physical educators as the main source of information about children’s development, and most of them focus on movement and motor coordination problems, such as the Bruininks–Oseretsky Test of Motor Proficiency (BOT-2; Bruininks & Bruininks, 2005), the Test of Gross Motor Development (TGMD; Ulrich, 2000), and the Movement Assessment Battery for Children (MABC-II; Henderson & Sugden, 2007). These tests assess gross and fine motor skills, balance, and levels of motor skill development, and they are used as part of psychological test batteries to inform decisions about educational placement and to develop and evaluate intervention programs. In addition, none of the existing instruments for physical educators assesses a wide array of children’s problematic behaviors: most focus on a single problem closely connected with sport performance (e.g., anxiety), like the Physical Education State Anxiety Scale (PESAS; Barkoukis, Tsorbatzoudis, & Grouios, 2008), or on class management in school settings, like the Physical Education Classroom Instrument (PECI; Kulinna, Cothran, & Regualos, 2003).
The Current Study
Considering the need for an instrument that is practical for wide-scale school use by physical educators, assesses a wide array of children’s behaviors, and has supportive psychometric evidence, this study examined the psychometric properties of the Motor Behavior Checklist (MBC) in a sample of elementary school-age children, namely its factorial validity, internal consistency, test–retest reliability, and interrater reliability.
Method
The Preliminary List
The preliminary list included PE teachers’ reports about children’s problematic behaviors in school settings, as well as observable motor-related behaviors drawn from the official psychiatric criteria for childhood psychopathology (Diagnostic and Statistical Manual of Mental Disorders [DSM-IV-TR]; American Psychiatric Association, 2000; International Statistical Classification of Diseases and Related Health Problems–Tenth Revision [ICD-10]; World Health Organization, 1992). During instrument development, every attempt was made to be sensitive to the varied contexts and children involved in PE classes, and some of the items represented behaviors unique to PE settings. The preliminary version of the MBC for children contained 85 items, each rated on a 5-point scale ranging from 0 (never) to 4 (almost always), so that raters could indicate how frequently the behavior was exhibited.
Participants
The data analyzed were collected from a randomly selected sample (N = 841) of elementary school-age children. School review board approval was obtained, as well as appropriate consent/assent from participants and their parents. The data were collected anonymously, and only codes for participants’ demographic characteristics were used. The overall sample consisted of 421 (50.1%) girls and 420 (49.9%) boys, ranging in age from 6 to 11 years (M = 8.4 years, SD = 1.7 years); 99% were of Greek nationality. The data came from 35 typical Greek elementary schools spread across Greece, selected so that the sample distribution would be representative of the urban and rural population. The schools were located in urban areas (63.3%) and in rural areas and islands (36.7%). The PE teachers (n = 62) of the participating schools were 35 females (56%) and 27 males (44%), with a mean age of 39.4 years (SD = 6.2 years) and mean teaching experience of 7.2 years (SD = 3.4 years). The PE teachers were asked to randomly select four children (2 boys and 2 girls) from each class and rate them using the 85-item preliminary version of the MBC for children. The initial data were randomly divided into two samples: Sample 1 (n = 426) was used to examine the structural validity of the list, and Sample 2 (n = 415) was used to cross-validate the results and further assess model fit. The characteristics of the children in the confirmatory factor analysis (CFA) and reliability samples are presented in Table 1.
Table 1. Characteristics of the Participating Children per Study.
Note. CFA = confirmatory factor analysis.
Standard deviations are in parentheses.
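The paper does not state how the random division into two samples was carried out. A minimal sketch, assuming a seeded pseudo-random shuffle of case indices, shows one reproducible way to divide the 841 cases into a 426-case calibration sample and a 415-case cross-validation sample; the function name `split_half` and the seed value are hypothetical.

```python
import random


def split_half(n_total: int, n_first: int, seed: int = 2012) -> tuple[list[int], list[int]]:
    """Randomly assign case indices 0..n_total-1 to two disjoint subsamples.

    The first subsample receives n_first cases (calibration sample);
    the remaining cases form the second (cross-validation) subsample.
    """
    if not 0 < n_first < n_total:
        raise ValueError("n_first must be between 1 and n_total - 1")
    indices = list(range(n_total))
    rng = random.Random(seed)  # hypothetical fixed seed so the split can be reproduced
    rng.shuffle(indices)
    return sorted(indices[:n_first]), sorted(indices[n_first:])


sample1, sample2 = split_half(841, 426)
print(len(sample1), len(sample2))  # 426 415
```

Fixing the seed makes the split auditable: rerunning the script reproduces exactly the same two subsamples.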
Data Analysis
The aims of the study were twofold: (a) to investigate the factorial structure of the list via confirmatory analyses and (b) to examine its reliability. The investigation of structural validity was conducted in three methodological steps: (a) initial examination of the factor structure of the list, (b) selection of items based on specific criteria, and (c) investigation of the adequacy of model fit using cross-validation. Regarding reliability, the internal consistency, reproducibility, and interrater agreement of the list were examined.
Factor structure
The initial data were randomly divided into two samples. Data from Sample 1 (n = 426) were used to run a CFA using maximum-likelihood estimation (LISREL 8; Jöreskog & Sörbom, 1993). Modifications to the hypothesized factor model were made based on the correlations among factors.
Selection of the items
Taking into consideration the selection criteria proposed by Marsh, Ellis, Parada, Richards, and Heubeck (2005), the standardized loadings, the modification indices, and the item–total correlations were examined to produce a more concise instrument within a parsimonious model. A confirmatory analysis was performed to examine a second-order factor model with seven factors using the 85 items of the list. If the hypothesized model showed inadequate fit, the CFA was modified based on item-level analyses. These modifications were based on high factor loadings, correlated uniquenesses within each factor, and intercorrelations between items within each factor. More specifically, the items chosen were those that (a) best measured each factor, having high standardized factor loadings (≥ .50), and (b) had minimal cross-loadings on other factors, as assessed via modification indices. Care was taken not to reduce the number of subscales, or the number of items within each subscale, so drastically that construct underrepresentation would mask the intended purpose and validity of the measure (Messick, 1995). The modifications consisted of freeing cross-loadings and correlated uniquenesses within each factor until a reasonable fit was obtained.
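The two retention criteria above (a standardized loading of at least .50 and minimal cross-loading as judged by modification indices) can be expressed as a simple filter. This sketch assumes the loadings and modification indices have already been exported from the CFA software; the `ItemStats` class, the item names and values, and the modification-index cutoff of 3.84 (the chi-square critical value for 1 df at α = .05) are illustrative assumptions, not values reported in the study.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    name: str
    loading: float       # standardized loading on the item's intended factor
    max_cross_mi: float  # largest modification index for a loading on any other factor


def retain_items(items: list[ItemStats],
                 min_loading: float = 0.50,
                 max_mi: float = 3.84) -> list[str]:
    """Keep items that load strongly on their own factor and show no
    substantial cross-loading. max_mi is a hypothetical cutoff."""
    return [it.name for it in items
            if it.loading >= min_loading and it.max_cross_mi < max_mi]


candidates = [
    ItemStats("item_a", 0.72, 1.2),   # strong loading, no cross-loading: kept
    ItemStats("item_b", 0.41, 0.8),   # loading below .50: dropped
    ItemStats("item_c", 0.66, 12.5),  # large cross-loading modification index: dropped
]
print(retain_items(candidates))  # ['item_a']
```

In practice such a filter only flags candidates; as the text notes, the final decision also weighs content coverage so that no construct is underrepresented.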
Assessing model fit
To confirm the adequacy of the model derived from the CFAs, we performed a cross-validation procedure. Data from Sample 2 (n = 415) were used to run a second-order confirmatory factor analysis using maximum-likelihood estimation (LISREL 8; Jöreskog & Sörbom, 1993). Because the chi-square statistic frequently rejects adequately fitting models when large samples are analyzed (Floyd & Widaman, 1995), the comparative fit index (CFI), the normed fit index (NFI), the goodness-of-fit index (GFI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR) were used to evaluate model fit.
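Of the indices listed above, the RMSEA is the one that explicitly adjusts the chi-square for model degrees of freedom and sample size. A minimal sketch of the commonly used point-estimate formula follows; software packages differ in details (some divide by N rather than N − 1), so this is not necessarily the exact computation LISREL 8 performed, and the example inputs are hypothetical.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of the root mean square error of approximation.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))): the chi-square in excess
    of its degrees of freedom, expressed per degree of freedom and divided
    by n - 1 to remove the dependence of chi-square on sample size.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


print(round(rmsea(chi_square=200.0, df=100, n=101), 3))  # 0.1
```

When the chi-square does not exceed its degrees of freedom, the estimate is truncated at zero.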
Internal consistency
Internal consistency was estimated to measure the extent to which the items in a subscale are intercorrelated (homogeneous) and thus measure the same concept. Cronbach’s alpha coefficients were calculated for the subscales using the data from the full sample (N = 841). Alpha values were estimated separately for the Externalizing and Internalizing scales and for the seven problem subscales.
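Cronbach’s alpha compares the sum of the individual item variances with the variance of the total subscale score: the more the items covary, the larger the total-score variance relative to the item variances, and the closer alpha is to 1. A minimal sketch in plain Python, assuming one subscale’s ratings are stored as a list of respondents, each a list of item scores (this layout and the example data are hypothetical):

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for one subscale.

    responses[i][j] is respondent i's rating on item j (e.g., 0 to 4).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Two hypothetical items rated for three children
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2]]), 3))  # 0.667
```

Sample variances are used throughout; because the same denominator appears in the numerator and denominator of the ratio, population variances would give the same alpha.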
Reproducibility
To determine test–retest reliability, intraclass correlation coefficients (ICCs) were calculated using a sample of 129 elementary school children, 61 girls (47%) and 68 boys (53%), who were rated twice by their physical educators at school. The children had a mean age of 8.51 years (SD = 1.75); 111 (86%) were of Greek nationality, and they attended nine typical elementary schools in Athens and Thessaloniki. The participating physical educators (7 females and 4 males) used the MBC for children to record their students’ motor-related behaviors during PE lessons in the school environment twice within 1 month.
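The text does not specify which ICC model was used. As an illustration, the sketch below assumes ICC(2,1) in Shrout and Fleiss’s notation (two-way random effects, absolute agreement, single measures), computed from two-way ANOVA mean squares. The same computation applies to the interrater design, with the two physical educators as the columns. A consistency-type ICC(3,1) would give different values, and the example ratings are hypothetical.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    ratings[i][j] is the score of child i on occasion (or from rater) j.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two children and two occasions")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Hypothetical subscale scores for three children rated twice; every retest
# score is one point higher, which an absolute-agreement ICC penalizes.
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667
```

An absolute-agreement form treats a uniform shift between occasions as disagreement, which is why the example falls below 1 even though the children keep the same rank order.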
Interrater reliability
Data from 22 physical educators (14 females and 8 males) from 11 elementary schools in Athens and Thessaloniki, who rated 126 of their students using the MBC, were used to assess the interrater reliability of the checklist. The participating children were 67 boys (53%) and 59 girls (47%), with a mean age of 8.51 years (SD = 1.75); 118 (92%) were of Greek nationality. In each school, two physical educators who worked independently with the same students, but in different settings, were involved. One was a traditional PE teacher working only in movement situations involving sports; the other worked in both settings: inside the classroom, giving theoretical information about sports regulations and nutrition, and outside the classroom, in team games.
Results
Factor Structure
Examination of the nine-factor model revealed high correlations between the Disobedience and Aggressiveness factors (r = .98) and between the Hyperactivity and Impulsivity factors (r = .95). These factor pairs were judged to be so similar in content that they described and assessed aspects of the same construct, and each pair was therefore merged into a single factor. The Disobedience (8 items) and Aggressiveness (7 items) factors were merged into one factor containing all 15 of their items, named “Rules Breaking” because most of the items described aggressive behaviors mainly connected with disobedience and violation of rules in the school environment. The Hyperactivity (6 items) and Impulsivity (9 items) factors were likewise merged into one 15-item factor, named “Hyperactivity/Impulsivity,” containing items describing hyperactive and impulsive behaviors. The other five factors were Low Energy (n = 4 items), with items describing decreased activity; Stereotyped Behaviors (n = 6 items), with items describing repeated patterns of activity; Lack of Attention (n = 10 items), with items describing attention problems and lack of concentration; Lack of Social Interaction (n = 16 items), with items describing problems in communication and social interaction with teachers and peers; and Lack of Self-Regulation (n = 19 items), containing mainly items describing anxiety and the child’s inability to regulate behavior.
Selection of the Items
A second-order CFA indicated two higher-order factors, Externalizing and Internalizing, subsuming the seven problem subscales. The fit indices for the initial model were as follows: RMSEA = .092, CFI = .87, NFI = .90, GFI = .89, and SRMR = .068. Because these indices indicated less than adequate fit, the following modifications were made to improve model fit and reduce the number of items, based on the criteria proposed by Marsh et al. (2005): (a) items with low factor loadings (≤ .50) were deleted, and (b) items with cross-loadings were excluded from the initial list. In this way, a new, shorter list of 59 items was developed. The reduced model consisted of the factors Rules Breaking (7 items), Hyperactivity/Impulsivity (14 items), Lack of Attention (10 items), Low Energy (4 items), Stereotyped Behavior (2 items), Lack of Social Interaction (10 items), and Lack of Self-Regulation (12 items). The reduced model showed a good fit to the data (RMSEA = .074, CFI = .97, NFI = .96, GFI = .93, and SRMR = .55) and was characterized by significant and substantial loadings (ranging from .57 to .81). The second-order CFA model for the MBC is presented in Figure 1.

Figure 1. Second-order CFA of the MBC.
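A small bookkeeping sketch can confirm that the reduced item counts add up: the three Externalizing scales total 31 items and the four Internalizing scales total 28, for 59 items overall. The dictionary layout and the name `MBC_STRUCTURE` are hypothetical conveniences; only the counts come from the text.

```python
# Item counts per problem scale in the reduced 59-item MBC, grouped by
# second-order domain (counts as reported in the item-selection results).
MBC_STRUCTURE: dict[str, dict[str, int]] = {
    "Externalizing": {
        "Rules Breaking": 7,
        "Hyperactivity/Impulsivity": 14,
        "Lack of Attention": 10,
    },
    "Internalizing": {
        "Low Energy": 4,
        "Stereotyped Behavior": 2,
        "Lack of Social Interaction": 10,
        "Lack of Self-Regulation": 12,
    },
}

# Total number of items per broadband domain
domain_items = {domain: sum(scales.values()) for domain, scales in MBC_STRUCTURE.items()}
print(domain_items)                # {'Externalizing': 31, 'Internalizing': 28}
print(sum(domain_items.values()))  # 59
```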
The correlation between the two higher-order factors, Externalizing and Internalizing problems, was r = .32. The estimated correlations between the Externalizing factor and the Rules Breaking, Hyperactivity/Impulsivity, and Lack of Attention factors were r = .95, r = .98, and r = .85, respectively. The correlations between the Internalizing domain and its four factors (Low Energy, Stereotyped Behavior, Lack of Social Interaction, and Lack of Self-Regulation) were r = .81, r = .80, r = .95, and r = .97, respectively.
Model Fit
We performed cross-validation on the second half of the sample to confirm that the reduced model fit the data well. The CFA on the second half of the data (n = 415) further supported the adequacy of the reduced model: the CFI was .96, the NFI .95, the GFI .92, the RMSEA .072, and the SRMR .054. According to Hu and Bentler (1999), SRMR values below .08 indicate good fit; RMSEA values below .08 are also generally considered acceptable. Correlations among factors and fit indices are presented in Table 2.
Table 2. Fit Statistics and Correlations Among Scales of the MBC.
Note. MBC = Motor Behavior Checklist; RMSEA = root mean square error of approximation; CFI = comparative fit index; NFI = normed fit index; GFI = goodness-of-fit index; SRMR = standardized root mean square residual.
p < .001.
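Applying cutoffs to a set of fit indices is a mechanical check. The sketch below is illustrative only: the function name and the default cutoffs (CFI ≥ .95, RMSEA ≤ .08, SRMR ≤ .08) are assumptions chosen to match commonly cited conventions, and stricter RMSEA cutoffs (around .06) are also in use. The example inputs are the indices reported for the initial model and for the cross-validation sample.

```python
def meets_fit_criteria(cfi: float, rmsea: float, srmr: float,
                       min_cfi: float = 0.95,
                       max_rmsea: float = 0.08,
                       max_srmr: float = 0.08) -> bool:
    """Return True only if all three indices pass the given cutoffs.

    The default cutoffs are illustrative assumptions, not fixed rules;
    Hu and Bentler (1999) suggested a stricter RMSEA cutoff near .06.
    """
    return cfi >= min_cfi and rmsea <= max_rmsea and srmr <= max_srmr


# Initial model, then the reduced model in the cross-validation sample
print(meets_fit_criteria(cfi=0.87, rmsea=0.092, srmr=0.068))  # False
print(meets_fit_criteria(cfi=0.96, rmsea=0.072, srmr=0.054))  # True
```

Exposing the cutoffs as parameters keeps the choice of convention visible rather than buried in the code.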
Internal consistency
Alpha values for all subscales were high (.82 to .95), suggesting that the list was homogeneous in content. More specifically, alpha was .95 for Rules Breaking (7 items), .82 for Low Energy (4 items), .85 for Stereotyped Behavior (2 items), .95 for Hyperactivity/Impulsivity (14 items), .95 for Lack of Attention (10 items), .94 for Lack of Social Interaction (10 items), and .91 for Lack of Self-Regulation (12 items). Alpha was .93 for the Externalizing scale (31 items) and .91 for the Internalizing scale (28 items).
Reproducibility
ICCs were calculated separately for each of the seven subscales. The ICC was .87 for Rules Breaking, .78 for Low Energy, .82 for Stereotyped Behavior, .90 for Hyperactivity/Impulsivity, .89 for Lack of Attention, .85 for Lack of Social Interaction, and .83 for Lack of Self-Regulation. For the Externalizing and Internalizing scales, the ICCs were .87 and .81, respectively. All coefficients were significant at p < .001 (see Table 3).
Table 3. Reliability Coefficients.
Note. ICC = intraclass correlation coefficient.
p < .001.
Interrater reliability
All ICCs were significant at p < .001, ranging from .74 (Lack of Social Interaction) to .91 (Lack of Attention) across the seven problem scales. For the scales of the Externalizing domain, interrater agreement was ICC = .88 for Rules Breaking, ICC = .91 for Lack of Attention, and ICC = .88 for Hyperactivity/Impulsivity. Agreement was lower, but still statistically significant, for the Internalizing factors: ICC = .75 for Low Energy, ICC = .85 for Stereotyped Behaviors, ICC = .74 for Lack of Social Interaction, and ICC = .81 for Lack of Self-Regulation. The ICC was .78 for the Externalizing scale and .71 for the Internalizing scale. Internal consistency, test–retest reliability, and interrater reliability coefficients are presented in Table 3.
Summary of the Results
The aim of the study was to investigate key psychometric properties, namely the structural validity, internal consistency, temporal stability, and interrater agreement, of a new scale (the MBC) for the assessment of emotional, behavioral, and developmental disorders in elementary school-age children by PE teachers. A series of CFAs revealed a second-order model with two broadband factors (Externalizing and Internalizing) and seven problem scales. The items belonging to the two main factors and the seven problem scales are presented in Table 4.
Table 4. Motor Behavior Checklist for Children: Items per Problem Scale.
Internal consistency was high for each scale, suggesting that the list is homogeneous in content. In addition, test–retest and interrater ICCs were high (.71 to .91), indicating that the MBC for children has good temporal stability and interrater agreement when used by physical educators in school settings.
General Discussion
A New Instrument for PE Teachers
Students with EBD are a heterogeneous group of children and youth, including those with externalizing and internalizing behavior problems (Morris, Shah, & Morris, 2002; Walker, Ramsey, & Gresham, 2004). These students often have broad-based needs because of behavioral, social, and academic deficits, which often do not improve over time (Lane, 2007; Mattison, Hooper, & Glassberg, 2002; Nelson et al., 2003). Because not all students with EBD will require special education, it is very important that educators and school administrators be prepared to implement systematic screening to identify students who show early signs of EBD. As such, the first step is to implement systematic screening tools to identify students who might benefit from more focused supports (Lane, 2007).
Physical educators have the advantage of observing the child within a peer group, which allows these experts in movement situations to distinguish between maladaptive and typical age-related behaviors. This study fills an important gap in the literature, as physical educators lack a practical and reliable instrument for providing useful and valid information about children with behavioral and emotional problems on the basis of their motor-related behaviors, even though much useful information could be obtained through observation during PE lessons in school settings or in free play situations (Mol Lous et al., 2002).
The MBC for children is a new, practical measure for the assessment of externalizing and/or internalizing problems in elementary school-age children by PE teachers. As such, the instrument could provide valuable additional information about a child’s problematic behavior and help physical educators decide whether to refer students to diagnostic teams for further evaluation. Although the MBC for children is not designed to be used as a diagnostic tool in clinical settings, the data it provides could be useful as complementary information during assessment procedures. By rating a child on a number of motor-related behaviors, PE teachers can provide valuable information about the child’s global behavioral status that could help pediatricians and school psychologists during psychological evaluation, especially when psychomotor and behavioral intervention programs are proposed.
Psychometric Properties of the MBC for Children
The aim of this study was to evaluate the key psychometric properties of the new instrument: structural validity, internal consistency, reproducibility, and interrater agreement. A series of CFAs established a second-order model with two broadband domains (Externalizing and Internalizing) and seven problem scales (Rules Breaking, Hyperactivity/Impulsivity, Lack of Attention, Low Energy, Stereotyped Behaviors, Lack of Social Interaction, and Lack of Self-Regulation). Item selection and reduction per scale were based on statistical criteria (i.e., factor loadings and correlated uniquenesses), and the selected items reflected areas that are important to the target population. Both the target population (elementary school students) and the target experts (PE teachers) were involved during item selection.
Internal consistency analyses revealed high values for each problem scale, suggesting that the list is homogeneous in content. The test–retest results provide evidence of satisfactory short-term stability. The correspondence between test and retest was significant; however, the time lapse between the two administrations was relatively short (2 weeks), and future research should verify whether MBC results remain stable over longer periods. Interrater agreement was significant for all problem scales; agreement between the two physical educators was highest on the Externalizing problem scales (ICCs of .88 to .91), whereas the lowest agreement was observed on the Lack of Social Interaction and Low Energy scales. The different educational settings of the two observers could partly explain the lower agreement on these Internalizing problem scales, as social interaction and decreased activity are not easily observed in classroom settings.
An issue that deserves discussion is the incremental validity of the MBC as a new instrument in Greek culture. Because the MBC for children was developed within a theoretical and procedural framework derived from pilot studies in Greece, it is more ecologically valid and better suited to identifying culture-specific aspects of a construct (Tsaousis & Georgiadis, 2009). Furthermore, the instrument contains items derived from the reports of Greek physical educators, and, as demonstrated in this study, the MBC for children has sound psychometric properties.
Limitations and Future Research
Participants were typical Greek elementary school-age children, and findings may differ in more diverse samples, such as clinical populations from psychiatric centers. Future research is also needed to investigate how well scores on the separate problem scales of the MBC identify children with disorders. In particular, future studies could examine the sensitivity and specificity of the MBC in identifying students with (a) externalizing and/or (b) internalizing problems.
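Sensitivity and specificity, as proposed for future research above, are computed by comparing a screening decision (a scale score at or above a cut score) with an external diagnostic criterion. This study does not report cut scores for the MBC, so the cut score, scores, and diagnoses in this sketch are hypothetical, as is the function name.

```python
def sensitivity_specificity(scores: list[float], has_disorder: list[bool],
                            cutoff: float) -> tuple[float, float]:
    """Flag a child as 'at risk' when the scale score is >= cutoff and
    compare the flags against an external diagnosis.

    sensitivity = true positives / all children with the disorder
    specificity = true negatives / all children without the disorder
    """
    pairs = list(zip(scores, has_disorder))
    tp = sum(1 for s, d in pairs if s >= cutoff and d)
    fn = sum(1 for s, d in pairs if s < cutoff and d)
    tn = sum(1 for s, d in pairs if s < cutoff and not d)
    fp = sum(1 for s, d in pairs if s >= cutoff and not d)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need children both with and without the disorder")
    return tp / (tp + fn), tn / (tn + fp)


scores = [12, 8, 15, 3, 9, 2]                            # hypothetical scale scores
diagnosed = [True, True, True, False, False, False]      # hypothetical external criterion
sens, spec = sensitivity_specificity(scores, diagnosed, cutoff=10)
print(round(sens, 3), round(spec, 3))  # 0.667 1.0
```

Raising the cut score trades sensitivity for specificity; for a screening tool meant to prompt referral, a lower cut score that misses fewer children may be preferable.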
Conclusion
Given that early identification of emotional, behavioral, and/or developmental problems can help to minimize the long-term harm of mental disorders and reduce the overall health care burden and costs (Aos et al., 2004), the MBC for children could be used for various educational purposes, including research projects and intervention programs. The findings of this study are encouraging for the future use of the MBC for children in the Greek population. The psychometric results supported the model, suggesting that the MBC for children is homogeneous in content, has high temporal stability and interrater agreement, and can provide useful and reliable ratings of behavioral and emotional problems in children when used by PE teachers in school settings. In addition, recent studies (Efstratopoulou, Janssen, & Simons, 2012a, 2012b) have also established the discriminant and concurrent validity of the instrument.
In practice, the MBC for children could be used in school settings to investigate children’s problematic motor-related behavior and to evaluate the effectiveness of intervention programs aimed at reducing inappropriate behavior. The information provided by the MBC may help physical educators develop class management techniques and assess the effectiveness of their educational interventions through pre–post administration of the instrument. Moreover, and of particular importance for special education, a valid and systematic assessment of children’s deviant behaviors within elementary school settings may help PE teachers decide whether to refer children for further diagnostic evaluation.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
