Abstract
As interest in improving children’s early math skills has grown, due in part to their strong associations with later overall school achievement, questions have been raised regarding teachers’ knowledge about children’s math abilities. The current study uses hierarchical linear modeling to examine the associations between teachers’ judgments of children’s math skills, obtained with an indirect rating scale, and children’s performance on two direct assessments of their number sense and geometry and measurement skills. Approximately 40% of the variance in teachers’ rating-scale scores is attributable to systematic differences between classrooms rather than to the individual child. Overall, the association between teacher report and students’ skills is approximately r = .50, which suggests that teachers can accurately determine whether students are above or below the mean but do not rate students as far above or below the mean as direct assessments of their skills would indicate. This finding has implications for teachers, particularly in terms of teacher preparation, and for researchers, because it provides information on the accuracy of teacher rating scales of preschool students’ math skills.
As research increasingly suggests an association between early math skills and later school achievement (Duncan et al., 2007; National Association for the Education of Young Children & National Council of Teachers of Mathematics, 2002), attention has turned to factors that affect math skills in preschool (National Research Council, 2009). This study addresses the four main domains of preschool math: geometry, measurement, number sense, and operations (National Research Council, 2009). One factor hypothesized to affect children’s math scores is their teachers’ beliefs and knowledge about math, both conceptually and in terms of children’s development of mathematical skills (National Research Council, 2009). The current study focuses on the accuracy of teachers’ perceptions of children’s math skills at the beginning of the preschool year. This information can inform research by determining whether teacher-report ratings accurately align with children’s math skills, which could reduce the need for direct assessments to determine curricular effectiveness. It can also inform practice by helping teachers more fully understand their abilities and limitations in judging preschoolers’ math skills.
Types of Assessment
The need for, and accuracy of, assessments of preschool children’s skills is a controversial topic. Some advocate the use of assessments to determine whether certain programs or instructional supports are effective in increasing children’s knowledge (Meisels, 2006), whereas others maintain that assessment outcomes for preschool-aged children are not reliable and that testing puts children in an unnecessarily stressful situation (Shepard, 1994). Despite these differing views, Cabell et al. (Cabell, Justice, Zucker, & Kilday, 2009) note that educators need access to detailed information about students’ development to plan lessons and provide appropriate scaffolding for individual children. The current study examines two types of assessment: (1) direct assessments, tests of number sense and of geometry and measurement given directly to children, and (2) indirect assessments, teachers’ reports of children’s skills in the same domains. Direct assessments elicit children’s skills through demonstration; for example, a child is asked to count a set of objects and tell the assessor how many there are in total. Indirect assessments elicit information from adults, usually teachers, who have had the opportunity to observe the child over time and rate the child’s skills in a given area. Conducting direct assessments with young children can be problematic because young children may have difficulty sustaining attention and may be shy or uneasy working with strangers (Feldman et al., 2005; Vacc & Ritter, 1995). Indirect assessments, because of their typical questionnaire or checklist formats, are a convenient, low-cost way of collecting information about children’s skills in a short amount of time, and they do not require the child to sit through long periods of testing (Cabell et al., 2009).
Given the benefits of indirect assessment, it would be helpful for researchers to know whether teacher reports are valid measures of children’s skills and whether direct assessments capture any information about children’s skills that teacher reports do not.
Although the benefits of indirect assessment are numerous, previous research has indicated that indirect assessments of children’s skills contain a certain amount of variance that is attributable to the teacher rather than the student (Konold, Walthall, & Pianta, 2004; McConaughy & Ritter, 1995). An analysis of teachers’ ratings of students’ social-emotional skills indicated that 15% to 33% of the variability in teacher ratings was attributable to the teacher (Mashburn, Hamre, Downer, & Pianta, 2006). Because indirect assessments are subjective, a teacher’s ratings may reflect characteristics of the teacher rather than of the child being assessed (Mashburn & Henry, 2004). Mashburn et al. note that interpreting teacher ratings as a true reflection of children’s skills may lead to incorrect conclusions. Factors shown to be associated with teacher ratings of children’s kindergarten readiness include the teacher’s education level and the socioeconomic status of the children in the class (Mashburn & Henry, 2004). In addition, factors associated with teachers’ ratings of children’s social and emotional development in pre-k include the teacher’s years of experience and self-efficacy, the child’s race, the ratio of adults to children in the classroom, the length of the school day, and whether the classroom is located in an elementary school (Mashburn et al., 2006).
Associations Between Teachers’ Knowledge and Perceptions and Children’s Skills
Understanding the accuracy of teacher ratings is helpful not only to researchers but also to teachers, because it tells teachers whether they can trust their initial impressions of student abilities or whether further direct assessment would provide information needed for scaffolding and lesson planning. Previous studies have found evidence that teachers’ beliefs and knowledge affect their teaching (Fang, 1996; Pianta et al., 2005; Stipek & Byler, 1997). For example, Pianta et al. found that teachers’ beliefs about teaching, specifically the extent to which they adhered to more adult-centered beliefs, were associated with lower global classroom quality. Although less research has examined teachers’ perceptions of children’s math skills specifically, Heuvel-Panhuizen (1990) found that teachers significantly underestimated the capabilities of 6-year-old children in terms of knowledge of symbols, counting, and mathematical operations. Overall, teachers rank math teaching in preschool as less important than social-emotional and literacy teaching (Ginsburg, Lee, & Boyd, 2008), a priority reflected in the time spent on these activities (National Research Council, 2009). In one study of preschool classroom practices, math teaching and learning occupied only 6.6% of the school day, and more than half (58%) of that math time occurred in conjunction with other activities (National Research Council, 2009, p. 238). Teachers also report being uncomfortable teaching mathematics (Clements & Sarama, 2007; Ginsburg et al., 2006b), and most lack professional preparation in teaching it (Ginsburg, Cannon, Eisenband, & Pappas, 2006a).
Research Aims
This study is designed to examine the concurrent validity of teachers’ judgments of students’ math abilities in preschool. To assess this, the indirect teacher report measure of children’s math skills will be analyzed in terms of its variance (i.e., how much variance is attributable to the teacher level of analysis and how much to the student level). Also, the indirect measure will be examined in terms of the extent to which the measure aligns with two direct assessments of children’s math skills.
Method
Procedures
The participants in this study were teachers and students enrolled in a field trial of a curriculum designed to enhance students’ knowledge of math and science. The children attended public pre-k programs targeting children at risk of school failure and exhibited one or more of an established set of risk factors (i.e., low family income, single-parent household, substance abuse present in the home; Virginia Preschool Initiative Guidelines, 2006-2007). Teachers in 33 classrooms participated in the study, and approximately 10 children in each classroom were given direct assessments of their mathematical knowledge, for a total of 318 children. Some data were missing at the child (6%) and teacher (3%) levels, primarily demographic information.
Consent forms were sent home to the guardian of each child in the classroom asking permission to complete a battery of direct assessments with the child. Ninety-five percent of students’ guardians gave consent for participation in the study. Students with an active Individualized Education Program (IEP) other than for speech, and students who spoke a language other than English, were excluded from the pool for direct assessment. Once consent forms were received, up to 10 students in each classroom were randomly chosen from the pool of consented students to participate in the battery of direct assessments; classrooms with fewer eligible students contributed fewer children.
All data included in this study were collected at the beginning of the school year, primarily in October. Although school began in September, this timing gave teachers approximately a month to get to know the students’ skills and abilities before being asked to complete rating scales of students’ math and science knowledge. To be included in the analyses, rating-scale data had to be received from the teachers by the end of November so that they would correspond in time with the direct assessment measures.
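The selection rule described above can be sketched in Python as an eligibility filter followed by sampling without replacement. This is a minimal illustration, not the study’s actual procedure: the record fields (`consented`, `iep`, `speaks_english`), the function name `select_for_assessment`, and the use of a seeded `random.Random` generator are hypothetical choices made for the example.

```python
import random


def select_for_assessment(children, max_per_class=10, seed=None):
    """Randomly choose up to `max_per_class` eligible children from one classroom.

    Each child is a dict with hypothetical keys: "id", "consented",
    "iep" (None, "speech", or another label), and "speaks_english".
    """
    eligible = [
        c["id"]
        for c in children
        if c["consented"]
        and c["iep"] in (None, "speech")  # speech-only IEPs remain eligible
        and c["speaks_english"]
    ]
    rng = random.Random(seed)  # seeded generator for a reproducible draw
    # Sample without replacement; a small pool yields fewer than max_per_class.
    return rng.sample(eligible, min(max_per_class, len(eligible)))


# Hypothetical classroom of 15 children (not study data).
classroom = [
    {"id": i, "consented": i != 0,
     "iep": "other" if i == 1 else ("speech" if i == 2 else None),
     "speaks_english": i != 3}
    for i in range(15)
]
print(sorted(select_for_assessment(classroom, seed=2024)))
```

Capping the sample size at the pool size mirrors the “up to 10 students” wording: a classroom with only a few eligible, consented children simply contributes all of them.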
Participants
Demographic information was collected for student participants through a survey sent home to parents or caregivers. All 318 students included in the study were between 3.5 years and 5 years old (M = 4.57, SD = 0.50), and these measures were collected during their preschool year prior to entry in kindergarten. The sample is divided evenly between girls and boys, with 69% African American students and the remainder constituting Caucasian or other races. The majority of mothers of students in the study had some college experience (57%), and only 10% had a bachelor’s degree or higher education. The average household income for students included in the study was US$32,634.23 (SD = US$21,831.71).
Teachers were asked to complete a survey reporting on their demographic information, attitudes and feelings toward teaching, background in early childhood education, and classroom structure. Teachers in this study ranged from 25 to 66 years of age (M = 46.59 years, SD = 10.94) and had between 1 and 32 years of teaching experience (M = 7.74, SD = 6.61). Fourteen of the teachers were Caucasian, and 18 were African American. One third of the teachers reported holding a bachelor’s degree, and the remaining two thirds reported having education beyond a bachelor’s degree or another type of degree. Of the teachers, 55% held degrees in early childhood education, 30% in elementary education, and 15% in other fields.
Measures
Indirect Assessments: Teacher Rating of Math Skills
Overall score
Each teacher was asked to rate the math skills of the 10 students from his or her class who had been selected for direct assessment. The measure is a modified version of the Academic Rating Scale (ARS) for mathematics developed for the Early Childhood Longitudinal Study, Kindergarten Cohort (ECLS-K). It includes the seven original ARS items plus five additional items developed to assess specific objectives in the math curriculum; the additional items were written to follow the format of the original ARS items. Teachers rated students’ skills in various mathematical topics on a scale of 1 to 5, where 1 indicates that the child has not yet demonstrated the skill, 2 that the child is beginning to demonstrate the skill, 3 that the skill is in progress, 4 that the child demonstrates an intermediate level of the skill, and 5 that the child is proficient in the skill. The rated skills pertain to number sense, numerical operations, geometry, and measurement. Teachers could also mark any skill as “Not Applicable” (N/A). A child’s ratings were retained for analysis only if the child had at least eight non-N/A answers, and the overall score was the mean of all non-N/A ratings for that child. Across children, the overall math score had a mean of 2.29 (SD = 0.80) and ranged from 1.00 to 4.78 (see Table 1 for descriptive information on teacher ratings).
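The scoring rule for the overall score (the mean of non-N/A ratings, with a child retained only if at least eight non-N/A ratings remain) can be sketched as follows. Encoding N/A as `None`, the function name `rating_score`, and the range check are assumptions made for illustration.

```python
from statistics import mean

NA = None  # hypothetical encoding of a "Not Applicable" response


def rating_score(ratings, min_valid=8):
    """Return the mean of the non-N/A ratings, or None if too few remain.

    `ratings` holds one entry per item: an int from 1 to 5, or NA.
    """
    valid = [r for r in ratings if r is not NA]
    if any(not 1 <= r <= 5 for r in valid):
        raise ValueError("ratings must be on the 1-5 scale")
    if len(valid) < min_valid:
        return None  # child would be excluded from the analyses
    return mean(valid)


# A child with 12 items, 2 of them marked N/A: 10 valid ratings remain.
child = [1, 2, 3, 2, 2, 3, NA, 2, 1, NA, 3, 2]
print(rating_score(child))  # 2.1
```

The same function could compute a subscale score by passing only that subscale’s items; the source does not state a minimum number of items for subscales, so calling it with `min_valid=1` in that case would be an assumption.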
Table 1. Outcome Descriptives
Note: TR = teacher report; DA = direct assessments; TEMA-3 = Test of Early Mathematics Ability, 3rd ed.; M-TEAM = Modified Tools for Early Assessment in Mathematics.
Number sense and geometry & measurement (G&M) subscales
In addition to the overall mean, two subscales were derived from a theoretical examination of the items and the corresponding outcome measures. One subscale consisted of four items that addressed students’ knowledge of number and numerical operations (Cronbach’s alpha = .92). The second subscale consisted of five items that addressed students’ skills in geometry and measurement (G&M; Cronbach’s alpha = .89). The remaining three items were not included in either subscale because they apply to math skills as a whole rather than specifically to number sense or G&M; for example, “Uses a variety of strategies to solve math problems” pertains to general math skills. As with the overall score, each child’s subscale score was the mean of all non-N/A answers on that subscale’s items. The number sense subscale had a mean of 2.30 (SD = 0.95) and ranged from 1.00 to 5.00; the geometry and measurement subscale also ranged from 1.00 to 5.00 and had a mean of 2.47 (SD = 0.83).
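The internal-consistency values reported for the subscales are Cronbach’s alpha, α = [k / (k − 1)] × (1 − sum of item variances / variance of the total score), where k is the number of items. A minimal pure-Python sketch follows. It assumes complete responses (the source does not say how N/A answers were handled when computing alpha), and the function name `cronbach_alpha` and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for item-score columns.

    items[j][i] is child i's score on item j; complete responses are assumed.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]  # scale totals
    item_var = sum(variance(item) for item in items)  # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical 1-5 ratings: three items (rows) for five children (columns).
example = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(example), 3))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why tightly focused subscales such as the four number sense items can reach values above .90.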
Direct Assessments
Direct assessment math composite
To compare the overall math rating scale score with an overall direct assessment score, a composite was created from the direct assessments. To give each test equal weight, students’ scores on the number sense test and the geometry and measurement test were standardized and then summed. This sum is the child’s overall math skills composite score (M = 0.00, SD = 1.82; see Table 1 for descriptive information on the direct assessments). The two tests used in the composite are described below.
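A short sketch of the equal-weight composite: each test is converted to z-scores, and the two z-scores are summed. Because each z-score has variance 1, the composite’s variance is 2 + 2r, where r is the correlation between the two tests, so its SD is √(2 + 2r) rather than 1. Working backward from the reported SD of 1.82 gives r ≈ (1.82² − 2)/2 = (3.3124 − 2)/2 ≈ .66, in the same range as the .63 coefficient between the two direct measures in Table 3 (that coefficient comes from the HLM models, so the two values need not match exactly). The function names and data are hypothetical, and standardizing with the sample SD (`statistics.stdev`) is an assumption.

```python
from statistics import mean, stdev


def zscores(xs):
    """Standardize scores using the sample mean and sample SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def composite(tema, mteam):
    """Equal-weight composite: z(TEMA-3) + z(M-TEAM) for each child."""
    return [a + b for a, b in zip(zscores(tema), zscores(mteam))]


# Hypothetical raw scores for five children (not study data).
tema = [5, 10, 12, 15, 20]
mteam = [3, 8, 15, 9, 21]
scores = composite(tema, mteam)
print(mean(scores), stdev(scores))
```

Summing z-scores rather than raw scores keeps the M-TEAM, whose raw SD (6.56) is larger than the TEMA-3’s (4.86), from dominating the composite.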
Number sense—TEMA-3
Students’ knowledge of number and numerical operations was tested using the Test of Early Mathematics Ability, 3rd ed. (TEMA-3; Ginsburg & Baroody, 2003). This standardized measure uses pictures and counting chips to assess students’ number knowledge, such as cardinality, ordinality, one-to-one correspondence, and enumeration, and their abilities in numerical operations, such as sharing fairly, adding and subtracting in word problems, and comparing two groups to determine which is larger and which is smaller. The TEMA-3 is designed to measure a child’s formal and informal mathematical abilities, focusing on the domains of counting, one-to-one correspondence, numeral recognition, number facts, calculation, and understanding of concepts. It has parallel forms (A and B), is norm referenced, and is designed for children between the ages of 3 and 8 years, either as a diagnostic tool for children having difficulty in a specific mathematics domain or to determine how a child is performing in relation to his or her peers. The TEMA-3 (Ginsburg & Baroody, 2003) reports high internal consistency, with coefficient alphas between .92 and .96 for six overarching age intervals, and subgroup analyses revealed alphas of .98 and .99 for all subgroups except Native Americans. Test–retest reliability ranged from .82 to .93. Content validity was examined through item discrimination indexes obtained by correlating items with the test (.45 to .68). Criterion validity coefficients were .54 with the KeyMath-R Basic Concepts subtest and .91 with the Young Children’s Achievement Test Math Quotient (Bliss, 2006). For the current sample, the mean across children was 12.09 (SD = 4.86), with scores ranging from 2 to 27 points (Cronbach’s alpha = .91).
Geometry & measurement—modified tools for early assessment in mathematics (M-TEAM)
The measure of students’ knowledge in geometry and measurement was adapted from the Tools for Early Assessment in Mathematics (TEAM; Clements, Sarama, & Wolfe, 2011). Because of the length of the original assessment, the measure was shortened, and items were added to address the specific objectives of the curriculum being tested: eight questions were retained from the original assessment, and 10 new questions, designed to match the format and style of the original questions, were added. The M-TEAM uses manipulatives and pictures to pose questions about the measurement of objects and the properties of shapes. For the current sample, Cronbach’s alpha was .82, and scores had a mean of 10.72 (SD = 6.56) and ranged from 0 to 35 points.
Data Analyses
Data were analyzed within the hierarchical linear modeling (HLM) framework (Raudenbush & Bryk, 2002) using HLM 6 software. This approach separates the variance in teacher ratings of student skills into two sources: the teacher, who assigns the ratings, and the student. To determine the extent to which ratings were a function of the teacher assigning them, unconditional models of each outcome variable were estimated. The three parts of the unconditional model are given in Equations 1a, 1b, and 1c. Equation 1a specifies the first level of the two-level model: the teacher’s rating (Yij) of child i in classroom j is a function of the teacher’s average rating of children in her class (β0j) and the child’s individual distance from that mean (rij), which is the Level 1 error term. Equation 1b specifies the second level: the average rating of children within a teacher’s class (β0j) is determined by the average rating of children across classes (γ00) and the extent to which the individual teacher varies from that mean (u0j), which is the Level 2 error term. Substituting Equation 1b into Equation 1a yields the combined model in Equation 1c: an individual child’s rating (Yij) is a function of the average rating across classes (γ00), the child’s teacher’s deviation from that mean (u0j), and the child’s individual deviation from the other students in his or her class (rij).
Intraclass correlations (ICCs) were calculated for each outcome from the variance components of these equations. The variance of the Level 1 error term (rij) is denoted σ², and the variance of the Level 2 error term (u0j) is denoted τ00. The ICC is the proportion of the total variance (τ00 + σ²) that is attributable to the rater, as indexed by the variance between raters: ICC = τ00 / (τ00 + σ²).
To determine the accuracy of teachers’ ratings of students’ math skills, child performance on a direct assessment was added as a predictor in the Level 1 equation (Equation 2). Because all measures were standardized, the regression coefficient for this predictor can be interpreted similarly to a correlation coefficient: it indicates the degree to which the two assessments are related.
In these analyses, the teacher-report outcome measures (overall math ability, number sense, and geometry and measurement skills) were compared with the direct assessment measures (the overall math composite, the TEMA-3 number sense test, and the M-TEAM geometry and measurement test). To make the results easy to interpret, each of the six measures was standardized, and all measures were compared with one another, both within and across the two types of assessment.
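Equations 1a to 1c are referenced above but not reproduced in this version of the text. Written in the standard two-level notation of Raudenbush and Bryk (2002), where the classroom-specific intercept is β0j and the grand mean is γ00, and consistent with the definitions in the preceding paragraph, they take the form below. The normal-distribution assumptions on the error terms are the conventional HLM assumptions, not something stated in the source.

```latex
\begin{aligned}
&\text{Level 1 (1a):}   && Y_{ij} = \beta_{0j} + r_{ij},          && r_{ij} \sim N(0, \sigma^{2}) \\
&\text{Level 2 (1b):}   && \beta_{0j} = \gamma_{00} + u_{0j},     && u_{0j} \sim N(0, \tau_{00}) \\
&\text{Combined (1c):}  && Y_{ij} = \gamma_{00} + u_{0j} + r_{ij} &&
\end{aligned}
```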
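As a sketch of the ICC calculation, `icc_from_components` below computes ICC = τ00 / (τ00 + σ²) from the two variance components. HLM 6 estimates τ00 and σ² with maximum likelihood methods, which are beyond a short example, so the sketch substitutes the classical one-way ANOVA (method-of-moments) estimator, which assumes the same number of children in every classroom. It is a stand-in for illustration, not the procedure used in the study, and both function names are hypothetical.

```python
from statistics import mean


def icc_from_components(tau00, sigma2):
    """Share of total variance that lies between classrooms (raters)."""
    return tau00 / (tau00 + sigma2)


def anova_variance_components(groups):
    """Method-of-moments estimates of (tau00, sigma2) for equal-sized groups.

    `groups` is a list of classrooms, each a list of children's scores.
    """
    J, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this simple estimator assumes equal group sizes")
    grand = mean(x for g in groups for x in g)
    # Between-classroom and within-classroom mean squares.
    msb = n * sum((mean(g) - grand) ** 2 for g in groups) / (J - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (J * (n - 1))
    tau00 = max(0.0, (msb - msw) / n)  # truncate negative estimates at zero
    return tau00, msw


# Hypothetical scores for three classrooms of three children each.
tau00, sigma2 = anova_variance_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(icc_from_components(tau00, sigma2))
```

With standardized outcomes, whose total variance is about 1, a between-classroom component of roughly 0.40 would correspond to the approximately 40% of rating variance attributed to teachers in the Results.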
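Equation 2 is likewise not reproduced here. A plausible form, assuming the direct-assessment score Xij is added to the Level 1 model of Equation 1a and its slope is fixed across classrooms (the source does not say whether the slope was allowed to vary), is shown below. Under this specification, the coefficient interpreted as the correlation-like index of agreement would be the fixed slope γ10.

```latex
\begin{aligned}
&\text{Level 1:} && Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
&\text{Level 2:} && \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10}
\end{aligned}
```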
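Why a coefficient between two standardized measures behaves like a correlation can be checked directly: in an ordinary least-squares regression of one standardized variable on another, the slope equals Pearson’s r. The sketch ignores the classroom level (there is no random intercept), so it illustrates the interpretation rather than reproducing the HLM estimates; the data and function names are hypothetical.

```python
from statistics import mean, stdev


def standardize(xs):
    """Convert raw scores to z-scores (sample SD)."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation computed from z-scores."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical scores: direct assessment (x) and teacher rating (y).
direct = [8, 12, 15, 9, 20, 14]
rating = [1.5, 2.0, 3.0, 2.5, 3.5, 2.0]
print(ols_slope(standardize(direct), standardize(rating)), pearson_r(direct, rating))
```

On raw scores the slope would instead be r multiplied by the ratio of the two SDs, which is why standardizing all six measures makes the coefficients directly comparable.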
Results
Descriptively, teachers rated most children in the fall of the school year as “beginning” to demonstrate a given skill or as “in progress” toward developing it; children were less frequently rated as “not yet” demonstrating, “intermediate,” or “proficient” in a skill. The intraclass correlations (ICCs) computed for each measure are presented in Table 2, along with the between-classroom variance. For all measures except the TEMA-3, the between-classroom differences were significant, indicating that a significant proportion of the variance in these measures was due to systematic differences between classrooms, reflecting factors attributable to the teacher rather than the children’s skills. Analyses of the indirect measures indicated that approximately 40% of the variation in teachers’ ratings of students’ math skills stems from characteristics of the teacher and not the skills of the child.
Table 2. Intraclass Correlation Coefficients (ICC) and Percent of Variance Explained Within and Between Classrooms
Note: Level 1, n = 318; Level 2, n = 35. TR = teacher report; DA = direct assessments; G&M = geometry & measurement; TEMA-3 = Test of Early Mathematics Ability, 3rd ed.; M-TEAM = Modified Tools for Early Assessment in Mathematics.
*p < .05. **p < .01.
Results of the regression analyses comparing associations within and between the teacher report and direct assessment measures are presented in Table 3. Within each type of measure (direct or indirect), associations were strong, with all regression coefficients above .85 (p < .01), except for the association between the direct measures of number sense and G&M, which had a regression coefficient of .63 (p < .01). Across types of measures, all associations were significant at p < .01 and moderate in size (βs between .42 and .54). The association between teacher report of students’ overall math skills and the composite of the direct assessments was .53; because the measures were standardized, this indicates that students who were 1 SD above the mean on the direct assessment composite were rated approximately 0.5 SD above the mean by their teachers. In the specific domains of math, teachers rated children who were 1 SD above or below the mean in number sense as 0.49 SD above or below the mean (β = .49, p < .01), and rated children who were 1 SD above or below the mean in G&M as 0.43 SD above or below the mean (β = .43, p < .01). Teachers were thus slightly more accurate in judging number sense than geometry and measurement, for children both above and below the mean.
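To make the size of these coefficients concrete on the original 1 to 5 rating scale, the overall-score coefficient (.53) can be converted back using the teacher-rating descriptives reported earlier (M = 2.29, SD = 0.80). This is a back-of-the-envelope illustration that ignores classroom random effects and assumes the fixed intercept is approximately zero because all measures were standardized.

```latex
\begin{aligned}
\hat{z}_{\text{TR}} &\approx \beta \, z_{\text{DA}} = 0.53 \times 1 = 0.53 \\
\hat{Y}_{\text{TR}} &\approx 2.29 + 0.53 \times 0.80 = 2.29 + 0.424 \approx 2.71 \\
\text{if } \beta = 1: \quad \hat{Y}_{\text{TR}} &\approx 2.29 + 1 \times 0.80 = 3.09
\end{aligned}
```

Under these assumptions, a child 1 SD above the mean on the direct composite would be rated about 2.7 rather than the roughly 3.1 expected if teacher ratings tracked the direct assessments one-for-one, which is the attenuation toward the mean described above.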
Table 3. Regression Coefficients (Robust Standard Errors)
Note: Level 1, n = 318; Level 2, n = 35. TR = teacher report; DA = direct assessments; G&M = geometry & measurement; TEMA-3 = Test of Early Mathematics Ability, 3rd ed.; M-TEAM = Modified Tools for Early Assessment in Mathematics.
*p < .05. **p < .01.
Discussion
The results of this study indicate that although teachers can accurately determine whether children are above or below the mean, they tend to misestimate children’s math abilities in the fall of the preschool year before kindergarten entry, which aligns with previous findings with older children (Heuvel-Panhuizen, 1990).
Implications for Teachers and Teacher Preparation
Overall, teachers tend to rate children who are 1 SD above or below the mean as approximately 0.5 SD above or below the mean, respectively. The findings also suggest that teachers are slightly better at rating children in terms of their number sense than their skills in geometry and measurement. This could be because teachers are more familiar with recognizing children’s number-sense skills than those in geometry and measurement.
The finding that teachers are not entirely accurate in their judgments of students’ math skills in preschool is not surprising, given previous research on teachers’ knowledge of math and perceptions of the importance of math in preschool. Because many teachers feel uncomfortable teaching math and address it less frequently in their classrooms (National Research Council, 2009), they may not know how to appropriately judge children’s math skills and rely more on a general impression of children’s overall skills instead of specific skill demonstration by the child. Overall, this work has implications for teacher preparation programs because it indicates that teachers need to be better able to appropriately recognize and understand demonstrations of children’s math skills and also have a stronger understanding of children’s learning trajectories in mathematics (Clements & Sarama, 2009; Sarama & Clements, 2009).
Implications for Researchers
Although teacher report is a cost-effective method of gathering student-level data, researchers should use such data cautiously, with the understanding that it may not accurately capture children’s skills and abilities, particularly in mathematics. In situations in which a researcher wants to examine unintended or secondary effects of a project on math achievement, indirect assessments may be sufficient. However, when math outcomes are the focus of the research, direct assessments give more accurate information on students’ skills and would be preferred, despite the additional time and cost they require.
Researchers should also consider that the ratings reflect not only the skills of the student but also characteristics of the teacher and classroom, with approximately 40% of the variability in ratings due to factors other than those attributable to the child. When using indirect assessments, researchers may find it helpful to control for as many of these factors as possible. In addition, the proportion of variance attributable to the teacher in this study is somewhat higher than in studies of teacher report in other domains. For example, in the study by Mashburn et al. (2006) examining teachers’ ratings of children’s social and emotional skills, the proportion of teacher variance ranged from 15% to 33%. This difference could be due to teachers’ lack of familiarity with specific behavioral markers that demonstrate math skills.
Although the appropriateness of direct assessments for preschool-aged children is somewhat contentious, with some advocating direct assessments to determine program effectiveness (Meisels, 2006) and others maintaining that children are too young to be reliably assessed (Shepard, 1994), the current study demonstrates the problem of relying solely on teacher report of students’ skills. Because teachers’ judgments are inherently subjective, a significant amount of their variance stems from the teacher and classroom level and is not specifically related to a child’s skills. Regarding the ability to reliably assess preschool-aged children, the TEMA-3 showed nonsignificant between-classroom variance, indicating that it was not picking up statistical “noise” from factors other than the child; together with the measure’s high overall validity and reliability, this supports the idea that preschool children can be reliably assessed.
Limitations
These analyses included only children who had eight or more non-N/A ratings from their teacher, which excluded a portion of the sample. In addition, the exact date of each teacher report and its relation to the beginning of the school year were not available, so it was not possible to examine whether teachers’ ratings varied as a function of how long they had worked with the children.
Summary and Future Directions
This research is a first step toward understanding the hypothesized relationship between teachers’ beliefs and student outcomes (National Research Council, 2009). The findings demonstrate that teachers misestimate preschool students’ math abilities, in both number sense and geometry and measurement, which aligns with previous work with older children (Heuvel-Panhuizen, 1990). The findings also support research on teacher reports in other domains indicating that, because of the inherently subjective nature of rating scales, a significant proportion of the variance is due to factors other than the child (Mashburn et al., 2006). The next step in this line of research will be to examine the relationship between preschool teachers’ perceptions of students’ skills and what and how they choose to teach math in their classrooms. These findings have implications for teachers, because they indicate that teachers systematically misestimate students’ math skills, and for researchers, because clarifying the link between direct and indirect assessments can lead to more informed decisions about which is most appropriate for specific research needs.
Footnotes
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The authors received the following financial support for the research, authorship, and/or publication of this article: The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grants R305B040049 and R305A07068 to the University of Virginia.
