Abstract
Numerous studies have identified differences between males and females in academic performance across the areas of reading, writing, and mathematics. The current study examined whether or not gender differences exist when math curriculum–based measures (M-CBMs) are used to assess basic math computation skills in a sample of third- through eighth-grade students. Participants included 1,626 general and special education students from five schools in a rural southeastern school district. Two-way repeated measures ANOVAs were used to determine significance across genders at each grade level. Statistically significant differences in favor of females were found in Grades 5, 7, and 8. The discussion highlights applied and theoretical implications of these findings.
Increasing numbers of schools are using multitiered system of supports (MTSS) models to identify students at risk for academic failure and provide early intervention to those students with the goal of preventing later failure. As the use of MTSS frameworks becomes more popular, many schools have relied upon formative assessments of basic academic skills to universally screen students for potential deficits and to provide more frequent progress monitoring for students who receive remedial interventions. Curriculum-based measurement (CBM) is one such way to screen and progress monitor students’ academic achievement. CBM consists of standardized sets of brief assessments that measure basic academic skill attainment in areas such as reading, writing, and mathematics. CBM was initially developed nearly 30 years ago by researchers at the University of Minnesota, and has a large body of research supporting its reliability, validity, and efficacy for use with K-12 students. In schools that are implementing response to intervention (RtI), CBM scores may be used to make eligibility decisions regarding special education services. Recently, researchers have found gender differences favoring females on CBM probes that assess early literacy, reading, and writing skills (Below, Skinner, Fearrington, & Sorrell, 2010; Fearrington et al., 2014; Keller-Margulis, Mercer, Payan, & McGee, 2015; Malecki & Jewell, 2003).
No studies to date have examined whether or not gender differences are apparent when math curriculum–based measures (M-CBMs) are used to evaluate mathematical ability. M-CBM probes assess basic math computational skills, such as addition, subtraction, multiplication, and division. It is important for educators to be cognizant of any gender differences that may exist across M-CBM scores, as student scores on these measures may be used to make high-stakes decisions including placement in special education. Also, M-CBM scores are often used to evaluate student improvement (or lack thereof) after exposure to evidence-based interventions. The current study aimed to reveal any possible gender gap that may exist in mathematical ability as measured by triannually administered M-CBM probes.
Does It Exist? The Gap in Academic Achievement
A multitude of researchers have investigated potential gender gaps in academic achievement over the past 40 years. Numerous studies have been conducted using a wide range of age groups and a number of different achievement measures, including standardized tests, informal classroom assessments, and grades (National Center for Education Statistics, 2011a, 2011b). Most researchers have found a slight female advantage in reading achievement and a more pronounced female advantage in writing (Below et al., 2010; Berninger, Nielsen, Abbott, Wijsman, & Raskind, 2008; Camarata & Woodcock, 2006; Davenport et al., 2002; Gibb, Fergusson, & Horwood, 2008; Jones & Myhill, 2007; Keller-Margulis et al., 2015; Klecker, 2006; Logan & Johnston, 2010; Narahara, 1998).
Unlike the general consensus of the female advantage found in reading and writing, research in the area of mathematics has yielded less consistent results. Some studies have detected a male advantage (Camarata & Woodcock, 2006; Else-Quest, Hyde, & Linn, 2010; Lindberg, Hyde, Petersen, & Linn, 2010; McGraw, Lubienski, & Strutchens, 2006), but it has not been steadily found when age and ability are considered. Most studies have found that males begin to outscore females in high school and that the discrepancy increases with age. In addition, identified gaps are more consistent and pronounced in high-achieving students. Other studies have found negligible or no differences (Hyde, Lindberg, Linn, Ellis, & Williams, 2008; Lachance & Mazzocco, 2006; Scafidi & Bui, 2010; Scheiber, Reynolds, Hajovsky, & Kaufman, 2015).
Interestingly, there is greater consensus when studies are categorized by the manner in which mathematical ability is measured. Researchers have generally assessed gender differences using standardized measures of achievement, such as state/national test scores and individually administered tests, with results indicating that males outscore females on such comprehensive, formal assessments (College Board, 2010; Klecker, 2006; Scheiber et al., 2015; U.S. Department of Education, 2004). In contrast, when classroom assessments, grades, and overall grade point averages (GPAs) are used to evaluate gender differences in math, a number of studies indicate that girls perform better than boys (Corbett, Hill, & St. Rose, 2008; Duckworth & Seligman, 2006; Pomerantz, Altermatt, & Saxon, 2002; Voyer & Voyer, 2014). It is also worth considering the specific mathematical skill that is measured. Several studies have found that females tend to do better on tests that measure computation skills, whereas males outperform females on tasks that require mathematical reasoning (Gibbs, 2010; Scheiber et al., 2015; Wei et al., 2012).
Several hypotheses have been offered to explain why a male advantage is seen, including the possibility of a discrepancy in cognitive ability, a difference between males and females in their approach to schoolwork, and differences in adopted learning strategies, classroom behavior or self-regulation, math self-efficacy, and the planning and attention strategies of students (Casey, Nuttal, & Pezaris, 1997; Duckworth & Seligman, 2006; Kenney-Benson, Pomerantz, Ryan, & Patrick, 2006; Matthews, Ponitz, & Morrison, 2009; Naglieri & Rojahn, 2001). Despite the overall inconsistency in the literature base, most researchers have concluded that although males once exhibited higher math achievement and greater persistence than females, this gap has narrowed in recent years.
Purpose of the Current Study
Previous studies have examined gender differences in mathematics using summative, standardized assessments administered at one point in the school year (Gibb et al., 2008; Hyde et al., 2008). Others have formatively assessed math skills using report card grades of students, which are not standardized (Duckworth & Seligman, 2006; Kenney-Benson et al., 2006; Pomerantz et al., 2002). The current study differs from former research in that we have used a standardized, formative measure to assess student performance at three different points during a school year.
School districts throughout the United States are adopting RtI at an increasing rate. RtI is a powerful tool for educators, as it helps to identify gaps in learning so that instruction can be adjusted to meet the needs of students within both general education and special education curriculums. The use of CBM, particularly in the early academic skill areas, is a vital component in the RtI process. M-CBM scores are used as part of a universal screening process and as progress monitoring tools, and can be used to aid special education eligibility decision making. As these data are used to make such decisions, it is important to guard against any artificial overrepresentation of one gender that may be a result of the measure used. To extend the preliminary research that has investigated gender differences in CBM to other academic areas, this study addressed two research questions: (a) whether gender differences exist in students’ performance on M-CBM probes, and (b) at which grade level any such differences first appear and how they vary across the time of year.
Method
Participants
The current study included 1,626 students in Grades 3 through 8 from both general and special education classes. There was a nearly equal number of male and female participants in the sample (812 females and 813 males). Participants attended five different schools in a rural southeastern school district (three elementary schools, two middle schools) that enrolled 5,550 students in 12 schools. The ethnic breakdown of the sample was as follows: 94.4% Caucasian, 3.4% Hispanic, 1.4% African American, 0.5% Asian American, and 0.5% American Indian. Participants’ math skills were measured with M-CBM probes that were administered triannually to all students in the months of August, January, and May. On average, students in the district obtained proficient scores on statewide mathematics tests.
Measures
Each participant completed a mixed-skill, calculation-based M-CBM probe during the district’s regularly scheduled triannual benchmarks. As with other curriculum-based measures, M-CBM probes are sensitive to change, easy and quick to administer and score, and have a large number of alternate forms available (Shinn, 2005). The reliability and validity of these measures have been established in prior research (Christ, Scullin, Tolbize, & Jiban, 2008; Clarke & Shinn, 2004; Fewster & Macmillan, 2002; Kelley, Hosp, & Howell, 2008). M-CBM probes consist of basic computational math problems, including addition, subtraction, multiplication, and division. The level of difficulty for each measure rises as the grade level increases, and probes are designed to align with Common Core standards for computation skills taught in the respective grade (Common Core State Standards Initiative, 2016). M-CBM probes are scored for correct digits (CD), which means students are able to receive credit for partially correct answers. When scoring CDs, every numeral in each answer is evaluated. For multidigit problems, each numeral must appear in the correct column to be scored as a CD. For example, if a student writes an answer of 14, but the correct answer is 13, he or she will receive one point instead of two, as the digit in the tens place is correct. Depending on grade level, students have either 2 or 4 min to work on each M-CBM probe. In the time allotted, students are instructed to answer as many math computation problems as possible. Each probe contains a substantial number of problems, making it unlikely that a student could finish within the time limit (Shinn, 2005).
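The correct-digit rule described above can be illustrated with a short Python sketch. It is not the published scoring procedure: the function names `correct_digits` and `score_probe` are hypothetical, and the sketch assumes that answers are aligned on the ones place and that any extra digits written beyond the length of the answer key earn no credit.

```python
def correct_digits(student_answer: str, correct_answer: str) -> int:
    """Count digits in the student's answer that match the key, column by column.

    Assumption: both answers are aligned on the ones place, so the comparison
    runs from right to left. Digits beyond the key's length are ignored.
    """
    student = student_answer.strip()
    key = correct_answer.strip()
    return sum(
        1
        for s, k in zip(reversed(student), reversed(key))
        if s.isdigit() and s == k
    )


def score_probe(responses, answer_key):
    """Sum the correct digits across all items; a blank response is ''."""
    return sum(correct_digits(r, k) for r, k in zip(responses, answer_key))


# The example from the text: the student writes 14, the correct answer is 13.
print(correct_digits("14", "13"))  # tens digit matches, ones digit does not
```

Under these assumptions a fully correct two-digit answer earns two points, a blank answer earns none, and the probe total is the sum over items.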
Procedures
Researchers performed a secondary data analysis using M-CBM probes that were administered to all students during routine benchmark periods in the fall, winter, and spring of the academic school year. There were approximately 12 weeks of instruction between each benchmark period. All M-CBM probes were administered and scored by a trained benchmarking team that consisted of school psychologists, school counselors, and teaching assistants. Prior to the initial benchmarking period, each individual attended training sessions to learn standardized administration and appropriate scoring procedures. After a review of the administration and scoring procedures used for M-CBM, multiple practice samples were scored as a group. Also, each trainee was observed completing a practice administration. Both administration procedures and scores on practice samples were evaluated by trainers with the Accuracy of Implementation Rating Scale for M-CBM probes (AIRS-Math; Shinn, 2005). Prior to graduating from training, each trainee scored a probe that was also independently scored by a trainer. Interscorer agreement was calculated between trainer and trainee scores; a minimum of 95% agreement was required to exit the training program.
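The text does not state how interscorer agreement was computed. One common approach, shown here only as an assumption, is item-by-item percent agreement: the number of items on which trainer and trainee assigned the same score, divided by the total number of items. The function names below are hypothetical.

```python
def interscorer_agreement(trainer_scores, trainee_scores):
    """Percent of items on which two scorers assigned identical scores.

    Assumption: agreement is judged item by item; the source does not
    specify the formula that was actually used.
    """
    if len(trainer_scores) != len(trainee_scores):
        raise ValueError("Both scorers must score the same items.")
    if not trainer_scores:
        raise ValueError("At least one scored item is required.")
    agreements = sum(1 for a, b in zip(trainer_scores, trainee_scores) if a == b)
    return 100.0 * agreements / len(trainer_scores)


def meets_exit_criterion(trainer_scores, trainee_scores, minimum=95.0):
    """Check the 95% minimum agreement required to exit training."""
    return interscorer_agreement(trainer_scores, trainee_scores) >= minimum
```

With 20 scored items, a trainee who disagrees with the trainer on one item reaches exactly 95% and would meet the criterion; two disagreements (90%) would not.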
During each benchmarking period, M-CBM probes were group administered to each classroom. To ensure consistency, the same examiners were responsible for the same classrooms across the three benchmark periods. Examiners were instructed to pass out the M-CBM probes before reading the standardized instructions. Time limits were dependent on grade level: Students in Grade 3 had 2 min, whereas students in Grades 4 through 8 had 4 min. Once completed, all probes were immediately scored and entered into an online database. Because this was a secondary data analysis, researchers did not have access to the actual probes. Lack of access to the probes themselves prevented independent interrater reliability calculations on M-CBM scores.
Data Analysis
A causal-comparative cross-sectional and longitudinal design was used to test for gender differences. The same participants were used across the fall, winter, and spring benchmarking periods while comparing scores within each grade level (3-8). Due to the grade-level variations in time limits and skills measured on M-CBM probes, researchers did not run statistical tests across grade levels.
A series of two-way repeated measures ANOVAs was used to examine differences in the M-CBM scores across the three administrations and genders in each grade level. These ANOVAs also examined the interaction between gender and time of year in each grade. The within-group variable was the M-CBM score at each benchmark (time) and the between-group variable was gender. To minimize familywise alpha and reduce Type I error rates for the within-group comparisons, contrast codes were used to compare each benchmark against the previous administration (i.e., winter was only compared with fall, and spring was only compared with winter). These contrasts were also used to examine interactions between gender and time. Partial eta-squared was calculated to determine effect sizes of significant differences; these results were compared with the qualitative categories defined by Cohen (1988), with .02 being considered small, .13 medium, and .26 large.
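For readers who wish to check a reported effect size, partial eta-squared can be recovered from an F statistic and its degrees of freedom, because partial eta-squared equals SS_effect / (SS_effect + SS_error), which is algebraically the same as (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies that identity and the cutoffs given above. The function names and the label "below small" for values under .02 are illustrative assumptions, not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive.")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


def effect_size_label(eta_p2: float) -> str:
    """Qualitative category using the cutoffs stated in the text.

    Assumption: values under .02 are labeled 'below small' here; the
    source does not name that range.
    """
    if eta_p2 >= 0.26:
        return "large"
    if eta_p2 >= 0.13:
        return "medium"
    if eta_p2 >= 0.02:
        return "small"
    return "below small"


# Example with hypothetical inputs in the format F(df_effect, df_error):
eta = partial_eta_squared(13.50, 1, 267)
print(round(eta, 2), effect_size_label(eta))
```

For instance, F(1, 267) = 13.50 corresponds to a partial eta-squared of about .05, which falls in the small range under these cutoffs.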
Results
Homogeneity of Variance
We used Levene’s test to measure homogeneity of variance (see Table 1), which indicated that all but one M-CBM administration met the assumption of equality of variance between males and females. The normality assumption for the outcome measures was examined using the Shapiro–Wilk test for each administration by gender and grade level. Results revealed that 16 of the 48 individual Shapiro–Wilk tests violated the assumption of normality, which introduced the possibility that outliers could affect the study’s results. A trimmed analysis was conducted that removed any scores more than two standard deviations away from group-level means. All analyses were then conducted on the trimmed data and compared with the results from the full sample. There were no substantive differences, so we used the results from the full sample.
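The trimming rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: it assumes that each group is a single gender within one grade and administration, that the sample standard deviation (n − 1 denominator) is used, and that trimming is applied once rather than iteratively. The function name `trim_outliers` is hypothetical.

```python
from statistics import mean, stdev


def trim_outliers(scores, max_sd=2.0):
    """Drop scores more than `max_sd` standard deviations from the group mean.

    Assumptions: `scores` holds one group's M-CBM scores, and the sample
    standard deviation is used. Groups with fewer than two scores or with
    no variability are returned unchanged.
    """
    scores = list(scores)
    if len(scores) < 2:
        return scores
    group_mean = mean(scores)
    group_sd = stdev(scores)
    if group_sd == 0:
        return scores
    return [x for x in scores if abs(x - group_mean) <= max_sd * group_sd]


# Hypothetical group of correct-digit scores with one extreme value.
group = [20, 22, 21, 23, 19, 22, 20, 21, 60]
print(trim_outliers(group))
```

Running the full analysis on both the trimmed and untrimmed scores, as described above, then shows whether the extreme values change any conclusions.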
Table 1. Tests of Between-Gender Homogeneity of Variance.
Note. Homogeneity of variance above examines the equality of variance between males and females for M-CBM administration across grade levels. M-CBM = math curriculum–based measure.
Main Effects of Administration Time
Results from the two-way repeated measures ANOVAs revealed a significant main effect for time in all grades (see Table 2). Within-group contrasts are summarized in Table 3. In Grades 3, 4, and 5, M-CBM scores significantly increased from fall to winter and from winter to spring. However, a different pattern emerged for Grades 6, 7, and 8. In these grades, M-CBM scores significantly increased from fall to winter, but significantly declined from winter to spring. An examination of the effect size measures reveals that the increases from winter to spring in Grades 3, 4, and 5 were generally larger than the losses from winter to spring in Grades 6, 7, and 8. In addition, the gains seen from fall to winter in all grades were generally larger than the magnitude of change seen in winter to spring scores.
Table 2. Two-Way Repeated Measures ANOVAs: Within-Group Results.
Note. Results presented above are from six separate two-way repeated measures ANOVAs. Within-group F values for time and gender by time have the Huynh–Feldt correction applied.
Table 3. Within-Group Contrasts for M-CBM Administration.
Note. Results presented above are specific within-group contrasts from six separate two-way repeated measures ANOVAs. The contrasts compared each M-CBM administration with the previous administration within a grade level. Thus, no contrasts were calculated for the fall administrations. M-CBM = math curriculum–based measure.
Main Effects for Gender
Results from the two-way repeated measures ANOVAs for the gender main effects are reported in Table 4. The analyses revealed that male and female scores were statistically equivalent in third and fourth grades, but females scored significantly higher in fifth grade. In the sixth-grade sample, both genders’ scores were again statistically equivalent, but females obtained significantly higher M-CBM scores than males in the seventh and eighth grades. Examining the effect size measures revealed that the magnitude of the differences between males and females was relatively small across all grades.
Table 4. Two-Way Repeated Measures ANOVAs: Between-Group Results.
Note. Results presented above are from six separate two-way repeated measures ANOVAs. The between-groups F values for gender assume equality of variance. M-CBM = math curriculum–based measure.
Gender by Administration Time Interaction
The gender by administration time interaction results are summarized in Table 2. These results indicated that significant interactions between gender and administration time occurred in the third and fifth grades. An examination of Figure 1 and the post hoc comparisons indicated that in the third grade, males’ and females’ M-CBM scores were not significantly different during the fall administration, F(1, 192) = .03, p = .868, ηp² < .01, or the winter administration, F(1, 192) < .01, p = 1.000, ηp² < .01, which was consistent with the gender main effect results. However, females scored significantly higher in third grade than males during the spring administration, F(1, 192) = 4.05, p = .046, ηp² = .05. In fifth grade, post hoc comparisons indicated that females scored higher than males in the fall, F(1, 267) = 4.56, p = .034, ηp² = .02, but males and females were not different in the winter, F(1, 267) = 3.45, p = .064, ηp² = .01. In the spring, females again scored significantly higher than males, F(1, 267) = 13.50, p < .001, ηp² = .05, with the effect size measures indicating that females performed notably better in the spring. There were no significant interaction effects found in Grades 4, 6, 7, and 8.

Figure 1. Average M-CBM scores for males and females during fall, winter, and spring administrations for Grades 3 through 8.
Discussion
This is a preliminary study examining gender differences on M-CBM probes for students in Grades 3 through 8. Previous studies have investigated gender differences in mathematics using summative, standardized assessments administered at one point in the school year (Gibb et al., 2008; Hyde et al., 2008). Others have formatively assessed math skills using report card grades of students, which are not standardized (Duckworth & Seligman, 2006; Kenney-Benson et al., 2006; Pomerantz et al., 2002). The current study differs from former research in that we have used a standardized, formative measure to assess student performance at three different points during a school year.
Researchers sought to uncover overall significant differences between boys’ and girls’ scores as well as to examine the pattern of any differences across grade level and time of year. The analyses yielded several significant effects. First, it was found that females scored higher than males in the third grade spring administration, the fall and spring administrations in fifth grade, and across all three administrations in Grades 7 and 8, with the largest differences between the genders being found in eighth grade. Second, a significant main effect for time showed that M-CBM scores changed from each benchmark period to the next. In all grades, for both males and females, the scores from the fall to winter administrations increased. These changes were generally larger in magnitude than the changes seen from the winter to spring administrations. In Grades 3, 4, and 5, the scores continued to increase from the winter to spring administrations, with the largest gains between these two benchmarking periods being in Grades 3 and 4. In Grades 6, 7, and 8, the scores between winter and spring declined significantly, although the magnitudes of these declines were rather modest.
Our first research question explored whether or not gender differences would be found in students’ performance on CBM assessments of mathematical skills. Results provided mixed support for the hypothesis that gender differences would be found in student M-CBM scores. Specifically, females outperformed males in nine of the 18 administration periods examined but did not consistently begin to outperform males until the seventh and eighth grades. That is, in Grades 3 through 6, females scored statistically higher than males in only three of the 12 benchmarks, but in Grades 7 and 8, females outperformed males in all six administration periods. Although these differences were statistically significant, the practical relevance of such differences should be explored in future research.
Although only one measure of mathematical ability was examined in this study, researchers were able to determine the extent to which M-CBM scores in the sample aligned with math scores on the summative, statewide assessment. Consistent with previous research, moderate correlations were found between the two measures, suggesting that M-CBM probes are reliable predictors of end-of-year test performance (r = .52 for the entire sample, r = .51 for females, and r = .53 for males). This relationship highlights the importance of the identified gender differences, given the value in using CBM to predict student performance on high-stakes tests. Interpretation of the found differences may be enhanced by considering how M-CBM scores compare with another measure of mathematical skill.
There have been no other studies to date that have used M-CBM probes to evaluate gender differences in student scores. Previous researchers have, however, identified a female advantage when report cards were used as an assessment tool (Duckworth & Seligman, 2006; Kenney-Benson et al., 2006; Pomerantz et al., 2002). Another group of researchers, Else-Quest et al. (2010) analyzed international data from two standardized assessments of mathematics achievement and found a small gender difference favoring boys in some nations. As this difference was small, however, Else-Quest et al. concluded that the math achievement scores for boys and girls were similar overall across most nations. Comparably, Hyde et al. (2008) found no difference between boys and girls in math achievement after analyzing end-of-grade assessment data across 10 different states for students in Grades 2 through 11.
The pattern of gender differences found in the current study might be explained in part by the findings of Camarata and Woodcock (2006). In examining gender differences in cognitive abilities, these researchers found that girls held a significant advantage over boys in the area of processing speed, a narrow reasoning ability that contributes to one’s overall intelligence score. The female advantage increased with age, peaking in adolescence. As M-CBM probes are timed, students are instructed to work quickly while completing the problems. Thus, any advantage that girls might hold over boys in processing speed could help explain why they scored higher on the M-CBM probes in the present study, particularly in the upper grades. If this is the case, M-CBM scores could potentially misidentify boys as needing math skills interventions and/or qualifying for special education as a result of this processing speed difference.
In addition to cognitive differences between boys and girls, gender differences in classroom behaviors might also help account for the female advantage obtained in this study. Several researchers (e.g., Gibb et al., 2008; Kenney-Benson et al., 2006) have found that teachers tend to describe boys as exhibiting higher levels of behaviors that are likely to have a detrimental impact on classroom performance, such as distractibility and restlessness, although rule compliance has been found to be similar for boys and girls (Adams, Ryan, Ketsetzis, & Keating, 2000). As the M-CBM probes used in this study were administered to large groups of students in their own classrooms, it is possible that distractibility, disengagement, or lack of motivation contributed to the boys’ poorer performance.
Additional studies have found that boys do tend to perform better than girls on standardized assessments of math skills (College Board, 2010; U.S. Department of Education, 2004). This may lead one to expect that boys would do better on the M-CBMs, which are also standardized measures of performance. However, the M-CBM probes, which represent a type of formative assessment, differ from the summative standardized assessments that former researchers have used when evaluating gender differences. First, M-CBMs are administered at multiple points during the school year and in familiar classroom settings. Second, many end-of-grade standardized assessments are presented to students in a multiple-choice format, but the M-CBM probes require students to write in their own answers, minimizing the possibility of getting a correct answer from guesswork. Third, M-CBMs are quite brief and only measure basic math fact skills, whereas an end-of-grade exam and other standardized summative measures of achievement are typically lengthy and evaluate more complex math reasoning abilities.
The second research question investigated the specific grade of onset when gender differences on M-CBM probes may arise. Our results suggest that although inconsistent differences between males and females occurred in Grades 3 through 6, a female advantage was consistently found starting in seventh grade. Thus, although these findings partially support our hypothesis that girls would score higher at each grade level, the results prevent us from formulating a definitive conclusion regarding the onset of the girls’ advantage.
Our second research question also examined the patterns of gender differences across time of year. Results indicated that males and females followed the same overall trends in how their performance changed across benchmarking periods. Specifically, males’ and females’ scores improved from the previous benchmarking period in Grades 3, 4, and 5, with the highest scores occurring in the spring administration. In Grades 6, 7, and 8, scores for both males and females increased from fall to winter but decreased from winter to spring, with the highest scores occurring in the winter administration. The increases found from the fall to winter benchmarking periods were generally larger in magnitude than the changes seen from the winter to spring benchmarking periods.
Limitations and Directions for Future Research
To our knowledge, this has been the first study to evaluate gender differences in performance using M-CBM probes. The results highlight the need to continue investigating this topic to ensure valid decision making in schools. Several limitations of this study are worth considering. The sample population used in the current study consisted of students in Grades 3 through 8. Future studies should determine whether or not significant differences exist in math computation skills below third and above eighth grade. By expanding the population of interest, we can more directly address the age of onset question and better examine gender differences within a broader developmental framework.
Mathematical ability was evaluated using only M-CBM probes, which are limited in that they only measure computational skills. There are, of course, other important components to mathematical ability that may be assessed by other curriculum-based measures. Future studies could examine gender differences on CBM probes that measure other aspects of mathematical ability, such as math reasoning and problem-solving skills.
Earlier findings (Kenney-Benson et al., 2006) suggest that girls tend to report poorer self-efficacy for math and less confidence when taking high-stakes assessments than do boys. Furthermore, Pomerantz et al. (2002) found that girls reported higher levels of distress and worried more often than boys about their academic performance. Thus, it would be wise to include measures of math anxiety and self-efficacy and perhaps general academic worry and self-regulation in future studies of gender differences in math. Doing so would elucidate the contributions that such personal qualities make to gender differences in performance on different types of math measures (e.g., brief CBM vs. lengthy, high-stakes, end-of-grade tests).
Another possible limitation of the current study is that researchers used a cross-sectional design to compare students’ scores across each grade. A longitudinal design would yield data on the same group of students across time. This might be an important direction for future research, as it would provide information on how mathematical ability changes through the course of development, and the pattern of gender differences across a specific group of students could be assessed.
Additional limitations should be noted regarding the characteristics and origin of the data set. Participants attended five schools in a small, rural school district that was predominantly Caucasian. For results to generalize across settings, future research should aim to study more diverse samples of students who better represent the population at large. The secondary data set that was analyzed in this study was collected by school-based practitioners. Although researchers were able to verify that all members of the benchmarking team were rigorously trained to administer and score the measures, it was not possible to calculate interrater reliability on M-CBM scores. Although the nature of the data set is characteristic of what typically occurs in schools and has ecological validity, readers should consider these limitations prior to drawing conclusions.
Implications for Practice
Understanding where gender differences arise in all areas of academic achievement can help teachers establish appropriate instructional strategies for each student. Many school systems across the United States are adopting the RtI process for determining special education eligibility. CBMs are often used to universally screen for students who are at risk of failure in a certain academic area. The current study has found a female advantage on M-CBM probes at certain grades and times, which could mean that boys are more likely to be identified as at risk in this academic area. As noted in this discussion, however, additional research is needed to determine the source of these gender differences so that more substantive recommendations can be made. For example, if future research reveals that these differences result from true differences in math skill levels, then identifying more boys than girls as being at risk and in need of targeted math intervention would be justified. If, however, these differences are found to result from gender-related behavioral, attitudinal, self-regulatory, or motivational factors associated with how M-CBM probes are administered, then it might be appropriate to change the context for administration (e.g., from group administration to individual) or to establish separate M-CBM norms for boys and girls for the purposes of determining at-risk status.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
