Abstract
Gender differences in math-related professional achievements have been identified as a worldwide problem. Academic achievement assessments, however, have repeatedly revealed gender similarities. The observed gender similarity might be due to biased assessments that rely heavily on reading skills, which favor girls. The current study analyzed 29 international and within-country datasets representing a total of 9,471,692 students from 1,456 regions through four typical, large-scale student academic achievement assessments. The results showed a gender difference in mathematics achievements of greater than 0.76 (Cohen's d), favoring boys for each dataset after controlling for general reading achievements. The gender difference in mathematics achievements favoring boys exceeded 0.35 in each region, with a mean of 0.70 for 79 countries or jurisdictions in the 2018 Programme for International Student Assessment (PISA 2018) after controlling for general reading achievements. Dataset- and region-level gender differences are robust, suggesting that there is a clear gender difference in mathematics achievements that previous analyses have not identified because of differences in reading achievements.
Introduction
Gender differences in mathematics performance have long attracted public interest and academic research. Women are underrepresented in professional positions in the fields of science, technology, engineering and mathematics (STEM; Else-Quest et al., 2010; Stockard et al., 2021). However, school-organized academic achievement assessments have repeatedly found only minor gender differences in mathematics achievements or a gender difference favoring girls (for a meta-analysis, see Fryer & Levitt, 2010; Kenney-Benson et al., 2006; for reviews, see Kimball, 1989; Lindberg et al., 2010). Large-scale international or within-country investigations and associated meta-analyses have shown trivial gender differences in this regard (Else-Quest et al., 2010; Hyde, 2005; Hyde et al., 2008; Hyde & Linn, 2006; OECD, 2010). The gender effect size d (calculated as the mean for males minus the mean for females, divided by the pooled within-gender standard deviation) is typically close to zero (d < 0.10) or small (0.11 < d < 0.35). Based on this minor difference, a gender similarity hypothesis that male and female individuals perform similarly in mathematics was proposed (Hyde, 2014; Hyde & Plant, 1995).
Girls’ apparent mathematics skills might be related to their advantage in reading. Reading and language can significantly contribute to mathematics achievements, and numerous studies have demonstrated a close relationship between language processing and mathematics performance (e.g., Dowker et al., 2008; Hecht et al., 2001; Imbo et al., 2014; Koponen et al., 2007; Purpura & Ganley, 2014; Vukovic & Lesaux, 2013; Wei et al., 2012). Performance in arithmetic calculations (e.g., Wei et al., 2012; Yang & Meng, 2016) and word-problem solving (e.g., Fuchs et al., 2018; Fuchs et al., 2020) both rely on reading ability. For example, one study found that for both boys and girls aged 8 to 10 years, reading abilities were significantly positively correlated with arithmetic performance (Wei et al., 2012). Specifically, girls outperformed boys in arithmetic calculations, but this advantage disappeared after controlling for differences in reading ability. Reading ability may promote learners’ understanding of mathematical knowledge (e.g., mathematical terminology, principles, and rules), since the acquisition of this knowledge involves semantic processing (e.g., Li et al., 2019; Liu et al., 2019; Zhou & Zeng, 2022). The verbalized mathematical teaching approach helps students acquire mathematical knowledge (OECD, 2015). Current academic assessments usually rely heavily on rote knowledge that requires language skills, reading abilities, and memory (Reynolds et al., 2015). Hence, if reading differences are controlled for, a gender gap in mathematics achievements may be more apparent, and this finding would be consistent with the gender difference that has been observed at the professional level (Stockard et al., 2021).
The current study explored the possibility of a hidden gender difference in mathematics achievement that is masked by gender differences in language abilities. On the one hand, we agree with the similarity hypothesis with respect to students’ mathematical achievements at school (see review in Lindberg et al., 2010). On the other hand, we proposed that gender differences or gender dissimilarities in mathematics achievements might be more obvious when language components are separated from mathematics components. Similar to the analysis in previous studies (e.g., Hyde et al., 2008; Reynolds et al., 2015), the current study used large-scale international and within-country datasets to investigate gender differences in mathematics achievements at the dataset and region levels (by country or jurisdiction). This study used 29 datasets assessing reading and mathematics achievements through four large-scale student assessments: the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), the Trends in International Mathematics and Science Study (TIMSS), and the Progress in International Reading Literacy Study (PIRLS). Gender differences in mathematics achievements were examined with and without controlling for reading. Students’ reading scores reflect more than their reading ability; they can also relate to students’ socio-economic status (e.g., Berkowitz et al., 2017; Sastry & Pebley, 2010), achievement motivation (e.g., Kriegbaum et al., 2018; Zhang et al., 2018), and intelligence (e.g., Peng et al., 2019; Ritchie & Bates, 2013). Therefore, controlling for reading scores allowed us to identify gender differences in mathematics achievements independently from reading abilities.
Materials
First, to calculate the dataset-level effect sizes of gender differences in mathematics achievements with and without controlling for reading abilities, a total of 29 datasets with both mathematics and reading scores in the same year were analyzed. These datasets represented 1,466 regions (countries or jurisdictions) and 9,471,692 students. The average scores and standard deviations for boys’ and girls’ scores from each region in each dataset were downloaded from the datasets’ respective websites, and the effect size of gender on mathematics achievements was analyzed for each dataset (see Table 1 for sample details).
Table 1. Samples in the current analysis.
Note: The table above presents the sample size of the study's dataset-level analysis, and each region is assessed individually. The number of students included in the mathematics and reading assessments was obtained from the technical reports pertaining to each dataset. For PISA, TIMSS, and PIRLS 2011, individual students completed both the reading and the mathematics assessments; 3,046,165 students are represented in PISA, and 185,475 students are represented in TIMSS and PIRLS 2011. For the NAEP, individual students completed either the reading or the mathematics assessment, and 6,240,052 students are represented. Hence, this analysis included a total of 9,471,692 students.
Second, to analyze the region-level effect size of gender, the scores of each student in PISA 2018 were downloaded. The PISA 2018 dataset includes data for 606,627 individual students from a total of 79 countries or jurisdictions, and it records each student's gender, mathematics score, and reading score.
NAEP
NAEP is the largest national assessment in the United States. It targets students in the fourth, eighth, and 12th grades, and it assesses their understanding in the subject areas of mathematics, reading, science, writing, technology and engineering literacy, arts, music and visual arts, civics, geography, economics, and US history. NAEP uses a balanced incomplete block approach to allow all its items to be completed by a representative sample of students, while individual students only complete a subset of the NAEP items for a single subject area.
There were ten datasets representing fourth-grade students, nine datasets representing eighth-grade students, and two datasets representing 12th-grade students (see Table 1). A total of 6,240,052 students were included in NAEP (National Center for Education Statistics, 1992–2019), with 3,130,914 students taking the mathematical assessments and 3,109,138 students taking the reading assessments.
PISA
PISA is an international assessment administered every 3 years. It tests 15-year-old students’ abilities mainly in reading, mathematics, and science, aiming to evaluate how well students can apply their knowledge and skills to their future lives.
The PISA data comprise seven successive datasets (i.e., PISA 2000, PISA 2003, PISA 2006, PISA 2009, PISA 2012, PISA 2015, and PISA 2018; OECD, 2000–2018) and represent a total of 3,046,165 students. In the current study, data from the PISA 2018 dataset (OECD, 2020), including mathematics and reading scores for individual students, were used to analyze the region-level effect size of gender on mathematics achievements.
TIMSS and PIRLS 2011
TIMSS and PIRLS are both international datasets. TIMSS has been conducted every 4 years since 1995, and it measures the mathematics and science understanding of students in the fourth and eighth grades. PIRLS has been conducted every 5 years since 2001, and it measures fourth-grade students’ reading comprehension. In 2011, 34 countries and three benchmarking entities administered both the TIMSS and PIRLS assessments to the same samples of fourth-grade students, providing a unique opportunity to analyze the relationships between fourth-grade students’ reading and mathematics achievements. The current study separately obtained data from the TIMSS 2011 International Dataset (National Center for Education Statistics, 2011a) and the PIRLS 2011 International Dataset (National Center for Education Statistics, 2011b) and combined them into a dataset representing 185,475 students. We omitted data from Botswana and Honduras since these countries had administered these assessments to sixth-grade students, rather than fourth-grade students. Therefore, the combined dataset (of the TIMSS 2011 and PIRLS 2011 datasets) included samples from 34 countries and one benchmarking entity.
Data analysis
This study focused on gender differences in mathematics achievements with and without controlling for reading. The effect size (Cohen's d; Cohen, 1977) was selected as the index for gender differences.
Dataset-level gender difference
Normality in each dataset was first examined with the Shapiro–Wilk test, because it provides high statistical power regardless of sample size (e.g., Ghasemi & Zahediasl, 2012). For datasets in which the normality assumption was not violated, parametric t-tests were used for the effect size analysis of gender differences, whereas nonparametric Mann–Whitney tests were used for non-normal data. According to the Shapiro–Wilk tests, 16 of the 29 datasets (55.17%) did not violate the normality assumption.
For datasets that did not violate the normality assumption, gender differences were calculated using paired t-tests, since the mean scores of boys and girls for each region can be regarded as correlated variables (see the r values for the Pearson correlation coefficients between the mathematics scores of boys and girls in each dataset in Table 2). We computed the 16 effect sizes of gender on mathematics from the mean scores for boys and girls in each region (country or jurisdiction) using the following formula (Dunlap et al., 1996):
Table 2. Dataset-level population correlations (r) and effect sizes (d) of gender differences in mathematics with and without controlling for reading abilities.
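For reference, the effect size for correlated (paired) designs given by Dunlap et al. (1996) is usually written as below. The symbols follow the note to Table 2: t is the paired t value, r is the correlation between boys' and girls' scores, and N is the number of regions. Reading N as the number of paired regions is an assumption.

```latex
d = t \sqrt{\frac{2\,(1 - r)}{N}}
```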
Note: N = numbers of regions (countries or jurisdictions); r = Pearson correlation coefficient across pairs of mathematics scores for boys and girls in each normal dataset; t = t value of the paired t test; z = z value of the Mann–Whitney test; d = the effect size of gender differences.
PISA assesses 15-year-old students, rather than students in a certain grade.
Gender differences in datasets with non-normal data were examined with the nonparametric Mann–Whitney test.
*p < .05, **p < .01, ***p < .001.
For the 13 datasets with non-normal data, we performed nonparametric Mann–Whitney tests to examine gender differences. We then computed the effect size R using the formula:
To calculate the gender differences and the effect size of gender on mathematics skills after controlling for reading abilities, we first conducted a median regression of mathematics on reading in each dataset, using the reading scores as the predictor and the mathematics scores as the dependent variable. The residual in this regression analysis represents the part of the mathematics score that cannot be explained by reading, and it was used here as the mathematics score after controlling for reading abilities. We then calculated another set of gender differences and corresponding effect sizes using the formulas above.
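A common convention for a Mann–Whitney effect size, and possibly the one intended here, divides the standardized test statistic z by the square root of the number of observations. This is an assumption: the symbol N_obs (the total number of scores entering the test, boys plus girls) is introduced here for illustration, and how R relates to the d values reported in Table 2 is not stated.

```latex
R = \frac{z}{\sqrt{N_{\mathrm{obs}}}}
```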
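The residualization step can be sketched as follows. This is a minimal illustration, assuming a single pooled fit per dataset in which boys' and girls' region means are stacked together (how the regression was set up is not described). The function names and the sample numbers are hypothetical. For a single predictor, a least-absolute-deviations (median regression) line can always be chosen to pass through two data points, so the sketch searches all pairs rather than calling a statistics library.

```python
from itertools import combinations
from statistics import mean


def lad_fit(x, y):
    """Median (least-absolute-deviations) regression of y on x.

    Brute force over all pairs of points: exact for one predictor,
    O(n^3), so only suitable for small samples.
    Returns (intercept, slope).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(abs(yk - (intercept + slope * xk)) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def residualize(reading, math):
    """Return the part of each mathematics score not explained by reading."""
    intercept, slope = lad_fit(reading, math)
    return [m - (intercept + slope * r) for r, m in zip(reading, math)]


# Hypothetical region means (not from the study): three boys' rows, then three girls' rows.
reading = [470.0, 480.0, 490.0, 500.0, 510.0, 520.0]
math = [500.0, 510.0, 520.0, 495.0, 505.0, 515.0]
is_boy = [True, True, True, False, False, False]

resid = residualize(reading, math)
boys = [r for r, b in zip(resid, is_boy) if b]
girls = [r for r, b in zip(resid, is_boy) if not b]
gap = mean(boys) - mean(girls)  # residual mathematics gap, boys minus girls
```

The residuals, not the raw mathematics scores, would then be passed to the same test and effect size calculation used for the uncontrolled comparison.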
Region-level gender difference in PISA 2018
The analysis focused on the most recent large-scale international dataset, PISA 2018. Region-level gender differences were analyzed using individual students’ data, rather than individual regions’ data. PISA 2018 estimates each student's performance with ten plausible values rather than a single score. We averaged the ten plausible values for each student's abilities in both reading and mathematics. The average scores were used as the final estimated reading and mathematics scores for each student.
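The averaging step can be sketched as below. This is an illustration only: it assumes each student record is a dictionary whose plausible values are stored under keys such as PV1MATH to PV10MATH and PV1READ to PV10READ, and the helper name and sample record are hypothetical.

```python
from statistics import mean


def average_plausible_values(student, domain, n_values=10):
    """Average a student's plausible values for one domain (e.g. 'MATH' or 'READ').

    Assumes keys of the form 'PV<i><domain>' for i = 1..n_values.
    """
    return mean(student[f"PV{i}{domain}"] for i in range(1, n_values + 1))


# Hypothetical student record (values are made up for illustration).
student = {}
for i in range(1, 11):
    student[f"PV{i}MATH"] = 500.0 + i
    student[f"PV{i}READ"] = 480.0 + i

math_score = average_plausible_values(student, "MATH")
read_score = average_plausible_values(student, "READ")
```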
Since the PISA 2018 dataset has a large sample size, we examined gender differences with parametric tests (e.g., Ghasemi & Zahediasl, 2012), and we focused on the effect size rather than on the significance of the p values. To analyze the region-level gender differences, we calculated the effect size of gender (d) according to the method described by Cohen (1977). This effect size was the mean for boys minus the mean for girls, divided by the pooled within-groups standard deviation as per the following equation:
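Written out from the verbal definition above, the effect size takes the following form. M, SD, and n denote the mean, standard deviation, and number of students in each group. The sample-size-weighted form of the pooled standard deviation is an assumption; with equal group sizes it reduces to the square root of the average of the two variances.

```latex
d = \frac{M_{\mathrm{boys}} - M_{\mathrm{girls}}}{SD_{\mathrm{pooled}}},
\qquad
SD_{\mathrm{pooled}} = \sqrt{\frac{(n_{\mathrm{boys}} - 1)\,SD_{\mathrm{boys}}^{2} + (n_{\mathrm{girls}} - 1)\,SD_{\mathrm{girls}}^{2}}{n_{\mathrm{boys}} + n_{\mathrm{girls}} - 2}}
```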
Results
The 29 datasets collected from NAEP, PISA, and TIMSS and PIRLS 2011 were first analyzed to calculate gender differences in mathematics achievements at the dataset level, both with and without controlling for the impact of reading abilities; each country or jurisdiction was regarded as an individual sample. The mean dataset-level effect size of gender differences (d) on students’ mathematics scores without controlling for reading abilities was 0.22 (SD = 0.17), ranging from −0.13 to 0.57. After controlling for reading abilities, the mean effect size was 1.61 (SD = 0.35), ranging from 0.76 to 2.22, favoring boys (see Figure 1, Table 2).

Figure 1. The dataset-level effect sizes of gender differences in mathematics in NAEP, PISA, and TIMSS and PIRLS with and without controlling for reading abilities.
We next focused on the PISA 2018 dataset, exploring the region-level effect size of gender. Here, each student was regarded as an individual sample. In analyzing region-level gender differences, we examined the presence of gender differences in each country or jurisdiction both with and without controlling for reading abilities. The mean gender effect size for reading was −0.34 (SD = 0.12), ranging from −0.73 to −0.10 (all p values < .001). The mean original gender effect size in mathematics was 0.04 (SD = 0.11), ranging from −0.26 to 0.30 (56 of 79 p values < .05). After controlling for the impact of reading abilities, the mean effect size of gender on mathematics was 0.70 (SD = 0.14), ranging from 0.35 to 1.04 (all p values < .001). Figure 2 shows the effect size distributions of the reading scores, original mathematics scores (not controlled for reading abilities), and mathematics scores after controlling for reading abilities for all countries or jurisdictions in PISA 2018.

Figure 2. The region-level effects of gender differences in reading scores, original mathematics scores (without controlling for reading abilities), and mathematics scores after controlling for reading abilities for all regions in PISA 2018.
Discussion
The aim of this investigation was to determine whether there is a gender difference in mathematics achievements with and without controlling for reading abilities. We hypothesized that our data would reveal a robust gender difference in students’ mathematics achievements after controlling for their reading abilities. The results showed that gender differences in mathematics were enlarged after controlling for reading abilities (see Table 2 and Figure 1 for dataset-level gender differences; see Figure 2 for region-level gender differences), and this difference was consistently evident in our results across datasets and grades.
Our analysis revealed a small original gender difference in students’ mathematics achievements, and this finding is consistent with previous studies that have observed either no gender difference or a small gender difference in this regard (e.g., Hyde et al., 2008; Stoet & Geary, 2013). Previous studies have also observed a salient gender effect among high performers (e.g., Hyde et al., 2008; Stoet & Geary, 2013), and the PISA results also indicated that the gender gap in mathematics is much wider among top-performing students than among low-performing students (OECD, 2015). Interestingly, top-performing students have exhibited the smallest gender difference in reading abilities (Stoet & Geary, 2013). Therefore, we can reasonably infer that the narrower gender gap in reading abilities contributes to the wider gender gap in mathematics achievements among top-performing students.
These results indicate that gender differences in students’ mathematics achievements are masked by differences in reading abilities. Previous studies on gender differences have typically found that girls perform better than boys in reading (e.g., Breda et al., 2018; Breda & Napp, 2019; Guiso et al., 2008), but only minor gender differences have been observed for in-school mathematics performance (e.g., Fryer & Levitt, 2010; Robinson-Cimpian et al., 2014). Women's underrepresentation in math-related professional careers has typically been believed to result from social inequalities (cultural or economic inequalities). Although only reading scores were controlled for in the current study, other non-verbal factors that are related to both reading and mathematics abilities may also have been controlled for, such as socio-economic status (e.g., Berkowitz et al., 2017; Sastry & Pebley, 2010), achievement motivation (e.g., Kriegbaum et al., 2018; Zhang et al., 2018), and intelligence (e.g., Peng et al., 2019; Ritchie & Bates, 2013). Thus, we were able to isolate gender differences in mathematics abilities more precisely. Our results suggest that girls and boys perform similarly in mathematics; however, the underlying mechanisms explaining their performance might differ. These results help explain women and girls’ underrepresentation in STEM-related fields in a new way, suggesting that mathematics abilities that are independent of reading abilities may contribute to this underrepresentation, though women and girls can leverage their reading advantages to promote their mathematics performance.
Implications of this study's findings
The gender gap revealed in the current study may lead to stereotype threat for girls in the field of mathematical learning. Stereotype threat can negatively affect children's learning (e.g., Appel & Kronberger, 2012; Keller, 2007; Rydell et al., 2010), suggesting that we should avoid exaggerating gender differences in mathematics achievements. From another point of view, our results suggest that gender differences in mathematics achievements can be reduced through certain policies or educational approaches. First, educators should pay attention to both the language component and the symbolic component of mathematics education. The development of mathematical abilities might be promoted by improving reading abilities, and verbalized approaches to teaching mathematics should be emphasized in mathematical education. For instance, boys and girls who perform poorly in mathematics at school may benefit from a language-supported, verbalized teaching approach that focuses on mathematics knowledge and mathematics vocabulary. Second, the observed gender similarity in mathematical performance should be emphasized, and social equality is required in educational opportunities and social resource allocation.
Limitations and future research directions
The current study faced some notable limitations. First, although it used data from large-scale student academic achievement assessments across years and grades, paired reading and mathematics scores were collected from comprehensive tests, and the causal link between the gender gap in reading abilities and mathematics achievements should be interpreted cautiously. Further studies could design mathematics assessment materials with different amounts of verbal components and examine gender differences in mathematics tests using different amounts of verbal content. Second, due to the current study's design, this study did not incorporate cognitive covariates. In further studies, researchers could consider using a longitudinal experimental design with covariates (e.g., intelligence, socio-economic status, and achievement motivation) to explore whether mathematics achievements reflect a gender gap. Additionally, further studies could use mathematics and reading tests that are comparable across grades to examine gender differences across grades.
Conclusion
Overall, our analysis contradicted the previously reported gender similarity in school academic mathematics assessments. When correcting for differences in reading abilities, we found an obvious and robust gender difference in students’ mathematics achievements. Further studies should investigate the underlying mechanism that explains the gender differences that we have observed.
Supplemental Material
Supplemental material, sj-docx-1-spi-10.1177_01430343221149689 for Assessing gender difference in mathematics achievement by Yujie Lu, Xuan Zhang and Xinlin Zhou in School Psychology International
Footnotes
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics statement
Ethical approval was not sought for the present study, because it was based on the reuse of anonymous open data from large-scale student assessments: the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), the Trends in International Mathematics and Science Study (TIMSS), and the Progress in International Reading Literacy Study (PIRLS).
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
