Abstract
Executive function (EF) training has shown promise for remedying general EF deficiencies faced by students with mathematics difficulty (MD) and for improving their performance. However, recent research also suggests that the instant and sustained effects of EF training remain inconsistent. In this study, 32 Chinese students with MD, age 7 to 10 years, were recruited and randomly divided into two groups: the training group (n = 16, 25 training sessions) and the control group (n = 16). Both groups took a pretest, a posttest, and a follow-up test (after 6 months) on EF, fluid intelligence, and mathematics skills. In the posttest, the training group’s performance significantly improved in 2-back, number shifting, letter shifting, calculation fluency, and mathematics problem-solving tasks, but not in Stroop, Flanker, 1-back, numerical operations, and colored progressive matrices tasks. In the follow-up test after 6 months, only the training effects on the 2-back and letter shifting tasks were sustained. An effect on the numerical operations task emerged at follow-up; however, the effects on number shifting, calculation fluency, and mathematics problem-solving tasks disappeared. The results of this study show that EF training has instant effects of improving EF and mathematics skills of students with MD, and 6-month sustained effects on some of the improved skills. However, for fluid intelligence, the effects may be very limited.
A specific learning disability is defined in the Diagnostic and Statistical Manual of Mental Disorders (5th edition; DSM-5; American Psychiatric Association, 2013) as a neurodevelopmental disorder characterized by persistent difficulties in learning and using academic skills, including reading, written expression, and mathematics. Only 3%–8% of school-age students have been diagnosed with a specific learning disability in mathematics (Devine et al., 2013; Geary, 2004). However, many more students may experience mathematics difficulty (MD) without an official disability diagnosis (Powell & Driver, 2015; Powell et al., 2020); that is, not all students with MD would meet a DSM-5 clinical diagnosis as having a specific learning disability (Tannock, 2014). Learning difficulty can be described as learning deficits or general academic problems in reading, writing, and mathematics (Andersson, 2010). A student with a learning difficulty tends to perform below typically developing peers. Mathematics difficulty is a common type of learning difficulty (Jordan et al., 2003). It refers to lower performance in mathematics that is not explained by an official disability diagnosis and not caused by motivational or intellectual problems (Mazzocco, 2007; Nelson & Powell, 2018; Powell et al., 2020; Zhang et al., 2018). In this study, we use the term MD to describe students who score significantly below (< 25th percentile) the grade-level average on mathematics achievement tests in the absence of a specific disability diagnosis. According to compulsory education laws and inclusive education practices in China, students with MD are enrolled in general education classrooms (Deng & Poon-McBrayer, 2012). However, students with MD are not provided with individualized education in regular schools in China (Ministry of Education of P.R. China, 2014; Wang & Mu, 2014). There are few intervention services for the individual learning needs of school-age students with MD. Therefore, it is crucial to develop effective interventions for students with or at risk of MD.
Deficits in executive function (EF) are often identified as targets of intervention to help students who struggle with mathematics (Andersson & Lyxell, 2007; Bull & Scerif, 2001; Deng et al., 2022; Peng et al., 2012, 2018; van der Sluis et al., 2004). EF refers to a collection of top-down control processes that help suppress interference, maintain information, and switch rules to accomplish a task or goal in a more flexible and optimized manner (Diamond, 2013). Miyake et al. (2000) divided EF into three sub-components: inhibition, updating, and shifting. First, inhibition is the ability to suppress irrelevant information and unsuitable dominant or automated responses; second, updating is the ability to update, retain, and manipulate information in the working memory (WM); and finally, shifting is the ability to flexibly switch between different tasks, operations, or mental sets. Executive functions are pliable and can be improved through training (Zelazo & Carlson, 2012; Zhang et al., 2018). Therefore, researchers have studied EF training effects on students with MD, mainly focusing on whether training can improve their EF and other related cognitive abilities.
Executive Function Training Effects
There is some evidence indicating that EF training has effects on students’ EF. This evidence is gathered from studying not only typically developing students (Johann & Karbach, 2020; Karbach & Kray, 2009; Thorell et al., 2009; Traverso et al., 2015; Witt, 2011) but also students with learning difficulty or a specific learning disability (Abo-Ras et al., 2018; Alloway & Alloway, 2009; Chen et al., 2017; Lotfi et al., 2020; Malekpour & Aghababaei, 2013; Peng & Fuchs, 2015; Wu et al., 2019; Zhang et al., 2018). Evidence also suggests that EF training can improve other cognitive functions closely related to EF, such as fluid intelligence (Au et al., 2015; Chen et al., 2017; Jaeggi et al., 2011; Karbach & Kray, 2009; Pugin et al., 2014; Wang et al., 2019; Zhao et al., 2011) and mathematics skills (Alloway & Alloway, 2009; Holmes & Gathercole, 2013; Kroesbergen et al., 2012; Layes et al., 2017; Wu et al., 2019; Zhang et al., 2018).
Effects on fluid intelligence
Fluid intelligence refers to the ability to solve novel and abstract problems using logical reasoning without relying on existing knowledge and experience (Carpenter et al., 1990; Van Aken et al., 2016). In the EF model established by Diamond (2013), fluid intelligence is a complete synonym for reasoning and problem-solving, two high-level EF components. As a comprehensive cognitive process involving information maintenance, anti-interference, and strategy transformation, EF is considered the basis of fluid intelligence (Blair, 2006; Colom et al., 2006). Research has shown that EF and fluid intelligence are moderately to highly correlated with WM, with updating as the most profound indicator (Engle et al., 1999; Friedman et al., 2006; Mogle et al., 2008; Van Aken et al., 2016). Some studies have further explored the mechanism underlying the potential relationship between WM and fluid intelligence. Short-term memory storage (Colom et al., 2008) and secondary memory processes, such as retrieval and search (Mogle et al., 2008), have been found to be the driving forces behind the relationship.
The effect of EF training on fluid intelligence has been investigated in many studies. However, the results have been inconclusive. Several studies reveal that fluid intelligence and EF may be improved simultaneously with EF training for typically developing students (Au et al., 2015; Jaeggi et al., 2011; Karbach & Kray, 2009; Pugin et al., 2014; Zhao et al., 2011), low-achieving students (Wang et al., 2019), and students with a learning disability (Chen et al., 2017). For example, Chen et al. (2017) adopted a computerized WM updating training for 20 days, 45 min every day, among 27 students with a learning disability ages 9 to 11 years. After training, they found that fluid intelligence improved significantly in the training group. However, some studies failed to find any improvement in fluid intelligence even after several months of EF training (Ang et al., 2015; Holmes et al., 2009; Layes et al., 2017; Malekpour & Aghababaei, 2013). For instance, in a study by Ang et al. (2015), two groups of children ages 6 to 7 years with low WM and mathematics performance received Cogmed training and updating training, respectively, and the two training groups showed no significant improvement in fluid intelligence. The fact that previous studies have revealed inconsistent conclusions about EF training effects on fluid intelligence may be due to different focuses of training, such as WM (e.g., Chen et al., 2017), EF (e.g., Wang et al., 2019), and task-switching (e.g., Karbach & Kray, 2009). The studies also included different types of participants, such as typically developing students (e.g., Au et al., 2015), low-achieving students (e.g., Wang et al., 2019), and students with a learning disability (e.g., Malekpour & Aghababaei, 2013). However, few studies have examined the EF training effects on fluid intelligence for students with MD.
Effects on mathematics skills
Mathematics skills depend on the learning of mathematics knowledge as well as the development of general cognitive ability, including EF (Bull et al., 2008; Cragg et al., 2017; Toll et al., 2011). Calculation and mathematics problem-solving are important components of mathematics skills (Fritz et al., 2019; Huang & Chen, 2016). Research has found that students with MD lag behind typically developing students in calculation and mathematics problem-solving (Tolar et al., 2016; Wei et al., 2013; Zhang et al., 2019). Calculation requires short-term numerical storage and arithmetical processing of numerical input, followed by numerical output (Brainerd, 1983). Mathematics problem-solving involves complex processes, such as cognitive characterization, formulation planning, and algorithm selection (Fritz et al., 2019). These processes rely on EF. Specifically, inhibition ability is needed to suppress (a) the automatic activation of incorrectly calculated or unsuitable dominant responses and (b) the interference of irrelevant information, to find suitable and effective strategies (Passolunghi & Siegel, 2001; Swanson & Beebe-Frankenberger, 2004). Updating ability is needed to filter, maintain, and manipulate numbers, operators, and number facts; interpret problem statements; generate problem representations; and retrieve mathematical facts (Wang et al., 2019; Wiley & Jarosz, 2012). Shifting ability is needed to flexibly switch between different tasks and strategies to adapt to the current problem-solving needs by changing the calculation rules between different problems or choosing multiple methods to solve the same calculation problem (Wen et al., 2007; Yeniad et al., 2013).
Despite debate within the literature, there is some evidence that EF training can improve mathematics skills (Alloway & Alloway, 2009; Holmes & Gathercole, 2013; Kroesbergen et al., 2012; Layes et al., 2017; Wu et al., 2019; Zhang et al., 2018). For example, Alloway and Alloway (2009) implemented an 8-week WM training for eight students with learning difficulties, age 13 years. Post training, the students showed a significant improvement in WM tasks and mathematics performance. However, some studies failed to find any improvement in mathematics skills after EF training (Alloway et al., 2013; Ang et al., 2015; Dunning et al., 2013; Gray et al., 2012; Nelwan & Kroesbergen, 2016; Wang et al., 2019). For instance, Wang et al. (2019) adopted WM and inhibitory control tasks for EF training for two groups of low-achieving 7-year-old children. After training, there were no significant training effects on mathematics skills for either the WM training group or the inhibitory control training group.
The inconsistent results of EF training on mathematics skills might be explained by the fact that mathematics is a multi-dimensional structure, including calculation, mathematics problem-solving, and other mathematics skills (Geary, 2006). The effect paths for different EF components may vary depending on the targeted mathematics skills (Cragg & Gilmore, 2014; Guo et al., 2018). According to the hierarchical framework of mathematics by Cragg et al. (2017), EF could influence mathematics achievement through mathematics competencies in specific domains, such as factual knowledge, procedural skills, and conceptual understanding. They suggested that varied effect paths of different EF components exist. For example, updating had a direct effect on mathematics and an indirect effect on mathematics through factual knowledge, procedural skills, and conceptual understanding. In contrast, inhibition had an indirect effect on mathematics through factual knowledge and procedural skills only. Shifting, however, had neither a direct nor an indirect effect on mathematics. Simanowski and Krajewski’s (2019) structural equation model indicated that updating significantly affects number word sequence and promotes the learning of rote digital material; inhibition and switching significantly affect number word linkages and promote the deep understanding of numbers. These results indicate that different components of EF contribute to mathematics achievement through different domains of competencies and different numerical abilities. However, most studies have discussed EF training effects on mathematics performance generally or on a holistic level (e.g., the effects on standardized mathematics tests, Alloway et al., 2013; the effects on mathematics academic examinations, Chen et al., 2017). Few studies have discussed the effects of training on specific mathematics skills (e.g., numeracy skills, Kroesbergen et al., 2012) or compared the effects of training on different mathematics skills.
In this study, we provided training on three components of EF, namely, inhibition, updating, and shifting. According to the hierarchical framework of mathematics proposed by Cragg et al. (2017), improving these three components may promote the development of children’s factual knowledge, procedural skills, and conceptual understanding, which in turn might improve their performance on different mathematics skills, such as calculation and mathematics problem-solving. Furthermore, considering that the improvement of mathematics performance as a result of EF training does not necessarily appear immediately and may take some time to manifest (Gunzenhauser & Nückles, 2021; Holmes et al., 2009), this study also focuses on the sustained effects of training.
Sustained Effects of EF Training
Sustained effects of EF training refer to whether training can result in long-term benefits for individuals, and they are important measures of the effectiveness of EF training. Sustained effects of EF training are supported by some studies (Alloway et al., 2013; Ang et al., 2015; Chen et al., 2017; Holmes et al., 2009). For example, Alloway et al. (2013) implemented computerized WM training for 23 students with learning difficulties, age 11 years. The result showed that the training group improved on untrained tasks of WM, vocabulary, and spelling. They administered a follow-up test to all the students who completed all the training sessions and found that the training effects on WM and vocabulary tasks remained after 8 months. Chen et al. (2017) implemented WM updating training for children with learning disabilities, ages 9 to 11 years. They found that the effects on updating ability and fluid intelligence can be sustained for at least 6 months. However, some studies have yielded inconsistent results. For instance, in the study by Ang et al. (2015), 6- to 7-year-old children with poor WM and mathematics performances showed a significant improvement in WM in the instant posttest after WM training, but the improvement disappeared in the follow-up test 6 months later. As the sustained effect of training is crucial for the development of the future cognitive functions of students, based on these findings, there is a need for further research to determine the sustained effects of EF training.
Purpose of the Present Study
The literature suggests that EF training may improve inhibition, updating, and shifting ability, as well as fluid intelligence, and mathematics skills of students with MD, and these training effects may be sustained after a few months. However, there is still a great controversy about the effects of EF training on general cognitive ability and academic performance. In particular, the instant and sustained effects of EF training are still unclear. Among other limitations, there is inadequate literature on the effects of EF training on students with learning difficulties, especially students with MD. In addition, EF training mainly focuses on updating and rarely involves inhibition and shifting training. However, fluid intelligence and mathematics skills tasks usually do not rely on a single EF component but require interactions between different components. Moreover, most studies measure mathematics skills on a holistic level without attending to specific skills. In view of the large differences between Chinese and Western students in mathematics learning (Cai, 2000; Kang & Liu, 2018; Tang et al., 2006), and the fact that there have been few EF intervention programs for students with MD in China, it is important to collect evidence of effectiveness for EF training in the context of Chinese culture.
In this study, an EF training program focusing on inhibition, updating, and shifting was designed for Chinese students with MD. To determine the various effects of the training, the students’ EF, fluid intelligence, and different mathematics skills were measured before, immediately after, and 6 months after the training. Two research questions guided this study:
RQ1. In the Chinese context, how effective is EF training on the EF, fluid intelligence, and different mathematics skills of students with MD?
RQ2. Can the EF training effects be sustained after 6 months?
Method
Participants
Participants were recruited from 1,100 Grades 2–4 students in 24 classes in one public elementary school in Shanghai, China. Two classes were randomly selected for each grade, and 271 students from six classes participated in the preliminary screening. All the students spoke Mandarin as their mother tongue and came from an area with a per capita disposable income of 17,022 yuan (about US$2,657); furthermore, according to their teachers’ report, they had no visual or sensory impairment, emotional or behavioral disorders, social or cultural adaptation problems, or other physical disorders.
Scores from the math standard achievement test (MSAT; Dong & Lin, 2011) and three recent mathematics mid-term and final tests were used as the screening criteria (Zhang et al., 2020). Students whose standardized scores fell below the 25th percentile on all the tests were selected (n = 54). The MSAT included 25 single-choice items and two short-answer items; for example, “If the perimeter of a square ground is 24 meters, how many square tiles with an area of 1 square meter could just cover this square ground?” Participants were asked to solve these questions without a time limit. The final score was the total number of problems correctly solved (maximum score = 32). Cronbach’s alpha was .82 to .85 for different grade levels. In line with the definition of MD (Deng et al., 2022), students whose poor mathematics scores could be attributed to learning motivation or intelligence were excluded. The exclusion criteria were as follows: (a) Students whose scores were more than two standard deviations below the mean score of the motivation adaption assessment test (MAAT; Zhou, 1991) were excluded (n = 11). (b) Students whose scores were more than two standard deviations below the mean score of the Raven’s standard progressive matrices (SPM; Zhang & Wang, 1989) were also excluded (n = 6). Figure 1 shows the screening process.

Figure 1. CONSORT flowchart of participants.
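The screening rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the record layout (`id`, `math`, `maat`, `spm`), the function names, the use of within-cohort percentile ranks, and the computation of both exclusion cutoffs from the whole screened cohort are all assumptions made for the example.

```python
from statistics import mean, stdev


def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that are strictly below score."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def lower_cutoff(scores, sd_limit=2.0):
    """Score lying sd_limit sample standard deviations below the mean."""
    return mean(scores) - sd_limit * stdev(scores)


def screen_for_md(students, cutoff_percentile=25.0, sd_limit=2.0):
    """Return ids of students who are low on every math test and not excluded.

    Each student is a dict with hypothetical keys: "id", "math" (list of
    test scores), "maat" (motivation score), and "spm" (Raven's SPM score).
    """
    n_tests = len(students[0]["math"])
    per_test = [[s["math"][i] for s in students] for i in range(n_tests)]
    maat_cut = lower_cutoff([s["maat"] for s in students], sd_limit)
    spm_cut = lower_cutoff([s["spm"] for s in students], sd_limit)

    selected = []
    for s in students:
        # Inclusion: below the percentile cutoff on every mathematics test.
        low_on_all = all(
            percentile_rank(s["math"][i], per_test[i]) < cutoff_percentile
            for i in range(n_tests)
        )
        # Exclusion: motivation or intelligence score more than
        # sd_limit standard deviations below the mean.
        if low_on_all and s["maat"] >= maat_cut and s["spm"] >= spm_cut:
            selected.append(s["id"])
    return selected
```

Requiring a low score on every test, rather than on any single test, makes the selection conservative, which matches the "in all the tests" wording of the criterion.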
After screening, 37 students with MD were recruited and randomly divided into two groups: training group (n = 18, age = 8.39 ± 0.82 years, 7 girls) and control group (n = 19, age = 8.63 ± 0.84 years, 7 girls). No significant difference in age (t = .175, p = .862) was observed between the two groups. However, of the original 37 students, five students failed to complete the follow-up test after 6 months. The final sample consisted of 32 students (training group: n = 16, age = 8.30 ± 0.78 years, 6 girls; control group: n = 16, age = 8.48 ± 0.86 years, 4 girls). Table 1 provides an overview of the demographic characteristics of the participants. The experiment was approved by the Ethics Committee of the university. Parental permission and student consent were obtained prior to the study.
Table 1. Demographic Characteristics of Participants in the EF Training Study.
Note. EF = executive functioning.
Measures
Pre-, post-, and follow-up tests of EF
Inhibition
Inhibition was assessed using the Stroop and Flanker tasks. The Stroop task was adopted from the Cognitive Assessment System-2 (CAS-2; Naglieri et al., 2014). Participants were asked to name the ink color of color words printed in an incongruent color (e.g., the word “red” printed in blue ink). The stimuli were the color words green, yellow, blue, and red, presented in an 8 × 5 arrangement. When the instructor said “Begin,” participants named the color of the ink as quickly as possible. The experiment consisted of 40 trials, after which accuracy was recorded.
In the Flanker task, the number “5” was selected as the target stimulus, whereas other numbers were selected as distractors (Fan et al., 2002). The experiment materials were single-digit numbers (e.g., 4, 5, 6) or three-digit numbers (e.g., 656, 535). Participants were required to press the “A” key when the number “5” was in the middle position and the “L” key otherwise. Each number stimulus was displayed on the screen for 500 ms, and the stimulus interval was 2,500 ms. This experiment consisted of 96 trials, after which accuracy was recorded.
Updating
Updating was assessed using the N-back task. The N-back task was adopted from Kirchner (1958), and numbers were used as stimuli. In the 1-back task, participants judged whether the current number was the same as the immediately preceding number; they pressed the “A” key when it was and the “L” key otherwise. In the 2-back task, participants judged whether the current number was the same as the number presented two positions earlier; they pressed the “A” key when it was and the “L” key otherwise. The first stimulus was displayed on the screen for 3,650 ms, and the next stimulus was displayed for 3,150 ms. Participants were instructed to press the key within 500 ms, or a missed response was recorded. Participants completed eight practice trials to familiarize themselves with the task. The experiment consisted of 24 trials for the 1-back task and 24 trials for the 2-back task; in each task, 12 trials were targets and 12 were non-targets. Accuracy was then recorded for the 1- and 2-back tasks.
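The target rule and accuracy score of the N-back task can be illustrated with a short Python sketch. It is a hedged illustration only: the function names are hypothetical, and treating the first n trials (which have no earlier stimulus to compare against) as non-targets, and a missing response (`None`) as an error, are assumptions of the example rather than details reported in the text.

```python
def nback_targets(stimuli, n):
    """Mark each position True when the stimulus equals the one n positions earlier.

    Positions with no stimulus n steps back are treated as non-targets
    (an assumption of this sketch).
    """
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def nback_accuracy(stimuli, responses, n, target_key="A", nontarget_key="L"):
    """Proportion of trials answered with the correct key.

    A response of None (no key press in time) never matches and so
    counts as incorrect.
    """
    targets = nback_targets(stimuli, n)
    correct = sum(
        1
        for is_target, key in zip(targets, responses)
        if key == (target_key if is_target else nontarget_key)
    )
    return correct / len(stimuli)
```

For example, with the stimuli 3, 3, 5, 3, 5, only the second position is a 1-back target, whereas the fourth and fifth positions are 2-back targets.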
Shifting
Shifting was assessed using the number shifting and letter shifting tasks. The number shifting task was adopted from Li et al. (2016) and used pairs of one-digit numbers, presented in red or in green, as stimuli. When a red pair of numbers was presented, participants were required to press the “A” key when the number on the left was greater and the “L” key when the number on the right was greater. When a green pair of numbers was presented, participants were required to press the “A” key when the number on the left was smaller and the “L” key when the number on the right was smaller. Each number stimulus was displayed on the screen for 1,000 ms, and the stimulus interval was 3,650 ms. The experiment consisted of 48 trials, after which accuracy was recorded.
The letter shifting task was adopted from Li et al. (2016) and used a letter in red or in green as the stimulus. If a red letter was presented, participants were required to press the “A” key when the letter was a capital letter and the “L” key otherwise. If a green letter was presented, participants were required to press the “A” key when the letter was not a capital letter and the “L” key otherwise. Each letter stimulus was displayed on the screen for 1,000 ms, and the stimulus interval was 3,650 ms. The experiment consisted of 48 trials, after which accuracy was recorded.
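The color-dependent response rules of the two shifting tasks can be written out as a small Python sketch. The function names and the string encoding of colors are hypothetical, and the sketch assumes the two numbers in a pair always differ, which the task description implies but does not state.

```python
def number_shifting_key(left, right, color):
    """Correct key for a pair of numbers shown in red or green."""
    if left == right:
        raise ValueError("the two numbers are assumed to differ")
    if color == "red":
        # Red rule: "A" when the left number is greater, otherwise "L".
        return "A" if left > right else "L"
    if color == "green":
        # Green rule: "A" when the left number is smaller, otherwise "L".
        return "A" if left < right else "L"
    raise ValueError("color must be 'red' or 'green'")


def letter_shifting_key(letter, color):
    """Correct key for a single letter shown in red or green."""
    if color == "red":
        # Red rule: "A" for a capital letter, otherwise "L".
        return "A" if letter.isupper() else "L"
    if color == "green":
        # Green rule: "A" for a non-capital letter, otherwise "L".
        return "A" if not letter.isupper() else "L"
    raise ValueError("color must be 'red' or 'green'")
```

In both tasks the same stimulus maps to opposite keys under the two colors, so a correct response requires applying the rule cued by the current color rather than repeating the previous one.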
Pre-, post-, and follow-up tests of fluid intelligence
Fluid intelligence was assessed using Raven’s colored progressive matrices (CPM; Raven, 1947). The CPM consisted of 36 items. Each item contained a figure with a missing piece. Below the figure, six pieces were presented, only one of which correctly completed the figure. The participants were instructed on successive trials to point to the piece that best completed the pattern. The score was the total number of correct responses (maximum score = 36).
Pre-, post-, and follow-up tests of mathematics skills
Mathematics skills were measured using three tasks: numerical operations, calculation fluency, and math problem-solving tasks. The numerical operations task was adapted from the Wechsler Individual Achievement Test–Third Edition (WIAT-III; Wechsler, 2009). Participants answered as many questions as they could without a time limit. The score was the total number of correct responses (maximum score = 40). Cronbach’s alpha for the current sample was .86 to .92 for different grade levels.
The calculation fluency task from the WIAT-III assessed the speed and accuracy of participants’ single- and multi-digit calculations. It included three subtests: addition, subtraction, and multiplication. The addition and subtraction subtests had 48 items each, and the multiplication subtest had 40 items. Participants were asked to solve as many items as possible within a 1-min time limit. The total score was the sum of the correct answers obtained from the addition, subtraction, and multiplication subtests (maximum score = 136). The split-half reliability coefficient for the current sample was .90.
The math problem-solving task was also adopted from WIAT-III, and participants solved as many mathematics problems as they could without a time limit. For example, participants were presented with a picture with four sticks (one marble on the first stick, two marbles on the second stick, three marbles on the third stick; however, the fourth stick was empty). Then, they were asked a question: “Observe the first three sticks. How many marbles should be placed on the fourth stick?” The problem was first read aloud by the instructor, and the answer was given by the participants. The test was terminated in the event of four consecutive errors. The score was the total number of problems correctly solved (maximum score = 72). Cronbach’s alpha for the current sample was .86.
Training tasks
Inhibition training task
Referring to the classic pattern of the inhibition task, 75 sets of number pairs were presented on an A4 size paper. Difficulty was manipulated by varying the size of the item (neutral, congruent, and incongruent). For the neutral condition, number pairs of the same printed size were presented. For the congruent condition, the actual size of the number was the same as the printed size; that is, if the actual number was greater, the printed size of the number was also greater. For the incongruent condition, the actual size of the number was inconsistent with the printed size; that is, if the actual number was greater, the printed size of the number was smaller. Participants were required to ignore the printed size and report whether the number on the right was greater or smaller than that on the left. The total score was the number of correct answers obtained from oral reports within a 1-min time limit.
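The three conditions of the inhibition training task can be expressed as a short Python sketch. The function names and the representation of printed size as a numeric font size are assumptions made for illustration, and the sketch assumes the two numbers in a pair always differ in value.

```python
def classify_pair(left_value, right_value, left_print, right_print):
    """Return 'neutral', 'congruent', or 'incongruent' for one number pair.

    left_print and right_print are the printed sizes (for example, font
    sizes in points); this numeric encoding is hypothetical.
    """
    if left_print == right_print:
        return "neutral"
    value_larger_on_left = left_value > right_value
    print_larger_on_left = left_print > right_print
    # Congruent when the numerically greater number is also printed larger.
    if value_larger_on_left == print_larger_on_left:
        return "congruent"
    return "incongruent"


def correct_report(left_value, right_value):
    """Expected oral report: is the right number greater or smaller than the left?"""
    return "greater" if right_value > left_value else "smaller"
```

Only the numerical values determine the correct report; the printed sizes matter solely for deciding how much interference a pair is expected to produce.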
Updating training task
Referring to the keep-track task (Yntema & Mueser, 1962), participants were required to keep track of animal pictures presented one at a time at the center of the screen. Before the first animal picture appeared, a “+” sign 2 cm × 2 cm in size was presented for 800 ms to remind participants of an upcoming target stimulus. In each trial, the animal pictures were presented sequentially, in random order, at 1,500 ms per item (see Figure 2). All the animal pictures were presented on the screen in a 3 × 2 matrix at the end of each trial. Participants were required to click on the last three animals in the trial. The number of animal pictures in each trial was chosen randomly from 4, 5, 6, or 7. There were 24 trials, after which accuracy was recorded.

Figure 2. Graphical rendition of the updating training task.
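The scoring logic of the updating training task can be sketched in Python as follows. The function names are hypothetical, and two scoring details are assumptions of the sketch because the text does not specify them: a trial counts as correct only when all three clicked animals match the last three presented, and the order of the clicks is ignored.

```python
def last_items(sequence, keep=3):
    """The final `keep` animals of a presentation sequence, in presentation order."""
    return sequence[-keep:]


def score_trial(sequence, clicked, keep=3):
    """True when the clicked animals are exactly the last `keep` presented.

    Click order is ignored (an assumption of this sketch).
    """
    return sorted(clicked) == sorted(last_items(sequence, keep))


def session_accuracy(trial_results):
    """Proportion of correct trials in a session."""
    return sum(trial_results) / len(trial_results)
```

Because the sequence length varies between 4 and 7 items, participants cannot know in advance which items will be the last three and must keep replacing the oldest item in memory as each new picture appears.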
Shifting training task
Referring to the task-shifting paradigm (Kray & Lindenberger, 2000), red and green numbers (range 1–9, except 5) were randomly distributed on an A4 size paper. There were 80 numbers. Participants were required to provide oral responses. If the number was red, participants reported whether the number was greater than or smaller than 5; if the number was green, participants stated whether the number was even or odd. The number of correct answers within a 1-min time limit was recorded as the total score.
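The color-cued rule of the shifting training task can be illustrated with a brief Python sketch. The function names and the `(number, color)` item encoding are hypothetical; scoring only the items answered within the time limit follows the description above, but the exact scoring procedure is an assumption.

```python
def expected_answer(number, color):
    """Correct oral response for one colored number."""
    if number not in (1, 2, 3, 4, 6, 7, 8, 9):
        raise ValueError("number must be 1-9, excluding 5")
    if color == "red":
        # Red rule: magnitude judgment relative to 5.
        return "greater" if number > 5 else "smaller"
    if color == "green":
        # Green rule: parity judgment.
        return "even" if number % 2 == 0 else "odd"
    raise ValueError("color must be 'red' or 'green'")


def score_sheet(items, answers):
    """Count correct answers; items not reached within the time limit are unscored.

    items is a list of (number, color) pairs in sheet order, and answers
    holds the responses given before time ran out.
    """
    return sum(
        1
        for (number, color), answer in zip(items, answers)
        if answer == expected_answer(number, color)
    )
```

Excluding 5 guarantees that every red item has an unambiguous "greater" or "smaller" answer.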
Procedure
Pretest
During the pretest, all the students first spent approximately 30 min on the group tests (i.e., CPM, numerical operations, and calculation fluency tasks) in a quiet classroom. Then, according to the daily class schedule, each student was brought into a room one at a time to complete the individual tests, including paper-and-pencil tests (i.e., Stroop and mathematics problem-solving tasks) and computer-based tests (i.e., Flanker, 1-back, 2-back, number shifting, and letter shifting tasks). The individual tests lasted for approximately 60 min. All the tests were conducted by well-trained graduate students.
Training
After the pretest, the training group received 25 training sessions, which were conducted 2 to 3 times per week (25 min each), whereas the control group participated in normal class activities. For the training group, participants completed inhibition training tasks for approximately 7 min and had a 1-min break. Subsequently, participants completed shifting training tasks for about 8 min and had a 1-min break. Finally, participants completed updating training tasks for about 5 min. According to the daily class schedule, each student was brought into a room one at a time to complete training tasks that were conducted by well-trained graduate students.
Posttest and follow-up tests
All the students received the posttest after the training and the follow-up test 6 months after the training. The posttest and the follow-up test were the same as the pretest.
Quality of Measurements
Training for experimenters
The experimenters learned how to administer the tests and the training tasks from a 2-day professional development workshop provided by the primary author—a professor in the field of developmental psychology with 10 years of clinical experience. During the professional development workshop, the primary author first explained the background knowledge of EF training to ensure that the experimenters understood the training goals. Subsequently, the primary author explained the target skills, implementation points, and precautions for each task and demonstrated the tasks. Finally, the experimenters practiced the tasks, and the primary author ensured that they were able to perform the training independently and accurately.
Fidelity of implementation data for training
The experimenters filled in the training record form after each training session, documented the training process, and provided participant feedback and remarks in detail. All the experimenters held a meeting every week to discuss the effectiveness of the training sessions. Then, the participants’ accuracy for each training task was recorded, and the effectiveness of the training was determined by evaluating the changes in the participants’ accuracy after consecutive training sessions.
Data analysis
First, to examine the training gains across 25 training sessions, a series of repeated-measures analysis of variance (ANOVA) tests were performed on the training group to compare participants’ accuracy on each of the training tasks at three sessions of the training (at the end of the first, 13th, and 25th training sessions). Second, to examine the training effects on EF, fluid intelligence, and mathematics skills, a series of two-way, repeated-measures ANOVA tests were performed to compare the accuracy of the training group and that of the control group for each of the tasks for the pretest, posttest, and the follow-up test. Post hoc Bonferroni comparisons and independent sample t tests were performed whenever a main effect was found to be significant. Third, when the interaction effects between timepoint and group were significant, simple effect analyses were performed to further determine the pairwise differences between the training group and the control group.
Results
The descriptive statistics for each task at the pretest, the posttest, and the follow-up test are displayed in Table 2 for the training and control groups.
Means and Standard Deviations for Performance in All Tasks for the Training and Control Groups at Three Timepoints.
Note. Descriptive statistics included the number of observations that were included in the subsequent statistical analyses. ACC = accuracy; NF = number shifting; LF = letter shifting; CPM = Raven’s colored progressive matrices; NO = numerical operations; CF = calculation fluency; MPS = mathematics problem-solving.
Pretest
To test the differences between the training group and the control group on the performance of EF, fluid intelligence, and mathematics skills, independent sample t tests were conducted. The results revealed no significant differences between the training group and the control group in any of these variables (ps > .05).
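A baseline comparison of this kind can be sketched as follows. The sketch assumes the equal-variance (Student) form of the independent-samples t test; the source does not say whether a pooled or Welch test was used. The function name and scores are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for an equal-variance independent-samples t test."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical pretest accuracy scores for two small groups.
training = [0.62, 0.58, 0.71, 0.66, 0.60]
control = [0.59, 0.64, 0.68, 0.61, 0.63]

t_value, df = pooled_t(training, control)
print(f"t({df}) = {t_value:.3f}")
```

With 16 participants per group, as in this study, the test would have 30 degrees of freedom.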
Training Gains
Participants’ performance on each of the three EF training tasks was measured and compared at three training sessions (i.e., first, 13th, and 25th training sessions) during the study. Their performance on each of the tasks increased from session to session, indicating that the training group’s EF improved over the period of the study (see Figure 3).

Mean level reached in three training tasks for the training group across 25 training sessions.
For the inhibition training task, the effect of session was significant, F(2,15) = 339.289, p < .001, ηp² = .958. Post hoc Bonferroni comparisons showed that performance in the 13th (Mdiff = 0.36, p < .001) and 25th (Mdiff = 0.60, p < .001) sessions was better than that in the first session, and performance in the 25th session was better than that in the 13th session (Mdiff = 0.24, p < .001).
For the updating training task, the effect of session was also significant, F(2,15) = 59.853, p < .001, ηp² = .800. Post hoc Bonferroni comparisons indicated that performance in the 13th (Mdiff = 0.33, p < .001) and 25th (Mdiff = 0.44, p < .001) sessions was better than that in the first session, and performance in the 25th session was better than that in the 13th session (Mdiff = 0.11, p = .007).
Similarly, for the shifting training task, there was also a significant effect of session, F(2,15) = 365.386, p < .001, ηp² = .961, and post hoc Bonferroni comparisons showed significant differences among the three training sessions. Specifically, performance in the 13th (Mdiff = 0.38, p < .001) and 25th (Mdiff = 0.66, p < .001) sessions was better than that in the first session, and performance in the 25th session was better than that in the 13th session (Mdiff = 0.28, p < .001).
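The partial eta squared values above can be related to the F statistics through the identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only. It assumes a one-way repeated-measures design with 16 participants and 3 sessions, for which the sphericity-assumed error degrees of freedom would be (16 − 1)(3 − 1) = 30. That value is an assumption based on the standard formula; the text reports the degrees of freedom as (2, 15). Under the assumed error df, the computed values round to the reported .958, .800, and .961.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


N_PARTICIPANTS = 16  # training group size reported in the study
N_SESSIONS = 3       # first, 13th, and 25th sessions

df_effect = N_SESSIONS - 1
# Assumption: sphericity-assumed error df, (n - 1)(k - 1) = 30.
df_error = (N_PARTICIPANTS - 1) * (N_SESSIONS - 1)

# F values as reported for the three training tasks.
reported_f = {"inhibition": 339.289, "updating": 59.853, "shifting": 365.386}

for task, f_value in reported_f.items():
    estimate = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{task}: partial eta squared = {estimate:.3f}")
```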
Executive Function Training Effects
The results of 3 × 2 (timepoint [pretest, posttest, and follow-up test] × group [training group and control group]) repeated-measures ANOVA tests are displayed in Table 3.
The Statistical Results of the Six EF Tasks, Raven’s Colored Progressive Matrices, and Mathematical Performance for the Training and Control Groups in Three Timepoints.
Note. EF = executive function; ACC = accuracy; TG = training group; CG = control group; Pr = pretest; Po = posttest; Fo = follow-up test; NF = number shifting; LF = letter shifting; CPM = colored progressive matrices; NO = numerical operations; CF = calculation fluency; MPS = mathematics problem-solving.
For the Stroop task, the ANOVA results showed that none of the effects were significant. Similar results were obtained for the Flanker task, with non-significant results for all the effects.
For the 1-back task, the ANOVA results showed that the main effect of timepoint was not significant, but the main effect of group was significant. The interaction effect between timepoint and group was not significant. The independent sample t-test results showed that there was no significant difference between the training group and the control group for the pretest (t = 1.148, p = .260), but the performance for the training group was significantly higher than that of the control group for the posttest (t = 2.784, p = .012) and for the follow-up test (t = 2.406, p = .026).
For the 2-back task, the main effects of timepoint and group were both significant. The interaction effect between timepoint and group was also significant. Further simple effect analysis indicated that, for the training group, the posttest (Mdiff = 0.26, p < .001) and the follow-up test (Mdiff = 0.19, p < .001) scores were significantly higher than that of the pretest, and there was no significant difference between the posttest scores and the follow-up test scores (Mdiff = 0.07, p = .051). For the control group, student performance for the posttest (Mdiff = 0.01, p = .996) and their performance for the follow-up test (Mdiff = 0.06, p = .394) were not significantly different from that for the pretest, and there was no significant difference between the posttest scores and the follow-up test scores (Mdiff = 0.05, p = .236). The performance of the training group was significantly higher than that of the control group for the posttest (Mdiff = 0.27, p < .001) and the follow-up test (Mdiff = 0.14, p = .001).
For the number shifting task, the main effects of timepoint and group were both significant. The interaction effect between timepoint and group was also significant. Further simple effect analysis indicated that, for the training group, the posttest performance was significantly higher than that for the pretest (Mdiff = 0.10, p = .006), and there were no significant differences between the follow-up test scores and the pretest (Mdiff = 0.07, p = .122) scores, or between the follow-up test scores and the posttest (Mdiff = −0.02, p = .888) scores. For the control group, student performance for the posttest (Mdiff = −0.04, p = .423) and their performance for the follow-up test (Mdiff = 0.06, p = .195) were not significantly different from that for the pretest, and the follow-up test performance was significantly higher than the posttest performance (Mdiff = 0.10, p = .025). The posttest performance of the training group was significantly higher than that of the control group (Mdiff = 0.20, p < .001).
For the letter shifting task, the main effects of timepoint and group were both significant. The interaction effect between timepoint and group was also significant. Further simple effect analysis indicated that, for the training group, student performance for the posttest (Mdiff = 0.13, p < .001) and their performance for the follow-up test (Mdiff = 0.09, p = .001) were significantly higher than that for the pretest, and there was no significant difference between the performance for the posttest and for the follow-up test (Mdiff = 0.04, p = .077). For the control group, student performance for the posttest (Mdiff = −0.04, p = .414) and their performance for the follow-up test (Mdiff = 0.02, p = .782) were not significantly different from that for the pretest, and the follow-up test performance was significantly higher than that for the posttest (Mdiff = 0.06, p = .007). The performance of the training group was significantly higher than that of the control group for the posttest (Mdiff = 0.22, p < .001) and for the follow-up test (Mdiff = 0.12, p = .013).
For the CPM task, the main effect of timepoint was significant, but the main effect of group was not significant. The interaction effect between timepoint and group was not significant.
For the numerical operations task, the main effect of timepoint was significant, and the main effect of group was not significant. The interaction effect between timepoint and group was significant. Further simple effect analysis indicated that, for the training group, student performance for the follow-up test was significantly higher than that for the pretest (Mdiff = 1.94, p = .034), and there were no significant differences between the performance for the posttest and that for the pretest (Mdiff = 1.06, p = .541), and between the performance for the follow-up test and that for the posttest (Mdiff = −0.88, p = .485). For the control group, their performance for the pretest (Mdiff = 2.56, p = .018) and for the follow-up test (Mdiff = 2.94, p < .001) were significantly higher than that for the posttest, and there was no significant difference between the performance for the pretest and that for the follow-up test (Mdiff = −0.38, p = .939).
For the calculation fluency task, the main effect of timepoint was significant, but the main effect of group was not significant. The interaction effect between timepoint and group was significant. Further simple effect analysis indicated that, for the training group, their performance for the posttest (Mdiff = 16.06, p = .001) and for the follow-up test (Mdiff = 15.56, p = .001) were significantly higher than that for the pretest, and there was no significant difference between the posttest scores and the follow-up test scores (Mdiff = 0.50, p = .998). For the control group, the follow-up test performance was significantly higher than that for the pretest (Mdiff = 13.47, p = .007) and for the posttest (Mdiff = 10.87, p = .014), and there was no significant difference between the performance for the pretest and that for the posttest (Mdiff = −2.60, p = .900).
For the mathematics problem-solving task, the main effect of timepoint was significant, but the main effect of group was not significant. The interaction effect between timepoint and group was significant. Further simple effect analysis indicated that, for the training group, the performance for the posttest (Mdiff = 4.13, p < .001) and for the follow-up test (Mdiff = 5.19, p < .001) was significantly higher than that for the pretest, and there was no significant difference between the posttest performance and the follow-up test performance (Mdiff = −1.06, p = .600). For the control group, the follow-up test performance was significantly higher than that for the pretest (Mdiff = 2.50, p = .013) and for the posttest (Mdiff = 2.38, p = .048), and there was no significant difference between the performance for the pretest and that for the posttest (Mdiff = −0.13, p = .999).
The average performance on each task at the pretest, the posttest, and the follow-up test is displayed in Figure 4 for the training and control groups. The effects on the Stroop and the Flanker tasks were not significant and are omitted from the figure.

The pretest, posttest, and follow-up test scores for 1-back, 2-back, number shifting, letter shifting, numerical operations, calculation fluency, and MPS.
Discussion
This study investigated the instant and sustained effects of EF training on EF, fluid intelligence, and mathematics skills of students with MD. The results showed that EF training significantly improved not only the updating and shifting abilities of students, but also their performance in calculation fluency and mathematics problem-solving. After 6 months, the training effects on the 2-back and letter shifting tasks were sustained, and a training effect appeared for the first time on the numerical operations task. However, for the calculation fluency task and mathematics problem-solving task, the training effects were not sustained according to the follow-up test. The results above are in line with previous studies showing that EF training produced instant (Malekpour & Aghababaei, 2013; Zhang et al., 2018) and sustained effects (Abo-Ras et al., 2018; Chen et al., 2017) in updating ability. No significant improvement in the training group was observed in the 1-back task, although the scores for the training group were higher than those for the control group on the posttest and on the follow-up test. One possible explanation may be that the 1-back task was insufficient to measure updating ability due to its low requirement of cognitive resources. In this study, the performance of the training group was at ceiling for 1-back at pretest; therefore, there was little room for improvement.
In addition, the training group showed improvements in the number shifting and letter shifting tasks, indicating instant effects of EF training on shifting ability. Some researchers have suggested that task-shifting training can improve performance on similar tasks, such as the trail making task (Li et al., 2016). However, in the follow-up test, only the training effect on the letter shifting task was sustained. One reason may be that the number shifting task is quite complex. The number shifting task in this study used numbers within 10. It required switching among multiple tasks (i.e., the number system, identifying a number and its quantity representation, and comparing symbolic number magnitudes), whereas the letter shifting task only required shifting among letter sets (i.e., capital letters and lowercase letters). The number shifting task also required individuals to pay attention to the color of the stimuli, the numbers, and the nature of the shifts, for example, from judging which number was greater to judging which number was smaller.
There was a non-significant training effect on the Stroop task and the Flanker task for the training group. One possible explanation is related to the specificity of the EF training. Maraver et al. (2016) investigated the differential impacts of WM training and inhibitory control training; the enhancement in the Stroop task was observed only for the inhibitory control training group, and the benefits in the Stroop task were not observed for the WM training group. Although some previous studies have shown benefits in the Stroop task following WM training (Borella et al., 2010; Chein & Morrison, 2010), we failed to observe the effects of EF training on inhibition tasks. Hence, verification through further research is needed.
The EF training effect on the fluid intelligence of students with MD was not observed. Our finding is generally in line with findings of previous studies showing that the effects of updating training do not generalize to fluid intelligence (Ang et al., 2015; Holmes et al., 2009; Malekpour & Aghababaei, 2013). Although some earlier studies suggested that updating training improved fluid intelligence (Chen et al., 2017; Wang et al., 2019), this was not found in this study. This inconsistency may be attributable to individual differences in EF (Jaeggi et al., 2011). In this study, participants were restricted to 7- to 10-year-old students with MD. The effectiveness of training on fluid intelligence appears to change across cognitively diverse populations. It would be worthwhile to further investigate the transfer effect of EF training on fluid intelligence for students with different cognitive deficits. Another explanation for the limited effect of EF training on fluid intelligence is that, given two highly correlated variables, such as EF and fluid intelligence, when one variable improves through training, the other variable does not necessarily covary (Moreau & Conway, 2014). A possible reason for this phenomenon is that measuring EF and fluid intelligence at the level of a single task artificially reduces the potential correlation between them (Smid et al., 2020).
In this study, EF training improved student performance on the calculation fluency and mathematics problem-solving tasks. The finding on the calculation fluency task is consistent with previous research (Alloway & Alloway, 2009; Dahlin, 2013; Layes et al., 2017; Witt, 2011). When researchers of the abovementioned studies conducted WM training with different tasks and duration, they found that the training group made significant progress in the calculation tasks. In the calculation fluency task, two numbers with one to three arithmetic operations (addition, subtraction, and multiplication) are combined to obtain the predetermined answer. The combined answer is derived by relying on counting while attending to shifts between different procedures, or retrieval from long-term memory. In this study, EF training improved the updating and shifting abilities of students with MD, which may be the reason for the significant improvement in the calculation fluency task for the training group.
The findings on the mathematics problem-solving task are consistent with those from some previous research (Dahlin, 2013; Ren & Cai, 2019). Various studies have provided evidence for the strong relationship between EF and mathematics problem-solving (Passolunghi & Siegel, 2001; Swanson & Beebe-Frankenberger, 2004; Swanson et al., 2008). EF plays an important role in solving mathematics problems (Iglesias-Sarmiento et al., 2015; Peng & Fuchs, 2015; Ven et al., 2012), and improvement in EF helps improve mathematics problem-solving ability. In this study, the improvement in updating ability helped students with MD to update information, direct attention to relevant information, and disregard irrelevant information. Moreover, the improvement of shifting ability assisted them in switching among solution strategies, operations, and steps for solving multi-step problems. It may also be that the use of numbers as stimuli in EF training tasks improves students’ performance on mathematics skills tasks. In future research, it is necessary to further investigate whether EF training consistently has significant effects on mathematics tasks that contain irrelevant numerical information and/or irrelevant narrative information, require multiple steps, and have missing values at the beginning.
In the follow-up test after 6 months, the advantages of the training group in calculation fluency and mathematics problem-solving tasks disappeared. In other words, the results showed that EF training only produced instant effects on these two mathematics skills tasks but did not produce sustained effects. One possible explanation is that, over time, students in the control group experienced similar improvements in the calculation fluency and mathematics problem-solving tasks. Previous research showed that students with MD also continue to develop in cognitive and mathematics abilities. However, in the numerical operations task, we found a completely different result. No instant effect was observed, but after 6 months, effects were observed for the first time. This is quite common in the field of EF training. Some other studies have achieved similar results (Chen et al., 2017; Holmes et al., 2009). This inconsistency may be related to the differences between different mathematics skills tasks. Compared with the calculation fluency task, the numerical operations task requires more complex cognitive functions in the following domains: basic mathematical facts, basic operations with integers, fractions, geometry, algebra, and calculus. It requires students to (a) understand the meanings of the various arithmetic operations, (b) select and use appropriate computational methods from mental mathematics, estimation, and paper-and-pencil, and (c) check the reasonableness of the results (Bjork & Bowyer-Crane, 2013). The improvement induced by EF training on complex numerical operations tasks may take time to become apparent.
Limitations and Future Directions
While the results of this study suggest that EF training for students with MD warrants further exploration as a potentially promising treatment, this study also has limitations. First, the sample size was small and was drawn from one elementary school. Second, given our training design, the difficulty level of the training tasks remained the same; we did not adapt the difficulty level to the performance of the participant, and the time allocation of each training task was uneven. Third, a placebo control group was not included to control for the effects of high expectations for the effectiveness of the training and high motivation.
Implications for Practice
The current EF training study suggests two implications for students with MD. First, the EF training for MD needs to be persistent, because the effects of EF training on cognitive and mathematics abilities are often instant rather than sustained. After 6 months, the effects tend to fade, or only partial effects are left. Further studies should design individualized training programs for specific types of MD and use them consistently. For example, combining EF training with mathematics word-problem instruction may benefit students with word-problem difficulties (Peng & Fuchs, 2015). Given the cognitive deficits among students with MD (Kroesbergen et al., 2012), further research is necessary to detect the specific effects of different domains of EF training on different mathematics skills. Second, classroom-based interventions may be developed. In particular, the brief EF training intervention may be worth the investment of classroom time, given its instant effects on mathematics problem-solving and its delayed effects on numerical operations that emerged for the students with MD in this study. For students who struggle with mathematics, such as those in the current study, EF training may be a useful classroom-based activity to strengthen mathematics-related EF, which in turn may promote more effective comprehension processes and produce gains in mathematics knowledge and skills. For example, teachers/practitioners can include 1- to 2-min daily EF practices (e.g., number pairs) in mathematics classrooms to facilitate the growth of mathematics skills. In addition, the findings of this study provide important insights into inclusive education practices in China. An EF intervention, such as the EF training used in the current study, can be an appropriate approach to accommodate the individual learning needs of students with MD in regular education classes in China.
Conclusion
This study shows that EF training targeting inhibition, updating, and shifting components has the potential to improve the performance of students with MD on 2-back, number shifting, letter shifting, calculation fluency, and mathematics problem-solving tasks. Both instant and sustained effects are found for 2-back and letter shifting tasks. Delayed effects are found for numerical operations. The findings of this study indicate that EF training can be particularly helpful in improving updating, shifting, and numerical operations abilities, and can be a promising intervention for Chinese students with MD.
Footnotes
Acknowledgements
The authors are very grateful to Dr. George K. Georgiou (University of Alberta) and Dr. Rui Kang (Georgia College & State University), who helped with the paper revision, and Ying Cui (Jiading Experimental Middle School Affiliated to Tongji University), who helped with the data collection.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This research was funded by The General Project of Education Science by Shanghai Philosophy and Social Sciences (No. A2021002) and Shuguang Program supported by Shanghai Education Development Foundation and Shanghai Municipal Education Commission (No. 20SG45).
