Abstract
English learners (ELs) experience difficulty with mathematical problem solving because word problems require complex processes beyond basic math skills, such as the use of linguistic information, identifying relevant information, and constructing the appropriate problem statement. This study used a combined multiple baseline and changing criterion design to assess the effectiveness of a paraphrasing intervention on the problem-solving performance of nine third-grade students who are ELs and at risk of mathematical disabilities (MD). Although the magnitude of the Tau-U effect sizes was in the small range, the visual analysis indicated that all students displayed increasing trends in problem-solving accuracy as a function of the paraphrasing intervention. The results are discussed in terms of providing continual support toward maintaining intervention outcomes.
Mathematical skills are necessary for academic success, everyday problem solving, future career options, and earning potential (McIntosh & Vignoles, 2001; Rivera-Batiz, 1992; Shapka, Domene, & Keating, 2006). As many math concepts are cumulative, basic skills in numeracy, calculation, and problem solving are necessary for future academic success. In addition, mathematical skills are required in elementary years to demonstrate proficiency on standardized high-stakes testing. Unfortunately, when compared with their monolingual English-speaking peers, English learners (ELs) whose first language is Spanish often perform poorly in mathematics (e.g., Martiniello, 2008, 2009). ELs encounter unique academic challenges, including cultural and linguistic acclimation in addition to the pressures of achieving academically, often resulting in disproportionately low achievement (Garcia & Cuéllar, 2006).
Word Problem Solving
As students progress through school, instruction in math programs increasingly emphasizes word problem solving. Mathematical word problems are linguistically presented arithmetic problems that require students to generate a solution (Fuchs & Fuchs, 2007; Fuchs et al., 2006). Word problems require students to use linguistic information to identify relevant information for solution accuracy, construct the appropriate number sentence, and calculate the problem accurately. Students with mathematical disabilities (MD) experience significant difficulty with word problems because complex processes beyond basic math skills are involved (Swanson, 2006). Students with MD perform significantly lower in math than age-equivalent peers, with the gap widening as each academic year passes (Cawley, Parmar, Foley, Salmon, & Roy, 2001).
Thus, math word problems present unique challenges for children who are ELs at risk of MD, such as acquiring the formal mathematical linguistic register and lack of exposure to discussions, which develop higher order thinking skills (Janzen, 2008). Linguistic complexity of word problems, which pose reading comprehension challenges, is one important factor that presents more difficulty for ELs when compared with their English-proficient peers with comparable math ability (Martiniello, 2008). Developing appropriate intervention for EL students at risk of MD may provide valuable support to prevent potential academic failure (Gersten, Jordan, & Flojo, 2005; Griffin, 2007).
The purpose of the present study was to test whether an intervention directed at helping EL students at risk of MD effectively focus on key information within word problems improves solution accuracy. Paraphrasing information has been identified as an effective strategy to improve problem-solving accuracy (e.g., Moran, Swanson, Gerber, & Fung, 2014; Swanson, Moran, Lussier, & Fung, 2014). For example, Moran et al. (2014) examined the effect of a paraphrasing intervention for third-grade students at risk of MD. Students were randomly assigned to one of four conditions: restate, relevant, complete, or control. The restate condition involved students paraphrasing and rewriting the question in their own words. The relevant condition taught students to paraphrase all relevant information, including the question and numbers necessary to solve the problem. The complete condition directed students to paraphrase the question and separate relevant and irrelevant information. Results indicated that students in the relevant and complete conditions improved on measures of word problem-solving accuracy when compared with the students in the restate and control conditions. Although group design studies, such as Moran et al. (2014), have identified paraphrase training as effective for children with MD, its effectiveness for children who are ELs at risk of MD is unknown.
The purpose of this study was to investigate the effectiveness of paraphrasing word problem-solving intervention for third-grade EL students who are at risk of MD. The following research questions were considered:
Method
Setting and Participants
Nine third-grade EL students at risk of MD participated in this study. The children were selected from four classrooms from an elementary (K–8) school in southern California. The school’s population consisted of 700 students (73% Hispanic, 11% Black/African American, 9% White [non-Hispanic], 2% Asian, and 5% Other [two or more races]). Thirty-four percent of these students were ELs. In addition, 65% of the school’s population qualified for free or reduced lunch prices.
Although controversy exists over the definition of MD, the growing consensus among researchers has indicated that a cutoff score on achievement is more appropriate to determine risk than a discrepancy between achievement and IQ (e.g., Fletcher et al., 1989). For the purposes of this study, children were identified as at risk of MD based on the following considerations: (a) teacher recommendation for intervention based on students receiving general math instruction for at least 2 years, (b) continued difficulties solving word problems in the general education classroom, and (c) performance at or below the 25th percentile on a norm-referenced math test, Test of Math Ability–2 (TOMA-2; Brown, Cronin, & McEntire, 1994). Students already receiving special education services were not included in the study.
EL status was determined by the presence of the California English Language Development Test (CELDT) score. The CELDT is an assessment used to determine and monitor the progress of children who are limited English proficient on listening, speaking, and writing in English. Table 1 provides descriptive and school-related information for the participating students.
Table 1. Demographic and School-Related Data.
Note. CELDT = California English Language Development Test; TOMA = Test of Math Ability (Problem-Solving subtest); DRA = Developmental Reading Assessment; F = female; Beg = beginning; M = male; EI = early intermediate; Int = intermediate.
General Procedures
Two graduate students administered the intervention utilizing an instructional protocol. This study was conducted in small groups (three students) in the general classroom setting for 21 sessions over an 8-week period. Each intervention session averaged 30 min and was a supplementary intervention to the general education math curriculum students received (50 min/day). Two follow-up sessions were conducted following the conclusion of the study.
In this study, word problems were modified from the classroom text (EngageNY; Expeditionary Learning, 2013). Word problems in the intervention included one- and two-step addition and subtraction word problems with the following elements: (a) a question, (b) relevant information and numbers required to solve the problem, and (c) irrelevant information or numbers. The following example illustrates the components of a one-step word problem: David has 52 baseball cards. (relevant information) David gave 19 baseball cards to Nick. (relevant information) David also collects football cards. (irrelevant information) How many baseball cards does David have left? (question)
Experimental Design
A changing criterion, multiple baseline across subjects design was utilized to evaluate the effects of a paraphrasing intervention on the word problem-solving performance of nine EL students at risk of MD (Kennedy, 2005). Because problem difficulty (number of irrelevant sentences and number of steps) increased systematically across sessions, a changing criterion component was implemented within the multiple baseline phases. The word problems for this study were selected and modified from classroom text (EngageNY; Expeditionary Learning, 2013). Seventy problems were selected and then randomly assigned to sessions. Students were placed in three groups of three students. To control for possible classroom teacher effects, students from each of the four classrooms were randomly assigned to small groups. That is, no small group consisted solely of students from a single classroom teacher. The first group received baseline measures for three sessions; Group 2, baseline measures for five sessions; and Group 3, baseline measures for seven sessions. Each baseline and treatment measure consisted of 10 one- and two-step addition and subtraction word problems.
Baseline phase
For the baseline sessions, each student’s preinstructional word problem-solving performance (i.e., the problems the student could solve accurately without assistance) was established.
Intervention phase
For the intervention sessions, problem-solving instruction directed students to apply a paraphrasing strategy to word problems (to be discussed). Each intervention session presented seven word problems in total—one word problem with explicit instruction, one word problem to solve with teacher assistance, and five word problems to be solved independently.
As mentioned, word problem difficulty (i.e., number of sentences, number of steps required to solve the problem) increased across intervention sessions. Lessons 1 to 4 taught one-step word problems. Lessons 5 to 15 included multiple-step word problems. In addition, the number of sentences and complexity of sentences increased across lessons. Lessons 1 to 9 included word problem instruction with three to five sentences. Lessons 10 to 15 taught word problems with five to seven sentences. That is, the intervention phase comprised three levels of problem-solving difficulty (changing criterion). Level 1 (Lessons 1–4) consisted of one-step word problems with three to five sentences. Level 2 (Lessons 5–9) consisted of two-step word problems with three to five sentences. Finally, Level 3 (Lessons 10–15) included two-step word problems with five to seven sentences. The proportion correct for each session was recorded (see Table 2).
Table 2. Word Problem-Solving Mean Percent Accuracy Scores Across Phases.
Note. Intervention L1 = one-step word problems, three to five sentences; Intervention L2 = two-step word problems, three to five sentences; Intervention L3 = two-step word problems, five to seven sentences.
Maintenance phase
All instructional phases concluded after the predesignated intervention session number for each group (15 sessions, 13 sessions, 11 sessions). Two weeks after completion of the intervention phase (i.e., Session 18), all students were administered three maintenance measures of one- and two-step word problems to verify maintenance of treatment skills.
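The staggered design can be checked with a little arithmetic. The following is a minimal sketch, not part of the study materials: the group labels, the dictionary layout, and the function name `phase_sessions` are hypothetical, and numbering the three maintenance measures as Sessions 19 to 21 is an assumption. The baseline and intervention counts come from the text above.

```python
# Baseline and intervention session counts per group, as stated in the text.
BASELINE_SESSIONS = {"Group 1": 3, "Group 2": 5, "Group 3": 7}
INTERVENTION_SESSIONS = {"Group 1": 15, "Group 2": 13, "Group 3": 11}
MAINTENANCE_SESSIONS = 3  # three maintenance measures for every student


def phase_sessions(group):
    """Return (first, last) session numbers for each phase of one group."""
    n_base = BASELINE_SESSIONS[group]
    n_int = INTERVENTION_SESSIONS[group]
    last_int = n_base + n_int
    return {
        "baseline": (1, n_base),
        "intervention": (n_base + 1, last_int),
        # Assumption: maintenance measures are numbered consecutively.
        "maintenance": (last_int + 1, last_int + MAINTENANCE_SESSIONS),
    }


for group in BASELINE_SESSIONS:
    print(group, phase_sessions(group))
```

Under these assumptions every group finishes the intervention at Session 18, and the three maintenance measures bring the total to 21 sessions, which is consistent with the session count given under General Procedures.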
Follow-up phase
Two months after completion of the intervention phase, maintenance tests of one- and two-step word problems were administered again to verify continuation of skills after extended time periods without treatment.
Instructional Procedures
A paraphrasing strategy intervention was designed to improve problem-solving accuracy. The intervention directed students to paraphrase and write out components of a word problem in their own words. The elements of a word problem included a question, relevant information and numbers required to solve the problem, and irrelevant information or numbers. The strategy was taught in every intervention session throughout the study. The instructional phases during each session are as follows:
Phase 1: Warm-up activity
During this phase, students participated in brief warm-up activities alternating between math calculation and reading comprehension. Warm-up activities did not exceed 5 min per session.
For the calculation warm-up activity, students were given one-, two-, or three-digit addition and subtraction problems. Students were encouraged to complete as many problems as possible accurately in the given time.
For the reading comprehension warm-up activity, students read short paragraphs adapted from student texts (EngageNY; Great Minds, 2015) and answered multiple-choice questions regarding the content of the paragraph. These exercises included literal reading comprehension questions, which required students to recall characters, main events, or main ideas from the paragraphs. Students were also asked to generate a sentence identifying the main idea of the text.
Phase 2: Explicit instruction
During this phase of the intervention, students were taught the paraphrasing strategy through direct instruction. The intervention incorporated the following four steps:
Know—“What do I know about the question” occurred after the teacher read the word problem aloud. The teacher identified the question for the group. Then, the teacher modeled how to paraphrase the question by writing a sentence.
Find—“Find the relevant information” occurred during reading the word problem aloud for a second time. Again, using think-alouds, the teacher modeled how to find and paraphrase important information to answer the question.
Cross out—“Cross out irrelevant information” occurred after finding relevant information. Students were guided in eliminating information that was not relevant to solve the problem. This information was not paraphrased.
Solve and check—After gathering necessary relevant information by summarizing important propositions, the teacher modeled how to set up and solve the problem. Finally, the teacher checked whether the answer, stated in a complete sentence, addressed the initial question by stating the question again.
During this phase, the instructor demonstrated each step of the strategy through visual and explicit instruction. Visuals to aid instruction included a checklist that served as a reminder for each step.
Phase 3: Guided practice
During this phase of intervention, students answered word problems using the paraphrasing strategy. The teacher prompted students to apply each step of the strategy utilizing an instructional protocol. Instructors checked students’ answers for each step of the strategy. If difficulty persisted in a step of the strategy, the instructor provided corrective feedback. If after two attempts of guided instruction, the student was not able to apply a step in the strategy, the teacher modeled the answer.
Phase 4: Independent practice
Finally, students solved word problems independently. If a student asked for help during this phase of intervention, the teacher encouraged him or her to solve the problems independently before offering assistance. Performance on word problems during the independent phase served as the dependent measure to assess treatment effects.
Dependent Measures
Test of mathematical abilities
The story problem subtest from the TOMA-2 (Brown et al., 1994) is a 25-item word problem-solving assessment. Students are required to read and solve the word problems individually while recording answers in their test booklets. The items increase in difficulty and involve all four mathematics calculation areas. Testing is discontinued after 10 min. Reliability coefficient for this subtest exceeded .80.
Word problem-solving accuracy
The primary targeted dependent measure was word problem-solving accuracy. Each session included the administration of five one- and two-step addition and subtraction word problems. Each word problem included (a) a question, (b) relevant information and numbers required to solve the problem, and (c) irrelevant information or numbers. These word problems were adapted from a previous study (Kong & Orosco, 2015). The coefficient alpha for these problems was acceptable (.77). Accuracy was measured as the percentage correct (number correct divided by number possible). Solution accuracy was recorded regardless of whether the student used the intervention strategy.
To assess generalization of the treatment condition to other problem-solving measures, the AIMSweb Math Concepts and Applications (M-CAP) was administered. The M-CAP AIMSweb measure was administered every third intervention session. The M-CAP is a general outcome measure of typical math curriculum including problem-solving, reasoning, and analytical skills. The M-CAP is group administered and does not exceed 8 min to administer. Students read and solve the word problems while recording answers on their test sheet. The alternate-form reliability coefficient for the third-grade form is .81 (AIMSweb technical manual; NCS Pearson, Inc., 2012).
Interrater reliability and treatment fidelity
Twenty-five percent of the data were rescored by an independent observer. Interrater reliability was calculated by dividing the total number of agreements by the total number of agreements and disagreements. Interrater agreement was 100% across all measures.
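The agreement calculation described above can be written out in a few lines. This is an illustrative sketch only: the function name `percent_agreement` and the item-by-item score lists are hypothetical, and the example scores are invented for demonstration rather than taken from the study.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same score.

    Computed as agreements / (agreements + disagreements) * 100.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item set")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    disagreements = len(rater_a) - agreements
    return 100 * agreements / (agreements + disagreements)


# Hypothetical item scores (1 = correct, 0 = incorrect) for one probe.
original_scoring = [1, 0, 0, 1, 1]
independent_rescoring = [1, 0, 0, 1, 1]
print(percent_agreement(original_scoring, independent_rescoring))  # 100.0
```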
To ensure consistency of delivery of instruction, all intervention and assessment sessions were scripted. However, to encourage natural teaching, interaction, and questions from students, the scripts served as an outline for instruction. A treatment fidelity checklist based on the paraphrasing strategy for each phase of intervention was applied by an independent classroom observer for 26.67% of all intervention sessions. The observer coded for fidelity via the checklist and scored “yes” or “no” for each behavior observed. The percentage of intervention behaviors present across the observed sessions was calculated at the conclusion of the study and was 94.92%.
Data Analysis
Data were analyzed using visual analysis and Tau-U effect size calculation of the intervention for each changing criterion for each student. Visual analysis was conducted to determine evidence for causal relations between the paraphrasing intervention and word problem-solving performance (Kratochwill et al., 2010). The recommended steps for conducting visual analysis to document at least three demonstrations of intervention effect were followed (Kratochwill et al., 2010). Level stability was determined based on the “80% in 25%” criterion: A level was considered stable if 80% of the data fell within 25% of the median value (Gast & Ledford, 2014). In addition, changes in trends following the implementation of the intervention in staggered phases (multiple baselines) were examined. Trends were likewise deemed stable if 80% of the data fell within 25% of the trend line (Gast & Ledford, 2014).
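To make the level-stability rule concrete, the following is a minimal sketch of one reading of the “80% in 25%” criterion. It assumes the envelope is the median plus or minus 25% of the median, which is an interpretation rather than a procedure confirmed by the text; the function name `level_is_stable` and the example scores are hypothetical. The trend-stability analog (distance from a fitted trend line) is not shown.

```python
from statistics import median


def level_is_stable(scores, band=0.25, required=0.80):
    """Return True if at least `required` of the scores fall within
    `band` (as a proportion of the median) of the phase median."""
    if not scores:
        raise ValueError("at least one data point is required")
    mid = median(scores)
    tolerance = band * abs(mid)  # assumed envelope: median +/- 25% of median
    within = sum(1 for s in scores if abs(s - mid) <= tolerance)
    return within / len(scores) >= required


# Hypothetical percent-correct scores for two phases.
print(level_is_stable([50, 50, 60, 50, 40]))  # True: all five within 37.5-62.5
print(level_is_stable([20, 60, 20, 80]))      # False: none within 30-50
```

With only three or four data points in a phase, a single point outside the envelope is enough to fail the 80% requirement under this reading.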
In addition to visual analysis, Tau-U effect sizes were also calculated to determine overall effect of the paraphrasing intervention for each changing criterion for each student, and a combined effect for all students between baseline and intervention phases and during the intervention (Vannest, Parker, & Gonen, 2011). Tau-U is an analytic method for calculating effect size that combines nonoverlap between phases (baseline and intervention) with calculating trend within the intervention phase, while also allowing for control of trends within the baseline phase (Parker, Vannest, Davis, & Sauber, 2011). The formula to calculate Tau is as follows: $\mathrm{Tau} = (N_c - N_d) / [n(n - 1)/2]$, where $N_c$ = the number of concordant pairs (between baseline and intervention phases), $N_d$ = the number of discordant pairs, and $n(n - 1)/2$ = the number of possible pairs (Parker et al., 2011). Effect sizes were classified as small (0–0.65), medium (0.66–0.92), or large (0.93–1.0) based on recommended ranges for nonoverlap (Parker, Vannest, & Brown, 2009; Soares, Harrison, Vannest, & McClelland, 2016).
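The pairwise logic behind Tau can be illustrated with a short sketch. This is not the analysis used in the study: it computes only the basic between-phase nonoverlap, it omits the baseline trend correction that distinguishes Tau-U, and it assumes the number of possible pairs is the number of baseline points times the number of intervention points. The function name `tau_nonoverlap` and the example scores are hypothetical.

```python
def tau_nonoverlap(baseline, intervention):
    """Basic A-versus-B Tau: (concordant - discordant) / possible pairs.

    A pair is concordant when the intervention point exceeds the baseline
    point, discordant when it is lower, and a tie counts for neither.
    """
    if not baseline or not intervention:
        raise ValueError("both phases need at least one data point")
    concordant = discordant = 0
    for a in baseline:
        for b in intervention:
            if b > a:
                concordant += 1
            elif b < a:
                discordant += 1
    # Assumption: possible pairs = every baseline point paired with
    # every intervention point.
    possible_pairs = len(baseline) * len(intervention)
    return (concordant - discordant) / possible_pairs


# Hypothetical percent-correct scores.
print(tau_nonoverlap([20, 20, 20], [40, 60, 60, 80]))  # 1.0 (complete nonoverlap)
print(tau_nonoverlap([20, 40], [30, 50]))              # 0.5
```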
Results
Experimental Word Problem-Solving Measures
Figure 1 displays word problem-solving raw score accuracy for each student as a function of baseline, intervention, maintenance, and follow-up sessions. As shown in Figure 1, all students displayed increases in problem-solving accuracy from the baseline condition. This pattern was also supported when computing the weighted average Tau-U of the paraphrasing intervention on word problem-solving accuracy compared with baseline conditions, 0.53, SE = 0.08; 95% confidence interval (CI) = [0.37, 0.69]. The magnitude of the Tau-U yielded a small effect size in favor of the paraphrasing intervention when compared with the baseline condition on word problem-solving accuracy.

Figure 1. Word problem-solving accuracy percentage per session.
The general pattern of improvements in problem-solving accuracy was next subjected to a visual analysis. The word problem-solving accuracy mean scores for proportion of problems correct for all students in the baseline, intervention, maintenance, and follow-up phases, respectively, are reported in Table 2. The level, trend, and variability in all phases were analyzed (Kratochwill et al., 2010). First, in terms of level, a predictable pattern of data was documented in the baseline phase before the intervention was administered. All students displayed low and flat trends in the baseline phase. All levels of the intervention (changing criterion) were taken together to fit a line to the intervention data. Second, in terms of trends, visual analysis indicated that all students displayed increasing trends in problem-solving accuracy after the staggered implementation of the paraphrasing intervention throughout the changing criterion. Following completion of the intervention, students displayed decreasing trends in word problem-solving accuracy. Finally, in terms of variation, students displayed very little fluctuation of scores in the baseline phase. That is, all students displayed consistently low scores before the intervention was administered. In the intervention phase, the variability of problem-solving scores around the best-fit line was also low. There was moderate variability in scores after the completion of the intervention. That is, students displayed fluctuations in word problem-solving accuracy without access to the intervention. As a follow-up to these general patterns, the results of each participant are next analyzed.
Mary
The mean performance during each phase of the study was analyzed (Horner et al., 2005). Mary was administered three baseline sessions, with a baseline mean score of 16.67% accuracy. She received a total of 15 intervention sessions (four Level 1, five Level 2, and six Level 3 intervention sessions). Her mean Level 1 intervention score was 55%. Her mean Level 2 intervention score was 42%. Her mean Level 3 intervention score was 63.33%. Her maintenance mean score was 20%. Finally, her two follow-up measures’ mean score was 15%. The level of her intervention data for each changing criterion (i.e., Levels 1–3) was stable according to the 80% in 25% criteria. In addition, the trend of her data for each changing criterion in the intervention phase was stable using the 80% in 25% criteria.
Tau-U effect size was calculated for baseline versus intervention contrasts (baseline vs. Level 1, baseline vs. Level 2, baseline vs. Level 3 intervention) for all nine students, controlling for baseline trend for six students (Mary, Edgar, Alex, Jane, Daria, and Mateo). The Level 1 Tau-U effect size for Mary with a corrected baseline was 0.92, 90% CI = [0.14, 1.0]. Her Level 2 Tau-U effect size with a corrected baseline was 0.53, 90% CI = [−0.20, 1.0]. Her Level 3 Tau-U effect size with a corrected baseline was 0.61, 90% CI = [−0.09, 1.0]. The weighted average Tau-U of the paraphrasing intervention on her overall word problem-solving accuracy was 0.68, SE = 0.26, 95% CI = [0.17, 1.0].
James
James also received three baseline sessions with a mean score of 20% accuracy. He was absent for two intervention sessions, resulting in a total of 13 intervention sessions (four Level 1, four Level 2, five Level 3). His mean intervention scores were 20%, 50%, and 76% for Levels 1, 2, and 3, respectively. His maintenance average score was 36.67%. He did not return to this school the following year and did not participate in follow-up measures. The level stability of his intervention data for Level 2 and Level 3 of the changing criterion was deemed stable according to the 80% in 25% criteria. Intervention data for Level 1 of the changing criterion were not considered stable using the 80% in 25% criteria. In addition, the trend stability of his data for each changing criterion in the intervention phase was stable.
The Level 1 Tau-U effect size for James was 0.00, 90% CI = [−0.78, 0.78]. The Level 2 Tau-U effect size for him was 0.75, 90% CI = [−0.03, 1.0]. His Level 3 Tau-U effect size was 1.00, 90% CI = [0.26, 1.0]. The weighted average Tau-U of the paraphrasing intervention on his overall word problem-solving accuracy was 0.59, SE = 0.27, 95% CI = [0.07, 1.0].
Edgar
Edgar received three baseline sessions resulting in a mean score of 20% accuracy. He received 15 intervention sessions. His mean Level 1 intervention score was 35%. His mean Level 2 intervention score was 54%. His average Level 3 intervention score was 66.67%. His maintenance and follow-up mean scores were 66.67% and 25%, respectively. His intervention data level stability for Level 3 of the changing criterion was deemed stable according to the 80% in 25% criteria. Intervention data for Level 1 and Level 2 of the changing criterion were not considered stable using the 80 in 25 criteria. The trend stability of his data for each changing criterion in the intervention phase was stable according to the 80 in 25 criteria.
Edgar’s Level 1 Tau-U effect size compared with the corrected baseline was 0.17, 90% CI = [−0.61, 0.94]. His Level 2 Tau-U effect size compared with the corrected baseline was 0.60, 90% CI = [−0.14, 1.0]. His Level 3 Tau-U effect size compared with the corrected baseline was 0.67, 90% CI = [−0.04, 1.0]. The weighted average Tau-U of the paraphrasing intervention on his overall word problem-solving accuracy was 0.49, SE = 0.26, 95% CI = [−0.02, 0.99].
Alex
Alex received five baseline measures with mean score of 22% accuracy. He was absent for one intervention session, resulting in a total of 12 intervention sessions (three Level 1, five Level 2, four Level 3). His mean intervention scores were 33.33%, 62%, and 45% for Levels 1, 2, and 3, respectively. Finally, his maintenance and follow-up measures’ mean scores were 73.33% and 25%, respectively. His Level 2 and Level 3 intervention data-level stability were deemed stable according to the 80% in 25% criteria. Intervention data for Level 1 of the changing criterion were not considered stable using the 80 in 25 criteria. In addition, the trend stability of his data for each changing criterion in the intervention phase was found to be stable.
Alex’s Level 1 Tau-U effect size compared with the corrected baseline was 0.07, 90% CI = [−0.67, 0.80]. The Level 2 Tau-U effect size for him compared with the corrected baseline was 0.92, 90% CI = [0.29, 1.0]. His Level 3 Tau-U effect size compared with the corrected baseline was 0.10, 90% CI = [−0.57, 0.77]. The weighted average Tau-U of the paraphrasing intervention on his overall word problem-solving accuracy was 0.38, SE = 0.24, 95% CI = [−0.08, 0.85].
Jane
Jane also received five baseline sessions resulting in a mean score of 4% accuracy. She received 13 intervention sessions (four Level 1, five Level 2, four Level 3), increasing her mean score to 20%, 38%, and 50% accurate for Levels 1, 2, and 3, respectively. Her maintenance mean score was 60%. Finally, after 2 months without access to intervention, her follow-up mean score was 5% accuracy. The level stability of her intervention data for Level 2 of the changing criterion was found to be stable according to the 80% in 25% criteria. Intervention data for Level 1 and Level 3 of the changing criterion were not considered stable. In addition, the trend stability of her data for each changing criterion in the intervention phase was stable.
The Level 1 Tau-U effect size for Jane compared with the corrected baseline was 0.55, 90% CI = [−0.12, 1.0]. The Level 2 Tau-U effect size for her compared with the corrected baseline was 0.92, 90% CI = [0.29, 1.0]. Her Level 3 Tau-U effect size compared with the corrected baseline was 0.90, 90% CI = [0.23, 1.0]. The weighted average Tau-U of the paraphrasing intervention on her overall word problem-solving accuracy was 0.79, SE = 0.23, 95% CI = [0.34, 1.0].
Brian
Brian received five baseline measures with a mean score of 38% accuracy. He received 13 intervention sessions (four Level 1, five Level 2, four Level 3), increasing his mean scores to 60%, 88%, and 80% for Levels 1, 2, and 3, respectively. Immediately following intervention, his maintenance mean score across three sessions was 93.33%. Finally, his follow-up mean score was 55%. The level stability of his intervention data for Level 2 of the changing criterion was stable according to the 80% in 25% criteria. Intervention data for Level 1 and Level 3 of the changing criterion were not stable using the 80 in 25 criteria. In addition, the trend stability of his data for each changing criterion in the intervention phase was considered stable according to the 80 in 25 criteria.
The Level 1 Tau-U effect size for Brian was 0.55, 90% CI = [−0.12, 1.0]. The Level 2 Tau-U effect size for him was 1.00, 90% CI = [0.37, 1.0]. His Level 3 Tau-U effect size was 0.75, 90% CI = [0.08, 1.0]. The weighted average Tau-U of the paraphrasing intervention on his overall word problem-solving accuracy was 0.77, SE = 0.23, 95% CI = [0.32, 1.0].
Diana
Diana received seven baseline measures with a mean score of 21.43% accuracy. She was absent for one intervention session, resulting in a total of 10 intervention sessions (four Level 1, four Level 2, two Level 3). Her intervention mean scores were 30%, 40%, and 60% for Levels 1, 2, and 3, respectively. Her maintenance average score was 46.67%. At follow-up, her mean score was 5%. The level stability of her intervention data for Level 2 of the changing criterion was considered stable according to the 80% in 25% criteria. Intervention data for Level 1 and Level 3 of the changing criterion were not stable using the 80 in 25 criteria. In addition, the trend stability of her data for each changing criterion in the intervention phase was considered stable.
Diana’s Level 1 Tau-U effect size was 0.43, 90% CI = [−0.19, 1.0]. The Level 2 Tau-U effect size for her was 0.43, 90% CI = [−0.19, 1.0]. Her Level 3 Tau-U effect size was 0.00, 90% CI = [−0.80, 0.80]. The weighted average Tau-U of the paraphrasing intervention on her overall word problem-solving accuracy was 0.31, SE = 0.24, 95% CI = [−0.16, 0.78].
Daria
Daria also received seven baseline measures resulting in a mean score of 20% accuracy. She received 11 intervention sessions (four Level 1, five Level 2, two Level 3). Her Level 1 average score was 40%. Her Level 2 mean score was 50%. Finally, her average Level 3 score was 70%. Her maintenance and follow-up mean scores were 46.67% and 10%, respectively. The level stability of her Level 2 and Level 3 intervention data was considered stable according to the 80% in 25% criteria. Intervention data for Level 1 of the changing criterion were not considered stable using the 80 in 25 criteria. In addition, the trend stability of her data for each changing criterion in the intervention phase was found to be stable using the 80 in 25 criteria.
Daria’s Level 1 Tau-U effect size compared with the corrected baseline was 0.25, 90% CI = [−0.37, 0.87]. The Level 2 Tau-U effect size for her compared with the corrected baseline was 0.40, 90% CI = [−0.18, 0.98]. Her Level 3 Tau-U effect size compared with the corrected baseline was 0.64, 90% CI = [−0.16, 1.0]. The weighted average Tau-U of the paraphrasing intervention on her overall word problem-solving accuracy was 0.41, SE = 0.24, 95% CI = [−0.05, 0.88].
Mateo
Finally, Mateo received seven baseline measures with an average score of 32.86% accuracy. Mateo was absent for one intervention session and received a total of 10 intervention sessions (three Level 1, five Level 2, two Level 3). His average intervention scores were 46.67%, 78%, and 70% for Levels 1, 2, and 3, respectively. His maintenance and follow-up mean scores were 66.67% and 30%, respectively. The level stability of his intervention data for Level 3 of the changing criterion was stable according to the 80% in 25% criterion. Intervention data for Level 1 and Level 2 of the changing criterion were not considered stable using this criterion. The trend stability of his data for each changing criterion in the intervention phase was stable.
The Level 1 Tau-U effect size for Mateo compared with the corrected baseline was 0.24, 90% CI = [−0.45, 1.0]. The Level 2 Tau-U effect size for him with a corrected baseline was 0.57, 90% CI = [−0.01, 1.0]. His Level 3 Tau-U effect size compared with the corrected baseline was 0.07, 90% CI = [−0.73, 0.87]. The weighted average Tau-U of the paraphrasing intervention on his overall word problem-solving accuracy was 0.32, SE = 0.24, 95% CI = [−0.16, 0.80].
Finally, the weighted average Tau-U of the paraphrasing intervention on word problem-solving accuracy was 0.53, SE = 0.08, 95% CI = [0.37, 0.69]. This indicated a small effect size of the paraphrasing intervention on word problem-solving accuracy.
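The reported 95% confidence intervals appear consistent with a normal approximation, that is, the estimate plus or minus 1.96 standard errors. The following is a minimal sketch under that assumption; the function name `normal_ci` and the clipping of the bounds to the range [−1, 1] are illustrative choices, not details taken from the analysis itself.

```python
def normal_ci(estimate, se, z=1.96, lower=-1.0, upper=1.0):
    """Normal-approximation confidence interval, clipped to the valid Tau-U range.

    Assumption: the interval is estimate +/- z * se; the clipping to [-1, 1]
    is an illustrative choice.
    """
    lo = max(lower, estimate - z * se)
    hi = min(upper, estimate + z * se)
    return round(lo, 2), round(hi, 2)


# Overall weighted average on the experimental measure (Tau-U = 0.53, SE = 0.08)
print(normal_ci(0.53, 0.08))  # (0.37, 0.69)

# Diana's weighted average (Tau-U = 0.31, SE = 0.24)
print(normal_ci(0.31, 0.24))  # (-0.16, 0.78)
```

Both outputs match the intervals reported in the text. Small differences for other students (for example, a bound of −0.05 rather than −0.06) would be expected if the published intervals were computed from unrounded estimates.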
Summary
As predicted, students made gains in problem-solving accuracy after the administration of the intervention and in maintenance sessions immediately following the conclusion of the intervention. However, all students except one displayed considerable decreases in word problem-solving points after prolonged periods of time without access to the intervention (2 months). The level stability of each changing criterion was considered for each student. The Level 1 intervention data for Mary were considered stable using the 80 in 25 criterion (Gast & Ledford, 2014), whereas all other Level 1 data were not considered stable. It should be noted that only four data points were collected in Level 1. All Level 2 intervention data for each student were considered stable according to the 80% in 25% criterion with the exception of two students (Edgar and Mateo). Finally, Level 3 intervention data for six out of nine students were considered stable according to the 80 in 25 criterion. In addition, the trend stability of each student’s intervention data for each changing criterion was considered stable utilizing the 80 in 25 criteria. The weighted average Tau-U effect sizes varied from 0.31 to 0.79 with an overall mean of 0.53. The overall mean reflects a small effect size (Soares et al., 2016). However, it is important to note that three of the participants yielded effect sizes in the moderate range (Mary, Jane, and Brian).
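To make the level stability criterion concrete, the sketch below checks whether at least 80% of the data points in a phase fall within 25% of the phase median. It assumes the envelope is the median plus or minus 25% of the median; other conventions for the envelope width exist, and the data shown are hypothetical rather than drawn from any participant.

```python
from statistics import median


def is_level_stable(points, share=0.80, band=0.25):
    """Return True if at least `share` of points lie within +/- band * median.

    Assumption: the stability envelope is centered on the phase median with a
    half-width of `band` times the median.
    """
    mid = median(points)
    half_width = band * abs(mid)
    inside = sum(1 for p in points if abs(p - mid) <= half_width)
    return inside / len(points) >= share


# Hypothetical accuracy percentages for two phases
print(is_level_stable([40, 40, 50, 40, 40]))  # True: all points within 30 to 50
print(is_level_stable([20, 40, 60, 40]))      # False: only half the points within 30 to 50
```

With only four data points in a phase, a single point outside the envelope drops the share to 75%, which is below the 80% threshold.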
Curriculum-Based Measures
Figure 2 displays M-CAP accuracy points for each student as a function of baseline, intervention, maintenance, and follow-up sessions. Visual analysis indicated that all students demonstrated a predictable pattern of data in the baseline phase. The variability within all phases was low. All students, with the exception of Alex, displayed increases in problem-solving accuracy after the implementation of the paraphrasing intervention.

Figure 2. AIMSweb Math Concepts and Applications points accuracy as a function of baseline, treatment, maintenance, and follow-up.
The mean M-CAP scores for each student in the baseline, intervention, maintenance, and follow-up phases, respectively, are reported in Table 3. AIMSweb has presented default cut scores for each of their measures predicting probabilities of success on state tests (NCS Pearson, Inc., 2012). These cut scores are associated with 50% and 80% probability of passing the state test in math. The first cut score marks the lowest-scoring 15% of the nationally normed sample (below the 15th percentile), indicating severe risk in math (needing intensive intervention). The second cut score marks the lowest-scoring 45% of students, indicating moderate risk (defined as “at-risk” or strategic). The M-CAP cutoff scores for third graders in the Spring semester are below 8 for severe risk (below the 15th percentile), and 8 to 14 for moderate risk (15th–45th percentile).
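The cut scores can be read as a simple classification rule. The sketch below is illustrative only: the function name `mcap_risk` and the category labels are hypothetical, and treating any mean score above 14 as above the moderate risk cutoff is an assumption made so that fractional mean scores can be classified.

```python
def mcap_risk(score, severe_below=8, moderate_max=14):
    """Classify a Spring third-grade M-CAP score against the cut scores in the text.

    Assumption: scores above `moderate_max` count as above the risk cutoff.
    """
    if score < severe_below:
        return "severe"
    if score <= moderate_max:
        return "moderate"
    return "above cutoff"


print(mcap_risk(7.0))   # severe
print(mcap_risk(11.5))  # moderate
print(mcap_risk(15))    # above cutoff
```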
Table 3. M-CAP Mean Scores Across Phases.
Visual analysis
Data trends in each phase of the study were analyzed. In the baseline phase, all students demonstrated relatively flat trends before the intervention was administered. Following the intervention, slight improvements in M-CAP scores were noted in six out of nine students. Trend was not evaluated for Diana in the intervention phase because only one data point was collected. Finally, taken together, trends in the maintenance and follow-up phases were decreasing. Variability in word problem-solving accuracy during each phase was examined. Students displayed low variation in scores in the baseline and intervention phases. Variability was not analyzed in intervention scores for Diana, Daria, or Mateo because two or fewer data points were collected during this phase for these students. Taken together, students displayed low variability in the maintenance and follow-up phases. Variability in the maintenance and follow-up phases was not analyzed for James because he did not return to school the following year and data were not collected in the follow-up phase.
Mary
Mary was administered three baseline sessions, resulting in a baseline mean score of 7.00 points. Her baseline mean score indicated the need for intensive intervention. She received four M-CAP measures during the intervention phase with an intervention mean score of 11.5 points. Her intervention mean score fell within the moderate risk range. She received one maintenance measure, with a score of 12. Finally, she received two follow-up measures with a mean score of 12.5. The Tau-U effect size for her after controlling for baseline trend was 0.91, 90% CI = [0.14, 1.0].
James
James received three baseline measures with a baseline mean score of 6.33. His baseline score fell below the cutoff score designated as severe risk. He received four intervention measures, resulting in an intervention mean score of 12. His intervention mean score fell within the moderate risk range. His maintenance average score was 12.5 points. He did not return to this school the following school year, and did not participate in follow-up measures. The Tau-U for him was 1.00, 90% CI = [0.32, 1.0].
Edgar
Edgar also received three baseline measures, resulting in a mean baseline score of 7 points. His mean baseline score indicated severe risk. His four intervention measures resulted in a mean intervention score of 13.75. His mean score fell within the moderate risk range. His maintenance average score was 15. Finally, at follow-up, his mean score was 16.5 points. His maintenance and follow-up scores were above the moderate risk cutoff score. The Tau-U effect size for him was 1.00, 90% CI = [0.32, 1.0].
Alex
Alex received four baseline measures with a mean baseline score of 6.5 points. His mean score fell below the cutoff score for severe risk. He received three intervention measures, resulting in a mean intervention score of 8. His maintenance mean score was 7 points. At follow-up, his average score was 8.5 points. The Tau-U for him was 0.41, 90% CI = [−0.36, 1.0].
Jane
Jane received four baseline measures with an average score of 4.5. Her baseline mean score indicated severe risk, needing intensive intervention. She received three intervention sessions, with a mean score of 8.67. Her maintenance average score was 10. At follow-up, her mean score was 8.5. Her intervention, maintenance, and follow-up scores fell within the moderate risk range. The Tau-U for her was 1.00, 90% CI = [0.32, 1.0].
Brian
Brian received four baseline measures with an average score of 7.25. His baseline mean score fell below the severe risk cutoff score. He received three intervention measures, with an average score of 7.33. His maintenance and follow-up mean scores were 11 and 12.5, respectively. His maintenance and follow-up scores fell within the moderate risk range. The Tau-U for him was 0.17, 90% CI = [−0.61, 0.94].
Diana
Diana was absent for one baseline session, resulting in a total of four baseline measures. Her mean baseline score was 5 points, indicating severe risk. She was also absent for one intervention session, resulting in one intervention measure with a score of 11. Her maintenance mean score was 9. At follow-up, her average score was 9. The Tau-U effect size for her was 1.00, 90% CI = [−0.16, 1.0].
Daria
Daria received five baseline sessions, with a mean score of 6 points. She received two intervention measures with an average score of 8, falling within the lower end of the moderate risk range. Her maintenance mean score was 11.5. Her follow-up mean score was 8.5. The Tau-U effect size for her with a corrected baseline was −0.20, 90% CI = [−1, 0.65].
Mateo
Finally, Mateo received five baseline measures with a mean score of 11.4, indicating moderate risk. He received two intervention measures, resulting in an average of 16.5. His maintenance and follow-up scores were also above the cutoff score for moderate risk at 18.5 and 21.5, respectively. The Tau-U effect size for him was 0.70, 90% CI = [−0.15, 1.0].
Finally, the weighted average Tau-U of the paraphrasing intervention on M-CAP accuracy was 0.66, SE = 0.17, 95% CI = [0.33, 1.0]. This indicated a medium and significant effect of the paraphrasing intervention on M-CAP accuracy.
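For readers unfamiliar with Tau-U, the sketch below shows one common formulation: the number of improving baseline-intervention pairs minus the number of worsening pairs, optionally reduced by the baseline's own monotonic trend, divided by the total number of pairs. This is an illustration under stated assumptions, not a reproduction of the analysis reported here; the function names and the data are hypothetical.

```python
from itertools import product


def kendall_s(xs, ys):
    """Improving pairs minus worsening pairs across all (x, y) combinations."""
    return sum((y > x) - (y < x) for x, y in product(xs, ys))


def baseline_trend_s(baseline):
    """Kendall S for trend within the baseline (each later point vs. each earlier one)."""
    n = len(baseline)
    return sum(
        (baseline[j] > baseline[i]) - (baseline[j] < baseline[i])
        for i in range(n)
        for j in range(i + 1, n)
    )


def tau_u(baseline, intervention, correct_baseline=True):
    """Tau-U as (S_AB - S_A) / (n_A * n_B); S_A is dropped if no correction is applied."""
    s_ab = kendall_s(baseline, intervention)
    s_a = baseline_trend_s(baseline) if correct_baseline else 0
    return (s_ab - s_a) / (len(baseline) * len(intervention))


# Hypothetical scores, not data from this study
baseline = [20, 20, 30, 20]
intervention = [30, 40, 40, 50]
print(tau_u(baseline, intervention, correct_baseline=False))  # 0.9375
print(tau_u(baseline, intervention))                          # 0.875
```

In this example, 15 of the 16 pairs show improvement and one is tied, giving 0.9375 without correction. The slight upward baseline trend (S = 1) lowers the corrected value to 0.875.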
Summary
Overall, with the exception of Mateo, all students’ baseline mean scores on the M-CAP fell below the cutoff score, indicating severe risk status. This indicated that students were in need of intensive intervention. With the exception of Brian and Mateo, all students’ intervention mean scores were in the moderate risk range. Mateo’s mean intervention score fell above the cutoff score, whereas Brian remained in the severe risk category. At maintenance and follow-up, all students, with the exception of Alex, scored at least within the moderate risk range. Edgar and Mateo’s scores were above the cutoff score for risk. The Tau-U effect sizes varied from −0.20 to 1.0, with an overall mean of 0.66, reflecting a moderate effect size.
Discussion
The purpose of this study was to investigate the effectiveness of a paraphrase comprehension strategy on word problem-solving accuracy for third-grade EL students who are at risk of MD. Overall, the current study provides positive support for the effectiveness of the paraphrasing intervention on word problem-solving accuracy. The magnitude of effect sizes on the experimental measure was in the small range (Tau-U = 0.53). That is, 53% of the intervention phase data showed improvement when compared with the baseline phase. However, all students displayed a drop in problem-solving performance after prolonged periods of time without the intervention. For the general outcome measures of typical grade-level problem-solving curriculum (M-CAP), the magnitude of effect sizes was in the moderate range (Tau-U = 0.66). In contrast to the experimental problem-solving measures, students did not display significant decreases in M-CAP performance after an extended period of time without the intervention.
Taken together, the current study extends the literature base by focusing on the use of a paraphrasing comprehension strategy intervention to improve the math problem-solving skills of ELs at risk of MD. The results of this study were consistent with the recent group design studies regarding the positive effects of paraphrasing interventions on the word problem-solving skills of students with MD (Moran et al., 2014; Swanson et al., 2014). Although it is difficult to compare results directly due to different samples and methodology, Moran and colleagues (2014) found that paraphrasing relevant propositions produced an effect size of 0.93. However, that study included a mix of both ELs and English-proficient students. In addition, its students received intervention 2 times a week for 25 to 30 min over the course of 10 weeks (20 intervention sessions). The current study delivered a maximum of 15 intervention sessions over the course of 5 weeks (approximately 3 times a week, 30 min each session) and focused only on ELs.
In terms of single subject design studies, the literature for word problem-solving interventions for ELs at risk of MD is limited. To the authors’ knowledge, the only published single subject design studies for problem-solving interventions exclusively for ELs at risk of MD were conducted by Orosco and his colleagues (2013; 2014). These studies used modifications of dynamic testing (explicit prompting) within the child’s language system (Spanish) as the intervention to improve problem-solving performance. Thus, it is difficult to compare the current study with this previous work. Regardless, the current and the aforementioned studies found positive results by emphasizing academic language and comprehension strategies to address the word problem-solving skills of ELs at risk of MD.
Limitations
Despite some encouraging outcomes of this study, results should be interpreted with caution. There were at least four limitations of the current study. First, this was a small-scale study (N = 9), in which data for individuals were collected for a duration of 23 sessions. Thus, the extent to which this paraphrasing intervention could have mediated word problem-solving skills in other EL students at risk of MD for this duration of time is unknown. As a result, generalization of intervention effectiveness to other populations of students who are ELs at risk of MD is unclear.
Second, the paraphrasing intervention was delivered in a highly intensive manner. Students were taught the intervention in small groups an average of 3 times a week over the course of 6 weeks. This frequency and duration of intervention may not directly match current practice in special education.
Third, although the experimental design was a changing criterion, multiple baseline across subjects, and difficulty in problem solution increased throughout intervention, the analysis of students’ scores did not adjust for increasing difficulty. Thus, the extent to which the increasing difficulty of problem solution across sessions was captured is limited.
Finally, when students were absent, data were not collected for the sessions they missed. Because single subject design studies examine repeated measures on an individual over sessions, the extent to which missing data affected the results is unclear.
Implications for Practice and Research
The results of this study offer implications for practice and future research. An intervention that focused on a reading comprehension strategy involving paraphrasing helped ELs who were at risk of math disabilities improve problem-solving performance. However, the results also suggested that students did not maintain skills after an extended time without access to the intervention. Thus, students need continual support until a certain level of mastery or proficiency is reached. In addition, this finding emphasizes the need to incorporate techniques into the instructional protocol that promote and explicitly teach generalization.
Conclusion
In summary, this study found that ELs at risk of MD improved in problem-solving accuracy with a paraphrasing intervention. Given the positive outcomes related to the intensive use of the paraphrasing intervention, future research should identify instructional components that will sustain performance after the intensive intervention has been removed.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
