To Screen or not to Screen: Criterion-Related Validity of Math and Reading Curriculum-Based Measurement in Relation to High-Stakes Math Scores

Abstract

This study analyzed the relationship between benchmark scores from the newly published Dynamic Indicators of Basic Early Literacy Skills (DIBELS) Math probes (now published as Acadience™ Math), together with a reading curriculum-based measure, and student performance on the math section of a state-mandated high-stakes test. Participants were 420 students enrolled in third, fourth, and fifth grades in a rural southeastern school district. Specifically, this study calculated the predictive validity of benchmark scores obtained in the fall from curriculum-based measurement probes measuring math computation, math application skills, and reading ability. Results suggest that math application probes have strong predictive validity. The study also provides evidence that, even in the early grades, reading skill is associated with performance on a high-stakes math test. Finally, the study provides some evidence that calculation skills, although needed, account for less unique variance than reading ability does, even as early as third grade. Implications for practice are discussed as they relate to multiple-gating screening procedures at the elementary level.

Keywords

curriculum-based assessment < education assessment; math assessment; reading < disciplines & subjects; response to intervention/multitiered system of supports < education assessment

Many public school districts in the United States are in the process of adopting national initiatives that place emphasis on preventative programming. The most common framework emphasizes the use of multitiered systems of support (MTSS) to deliver preventative services to all students. A major component of MTSS is the use of both formative and summative data to inform educational decisions. School-wide screenings and frequent monitoring of progress are hallmarks of MTSS. As schools transition to the data-based decision-making model encompassed within the MTSS framework, one of the first decisions is what type of data to collect. Because screening and progress monitoring data are used to make significant decisions, schools need to rely on measures that are psychometrically sound.

In the academic realm, curriculum-based measurement (CBM) is one of the most popular assessment tools used by schools (Fuchs & Fuchs, 2004). Originally designed to monitor progress toward individualized education program goals for students in special education, CBM has also become a valid and reliable method for universal screening, progress monitoring, and educational planning for students in regular education settings (Marston & Magnusson, 1985). Although CBM has been used to screen large groups of students and identify those at risk since the 1980s (Deno, 1985), the recent adoption of MTSS has led to the development of many commercial CBM products. However, CBM products are often marketed and adopted for use before sufficient data exist to confirm their reliability and validity. Of particular importance is the criterion-related validity of CBM tools. As compared to the large body of research establishing the validity of CBM in reading, fewer studies have focused on CBM tools that measure mathematical skill (Wayman, Wallace, Wiley, Ticha, & Espin, 2007).

The lack of research on math measures is problematic, given that research suggests that deficits in math start early in elementary school and then persist into adulthood (Reyna & Brainerd, 2007). In 2019, the National Assessment of Educational Progress assessed mathematics achievement in a nationally representative sample of fourth-grade students and eighth-grade students. Findings indicated that 66% of eighth graders and 59% of fourth graders were below proficiency standards in mathematics (National Center for Education Statistics, 2019). Furthermore, it is estimated that nearly 7% of K-12 students have a specific learning disability in mathematics (Swanson, Jerman, & Zheng, 2009).

Dynamic Indicators of Basic Early Literacy Skills (DIBELS) Math (renamed Acadience™ in June 2019) is an assessment of mathematics skills for students from kindergarten through sixth grade. It can be used to identify students at risk for mathematics difficulties, help teachers identify areas to target for instructional support, monitor the progress of at-risk students receiving targeted instruction, and evaluate the effectiveness of core mathematics instruction (Wheeler, 2016). DIBELS Math includes three types of math measures: early numeracy (EN), computation (COMP), and concepts and applications (CAs). The problems included on DIBELS Math measures are drawn from domains of the Common Core State Standards for Mathematics (Dynamic Measurement Group, Inc., 2014a, 2014b). Similar to DIBELS Next measures, DIBELS Math uses benchmark goals, cut points for risk, and composite scores. Benchmark goals indicate the conditional probability that a student will meet later EN and COMP outcomes, meaning that a student who achieves a benchmark goal on a DIBELS Math measure is more likely to achieve later mathematics outcomes (Dynamic Measurement Group, Inc., 2014a, 2014b). DIBELS Math measures are intended to be used as general outcome measures (GOMs) of mathematics skill (Dynamic Measurement Group, Inc., 2014a, 2014b).

In a 2007 literature review on progress-monitoring measures in mathematics by Foegen, Jiban, and Deno, only 32 studies from a pool of 578 articles, dissertations, and reports related to CBM addressed mathematics measures. The majority of these studies examined the validity of CBM mathematics measures. The most commonly studied measures were the Monitoring Basic Skills Progress (MBSP; Fuchs, Hamlett, & Fuchs, 1998) and measures sampling grade-level COMP skills (Foegen, Jiban, & Deno, 2007). Another review of mathematics CBM research examined CBM mathematics COMP measures but excluded commonly used commercial products such as the MBSP and AIMSweb mathematics COMP CBM (Christ, Scullin, Tolbize, & Jiban, 2008). Christ et al. (2008) suggest the need for further research establishing the utility of mathematics CBM, similar to the vast body of research supporting the utility of oral reading fluency (ORF) CBM. Foegen et al. (2007) found that most CBM mathematics studies addressed Stage 1 research questions, which examine the technical adequacy of measures as static indicators. Results of these studies have provided preliminary data suggesting adequate reliability and validity for the particular measures included in their research. Although a number of studies have established the technical adequacy of various CBM mathematics tools, to our knowledge, none have independently explored the technical adequacy of DIBELS Math.

The Role of Reading in Mathematical Proficiency

Research indicates the need to consider reading ability when providing mathematical instruction, as reading ability and reading comprehension provide access to mathematical knowledge (Whitley, 2019). Reading ability may influence performance on many math-related tasks, as some areas of math depend on one’s ability to read and comprehend (Aaron, 1968; Jordan, Kaplan, & Hanich, 2002). For example, to complete math word problems, students need the ability to read fluently and comprehend the problems. Also, almost every high-stakes math achievement test consists mainly of multiple-choice questions that also require adequate reading skills (Helwig, Rozek-Tedesco, Heath, & Tindal, 1999). However, few researchers have explored the relationship between math and reading assessment tools. Even fewer studies have considered the role that specific basic academic skills (e.g., fluent reading) might play in seemingly unrelated basic academic skills (e.g., math COMP).

To date, research using CBM to examine the unique contribution of reading skills to performance on math assessments is sparse. Crawford, Tindal, and Stieber (2001) found moderate correlations between scores on ORF probes and mathematics performance on a high-stakes test (rs = .46–.53). More recent research by Whitley (2019) provided similar results for students in the fifth and sixth grades, finding moderate correlations between ORF (rs = .49–.55) and the mathematics portion of the Illinois State Achievement Test. Anselmo, Yarbrough, Kovaleski, and Tran (2017) also found positive correlations between MAZE passage scores (rs = .30–.34) and the North Carolina End of Grade assessment for math in a sample of seventh graders.

Purpose of the Study

In this study, DIBELS Math probes and DIBELS reading passages were used to predict math performance on a statewide achievement test in a sample of third-, fourth-, and fifth-grade students. Specifically, the following research questions were addressed: (1) What is the predictive validity of the DIBELS Math benchmark measures? (2) What is the unique contribution of reading ability to a prediction model that includes math application and calculation skills?

Method

Participants

The current study included 420 third-, fourth-, and fifth-grade students who attended two elementary schools in a southeastern school district. District-level demographic data indicated that 2% of the students were English language learners and 72% qualified for free or reduced-price lunch. Demographic information for the sample is presented in Table 1. Participants’ math and reading abilities were assessed using DIBELS benchmark math COMP, math application, and ORF probes that were administered as part of routine benchmarking procedures during the fall. In June 2019, the probes used in this study began to be published under the Acadience™ name. The criterion measure was student performance on the math portion of the North Carolina high-stakes test for each grade.

Table 1.

Participant Demographics.

		Sex		Ethnicity
Grade	Statistic	Male	Female	White	African American	Hispanic or Latinx	Multiracial	Asian	Received special education
3rd	N	89	68	99	32	15	9	2	21
3rd	Percent	57	43	63	20	10	7	1	13
4th	N	78	87	103	38	17	4	3	28
4th	Percent	46	54	62	23	10	3	2	17
5th	N	44	54	63	19	11	5	0	13
5th	Percent	47	53	64	20	11	5	0	13
Total	N	211	209	265	89	43	18	5	62
Total	Percent	50	50	63	22	10	4	1	15

Measures

Math COMP

M-CBM COMP probes measure how fluently a student can produce correct answers to grade-specific math COMP problems. Benchmark assessments in third through fifth grades were analyzed. The probes contained a mixture of the four grade-specific whole-number operations (addition, subtraction, multiplication, and division), as well as multi-digit calculation problems. Probes were administered class-wide following standardized administration procedures. Third graders had 3 minutes, fourth graders had 5 minutes, and fifth graders had 6 minutes to complete the probe. Scores were calculated by counting the digits written in the correct place-value column (digits correct). This instrument has strong reported reliability across third through sixth grades for inter-rater (r ranging from .98 to .99), test–retest (r ranging from .81 to .90), and alternate-form reliability (r ranging from .73 to .88). Criterion validity coefficients have also been shown to be high. Concurrent and predictive validity were established by comparing beginning-of-year and end-of-year COMP scores with the SAT10 Total Math score, which was administered at the end of the year. Coefficients for predictive validity (r ranging from .71 to .83) and concurrent validity (r ranging from .69 to .76) were strong (Gray, Warnock, Dewey, Latimer, & Wheeler, 2019).
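The digits-correct rule can be illustrated with a short sketch. The Python below assumes whole-number answers written as plain digit strings and aligns each response with the answer key from the ones column leftward; the function names are hypothetical, and the publisher's full scoring rules (for example, for partially completed work) may differ from this simplified convention.

```python
def digits_correct(response: str, key: str) -> int:
    """Count digits in `response` that match `key` in the same place-value column.

    Simplified convention (an assumption, not the publisher's full rules):
    whole-number answers only, aligned from the ones column leftward.
    """
    response, key = response.strip(), key.strip()
    # Pair ones with ones, tens with tens, and so on; unmatched extra digits are ignored.
    return sum(1 for r, k in zip(reversed(response), reversed(key)) if r == k)


def probe_digits_correct(responses: list[str], keys: list[str]) -> int:
    """Total digits correct across all items on a COMP probe."""
    return sum(digits_correct(r, k) for r, k in zip(responses, keys))


# 462 answered correctly (3 digits) plus "5" written for 15 (ones digit only).
print(probe_digits_correct(["462", "5"], ["462", "15"]))  # prints 4
```

Aligning from the right is what lets a partially correct answer, such as 452 written for 462, earn two digits correct rather than zero.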

Math CAs

CA probes include a selection of problems containing multistep procedures that incorporate addition, subtraction, multiplication, or division knowledge. The probes were designed to sample the math curriculum and contain items at each grade level in the following areas: operations and algebraic thinking, number and operations in base 10, number and operations: fractions, measurement and data, and geometry (Wheeler et al., 2019). Probes were administered class-wide following standardized administration procedures. Third graders had 12 minutes, fourth graders had 10 minutes, and fifth graders had 14 minutes to complete the probe. CA items are scored using an item-specific rubric. Some items are scored by correct digits and earn points for each correct digit; for instance, an item with an answer of 15 might earn 5 points for one correct digit or 10 points for two correct digits. Developers report strong reliability coefficients across third through sixth grades for inter-rater (r ranging from .99 to 1.0), test–retest (r ranging from .75 to .85), and alternate-form reliability (r ranging from .78 to .87). Criterion validity coefficients have also been shown to be high for the CA probes in grades three through five. Concurrent and predictive validity were established by comparing beginning-of-year and end-of-year CA scores with the SAT10 Total Math score, which was administered at the end of the year. Validity data are strong, with predictive validity coefficients ranging from .74 to .81 and concurrent validity coefficients ranging from .76 to .83 (Gray et al., 2019).
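A rubric that mixes per-digit partial credit with all-or-nothing items can be sketched as follows. The `Item` fields, the point values, and the fraction item are hypothetical illustrations built around the 5-points-per-digit example in the text; actual CA rubrics are item-specific and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical rubric entry for one CA item."""
    key: str                    # correct answer as written, e.g. "15" or "3/4"
    points_per_digit: int = 0   # > 0 means partial credit for each correct digit
    points_if_correct: int = 0  # used for all-or-nothing items


def score_item(item: Item, response: str) -> int:
    response = response.strip()
    if item.points_per_digit > 0:
        # Partial credit: align digits from the ones column, as in digits-correct scoring.
        matched = sum(1 for r, k in zip(reversed(response), reversed(item.key)) if r == k)
        return matched * item.points_per_digit
    # All-or-nothing: the whole answer must match the key.
    return item.points_if_correct if response == item.key else 0


def score_probe(items: list[Item], responses: list[str]) -> int:
    return sum(score_item(i, r) for i, r in zip(items, responses))


items = [Item("15", points_per_digit=5), Item("3/4", points_if_correct=3)]
# "14" earns 5 points (one correct digit); "3/4" earns the full 3 points.
print(score_probe(items, ["14", "3/4"]))  # prints 8
```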

ORF

ORF probes were used as a GOM of reading ability. ORF uses grade-level reading passages to measure the number of words a student reads correctly per minute. ORF scores were obtained by taking the median of three 1-minute administrations of separate grade-level passages. This measure has long been regarded as a psychometrically sound way to obtain information about a student’s reading ability (Fuchs, Fuchs, Hosp, & Jenkins, 2001; Shinn, Good, Knutson, & Tilly, 1992). Dewey, Powell-Smith, Good, and Kaminski (2015) reported inter-rater and alternate-form reliabilities ranging from .93 to .99 in grades three through five. Reported validity coefficients fall between .65 and .77 for predictive validity and between .65 and .74 for concurrent validity (Dewey et al., 2015).
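The median-of-three rule can be expressed in a few lines. This Python sketch assumes that words correct for each passage is computed as words read minus errors (a common ORF convention, simplified here) and uses the standard library's `statistics.median`; the function names are hypothetical.

```python
from statistics import median


def words_correct(words_read: int, errors: int) -> int:
    """Words read correctly in one 1-minute passage (simplified: words read minus errors)."""
    return words_read - errors


def orf_score(passages: list[tuple[int, int]]) -> float:
    """Benchmark ORF score: median words correct across three passages."""
    if len(passages) != 3:
        raise ValueError("ORF benchmark assumes three passages")
    return median([words_correct(read, errs) for read, errs in passages])


# Passages yield 98, 112, and 92 words correct; the median is 98.
print(orf_score([(102, 4), (118, 6), (95, 3)]))  # prints 98
```

Using the median rather than the mean limits the influence of a single passage that happens to be unusually easy or difficult for a given student.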

Criterion measure

The North Carolina end-of-grade math (NC-EOG-M) test is a statewide summative assessment that students take in grades three through eight. The assessment is given each spring to measure students’ proficiency in mathematics. The 2017 NC-EOG-M test is aligned to the North Carolina Standard Course of Study for Mathematics and measures the skills it outlines for mathematics instruction. The competency goals and skills of the mathematics curriculum are divided into five domains, which are weighted differently at each grade level: (a) operations and algebraic thinking, (b) number and operations in base ten, (c) number and operations: fractions, (d) measurement and data, and (e) geometry.

The NC-EOG-M is a summative assessment composed of multiple-choice and gridded-response items (for gridded-response items, students must produce an answer within a grid of boxes). The exam has two sections, one calculator-active and one calculator-inactive, which combine to produce the total scaled score. Items vary in the amount of reading required, but most of the test requires more than the demonstration of calculation skills. An informal analysis of a portion of a released fifth-grade test showed that a random sample of 10 questions averaged 23.7 words per question (range = 4–47 words).

Students receive two scores on the test: a scaled score and an overall achievement level. Scaled scores ranged from 430 to 465 and are placed into one of five achievement levels, reported as 1, 2, 3, 4, or 5. Levels 3, 4, and 5 are considered passing, and Levels 1 and 2 are considered not passing (performing below grade level). The most recent data reported for the NC-EOG-M test indicated reliability coefficients between .91 and .92 for third grade (Mbella, Zhu, Karkee, & Lung, 2016). The authors also report an average internal consistency (coefficient alpha) of .92 across grade levels. The standard error of measurement was also reported to be low across the tested grade levels, at 2 to 3 scaled-score points. The authors further report good classification accuracy (.90 to .96) and classification consistency (.86 to .95). Content validity is supported by test construction and item reviews completed by trained classroom teachers from across the state. Criterion-related validity evidence comes from linking scaled scores to a quantile framework, which is reported to be highly correlated, at varying levels, with the demands of college- and career-readiness standards (Mbella et al., 2016).

Procedures

Benchmarking data were independently collected and stored by school personnel prior to this study. Benchmarking occurred in September 2016, January 2017, and May 2017, during the 2016–2017 school year. Math probes were group-administered in eight third-grade classrooms, eight fourth-grade classrooms, and five fifth-grade classrooms across two elementary schools. Probes were scored by the classroom teachers after training, and scoring integrity was checked by a curriculum coordinator at each school. ORF probes were individually administered by the same classroom teachers during the same benchmark window as the math probes. Prior to benchmarking, all examiners were trained to administer and score each measure using standardized procedures.

After obtaining consent from the school district administration and the institutional review board, the research team conducted a secondary data analysis of the CBM probes and NC-EOG-M test scores for all students. Students’ scores on the three CBM predictor measures administered in the fall were compared to their scores on the NC-EOG-M test that was administered the following spring. To determine the relation between both types of math probes and high-stakes test scores after accounting for reading skill, a series of regression analyses were performed with both math measures and ORF as predictors of high-stakes test scores in math.

Results

The first research question explored the predictive validity of all three CBM measures at each grade level. Means, SDs, and zero-order correlations for all variables at each grade level appear in Tables 2–4. Results of this study suggest that math COMP, math application, and oral reading probes were related to the high-stakes test scores at each grade level. All correlations were significant at the .001 level (rs = .42–.73). Validity coefficients were the strongest between math application scores and the high-stakes test scores at each grade.

Table 2.

Zero-Order Correlations and Descriptive Statistics for Third-Grade Variables.

Measure	1	2	3	4
1. COMP	—	.47	.35	.47
2. CA		—	.52	.67
3. ORF			—	.58
4. NC-EOG-M				—
M	11.32	23.25	92.41	451.18
SD	5.94	13.36	41.04	9.00

Note. COMP = math computation probe; CA = math concepts and application probe; ORF = oral reading fluency probe; NC-EOG-M = North Carolina end-of-grade mathematics test. All correlations were significant at the .001 level.

Table 3.

Zero-Order Correlations and Descriptive Statistics for Fourth-Grade Variables.

Measure	1	2	3	4
1. COMP	—	.62	.24	.42
2. CA		—	.48	.73
3. ORF			—	.47
4. NC-EOG-M				—
M	16.15	36.76	97.90	450.82
SD	5.94	21.20	33.59	10.02

Table 4.

Zero-Order Correlations and Descriptive Statistics for Fifth-Grade Variables.

Measure	1	2	3	4
1. COMP	—	.72	.47	.58
2. CA		—	.53	.59
3. ORF			—	.54
4. NC-EOG-M				—
M	27.83	25.07	115.07	452.54
SD	15.46	16.64	32.41	8.33

Significant correlations were found between COMP scores and NC-EOG-M scores at each grade level (3rd grade: r = .47; 4th grade: r = .42; 5th grade: r = .58), with the strongest relationship occurring in the 5th-grade sample. Third-grade COMP scores accounted for 21% of the variance in the 3rd-grade NC-EOG-M scores. In the fourth grade, COMP scores explained 17% of the variance. Student scores on fifth-grade COMP probes accounted for 33% of the variance in the 5th-grade NC-EOG-M scores.

Predictive validity coefficients were strongest between CA scores and NC-EOG-M scores at each grade level, with the strongest relationship found in the 4th-grade sample (r = .73). CA scores explained 44% of the variance in third grade, 53% of the variance in fourth grade, and 35% of the variance in fifth grade.

ORF scores at each grade level were also significantly correlated with the high-stakes math test, with validity coefficients ranging from .47 to .58. ORF scores accounted for 34% of the variance in NC-EOG-M scores in third grade, 21% of the variance in fourth grade, and 28% of the variance in fifth grade.
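The percentages of variance reported in these paragraphs are squared zero-order correlations (r²). The sketch below recomputes them from the coefficients transcribed from Tables 2–4; because each r is rounded to two decimals, recomputed values can differ from the reported percentages by about one percentage point.

```python
# Zero-order predictive validity coefficients (r) with NC-EOG-M, from Tables 2-4.
VALIDITY = {
    "COMP": {3: 0.47, 4: 0.42, 5: 0.58},
    "CA": {3: 0.67, 4: 0.73, 5: 0.59},
    "ORF": {3: 0.58, 4: 0.47, 5: 0.54},
}


def variance_explained(r: float) -> float:
    """Proportion of criterion variance shared with a single predictor (r squared)."""
    return r ** 2


for measure, by_grade in VALIDITY.items():
    for grade, r in by_grade.items():
        print(f"{measure}, grade {grade}: r = {r:.2f}, r^2 = {variance_explained(r):.2f}")
```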

The second research question investigated the unique contribution of reading ability to a prediction model that includes math application and calculation skills. Specifically, researchers sought to determine the extent to which ORF scores explain variance in NC-EOG-M scores beyond the variance attributed to COMP and CA scores. To answer this question, a series of hierarchical regression analyses was conducted with both M-CBM measures and ORF as predictors of NC-EOG-M scores, entering CA first, COMP second, and ORF last. An examination of assumptions revealed no violations of normality, linearity, homoscedasticity of residuals, or collinearity. Results for each grade level are presented in Tables 5–7. The largest amount of variance was explained by math application scores and ORF scores, a pattern noted across all three grades. Math COMP scores explained a minimal amount of variance beyond that attributable to math application skill at each grade level.
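The blockwise entry reported in Tables 5–7 can be sketched with NumPy: CA is entered first, COMP second, and ORF last, and ΔR² is the increase in R² at each step. This is an illustrative sketch, not the authors' analysis software (which the study does not name). It standardizes all variables so the coefficients are comparable to the reported β values, and it omits significance tests. The function and variable names are hypothetical.

```python
import numpy as np


def zscore(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std(ddof=1)


def fit(y: np.ndarray, predictors: list[np.ndarray]) -> tuple[float, np.ndarray]:
    """OLS on standardized variables; returns R^2 and standardized betas."""
    zy = zscore(y)
    X = np.column_stack([zscore(p) for p in predictors])
    # Centering makes the OLS intercept exactly zero, so no constant column is needed.
    betas, *_ = np.linalg.lstsq(X, zy, rcond=None)
    resid = zy - X @ betas
    r2 = 1.0 - (resid @ resid) / (zy @ zy)
    return float(r2), betas


def hierarchical_regression(y: np.ndarray, blocks: list[list[np.ndarray]]) -> list[dict]:
    """Enter predictor blocks in order; record R^2, delta R^2, and betas at each step."""
    steps, entered, prev_r2 = [], [], 0.0
    for block in blocks:
        entered = entered + block
        r2, betas = fit(y, entered)
        steps.append({"R2": r2, "dR2": r2 - prev_r2, "betas": betas})
        prev_r2 = r2
    return steps
```

With hypothetical arrays of student scores named `eog`, `ca`, `comp`, and `orf`, the call `hierarchical_regression(eog, [[ca], [comp], [orf]])` would mirror the three models in each table.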

Table 5.

Curriculum-Based Measurement in Mathematics and Oral Reading Scores as Predictors of NC-EOG-M Scores for Third Grade.

Model/variable	β	SE	R²	ΔR²
1. CA	.67***	.04	.45
2. CA	.58***	.05
COMP	.19**	.10	.48	.03
3. CA	.45***	.05
COMP	.15*	.10
ORF	.30***	.01	.54	.06

Note. COMP = math computation probe; CA = math concepts and application probe; ORF = oral reading fluency probe; NC-EOG-M = North Carolina end-of-grade mathematics test. *p < .05, **p < .01, ***p < .001.

Table 6.

Curriculum-based Measurement in Mathematics and Oral Reading Scores as Predictors of NC-EOG-M Scores for Fourth Grade.

Model/Variable	β	SE	R²	ΔR²
1. CA	.73***	.03	.54
2. CA	.76***	.03
COMP	−.05	.10	.54	.001
3. CA	.69***	.03
COMP	−.03	.10
ORF	.14*	.02	.55	.02

Note. COMP = math computation probe; CA = math concepts and application probe; ORF = oral reading fluency probe; NC-EOG-M = North Carolina end-of-grade mathematics test. *p < .05, **p < .01, ***p < .001.

Table 7.

Curriculum-based Measurement in Mathematics and Oral Reading Scores as Predictors of NC-EOG-M Scores for Fifth Grade.

Model/Variable	β	SE	R²	ΔR²
1. CA	.59***	.04	.35
2. CA	.36**	.06
COMP	.32**	.06	.39	.04
3. CA	.25*	.06
COMP	.27*	.06
ORF	.28**	.02	.44	.06

In the third-grade sample, the largest amount of variance (R² = .54) was explained by CA scores (β = .45, p < .001) in combination with COMP scores (β = .15, p < .05) and ORF scores (β = .30, p < .001). Adding COMP to the model accounted for an additional 3% of the variance beyond that attributed to CAs alone; although the COMP beta weight was statistically significant, this increment was small. ORF, however, explained an additional 6% of the variance beyond that attributed to both math measures.

A similar pattern was found in the fifth-grade sample, with all three measures together explaining 44% of the variance in NC-EOG-M scores (R² = .44) and ORF making a significant contribution (β = .28, p < .01). CA scores alone accounted for 35% of the variance. Although the combination of all three measures explained the largest amount of variance, ORF scores added slightly more explained variance than did COMP scores (COMP ΔR² = .04, ORF ΔR² = .06).

The weakest contribution of COMP scores was found in the fourth-grade sample. The largest amount of variance (R² = .55) was explained by CA scores (β = .69, p < .001) and ORF scores (β = .14, p < .05). The addition of ORF scores explained 2% of the variance beyond what was attributed to CAs and COMP. The COMP beta weights were not statistically significant, and adding COMP increased R² by only .001, indicating that COMP scores did not strengthen the model.

Discussion

The results of this study present several important findings for schools and practitioners that use math universal screening instruments at the elementary level. In this study, we investigated the criterion validity of DIBELS Math measures, which include COMP CBM and math application CBM (CA) probes, by comparing benchmark scores with NC-EOG-M scores. Specifically, the predictive validity of each measure was examined. One finding of this study is that COMP scores produced moderate predictive validity coefficients in the third- and fourth-grade samples (r = .47 and r = .42, respectively). The coefficient was higher in fifth grade (r = .58). These findings are consistent with previous research completed with elementary students (Foegen et al., 2007; Fuchs et al., 1994; Fuchs, Hamlett, & Fuchs, 1999; Keller-Margulis, Shapiro, & Hintze, 2008). Although these coefficients are significant and indicate moderate correlations, they were considerably lower than the coefficients found for CA scores in the third-grade (r = .67) and fourth-grade (r = .73) samples. This pattern was not consistent in the fifth-grade sample, in which the COMP (r = .58) and CA (r = .59) correlations were very similar.

The third- and fourth-grade findings could be explained by the nature of high-stakes assessments in math. These assessments contain very few pure COMP problems; most are reasoning-based problems presented using words, tables, and charts. The mathematical skills needed to complete these types of problems go beyond fluency with grade-level COMP. Because the fifth-grade assessment is similar to the third- and fourth-grade assessments, the lack of a consistent pattern could be related to the smaller fifth-grade sample in this study.

Using regression models, we found that while COMP probes were moderately correlated with math outcomes on a high-stakes test, COMP skills contributed little unique prediction beyond that of CA scores (ΔR² = .001–.04). In our sample, reading skills at all grade levels accounted for more unique variance in NC-EOG-M scores than COMP skills did. These results were unexpected, given that COMP fluency is essential for performing grade-appropriate math tasks successfully. While unexpected, this finding is consistent with that of Anselmo et al. (2017), who reported similar results with a sample of seventh-grade students.

Several possible reasons exist for this finding in the current study. First, even though Cawley, Parmar, Foley, Salmon, and Roy (2001) state that elementary math curricula focus mainly on COMP skills, high-stakes math assessments, even at the elementary level, appear to go well beyond basic COMP skills. Therefore, math assessments require reasoning skills in addition to sufficient reading skills to comprehend the content being presented. Students starting in the third grade are expected to apply basic COMP skills to more complex math word problems and multistep analyses. The focus on conceptual math skills is further demonstrated by the domains assessed by the third- through fifth-grade NC-EOG-M, which include operations and algebraic thinking, number and operations in base 10, number and operations: fractions, measurement and data, and geometry. These content areas are representative of math reasoning and application skills.

Another possible reason as to why COMP scores did not have stronger predictive validity may be explained by the way in which questions on the NC-EOG-M are presented. High-stakes math tests at all levels are affected by students’ reading abilities due to how questions are presented (Jordan et al., 2002). Scores on such assessments appear to be the result of abilities including but not limited to COMP proficiency, reasoning skills, and overall reading competency (Anselmo et al., 2017). These findings are also consistent with previous research that suggests when reading ability was part of a prediction model, prediction accuracy increased significantly, suggesting that reading is an essential part of math ability (Thurber, Shinn, & Smolkowski, 2002).

Although our results indicate weaker criterion-related validity for COMP, the opposite was true for CA probes. The strong predictive validity coefficients for CA probes across grade levels (rs = .59–.73) agree with previous literature and suggest that these benchmark probes have very good criterion validity. CA probes consistently accounted for more of the variance in NC-EOG-M scores than either COMP or ORF. These results indicate that M-CBM probes containing multistep problems that draw on reasoning and calculation skills are well equipped to predict performance on a high-stakes math test as early as third grade. The strong correlations between the CA probe and the NC-EOG-M provide evidence that these probes are appropriate for measuring math skills at the elementary level. Our findings regarding CA probes are consistent with previous research (Fuchs et al., 1994, 1999, 2000; Jitendra, Sczesniak, & Deatline-Buchman, 2005).

Implications for Practice

CBM has several applications as a formative assessment tool. When employed as part of a functional MTSS, probes are used for screening, progress monitoring, and, at times, as part of the evaluation used to determine the need for special education services. With the growing popularity of CBM, companies have been expanding the academic areas that CBM assesses. One of the newer options to help schools monitor math skills is DIBELS Math CBM. As with any new instrument, independent validation is needed to give school professionals the best options available. The current study demonstrates that DIBELS application probes have very good predictive validity, adding to the evidence that this instrument is well suited for universal screening in mathematics at the elementary level.

Practitioners also need to understand that calculation skills, while important, do not necessarily equate to overall math knowledge, or at least to the math knowledge required of students on a high-stakes end-of-year assessment. Math ability at all levels appears to be made up of several independent skills: reasoning, reading ability, and calculation skills are all needed to perform adequately on these assessments. DIBELS CA probes appear to be more representative of the skills required to perform well on a math end-of-grade assessment. This is not surprising, considering that items are presented similarly on the two assessments, except that most of the NC-EOG-M test consists of multiple-choice questions. This is not to say that math calculation skills are unimportant; they may well be a crucial component of overall mathematical literacy. However, in the interest of returning time to instruction, schools may approach math screening with a gated approach as early as third grade. In this approach, schools would screen all students with an application-based math assessment. Students who have difficulty on this assessment would then be administered probes measuring COMP skills. This gated approach would help educators distinguish three groups: students who need further survey-level assessment because their COMP skills are lacking, students who need help with math application skills, and students who need only core instruction. While these probes are not diagnostic, they can give educators valuable information about who may need supplemental instruction within different math domains.
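The two-gate procedure described in this paragraph can be sketched as a simple decision rule. The cut scores passed in are hypothetical placeholders, not Acadience benchmark goals, and the COMP probe is represented as a callable so that it is administered only to students who fall below the CA cut, which is where the time savings of gating come from.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    CORE_ONLY = "core instruction only"
    APPLICATION_SUPPORT = "supplemental support in math application"
    COMP_FOLLOW_UP = "survey-level assessment of computation skills"


def gated_screen(ca_score: float, administer_comp: Callable[[], float],
                 ca_cut: float, comp_cut: float) -> Decision:
    """Two-gate math screen; ca_cut and comp_cut are hypothetical cut scores."""
    # Gate 1: every student takes the application-based (CA) probe.
    if ca_score >= ca_cut:
        return Decision.CORE_ONLY
    # Gate 2: only students below the CA cut are given a COMP probe.
    comp_score = administer_comp()
    if comp_score < comp_cut:
        return Decision.COMP_FOLLOW_UP
    return Decision.APPLICATION_SUPPORT
```

For example, `gated_screen(18, lambda: 9, ca_cut=25, comp_cut=15)` returns `Decision.COMP_FOLLOW_UP`, while a student scoring 30 on the CA probe would never be given the COMP probe.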

Limitations and Directions for Future Research

Readers should consider several limitations before drawing conclusions from this study. First, participants were students from three grades at two elementary schools in North Carolina. In addition, the sample was ethnically homogeneous. Thus, the findings may not generalize to populations beyond this sample.

A second limitation concerns the archival nature of the data. Researchers did not have access to the actual probes and therefore could not confirm the fidelity of administration and scoring procedures. They were, however, able to verify that all examiners had been trained to administer and score the measures according to standardized procedures.

Future research is needed to validate and extend the results of this study. Replications are recommended to examine whether these findings hold across a variety of settings and populations. It is also important to conduct this research across the United States. Statewide assessments are built on standards that each state selects, and these assessments have known problems with reliability and validity (Kingsbury, Olson, Cronin, Hauser, & Houser, 2003). Despite these issues, state assessments are still required by federal legislation, the Every Student Succeeds Act (ESSA). They also remain an important standard by which students, teachers, and school districts are evaluated. Finally, studies that account for the fidelity and integrity of administration procedures would be helpful, because adherence to these protocols could significantly affect performance.

Directions for Future Research

One of the main purposes of the current study was to provide further evidence for the criterion validity of newer math universal screening measures. Follow-up research should examine how accurately math screening measures predict high-stakes outcomes. Publishers supply benchmark goals with these assessments, so independent studies should establish area under the receiver operating characteristic curve (AUC) values. These studies should also report the sensitivity, specificity, and positive and negative predictive power of those benchmarks. Subsequent research could also use other measures that predict reading outcomes. Such work would help tease apart the unique contribution of reading and comprehension skills to performance on high-stakes math assessments.
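
The diagnostic accuracy indices named above can all be computed from screener scores and a dichotomous high-stakes outcome. The sketch below assumes a single cut score, where students scoring below the cut are flagged as at risk, and treats not reaching proficiency as the positive condition. The function names, the cut of 18, and the eight-student data set are hypothetical. AUC is computed with the pairwise (Mann-Whitney) formulation rather than by tracing the ROC curve point by point, and the two approaches give the same value.

```python
import math


def classification_stats(scores, not_proficient, cut):
    """Accuracy indices for the rule 'flag as at risk if score < cut'.

    The positive condition is failing to reach proficiency on the
    high-stakes test.
    """
    tp = fp = tn = fn = 0
    for score, failed in zip(scores, not_proficient):
        flagged = score < cut
        if flagged and failed:
            tp += 1
        elif flagged:
            fp += 1
        elif failed:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of non-proficient students flagged
        "specificity": ratio(tn, tn + fp),  # share of proficient students not flagged
        "ppv": ratio(tp, tp + fp),          # positive predictive power
        "npv": ratio(tn, tn + fn),          # negative predictive power
    }


def auc(scores, not_proficient):
    """AUC as P(non-proficient student scores lower than a proficient one).

    Pairwise (Mann-Whitney) form; tied pairs count as 0.5.
    """
    pos = [s for s, f in zip(scores, not_proficient) if f]
    neg = [s for s, f in zip(scores, not_proficient) if not f]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: screener scores and whether each student missed proficiency.
scores = [8, 12, 15, 19, 22, 25, 30, 34]
not_proficient = [True, True, False, True, False, False, False, False]
print(classification_stats(scores, not_proficient, cut=18))
print(round(auc(scores, not_proficient), 3))  # 0.933
```

Sensitivity and specificity describe the cut score itself. Positive and negative predictive power also depend on how common non-proficiency is in the sample, which is one reason independent studies in different districts may reach different values.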

Conclusion

As CBM becomes more common in schools, publishers continue to compete to provide standardized probes across subject areas that support formative assessment. To that end, many educational companies now produce M-CBM in addition to reading CBM measures. These new M-CBM measures require independent research to determine their utility in schools. DIBELS Math measures are among the newest options available for benchmarking math skills, and they appear to be an adequate option with respect to criterion validity. In addition, districts may reduce the time teachers spend on assessment and scoring by using these measures as part of a gated procedure. Finally, schools should keep in mind that high-stakes math assessments require adequate reading skills.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Giancarlo A. Anselmo

References

Aaron, I. E. (1968). Reading in mathematics. In V. M. Howes & H. F. Darrow (Eds.), Reading and the elementary school child: Selected readings on programs and practices (pp. 70-74). New York: Macmillan.

Anselmo, G. A., Yarbrough, J. L., Kovaleski, J. F., & Tran, V. N. (2017). Criterion-related validity of two curriculum-based measures of mathematical skill in relation to reading comprehension in secondary students. Psychology in the Schools, 54(9), 1148-1159. doi:10.1002/pits.22050

(2001). Arithmetic performance of students: Implications for standards and programming. Exceptional Children, 67(3), 311-328.

(2008). Implications of recent research: Curriculum-based measurement of math computation. Assessment for Effective Intervention, 33, 198-205.

Crawford, Tindal, & Stieber (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7(4), 303-323. doi:10.1207/S15326977EA0704_04

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52(3), 219-232. doi:10.1177/001440298505200303

(2015). Acadience™ reading technical adequacy brief. Eugene, OR: Dynamic Measurement Group, Inc.

Dynamic Measurement Group, Inc. (2014a). How DIBELS® math relates to the common core state standards in mathematics. Retrieved from https://dibels.org/papers/DIBELSMath_CCSS_table.pdf

Dynamic Measurement Group, Inc. (2014b). DIBELS® math preliminary benchmark goals and composite scores. Retrieved from https://dibels.net/

Foegen, Jiban, & Deno (2007). Progress monitoring measures in mathematics: A review of the literature. The Journal of Special Education, 41, 121-139. doi:10.1177/00224669070410020101

(2004). Determining adequate yearly progress from kindergarten through grade 6 with curriculum-based measurement. Assessment for Effective Intervention, 29, 25-37. doi:10.1177/073724770402900405

(1994). Technical features of a mathematics concepts and applications curriculum-based measurement system. Diagnostique, 19(4), 23-49. doi:10.1177/073724779401900403

(2001). Oral reading fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading, 5(3), 239-256. doi:10.1207/S1532799XSSR0503_3

(2000). The importance of providing background information on the structure and scoring of performance assessments. Applied Measurement in Education, 13(1), 1-34. doi:10.1207/s15324818ame1301_1

(1998). Monitoring basic skills progress: Basic math manual. Austin, TX: PRO-ED.

(1999). Monitoring basic skills progress: Basic math concepts and applications [computer program manual]. Austin, TX: PRO-ED.

(2019). Acadience™ math technical adequacy brief. Eugene, OR: Acadience Learning Inc.

(1999). Reading as an access to mathematics problem solving on multiple-choice tests for sixth-grade students. The Journal of Educational Research, 93(2), 113-125. doi:10.1080/00220679909597635

(2005). An exploratory validation of curriculum-based mathematical word problem-solving tasks as indicators of mathematics proficiency for third graders. School Psychology Review, 34(3), 358-371.

(2002). Achievement growth in children with learning difficulties in mathematics: Findings of a two-year longitudinal study. Journal of Educational Psychology, 94(3), 586-597. doi:10.1037/0022-0663.94.3.586

(2008). Long-term diagnostic accuracy of curriculum-based measures in reading and mathematics. School Psychology Review, 37(3), 374-390.

Kingsbury, Olson, Cronin, Hauser, & Houser (2003). The state of state standards: Research investigating proficiency levels in fourteen states. Lake Oswego, OR: Northwest Evaluation Association.

Marston & Magnusson (1985). Implementing curriculum-based measurement in special and regular education settings. Exceptional Children, 52(3), 266-276. doi:10.1177/001440298505200307

(2016). The North Carolina testing program technical report mathematics assessments: Technical report. Retrieved from http://ncdpi.edu

National Center for Education Statistics. (2019). The nation’s report card: Mathematics 2019. Washington, DC: Institute of Education Sciences, U.S. Department of Education.

Reyna, V. F., & Brainerd, C. J. (2007). The importance of mathematics in health and human judgment: Numeracy, risk communication, and medical decision making. Learning and Individual Differences, 17(2), 147-159. doi:10.1016/j.lindif.2007.03.010

(1992). Curriculum-based measurement of oral reading fluency: A confirmatory analysis of its relation to reading. School Psychology Review, 21, 459-479.

(2009). Math disabilities and reading disabilities: Can they be separated? Journal of Psychoeducational Assessment, 27, 175-196. doi:10.1177/0734282908330578

(2002). What is measured in mathematics tests? Construct-validity of curriculum-based mathematics measures. School Psychology Review, 31(4), 498-513.

(2007). Literature synthesis on curriculum-based measurement in reading. The Journal of Special Education, 41(2), 85-120. doi:10.1177/00224669070410020401

Wheeler, C. E. (2016, April). DIBELS® math: An overview for kindergarten–sixth grade. Paper presented at the Annual Oregon Response to Instruction and Intervention Conference. Retrieved from https://dibels.org/papers/Courtney_Wheeler_Oregon_RTI_Conference_2016.pdf

Wheeler, C. E., Lembke, E. S., Richards-Tutor, Wallin, Good, R. H., Dewey, E. N., & Warnock, A. N. (2019). Acadience™ math assessment manual. Oregon: Acadience Learning Inc.

Whitley (2019). Oral reading fluency and maze selection for predicting 5th and 6th grade students’ reading and math achievement on a high stakes summative assessment. Reading Improvement, 56(1), 24-35.