Abstract
Students’ understanding of and proficiency with rational number concepts and operations are considered key foundational skills for future success in algebra. As middle school students work with these concepts, teachers need timely data to determine whether students are making adequate progress. The purpose of this article is to document the content specifications and technical adequacy data of three experimental measures for Grades 6 to 8 that are intended to provide teachers with ongoing information about students’ development of algebra-readiness concepts and skills. The measures were administered to a total of 575 students in Grades 6 to 8 in two states. Results from a systematic content review indicate that the content assesses key algebra-readiness concepts and skills. Pilot study results suggest that the measures may provide teachers with technically adequate data.
Algebra, often referred to as the “gatekeeper” to advanced mathematics, can be defined as generalized arithmetic in which the problem solver extends his or her knowledge beyond computation of concrete numbers to focus on relationships between quantities (Wu, 2001). Students need a strong foundation in algebra to succeed in advanced mathematics and science classes in high school, and those students who pass Algebra II are more than four times as likely to graduate from college as students with lower levels of mathematics (Adelman, 1999). Moreover, algebra is routinely used in many professions (e.g., business management and marketing, computer science, plumbing) and in daily living activities (e.g., calculating measurements, unit pricing). Student performance in high school algebra is highly correlated with college attendance and graduation, career readiness, and higher income (National Mathematics Advisory Panel [NMAP], 2008). However, on a recent assessment for the High School Longitudinal Study of 2009 (National Science Board, 2012), most ninth-grade students struggled with three of the five assessed subcomponents of algebra: Approximately 40% demonstrated mastery in algebraic equivalence, 18% in systems of equations, and 9% in linear functions. Although 86% of the ninth-grade students demonstrated proficiency in algebraic expressions and 59% in multiplicative and proportional thinking, these skills were identified as the least complex of the assessed content.
Examination of National Assessment of Educational Progress (NAEP; National Center for Education Statistics, 2013) algebra subscales reveals differences in algebra readiness based on student characteristics and demographics as early as Grade 4. Students with disabilities, students who are English language learners (ELL), and students eligible for the National School Lunch Program scored statistically significantly lower (p < .01) on the NAEP algebra subscale when compared with their respective peers. White students performed statistically significantly higher than African American and Hispanic students. These differences persist on the Grade 8 NAEP algebra subscales. Taken together, results from these national assessment systems point to a pressing need to improve U.S. mathematics education.
Addressing the need for increased mathematics achievement for all students is a formidable challenge. Mathematics instruction is often based on materials that focus on a “one size fits all” classroom with little attention to the vast differences in background experience, knowledge, and entry skills of students, especially those with or at risk for disabilities (Woodward & Brown, 2006). Without systematic differentiation of instructional support, struggling learners may continue to fall further behind their peers, and achievement gaps will continue to widen (Gersten et al., 2009). To support struggling students, teachers need to be better informed about students’ current and ongoing learning needs as well as their response to instructional changes. Carefully designed algebra-readiness assessments that are sensitive to student growth are needed to allow timely and informed instructional decisions. Although research and development on progress-monitoring measures for high school algebra is ongoing (cf. Foegen, Olson, & Impecoven-Lind, 2008), few technically adequate assessment systems are available to provide this formative information to middle school teachers, especially for algebra and algebra readiness.
The purpose of this study is to document the content specifications and examine the technical adequacy data of three experimental measures intended to provide teachers with ongoing information about middle school students’ algebra readiness. Specifically, these measures are designed to assess skills that are necessary for success in algebra, including conceptual understanding of rational numbers, facility with and the ability to reason using proportions, and application of number properties. We focus on these topics due to their importance in establishing foundations for algebra (NMAP, 2008).
Foundations for Algebra
To prepare for success in algebra, the NMAP (2008) recommended that students develop procedural fluency with whole number operations, conceptual understanding of rational number systems, and proficiency operating with rational numbers. The developmental sequence found in many curricular resources, including the Common Core State Standards for Mathematics (CCSS-M; National Governors Association & Council of Chief State School Officers, 2010), focuses first on the development of students’ conceptual understanding of whole numbers and then on proficiency with operations. Next, students generalize these skills by extending their understanding of whole numbers to fractions and decimals and then performing operations with them. Algebra requires students to further generalize these arithmetic principles to solve abstract problems involving symbolic notation. Building on our previous work defining the foundational components of algebra (Ketterlin-Geller & Chard, 2011), we hypothesize that students’ conceptual understanding of number systems, facility with basic number properties, and understanding and application of operations are the basis of students’ algebra readiness in middle school mathematics.
Understanding Number Systems
Foundational to students’ ability to reason algebraically is their knowledge of number systems and the relationships between those systems. Deepening students’ understanding of natural numbers, whole numbers, integers, and rational numbers supports them in making connections between number systems and mathematical operations (e.g., subtraction and negative integers). Specifically, students’ understanding of fractions in upper elementary and middle grades is a significant predictor of later achievement in high school algebra and mathematics even when controlling for general intellectual ability, other mathematics content knowledge, and family background (Siegler et al., 2012). Because numeracy and, more specifically, understanding rational numbers are critical for success in algebra (NMAP, 2008), students need to develop a solid conceptual understanding of number systems in the middle grades.
Place value and magnitude are central concepts that support students’ understanding of number systems (Wu, 2001). However, reading numbers correctly using place value language does not guarantee that students understand the value of each numeral (Milgram, 2005). Decomposing and composing numbers into their component values (374 = 300 + 70 + 4) deepens students’ knowledge of number (Gersten et al., 2009) and assists them in representing numbers in a variety of ways. Place value and magnitude concepts also assist students in making connections between number systems. For example, when locating and ordering rational numbers on a number line, students begin to understand that any whole number can be represented as a rational number and that multiple representations of a number may exist within a system (e.g., fractions and decimals). Understanding representations of the magnitude of whole numbers (Booth & Siegler, 2008) and fractions (Siegler, Thompson, & Schneider, 2011) is significantly related to students’ overall mathematics achievement, as well as performance on whole number and fraction arithmetic, respectively. Because of the critical link between students’ understanding of these two central concepts (i.e., place value and magnitude) and their knowledge of number systems, and the critical link between students’ knowledge of number systems and their performance in algebra, it follows that students’ understanding of place value and magnitude concepts is a prerequisite skill for algebraic proficiency.
Facility With Basic Number Properties
In addition to understanding number systems, students’ knowledge and application of number properties affect their algebra readiness and their ability to reason algebraically. These number properties include the distributive property, the commutative, associative, identity, and inverse properties of addition and multiplication, and mathematical equality. Equality can be considered one of the most central ideas students must understand to reason algebraically (Geary et al., 2008). However, many students have misconceptions about the equal sign, with some identifying it as an operator. This misconception stems from early mathematics experiences in which the equal sign often indicates to students that they need to perform an operation (e.g., 40 + 8 = ___). Viewing the equal sign in this manner may prevent students from developing essential algebra-readiness skills (Kieran, 1981). Conceptually understanding equality prepares students to properly manipulate equations.
In addition, being able to flexibly apply number properties enables students to efficiently solve contextualized or unfamiliar problems and apply and understand mathematical algorithms (Geary et al., 2008). Decomposing and composing numbers, which relies heavily on number properties, enables students to work flexibly with numbers during procedural computations, understand the mechanisms behind mathematical algorithms, and generalize their understandings to abstract situations. Furthermore, the ability to manipulate numbers using these properties lays the foundation for the manipulation of symbols in algebra.
Understanding and Application of Algorithms
To efficiently perform mathematical operations, such as addition, subtraction, multiplication, and division, mathematical algorithms are commonly used. A mathematical algorithm is generally considered to be a computational process or sequence of specific steps that can be used to solve a specific task (Wu, 2010). Although many algorithms, such as the traditional long-division algorithm, contain multiple steps, others are less formal and less complex, such as counting on to add two numbers (Geary et al., 2008). Proficiency in executing algorithms, specifically whole number division, significantly predicts upper elementary and middle school students’ later achievement in high school algebra and mathematics (Siegler et al., 2012).
Understanding why algorithms work and being able to apply algorithms enables students to transfer their mathematical knowledge to new situations and contexts (Siegler et al., 2011). As stated previously, students’ knowledge of basic number properties can serve as an anchor for understanding the conceptual underpinnings of algorithms. As students develop fluency with algorithms, they are able to use their working memory to solve more complex problems instead of having to focus on procedural computations.
Development of these skills, however, is not an intuitive process and, as recent achievement data suggest, may not be occurring in upper elementary and middle school classrooms. Multiple reasons may explain these results; however, as noted in the NMAP (2008), students need additional instructional support to extend their understanding of whole numbers to other number systems (including rational numbers and integers), solidify their understanding of properties of whole numbers, and generalize their knowledge of operations to rational numbers. For teachers to monitor students’ progress in these concepts and respond with timely and individualized instructional changes, technically adequate assessment systems are needed. The purpose of this article is to document the design of and technical adequacy data for three measures intended to provide teachers with timely data that can be used to help make informed instructional decisions to support students’ development of algebra-readiness skills and knowledge.
Using Progress-Monitoring Data to Inform Instruction
Progress monitoring is a process of frequently collecting performance data to evaluate a student’s response to instruction. Data from progress-monitoring measures are examined to determine whether a student is progressing at an adequate rate to reach his or her instructional goals (Fuchs, 2004). Baseline data are gathered and used to set performance goals, progress-monitoring measures are administered at regular intervals at least monthly, and observed data are evaluated in relation to the established goal. Because students’ scores are compared with their own performance over time, empirical data on individual student growth are generated (Stecker, Fuchs, & Fuchs, 2005). When superimposed with changes in students’ instructional programming, these data can be used to evaluate students’ response to instructional changes. Researchers (Foegen, Espin, Allinder, & Markell, 2001; Stecker et al., 2005) have found that this process of making instructional decisions based on systematic evaluation of progress-monitoring data is associated with improvements in students’ mathematics achievement in the elementary grades.
To facilitate this process, progress-monitoring measures must meet specific criteria. Originally conceptualized within the framework of curriculum-based measurement (CBM), progress-monitoring measures must be relevant to the curriculum in which decisions are made, be sensitive to small changes in student learning, have multiple parallel forms for frequent administration, and be efficient to administer and score to maximize instructional time (Kelley, Hosp, & Howell, 2008). Depending on the grade and type of measure, most mathematics CBMs include approximately 20 to 60 items per measure, and students complete as many items as possible within a preset time limit (i.e., 2–8 min). Items are scored dichotomously or students receive partial credit based on the number of digits recorded correctly. Scores are graphed over time for teachers to evaluate student growth.
Middle School Mathematics Progress Monitoring
Because of the growing concern about students’ preparedness for high school algebra, providing middle school mathematics teachers with data to monitor students’ progress in algebra-related concepts may support their instructional decisions. As described earlier, becoming ready for algebra involves integrating mathematical knowledge and skills across number systems to reason abstractly about problem situations. In contrast to progress-monitoring measures that assess students’ acquisition of computational skills, measures that assess students’ conceptual understanding of mathematics may provide more useful data about students’ algebra readiness. Initial results from CBMs that measure students’ conceptual understanding support this proposition: Research conducted by Helwig, Anderson, and Tindal (2002) found moderate to strong correlations between concept-based CBM tasks in mathematics and state assessment results for 171 Grade 8 students (r = .61 for students in special education; r = .80 for students in general education). Similarly, when examining the relationship between student performance on concept-based CBMs and other measures of mathematical proficiency, Foegen (2008) found moderate correlations for 79 to 84 Grade 6 students (r = .58–.71) and strong correlations for 73 to 77 Grade 7 students (r = .68–.87). In comparison with CBMs assessing computational skills, concept-based CBMs appear to be closely associated with other mathematics achievement measures for middle school students (Foegen, 2008; Helwig et al., 2002).
Extending this work to high school algebra, Foegen et al. (2008) created a series of progress-monitoring measures to provide teachers with information about students’ conceptual understanding and procedural fluency with algebraic skills. Measures of students’ basic skills in algebra and foundational concepts of algebra were designed as robust indicators of students’ algebra proficiency, whereas a content analysis measure was closely associated with the curriculum. Through repeated development and refinement, acceptable levels of evidence for reliability and validity were obtained. Although strong evidence was observed for the measures that emphasize procedural fluency, Foegen and Morrison (2010) suggested that additional research and development would be needed to design measures that align with curricular expectations associated with students’ conceptual understanding.
Our work builds on these lines of research to develop concept-based measures that can provide teachers with meaningful data about middle school students’ algebra-readiness knowledge and skills. We developed three experimental measures for Grades 6 to 8 to monitor students’ conceptual understanding of rational numbers, ability to reason about proportions, and application of number properties. Because these concepts broadly represent the knowledge and skills students need to be successful in algebra, the experimental measures are intended to serve as robust indicators of algebraic readiness. The purpose of this study is to document the content specifications and examine the technical adequacy data of these measures that we collectively refer to as the Algebra Readiness Progress-Monitoring (ARPM) measures. Results from this study will be used to guide the future development of the ARPM measures with the intention of creating a progress-monitoring system.
Research questions addressed in this study include
Method
Content Review
To determine whether the ARPM measures appropriately represent the critical foundations of algebra, content-related evidence was evaluated. Three mathematics education experts and three mathematics teachers reviewed the test protocols for alignment with mathematical skills and knowledge needed for success in algebra.
Content-expert review
Mathematics education experts were recruited if they demonstrated expertise in both mathematics and mathematics education. Criteria included earned undergraduate degree in mathematics, earned master’s and doctoral degrees in mathematics or mathematics education, current appointment as a university faculty member, and sustained engagement with the mathematics education community as measured by involvement in state or national mathematics education organizations.
Three mathematics education experts who met the above-stated criteria were selected from a convenience sample of more than 35 experts. All experts served as faculty members in mathematics departments at universities in Texas and Oregon and taught courses to preservice educators. Two women and one man participated; they had 11, 14, and 19 years of experience. Although not faculty in schools of education, the mathematics education experts had experience in mathematics education and regularly engaged with the mathematics education community through publications, presentations, and/or other service.
The mathematics education experts reviewed all ARPM forms in Grades 6 to 8 for mathematical accuracy and appropriateness of symbolic notation. Ratings were solicited for each measure on a 4-point scale (1 = not at all, 2 = somewhat, 3 = mostly, 4 = extremely).
Instructional-expert review
Teachers were recruited to participate as reviewers if they demonstrated significant experience preparing students for success in algebra. Criteria included a valid teaching certificate, at least 5 years of experience teaching mathematics, expertise in supporting students’ algebra readiness, and experience developing classroom assessments.
Three teachers who met the above criteria were selected from a convenience sample of more than 150 teachers. All reviewers were female and had undergraduate and graduate degrees in education. The reviewers had experience teaching mathematics in Texas to students in Grades 5 to 9 for 7, 8, and 9 years. Two reviewers served as mathematics facilitators working directly with elementary through middle school teachers to support their instructional practices.
The instructional experts rated each measure on six categories: clarity of directions, appropriateness of content, appropriateness of test layout, instructional relevance, sensitivity to growth, and efficiency of administration. Ratings were solicited for each measure on a 4-point scale similar to the one described above.
Pilot Test Procedures
Participants
Students were recruited at the classroom level from a convenience sample of schools in two states. Schools with which the researchers had established relationships were invited to participate. Participants included 154 Grade 6 students, 173 Grade 7 students, and 250 Grade 8 students from four public middle schools. School 1 was a public charter school located in an urban community in the south; 100% of the mathematics classes participated. School 2 was located in a rural community in the south; 100% of the mathematics classes participated. School 3 was located in a suburban community in the Pacific Northwest; five mathematics classes serving students in Grades 7 and 8 participated. School 4 was a suburban middle school in the Pacific Northwest; four Grade 8 mathematics classes participated.
Demographic information was collected from students’ self-report. Because the records we received were incomplete and self-report data are unreliable, school-level demographic data obtained from the states’ departments of education websites are presented in Table 1. Because students were recruited at the classroom level, we anticipated that the demographic characteristics of the sample closely matched the demographics of the school; however, it was not possible to verify this assumption. The study was conducted at the end of the spring semester after the state testing windows had closed. During the study year, all schools received satisfactory performance ratings in mathematics from their state education agencies.
Table 1. Demographic Characteristics for Participants’ Schools Obtained From the States’ Departments of Education.
Note. Econ disadv = economically disadvantaged; met stds = met standards in mathematics; PNW = Pacific Northwest.
This school serves students in Grades 5 to 8. These data are inclusive of all students.
Measures
Sample items for each measure are included in Figure 1.

Figure 1. Sample student directions and items for the Algebra Readiness Progress-Monitoring measures in Grade 6.
Number Properties (NP) algebra-readiness measure
The NP measure assesses students’ ability to recognize and use number properties to efficiently solve arithmetic problems. In each of the 54 items included on this measure, students evaluate two expressions and then fill in the blank with the correct symbol (greater than, less than, or equal sign) to make the statement true. Answers are scored dichotomously as correct or incorrect.
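The item format described above can be illustrated with a short Python sketch that evaluates two expressions and returns the symbol that makes the statement true. The function name `comparison_symbol` and the sample item are hypothetical illustrations, not items drawn from the ARPM forms.

```python
from fractions import Fraction


def comparison_symbol(left, right):
    """Return '<', '>', or '=' so that `left symbol right` is a true statement."""
    # Fractions keep rational-number comparisons exact (no rounding error).
    left, right = Fraction(left), Fraction(right)
    if left < right:
        return "<"
    if left > right:
        return ">"
    return "="


# Hypothetical item in the style of the NP measure: a student who recognizes
# the distributive property can see the equality without computing either side.
left_value = 6 * (10 + 4)
right_value = 6 * 10 + 6 * 4
symbol = comparison_symbol(left_value, right_value)
```

Scoring such an item dichotomously then amounts to checking whether the student's symbol matches the returned one.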
The content sampled for the measure is based on the National Council of Teachers of Mathematics (NCTM; 2006) Curriculum Focal Points and is aligned with the CCSS-M. Specifically, the CCSS-M requires that students in Grades 6 to 8 know, apply, and use number properties to generate equivalent expressions (e.g., 6.EE.3, 6.NS.4, 7.EE.4, 7.NS.1, 8.NS.1; EE stands for “Expressions and Equations,” and NS stands for “The Number System”). In accordance with these standards, the number properties assessed across Grades 6 to 8 are the commutative, associative, inverse, and identity properties of addition and multiplication as well as the distributive property. As the NCTM Curriculum Focal Points and the CCSS-M for Grades 6 to 8 emphasize students’ facility with whole numbers, rational numbers, and integers, the NP measure also emphasizes these number systems.
Quantity Discrimination (QD) algebra-readiness measure
The QD measure assesses students’ ability to quickly recognize magnitude differences within and between grade-level-appropriate number systems. In each of the 54 items included on this measure, students evaluate two numbers and circle the number that is greater. Items are forced choice between two options and scored dichotomously.
Pertinent content for this measure was cross-referenced using the NCTM Curriculum Focal Points and the CCSS-M for Grades 6 to 8. Students in these grades should be able to compare, order, and determine equivalence among numbers in a variety of number systems (e.g., 6.NS.6, 6.NS.7, 7.NS.1, 8.NS.2). This measure tests students’ conceptual knowledge about magnitude within and among number systems, including whole numbers, fractions, and decimals at Grade 6; integers and rational numbers at Grade 7; and whole, rational, and irrational numbers, and negative numbers across all number systems at Grade 8.
Proportional Reasoning (PR) algebra-readiness measure
Students’ ability to read, order, and compare proportions is assessed using grade-level-appropriate representations in the PR measure. In each of the 54 items included on this measure, students evaluate the proportion and select the symbol (greater than, less than, or equal sign) that makes the statement true. Items are scored dichotomously.
In accordance with the NCTM Curriculum Focal Points and the CCSS-M, students must be able to extend their work with ratio to recognize and solve proportions (e.g., 6.RP.1, 7.RP.1, 8.EE.5; RP stands for “Ratios and Proportional Relationships”). In this measure, students compare proportions represented in percent notation (percent of 100 or percent of a whole) or number notation expressed as a part of a whole (e.g., 8 out of 32) or a fraction of a whole (e.g., 3/4 of 16).
In addition to being aligned with content standards from the NCTM Curriculum Focal Points and the CCSS-M, all three of the measures incorporate Standards for Mathematical Practice from the CCSS-M. More specifically, these measures require that students “Make sense of problems and persevere in solving them,” “Reason abstractly and quantitatively,” and “Look for and make use of structure.”
Procedures
The grade-level-specific experimental ARPM measures were administered to participants by the lead researcher and/or trained research assistants. To ensure comparability in administration across sites, test administrators participated in a training session to review the standardized administration protocol and the scripted directions. Although fidelity observations were not conducted, test administrators were asked to report any deviations from the standardized protocol.
Each participant received a booklet with one of two parallel forms of the three ARPM measures. Anchor items representing a range of difficulty levels were embedded on each form to allow for equating across pilot test forms. To minimize the impact of fatigue, two versions of each form were created to redistribute the same items within the first half and second half of the form, respectively. Form administration was counterbalanced within and across forms. Booklets were randomly distributed to participants. Makeup sessions were not provided for students who were absent on the day of testing.
Testing was completed within one class period. After delivering the scripted directions, participants had the remainder of the class period (approximately 40 min) to complete the testing booklet. Most students finished within the allotted time; additional time was not provided. Untimed administration was required to collect sufficient data on all items on the pilot test forms. Calibrating items under timed conditions may result in biased item parameter estimates due to scoring unreached items at the end of the form as incorrect and/or the possibility of random guessing as time runs out (Bolt, Cohen, & Wollack, 2002). Because results from the pilot test were used to create future parallel test forms, data on all items were required to obtain accurate item parameter estimates.
Two research assistants independently scored the responses according to an answer key. Data were entered into a spreadsheet and verified for accuracy by a third research assistant. For attempted measures, unanswered items were marked as incorrect. For measures that were not attempted, unanswered items were recorded as missing.
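The missing-data rule described above can be sketched as follows. This is a minimal illustration that assumes responses are stored as a list with `None` for unanswered items and that a measure counts as attempted when at least one item was answered; the function name, the data layout, and that definition of "attempted" are assumptions, not details reported in the study.

```python
def score_measure(responses, answer_key):
    """Score one measure dichotomously (1 = correct, 0 = incorrect).

    responses: list of student answers, with None for unanswered items.
    answer_key: list of correct answers, same length as responses.
    Returns a list of 1/0 scores, or a list of None values when the
    measure was not attempted at all (recorded as missing).
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must be the same length")

    # Assumed definition: attempted means at least one item was answered.
    attempted = any(r is not None for r in responses)
    if not attempted:
        # Measure not attempted: every item is recorded as missing.
        return [None] * len(responses)

    # Measure attempted: unanswered items (None) never match the key,
    # so they are marked incorrect (0).
    return [1 if r == k else 0 for r, k in zip(responses, answer_key)]
```

Keeping missing values distinct from zeros matters for calibration, because an unattempted measure carries no information about item difficulty.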
Analyses
Item response theory (IRT) is ideally suited for developing measures in which item and form comparability are important. Classical test theory (CTT) does not provide an adequate model for creating comparable tests because of sample dependence: The values of relevant item statistics (i.e., item difficulty and item discrimination) depend on the sample with which the pilot testing was conducted (Hambleton, Swaminathan, & Rogers, 1991). As such, many of the conclusions drawn from test results are no longer valid for populations whose demographic characteristics do not closely match the normative sample. IRT, however, is not sample dependent. Data can be meaningfully interpreted from any reasonable subset of the population used in the original scaling of items (Embretson & Reise, 2000). Properly scaled measures can be administered to groups of students with varying characteristics with the expectation of equally valid inferences from each group.
IRT is also ideally suited for developing measurement systems in which students respond to different sets of items over time. Students’ scores from tests developed using CTT are based on the difficulty of the items sampled to create the test form: Students will have lower scores when taking tests with difficult items as compared with tests with less difficult items. As such, variability in students’ scores over time may be due to differences in item difficulty as opposed to changes in students’ knowledge and skills. In contrast, IRT models scale students’ responses to generate an ability estimate that is independent of the specific items sampled on the test (Hambleton et al., 1991). When administered properly scaled items, students’ ability estimates are comparable regardless of the specific set of items. These defining features of IRT support its use for creating comparable forms of progress-monitoring measures.
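The dependence of raw scores on form difficulty can be made concrete with a small sketch. Assuming a one-parameter logistic model, the code below computes one student's expected number-correct score on two hypothetical forms; the ability value and the item difficulties are invented for illustration and are not ARPM estimates.

```python
import math


def p_correct(theta, b):
    """Probability of a correct response under a one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_raw_score(theta, difficulties):
    """Expected number-correct score for ability theta on a set of items."""
    return sum(p_correct(theta, b) for b in difficulties)


theta = 0.0                          # hypothetical student ability (logits)
easy_form = [-1.5, -1.0, -0.5, 0.0]  # hypothetical item difficulties
hard_form = [0.0, 0.5, 1.0, 1.5]

easy_score = expected_raw_score(theta, easy_form)
hard_score = expected_raw_score(theta, hard_form)
```

The same ability yields a higher expected raw score on the easier form, so raw scores from forms of unequal difficulty can shift without any change in the student's knowledge. Placing items on a common scale is what allows ability estimates to be compared across forms.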
IRT is gaining popularity with developers of progress-monitoring systems. Of the 13 publishers listed on the National Center on Intensive Intervention (2014) progress-monitoring tools chart, four explicitly reference the use of IRT models as the analytic framework for creating progress-monitoring measures. For example, Anderson, Lai, Alonzo, and Tindal (2011) used the Rasch model to report validity evidence for using the easyCBM® assessment system to identify and monitor progress of low-performing students. Similarly, the STAR Math Assessment (Renaissance Learning, 2014) used IRT modeling in the design of their screening and progress-monitoring systems.
Results from the pilot test of the ARPM measures were analyzed using the Rasch (one-parameter logistic IRT) model. The Rasch model was selected because of the strict assumption that the probability of a person responding correctly to a given item is governed by the person’s ability and the item difficulty; item discrimination and guessing are not estimated in the Rasch model (Wright, 1977). Items that do not fit the Rasch model violate this assumption. Given the selected response nature of the ARPM measures, of specific concern was the susceptibility of items to guessing. However, as the tenets of the Rasch model suggest, items prone to error associated with guessing do not fit the model and will be subsequently excluded. Items with acceptable goodness-of-fit statistics meet the assumptions of the Rasch model.
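For reference, the standard form of the Rasch model expresses the probability that person n answers item i correctly as a function of the difference between person ability and item difficulty. The symbols below follow common IRT notation and are not taken from the original article.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Here \theta_n is the ability of person n and b_i is the difficulty of item i, both on the logit scale. When ability equals difficulty, the probability of a correct response is .50, and no discrimination or guessing parameter appears in the expression.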
The content review and pilot study were conducted concurrently.
Results
Content Representation of ARPM Measures
We asked mathematics education experts and middle school mathematics teachers to thoroughly review the measures to determine whether the assessed skills and knowledge adequately represent the critical foundations of algebra. The mathematics education experts rated the PR and QD measures as extremely accurate, both mathematically (M = 4.0, SD = 0) and symbolically (M = 4.0, SD = 0). One reviewer rated one NP form as mostly accurate because of a formatting issue; all other NP forms were rated as extremely accurate overall. The average rating for mathematical accuracy of NP was 3.9 (SD = 0.3) and of symbolic notation was 4.0 (SD = 0).
On most dimensions, the middle school mathematics teachers rated the measures as mostly to extremely appropriate for students in the assigned grade level. The average rating for instructional relevance of the content across the measures and grades was 3.2 (SD = 0.8) on a 4-point scale. The NP measure received the highest rating for instructional relevance across grades (M = 3.4, SD = 0.7). PR had the lowest rating for instructional relevance with an average score of 3.0 (SD = 0.9); teachers perceived proportional reasoning as less instructionally relevant for students in Grade 8 as compared with Grades 6 and 7. The average rating for sensitivity to growth was 3.7 (SD = 0.7). Most of the comments regarding the content concerned variations in state content standards (i.e., Texas Essential Knowledge and Skills) as compared with the NCTM Curriculum Focal Points and the CCSS-M. Teachers also noted that grade-level content could be added to the measures, such as negative rational numbers on the Grade 7 QD measure and irrational numbers on the Grade 8 NP measure. Teachers provided feedback regarding the test layout and format, requesting additional space between items. One teacher noted that correct responses should be equally distributed among greater than, less than, and equal to.
Technical Adequacy of ARPM Measures
Pilot test data were analyzed using the Rasch model with Winsteps statistical software (Linacre, 2013). During the calibration process, items across parallel forms of the same measure were equated using a concurrent equating method (Hanson & Beguin, 2002). Common anchor items were used to link the forms so that all items within an ARPM measure were calibrated to the same scale. Using this equating design, subsequent performance on a subset of items can be meaningfully compared with performance on another subset of items.
Prior to examining item difficulty parameters and goodness-of-fit statistics, model assumptions were evaluated. The Rasch model assumes that students’ responses to one item are not dependent on their responses to another (local independence) and that items represent a unidimensional construct (Hambleton et al., 1991). Principal Components Analyses (PCA) of the residual score matrix, a common analytic approach to evaluating dimensionality in Rasch analyses, were planned. Before conducting PCA for each measure, Bartlett’s test of sphericity and the Kaiser–Meyer–Olkin (KMO) index were evaluated to determine whether the data were appropriate for PCA. Bartlett’s test evaluates whether the residual correlation matrix diverges from the identity matrix, and the KMO measure of sampling adequacy is used to evaluate the magnitude of the partial correlations among variables. For all ARPM measures, Bartlett’s test and the KMO index indicated that the residual matrices were not suitable for PCA. These results may be due to the sparse data matrices or the large number of items per measure. In the absence of a test for dimensionality, we cannot conclude that the ARPM measures are unidimensional. However, as an indication of unidimensionality, the mean correlations among items in the residual matrices for the ARPM measures were approximately zero, suggesting that the Rasch model had accounted for most of the covariance in the items (Chou & Wang, 2010).
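To make the Bartlett's test described above more concrete, the following is a minimal sketch of how the test statistic could be computed from a correlation matrix. It is not the authors' procedure or software; the function names and inputs are hypothetical, and the sketch uses only the Python standard library. The statistic is compared with a chi-square distribution with p(p − 1)/2 degrees of freedom, where p is the number of variables; the p-value is omitted here and could be obtained from a chi-square routine such as `scipy.stats.chi2.sf`.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Choose the row with the largest absolute value in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test.

    corr  : p x p correlation matrix as a list of lists (hypothetical input)
    n_obs : number of observations used to estimate the correlations
    """
    p = len(corr)
    det = determinant(corr)
    if det <= 0.0:
        raise ValueError("correlation matrix is singular or not positive definite")
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6.0) * math.log(det)
    df = p * (p - 1) // 2
    return chi_square, df
```

An identity matrix yields a statistic of zero, which is the case in which the variables share no correlation and PCA would have nothing to summarize; larger statistics indicate greater divergence from the identity matrix.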
Item difficulty is plotted on the logit scale with typical values ranging from −3.0 to +3.0. The range of item difficulty parameters for the ARPM measures was evaluated to determine whether the items represent a broad spectrum of algebraic reasoning abilities. Table 2 presents the item statistics for each measure. Although Rasch analyses using Winsteps constrain the mean item difficulty to zero, the standard deviation indicates a moderate range of item difficulty estimates. Levels of skewness and kurtosis were within acceptable limits (within ±1). For all grades, the NP measures had the greatest range in item difficulty and included the most difficult items. The QD measures had the narrowest range with the least difficult items. The PR measures had a moderate range of item difficulty.
Table 2. Item Statistics for Each ARPM Measure by Grade.
Note. ARPM = Algebra Readiness Progress Monitoring; NP = Number Properties; QD = Quantity Discrimination; PR = Proportional Reasoning.
The Test Information Function (TIF) describes how precisely ability estimates can be measured at various points along the ability scale (Embretson & Reise, 2000) and is conveniently represented graphically. The peak of the curve indicates the point on the distribution at which ability estimates are most precise. For the ARPM measures, the shape of the TIFs indicates that the measures provided the most precise information between −2.0 and +2.0 (see Figure 2 for the TIFs for each grade by ARPM measure). Ability estimates at the extremes of the scale were less precisely estimated with the ARPM measures.

Figure 2. Test information functions for each ARPM measure by grade.
The mean person ability estimates were greater than the mean item difficulty estimates and ranged from +0.49 to +1.83 (see Table 2). These data indicate that the average ability of the participants was high, possibly due to the administration of the ARPM measures at the end of the academic year. The distributions of the person ability estimates were approximately normal, with levels of skewness and kurtosis within acceptable limits (within ±1) for all but the PR measures in Grades 7 and 8.
Goodness of fit with the Rasch model indicates the level of correspondence between the item difficulty parameter and the expected student performance. Model fit was determined by examining the Mean Squares Infit and Outfit statistics. Infit estimates the correspondence that is proximal to the item difficulty estimate. Outfit examines correspondence that is distal to the item difficulty estimate. To be productive for measurement, mean square values should range from 0.5 to 1.5; mean square values near 1.0 indicate little distortion of the measurement system (Linacre, 2002). The average Mean Square Infit and Outfit statistics for each ARPM measure were near 1.0. Seventeen percent (n = 82) of the total number of items were classified as misfitting across the nine ARPM measures (see Table 3). Additional research is needed to understand why these items were misfitting; however, they may have been susceptible to error associated with guessing behavior.
Table 3. Excluded Items for Each ARPM Measure by Grade.
Note. ARPM = Algebra Readiness Progress Monitoring; NP = Number Properties; QD = Quantity Discrimination; PR = Proportional Reasoning.
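The infit and outfit mean square statistics discussed above can be sketched for a single dichotomous item as follows. This is an illustrative calculation, not the Winsteps implementation; the function names and inputs are hypothetical. Outfit is the unweighted mean of squared standardized residuals, so it is sensitive to unexpected responses from persons whose ability is far from the item difficulty. Infit weights each squared residual by the model variance, so it is dominated by persons whose ability is near the item difficulty.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit_mean_square, outfit_mean_square) for one item.

    responses : list of 0/1 scores, one per person
    abilities : list of person ability estimates in logits
    difficulty: item difficulty estimate in logits
    Assumes no probability rounds to exactly 0 or 1 (no zero variance).
    """
    squared_residual_sum = 0.0
    variance_sum = 0.0
    standardized_sum = 0.0
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        variance = p * (1.0 - p)           # model variance of the response
        squared_residual = (x - p) ** 2
        squared_residual_sum += squared_residual
        variance_sum += variance
        standardized_sum += squared_residual / variance
    infit = squared_residual_sum / variance_sum
    outfit = standardized_sum / len(responses)
    return infit, outfit


def is_productive(mean_square, low=0.5, high=1.5):
    """Apply the 0.5 to 1.5 range cited in the text (Linacre, 2002)."""
    return low <= mean_square <= high
```

For example, a single incorrect response from a person three logits above the item difficulty inflates outfit far more than infit, which is one way that guessing or careless responding can surface in these statistics.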
Internal consistency reliability was evaluated using Cronbach’s alpha (see Table 2). Coefficients ranged from .92 to .97. Because low item-total correlations can affect internal consistency (Tan, 2009), 46 items with item-total correlations below .20 were excluded. Seventy-two percent (n = 33) of the items with unacceptable item-total correlations were also classified as misfitting. The numbers of items with unacceptable correlations are reported for each ARPM measure in Table 3.
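As an illustration of the internal consistency analysis described above, the sketch below computes Cronbach's alpha and item-total correlations from a persons-by-items score matrix. Several details are assumptions for illustration: the article does not state whether item-total correlations were corrected, so the sketch uses the corrected form (each item against the total of the remaining items); the data are hypothetical; and the function names are not from the article. The .20 threshold follows the text.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items matrix (list of rows)."""
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation; assumes neither list has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def item_rest_correlations(scores):
    """Correlate each item with the total of the remaining items."""
    k = len(scores[0])
    results = []
    for i in range(k):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        results.append(pearson(item, rest))
    return results


def flag_low_items(scores, threshold=0.20):
    """Indices of items whose item-rest correlation is below threshold."""
    correlations = item_rest_correlations(scores)
    return [i for i, r in enumerate(correlations) if r < threshold]


# Hypothetical 0/1 responses: 5 persons (rows) by 4 items (columns).
demo_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

if __name__ == "__main__":
    print("alpha:", round(cronbach_alpha(demo_scores), 3))
    print("flagged items:", flag_low_items(demo_scores))
```

In the hypothetical data, the last item runs against the other three, so it correlates negatively with the remaining total and is flagged for review.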
Correlation Across Measures
To evaluate the relationship between the ARPM measures, we correlated the students’ ability estimates across the three ARPM measures within grade. Correlations ranged from r = .43 to .63. Table 4 displays the correlation matrix.
Table 4. Correlation Matrix for the ARPM Measures by Grade for Convergent Evidence for Validity.
Note. ARPM = Algebra Readiness Progress Monitoring; NP = Number Properties; QD = Quantity Discrimination; PR = Proportional Reasoning.
Discussion
The ARPM measures were designed to provide teachers with ongoing information about middle school students’ algebra readiness. The purpose of this study was to describe the content specifications of the ARPM measures and examine technical adequacy data. In conjunction with content experts and mathematics teachers, we analyzed the content alignment of the ARPM measures to the critical foundations of algebra. Using pilot study data, we evaluated the technical adequacy of the ARPM measures by examining the distribution of item difficulty parameters, fit statistics, internal consistency, and correlations. Together, results from these studies provide initial evidence indicating the value of the ARPM measures for generating meaningful data about students’ development of algebra-readiness knowledge and skills.
Assessing Algebra-Readiness Content
The ARPM measures focus on critical foundations of algebra, including conceptual understanding of rational numbers, facility with and the ability to reason using proportions, and application of number properties. These skills align with recommendations established by the NMAP (2008) for supporting students’ algebra readiness. Competence with these concepts and skills provides students with foundational knowledge for abstract reasoning and lawfully manipulating and operating on numbers and symbols, thereby establishing the foundation for algebra and advanced mathematics (Milgram, 2005). However, students who are not demonstrating growth in these skills may be at risk for not being successful in algebra.
Designed to serve as a robust indicator, the ARPM measures assess content that is not directly taught in the curriculum but is broadly representative of the skills and knowledge students need to be successful in algebra. Feedback from mathematics education experts and teachers provides initial content-related evidence for the ARPM measures. The experts rated the measures as extremely accurate in their representation of critical foundations of algebra. Middle school mathematics teachers identified the measures as mostly instructionally relevant. PR in Grade 8 had the lowest rating for instructional relevance. One possible explanation for this finding is the shift in the CCSS-M in Grade 8 to graphing proportional relationships and understanding linear equations. Although not directly assessed in the ARPM measures, students’ ability to graph and recognize linear equations presupposes proficiency with understanding proportions. Overall, in considering the utility of the measures for monitoring growth, the expert reviewers overwhelmingly rated the measures as extremely useful.
Adequately Measuring Algebra Readiness
To provide instructionally relevant and useful data that are sensitive to growth in student learning, specific technical requirements must be met. Items should adequately and reliably capture students’ knowledge and skills in relevant content. Measures should differentiate between high and low performers (Foegen, 2008). Small changes in student learning should be discernible over time (Kelley et al., 2008).
Technical adequacy data gathered for the ARPM measures indicate that these measures may be feasible for providing teachers with meaningful data about students’ development of algebra-readiness knowledge and skills in Grades 6 to 8. Rasch model fit statistics and item-total correlations indicate that a majority of the items across the ARPM measures performed as predicted by the IRT model and met the criteria needed to be productive for measurement. Strong Cronbach’s alpha coefficients indicate acceptable internal consistency reliability estimates. Item statistics for the ARPM measures indicate that the item difficulty parameters were broadly distributed across the ability scale yet are sufficiently clustered between −2.0 and +2.0 to provide precise estimates of students’ abilities.
For most ARPM measures, the distributions of the ability estimates were approximately normal, and for all measures they were centered above zero on the logit scale (mean estimates ranged from +0.49 to +1.83). These findings indicate that the average ability of the students was greater than the average difficulty of the items. Because the pilot study was conducted at the end of the year with intact classrooms and participants were allowed an extended period of time to complete the ARPM measures, these findings are expected and desired. Had the distribution of the person ability estimates during an untimed administration been centered at zero, implementation of timed measures would likely result in a floor effect (clustering of scores at the lower end of the distribution), thereby limiting the utility of the measures for monitoring growth.
Contributing to Algebra Readiness Skills and Knowledge
CBMs that are designed as robust indicators assess aspects of the targeted domain that are independent of the annual curriculum. The ARPM measures were designed as robust indicators of students’ algebra readiness; together, the measures assess students’ ability to synthesize and integrate their mathematical knowledge and skills to demonstrate their progress in thinking algebraically and understanding complex mathematical processes. Moderate correlations among the ARPM measures indicate that student performance across the measures is positively related. Additional evidence is needed to support the claim that the ARPM measures serve as a robust indicator of algebra-readiness knowledge and skills.
Future Directions and Limitations
The present studies provide important yet incomplete data about the utility of the ARPM measures for providing teachers with meaningful data about middle school students’ algebra readiness. In this article, we provide theoretical evidence about the components of the assessed construct, as well as content-related evidence documenting the sufficiency by which we instantiated this construct. Moreover, we provided initial technical adequacy data supporting the feasibility of the ARPM measures in assessing the construct. However, additional data are needed to justify implementation of the ARPM measures for making instructional decisions.
To support teachers’ use of the ARPM measures to inform instruction, data from the pilot study were used to create three parallel forms of each measure. Using the item statistics obtained from the Rasch analyses, forms were created to include items with comparable difficulty parameters. Additional items are needed to create multiple forms to provide progress-monitoring data. Once these forms are created, studies will be necessary to examine the reliability of the slope, acceptable rates of improvement, and the sensitivity of the measures to monitoring growth. Empirical data are also needed from studies examining the relation between results on the ARPM measures and other criterion measures. Specifically, concurrent-related evidence is needed to evaluate whether results on the ARPM measures correlate with results on other measures of algebra readiness. Although there is a dearth of technically adequate measures of algebra readiness, available measures such as the Iowa Algebra Readiness Assessment (Schoen & Ansley, 2006), which serves as a placement measure, may be suitable. Importantly, predictive-related evidence is needed to determine whether results on the ARPM measures are predictive of students’ future performance in algebra.
Once sufficient alternate forms of the ARPM measures are available and properly scaled using the Rasch model, scale scores can be generated from students’ performance on each measure (Embretson & Reise, 2000). Because the ARPM measures are intended to serve as a robust indicator of algebra readiness, scores from the three ARPM measures should not be disaggregated. Instead, a composite score will be generated to represent the combined knowledge and skills students need to be successful in algebra. Data can be plotted over time to evaluate students’ progress toward proficiency in algebra-readiness knowledge and skills. Similar to other progress-monitoring systems, teachers can use these data to guide their instructional decisions such as adjusting their instructional practices if students are not making sufficient progress or increasing students’ instructional goals if they are making adequate growth.
Several limitations affect the observed findings. First, limited demographic information was available on the participants. Although accurate estimates of item difficulty parameters are obtainable through IRT with any reasonable subsample of the population, additional details about the representativeness of the sample for the ARPM measures are needed. Demographic data would allow further analysis of item functioning for subgroup populations. Specifically, differential item functioning (DIF) analyses are needed to identify potentially biased items. Second, tests for the dimensionality assumption of IRT were not possible given the structure of the data set. Although evidence suggests that the Rasch model accounted for most of the covariance in the items, additional analyses are needed to verify essential unidimensionality with subsequent collections of data. Third, and finally, item statistics were analyzed from an untimed pilot administration of the ARPM measures. Using mixture IRT models may be useful for examining the impact of timed administration on item statistics (Bolt et al., 2002).
Implications for Practice
The results of this research have several key implications for practitioners working in middle school settings. First, based on the theoretical assertions about indicators of algebra readiness as well as empirical data collected from a rigorous content review and pilot test, the ARPM measures are feasible and adequately assess foundational skills needed for success in algebra. Although further development is needed, practitioners could use these measures to obtain meaningful and trustworthy information about students’ proficiency in algebra-readiness knowledge and skills. Second, further specifying key components of the mathematics curriculum that may support students’ readiness for algebra provides educators with additional information to consider when prioritizing the content of their instruction. Students’ conceptual understanding of rational numbers, facility with and the ability to reason using proportions, and application of number properties are associated with critical foundations of algebra and may serve as key levers within the middle school curriculum to improve performance. Additional research is needed to evaluate the relationship between these content components and subsequent success in algebra; however, practitioners may consider incorporating these concepts in their instruction. Third, and finally, with further development of an integrated assessment delivery system, data generated from the ARPM measures may help teachers make timely and informed instructional decisions. Data can be used to identify students who need varying levels of instructional support to be ready for algebra and monitor their response to differentiated instructional opportunities. As such, these measures may be implemented within a systems-level initiative designed to improve students’ preparedness for algebra.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
