Abstract
Understanding students’ readiness for precalculus and calculus at the community college level is critical not only because of the key role community colleges play in higher education but also because calculus remains a gateway course for students in advancing to higher level mathematics. The Algebra and Precalculus Concept Readiness Assessment for Community Colleges (APCR-CC) was designed to investigate community college students’ quantitative reasoning abilities and conceptual understanding in algebra. The present study investigates the psychometric properties of the APCR-CC instrument using item response theory based on a sample of intermediate and college algebra students from six community colleges collected in a pretest (N = 1,131) and posttest (N = 772) setting. We examine unidimensionality, item fit, local item independence, measurement invariance, and sensitivity to instruction. Our findings suggest that the APCR-CC instrument is sufficiently characterized by one underlying construct, local dependence does not seem to be an issue, and 80% of the items in the APCR-CC instrument are sensitive to instruction.
Although the predominant focus of research in the field of mathematics education has been on K-12 and university settings, there is an abundance of information to be learned from both mathematics students and faculty in the community colleges. The role that the 982 public community colleges play in the United States is vital for both workforce development and educating students beyond high school. Community colleges in the United States provide an open access opportunity for students to complete various certificates and associate degrees or to complete college coursework prior to or during their university education. According to the National Center for Education Statistics, there were just over 14 million undergraduate students in postsecondary public institutions in the United States in the fall of 2015, with nearly 43% of them enrolled in community colleges (Snyder, de Brey, & Dillow, 2018). A report on the role of community colleges in postsecondary success from the National Student Clearinghouse Research Center (2017) showed that 46% of 4-year college graduates in 2013 to 2014 had attended a community college at some point in their past. The 2017 demographics of community colleges show a highly diverse student population: 62% of all community college students were enrolled part-time, 56% were female, 36% were the first in their family to attend college, 17% were single parents, and 22% of full-time students held full-time jobs, whereas 41% of part-time students held full-time jobs (American Association of Community Colleges, 2017). Certainly, the community colleges provide educational opportunities to a widely diverse student population with a variety of academic goals, and due to their demographics, the student population of community colleges is often very different from that of universities.
Helping students realize their educational dreams and persevere through obstacles is the primary mission of the community colleges in the United States, and yet a major challenge, given the diversity of students.
In mathematics, community colleges serve an even more important role because they offer precollege coursework, also known as developmental mathematics, designed to enhance students’ preparedness for college-level mathematics (e.g., college algebra [CA] or precalculus). The 2015 Conference Board of the Mathematical Sciences survey reported that about 42% of all mathematics students enrolled in U.S. institutions of higher education were taking their mathematics courses at a public 2-year college (Blair, Kirkman, & Maxwell, 2018). Blair et al. (2018) also found that approximately 41% of all community college mathematics students in fall 2015 were enrolled in developmental mathematics courses. Although many students who enroll in community colleges have previously taken algebra in high school, the reality is that only 62% would have completed Algebra II for graduation and even fewer (18%) would have completed additional courses that prepare them for taking precalculus (e.g., trigonometry 16%; see Champion & Mesa, 2017). These statistics suggest that a significant proportion of high school students who want to pursue an advanced degree in science, technology, engineering, or mathematics will need more mathematical preparation after graduation, and that many of these students are likely to do so at a community college. In addition, as Sitomer et al. (2012) point out, “community college mathematics teachers are reteaching mathematical content that students have encountered in previous mathematics courses, yet little is known about how students arrive at understanding when they are reintroduced to the content” (p. 35). The American Mathematical Association of Two-Year Colleges (AMATYC) has made it a priority to promote research on the teaching and learning of mathematics in the first 2 years of college and the organization has called for more research investigations “to understand how to best assist students in two-year colleges to succeed” (AMATYC, 2018, p. 94).
Students’ readiness for precalculus and calculus hinges upon the capacity of their prior algebra preparation to open doors to the possibility of earning a science, technology, engineering, and mathematics (STEM) degree or a non-STEM career that demands college-level mathematics skills. Yet, beyond commercially available placement assessments such as ALEKS, Accuplacer, and Compass that focus on students’ procedural skills rather than conceptual knowledge, the field does not have research-based instruments (e.g., concept inventories) that truly assess students’ precalculus readiness specifically at the community college level. Wladis, Offenholley, Licwinko, Dawes, and Lee (2018) found that no validated assessments (such as concept inventories) existed to evaluate students’ conceptual understanding specifically in postsecondary elementary algebra. However, such concept inventories are extremely important for two reasons. First, they help assess students’ knowledge of fundamental disciplinary concepts (e.g., the Force Concept Inventory in physics, Hestenes, Wells, & Swackhamer, 1992; the Precalculus Concept Assessment [PCA] in mathematics, Carlson, Oehrtman, & Engelke, 2010), and second, when they are valid and reliable, they can be used to assess the impact of interventions that target teaching those fundamental concepts. Wladis et al. (2018) contend that not having access to validated assessments is a barrier for community college faculty because “instructors cannot systematically detect which incorrect or underdeveloped algebraic conceptions are impeding student progress, and thus they cannot target instruction to address these conceptions explicitly” (p. 1).
To address this barrier, the study presented in this article stems from a research project focused on understanding the precalculus knowledge of fundamental concepts that students possess while they are in algebra-intensive courses, such as intermediate algebra (IA) and CA, in the community college setting. The purpose of this article is to analyze and report on the psychometric properties of an assessment developed to assess community college algebra students’ quantitative reasoning abilities and conceptual understanding of fundamental algebraic ideas. This assessment, called the Algebra and Precalculus Concept Readiness Assessment for Community Colleges (APCR-CC), informs both our knowledge about students’ understanding of fundamental algebraic concepts and the impact of instruction on learning in algebra courses at the community college level.
Development of the APCR-CC Instrument
Research in mathematics over the past four decades has investigated students’ knowledge of algebraic concepts, such as proportionality (Harel, Behr, Post, & Lesh, 1992; Hoffer & Hoffer, 1988; Lamon, 2007), rate of change (Carlson, Jacobs, Coe, Larsen, & Hsu, 2002; Thompson, 1994a, 1994b), and the concept of function (Breidenbach, Dubinsky, Hawks, & Nichols, 1992; Saldanha & Thompson, 1998; Thompson & Saldanha, 2003). Yet, this research has been predominately focused on students in 4-year universities. Over the course of 10 years, the PCA was developed by Carlson et al. (2010) to assess students’ critical reasoning and conceptual abilities in college-level precalculus. The PCA taxonomy, which provided the framework for item generation, included mathematical ideas of proportional reasoning, rate of change, function as a process (Breidenbach et al., 1992), covariational reasoning (Carlson et al., 2002), and multiple representations of functions. Research has shown that these mathematical ideas are foundational for knowing and understanding varying rate of change and developing meaning of dynamic situations (Carlson et al., 2002; Thompson, 1994a, 1994b).
Carlson et al.’s (2010) development of the PCA was informed by past concept instruments in physics education (e.g., the Force Concept Inventory by Hestenes et al., 1992), where guiding taxonomies on students’ thinking provided the framework for item development and evolution. Initially, the PCA included 34 open-ended items to gather a variety of students’ conceptions about precalculus concepts. These items were generated after a series of investigations with university students to develop the taxonomy describing the critical reasoning abilities and conceptual understandings that are critical for success in calculus. Clinical interviews were conducted with a subset of university students to understand the nature of their thinking that contributed to successive iterations of the items and to the development of the multiple-choice version for each item. To validate the instrument, Carlson et al. (2010) conducted ongoing cyclic refinement of the items through administering the PCA, conducting interviews, analyzing students’ work, and refining the items until the revised version with 25 multiple-choice items was stable and representative of the guiding taxonomy.
Despite the contributions of the PCA, it was not an appropriate instrument for our student population in this investigation, given that the mathematical content was too advanced. Thus, in collaboration with Carlson and her colleagues, this investigation built upon and extended the previous PCA work by Carlson et al. (2010) to create a comparable instrument called the APCR-CC that assesses community college students’ quantitative reasoning abilities and conceptual understanding in algebra that lead to the foundational precalculus concepts identified in the PCA. Quantitative reasoning is foundational to developing and supporting algebraic reasoning, and, therefore, necessary for learning concepts in precalculus and above (Smith & Thompson, 2007). During the pilot phase of our study, the APCR-CC was administered to students in three different courses at the community college: beginning algebra, IA, and CA. After the initial analysis revealed that several items were unstable in the 25-item APCR-CC with respect to their psychometric properties, such as item difficulty and item discrimination, and too advanced for our student population, a revised APCR-CC instrument was developed while maintaining the focus on the construct of quantitative reasoning and conceptual understanding in algebra. Specifically, the revised APCR-CC instrument focuses on proportional relationships, linearity, covariational reasoning, exponential growth, and rational function behavior. After the pilot phase, we focused our data collection on IA and CA only because the mathematical ideas listed above were found to be concentrated in these courses.
Three items from the APCR-CC assessment are shown below. Item 3 requires students to be proficient with function notation and to interpret the behavior of a function in the context of a different real-life situation. Item 10 shows a graphical representation that illustrates a covarying relationship of two quantities, time and distance, and requires students to assess the falsehood of the statements within that particular context. Item 19 requires the use of proportional relationships in a real-life context. Note that the student is not asked to solve a problem, but rather to select an expression that illustrates the relationship between two variables.
Item 3: If S(m) represents the salary (per month), in hundreds of dollars, of an employee after m months on the job, what would the function R(m) = S(m + 12) represent?
The salary of an employee after m + 12 months on the job.
The salary of an employee after 12 months on the job.
US$12 more than the salary of someone who has worked for m months.
An employee who has worked for m + 12 months.
Not enough information.
Item 10: Kirk’s distance from a park bench in terms of the number of seconds since he started walking is represented by this graph.
Which of the following statements about this situation is false?
As the number of seconds since Kirk started walking increases from 0 to 8 s, Kirk’s distance from the park bench is always increasing.
As the number of seconds since Kirk started walking increases from 8 to 12 s, Kirk’s distance from the park bench decreases at a constant rate.
As the number of seconds since Kirk started walking increases from 4 to 8 s, Kirk’s distance from the park bench increases by a greater amount each second.
As the number of seconds since Kirk started walking increases from 0 to 4 s, Kirk’s distance from the park bench increases by a greater amount each second.
After Kirk has walked for 8 s, he changes directions and begins walking toward the park bench.
Item 19: When preparing a prescription, a pharmacy mixes seven parts of water with two parts of concentrated amoxicillin. Which formula should the pharmacy technician use to determine the number of milliliters of water w to add to a milliliters of concentrated amoxicillin?
We report here on the psychometric properties of the revised APCR-CC instrument using item response theory (IRT) models. IRT is a framework of statistical models that allows the estimation of item discrimination and item difficulty parameters that are sample independent, places persons and items on a common scale, and describes items according to their location and capacity to discriminate between test takers (de Ayala, 2009). Specifically, we investigate unidimensionality, item fit, local item independence, and measurement invariance (MI) across IA and CA students in a pre- and posttest setting. In addition, we also analyze the items’ sensitivity to instruction.
Several research questions guided the present study:
We hypothesize that one trait will be sufficient to describe the data because the APCR-CC assessment was developed to measure the single construct of quantitative reasoning abilities and conceptual understanding in algebra. In addition, ideally, we would like the measured construct to have the same structure across IA and CA students, so their scores are comparable. Finally, we would like the APCR-CC assessment to be sensitive to instruction, so the difference between pre- and posttest scores represents an estimate of how much learning occurred.
Method
Participants
The revised APCR-CC, consisting of 25 dichotomously scored items (1 = correct, 0 = incorrect), was administered in a pre- and posttest setting during fall semester 2017 across six community colleges in three states. A total of 1,131 students answered the instrument in the pretest, and 772 in the posttest. There were 503 IA students and 628 CA students in the sample data set during the pretest, and 343 and 429, respectively, during the posttest. In the pretest (posttest) sample, 46.9% (48.4%) of the students were women; 9.3% (10.3%) of the participants were below 18 years, 65.5% (66.7%) were between 18 and 21 years, 12% (11%) were between 26 and 35 years, and 3.5% (3.8%) were above 36 years; 65.3% (66.9%) of the students were White, 8.3% (6.17%) were Black or African American, 8.6% (9.1%) were Asian American, and 10% (9.5%) were mixed race. Finally, from the whole sample, 18.4% (18.6%) of the students were Hispanic or Latino.
Procedures
We investigated unidimensionality first (Analysis 1); then, we examined different IRT models to find the one that best describes our data (Analysis 2); and finally, we assessed MI across IA and CA students (Analysis 3). All analyses were conducted in the IRTPRO 2.1 software (Cai, Thissen, & du Toit, 2011). In addition, once results from pre- and posttest data were obtained, items’ sensitivity to instruction was analyzed (Analysis 4).
Analysis 1
To apply IRT models, the dimensionality of the data needs to be assessed first. In practice, researchers using IRT usually assume that a single trait or underlying construct characterizes examinees’ behavior (de Ayala & Hertzog, 1991). However, there could be cases in which it would be more realistic to assume examinees’ responses are described by multiple traits. In this study, exploratory multidimensional IRT models were used to investigate data dimensionality. Multidimensional IRT models seek to accurately describe the interaction between persons and items with the objective of closely reproducing the probability of correct response to each item based on such interactions (Reckase, 1997). Two IRT models, a one-dimensional and a two-dimensional, were fitted to assess dimensionality. The two-dimensional model was motivated by the fact that the APCR-CC assessment includes concepts of algebra and precalculus. The one-dimensional model tests the hypothesis of a single trait, which we call quantitative reasoning abilities and conceptual understanding in algebra. The Bayesian information criterion (BIC) fit index was used for model comparison purposes. In addition, we used the confirmatory DETECT statistic available in the conf.detect function of the sirt R package (Robitzsch, 2018) to test the unidimensionality of the data. DETECT is a nonparametric conditional covariance-based dimensionality procedure that provides an estimate of the amount of multidimensionality on the test (Zhang, 2007). The larger the index, the greater the amount of multidimensionality present in the data. Values of 1 or more indicate large multidimensionality, between 0.4 and 1 moderate to large multidimensionality, between 0.2 and 0.4 weak multidimensionality, and values below 0.2 (including negative values) indicate unidimensionality (Jang & Roussos, 2007; Robitzsch, 2018; Zhang, 2007). The commented code used to estimate the DETECT index is presented in the appendix.
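The interpretation bands for the DETECT index quoted above can be written as a small lookup. This is only an illustrative sketch in Python, not the appendix code (which uses the sirt R package); the function name `classify_detect` is hypothetical, and assigning the exact boundary values 0.2, 0.4, and 1 to the higher band is an assumption, since the text does not say where ties fall.

```python
def classify_detect(index: float) -> str:
    """Map a DETECT index to the verbal categories quoted in the text.

    Assumption: a value exactly on a cut point (0.2, 0.4, 1) is placed
    in the higher category; the source does not specify tie handling.
    """
    if index >= 1.0:
        return "large multidimensionality"
    if index >= 0.4:
        return "moderate to large multidimensionality"
    if index >= 0.2:
        return "weak multidimensionality"
    # Values below 0.2, including negative values, indicate unidimensionality.
    return "unidimensional"


if __name__ == "__main__":
    # Illustrative inputs only; they are not estimates from any data set.
    for value in (-0.5, 0.25, 0.7, 1.3):
        print(value, "->", classify_detect(value))
```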
Analysis 2
After confirming that the revised APCR-CC instrument can be sufficiently characterized by one underlying trait in Analysis 1 (see the “Results” section), we investigated model fit for two unidimensional IRT models: the one-parameter logistic (1PL) and the two-parameter logistic (2PL).
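The probability of a correct response of person i to item j in a 2PL model is defined as
$$P(x_{ij} = 1 \mid \theta_i, a_j, b_j) = \frac{\exp[a_j(\theta_i - b_j)]}{1 + \exp[a_j(\theta_i - b_j)]},$$
where $\theta_i$ is the ability of person i, $a_j$ is the discrimination of item j, and $b_j$ is the difficulty (location) of item j.
Item discrimination, $a_j$, describes the capacity of the item to discriminate between test takers.
Theoretically, item difficulty or item location, $b_j$, can take any real value; in this study, estimates outside the interval [−3, 3] are flagged.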
The 1PL model is defined exactly as the 2PL, except that item discrimination is assumed to take the same value across all items. Model estimation was carried out in IRTPRO 2.1 for Windows using the default settings, that is, using maximum likelihood estimation via the Bock–Aitkin approach with expectation–maximization algorithm and normality as the distributional assumption (Cai et al., 2011; Paek & Han, 2012).
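To make the relationship between the two models concrete, the following minimal Python sketch evaluates the standard logistic form of the 2PL response probability and obtains the 1PL by fixing a common discrimination. The function names and the example parameter values are hypothetical and are not taken from the APCR-CC estimates; estimation itself (done in IRTPRO in this study) is not shown.

```python
import math


def prob_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model.

    theta: person ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def prob_correct_1pl(theta: float, b: float, common_a: float = 1.0) -> float:
    """1PL: same form as the 2PL, with one discrimination shared by all items."""
    return prob_correct_2pl(theta, common_a, b)


if __name__ == "__main__":
    # Hypothetical item: moderately discriminating, slightly hard.
    a, b = 1.2, 0.5
    for theta in (-2.0, 0.0, 0.5, 2.0):
        print(f"theta={theta:+.1f}  P(correct)={prob_correct_2pl(theta, a, b):.3f}")
```

When ability equals item difficulty, the probability is 0.5 regardless of discrimination; a larger discrimination makes the curve steeper around that point.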
Model fit was compared with the BIC index, for which smaller values are preferred. The root mean square error of approximation (RMSEA) computed using the M2 statistic is reported for each model. For good model fit, the RMSEA should be close to zero (Cai et al., 2011). The
Local item independence refers to the assumption in IRT models that the responses to an item are independent of the responses to any other item in the instrument, given the person’s ability level. High values of the
Analysis 3
To verify that IRT-derived ability scores are comparable across groups, the researcher must analyze whether the construct (quantitative reasoning abilities and conceptual understanding in algebra) has the same structure across groups. MI is a statistical property achieved when the relationships between items’ responses and the construct being measured are the same across groups. That is, MI assesses the psychometric equivalence of a construct across groups and indicates that the construct has the same meaning for those groups (Putnick & Bornstein, 2016). Examination of MI provides essential evidence to support scale development and comparability of scores across populations. In this particular context, we are interested in testing MI across IA and CA students.
The investigation of the extent to which the construct shows MI across groups involves a series of steps. First, a baseline model is established by fitting a model for both groups simultaneously. Model fit is evaluated, and if the baseline model fits well, the researcher proceeds to the following stages, in which two further models are fitted with increasingly restrictive equality constraints. If the fit of these models is preserved, then the measure is said to be invariant (Pendergast, von der Embse, Kilgus, & Eklund, 2017; Sass & Schmitt, 2013).
These steps allow us to test three types of MI. Step 1: Configural invariance shows that items are associated with the construct in the same way across groups. The same model is estimated but parameter estimates are free to vary across groups. In this step, the best fitting IRT model found in Analysis 2 (2PL model, see “Results” section) is estimated for both groups allowing discrimination and difficulty parameters to be freely estimated. This constitutes the baseline model and model fit is evaluated for each group. Step 2: Metric invariance provides evidence that each item contributes in a similar manner between groups or that the magnitude of the relationships between items and the construct is equivalent across groups. In this step, a 2PL model is fitted simultaneously holding discrimination parameters equal across groups. If the model fit does not change substantially between models in Steps 1 and 2, then metric invariance is supported. Step 3: Scalar invariance shows that the metric of the observed variables is relatively equal across groups. In this step, both discrimination and difficulty parameters are held equal across groups. If the model fit does not change substantially between models in Steps 2 and 3, then scalar invariance is supported.
The RMSEA and BIC are reported for each model. Smaller values of BIC are preferred, and increments in RMSEA (ΔRMSEA) greater than 0.015 are considered cautionary when examining change in model fit for MI purposes (Chen, 2007). Furthermore, likelihood ratio (LR) tests were used for model comparison purposes. The LR test compares nested models with different levels of constraints to determine whether they significantly differ from one another. These tests were used to compare the models fitted in Steps 1 to 3 as recommended in the MI literature (de Ayala, 2009; Edwards, Houts, & Wirth, 2018; Kim, Cao, Wang, & Nguyen, 2017; Meade, Lautenschlager, & Hecht, 2005). The LR test compares two nested models: a more complex or less constrained model (also known as full model) and a simpler or more constrained model (also known as reduced model). The LR statistic is defined as $LR = [-2\log(L_R)] - [-2\log(L_F)]$, where $-2\log(L)$ denotes −2 times the log likelihood of a model, $L_R$ the likelihood of the reduced model, and $L_F$ the likelihood of the full model. This statistic is distributed as a chi-square with degrees of freedom equal to the difference in the number of parameters between the two models. If the LR statistic is nonsignificant, then the additional complexity of the full model is not necessary, the two models do not differ statistically, and the reduced model is preferred (de Ayala, 2009). In the context of MI, the first step is to compare metric (reduced) against configural (full) invariance. If the LR test is nonsignificant, then MI holds at that level and metric invariance is superior in model–data fit. The second step compares scalar (reduced) against metric (full) invariance. Scalar invariance will hold if the LR test is nonsignificant.
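The LR comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure run in IRTPRO: the function names are hypothetical, the −2 log-likelihood values and parameter counts in the usage example are invented, and the chi-square tail probability is computed with the closed-form finite series for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal pdf at chi
    if df % 2 == 1:
        # Odd df: two-sided normal tail plus a finite series in odd powers of chi.
        total = math.erfc(chi / math.sqrt(2.0))
        term = chi
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)
            total += 2.0 * density * term
        return total
    # Even df: exp(-x/2) times a finite series in powers of x/2.
    term, series = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        series += term
    return math.exp(-x / 2.0) * series


def lr_test(neg2ll_reduced: float, neg2ll_full: float,
            n_params_reduced: int, n_params_full: int):
    """Return (LR statistic, degrees of freedom, p value) for two nested models."""
    statistic = neg2ll_reduced - neg2ll_full
    df = n_params_full - n_params_reduced
    return statistic, df, chi2_sf(statistic, df)


if __name__ == "__main__":
    # Hypothetical values: a constrained model versus a less constrained one.
    stat, df, p = lr_test(30250.0, 30210.0, 51, 76)
    print(f"LR = {stat:.1f}, df = {df}, p = {p:.4f}")
    print("reduced model preferred" if p >= 0.05 else "full model preferred")
```

A nonsignificant p value favors the reduced (more constrained) model, which in the MI setting supports the corresponding level of invariance.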
Analysis 4
Instructional sensitivity is defined as “the tendency of items to discriminate with respect to the effectiveness of instruction” (Haladyna & Rodriguez, 2013, p. 348). An instrument that is sensitive to instruction should reflect changes in scores due to instruction. This property is important in the context of this study because the APCR-CC instrument was given pre- and postinstruction during the fall 2017 semester. Thus, if the instrument is sensitive to instruction, the change in item difficulty between pre- and posttest represents an estimate of how much learning occurred. We look at whether the item parameters (difficulty and discrimination) improved in the posttest administration compared with the pretest. That is, if item difficulty decreased and item discrimination increased from the pretest to the posttest APCR-CC administration, then we say the item showed sensitivity to instruction.
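The decision rule just described (difficulty decreases and discrimination increases from pretest to posttest) can be expressed as a short Python sketch. The function name, the category labels, and the parameter values in the usage example are hypothetical; they are not APCR-CC estimates.

```python
def classify_sensitivity(pre: tuple, post: tuple) -> str:
    """Compare (discrimination, difficulty) estimates from pretest and posttest.

    An item "improves" in difficulty when the estimate decreases and in
    discrimination when the estimate increases.
    """
    a_pre, b_pre = pre
    a_post, b_post = post
    easier = b_post < b_pre
    sharper = a_post > a_pre
    if easier and sharper:
        return "difficulty and discrimination improved"
    if easier:
        return "difficulty improved only"
    if sharper:
        return "discrimination improved only"
    return "neither improved"


if __name__ == "__main__":
    # Hypothetical (discrimination, difficulty) pairs keyed by item label.
    pretest = {"A": (0.60, 1.40), "B": (0.90, 0.20), "C": (1.10, -0.30)}
    posttest = {"A": (0.85, 0.90), "B": (0.80, -0.10), "C": (1.00, -0.20)}
    for item in pretest:
        print(item, classify_sensitivity(pretest[item], posttest[item]))
```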
Results
We first present and discuss results using the pretest data and then we discuss results using posttest data.
Analysis 1
One-dimensional and two-dimensional IRT models were fitted to assess dimensionality. The BIC index favored the one-dimensional model in both pre- and posttest data (see Table 1), suggesting that the revised APCR-CC instrument was unidimensional, that is, that one latent trait suffices to characterize the probability of correct response to an item. In addition, the confirmatory DETECT index for pre- and posttest data had values of −0.71 and −0.69, respectively, as shown in Table 1, confirming the unidimensionality of the data. Consequently, several unidimensional IRT models were investigated, which are presented in the following section.
Table 1. Model Fit Indexes of Analyses 1 and 2.
Note. PL = parameter logistic; BIC = Bayesian information criterion; RMSEA = root mean square error of approximation.
Analysis 2
As presented in Table 1, the BIC and RMSEA indexes showed that the 2PL model was favored over the 1PL model in the pretest data, as well as in the posttest data. This suggests that it is preferred to use a model in which both item discrimination and item location are estimated.
There were two items that presented low item discrimination and item difficulties outside the interval [−3, 3] during the pretest (see Table 2). Item 17 was extremely difficult
Table 2. 2PL Model Item Fit Index and Item Parameter Estimates.
Note. PL = parameter logistic; APCR-CC = Algebra and Precalculus Concept Readiness Assessment for Community Colleges.
Signal item discrimination estimates below 0.15.
Signal item difficulty estimates below −3 and above 3.
The
Note. PL = parameter logistic; APCR-CC = Algebra and Precalculus Concept Readiness Assessment for Community Colleges.
Signal items for which p < .05.
Finally, Cronbach’s alpha was also computed for both pre- and posttest data as a measure of internal consistency. Cronbach’s alpha was equal to .73 and .79 for pre- and posttest data, respectively, which is a satisfactory level of internal consistency. Considering these measures and the overall results from the 2PL IRT model for both pre- and posttest data, we can say the APCR-CC assessment shows good internal consistency and the data–model fit of the 2PL model was adequate.
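For readers unfamiliar with the statistic, Cronbach’s alpha for a matrix of dichotomously scored responses can be computed as in the following Python sketch. The function name and the tiny response matrix are hypothetical and serve only to show the formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores); the software used to obtain the values reported above is not stated in the text.

```python
from statistics import variance


def cronbach_alpha(responses: list) -> float:
    """Cronbach's alpha for a persons-by-items matrix of item scores.

    responses: list of rows, one row per person, one column per item.
    Uses sample variances throughout; requires at least two persons,
    at least two items, and nonzero variance of the total scores.
    """
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: four persons answering three items (1 = correct).
    data = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0],
    ]
    print(f"alpha = {cronbach_alpha(data):.2f}")
```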
Analysis 3
Table 4 shows RMSEA, BIC, and loglikelihood indexes of the three models fitted to evaluate MI. First, to assess configural invariance, the 2PL model was fitted simultaneously across IA and CA students where parameter estimates were freely estimated. This model provided a good overall model–data fit as noted by RMSEA values of 0.2 and 0.4 for pre- and posttest APCR-CC data, respectively.
Table 4. Fit Indexes of 2PL Models Fitted for MI Assessment.
Note. PL = parameter logistic; MI = measurement invariance; APCR-CC = Algebra and Precalculus Concept Readiness Assessment for Community Colleges; RMSEA = root mean square error of approximation; BIC = Bayesian information criterion.
Furthermore, by examining model fit for each group separately, it was found that Items 9 and 17 showed the same pattern for IA in the pretest as when analyzing all data together in Analysis 2. Item 9 was extremely easy, whereas Item 17 was extremely difficult for the IA group. However, for CA, the pattern was inverse: Item 9 was extremely difficult and Item 17 was extremely easy. The rest of the items showed a performance within the expected range for the pretest for both groups. The
The configural invariance analysis for the posttest data showed similar findings for the IA group as when analyzing all data together in Analysis 2. Item discrimination and difficulty estimates were within the expected range for the IA group. However, Item 9 was extremely difficult and Item 17 was extremely easy for CA students. The
In summary, satisfactory levels of configural invariance held for pre- and posttest data considering both IA and CA groups, as shown by the RMSEA index and the analysis of model fit for each group separately. Thus, the underlying structure fits the data reasonably well when no group constraints are imposed, which indicates we can now move to Step 2.
The next level of invariance is metric invariance, which was assessed by comparing model fit of Step 1 and Step 2 models. The LR test between metric and configural invariances was statistically significant for both pretest and posttest data (see Table 4), meaning that configural invariance was preferred. However, the chi-square statistic is known to be highly sensitive to large sample sizes, as is the case in the present study. Thus, consideration of additional indexes is warranted (Kim et al., 2017). The BIC index was smaller for Step 2 for both pre- and posttest data. Likewise, the RMSEA index either had the same (pretest) or a lower value (posttest), indicating better model fit for models in Step 2. Therefore, we could conclude that metric invariance held for both pre- and posttest APCR-CC data.
Finally, by comparing models in Steps 2 and 3, we assessed whether scalar invariance was supported as shown in Table 4. The LR test between scalar and metric invariances was statistically significant for the pretest but not for the posttest data, meaning that scalar invariance was supported for posttest data. However, the BIC index favors the models fitted in Step 3. The RMSEA index presented the same value for the pretest and increased by 0.01 for the posttest data, suggesting that the model fit did not change considerably between models in Steps 2 and 3. Therefore, scalar invariance held for pretest data. In addition, for posttest data, LR test, the BIC, and ΔRMSEA indexes favored scalar invariance.
In summary, the overall results supported the three levels of MI for pretest and posttest APCR-CC data across IA and CA students. Although the LR test was significant for most comparisons, results from BIC and ΔRMSEA favored scalar invariance for both pre- and posttest data. Thus, we can conclude that the relationships between items’ responses and the construct being measured are equivalent across the two groups of students.
Analysis 4
There were 15 items (Items 3, 5, 6, 10–12, 15–23) whose item discrimination and difficulty improved from the pretest to the posttest APCR-CC administration. Moreover, five additional items showed an improvement in their difficulty parameter estimates (Items 1, 4, 7, 24, 25). That is, a total of 20 items showed some sign of sensitivity to instruction, which means that under the assumption that the content was taught during the instruction period, student performance on 80% of the items suggests that learning occurred from the pretest to the posttest.
Discussion
The current study investigated the psychometric properties of the revised APCR-CC instrument. Our findings suggest that the APCR-CC instrument is sufficiently characterized by one underlying trait. A 2PL IRT model provided a good fit for pre- and posttest data, local dependence does not seem to be an issue, and 80% of the items in the APCR-CC instrument are sensitive to instruction. Even though two items (9 and 17) seemed to fail to aid in measuring quantitative reasoning in community college students during the pretest, these two items showed acceptable performance for the posttest data, indicating sensitivity to instruction. During the pretest, Item 9 was extremely easy, but students with high quantitative reasoning ability answered it incorrectly, whereas Item 17 was extremely difficult for the population under study. However, for the posttest administration of the APCR-CC, all items showed acceptable item parameter estimates. Therefore, it seems that the instruction period had an important effect on students’ learning.
A reliable instrument for measuring students’ quantitative reasoning abilities and conceptual understanding in algebra upon entrance to community college could be beneficial for students and faculty alike. Such an instrument could inform faculty about whether students are prepared for specific math courses, as well as serve as a research instrument to measure readiness for a particular course and growth in conceptual understanding over a period of time. We envision the APCR-CC being used primarily as a research instrument to measure college students’ quantitative reasoning abilities and conceptual understanding of fundamental algebraic ideas. The current study provides the basis for continued analysis of a future revision of the APCR-CC instrument in which modifications to the problematic items are incorporated. As such, it can be administered to IA and CA students to determine the degree to which they are mathematically prepared for future mathematics courses focused on quantitative reasoning. Additional data from research on students’ quantitative reasoning based on the use of this instrument may guide and support curricular changes in algebra courses throughout collegiate mathematics.
In the present study, MI was examined with the type of student (IA or CA) as the only grouping variable. However, MI should also be established for other important variables of interest, such as gender and/or race. This constitutes a limitation of the present study. Likewise, although the analyses presented here provide evidence for the construct validity of the APCR-CC instrument, other aspects of validity should also be investigated. For instance, content validity could be examined by analyzing the relation between the APCR-CC’s content and the construct of quantitative reasoning abilities and conceptual understanding in algebra. Alternatively, criterion-related evidence of validity could be obtained by investigating whether scores from the APCR-CC assessment predict future performance in a specific setting or scores on other tests. Future research will aim to address these limitations to further support the interpretation and use of scores derived from the APCR-CC assessment.
Appendix
The confirmatory DETECT statistic was computed using the conf.detect function of the sirt R package (Robitzsch, 2018). The call took three arguments: data_name, the data frame of dichotomous responses; score, the vector of ability estimates; and itemcluster, the vector assigning each item to a cluster, whose order must correspond to the columns of data_name. For score, we used the weighted likelihood estimates for dichotomous responses based on the Rasch model, available from the wle.rasch function of the same package.
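A call of this kind can be sketched as follows. This is a minimal illustration, not the study's own script: the responses are simulated under a Rasch model rather than taken from the APCR-CC data, the three-cluster item assignment is hypothetical, and obtaining the item difficulties from rasch.mml2 before calling wle.rasch is an assumption, since the text does not state how they were estimated.

```r
# Sketch only: simulated data stand in for the APCR-CC responses.
library(sirt)

set.seed(1)
n_persons <- 500
n_items <- 12

# Simulate dichotomous responses under a Rasch model (illustrative values)
theta <- rnorm(n_persons)
b_true <- seq(-1.5, 1.5, length.out = n_items)
prob <- plogis(outer(theta, b_true, "-"))
resp <- (matrix(runif(n_persons * n_items), n_persons, n_items) < prob) * 1
data_name <- as.data.frame(resp)
colnames(data_name) <- paste0("item", seq_len(n_items))

# Assumption: item difficulties come from a Rasch fit with rasch.mml2
mod <- sirt::rasch.mml2(data_name)

# Weighted likelihood estimates of ability, used as the conditioning score
score <- sirt::wle.rasch(dat = data_name, b = mod$item$b)$theta

# Hypothetical cluster membership, one entry per column of data_name
itemcluster <- rep(1:3, each = 4)

# Confirmatory DETECT for the given item clusters
res <- sirt::conf.detect(data = data_name, score = score,
                         itemcluster = itemcluster)
str(res, max.level = 1)
```

With real data, data_name would hold the scored item responses and itemcluster the hypothesized grouping of the items, given in column order.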
Acknowledgements
We are grateful to Megan Breit-Goodwin, Anne Cawley, Saba Gerami, Patrick Kimani, Nicole Lang, Dexter Lim, Angeliki Mali, Randy Nichols, Jon Oaks, Carla Stroud, and Judy Sutor, who assisted with refining the APCR-CC instrument and collecting data for this project.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by National Science Foundation Grant #1561436, Algebra Instruction at Community Colleges: An Exploration of Its Relationship With Student Success.
