Abstract
Acadience Reading Diagnostic: Comprehension, Fluency, and Oral Language Assessment (CFOL) is an individually administered diagnostic assessment published by Acadience Learning for students in kindergarten through sixth grade. The measure purportedly provides diagnostic information on story coherence/text structure, listening and reading comprehension, vocabulary and oral language, and fluency with expository and narrative texts. The skills assessed with the CFOL are intended to help identify the sources of students’ comprehension difficulties so that educators can better target comprehension interventions to students’ specific learning needs.
Test Description
Acadience Reading Diagnostic: Comprehension, Fluency, and Oral Language (CFOL; Powell-Smith et al., 2021) is the newest addition to the popular and widely used Acadience Learning suite of assessments. The CFOL is purportedly designed as a diagnostic assessment to identify possible reasons why a student is experiencing comprehension, fluency, or oral language difficulties. It is one of two components of the Acadience Reading Diagnostic, alongside the Acadience Reading Diagnostic: Phonological Awareness & Word Reading and Decoding (PA & WRD), and the two can be used together to inform the selection of appropriate interventions for individual students. Diagnostic assessment, defined as data collected to determine which skills a student has and has not mastered, is one of the three types of assessment used within the data-based individualization framework, along with screening and progress monitoring (Welland et al., 2024).
Table 1. Sections and Tasks Within the Comprehension, Fluency, and Oral Language Assessment
Target Population
The CFOL is intended for students in kindergarten through sixth grade who have already been identified through screening as experiencing reading difficulties. The battery is not meant to be administered in its entirety to every student; instead, assessors should decide which domains and skills to assess based on the student’s grade level and any existing information about the student’s specific reading difficulties. The CFOL Checklist, provided in the Assessment Manual, can guide this decision making.
Demographic data are reported for the populations of the schools used to validate the instrument but not for the specific sample of students. Additionally, the three pilot studies used samples restricted to midwestern states. Thus, the evidence supporting this assessment is likely limited to a specific population of students.
Test Construction
The CFOL test construction began with a comprehensive review of the literature on the scope and sequence of instruction in CFOL skills. First, all identified skills were compiled into a list and organized by grade. After duplicates were removed, skills were evaluated for feasibility (i.e., whether a task could be constructed to measure the skill) and efficiency (i.e., whether more than one skill could be measured at once to save time). Additional sources were examined to resolve discrepancies about the grade or grades to which each skill should be assigned. Some skills were tested across multiple grade levels as part of the initial CFOL pilot study (see the Technical Adequacy section). Following the initial pilot study, a content expert evaluated the overall structure and content and offered recommendations. The CFOL was then revised based on this feedback: the test was refocused on the skills most likely to affect reading comprehension, working memory demands were reduced, and several passages and assessment tasks were restructured.
The test is organized into seven domains (see Table 1), each containing two to five tasks. All reading passages used in the tasks were leveled using the Acadience Learning Passage Difficulty Index, which evaluates word difficulty, semantic difficulty, and syntactic difficulty, both individually and as a composite score, to determine whether a passage is appropriate for a given grade level. Individual words used in the tasks were evaluated using The Living Word Vocabulary (Dale & O’Rourke, 1981), which provides a percent-known value for each word at each grade level, beginning with grade 4. Words with high percent-known values were chosen for tasks when possible.
Technical Adequacy
The CFOL was validated through three primary studies. The first pilot study examined the scope and sequence of tasks and was used to determine whether students easily understood the test directions and procedures, to set start and discontinuation rules, and to check whether prompting and scoring rules were appropriate. It was conducted with 76 students from a midwestern state (14 each in kindergarten and first grade and 16 each in second through fourth grade). All data were collected by trained Acadience Learning personnel and consisted of qualitative feedback from assessors, which the publishers used to revise the assessment for the second pilot study.
The second pilot study (Pilot-2) had purposes similar to those of the first pilot, but it also examined the relation between Acadience Reading benchmark measures and the CFOL and included item-response analyses. Pilot-2 was conducted with 762 students from prekindergarten through fourth grade. Analyses revealed ceiling effects and starting-point problems in the younger grades, as well as some tasks that did not show growth across grade levels. After adjustments addressing these results, a survey covering the final items and tasks was sent to a sample of assessors, who responded to 15 items on a 6-point Likert scale ranging from 1 (Strongly Disagree) to 6 (Strongly Agree). The publishers report high average ratings for all measures (Powell-Smith et al., 2021), ranging from 4.13 (SD = 1.02) to 5.29 (SD = 0.69). Example survey statements included: “Overall, the measures would be beneficial for planning reading instruction for struggling readers” and “All items included within the measure were appropriate” (p. 141).
The third study was a more traditional validation study (Powell-Smith et al., 2015). Its main purposes were to examine procedural reliability (i.e., whether a set of steps ensures that any person with the appropriate skill set can complete the task without error), to examine the appropriateness of the order of items within and across tasks, to confirm final discontinuation rules, to examine the relation between diagnostic and reading benchmark data, to examine the factor structure, and to determine consumer satisfaction. The study was conducted with approximately 1,500 students across multiple settings (i.e., urban and rural). Stratified random sampling was used so that approximately 50% of the students in the sample were at or above benchmark at the beginning of the year and the other 50% were below or well below benchmark.
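The 50/50 split described above is a form of stratified random sampling with a fixed allocation to each stratum. The Python sketch below illustrates only the general technique; the student records, the `status` field and its two labels, and the function name are hypothetical and do not reflect Acadience Learning’s actual sampling procedure.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(students, n_total, strata_key, proportions, seed=0):
    """Draw a stratified random sample with a fixed share per stratum.

    Hypothetical illustration: `students` is a list of dicts, `strata_key`
    names the field holding each student's stratum, and `proportions` maps
    each stratum to its target share of the sample (shares should sum to 1).
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    by_stratum = defaultdict(list)
    for student in students:
        by_stratum[student[strata_key]].append(student)

    sample = []
    for stratum, share in proportions.items():
        n = round(n_total * share)
        # Sample without replacement within the stratum.
        sample.extend(rng.sample(by_stratum[stratum], n))
    return sample


# Hypothetical pool: every third student is below benchmark.
pool = [
    {"id": i, "status": "below_or_well_below" if i % 3 == 0 else "at_or_above"}
    for i in range(4000)
]
sample = stratified_sample(
    pool, 1500, "status", {"at_or_above": 0.5, "below_or_well_below": 0.5}
)
print(Counter(s["status"] for s in sample))
```

Sampling within each stratum keeps the two benchmark groups balanced in the sample even though below-benchmark students make up only about a third of this hypothetical pool.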
Validity
Validity refers to how accurately the scores of an assessment capture what they are intended to measure. According to the Standards for Educational and Psychological Testing, to demonstrate validity, researchers must gather multiple sources of evidence showing that a measure truly assesses the skills it claims to assess, at the appropriate grade or age level, and in a way that reflects true developmental rates (American Educational Research Association [AERA], American Psychological Association, & National Council on Measurement in Education, 2014). Validity is “the most fundamental consideration in developing tests and evaluating tests” (AERA et al., 2014, p. 11) and a necessary component when considering the quality of an assessment. Validity of the CFOL was assessed in four ways: (a) criterion-related validity via correlations with Acadience Reading measures, (b) correlations among the CFOL sections, (c) item-response analyses, and (d) factor analysis.
Criterion-related validity was evaluated using correlations between CFOL tasks and Acadience Reading measures, reported by grade level, time of year, and section of the assessment; each coefficient is the correlation between the score on the CFOL task and the criterion. Interpretations of the correlations followed the guidelines set forth by Hopkins (2002; i.e., <.10 = Very Small, .10–.29 = Small, .30–.49 = Moderate, .50–.69 = Moderate-Strong, >.70 = Strong).
The correlations for criterion-related validity ranged as follows: Section A tasks from small (.20 with Letter Naming Fluency) to moderate (.32 with First Sound Fluency), Section B from small (.21) to moderate (.38), Section C from small (.21) to moderate-strong (.69), Section D from small (.22) to moderate-strong (.70), Section E from small (.21) to moderate-strong (.55), Section F from small (.21) to moderate-strong (.59), and Section G from small (.28) to strong (.79). These coefficients suggest that the scores on these measures have low concurrent and predictive validity. The small and moderate correlations may arise because most of the criterion measures to which the CFOL is compared do not measure similar constructs. For example, the criterion validity coefficients for Sections A (Story Coherence/Text Structure) and B (Listening Comprehension) were based on comparisons with Acadience Reading measures of phonological awareness and phonics (e.g., Phoneme Segmentation Fluency, Letter Naming Fluency). In a few instances, sections are compared with similar measures; for example, Section C (Reading Comprehension) is compared with Maze, which also purportedly measures reading comprehension. However, even when the criteria are similar, the coefficients are only small to moderate (.28 and .35).
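Applying these interpretive bands consistently is mechanical enough to express as a small lookup. The Python sketch below is a hypothetical helper, not part of any CFOL scoring materials; it encodes the bands as quoted in this review and resolves their boundary gaps by assumption, treating exactly .70 as moderate-strong, which matches the label given for Section D. Under these bands, the upper values for Sections C (.69), E (.55), and F (.59) fall in the moderate-strong band.

```python
def hopkins_label(r):
    """Label a correlation using the bands quoted in this review.

    Assumption: the printed bands leave small gaps (e.g., exactly .70), so
    here values above .70 are "strong" and .50 up to .70 is "moderate-strong".
    """
    r = abs(r)  # the bands describe magnitude, not direction
    if r > 0.70:
        return "strong"
    if r >= 0.50:
        return "moderate-strong"
    if r >= 0.30:
        return "moderate"
    if r >= 0.10:
        return "small"
    return "very small"


# Upper end of each section's criterion-related correlation range, from the text.
upper_bounds = {"A": 0.32, "B": 0.38, "C": 0.69, "D": 0.70, "E": 0.55, "F": 0.59, "G": 0.79}
for section, r in upper_bounds.items():
    print(section, r, hopkins_label(r))
```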
Results from the item-response analysis showed that items increased in difficulty both as tasks progressed within each domain and across grades, which indicated that the measure adequately captured the progression of these skills. A confirmatory factor analysis organized the tasks into four skill-based categories (comprehension, fluency, oral language, and vocabulary). Compared with two alternative models, this model had the lowest Akaike information criterion (AIC), indicating the best relative fit. Strong loadings (.78–.99) were observed for all four categories. Within the Comprehension category, most tasks from the three included sections (A: Story Coherence/Text Structure, B: Listening Comprehension, and C: Reading Comprehension) showed moderate to strong correlations (.50–.75), the exception being task A1 (.14). The Oral Language category was confirmed with two sections (D: Syntactic Knowledge/Grammar and E: Morphological Awareness), which showed moderate to strong loadings (.64–.84). The Vocabulary/Word Knowledge category was confirmed with one section (F: Vocabulary/Word Knowledge), which showed strong loadings (.81–.86). The Fluency category was confirmed with one section (G: Reading Fluency), which showed strong loadings (.90–.94). Three sections (morpheme identification, syntax discrimination, and summarizing main ideas in short passages) were removed based on findings from both the item-response analysis and the confirmatory factor analysis.
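The model comparison rests on the Akaike information criterion, AIC = 2k − 2 ln L̂, where k is the number of free parameters and L̂ is the maximized likelihood; lower values indicate better fit after a penalty for complexity. The Python sketch below shows only the comparison logic. The model names, log-likelihoods, and parameter counts are invented for illustration and are not the CFOL results.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fitted models: (maximized log-likelihood, number of free parameters).
# These numbers are invented for illustration; they are NOT CFOL results.
models = {
    "one-factor": (-5210.0, 42),
    "two-factor": (-5150.0, 43),
    "four-factor": (-5080.0, 48),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)  # the model with the lowest AIC
for name, value in scores.items():
    print(f"{name}: AIC = {value:.1f}")
print("lowest AIC:", best)
```

Because AIC is a relative index, it identifies the best of the models compared but says nothing about whether any of them fits well in an absolute sense.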
Reliability
Reliability refers to how consistently an assessment measures the intended skills (AERA et al., 2014). Reliable scores assure educators that, across administrations, an assessment will yield consistent scores that, under ideal conditions, accurately reflect the child’s skills. To demonstrate reliability, researchers need to present replications showing consistent scores across varying conditions and populations (AERA et al., 2014). Reliability/precision estimates need to be reported appropriately and clearly for interpretation.
Reliability was estimated with (a) inter-rater reliability, (b) internal consistency (Cronbach’s alpha), and (c) communality estimates from the factor analysis. Inter-rater reliability indicated moderate to high consistency between raters for both interobserver agreement (.76–.99) and kappa (.41–.93). Kappa coefficients were calculated by task (21 tasks) and described using Landis and Koch’s (1977) guidelines (i.e., <0 = Poor, 0–.20 = Slight, .21–.40 = Fair, .41–.60 = Moderate, .61–.80 = Substantial, .81–1.00 = Almost Perfect). Eight of the 21 tasks demonstrated moderate rater agreement, and the remainder demonstrated substantial to almost perfect agreement.
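Interobserver agreement and kappa can diverge because kappa discounts the agreement two raters would reach by chance. The Python sketch below computes Cohen’s kappa for two raters and labels it with the Landis and Koch (1977) bands; the ten item scores are hypothetical. Here the raters agree on 80% of items, yet kappa is only about .58 (moderate), which illustrates why kappa values typically fall below percent-agreement values.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of agreement (interobserver agreement).
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa):
    """Descriptive label from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical scores (1 = correct, 0 = incorrect) from two raters on ten items.
rater_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2), landis_koch_label(kappa))
```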
Internal consistency was measured by calculating Cronbach’s alpha for each of the 21 tasks at each grade level, and the estimates varied considerably, ranging from .34 (Sentence Completion, Grade 4) to .91 (Reading Fluency, Grade 4). Of the 68 task-by-grade estimates, 28 (41.18%) exceeded .80, which is considered acceptable reliability for low-stakes decisions about individual students (Salvia et al., 2017). Six of the 21 tasks demonstrated internal consistency above .80 at every grade level assessed: A1 Story Telling, B1 Retell, C1 Paragraph Reading Retell, E3 Making Words, F2 Multiple Meanings, and G Reading Fluency (Narrative). Given that only about two fifths (41.18%) of the task-by-grade estimates reached this threshold, the internal consistency of the scale needs additional research, especially because many well-established reading measures demonstrate higher reliability (National Center on Intensive Intervention, n.d.).
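Cronbach’s alpha compares the sum of the item variances with the variance of examinees’ total scores: α = k/(k − 1) × (1 − Σσᵢ² / σ²_total), where k is the number of items. The Python sketch below implements that formula on a small hypothetical score matrix (five examinees, three items) and also reproduces the 41.18% figure from 28 of 68 estimates.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a score matrix (rows = examinees, columns = items)."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    columns = list(zip(*item_scores))  # transpose to one tuple per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical scores for five examinees on a three-item task.
scores = [
    [2, 3, 3],
    [1, 1, 2],
    [3, 3, 3],
    [0, 1, 1],
    [2, 2, 3],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))

# Share of task-by-grade alpha estimates above .80 reported in the review.
share_above_80 = 28 / 68
print(f"{share_above_80:.2%}")
```

The matrix is deliberately tiny; alpha estimated from so few examinees would be unstable in practice, so the example demonstrates the arithmetic only.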
Communality estimates demonstrated a strong relationship between observed scores and the latent constructs (ranging from .78 in the Fluency skill category to .99 in the Oral Language skill category), which suggested that CFOL scores accurately measured the targeted reading skills. Reliability estimates calculated from these communality estimates ranged from .61 in the Fluency skill category to .98 in the Oral Language skill category.
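Read together, the two ranges in this paragraph are arithmetically consistent with the reliability estimates being the squares of the first set of values, the usual relationship in which a standardized loading λ accounts for λ² of an indicator’s variance. Whether the test authors computed their estimates exactly this way is an assumption; the check below shows only that the numbers agree.

```latex
\lambda_{\text{Fluency}}^{2} = .78^{2} = .6084 \approx .61,
\qquad
\lambda_{\text{Oral Language}}^{2} = .99^{2} = .9801 \approx .98
```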
Commentary
Assessing reading comprehension has long been identified as the missing link in assessment-to-intervention models for reading because most measures of comprehension were too strongly influenced by word decoding (Keenan, 2012). Thus, a tool that assesses reading comprehension for diagnostic purposes is especially welcome. The CFOL was designed to sample the behaviors and skills involved in reading comprehension in order to inform intervention efforts fully. Although myriad skills contribute to reading comprehension, the test prioritized skills that are time-efficient and cost-efficient to assess and practically related to reading interventions. It was designed to be distinguished from other diagnostic measures by three features: integration with other early literacy screeners (e.g., Acadience Reading K–6), comprehensive coverage of comprehension skills with a brief administration, and explicit linkage to instruction and intervention decisions. Whereas recent measures of reading comprehension attempt to isolate the skill from word reading (Carlson et al., 2014), the CFOL attempts to differentiate comprehension deficits due to poor word reading from other potential difficulties, which is also a strength. Finally, the tasks covered in the CFOL align well with the Common Core State Standards (National Governors Association Center for Best Practices, 2010) and are likely relevant to most reading curricula.
The test developers gathered validity evidence from multiple sources, as recommended by the Standards (AERA et al., 2014), including evidence based on test content (i.e., item-response analysis), evidence based on relations to other variables (i.e., convergent evidence from correlations with other Acadience measures), and evidence based on internal structure (i.e., inter-item correlations and factor analysis). However, limited information was provided about validity generalization, consequences of testing, or response processes (AERA et al., 2014), and criterion-related validity was estimated with correlations to other Acadience Reading measures, which yielded relatively low coefficients. The validity evidence shows promise but is not yet convincing, and additional research is needed. Moreover, the theoretical structure of the assessment is pragmatic and potentially useful and incorporates most strategies used when comprehending text, but there are some potentially important omissions that are especially responsive to intervention and practice, such as looking back at preceding text for relevant information (van den Broek & Espin, 2012).
Reliability was examined fairly comprehensively and was estimated and presented by grade, both of which are strengths of the measure, but reliability estimates could also have been provided for other relevant subgroups (e.g., specific linguistic and cultural subgroups or individuals with a particular disability). Reporting decision consistency information and standard errors of measurement would also have been helpful (AERA et al., 2014). Many of the internal consistency estimates were well below recognized standards of acceptability, and additional research is needed before this measure can be used with confidence.
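A standard error of measurement (SEM) translates a reliability coefficient into score units: SEM = SD × √(1 − r), where SD is the score standard deviation and r is the reliability estimate. The Python sketch below pairs a hypothetical SD of 10 and a hypothetical observed score of 50 with the lowest (.34) and highest (.91) alpha values reported for CFOL tasks, to show how much wider a 95% score band becomes when reliability is low.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band (z = 1.96) centered on the observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


SD = 10  # hypothetical score standard deviation
for r in (0.91, 0.34):  # highest and lowest alpha values reported for CFOL tasks
    sem = standard_error_of_measurement(SD, r)
    low, high = score_band(50, SD, r)
    print(f"r = {r:.2f}: SEM = {sem:.2f}, 95% band = {low:.1f} to {high:.1f}")
```

Centering the band on the observed score is a common simplification; some texts center it on an estimated true score instead.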
The CFOL provides some guidance (i.e., the CFOL Checklist) on when it should be used instead of other measures but ultimately lacks systematic decision guidelines. For instance, the manual highlights the importance of evaluating fluency as a prerequisite for reading comprehension and suggests considering the PA & WRD to assess decoding difficulties if accuracy is low on an oral reading fluency measure. However, it does not specify precisely when the CFOL should be used within this process. Moreover, the score analysis presents only means and standard deviations for each grade level and provides no clear guidelines for score interpretation. Scoring relies on item-specific scoring guidelines in the Assessment Manual, which makes it challenging for practitioners to score in the moment and makes the test less user-friendly and accessible for teachers with limited time.
Conclusion
The CFOL is an interesting entry into the assessment of reading comprehension and may be the first such measure with the stated purpose of informing intervention design. The CFOL is largely aligned with areas of effective reading instruction and reading comprehension theory. It appears to be a promising tool, especially given that it is one of the first tools developed to explicitly provide diagnostic information about strengths and weaknesses in comprehension, fluency, and oral language skills. However, only 6 of the 21 tasks demonstrated internal consistency above .80 at every grade level assessed, so more research on reliability is needed before the measure can be used with confidence.
Within a data-based individualization framework, the CFOL is probably not appropriate for screening or progress monitoring because it is too time intensive to administer to large groups of students and does not appear to be sensitive to short-term growth. It appears most appropriate as a diagnostic assessment for students already identified as needing intensive intervention, to gather information about each student’s specific strengths and difficulties (Jung et al., 2018; Welland et al., 2024). Practitioners could also use the CFOL within a broader problem-solving framework that combines multiple measures to identify reading interventions for individual students. The test authors recommend using the Acadience Reading Diagnostic (PA & WRD and CFOL) with Acadience Reading K–6, an assessment package designed for screening and progress monitoring. Additional guidelines on when to use the tool, how to interpret its data, and how to use it alongside measures a school already administers would also facilitate its use. The CFOL may also be useful for researchers studying reading comprehension and how best to support it through intervention. Although additional research is needed, the CFOL is a reading comprehension diagnostic tool that makes a potentially important contribution to school-based assessment-to-intervention efforts.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
