Abstract
Inadequate responders demonstrate significant risk for learning disabilities. Previous investigations of the cognitive profiles of inadequate and adequate responders have not included measures of executive functions (EFs), which have well-documented associations with reading comprehension. We evaluated EF performance on a common factor composed of shared variance across tasks as well as five separable EF factors in the context of an intensive reading intervention for struggling fourth graders. To determine whether EF performance at pretest is associated with subsequent responder status, we compared EF performance of three subgroups of students: inadequate and adequate responders and typical students not at risk for reading disabilities. Results of discriminant function analyses and linear regression models comparing groups were largely null; EF performance at pretest demonstrated only small associations with responder status. These results suggest that the assessment of EF may have limited value in predicting which individual students will respond to intensive reading interventions.
We define executive functions (EFs) as domain general control processes that permit individuals to complete goal-directed activities (Cirino et al., in press), similar to other broad-based definitions (e.g., Lezak, Howieson, Bigler, & Tranel, 2012; Miyake et al., 2000), although it is well recognized that EF can be characterized in many different manners (Suchy, 2009). As most school-based tasks represent goal-directed activities (e.g., reading or performing mathematics calculations), there is considerable intuitive and theoretical appeal for investigating EF in educational contexts. Performance on measures of EF is strongly associated with academic performance in key academic areas, such as reading, writing, and math (Blair & Razza, 2007; Gerst, Cirino, Fletcher, & Yoshida, 2017; Yeniad, Malda, Mesman, van IJzendoorn, & Pieper, 2013), and deficits in EF are associated with difficulties in reading (e.g., working memory; Cain, Oakhill, & Bryant, 2004).
The term EF has its roots in neuropsychology. For example, Goldstein (1949) noted that patients who had sustained damage to the frontal lobes of the brain had a deficit in “abstract attitude” with effects on both performance tests and everyday behavior, although earlier case studies were available (e.g., Harlow, 1848). Pribram (1973) was one of the first to use the word “executive” in connection with the frontal cortex. Since that time, the conceptualization of EF has become more expansive in several ways. First, the meaning of EF has become decoupled from a one-to-one relationship with frontal lobe functioning (Stuss, 1992), even as this relationship continues to be explored. Second, the structure of EF (how its components relate to one another) is becoming clearer across age ranges (e.g., Cirino et al., in press; Miyake et al., 2000; Wiebe et al., 2011), and there is now greater information about how EF components such as working memory, inhibition, shifting, planning, and so on, relate to one another. Furthermore, EF is known to develop and improve throughout childhood (Anderson, Anderson, Northam, Jacobs, & Catroppa, 2001; Best, Miller, & Jones, 2009), including in terms of its structure (e.g., Huizinga, Dolan, & van der Molen, 2006; Lee, Bull, & Ho, 2013), all of which is consistent with neurobiological development, particularly with regard to the frontal lobes (Jurado & Rosselli, 2007). Third, EF has come to be associated with other broad functions, each of which has its own historical traditions across cognitive, developmental, and education literatures. These functions include attention, working memory, metacognition, and self-regulation, and there is recognition of the relationship of EF to such skills (Hofmann, Schmeichel, & Baddeley, 2012), assessed either with laboratory tasks or rating scales. EF has been associated with achievement outcomes at the level of both intervention and prediction. 
For example, intervention approaches have focused on either broad, curricular approaches that emphasize self-regulation (Diamond & Lee, 2011) or explicit working memory training (Melby-Lervåg & Hulme, 2013), although the effects of neither have been robust. Prediction studies are common, though it is rarer to couple prediction with explicit achievement intervention. The present article both uses a conceptualization of EF that reflects its complexity across the manners in which it is assessed (see below) and takes place in the context of a reading intervention that has a self-regulatory component.
The well-established associations between academics and EF have spurred interest in whether interventions targeting EF (broadly construed) may result in improved academic performance and improvement in reading (e.g., Bierman, Nix, Greenberg, Blair, & Domitrovich, 2008; Diamond, Barnett, Thomas, & Munro, 2007). This issue was investigated in a recent meta-analysis (Jacob & Parkinson, 2015) that evaluated both the unconditional relationship of EF to academic skills, as well as causal relations—that is, whether manipulating EF experimentally results in improved academic performance. Jacob and Parkinson (2015) concluded that there was a clear relationship between academic performance and EF in its various instantiations but found no compelling evidence that associations between EF and academic performance are causal. Of note, Jacob and Parkinson identified only five studies that included a robust set of covariates (e.g., maternal education, socioeconomic status, or prior achievement) deemed sufficient to suggest a unique role for EF in academic performance; among the 13 correlations between EF and academics in these studies, only one, a correlation between working memory and math achievement (Fitzpatrick & Pagani, 2012), was significant and positive.
Beyond correlational or predictive relationships, and experimental manipulation of EF, there is a third potential way for EF to affect reading performance. EF may affect the manner in which students respond to intensive reading interventions. For example, a threshold of skill in EF prior to participating in the intensive intervention may be necessary, if not sufficient, for responding effectively to the reading intervention. Alternatively, students with differing levels of EF may respond differently to reading intervention, with good EF potentially compensating to some extent for skill deficits. These possibilities are especially plausible for older students (i.e., beyond Grade 2), for whom the focus of reading tasks is on comprehension rather than or in addition to a word-level focus. In this context, EF may be utilized to help scaffold or control essential processes of reading comprehension, such as incorporating background knowledge, summarizing passages, or monitoring engagement with text. To our knowledge, previous reviews of factors associated with intervention response in reading have not included studies explicitly evaluating EF or those focusing on reading comprehension in older students (Stuebing et al., 2015), which makes this an innovative area of inquiry. To address such questions, studies must measure EF and academic achievement and must include (and experimentally manipulate) intensive reading intervention. Below, we describe the importance of studying the relations between EF and reading intervention response in the context of intensive interventions, and the EF characteristics of students who do and who do not demonstrate adequate response to intensive interventions. The key question that guides this study is, Can baseline (pretest) measures of EF help us understand which individual students will respond to intensive reading interventions?
EF and Intervention Response
The Importance of Intervention Response
There are many potential causes of poor reading, which confounds generalizations from many cross-sectional and correlational research studies that include cognitive and neuropsychological dimensions. For example, academic difficulties may be the result of limited academic opportunity or ineffective instruction (Fletcher & Vaughn, 2009; L. Fuchs, Fuchs, & Vaughn, 2008). Students arrive at school with different language and literacy experiences (Bhattacharya, 2010; Molfese & Molfese, 2002). Students engage in heterogeneous language and literacy activities (Connor, Morrison, & Slominski, 2006; Foorman et al., 2006; Piasta, Connor, Fishman, & Morrison, 2009) requiring consideration not only of reading level among students with reading difficulties but also whether students have received adequate opportunity, in the form of intensive, evidence-based reading interventions (Fletcher et al., 2011; Miciak et al., 2014).
The Attributes of Inadequate Responders: A Critical Subgroup
Studies that focus on intervention response permit inferences about inadequate responders, a population of considerable interest in special education practice and research and who by definition demonstrate intractable academic difficulties. In the last decade and a half, there has been movement toward learning disabilities (LD) identification criteria that rely, in part, on inadequate response to intensive interventions as an inclusionary criterion, data which are typically collected as part of a school-wide response to intervention (RTI) framework (also called multitiered systems of support; Gresham et al., 2005; VanDerHeyden & Burns, 2010). This procedural shift mandating that students identified with LD must demonstrate inadequate response to evidence-based interventions has not been without controversy, as some have questioned the validity of LD identification decisions emerging from RTI frameworks (Hale et al., 2010; Kavale, Kauffman, Bachmeier, & LeFever, 2008). However, the mandate for documentation of inadequate response to intervention is not unique to methods that rely on RTI frameworks. Across all potential LD identification criteria permitted by the Individuals With Disabilities Education Act (IDEA) 2004, the comprehensive assessment process must document inadequate response to evidence-based interventions (Fletcher, Lyon, Fuchs, & Barnes, 2007). This mandate to document intervention response as an inclusionary criterion for all LD identification highlights the critical need to provide empirical evidence for the validity of classifications based on instructional response (Fletcher et al., 2011; Miciak et al., 2014). Students who demonstrate differential response to evidence-based, intensive interventions can be compared on dimensions not utilized for group formation, and EF is a key example of these. If students differ on such external dimensions, the classification accrues validity (Morris & Fletcher, 1988). 
By focusing on EF, the present study extends validity research on RTI methods by comparing inadequate and adequate responders in this important and previously unstudied domain.
Understanding the attributes of inadequate responders also contributes to research that identifies potential intervention targets. For example, inadequate responders are known to have deficits in several cognitive and linguistic domains, including phonological awareness, vocabulary, and listening comprehension (Fletcher et al., 2011; Stage, Abbott, Jenkins, & Berninger, 2003). Many of these domains are promising intervention targets (Lesaux, Kieffer, Kelley, & Harris, 2014; Torgesen, Wagner, Rashotte, Herron, & Lindamood, 2010). Therefore, if EFs are associated with responder status, then to the extent that these domains of EF are malleable, they would serve as another promising intervention target, potentially as an adjuvant to intensive reading interventions.
Predicting Who Will Respond: The Importance of Accurate Prediction
A second reason to evaluate EFs in the context of reading interventions is that findings from this evaluation may prompt educators to provide more intensive intervention earlier. If measures of EF are demonstrated to be associated with intervention response, then administering these measures early and providing early treatment may provide for more effective outcomes. This is especially relevant because most RTI service delivery frameworks include three to four tiers of increasing intensity (Jimerson, Burns, & VanDerHeyden, 2016; Kovaleski, VanDerHeyden, & Shapiro, 2013). The idea that cognitive and linguistic measures could be useful predictors of individual response to intensive reading interventions is not new. Previous literature reviews (Al Otaiba & Fuchs, 2002) and meta-analyses (Nelson, Benner, & Gonzalez, 2003) have implicated a number of skills as potential predictors of intervention response, including phonological awareness, rapid naming, problem behavior, and deficits in working memory. However, these reviews provide associations between cognitive attributes and reading problems and do not evaluate the extent to which baseline cognitive performance predicts response status following intensive reading intervention.
Stuebing and colleagues’ (2015) meta-analysis evaluating predictors of intervention response identified three models by which predictors of intervention response have been investigated: (a) unconditional growth curve models, (b) unconditional gain models, and (c) change conditioned on initial reading status. The three models address distinct research questions, and confusion about differing analytic frameworks may account for different interpretations of evidence on the relative contributions of student characteristics to intervention response. The third model (change conditioned on initial reading status) is particularly pertinent in the present context, as it directly addresses a “value-added” component to which the assessment of EF could contribute (Fletcher et al., 2011; Miciak et al., 2014). Academic assessments are inexpensive, easily administered, and directly inform intervention. To justify the use of more comprehensive psychoeducational assessments (e.g., cognitive assessment and assessments of EF), it would be critical to show that they contribute unique and meaningful information about student response; that is, that cognitive and EF assessments explain unique variance in posttest outcomes (Fletcher & Miciak, 2017).
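The "value-added" question can be written compactly. The block below is only an illustrative sketch of one common way to quantify unique variance (a hierarchical regression with an incremental R-squared test), not the analysis specified in this article. All symbols are assumptions introduced for illustration: Y is a posttest reading outcome, R_1 through R_p are p baseline reading predictors, EF_1 through EF_q are q EF factor scores, and n is the sample size.

```latex
% Baseline model: posttest outcome conditioned on initial reading status
Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j R_{ij} + \varepsilon_i
% Full model: the same predictors plus q EF factor scores
Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j R_{ij} + \sum_{k=1}^{q} \gamma_k \mathrm{EF}_{ik} + \varepsilon_i
% Unique variance attributable to EF
\Delta R^2 = R^2_{\text{full}} - R^2_{\text{base}}
% Test of the increment, with q and n - p - q - 1 degrees of freedom
F = \frac{\Delta R^2 / q}{\left(1 - R^2_{\text{full}}\right) / \left(n - p - q - 1\right)}
```

Under this framing, EF assessment adds value only if the increment in explained variance is both statistically reliable and large enough to matter in practice.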
In the present study, we identify whether EF contributes unique variance to responder status (inadequate vs. adequate), beyond baseline reading variables. Previous studies of the association between academic status and cognitive functioning have demonstrated a clear stepwise progression in which students who experience the most significant academic delays experience the most significant cognitive delays, consistent with a continuum of severity hypothesis (Fletcher et al., 2011; Vellutino, Scanlon, Small, & Fanuele, 2006).
The Present Study: Context and Purpose
The present study extends a previous investigation of the structure of EF (Cirino et al., in press). In that study, 27 indicator measures were assessed, representing eight potentially separable latent EF factors from across cognitive, neuropsychological, and educational literatures. Multiple models were evaluated; the best fitting model was a bifactor model with a common EF factor, as well as five specific latent factors: (a) working memory as recall span with manipulation/planning; (b) working memory as ongoing maintenance of information and updating in active memory; (c) generative fluency (e.g., verbal fluency) as efficient and accurate retrieval from semantic memory within a set of parameters; (d) self-regulated learning (SRL) as the ability to control one’s own learning processes, including strategy use and self-efficacy; and (e) metacognition as executive processes that monitor, manipulate, or control/regulate other cognitive processes. This final model serves as the basis for the analyses conducted in the present study. For purposes of the present study, factor scores were generated from the latent model (common EF and five specific EF factors), resulting in scores along a z-score metric. The present study extends Cirino et al. (in press) with previously unreported data on a subsample of struggling readers (Vaughn, Solís, Miciak, Taylor, & Fletcher, 2016). These students received an intensive reading intervention and were later empirically classified as either inadequate or adequate responders to the intervention using postintervention data. These two groups were compared with one another and with a group of typical students who did not demonstrate reading difficulties. We then evaluated whether the identified EF factors explain unique variance in responder status after pretest reading and language performance are included.
We hypothesized that the results of this study would be consistent with a continuum-of-severity hypothesis. That is, because EF is related to reading, and typical readers outperform adequate responders, who in turn (by definition) outperform inadequate responders, we expect EF to differentiate these groups in a similar stepwise fashion. We expect this differentiation to remain (although diminished in size) once initial reading status is accounted for, with typical readers and adequate responders demonstrating relatively better performance on measures of EF. Given the robustness of the common EF factor, we expect differences on this factor score to be the largest, but it is more difficult to form strong a priori hypotheses about a differential pattern of EF to be exhibited by the three groups on the specific EF factors. Our expectations are not only bolstered by the robustness of the measurement of the common factor but also tempered by the evidence with other cognitive functions to date, which have shown little added value (Fletcher et al., 2011; Miciak et al., 2014; Stuebing et al., 2015). Nonetheless, the present study contributes to our understanding of the role that cognitive and neuropsychological processes play in reading outcomes.
Method
Overview
The present study, as noted previously, extends findings from a measurement study on the structure of EFs (Cirino et al., in press) and an intervention study of an intensive reading intervention implemented with struggling readers (Vaughn et al., 2016). Participants for the present study include students who completed the full EF battery from the measurement study and also received intensive reading intervention provided in the intervention study. The study also compared the intervention students with typical students (i.e., students who did not demonstrate reading risk) who were randomly selected as part of the measurement study and were assessed in the fall and spring. The data set, research questions, and analytic plan for the present study are unique. No previous study investigating the effects of the intensive intervention has analyzed EF as a potential predictor of intervention response.
Participants and Group Formation
School sites
Participants for the present study were drawn (approximately equally) from two sites: one large urban district (eight elementary schools) and a second site encompassing two near-urban school districts (nine elementary schools) in the Southwestern United States. The mean enrollment among the 17 schools was 697 students (range = 425–1,140), and the mean percentage of students who received free or reduced lunch was 81.6% (range = 46.1%–98.4%). All schools were rated as academically acceptable. Students for both the measurement and intervention studies were drawn from the same set of schools and sites.
Selection of intervention participants
The target population for the intervention was fourth-grade students at risk for reading disabilities, as measured by a standard score of 85 or below on the Gates–MacGinitie Reading Test (GMRT; MacGinitie, MacGinitie, Maria, Dreyer, & Hughes, 2000). Of the 1,695 fourth graders who were screened across the 17 participating schools, 487 were identified as struggling readers. Demographic information is provided in Table 1. The struggling readers were randomly assigned at the student level within schools in a 2:1 ratio to either a researcher-provided intervention group or to a business-as-usual (BAU) comparison group. There were no statistically significant differences between the groups according to participant age, t(479) = 1.15, p > .05; free or reduced lunch status, χ²(1) = 0.00, p > .05; special education status, χ²(1) = 0.34, p > .05; or race/ethnicity, χ²(1) = 3.20, p > .05. Of students randomly assigned to the researcher-provided intervention, several attrited (see Vaughn et al., 2016, for a full review), but analysis revealed no differences between attrited and nonattrited students by score on the GMRT at pretest, t(482) = 0.05, p > .05; school type (urban or near urban), χ²(1) = 2.50, p > .05; or intervention assignment (researcher-delivered intervention or BAU), χ²(1) = 0.76, p > .05. Additional students did not complete the pre- and posttest academic assessments, resulting in 231 intervention students included in the current study.
Table 1. Demographic Data for the Intervention and Typical Sample.
Limited English proficient (LEP) students were excluded from participation in the large measurement study and were therefore not available for comparisons as typically reading students. However, students classified as LEP were included in the intervention study and are included as at-risk students and classified as adequate and inadequate responders.
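The 2:1 student-level assignment within schools described in this section can be illustrated with a short sketch. The source does not report the exact randomization procedure, so this is only one plausible implementation; the function name, the `(student_id, school_id)` input format, and the seed are all hypothetical.

```python
import random
from collections import defaultdict


def assign_within_schools(students, seed=0):
    """Randomly assign students 2:1 (intervention:BAU) within each school.

    `students` is an iterable of (student_id, school_id) pairs. This input
    format is an assumption made for illustration, not the study's data layout.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group student IDs by school so the ratio holds within each school.
    by_school = defaultdict(list)
    for student_id, school_id in students:
        by_school[school_id].append(student_id)

    assignment = {}
    for school_id in sorted(by_school):
        ids = by_school[school_id]
        rng.shuffle(ids)
        # Roughly two thirds of each school's students go to intervention.
        n_intervention = round(len(ids) * 2 / 3)
        for position, student_id in enumerate(ids):
            condition = "intervention" if position < n_intervention else "BAU"
            assignment[student_id] = condition
    return assignment


if __name__ == "__main__":
    roster = [(f"s{i}", "school_A") for i in range(9)]
    roster += [(f"t{i}", "school_B") for i in range(6)]
    result = assign_within_schools(roster)
    print(sum(c == "intervention" for c in result.values()), "of", len(result))
```

Blocking on school in this way keeps the two conditions balanced within every school, so school-level differences cannot be confounded with condition.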
Study participants
The present study includes 305 participants from two nonoverlapping pools: intervention students and typical readers. Intervention students (n = 231) were students who were assigned to the intervention condition and completed the pre- and posttest academic assessments, which permitted classification of individual students as adequate or inadequate responders. Typical readers (n = 74) were students who achieved a standard score greater than 90 at pretest on the primary reading criterion measures (below), completed the EF measures, and completed the posttest academic battery.
Categorization of inadequate responders
Participants who completed the researcher-provided intervention were classified as inadequate or adequate responders based on posttest performance on two measures of reading comprehension: the Woodcock–Johnson Third Edition, Passage Comprehension (WJ-III Passage Comprehension; Woodcock, McGrew, & Mather, 2001) subtest and the GMRT (MacGinitie et al., 2000). Students who received a standard score of 85 or below on the GMRT or a score of 86 or below on the WJ-III Passage Comprehension subtest were classified as inadequate responders. Students who scored above 85 on the GMRT and above 86 on the WJ-III Passage Comprehension subtest were classified as adequate responders. These cut points were applied because they represented the threshold for the bottom quartile for both measures based on a local, benchmark sample (described below). The application of these criteria resulted in 167 inadequate responders and 64 adequate responders.
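The final-status decision rule above can be stated as a small function. This is a minimal sketch: the cut points (85 on the GMRT, 86 on WJ-III Passage Comprehension) come from the text, while the function and constant names are hypothetical.

```python
# Cut points from the text: scores at or below these values indicate
# inadequate response on the corresponding posttest measure.
GMRT_CUT = 85
WJ3_PASSAGE_COMP_CUT = 86


def classify_responder(gmrt_score, wj3_passage_comp_score):
    """Classify an intervention student from posttest standard scores.

    A student is an adequate responder only when BOTH scores exceed their
    cut points; falling at or below either cut point yields "inadequate".
    """
    below_gmrt = gmrt_score <= GMRT_CUT
    below_wj3 = wj3_passage_comp_score <= WJ3_PASSAGE_COMP_CUT
    if below_gmrt or below_wj3:
        return "inadequate"
    return "adequate"


if __name__ == "__main__":
    print(classify_responder(92, 95))  # both above the cut points
    print(classify_responder(92, 86))  # at the WJ-III cut point
```

Because adequate response requires exceeding both cut points, the rule is deliberately conservative about declaring a student an adequate responder.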
Our decision to employ final-status criteria (as opposed to growth criteria or dual discrepancy criteria; D. Fuchs & Deshler, 2007) is based on two considerations. First, final status directly addresses a critical educational programming question: Does the student continue to require intensive reading intervention? In addition, indices of growth add little predictive power over final status to a determination of adequate response (Schatschneider, Wagner, & Crawford, 2008; Tolar, Barth, Fletcher, Francis, & Vaughn, 2014), particularly in samples drawn beyond early elementary grades. This is because indices of growth and final status are highly correlated when selection is based on performance at time point one (i.e., pretest; as in this study). The only manner in which students can achieve differential final status is differential (fan-shaped) growth.
Our decision to utilize multiple, final-status indicators of reading comprehension to determine intervention response matches previous studies of the cognitive characteristics of inadequate responders (Fletcher et al., 2011; Miciak et al., 2014). Agreement across single indicators of intervention response is limited (Barth et al., 2008; Fletcher et al., 2014). When using single indicators, group membership will fluctuate due to changes in criterion measures or testing occasion, an artifact of imperfectly reliable measures that inadequately measure the latent construct. Furthermore, there is no “gold standard” for determining intervention response (Fletcher et al., 2007). As a result, the selection of criterion measures and cut points is to some extent arbitrary. We therefore chose to employ response criteria that included (a) multiple measures, (b) cut points fixed to the same benchmark sample to eliminate error introduced by different norming samples (Fletcher et al., 2014), and (c) multiple scores above the specified cut point to be considered an adequate responder. This third criterion reduces the number of false negatives (students considered to be adequate responders who continue to demonstrate reading risk and need interventions) at the expense of false positives (students considered to be inadequate responders who do not demonstrate subsequent difficulties in reading). This decision reflects our judgment that false positives are less deleterious than false negatives, given the relatively low expense and risk associated with continued intervention.
Benchmark sample
The benchmark sample consists of all 846 students in Grades 3 to 5 in the measurement study (Cirino et al., in press) but is not directly evaluated here. Rather, the benchmark sample is utilized to define the cut point for the bottom quartile on the criterion measures for intervention response. The measurement study oversampled struggling readers, a result of the planned overlap with the intervention study. To adjust scores to account for this oversampling of struggling readers, we weighted observations within the full sample of 846 to reflect a normal distribution of the GMRT within the population of the study (see Cirino et al., in press, for additional details).
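To make the idea of a weighted bottom-quartile cut point concrete, the sketch below computes a weighted quantile from scores and observation weights. The actual weighting procedure is described in Cirino et al. (in press) and is not reproduced here; the function name, the example scores, and the example weights are hypothetical.

```python
def weighted_quantile(scores, weights, q):
    """Return the smallest score whose cumulative weight share reaches q.

    This is one simple definition of a weighted quantile, chosen for
    illustration; it is an assumption, not the procedure used in the study.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal length")
    pairs = sorted(zip(scores, weights))
    total_weight = sum(weight for _, weight in pairs)
    cumulative = 0.0
    for score, weight in pairs:
        cumulative += weight
        if cumulative >= q * total_weight:
            return score
    return pairs[-1][0]


if __name__ == "__main__":
    scores = [70, 80, 90, 100]
    # Down-weighting the two lowest scores mimics correcting for an
    # oversample of struggling readers (weights are made up).
    print(weighted_quantile(scores, [1, 1, 1, 1], 0.25))
    print(weighted_quantile(scores, [0.5, 0.5, 1.5, 1.5], 0.25))
```

Down-weighting oversampled low scorers moves the 25th-percentile cut point upward, toward what would be expected in a sample that was not enriched for struggling readers.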
Measures
All measures were administered by trained examiners hired and supervised by the research team. Additional details regarding measures and procedures can be found at https://www.texasldcenter.org/projects/measures.
Demographic information
Student demographic information was collected from the schools where the intervention took place. These data included student age, gender, free or reduced lunch status, special education eligibility, and race/ethnicity.
Executive functioning measures
Details of the measures of EF can be found in Cirino et al. (in press). All EF measures were administered in fall, prior to the beginning of the reading intervention, except for the teacher questionnaires. These were administered in spring, after a full year in classrooms, because a fall administration would have required teachers to answer questions about students with whom they were not yet familiar. For Shifting, parts of four subtests from the Delis–Kaplan Executive Function System (D-KEFS; Delis, Kaplan, & Kramer, 2001) were utilized (Design Fluency, Verbal Fluency, Color-Word Identification Test, and Trailmaking Test). Participants switched between two previously introduced actions or stimuli in all conditions. Inhibition was assessed with two measures. A Go/No-Go task (Draine, 2003) required students to refrain from responding to an infrequent target. For Stop Signal (Draine, 2003), students tried to respond appropriately to two visual stimuli while inhibiting any response to an auditory stimulus. Working Memory as Span/Manipulation With Planning (WMSMP) included five measures: the Listening Span subtest from the Working Memory Test Battery for Children (WMTB-C; Pickering & Gathercole, 2001), Corsi Blocks (Draine, 2003), two measures of the Tower of London test (Draine, 2003), and the WJ-III (Woodcock et al., 2001) Planning subtest. Working memory as updating used four N-back measures (Kirchner, 1958), also administered on the Inquisit platform. Using letters or complex (unnamable) shapes, students needed to keep track of a running sequence and gauge whether a current stimulus matched one presented previously; a d-prime measure was computed for each. 
Generative fluency used three subtests from the D-KEFS (Delis et al., 2001): Lexical Fluency, Categorical Fluency, and Design Fluency; for each subtest, students had 1 min to generate as many exemplars as possible from a given letter (Lexical), semantic grouping (Category), or dot pattern (Design). SRL was assessed with the Contextual Learning Scale (Cirino et al., in press), a researcher-created self-rating scale that included new items, as well as existing or adopted items from the Motivated Strategies for Learning Questionnaire (Pintrich & De Groot, 1990), School Motivation and Learning Strategies Inventory (Stroud & Reynolds, 2006), Child Behavior Rating Scale (Bronson, Goodson, Layzer, & Love, 1990), Motivations for Reading Questionnaire (Wigfield & Guthrie, 1997), Self-Regulation Strategy Inventory–Self-Report (Cleary, 2006), and Patterns of Adaptive Learning Survey (Midgley et al., 1996). The scale assesses self-efficacy and effort, reading strategies, and perceived skill and preference for reading activities. Finally, Metacognition was assessed with the same-named scale of the Behavior Rating Inventory of Executive Function–Teacher (BRIEF; Gioia, Isquith, Guy, & Kenworthy, 2000) and with the Inattention subscale from the Strengths and Weaknesses of ADHD Symptoms and Normal behavior (SWAN; J. Swanson et al., 2006). Both questionnaires were administered in spring. Per the measurement study, factor scores were derived only for the common EF factor and five specific EF factors: (a) WMSMP, (b) working memory as updating, (c) fluency, (d) SRL, and (e) metacognition.
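For readers unfamiliar with the d-prime index computed for the N-back measures, the sketch below shows the standard signal detection definition, d' = z(hit rate) − z(false alarm rate). The source does not state how extreme rates (0 or 1) were handled; the log-linear correction used here (adding 0.5 to each cell) is an assumption, and the function name and counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Compute d' = z(hit rate) - z(false alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    rates away from 0 and 1, where the z-transform is undefined. This
    correction is an illustrative assumption, not taken from the study.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical counts: 18 of 20 targets detected, 2 false alarms
    # across 20 nontarget trials.
    print(round(d_prime(18, 2, 2, 18), 2))
```

Higher values indicate better discrimination between matching and nonmatching stimuli, and a value near zero indicates chance-level responding.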
Reading
For descriptive purposes, we report scores in three areas of reading: decoding and spelling, fluency, and comprehension. All reading measures were administered in fall (pretest) and spring (posttest). For Decoding and spelling, the WJ-III Letter–Word Identification and Spelling subtests (Woodcock et al., 2001) were used to measure word reading accuracy and spelling. Test–retest reliability coefficients for children aged between 8 and 13 years range from .84 to .85. For Fluency, the Test of Word Reading Efficiency (TOWRE; Torgesen, Wagner, & Rashotte, 1998) Sight Word Efficiency subtest was used to measure word list fluency with real words and pseudowords. Alternate forms reliability for this instrument exceeds .90 (Torgesen, Wagner, & Rashotte, 1998). For Comprehension, the GMRT (MacGinitie et al., 2000) was administered in a timed, group setting. Alternate form reliability ranges from .80 to .87. The Passage Comprehension subtest from the WJ-III (Woodcock et al., 2001) is a cloze-type assessment in which students read a passage and fill in missing words. Test–retest reliabilities for children aged 8 to 13 years range from .76 to .86.
Intervention
A full description of the intervention can be found in Vaughn et al. (2016). The standardized, researcher-delivered intervention met 5 times per week for approximately 16 weeks from November to April. Each session lasted approximately 30 min and was conducted with small groups of four to five students. The intervention consisted of several 2-week thematic units of 10 lessons each that ended with a Maze progress monitoring activity during the 10th lesson of the unit. During each session, students engaged in three activities: (a) word and concept building through the development of vocabulary knowledge (3–10 min), (b) text-based reading of narrative or expository passages (15–20 min), and (c) word study using decoding strategies (6–10 min).
Vocabulary
Students received vocabulary instruction for 3 to 10 min during nine out of each 10 sessions in a unit (six explicitly taught vocabulary words) that pertained to the key social studies concepts presented in their readings. Words were taught using explicit instruction procedures, a sequence that included (a) “kid friendly” definitions, (b) visual representations, (c) synonyms, and (d) application of the word to personal experiences, and turn and talk procedures to utilize new words. Practice was distributed across the 10-day unit, and vocabulary learning was assessed utilizing a Maze activity to check for word understanding and inform instruction.
Text-based reading
During the text-based reading portion of each unit, students read two types of texts: (a) stretch texts and (b) fluency texts. Stretch texts included grade-level content, whereas the fluency texts contained content at a student’s reading level. Stretch-text instruction included extended text reading and text-based questions that prompted students to integrate different sections of the text. During fluency-text instruction, students read passages from the QuickReads program (Hiebert, 2003). The instructional sequence included three steps: (a) text preview and questions, (b) an individual silent read, and (c) a summary activity, similar to stretch-text reading. To build fluency, students then engaged in repeated readings of the same text with an identified fluency focus (i.e., improved rate, improved accuracy, or better expression).
Word study
Students practiced phonics skills through patterned word reading with words, phrases, and sentences. The word lists included multisyllabic words, high-frequency words, and word patterns, depending on the abilities of the students.
Intervention implementation
A team of 19 tutors who were hired and trained by the research team implemented the intervention. All tutors received 10 hr of training on the implementation of the intervention as well as effective behavior management and engagement strategies. During the 16-week intervention, the tutors participated in 8 hr of continuing professional development, attended biweekly staff development meetings, and received coaching once every 2 to 3 weeks to maintain fidelity of implementation. The average total amount of instruction delivered to students at all sites was 42.2 hr (SD = 7.1 hr) over the course of the intervention.
Intervention fidelity
Intervention implementation fidelity was evaluated using a 4-point Likert-type rating scale for each component of the intervention. Global quality and fidelity scores were also rated on a 4-point scale. Trained members of the research team, who had achieved interrater reliability >90%, completed all fidelity observations on a randomly selected sample of audio recordings of taught lessons. Across all components and tutors, the researcher-provided intervention received a mean implementation score of 3.71 (SD = 0.24, range = 3.45–4.00), a mean global quality score of 3.71 (SD = 0.50, range = 2.00–4.00), and a mean global fidelity score of 3.48 (SD = 0.55, range = 2.00–3.82). Additional information pertaining to the intervention implementation and fidelity can be found in Vaughn et al. (2016).
Data Analysis
We assessed the pattern of EF performance across the one common and five specific EF factors in the three reader groups using a split-plot design. We followed procedures outlined by Huberty and Olejnik (2006) for a descriptive discriminant analysis that allows for the interpretation of the contribution of different dependent variables to group discrimination. The advantage of this analysis plan is that it allows for simultaneous analysis of all variables and addresses the relative influence of the grouping variable (adequate responder, inadequate responder, and typical) on the outcome variables (the six EF factors). The design addresses two issues: (a) group differences across the set of EF factors and (b) pattern or profile of group differences across the set of EF factors.
The analysis plan encompassed three steps. First, we evaluated the group-by-task interaction in the omnibus analysis to determine whether the effect of the grouping variable (responder status) was consistent across the set of dependent variables. In the presence of a statistically significant interaction, we proceeded with pairwise multivariate comparisons for each group combination (typical vs. adequate responder, typical vs. inadequate responder, and adequate responder vs. inadequate responder) to identify differences between groups. This analysis permits interpretation of specific group differences on the set of dependent variables. To control for potential Type I error, a Bonferroni-adjusted alpha of p < .017 (.05 / 3) was used for all pairwise multivariate comparisons. Each pairwise comparison computes a linear discriminant function, which maximally separates the groups. Following procedures described by Huberty and Olejnik (2006), we report three methods for interpreting the contribution of specific EF factors to the discriminant function: (a) canonical structure correlations, (b) standardized discriminant function coefficients, and (c) univariate contrasts. Univariate significance is evaluated at a Bonferroni-adjusted alpha of p < .008 (.05 / 6). When conducting pairwise comparisons, univariate contrasts parallel the findings of canonical structure correlations. However, there is no statistical test of significance associated with the two multivariate methods for interpreting the contribution of specific variables to the discriminant function.
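For illustration only, the Bonferroni thresholds and a two-group descriptive discriminant analysis of the kind described above can be sketched in Python with NumPy. This is not the analysis code used in the study: the data are synthetic, the group sizes and the mean shift on the first column are arbitrary assumptions, and the names `bonferroni_alpha`, `two_group_dda`, `group_a`, and `group_b` are hypothetical.

```python
import numpy as np


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after a Bonferroni correction."""
    return alpha / n_tests


def two_group_dda(x_a, x_b):
    """Two-group descriptive discriminant analysis.

    Returns raw discriminant weights, standardized discriminant function
    coefficients, and pooled within-group structure correlations.
    """
    n_a, n_b = len(x_a), len(x_b)
    mean_diff = x_a.mean(axis=0) - x_b.mean(axis=0)
    # Pooled within-group covariance matrix.
    s_w = ((n_a - 1) * np.cov(x_a, rowvar=False)
           + (n_b - 1) * np.cov(x_b, rowvar=False)) / (n_a + n_b - 2)
    # Fisher's discriminant direction, scaled so that the pooled
    # within-group variance of the discriminant scores equals 1.
    raw = np.linalg.solve(s_w, mean_diff)
    raw = raw / np.sqrt(raw @ s_w @ raw)
    sd_w = np.sqrt(np.diag(s_w))
    # Standardized coefficients: raw weights times within-group SDs.
    sdfc = raw * sd_w
    # Structure correlations: within-group correlation of each variable
    # with the discriminant scores (score variance is 1 by construction).
    structure = (s_w @ raw) / sd_w
    return raw, sdfc, structure


# Synthetic stand-in for six factor scores in two groups (assumed values).
rng = np.random.default_rng(0)
shift = np.array([1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
group_a = rng.normal(size=(60, 6)) + shift
group_b = rng.normal(size=(80, 6))

alpha_multivariate = bonferroni_alpha(0.05, 3)  # three pairwise comparisons
alpha_univariate = bonferroni_alpha(0.05, 6)    # six univariate contrasts
raw, sdfc, structure = two_group_dda(group_a, group_b)

print(f"alpha (multivariate) = {alpha_multivariate:.3f}")
print(f"alpha (univariate)   = {alpha_univariate:.3f}")
for j, (c, r) in enumerate(zip(sdfc, structure), start=1):
    print(f"factor {j}: SDFC = {c: .2f}, structure r = {r: .2f}")
```

In the two-group case, each structure correlation is proportional to that variable's mean difference divided by its within-group standard deviation, which is why univariate contrasts parallel the structure correlations.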
To address the value-added contribution of EF for response, we followed procedures outlined by Stanovich and Siegel (1994) and used in later studies evaluating the cognitive attributes of adequate and inadequate responders to intensive interventions in early elementary (Fletcher et al., 2011) and middle school (Miciak et al., 2014). Following these examples, we developed six regression models, one predicting each of the EF factors included in this study. The three predictor variables comprise the two reading comprehension measures (WJ-III Passage Comprehension and GMRT) at pretest and a contrast reflecting adequate or inadequate responder status. The contrast determines whether there is unique variance associated with the relation between performance on the EF variable and responder status beyond the variance explained by performance on the reading measures at pretest. A statistically significant contrast for responder status would suggest that the continuum of severity hypothesis is insufficient to explain intervention responsiveness among this sample.
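One way to write the model described above is given below. The notation is ours and is assumed for illustration: i indexes students, j indexes the six EF factors, PC and GMRT are the two pretest reading comprehension scores, and R is the responder-status contrast.

```latex
\[
\mathrm{EF}_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{PC}_{i} + \beta_{2j}\,\mathrm{GMRT}_{i}
  + \beta_{3j}\,R_{i} + \varepsilon_{ij}, \qquad j = 1, \dots, 6 .
\]
```

Under this formulation, the test of interest in each model is whether the coefficient on the responder contrast differs from zero; a nonsignificant coefficient is consistent with the continuum of severity hypothesis.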
Results
Means and standard deviations on reading measures and on EF measures are provided in Table 2. A comparison of the z-score profiles for each group is presented in Figure 1. A split-plot design comparing the performance of the three groups on all six EF factors showed a significant group-by-task interaction, F(10, 590) = 11.31, p < .0001, η2 = .04. To investigate this interaction, we performed three pairwise multivariate comparisons investigating main effects and interaction terms.
Table 2. Descriptive Statistics for the Intervention and Typical Sample at Pretest and Posttest.
Note. WJ-III = Woodcock–Johnson Third Edition; TOWRE SWE= Test of Word Reading Efficiency–Sight Word Efficiency; RC = Reading Comprehension; BRIEF = Behavior Rating Inventory of Executive Function–Teacher.

Figure 1. EF z-score profiles by group.
Discriminant Function
Typical versus adequate responder
The interaction term for the comparison of the typical and adequate responder groups was statistically significant, F(5, 129) = 8.67, p < .001, η2 = .04. Table 3 shows that the common EF factor contributed most to the discriminant function maximally separating groups. The univariate contrast for the common EF factor was statistically significant, F(1, 133) = 44.47, p < .001, η2 = .25. No other univariate contrasts were statistically significant.
Table 3. Canonical Structure Correlations and Standardized Coefficients for EF Dimensions by Group Status.
Note. EF = executive function; SDFC = standardized discriminant function coefficient; WM = working memory; BRIEF = Behavior Rating Inventory of Executive Function–Teacher.
Indicates univariate contrast significant at p < .008.
Typical versus inadequate responder
The interaction term for the comparison of the typical and inadequate responder groups was statistically significant, F(5, 232) = 24.69, p < .001, η2 = .05. Table 3 shows that the common EF factor contributed most to the discriminant function maximally separating groups. The univariate contrast for the common EF factor was statistically significant, F(1, 236) = 45.52, p < .001, η2 = .31, as was the univariate contrast for BRIEF/Metacognition, F(1, 236) = 7.75, p < .008, η2 = .04. No other univariate contrasts were statistically significant.
Adequate responder versus inadequate responder
The interaction term for the comparison of the adequate responder and inadequate responder groups was not significant, F(5, 225) = 1.56, p > .05, η2 = .00. The main effect was significant, F(5, 225) = 33.16, p < .001, η2 = .08. Table 3 shows that the common EF factor contributed most to the discriminant function maximally separating groups. However, no univariate contrast was statistically significant.
Continuum of Severity
We created regression models predicting each of the six EF factors from this study. Each regression equation included three predictor variables: pretest performance on the two criterion measures used to identify responder status (WJ-III Passage Comprehension and GMRT) and a dummy-coded grouping variable (1 for adequate responders and 0 for inadequate responders). The grouping variable was established to investigate whether there is unique variance in the EF factor associated with responder status beyond that accounted for by the reading variables. A significant contrast would suggest that reading performance at pretest is insufficient for predicting intervention responder status and that a continuum of severity hypothesis is insufficient to explain this pattern of results. In all six regression models, the contrast for responder status was not statistically significant, consistent with a continuum of severity hypothesis (β coefficient range = –.05 to .16).
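As a minimal sketch of one such regression, assuming ordinary least squares and using synthetic data rather than the study data, the model can be fit with NumPy as follows. The variable names (`wj_pc`, `gmrt`, `responder`, `ef`), the sample size, and the generating coefficients are hypothetical; the synthetic EF score is built to depend on reading only, mimicking a continuum-of-severity scenario.

```python
import numpy as np


def fit_ols(y, predictors):
    """Ordinary least squares with an intercept.

    Returns coefficients, standard errors, t statistics, and the
    residual degrees of freedom.
    """
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se, dof


# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 200
wj_pc = rng.normal(size=n)                      # pretest Passage Comprehension
gmrt = 0.6 * wj_pc + 0.8 * rng.normal(size=n)   # pretest GMRT
responder = rng.integers(0, 2, size=n)          # 1 = adequate, 0 = inadequate
ef = 0.4 * wj_pc + 0.3 * gmrt + rng.normal(size=n)  # one EF factor score

beta, se, t_stats, dof = fit_ols(ef, [wj_pc, gmrt, responder])

labels = ["intercept", "WJ-III PC", "GMRT", "responder contrast"]
for label, b, s, t in zip(labels, beta, se, t_stats):
    print(f"{label:>18}: b = {b: .3f}, SE = {s:.3f}, t({dof}) = {t: .2f}")
```

The quantity of interest is the t statistic for the responder contrast, which tests whether responder status explains variance in the EF factor beyond the two pretest reading measures. The same fit would be repeated for each of the six EF factors.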
Discussion
This study investigated whether EF (at pretest) is associated with responder status for a sample of fourth graders who are at significant risk for LD and who completed an intensive reading intervention. First, we evaluated to what extent performance on six (one common and five specific) EF factors was associated with group status: typical, adequate responder, and inadequate responder. Second, we evaluated whether initial EF performance explained unique variance in responder status beyond the variance explained in initial reading status. The present study extends previous literature in two ways. First, previous studies of EF have primarily utilized a correlational or cross-sectional design, or have investigated the relations of reading and EF in the context of direct EF training (typically with a focus on working memory). In contrast, the present study investigates relations of EF and reading in the context of an intensive reading intervention, ensuring that all participants have adequate instructional opportunity and permitting an investigation of whether initial EF might serve as a potential intervention adjuvant. Second, previous studies of the cognitive characteristics of inadequate responders have not included robust batteries evaluating EF (Stuebing et al., 2015). Given that inadequate response to evidence-based interventions is an inclusionary criterion for the identification of LD, it is important to better understand the performance of at-risk students who may differ on this dimension.
Differences in EF Attributes by Group
We found few statistically significant differences between the three groups across the six EF factors. As hypothesized, the largest separation occurred between the typical and inadequate responder groups. These differences were statistically significant in two domains: the common EF factor and the BRIEF/Metacognition factor. Correlations with the discriminant function indicated that group separation was driven largely by the common EF factor, with a smaller role for the BRIEF/Metacognition factor. Other correlations with the discriminant function were small and not statistically significant. That the largest separation occurred on the common EF factor is not surprising. In a separate study that also utilized this EF factor structure in the full sample of 846 across Grades 3 through 5 in fall (but not addressing the intervention context), we found that the common EF factor demonstrated strong associations with “pretest” reading skills (Cirino et al., under review), the defining dimension of our group structure.
Given that groups were defined by reading comprehension performance, and given the well-documented unconditional relations between reading comprehension and EF performance (Jacob & Parkinson, 2015), it is surprising that the hypothesized separation between the inadequate responder and adequate responder groups was not statistically significant. Previous analyses of the cognitive attributes of inadequate responders have found statistically significant separation across a number of dimensions, including phonological awareness, verbal knowledge, and rapid naming (Cho et al., 2015; Fletcher et al., 2011; Miciak et al., 2014). In addition, reviews and meta-analyses have implicated several other cognitive dimensions, including attention, problem behavior, nonverbal reasoning, and working memory (Al Otaiba & Fuchs, 2006; Nelson et al., 2003), although it should be noted that these studies were largely conducted with early elementary students. In explanation, we note that the effect size for the group-by-task interaction was of much smaller magnitude than in previous studies (e.g., Cho et al., 2015; Fletcher et al., 2011; Miciak et al., 2014), indicating that responder group status was more weakly associated with EF performance than with previously investigated cognitive dimensions such as phonological awareness, rapid naming, or language. Given the well-documented and robust relations between reading comprehension and language processes (Catts, Adlof, & Weismer, 2006; Catts, Hogan, & Adlof, 2005), weaker associations for EF relative to language were expected. However, the failure to find any statistically significant differences between these two groups was not consistent with previous studies of this type.
The Unique Prediction of EF
Our second analysis investigated whether there was unique variance in responder status related to EF beyond that explained by reading performance at pretest. If EF and responder status were associated, even in a more comprehensive statistical model that included pretest reading performance, it would suggest that EF is uniquely predictive of intervention response. However, a lack of statistically significant, unique variance between EF and responder status beyond reading pretest suggests that observed differences in levels of EF are associated with the severity of reading difficulties, consistent with the continuum of severity hypothesis (Vellutino et al., 2006).
We in fact found no statistically significant associations between responder status and EF across the six contrasts. This result is consistent with previous studies of inadequate responders that have not found statistically significant “value added” for cognitive functioning above and beyond pretest for explaining responder status (Fletcher et al., 2011; Miciak et al., 2014). These results also align with the results of Stuebing et al. (2015) that documented small unique contributions for baseline cognitive characteristics above and beyond initial status in reading. Notably, Stuebing et al. included a wide range of potential predictors, including strong predictors of reading performance such as vocabulary and phonological awareness, but did not include EF as a predictor.
It is important to note that much of the work evaluating predictors and/or moderators of intervention response has focused on reading interventions (as in the present study), and often in early elementary grades. These results may not generalize to other grades and other academic areas (e.g., math). For example, there is intriguing evidence emerging in mathematics research that working memory may moderate response to specific intervention protocols that differ in cognitive input and/or demands (e.g., L. S. Fuchs et al., 2013; L. S. Fuchs et al., 2014; H. L. Swanson, Lussier, & Orosco, 2013). These studies suggest that a more granular understanding of the role of EF and academic achievement may be necessary.
EF, Reading Intervention, and Assessment
To understand the largely null results of the present study, we find it valuable to compare and contrast this study with our reading prediction study (Cirino et al., under review). In that study, we found moderate relations of EF with reading performance, even after controlling for a very robust set of covariates, including language and demographic variables. We consider this strong evidence for a unique role for EF in reading comprehension at this age range. These findings and this assertion may seem incongruent with the findings of the present study, although predicting responder status following an intensive intervention is a fundamentally different question from predicting reading performance at a single time point. The former question is a more difficult, though pragmatic, undertaking, because there is typically less variance associated with changes in reading performance in individuals with reading difficulties than with reading performance across a full range of achievement. Such analyses emanate from our experience in educational practice, in which critical questions are shaped by considerations of who needs intensive intervention and who does not, or who will continue to need interventions of even greater intensity and dosage after a robust intervention protocol, as documented in this study. In addition, analyses of contributors to intervention response may point to promising avenues to improve intervention outcomes.
The results of the present study also have practical implications for the routine assessment of EF as part of typical school practice and for the identification of LD and/or attention problems. The present findings, particularly the findings of the second analysis which found little value added for measures of EF, provide little support for the necessity of assessing EF as a potential predictor of RTI in reading. In other words, the assessment of EF does not provide important information that might be used to preselect those with low EF for more intensive or differential types of intervention. In other forums, we have argued against the necessity of cognitive assessment for the identification of LD in reading for similar reasons (Fletcher & Miciak, 2017). The results from this study do little to dissuade us from such assertions related to EF and reading comprehension. However, we do not interpret the results of this study as demonstrating that EF plays no role in reading comprehension nor would we discourage future studies that attempt to leverage EF and EF-related skills to improve academic outcomes (e.g., Cirino et al., 2016; Peng & Fuchs, 2017). Additional work is needed in this area to determine whether and how best to incorporate EF instruction or academic instruction that explicitly incorporates procedures that scaffold for low EF, while maintaining an appropriate focus on the core instructional needs of struggling readers (i.e., explicit reading instruction).
Limitations
There are several potential explanations for the lack of statistically significant differences observed across both analyses. First, it is possible that the reading intervention incorporated in the present study was not ideally suited to leverage differences in EF at pretest to improve intervention response. This study did not evaluate response to an intervention that specifically attempted to build EF skills in the context of reading intervention. Although it included elements of self-regulation, including goal-setting and self-monitoring, these components did not constitute a large portion of instructional time, which was focused on building reading skills in core reading areas (e.g., decoding, fluency, and comprehension). Future studies should investigate whether an intervention that includes additional time instructing reading-related aspects of EF within the context of reading intervention might demonstrate more robust findings. In addition, there are limitations related to the sample utilized in this study. The sample sizes for the adequate responder group and students identified as typical readers are relatively small. It is possible that with larger samples, more statistically significant differences would be observed.
Conclusion
We investigated whether performance on EF at pretest was predictive of responder status following an intensive reading intervention. At posttest, participants were classified into three groups (inadequate responders, adequate responders, and typical students) based on performance on two measures of reading comprehension. We then compared group performance on EF according to two analytic plans. Results indicated that students identified as typical readers significantly outperformed inadequate and adequate responders on a common EF factor, and significantly outperformed inadequate responders on the BRIEF/Metacognition factor, a teacher report measure. All other comparisons were not significant. Additional contrasts investigating whether EF predicted responder status above and beyond initial reading status were null, suggesting that observed mean differences in EF by group may reflect differences in reading ability. These results suggest that EF may not be a robust predictor of intervention response in reading comprehension. Thus, routine assessment of EF in school practice may offer limited information for predicting a student’s likely response to intensive reading interventions.
Footnotes
Authors’ Note
The content is solely the responsibility of the authors and does not necessarily represent the official views of the Eunice Kennedy Shriver National Institute of Child Health and Human Development or the National Institutes of Health.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Award Number P50 HD052117, Texas Center for Learning Disabilities, from the Eunice Kennedy Shriver National Institute of Child Health and Human Development to the University of Houston.
