Abstract
The heart of effective programming for gifted students lies in the integration of advanced curricula with effective instructional strategies to develop learning activities that will enhance student learning outcomes. However, empirical evidence of the effectiveness of units based on such curricular and instructional interventions from large-scale experimental studies in multiple settings is limited. To document the effectiveness of units that integrated the principles from curricular and instructional models in the field of gifted education, two language arts units for gifted third graders were developed and tested in a randomized cluster design. Multilevel analyses of data collected from more than 200 classrooms document statistically significant differences favoring the treatment group over the comparison group on standards-referenced assessments.
There are an estimated 3 million gifted and talented students in U.S. classrooms, spanning pre-Kindergarten to Grade 12 (National Association for Gifted Children [NAGC] & Council of State Directors of Programs for the Gifted, 2011). While educators have advocated for special services for gifted students (e.g., Colangelo, Assouline, & Gross, 2004; Marshall, 1994; Renzulli & Reis, 1991; Robinson & Moon, 2003), this group of high-ability students is often not challenged at levels reflecting their current performance or their capabilities. In an era of school accountability where political and funding priorities focus more on bridging the achievement gaps, modifications of curriculum or instructional practices that respond to the level of performance of gifted students are oftentimes not considered. Furthermore, the lack of a large body of empirical evidence of the effects of quality curriculum and differentiated instruction is cited as a key shortcoming in serving this group of high-potential, high-achieving students (e.g., Callahan, 1996; VanTassel-Baska, Zuo, Avery, & Little, 2002). Examination of outcomes across very diverse programs (no distinction by type of program, length of program involvement, intensity of program, or curriculum) did not indicate an effect of gifted programming (Adelson, McCoach, & Gavin, 2012), and enrollment in specific programming options including magnet options did not produce higher achievement outcomes in either regression discontinuity analysis for students on the margin or analyses of groups formed by a lottery except in science (Bui, Craig, & Imberman, 2011). Neither study examined or verified whether the program options offered unique curricula. While these findings call into question the effectiveness of programming options, they also suggest the importance of studying curriculum in conjunction with grouping options.
The heart of effective programming for gifted education services lies in the development and the implementation of curricula and instructional strategies that will challenge and enhance learning outcomes for these students. While the field abounds with models that provide frameworks for curriculum and suggested instructional practice for gifted students (e.g., Maker, 2001; Renzulli, 1977; Renzulli & Reis, 1985, 1997; Tomlinson, 2001; VanTassel-Baska, 1986), empirical evidence related to the effectiveness of these models is still evolving (VanTassel-Baska, Bass, Reis, Poland, & Avery, 1998; VanTassel-Baska & Brown, 2007), and large-scale studies of interventions with gifted students are limited. Challenges to documenting effectiveness of curricular and/or instructional modification for teaching this group of students include (a) difficulty with establishing effective outcome measures (Hunsaker, Nielsen, & Bartlett, 2010), (b) complexities in determining the extent to which these models are responsible for observable and measurable outcomes using experimental paradigms (Sanchez et al., 2007), and (c) lack of data on fidelity of implementation (O’Donnell, 2008).
To gather further data on the effectiveness of model-based units on student learning in classrooms for gifted students, critical and compatible components of three highly regarded models reflecting suggested curricular and instructional modifications appropriate for gifted students were integrated into a single framework and two language arts units for third-grade students were developed. Key compatible elements from Tomlinson’s (2001) differentiated instruction model, Renzulli and Reis’s (1985) schoolwide enrichment model (SEM), Kaplan’s (2005) depth and complexity model, and empirically documented instructional strategies formed the basis for the integrated model called CLEAR.
Review of the Literature
Educators have voiced concerns about the lack of empirically tested differentiated curricula and instruction for gifted students (Passow, 1986; VanTassel-Baska et al., 1998, 2002). Researchers (e.g., Reis & Purcell, 1993; Reis et al., 1993; Yang & Siegle, 2006) have documented that approximately 40% to 50% of traditional classroom content and skill instruction at a given grade level is redundant for gifted students; yet gifted students spend up to 80% of their time in classrooms working on the same content, knowledge, and skills as all other students, resulting in a lost opportunity to learn. A national survey of third- and fourth-grade teachers (Archambault et al., 1993) also found a lack of learning opportunities for gifted learners. Archambault et al. (1993) documented that teachers make only minor modifications in the regular curriculum for gifted students leaving them largely unchallenged in achieving their full potential.
In response to concerns regarding lack of challenging curriculum and instructional opportunities for gifted students, guiding principles for curriculum and instruction for these students have been provided by national organizations (NAGC, 2010; Purcell, Burns, Tomlinson, Imbeau, & Martin, 2002) and have been continuously echoed by experts in gifted education (e.g., Kaplan, 1986; Reis & Purcell, 1993; Renzulli & Reis, 1994). These principles articulate standards for curriculum planning and instruction and specify elements that should be included to ensure optimal learning for gifted students. While some studies utilize these guiding principles in developing and evaluating units of study (e.g., Gavin, Casa, Adelson, Carroll, & Sheffield, 2009; Little, Feng, & VanTassel-Baska, 2007; VanTassel-Baska & Brown, 2007), further study of how to translate these guiding principles effectively into practice with diverse learners in various contexts is still needed.
Curricular and Instructional Models
Adjustments to classroom practice to respond to high-ability students’ current level of knowledge, understanding, and skill development as well as their ability to learn more rapidly are reflected in models specifically designed for curriculum development and instructional modifications, such as depth and complexity (Kaplan, 2005), differentiated instruction (Tomlinson, 2001), and the schoolwide enrichment model (Renzulli & Reis, 1985). While there are different degrees of emphasis on curriculum and pedagogy in these models, each model articulates guidelines for both curricular and instructional modifications. The curricular modifications in these models specify guidelines for increased challenge through choice of content and skills to be offered to gifted students such as increased depth and complexity of ideas presented, greater abstractness of content, greater ambiguity (in the positive sense of seeing multiple points of view), more open-ended problem solving, inclusion of more complex and abstract concepts, addition of critical thinking skills beyond grade level, and use of more sophisticated and advanced resource materials. Instructional modifications include the creation of activities that require more independence in completion of tasks, less basic detail in presentation of content, assumptions of greater student ability to make connections, greater choices in product and paths to production, and accelerated pace of instruction. Common themes across curricular and instructional models targeting the education of gifted students include a focus on more complex concepts and principles within and across disciplines of study (advanced for grade level), stress on advanced processing skills, interdisciplinary thinking, and modification of content allowing advanced learners greater depth of learning.
Differentiated Instruction Model
Differentiated instruction (Tomlinson, 2001) has been widely described as a means of meeting the diverse learning needs of students. Although the model positions instruction as the key element in a classroom, it is grounded in modifying three key elements of curriculum—content, process, and product—based on variation across student characteristics of readiness, interest, and learning profile. As such, the differentiated instruction model incorporates a variety of both curricular and instructional arrangements to meet the diverse needs of learners. Those arrangements include focusing on “big ideas” and concepts in a discipline; matching the pace of learning, degree of challenge, and interests of students to instructional tasks; and allowing students to create individual products that reflect their learning (Tomlinson, 2001). Theoretical support for the model comes from learning and development theories that propose (a) optimal learning is achieved when learners are exposed to tasks slightly above their current level of performance (e.g., Csikszentmihalyi, 1990; Csikszentmihalyi, Rathunde, & Whalen, 1993; Howard, 1994; Jensen, 1998; Sousa & Tomlinson, 2010; Vygotsky, 1962) and (b) learning should focus on understanding of key knowledge and principles of the field of study rather than rote memorization of information (e.g., Donovan, Bransford, & Pellegrino, 1999; Wenglinsky, 2002). While there is considerable research on the individual aspects of the differentiated instruction model (e.g., Reis et al., 2003, 2005; Reis & Fogarty, 2006), only recently have researchers examined outcomes when multiple elements in the model are applied as a whole. Such studies (e.g., Geisler, Hessler, Gardner, & Lovelace, 2009; Marulanda, Giraldo, & Lopez, 2006; Rasmussen, 2006) have documented positive effects of the model in student achievement and higher level thinking skills with diverse learners in a variety of school settings but not in randomized control group settings. 
Other studies on implementing schoolwide differentiation (Beecher & Sweeney, 2008; Burris & Garrity, 2008; Tomlinson, Brimijoin, & Narvaez, 2008) also report positive student achievement gains across subject areas, grades, and performance levels, but also not in randomized control group settings.
Depth and Complexity Model
In line with the differentiated instruction model’s (Tomlinson, 2001) focus on big ideas in a discipline, the depth and complexity model (Kaplan, 2005) emphasizes a disciplinary approach in developing curriculum by combining the elements of depth and complexity of content in a discipline. The model employs standards-based curriculum as the foundation to promote academic rigor and develops understanding by integrating elements of depth (details, patterns, rules, big ideas, unanswered questions, and ethical issues in the discipline) and complexity (multiple perspectives, interdisciplinary connections, and changes over time). The dimension of depth focuses the teacher’s and student’s attention on increasingly more challenging, divergent, and abstract qualities of knowing in an area of study, while complexity can be defined as the means by which knowledge is extended or broadened. The dimension of complexity affords the teacher and the students opportunities to identify associations such as connections and relationships that exist within, between, and among areas of study. In a differentiated curriculum, the dimensions of depth and complexity serve as prompts to form inquiries students may probe. According to Ausubel and Fitzgerald (1961), a prompt that serves as a distinctive cognitive organizational symbol or structure becomes a mechanism of thought and/or action. Ausubel and Fitzgerald articulate that these cognitive structures facilitate the transfer of prior cognitive structure to new learning, allowing learners to make connections within and across disciplines. A recent study on the effectiveness of the model reported academic growth in social studies and language arts among elementary students at all levels including gifted students (cited in Kaplan, 2013).
Schoolwide Enrichment Model
The SEM has been widely adopted and extensively documented as one of the most popular approaches in gifted programming (VanTassel-Baska & Brown, 2007). Since the model was initially developed as a program approach based on the enrichment triad model (Renzulli, 1977), it encompasses broader modifications ranging from learning environment to curricular and instructional modification for advanced learners. The SEM promotes student engagement through the use of three types of enrichment experiences, which include (a) exposure to extensions of traditional content within the context of its use by real-world professionals, (b) process skills that are applied to solving real problems, and (c) investigations and/or creation of products that reflect in-depth investigations into solving real problems in areas of student interest and ability. These products are to reflect appeal to authentic audiences—authentic to the profession of the investigation.
Curriculum modifications within the model provide for a broader scope of topics introduced to the gifted students than they would encounter in the general education curriculum, greater depth of learning and development of more sophisticated problem-solving and investigative skill than what is typically specified on grade level, and production of products beyond what would be expected in general education curricular frameworks (Reis & Renzulli, 2003; Renzulli & Reis, 2010). The approaches emphasized in the model are grounded in learning theories (e.g., Renzulli, 1977; Renzulli & Reis, 1997; Sternberg & Davison, 2005) that advocate meaningful learning within the context of a real and present problem. As such, the curricular and instructional modifications emphasized in SEM share a common ground with problem-based learning (PBL) in that both approaches (a) are student centered, encouraging students to study topics of their interest and to determine how they want to study them, and (b) employ open-ended problems that serve as the initial stimulus and framework for learning. The student-centered approach helps students develop self-directed lifelong learning skills with intrinsic motivation to learn. In addition, the open-ended nature of the product cultivates a variety of modes for student assessment.
Wiggins’s (1998) standards for authentic assessment capture this concept of open-ended nature of products and assessment encouraged in both SEM and PBL: The task or tasks replicate the ways in which a person’s knowledge and abilities are tested in real-world situations; the student has to use knowledge and skills wisely and effectively to solve unstructured problems, such as when a plan must be designed, and the solution involves more than following a set routine or procedures or plugging in knowledge; instead of reciting, restating, or replicating through demonstration what he or she is already known, the student has to carry out exploration and work within the discipline. (p. 22)
The effectiveness of SEM has been studied through field testing in relation to (a) student creative productivity (e.g., Burns, 1987; Delcourt, 1988; Newman, 1991; Reis, 1981; Starko, 1986), (b) student personal and social development (e.g., Olenchak, 1991; Skaught, 1987), and (c) student self-efficacy (e.g., Schack, Starko, & Burns, 1991; Starko, 1986; Stednitz, 1985). While SEM’s applicability in serving high-ability students in a variety of educational settings has been commended by educators, methodological limitations such as the lack of experimental designs and small sample sizes in some of the studies have made it difficult to attribute results to implementation of the SEM (VanTassel-Baska & Brown, 2007). Reis and her colleagues (Reis et al., 2003, 2005; Reis & Fogarty, 2006; Reis, McCoach, Little, Muller, & Kaniskan, 2011) used a cluster-randomized design to investigate SEM as a curricular framework in a reading intervention for elementary students of all ability levels including advanced learners. They reported higher scores in reading achievement, reading fluency, and students’ attitudes toward reading for the experimental group.
Taken together, the review of the related literature illustrates the need for investigating the extent to which model-based units of study are accountable for observable and measurable outcomes in classrooms for gifted students using an experimental paradigm. The current study investigated effectiveness of an integrated model by assessing student outcomes from two language arts units for third-grade gifted students. The purpose of the study was to document the effectiveness of the CLEAR model, specifically: Do gifted learners exposed to integrated model-based units of study outperform equally able learners in a comparison group who engage in traditional learning activities based on the same outcome standards on standards-referenced posttests when controlling for prior achievement?
Description of the CLEAR Model and Units
The CLEAR model was developed as a framework for curricular and instructional modifications for gifted students based on the critical components from Tomlinson’s (2001) differentiated instruction model, Renzulli and Reis’s (1985) SEM, and Kaplan’s (2005) depth and complexity model. The CLEAR model also integrates the components from these models with five foundational elements as the theoretical and philosophical underpinnings for curricular development. The five elements are:
The CLEAR model units are designed around learning goals that are meaningful, important, and clear. The meaningfulness and importance of the learning goals are derived from their alignment with national standards; the goals also reflect the essential knowledge, skills, and principles central to the field of study. In particular the language arts objectives for elementary schools as specified in state standards from across the United States were examined for commonalities and accommodated into the learning goals for the units. While the Common Core State Standards (CCSS; National Governors Association Center for Best Practices & Council of Chief State School Officers, 2010) had not been developed or disseminated at that time, examination of the units revealed close alignment with the CCSS. 1
Underlying the CLEAR model units are the assumptions that even advanced learners (a) vary in their readiness levels, interests, and preferred learning profile and (b) learn best and most efficiently when their differences are accommodated. As such, learning experiences within the units based on the CLEAR model are differentiated to meet needs of diverse learners. Another underlying assumption of the CLEAR model is that learning is made most meaningful when learners (a) develop the knowledge, understanding, and skills needed by professionals in a field of study and (b) apply the knowledge, understanding, and skills they have acquired in real-world and relevant contexts. The units based on the CLEAR model allow teachers to provide instruction that guides students in developing and carrying out projects on topics of their own choosing using the methods and tools of professionals in a field of study.
In addition, the CLEAR model units are designed to lead students beyond mere factual knowledge to deep understandings of essential knowledge, principles (big ideas), and skills of a discipline. High-level challenges are built into the units by having students identify and apply the sophisticated and advanced vocabulary and language of the discipline; investigate the patterns, rules, varied perspectives, unanswered questions, and ethical issues within a unit of study; and make connections within and across disciplines.
Finally, the units include content differentiation, which reflects more advanced, complex, and abstract concepts, not only in terms of the learning goals of third-grade language arts but also through the introduction of concepts more normally encountered in fourth- and fifth-grade language arts. For example, in the poetry unit described in the following, the students read “The Fish,” a poem that is more advanced than third-grade students would be likely to read, and are asked to determine the central message and explain how it is conveyed through the details in the poem, distinguish literal from nonliteral language, refer to parts of the poem and describe how each successive part builds on earlier parts of the poem, and explain how the poem conveys mood (all aspects of the third-grade CCSS for literature but applied to more advanced content). In addition, students are asked to determine the theme from details in the poem, determine the meaning of words and phrases as used in the poem, explain major differences between poems and refer to the structural elements of a poem (e.g., verse, rhythm, meter), compare and contrast the point of view reflected in poetry (fourth-grade CCSS), quote accurately from a text when explaining what the text says explicitly and when drawing inferences from the text, determine the theme of a poem from details, determine figurative language such as metaphors and similes, explain how a series of stanzas provide the overall structure of a poem, and analyze how visual elements of a poem contribute to the beauty of text (fifth-grade CCSS). 2
The two language arts units based on the CLEAR model were The Magic of Everyday Things, which focuses on poetry, and Exploration and Communication, which focuses on expository, nonfiction text, and research skills (referred to as the “poetry” and “research” units, respectively). The learning goals in the units are aligned with NAGC, National Council of Teachers of English (NCTE), and International Reading Association (IRA) standards, and they reflect the CCSS in English Language Arts. As such, the units reflect the essential knowledge, skills, and principles central to reading, writing, and communicating. Because the most prevalent service option for gifted learners is in pullout programs with instructional blocks of 45 to 60 minutes once or twice a week (Callahan, Moon, & Oh, 2013, 2014; NAGC, 2011), the units were designed so the whole unit could be completed during the fall/winter or winter/spring sessions in a typical pullout classroom for gifted students, including time for pre-assessments, formative assessments, and the creation of an authentic product for each unit. However, the structure was such that the units could also be used daily in a self-contained classroom for gifted students or with a frequency deemed appropriate by teachers in those settings.
Poetry Unit: The Magic of Everyday Things
The poetry unit is designed to increase students’ knowledge and understanding of different forms of poetry as the students expand their comprehension and writing skills. The title reflects one of the big ideas of poetry: Poetry helps readers to see the extraordinary in ordinary experiences. Within the poetry unit students are expected to achieve objectives that reflect (a) essential knowledge about poetry and literature such as literary devices and figurative language, (b) understanding of relevant key principles about poetry and literature, and (c) the development and demonstration of writing and reading skills relevant to poetry and literature. The unit also gives students multiple opportunities to learn ways to connect with the bigger, more abstract ideas expressed in poetry and to enhance their word knowledge, reading comprehension strategies, and writing skills in line with state and national standards.
At the beginning of the unit, students are introduced to appreciation activities in which they listen to poems and share how words draw and paint mental images. This work in imagery leads students to the writing process where they deconstruct and demystify poems through exploration of different forms and identification of distinct literary devices inherent in poems. This writing process is guided by writing prompts and is used with a variety of poems relevant to students’ lives. Then, students apply knowledge as poets in writing workshops, offering authentic learning experiences for students to write, peer review, revise, and present their poems to an audience.
The unit allows in-depth learning about poetry, as evidenced by authentic assessments. Students become explorers of their own experiences and the experiences of others, as they read and write poems in which concrete details reveal larger, more abstract ideas. As a culminating activity, students produce a poetry anthology, a summative assessment in which they organize and demonstrate their knowledge, skills, and understanding of poetry, as well as the habits of mind necessary for authentic work in poetry. Students’ learning in the unit is continuously assessed through a series of formative and performance assessments consistent with the CLEAR model. The unit guides teachers to use various assessments for making instructional decisions effectively, such as grouping arrangements, instructional modes, pacing adjustments, and organization of extra support or challenge to meet the diverse learning needs of advanced learners.
The unit as a whole exemplifies best practices in developing curriculum and instruction for gifted students in that it effectively translates the recommended principles into the language arts content area. It also provides advanced and conceptually challenging, in-depth, distinctive, and complex learning opportunities for gifted students. The learning activities also address learning standards in the Reading–Literature, Writing, and Language strands within the CCSS English Language Arts standards.
Research Unit: Exploration and Communication
The research unit is designed to help students derive information from, analyze, and evaluate a variety of nonfiction texts and expand their skills in research, writing, and the use of reading comprehension strategies. Using the metaphor of researcher as explorer, students learn how to focus their interests in an area, person, or topic into authentic research projects. Students set out on a “knowledge expedition” by framing initial research questions, followed by identifying, organizing, and evaluating information from different categories of nonfiction texts. Learning goals also include developing an appreciation of multiple perspectives on a topic and developing an understanding of how perspective shapes the way we interpret and share information. Using the concept of researcher as communicator, students learn how to share their findings with a specified audience in clear and meaningful ways of communication through writing and speaking. This learning process involves comprehension of texts and writing for communication emphasized in the Reading–Informational Texts, Writing, and Language strands of the CCSS not only in third grade but also throughout elementary and secondary English language arts.
The activities and assessments are designed to help students develop essential understanding and skills in research that can be applied across a range of disciplines. Additionally, the lessons give students the opportunity to work with ideas that suit their individual interests. Throughout the unit a series of formative assessments guides teachers to assess student progress and tailor instruction for optimal learning experiences. In the culminating assessment, students design and conduct a complete research project, which they share with an audience of students, parents, and teachers at a classroom “Research Gala.” The ultimate learning goal for students is an understanding that research is an organized and systematic strategy for finding answers to important questions. As such, the learning process incorporated in the unit with multiple layers of depth and complexity allows students to be informed consumers and producers of knowledge. As with the poetry unit, the research unit also affords clearly articulated performance criteria for teachers to use in gauging student learning in each assessment.
Method
Research Design and Participants
We randomly assigned classrooms to treatment and comparison conditions using a cluster-randomized experimental design. Teachers and students in either pullout or self-contained classrooms specifically designated for gifted students were recruited through national advertisement at the state and district levels. Interested participants were informed that participation was contingent on compliance with random assignment. After recruitment, these classrooms were randomly assigned to treatment or comparison conditions with students nested within classrooms. In cases where teachers taught multiple classrooms in different schools or where multiple classrooms for gifted students were housed in a single school, those classrooms were assigned to the same condition in order to avoid a possible diffusion of treatment effect. A total of 1,215 students from 71 pullout classrooms and 14 self-contained classrooms in 11 states participated in the first year of the study (Cohort 1, 2009–2010 school year), 1,007 students from 66 pullout classrooms and 16 self-contained classrooms in 14 states took part in the second year of the study (Cohort 2, 2010–2011 school year), and 683 students from 51 pullout classrooms and 5 self-contained classrooms in 19 states participated in the third year of the study (Cohort 3, 2011–2012 school year). 3 In the second and the third year of the study, some teachers continued participation with different groups of third-grade students; those teachers were placed in the treatment condition, resulting in more classrooms in the treatment condition. 4
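The assignment rule described above, in which classrooms that share a teacher or a school always receive the same condition, can be sketched in a few lines of Python. This is an illustration only, not the procedure the research team actually ran: the function name `assign_clusters`, the `cluster_key` labels, the alternating allocation, and the seed are all assumptions introduced here.

```python
import random
from collections import defaultdict


def assign_clusters(classrooms, seed=0):
    """Randomly assign whole clusters of classrooms to a condition.

    classrooms: list of (classroom_id, cluster_key) pairs. cluster_key is a
    hypothetical label shared by classrooms that must stay together
    (for example, the same teacher or the same school).
    """
    # Group classrooms so that each cluster is assigned as a single unit.
    clusters = defaultdict(list)
    for classroom_id, cluster_key in classrooms:
        clusters[cluster_key].append(classroom_id)

    # Shuffle the cluster labels with a seeded generator for reproducibility.
    keys = sorted(clusters)
    rng = random.Random(seed)
    rng.shuffle(keys)

    # Alternate conditions down the shuffled list (an assumed allocation rule).
    assignment = {}
    for position, key in enumerate(keys):
        condition = "treatment" if position % 2 == 0 else "comparison"
        for classroom_id in clusters[key]:
            assignment[classroom_id] = condition
    return assignment


# Hypothetical classrooms: two share a teacher, two share a school.
example = [
    ("room_a", "teacher_1"),
    ("room_b", "teacher_1"),
    ("room_c", "school_9"),
    ("room_d", "school_9"),
    ("room_e", "teacher_2"),
    ("room_f", "teacher_3"),
]
assignment = assign_clusters(example, seed=42)
```

Because randomization happens at the level of the cluster label rather than the classroom, `room_a` and `room_b` can never end up in different conditions, which is the safeguard against diffusion of treatment that the design describes.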
Intervention Procedure
At the beginning of each school year, teachers in the treatment condition were given the two CLEAR model units described earlier to implement. In the first two years of the study, teachers were allowed to decide the order and scheduling of implementation of the units to be completed by the end of the school year. In the third year, teachers in the treatment condition received the poetry unit only due to funding constraints for the study. In addition to the printed units, all teachers in the treatment group were provided supporting materials needed to implement the units. The research team provided treatment group teachers with a disk that included an oral presentation of the rationale for the units, an explanation of how the units were developed, instructions for implementation, and a complete copy of materials. The research staff also offered voluntary webinars to the treatment group teachers through which teachers were informed about core elements of the CLEAR model, scope and sequence of the units, and explanations of strategies illustrated in the units. In addition, when teachers in the treatment group had questions or needed extra resources in the implementation process, the research team was available to provide continuous support and training via phone calls, emails, and the password-protected online resource center. The print and electronic materials were designed to be self-explanatory, and only a small number of teachers participated in the webinars. Questions to the research staff focused primarily on issues such as how to find suggested supplemental materials or organizational issues related to the study.
As fidelity measures provide a critical link between intervention and outcomes attributable to the intervention’s effectiveness (Sanchez et al., 2007), the research team utilized on-site classroom observations and teacher interviews in order to monitor teachers’ implementation of the units. Observations were recorded on an observation guide, a semi-structured tool developed to parallel the critical components of the units. Observers used the tool to indicate whether each critical component was implemented, modified, or omitted during the observation. Following each observation, the observer conducted an interview with the teacher. The semi-structured interview protocol included open-ended questions to obtain information about the teacher’s decision-making process relating to the particular lesson and unit implementation, classroom environment, the teacher’s experience or knowledge of the content of the unit and of the students, any rationale for modifications or omissions, and the factors that influenced the teacher’s decision to make modifications. Teachers were also asked to report fidelity to the implementation process through a teacher log developed by the research team. The teacher log, given to teachers to report on how they implemented each lesson, mirrored the observation guide. The checklist format of the log was selected with the assumption that a teacher would be more likely to fill out the log and return it at the end of the unit implementation if the reporting were simple and easy. Additionally, the log included an area where teachers could provide descriptions of any modifications, omissions, or additions and the rationale behind the adaptation.
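The three-way coding used on the observation guide (implemented, modified, or omitted) lends itself to a simple tally. The sketch below is purely illustrative: the component names, the function `summarize_fidelity`, and the idea of reporting proportions are assumptions made here, not features of the study's actual instrument or scoring.

```python
from collections import Counter

# The three status codes named in the observation guide description.
STATUSES = ("implemented", "modified", "omitted")


def summarize_fidelity(observation):
    """Return the share of critical components recorded under each status.

    observation: dict mapping a component name (hypothetical labels here)
    to one of the three status codes.
    """
    if not observation:
        raise ValueError("observation has no coded components")
    counts = Counter(observation.values())
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status codes: {sorted(unknown)}")
    total = len(observation)
    return {status: counts[status] / total for status in STATUSES}


# Hypothetical record for one observed lesson.
observation = {
    "clear_learning_goals": "implemented",
    "formative_assessment": "implemented",
    "differentiated_tasks": "modified",
    "authentic_product": "omitted",
}
summary = summarize_fidelity(observation)
```

A teacher log that mirrors the observation guide could be summarized with the same function, which would make observer and self-report records directly comparable.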
While teachers in the treatment condition implemented the units, teachers in the comparison condition proceeded with “business as usual.” The research team also observed and interviewed teachers in the comparison classrooms in order to identify the presence or absence of the critical components of the CLEAR model units that distinguish the intervention from the instruction in the comparison classrooms.
Data Sources
Iowa Tests of Basic Skills Survey Battery Reading Subtest
All students in both treatment and comparison classrooms were pre-assessed using the Iowa Tests of Basic Skills (ITBS) Survey Battery Reading Subtest, Level 9, Form A (Hoover et al., 2003). For Level 9, the test consists of distinct fiction and nonfiction passages that represent various types of text students read in and out of school. The passages come from previously published material and information sources including narrative, poetry, folklore, and topics in science and social studies.
To provide validity evidence supporting the content of the ITBS, the test developers offer evidence that it was developed to correspond with common goals of instruction across schools in the nation. The process used to design the ITBS, including curriculum reviews, preliminary and national item tryouts, fairness reviews, and form design within and across levels, reflects national standards for test design (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999; Lane, 2007). The publisher reports reliability estimates of .91 for the reading subtest in the general population.
The ITBS was used as a covariate in the study design to increase the precision of comparisons between the treatment and comparison groups by accounting for variation on an important pretreatment variable, student achievement. While all students had been designated as “gifted,” each school system in the study used different identification procedures and criteria. Thus, it was important to account for potential pretreatment achievement differences.
Posttests
The learning outcomes in each unit warranted a focus on assessments that required students to analyze, synthesize, and make judgments at a level of depth and complexity that existing “off-the-shelf” assessments (e.g., NWEA MAP, NAEP, or ITBS) are limited in their ability to measure. Hence, to assess the outcomes of the units, two standards-referenced posttests, each with 35 items, were developed. A multidisciplinary team composed of six researchers with combined expertise in the areas of English education, gifted education, assessment, elementary education, special education, curriculum and instruction, and reading participated in the development process. As in the unit development process, authors of the tests based the items on representative learning standards from professional organizations (e.g., NCTE and IRA) and common third-grade English Language Arts standards across different states to ensure that students at the third-grade level, irrespective of treatment condition, would have had the opportunity for exposure to the content. As such, the content standards covered on the unit posttests are not likely to favor students in the treatment condition in measuring essential knowledge, skills, and principles central to reading, writing, and communicating. In addition, tables of specifications for each unit were constructed to ensure a maximum degree of alignment between standards reflected in the unit and items on the posttests. The research team also determined the relative emphasis on particular concepts and principles as well as skills addressed in the units so that each would have approximately the same weight on the postassessment. Once team members created questions for each unit, the questions were evaluated for developmental and reading level appropriateness, difficulty, length, clarity, and the degree to which the questions were aligned with the standards reflected in the units.
A seventh member, a gifted education coordinator in one of the research sites who holds a doctoral degree in gifted education with extensive experience in teaching elementary gifted students, provided feedback in the review process as well.
Questions on the posttests were eliminated during the development process if it was concluded that the item was too difficult, beyond the developmental and/or reading level of most third-grade students, too lengthy, or otherwise not an appropriate reflection of the standards. Questions that referred to specific material (e.g., poems) contained in the curriculum unit were also eliminated or revised so that students in the treatment group would not be advantaged. Questions were sent to five third-grade teachers who provided additional review and critique. Based on the expert knowledge and feedback from teacher reviewers, the questions were finalized for the postassessment for each unit. After the postassessments for the units were created, they were piloted in four treatment classrooms in different states. Questions deemed to be too difficult or to lack clarity based on the difficulty and discrimination indices of the question were revised.
In addition to the deliberate development process, external reviewers who possessed advanced degrees in gifted education with more than 10 years of teaching experience in programs for gifted students were invited to review the assessments to ensure that the assessments represented a valid translation of the identified standards, depth and complexity of the content, and the level of thinking processes reflected in the unit. Content validation from reviewers indicated that the posttests demonstrated adequate alignment with identified standards and guidelines for best practices in gifted education. The reviewers also indicated that the posttests were designed in a way that is appropriate for third graders’ attention spans and that they were not likely to be biased in regard to student ethnicity, socioeconomic status, or disability.
In the second year, item statistics from the first-year data were scrutinized in order to ensure adequate reliability and an appropriate level of difficulty for each item on both posttests. Two items in the research posttest that were found to be too difficult or to discriminate poorly were revised accordingly.
School Profile Form
Upon signing up for participation in the study, teachers and central office personnel were asked to complete a school profile form. The respondents were asked to provide information on gifted education services including identification of students for such services (i.e., grade level at which the system identifies students as gifted, instruments used in the identification process, and criteria for selection) and type of gifted services (e.g., pullout or self-contained classroom).
Data Analysis
The study used a cluster-randomized experimental design in which classrooms were randomly assigned to treatment or comparison conditions. Data are considered to have a hierarchical structure when persons are nested within organizational units or multiple observations are nested within persons, resulting in an interdependency of scores for units within clusters (Raudenbush & Bryk, 2002). Multilevel analyses allow the nested nature of the data set to be taken into account and prevent issues with aggregation bias, the misinterpretation of standard errors, and heterogeneity of regression (Maas & Hox, 2004; Raudenbush & Bryk, 2002; Scherbaum & Ferreter, 2009). Classrooms of gifted students within a school in this study did not share school variance because the classrooms were in nearly all cases made up of students transported to the classroom from multiple schools across a district for part or all of a day.
Hierarchical data sets do not automatically require multilevel modeling. If there is no variation in response variables across Level 2 clusters, the data can be analyzed using simple ordinary least squares regression analysis (Bryk & Raudenbush, 1992; Peugh, 2010; Raudenbush & Bryk, 2002; Wu & Wooldridge, 2005). As such, to determine whether multilevel modeling was necessary, the intraclass correlation coefficient (ICC) and design effect (DE) indices were calculated prior to data analyses (reported in Table 1). Significant between-classroom variance in all three years supported the need for multilevel modeling of the outcome measures for both units. The ICC ranged from .18 to .44, indicating that about 18% to 44% of student achievement variance occurred across classrooms. The DE ranged from 2.89 to 6.44, suggesting that the standard errors produced by assuming simple random sampling were close to three to six times smaller than they would be if the clustered nature of the data were taken into account. That is, the standard errors from traditional statistical analyses should be multiplied by the DE of 2.89 or 6.44 to get more realistic estimates of the standard errors, given the dependence in the data.
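To make the two indices concrete, the following is a minimal sketch of how an ICC and a design effect could be computed from variance components. It assumes the common formulas ICC = between-classroom variance / total variance and DE = 1 + (average cluster size - 1) x ICC. The variance components and the average class size below are hypothetical values chosen for illustration, not figures from this study.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """ICC: share of total outcome variance that lies between classrooms."""
    return between_var / (between_var + within_var)


def design_effect(icc: float, avg_cluster_size: float) -> float:
    """Design effect under the common formula 1 + (n - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


# Hypothetical variance components (not taken from the study).
tau00 = 4.5    # between-classroom variance of the posttest
sigma2 = 20.5  # within-classroom (student-level) variance

icc = intraclass_correlation(tau00, sigma2)
# An average of 11.5 students per classroom is an assumption for illustration.
de = design_effect(icc, avg_cluster_size=11.5)
print(f"ICC = {icc:.2f}, DE = {de:.2f}")
```

With these assumed inputs the sketch yields an ICC of .18 and a DE of 2.89, which corresponds to the lower end of the ranges reported above.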
Table 1. Intraclass Correlations and Design Effect by Cohort
In order to examine achievement differences between the treatment and comparison groups, multilevel models were generated. The statistical package Mplus 6.0 (Muthén & Muthén, 2010) was used for data analyses. Considering the non-normality of the outcome data found in preliminary inspection, maximum likelihood robust (MLR) estimation was employed. Non-normality in the data negatively biases standard errors and inflates Type I error rates (Raudenbush & Bryk, 2002). Using MLR estimation, with standard errors and chi-square test statistics that are robust to non-normality and nonindependence of observations in clustered data, helped minimize bias and control Type I error in parameter estimation (Asparouhov & Muthén, 2003; Chou, Bentler, & Satorra, 1991; Muthén & Muthén, 2010). For an effect size index, proportional variance reduction (PVR) statistics were calculated. The PVR is a local effect size estimate that can be used in multilevel analyses as it quantifies the effect of individual variables on the response variable by estimating the reduction in residual variance after variables are added to the analysis model (Peugh, 2010; Raudenbush & Bryk, 2002; Singer & Willett, 2003). In calculating the PVR for each analysis model, we used the formula:

$$\text{PVR} = \frac{\sigma^2_{\text{baseline}} - \sigma^2_{\text{conditional}}}{\sigma^2_{\text{baseline}}}$$

where $\sigma^2_{\text{baseline}}$ is the residual variance from the model without the added variable and $\sigma^2_{\text{conditional}}$ is the residual variance after the variable is added.
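The PVR described above can be sketched in a few lines. The function assumes the conventional definition, in which the residual variance from the model without the added predictor is compared with the residual variance after the predictor is added. The function name and the two variance values are hypothetical and serve only to illustrate the arithmetic.

```python
def proportional_variance_reduction(var_baseline: float, var_conditional: float) -> float:
    """Share of the baseline residual variance removed by adding predictors."""
    if var_baseline <= 0:
        raise ValueError("baseline variance must be positive")
    return (var_baseline - var_conditional) / var_baseline


# Hypothetical between-classroom residual variances (not from the study).
tau00_without_treatment = 5.0  # model without the treatment indicator
tau00_with_treatment = 4.0     # model after adding the treatment indicator

pvr = proportional_variance_reduction(tau00_without_treatment, tau00_with_treatment)
print(f"PVR = {pvr:.1%}")
```

In this illustration, adding the predictor removes one fifth of the between-classroom residual variance, so the PVR is 20%.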
While student- and classroom-level variables were entered in the model specification, school level was not considered due to wide variation in classroom configurations in the study. In some of the classrooms, students were transported from multiple schools for special gifted education services and did not share any school-level variance. Among the student-level variables, neither ethnicity nor gender was found to be significant (p > .05) in the preliminary analyses, and the models were not improved when these variables were included. As such, the final models for each cohort and unit do not include student gender and ethnicity variables. In addition, the slope variance of the ITBS at Level 2 ($u_{1j}$) was not included in the final models as the effect of ITBS scores was not found to vary across classrooms in preliminary analyses (p > .05).
The final Level 1 model contained students’ ITBS scores. The ITBS scores were entered after grand-mean centering, in which the grand mean for the ITBS scores was subtracted from each student’s ITBS score ($\text{ITBS}_{ij} - \overline{\text{ITBS}}_{..}$).
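Based on the variables described in this section and in the results, the final model can be sketched as the two-level specification below. The notation is an assumption that follows common HLM conventions (Raudenbush & Bryk, 2002): $Y_{ij}$ is the posttest score of student $i$ in classroom $j$, and Treatment and ClassType are labels introduced here for the treatment indicator and the classroom type (pullout or self-contained). The ITBS slope is treated as fixed because the slope variance $u_{1j}$ was not retained.

$$
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j}\,(\text{ITBS}_{ij} - \overline{\text{ITBS}}_{..}) + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{Treatment}_{j} + \gamma_{02}\,\text{ClassType}_{j} + u_{0j} \\
& \beta_{1j} = \gamma_{10}
\end{aligned}
$$

Under this sketch, $\gamma_{01}$ is the adjusted treatment effect on classroom mean achievement, and grand-mean centering makes $\gamma_{00}$ interpretable as the expected posttest score for a student with an average ITBS score in a classroom where both Level 2 predictors are zero.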
Results
Test of Assumptions
Prior to multilevel analyses, several assumptions were examined for the ITBS reading subtest and the unit posttests.
ITBS Survey Battery Reading Subtest
For the standard scores, visual inspection of the histogram suggested an approximately normal distribution. However, the Shapiro-Wilk test revealed a significant departure from normality (p < .05). The skewness and kurtosis values were well within acceptable limits. Standardized residuals were calculated to identify potential outliers or unduly influential observations. No outliers were detected in the data. Balancing these findings, the reading variable was left unaltered in all years. The reliability estimates based on standard scores for the participants in this study were .78 for Cohort 1, .76 for Cohort 2, and .77 for Cohort 3. While there is a discrepancy between the ITBS publisher’s reported reliability and the reliability from the study sample, reliability coefficients are affected by the variability of scores within a particular sample. Given that this study’s sample was not reflective of the sample demographics reported in the ITBS technical manual and had a much more restricted range of scores on the test, it was not surprising to have lower estimates of reliability as reflected in the alpha coefficients. Because reliability is the ratio of true score variance to observed score variance, in a more homogeneous group such as a sample of gifted students the lower true score variance leads to a smaller reliability estimate (Thorndike & Thorndike-Christ, 2010).
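The effect of range restriction on reliability can be illustrated with a short sketch. It assumes classical test theory with the same error variance in the norming population and in the restricted sample, so that reliability equals one minus the ratio of error variance to observed variance. The publisher's value of .91 comes from the text; the ratio of standard deviations is a hypothetical value chosen for illustration, not a statistic from this study.

```python
def reliability_in_restricted_sample(rel_population: float, sd_ratio: float) -> float:
    """Expected reliability when the observed-score SD shrinks to
    sd_ratio * population SD.

    Assumes error variance is identical in both groups (classical test theory).
    """
    error_share = 1.0 - rel_population  # error variance / population variance
    return 1.0 - error_share / sd_ratio ** 2


publisher_reliability = 0.91  # reported by the publisher for the general population
assumed_sd_ratio = 0.64       # hypothetical: sample SD is 64% of the population SD

expected = reliability_in_restricted_sample(publisher_reliability, assumed_sd_ratio)
print(f"expected reliability = {expected:.2f}")
```

Under these assumptions, a sample whose score spread is roughly two thirds of the norming sample's would be expected to show a reliability near .78, in line with the cohort estimates reported above.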
Descriptive statistics for each cohort are reported in Table 2.
Table 2. Descriptive Statistics on the Iowa Tests of Basic Skills (ITBS) by Cohort and Treatment Condition
Note. The publisher reports a median ITBS standard score of 185 for typical third-grade students.
Posttests
Univariate normality was examined through histograms, evaluation of skewness and kurtosis, and more formally through the Shapiro-Wilk test statistic. The skewness and kurtosis values were well within acceptable limits, and visual inspection of the histograms suggested an approximately normal distribution. However, the Shapiro-Wilk test revealed a significant departure from normality (p < .05). Standardized residuals were calculated to identify potential outliers or unduly influential observations. No outliers were identified in the data. Balancing these findings, the posttest variables were left unaltered.
With regard to reliability of the tests, Cronbach’s α estimates ranged from .73 to .80 for the poetry unit and from .68 to .73 for the research unit (see Tables 3 and 4; reliability was not assessed for Cohort 3). These reliability estimates suggest adequate internal consistency for research purposes.
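For readers unfamiliar with the statistic, the following minimal sketch shows how Cronbach's α is computed from a students-by-items score matrix, using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The response matrix is hypothetical and far smaller than the 35-item posttests; it is included only to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a students-by-items score matrix."""
    n_items = len(scores[0])
    # Sample variance of each item across students.
    item_vars = [variance([row[k] for row in scores]) for k in range(n_items)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) responses: six students, four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Because α depends on the number of items as well as on how strongly they covary, values from a four-item toy example are not comparable to those of the full posttests.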
Table 3. Posttest Reliability by Cohort and Unit
Table 4. Descriptive Statistics on the Posttests by Cohort
Note. In Cohort 3, teachers in the treatment condition received only the poetry unit because of time constraints for the study.
Hierarchical Linear Modeling Assumptions
Statistical assumptions on each data source were tested, and standard statistical assumptions of hierarchical linear modeling (HLM) including the normality of Level 1 residuals and homogeneity of Level 1 variances were examined. The Level 1 residuals were normally distributed, and no violations were noted for the test of homogeneity of Level 1 variances. At Level 2, the residuals for the intercept and slope for dependent variables were normally distributed and homogeneity of variance assumptions held.
Multilevel Analyses
For Cohort 1, the results of the multilevel analyses showed a significant difference (p < .01) favoring the treatment group over the comparison group on the outcome measures for both units after controlling for ITBS scores. The results also indicated that classroom type (pullout or self-contained) was not a statistically significant factor in student learning.
Model fit tests corroborate the significance of the treatment effect on student achievement for both units: $\Delta\chi^2(2) = 37.18$, p < .01, for the poetry unit and $\Delta\chi^2(2) = 10.10$, p < .01, for the research unit. In Tables 5 through 9, the model summaries for poetry and research are reported. The PVR between classrooms for both units was calculated to estimate effect size indices of the treatment. Results indicated that adding treatment condition to the analyses reduced a significant amount of the residual variation across classrooms in student achievement scores (14.2% for the poetry unit and 17.8% for the research unit).
Table 5. Model Summaries for Poetry Unit in Cohort 1
**p < .01.
Table 6. Model Summaries for Research Unit in Cohort 1
**p < .01.
Table 7. Model Summaries for Poetry Unit in Cohort 2
*p < .05. **p < .01. ***p < .001.
Table 8. Model Summaries for Research Unit in Cohort 2
**p < .01. ***p < .001.
Table 9. Model Summaries for Poetry Unit in Cohort 3
***p < .001.
For Cohort 2, the results of the multilevel analyses also showed a significant difference (p < .01) favoring the treatment group over the comparison group on the outcome measures for both units after controlling for ITBS scores. Classroom type was not a significant factor associated with student learning in either unit. Model fit tests corroborate the significance of the treatment effect on student achievement for both units as well: $\Delta\chi^2(2) = 16.30$, p < .01, for the poetry unit and $\Delta\chi^2(2) = 15.37$, p < .01, for the research unit. The PVR between classrooms for both units indicated that adding treatment condition to the analyses reduced a significant amount of the residual variation in student achievement scores (10.9% for the poetry unit and 12.6% for the research unit).
The results from the Cohort 3 data analyses also showed a significant difference (p < .01) favoring the treatment group over the comparison group on the outcome measure after controlling for ITBS scores. As observed in the previous years, classroom type was not significantly associated with student learning. Model fit tests corroborate the significance of the treatment effect on student achievement for the unit: $\Delta\chi^2(2) = 60.49$, p < .01. The PVR between classrooms indicated that adding treatment condition to the analyses reduced the residual variance in student achievement scores by 23.3%.
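The model fit statistics reported for the three cohorts can be checked against the stated significance level with a short sketch. It treats each reported value as an ordinary chi-square difference with 2 degrees of freedom, for which the upper-tail probability has the closed form exp(−x/2). This is an assumption: with MLR estimation, chi-square difference tests are typically computed with a scaling correction, and the text does not say whether the reported values are already scaled.

```python
import math


def chi_square_p_value_df2(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


# Chi-square difference values as reported in the text (df = 2 in each case).
reported = {
    "Cohort 1, poetry": 37.18,
    "Cohort 1, research": 10.10,
    "Cohort 2, poetry": 16.30,
    "Cohort 2, research": 15.37,
    "Cohort 3, poetry": 60.49,
}

p_values = {label: chi_square_p_value_df2(x) for label, x in reported.items()}
for label, p in p_values.items():
    print(f"{label}: p = {p:.2g}")
```

Under this assumption, all five statistics correspond to p-values below .01, consistent with the significance levels reported above.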
Discussion and Implications
The significant differences on the achievement outcome measures between treatment and comparison groups from two different units of study based on the CLEAR model are promising indicators of the potential for an integrated model to guide development of units that positively affect learning for advanced students across distinct program delivery settings (self-contained classes and pullout programs). The results suggest that the CLEAR model, which establishes the context of rich curriculum and responsive instruction driven by key components of three existing curricular and instructional models, is a viable option to enhance student learning.
The current study also collected data with regard to the fidelity of implementation. The results from the fidelity studies indicate that teachers implemented the lessons with moderate to high fidelity (Azano et al., 2011; Foster, 2014) and that teachers’ fidelity was significantly and positively associated with student learning (p < .05). Further documentation of fidelity of implementation and its relationship to outcomes in the study can be found in Azano et al. (2011) and Foster (2014). Additionally, the authors found that teachers’ implementation of the CLEAR model units for multiple years did not have a significant association with student learning (Oh, Callahan, Moon, & Azano, 2014). While we hypothesized that teachers’ previous experiences of implementing the units might have increased faithful implementation of the units and positively affected student outcomes, this carryover effect was not found to be a significant factor affecting student learning outcomes.
The field has been divided on the issue of “what model works best?” in challenging and enhancing high-achieving students’ learning. In this study, we were able to construct a model that incorporated the common threads of the curricular and instructional principles recommended to develop units of study that (a) clearly reflect content standards from professional associations, (b) are common across states across the nation, and (c) reflect the CCSS but at a level of content and skill differentiation beyond what other students “could, should, or would” engage in (Passow, 1982). These principles for developing curriculum and instruction for high-achieving and high-potential students hold promise for all learners in that rich curriculum and responsive instruction can be applied at any grade level and in any content area.
Limitations
While wide variation in gifted program service delivery exists (Callahan et al., 2013, 2014; NAGC & Council of State Directors of Programs for the Gifted, 2009, 2011), the study targeted teachers and students in pullout or self-contained classrooms specifically designated for gifted students only. Among the students and teachers in the study, certain racial groups are not represented in the classrooms in the same proportions as the general student population. The pattern of underrepresentation of certain racial groups and economically disadvantaged students among the highest scorers that is observed in national-level achievement tests such as NAEP and in advanced academic settings (e.g., AP or IB courses) is also persistent in gifted education classrooms (Gandara, 2007; Moore, Ford, & Milner, 2005; Naglieri & Ford, 2003; U.S. Office for Civil Rights, 2000, 2002, 2004, 2006; Worrell, 2013). As such, students served in gifted education classrooms do not represent the general student population. Additionally, the research team’s effort to collect students’ free or reduced-price lunch status was hampered because school districts that participated in the study limited release of these data, which are protected under the Family Educational Rights and Privacy Act (FERPA) regulations. It is important for future work to understand the extent to which CLEAR model units are equally responsive to all learners, regardless of culture. In doing so, we can begin to understand how to engage students from underrepresented populations in the learning process, thereby bolstering their academic achievement.
Due to funding constraints of the grant under which the research was carried out, data collection and analysis with Cohort 3 were limited to one unit (poetry) during the fall semester. While many teachers integrate poetry instruction throughout the year, some isolate such instruction to the spring semester or during April, which is considered “poetry month.” Therefore, if students in comparison classrooms were not yet exposed to poetry content or concepts, their lack of exposure may account for some variance in outcomes in the final year of the study.
Outcome assessments with sufficient ceiling to measure the achievement of gifted students are limited. Students in this study scored near the top of out-of-level standardized language arts assessments prior to beginning the study. This necessitated the construction of instruments to measure study outcomes. While we took great care to base the outcome assessments on standards common to third-grade learning outcomes in the language arts and to avoid any references to specific poems or other resources used in the units, it is possible that students in the treatment group may have benefited from a closer alignment of instruction with the outcomes as measured by the instruments constructed for use in the study.
While the model was found to be effective across different curriculum units and for gifted students, the research team’s ability to assess longitudinal effects was inhibited by funding constraints and initial agreements with the school districts for examining third-grade performance only as part of the agreement to participate in the study. The study findings are further limited by the exclusion of students who were not identified and served in gifted programs. Hence, we cannot conclude that students not identified and served in gifted programs would not benefit from the curriculum. The concern with non-identified students was not ignored. Pilot studies of the curriculum that included implementation in heterogeneous classrooms indicated that teachers in those classrooms were unwilling or unable to implement the curriculum in their classrooms, citing the difficulty of the content, the pace, and the lack of exact parallel to state assessments.
Future Directions
The results and the limitations of the study suggest possibilities for future research. While the research-based units demonstrated positive effects on student achievement, areas for further research include investigation of differential effects across students in different settings and across students from more diverse racial and socioeconomic groups, the influence of classroom context such as instructional time or class size, the degree to which fidelity of implementation influences student learning outcomes within treatment groups, and effects of applying the model across disciplines and across grade levels. Notably, regardless of the type of program setting (i.e., self-contained or pullout) in which treatment students were taught, these students outperformed their counterparts in the comparison group. Whether the finding would generalize to other settings in which gifted students are served, the ways in which implementation in additional settings influences these outcomes, and whether the units are most effective as supplements to the general curriculum or as replacement units all warrant further study. Finally, investigation of how the CLEAR model works in different content areas involving students in different grade levels and across varying levels of achievement and aptitude is the next logical step.