Concurrent Validity of the Classroom Strategies Scale for Elementary School

Abstract

The present study is an initial investigation of the concurrent validity of a new assessment, the Classroom Strategies Scale (CSS version 2.0) for Elementary School—Observer Form. The CSS assesses teachers’ use of instructional and behavioral management strategies. In the present study, the CSS is compared to the Classroom Assessment Scoring System (CLASS), a widely researched measure of global classroom quality. In a sample of 125 general education K-5 grade teachers, correlations were computed to assess the relationship between the CSS scales and conceptually similar and dissimilar domains and dimensions on the CLASS. In comparison to the CLASS, the CSS classroom observations and strategy rating scale scores demonstrated correspondence with conceptually similar scales, providing initial evidence for the concurrent and discriminant validity of the CSS. Results highlight the unique features of the CSS for assessing teacher classroom practices.

Keywords

teacher behavior, assessment, instruction, behavior management

Teacher accountability has received increased attention from school systems and state/federal government (Bales, 2006). Changes in the American education system, including the passage of the No Child Left Behind legislation (NCLB, 2001), have emphasized the importance of general education teachers and their practices with respect to student achievement. At the same time, Response to Intervention (RtI; Fletcher, Lyon, Fuchs, & Barnes, 2007) and Positive Behavioral Intervention and Support (PBIS; www.pbis.org; Sugai & Horner, 2002) frameworks rely heavily on the general education teacher as the implementer of best practice interventions for academic instruction and/or behavior management.

With an increased emphasis on teacher accountability in addition to student achievement outcomes, it follows that schools need to establish mechanisms to evaluate teacher performance. Legislation has emphasized student performance on state-wide assessments as one metric of determining effective teacher performance (NCLB, 2001). However, the use of state-wide assessment scores as a method for assessing teacher performance ignores other contributors to educational success, and the scores in and of themselves do not provide information on how to improve teachers’ performance. For example, student academic achievement has numerous contributors (i.e., child intellectual level, family socioeconomic status, parent education, prior educational experience), with teacher contributions representing a small portion of the variance (Connor, Son, Hindman, & Morrison, 2005). Importantly, many of these child/family variables that may contribute to state-wide test scores are outside of the teachers’ control. Because of this, attention has turned to developing and validating assessments of teacher performance and professional practice that focus on teacher behaviors within the control of the general educator. These include strategies that are consistent with best practice in instruction and classroom behavior management. Presumably, focusing on improvements in teacher practice will contribute to the promotion of downstream student achievement gains.

To meet this need to evaluate and monitor teacher classroom practices, multiple approaches are available. A traditional approach includes a principal’s observation of classrooms followed by a review with the teacher of narrative notes. This approach is limited, however, as it is susceptible to inconsistency and bias. Another approach involves use of local rubrics, and whereas this may increase the consistency and structure of the observation, it may have limited external validity. Beyond these approaches there appear to be few tools to monitor teachers’ use of evidence-based classroom instructional and behavioral management practices. For example, in the National Center for Education Evaluation’s comprehensive review of available school assessments, there does not appear to be a measure that assesses both teachers’ instructional and behavior management practices in instructional contexts (Malone et al., 2010). Also, there is no teacher measure designed for repeated use (progress monitoring purposes). Collectively, these approaches have conceptual and psychometric limitations that leave a void for school personnel to effectively assess, monitor, and improve teachers’ Tier 1 evidence-based instructional and behavioral management practices (Reddy & Fabiano, 2012; Reddy, Fabiano, Barbarasch, & Dudek, 2012).

One well-developed classroom observational system that attempts to address these limitations is the Classroom Assessment Scoring System (CLASS; Pianta, La Paro, & Hamre, 2008). The CLASS is a multidimensional classroom observation system that assesses global classroom quality in preschool through third-grade classrooms. As shown in Table 2, the CLASS consists of 10 nested single-item dimensions within three domains (i.e., emotional support, classroom organization, and instructional support). The CLASS has been used as a research tool in several federally funded projects and has good reliability and validity (Pianta et al., 2008).

Table 1.

Descriptions of the CSS Stage 1 (Classroom Observation) and Stage 2 (Strategy Rating Scales) Assessment.

Stage 1	Definitions
Concept summaries	A teacher summarizes or highlights key concepts or facts taught during the lesson. Summarization statements are typically brief and clear. This teaching strategy helps students organize and recall material taught.
Academic response opportunities	A teacher creates opportunities for students to provide verbal academic responses (i.e., answers or responds to lesson content questions, summarizes or repeats key points, generates questions, brainstorms ideas, explains answer).
Clear one- or two-step commands	A teacher-directed verbal instruction that specifically requests a behavior. These commands are clear and direct, and they provide specific instructions to students. They are declarative statements (not questions), describe the desired behavior, and include no more than two steps.
Vague commands	A teacher-directed verbal instruction that is unclear when requesting a behavior. These commands are vague, may be issued as questions, and often include excess verbalizations or more than two steps.
Praise statements	A teacher issues a verbal or nonverbal statement or gesture to provide feedback for a positive or appropriate behavior.
Corrective feedback	A teacher issues a verbal or nonverbal statement or gesture to redirect inappropriate behavior.
Stage 2: PIS scales	Definitions
Total Scale	The Total PIS scale reflects the overall use of instructional methods and academic monitoring/feedback.
Instructional Methods Composite Scale	How classroom instruction occurs. Measures teachers’ use of teacher-directed or student-directed methods. This includes how a teacher incorporates active learning techniques such as hands on learning and collaborative learning in the presentation of lessons as well as how a teacher delivers academic content to students.
Student focus learning and engagement subscale	Strategies for engaging students in the lesson, creating active learners, and encouraging self-initiative in the learning process. These practices encompass direct experience, hands on instructional techniques, linking lesson content to personal experiences, and cooperative learning strategies.
Instructional delivery subscale	Methods for conveying information to students and strategies employed while teaching lesson content/concepts. These practices include modeling, advanced organizers, summarizing, and other instructional methodology.
Academic Monitor/Feedback Composite Scale	How teachers monitor students’ understanding of the material and provide feedback on their understanding. These strategies assess students’ thinking and encourage students to examine their own thought processes. Teachers guide students’ understanding by encouraging students, affirming appropriate application of the material, and correcting misperceptions.
Promotes student thinking subscale	Practices for stimulating students’ metacognitive and higher order thinking abilities. They encourage students to critically think about the lesson material (why/how analysis), generate new ideas, and examine their own thought processes.
Academic performance feedback subscale	How teachers provide feedback to students on their understanding of the material. These practices assess teachers’ efforts to explain what is correct or incorrect with students’ academic performance.
Stage 2: BMS scales	Definitions
Total Scale	The Total BMS scale reflects the overall use of proactive methods and behavior feedback.
Behavior Feedback Composite Scale	How teachers respond to students’ appropriate and inappropriate behaviors. This includes the usage of praise to encourage positive behaviors and corrective feedback to redirect negative behaviors.
Praise subscale	Verbal and nonverbal strategies that teachers use to praise students for specific appropriate behaviors in the classroom.
Corrective feedback subscale	Verbal and nonverbal strategies teachers use to redirect or correct students’ inappropriate behavior in the classroom.
Proactive Methods Composite Scale	Strategies that teachers use to promote positive behaviors in the classroom and reduce the likelihood of negative behaviors. These strategies include prompts, routines, reviewing rules, and presenting instructions or requests in a clear manner.
Prevention management subscale	Proactive verbal and nonverbal strategies that teachers use to promote positive classroom functioning and establish effective learning environments. These practices include taking actions to prevent problem behaviors from occurring, establishing clear and consistent expectations, and creating a positive atmosphere in the classroom.
Directives/transitions subscale	Strategies that teachers use to communicate their behavioral requests to students and manage the movement and behavior of students during class transitions.

Table 2.

Descriptions of the CLASS Domains and Dimensions.

Construct/item	Definitions
Emotional support domain
Positive climate	The emotional connection, respect, and enjoyment demonstrated between teachers and students, and among students.
Negative climate	The level of expressed negativity such as anger, hostility, or aggression exhibited by teachers and/or students in the classroom.
Teacher sensitivity	Teachers’ awareness of and responsivity to students’ academic and emotional concerns.
Regard for student perspectives	The degree to which teachers’ interactions with students and classroom activities place an emphasis on students’ motivations and points of view.
Classroom organization domain
Behavior management	How effectively teachers monitor, prevent, and redirect behavior.
Productivity	How well the classroom runs with respect to routines and the degree to which teachers organize activities and directions so that maximum time can be spent in learning objectives.
Instructional learning formats	How teachers facilitate activities and provide interesting materials so that students are engaged and learning opportunities are maximized.
Instructional support domain
Concept development	How teachers use instructional discussions and activities to promote students’ higher order thinking skills in contrast to a focus on rote instruction.
Quality of feedback	How teachers extend students’ learning through their responses to students’ ideas, comments, and work.
Language modeling	The extent to which teachers facilitate and encourage students’ language.

Note. Cited from Pianta et al. (2008).

A recently developed observational measure, the Classroom Strategies Scale (CSS) for Elementary School collects data on specific teacher behaviors related to evidence-based instructional and behavioral management strategies. The CSS is guided by constructivist and differentiated learning theories, and opportunity to respond, positive behavioral interventions and supports (PBIS), and behavioral management literature (e.g., Coutinho, Wiemer-Hastings, Skowronski, & Britt, 2005; Gable, Hester, Rock, & Hughes, 2009; Marzano, 1998; Matthews, 2000; Sugai & Horner, 2002; Sutherland, Adler, & Gunter, 2003; Tomlinson & Edison, 2003).

Unlike the CLASS, the CSS does not measure teacher–student interactions, but instead measures teacher behavior. The CLASS assesses global classroom quality across 10 single-item dimension scores, whereas the CSS collects discrete data via frequency counts of six behaviors and a 57-item rating scale composed of two total, four composite, and eight subscale scores. The CSS also yields discrepancy scores (i.e., ideal frequency − frequency ratings), which provide feedback for changes in teachers’ practices. Thus, the CSS and CLASS are conceptually and methodologically different tools that offer distinct and complementary information for measuring teachers’ professional functioning.

An important step in test validation is to examine the concurrent validity of new assessment tools against existing criterion measures. However, at this time there is no gold standard measure of Tier 1 teacher instructional and behavioral management practices. Given this void in school-based assessments, the CSS was compared to the CLASS as it is a widely researched observational assessment of classroom quality.

This investigation serves as the first concurrent validity study of the CSS Observer Form by comparing the CSS to the CLASS. It was hypothesized that the CSS Instructional Methods and Academic Monitoring and Feedback discrepancy scores would be negatively correlated with the CLASS classroom organization and instructional support domains and associated dimension scores. Similarly, it was hypothesized that the CSS discrepancy scores of behavioral management would be negatively correlated with the CLASS classroom organization domain and associated dimension scores.

Method

Sample

General education K-5 teachers (N = 125) from 45 elementary schools in New Jersey and New York were assessed by 40 observers on both the CSS and CLASS. The teacher sample included 117 females and was predominately Caucasian (96%). The average age was 39 (SD = 11.68). Teacher grade-level assignment included 20 in kindergarten, 23 in first grade, 25 in second grade, 20 in third grade, 17 in fourth grade, and 20 in fifth grade. The average number of students per classroom was 21 (SD = 3.94). The average number of years of teaching experience was 11.72 (SD = 8.52). Educational degree included 45% with a bachelor’s degree, 47% with a master’s degree, and 8% who did not report.

Forty observers (30 females) were included. The majority were Caucasian (90%). The average age was 37.32 (SD = 14.88). Educational degree included 25% with an associate degree, 13% with a bachelor’s degree, 53% with a master’s degree, and 9% with a doctoral degree.

Measures

CSS for Elementary School—Observer Form (version 2.0)

The CSS was designed for identification and progress monitoring of teachers’ instructional and behavioral management strategy usage. The CSS is composed of three assessment stages that include empirically supported items/strategies (e.g., Gable et al., 2009; Marzano, Pickering, & Pollock, 2001; Walker, Colvin, & Ramsey, 1999). For Stage 1, two 30-min classroom observations (within 7 days) are conducted. The observer records the frequencies of discrete instructional and behavior management strategies by tallying how often six teacher behaviors/strategies occurred across the two lessons (see Table 1). The six Stage 1 teacher behaviors yield a total score and two additional scores (i.e., praise − corrective feedback and praise − [corrective feedback + clear commands + vague commands]).

After completing the observations, raters fill out Stage 2. The Stage 2 Strategy Rating Scales assess important instructional and behavioral management strategies that cannot be discretely counted in addition to rating appropriate use (i.e., ideal frequency − actual frequency) aspects of the six behaviors from Stage 1. Observers fill out a total of 57 items on the Positive Instructional Strategies (PIS) and Behavioral Management Strategies (BMS) scales. The PIS scale includes 26 items on a total scale, two composite scales, and four subscales. The Instructional Methods Composite scale (14 items; maximum score of 98) consists of the instructional delivery (7 items; maximum score of 49) and student focus learning and engagement (7 items; maximum score of 49) subscales. The Academic Monitoring/Feedback Composite scale (12 items; maximum score of 84) consists of the promotes student thinking (6 items; maximum score of 42) and academic performance feedback (6 items; maximum score of 42) subscales.

The BMS scale includes 23 items that compose a total scale, two composite scales, and four subscales. The Behavioral Feedback Composite scale (11 items; maximum score of 77) consists of praise (5 items; maximum score of 35) and corrective feedback (6 items; maximum score of 42) subscales. The Proactive Methods Composite scale (12 items; maximum score of 84) consists of prevention management (5 items; maximum score of 35) and directives/transitions (7 items; maximum score of 49) subscales (see Table 1).
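The maximum scores reported for the Stage 2 scales are consistent with each item contributing at most 7 points, the top of the 7-point rating scale. The short Python sketch below reproduces that arithmetic from the item counts given above. It is illustrative only and not part of the CSS materials; the dictionary layout and function names are assumptions made for the example.

```python
# Item counts per Stage 2 subscale, taken from the text above.
# The nesting (scale -> composite -> subscale) is an illustrative layout.
LIKERT_MAX = 7  # assumed maximum contribution of a single item

SUBSCALE_ITEMS = {
    "PIS": {
        "Instructional Methods": {
            "instructional delivery": 7,
            "student focus learning and engagement": 7,
        },
        "Academic Monitoring/Feedback": {
            "promotes student thinking": 6,
            "academic performance feedback": 6,
        },
    },
    "BMS": {
        "Behavioral Feedback": {
            "praise": 5,
            "corrective feedback": 6,
        },
        "Proactive Methods": {
            "prevention management": 5,
            "directives/transitions": 7,
        },
    },
}


def composite_items(scale: str, composite: str) -> int:
    """Number of items in one composite scale (sum of its subscales)."""
    return sum(SUBSCALE_ITEMS[scale][composite].values())


def scale_items(scale: str) -> int:
    """Number of items in a total scale (sum of its composites)."""
    return sum(composite_items(scale, c) for c in SUBSCALE_ITEMS[scale])


def max_score(n_items: int) -> int:
    """Highest possible sum score for a set of items."""
    return n_items * LIKERT_MAX
```

For example, `max_score(composite_items("BMS", "Behavioral Feedback"))` gives 77, matching the value reported for the Behavioral Feedback Composite scale.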

After each 30-min classroom observation, observers rate how often (frequency rating) teachers used specific positive instructional and behavioral management strategies on a 7-point Likert-type scale (1 = never used, 3 = sometimes used, 7 = always used) and then rate how often the teachers should have used each strategy (ideal frequency) on the same 7-point Likert-type scale. Frequency ratings are subtracted from ideal frequency ratings, yielding “discrepancy scores” (signed values) that suggest the under use (i.e., positive value), appropriate use (i.e., a value approximately zero), or over use (i.e., negative value) of specific items/strategies. The larger the absolute value of the discrepancy score, the greater the suggested change in teacher practice. For Stage 3, a classroom checklist related to classroom structure and routines is completed. For this investigation, the classroom checklist was not included.
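The discrepancy score logic described above can be sketched in a few lines of Python. This is a minimal illustration for a single item, not the CSS scoring software; the function names are hypothetical. Because the text does not give a cut-off for "approximately zero", the sketch treats only an exact zero as appropriate use.

```python
def discrepancy(frequency: int, ideal_frequency: int) -> int:
    """Item-level discrepancy: ideal frequency rating minus frequency rating."""
    for rating in (frequency, ideal_frequency):
        if not 1 <= rating <= 7:
            raise ValueError("ratings must be on the 7-point scale (1 to 7)")
    return ideal_frequency - frequency


def interpret(score: int) -> str:
    """Map the sign of a discrepancy score to the interpretation in the text.

    Assumption: exactly zero counts as appropriate use; the source only
    says "approximately zero" without stating a threshold.
    """
    if score > 0:
        return "under use"
    if score < 0:
        return "over use"
    return "appropriate use"
```

A strategy rated 3 for observed frequency and 6 for ideal frequency has a discrepancy of 3, which `interpret` labels as under use; ratings of 7 and 5 give −2, or over use.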

The CSS Observer Form (version 2.0) has strong face, content, and construct validity. The Stage 2 PIS and BMS rating scales are theoretically and factor analytically derived (confirmatory factor analysis) within classroom observations (Reddy & Fabiano, 2012). The CSS has strong internal consistency (Cronbach αs of 0.93 and 0.92), as well as good interrater reliability (r = 0.72 to 0.94; percentage agreement 88% to 92%) and test–retest reliability (approximately 2 to 3 weeks, unadjusted r = 0.77 to 0.86; percentage agreement 81% to 93%). Differential item functioning analyses have revealed that the Stage 2 Strategy Rating Scales and items are free of item bias for important teacher demographic variables (e.g., educational degree, tenure status, years of teaching experience). Research has found the CSS scores predict state-wide mathematics and literacy testing scores and are sensitive to change following brief teacher consultation for improving classroom practices (Reddy & Fabiano, 2012). The CSS Observer Form requires minimal training and includes explicit instructions and examples, as well as a 5-min DVD trainer.

Classroom Assessment Scoring System (CLASS)

The CLASS is an observational measure designed for researchers to assess global classroom climate and instructional quality through observation of student and teacher interactions. The constructs of the CLASS were based on research on the impact of student and adult interactions on student development and learning (Pianta et al., 2008). The CLASS is composed of 10 single-item dimensions corresponding to three theoretically and factor analytically derived domains (i.e., emotional support, classroom organization, and instructional support) that measure global classroom quality (see Table 2).

The CLASS observation includes six cycles (observations) of 20-min intervals. During this time observers take notes relevant to each of the 10 dimensions. Following each observation, observers rate classrooms on each of the 10 dimensions using a 7-point Likert-type scale. CLASS ratings represent the observed quality of each dimension and range from low quality (1, 2) through mid quality (3, 4, 5) to high quality (6, 7). Lower ratings on the CLASS indicate that fewer indicators of a dimension were observed, while higher ratings indicate that more were observed. Observers are to rate each classroom holistically on each dimension based on the associated indicators. The CLASS requires 2 days of training with a designated CLASS trainer and a web-based reliability test to obtain an observer reliability certificate.

Drawn from six large studies, the CLASS and its predecessor the COS evidence strong psychometric properties. The CLASS has good internal consistency estimates (Cronbach αs of the three domains 0.76 to 0.92), interrater reliability (i.e., within 1 point of master codes on each dimension; average percentage agreement of the 10 dimensions 87%), and test–retest reliability estimates for 1 day, 2 days, and 1 week (r = 0.73 to 0.85). Lower levels of stability are found across the school year (r = 0.25 to 0.64). In addition, the CLASS preschool–kindergarten version has adequate concurrent and divergent validity with the Early Childhood Environment Rating Scale—Revised Edition, yielding r of 0.33 to 0.63, and the Emerging Academic Snapshot (Ritchie, Howes, Kraft-Sayre, & Weiser, 2001), yielding r of 0.12 to 0.42 (La Paro, Pianta, & Stuhlman, 2004; Pianta et al., 2005). Concurrent validity has also been demonstrated (e.g., Pakarinen et al., 2010).

Procedures

All observers obtained CLASS reliability certification through a 2-day training performed by a certified CLASS trainer and passed the CLASS reliability assessment. The CLASS test authors advised that the CLASS observation method of six 20-min cycles be modified to correspond to the CSS two 30-min observations and the CLASS administration procedures be used for fourth- and fifth-grade classrooms in this study.

Informed consent was obtained from all participating teachers. For each classroom the two observations were conducted within seven school days. One observer administered the CSS and a second observer administered the CLASS simultaneously. All CSS and CLASS completed forms were independently returned to the study coordinators within 24 hours. CSS and CLASS scores were averaged across the two observations.
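The data preparation described above (averaging each measure across the two observations, then correlating the averaged scores) can be sketched as follows. This is an illustrative Python sketch, not the study's analysis code. It assumes Pearson product-moment correlations, which the text does not state explicitly, and the numeric values are hypothetical, not study data.

```python
from math import sqrt


def average_observations(first, second):
    """Average each teacher's score across the two observations."""
    if len(first) != len(second):
        raise ValueError("both observations must cover the same teachers")
    return [(a + b) / 2 for a, b in zip(first, second)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for four teachers (NOT study data).
css_obs1 = [2.0, 0.0, 1.0, 3.0]    # e.g., a CSS discrepancy score, observation 1
css_obs2 = [4.0, 0.0, 1.0, 1.0]    # same score, observation 2
class_obs1 = [4.0, 6.0, 5.0, 5.0]  # e.g., a CLASS dimension rating, observation 1
class_obs2 = [4.0, 6.0, 6.0, 4.0]  # same dimension, observation 2

css_mean = average_observations(css_obs1, css_obs2)
class_mean = average_observations(class_obs1, class_obs2)
r = pearson_r(css_mean, class_mean)
```

In this made-up example the averaged CSS discrepancy scores and CLASS ratings are strongly negatively related, which is the direction hypothesized for the Stage 2 discrepancy scores.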

Results

CSS Stage 1—Frequency of Teacher Behavior

Table 3 displays the correlations between the CLASS domains and dimensions¹ and the CSS Stage 1 (classroom observation − frequency) Total, six teacher strategies (behaviors), and two difference scores (i.e., P − CF; P − [CF + CC + VC]) computed between sets of CSS Stage 1 teacher strategies. The CSS corrective feedback (frequency) score was negatively correlated with the CLASS emotional support domain and the classroom organization domain and its three corresponding dimensions (i.e., behavior management, productivity, and instructional learning formats). Consistent with this pattern, the CSS corrective feedback (frequency) score was positively correlated with the CLASS negative climate dimension. The CSS academic response opportunities (frequency) score was positively correlated with the CLASS classroom organization domain and its three corresponding dimensions (i.e., behavior management, productivity, and instructional learning formats), as well as the instructional support domain and two of its three corresponding dimensions (i.e., concept development and language modeling). The CSS vague commands (frequency) score was negatively correlated with the instructional support domain and two of its three corresponding dimensions (i.e., concept development and quality of feedback). Also, the CSS Total Scale (frequency) was positively correlated with the CLASS language modeling dimension. Clear one- to two-step commands, praise statements, and concept summaries did not correlate with CLASS domains and dimensions.

Table 3.

Correlations Between the CSS Stage 1 Teacher Behaviors and CLASS Domains and Dimensions.

	CSS Stage 1 (classroom observation − frequency scores)
CLASS domains and dimensions	Total score	Clear one- to two-step commands	Vague commands	Corrective feedback	Praise statements	Concept summaries	Academic response opportunities	P − CF^a	P − (CF + CC + VC)^b
Emotional support domain	0.00	−0.07	−0.01	−0.23**	0.09	0.04	0.07	0.22*	0.24**
Positive climate	0.05	−0.03	−0.04	−0.13	0.11	−0.08	0.13	0.18*	0.18*
Negative climate	0.01	−0.01	0.12	0.35***	−0.04	−0.17	−0.01	−0.24**	−0.24**
Teacher sensitivity	0.01	−0.03	0.06	−0.13	0.15	0.11	−0.04	0.22*	0.18*
Regard for students perspectives	−0.05	−0.12	0.02	−0.17	−0.03	0.00	0.07	0.07	0.14
Classroom organization domain	0.13	0.02	−0.04	−0.30***	0.93	0.09	0.26**	.26**	.22*
Behavior management	0.10	0.01	−0.06	−0.34***	0.30	0.14	0.23**	0.19*	0.19*
Productivity	0.08	0.01	0.00	−0.22*	0.05	0.00	0.20*	0.14	0.14
Instructional learning formats	0.17	0.03	−0.02	−0.18*	0.17	0.08	0.24**	0.20*	0.20*
Instructional support domain	0.14	−0.05	−0.22*	−0.07	−0.06	−0.05	0.34**	−0.02	0.09
Concept development	0.13	−0.02	−0.21*	−0.03	−0.11	−0.13	0.34***	0.02	0.02
Quality of feedback	0.01	−0.14	−0.23**	−0.15	−0.07	−0.06	0.17	0.20*	0.20*
Language modeling	0.20*	0.05	−0.05	0.00	0.06	0.11	0.27**	0.03	0.03

Note. ^aPraise − Corrective Feedback. ^bPraise − (Corrective Feedback + Clear Commands + Vague Commands).

*p < .05. **p < .01. ***p < .001.
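To relate the size of a correlation to the significance markers in the table, a correlation can be converted to a t statistic with n − 2 degrees of freedom. The Python sketch below shows this standard conversion. It assumes a two-tailed test and that all 125 teachers contributed to each correlation, neither of which is stated in the text; the function name is hypothetical.

```python
from math import sqrt


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation against zero.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# With n = 125 (df = 123), the two-tailed .05 critical value of t is
# approximately 1.98, so correlations of about .18 or larger in absolute
# value reach p < .05 under these assumptions.
t_018 = t_from_r(0.18, 125)  # roughly 2.03, above 1.98
t_017 = t_from_r(0.17, 125)  # roughly 1.91, below 1.98
```

Under these assumptions, a correlation of 0.18 falls just above the .05 threshold and 0.17 just below it, which agrees with where the single asterisks begin in the table.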

Based on the behavior management literature, two difference scores were computed between sets of CSS Stage 1 teacher strategies (i.e., P − CF and P − [CF + CC + VC]). P − CF represents the amount of praise compared to corrective feedback used by educators. The P − (CF + CC + VC) score represents the amount of praise compared to the amount of demands (i.e., corrective feedback, clear commands, or vague commands) placed on students. A positive score indicates that more praise is used relative to corrective feedback or demands. The two CSS difference scores yielded similar results. The CSS P − CF and P − (CF + CC + VC) scores were positively correlated with the (a) CLASS emotional support domain and the dimensions of positive climate and teacher sensitivity, (b) CLASS classroom organization domain and the dimensions of behavior management and instructional learning formats, and (c) CLASS dimension of quality of feedback. Both difference scores were negatively correlated with the dimension of negative climate.
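The two Stage 1 difference scores follow directly from the tallied frequencies. A minimal Python sketch, with a hypothetical function name and made-up tallies, is:

```python
def praise_difference_scores(praise, corrective_feedback,
                             clear_commands, vague_commands):
    """Return (P - CF, P - (CF + CC + VC)) from Stage 1 frequency tallies."""
    p_minus_cf = praise - corrective_feedback
    p_minus_demands = praise - (
        corrective_feedback + clear_commands + vague_commands
    )
    return p_minus_cf, p_minus_demands


# Hypothetical tallies for one teacher (not study data):
# 10 praise statements, 4 corrective feedback, 3 clear and 1 vague command.
p_cf, p_demands = praise_difference_scores(10, 4, 3, 1)
```

With these tallies, P − CF is 6 and P − (CF + CC + VC) is 2; both are positive, indicating more praise than corrective feedback or demands.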

CSS Stage 2—Strategy Rating Scale Discrepancy Scores

Tables 4 and 5 display the correlations between the CLASS domains and dimensions and the CSS Stage 2 PIS and BMS discrepancy scores (computed as ideal frequency ratings − frequency ratings). The CSS discrepancy scores reflect differences between teachers’ usage of strategies (frequency ratings) and ideal usage of strategies (ideal frequency ratings) in the context of lessons. The larger the CSS discrepancy score, the greater the need for a teacher to change his/her instructional and behavioral management practices.

Table 4.

Correlations between the CSS Stage 2 Positive Instructional Strategy Scale Discrepancy Scores^a and the CLASS Domains and Dimensions.

	CSS Stage 2 PIS scales
CLASS domains and dimensions	Total Scale score	Instructional methods composite	Student focus learning and engagement	Instructional delivery	Academic monitor and feedback composite	Promotes student thinking	Academic feedback
Emotional support domain	−0.26**	−0.20*	−0.22**	−0.15	−0.30***	−0.29***	−0.24**
Positive climate	−0.26**	−0.21*	−0.19*	−0.19*	−0.29***	−0.25**	−0.27**
Negative climate	0.17	0.15	0.14	0.12	0.18*	0.18*	0.14
Teacher sensitivity	−0.17	−0.11	−0.13	−0.08	−0.23**	−0.18	−0.24**
Regard for students perspectives	−0.16	−0.13	−0.18*	−0.06	−0.17	−0.22**	−0.06
Classroom organization domain	−0.30***	−0.28***	−0.30***	−0.21*	−0.29***	−0.29***	−0.23**
Behavior management	−0.28**	−0.27**	−0.30***	−0.19*	−0.25**	−0.27**	−0.18
Productivity	−0.19*	−0.18*	−0.21*	−0.12	−0.19*	−0.20*	−0.12
Instructional learning formats	−0.29***	−0.25**	−0.24**	−0.21*	−0.30***	−0.24**	−0.30***
Instructional support domain	−0.33***	−0.31***	−0.31***	−0.25**	−0.33***	−0.34***	−0.23**
Concept development	−0.33***	−0.32***	−0.30***	−0.27**	−0.32***	−0.33***	−0.23**
Quality of feedback	−0.28***	−0.25***	−0.26**	−0.19*	−0.30***	−0.30***	−0.22**
Language modeling	−0.16	−0.15	−0.17	−0.11	−0.15	−0.17	−0.10

Note. ^aDenotes (ideal frequency rating scores − frequency rating scores).

*p < .05. **p < .01. ***p < .001.

Table 5.

Correlations Between the CSS Stage 2 Behavior Management Strategies Scale Discrepancy Scores^a and the CLASS Domains and Dimensions.

	CSS Stage 2 BMS scales
CLASS domains and dimensions	Total Scale score	Proactive methods composite	Prevention management	Directives/ transitions	Behavioral feedback composite	Praise	Corrective feedback
Emotional support domain	−0.28**	−0.18*	−0.22*	−0.12	−0.33***	−0.34***	−0.22*
Positive climate	−0.34***	−0.27**	−0.28**	−0.21*	−0.34***	−0.36***	−0.20*
Negative climate	0.12	0.11	0.11	0.09	0.10	0.11	0.05
Teacher sensitivity	−0.22*	−0.11	−0.14	−0.06	−0.30**	−0.31**	−0.19*
Regard for students perspectives	−0.12	−0.05	−0.09	0.00	−0.18*	−0.16	−0.14
Classroom organization domain	−0.33***	−0.29**	−0.25**	−0.26**	−0.30**	−0.24**	−0.29**
Behavior management	−0.29**	−0.28**	−0.22*	−0.27**	−0.25*	−0.17	−0.27**
Productivity	−0.25**	−0.23*	−0.20*	−0.21*	−0.22*	−0.13	−0.26**
Instructional learning formats	−0.28**	−0.20*	−0.20*	−0.16	−0.30**	−0.29**	−0.21*
Instructional support domain	−0.27**	−0.32**	−0.35***	−0.24**	−0.18*	−0.10	−0.24**
Concept development	−0.26*	−0.33***	−0.34***	−0.27**	−0.15	−0.04	−0.23**
Quality of feedback	−0.25**	−0.25**	−0.31**	−0.17	−0.21*	−0.14	−0.24**
Language modeling	−0.13	−0.15	−0.17	−0.11	−0.08	−0.06	−0.08

Note. ^a Denotes (ideal frequency rating scores − frequency rating scores).

*p < .05. **p < .01. ***p < .001.

As shown in Table 4, all of the CSS PIS scales negatively correlated with the CLASS classroom organization and instructional support domains and with the dimensions of positive climate, instructional learning formats, concept development, and quality of feedback. With the exception of the CSS instructional delivery subscale, all other CSS PIS scales negatively correlated with the CLASS emotional support domain. All of the CSS PIS scales, with the exception of the academic feedback subscale, negatively correlated with the CLASS dimension of behavior management. The CSS instructional delivery and academic feedback subscales did not correlate with the CLASS productivity dimension, while all other CSS PIS scales negatively correlated with this dimension. The CSS subscales student focus learning and engagement and promotes student thinking negatively correlated with the CLASS regard for students perspectives dimension. The CSS academic monitoring and feedback composite, along with its associated subscale academic feedback, negatively correlated with the CLASS teacher sensitivity dimension. The CSS academic monitoring and feedback composite, along with its associated subscale of promotes student thinking, positively correlated with the CLASS negative climate dimension. No correlations were found between the CSS PIS discrepancy scale scores and the CLASS language modeling dimension.

As shown in Table 5, all of the statistically significant CSS BMS scales were negatively correlated with the CLASS domains and dimensions. All of the CSS BMS scales negatively correlated with the CLASS positive climate dimension and classroom organization domain. All of the CSS BMS scales, except for the praise subscale, negatively correlated with the CLASS behavior management and productivity dimensions, as well as the instructional support domain. All of the CSS BMS scales correlated with the CLASS emotional support domain and instructional learning formats dimension with the exception of the directives/transitions subscale. The CSS BMS Total Scale, proactive methods composite and associated subscales (prevention management and directives/transitions subscales), and the corrective feedback subscale correlated with the CLASS concept development dimension. The CSS BMS Total Scale, the behavioral feedback composite score and its associated corrective feedback subscale, along with proactive methods composite and associated prevention management subscale correlated with the CLASS quality of feedback dimension. The CSS BMS Total Scale, along with the behavioral feedback composite and associated subscales, negatively correlated with the CLASS teacher sensitivity dimension. The CSS BMS scales and the CLASS negative climate and language modeling dimensions were not correlated.

Discussion

This investigation examined the concurrent and discriminant validity of the CSS Observer Form in comparison to the CLASS. Overall, the CSS evidenced acceptable concurrent and discriminant validity. Results are discussed in light of the unique conceptual and methodological differences between the measures.

Both the CSS and CLASS are observational measures that assess important and complementary elements of effective classrooms. The CLASS measures global classroom quality via student–teacher interactions, while the CSS measures educators’ use of specific instructional and behavioral management strategies in learning contexts. The CLASS was designed for large-scale research and is often administered by nonschool personnel. In contrast, the CSS was designed for use by school personnel in routine educational practice (i.e., repeated use for teacher progress monitoring).

The CLASS and CSS also differ in the level of score inference. As noted by the CLASS test authors, “Because of the highly inferential nature of the CLASS, scores should never be given without referring to the manual” (Pianta et al., 2008, p. 17). CLASS observers are asked to infer the quality of student–teacher interactions by considering several markers (e.g., flexibility and student focus, support for autonomy, and leadership), each of which includes several indicators (e.g., incorporates student’s ideas), before providing an overall quality rating on a 7-point Likert-type scale. In contrast, CSS observers are asked to tally the frequency of six teacher strategies and then complete the Stage 2 PIS and BMS Strategy Rating Scales (frequency of strategies used and the ideal frequency of strategies used) and the Stage 3 classroom checklist. Results in this investigation are consistent with previous CLASS concurrent validity studies (e.g., La Paro et al., 2004; Pianta et al., 2005). Pianta et al. (2005) noted that the disparity between quantity-related measures (Snapshot) and quality-related measures (CLASS) may yield weaker correlation coefficients (concurrent validity).

Findings highlight the relationship between measures of classroom quality and teachers’ use of evidence-based instructional and behavioral management strategies. Importantly, the directional nature of the CSS and CLASS correlation coefficients underscores the ability of the measures to tap aspects of classroom quality and classroom practices both independently and collectively. For example, the CSS Stage 1 corrective feedback (frequency) negatively correlated with the CLASS emotional support domain and with the classroom organization domain and its three associated dimensions (Table 3). Similarly, corrective feedback positively correlated with the CLASS negative climate dimension. The CSS’s academic response opportunities (frequency) positively correlated with the CLASS classroom organization domain and its three associated dimensions, as well as with the CLASS instructional support domain and the dimensions of concept development and language modeling. As measured by the CSS, greater use of academic response opportunities may (a) enhance verbal communication between the teacher and students, which may increase CLASS language modeling dimension scores, and (b) develop concepts, which may increase concept development dimension scores. Research has found that educators’ increased use of opportunities to respond (i.e., academic response opportunities) in general improves on-task behavior, instructional time efficiency, and academic outcomes and reduces disruptive behavior (e.g., Partin, Robertson, Maggin, Oliver, & Wehby, 2010; Sutherland et al., 2003). These outcomes are associated with the CLASS classroom organization domain and the behavior management, productivity, and instructional learning formats dimensions.

As shown in Tables 4 and 5, high-quality classrooms use both instructional and behavioral management strategies. The CSS Stage 2 PIS and BMS discrepancy scores resulted in several statistically significant correlations with the CLASS domain and dimension scores. As indicated, the CSS discrepancy scores reflect differences between teachers’ usage of strategies (frequency ratings) and ideal usage of strategies (ideal frequency ratings) in the context of lessons. The larger the CSS discrepancy score, the greater the need for a teacher to change his/her instructional and behavioral management practices. Overall, the CSS PIS and BMS discrepancy scores and the CLASS domain and dimension scores yielded negative correlations, with the exception of the CLASS negative climate dimension, suggesting that as teachers’ need for change in classroom practices increased, classroom quality as measured on the CLASS domains and dimensions decreased. These results have potential implications for tailoring supports and interventions for teachers’ best classroom practices that promote student learning outcomes.
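The logic of the discrepancy scores described above can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not taken from the CSS manual: the discrepancy is computed as the sum of absolute item-level differences between frequency and ideal frequency ratings, the ratings use a 1 to 7 scale, and the four teachers and their quality ratings are invented. The function names `discrepancy_score` and `pearson_r` are hypothetical helpers.

```python
from math import sqrt


def discrepancy_score(frequency, ideal):
    """Sum of absolute item-level gaps between observed and ideal
    frequency ratings (assumed scoring rule, for illustration only)."""
    if len(frequency) != len(ideal):
        raise ValueError("rating lists must have the same length")
    return sum(abs(f - i) for f, i in zip(frequency, ideal))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data: four teachers, three strategy items each, plus a
# hypothetical global quality rating for the same lesson.
teachers = [
    {"frequency": [6, 7, 6], "ideal": [7, 7, 7], "quality": 6.0},
    {"frequency": [4, 5, 5], "ideal": [7, 6, 7], "quality": 5.0},
    {"frequency": [3, 4, 2], "ideal": [7, 7, 6], "quality": 3.5},
    {"frequency": [2, 2, 3], "ideal": [7, 7, 7], "quality": 2.5},
]

discrepancies = [discrepancy_score(t["frequency"], t["ideal"]) for t in teachers]
quality = [t["quality"] for t in teachers]
r = pearson_r(discrepancies, quality)

print(discrepancies)  # larger values indicate a greater need for change
print(r < 0)          # a negative r mirrors the direction reported above
```

In this invented data set, teachers with larger discrepancy scores receive lower quality ratings, so the correlation is negative. This mirrors only the direction of the pattern reported in Tables 4 and 5, not its magnitude.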

Strengths and Limitations

Study strengths are (a) inclusion of a stratified multistate sample of general education teachers and (b) comparison of a new measure to a widely researched classroom observation system of global classroom quality (the CLASS). Potential limitations for generalizability of findings are (a) teachers were predominantly female Caucasians from New Jersey and New York and (b) results may not be generalizable to other states and populations, early education/middle/high school settings, or special education settings. Importantly, these limitations would apply equally to the CLASS and CSS measures, so they do not appear to negatively impact the primary question of concurrent and discriminant validity.

It is important to note that the measures used in this investigation represent operationalized, objective, psychometrically sound tools for documenting teacher behavior. These can be contrasted with narrative or unstructured observations that principals may use during walk-throughs or observations. The CSS requires approximately 60 min of observer time per teacher, which is comparable to observation time in routine educational practice.

Conclusion

This investigation provides initial evidence of the concurrent and discriminant validity of the CSS Observer Form. As hypothesized, findings revealed that the CSS exhibits acceptable but modest concurrent validity with the CLASS. Results suggest that the CSS measures constructs similar and complementary to those assessed by an observational measure that taps global classroom climate of teacher–student interactions. Findings highlight the unique features of each measure and how each measure may aid in the assessment process of teachers.

While the process of validation of any new measure is an ongoing endeavor, this study offers initial support for the utility of the CSS as a measure of general education teacher strategies for educational practice. Like all assessments, the CSS has particular strengths and weaknesses. Yet the CSS offers school personnel a promising tool for assessing teacher practices that is practical, reliable, and valid.

Footnotes

Authors’ Note

The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A080337 to Rutgers University.

References

Bales, B. L. (2006). Teacher education policies in the United States: The accountability shift since 1980. Teaching and Teacher Education, 22, 395-407.

Connor, C. M., Son, S. H., Hindman, & Morrison, F. J. (2005). Teacher qualifications, classroom practices, and family characteristics: Complex effects on first graders’ language and early reading. Journal of School Psychology, 43, 343-375.

Coutinho, Wiemer-Hastings, Skowronski, J. J., & Britt, M. A. (2005). Metacognition, need for cognition and use of explanations during ongoing learning and problem solving. Learning and Individual Differences, 15, 321-337.

Fletcher, J. M., Lyon, G. R., Fuchs, L. S., & Barnes, M. A. (2007). Learning disabilities: From identification to intervention. New York, NY: The Guilford Press.

Gable, R. A., Hester, P. H., Rock, M. L., & Hughes, K. G. (2009). Back to basics: Rules, praise, ignoring and reprimands revisited. Intervention in School and Clinic, 44, 195-205.

Huberty, C. J. (1984). Issues in the use and interpretation of discriminant analysis. Psychological Bulletin, 95, 156-171.

La Paro, K. M., Pianta, R. C., & Stuhlman (2004). The Classroom Assessment Scoring System: Findings from the prekindergarten year. The Elementary School Journal, 104, 409-426.

Malone, L. M., Cabili, Henderson, Esposito, A. M., Coolahan, Henke, & Boller (2010). Compendium of student, teacher, and classroom measures used in NCEE evaluations of educational interventions. Volume II. Technical details, measure profiles, and glossary (Appendices A–G) (NCEE 2010-4013). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

Marzano, R. J. (1998). A theory-based meta-analysis of research in instruction (Tech. Rep.). Aurora, CO: Mid-continent Regional Educational Laboratory. (ERIC Document Reproduction Service No. ED 427 087).

Marzano, R. J., Pickering, D. J., & Pollock, J. E. (2001). Classroom instruction that works: Research-based strategies for increasing student achievement. Alexandria, VA: Association for Supervision and Curriculum Development.

Matthews, M. R. (2000). ‘Editorial’ of the monographic issue on constructivism, epistemology and the learning of science. Science & Education, 9, 491-505.

No Child Left Behind Act, 20 U.S.C. § 6301 et seq. (2001).

Pakarinen, Lerkkanen, Poikkeus, Kiuru, Siekkinen, Rasku-Puttonen, & Nurmi (2010). Validation of the Classroom Assessment Scoring System in Finnish kindergartens. Early Education and Development, 21, 95-124.

Partin, Robertson, Maggin, Oliver, & Wehby (2010). Using teacher praise and opportunities to respond to promote appropriate student behavior. Preventing School Failure, 54, 172-178.

Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System [CLASS] Manual: Pre-K. Baltimore, MD: Brookes Publishing.

Pianta, R. C., Howes, Burchinal, Bryant, Clifford, Early, & Barbarin (2005). Features of pre-kindergarten programs, classrooms, and teachers: Do they predict observed classroom quality and child-teacher interactions? Applied Developmental Science, 9(3), 144-159.

Reddy & Fabiano (2012). Assessment of general education teachers’ tier classroom practices: Current science and practice. Manuscript submitted for publication.

Reddy, Fabiano, Barbarasch, & Dudek (2012). Behavior management of students with Attention-Deficit/Hyperactivity Disorders using teacher and student progress monitoring. In L. M. Crothers & J. B. Kolbert (Eds.), Understanding and managing behaviors of children with psychological disorders: A reference for classroom teachers. New York, NY: Continuum International Publishing Group, Inc.

Ritchie, Howes, Kraft-Sayre, & Weiser (2001). Emerging academic snapshot. Unpublished measure, University of California at Los Angeles.

Sugai & Horner, R. H. (2002). The evolution of discipline practices: School-wide positive behavior supports. Child and Family Behavior Therapy, 24, 23-50.

Sutherland, K. S., Adler, & Gunter, P. L. (2003). The effect of varying rates of opportunities to respond to academic requests on the classroom behavior of students with EBD. Journal of Emotional and Behavioral Disorders, 11, 239-248.

Tomlinson, C. A., & Edison (2003). Differentiation in practice: A resource guide for differentiating curriculum, grades K-5. Alexandria, VA: Association for Supervision and Curriculum Development.
