Abstract
Teacher reports on school organizational functioning, curricular processes, and student engagement are a reliable means of ascertaining valuable information about classroom climate and learning outcomes. Yet, to date, the vast majority of quantitative teacher-reported data, where teachers themselves reach judgments about educational processes, have been summary rather than lesson specific, where teachers evaluate classroom experiences at the moment of instruction. In this study, we examine how lesson-specific teacher survey reports generate insight into the relationship between student engagement and instruction. Results suggest that this underutilized design has significant application for procuring data on within-teacher variability in practice, especially in studies focused on student engagement, active change in teacher practice, and/or teacher buy-in as a mediator of outcomes. Ultimately, we argue that lesson-specific teacher reports may be a valuable tool for researchers in measuring instructional change.
The critical call for an increase in participatory orientations toward research in educational effectiveness and school improvement is mounting as teacher perspectives continue to go unmeasured (Brydon-Miller, 2001; Hall, 1981; Horton, 1993; Kindon et al., 2007). Participatory research designs, wherein teachers and school leaders help shape and inform the research process, seek to balance inferential rigor with responsiveness to community knowledge (Viswanathan et al., 2004). A small but expanding body of literature has recognized the value and need for teacher knowledge in educational research and has begun to examine the relationship between teacher perceptions of their classroom and classroom instructional practices (Gutiérrez-Clellen & Kreiter, 2003; Richardson et al., 1991; Wolraich et al., 1998). Many of these studies have found teacher self-reports to be a valid and reliable way to measure instructional outcomes (Desimone et al., 2010; Mullens, 1998; Porter, 1993). Yet such approaches have been slow to permeate large-scale quantitative research. The present study illustrates how teachers can provide insight about the relationship between instruction and student engagement by way of repeated teacher survey reports, establishing the feasibility of capturing standardized teacher input in quantitative studies of instruction.
Teacher survey reports of school organizational functioning, curricular processes, and student outcomes are long-standing components of large-scale, high-profile longitudinal studies, such as those conducted by the National Center for Education Statistics, the National Institute of Child Health and Human Development, and other organizations. Yet, to date, the vast majority of quantitative teacher-reported data, where teachers themselves reach judgments about educational processes, have been summary, with responses capturing practices across a semester or more, rather than lesson specific, with teachers evaluating classroom experiences at the moment of instruction (see, e.g., Caprara et al., 2006; Ross et al., 1996). This type of summary data collection often neglects to capture lesson-specific “moment-in-time” data on the grounds that such data are highly variable across any given school year (Opfer et al., 2011; Rogosa et al., 1984) and subject to inaccuracies in teacher reporting (Kaufman et al., 2016; Smithson, 1994). We argue, in contrast, that repeated lesson-specific teacher reports of student engagement offer an underutilized analytic tool in the study of curriculum and pedagogy with the potential to provide distinct and useful information.
Teachers have considerable sensitivity in appraising daily classroom conditions and outcomes. In this essay, we begin by exploring how “teacher noticing” about their classrooms and another widely studied construct, teacher efficacy, have been researched systematically. Studies of teacher efficacy reveal profound within-teacher differences in experience across class periods and days. We then describe lesson-specific research on student engagement, which offers a model of teacher reports that might be used to study a variety of instructional innovations (Shernoff et al., 2016). Finally, we briefly report the results of the present study, which used such teacher reports to demonstrate an association between instruction and student engagement. The subsequent discussion outlines the limitations and possibilities of lesson-specific teacher reports in education research.
Teacher reports will never be sufficient stand-alone evidence of the effectiveness of many educational reforms because changes in student engagement and learning are not fully observable, and teacher reports are too inherently imprecise to detect small but meaningful change. Yet teacher reports can say a great deal about the nature of implementation—revealing when teachers themselves notice an improvement in the classroom learning experience. In research where instructional innovation is expected to have a proximal effect on engagement, but where achievement is affected significantly by out-of-school mechanisms, teacher reports may be more revealing than administrative or other achievement data. Moreover, technological innovation has made lesson-specific teacher reports an especially efficient way for researchers to document experiential outcomes in the classroom.
Teacher Noticing and Teacher-Reported Data
Noticing is a central practice in the work of teaching and a well-established component of teaching expertise (National Research Council, 2001). Teachers notice (observe, realize, and attend to) student behavior, student learning, student relationships, student thinking, pedagogies, and curricular design with high degrees of accuracy (Sherin et al., 2011). Studies aimed at assessing teacher-noticing skills have been so compelling and replicable in their findings that the field has largely shifted in focus to study more minute dimensions of noticing in teaching. For example, there is evidence to suggest that as teachers move from novice to more expert, they refine their noticing skills and apply richer schemata to their appraisals of the classroom. In this way they are more able to focus their noticing on the aspects of the classroom climate that matter most (Berliner, 2001; Santagata, 2011).
There is also evidence to suggest that we may be able to learn about teacher epistemologies and develop new approaches for modeling teacher cognition through exploring teacher noticing (Russ & Luna, 2013). While teacher noticing stands as one area of teaching expertise, it must be applied to the interpretation of student understandings and development of appropriate instructional strategies to be useful (Hill & Chin, 2018). It is through the interplay of teacher attending, analyzing, and responding that favorable instructional outcomes are produced (Barnhart & van Es, 2015). The overarching consistency and broad application observed in this area of research suggest that the study of teacher noticing may prove a useful access point for future transformative research (Sherin et al., 2011).
Systematic Studies of “Within-Teacher” Variability in Teacher-Reported Data
Teacher-reported data can be used to study within-teacher variations in educational experience. Perhaps the best example of research documenting the dramatic differences that can occur “within teachers” in the quality of the learning experience is in the literature on teacher efficacy. Much as student self-efficacy affects engagement and learning, teacher perceptions of their own professional efficacy affect teacher effort and outcomes in the classroom and even longer term career outcomes (Lee et al., 1991; Malmberg et al., 2014; Raudenbush et al., 1992; Riehl & Sipple, 1996; Rosenholtz, 1989). Teacher efficacy appears to align especially closely with insights from the research literature on curriculum tracking. Raudenbush et al. (1992) used multilevel models to explore differences in efficacy and satisfaction across the course of a teacher’s day as they moved from class to class, finding that within a teacher’s daily schedule, the difference between an academic and nonacademic course was associated with almost a full standard deviation difference in teacher efficacy.
To establish such relationships, researchers need to collect repeated measures from teachers across multiple days or class periods. Such repeated measures that allow for within-person models are considered highly desirable in quantitative social science research because they help remove selection bias associated with individuals (see, e.g., Dee & West, 2011; Guo & VanWey, 1999; Hamaker, 2012; Kelly & Carbonaro, 2012). Yet, repeated teacher reports, of efficacy or any other lesson-specific outcome, are quite rare in the educational sciences. As one example, consider the 2009–2011 Measures of Effective Teaching Study, which was specifically designed to study instructional processes and designed by researchers well-versed in the utility of within-person, “fixed-effects” models. In this study, multiple video-taped observations were collected from each teacher, which were in turn coded by multiple raters, while students completed surveys on the quality of the learning environment. The breadth and quality of the measures of instruction in the Measures of Effective Teaching data are nearly unprecedented . . . but the teachers themselves were never asked to report whether class sessions were successful or any other outcome. In other research, teachers have been asked to report on various features of the learning environment or instruction, but usually in a summative way referring to an extended period of time (e.g., Ponitz et al., 2009).
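The logic of a within-person model can be sketched with a few lines of code. The example below is only an illustration under stated assumptions: the lesson records are hypothetical, the outcome is treated as numeric, and a simple least-squares slope stands in for whatever estimator a given study would use. It shows how subtracting each teacher's own mean removes stable between-teacher differences that would otherwise distort a pooled estimate.

```python
from collections import defaultdict

# Hypothetical lesson records: (teacher_id, instructional_measure, outcome).
# Teacher B has a much higher baseline outcome but lower instructional scores.
LESSONS = [
    ("A", 0.4, 0.8),
    ("A", 0.6, 1.2),
    ("B", 0.1, 5.2),
    ("B", 0.3, 5.6),
]


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def demean_within_teacher(lessons):
    """Subtract each teacher's own mean from that teacher's x and y values."""
    by_teacher = defaultdict(list)
    for teacher, x, y in lessons:
        by_teacher[teacher].append((x, y))
    xs, ys = [], []
    for rows in by_teacher.values():
        mean_x = sum(x for x, _ in rows) / len(rows)
        mean_y = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            xs.append(x - mean_x)
            ys.append(y - mean_y)
    return xs, ys


# Pooled slope mixes within- and between-teacher variation.
pooled = slope([x for _, x, _ in LESSONS], [y for _, _, y in LESSONS])
# Within-teacher slope uses only lesson-to-lesson variation for each teacher.
within = slope(*demean_within_teacher(LESSONS))
print(f"pooled slope: {pooled:.2f}, within-teacher slope: {within:.2f}")
```

In these made-up data the pooled slope is negative because the teacher with the higher baseline happens to have lower instructional scores, while the within-teacher slope recovers the positive lesson-to-lesson relationship built into the records.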
“Lesson-Specific” Measures of Instruction
Lesson-specific measures of instruction using teacher-reported data are found primarily in fields such as special education or in studies of curricular (often literacy related) interventions. For example, in a study of contrasting instructional conditions (worksheets vs. iPads), Haydon et al. (2012) investigated math fluency and academic engagement, with teachers reporting a lesson-specific preference for the use of iPads. In another example, Kern et al. (2002) used daily teacher logs to document the effects of curricular modification on student engagement and student behavior in a self-contained science class. Lesaux et al. (2010) used teacher implementation logs to collect daily information on the strengths and weaknesses of an academic vocabulary intervention. However, in much research that utilizes repeated observations of classroom instruction, the individual with the closest knowledge of the lesson goals and students’ prior knowledge base, the teacher herself, rarely provides a corresponding lesson-specific estimate of success.
Lesson-Specific Measures of Student Engagement as a Model for Teacher Reports
The experience sampling method (ESM) approach to studying student engagement offers a model for structuring teacher reports on lesson outcomes. Here, a lesson can be a traditional classroom period, a more specific, discrete instructional segment within a period, or alternatively, a unit of analysis that spans multiple class periods (e.g., a thematic unit). In ESM studies, students are prompted (often randomly) during class to complete a brief report on various dimensions of engagement. To date, ESM research has been paired with a flow-theory-based emphasis on three engagement-related constructs: interest, concentration, and enjoyment (Csikszentmihalyi & Schneider, 2000; Shernoff et al., 2003; Shernoff et al., 2016; Shernoff & Schmidt, 2008). Such reports can be used to show how engagement changes in response to instructional process for the same teacher and even for the same student (adjusting out a student’s average, baseline level of engagement). ESM reports can also be used to show common aggregate differences in subdomains of engagement across different subjects, and even to explore engagement in various out-of-school contexts and activities (Shernoff, 2013).
Conceptual Framework
Conceptually, the ESM method focuses on how features of instruction, the immediate learning environment, and the match between the learner and his or her environment affect engagement “at the moment of instruction.” Thus, we view the ESM method as more closely aligned with models of engagement focusing on the experience of learning (see discussion in Bempechat & Shernoff, 2012) than to models of engagement that emphasize belonging and school identification writ large (e.g., Finn, 1989; Voelkl, 1997). Referencing Newmann et al.’s (1992) classic model of engagement, we might, oversimplifying somewhat, argue that the ESM approach is better tailored to studying the effects of authentic work than the effects of school membership.
Even taking its conceptual frame as given, the ESM measurement approach is not perfect; it suffers from all the typical survey challenges (Desimone & Le Floch, 2004; Tourangeau, 1984), as well as some specific to the study of engagement. As a self-report measure from students, ESM readily captures the affective and phenomenological aspects of engagement, but the opposite is true for teachers, who observe students’ effortful behavior but can only infer interest, enjoyment, and so on. There is even some conceptual lack of clarity in this line of research; it is not clear whether the measured subdomains of interest, concentration, and enjoyment reflect a student’s underlying engagement or collectively constitute engagement. Additionally, when used in class, there is a certain amount of disruption involved in stopping to provide a self-report of one’s engagement. Finally, adolescents might be mischievous in responding (Robinson-Cimpian, 2014).
An ESM-style system can be readily adapted for use by teachers. Teachers, active in evaluating student engagement (Harris, 2011; Valli, 1997), can provide an estimate of student engagement that is grounded in effortful behavior, as opposed to students’ affective estimates of their own engagement. Furthermore, class is very rarely disrupted by a teacher version of this report, because teachers can complete short surveys between periods or later in the day instead of during class, which addresses pragmatic considerations. The fundamental sources of error in survey responses will still be present (e.g., teacher-to-teacher differences in interpretations of response categories), but we might also expect much less mischievous responding.
Using Teacher Reports to Investigate the Relationship Between Instruction and Engagement: An Example From English Language Arts Discourse Research
Results from a study of high school English instruction in Western Pennsylvania in 2018 demonstrate a potential relationship between instruction and engagement through teacher-reported data. The study was part of a program of research to develop automated methods of teacher observation (see, e.g., Kelly et al., 2018; Stone et al., 2019) and uses the data from 16 teachers featured in Jensen et al. (2020). Observational measures of instruction were collected focusing on teachers’ discourse practices along with teacher reports of student engagement during each lesson. Teachers were instructed to record entire class periods (which yielded audio files averaging about 43 minutes in length) and were further directed to record on days that would feature substantial quantities of whole class instruction. Teachers recorded and uploaded audio data, which were subsequently coded by expert human raters as well as machine-learning algorithms. Teachers were then prompted to complete an online survey based on the featured lesson/class period. Are features of teacher discourse, including use of authentic questions and other dialogic discourse practices (Nystrand & Gamoran, 1997), associated with student engagement?
Teacher Report Items
Three items assessing the level of engagement from ESM studies were adapted for teacher reports.
Considering the materials and activities used in class today, how much interest did students exhibit in the lesson?
Considering today’s activities, how hard were students concentrating on their lesson tasks?
Considering today’s activities, how much did students seem to be enjoying class?
Each of these questions was followed by four response options ranging from “hardly any” to “a great deal,” for example: “Students exhibited hardly any concentration,” “Were only somewhat concentrating,” “Were mostly concentrating well,” and “Students exhibited a great deal of concentration.”
We assessed relationships between discourse features and engagement outcomes using ordered-logit regression models. Compared with linear regression models, ordered-logit regression models have the advantage of relaxing the interval-scale assumption of our dependent measures (Long, 1997). Given the relatively small sample size (74–114 lesson-specific teacher reports), we report several alternative estimates for standard errors in the tests of statistical significance. Table 1 reports the association between these engagement items and several discourse features, including teacher use of authentic questions and high cognitive level questions that require a generalization, comparison, or other analysis as opposed to simple recall/reports of information. We evaluated the association for two kinds of reports: all teacher reports regardless of the amount of intervening time between the lesson and teacher completing the survey, and only same day reports.
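As a rough illustration of the model form (not the estimation code used in the study), a proportional-odds ordered logit maps a predictor, such as the prevalence of authentic questions, onto probabilities for the four response options through one coefficient and three cutpoints. The coefficient and cutpoint values below are hypothetical placeholders chosen only so the sketch runs; they are not estimates from Table 1.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x: float, beta: float, cutpoints: Sequence[float]) -> List[float]:
    """Category probabilities under a proportional-odds ordered logit.

    P(Y <= k | x) = logistic(cutpoints[k] - beta * x); each category
    probability is the difference between adjacent cumulative values.
    """
    cumulative = [logistic(c - beta * x) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical values, not estimates from the study.
BETA = 1.3
CUTPOINTS = [-2.0, 0.0, 1.6]  # three cutpoints separate four ordered responses

low = category_probs(0.2, BETA, CUTPOINTS)   # prevalence rate of 0.2
high = category_probs(0.4, BETA, CUTPOINTS)  # prevalence rate of 0.4
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

Because only the ordering of the response options enters the model, no assumption is made that the distance between "hardly any" and "somewhat" equals the distance between "mostly" and "a great deal."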
Table 1. Association Between Observations of Discourse Features and Teacher-Reported Dimensions of Engagement: Ordered Logit Coefficients With Alternative Standard Errors
Note. Descriptive statistics (sample size, mean, and standard error) followed by ordered logit regression coefficients. Standard errors: (conventional), {corrected for teacher clustering}, [bootstrapped]. Only conventional standard errors reported for null/negative results.
*p < .05. **p < .01. ***p < .001.
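One way to think about alternative standard errors with repeated reports from the same teachers is a bootstrap that resamples whole teachers, so that each teacher's lessons stay together. The sketch below is an assumption-laden illustration rather than the procedure used for Table 1: the source does not say whether the bootstrapped standard errors resampled teachers or lessons, the ratings are hypothetical, and a simple mean rating stands in for an ordered-logit coefficient.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical (teacher_id, engagement rating on the 1-4 scale) records.
RATINGS = [
    ("T1", 2), ("T1", 3), ("T1", 3),
    ("T2", 4), ("T2", 4), ("T2", 3),
    ("T3", 1), ("T3", 2), ("T3", 2),
]


def mean_rating(values):
    """Stand-in statistic; a real analysis would refit the model here."""
    return sum(values) / len(values)


def cluster_bootstrap_se(records, statistic, n_boot=500, seed=0):
    """Bootstrap standard error that resamples whole teachers.

    records: list of (teacher_id, value) pairs.
    statistic: function mapping a list of values to a number.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_teacher = defaultdict(list)
    for teacher, value in records:
        by_teacher[teacher].append(value)
    teachers = sorted(by_teacher)
    replicates = []
    for _ in range(n_boot):
        sample = []
        # Draw teachers with replacement and keep all of their lessons.
        for teacher in rng.choices(teachers, k=len(teachers)):
            sample.extend(by_teacher[teacher])
        replicates.append(statistic(sample))
    return statistics.stdev(replicates)


se_mean = cluster_bootstrap_se(RATINGS, mean_rating)
print(f"teacher-clustered bootstrap SE of the mean rating: {se_mean:.3f}")
```

Resampling at the teacher level reflects the fact that lessons from the same teacher are not independent observations, which is also the motivation for cluster-corrected standard errors.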
We find a statistically significant relationship between authenticity and cognitive level of questions and teacher reports of student interest, concentration, and enjoyment of the lesson. The ordered-logit coefficients ranged from 1.21 to 1.57 for authenticity and from 2.37 to 3.73 for cognitive level. To translate these to a more meaningful effect size, a change in the prevalence rate of authentic questions from 0.2 to 0.4 (a doubling) increases the predicted probability of a “great deal of . . .” rating in teacher reported student interest from 0.205 to 0.251. For high cognitive level, an increase from a prevalence rate of 0.2 to 0.4 increases the probability as above from 0.223 to 0.369. Thus, the estimated coefficients for question authenticity correspond to relatively small increases in the probability of positive engagement ratings as the prevalence of authenticity increases, while the effect is more substantial for high cognitive level utterances.
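The translation from coefficients to probabilities can be checked with simple arithmetic. In a proportional-odds model, the log-odds of the top response category change by the coefficient times the change in the predictor, so the coefficient implied by two predicted probabilities can be back-solved. The sketch below assumes the reported probabilities come from a model in which the discourse feature is the only predictor that changes; the function names are illustrative, not taken from the study.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def logistic(z: float) -> float:
    """Inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-z))


def shifted_top_probability(p_start: float, beta: float, delta_x: float) -> float:
    """Top-category probability after the predictor rises by delta_x."""
    return logistic(logit(p_start) + beta * delta_x)


def implied_coefficient(p_start: float, p_end: float, delta_x: float) -> float:
    """Coefficient implied by a change in top-category probability."""
    return (logit(p_end) - logit(p_start)) / delta_x


DELTA = 0.4 - 0.2  # change in prevalence rate used in the text

# Probabilities of a "great deal of" interest rating reported in the text.
beta_authentic = implied_coefficient(0.205, 0.251, DELTA)
beta_cognitive = implied_coefficient(0.223, 0.369, DELTA)
print(round(beta_authentic, 2), round(beta_cognitive, 2))
```

The implied values, roughly 1.3 for authenticity and 3.6 for cognitive level, fall inside the reported coefficient ranges of 1.21 to 1.57 and 2.37 to 3.73, so the worked probabilities are consistent with the coefficients.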
Null Findings
In addition to the results shown in the top panel of Table 1, we investigated several other teacher discourse measures, including teachers’ use of serial questions (i.e., conjunctive or repeated questions), the presence of negative evaluations of student responses, the presence of disciplinary language (i.e., English and language arts disciplinary terms), and discourse that establishes or clarifies learning goals (goal specificity). None of these discourse measures showed a consistent, statistically significant positive association with student engagement (see bottom panel of Table 1). We also collected teacher reports as outcome measures for the dispersion in student engagement (see Kelly, 2007, for discussion of teacher discourse and the distribution of student engagement). We found no relationship between the teacher discourse measures and teacher reports of the evenness or unevenness of student engagement (results not reported in table form).
Discussion
The approach taken in this study has some specific paradigmatic features and limitations. Researcher and participant conceptions of student engagement influence the chosen survey question stems and observable markers of student engagement. The present data collection entailed a pointed unit of analysis (the lesson over an entire class period, featuring whole class instruction in middle school English), but this survey approach could be used to study other units of instruction (e.g., a science lesson on an individual concept, or a thematic unit that spanned multiple class periods). Our framework stands as one salient means to measure teacher perspectives from a “moment-in-time” conception of student engagement, which focuses primarily on behavioral engagement. That is, while the terminology references phenomenological aspects of engagement (e.g., enjoyment), teachers can only report on what they observe (i.e., enjoyment inferred from behavior). Behavioral engagement includes student adherence to norms and involvement in classroom activities that is readily observable to teachers. Other types of engagement may be more internalized in nature and more stable from class to class, such as student identification with school. Behavioral engagement is however quite useful for understanding positive academic outcomes and social–emotional well-being (Archambault et al., 2009; Royal & Rossi, 1996).
When Are Teacher Reports of Student Engagement Useful?
In Table 2, we provide a broad overview of the main study contingencies that affect methodological fit, making lesson-specific teacher reports more or less useful. First, the topical focus of the research endeavor and theory of change should be considered. Consistent with the empirical example provided in this article, teacher reports may be especially useful when student engagement is expected to mediate outcomes. Many curricular reforms are premised on generating a recovery of engagement in students who are frequently disengaged and have come to de-identify with their schools (Davis & McPartland, 2012; Marks, 2000; Orthner et al., 2013), so we anticipate a wide array of researchers will find broad topical congruence with this method. Relatedly, teacher reports will be most useful when the specific measures align closely with teachers’ own instructional goals. In this study, we used dimensions of engagement that are broadly incorporated in learning goals, but future research could also use teacher feedback to inform the measures used or include open-ended prompts eliciting lesson-specific learning and engagement goals.
Table 2. Potential Strengths, Weaknesses, and Preconditions (Methodological Fit) of Lesson-Specific Teacher Reports Based on Study Contingencies
Second, lesson-specific teacher reports will be most informative when reforms require teachers to actively implement and sustain instructional changes on a daily basis (Connor et al., 2007; Rowan & Correnti, 2009). Some reforms operate more pervasively and more independently of individual teacher decision making. For example, class-size reduction is a reform intended to improve elements of instruction, and teachers might enhance its effects with purposeful changes in classroom activities (Ehrenberg et al., 2001). However, while instructional changes presumably mediate the effects of class-size reduction, realizing the benefits of this reform may not always require active teacher change; classroom disruptions may naturally decline while individual attention to students naturally increases.
Third, teacher reports of student engagement may be most useful to researchers when teacher buy-in or teacher learning is needed to create and sustain that change (Blankstein, 2004; Klem & Connell, 2004; Wayman, 2005). In this case, evaluations from a source they trust—themselves—may be especially useful. The ability of many reform agendas to improve school outcomes is predicated on teacher trust of the school as an organization. When teachers’ own perspectives and viewpoints are incorporated into school-wide goal setting, teachers are much more likely to have a high degree of school trust, reinforcing the capacity to improve school and student learning outcomes (Forsyth et al., 2006). If these three basic improvement conditions are present, researchers should consider including lesson-specific teacher reports of student engagement in the evaluation design.
What Study Contingencies Should Be Considered?
Researchers should consider a set of contingencies that might limit the utility of lesson-specific teacher reports. First, if teachers have strong existing preferences toward a given instructional approach, then their own perceptions of student engagement or other outcomes may be biased (and this may be correlated to some degree with the experience level of teachers being studied). The specific empirical example reported in Table 1 pairs observational data with teacher reports where teachers had no knowledge of the measures of instruction at the time of reporting. Thus, the independent variable (teacher discourse practices) and dependent engagement outcomes were measured completely independently, removing the concern of monomethod bias that plagues many large-scale survey studies. However, in much research the teacher must have full knowledge of the curricular innovation, in which case teacher reports might be biased by Hawthorne effects (wishing to confirm the efficacy of a particular study condition) and may be less useful. Then again, knowledge of study conditions does not always translate into bias, and both researchers and teacher study participants might approach curriculum reform from an agnostic standpoint. As one example, consider McConn’s (2016) teacher-led research into extensive versus intensive reading instruction. There are many other examples of curricular decisions and reforms where competing perspectives on student learning make it almost impossible to form an a priori expectation about outcomes (e.g., the use of perceptually rich manipulatives in early mathematics in McNeal, 2009), and where student engagement is a key mediating mechanism between instruction and learning.
Second, if little within-teacher variance in instructional processes is anticipated to begin with, and most of the “action” is thought to lie across/between teachers, then teacher reports are less useful because they will be confounded with other teacher characteristics. Third, the use of lesson-specific teacher reports to evaluate instructional change presumes that effects occur almost immediately and can be detected at the moment of instruction. If instead effects accumulate slowly and are revealed over time, then lesson-specific teacher reports collected over a short interval would be inadequate. Relatedly, lesson-specific teacher reports may simply be poorly suited to detecting small effects that become noticeable only over larger populations (e.g., think again of the possible effects of class-size reduction). Finally, site access to technology and instruments that enable repeated measures affects methodological fit.
Limitations
As a more fundamental limitation, consider that sociocultural sources of engagement, including the influence of contextual factors and student identity on individual student demonstrations of engagement, are not the focus of the present study and would likely be more accurately measured through a study of specific students or groups within classrooms across longer periods of time. Contextual and structural conditions of inequity (present both at school and home) affect student degrees of identification with school, which, in turn, affect student participation in classroom activities, propensity to follow classroom and social rules of behavior, and likelihood of investing energy in understanding subject matter (Finn, 1989; Fredricks et al., 2004). Likewise, teachers are subject to these contexts and their effects. As such, we note that the present study does not purport to explore the varied origins of visible student engagement, but nevertheless was built on the premise that organized research on student engagement should strive to be meaningfully embedded within its social, political, and historic context (Voelkl, 2012). In advocating for participatory methods more broadly, we aim to remain mindful of those contextual forces and highlight the need for universal research and teaching approaches designed for classrooms and students who possess varied needs and experiences (Rose & Meyer, 2006; Tomlinson, 2014).
Conclusion
The set of methods developed here admittedly suggests a relatively modest shift in epistemology toward teacher-centered research. Yet, struck by the ubiquity of external observations in much quantitative research on teaching, we believe models are needed to shape future research. By providing an empirical example of the potential for lesson-specific teacher reports, we hope that researchers in the field are prompted to more systematically identify lesson-specific teacher knowledge and perspectives. Additionally, we suggest more broadly that a move to more participatory research methods offers significant benefit to the field. For teachers, we would make the post hoc hypothesis that repeated survey prompts of the kind used here may serve as useful heuristics to support teacher self-evaluations—by offering focusing concepts.
To conclude, we reiterate that teacher reports will never be sufficient stand-alone evidence for many educational reforms because changes in student engagement and learning are not fully observable, and teacher reports are too inherently imprecise to detect small but meaningful changes. When reforms are intended to produce small changes that accumulate over days and even years, more precise summative measures of learning are needed. And because large-scale studies using teacher reports in the most inferentially powerful within-person designs are so very rare, right now there is little evidence of how such reports correlate with student achievement growth and other long-range outcomes. Yet technological advancements mean that teachers can submit lesson-specific data using online tools with great efficiency. In much research and development on teacher observation and learning, the ultimate goal is to provide teachers and researchers with tools that best measure instructional change. In this work, then, there seems to be utility in being able to highlight the particular practices that teachers themselves report to be most strongly associated with lesson improvement.
