Abstract
This paper introduces the Low-Inference Discourse Observation tool (LIDO), which captures discourse moves produced by students and teachers in whole-classroom discussions. Coding methods are described, followed by analyses that explore the validity of the LIDO through correlations among LIDO-coded discourse moves and between LIDO scores and scores on the Instructional Support domain of the Classroom Assessment Scoring System-Secondary, utilizing 643 audio-recorded classroom lessons. Observations were conducted in fourth through seventh grade urban classrooms, including English Language Arts, Mathematics, Science and Social Studies. Rates of teacher and student discourse moves correlated with each other and with CLASS scores in expected ways, providing evidence of internal and convergent validity. Implications for use in research are discussed, including specific advantages of this new approach, such as the capacity to tease apart teacher behavior from student behavior in the context of classroom interactions.
Introduction
Over the last several decades, a focus on expansive and “dialogic” discourse in classrooms has grown, both in research on teaching and learning, and in national standards for instruction, such as the Common Core (National Governors Association, 2010) and the Next Generation Science Standards (NGSS Lead States, 2013). This form of discourse is composed of talk that extends far beyond the traditional Initiation-Response-Evaluation (IRE; Mehan, 1979)—the three-part structure where the teacher initiates a question, a student gives a short response, and the teacher evaluates that answer (Mehan & Cazden, 2015). As in many fields, this has generated different labels which overlap and are often used interchangeably, such as “dialogic discourse” (Alexander, 2019), “inquiry dialogue” (Reznitskaya & Wilkinson, 2017), “academically productive talk” (Michaels & O’Connor, 2015; O’Connor & Michaels, 2018) or often just “classroom discussion.”
While there is by no means consensus about the most important defining characteristics of “dialogic pedagogy” mediated by talk (cf. Howe et al., 2019; Asterhan et al., 2020; Alexander, 2019), most work in this area features examples of talk in which the teacher poses a question or topic that is substantive, complex, and thought-provoking, without an immediate answer, and then supports and sustains extended dialogue around that question. Students engage by sharing their reasoning, which the teacher responds to with “uptake”—instead of moving quickly from student to student, the teacher supports students in elaborating and clarifying their contributions. The discussion is not simply bi-directional consecutive turns between the teacher and each individual student. Rather, students are encouraged to respond to their classmates and share their understandings—reasoning and evidence and experience—with one another, either in pairs or small groups, or in whole class discussions. They then work together to refine their thinking and reach deeper and broader understandings.
This kind of talk is thought by many to be supportive of multiple kinds of learning, including content learning (Ketch, 2005; Murphy et al., 2009), social-emotional development (Doyle & Bramwell, 2006; Fettig et al., 2018), and language capacities, both spoken and written (Al-Adeimi & O’Connor, 2021), but the results of research are still subject to variable interpretations. Classroom discourse is difficult to study, for a variety of reasons. There are complex interactions between the talk and the content; for example, some aspects of a math lesson may allow greater opportunity for more dialogic moves than others (e.g., exploring a new concept vs. sharing the correct answers from a quiz). Similarly, text-based discussions about socially-relevant topics may be more conducive to dialogic talk than other topics. Researchers have problematized how representative a particular instance of classroom discourse might be, when one or two observations are taken to stand for a year’s exposure even though scores can vary, for example by time of day or the students present (Curby et al., 2011). Further, there is a large variety of discursive moves that can be observed, and the effect of each one on student outcomes is not well understood (but see Howe et al., 2019 for recent progress).
This complex challenge (a seemingly crucial dimension of classroom learning that is very difficult to study) has led to a variety of tools to measure different characteristics of classroom talk. Many are “high inference”; that is, they use categories such as “encouragement” or “transactivity” (Berkowitz & Gibbs, 1983), which can make establishing reliability a difficult task (Elizabeth & O’Connor, 2014). Some, such as the DIT (Dialogic Inquiry Tool) and ART (Argumentation Rating Tool) (Reznitskaya et al., 2011; Reznitskaya & Wilkinson, 2021), pose relatively high-inference questions about the degree to which a classroom episode meets the expectations that are part of the rater’s understanding of “dialogicality.” Tools like these can work well with trained observers rating an episode, providing a metric with which to assess the impact of this form of instruction on various outcomes. However, higher-inference measures require high levels of training to achieve reliability (e.g., Murphy et al., 2017), which can be a limitation.
With complex coding schemes, it can also be challenging to train coders to acceptable levels of reliability, and a significant amount of maintenance is often needed to prevent rater drift. Some higher-inference tools require observers to consider all evidence observed over a specified period of time, such as a twenty-minute observation, and quantify the overall quality and quantity of discourse or interactions with a score on a Likert-type scale, often completing ratings of 10 or more constructs simultaneously (e.g., Pianta et al., 2012). Even with well-defined anchors used for scoring, coders must weigh and average the evidence over lengthy observation time frames, a process that is vulnerable to subjectivity and measurement error. Similarly, when applying tools such as the SDI (Science Discourse Instrument) and ART, dialogic moves are rated as occurring “rarely, occasionally, or consistently” (Fishman et al., 2017, p. 21) or “advancing, developing, or not yet” (Reznitskaya & Wilkinson, 2021, p. 4), ratings that are likely to vary more from coder to coder than actual counts of such moves.
In contrast, it is possible to take what we might call a “minimal-inference” approach, by quantifying elements that require little to no inference, such as “number of words spoken by the teacher” as compared to “total words spoken by students,” length of turns in words, or relative amounts of time speaking. These are far easier to code and can be useful as metrics to distinguish drastically different approaches to classroom talk, but do not provide insights into the interaction itself. In addition, such low-inference approaches are rare and those that exist are typically rudimentary (e.g., total words spoken), leaving a significant gap in the field of discourse research.
In this paper, we present results of a validity study for a new tool that was designed to be both “low inference” and to offer a degree of specificity and granularity by using counts of easily identifiable dialogic moves. It aims to allow exploration of dialogic discourse in a classroom by tracking highly recognizable utterance-level discursive moves by both teacher and students. Called the LIDO, or Low-Inference Discourse Observation, it is a tool whose purpose is to use utterance-level evidence to make inferences about the degree to which there is potential for dialogic discourse in a classroom (O’Connor et al., 2016; O’Connor & LaRusso, 2014). As an example, consider one of the teacher moves that is highly indicative of this potential: any attempt to get students to address a contribution of a classmate. This might be as focused as “Glennie, what do you think about what Frankie just said? Do you agree or disagree?” or as open as “Does anyone want to add on to that?” Any time a teacher attempts to provide support for students to respond to one another, this indicates the potential for a dialogic interchange. Importantly, the LIDO also tracks student moves that indicate the potential for a dialogic interchange, such as utterances that directly address another student: “Shayla, I think I kind of agree, but I would also say...”.
There are numerous coding systems that track utterance-level discursive moves, such as Hennessy et al.’s (2016) Scheme for Educational Dialogue Analysis. They show that these kinds of tools can power detailed explorations of outcomes (Howe et al., 2019). However, as Hennessy et al. (2020) point out, the process of coding discourse is itself “immensely complex and demanding [and has] taxed researchers over decades” (p. 1). With deep awareness of this complexity, we attempted to identify a relatively small number of moves that would be maximally indicative of dialogic potential (or its absence), without aiming for a comprehensive coding of every utterance. We also chose categories that would be relatively easy to identify, so as to increase reliability. As such, the LIDO allows for an intuitive, low-inference (i.e., less complex) approach to identifying dialogic, and less dialogic, aspects of classroom talk produced by both teachers and students. Unlike tools that are designed for use in specific subject areas such as the SDI (Fishman et al., 2017) or the Protocol for Language Arts Teaching Observations (PLATO; Grossman et al., 2015), the LIDO can be used in all subject areas and in multiple grades. Additionally, the LIDO codes teacher and student talk moves separately. This is different from a more typical discourse analytic approach, in which exchanges between two or more people are analyzed as wholes. In our approach, the coding of moves might be construed as more surface-level, but it allows for faster and more reliable coding, and ultimately allows researchers to examine the unique contributions of specific teacher and student talk moves to classroom discourse and student outcomes. In the Procedure section below, we present a detailed description of the LIDO categories, with a rationale for their selection.
The Current Study
In this paper, we present the LIDO validation study, which utilizes a subset of data from a 2-year experimental study of an academic vocabulary program with an instructional strategy that emphasizes classroom discussion (Snow et al., 2009). The LIDO was developed as part of the larger project, funded by the United States Institute of Education Sciences (IES), in order to have a low-inference classroom discussion tool to assess differences between groups and over time. The current study capitalizes on a large sample of classroom observations collected for the experimental study to assess the validity of the LIDO. These observations were coded with both the LIDO and the CLASS-S (Classroom Assessment Scoring System, Secondary; Pianta et al., 2012), an observation tool that assesses overall quality of classroom interactions, as well as specific dimensions particularly relevant to examining classroom discourse, such as Instructional Dialogue. The current study is guided by the following research question: Is the LIDO a valid observation tool? To answer this, we examine patterns in the correlations among student and teacher moves coded with the LIDO (internal validity), as well as correlations between LIDO scores and CLASS scores (convergent validity).
We hypothesized that the CLASS Instructional Support composite and the specific CLASS dimensions most related to the constructs assessed by the LIDO (particularly Instructional Dialogue) would have positive, low to moderate correlations with LIDO teacher and student moves, with student moves having weaker correlations than teacher moves given that the CLASS emphasizes teacher behavior over student behavior. We also hypothesized that individual LIDO teacher moves most likely to contribute to higher quality interactions and dialogue, such as asking contestable questions, would have higher correlations with CLASS scores than other teacher moves, such as asking a quiz-like question with a known answer. Similarly, we expected that LIDO student moves likely to characterize higher quality classroom interactions and discussion, such as the sharing of claims and evidence, would have higher correlations with CLASS scores than other student moves, and that these correlations would be stronger for Instructional Dialogue and overall Instructional Support than for Instructional Learning Formats, a construct that focuses more on engaging presentation of materials and less on classroom interactions and discussion.
We also hypothesized that LIDO teacher and student moves would be positively correlated with each other at low to moderate levels. In addition, higher correlations were expected for teacher moves that are most likely to encourage specific student moves. For example, teacher moves prompting student-to-student interactions may be more highly correlated with moves that capture students engaging with each other. The teacher asking contestable and semi-open questions and pressing for students to express their reasoning will likely be more highly correlated with extended student utterances, whereas the teacher asking quiz-like questions will likely be more highly correlated with minimal student utterances. We also expected that, among the different teacher dialogic talk moves, the lowest correlations would be with the teacher asking quiz-like questions, which is theoretically less likely to promote high quality talk than other teacher moves.
Method
Sample
Observations were conducted in fourth through seventh grade classrooms in 18 K-8 schools located in two school districts in the United States, one in a major city and one in a small city, both serving ethnically diverse, primarily low-income students. These 18 schools are a subsample of a larger study which utilized a pairwise matching procedure to try to achieve demographic similarity of the intervention and control groups and a random number generator to randomize schools within 12 matched pairs (Jones et al., 2019). Classroom observation data were collected in all fourth through seventh grade classrooms within nine of these matched pairs, in the Fall and Spring of each year, across two academic years. A total of 752 audio-recorded classroom observations were utilized for the present study. Rather than observing classes for a pre-determined amount of time, we asked teachers to allow us to observe an entire lesson, and the observation duration was based on the total length of the lesson. The average observation length was 42.6 minutes (SD = 11.9), with sessions ranging from 8 to 106 minutes in length. Observations were relatively balanced across fourth (27%), fifth (26%), sixth (23%), and seventh grade (25%) classrooms, and observations were conducted in a range of classes: English Language Arts (64%), Mathematics (11%), Science (12%) and Social Studies (13%). Although the LIDO can be used with observations of any length, for validity analyses, the sample of coded observations was limited to those that were long enough to allow CLASS scores to be calculated with a minimum of three coded segments per observation (n = 643).
Procedure
Classroom discussions were audio-recorded by research assistants, using digital audio recorders. Audio recordings, as opposed to live observations, had several advantages: observations could be coded multiple times, allowing the same observations to be coded with multiple coding systems (see below); recordings could be paused and sections could be replayed as needed to support accuracy in coding; and double coding could be completed to assess inter-rater reliability for both coding systems, which was less disruptive and less challenging to schedule than sending four coders into a classroom (two coders per coding system) for a single observation. Classroom lessons were recorded with three to four recorders placed around the room with one recorder, identified as the main recorder, placed at the front of the room by the teacher. The additional recorders served the purpose of allowing coders to utilize audio from other points in the room as needed when teacher or student talk was not clear in the audio file obtained with the main recorder. Research assistants set up the recording equipment and collected key (un-identifiable) information that could not be discerned from audio recordings, such as the layout of the classroom and locations of recorders, number of students present, and classroom activity type and format.
The decision to use audio recordings over video recordings, which are often the preferred method, was based on two issues critical for obtaining valid, high-quality data. First, audio recorders were less intrusive and drew less attention than video cameras, allowing interactions to be captured with less behavioral alteration or performance from individuals in the classroom, who would be more likely to be aware and self-conscious of being recorded on video. Second, the Institutional Review Board (IRB) determined that consent from students in the classroom during the observation would only be necessary if the lessons were video recorded. If observations were video recorded, with consent required, students without consent would have to be removed from the classroom or moved to a side of the room out of the reach of the video camera, which would have significantly altered the classroom dynamic so that it would no longer be representative of interactions within the classroom. Though consent was not required, procedures were still enacted, as approved by the IRB, to protect the privacy of individuals whose voices were on the recordings, including the use of codes as identifiers for the audio files; downloading audio files after returning to the research lab onto secure, encrypted project computers; and providing access to files only to members of the research team with signed confidentiality statements.
The audio-recordings were coded using two classroom observation tools: the Low-Inference Discourse Observation tool and the Classroom Assessment Scoring System.
The Low-Inference Discourse Observation (LIDO) tool
The LIDO is a theory-driven tool (Al-Adeimi & O’Connor, 2017) that operationalizes classroom talk at all grade levels, and draws on previous work on accountable talk or academically productive talk (Michaels et al., 2008; Michaels & O’Connor, 2015; O’Connor & Michaels, 2018) and measures of classroom talk identified by researchers over the years (e.g., Mercer et al., 1999; Nystrand et al., 2003; Reznitskaya et al., 2011; Soter et al., 2008; Wells & Arauz, 2006). To capture the range of dialogicality during whole-classroom talk, the LIDO examines both student and teacher talk turns and notes those that fall into carefully chosen categories. Several of these categories capture Alexander’s (2018, p. 566) “Five Principles” that define his model of dialogic discourse: (1) Collective (classrooms provide opportunities for collective learning); (2) Reciprocal (participants listen, share ideas, and consider different perspectives); (3) Supportive (participants express ideas without feeling embarrassed over incorrect answers and collectively work to reach common ground); (4) Cumulative (participants expand their understanding by integrating their own and others’ contributions); and (5) Purposeful (classroom talk is structured with specific learning goals). The first of these, ‘collective,’ refers to aspects of classroom culture that enable dialogic talk and may not be tangible, whereas the rest can be observed directly.
LIDO Categories
LIDO Utterance Codes with Examples.
We begin coding by looking for T1: teacher turns that prompt student-to-student interactions about content. Next is T2: the teacher follows up with a particular student by asking for clarification, examples, or evidence. This type of move exemplifies two qualities of dialogic talk at once: it is a consecutive follow up with the same student, asking for further information, thus supporting the student in building their contribution to the group discourse. It also probes deeper into their reasoning, building towards cumulativity by supporting coherence. T3 is the code used for the many ways a teacher might use “active listening” or supportive follow ups to keep the same student talking. Again, this category builds towards cumulativity by supporting extension of student turns. The next three teacher codes form a separate subcategory: each addresses utterances that include a prompt or a question. T4 is coded when the teacher asks an open-ended, contestable question; T5 is a semi-open question, seeking student thinking about a question with several possible solutions; and T6 involves asking a quiz-like question with a known response (See Table 1 for code descriptions and examples).
Coding of student moves begins with the categories S1 and S2: turns in which students either directly address another student about their contribution (S1) or explicitly refer to another student’s contribution (S2). We distinguish between S1 and S2 by noting that one involves the presence of a teacher (i.e., S2, where students are speaking to the teacher about another student’s contributions) and the other, more dialogic move (S1) involves students responding to each other directly. These moves (similar to T1) register both collectivity and reciprocity. Mirroring T2 is S3 which is coded when students present and support their claim(s) with evidence and/or reasoning. Another signal of reciprocity is S4: students asking the teacher content-related questions. These may be for clarification or for expansion of the topic. Lastly, S5 and S6 form a separate subcategory. Both are signs that index dialogicality or the lack thereof only through their relative frequency. An S5 is an extended student utterance that is longer than a single clause. An S6 is a minimal student utterance that is a single clause or less (See Table 1 and Figure A1).
One final aspect of the coding process is that not every utterance is coded, and no utterance receives more than one code. Priority is given to the codes that explicitly address the collective and reciprocal nature of potentially dialogic discourse: T1 is considered first, followed by T2, then T3. If none of those apply, the coder considers T4, T5, and T6. If none of those apply, the utterance receives no code. The same ‘decision tree’ approach is used for the student codes, where priority is given to the most dialogic moves (i.e., first S1, then S2, etc.). If students were speaking to one another directly or indirectly, they received an S1 or S2, even if these responses were minimal (i.e., a single clause or less). If a student presented a claim or a series of justifications, this utterance would be coded as an S5. However, if claims and justifications were present together, that utterance would receive an S3 code. All student questions that were relevant to the topic were coded as S4, and all responses that did not belong to the previous categories and were minimal would receive an S6. Thus, a decision tree approach was used to ensure that context was considered such that credit would be given to highly dialogic moves such as S1 and S2 even if they were minimal in their form (e.g., “Yes, I agree”) (Figures A1 and A2).
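The priority rule described above can be illustrated with a minimal sketch. It assumes a human coder has already judged which categories an utterance could fall into; the function name `assign_code` and the set-based input are hypothetical and are not part of the LIDO materials. The relative order among T4, T5, and T6, and among S3 through S6, is also an assumption, since the text specifies only that these are considered after the higher-priority codes.

```python
from typing import List, Optional, Set

# Priority orders: the most dialogic codes are checked first.
# The ordering within T4-T6 and within S3-S6 is an assumption for illustration.
TEACHER_PRIORITY = ["T1", "T2", "T3", "T4", "T5", "T6"]
STUDENT_PRIORITY = ["S1", "S2", "S3", "S4", "S5", "S6"]


def assign_code(applicable: Set[str], priority: List[str]) -> Optional[str]:
    """Return the single highest-priority code that applies, or None if none do."""
    for code in priority:
        if code in applicable:
            return code
    return None  # the utterance receives no code


# Hypothetical example: a one-clause reply ("Yes, I agree") addressed to a
# classmate fits both S1 (direct address) and S6 (minimal) but is coded S1 only.
print(assign_code({"S1", "S6"}, STUDENT_PRIORITY))  # S1
print(assign_code({"T5", "T2"}, TEACHER_PRIORITY))  # T2
print(assign_code(set(), TEACHER_PRIORITY))         # None
```

Returning at most one code per utterance mirrors the rule that no utterance is double-coded, and returning `None` mirrors the rule that some utterances are left uncoded.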
LIDO Coding Procedures and Training
The LIDO was used to code 752 classroom discussions using audio recordings. To accomplish this, 14 research assistants were recruited and trained to code talk that was produced by teachers and students during whole class discussions. Before they were assigned files to code, however, the following process was implemented.
First, creators of the LIDO individually coded two classroom discussions using the teacher and student LIDO codes, and met afterwards to discuss their coding. Once agreement was reached for both discussion files, consensus files reflecting agreement of codes were created. Then, research assistants were trained by the research team over a period of 2 days, each day focusing on either student or teacher coding. After training, each coder was asked to code the files previously coded by the research team, which were then compared by the research team against the consensus files. Coders were given feedback on their coding and the research team addressed their coding questions.
Coders were then paired together, and each pair underwent several rounds of training, where each person in the pair first coded the audio recording separately, then coders compared and discussed their codes to produce a “consensus” document. Using LIDO creators’ coding of the same files, these consensus files were evaluated, and coders were once again provided with feedback when disagreements between expert and novice coding emerged. Following a number of rounds, coders were then assigned to independently code audio files using the LIDO. Thus, for each classroom in the study, a student coding document and a teacher coding document were generated.
To guide coding, we provided coders with decision flow charts (Appendix, Figures A1 and A2) and documents explaining each code, using various examples. Each codable student or teacher turn was coded minute-by-minute using audio data. We emphasized certain coding principles such as instructing coders not to make judgments or high-level inferences about what was said or to whom it was said, without concrete evidence. We also instructed coders to rate questions by their form, not the response they received. For example, an open-ended teacher question was coded as a T4, whether or not it received a dialogic student response.
LIDO Reliability
To assess inter-rater reliability, we double-coded 10 intervention and 15 control audio files, representing 5% from each wave of data collection. We utilized raw count data generated by coders for each of the 12 moves of classroom talk. Treating the count data as continuous, we calculated correlations between first and second coders’ counts for each individual move as well as for the total counts of teacher moves and student moves. The correlation between raters for total counts of student moves was very high in magnitude and statistically significant (r (23) = .97, p < .001), and coders’ total counts for teacher moves were also highly correlated and statistically significant (r (23) = .97, p < .001). When separated by each student and teacher move, we found moderate to high correlations for most moves, and no significant correlation for two rare talk moves. Specifically, teacher talk correlations were as follows: T1: prompting student to student talk (r (23) = .20, p = .329), T2: press for reasoning (r (23) = .86, p < .001), T3: active listening (r (23) = .56, p = .003), T4: contestable questions (r (23) = .69, p < .001), T5: semi-open questions (r (23) = .72, p < .001), and T6: quiz-like questions (r (23) = .89, p < .001). As such, all teacher moves generated moderate to high statistically significant inter-rater correlations except for T1, for which coders’ counts were not significantly correlated. This can be explained by the rarity of this talk move, as the T1 rate per minute of classroom talk was the lowest among all talk moves, with a mean of 0.01 moves per minute (Table 3). In the data we double-coded, four classrooms were identified as containing this move by the first rater, while only one classroom was identified by the second rater. We anticipate finding a correlation among raters with a sample that contains a higher number of classrooms with this move present.
Similarly, we found high inter-rater correlations for all but one student talk move: S1: direct student talk (r (23) = .73, p < .001), S2: indirect student talk (r (23) = .95, p < .001), S3: claim with evidence (r (23) = .64, p = .001), S4: student questions (r (23) = .09, p = .685), S5: elaborated talk (r (23) = .76, p < .001), and S6: minimal talk (r (23) = .97, p < .001). Counts for moves such as S6, which occurred most frequently in the data (M = 0.97 moves per minute), were highly correlated, whereas counts for one student talk move, student questions (S4), were essentially uncorrelated. First, this move was infrequent (M = 0.06 moves per minute; Table 3), which could partly explain this discrepancy. Further, students often inflected their responses, which some coders may have interpreted as questions whereas others may have coded these as S5 (elaborated talk) or S6 (minimal talk). Additionally, coders were instructed to only code content-related questions, which may have increased the level of inference required to identify this move compared to others.
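For readers who wish to reproduce this kind of reliability check, the following is a minimal sketch of a Pearson correlation between two coders' counts, with degrees of freedom n − 2 (23 for 25 double-coded files). The function name `pearson_r` and the example counts are hypothetical and are not the study data. The example mimics a rare move that one coder identifies in four files and the other in only one, the situation described for T1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of counts."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # One coder never varied (e.g., never recorded the move): r is undefined.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts of a rare move across 25 double-coded files.
coder_a = [0] * 21 + [1, 1, 2, 1]  # move identified in four files
coder_b = [0] * 21 + [1, 0, 0, 0]  # move identified in one file
r = pearson_r(coder_a, coder_b)
df = len(coder_a) - 2
print(f"r({df}) = {r:.2f}")
```

With so few non-zero counts, a single disagreement shifts the coefficient substantially, which is one reason correlations for rare moves are unstable in small double-coded samples.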
Classroom Assessment Scoring System-Secondary (CLASS-S)
The CLASS-S (Pianta et al., 2012) is an observational instrument developed to assess classroom interactions in secondary classrooms. The CLASS includes 11 dimensions that are typically coded by an observer in the classroom or from video recordings. In the current study we utilized the five dimensions from the Instructional Support domain, including: (1) Instructional Dialogue (i.e., the frequency and distribution of cumulative and content-driven exchanges, and dialogue facilitation strategies); (2) Analysis & Inquiry (i.e., the degree to which the teacher facilitates students’ use of higher level thinking skills, such as analysis, problem solving, reasoning, and creation through the application of knowledge and skills); (3) Quality Of Feedback (i.e., the degree to which feedback expands and extends learning and understanding and encourages student participation); (4) Content Understanding (i.e., both the depth of lesson content and the approaches used to help students comprehend the framework, key ideas, and procedures in an academic discipline); and (5) Instructional Learning Formats (i.e., the ways in which the teacher maximizes student engagement in learning through clear presentation of material, active facilitation, and the provision of interesting and engaging lessons and materials). These five dimensions are used to create a composite score (based on the average of the five dimensions) that captures the larger construct of Instructional Support (based on analyses of the factor structure; Hafen et al., 2015).
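The composite described above is an unweighted average, which can be sketched as follows. The dictionary keys, the function name `instructional_support`, and the example ratings are hypothetical; they are not taken from the CLASS materials or the study data.

```python
from statistics import mean

# The five Instructional Support dimensions named in the text.
DIMENSIONS = (
    "Instructional Dialogue",
    "Analysis & Inquiry",
    "Quality of Feedback",
    "Content Understanding",
    "Instructional Learning Formats",
)


def instructional_support(scores: dict) -> float:
    """Average the five dimension scores into one composite score."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return mean(scores[d] for d in DIMENSIONS)


# Hypothetical lesson-level scores on the 1-7 scale.
example = {
    "Instructional Dialogue": 4,
    "Analysis & Inquiry": 3,
    "Quality of Feedback": 3,
    "Content Understanding": 4,
    "Instructional Learning Formats": 5,
}
print(instructional_support(example))  # 3.8
```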
For each dimension, trained coders assign global ratings based on a 7-point scale: low (1–2), mid (3–5) or high (6–7). All recordings of lessons were divided into 3–4 segments based on total length of the lesson. Each segment was rated on the dimensions listed above by coders who had completed CLASS training and certification as coders of CLASS through Teachstone, the organization that owns CLASS observation tools, trains observers, and confers certification based on satisfactory completion of an online reliability test. Because CLASS coding is typically conducted with video recordings or during live observations, additional training and reliability testing were conducted by the project team for coding these dimensions from audio files. When coding audio only, coders had high agreement with master codes conducted on video at the dimension level (percentage of observations coded within 1 point of master codes = 92%). In addition, a sample of 72 audio-recorded lessons distributed across all data collection waves was double-coded; using the same criterion (raters coding within 1 point of each other), 89% agreement was achieved.
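The within-1-point agreement criterion can be sketched as follows. The function name `within_one_agreement` and the example ratings are hypothetical and serve only to illustrate the calculation; they are not the study data.

```python
def within_one_agreement(ratings_a, ratings_b, tolerance=1):
    """Percentage of paired ratings that differ by at most `tolerance` points."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    hits = sum(1 for a, b in zip(ratings_a, ratings_b) if abs(a - b) <= tolerance)
    return 100.0 * hits / len(ratings_a)


# Hypothetical ratings on the 7-point scale from two coders.
coder_a = [3, 4, 5, 2, 6]
coder_b = [3, 5, 3, 2, 7]
print(within_one_agreement(coder_a, coder_b))  # 80.0
```

Here four of the five pairs differ by one point or less, giving 80% agreement.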
Analysis
First, we examined summary statistics for the raw data produced through LIDO data collection and coding. Then we converted LIDO scores from raw count data to rate data (moves per minute) in order to account for the varying lengths of the observations. To evaluate internal validity, we computed and interpreted correlations (Kendall’s Tau-b) among the LIDO teacher and student moves. To assess convergent validity, we examined correlations (Kendall’s Tau-b) between LIDO scores (based on rate data) and CLASS scores (including Instructional Support composite scores, as well as individual CLASS dimensions).
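The two analytic steps above, converting counts to rates and computing Kendall’s Tau-b, can be sketched as follows. This is an illustrative pure-Python implementation under the standard tie-corrected definition, not the authors' analysis code; the function names and example values are hypothetical. The pairwise loop is quadratic in the number of observations, which is adequate for a sample of a few hundred lessons.

```python
import math


def moves_per_minute(count, minutes):
    """Convert a raw move count to a rate so lessons of different lengths compare."""
    if minutes <= 0:
        raise ValueError("observation length must be positive")
    return count / minutes


def kendall_tau_b(x, y):
    """Kendall's Tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C and D are concordant and discordant pairs; Tx and Ty are pairs tied
    only on x or only on y. Pairs tied on both variables are ignored.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical example: move counts and lesson lengths for four lessons,
# paired with a hypothetical observation score for each lesson.
counts = [12, 3, 20, 8]
minutes = [40, 30, 50, 40]
rates = [moves_per_minute(c, m) for c, m in zip(counts, minutes)]
scores = [4, 2, 5, 3]
print(kendall_tau_b(rates, scores))  # 1.0
```

A rank-based coefficient such as Tau-b is a natural choice when many lessons share the same value (for example, a rate of zero for a rare move), because the denominator adjusts for ties.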
Results
Sample Summary Statistics
Summary Statistics for Dialogic Moves (Count Data).
Converting Count Data to Rate Data
Summary Statistics for Dialogic Moves (Rate Data).
Internal Validity
Correlations (Kendall’s Tau-b) Between LIDO Student and Teacher Dialogic Moves (Rate Data) (n = 643).
∼ p < .1, *p < .05, **p < .01, ***p < .001.
In examining correlations among different teacher moves, we also see that the majority of correlations were significant (14 of 15) and that the significant correlations are all positive and in the low range (r (641) < .3), indicating that teacher moves are related but also distinct constructs. A similar pattern is found for student moves. All student moves are significantly and positively correlated, primarily in the low range, with the exception of one moderate correlation (r (641) = .42, p < .001) between the two student moves that involved interacting with peers (S1: direct interaction and S2: indirect interaction). These correlations among student moves also suggest that they are related but distinct constructs.
Convergent Validity
Correlations (Kendall’s Tau-b) of LIDO Teacher and Student Dialogic Moves (Rate Data) with CLASS Dimensions and Instructional Support Composite Score from Coded Classroom Lessons (n = 643).
∼ p < .1, *p < .05, **p < .01, ***p < .001.
Each of the LIDO Student Moves is significantly positively correlated with CLASS-IS and almost all CLASS dimensions (35 of 36). Correlations are in the low range; however, the magnitude of the correlations varies, and close to half of the correlations are between .2 and .3. The LIDO student moves that have the highest correlations with CLASS-IS are the more common moves: student extended utterances (S5, r (641) = .27, p < .001), students making claims with evidence (S3, r (641) = .26, p < .001), and student minimal utterances (S6, r (641) = .24, p < .001). The magnitude of correlations with individual CLASS dimensions varied across student moves. Student moves involving student to student interactions (S1 and S2) and making claims with evidence (S3) are most highly correlated with the CLASS Instructional Dialogue (r (641) = .12, p < .001; r (641) = .20; r (641) = .25, p < .001) and Analysis & Inquiry (r (641) = .15, p < .001; r (641) = .23, p < .001; r (641) = .25, p < .001) dimensions. S4 (student questions) has the lowest correlations overall, though the correlations of greatest magnitude were with the CLASS dimensions Quality of Feedback (r (641) = .16, p < .001) and Content Understanding (r (641) = .15, p < .001). Student extended utterances (S5) are most highly correlated with CLASS Quality of Feedback (r (641) = .28, p < .001) and Instructional Dialogue (r (641) = .21, p < .001). Lastly, student minimal utterances (S6) have the highest correlations with CLASS Quality of Feedback (r (641) = .28, p < .001) and Content Understanding (r (641) = .25, p < .001) scores. Overall, CLASS Instructional Learning Formats had the lowest correlations with LIDO student moves.
Each of the LIDO Teacher Moves is significantly positively correlated with CLASS-IS and all CLASS dimensions, with correlations in the low range and the magnitude of correlations generally being lower than correlations between Student Moves and CLASS scores. The largest correlation with CLASS-IS is with T2 (teacher press for reasoning, r (641) = .25, p < .001). In terms of individual CLASS dimensions, correlations varied across Teacher Moves. T1 (prompting student to student interactions) is most highly correlated with CLASS Analysis & Inquiry (r (641) = .16, p < .001), Quality of Feedback (r (641) = .15, p < .001), and Instructional Dialogue (r (641) = .15, p < .001). The highest correlations for both T2 (press for reasoning) and T3 (active listening) are with CLASS Quality of Feedback (r (641) = .27, p < .001; r (641) = .19, p < .001) and Instructional Dialogue (r (641) = .23, p < .001; r (641) = .18, p < .001). Contestable question (T4) is most highly correlated with CLASS Analysis & Inquiry (r (641) = .22, p < .001) and Instructional Dialogue (r (641) = .18, p < .001), and the highest correlation for semi-open question (T5) is with Instructional Dialogue (r (641) = .18, p < .001). Lastly, T6 (quiz-like questions) is most highly correlated with CLASS Quality of Feedback (r (641) = .23, p < .001) and Content Understanding (r (641) = .22, p < .001) and has the lowest correlation with Analysis & Inquiry (r (641) = .06, p = .036). The CLASS dimension that, in general, has the lowest correlations with LIDO teacher moves is Instructional Learning Formats, the dimension that is least dialogic.
Discussion
This paper introduces the LIDO, a low-inference tool that captures discourse moves produced by students and teachers in whole-classroom discussions. While the LIDO can be used across all grade levels, we present validation results using a sample of 643 lessons recorded in classrooms of early adolescents in grades four through seven. Unlike high-inference discourse measures, the LIDO involves counting specific dialogic moves, which results in coding with less subjectivity than other approaches; coders’ judgments are based on structural characteristics rather than deeper contextual interpretation. Importantly, while the LIDO does not capture information about the quality of discourse practices (and should not be used to evaluate teacher performance), it provides important information about both students’ and teachers’ potential range of dialogic discourse in a given discussion. Capturing student and teacher talk moves allows for analysis of the relationships between specific student- or teacher-produced talk moves and post-discussion student outcomes such as reading comprehension or persuasive writing (Al-Adeimi & O’Connor, 2021).
When coding with the LIDO, dialogic moves are counted in one-minute increments, meaning that the LIDO is a flexible tool that can be applied to lessons or observations of variable lengths. Raw count data collected with the LIDO are converted to rate data by dividing the total counts for a given observation by the total number of minutes. Given the patterns of low- and high-frequency moves across observations (i.e., some moves are particularly rare relative to others) and given that some moves are more likely to contribute to higher quality discourse, LIDO student and teacher moves should be analyzed as separate dimensions and not used to create composite student or teacher scales. Our analyses also revealed that, as expected, the count data collected with the LIDO are highly positively skewed with an overabundance of zeros or non-occurrences. Future research that utilizes LIDO scores as predictors or outcomes will need to utilize models that are appropriate for count data and address the skewness.
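The count-to-rate conversion and the two distributional features noted above (excess zeros and positive skew) can be illustrated with a minimal sketch. It assumes that each observation is stored as a list with one count per coded minute for a single move; the function names and all values are hypothetical and are not the study's data.

```python
def moves_per_minute(minute_counts):
    """Total count for one move divided by the number of coded minutes."""
    if not minute_counts:
        raise ValueError("an observation needs at least one coded minute")
    return sum(minute_counts) / len(minute_counts)


def share_of_zeros(values):
    """Proportion of observations in which the move never occurred."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Moment-based skewness (third central moment over variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented per-minute counts of one move in a ten-minute observation.
one_lesson = [0, 2, 1, 0, 0, 3, 0, 0, 1, 1]
rate = moves_per_minute(one_lesson)  # 8 moves over 10 minutes

# Invented total counts of a rare move across ten observations.
totals = [0, 0, 0, 1, 0, 2, 0, 0, 7, 0]
zero_share = share_of_zeros(totals)
skew = skewness(totals)  # positive: a long right tail
```

A large share of zeros together with positive skew is the kind of pattern that motivates count models with an explicit zero component; which model fits a given dataset is an empirical question not settled by this sketch.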
Validity of the LIDO scores was established by examining, first, the relationships among LIDO Student and Teacher moves and, second, the patterns of correlations between LIDO scores and scores obtained by applying another observational tool to the same data, specifically the Instructional Support domain of the CLASS. As expected, correlations between different LIDO teacher moves are small, but significant and positive (except the correlation between quiz-like questions and contestable questions), indicating that teacher moves are related but also distinct constructs. Similarly, LIDO student moves are significantly and positively correlated with each other, primarily in the low range, suggesting that student moves are related but distinct constructs. However, we had expected these correlations to be a mix of low and moderate levels, and we were surprised to find that almost all of the coefficients were less than .3; it was instead the correlations between teacher and student moves that showed a mix of low- and moderate-level coefficients.
In terms of correlations between teacher and student moves, we hypothesized that higher rates of certain teacher moves would be associated with higher rates of certain student moves, and in fact, correlations between teacher and student scores were largely significant and positive, at low to moderate levels, with the strongest correlations between moves that would be expected to have stronger relationships. For example, the highest correlation is between students’ minimal utterances (S6) and teachers asking quiz-like questions (T6), which are more likely than open-ended questions to result in a yes, no, or other short answer. In practice, this reinforces the hypothesis that lower-order questions prompt lower-order responses from students. Meanwhile, students’ extended utterances (S5) were most strongly correlated with more dialogic teacher moves (asking semi-open questions (T5), active listening (T3), and teacher presses for reasoning (T2)) and least strongly correlated with teachers asking quiz-like questions. Similarly, a significant moderate correlation was found between teachers prompting student to student connections about content (T1) and students referring to each other’s responses or contributions (S2). Previous research has found that when teachers cultivate dialogic environments, student outcomes improve, including persuasive writing (Al-Adeimi & O’Connor, 2021) and end-of-year exam performance (Howe et al., 2019). While the correlations in this study do not establish a link between teacher talk and student outcomes such as writing or comprehension, they show the importance of dialogic teacher practices for establishing a dialogic environment in which students also engage dialogically, which can be considered an outcome in and of itself.
We hypothesized that correlations between CLASS and LIDO scores would be low to moderate, and, in fact, correlations were primarily in the .1 to .3 range. This indicates that the LIDO is, to some degree, assessing classroom interactions similarly to the CLASS; at the same time, LIDO scores clearly represent classroom talk constructs that are distinct from the CLASS, in particular providing specific measurements of teacher and student dialogic moves in a given classroom lesson. CLASS dimensions have higher correlations with certain student and teacher moves in patterns that were expected given that those moves (e.g., teachers pressing for reasoning or asking open-ended questions or students making claims with evidence) are more likely to occur in classrooms considered to have higher quality interactions and instructional support. As expected, correlations are lowest with the CLASS dimension Instructional Learning Formats, which focuses least on verbal interaction and instead centers on teachers’ promotion of student engagement by providing interesting lessons and materials.
As for LIDO student moves, correlations with CLASS scores are highest for LIDO S3s (claims with evidence) and S5s (extended utterances) which are moves that can suggest high quality interactions. We find lower correlations with CLASS scores for S1s (direct student to student talk), S2s (indirect student to student talk), and S4s (students asking content-related questions). These are the rarest of the student moves and perhaps least likely to overlap with CLASS dimensions, which do not emphasize these types of student behaviors as indicators. The correlations for S6s (minimal student utterances, such as “yes,” “no,” or “I don’t know”) include a mix of higher and lower correlations. For example, S6 has higher correlations with Content Understanding (such minimal student utterances are likely more common when teachers may be checking for comprehension) and Quality of Feedback (although the goal is feedback loops, it’s likely that those loops include a mix of minimal and extended utterances). In contrast, S6 had a smaller correlation with Analysis and Inquiry which would be expected since this CLASS dimension is focused on student problem-solving and reasoning, suggesting a stronger conceptual overlap with LIDO S3s (claims with evidence) and S5s (extended utterances) which, as stated above, was the pattern observed.
Across LIDO teacher moves, correlations with CLASS dimensions were primarily as expected. For example, the CLASS dimension Instructional Dialogue assesses the frequency of cumulative and content-driven exchanges, as well as dialogue facilitation strategies, and is perhaps closest in content to the LIDO. This dimension has higher correlations with T2s (press for reasoning), T3s (active listening), T4s (asking an open-ended, contestable question) and T5s (asking a semi-open, how question), which are moves that can suggest high quality interactions. In contrast, T1 (prompting student to student interactions about content) has somewhat lower correlations with CLASS scores in general. This is the rarest teacher move and has the least in common with CLASS dimensions, with none of the CLASS dimensions focusing explicitly on this teacher behavior as a coding indicator. The correlations for T6 (quiz-like questions) include a mix of higher and lower correlations. For example, T6 has higher correlations with Content Understanding (quiz-like questions are a common strategy to check for comprehension) and Quality of Feedback (to extend student learning, it may be common for teachers to start with a quiz-like question, but expand the learning with feedback loops). In contrast, T6 had a smaller correlation with Instructional Dialogue and with Analysis and Inquiry, which would be expected since higher scores on these CLASS dimensions would require greater use of open-ended and semi-open-ended questions (T4s and T5s), both of which do have higher correlations with Instructional Dialogue and Analysis and Inquiry.
Lastly, we found that, in general, individual LIDO Student Moves have larger correlations than individual Teacher Moves with CLASS scores, which was surprising given that teacher behavior is more heavily emphasized than student behavior in most CLASS dimensions. Nevertheless, the general patterns seen in the correlations between LIDO and CLASS scores follow patterns that support the validity of the LIDO.
Limitations and Future Directions
Since the current study evaluated the validity and reliability of the LIDO in grades 4–7 and in primarily low-income, urban schools, future studies should use the LIDO with a wider range of grade levels and socioeconomic indicators in order to examine how well the LIDO performs in other settings. We also anticipate higher inter-rater reliability when using video data and certainly when using transcripts for coding. Higher reliability is also expected with a larger sample size that includes more instances of the rarest moves; that is, selecting classrooms with an over-representation of the rarest teacher move (T1) could generate higher inter-rater reliability for coding this move. Since the time of this study, auto-transcription software and transcription services have become much more widely accessible and less expensive, thereby allowing large projects such as ours to utilize transcripts for coding, which would have also allowed us to include more data for inter-rater reliability. Nevertheless, this study demonstrated the viability of coding directly from audio files, not only for the LIDO, but also for the Instructional Support domain of the CLASS. Like coding from video, coding from audio files also provides some substantial advantages over live observations, such as the ability to pause or slow down the recording and to rewind and replay sections that are more difficult to code, such as ones with overlapping talk or rapid exchanges. In general, it is not advisable to use the LIDO to code live observations since the decision flowchart requires time to analyze each move for the best fitting code, and since coding teacher and student moves simultaneously is difficult to achieve with a high degree of accuracy.
Importantly, while the LIDO assesses moves that capture the range of dialogicality, it is not a measure of quality per se and should therefore not be used for evaluative purposes (e.g., teacher evaluations). That said, some teacher utterances captured by the LIDO are hypothesized to build students’ critical thinking skills (open-ended, contestable questions), while others (e.g., quiz-like questions) are better-suited for recall rather than building higher-order skills (Alexander, 2008; Cazden, 1988). On the other hand, student utterances captured by the LIDO can be dialogic in form (e.g., students speaking to one another about content), but the LIDO does not capture the quality or outcome of such interactions. In addition, S6 codes (minimal utterances) and T6 codes (quiz-like questions) represent moves that can be more complicated to interpret. These moves can be effectively interwoven in high quality classroom discussions, depending on frequency of other moves, but when S6s and T6s are present with few to none of the other dialogic moves, the lesson observed may be more “monologic” in nature and may be failing to provide conditions for higher quality discourse. Future studies can explore scoring that examines T6 and S6 codes separately and identifies profiles of classrooms and the conditions (frequency of other moves) under which higher or lower frequency levels of T6s and S6s support high quality discourse and student learning.
Despite these limitations, the LIDO offers several advantages that may prove useful in future research on classroom talk. First, the LIDO produces scores for six distinct moves of both students and teachers, allowing one to assess, for example, whether a new curriculum or teacher training leads to changes in teachers’ use of open-ended, contestable questions versus quiz-like retention questions. In addition, because LIDO data are collected by counting the occurrences of different student and teacher moves, the scores created reflect measurable quantities rather than Likert-type scale ratings, which may require more inference. Also, the LIDO has been validated with data collected through audio recordings, and the types of student and teacher moves that are tallied are fully observable from recordings of verbal utterances in the classroom. Live classroom observations are notoriously challenging and expensive to conduct. Video recordings can be equally challenging and costly, and they may also fail to capture the typical dynamic in a classroom because classroom composition changes when some, if not many, parents decline permission for their child to be video-recorded. Lastly, with the recent rise in online instruction, a tool that measures classroom talk purely from audio may in fact be in high demand as the education field adapts to this new reality and works to build, refine, and assess practices that facilitate productive classroom talk during synchronous online instruction.
Conclusion
In sum, this study demonstrates evidence of both internal and convergent validity of the LIDO, a flexible observation tool that can be used to observe a variety of classroom settings (e.g., different grades, different subject areas), requires access only to audio of classroom lessons, and produces data on different discourse moves. Another major contribution of the LIDO is that it includes scores representing student behaviors distinct from teacher behaviors, allowing future studies to investigate how individual teacher and student moves can contribute differently to the dialogicality of discussion. Widely used tools that assess classroom interactions, such as the CLASS, are particularly useful for providing a holistic view of classroom interaction quality, but do not allow teacher and student moves to be teased apart and examined independently. In this way, the LIDO may be especially useful for developmental research in schools. For example, peer interactions are an important aspect of developmentally responsive classrooms, particularly in early adolescence (Eccles & Roeser, 2011). The LIDO includes codes for moves in which students directly and indirectly address their peers. Such peer interactions are not typically assessed as part of productive classroom talk, but developing communication skills with peers in the context of academic discourse (e.g., “I disagree with Sarra because…”) may have broad implications for future success in both educational and professional settings. Additionally, individual moves can be connected to student outcomes, as was done in a recent study that examined teacher talk moves in relation to students’ writing (Al-Adeimi & O’Connor, 2021).
Conducting research in school settings is notoriously complex. The field of educational research needs a range of tools to best assess what is working, how it works, and under what conditions. Some studies require a global assessment, while others demand micro-level data on very specific skills or behaviors. Most studies also must contend with logistical challenges, such as the impracticality of live observations or video recordings of classroom interactions. The LIDO offers flexibility and strengths that address these needs and challenges, and should prove useful to advancing our understanding of the conditions necessary for higher quality discourse that supports student learning and success. In particular, the success of today’s youth depends on essential 21st century skills, such as critical thinking, communication, and problem-solving (Partnership for 21st Century Learning, 2016), skills that are often developed through a complex interplay of student and teacher discursive behaviors that contribute to dialogically complex and engaging classrooms.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by the Institute of Education Sciences (R305F100026).
Appendix
Decision flowchart for coding using the teacher LIDO.
Decision flowchart for coding using the student LIDO.
