Abstract
It has been common to use video records of instruction in teacher professional development, but participants have rarely been encouraged to evaluate teachers’ and students’ actions in those records, allegedly because evaluation detracts from the development of a professional discourse. In this study, we inspected teachers’ online discussions of animations of classroom episodes realized with cartoon characters. We compared the content of conversation turns in which members made evaluative comments with the content of turns in which they did not. We were interested in finding out whether making evaluative comments correlated with participants’ reflection on their professional practice and with their proposal of alternative teaching actions. For that purpose, we used systemic functional linguistics (SFL) to develop a coding scheme that attended to evaluation, alternatives, and reflection in forum discussions. We found statistically significant evidence that the more the participants actively evaluated the teaching in the animations, the more they proposed alternative teaching actions and reflected on instructional practice. We relate these findings to the notion of social presence in online discussions.
Introduction
Both online and off-line discussions have been commonly used for teacher learners to exchange ideas and learn from one another about their professional practice (Barab, Kling, & Gray, 2004; Fishman & Davis, 2006). Shared resources (e.g., specific artifacts that illustrate or elicit professional knowledge) are instrumental in making members’ conversations meaningful by inviting them to express personal ideas, ask important questions, comment on others’ ideas, and so forth (Wise, Padmanabhan, & Duffy, 2009).
Video records of classroom interaction have often been used as shared resources to create face-to-face conversation contexts in which both preservice and inservice teachers can develop and elicit their professional knowledge and skills (e.g., Nachlieli, Herbst, & González, 2009; Star & Strickland, 2008; van Es & Sherin, 2008). Researchers and teacher developers have also exploited information and communication technologies to sustain teachers’ online discussions, helping teachers exchange, share, and learn practical knowledge and skills with one another and with educational researchers and teacher educators (Barab et al., 2004; Fishman & Davis, 2006).
Although video records of practice have been and continue to be useful in supporting teachers’ discussions about their professional practice, members of discussion groups have rarely been encouraged to evaluate teachers’ and students’ actions in this kind of representation of practice, mainly because of the perception that such evaluations might detract from the development of a professional discourse (Jacobs, Borko, & Koellner, 2009; Seago, 2004). As a result, researchers of video-based professional development have had little chance to examine the role of evaluation in teachers’ discussions. Recently, researchers have started to use animations of classroom scenarios to sustain teachers’ conversations about instructional practice (e.g., Herbst & Chazan, 2006; Herbst & Miyakawa, 2008; Herbst, Nachlieli, & Chazan, 2011; Moore-Russo & Viglietti, 2011; Moore-Russo & Wilsey, 2014; Tettegah, 2005). In our studies with animated classroom stories (Chieu, Aaron, & Herbst, 2013; Chieu & Herbst, 2013; Chieu, Herbst, & Weiss, 2011), we have observed that participants frequently evaluated the actions of the cartoon teacher. More important, we have noticed that making such evaluations can be positive in that it may go along with participants’ reflection on teaching actions or discussion of possible alternatives in teaching. This article investigates the role of evaluation in the postings made in a set of eight online forums associated with animated classroom stories by comparing the quality of postings in which participants made evaluative comments with the quality of postings in which they did not. Two of the desirable features of teacher discourse identified by Herbst and Chazan (2006), reflection and alternativity, were operationalized to account for the quality of postings.
We examined the correlations between evaluation, as an indicator, and the probability of two features of professional discourse about practice: reflection (whether or not the participants reflect on teaching actions that they notice in the classroom stories embedded in their discussion space) and alternativity (whether or not they propose alternative teaching actions when they discuss a teaching decision in the embedded animations).
Theoretical Framework and Related Work
Many kinds of technologies and communities have been implemented to support groups of both teacher candidates and practicing teachers in learning to do, or in improving how they do, the work of teaching (Barab et al., 2004; Fishman & Davis, 2006). In this section, we narrow our review to the use of technologies for sustaining collaborative learning by teachers. We especially look at how other researchers in the field study qualities of discussions in teacher learning, with particular attention to correlations between evaluative comments and those qualities.
Video technologies have been a common choice for teacher educators to support teachers’ reflection and their learning to notice and interpret critical areas of classroom interaction (Rich & Hannafin, 2009; M. G. Sherin, Jacobs, & Philipp, 2011). Teachers have often been organized into face-to-face groups, with one or two facilitators, to view, examine, and discuss video records of teaching (e.g., Herbst & Chazan, 2003; Nachlieli, 2011; Star & Strickland, 2008; van Es & Sherin, 2008; Zhang, Lundeberg, Koehler, & Eberhardt, 2011). Those artifacts have sometimes included video captures of the teachers’ own teaching, records of their peers’ instructional practice, or purportedly exemplary video records of teaching provided by third-party organizations (Zhang et al., 2011). An important characteristic of video records of classroom interaction is that they support teachers in examining the tactical and temporal entailments of instructional practice. The possibility of replaying videos can help viewers spot important moments of a teaching episode and examine and discuss teaching tactics and strategies, student thinking, and so forth (Lampert & Ball, 1998).
An important assumption beneath the practice of using media representations of teaching to provoke discussions about teaching, and eventually to increase teaching capacity, is that individuals who interact with these media can become engrossed in the actual practice being represented. Communication theorists have developed a variety of conceptions of presence (including social presence) to operationalize different ways in which individuals live the illusion that a mediated experience is not mediated (Lombard & Ditton, 1997; Oztok & Brett, 2011). In the case of teachers interacting with a representation of teaching and with each other, it is important to identify dimensions of those interactions that might contribute to such a sense of presence. Herbst and Chazan (2006) proposed a number of those dimensions, including alternativity (i.e., the capacity to consider alternative actions of teaching) and reflection (i.e., the capacity to inquire or speculate on reasons for or consequences of actions in teaching), as important qualities to look for in teachers’ discussions of representations of teaching; we argue that these qualities are indicators of social presence.
Many studies in the literature on teacher learning have acknowledged the importance of supporting participants’ reflection (Rich & Hannafin, 2009; Schön, 1983; B. L. Sherin et al., 2010; Zhang et al., 2011) and alternativity (Nespor, 1987; Scott, 2005; Wilkins, 2008). Yet evaluation has not been considered systematically, and scholars do not seem to agree on the value of engaging viewers of classroom videos in evaluating features of classroom interactions. On the one hand, the video club study (van Es & Sherin, 2008) included evaluation as an outcome variable in its coding scheme. Also, Males, Otten, and Herbel-Eisenmann (2010) used a critical colleagueship framework in a way that promotes critical reflection and avoids damaging personal relationships within a group of mathematics teachers. On the other hand, in some teacher development environments where teachers view each other’s video records, facilitators often discourage participants from evaluating others’ teaching, allegedly to promote sensitivity and professional discussion (Jacobs et al., 2009). For example, LeFevre (2004) has emphasized “the importance of relational aspects” (p. 254) in facilitating discussions about classroom videos of unknown teachers and students and noted the importance of enabling “teachers to be critical about teaching in a non-judgmental manner” (p. 254). Similarly, Barnett (1987) and Joyce and Showers (1980) have encouraged nonevaluative feedback among peers in the coaching of leadership and teaching. Knight (2006) has also stated, “Coaching is a non-evaluative, learning relationship between a professional developer and a teacher, both of whom share the expressed goal of learning together, thereby improving instruction and student achievement” (p. 36). Another study (Kelchtermans & Vandenberghe, 1994) collected field notes of classroom teaching and interviews with individual teachers to create, and give feedback on, a professional development story for each teacher.
A rule of thumb for the researcher who commented on the story was to avoid judgments and to adopt a nonevaluative attitude. In addition, even when the facilitator does not explicitly discourage evaluation, participants in teachers’ discussions about their professional practice have been reluctant to evaluate the teaching practice recorded on video (Seago, 2004; Zhang et al., 2011).
The tendency to avoid judgments and adopt a nonevaluative attitude has existed not only in face-to-face conversations but also in online discussions. For instance, studies of web-mediated consultation conditions (Hadden & Pianta, 2006; Pianta, Mashburn, Downer, Hamre, & Justice, 2008) in teacher development have recommended “establishing a non-judgmental and non-evaluative supportive relationship” between a teacher and a consultant. Stiler and Philleo (2003) adapted strategies from face-to-face discussions to design a reflective, nonevaluative environment that supported candidate teachers in creating web-based journals or blogs about the practice of teaching. To foster communication and reflective inquiry about different perspectives on the realities of classroom practice, Edens (2000) engaged teacher education students in a nonevaluative setting where they could share their observations and concerns comfortably. Colasante (2011) also promoted a nonjudgmental or “safe” environment in which preservice teachers in physical and sport education used an online media annotation tool to reflect, individually or collectively, on their teaching practice as captured in video records.
Although the literature reviewed above offers a number of arguments and evidence for promoting nonevaluative¹ comments in both face-to-face and online conversations about the work of teaching, there are important reasons to look for evaluation in teacher discussions in an online context where the main communication channel is text. In systemic functional linguistics (SFL), evaluation is the function accomplished by the elements of the appraisal system of language (Martin & White, 2005). Appraisal, in turn, is one of the systems of resources with which language realizes what Halliday calls the interpersonal metafunction of language (Halliday & Matthiessen, 2004), that is, how language enables speakers and writers to relate to their audience, particularly by engaging the audience (Martin, 2000). Appraisal theory would predict that more evaluation tokens in a text signal more attempts to engage the readers or listeners of the text. It could be argued that more engagement between writers and readers contributes to a sense of social presence in the forum; that is, the existence of tokens of evaluation in a posting could indicate that participants experience forum participation as a real conversation with others.
Thus, while the literature documents the importance of reflection and alternativity in forum conversations and other teacher development encounters, the value of evaluation remains controversial. A study of the relationships among evaluation, reflection, and alternativity could therefore help us understand participants’ sense of social presence in teacher forums.
This article investigates whether making evaluative comments correlates with making reflective comments and with proposing alternative teaching actions. We examine this question in the context of online forums hosted on the LessonSketch platform (see Herbst, Aaron, & Chieu, 2013), which we describe in more detail in the next section. These online forums used animated representations of classroom episodes in which the teacher and students were represented by nondescript cartoon characters. There has been little research on the use of animations in teacher development (Herbst, Nachlieli, & Chazan, 2011; Moore-Russo & Viglietti, 2011; Moreno & Ortegano-Layne, 2008; Tettegah, Whang, Taylor, & Cash, 2008). An important advantage of nondescript cartoon-based representations of teaching is that they make it easier for the audience to focus on practices (as opposed to the individualities of people and settings) and more comfortable for the audience to appraise the cartoon characters’ actions. An earlier study (Herbst & Kosko, in press; Kosko & Herbst, 2012) indicated that while participants’ levels of modality usage, which is part of the appraisal system in SFL, were in general similar when annotating either videos or animations, their level of normativity usage (e.g., “the teacher should . . .”) was significantly higher when they watched and discussed animations than when they watched and discussed videos. If evaluative comments were connected to increased reflection and alternativity, animations might offer a useful alternative to video. Regarding video, Seago (2004) has noted that when teachers discuss video records, “politeness and agreement is the norm” and that teachers tend to handle differences with comments such as “everybody needs to teach according to his style” (p. 275).
But, as noted above, we consider it a key feature of learning from practice that participants be able to propose alternative actions and to reflect on instructional practice, and we conjecture that both abilities correlate with some degree of appraisal.
LessonSketch: A Web-Based Interactive Rich-Media Environment for Teacher Learning
LessonSketch is a web-based, interactive rich-media environment that supports collaborative learning for teachers (Chieu & Herbst, 2012; Herbst, Aaron, & Chieu, 2013; Herbst, Chazan, Chen, Chieu, & Weiss, 2011). Its design is grounded in activity theory (Engeström, 1999; Kaptelinin & Nardi, 2006; Leontiev, 1978; Vygotsky, 1978) and practice-based perspectives on teacher education (Ball & Cohen, 1999; Grossman et al., 2009; Lampert, 2010). A key characteristic that distinguishes LessonSketch from other video-based learning environments for teachers is its use of cartoon-based representations of teaching: animations and storyboards in which cartoon characters represent scenarios of classroom interaction (Herbst & Chazan, 2006; Herbst, Chazan, et al., 2011; Herbst & Miyakawa, 2008). Nondescript cartoon characters make it possible to design and create representations of teaching very flexibly. These representations can profit from some of the advantages of written narrative cases, such as the possibility of representing with icons the individualities and settings involved and thus of controlling how much of those is relayed to, versus evoked from, the audience. Animations of cartoon characters can also profit from some of the advantages of video cases, such as the possibility of communicating multimodally (e.g., using gesture, facial expression, and body movement and position in addition to language) and of involving the audience in a temporality (i.e., a sense of how time flows) and timeliness (i.e., a sense that actions happen at the moment when they are needed) commensurate with that of real action (Herbst, Chazan, et al., 2011), without video’s “multiple channels of distractions” (Goldsmith & Seago, 2011, p. 184).
LessonSketch’s users can use advanced communication tools to engage in collective and collaborative reflection. Unlike traditional, text-based communication tools (Barab et al., 2004; Fishman & Davis, 2006), LessonSketch can embed representations of teaching (e.g., animations, storyboards, videos) as shared artifacts in the virtual discussion space to enhance online professional conversations (see Figure 1 for an example). This matters because participants who refer to shared artifacts and focus on learning content are more likely to produce meaningful, in-depth discussions (Chieu et al., 2011; Neale, Carroll, & Rosson, 2004; Wise et al., 2009). We contend that LessonSketch’s cartoon-based artifacts and tools play a crucial role as “mediators of cognition,” helping teacher users externalize their thoughts and ideas about instructional practice and thus develop shared goals and understandings (Engeström, 1999). This article investigates, in part, the nature of online discussions conducted through LessonSketch’s advanced communication tools.

Figure 1. Part of LessonSketch’s communication tools: Embedding an animation into the discussion space. © 2014, The Regents of the University of Michigan. Used with permission.
Research Design
Research Questions
We conjecture that when teacher users evaluate classroom events (i.e., when they explicitly make evaluative comments on classroom events within a public conversation space), they are more likely to propose alternative actions for the animated teacher or students and to interpret or reflect on their professional practice than when they do not evaluate classroom events. An important goal for research has been to verify the presence of those qualities in collaborative, professional learning activities. In this study, we therefore consider reflection and alternativity as two key desirable features of teacher discourse. The main aim of this study is to investigate the correlations between evaluation, as an indicator, and those two discursive features. More specifically, we focus on the following questions: Are there any associations between the observation that participants evaluate events in embedded animations and (a) the observation that they anticipate alternative actions by the teacher or students, and (b) the observation that they interpret or reflect on what they notice? What is the nature of those associations, if any? Does the way individual participants post in forums (i.e., the frequency of their forum posting) have any significant effect on those correlations?
Settings, Participants, and Procedure
In the fall of 2009, a mathematics teacher educator at a university in the eastern United States asked us to create eight online sessions for a class on geometry instruction. Each online session was a structured exploration and discussion of one or several versions of animated classroom stories (each story has a main branch and sometimes a number of variations, for example, with alternative endings). More specifically, each session consisted of the following consecutive activities: (a) an individual exploration in which participants were asked to view and comment on one or more versions of an animated story, and (b) a forum discussion of those animations. We created a discussion thread for each animation version. The user interface of those threads was similar to the one presented in Figure 1; its main feature was an animation embedded directly on the left-hand side of the discussion space, which was organized in a tree-based format. Figure 1 shows the discussion of Version A (a main branch) of the “Chords and Distances” story, in which the teacher asks students to work in groups to form conjectures about circles, chords, and their distance to the center of the circle. Table 1 shows the use of animated stories over the one-semester class. One session was used in each of the 8 weeks; all participants took part in the same session at any given time (i.e., there was only one group of participants in this study); and participants usually had almost the whole week after the face-to-face class meeting held every Monday to post messages. Twenty-one participants (11 teacher candidates and 10 novice teachers; 16 females and 5 males) enrolled in the course. None of the teacher candidates had full-time classroom teaching experience, though some had temporary teaching experience. The novice teachers had full-time teaching jobs, with no more than 3 years of teaching experience each.
The forums were not moderated; the teacher educator read participants’ postings but did not comment on them or give feedback, and she neither encouraged nor discouraged evaluation by the participants. The participants were informed that the teacher educator would grade the thoughtfulness and insight of their forum postings at the end of the course. However, the teacher educator was not involved in the present study, whether in analyzing data or in interpreting or reporting results.
Table 1. Use of Animation Versions in Eight Weekly Online Sessions.
In designing the animated classroom scenarios, we created for each instructional story a number of critical events to prompt participants’ conversations about teaching practice; by critical events, we mean moments in which instructional norms are breached (Herbst & Chazan, 2003; Herbst, Nachlieli, & Chazan, 2011). For example, a norm of how proof tasks are assigned in American high school geometry lessons is that, when giving students a problem in which they are expected to produce a proof, the teacher provides clear statements of the givens and the conclusion to prove. A breach of this norm would be instantiated if the teacher did not provide those statements (Herbst, Aaron, Dimmel, & Erickson, 2013).
Data Sources and Data Analysis
To respond to the questions above, we collected all forum logs. As a rule, we took the posting as the unit of analysis, assuming that each posting contained a single contribution to the discussion. Each posting was coded for the presence or absence of the characteristics of interest. The only exception was the case in which a posting included more than one paragraph and the paragraphs were connected by explicit markers of contrast in the form of internal conjunctions, such as “on the other hand.” Such markers suggest that the posting might include more than one contribution to the discussion, so each time such a marker occurred, we started a new unit of analysis. This is reasonable because some members may prefer to use that kind of marker to connect ideas within one posting instead of separating ideas into multiple postings.
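The segmentation rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure the coders actually applied: the function name segment_posting is hypothetical, only “on the other hand” comes from the study (the other markers are illustrative), and the sketch assumes that a contrastive marker, when present, opens the paragraph it connects to the preceding one.

```python
import re

# Hypothetical marker list: only "on the other hand" is named in the study;
# the other entries are illustrative assumptions.
CONTRAST_MARKERS = ("on the other hand", "however", "in contrast", "by contrast")


def segment_posting(text):
    """Split one forum posting into units of analysis.

    By default the whole posting is a single unit. A new unit starts only at
    a paragraph break where the next paragraph opens with a contrastive
    internal conjunction (assumed position of the marker).
    """
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return []
    units = [paragraphs[0]]
    for paragraph in paragraphs[1:]:
        if paragraph.lower().startswith(CONTRAST_MARKERS):
            units.append(paragraph)  # contrast marker: open a new unit
        else:
            units[-1] += "\n\n" + paragraph  # no marker: same contribution
    return units


posting = (
    "The teacher never stated the givens.\n\n"
    "She also rushed the students.\n\n"
    "On the other hand, the groups produced good conjectures."
)
for unit in segment_posting(posting):
    print(repr(unit))
```

Here the first two paragraphs stay in one unit and the third becomes a second unit. A plain prefix check is deliberately crude; human judgment would still be needed to confirm that a marker really links two distinct contributions.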
We used elements of SFL (Halliday & Matthiessen, 2004; Martin & Rose, 2007) to code text-based conversations in forums (we describe this coding in the next subsection). SFL provides the basis for an operational framework with which we can identify, for example, where participants made evaluations of teaching or where they reflected on the work of teaching. SFL has been increasingly used in education research (Martin, 2001; Schleppegrell, 2012a, 2012b).
Because, as a rule, each individual made several postings, we did not consider postings independent of each other. Thus, we used Hierarchical Generalized Linear Modeling (HGLM, described below), a particular form of multilevel modeling, to handle the structure in which postings were nested in individual participants. Multilevel models are powerful ways to deal with nested data (e.g., students nested in classrooms, classrooms nested in schools, schools nested in districts). Recently, a number of studies have used multilevel models to analyze data from online discussions (see Cress, 2008, for a more extensive review). Note that while individuals are typically nested in groups in multilevel models, a three-level model (postings nested in individuals and individuals nested in groups) would be more appropriate for analyzing conversation data; because we had only one group, we did not consider the individual-group nesting structure in our analysis. In addition, postings are also nested in the discussion threads that participants may create in forums, and threads are nested in forums. However, because the participants in this study used only a small number of threads and forums that the teacher educator had initiated in advance, we did not include crossed random effects of individuals and threads in our models, and we did not treat forums as an additional level in the nested structure of the data. More explicitly, our sample included 21 forum participants making 723 posts over an 8-week period. Posts were made in response to parent posts from the instructor, and although it would have been ideal to account both for posts being made by individuals and for posts being made in response to particular posts, we were limited by our sample size. Our decision to treat postings as nested within individuals allowed for an examination of important features of the online discussion.
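For readers less familiar with HGLM, a two-level random-intercept logistic model of the kind just described can be written as follows for a binary outcome such as reflection. This is a generic sketch, not necessarily the exact specification estimated in this study: the symbols are our own, and the participant-level predictor FREQ (posting frequency, motivated by the third research question) is included only for illustration.

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} \sim \operatorname{Bernoulli}(\varphi_{ij}), \qquad
  \log\frac{\varphi_{ij}}{1-\varphi_{ij}} = \beta_{0j} + \beta_{1j}\,\mathrm{EVAL}_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{FREQ}_{j} + u_{0j},
  \qquad u_{0j} \sim \mathcal{N}(0, \tau_{00}) \\
& \beta_{1j} = \gamma_{10}
\end{aligned}
```

In this sketch, Y_ij is the 0/1 reflection (or alternativity) code of unit i posted by participant j, EVAL_ij is that unit’s 0/1 evaluation code, and the random intercept u_0j lets each participant have a different baseline propensity. exp(γ10) is then the odds ratio of reflection for units with versus without evaluation, for a given participant.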
However, we acknowledge the limitation that our analysis does not statistically examine the effects of postings also being nested within parent posts.²
While our statistical analysis does not fully account for the turn-based nature of discourse, treating postings as nested in individual participants assumes that, although a posting may be influenced in some fashion by others in a thread, it is the individual who is ultimately responsible for what goes into the posting. Assuming that postings depend only on the individuals who make them has limitations, but the assumption is not unlike what is often assumed about classroom discourse: Individual students are assumed responsible for their actions and words, although students in fact interact with one another and such interactions might also influence how individuals perform in those classrooms. Just as it is acceptable to assume a reasonable level of independence for the sake of parsimony in such cases (students nested within classrooms), we apply the same logic to online forums (postings nested within individuals). This is not to say that examining the turn-based, interactive nature of discourse is irrelevant; in fact, we consider it important to explore, but given the nature and size of our present sample, we take postings nested within individuals as a reasonable approximation of the phenomenon. Although we were not able to include both nesting structures (posts nested within individuals and posts nested within parent posts) in a single HGLM model, we performed a similar analysis in a preliminary study (Chieu & Herbst, in press) to investigate correlations between the presence of evaluation markers in a parent post and the presence of reflection or alternativity markers in a follow-up post.
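A parent/follow-up comparison of this kind requires pairing each reply with the post it answers. The minimal Python sketch below shows one way to build those pairs; the record layout (a hypothetical dictionary keyed by post id, with a "parent" field and 0/1 codes) and the function name are assumptions for illustration, and the preliminary study’s actual procedure is not described here.

```python
def parent_child_pairs(posts, parent_code="evaluation", child_code="reflection"):
    """Return (parent code, reply code) pairs for every reply in a forum.

    `posts` is a hypothetical mapping from post id to a record holding the id
    of the post it replies to ("parent", None for thread starters) and 0/1
    codes for evaluation, reflection, and alternativity.
    """
    pairs = []
    for record in posts.values():
        parent_id = record.get("parent")
        if parent_id is None or parent_id not in posts:
            continue  # thread starters (and orphaned replies) have no parent to compare
        pairs.append((posts[parent_id][parent_code], record[child_code]))
    return pairs


# Hypothetical three-post thread.
forum = {
    "p1": {"parent": None, "evaluation": 1, "reflection": 0, "alternativity": 1},
    "p2": {"parent": "p1", "evaluation": 0, "reflection": 1, "alternativity": 0},
    "p3": {"parent": "p2", "evaluation": 1, "reflection": 0, "alternativity": 0},
}
print(parent_child_pairs(forum))
print(parent_child_pairs(forum, child_code="alternativity"))
```

The resulting pairs can then be cross-tabulated like any two binary codes.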
SFL and Coding Scheme
We incorporated SFL to code for lexical and grammatical elements in forum postings (Halliday & Matthiessen, 2004; Martin & Rose, 2007). We tracked uses of reference and substitution to detect cohesion in online discussions when needed (Schiffrin, Tannen, & Hamilton, 2003). SFL, which considers language as a social semiotic system, looks at the language choices people make to construe meaning: It describes, in particular, the grammatical and lexical choices available to construct discourse and the meanings that are constructed through those choices. One can better understand how people construct meanings by contrasting the options they choose against the other options they could have chosen.
According to Halliday (see Halliday & Matthiessen, 2004), speakers convey meaning by simultaneously drawing on the resources that language has available to fulfill three fundamental metafunctions of language: First, the ideational metafunction has to do with the language resources for construing experiences or ideas. Second, the interpersonal metafunction is concerned with the resources that language provides for creating and maintaining social relations between speaker or writer and listener or reader. Third, the textual metafunction is concerned with the resources that language has available to organize its products into texts of particular genres. Each text is composed of linguistic choices that perform those three metafunctions simultaneously.
Our coding scheme sought to track linguistically the properties of conversations about the animations that Herbst and Chazan (2006) had identified by observing face-to-face study groups. It also sought to track linguistically some of the codes (e.g., evaluative stance) used by van Es and Sherin (2008) and some of the codes (e.g., alternativity) we had proposed in an earlier study of LessonSketch (Chieu et al., 2011). This coding system attends to the three variables mentioned previously: evaluation, reflection, and alternativity. Improving on earlier usages of those codes, we operationalized them through attention to the linguistic choices participants made in the forums. Two coders coded postings independently. Doubtful cases, where the relevance to teaching practice was not obvious, were not coded. Aided by an SFL-based operationalization of the three codes, the two coders first coded 20 units independently. Then they reconciled their two analyses and revisited the coding process. They repeated the same procedure for another set of 20 units. When the two coders believed that the coding system was reliable, they independently coded one forum log (about 50 units). We used Cohen’s kappa statistic to determine interrater reliability. The kappas of the first coding round indicated moderate reliability, so the two coders reconciled all differing codes, continued to improve the coding scheme and their coding skill, and independently coded another forum log (about 50 units). In the second coding round, Cohen’s kappa was .66 for evaluation, .77 for alternativity, and .69 for reflection (p < .001). Kappas ranging from .01 to .20 are considered to indicate slight agreement, .21 to .40 fair, .41 to .60 moderate, .61 to .80 substantial, and .81 to 1.00 almost perfect agreement (Sim & Wright, 2005).
All three values therefore fall in the substantial range, suggesting good interrater reliability (Capozzoli, McSweeney, & Sinha, 1999; Sim & Wright, 2005). Finally, the two coders reconciled all differing codes and coded the remaining units, half each. We describe below how SFL helped us operationalize the three codes.
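To make the reliability statistic concrete, the Python sketch below computes Cohen’s kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected from each coder’s marginal frequencies, and maps the result to the verbal bands quoted above. The coder data are hypothetical, and the sketch does not reproduce the significance test reported in the text.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who coded the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(coder_a)
    # Observed agreement: share of units given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions per category.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if p_e == 1:
        return 1.0  # both coders used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


def agreement_label(kappa):
    """Map kappa to the verbal bands quoted in the text (Sim & Wright, 2005)."""
    if kappa <= 0:
        return "below slight"  # the quoted scale starts at .01
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"  # guards against rounding just above 1.0


# Hypothetical evaluation codes (1 = present) from two coders on 10 units.
coder_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
k = cohens_kappa(coder_1, coder_2)
print(f"kappa = {k:.2f} ({agreement_label(k)})")
```

With these made-up codes the coders agree on 8 of 10 units, but because both coded 1 on 6 units, chance agreement is already .52, so kappa is only about .58. This is why kappa, rather than raw percentage agreement, is the usual reliability index.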
For the evaluation code, we used Martin and White’s (2005) appraisal theory, which develops the systemic functional approach to describe the appraisal system as composed of the subsystems of affect, judgment, and appreciation.³ That approach helped us identify where participants made evaluations of teaching. According to Martin and White (2005), appraisals “. . . reveal the speaker’s/writer’s feelings and values . . .” (p. 2) and can be realized not only lexically (through word choice) but also grammatically (e.g., through the use of modals such as should in “she should not have . . .”). We counted participants’ evaluations of the teacher, students, objects, and actions in the animated classroom. A posting was coded 1 for evaluation if it contained at least one marker of such appraisal, and 0 otherwise. Table 2 shows examples of appraisal markers (adapted from Martin & White, 2005, Chapter 2).
Table 2. Examples of Markers for Codes.
For the reflection code, we used Halliday and Matthiessen’s (2004) notion of clause enhancement, with specific attention to manner and causal-conditional enhancements. According to Halliday and Matthiessen, “in enhancement, one clause . . . enhances the meaning of another by qualifying it in one of a number of possible ways” (p. 410). Manner enhancement qualifies meaning through comparison or through the means by which the process of one clause is enacted. In the example “[Students] could check their own logic and reasons by putting [the proof] into a two-column form,” the clause “[Students] could check their own logic and reasons” is enhanced with the information that this is done by using a two-column proof format. Causal-conditional enhancement modifies clauses through variations of logical connections (e.g., if P then Q; because of Q, so P, etc.). The example “So I don’t think we should teach to the test because it isn’t necessary” illustrates this form of enhancement, in which rationales are provided in varying orders and formats with indicators such as “so” and “because.” We took both forms of enhancement (manner and causal-conditional) as evidence of reflection because they demonstrate logical reasoning: providing rationales for actions, or the means by which actions take place (in the form of grammatical processes in a clause), is evidence of thinking about thinking and is therefore characteristic of reflection. A posting was coded 1 for reflection if it contained at least one marker of reflection on teaching practice, and 0 otherwise. Table 2 provides examples of reflection markers (adapted from Halliday & Matthiessen, 2004, Section 7.4.3).
For the alternativity code, we looked for teaching actions that should, could, or would have been taken in the animation. The participant’s use of modals (could, should, would) in reporting the proposed action can serve as a proxy for alternativity, though not a definitive one. We therefore used grammatical mood to determine whether the participants were talking about events that had happened or events that had not happened. In any case, the coder should be able to identify both the original action and the alternative action. A posting was coded 1 for alternativity if it contained at least one marker of it, and 0 otherwise. Table 2 shows examples of alternativity markers.
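All three codes share the same aggregation rule: a posting is coded 1 for a variable if it contains at least one marker of it, and 0 otherwise. The Python sketch below illustrates only that rule. It assumes that a human coder has already identified the markers, as in the procedure described above, and does not attempt to detect markers automatically; the `Posting` structure and the `binary_codes` function are hypothetical.

```python
from dataclasses import dataclass, field

CODES = ("evaluation", "reflection", "alternativity")


@dataclass
class Posting:
    """A forum posting plus the markers a coder identified in it (hypothetical structure)."""
    posting_id: int
    markers: list = field(default_factory=list)  # (code, quoted span) pairs


def binary_codes(posting):
    """Return 1 for each code with at least one marker in the posting, else 0."""
    found = {code for code, _span in posting.markers}
    unknown = found - set(CODES)
    if unknown:
        raise ValueError(f"unknown code(s): {sorted(unknown)}")
    return {code: int(code in found) for code in CODES}


# Marker spans borrowed from the examples in the text, combined only for illustration.
post = Posting(1, [("evaluation", "she should not have"),
                   ("reflection", "because it isn't necessary")])
print(binary_codes(post))
```

This prints `{'evaluation': 1, 'reflection': 1, 'alternativity': 0}`. A posting with several evaluation markers would still receive evaluation = 1, because the codes record presence rather than counts.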
Table 3 shows the final codes assigned to two typical postings. Note that, in addition to examining markers in a posting, the coder watched the animation embedded in each forum to understand the content of the classroom story the participants were discussing. The coder also read the parent posting (i.e., the posting to which the analyzed posting replied) to better understand the context of the posting.
Table 3. Assignment of Codes of Two Typical Postings in Week 4 (User 230 Replied to User 231). We Underlined Pieces of the Text That Indicate the Presence of the Code in Brackets.
Multilevel models
We used HGLM to examine the probability that forum postings included alternativity and reflection. HGLM extends hierarchical linear modeling to outcomes that are not normally distributed; for a dichotomous outcome, it models the log-odds (logit) of the outcome and uses these logits to estimate the likelihood of certain outcomes under given conditions. These estimates are generally limited to the samples examined. Within this study, our findings are limited to the participants engaged in the online forum, and our estimates concern the association between the likelihood that a post includes certain elements and the likelihood that it includes others. HGLM allows for the examination of nested data (postings nested within individuals) by creating regression equations for each level of analysis and using the coefficients of lower level regressions as the outcome measures for higher level regressions. Essentially, HGLM allowed us to specify a logistic regression equation of the same form for each individual participant, with all postings of that individual as the unit of analysis (logistic regression was used because the outcome variables were dichotomous). Using the HLM 6 program (Raudenbush, Bryk, & Congdon, 2004), we calculated statistical effects in the form of logits for each variable serving as an indicator, while adjusting for the variance within each individual who posted in the forum. These Level 1 coefficients (e.g., the logit associated with evaluation) were in turn used as outcome variables in Level 2 regression equations examining individual characteristics. We describe this model in detail in the following paragraphs (see Chapter 10 of Raudenbush & Bryk, 2002, for a complete description of the method).
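To make concrete how logits from such a model translate into probabilities, the Python sketch below evaluates a generic two-level random-intercept logistic model with one dichotomous Level 1 indicator (evaluation). The notation (eta_ij for the log-odds, gamma_00 and gamma_10 for fixed effects, u_0j for a participant’s deviation from the average intercept) and all coefficient values are assumptions made for illustration. They are not the equations or estimates of this study, whose models also consider Level 2 variables such as N_Posts and Status.

```python
import math


def inv_logit(eta):
    """Convert a logit (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(gamma_00, gamma_10, evaluation, u_0j=0.0):
    """Probability that a posting contains the outcome (e.g., reflection).

    Level 1: eta_ij = beta_0j + beta_1j * evaluation_ij
    Level 2: beta_0j = gamma_00 + u_0j, beta_1j = gamma_10 (slope fixed here)
    With u_0j = 0 the result is for a participant with an average intercept.
    """
    beta_0j = gamma_00 + u_0j
    beta_1j = gamma_10
    return inv_logit(beta_0j + beta_1j * evaluation)


# Hypothetical fixed effects, chosen only to illustrate the arithmetic.
g00, g10 = 0.5, 1.0
p_without = predicted_probability(g00, g10, evaluation=0)
p_with = predicted_probability(g00, g10, evaluation=1)
print(f"P(outcome | no evaluation) = {p_without:.3f}")
print(f"P(outcome | evaluation)    = {p_with:.3f}")
print(f"odds ratio = exp(gamma_10) = {math.exp(g10):.3f}")
```

With these made-up values, the probabilities are about .622 and .818, and the odds ratio is e ≈ 2.718. Because the model is linear on the logit scale, the effect of evaluation is constant as an odds ratio but not as a difference in probabilities, which is one reason effects from such models are often reported as odds ratios.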
We examined the association between making an evaluation of the teaching in the animation within a posting and the probability that the posting contained reflection (or, in a parallel analysis, alternativity). We outline the proposed final model below for the case with reflection as the outcome and evaluation as an indicator of the probability that reflection is present. The models using other variables (e.g., alternativity as the outcome) are identical in form, so for simplicity we limit our description to the following example:
Level 1 equation:
Level 2 equation:
The outcome measure of the Level 1 equation, π_ij, represents the transformed predicted value for whether or not reflection is present in forum posting i, made by individual j. At Level 1, error is measured by µ_ij for posting i nested within individual j. At Level 2, error is measured by
The coefficients at Level 1 are estimated as outcomes at Level 2, which represents data of the individual forum participant. By estimating the effects of Level 1 as outcomes at Level 2, we were able to account for some of the variance within individual participants, for example, their posting habits. We examined the effects,
Results and Discussion
Overall, 21 participants contributed 723 postings over eight online forum sessions. The mean number of posts per participant was M = 34.4 (Min = 21, Max = 46). For this design, the statistical power, estimated with Optimal Design 1.83 (Liu, Spybrook, Congdon, Martínez, & Raudenbush, 2007), was approximately .88. Although the estimated power is sufficient for our analysis, the results that follow must be considered in context: the sample includes 723 postings, but these come from only 21 participants. Thus, we make no claims that our findings generalize beyond this sample; we report only trends within it.
Participants often made evaluations of the animations (Figure 2) and reflected on what they noticed (Figure 3). They also proposed a large number of alternative actions for the instructional practice represented in the animations (Figure 4). These patterns indicate that forum conversations in each of the eight online sessions were highly interactive and that all members actively contributed to the discussions (see also Chieu & Herbst, 2011). The off-line class activities might have helped stimulate that level of interaction. We believe, however, that the embedded animations may also have contributed to it, because the teachers’ discussion was highly interactive and meaningful even in the first week, and the same phenomenon was observed in single-session studies that we conducted earlier (Chieu et al., 2011).

Figure 2. Percentage of posts that include evaluative comments over 8 weeks.

Figure 3. Percentage of posts that include reflective comments over 8 weeks.

Figure 4. Percentage of posts that include proposals of alternative teaching actions over 8 weeks.
Examining the coded data, we noticed that evaluation often co-occurred with alternativity and reflection in the same posting (see examples in Table 3). Next, we examine those correlations further using HGLM and show that evaluating features of classroom interactions was significantly correlated with the participants’ reflection on instructional practice and their proposal of alternative teaching actions.
Analysis of Reflection
Table 4 summarizes the frequencies of evaluation and reflection. The most frequent combination is evaluation = 1 and reflection = 1. Overall, about 77% of comments included reflection; among comments that included evaluation, however, that percentage rose to about 83%. The following HGLM analysis provides a better picture of the correlation between those two variables by accounting for the nested structure of the data (i.e., posts made by individuals).
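The overall and conditional percentages quoted above come from a 2 × 2 cross-tabulation of postings by evaluation and outcome. A minimal Python sketch of that arithmetic follows. The counts are hypothetical (they are not the Table 4 frequencies), and, unlike the HGLM analysis, these pooled proportions ignore the nesting of postings within participants.

```python
def outcome_rates(counts):
    """Overall and conditional outcome rates from a 2 x 2 table.

    counts maps (evaluation, outcome) pairs, each 0 or 1, to numbers of postings.
    Returns P(outcome), P(outcome | evaluation = 1), and P(outcome | evaluation = 0).
    """
    def n(e, o):
        return counts.get((e, o), 0)

    total = sum(counts.values())
    overall = (n(1, 1) + n(0, 1)) / total
    given_eval = n(1, 1) / (n(1, 1) + n(1, 0))
    given_no_eval = n(0, 1) / (n(0, 1) + n(0, 0))
    return overall, given_eval, given_no_eval


# Hypothetical counts for 100 postings (illustration only).
table = {(1, 1): 60, (1, 0): 15, (0, 1): 10, (0, 0): 15}
overall, with_eval, without_eval = outcome_rates(table)
print(f"overall: {overall:.0%}, with evaluation: {with_eval:.0%}, "
      f"without evaluation: {without_eval:.0%}")
```

For these made-up counts, the sketch prints `overall: 70%, with evaluation: 80%, without evaluation: 40%`. The same function applies unchanged to the evaluation-by-alternativity cross-tabulation.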
Table 4. Frequencies of Evaluation and Reflection.
Customary in hierarchical linear modeling is the construction of models from their more basic elements to the final proposed model (Hox, 2002; Raudenbush & Bryk, 2002), such as that presented in the previous section. A first step in this process is the construction and running of an unconditional or empty model. The unconditional model contains only the outcome measure (in this case, reflection). Therefore, the intercept,
Level 1 equation:
Level 2 equation:
Results indicated that the intercept was statistically different from zero (
In constructing the model including evaluation, we first constructed a baseline model (Model 1), which included only the Level 1 variable evaluation. Results showed a reduced size for the intercept from the unconditional model (

Probabilities of making reflective comments.
Model 2 included the variables N_Posts and Status as variables at Level 2. Results of this model indicated that an individual’s status as either a future teacher or a novice teacher had no statistically significant interactions with the Level 1 intercept (
Analysis of Alternativity
Table 5 summarizes the frequencies of evaluation and alternativity. The most frequent combination is evaluation = 1 and alternativity = 1. Overall, about 56% of comments included alternativity; among comments that included evaluation, however, that percentage rose to about 65%. We applied the same analysis process described above to investigate the correlation between those two variables.
Table 5. Frequencies of Evaluation and Alternativity.
The unconditional or empty model indicated that individuals were more likely to propose alternative teaching actions in forum postings than not: The intercept was statistically different from zero (

Probabilities of proposing alternative teaching actions.
Discussion of All Correlations
The previous analyses suggest strong correlations between evaluation and reflection and between evaluation and alternativity in online discussions by candidate and novice teachers. While coding forum posts, we observed that the quality of participants’ reflection on teaching practice and of their proposed alternatives was relatively high throughout all discussions. Yet posts that contained evaluation were more likely than posts without evaluation to have those desirable characteristics. This result suggests that discouraging participants from making evaluative comments might not help improve the quality of discussions, especially in discussions where the reference object is not a video of one of the participants’ own teaching. Of course, this result needs to be replicated in further studies before it can support recommendations that might be consequential.
The importance of looking for those correlations was justified by the role that, according to SFL, evaluation plays in supporting the construction of interpersonal relationships through language. Along those lines, we conjectured that a forum where participants were not discouraged from evaluating could enable (or not disable) resources that might help make an online asynchronous discussion feel more like a conversation among people. The results show that when speakers (or rather, forum contributors) engage those resources, they are also more likely to contribute content that could be considered valuable (on account of including reflection and proposing alternatives). A question for further research, however, is whether the use of appraisal resources of language actually supports forum interactions that also have desirable characteristics. Table 3 illustrates such an interaction: after User 231 noticed that the teacher had made a desirable action, he reflected on why that action was good and built on that reflection by proposing a useful teaching action to improve on what the teacher had done. User 230 then followed up with another viable teaching action and justified why it would be viable. She also evaluated an undesirable action by the teacher and considered an alternative action to correct it.
It would be important to examine whether the posts elicited by posts that contain evaluation also have desirable qualities. Does the likelihood that a post includes reflective comments or alternative teaching actions increase when participants are replying to a post that contains evaluation, compared with when they are replying to a post that does not? In a preliminary study (Chieu & Herbst, in press) that used a similar analysis method, we found that a forum post had an 80.4% chance of including reflection if it replied to a post that contained evaluation, but only a 58.7% chance if it replied to a post that did not contain evaluation (effect size or odds ratio = 2.88, p < .001). Similarly, a forum post had a 58.1% chance of including alternativity if it replied to a post that had evaluation markers, but only a 38.5% chance if it replied to a post that did not have evaluation markers (effect size or odds ratio = 2.21, p < .01). This finding suggests that the correlations between evaluation and reflection and between evaluation and alternativity extend across entire threads of discussion, that is, through interactions among the participants.
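The odds ratios quoted above can be checked approximately from the reported percentages, because an odds ratio compares the odds p/(1 − p) of the outcome in the two conditions. The Python sketch below performs that arithmetic. Since its inputs are rounded percentages rather than the model estimates, its results (about 2.89 and 2.22) differ slightly from the reported 2.88 and 2.21.

```python
def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p_condition, p_reference):
    """Odds of the outcome in one condition relative to a reference condition."""
    return odds(p_condition) / odds(p_reference)


# Reported probabilities: replying to a post with vs. without evaluation.
or_reflection = odds_ratio(0.804, 0.587)
or_alternativity = odds_ratio(0.581, 0.385)
print(f"reflection: {or_reflection:.2f}, alternativity: {or_alternativity:.2f}")
```

For example, the odds of reflection are about 4.10 after an evaluative post and about 1.42 otherwise, and their ratio is roughly 2.89.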
The results presented in this article must be interpreted with caution because the claims are correlational. Although there are strong correlations between evaluation and reflection and between evaluation and alternativity, this does not necessarily mean that evaluative comments lead to reflective comments or to proposals of alternative teaching actions. This kind of correlational study is still useful, however, because it provides a foundation for more rigorous research in the future. An experimental design (e.g., the use of control and experimental groups and random assignment of participants to conditions) with a larger sample would enable investigations of the effect of evaluative comments on the qualities of teacher conversations, as well as comparisons between the use of video records of practice and the use of animated classroom episodes, or between face-to-face and online discussions.
Conclusion
In this article, we have presented strong evidence for correlations between evaluating features of classroom interactions, as indicators, and the probability of making reflective comments and proposing alternative teaching actions. Indeed, across eight online sessions in a teacher education class, both preservice and novice teachers frequently evaluated the teaching practice in animated classroom stories that were embedded into a forum discussion space. Furthermore, the more they were active in evaluating the teaching in the embedded artifacts, the more they created, shared, and discussed alternative actions of teaching, and the more they reflected on the instructional practice. Those characteristics, reflecting on practice and considering alternatives, have been and continue to be crucial in teacher development; hence, it seems important to look for ways to promote them (e.g., Berliner, 1994; Chieu et al., 2011; Rich & Hannafin, 2009; Schön, 1983; van Es & Sherin, 2008). The data presented show that evaluative comments on representations of teaching were more likely to include reflection and alternativity than nonevaluative comments. The degree to which such evaluative comments also improve participants’ actual teaching is an important question for future research. However, given that reflective comments were found here to be more prevalent when evaluative comments were made, and such reflection has consistently been linked to improved teaching, we infer that such a relationship is likely to be observed.
In producing that finding, we have illustrated how to use HGLM, a particular form of multilevel modeling (see Hox, 2002; Raudenbush & Bryk, 2002), to examine correlations between different variables of interest (e.g., between evaluation and reflection) and interactions between different levels of data (e.g., forum posts nested in participants). Research on online conversations or group work has not yet attended to the nesting of forum posts within individuals, although studies have considered the nesting of participants within groups (Cress, 2008; De Wever, Van Keer, Schellens, & Valcke, 2007).
Another key element of this article is the use of SFL to operationalize constructs that are desirable to observe in text-based exchanges among teachers in online discussions. We agree with a number of researchers (e.g., Martin, 2001; Schleppegrell, 2012a, 2012b) that SFL provides a useful means to analyze discourse because it is grounded in a theory of language that simultaneously accounts for the content, the context, and the construction of a discourse.
Our finding can inform the design of facilitation guides for online forum discussions of representations of teaching (Nachlieli, 2011). While scholars have cautioned against promoting evaluation in those discussions to encourage sensitivity, the evidence presented suggests that discouraging participants from making evaluations of the teaching observed might undermine the use of representations of teaching to promote reflection and alternativity. Instead, if there is concern that participants might shy away from making evaluative comments about colleagues whose teaching has been captured on video (Jacobs et al., 2009), developers might consider translating those video records into animations of cartoon characters as a way of representing such teaching practice without drawing too much attention to the individual practitioner, so as to have the chance to engage participants in making evaluative comments. Along those lines, promising findings of an earlier study (Herbst & Kosko, in press; Kosko & Herbst, 2012) suggested that animations promoted more uses of modality of the obligation or normativity type (e.g., “the teacher should . . .”) than did videos; obligation or normativity is one way in which appraisal is realized. Further research would be needed, however, to investigate the value of animations in supporting evaluation in a safe environment, particularly in comparisons of online and face-to-face groups.
In terms of a contribution to theory, we argue that it is reasonable to expect that when practitioners become engrossed in a conversation about such a complex, relational practice as teaching, they will engage in it not only intellectually, as an analyst would, but also emotionally, as possible participants in the scenario being discussed. Evaluation, being a function of language with which speakers relate to each other (Martin & White, 2005), could thus be a feature that indicates more rather than less of such engrossment, which the communications literature calls social presence (see also Lombard & Ditton, 1997; Oztok & Brett, 2011; Picciano, 2002). We suggest that the methods we have used to estimate reflection, alternativity, and evaluation, as well as the correlations found in this study, are important steps toward estimating the telepresence and social presence of participants from direct observation of their interactions in a discussion forum about teaching.
Footnotes
Acknowledgements
Authors’ Note
Opinions expressed here are the sole responsibility of the authors and do not necessarily reflect the views of the National Science Foundation (NSF).
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Patricio Herbst and Vu Minh Chieu are authors and operators of the online platform LessonSketch at the University of Michigan.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work reported in this article is supported by NSF Grants ESI-0353285 and DRL-0918425 to Patricio Herbst.
