Abstract
This study explores the relationships between teacher knowledge, teaching practice, and student learning in mathematics. It extends previous work that developed and evaluated an innovative approach to assessing teacher knowledge based on teachers’ analyses of classroom video clips. Teachers watched and commented on 13 fraction clips. These written analyses were coded using objective rubrics to yield a reliable and valid indicator of their usable teaching knowledge. Previous work showed this measure to correlate with another measure of teacher knowledge and to predict students’ learning from the teachers’ fraction instruction. In this study, the authors replicated those findings and further showed that the effect of teacher knowledge on student learning was mediated by instructional quality, measured using video observations of teachers’ lessons.
As evidence mounts that teachers have a significant impact on student learning (Nye, Konstantopoulos, & Hedges, 2004; Sanders & Rivers, 1996; Sanders, Saxton, & Horn, 1997), understanding how and why some teachers are more effective than others has become a high priority for education research. Because conventional wisdom suggests that one can only teach what one knows, it seems likely that at least some of the differences in teacher effectiveness are related to differences in teacher knowledge. Consequently, understanding what kinds of knowledge teachers draw on and how they use it during instruction are important steps in untangling the complex relationships between teacher knowledge, teaching practice, and student learning. In this article, we use a novel measure to explore teacher knowledge in mathematics and the possible mechanisms by which it affects instruction and student learning.
Attempting to document links between teacher knowledge, particular teaching practices, and student learning has a long history in education research, although opinions differ on how successful such efforts have been (Hiebert & Grouws, 2007). Some studies have investigated the relationship between teacher knowledge and teaching practice (Hill, Sleep, Lewis, & Ball, 2007), teacher knowledge and student learning (Rowan, Correnti, & Miller, 2002), or teaching practice and student learning (Shavelson, Webb, & Burstein, 1985). Very few studies have investigated all three elements in a single study (Fennema & Franke, 1992). Progress in identifying the kinds of teacher knowledge most relevant for effective instruction has been hampered by a lack of suitable measures. Until recently, commonly available assessments of teacher knowledge measured only subject matter content knowledge or general pedagogical knowledge, neither of which has been found to have predictive validity vis-à-vis instructional quality or student learning (Darling-Hammond & Baratz-Snowden, 2005).
Perhaps not surprisingly, significant progress in research on teacher knowledge resulted from two developments: (1) theoretical developments in identifying categories of knowledge likely to directly affect instruction and (2) new efforts to study teacher knowledge through instrument development. Shulman (1986, 1987) made a major theoretical contribution when he proposed pedagogical content knowledge as the type of knowledge most likely to affect student learning. He conceptualized it as knowing content in pedagogically useful ways and thus viewed it at the intersection of teaching and learning. Mathematics educators Deborah Ball, Heather Hill, and colleagues built on Shulman’s work by designing assessments to measure teacher knowledge, including aspects of pedagogical content knowledge (Ball, 2003; Hill & Ball, 2004; Hill, Schilling, & Ball, 2004). Ball and Hill defined the Mathematics Knowledge for Teaching (MKT) construct and then developed items to measure this construct because they recognized that systematic study of teacher knowledge would depend on the development of reliable and valid measures.
Ball and colleagues undertook a comprehensive analysis of the knowledge and skills mathematics teachers need to effectively teach their students and raised the important issue of knowledge use: “It is not just what mathematics teachers know but how they know it and what they are able to mobilize mathematically in the course of teaching” (Ball, 2000, p. 243). They identified six constructs that make up teachers’ Mathematics Knowledge for Teaching and have, to date, developed a bank of multiple-choice items to measure four of these constructs. The four constructs they measured were: (1) common content knowledge (i.e., knowledge that teachers hope to develop in their students), (2) specialized content knowledge (i.e., knowledge about the mathematical content that is only likely to develop in the context of teaching, e.g., knowledge about nonstandard solution methods), (3) knowledge of mathematics and teaching (which includes knowledge of best representations), and (4) knowledge of mathematics and students (e.g., knowledge of common student misconceptions and errors). The two other constructs proposed by Ball, Thames, and Phelps (2008) are knowledge of the curriculum and knowledge “on the horizon.”
The “job” analysis approach taken by Ball and Hill paid off. Empirical studies have shown the MKT items to produce reliable scores. Furthermore, recent validity studies have shown that the MKT predicts instructional quality as measured by the Mathematical Quality of Instruction (MQI) instrument (Hill et al., 2008; Learning Mathematics for Teaching, 2006), and at least one study has reported a direct relationship between teachers’ MKT scores and student learning gains (Hill, Rowan, & Ball, 2005). While the MKT clearly measures relevant knowledge, Ball and colleagues (2008) pointed out that “how such knowledge is actually used and what features of pedagogical thinking shape its use, remain tacit and unexamined” (p. 403).
Following in Ball and Hill’s footsteps, we also have focused on developing measures of teacher knowledge as a means of advancing our understanding of how teacher knowledge impacts instruction and learning. The kind of measure we have been developing, however, differs from that of Ball and Hill. Instead of developing multiple choice items to measure what teachers know, we have developed a measure that focuses on usable knowledge—knowledge that teachers are able to access and use in a classroom situation. So, while the knowledge we assess does not differ in kind from that assessed by Hill and Ball, our method of eliciting the knowledge does differ, giving credit only if teachers are able to activate the knowledge in a classroom situation.
Our approach is to use video clips of authentic classroom events as prompts to elicit teachers’ analyses, which are in turn assumed to draw on teachers’ knowledge. We reasoned that the analyses teachers produce would reveal not only their knowledge as it relates to the instructional segment depicted in the video clip, but also their ability to bring that knowledge to bear on a live classroom situation. To be sure, despite their high degree of authenticity the video clips are different from actual teaching; teachers have a lot of contextual information about their own class and individual students that cannot be replicated with a video clip that is provided for assessment purposes. Nevertheless, we believe it is reasonable to assume that teachers who have more extensive and sophisticated knowledge will produce more insightful analyses of teaching than will the less knowledgeable teachers (Berliner, 1989, 1991, 1994, 2001; Carter, Cushing, Sabers, Stein, & Berliner, 1988; Carter, Sabers, Cushing, Pinnegar, & Berliner, 1987).
Our Previous Work
In a previous study, we developed a classroom video analysis (CVA) assessment on teaching fractions (Kersting, Givvin, Sotelo, & Stigler, 2010). The assessment consisted of 13 short video clips, which covered relevant subtopics within the larger fraction domain: part/whole relationships, equivalence, comparing fractions, and all four fraction operations. Each video clip was accompanied by a brief description of the lesson from which it came. To complete the assessment, teachers were asked to view each of the 13 video clips and to analyze the teaching episodes depicted in them. To focus teachers’ written responses to the video clips on key elements of instruction, which are addressed in the scoring rubrics, we asked them to discuss how the teacher and the student(s) interact around the mathematical content (D. K. Cohen, Raudenbush, & Ball, 2003).
A total of 237 mathematics teachers, predominantly teaching in elementary and middle school grades and representing a wide range of professional experience (M = 7.7 years of teaching experience, SD = 7.5) and preparation (89% fully credentialed; about 50% multiple-subject credential), completed the assessment, which was presented over the Internet in an interactive Web interface. Although the sample was self-selected, a comparison with national data showed our sample to be fairly representative of the national population of teachers with respect to credential status, credential type, and teaching experience.
Teachers’ written responses were scored according to four objective and reliable rubrics. The four rubrics indicated the degree to which teachers analyzed (1) the mathematical content and (2) student thinking depicted in the video clips, the degree to which teachers (3) included suggestions for improving the observed teaching episodes, and (4) the degree to which teachers were able to make overall sense of the instructional segments shown in the video clips. In addition, 223 of the teachers also completed an MKT scale on knowledge for teaching fractions and a brief background survey. A small subsample of these teachers (N = 19) administered a quiz on fractions to their students prior to and after they taught a fractions unit. The fraction quiz, which is described more fully in the Method section, consisted of 15 multiple choice items; about half of the items were publicly released items from standardized tests, and the remaining items were written by the research team and designed to tap into students’ conceptual understanding.
That study produced several interesting findings: For the large sample (N = 223) we confirmed a strong relationship between teachers’ scored responses to the video clips and their scores on the Mathematics Knowledge for Teaching instrument. In particular, the mathematical content rubric, which rated the degree to which teachers analyzed the mathematics depicted in the video clips, was strongly related to the MKT scores, r(223) = .608, p < .01, suggesting that teachers relied on similar knowledge when analyzing the mathematics shown in the videotaped teaching episodes and when answering MKT items.
In addition, and perhaps most importantly, we showed that one CVA rubric, suggestions for improvement, was statistically significantly predictive of student learning gains (β = .521, p < .05). That is, students of teachers (N = 19) who included suggestions for improving the observed teaching situations depicted in the video clips, and who linked their suggestions to the mathematical context, learned more from the fractions unit than students of teachers who either did not include any suggestions at all or included only pedagogical suggestions. Although the other three CVA rubrics did have some association with student learning, the effects were not statistically significant. Interestingly, despite the strong correlation between CVA and MKT scores, teachers’ MKT scores in our sample did not significantly correlate with student learning gains (r = −.059, p = .82).
One possible reason for the different behaviors of the CVA and MKT measures is that the CVA picks up only usable knowledge, whereas the MKT reveals knowledge that may or may not be usable in a classroom situation. The problem of inert knowledge is well known in cognitive psychology (National Research Council, 2000). Knowledge that may be elicited in a testing situation often is not activated or applied in the context of a real-world problem-solving situation. Applied to our case, knowledge that is activated and brought to bear in a real (although videotaped) teaching situation may be a better indicator of the knowledge resources teachers are able to bring to bear in their own classrooms.
If this is true, it still leaves open the question of how teacher knowledge, even if activated, results in improved student learning. In this study we investigated the plausible hypothesis that effects of teacher knowledge on student learning are mediated by instructional quality: Teachers with more usable knowledge are able to apply that knowledge to the design and improvement of instruction in their classrooms, and students who are exposed to higher quality instruction learn more than students exposed to lower quality instruction. To investigate this hypothesis requires that we measure instructional quality and that we include all three elements—teacher knowledge, instructional quality, and student learning—in a single study.
Rationale for the Current Study
The current study was designed to follow up on our previous work in three ways: (1) We included videotaped classroom lessons of teachers’ own teaching rated for quality of instruction to test the hypothesis that the effects of teacher knowledge (as measured by the CVA) on student learning are mediated by instruction; (2) we doubled the sample size to improve the quality of our sample and to examine the robustness of the effects of teacher knowledge on student learning we had found previously, particularly, whether suggestions for improvement would remain a statistically significant predictor of student learning gains; and (3) we analyzed the effects of teacher knowledge (as measured by the CVA and MKT) on instruction and student learning to better understand the different dimensions of teacher knowledge and their potential use.
Method
Sample
A total of 36 teachers in 10 states participated in this study. (Half had participated in an earlier study, Kersting et al. [2010], and half were new.) The study was advertised via listservs with national reach, postings on Web boards, and through local universities with teacher preparation programs, and the sample was thus self-selected. Although not a representative sample, participating teachers represented a reasonable degree of geographic and curricular variation.
Of the teachers, 16 taught fifth grade, 15 taught sixth grade, and the remaining 5 taught seventh-grade math. One teacher (2.8%) reported not yet being fully credentialed; the large majority of teachers (N = 27) held multiple subject credentials (77.8%). Four teachers (11.1%) held a single subject credential in an area other than mathematics and one teacher (2.8%) had obtained a single subject credential in mathematics. The professional preparation of teachers in our sample is not unusual for the grade levels represented; most school districts across the United States allow teachers with multiple subject credentials to teach mathematics through sixth grade. With regard to their mathematical training, one teacher majored in mathematics (2.8%), four minored in mathematics (11.1%), and two (5.6%) majored in a mathematics-related field. The large majority (80.5%) majored in fields unrelated to mathematics. The average teacher in our sample had 9 years of mathematics teaching experience. The range, however, was large, with teachers having from 1 to 27 years of experience (SD = 7.08).
Measures
Teachers completed three surveys as part of this study: the classroom video analysis instrument on teaching fractions, an MKT scale composed of items assessing teachers’ fraction knowledge for teaching, and a background survey. In addition, teachers agreed to be videotaped teaching one fraction lesson to their students and to administer a fraction quiz prior to and after completing their unit on fractions.
Classroom video analysis assessment
We used the classroom video analysis assessment scale that we developed as part of our previous study to measure teachers’ usable fraction knowledge (Kersting et al., 2010). The CVA scale consisted of 13 short video clips, each 3 to 5 minutes in length. Video clips were taken from a set of 11 authentic fifth- and sixth-grade fractions lessons, one clip each from nine of the lessons and two clips each from the remaining two lessons. As mentioned earlier, the fraction clips covered key topic areas within the fractions domain, such as part-whole relationships, equivalency, comparison of fractions, and operations with fractions, and were mathematically and/or pedagogically interesting. Video clips portrayed teacher assistance episodes during student independent work time, student mistakes, or questions and their ensuing discussion during whole class instruction. We chose these types of classroom events because they were frequent, tended to elicit rich discussion by teachers, and were relatively short and self-contained (an important practical consideration for instrument development).
Teachers viewed the video clips on an interactive, password-protected website. Teachers were provided with brief contextual information for each clip as well as a verbatim transcript presented as subtitles. As a prompt, we asked teachers to “discuss how the teacher and the student(s) in the clip interacted around the mathematical content” (D. K. Cohen et al., 2003). Teachers’ written responses to the clips were saved online and retrieved for scoring.
Teachers’ written responses to each clip were scored along four dimensions, each representing important aspects of teachers’ work: mathematical content (MC), student thinking (ST), suggestions for improvement (SI), and depth of interpretation (DI). Each dimension was scored on a 3-point scale. The first three dimensions were scored as follows: a score of 0 meant that there was no mention of the mathematical content shown in the video clip, of what students were thinking during the instruction, or of any suggestions for improving the instruction. A score of 1 meant that they mentioned the mathematics, student thinking, or suggestions for improvement, but did not go beyond what could be observed in the clip. To get a score of 2, respondents had to relate the mathematics in the clip to aspects of mathematics not readily observable in the clip, to relate student thinking to specific aspects of the mathematical content, or to relate the suggestions for improvement to specific mathematics content that the instruction needed to address. The depth of interpretation dimension was coded as 0 if the response was purely descriptive or evaluative without any evidence for the judgment, 1 if the response contained one or more unconnected analytic points, or 2 if the response connected the analytic points to form a coherent argument (for a more detailed description of the rubrics including scored sample responses, see Kersting, 2008; Kersting et al., 2010).
Interrater reliability for all four rubrics was computed both initially and at midpoint through the scoring. At each of the two points in time, we randomly sampled 14 responses for each of the 13 clips and had three raters score them on all four rubrics. Based on pairwise comparisons, interrater reliability as measured by direct agreement ranged from 79% to 89% across the four rubrics and two time points, averaged across clips (median agreement = 85%). Coefficient Kappa ranged from .61 to .77 (median Kappa = .71). We thus were able to score teacher responses with an acceptable degree of consistency. Teachers’ scored responses to the video clips were calibrated as part of a larger, more representative sample (N = 237) as a single scale and four subscales using item response theory (IRT). Teachers’ IRT scores (trait-level estimates) from that large calibration were used in all statistical analyses.
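The two agreement indices reported above can be illustrated with a minimal sketch. It assumes that each pairwise comparison used Cohen's kappa for two raters, which the text does not state explicitly; the function names and the scores are invented for illustration and are not the study's data.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of responses on which two raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of the two raters' marginal proportions.
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical rubric scores (0, 1, or 2) from two raters on ten responses.
rater_1 = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
rater_2 = [0, 1, 2, 0, 0, 2, 1, 2, 0, 2]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"direct agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

With three raters, the same two functions would be applied to each of the three rater pairs and the results averaged across clips.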
Mathematical Knowledge for Teaching
From the Michigan MKT item bank, we selected 15 multiple-choice items to form a measure of teachers’ fraction knowledge for teaching. Two of the items were testlets, that is, a stem was associated with three and with two questions, respectively, yielding 18 scores in total. Using the categories proposed by Ball et al. (2006), most of the items would be categorized as subject matter knowledge for fractions (common and specialized). The remaining items would be categorized as knowledge of content and students.
An item analysis of this MKT scale for the large sample (N = 223) showed good psychometric properties, consistent with results reported for other scales of similar length from this item bank (Hill et al., 2004; Kersting, 2008). Internal consistency as measured by coefficient α was estimated at .77. No ceiling or floor effects were observed. Respondents’ total scores on the MKT measure were distributed normally and ranged from 3 to 18 (M = 11, SD = 3.71). Teachers’ raw scores were used in all statistical analyses.
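For readers unfamiliar with coefficient α, the following sketch shows how it is computed from item-level scores. The response matrix is invented (five respondents, four dichotomous items) and the function name is hypothetical; it only illustrates the formula and does not reproduce the MKT analysis.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, one score per item."""
    k = len(item_scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([resp[i] for resp in item_scores]) for i in range(k)]
    # Variance of respondents' total scores.
    total_var = pvariance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses: rows are respondents, columns are items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Population variances are used for both the items and the totals; using sample variances throughout would give the same value because the correction factor cancels in the ratio.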
Teacher background survey
The teacher background survey was composed of nine questions. The questions asked teachers to provide information about their preparation and professional experience, including credential type and status; their teaching experience as measured by number of years teaching mathematics; and their college major.
Instructional quality measure
Each teacher was videotaped teaching one fractions lesson to his or her students. Teachers were free to choose which lesson was videotaped as long as the lesson introduced students to some new fraction concept or idea. All videos were scored using three rubrics that together comprised our measure of instructional quality. The three rubrics, which describe whether the underlying mathematics was made visible in a lesson, were developed based on a review of research findings that relate features of teaching to students’ learning (Hiebert & Grouws, 2007). Coders of instructional quality were blind to the lesson teacher’s knowledge and student learning scores.
Rubric 1: Developing Concepts
The first of our instructional quality rubrics was designed to indicate whether teachers mentioned or developed mathematical concepts or ideas and how much lesson time they spent on this. We first coded whether a presented mathematical concept was mathematically correct and sufficiently complete to support students’ learning of the mathematics. For each such concept we marked the beginning and end time of the segment in which it was presented. To be judged “complete,” all components of the concept had to be presented as part of the lesson. For example, to receive credit for “meaning of fractions,” the lesson needed to address the idea of what the “whole” is and the requirement of “same-size” pieces and discuss the relationship of part to whole.
Lesson segments that contained concepts were further categorized as either having developed or merely mentioned the concept. To be coded as developed, the teacher had to make the underlying mathematical relationships explicit. For example, if a teacher pointed out that fractions must be converted to common denominators in order to add them because we can only add like-size pieces, the teacher would receive credit for having mentioned a concept. To get credit for developing a concept, the teacher would have to explain, perhaps through the use of visual representations, why we need to convert to like-size pieces. For example, she might have explained that unless the original fractions are renamed, it would not be possible to name the quantity that results from the addition.
Rubric 2: Appropriate Use of Representations to Explain Algorithms
Teachers often use manipulatives and/or drawn representations (e.g., fraction circles, fraction strips, geoboards) to illustrate and/or explain common algorithms for computations involving fractions. For each lesson, we coded whether manipulatives and/or drawn representations were used for this purpose, whether the representations were appropriate for explaining the algorithm, and whether the representation was explicitly and completely mapped to the algorithm (i.e., the majority of the algorithmic steps needed to be explicitly mapped to the representation used). For example, a teacher could have used an area model to illustrate why we multiply numerators and denominators when multiplying fractions. Similarly, a teacher could have used geoboards to illustrate multiplication as part of a part to help students understand that although multiplying whole numbers produces a larger quantity, multiplying fractions produces a smaller quantity.
Rubric 3: Connecting Concepts and Topics
We recorded how much lesson time the teacher spent making explicit connections among mathematical topics or ideas (conceptual) or on linking topics across time (temporal) to provide students with a sense of how mathematical topics or ideas are sequenced across the curriculum. An example for a conceptual link might be “You already know how to find common factors of whole numbers by using factoring trees. Now we can use that method to find the least common denominators to create fractions with like denominators.” An example for a temporal link might be “Today we learned how to add fractions with like denominators. Next, we will learn how to add fractions with unlike denominators.”
Coding reliability
Coding of concept developed (i.e., Rubric 1) was done by agreement. Multiple raters coded each lesson and differences in coding were resolved through discussion. This was done because mentioning or developing concepts turned out to be low frequency events (leading to potentially very small denominators for a randomly selected reliability lesson), which made a formal evaluation of interrater reliability problematic. For Rubrics 2 and 3, interrater reliability was evaluated.
Interrater reliability for coding use of representations was indicated by level of agreement within each of two pairs of coders. We marked segments in which representations were used and computed a match if coders agreed on beginning and end points plus or minus 20 seconds and if they correctly rated whether representations were correctly and explicitly linked to algorithms in such identified segments. Initial percentage agreement for correctly identifying in and out points of such segments was 81% for each pair, based on coding of a randomly selected entire lesson; at midpoint the percentage agreement was 92% and 80% for each pair, respectively, based on coding of another randomly selected lesson. Direct agreement for correctly determining whether representations were linked correctly and explicitly to algorithms was 91% and 98%.
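The segment-matching rule can be sketched as follows. The tolerance on beginning and end points comes from the text; the choice of denominator (all distinct segments marked by either coder) is an assumption, since the text does not specify it, and the function names and time codes are invented.

```python
def segments_match(seg_a, seg_b, tolerance=20):
    """True if start and end times (in seconds) both fall within tolerance."""
    return (abs(seg_a[0] - seg_b[0]) <= tolerance
            and abs(seg_a[1] - seg_b[1]) <= tolerance)

def segment_agreement(coder_a, coder_b, tolerance=20):
    """Share of distinct segments that both coders marked (assumed definition)."""
    unmatched_b = list(coder_b)
    matches = 0
    for seg in coder_a:
        for other in unmatched_b:
            if segments_match(seg, other, tolerance):
                matches += 1
                unmatched_b.remove(other)  # each segment can be matched once
                break
    # Denominator: every distinct segment marked by either coder.
    total = len(coder_a) + len(coder_b) - matches
    return matches / total if total else 1.0

# Hypothetical (start, end) time codes in seconds for one lesson.
coder_a = [(60, 180), (300, 420), (600, 660)]
coder_b = [(70, 175), (310, 450), (605, 650)]

agreement_20 = segment_agreement(coder_a, coder_b, tolerance=20)
agreement_30 = segment_agreement(coder_a, coder_b, tolerance=30)
print(agreement_20, agreement_30)
```

In this invented example the second segment's end points differ by 30 seconds, so it counts as a match only under the wider tolerance.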
For links (Rubric 3), we computed rater reliability by comparing the coding of two raters against a master (prepared by the code development team). For each rater we evaluated whether in and out points were marked within 30 seconds of the master and whether judgments of links as conceptual versus temporal were in agreement with those made by the code development team. The percentage agreement for Rater 1 with the master was 83% and the percentage agreement for Rater 2 was 85% at the beginning of coding, and 83% and 84%, respectively, at midpoint.
Overall, we can conclude that raters applied the instructional quality codes with a reasonable degree of consistency.
Summary variables to indicate instructional quality
Summary variables were constructed for the three coding rubrics. To account for differences in lesson length, summary variables were computed as proportions of total lesson time spent on: (1) developing concepts, (2) correctly and explicitly mapping manipulatives and drawn representations to algorithms, and (3) linking across topics and ideas or across time. A fourth summary variable, computed as the sum of the three individual rubric proportions for each lesson, was used to indicate the overall instructional quality of the lesson.
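A minimal sketch of how such summary variables could be computed from coded time segments is given below. The function names, the dictionary keys, and the time codes are hypothetical; the values were chosen only so that the proportions are easy to follow.

```python
def total_seconds(segments):
    """Total duration of a list of (start, end) segments in seconds."""
    return sum(end - start for start, end in segments)

def instructional_quality(lesson_length, concepts, representations, links):
    """Proportion of lesson time per rubric, plus their sum as overall score."""
    proportions = {
        "developing_concepts": total_seconds(concepts) / lesson_length,
        "representations": total_seconds(representations) / lesson_length,
        "links": total_seconds(links) / lesson_length,
    }
    # Overall quality is the sum of the three rubric proportions.
    proportions["overall"] = sum(proportions.values())
    return proportions

# Hypothetical coding of one 50-minute (3000-second) lesson.
scores = instructional_quality(
    lesson_length=3000,
    concepts=[(120, 270)],
    representations=[(600, 690), (900, 960)],
    links=[(0, 60), (2850, 3000)],
)
print(scores)
```

Dividing by lesson length is what makes lessons of different durations comparable.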
Student fraction quiz
Students of our 36 participating teachers completed a short fractions quiz (administered by the teacher) prior to and after completing the unit on fractions. The fractions quiz consisted of 15 multiple-choice items covering key ideas within the fractions domain, including part-whole relationships, equivalence, comparisons of fractions, simplifying fractions to lowest terms, and the four basic operations with fractions. The items were based on released items from fifth- through seventh-grade state tests in California, Texas, and New York. Although multiple-choice items on state-mandated tests have been shown to predominantly measure skill proficiency, we deliberately sought those items that also measured some conceptual knowledge of fractions. Reliability (coefficient alpha) for the scale was estimated at .72 for the pretest and .71 for the posttest. Neither ceiling nor floor effects were observed.
Procedures
After teachers completed the online tasks, they chose one fraction lesson to be videotaped. Teachers could choose any lesson they wished, provided that the lesson included the introduction of a fraction concept or idea, not just practice of an already introduced concept.
A single videographer videotaped each lesson using a single camera mounted on a rolling tripod (for easy and smooth movements through the classroom) and a highly sensitive handheld boom microphone. The microphone was mounted on an extension so that the videographer could move it around freely with one hand to capture student responses. In addition, each teacher wore a wireless microphone.
The videography protocol was adapted from the one used for the second Third International Mathematics and Science Study (TIMSS) Video Study (1999). The main objective was to capture the teacher and his or her teaching. During whole-class instruction, the video camera was positioned in mid-classroom (either dead center or off to one side) to capture the teacher at the front of the room and allow for panning to the sides to capture student contributions. During student independent work time, the camera followed the teacher to capture the teacher’s private interactions with single students or groups of students. If the videographer was not able to follow the teacher due to a physical limitation (e.g., an aisle blocked with students’ backpacks or a teacher moving too quickly from place to place), the videographer captured student-to-student interactions and student work while working his way to catch up with the teacher. Students without signed consent forms were either sent to a neighboring classroom during the videotaping or placed out of sight of the camera.
Teachers administered the fraction quiz to students themselves—once before and once after their unit on fractions. Teachers were sent hard copies of the quizzes along with detailed instructions for administering the quiz in a standardized way. Teachers were asked to encourage their students to take the quiz seriously and to show their best effort in answering the items. We also asked teachers to refrain from helping their students solve the problems, except for providing vocabulary clarifications as needed. Finally, we asked teachers to limit the time students had to complete the quiz to 25 minutes.
Results
Our first analysis was carried out to see if we could confirm our previous results with this larger sample; that is, do we still find a significant relationship between teacher knowledge as measured by the CVA and student learning? Next we wanted to explore whether or not the effects of teacher knowledge on student learning were mediated by instructional quality and to further explore the relationship of the CVA with the MKT as two measures of teacher knowledge.
Analysis Plan
Most of our analyses, and the order in which we present them, are based on Baron and Kenny’s (1986) recommendations for how to investigate mediation effects. Having established a link between teacher knowledge as measured by the CVA and student learning in our previous study, our goal in this study was to investigate instructional quality as a possible mediator of that relationship. Baron and Kenny prescribe four steps in such an analysis: (1) Estimate the coefficient obtained when the outcome variable (in this case student learning) is regressed on the predictor variable (teacher knowledge) without a mediator variable in the model, (2) show that the predictor variable (teacher knowledge) is correlated with the mediator variable (teaching quality), (3) show that the mediator variable (teaching quality) is correlated with the outcome variable (student learning), and (4) estimate the direct and indirect effects of teacher knowledge on student learning in the mediation model and test whether the indirect (mediated) effect is significantly different from zero.
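The four steps can be illustrated with a simplified, single-level sketch. It uses ordinary least squares on invented teacher-level values, whereas analyses involving student outcomes in this study used hierarchical models; it also omits the significance test of the indirect effect. All names and numbers are hypothetical.

```python
def cross_product(u, v):
    """Sum of products of deviations from the means of u and v."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def mediation_paths(x, m, y):
    """OLS slopes for predictor x, mediator m, and outcome y."""
    s_xx, s_mm = cross_product(x, x), cross_product(m, m)
    s_xm = cross_product(x, m)
    s_xy = cross_product(x, y)
    s_my = cross_product(m, y)
    det = s_xx * s_mm - s_xm ** 2
    return {
        "c": s_xy / s_xx,                              # Step 1: y on x
        "a": s_xm / s_xx,                              # Step 2: m on x
        "m_to_y": s_my / s_mm,                         # Step 3: y on m
        "b": (s_my * s_xx - s_xy * s_xm) / det,        # Step 4: m, given x
        "c_prime": (s_xy * s_mm - s_my * s_xm) / det,  # Step 4: x, given m
    }

# Hypothetical teacher-level values: knowledge, instructional quality,
# and mean classroom gain on the quiz.
knowledge = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.2]
quality = [0.10, 0.08, 0.20, 0.15, 0.18, 0.30]
gain = [1.5, 2.0, 2.4, 2.6, 3.1, 3.6]

paths = mediation_paths(knowledge, quality, gain)
indirect = paths["a"] * paths["b"]
print(f"total = {paths['c']:.3f}, direct = {paths['c_prime']:.3f}, "
      f"indirect = {indirect:.3f}")
```

In OLS the total effect equals the direct effect plus the indirect effect, so a mediated relationship shows up as a direct effect that is smaller than the total effect.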
We used different statistical models depending on the variables included. When all variables in the analysis were measured at the teacher/classroom level (e.g., analyses relating teacher knowledge to teaching quality) we fit ordinary least squares (OLS) regression models. For analyses that included student learning outcomes (and thus, individual measurements nested within classroom), we fit a series of two-level hierarchical linear models in which we modeled student learning as gain scores. Differences in classroom gains (intercept at Level 1) were modeled as a function of teacher/classroom variables at Level 2. The basic model was as follows:
Level 1: $\text{StudentFractionQuizGain}_{ij} = \beta_{0j} + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{Teacher/Classroom VAR})_j + u_{0j}$
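Substituting the Level 2 equation into the Level 1 equation gives the combined form of this model, shown below as a reading aid. The shorthand $W_j$ for the teacher/classroom variable, the subscripts ($i$ for students, $j$ for classrooms), and the normality assumptions on the two error terms follow the usual conventions for two-level models; they are assumptions here and are not spelled out in the text.

```latex
\begin{align*}
\text{Gain}_{ij} &= \beta_{0j} + r_{ij},
  \qquad r_{ij} \sim N(0, \sigma^2) \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j},
  \qquad u_{0j} \sim N(0, \tau_{00}) \\
\text{Gain}_{ij} &= \gamma_{00} + \gamma_{01} W_j + u_{0j} + r_{ij}
\end{align*}
```

Read this way, $\gamma_{01}$ is the expected difference in mean classroom gain associated with a one-unit difference in the teacher/classroom variable, $u_{0j}$ is the classroom-level residual, and $r_{ij}$ is the student-level residual.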
Descriptive Analyses
A total of 36 teachers and their 591 students comprised the sample for this study. Descriptive statistics for all the variables included in the analyses are presented in Table 1.
Table 1. Descriptive Statistics for All Variables Included in Analyses
As shown in Table 1, the average score on the student fraction quiz was approximately 6 points at pretest and 8.5 at posttest, indicating an average increase of about 2.5 points. Teacher knowledge scores as measured by the CVA and its rubrics were, on average, either at or slightly below zero, indicating that in this group of teachers the average teacher performed slightly below the average difficulty of the CVA scale. (We remind the reader that teachers’ knowledge scores on the CVA are IRT-based trait-level estimates using a normal metric measurement scale where a score of zero means that a responding teacher performs at the mean difficulty of the test.) For the MKT we obtained a mean score of 13.47 points (SD = 2.71) out of a total possible score of 18, indicating that teachers answered on average about three-quarters of the items correctly. This seems to suggest that the MKT scale we used was easier than the CVA for this sample of teachers.
The mean ratio for the overall instructional quality score used in these statistical analyses was .17, indicating that on average teachers spent roughly 17% of their lesson time on making the underlying mathematics visible. Specifically, 5% each were spent on average on developing concepts and mapping algorithms to representations and 7% were spent on links. In contrast, 5% to 7% of class time on average was devoted to developing concepts or mathematical ideas correctly and mapping algorithms to representations correctly.
Step 1: Predicting Student Learning Gains From Teacher Knowledge
The first step in our analyses was to examine the direct effect of teacher knowledge, as measured by the CVA, on student learning gains. To do this we fit a series of hierarchical linear models regressing student learning gains on the CVA total score, each of the CVA rubrics separately, and all CVA rubrics forced into the same model. We were particularly interested in determining whether the relationship of CVA responses to student learning would be replicated in this larger sample.
The analyses replicated our earlier results. As shown in Table 2, suggestions for improvement was the only significant predictor of student learning gains, both when it was entered as a single predictor (Model 4) and when all individual scoring rubrics were entered together (Model 6). To characterize the size of the effect, a one standard deviation increase in suggestions for improvement was associated with a one-third standard deviation increase in student learning gains under Model 4 and a two-thirds standard deviation increase under Model 6. The former corresponds to a medium-size effect ($f^2 = .16$) when computed as Cohen’s $f^2$ (J. Cohen, 1988) for hierarchical regression, $f^2 = (R^2_{AB} - R^2_A)/(1 - R^2_{AB})$; the latter corresponds to a large effect size, likely due to suppression. Regression coefficients for the overall CVA score and the remaining subscale scores were smaller and, at this sample size, not significant.
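For readers who want to reproduce this kind of effect size, the computation is a one-line formula. The sketch below is illustrative only: the function name `cohens_f2` and the R² values passed to it are hypothetical and are not taken from the models reported here. With the second argument left at zero, the function reduces to the single-model form, R² / (1 − R²).

```python
def cohens_f2(r2_full, r2_reduced=0.0):
    """Cohen's f^2 for the variance added by a set of predictors.

    r2_full    -- R^2 of the model that includes the predictors of interest
    r2_reduced -- R^2 of the model without them (0.0 for a single-model f^2)
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_full)


# Hypothetical R^2 values, chosen only to show the arithmetic.
f2_single = cohens_f2(0.35)             # 0.35 / 0.65
f2_increment = cohens_f2(0.30, 0.20)    # (0.30 - 0.20) / 0.70
print(f"{f2_single:.3f} {f2_increment:.3f}")
```

By Cohen’s (1988) conventions, values of about .02, .15, and .35 mark small, medium, and large effects, respectively.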
Table 2. Results of Hierarchical Linear Modeling (HLM) Level 2 Analyses of Effects of Teacher Knowledge (Classroom Video Analysis; CVA) on Student Learning
Note. MC = mathematical content; ST = student thinking; SI = suggestions for improvement; DI = depth of interpretation.
Step 2: Predicting Instructional Quality From Teacher Knowledge
The next step in our analyses was to investigate the relationship between teacher knowledge (CVA) and instructional quality (the hypothesized mediator of the effect on student learning). To do this, we regressed instructional quality scores on all teacher knowledge scores (CVA total score, each rubric separately, all rubrics entered together, and all rubrics entered stepwise). The model predicting teaching quality from the CVA total score was highly significant (β = .599, p < .01). A one standard deviation increase in a teacher’s overall score on the CVA was associated with a two-thirds standard deviation increase in the overall instructional quality score. This represents a large effect size with a Cohen’s $f^2$ of .538, where $f^2 = R^2/(1 - R^2)$ (J. Cohen, 1988).
Scores for each individual CVA rubric were also predictive of instructional quality. When entered individually, a one standard deviation increase on any of the CVA rubrics was associated with a one-third to two-thirds of a standard deviation increase on the instructional quality score (MC: β = .623, p < .01; ST: β = .612, p < .01; SI: β = .417, p < .05; DI: β = .459, p < .01). Effect sizes were of large or medium size (MC: $f^2 = .589$; ST: $f^2 = .552$; SI: $f^2 = .204$; DI: $f^2 = .23$). That is, mathematical content knowledge that is connected to student thinking, as measured by the CVA, informs alternative strategies, serves as a lens through which instructional segments are interpreted, and is positively related to instruction.
When all four CVA rubrics are included together in the regression model, effects of the individual rubrics drop away because of the high multicollinearity among the four rubrics. When entered together in a stepwise procedure, the mathematical content rubric emerged as the sole significant predictor of overall instructional quality (MC: β = .624, p < .01). That is, teachers who were able to provide more sophisticated analyses of the mathematical content in the context of the observed teaching episodes appeared to exhibit higher quality instruction in their own classrooms.
Step 3: Establishing That Instructional Quality (Mediator) Is Related to Student Learning
Having established that teachers’ CVA scores predict quality of instruction in their classrooms, we next investigated whether instructional quality would in turn predict student learning. Hierarchical linear regression analyses showed that instructional quality, our hypothesized mediator, was indeed a significant predictor of student learning gains. A one standard deviation improvement in instructional quality (composite score) was associated with an approximately one-half standard deviation improvement in students’ learning (β = .466, p < .01), which represents a medium-size effect ($f^2 = .29$).
Given the exploratory nature of this study, we also estimated the effects of the individual instructional quality indicators on student learning. When all individual indicators were entered together, time spent on developing concepts and mapping algorithms to representations remained statistically significant predictors of student learning gains (developing concepts: β = .474, p < .05; mapping algorithms to representations: β = .325, which corresponds to a large effect, $f^2 = .342$), while conceptual and temporal links did not explain any statistically significant additional amount of variance in student learning gains.
Step 4: Testing for Mediation—Was the Effect of Teacher Knowledge on Student Learning Mediated by Instructional Quality?
Our final step was to fit a series of hierarchical regression models in which we predicted both the direct effects of teacher knowledge on student learning gains and the indirect effects through instructional quality. In all the models we fit (for total CVA and for each CVA rubric separately), we found that adding instructional quality into the model reduced the magnitude of the direct effect of teacher knowledge on student learning. This suggests that part of the variance in teacher knowledge related to student learning was explained by the mediator, instructional quality.
To test the statistical significance of the indirect effects, we used the SPSS macros described by Preacher and Hayes (2004). These macros provide a means of estimating indirect effects using both the normal distribution and a bootstrap approach (N of bootstrap samples = 5,000), which is recommended for small sample sizes. As shown in Table 3, all indirect effects except that of suggestions for improvement were significant, suggesting that our data are consistent with the hypothesis that teacher knowledge indirectly affects student learning through instruction. The nonsignificant indirect effect of suggestions for improvement might indicate that our instructional quality rubric did not sufficiently capture the effects of the knowledge measured by this CVA rubric. Suggestions for improvement remained, nevertheless, the only statistically significant direct predictor of student learning gains.
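The logic of a bootstrap test of an indirect effect can be illustrated with a short sketch. This is not the Preacher and Hayes SPSS macro; it is a plain percentile bootstrap written under simplifying assumptions: all variables are treated as classroom-level scores, single-level OLS is used, and the data are simulated. The names `indirect_effect`, `bootstrap_ci`, `knowledge`, `quality`, and `gain` are hypothetical.

```python
import random


def _cross(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """a * b, with a from m ~ x and b the slope of m in y ~ x + m."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample classrooms with replacement, keeping (x, m, y) together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(round(alpha / 2 * n_boot))]
    upper = estimates[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


# Simulated classroom-level scores (ASSUMPTION: illustrative values only,
# not data from this study).
rng = random.Random(1)
knowledge = [rng.gauss(0, 1) for _ in range(36)]
quality = [0.6 * k + rng.gauss(0, 0.8) for k in knowledge]
gain = [0.5 * q + rng.gauss(0, 0.8) for q in quality]

point = indirect_effect(knowledge, quality, gain)
lower, upper = bootstrap_ci(knowledge, quality, gain)
print(f"indirect = {point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

An indirect effect is judged significant under this approach when the confidence interval excludes zero. Because the interval is built from the empirical distribution of resampled estimates, it does not require the product ab to be normally distributed, which is the main reason the bootstrap is preferred in small samples.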
Table 3. Significance of Indirect Effects of Teacher Knowledge (Classroom Video Analysis; CVA) on Student Learning Using Normal Distribution and Distribution Free Bootstrapping Approach
Note. MC = mathematical content; ST = student thinking; SI = suggestions for improvement; DI = depth of interpretation; CI = confidence interval.
Indirect effects are based on unstandardized coefficients. The respective indirect effects based on standardized coefficients are .33 (total CVA), .40 (MC), .34 (ST), .16 (SI), and .24 (DI).
Comparing CVA and MKT as Measures of Teacher Knowledge
In our previous study we found that teachers’ scores on the CVA and the MKT were strongly correlated, indicating that teachers might use similar knowledge when responding to the video clips and the MKT items. This was true particularly for the mathematical content rubric scores. But whereas the CVA SI subscale predicted student learning gains, the MKT scores did not. (It should be noted that Hill et al. [2005] did find MKT scores to predict student learning gains.) In the present study we continued our exploration of MKT scores in comparison with the CVA approach.
Our first step was to estimate the direct effects of the MKT on student learning. Confirming the findings from our previous study, we did not find statistically significant direct effects of MKT scores on student learning for this larger sample (N = 36; β = −.06).
We next estimated the direct effects of the MKT on instructional quality. Surprisingly, we did not find a statistically significant effect of MKT scores on instructional quality (β = −.020), although we did again find a medium correlation of MKT with CVA scores for our current sample, r(36) = .406, p < .01. Hill et al. (2008) have reported strong relationships between MKT scores and their own measure of instructional quality. Even though the CVA and MKT measures covary, it must be the unique variance in each that predicts these different measures of instructional quality. Clearly, it will be important in future work to understand how these different measures of instructional quality relate. Finally, when we estimated a model that included MKT, instructional quality, and student learning we found little evidence of either direct (β = −.16) or indirect effects of MKT on student learning: (using normal theory) indirect: .0433 (.036), z = 1.2008, p = .22; (using no distributional assumptions) indirect: .0439 (.0381), 95% confidence interval (−.021, .281).
We conducted one final set of analyses in which we used both CVA and MKT to predict instructional quality and student learning. When both CVA mathematical content scores and MKT scores were included in a regression analysis predicting instructional quality, the effect size of mathematical content remained virtually unchanged (MC: β = .637, p < .01; MKT: β = −.033). This suggests, as noted above, that it is the unique variance in CVA scores, not the variance shared between the CVA and MKT, that predicts teaching quality. We found a similar result when including MKT scores in the regression model predicting student learning gains from the suggestions for improvement CVA rubric (SI: β = .423, p < .01; MKT: β = −.139).
Discussion
In this study we investigated the relationship between teacher knowledge, teaching practice, and student learning. Our goal was to better understand how teacher knowledge in mathematics might affect student learning. Testing a mediation model, we related teachers’ reliably scored responses to a set of video clips of teaching (the classroom video analysis assessment, or CVA, an instrument we developed to measure teachers’ usable knowledge) to teachers’ own instructional quality and their own students’ learning.
Several interesting findings emerged. We confirmed in this larger sample our earlier finding that CVA scores predict student learning. We also developed a measure of instructional quality that was strongly related to both teachers’ CVA scores and students’ learning. And, consistent with the hypothesized mediation model, we found that the effect of teacher knowledge, as measured by the CVA, on students’ learning was mediated by instructional quality. For the total CVA score and for three of the four subscores (mathematical content knowledge, student thinking, and depth of interpretation), the direct effects of teacher knowledge on students’ learning dropped away once instructional quality was entered into the model. Furthermore, the effect sizes quantifying these relationships were of medium, and sometimes large, size, suggesting that the findings have practical relevance beyond statistical significance.
These results lend support to our approach to assessing teacher knowledge. By asking teachers to analyze video clips of classroom interactions, we are assessing not just what teachers know but also what knowledge they are able to access and likely apply in the course of classroom instruction. Knowledge that can be applied to the classroom is more likely to lead to higher quality instruction. And higher quality instruction produces more opportunities for students to learn. By simultaneously measuring and studying all three parts of this model—knowledge, instruction, and learning—we can identify with greater certainty those kinds of knowledge that are most closely related to quality of instruction and student learning. This in turn will enable us to begin investigating how such knowledge develops over time and help us understand the mechanisms by which it develops and the contexts in which it needs to be developed. Analyzing changes over time within the CVA rubrics might provide relevant information about knowledge development.
Although we confirmed the direct relationship of suggestions for improvement and student learning, the indirect effect of suggestions for improvement, mediated by instructional quality, was not significant (p = .09). Clearly, we need further study to understand how and why suggestions for improvement directly affect student learning. One possible explanation might be that our instructional quality rubric did not sufficiently capture the variance in suggestions for improvement scores that explains differences in student learning gains.
Results of this study might also help us think further about the nature of teacher knowledge. Our current rubrics reflect the importance of connecting mathematical knowledge with other elements of teaching, a view that goes back to Shulman (1986, 1987). That is, to obtain the highest scores on our rubrics, teachers, in their responses to the video clips, must connect mathematics into their analyses of student thinking or alternative teaching strategies. As a result, on our measure, teachers who were able to link mathematics to multiple elements of teaching were judged to be more knowledgeable than those who did not. Shulman viewed pedagogical content knowledge as a core knowledge domain at the intersection of teaching and learning because it meant knowing content in pedagogically useful ways. The medium to large effect sizes we found linking teacher knowledge to instructional quality and student learning appear to provide some evidence for this view.
We also learned in this study that we are only at the beginning of understanding how teacher knowledge relates to teaching and student learning. We found that although teachers’ scores on the CVA and MKT were related (r = .406), suggesting that teachers might have used similar knowledge when responding to the video clips and when answering the MKT items, teachers’ MKT scores did not predict instructional quality as captured by our rubric, nor did they predict student learning. Furthermore, we saw that the effects of the CVA on instructional quality and student learning were not altered when MKT scores were entered into the analysis. This indicates that the variance shared by the two teacher knowledge measures is distinct from the variance in CVA scores that relates to teaching quality and student learning.
Despite the progress we have made in this study, it is important to note that the findings still must be considered preliminary. Although teachers who participated in our study came from a large number of different states and instructional contexts, our sample was neither random nor large enough to draw any strong conclusions. At the same time, the consistency of the results across the multiple analyses we carried out as part of this study and their consistency with earlier findings as well as the good psychometric qualities of the measures we have constructed suggest some robustness and external validity. Currently we are working on replicating these results for two new topic areas: ratio and proportion and variables, expressions, and equations.
Finally, we believe that the kind of knowledge measured by the CVA is only part of the knowledge that can affect student learning through instruction and that our classroom video analysis approach is only one way to measure such knowledge. It will be exciting to see what other types of knowledge and ways to measure it can be found to relate to instruction and student learning.
