Abstract
A deductive, sequential mixed design was used to better understand the internal aspects of performance-based self-evaluation activities as related to teacher preparation. A modified theory of change informed the investigation of a subset (N = 15) of teacher candidates from a larger study, who all showed significant improvements in their teaching abilities after engaging in video analysis. Teacher candidates’ video analysis activities were further analyzed to better understand their self-evaluation accuracy and enthusiasm for engaging in such teacher preparation activities. Results indicated candidates rated their perceived ability significantly higher than their observed instructional ability at all four timepoints. Candidates who were most enthusiastic about engaging in video analysis with self-evaluation were the least accurate at rating their own instructional abilities. Additional findings about the association between internal factors (i.e., attitude and accuracy) and the implications for self-evaluation as a reliable form of performance-based assessment within teacher preparation are discussed.
Keywords
Over the past decade, teacher preparation programs across the country have adopted performance-based assessments for a more authentic measure of teaching ability (Bastian et al., 2018). Performance-based assessments are an alternative to subject matter tests and focus specifically on closing the gap between current and desired teaching ability. Modeled after the National Board Certification (e.g., edTPA; edTPA-like activities), common teacher preparation performance-based assessments include components such as lesson planning, video-recording instruction, collecting student work samples, writing reflections, and self-evaluating to capture a comprehensive estimate of profession-readiness as measured by teacher candidate knowledge and skill (Bastian et al., 2018; Pecheone & Chung, 2006). As teacher candidates strive to understand what successful teaching looks like, performance-based assessment opportunities can guide development by helping candidates gauge areas of strength and need (Hamodi et al., 2017). Essentially, performance-based assessments create opportunities to measure features of teacher candidates’ instructional effectiveness while simultaneously helping to develop that effectiveness (Darling-Hammond, 2020).
Performance-based assessments that leverage video-recording have become particularly prevalent across teacher preparation contexts. Video-based activities create robust yet flexible learning experiences to supplement (a) coursework (e.g., analyze exemplar videos; Jenkins, 2014), (b) early practicums (e.g., identify elements of video-recorded instruction; Cuthrell et al., 2016; van Es et al., 2017), and (c) field experiences (e.g., reflect on video-recorded instruction; Nagro et al., 2017; Nagro, 2019). The sequence of planning, recording, reviewing, reflecting, and evaluating instruction within performance-based assessment activities encourages critical thinking, promotes autonomous learning, and connects coursework to teaching realities (Kearney et al., 2016). For example, through video-based activities, teacher candidates can (a) record portions of or an entire lesson, (b) review instructional or classroom management decision making, (c) reflect on why such decisions were made including the success of such decisions, (d) receive feedback from trusted mentors and coaches about a range of teaching aspects, and (e) revise their teaching decisions moving forward based on new insights (Nagro et al., 2020; Nagro et al., 2017; Calandra et al., 2008; Martin & Ertzberger, 2013). Teacher candidates who are just learning to review their own teaching can benefit from concrete video data that can be reviewed multiple times at a pace and in a setting that works for each individual.
Video-based activities enrich field experiences by allowing internship instructors, university supervisors, and coaches to observe the candidate in practice more often without the complications of in-person observations such as traveling to and from classrooms or disrupting student learning. Despite the benefits, video-recorded sessions, for the purposes of performance-based assessments, can be complicated to execute, given the nuances of technology, the time required to upload and download large video files, frequent scheduling changes, and individual school policies around parent consent. One approach to simplifying video-based activities is allowing the teacher candidate to record and review instruction independently on their own schedule. Limiting video review to only the teacher candidate excludes the university supervisor, course instructor, and mentor teacher, which can simplify scheduling and data sharing as well as mitigate unnecessary exposure of students caught on camera. Therefore, many preparation programs opt for performance-based assessments that include video-recording for the purposes of self-evaluation (Snead & Freiberg, 2019; Wright et al., 2012).
Operationally, self-evaluation is considered to be one of the most common forms of assessment in teacher education (Keller et al., 2005; Tait-McCutcheon & Knewstubb, 2018). Self-evaluation activities can include rubrics, checklists, or guidelines for candidates to use while evaluating their own professional practice. For example, Schmidt and colleagues (2009) asked 124 teacher candidates taking an introductory technology course to self-evaluate their knowledge of specific content areas and instructional technologies using a rating scale and then to rate their ability to use instructional technology while teaching across content areas. Similarly, Podgorsek and Lipovec (2017) asked 76 teacher candidates to evaluate their own knowledge of mathematics education as measured by scoring a seminar paper and then to evaluate their ability to teach mathematics during a practicum experience. There are many applications for self-evaluation within teacher preparation including those closely aligned with video-recording and performance-based activities.
Teacher candidates are asked to self-evaluate during field experiences for several reasons including access. First, self-evaluation activities promote autonomous learning and create frequent performance-based assessment opportunities (Kearney, 2013). Teacher candidates can evaluate their teaching performance through numerous approaches, including video-based activities, as often as they want in order to track their own perceived progress over time (Deniz, 2012). This helps candidates focus their professional goal setting and has been shown to increase candidate ownership over their learning (e.g., Baecher et al., 2013). Second, video analysis allows teacher candidates to provide frequent snapshots of their abilities without requiring internship instructors, supervisors, mentors, or school administrators to visit the classroom on a daily basis (e.g., Alexander et al., 2012). Self-evaluation activities can evolve as candidates target different areas for improvement such as collaboration, student–teacher relationships, instruction, and classroom management. Third, self-evaluation activities are useful for teacher candidates entering the workforce with an emerging professional identity and potentially a lack of understanding regarding formal evaluation expectations (Baecher et al., 2013; Collins et al., 2017). The cognitive and metacognitive strategies required for self-evaluation may aid teacher candidates in cultivating proficiency across many aspects of their newly acquired profession (Kelaher-Young & Carver, 2013). Last, perceived instructional ability is critical for teacher candidates new to the field because those who feel capable of implementing best practices are more likely to seek out and experience success as a teacher (Holzberger et al., 2013).
Despite the many reasons for self-evaluation, the exercise is only as meaningful as it is accurate. Perceptions of instructional ability may be similar to or different from actual instructional ability. Perceived instructional ability as measured through self-evaluation may actually be a rating of confidence (Tait-McCutcheon & Knewstubb, 2018). Ross and Bruce (2007) theorized that teacher candidates’ confidence influences their effort and instructional ability; however, confidence alone does not equate to quality instruction. In a study of 248 elementary general and special education teacher candidates, both instructors and candidates had similar perceptions of the candidate’s disposition toward teaching, but instructors reported that teacher candidates were engaging in substandard work during their preparation (Conderman & Walker, 2015). Conversely, low levels of confidence do not necessarily indicate a lack of effort or poor instruction. For example, in a study evaluating the alignment between the ratings of 40 teacher candidates and those of their instructors, Tait-McCutcheon and Knewstubb (2018) concluded that the consistently lower ratings of candidates compared to their instructors’ ratings reflected the low confidence of teacher candidates new to the profession, not poor instructional quality. Taken together, these findings suggest that teacher candidates may accurately understand their approach to the profession (e.g., disposition or attitude) without necessarily understanding their ability to teach. Accurate self-evaluation, including the ability to separate how one feels about teaching from how one performs while teaching, is likely a skill that must be taught (Deniz, 2012).
Ross and Bruce (2007) theorized that teacher candidate self-evaluation was a mechanism for professional growth and supposed that self-evaluation activities needed to be packaged with external factors such as peer feedback, input from an external change agent (e.g., mentor), and innovative instruction to link to growth. Ross and Bruce’s theory emphasizes the impacts of external factors on teacher candidate development without considering the absence of such external factors. Wright and colleagues (2012) found significant differences in the targeted teacher behaviors where teachers who engaged in video-based self-evaluation demonstrated higher rates of implementation as compared to teachers who did not self-evaluate. This study did not supplement video-based self-evaluation activities with external factors such as supervisor or peer feedback. Such positive findings suggest that video-based self-evaluation shows promise for promoting professional growth independent of external factors as were included in Ross and Bruce’s (2007) model. Internal factors or inherent traits (e.g., attitude, disposition, and feelings) may play a larger role in teacher candidates’ instructional quality than is fully explored through Ross and Bruce’s model. Therefore, questions remain regarding how internal aspects such as candidates’ attitudes toward activities and accuracy during activities play a role in self-evaluation as a mechanism for performance-based assessment and professional growth. To this end, Holzberger and colleagues (2013) conducted a longitudinal analysis to extend the work of Ross and Bruce (2007) by looking at specific elements within the theory of change model. 
Holzberger and colleagues (2013) identified a reciprocal relationship between self-efficacy or confidence and instructional quality and concluded that possible mediator variables and other aspects of competence (e.g., attitude and accuracy) need to be examined to better understand the usefulness of self-evaluation within teacher preparation. Therefore, the purpose of this investigation is to better understand the internal aspects of teacher candidates engaging in video-based self-evaluation activities as related to performance-based assessment and professional growth within teacher preparation. Figure 1 illustrates a modified theory of change used to inform this investigation.

Figure 1. Self-evaluation modified theory of change. Note. Adapted from Ross and Bruce’s (2007) model of teacher self-assessment as a mechanism for teacher change. Dotted lines indicate additions to the original theory of change.
Current Investigation
A deductive sequential mixed design, where the dominant strand was quantitative and the supplemental strand was qualitative, was employed (Johnson et al., 2007; Morse & Niehaus, 2009; Schoonenboom & Johnson, 2017). As designed, both quantitative and qualitative strands occurred in chronological phases where qualitative data were collected prior to the clinical experience, quantitative data were collected during the clinical experience, and qualitative data were collected after the clinical experience. Data from quantitative and qualitative strands were analyzed independent of one another, but the research questions are interrelated (Teddlie & Tashakkori, 2009) with the purpose of providing contextual understanding and clarification of the results from quantitative analyses with the results from qualitative analyses (Bryman, 2006; Schoonenboom & Johnson, 2017). Mixed research questions were developed to integrate data strands leading to a comprehensive exploration of the findings (Creswell & Creswell, 2017).
This investigation was part of a larger study where 36 teacher candidates, participating in their student teaching field experiences, were split into two comparable groups. Groups were deemed comparable based on statistical analysis of previous experience (e.g., weeks of teaching experience and number of video-recorded sessions), perceived instructional ability prior to intervention, and observed instructional ability prior to intervention (see Nagro et al., 2017). Teacher candidates from both groups completed four video analysis sessions that followed a record, review, reflect, and revise process. Only treatment group candidates evaluated their performance by rating their abilities using a self-evaluation rubric during the video review step of the process. Both treatment and comparison groups self-reported significant improvements in their teaching ability on a pre–post questionnaire, but only the treatment group demonstrated significant growth in reflective ability and instructional ability from beginning to end of the student teaching field experience (see Nagro et al., 2017). The current investigation is intended to examine the internal aspects of teacher candidates engaging in self-evaluation activities (i.e., attitude and accuracy) including potential changes that occurred within the treatment group (n = 15). Understanding specific aspects of the treatment group’s experience has the potential to inform best practices in teacher preparation regarding how to facilitate video-based self-evaluation activities that promote authentic change and, more broadly, how to facilitate teacher preparation activities that employ this form of performance-based assessment. Therefore, the following three interrelated research questions were investigated:
Method
Participants and Setting
Fifteen teacher candidates completing their semester-long, student teaching field experience in either general education (K–6) or mild/moderate special education (K–12) master’s level teacher preparation programs offered at a non-edTPA, mid-Atlantic private university participated in this study (see Table 1 for descriptive characteristics). Participants were placed in both public and private schools in classrooms ranging from first through sixth grades. Classroom settings included inclusive general education classrooms, cotaught classrooms, self-contained classrooms, and resource room pullouts. Participants were recruited into the study through their student teaching seminar instructors. The original study included 17 teacher candidates in the treatment group. Two of the 17 were placed in schools where the administrators did permit candidates to record their teaching, but the candidates had to review their video-recordings and then delete the video files before leaving the school building. These candidates were included in portions of the larger study but excluded from all analyses requiring video files. Similarly, these two candidates were excluded from this follow-up study because there was no way to compare perceived instructional ability and observed instructional ability without the video files.
Table 1. Participant Characteristics.
Investigation Procedures
Video analysis setup
Seminar classes, course syllabi, and student teaching activities were similar for all candidates. University supervisors, mentor teachers, and internship instructors did not participate in or discuss any portions of the current investigation with teacher candidates. The university supervisor observations, completed by retired certified teachers, were separate from all video-recorded lessons and focused on different teaching behaviors. This was done to improve internal validity by limiting outside influences on candidates’ opinions toward the video analysis process or candidates’ perceptions of their own teaching performance specific to the aspects being measured on the self-evaluation rubric.
The first author met with teacher candidates during their first seminar class at the beginning of their student teaching field experience to explain the project, pass out materials, answer questions, and collect introductory data using a teacher candidate prequestionnaire. Each teacher candidate received supplies including four 4 GB password-protected flash drives, a wide-angle clip-on video lens, a tripod for either a smartphone or tablet, a 258 MB flash drive with electronic copies of project resources, and a binder with hard copies of project resources.
Video analysis process
Figure 2 illustrates a flowchart of the four-step, video analysis process (i.e., record, review, reflect, and revise). First, teacher candidates were asked to record themselves teaching a complete lesson 4 times throughout their semester-long, student teaching field experience. Complete lesson was defined as a lesson that included a beginning (i.e., teacher-led instruction with an introduction to the lesson), middle (i.e., students actively engaging in learning), and end (i.e., wrap-up or review).

Figure 2. Flowchart of video analysis process completed 4 times by teacher candidates.
Second, candidates used a self-evaluation rubric to assess their performance during video review. Teacher candidates reviewed their video-recorded lesson within 48 hr of recording their instruction. This emphasis on prompt video review was intended to assure that the video review process would be relevant to candidates who may be very focused on the content covered within the lessons rather than their generalizable instructional ability. That is, although not its theoretical purpose, a practical use of video review is to help candidates with lesson planning because they can use video evidence of student progress to inform decisions about the following lesson. Watching the video-recorded instruction soon after having taught the lesson provided candidates with this practical benefit as a way to build buy-in to video analysis activities.
The self-evaluation rubric was intended to narrow candidates’ focus during video review to specific components of instruction (i.e., six elements of communicating and questioning) rather than leaving this portion of the performance-based assessment process open-ended. Video-recorded lessons produce an overabundance of data, and teacher candidates with little to no experience reviewing video-recorded lessons can be overwhelmed by the process. Narrowing their focus to just a few aspects of their instruction was one way to potentially limit frustration and increase the feasibility of the video analysis process.
Third, during this same 48-hr period, teacher candidates wrote a reflection in narrative form that concentrated on the same six elements of communicating and questioning. The continuity between their self-evaluation forms and their written reflection activities was intended to provide a coherent and comprehensive experience. The hope was that candidates would deeply explore and improve their understanding of a few elements of instruction rather than being introduced to new aspects of teaching with each video analysis session. Candidates referenced time stamps from the video-recorded lesson to call to attention specific instances and to assure they actually used the video evidence to write the reflection rather than writing the reflection from memory alone.
Fourth, the intention was for teacher candidates to apply new insights gained from the record, review, and reflect steps of the video analysis process to revise their instruction. After each video analysis session, teacher candidates turned in self-evaluation and written reflection materials via email, and the video-recorded lessons were saved on password-protected flash drives, sealed in labeled envelopes, and either hand delivered or left in a secure drop-off station on campus. The four video-recorded lessons were intended to represent the beginning, middle, and end of the internship experience. During the final seminar class, the first author collected teacher candidates’ postquestionnaires and offered the candidates a set of video recording supplies to continue video-based self-evaluation activities on their own.
Measures
Instructional ability rubric
Instructional ability was measured using the same rubric candidates used to guide their self-evaluation activities. The researcher-adapted rubric was designed to capture observed instructional ability to communicate with and question students. Communication and questioning techniques are two of six components within Domain 3 Instruction of the Danielson Framework (2013), as measured by levels of proficiency using a 4-point scale (i.e., unsatisfactory, basic, proficient, and distinguished). These two components are further broken down into six observable teaching elements within the framework to include (a) expectations for learning, (b) directions for activities, (c) explaining content, (d) using oral and written language, (e) quality of questions and prompts, and (f) discussion techniques. The critical attributes of each teaching element (see Danielson Framework, 2013) were used to differentiate between levels of proficiency on the rubric. While not in the original 4-point scale, “Not Observable” was included in the instructional ability rubric used for this investigation because of the unpredictability of being a visitor in a mentor teacher’s classroom. Cronbach’s α was calculated at .867 across all four time points using Statistical Package for the Social Sciences, version 25 (SPSS), demonstrating good internal consistency of this measure.
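As an illustration of the internal consistency statistic reported above, the following is a minimal sketch of the standard Cronbach’s α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the example ratings are hypothetical, and the sketch does not reproduce how the SPSS analysis was configured across the four time points.

```python
from statistics import variance


def cronbach_alpha(item_columns):
    """Standard Cronbach's alpha.

    item_columns: one list per rubric element, each holding that
    element's scores across the same set of scored lessons.
    (Hypothetical data layout, chosen for illustration only.)
    """
    k = len(item_columns)
    n = len(item_columns[0])
    if k < 2 or any(len(col) != n for col in item_columns):
        raise ValueError("need at least two items of equal length")
    # Total score per lesson, summed across items.
    totals = [sum(col[i] for col in item_columns) for i in range(n)]
    item_variance_sum = sum(variance(col) for col in item_columns)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Illustrative ratings only: three items that rank four lessons identically.
example_items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
alpha = cronbach_alpha(example_items)
```

Items that move together perfectly, as in the example, yield an α of 1; weaker agreement among items pulls the value down.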
The appropriateness of pulling from the Danielson Framework (2013) for this researcher-adapted rubric was considered. First, given the recommendations about narrowing the focus for candidates just learning to review video evidence (see Nagro et al., 2018), the decision was made to use a small portion of the framework. Using a subset of the framework did weaken claims about previously established validity and reliability (see Sartain et al., 2011) but resulted in a focused rubric with the benefits of well-articulated examples and nonexamples of vetted quality indicators that could help guide candidates’ self-evaluation activities. These quality indicators populated the rubric offering a concrete approach to differentiating between levels of proficiency. Second, the domain Instruction was selected over domains Planning and Preparation, Classroom Environment, and Professional Responsibilities because the purpose was to measure observed teaching ability. Moreover, as visitors in mentor teachers’ classrooms, candidates may not make decisions about the physical classroom setup or the behavior management system already being employed.
Self-evaluation rubric
Teacher candidates used the same instructional ability rubric for their self-evaluations that the researchers used to score the video records. Directing teacher candidates to focus on their communication and questioning techniques, two components of instruction observable through video evidence (Cantrell & Kane, 2013), was intended to guide teacher candidates toward more accurately noticing a few components of instruction rather than being acclimated to all components of the Danielson Framework (2013). Using the same measure for the researcher-scored instructional ability rubric and the participant-scored self-evaluation rubric allowed for direct comparison between observed instructional ability and perceived instructional ability.
Teacher Candidate Questionnaire
This researcher-created, 50-item questionnaire included four sections measuring attitudes toward (a) overall instructional ability, (b) video analysis usefulness, (c) implementation fidelity, and (d) demographic information. Specifically, Section 1 was a 13-item Likert-type scale subset (previously reported including psychometric properties in Nagro et al., 2017) that asked candidates to rate their ability regarding aspects of this investigation. Section 2 assessed the perceived usefulness of the video analysis activities overall as well as specific project components including video-recording lessons, writing reflections, self-evaluating, and receiving feedback. Questions from Section 2 that directly related to this investigation (i.e., self-evaluation focused, see Table 2) were used along with the quantitative data to answer the research questions for this investigation. Section 3 asked candidates to describe aspects of the project that would help triangulate implementation fidelity including items such as how many lessons they recorded, how many times they watched each video-recorded lesson, how much time passed between watching their video evidence and writing a self-reflection, and if they used the self-evaluation rubric to guide the video review process. This section was not part of the analyses but was used to assure the research team was reporting on the investigation procedures accurately. Section 4 was intended to collect demographic and previous experience information (Table 1).
Table 2. Teacher Candidate Pre–Post Questionnaire.
Scoring Procedures
Instructional ability and self-evaluation rubric
Four video-recorded lessons were collected throughout the semester, and all video files were scored at the end of the data collection period using the instructional ability rubric where the highest possible score was 24. A score of 1–4 was assigned to each of the six elements listed on the rubric. These scores represented a holistic rating of ability rather than coding each occurrence of a given teaching element within the video. Coding each occurrence was considered (e.g., each time a participant explained directions to students), but at times it was unclear when one teaching occurrence ended and another one began. Therefore, the decision was made to score each of the six elements of teaching holistically across the entire video (e.g., overall, the candidate’s ability to ask questions). In instances where one element was not observable, the instructional ability rubric score was calculated by using the composite score out of five elements rather than six.
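The holistic scoring rule described above can be sketched as follows. This is a minimal illustration, assuming a "Not Observable" element is represented as `None`; the function name and the choice to return both the summed score and the maximum possible score are assumptions made for the example, not details taken from the study.

```python
def composite_score(element_scores):
    """Sum holistic ratings (1-4) across the rubric elements.

    element_scores: six ratings, with None standing in for an element
    marked "Not Observable" (an assumed encoding for this sketch).
    Returns (total, max_possible) so a lesson with one unobservable
    element is judged out of five elements rather than six.
    """
    observed = [s for s in element_scores if s is not None]
    if not observed:
        raise ValueError("no observable elements to score")
    for s in observed:
        if not 1 <= s <= 4:
            raise ValueError("ratings must fall on the 4-point scale")
    return sum(observed), 4 * len(observed)


# Illustrative ratings only: the fourth element was not observable.
total, max_possible = composite_score([3, 3, 2, None, 4, 3])
```

Returning the maximum alongside the total keeps scores comparable between lessons scored on five elements and lessons scored on all six.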
Interrater reliability (IRR)
Scorers were trained to at least 80% agreement using practice videos of teacher candidates from previous semesters. Three scorers were veteran special education teachers before earning doctorate degrees in special education and moving into teacher educator roles. The fourth scorer was a doctoral candidate and veteran middle school teacher. All four scorers had previous experience scoring video-recorded instruction using a similar protocol. Authors 1, 3, and 4 scored the first and fourth videos so that at least 50% of the videos were double scored (53% of videos from Round 1 and 60% of videos from Round 4 were double scored).
IRR for the video-recorded lessons was first calculated using the total percentage of exact agreement. When calculating the exact IRR for video files at Time Points 1 and 4 only, the IRR across all three scorers fell to 60%. Scorers independently coded all of their assigned videos at once before comparing scores and discussing their decision-making process. Likely, this approach led to scorer drift. Therefore, the decision was made to bring in a new scorer (fifth author) to help score video files at Time Points 2 and 3 (the mid-semester videos). Ninety percent of video files from Time Points 2 and 3 were double scored. Authors 1 and 5 scored two videos at a time independently and then met weekly to discuss their scores for those two videos before moving to the next two videos. The weekly check-ins included discussions about decision making when using the instructional ability rubric for scoring the video evidence. Likely, this approach, which included a higher percentage of double scored videos and frequent check-ins, prevented scorer drift. Authors 1 and 5 were able to maintain 80% exact IRR on video files at Time Points 2 and 3. The most common disagreement occurred when determining the level of proficiency of discussion techniques. Combined, the overall exact IRR across all four time points and all four scorers was 70%.
Last, adjacent IRR was calculated. It was appropriate to calculate adjacent IRR (i.e., where scores differed by one) because there were at least five levels of scoring (Graham et al., 2012). The overall adjacent IRR was 96% where each of the four time points was 100%, 92%, 100%, and 93%, respectively. While the exact IRR did not meet at least 75%, adjacent IRR was well above the 90% acceptability benchmark (see Graham et al., 2012). The strong adjacent IRR indicates a small divergence between individual scorers. Disagreements were discussed, but the two independent scores for each video were averaged together for the final score of observed instructional ability. Taking the average score rather than coming to consensus is the recommended approach for video scoring and is a conservative approach to accounting for disagreements between scorers (see Cantrell & Kane, 2013). Being conservative in reporting the observed instructional ability scores made logical sense, given the high adjacent IRR but less than ideal exact IRR.
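The two agreement indices discussed above can be illustrated with a short sketch. It assumes agreement is tallied over paired ratings from two scorers and that adjacent agreement counts any pair differing by at most one scale point (exact matches included); the function names and example ratings are hypothetical.

```python
def agreement_rates(scores_a, scores_b):
    """Return (exact, adjacent) agreement as proportions.

    exact: share of paired ratings that are identical.
    adjacent: share of paired ratings that differ by at most one
    point (assumed here to include exact matches).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty rating lists of equal length")
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


def averaged_score(score_a, score_b):
    """Final score for a double-scored video: the mean of both scorers."""
    return (score_a + score_b) / 2


# Illustrative ratings only, not data from the study.
rater_1 = [3, 2, 4, 1, 3, 2]
rater_2 = [3, 3, 4, 3, 2, 2]
exact, adjacent = agreement_rates(rater_1, rater_2)
```

In this example half of the pairs match exactly, while five of the six pairs fall within one point of each other, which shows how adjacent agreement can be high even when exact agreement is modest.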
Data Analysis Procedures
In this deductive sequential mixed design, the dominant data strand was quantitative and the supplemental data strand was qualitative. After data collection was complete, all data were analyzed independently and then the findings were merged to test the statistical association between data types. Specifically, quantitative data were analyzed using descriptive statistics and paired sample t tests (see Table 3) to identify the group distribution of perceived and observed instructional abilities including possible changes in such abilities over time (Research Question 1). Qualitative data were analyzed through a summative content analysis (Research Question 2). Finally, methodological triangulation was used to determine the degree to which qualitative findings explained quantitative findings (Research Question 3).
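For readers less familiar with the paired samples t test named above, the following minimal sketch computes the test statistic from the per-candidate differences between perceived and observed scores. The function name and the example scores are hypothetical, and the sketch stops at the t statistic and degrees of freedom rather than reproducing the SPSS output.

```python
import math
from statistics import mean, stdev


def paired_t(perceived, observed):
    """Paired samples t statistic and degrees of freedom.

    t = mean(d) / (sd(d) / sqrt(n)), where d is the per-candidate
    difference (perceived minus observed) at one time point.
    """
    if len(perceived) != len(observed) or len(perceived) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [p - o for p, o in zip(perceived, observed)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Illustrative scores on a 0-100 scale, not data from the study.
perceived_scores = [80, 75, 90, 70, 85]
observed_scores = [60, 65, 70, 60, 75]
t_stat, df = paired_t(perceived_scores, observed_scores)
```

A positive t statistic here indicates that perceived scores exceed observed scores on average; significance would then be judged against the t distribution with n − 1 degrees of freedom.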
Table 3. Paired Samples Descriptive Statistics.
Recoding quantitative data
Data were adjusted before analyses occurred. Specifically, the dependent variables, perceived instructional ability and observed instructional ability, were composite scores composed of six teaching elements, each rated on a 4-point scale, so the possible score ranged from 6 to 24. The composite score addressed concerns regarding the unreliability of a single Likert-type scale item but, without a true zero, did not allow for straightforward analysis. Therefore, the items in the scale were recoded to begin at 0 rather than 1 and end at 3 rather than 4, adjusting the possible range to 0–18. This allowed the new scores to be summed into a composite score, divided by 18, and expressed as a percentage, resulting in a continuous variable ranging from 0 to 100 with 50 as a midpoint and permitting a straightforward interpretation of the results.
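The recoding can be illustrated with a brief sketch. It assumes all six elements were rated and that the quotient is multiplied by 100 to reach the 0 to 100 range; the function name and example ratings are hypothetical.

```python
def rescaled_composite(element_scores):
    """Recode 1-4 ratings to 0-3, sum them, and rescale to 0-100.

    Assumes all six elements were rated; multiplying by 100 is an
    assumption made here to reach the stated 0-100 range.
    """
    if len(element_scores) != 6:
        raise ValueError("expected ratings for six teaching elements")
    recoded = [score - 1 for score in element_scores]  # 1-4 becomes 0-3
    if any(not 0 <= r <= 3 for r in recoded):
        raise ValueError("ratings must fall on the 4-point scale")
    return sum(recoded) / 18 * 100  # 18 is the maximum recoded sum


# Illustrative ratings only: a recoded sum of 9 lands on the midpoint.
midpoint_example = rescaled_composite([2, 3, 3, 2, 3, 2])
```

Under this recoding, a candidate rated at the lowest level on every element scores 0, one rated at the highest level on every element scores 100, and a recoded sum of 9 corresponds to the midpoint of 50.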
Methodological triangulation was used for Research Question 3. Methodological triangulation is a well-established technique but is not well documented in education literature (Bekhet & Zauszniewski, 2012). Through this approach, statistical associations are tested between quantitative and qualitative data. In this investigation, a linear-by-linear χ² analysis was conducted to understand whether the qualitative findings could explain the quantitative findings. Specifically, the quantitative data were recoded to classify participants’ level of self-evaluation accuracy into one of three categories, since categorical data are required to run a χ² analysis. The difference between observed and perceived instructional ability scores at Time Point 4 was calculated for each candidate. Candidates fell into three categories of self-evaluation accuracy: those whose perceived ability scores were within one, two, or three standard deviations of their observed instructional ability scores. These categories of candidates were then compared to the categories describing participant attitude that were identified through the qualitative analysis. As a result, the data were arranged in a 3 × 3 contingency table (Participant Accuracy × Participant Attitude). A linear-by-linear analysis was selected because the contingency table was larger than 2 × 2. It was a logical fit, given that the qualitative data analysis resulted in categorical findings. This methodological approach was intended to determine whether the findings derived from separate approaches aligned with one another.
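For readers unfamiliar with the linear-by-linear association test, the following sketch shows one standard formulation: M² = (N − 1)r², where r is the Pearson correlation between ordered row and column scores, referred to a χ² distribution with one degree of freedom. The study ran this analysis in SPSS; the function name, the equally spaced category scores, and the example table here are assumptions for illustration only.

```python
import math


def linear_by_linear(table, row_scores=None, col_scores=None):
    """Linear-by-linear association test for an ordered contingency table.

    Returns (r, m2, p): the correlation between category scores, the
    statistic M^2 = (N - 1) * r^2, and its asymptotic p value (1 df).
    Assumption: categories are scored 1, 2, 3, ... unless scores are given.
    """
    u = row_scores or list(range(1, len(table) + 1))
    v = col_scores or list(range(1, len(table[0]) + 1))
    cells = [(u[i], v[j], count)
             for i, row in enumerate(table)
             for j, count in enumerate(row)]
    n = sum(count for _, _, count in cells)
    mean_u = sum(ui * c for ui, _, c in cells) / n
    mean_v = sum(vj * c for _, vj, c in cells) / n
    cov = sum(c * (ui - mean_u) * (vj - mean_v) for ui, vj, c in cells)
    var_u = sum(c * (ui - mean_u) ** 2 for ui, _, c in cells)
    var_v = sum(c * (vj - mean_v) ** 2 for _, vj, c in cells)
    if var_u == 0 or var_v == 0:
        raise ValueError("Each variable must span at least two categories.")
    r = cov / math.sqrt(var_u * var_v)
    m2 = (n - 1) * r ** 2
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(m2 / 2))
    return r, m2, p


# Hypothetical 3 x 3 table with N = 15 (not the study's data).
example = [[0, 1, 3],
           [1, 3, 1],
           [4, 1, 1]]
r, m2, p = linear_by_linear(example)
print(f"r = {r:.3f}, M2 = {m2:.3f}, p = {p:.3f}")
```

The sign of r gives the direction of the association, while M² and its p value indicate whether a linear trend across the ordered categories is statistically detectable. The p value is asymptotic, so it is an approximation for small samples.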
Screening quantitative data
Some teacher candidates experienced technical difficulties in recording or sharing their video-recorded lessons. The video files were not reviewed until after the semester was over. None of the authors were responsible for assigning grades, and this decision was intended to protect the first author from being influenced by the video content when providing feedback to candidates on their written reflections. Additionally, scoring all videos after the completion of data collection reduced the probability of an observer-expectancy effect where scorers anticipated improvements knowing where candidates were in their semester at the time the video was recorded. Unfortunately, because the flash drives were not checked, many of them were not functional at the time of playback. Files were corrupted, had no sound, or were saved in a format that prevented playback. In all, 10 of the 60 videos were lost. Ten different participants were each missing one of their four video-recorded lessons. There was no probable relationship between the technology issues and participants to suggest any correlation of missing data. This loss in data did not impact candidates’ ability to self-evaluate as they saw the original video file, but it did impact the scorers’ ability to observe candidates’ instructional ability in these 10 instances. The decision was made to use linear interpolation to fill in missing data rather than shrink the sample to only those who had four playable video files. Linear interpolation is preferred over other forms of interpolation, imputation, or using the group mean when considering a good fit and maintaining normal distribution (Noor et al., 2014). In linear interpolation, two known points are used to draw a straight line that fills in the missing data points. Within the scope of this study, when one video was missing, the data from that candidate’s other videos were used to fill in the missing data point.
To ensure accuracy, the SPSS transformation using linear interpolation was used to calculate the missing data points. Normality was assessed at all four time points, and results indicated no issues of skewness (i.e., skewness statistic within −1 and 1) or kurtosis (i.e., kurtosis statistic within −2 and 2), confirming normal distribution.
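The interpolation step can be sketched as follows. This is an illustrative stand-in for the SPSS transformation, not a reproduction of it: the function name and scores are hypothetical, and because the text does not describe how a missing first or last time point was handled, the sketch leaves such values missing rather than guessing.

```python
def fill_linear(series):
    """Fill interior gaps (None) on the straight line between known neighbors.

    Assumption: time points are equally spaced. Missing values at the start
    or end of the series are left as None.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# Hypothetical observed scores for one candidate; Time Point 2 is missing.
print(fill_linear([40.0, None, 60.0, 70.0]))  # [40.0, 50.0, 60.0, 70.0]
```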
Analyzing qualitative data
A summative content analysis was conducted to review the questionnaire responses (Hsieh & Shannon, 2005). In this summative content analysis, the data set produced from the teacher candidate questionnaire was analyzed in its entirety to look for key words as opposed to looking for themes or patterns within the data. The purpose of reviewing all responses was to provide context and clarity to the quantitative findings. The participants were categorized based on their responses to the teacher candidate questionnaire (Table 2). Two independent coders (first and second authors) reviewed each participant’s responses and categorized the participants. Coders also noted key words that informed the categorization of participants. The two coders then compared responses and assured that the final categories accounted for both of their analyses. There were no disagreements, but there were slight variations in category titles so the coders reviewed a visual display of the categorical data to come to agreement on the final wording. These categories were used to classify candidates’ attitudes toward video analysis activities so that a linear-by-linear analysis testing the association between participant accuracy (i.e., quantitative self-evaluation data) and attitude (i.e., qualitative summative categories) was possible.
Results
Research Question 1
A paired-samples t test in SPSS was used to compare perceived instructional ability scores with observed instructional ability scores at each of the four time points. Figure 3 illustrates the significant differences between perceived and observed instructional ability at all four time points, as outlined in Table 4. Teacher candidates consistently rated themselves significantly higher than the scorers did on their ability to communicate with and question their students. Visual analysis of Figure 3 shows that candidates scored themselves 14.8% higher than their observed instructional ability score at Time Point 1, 23.7% higher at Time Point 2, 23.9% higher at Time Point 3, and 18.1% higher at Time Point 4. Candidates’ ability to evaluate their own teaching performance accurately did not improve over time.

Graph of perceived versus observed instructional abilities over four time points.
Paired-Samples t Test to Compare Perceived and Observed Instructional Ability.
Note. Sig. = significance at .05.
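The paired-samples comparison reported for Research Question 1 rests on the difference between each candidate's perceived and observed score. The sketch below computes the mean difference, the t statistic, and the degrees of freedom for one time point. The analysis in the study was run in SPSS; the function name and scores here are hypothetical, and the p value is omitted because it would come from a t distribution with n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(perceived, observed):
    """Return (mean difference, t statistic, degrees of freedom)."""
    if len(perceived) != len(observed):
        raise ValueError("Each candidate needs both a perceived and an observed score.")
    diffs = [p - o for p, o in zip(perceived, observed)]
    n = len(diffs)
    if n < 2:
        raise ValueError("At least two pairs are required.")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical.")
    mean_diff = mean(diffs)
    t_stat = mean_diff / (sd / math.sqrt(n))
    return mean_diff, t_stat, n - 1


# Hypothetical 0-100 scores for four candidates (not data from the study).
perceived = [80, 70, 90, 60]
observed = [70, 65, 70, 55]
mean_diff, t_stat, df = paired_t(perceived, observed)
print(f"mean difference = {mean_diff}, t({df}) = {t_stat:.3f}")
```

A positive mean difference indicates that candidates rated themselves above the scorers' ratings, which is the pattern the results describe at all four time points.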
Research Question 2
Overall findings from the summative content analysis of key words indicate candidates had no shift in attitude over time (pre/post). Candidates generally reported low levels of enthusiasm for video analysis with self-evaluation during their student teaching field experience. Two major threads, degree of enthusiasm and perceptions of usefulness, captured candidates’ attitudes. When these threads are crossed, four categories of candidates are formed (i.e., enthusiastic/worthwhile, enthusiastic/worthless, unenthusiastic/worthwhile, and unenthusiastic/worthless). Figure 4 illustrates where each of the 15 candidates fell across the four categories. Most (67%, n = 10) were unenthusiastic about this project but acknowledged that it was worthwhile. Enthusiasm was low when candidates were introduced to video analysis with self-evaluation and remained low after they engaged in the field experience activities. Specifically, prior to engaging in these activities, 33% of candidates (n = 5) reported being enthusiastic about watching their own video-recorded lessons, and after all four video analysis sessions, only two participants (13%) reported they were still enthusiastic (see Figure 4). Based on key word analysis, issues with technology and the time-consuming nature of engaging in video analysis with self-evaluation multiple times during the student teaching field experience were the two main indicators of negative attitudes toward the project. Despite negative attitudes or low levels of enthusiasm for video analysis with self-evaluation, 73% of candidates (n = 11) thought video analysis with self-evaluation was worthwhile. Similarly, 80% of candidates (n = 12) perceived the project as a learning experience and reported that their teaching practices changed as a result of engaging in video analysis with self-evaluation.
Those who did not see this project as useful also reported that the entire project was harder than expected. In fact, eight candidates reported that the project was harder than expected, four felt it was exactly what they expected, and three reported that the project was easier than expected. These findings indicate candidates were unenthusiastic about the process but acknowledged it was a worthwhile learning experience (Figure 4).

Four categories of participants. Note. Each X indicates one participant. The four participants close to the Y-axis indicate conflicted responses.
Most candidates did not change their attitudes over time. In fact, all 11 candidates who predicted video analysis with self-evaluation would be useful on the prequestionnaire maintained their positions on the postquestionnaire. Similarly, two of three candidates who predicted the project would not be useful maintained their positions on the postquestionnaire. One candidate, who did not enjoy or feel he benefited from this project, reported enthusiasm on the prequestionnaire but changed his responses to unenthusiastic on the postquestionnaire. One participant changed her attitude to the positive. In total, only two of the 15 candidates changed their attitudes toward video analysis with self-evaluation over time.
Research Question 3
Methodological triangulation did reveal a link between quantitative (accuracy) and qualitative (attitude) data. Results from the linear-by-linear χ² analysis indicate a significant negative association between accuracy and attitude (p = .037). Those who were least accurate (i.e., their self-evaluation scores were more than two standard deviations away from their observed instructional ability scores) were most likely to be enthusiastic and see the project as worthwhile. Conversely, candidates whose perceived abilities were within one standard deviation of their observed instructional abilities were more likely to see this project as worthless and were unenthusiastic about engaging in the work. Candidates who perceived their instructional ability to be of the highest quality were least accurate and tended to have the most positive attitude toward engaging in video analysis with self-evaluation activities. The modified theory of change (Figure 1) that guided this investigation does correctly indicate an association between internal factors, accuracy, and attitude as related to video-based self-evaluation activities.
Discussion
A goal of this investigation was to better understand the internal aspects of video-based self-evaluation activities as related to performance-based assessment and professional growth within teacher preparation. To achieve this goal, a subset of participants from a larger study, who all showed significant improvements in their instructional abilities after engaging in video analysis (see Nagro et al., 2017), were further analyzed to better understand aspects of video analysis activities, including their self-evaluation accuracy and their attitude toward engaging in these teacher preparation activities. To summarize, teacher candidates acknowledged that video analysis with self-evaluation as a performance-based assessment was a worthwhile learning experience, but candidates had inaccurate perceptions of their own instructional abilities, were unenthusiastic about engaging in the preparation activities, and did not shift their attitudes about video analysis over time. Moreover, candidates who were most enthusiastic to engage in the process were the least accurate at recognizing their own instructional abilities when compared to teacher educator observations. These findings suggest candidates may acknowledge that they benefit from learning experiences that they do not enjoy, and their preconceptions about particular teacher preparation activities may not change even after experiencing documented improvements. The findings were unexpected but informative nonetheless. As posited in the theory of change model (Figure 1), internal factors were associated with video-based self-evaluation activities as measured by methodological triangulation. The theory that candidates’ attitude toward and accuracy in self-evaluation activities will play a role in their professional growth held in this study. We are not making statements of causation; rather, we acknowledge that the portions of the modified theory of change that were tested held in this specific example.
Findings from this study support the notion that perceived instructional ability is not a proxy for observed instructional ability and may actually be a rating of confidence (see Tait-McCutcheon & Knewstubb, 2018). On average, the candidates in this study rated themselves significantly higher than the observed instructional ability scores at all time points. Candidates may have had a well-established attitude regarding engaging in such teacher preparation activities, but the findings suggest a lack of alignment between their attitudes toward engaging in learning through video analysis with self-evaluation, their confidence in their instructional abilities, and their observed instructional abilities.
Teacher Education Implications
Self-evaluation practices have many benefits but are likely not a substitute for performance-based assessments with feedback within teacher preparation. Static attitudes or inaccurate perceptions may be reinforced if the majority of performance feedback candidates receive is feedback from self-evaluation activities. Performance feedback from others that is timely, specific, positive, and corrective when necessary is critical to improving teacher candidate instructional abilities (Brownell et al., 2020). Teacher educators might consider combining self-evaluation activities with performance-based assessments that include feedback from outside agents such as university supervisors. This recommendation is in alignment with Ross and Bruce’s (2007) original conclusions regarding the need to pair self-evaluation activities with other external factors and extends their work by introducing additional considerations regarding the need to account for interplay between external factors (i.e., self-evaluation activities) and internal factors (i.e., attitude and accuracy). Structuring preparation activities in a manner that prompts candidates to adopt new perspectives regarding their instructional abilities and overall attitude toward critical career preparation may benefit them within the context of a specific course or field experience and will likely have a broader impact on their overall development as profession-ready teachers. Once teacher candidates enter the workforce, they will be expected to continue to grow and improve through professional development and coaching activities. Helping candidates find value in teacher education activities during their preparation may cultivate an appreciation for ongoing growth within the profession.
Field experience preparation activities have the potential to translate to professional practice when candidates enter the workforce (Etscheidt et al., 2012). However, without buy-in, candidates are unlikely to see the value in activities regardless of the impact, and therefore these practices are unlikely to generalize beyond preparation. The candidates in this study did significantly improve their instructional abilities after participating in video-based self-evaluation (see Nagro et al., 2017), but the next step is to identify methods for increasing accuracy and potentially even enthusiasm. Candidates were more apprehensive than enthusiastic in regard to the technical aspects of engaging in video analysis (e.g., video setup, recording, uploading, and playback) as well as the idea of reviewing and sharing their video-recorded instruction with others. Despite the fact that all the teacher candidates had readily available mobile technology with video-recording capabilities, many of them experienced technical difficulties. Many candidates needed ongoing technical support in order to complete the steps within the video analysis process. Beyond technical issues, engaging in the four-step video analysis process (record, review, reflect, and revise) is challenging and uncomfortable for candidates but presents valuable learning opportunities that are difficult to replicate through role-play or simulation (Nagro, 2019). Introducing video-based activities early and often in candidates’ preparation may help ameliorate anxieties about technical aspects or discomfort with self-confrontation and allow for authentic engagement during field experiences. Emphasizing meaning-making and the importance of growth as opposed to earning a specific score or completing a given set of field experience tasks may help to support learner buy-in and meaningful engagement.
Research Limitations and Implications
Methodological limitations include missing data and less than ideal exact IRR. Both of these limitations potentially stem from the complexities of using video-based activities in teacher preparation. Technology does remove barriers to teacher education research, allowing for greater access to teacher candidates in authentic settings. However, video-based technology is not without faults. Issues related to capturing, uploading, and downloading video files for review posed challenges for several candidates. Additionally, the video evidence was limited because of the video-recording equipment. Candidates were encouraged to use what they had to maximize affordability and familiarity, but video file quality was sacrificed, creating challenges for consistency in coding. As video-recording technology continues to improve (e.g., tablet with Swivl), these issues will likely become less of a concern. Regardless of the video-recording technology used, discussing with candidates the do’s and don’ts of video-recording may improve the video file quality.
The theory of change model held for this study based on the aspects tested through the three research questions. Teacher candidate attitude, effort, accuracy, and performance are not isolated from one another. However, additional assertions such as the potential interconnectivity of both external and internal factors still need to be tested. For example, potential reciprocal relationships between performance and attitude were not tested directly. Candidates in this study improved their performance as measured by the instructional skills rubric, but their static attitudes, enthusiasm, and accuracy were not tested for statistical association with instructional performance beyond identifying that a disconnect between performance and accuracy existed. The scope of this study fell short of exploring whether improving candidates’ accuracy in understanding their instructional ability or attitude toward engaging in their preparation impacts their teaching performance. Future research might use this modified theory of change to inform research that explores other examples of how attitude and accuracy might interface with teacher preparation self-evaluation activities and teacher candidate performance.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
