Abstract
The present study investigates the impact of participation in a peer assessment activity on subsequent academic performance. Students in two sections of an introductory psychology course completed a practice quiz 1 week prior to each of three course exams. Students in the experimental group participated in a five-step double-blind peer assessment activity immediately following the practice quiz, whereas those in the control group participated in the identical activity 1 week after the exam. Results show that participation in the peer assessment activity enhanced subsequent exam performance in all three cases, even after accounting for online mastery quiz performance and attendance. A detailed description of the peer assessment activity is provided as a flexible template for instructors.
Peer assessment has become an increasingly popular pedagogical tool in higher education (Carvalho, 2013; Gielen, Dochy, & Onghena, 2011), with instructors effectively incorporating it to provide feedback (formative assessment), evaluate learning (summative assessment), or both (Baker, 2008). Variously described in the literature as peer review, peer rating, peer feedback, peer marking, peer correction, or peer appraisal, peer assessment is defined generally as “an arrangement in which individuals consider the amount, level, quality, or success of the products or outcomes of learning of peers of similar status” (Topping, 1998, p. 250).
Previous research has shown that individual peer assessors benefit by developing a range of behavioral, cognitive, and metacognitive skills (Carvalho, 2013), including critical and reflexive thinking (Boud, Cohen, & Sampson, 1999; Min, 2006), evaluation and writing (Cho, Schunn, & Wilson, 2006), problem-solving (Hwang, Hung, & Chen, 2014), and communication and cooperation (Davis, Kumtepe, & Aydeniz, 2007). Peer assessment has also been linked with increases in learning motivation (Hwang et al., 2014; Jenkins, 2004), maturity and confidence (Cheng & Warren, 2000), taking responsibility for one’s own learning (Cho et al., 2006), and learning performance (Hwang et al., 2014; Wang, 2004; Xiao & Lucking, 2008). Taken together, these findings suggest that peer assessment activities should have an enduring impact on academic performance (Janes, 2007; Vickerman, 2009).
Despite this potential, a review of the peer assessment literature shows that the learning that is typically assessed is the final version of the assignment that received peer evaluation. This has also been the case within psychology courses, where peer assessment has been successfully implemented both online and face to face to improve students’ writing (Bakhshi, Harrington, & O’Neill, 2009; Cathey, 2007; Haaga, 1993; Pare & Joordens, 2008; White & Kirby, 2005), to refine experiment proposals (Sung, Lin, Lee, & Chang, 2003), to develop critical thinking skills (Anderson & Soden, 2001), and to evaluate student debates (Smith, 1990) and research posters (Edgerton & McKechnie, 2002).
In the single exception to this trend, Fantuzzo, Dimeff, and Fox (1989) introduced an innovative reciprocal peer tutoring (RPT) procedure for students of abnormal psychology that included three steps: (1) generating 10 multiple-choice questions for each course unit, (2) testing a randomly assigned partner with those questions, and (3) providing feedback to the partner about their answers. This RPT procedure positively impacted students’ scores on a later multiple-choice examination, relative to two control groups. However, it was unclear whether the gains in subsequent academic performance resulted from testing, peer assessment, or both, as the only step accounted for by a control group was the generation of questions.
The Present Study
The dearth of studies on the later and long-term effects of peer assessment is surprising, given that many of the potential benefits believed to accrue to students involve the development of skills that ought to impact their course performance beyond the specific peer-assessed assignment. The present study aimed to address this gap in the literature by investigating the impact of participating in a peer assessment activity on subsequent exam performance. The study employed a five-step double-blind peer assessment activity that may serve as a template for instructors interested in incorporating a peer assessment component within their course.
Hypothesis
Based on previous research (Anderson & Soden, 2001; Cho et al., 2006; Fantuzzo et al., 1989; Hwang et al., 2014), students who participate in a peer assessment activity prior to their course exams will outperform students who participate in the same activity following the exams. This effect should hold even after accounting for online mastery quiz performance and attendance, both of which are also expected to positively predict exam performance (Roediger III, Agarwal, McDaniel, & McDermott, 2011; Thatcher, Fridjhon, & Cockcroft, 2007).
Method
Participants
Two sections of a freshman course in introductory psychology at a small Canadian public university were randomly assigned to serve as the control (n = 36) and experimental groups (n = 35) for this study. Both sections were taught twice weekly on the same afternoons by the author, had a slight majority of females, and mainly included first-year university students enrolled in general arts and sciences programs (see Table 1).
Table 1. Participant Characteristics Across the Two Sections.
Note. N = 74. CI = confidence interval; LL = lower limit; UL = upper limit.
Measures and Materials
Course exams
Over the course of the semester, students in both groups completed three noncumulative exams, each of which was worth 20% of their overall course grade. Each exam assessed students’ comprehension and application of concepts from the preceding 3 topics and included 40 multiple-choice, 10 fill-in-the-blank, and 2 short-essay questions. The students were given 75 min to complete each exam.
Class attendance
Attendance was noted for 13 of the 18 class meetings that did not involve a formal assessment, although attendance itself was not mandatory and did not factor into the calculation of the overall course grade.
Online mastery quizzes
For each of the nine course topics, students in both groups completed an online mastery quiz that consisted of 10 applied multiple-choice questions selected from the test bank developed by the textbook publisher (see Appendix for sample questions). Each quiz was open to students for 1 week; however, there was no limit to the number of attempts permitted, and the quizzes had no time limit. The number of attempts taken by each student in order to achieve a grade of 100% on each quiz was taken as a proxy measure of their mastery over that topic area. Each quiz contributed up to 1.5% toward the overall course grade (13.5% in total). These low-stakes quizzes were designed to structure and incentivize distributed practice and practice testing (Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013; Roediger III et al., 2011) while providing students with feedback concerning their grasp of the theories and concepts within each topic area, prior to each of the three course exams.
Practice quizzes
One week prior to each exam, students in both groups completed an in-class practice quiz. The quiz consisted of two short-essay questions that assessed their comprehension and application of key concepts from the preceding three topics (e.g., “name, briefly describe, and provide a real-world example of each of the five schedules of reinforcement”; “describe two strengths and two weaknesses of each of the following types of research designs: case study, naturalistic observation, and laboratory experiment”). The practice questions were specifically chosen by the instructor to provide students with authentic examples of potential exam questions, once again allowing them to experience the benefits of practice testing (Dunlosky et al., 2013). The students were given 20 min to answer both questions. These practice quizzes, each of which contributed 5% toward the overall course grade, were the target of the peer assessment activity described below.
Design and Procedure
At the start of the administration of each practice quiz, the students were instructed not to write their name or student number on their answer sheet. After completing the quiz, the students wrote a four- to six-digit numeric code of their choice at the top of their answer sheet. As they returned their answer sheet to the instructor, they recorded their personal numeric code beside their name on a class list.
Peer assessment group
Immediately following the practice quiz, students in the experimental group went on to participate in a double-blind peer assessment activity, designed to help them take the perspective of the instructor and to consider how their answers on the course exams would be evaluated. The activity took place across five steps:
First peer assessment
The instructor shuffled the students’ answer sheets from the practice quiz before redistributing them randomly among the students. The students were asked to inform the course instructor whether they received their own quiz (in these rare cases, a substitute was provided). A grading rubric for both questions was then projected onto the screen, detailing the correct answers to the two questions, along with instructions concerning when the students could award full or part marks. The students were encouraged to ask questions of the instructor if any aspect of the rubric was unclear to them. The students then proceeded to grade the answers of their peer, listing the assigned grades near the top of each written answer. It is important to note that the students were blind to the identity of the randomly assigned peer whose answers they were grading.
Second peer assessment
Once they had completed the first peer assessment, the students exchanged answer sheets with another student seated beside them and proceeded to grade the answers of this second peer. The students were asked to arrive at an independent judgment and to list their assigned grades beside the grades just assigned by their partner. Once again, the students were blind to the identity of the peer whose answers they were grading.
Discussion
Each pair of peer graders was asked to discuss the grades they had assigned. When the two sets of grades differed, the students were asked to explain their rationale for their assigned grade to their partner. Students were informed that they should feel no pressure to reach a consensus, but that they should feel free to revise any of the grades they assigned following this discussion with their partner. The students were also encouraged to ask questions of the instructor during the entire peer grading activity.
Review of feedback
At the end of the peer grading and discussion procedure, the instructor collected all of the answer sheets, each of which now had two sets of assigned grades printed near the top of each written answer. Using the numeric codes listed on the class list, the instructor then returned the peer-assessed answer sheets to the students who wrote those answers, providing them with immediate feedback in the form of grades assigned by two of their peers. This step enabled students to engage in a form of self-appraisal (Meyer, 1991), in which they considered the grades assigned to them by their two peer graders in light of their knowledge of the grading rubric, which they themselves had just applied twice. It is important to note that the students were blind to the identities of the peers who graded their answers.
Discussion with instructor and determination of the final grade
The students were informed that disagreement between the two peer graders on any of the sets of assigned grades would automatically trigger an instructor evaluation and binding grade. However, in cases where the two peer graders had achieved consensus, if the students agreed with the assigned grades, these would stand as their final grade on the practice quiz. This design element is based on the previous research that has found no significant differences between the judgments of expert and peer markers (Pare & Joordens, 2008). However, if the students disagreed with any of the assigned grades, the students were invited to discuss their specific concerns with the instructor who then provided his feedback and assigned a binding grade.
The entire procedure took about 40 min and was repeated before the second and third course exams.
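The bookkeeping behind Steps 1 and 5 can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the study's materials: the function names `redistribute` and `final_grade` are hypothetical, the numeric codes are assumed to be unique, and an own-sheet draw is handled here by reshuffling, whereas in the classroom the instructor simply handed out a substitute sheet. The neighbor exchange in Step 2 and the discussion in Step 3 are not modeled.

```python
import random
from typing import Dict, List, Optional


def redistribute(codes: List[str], rng: random.Random) -> Dict[str, str]:
    """Step 1: map each grader's code to a different student's answer sheet."""
    if len(codes) < 2:
        raise ValueError("need at least two answer sheets")
    if len(set(codes)) != len(codes):
        # Assumption: every student chose a distinct numeric code.
        raise ValueError("numeric codes must be unique")
    while True:
        shuffled = codes[:]
        rng.shuffle(shuffled)
        # Reshuffle whenever someone draws their own sheet.
        if all(grader != sheet for grader, sheet in zip(codes, shuffled)):
            return dict(zip(codes, shuffled))


def final_grade(peer_1: float, peer_2: float, author_agrees: bool,
                instructor_grade: Optional[float] = None) -> float:
    """Step 5: a peer consensus stands only if the author accepts it."""
    if peer_1 == peer_2 and author_agrees:
        return peer_1
    # Peer disagreement or an author objection goes to the instructor,
    # whose grade is binding.
    if instructor_grade is None:
        raise ValueError("instructor evaluation required")
    return instructor_grade


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the sketch is repeatable
    assignment = redistribute(["1234", "5678", "9012", "3456"], rng)
    print(assignment)
    print(final_grade(8, 8, author_agrees=True))
    print(final_grade(8, 6, author_agrees=True, instructor_grade=7))
```

Here `peer_1` and `peer_2` stand for the grades as they read after the Step 3 discussion, since graders were free to revise them before the sheets were collected.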
Control group
Students in the control group completed the same five-step activity as students in the peer assessment group, but did so 1 week after each of their three course exams. The students had been previously informed of this procedure and did not question its timing, perhaps assuming that they would be better prepared to evaluate their peers’ answers after taking the exams.
In order to keep both groups on the same timeline for the course, students in the experimental group watched a video in class during the week following each of their exams. These videos were made available online to students in the control group.
Results
In order to assess the overall impact of participation in a peer assessment activity on exam performance, a 2 (peer assessment condition) × 3 (course exam) mixed design multivariate analysis of covariance was conducted with overall rate of attendance and online mastery quiz attempts included as covariates. In support of the hypothesis, the experimental group scored significantly higher than the control group on the course exams, F(1, 43) = 13.28, p = .001,
Table 2. Exam Performance, Online Mastery Quiz Attempts, and Attendance Across Both Groups.
In order to investigate whether this effect held for each of the three course exams as well as to isolate its origins, separate hierarchical multiple regression analyses were conducted, with the disaggregated rates of attendance and online mastery quiz attempts entered as predictors on the first step and peer assessment condition entered on the second step (see Tables 3–5).
Table 3. Summary of Hierarchical Regression Analysis for Exam 1 Performance.
Note. n = 66.
*p < .05. **p < .01. ***p < .001.
Table 4. Summary of Hierarchical Regression Analysis for Exam 2 Performance.
Note. n = 58.
*p < .05. **p < .01. ***p < .001.
Table 5. Summary of Hierarchical Regression Analysis for Exam 3 Performance.
Note. n = 56.
*p < .05. **p < .01. ***p < .001.
Whereas a positive influence of rate of attendance and online mastery quiz attempts appeared only on the second and third course exams, peer assessment condition predicted a significant additional proportion of variance in exam performance in all three cases (in the range of a small to medium effect, Cohen’s f² = .06 to .10), providing additional support for the hypothesis.
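For readers less familiar with the effect size reported here, Cohen’s f² for a predictor added on the second step of a hierarchical regression is conventionally the gain in R² divided by the variance left unexplained by the full model, with .02, .15, and .35 as Cohen’s benchmarks for small, medium, and large effects. The sketch below shows only that arithmetic. The function name `cohens_f2` is hypothetical, and the R² values are made up for illustration rather than taken from Tables 3–5.

```python
def cohens_f2(r2_step1: float, r2_step2: float) -> float:
    """Local effect size of the predictor(s) added on step 2.

    r2_step1: R-squared with covariates only (attendance, quiz attempts).
    r2_step2: R-squared after adding peer assessment condition.
    """
    if not 0.0 <= r2_step1 <= r2_step2 < 1.0:
        raise ValueError("expected 0 <= r2_step1 <= r2_step2 < 1")
    return (r2_step2 - r2_step1) / (1.0 - r2_step2)


if __name__ == "__main__":
    # Hypothetical R-squared values, chosen only to illustrate the formula;
    # they are NOT the values from the study's regression tables.
    f2 = cohens_f2(r2_step1=0.20, r2_step2=0.25)
    print(round(f2, 3))  # 0.05 / 0.75, i.e. about 0.067
```

With these made-up inputs the result falls between the small and medium benchmarks, which is the same region as the .06 to .10 range reported above.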
Discussion
The present study provides evidence of a beneficial impact of participation in a peer assessment activity on students’ performances on subsequent course exams, an effect that holds even after accounting for online mastery quiz performance and rate of attendance. Unlike the earlier study by Fantuzzo, Dimeff, and Fox (1989), both the control and experimental groups in the present study participated in the practice quiz prior to each of the three exams, allowing testing to be ruled out as an alternative explanation for the improved performances. Interestingly, the students’ participation in the first peer assessment activity had a larger impact on their subsequent exam performance than the later two instances (β1 = .50, β2 = .25, and β3 = .28), suggesting diminishing returns across multiple peer assessments. Although the finding warrants replication, this interpretation is in line with a recent study by Ludemann and McMakin (2014) who found that a single experience with peer assessment was sufficient for building first-year psychology students’ expectations and confidence.
A second contribution of the present study is the template provided by the five-step double-blind peer assessment activity, the design of which carries several benefits. First, it is simple and free to use. This contrasts with the sophisticated but technically complex proprietary online platforms for peer assessment (e.g., peerScholar, Calibrated Peer Review, Turnitin PeerMark) that require either students or the institution to purchase a subscription. Second, implementing this peer assessment activity in a face-to-face setting gives students the opportunity to ask the instructor questions of any kind or magnitude at any stage of the process while increasing the likelihood that the task will be performed with the necessary diligence. Third and perhaps most importantly, the students are kept blind to the identities of those whose answers they are grading as well as those who graded their own work, a design feature that addresses concerns about reciprocal grading and several other grading biases (e.g., friendship marking; Dochy, Segers, & Sluijsmans, 1999). Fourth, the review of the peer feedback by the students arguably facilitates a more objective self-appraisal of their own work. Fifth, the activity incorporates up to four sets of judgments in each case (two peer assessments, one self-assessment, and potentially one instructor assessment), which improves the reliability of the evaluation while minimizing the workload of individual peer assessors and the instructor. The ability to replace traditional instructor grades with peer grades will be especially beneficial in larger courses. Finally, similar to the method adopted by Hodgson, Chan, and Liu (2014), the instructor’s input is offered if the peer feedback is perceived as inaccurate or insufficient, a practice that has been found to increase students’ perceptions of fairness in peer assessment (Kaufman & Schunn, 2011).
It should also be noted that although the outlined peer assessment activity was implemented in conjunction with a practice quiz in the present study, the procedure is flexible enough to be implemented in a wide variety of contexts (e.g., writing assignments, experiment proposals, case studies).
This study also has several limitations. Foremost among these is that the control group did not spend an equivalent amount of time reviewing course material (in the form of the peer assessment activity) and did not receive feedback on their practice quiz performance prior to each course exam. A follow-up study that includes a more stringent control group (e.g., substituting a 40-min review session for the peer assessment activity) would help further validate the efficacy of the intervention.
Second, when completing the second peer assessment, the students were not blind to the grade assigned by the first peer grader, creating some potential for conformity to influence the grading. A simple improvement on this procedure that would eliminate this possibility would be to have the two peer graders list their grades on two separate sheets of paper prior to their discussion.
Third, given the nature of the study’s design, arguably the clearest test of the hypothesis occurs with the first exam, as the performance of students in the control group on the second and third course exams may have been affected by their earlier participation in the peer assessment activity following the first and second exams. This lagged effect might help explain why, despite the significant impact of peer assessment condition on all three exams, the gap in exam performance across the two conditions (in absolute terms) shrank across the semester (8.14% for Exam 1, 5.67% for Exam 2, and 4.41% for Exam 3).
Fourth, for many instructors, the use of 40 min of class time for the peer assessment activity may come at the cost of content coverage. Future research might explore the possibility of conducting this procedure online using the university’s learning management system (e.g., Moodle, Blackboard) in order to investigate whether similar benefits accrue despite the expected losses in instructor accessibility, supervision, and structured time management. Moving the procedure online would carry the additional advantage of eliminating the possibility (<6% in the present study due to the small class sizes) of a student receiving their own quiz during the random redistribution and subsequent exchange.
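The article does not say how the <6% figure was obtained. As a rough plausibility check only, the sketch below assumes that each sheet is equally likely to land with any student in the first pass, and that in the second pass the neighbor’s sheet is equally likely to be any of the remaining ones. Under those assumptions the chance of a student ever holding their own quiz works out to 2/n. The function name `own_quiz_probability` is hypothetical.

```python
from fractions import Fraction


def own_quiz_probability(n_students: int) -> Fraction:
    """Chance that a given student holds their own quiz in either pass.

    Assumes a uniformly random redistribution (pass 1) followed by an
    exchange with a neighbor whose sheet is uniformly random among the
    other n - 1 sheets (pass 2). Both are simplifying assumptions.
    """
    if n_students < 2:
        raise ValueError("need at least two students")
    first_pass = Fraction(1, n_students)
    # Only reachable if the student did not draw their own sheet first.
    second_pass = (1 - first_pass) * Fraction(1, n_students - 1)
    return first_pass + second_pass


if __name__ == "__main__":
    for n in (35, 36):  # the two section sizes reported under Participants
        p = own_quiz_probability(n)
        print(n, p, f"{float(p):.1%}")
```

For the two section sizes in this study, the result is about 5.7% (n = 35) and 5.6% (n = 36), which is consistent with the stated bound but should not be read as the author’s own calculation.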
Finally, although the present study measured the positive effect of participating in a peer assessment activity on subsequent academic performance, an important question that remains concerns the exact pathway of this effect. For example, it is possible that serving as a peer grader and applying a grading rubric supplied by the instructor lead students to study or prepare differently and, as a result, answer written questions differently on subsequent assessments. A follow-up study could test this mediational model.
There is little doubt that peer assessment is a valuable pedagogical tool that brings a range of benefits to the learning process. The present study suggests that these benefits extend beyond the specific peer-assessed assignment.
Appendix
Acknowledgment
I am grateful to Aaron Richmond and the other mentors of the Society for the Teaching of Psychology SoTL Writing Workshop for their invaluable advice and support during my writing of this article. I also thank the anonymous reviewers and editor for their thoughtful and excellent feedback.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
