Abstract
Improving student performance on exams is a key issue that many psychology instructors face in their classrooms. One potentially easy-to-deploy option for improving student performance is an exam wrapper. In this article, I detail two studies that compared exam wrappers to a control condition (a previous semester in Study 1 and a within-course control condition in Study 2). Both studies found notable improvements in student exam performance above what is typically seen in the course. This suggests that the exam wrapper is an easy-to-employ tool for your students to use to improve their test preparation and performance.
One of the most vexing problems in teaching college courses is how to improve students’ preparation for exams in a class. Factors that contribute to students’ poor performance on exams include students not understanding textbook materials (e.g., Levin & Mayer, 1993; Schnotz & Wagner, 2018), issues with regard to teaching the materials and the generation of quality notes (e.g., Heijne-Penninga et al., 2015; Van Meter et al., 1994), and poor processing of information by the students themselves, which may result from causes ranging from background deficiencies (e.g., Smiderle & Weigel Green, 2011; Symons & Pressley, 1993) to test anxiety (e.g., Nelson et al., 2015; Tobias, 1985) and motivational factors (e.g., Bahri & Corebima, 2015; Clause et al., 2001).
Perhaps one of the most important issues that students encounter is poor preparation for the examinations. Some students have faulty assumptions about how their peers are studying and this has been shown to affect their own performance negatively (Buzinski et al., 2018). Another factor that may influence exam preparation is how faculty teach students to prepare for exams. For instance, research (Hunter & Lloyd, 2018) has shown that few faculty provide students with active techniques to improve exam preparation. Other research (Foss, 2013) has addressed how general test-taking strategies can impact exam performance more broadly, such as eliminating clearly erroneous answers if you don’t know the correct answer on a multiple-choice question.
One technique that has been suggested to help students in the middle of a course is called an exam wrapper (Ambrose et al., 2010). The exam wrapper is a conceptually simple tool to improve student performance on exams. Following an exam in the course, the instructor passes out an exam wrapper while discussing the results. The exam wrapper will typically (a) ask the students how much time they spent preparing for the examination and how they distributed their exam preparation, (b) ask the students to reflect on where they made mistakes, and (c) ask the students to think of concrete steps that they plan to take to prepare for the next exam. Finally, the exam wrappers are collected and should be returned to the students prior to the next exam for further review.
Exam wrappers have been used successfully in a number of different disciplines including computer science (e.g., Craig et al., 2016), engineering (e.g., Chew et al., 2016), food science and nutrition (Gezer-Templeton et al., 2017), and physics (Lovett, 2013). Despite these successes, not all studies have found success with exam wrappers (e.g., Stephenson et al., 2017).
To date, it appears that only one study has explored the use of exam wrappers in a psychology course scientifically. Soicher and Gurung (2017) explored metacognitive skills and exam wrappers in community college courses. In this study, the authors found that when exam wrappers were completed on the day tests were returned and handed back to students the next class period, no benefit was seen.
Given the large number of successes with exam wrappers reported in the literature (Soicher & Gurung, 2017; Stephenson et al., 2017, notwithstanding), I sought to explore whether exam wrappers would be useful in an undergraduate psychology course. Based on the extant literature, I expected that the exam wrapper would improve student performance.
Study 1
In this study, I tested whether an exam wrapper would improve test performance in a section of undergraduate social psychology. I compared the previous semester’s exam scores on Tests 1 and 2 with the exam scores in the experimental course on identical exams. I hypothesized that students in the experimental section would show an increase in test performance on Exam 2, beyond that which is typically seen between exams. I obtained institutional review board approval prior to data collection.
Method
Participants
In the control condition, I used the exam score data from a spring section of undergraduate social psychology. This section had 32 students. In the experimental condition, I used the exam score data from a fall section of undergraduate social psychology. This section had 52 students. Of this sample, 36 students completed the exam wrapper exercise (this represents the students who were in class on the day the first exam was returned or who had sought the exam from the professor outside of class).
Materials
Exam Wrapper
The exam wrapper consists of a series of five questions that are designed to engage the students in a metacognitive process assessing their performance on the exam that had just been handed back. See Appendix for the full list of questions.
Student Reactions
To understand whether the students found the exam wrapper to be helpful, a 7-item survey about the exam wrapper was handed out to the students at the end of the semester. See Table 1 for the questions and the average responses across Studies 1 and 2.
Table 1. Student Reactions to the Exam Wrapper.
Note. Students responded on the following scale: 1 = strongly disagree, 2 = disagree, 3 = somewhat disagree, 4 = neither agree nor disagree, 5 = somewhat agree, 6 = agree, and 7 = strongly agree.
Exams
Exams in the undergraduate social psychology course consisted of 25 multiple-choice questions and 5 short essay-style questions. The topics on the exams are not cumulative in focus (although key terms like reliability and validity are seen in all exams).
Procedure
In the control condition, I handed exams back as normal and went over the scores in class. As usual, I collected the exams after going over the correct answers and why they were correct. In the experimental condition, after handing the first exam back to students and going over the correct answers and why they were correct, I handed out the exam wrapper in class. I then gave the students ample time to complete the exam wrapper and hand it in along with the exam. One week before the second exam, I handed the exam wrappers back to the students and told them that I did so to remind them of what they had thought about the first exam after getting it back. At the end of the semester, I handed out the student reaction survey in class to assess their perceptions of the exam wrapper.
Results
The chief analysis of interest in this study was to test how exam performance between Exams 1 and 2 differed across the experimental and control conditions. On Test 1, the groups did not significantly differ (Mcontrol = 78.62, SDcontrol = 13.53; Mexperimental = 77.44, SDexperimental = 15.03). Exam scores increased on Test 2 in both sections, although the increase in the control condition (Mcontrol = 81.59, SDcontrol = 14.65) was smaller than the increase in the experimental condition (Mexperimental = 86.81, SDexperimental = 12.47), Finteraction (1, 82) = 6.15, p = .015.
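One way to read the interaction test above: in a design with two groups and two exams, the group-by-exam interaction F equals the squared independent-samples t statistic computed on each student’s gain score (Exam 2 minus Exam 1), with error degrees of freedom of n1 + n2 - 2 (here, 32 + 52 - 2 = 82). The Python sketch below illustrates that equivalence. The function name and the gain scores are hypothetical placeholders, not the study data, and the article does not state which software or procedure produced the reported F.

```python
from statistics import mean, variance


def gain_score_interaction(control_gains, experimental_gains):
    """Return (F, df_error) for the group-by-exam interaction in a 2 x 2 design.

    F is the squared pooled-variance t statistic comparing the two groups'
    gain scores (Exam 2 minus Exam 1).
    """
    n_c, n_e = len(control_gains), len(experimental_gains)
    df_error = n_c + n_e - 2
    # Pooled variance of the gain scores across both groups.
    pooled_var = ((n_c - 1) * variance(control_gains)
                  + (n_e - 1) * variance(experimental_gains)) / df_error
    std_error = (pooled_var * (1 / n_c + 1 / n_e)) ** 0.5
    t = (mean(experimental_gains) - mean(control_gains)) / std_error
    return t ** 2, df_error


# Hypothetical gain scores for illustration only (NOT the study data).
control = [2, 5, -1, 4, 3]
experimental = [9, 12, 7, 10, 8]
f_value, df_error = gain_score_interaction(control, experimental)
print(f"F(1, {df_error}) = {f_value:.2f}")
```

Framing the interaction as a comparison of gain scores makes the hypothesis concrete: the question is whether the average Exam 1 to Exam 2 improvement differs between conditions.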
Another analysis of interest was whether studying time reported by students on the exam wrapper was predictive of scores on the first exam. As is commonly seen in the literature, amount of time spent studying was positively associated with exam scores (β = .37, p = .026).
Finally, I wanted to see if students’ perceptions of the exam wrapper were in any way influenced by their relative improvement in scores from Exam 1 to Exam 2. To test this, I first sought to establish whether the student reaction items constituted a single scale. Once Item 7 was reverse coded, the scale had a very high reliability (α = .90) and consisted of a single factor when explored through an exploratory factor analysis (a single eigenvalue = 4.55, along with multiple eigenvalues below 1). As such, I collapsed all 7 items into a single scale. I then regressed the difference in scores onto student reactions; as students’ scores improved from Exam 1 to Exam 2, student perceptions of the usefulness of the exam wrapper increased (β = .49, p = .009).
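The scale construction described above (reverse coding Item 7, checking reliability, and collapsing the items) can be sketched in Python as follows. This is a minimal illustration using the standard formula for Cronbach’s alpha; the function names and the response matrix are hypothetical and are not the study data, and the article does not report which software was used.

```python
from statistics import variance


def reverse_code(score, low=1, high=7):
    """Flip a response on a low..high agreement scale (1 <-> 7, 2 <-> 6, ...)."""
    return low + high - score


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    items = list(zip(*rows))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses for illustration only (NOT the study data):
# 4 students x 7 items on the 1-7 agreement scale.
raw = [
    [6, 6, 5, 6, 7, 6, 2],
    [4, 5, 4, 4, 5, 4, 4],
    [7, 7, 6, 7, 7, 7, 1],
    [3, 3, 2, 3, 4, 3, 5],
]
# Item 7 (index 6) is reverse coded before computing reliability.
coded = [row[:6] + [reverse_code(row[6])] for row in raw]
alpha = cronbach_alpha(coded)
# Collapse the 7 items into one composite score per student.
scale_scores = [sum(row) / len(row) for row in coded]
```

Reverse coding has to happen before the reliability check; otherwise the negatively worded item would correlate negatively with the rest and depress alpha.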
Brief Discussion
As hypothesized, students who completed an exam wrapper showed a greater increase in exam scores from Exam 1 to Exam 2 relative to a control condition. I also demonstrated that self-reported studying time was associated with exam scores, such that students who spent more time studying scored higher on the exams. I also found that the student reactions to the exam wrappers were associated with exam scores. As such, this provides preliminary evidence that an exam wrapper can be used to help students.
However, at least five other potential explanations could be offered to explain the increase in students’ scores. First, although I would never consciously differentially evaluate students in different courses, I could have graded the short essay questions “easier” during the semester I was testing the exam wrapper. Another possible explanation would be that I was a more effective professor during the experimental semester. A third possible explanation would be that different types of students signed up in fall as opposed to spring. Fourth, conscientious students may have shown an increase in response to any study aid offered by an instructor (and as such, any aid offered would have increased the class-wide average). Finally, the simple act of telling the students that I was testing a new teaching technique could have inspired them to a higher level of performance. Given these potential alternative explanations, I conducted a second study to rule them out.
Study 2
In this study, I tested the effectiveness of an exam wrapper in a single section of a social psychology course. The use of the exam wrapper again represents the experimental condition, whereas students in the control condition received a handout detailing studying techniques. In this way, student performance could be compared more directly, eliminating the potential confounds present in Study 1.
Method
Participants
Participants were all drawn from a single undergraduate social psychology course. Thirty-one students were randomly assigned to the experimental condition (3 of whom were lost to attrition during the semester) and 32 students were randomly assigned to the control condition (2 were lost to attrition during the semester).
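A simple way to produce a random split like the one described above is to shuffle the roster and divide it. The Python sketch below is an assumption about how such an assignment could be done, not the procedure the study actually used; the function name, the placeholder student IDs, and the seed are all hypothetical.

```python
import random


def assign_conditions(student_ids, seed=None):
    """Shuffle the roster and split it into two conditions of near-equal size."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# 63 placeholder IDs (31 + 32 students, matching the group sizes above).
roster = [f"S{n:02d}" for n in range(1, 64)]
groups = assign_conditions(roster, seed=2018)
print(len(groups["experimental"]), len(groups["control"]))
```

With an odd roster size, integer division places the extra student in the control group, which yields the 31 and 32 split reported here.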
Procedure
At the start of the semester, students were randomly assigned to one of the conditions. During the second week of the course, all students completed a critical thinking battery (Stupple et al., 2011). After returning Exam 1 (approximately one third of the way through the course), the experimental group of participants received the exam wrapper, while the students in the control group received the handout on test-taking strategies (developed by Brigham Young University; Academic Success Center, 2017). As before, students who received the exam wrapper were asked to reflect on their performance on the exam; students who received the test-taking strategies handout were asked to read the handout and mark a response indicating that they had actually read it. Students returned the materials to me and I held the materials until 1 week before the next exam. Additionally, I brought the exams (and associated materials) to class on subsequent days and gave them to the stragglers who had missed the previous day. One week before the second exam (which took place approximately two thirds of the way through the class), I handed the study materials (the exam wrapper or the test-taking strategies handout) back to the students. As part of returning Exam 2 to students, I again provided an exam wrapper to the class. However, given the results found in Study 1, I wanted to ensure that all students had the opportunity to benefit from the exam wrapper; as such, all students received the exam wrapper. During the last week of the course, students were asked to react to the exam wrapper (as in Study 1). Finally, I noted scores on the third exam in the course. Importantly, I was blind to student identity while scoring the exams. This was accomplished by only having the student’s name on the first page of the examination (and the only items on the first page were objective multiple-choice questions).
I graded each exam page across every student sequentially (meaning I marked every Page 1 across all students and then I marked Page 2 across all students). In doing so, I had no idea whose exams I was grading when I was marking the more subjective short answer items.
Results
The chief analysis of interest in this study was to test how exam performance between Exams 1 and 2 differed across the exam wrapper (experimental) and test-taking strategies (control) conditions. On Test 1, the groups did not significantly differ (Mcontrol = 73.55, SDcontrol = 11.04; Mexperimental = 73.56, SDexperimental = 12.50). Exam scores increased on Test 2 in both conditions, although the increase in the control condition (Mcontrol = 82.96, SDcontrol = 8.60) was smaller than the increase in the experimental condition (Mexperimental = 90.22, SDexperimental = 9.20). After the exam wrapper was used in both conditions (following Exam 2), the difference between the groups disappeared on Exam 3 (Mcontrol = 91.48, SDcontrol = 10.72; Mexperimental = 90.78, SDexperimental = 11.75), Finteraction (2, 108) = 4.97, p = .009.
Again, I wanted to see if students’ perceptions of the exam wrapper were in any way influenced by their relative improvement in scores from Exam 1 to the exam where they benefited from the exam wrapper (either Exam 2 or Exam 3 depending on condition). To test this, I again sought to establish whether the student reaction items constituted a single scale. Once Item 7 was reverse coded, the scale had a very high reliability (α = .93) and consisted of a single factor when explored through an exploratory factor analysis (a single eigenvalue = 4.95, along with multiple eigenvalues below 1). As such, I again collapsed all 7 items into a single scale. I then regressed the difference in scores onto student reactions (for experimental condition, the difference between Exams 1 and 2; for control condition, the difference between Exams 1 and 3); as students’ scores improved across the exams, student perceptions of the usefulness of the exam wrapper increased (β = .41, p = .012).
Finally, I wanted to see whether students’ attitudes toward critical thinking were tied to their outcomes in the course (using the Stupple et al., 2011, measure). Of the three subscales, only Value of Critical Thinking was associated with changes in the exam scores due to the exam wrapper (β = .39, p = .039). This suggests that the students who value critical thinking more showed the greatest increases when they received the exam wrapper intervention.
Brief Discussion
As demonstrated in Study 1, students who completed an exam wrapper showed an increase in exam scores as a result of the exam wrapper. Control condition participants showed this increase from Exam 1 to Exam 3, whereas experimental condition participants showed this increase from Exam 1 to Exam 2 (and maintained a similarly high level of performance in Exam 3). Given that the instructor was blind to a student’s condition when grading exams and all of the students completed the same exams on the same lecture material, the only plausible explanation for this interaction is the influence of the exam wrapper. I also noted that the student reactions to the exam wrappers were associated with increased exam scores. Finally, I found that the value students place on critical thinking moderated the benefits they received from the exam wrapper, such that students who valued critical thinking more showed the greatest benefits from the exam wrapper.
General Discussion
Across two studies, I explored the effectiveness of the exam wrapper in undergraduate social psychology courses. In Study 1, I found that the exam wrapper improved student performance on the second exam in the course relative to the same exam given in a previous semester (resulting in a gain of 6.4 points using the exam wrapper). In Study 2, I tested both an exam wrapper and a test-taking strategies handout after Exam 1 and found that the exam wrapper provided a greater benefit than the test-taking strategies handout. In using the exam wrapper for both groups after Exam 2, the test-taking strategies group showed an increase in performance equal to (and even slightly exceeding) the previously received exam wrapper group (resulting in an average gain of 7.25 points using the exam wrapper). This suggests that the self-reflection (either the original self-reflection or the review of the self-reflection) associated with the exam wrapper is a critical component of its effectiveness relative to the test-taking strategies handout. I also found that the value that students place on critical thinking moderated the effectiveness of the exam wrapper such that students who valued critical thinking more showed the greatest benefits from the exam wrapper.
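The 6.4- and 7.25-point figures can be recovered from the reported means as the experimental group’s Exam 1 to Exam 2 gain minus the control group’s gain. The short Python sketch below shows that arithmetic using the means from the two Results sections; the variable and function names are illustrative only.

```python
# Reported exam means from the Results sections: (Exam 1, Exam 2).
reported_means = {
    "Study 1": {"control": (78.62, 81.59), "experimental": (77.44, 86.81)},
    "Study 2": {"control": (73.55, 82.96), "experimental": (73.56, 90.22)},
}


def unique_gain(means):
    """Experimental gain minus control gain, rounded to two decimals."""
    control_gain = means["control"][1] - means["control"][0]
    experimental_gain = means["experimental"][1] - means["experimental"][0]
    return round(experimental_gain - control_gain, 2)


gains = {study: unique_gain(m) for study, m in reported_means.items()}
for study, gain in gains.items():
    print(f"{study}: {gain:.2f} points")
```

In Study 1, for example, the control section gained 2.97 points and the experimental section gained 9.37 points, a difference of 6.40 points.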
The results found in this study are fairly different from the results found by Soicher and Gurung (2017). This raises the question of what may have caused the discrepancy. There are a number of small differences between the studies reported in this article and the Soicher and Gurung study. First and, in my opinion, most important is the difference in timing of the return of the exam wrappers to the students. In the Soicher and Gurung (2017) study, the exam wrappers were returned to the students the next class day after they were completed (Soicher, personal communication, November 5, 2018). In my implementation of the exam wrapper, I returned the exam wrappers to the students 1 week before the next exam. I chose to implement the exam wrappers this way because I believe that students benefit not only from the metacognitive activity immediately after the exam (reflecting on what they did wrong) but also from having their memory of that reflection jogged while they are studying for the next exam. As such, it is possible that this subtle difference is responsible for the discrepant findings (indeed, this difference could potentially explain the various differences seen in the literature more broadly).
Another possibility was the use of the different techniques (control vs. exam wrapper) in different sections of a course. This was a concern that I had in Study 1, in which my expectations about the exam wrapper’s effectiveness may have unconsciously influenced scores or in which I may have had students of differing quality in different sections in different semesters. Indeed, there are anecdotes that different types of students register for different days and times. Study 2 in this article rules out these alternative explanations.
Limitations and Future Directions
This article provides evidence from two studies on the effectiveness of exam wrappers in an undergraduate social psychology course. However, there are a number of limitations that should be noted. For instance, this article is limited to a single faculty member in a single course (undergraduate social psychology) at a single university. It is possible that there are limitations to the types of faculty, the types of students, or the types of courses in which exam wrappers would have benefits.
Certainly, there is evidence that exam wrappers can be effectively used in a number of different disciplines (ranging from physics to food science and nutrition). However, less evidence exists with regard to the types of classes that will show these benefits. It is quite possible, for instance, that lower level courses (100- and 200-level) would be the prime courses in which to employ exam wrappers. Future research should be done where the same faculty in the same departments employ exam wrappers across different levels of courses to see whether the effectiveness of the exam wrappers changes with regard to the course in which it is used.
It is also possible that certain types of faculty would show the greatest benefits in their student performance. Faculty vary in terms of pedagogical styles, levels of experience, and any number of other variables. At this time, there isn’t evidence yet to suggest which type of faculty would show the greatest benefits in their classes, but the possibility remains. As such, future research should employ a more comprehensive approach to testing exam wrappers across multiple types of faculty members. This would demonstrate how much variability in outcomes is explained by the different sorts of faculty members employing the technique.
Exam wrappers may also have greater impacts at particular types of institutions. For instance, many of the studies that have reported gains in exam performance as a result of the exam wrapper have been conducted at private institutions. As such, it is possible (however unlikely) that some institutional-level variable exists that explains these findings (these could include the selectiveness of the institutions, the Carnegie classification of the institution, the public vs. private nature of the institution, or any number of other variables). Future research should test this in a comprehensive fashion, comparing different types of institutions and exploring any relevant institutional-level differences.
Also, it remains quite possible (likely even) that the specific way that the exam wrapper is deployed may influence the success it shows. Based on the evidence found in this study, contrasted with the technique used in the Soicher and Gurung (2017) study, research should explore whether the timing of the return of the exam wrapper matters. I would argue that returning the exam wrappers to the students during the time frame in which they are studying for the next exam would be the most successful way to employ the exam wrapper. Of course, this could be empirically tested as part of a comprehensive examination of exam wrapper administration. As such, future studies should specifically test the timing of the exam wrapper’s return as a moderator. This could even take three levels: (1) the suggested implementation used in this study of returning the exam wrapper 1 week before the next exam, (2) returning the exam wrapper the next day after completion (as employed by Soicher & Gurung, 2017), and (3) not returning the exam wrapper to the students at all.
Finally, research should be done to explore the specific mediating factors that led to the success of the exam wrapper. For instance, does the exam wrapper lead to increases in studying time? Or, does the exam wrapper change the studying techniques employed by the students? These sorts of questions may require direct observations of the students’ studying habits (both before and after the implementation of the exam wrapper), but this sort of study would also provide much more insight into student studying and its relationship to exam performance more generally.
Conclusions
The two studies reported in this article provide strong evidence that the use of an exam wrapper in an undergraduate social psychology course improved student performance on later exams in the course. It is easy to employ the exam wrapper technique (costing less than 10 min of instruction time) and the exam wrapper showed notable performance boosts in student exams (resulting in unique gains of 6.4 and 7.25 points across Studies 1 and 2). Given the extant data and the success found in this study, I would recommend returning the exam wrappers to the students approximately a week before the subsequent exam. The success of the exam wrapper has led me to incorporate this technique more broadly in my own teaching.
Appendix
Name: _______________________
Acknowledgments
The author wishes to thank Dr. Jessica L. Hartnett for helpful comments on an earlier draft and Cheyenne Stefano for assistance in entering data.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
