Interteaching and the Testing Effect

Abstract

In a number of studies, interteaching has produced better student-learning outcomes than traditional teaching methods. Little research, however, has examined ways to make interteaching more effective. Research on the testing effect suggests that frequent testing may improve performance. Thus, including postdiscussion quizzes as a part of interteaching might enhance its efficacy. In this study, college students completed an interteaching session in a simulated classroom setting. Some students completed a postdiscussion quiz, whereas others completed anagrams. All students returned 1 week later to take another quiz over the material they had discussed the previous week. Students who completed postdiscussion quizzes had significantly lower quiz scores than students who completed anagrams. Thus, postdiscussion quizzes may not enhance the efficacy of interteaching.

Keywords

interteaching, testing effect

Interteaching is a multicomponent pedagogical method that has its roots in B. F. Skinner’s operant psychology (Boyce & Hineline, 2002). A typical interteaching session proceeds as follows (for more detail, see Boyce & Hineline, 2002; Saville, Lambert, & Robertson, 2011). The instructor prepares and distributes a preparation (prep) guide that contains questions designed to guide students through a preclass reading assignment. Once in class, students form pairs and discuss their answers to the prep-guide items. During the discussion, the instructor moves among groups and answers questions. At the end of the discussion, students complete a record sheet on which they list items that were difficult to understand. The instructor then prepares a brief clarifying lecture that begins the next class period and addresses the items that students found most difficult. Following the lecture, students form pairs and discuss the next prep guide.

Since Boyce and Hineline’s (2002) introduction of interteaching, a number of researchers have compared interteaching to more traditional teaching methods. In a series of studies, Saville and colleagues found that interteaching increased the test scores of both undergraduate psychology students and graduate-level special education students (Saville, Zinn, & Elliott, 2005; Saville, Zinn, Neef, Van Norman, & Ferreri, 2006) and improved the critical thinking skills of undergraduate psychology students (Saville, Zinn, Lawrence, Barron, & Andre, 2008; see also Scoboria & Pascual-Leone, 2009). Researchers have also found that students in undergraduate psychology courses, undergraduate nutrition science courses, and graduate-level special education courses preferred interteaching to lecture (Goto & Schneider, 2009; Saville et al., 2006). Together, these results provide promising evidence that interteaching might be an effective and enjoyable alternative to more traditional teaching methods such as lecture. Nevertheless, there are likely ways to make interteaching even more effective (see Saville et al., 2011). One potential way to improve interteaching is to capitalize on the testing effect.

The testing effect refers to enhanced remembering that occurs when respondents take a brief test over learned material. In one study on the testing effect, Roediger and Karpicke (2006b) had undergraduates learn material from two prose passages and then either restudy the material or take a recall test. Roediger and Karpicke measured students’ retention of the material after 5 min, 2 days, and 1 week. After 5 min, those who restudied the passages remembered more. At 2 days and 1 week, however, those who took the test remembered more.

Chan, McDermott, and Roediger (2006) also examined whether testing improved retention of both tested and nontested, but related, items. College students first read an article and then either took a brief test or restudied the material. The next day, the students took a quiz on which there were both tested items (i.e., those that had appeared on the first test) and nontested, but related, items (i.e., items that appeared in the article but not on the first test). Students in the testing condition answered more questions correctly than students in the study condition and also performed better on both tested and nontested, but related, items.

Researchers have since replicated previous findings on the testing effect both in simulated classrooms (e.g., Butler & Roediger, 2007) and in actual college courses (e.g., McDaniel, Anderson, Derbish, & Morrisette, 2007). Given that frequent testing seems to enhance student-learning outcomes, one potential way of capitalizing on the testing effect with interteaching would be to administer brief quizzes after the discussions. Research on the testing effect suggests that including a brief quiz after the discussions should improve students’ performance on a subsequent quiz.

Method

Participants

Participants included 117 undergraduate students (84 women and 33 men) from James Madison University, whose average age was 19 years. Our sample consisted of 82 freshmen, 21 sophomores, 9 juniors, and 4 seniors (one participant did not report her year in school). The students received partial course credit for undergraduate psychology courses by participating. They also had the opportunity to win a $25 gift card if they completed both parts of the study.

Materials and Procedure

Our general procedure closely followed the procedure reported by Saville et al. (2005). Groups of 11 or 12 students reported to a classroom. After signing a consent form, students spent 55 min participating in a simulated interteaching session. During the first 15 min of the session, students read a brief article by Allen (2003) and completed a 10-item prep guide that contained questions about the article (e.g., Approximately how many pet dogs and pet cats are there in the United States? How do researchers typically assess whether pets have a positive effect on cardiovascular health?). Although students were free to write down their answers to the prep-guide items, we did not require them to do so, nor did we check or collect their answers. Next, students formed pairs and spent the next 15 min discussing their prep-guide answers. During this time, the first author moved among the pairs and answered any questions that students had about the prep guides. Once students finished their discussions, they spent 5 min completing a record sheet on which they listed the items they wanted reviewed during the lecture; on the bottom of the record sheet, they also provided general demographic information. After collecting the record sheets, the first author randomly distributed a brief quiz (Quiz 1) or a sheet containing anagrams. For the next 5 min, students in the quiz condition (n = 58) answered eight short-answer questions over the article (e.g., What are the health benefits of pet ownership?); students in the no-quiz condition (n = 59) completed anagrams consisting of the names of famous actors. During this time, the first author reviewed the record sheets and identified which questions to discuss during the lecture. After collecting the quizzes and anagram sheets, the first author then spent 15 min lecturing over any items the students had requested. She then dismissed the students. The students returned 1 week later to complete a 16-item, multiple-choice quiz (Quiz 2). Eight of the questions on Quiz 2 were similar to the short-answer questions from Quiz 1 but presented in multiple-choice format (the tested items); the remaining eight questions came from the prep-guide material but did not appear on Quiz 1 (the nontested items). To preclude students from restudying material during the intervening week, we did not inform them that they would be taking another quiz during the second session.

Interobserver Agreement

The first author and an undergraduate research assistant independently scored 31% of the short-answer quizzes (Quiz 1). They scored each quiz on separate sheets of paper to ensure independent grading. The first author then calculated agreement scores by dividing the total number of agreements by the total number of items (agreements and disagreements) and multiplying by 100%. The mean agreement score was 93% (range = 75–100%), which is within the range of acceptable agreement scores (Cooper, Heron, & Heward, 2007). Because the correct answers on Quiz 2 were unambiguous (i.e., because they were in multiple-choice format), we did not collect any agreement data.
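The agreement calculation described above can be illustrated with a minimal Python sketch. The function name and the item-level marks are hypothetical and serve only to show the arithmetic (agreements divided by agreements plus disagreements, multiplied by 100); they are not data from the study.

```python
def percent_agreement(scores_a, scores_b):
    """Item-by-item agreement between two scorers, as a percentage."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("Both scorers must mark the same nonempty set of items.")
    # An agreement is any item that both scorers marked the same way.
    agreements = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * agreements / len(scores_a)


# Hypothetical marks for one eight-item quiz (1 = correct, 0 = incorrect).
first_scorer = [1, 1, 0, 1, 0, 1, 1, 0]
second_scorer = [1, 1, 0, 1, 1, 1, 1, 0]

# The scorers disagree on one of eight items: 7 / 8 * 100 = 87.5
print(percent_agreement(first_scorer, second_scorer))
```

With eight items per quiz, a single disagreement yields 87.5% and two disagreements yield 75%, the lower bound of the range reported above.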

Results and Discussion

Of the original 117 participants who completed the first phase of the study, eight did not return the following week. Our analyses thus include data from the remaining 109 participants who attended both sessions (quiz condition = 54, no-quiz condition = 55).

We first examined how students in the quiz condition performed on Quiz 1. On average, students answered correctly 64% of the short-answer questions. Although this number may at first seem relatively low, it is important to remember that students took the quiz before hearing a lecture over the prep-guide items they found confusing. As Boyce and Hineline (2002) noted, the purpose of the interteaching lectures is to provide clarification on prep-guide items that are difficult for students to understand (for a study examining the positive impact of interteaching lectures on student performance, see Saville, Cox, O’Brien, & Vanderveldt, 2011). Had students in the present study taken the postdiscussion quiz after rather than before the lecture, they likely would have performed better on Quiz 1 (and possibly on Quiz 2 as well; see below). Future researchers may thus wish to examine how the placement of postdiscussion quizzes in the interteaching format potentially affects student performance.

Next, we examined how students performed on Quiz 2, which was the primary focus of this study. We assessed the Quiz 2 scores using a 2 (quiz vs. no quiz) × 2 (tested vs. nontested items) mixed factorial analysis of variance. There was no interaction of condition and question type, F(1, 107) = 0.92, p = .34. There was, however, a main effect of quiz condition, F(1, 107) = 5.22, p = .02, η_p² = .05, and a main effect of question type, F(1, 107) = 54.39, p < .001, η_p² = .34. Specifically, students in the no-quiz condition (M = 67%) answered a greater percentage of questions correctly than students in the quiz condition (M = 61%). Students in both conditions also answered correctly more tested items (M = 72%) than nontested items (M = 56%).
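As a cross-check on the effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom with the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies that identity to the two main effects reported in this paragraph. The function name is illustrative only, and the calculation is a reader's check rather than part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    # SS_effect / (SS_effect + SS_error), rewritten in terms of F and the dfs.
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported for the two main effects.
main_effects = {
    "quiz condition": (5.22, 1, 107),
    "question type": (54.39, 1, 107),
}

for label, (f_value, df_effect, df_error) in main_effects.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.3f}")
```

Under this identity, the main effect of quiz condition corresponds to a partial eta squared of about .047 and the main effect of question type to about .337, so the quiz-condition effect is small relative to the question-type effect.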

In sum, students who took a quiz performed significantly worse on a follow-up quiz than students who completed anagrams. We also found that students in both groups performed significantly better on tested items than on nontested items. Our findings do not support previous research on the testing effect, which showed that taking a quiz over learned material enhanced subsequent quiz performance relative to additional studying (Butler & Roediger, 2007; Chan et al., 2006; McDaniel et al., 2007; Roediger & Karpicke, 2006b). There are at least two possible reasons why adding postdiscussion quizzes to interteaching might decrease performance. First, students in the quiz condition may have performed worse because of the negative suggestion effect. This effect occurs when respondents initially answer questions incorrectly but do not receive any feedback prior to a subsequent quiz. As a result, they are more likely to answer the same questions incorrectly on the second quiz (Roediger & Karpicke, 2006a). Students in our quiz group may have thus answered questions incorrectly on Quiz 2 because they answered them incorrectly on Quiz 1. Students in the no-quiz group, however, did not have this opportunity because they did not take postdiscussion quizzes.

Although this explanation may be plausible under certain conditions, it seems less likely in the present study, given that students in the quiz condition heard clarifying lectures after taking Quiz 1 and before taking Quiz 2. In fact, when we analyzed the responses on Quiz 1 and Quiz 2, we found few instances when students in the quiz condition answered incorrectly a short-answer question on Quiz 1 and its related multiple-choice question on Quiz 2 (not shown). Thus, the clarifying lectures seemed to correct many of the misunderstandings that students initially had when they answered questions incorrectly on Quiz 1 (see Saville et al., 2011).

The second—and we believe better—explanation concerns the potentially reinforcing effect of the interteaching lectures. According to Boyce and Hineline (2002), the lectures should function as reinforcers because students specify which material they would like the teacher to discuss (i.e., which material will be reinforcing). Because the lectures are targeted and thus more likely to maintain students’ interest, they should have a positive effect on learning. Moreover, any aspect of the lectures that makes them more reinforcing is likely to have a positive effect on learning, and any aspect of the lectures that makes them less reinforcing is likely to have a negative effect on learning.

One factor that affects the reinforcing nature of a stimulus is the amount that a person has already “consumed” (e.g., Laraway, Snycerski, Michael, & Poling, 2003; Michael, 1982). In much the same way that food becomes less reinforcing when one has consumed a large meal, exposure to large amounts of information in a classroom setting may have the same effect: The information may become, at least for a period, less interesting to students. In this study, we exposed students in the quiz condition to the material for 55 min straight (via the discussion, the quiz, and the lecture). It may have been, then, that these students became “satiated” with the material and that the lecture no longer functioned as a reinforcer (or, at least, was less reinforcing). In contrast, students in the no-quiz condition had a 5-min break from the material while they completed anagrams. Although relatively short, this break may have been enough for the lecture to retain its reinforcing function and, thus, impact students’ learning. In support of this idea, Benjamin (2002), McKeachie and Svinicki (2006), and others have noted that students’ attention spans in lecture courses tend to be very limited (typically around 10 min). With short breaks, however, lectures can maintain student interest.

One way to test this notion would be to replicate the present study in a regular interteaching-based classroom, where the lectures typically occur one or more days following the pair discussions (e.g., Saville et al., 2006, Study 2). If satiation was responsible for the negative effect we observed in the present study, separating the discussions from the lectures should at least obviate this effect and may even produce positive outcomes.

Finally, we also found that students in both conditions answered correctly more tested items than nontested items. Because students in the no-quiz condition had not seen these items prior to Quiz 2, this finding suggests that the tested items on Quiz 2 may have been unintentionally easier to answer than the nontested items. This also indicates that any positive effect of testing on student performance may not carry over to nontested items, at least in the context of interteaching. Further research is thus necessary to determine the specific conditions under which testing may carry over to nontested, but related, items.

There are at least two limitations to the present study. First, we did not check to see whether participants had sufficient time to complete the prep-guide items before beginning the pair discussions. If some students were unable to finish the prep guides, they may have relied on their partners to provide them with answers, which might have affected their performance on the quiz. This is not considerably different from what sometimes occurs during real interteaching sessions: Students who are initially unable to find the answers to some of the prep-guide items often use the discussions as a way to identify correct answers (see Saville et al., 2011). Nevertheless, if students in the present study did not have enough time to complete the prep guides, it may have impacted their quiz performance. Although our use of random assignment likely precluded differential prep-guide completion from being a confound, insufficient time to complete the prep guides may have resulted in overall lower quiz scores.

Second, although conducting this study in a simulated classroom setting might have improved its internal validity, such an environment is admittedly artificial. For example, because students’ performance on the quizzes did not affect their actual course grades, they may have lacked the motivation to do well. Similarly, because students were unaware that they would be taking the follow-up quiz, they presumably did not restudy the prep-guide material during the intervening week, another occurrence that seems unlikely (or at least less likely) in a real classroom setting. And even though we believe that students’ overall performance was still quite good—even with little motivation and no additional studying, they still answered correctly about two thirds of the questions on Quiz 2—replicating this study in a real classroom setting would provide additional information regarding the external validity of our findings. Roediger, McDaniel, and colleagues took a similar approach (i.e., focusing on internal validity and then external validity) when conducting their research on the testing effect (e.g., Butler & Roediger, 2007; McDaniel et al., 2007), as did Saville and colleagues (2005, 2006) when conducting their previous research on interteaching.

Ultimately, interteaching already contains components that seem to enhance student performance (see Saville et al., 2006, 2011). For instance, the use of prep guides may help focus students’ attention while they are initially learning course material. Interteaching also attempts to capitalize on active learning through the use of pair discussions. Finally, interteaching incorporates both immediate (from peers during the discussions) and delayed (from the teacher during the lectures) feedback, both of which seem to have a positive effect on student performance (e.g., Agarwal, Karpicke, Kang, Roediger, & McDermott, 2008; Leeming, 2002). In short, adding postdiscussion quizzes may do little to enhance the efficacy of interteaching because it already promotes relatively high levels of student learning.

Footnotes

Acknowledgments

The authors thank Ariana Harner for her help with data collection and Tracy Zinn and Jessica Irons for their comments on a previous draft of this manuscript.

Tonya Lambert is now a doctoral student in the Department of Psychology at Syracuse University. This article was based on a thesis submitted by the first author to the Department of Graduate Psychology at James Madison University in partial fulfillment for the Master of Arts degree.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

The authors received no financial support for the research, authorship, and/or publication of this article.

References

Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger, H. L., III, & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22, 861–876.

Allen (2003). Are pets a healthy pleasure? The influence of pets on blood pressure. Current Directions in Psychological Science, 12, 236–239.

Benjamin, L. T., Jr. (2002). Lecturing. In S. F. Davis & Buskist (Eds.), The teaching of psychology: Essays in honor of Wilbert J. McKeachie and Charles L. Brewer (pp. 57–67). Mahwah, NJ: Lawrence Erlbaum.

Boyce, T. E., & Hineline, P. N. (2002). Interteaching: A strategy for enhancing the user-friendliness of behavioral arrangements in the college classroom. The Behavior Analyst, 25, 215–226.

Butler, A. C., & Roediger, H. L., III. (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19, 514–527.

Chan, J. C. K., McDermott, K. B., & Roediger, H. L., III. (2006). Retrieval induced facilitation: Initially nontested material can benefit from prior testing of related material. Journal of Experimental Psychology: General, 135, 553–571.

Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Upper Saddle River, NJ: Pearson.

Goto & Schneider (2009). Interteaching: An innovative approach to facilitate university student learning in the field of nutrition. Journal of Nutrition Education and Behavior, 41, 303–304.

Laraway, Snycerski, Michael, & Poling (2003). Motivating operations and terms to describe them: Some further refinements. Journal of Applied Behavior Analysis, 36, 407–414.

Leeming, F. C. (2002). The exam-a-day procedure improves performance in psychology classes. Teaching of Psychology, 29, 210–212.

McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494–513.

McKeachie, W. J., & Svinicki (2006). McKeachie’s teaching tips: Strategies, research, and theory for college and university teachers (12th ed.). Belmont, CA: Wadsworth.

Michael (1982). Distinguishing between discriminative and motivational functions of stimuli. Journal of the Experimental Analysis of Behavior, 37, 149–155.

Roediger, H. L., III, & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.

Roediger, H. L., III, & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.

Saville, B. K., Cox, O’Brien, & Vanderveldt (2011). Interteaching: The impact of lectures on student performance. Journal of Applied Behavior Analysis, 44, 937–941.

Saville, B. K., Lambert, & Robertson (2011). Interteaching: Bringing behavioral education into the 21st century. The Psychological Record, 61, 153–165.

Saville, B. K., Zinn, T. E., & Elliott, M. P. (2005). Interteaching versus traditional methods of instruction: A preliminary analysis. Teaching of Psychology, 32, 161–163.

Saville, B. K., Zinn, T. E., Lawrence, N. K., Barron, K. E., & Andre (2008). Teaching critical thinking in statistics and research methods. In D. S. Dunn, J. S. Halonen, & R. A. Smith (Eds.), Teaching critical thinking in psychology: A handbook of best practices (pp. 149–160). London, England: Wiley-Blackwell.

Saville, B. K., Zinn, T. E., Neef, N. A., Van Norman, R. K., & Ferreri, S. J. (2006). A comparison of interteaching and lecture in the college classroom. Journal of Applied Behavior Analysis, 39, 49–61.

Scoboria & Pascual-Leone (2009). An ‘Interteaching’ informed approach to instructing large undergraduate classes. Journal of the Scholarship of Teaching and Learning, 9, 29–37.