Test-Enhanced Learning

Abstract

Test-enhanced learning is a technique instructors can use to increase recall on summative assessments (e.g., exams) by means of formative assessments (e.g., quizzes). The present research drew on the transfer-appropriate processing and level-of-processing (LOP) perspectives to address the question of whether deeper processing on quizzes (i.e., application rather than factual questions) benefits exam performance. Students were more likely to answer application questions on the exam correctly when quizzes required a deeper LOP, and students appeared to gain a roughly equivalent definitional understanding from either factual or application quiz questions. Consequently, the present research supports the use of quiz questions that require a deeper LOP, especially when students are expected to learn beyond rote memorization.

Keywords

test-enhanced learning, testing effect, transfer-appropriate processing, level of processing

Test-enhanced learning is a technique instructors use to increase recall on summative assessments (e.g., exams) by means of formative assessments (e.g., quizzes). The present research examines the question: Does deeper processing on quizzes benefit exam performance? Based on the transfer-appropriate processing (TAP) and level-of-processing (LOP) perspectives, we suggest that manipulating the type of questions on a quiz may strengthen the effect of test-enhanced learning on later retention. TAP indicates that the depth of processing required on the quiz should match the depth required by the exam questions in order to increase exam scores (Koedinger, Corbett, & Perfetti, 2012; Lockhart, 2002). For example, if the exam emphasizes vocabulary and definitions, the quiz should be oriented toward rote memorization. Thus, factual quiz questions should improve performance on factual exam questions, and application quiz questions should improve performance on application exam questions. In contrast, LOP indicates that the deeper the processing required on the quiz, the better retrieval will be on the subsequent exam, regardless of the type of questions on the exam (Lockhart, 2002). From this perspective, application quiz questions should improve performance on both factual and application exam questions. We contrast these two perspectives in a classroom setting and provide recommendations for instructors on how to structure quizzes to benefit test performance.

Test-enhanced learning is supported by many studies (Arnold & McDermott, 2013; Pyc, Agarwal, & Roediger, 2014; Roediger & Karpicke, 2006) showing that formative assessments improve performance on summative assessments by approximately 4–7%. For example, McDaniel, Agarwal, Huelser, McDermott, and Roediger (2011) examined the effect in middle school students by presenting quizzes throughout learning units. They found that students performed better on units that had been quizzed; similar outcomes have been found in postsecondary courses (e.g., McDaniel, Roediger, & McDermott, 2007).

TAP indicates that completing encoding and retrieval under similar conditions should benefit memory (Morris, Bransford, & Franks, 1977). From this perspective, flash cards and comparable techniques should improve performance on an exam that centers on vocabulary and definitions, because the conditions of encoding and retrieval are similar (Blaxton, 1989). Encoding and retrieval can be matched on several dimensions (e.g., medium, mental state, and question type). In the present study, we matched question type by using the same two kinds of questions (factual and application) on the quizzes and the exam.

LOP indicates that deeper processing results in better retrieval regardless of the circumstances surrounding the final recall (Craik & Tulving, 1975). For example, Rose, Myerson, Roediger, and Hale (2010) examined the impact of LOP by showing participants a word written in a color and having them choose a word from a new list based on whether the new word was printed in the same color, rhymed with the original word, or had a similar meaning. Participants were later asked to list the original words in order. The semantic condition, which requires a deeper LOP than the other two conditions, improved long-term memory (Rose et al., 2010). According to this perspective, even if an exam contains only definitional questions, students who studied the concepts at an application level should still perform better on it because of their deeper understanding (Benjamin & Bjork, 2000).

The present research evaluates the potential contributions of the TAP and LOP perspectives to test-enhanced learning. Both viewpoints indicate that quiz question type may play a role in the testing effect: TAP supports using comparable items on quizzes and exams, whereas LOP supports using quiz items that require deeper processing. With this in mind, we randomly assigned students to one of two quiz conditions (factual or application) to examine how quiz question type affected subsequent exam performance. Factual quizzes used definition questions that required less depth of processing than the application quizzes, which required a deeper LOP because students had to connect the terms to real-world situations. Exploring the differences between these conditions may improve our understanding of the TAP and LOP perspectives and may provide practical information about how to use quizzes and other formative assessments more effectively.

Method

Participants

Participants were 40 of the 52 students enrolled in a social psychology course. Twelve students were excluded from the analyses because they registered after the first quiz or missed a quiz or the exam. The participants were primarily White (80%), and most were women (72.5%). Students were freshmen (12.5%), sophomores (30.0%), juniors (35.0%), or seniors (17.5%); the remaining students (5%) did not report their class standing.

Materials and Procedure

The present study concentrated on how quiz questions can influence performance on a later exam. Students completed three chapter quizzes and then an exam covering the same three chapters. Students were randomly assigned to complete three 10-item multiple-choice quizzes that used either factual or application questions. We selected 10 vocabulary terms from each chapter and wrote one factual item and one application item for each target term. Question order was randomized but held constant across conditions, so the target terms were quizzed in the same order in the factual and application groups. Likewise, the four answer choices were randomized for each question but appeared in the same order in both conditions. Example items are presented in Table 1. Quizzes were completed online using Blackboard Version 9.1. After each quiz, students received a grade from the instructor along with information about which items they had missed and the correct answers. Quizzes were not discussed in detail in class, although general information was provided (e.g., reminders to complete the quizzes).
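
The counterbalancing described above (one random term order and one random answer order, shared by both versions of each quiz) can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' procedure or a Blackboard feature: the function names, the `seed` argument, the data layout (a list of terms plus a dict of four answer choices per term), and the simple per-student coin-flip assignment are all assumptions.

```python
import random


def build_parallel_quizzes(terms, choices, seed=None):
    """Return factual and application versions of one chapter quiz.

    Hypothetical sketch: `terms` lists the chapter's target terms (10 in the
    study) and `choices` maps each term to its four answer choices. Both
    versions share one random term order and one random answer order per
    term, so only the question type differs between conditions.
    """
    rng = random.Random(seed)
    term_order = rng.sample(terms, k=len(terms))  # shuffle once, reuse for both versions
    answer_order = {t: rng.sample(choices[t], k=len(choices[t])) for t in term_order}
    return {
        condition: [
            {"term": t, "type": condition, "choices": answer_order[t]}
            for t in term_order
        ]
        for condition in ("factual", "application")
    }


def assign_conditions(student_ids, seed=None):
    """Hypothetical simple random assignment of each student to one quiz condition."""
    rng = random.Random(seed)
    return {s: rng.choice(("factual", "application")) for s in student_ids}
```

Because the term order and answer order are drawn once and then reused, any difference between the two conditions can be attributed to question type rather than to item position.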

Table 1.

Example Factual and Application Items.

Target Term	Question Type	Example Item
Internal locus of control	Factual	Individuals who perceive outcomes as controllable by their own efforts have a(n)…
Internal locus of control	Application	Ryan is constantly pushing himself to improve and do better because he believes that through these actions he can have some influence over his future. Ryan seems to have a(n)…
Random sample	Factual	A representative group that is drawn from a larger population is called a…
Random sample	Application	Lara obtains a list of every student currently enrolled at her university and selects 200 students to represent the larger student body. Lara appears to be gathering a…

All students later completed an exam containing both factual and application questions. The exam consisted of 45 multiple-choice questions and two short-answer questions; the short-answer questions were not included in the analyses. Each of the 30 quizzed target terms was randomly assigned to appear as either a factual or an application item on the exam, and a new item was written for each term. The analyses focused on these 30 items: 15 factual and 15 application. The remaining 15 multiple-choice questions (7 factual and 8 application) covered terms that had not been quizzed; including these items in an overall test score did not change the findings.
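
The exam design and the two dependent variables (percentage correct on the factual target items and on the application target items) can be sketched as follows. This is a hypothetical illustration: the function names are invented, an equal 15/15 random split of the 30 quizzed terms is assumed to reflect the assignment described above, and the item counts in the final lines are taken directly from the text.

```python
import random


def assign_exam_formats(quizzed_terms, seed=None):
    """Randomly split the quizzed terms into equal factual and application halves.

    Hypothetical sketch: the study used 30 quizzed terms, giving 15 factual
    and 15 application exam items.
    """
    rng = random.Random(seed)
    shuffled = rng.sample(quizzed_terms, k=len(quizzed_terms))
    half = len(shuffled) // 2
    return {"factual": shuffled[:half], "application": shuffled[half:]}


def subscale_percentages(correct, formats):
    """Percentage correct on each exam subscale (the two dependent variables).

    `correct` maps each target term to True/False for one student's exam item.
    """
    return {
        fmt: 100 * sum(correct[t] for t in terms) / len(terms)
        for fmt, terms in formats.items()
    }


# Exam composition reported in the text.
TARGET_ITEMS = 15 + 15   # factual + application items on quizzed terms
NEW_TERM_ITEMS = 7 + 8   # factual + application items on unquizzed terms
assert TARGET_ITEMS + NEW_TERM_ITEMS == 45  # the 45 multiple-choice questions
```

With 15 items per subscale, each item is worth 100/15 ≈ 6.67 percentage points, which is why subscale means such as 84.44% are not round numbers.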

Results

Univariate normality was assessed within each group. The two dependent variables violated the normality assumption: the critical ratios for skewness and kurtosis exceeded 2, and Shapiro–Wilk tests were significant in three of the four cases (ps < .05). Levene’s test did not indicate a violation of the homogeneity of variance assumption (ps > .3). Consequently, we calculated bias-corrected and accelerated (BCa) bootstrap confidence intervals (CIs) for the difference between group means and for the effect size, using the boot.ci function of the boot package (Version 1.3-19) with 10,000 bootstrap samples in R Version 3.4.0 (Canty & Ripley, 2017; R Core Team, 2017).
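
For readers working outside R, the BCa idea can be illustrated with a minimal standard-library Python sketch for a difference between two independent group means. It is not a port of boot.ci, and its numbers will differ slightly from boot's because of resampling noise and implementation details (for example, how bootstrap percentiles are interpolated). The function names and the example scores are hypothetical; the scores are made up for illustration and are not the study's data. The bias correction (z0) accounts for a bootstrap distribution that is off-center relative to the estimate, and the acceleration (estimated here by a leave-one-out jackknife within each group) accounts for skewness.

```python
import random
from statistics import NormalDist, fmean


def mean_diff(a, b):
    """Statistic of interest: mean of group b minus mean of group a."""
    return fmean(b) - fmean(a)


def bca_interval(a, b, n_boot=10_000, conf=0.95, seed=None):
    """BCa bootstrap CI for mean(b) - mean(a) with two independent groups.

    Returns (estimate, lower, upper). Minimal sketch; a and b must be lists.
    """
    rng = random.Random(seed)
    normal = NormalDist()
    estimate = mean_diff(a, b)

    # 1. Bootstrap distribution: resample each group separately, with replacement.
    boot = sorted(
        mean_diff(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )

    # 2. Bias correction z0: where the estimate falls among the bootstrap values.
    below = sum(t < estimate for t in boot)
    ties = sum(t == estimate for t in boot)
    share = (below + 0.5 * ties) / n_boot
    share = min(max(share, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = normal.inv_cdf(share)

    # 3. Acceleration: leave-one-out jackknife within each group.
    num = den = 0.0
    for which, group in enumerate((a, b)):
        n = len(group)
        jack = [
            mean_diff(group[:i] + group[i + 1:], b) if which == 0
            else mean_diff(a, group[:i] + group[i + 1:])
            for i in range(n)
        ]
        jack_mean = fmean(jack)
        u = [(n - 1) * (jack_mean - t) for t in jack]
        num += sum(x ** 3 for x in u) / n ** 3
        den += sum(x ** 2 for x in u) / n ** 2
    accel = num / (6 * den ** 1.5) if den > 0 else 0.0

    # 4. Shift the nominal percentiles using z0 and accel, then read them off.
    def adjusted(p):
        z = z0 + normal.inv_cdf(p)
        return normal.cdf(z0 + z / (1 - accel * z))

    def value_at(p):
        return boot[min(n_boot - 1, max(0, round(p * (n_boot - 1))))]

    tail = (1 - conf) / 2
    return estimate, value_at(adjusted(tail)), value_at(adjusted(1 - tail))


# Hypothetical percentage-correct scores on a 15-item subscale (NOT the study's data).
factual_group = [73.3, 80.0, 80.0, 86.7, 86.7, 80.0, 73.3, 86.7]
application_group = [93.3, 86.7, 93.3, 100.0, 86.7, 93.3, 100.0, 93.3]
est, lo, hi = bca_interval(factual_group, application_group, seed=2017)
print(f"difference = {est:.2f}, 95% BCa CI [{lo:.2f}, {hi:.2f}]")
```

The same function could be adapted to the effect size by swapping `mean_diff` for a function that returns the point-biserial correlation.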

We used two independent-samples t tests to assess the effect of quiz condition (factual, n = 24; application, n = 16) on exam performance, measured as percentage correct on the factual exam items and percentage correct on the application exam items. On the factual exam items, students in the application condition (M = 90.00%, SD = 12.41) scored 5.6 percentage points higher, 95% CI [−2.67, 12.49], than students in the factual condition (M = 84.44%, SD = 11.41), although this difference was not significant, t(38) = −1.46, p = .15, two-tailed, r_pb = .23, 95% CI [.01, .50]. On the application exam items, students in the application condition (M = 90.83%, SD = 9.70) scored 11.4 percentage points higher, 95% CI [4.33, 18.04], than students in the factual condition (M = 79.44%, SD = 12.42), t(38) = −3.09, p = .004, two-tailed, r_pb = .45, 95% CI [.15, .63].
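
The reported means, standard deviations, and group sizes are enough to recompute the pooled-variance t statistics and the point-biserial correlations implied by them, using the standard conversion r_pb = √(t² / (t² + df)). The short check below is a sketch with hypothetical function names; rounded to two decimals, it reproduces t(38) = −1.46 with r_pb = .23 for the factual items and t(38) = −3.09 with r_pb = .45 for the application items. Two-tailed p values would additionally require the t distribution's survival function (e.g., scipy.stats.t.sf), which the Python standard library does not provide.

```python
from math import sqrt


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t (equal variances), group 1 minus group 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def r_from_t(t, df):
    """Point-biserial correlation implied by a t statistic: sqrt(t^2 / (t^2 + df))."""
    return sqrt(t ** 2 / (t ** 2 + df))


# Summary statistics from the Results. The factual condition (n = 24) is listed
# first, so t is negative when the application condition (n = 16) scores higher.
t_fact, df = pooled_t(84.44, 11.41, 24, 90.00, 12.41, 16)  # factual exam items
t_app, _ = pooled_t(79.44, 12.42, 24, 90.83, 9.70, 16)     # application exam items

for label, t in (("factual items", t_fact), ("application items", t_app)):
    print(f"{label}: t({df}) = {t:.2f}, r_pb = {r_from_t(t, df):.2f}")
```

The sign of t reflects only the order in which the groups are entered; r_pb is reported as a magnitude.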

Discussion

Test-enhanced learning proposes that giving students a formative assessment (e.g., a quiz) will improve recall on a summative assessment (e.g., an exam), but can this effect be strengthened by considering the TAP or LOP perspectives? Both TAP and LOP emphasize the importance of the type of question asked on a quiz. TAP indicates that using the same type of questions on the quiz and on the exam will increase exam scores. In contrast, LOP indicates that the deeper the LOP required by a quiz, the higher exam scores will be. We tested these viewpoints by giving one group of students quizzes that focused on factual questions (a shallower LOP) and the other group quizzes that focused on application questions (a deeper LOP). We then compared the two groups' performance on a subsequent exam containing both factual and application questions. Students who took the application quizzes scored about 11 percentage points higher on the application exam questions, which is consistent with both perspectives. Students who completed application quizzes also scored about 5.6 percentage points higher on the factual exam questions; although this difference was not significant, the effect size (r_pb = .23) fell between Cohen's (1988) benchmarks for small and medium effects. Consequently, the impact of quiz question type on performance on factual exam items warrants further evaluation, and the overall results support the LOP perspective and the use of deeper questions in formative assessments.
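
The "small to medium" description can be made explicit by comparing the point-biserial correlations with Cohen's (1988) conventional benchmarks for r (.10 small, .30 medium, .50 large). The helper below is a hypothetical illustration; the benchmarks are rules of thumb rather than sharp boundaries.

```python
# Cohen's (1988) conventional benchmarks for a correlation-type effect size r.
COHEN_R_BENCHMARKS = ((0.10, "small"), (0.30, "medium"), (0.50, "large"))


def describe_r(r):
    """Place |r| relative to Cohen's small/medium/large benchmarks."""
    r = abs(r)
    reached = None
    for cutoff, label in COHEN_R_BENCHMARKS:
        if r < cutoff:
            return f"between {reached} and {label}" if reached else f"below {label}"
        reached = label
    return f"at or above {reached}"


print(describe_r(0.23))  # effect on factual exam items
print(describe_r(0.45))  # effect on application exam items
```

Under these benchmarks, the effect on factual exam items falls between small and medium, and the effect on application exam items falls between medium and large.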

Our research focused on a single classroom at one university; these findings should be replicated in different types of classes at multiple institutions (Wilson-Doenges, Troisi, & Bartsch, 2016). The level of a course may play a role in determining which type of quiz question is most beneficial. For example, introductory courses that cover new material and require less application may benefit more from factual quizzes because those courses emphasize new terms and their definitions. Additionally, recent research (e.g., Arnold & McDermott, 2013) indicates that quizzes may enhance later performance directly, via test-enhanced learning, and indirectly, via test-potentiated learning. Future research should measure how students study between the quiz and the exam to determine whether indirect factors, such as time spent studying and approach to studying, vary with the type of quiz questions used. If students approach studying differently after the two kinds of quizzes, part of the quiz effect may be indirect and associated with these changes in behavior.

In our research, students appeared to better understand material at the application level when formative assessments (i.e., quizzes) required a deeper LOP (i.e., used application questions). Students also performed slightly, although not significantly, better on factual exam items when quizzes used application questions. Consequently, our research supports the use of quiz questions that require a deeper LOP, especially when students are expected to learn beyond rote memorization.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported, in part, by a grant from the Davis Educational Foundation (Yarmouth, ME), awarded to the University of New Hampshire, Victor A. Benassi (PI).

References

Arnold, K. M., & McDermott, K. B. (2013). Test-potentiated learning: Distinguishing between direct and indirect effects of tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 940–945. doi:10.1037/a0029199

Benjamin, A. S., & Bjork, R. A. (2000). On the relationship between recognition speed and accuracy for words rehearsed via rote versus elaborative rehearsal. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 638–648. doi:10.1037/0278-7393.26.3.638

Blaxton, T. A. (1989). Investigating dissociations among memory measures: Support for a transfer-appropriate processing framework. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 657–668. doi:10.1037/0278-7393.15.4.657

Canty, A., & Ripley, B. (2017). boot: Bootstrap R (S-Plus) functions (Version 1.3-19) [Computer software]. Retrieved from https://CRAN.R-project.org/package=boot

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268–294. doi:10.1037/0096-3445.104.3.268

Koedinger, K. R., Corbett, A. T., & Perfetti, C. (2012). The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning. Cognitive Science, 36, 757–798. doi:10.1111/j.1551-6709.2012.01245.x

Lockhart, R. S. (2002). Levels of processing, transfer-appropriate processing, and the concept of robust encoding. Memory, 10, 397–403. doi:10.1080/09658210244000225

McDaniel, M. A., Agarwal, P. K., Huelser, B. J., McDermott, K. B., & Roediger, H. L., III. (2011). Test-enhanced learning in a middle school science classroom: The effects of quiz frequency and placement. Journal of Educational Psychology, 103, 399–414. doi:10.1037/a0021782

McDaniel, M. A., Roediger, H. L., III, & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14, 200–206. doi:10.3758/BF03194052

Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533. doi:10.1016/S0022-5371(77)80016-9

Pyc, M. A., Agarwal, P. K., & Roediger, H. L., III. (2014). Test-enhanced learning. In V. A. Benassi, C. E. Overson, & C. M. Hakala (Eds.), Applying science of learning in education: Infusing psychological science into the curriculum (pp. 78–90). Retrieved from http://teachpsych.org/ebooks/asle2014/index.php

R Core Team. (2017). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Roediger, H. L., III, & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210. doi:10.1111/j.1745-6916.2006.00012.x

Rose, N. S., Myerson, J., Roediger, H. L., III, & Hale, S. (2010). Similarities and differences between working memory and long-term memory: Evidence from the levels-of-processing span task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 471–483. doi:10.1037/a0018405

Wilson-Doenges, G., Troisi, J. D., & Bartsch, R. A. (2016). Exemplars of the gold standard in SoTL for psychology. Scholarship of Teaching and Learning in Psychology, 2, 1–12. doi:10.1037/stl0000050