Abstract
Over the past 10 years, crises surrounding replication, fraud, and best practices in research methods have dominated discussions in the field of psychology. However, no research exists examining how to communicate these issues to undergraduates and what effect this has on their attitudes toward the field. We developed and validated a 1-hr lecture communicating issues surrounding the replication crisis and current recommendations to increase reproducibility. Pre- and post-lecture surveys suggest that the lecture serves as an excellent pedagogical tool. Following the lecture, students trusted psychological studies slightly less but saw greater similarities between psychology and natural science fields. We discuss challenges for instructors taking the initiative to communicate these issues to undergraduates in an evenhanded way.
During the past 10 years, crises surrounding replication, fraud, and best practices for data collection, analysis, and reporting have dominated discussions in the field of psychology. Special issues on replication, new journal policies on transparency and study design, and new methods for detecting questionable research practices have all emerged in an effort to quantify and/or alleviate concern over the reliability of psychological research (e.g., Asendorpf et al., 2013; Diener & Biswas-Diener, 2016; John, Loewenstein, & Prelec, 2012; Jonas & Cesario, 2015; Klein et al., 2014; Nosek & Bar-Anan, 2012; Open Science Collaboration, 2015; Pashler & Wagenmakers, 2012; Roberts, 2015; Schnall, Johnson, Cheung, & Donnellan, 2014; Simmons, Nelson, & Simonsohn, 2011, 2013; Spellman, 2015; Wagenmakers, 2015). Given the centrality of this debate for the future of psychological science, the attention it has received in some of our field’s most prestigious outlets, and the fact that it is actively being introduced to the public through various media outlets (Achenbach, 2015; Carey, 2015; Yong, 2016), the question of why this material is not being taught to undergraduate students seemed important, puzzling, and problematic.
At national and regional conferences, in our home departments, and over e-mail, we corresponded with undergraduate instructors from both small liberal arts colleges and large research-oriented institutions about the content of their courses. We quickly discovered that instructors had not only struggled to integrate these topics into their lectures but also were unsure where to start in teaching them.
These reactions are not surprising given that the only suggestions to date about conveying this material to undergraduates have been to encourage instructors to involve undergraduates in replication attempts (Frank & Saxe, 2012; Grahe et al., 2012). However, the question of whether students can fully comprehend the nature of the replication crisis must be dealt with prior to addressing the practical question of whether to involve undergraduates in replication attempts. Thus, we set out to fill this void by developing an effective, up-to-date, and balanced treatment of the replication crisis with a full lecture script and prepackaged 1-hr PowerPoint presentation. We then tested its efficacy at both a large research university with very large class sizes and a mid-sized liberal arts college with very small class sizes.
Our approach was primarily to consolidate the recent history of the replication crisis as well as the arguments for increasing reproducibility suggested by experts. This approach allows instructors unfamiliar with the details of this debate to leverage their colleagues’ expertise in a similar way that a textbook might. The lecture attempts a balanced treatment of experts’ opinions as well as a distillation of statistical discussions down to a level that undergraduates can understand.
Prior to formulating the lecture, we engaged in extended conversations with undergraduates who were considering going on to PhD programs in psychology. The vast majority of them had heard of this crisis, although sometimes only filtered by journalists. They were very curious about the debate. However, because of their haphazard exposure to these discussions, their opinions tended to be quite radical, superficial, and even emotional. Some recounted that they were glad that psychology was “cleaning house” and felt that even “most” psychological results were unreliable. Others expressed anger at the originators of this discussion for even bringing up the issue and casting psychology in a negative light. Our discussions with these students led us to empirically test how learning about the replication crisis affected students’ perceptions of the field.
Selection of Lecture Material
Lecture slides, script, and a justification for topic inclusion are available (https://osf.io/mh9pe/). The broad, 1-hr lecture was designed to give students an introduction to the causes and consequences of the replication problem in the field of psychology, the recent attempts to quantify replication and reproducibility, and possible solutions to increasing replication and reproducibility (e.g., Asendorpf et al., 2013; Diener & Biswas-Diener, 2016; John et al., 2012; Jonas & Cesario, 2015; Klein et al., 2014; Nosek & Bar-Anan, 2012; Open Science Collaboration, 2015; Pashler & Wagenmakers, 2012; Roberts, 2015; Schnall et al., 2014; Simmons et al., 2011, 2013; Spellman, 2015; Wagenmakers, 2015).
Method
Participants
Participants were 194 undergraduate students (M age = 20.26, SD = 1.99; 77.8% female) enrolled in either an Introduction to Personality course (at a large Midwestern university) or a Psychological Methods course (at a small Midwestern liberal arts college) in Spring 2016. Students reported their ethnicity as White (72.2%), Asian (12.3%), African American (6.4%), multiracial and other ethnicities (5.9%), and Hispanic/Latino (3.2%). The median year in school was 2 (a college sophomore). A large proportion of students (68%) had taken a statistics class in the past or were currently enrolled in one, whereas only 31% had heard of the replication crisis in psychology prior to the lecture. An additional 60 students did not complete one of the three parts of the study (Time 1 survey, Time 2 survey, and attending class the day of the lecture) or failed an attention check in the survey (i.e., “Please choose the third option on the scale below [i.e., #3]”). Those excluded (for whom we had demographic information) were comparable to the main analytic sample with respect to age, year in school, whether they had taken a statistics course, whether they were familiar with the replication crisis, and number of psychology classes taken. However, these students self-reported having a lower grade point average (d = 0.82) and were more likely to be men (45% vs. 22.8% of the analyzed sample; χ² = 5.14, p = .02).
Materials and Procedure
Participants completed one survey in the week prior to the lecture and one follow-up survey in the week after. At both time points, students completed attitudinal questions about the field of psychology, read a news article for a study that had generated a lot of attention but had difficulty replicating, and answered questions about the study described in the news article. At Time 1, participants were randomly assigned to read a news article describing either the interpersonal effects of physical warmth (Lynott et al., 2014; Williams & Bargh, 2008) or how a single exposure to the American flag shifts political views 8 months later (Carter, Ferguson, & Hassin, 2011; Klein et al., 2014).
At Time 2, participants read the other news article (i.e., the one they had not previously seen) and answered the same questions that were asked at Time 1. Additionally, students answered general questions about the content and the clarity of the lecture.
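The counterbalanced assignment described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the procedure actually used to administer the surveys: the function name `assign_articles`, the article labels, and the seed are hypothetical.

```python
import random

# Illustrative labels for the two news articles (hypothetical names).
ARTICLES = ("physical_warmth", "flag_priming")


def assign_articles(participant_ids, seed=None):
    """Randomly pick each participant's Time 1 article; Time 2 gets the other one."""
    rng = random.Random(seed)
    schedule = {}
    for pid in participant_ids:
        first = rng.choice(ARTICLES)
        # The Time 2 article is always the one not seen at Time 1.
        second = ARTICLES[1] if first == ARTICLES[0] else ARTICLES[0]
        schedule[pid] = {"time1": first, "time2": second}
    return schedule


# Hypothetical participant IDs 1-6; the seed only makes the sketch repeatable.
schedule = assign_articles(range(1, 7), seed=2016)
for pid, articles in schedule.items():
    print(pid, articles["time1"], "->", articles["time2"])
```

Because every participant sees both articles exactly once, any pre- to post-lecture difference in evaluations of a given article is a between-participants comparison.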
Below, we provide an overview of the attitude and comprehension questions, but additional open-ended questions and sample examination items are available on the Open Science Framework site.
Measures
Attitudes about psychology
At both time points, six questions assessed attitudes toward the field of psychology. These items were answered on a scale ranging from 1 (strongly disagree) to 7 (strongly agree). These questions can be found in the first six rows of Table 1.
Table 1. Evaluations of Psychology and Press Psychological Studies.
Note. N = 194. Paired-sample t tests for attitudinal measures pre- and postlecture. M = mean; SD = standard deviation.
*p < .05. **p < .01. ***p < .001.
News articles and study evaluation questions
At both time points, participants read a real news article about a study that had garnered media attention but fostered doubts about replicability (Carter et al., 2011; Klein et al., 2014; Lynott et al., 2014; Williams & Bargh, 2008). News articles were copied verbatim from news sources (Charles, 2008; Hochmuth, 2011). Sample sizes for each study were surreptitiously inserted into the news articles by the study investigators.
Following the news article, five multiple-choice questions about the study were presented to participants: “How believable did you find the results of this study?” “If another group of researchers were to run a similar study to the studies described on the previous page, how likely is it that this additional study would reach similar conclusions?” “How surprising were the results of this study?” “How interested are you in reading the empirical article described in the media account?” and “How would you rate the quality of the research described on the previous page?” Each question was asked on a 7-point scale, and scale anchors varied across questions. Finally, participants were asked an open-ended question, “What were your impressions and thoughts about this study?,” which was later coded by the first and third authors (κ = .95).
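For readers unfamiliar with the agreement statistic reported for the open-ended coding, the sketch below computes Cohen's κ for two coders. The category labels and codes are invented for illustration and are not the study's data; the function name `cohens_kappa` is ours.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of responses.")
    n = len(rater_a)
    # Proportion of responses on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal code frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for eight open-ended responses (not the study's data).
coder_1 = ["small_n", "small_n", "strong", "other", "other", "strong", "small_n", "other"]
coder_2 = ["small_n", "small_n", "strong", "other", "other", "strong", "other", "other"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

Values near 1 indicate that the coders agreed far more often than their marginal code frequencies alone would predict.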
Comprehension questions
At Time 2, 12 multiple-choice questions about the lecture were administered. Seven items asked about participants’ agreement on a 7-point scale ranging from 1 (disagree strongly) to 7 (agree strongly) and can be seen in the upper panel of Table 2. Five additional multiple-choice questions about the importance of various best practices for reproducibility were asked on a scale ranging from 1 (not important at all) to 5 (extremely important) and can be seen in the lower panel of Table 2.
Table 2. Comprehension Questions.
Note. N = 193–194. Agreement questions were answered on a 7-point scale. Importance questions were answered on a 5-point scale. Percentages reflect proportion of respondents indicating agreement (agree slightly, agree, and agree strongly) or importance (very important, extremely important) above the midpoint of each scale. A duplicate question asking “It is important for a researcher to disclose all measures and experimental conditions that were included in a study” was accidentally included in the survey. The descriptives for this item closely match the item reported above (94.3%; M = 6.27, SD = 1.10). M = mean; SD = standard deviation.
Results
Attitudes About Psychology
For the most part, student attitudes toward psychology were relatively stable (see top of Table 1; a Bonferroni correction was applied to the six comparisons made [α = .008]). However, after the lecture, students trusted the results of studies done by psychologists less but considered psychology to be more similar to the “natural” or “hard” sciences.
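As a worked illustration of the correction and test named above, the sketch below derives the Bonferroni-adjusted threshold for six comparisons and computes a paired-sample t statistic. The ratings are invented for illustration and are not the study's data, and the function names are ours; a p value from any statistics package would then be compared against the adjusted threshold.

```python
import math
import statistics


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


def paired_t(pre, post):
    """Paired-sample t statistic and degrees of freedom for post minus pre."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("Need at least two matched pre/post observations.")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the difference scores
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Six attitude items were compared, so .05 / 6 is the adjusted threshold.
alpha_adj = bonferroni_alpha(0.05, 6)
print(round(alpha_adj, 3))  # 0.008

# Hypothetical 1-7 ratings from eight students (not the study's data).
pre_ratings = [5, 6, 4, 5, 6, 5, 4, 6]
post_ratings = [4, 6, 4, 4, 5, 5, 3, 5]
t_stat, df = paired_t(pre_ratings, post_ratings)
print(round(t_stat, 2), df)
```

A negative t here means ratings were lower after the lecture than before it.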
Study Evaluation
We compared the evaluations of students who read a particular news article (e.g., physical warmth) before the lecture to students who read that news article after the lecture (see Table 1). Across both news articles, students rated the findings as less likely to replicate after hearing the lecture. Further, students were less likely to believe the findings of the flag priming study post-lecture. This pattern was also found for the believability of the physical warmth study (d = −0.20); however, this decline was not significant. Students did not change their other evaluations over time (i.e., the surprising nature of the findings, interest in reading the empirical paper, and the quality of the research).
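Because each article was read by different students before and after the lecture, an effect size such as the d reported above can be computed from two independent groups. The sketch below uses the pooled-standard-deviation form of Cohen's d; this formula is our assumption, since the text does not state which variant was used, and the ratings are invented for illustration.

```python
import math
import statistics


def cohens_d(group_before, group_after):
    """Cohen's d for two independent groups (after minus before), pooled SD."""
    n1, n2 = len(group_before), len(group_after)
    if n1 < 2 or n2 < 2:
        raise ValueError("Each group needs at least two observations.")
    m1, m2 = statistics.mean(group_before), statistics.mean(group_after)
    v1, v2 = statistics.variance(group_before), statistics.variance(group_after)
    # Pooled SD weights each group's sample variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m2 - m1) / pooled_sd


# Hypothetical 1-7 believability ratings (not the study's data).
read_before_lecture = [5, 4, 6, 5, 5]
read_after_lecture = [4, 4, 5, 5, 4]
d = cohens_d(read_before_lecture, read_after_lecture)
print(round(d, 2))
```

With this sign convention, a negative d indicates lower ratings among students who read the article after the lecture.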
Examining the open-ended impressions of the study revealed that participants who read the physical warmth news article after the lecture were more likely to mention that the studies had small sample sizes. Participants who read the flag priming news article after the lecture were less likely to mention that it was a strong study. Incidence of other spontaneous evaluations in the open-ended responses did not differ pre- to post-lecture.
Comprehension Questions
As seen in Table 2, the majority of students thought that psychology had problems replicating results (69%), that the incentive structure was problematic (64%), and that it is important to report all the details about a study (96%). Importantly, almost no students thought that this was a problem limited to the field of psychology (2%). Encouragingly, very few students thought that scientists performing replications were less capable of doing research (3%). Students also did not think that media attention was a sign of reliability (3%). In retrospect, the question about low-powered studies may have been confusing for students: About 23% of students thought that studies with low statistical power were by definition incorrect, whereas we meant to convey that low-powered studies provide ambiguous information. The vast majority of students (all percentages > 82%) rated each of the following as very important: choosing a sample size a priori; making data publicly accessible; recognizing that decisions in design, analysis, and reporting affect the likelihood of obtaining significant results; and reporting studies that do not work out. About a third of students agreed that counterintuitive findings are very important, reflecting the ambiguity about whether they make a meaningful contribution to the literature.
Discussion
We sought to develop a lecture that would allow instructors with only a cursory knowledge of issues related to rigor and reproducibility to convey the most important themes of these topics to their students. Developing and empirically validating a lecture about these issues should help alleviate many of the concerns instructors may have, particularly with respect to keeping up with the debate, how to translate esoteric arguments into intelligible positions, and what effects learning about these issues may have on students.
We attempted to present arguments about replication and reproducibility from an evenhanded perspective. Nevertheless, this debate is ongoing, and prominent researchers disagree about the extent of the problem, solutions to the problem, and many other issues. Thus, the lecture will continue to be updated and empirically tested. Adopting the lecture would be a way for instructors who do not follow this debate closely to stay up to date and keep their classes apprised of any developments.
The lecture was effective in conveying the most important issues about this crisis as evidenced by the comprehension questions. Of particular relevance to students being informed consumers (and future funders) of psychological research, 97% of students responded that media attention was not an accurate indicator of the reliability of a study. In addition, students demonstrated high levels of agreement with current suggestions about transparency and reproducibility, such as determining a sample size before running a study, making data publicly available, and reporting studies that “don’t work out.” They also seemed to understand that flexible statistical decision-making can lead to questionable yet statistically significant findings. Students also correctly stated that the studies presented in the news articles may have trouble replicating, given the issues communicated in class.
It may be worrisome for some instructors to see that after the lecture, students reported trusting the studies of psychologists slightly less. However, one cannot both (a) admit that psychology has problems replicating some findings and (b) expect the trust level of individual findings to remain exactly the same. These decreases were counteracted by students’ greater appreciation for study design elements that increase reproducibility. In other words, it could be that students trust individual studies slightly less, but they can also now identify reproducible research when they see it.
Although students’ attitudes about the field experienced small changes pre- to post-lecture, learning about these issues did not change individuals’ intentions to pursue graduate study. This observation suggests that the recent discussions about replication will not cause a shortage in graduate school applicants anytime soon—at least as long as students learn about it in an evenhanded way from their instructor. Whether exposure to media reports about replication issues, be they biased or not, affects interest in attending graduate school is an open question.
Given the limited time available (∼1 hr), some topics were not covered by the lecture. However, methodological advances for evaluating psychological research and conceptual approaches for enhancing reproducibility could also be integrated into other psychology courses (Finkel, Eastwick, & Reis, 2015; Gelman & Loken, 2013; Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2015; Simonsohn, Nelson, & Simmons, 2014; Smaldino & McElreath, 2016). The format of the lecture also allows instructors to leave out subtopics within the lecture if they believe they cannot deliver the full lecture in less than 1 hr. We also hope to investigate the effectiveness of the teaching materials using a less invasive approach than the one described above, as mere participation in a study about educational outcomes might have enhanced students’ attention toward the concepts.
Conclusion
We hope that the development of this lecture serves as a first step in integrating discussions about replication and reproducibility into contemporary undergraduate curricula. In our lecture, we covered the development and emergence of these issues while providing a template for instructors to add or subtract material as they see fit. The continued development and refinement of the lecture will give instructors an additional tool that keeps both them and their students up to date.
Footnotes
Author Contribution
William J. Chopik and Ryan H. Bremner contributed equally to the writing of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
