Abstract
This study assesses the effectiveness of critical thinking drills (CTDs), a repeated classroom activity designed to improve methodological and statistical thinking about psychological claims embedded in popular press articles. In each of four separate CTDs, students critically analyzed a brief article reporting a recent psychological study by answering a set of 10 critical thinking questions about it. Students then scored their own responses during an instructor-led discussion. Results showed that the average CTD score increased linearly between the first and final assessments. These results suggest that critical thinking about scientific claims found in secondary sources can be successfully taught and quantitatively assessed.
Educators have increasingly been encouraged to explicitly incorporate critical thinking training and assessment into their courses. Some time ago, the U.S. Department of Education commissioned a paper identifying the need for better critical thinking instruction and outlining general guidelines for assessing such thinking (Paul & Nosich, 1992). Since then, critical thinking has gained considerable ground in higher education, and critical thinking training has been adopted into national and international education standards (European Higher Education Area, 2011; U.S. Department of Education, 2006).
In light of these changes in the higher education landscape, there has been corresponding progress in developing effective critical thinking interventions and in determining the scope of their success. First, there is widespread recognition that some form of explicit instruction is necessary to achieve meaningful improvements in critical thinking (Abrami et al., 2008; Bensley, Crowe, Bernhardt, Buckner, & Allman, 2010; Chance, 1986). That is, instructors should not expect to meet critical thinking learning objectives merely through passive or indirect means, such as assigning the critical thinking questions inserted in textbook chapters or immersing their students in critical discussion.
In addition, there is ample evidence that intentional strategies to improve critical thinking can be effective. Explicit instructional techniques that are designed to improve critical thinking and embedded in courses have been shown to yield positive results across a range of relevant outcomes, including argument evaluation (Bensley et al., 2010; Blessing & Blessing, 2010), pinpointing weaknesses in evidence (McLean & Miller, 2010; Penningroth, Despain, & Gray, 2007), and proposing alternative explanations (West & Montgomery, 1998). Furthermore, such outcomes appear to persist outside the classroom, even several months after course completion (Lehman & Nisbett, 1990).
Despite these advancements, there remains no widespread consensus on how best to define critical thinking at either the conceptual or the operational level (Halonen, 1995; Williams, 1999). Any reasonable definition, however, must incorporate the ability to analyze information by resolving it into its essential elements and then determining how those elements relate to one another (Bloom, Engelhart, Hurst, Hill, & Krathwohl, 1956). Within this framework, critical thinking involves the careful parsing and interpretation of information so that a sound conclusion can ultimately be reached. In science courses, this requires basic knowledge of scientific methodology and the ability to apply that knowledge in specific cases. For instance, students are frequently bombarded by information that takes the general form “New study shows that x” (with x representing an empirical claim). Thinking critically in this case requires students to identify the specific claim, isolate the variables in the study, describe how those variables were measured or manipulated, describe the study’s results, and so on.
This more methods-oriented emphasis is consistent with at least one commonly accepted definition proposed by Glaser (1941), which includes three separate parts: “(1) an attitude of being disposed to consider in a thoughtful way the problems and subjects that come within the range of one’s experiences, (2) knowledge of the methods of logical inquiry and reasoning, and (3) some skill in applying those methods” (p. 46).
Despite the potential utility of this critical thinking definition, it is somewhat narrow in comparison to other definitions that include higher order thinking skills in addition to analysis, such as synthesis and evaluation (Bloom et al., 1956; King, 1994, 1995). In light of that, it is perhaps better to regard Glaser’s (1941) definition as a subtype of critical thinking, which might be labeled methodological and statistical thinking (MST). The current article is an attempt to demonstrate the effectiveness of an activity designed to teach this specific form of critical thinking.
In addition to these definitional concerns, it is vital to maximize the ecological validity of critical thinking assessments (Halpern, 1998; Ku, 2009). The target of the exercise, that is, the source of the claim, should mirror the ways in which people encounter information in their daily lives. Recent efforts have begun to address this issue by having students distinguish scientific from pseudoscientific claims of the sort they might encounter every day (Adam & Manson, 2014; McLean & Miller, 2010; Stark, 2012; West & Montgomery, 1998).
However, few studies have used entire articles from the popular press as the targets of these exercises. Yet sources such as newspaper, magazine, and blog articles are ideal for these purposes because they provide a relevant context that corresponds to how the average person is exposed to psychological research. In addition, because media reports often oversimplify, misrepresent, and exaggerate the research they cover (Sumner et al., 2014), they provide an ideal touchstone for practicing critical information literacy skills. Given these characteristics, it is sensible to include popular press articles as targets of instructional efforts to improve critical thinking.
In the current article, I will describe the use of an in-class active-learning activity designed to improve students’ MST ability in relation to secondary source claims derived from popular press articles. In addition, I will report empirical results demonstrating quantitative improvements in MST using this activity.
Method
Participants
One hundred and eight undergraduate students (33 males, 75 females) enrolled in three separate sections of a lower-level introductory psychology course participated. The data from six students were excluded from the analyses to control for selective attrition due either to withdrawal from the course or to multiple absences from later class meetings at which data were collected.
Materials, Measures, and Procedure
Students completed four separate critical thinking drills (CTDs) as in-class activities interspersed over a single semester at 3- to 4-week intervals. Students completed the first CTD after covering course material on research methodology and then taking part in an in-class CTD practice session.
The procedure for each of the four CTDs was identical except for the stimulus articles. Each article was preselected from one of two popular weblogs that frequently report current research in psychology, PsyBlog (http://www.spring.org.uk/) and the New York Times Well Blog (http://well.blogs.nytimes.com/), and all were roughly two pages long. Articles were selected for length (shorter articles were preferred), clarity, and matched difficulty level (only simple, two-variable studies were selected for this introductory course). They also covered a diverse range of topics and methodological approaches.
The basic format of the CTDs did not vary across the four measurements. Each exercise took the entire 75-min class period. Students were notified in advance of each CTD and were encouraged, but not required, to review basic definitions and concepts in research methods (e.g., independent and dependent variables, operational definitions, and confounds) covered earlier in the course before arriving in class. Finally, students completed each CTD on their own, without help from other students and without access to their notes or textbook.
At the outset of each CTD, the article was distributed to the class, and students were given 15 min to read it and answer a set of 10 critical thinking questions about it. The same 10 questions were used in all four assessments. They required students to apply their understanding of research methodology to analyze and evaluate the central claim presented in the article, a task that engaged a range of lower and higher order critical thinking skills. These questions are included in the Appendix.
The discussion, feedback, and self-grading phase of each CTD began once the 15-min period had elapsed. At that point, any students still working were instructed to stop. The class was then given instructions for grading their own work through instructor-led discussion. Students were informed that each question was worth a maximum of one point and that they could earn a half point for partially correct answers. They were also told that more than one answer could be considered correct for many of the questions and that they should raise their hands frequently to discuss how to score their individual responses in ambiguous cases. During self-grading, the answer sheet was projected on a screen, with modeled answers displayed prominently in red type beneath each question. Students were shown the modeled answers one at a time and prompted to compare them with their own answers. Throughout this process, students were continually encouraged to ask questions or make comments to clarify the scoring of borderline responses that only partly resembled the modeled responses.
This reflective self-grading procedure took up most of the time devoted to the exercise and provided a comprehensive feedback mechanism explicitly intended to instill a deep understanding of the rationale behind each modeled response. It also produced a rich, dynamic discussion of the research, its strengths and limitations, and ultimately what could be concluded from it.
Once all 10 questions had been discussed in this way, students were instructed to sum the points they had earned to obtain a total out of a maximum of 10. The assignments were then collected, and the instructor assigned final grades. Scores from each of the four CTDs were recorded and together accounted for about 13% of each student’s final course grade. I expected that the average CTD score across the three sections would increase linearly between the first and final assessments, reflecting improvements in students’ ability to critically assess the quality of psychological research claims.
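The scoring rule described above (0, 0.5, or 1 point per question, summed to a total out of 10) can be expressed as a small function. This is a hypothetical sketch for illustration only: the name `score_ctd` and the input validation are assumptions, not part of the original classroom procedure, which was carried out on paper.

```python
from typing import Sequence

# Credit allowed per question under the rubric described above (assumed encoding):
# 0 = incorrect, 0.5 = partially correct, 1 = correct.
ALLOWED_CREDIT = {0.0, 0.5, 1.0}
NUM_QUESTIONS = 10


def score_ctd(item_scores: Sequence[float]) -> float:
    """Hypothetical helper: sum one student's item scores into a total out of 10."""
    if len(item_scores) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} item scores, got {len(item_scores)}")
    for s in item_scores:
        if s not in ALLOWED_CREDIT:
            raise ValueError(f"invalid item score {s!r}; must be 0, 0.5, or 1")
    return float(sum(item_scores))


# Hypothetical responses: eight correct answers and two partially correct ones.
example = [1, 1, 0.5, 1, 1, 1, 0.5, 1, 1, 1]
print(score_ctd(example))  # 9.0
```

Rejecting out-of-range values up front mirrors the rubric's constraint that no single question can contribute more than one point.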
Results
The scores from all three sections were combined and averaged for each of the four CTDs. As mentioned previously, data from six students were omitted to control for selective attrition, leaving 102 participants in the final data set. Randomly missing values were replaced with the mean score of the corresponding assessment.
The means and standard deviations for each of the four assessments are displayed in Table 1. A one-way repeated measures analysis of variance revealed a significant difference among the four mean scores, F(3, 101) = 36.00, p < .001.
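Replacing a missing score with the mean of the corresponding assessment, as described above, can be sketched as follows. The function name and the small data set are hypothetical; the sketch assumes scores are stored as one row per student and one column per CTD, with `None` marking a missing value.

```python
from statistics import mean
from typing import Optional


def impute_with_assessment_means(
    scores: list[list[Optional[float]]],
) -> list[list[float]]:
    """Replace each missing score (None) with the mean of the observed scores
    for the same assessment (column). Rows are students; columns are CTDs.
    Raises statistics.StatisticsError if an assessment has no observed scores."""
    n_assessments = len(scores[0])
    col_means = []
    for j in range(n_assessments):
        observed = [row[j] for row in scores if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_assessments)]
        for row in scores
    ]


# Hypothetical data: three students, four CTDs, two missing scores.
data = [
    [7.0, 8.0, None, 9.0],
    [6.5, None, 8.5, 8.5],
    [8.0, 9.0, 9.5, 10.0],
]
print(impute_with_assessment_means(data))
# [[7.0, 8.0, 9.0, 9.0], [6.5, 8.5, 8.5, 8.5], [8.0, 9.0, 9.5, 10.0]]
```

Mean substitution is simple and preserves each assessment's mean, but it is known to shrink the variability of the imputed columns somewhat, so it works best when only a few values are missing.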
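Table 1. Change in Mean Critical Thinking Drill Scores Between Assessments.
Assessment    M      SD
CTD 1         7.22   1.74
CTD 2         8.37   1.26
CTD 3         8.55   1.22
CTD 4         8.79   1.26
Note. N = 102. M = mean; SD = standard deviation.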
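For readers who want to see how a one-way repeated measures ANOVA partitions variability, the following is a from-scratch sketch on invented data; the function name `rm_anova_oneway` is hypothetical. With n participants and k assessments, the univariate test has degrees of freedom (k − 1) for the assessment effect and (k − 1)(n − 1) for the error term, because between-participant variability is removed from the error. The sketch applies no sphericity correction (e.g., Greenhouse-Geisser) and omits the p-value, which could be obtained from the F distribution, for example with SciPy's `scipy.stats.f.sf`.

```python
from statistics import mean


def rm_anova_oneway(data: list[list[float]]) -> tuple[float, int, int]:
    """One-way repeated measures ANOVA (uncorrected, univariate).

    data[i][j] is the score of participant i on assessment j.
    Returns (F, df_conditions, df_error).
    """
    n = len(data)        # participants
    k = len(data[0])     # repeated assessments
    grand = mean(x for row in data for x in row)
    col_means = [mean(row[j] for row in data) for j in range(k)]
    row_means = [mean(row) for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_conditions = n * sum((m - grand) ** 2 for m in col_means)
    # Stable differences between participants are removed from the error term;
    # this is what distinguishes the repeated measures design.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = k - 1
    df_error = (k - 1) * (n - 1)
    f = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f, df_conditions, df_error


# Hypothetical scores: 3 participants x 4 assessments.
scores = [
    [6, 7, 7, 8],
    [7, 8, 9, 9],
    [8, 9, 8, 10],
]
f, df1, df2 = rm_anova_oneway(scores)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(3, 6) = 8.00
```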
Post hoc paired-samples t tests were then conducted to compare individual means. First, the first assessment was compared with each of the three subsequent assessments. These tests revealed significant differences between the first assessment (M = 7.22, SD = 1.74) and the second (M = 8.37, SD = 1.26), t(101) = −6.46, p < .001, d = 1.58; the third (M = 8.55, SD = 1.22), t(101) = −7.42, p < .001, d = 1.90; and the fourth (M = 8.79, SD = 1.26), t(101) = −7.91, p < .001, d = 2.17.
Additional tests revealed a significant difference between the second (M = 8.37, SD = 1.26) and fourth assessments (M = 8.79, SD = 1.26), t(101) = −3.20, p < .01, d = .43, and a nonsignificant difference between the third (M = 8.55, SD = 1.22) and fourth assessments (M = 8.79, SD = 1.26), t(101) = −1.69, p = .09, d = .24. These results are illustrated in Figure 1.

Figure 1. Mean critical thinking drill scores with statistical significance indicated. Error bars represent standard errors.
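The post hoc comparisons reported above are paired-samples t tests. The sketch below shows that computation on invented data; the function name `paired_t` is hypothetical, and p-values are omitted because Python's standard library has no t distribution (SciPy's `scipy.stats.ttest_rel` performs the full test). The article does not state which formula was used for Cohen's d, so the sketch assumes one common convention for paired designs, d_z (mean difference divided by the standard deviation of the differences). Other conventions, such as dividing by the pooled or average standard deviation of the two assessments, give different values.

```python
import math
from statistics import mean, stdev


def paired_t(first: list[float], second: list[float]) -> tuple[float, int, float]:
    """Paired-samples t test on two assessments from the same participants.

    Differences are taken as first - second, so an improvement from the
    earlier to the later assessment yields a negative t (as in the text).
    Returns (t, df, d_z), where d_z = mean difference / SD of differences.
    """
    if len(first) != len(second):
        raise ValueError("both assessments must include the same participants")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff
    return t, n - 1, d_z


# Hypothetical scores for five students on two assessments.
first = [6, 7, 7, 8, 7]
second = [8, 8, 9, 9, 8]
t, df, d_z = paired_t(first, second)
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")  # t(4) = -5.72, d_z = -2.56
```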
Discussion
The results of the current study indicate that students showed significant increases in their ability to think critically about psychological claims embedded in popular press articles, a medium that closely corresponds to the way most people are exposed to psychological research. These findings are consistent with a large and growing body of evidence that critical thinking can be taught in college (for reviews, see Abrami et al., 2008; Arum & Roksa, 2010; Bensley et al., 2010; Huber & Kuncel, 2015; Ortiz, 2007; Pascarella & Terenzini, 1991, 2005). The magnitude of the mean difference between the first and fourth assessments observed in this study (d = 2.17) represents a larger effect than the average effect sizes observed in several previous meta-analyses (Huber & Kuncel, 2015). This larger effect is consistent with Ortiz (2007), who observed larger mean effect sizes in studies that employed a critical thinking intervention, as the current study did. It also lends indirect support to Abrami et al.’s (2008) assertion that explicit instruction is necessary to achieve meaningful improvements in critical thinking.
Additionally, these data support the use of multiple critical thinking assessments over a semester. However, the nonsignificant difference in scores between the third and fourth assessments suggests that a fourth CTD may be unnecessary. A more efficient implementation of this assignment could therefore limit the number of assessments to three. Overall, although scores did not improve significantly at every step, students’ MST abilities generally improved incrementally over the semester in relation to these exercises.
However, it is important to note that because this study lacked a control condition, the results do not directly implicate the intervention itself as the cause of the observed trend. This limits any firm conclusions that might be drawn about CTDs as a teaching tool, over and above their potential usefulness as an assessment instrument. Future studies should address this limitation by including a control condition where participants do not have the chance to repeatedly practice MST through the CTD activity. In any case, it is clear that students’ scores increased, and it is certainly plausible that practicing the activity on multiple occasions had something to do with this tendency.
Another unresolved question involves the domain specificity of the questions contained in the CTD exercise and whether the improvements in MST observed in this study are likely to transfer across disciplines and life settings and to other forms of critical thinking. Since the current results cannot address this question, future research should do so by testing whether the thinking skills targeted by this activity can translate to other critical thinking activities.
An additional limitation of the current research is the potential for biased results stemming from the self-grading procedure used in these exercises. In particular, this form of grading could have produced ceiling effects that constrained variability in the CTD scores. In addition, or alternatively, self-grading could have acted as a confound that systematically inflated scores as the end of the semester approached. Either is possible; however, available evidence suggests that self-grading has validity comparable to that of other grading methods (e.g., peer grading and teacher grading), as assessed by correlations between instructor-assigned and self-assigned grades (for a review, see Falchikov & Boud, 1989). Moreover, the potential for bias is further reduced when instructors, rather than students, assign the final grades (Sadler & Good, 2006), as was the case in the current study. Nevertheless, future studies should determine whether the observed pattern can be replicated using other grading procedures.
Finally, this methodology is subject to practical limitations because of the substantial class time it requires. Although the CTD activity offers students a high-quality learning experience, many instructors might be reluctant to devote multiple class periods to it. Fortunately, as previously mentioned, the data presented here suggest that fewer than four CTDs are enough to produce measurable improvements in students’ MST. In fact, the current data show sharply diminished returns (i.e., a score plateau) after the second CTD, which suggests that for many purposes it may be sufficient to administer just two of these exercises. Another approach would reduce the class time needed for each CTD: instructors might assign the analysis as homework rather than as an in-class activity and devote class time exclusively to the self-grading and discussion portions of the exercise.
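Agreement between self-assigned and instructor-assigned grades, the validity criterion mentioned above, is usually summarized with a correlation. The sketch below is illustrative only: the grades are invented, and the helper names `pearson_r` and `mean_bias` are hypothetical. It also computes the mean difference between the two sets of grades, because a high correlation alone cannot reveal a uniform upward shift in self-assigned scores, which is the kind of inflation raised above as a possible confound.

```python
import math
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of grades."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_bias(self_grades: list[float], instructor_grades: list[float]) -> float:
    """Average amount by which self-assigned grades exceed instructor grades."""
    return mean(s - i for s, i in zip(self_grades, instructor_grades))


# Hypothetical grades (out of 10) for five students.
self_grades = [9.0, 8.0, 7.5, 10.0, 6.0]
instructor_grades = [8.5, 8.0, 7.0, 9.5, 6.5]
print(f"r = {pearson_r(self_grades, instructor_grades):.2f}")       # r = 0.97
print(f"bias = {mean_bias(self_grades, instructor_grades):+.2f}")  # bias = +0.20
```

Adding a constant to every self-assigned grade would leave r unchanged while increasing the bias term, which is why the two summaries are worth reporting together.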
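The diminishing returns described above can be made concrete with the four mean scores reported in the Results section. This short calculation uses only those reported means; it does not test whether the later gains are statistically significant.

```python
# Mean CTD scores for assessments 1 to 4, as reported in the Results section.
means = [7.22, 8.37, 8.55, 8.79]

# Gain from each assessment to the next, rounded to two decimals.
gains = [round(later - earlier, 2) for earlier, later in zip(means, means[1:])]

# Share of the total first-to-fourth gain that occurred between CTD 1 and CTD 2.
total_gain = means[-1] - means[0]
first_share = gains[0] / total_gain

print(gains)                 # [1.15, 0.18, 0.24]
print(f"{first_share:.0%}")  # 73%
```

Roughly three quarters of the total improvement occurred between the first and second drills, which is the pattern behind the suggestion that two CTDs may suffice for many purposes.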
Appendix
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
