Abstract
Reading and critiquing journal articles is a way in which instructors can promote learning and critical thought among students, particularly in the undergraduate research methods course. However, articles that have survived the peer-review process are often lengthy and have only nuanced drawbacks, rendering them less useful for such pedagogical purposes. In the activities described here, students read a series of published articles that are brief and have fundamental methodological or other problems that render their conclusions questionable. After students read each article, the instructor leads a discussion in which students debate the article’s benefits and drawbacks. Assessment indicated that the activities increased students’ understanding of the course material and that students found the activities to be educational and enjoyable.
A host of research (e.g., Bebermeier & Hagemann, 2019; Carkenord, 1994; Chamberlain & Burrough, 1985; Gareis, 1995; Hall & Seery, 2006; Leong, 2013) suggests that critiquing peer-reviewed articles has beneficial outcomes for students. However, the process also has some meaningful drawbacks. First, as such articles have typically survived the rigorous peer-review process, their limitations are often minor and nuanced. Second, such articles are often extremely lengthy, making their use for pedagogical purposes impractical.
To solve these problems for my undergraduate research methods course, I have found a number of published articles that do not suffer from these drawbacks. Rather than having nuanced limitations, these articles have more meaningful methodological or statistical limitations. The questionable conclusions that result render these articles excellent discussion topics for a research methods class. In addition, these articles are brief: Students can read them from start to finish in less than 5 min. This brevity allows the instructor to include many short articles in a course rather than a few long ones.
Procedure
After discussing a concept in class, students read the prescribed article and then discuss its merits and drawbacks in small groups. When discussion slows (typically after 5–10 min), I ask volunteers to share their observations with the class, with the hope that the students will identify the study’s major questionable claims. We then debate the perceived strengths and weaknesses of the study and relate them to the course material. Often, students will generate many problems with the paper that are ancillary to the fundamental problem relevant to the day’s topic. Typically, we discuss no more than one article in a day.
The Articles
Questionable Claims of Causality
Kravitz and Furst (1991) gave participation gifts to some members of a fitness center but not to others, and attendance rates were subsequently assessed. Results indicated that participants who received the gifts showed higher attendance rates than did participants who did not. The authors thus argue for a causal effect of rewards on exercise adherence. However, participants had self-selected into conditions: They chose to join the rewards or nonrewards groups, so there was no random assignment to condition. The absence of random assignment, and of any experimenter manipulation of condition, therefore makes the authors’ causal claim questionable.
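For contrast in class, the following is a minimal sketch of what random assignment could have looked like. It is not Kravitz and Furst’s procedure (they had none); the function name `randomly_assign`, the condition labels, and the member IDs are hypothetical. The key point the code makes is that each person’s condition depends only on the random number generator, never on the person’s own choice.

```python
import random

def randomly_assign(participants, conditions=("reward", "no_reward"), seed=None):
    """Shuffle participants, then deal them round-robin into conditions.

    Group sizes differ by at most one, and condition membership is
    determined by the shuffle alone, not by anything the participant chose.
    """
    rng = random.Random(seed)          # seeded generator so the example is reproducible
    shuffled = list(participants)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical fitness-center members.
members = [f"member_{i:02d}" for i in range(10)]
groups = randomly_assign(members, seed=2024)
for condition, people in groups.items():
    print(condition, people)
```

Because self-selection is replaced by chance, preexisting differences between members (e.g., motivation to exercise) are expected to be spread evenly across conditions.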
Schumm (2004) argued that the 2003 U.S. invasion of Iraq significantly reduced suicide bombings in Israel. In a pretest-posttest design using archival data, the author found a reduction in such attacks from before the invasion to after it. From these data, the author made the causal claim that the invasion saved thousands of civilian lives. Causal claims relying on pretest-posttest designs, however, are questionable in that other events that took place in the interim may have accounted for the observed difference (e.g., Campbell & Stanley, 1963). The author’s use of one-tailed t tests and double-counting of outcome variables adds further uncertainty.
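Why one-tailed tests invite scrutiny can be shown with a short sketch: for the same test statistic, the one-tailed p-value is half the two-tailed one, so a result may fall below .05 only because of the choice of test. The sketch uses a standard-normal (z) approximation rather than the t distribution for simplicity, and z = 1.8 is a hypothetical value, not a statistic from Schumm (2004).

```python
import math

def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a standard-normal statistic z.

    The one-tailed value assumes the predicted direction is positive,
    i.e., it is P(Z >= z).
    """
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))   # upper tail only
    two_tailed = math.erfc(abs(z) / math.sqrt(2))    # both tails
    return one_tailed, two_tailed

# Hypothetical statistic, chosen only to show the threshold effect.
one, two = p_values(1.8)
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

Here the one-tailed p is about .036 and the two-tailed p about .072: the same data are "significant" or not depending on which test was chosen.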
Gump (2004) examined the relation between students’ evaluations of an instructor and the extent to which the students were aware of the instructor’s daily class objectives. One instructor asked students to report their perceptions of the instructor’s teaching effectiveness, as well as how frequently the instructor defined class objectives. The two variables were positively correlated: Students who indicated that the instructor defined objectives more frequently were more likely to rate that instructor more highly. Two problems appear to exist. First, the correlational nature of this study makes the author’s occasional suggestions of causality (i.e., that higher ratings resulted from clear class objectives) questionable. Second, because all students reported attitudes toward a single instructor, there is no variance in the extent to which that instructor actually presented objectives; the “objectives” variable therefore reflects differences in student perceptions rather than differences in the instructor’s behavior.
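The second problem can be made concrete numerically. In the sketch below, all data are hypothetical (five students rating one instructor), and `pearson_r` is a small helper written for illustration. Students’ perceptions of the objectives vary and can correlate with their ratings, but the instructor’s actual behavior is the same for every student, so its variance is zero and its correlation with the ratings is undefined.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists.

    Returns None when either variable has zero variance, because r is
    undefined in that case (the denominator would be zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: five students evaluating ONE instructor.
perceived_objectives = [2, 3, 3, 4, 5]   # each student's perception of how often objectives were defined
rating = [3, 3, 4, 4, 5]                 # each student's evaluation of the instructor
actual_objectives = [4, 4, 4, 4, 4]      # the instructor's real behavior: identical for everyone

r_perceived = pearson_r(perceived_objectives, rating)
r_actual = pearson_r(actual_objectives, rating)
print(r_perceived, r_actual)
```

Any correlation the study could find must therefore come from the perception variable, which is exactly why it cannot speak to the instructor’s actual practice.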
Other Questionable Conclusions
Greene and Noice (1988) assessed the effect of positive affect on problem-solving. The researchers randomly assigned participants to a positive-affect condition or a control condition. Positive affect was manipulated by complimenting some children on their attire or appearance and giving them a package of gum. Participants assigned to this positive-affect condition performed better on creativity and problem-solving tasks than did participants assigned to the control condition. However, although the paper indicates that a manipulation check was administered (e.g., participants were asked to indicate their mood immediately after the manipulation), it does not report the results of that manipulation check. The use of an untested operationalization of the independent variable, along with the failure to report the results of the manipulation check, casts doubt on whether positive affect accounted for the differences in the dependent variables. The compliments and gum may have affected other theoretical variables, such as feelings of reciprocity, and these variables, rather than positive affect, may have produced the different means.
Parke and Griffiths (2004) examined aggressive behavior among slot machine gamblers in a casino, finding that 9 of the 303 gamblers observed engaged in aggressive behavior. The authors argued that these results support a relationship between gambling and aggression. A variety of limitations of the study, including the fact that the research includes no nongambling comparison group and that 97% of the observed gamblers displayed no aggression, make this conclusion questionable.
Trinkaus (2004) investigated the extent to which visitors to a single church in the Northeastern United States voluntarily paid for religious candles. The author showed that, over a 16-year period, the percentage of visitors to the church who paid declined from 92% to 28%. Given such specific data from such a limited sample, the author’s conclusion that the general population’s honesty is in decline is questionable.
Peterson and Pfost (1989) investigated whether eroticism or violence in music videos would make viewers’ attitudes toward violence against women more positive. Participants viewed videos in which erotic content and violent content were orthogonally manipulated. Attitudes toward violence against women were then assessed. Although a 2 × 2 factorial analysis of variance would seem to be the proper analysis, given that two independent variables (eroticism and violence) were orthogonally manipulated, the authors instead conducted a series of four-level one-way analyses of variance on the various dependent variables. This technique does not fully explain the relations among the variables; interactions between the two independent variables, for example, cannot be observed. (A review of the means, in fact, suggests that such an interaction may have emerged had a factorial analysis been conducted.) As such, the analysis technique is questionable.
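What the factorial analysis adds can be sketched from cell means alone. The cell means below are invented for illustration and are not Peterson and Pfost’s data; the helper `factorial_contrasts` is hypothetical. A four-level one-way ANOVA asks only whether the four means differ somewhere, whereas a 2 × 2 analysis separates two main effects from the interaction, which is a "difference of differences." The sketch stops at these contrasts; F statistics would also require the within-cell variances and sample sizes.

```python
# Hypothetical cell means for a 2 x 2 design:
# first key = eroticism (low/high), second key = violence (low/high).
means = {
    ("low", "low"): 3.0,
    ("low", "high"): 3.2,
    ("high", "low"): 3.1,
    ("high", "high"): 4.5,
}

def factorial_contrasts(m):
    """Return (main effect of eroticism, main effect of violence, interaction)
    computed from the four cell means of a 2 x 2 design."""
    ll, lh = m[("low", "low")], m[("low", "high")]
    hl, hh = m[("high", "low")], m[("high", "high")]
    main_erotic = (hl + hh) / 2 - (ll + lh) / 2    # eroticism effect, averaged over violence
    main_violent = (lh + hh) / 2 - (ll + hl) / 2   # violence effect, averaged over eroticism
    interaction = (hh - hl) - (lh - ll)            # is the violence effect larger when eroticism is high?
    return main_erotic, main_violent, interaction

main_erotic, main_violent, interaction = factorial_contrasts(means)
print(f"eroticism: {main_erotic:.2f}, violence: {main_violent:.2f}, interaction: {interaction:.2f}")
```

With these made-up means, violence raises scores by 1.4 points when eroticism is high but by only 0.2 points when it is low, an interaction of 1.2 that an omnibus one-way F test would fold, together with both main effects, into a single 3-df comparison.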
Evaluation
On one day of the course, I explained the concept of the manipulation check to my level research methods course at a small liberal arts school in the Northeastern United States, which consisted primarily of sophomores and juniors. I presented a definition of the term as well as information about how a manipulation check can be implemented and what its benefits are. After this initial presentation, students (N = 19) were randomly assigned to complete either a five-question multiple-choice quiz about manipulation checks (e.g., “If the effect of the independent variable (IV) on the dependent variable (DV) was significant, but the manipulation check failed, what can we say?”) or a filler essay about a previously discussed topic. Each multiple-choice question had one correct answer among four options, so each response was scored as “correct” or “incorrect.” All students then read the target article (Greene & Noice, 1988), after which I led a discussion about the merits and drawbacks of the paper, including the fact that a manipulation check was apparently implemented but no associated results were presented in the results section. After this discussion, students completed the task they had not previously completed: Those who had taken the quiz wrote the filler essay, and those who had written the essay took the quiz. The data were coded by an individual who was both condition-blind and hypothesis-blind. Students who answered the questions second (i.e., after the demonstration) answered more questions correctly (M = 3.90, SD = 1.20) than did students who answered the questions first (i.e., before the demonstration; M = 2.89, SD = 0.60), t(17) = 2.28, p = .04, indicating a positive effect of the demonstration on students’ understanding of the concept.
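Students can check the reported test statistic themselves from the summary statistics. The sketch below assumes a Student’s (pooled-variance) independent-samples t test, which is consistent with df = 17 = N − 2. The group sizes are not reported; n = 10 (after) and n = 9 (before) are an inference from the means on a five-item quiz (3.90 = 39/10 and 2.89 ≈ 26/9), so treat them as an assumption. The helper names are hypothetical, and the p-value is obtained by numerically integrating the t density (Simpson’s rule) rather than from a statistics library.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t from summary statistics (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p: 1 - 2 * P(0 < T < |t|), with the area found by Simpson's rule."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps                      # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3
    return 1 - 2 * area

# After-demonstration group vs. before-demonstration group (group sizes assumed).
t, df = pooled_t(3.90, 1.20, 10, 2.89, 0.60, 9)
p = two_tailed_p(t, df)
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
```

Under these assumptions the sketch recovers t(17) ≈ 2.28 with a two-tailed p between .03 and .04, in line with the reported p = .04.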
Students from another section of the course (N = 28) rated their perceptions of the article review sessions on 5-point scales ranging from 1 = not at all to 5 = extremely. Participants reported that the demonstrations were educational (M = 4.33, SD = 0.49), a good use of class time (M = 4.33, SD = 0.65), and enjoyable (M = 4.00, SD = 0.85).
Discussion
Reading and critiquing published articles is a common technique for enhancing critical thinking and for reinforcing topics from research methods classes. Whereas publications from typical peer-reviewed outlets are often too lengthy and manifest only minimal drawbacks, the articles I use (see Table 1 for a summary) are brief and draw conclusions that are debatable. An instructor may explain why claims of causality depend on random assignment to condition, but reading an actual article in which authors make claims of causality without random assignment may prove more enlightening than even the most well-crafted lecture.
Table 1. Summary of Articles.
Using actual research articles may also be more beneficial than providing synopses or descriptions of studies in that reading and analyzing actual unabridged publications is a more realistic process. This enhanced realism may provide several advantages. First, it forces students to find the potential problems in the context of an article, a task that is representative of what scholars do in the academic world. Second, the current demonstration forces students to find potential problems embedded in articles in which other processes are done correctly. Problems are likely to artificially “stand out” in a synopsis, whereas they may be more challenging to find in the context of a full article. Third, reading actual journal articles as published may drive home the point that consumers of science must read even published research articles critically. In my experience, students often find drawbacks that I had never considered—something that would be extremely unlikely to take place with a synopsis.
An additional benefit of using such articles is that doing so helps foster healthy skepticism among students regarding published literature. In recent years, the reproducibility crisis (e.g., Pashler & Wagenmakers, 2012), a proliferation of predatory journals (e.g., McCutcheon, Aruguete, McKelvie, Jenkins, & Williams, 2016), and outright fraudulent results (e.g., Stroebe, Postmes, & Spears, 2012) have made it eminently clear that even though a “finding” was reported by “scientists” in a “journal,” we should not put blind faith in such publications. Students should recognize that even published articles can have flaws, some of which may be fundamental.
Although the articles reviewed herein are intellectually stimulating, some of their conclusions are questionable due to methodological or statistical drawbacks. It is precisely these drawbacks that make them excellent candidates for pedagogical use.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
