Abstract
The ability to distinguish between correlational and causal claims is core knowledge for scientific literacy. News reports of scientific research prominently feature these claims, so this knowledge has significant real-world application: distinguishing among claims is critical to making sense of reported research. We constructed an introductory psychology course with a series of brief exercises and assessments designed to improve students’ abilities both to understand the core concepts of correlation and causation and to apply that knowledge to real-world situations. Pre–post data on a definition test and a news-headline test revealed that students improved on both tasks by the end of the term.
Media reports of scientific research present claims that are correlational (x associated with y), causal (x leads to y), or descriptive (there is x amount of y). Distinguishing among such claims is critical to making sense of the research. If a headline states “sleeping less is tied to weight loss,” do readers recognize that this is a correlational claim, or do they mistakenly draw a causal conclusion and change their sleeping patterns to shed a few pounds? When reading “disciplinarian parents have fat kids,” do parents change their approach to parenting because they mistakenly believe the claim suggests that a disciplinarian parenting style can cause obesity?
The ability to recognize and distinguish between correlational and causal claims is considered core knowledge for scientific literacy (Norris, Phillips, & Korpan, 2003; Stanovich, 2010), has been a staple of psychology courses and introductory psychology textbooks (Boneau, 1990; Zechmeister & Zechmeister, 2000), and is a skill identified in the American Psychological Association (APA) Learning Outcomes (American Psychological Association, 2007). Despite its importance, there are only limited data regarding people’s ability to distinguish between these claims. A single study by Norris et al. (2003) suggests that undergraduates are not competent at recognizing and distinguishing between causal and correlational claims when presented with sample statements. Furthermore, although considerable attention has been given to developing various forms of critical thinking among psychology students (e.g., Bensley, Crowe, Bernhardt, Buckner, & Allman, 2010; Hall & Seery, 2006; McLean & Miller, 2010; Stanovich, 2010), few studies have targeted students’ ability to distinguish among the types of claims at the center of science news reports. One exception is an assignment by Connor-Greene (1993): a single exercise, used during coverage of research methods in a psychology course, in which students analyzed a news report of scientific research. However, students did not complete a formal measure of learning following this exercise.
We believe that developing these thinking skills requires regular and systematic practice, feedback, and reflection. Thus, we designed an introductory psychology course to include a series of formative assessments that would provide a coherent framework for developing the ability to distinguish among experimental, correlational, and descriptive studies and the claims made about them. To investigate students’ development of these skills, we asked two questions: (a) Would students’ ability to accurately identify the definitions of correlation and causation improve over the course of the term, and (b) would their ability to interpret and distinguish between correlational and causal claims in media reports similarly improve over the course of the term?
Method and Results
Participants
Participants were students at a predominantly White, Midwestern, 4-year liberal arts college, enrolled in four sections of an introductory psychology course taught by the first author across two 10-week terms (two sections per term). All sections used the same materials and assignments. Class sizes do not exceed 35; students who take the course are primarily first- and second-year students, and generally 50%–60% are female.
Course Assignments
We emphasized scientific literacy throughout the course, particularly how to distinguish types of research studies (descriptive, correlational, and experimental) and how claims made about research need to match the type of study (e.g., causal claims typically cannot be drawn from correlational studies). These skills are also articulated in APA Learning Outcomes 2 (research methods) and 3 (critical thinking; American Psychological Association, 2007). Specifically, the first quarter of the course covered the scientific method and what it means to know something in science. The instructor organized most lectures around the goals of science: describe, predict, explain, and control. The instructor presented each goal (e.g., describe) as a type of question scientists attempt to answer through specific methods (e.g., naturalistic observation, survey). Examples of studies using each method punctuated the lectures to illustrate how particular methods address particular questions or goals; these studies also allowed the instructor to weave psychological content into the lectures. Additionally, through explanation, modeling, and student practice, the instructor taught students how to evaluate empirical and media articles describing research, including the claims made in the headlines of such reports. An online collection of headlines reporting scientific research, many of which confuse correlation and causation, provided plentiful examples for practice and discussion (see http://jfmueller.faculty.noctrl.edu/100/correlation_or_causation.htm).
Formative Assessments
For additional practice, feedback, and reflection, we assigned regular but brief homework assignments over the course of the term, many of which focused on understanding psychological research from empirical papers or news articles and headlines. (See Appendix for examples and http://jfmueller.faculty.noctrl.edu/100/printsched.htm to view all the assignments.) We chose articles that were accessible to a 100-level audience and matched the topic discussed in class that day (e.g., memory). Students answered questions about the research, such as “Does the study appear closer to being a descriptive study, a correlational study, or an experiment?”; “Is the headline descriptive, correlational, or causal?”; or “Find an example of a claim presented in the media; list the evidence that is presented to support the claim; evaluate the evidence. Is the claim justified based on the evidence?” The class discussed these assignments at the beginning of the next class period, so students learned immediately whether they had understood the assignment. Because many students e-mailed their assignments before class, the instructor could identify common misconceptions or errors and use them to guide class review of each assignment.
Adapting a grading scheme from Barbara Walvoord (Walvoord & Pool, 1998), the instructor graded each submitted assignment only on whether a good faith effort was present (+) or absent (−). Students who received plusses on at least 90% of the assignments earned the full points available; students receiving plusses on 80%, 70%, 60%, and fewer than 60% of the assignments earned progressively fewer points. This scheme allowed the instructor to score the assignments rapidly while still identifying students’ strengths and weaknesses. Through these assignments, students practiced, received feedback on, and reflected upon the targeted skills in a coherent and systematic manner.
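A minimal Python sketch of the tiered +/− scheme may make the thresholds concrete. Only the 90%, 80%, 70%, and 60% cut-offs come from the text; the point values attached to each tier and the function name `completion_points` are hypothetical, since the article says only that points decrease progressively. The comparison uses integer arithmetic so that, for example, exactly 18 plusses out of 20 counts as 90%.

```python
# Sketch of a tiered +/- completion-grading scheme.
# ASSUMPTION: the point values below are hypothetical; only the
# 90/80/70/60 percent cut-offs are taken from the course description.

# (minimum percent of plusses, hypothetical points awarded)
TIERS = [(90, 100), (80, 85), (70, 70), (60, 55)]
BELOW_LOWEST_TIER = 40  # hypothetical points for fewer than 60% plusses


def completion_points(marks: list[str]) -> int:
    """Return points for a list of '+'/'-' marks, one per homework assignment."""
    if not marks:
        raise ValueError("no graded assignments")
    plusses = marks.count("+")
    total = len(marks)
    for min_percent, points in TIERS:
        # Integer comparison avoids floating-point edge cases at the cut-offs.
        if 100 * plusses >= min_percent * total:
            return points
    return BELOW_LOWEST_TIER


# Example: 18 plusses out of 20 assignments is exactly 90%, so full credit.
print(completion_points(["+"] * 18 + ["-"] * 2))  # prints 100
```

Because only effort is judged, the grading decision per assignment is binary, and all the nuance lives in the tier table, which an instructor could adjust without changing the marking itself.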
Summative Assessments
To determine how well students were meeting the identified outcomes during the term, the instructor administered several summative assessments. Students individually completed four brief papers, three of which, described briefly below, primarily addressed scientific thinking skills. (See Appendix for detailed information on the second paper.)
Students read a summary of the methodology and results of a fictional study, and then answered several questions such as explaining why researchers used random assignment, identifying an alternative explanation, and proposing how to rule out an alternative explanation.
Students considered a research question and seven pieces of evidence intended to answer the question. Students ranked the seven pieces of evidence from most to least convincing in answering the question and explained their rankings, considering the merit of causal, correlational, and descriptive evidence for the type of question asked.
Students selected one of the two scenarios. Each scenario presented a claim and evidence to support it in a real-world context. Students evaluated the quality and nature (e.g., causal, correlational, or descriptive) of the evidence in relation to the stated claim.
Approximately a week before each of the four exams, the instructor posted 15–18 short answer questions on the course website. Students knew that 6–8 of those questions would appear on the exam but did not know which ones. At least two of the possible short answer questions on each of the exams addressed students’ ability to distinguish types of research studies and recognize whether the claim about a study matched the study’s methodology. All course assignments and possible exam questions are located at http://jfmueller.faculty.noctrl.edu/100. (See Appendix for examples.)
Skill Improvement Assessments
In Term 1, we administered a definition test to evaluate improvement in students’ ability to accurately identify the concepts of correlation and causation (Question 1). In Term 2, we administered a headline test to evaluate improvement in students’ ability to distinguish among correlational, causal, and descriptive claims presented in scientific news headlines (Question 2). In each term, students completed the assessment on both the first and the last days of the term. Completing the tests was voluntary, although no one declined. Because class sizes were small, we asked students to report only gender and year in school, to help maintain anonymity.
Definition Test
We created two versions of this test: a correlation test and a cause-and-effect test. At both Time 1 and Time 2, we randomly assigned students to a version but did not track which version a particular student received. Thus, some students completed the same version twice, and others completed different versions at Time 1 and Time 2. A few of the students who completed a test at Time 1 were not present at Time 2. For the correlation test (Time 1: n = 35, Time 2: n = 30), we presented five statements and asked students to rate the degree to which each fit the definition of correlation (e.g., “the tendency of two factors to vary together”; “when a factor increases or decreases”). For the cause-and-effect test (Time 1: n = 34, Time 2: n = 32), we again presented five statements, this time asking students to rate the degree to which each fit the definition of a cause-and-effect relationship (e.g., “the relationship between an event and a second event where the second event is a consequence of the first”; “when two variables change in a predictable manner”). On each test, two statements matched the definition and three did not. Ratings ranged from 1 (definitely does not fit the definition) to 9 (definitely fits the definition).
Results
We scored the definition tests in two ways. First, to determine whether students had correctly identified each statement, we coded responses of 7–9 as correct when the statement matched the definition (the two correlational statements on the correlation test and the two causal statements on the cause-and-effect test), and responses of 1–3 as correct when the statement did not fit the definition. Second, we computed an overall score for each test by reverse-scoring the three items that did not fit the definition and summing responses across all five statements, so possible scores ranged from 5 to 45. Higher scores indicate that students more accurately identified the set of statements.
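Both scoring rules can be expressed as a short Python sketch. The input format (a list of rating and fits-the-definition pairs) and the function names are hypothetical, and reverse-scoring on the 1–9 scale is assumed to be 10 minus the rating, which keeps totals in the stated 5–45 range. The `num_correct` count corresponds to the "at least four of the five statements correct" figures reported below.

```python
# Sketch of the two scoring rules for the definition test.
# ASSUMPTIONS: each statement is given as (rating, fits_definition), where
# rating is on the 1-9 scale; reverse-scoring on that scale is 10 - rating.


def total_score(items: list[tuple[int, bool]]) -> int:
    """Sum ratings after reverse-scoring statements that do NOT fit the
    definition. With five statements, totals range from 5 to 45."""
    total = 0
    for rating, fits in items:
        if not 1 <= rating <= 9:
            raise ValueError(f"rating out of range: {rating}")
        total += rating if fits else 10 - rating
    return total


def is_correct(rating: int, fits: bool) -> bool:
    """Accuracy coding: 7-9 is correct for a matching statement, 1-3 for a
    nonmatching one; a midscale rating (4-6) is never coded as correct."""
    return rating >= 7 if fits else rating <= 3


def num_correct(items: list[tuple[int, bool]]) -> int:
    """Number of statements coded as correct under the accuracy rule."""
    return sum(is_correct(rating, fits) for rating, fits in items)


# Hypothetical student: two matching statements rated 8 and 9,
# three nonmatching statements rated 2, 5, and 7.
student = [(8, True), (9, True), (2, False), (5, False), (7, False)]
print(total_score(student))       # 8 + 9 + 8 + 5 + 3 = 33
print(num_correct(student) >= 4)  # False: only 3 of 5 coded correct
```

The two rules answer different questions: the total rewards graded confidence across the whole scale, whereas the accuracy coding treats midscale ratings as uncertain and therefore incorrect.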
Because we did not ask for identification numbers, we could not match individual students’ responses at Time 1 and Time 2. These analyses are therefore between subjects, and we used independent-samples t-tests to compare scores at Time 1 and Time 2. Scores on the correlation test improved significantly from Time 1, M = 25.57, standard deviation [SD] = 4.85, to Time 2, M = 32.78, SD = 7.28, t(64) = −4.77, p < .001, d = 1.19. Only 6.25% of the students labeled at least four of the five statements correctly at Time 1, but 42.8% did so at Time 2. The improvement came largely from students’ ability at Time 2 to recognize that the noncorrelational statements are not correlational. Scores on the cause-and-effect test also improved significantly from Time 1, M = 31.26, SD = 5.24, to Time 2, M = 34.72, SD = 6.24, t(64) = −2.44, p < .05, d = 0.60. Although only 11.8% of the students labeled at least four of the five statements correctly at Time 1, that proportion rose to 46.9% at Time 2. Again, the improvement came largely from students’ ability to identify the noncausal statements as noncausal; a ceiling effect prevented improvement on the causal statements.
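The between-subjects comparison can be reproduced approximately from the reported summary statistics. The sketch below assumes Student’s t with pooled variance and Cohen’s d based on the pooled standard deviation; the article does not name its exact formulas, but with the cause-and-effect test’s means, SDs, and ns these assumptions yield values in line with those reported. The function name `pooled_t_and_d` is hypothetical, and an analysis of the raw scores would normally use a statistics package instead.

```python
import math


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t (pooled variance) and Cohen's d,
    computed from summary statistics for Time 1 (group 1) and Time 2 (group 2).

    t is computed from (m1 - m2), so improvement at Time 2 gives a negative t,
    matching the sign convention in the text; d is reported as a positive
    standardized mean difference using the pooled SD."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    t = (m1 - m2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    d = abs(m2 - m1) / pooled_sd
    return t, df, d


# Cause-and-effect definition test, summary statistics from the text.
t, df, d = pooled_t_and_d(31.26, 5.24, 34, 34.72, 6.24, 32)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(64) = -2.44, d = 0.60
```

Pooling assumes roughly equal variances at the two time points; Welch’s t, which drops that assumption, would give a slightly different statistic and degrees of freedom.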
Headline Test
Students also completed this test on both the first (n = 67) and the last (n = 59) days of the term. Because we asked students to provide identification numbers, these analyses are within subjects and include only the 59 participants who completed both the Time 1 and Time 2 assessments.
We gave students 10 headlines. Three used correlational language (e.g., “Educational level is associated with financial prosperity”), three used causal language (e.g., “Positive outlook leads to longer life”), three used descriptive language (e.g., “One third of adults have tried dating websites”), and one was a filler headline that used none of these. Students read each headline and selected the one of four options that best described the research it reported: a correlation, a cause-and-effect relationship, a descriptive observation, or none of the above.
Results
We computed separate accuracy scores for cause-and-effect, correlational, and descriptive headlines. Because there were three headlines of each type, the number correct for each type could range from 0 to 3. Paired-samples t-tests indicated that students’ scores on correlational headlines improved significantly from Time 1, M = 2.16, SD = 0.98, to Time 2, M = 2.80, SD = 0.58, t(58) = −5.85, p < .001, d = 0.76. Scores on the causal headlines also improved significantly from Time 1, M = 1.92, SD = 0.79, to Time 2, M = 2.40, SD = 0.74, t(58) = −3.69, p < .01, d = 0.48. However, scores on the descriptive headlines did not improve significantly from Time 1, M = 2.25, SD = 0.80, to Time 2, M = 2.39, SD = 0.81, t(58) = −1.18, ns, d = 0.15, perhaps because the instructor placed greater emphasis during the term on the distinction between correlational and cause-and-effect language.
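The per-type scoring and the within-subjects statistics can be sketched as follows. The answer labels and function names are hypothetical. The article does not state which effect-size formula it used; the reported d values are, however, consistent with d_z = |t|/√n for n = 59 (the standardized mean of the paired differences), which the last lines of the sketch check against the reported t values.

```python
import math
from statistics import mean, stdev

# ASSUMPTION: hypothetical labels for the answer options; "none" marks the
# filler headline and the "none of the above" option.
SCORED_TYPES = ("correlational", "causal", "descriptive")


def accuracy_by_type(key: list[str], responses: list[str]) -> dict[str, int]:
    """Number correct (0-3) for each scored headline type; the filler is ignored."""
    if len(key) != len(responses):
        raise ValueError("key and responses must be the same length")
    scores = {label: 0 for label in SCORED_TYPES}
    for correct, answer in zip(key, responses):
        if correct in scores and answer == correct:
            scores[correct] += 1
    return scores


def paired_t(time1: list[float], time2: list[float]) -> tuple[float, int]:
    """Paired-samples t on Time 1 - Time 2 differences (negative t = improvement).
    Requires at least two students and nonzero variance in the differences."""
    diffs = [a - b for a, b in zip(time1, time2)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def d_z(t: float, n: int) -> float:
    """Within-subjects effect size d_z = |t| / sqrt(n)."""
    return abs(t) / math.sqrt(n)


key = ["correlational", "causal", "descriptive"] * 3 + ["none"]
answers = ["correlational", "correlational", "descriptive", "correlational", "causal",
           "descriptive", "causal", "descriptive", "none", "none"]
print(accuracy_by_type(key, answers))
# {'correlational': 2, 'causal': 1, 'descriptive': 2}

# Recovering the reported d values from the reported t values (n = 59).
for t in (-5.85, -3.69, -1.18):
    print(round(d_z(t, 59), 2))  # prints 0.76, then 0.48, then 0.15
```

Matching students by identification number is what makes the paired test possible here: each student serves as his or her own control, which is why the headline analyses could use the smaller within-subjects error term.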
Discussion
We constructed an introductory psychology course with a series of exercises and assessments designed to improve students’ scientific literacy, particularly targeting the core concepts of correlation and causation and the application of that knowledge to real-world situations. To investigate students’ development of these skills, we asked two questions: (a) Would students’ ability to accurately identify the definitions of correlation and causation improve over the course of the term, and (b) would their ability to interpret and distinguish between correlational and causal claims in media reports similarly improve over the course of the term?
Regarding the first question, students’ ability to recognize definitions of correlation and cause and effect improved significantly over the term. Much of that improvement came from students better recognizing that noncorrelational and noncausal statements did not fit the definitions of correlation and cause and effect, respectively. Regarding the second question, students improved significantly in recognizing both correlational and causal language in headlines, but not in recognizing descriptive headlines, which received less attention during instruction. In other words, by the end of the term students appeared to have a stronger grasp of what is and is not a correlation or a causal relationship, and were better at applying that knowledge in a meaningful context. Although regular practice, feedback, and testing on these types of research through the formative and summative assessments likely strengthened students’ understanding of these concepts, our design cannot determine which components, or which combination of components, contributed to student learning.
Design Limitations
The current study used a single-group pretest/posttest design to assess student improvement following explicit instruction. For the definition test, students completed either the correlation or the cause-and-effect test at Times 1 and 2, but we did not track who completed which test. For the headline test, we used a within-subjects design. For both tests, adding a control group that did not receive this instruction, or using equivalent alternative forms, would improve the internal validity of the assessments (see Bartsch, Engelhardt Bittner, & Moreno, 2008).
Variations and Practical Advice
Although most introductory psychology students will never major in psychology or pursue a career in research, all of them will regularly encounter scientific claims in the media. Given that American adults acquire the vast majority of their scientific information via the media (National Science Board, 2010; Nelkin, 1995), whether they can recognize that a research report suggests a causal or a correlational link between vaccinations and autism, for example, could have costly or even deadly consequences. Furthermore, the pretest results reported earlier and our experience teaching introductory students suggest that students enter college with limited understanding of these concepts and skills and do not learn them quickly or easily. Thus, we chose to give less attention to some psychological topics, such as social psychology and child development, in exchange for an enhanced focus on a set of critical thinking skills people require every day.
Nevertheless, other instructors might not choose to sacrifice as much psychological content, or to devote as many assignments to these skills, as we did. Even with less time devoted to such skill development, instructors can still provide meaningful practice and feedback on scientific thinking. For example, in addition to whatever instruction on methodology and the nature of science a teacher already provides, the instructor might identify two or three specific scientific thinking skills to foster, such as “Distinguish between descriptive, predictive, and explanatory questions and conclusions” or “Evaluate whether a specific methodology is appropriate for testing a particular hypothesis.” Then, once a week or so, the instructor could ask students a few questions about a research article, a popular press article, or website content, weaving discussion of these issues into the normal discussion of that week’s topic. Careful selection of the material and assigned questions could address course-specific psychological content while also providing opportunities for practice and feedback on the targeted thinking skills. Moreover, by selecting topic-specific headlines, articles, and questions, instructors could use this approach in any psychology course. With a small yet meaningful set of thinking skill outcomes identified, instructors can provide systematic and regular instruction, practice, feedback, and reflection on essential skills embedded in the teaching of psychological concepts. Both conceptual and skill development should be essential components of a science course.
Appendix
Sample Formative and Summative Assessments
Formative assessment examples
Read the article “Is television traumatic? Dreams, stress, and media exposure in the aftermath of September 11, 2001.” Is the question in the title, “Is television traumatic?”, a descriptive, predictive, or explanatory question? (A one-word answer is sufficient.) Does the study appear closer to being a descriptive study, a correlational study, or an experiment? Explain. How do the authors suggest the media can affect someone’s dreams?
Find an example of a claim presented in the media. Do not use a claim mentioned in class or on the course website. Attach a copy of the source material (e.g., advertisement, article, website page) or write a brief description of it. State the claim. List the evidence that is presented to support the claim. Evaluate the evidence. Is the claim justified based on the evidence? Explain.
Read the article “Diet of fish ‘can prevent’ teen violence.” Is the headline descriptive, correlational, or causal? Explain. Was an appropriate type of structured observation used to test the hypothesis that a diet of fish can prevent teen violence? Explain. Did the results of the research support the claim? Explain.
Summative Assessment Example: Paper Assignment
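You want to know why teenagers have more difficulty focusing their attention early in the morning than do younger children.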
Rank order the pieces of evidence from most convincing to least convincing. By convincing, I mean how well does the evidence answer the question “why do teenagers have more difficulty focusing their attention early in the morning than do younger children?”
Explain why you ranked the different types of evidence in the order you selected.
Identify those pieces of evidence listed below, if any, from which you believe a conclusion can be drawn about why teenagers have more difficulty focusing their attention early in the morning than do younger children. Briefly explain.
Evidence:
You have overheard many teenagers talking about why they struggle with paying attention early in the morning, so you believe you have a good understanding of why they have such difficulty.
Thirty well-designed studies published in prestigious journals have discovered that melatonin, a hormone involved in sleep, is secreted by the body of teenagers at a different point in the sleep–wake cycle than it is for young children or adults.
A nutrition expert, appearing on Good Morning America, stated that she believes teenagers have more difficulty focusing their attention than younger children because, as she learned in her interviews with students of all ages, the teenagers were more likely to skip breakfast.
Your cousin has a hunch.
Twenty well-controlled experiments published in prestigious journals strongly suggest a cause for why teenagers have more difficulty focusing their attention early in the morning than do younger children.
After a recent airing of a story on the news show Dateline on NBC about some people being morning people and others being night people, Channel 5 asked viewers to call in the reasons they have difficulty focusing attention in the morning. Two reasons were given most frequently, and those reasons were presented on that night’s local news.
Two well-controlled experiments published in a prestigious journal suggest a cause for why teenagers have more difficulty focusing their attention early in the morning than do younger children.
Summative Assessment Examples: Short-Answer Test Questions
“Historical data show that the candidate who raises the most money before the primary season starts generally wins the nomination.” What kind of relationship (e.g., correlational, causal), if any, does the quote describe? Explain. Imagine the quote above is a hypothesis. How could you test it? Explain.
“Young boys (age 6) exhibiting a cluster of extreme personality traits, including impulsiveness, excitability, and lack of fear in new situations, are most likely to use both legal and illicit drugs in their early teens.” Does this research more likely describe a correlation or a causal relationship? Explain. If you said it describes a correlation, change the wording so that it describes a causal relationship. If you said it describes a causal relationship, change the wording so that it only describes a correlation.
Footnotes
Acknowledgment
The authors thank Kirsten Bushman, Ellen Farley, Liz Gerhardt, Grace Hsiao, and Katya Rudenya for their assistance with the design of the studies, data entry, and the analyses of the results.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by a Summer Research Grant awarded to the two authors by North Central College.
