Abstract
This study extends research into problems in handling sampling error within polls by examining coverage of President Obama’s approval ratings in three major newspapers over a five-year period. Results indicate support for hypotheses suggesting that, when confronted with changes or differences in poll results that could be explained by sampling error alone, journalists will nonetheless emphasize those changes or differences.
Many Americans receive their knowledge of politics and the presidency directly or indirectly from the news media. Journalists, in turn, use approval ratings to examine the public’s perception of how the president and other politicians perform on the job. Much of the attention can be attributed to approval ratings serving as a “very reliable predictor” of voter behavior, especially as an election date draws near. 1 News media reporting of presidential approval polls in particular could do more than explain the American public’s view of a president’s performance. It could influence it. Priming research suggests that, while news media coverage alone is unlikely to change people’s political attitudes, it can influence the attitudes people use to evaluate politics and elections. 2 In an effort to minimize any undue influence, the Associated Press Stylebook and Briefing on Media Law gives journalists directives regarding the use of polling figures. Yet despite these guidelines, the news media are generally seen to consider margins of error and other important methodological criteria “only to a limited extent” in their reporting. 3 In an effort to examine that concern, this study uses the Associated Press guidelines to examine the depiction of President Barack Obama’s approval ratings in three major U.S. newspapers: USA Today, The Washington Post and The New York Times.
Background
Presidential approval ratings have been used since the 1930s to explain whether the American citizenry feels that the president is performing well on the job. Most polls simply ask citizens whether they approve or disapprove of the president’s job performance. Much research suggests a very broad claim that the president’s approval rating is driven by “peace, prosperity, and probability.” 4 Similarly, Sparrow 5 suggests that not only the public but also politicians pay attention to public opinion expressed in polling data and often use that data to make decisions. Wisniewski, Lightfoot and Lillemon 6 suggest a strong relationship between approval of an incumbent president and price-earnings ratio, which indicates economic performance.
Some studies have questioned the news media’s use of approval poll data in stories that cover “horse race elections” 7 and exit polls. 8 Hickman 9 observed that the news media tend to devote an inordinate amount of attention to marginal changes in polling data, and Reavy 10 and Burns 11 found that news media coverage of pre-election polls seems to emphasize changes or differences that could be explained by sampling error. Patterson 12 argued that these issues, combined with an over-reliance on polls, diminish the overall quality of journalistic coverage of U.S. elections.
Because many errors can exist in polling data, statisticians commonly identify the errors from three primary sources: measurement error, specification error and sampling error. 13 Specification error refers to the underlying theories of a poll, which could include sample selection. 14 Measurement error typically involves problems within the survey itself (e.g., the wording of the survey questions or the order of the questions within the survey). Both specification error and measurement error are systematic errors that cannot be analyzed statistically. Thus, those types of errors receive relatively little attention in published articles. Sampling error, on the other hand, is often found in published news media reports.
Sampling error denotes the estimated potential range of difference between the value reported from a sample and the true value within the population. For example, if a Gallup poll 15 with a 4-point margin of sampling error finds President Obama has an approval rating of 46, the true value within the population studied would more accurately be described as somewhere between 42 and 50. Because the range of sampling error falls along a normal distribution curve, the likelihood that a given value is the true rating within the population decreases as one moves away from the center point of 46. However, using the traditional 95 percent confidence level of social science research, the result of the poll is reported as 46 ± 4.
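The arithmetic behind the 46 ± 4 example can be sketched in a few lines of Python. This is only an illustration: the margin formula assumes a simple random sample at the 95 percent confidence level, and the sample size of 600 is a hypothetical figure chosen because it yields roughly a 4-point margin, not a number taken from the Gallup poll cited above.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of sampling error, in percentage points.

    Assumes a simple random sample; z=1.96 corresponds to the
    95 percent confidence level. proportion=0.5 gives the widest
    (most conservative) margin.
    """
    return 100 * z * math.sqrt(proportion * (1 - proportion) / sample_size)


def poll_interval(result, margin):
    """Range within which the population value plausibly falls."""
    return (result - margin, result + margin)


# The example from the text: an approval rating of 46 with a 4-point margin.
low, high = poll_interval(46, 4)
print(low, high)  # 42 50

# Hypothetical sample size of 600 respondents (an assumption).
print(round(margin_of_error(600), 1))  # 4.0
```

Published margins often differ somewhat from this textbook formula because pollsters weight their samples, so the function should be read as an approximation.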
When two polls are compared, things become more complex because any two polls each contain sampling error that must be accounted for. 16 For a difference between two polls conducted using precisely the same methodology to be statistically significant in terms of standard social science methodology, the difference must exceed twice the poll’s margin of sampling error. If two separate polls are compared, which should be done only with great care, the difference must exceed the margin of sampling error of both polls combined.
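The comparison rule described above can be expressed as a short check. The sketch below is a minimal illustration of that rule as stated in this article; the function name and the example figures are hypothetical.

```python
def explained_by_sampling_error(result_a, result_b, margin_a, margin_b=None):
    """Return True if the gap between two poll results falls within
    the combined margins of sampling error.

    If margin_b is omitted, both results are assumed to come from the
    same poll series, so the threshold is twice that poll's margin.
    """
    if margin_b is None:
        margin_b = margin_a
    combined_margin = margin_a + margin_b
    # The difference must EXCEED the combined margin to count as real.
    return abs(result_a - result_b) <= combined_margin


# Same poll, 4-point margin: a move from 50 to 46 is within 2 x 4 = 8.
print(explained_by_sampling_error(50, 46, 4))  # True

# Same poll, 4-point margin: a move from 55 to 46 exceeds 8.
print(explained_by_sampling_error(55, 46, 4))  # False

# Two separate polls with margins of 3 and 4: threshold is 3 + 4 = 7.
print(explained_by_sampling_error(52, 46, 3, 4))  # True
```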
Numerous studies have focused on journalists’ use of polling data including data misrepresentation 17 and journalists’ failure to provide enough information for readers to properly evaluate poll results. 18 The latter is especially important given research indicating that regular newspaper readers rate polls as less informative and less objective when methodological details are left out. 19
The Associated Press Stylebook offers specific guidelines on the use of poll data, such as “Carefully word such stories to avoid exaggerating the meaning of poll results.” 20 It also provides detailed guidelines as to how to handle sampling error. While at least two newspapers in this study maintain their own style guides (e.g., The New York Times Manual of Style and Usage and the Washington Post Deskbook on Style), these do not address how to handle sampling error in news articles. Groups such as the American Association for Public Opinion Research (AAPOR) and the National Council on Public Polls (NCPP) have worked to educate journalists on how to handle these concerns, but researchers worry that the advice goes unheeded. As Rosenstiel observed, “These are excellent primers and good counsel. Yet groups such as AAPOR are frustrated by whether the line journalists are seeing and understanding them.” 21 Thus, this article focuses on the more detailed description of usage provided by the Associated Press. It continues the research into how journalists use polling data to portray changes, differences and extremes in President Obama’s overall job approval polls, and it examines whether, as researchers suggest, the advice of pollsters and polling organizations continues to be largely overlooked by journalists.
Hypotheses
This study extends previous research into sampling error reporting problems in presidential approval polls. Using previous studies as a guideline, four hypotheses will be tested:
Again, any “difference” that falls within twice a poll’s margin of sampling error could possibly be explained by sampling error alone, and when two separate polls are compared, any difference between those polls that falls within their combined margins of sampling error could possibly be explained by that sampling error alone. So this hypothesis suggests that in more than 50 percent of the cases in which a writer is confronted with a “difference” that could be explained by sampling error alone, he or she will choose to emphasize the difference as genuine rather than acknowledge the statistical similarity of the two poll results.
This second hypothesis suggests that such “changes” will be emphasized as genuine rather than as statistically similar to previous results.
Records suggesting that a given poll is the “highest” or “lowest” in a president’s history or since a specific date represent a stated difference between the current poll and the next closest poll result. For example, if President Obama receives a “record low” approval rating of 46 in the most recent poll, the difference between this poll and the next closest poll must exceed twice the poll’s margin of sampling error before it can truly be called a “record.” Anything less indicates a difference that could be explained by sampling error alone. So this hypothesis predicts that more than 50 percent of the time journalists encounter such a poll result, they will emphasize it as a “record” rather than acknowledging that it could statistically be said to be tied with a previous high or low.
Unlike the previous hypotheses, which compared the results of two polls, this hypothesis relates to a single poll crossing an arbitrary line. Because the line has no margin of sampling error, only the poll’s margin of sampling error is taken into account. Thus, this hypothesis predicts that when a journalist writes about a poll result crossing a line by an amount within the poll’s margin of sampling error, the writer will be more likely to emphasize crossing the line than the fact that the crossing could be explained by sampling error alone.
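The thresholds behind the record and arbitrary-line hypotheses differ, and a small sketch may make the contrast concrete. The function names and the example figures below are hypothetical; the thresholds follow the rules stated in this section.

```python
def record_explained_by_sampling_error(current, previous_extreme, margin):
    """A 'record' compares two results from the same poll, so the gap
    must exceed twice the poll's margin of sampling error."""
    return abs(current - previous_extreme) <= 2 * margin


def crossing_explained_by_sampling_error(result, line, margin):
    """An arbitrary line (e.g., 50 percent) has no sampling error of its
    own, so only the poll's single margin applies."""
    return abs(result - line) <= margin


# A "record low" of 46 against a previous low of 47, 4-point margin:
# the 1-point gap is far inside 2 x 4 = 8.
print(record_explained_by_sampling_error(46, 47, 4))  # True

# A result of 48 "dropping below" the 50 percent line, 4-point margin.
print(crossing_explained_by_sampling_error(48, 50, 4))  # True

# A result of 45 is 5 points below the line, outside the 4-point margin.
print(crossing_explained_by_sampling_error(45, 50, 4))  # False
```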
Method
This study examined news articles from three major American newspapers—the Washington Post, the New York Times and USA Today—during the five-year period between President Barack Obama’s first State of the Union address on February 4, 2009, and his January 27, 2014, State of the Union address. This time period was selected to define the search parameters of the study because the State of the Union address is a significant political event in a president’s term that can influence or be influenced by presidential approval ratings.
The Lexis-Nexis database was used to identify articles that contained the terms president or presidential, approval and rating or ratings. The searches of each newspaper resulted in a total of 2,459 articles (USA Today = 241; Washington Post = 1,339; New York Times = 879) that matched the search criteria. Articles that did not pertain to national presidential approval ratings, including particular state ratings and ratings of individuals other than President Obama, were removed from the data to produce a total of 285 articles, which were then searched for relevant statements. A total of 319 paragraphs containing such statements served as the unit of analysis. The primary coder examined each of the 319 paragraphs, while a secondary coder examined approximately 20 percent of the paragraphs for reliability, which was determined to be 0.85 using Holsti’s formula.
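For readers unfamiliar with Holsti’s formula, the coefficient is 2M / (N1 + N2), where M is the number of coding decisions on which the two coders agree and N1 and N2 are the numbers of decisions made by each coder. The sketch below uses invented codes purely for illustration; it does not reproduce the study’s coding data.

```python
def holsti_reliability(codes_a, codes_b):
    """Holsti's coefficient of intercoder reliability: 2M / (N1 + N2).

    codes_a and codes_b are the two coders' decisions for the same
    units, listed in the same order.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same units.")
    agreements = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 2 * agreements / (len(codes_a) + len(codes_b))


# Hypothetical data: 20 paragraphs, with the coders agreeing on 17.
coder_a = ["emphasized"] * 10 + ["acknowledged"] * 10
coder_b = ["emphasized"] * 10 + ["acknowledged"] * 7 + ["emphasized"] * 3

print(holsti_reliability(coder_a, coder_b))  # 0.85
```

Holsti’s coefficient is simple percent agreement and does not correct for agreement that would occur by chance.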
The researchers coded each paragraph on 36 variables, broadly divided into categories related to the following:
The identity of the unit of analysis, including the date, publication, edition, page number and paragraph.
The polls, including the results of the three newspapers’ own polls, whether a poll was cited within the paragraph, which poll was cited, which poll(s) were used for comparison and the margin of sampling error of the cited poll.
Difference, change, records and arbitrary lines, each coded separately as to whether one was cited, the numerical separation between poll results or between a poll result and an arbitrary line, and whether that separation fell within the margin of sampling error.
After data were checked for accuracy, results were analyzed in an Excel spreadsheet. Statements in which a writer emphasized difference, change, a record or the crossing of an arbitrary line that could be explained by sampling error alone were separated for further post hoc analysis, including estimation of the degree to which the amount emphasized fell within sampling error.
Results
A total of 319 statements related to change, difference and records in presidential approval polls were selected for inclusion in this study. Of these, 153 (48.0 percent) lacked adequate context for comparison, despite efforts on the part of researchers to identify a context. These efforts included, in some instances, looking for matching poll results when no specific poll was mentioned. For example, when President Obama’s approval ratings in a Gallup poll were compared with those of other presidents, researchers utilized Gallup’s online Presidential Approval Center to derive a context for comparison when possible. 22
In some cases, the lack of context resulted from a reference that could have meant any of two or more specific polls. For example, a March 14, 2009, article in the Washington Post indicated that “[President Obama’s] approval ratings remain strong—above 60 percent, according to the most recent Gallup poll—but have dropped from their highs.” 23 It is unclear whether this refers to the most recent USA Today/Gallup poll or the most recent Gallup daily tracking poll. Researchers found some instances in which “the most recent Gallup poll” clearly referred to the USA Today/Gallup poll and others in which it clearly referred to the Gallup daily tracking poll. When the reference was not obvious, coders labeled the statement as lacking a proper context for analysis.
In other cases, the lack of context stemmed from generic references to “the polls” or simply to “approval rating” with no mention of specific numbers. For example, an April 30, 2009, article in the New York Times noted that “[President Obama’s] job approval rating remains high, particularly given the wave of challenges on his desk.” 24 The article fails to mention any specific approval rating poll, prompting it to be labeled as lacking proper context for analysis.
Once those statements lacking a proper context were culled, the remaining 166 (52 percent) were examined for evidence related to the hypotheses of this study.
Data for H1 show that of those statements containing enough context for analysis, 75 cited differences between polls, and 60 (80 percent) of those fell within the margin of sampling error of both polls. Looking at these more closely, writers emphasized the difference rather than the statistical similarity of the polls in 41 (68.3 percent) of the cases. Table 1 shows a breakdown of those statements referencing a difference between polls. More than half of the statements referencing a difference between polls emphasized that difference rather than noting that it could be explained by sampling error alone. This held true overall, as well as within each newspaper studied. These findings support H1.
Table 1. Differences between Polls
Data to test H2 show that the studied population consisted of 71 statements citing a change in a poll. Of those, 45 (63.4 percent) depicted changes within the margin of sampling error of the poll’s results. Given the choice of emphasizing such change or acknowledging that it could be attributed to sampling error, the writer chose to emphasize the change in 24 (53.3 percent) of the statements. The results are detailed in Table 2. Analysis of all statements supports H2. However, it should be noted some differences emerged among the newspapers when it came to portraying change. USA Today articles were much more likely to emphasize change over statistical similarity compared with articles in the other two newspapers. The Washington Post was the only newspaper to lean away from emphasizing change within polls.
Table 2. Change between Polls
Post hoc examination of statements in The Washington Post indicates that much of the newspaper’s apparent reluctance to highlight a change between polls can be attributed to two causes. First, when discussing perceived change between two polls, writers tended to more frequently cite the numbers themselves. For example, the statement, “In the survey, 47 percent approve of the job Obama is doing, down seven points since January,” 25 merely states the numbers and cannot be criticized for over-emphasizing change. In addition, writers frequently depicted a change in poll numbers utilizing an appropriate level of fuzziness. For example, the statement, “[President Obama’s] overall approval rating stands at 56 percent, holding steady in Post-ABC polls since the late summer,” 26 is a statistically accurate description of polls that had fluctuated up and down by no more than the poll’s margin of sampling error during that time period.
Data testing H3 show that articles discussing “records” in any given poll were naturally rarer than those observing change in a poll or differences between polls. This study identified 43 such statements during the content analysis. However, while these statements are less common, they are also far more likely to reference a change and/or difference. Of the 43 statements, 41 (95.3 percent) discussed changes or differences that fell within the margin of sampling error of the polls. A high percentage of these changes or differences were depicted as polling records, despite the fact that they were within the polls’ margin of sampling error. In all, 27 (65.9 percent) of the statements emphasized the records. As with difference, this was seen at least half of the time across all newspapers. The results, depicted in Table 3, support H3.
Table 3. Records
With regard to reporting on “records,” the New York Times was notable both for the few statements that discussed record poll results and the lower percentage of those statements that emphasized the record over the statistical similarity of that record to previous records or other poll results.
In reviewing data gathered for H4, 30 statements referenced polls crossing some kind of arbitrary line—most commonly “the 50 percent mark” beneath which a president’s approval rating is deemed to have a negative impact upon his party in upcoming elections. Of these statements, 24 (80 percent) fell within the poll’s margin of sampling error. Because an arbitrary line has no sampling error, only the poll’s margin of sampling error applies to these comparisons. Thus, the margin of sampling error for comparison purposes is much smaller than that examined under previous hypotheses. Nonetheless, the difference between a poll result and an arbitrary line was more likely to be attributed to actual change than to sampling error in the majority of cases, supporting H4. In all, 17 (70.8 percent) of the statements failed to properly account for sampling error in reporting that a poll crossed an arbitrary line. The results are detailed in Table 4.
Table 4. Arbitrary Lines
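The percentages reported for the four hypotheses can be recomputed from the counts given in the text. The short sketch below does only that; the dictionary layout and labels are illustrative, while the counts are those reported in this section.

```python
# Counts reported in the Results section, per hypothesis:
# (statements citing the feature, statements within sampling error,
#  statements that emphasized the feature anyway)
counts = {
    "H1 difference": (75, 60, 41),
    "H2 change": (71, 45, 24),
    "H3 records": (43, 41, 27),
    "H4 arbitrary lines": (30, 24, 17),
}


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


for label, (cited, within_error, emphasized) in counts.items():
    share_within = percent(within_error, cited)
    share_emphasized = percent(emphasized, within_error)
    print(f"{label}: {share_within}% within sampling error, "
          f"{share_emphasized}% of those emphasized")
```

In each case the second percentage, the share of within-error statements that were nonetheless emphasized, is the figure compared against the 50 percent threshold in the hypotheses.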
Discussion
This research adds to the body of evidence suggesting that journalists consistently have difficulty in adequately accounting for sampling error in the reporting of poll results. It further finds that, more often than not, journalists report poll variations as real change or difference even when those variations can be explained by sampling error alone. These results support previous findings that the news media have difficulty properly reporting change in a poll or differences between polls. 27 Furthermore, results suggest that the news media frequently err by treating a poll’s crossing of an arbitrary line as a news event, even when that line, such as the 50 percent line that some researchers find important in predicting upcoming elections, was crossed by an amount well within the poll’s margin of sampling error.
This study found that the press had an especially difficult time accurately reporting “record” poll results. When the three newspapers studied reported such results, the supposed new record could be explained by sampling error alone more than 95 percent of the time. Journalists failed to account for that fact in nearly two-thirds of their reports. Moreover, post hoc analysis indicated that so-called “records” fell further within the margin of sampling error than the other poll reporting problems examined. In other words, the differences between the new record and the old record tended to be quite small, often as little as a 1-point difference, and well within the poll’s margin of sampling error.
The problems seem to stem from the fact that journalists tend to approach the result of a poll as a hard number, with even the slightest bit of movement deemed notable. Poll articles too often present numbers with a false degree of precision, even when methodological problems abound. Consider a March 12, 2012, article in the Washington Post under the headline, “As gasoline prices rise, president’s ratings fall”:
The negative movement has also stalled what had been a gradual increase since the fall in the president’s overall approval rating. In the new poll, 46 percent approve of the way Obama is handling his job; 50 percent disapprove. That’s a mirror image of his 50 to 46 positive split in early February. 28
First, the article portrays the 4-point change in approval ratings as a “negative” movement, despite the fact that it is well within twice the poll’s 3.5-point margin of sampling error. Second, it portrays the 4-point difference between approval and disapproval as significant. Finally, it notes as somehow newsworthy that the current poll is a “mirror image” of a previous poll. However, in this comparison, the margin of sampling error should be applied to both the approval and disapproval ratings in each poll. Consider that the “true” rating in the population could have held steady at 48 percent approval and 48 percent disapproval, with the different results of the two polls being explained by sampling error alone. Although it would not make for nearly as interesting a story, these results could easily be interpreted as saying that President Obama’s approval ratings remained essentially steady. Interestingly, the poll numbers would essentially “flip-flop” again, 50-45, in the next ABC/Washington Post poll on April 8, 2012, and those results still fell within the polls’ margins of sampling error. 29
Rather than treat poll results as hard numbers, journalists would be well advised to confidently present them with a certain degree of fuzziness. It is one thing to say, “In a recent Gallup poll, 51 percent of respondents said they approved of the job President Obama is doing.” It’s quite another to say, “A majority of Americans approve of the job President Obama is doing.” A poll result of 51 percent, with a 4-point margin of sampling error, should be treated in every way as a result of 47-55. Doing so would avoid a majority of the problems identified in this study. For example, few journalists would look at two polls, one with a 47-55 and another with a 45-53 and conclude decisively that one poll was ahead. Yet, they too often do when the polls are 51 versus 49.
When it comes to reporting poll numbers, journalists need to become comfortable with the imprecision of statistical data. In such a world,
A 51-49 “lead” in a poll becomes “essentially tied.”
A 51-49 “drop” in a poll becomes “holding relatively steady.”
A “record high” of 51, beating out the previous high of 49, becomes “among the highest ratings in this poll.”
Arbitrary lines, such as the “50 percent mark,” are almost never crossed in a single poll. Instead, results are “near the 50 percent line” until the difference between the poll and that arbitrary line exceeds the poll’s margin of sampling error.
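The rewording suggested in the list above can be sketched as a pair of small helpers. The function names and the fallback wording ("up," "down," "above," "below") are hypothetical choices made for illustration; only the hedged phrases in quotation marks come from the recommendations above.

```python
def describe_change(current, previous, margin):
    """Word a change between two results from the same poll."""
    if abs(current - previous) <= 2 * margin:
        return "holding relatively steady"
    return "up" if current > previous else "down"


def describe_line(result, line, margin):
    """Word a poll result relative to an arbitrary line such as 50."""
    if abs(result - line) <= margin:
        return f"near the {line} percent line"
    side = "above" if result > line else "below"
    return f"{side} the {line} percent line"


# A 51-to-49 "drop" with a 4-point margin of sampling error.
print(describe_change(49, 51, 4))  # holding relatively steady

# A result of 51 against the 50 percent mark, 4-point margin.
print(describe_line(51, 50, 4))  # near the 50 percent line

# A result of 55 clears the line by more than the margin.
print(describe_line(55, 50, 4))  # above the 50 percent line
```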
Journalists should continue to exercise caution even when a result cannot be explained by sampling error alone. First, one must recall other forms of error, such as question wording or other methodological concerns that can bias a poll. Second, sampling error is calculated at the 95 percent confidence level. That means that in one out of every 20 cases, one expects that a result might simply be a matter of chance.
Polls remain a powerful tool for journalists to gauge the opinion, understanding and will of the people. Journalists must approach the interpretation and representation of such polls with the same care and responsibility they aspire to in presenting other forms of information to the public.
Footnotes
Editors’ Note
This article was accepted for publication under the editorship of Sandra H. Utt and Elinor Kelley Grusin.
