Abstract
The ability of modern web services such as news aggregators and search engines to tailor their results to the tastes of individuals, together with people's preference for reading opinions which reinforce their own viewpoints, has raised concerns that people are nowadays exposed to a narrow range of viewpoints, a phenomenon referred to as the “filter bubble”. In this paper we focus on increasing exposure to varied political opinions with the goal of improving civil discourse. We develop a method to algorithmically encourage people to read diverse political opinions and test it when people actively seek information. First, analyzing data from a popular search engine, we show that people are indeed more likely to read opinions consistent with their own. Interestingly, they are more likely to read news from opposing sites when the language model of a particular news item is close to the language model of their own political leaning. Based on this finding, we describe a method for assisting people to read divergent opinions by choosing documents of opposing viewpoints that have a language model closer to their own language model. We test our method on a number of web searchers and show that pages of the opposing side whose similarity to the person's own language model was above average tended to be clicked 38% more than those below average. We also describe the long-term effects of our method, showing that people who were shown more diverse results continued reading more diverse results and overall became more interested in news.
Introduction
Modern web services, including search engines, increasingly tailor their results to individuals, attempting to match information to the perceived preferences of each person according to their characteristics and past behavior. While personalization has many benefits, concerns have also been raised that personalization may result in exposure to a narrower range of viewpoints, a phenomenon referred to as the “filter bubble” (Pariser, 2011). This kind of selective exposure to information has been blamed for a narrowing of the political viewpoint of people and the fragmentation of political discourse in the United States (Garrett & Resnick, 2011). Some evidence, however, suggests that polarization of online news consumption did not increase between 2004 and 2009 (Gentzkow & Shapiro, 2011).
Long before the Internet, social scientists developed the selective exposure theory to describe observations that people seek information that affirms their viewpoint and avoid information that challenges it (Frey, 1986; Mutz & Martin, 2001; Sears & Freedman, 1967). Thus, although the filter bubble has been discussed mostly in the context of website personalization, the tendency of people to read information that supports their own viewpoint, rather than diverse opinions on a topic, has been broadly observed in a variety of decision-making contexts. We term the filter bubble created when a person actively selects (or avoids) reading material the self-imposed filter bubble.
Exposure to differing viewpoints has been shown to be socially advantageous in several ways: First, experimentation has shown that when people discuss a topic solely with like-minded people (the so-called echo chamber), they embrace more polarized views of the topic (Stinchcombe, 2010). Selective exposure may also alter the political engagement process, affecting when voters make their decisions and their levels of participation over time (Dilliplane, 2011). Second, exposure to diverse viewpoints increases tolerance for people with other opinions (Garrett & Resnick, 2011). Finally, diversity allows people to understand the amount of support that their opinion commands relative to other opinions (Stinchcombe, 2010). Given the benefits of reading broader views, Garrett and Resnick (2011) have argued that technology could be used to expose people to a broader variety of perspectives, for example, by modifying the display of information to nudge people toward becoming “open-minded deliberators.”
In this article, we first quantify the (self-imposed) filter bubble and find that people are, indeed, less likely to read views that are different from their own. We then attempt to enhance “civil discourse” by modifying search engine results in a way that introduces diverse opinions. We find that a document of the opposing view is more likely to be clicked by a searcher if the document has a language model consistent with their own views. This suggests an automated method for broadening the viewpoints that people read and reducing the filter bubble effect.
Literature Review
Several researchers have attempted to quantify the political leaning (or polarization) of news articles and news outlets. Milyo and Groseclose (2005) introduced the Slant Quotient scores, which assign a political leaning score to politicians and news outlets based on which political think tanks they cite. This method requires significant manual work to collate these citations. Gentzkow and Shapiro (2011) used phrases derived from transcripts of U.S. Congressional and Senate debates to rank news outlets by their mention of these phrases.
More recently, the availability of large-scale behavioral data related to activity on the web has enabled the development of new ways for measuring political leaning. Zhou, Resnick, and Mei (2011) classified the political leaning of news articles and people by building a bipartite graph of people and news stories, and propagating known labels of news outlets and people across the graph. The propagation methods in Zhou, Resnick, and Mei resulted in extremely high prediction accuracy (surpassing that reached using analysis of the text of articles), indicating that people tend to read opinions similar to their own. Similar methods were utilized (Borra & Weber, 2012; Weber, Garimella, & Borra, 2012) to identify the political charge of search queries based on the fraction of the time that a query led to clicks on political blogs of known political leanings.
A similar finding, namely that people tend to read news that reflects opinions similar to their own, was obtained from studies based on more traditional media outlets (DellaVigna & Kaplan, 2007). This study also analyzed the effect that the introduction of Fox News, a news channel with clear political leanings, had on voting in presidential elections from 1996 to 2000. The authors found a shift of 3–8% of public opinion toward the opinions represented by the news channel. However, other data suggest that the evidence for a causal link between media bias and changes in attitudes is mixed (Prior, 2013).
A number of attempts have been made to nudge people into reading more diverse political opinions (Munson & Resnick, 2010; Munson, Lee, & Resnick, 2013). These studies classified blogs or news articles by their political leaning and evaluated different ways of presenting them in order to encourage exposure to a diversity of opinions. For example, Munson and Resnick (2010) investigated the effect of diversification in a list of blog posts presented to participants in a controlled experiment, and found that people differ in their preference for diverse opinions. More recently, Munson, Lee, and Resnick (2013) provided people with feedback as to how much (on average) their reading was biased toward one or another political opinion and found that such feedback had only a small effect on nudging people to read more diverse opinions. Kriplean, Morgan, Freelon, Borning, and Bennett (2012) developed a system for people to explicitly construct and share pro/con lists for a political election in Washington State. They found that more than half of the participants listed both pros and cons, but their opinions did not change much after using the system, in part because they could not evaluate the trustworthiness of the claims. Nevertheless, social cues, such as endorsement by friends and colleagues, can help increase the diversity of readership, as demonstrated in an experiment by Messing and Westwood (2012).
Park, Kang, Chung, and Song (2009) showed participants in a laboratory study differing opinions on news topics and found that people who were shown these opinions tended to read more opinions and reported satisfaction with the additional information. In a similar laboratory setting, Oh, Lee, and Kim (2009) found that people preferred search results that were clearly delineated as to their leaning. Finally, Liao and Fu (2013) presented people with news articles on controversial topics that were clearly marked as pro or con, and found some increased reading of opposing viewpoints for topics that participants were not very involved in, especially when a threat was present. However, it is unclear whether these findings carry over to larger populations during routine news reading or searching behavior.
Retrospective Analysis
Method
We extracted all queries issued to the Bing search engine during July 2012 from the United States, which led to a click on a news outlet site. News outlets were identified as those sites that included the words “news,” “post,” or “times” in their URL or were part of a list of manually curated websites, for example, WSJ, Economist, USA Today, and Huffington Post. For each query, we recorded its date and time, the query text, the sites that were displayed as results and the ones that people clicked, and the zip code from which the query was issued.
Work described previously has shown that people tend to read political opinions that agree with their own. Therefore, we propagated the voting patterns of people to news outlets in order to estimate the political leaning of each news outlet: Let z_i be the zip code from which the ith query was issued. We assign a political leaning score s_i to the ith query according to the fraction of the vote for the Democratic Party presidential candidate in 2008 (Barack Obama) in z_i. According to this score, a query with s_i = 1.00 means that 100% of people in the zip code from which the query was posted voted for Barack Obama. Each news site URL (j) was scored by the (simple) average of the scores of all queries which led to a click on the news site.
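The outlet-scoring step described above can be illustrated with a minimal Python sketch. The function name `score_outlets`, the input layout (one `(zip_code, outlet)` pair per clicked query), and the choice to skip queries from zip codes without vote data are assumptions made for illustration; they are not taken from the study's actual pipeline.

```python
from collections import defaultdict


def score_outlets(clicks, obama_vote_share):
    """Score each outlet by the simple average of its query scores.

    clicks: iterable of (zip_code, outlet) pairs, one per query that led
        to a click on a news outlet (hypothetical input layout).
    obama_vote_share: dict mapping zip_code to the fraction of the 2008
        vote for Barack Obama in that zip code; this fraction is the
        query score s_i.
    Returns a dict mapping outlet to its average query score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zip_code, outlet in clicks:
        if zip_code not in obama_vote_share:
            continue  # assumption: drop queries whose zip code has no vote data
        totals[outlet] += obama_vote_share[zip_code]
        counts[outlet] += 1
    return {outlet: totals[outlet] / counts[outlet] for outlet in totals}
```

Under this sketch, an outlet clicked only from zip codes that voted heavily for Obama receives a score near 1, and an outlet clicked only from zip codes that voted heavily against him receives a score near 0.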
We extracted a sample of 179,195 people who clicked at least three pages from news sites during July 2012. Each page was scored according to its website
We refer to people as Republican (Democratic) if they click, on average, on pages that are below (above) the average score of news outlets. We partition news outlets into three categories, labeling a Republican-leaning outlet as one for which
Results
Estimating the Political Leaning of News Outlets
We identified 568 news outlets that received at least 10 clicks from searchers during July. For these outlets, the average of
As a validation, we conducted the same process for queries made during January 2013. The correlation between
Browsing Patterns in Retrospective Data
People tended to click on pages from news outlets with scores close to their own: The absolute difference
Moreover, as Table 1 shows, when examining the most polarized news outlets, those with the highest and lowest 20% scores, and similarly the people with the highest and lowest 20% average scores, we observe that 81% (76%) of Republicans (Democrats) click on items from one of the 20% most polarized outlets. However, these people click on the most polarized outlets of the other side only 4% (6%) of the time. Thus, the 20% most polarized readers are much more likely to read their own point of view, and rarely that of the other side, suggesting that the phenomenon of a filter bubble (at least in the sense of what people choose to read) does indeed exist. Interestingly, people with “centrist” scores are also unlikely to read the more polarized outlets, doing so, on average, approximately only 10% of the time.
Table 1. Percentage of People Who Read News From the 20% Most Democrat-Leaning and 20% Most Republican-Leaning Outlets.
Note. People percentile 1 denotes the 20% of people with the lowest (most Republican-leaning) scores, and people percentile 5 denotes the 20% of people with the highest (Democrat-leaning) scores.
The average cosine similarity between all pairs of pages is 0.201. The average similarity of pages, partitioned by readership (and weighted by it), is shown in Table 2. When people read pages from a news outlet consistent with their leaning, similarity scores are much higher (0.333 and 0.344 vs. 0.201). In addition, when they read pages from a news outlet which is not of their political leaning, those pages are more similar in their language to pages in news outlets of their political leaning than would be expected by chance (0.255 and 0.294 vs. 0.201).
Table 2. Average Cosine Similarity of Pages by Readership.
Note. Differences are statistically significant (sign test for columns, rank sum for rows, p < 10⁻³).
Anecdotally, the articles most likely to be read by Republicans in Democratic-leaning media are from the Wall Street Journal, LA Times, and some NY Times blogs. For Democrats, the articles most likely to be read in Republican-leaning media are from Fox News.
Encouraging People to Read Views of the Opposing Side
Using the insights of the previous sections, we modified the search engine results page for a small set of divisive queries, to test whether it is possible to encourage people to read views of the opposing side. To do this, we first identified a set of topics that are divisive for the U.S. public. We did this by extracting all the queries leading to any Wikipedia page defined in the U.S. politics pages of 2009–2012. We then scored the queries as to their political leaning, by the same method used for scoring media outlets, for example, each query is scored by the average of person who posted the query
Table 3. Queries With the Highest and Lowest Leaning Score, Grouped by Subject Matter.
During the first 10 days of February 2013, we modified the results for the queries in Table 3 for a subset of people who issued them. During this time period, a total of 73 people issued 118 queries on the topics shown in the table. For the subset of people in the treatment group, the search results were modified. The results for all others who posted the same queries were not changed, and these served as the control group for the study. For people in the treatment group, we modified the search result for a query by interleaving the results in the search engine results page with the results of similar queries from the opposing side. For example, if the query “obamacare” was posted by someone in the treatment group, they would be given the results for the query they issued in the first, third, fifth, and so on, ranks on the search results page, and the results for a query from the opposing side (e.g., “affordable care act summary”) in the second, fourth, sixth, and so on, ranks. If the same query was submitted by someone in the control group, they would see only the results for “obamacare.”
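The interleaving of results can be sketched in a few lines of Python. The function name `interleave_results` and the `page_size` cutoff are hypothetical, and the sketch assumes that both result lists are already ranked and that duplicate URLs across the two lists need no special handling, which the description above does not specify.

```python
def interleave_results(own, opposing, page_size=10):
    """Place the searcher's own results at ranks 1, 3, 5, ... and the
    results of the opposing-side query at ranks 2, 4, 6, ...

    own, opposing: ranked lists of result URLs (hypothetical inputs).
    page_size: number of results kept on the page (an assumed cutoff).
    Interleaving stops when the shorter list runs out.
    """
    merged = []
    for own_url, opposing_url in zip(own, opposing):
        merged.append(own_url)       # odd rank: result for the issued query
        merged.append(opposing_url)  # even rank: result for the opposing query
    return merged[:page_size]
```

For instance, calling the function with the ranked results for “obamacare” as `own` and those for “affordable care act summary” as `opposing` would yield a page that alternates between the two lists, starting with the searcher's own top result.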
On average, search result pages that were modified were 13% more likely to be clicked (and were 8% more likely to have at least one click). Figure 1 shows the probability of a person clicking on a result as a function of its rank, for queries whose results were modified and for those whose results were unmodified. As this figure shows, the even rankings in the modified results had a lower probability of clicks than the unmodified results. This is to be expected, since these results contained information of the opposing view. Interestingly, rankings 3, 5, and 7 of modified queries were clicked with a higher probability than the unmodified ones, possibly because people were compensating for the less (politically) relevant even-ranked results.

Figure 1. Probability of clicks as a function of rank on the search engine results page, partitioned by whether results were modified or not.
Although the even-ranked modified results received, on average, fewer clicks than the unmodified ones, this was not uniformly so. We developed a model to predict if a given page would be clicked based on its rank in the search engine results page, whether the results page was modified, and the cosine similarity of the results page to the language model of the person issuing the query (as determined by the query they issued).
The language model for Democratic- and Republican-leaning people was computed in the following manner: The 23k most frequently read pages (as described in the Methods section) were partitioned into Democratic- and Republican-leaning according to the score of their news outlet. A language model for each view (Democratic or Republican) was then computed by taking the average vector-space model of these pages.
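A minimal sketch of such a language model and of the cosine similarity used to compare a page with it is given below. The term weighting is an assumption: the text does not state which vector-space weighting was used, so this sketch uses plain relative term frequencies with whitespace tokenization. All function names are hypothetical.

```python
import math
from collections import Counter


def page_vector(text):
    """Represent a page as relative term frequencies (assumed weighting)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def language_model(pages):
    """Average the vector-space representations of a set of pages."""
    centroid = Counter()
    for text in pages:
        for term, weight in page_vector(text).items():
            centroid[term] += weight
    return {term: weight / len(pages) for term, weight in centroid.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)
```

In use, one model would be built from the pages of Democratic-leaning outlets and another from the pages of Republican-leaning outlets, and a candidate page of the opposing side would be compared with the model matching the searcher's own leaning.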
We trained a decision-tree model using the above-mentioned attributes, first excluding the similarity of results pages and then including it. We used 5-fold cross-validation in training and testing. The classifier that used the rank of the page in the results page and whether it was a modified result or not, but not the similarity of pages, obtained an area under the Receiver Operating Characteristic curve (AUC; Duda, Hart, & Stork, 2001) of 0.7875, while the classifier that also included the similarity to the user model reached an AUC of 0.8068 (statistically significantly higher, p < 10⁻⁵; Hanley & McNeil, 1982). Indeed, results pages of the opposing viewpoint which had a similarity higher than the average tended to be clicked 38% more than those below the average. Moreover, there is a small positive correlation (Spearman ρ = .14, p < 10⁻⁵) between the page similarity and the difference between the actual and predicted values, when the latter are computed using all the attributes except page similarity. The Spearman correlation between rank and similarity was negligible. Therefore, we conclude that pages of the opposing side were less likely to be clicked by people, but this could be mitigated by choosing pages that were similar in language to their own language. This validates our findings from the retrospective study, as discussed previously.
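For readers unfamiliar with the AUC, the following sketch computes it directly from its pairwise definition: the probability that a randomly chosen clicked result receives a higher predicted score than a randomly chosen unclicked one, with ties counted as one half. The function name and the 0/1 label encoding are assumptions for illustration, and this is not the evaluation code used in the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a clicked result, 0 for an unclicked one (assumed encoding).
    scores: predicted click scores from a classifier, in the same order.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one clicked and one unclicked example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # clicked result ranked above unclicked one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.7875 and 0.8068.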
Long-Term Effects
We compared the long-term behavior of two populations: one consisting of people whose results pages were modified (for at least one query) by including results of queries popular in zip codes of opposing opinion, and another consisting of people who issued the same queries but whose results were unmodified. To do this, we compared the clicks they made to news outlets in the 2 weeks before and after the treatment date.
Four parameters were measured: First, the average absolute difference of
Each person in the (larger) control group was matched to a single person in the treatment group. This was done by finding, for each person in the treatment group, a person in the control group who, before the treatment period, had a similar distance from the average page score and who viewed a similar number of pages in that period.
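One way to carry out such matching is greedy nearest-neighbor matching without replacement, sketched below. The distance function, which adds the gap in pre-treatment distance from the average page score to the relative gap in pages viewed, is an assumption; the text does not specify how the two criteria were combined. The function name and input layout are likewise hypothetical.

```python
def match_controls(treated, controls):
    """Greedily pair each treated person with the most similar unused control.

    treated, controls: dicts mapping a person id to a tuple
        (distance_from_average_score, pages_viewed), both measured
        before the treatment period (hypothetical input layout).
    Returns a dict mapping each treated id to its matched control id.
    """
    available = dict(controls)
    matches = {}
    for t_id, (t_dist, t_pages) in treated.items():
        if not available:
            break  # more treated people than controls; stop matching

        def gap(c_id):
            c_dist, c_pages = available[c_id]
            # assumed distance: score gap plus relative gap in pages viewed
            return abs(t_dist - c_dist) + abs(t_pages - c_pages) / max(t_pages, c_pages, 1)

        best = min(available, key=gap)
        matches[t_id] = best
        del available[best]  # each control is used at most once
    return matches
```

Because the matching is greedy, the result can depend on the order in which treated people are processed; an optimal assignment method would remove that dependence at the cost of more computation.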
The results are shown in Table 4. As this table shows, the control population showed a negligible change in average distance to center. The treated population exhibited a change of 25% toward the center, indicating that after their results were modified, they read more content from the opposing side, compared to their behavior before the experiment.
Table 4. Long-Term Effects of Changing Search Results.
Note. This table shows the average distance of
Table 4 shows the average distance of
Interestingly, we also observe a change in the number of news-related queries and news articles accessed per day, rising 9% and 4%, respectively, in the treated population, compared to a negligible decrease in the control population. Finally, our results also show a small decrease in the number of news sites read by people who saw the modified results compared to those who did not. Taken together, these results seem to indicate that people who were exposed to both views on a topic read more opposing views and became more interested in news.
Discussion
People’s tendency to only read opinions consistent with their own, also known as the filter bubble, has been blamed for a narrowing of the political viewpoint of people and the fragmentation of political discourse in the United States (Garrett & Resnick, 2011). Most previous attempts to overcome this tendency were made by displaying content tagged according to its leaning or by showing people how biased their reading behavior tends to be.
Our findings show that people are indeed more likely to read opinions consistent with their own: While 76–81% of people read pages from highly polarized news outlets consistent with their own opinions, only 4–6% read pages from similar opposing sites. However, when they read pages from opposing sites, they are more likely to do so when the language model of a particular news item is close to their own language model.
In this article, we describe a new method for assisting people to read divergent opinions. We showed that when the language model of a document is closer to an individual's language model, the document has a higher chance of being read despite describing an opposing viewpoint.
There are several limitations in our study, both in the retrospective analysis and in the intervention: First, our scoring of news outlets according to voting patterns hinges on the assumption that people tend to read opinions close to their voting patterns. This assumption was validated by our finding that people do indeed tend to read news channels with a score closer to theirs than would be expected by chance. Moreover, the correlation of our scoring method with previously proposed methods indicates its accuracy.
In our intervention study, we modified search engine results. Thus, we consider only information-seeking activities where a person actively seeks information (via search) which might be divisive. It may be that similar methods can also be utilized by news aggregators (e.g., www.bing.com/news) to support more diverse news browsing, but further work is required in order to validate this hypothesis. Another drawback of our method is that it currently relies on matching queries on the same topics from opposing views. Automating this stage may be easy in some cases, for example, “death panel” versus “death panels,” where simple textual similarity may suffice. However, other cases (e.g., “affordable health care” versus “obamacare”) are much more difficult and require additional research.
Finally, we have investigated a relatively small population for only 2 weeks after the intervention. We plan to test whether the intervention has long-lasting effects, and to do so in a larger population of individuals.
Our work has implications for other areas where opposing views on specific topics exist. One area we plan to investigate is that of anorexia. It is well known that some content sought by individuals (referred to as “Thinspiration” content) can have detrimental effects on sufferers of anorexia (Harper, Sperry, & Thompson, 2008). Banning such content or showing simple manifestations of opposing views (Yom-Tov, Fernandez-Luque, Weber, & Crain, 2012) has been shown to be ineffective. Our hope is that, by utilizing the methods described in this article, anorexia sufferers may be nudged into reading recovery-oriented content, with positive outcomes.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
