Abstract
The widespread use of web searches in daily life has allowed researchers to study people's online social and psychological behavior. Using web search data has advantages in terms of data objectivity, ecological validity, temporal resolution, and unique application value. This review integrates existing studies on web search data that have explored topics including sexual behavior, suicidal behavior, mental health, social prejudice, social inequality, public responses to policies, and other psychosocial issues. These studies are categorized as descriptive, correlational, inferential, predictive, and policy evaluation research. The integration of theory-based hypothesis testing in future web search research will result in even stronger contributions to social psychology.
Introduction
In this review, we examine the reasons why people engage in web searches and how their attention to or interest in a topic can be deduced from their search behaviors. We review social psychology research that has used big data on web search queries and identify several key research areas in this field. The advantages and limitations of web search research are discussed, as well as future directions and implications of big data research focusing on web search queries.
The Nature of Web Searches
Web searches in everyday life
Conducting web searches is one of the most frequent behaviors on the Internet. The Pew Research Center has reported that nearly 90 percent of Americans use search engines to obtain information, 6 and 77 percent use search engines to search health-related information. 7 In addition to searches for health information, the content of people's web searches may be related to work, study, entertainment, consumption and other aspects related to their daily life. For example, Spink et al. 7 analyzed search terms from multiple search engines and found that the types of searches included (from highest frequency to lowest frequency) the following: (a) entertainment; (b) sex, pornography, and sexual preferences; (c) commerce, travel, employment, and economy; (d) computers and Internet; (e) health and the sciences; (f) people, places, and things; (g) society, culture, ethnicity, and religion; (h) education; (i) performing and fine arts; (j) government; and (k) unknown and incomprehensible.
Motives for web search behaviors
Using search engines to find information in a networked environment is a form of information search, or information-seeking behavior. Several theoretical perspectives explain this behavior. (a) From an evolutionary perspective, search behaviors helped early humans find suitable mates and food. 8 Thus, the search for information meets certain basic needs, promotes adaptability, and improves human survival and reproduction. (b) Another explanation treats information as something humans need for finding answers, reducing uncertainty, and sensemaking. 9 (c) In the field of mass communication, web search is viewed as a media-based consumer behavior. For instance, Play Theory tends to view information search as a form of entertainment rather than as a tool for accessing real information. 9 (d) From the educational perspective, web searches help to answer learning queries that can be divided into lookup searches, learning searches, and investigative searches. 10 These four theoretical explanations account for important motives for web search behaviors.
Advantages of research on web search data
Web search data allow tracking of the intensity of a population's attention to specific topics in real time. Compared to traditional research methods such as experiments and surveys, web search data have the following advantages:
High objectivity and ecological validity
Web search data provide information about a large user population; hence, compared to other research methods, they provide higher objectivity and ecological validity. This is exemplified by one of the biggest search engines, Google, whose search volume reached 1.2 trillion in 2012, including data covering 146 languages across many countries. 11 In addition, users are not disturbed by others' opinions when they are searching the web. Furthermore, research evidence has revealed that objective online data may yield findings that differ from those of traditional questionnaires. For example, Wojcik et al. 12 found that political liberals were less happy than political conservatives based on a self-reported subjective well-being measure, but happier than political conservatives based on social media data. Hence, web search data provide high objectivity and ecological validity in a nonlaboratory setting.
High temporal resolution
Web search engines produce both a search record and a timestamp of the user's search content in response to the user's search instructions, hence providing an economical and efficient way of obtaining accurate data. For example, Google Trends is able to provide both web search data and marked timestamps since 2004, while traditional large-scale social surveys take at least a few years to conduct (e.g., the General Social Survey takes 10 years each time it is conducted).
High application value for public policy
As web search data register the user's geographical location, the web search engine provider is able to collate a comprehensive set of web search data from a specific region. Hence, web search data can provide up-to-date information for research addressing city, state, or national policy concerns. In comparison with traditional research methods that gather information at the individual level, research using web search data allows group-level analysis that is of great importance and high reference value when it comes to policy making.
Social Psychology Research Based on Web Search Data
Since the use of Google's search data to significantly predict influenza in regions of the United States, 13 web search data have been used for research on politics,14,15 economics,16,17 public health,18,19 the environment, 20 and other areas. Web search data are favored by researchers in the social psychological fields as they provide information on the behaviors of a large population. Currently, web search research mainly focuses on common social and psychological behaviors such as online sexual behavior, suicidal behavior, mental health, social prejudice and inequality, and responses to public policies.
Online sexual behavior
Online sexual behavior is a hot topic in web search research, as the topic of sex is the second most frequently searched topic on the web.21–24 Web search research on sex-related topics can provide new evidence to support and challenge previous theories and common wisdom. For example, past studies revealed that births, sexually transmitted infections, condom sales, and abortions occurred most frequently during winter and early summer, and recent research found a similar seasonal pattern in web search activity related to pornography, prostitution, and mate-seeking. 25 Another example concerns evolutionary psychology's Challenge Hypothesis, which argues that individuals who win in competition have an increased level of testosterone and interest in sexual behavior compared to individuals who lose. 26 Big data research allows the Challenge Hypothesis to be studied in people who are only indirectly involved in the competition, such as voters in political elections 27 and basketball and soccer fans. 28 Markey and Markey 29 used Google Trends data to examine the U.S. population's sex-related searches following the 2004, 2006, and 2008 political elections, controlling for sex-related searches in the weeks before the elections. Their findings revealed that states in which most individuals voted for the winning political party had significantly higher sex-related searches than states in which most individuals voted for the losing political party. The result validated the “Challenge Hypothesis” at the group level in a nonlaboratory setting.
In addition, big data research on sex-related topics has revealed surprising findings that challenge common wisdom. There is thought to be a close relationship between religiosity and conservatism in the United States, but there is also evidence from big data research that contradicts this assumption. MacInnis and Hodson 30 used Google search data to examine associations among states' conservatism, religiosity, and sex-related web searches (website and picture searches related to sex). Counterintuitively, this study showed that in states with higher conservatism and religiosity, there was higher attention toward and interest in sex-related content on the Internet. Based on psychoanalytic theory, MacInnis and Hodson 30 posited that this counterintuitive behavior is due to the suppression of basic human sexual urges, including nontraditional sexual urges. Another study focused on searching, viewing, and commenting on rape-related pornographic videos as examples of gendered microaggression. 31 The results revealed higher interest in many southern states (Alabama, Arkansas, Florida, Kentucky, Louisiana, Mississippi, Missouri, Tennessee, and Texas) for dominance- and rape-oriented pornography compared to other American regions, even though these areas are considered to be politically conservative and religious.
Therefore, big data are beneficial for social psychology research on sexuality, as they allow researchers to examine exogenous variables, implicit behavior, and cultural differences that are hard to investigate using traditional measurement methods.
Suicidal behavior and mental health
Web search data allow researchers to look at suicidal behavior and mental health issues in large populations and identify search terms that predict mental health problems. For example, McCarthy 32 found age differences in how suicide-related searches from 2004 to 2007 in the United States were correlated with the Centers for Disease Control and Prevention's statistics on suicide and self-injury. Suicide-related search activity was positively correlated with intentional self-injury and suicide death among adolescents, but negatively correlated with suicide death in the general population. Furthermore, Hagihara et al. 33 studied the relationship between monthly suicide-related searches and suicide death from 2004 to 2010. Their findings showed that among adults 20–29 years old, the Google Search terms “hydrogen sulfide,” “hydrogen sulfide suicide,” and “suicide hydrogen sulfide” predicted suicide deaths 11 months in advance; among adults aged 30–39, the Bulletin Board System on suicide predicted suicide deaths five months in advance and Google searches on “suicide by jumping” predicted suicide deaths six months in advance. Similarly, Yang et al. 34 studied suicide deaths in Taipei in relation to web search trends (37 search terms used from 2004 to 2009) covering five categories: psychiatric, medical, familial, and socioeconomic factors, and pro-suicide terms. Their results were similar to those of Hagihara et al. 33 Specifically, “major depression” and “divorce” search terms accounted for 30.2 percent of suicide deaths. Gunn and Lester 35 also found a correlation between suicide-related search terms such as “commit suicide,” “how to suicide,” and “suicide prevention” and suicide deaths in each of the 50 states in the United States in 2009.
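The lead-lag findings above (search activity predicting deaths several months in advance) can be illustrated with a lagged correlation. The following is a minimal sketch on synthetic data, not the method of any study cited here; the function name `lagged_correlation`, the 84-month window, and the built-in 3-month lag are all assumptions made for illustration.

```python
import numpy as np

def lagged_correlation(search, outcome, max_lag):
    """Pearson r between search volume at month t and the outcome at month t + lag."""
    search = np.asarray(search, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    if search.shape != outcome.shape:
        raise ValueError("series must have the same length")
    results = {}
    for lag in range(max_lag + 1):
        x = search[: len(search) - lag]  # earlier search activity
        y = outcome[lag:]                # later outcome, shifted by `lag` months
        results[lag] = float(np.corrcoef(x, y)[0, 1])
    return results

# Synthetic illustration only (no real data): the outcome echoes
# search activity three months later, plus a little noise.
rng = np.random.default_rng(0)
search = rng.normal(size=84)  # 84 monthly values, a hypothetical window
outcome = np.roll(search, 3) + 0.1 * rng.normal(size=84)

corrs = lagged_correlation(search, outcome, max_lag=12)
best_lag = max(corrs, key=corrs.get)  # lag with the strongest positive correlation
```

A strong correlation at a given lag shows only that the two series move together with that delay. Published analyses typically also address autocorrelation, seasonality, and multiple comparisons across lags, which this sketch omits.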
In addition, research has found that people typically fall into depressed states during the winter, but big data research has provided a global perspective on the seasonal pattern of people's emotional changes and the factors that influence depression. Yang et al. 36 examined depression-related search trends across 54 geographic locations worldwide from 2004 to 2009. Their results revealed that depression fluctuations in the northern and southern hemispheres were opposite, and the volume of searches related to depression was negatively correlated with latitude-dependent temperature. Similarly, Tefft 37 analyzed Google searches on “depression” and “anxiety” in the United States and found that the unemployment rate had a significant positive relationship with depression-related search activity, and initial unemployment insurance claims had a significant negative relationship with both depression- and anxiety-related searches.
Other research has also shown that web searches related to psychological distress (PD) might be linked to economic indicators. Ayers et al. 38 showed that web searches related to nonspecific PD increased after the United States' Great Recession ended in 2010. PD searches had strong associations with unemployment, underemployment, mortgage delinquencies, and foreclosures. In the six months before the beginning of the Great Recession in 2007, a 1 percent increase in mortgage delinquencies and foreclosures was associated with a 16 percent increase in PD searches in the following month and an 11 percent increase four months later. The unemployment rate also predicted the population's PD searches. The evidence also suggested that PD searches continued to predict economic indicators after the end of the Great Recession in 2010. Similarly, Askitas and Zimmermann 39 studied web search terms related to mental, emotional, and physical discomfort to look at the impact of the 2008 Financial and Economic Crisis on G8 countries, especially the United States and Germany. Search terms such as “symptoms,” which suggests self-diagnosis, and “side effects,” which suggests treatment, co-occurred with the crisis encountered in G8 countries.
Therefore, big data psychology research is effective in predicting suicidal behavior and other mental health problems, which are crucial societal issues that need to be addressed.
Social prejudice and inequality
Researchers have made use of the advantages of web search data to conduct research on socially sensitive topics such as social prejudice, academic dishonesty, and social inequality. One benefit is that web search data can show people's true feelings and interests, in contrast with questionnaires, which can be unreliable as people tend not to fully express socially unacceptable attitudes. Stephens-Davidowitz 40 used Google web search data to investigate implicit behavior related to negative feelings toward Blacks in the United States. The researcher collected data on search volumes related to the terms “nigger” and “niggers” (commonly interpreted to be offensive terms) to measure the amount of racial animus in the United States from 2004 to 2007, particularly in relationship to the share of votes received by the Black presidential candidate Barack Obama. After controlling for vote shares received by the former Democratic presidential candidate, John Kerry, racial animus was a strong predictor of Obama's vote share and was estimated to cost him 4 percent of the national popular vote in the 2008 and 2012 presidential elections. To test the validity of the results, web search data were compared with survey-based studies and a strong correlation was found between Google search data in 45 states and the General Social Survey. 41
Besides social prejudice, academic dishonesty is also condemned as socially unacceptable across various cultures. Neville 42 used Google search data to look at the relationships among academic dishonesty, income inequality, and generalized trust. Google search terms related to “term papers,” three cheating sites (Cheathouse, Essaytown, and AcaDemon), and four correlated queries on “free research paper” were used to measure academic dishonesty in the United States. After controlling for contextual variables like average income and number of colleges per state, there was a positive correlation between academic dishonesty and income inequality, and a negative correlation between academic dishonesty and generalized trust; in addition, generalized trust mediated the relationship between income inequality and academic dishonesty. People in states with higher economic inequality showed lower levels of generalized trust, which in turn appeared to be associated with a greater prevalence of academic dishonesty.
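The mediation pattern described above (income inequality relating to academic dishonesty through generalized trust) can be sketched as a product-of-coefficients decomposition with ordinary least squares. This is an illustrative sketch on synthetic data, not Neville's analysis: the helper names `ols` and `simple_mediation`, the effect sizes, and the omission of covariates and significance tests are all assumptions.

```python
import numpy as np

def ols(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simple_mediation(x, m, y):
    """Decompose the total effect of x on y into direct and indirect (a * b) parts."""
    a = ols(m, x)[1]             # path a: x -> mediator
    total = ols(y, x)[1]         # total effect: x -> y, ignoring the mediator
    _, direct, b = ols(y, x, m)  # direct effect of x and path b: mediator -> y
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}

# Synthetic illustration only: one hypothetical row per U.S. state.
rng = np.random.default_rng(1)
n = 50
inequality = rng.normal(size=n)
trust = -0.6 * inequality + rng.normal(scale=0.5, size=n)  # assumed effect size
dishonesty = -0.7 * trust + rng.normal(scale=0.5, size=n)  # assumed effect size

effects = simple_mediation(inequality, trust, dishonesty)
```

In a linear model fitted on the same sample, the total effect equals the direct effect plus the indirect effect. In practice, the indirect effect is usually tested with a bootstrap confidence interval, and contextual covariates such as those mentioned above would be added to each regression.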
Research has shown that social inequality leads people to engage in social comparisons and show a greater concern for social status. 43 Walasek and Brown 44 used the social-rank hypothesis to examine whether people in more unequal regions are more likely to engage in status-seeking behaviors such as acquiring status goods. After controlling for income and other socioeconomic factors, they identified 40 frequently used search terms related to income inequality in the United States. Their results showed that in states with greater income inequality, people displayed greater desire for positional goods and conducted more searches related to positional and status goods, which supports the social-rank hypothesis. Other research in 99 nations found a larger volume of searches on status goods in countries with higher income inequality. 45
In view of the abovementioned research, big data have provided an opportunity for researchers to investigate socially sensitive and socially unacceptable topics that are hard to investigate using measurement methods that cannot provide anonymity.
Effects of and public responses to public policy
Web search data are useful to explore responses to health-related information and public policies in the general population. Zhang et al. 46 examined seasonal patterns from 2004 to 2014 in search volumes related to “tobacco” and “lung cancer” in the United States, Canada, the United Kingdom, Australia, and China. Their findings revealed that seasonal fluctuations affected search volumes for both tobacco and lung cancer, especially the latter. Similarly, Frijters et al. 47 looked at the impact of macroeconomic conditions on health-related behaviors and outcomes using alcoholism-related Google search activity in the United States from 2004 to 2011. After controlling for state and time effects, it was found that with an increase of 5 percent in unemployment, there was a 15 percent increase in alcoholism-related search activity.
Researchers have also used web search data to assess the effectiveness of and public responses to tobacco and abortion policies. For example, Huang et al. 48 used data from Baidu Index and Google Trends to look at Internet users' responses to smoking bans in indoor public places. They used the search terms “Smoking Ban(s),” “Quit Smoking/Stop Smoking/Smoking Cessation,” and “Electronic Cigarette(s)” from 2009 to 2011 and found that all search terms were positively correlated with media coverage on smoking bans. This research shows that smoking-related online searches can serve as a novel alternative analytic tool for monitoring and evaluating tobacco use. In another study, Reis and Brownstein 49 examined 50 U.S. states' and 37 countries' abortion policies and search patterns related to abortion. Their findings revealed a positive correlation between local restrictions on abortion and abortion-related searches, which were in turn negatively correlated with local abortion rates both in the United States and internationally. The findings implied that women in states or countries with policies restricting abortion have limited access to abortion services and hence seek abortions outside their area. 50 These studies demonstrate that web search data provide a convenient and economical method for studying the impact of health policies. 49
Summary
Studies on psychosocial issues that have made use of web search data can be classified into five types, namely descriptive research, correlational research, inferential research, predictive research, and policy evaluation research. Descriptive studies use web search data to look at people's attention and interest toward a topic over time and in response to environmental change.31,35 Correlational research looks at the association between web search behavior related to a particular topic and other behaviors, especially offline behaviors.34,38 Inferential research, usually testing a certain causal hypothesis, uses web search data as psychological and behavioral variables that are examined in relationship to other variables.29,40,42,44 Predictive research looks at web search behavior to predict important behaviors (e.g., suicide) in the population.32,33 Policy evaluation research uses web search data from a wide sample as indicators of public responses toward a specific policy across and within regions.48,49
Limitations of Web Search Research
Despite the advantages of web search data in social psychology research, this approach has limitations, including sampling bias, threats to reliability and validity, problems of prediction and replication, and a restricted level of analysis.
Sampling bias
Web search data are often assumed to represent specific groups' attention or interest during a particular period of time, but differences may exist between web search users and the general population. For example, the Pew Research Center reported that high-income and highly educated young people are more frequent search engine users compared to other groups in the United States. 6 Hence, it is inappropriate to interpret web search data as a representation of the psychological behavior of all members of the population.
In addition, because Google Search is the most widely known search engine globally, and Google Trends, Google Correlate, and other Google platforms provide researchers with data, most existing web search research draws its data only from these platforms. While Google data cover a very large sample of users, there are other influential search engines in the global market such as Baidu, Yahoo, and Bing. Therefore, sampling bias may exist when researchers analyze data from only one search engine. For instance, Google Search data are not a good representation of China's population because Google is not used frequently in that country. Nevertheless, sampling bias may be reduced in the future as the Internet becomes more widely used and more closely integrated with daily life.
Reliability and validity
Although analyzing key search terms helps to compress a large amount of data, the choice of key search terms can be quite arbitrary. 14 There are no standardized guidelines on how best to operationalize indicators, and most researchers have chosen key search terms based on personal judgment and convenience. For example, Yang et al. 36 used only the search term “depression” to measure depression, a term that is not comprehensive. To address this problem, many studies have either used closely related search terms as measurement variables or established a set of key search terms by screening terms and refining the selection process. Other studies have assessed reliability and validity by statistically analyzing key search terms while they are being collated. For instance, Neville 42 created a composite measure of academic dishonesty by calculating the mean search activity of nine search terms and attained good reliability (α = 0.88). Therefore, future research should use both conceptual and empirical methods in the selection of key search terms, synthesize measures of web search behavior, and conduct tests of reliability and validity to ensure that the measurement method is scientifically sound.
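The reliability check mentioned above (a composite of several search terms with a reported α) can be sketched with the standard Cronbach's alpha formula, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). This is a minimal sketch on synthetic data; the function names, the nine-column layout, and the noise level are assumptions for illustration and are not taken from the cited study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; rows are regions (e.g., states), columns are search terms."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each term
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

def composite_score(items):
    """Composite measure: mean search activity across terms for each region."""
    return np.asarray(items, dtype=float).mean(axis=1)

# Synthetic illustration only: nine hypothetical search-term indices for
# 50 regions, each reflecting one shared latent tendency plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(50, 1))
term_volumes = latent + 0.8 * rng.normal(size=(50, 9))

alpha = cronbach_alpha(term_volumes)
scores = composite_score(term_volumes)
```

If the search indices are on different scales, standardizing each column before averaging is a common choice. A high alpha indicates that the terms covary, not that they measure the intended construct, so validity still needs to be checked separately.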
Replication crisis and the issue of prediction
Web search data are beneficial for conducting predictive research as the data encompass the behavior of a large-scale population, but the replicability of the studies and the robustness of the predictions may be undermined by several factors. For example, overestimation of an influenza epidemic by Google Flu Trends may have been a result of users' overreaction to information in the media, change of users' search behavior, overfitting of data models, and changes in search system algorithms.51–53 The overestimation may also have been due to the disclosure of a few sample search terms by the New York Times, leading to a sharp increase in the use of those search terms and subsequently a negative impact on the model's predictive value. 54 Most importantly, it is thought that there will likely continue to be similar problems with the predictive value and robustness of big data prediction research, also known as “traps in big data analysis”. 52
Level of analysis
Due to the anonymous nature of web search data, it is generally not feasible to link online search activities with users' individual characteristics. As shown above, analyses are mainly limited to the group level, with the unit of analysis ranging from a school to a city, a state or province, a nation, or even a continent. Whether correlational findings can be extended to the individual level deserves further investigation. For example, it has been revealed that in U.S. states with higher conservatism and religiosity, there is higher interest in sex-related content on the Internet. 30 Whether individuals with higher conservatism and religiosity would show a similar pattern needs to be addressed by research at the individual level using traditional questionnaire and laboratory-based approaches.
Future Directions in Web Search Research
Lazer et al. 52 argued that researchers should not fall into the traps in predictive research using web searches, but should instead pay more attention to small details and unexplored domains of big data. Future research on big data should make use of data acquisition time and geographical location to obtain access to other advantages and explore research areas that are difficult to investigate using traditional measurement methods. For instance, web search behavior is driven by intrinsic motivation and is relatively independent of others' behaviors, a phenomenon that has unique application value in understanding socially sensitive and socially unacceptable topics.
Last, future researchers can further integrate social psychology theories into studies using big data to enhance the research's theoretical value. For example, web search data have been used to test the “Social-Rank hypothesis” and “Challenge hypothesis” at the group level, effectively integrating the technical advantages of big data, a focus on relevant psychosocial issues, and theory-based hypothesis testing. Such studies are especially valuable because they have the ability to describe, explain, and predict social psychological phenomena. Therefore, future research using big data to assess web searches can focus on areas that are relevant for testing social psychology theories at the group level.
Social psychology research based on web search data complements traditional research approaches by opening a new avenue for studying sensitive issues in large populations. It captures important aspects of online activities, which are now part of daily life in many parts of the modern world. This approach allows researchers to better understand human social behaviors and emotions by monitoring the allocation of social attention in the online context. It extends previous social psychological research at the individual level to an understanding of behavior at the level of groups of various sizes and complexity, which deepens our understanding of social behavior and group dynamics in the era of the Internet.
Acknowledgments
This research is supported by the MOE Tier 2 (MOE2016-T2-1-105) to R.Y., the Key Program of the National Natural Science Foundation of China (Grant No. 71532005, No. 71731004), and National Social Science Foundation of China (Grant No. 15BSH034) to Guangdong Key Laboratory for Big Data Analysis and Simulation of Public Opinion.
Author Disclosure Statement
No competing financial interests exist.
