Review: Urban Social Listening: Potential and Pitfalls for Using Microblogging Data in Studying Cities by Justin B. Hollander, Erin Graves, Henry Renski, Cara Foster-Karim, Andrew Wiley, and Dibyendu Das

Abstract

With Facebook undergoing extensive public scrutiny, Urban Social Listening: Potential and Pitfalls for Using Microblogging Data in Studying Cities by Justin B. Hollander, Erin Graves, Henry Renski, Cara Foster-Karim, Andrew Wiley, and Dibyendu Das is a timely must-read for planning researchers. Since Facebookgate, the scandal related to Cambridge Analytica’s access to the personal data of millions of Facebook users, the public has become aware of the potential value to researchers and companies of analyzing big data from Twitter, Facebook, Instagram, or Snapchat. The book specifically aims at serving as a “comprehensive methodological guide on how to collect, process, analyze, and interpret Twitter data and offers [one of] the first frank and honest assessment of the strengths and weaknesses of Twitter data” (p. 2). According to the authors, planning academics and professionals are only beginning to understand how people perceive their urban environment and have so far overlooked the potential that “social listening can revolutionize the way that social scientists study cities” (p. 4).

The book is divided into six chapters and demonstrates the potentials and the challenges of using microblogging data through three illustrative empirical studies: first, a sentiment analysis of Twitter users in New Bedford, Massachusetts (chapter 3); second, a comparison of the sentiments of Portuguese-speaking and English-speaking microbloggers in five gateway cities (chapter 4); and third, a comparison of American Housing Survey (AHS) and sentiment analysis results in eight study cities (chapter 5).

The first chapter (“Introduction”) provides the reader with a background on the psychological theory of microblogging data. Big data analysis centers on happiness, that is, subjective well-being (SWB), which, according to the authors, consists of two distinct parts: life satisfaction and affect. Life satisfaction describes someone’s assessment of his or her life as a whole, while affect refers to positive and negative emotions resulting from daily occurring events. The authors briefly discuss theoretical concepts, including the big five individual personality traits, that is, openness, conscientiousness, extraversion, agreeableness, and neuroticism. The authors also discuss hedonic balance, which is the key to affect, and the mediator-moderator model, which they describe as combining “the mediation of personality traits and their effect on life satisfaction indirectly via hedonic balance with the moderating influence of culture” (p. 7). The authors’ survey of some of the relevant psychological literature allows the reader to understand that big data analysis diverges significantly from what is taught in most quantitative and qualitative planning methods courses.

Leaning on these theoretical concepts, the remainder of the first chapter then focuses on sentiment analysis (or opinion mining). Sentiment analysis determines affect by analyzing people’s positive and negative attitudes toward products and services, among other things. In other words, sentiment analysis is the process of acquiring written texts with the purpose of evaluating the textual body to determine the sentiments expressed by its writers. For instance, tweets can be used to measure the SWB of their writers in terms of pleasant and unpleasant affect in order to better understand how people feel about the places where they live. In what is referred to as a lexicon-based approach, each tweet is evaluated with reference to a dictionary containing thousands of sentiment-expressing words, each of which is classified as either positive or negative. Each sentiment-expressing word identified in a tweet is then weighted by its associated score, ranging from, for instance, +3 (amazing) to −3 (extremely poor), and the weighted scores are summed across all tweets. Calculating such a summary score allows tweets to be used to assess city dwellers’ attitudes toward urban places. Overall, the theoretical concepts are well presented in this introductory chapter, but a more detailed discussion of validity is needed. Specifically, the notion of construct validity, that is, whether the chosen dictionary is an appropriate vehicle for measuring people’s sentiments, warrants a more detailed discussion.
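To make the lexicon-based approach concrete, the following is a minimal sketch of how such a summary score could be computed. It is not the authors' Urban Attitudes software: the six-word lexicon, the function names, and the sample tweets are all hypothetical, and a real dictionary such as AFINN contains thousands of scored words.

```python
import re

# Hypothetical mini-lexicon for illustration only; scores follow the
# +3 to -3 range used as an example in the text above.
LEXICON = {"amazing": 3, "love": 2, "good": 1, "bad": -1, "hate": -2, "awful": -3}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_tweet(text, lexicon=LEXICON):
    """Sum the lexicon scores of every sentiment-expressing word in one tweet."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def summary_score(tweets, lexicon=LEXICON):
    """Sum the per-tweet scores across a collection of tweets."""
    return sum(score_tweet(tweet, lexicon) for tweet in tweets)


# Made-up sample tweets (not data from the book).
tweets = [
    "I love the new park downtown, amazing views",
    "Traffic on Main Street is awful",
    "The bus was not good today",
]

scores = [score_tweet(tweet) for tweet in tweets]
total = summary_score(tweets)
print(scores, total)
```

Note that the third sample tweet receives a positive score even though "not good" is a complaint. A purely word-level lookup ignores negation, which is one reason the question of construct validity deserves attention.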

Chapter 2, “A (Short) History of Social Media Sentiment Analysis,” provides the academic literature review. The authors refer to research using digital data in the fields of psychology, prediction of election outcomes and other events, understanding multicultural communities, and public health planning, among others. Emerging planning-related applications of sentiment analyses include city- and neighborhood-level investigations into questions such as the relationships between particular places and the mood of tweets in response to citywide events or nearby landmarks. Specifically, for planning-relevant sentiment analysis, the geocoding of tweets becomes crucial, a challenging task considering that not all users enable Twitter’s location-tracking option. The authors conclude the chapter by emphasizing the potential uses of microblogging big data, including gauging attitudes about development impacts, service provision, and effective governance, or predicting local referenda and elections.

Chapter 3 (“Taking Microblogging Data for a Test Drive”) presents an interesting approach to validating the authors’ own custom software (Urban Attitudes), their methodological framework, and their analytical strategies. Using a small sample of geocoded tweets from New Bedford, Massachusetts, the authors used the Urban Attitudes software to organize the data and run all tweets through the sentiment dictionary AFINN (named after developer Finn Årup Nielsen). Next, the authors undertook a cross-comparison of the software’s findings of positive and negative sentiments with the results obtained from the sentiment analysis module of IBM’s SPSS Modeler, which uses its own dictionary. While the SPSS Modeler uses an entire tweet as a unit of analysis, Urban Attitudes uses individual words. Nevertheless, the findings with respect to positive and negative sentiments are similar. Thus, the authors conclude that Urban Attitudes is a valid tool. Finally, the authors present a list of twenty-four frequently occurring urban expressions (e.g., school, children, safety, vacant lot, zoning, health), assembled from the minutes of planning and administrative meetings, that they use as the building blocks for urban social listening in studying cities.

Chapter 4 (“A Close Look at Urban Immigrant Communities”) presents a case study of the well-being of Portuguese-speaking communities in five selected areas in Massachusetts based on the previously described sentiment analysis of microblogging data. The well-being of Portuguese-speaking residents is compared to that of English-speaking residents by analyzing tweets written in Portuguese and in English for each of the five study areas. Using more than one study area adds validity and allows for applying a two-sample t test to assess statistical significance. The authors compare their sentiment analysis findings to demographic information from the US Bureau of the Census and conduct a Spearman’s rho test. In short, the authors find significant differences between Portuguese- and English-speaking residents for some, but not all, of the study areas at the 5 percent level of significance. In addition, the analysis shows statistically significant correlations between the sentiment variables and some of the demographic variables. The latter finding indicates that sentiment analysis can indeed be a valuable supplement to traditional analysis based on demographic data. However, the authors could have done more to explain how planners and policy makers can specifically utilize the presented information for planning purposes. In other words, the presented analysis is purely descriptive in nature. Indeed, some readers may wonder about the consequences and implications of local planners’ knowing important patterns, trends, and relationships of these two selected linguistic communities.
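For readers less familiar with the two tests mentioned above, here is a minimal standard-library sketch of a two-sample t statistic and Spearman's rho. Several things are assumptions on our part: the review does not say which variant of the t test the book uses, so Welch's unequal-variance form is shown; all numbers are made up rather than taken from the book; and the function names are hypothetical. The sketch computes only the test statistics. In practice a library routine such as SciPy's `ttest_ind` or `spearmanr` would typically be used to obtain p-values as well.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic in Welch's form (unequal variances assumed)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return numerator / denominator


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranked values."""
    return pearson(ranks(x), ranks(y))


# Made-up per-area sentiment scores for two language groups.
portuguese = [0.2, 0.5, 0.1, 0.4, 0.3]
english = [0.6, 0.7, 0.5, 0.9, 0.8]
t_stat = welch_t(portuguese, english)

# Made-up demographic variable (e.g., income in thousands) for the same areas.
income = [30, 55, 25, 40, 45]
rho = spearman_rho(portuguese, income)
print(t_stat, rho)
```

Spearman's rho works on ranks rather than raw values, which makes it a natural choice when a sentiment score and a demographic variable are measured on very different scales.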

Chapter 5 (“A National Comparison: Twitter versus the American Housing Survey”) presents a second case study of applied sentiment analysis, looking at how residents perceive neighborhood quality of life with respect to population changes between 1970 and 2010 for eight selected cities. The question the authors try to answer is whether residents of cities with declining populations have a lower evaluation of life satisfaction than residents of growing cities. The authors compare results from the sentiment analysis with average neighborhood satisfaction results from the AHS, where residents rate their neighborhoods on a scale from 1 to 10. The comparison suggests that there is no significant correlation between population change and residents’ SWB as measured by either the AHS or tweets. Furthermore, there is no statistically significant correlation when ranking the cities based on AHS and tweet results. The authors partly explain these findings by the fact that the tweets and the AHS measure two different concepts of SWB, namely, affect and life satisfaction. The authors provide an interesting discussion of the strengths and weaknesses of the sentiment analysis. First, social media are used more frequently by younger people and therefore have limits in terms of their usefulness for evaluating SWB for a broader range of urban residents. In addition, language-based challenges to accurately measuring sentiment are still common, as current software dictionaries usually do not cope well with slang, emotions, negations, and sarcasm.

Finally, chapter 6 (“Conclusions”) provides a deep discussion of the limitations of sentiment analysis covered in previous chapters, with some recommendations for future research. Indeed, we wholeheartedly agree with the authors that “tweets can be considered a complement to existing, imperfect measures of attitudes and opinions, such as surveys and interviews” (p. 78).

In summary, the book offers a good introductory discussion of sentiment analysis and demonstrates how to conduct one. The concept of validity is explicitly addressed in the applied studies by comparing the results from the five gateway cities to the results of three specifically selected cities in chapter 4 or by cross-comparison with AHS survey results in chapter 5. The authors demonstrate the potentials and pitfalls of microblogging data in studying cities, as the book’s title promises. However, we would have appreciated a more thorough discussion of how the findings and outcomes of the case studies are of specific value to city planners and policy makers. The book is well structured, its language is easy to comprehend, and the research questions of each chapter are well presented. We recommend the book to advanced undergraduate and graduate students as well as practitioners interested in analyzing big data.