Abstract

Reviewed by: Marynia Kolak, Arizona State University, USA
DOI: 10.1177/2399808317711986
Regardless of planning or disciplinary intention, “data science is happening” (Scott, 2014). Companies that approached millisecond-level time series data with engineering and statistical prowess are at the forefront of understanding human behavior, though they may not frame their quest in the same way as a researcher preparing a journal article. While the private sector may capitalize on predicting user behaviors, possibilities also beckon in the public sector and in academic fields. How can new data and data science approaches identify vulnerable populations for local governments to support? Can we better understand the complex, interconnected relationships between social and environmental dimensions of cities? This momentum is challenged by the need for better training and collaboration across disciplines for approaching these new problems (Cleveland, 2001; Provost and Fawcett, 2013; Schutt and O’Neil, 2013).
Big Data and Social Science: A Practical Guide to Methods and Tools, a new text co-edited by Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, and Julia Lane, seeks to address this gap (Foster et al., 2016). It delivers a primer on several data science methods and computational framing for decision-driven social science questions. The goals of the text are twofold: to communicate the elements and importance of big data analytics to social scientists, and to extend existing social science toolkits with these increasingly generalizable techniques from the computer sciences. Its sections on data curation, methods, inference, and ethics follow a case study of linking research investment and innovation, grounding the discourse in practice. Issues of data quality and confidentiality are addressed at the outset and then throughout the text, anticipating common critiques of the field.
Data science has a complex origin story, in that multiple disciplines and sectors call it their own brainchild. The paradigm of Big Data and Social Science is heavily computational, with data science explicitly introduced as a discipline of the computational sciences. Machine learning dominates the methods section, though most quantitative social scientists will already be familiar with the regressions used as types of classification methods. Framing problems into research designs likewise follows this computational paradigm, which may benefit the strict machine learner more than the social scientist. This text reveals the need for new types of collaborations that uncover not only insight from big data but also the data-generating and underlying processes driving the phenomena under study.
Overall, Big Data and Social Science delivers an invaluable resource for the new social scientist toolbox. The necessarily hands-on approach provides data scraping tutorials, guides on using and programming with APIs, a review of SQL and NoSQL concepts, and a primer on new big data technologies like Hadoop and Spark. Python sits at the core of these exercises to allow for flexibility and scalability, empowering the social scientist to match elegant theory with computational punch. For those making the shift, the included tutorials will ease the climb up the learning curve, though some chapters assume a greater level of programming expertise than others. Highlights of the text can be found in selected chapters offering compact frameworks and methods, like the database and machine learning chapters. A taxonomy of information visualization techniques by task provides a meaningful link to earlier chapters. A discussion of privacy and the importance of providing access highlights the complexity of developing new kinds of knowledge infrastructures. A new error framework for a big data setting by Biemer delivers an exceptional overview for understanding error and inference in these new computational settings. The chapter on (social) network analysis provides a thoughtful overview of fundamentals and applications. This chapter works best as a follow-up to the data management section, where a network data set is developed from web scraping and matching algorithms for data linkage.
Applications of network analysis in complex systems research, driven by testing of differing social science theories, could further enrich this discussion. The emerging field of computational social science intersects these topics in increasingly urgent ways (Conte et al., 2012; Gilbert, 2010; Lazer et al., 2009). Methods for related forms of network analysis that have long traditions and data science applications, like regional science, would likewise offer relevant connections to drive research forward. With the abundance of sensor and (both social and spatial) network data flooding the datascape, tying new types of data with social science questions could bring forward new understandings of relationships and interactions between individuals and populations. Spatial concepts are largely missing from the text, except for brief inclusions in a discussion on databases, and later in a chapter on visualizations. A spatial perspective framing social science questions necessitates not only a spatial data and/or systems infrastructure but also additional spatial-data-specific tests, diagnostics, and methodology that challenge traditional statistics as well as computational processes. Still, the text is not meant to be all-encompassing but rather an initial, practical guide. How emerging methods advance insight into big data and underlying social processes is for researchers and analysts to determine over the next decade.
While “data science is happening,” defining it at this moment of confluence may not be feasible as it morphs in different, interesting ways across multiple applications. What will big data analytics and social science ultimately forge from the diverse fields they grow from? This text doesn’t seek to answer that question, but it provides a handful of the tools and approaches needed to march into the frontiers of data science, computational statistics, and quantitative social science. It also highlights the need for more collaborations across disciplines as exciting new opportunities in datascapes bring researchers together in new ways.
