Abstract
While “scaling up” is a lively topic in network science and Big Data analysis today, my purpose in this essay is to articulate an alternative problem, that of “scaling down,” which I believe will also require increased attention in coming years. “Scaling down” is the problem of how macro-level features of Big Data affect, shape, and evoke lower-level features and processes. I identify four aspects of this problem: the extent to which findings from studies of Facebook and other Big-Data platforms apply to human behavior at the scale of church suppers and department politics where we spend much of our lives; the extent to which the mathematics of scaling might be consistent with behavioral principles; the move beyond a “universal” theory of networks to the study of variation within and between networks; and how a large social field, including its history and culture, shapes the typical representations, interactions, and strategies at local levels in a text or social network.
Network science research in the computational, social, and biological sciences is increasingly focused on datasets of thousands and even millions of nodes and comparably massive sets of connections among them—for example, in gene interaction networks or social media datasets. Well over a decade ago my colleagues and I, falling in step with many other researchers, began asking, “How well do the different analytical techniques and algorithms ‘scale up’ to large networks … ?” (Breiger et al., 2003: 5). Traditional concepts of network centrality, for example, and attendant shortest-path and betweenness metrics, are often impractical to compute for large-scale networks, even on very fast computers. More fundamentally, the phenomenology of taking account of all possible links, which is what these metrics do, may well be appropriate for a small face-to-face group or for several dozen trading partners, but inappropriate for the structuring and operation of networks at very large scale. Much of the success of Big Data science has consisted of formulating for large datasets algorithms that are more efficient and appropriate, and that “scale up” only linearly with the number of nodes and edges in a graph (Palmer et al., 2003).
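To make the cost contrast concrete, here is a minimal Python sketch; it is a generic illustration, not the method of Palmer et al. (2003). Computing the exact mean shortest-path length needs one breadth-first search (BFS) from every node, roughly O(n(n + m)) for n nodes and m edges, whereas estimating it from a fixed number k of randomly sampled source nodes costs O(k(n + m)), which grows only linearly with the graph. The function names and the adjacency format (a dict mapping each node to a list of its neighbors) are assumptions for illustration.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Unweighted single-source shortest paths by breadth-first search: O(n + m)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_distance(adj, sources):
    """Mean shortest-path length from the given sources to every node they reach."""
    total, count = 0, 0
    for s in sources:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                count += 1
    return total / count if count else 0.0


def exact_mean_distance(adj):
    # Every node is a source: n BFS runs, O(n * (n + m)) overall.
    return mean_distance(adj, list(adj))


def sampled_mean_distance(adj, k, seed=0):
    # Only k random sources: O(k * (n + m)), linear in graph size for fixed k.
    rng = random.Random(seed)
    return mean_distance(adj, rng.sample(list(adj), min(k, len(adj))))


# A 1,000-node ring: each node is linked to its two neighbors.
ring = {i: [(i - 1) % 1000, (i + 1) % 1000] for i in range(1000)}
print(exact_mean_distance(ring), sampled_mean_distance(ring, k=10))
```

On a ring every node occupies an equivalent position, so ten sampled sources recover the exact value (250000/999, about 250.25); on irregular networks the sampled figure is only an estimate whose precision depends on k.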
While “scaling up” is a lively topic in network science and Big Data analysis today, my purpose in this essay is to articulate an alternative problem, that of “scaling down,” which I believe will also require increased attention in coming years. “Scaling down” is the problem of how macro-level features of Big Data affect, shape, and evoke lower-level features and processes.
A premise of a great deal of network science and Big-Data analysis of online behavior is that “the web sees everything and forgets nothing” (Golder and Macy, 2014: 132). Large-scale studies of Internet behavior often treat such data as, in this sense, an unmediated record of social interaction, and it is not at all rare for authors of such studies to claim, from the analysis of millions of Facebook posts, findings about human behavior that are said to be “in contrast to prevailing assumptions” in social science, such as Festinger’s (1954) social comparison theory, which was formulated from research on small human groups (Kramer et al., 2014: 8788, 8790).
As I envision it, the alternative problem of “scaling down” addresses four often-interrelated features of Big Data and network science research that are routinely ignored or given insufficient attention, to the detriment of research progress. First, whereas many studies have examined massively large systems such as social networking sites, an under-researched question is the extent to which the behavioral findings of these studies “scale down,” i.e., apply to human groups and organizations of moderate size (dozens or hundreds), where most human social life takes place and is likely to continue to take place. This is the question of the extent to which Big-Data research applies to human behavior at the human scale of church suppers and department politics in which we spend much of our lives. Second, what are the behavioral processes that lead to macro-level outcomes? The research community has produced stunningly impressive and workable mathematical models of how processes at lower levels (among individuals, say) might cumulate to high-level complexity (e.g. Lusher et al., 2013; Morris, 2003), or how bags of words from multiple topics might spill together to form texts (Blei, 2012). However, precious little attention has been paid to formulating micro-processes that reflect actual behavior. Big Data research has no analog to behavioral economics, the study of when and why actors follow or depart from the postulated model (Thaler, 1994). Third, network science and Big Data research often see themselves as scaling “up” to generalizations that are freed from the shackles of particular texts, and to findings that apply universally to “all” networks, whether power grids, gene interactions, or Facebook friending. Scaling “down” would recognize the possibility that the bigger the dataset for a particular research question, the greater the opportunities to search for variation within the case and to contextualize its features in a way that leads to a distinctive form of case-based generalization (George and Bennett, 2005). Fourth, “scaling down” refers to the problem of how a large social field, including its history and culture, shapes the typical representations, interactions, and strategies at local levels in a text or social network.
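The topic-model example in the second point can be made concrete. Below is a minimal sketch of the generative story behind latent Dirichlet allocation, the family of models surveyed by Blei (2012): each document draws its own mixture of topics, and each word is produced by first picking a topic from that mixture and then a word from that topic. The toy topics, the symmetric Dirichlet parameter `alpha`, and the function names are assumptions for illustration; the sketch only generates text and does not fit a model to data.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one bag of words under the LDA generative story.

    `topics` is a list of word distributions, each a dict {word: probability}.
    """
    theta = sample_dirichlet(alpha, len(topics), rng)  # this document's topic mixture
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])  # pick a word from it
    return words


# Two hypothetical toy topics.
topics = [
    {"church": 0.5, "supper": 0.5},
    {"department": 0.5, "politics": 0.5},
]
print(generate_document(topics, alpha=0.5, length=10, rng=random.Random(1)))
```

Nothing in the sketch refers to who writes a document or why; the mechanism is postulated rather than derived from observed behavior, which is the gap the second point identifies.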
In brief: (a) the degree of applicability of Big Data research to small- and moderate-sized social groups, (b) the study of when actors behave as if the mathematical mechanisms postulated to generate Big Data were true, (c) the relative utility of binning Big Data into local contexts, and (d) the production of local action from macro-level processes are all problems in “scaling down.” I will say a bit more in turn about each of the four aspects of “scaling down” that I have identified.
Scope conditions. Festinger’s (1954) work on social comparison processes pertains to “peer groups” or “primary groups” on the order of 10¹ members. Kramer et al. (2014) studied 6.89 × 10⁵ Facebook users by manipulating the emotional expressions in the news feed each user received.
People who received positive emotion updates expressed positive emotional reactions, not the negative reactions predicted by Festinger’s social comparison theory or by Turkle’s (2011) more contemporary consideration of how technology affects social life. Kramer et al.’s assertion that their finding is “in contrast to prevailing assumptions” of social science is buttressed, in the authors’ opinion, by the ability of Big Data to detect even an effect as small as the one they found in this instance (2014: 8790). It is not clear to this reader, however, why results obtained on Facebook should be expected to scale down and apply in the same way to peer groups. We live on the Internet, but we also and simultaneously live in very small “primary groups” of the sort Festinger was addressing. My own guess is that the difference lies in the “strong” culture pervasive in peer groups versus the “weak” culture of online friending (Schultz and Breiger, 2010). I also do not see why “the task of the researcher is to see online behavior as social behavior, the kind that might occur in any field site, be it a remote village, a law office, or a high school cafeteria” (Golder and Macy, 2014: 113). To be sure, online behavior is a very distinctive and increasingly prominent form of social behavior, and tools for analyzing Big Data open up the study of online behavior in ways that are both innovative and exciting for sociology. Nonetheless, Festinger (1950: 278) himself reviewed experimental studies showing that behavior in small face-to-face groups such as remote villages, (many) law offices, and school cafeterias differed along multiple dimensions from more distanced forms of communication. Several contemporary leaders in Big Data analysis have coined the term “Big Data hubris” to refer to the often implicit assumption that Big Data is a substitute for, rather than a supplement to, traditional data collection and analysis (Lazer et al., 2014).
A behavioral model? Snijders et al. (2012) take note of the many ingenious mathematical models that have been devised for Big Data analysis. Their main point, however, is the suggestion that “instead of trying to find micro-processes that lead to certain aggregate network properties based on mathematical tractability, one could follow a different analytical strategy and try to come up with micro-processes that match with actual behavior. And this is exactly where social and behavioral research can play a role.” Yotam Shmargad’s (2014) research on tie strength and social media use moves decisively in this direction. Shmargad analyzed the database of a social media company that began charging its users to receive broadcasts in their email address books. Before the monetization, users’ address books were automatically updated each time one of the people they were connected to changed their contact information. Afterwards, users had to buy the company’s $60 premium bundle to continue receiving the updated information. Treating this change as a natural experiment, Shmargad compared how purchase rates varied with properties of users’ networks. Among the key findings: while people value receiving information from their strong ties, they also highly value receiving information from ties that are structurally diverse, for example ties that connect the focal individual to regions of the social network to which she would otherwise not be connected. This study therefore provides a useful exemplar of how a search for micro-processes that match users’ behavior can lead to improved, rigorous quantitative modeling of social media networks.
Big Data, situated networks. McNely (2012) claims (in my paraphrase) that “the paradox of Big Data” is that we now collect so much data that the challenge is no longer only quantitative. The paradox “suggests the inverse: we need more situated, contextualized, qualitative studies of communication practices in an age of Big Data, not less” (p. 28, original emphasis). In contrast to recent trends of scaling up communication infrastructures, McNely argues that the future of communication design “must address the challenges of scaling down, of delivering Big Data in contextual, meaningful, localized forms” (p. 28, original emphasis). One way to exploit this paradox is to treat a Big Data corpus, such as the set of published US National Security Strategy statements analyzed by Mohr et al. (2013), as a single “case” or type of discourse that manifests internal variation. Using natural language processing techniques, Mohr et al. (2013: 678–686) show how actors (“America,” “Yeltsin,” “Allies,” “Weapons of Mass Destruction”) and their relational trajectories can be identified and distilled from the corpus. How these consequential trajectories change over time is closely tied to a form of within-case generalization that is usually associated with the situated study of qualitative data (Goertz and Mahoney, 2012: 10–11, 87–99).
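A minimal sketch of within-case tabulation, assuming that actors and their relations have already been extracted from each document as (year, subject, verb, object) triples. Mohr et al. (2013) performed that extraction with natural language processing; the sketch does not reproduce their pipeline and only shows how, once such triples exist, each actor’s relational profile could be tracked from one year to the next. The function names and the toy triples, which use placeholder actor names, are hypothetical.

```python
from collections import Counter, defaultdict


def actor_trajectories(triples):
    """Return {actor: {year: Counter({(relation, partner): count})}}.

    `triples` are (year, subject, verb, object) tuples assumed to come from an
    upstream parser; no language processing happens here.
    """
    traj = defaultdict(lambda: defaultdict(Counter))
    for year, subj, verb, obj in triples:
        traj[subj][year][(verb, obj)] += 1            # link seen from the subject's side
        traj[obj][year][(verb + " (by)", subj)] += 1  # same link seen from the object's side
    return traj


def partner_shift(traj, actor, year_a, year_b):
    """List the partners that `actor` gains or loses between two years."""
    def partners(year):
        return {partner for (_, partner) in traj[actor].get(year, {})}

    before, after = partners(year_a), partners(year_b)
    return {"gained": sorted(after - before), "lost": sorted(before - after)}


# Toy triples with placeholder actors; not data from Mohr et al. (2013).
triples = [
    (1990, "A", "cooperates with", "B"),
    (1990, "A", "opposes", "C"),
    (2002, "A", "opposes", "D"),
    (2002, "B", "supports", "A"),
]
traj = actor_trajectories(triples)
print(partner_shift(traj, "A", 1990, 2002))  # {'gained': ['D'], 'lost': ['C']}
```

Comparing profiles across years in this way treats the whole corpus as one case and looks for variation inside it, rather than pooling all years into a single network.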
Fields, cultural templates, and automorphic equivalence. Not all networks emerge from processes on nodes and edges. This statement departs from the program of Big Data analysis, yet recognizing it could perhaps motivate progress in the analysis of large-scale data. In many kinds of human networks, group style, defined as recurrent patterns of interaction that express a social group’s shared ideas about what constitutes good or adequate participation in group settings, provides stylized templates or ideals for interpersonal interaction (Eliasoph and Lichterman, 2003). In what sense do such cultural templates provide the impetus for predictable interpersonal linkages at the micro-level (Pachucki and Breiger, 2010), and in this way “scale down”? A somewhat similar question can be posed in molecular biology, where certain patterns, termed network motifs, occur more frequently than by chance. Kashtan et al. (2004) define families of motifs such that motifs within a family share a common general theme, defined by roles that the researchers identify using automorphic equivalence techniques developed in sociology for the study of social roles. Networks that share a common motif can have very different generalizations of that motif.
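Automorphic equivalence can be illustrated with a brute-force sketch: two nodes are automorphically equivalent when some relabeling of the graph that preserves every tie (an automorphism) maps one onto the other, so they occupy the same role even if they share no neighbors. Enumerating all n! permutations is feasible only for very small graphs, so this is a teaching sketch, not the procedure used by Kashtan et al. (2004); the function names and the example graph are hypothetical.

```python
from itertools import permutations


def automorphisms(nodes, edges):
    """Yield every relabeling of `nodes` that maps the undirected edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        # A bijection that sends every edge to an edge preserves the whole edge set.
        if all(frozenset((mapping[u], mapping[v])) in edge_set for u, v in edges):
            yield mapping


def automorphic_classes(nodes, edges):
    """Group nodes into orbits: u ~ v if some automorphism sends u to v."""
    orbit = {n: {n} for n in nodes}
    for mapping in automorphisms(nodes, edges):
        for u, v in mapping.items():
            orbit[u].add(v)
    classes = {frozenset(s) for s in orbit.values()}
    return sorted((sorted(c) for c in classes), key=lambda c: c[0])


# Two linked hubs, each with two leaves of its own.
nodes = ["a", "b", "c", "d", "h1", "h2"]
edges = [("h1", "h2"), ("h1", "a"), ("h1", "b"), ("h2", "c"), ("h2", "d")]
print(automorphic_classes(nodes, edges))  # [['a', 'b', 'c', 'd'], ['h1', 'h2']]
```

Leaves a and c share no neighbor, so they are not structurally equivalent, yet each plays the same role relative to its own hub; this is the kind of role-based notion that Kashtan et al. draw on to group motifs into families.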
In research that I see as related to the studies mentioned above, Lazega et al. (2008) formulate a multi-level social network analysis via a linked design: French cancer labs have ties (such as the mobility of researchers among them), scientists have network ties (such as working together), and scientists are affiliated with labs. This formulation presents what I would like to identify as “a duality of scaling up and down,” with an emphasis on actors’ strategies, inter-organizational control mechanisms, and a distinctively institutional theory of their coevolution that Lazega and colleagues are developing brilliantly (especially Lazega, 2015; Lazega and Prieur, 2014).
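The linked design amounts to three connected data sets: ties among labs, ties among scientists, and scientists’ affiliations with labs. A minimal sketch, assuming toy data and hypothetical function names (it is not the modeling approach of Lazega and colleagues, which is far richer): scientist-level ties are aggregated into counts of ties between labs (scaling up), and each scientist is assigned a property of their own lab’s position in the lab network (scaling down).

```python
from collections import Counter


def lab_ties_from_scientists(scientist_ties, lab_of):
    """Scaling up: count scientist-level ties that cross between two labs."""
    counts = Counter()
    for u, v in scientist_ties:
        lab_u, lab_v = lab_of[u], lab_of[v]
        if lab_u != lab_v:  # ignore ties within a single lab
            counts[tuple(sorted((lab_u, lab_v)))] += 1
    return counts


def lab_context(lab_ties, lab_of):
    """Scaling down: give each scientist the degree of their lab in the lab network."""
    degree = Counter()
    for a, b in lab_ties:
        degree[a] += 1
        degree[b] += 1
    return {scientist: degree[lab] for scientist, lab in lab_of.items()}


# Toy linked design: affiliations, scientist ties, and lab ties (e.g. researcher mobility).
lab_of = {"s1": "L1", "s2": "L1", "s3": "L2", "s4": "L3"}
scientist_ties = [("s1", "s2"), ("s1", "s3"), ("s2", "s3"), ("s3", "s4")]
lab_ties = [("L1", "L2")]

print(dict(lab_ties_from_scientists(scientist_ties, lab_of)))
# {('L1', 'L2'): 2, ('L2', 'L3'): 1}
print(lab_context(lab_ties, lab_of))
# {'s1': 1, 's2': 1, 's3': 1, 's4': 0}
```

The two functions run in opposite directions across the same affiliation table, which is one simple way to picture movement both up and down between the levels of such a design.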
In conclusion, I have identified four interrelated features of “scaling down,” the problem of how macro-level features of Big Data affect, shape, and evoke lower-level features and processes. Too often, problems of scaling down remain merely in the background of Big Data and network science studies. Recognizing and addressing them should further advance the study of what Lazer et al. (2014) term an “all data revolution,” in which innovative analytics using data from all traditional and new sources are developed and used to further our understanding of our world.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
