Scaling down

Abstract

While “scaling up” is a lively topic in network science and Big Data analysis today, my purpose in this essay is to articulate an alternative problem, that of “scaling down,” which I believe will also require increased attention in coming years. “Scaling down” is the problem of how macro-level features of Big Data affect, shape, and evoke lower-level features and processes. I identify four aspects of this problem. The first is the extent to which findings from studies of Facebook and other Big-Data platforms apply to human behavior at the scale of church suppers and department politics, where we spend much of our lives. The second is the extent to which the mathematics of scaling might be consistent with behavioral principles. The third is a move beyond a “universal” theory of networks to the study of variation within and between networks. The fourth is how a large social field, including its history and culture, shapes the typical representations, interactions, and strategies at local levels in a text or social network.

Keywords

Scaling up, scaling down, situated networks, cultural templates, scope conditions

Network science research in the computational, social, and biological sciences is increasingly focused on datasets of thousands and even millions of nodes and comparably massive sets of connections among them—for example, in gene interaction networks or social media datasets. Well over a decade ago my colleagues and I, falling in step with many other researchers, began asking, “How well do the different analytical techniques and algorithms ‘scale up’ to large networks … ?” (Breiger et al., 2003: 5). Traditional concepts of network centrality, for example, and attendant shortest-path and betweenness metrics, are often impractical to compute for large-scale networks, even on very fast computers. More fundamentally, the phenomenology of taking account of all possible links, which is what these metrics do, may well be appropriate for a small face-to-face group or for several dozen trading partners, but inappropriate for the structuring and operation of networks at very large scale. Much of the success of Big Data science has consisted of formulating for large datasets algorithms that are more efficient and appropriate, and that “scale up” only linearly with the number of nodes and edges in a graph (Palmer et al., 2003).
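The sketch below makes this scaling contrast concrete. It assumes an unweighted, undirected graph stored as a Python adjacency dict. The first function is Brandes' algorithm for betweenness, which runs one breadth-first search from every node and so costs roughly O(n·m). The second is a sampled variant that searches from only k randomly chosen sources, so for fixed k its cost grows linearly with the size of the graph. Source sampling is one common approximation strategy. It is not necessarily the method of Palmer et al. (2003), and the function names are illustrative.

```python
import random
from collections import deque

def betweenness(adj, sources=None):
    """Brandes-style betweenness for an unweighted, undirected graph.

    adj: dict mapping each node to a list of neighbours.
    sources: nodes to search from; None means every node (exact result).
    """
    nodes = list(adj)
    if sources is None:
        sources = nodes
    scores = {v: 0.0 for v in nodes}
    for s in sources:
        # Forward pass: BFS counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Extrapolate from the sampled sources to all n sources, then halve,
    # because each unordered pair is reached from both of its endpoints.
    scale = len(nodes) / len(sources) / 2
    return {v: c * scale for v, c in scores.items()}

def sampled_betweenness(adj, k, seed=0):
    """Approximate betweenness from k randomly chosen source nodes."""
    rng = random.Random(seed)
    return betweenness(adj, rng.sample(list(adj), k))

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(betweenness(path))  # {0: 0.0, 1: 2.0, 2: 2.0, 3: 0.0}
```

When k equals the number of nodes, the sampled version reproduces the exact scores. Smaller k trades accuracy for speed, and that trade-off is what makes such estimates feasible on graphs with millions of nodes.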

A premise of a great deal of network science and Big-Data analysis of online behavior is that “the web sees everything and forgets nothing” (Golder and Macy, 2014: 132). Large-scale studies of Internet behavior often treat their data as, in this sense, an unmediated record of social interaction. It is not at all rare for the authors of such studies to draw on millions of Facebook posts to claim findings about human behavior that are said to be “in contrast to prevailing assumptions” in social science (Kramer et al., 2014: 8788, 8790). One such assumption is Festinger’s (1954) social comparison theory, which was formulated from research on small human groups.

As I envision it, the alternative problem of “scaling down” addresses four often-interrelated features of Big Data and network science research. These features are routinely ignored or given insufficient attention, to the detriment of progress in research. First, many studies have examined massively large systems such as social networking sites. An under-researched question is the extent to which their behavioral findings “scale down,” that is, apply to human groups and organizations of moderate size (dozens or hundreds of members), where most human social life takes place and is likely to continue to do so. This is the question of whether Big-Data research applies to human behavior at the human scale of church suppers and department politics, in which we spend much of our lives. Second, what are the behavioral processes that lead to the macro-level outcomes? The research community has produced stunningly impressive and workable mathematical models of how processes at lower levels (among individuals, say) might cumulate to high-level complexity (e.g. Lusher et al., 2013; Morris, 2003), or of how bags of words from multiple topics might spill together to form texts (Blei, 2012). However, precious little attention has been paid to formulating micro-processes that reflect actual behavior. Big Data has no analog to behavioral economics, the study of when and why actors follow or depart from the postulated model (Thaler, 1994). Third, network science and Big Data often see themselves as scaling “up” to generalizations that are freed from the shackles of particular texts. They also aim for findings that apply universally to “all” networks, whether power grids, gene interactions, or Facebook friending. Scaling “down” would recognize a different possibility: for a particular research question, the bigger the dataset, the greater the opportunities to search for variation within the case. Those opportunities include contextualizing the case’s features in ways that lead to a distinctive form of case-based generalization (George and Bennett, 2005). Fourth, “scaling down” refers to the problem of how a large social field, including its history and culture, shapes the typical representations, interactions, and strategies at local levels in a text or social network.
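The “bags of words” model mentioned above can be illustrated by the generative story of latent Dirichlet allocation, the basic topic model surveyed by Blei (2012). Each document draws its own mixture of topics. Each word is then produced by first picking a topic from that mixture and then picking a word from the topic. The minimal Python simulation below follows that story. The toy topics and the function name are hypothetical, and the code generates text rather than fitting a model to data.

```python
import random

def generate_document(topics, alpha, length, seed=None):
    """Sample one bag of words from an LDA-style generative process.

    topics: list of dicts, each mapping word -> probability (one per topic).
    alpha: symmetric Dirichlet concentration for the topic proportions.
    """
    rng = random.Random(seed)
    # Draw topic proportions theta ~ Dirichlet(alpha) via normalized gammas.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in topics]
    total = sum(gammas)
    theta = [g / total for g in gammas]
    words = []
    for _ in range(length):
        # Each word first picks a topic, then a word from that topic.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        vocab = list(topics[z])
        words.append(rng.choices(vocab, weights=[topics[z][w] for w in vocab])[0])
    return theta, words

# Hypothetical two-topic vocabulary, for illustration only.
topics = [{"threat": 0.5, "alliance": 0.5}, {"trade": 0.5, "growth": 0.5}]
theta, words = generate_document(topics, alpha=0.5, length=20, seed=1)
```

Nothing in this process represents a writer’s actual decisions. It is a statistical device, which is exactly the gap the second feature above points to.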

In brief: (a) the degree of applicability of Big Data research to small- and moderate-sized social groups, (b) the study of when actors behave as if the mathematical mechanisms postulated to generate Big Data were true, (c) the relative utility of binning Big Data into local contexts, and (d) the production of local action from macro-level processes are all problems in “scaling down.” I will say a bit more in turn about each of the four aspects of “scaling down” that I have identified.

Scope conditions. Festinger’s (1954) theory of social comparison processes pertains to “peer groups” or “primary groups” on the order of 10¹ members. Kramer et al. (2014) studied 6.89 × 10⁵ Facebook users by manipulating the emotional expressions in the news feeds each user received.¹ People who received positive emotion updates expressed positive emotion reactions. This is not the negative reaction predicted by Festinger’s social comparison theory, or by Turkle’s (2011) more contemporary consideration of how technology affects social life. Kramer et al. assert that their finding is “in contrast to prevailing assumptions” of social science. In the authors’ opinion, that assertion is buttressed by the ability of Big Data to detect even an effect as small as the one they found in this instance (2014: 8790). It is not clear to this reader, however, why results on Facebook should be expected to scale down so as to apply in the same way to peer groups. We live on the Internet, but we also and simultaneously live in very small “primary groups” of the sort Festinger was addressing. My own guess is that the difference lies between the “strong” culture that pervades peer groups and the “weak” culture of online friending (Schultz and Breiger, 2010). I don’t see why “the task of the researcher is to see online behavior as social behavior, the kind that might occur in any field site, be it a remote village, a law office, or a high school cafeteria” (Golder and Macy, 2014: 113). To be sure, online behavior is a very distinctive and increasingly prominent form of social behavior, and tools for analyzing Big Data open up its study in ways that are both innovative and exciting for sociology. Nonetheless, Festinger (1950: 278) himself reviewed experimental studies of small face-to-face groups like remote villages, (many) law offices, and school cafeterias. Those studies showed that behavior in such groups differed along multiple dimensions from behavior in more distanced forms of communication. Several contemporary leaders in Big Data analysis have coined the term “Big Data hubris” for the often implicit assumption that Big Data is a substitute for, rather than a supplement to, traditional data collection and analysis (Lazer et al., 2014).
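A standard power calculation makes the logic behind detecting “even such a small effect” explicit. The sketch below assumes a simple two-group comparison with equal group sizes and a two-sided test at α = 0.05 with 80% power. These are illustrative assumptions, not the design or analysis of Kramer et al. (2014). In particular, the even split of 6.89 × 10⁵ users into two arms is hypothetical.

```python
from statistics import NormalDist

def min_detectable_effect(n_per_group, alpha=0.05, power=0.8):
    """Smallest standardized mean difference (Cohen's d) that a two-sided,
    two-sample z-test detects with the given power, equal group sizes."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * (2 / n_per_group) ** 0.5

# A primary group on the order of 10**1 members vs. a Facebook-scale sample
# (hypothetically split into two equal arms of 344,500 users each).
small = min_detectable_effect(10)
large = min_detectable_effect(344_500)
```

Under these assumptions, ten people per group can reliably detect only effects of about 1.25 standard deviations. An evenly split sample of 689,000 can detect effects below 0.01. Detectability, however, says nothing about whether the effect operates the same way in a primary group, which is the scope-conditions question.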

A behavioral model? Snijders et al. (2012) take note of the many ingenious mathematical models that have been devised for Big Data analysis. Their main point, however, is the suggestion that “instead of trying to find micro-processes that lead to certain aggregate network properties based on mathematical tractability, one could follow a different analytical strategy and try to come up with micro-processes that match with actual behavior. And this is exactly where social and behavioral research can play a role.” Research by Yotam Shmargad (2014) moves decisively in this direction, modeling tie strength and social media use. Shmargad analyzed the database of a social media company that started charging its users to receive broadcasts in their email address books. Before the monetization, users’ address books were automatically updated each time one of the people they were connected to changed their contact information. Afterwards, users had to buy the company’s $60 premium bundle to continue receiving the updated information. Using the design of a natural experiment, Shmargad compared how purchase rates varied with properties of users’ networks. Among the key findings: people value receiving information from their strong ties, but they also highly value receiving information from ties that are structurally diverse. An example is a tie that connects the focal individual to a region of the social network with which she would otherwise not be connected. This study therefore provides a useful exemplar of how a search for micro-processes that match users’ behavior can lead to improved, rigorous quantitative modeling of social media networks.
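One simple way to operationalize ties that connect a focal individual to otherwise unconnected regions of the network is to count the connected components among her contacts. This measure is sometimes called structural diversity. The sketch below assumes an undirected graph stored as an adjacency dict and a hypothetical set of users who bought the premium bundle. It tabulates purchase rates by that measure. It illustrates the kind of comparison described, not Shmargad’s (2014) actual measures or estimation strategy, and it ignores tie strength, which the study also examined.

```python
from collections import defaultdict

def structural_diversity(adj, ego):
    """Count connected components among ego's contacts, ignoring ego itself.

    Contacts in separate components link ego to regions of the network
    that are not otherwise connected through her other contacts."""
    alters = set(adj[ego])
    seen, components = set(), 0
    for start in alters:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first search restricted to ego's contacts
            v = stack.pop()
            for w in adj[v]:
                if w in alters and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components

def purchase_rate_by_diversity(adj, purchased):
    """Share of users who bought, grouped by structural diversity."""
    bought, total = defaultdict(int), defaultdict(int)
    for user in adj:
        d = structural_diversity(adj, user)
        total[d] += 1
        bought[d] += user in purchased
    return {d: bought[d] / total[d] for d in sorted(total)}

# Hypothetical toy network: "a" bridges the b-c pair and the isolated d.
adj = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
rates = purchase_rate_by_diversity(adj, purchased={"a"})
```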

Big Data, situated networks. McNely (2012) claims (in my paraphrase) that “the paradox of Big Data” is that we now collect so much data that the challenge is no longer only quantitative. The paradox “suggests the inverse: we need more situated, contextualized, qualitative studies of communication practices in an age of Big Data, not less” (p. 28, original emphasis). In contrast to recent trends of scaling up communication infrastructures, McNely argues that the future of communication design “must address the challenges of scaling down, of delivering Big Data in contextual, meaningful, localized forms” (p. 28, original emphasis). One possible way to exploit this paradox is to treat a Big Data corpus as a single “case” or type of discourse that manifests internal variation. An example is the set of published US National Security Strategy statements analyzed by Mohr et al. (2013). Using natural language processing techniques, Mohr et al. (2013: 678–686) show how actors (“America,” “Yeltsin,” “Allies,” “Weapons of Mass Destruction”) and their relational trajectories can be identified and distilled from the corpus. How these consequential trajectories change over time is importantly related to a form of within-case generalization that is usually associated with the situated study of qualitative data (Goertz and Mahoney, 2012: 10–11, 87–99).²
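Once actor–action–object relations have been extracted, treating a corpus as one case with internal variation becomes a matter of binning and comparison. The sketch below assumes such (document, subject, verb, object) triples already exist. The extraction step, which is the hard natural-language-processing part, is not shown. The code bins the triples by document and compares an actor’s relational profile across documents with a simple Jaccard overlap. The triples, the overlap measure, and the function names are illustrative assumptions, not the procedure or data of Mohr et al. (2013).

```python
from collections import Counter, defaultdict

def actor_trajectories(triples):
    """Group (document, subject, verb, object) triples into per-actor,
    per-document profiles, so relations can be compared across documents."""
    profiles = defaultdict(lambda: defaultdict(Counter))
    for doc, subject, verb, obj in triples:
        profiles[subject][doc][(verb, obj)] += 1
    return profiles

def profile_overlap(a, b):
    """Jaccard overlap of two relation profiles (1.0 = same relations)."""
    keys_a, keys_b = set(a), set(b)
    if not keys_a and not keys_b:
        return 1.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)

# Invented toy triples for two hypothetical strategy documents, 1 and 2.
triples = [
    (1, "America", "supports", "Allies"),
    (1, "America", "opposes", "Weapons of Mass Destruction"),
    (2, "America", "opposes", "Weapons of Mass Destruction"),
    (2, "Allies", "supports", "America"),
]
profiles = actor_trajectories(triples)
drift = profile_overlap(profiles["America"][1], profiles["America"][2])
```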

Fields, cultural templates, and automorphic equivalence. Not all networks emerge from processes on nodes and edges. This statement departs from the program of Big Data analysis, yet recognizing it could perhaps motivate progress in the analysis of large-scale data. In many kinds of human networks, group style provides stylized templates or ideals for interpersonal interaction (Eliasoph and Lichterman, 2003). Group style is defined as recurrent patterns of interaction that express a social group’s shared ideas about what constitutes good or adequate participation in group settings. In what sense do such cultural templates provide the impetus for predictable interpersonal linkages at the micro-level (Pachucki and Breiger, 2010), and thereby scale down? A somewhat similar question can be posed in molecular biology,³ where certain patterns, termed network motifs, occur more frequently than would be expected by chance. Kashtan et al. (2004) define families of motifs such that the motifs within a family share a common general theme. That theme is defined by roles, which the researchers identify using automorphic equivalence techniques developed in sociology for the study of social roles. Networks that share a common motif can have very different generalizations of that motif.
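The idea that a motif occurs “more frequently than by chance” has a standard operational form. A subgraph pattern is counted in the observed network, and the count is compared with networks randomized so that every node keeps its in- and out-degree. The sketch below does this for one three-node pattern, the feed-forward loop, chosen for illustration. It does not implement Kashtan et al.’s (2004) motif generalizations or their automorphic-equivalence role analysis. The function names and parameters (number of randomizations, swaps per edge) are assumptions.

```python
import itertools
import random

def count_ffl(edges):
    """Count feed-forward loops: x->y, y->z and x->z with distinct x, y, z."""
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
    count = 0
    for x, targets in out.items():
        for y, z in itertools.permutations(targets, 2):
            if x not in (y, z) and z in out.get(y, ()):
                count += 1
    return count

def degree_preserving_shuffle(edges, swaps, rng):
    """Randomize a directed graph by swapping edge targets (a->b, c->d
    becomes a->d, c->b), which keeps every in- and out-degree fixed."""
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def motif_z_score(edges, n_random=100, seed=0):
    """z-score of the observed FFL count against degree-preserving
    randomizations; NaN if the randomized counts do not vary."""
    rng = random.Random(seed)
    observed = count_ffl(edges)
    null = [count_ffl(degree_preserving_shuffle(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    mean = sum(null) / n_random
    sd = (sum((c - mean) ** 2 for c in null) / (n_random - 1)) ** 0.5
    return (observed - mean) / sd if sd > 0 else float("nan")
```

A large positive z-score marks the pattern as over-represented relative to chance, given the degree sequence. Holding degrees fixed matters: otherwise a pattern could look like a motif merely because a few hubs have many connections.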

In research that I see as related to the studies mentioned above, Lazega et al. (2008) formulate a multi-level social network analysis via linked design. French cancer labs have ties (such as mobility of researchers among them), scientists have network ties (such as working together), and scientists are affiliated with labs. This formulation presents what I would like to identify as “a duality of scaling up and down.” It places an emphasis on actors’ strategies, inter-organizational control mechanisms, and a distinctively institutional theory of their coevolution that Lazega and colleagues are developing brilliantly (especially Lazega, 2015; Lazega and Prieur, 2014).
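The linked design can be pictured with three matrices: an affiliation matrix A (scientists × labs), a scientist network S, and a lab network L. A minimal sketch, assuming binary, symmetric ties and plain Python lists, projects in both directions. The product Aᵀ S A aggregates scientists’ ties into lab-to-lab counts, which is scaling up. The product A L Aᵀ shows which scientists are linked through their labs’ ties, which is scaling down. This projection is a standard illustration of the duality of persons and groups, not the multi-level models that Lazega et al. (2008) actually estimate. The function names and the toy matrices are hypothetical.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def scale_up(affil, scientist_ties):
    """Lab-by-lab counts of scientist ties: A^T S A."""
    return matmul(matmul(transpose(affil), scientist_ties), affil)

def scale_down(affil, lab_ties):
    """Scientist-by-scientist links induced by lab ties: A L A^T."""
    return matmul(matmul(affil, lab_ties), transpose(affil))

# Hypothetical toy design: scientists 0 and 1 in lab 0, scientist 2 in lab 1.
A = [[1, 0], [1, 0], [0, 1]]
S = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # scientists 0 and 2 work together
L = [[0, 1], [1, 0]]                   # researchers move between the labs
labs = scale_up(A, S)          # [[0, 1], [1, 0]]
scientists = scale_down(A, L)  # [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```

In the toy example, a single collaboration between two scientists is enough to tie their labs. Conversely, the labs’ tie links scientist 1 to scientist 2 even though they do not work together directly.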

In conclusion, I have identified four interrelated features of “scaling down,” the problem of how macro-level features of Big Data affect, shape, and evoke lower-level features and processes. Too often, problems of scaling down remain merely in the background of Big Data and network science studies. Recognizing and addressing them should lead to additional progress in advancing the study of what Lazer et al. (2014) term an “all data revolution,” wherein innovative analytics using data from all traditional and new sources are developed and used to further our understanding of our world.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

This article is part of a special theme on Colloquium: Assumptions of Sociality.

References

Blei (2012) Probabilistic topic models. Communications of the ACM 55(4): 77–84.

Breiger, Carley and Pattison (2003) Workshop summary. In: Breiger, Carley and Pattison (eds) Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, Washington, DC: National Academies Press, pp. 3–14. Available at: http://www.nap.edu/openbook.php?record_id=10735.

Eliasoph and Lichterman (2003) Culture in interaction. American Journal of Sociology 108(4): 735–794.

Festinger (1950) Informal social communication. Psychological Review 57(5): 271–282.

Festinger (1954) A theory of social comparison processes. Human Relations 7: 117–140.

George and Bennett (2005) Case Studies and Theory Development in the Social Sciences, Cambridge, MA: MIT Press.

Goel V (2014) Facebook tinkers with users’ emotions in news feed experiment, stirring outcry. New York Times, 30 June, B-1.

Goertz and Mahoney (2012) A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences, Princeton, NJ: Princeton University Press.

Golder and Macy (2014) Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology 40(1): 129–152.

Kashtan, Itzkovitz, Milo and Alon (2004) Topological generalizations of network motifs. Physical Review E 70(3): 031909.

Kramer ADI, Guillory and Hancock (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences of the United States of America 111(24): 8788–8790.

Lazega, Jourda M-T, Mounier and Stofer (2008) Catching up with big fish in the big pond? Multi-level network analysis through linked design. Social Networks 30(2): 159–176.

Lazega (2015) Body captors and network profiles: A neo-structural note on digitalized social control and morphogenesis. In: Archer (ed.) Generative Mechanisms Transforming the Social Order, Cham: Springer International Publishing, pp. 113–133.

Lazega and Prieur (2014) Sociologie néostructurale, disciplines sociales et systèmes complexes. Revue Sciences/Lettres 2: 1–15.

Lazer, Kennedy, King and Vespignani (2014) The parable of Google flu: Traps in Big Data analysis. Science 343(6176): 1203–1205.

Lusher, Koskinen and Robins (2013) Exponential Random Graph Models for Social Networks, Cambridge: Cambridge University Press.

McNely (2012) Big Data, situated people: Humane approaches to communication design. Communication Design Quarterly 1(1): 27–30.

Mohr, Wagner-Pacifici, Breiger and Bogdanov (2013) Graphing the grammar of motives in U.S. national security strategies: Cultural interpretation, automated text analysis and the drama of global politics. Poetics 41(6): 670–700.

Morris (2003) Local rules and global properties: Modeling the emergence of network structure. In: Breiger, Carley and Pattison (eds) Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, Washington, DC: National Academies Press, pp. 174–186.

Pachucki and Breiger (2010) Cultural holes: Beyond relationality in social networks and culture. Annual Review of Sociology 36(1): 205–224.

Palmer, Gibbons and Faloutsos (2003) Data mining on large graphs. In: Breiger, Carley and Pattison (eds) Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, Washington, DC: National Academies Press, pp. 265–286.

Schultz and Breiger (2010) The strength of weak culture. Poetics 38(6): 610–624.

Shmargad Y (2014) Social media broadcasts and the maintenance of diverse networks. Working paper, School of Information, University of Arizona. AIS Electronic Library (AISeL). Available at: http://aisel.aisnet.org/icis2014/proceedings/SocialMedia/10/.

Snijders, Matzat and Reips U-D (2012) ‘Big Data’: Big gaps of knowledge in the field of internet science. International Journal of Internet Science 7(1): 1–5.

Thaler (1994) Quasi-rational Economics, New York, NY: Russell Sage Foundation.

Turkle (2011) Alone Together: Why We Expect More from Technology and Less from Each Other, New York, NY: Basic Books.