Clustered Iconography: A Resurrected Method for Representing Multidimensional Data

Abstract

Development of graphical methods for representing data has not kept up with progress in statistical techniques. This article presents a brief history of graphical representations of research findings and makes the case for a revival of methods developed in the early and mid-twentieth century, notably ISOTYPE and Chernoff’s faces. It resurrects and improves a procedure, clustered iconography, which enables the presentation of multidimensional data through which readers engage more effectively with the presentation’s central message by way of an easier understanding of relationships between variables. The proposed technique is especially well adapted to the needs and protocols of open-source research.

Keywords

iconography, multidimensional data, graphical methods, ISOTYPE, cluster

Introduction

Appropriate presentation of research findings is a conundrum faced by both qualitative and quantitative investigators. Insofar as it is endemic to the write-up phase of a project, the challenge of making results intelligible, transparent and subject to the critical scrutiny of a broad readership is not new. Indeed, the problem has been wrestled with since at least the time of Florence Nightingale who, faced with daunting practical communication challenges, made pioneering advances in these domains. However, the urgency of the dilemma has been growing over the last two decades. For example, in 2012, a study reported that the worldwide presence of data and databases is at an all-time high (Digital Universe 2012). The same study stressed that, for the potential of this corpus to be realized, innovation was needed in methods of display and interpretation. At that time, there were 2.8 trillion gigabytes of stored data in the world. Since then, global capacity has increased, with authors such as Petrov (2019) estimating that the worldwide stockpile will exceed 40 trillion gigabytes (40 zettabytes) at some stage in 2020.

In parallel with data proliferation during the digital age, there has been a rise of increasingly sophisticated analytic software (e.g., Statgraphics, Looker, MATLAB) and an expanding array of approaches concerned with formula-based methods of transformation (e.g., recursive partitioning). This is especially true in the social sciences where by the 1960s and 1970s the content scope of and readership for sociology literature were unambiguously broader than for disciplines such as political science, psychology, or economics (Healy and Moody 2014; Paechter et al. 2017). Insofar as nontechnical considerations are concerned, a growing pressure for publicly funded research to be subject to broad scrutiny, particularly in the social sciences, has formed much of the context for scholarly enterprise and created new sources of angst for those engaged in such endeavour (e.g., “European Countries” 2018; Eisenberg and Nelson 2002). Hence, the overall scenario created from the confluence of influences pertaining to technical advance and the dissemination imperative places modern social science researchers in an increasingly stressful situation. On the one hand, they have more data to work with (and justify using or excluding) and are expected to invoke cutting-edge, but invariably alienating, methods of analysis. On the other hand and paradoxically, they are under pressure to bring the lay public into the fold.

When it comes to communicating research findings to reading audiences who lack analytic training and sophistication, modern scholars, be they working in the qualitative or quantitative tradition, often face a compromise. Specifically, they must trade off extracting maximum meaning from data against retaining comprehensibility. Making this judgment call is unduly thorny because, for many, it is becoming difficult to appreciate the potential and limits of individual techniques. Some years ago, Hellems, Gurka and Hayden (2007:1083) summed up the situation when they noted that it is “increasingly unrealistic to expect readers fully to understand the statistical analyses used in journal articles.” Indeed, just in the quantitative domain, growing complexity has been associated with the emergence of “statistics anxiety” (Chew and Dillon 2014; Onwuegbuzie 2004; Paechter et al. 2017). The literature addressing this malaise is somewhat incomplete but nonetheless revealing. For example, approximately 80 percent of U.S. university graduate students experience stress-related symptoms associated with having to master quantitative methods (Onwuegbuzie 2004), with compounding consequences on their course achievement (Chew and Dillon 2014; Fitzgerald, Jurs and Hudson 1996; Paechter et al. 2017).¹ In very recent times, new and more disquieting findings concerning this phenomenon have come to light with researchers such as Siew, McCartney and Vitevitch (2019) concluding that skittishness about research methods generally and avoidance of courses in this domain, in particular, is a key contributor to student attrition at university. It is noteworthy also that it is not just in relation to quantitative methods that people struggle. In fact, as Vitevitch (2016) notes, lack of confidence about undertaking and interpreting output from qualitative approaches makes the statistics anxiety syndrome seem unduly narrow in its focus.
The proliferation of theoretical perspectives and discipline-specific terminology has created a broader phenomenon, which Wurman (1989) originally called “information anxiety” and others more recently have labeled “infobesity” (Rogers, Puryear and Root, 2013), “infoxication” (Chamorro-Premuzic, 2014) and “information explosion syndrome” (Buckland 2017).

Analytic complexity does not just concern researchers or those charged with making a point about data. It also affects journal publishers and others interested in communication of research findings, including, ultimately, the reader. The reason for this is simply stated: the results sections of modern scholarly articles have become increasingly complex and sophisticated. The case of a high-quality journal with a practitioner focus is illustrative: in 1983, readers of The New England Journal of Medicine with an understanding of only basic descriptive statistics could comprehend 59 percent of its articles. In 2004, the equivalent figure was just 6 percent (Strasak et al. 2007). The case of The New England Journal of Medicine is not unique. Indeed, the problem reaches its zenith in sociology journals. For example, publications such as the American Sociological Review and the American Journal of Sociology routinely present many tables and few figures, and leave a degree of opaqueness concerning how authors moved from data to conclusions (Healy and Moody 2014). By contrast, in journals such as Science, Nature and the Proceedings of the National Academy of Sciences, articles are typically associated with one (or sometimes several) graphical displays as their centerpiece and in this sense offer conclusions which are more auditable (Healy and Moody 2014).

While the use of analytic methods for data interpretation has become more complex and alienating, deployment of relatively simple displays has fallen by the wayside. For example, Healy and Moody (2014) make the somewhat counterintuitive point that, compared with natural sciences, there has been a paucity of innovation of data visualization techniques in the social sciences generally and sociology in particular (Pauwels 2010). More recently, McFarland, Lewis and Goldberg (2016) made a similar point. On the same topic but twenty years earlier, Cheng and Simon (1995) noted that in leading social science journals, data analysis involving algorithms and formulae as a principal means of making a point is up to three times more common than visual data presentation. Anecdotal observation of current literature does not suggest a change in this proportion. Indeed, recently scholars such as Healy and Moody (2014:106) have pointed out that “visualisation mostly remains an afterthought in sociology.” These authors are unclear about why this domain-specific neglect has arisen in the twenty-first century. After all, it was the social sciences that, several decades earlier, revealed the potential of visual display techniques. Specifically, Anscombe (1973) dramatically showed that two data sets with near identical statistical properties, notably their bivariate regression lines, can manifest as starkly different when presented on a set of scatterplots (the scatterplots in fact are more revealing). Moreover, Jackman (1980), in examining voter-turnout behavior as a function of income inequality, exposed the disproportionate influence of outlier data points in small data sets. Such revelation would not have been possible without resort to graphical displays; more specifically, without making a contrast between visually based and formulaic techniques. More recently, Chatterjee and Firat (2007) reached a similar conclusion.
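Anscombe's point can be checked with a few lines of code. The sketch below is illustrative only: it uses the first two of Anscombe's (1973) published data sets, fits an ordinary least-squares line to each and then prints a crude character-based scatterplot. The function names `fit_line` and `text_scatter` are hypothetical helpers introduced for this example, not part of any library.

```python
# Two of Anscombe's (1973) data sets; both share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y_SET_1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y_SET_2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def text_scatter(xs, ys, rows=6):
    """Crude scatterplot: one column per point (sorted by x), one star per column."""
    lo, hi = min(ys), max(ys)
    points = sorted(zip(xs, ys))
    lines = []
    for row in range(rows - 1, -1, -1):  # top row first
        line = ""
        for _, y in points:
            band = min(int((y - lo) / (hi - lo) * rows), rows - 1)
            line += "*" if band == row else "."
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    for label, ys in (("Set I", Y_SET_1), ("Set II", Y_SET_2)):
        slope, intercept = fit_line(X, ys)
        print(f"{label}: y = {intercept:.2f} + {slope:.2f}x")
        print(text_scatter(X, ys))
        print()
```

Both fits come out at approximately y = 3.00 + 0.50x, yet the first plot shows a noisy upward trend while the second traces a smooth curve. The formula-based summary cannot distinguish the two; even a rough picture can.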

This article briefly canvasses the history of the techniques used for the graphical exposition of research findings and examines their theoretical and methodological limitations. Based principally on an analysis of these deficiencies and inspired by what Edward Tufte (1983, 1990, 1997, 2007) described as the most compelling visual image conveying data that has ever been produced, it proposes a new general approach, called clustered iconography. By way of summary, this technique revives and improves the tradition of graphic illustration. It heralds a return to a largely forgotten movement that flourished a century ago, the aim of which was to increase data accessibility through picture-based displays. These early efforts focused mostly on presenting quantitative results and typically entailed representing numerically encoded findings in a raw or summative form (Onwuegbuzie and Dickinson 2008).

It is noteworthy that the approach presented in the present article is sufficiently versatile to showcase results derived using qualitative methodologies. Indeed, for practitioners and in particular consumers of research, graphical techniques help make findings easier to understand and foster better engagement with data, be it quantitative or qualitative. For researchers, such newly minted techniques, especially the one presented in this article, offer increased transparency of results and the ability to include rather than eliminate unusual cases from analysis. For reasons that will be made explicit, clustered iconography is largely blind to qualitative versus quantitative differences but instead embraces what Goering and Streiner (2013) described as “reconcilable differences” between each kind of approach. The technique to be defended also builds on Pauwels’s (2010:545) “Integrated Framework” (presented in Sociological Methods & Research) insofar as it at least partly “connects and transcends” the strengths of the visual display modes and techniques currently available.

This article argues that clustered iconography is well suited to the imperatives of modern research in general and to those of sociological research in particular. This is the case because, in overcoming some of the limits of traditional graphical approaches, the technique offers an alternative to complex data transformation. While the method makes no claim to replace either current text-heavy methodologies for presenting qualitative data or number, table and chart-based methods of depicting quantitative findings, it provides more summative power than the former, as well as greater transparency and ability to handle outliers than the latter. These advantages build on some recognized benefits of visual display techniques. In this vein, as Langley (1999) argued, picture-based data presentation stimulates the entertaining of hypotheses by readers.

A Short History of Methods for Displaying Research Findings

The development of visual display techniques predates written languages, not to mention modern statistics. One has just to consider, for example, prehistorical evidence showing that early hunters used notches or other marks to keep track of their kill rates (Cartmill 2009). However, the objective of this section is not to offer a comprehensive chronological survey of graphical representation methods. More modestly, the intention is to highlight salient aspects of immediate antecedents to the proposed technique.

As a process-related point, for exposition purposes a historical perspective of graphical methods is chosen to showcase precursor techniques for two reasons. First, such an approach organically reveals how methods emerged mostly in response to circumscribed practical problems. Indeed, using chronology as opposed to, say, a cross-sectional conceptual taxonomy (as is done in Pauwels 2010) to describe techniques starkly reveals that although each approach was intended to address a need, each also embodied something of a trade-off. Specifically, the resolution of one problem was associated with the beginning of another. The second reason a historical approach is used to present graphical techniques concerns the fact that much early innovation in visual displays comes from sociologists and was a response to sociological content matter. Indeed, early sociology journal articles are often replete with bar charts (Hart 1896), line graphs (Marro 1899), parametric density plots and dot plots with standard errors (Chapin 1924), scatterplots (Sletto 1939) and social network diagrams (Lundberg and Steele 1938). However, despite such an auspicious initial contribution to the development of visual techniques, somewhere along the way, academic sociologists have mostly ceased to be innovative in this area. More precisely, in key respects, they have lost ground, simultaneously becoming vanilla in their tastes and unambitious with their solutions. A case for the proposed technique emerges from such neglect and is inspired by the artistry embodied in an image created during the nineteenth century. As such, the method at the heart of this article addresses long-identified inadequacies, largely breaks free from the trade-off paradigm and reinvigorates the discipline of sociology as having something important to say about data presentation.

By way of preamble, of the four techniques surveyed (graphs, visual languages, concept mapping and clustered representations such as Chernoff’s faces), the technique on which clustered iconography leans to the greatest degree, Chernoff’s faces, is also that which has almost completely fallen into disuse.

Graphs

In 1637, French philosopher and mathematician René Descartes was the first to link Euclidean geometry with algebra through two- and three-dimensional (now called Cartesian) co-ordinates along x, y and z axes. In the early nineteenth century, as statistics as a method of systematic communication of research findings developed, the need also to create ways for visually presenting data and patterns in data emerged. A seminal figure pushing this latter agenda was Scottish engineer William Playfair. For the first time, in 1786, Playfair produced techniques for showing how parts relate to wholes (Beninger and Robyn 1978; Spence 2005). Somewhat inspired by the problems occasioned by small data sets and missing data, he relied on Cartesian coordinates to develop strategies such as the line graph (see Figure 1), the bar chart, the pie chart and the circle graph (Beninger and Robyn 1978; Spence 2005). Playfair’s depiction of Scottish imports and exports to and from 17 countries in 1781 has been lauded as the first “pure” solution to the problem of discrete quantitative comparison (Tufte 2007). Previously, data had typically only been located spatially (i.e., using coordinates or with tables) or through creating time lines, a technique developed two decades earlier by Joseph Priestley as a means of comparing life spans.

Figure 1.

A graph, showing exports and imports from England to North America (Playfair 1786).

Playfair’s efforts found multiple applications. For example, they were not lost on Florence Nightingale who, working as a nurse during the Crimean War, reformed nineteenth-century public health administration and arguably invented the discipline when she transformed unwieldy numbers-based tables into graphical formats to convey key points to the British government (Cohen 1984). During this period and later, when she became interested in developing a solution to India’s sanitation problem, Nightingale hybridized certain of Playfair’s descriptions to develop the “polar area diagram” (sometimes known as the Nightingale-Rose Diagram). These images resemble a circular histogram and were originally used to illustrate seasonal sources of hospital patient mortality. In their review of pre-twentieth-century data display techniques, Miles and Huberman (1984) identify other methods indebted to Descartes and Playfair. These include context charts (charts that show variables that are assumed to interact), growth gradients (similar to line graphs), portfolio matrices, scatterplots and state-flow charts (focusing on salient events during a given time period). Similar, possibly less comprehensive, reviews of graphical techniques in the tradition of Descartes and Playfair have been done recently (e.g., Inselberg 2009). Such approaches are known to today’s researchers; indeed, they are incorporated as standard output options of common statistical software, such as cloud functions in R and SAS. Depictions created using these computer-based methods resemble scatterplots but present a three-dimensional display that, in some cases, is rotatable.
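The geometry of a polar area diagram can be illustrated briefly. The sketch below assumes the usual reading of such a display: every wedge spans the same angle and it is the area of the wedge, not its radius, that encodes the value. The function name `wedge_radii`, the `scale` parameter and the monthly counts are hypothetical and serve only as an example.

```python
import math


def wedge_radii(values, scale=1.0):
    """Radius of each equal-angle wedge so that wedge AREA equals scale * value.

    Assumes non-negative values and one wedge per value around a full circle.
    """
    angle = 2 * math.pi / len(values)  # every wedge gets the same angle
    # Sector area = 0.5 * angle * r**2, hence r = sqrt(2 * area / angle).
    return [math.sqrt(2 * scale * v / angle) for v in values]


if __name__ == "__main__":
    # Hypothetical monthly counts, invented purely for illustration.
    monthly_counts = [10, 40, 90, 160]
    for count, radius in zip(monthly_counts, wedge_radii(monthly_counts)):
        print(f"count={count:4d}  radius={radius:.2f}")
```

Because radius grows with the square root of the value, a wedge standing for four times as many cases is only twice as long. Plotting radius in direct proportion to the value would visually exaggerate the larger counts.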

Visual Languages

Nightingale was not the only person to use problems occasioned by war as an impetus to develop better data-communication techniques. Indeed, twentieth-century military conflicts inspired other theorists. For example, the first World Esperanto Congress took place in 1905 to promote a universal language to foster world peace and international understanding. In some respects, this initiative had objectives similar to those pursued by Playfair, that is, “to allow for the transfer of knowledge and insight between areas, even if imperfectly” (Rzhetsky and Evans 2011:3). The same period saw the development of purely visual international systems of communication. This effort culminated in Australian theorist E. K. Bliss’s (1965) “Semantography,” a language of over 10,000 symbols. Mid-twentieth-century universal languages had their own dictionaries, grammar (Bonsiepe 1965) and linguists (e.g., Bertin 1983). However, the system movement, as it became known, only took root in the field of transportation where today, regardless of cultural context or geographical location, internationally consistent symbols enable motorists to comprehend rapidly basic driving advice and regulations.

The most influential of the twentieth-century efforts to systematize visual representation of data is the Wiener Methode der Bildstatistik (Vienna Method of Pictorial Statistics), the brainchild of Otto Neurath of the New College of Commerce, Vienna and founder-director of the city’s Museum of Society and Economy. Neurath was an influential member of the Vienna Circle, a group of philosophers of science and philosophically literate scientists active from 1924 to 1936. After World War I, these thinkers sought to help people consider rationally social and economic problems and spot reasoning errors in ideological fanaticism. They are mostly remembered for developing a philosophical movement known as logical positivism, which in their view was an improvement on Auguste Comte’s “classic” positivism. From Comte, they retained a belief in verifiable (empirical) facts, the affirmation of the fact-value distinction, the conviction that all sciences must follow a unique method and a general confidence in the ability of science to guide social progress (Joullié and Spillane 2015:121-124; Kolakowski 1969:169-200).

The Vienna Method of Pictorial Statistics was faithful to the Vienna Circle’s agenda. It succeeded in its central purpose to make scientific statistics (as opposed to data) accessible to lay audiences. At its heart was the International System of TYpographic Picture Education (ISOTYPE). ISOTYPE invoked a set of hundreds of standardized pictorial symbols (such as those appearing on toilet doors or street signs) to represent social and technical data with guidelines on how to combine them using serial repetition. These symbols, mostly designed in 1936 by Rudolf Modley, Neurath’s assistant, were abstracted but nonetheless natural or quasi-natural (Bresnahan 2011:9; Modley 1938). Natural signs, often referred to as “icons,” embody an intuitive (“natural”) relation to a signified entity (Nöth 2001). For example, non-text-based road signs are typically natural symbols, whereas abstracted signs, such as national flags, are pure symbols that exert their communicative power by agreement.

Neurath’s view was that symbols, either natural or abstracted, have to be self-evident in their meaning. His objective was “a system of optical representation […] that would be universal, immediate and memorable [ensuring] that even passers-by […] acquaint themselves with the latest sociological and economical facts at a glance” (Neurath, quoted in Cartwright et al. 1996:65; see also Neurath 2010, 1936:32-33). Improving graphical effectiveness was only part of the point, however. Neurath was an early exponent of the view that quantity, or scale, should be represented by symbol frequency: for example, a collection of eight “man” icons represents 8 or 80 or 800 people, depending on the stated value of each icon, while four icons represent half that number (see Figure 2). In Neurath’s view, adjustable-sized pictograms are ambiguous. When looking at them it is unclear, for example, whether the height or the area of the icon portrays differences in the scale of the represented entity or in its number.
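The frequency principle lends itself to a short illustration. The sketch below is not Neurath's procedure but a minimal rendering of the idea: each quantity is shown as a run of identical icons, each icon stands for a fixed number of units and icon size never varies. The names `pictogram_row` and `pictogram_chart`, the `#` placeholder icon and the group figures are all hypothetical.

```python
def pictogram_row(value, units_per_icon, icon="#"):
    """Encode a quantity as a run of identical icons (frequency, not size)."""
    if units_per_icon <= 0:
        raise ValueError("units_per_icon must be positive")
    count = int(value / units_per_icon + 0.5)  # round half up to whole icons
    return icon * count


def pictogram_chart(data, units_per_icon, icon="#"):
    """Render one labelled row of icons per entry in `data` (label -> value)."""
    width = max(len(label) for label in data)
    rows = [
        f"{label.ljust(width)}  {pictogram_row(value, units_per_icon, icon)}"
        for label, value in data.items()
    ]
    rows.append(f"(each {icon} = {units_per_icon} people)")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical figures, invented purely for illustration.
    print(pictogram_chart({"Group A": 800, "Group B": 400}, units_per_icon=100))
```

With these figures the first row holds eight icons and the second four, so the two-to-one ratio is read by counting rather than by judging height or area. The same eight icons would stand for 80 people if each icon were declared to represent 10.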

Figure 2.

International System of TYpographic Picture Education chart from Neurath’s (1939) Modern Man in the Making.

Aside from using frequency rather than scale to represent relative magnitude, ISOTYPE adopted other conventions. For example, displays had to read like a book, from top left to bottom right (Uebel 1991:227). When representing geosocial distributions, icons had to be arranged in ways that suggest, or be compatible with, a map (Neurath 2010:81-82). Neurath also insisted that complex facts be transformed into pictures that told a “story” in a holistic and intuitive manner. However, somewhat like storytelling in the literature-based sense of the term, he argued that where diagrams are deployed, they should disclose only one overarching narrative or theme. The influence of logical positivism’s taken-for-granted existence of observable, verifiable facts and of its Viennese advocates’ overall desire to improve society through the objectification of social and economic problems is manifest in ISOTYPE’s conventions.

One of Neurath’s legacies has been increasingly sophisticated graphics (Holmes 1984, 1993). Surprisingly, however, his influence on the scientific and nonscientific presentation of data and statistics has remained largely unrecognized. This is presumably because most of his work on the Vienna Method of Pictorial Statistics remained untranslated into English, even if occasional commentary about the technique has appeared in scholarly literature (Kinross 1981; Neurath and Kinross 2009). The recent translation of Neurath’s (2010) “visual autobiography” helps to remedy this neglect (Cat and Tuboly 2019).

Concept Mapping

Concept mapping as a formal technique of representation was developed at Cornell University by a team headed by physical scientist Joseph Novak (1990) and psychologist William Trochim (1989). Novak and Trochim’s guiding principle was that pictures or maps should be used to represent relationships between ideas. Initially meant to help capture the evolving science-related knowledge of journeyman students, concept mapping has spread into fields as disparate as social psychology (Lord et al. 1994), environmental science (Barney, Mintzes and Yen 2005), business management (e.g., Kolb and Shepherd 1997) and sociology (Trepagnier 2002).

Concept mapping has its theoretical origins in constructivism (Novak 2009). As opposed to positivist epistemology, constructivist epistemology holds that (scientific) knowledge is socially constructed, that is, emerges from a body of conventions, rules, paradigms and values themselves embedded in (and contingent on) social and historical contexts. In the constructivist view, meaning is self-referent because it is recursively created by individuals drawing on their experience (Novak 2009). In this sense, concept mapping has a “family resemblance” (to reuse Rosch and Mervis’s [1975] expression; see also Medin, Wattenmaker and Hampson 1987) with “visual thinking,” the methodology pioneered by Stanford University’s Robert McKim (1972, 1980). According to McKim, visual thinking is effective as a reflective problem-solving tool rather than a pure representation technique.

Concept mapping offers a semistructured visual representation of findings, albeit one less constrained than the traditional graph or table. A defining difference between the graph and the concept map is that concept maps do not merely represent abstracted findings delimited by two dimensions and are not primarily meant to be as reflective of nature as graphs. Rather, they offer the flexibility to depict, or speculate about, dissimilar kinds of variable associations (e.g., causation, correlation, mediated relationships; Kinchin, Hay and Adams 2000; Novak 2009). In preparing concept maps, ideas have to be generated and the relationships between them articulated, a process that has been compared to the conceptual stage of structural equation modeling (Trochim 1989:1).

Concept maps invariably go beyond being merely representative of a data set in either complete (such as columns of numbers) or summarized form (such as means or standard deviations; Novak 2009). Hence, unlike ISOTYPE presentations, they are not purely and only ways of displaying research findings. Rather, concept mapping is a research technique as much as, if not more than, it is an approach to display. Indeed, Davies (2011) concluded that they are a means of crystallizing or representing ideas as opposed to merely presenting data to make points (Kinchin, Möllits and Reiska 2019).

Insofar as producing concept maps is concerned, in the early stages of analysis (prior to the use of software), the process often leverages group creativity and lay or expert interpretation. However, studies have questioned whether concept maps can be intuitively interpreted by anyone other than those involved in their creation (Brumby 1983; Kolb and Shepherd 1997; Turns, Atman and Adams 2000). Theorists such as Trochim (1989) have taken up this point when arguing that, if concept mapping were a credible scientific procedure (as opposed to a process designed to explore and represent knowledge), it would have initiated theoretical advances. However, such an innovation has yet to materialize. Moreover, at least insofar as a strictly positivist understanding of the scientific method is concerned (i.e., observations, hypotheses, predictions, experiments, verifications, etc.), the free-wheeling process of creating maps cannot be easily reconciled with the prescriptive and iterative sequence that is integral to the hypothetico-deductive approach (Kalleberg 2016). Consequently, once again at least from a narrowly positivist perspective, concept mapping is endemically at risk of not embodying characteristics that are often associated with the “doing of science” such as being replicable, verifiable (or at least falsifiable), based solely on amassed empirical (sense-based) data and subject to peer review.

In spite of its shortcomings, concept mapping has key advantages. For example, it has potential to attenuate the problem of “meandering” from observation to inference (Kinchin, Möllits and Reiska 2019; Novak 2009). This is so because as noted, at least in group contexts, the technique draws on the inherent statistical stability of multiple perspectives emerging from the same set of underlying data. In this latter sense, it is more easily classified as an instantiation of science and scientific protocols in that it is reconcilable with the Aristotelian classical model that embraces elementary forms of reasoning as well as compound forms (such as reasoning by analogy, the elements of which have parallels to concept mapping).

Clustered Representations

Clustering of data into more or less similar visual objects is typically used in data mining scenarios involving management and work-related phenomena (Berkhin 2006). The technique has been deployed most notably by Chernoff (1973) and more recently De Soete (1986), who each used a rich preexisting schema (the human face) to showcase variables. For example, it is possible to package state crime rates in a single human face depiction, with the overall height representing murder rate and the width of the eyes representing the relative rate of aggravated assault (see Figure 3). De Soete (1986:549) argues that faces offer viewers a package that reveals “phenomena that would be noticed less easily when the data were presented in tabular form [and] serve as mnemonic device [as well as] a straightforward means for communicating results to others.” The technique assumes that parallel delivery of data in an easy-to-comprehend format helps users remember a set of results. It also seeks to empower viewers to conduct their own informal and spontaneous calculations.
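The mapping from variables to facial features can be sketched in code. The example below is an assumption-laden illustration rather than Chernoff's own algorithm: each variable is rescaled linearly (min-max) across cases into the drawable range of one facial feature, and the resulting parameters could then be handed to any drawing routine. The function names, the feature ranges and the state figures are hypothetical.

```python
def rescale(values, out_min, out_max):
    """Linearly map values onto [out_min, out_max] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # no variation: place every case at the midpoint
        return [(out_min + out_max) / 2 for _ in values]
    return [out_min + (v - lo) / (hi - lo) * (out_max - out_min) for v in values]


def face_parameters(records, mapping):
    """Turn per-case variables into per-case facial feature sizes.

    records: {case: {variable: value}}
    mapping: {variable: (feature_name, out_min, out_max)}
    """
    cases = list(records)
    faces = {case: {} for case in cases}
    for variable, (feature, out_min, out_max) in mapping.items():
        column = [records[case][variable] for case in cases]
        for case, size in zip(cases, rescale(column, out_min, out_max)):
            faces[case][feature] = size
    return faces


if __name__ == "__main__":
    # Hypothetical rates per 100,000 residents, invented for illustration.
    states = {
        "State A": {"murder_rate": 2.0, "assault_rate": 100.0},
        "State B": {"murder_rate": 6.0, "assault_rate": 300.0},
        "State C": {"murder_rate": 10.0, "assault_rate": 200.0},
    }
    mapping = {
        "murder_rate": ("face_height", 0.5, 1.0),  # arbitrary drawing units
        "assault_rate": ("eye_width", 0.1, 0.3),
    }
    for state, features in face_parameters(states, mapping).items():
        print(state, features)
```

The design choice worth noting is that the assignment of variables to features is arbitrary: nothing about a murder rate suggests face height. This arbitrariness is one reason a viewer needs a legend before a face can be read.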

Figure 3.

An example of Chernoff’s clustered icons (faces).

Chernoff never claimed to break new ground in development of clustered representation techniques. Rather, more modestly, he asserted that “instead of using machines to discriminate between human faces by reducing them to numbers, [I] discriminate between numbers by using the machine to do the brute labor of drawing faces and leave […] the intelligence to humans, who are […] more flexible and clever” (Chernoff 1973:18-19). In modern parlance, Chernoff wanted the viewer to engage in sense-making (Langley 1999; Raciborski 2009). Such reader involvement in data-processing exists also with descriptive matrices. Indeed, such matrices display raw data that “both force and support analysis” where “local contexts are seen holistically, not lost in dispersed narrative” (Miles and Huberman 1984:26).

Research addressing the efficacy of Chernoff’s faces has mostly failed to confirm his hypotheses that viewers intuitively and spontaneously integrate multiple data sets at once and interrogate the data for themselves (Raciborski 2009). Rather, there is evidence that humans are less adapted to appreciating the multiple variables contained in a face than Chernoff hoped would be the case. For example, Morris, Ebert and Rheingans (2000) have shown that features of Chernoff’s faces were not “pre-attentive”; that is, they did not lend themselves to the supposed rapid delivery of multiple variables offered by data packaged in such an image. Notwithstanding such findings, derivatives of the Chernoff technique which depart from the face idea and rely on stars, circles, stick features, or glyphs are more easily and obviously decipherable (Ward, Patterson and Sifonis 2004). When they are successful, these approaches have in common an intuitive relationship between characteristics of underlying variables and the represented image. Hence, Chernoff’s faces techniques are likely to be useful in a restricted range of contexts, specifically those which present data from a cluster of variables associated with human physiognomy.

Limitations of Visual Display Techniques

Multivariate visual displays are widely regarded as being in their relative infancy (Hurley 2004; MacKay and Villarreal 1987). As such, they are often relegated to being an unsophisticated illustrative cousin to “real” analysis (Tufte 1997, 2007). Visual displays are, however, even at this stage, sophisticated enough for there to be a measure of consensus concerning how they should be deployed optimally. On this matter, Edward Tufte, the modern father of visual displays and author of The Visual Display of Quantitative Information (1983, 1990, 1997, 2006), captured the middle ground. In Tufte’s (1983:51) words:

Graphical excellence is a matter of substance, of statistics and of design. It consists of complex ideas communicated with clarity, precision and efficiency. It is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the shortest space […]. It is nearly always multivariate […]. And graphical excellence requires telling the truth about the data.

Later in the same seminal work, Tufte lamented the deficiencies of visual display techniques. Indeed, he gave a summary of their generic limitations, as indubitable as his subsequent critiques (e.g., 1990, 1997 and 2007) and as relevant in 2019 as it was in 1983. Tufte (1983:177) is worth quoting once again:

They (graphical displays) can be described and admired but there are no compositional principles on how to create that one wonderful graphic in a million. [The] best one can do for more routine workday designs is suggest some guidelines such as: have a properly chosen format and design; use words, numbers and drawings together; display an accessible complexity of detail and avoid content-free decoration and chart-junk.

In this section, three of the more obstinate shortcomings of graphical display techniques are examined: the limited-number-of-variables problem, the nature-of-the-relationship problem and the problem of representing the significance of a relationship or effect size.

The Limited Number of Variables Problem

With the exception of concept maps and charts that illustrate interactions (for instance moderating or mediating effects) but which do not present data, visual devices have typically struggled to present relationships between more than two variables. Although it is possible to find examples of graphic techniques which wrestle with three, four, or even five variables, in such cases the displays are often not easily decipherable. A case in point is the Cartesian-based double-X, Y graph. This strategy depicts three variables, two of which are routinely established as independent and one as dependent. Owing to their enhanced complexity when compared with the X–Y plane, such graphics are typically difficult to read, can be used to mislead, or are ambiguous (Wainer 2000; Wheelan 2013).

Like Cartesian displays, ISOTYPE presentations are mostly unable to depict multiple variables or, more technically, lose their potency in exponential proportion to the number of dimensions added (Lupton 1986). This limitation is present in Figure 2, which graphically configures four variables: (1) year of production, (2) pounds of production (measured in batches of 50 million), (3) number of home-based weavers (with each black man representing 10,000) and (4) number of factory-based weavers (with each gray man representing 10,000). The image is moderately successful in conveying its message because the last two variables are closely related conceptually. Hence, it is possible to appreciate readily the two main points the author is making. First, there is a phasing out of home-based weaving throughout the nineteenth century. Second, there is an uptick in weaving output over the same time frame. However, consider the hypothetical case where home-based production was not replaced by factory workers but by machines. Such a state would necessitate that the gray men be exchanged with images of (say) stylized steam engines. There would still be four variables being presented in the image, but there would also be more conceptual distance between the last two and it would be slightly more difficult to “see” its key points. A further degradation in intelligibility would ensue if a fifth variable was added. Consider, for example, the case where each “year-row-set” was not associated only with measures of output and “weaver location” variables but also stylized images of hamburgers, each of which was intended to portray total caloric intake for the weaving workforce for the years in question. If this were to be done, it would take a reader longer to appreciate each of the diagram’s central messages. This kind of thought experiment unearths a trade-off: the more points made, the greater processing time and resources required. 
In practice, processing becomes near impossible beyond a limited number of variables, a generic preoccupation of information processing theorists (Siew, McCartney and Vitevitch 2019).

Related to the problem of limitations in variable number is a similar challenge, well embodied in literature (e.g., Wheelan 2013) concerning value labels. The case of one nominal variable, the most basic descriptive scenario, illustrates this dilemma. Suppose a researcher wanted to illustrate using a graphic device the relative frequencies of different kinds of fruit that are grown commercially in the State of Florida. A pie chart does this job fairly well; however, according to some (e.g., Wainer 2000), a histogram would suit the purpose even better. The task is manageable if there are five, six, or seven fruit in the universe (Florida) of items to be represented. At some point, however, the number of variable value labels in a set imposes a burdensome strain on the graphical technique. This difficulty does not necessarily affect the spatial capacity of the approach to handle the case load but does diminish the technique’s utility. Put more simply, visual display devices are often perfectly capable of accommodating a large number of value labels; however, beyond a certain limit, they cease to do their real job, which is to make a point about what needs attention.

Insofar as concept maps are concerned, as noted, it is debatable whether such techniques really count as examples of approaches to data display. Regardless, as is evident in Figure 3, although concept maps are effective at handling several variables, from an end-user perspective, they remain liable to being ambiguous.

The Nature of the Relationship Problem

With the exception of concept maps, visual techniques do not easily represent the nature of a relationship between the variables they depict. Indeed, they do not necessarily even have face validity in the sense that they do not intuitively show how focal variables are connected. This shortcoming is usually overcome using convention. For example, when doing research in the social sciences, analysts will normally have a hypothesis about cause and effect (Dumez 2013) and design a study (for instance a field or natural experiment) to test their suspicion about the natural order. In presenting their data pictorially, they often use a graph to plot the cause (independent variable) on the x-axis and the effect (dependent variable) on the y-axis. The American Psychological Association’s (1994:56) guidelines are prescriptive on this point: “Use the X-axis to plot value labels for the independent variable. Use the Y-axis to plot value labels for the dependent variable. Place headings on each axis accordingly.” This counsel, offered without rationale, is a thinly veiled admission that there is nothing inherent in Cartesian techniques requiring, for reasons of logic, that axes be used in the way that has become orthodox.

Although to a lesser extent than Cartesian techniques, ISOTYPE displays rely on convention to imply causation or, at least, suggest a temporal or spatial ordering of events. Figure 2 indicates how such convention typically operates. In relation to this graphic, one may ask: is the year “causally prior” to output levels or weaving locations? The figure implies that it is. If the graphic had put output where each year is depicted and a horizontal bar representing reference year where output data are currently presented, it would have conveyed a different (and somewhat bizarre) message.² The change would come about because convention prescribes that, in a situation where there is one putative causal variable and several putative effects variables, the causal element be separated from its effects.

Concept maps are adapted to showing relationships between variables as perceived by the researcher and to stimulating the viewer in imagining new relationships (Davies 2011; Novak 2009). In this sense, they are unlike most other visual display methods and perhaps do not really count as examples of data presentation techniques (or, as discussed, an instantiation of science and scientific protocols when seen from a narrow perspective). Furthermore, like the other approaches considered and owing largely to the number of variables dilemma, concept maps sometimes become overwhelming visually which limits their third-party/end-user consumption (Davies 2011).

The Significance of the Relationship/Size of Effect Problem

Visual display techniques are poorly adapted to demonstrating statistical significance. It is of course possible to augment elements of an image to show statistical significance. Indeed, this is routinely done by placing, for example, asterisks beside bars (value labels often depicting frequencies) on Cartesian-based portrayals (Wainer 2000; Wheelan 2013). Such augmentations rest (and must rest) on independent statistical analysis, the output of which is then combined with the graphical device. In this sense, the visual presentation is not a substitute for statistical manipulation but rather supplemental in nature, used to make a point. Moreover, although this possibility is rarely embraced, visual displays such as Chernoff’s faces arrays provide an opportunity to show an effect size but are poorly adapted for demonstrating statistical significance. However, there are exceptions. For example, using specified conventions (perhaps a cross-eyed effect and so on), a difference that is statistically significant but of negligible magnitude can be made apparent in such displays.
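The distinction between significance and effect size drawn above can be made concrete with a small numerical sketch. The following is illustrative only: the means, standard deviation and sample sizes are invented, the function names are hypothetical, and a simple two-sample z-test with a known pooled standard deviation is assumed rather than taken from any of the cited works.

```python
import math
from statistics import NormalDist


def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized effect size: mean difference in units of the pooled SD."""
    return (mean_a - mean_b) / pooled_sd


def two_sided_p(mean_a, mean_b, pooled_sd, n_per_group):
    """Two-sided p-value from a z-test (assumes equal group sizes, known SD)."""
    standard_error = pooled_sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Invented illustrative values: a tiny difference between two group means.
d = cohens_d(50.2, 50.0, 10.0)
p_large_sample = two_sided_p(50.2, 50.0, 10.0, 100_000)
p_small_sample = two_sided_p(50.2, 50.0, 10.0, 50)

print(f"effect size d = {d:.3f}")
print(f"p with 100,000 cases per group = {p_large_sample:.6f}")
print(f"p with 50 cases per group = {p_small_sample:.3f}")
```

Under these assumptions the same negligible effect is significant in the very large sample and not in the small one, which is why a display that shows only magnitude, or only an asterisk, tells half of the story.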

Clustered Iconography

According to Tufte (1983, 1990, 1997, 2007), the most compelling visual display ever produced was a representation of Napoleon’s march on and retreat from Moscow, rendered in 1869 by Charles Joseph Minard (reproduced in Figure 4). This depiction reconciles six variables: army size, army location in two dimensions, direction of army movement and temperature on various dates during the retreat. The image is read and absorbed effortlessly. Perhaps this is mostly so because there is something of the quintessential picture about Minard’s rendering. As such, it (at least partially) concretizes the abstract under the influence of privately understood, but nonetheless generic, principles. For present purposes, the image appears to resolve unselfconsciously the aforementioned three problems of graphical display while handling dimensions that are neither intuitively associated nor within the average viewer’s realm of daily experience.

Figure 4.

A representation of Napoleon’s march on and retreat from Moscow (Minard [1869] from Tufte 1983).

In his own way, Minard was an artist. Unfortunately, he left no instructions concerning how to produce similar graphics for other phenomena. As a result, the viewer is left both in awe and frustrated at not being able to borrow the patent. The beguiling questions are: how did Minard decide what variables are consequential? Did he have a priori principles for juxtaposing these? If so, what were they? How did they become the principles (i.e., what meta-principles were in play)? Although answers are elusive, it is noncontroversial to conclude that the work itself is a very different kind of rendering to that which typically gets used by contemporary sociologists to represent multivariate results. Indeed, Tufte (1983, 1990, 1997, 2007), in marveling at Minard’s handiwork, bemoaned such a paucity of adventurism, noting that the modern social scientist rarely gets beyond scatterplots or bar plots, often producing an image that takes too long to process while delivering underwhelming substance.

The approach to be presented in the current article for displaying visual knowledge, clustered iconography, is inspired by a hard and detailed look at Minard’s image. As such, it represents an attempt to pin down those elusive principles that guided Minard’s design. From a more technical perspective, the proposed approach offers a partial remedy to several of the problems associated with established techniques, summarized earlier as boiling down to three generic problems. In making progress with each of these, the new method also breaks free from the aforementioned trade-off dilemma that was described in the historical survey of graphical techniques (i.e., that removal of one problem is associated with emergence of another). Indeed, clustered iconography incorporates the notion that visual data displays act as a trigger for ideas and facilitate an understanding of relationships between variables. It also conveys the (statistical) significance of those relationships. More specifically, the new approach addresses a key weakness of Chernoff’s construct, his overly optimistic assumption that humans undertake “natural” analysis of data when it is encoded in a face (Raciborski 2009). With respect to this hypothesis, Chernoff failed to embrace a key edict of the Vienna Method: Graphical depictions must have an intuitive relationship with what they represent if they are to be effective. This principle is at the heart of clustered iconography.

In addition to drawing on aspects of Chernoff’s approach and conventions borrowed from the Vienna Method, clustered iconography relies on orthodox Cartesian geometry and mathematical principles. What emerges is a data presentation technique deploying four key doctrines, three of which derive from its predecessors in the field of visual statistics. It is to these doctrines that the discussion now turns.

Natural Signs

Unlike traditional concept mapping and especially Chernoff’s faces, clustered iconography makes use of ‘natural signs’ wherever possible. That is, clustered iconography relies on visual representations when these are intuitively associated with the represented construct. This feature is indebted to the ISOTYPE lexicon of symbols that were established to enable rapid uptake of a key message. An emphasis on the representative value of an image puts clustered iconography somewhat close to the Vienna Method and as such a solution to one of the key problems of the Chernoff conceptualization. However, unlike ISOTYPE, clustered iconography does not demand that the symbol and the symbolized be associated in a manner that is immediately apparent. Rather, the clustered iconographic designer should balance symbol fidelity with pragmatism in achieving intelligible clusters.

Clustering

To facilitate grouping, clustered iconography does not adhere strictly to true iconic representation. More particularly, the new approach violates the first rule of ISOTYPE, which states that variation in scale (e.g., frequency or size) should be represented by repeated pictograms rather than proportionally larger or smaller images. As mentioned, Neurath considered use of scale to represent quantitative differences as potentially creating ambiguity because it leaves readers unsure whether to choose icon height or surface area as an indicator of relative magnitude. Abandoning this principle, however fraught or valid, creates a key difference between clustered iconography and its predecessor techniques. Specifically, it permits the new approach to depict multiple variables. Clustered iconography aims to display not one or two (as in the case of ISOTYPE and concept mapping, if to a lesser extent) or three (Cartesian geometry) but multiple variables in a single combined pictogram. It also allows for the illustration of several relationships at once, although in a less direct manner than, say, Cartesian geometry.
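Neurath’s concern about height versus surface area can be illustrated numerically. The sketch below is a hedged illustration using invented quantities and hypothetical function names; it simply contrasts two ways a designer might scale a circular icon for a case four times as large as a reference case.

```python
import math


def radius_for_area_encoding(value, reference_value, reference_radius):
    """Radius chosen so that the circle's AREA is proportional to the value."""
    return reference_radius * math.sqrt(value / reference_value)


def radius_for_height_encoding(value, reference_value, reference_radius):
    """Radius chosen so that the circle's HEIGHT (diameter) is proportional to the value."""
    return reference_radius * (value / reference_value)


def circle_area(radius):
    return math.pi * radius ** 2


# Invented values: the second case is four times the first.
small_value, large_value = 100, 400
reference_radius = 10.0

r_area = radius_for_area_encoding(large_value, small_value, reference_radius)
r_height = radius_for_height_encoding(large_value, small_value, reference_radius)

# How much bigger does each icon look when judged by surface area?
apparent_ratio_area_rule = circle_area(r_area) / circle_area(reference_radius)
apparent_ratio_height_rule = circle_area(r_height) / circle_area(reference_radius)

print(r_area, r_height)
print(round(apparent_ratio_area_rule, 6), round(apparent_ratio_height_rule, 6))
```

Under the height rule the larger icon covers sixteen times the surface of the smaller one although the underlying quantity is only four times greater, so a chart needs to state which rule it follows.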

A central principle of ISOTYPE displays was the presentation of a single theme, even though many of Neurath’s diagrams embodied more than one variable. In a clustered iconography display, each clustered icon represents one data source, whether a single identifiable group such as an organization or family, or an individual (see, e.g., Figure 5). In adding variables to a clustered icon, the designer balances the risk of exacerbating viewer confusion with that of forgoing one of the key advantages of clustered iconography, empowering at the same time the reader to take an active role in data interpretation in a manner not dissimilar to that of concept maps. However, in addition to its ability to handle variable case load, clustered iconography disengages from the constructivist philosophy of concept mapping which uses maps or pictures to enable a researcher to speculate about reality (Novak 2009). Rather, the new method allows greater and more disciplined engagement with raw data. Unlike concept mapping, it establishes the audience, as opposed to the creators, as the central interpretive actor.

Figure 5.

Family units presented in a clustered iconography format, with each circle “containing” a cluster of variables (Muurlink and Islam 2010).

When handling data obtained using ratio-measure scales (i.e., those with a zero-point), current standards oblige social science researchers to reveal, at a minimum, descriptive sample statistics (e.g., American Psychological Association 1994). Clustered iconography is compatible with such disclosure but places emphasis on the display of variables grouped on a case-by-case basis. For example, a corporation, a number of employees or a management structure appears as a single, compact semiotic cluster.

Figure 5 presents a clustered iconographic format to represent 60 urban (Dhaka) and rural (Barguna) Bangladeshi families (Muurlink and Islam 2010). In this graphic, several household characteristics are summarized in a single iconic cluster: (1) house size (represented by the size of the enclosing circle), (2) family income (represented by the thickness of the circle circumference), (3) child numbers (represented by the number of human icons within the circle), (4) child physical and mental health and (5) maternal life satisfaction. The study explores the consequences of displacement of populations from coastal regions such as Barguna to the slums of Dhaka. Using the figure’s simultaneous presentation of massed data images rather than summary statistics, the viewer is encouraged to conduct proto-statistical procedures. The key points are relatively easy to appreciate: Dhaka households are smaller, richer and have more satisfied matriarchs than Barguna households.
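The mapping from a household record to a single clustered icon can be sketched in code. This is a minimal illustration, not the procedure used to draw Figure 5: the `Household` fields, the scaling constants and the function name are all assumptions, the record is invented, circle area (rather than radius) is assumed to carry house size, and the encodings for child health and maternal life satisfaction are omitted because the text does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical field names and units, for illustration only.
    house_area_m2: float
    monthly_income: float
    n_children: int


def household_icon_svg(h, cx, cy, px_per_sqrt_m2=6.0, px_per_income_unit=0.0005):
    """Return one SVG group: an enclosing circle plus one dot per child."""
    # Assumption: circle AREA tracks house size, so radius grows with the square root.
    radius = px_per_sqrt_m2 * math.sqrt(h.house_area_m2)
    # Circumference thickness tracks income, with a visible minimum of 1 px.
    stroke = max(1.0, px_per_income_unit * h.monthly_income)
    parts = [
        f'<circle cx="{cx}" cy="{cy}" r="{radius:.1f}" '
        f'fill="none" stroke="black" stroke-width="{stroke:.1f}"/>'
    ]
    # Children appear as small dots spaced evenly along a horizontal line.
    for i in range(h.n_children):
        offset = (i - (h.n_children - 1) / 2) * 8
        parts.append(f'<circle cx="{cx + offset:.1f}" cy="{cy}" r="3" fill="black"/>')
    return "<g>" + "".join(parts) + "</g>"


# Invented example record, not taken from the study.
icon = household_icon_svg(Household(25.0, 8000.0, 3), cx=100, cy=100)
print(icon)
```

Because every attribute of the icon is computed from one record, repeating the call for each family and placing the results in two groups would yield the kind of massed display described above.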

Spatial Distribution of Icons

Clustered iconography reintroduces axes of the Cartesian graph or other forms of spatial distribution of clusters. By restoring a plane-based coordinate system to visual statistical representations (as in Figure 6), two concepts out of three or more are foregrounded in a familiar two-dimensional depiction. This advantage is achieved because a Cartesian grid enables clustered icons to be positioned along two dimensions, with distances represented proportionally (this Cartesian display is optional and contingent on actual data; other systems of spatial distribution of clustered icons are possible). Such familiar geographical ‘maps’ are intuitive and lend themselves naturally to research employing global positioning system or geographic information system technology. Presenting clustered icons on maps allows the telling of research stories in a way that using statistics derived from the general linear model cannot achieve. It also allows for the simultaneous showcasing of variables from dissimilar scales of measure including nominal, ordinal, interval and ratio. In this latter respect, it is sufficiently versatile to handle multimethod derived results and thus respond sensitively to Goering and Streiner’s (2013) aforementioned qualitative/quantitative “reconcilable differences” dilemma.

Figure 6.

A clustered iconographic chart representing the relationship between managerial stress and policy adherence in a group of five companies on two axes. The icons simultaneously express the size of company (reflected by the size of circles) and core management team (the thickness of the “axle” around which the companies pivot), profitability (the thickness of the circumference) and growth (the size of the spurs; Muurlink et al. 2012).

Ranking

While eliminating the rule of proportionality in presenting ISOTYPE symbols, clustered iconography, notwithstanding its versatility, is better adapted for categorical, ranked, or ordinal relationships than it is for interval or ratio relationships.³ By combining ranked and categorical variables, the new method preserves the reflective nature of concept mapping. However, it also allows interrater agreement to converge, even in qualitative research. Reducing ambiguity in this way forms part of a strategy described by Guttman (1944) as the “quantification” of qualitative data (see also Van de Ven and Poole 1990). Using ranking to structure data does not require stringent assumptions about a data set aside from the presupposition that elements are ordinal (Guttman 1944; Siegel 1957).

Ranking partially assuages Neurath’s (1936) concern about the ambiguous nature of scaled icons. Specifically, the clustered iconographic approach makes fewer assumptions about absolute values and more about relative values, with absolute values being sacrificed to allow more compact depiction of multiple variables. However, where researchers have access to quantitative ratio or interval data, they can choose to use clustered items that preserve proportionality in their expression (as is done in Figure 5, where income and house size were conceived of and operationalised as continuous variables and represented proportionally).
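The ranking step lends itself to a short sketch. The example below assumes one possible convention (dense ranking, in which tied cases share a rank) and an arbitrary mapping from rank to icon size; the growth figures and function names are invented for illustration.

```python
def dense_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the same rank."""
    ordered = sorted(set(values))
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return [position[value] for value in values]


def rank_to_size(rank, base=6, step=4):
    """Map an ordinal rank to an icon dimension in equal visual steps."""
    return base + step * (rank - 1)


# Invented growth rates for five cases; only their order is used.
growth = [0.02, 0.15, 0.02, 0.40, 0.08]
ranks = dense_ranks(growth)
sizes = [rank_to_size(r) for r in ranks]

print(ranks)  # [1, 3, 1, 4, 2]
print(sizes)  # [6, 14, 6, 18, 10]
```

The equal steps between sizes make plain that only order is being conveyed: the case with growth of 0.40 is drawn one step above the case with 0.15, not almost three times larger.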

Figure 6 presents an example of a clustered iconographic chart depicting management-related data using ranking based on interval or ratio-scaled variables (Muurlink et al. 2012). The project drew on five case studies of companies, with aliases Queentech, Bourke, Airpower, Orange and Equal. Absolute rather than ranked data were available for the size of the firms, with such sizes being measured through employee number (expressed in area of the circle) to represent proportionally the size of the firms in the study. The absolute size of the management team is indicated by the size of the pivot of each circle/icon. Lastly, the axes represent the two focal variables, managerial stress and degree of policy formality.
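A chart of this kind combines a position on two axes with several icon attributes per firm. The following sketch shows one way such a layout might be computed. It is an assumption-laden illustration rather than the procedure behind Figure 6: the record is invented, the 1 to 5 scales, the choice of which variable goes on which axis, the scaling constants and all names are hypothetical.

```python
import math


def scale_linear(value, lo, hi, out_lo, out_hi):
    """Map a value from the range [lo, hi] onto the range [out_lo, out_hi]."""
    return out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)


def layout_firm(firm, width=400, height=300, margin=40):
    """Compute the position and icon geometry for one firm record."""
    # Assumption: stress on the horizontal axis, formality on the vertical axis.
    x = scale_linear(firm["stress"], 1, 5, margin, width - margin)
    # Screen coordinates grow downward, so the vertical scale is inverted.
    y = scale_linear(firm["formality"], 1, 5, height - margin, margin)
    return {
        "name": firm["name"],
        "x": x,
        "y": y,
        # Circle area proportional to employee number, hence the square root.
        "radius": 2.0 * math.sqrt(firm["employees"]),
        # Pivot size follows the absolute size of the management team.
        "pivot_radius": 1.0 * firm["managers"],
        # Ranked variables drive circumference thickness and spur length.
        "stroke_width": firm["profit_rank"],
        "spur_length": 4 * firm["growth_rank"],
    }


# Invented record; "Firm A" is a placeholder, not one of the study's cases.
example = layout_firm({
    "name": "Firm A", "stress": 3, "formality": 5,
    "employees": 100, "managers": 4, "profit_rank": 2, "growth_rank": 1,
})
print(example)
```

Keeping position and icon geometry in one record per firm means a cluster can later be moved to a different background (other axes, or simple groupings) without redrawing the icon itself.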

Figure 6 bears resemblance to portfolio matrices that became popular in the 1980s (e.g., Hambrick, MacMillan and Day 1982; Marrus 1984), in that it features iconic and contextualized summaries of firms. However, unlike portfolio matrices, clustered iconography is able to depict multiple variables of a single firm captured in one cluster, with the possibility of an open-ended number of clusters to show interorganizational trends as well as unique intra-organizational characteristics.

While the Muurlink et al. (2012) research adhered to the conventions of a qualitative case study (e.g., Yin 2009), the use of clustered icons allows quantitative and qualitative elements to be integrated into a single graphical representation. For example, the representation shown in Figure 6 suggests a relationship between formality and stress, a finding that animates the article’s central message. The graphic allows for other putative causal relationships to be largely discounted. If such diagrams are presented as a series, with variables added one or two at a time, it is possible to build illustrations that represent a holistic picture of the entity (in this case a firm) without betraying the identity of the individual entity. A further advantage of clustered iconographic charts revealed in this example is that they allow for fine-grained presentation of data while preserving the anonymity of individual cases.

Design Guidelines and Advantages; Limitations of Clustered Iconography

One way to assess the value of clustered iconography is to examine whether the method (a) represents progress in the quest to resolve the three key problems presented earlier⁴ and (b) does not come with limitations that offset its benefits.

Design Guidelines

With or without the help of software, the thorniest challenge when using clustered iconography remains the creation of the chart. A handful of general principles will help simplify the task. These principles, presented in Table 1, are not associated rigidly with a correct order (linear sequence) and hence are not numbered.

Table 1.

Generic Principles for Creating Clustered Iconographic Charts.

Principle

With small data sets, researchers assemble the variables they wish to highlight into a grid using, for example, standard spreadsheet software. Aspects of the cases conceptualized as rankings are converted at this early stage.

Choose variables (if any) to be foregrounded in the charting process and decide whether (as in Figure 5) to divide the clustered icons into two or more groups to illustrate differences. The variable chosen to enclose or cluster icons should be focal in making such a judgment.

Characteristics of the icons should be chosen based on an apparent intuitive relationship that they have with a referent (in Figure 6, growth is represented by spurs or arrows). As in Figures 5 and 6, their size should correspond to the relative size of the represented entity. By grouping the icons in clusters, it is possible to shift them easily on the background (whether axes or groupings) at a final stage to illustrate different aspects of the results.

With larger data sets, clustered iconography designers can choose from among several ways to reduce the data to manageable proportions; for example, they could use a randomly selected subset, group summary data, or use icons from extreme and mean cases.
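The three data-reduction options named in the final principle of Table 1 can be sketched as follows. This is a hedged illustration under stated assumptions: cases are held as plain dictionaries, the field names (`id`, `site`, `income`) and the values are invented, and “mean case” is read here as the case lying closest to the mean on a chosen variable.

```python
import random
import statistics


def random_subset(cases, k, seed=0):
    """Option 1: a reproducible random sample of k cases."""
    return random.Random(seed).sample(cases, k)


def group_means(cases, group_key, value_key):
    """Option 2: one summary value per group, from which aggregate icons can be drawn."""
    groups = {}
    for case in cases:
        groups.setdefault(case[group_key], []).append(case[value_key])
    return {group: statistics.fmean(values) for group, values in groups.items()}


def extreme_and_mean_cases(cases, key):
    """Option 3: the lowest case, the case nearest the mean, and the highest case."""
    lowest = min(cases, key=lambda c: c[key])
    highest = max(cases, key=lambda c: c[key])
    mean_value = statistics.fmean([c[key] for c in cases])
    nearest_to_mean = min(cases, key=lambda c: abs(c[key] - mean_value))
    return [lowest, nearest_to_mean, highest]


# Invented cases for illustration only.
incomes = [10, 20, 30, 40, 100]
sites = ["urban", "urban", "rural", "rural", "rural"]
cases = [{"id": i, "site": s, "income": v} for i, (s, v) in enumerate(zip(sites, incomes))]

subset = random_subset(cases, 2)
summary = group_means(cases, "site", "income")
picked_ids = [c["id"] for c in extreme_and_mean_cases(cases, "income")]

print(subset)
print(summary)
print(picked_ids)  # [0, 3, 4]
```

Each option trades detail for legibility in a different way: sampling keeps individual cases but drops most of them, group summaries keep every case but only in aggregate, and extreme-plus-mean selection keeps the range and the center while hiding the distribution in between.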

Advantages

Insofar as issue (a) is concerned, clustered iconography partially resolves the more persistent problems of traditional visual display techniques. It offers researchers a means of showcasing multiple variables in a single two-dimensional representation, making use of relationships that are either intuitively familiar (through the use of icons) or familiar by virtue of widespread convention.

In contrast to some of its predecessors, clustered iconography is not formulaic. Rather, like concept mapping, it prescribes no “right” way for chart creation and indeed cannot do so on the grounds of logical entailment. Instead, it offers guiding principles, summarized in Table 1. These include using icons with characteristics intuitively linked to what is being represented; establishing clusters to represent single organisms, organizations, or groups; and optionally using ranking and spatial arrangement to enhance either the number of variables represented or the relationship between what clusters represent.

Another area where clustered iconography represents an advance over traditional graphical display techniques is that it facilitates transparency in results presentation. For example, it allows the reader to drill into individual cases and generate alternate hypotheses while enabling the authors to retain control of the points they are making. The method is also well aligned with the philosophy that underpins the open-data, open-science, open-knowledge movements (e.g., “European Countries” 2018; Ibanez, Schroeder and Hanwell 2014; Molloy 2011) because it encourages publication of raw, or nearly raw, data in a meaningful way rather than following statistical transformation. This advantage has special import in the contemporary Western public policy context where there is an emerging imperative to open-up publicly funded research to broader scrutiny (Eisenberg and Nelson 2002; “European Countries” 2018). It is also noteworthy that, at least in the United Kingdom, the issue of accessibility impacts grant money allocation decisions (Suber 2012).

Publishing data in “raw” rather than “processed” form using clustered iconography allows inclusion of outliers in a manner that does not bias analysis. As Langley (1999:707) put it in commenting on the importance of enabling a with/without analysis of entities, “variety contributes to richness.” In practice, an outlier in a graphic display can be either included in or eliminated from interpretation. Clustered iconography allows extremes to be compared, removing, for example, an exclusive focus on orthodox measures of central tendency. In this, the method allows the representation of both patterns and exceptions to these patterns.

When presented with structured charts of massed clustered data, the viewer of a clustered iconographic chart is encouraged to conduct simple statistical procedures. Hence, the technique makes progress in the quest to devise a visual device that allows some insight into issues of statistical significance and effect size. Tufte (2007:127) argued that pictorial displays should draw attention to comparisons only “if they (such comparisons) are to assist thinking.” In that regard, the use, in clustered iconography, of axes or other forms of spatial arrangement to establish two or more variables as focal is visually suggestive of causation. Indeed, because clustered iconography draws on one of the strengths of concept mapping (the notion that maps should act as a trigger for ideas), the method represents the best that traditional visual techniques have to offer in showing variable association.

Insofar as the aforementioned issue (b) is concerned (whether limitations overwhelm the advantages of a technique), the problems associated with clustered iconography do not so much represent new dilemmas as scaled-down versions of old ones. For example, aside from technical difficulties associated with creating a clustered iconographic chart, those producing such illustrations may have too many data clusters to present on an A4 or Letter page, the format typically used. However, the extent of this limitation should be evaluated in a relative sense via a process of comparison with rival visual techniques. Indeed, all conventional procedures ultimately reach the point of diminishing returns when variable number, value labels and data points exceed thresholds, which are in practice often low.

Limitations

Clustered iconography diagrams have limits on how many data points they can handle. As alluded to earlier, these are largely imposed by the size of a page. Beyond a certain threshold, descriptive sample statistics such as means and standard deviations become necessary. Nevertheless, while the new method is ideal for illustrating smaller data sets, icons created from aggregated data are able to handle larger caseloads.

Although clustered iconographic illustrations can appear overly ‘busy’ or burdensome, they do give the researcher an opportunity to present a comprehensive overview of data in a single, relatively compact illustration. Whatever the case, it is surprising that the charge of ‘busyness’ is mostly leveled at visual techniques. Even though it is rarely done, the same concern can be raised about, for example, the outputs from multiple regression analyses. These techniques, representing the purest instantiation of the general linear model, also arose from a need to examine several variables simultaneously. For myriad reasons, they often generate confusing and somewhat opaque results (Wheelan 2013).

Organizations, groups, tribes, clubs and families are dynamic multidimensional entities. While clustered iconography addresses the need for a comprehensible, systematic and simultaneous representation of multiple variables, a two-dimensional static graphical representation will inevitably struggle to represent dynamism. The growth spurs presented on, for example, Figure 6’s icons partially overcome the problem. With the advent of electronic journals, it is also conceivable that computer-aided visual data displays could use clustered iconography to capture dynamism in looped animated sequences. At the time of writing, however, no such software exists.⁵

Conclusion

In presenting and defending a new visual technique, this article has sought to enhance the comprehensibility of statistical information and address information anxiety. Clustered iconography supplements rather than replaces current research exposition conventions such as the presentation of means, standard errors and probability scores. While four factors (namely iconography, clustering of variables, use of Cartesian dimensions and ranking in presenting data) characterize clustered iconography charts, only two of these (clustering and use of icons) are fundamental to the technique. Indeed, the utility of the method derives in part from an absence of rules.

None of the characteristics of clustered iconography is unique to the technique, with Chernoff’s faces and ISOTYPE being the most direct predecessors. This lineage means that clustered iconography has a “family resemblance” to graphs, concept maps and visual display languages. However, despite sharing characteristics with these techniques, the new approach is not a derivative of an existing representational method.

In common with Chernoff’s faces, clustered iconographic displays bundle variables in a single icon. However, the technique uses both intuitive and nonfixed relationships between icons and referents as well as spatial distribution to enhance instinctive interpretation of complex patterns. Like concept maps and Chernoff’s faces, the technique is more than a passive means for presenting ideas. Rather, it acts as a trigger for interpretation by the reader and potentially a pathway to independent analysis. As with graphs, clustered iconography relies on intuitive spatial distributions but is not as limited in the number of variables it can handle or otherwise restricted by only being able to present summative variables. Like visual display languages, clustered iconography presents massed, intuitive icons, but unlike visual display languages, it enables multiple variables to be presented. It can also present individual data points rather than summary statistics and in so doing encourages exploration of outlier cases, as well as contrasts between groups of cases.

In common with ISOTYPE, clustered iconography aims at presenting a ‘natural’ reality that is deemed to exist objectively. For instance, the method relies on vivid icons, the meaning of which is (ideally) transparent because the existence of what they stand for is not in doubt. In common with concept maps, however, clustered iconography also enables the visual representation of relationships between variables, not only as the researcher identified them but also as the end user of the display can imagine them. A degree of subjectivity is retained; whatever knowledge the display contains is therefore neither fixed nor forced upon the reader. In this sense, clustered iconography is an attempt to bridge, visually if not philosophically, positivism and constructivism. The technique should therefore be especially appealing to social science researchers.

In the final analysis, clustered iconography provides researchers, particularly in the social sciences, with a new option for illustrating relationships, be they range-dependent, nonlinear, or composed of interactions between qualitative and quantitative entities. Research questions are rarely simple or able to be summed up as a relationship between two variables and a handful of value labels. Reality does not necessarily cooperate with the analytic tools at the researcher’s disposal, and intuitive representation of complexity is an ongoing challenge. Clustered iconography is neither a panacea for all that is wrong with visual displays nor a universal alternative to formula-based data analysis. Moreover, it is certainly not a flawless, all-purpose substitute for statistical manipulation. Rather, clustered iconography is another weapon in an arsenal of techniques.

Footnotes

Acknowledgments

The authors would like to acknowledge the assistance of Ms Deborah J. Schmidle, former librarian at the Catherwood Library, Cornell University. This study would not have come to fruition without Ms Schmidle’s contribution and support.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Jean-Etienne Joullié

Notes

References

American Psychological Association. 1994. Publication Manual of the American Psychological Association, 4th ed. Washington, DC: Author.

Anscombe, F. J. 1973. “Graphs in Statistical Analysis.” American Statistician 27(1):17–21.

Barney, E. C., J. J. Mintzes, and C. F. Yen. 2005. “Assessing Knowledge, Attitudes, and Behavior toward Charismatic Megafauna: The Case of Dolphins.” Journal of Environmental Education 36(2):41–55.

Beninger, J. R., and D. L. Robyn. 1978. “Quantitative Graphics in Statistics: A Brief History.” American Statistician 32(1):1–11.

Berkhin. 2006. “A Survey of Clustering Data Mining Techniques.” Pp. 25–71 in Grouping Multidimensional Data, edited by Kogan, Nicholas, and Teboulle. Berlin, Germany: Springer.

Bertin. 1983. Semiology of Graphics: Diagrams, Networks, and Maps. Madison: University of Wisconsin Press.

Bliss, E. K. 1965. Semantography (Blissymbolics): A Simple System of 100 Logical Pictorial Symbols, Which Can Be Operated and Read Like 1+2=3 in all Languages, 2nd ed. Sydney, Australia: Semantography.

Bonsiepe. 1965. “Visual/verbal Rhetoric.” Ulm: Journal of the Ulm School of Design 14/15/16:23–40.

Bresnahan. 2011. “An Unused Esperanto: Internationalism and Pictographic Design, 1930–70.” Design and Culture 3(1):5–24.

Brumby. 1983. “Concept Mapping: Structure or Process?” Research in Science Education 13(1):9–17.

Buckland. 2017. Information and Society. Cambridge: MIT Press.

Cartwright, Cat, Fleck, and T. E. Uebel. 1996. Otto Neurath: Philosophy between Science and Politics. New York: Cambridge University Press.

Cartmill and F. H. Smith. 2009. The Human Lineage. Hoboken: John Wiley & Sons.

Cat and A. T. Tuboly, eds. 2019. Neurath Reconsidered: New Sources and Perspectives. Cham, Switzerland: Springer.

Chamorro-Premuzic. 2014. “How the Web Distorts Reality and Impairs our Judgement Skills.” The Guardian (May 13). Retrieved March 1, 2018 (https://www.theguardian.com/media-network/media-network-blog/2014/may/13/internet-confirmation-bias).

Chatterjee and Firat. 2007. “Generating Data With Identical Statistics but Dissimilar Graphics: A Follow up from the Anscombe Dataset.” American Statistician 61(3):248–54.

Charles S. Chapin. 1924. Journal of Education 99(15):398.

Cheng and H. A. Simon. 1995. “Scientific Discovery and Creative Reasoning with Diagrams.” Pp. 205–28 in The Creative Cognitive Approach, edited by Smith, Ward, and Fink. Cambridge: MIT Press.

Chernoff. 1973. “The Use of Faces to Represent Points in k-dimensional Space Graphically.” Journal of the American Statistical Association 68(342):361–68.

Chew, P. K., and D. B. Dillon. 2014. “Statistics Anxiety Update: Refining the Construct and Recommendations for a New Research Agenda.” Perspectives on Psychological Science 9(2):196–208.

Cohen, I. B. 1984. “Florence Nightingale.” Scientific American 250(3):128–37.

Davies. 2011. “Concept Mapping, Mind Mapping and Argument Mapping. What are the Differences and Do they Matter?” Higher Education 62:279–301.

De Soete. 1986. “A Perceptual Study of the Flury-Riedwyl Faces for Graphically Displaying Multivariate Data.” International Journal of Man-Machine Studies 25(5):549–55.

Digital Universe. 2012. Retrieved December 8, 2019 (https://www.prnewswire.com/news-releases/new-digital-universe-study-reveals-big-data-gap-less-than-1-of-worlds-data-is-analyzed-less-than-20-is-protected-183025311.html).

Dumez. 2013. Méthodologie de la recherche qualitative: 10 questions clés de la démarche compréhensive. Paris, France: Magnard-Vuibert.

Eisenberg and Nelson. 2002. “Public vs. Proprietary Science: A Fruitful Tension?” Daedalus 131(2):89–101.

“European Countries Demand That Publicly Funded Research Be Free.” 2018. The Economist (September 15). Retrieved December 1, 2019 (https://www.economist.com/science-and-technology/2018/09/15/european-countries-demand-that-publicly-funded-research-be-free).

Fitzgerald, S. M., S. J. Jurs, and L. M. Hudson. 1996. “A Model Predicting Statistics Achievement among Graduate Students.” College Students Journal 30(3):361–66.

Goering and Streiner. 2013. “Reconcilable Differences: The Marriage of Qualitative and Quantitative Research Methods.” Pp. 225–41 in A Guide for the Statistically Perplexed: Selected Readings for Clinical Researchers, edited by D. L. Streiner. Toronto, Canada: University of Toronto Press.

Guttman. 1944. “A Basis for Scaling Qualitative Data.” American Sociological Review 9(2):139–50.

Hambrick, D. C., I. C. MacMillan, and D. L. Day. 1982. “Strategic Attributes and Performance in the BCG Matrix: A PIMS-Based Analysis of Industrial Product Businesses.” Academy of Management Journal 25(3):510–31.

Hart. 1896. “Immigration and Crime.” American Journal of Sociology 2(3):369–77.

Hellems, M. A., M. J. Gurka, and G. G. Hayden. 2007. “Statistical Literacy for Readers of Pediatrics: A Moving Target.” Pediatrics 119(6):1083–88.

Healy and Moody. 2014. “Data Visualization in Sociology.” Annual Review of Sociology 40:105–28.

Holmes. 1984. Designer’s Guide to Creating Charts and Diagrams. New York: Watson-Guptill.

Holmes. 1993. The Best in Diagrammatic Graphics. Mies, Switzerland: Rotovision.

Hurley, C. B. 2004. “Clustering Visualizations of Multidimensional Data.” Journal of Computational and Graphical Statistics 13(4):788–806.

Ibanez, W. J. Schroeder, and M. D. Hanwell. 2014. “Practicing Open Science.” In Implementing Reproducible Research, edited by Stodden, Leisch, and R. D. Peng. New York: CRC Press.

Inselberg. 2009. Parallel Coordinates: Visual Multidimensional Geometry and its Applications. New York: Springer.

Jackman, R. M. 1980. “The Impact of Outliers on Income Inequality.” American Sociological Review 45:344–47.

Joullié, J.-E., and Spillane. 2015. The Philosophical Foundations of Management Thought. New York: Lexington Books.

Kalleberg. 2016. “Question-driven Sociology and Methodological Contextualism.” Pp. 89–107 in Theory in Action: Theoretical Constructionism, edited by Sohlberg and Leiulfsrud. Leiden, the Netherlands: Brill.

Kinchin, I. M., Möllits, and Reiska. 2019. “Uncovering Types of Knowledge in Concept Maps.” Education Sciences 9(2):131.

Kinchin, I. M., D. B. Hay, and Adams. 2000. “How a Qualitative Approach to Concept Map Analysis can be used to Aid Learning by Illustrating Patterns of Conceptual Development.” Educational Research 42(1):43–57.

Kinross. 1981. “On the Influence of Isotype.” Information Design Journal 2(2):122–30.

Kolakowski. 1969. The Alienation of Reason: A History of Positivist Thought. New York: Doubleday Anchor.

Kolb, D. G., and D. M. Shepherd. 1997. “Concept Mapping Organizational Cultures.” Journal of Management Inquiry 6(4):282–95.

Langley. 1999. “Strategies for Theorizing from Process Data.” The Academy of Management Review 24(4):691–710.

Lord, C. G., D. M. Desforges, Fein, M. A. Pugh, and M. R. Lepper. 1994. “Typicality Effects in Attitudes toward Social Policies: A Concept-mapping Approach.” Journal of Personality and Social Psychology 66(4):658–73.

Lundberg, G. A., and Steele. 1938. “Social Attraction-patterns in a Village.” Sociometry 1:375–419.

Lupton. 1986. “Reading ISOTYPE.” Design Issues 3(2):47–58.

MacKay, D. B., and Villarreal. 1987. “Performance Differences in the Use of Graphic and Tabular Displays of Multivariate Data.” Decision Sciences 18(4):535–46.

Marro. 1899. “Influence of the Puberal Development Upon the Moral Character of Children of Both Sexes.” American Social Journal 5(2):193–219.

Marrus. 1984. Building the Strategic Plan: Find, Analyze, and Present the Right Information. London: Wiley.

McFarland, D. A., Lewis, and Goldberg. 2016. “Sociology in the Era of Big Data: The Ascent of Forensic Social Science.” The American Sociologist 47(1):12–35.

McKim. 1972. Experiences in Visual Thinking. Monterey, CA: Brooks/Cole.

McKim. 1980. Thinking Visually: A Strategy for Problem Solving. Belmont, CA: Wadsworth.

Medin, D. L., W. D. Wattenmaker, and S. E. Hampson. 1987. “Family Resemblance, Conceptual Cohesiveness, and Category Construction.” Cognitive Psychology 19(2):242–79.

Miles, M. B., and A. M. Huberman. 1984. “Drawing Valid Meaning from Qualitative Data: Toward a Shared Craft.” Educational Researcher 13(5):20–30.

Modley. 1938. “Pictographs Today and Tomorrow.” The Public Opinion Quarterly 2(4):659–64.

Molloy, J. C. 2011. “The Open Knowledge Foundation: Open Data Means Better Science.” PLoS Biology 9(12):1–4.

Morris, C. J., D. S. Ebert, and P. L. Rheingans. 2000. “Experimental Analysis of the Effectiveness of Features in Chernoff Faces.” Paper presented at the 28th AIPR Workshop: 3D Visualization for Data Exploration and Decision Making.

Muurlink, O. T., and M. Z. Islam. 2010. “Charting the Impact of Climate on Climate-displaced Families in Bangladesh.” Centre for Work, Organisation and Wellbeing, Griffith University. Unpublished manuscript.

Muurlink, Wilkinson, Peetz, and Townsend. 2012. “Managerial Autism: Threat–Rigidity and Rigidity’s Threat.” British Journal of Management 23(S1):74–87.

Neurath and Kinross. 2009. The Transformer: Principles of Making Isotype Charts. London: Hyphen Press.

Neurath. 1936. International Picture Language: The First Rules of ISOTYPE. London: K. Paul, Trench, Trubner.

Neurath. 1939. Modern Man in the Making. New York: Alfred A. Knopf.

Neurath. 2010. From Hieroglyphics to ISOTYPE: A Visual Autobiography. Edited by Eve and Burke. London: Hyphen Press.

Nöth. 2001. “Semiotic Foundations of Iconicity in Language and Literature.” The Motivated Sign: Iconicity in Language and Literature 2:17–28.

Novak, J. D. 1990. “Concept Maps and Vee Diagrams: Two Metacognitive Tools to Facilitate Meaningful Learning.” Instructional Science 19(1):29–52.

Novak, J. D. 2009. Learning, Creating, and Using Knowledge: Concept Maps as Facilitative Tools in Schools and Corporations, 2nd ed. New York: Routledge.

Onwuegbuzie. 2004. “Academic Procrastination and Statistics Anxiety.” Assessment & Evaluation in Higher Education 29(1):3–19.

Onwuegbuzie and W. B. Dickinson. 2008. “Mixed Methods Analysis and Information Visualization: Graphical Display for Effective Communication of Research Results.” The Qualitative Report 13(2):204–25.

Paechter, Macher, Martskvishvili, Wimmer, and Papousek. 2017. “Mathematics Anxiety and Statistics Anxiety: Shared but also Unshared Components and Antagonistic Contributions to Performance in Statistics.” Frontiers in Psychology 8:Article 1196.

Pauwels. 2010. “Visual Sociology Reframed: An Analytical Synthesis and Discussion of Visual Methods in Social and Cultural Research.” Sociological Methods & Research 38(4):545–81.

Petrov. 2019. Big Data Statistics 2019. Retrieved November 25, 2019 (https://techjury.net/stats-about/big-data-statistics).

Playfair. 1786. The Commercial and Political Atlas: Representing, by Means of Stained Copper-plate Charts, the Exports, Imports and General Trade of England, at a Single View. London.

Raciborski. 2009. “Graphical Representation of Multivariate Data Using Chernoff Faces.” Stata Journal 9(3):1–14.

Rogers, Puryear, and Root. 2013. “Infobesity: The Enemy of Good Decisions.” Retrieved March 1, 2018 (https://www.bain.com/insights/infobesity-the-enemy-of-good-decisions/).

Rosch and C. B. Mervis. 1975. “Family Resemblances: Studies in the Internal Structure of Categories.” Cognitive Psychology 7(4):573–605.

Rzhetsky and J. A. Evans. 2011. “War of Ontology Worlds: Mathematics, Computer Code, or Esperanto?” PLoS Computational Biology 7(9):1–4.

Siegel. 1957. “Nonparametric Statistics.” The American Statistician 11(3):13–19.

Siew, C. S. Q., M. J. McCartney, and M. S. Vitevitch. 2019. “Using Network Science to Understand Statistics Anxiety among College Students.” Scholarship of Teaching and Learning in Psychology 5(1):75–89.

Sletto, R. F. 1939. “A Critical Study of the Criterion of Internal Consistency in Personality Scale Construction.” American Social Review 1:61–68.

Spence. 2005. “No Humble Pie: The Origins and Usage of a Statistical Chart.” Journal of Educational and Behavioral Statistics 30(4):353–68.

Strasak, A. M., Zaman, Marinell, K. P. Pfeiffer, and Ulmer. 2007. “The Use of Statistics in Medical Research.” The American Statistician 61(1):47–55.

Suber. 2012. Open Access. Cambridge: The MIT Press.

Trepagnier. 2002. “Mapping Sociological Concepts.” Teaching Sociology 30(1):108–19.

Trochim, W. M. K. 1989. “An Introduction to Concept Mapping for Planning and Evaluation.” Evaluation and Program Planning 12(1):1–16.

Tufte, E. R. 1983. The Visual Display of Quantitative Information. Cheshire: Graphics Press.

Tufte, E. R. 1990. Envisioning Information. Cheshire: Graphics Press.

Tufte, E. R. 1997. Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire: Graphics Press.

Tufte, E. R. 2007. Beautiful Evidence. Cheshire: Graphics Press.

Turns, C. C. Atman, and Adams. 2000. “Concept Maps for Engineering Education: A Cognitively Motivated Tool Supporting Varied Assessment Functions.” IEEE Transactions on Education 43(2):164–73.

Uebel, T. E. 1991. Rediscovering the Forgotten Vienna Circle: Austrian Studies on Otto Neurath and the Vienna Circle. Dordrecht, the Netherlands: Kluwer Academic.

Van de Ven, A. H., and M. S. Poole. 1990. “Methods for Studying Innovation Development in the Minnesota Innovation Research Program.” Organisation Science 1(3):313–35.

Vitevitch, M. S. 2016. “Network Analysis and Psychology.” Pp. 130–46 in Handbook of Applied System Science, edited by Z. P. Neal. New York: Routledge.

Wainer. 2000. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. Mahwah, NJ: Lawrence Erlbaum Associates.

Ward, T. B., M. J. Patterson, and C. M. Sifonis. 2004. “The Role of Specificity and Abstraction in Creative Idea Generation.” Creativity Research Journal 16(1):1–9.

Wheelan. 2013. Naked Statistics: Stripping the Dread from the Data. New York: Norton.

Wurman, R. S. 1989. Information Anxiety. New York: Doubleday.

Yin, R. K. 2009. Case Study Research: Design and Methods, Applied Social Research Methods Series, Vol. 5. Thousand Oaks, CA: Sage.