Abstract
In the context of recent debates about the ‘data deluge’ and the future of empirical sociology, this article turns attention to current activities aimed at achieving far-reaching transformations to the World Wide Web. The emergent ‘Semantic Web’ has received little attention in sociology, despite its potentially profound consequences for data. In response to more general recent calls for a critical politics of data we focus our enquiry as follows: first, we explore how sociological analysis of the artefacts and tools that are currently being developed to build a Semantic Web helps us to uncover the potential effects of this ‘next generation’ web on knowledge, data and expertise; and second we consider what a Semantic Web might offer to sociological research. We conclude by considering some implications of multidisciplinary engagement with the Web for the future of sociology.
Introduction
The recent proliferation of digital technologies has led to some remarkable transformations in the quantity and nature of social data which, it has been suggested, may have some significant implications for the future of sociology. Writing in this journal, Savage and Burrows (2007, 2009) have drawn attention to the ‘deluge’ of data that is now generated beyond the academy through the routine deployment of digital technologies, taking place at a pace and on a scale that dwarfs academic research. More generally, Savage and Burrows suggest that in ‘a world inundated with complex processes of social and cultural digitization’ (2009: 763) sociologists may be losing ‘… whatever jurisdiction we once had over the study of the social as the generation, mobilization and analysis of social data becomes ubiquitous’ (2009: 763). These claims have provoked a heated debate about the politics of data and method, focusing in particular on commercial transactional and administrative data and their implications for sociology, at least as we know it (Crompton, 2008; Webber, 2009).
In this context, our article focuses on emergent changes in the nature and structure of Web-based data. Whilst sociologists have been debating the significance of corporate transactional and administrative data – mostly unavailable for sociological analysis – a social movement has gathered pace around the online publication of ‘open data’, offering access to the raw materials gathered by researchers, government agencies and, latterly, businesses: more data for the deluge. This is an interesting phenomenon in itself but it is not the main focus of our article. Rather, we draw attention to the role of open data in more fundamental changes to the World Wide Web. Specifically, open data may be one way to enable a re-orientation of the Web so that it is no longer built on links between documents – as it is today – but, rather, operates by linking heterogeneous data, quantitative and qualitative, to particular entities: things, concepts, people, or places (for example). Known variously as Web 3.0, the Web of linked data, or the Semantic Web, these changes are core to research and debate in computer science but have received little attention within sociology, despite recent sociological emphasis on the importance of networked information (Castells, 1996, 1997, 1998; Lash, 2002), the place of computational infrastructures in mediating this (Hayles, 2006) and increasing recognition of how automated (software driven) information flows shape lives (Alleyne, 2011; Thrift and French, 2002). It is difficult to keep up with empirical developments within these broader trajectories, but we should endeavour to do so, particularly – we suggest – in relation to recent claims that the Semantic Web could constitute a step change in the global networking of information.
In response to Savage and Burrows’ (2007, 2009) calls for a critical politics of data, we focus our sociological attention on the Semantic Web along two lines of enquiry. First, we explore how a sociological analysis of this emergent infrastructure helps to uncover its potential effects on data, knowledge and expertise. As sociologists we know that technical developments are neither inevitable nor neutral. The making of a Semantic Web will involve politics and power, difficult choices and contingent outcomes. It is important that we understand how this new Web is being constructed and what that means for the future of ‘the largest human information construct in history’. 1 Second, we consider what a Semantic Web might offer sociologists and how sociology might contribute to its development. In turn, both lines of enquiry raise questions about the expertise required to engage with the Semantic Web. Unless sociologists are prepared (and able) to acquire sophisticated computational expertise, we must collaborate with computer scientists. This article is built on the early stages of such collaboration. We are currently working together, in the Web Science Doctoral Training Centre at the University of Southampton, to develop multidisciplinary curricula and research that transcend the usual disciplinary boundaries. We have experienced first-hand the challenges arising from the different epistemologies, histories and languages of sociology and computer science, which raise questions about the wider politics of knowledge and dynamics of power and identity that arise in multidisciplinary work. Nonetheless, we are now pooling resources to interrogate the implications of an emergent Semantic Web. For us, this initial collaborative discussion is promising – indeed, we hope that it may pave the way for collaborative development of semantically enabled web applications in the future.
We begin below with a description of open data, linked data and the promise of the Semantic Web, as presented by its proponents. In what follows, we analyse the conventions, tools and structures proposed to build a Semantic Web and consider the ontological, epistemological and practical implications of imposing these structures onto raw data. We continue by exploring how sociologists might engage with the development of a Semantic Web, not only as users but as active participants in its construction, and briefly explore some of the issues involved in collaborative work at the computer science/social science interface. We conclude by returning to questions about the future of sociology in this emerging digital landscape.
Open Data, Linked Data and the Semantic Web: The Promise
The potential transformations that we focus on here begin with the remarkable recent movement towards open data. Spearheaded by Tim Berners-Lee – often described as the inventor of the Web – the call for scientists, communities, governments and even businesses to make their data available on the Web has resonated globally (Berners-Lee, 2009, 2010). Governments have led the way (Huijboom and Van den Broek, 2011): the data.gov.uk website set up in January 2010 lists 5300 data sets and rising and the UK government has recently announced £10m for a new Open Data Institute, whilst many other nations are also competing in a ‘benign race to the top’ of the open data league (Shadbolt, 2010). Some of this is data we have already seen – now available electronically and under open licence – but some is new, for example http://data.gov.uk/dataset/coins (accessed 14 May 2012) provides data on UK public spending and http://www.police.uk (accessed 14 May 2012) presents street level crime data. But for open data proponents, this is only the first step. To make data more useable, there is encouragement to publish ‘raw’ data, in a spreadsheet for example, rather than as a scanned pdf, and to publish in non-proprietary formats – CSV rather than Excel, 2 for instance – allowing access from multiple platforms. Step by step, the ‘open-ness’ of data is enhanced with increased access and fewer restrictions on use.
There are important questions about the open data movement and the sometimes hyperbolic claims of transparency and democratisation that are made (Beer, 2009). Who will publish what kinds of data? Will businesses put their data into the public realm? Those producing transactional data have been notably reluctant to do so, so far. What is open (and what is not)? For example, the British government withdrew some of the COINS (public spending) data shortly after publication. How raw is the published data? To protect individual identities, the www.police.uk statistics have condensed crimes into six categories making it impossible to distinguish certain types of crime; for example, shoplifting and sexual assault fall into the category of ‘other’ along with drug dealing, bigamy and 1288 other offences. Will ‘open’ mean open to everyone? There is debate in France over licensing access, which could see ‘behaviour checks’ on individuals and organisations wishing to reuse information from public bodies. 3
These questions aside, the rapid growth of open data also has some fundamental implications for the World Wide Web. Currently, the Web is built as a system that enables us to share documents (which appear as Web pages) and to make links between them. It works because each document has a unique identity (a Uniform Resource Locator, URL) and surfing the Web is akin to moving amongst a library of documents. We rarely see raw data. Documents contain data, of course, but in using the Web we do not make links between different sources of data about a particular entity but between documents containing information on particular data entities. The primary unit of the Web is the document. By contrast, once data is freely available in raw form, as data, it becomes possible to conceive of a Web that makes links between data, rather than between the documents that contain data. The primary unit of the Web becomes ‘data entities’ – real world things, such as people, places or products. Once these data entities are established, multiple and heterogeneous data can be linked to them and, further, complex links between entities, and what is known about them, can be made. For Berners-Lee (2009), this promises ‘a complete sea change’ in the way that data are produced and might be used on the Web. The Web might be transformed from a library of documents into a single linked data base.
Delivering the Promise? Building the Semantic Web
However, impressive as this seems, the concept of a Semantic Web is still relatively new and there is certainly nothing inevitable about it. Rather, as we know from the sociology of scientific knowledge and science and technology studies, scientific and technical developments are driven by heterogeneous human and non-human actors with particular resources and affordances, shaped by wider economic, social and political contexts, and outcomes are contingent and unpredictable (see for instance, Bijker et al., 1989; Feenberg, 1995; Flanagin et al., 2010; Latour, 1987; Mackenzie and Wajcman, 1999). It is salutary to note that Tim Berners-Lee originally envisaged the HTTP protocol as a means for physicists to share large data sets and certainly did not predict that 20 years later there would be nearly 600 million websites or that the Web would become so central to everyday life for billions of people. Clearly, we cannot be certain that the Semantic Web will develop as its proponents claim or on the scale to which they aspire. Nonetheless, this article has been provoked by our observation of the efforts that governments, leading computer science academics and, increasingly, businesses are making to promote and support the development of linked data (Tinati et al., 2011), driven by promises of transparency, breaching knowledge silos and releasing latent value, in which linked data plays a particularly prominent role. Even assuming that the network of actors involved holds firm, a great deal of work remains to be done in order to deliver a Semantic Web. The promise must be turned into an operational infrastructure: artefacts, rules and tools must be built and – of course – used (Fiveash, 2011).
Because of the computational expertise required, this work falls to engineers and computer scientists, where it is governed by technical expertise. This cannot be understood solely within an engineering paradigm. Rather, as recent work in related fields has shown, it is a socially saturated practical and organisational achievement (Pike and Gahegan, 2007; Randall et al., 2011) that entails profound epistemological and ontological considerations (Brewster and O’Hara, 2007). Not least, for instance, it involves managing issues such as trust and proof. Furthermore, the knowledge practices involved shape what is known and knowable (Ribes and Bowker, 2009). As we outline below, this may have important consequences for the nature of data and what we can do with it in the emergent Semantic Web. More generally, as Lash suggests, software algorithms contain generative rules: ‘… virtuals that generate a whole variety of actuals. They are compressed and hidden … yet this … type of generative rule is more and more pervasive in our social and cultural life’ (2007: 71). Specifically, Lash’s (2007) point is that developing algorithms is not simply an opportunity for invention but also a route through which power to define and know is mobilised (however unreflexively). In what follows, we describe three key components of a semantically enabled Web and consider, in each case, the wider implications of these.
(i) Naming Data Entities
For its proponents, the Semantic Web could ‘provide a common framework for the liberation of data’ (Berners-Lee et al., 2006a: 20). This begins by giving data entities an independent existence, free from the constraints of any document within which they might appear. Rather than organising the Web around links between documents with URLs, the Semantic Web is organised around entities, each with its own Uniform Resource Identifier (URI). Effectively, URIs ‘point at’ particular entities represented within a data set. For example, in census data each individual would have a URI but so too might places of residence or the Registrar General’s social class categories. If the same URI is used to point at a particular entity – a person, place or social class category – in multiple data sets, it becomes possible to link data from these different sources around a common entity. Whilst there is debate about the most appropriate way to name URIs, the front runner is the Resource Description Framework (RDF) which offers a standard way of describing entities: rdf//: provides a protocol (of sorts) equivalent to the http:// protocol for web pages. Publishing data in a consistent format is key to making a transition from a Web of documents to a Web of data. Whether or not the authors of a particular data set engage with data in other sets is irrelevant. Rather, the point is that links may now be made by anyone interested in that particular entity or, indeed, its relations with other entities. If all data using the Registrar General’s social class schema made common use of shared URIs it would become possible to link all Web-based data on social class, as measured in this scheme.
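As a rough illustration of the linking principle described above, the following Python sketch shows how two independently published data sets become linkable once they use the same URI for an entity. It is a simplification under stated assumptions: statements are held as plain (subject, predicate, object) tuples rather than in any RDF serialisation, and every URI, property name and value is hypothetical, invented for the example.

```python
from collections import defaultdict

# Hypothetical URI for one social class category; not a real identifier.
CLASS_II = "http://example.org/social-class/II"

# Two independently published data sets, each a list of
# (subject, predicate, object) statements. All values are invented.
survey_data = [
    ("http://example.org/person/1", "hasSocialClass", CLASS_II),
    ("http://example.org/person/2", "hasSocialClass", CLASS_II),
]
health_data = [
    (CLASS_II, "label", "Class II"),
    (CLASS_II, "reportedInStudy", "http://example.org/study/42"),
]


def link_by_uri(*datasets):
    """Group every statement that mentions a URI under that URI."""
    linked = defaultdict(list)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            linked[subject].append((subject, predicate, obj))
            # Index the object too when it is itself a URI.
            if obj.startswith("http://"):
                linked[obj].append((subject, predicate, obj))
    return linked


linked = link_by_uri(survey_data, health_data)
# Everything stated about the class category, drawn from both sources.
statements_about_class = linked[CLASS_II]
```

Neither publisher needs to know about the other: the link exists only because both used the same identifier, which is also why inconsistent naming undermines the whole approach.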
The naming of data entities may appear to be straightforward. However, it rests on an understanding of ‘knowledge as facts’ (Brewster and O’Hara, 2007), the assumption that real world things are objectively known and knowable as representations within a global information infrastructure. This is more problematic than it might seem. What ‘counts’ as an entity? And how precisely can an entity be defined? Even apparently simple entities like places are defined in diverse ways. For instance, is ‘London’ the administrative and political boundaries of the city as a whole; the much smaller ‘City of London’; or a more diverse historical, cultural and social set of places? Or does London become reduced to its constitutive postcodes? And this is just a simple example. As sociologists, we might be interested in other kinds of entities, like ‘class’, ‘race’ or ‘crime’ and we are only too aware of the difficulties of defining these. Furthermore, there are important questions about which entities will become ‘known’ and which will not. Making some things ‘known’ tends to obscure other things and, indeed, ways of knowing (Bowker and Star, 1999; Strathern, 1990). Furthermore, drawing boundaries around discrete entities rests on the assumption that each has an autonomous and independent existence – an assumption that the world is made up of clearly defined things (Lakoff, 1987). This ‘agential cut’ (Barad, 2003: 815) stands in stark contrast to relational ontologies which insist that entities are produced through their relations with other entities in particular networks (Latour, 2005) and contexts (Mitchell, 2002).
(ii) Structuring Data
In fact, whilst it is possible for entities to be defined and named in multiple ways using Semantic Web techniques, this is regarded as a technical problem (not a philosophical solution) because it limits the ability of the Semantic Web to link data on the ‘same’ entity. For the Semantic Web to deliver its technical promise data must be structured. First, it is important that URIs are used consistently, which, taken to its logical conclusion, would mean a global syntax convention such that URIs would ‘… acquire global scope and are interpreted consistently across contexts’ (Berners-Lee et al., 2006a: 25). Indeed, extensive research addresses this co-referencing problem (where two URIs represent the same ‘thing’) (Glaser et al., 2008; Nikolov et al., 2007). Widespread incorrect use of the ‘same as’ relation, a structural relationship designed to associate entities considered identical, has illustrated the problems associated with simple technical approaches attempting to solve deeply philosophical issues (Halpin et al., 2010).
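The co-referencing problem can be made concrete with a short Python sketch. It assumes, for illustration only, that a ‘same as’ assertion is treated as symmetric and transitive, and it uses invented URIs; it is not the method of any of the studies cited above.

```python
def merge_same_as(pairs):
    """Group URIs into sets of identifiers asserted to co-refer.

    Each pair asserts that two URIs denote the same thing; the
    relation is treated as symmetric and transitive (union-find).
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical assertions made by two different publishers.
same_as = [
    ("http://a.example/London", "http://b.example/GreaterLondon"),
    ("http://b.example/GreaterLondon", "http://c.example/CityOfLondon"),
]
groups = merge_same_as(same_as)
```

Here two locally plausible assertions combine to treat ‘London’ and the ‘City of London’ as one entity, so that data about either is attached to both. A single loose use of the relation propagates through every data set that relies on it.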
Second, whilst URIs are the basic building block for the Semantic Web, its promise depends on linking these in meaningful and useful ways: structuring the relations between described entities. Berners-Lee et al. (2006a) describe two different ways of doing this. Data may be ‘tagged’ by communities of users from which aggregate patterns of linkages between entities may emerge. This is ‘bottom up’ data structuring, emergent from what people do, a kind of crowd sourcing, ‘folk-taxonomy’ or ‘folksonomy’. Alternatively, ‘ontologies’ may be built by experts and imposed ‘top-down’ on data to describe:
… the sets of entities that make up the world-in-a-computer, and circumscribe the sets of relationships that they can have with each other. (Ribes and Bowker, 2009: 199)
According to Berners-Lee et al. (2006a), formal ontologies offer the greatest purchase for realising the potential of the Semantic Web since they offer systematic ‘… attempts to regulate part of the world of data to allow mappings and interactions’ (2006a: 34). However, ontology building is not a simple or solely technical matter. Recent ethnographic research on ontology building in particular scientific domains such as earth sciences (Ribes and Bowker, 2009) and cell biology (Randall et al., 2011) is highly instructive. Specifically, this research shows that the development of ontologies is shaped by social networks as well as by institutional and disciplinary hierarchies (which ‘experts’ are involved and why?) and that finding consensus amongst those involved is not always easy or necessarily satisfactory (Randall et al., 2011). Furthermore, it is clear that the practice of ontology building in itself has effects; not least, it demands that knowledge is represented in prescribed ways, such that what can be known and/or knowable becomes re-orientated practically and epistemologically to the demands of ontology building tools and the principles of computational thinking that underlie them (Ribes and Bowker, 2009). In short, ontologies embody a set of epistemological and ontological commitments (Brewster and O’Hara, 2007). As sociologists we too are involved in sorting and classifying our data within projects, but ontology building represents an endeavour to temporarily stabilise fields of knowledge for particular academic communities. If the Semantic Web is to deliver some of the promises made for it, this will depend on consistent methods of data representation enabling linkages between multiple ontologies: a scaling up of the effects of those processes described in this ethnographic work on ontology building, with all the implications that this carries.
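The contrast between ‘bottom up’ and ‘top down’ structuring can be sketched in a few lines of Python. This is a deliberately reduced illustration: the tags, entity types and relation names are hypothetical, and a real ontology language expresses far more than a list of permitted relations.

```python
from collections import Counter

# Bottom-up: users attach free-text tags; structure emerges from counts.
user_tags = [
    ("photo-17", "london"), ("photo-17", "cycling"),
    ("photo-17", "london"), ("photo-23", "cycling"),
]
folksonomy = Counter(user_tags)

# Top-down: an expert-defined schema states which relations may hold
# between which types of entity; anything else is rejected.
ontology = {
    ("Person", "livesIn", "Place"),
    ("Person", "hasSocialClass", "ClassCategory"),
}


def is_permitted(subject_type, relation, object_type):
    """Return True only if the ontology allows this kind of statement."""
    return (subject_type, relation, object_type) in ontology
```

In the first case nothing is ruled out and patterns are read off from use; in the second, what can be said at all is fixed in advance by whoever wrote the schema, which is where the epistemological and ontological commitments discussed above enter.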
(iii) Processing Data
Given the potential scale of data on the Semantic Web, the importance of making the data available ‘machine processable’ cannot be overestimated. Even now, the Semantic Web is growing at a rapid pace: it has doubled every 10 months since 2007. In 2011 there were over 200 data sets in the ‘linked data cloud’ comprising 25bn interlinked pieces of data. A fully fledged Semantic Web defies easy quantification. Potentially, it is both awesome and overwhelming. However, it is not envisaged that humans will engage directly with this new world of networked data. Rather, the Semantic Web will allow machine-driven processing of data at a scale and pace beyond immediate human capabilities. Once standardised machine recognisable URIs are established, computational tools can be built that ‘… go beyond display and instead integrate and reason about data across applications’ (Berners-Lee et al., 2006a: 19). To achieve this, tools must be built to ‘… process together data in heterogeneous formats, gathered using different principles for a variety of primary tasks’ (2006a: 19) allowing the integration and visualisation of data from across the Web. Again, these tools will have a profound mediating effect on our engagement with the Semantic Web. Unlike the Social Web or Web 2.0, the benefits of the Semantic Web may not be readily visible to most Web users. Our engagement with a Semantic Web will be driven by computational tools, effectively making choices and decisions on our behalf to deliver outcomes. We, as users, will have little information about these processes, however high speed and efficient the service turns out to be. In a broader sense this is not a new phenomenon. For many years, software systems have ordered and divided the world for us in ways that we are not directly aware of – Google for example (Pariser, 2011) making decisions on ‘our’ behalf. But if the Web, as we know it, becomes the Semantic Web this will mark a significant extension of this ‘technological unconscious’ (Beer, 2009).
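To give a sense of what ‘doubling every 10 months’ implies, the arithmetic can be sketched as follows. The 48-month interval is an assumption made for illustration (roughly 2007 to 2011); the source figures do not state exact start and end dates.

```python
def growth_factor(months, doubling_period=10):
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_period)


# Assumed interval: about four years between 2007 and 2011.
factor_2007_to_2011 = growth_factor(48)
```

On that assumption the linked data cloud would have grown about 28-fold over the period, and would grow about 64-fold over any five years in which the same rate held.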
In short, the processes involved in naming, structuring and processing data for and by the Semantic Web are profoundly social with tremendous sociological implications. In its emergent form, the Semantic Web offers us the opportunity to explore how these processes are shaping data and the Web as a global data infrastructure. Drawing on the research on ontology building described above we must direct attention to the construction and operation of multiple intersecting ontologies, but also ask similar questions about the practical and organisational work involved in establishing protocols and computational tools that will be necessary to produce and interrogate linked data at Web scale. We need to develop sustained analysis of the ontological and epistemological commitments that are being built into the material artefacts of the Semantic Web and reproduced in the everyday functioning of the new protocols and tools, carrying ontological and epistemological consequences beyond their origins: embedded in the machinery of the Web with potentially significant consequences, for us, as its future users.
Engaging with the Semantic Web
There is more at stake here than the need for a critical sociological analysis of the Semantic Web. However important this is, something else is clear: a semantically enabled Web also holds promise for sociological research, first in opening up a new world of data for analysis. As data are liberated from documents the Semantic Web could offer new analytical resources for sociological practice – not least, enabling us to make new linkages between heterogeneous data, both qualitative and quantitative, academic and popular, and explore and reveal linkages between data and entities in ways that we have not been able to do before. If the linking of people and residence via utility accounts or using commercial geodata applications is exciting (Savage and Burrows, 2009) the Semantic Web promises much more.
Thus far, the illustrations of semantically enabled data have been dominated by geographical mapping, not least because of the accessible interface and functionality of Google maps. One current ‘poster child’ for the Web of linked data is a map of cycling accidents in central London. The data on accidents was published on a blog, in RDF format and within 48 hours an entrepreneurial individual had linked this data via postcode to Google maps and published an online interactive map for users (Figure 1).

Figure 1. Cycling accidents mash up. Source: www.citybeast.londoncyclists.html
Reminiscent of the pioneering work on spatial configuration of poverty in London carried out by Charles Booth in the late 19th century, another example of linked data concerns the geography of water supply in Zanesville, Ohio. Here, digitally held data from the water supply companies were linked to data on ethnicity by residence to show systematic inequality in the water supply to black and ethnic minority households – a revelation that resulted in the water company settling a class action out of court for $10.9m. 4 In both cases, one freely available data set was linked with another to produce something new, and valuable. Furthermore, these illustrations show how quantitative data sets might be mapped and show the potential for linking more diverse forms of data (including crime statistics, environmental information, archived photographs and social history projects) to specific places. 5
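The kind of linkage behind both examples, joining one data set to another on a shared key such as a postcode, can be sketched in Python as below. The records, postcodes and coordinates are invented for illustration and are not taken from the cycling or Zanesville data; the function name is likewise hypothetical.

```python
# Invented accident records, keyed by postcode.
accidents = [
    {"id": "a1", "postcode": "EC1A 1BB", "severity": "slight"},
    {"id": "a2", "postcode": "W1A 0AX", "severity": "serious"},
    {"id": "a3", "postcode": "ZZ9 9ZZ", "severity": "slight"},
]

# Invented lookup from postcode to (latitude, longitude).
postcode_locations = {
    "EC1A 1BB": (51.52, -0.10),
    "W1A 0AX": (51.52, -0.14),
}


def link_on_postcode(records, locations):
    """Attach coordinates to each record whose postcode is known.

    Returns (linked, unmatched) so that records which cannot be
    placed on the map stay visible rather than silently vanishing.
    """
    linked, unmatched = [], []
    for record in records:
        coords = locations.get(record["postcode"])
        if coords is None:
            unmatched.append(record)
        else:
            linked.append({**record, "lat": coords[0], "lon": coords[1]})
    return linked, unmatched


mapped, unmapped = link_on_postcode(accidents, postcode_locations)
```

Returning the unmatched records separately is a design choice worth noting: what fails to link is itself sociologically interesting, since it marks the entities that a given infrastructure does not ‘know’.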
However, whilst residence and wider questions of spatiality are certainly important to sociology this is not the only reason that sociologists might want to engage with the Semantic Web. We can do more than this. If we think, for example, of existing methods for researching social class and the forms of data that these produce we can begin to see the potential. Surveys (the UK census, for example) offer large-scale data sets but are inevitably limited in the depth of information that they can elicit and are inflexible (data on class, for instance, is commonly collected and/or recorded according to top-down, broad and/or pre-defined categorisations) and difficult to connect to other, related sources of data. Difficult, but not impossible: indeed, there are good examples of the insights that can be achieved by linking information about individuals across data sets (Boyle et al., 2011; Sabel et al., 2009). But linking data from specified sets is not the same as developing linked data: that is, data using shared URIs which is linkable with any other relevant data set using the same conventions.
Potentially, the Semantic Web offers integration, across multiple data sets, at scale and speed, organising data around named entities – people or postcodes for instance. In principle this would allow us to integrate sociological knowledge about class, including qualitative data, from interviews for example, with the myriad datasets and analyses that reference the Registrar General’s classification of socio-economic class. This could (re)connect data to offer nuanced and emergent accounts of class that might (begin to) transcend the limitations of different methodologies. If we add to this the opportunities that already exist for Web 2.0 ‘users’ to add data online and to interact – via Facebook, Twitter or other forms of social media – there is potential to augment our data on class with dialogic and dynamic information.
To do this, we need the appropriate URIs, perhaps we will want to construct our own ontologies, and we will certainly require computation to engage with these diverse data sources. There is no reason why any of these will emerge from within computer science. The mapping applications described above, and more practical problem-orientated applications (for example integrated public transport data) already do a good job of illustrating the technical affordances of the Semantic Web. Meanwhile, although there is a growing body of computer science research exploring how to re-present the Semantic Web in human readable forms (Huynh et al., 2007), these interfaces are restricted to simple reconfiguration and rendering of linked data and do not offer – or indeed, seek to provide – mechanisms for more complex sociological exploration and analysis. If we want to use the Semantic Web to address sociological questions, then we should get involved in building the artefacts and tools that will drive the Semantic Web. Indeed, if we do not, it is possible that the current drive to open up data will render future Web-based data less transparent and useable to us than it is at present. Whilst current calls for ‘raw data now’ (Berners-Lee, 2009, 2010) seek to open up data for everyone, the segue from open data to linked data and the Semantic Web means that data are increasingly mediated by technical structures (RDF, URIs, ontologies) and tools which may effectively reduce the usability of this data by non-technical experts. Whilst computer scientists have produced a 5-star scheme to rank the technical usability of Web-based data (Table 1), 6 a non-technical usability ranking might look rather different, as increased technical mediation reduces the transparency of these data. The rhetoric of ‘openness’ may, paradoxically, mean less openness for some.
Table 1. Technical ratings and non-technical usability (adapted from http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/)
In short, there is a case for sociological engagement with linked data and endeavours to build a Semantic Web. However, engaging with these developments takes expertise of a kind that sociologists rarely possess (Graham, 2005). To take even a relatively straightforward example of raw data, the recent publication of UK government spending data runs to 44Gb of data, prompting comments on data.gov.uk such as these: ‘raw data: we want someone to digest it’; ‘transparency is great but only if people can find what they want’. 7 And before we start thinking that sociologists are the people for the job, it is important to realise that even the most superficial foray into the data assumes a basic knowledge of RDF and its querying language SPARQL. If we are both to engage in critical sociological analysis and to explore the affordances of the Semantic Web for sociological practice we require expertise from beyond the usual confines of sociological knowledge. Specifically, it will be essential to establish collaboration with computer scientists and informatics experts. This means more than getting technical support. Analysing the artefacts that ‘build’ a Semantic Web and using linked data require collaborative research practice and analytical approaches beyond the ‘sum of the parts’ from computer science and sociology. This will not be without its challenges. In the final section of this article we reflect on our experience in the new cross-disciplinary field of ‘Web Science’.
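To indicate what ‘a basic knowledge of RDF and its querying language SPARQL’ involves in practice, the following Python sketch imitates the core idea of a query: matching a pattern containing variables against a set of statements. It is a stand-in written in plain Python, not SPARQL itself and not the interface of any RDF library; the spending records, identifiers and property names are invented.

```python
def match(pattern, triples):
    """Yield variable bindings for one (subject, predicate, object) pattern.

    Pattern terms starting with '?' are variables; all other terms
    must equal the corresponding part of the triple exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Invented spending statements; not drawn from the COINS data.
spending = [
    ("item/1", "department", "Health"),
    ("item/1", "amount", 120),
    ("item/2", "department", "Transport"),
    ("item/2", "amount", 75),
]

# Roughly: "which items were spent by the Health department?"
health_items = [
    b["?item"] for b in match(("?item", "department", "Health"), spending)
]
```

Even this toy version shows where the difficulty lies for non-specialists: before any question can be asked, one must already know how the data are named and structured, that is, which predicates exist and how entities are identified.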
Web Science
Although collaboration across the social and natural sciences is hardly new, in a world where the nature and structure of our data and information practices are changing so profoundly this becomes more a necessity than a choice of innovative topic for study. Indeed, Savage concludes his analysis of sociology in the 20th century with this call:
If there is a future for the social sciences, it consists in forming intellectual and technical alliances with ways of knowing – from humanities, sciences, and informational systems – with which they are only currently weakly affiliated. Whether they have the inclination, aptitudes or resources to do this remains to be seen. (2010: 249)
Interdisciplinary collaboration across the natural and social sciences has a troubled history, most notably represented in the ‘science wars’ of the 1990s. Pitching positivism against post-foundationalism, scientists defended their methodologies and theories against critiques that claimed to reveal the social construction of scientific knowledge in the everyday, pragmatic and contingent practices of the laboratory (Gross and Lewis, 1997; Labinger and Collins, 2001; Latour, 1987). Web Science has emerged more recently as a cross-disciplinary collaboration, initiated by computer scientists on the understanding that the Web is a complex socio-technical phenomenon which requires new forms of interdisciplinary collaboration to apprehend it and to (continue to) build it in socially responsible ways (Berners-Lee et al., 2006; Hendler et al., 2008). Web Science begins, then, with an implicit commitment to transcend the legacy of the science wars but, of course, this is not easy. Just because we are all interested in ‘the Web’ doesn’t mean that we are necessarily talking about the same things or, indeed, have the same goals.
Borrowing briefly from a Bourdieusian vocabulary (rather than the philosophical vocabulary of the science wars), sociology and computer science each operate with their own distinctive forms of cultural capital – valued knowledge – linked to specific forms of engagement with artefacts – tools and texts, for instance – and to social norms, a recognisable habitus. Together these are building a new field of practice – Web Science. Here, power relations, forms of knowledge and disciplinary identities are negotiated, shaping how we work on and around the bigger theoretical questions which may, or may not, continue to divide us into disciplinary silos. In this new space, none of us are (immediately) ‘at home’ and the ‘strange and contingent nature’ of each discipline is apparent to the other (Bowker and Star, 1999). This setting provides the opportunity for critical examination of each other’s epistemological, ontological and methodological approaches, an opportunity to learn and develop something new, but is also a source of anxiety and power politics. Exactly how this plays out – which forms of capital will prove to be an asset, how the clash of habitus will be negotiated, how this new field of Web Science will shape up – remains to be seen.
Whilst sociologists might problematise the ‘real world’ entities that sit at the base of the Semantic Web or engage with the relational ontologies (in a social science sense) of agential realism or actor network theory, computer scientists might insist that pragmatic decisions, for example about the structure of data, have to be made in order to build systems. In this context it may be difficult for sociologists to find a credible voice, to render their cultural capital valuable in technology-heavy settings. We do not (most of us) have the cultural capital to engage in sustained technical debate; but it can be hard to present our own expertise as anything more important than ‘talking to people’ or ‘common-sense’, things that computer scientists can do perfectly well without us! Drawing on complex concepts and theories runs the risk of being inaccessible whilst not doing so risks a denial of the intellectual heritage of our discipline. Conversely, computer scientists are often driven by the need to build things and although the stated aims of Web Science are to ‘understand the Web’ the subtext to this can become ‘so we can fix it’ (Hendler et al., 2008).
The development of ‘Web Science’ aims to draw equally on computational sciences, social sciences and the arts and humanities to transcend such divisions or at least engage in critical reflection on these as they operate in practice. The point, we suggest, is to develop research practice that effectively appreciates the Web as a socio-technical phenomenon. To do so, we must find spaces where the instabilities that are inherent in emergent communities of practice can be validated, where our instincts to close this down with the old certainties from our more familiar fields of practice can be avoided and where we can develop the languages and compromises that will enable us to take this forward, however haltingly.
Conclusion
Sociologists rarely think about the protocols and standards or the artefacts of data and method which inhere in the expertise, institutions and tools that underpin the Web. This article has tried to show the importance of doing so, especially now. As momentum builds around transformation towards a Semantic Web we have the opportunity to explore how our future Web is being designed and built and to analyse the possibilities that this might open up and close down. In this article we have concentrated on one aspect of this: specifically, the politics of data and expertise involved in the conceptualisation and engineering of a Semantic Web. This poses both challenges and opportunities to sociology. The ‘naming’ of entities and building of ontologies may appear at once as an extraordinary global rationalisation of what counts, and how information flows, and at the same time, as a potentially revolutionary opportunity to liberate data from the structured confines of particular documents, rendering it flexible and dynamic, open to multiple new (and older) interpretations. In these early days the Semantic Web, and the artefacts that comprise it, is neither one thing nor the other and is potentially both. How things turn out is an empirical question: a question that we should pay attention to.
It is important that we work to make the social construction of the Semantic Web visible: to ensure that the micro-politics of its artefacts are understood as politics, representing choices and interpretations, rather than as neutral fact or engineering design. Our everyday lives – as sociologists and as citizens – are increasingly entangled with the Web and accordingly it is our responsibility to ensure that we understand this phenomenon and that we skill ourselves to play an active part in its future. We appreciate that this is not going to be easy. But if we don’t get involved, we risk ceding the field to a tsunami of positivism tied to the ascendancy of computer science and/or other technical forms of cultural capital in the digital age.
What then of the future of sociology as a discipline and indeed the wider disciplinary settlement between the humanities, the natural sciences and the social sciences in this context? Perhaps it is necessary to state the obvious: working with the Web, whether as a data source or analytical resource, is clearly not the only thing that sociologists will be doing in the future. This is not about a monolithic vision for sociology; indeed we share Crompton’s (2008) commitment to the maintenance of a heterogeneous sociology. Furthermore, there will of course be shortcomings and weaknesses in the Web-based data that we do work with (for example, in terms of what data exist and about what kinds of entities) that will demand original empirical research, maybe using the methods that were so central to the evolution of our discipline in the 1960s, but also using the range of more innovative methods (visual methods, arts-based methods, correspondence analysis and so on) that have become more popular recently. Similarly, and perhaps ironically, researching the evolution of the Web itself will require some original empirical research beyond the data that are stored and circulated within its own architecture.
This said, it is clear that the changes underway to the Web are of concern to a wider audience than scholars of science and technology. As we have argued, these are changes with implications for all empirical sociologists, not just because of the intense proliferation of data – way beyond commercial transactional and administrative data – but because they represent an evolution in our global information structure. Here we have concentrated on the issues involved in building a Semantic Web, but an operational Semantic Web raises other enormously important issues for sociologists, amongst others, to consider, not least in relation to questions of privacy, which become paramount in a world where heterogeneous data can be linked to particular entities – individuals for instance – whether or not they like it, or even know about it.
Rather than analysing these issues from the ‘outside’, we should find ways to engage in the evolution of the Web. As Feenberg puts it, technical developments are ‘underdetermined’ and depend on ‘… the fit between them and the interests and beliefs of the various social groups that influence the design process’ (1995: 4). What happens is shaped by who is involved and sociologists should ensure that we have a voice as the artefacts, tools and protocols of the next generation Web shape up. Now is the time to engage, whilst the Semantic Web is embryonic, rather than realising another 10 years down the line that it’s too late.
This will require new skills and knowledge and interdisciplinary collaboration which may produce new kinds of methods, data and indeed theories. These might not look uniquely or solely sociological, or be claimable as such. This does not mean abandoning our sociological imagination but, we believe, it does mean overcoming the ‘silo-ing’ of knowledge into discrete disciplines in order to recognise ‘the mixed way things happen’ (Mitchell, 2002: 52) and that this overlaps, even transcends, the disciplinary carve-up of knowledge, which we should see as the outcome of social and political struggles for identity, power and resources rather than any ‘natural’ division of knowledge and expertise. This should not alarm us. For all that British sociology in the 21st century shares similarities with its formative ancestors from the 1960s – particularly in terms of method and data – in other respects there are profound differences, particularly in terms of research themes and theoretical orientations (Savage, 2010; Urry, 2005). We should not be concerned about further evolution, or taking risks. The challenge is to see where we can take sociology in these exciting times.
Footnotes
Funding
Funded by the EPSRC, grant number EP/G036926/1.
Notes
