Abstract

Against the background of the challenges of new technology-driven language practices and big data, Quantifying Approaches to Discourse for Social Scientists seeks to provide social researchers with a rich toolkit of quantitative approaches and ‘a range of inside and outside perspectives’ (p. 8) to investigate the construction of knowledge in society, focusing on both the textual context of discourse and its institutional, social and historical contexts. The book comprises 10 chapters in four parts.
The introductory Part I (Chapters 1–2) reviews methods developed in natural language processing and discusses the challenges and opportunities for quantifying discourse research. Chapter 1, by the editor, sees the ‘increased amount of coherent information contradicting previously established knowledge formations or narratives’ (p. 4) as a crucial contribution to making discourse a valuable object for social research, knowledge production and knowledge exchange. In Chapter 2, reviewing both strengths and weaknesses of qualitative and quantitative methods, Duchastel and Laberge advocate the use of integrative mixed methods in discourse analysis.
Part II (Chapters 3–4) focuses on the integration of institutional contexts into discourse analysis. With the help of correspondence analysis, Chapter 3 adopts the Foucauldian concept of dispositif to demonstrate an empirical analysis of academic discourse with a corpus of UK sociologists, combining a linguistic discourse analysis of text with a sociological study of the social context. The chapter discusses ‘where meaning-making is domesticated and controlled in the academic world’ (p. 61), though it does not extend into an analysis of how. Chapter 4, by Angermuller and van Leeuwen, takes scientometric research as an example of the emerging numerocratic practices in the context of a knowledge-power dispositif, and critically argues that scientometrics as a research field legitimises numerocratic governance in higher education and creates social inequalities as well as hierarchical relationships between academics.
Part III (Chapters 5–7) discusses more complex algorithms used in exploring corpora. In Chapter 5, Scholz presents the lexicometric approach to discourse, a data-driven approach that applies statistical algorithms to textual data of language use. He emphasises the importance of combining heuristic and hermeneutic methodologies in lexicometric analysis. Chapter 6, by van Meter, shows how statistical methods help to understand ‘which concepts and topics dominate a particular discourse in society’ (p. 156) and their relationships at a certain period, and highlights the value of textual analysis in the systematic study of scientific and cultural production. Wiedemann’s Chapter 7 introduces unsupervised and supervised machine learning (ML) algorithms to demonstrate how advanced text mining contributes to a discourse research design, and also compares the characteristics of ML with those of already established computational approaches, such as computational content analysis and lexicometric analysis. Wiedemann argues that current ML methods are still far from a purely automatic discourse analysis, since they rely on creative human analysts to link findings in the data and draw the right conclusions.
Part IV (Chapters 8–10) explores new developments in corpus-assisted discourse studies (CADS). Chapter 8, by Baker and McEnery, uses the CADS approach to revisit and extend a previous study of the representation of Muslims and Islam in the UK press, and exemplifies the dynamic nature of discourse in a contrastive way. Bubenhofer and colleagues devote Chapter 9 to the twofold usefulness of adopting visualisation techniques in discourse analysis, which on the one hand make the data accessible for interpretation and on the other lead to new insights through ‘diagrammatic operations’. The final chapter, by Stegmeier and colleagues, employs three different methods to showcase the transnationalisation of political communication via Twitter: geolocation analysis, network analysis and keyword analysis. Focusing on two global issues, namely net neutrality and climate change, the multiple methods complement and supplement each other, offering a constant mutual check on the methodology. Nevertheless, to some degree this method-driven approach cannot compensate for the lack of a theoretical foundation.
The book is well organised and informative, providing detailed discussions of the application of complex algorithms in corpus exploration as well as new developments in CADS. Its transdisciplinary approach breaks down institutional and disciplinary boundaries between the fields of linguistics, computer science, statistics, political science, sociology and more, providing readers with a wide range of quantifying methods. Varied analytic tools are showcased on discourses ranging from the institutional to the political, including factorial correspondence analysis, descending/ascending hierarchical classification analysis, geolocation analysis, network analysis and keyword analysis. Unlike volumes devoted to a specific approach, such as text mining or corpus linguistics, this volume presents quantifying corpus methods from a global and comprehensive angle and offers many ways to examine how meaning is constructed in society. Nonetheless, its goal of presenting these methods ‘in an accessible way to corpus beginners and experts alike’ (p. 13) is not fully met; beginners may be daunted by the familiarity the volume requires not only with different disciplines but also with the possibilities and limits of quantifying methods in handling natural language data.
The book views quantifying methods as complementary to, rather than in opposition to, qualitative methods (p. 12), and aims to address different levels of the co-text and the context instead of simply offering analysis at the textual level. However, few chapters actually apply the combination of quantitative and qualitative methods advocated in Chapter 2, which makes the volume less comprehensive and persuasive in analysing both the intra- and extra-textual contexts of language use. Nonetheless, this multinational collection of papers from leading scholars is worth reading not only for its overview of the field of quantitative discourse analysis but also for specific approaches and tools that combine theoretical well-foundedness with methodological innovation. It will be of use to those who are interested in quantitative discourse analysis, especially in the challenges posed by big data and technology-driven language practices.
