Abstract
This paper explores the potential of voice audio in qualitative research, as data in its own right rather than only as a precursor to transcription. Building on critiques of voice in qualitative research, I argue that audio can enable researchers to work with the more-than-representational excesses of voice. Developing this line of thinking, I draw on Levi Bryant’s machinic ontology to set out a post-humanist conception of voice as arising within ecologies of media machines. As an example of what machinic voice audio can do, I describe an experimental audio work that I produced as part of research on a ruinous landscape. The final section of the paper makes more general observations about the malleability and fallibility of the machinic voice.
Introduction
This paper makes the case for working with voice audio as qualitative research data rather than solely as a precursor to transcription. I conceptualise voice in post-humanist terms, as arising within ecologies of mediating machines, drawing on Levi Bryant’s (2014) machinic ontology. As an example of the potential of working with the media ecologies of voice in qualitative research, I describe an audio work that I produced as part of research in a ruinous landscape, in which polyphonic voice audio was used to experiment with how the site was understood and experienced.
My starting point is the idea that voice arises from assemblages of vibrational machines. Air pushes through vocal cords, whose oscillations resonate in the throat, mouth and sinuses, filtered by movements of the tongue and lips, the pharynx, palate and jaw. The resulting vibrations propagate through space, bounce off surfaces, and get caught up with other sound machines: ears, microphones, telephones, recorders, amplifiers and transmission infrastructures. This machinic production of voice is routinely silenced in qualitative research, however, which prefers to treat voice audio as an objective record of expressed meanings, experiences, opinions, ideas, memories, feelings and values, that can then be transcribed into text (Nordstrom, 2015). Within this logocentric paradigm, listening becomes detached from sound; vibration is drowned out by discourse. The entanglement of text and audio to which the act of transcription attests is elided in favour of privileging text alone. The legibility of writing seems to confer a reassuring solidity, pinning down the communicated message to secure the truth claims of voice, while its sounds drift away on the breeze.
Consequently, in qualitative research, voice operates largely as a metaphoric concept, sustained by deeply ingrained hermeneutic and representational assumptions that haunt the social sciences. 1 While meanings, representations and textual accounts are clearly an important part of what passes in the world, all too often voice is treated as though its syntax and semantics were all that mattered, requiring the voice to yield up ‘what it means’ to analysis or else remain mute. Voice has been colonised by a paradigm centred on what Foucault referred to as the “continuous generosity of meaning. . .the monarchy of the signifier” (Foucault, 1981: 73). Yet no singer, rapper, actor, beat-boxer, stand-up comedian, impressionist, performance poet, television or radio presenter, sound engineer or film maker would accept that voice is reducible to language, or that listening is solely about comprehending meaning. The voice is ‘more than a conduit for the transfer of information. . .The voice, in its expression of affective and ethico-political forces, creates worlds’ (Kanngieser, 2012: 337).
This paper builds on critical work rethinking voice in terms of its materiality, affects and more-than-representational potentials (Komulainen, 2007; MacLure, 2009, 2013; MacLure et al., 2010; Mazzei, 2013; Mazzei and Jackson, 2017; Vallee, 2017a, 2017b). I suggest that one way to take these critiques of voice seriously in the practice of qualitative research is to work with ecologies of voice machines. This conception of voice is informed by Bryant’s (2014) machinic ontology, whose key principles can be summarised as follows:
A machine is neither subject nor object but ‘a system of operations that perform transformations on inputs thereby producing outputs’ (Bryant, 2014: 38). Machinic ontology has a performative focus, attending to what entities do rather than what they are.
All kinds of entities can be considered machines, including human and nonhuman beings, organic and inorganic bodies. As such, like other post-humanist theories such as actor network theory and vital materialism, machinic ontology erodes hard distinctions between culture and nature, between humans and other forms of life.
Machines link up with other machines in relations that are ecological, insofar as the resulting assemblages have the capacity to produce results that exceed the sum of their parts. Each machine modifies the activities of the other machines with which it is coupled. In other words, machines mediate each other.
For the purposes of this paper, machinic ontology tunes into how voice audio is produced by ecologies of machines, as the various elements of human vocal apparatus and technical media link together in relations through which the voice is manipulated and propagated as vibration. Bryant uses the term onto-cartography to describe a type of analysis that maps these relations between machines ‘and how they structure the movements and becomings of one another’ (Bryant, 2014: 7). Onto-cartography’s focus on how machinic operations mediate flows is particularly attuned to the aim of this paper: to explore how voice audio can be worked with in qualitative research, rather than always pushed out of the analytical frame in favour of transcribed text. An onto-cartographic approach has the potential to disrupt what Nordstrom calls the realist-objectivist paradigm, in which voice recording is assumed to ‘capture’ discourse in a way that is ‘apolitical, acultural, and aproblematic’ (Nordstrom, 2015: 390). Methodologically, conceptualising voices as media ecologies opens up different ways of working with voice, starting with its quivering intensities of vibration as they circulate amongst machines.
Where transcription dematerialises the voice, audio recording machines register and relay the physical vibrations of matter in the world, and hence the extralinguistic aspects of voice. Admittedly, audio systems often filter signals to minimise ‘noise’ and enhance the vocal ‘signal’, but they cannot ever wholly disentangle language from sound. A more-than-representational excess is always registered alongside the semantic content (Cox, 2009; Kittler, 1999). Consequently, audio recordings may represent voices but they always do more besides. I discuss experimental styles of working with voice recordings that maximise this excessive potential, using it to intensify sonic affects rather than falling back into the representational tropes common in conventional broadcasting and documentary production.
These arguments require some caveats. There are many research situations in which there are good reasons for limiting the use of voice audio, such as where anonymity is vital to protect participants. As Gershon (2013: 261–262) puts it, this paper ‘is not an argument against text or a call for the significance of sound over other sensory information.’ Rather, I want to make the case for hearing the sounds of voices alongside what they signify and how they might be written. Voice audio methods do not offer ‘solutions’ to the problems of voice, providing better access to the authentic truths of subjects, or a way to transcend the limits of text. The media ecologies of voice are of interest precisely because they intensify these problems, amplifying what MacLure (2009) calls the productive insufficiency of voice: the ways in which the limits and excesses of voice scramble enquiry, upset epistemology, and disrupt the orderliness of conventional qualitative methods.
Neither am I suggesting that all voices are irreducibly or essentially sonic. Some people hear voices that do not vibrate acoustically (Blackman, 2001). Some voices operate in non-sonic registers, such as sign language (Stone and West, 2012) and the Picture Exchange Communication System (Ashby, 2011). The audio methods I am advocating here will not work with all voices. Nonetheless, a machinic conception of voice has the potential to be more inclusive than humanist conceptions of voice, because machinic ontology positions assistive technologies, such as typed communication, hearing aids, induction loops and sound field amplification, as yet more machines within ecologies of voice, rather than external add-ons or prostheses.
The paper begins by reviewing previous work on the voice and the sounds of voices. I then discuss voice audio methods via an example from my own research. The final section of the paper develops a broader argument about the methodological and epistemological implications of working with voices as machinic media ecologies.
Rethinking the voice
The discourse of voice is apparent in everything from consumer surveys to disability rights campaigns, from school councils to legal advocacy. Tangen (2008) defines voice on three levels: as methods and strategies used to gather people’s views; as the views themselves; and as the subjects expressing these views. In the social sciences, voice has been framed as a means through which (often marginalised) subjects can express their experiences and desires, and have these taken into account. For example, in education, the discourse of student voice positions students as ‘valuable experts whose opinions should be sought for the betterment of school’ (Pomar and Pinya, 2015: 113), to improve teaching and learning (Flutter, 2007; Keddie, 2015), to improve classroom conditions (Hopkins, 2008), to support students who are moving between schools (Messiou and Jones, 2015), or to create more inclusive environments (Adderley et al., 2015; Bolic Baric et al., 2016; Whitburn, 2016). Ultimately the goal is one of empowerment by enabling people to voice the truth about their lives.
This notion of voice has undoubtedly had emancipatory effects. The promotion of disabled people’s voices, for instance, has resulted in material improvements in accessibility and inclusion. There is evidence that incorporating people’s views into decision making can have positive effects in areas such as health care (Cotterell et al., 2011) and social work (Tregeagle and Mason, 2008). The critiques of voice on which I am building in this paper do not seek to undermine these valuable strategic gains, but rather to recognise how the forms of subjectivity promoted by voice can also become regulatory, limiting what can be heard. These critiques have been particularly prominent in education research and childhood studies. A number of authors in these disciplines have challenged the implicit humanism of the dominant discourse of voice: the idea that the voice comes from a rational, conscious, self-possessed subject ‘who knows who she is, says what she means and means what she says’ (MacLure, 2009: 104). Voices which do not fit this description tend to be filtered out or silenced. In youth participation, for example, ‘young people are urged to downplay a vast array of emotions in order to transform their feelings into ‘reasoned’ argumentation’ (Kraftl, 2013: 15). Based on research with disabled children, Komulainen (2007) shows how practices of voice privilege clear, rational communication, such as unambiguous answers to dichotomous choices, and struggle to accommodate desires or perspectives that are uncertain, unclear, partially formed, or which fall outside of binary choices. The framing of children’s voices within such narrow adult-centred epistemologies limits what can count as valid expression (I’Anson, 2013).
Throughout these arguments there is a frustration with the dominant representational conception of language, which ‘limits articulation to that which is verbal, textual or linguistic’ (Komulainen, 2007: 23). This representational paradigm occludes the embodied materiality of voice. Transcription silences this corporeality, with its huffing and puffing, thick accents and stumbled half-sentences, its bursts of laughter, stutters, coughs and guttural mumblings. ‘One could argue, indeed, that one of the main functions of method is to contain, manage or forget the bodily entanglements of language, so that it can be freed to represent’ (MacLure, 2013: 664). These bodily entanglements are worth saving from the dustbin of method because they produce critical fissures in the edifice of signification. Crowbarring analysis into these cracks prizes open a space in which we can hear the productive insufficiency of voice: its ‘abject propensity to be too much and never enough’ (MacLure, 2009: 97). Too much, insofar as the voice always overflows the capacity of categories and codes to contain it; never enough, because the voice has a propensity towards absence, in silences, mispronunciations, one-word answers, empty or enigmatic statements.
The productive insufficiency of voice disrupts method. The question is how to ride the waves of that disruption rather than being washed away. MacLure (2009: 106) suggests attending to ‘those properties of voice that resist both surrender and mastery – properties such as laughter, mimicry, mockery, irony, secrets, masks, inconsistencies and silence.’ Such extralinguistic properties become easier to hear when we listen to voice as sound, the ‘noisy blur that talk is. . .the strange sound that unconnected letters may create. . .the physicality of tears, shrieks, the vomiting voices of laughter, sighs, lisps, whispers’ (Gurevitch, 1999: 528–529). Rosen (2014), for example, writes about the communicative and affective functions of young children’s screams in nurseries. A sonic sensibility can also help attune to silences, in which the absence of voice troubles the idea that subjects ought to speak their truth (MacLure et al., 2010; Nairn et al., 2005; Spyrou, 2015).
What these accounts tend to neglect, however, are the relations between the voice and media technologies. Voice audio has long played an important role in qualitative research. Harvey Sacks, for example, laid the foundations for conversation analysis (CA) by analysing recordings of a psychiatric counselling helpline. In other words, the methodology of CA arose from a media ecology in which voice was entangled with both the telephone and the tape recorder. Yet these technologies are largely effaced in the presentation of CA, which focuses on transcribed talk (e.g. Sacks, 1992). The Jefferson Transcription System, widely used in CA, addresses sonic features of the voice, such as intonation, pace and volume, but in a way that, whilst wholly dependent on audio, displaces it in favour of textual renditions of talk. Between the literature on interpersonal research encounters and the literature on transcription, there are a few accounts addressing the media machines that link these two aspects of method (Back, 2014; Crichton and Childs, 2005; Gordon, 2013; Lee, 2004; Markle et al., 2011; Nordstrom, 2015; Thompson, 1996). In practice, however, audio recorders still tend to be heard by qualitative researchers within a logocentric, objectivist-realist paradigm (Nordstrom, 2015), as devices that produce accurate and impartial records of spoken discourse for textual transcription, or sometimes in more disparaging terms as disruptive gadgets that obstruct rapport and reduce data quality (Al-Yateem, 2012).
One way to respond to the critiques of voice outlined above is to work in the opposite direction, bringing voice audio technologies centre stage. For qualitative and post-qualitative researchers who want to work with the productive insufficiency of voice, to actively harness extralinguistic excess to generate knowledge that can disrupt the dominant language- and meaning-centred paradigm of voice, audio technologies have massive untapped potential. To give a flavour of these affordances, the next section of the paper presents an example from my own work.
Audio methods
A number of social researchers have explored sound-based methods in recent years (Daza and Gershon, 2015; Dean, 2016; Duffy and Waitt, 2011; Gallagher and Prior, 2014; Gershon, 2013; Hall et al., 2008; Moles and Saunders, 2015; Saunders and Moles, 2013, 2016; Stevenson and Holloway, 2017). Much of this work argues that qualitative research can benefit from attending more closely to sounds beyond the usual focus on human voices, including using audio recordings to tune into background noise and sonic ambiences that are ordinarily ‘filtered out’ by researchers and their methods. In this paper I want to hear what happens if that expanded sonic sensibility is flipped back onto the voice. With voices, audio recordings register a level of detail beyond what can be conveyed in textual transcription, relaying accent, rhythm, cadence, hesitations, laughter, bodily noises, as well as traces of the machinic apparatus such as self-noise and microphone position. Specialised phonetic transcription systems can render such sonic features into text, as in the Jefferson system, but the results displace audible vibration in favour of abstracted visual representations of vocal sound. Of course, text and audio are not mutually exclusive; they can be used together in all kinds of ways; and transcripts have affordances that voice audio does not. Indeed, transcription can be seen as another machinic process within the media ecology of voice. Nevertheless, given the dominance of text in qualitative methods, what I want to focus on here is audio, which has by comparison been neglected.
Relaying sound as vibration, audio recording technologies register and re-enact what Kittler (1999), following Lacan, refers to as the real – i.e. physical movements of matter, independent of semantic meanings or discursive functions. For this reason, audio recordings are ideal for working with the affective and more-than-representational aspects of voice. They were made for the job. Barthes (1985) argues that transcription loses the body in speech, by taking time to rewrite, edit, censor and tidy up what was said. Voice audio, by contrast, is neither speech nor transcription. It hovers in an ambiguous third position, without either the bodily fullness and immediacy of speech, or the seamless coherence of writing. Like transcription, the inscriptions of audio permit editing and manipulation, but they also register bodily traces – not only of the body of the speaker, but also of the acoustics of other bodies involved in the media ecologies of recording and playback.
To repeat, I am not suggesting that voice audio recordings offer straightforward solutions for unsilencing the voice. There is nothing inherently liberating about the medium. It can be, and often is, used in ways that reify humanist subjectivity and representational epistemologies, such as in mainstream radio and television broadcasting, documentary production and electronic news gathering. In these media genres, technical conventions include close mic’ing, various measures to minimise background noise, editing out mistakes or failures, and carefully sequencing voices one at a time to produce coherent linear narratives. These practices shore up the illusion of voice as a rational expression of self. Yet in the sonic arts there are more experimental styles and techniques, orientated towards working with the rhythms, tones, ambiguities and excesses of voice audio: sampling, editing and sequencing voices, mixing and layering multiple voices, and processing, e.g. through vocoders, filters, pitch shifting, time stretching, delay, reverberation and modulation effects. Rather than hiding behind technologies to present an illusion of transparency, such techniques intensify the technicity of voice. For example, Vallee (2017b) writes about how artist Laurie Anderson has used the vocoder to multiply, defamiliarise and displace her voice, enacting a playful critique of the ways in which voice is tied to identity categories such as age and gender. Along similar lines, Prior (2018) analyses how popular music plays with the malleability of the voice as a hybrid human-technical assemblage, via software such as Vocaloid, which synthesises virtual singing voices based on pre-recorded phonemes and phrases.
This kind of blatant manipulation may seem inappropriate to qualitative researchers, for whom representing research participants fairly and accurately is a fundamental principle. Yet intensifying the technicity of voice, by pushing voice machines to the point where they break established production conventions, can also perform valuable functions in qualitative research. Such experiments can bring the media ecologies of voice to the fore, amplifying how voices always arise and circulate within assemblages of machines, both intra-human and extra-human; they can disturb the humanistic conception of voice as the authentic, truthful expression of an internal subjectivity; and they can bring forth reconfigured voices that surface different kinds of truths – more fractured, compromised and speculative, with a more open-ended quality than the representational closure that characterises traditional qualitative research.
In 2012–2013 I experimented with voice audio in the production of a sound piece based on research about Kilmahew, a former country estate in Scotland. This site contains a unique series of ruins, most notably the remains of St. Peter’s College, an internationally renowned work of post-war modernist architecture. Starting in 2010, efforts were made to reinvent the site by a public arts organisation called NVA. During this process, a team of academics worked in partnership with NVA to carry out collaborative, experimental research activities relating to the site. The aim was to explore Kilmahew and St. Peter’s College, in ways that would engage a range of people with connections to, or interest in, the site, and thereby generate new ideas and insights about its past, present and possible future. As part of this project, I produced Kilmahew Audio Drift No.1, a sound composition for people to listen to on portable audio players whilst walking around the site, folding sounds and stories from the place back into it to playfully intervene in people’s experiences of the landscape (see Gallagher, 2015). A portable audio work had practical appeal as the site is accessible only on foot and has no stable infrastructure or power sources. It was made available as an MP3 file online for listeners to download (http://www.michaelgallagher.co.uk/audio/Kilmahew-Audio-Drift-No1.php).
The source material included environmental field recordings made at the site, recordings of interviews with 14 key informants from a range of different backgrounds, and recordings made during three on-site activity days, in which a variety of people with an interest in the site were invited to take part in playful activities investigating the landscape. The editing, composition and mixing of all of these recordings to produce a final piece bore some similarity to analysis and writing up – far less systematic, more intuitive and ad-hoc than in traditional qualitative analysis, but comparable insofar as the process was governed by certain themes that emerged through the process of the research, and which informed choices about what data to include and exclude and how to arrange it. In what follows, I discuss two themes in particular: conflict and resonance.
One of the recurring themes in the data was the contested nature of Kilmahew and St. Peter’s College, as a site where many different histories, species, architectural forms, stories of the past and visions for the future were all layered together, often uncomfortably, producing conflict and friction. Two conflicts in particular stood out from the interviews. The first concerned the value of the modern ruin: for some it was an exceptional, inspiring space that warranted conservation or restoration (there were competing visions of what that might involve), while for others it was an ugly, obsolete carbuncle that deserved demolition, echoing a wider popular discourse of hostility towards post-war modernist and brutalist architecture. The second conflict concerned the management of invasive species in the woodland, particularly Rhododendron ponticum, which had been planted as an ornamental shrub, probably in the late 1800s or early 1900s, when the site was a designed and intensively managed landscape. During years of neglect, the species had expanded across the estate, coming to dominate the woodland understorey in many places, obscuring paths and viewpoints. At the time, NVA were developing plans to eradicate these plants, but one interviewee in particular was vociferously opposed to the proposed approach, which he saw as heavy-handed and destructive.
The convention in qualitative research would be to sequence these different perspectives, producing something like what Deleuze (1994: 224) writes of disparagingly as ‘good sense’, which ‘essentially distributes or repartitions: “on the one hand” and “on the other hand”’. Using audio, however, I was able to work in a way that felt more attuned to the geography of the site: mixing and superimposing different voices, allowing them to be heard simultaneously, at points becoming tangled and confused like the place itself. With audio, working with polyphony in this way is not merely about incorporating different voices, or setting the words of different participants alongside each other, but can happen as a literal polyphony: the simultaneous production of many voices in the audio domain. In audio production, multi-tracking is commonplace, unlike the single-track convention that dominates other media such as text and moving images.
My use of polyphonic voices was inspired by The Idea of North, a 1967 experimental radio documentary produced by the Canadian virtuoso pianist Glenn Gould 2 , comprising an edited montage of recorded voices speaking about the far north of Canada. The voices are frequently superimposed over each other in what Gould described as counterpoint, drawing on his extensive knowledge of Bach. As a result these voices sometimes wash out into an undulating babble, and following the exact meaning of everything that is said becomes difficult. This construction undermines the manufactured coherence of voice in conventional radio production, in which voices usually speak clearly one at a time. Ironically, the result is something closer to how voices often sound in the ‘real world’: speaking all at once, drifting in and out of earshot, not always intelligible.
Gould was working with analogue multi-track tape, and would likely have had to edit and compose his work in a laborious way, marking up open reel tape by hand, making splices using a razor, joining sections together using adhesive tape, dubbing different voices using multiple tape recorders, and mixing voices by hand using faders. Modern digital audio workstation software, by contrast, enables fast non-destructive multi-track editing and easy layering of many tracks, with fully automated control over all parameters. Influenced by these affordances, where The Idea of North has multiple voices slowly fading in and out, I found myself making faster and more fine-grained edits. The conflicts in the data steered me towards working with pairs of opposed voices. I spent many hours editing and tweaking the timing and levels of different phrases and words to ensure that each voice had space to air its views alongside the others. I aligned audio clips such that one voice could be heard in the pauses and gaps left by another; used automation to balance levels so that all sides of the argument could be heard; and placed opposing voices on either side of the stereo field, leaving listeners caught in the middle of the debate. In this way, the work both represented conflict and performed it at the level of sonic affect. The listener is placed between, for example, the tone of reverence of one interviewee as he waxes lyrical about how the building ‘hides its beauty within. . .it modulates light in this incredible way’, layered with the more muted disdain of another interviewee describing it as ‘a kinda grim building. . .cold, uninspiring’.
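The mixing moves described above (placing one voice in the pauses left by another, balancing levels, and positioning opposed voices on either side of the stereo field) can be illustrated with a minimal Python sketch. This is not the workflow used for the piece, which was produced in digital audio workstation software; the function names pan, place and mix, the constant-power pan law, and the toy sample values are all assumptions made purely for illustration.

```python
import math

def pan(mono, position):
    """Constant-power pan (an assumed pan law): -1.0 is hard left, +1.0 is hard right."""
    angle = (position + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    left_gain, right_gain = math.cos(angle), math.sin(angle)
    return [(s * left_gain, s * right_gain) for s in mono]

def place(mono, offset, gain=1.0):
    """Delay a clip by `offset` samples of silence and scale its level by `gain`."""
    return [0.0] * offset + [s * gain for s in mono]

def mix(tracks):
    """Sum stereo tracks sample by sample, treating shorter tracks as silent at the end."""
    length = max(len(t) for t in tracks)
    out = []
    for i in range(length):
        left = sum(t[i][0] for t in tracks if i < len(t))
        right = sum(t[i][1] for t in tracks if i < len(t))
        out.append((left, right))
    return out

# Toy data: voice_a "speaks" for two samples and then pauses;
# voice_b is shifted by two samples so that it is heard in that pause.
voice_a = [0.5, 0.5, 0.0, 0.0]
voice_b = [0.4, 0.4]

stereo = mix([
    pan(voice_a, -1.0),           # first voice hard left
    pan(place(voice_b, 2), 1.0),  # opposing voice hard right, in the gap
])
```

In this sketch each voice occupies its own side of the stereo field and its own stretch of time, so a listener would hear the two alternately from opposite directions. Real recordings would be arrays of many thousands of samples per second, and level automation would vary the gain over time rather than applying a single fixed value.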
These methods are not unproblematic. It has been argued that The Idea of North uses aesthetic distance to create an epistemological opacity (Vallee, 2014) – a critique that could equally apply to my own piece. Vallee suggests that Gould’s documentary advances an implicitly colonialist, nationalist vision of Canada, in which Inuit voices are notably absent, their culture represented only by southern ‘experts’. This line of argument amplifies how voice audio requires as much care over politics and ethics as any other method – perhaps more, given the cultural value placed on voice, and the potential longevity of the medium. What I suggest we take from Gould, then, is not his politics but the potential his work demonstrates for using voices in counterpoint. The innovation of Gould’s technique is not only to superimpose voice recordings, but also to align them to create resonances – not in the acoustic sense of the term, as the reinforcement of a particular frequency, but through alignments between certain sonic-discursive elements. For example, in The Idea of North, when talking about travelling northwards, a voice is heard to say ‘further’ and then immediately afterwards another voice says ‘farther’, tangling the two voices together, such that the discourse seems to float between the speakers. This way of working with voice audio positions discourse as a collective endeavour that inhabits and flows through subjects, mediated by them, rather than something that comes from them. In my own data, where the same words or themes were taken up by different speakers, I used these as hinge points around which to link voices together.
For example, I mixed a recording of a young man from one of the nearby villages recalling how he stumbled upon a gothic themed fashion shoot at the site one day, with a clip of an interior design lecturer referring to how gothic architectural features had been grafted onto an old ruined castle on the site, giving it a more complicated backstory than was immediately apparent. I also created non-discursive alignments to intensify resonances between voice and site. For example, I blended together two voices at a point where each of them broke into laughter; I processed a voice speaking about rumours of underground tunnels at the site through a tunnel-like reverberation modelled on the acoustics of one of the ruins; I truncated the debate about rhododendron removal with a burst of recorded chainsaw noise to affectively relay the violence of the imminent clearance works; and I interrupted an interviewee’s account of walking dogs in the woods with a binaural recording of a dog barking at me while I was on the site one day. Listening to the final piece in situ, this phantom dog seems to ‘jump out’ of the mix, momentarily disrupting the historical distance between past and present. More generally, several listeners remarked on how the piece, auditioned in situ, seemed to repopulate the abandoned site with its many voices, bringing it alive, or animating it with ghosts. This is the kind of extralinguistic affective potential that I find compelling about experimental voice audio methods. They can move bodies, modulate the atmosphere of a place, perform hauntings, and create an uncanny sense of absent presence, in which the voice comes unhooked from its moorings in subjectivity and language.
Voice as machinic media ecology
In this final section, I want to work outwards from the preceding example to develop a broader methodological framing for voice audio methods. To this end, it may be useful to further develop the conception of voices as machinic media ecologies, drawing on Bryant’s (2014) machinic ontology, as a methodological basis for operationalising post-humanist critiques of the voice. As previously noted, Bryant understands all entities as machines that process inputs to produce outputs, and which function in conjunction with other machines. Each machine mediates the outputs of other machines through relations that constitute an ecology, producing effects that exceed the sum of its parts. Thinking in this way amplifies how the voice originates amongst vibrating body parts: vocal cords mediating air flow from the lungs and diaphragm, turning it into oscillations, which are then mediated by the resonant properties of the oral and nasal cavities, whose output is modified further by the tongue and lips, and so forth. But as discussed above, machinic thought also directs attention outwards, to the production of voice by electronic technologies. From the first moment that Bell spoke to Watson on the telephone, from the earliest etchings of Edison’s phonemes into phonograph foil, audio machines have been transcending the limits of the human subject, sending the voice across space through broadcasting and telecommunications, and preserving it beyond death through mechanical or magnetic inscription. As Prior (2018: 495) argues: “Paradoxically, while the voice attains its meaning as a uniquely expressive carrier. . .it is simultaneously accompanied by a whole machinic infrastructure (electricity, stages, acoustic treatments, amplifiers, microphones, compression, and reverb units) which reveals that carrier to be radically hybridized.”
If qualitative research is to remain relevant in an era of ever-developing voice technics, it will need to engage with how those voice functions with which it has historically been concerned, such as human communication, subjective expression, articulation of discourse and so on, are entangled with electronic media machines. Urban public spaces, for example, have become infused with what Power (2014) calls the ‘soft coercion’ of automated voices apologising for the inconvenience, or warning that luggage left unattended may be destroyed or damaged. Increasing numbers of domestic and other private spaces now host voice-controlled assistant systems such as Amazon Echo and Alexa, Google Home, and Apple iOS Siri, raising concerns about privacy. Voice technics are also used to augment bodies with impairments, as with electrolarynx devices and computerised text-to-speech synthesis. Software such as the Lyrebird app and Adobe’s Voco are providing more sophisticated simulations of human accent and other timbral qualities of the modelled voice. The Lyrebird technology is currently being used for purposes such as enabling people with degenerative diseases, including ALS (Motor Neurone Disease), to continue communicating in their ‘own voices’ after losing the capacity to speak. Yet such systems also have disruptive potentials for deception, forgery and fake news, as evident in the production of ‘deep fake’ videos, in which highly realistic images and voice synthesis are combined to simulate statements from well-known public figures.
These machinic media ecologies have serious implications for the epistemological and political functions of voice. Against the idea of voice as a relatively durable index of a unique human subject, a machinic media ecological perspective tunes into the malleability of voice, as an assemblage whose ensemble production, arising through couplings between heterogeneous forces and bodies, creates multiple points of flex, modulation and indeterminacy where various micro-powers can come into play. For example, there are medical procedures referred to as ‘voice lift’, that attempt to counteract the effects of ageing on voice timbre: ‘the surgeon injects implants through the neck that bring the vocal cords closer together, or they inject fat (or collagen) to make the surface area of the flesh thicker’ (Vallee, 2017b: 91). Such interventions are part of a wider spectrum of biosocial voice modification techniques: speech and language therapies, vocal coaching, elocution lessons, ventriloquism and impersonation. These processes remind us that, through ongoing processes of ageing, health and illness, education, attunement to different linguistic, socio-economic and geographical milieus and so forth, the voice is always being modified. At the same time, voice biometrics, audio forensics and voice print technologies are establishing new regimes of truth, that try to fix voices and pin them to individuals (Sterne, 2008) – precisely the inverse of voice synthesis and simulation, which unhook voices from their ties to specific bodies. All of these voice technics can be heard as attempts to wrest control over the troubling plasticity of voice.
The question for qualitative research might be: how can these technologies be appropriated in ways that, rather than serving functions of social control, unleash more generative radical potentials of voice? Bryant insists that machines, though always in relation, are not wholly reducible to their relations with other machines, because ‘each machine carries an excess capable of breaking with its circumstances. . .and enter[ing] into new relations. In these new relations, the machine might very well display hitherto unexpected powers’ (Bryant, 2014: 181). As Nordstrom (2015: 398) puts it, ‘No singular definition can describe the recording device because through its iterations and becomings it exhausts definitions.’ It is precisely these excessive, speculative potentials that voice audio methods can help to actualise, by throwing voice machines into new situations, to create new ecologies of relations between languages, audio technologies and listeners.
The excess of voice machines is particularly apparent in vocal breakdowns and failures. The machinic assemblages of voice, for all their careful orchestration, are too unstable to be relied upon to express the self as a coherent, contained identity. Before it even leaves the body, the voice is shaped by wayward material conditions, such as states of excitement, exhaustion, hydration or dehydration, agitation or relaxation. The voice may try to present an illusion of rational self-possession and self-presence; it may be eloquent and articulate; technologies may black box its body out of sight and out of mind; and yet still it is prone to accidents, lapses and misfires.
In 2010, the British broadcaster Jim Naughtie offered up an exemplary instance. For over two decades, Naughtie’s voice was a regular fixture on the Today programme, BBC Radio’s flagship morning news and current affairs show. In the machinic ecology of this voice, a male Scottish accent with perfect clarity of enunciation, authoritative without ever becoming overbearing, joined forces with large diaphragm condenser microphones, pre-prepared scripts, acoustically treated studios and carefully optimised dynamic range compression to produce the most comprehensible of utterances. Phonemes rolled out fully formed, cadences rose and fell properly, producing an effortless sense of rationality. And yet on one memorable occasion, when introducing Conservative minister ‘Jeremy Hunt the Culture Secretary’, Naughtie’s voice accidentally swapped the ‘H’ of Hunt and the ‘C’ of Culture to shocking and hilarious effect (see https://youtu.be/YS5mVoqJpUk).
Whether the incident was a Spoonerism, the result of phonetic priming, or a Freudian slip is a matter for speculation. Of more interest for my analysis is how Naughtie’s voice immediately broke down following his gaffe, like a tower block collapsing after the initial dynamite blast of demolition. Valiantly continuing to read the headlines, this voice, normally so composed, started choking on its own words – beset by dry coughs and awkward pauses, lines forced through the hoarseness of vocal cords seizing up. Radio’s voice-from-the-ether suddenly acquired a body, which intruded noisily. In a thickened, viscous tone, teetering from the brink of laughter to the edge of tears, headlines about Wikileaks, high-speed broadband networks and Egyptian shark attacks took on a gasping, almost morbid quality. ‘Excuse me,’ Naughtie eventually spluttered, ‘coughing fit’ – an explanation whose obvious inadequacy revealed the desperation of a man struggling to control his own voice machine.
Naughtie’s mistake amplifies some important features of the machinic voice. It shows us that no voice, and no method of voice, is immune to breakdown, no matter how well trained or technologically supported. It demonstrates that vocal breakdown is worth listening to rather than discarding, as it can reveal the bodily materiality of voice – a materiality that was there all along, but concealed by media artifice with its norms of intelligibility and definition. The episode also points, once again, to the potential of audio for relaying the voice as sound. Transcribe Naughtie’s words and most of what happened is lost; listen back on YouTube, however, and a whole series of extralinguistic sensations, affects and resonances spill out from the speakers. My attempt to describe it textually is laboured and inadequate by comparison.
Conclusion
This paper has argued for working with voices as machinic ecologies in qualitative research. My argument is not anti-language, anti-text or anti-meaning; rather it wants to hear what else voice can be and do. Responding to critiques of voice in qualitative research, I have sketched out how experiments in voice audio, using techniques such as contrapuntal polyphony, might enable researchers to work productively with the excesses of the machinic voice. There will always be situations in which voices need to be transcribed – to maintain anonymity, to protect vulnerable participants, to enable certain kinds of analysis, or to conform with established channels of publication and dissemination. But the field of qualitative research would surely benefit from a larger repertoire of techniques for working with voice audio beyond the default mode of transcription.
Bryant’s machinic ontology is methodologically useful as an alternative to the dominant objectivist paradigm, which treats voice audio as a straightforward record of discourse. Hearing the voice instead as a machinic media ecology opens up the black box of the voice, directing attention to how it arises vibrationally, through the coupling of different human and nonhuman voice machines. Machinic ontology also draws attention to how these machines mediate one another, collectively producing effects that exceed the sum of their parts. Framing voice in this way enables us to hear its material and affective dimensions, its malleability, instability and emergent qualities.
From a humanist perspective, the various kinds of technologically mediated voices I have discussed in this paper might seem fake or inauthentic imitations, far removed from the individual, personal, human voices with which qualitative research is concerned. Yet I am arguing that all voices are machinic through and through. They arise through the coupling of vibrational machines into ecologies, whose emergent properties and inherent instabilities can take the voice beyond the recitation of banalities and linguistic cliches, and into stranger, more provocative territory. Moving amongst machines, voices proliferate as versions, like the endless repetitions of dub music: sounds spoken, recorded, replayed or re-recorded, encoded and decoded, returning as echoes of echoes. Working with these machinic ecologies can help qualitative research tap into more of the lively, speculative potentials of voice.
Footnotes
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the AHRC, award numbers AH/K502728/1 and AH/J006556/1.
