Abstract
Qualitative research that focuses on social interaction and talk has been increasingly based, for good reason, on collections of audiovisual recordings in which 2D flat-screen video and mono/stereo audio are the dominant recording media. This article argues that the future of ‘video’ in video-based qualitative studies will move away from ‘dumb’ flat pixels in a 2D screen. Instead, volumetric performance capture and immersive performative replay rely on a procedural camera/spectator-independent representation of a dynamic real or virtual volumetric space over time. It affords analytical practices of re-enactment – shadowing or redoing modes of seeing/listening as an active spectation for ‘another next first time’ – which play on the tense relationships between live performance, observability, spectatorship and documentation. Three examples illustrate how naturally occurring social interaction and settings can be captured volumetrically and re-enacted immersively in virtual reality (VR) and what this means for data integrity, evidential adequacy and qualitative analysis.
Introduction
Qualitative research that focuses empirically on practical action, social interaction and talk in social practices has been increasingly anchored, for good reason, in working with collections of audiovisual recordings of natural occasions. This article will focus on the methodological approach of ethnomethodological conversation analysis (EMCA), but many points are relevant to a broader video-based qualitative research agenda (Flick, 2018). For both, 2D flat-screen video and mono/stereo audio recordings have been, and still are, the dominant recording media for data collection. Taking a different tack, this article makes the case that there is a future for ‘video’ in video-based qualitative research that in some ways is not video per se. Ultimately, video-based qualitative studies will migrate away from ‘dumb’ flat pixels in a 2D screen towards volumetric performance capture and immersive performative replay in virtual reality (VR). 1 It is true that certain traits of the use of video in video-based research will remain important, such as replay and freeze frame, but recent and emerging technological possibilities require creative rethinking and bold experimentation that will have significant methodological consequences.
As a result of practices and technologies of virtualisation, we are on the cusp of a scenographic turn that entails a rethinking of space, volume and movement. There is a need for a more scenographic imagination (Hahn, 2018; McKinney and Palmer, 2017) – for instance, to draw from, but not be bound by, immersive theatre and computer gaming, and thus to discover and multiply an innovative apparatus of observation with fresh ways to ‘sense-with-a-camera’ and ‘sense-with-a-microphone’. Unlike ‘video’, volumetric performance capture relies on a procedural camera/spectator-independent representation of a dynamic real or virtual volumetric space over time. A complete, reliable workflow for live volumetric performance capture in everyday physical settings is neither commercially available nor computationally feasible today, but this article maps out its possibilities and limits, primarily for data collection and staging the data, through consideration of current examples of novel immersive qualitative analytics (IQA) software and new types of time-based data collection that go beyond (and yet remain within the orbit of) video. Three examples illustrate how naturally occurring social interaction and settings can be captured volumetrically over time and what this means for data integrity, evidential adequacy and qualitative analysis. 2
The prevalence, practices and problems of 2D video and mono/stereo audio
Since the 1990s, 2D digital video has been stored as an encoded digital file that a computer media player streams and renders according to a codec (coder-decoder algorithm) as pixels on a flat screen. However, the unreflective adoption of these complex, often opaque, apparatuses of capture, storage and replay that co-constitute ‘objects of observation’ has led to ‘video’ becoming a black-box. For example, in relation to the early days of using video recordings to study social interaction, Erickson (2004: 202) noted that there were ‘differences in scholars’ routine looking and listening practices which resulted from the differing affordances of different kinds of audiovisual reviewing equipment.’ Hirschauer (2006: 418) argues that ‘instead of the selectivity of an observer, recordings yield to the selectivity of their media. . . . The video-camera as well only produces excellent documents when we grant to the selected detail in the camera’s display what we would not grant to any human participant: that it is preserving what has “actually” happened.’ Moreover, EMCA and media scholars have critiqued the dominant use of (2D) video recordings because of, for instance (a) the inadequacy of video for analysing the situated accomplishment of certain phenomena/orders of everyday action, (b) the limitations of video to ‘capture’ reality, (c) the intrusiveness of video recording in everyday settings, (d) the screen essentialism and planocentrism of flat video and (e) the false universalism of one-dimensional audio (Liegl and Schindler, 2013; Livingstone, 1987; Lowood, 2016; Schröter, 2014).
A number of scholars in EMCA have given practical advice about collecting data using video cameras, as well as how to view and analyse the resulting footage (Büscher, 2005; Goodwin, 1993; Heath et al., 2010; Knoblauch et al., 2006). 3 Additionally, some studies have topicalised camerawork and the use of video as a set of practices in the workplace or in leisure pursuits (Broth et al., 2014). There are well-known problems (and solutions) when using video cameras/audio recorders to collect data in natural settings, but the sociotechnical dimension of the camera/microphone as an apparatus or agency of observation is often neglected (Barad, 1998: 94; Gallagher, 2020). From the earliest days, some ethnomethodologists, such as Harold Garfinkel and David Sudnow, were aware that ‘you will have a hard time reproducing the actual spatial relations that members see because the camera does not operate like the eye does’ (Hill and Crittenden, 1968: 54). Sudnow claimed that a wide-angle lens would solve the problem, but such lenses tend to flatten space and push objects away from the viewer. In his discussion of the recording practices he developed when using the early portable video recorders, Goodwin (1981: 43) noted ‘any camera position or framing of participants involves a choice from a set of alternatives and any of the alternatives not selected would have produced a different record of the event.’
From a more critical stance, Branigan (2006: 8) questions the notion of a camera’s ‘point of view’ and the resulting ‘frame’. In interrogating the camera as metaphor, Branigan notes that there are at least 15 different uses of ‘frame’ as a metaphor in film studies: ‘It does not simplify matters to say that a camera merely “frames” an object’ (8). Branigan, however, is primarily referring to the traditional ‘frame’ of the 2D video camera. What is to be made of passive 360° cameras, which since 2016 have become commercially viable and easily available in the consumer market for adoption by qualitative researchers (Davidsen and McIlvenny, 2016; Gómez Cruz, 2017; McIlvenny and Davidsen, 2017)? One could argue that the continuous footage that results from such a multi-lens camera, after the separate wide-angle lenses are algorithmically stitched together into one ‘equirectangular’ frame or ‘cube map’, allows a viewer to see a flat 2D visual representation of the totality of a scene from a single location but in all directions at once, that is ‘frameless’ without a subjective contour (Branigan, 2006: 103–105). The most common view one has onto this representation is usually enframed by the traditional rectangular ‘viewport’, which changes, much as Branigan suggests, as if one ‘turns’ one’s head to look elsewhere in the scene. 4 Thus, the phenomenological experience is similar to that of a singular subject gazing out into the world (through the multiple ‘lens-eye’s centred at the camera’s fixed location).
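The relationship between the stitched ‘equirectangular’ frame and the rectangular viewport can be made concrete with a minimal sketch. It assumes the standard equirectangular convention (longitude mapped linearly to the horizontal axis, latitude to the vertical axis); the function names, the frame size and the 90° field of view are hypothetical choices for illustration and are not taken from any particular camera or player.

```python
def direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to a pixel in an equirectangular frame.

    yaw_deg:   horizontal angle, 0 = centre of the frame, positive = right
    pitch_deg: vertical angle, 0 = horizon, +90 = straight up
    """
    # Wrap yaw so that turning past 180 degrees re-enters from the other edge.
    u = ((yaw_deg + 180.0) % 360.0) / 360.0
    v = (90.0 - pitch_deg) / 180.0
    # Clamp so the bottom pole (v == 1.0) still lands inside the frame.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return x, y


def viewport_columns(head_yaw_deg, fov_deg, width):
    """Return the (left, right) pixel columns that a viewport spans on the horizon."""
    left, _ = direction_to_pixel(head_yaw_deg - fov_deg / 2.0, 0.0, width, 2)
    right, _ = direction_to_pixel(head_yaw_deg + fov_deg / 2.0, 0.0, width, 2)
    return left, right


# A hypothetical 3840 x 1920 frame: looking straight ahead hits the centre.
centre = direction_to_pixel(0.0, 0.0, 3840, 1920)
# 'Turning the head' only shifts which slice of the same frame is shown.
ahead = viewport_columns(0.0, 90.0, 3840)
turned = viewport_columns(90.0, 90.0, 3840)
```

In this sketch, ‘turning’ changes only which columns of a fixed frame are displayed; the location from which the scene was recorded never moves, which is the sense in which the experience remains that of a singular subject gazing out from one point.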
A concern for space and volume in ethnomethodology
There are a few precursors to the scenographic turn that raise praxeological questions about body, space and perception. Maurice Merleau-Ponty, in his seminal work entitled Phenomenology of Perception (first translated into English in 1962), explored the perception of the visual field, space and verticality, including the reported experiences of wearing inverting lenses or ‘goggles’. Drawing on a deliberate misreading of Merleau-Ponty, Harold Garfinkel posed what he called ‘tutorial problems’ for his students using the technologies of inverting lenses and auditory side-tone delay (Garfinkel, 2002). In a significant methodological shift, Garfinkel concluded that what could be learnt from wearing the inverting lenses was the achievement of bodies of practices: ‘These bodies have eyes that are skills; eyes that are skills in the ways that eyes do looking’s work. Where seeing is something more, other and different than formal analytically describable positioning the orbs to assure certain retinal registration of a perceptual field, let alone a visual field’ (210). One of Garfinkel’s students, David Sudnow, envisaged that if he were to record a man going for a walk, he would need a photographic ‘camera mounted to his head’ (Hill and Crittenden, 1968: 54). In fact, this camera was to record the scene as it appeared to the man as if present on all sides (cp. 360° camera). Following on from Merleau-Ponty, Todes (2001) gives us a way to talk about what Garfinkel presciently referred to as ‘bodies of practices’, namely body-direction in the practical visual spatiotemporal field. Todes writes that our left and right ‘appear to be merely two different sides of our body. But when we are active they generate an apparent spatial field for all objects, and themselves acquire a place in this field. They do this by bearing our capacity to turn in place; that is, to turn around without moving or passing, and therefore to turn, in a certain sense, in an instant. Turning in this in-stant (literally, standing-in), or the sense of our ability to do so, apparently gives us at once our simultaneous spatial circum-stance (where what stands around us is), and our own position or place as the in-stance (that which stands inside) of that circum-stance’ (49). With passive 360° cameras, we catch a glimpse of how we might begin to recover the lived practice of circum-stance and the artful production of a spatial field.
In his introduction to ethnomethodology, Livingstone (1987) raises the example of discovering the witnessable social order of pedestrian crossings ‘available to members as situated practices of looking-and-telling’ (1). He notes that a sociologist might put a film camera overhead to find disengaged structural patterns in their crossings, but, he argues, to understand how pedestrians cross ‘we must, metaphorically, move the camera to eye level’ (22, my emphasis). Here he is careful not to assert that actual cameras are comparable to eyes or that they should be mounted at eye level, but that any form of documentation should take into account the crossers’ bodies and eyes in the spatial practices of crossing.
The visual has most often taken primacy over sound but there are occasional hints that there are also specific issues with sound, space and volume. 5 As part of their argument about hearability and the ringing of a telephone as a hearable summons, Garfinkel and Wieder (1992) acknowledge as an aside that the directionality of sound is significant in terms of the ‘listened-for direction of the ring’ (197). They refer to David Sudnow’s concern with ‘sounded doings’ and note that ‘the direction from which a sound is heard is a detail with which the listened to sound is recognized and identified as a sounded doing, that is, the sound-of-the-coherent-object; the coherence-of-details-developingly-listened-to-and-listened-for’ (197). In Bjelić (2019)’s excellent retelling of Garfinkel & Wieder’s argument about ‘hearability’ in relation to CA’s focus on ‘hearership’, he notes that ‘CA reduces members’ spherical hearing as a phenomenal background to the logic of conversation, because for CA, the context of a conversation is internal to its recordable features’ (my emphasis). This is the only mention in his paper of ‘spherical hearing’ (e.g. the hearing of spatial sound or spatial audio surrounding the listener) and it is not elaborated. In Mengis et al. (2018)’s exploration of the ways in which camera technique shapes what can be ‘seen’ in the resulting recordings – the praxeology of camerawork – they largely ignore sound and audio, but they do note that their study ‘indicates that the audio aspect of video recording also contributes to our understandings of space, and this dimension should also be systematically explored’ (310).
Beyond ‘dumb’ flat moving image pixels?
Even though we can recover some sense of space from a 2D perspectival image, and there is a kinetic depth effect – a perceptual cue about space that arises exclusively from the movement of a camera (Branigan, 2006: 9) – there has been very little reflection on either the dominance of the monoplane image or the flat quality of the pixelated digital image in qualitative research, and very few studies that draw upon and/or record stereoscopic video (and spatial audio) for analytical purposes (McIlvenny, 2019a, 2019b). To get a better understanding of how we can move beyond 2D video, Schröter (2014) proposes that there are four series of optical knowledge: geometrical optics, wave optics, physiological optics and virtual optics. For Schröter, there is ‘a blind spot in the existing historical studies chronicling optical or visual media’ (3). They have not adequately accounted for what he provisionally calls ‘the history of the technological transplane image.’ Unfortunately, he argues, ‘planocentrism privileges the plane that supposedly is readily comprehensible ‘at a single glance’, i.e. as a presence within the awareness of an observer who is present him- or herself’ (35). He defines the ‘spatial image’ as ‘the comprehensive category comprising both transplane images and those that have a three-dimensional material support (like types of sculpture or, for example, globes)’ (38). 6 Galloway (2014) argues that there was a bifurcation in the mid-19th century between (a) a cinematic concern with a point of view of a singular experience of a gazing subject/lens standing in one central location, and (b) an anti-cinematic mode in which the eye was virtualised into a ‘metastatic virtual camera able to view an object from any point of view whatsoever’ (66). 
Consequent to the extensive exploration of the first branch of the bifurcation, arguably leading to the predominance of 2D video in qualitative research, this article revisits that bifurcation to see what can be gleaned from the second branch. The second proliferated the number of points of view dispersed within a space to conceive of the multiple points of view as temporally synchronous but was limited by the technology of the day and the phenomenological paradigm. With the increased powers of spatial computing and new forms of virtual reality technology (VR), it is an appropriate time to return to this bifurcation.
Pioneers in volumetric performance capture and replay
Schröter (2014) and Galloway (2014) have highlighted an alternative history of volumetric media in the 19th century in which point of view has no meaning. In the 1860s, François Willème’s ‘photosculpture’ technique used a ring of discrete but synchronised photographic cameras around a fixed subject, permitting the ‘virtualization of the eye’ (Galloway, 2014: 66). His work, and that of Braune and Fischer in the 1890s (Zielinski, 2006: 245–248), added dimensionality by disrupting ‘the singular experiences of a central gazing subject (or lens eye), however much it may be complicated by montage or the use of two or three concurrent cameras’ (Galloway, 2014: 63).
Over 100 years later, the pioneers in volumetric performance capture include motion capture, volumetric film, Machinima, mixed reality and VR games. It was in 3D video games that a new movement called Machinima developed in the 1990s to remix well-known games and create dramatic narratives using the games and their assets as a backdrop (Lowood and Nitsche, 2011; Ng, 2013). 7 Greenhalgh et al. (2000) documented one of the first attempts to build a software solution for capturing and replaying collaborative virtual environments in innovative ways. The Digital Replay System further developed the highly flexible ways a user could navigate a complex corpus of recordings/data streams in a collaborative environment (Benford and Giannachi, 2011; Crabtree et al., 2015; Greenhalgh et al., 2007; Murgia et al., 2008). More recently, computer scientists and performance art theorists have combined their efforts to develop systems such as CloudPad, with which anyone can regenerate a hybrid replay of the mixed reality performance (Giannachi et al., 2012). Steptoe and Steed (2012) suggest a reference architecture for replay that ‘collates multiple components of a user’s nonverbal and verbal behavior in single log file, thereby preserving the temporal relationships between cues’ (Steptoe and Steed, 2012: 388). Lastly, in theatre performance studies, Delbridge and Tompkins (2012: 62–63) argue that motion capture technology has led to a shift in the understanding of the documentation of an actor’s body in terms of (noncinematic) frameless movement in a 3D capture volume that the system can ‘register’. 8
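The idea quoted above from Steptoe and Steed (2012), that a single log file preserves the temporal relationships between verbal and nonverbal cues, can be illustrated with a minimal sketch. This is not their architecture: the record layout, the stream names and the sample values are hypothetical, and the only assumption is that each stream is already time-stamped against a shared clock and sorted by time.

```python
import heapq
from typing import NamedTuple


class Cue(NamedTuple):
    t: float      # seconds since the start of the capture (shared clock)
    stream: str   # hypothetical stream label, e.g. "head" or "speech"
    value: str    # whatever was sensed or said at that moment


def collate(*streams):
    """Merge several time-sorted cue streams into one time-ordered log."""
    return list(heapq.merge(*streams, key=lambda cue: cue.t))


# Hypothetical sample data for two separately recorded streams.
head = [Cue(0.0, "head", "turns left"), Cue(1.2, "head", "nods")]
speech = [Cue(0.4, "speech", "so"), Cue(1.2, "speech", "here")]

log = collate(head, speech)
for cue in log:
    print(f"{cue.t:5.2f}  {cue.stream:<7} {cue.value}")
```

Because every entry carries a time stamp on the same clock, a replay system reading such a log can restore which gesture coincided with which word, rather than having to re-align separate recordings after the fact.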
Although a complete, reliable workflow for live volumetric performance capture in everyday physical settings is not yet feasible, we can begin by mapping out its possibilities and limits through consideration of examples of novel software and new types of time-based data collection that go beyond video.
Towards volumetric capture by staging and inhabiting video data
Increasing the complexity of recordings, both 2D and 360°, of natural settings has encouraged scholars to explore how to better represent space and volume in social interaction and practices, which in turn results in a reconsideration of 2D and passive 360° video recordings and their limitations for analysis (McIlvenny, 2019b). The complex, time-based data sets that result are hard to visualise using traditional solutions. An alternative approach within immersive qualitative analytics (IQA) is to stage video by reconstructing the site and the scenes of social interaction over time in an interactive and immersive 3D representation or model in VR and to inhabit video by exploring complex spatial video and audio recordings of a single scene through a tangible interface in VR. Examples of novel software tools to stage, inhabit and analyse passive 360° videos include SQUIVE (Staging QUalitative Immersive Virtualisation Engine) and AVA360VR (Annotate, Visualise, Analyse 360 videos in Virtual Reality). 9 Because volumetric video capture is in its infancy, the SQUIVE staging software engine is a foretaste of what it would be like to inhabit a complex site of social interaction as if it was recorded volumetrically.
Example 1
SQUIVE has been applied to the staging of a specific complex site of social and material conduct which was recorded over 5 days with several 2D and 360° cameras and traditional and ambisonic microphones in demanding conditions with respect to lighting and sound. The event was an award-winning interactive exoskeleton robot-human theatre performance entitled Inferno at a municipal theatre, including four public performances and some additional experiments on the role of music and dance in the receptiveness of the subjects to wearing the performative exoskeletons. 10 Using the 360° video recordings as a memory aid, SQUIVE was used to construct a bespoke interactive 3D model. Avatars, lighting, furniture and props were styled to reflect the ambience and actants in the relevant spaces in which the performances took place. With a head-mounted display (HMD), a user can walk around the simulated theatre and encounter the different rooms and connecting spaces that the participants interacted in and moved through. A user can interact with the cameras that were used to record scenes and they can hear spatial sound in the space as it was recorded. Footage from the physical cameras can be selected, played, cloned and clipped (see Figure 1).

Figure 1. Virtual cameras, avatars and real footage of a focus group staged in SQUIVE.
A 5-min video clip can be found online which shows a 2D screen capture of the viewport of a user in SQUIVE on a walking tour through the theatre site showing time-slices of the cohort of experimental subjects engaged in different activities across four rooms, for example briefing and dressing, contact dance preparation, exoskeleton performance and focus group. 11 Animated avatars substitute for each participant and the reconstructed 3D scene is populated with virtual cameras that substitute for the actual cameras that recorded the event. Each virtual camera can be selected to re-view the reconstructed scene from the perspective of the actual 2D and 360° camera footage. Moreover, a user can launch AVA360VR (McIlvenny, 2019b) from any virtual 360° camera, so that the user can annotate the clip recorded by any physical camera or microphone in the scene concurrently during the same time-slice.
The reconstruction in SQUIVE stands on its own as an anonymised, interactive volumetric archive of the event and its documentation. In its current state, the user can alternate between navigating the staged virtual volumetric reconstruction and re-viewing the physical camera footage from actual locations in the space of the event. Therefore, SQUIVE is a useful antidote to mono-planar recordings and representations, since one is continually questioning the evidential adequacy of each mode of staging and inhabiting. Of course, analytical care must be taken in order not to make strong claims about conduct from observation and interaction with the staged models alone. The virtual models are a tool to organise qualitative enquiry; they are not a substitute for ethnography, skilled camerawork and originary sources.
Two examples of volumetric performance capture in virtual reality
As I have argued above, there are problems with treating 2D video recordings as veridical reproductions of past social events, especially with regard to access to scenographic orders and spatial experience. Volumetric performance capture, on the other hand, affords analytical practices of re-enactment, shadowing or redoing modes of seeing/listening as an active spectation for ‘another next first time’ (re-documenting). The two examples that follow illustrate how naturally occurring social interaction in virtual settings can be captured volumetrically. In each case, analytically relevant aspects of the spatio-temporal flow and performance of the event are recorded. They are immersively replayable as if live from any spectatorial position. Of course, the recordings presented in the following examples are from screen captures of the computationally rendered scene on a computer display ‘viewport’, but this is for convenience given the limitations of traditional academic publishing genres. The ‘data’ are really the volumetric performance capture that can be re-enacted. Auslander (2018) contends that re-enactments of documentations are always productions, not reproductions, in a new context. They are not historical documents that give access to objective experiences of the original performance. Auslander suggests that ‘reactivation’ might be the better choice for accounting for the relationship in which ‘each reactivation discloses the original, but discloses it under different circumstances’ (85). The ‘data’ in the examples below only live phenomenologically when reactivated, so the unique traces of performance are derivative of the capacity for immersive, performative replay.
Example 2
On the other end of the continuum from Example 1, ‘data’ were collected from computer gaming environments that can capture and re-enact a player’s 3D avatar gameplay in VR for another player to re-inhabit. Example 2 focuses on an instructed action sequence in a social VR game (Mindshow), in which players record, edit and share staged scenes with avatar puppets that are controlled by the player. 12 Shared scenes can be reinhabited and modified by other players, thus supporting a remix culture. An experienced player, John (JOH), has captured themselves giving instructions on how to act out a 3D animated character/avatar (circled in shot 1 in Figure 2) in a scene in anticipation of a new player in the future wishing to learn how to do the same. 13 In 30 s, John performs a short, public tutorial entitled ‘How to Make a Mindshow 1’ in the direction of a virtual placeholder (circled in shot 3 in Figure 2, but also visible in shots 4 and 9–16) that signifies the position of the player when they start/inhabit the re-enactment. The scene was ‘recammed’ – re-filmed as a 2D video during a re-enactment – from multiple angles. Sixteen unique re-enactments (shots numbered 1–16 in Figure 2) were undertaken by the author for ‘another next first time’ (Garfinkel, 1996: 10). 14 For the purposes of documentation, a 2D composite video was made comprising the sixteen mainly static, handheld virtual camera angles, including three performances of laic analysis (shots 14–16), recording the viewport of a mobile spectator within the re-enactment of the ‘original’ scene (see Figure 2). 15 Following Mengis et al. (2018)’s typology of camera views, the 16 shots were indicative of the Panoramic View (3, 4, 9, 10, 11, 12, 14, 15, 16), the American-Objective View (1, 2, 5, 6), the Roving Point-of-View (mobile virtual cam, 7, 8, 13) and the Infra-Subjective View (All, HMD). 16 As this is volumetric capture, all of these views are represented uniquely, simultaneously and multiple times, in the composite video of the same tutorial scene with John.

Figure 2. Composite of sixteen Mindshow replays.
A 30-s video clip is available online that shows the complete instruction sequence with 16 synchronised unique replays. 17 The audio track is from the original voice track recorded by John. A speech-centred, script-based transcript is included with the online video clip that characterises the speech of John in relation to the gross movements of his head/HMD and hands/controllers.
This example demonstrates how an instructed action unfolds in a virtual environment in anticipation of a re-enacted viewing in the future. In this case, unlike in YouTube instructional videos, it is not the video itself that is the accountable documentation of the instructed action; it is the volumetric performance capture (combined with the temporality of the audio recording of voice) that is the documentable. Although the performance is designed for a future player occupying a specific location (indicated in shot 3 in Figure 2), that so-positioned player can move their head during the re-enactment, and one can also inhabit the scene as a spectator from any other position. The performance accomplishes its viewability in a typical facing-formation (an ‘F-formation’) constituted by a circular vis-à-vis arrangement of bodies delineating an ‘o-space’ (Kendon, 1990, 2010), in which the instructor instructs a future Mindshow player in front of them, a virtual arrangement which includes the static object/avatar. This is a familiar triadic face-to-face social encounter.
Example 3
The third example illustrates another instructed action sequence in a different social VR game (VREAL), in which a player, known as ‘Alliehugs’, deploys another VR tool (Tilt Brush) to instruct a future player how to use the game to capture volumetrically their own gameplay, and replay someone else’s, in a supported third-party VR game. 18
Unlike Mindshow, which is self-contained, VREAL cleverly builds a capture/replay layer on top of other selected VR games; that is, it allows the player to capture their normal use of a VR game and allows anyone to re-enact what was captured from any perspective in the 3D scene that was recorded. 19 A composite video was made by the author comprising multiple replays recorded from a variety of unique (and arbitrary) virtual camera angles rendered to the viewport of the player, including (a) an original player HMD viewport, (b) a mobile spectator POV viewport, (c) a static aerial cam and (d) a static behind-a-poster cam (see Figure 3). A 3-min video clip is available online that shows the beginning of the tutorial by Alliehugs. 20 The composite video comprises two videos made up of four replays edited together in sync with the original voice track. The clip begins after 1:21, and at this timecode cam1 shows a POV and cam2 shows an HMD viewport. A speech-centred, script-based transcript is attached to the online video clip that represents the speech and movements of Alliehugs in relation to her gestures and the visual props in the virtual environment. Given there are fewer ‘recams’ than in Example 2, shifts between virtual cameras are noted explicitly.

Figure 3. Composite of VREAL replays.
Although similar in some respects to Example 2, this example demonstrates more sophistication in its mode of volumetric documentation of actual gameplay. In analytical terms, we get to re-view a richer instructional sequence – for example there is a broader toolset available, repurposed from the VR game that VREAL piggybacks on. This instructional sequence is designed for a future player in that it accomplishes a spherical co-viewing. The performance accomplishes its viewability in the virtual volume through the instructor’s staging of information screens in a circle encompassing the virtual participants. Also, the instructee-to-be is explicitly invited to occupy the space in a negotiable, virtual over-the-shoulder or side-by-side arrangement that co-constitutes the F-formation of their focused virtual social encounter.
Conclusion
As discussed above, it is not yet possible to fully capture a naturally occurring physical scene volumetrically with the necessary degree of fidelity and reliability required for qualitative research. Nevertheless, the first example above illustrated how a volumetric performance capture could be envisaged via a 3D reconstruction of the site as an interactive archive, which provided a spatial interface to the actual recordings made of the event. The second and third examples took place purely in virtual reality, in which it is possible to capture the performance volumetrically and replay it in VR. It is important to note that even though the latter may appear to be an inferior (computationally rendered) life-world, which has little bearing on the complexity and richness of our everyday socio-material worlds, that is beside the point. Despite their limitations, such cases still demonstrate that members treat them as having a scenic (and sonic) intelligibility; they are virtually real (Shields, 2003).
This article has focused on new forms of volumetric data collection, as well as the staging and inhabiting of such data for the purposes of qualitative research. In such cases, the technologies of capture and replay are congruous to a degree with the technologies and techniques of qualitative data analysis, and yet there are differences (Flick, 2018). If one does have the technology and the skills to make a volumetric performance capture, then a strategy to begin analysis might be, for instance, to re-enact the volumetric performance capture and find positions and angles on the witnessability, observability and visibility of the phenomena. One could, for example, take a panoramic stance to capture the whole scene from the outside, or roam around the scene as a mobile spectator or take the embodied position of each participant, that is their head movements, eye movements and body movements. One could ‘recam’ those views that are significant. With such a strategy, there are clear advantages and affordances that volumetric performance capture and immersive performative replay have over traditional 2D video recordings. For example, there is (a) frameless capture, (b) the freedom to move the camera during a continuous replay shot, (c) the freedom to review the scene from any position and angle and (d) the opportunity to hear the directionality of sounded doings. Some positive use cases unique to the approach considered in this article include:
A means for other researchers to re-view the audio or video ‘data’, depending on ethical constraints, and find alternative accounts based on a re-seeing or a re-listening 21;
A means to resolve occlusions in the 2D video image, for example, when a gesture or shift in eye gaze is occluded by another person or object in the scene;
A means to resolve views of the scene that are not viable with a specific configuration of passive 2D and 360° footage; and
A better purchase on the scenic and sonic, namely volumetric or spherical, intelligibility of bodies and objects in practical activities.
Some researchers might argue, for instance, that the virtualisation of video as a technology of re-viewing is unnecessary or a distortion of a participant’s perspective, which ordinary video escapes. However, this stance is vulnerable because 2D ‘video’ (and ‘audio’) itself has always relied on a virtualisation of detail inside the ‘frame’ that can be recovered by replay, yet which is not available to the participants. Additionally, claims that monophonic audio is exempt from the epistemological problems faced by ‘subjective’ video data are often anchored in an uninterrogated, falsely universalist notion that sound is uniform and equally accessible to all. Suchman and Trigg (1991: 78, my additions) note that ‘video-based interaction analysis affords a powerful corrective to our tendency to see [hear] in a scene what we expect to see [hear].’ It instils a healthy scepticism about the validity of observations that were made without the possibility to check the record more than once. What is the significance of this observation when there is no videotape and no definitive ‘record’? I contend that it leads to a renewed scepticism about the validity of observations that were made based on ‘the record’ when there is no possibility to check it (against) the records-to-come.
It is important for qualitative research to open up a dialogue about a new set of metaphors, assemblages, optics and acoustics, and critically examine their ‘phase cancellations’, ‘dead zones’ and ‘blind spots’. As Liegl and Schindler (2013: 262) have argued with respect to video in sociological research, specific vis-abilities emerge, in which ‘a nexus of knowledge and vision comes into being. In order to comprehend or trace these practices sociologically, one must acquire the practical knowledge of the respective participant in order to be able to see the same situation as the participants see.’ We must always interrogate, following Hirschauer (2006: 422), how any new mode of recording transfers ‘everyday incidents from their native contexts into the context of sociological argumentation by exceeding and falling below the participants’ knowledge’, how they textually reify ‘a singular event, emancipating ‘data’ from the participants’ control’ and establish ‘a stable empirical referent within sociological discourse’. Within an earlier paradigm of data collection, Ashmore et al. (2004: 355) raise the spectre of ‘tape fetishism’, in which (audio)tapes are treated as having ‘the power to transport the listener directly to the original event.’ We can speculate, following Ashmore and Reed (2000), whether or not a new ‘stronger’ analytical object under different circumstances is engendered through volumetric performance capture, something ontologically closer to the ‘event’ that disrupts the epistemic authority of the ‘Tape’, and, therefore, to which the ‘Tape’ would relate, much as the Transcript does today to 2D video ‘Tape’. Is there a danger of volumetric performance capture fetishism? Auslander (2018: 16-17) inoculates against this by shifting our attention away from the documentation to the experience of the re-activation as performative itself. 
Although documentation is unequivocally something ‘other than performance’, we can still ask what makes documentation ‘so inexorably linked to it that it is hard to separate it from it’ (Giannachi et al., 2012: 171). Following Auslander (2009: 85), ‘each reactivation discloses the original, but discloses it under different circumstances.’ Arguably, what volumetric performance capture affords is not yet a witnessing, not yet a making visible of the work of assembling visible social fields as practical courses of action, not yet a finding of the scenic intelligibility of a course of action, finding in its course the continuous ‘framing’ of the capture. That work and achievement of intelligibility is yet to be undertaken ‘after’ the originary event when a proliferation of simultaneous continuous ‘shots’ can be performatively accomplished. Of course, we must be cautious here; this is not a witnessing in the course of action in the original scene, that is, the ghostly re-witnessing has no impact on the re-enacted actions.
Footnotes
Acknowledgements
I would like to thank my colleague Jacob Davidsen for our mighty collaboration on Big Video, in which many of the strands and ideas of this article were brewing. Also, much gratitude to Nicklas Haagh Christensen, who was the lead programmer and developer for BigSoftVideo. He suffered long discussions with me on core technical concepts and coding relevant to this article. Colleagues provided useful commentary and criticism in a ‘silent pre-data session’ focusing on the examples presented in this article. The Video Research Lab (VILA) at Aalborg University provided technical equipment and support staff for software development.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
