Analytic Affordance: Transcripts as Conventionalised Systems in Discourse Studies

Abstract

This article explores the role of transcripts in the analysis of social action. Drawing on a study of the interactional processes in optometry consultations, we show how our interest in the rhythm of reading letters from a chart arose serendipitously from our orientation to transcription conventions. We discuss our development of alternative transcription systems, and the affordances of each. We relate this example to constructivist debates in the area of transcription and argue that the issues have been largely characterised in political terms at the expense of a focus on the actual processes of transcription. We show here that analytic affordances emerge through an orientation to professional conventions. The article ends by suggesting that a close reflection on the design of transcripts and on transcription innovation can lead to more nuanced analysis as it puts the researcher in dialogue with the taken for granted ideas embedded in a system.

Keywords

affordances, conversation analysis, transcription

Introduction

Issues relating to transcription have received an increasing amount of attention in the last decade or so. Researchers have been particularly interested in the epistemological questions that emerge from the creation of iterations of data that fix it into some (usually) written form. Hammersley (2010) has shown that from this discussion has emerged a consensus around the idea that transcripts should be viewed as a construction, which ‘re-present’ (Gibson and Brown, 2009) data rather than providing some neutral scientific instrument of presentation (Bucholtz, 2000). One of the key aspects of the debates in relation to transcription has been the strong concern with showing the ‘politicised’ nature of transcriptions (Bucholtz, 2000), and the ways that transcripts operate as ideological formations (Roberts, 1997) that ‘translate’ (Kress et al., 2005) social practice into the fixed written ‘tellings’ of a researcher (rather than of the researched). Transcripts draw attention to particular features of speech and social action (such as talk, gesture, movement, gaze) that implicate preferred readings of the interaction they re-present. A transcript does not neutrally report, but displays a particular understanding and perspective of whatever it is that is being displayed. As such, transcripts are perhaps in some ways the epitome of an unequal relationship between the ‘researched’ and the ‘researcher’, where the latter defines, or at least delimits, a reader’s understandings of the former.

While such issues have been strongly on the agenda in the increasingly self-aware social science communities, the actual processes through which transcripts are used in analysis have been rather less prominent as areas of discussion. Where they are discussed as practices as opposed to ‘epistemic issues’, researchers quite often make a distinction between two types of transcripts or, more accurately, between different analytic aims within the process of transcription. Bucholtz makes a distinction between ‘naturalized’ and ‘denaturalized’ transcripts, with the former referring to forms of representation that ‘conform to written discourse formations’ (2000: 1439) and attempt to present ‘what was meant’ within a given discourse event. ‘Denaturalized’ approaches try to capture the nuances of spoken language, the inflections and interactional components that are viewed as important carriers of meaning. Similar distinctions are made by Gee (1999) who talks about ‘broad’ and ‘narrow’ transcriptions and Gibson and Brown (2009) who describe ‘unfocussed’ and ‘focussed’ transcripts. (We prefer and will use the term ‘focussed’ transcript because it draws attention to the analytic work of identifying specific features of interaction for analysis.) These sorts of categorisations indicate the different roles that transcripts can play for researchers as either a kind of verbatim record of a discursive event that has little interest in the nuances of speech, or as a detailed analysis of particular features of talk.

In whatever form, transcripts provide an analysis of settings and draw out relevant features of the talk or context in order to display and to work through their analytic relevance. In paradigms of research that use verbatim transcripts the aim of the transcript is simply to capture more or less what the participants in an event ‘meant by what they said’. The sorts of epistemic concerns highlighted at the beginning of this section are a reminder to researchers that the ways they represent discourse – even if ‘discourse processes’ are not their analytic concern – do have important implications for the implicit (or explicit) meaning of their ‘sense’ of the setting.

To give an example of one area of debate in relation to this, a common practice is to write phrases and words in a way that gives some indication of their phonetic distinctiveness, thus ‘sumin append’ instead of ‘something happened’ or ‘coz’ instead of ‘because’. Preston’s work argued that such non-standard spellings ‘serve mainly to denigrate the speaker so represented by making him or her appear boorish, uneducated, rustic, gangsterish and so on’ (1985: 328). This is particularly the case, it is argued, because such spellings are usually provided very selectively and therefore only turn some speech and some speakers into an object of ‘difference’ rather than others (see also Bucholtz, 2000, and Jefferson, 1996, on this). The implication behind such practice is that all standard transcription that does not have indications of phonetic distinctiveness is ‘normal’ and ‘ordinary’, and that non-standard transcripts implicate an ‘othered’ speaker. Further, because it is not systematically applied, there are real possibilities of actually mis-representing the speakers being depicted (Jefferson, 1996). Preston suggests that a unified phonetic spelling may aid non-evaluative transcripts, where sounds of speech can be accurately represented without implied evaluation, as all speech would have the same indications of their precise phonetic context.

In contrast to this position, Bucholtz points to the particular difficulty that phonetic spelling provides in terms of alienating audiences and making transcripts inaccessible to non-specialist readers. Indeed, Bucholtz notes that while there are good arguments for standardisation, ‘… a preoccupation with accuracy may prevent us from examining the equally important question of what is at stake in a particular transcription’ (2000: 1446). Accuracy is only one of a number of important concerns in transcripts; other important issues, it has been suggested, include readability (how well the transcript can be understood by an audience), and, central to the arguments of this article, the capacity to produce an analysis and an understanding. As Roberts put it, the role of transcriptions is to ‘call up the social roles and relations constituted in language and rely on [transcribers’] own social evaluations of speech in deciding how to write it’ (1997: 167–8). Transcripts are a bias and they are an interpretation – they provide an insight into the analyst’s story of what the researcher sees as relevant in a given transcript. As Jefferson notes, ‘… when we talk about transcription we are talking about one way to pay attention to recordings of actually occurring events’ (1996: 25). The role of transcripts (or at least, one of the roles of transcripts) is precisely to be selective, and to provide a focussed, perhaps even theorised, insight into some practice or other (Duranti, 2006; Ochs, 1979).

This article seeks to explore this notion of selectivity in transcription in relation to the idea of ‘affordances’ (Gibson, 1979). The term describes the relationship between objects and social products and the social conventions of their use: an object comes to offer up particular possibilities of use which emerge from the interplay between its own characteristics and the ways in which people perceive and use it. Norman (1988) popularised this notion of affordance, and since then it has spread into areas of human-computer interaction (HCI), design and also sociology (Hutchby, 2001a). For our purposes, it is sufficient to note that we consider the affordances of transcripts to be both enabling and constraining, aspects of technology that Hutchby respectively calls ‘functional’ (they facilitate some types of professional gaze while limiting others) and ‘relational’ (different people using the same transcript may see different relationships relevant to their particular research interests) (2001a: 448). In this sense, the concept of affordances helps us to systematically examine the relationship between the design of the transcript and the way in which the analyst inspects the video data (cf. Hutchby, 2001b). We suggest that, in producing conventionalised transcripts, researchers create an object that enables them to focus on quite particular features of a social context but, at the same time, constrains their ability to analyse others. We discuss this idea in more detail in the next section in relation to ‘focussed’ forms of transcription in discourse studies.

Transcription and Analysis in ‘Discourse’ Studies

Where researchers are quite explicitly interested in the operation of discourses then they typically work within quite well-enshrined conventions of representation. The Jeffersonian method of transcription (Jefferson, 1984) is one of the most prominent of these, versions and variations of which can be found in many areas of ‘discourse’ studies (Dressler and Kreuz, 2000). Jefferson’s system was developed in the context of the perspective of conversation analysis (CA), and aims to help pursue the very particular analytic interests of that discipline. The transcription notation enables researchers to document the organisation of talk by paying close attention to overlap in talk between speakers, and to look at the emphasis, intonation, elongation and other specific aspects of utterances. While the system was designed to examine the sequential order of everyday conversations, it also has value beyond this discipline. The focus on the character of discourse that the transcription notation provides has aided researchers in all kinds of disciplines who have interest in discourse processes.

However, as with any method of representation, the system itself has embedded within it certain taken for granted – and probably, quite frequently unnoticed – assumptions. Firstly, the transcripts are organised on the basis of the identity of participants, with the categories used to identify speaker (Doctor/Patient, Man/Woman, Speaker 1/Speaker 2, Mary/Peter) privileging some component of that identity (or bracketing out their identity altogether) (Watson, 1997). The very naming system creates a way of reading and seeing the discourse and its ‘orderly properties’. A further argument often levelled at these modes of representation is that their exclusive focus on talk severely hampers the development of a sufficiently detailed understanding of the organisation of contexts. Erickson describes these forms of representation as ‘playscripts’, and argues that they ‘obscure relations of mutual influence between the speaking behaviour of speakers and the listening behaviour of listeners’ (2010: 342); they put talk as a spoken act at the centre of the analysis, removing all the other aspects of the contexts (such as gesture, gaze, posture) that are relevant for understanding the broader context of such activity. The conventions of representation draw the analysis to the exploration of ‘speaking behaviour’ as against other communicative actions. In orientating to these conventions of transcription, researchers reproduce the analytic foci and restrictions implicit within them; the conventions themselves come to structure the analysis and inform the organisation of the research.

While the sets of debates that have been described above show some awareness of the ‘structuring’ features of transcription processes as analytic tools, the very particular ways that transcripts produce understanding within research contexts (rather than in the abstract) have not been discussed in detail. Where the processes of transcription are explored, they tend to be described in very ‘intentional’ terms, where the researcher is involved in a rigorous inspection of data in order to actively choose the relevant features to be represented in the form. Sidnell (2010) provides a good description of the difficulties of transcription, and the frustrations of attempting to first of all ‘hear’ the complexities of spoken language, and then to re-present them accurately in written form. These sorts of accounts show that the relation between a transcript and data is a complex one, which emerges from very rigorous and time-consuming involvement with data. People who work with transcripts in these genres tend to maintain a close relation to the data to which they pertain: data sessions often involve groups of people reading through transcripts while listening to or watching the audio/video data that they reference. As a result, transcripts are fluid objects that change and morph as the analysis does. As we have seen, however, the purpose of a transcript is to bring to the fore some features of the data rather than others, and it does so through the sorts of conventionalised systems described above. Our interest in this study is to look at the relationship between these systems and the process of analysis.

Multimodal Transcription

Nearly all of the preceding discussion has referred exclusively to the transcription of talk. As we noted, however, this is a very restrictive way of conceptualising social settings, and is very limited as a means of analysing social action (Heath et al., 2010). Multimodal transcripts are forms of representation that are interested not only in talk or discourse, but also in other communicative ‘modes’. In this article our reference to ‘mode’ is different to its application in semiotic or multimodal research, where it describes components of a signifying system. In this article the term describes aspects of communication, such as gesture, body posture, gaze, the manipulation of objects, music, tools, or any other feature of a setting that might be relevant to understanding a social context.

Researchers involved in analysing social settings (from whatever perspective) and who have an interest in that setting other than the organisation of talk, often use video for their data collection (Heath et al., 2010). Unlike the transcription of talk, there are no clear conventions for transcribing video data (Gibson et al., 2011). The absence of conventions in transcribing multimodal forms of data means that researchers working with video have to very much ‘find their way’ in figuring out how to make a transcript. Bezemer and Mavers provide a semiotic analysis of multimodal transcription processes, drawing out the ways that researchers ‘select’ (choose from a range of data) and ‘highlight’ (make visible particular features) aspects of data through transcripts. The authors show how transcripts ‘translate’ from one medium to another, turning talk into written words or movement into a series of images from a particular angle, producing new ‘syntax’ and spatial alignments. Importantly, these re-descriptions of the world – re-orderings and re-iterations – are, as with transcriptions of talk, analytic in their aim and function. As they succinctly put it, ‘The modifications brought about by transduction are not only necessary, but it is precisely the re-making of observed activities in a transcript that can lead to fresh insights’ (2010: 196). Bezemer and Mavers look at a range of transcripts and show how their authors order the phenomena through particular semiotic forms. Their description goes a long way to illustrating the ways that transcripts order the world as new re-descriptions of particular events.

This article deals with a related but distinct issue – the process of producing transcripts as a part of analysis. Drawing on examples from a research project into optometric consultations, we describe the analytic and practical decisions that led to the development of new forms of representations. The project that informs the following discussion was an ESRC-funded study¹ of optometric practice, which involved video recording more than 70 one-to-one consultations in clinics in and around the London area of the UK.

Analysis

Conceptualising a Problem

The ability of a transcript to help a researcher to ‘conceptualise’ a problem depends on its representational affordances. Within the optometry project, one of the areas that the team became interested in was the way that patients read out letters during a distance vision test. In particular, we began to focus on the rhythmic features of the patient’s reading. Figure 1 gives an example of the issue. In lines 1 to 5 the optometrist asks the patient to place a patch over his left eye. The optometrist goes on to ask him to read the lowest line down that he can (line 5), and the patient proceeds to read the letters (lines 7–11) (n.b. the transcription symbols used in both this example and in Figure 2 are provided below the transcription itself).

Figure 1. Rhythm in speech while reading letters in a distance vision test.

Figure 2. Turn-taking and rhythm in letter reading in a distance vision test.

This example is unusual as, ordinarily, the patient and optometrist treat the end of a line of letters as a marker in a turn-taking sequence, so that after the end of a sequence the optometrist says something like ‘good’ or ‘next’ – see Figure 2 for example. In Figure 1 though, the patient reads out one line after another without leaving such a gap. However, what was particularly interesting in this transcription was the rhythmic way in which the letters were read out at the beginning and how this rhythm seems to change as the reading progresses. At first, the letters are read out without pause (‘EASILY read the Ee Pee eNn: yoU Vee’, line 7); in the second line (‘Zed aR: (0.3) yoU Eee whY (.)’, lines 7–8) there is a pause after the first two letters, and in the third line (‘yoU enN (0.5) Tee (.) something Zed’, lines 8–9), there is again a pause after the first two letters, although this time slightly longer, and then a further one after the third letter. After looking at several similar examples, the team began to suspect that the rhythm of letter reading may indicate something about the patient’s ability to read.

Different examples showed slightly different characteristics in relation to the rhythmic reading of letters. In Figure 2 we see that having been requested to read the letters in a list (lines 5–7), the patient reads the first line in an even rhythmic pace (line 8), but in the next two lines includes pauses after the first three letters (‘eNn Vee yoU (1.0) Pee Dee’, line 10, and ‘enN yoU enN (.) ahR Zed’, line 13). In both of these second lines, the ‘rhythmic presentation’ with which the reading is delivered is the same, consisting of three letters, a pause, and then another two letters. Even on the last line, where the patient actually gets some letters wrong, the letters are presented in the same format. There is something slightly different happening here to the talk in Figure 1, in that the rhythmic character of the reading in the line where ‘problems’ occur is delivered in a way that is consistent with the other lines.

The central point to note in the context of this article is that the transcript enables us to concentrate on the rhythm of the reading because the representation of pauses is a part of the conventions of the medium. The transcript ‘affords’ a focus on rhythm and enables the researcher to pay attention to a particular feature of interaction. ‘Unfocussed’ (Gibson and Brown, 2009) verbatim modes of representation, for example, do not do this and would not be able to show the possible relevance of rhythm.

However, while they may be useful for highlighting an issue, even these more detailed ‘focussed’ forms are limited for the purposes of exploring the rhythm of speech. The representational form can show something of the comparative rhythmic quality of pauses but it cannot illustrate with much precision the rhythmic characteristics of words themselves. While the elongations of the word sounds show that there is some stretching in their pronunciation, these are very rough indications. We needed to develop a mode of representation that could help us to focus on the rhythm in words and pauses in equal detail.

Music Notation Transcription

Our initial transcripts enabled us to identify an issue then, but we needed further representational innovation to be able to explore it more fully. The first method we used involved adopting the transcription system used to depict rhythm in music scores. This is quite a logical notation form to use, of course, as music notation is developed for the purposes of representing rhythm in sound. A similar approach has been used by Erickson (2010) to transcribe multimodal actions. Music scores divide time up into ‘bars’, each of which is a fixed segment of time comprising a ‘count’ of 2, 3, 4, 6, 8 (or whatever) beats. In our transcripts all bars were divided into four beats. The notes that make up the bars have particular durations. Minims (or ‘half notes’) have a count of 2 (i.e. there are two in a bar), crotchets (or ‘quarter notes’) have a count of 4 (i.e. there are four in a bar), quavers have a count of 8 (i.e. there are eight in a bar) and so on. The notes have relative values to one another, so that a whole note is equal to two half notes, each of which is equal to two quarter notes, etc. The notes can be ‘tied’ together to extend their duration, and pauses in sound are shown with distinctive ‘rest’ markers, which have the same divisions of duration as the notes. See Figure 3 for a more detailed outline of this notation.

Figure 3. Conventions in music notation.
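The relative note values described above can be illustrated with a short sketch. This is not part of the project's method, which relied on hand-written scores; the dictionary, the function names and the use of Python are assumptions made purely for illustration. Each note value is expressed as a fraction of a four-beat bar, so that tying notes together is simply addition.

```python
from fractions import Fraction

# Note values as fractions of one four-beat bar (a whole note fills the bar).
# The names follow British usage; this table is an illustrative assumption,
# not a reproduction of Figure 3.
NOTE_VALUES = {
    "semibreve": Fraction(1, 1),  # whole note: one per bar
    "minim": Fraction(1, 2),      # half note: two per bar
    "crotchet": Fraction(1, 4),   # quarter note: four per bar
    "quaver": Fraction(1, 8),     # eighth note: eight per bar
}


def notes_per_bar(name):
    """Return how many notes of this value fill one bar."""
    return int(1 / NOTE_VALUES[name])


def tie(*names):
    """Return the combined duration of tied notes, as a fraction of a bar."""
    return sum((NOTE_VALUES[n] for n in names), Fraction(0))


def fills_bar(names):
    """Check whether a sequence of notes (or rests of equal value) fills one bar."""
    return tie(*names) == 1
```

Because rests share the same divisions of duration as notes, the same table can stand in for pauses: a crotchet tied to a quaver, for example, occupies three eighths of a bar whether it is sounded or silent.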

Figure 4 shows a hand-written transcript for one of our early attempts to represent the variations in speech through music notation. There are three segments of reading here taken from the same consultation. The letters being read out are written underneath the notes both as letters and as they would appear in a conventional conversation analysis transcript. We can see that in all three of the examples the reading appears to be more or less rhythmically equal. In the first example the letters N, U, R and Z are read in the same rhythm, with only the N having an elongated character. All of the letters in the next example are read at the same pace, as are the R, Z, P and D from the final one.

Figure 4. Representations of rhythm in speech through music notation.

This form of representation shows pauses and pronunciations as relative values – it illustrates how long a pause lasts comparative to the pronunciation of a word (or ‘letter’ in this case). We were also able to see things slightly differently through the transcript. For instance, the first example is actually a representation of the letters read out in line 13 of Figure 2. In Figure 2 it looks like the N following the U is rhythmically similar but in Figure 4 we can see that it appears to be elongated. This ‘elongation’ does not occur in the stretching of either of the two sounds that make up the letter (as in ‘e:nn’ or ‘enn:’), but in a slowing down of the reading. This could have been represented in the decontextualised system as <enn> to show that the talk had been slowed, although that would not show how much slower or the relative rhythmic duration of the letters.

As a representational form, the transcriptions facilitated the direct comparison between sequences of letter reading: the positioning of one line of letters above another presents the rhythm as comparative to the others. For instance, in the first and third examples in Figure 4, the first two letters of the series are read out evenly. In the first example, the third letter (enn) is extended slightly and followed by an eighth note pause, and then the last two notes are read out evenly, and in the same rhythm as the first two. In the third example, this same pattern can be seen, but the pause and the extended letter (sea) are reversed. The questions that we found ourselves asking about this were related to the use and duration of pauses and the extension of words, and whether they were treated by optometrists as signalling some difficulty in reading. To ask this question we needed to use these ‘rhythmic transcripts’ in conjunction with our original transcripts so that we could see how the optometrists responded to these different types of reading.

In the process of producing these transcripts it became very apparent that there was a disjuncture between the recording and the written articulation of the reading. Essentially, some of the orderliness of the speech represented in forms such as Figure 4 is artificial and is a property of the transcription, not of the talk. Music notation is designed for representing music within the western tonal and rhythmic systems and is not good at showing the dynamics of music (let alone of other sounds) in other genres. Progler (1995), for instance, has shown that jazz music cannot be very accurately represented by western music notation as the complex rhythmic inflections are lost. The same is true in spoken language as the language is made to fit the rhythmic orderliness of the transcription system. Furthermore, the system indicates the similarity of the rhythm but strips the pace of the reading from the analysis. While we can see, for instance, that all of these systems were shown as being similar to one another in their rhythmic orderliness, we cannot see the variations of speed within their utterance. As an ‘ordinal’ system (Erickson, 2010), music notation does not show time but timing.
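The point that music notation shows timing rather than time can be made concrete with a small sketch. It is an illustration only: the function name, the snapping grid and the sample durations are assumptions, and it does not describe how the hand-written transcripts were produced. Durations are expressed relative to the shortest sound and snapped to a grid, which is roughly what fitting speech to note values involves.

```python
from fractions import Fraction


def to_relative_values(durations, grid=Fraction(1, 2)):
    """Express durations relative to the shortest one, snapped to a grid.

    `durations` are whole numbers in any fixed unit (hundredths of a second
    in the examples below). With the default grid, each value is rounded to
    the nearest half of the shortest duration.
    """
    shortest = Fraction(min(durations))
    return [round(Fraction(d) / shortest / grid) * grid for d in durations]


# Hypothetical readings of four letters, in hundredths of a second.
fast = [20, 20, 40, 20]
slow = [40, 40, 80, 40]     # same proportions, half the speed
uneven = [10, 11, 19, 10]   # slightly irregular delivery

fast_pattern = to_relative_values(fast)
slow_pattern = to_relative_values(slow)
uneven_pattern = to_relative_values(uneven)
```

All three readings come out as the same pattern of relative values. The difference in pace between the first two disappears, and the small irregularities of the third are tidied away by the grid, which is the sense in which some of the orderliness belongs to the notation and not to the talk.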

Timeline Transcription

The very apparent problems with this mode of representation led us to try to find another system that would help us to look more closely at the timing of the utterances in a way that compared their relative speed. We developed a transcription method that involved writing the talk against a timeline that showed seconds divided into fifths. We used software called CatDV to analyse our videos, and this included a counter that, rather than counting in tenths of a second, counts from 0 to 25 in divisions of five within each second (i.e. fifths of a second). For convenience, we used the same system to represent the time of our transcripts. Figure 5 is a version of a part of a transcript that we created. Here, the seconds are illustrated in bold along the top row, with the fifths of seconds shown in the row underneath them. Below the time, both of the participants are listed, with their speech outlined in grey. Each line of speech lasts for four seconds, and then continues below.

Figure 5. Representations of duration in speech through timeline transcription.
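The arithmetic behind the timeline can be sketched as follows. The sketch assumes that the 0 to 25 counter corresponds to 25 frames per second, so that five frames make one fifth of a second; the article does not state this, and the function names and the tuple format for counter readings are likewise hypothetical. It does not use or describe any CatDV interface.

```python
FRAMES_PER_SECOND = 25  # assumption: the 0-25 counter is read as 25 frames per second
FRAMES_PER_FIFTH = 5    # five frames then make up one fifth of a second
FIFTHS_PER_SECOND = 5


def to_fifths(minutes, seconds, frames):
    """Convert a counter reading into a whole number of fifths of a second."""
    total_frames = (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames
    return total_frames // FRAMES_PER_FIFTH


def duration_in_fifths(start, end):
    """Return the length of an utterance, in fifths, from two counter readings."""
    return to_fifths(*end) - to_fifths(*start)


def row_and_column(fifths, row_seconds=4):
    """Locate a time point on a timeline whose rows each span four seconds."""
    return divmod(fifths, row_seconds * FIFTHS_PER_SECOND)
```

For example, an utterance that starts at a reading of (3, 35, 0) and ends at (3, 35, 20) occupies four cells of the timeline, that is, 4/5ths of a second.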

Unlike focussed modes of representation, this timeline form of transcription shows quite precisely the relative timing of utterances. We can see, for instance, that the first utterance from the optometrist (‘not so good:t’) was much slower than the second (‘anything on the top line’): both lasted the same time (4/5ths of a second), but the second comprised a much longer phrase. While the relative pace of talk can be shown in conversation analysis transcription, this involves showing that something is faster or slower than surrounding speech, rather than showing their actual speed.

In terms of our interest in the reading processes, we can see that the first five letters of the chart that are read by the patient (03:35–03:36) are read out increasingly slowly. According to the measurements indicated here, the first two letters are read within 2/5ths of a second, the next two letters in 3/5ths of a second and the final letter in 3/5ths of a second. In comparison to the conversation analysis transcription, this notation system helps us to see quite precisely how the phrasing varies across time – that the letter reading slowed down considerably for the final letter. It also enables a close comparison of speed in different instances of the same person’s speech. We see, for example, that the second line of letters is at first read much more slowly with each letter lasting 3/5ths, 2/5ths and 3/5ths respectively, but with the final two letters read more quickly and spoken as a unit over 3/5ths of a second. This gives us a way of seeing in more detail how reading practices vary across a given patient’s instances of reading, and how they are dealt with by optometrists. Because utterances are grouped into ‘units’ of speech, the transcript also gives some indication of the rhythm of utterances. This can be combined with transcription from focussed modes of representation to demonstrate where letters are stretched (as in the first optometrist’s utterance), and other inflections.
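The comparison of pace described above amounts to a simple calculation, sketched here with the figures quoted in this paragraph. Each group is a pair of (number of letters, duration in fifths of a second); the function names and the list layout are assumptions introduced for illustration.

```python
from fractions import Fraction


def seconds_per_letter(groups):
    """Return the average time per letter for each (letters, fifths) group."""
    return [Fraction(fifths, 5) / letters for letters, fifths in groups]


def is_slowing(groups):
    """Check whether each group is read no faster than the one before it."""
    pace = seconds_per_letter(groups)
    return all(earlier <= later for earlier, later in zip(pace, pace[1:]))


# Figures quoted in the text, as (letters, fifths of a second).
first_line = [(2, 2), (2, 3), (1, 3)]
second_line = [(1, 3), (1, 2), (1, 3), (2, 3)]

first_pace = seconds_per_letter(first_line)
second_pace = seconds_per_letter(second_line)
```

On these figures the first line moves from a fifth of a second per letter to three fifths, a steady slowing, while the second line begins slowly and ends with its fastest pair of letters.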

For all its advantages as a comparative mode of representing speed, there are limitations with this form of transcription. For one thing, the division of talk into ‘speech units’ is an imperfect way to show timing, as the alterations of time ‘within’ those utterances are not very clearly shown. In the end, the reason for separating them out in this way is pragmatic in that it is an easy way to position text against a timeline. To do so with individual words would be extremely time consuming, so this represents a kind of compromise between workability and analytic interest. Furthermore, the phrasing that enables the researcher to hear these as different units is not represented in detail. The separation of, say, the ‘Yeah very good’, ‘One last’, ‘That arr at the end’, ‘Anything on the bottom one’ into discrete segments comes from being able to hear the phrasing of their intonation through pitch alteration, pace, micro pauses and other features as producing an audible separation of them as units. It would of course be possible to include these features in the transcription, but that would make an already complex transcription system even more time consuming.

Discussion

In the introduction to this article we suggested that the concept of affordances is important for understanding the analytic role of transcripts. As we saw in the previous section, a problem can be more easily conceptualised if the conventions of transcription enable it to be represented: the transcript’s functionality allows the analyst to discover and explicate certain sequential features of the recorded interaction (a process which we call ‘enabling’) while it ‘constrains’ the possibility of discovering others. In our case, we were able to see the potential relevance in the rhythm of reading because the focussed transcripts being used gave some loose indication of this as an interactional characteristic. There is something serendipitous in the interplay between the conventions of a modality of transcription (like conversation analysis) and the representation of some interesting finding. In areas of study where discourse processes are the primary interest the serendipity of the affordances of transcription modes has not been explored in detail. Researchers tend to describe the transcription process as highly intentional and as involving a selection on the part of the transcriber of the relevant features to pay attention to within a given interactional context; this is what Hutchby (2001a: 448) denotes as the ‘relational’ aspect of technology. This downplays the role of the conventions of practice within which the transcriber is working, and their role in helping the researcher to focus on one area rather than another. For example, the transcription conventions in CA were designed for the purposes of analysing the sequential order of ordinary conversations, and their mode of representation is a design that enables researchers to achieve this analysis. As we showed, restrictions arise when this mode of representation is used to focus on aspects of social action that are imperfectly represented within the system. It is here that the affordances of the medium become quite keenly felt.

The previous discussion was a preliminary study of the relationship between affordances and analytic interest, and how the conventions embodied in CA transcription helped us to develop our interests in an interactional phenomenon. In our example, having noticed the relevance of a particular feature, we looked for a new representational form to study it in more detail. We tried to find a way of comparing instances of reading that would highlight more clearly the rhythmic quality of the utterances as well as the pauses. By doing this we noticed that while the system helped us to focus on rhythm, the transcription form also stripped out other features of the interaction, notably the speed of the talk. In foregrounding one aspect of the data we had of course removed other relevant aspects. Our timeline transcripts were an attempt to recover the temporal nature of the reading practices being observed. As with the previous systems, these transcripts worked well in helping us focus on that aspect, but in doing so they stripped out other potentially relevant areas.

The now well-developed argument that transcripts are a device for knowledge construction has largely been worked through in relation to the political sensitivities around representing data (Hammersley, 2010). The example provided here shows that there are issues of equal importance in relation to the construction of our own research problems and problematics. The discussion may make this seem like quite an esoteric matter relating to niche areas of interest. However, there is an important point to be taken from this example that has relevance beyond the particular case discussed here. The conventions of transcription in conversation analysis are now widely used and quite generalised systems of representation. Dressler and Kreuz’s (2000) survey of these practices led them to present a model for transcription that focusses on very particular areas of discourse. What we hope should be quite clear from the previous discussion, however, is that any conventionalised system produces a set of boundaries of concern, quite explicitly specifying sets of relevancies and, by definition, irrelevancies. The affordances of use, then, not only relate to an individual researcher’s area of activity, but actually come to define or constitute certain logical boundaries of enquiry within which a community works. If Dressler and Kreuz’s survey of practice has value as a yardstick, then there is an implication that discourse studies more broadly may be working within a quite bounded methodology.

However, there is by no means a causal or overly rigid relation between convention and action, and innovation of practice is, as Becker (1982) showed in his studies of art activity, an important component of any domain of social action. Hutchby (2001a: 449) talks about the ways in which the affordances of technology may be influenced by social rules and conventions that delimit the possibilities for action. The example discussed here can be taken to show that analysis should involve a careful reflection on the relationship between transcription conventions and conceptual focus. The case has been made quite strongly that the procedures for naming (Watson, 1997) and the exclusive focus on talk at the expense of other modes of interaction (Erickson, 2010) have an important impact on one’s analysis. Our example shows that any representational system will of course foreground some features of social action at the expense of others. The issue therefore is not about how to improve transcription systems, but about treating transcription as a matter of analytic design rather than as a technical enterprise.

We saw in the previous discussion that design was an important part of the process of thinking through our interests: establishing the requirements of representation that are relevant to the particular problem being explored. Bezemer and Mavers (2011) have described this in terms of ‘selecting’ and ‘highlighting’ data, where the researcher works out which data to use and to foreground. These are particular parts of a much more general issue of ‘establishing and working through a problem’. We saw that, in all of our transcription innovations, the old systems of representation had an important role to play; that innovation did not involve ‘starting from scratch’, but a kind of experimentation with new modes of representation. We saw that there were quite practical considerations that impact on the choice of representational form: one of the concerns that has been discussed in the literature in producing transcripts is their readability for lay readers (Roberts, 1997). Another fundamental issue highlighted in the above descriptions is what we described as the ‘compromise between workability and analytic interest’. Our timeline transcription was ‘good enough’ as a representation of pace, and while it could have been more accurate by showing the duration of each word, this would have made the transcription process far too onerous. We found also that a combination of different modes of representation was valuable to be able to pursue our interest (e.g. to see how similar rhythmic patterns were interactionally dealt with by optometrists).

The purpose of a transcription is to draw attention to some aspect(s) of the data – to help the researcher to conceptualise an issue, to analytically work it through and, ultimately, to represent that problem to a reader. These represent potentially quite different interests that can operate in tension with each other. This article has attempted to characterise this interplay in relation to a real-world example, and to highlight the importance of having an awareness of it.

Footnotes

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. However, part of the work relates to the optometry project detailed in Note 1.

Notes

Will Gibson is a qualitative sociologist with a particular interest in studies of social interaction. He has conducted empirical research in various settings, including medicine, music and education contexts. Will’s publications have explored methodological issues related to qualitative enquiry, and have looked at various technical issues related to the in situ organisation of social conduct.

Helena Webb uses ethnomethodology and conversation analysis to study healthcare-based interactions. Her PhD investigated doctor–patient interactions during medical consultations about obesity and she is now a member of the Work, Interaction & Technology Research Centre (WIT) at King’s College London working on the ‘Assessing Eye Sight and Ocular Health: The Practical Work of Optometrists’ project.

Dirk vom Lehn is Lecturer in Marketing and member of the Work, Interaction & Technology Research Centre. His principal research interest is in social interaction in museums, galleries and science centres. His most recent publications are Harold Garfinkel (UVK Verlagsgesellschaft, 2012) and ‘Discovering “Experience-ables”: Socially including visually impaired people in art museums’, Journal of Marketing Management 26(7–8): 749–69.

References

Becker (1982) Art Worlds. Berkeley: University of California Press.

Bezemer and Mavers (2011) Multimodal transcription as academic practice: A social semiotic perspective. International Journal of Social Research Methodology 14(3): 191–206.

Bucholtz (2000) The politics of transcription. Journal of Pragmatics 32(10): 1439–65.

Dressler and Kreuz (2000) Transcribing oral discourse: A survey and a model system. Discourse Processes 29(1): 25–36.

Duranti (2006) Transcripts, like shadows on a wall. Mind, Culture, and Activity 13(4): 301–10.

Erickson (2010) The neglected listener: Issues of theory and practice in transcription from video in interaction analysis. In: Streeck (ed.) New Adventures in Language and Interaction. Amsterdam: John Benjamins, 243–56.

Gee (1999) An Introduction to Discourse Analysis: Theory and Method. London: Routledge.

Gibson (1979) The Ecological Approach to Visual Perception. Boston, MA: Houghton Mifflin.

Gibson and Brown (2009) Working with Qualitative Data. London: Sage.

Gibson, Webb and vom Lehn (2011) Re-constituting social praxis: The ethnomethodological analysis of video data. International Journal of Social Research Methodology 14(3): 207–18.

Hammersley (2010) Reproducing or constructing? Some questions about transcription in social research. Qualitative Research 10(5): 553–69.

Heath, Hindmarsh and Luff (2010) Video in Qualitative Research. London: Sage.

Hutchby (2001a) Technologies, text and affordances. Sociology 35(2): 441–56.

Hutchby (2001b) Conversation and Technology. Cambridge: Polity Press.

Jefferson (1984) Transcription notation. In: Atkinson and Heritage (eds) The Structure of Social Action: Studies in Conversation Analysis. Cambridge: Cambridge University Press, ix–xvi.

Jefferson (1996) A case of transcriptional stereotyping. Journal of Pragmatics 26(2): 159–70.

Kress, Jewitt, Bourne et al. (2005) English in Urban Classrooms: A Multimodal Perspective on Teaching and Learning. London: Routledge Falmer.

Norman (1988) The Psychology of Everyday Things. New York: Basic Books.

Ochs (1979) Transcription as theory. In: Ochs and Schieffelin (eds) Developmental Pragmatics. New York: Academic Press, 43–72.

Preston (1985) The Li’l Abner syndrome: Written representations of speech. American Speech 60(4): 328–36.

Progler (1995) Searching for swing: Participatory discrepancies in the jazz rhythm section. Ethnomusicology 39(1): 21–54.

Roberts (1997) Transcribing talk: Issues of representation. TESOL Quarterly 31(1): 167–72.

Sidnell (2010) Conversation Analysis: An Introduction. Oxford: Wiley-Blackwell.

Watson (1997) Some general reflections on ‘categorization’ and ‘sequence’ in the analysis of conversation. In: Eglin and Hester (eds) Culture in Action: Studies in Membership Categorization Analysis. Washington, DC: International Institute for Ethnomethodology and Conversation Analysis and University Press of America, 49–76.