Abstract
This article draws an analogy between physical nonverbal gesture and the textual conventions of new and social media to argue that the vital nonverbal functions of face-to-face communication are not absent from digital media, but that communicative functions typically enacted nonverbally are transposed into new spaces of interaction afforded by synchronous and near-synchronous textual media. Digital and social media text is conversational text that fulfills the phatic needs of typical social interaction: ‘keeping in touch’ does not in any way constitute a cultural regression but represents the fundamental ground of human cognition, which is inescapably both social and technologically dependent. An analysis of examples from the popular microblogging service Twitter serves to illustrate the gestural functions of digital media text, including the enactment of mediated social ‘spaces’. The closing section explores the theoretical implications for identity and agency of connecting embodied nonverbal communication to digital media communication that is all too often erroneously understood to be or implicitly approached as ‘disembodied’.
Keywords
Despite early techno-utopian visions of virtually embodied interactions in virtual worlds, the wide accessibility of mobile phones and VoIP technologies, the popularization of video conferencing (e.g. Skype) and the more recent availability of mobile video communications (e.g. Apple’s FaceTime), most digital media communication remains firmly text-based. Computer-mediated communication began with text messaging and, as the recent spate of digital culture doomsayers is quick to remind us, it has not lost its roots. To take one recent example, Sherry Turkle’s Alone Together (2011), the Lonely Crowd for the 21st century, points out, yet again, the growing propensity to text instead of talk, and the inherent inadequacies of rapid-fire, decontextualized messages compared with sustained, semantically and emotionally rich face-to-face interaction.
Others, however, have long argued that computer-mediated communication ‘does not depart discernibly from oral and written patterns of conversation’ (Galegher et al., 1998: 524). ‘Like speech’, argues Nancy Baym, ‘much CMC is direct, contextualized, and interactive’ (2006: 39). Rodney Jones (2009), among many others, points out that the forms and conventions of text-based new media communication have evolved, and continue to evolve, a variety of means ‘to substitute for the contextualization cues normally present in [face-to-face] conversation’. Similarly, Danet et al. (1997) demonstrate that ‘online performance draws attention to the language and the medium, turning the lack of cues into a communicative asset’ (quoted in Baym, 2006: 41). As such, text-based digital media offer a plethora of options for nuanced social interaction and identity performance (Herring, 1999a; Jones, 2009; Quan-Haase, 2009). The performance of identity is, in fact, the primary affordance of social media, which allows participants to ‘validate and engage with others’ (boyd et al., 2010) generating a widely dispersed intersubjectivity (Crawford, 2009) through active audience construction (Marwick and boyd, 2011) to build a ‘faceted identity’ (boyd, 2001).
The ability to ‘communicate as desired’ speaks to an inherently human impulse that is ‘more easily enacted via technology’ (Walther, 1996: 33). And even text-based technologies, when combined with (nearly) ubiquitous network coverage and mobile digital communication devices – putting everyone you know in your pocket – can support the strengthening of social bonds with correlated implications for the shaping and maintenance of identity. Walther (1996) argues that the limited cues available in CMC afford not weaker interactions, but ‘hyperpersonal’ communication. In a context of assumed social identification, communicators respond to the limited physical and social cues available in text-based digital media interactions with a form of confirmation bias that attributes to their interlocutors beliefs and attitudes, and even age, race and socioeconomic status, that resemble their own (Jacobson, 1999). ‘The effect’, argues Rowe, ‘is generally a more intimate interaction over e-mail than what may occur in face-to-face interactions’ (Rowe, 2009: 81). The same can hold for any text-based communication among people who share (assumed) bonds of social identification to whatever attenuated degree. Communities form around shared interests and the virtual proximity of mediated co-presence as connected individuals develop their own interactional conventions and shared communicative norms (Boneva et al., 2006; Eckert, 2003; Quan-Haase, 2009; Rowe, 2001).
The language of new media text communication has been studied in a variety of technological contexts, including the internet and mobile communications, generally (Baron, 2000, 2008; Crystal, 2001; Herring, 1996; Rowe and Wyss, 2009); email (Baron, 1984; Rowe, 2001, 2009; Wyss, 2008); IRC (Werry, 1996); instant messaging (Baron, 2004; Lewis and Fabos, 2005; Quan-Haase, 2009; Quan-Haase et al., 2005); and SMS (Berg, 2009; Crystal, 2008; Thurlow, 2003), all of which are regularly examined in the open-source, peer-reviewed journal Language@Internet (date current). The conversational nature of social media communications has led scholars to describe text-based CMC as a ‘written-spoken hybrid’ (Rowe, 2009: 85) in which ‘text-based exchanges…emulate face-to-face conversations…[in] a creative blend of spoken, written and CMC conventions’ (Quan-Haase, 2009: 47–50). Instant messaging, for example, ‘despite the reliance on text for communication and the use of formal written style…is more akin to face-to-face conversations in terms of the users’ norms and practices because it is interactive and casual’ (Quan-Haase, 2009: 47). Where new media naysayers understand text messaging, instant messaging, and related social media forms ‘as text that happens also to be conversation, and, in that, find the form understandably lacking’, social media proponents ‘seem to see Twitter’, for example, ‘as conversation that happens also to be text, and, in that, find it understandably awesome’ (Garber, 2011). The conversational affordances of such technologies come from their ‘always-on’ nature and the wide variety of social cues that are indeed available, even in technologies limited to text, to manage identity performance and social interactions.
The present argument is that digital and social media text is conversational (Herring, 2010): embodied, often temporally immediate (synchronous) identity performance, no more potentially ‘disengaged’, ‘disingenuous’, or ‘shallow’ than face-to-face conversation. The mediated/embodied binary underlying worries about ‘shallow’ new and social media is a false dichotomy. All communication is embodied as all cognition is embodied. In face-to-face, physically immediate contexts, language relies on nonverbal ‘paralinguistic’ cues whose necessary functions are not entirely lost in mediated interaction, but are reconfigured as the media of their communication change. The social cues of physical interaction are transposed into new functional spaces of communication. This article aims to demonstrate gestural functions present in text-based digital media communication and explore some of the theoretical implications of the reconfiguring space of conversation.
Despite the fact that ‘Reliance on extralinguistic context is associated with spoken language’ (Baron, 1998; Rowe, 2009: 83), social media forms offer a number of non- or marginally verbal expressions that blur the lines among the conventions of physical and mediated interaction. Papacharissi argues that ‘Given the level of control over verbal and non-verbal cues in a variety of online contexts, individuals may put together controlled performances that “give off” exactly the “face” that they intend’ (2009: 210). So while CMC has been described as ‘creative’ (Rowe, 2009: 85), ‘playful’ (Baym, 2006: 40; Rowe, 2009: 81) and ‘telegraphic’ (Baym, 2006; Cherny, 1999; Werry, 1996) because of its ‘ephemerality, speed, interactivity, and freedom from the tyranny of materials’ (Danet et al., 1997), all of these characteristics apply to much face-to-face and verbal, non-goal-directed, social interaction. Facebook’s ‘variety…of nonverbal “pokes” and gestures’, for example, are ‘props’ that ‘provide the dramaturgical range within which to construct more elaborate performances of self’ (Papacharissi, 2009: 212).
Using a variety of mediated gestural forms and functions available with ‘merely’ text-based technologies, new and social media bring us ever closer to McLuhan’s proverbial global village in what has been called ‘connected presence’ or ‘continual connection’ (Licoppe and Smoreda, 2005), ‘mundane connection’ (Oulasvirta et al., 2009), ‘ambient co-presence’ (Stankovic, 2009; Wilson, 2009) and ‘digital intimacy’ (Marwick and boyd, 2011; Wilson, 2009) of our ‘social awareness streams’ (Naaman et al., 2010). Social interaction rich enough to acquire such monikers must be providing the social cues necessary to fulfill the functions of typical face-to-face, nonverbal cues. And in fact, Mischaud argues that ‘Twitter upholds the intrinsic social function of communication’ (2007: 34). Keeping in mind Nancy Baym’s contention that ‘mediated interaction should be seen as a new and eclectic mixed modality that combines elements of face-to-face communication with elements of writing, rather than as a diminished form of embodied interaction’ (Baym, 2010: 51), I would like to draw an analogy between physical, nonverbal gestures and mediated, textual social cues to argue that the two arise from the same embodied cognitive roots and needs to serve equivalent social and communicative functions.
Gesture and paralinguistics
Many of the text-based new media and social media conventions alluded to in the introductory section are blatantly gestural. Explicating this analogy requires a brief examination of physical gesture in the context of nonverbal and paralinguistic communication. For Carrie Noland (2009, relying on Massumi, 2003), what makes an act a ‘gesture’ is the involvement of the body in a double process of active (muscular) displacement and (sensory) information gathering: we send information as we receive information; we enact our spaces of communication (Lefebvre, 1991), our cognitive and cultural environments. Linguist and nonverbal communication researcher Adam Kendon loosely defines gesture as ‘actions that have the features of manifest deliberate expressiveness’ (2004: 15). This broad definition is intended to cover a wide range of nonverbal and/or paralinguistic behaviors that support a broad variety of utterances on a continuum from non-systematic and unconscious to more formally linguistic: gesticulation, language-like utterances, pantomimes, emblems (see next section), and finally sign languages (McNeill, 1992: 37).
Keeping in mind McNeill’s (2005) admonition that gestures ‘are part of language’ rather than a separate ‘body language’ divorced from linguistic communication processes, there is precedent for exploring the connections among language and gesture in the study of interjections, ‘word-like’, paralinguistic expressions such as hey, uh, ouch, whoa, yuck, and a multitude of familiar, more or less linguistic forms and conventions that imbue normal conversation in natural human languages. From a linguistics and pragmatics perspective, Wierzbicka (1992) and Wharton (2009), for example, describe interjections as ‘vocal gestures.’ While gestures have been generally dismissed from conventional, Chomskyan linguistics as an aspect of ‘performance’ rather than language proper (language as syntactical relations), the linguistic status of interjections – which serve some similar functions – has been the subject of debate thanks to their more transcribable, word-like forms (Ameka, 1992; Wharton, 2009; Wierzbicka, 1992). Ameka discusses interjections as ‘relatively conventionalized vocal gestures (or more generally, linguistic gestures)’ (1992: 106), and delineates three properties of interjections: they (1) ‘include items which were thought of as “non-words”’, (2) ‘were thought of as being syntactically independent’, and (3) ‘are said to signify a feeling or a state of mind’ (1992: 102). As a context-bound set of objects/behaviors that ‘encode speaker attitudes and communicative intentions’ (Ameka, 1992: 107), ‘there is no doubt that there is an intimate connection between interjections and gestures in general’ (1992: 112). Paralinguistic communication such as interjections and gestures, therefore, straddles the boundary between verbal and nonverbal, problematizing the definitions and hierarchies of conventional linguistics.
In the broader study of human communication, the relationships among language, interjections and gesture provide a rich context in which to examine the natural language of new media communication. When an individual ‘utterance’ can be clearly defined by the technological bounds of a specific digital media format (e.g. a 160-character SMS message or a 140-character tweet, with available conventions for extending messages), academic debates over the linguistic status of interjections can be seen in a new light. In textual social media forms, an interjection is clearly a linguistic expression in the sense that it uses typed orthographic linguistic conventions to impart a definite, contextual, semantic content (i.e. syntactical, functional meaning). The formal semantic character of a linguist’s transcription of a verbal interjection is certainly debatable. But it seems beyond argument that an interjection or expression that is clearly not a ‘word’ (e.g. ‘grrrrrrr’) carries meaning as an electronic text message in a conversational context. In order to establish a clear connection between paralinguistic behaviors in immediate, physical interaction and text-based forms of mediated communication, I will generally confine the present discussion and analogy to ‘co-speech gesture’ as investigated at the intersection of linguistics and cognitive psychology. By constraining the discussion of gesture in this way, I can concentrate on the broad communicative functions of gesture as they are enacted and performed in and around language and, specifically, through conversational text.
Co-speech gesture, or ‘the spontaneous movement of the hands that accompany speech’ (Goldin-Meadow, 2003; McNeill, 2005), is a universal human phenomenon that straddles the boundary between conscious and unconscious activity. 1 Even the blind make hand gestures when speaking, using gestural forms identical to those used by people with unimpaired sight, and even when talking to other blind people (McNeill, 2005). McNeill (1992, 2005) argues that such gestures are intimately connected to the process of thought and the production of language. This explains, for example, why binding a person’s hands will affect the way he or she uses language. Animals, on the other hand, do not ‘gesture’, at least not for the familiar human purposes of deictic indication and establishing shared attention. Chimpanzees can be trained to use gestures when rewards are involved, but they do not, for example, point in the wild (Tallis, 2010). In human infants, however, hand gestures are a critical precursor to speech. Infants with autism do not point, making the absence of gesture an important diagnostic indicator of early cognitive development (Baron-Cohen, 1995; Tallis, 2010).
The term ‘co-speech gesture’ covers a wide range of behaviors in a variety of academic contexts. While no single definition or typology of gesture has been universally accepted, Kendon (2004) has identified four basic properties: (1) Gesture can be executed more quickly than the spoken utterance. (2) Gestures are silent, which means they can be used simultaneously with speech and/or beyond the boundaries of the immediate interaction. 2 (3) Gestures are visible, which means they can be used over much greater distances than speech. And (4), the production and reception of gesture do not require the same kind of mutual orientation among interactants as speech, and they can therefore be used successfully amidst distractions (e.g. in a crowd). To generalize these properties a bit further, gestures are compressed, extensive, paralinguistic communication.
New media gesture
Compressed, extensive, and paralinguistic is a fairly good description of fast, telegraphic, and multimodal new media communication and the social cues that have evolved within digital and social media forms. While being limited to available orthographic conventions, textual new media ‘gestures’ are able to provide a surprisingly wide variety of communicative functions that can be analyzed along Kendon’s continuum, minus the two extremes of unconscious gesticulation on the one hand and sign languages (formal symbolic systems) on the other: in the former, the ‘unconscious’ use of mediated communication would seem problematic, at best, and for the latter, textual paralinguistic gestures already function as part of written language – they do not constitute an independent symbolic system.
Rowe, however, argues that the often synchronous (or near-synchronous) and responsive character of CMC might encompass more of the continuum than we might initially suppose: ‘In terms of consciousness, e-mail combines the best of both worlds: Its nonconscious side [quick, responsive CMC] reflects [Vygotsky’s] inner speech, which connotes intimacy; its conscious [deliberate, self-reflective] side reflects creative linguistic “art”’ (Rowe, 2009: 84). The non- or less-conscious end of the paralinguistic spectrum is further opened up to play in social media technologies (e.g. Facebook or even SMS) where, unlike the chat rooms, MOOs and MUDs of early cyberstudies (e.g. Jacobson, 1999), interactants are often communicating with people whom they already know and have interacted with physically prior to making online connections. Nonverbal speech patterns, evocations of known personality traits, and cultural norms can therefore be expected to translate better to some degree in social media where interactants tend to (but of course do not always) share personal histories.
In drawing an analogy between digital media communication conventions and physical gestures, a wide range of behaviors might be included depending on the context and the definition of gesture being applied. Richard Grusin (2010) discusses gestures of physical interaction with communication and media devices as affective enactments of anticipatory relations to broader social, cultural and political systems. Jos Schuurman (2011) offers 13 ‘on-line gestures, which can serve as signals indicating endorsement or recommendation’: subscribe, read, store, share, tag, rate, copy/share, send, comment, blog, ‘pipe it through’ (which would include re-blog or retweet), link, and approve/reject. A variety of other actions performed with digital communications technologies can be approached under the banner of ‘gesture’, including obvious social media gestures such as ‘poke’, ‘like’, ‘friend’, ‘follow’, and ‘favorite’. 3 Given the requisite aspects of movement and rhythm that define the concept of gesture, even response time can be considered gestural in the context of digital media communication, for example, breaking up IM messages to create pauses, suspense/expectation/anticipation, and to manage conversation flow/turntaking (Lea and Spears, 1992; Lewis and Fabos, 2005; Walther and Tidwell, 1997).
The use of more conventionally paralinguistic forms in digital and social media communications has been previously addressed as mediated forms of ‘emotional grooming’ (Ling et al., 2005) that function as ‘phatic fillers and backchanneling…employed in a similar way to FTF conversations’ (Quan-Haase, 2009: 49, citing Herring, 1999b; see also Baron, 2004; Schandorf, 2011). Tjora (2011) describes ‘SMS-hugging’, in which romantic couples maintain a hidden or discreet, direct, one-to-one, mediated emotional connection while in the shared physical social context of a larger group. While Rowe, writing specifically about email communication, argues that ‘Written language…cannot draw on paralinguistic means to reveal context’ (2009: 83), the common use of paralinguistic cues, including interjections and emoticons, as social media conventions would seem to refute such a claim. As Garber (2011) says, ‘Online, words themselves, once silent and still, are suddenly springing to life. And that can be, in every sense, a shock to the system. (Awesome! And also: Aaaah!).’ Herring and Dresner (2010), more analytically, argue that punctuation and (especially) emoticons function as markers of illocutionary force and perlocutionary intention – communicative adaptations to the textual medium – rather than simple one-to-one iconic mappings to physical expressions. Broadening the argument, recent surveys by the Pew Internet and American Life Project demonstrate that African Americans and Hispanics make up the most active American demographic populations using Twitter (Smith, 2011). The continuing adaptation of oral-cultural forms and practices to social media (as well as the reverse – I sometimes catch my students using ‘lols’ in spoken conversation) being driven by these and other cultural groups is both a significant driving force of linguistic innovation and the likely source of many reactionary fears and responses from the ‘language and culture police’.
So while Walther (1996: 29) argued that CMC is best compared with other written forms rather than face-to-face communication, his argument was made before the rise of social media and before the spread of ubiquitous mobile communication technologies made constant mediated social interaction a real possibility, allowing for ever more colonization (or reclamation) of the textual language space by oral communication forms, including the adaptation and translation of paralinguistic forms. This analogy can be made more explicit by comparing specific social media practices with specific categories of co-speech gesture as defined by Kendon (2004): deictic, imagistic, expressive, rhythmic, and emblematic. The paralinguistic forms and conventions of textual new and social media that can be classified as such are now common in a wide range of technologies to differing degrees in different contexts. The following analysis should be understood as a proof of theoretical concept. I use convenient examples from Twitter for no other reason than their immediate availability. Examples were culled from the author’s personal Twitter stream over several months in addition to arbitrary searches for emoticons and iconic expressions. Since my purposes here are analogical, illustrative and classificatory, I make no attempt to quantify their use, either generally or in any particular context. A quantitative analysis would have little relevance here since the explicit purpose is to demonstrate that nearly all new media and conversational text has some gestural element, however attenuated. The examples have been chosen merely to illustrate such gestural functions and in that sense represent ‘ideal’ models, though they are all actual tweets by real Twitter users.
Deictic gesture
Deictic gestures are indicative, indexical or pointing actions that instantiate often implicitly or explicitly hierarchical or spatiotemporal relations. The fact that indicative paralinguistic forms appear in new media is hardly surprising: the foundational metaphor for the paradigmatic online action is a deictic gesture: a hyperlink points to another web page. Consider the perfectly naturalized marketing phrase, ‘Visit us online at . . .’ Twitter offers a wide variety of deictic gestures, including hyperlinks. Retweets (RTs) point back to the original source of a message, while also (like all forms of indication) implicitly pointing to the person doing the RTing and making explicit a connection between the two interactants, however loose. Direct messages (D or DM) are an obvious deictic form, pointing a receiver back to the source of a message. Personal indications (@) point to others whom the author of the message wants to draw attention to, or the attention of. Hashtags (#) point to phenomena of social interest, or indicate the attempt to generate such interest. More broadly, avatars (by, for example, drawing attention to specific phenomena, cultural references, or corporate/collective identities) and Facebook gestures (e.g. ‘Like’, ‘Poke’) can be understood as deictic gestures calling attention to the one doing the ‘poking’ as much as to what is ‘liked’.
The basic format of a tweet is, arguably, a link and a brief contextualizing comment. The deictic aspects instantiate a set of relations between the ‘tweeter’, the ‘tweeted’, and the potential audience of followers. In Figure 1, science fiction author Bruce Sterling provides a link, adding a relevant quotation and three hashtags that serve to contextualize the website being pointed to, which in this case presents a London company’s clothing line that includes a women’s t-shirt embedded with LEDs that generate patterns of light in response to the wearer’s movements. Sterling’s tweets are ‘protected’, meaning that he has reserved the right to personally approve those who will be allowed access to his tweets; therefore (besides blocking marketers from exploiting links that he publicizes, and others from datamining his tweets for ‘predatory databases’), he can have some degree of confidence that his followers know who he is and are aware of the things that he writes about and of his personal and professional interests. The quotation and the hashtags given provide a deictic triangulation of those interests for an audience of followers familiar with the overall space in the same way that hand gestures effect relations within a physical or conceptual space in a face-to-face conversation (Haviland, 2000).
In Figure 2, Gigi Ibrahim, an Egyptian blogger who (eventually appearing on the Daily Show) was an important voice in one of the first of the 2011 ‘Arab Spring’ uprisings, is being retweeted by NPR social media coordinator Andy Carvin. Carvin has been widely acclaimed (and closely watched) for his use of Twitter in his journalistic practice, masterfully filtering, or ‘curating’, Twitter and other internet resources to provide an essentially real-time perspective of events on the street in the Middle East. The deictic aspects of this strategy include the instantiation of relationships between Carvin, the active Middle Eastern Twitter users he follows, and his own audience of followers. Being retweeted by Carvin provided a form of legitimacy to the young people participating in the uprisings, giving them a sense that the world was indeed watching, while demonstrating and maintaining Carvin’s own professional, journalistic legitimacy to his western audience.
In this particular tweet, the deictic gestures include the @s, the first of which is a reply from Ibrahim (to @Ssirgany), the other three pointing back to other local participants and commentators (and prior public arguments and conversations), as well as Ibrahim’s avatar which directly indicates the ongoing event of the ‘January 25th’ Egyptian demonstrations. These deictic gestures enact a set of conceptual relations between the participants of the uprising and Carvin’s western audience via the legitimizing mediator of Carvin himself. 4
In Figure 3, journalist Karen Beninato makes use of a common Twitter convention, the #FF hashtag. Deriving from the practice of sharing one’s favorite ‘tweeps’ with one’s own followers on ‘Follow Friday’, the #FF (or ‘friend feed’) hashtag indicates an acknowledgment of value and a recommendation to one’s followers. In this case Beninato’s RT is acknowledging @shultzilla’s recommendation, which serves on the one hand to indicate gratitude and on the other to return the favor of recommendation to her own followers, for whom @shultzilla’s recommendation also serves to legitimize Beninato’s value. @shultzilla’s original tweet establishes, as the RT reinforces, a set of relations among those Twitter accounts indicated, all New Orleans-area Twitter voices, and calls the attention of those mentioned back to @shultzilla. The set of relationships established in this particular case also indicates and reinforces a set of conventional social bonds since all of those involved are a part of the specific geographic and cultural community of New Orleans.
Kendon (1990) examines some of the ways that interactional space is defined in face-to-face encounters by means of, for example, greeting gestures (e.g. handshakes) and changes in bodily orientation, which both react to the informational and emotional content of physical encounters and enable particular forms of social interaction. In the context of online chat rooms, Bays (1998) argues that addressivity functions analogously to gaze in face-to-face conversations. The present suggestion is that deictic gestures of textual social media communication perform such functions by defining and instantiating ‘spaces’ of interaction through the management of virtual proximity in mediated, ambient co-presence. Further targeted, empirical research is required to explore the idea that this analogy is a valuable way to approach mediated interaction.
Imagistic gesture
Hand movements that imitate, depict or nonverbally describe are classified as imagistic gestures. In face-to-face conversation, this could include making an hourglass shape in reference to the shape of a body or a swooping motion to describe a specific movement or event in a narrative. ASCII art provides numerous examples of imagistic gesture on Twitter, as do emoticons (orthographic/typographic images that approximate facial expressions) and avatars that portray idealized images.
In Figure 4 we find a typical western horizontal emoticon expressing pleasure or happiness in the image of a smile turned sideways and using repetition to indicate degree (the more closing parentheses, the bigger the smile). The message also has a deictic component, being directed at @YUKOS061. Linguistic communities using the English/Roman alphabet (which have dominated the early eras of online communication – a situation that is rapidly changing) have developed a staggering number of emoticons (though only a relatively small number are actually common; see Crystal, 2008), so many in fact that most instant messaging and other online text communication services have automated their use by generating iconic ‘happy face’ images appropriate to the facial expressions being approximated by text. Different styles of emoticons are available using different scripts and character coding sets. For example, Asian language communities have evolved a parallel style of vertical emoticons (see Azuma and Ebner, 2008; Azuma and Maurer, 2007; Cha, 2007; LaPointe et al., 2004; Markman and Oshima, 2007; Wang, 2004), demonstrating the imagistic and emotional potential of orthographic conventions as well as the natural propensity of the human brain and perceptual system toward facial expressions and emotion in social interaction, generally. LaPointe et al. (2004), for example, argue that individuals’ increased use of emoticons and the emotional interaction that they represent ‘suggests the development of electronic personalities and an increase in [mediated] social presence’. While Herring and Dresner (2010) argue convincingly that emoticons do not, in fact, tend to represent or reflect the actual emotion of the message sender as much as the illocutionary force and perlocutionary intent of the message, 5 the imagistic character of emoticons is clear.
In Figure 5 we see the even more overtly imagistic gesture of a trumpet playing music, created with ASCII characters and Unicode symbols. The avatar, as well, provides the idealized image of an animated character. Either of these gestures is open to a range of interpretations depending on the conversational context, the degree of social and personal connection to the person behind the account, and the ‘hyperpersonal’ assumptions of followers.
In the two tweets in Figures 6 and 7 we see not orthographically ‘drawn’ imagistic gestures, but textually rendered conceptual images of expressive actions that mimic the emotional content of the respective physical gestures. Jason Wilson’s ‘slow clap’, for example (Figure 6), is an image that expresses the sarcasm of unenthusiastic congratulations: the link is to an announcement that Facebook had ‘Finally!’ developed an app for the iPad several months after the second version of the popular tablet was released. Also of note is that in both cases the gestural images are set off by asterisks, a common convention that often indicates a break from formal written/typographic conventions, much like scare quotes or quotation marks can indicate sarcasm, irony, or questionable veracity in more conventional forms of text. Similarly, both gestural images are preceded and contextualized by interjections emphasized by exclamation points. Emma Gibson’s tweet (Figure 7) can easily be interpreted as a completely gestural expression: deictic address + expressive interjection + iconic gestural image.
Expressive gesture
Actions that express or indicate emotion or state of mind are expressive gestures. A pumping fist as an expression of anger, pride or enthusiasm is a readily available example, as is a shrug, a raised eyebrow, or even tone of voice or vocal emphasis, more broadly. In text-based social media, text styles (e.g. all caps) and (as in Figures 6 and 7) punctuation (e.g. exclamation points, ellipses) are common translations of vocal expressive gestures, while avatars and emoticons can also serve directly expressive functions.
In Figure 8, @lamarathon uses all caps and an exclamation for emphasis while situating the content of the tweet with several deictic gestures that are further accentuated and more widely dispersed by PolymerPhD’s RT (defining network range as space as volume). The avatar also serves an expressive function, conveying an emotional content and a sense of personality.
Shortly after AT&T announced its purchase of T-Mobile (pending regulatory approval), @pkafka used the transcription of a vocalized interjection as an expressive gesture to contextualize a link to a collection of T-Mobile commercials denigrating AT&T’s network operations and quality (Figure 9). Where the imagistic gestures described earlier were set off by asterisks, indicating their gestural functions, the interjection ‘Hohohoho!’ is a directly transcribed vocal gesture that requires no orthographic qualification. The RT by @ncarr adds another layer of deictic gesture that indicates both connection and approval, both of which have more subtly expressive aspects.
In the case of Figure 10, @joaniepop’s transcriptions of vocal gestures/interjections, accentuated by all caps, repetition and punctuation, are given the further expressive elements of profanity and direct allusion to emotion (i.e. ‘feel’). Furthermore, in this case, the hashtag is used as an expression that invites audience identification and solidarity. A popular and often repeated quotation whose original source now seems to be lost in the ether is that ‘Facebook is for people you went to high school with, Twitter is for people you wish you went to high school with’, and @joaniepop’s tweet here exemplifies the degree to which different mediated social networks can fulfill very different socio-emotional functions for a particular individual. Such hashtags are expressions that invite but do not necessitate both empathy and social reinforcement: others may reply or follow up such a tweet with their own examples of ‘Stuff I Can’t Say On Facebook’. This may, in fact, be the most common use of hashtags, and is widely popular as a sort of mediated party game. This type of networked play therefore also serves a deictic function as a hashtag’s spread maps a network space of phatic connection.
Expressive gestures in face-to-face communication are so common as to be unremarkable. The only thing that makes them remarkable in mediated communication is that text-based communication, generally, is still held to the idealized cultural standard of conventional printed forms: the novel, the formal letter and so on. Writing is taught in formal contexts and receives the imprimatur of a rational, civilizing behavior that denigrates the emotional except in specific contexts in which rhetorical pathos is appropriate. However, it should not be surprising that conversational text communication should have at least as much in common with conversation as with text, and should therefore easily adopt and adapt emotional, expressive forms of social identification.
Emblem gesture
Emblems are formal, culturally specific, nonverbal expressions, typically of the hands, that tend to be excluded from the formal study of co-speech gesture because they can often be directly translated into verbal equivalents and therefore function more like formal linguistic symbols than paralinguistic signs. People always know when they have performed a nonverbal emblem and often do so in an attempt to control the behavior of others. Examples include ‘shush’, ‘thumbs-up’, ‘okay’, ‘and a host of other hand movements, many of which have unprintable meanings’ (Goldin-Meadow, 2003: 5). Mediated textual emblems include LOLs, acronyms, some emoticons and ASCII art, as well as iconic hashtags and expressions such as ‘#FAIL’.
The #FAIL hashtag typically indicates an attribution of incompetence and/or an expression of disappointment. It is often used in a way similar to a ‘thumbs down’ gesture or, if directed with an @, a middle finger, and is similarly designed to influence or control (in this case corporate) behavior. If a physical nonverbal emblem is a physical expression that can be translated directly into a verbal equivalent, Ian Wilson’s #YouHaveGotToBeJoking hashtag (in Figure 11) constitutes a verbal expression translated into an iconic form.
In the example in Figure 12, we see a combination of expressive interjections, punctuation and textual iconic forms along with a variety of deictic gestural forms situating the informational content in the socio-emotional context of @jessedarling and her assumed or directed audience. The iconic, emblematic gesture combining ‘WTF’ and ‘W00t’ 6 expresses a mixture of confusion and excitement nearly as concisely as a facial expression and body posture might in a face-to-face interaction.
Rhythmic gesture
Motions that explicitly ‘mark out, punctuate or…make reference to the structural aspects of discourse, either in respect to its phrasal organization or its logical structure’ are classified as rhythmic gestures (Kendon, 2004). A common rhythmic rhetorical gesture is to mark out prosody or points of emphasis with the index finger or fist (a common gesture in political oration, for example). On Twitter, to take but one example, ASCII art and punctuation often perform similar rhythmic, structural functions for the purposes of emphasis and emotional expression.
In Figure 13, Vitorino Ramos uses punctuation to mimic the rhythms of speech and convey physical and emotional (social) discomfort.
In the example in Figure 14, @HAL9000_ uses rhythmic and expressive gestures for comedic effect. In Figure 15, Jenny Ryan’s punctuation is designed to express determination, while being qualified with a lack of capital letters that undercuts her purposeful declaration. The interjection following an ellipsis further qualifies her expression of determination, and the final emoticon (indicating a ‘raised eyebrow’) can be interpreted as conveying either surprise at the goal itself or as a reflexive questioning of her ability to meet the declared challenge.
Obviously, as with physical gestures in face-to-face contexts, these gestural categories are contextually dependent and often overlap. The expressive gestures in Jenny Ryan’s tweet, for example, mutually accentuate the rhythmic gestures. A single tweet can contain all of the gestural forms discussed here and can in fact be completely gestural, carrying emotional content but little or no explicit semantic content (see Figure 16).
The relevance: Phatic communication
Many of these textual forms of gestural, paralinguistic expression, especially those of emphasis and rhythm, are common to other forms of conventional textual communication such as the personal letter. Clearly, the form and context of the particular instance of communication in question establishes the degree to which expressive elements are considered appropriate. In a business letter or academic article, for example, relatively few expressive forms analogous to non-verbal expressions will be tolerable or acceptable. In a personal letter, note or handwritten postcard, such expressive features are possibly even more indicative by their absence, which runs the risk of being interpreted as ‘stiff’, ‘cold’, or ‘stand-offish’. Rotman (2008, citing Kittler, 1999) argues that the advent of the typewriter attenuated or even ‘effectively eliminated’ the expressive paralinguistic functions of handwriting, such as ‘written emphasis, uncertainty, rhythm, discontinuity, stress, tailing off, and other scriptive traces of the body’ (2008: 26). As I have tried to show, even if this were the case with the personal typewriter, the temporal affordances of conversational digital media communication have allowed the evolution of a wide variety of means to transpose nonverbal, gestural communicative functions into text. 7
As with speech patterns and spoken conventions, writing style is modulated by context, commonly taking the place of tone of voice in face-to-face conversation, and genres of writing are typically taught in terms of ‘tone’. As Quan-Haase argues, email ‘writing style…becomes another linguistic variable with which the message is given meaning…[A] formal writing style signals the importance and seriousness of the message, whereas an informal writing style signals a casual exchange’ (2009: 46). Unsurprisingly, Quan-Haase found that research ‘participants made very few spelling mistakes [in emails] when they were conversing about a serious topic. By contrast, when they engaged in casual exchanges, communicating quickly often was more relevant than employing correct spelling’, and one research participant acknowledged that he ‘paid more attention to writing properly when he communicated with teachers…, a result of ‘social desirability’’ (Quan-Haase, 2009: 45, citing Lewis and Fabos, 2005). Because form itself is an expressive indicator, the conflation of writing styles (code-mixing) can also serve the functions that are here being called ‘gestural’ by using playful combinations and stylistic hybridizations, for example ‘What the *** does my esteemed colleague mean by that?’ (Rowe, 2009: 79).
As argued earlier, the hierarchical prioritization of formal conventions in textual communication is a historical contingency following from the context in which writing and the conventions of written communication are taught. Writing is a marker of civilization and ‘high’ culture that continues to be privileged as a deliberative, self-reflective form of cognitive and cultural mastery. In this context, informal conventions, altered spellings, and novel textual forms, such as emoticons, are understood as ‘debased’ in exactly the same way that slang and subcultural speech patterns are understood as ‘mongrelized’ by conservative proponents of the dominant culture – despite the obvious historical facts of linguistic evolution and the entirely normal and natural human inclination to play with language (Crystal, 2001, 2008).
Arguably, conversational forms of textual communication naturally mimic face-to-face conversation and fulfill or transpose many if not most or all of the same functions. This can be clearly demonstrated with an examination of social media forms or categories of communication. Again, for convenience and consistency, I will use Twitter as an example. To date, only a few ‘tweet typologies’ have been published, and while those available (e.g. Honeycutt and Herring, 2009; Mischaud, 2007; Naaman et al., 2010; Oulasvirta et al., 2009) vary widely, they have several commonalities. Common categories include greetings, weather, small talk, emotion, and metacommentary, among others. These are immediately recognizable as categories of phatic communication, and the phatic character of social media has been noted by others (Miller, 2008; Parks, 2010; Stankovic, 2009; Stankovic et al., 2010). Parks argues that ‘For online settings such as social network sites, the most relevant…requirements are engaging in shared rituals, social regulation, and collective action through patterned interaction and the creation of relational linkages among members that promote social bonds, a sense of belonging, and a sense of identification with the community’ (2010: 111).
Linguistic anthropologist Bronislaw Malinowski coined the term ‘phatic communion’ in 1923 to describe ‘free, aimless, social intercourse’, the kind of ‘small talk, gossip or chit-chat’ (Coupland et al., 1992) that occurs apart from any specific activity or task being performed. Cheepen (1988) describes phatic communion as simply ‘chat’, and includes narrative as one of its aspects. This emphasis on what the Oxford English Dictionary describes as communication ‘that serve[s] to establish or maintain social relationships rather than to impart information, communicate ideas, etc.’ has led many scholars subject to the high-culture bias of writing (e.g. logocentrism) to understand phatic communication as trivial. Miller, for example, argues that ‘in phatic media culture, content is not king, but “keeping in touch” is’, and that this represents a worrying dilution of culture and society (2008: 395).
But there is a strong argument to be made that phatic functions influence all social interaction, and are fundamental to human communication generally. As Zeynep Tufekci argues, ‘that’s what humans do’ (Tufekci, 2011). Laver (1975, 1981) argues that phatic communication fulfills initiatory, propitiatory and exploratory functions and, furthermore, both constrains the thematic development of interaction (i.e. it is sequentially meaningful and uncertainty reducing) and confers crucial indexical meanings (i.e. it is socially diagnostic). In Goffman’s (1959) terms, phatic communication helps participants establish a ‘working consensus’ in social interaction. Phaticity, therefore, ‘may be best seen as a constellation of interactional goals that are potentially relevant to all contexts of human interchange’ (Coupland et al., 1992: 211). This broad reach is consistent with Malinowski’s original formulation.
Malinowski saw ‘phatic communion’ as a fundamental human characteristic and process. ‘In discussing the function of speech in mere sociabilities’, he wrote,
we come to one of the bedrock aspects of man’s nature in society. There is in all human beings the well-known tendency to congregate, to be together, to enjoy each other’s company. Many instincts and innate trends, such as fear or pugnacity, all the types of social sentiments such as ambition, vanity, passion for power and wealth, are dependent upon and are associated with the fundamental tendency which makes the mere presence of others a necessity for man. (Malinowski, 2006 [1923]: 297–298)
‘Willingly or not’, argues Kendon, ‘humans, when in co-presence, continuously inform one another about their intentions, interests, feelings and ideas by means of visible bodily action’ (2004: 1). And there are important reasons for this: social interaction, and the variety of forms which that interaction takes, is a basic requirement of individual cognition and language production. McNeill (2005) argues that gesture and language are generated together from prelinguistic, dynamic cognitive processes that serve as the seed of both conceptualization and communication. In fact, the anterior language area of the brain (Broca’s area) is contiguous with the primary sensory-motor cortex, and overlaps the primary auditory cortex and several areas of activation relating to the perception and processing of the movement of others’ mouths, eyes and hands (Puce and Perrett, 2003). The production of language and movement is neurophysiologically connected with the processing of sound, language, and the nonverbal communication of others (D’Ausillo, et al., 2009; Devlin and Ayedelott, 2006; Iacoboni, 2008; Meister, et al., 2007; Okada and Hickok, 2006).
But the importance of the emotional identification (or repulsion) enacted in phatic communication runs even deeper because the ‘rational’ thought and language production that defines human being is based on and grounded in emotion (Damasio 1994, 2003; Ramachandran, 2011; Ramachandran and Blakeslee, 1998). And gesture is the primary modality for conveying emotional information: ‘Language and speech’, argue Beattie and Shovelton, for example, … are primarily concerned with propositional thought and the communication of semantic information about the world, whereas the movements of the body, changes in facial expression, and posture and hand and arm movement are assumed to communicate emotional information and form the basis of the social processes through which interpersonal relationships are established, developed and maintained. (2007: 221–222)
In terms of linguistics and pragmatics, conversational text communication, which has for many become a more convenient and common form of interaction than verbal or face-to-face interaction, represents natural language in action. Such conversations are not transcriptions of vocal interaction. They are actual instances of verbal linguistic exchange, and a corpus of social media text communication can be studied as such without falling back upon academic (syntactic, grammatical) idealizations of speech communication, as is the temptation (or crutch) at hand when analyzing transcriptions of spoken discourse. In addition, studies of paralinguistics in text-based conversation may have important implications for the study of nonverbal communication and social cognition, as well as for their intimate connection to and inseparability from (formal) language. Wharton, for example, has already made a theoretical connection between prosodic and gestural forms and ‘borderline linguistic expressions such as interjections and properly linguistic expressions such as mood indicators, discourse connectives and discourse particles’ (2009: 148). The examination of these phenomena in digital media contexts where face-to-face nonverbal gestures are replaced by textual paralinguistic forms such as those described here can extend the study of natural language pragmatics into an important and novel area of formally linguistic interaction. Similarly, new motion capture technologies, such as Microsoft’s Kinect, which is already being put to a wide variety of uses in academic and private behavioral research (not to mention behavioral marketing research), and which will soon be able to resolve and capture facial expressions (see Tanz, 2011), will undoubtedly prove to be important tools for the empirical analysis of nonverbal communication, particularly gesture, both in its physical forms and, in combination with voice recognition technologies, in its textual expressions.
Gesture is a crucial mode of phatic communication that enables and enacts the reciprocity and self-disclosure, identity performance, engagement, and other-validation required for the community bonding interactions of tribe and village that have been integral to the evolutionary development of human consciousness (Leroi-Gourhan, 1964). Gesture is the way we enact our identities, the way we think our selves. It is the embodiment of our attitudes and our negotiated assumptions and expectations of our social contexts and cognitive environments. And if gesture is phatic communication, and new media communication is gestural, then the phatic communion of ambient co-presence is ‘the way we interact, and the way we feel each other out there in the realm of the World Wide Web’ (Stankovic, 2009: 1).
Mediated gesture of the distributed body: The networked self
The symbolic act is the dancing of an attitude. Kenneth Burke, The Philosophy of Literary Form (1941: 9)
Internet communication technologies have provided an ever-growing range of meta-media and a broadening ecology of social possibilities. Social media technologies, such as Facebook, Twitter, networked computer console games and others, along with ‘smart’ phones and other mobile technologies provide access to and interaction with our social circles in a virtual proximity or co-presence unlike anything since the primitive village. Where early electronic communication technologies such as the telephone and television, combined with transportation technologies such as automobiles and airplanes, centrifugally extended our communities and social networks while geographically dispersing them (Carey, 1992; Innis, 1950, 1951), new mobile internet communication technologies are generating a centripetal effect that is drawing us all closer together into a variety of overlapping digital ‘spaces’. At the same time, ‘augmented reality’ technologies that combine digital imaging and motion capture with the search and database capabilities of the internet, are overlaying digital ‘spaces’ upon our physical environments. In this context, textual gestures are just one subtle way in which we are able to intersubjectively enact our cognitive environments and spaces of interaction, both physical and digital, and where the boundaries between conventional frontstage and backstage interaction are blurring as multiple spaces and multiple social circles intersect and pull apart, both physically and digitally.
We currently inhabit an interesting transitional moment in the dynamic evolution of communication technologies, which continue to shape us, individually and socially, as we shape them, through both production and use. We are not quite accustomed to the growing affordances of ‘ambient co-presence’, and, as Figures 17 and 18 illustrate, even those who inhabit these new spaces with facility neither find mediated interaction to be ‘complete’ nor know exactly where the balance points among the physical and digital, the public and the private, lie.
If ‘always-on’ mediated connections are thought of as replacing physical, face-to-face interaction, then the new media naysayers are right: wholly mediated social connection is psychologically and emotionally lacking, and potentially detrimental. But if we understand these new communication spaces in a (more informed and more disciplined) McLuhanite sense as extensions that are shifting and reshuffling the balances of our social interactions, there is at least as much to look forward to as to be wary of.
Updating McLuhan’s (1964) theories of mediated sensory extension and the psychology of the global village, Tom Pettitt (2009) argues that in the evolution of our communications technologies we are witnessing the closing of the ‘Gutenberg Parenthesis’, a sort of ‘return to orality’ in the spirit of Ong and Derrida that Pettitt locates in, for example, the return of looser practices of attribution that give less priority to the individual work than to the community of knowledge and cultural ecology, a situation, Pettitt argues, that is reminiscent of the opening of the ‘Parenthesis’ in Shakespeare’s England and earlier. While such ideas have been attractive for several decades, the positing of a return or regression is obviously linear and does not account for the complexity of the expanded options for action and interaction now available and multiplying, nor for the complex interaction among the spaces (physical, mediated and ‘augmented’) in which our social interactions now take place. Such a linear view also assumes that such expansions of our ecologies of interaction will have no effect on the human minds interacting in and reacting to these spaces.
In Becoming Beside Ourselves: The Alphabet, Ghosts, and Distributed Human Being, Brian Rotman (2008) makes a series of claims related to those I have made here about mediated gesture but with emphasis upon the expanding options available to and the implications for individual agency. In his foreword to the book, Timothy Lenoir writes, ‘Not only is thinking always social, culturally situated, and technologically mediated, but individual cognition requires symbiosis with cognitive collectivities and external memory systems to happen in the first place’ (Lenoir, 2008: xxvii). Just as social context is a requirement for individual identity, individual cognition requires a cultural context and a cultural memory system in which to arise and function. For Rotman, the multimodal communicative acts performed with digital media technologies are understood as gestures of the distributed human being within a digitally mediated social and cultural context. Importantly, Rotman is not making a crude posthuman argument that human beings are escaping our physical contexts. Quite to the contrary, our necessarily physical mind–bodies are extending their awareness into the spaces of our evolving media ecologies.
Drawing on earlier theories of media ecology, Rotman posits the emergence of a third modality of self alongside the ‘I’ who speaks and the ‘I’ who writes and is read: from the oral and the scriptive/typographic, he argues, comes the parallel and distributed, the ‘networked self’:
the alphabetic self, the embodied agency who writes and reads ‘I’, and in so doing performs a complex play of same and other-ness, actuality and virtuality, with the one who speaks and hears ‘I’, will be confronted by a third ‘I’, a self coming into being to the side of the written form, what might be termed a para-self, whose enunciation of ‘I’ will take place…in the interior of a post-, better, trans-alphabetic ecology of ubiquitous and interactive, networked media. (Rotman, 2008: 5)
This third self-enunciating agency…is immersive, understanding itself as meaningful from without, an embodied agent increasingly defined by the networks threading through it…Such an ‘I’ is porous, spilling out of itself, traversed by other ‘I’s networked to it, permeated by the collectives of other selves and avatars via apparatuses (mobile phone or email, ambient interactive devices, Web pages, apparatuses of surveillance, GPS systems) that form its techno-cultural environment and increasingly break down self–other boundaries thought previously to be uncrossable: what was private exfoliates (is blogged, Webcammed, posted) directly into the social at the same time the social is introjected into the interior of the self, making it ‘harder to say where the world stops and the person begins’ (Clark, 2006: 1). (Rotman, 2008: 8)
Conclusion
Human individuality, human identity, human cognition, require social engagement – even if that engagement is predominantly mediated, and whether that engagement is physical, face-to-face and temporally immediate or takes the form of music, books, radio, television, online social networks or some other form(s) of consumption and interaction. We require social and cultural context(s) to provide the forms and materials of our thoughts and our selves. Individuality relies on the presence of others, but presence is much more than it used to be. In the ‘ambient co-presence’ of networked digital communications technologies, the compressed, extensive, paralinguistic emotional connections of phatic gesture are embodied in new forms afforded by new ways of ‘keeping in touch’ that are appropriate to the distributed, networked agency made possible by these environments. When approached from the perspective of nonverbal communication and social cognition, these changes are arguably less dramatic (and less ‘new’) than many of the typical evangelisms of new media proponents would have us believe. By approaching the technology from the perspective of the embodied, emotional, human being, rather than an (even implicitly) dualistic, technologically determined, metaphysical techno-mysticism, we may find ourselves in a better position to understand the implications of technological change and such important concerns as the nature of identity and identity performance, the influence and repercussions of the ever-changing media ecologies in which those performances take place, and the function of language and symbolic (inter)action, both in these evolving environments and in human cognition more broadly.
Footnotes
Acknowledgments
This article benefited significantly from the feedback of Roy Christopher, Patty Harkin, Steve Jones, Nathan Jurgenson, Athina Karatzogianni, David McNeill, Carrie Noland, Zizi Papacharissi, Kelly Quinn, William Rinehart, Andy Rojecki, Brian Rotman, Rob Short, and Jim Sosnoski. The author would also like to thank Christopher Honey for calling attention to the neuroimaging studies of emoticons, and Bruce Sterling for his blessing of ‘the academic fair use of a datapoint cheerfully stolen from some other Twitter source’.
