Abstract
Sound in television news broadcasting is dominated by perceived authoritative voices, resulting in a narrowing of ideas and a restriction of experiential and cultural representation. Television news offers audiences selective access to events through a suppression of environmental ambient sound. It thereby actively removes the context of an event and denies the public a broader sense of the event’s present significance. This article is a critique of the way in which media institutions and their practitioners wield the tools of their industry to manipulate audible content in order to preserve their own interests over those of the public. It begins with a brief historical examination of technology and the institutional practices that developed through the emergence of radio, sound films, and television. I then argue that through such technology and practice, television news decontextualizes the sound of events in order to overwrite them with authoritative analysis. As my main example, I offer two versions of the Howard Dean “scream” – one as presented through the broadcast institution and the other as presented through independent media and the internet. These two versions of the same event reveal how audible misrepresentation threatens news as public good and the documentation of history.
Late in the evening on 19 January 2004, about 3000 anxious people stood in the Val Air Ballroom in West Des Moines, Iowa eagerly awaiting the arrival of their candidate for U.S. President, Howard Dean. Earlier that evening, pollsters projected that Senator John Kerry had earned a surprise win in the Iowa Caucuses, relegating previous frontrunner Dean to a distant third-place finish. By the time Dean arrived the crowd was ebullient, and the room was enveloped in a thunderous cacophony of sound. He picked up a wireless hand-held microphone and spoke energetically through the ballroom’s public address system about taking back the nation and about not giving up. The crowd intensity rose to meet his every amplified word. As the people became louder, he responded with equal fervor. Eventually, the dynamic between Dean and his supporters rose to such a level that the crowd around him drowned out his voice.
This was the event as observed by those in the ballroom at the time. Watching the rally on television, however, the rest of America became witness to a very different occurrence (C-SPAN, 2004: video). Through the medium of television, Dean’s speech exhibited a man seemingly out of control. His voice and energy level rising higher and higher, he seemed to shift from intense to borderline manic, finally reaching the peak of a crescendo in what sounded like an inappropriately enthusiastic scream. Over the next several days, the incident was repeated incessantly on news broadcasts and became a subject of ridicule on late-night talk shows. The candidate who once led the pool of Democratic Party candidates never recovered and soon lost his bid to become the nominee.
But there was yet another version of this event, this one disseminated not through television, but through the internet. Joe Jensen (2004), a Dean supporter, videotaped the speech from his position within the crowd. From this perspective, viewers and listeners witness things more as they were seen and heard from within the event as it happened. Dean’s voice could be heard in the context of the surrounding sounds of a throng of supporters whose voices rose like a swelling tide. As his voice lifted in intensity, the audience in attendance matched his enthusiasm. Finally, when he reached the point of his infamous yell, the crowd was so loud that the yell was barely audible through the supporter’s camera microphone. The crowd had drowned out the man on stage. How could there be such a difference between two versions of a single event? The answer lies in the practice of television news sound production.
Howard Dean’s infamy was imprinted onto the public consciousness through the use of two distinct but conjoined elements: a microphone and a medium. The historical progression of each provides clues as to how his Iowa speech and the public perception of that speech contributed to his eventual downfall. In examining the history leading up to television sound production, we see how two intertwined developments during the early years of film and radio led to the current aesthetic of television sound. The first of the two is technological: a competitive, capitalistic atmosphere in microphone research and development resulted in products designed to eliminate extraneous sound and give prominence to particular voices. The second is institutional: television sound is borrowed from a model of radio and film aesthetics that favors intelligible information over the wider sonic environment of an event because such practice leads to predictable and manageable results for the industries themselves. As a result of these factors, sound practice today encourages the active elimination of that which can provide wider cultural and representational perspective, that of contextual ambient sound. This sound of experience and human interaction is removed or suppressed by the institution of broadcasting in order to meet its economic needs.
Microphones: technology of exclusion
In 1876 Alexander Graham Bell invented the telephone and a year later Thomas Edison reproduced sound through the invention of the phonograph. As the two essential components of sound media – transmission and reproduction – both technologies turned out to be driving factors in the way sound technology developed in the early 20th century. In its early development, Bell Labs was working on two main technologies simultaneously: microphone circuitry and telephony. The basic sound transmission design was the same for both. ‘The microphone amounts to little more than a highly sensitized telephone transmitter’ (Pitkin and Marston, 1930: 221). While microphone design focused on sonic realism, telephony filled a different need. The telephonic aspect was Bell’s main business objective, and here clarity in voice reproduction was a primary goal. ‘Intelligibility was clearly linked to conventionality at this early stage. Speech that could be easily interpreted on the basis of little actual audio information – a call, a query, a cliché – was more likely to be understood over the telephone’s lines’ (Sterne, 2003: 248). Bell Labs:
felt entirely comfortable sacrificing 60 percent of the voice’s acoustic energy (the lower frequencies) because they lost only 2 percent intelligibility in the bargain. The functional primacy of intelligible speech enabled telephone systems to reduce drastically the amount of power required for transmission while retaining the ability to transmit voices with acceptable clarity. (Lastra, 2000: 164–5)
Alongside Bell’s purposeful compromise of fidelity in favor of linguistic clarity was a more commercial goal – ‘to convince users that telephone conversation was the same thing as face-to-face conversation’ (Sterne, 2003: 265). The point here is that Bell, a major innovator in microphone design, was primarily a telephone company, whose engineers made transmission systems with the business goal of voice clarity in mind.
Advancements in both telephony and sonic reproduction went hand in hand. The 1920s saw the emergence of new technology by Western Electric, the manufacturing branch of Bell Telephone, resulting in the electrostatic capacitor, or condenser, microphone (Eargle, 2001). One of the early technical goals of this new design was to better represent acoustic nuances of the voice, which in the early sound films was not captured with the sensitivity desired by studios. ‘In the first few sound pictures, very little difference could be detected between the voices of the men and those of the women’ (Pitkin and Marston, 1930: 236).
Also during this time a new radio industry was taking shape with the foundation of the Radio Corporation of America (RCA) in 1919. From its inception, radio sought to embody the aesthetic of controlled delivery. ‘Like the phonograph, radio technology was first conceived as a means of point-to-point communication’ (Peters, 1999: 204). Microphones of this early era were omnidirectional carbon mics (Altman, 1994) that indiscriminately captured a wide sound field. But in the 1920s, RCA engineers began addressing the problem of sound reflections, wherein the architecture of the studio produced an effect in the signal that was deemed unnatural (British Broadcasting Corporation, 1951). These reflections – echoes produced by the room – were impossible to eliminate with the wide-envelope technology of omnidirectional microphones. As such, the reflections called too much attention to the mechanism of radio and compromised the goal of projecting a sense of intimacy and purity as delivered by the direct, unadulterated sound of the voice. Eventually, the bi-directional ribbon microphone became the preferred choice in radio. It offered the desired sensitivity, isolation from room ambience, and directionality restricted to two on-mic sources.
The competition between Bell Laboratories and RCA heated up in the late 1920s and into the 1930s. During this time, developments in microphone design were expanding to meet the needs of the emerging film and broadcasting industries. The 1930s ushered in ‘the first unidirectional or “cardioid” microphone’ (Borwick, 1990: 11). Designed by RCA’s Harry F. Olson, the 77-series was an ‘immediate success’ (Altman, 1985) and marked a significant breakthrough in design because of its directional polar pattern. The design allowed microphones to narrow their sonic ‘focus,’ wherein sounds emanating from particular locations were isolated while peripheral sounds were rejected to varying degrees. The model 77A spurred a new round of competition between Bell and RCA, engendering two approaches toward a singular goal: Bell aimed to reduce unwanted signals and enhance the quality of primary ones by lessening distortion in the pickup. RCA, on the other hand, came up with the idea of pointing the microphone at the source to remove external sound (Altman, 1985: 10).
This movement eventually led to more directional characteristics in microphone design, an engineering approach used in most on-location and radio broadcasting microphones today. As the classification suggests, unidirectional patterns reject extraneous sound such as ambience, noise, reflections (reverberation), and any other unwanted sounds in favor of those generated from a particular location – so-called on-axis or frontal sounds, and even more specifically, a particular voice. Additionally, unidirectional designs enabled a lengthening of the distance between subject and microphone. The narrower the polar pattern – and therefore the narrower the angle of sound picked up – the greater the distance one could move the microphone from the subject before it took on a less directional behavior (Eargle, 2001). The growing functionality of the unidirectional microphone marked a new opportunity in the radio broadcasting and film industries. Whereas an omnidirectional microphone might capture sounds encompassing a full 360-degree radial sound field, unidirectional polar patterns allowed practitioners to isolate points of emphasis within the natural sound field. Not only did these microphones provide the ability to select particular sounds, they also dramatically rejected the so-called ‘noise’ of the surrounding environment and the reverberant characteristics that affect clarity in the voice.
One particular type of unidirectional microphone is the cardioid – the now widely used heart-shaped envelope pattern that restricts pickup, to some degree, to sound arriving at the front of the microphone. As a result, it rejects off-axis sounds, or those emanating from the sides, to varying degrees. There is a gradual decrease in signal strength the more off-axis, and more distant, the sound is from the microphone. These mics also reject sounds from directly behind the pickup unit, meaning that the microphone can be held in the hand without any so-called ‘handling noise’. The cardioid hand-held microphone is what Howard Dean used in his speech in Iowa. Examining the C-SPAN footage, one can see that Dean is holding the microphone extremely close to his mouth while speaking, at a distance ranging from full contact to roughly five centimeters. This positioning leads to a highly directional sound envelope that would achieve excellent ambient cancellation characteristics. Also, as the crowd becomes louder and louder, Dean’s voice increases in volume to adjust, which further cancels sounds outside the envelope. This is evident in videographer Joe Jensen’s footage from the crowd (2004).
These developmental steps in microphone design in the early 20th century provided the technological means toward isolating and thereby more clearly reproducing particular signals in a sound field. The competitive business climate between Bell Labs and RCA generated steady enhancements toward the preservation of clarity in voice reproduction while excluding undesirable sounds inherent to a particular location. Such advancements helped Bell’s and RCA’s business prospects because they met the needs of the emerging broadcast and film industries. These industries required predictability in voice reproduction in order to maintain control over content, a trend that has become commonplace in today’s television news media.
Sound in film: economics and verisimilitude
Although not applied to film until 1927, sound had a role in the cinema years prior. Rick Altman, James Lastra, Jonathan Sterne, and other historians of sound have chronicled how sound media were used to serve capitalist ends. Theater owners of the early 1900s hired audible performers to lure potential patrons to the stage. Various sound-generating schemes such as drummers, piano players, and Edison’s phonograph would draw the attention of passersby on the street toward what the theater had to offer on that particular day. Eventually, these sidewalk performers moved inside and became part of the early silent Nickelodeon films. These early sound performers were ‘effects specialists (typically drummers working a series of “traps”) who labor mightily to reproduce every possible sound suggested by the image’ (Altman, 1994: 15).
Clearly, sound was not integral to the image in the ‘silent’ film era, but abstracted from it. Images were regarded as art, but sound was something else – a side show, something to sell tickets. During this time, the idea of synchronizing sound to film was already being developed as a business venture even though the process had yet to be tested (Kellogg, 1955). Edward W. Kellogg notes that using sound for speech was seen as largely unnecessary. But the industry, provided with the capability to make actors speak, took advantage, and the transition to synchronous sound film was realized.
When in 1927 such a picture was shown [The Jazz Singer] the story, the music and the dialogue were splendidly adapted to produce a fascinating picture with great emotional appeal, in which no element could have been spared without serious loss. In short, the excellence of showmanship played no small part in making it clear to everyone who saw it that the day of ‘Talkies’ was here. (Kellogg, 1955: 356)
Many filmmakers justifiably saw studio intervention as heralding an era of bad art, partly due to difficulties in adapting to the new mode of practice (Kellogg, 1955). A hierarchical audiovisual relationship developed that not only placed images above sound on the level of perceived artistic value, but also clearly defined their aesthetic roles. Pictures were regarded as the foundation, while sound served a very different purpose. As James Lastra states, sound became a method ‘to address the audience directly – to hail them’ (2000: 120). This historical division between audio and visuals helps to illustrate why sound came to serve a more practical than artistic need. Direct address is prominent in radio and would later become the principal aesthetic of nonfictional television.
Once sound became a standard component, struggles emerged over how to present it. In dramatic commercial film, wider shots naturally called for more distant sound and more ambient representations. Close-ups required a tighter ‘sonic focus’. When film editing became more commonplace, it carried with it a sound problem that led to one of the major early debates of the 1930s, and one which continues to this day: What happens to perceived sonic space during a cut between a long shot and a close-up, or vice-versa? A cut is visually based. Objects in the real world do not launch toward us; rather, we as subjective viewers of objective events simply shift our sense of visual emphasis. A cut from a long shot to a close-up is therefore an acceptable perceptual shift. But our ears have no such ability. To ‘re-focus’ our ears, our physical presence must move in time toward or away from objects in order to accept the change in perspective. Quickly cutting from a distant shot to a close-up worked against the ears’ natural inclination toward continuity.
These changes in the visible image resulted in debates over audible representation. The sound technicians wanted to present a pure sense of sonic perspective regardless of the cut. Here, representations of sound space directly correlate to the image space at all times, and change along with the cut. But this jump in sound focus placed Hollywood’s desire for verisimilitude at risk. The pressing need of Hollywood was clear dialogue. The studios therefore favored intelligible dialogue regardless of the cut, resulting in a continuous close-up sound. Lastra has documented how two schools of sound philosophy emerged that were in stark contrast: a phonographic or ‘fidelity’ model, and a telephonic or ‘intelligibility’ model (2000). The phonographic model called for spatiotemporal accuracy while the telephonic was ‘like writing’ (2000: 139) wherein reverberation and extraneous sound were reduced in favor of clear dialogue.
The telephonic model was assisted by the new unidirectional microphone designs and how they enabled the studio producers to solve the problems of cut discontinuity (according to their standards) and clarity in voice recording. The boom microphone was a breakthrough in this regard when it became available in 1929 (Lastra, 2000). This technology allowed sound technicians to place a very narrow-field unidirectional condenser microphone on a long pole and direct it toward actors’ voices. The boom microphone follows the action and results in continuously close-miked situations, generating high intelligibility representations regardless of shot length.
Those with the power in Hollywood were now able to dictate methodology to the sound technicians. Altman notes that the practice of preserving clarity over realism became something for which sound technicians were ‘praised and rewarded’ by the studio (1992: 54). By 1938, the intuition of sound engineers had moved toward intelligibility over spatial accuracy (Lastra, 2000). Enabled by technological advancements that eliminated undesirable sound, studios were able to provide a more predictable, manageable, and therefore more saleable product by preserving homogeneity in voice representation. As a result, studio executives, not sound artists or film directors, helped to build the template for an aesthetic approach to sound that remains standard practice today.
The radio migration
In 1921, KDKA in Pittsburgh, PA launched the first continuous commercial AM broadcast signal (Bruck et al., 1999) and, two years later, network radio was born. Powered by RCA, the industry quickly expanded. By the time The Jazz Singer was projected in cinemas in 1927, the exclusively audible medium of radio was already firmly established. While film had to negotiate the contrasting philosophies of sound positioning in relation to picture cuts and continuity issues, radio had no such audiovisual conflicts. As a purely sonic medium, radio strove for discrete, isolated clarity in voices and sound effects. Radio was also predominantly studio-based compared to the more location-based medium of film. The problem for radio was much simpler – removing the studio from the voice. Radio’s early methodological history was an effort toward regaining a sense of direct communication that was lost through a broadcaster’s abstraction from an audience.
In short, radio adopted a conversational mode of address that spoke to listeners as if each was a person in his or her own right. At the same time, the spaces within which listening took place were implicitly acknowledged in the design of radio talk. (Scannell, 2000: 10)
As sound films grew in popularity and prosperity, the film industry needed practitioners. Naturally, it looked to radio.
For where had Hollywood found its sound technicians? By far the majority … had come from the radio studios. The early years of sound cinema were thus heavily marked by the version of reality offered by other modes of representation – first silent cinema, then radio. (Altman, 1992: 55)
Radio engineers had already developed the ability to isolate and discriminate among particular voices and relational sound effects. When they migrated to film, they had already codified sound into a hierarchical structuring system, resulting in a foreground/background aesthetic (Altman, 1994). ‘Two regions dominate: the foreground, in which actors move and narratives develop, and the background, which serves to guarantee diegetic reality while concealing discursive reality (by reducing the camera’s ability to register any space not identified with the diegesis)’ (1994: 9). In this shift to film, radio engineers applied their highly developed, institutionalized template of interiority to a medium that was one of visual exteriority. Radio brought an invisible practice and attached it to the film image. By 1938, the intelligibility model had become the dominant form in cinema (Lastra, 2000).
It is here, in this confluence of cinema and radio during the late 1930s, that we find the model for television sound a decade later. Through cinema sound practice, television had its audiovisual model of continuity and vocal clarity; through radio, television found interiority and structure. The same goal is at work for both industries: Production practice dictates a preservation of the voice at the expense of all other naturally occurring sound elements in order to preserve the perceived legitimacy of the broadcaster.
Television: text and context
Television today consists of a wide variety of broadcasting environments, both in-studio and on-location. In hermetic forms of programming, control over specific sound sources is necessary. For example, in-studio productions – where a script is pre-determined or environment and unpredictability play no role – dictate a level of audible organization in pursuance of clear debate or narrative. However, in content whose meaning is determined a posteriori – such as news events broadcast live – control over the sonic environment usurps the documentation of history. Visual coverage in live television is, to a far greater extent, provided. There is the notion that images are enough to provide the ‘reality’ of an event, which ostensibly meets the needs of objectivity. Seemingly having met this goal, live television suppresses, controls, and re-renders the sound environment and overwrites it with description and narration. These media-decontextualized voices are thereby free to dictate other agendas. What is lost in such an approach to sound representation is the context – the broader sonic significance of an event that resides in the ambient sound of experience. Often, as exemplified by the coverage of the Howard Dean rally in Iowa, the ambient sound is what reflects the social interaction occurring in the moment. But with the room activity suppressed as unnecessary background noise, the audience is incapable of receiving the event’s larger significance.
Two categories of speech dominate television news as a means of gaining control over an event’s significance: narration and direct address. Chion calls such voiceover content ‘textual speech’ (1994: 172). It is a reversal of Walter Ong’s observation that the written word arises from the world of sound (1982). In television, sound is constructed using the written word. Textual speech has the power to take control over narrative, setting, and moment, and in so doing, achieve domination over the audiovisual presentation. Whereas film uses narration in fragments, television produces an unceasing stream of description and analysis. As such, broadcast news is continually appropriating the sonic character of the event in order to take ownership over the event. This is a manifestation of the need for control on behalf of the broadcaster. ‘Textual speech is inseparable from an archaic power: the pure and original pleasure of transforming the world through language, and of ruling over one’s creation by naming it’ (Chion, 1994: 173). The environmental, ambient character of time and place as it happens cannot emerge through the nonstop textuality of television news sound.
This desire to strip all artifacts of the mediation process from the informational message is a strong one, and cuts to the heart of a great deal of thinking on media content. James Carey notes how communication is manifested ‘in the construction and maintenance of an ordered, meaningful cultural world that can serve as a control and container for human action’ (1992: 18–19). This can only occur if events are broken down into simplistic forms. ‘[T]he purpose of the representation is to express not the possible complexity of things but their simplicity. Space is made manageable by the reduction of information’ (Carey, 1992: 28). As Carey suggests, the goal of institutionalized communication is ownership and control over its particular form of representation.
With sound decontextualized from audiovisual representations, audiences are abstracted from the cultural significance of events. When sonic environments are treated as a form of residue to be stripped away, the event loses not only its sense of the moment, but also the public interaction within a specific space of experience. It is not, however, simply a subtractive action; it is also one of substitution. Sonic experience, once removed, is rewritten with whatever the broadcaster deems fit to construct, namely voices of authority. The textual message thereby replaces the contextual message of human action that makes up the wholeness of the event. Herbert Zettl makes matters worse when he advises methods wherein the voice takes prominence over the event. ‘The information function of sound is to communicate specific information verbally. In our verbally oriented society, a word is often worth a thousand pictures’ (1999: 314). Zettl is discarding the inherent bias of the voice that stands in as a mediator of broader cultural interactions and therefore does a disservice to the representation of events as they occur. Sound practice, decontextualized and re-organized through voices of authority, removes a broader sense of understanding and replaces it with constructed ideology. The danger as it relates to television, and in turn the recording of history, is that such an exclusive form of rendering widens the separation between the event as it occurred and the event as represented.
‘Mediated acoustical reality’
Returning to Howard Dean’s speech in Iowa, we can now gain a better understanding of what happened that night. As noted, there were two video recordings of the same event. The first was the C-SPAN audio feed, which offered only one sonic element – a single, unidirectional microphone, fed monophonically through a sound board, and sent through cable stations into people’s homes. This is recognized practice in news audio feeds. In every sense, the process adhered to highly developed practices of professional sound production for television:
The microphone choice was perfectly suited to keep the sound of the environment (including reverberation and crowd noise) from overwhelming the legibility of the central figure speaking to a crowd.
The public address system ensured a clean audio signal flow.
Dean’s technique of microphone-to-mouth proximity was just right to maintain clarity in his voice.
As intended, the home viewer could hear every word he was saying.
By contrast, Joe Jensen’s (2004) footage from within the audience of some 3000 supporters provided a completely different representation. Jensen’s footage was characterized by a high ratio of reverberant to direct sound and an increasing loss in intelligibility over time, both of which would be considered bad television coverage of a man speaking on stage. But it is here that we also find a prime example of the phonographic approach to sound that was dismissed by the late 1930s. In this version, the contextual ambient sound, the environment of the place, provides the very answer to why Dean’s voice elevated so high in volume and why he screamed into the microphone. The phonographic representation exhibits very clearly how, within the context of the raucous throng of supporters, Dean’s voice became unintelligible to the people within the event itself, independent of the broadcast.
Jensen’s footage, however, received scant attention. The reasons are many and complex. Broadcasting an alternate version that may seem more ‘real’ than the news organization’s own threatens the legitimacy of the managed broadcast; there might be accusations of bias toward Dean; and it does not fit the pattern of controlled spectacle that is so prevalent in TV news. But in purely sonic terms, there is also a phenomenological issue at play here, wherein the Jensen manifold (in the Husserlian sense) does not conform to what televised ears recognize as objectively accurate. Chion (1994) identifies this form of aesthetic processing as ‘mediated acoustical reality’, while Philip Auslander (1999) refers to it as conditioning through ‘mediatization’. The concept is the same: televised sound, as produced in accordance with the mandate of aesthetic practice, produces a certain identity, or version of reality, that the ears perceive as truthful. Through a listener’s ongoing familiarity with the media-enhanced voice – one that is made prominent through unidirectional microphones, equalization, and compression – the ears will naturally gravitate toward such conventions and find legitimacy in what is spoken. In other words, the institutional practice itself is what encodes the authority into the voice that is speaking.
Auslander goes further to suggest an interesting aesthetic reversal that occurs based on the conditioning of an audience. The live can now only take on legitimacy when it has been mediatized. ‘[W]hereas mediatized performance derives its authority from its reference to the live or the real, the live now derives its authority from its reference to the mediatized, which derives its authority from its reference to the live, etc.’ (Auslander, 1999: 39). And ‘once live performance succumbs to mediatization, it loses its ontological integrity’ (1999: 42). While Auslander is referring to both sound and image, we can certainly find within this concept the authority that the voice brings to the television viewer. Mediatization of the voice through microphone design and engineering approach creates its own aesthetic of manufactured attention and its own form of authority by extracting subjects from the sounds in the natural world. Television becomes not a site of the real, but a separation from our possible connection with its presence. The interiority of radio and the telephony of film sound, through decades of technological development and method, have redefined our expectations of sonic truth.
Steve Wurtzler (1992) examined this idea of listener abstraction by categorizing spatial and temporal presence and absence. Howard Dean’s speech would fall into one of two of Wurtzler’s four positions, depending on whether someone witnessed the speech in person or on TV. Those in the room at the time of the speech would fall into Position I – temporal simultaneity and spatial co-presence. Those watching at home on television would bear witness to a different relationship with the event, Position II – temporal simultaneity and spatial absence. Position I involves being ‘wholly present’ in the moment while Position II constitutes ‘simultaneous presence and absence, a combination of qualities of both the live and the recorded, the immediate and the mediated’ (1992: 90, emphasis added).
This helps to further explain why it is impossible to reverse what happened with the coverage of Howard Dean in Iowa. No matter how much broadcasters may later admit to overplaying or misrepresenting the event, the audience cannot integrate such a notion. People retain the scream as originally heard because television listeners have learned to accept the microphone as the audible agent of authority. To a television viewer, whatever comes through a vocal microphone conforms to their notion of reality. It is only when we look at the footage shot by Jensen – a more open, less acoustically mediated representation of the sound of the event – that something seems inauthentic within the manifold of that particular audiovisual representation. Here, we are separated from our standards of normalcy and orientation. In this version, we experience low-fidelity sound, with no audible anchor and no authority figure to guide us. Jensen’s open representation – with its ambient sound and sense of chaos – becomes a subject of fascination, but not one that carries the authenticity of the mediated version. Returning from his footage to the C-SPAN coverage, we as viewers still find the latter legitimate, because, again, that is our sense of televised reality. Even with the knowledge of the other point of audition, we connect more substantively with the mediated.
The voice of authority
The problem in sound representation is not that language is an inappropriate form of sonic content in television news, but that language is problematic when it functions as a substitute for the event itself.
Dialogue can be a wonderful method for enforcing imagination of the other’s position and is obviously a far superior mode of handling differences than fisticuffs or nerve gas, but it is not in itself an adequate communicative vehicle for bearing the full varieties of moral experience. (Peters, 1999: 160)
As for the voice itself, it is important to address what is gained by the broadcasters and lost to the public through the suppression of context and the isolation of particular voices in live news reporting. In the effort to orchestrate ideas through the management of particular voices within live environments, the culture, experience, and plurality are discarded. In the coverage of live reports or events in the field, as is so common in news, the environment is where the living, breathing energy of human activity resides. Recent events, such as the Arab Spring and Occupy Wall Street movements, illustrate that speaking isn’t a disengaged activity – it is an interactive engagement. The environment, culture, and individuals who contribute to that environment are the responsive context to the words as delivered. When audiences are better able to hear this context, they are better able to make determinations on events in ways that an individual voice of the network may either misrepresent or misunderstand. And yet, broadcast journalism suppresses this collective engagement in order to compose its own narrative and thereby legitimize its own authority.
Who are these voices selected by a broadcast network and what are they adding to the documentation of history? The overwhelming majority of them carry some kind of perceived authority. ‘Professional codes ensure that what is considered important is that which is said and done by important people. And important people are people in power’ (Kellner, 1990: 113). The authority figures given access to speak are those who serve the broadcaster and thereby preserve the legitimacy of the institution. The institution tends to provide microphones to individuals who fall into one of four general categories: the subject, the specialist, the correspondent, and the anchorperson. All are either employed by or invited to participate by the broadcaster. The subject can take on many forms, from a figure of power or celebrity down to a man on the street. Dean is an example of the former. He embodies the type of subject who provides a sense of credibility, substance, or interest that fulfills the broadcaster’s need to produce an authentic and manageable story. The latter provides a different role, that of an ‘everyman’ figure. This subject’s role is meant to appeal to the sense of universal human qualities that also, in turn, gives further legitimacy to the broadcaster.
The specialist explains or debates specific issues. ‘The debaters are either government officials or representatives of recognized institutions, and as such they inhabit the sphere of newsworthiness’ (Carpignano et al., 1990: 115). Kellner sees these authority figures as unwitting servants who sustain the broadcaster’s need for moderation in an effort to communicate to the public that all is well. ‘[T]elevision news attempts to mediate between the opposing factions on different issues, policies, and ideologies and to promote a middle-of-the-road consensus, flattening out differences and managing conflicts’ (1990: 115). Opinions that reside on the fringes of political or social thought are rarely represented. Granting access to such figures may threaten the appearance of objectivity upon which the news media rely. ‘Television news usually reinforces existing opinions; it is not a forum for new ideas or critical perceptions’ (1990: 114). Anchorpersons and correspondents offer a different form of authority, in this case the charming presence of reason and authenticity. ‘A good anchor is a good actor and with the lift of an eyebrow or with studied seriousness of visage, he or she can convince you that you are seeing the real thing, that is, a concerned, solid journalist’ (Postman and Powers, 1992: 31).
Taken together, these voices produce a form of managed content that is antithetical to the nature of live events. The goal is not to record ideas and events for history, but to take ownership over their meaning in order to produce a product for consumption. It is accomplished through an exercise of dominance over ideas and events. Michel Chion says that in film ‘a human voice structures the sonic space that contains it’ (1999: 5). In television, however, the voice becomes a force that overwhelms the sonic space that is separated from it. Even more problematic, the space is where the event actually occurs, not in amplified abstractions that surround or supplant it. In Howard Dean’s speech in Iowa, he was the subject, the broadcaster’s selected voice of authority. But the audience’s enthusiasm was the reason his words were spoken through the microphone as loudly as they were. Dean became ‘the story,’ but he was not the event. The event was the public, silenced by the apparatus, which made the story.
Conclusions and discussion
Television news and live-event coverage – the very coverage that should be providing a sense of openness – is exclusionary and restrictive in nearly every sense. From early in its history, sound reproduction has served as a component inferior to the image, one whose role was to generate economic interest. Today, we see how sound functions to uphold the legitimacy of the broadcasting institution in news reporting. We see how the management and suppression of acoustic space resolves a commercial need for ownership over messages. We see how technology aids in the support of each of these aims through the elimination of cultural context.
The practice of television news sound ultimately amounts to a hierarchical structuring of ideas. It gives authority to a handpicked selection of agents who wield it as an ideological tool. The outcome of this practice is that dominant perspectives prevail in broadcast news. These voices endow the broadcaster with the appearance of authenticity and thereby contribute to the commercial aspirations of TV news. Broadcast news, as a result, does not engender an interactive, open, multicultural flow of vast ideas. Rather it constitutes a flow of capital, wherein a particular impression of information and experience is packaged as a product for consumption. In the public perception, Howard Dean’s speech in Iowa was received as a newsworthy occurrence because it was packaged to appear that way, regardless of whether it was fair to the events inside the room. The immediate and lingering significance of this in regard to history, therefore, is not the result of the event itself, nor the man, nor any other aspect – whether technological, procedural, institutional, or ideological. Rather it is and will be the result of the confluence of all these and other factors. However, the writing of history, the inscription of the event, transcends the event and the man by rendering it in its documented form. The document is a single voice, isolated from context, made newsworthy and repeatable by meeting the commercial inclinations of the broadcasting entity. The primacy of this voice is built into the system of broadcasting. It is encoded into the technology (the microphone) and the institution (the news media). The unfortunate outcome of such practice is the narrowing of ideas and the misrepresentation of historical events.
Issues of power, economics, and content management for ideological ends in news media are, of course, not limited to the role of sound. The institution of broadcasting comprises a complex, ever-shifting web of interests – one that spreads far beyond the need to document history. Simply asking the media to open up the microphones to the larger public will not work; to varying degrees, sound requires organization to effectively communicate ideas. But it is important to consider the methods and objectives behind the mechanism. The democratization and individualization of media through new technologies and digital distribution are in turn subject to the adapting structures of traditional media organizations needing to protect their product. It is therefore worthwhile to continue examining the evolving structures and methods of both traditional and independent media in relation to how we hear and record – and thereby remember and document – history. As evidenced by the packaging of Dean’s speech, the elimination of contextual ambient sound amounts to a restriction of cultural representation, which in turn enables the broadcaster to re-inscribe ‘meaning’ and agendas. While reproductions of events always reshape or otherwise modify the audiovisual ‘truthfulness’ of an event, it is clear that broadcasts devoid of contextual ambient sound produce profound shifts in abstraction away from events as they occurred. This is unfortunate because such sound should be a means of contextualizing the cold, inhuman distance of authoritative description with the public sphere of human activity – rather than the other way around. We need to be able to hear our world before we can talk about what it means.
