Abstract
From the inception of sync sound in the late 1920s to the modern day, sound in animation has assumed a variety of forms. This article proposes four principal modes that have developed in the commercial realm of American animation according to changing contingencies of convention, technology and funding. The various modes are termed syncretic, zip-crash, functional and poetic authentication. Each one is utilized to different aesthetic effect, with changing relationships to the image. The use of voice, music, sound effects and atmos is considered, as well as the ways in which they are recorded, manipulated and mixed. The ways in which conventions bleed from one period to the next are also illustrated. Collectively, these proposed categories aid in understanding the history and creative range of options available to animators beyond the visual realm.
Sound sells the reality of an animation to its audience, encouraging viewers to invest in the onscreen events. In an animated film, the audio operates like an echo of the physical world in an otherwise constructed landscape. The sonic space may be highly referential, resembling the sound of the natural world, or it might be ‘hermetic’ (Whitehead, 2002: 149), sonically detached from the natural world with its own self-contained conditions. In each case, this article will illustrate how new approaches to sound design have emerged according to contingencies of technological developments, changing aesthetic conventions, and economic factors. The following analysis will illustrate how film sound technologies have changed over time, and how animation sound design has been used creatively in different traditions, by a variety of stylists.
While animation sound remains ‘an under-explored aspect of a medium that is frequently reduced to its graphic qualities’ (Allen, 2009: 20), material on the field is not as impoverished as it once was. In 2009, the Animation Journal devoted an issue to the subject, while Drawn to Sound (Coyle, 2009) and Tunes for ‘Toons (Goldmark, 2007), in addition to a wealth of book chapters, articles and interviews, offer ample research material. This article aims to bring this research together to paint a broader picture of the major trends of sound design in commercial American animation. The use of voice, music, sound effects and atmos will be considered along with the ways in which they are recorded, manipulated and mixed. The analysis will begin with the earliest sound films from the late 1920s, through to the golden era of the 1940s and 1950s, to television from the 1960s, and modern feature films. Some of the key approaches to sound design in animation will be detailed, along with the ways in which varying contexts effected change in cartoon sound aesthetics. Additionally, the ways in which conventions bleed from one period to the next will also be illustrated.
Four discernible modes of animation sound design will be detailed: these modes are termed the syncretic, zip-crash, functional and poetic authentication. Collectively, these proposed categories will aid in understanding the history and creative range of options available to animation artists beyond the visual realm.
Syncretic
An account of sound in animation begins before the invention of synchronized sound. During the era of silent film, cartoons represented sound visually using the same codes as comic books, such as speech bubbles and lines emanating from a speaking character (Chion, 2009: 39). In addition, audiences were accustomed to live accompaniment in the film theatre; organists and pit drummers were employed by movie theatres to provide music and sound effects. This brought about a loose and intermittent form of audio-visual matching, and an approximation of this style can be heard in the early Mickey Mouse short Plane Crazy (1928), which was essentially a silent film with sound effects added later. At this formative stage, musical accompaniment for cartoons was conceived differently from that for live-action film. George Tootell’s 1925 book How to Play the Cinema Organ: A Practical Book by a Practical Player suggests the organist should take the screening as an occasion to exercise their own wit, rather than establishing mood or defining character (p. 84). This defined the early image–sound relationship audiences experienced at the movies before the commercial adoption of sync sound in 1927.
When sync sound was first introduced to animated films, basic synchronization between sound and image was enough of a novelty to hold an audience’s interest. Music was integral to the construction of cartoons and played a more significant role than that of dialogue, and sound effects themselves were musicalized. This already distinguished cartoons from live action film. Testament to the importance of music, cartoon series were given musical names: Disney ran the Silly Symphonies, Warner Bros. ran the Merrie Melodies and Looney Tunes, and MGM ran Happy Harmonies. Walter Lantz also picked up on the popularity of musical titles (and alliteration) with the Swing Symphonies and Musical Miniature series.
The general convention for live-action cinema has been that the image determines the music, with the footage filmed and edited first, and music added later. By contrast, the onscreen movements in syncretic cartoons (as they are defined here) are designed to conform with music that has been in development from the beginning of the creative process. The scoring of music and animation of movement were closely integrated and, as such, the soundtrack adheres more closely to musical rhythmic and structural conventions instead of stretching to fit with the imagery. 1 This music-based style of cartooning was first developed at Disney and later adopted by other studios in the 1930s (Barrier, 2003: 156). While it is not essential that a syncretic cartoon feature rubber hose animation, 2 both styles reigned supreme at the same time and are thus connected.
Indicative of the centrality of music to the overall aesthetic effect of the early Disney cartoons, animator Wilfred Jackson stated:
I do not believe there was much thought given to the music as one thing and the animation as another. I believe we conceived of them as elements which we were trying to fuse into a whole new thing that would be more than simply movement plus sound. (Jackson, quoted in Thomas and Johnston, 1995: 288)
In essence, synchronization forms the heart of this school of audio-visual relationship. Following on from the live accompaniment during the silent era, syncretic shorts may be considered synchronization experiments which ‘explore and showcase the possibilities the technology opened up through their precise and inventive matches of sound and image’ (Jacobs, 2015: 65).
Core to the syncretic style is working with a discernible, fixed tempo which only occasionally changes over the course of a 7-minute short. Because characters and objects move in synchronization with the rhythm, a fixed tempo means that the music prescribes how long characters walk, how many steps they take, and the speed at which they move. This could create conflicts between animators and composers where the animators (not understanding how music is composed) would ask for an extra beat in the music, or something similar, to complete a movement cycle. This would lead to abrupt rhythmic shifts, which composers typically avoid. Reportedly, in the late 1920s, Walt Disney and his composer of the time, Carl Stalling, would argue over soundtracks in which visual cohesion undermined musical integrity (Barrier, 2003: 22). Their eventual compromise was that the Mickey Mouse cartoons would feature a soundtrack in which the music fits the action as best Stalling could manage, while in the Silly Symphonies series, music could take precedence and the action would be adjusted to fit with cohesive music.
The first Disney cartoon to feature synchronized music and movement was the iconic Mickey Mouse short Steamboat Willie (1928). This was so influential that it is considered the starting point for sync sound in animation, even though it was preceded by Paul Terry’s Dinner Time (1928) released earlier that same year (Furniss, 2016: 94). To mark the extent of Steamboat Willie’s influence, synchronization between movement and the rhythm of the music is widely known as mickey mousing. 3 Syncretic cartoons feature eminently simple plotlines, privileging play of movement over detailed stories. In the first Silly Symphonies cartoon Skeleton Dance (1929), for example, skeletons rise from their grave in the dead of night and perform a macabre dance in synchronization with the soundtrack. They terrorize black cats and owls before fleeing back to their grave at the break of dawn.
The success of Steamboat Willie and Skeleton Dance encouraged Warner Bros. to create and distribute syncretic cartoons starring Bosko, Foxy and Piggy (their early in-house stars), which were proposed by Leon Schlesinger and supervised by Hugh Harman and Rudolf Ising (comprising the aptly-named ‘Harman-Ising’ Studio). The early Merrie Melodies cartoons from 1930 to 1936 feature song and dance routines scored by Frank Marsales. In You Don’t Know What You’re Doin’! (1931) Piggy takes to a stage to show the in-house musicians how to play, only to be heckled by drunken patrons. In Lady Play Your Mandolin! (1931), Foxy (as a wandering gaucho) chances upon a remote Mexican tavern serving tequila and joins the merriment, which eventually leads to a duet with the in-house chanteuse (a female fox).
Not all plotlines were musically based, however. Some cartoons remained offstage, like One More Time (1931), in which Foxy performs duties as a police officer doling out driving tickets and chasing bank robbers to the rhythm of a jazz soundtrack. In addition to Foxy and Piggy, the first true star of Warner Bros. cartoons (preceding Porky Pig, Bugs Bunny, Daffy Duck and others) was Bosko, who could be seen serenading his girlfriend amongst other musical antics. It is not just the characters who dance: buildings sway, the horizon line bounces and nearly every object comes alive to the beat of the music. Imagery often reacts to, or appears governed by, non-diegetic music, thus problematizing distinctions between diegetic and non-diegetic sound (Curtis, 1992: 201).
You Don’t Know What You’re Doin’! features a typical example of the syncretic style. Along with the music, there are also occasional pauses for dialogue, musical sound effects, and alterations in tempo when a scene changes. The cartoon begins with an up-tempo 10 frames per beat (henceforth ‘fpb’) for the opening fanfare. This slows down slightly to 12fpb once the story begins, which entails Piggy picking up his girlfriend and taking her to a show in a car that chugs in time with the music. At the venue, the rhythm slows down further to 16fpb while Piggy shouts ‘You don’t know what you’re doing!’ to the in-house band. An ascending musical phrase accompanies Piggy as he climbs the stairs to take to the stage. Once he is onstage, a musical conversation occurs between a horse on a trombone and Piggy on a saxophone (see Figure 1). Their words are discernible through musical intonation:
Oh yeah? Is that so?
Yeah, that’s so
Ha-ha ha-ha ha
Ha ha ha

Figure 1. Piggy confronts an in-house musician through song in You Don’t Know What You’re Doin’! (1931). Screen grab from DVD (Looney Tunes Golden Collection: Volume 6, Disc 3, Rudolf Ising, Warner Bros.).
Piggy plays a fragment of a popular standard of the time, ‘Silver Threads among the Gold’ and is promptly heckled himself by drunken patrons (through song). The tempo speeds up again to 12fpb when Piggy, who is now intoxicated himself, staggers through the city with one of the hecklers (a dog). Piggy’s antics are accompanied by a trumpet solo, while the dog is accompanied by a bass clarinet. Finally, the two fall into a trashcan inside a landfill, turn to the audience, raise their arms and exuberantly cry ‘whoopee!’
Lea Jacobs (2015: 62) has explained how the creative process worked at Disney and a similar model would have been followed at Warner Bros. Once the story had been approved, the timing process was underway with a piano to hand. Animators would select a tempo with the music director at the beginning of the process and then time the cartoons on sheets of written music, indicating how many frames were required for each action. The timing would be transferred from music sheets to exposure sheets. Finally, the music would be recorded with a click track (invented by Carl Stalling) playing through the musicians’ headphones to ensure effective synchronization.
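As a rough illustration of the arithmetic behind this timing process, the sketch below converts the frames-per-beat figures quoted for You Don’t Know What You’re Doin’! into beats per minute, and counts the frames an action would occupy if it lasted a whole number of beats. It assumes the 24 frames-per-second rate of sound film, which the sources cited here do not state explicitly; the function names are hypothetical and are not drawn from any studio's documentation.

```python
# Assumption: 24 frames per second, the standard rate for sound film.
FRAMES_PER_SECOND = 24


def fpb_to_bpm(frames_per_beat, fps=FRAMES_PER_SECOND):
    """Convert a tempo in frames per beat (fpb) to beats per minute."""
    return fps * 60 / frames_per_beat


def frames_for_beats(beats, frames_per_beat):
    """Frames needed for an action that lasts a whole number of beats."""
    return beats * frames_per_beat


# The three tempi noted in the analysis of the cartoon.
for fpb in (10, 12, 16):
    print(f"{fpb} fpb = {fpb_to_bpm(fpb):.0f} beats per minute")

# A four-beat walk cycle at 12fpb.
print(frames_for_beats(4, 12), "frames")
```

On these assumptions, a lower fpb figure means a faster tempo, which is why the move from 10fpb to 16fpb is heard as a slowing down.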
While syncretic cartoon soundtracks are defined predominantly by their use of music, dialogue and sound effects also feature but they are subsumed into the rhythm of the film. 4 This can occur, for example, through characters singing the dialogue and musical sound effects being put to use such as a slide whistle during a fall, or harps representing the wind during breezy summer days. Owing to the weight of sound equipment in the 1930s which rendered recordings outside the studio impractical, sound effects were produced in a controlled studio environment by musical instruments – typically slide whistles, cymbal crashes, bulb horns and timpani drums; the same techniques used by pit drummers to produce sound effects during the silent era.
Scott Curtis (1992: 202) suggests that the musicality of syncretic sound effects can be understood as a relationship that is defined not by fidelity (i.e. what an object actually sounds like), but by analogy. Similarly, film sound scholar Michel Chion observes that when children play, they sometimes vocalize the movement of their toys through pitch rather than mimicking a realistic sound. An aeroplane, for example, ascends and descends with a vocalized glissando. Something similar happens in syncretic soundtracks, such as an ascending musical figure accompanying the climbing of a flight of stairs. Chion (1994: 121) comments:
The sounds of the character’s footsteps do not themselves go up any scale of pitches. What is being imitated here is the trajectory and not the sound of the trajectory, drawing on a universal spatial symbolism of musical pitches. Sound is applied to most visual moments in this manner, and the animated film is the privileged province of this sound-image relation.
In addition, the vocal conventions of the time, combined with the limitations of 1930s recording technology, make male voices sound nasal and female voices somewhat shrill to modern ears in the syncretic soundtrack. While there was a greater variety of vocal styles in subsequent modes of cartoon sound production, Curtis (1992: 202) suggests that the exaggerated voices featured in early sound cartoons match their non-indexical, elastic and distorted bodies. This set a precedent for voice acting in later cartoons.
In addition to being a product of aesthetic choices, syncretic image–sound design was also a product of economic incentives. Warner Bros.’ adoption of song and dance routines was in part motivated by their formal ties to a music catalogue they had acquired the rights to. Not only were their early shorts designed to entertain, they were also commercials for sheet music. This suited their use of high-energy, kinetic popular tunes instead of classical music since cartoons were in part taking the place of live vaudeville acts. In addition, Warner Bros. were better suited to the use of jazz tunes rather than the post-romantic music featured in live action cinema which Disney later adopted.
Technological factors also had an influence on syncretic image–sound design. In addition to the weight of recording equipment restricting it to studio settings, overdubbing was not possible in the early stages of sound film, so up until 1933, the soundtrack had to be recorded in one take with a single unselective, omnidirectional microphone (Altman, cited in Curtis, 1992: 197–198). The challenge was to produce a complex soundtrack featuring music, dialogue and sound effects through these limited means. At this time, dialogue and music were not generally heard simultaneously in film sound unless they had been recorded at the same time (Salt, 1985: 43).
As pervasive as synchronization was during early sound cartoons, 5 it was dogged by a negative connotation. The term ‘mickey mousing’ was reportedly coined by producer David O Selznick, who derisively compared a Max Steiner score to the music of a Mickey Mouse cartoon. Daniel Goldmark (2007: 6) comments:
The phrase implies not only that the music in question is simplistic, or ‘mickey mouse,’ but also that it is telegraphing to the audience too much information: that is, the music is calling attention to itself as it describes what is happening on screen.
Chuck Jones (2002: 94–95) responded to the suggestion of unimaginative use of synchronization in cartoons, protesting in 1946 that:
… all cartoons use music as an integral element in their format. Nearly all cartoons use it badly, confining it as they do to the hackneyed, the time-worn, the proverbial … many cartoon musicians are more concerned with exact synchronization or ‘mickey mousing’ than with the originality of their contribution or the variety of their arrangement.
Jones offered an inventive tour-de-force of sound–image synchronization four years later in Rabbit of Seville (1950), which features a rapid, tight interaction not just with character movements but also with editing patterns, and a complementarity between the music and the onscreen interplay between Bugs Bunny and Elmer Fudd. Meanwhile, William Hanna and Joseph Barbera produced their own impressive The Cat Concerto (1947) which, amongst other things, features faithful synchronization between the music and Tom’s fingers on the piano.
By the mid-1930s, sound in the Disney shorts moved progressively towards the illusion-of-life aesthetic that was beginning to dominate Disney animation (Telotte, 2008: 34), ending the ‘tyranny of the beat’ (Jacobs, 2015: 72) in favour of what was considered a more realistic style. The Disney studio did not abandon all use of the syncretic approach, but it became more refined. Three Little Pigs (1933) was considered a turning point, in which the narrative and music became integrated in a more complex manner. There is more effort to vary narrative pacing as well as the tempo of movement and music from one sequence to the next (p. 73). Later, feature films such as Snow White and the Seven Dwarfs (1937) and Pinocchio (1940) featured song and dance routines which are stylistically rooted in earlier syncretic cartoons.
Returning to the distinction between referential and hermetic styles of sound design introduced at the beginning of this article, syncretic soundtracks may be understood as hermetic since they are sonically detached from the sound of the natural world, notably in the sense that there is an ongoing musical score, musicalized sound effects and stylized voices.
Zip-crash
Following the syncretic tradition that dominated the 1930s, the anarchic cartoon shorts of the 1940s and 50s will now be considered, paying particular attention to Warner Bros. and MGM. The zip-crash 6 mode can be characterized as highly mannered and ostentatious, with sound that plays an active part in the humour of the films, rather than defining the visual rhythm or operating solely in the service of the story. Sound effects are both flamboyant and incongruous, such as a gunshot sound when characters dash off screen, or a tyre screech when they come to a stop. Likewise, voices are highly stylized, from Bugs Bunny’s Brooklyn wise guy to the phlegmatic Droopy and the deranged Woody Woodpecker. Music in zip-crash soundtracks is fragmented, shifting in tempo and genre, and frequently quotes brief excerpts of other compositions for comic effect.
Developments in technology paved the way for this new approach to sound design. By the 1940s, recording equipment was light enough to take outside the studio. In addition, sound editors were finally able to cut sounds together, adjust volumes and overdub a limited number of audio tracks. This development in technology was accompanied by a change at the Warner Bros. studio. Hugh Harman and Rudolf Ising left and were replaced by Tex Avery, Bob Clampett, and later Chuck Jones amongst others, who developed a different sound aesthetic which pulled away from the original syncretic, rubber hose style. Zip-crash sound design breaks away from the syncretic tradition; the dialogue and sound effects are not rhythmically integrated into the music, nor is the onscreen action dictated by the music. Real-world sounds also come into play rather than musical effects, and there is a greater level of sound fidelity.
The use of highly distinctive voices, coupled with striking sound effects and eccentric music, creates a sonic landscape that may be considered hermetic (albeit for different reasons to syncretic cartoons). That is to say, zip-crash cartoons actively avoid resembling the sound of the physical world. What gives the sound of Warner Bros. cartoons such a distinctive and hermetic character are the three major talents they had working on the soundtrack: voice artist Mel Blanc, sound editor Treg Brown and composer Carl Stalling. All three had very specific approaches which complemented each other stylistically.
The human voice plays an active role in zip-crash cartoons. Maureen Furniss (2008: 85) observes that cartoon voice acting is distinct from other voice-over work like dubbing foreign films, radio and voice-overs for commercials. A superior cartoon voice artist needs to be a good actor, as well as possessing a talent for ‘funny voices’. Some voice actors like Mel Blanc perform several roles. His characters may possess speech impediments like Daffy Duck’s lisp, Porky Pig’s stutter, or Elmer Fudd’s rhotacism. They may have strong accents like Speedy Gonzales, or be distinctive in other ways which correlate with their personality, like Yosemite Sam’s grizzled, bellowing voice. Celebrities of the time would also be impersonated through cartoon characters: Pepe Le Pew’s voice was based on the French actor Charles Boyer (who performed as Pepe Le Moko in Algiers, 1938), and Foghorn Leghorn was based on the radio character Senator Claghorn, a blustery Southern US politician. Similarly, at MGM, Bill Thompson’s voice for Droopy was based on the radio character Wallace Wimple. It is also notable that some characters from the zip-crash tradition barely speak, such as Tom and Jerry or Wile E Coyote.
While the earliest cartoons featured sound effects produced by musical instruments, Treg Brown used (and also contributed to) Warner Bros.’ extensive library of live action movie set recordings. Using out of context real-world sounds became part of his hallmark style. For example, he would produce a kangaroo hop by twanging a fingernail file off the side of a table, and create tongue blips from the Road Runner by snapping his finger out of a coke bottle (Burtt quoted in Crash, Bang, Book: The Wild Sounds of Treg Brown, 2004). The imposition of incongruous yet effectively synchronized real-world sounds into the fantasy world of the cartoon contributes to their ostentatious effect.
Brown’s approach may be understood in contrast with that of Disney’s Jimmy MacDonald, who began his career in the 1930s when sound recording equipment was too heavy to be taken on location, and who therefore produced all his sounds in a controlled studio environment rather than recording real-life car screeches or aeroplane engines, for instance. MacDonald built custom sound effect machines which replicated the natural sounds of the world and could be controlled and recorded inside a studio (Finan, 2016). For example, dried peas rotating in a wooden cylinder could replicate the sound of rain, and a canvas scraped against wood sounded like wind (similar techniques were used for radio soundtracks of the time). MacDonald and Brown’s divergent approaches to sound, then, are products of both the different technologies available and the aesthetic intentions of each respective studio. MacDonald wanted to reproduce real-world sounds in a pleasing (sometimes musical) way, while Brown’s wild field-recorded effects contributed to the offbeat landscape of Warner Bros. cartoons.
The success of Brown’s sound effects may be understood in part by a process Michel Chion (1994: 109) calls rendering, where the film spectator unconsciously recognizes sound effects to be suitable not because they faithfully reproduce a real-world sound, but because they convey the feelings associated with the events depicted. Hence, characters dig their heels into the ground when coming to an abrupt stop and they are sonically accompanied by a car tyre screeching to a halt. The application of particularly bold uses of rendering is perhaps afforded by the artificial-by-nature character of stylized animations.
The incongruity of sound–image relations has at times been the object of humour. For example, Duck Amuck (1953) features the sound of a machine gun seemingly emanating from a guitar when Daffy begins to play. This paved the way for a later collaboration between Chuck Jones and Treg Brown in a short called Now Hear This (1962), in which a British gentleman unwittingly uses a devil’s horn as an ear horn. As a consequence, the sounds that he (and the audience) hear are continually misleading. For example, when it sounds like an off-screen train is approaching, the man runs for cover. It turns out to be an ant that scuttles across the screen.
Audio-visual metaphors also occur in Now Hear This. For example, when the man listens to a bird singing into his horn, he hears the sound of a music box. Both birdsong and a music box are sweet and melodious, and provide a wholesome charm. Later in the film, the man looks inside the horn to inspect it, and a stream of bubbles emanates out into his face. This is accompanied by the sound of a group of men laughing. In this instance, the sound doubles up as the laughter evoked by the fellow being pranked and also illustrates a peculiar congruity between the sound of laughter and a stream of bubbles. Laughter, like a stream of bubbles, features a rapid string of short utterances. The bubbles emerging from the red pipe engulf the male figure, providing the audience with a visual equivalent for the sound track; each bubble moving through the air and past the figure’s head and ears can be read as a graphic visual metaphor of many verbal utterances. Experienced together, there is a curious push-pull effect of a simultaneous congruity and incongruity between sound and image (see Figure 2).

Figure 2. A ‘laughing bubble attack’ in Now Hear This (1962). Screen grab from DVD (Looney Tunes Golden Collection: Volume 6, Disc 4, Chuck Jones, Warner Bros.).
In addition to real-world sounds being effectively applied in out-of-context scenarios, Brown’s sound effects are also notable for perceptually imprinting fast visual events effectively on the spectator. Quick movements in cartoons are augmented by rapid auditory punctuations – crashes, gun shots, swooshes, and the like. This is particularly valuable for the ‘zip-crash’ school of animation since the ear analyses and processes faster than the eye. Chion (1994: 11) explains, ‘The eye perceives more slowly [than the ear] because it has more to do all at once; it must explore in space as well as follow along in time.’ When confronted with a sound film, then, the eye is more spatially adept and the ear is more temporally adept. Hence, the eye and the ear serve each other productively, particularly in cartoons that feature rapid movements.
Also notable is the balletic, complementary interaction between Treg Brown’s sound effects and Carl Stalling’s orchestration. 7 Like the earlier syncretic soundtracks, Brown’s effects seemingly become ‘musicalized’ when accompanied by Stalling’s orchestrations, though they are not integrated into a discernible musical rhythm like syncretic films. For example, in Zipping Along (1953), Wile E Coyote practises his skills of hypnosis on a fly before trying it on the Road Runner. He stands on his tiptoes, reaches his arms forward and wiggles his fingers. Zig-zag lines emanate from his hands, and this is accompanied by a buzzing, electrical sound effect from Brown and a high-pitched musical trill from Stalling. When the fly is hypnotized and enters a state of entrancement, a bell rings. Both musical and non-musical sound effects work together to create an aesthetic whole.
Similarly, music and sound effects work together symbiotically in High Diving Hare (1949). Yosemite Sam tries to coerce Bugs Bunny into leaping into a bucket of water from a high diving platform, only to be tricked into repeatedly performing the act himself. In one static 20-second shot, we see Yosemite Sam climb the ladder and fall through the air three times. As he begins to fall off screen, a slide whistle ascends tonally. At the same time, a rapid, high-register string sequence descends. At this stage, the sounds suggest activity outside the frame since we do not see how he fell off the diving board. As he rapidly falls from the top of the screen to the bottom, the motion lines following his descent are augmented by a quick audio ‘swoosh’, rendering the fall whilst also making the quick 7-frame drop across the screen perceptually imprint itself on the audience (see Figure 3). Finally, there is an off-screen splash as Sam falls into the bucket.

Figure 3. Yosemite Sam zips across the screen in High Diving Hare (1949). Screen grab from DVD (Looney Tunes Golden Collection: Volume 1, Disc 1, Friz Freleng, Warner Bros.).
Enraged, Sam repeatedly climbs the ladder only to be tricked again. As he moves, instead of footsteps we hear a fast, low-register ascending string sequence. Yosemite Sam’s musical stamps up the ladder bear the legacy of the syncretic cartoon’s mickey mousing. Such devices are used liberally in the zip-crash cartoon, like the familiar trope of the ascending pizzicato violin strings when a character tiptoes off screen.
Both Stalling and Scott Bradley (at MGM) carried over the musical-vocalization technique applied in syncretic cartoons as cited in You Don’t Know What You’re Doin’! (‘Oh yeah? Is that so’). In Tex Avery’s Northwest Hounded Police (1946), Droopy (as a Canadian Mountie) pursues Wolfie (an escaped convict). While on the run, Wolfie reads the signs ‘Don’t look now’, ‘Use your noodle’, ‘You’re being followed’, and ‘By Sgt. McPoodle’. The audience hears a bowed violin mimicking the phonic properties of Droopy’s voice as if he were saying the words out loud. In a vivid blurring of dialogue, music and sound effect, ‘[the] rhyme structure encourages a tonal match for the spectator as she or he reads the close-up of each message’ (Allen, 2009: 12).
Another technique Stalling and Bradley shared was their use of rearranged song fragments. While Scott Bradley avoided quoting popular songs when possible (Goldmark, 2007: 8), Tex Avery (for whom Bradley scored at MGM) nonetheless liked including extracts of familiar, popular tunes of the time such as ‘La Cucaracha’, Rossini’s ‘William Tell Overture’ and Raymond Scott’s ‘Powerhouse’. In contrast, Stalling is best known for quoting and rearranging other pieces of music from opera to jazz and Tin Pan Alley. For him, musical quotations became a language through which he could tell stories (p. 7).
Stalling’s song fragments would often comment humorously on the events. For example, in Catch as Cats Can (1947) Sylvester swallows a bar of soap and begins hiccupping bubbles to the melody of ‘I’m Forever Blowing Bubbles’. In Mouse Wreckers (1949) Jerome and Koehler’s ‘Sweet Dreams, Sweetheart’ plays when Claude Cat is reading a book about nightmares. It may be noted that while Stalling has been defined principally by his quotations of other songs (p. 10), he has stated ‘Eighty to ninety percent [of my music] was original. It had to be, because you had to match the music to the action, unless it was singing or something like that’ (Stalling, quoted in Barrier, 2002: 50).
With his fragmentary and rhythmically complex style, Bradley considered his compositions to be attached to the tradition of 20th-century modernist music, alongside composers such as Charles Ives or Igor Stravinsky (Goldmark, 2007: 72). Stalling reportedly did not think of his music in this way, yet he has been reconsidered in this light in more recent years (see Brophy, 2002) following a renewed interest in his work. With both Bradley and Stalling, the music does not merely underscore events; it also anticipates gags, accents visual jokes, defines characters and responds to visual events.
As with all modes of sound design detailed in this article, context played a large role in the development of the zip-crash aesthetic. Sound recording equipment of the time was more developed than it was during the syncretic period and less developed than in the functional and poetic-authentication periods. The access that Stalling and Bradley had to orchestras at their respective studios (which were otherwise used for feature films) enriched their sonic landscapes; such access is less common in television animation.
More recently, the legacy of the zip-crash school of sound design can be heard in TV shows like Animaniacs (1993–1998), Ren and Stimpy (1991–1996) and SpongeBob SquarePants (1999–). A familiar lexicon of flamboyant sound effects and eccentric voices, together with fragmentary soundtracks, can still be heard in the more ‘cartoonish’ animations. However, with the rise of animation made for television, the functional school of sound design has predominated.
Functional
The functional mode of sound design is principally associated with made-for-television animated sitcoms like The Simpsons (1989–), Family Guy (1999–) and more recently BoJack Horseman (2014–). It is more minimalist than the zip-crash mode, with a notably spare style that serves the narrative rather than operating as an element that calls for contemplation in and of itself. While syncretic animations take audio-visual synchronization as the principal structuring system and zip-crash cartoons use flamboyant music and sound effects to contribute to the humour, in the functional cartoon, sounds are stripped back to what is necessary to serve the narrative, and the voice is king.
This more pared-back approach to sound began at the inception of made-for-television animation with Jay Ward’s Crusader Rabbit (1950–1952), which featured a voice-over, occasional character dialogue and a light underscore throughout. This was followed by early Hanna-Barbera television cartoon sitcoms like The Flintstones (1960–1966), Top Cat (1961) and The Yogi Bear Show (1961), which similarly featured underscoring throughout the episodes (covering the gaps between dialogue and sound effects) and a laugh track, recreating the sound of a ‘studio audience’, as was the convention of the time in TV sitcoms.
Under Greg Watson, a collection of sound effects was developed at the Hanna-Barbera Studio which offered a distinct audio branding. Familiar sound effects particular to Hanna-Barbera cartoons would recur, such as cowbells ringing followed by a bullet ricochet when a character scrambles before dashing off screen (Finan, 2016). Other animation studios did not subsequently adopt the creation of a distinctive lexicon of sound effects and the use of laugh tracks was quickly discarded. But the use of light music throughout the cartoon continued through Saturday morning children’s cartoons produced by Filmation such as He-Man and the Masters of the Universe (1983–1985) and Bravestarr (1987–1989), as well as later children’s cartoons such as Clifford the Big Red Dog (2000–2003). In animated sitcoms like The Simpsons, distinctive sound effects are not an in-house trademark, laugh tracks are not utilized and music does not play through the dialogue quite so much. Their music is closer to that of live-action television than to that of their animated predecessors, the 7-minute cartoon shorts discussed previously.
Highlighting the importance of dialogue in functional sound design, Chuck Jones has referred derisively to made-for-television cartoons as ‘illustrated radio’ (Jones, quoted in Furniss, 2005: 64). The relationship between the voice and other audio elements can be characterized by thinking in terms of a hierarchy of sonic importance. Dialogue is at the top of that hierarchy, followed by sound effects, then ambient effects, and finally music at the bottom.
As in zip-crash sound design, a wide range of vocal styles can feature in functional sound design. Characters might possess a vocal embodiment which is indicative of their personality, e.g. Grandpa Simpson’s voice is that of an emaciated, angry old man, Cartman’s voice from South Park (1997–) is that of a quintessential brat, and Scooby-Doo speaks with dog-like vocalizations. The celebrity impersonations practised by Mel Blanc carried over to television, with characters like Top Cat (based on Phil Silvers) in the 1960s at Hanna-Barbera. The Simpsons continued this tradition with characters like Rainer Wolfcastle (based on Arnold Schwarzenegger) and Mayor Quimby (based on Ted Kennedy). Like Mel Blanc, some voice actors perform several roles, such as Daws Butler, June Foray, Hank Azaria, Billy West and Tom Kenny. Other characters remain mute, like the Pink Panther or Maggie Simpson.
Taking a step away from Greg Watson’s approach of branding sound effects at Hanna-Barbera, sound editor Jeff Shiffman, who has worked on Saturday-morning cartoons such as Thundercats (1985–1988) and Teenage Mutant Hero Turtles (1987–1996), rejects the ‘branding’ of a familiar lexicon of recognizable sounds. He comments that the less one relies on existing library materials, ‘the more original a show will sound over the course of time’ (Shiffman, 2015). In addition, a show like The Simpsons relies more on realistic sound effects than on the more flamboyant, zip-crash sound effects:
Lampooning life and human behaviour, the show is written for an adult audience and, as such, it doesn’t fall back on slapstick and ‘bam-splat’ Hanna-Barbera-type sound effects to support the story-line … the sound effects need to be real, and this can be quite demanding when considering some of the unconventional situations and locations in which the characters find themselves. (Buskin, 1997)
This quotation places The Simpsons and other shows with a functional sound design closer to the ‘referential’ pole of sound than the ‘hermetic’, meaning it more closely resembles the sound of the natural world than the syncretic and zip-crash styles. Sound designer on The Simpsons, Travis Powers, comments that sound effects are often underplayed rather than overplayed to make them truer to life: ‘That makes the situations even funnier – like if Homer gets hit in the head it’ll sound exactly like that, because the pain comes from the realness of it’ (Powers, quoted in Buskin, 1997). This approach is diametrically opposed to Treg Brown’s more incongruous style, yet both serve their respective purposes in order to achieve different aesthetic effects.
Sound can also guide audience attention to particular onscreen events. In turn, it may be applied to serve the narrative and ensure audiences notice what directors want them to (Beauchamp, 2013: 17). If a character is eating, the sound designer may add the sound of chewing to draw one’s attention to this. If a character is making notes during a lesson, the image might be accompanied by exaggerated scratching or squeaking sounds of a pencil to draw attention to the fact they are diligently following the teacher’s words.
In addition to ‘hard effects’ like screeching tyres, ticking clocks and gun shots, atmos effects such as rustling leaves, airport sounds or a waterfall can also be applied. Atmos effects feature in functional animations, but only when deemed necessary, and they play at a low threshold level so that they are only just discernible. In The Simpsons, classroom scenes do not tend to feature leaves rustling outside the school window, noise from the hallway or an air conditioner. Atmos does feature more prominently during establishing shots, such as the sound of crickets accompanying an exterior shot of the Simpson household in the evening. Once the scene cuts to the inside of the house, the atmos will either disappear or lower to threshold level so as not to interfere with the dialogue.
Music in the functional mode also marks a significant change from syncretic and zip-crash cartoons. In the functional mode, it operates in a similar way to music in live-action sitcoms. Alf Clausen, the composer for The Simpsons, had a background composing music for TV drama rather than cartoons and was intentionally sought out for that reason. Not only did Clausen’s style mark a breakaway from Carl Stalling and Scott Bradley, but also from the near-continual musical underscore of television cartoons from Hanna-Barbera in the 1960s and 1970s, and the synth-scored Saturday morning children’s cartoons of the 1980s.
With a 35-piece orchestra (Television Academy, 2015), Clausen typically recorded just over 30 pieces of music per episode. These range in length from a couple of seconds up to a minute and a half, covering a wide range of musical styles. An underscore may feature, or a stinger (a brief instant of music which accompanies a scene transition). Some characters also have their own themes, like Sideshow Bob or Mr. Burns – though most characters (including Marge, Homer, Bart and Lisa) do not. Generally, music on The Simpsons operates in a similar way to live action film and television soundtracks. Unlike zip-crash music, the music in functional cartoons underscores emotion rather than comedy. In contrast to Carl Stalling’s music, Clausen comments: ‘overall, the producers don’t want the music to make a statement – no musical jokes. The dialogue has got to be what’s driving it, and it’s got to be funny, and the visuals have to be what’s funny’ (Clausen, quoted in Goldmark, 2002: 245).
For an account of the way in which the musical soundtrack operates in The Simpsons along with many other conventional soundtracks, Claudia Gorbman’s (1988: 73) list of functions may be considered: conventional soundtracks provide formal and rhythmic continuity by covering transitions between scenes; they offer unity via repetition and variation of themes; the soundtrack serves as a signifier of emotions, setting moods and reinforcing spectacle; they offer narrative cueing, establishing settings and characters, interpreting and illustrating events; and the soundtrack is inaudible, subordinating itself to the dialogue and images which are the primary vehicles of the narrative. There is some crossover with these aspects in syncretic and zip-crash soundtracks though music is less likely to be subordinated to the dialogue and images in these earlier modes of cartoon sound design. As detailed, both of these earlier modes also include musicalized sound effects and the rhythm of the music is synchronized with the onscreen movements (occasionally in zip-crash cartoons, and continually in syncretic cartoons).
While the functional sound design approach is less ostentatious than previous styles, it has proven itself to be a suitable aesthetic for animated television series that are primarily dialogue driven. Like the previous two schools of sound design, the functional approach is also viable within the limited time available for each episode. For The Simpsons, the sound effects editor is given five to seven days to work on each episode, the dialogue mixer has two working days per episode and the final dub is finished in six hours (Buskin, 1997). The final category to be proposed in this article is produced over a longer period of time, with a higher budget, and it makes more explicit use of modern sound technology.
Poetic authentication
As the name suggests, this school of sound design is furthest from the hermetic pole and closest to the referential. Sound authenticates the visual world in this category (with creative licence), rather than stripping sonic layers back to the essentials in order to serve the narrative (functional), contributing to the humour (zip-crash) or providing a visual rhythm (syncretic). This mode is typically found in the work of modern CG animation movie studios such as Pixar or DreamWorks. Paul Wells (1998: 102) argues:
[Some animators] deploy sound in a way that makes characters speak as if they were live-action actors, use music as a barometer of mood in the fashion of live-action narratives, and only employ sound effects to properly represent or enhance the real sounds present in the environment.
To enhance the effect of realism, full use is made of the finely detailed sound design afforded to filmmakers following the revolution brought about by the introduction of Dolby technology. In the earliest stages of film sound, the principal goal was to offer the audience something clear and distinct, with minimal sensory complexity. By the 1940s and 1950s, it was possible to integrate more sounds simultaneously. Following the initial invention of film sound, progress in technology led to ‘the second sound revolution’ (Schreger, 1985: 348), making cinematic sound more spectacular. This happened in two phases: Dolby stereo was introduced in the 1970s and digital sound recording in the 1990s.
Following the introduction of Dolby, there was a marked improvement in signal-to-noise ratios, resulting in less ‘hiss’. In addition, higher and lower pitches could be clearly reproduced thanks to an increase in frequency response. There was an improved clarity and subtlety in the sound, and sonic textures became denser with a variety of layers overlaying one another. All of this contributed to a new sensitivity to tiny aural details in cinema. Consider the dinner scene in The Incredibles (2004); in addition to the non-caricatured voices of the mother and two children, one can hear small details such as chairs shuffling, clinking cutlery, the baby babbling, footsteps (which are different for each family member) and a rattling vase on the mantelpiece after the door slams shut. Sounds overlap without loss of fidelity and each one is equalized with a suitable amount of reverb to suit the space. When Dash speaks into his cup, his vocal timbre alters accordingly. Likewise, when Bob Parr (the father) calls from the next room, his voice sounds suitably distant.
With Dolby technology, noise and sound effects are able to take on heightened dramatic, expressive and sensuous interest. Sounds can create, embellish or deepen the mood and atmosphere through equalization and reverb. In Wall-E (2008), the sound of Wall-E’s motors changes depending on whether he is in a barren wasteland at the beginning of the movie, or the sterile mothership later on. In both cases, as well as being informative about the space he inhabits, the reverb of his motors also influences the tone of the scene.
Ambience can operate as a delicate underscore that stabilizes the image and creates space beyond the screen, like the low rumbling of the mothership in Wall-E. Michel Chion (1994: 150) refers to the space created by ambient noise in Dolby sound as the superfield, which he defines as ‘the space created, in multitrack films, by ambient natural sounds, city noises, music, and all sorts of rustlings that surround the visual space and that can issue from loudspeakers outside the physical boundaries of the screen’. Some superfields have a vast extension, suggesting a large cityscape for instance, while others suggest more intimate spaces, or little suggestion of an outside world.
Sound localization also becomes more important in modern cinematic sound design. Cinema sound began as monophonic but this was later replaced by stereo, then quadraphonic and finally surround sound (seven speakers or more). With the additional speakers, sound became a more engulfing sensory experience. When Wall-E floats in outer space propelled by a fire extinguisher, it is possible to track his movements with your eyes closed as he moves from left to right and across the rear speakers in a surround sound setup.
These sorts of immersive audio effects were not possible during the development of zip-crash cartoons, and while Dolby technology has been available for functional cartoons, this focus on subtle sounds and a stronger presence of the superfield is not generally exploited. This might be accounted for by noting that sound designers of feature films generally have more time and more resources than those working on a television show. In addition, this more detailed style perhaps feels more congruous with CG animation, which looks closer to photo-realism than the simplified block-coloured 2D animation more typically found on television.
Two figures were most instrumental in developing the poetic-authentication sound aesthetic at Pixar. The first is Ben Burtt, who worked on Pixar’s earliest short The Adventures of André and Wally B (1984), and later Wall-E. The second is Gary Rydstrom, who worked on Luxo Jr. (1986), subsequent shorts, and also their first five feature films. The style developed by these two artists can be broadly characterized as a poetic use of sound which also serves to authenticate the image. In Luxo Jr., Rydstrom adds sounds which both sell the reality of the image and have an emotional effect. The two human-like lamps that leap and interact with one another are accompanied by anthropomorphic squeaking sounds which at various times sound playful, curious and sad (Wells, 2009: 28).
Ben Burtt worked in a similar way on Wall-E. He has stated that, when deciding on a sound, he first asks what noise the object in question would plausibly make: what is the physics of it, and how does it work? However, if such a sound does not create the effect required for the scene, the science will be abandoned and he will develop that sound or manipulate it into something that works emotionally (Interview with Burtt, Animation Sound Design: Building Animation from the Sound Up, 2008). For instance, Wall-E himself is made from a multitude of motors, but Burtt needed to find suitable motors that matched Wall-E’s personality. In contrast, Eve (Wall-E’s companion) has soothing, hi-tech, musical tonalities that serve as a digital counterpoint to Wall-E’s rattling, mechanical construction. The respective sound effects characterize the two robots differently. While the audio seemingly authenticates the animation, then, there is nonetheless an important amount of creative intervention to engage the audience emotionally.
Generally, feature films have more time to prepare their scores and larger orchestras to work with than television shows. For the most part, music in poetic-authentication movies operates in the same way as functional and live-action scores – it reinforces spectacle, provides narrative information, and offers formal and rhythmic continuity and unity. But like Carl Stalling’s tendency to quote existing popular songs of the day, feature-length soundtracks for CG films such as Shrek (2001), Chicken Little (2005), Happy Feet (2006) and Trolls (2016) also use, or rearrange, existing pop songs. However, this has more to do with an existing tradition in live-action cinema called ‘compilation scores’ with films like American Graffiti (1973) or Goodfellas (1990), where popular songs are used to denote time periods, suggest suitable moods and feature lyrics loosely pertinent to the scene over which they play (Smith, 1998: 155). In Shrek, for example, the song ‘I’m On My Way’ by the Proclaimers plays when Shrek and his companion set off on a journey.
Some of the previous traditions in animation voice acting remain while others are less common. Celebrity impersonations are uncommon in poetic authentication, as are actors performing several roles in the same film. However, vocal affectation corresponding with characters’ personalities is still commonplace, such as the ‘hillbilly’ voice of Tow-Mater in Cars (2006) or the nebbish Hiccup from How to Train Your Dragon (2010). Celebrities lending their voices to animation with minimal affectation is now commonplace, such as Tom Hanks in Toy Story (1995), John Goodman in Monsters Inc. (2001) and Samuel L. Jackson in The Incredibles.
The poetic authentication mode, then, is best understood as the style most closely resembling live-action sound design. Some creative licence is used to best tell the story, but it nonetheless aims to resemble the natural sounds of the physical world more closely than the previous modes of sound design, even if the ‘realism’ of this style of sound design is based on a set of cinematic conventions.
Conclusion
Observing the development of sound design in animation from its inception to the present day, one may argue that it was most distinct from live action in its earliest forms. As years passed, the dominant modes began to increasingly emulate live-action television and film. Cartoon sound design began as highly hermetic (sounding wholly unlike the natural world) and moved progressively towards the opposite, referential pole. It has never fully recreated the sound of the natural world, however, any more than commercial feature films have, since both always feature a subtle amount of manipulation to create the desired effect.
Evidently, some conventions passed from one style to others. Techniques like mickey mousing, celebrity impersonations, musicalized sound effects and vocalizations, and the quotation of song fragments span more than one single mode. In addition, voice actors performing several roles with speech impediments, exaggerated accents or celebrity impersonations have also been used at various stages throughout animation’s history.
As stated in the introduction, novel approaches to sound design emerged over the 20th century in American animation according to contingencies of technological development, changing conventions and economic factors. Each of the four modes of sound design is designed to produce different effects.
An interesting creative exercise would be to match the various modes of sound design with alternative visual styles. Syncretic sound is aligned with rubber hose animation, zip-crash sound is generally aligned with the golden age of 7-minute cartoon shorts, the functional style was developed for limited animation on television and poetic-authentication was designed to work with modern CG animations. There appears to be congruency between each corresponding sonic and visual style, but this could be disrupted to interesting effect. Syncretic image–sound design could be applied to a CG animation, for example, or poetic authentication could be applied to limited animation.
As with any typology, the distinctions between styles proposed will not always be completely tidy. Categories are helpful as starting points when seeking common ground, but following this they may be used as a springboard to deconstruct, adjust and hybridize categories and to develop subcategories. As such, refining these categories could be a suitable project for future research. In addition, this article focused on American studios, but other national approaches to sound design in European or Asian studios should also be considered for further research. Suffice it to say, there is still much to be explored in the nascent field of sound design in animation studies.
Acknowledgements
My thanks to Professor Murray Smith, whose course Sound and Cinema (for which I served as teaching assistant) set the groundwork for this article.
Funding
This research received no specific grant from any funding agency in the commercial, public or not-for-profit sectors and there is no conflict of interest.
