Abstract
The purpose of this study was to examine the role initial attack and expertise play in the identification of instrumental tones. A stimulus CD was made of 32 excerpts of instrumental tones. Sixteen possible combinations of the variables of initial attack (present or absent), expertise (beginner versus professional), and timbre (flute, clarinet, alto sax, trumpet) were recorded twice, once on B-flat4 and once on F4. After listening to the excerpts, music major (n = 100) and non-major participants (n = 112) identified the instrument that was performed in the excerpt and the expertise level of the performer of the excerpt. A list of possible instruments was provided that included the four stimulus instruments as well as four distractor instruments. Data analysis revealed initial attack presence had a significant main effect on instrument identification, especially for music majors’ identification of performer expertise. Results suggest presence and possibly quality of the initial attack affect timbre identification and tone quality assessment.
The ability to accurately perceive, evaluate, and identify the sounds produced by musical instruments is a skill at which all music educators should excel. This skill is necessary for teachers to accurately and efficiently diagnose performance errors and to prescribe solutions and improvements. In elementary general music curricula, teachers challenge students to develop the ability to identify musical instrument timbre. How well do musicians and non-musicians perceive music instrument timbres? Do they differ in their ability to accurately identify instrumental timbres? What components of musical instrument sounds affect one’s ability to perceive timbre? These questions have been asked not only by musicians, but also by acousticians, neuropsychologists, physicists, and others.
Neurologists have contributed to musicians’ understanding of timbre identification. Cognitive brain research involving timbre has often utilized event-related potentials (ERPs) to measure the amount of cognitive activity. ERPs are “neurophysiological electrical potentials elicited by a stimulus associated with the execution of a cognitive or motor task” (Crummer, Walton, Wayman, Hantz, & Frisina, 1994, p. 2720). Studies utilizing ERPs have suggested that the brain is capable of differentiating between tones of differing timbres (Goydke, Altenmüller, Möller, & Münte, 2004). Research in musical settings has come to the same conclusions, with the caveat that timbre does not exist independently of other acoustical attributes. Handel and Erickson (2004) utilized multidimensional scaling techniques to assess non-musicians’ similarity judgments of instrumental timbre and concluded that changes in timbre which occur over the range of an instrument had an effect on the ability to discern similar instruments.
The amount of musical training may also affect how the brain processes musical timbre and music experience might influence one’s identification of sounds, particularly musical instrument timbre (Crummer et al., 1994). Pitt (1994) found that non-musicians struggled to identify a change in timbre when coupled with a simultaneous change in pitch, but musicians did not. Madsen and Geringer (1990) investigated differences between musicians’ and non-musicians’ focus during listening and found that non-musicians did not focus on timbre in musical excerpts even when timbre was identified as a prominent feature. An extension of this research indicated that despite music majors’ ability to identify timbre as more prominent than other features in the excerpts, both majors and non-majors displayed a lower preference rating for timbre-prominent excerpts (Geringer & Madsen, 1995/96).
Accurate identification of instrumental timbre quite possibly interacts with previous music experiences. In a study related to working memory, Hall and Blasko (2005) found participants who played an orchestral instrument were better able to negotiate change interference from a differing/conflicting timbre. Conservatory students who played orchestral instruments had significantly higher recognition rates in comparison to pianists, guitarists, and singers (Srinivasan, Sullivan, & Fujinaga, 2002). Compton (2007) found that high brass players were significantly more successful than low-brass and non-brass participants in discriminating between excerpts performed on cornet or trumpet, a result illustrating that experience may influence perceptual accuracy. Although music performance experience appears to increase acuity for the detection of timbral differences, Bernier and Stafford (1972) found no significant relationship between music instrument preference and the ability to detect timbre differences. These findings imply that musicians and non-musicians not only listen and perceive music differently, but that previous music performance experience may cause musicians to attend to timbre more so than non-musicians; therefore musicians may be able to identify musical timbres more accurately.
Teaching developing instrumentalists to produce a characteristic sound is a critical task for instrumental music teachers. A frequent pedagogical approach is to conceptualize timbral variations with the use of adjectives and physical analogies to describe with words how the sound of instrumental timbre varies. Cavitt (1996) found that teachers and pedagogical texts most frequently used the terms “dark” and “full” to describe ideal brass tone, while the terms “pinched,” “thin,” and “tight” were used to describe poor brass tone (p. 15). Geringer and Madsen (2005) found that the darker tone quality of a flugelhorn resulted in music majors correctly identifying it more consistently than they did either B-flat, C, E-flat, or piccolo trumpets playing the same musical patterns. It seems that variations in tone, regardless of how they are described (in sound or words) may influence instrumental timbre identification.
In addition to the effects of tone quality and timbre variations on perception, another probed area of timbre identification has involved the role of the initial attack, also known as the initial transient or onset. Research results conflict as to the extent of the attack’s effect. Through the use of multidimensional scaling, Iverson and Krumhansl (1993) suggested that more information is necessary for timbre judgments beyond the attack. Kendall (1986) utilized single notes and folk-song phrases to determine whether transients were necessary to identify the instrument performing the excerpt. He concluded the attack was not necessary for the folk-song phrases and was “sufficient but not necessary” (p. 209) for instrument identification in single-note context. These results question the perceptual importance of the attack.
In contrast, various studies suggest a resultant perceptual advantage due to the presence of the attack. Saldanha and Corso (1964) edited single tones to isolate initial and decay transients along with the steady state of the instrumental tone. Using three pitches and 10 instruments, results indicated identification of instrumental timbre improved with practice; initial transients and vibrato aid in identification; and some pitches (F4) and instruments (clarinet, oboe, and flute) may be more identifiable with or without the initial transient/attack. Both Wedin and Goude (1972) and Thayer (1972) found subjects had more difficulty identifying instrumental sounds when the initial attack was missing. Thayer also found that those with more musical training performed best and that removal of the attack affected accurate identification of the trumpet more so than clarinet and flute. Elliott (1975) investigated the role of the attack with graduate music students who were asked to identify the instrument performing the excerpt. Subjects heard the tone performed with and without attack and the results showed significant correct responses for clarinet, oboe, and trumpet, regardless of the presence of the attack. Paul (2005) replicated and expanded Elliott’s study with high school students with similar results. Clarinet results were consistent with Elliott’s research, while the trumpet results contradicted Elliott’s research but were consistent with other findings suggesting that the attack on the trumpet was important for correct identification (Thayer, 1972). The subjects also improved throughout the three listening sessions, which suggests that either the attack was important or that the subjects’ identification results improved with practice.
Adult amateur musicians also were significantly more successful identifying instrumental timbres when the attack was present and consistently identified trumpet over flute, clarinet, and saxophone excerpts (Schlegel & Lane, 2013).
The variations in tone quality/timbre inherent with instrumental performance coupled with presence, and perhaps quality, of the initial attack seem to have an effect on the ability to perceive and identify instrumental timbre. The quality of instrumental tone and attack may interact with the ability to identify timbre and also the expertise/experience level of the performer. In a study by Compton (2007), three accomplished trumpeters (performers with graduate degrees in performance in conjunction with several years of professional performing experience) performed musical excerpts on trumpet and cornet. A participant tasked with identifying the instrument performing the excerpts noted one of the accomplished players to be weaker than the others. Compton theorized, “Perhaps this person’s concept of a ‘good’ sound has to do with the attack portion of the tone, which was audibly different with one player than with those of the other two performers” (p. 77). Compton’s comment implies the audibility and quality of the attack may influence the perceived quality of tone. Does the absence of attack not only influence the identification of instrumental timbre but also the assessment of the quality and expertise level of the performer? Schlegel and Lane (2013) found that adult amateur musicians were not able to differentiate between the expertise levels of the performer of the excerpts regardless of the presence or absence of the attack.
Therefore, the primary purposes of this study were to observe the role of the initial attack of an articulated instrumental sound in identifying the (1) timbre of the sound and (2) performance level of the performer. Within the context of this study, the term “timbre” was used to describe the characteristic sound produced by an instrument. From a pedagogical perspective, both the student’s and teacher’s ability to separate critical components of sound is important to the teaching and learning process. If a student has an uncharacteristic sound, determining whether it is related to the long tone, the initial attack, or both would result in different pedagogical and practice strategies. This study is unique among the literature in that it expressly uses both student and professional tones as excerpts. Previous research has been limited to professional tones or those generated electronically. The present study is an extension of an initial investigation by Cassidy and Schlegel (2008) exploring the role of initial attack on non-majors’ ability to identify instrument timbre and performer expertise. Data from the 2008 study were combined with that of musicians, gathered using the same protocol, allowing for comparison between groups, a secondary purpose of the study.
Method
Musical stimuli
In order to observe the influence of various sound parameters on participants’ ability to correctly identify instrumental timbres, a CD was created with 32 single-tone excerpts. Sixteen possible combinations exist using the variables of experience of performer (beginner or professional), instrumental timbre (flute, clarinet, alto saxophone, trumpet), and initial attack (present or absent). For auditory variety, these 16 combinations were presented twice, once on B-flat4 and once on F4, resulting in 32 excerpts. Instruments were selected based on those typically taught at the beginning level and pitches were determined to be in a comfortable range for beginners. In order to keep all pitches in the same range, only soprano instruments were chosen for the study.
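The factorial structure of the stimulus set can be checked by enumeration. The following is a minimal sketch, not part of the original study materials; the variable names are illustrative assumptions, while the factor levels are those stated above.

```python
from itertools import product

# Factor levels as described in the text; the names are illustrative.
EXPERTISE = ["beginner", "professional"]
INSTRUMENTS = ["flute", "clarinet", "alto saxophone", "trumpet"]
ATTACK = ["present", "absent"]
PITCHES = ["B-flat4", "F4"]

# 2 expertise levels x 4 instruments x 2 attack conditions = 16 combinations.
combinations = list(product(EXPERTISE, INSTRUMENTS, ATTACK))

# Each combination is presented at both pitches, giving 32 excerpts.
excerpts = [(e, i, a, p) for (e, i, a) in combinations for p in PITCHES]

print(len(combinations), len(excerpts))
```

Enumerating the design this way makes it easy to confirm that every cell of the design appears exactly once at each pitch.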
Professional sound files were taken from the McGill University Master Samples (Opolko & Wapnick, 2006). Multiple studies in timbre identification (Brown, 1999; Hall & Blasko, 2005; Handel & Erickson, 2004; Ilmoniemi, Valimaki, & Huotilainen, 2004; Iverson & Krumhansl, 1993; Pitt, 1994; Srinivasan et al., 2002) have used the McGill University Master Samples (MUMS) as a resource for stimuli. Tones of beginners were collected by recording sixth-graders who were at the end of the first year of participation in middle school band. Students’ instruments were tuned prior to recording, and pitches, calibrated to A440, were sounded before the recording process in order to aid the beginning students in producing an in-tune performance. Numerous students were recorded in order to ensure there were enough viable sound files (correct pitch, in tune, characteristic tone, clear attack) from which test excerpts could be extracted. A panel of instrumental music experts listened to the tones and, for each instrument and pitch, selected the one that was most in tune and represented the most characteristic sound of a beginning student. The hardware utilized in the recording included an Audio-Technica 4040 large diaphragm condenser microphone and a Presonus Digimax FS Class A pre-amplifier. Files were recorded as .wav files (44,100 Hz, 16-bit stereo) and were recorded in a quiet office off the main rehearsal room. After the signal was captured, it was imported into Pro-Tools LE, Version 7.1.
One of the main points of interest in this study was the role of initial attack in the identification of timbre, especially as it relates to beginner and professional instrumentalist performance. Sound Forge Audio Studio 8.0 was used to crop all sound files in order to isolate the steady-state tone and the initial attack. Length of the raw professional sounds ranged from 2.13 to 3.9 seconds and could not be lengthened without distorting the integrity of the sound. Raw beginner sounds were approximately 4 seconds in length. To equalize duration, the tones containing the initial attack encompassed the first 1.5 seconds of the excerpt. The sustain excerpts were also 1.5 seconds in length, but the initial half-second was not included. No fade in or out (ramping) was inserted into any of the files. With this editing, all 32 sound files were 1.5 seconds in length. The amplitudes were equalized using the same software so they would be perceived as being equal in volume. A panel of music experts listened to the excerpts to confirm the uniformity of volume. In addition, the panel compared the perceptual audio quality between the professional (MUMS) and beginner excerpts to determine if there was excess “noise” in the beginner excerpts. The panel concluded that the files were perceptually uniform in volume and in quality.
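The cropping rule can be expressed as sample indices at the stated 44,100 Hz sampling rate. This is a hedged sketch only: the editing was done interactively in Sound Forge, and the function name and the assumption that the sustain window runs from exactly 0.5 to 2.0 seconds are ours, inferred from the description above.

```python
SAMPLE_RATE = 44_100     # Hz, from the recording specification
EXCERPT_SECONDS = 1.5    # length of every edited sound file
SUSTAIN_OFFSET = 0.5     # seconds skipped in the sustain condition (assumed exact)


def crop_window(attack_present):
    """Return (start, stop) sample indices for one edited sound file.

    Hypothetical helper: attack excerpts start at the onset of the tone,
    sustain excerpts start half a second later.
    """
    start_seconds = 0.0 if attack_present else SUSTAIN_OFFSET
    start = round(start_seconds * SAMPLE_RATE)
    stop = start + round(EXCERPT_SECONDS * SAMPLE_RATE)
    return start, stop


print(crop_window(True))   # attack condition
print(crop_window(False))  # sustain condition
```

Under these assumptions the sustain window ends 2.0 seconds into the raw tone, which fits within the shortest raw professional sound reported (2.13 seconds).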
Due to the short length of the tones, each sound file was played twice, with 1 second in between the repeated tone. The first iteration of the tone was preceded by .5 seconds of silence and the repeat was followed by .5 seconds of silence, resulting in a total of 5 seconds for each complete excerpt. A track was inserted between each excerpt to announce the number of the following excerpt.
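The timing of one complete excerpt can be verified by summing its segments. The sketch below simply restates the durations given above; the segment labels are illustrative.

```python
# Segment durations in seconds, in playback order, as described in the text.
SEGMENTS = [
    ("leading silence", 0.5),
    ("tone, first playing", 1.5),
    ("gap between playings", 1.0),
    ("tone, repeated", 1.5),
    ("trailing silence", 0.5),
]

total_seconds = sum(duration for _, duration in SEGMENTS)
print(total_seconds)
```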
The participants (N = 212) were all students at a large southern university and were either music majors (n = 100) or non-majors (n = 112). Music majors were at various stages of their academic program, though no facet of their degree program, such as major instrument or year in college, was utilized as a factor in the analyses. All participants were tested in a classroom environment during a regularly scheduled class. It should be noted that all of the non-majors were enrolled in a summer general education music class, suggesting at least a minimal level of music experience. Class sizes varied, but all rooms and audio equipment provided acoustic environments that were conducive to hearing subtle changes in auditory excerpts. The non-major music classes utilized in this study included two sections of music appreciation and two sections of class piano for non-majors. The music majors were enrolled in one of the following five classes: Student Teaching, Teaching Music in the Elementary Schools, Orientation to Music Education, Teaching Music in Diverse Settings, and Teaching Instrumental Music in the Secondary Schools. In total, the test was given nine times.
Procedure
Participants signed a consent form approved by the Institutional Review Board (IRB) at the university, acknowledging their willing participation in the experiment. After participants completed the demographic questions on the form, the following directions were read aloud:
You will hear 32 excerpts of traditional instruments. These excerpts are played by either a professional musician or by a beginning student who has completed one year of study. You will hear each excerpt twice. After listening to the excerpt, indicate what instrument you thought you heard. A list of possible instruments has been provided. Write the name of the instrument on the line. Also indicate if you think the instrument was performed by a beginner or a professional by checking the appropriate box. You
While the excerpts included only flute, clarinet, alto saxophone, and trumpet, the list of possible instruments on the response form comprised flute, oboe, violin, trumpet, saxophone, viola, French horn, and clarinet. Participants listened to one of two randomly generated orders of excerpts. Average time of the task was about 15 minutes.
Results
Because of intact class testing, an unequal number of participants listened to each of two orders. To determine whether excerpt order resulted in significant differences among participants, a dependent t-test was conducted comparing mean scores of participants on each of the 32 excerpts between listening to order 1 (n = 86) and order 2 (n = 126). No significant differences were found due to order of stimulus presentation [t (31) = 1.00, p > .05]. All data, therefore, were considered as one data set.
After listening to each of the 32 excerpts, participants made two discriminations—instrument timbre and experience level of the performer. This resulted in two possible correct responses per excerpt. With two examples of each variable combination (e.g., flute, beginner, with initial attack), one at B-flat and one at F, four points were possible for each unique combination. With 16 combinations, 64 points were possible in this task. A four-way factorial analysis of variance (ANOVA) with repeated measures was calculated using total raw scores as the dependent measure. College major functioned as the between-subjects factor, with instrument, expertise level, and initial attack conditions as within-subjects factors.
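The scoring scheme reduces to simple arithmetic. The following sketch is illustrative only; the function and constant names are assumptions, and the point values are those described above.

```python
POINTS_PER_EXCERPT = 2        # one for instrument, one for expertise level
PITCHES_PER_COMBINATION = 2   # B-flat and F
N_COMBINATIONS = 16           # expertise x instrument x attack


def score_excerpt(instrument_correct, expertise_correct):
    """Score one response: one point per correct discrimination."""
    return int(instrument_correct) + int(expertise_correct)


max_per_combination = POINTS_PER_EXCERPT * PITCHES_PER_COMBINATION
max_total = max_per_combination * N_COMBINATIONS

print(max_per_combination, max_total)
```

Because each variable combination has a maximum of four points, the condition means reported in the Results are on a 0 to 4 scale.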
Results indicated significant differences for all main effects (p < .001), though these significant main effects have varying effect sizes. Music majors (M = 2.43, SD = 1.12) were more accurate than non-majors (M = 1.87, SD = 1.1). Clarinet timbre (M = 2.33, SD = 1.08) and trumpet timbre (M = 2.30, SD = 1.21) were judged similarly and more accurately than alto saxophone (M = 2.04, SD = 1.09) or flute (M = 1.89, SD = 1.13). Scheffé post-hoc procedures for multiple comparisons indicated that trumpet and clarinet means were significantly different from flute and alto saxophone means (p < .001). Tones of beginners (M = 2.30, SD = 1.1) resulted in more accurate responses than those of professionals (M = 1.97, SD = 1.16). When pitches were presented with the initial attack (M = 2.46, SD = 1.13), responses to instrument type and level of expertise were more accurate than when pitches were presented without the initial attack (M = 1.82, SD = 1.06). Effect sizes, reflected in the partial η² values, were: instrument, .13; expertise, .26; major type, .33; and initial attack, .64. Clearly, the presence or absence of the attack had the greatest effect on participants’ ability to identify the expertise of the performer and identify the instrument being played.
Most of the two- and three-way interactions were significant beyond the p < .01 level. The four-way interaction among instrument, expertise level, initial attack, and major type was also significant [F (3, 630) = 4.2, p = .006, partial η² = .02] and is illustrated in Figure 1. For all instruments and both groups of participants, responses to the identification questions were more accurate when the initial attack was present as opposed to just the sustained portion of the sound. However, among music majors, beginning tones were identified more accurately than professional tones for all instruments when initial attack was present. This was not the case among non-majors, where initial attack had little effect except on the flute tone. In fact, the flute tone was the only timbre that consistently was identified more accurately when beginners played than when professionals played, regardless of initial attack condition or major of participants.
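The degrees of freedom reported for the four-way interaction are consistent with the design. The sketch below assumes a conventional mixed-design ANOVA with no sphericity correction; the text does not state how the degrees of freedom were obtained, so this is a plausibility check rather than a reconstruction of the analysis.

```python
# Design as described in the text.
N_PARTICIPANTS = 212
N_GROUPS = 2  # music majors and non-majors (between-subjects factor)
LEVELS_WITHIN = {"instrument": 4, "expertise": 2, "attack": 2}

# Within-subjects part of the interaction: product of (levels - 1).
df_within = 1
for levels in LEVELS_WITHIN.values():
    df_within *= levels - 1

# Interaction with the between-subjects factor and its error term.
df_effect = df_within * (N_GROUPS - 1)
df_error = df_within * (N_PARTICIPANTS - N_GROUPS)

print(df_effect, df_error)
```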

Figure 1. Four-way interaction among variables: Initial attack (Attack and Sustain), Expertise (Professional and Beginner), Instrument, and Major.
While the excerpts included only flute, clarinet, alto saxophone, and trumpet sounds, the list of possible instruments on the answer sheet also included oboe, violin, viola, and French horn. Tables 1 and 2 supply the number of correct and incorrect responses for each instrument listed on the answer sheet. Clearly music majors were more accurate than non-majors in their responses, but accuracy on any instrument never reached 70%. The most frequently chosen distractor for the flute tone was trumpet among the music majors and French horn among the non-majors. For clarinet it was oboe, and for alto sax it was clarinet, for both groups of participants. For the trumpet tone, the most frequently chosen distractor was oboe among the music majors and French horn among the non-majors.
Table 1. Tabulation of correct and incorrect responses to instrument identification by majors.
Note. Columns add up to 800 possible responses (n = 100 music majors, eight excerpts of each instrument).
Bold indicates a correct response.
Table 2. Tabulation of correct and incorrect responses to instrument identification by non-majors.
Note. Columns add up to 896 possible responses (n = 112 non-majors, eight excerpts of each instrument).
Bold indicates a correct response.
Discussion
Renowned conductor John P. Paynter once stated, “Often what we call ‘bad tone quality’ is really ‘bad attack and/or release.’ The sound in between may be just fine” (1984). The results from this study support previous empirical studies (Elliott, 1975; Paul, 2005; Saldanha & Corso, 1964; Thayer, 1972; Wedin & Goude, 1972) demonstrating that initial attack has an effect on accurate identification of timbre. In every combination of variables, participants, regardless of major, were able to more accurately identify the instrument and performer expertise when the initial attack was present in the sound as compared to when the sound was just the sustained part of the tone. Initial attack was more consequential to music majors: the difference between the means of articulated and non-articulated tones was much greater than for non-majors, and this was especially evident with the excerpts of beginners. What is unique to this study is that tones from both novice and expert performers were utilized. Based on our data, we are able to hypothesize that it was the quality of the attack, not the attack alone, that seemed to impact identification.
The concepts of timbre and tone quality overlap more than they are disparate, which logically leads one back to the Paynter observation. Teachers of instrumental music, especially those who work with beginners, should consider the importance of the initial attack in the development of a “characteristic sound.” Focused work on long tones is certainly critical in the development of tone, but similar dedicated work on initial attack with beginners appears to be equally important.
Results were consistent with previous research in that removal of the attack from a trumpet sound reduced the accuracy in identification tasks (Thayer, 1972). This was most apparent for the beginning trumpet excerpts for the music majors. It appears that there is something in the onset of a trumpet timbre that aids the identification process. It is possible this is similar across all brass instruments, but is less prominent in woodwinds. This study included only one brass instrument, so future research is warranted.
As in previous research in timbre identification, the excerpts consisted of single tones. Perceptual accuracy might differ in more authentic musical contexts, as was noted by Kendall (1986). Comparing sequential patterns of pitches (melody) with single sustained pitches would show whether results from single-pitch data transfer to more musical settings. Using frequency range of the stimulus (e.g., bass instruments instead of treble instruments) and expressive qualities of sound (vibrato, dynamic changes) as variables in future timbre studies would continue to add to the knowledge base in perception. Instrumental music teachers would be interested in how these parameters compare between novice and experienced musicians.
We acknowledge that the excerpts from expert performers were recorded and edited using professional equipment by sound engineers. The excerpts from children were recorded under different circumstances: in a quiet room using excellent but not professional recording equipment and edited by a skilled but not professional editor. In theory, the quality of the recordings could have given a clue as to the performance level of the instrumentalist separate from the tone quality or initial attack. However, pilot testing with professional musicians confirmed that there was no perceptible difference in the quality of the recordings between the novice and professional sound files. This seems to be substantiated by data that indicate initial attack was the primary explanation for differences among responses. One would expect if the differences were due to recording conditions that, overwhelmingly, all participants would have identified expertise across all conditions with a high degree of accuracy. This did not happen. Nevertheless, it is a condition that should be taken into account when considering the implications of the study.
A potential limitation of this study is the length of the excerpts we used. The sound files were shorter than those used in previous studies (Elliott, 1975; Paul, 2005). However, each was the equivalent of a dotted quarter note at 60 bpm, with a quarter rest (1 second) between the two identical sound files in each excerpt. Pilot testing revealed it was possible to identify timbre with short sound files, but the duration may have skewed the results. An investigation with reaction time as a dependent measure would help set protocol for future research by providing an indication of how long one actually needs to make an accurate instrumental timbre identification decision. Length of excerpt is one plausible explanation for the relatively weak performance among musicians, especially in identifying instrumental timbres. Excerpts replicating usual listening conditions (i.e., professional musician sound with an articulated attack) resulted in a high of M = 82% (trumpet) accuracy rate in the instrument identification portion of the responses among music majors. Flute (M = 74.5%), alto sax (M = 65%), and clarinet (M = 60%) were quite a bit lower.
It is an enigma as to why music majors had such a difficult time with instrument identification. Even more mysterious is the fact that they were more accurate in identifying instruments when beginning sounds with initial attack (presumably “less characteristic”) were played (trumpet M = 94.5%, flute M = 80%, alto sax M = 70.5%, clarinet M = 75.5%). Non-majors were notably less accurate than music majors, and expertise of the player did not have an effect on their accuracy in identifying trumpet or clarinet sounds, but they were almost 15 percentage points more accurate on flute excerpts when the beginner was playing, and almost 15 percentage points more accurate on alto sax excerpts when professionals were playing. Although quality of audio equipment was presumed to be adequate for purposes of this study because it existed in university classrooms designed for listening, this is an aspect that cannot be overlooked as an explanation for the data. There is also a possibility that the absence of fading in or out resulted in an artificial and abrupt “attack” that may have influenced participants’ abilities to identify instrumental timbre. Perhaps it was this artificial “attack” and not the presence or absence of the “initial” attack that affected identification.
There exists, however, another possible explanation for the significant main effect of expertise. One would assume that music majors would be more accurate identifying the professional excerpts, though the opposite was the case. This suggests that undergraduate music majors may have a clearer and better-formed concept of what is a poor tone than of what is an ideal tone. Based on the results of this study, the quality of the articulation and the sustained portion of the sound may function as a gestalt, as suggested by Compton (2007). Perhaps this is the most distinctive finding of all and is ripe for further investigation.
Understanding timbre and tone quality and the parameters that define characteristic instrumental sounds is crucial to music educators teaching instrumental music. That the initial attack is an important part of a characteristic sound is expected. That it might be as important as the sustained portion of the tone suggests focusing on the quality of the attack during instruction as much as the sustained sound.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
