Abstract
We examined perception of artificial timbre blending using recordings of two actual instruments. In Experiment 1, participants heard stimuli comprising different proportions of sounds from an oboe and a trumpet, constructed using both a linear and a logarithmic algorithm, and judged the degree of blending. In Experiment 2, participants chose between an oboe and a trumpet in each blend condition. In both experiments, participants were able to track the degrees of blending between the two anchor points quite accurately. In Experiment 3, participants matched test blends to two target blends in an ABX design and showed no evidence for categorical perception of oboe and trumpet timbres in their judgments. Further, participants with and without musical training showed similar patterns of responding. The findings suggest a high level of sensitivity for timbre coding in auditory perception and also have implications for timbre manipulation as a compositional device and sound morphing techniques.
With some experience we can tell an oboe from a trumpet, and professional violinists can often tell a Stradivarius from a cheap violin. The quality of sound, or timbre, is a salient aspect of music, and as such has been explored extensively from both a psychological and a musical perspective. Within psychology, the emphasis has been on the perceptual and cognitive aspects of timbre perception. For instance, Grey (1977) explored the different perceptual dimensions for judging similarities between different instrumental sounds. Further, timbre has been shown to influence pitch perception (e.g., Krumhansl & Iverson, 1992; Marvin & Brinkman, 2000; Pitt, 1994; Vurma, Raju, & Kuuda, 2011), affect perceptual groupings and organization (e.g., Cusack & Roberts, 2000; Iverson, 1995), and create illusions of expanded or contracted intervals without affecting the perception of individual pitches (Russo & Thompson, 2005). Timbre has also been shown to have a strong influence on memory. Several studies have shown how timbre congruity enhances memory and timbre change impairs memory (e.g., Halpern & Müllensiefen, 2008; Oakes & North, 2006; Poulin-Charronnat et al., 2004; Radvansky, Fleming, & Simmons, 1995). In other research, timbre has been shown to affect emotion perception (Hailstone et al., 2009) and pleasantness judgments of musical excerpts (Makris & Mullet, 2003) as well. Finally, physiological studies have shown that we can identify timbre before intensity (Gong, Hu, & Yao, 2012), and timbre is processed bilaterally although with more right hemisphere input (e.g., Hoch & Tillmann, 2010; Samson, 2003).
Within music, the study of timbre has been approached from a different perspective, one that involves orchestration, sound synthesis, and the potential of timbre as a formal construct in musical composition comparable to melody and rhythm (Erickson, 1975; Lerdahl, 1987; McAdams & Cunible, 1992; Wessel, 1979). Advances in music technology have affected these efforts in a variety of ways. Throughout the traditions of tape music, analog synthesis, and computer-based music, options for manipulating the timbre of recorded and synthesized sounds have steadily increased. Techniques that were explored and refined in the 1970s – such as frequency modulation (Chowning, 1973), analysis/resynthesis (Grey & Moorer, 1977), and granular synthesis (Roads, 1978) – introduced hosts of unfamiliar timbres with high levels of complexity and realism to the years of compositional practice that followed. More recently, many computer-based instruments have been designed to provide precise control of timbre (McPherson, 2010; Oliver & Jenkins, 2008), bringing its significance as a musical dimension to the fore in performance contexts as well as fixed media compositions.
For centuries, composers have blended timbres acoustically with sophisticated orchestration. Currently, they have access to a variety of software tools to achieve these effects with greater control and precision. In one technique, called cross-synthesis, the strategy is to create artificial blends by analyzing recordings of existing instruments. The most common technique is to interpolate between the spectra of the two sounds being blended, with a result that is significantly different from combining the sounds acoustically. Many refinements of this method have been made in recent years (Caetano & Rodet, 2010; Ezzat, Meyers, Glass, & Poggio, 2005; Fitz, Haken, Lefvert, & O’Donnel, 2002). Compositionally, timbre blending is useful for generating instrument morphing effects and carefully formulated sound combinations. Yet, despite its heavy use in music, there is a disproportionately small amount of research on how people actually perceive these sounds. This may be due to the fact that timbre is a multidimensional aspect of sound, making it difficult to effect isolated changes in perception. Several studies involving similarity judgments of pairs of sounds have identified spectral energy distribution, sound onset characteristics, and spectral fluctuation over the course of a tone as acoustic correlates of three basic perceptual dimensions determined via multidimensional scaling (Grey, 1977; Iverson & Krumhansl, 1993). Exploring the effects of artificial timbre manipulation, Grey (1975) presented sequences of interpolated timbres between specific instrument endpoints and asked participants to identify the point at which the timbre changed from one instrument to the other. In general, participants rated these transitions as being gradual in nature. A later study illustrated that exchanging the spectral energy of a pair of tones resulted in exchanged positions along a predicted dimension of the perceptual space (Grey & Gordon, 1978).
McAdams, Winsberg, Donnadieu, De Soete, and Krimphoff (1995) included synthesized hybrids in a dissimilarity rating experiment. In the resulting perceptual space, some hybrids fell between their pure sources along certain perceptual dimensions. However, the hybrids were not generated systematically based on analyses of actual instruments, leaving details of the relationship between blending parameters and perception somewhat unclear. Caclin, McAdams, Smith, and Winsberg (2005) investigated links between timbre dissimilarity judgments and gradual changes in specific synthesis parameters. Based on acoustic correlates to perceptual dimensions identified in previous research (onset time, spectral centroid, and spectral flux), parameters of non-instrumental synthesized tones were varied systematically in order to assess the validity and continuous nature of each dimension. Their results indicated that participants made use of a continuous range of onset times and spectral centroids in their dissimilarity judgments.
In this paper, we explored how sensitive listeners were to these blended instrumental sounds. More specifically, we explored how the degree of blending – that is, the level of spectral interpolation between different instrumental sounds – affected the listeners’ identification of the timbres and whether the parent instruments could retain their identities in these newly created sounds. To this end, in two experiments we blended sounds from an oboe and a trumpet in varying degrees, and asked participants to judge whether they could discriminate the blends of differing ratios in a continual fashion (Experiment 1) as well as to judge whether each blend was more representative of one or the other instrument (Experiment 2). A secondary question in these experiments was whether individuals who were musically trained or more familiar with the two parent instruments that provided the blends would be more discriminating with respect to sound quality.
Experiment 1
Method
Participants
A total of 16 adults in the Washington DC area participated in the experiment either for extra credit in psychology courses at American University or on an entirely voluntary basis. None had any hearing difficulties. Those who had at least 5 years of experience playing an instrument were classified as “musicians” (N = 9, mean = 11.9 years, range = 5–33 years). Others were classified as “non-musicians” (N = 7, mean = 1 year, range = 0–3 years). We also classified the participants into two categories as a function of their familiarity with the two instruments, regardless of their musical training. Based on self-reports on a scale of 1–10 for each instrument and using a median split, those participants with combined scores of 10–19 (N = 8), reflecting the unweighted sum of the ratings given to familiarity with oboe and familiarity with trumpet, were classified as “more familiar” and those with combined scores of 2–7 (N = 8) were classified as “less familiar” with the two instruments at hand. We should note that there was, of course, some overlap between the musicianship and familiarity groups. Of the nine musicians, five were also classified as “more familiar” with the instruments. However, this overlap was not complete in that four of the musicians were classified as “less familiar” and three of the seven non-musicians were classified as “more familiar”.
Materials
Sound stimuli for the timbre continuum were created using two instrument samples from the McGill University Master Samples collection (MUMS). Trumpet and oboe tones played at C4 (261.626 Hz) served as source material for the extremes of the continuum, with gradations in between generated via signal processing of the original samples. The tones were recorded at a sampling rate of 44.1 kHz in 16-bit quality, and normalized for loudness and duration (2 seconds). Artificial timbre blending was achieved using overlapping windows of short-time Fourier analysis/resynthesis, at each moment interpolating between the spectral magnitudes of the two tones. Phase information was left unchanged. Along with attack time and spectral fluctuation, spectral envelope has been consistently identified as a major perceptual dimension of timbre (Caclin et al., 2005; Grey, 1977; Iverson & Krumhansl, 1993; Krumhansl, 1989; Lakatos, 2000; McAdams et al., 1995). Thus, although this study focuses on only one dimension of timbre, it is a sound one.
In order to capture subtle spectral changes with sufficient time resolution, an analysis window size of N = 1024 samples (23.22 ms) was used, with overlapping windows analyzed every 256 samples (5.81 ms). The algorithm employed is based on a technique described in Puckette (2007), altered here to allow an arbitrary amount of blending between a source and target signal. It works as follows: for each window of audio, both the trumpet and oboe signals are transformed to the frequency domain via Fourier analysis. Based on the difference between the magnitudes of the trumpet and oboe spectra and a variable interpolation coefficient, a magnitude scalar is calculated to adjust the magnitude of each frequency bin of the trumpet spectrum data before an inverse Fourier transform. This yields the altered (i.e., “blended”) time domain signal. Phase data are not considered in this process. At each window, the magnitude scalar is calculated as
where i is the frequency bin number (0 < i < N-1), Ai is the scalar for bin i, B is the interpolation coefficient, |S| is the source (trumpet) magnitude, and |T| is the target (oboe) magnitude. As B increases from 0.0 to 1.0, the spectral envelope of the trumpet signal is gradually transformed so that it sounds more and more like the oboe. For each stimulus, the interpolation coefficient was held constant for the duration of the tone in order to produce a fixed blend rather than a time-dependent morphing effect; however, the technique does exploit the time-varying spectral features of each signal in general. No special considerations were made for different segments of the tones (i.e., attack, initial decay, sustain), or for different frequency bands, but the tones were matched in terms of loudness, duration, and attack segment lengths. With a coefficient near 0.5, it produced the desired result of timbral ambiguity.
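The per-bin scaling described above can be illustrated with a minimal sketch. The scalar form used here, A_i = 1 + B(|T_i| - |S_i|) / |S_i|, is an assumption chosen only because it agrees with the surrounding description (B = 0 leaves the trumpet bin unchanged, B = 1.0 gives it the oboe magnitude, and phase is untouched); it is not quoted from the original algorithm. The function name, the `eps` guard, and the three-bin spectra are hypothetical, and the short-time Fourier analysis and overlap-add resynthesis steps are omitted.

```python
import cmath


def blend_bins(source_bins, target_bins, b, eps=1e-12):
    """Scale each source (trumpet) bin so its magnitude moves toward the target's.

    ASSUMPTION: the scalar for bin i is taken to be
        A_i = 1 + b * (|T_i| - |S_i|) / |S_i|,
    i.e. linear interpolation between the two magnitudes.
    """
    blended = []
    for s, t in zip(source_bins, target_bins):
        s_mag, t_mag = abs(s), abs(t)
        # eps avoids division by zero for silent source bins (our addition)
        scalar = 1.0 + b * (t_mag - s_mag) / max(s_mag, eps)
        # A real, non-negative scalar changes magnitude but keeps the source phase
        blended.append(scalar * s)
    return blended


# Hypothetical three-bin spectra for a single analysis window
trumpet = [cmath.rect(1.0, 0.3), cmath.rect(0.5, -1.2), cmath.rect(0.2, 2.0)]
oboe = [cmath.rect(0.4, 1.0), cmath.rect(0.9, 0.1), cmath.rect(0.1, -0.5)]

half = blend_bins(trumpet, oboe, 0.5)  # midpoint blend for this window
```

In a full implementation, the same scaling would be applied to every overlapping window before the inverse transform.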
Sets of 10 stimuli were created for the continuum, where B = 0 produced a pure trumpet tone, B = 1.0 produced a pure oboe tone, and coefficient values in between produced various gradations of trumpet/oboe blends. In an effort to find a set of coefficients that produced perceptually gradual results, two coefficient curves were used. The first increased from 0 to 1.0 linearly, while the second increased logarithmically. These two sets were used based on pilot feedback indicating that the resulting timbre continuum from trumpet to oboe was appropriately gradual across the scale. In all, two sets (20 stimuli) were generated with these methods.
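As a rough illustration of the two coefficient curves, the sketch below generates 10 values of B for each set. The linear set follows directly from the description. The logarithmic mapping, log_base(1 + (base - 1)x), is purely an assumption for illustration, since the exact curve is not specified here; the function names are likewise hypothetical.

```python
import math


def linear_coefficients(n=10):
    """n interpolation coefficients rising evenly from 0.0 to 1.0."""
    return [k / (n - 1) for k in range(n)]


def log_coefficients(n=10, base=10.0):
    """n coefficients rising from 0.0 to 1.0 along a logarithmic curve.

    ASSUMPTION: log_base(1 + (base - 1) * x) is an illustrative choice only;
    the curve actually used for Set Log is not given in the text.
    """
    return [math.log(1.0 + (base - 1.0) * x, base) for x in linear_coefficients(n)]


set_linear = linear_coefficients()  # B values for Set Linear
set_log = log_coefficients()        # B values for Set Log (assumed curve)
```

Under this assumed mapping, the logarithmic set rises quickly near the trumpet end and flattens toward the oboe end, while both sets share the same pure endpoints.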
Design and procedure
There were two presentation sets (Set Linear and Set Log) corresponding to the two blending curves. In each set, there were 50 trials, comprising five presentations of each blend step. Because there is evidence that timbre perception can be influenced by previously heard timbres (Mercer & McKeown, 2010), all trials were completely randomized with respect to blend step. Half of the participants received Set Linear first and the other half Set Log first. The participants wore headphones (Sony MDR-7506) and heard the stimuli, which had been stored on a CD and were played from a computer. They wrote their responses on a sheet of paper that had been numbered from 1–50 for the first set and 1–50 for the second set. All participants were first presented with the pure oboe and pure trumpet sounds to remind them of what these instruments sounded like. During the main presentation, they were asked to rate each sound on a scale of 1–10, where 1 indicated one pure sound (oboe for half of the participants and trumpet for the other half) and 10 indicated the other pure sound. The presentation set order and the ends of the scales corresponding to each of the pure sounds were counterbalanced across four groups of participants. After all 100 sounds were thus classified, the session concluded with a short demographic survey on musical training and familiarity with different instruments.
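The trial structure just described (10 blend steps, five presentations each, fully randomized, with set order and scale direction counterbalanced across four groups) can be sketched as follows. This is a minimal illustration, not the procedure actually used to prepare the CD; the function names and the rule for assigning participants to groups are hypothetical.

```python
import random


def build_trial_list(n_steps=10, repetitions=5, rng=None):
    """Return a shuffled list of blend steps: n_steps x repetitions trials."""
    rng = rng or random.Random()
    trials = [step for step in range(1, n_steps + 1) for _ in range(repetitions)]
    rng.shuffle(trials)  # complete randomization with respect to blend step
    return trials


def counterbalance_group(participant_index):
    """Pick one of four groups: set order crossed with which sound anchors rating 1.

    ASSUMPTION: cycling through groups by participant index is illustrative only.
    """
    set_orders = [("Linear", "Log"), ("Log", "Linear")]
    anchor_for_rating_1 = ["oboe", "trumpet"]
    order = set_orders[participant_index % 2]
    anchor = anchor_for_rating_1[(participant_index // 2) % 2]
    return order, anchor


trials = build_trial_list(rng=random.Random(0))  # 50 trials for one set
```

With 16 participants, cycling through the four groups in this way would place four participants in each group.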
Results
The results are shown in Figure 1. The main question of interest was whether the participants could track the blend steps of gradually changing timbre ratios. Interestingly, there were no significant differences between the patterns of responses to the stimuli as a function of blend step construction. The slopes of the two curves corresponding to stimuli from Sets Linear and Log (0.88 and 0.83, respectively) were not different from each other (Wilcoxon Signed-Rank test, Z = 0.08, N = 16, p > .10), showing that at least rank ordering the blends with respect to varying levels of instrument contributions was similar regardless of how the blends were created, and thus the analyses were based on combined responses.

Figure 1. Subjective blend judgments as a function of increasing blend.
The most important result to note was that gradually increasing blends were indeed trackable by all participants. The correlation between the actual blend step changes and the participants’ ratings was 0.77 (p < .01). Although the musicians’ correlation was higher (0.81) compared to non-musicians’ (0.72), and so was the more familiar group’s correlation (0.80) compared to the less familiar group’s (0.73), Wilcoxon Rank-Sum tests showed that these differences were not statistically significant [Us = 40 (n1 = 9, n2 = 7) and 44 (ns = 8), respectively, both ps > .10].
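For readers who wish to see how a tracking measure of this kind is computed, the sketch below calculates a correlation and a least-squares slope between blend step and rating. Two assumptions apply: a Pearson coefficient is used because the type of correlation is not stated in the text, and the ratings shown are invented purely for illustration and are not data from the experiment.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def least_squares_slope(xs, ys):
    """Slope of the best-fitting line predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


blend_steps = list(range(1, 11))
# HYPOTHETICAL ratings on the 1-10 scale, one per blend step (not real data)
example_ratings = [1, 2, 2, 4, 5, 5, 7, 8, 9, 10]

r = pearson_r(blend_steps, example_ratings)
slope = least_squares_slope(blend_steps, example_ratings)
```

A slope near 1 and a correlation near 1 would indicate ratings that follow the blend steps almost exactly.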
Experiment 2
Method
A total of 16 adults in the Washington DC area participated in the experiment either for extra credit in psychology courses at American University or on an entirely voluntary basis. None had participated in Experiment 1 and none had any hearing difficulties. Based on the same criteria as in Experiment 1, we again classified them as musicians (N = 8, mean = 9.1 years, range = 5–19 years) or non-musicians (N = 8, mean = 7.5 months, range = 0–3 years) as well as those more familiar (N = 8; combined scores 12–20) or less familiar (N = 8; combined scores of 3–11) with the two instruments. Half (N = 4) of the musicians and half (N = 4) of the non-musicians were classified as “more familiar” and the remaining four in each musicianship subgroup were classified as “less familiar”.
The materials and design were the same as before. The procedure, however, was different in that participants were asked to classify each sound as either an oboe or a trumpet. Thus, they were forced to make a binary decision, instead of tracking gradual changes. Of interest was whether participants would be able to label a given blend weighted toward one or the other extreme as the same instrument in a consistent manner throughout the multiple presentations of that blend, especially in the case of the more ambiguous blends.
Results
The results are shown in Figure 2. The question of interest was the retention of the identities of the two parent instruments through the different blends, and more specifically whether there would be blends where the two instruments would be truly blended, with neither parent instrument able to influence the judgment consistently, and thus a “new” timbre could be said to have emerged. A participant was deemed to be perceiving predominantly the influence of one instrument if 80% of his or her ratings corresponding to a given blend step (and each blend step containing a higher percentage of that instrument in the blend thereafter) indicated that instrument. We measured the number of steps it took the participants to transition from one instrument to the other in this manner, or the number of steps of “indecision” where neither parent instrument exhibited dominance. The mean number of steps in which the blends were perceived as too ambiguous to be consistently classified as one or the other instrument (i.e., the mean number of steps of “indecision”) was 3.25 for Set Linear and 3.00 for Set Log, and not different from each other (Wilcoxon Signed-Rank test, N = 16, Z = 1.12, p > .10). Thus, the analyses were conducted on the basis of participants’ combined responses with respect to the classification of each blend step into one or the other instrument (a total of 10 times).

Figure 2. Percentage of times the blend was classified as Instrument 1 as a function of increasing blend with Instrument 2.
The mean number of steps where the blends were ambiguous was 3.09. Also, even though as expected this number was lower for musicians (2.62) than for non-musicians (3.56), as well as for the more familiar group (2.64) than for the less familiar group (3.50), neither difference reached statistical significance (Wilcoxon Rank-Sum tests: Us = 43 and 42, respectively, both ns = 8, both ps > .10). We should note that, unfortunately, because we used a paper-and-pencil response procedure, we did not have reaction time data to see whether people were also slower for these more ambiguous blends.
In Experiment 1, it seemed that listeners were able to track the different degrees of blending with relative ease and make fairly accurate perceptual judgments on the emergent spectral shape and the temporal envelope of these new sounds. Moreover, in Experiment 2, it seemed that listeners were also able to use their knowledge of the parent instruments’ timbres to moderate their judgments and assign the blends to relatively accurate categories depending on the degree of interpolation, except in the few steps of indecision. An interesting question that arose from these results was whether the judgments of the timbres in these blends of indecision could also show evidence for categorical perception. The present results, obtained from the “classification” or “identification” procedure of Experiment 2, suggested not. However, without a standard “discrimination” procedure as converging evidence, we could not rule out categorical perception unequivocally. Thus, to make a convincing case for continuous perception, we tested the same idea using the most commonly used “discrimination” procedure (ABX). Categorical perception refers to the tendency to ignore or be unable to perceive small differences in attribute changes and instead classify sounds into discrete broader categories. The original categorical perception studies were conducted with speech sounds, where the gradual changes between certain consonants such as b and p were shown to be indiscriminable and assigned to the two distinct categories instead (e.g., Eimas, 1963; Liberman, Harris, Hoffman, & Griffith, 1957). Categorical perception has been shown in music, too, with pitch intervals transitioning between major and minor (e.g., Burns & Ward, 1978; Locke & Kellar, 1973; Siegel & Siegel, 1977), as well as with durations of plucked and bowed sounds (e.g., Cutting, 1982; Cutting & Rosner, 1974). 
It appears that when distinct categories exist (e.g., major/minor for musicians but not for non-musicians who do not show a similar categorical perception with incremental mode differences), there is a tendency to classify continually changing sounds into those categories. However, not all similarly changing sounds are perceived categorically, even when distinct categories exist. For instance, vowels and lexical tones of tonal languages seem to be immune to categorical perception (e.g., Francis, Ciocca, & Ng, 2003; Fry, Abramson, Eimas, & Liberman, 1962). The results of Experiment 2 suggested that timbre transitions might be of this sort, and the sustained portion of the instrumental sounds (perhaps akin to that in vowels in speech), provided enough information for perception to be continuous rather than categorical.
Thus, in Experiment 3, we used an ABX discrimination task, the prototypical method for assessing categorical perception (e.g., Cutting, Rosner, & Foard, 1976; Gerrits & Schouten, 2004; Liberman et al., 1957). In this method, using a matching-to-sample procedure, the idea is that if there is categorical perception, two sounds belonging to the same category (in this case the same instrument) should be labeled the same way, which in turn should make discrimination very hard. Thus, if there is any point at which the degree of blending pushes one of the two slightly different sounds to the next category, then the test blend X (which is the same as one of the two target sounds – A or B) should be more easily matched because the test blend shares the same categorical label as only one of those target blends. That is, the task would be easy only at the category transition point; comparisons beyond that point in either direction would be based on sounds within the same category and sharing the same label and thus be less discriminable. Otherwise, if there is no categorical perception, then there should be no single level at which the same blending degree difference between the two sounds would lead to superior matching accuracy compared to other levels. Of interest was whether this would be the case for the timbre blends, for which the parent instruments provided the category labels, in particular for the few blend levels where the identification of a parent instrument appeared to be difficult in Experiment 2.
Experiment 3
Method
Participants
A total of 16 adults in the Washington DC area participated in the experiment either for extra credit in psychology courses at American University or on an entirely voluntary basis. None had participated in Experiment 1 or 2, and none had any hearing difficulties. Because the first two experiments showed similar patterns of results for musicians and non-musicians, and irrespective of familiarity with the parent instruments, no such differentiations were made in this experiment.
Materials, design, and procedure
The materials were the same as before; however, because in the previous two experiments Set Log and Set Linear blends had produced similar results, only the Set Log blends were used. All blends were paired with each other to form the AB pairs, creating a list of 45 pairs. Thus, there were 9 pairs with only one degree or step of blend difference between their component items, one at each level of blending, 8 pairs with two degrees or steps of blend difference, 7 pairs with three degrees of blend difference, and so on, and finally there was 1 pair whose component items comprised the two parent instruments (pure oboe and pure trumpet). The critical trials were those with one- and two-step differences. These were embedded among all of the other comparison pairs comprising larger differences between the pair items so as to not make the task too daunting, as well as to affirm that the discriminability of targets would have a large range and not be either too obvious or too difficult in all cases. All of the pairs were presented twice (thus there were 90 trials for each participant) and each pair presentation was followed by the presentation of a test item (X). Half the time X matched A, and half the time it matched B. Items within each pair were presented as A or B items equally often across two groups of participants. The presentation order of all blend pairs was randomized for each participant, as was whether X matched A or B in any given trial, within the constraint that, across comparisons at each level, it matched A half the time and B half the time. The ISI was 1.7 s between the A, B, and X presentations. Between each trial, there was a pause for as long as it took the participant to respond, and then an additional 1.7 s was given before the start of the next trial.
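The pair counts above follow from pairing each of the 10 blends with every other blend, which the sketch below reproduces. The trial builder is a simplification and an assumption: it lets X match A on one presentation of each pair and B on the other, which satisfies the half-and-half constraint but is stricter than the randomization described, and it omits the counterbalancing of A and B positions across the two groups. All names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

blend_steps = range(1, 11)  # 10 blends, pure trumpet to pure oboe

# Every unordered pairing of two different blends
ab_pairs = list(combinations(blend_steps, 2))

# Number of pairs at each step difference (1 step apart, 2 steps apart, ...)
pairs_by_step_difference = Counter(b - a for a, b in ab_pairs)


def build_abx_trials(pairs, rng):
    """Present each pair twice, with X equal to A once and to B once.

    ASSUMPTION: per-pair balancing of X is a simplification for illustration.
    """
    trials = []
    for a, b in pairs:
        trials.append((a, b, a))  # X matches A
        trials.append((a, b, b))  # X matches B
    rng.shuffle(trials)  # randomized presentation order
    return trials


abx_trials = build_abx_trials(ab_pairs, random.Random(0))
```

The critical one- and two-step comparisons correspond to the entries for differences of 1 and 2 in `pairs_by_step_difference`.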
The procedure was also similar to that of the previous experiments except that, instead of writing their responses on paper, participants pressed A or B on a Macintosh laptop computer, to indicate their response regarding which item of the just-heard pair was the same as the test item in question.
Results
The results are summarized in Figure 3, in which we plotted the mean accuracy for the one- and two-step comparisons – that is, whether the X was identified correctly as matching A or B across the two trials for any given participant and across all participants – as a function of the blend step position. As can be seen in the figure, there was no single telling peak for either the one-step or the two-step data. That is, there was no blend-degree level where discrimination was superior to those comparisons before and after, for either the one- or the two-step function (Kruskal-Wallis tests, Hs = 4.78 and 6.92, dfs = 8 and 7, respectively, both ps > .10), and hence no recognizable category boundary. Given that the performance in the two-step comparisons exceeded chance level (Wilcoxon Signed-Rank test, Z = 2.07, N = 16, p < .05), it was not the case that the lack of a category boundary reflected a failure of the blend differences between the targets to cross a discrimination threshold. Interestingly, there also seemed to be a gradual improvement in discrimination in the two-step comparisons at levels that were closer to the “oboe” parent instrument (Mann-Whitney U test, N = 16, p < .04).

Figure 3. Mean accuracy in one- and two-step differences as a function of blend step.
Discussion
Exploration of timbres produced by the simultaneous playing of different instruments has always been a cornerstone of orchestration. Although digital audio tools have enabled new techniques and levels of realism in the art of timbre blending, giving it an even more salient role in music production (Fitz et al., 2002), not much is known about how these blends are actually interpreted by the human auditory perceptual system. The aim of the current study was to explore people’s perception and identification of such blended timbres. More specifically, we asked whether listeners could experience a newly formed Gestalt while appreciating and being able to track nuanced differences in the quality of the sound in reference to the anchor-point timbres. Further, after finding in Experiment 2 that the parent instruments involved retained their identity and the blends could be classified as predominantly one or the other instrument depending on the ratios of the two timbres, we also asked whether there would be categorical perception in the cases where the ratios of the two timbres could not be easily identified.
Participants were able to track slight changes in the new blended sounds surprisingly well while using information from their knowledge of the parent instruments as anchor points. That is, their perceptual processing of these new sounds closely paralleled the physical changes created by interpolating between the oboe and the trumpet spectra, sounds they had never encountered before. The ability to discern the relative contributions from the two parent instruments was quite impressive, especially given that the specific interpolated steps were presented entirely randomly, out of the context of a predictable progression (cf. Grey, 1975). These findings support the notion of timbre scales and intervals as perceivable musical constructs (Erickson, 1975; Lerdahl, 1987; McAdams & Cunible, 1992). For instance, McAdams and Cunible (1992) found that, just as musicians can discern a major third as being a smaller gap along a pitch scale than a perfect fourth (regardless of direction), their participants could accurately assess specific distances between timbres. Consistent with these findings, in our study, participants showed the ability to identify randomly ordered interpolated timbres within a sequential set, implying an ability to perceive the underlying structure of a timbre scale. Such sequences can thus be effective as compositional constructs analogous to pitch-based melodies. A future direction might be to explore how well degrees of timbre interpolation can be tracked in the context of other instrumental sound pairings, as well as to determine whether there exist perceptually equal degrees of timbre change. 
Further, although there is no obvious analog in timbre scales that corresponds to the octave in pitch-based scales, one might speculate that larger timbre scales could be constructed by joining independent continua that share timbral endpoints, which would enable sensations of tension and resolution and create the possibility of consonant and dissonant timbre intervals.
Although there is of course the possibility that our blend transitions were not subtle enough to necessitate the activation of the instrument name to aid in their labeling, it was interesting to note that similar results were obtained with both blend sets, those in which the interpolation coefficient increased in a linear fashion as well as those in which it increased in a logarithmic fashion. All participants went through both blend sets and were able to recalibrate their judgments to be able to rank the stimuli appropriately within the different sets. Further, even when forced to make a binary distinction, the choice of the instrument appeared to be aided by an awareness of the differing degrees of blending, suggesting that timbre may not be coded in a manner that necessarily evokes the semantic category of the instrument used to produce it.
Interestingly, in both Experiments 1 and 2, even though the differences did not reach significance (likely because of lack of power due to relatively small sample sizes), musicians and those familiar with the two parent instruments, who were expected to have more awareness of the sounds produced by different instruments, had even fewer blend steps where they showed inconsistencies in their judgments. Nevertheless, both musicians and non-musicians showed similar patterns of responding, suggesting that this ability to track timbre changes was not dependent on training. Such a result is consistent with the notion that in musical tasks where processing does not depend on formal musical training, any differences between musically trained and naïve participants will tend to be of a quantitative rather than a qualitative nature at best. For instance, musically more meaningful materials are remembered better by musically trained participants whereas random, meaningless materials are remembered better by musically naïve participants; however, increasing the length of the materials has the same detrimental effect for both groups of participants (Korenman & Peynircioğlu, 2007). Similarly, whereas performance on tasks like categorization of major–minor differences or contour judgments as a function of tonal and atonal music benefits from training or expertise through the development of special skills and access to unique knowledge or strategies, performance on tasks involving emotion judgments or memory for tempo does not (e.g., Bigand & Poulin-Charronnat, 2006; Burns & Ward, 1978; Levitin & Cook, 1996; Morrongiello & Roes, 1990). The former group of tasks depends on the analysis of underlying musical structures whereas the latter group depends simply on the processing of surface structures (e.g., Lerdahl & Jackendoff, 1983).
Thus, it was not surprising that in the present study, timbre judgments, which rely on processing of surface structures and do not depend on training, showed similar patterns of responding regardless of training or familiarity.
One secondary question that arose from this fairly accurate tracking of blend steps was whether participants would show any categorical perception at any point. The prerequisite for categorical perception seems to be the existence of salient end-point categories (cf. Harnad, 2003). One could argue that different instruments can indeed comprise salient categories for most people who have experienced their sounds in different circumstances. However, it may also be that the “categories” in this case refer to the way the sounds are produced rather than the identity of the sounds themselves. Whereas in language a “b” and a “p” have different semantic properties, which differentiates them regardless of who produces the sounds, for most people, the experience of sound itself is a continuous perceptual quality and does not embody a semantic property.
The results of the first two experiments suggested that timbre perception was indeed continuous rather than categorical, even for musicians. In Experiment 3, the existence of a categorical boundary was ruled out entirely with the traditional ABX discrimination task. There was no level of blending that could be considered a boundary, before which the sound would be categorized as an oboe and after which the sound would be categorized as a trumpet. Nevertheless, participants again appeared to be sensitive to the actual timbres of the two instruments involved in the blending. Regarding blends near the extremes, participants were quite sensitive to the entrance of trumpet characteristics coming from the oboe side, but showed less sensitivity to the presence of oboe on the trumpet end of the continuum. This could have been due to the unique attack quality of the trumpet tone, which begins with significant levels of high-frequency transients before mellowing to a more even harmonic tone. Thus, a small amount of this quality grafted onto nearly pure oboe could have cued the presence of “trumpetness”, but the reverse was apparently not true.
Two caveats in our study were that our sample size was relatively small and that the range of timbres used was limited to two instruments. Although small samples in similar perceptual studies are not uncommon (e.g., Pastore, Li, & Layer, 1990, had 4 participants; Rosen & Howell, 1981, had 3 participants; and Siegel & Siegel, 1977, had 6 participants), a larger sample would certainly have strengthened our conclusions. Also, we used only trumpet and oboe, and experience of the blends of these two timbres may not generalize to that of other blends (cf. Fabiani & Friberg, 2011). Compositionally, timbre is employed to clarify simultaneous melodic streams, to create coloristic effects, and to generate impressions of departure, arrival, tension, and resolution. Thus, similar future studies with more varieties of instruments and interpolation techniques would help to build a body of research on instrumental timbre scales, creating a valuable resource for composers. Further, extended to include non-instrumental sounds, more research on timbre continua would be beneficial in a variety of areas, such as film sound design applications, where visual transformations sometimes require corresponding changes in audio information. Optimal processing parameters for specific cases of sound morphing could be established based on this line of research, helping sound designers create perceptually even transitions quickly based on given endpoints.
Another limitation was that we used only single tones at a single pitch. Although small changes in pitch appear not to have a noticeable effect on timbre perception (e.g., Steel & Williams, 2006), it is not clear that this invariance would extend to timbre blends as well, especially because instruments differ in their comfortable ranges (cf. Pitt, 1994). In addition, the amount of information conveyed would likely modulate the perception of the blended timbres. For instance, presenting a melody rather than a single tone may affect judgments, as characteristics of transitions between tones can be indicators of instrumental timbre. In a different vein, the importance of sound onset or the presence or absence of attack may also affect the perception of the blends from real instruments (cf. Tardieu & McAdams, 2012).
These limitations notwithstanding, in our study the blends were based on actual instruments rather than synthetic sounds and thus tapped into human perception of a particular set of musical experiences. The most important finding was that participants showed a high level of sensitivity to slight changes in oboe and trumpet timbres and they were able to track the different ratios in the blends quite accurately.
Acknowledgements
We thank Carol Shou of Montgomery Blair High School and Ben Mangold of American University for their help in data collection.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
