Abstract
What does the common descriptive lexicon for instrumental sound tell us about how we conceptualize musical timbre? Perceptual studies have revealed a number of verbal attributes that reliably map onto timbral qualities, but the conventions of timbre description in spoken and written discourse remain poorly understood. Books on orchestration provide a valuable source of natural language about instrumental timbre. This article uses methods from corpus linguistics to explore the semantic features of timbre through a quantitative analysis of 11 orchestration treatises and manuals. Findings reveal a relatively constrained vocabulary for timbre: about 50 adjectives account for half of all descriptions in the corpus. The timbre lexicon can be categorized according to affect, matter, metaphor, mimesis, action, acoustics, and onomatopoeia, and further reduced to three latent conceptual dimensions, which are labeled and discussed. Descriptive patterns vary systematically by instrument and instrument family, suggesting certain regularities and consistencies to timbre description in the orchestral tradition. This study helps test the long-held assumption that conventions of timbre description are vague and unsystematic, and offers a cognitive linguistic account of the timbre-language connection.
Verbal practices of timbre description form an enduring linguistic supplement to the orchestral tradition. The French horn is often considered noble, the cello rich, the oboe nasal—musicians and writers regularly call upon adjectives like these for the purposes of illustration, comparison, and instruction. Although commonplace, the lexicon for timbre is more unstable and inconsistent than many other music vocabularies. As Walter Piston (1955, p. 67) put it, “adjectives used to describe the tone of [an instrument] cannot do more than direct the student’s attention to certain admittedly general and vague attributes.” Regardless of this perceived vagueness, semantic practices constitute a vital component in the transmission of discursive knowledge about timbre.
Orchestration manuals and treatises offer a revealing window into the common descriptive practices surrounding instrumental timbre in the western symphonic tradition. The current study uses techniques from corpus linguistics to explore these semantic norms. A corpus of 11 books on orchestration was manually mined for terminology relating to timbre. This dataset of adjectives and expressions was then subjected to statistical analysis in order to determine consistencies in semantic conceptualization. What terms are most commonly used to describe orchestral instruments? Do writers tend to use these words consistently? Are descriptive conventions different from one instrument to the next? What does this lexicon reveal about timbre cognition? Results from this text analytic approach to timbre description complement and extend the experimental literature by investigating how authors talk about timbre in natural language contexts.
Orchestration
A deep knowledge of instrumental timbre, singly and in combination, has always been essential to orchestration. In the most influential treatise on the topic, Hector Berlioz (1844, English translation 1882, p. 4) noted that orchestration should include “the study (hitherto much neglected) of the quality of tone (timbre), particular character, and powers of expression, pertaining to each of [the instruments].” Similarly, Pierre Boulez commented, “To understand the extent to which timbre, composition, and affectivity are linked in the mind of the composer, one only needs to look at the musical education he has received, and which he himself transmits [through treatises and instruction]” (Boulez, 1987, p. 162). Since the manipulation of timbre is such a critical component of effective orchestration, the reflections of authors of orchestration texts offer a unique window through which to view timbre description in action.
The role of verbal description in orchestration is open to debate. Authors of treatises regularly lament the inexact nature of timbre description. For example, Kennan and Grantham (1952, p. 2) point out that descriptions should necessarily be considered mere aids to “aural memory and aural imagination,” writing: … tone colors cannot really be described adequately in words. It is all very well to read in an orchestration book that the clarinet is “dark” in its lower register, but until the sound in question has actually been heard and impressed on the “mind’s ear,” a student has no real conception of that particular color …
Despite the well-acknowledged imprecision in timbre description, authors nevertheless readily attempt to translate instrumental sounds into words. Nikolai Rimsky-Korsakov (1933, p. 18) was more sanguine than most about the role of timbre description in the study of orchestration, acknowledging the aesthetic or affective value of borrowing adjectives from other, non-auditory senses. To Rimsky-Korsakov, timbre description was less a matter of objective labeling than of symbolic, even synesthetic, transference: It is a difficult matter to define tone quality in words; we must encroach upon the domain of sight, feeling, and even taste. […] In using the terms thick, piercing, shrill, dry, etc. my object is to express artistic fitness into words, rather than material exactitude.
As evidenced by the accounts above, the study of orchestration has largely been the endeavor of composers (Erickson, 1975; Mathews, 2006) and historical musicologists (Dolan, 2012). The topic has also been approached from empirical perspectives. In recent work, Goodchild and McAdams (in press) analyzed common principles of orchestration from a perceptual orientation, explaining the bases for ubiquitous vernacular concepts such as “blend” through the lens of auditory grouping principles (Bregman, 1990). Researchers have also explored the perception of blend in orchestral instrument dyads (Kendall & Carterette, 1991; Sandell, 1995).
Timbre semantics
Much of the perceptual literature on musical timbre includes an explicit or implicit linguistic component: participants typically evaluate timbral stimuli using words, often in the form of rating scales, classifications, adjective checklists, free verbal responses, or interviews (for a review, see Wallmark and Kendall, in press). Moreover, even when no explicit language task is required, researchers often summarize statistical results by way of descriptive qualities. Indeed, the “linguocentric predicament” (Seeger, 1977, p. 47) runs deep in timbre research; however, despite the interrelations of timbre and semantics, relatively few studies have analyzed natural language sources for clues about how people translate timbral percepts into words.
In a canonical observational account of the timbre-language connection, Helmholtz (1867/1954, pp. 118–119) described musical timbres with uneven partials as hollow; strength in higher frequencies as nasal; strong fundamental frequencies as rich; weak fundamentals as poor; and energy in the highest partials as cutting and rough. In an early psychoacoustic study, Lichte (1941) found that timbral percepts were consistently described in terms of brightness, roughness, and fullness. Similarly, using a questionnaire and ratings task, Pratt and Doak (1976) determined the primary verbal attributes for a set of synthetic tones to be dull-bright, warm-cold, and pure-rich. Although differences in languages, methods, and stimuli make direct comparisons across the timbre semantics literature difficult, there is evidence that certain descriptive practices are shared across a number of language groups and contexts (Namba et al., 1991).
Studies using clustering and principal component analysis have generated further insights in their reduction of related descriptive adjectives to underlying semantic groupings. For example, von Bismarck (1974) found that synthetic timbre description could be grouped into four (German) verbal scales: dull-sharp, compact-diffuse, empty-full, and colorless-colorful. In an English study, Kendall and Carterette (1991) reported the primary semantic dimensions of orchestral timbres to be related to nasality, richness-brilliance, and complexity. And in a comparative study between English and Greek, Zacharakis, Pastiadis, and Reiss (2014, 2015) found that orchestral timbre terminology in both languages was grouped according to luminance, texture, and mass terms. Although certain broad patterns have been observed in perceptual studies, it is unclear to what extent these results resonate with common descriptive routines for orchestral instruments in discourse outside the lab.
Text corpus studies and timbre
Corpus analytic techniques have made significant inroads in many humanities and social science disciplines within the last decade, including music psychology. The corpus approach is defined by Temperley and VanHandel (2013, p. 1) as “research involving statistical analysis of large bodies of naturally occurring musical data.” Much of the musical data studied using corpus methods is, naturally, music; that is to say, large samples of notated music (Huron, 2006), recordings (Serrà, Corral, Boguñá, Haro, & Arcos, 2012), or song texts (Condit-Schultz, 2016; Werner, 2012). Data sources that pertain to aspects of musical behavior and reception, including written and spoken language about music, remain understudied using these methods.
There are a number of corpus-based precursors to this article. In a text-analytic study, Kendall and Carterette (1993) extracted timbre descriptors from Piston’s Orchestration (1955) and subjected them to a modified semantic differential ratings task using wind instrument dyads as stimuli. They determined that around 91% of the variance in description could be explained by four verbal attributes, which they labeled power, strident, plangent, and reed. Other recent studies have leveraged the Internet to infer common semantic structures for timbre. Analyzing a large dataset of descriptive terms (or, “social tags”) for music, Ferrer and Eerola (2011) used latent semantic analysis to assess trends in timbre description across a broad range of musical genres. They found 19 semantic clusters reducing to five latent factors (energetic, intimate, classical, mellow, and cheerful), which cut across most genres and could be explained by a relatively small set of acoustic features.
Study aim
There were three main goals of the present study. The first goal was to statistically summarize a representative sample of the written discourse on orchestral timbre. What descriptive vocabulary is commonly used to characterize the sound of different instruments in the orchestral context? The related second goal, using statistical inference, was to determine whether this vocabulary reflects a consistent and systematic set of semantic strategies. Are there differences between descriptive schemas for individual instruments, for example, or between instrument families? Many writers over the years have assumed that words for timbre are whimsical, subjective, and somewhat arbitrary—this study will put that assumption to the test by hypothesizing that, to the contrary, timbre description in orchestration treatises reflects shared and relatively consistent verbal conventions that resonate with results from the timbre perception literature. And, finally, as a window into the cognitive linguistic underpinnings of timbre description, the third goal was to explore the latent conceptual structures of the corpus through an inductive process of semantic categorization. What does the common vocabulary for instrumental sound, as revealed in the corpus, tell us about how the mind makes sense of timbre? Influenced by conceptual metaphor theory (Lakoff & Johnson, 1980, 1999), music theorists have proposed a number of compelling models relating music vocabulary and concepts—for example, pitch height (Cox, 2016; Zbikowski, 2002), musical motion (Johnson & Larson, 2003), and “musical forces” (Larson, 2012)—to cognitive semantic structures. This article proposes that such conceptual structuring may undergird timbre description as well.
Timbre—the multidimensional attribute of auditory perception that allows us to distinguish one source from another when all other acoustic variables are kept constant—is notoriously difficult to study (for reviews, see Hajda, Kendall, Carterette, & Harshberger, 1997; McAdams, 1993). The present article aims to shed additional light on the “problem of timbre” through a corpus-based investigation of its semantic dimensions.
Methods
Data
The corpus analyzed in this study consisted of 11 treatises and manuals on orchestration and instrumentation published between 1844 and 2009 (see Table 1). Although not exhaustive, the corpus represents arguably the most widespread and influential texts on the subject, including the first modern orchestration treatise (Berlioz, 1882; Berlioz & Strauss, 1948/1991) as well as books commonly used today in college-level composition and orchestration classes (e.g., Adler, 1989). 1 Three general inclusion criteria were taken into account when selecting books for analysis: (a) English-language or translated-into-English sources, (b) a history of pedagogical and practical use in the Anglophone world, and (c) sources specific to the symphony orchestra.
The corpus.
The corpus consisted of 11 books. Frequencies listed are for timbre-related terminology only. Note: Token f refers to the total frequency of timbre descriptors in each source (including repetitions); % is the percentage of the total corpus represented by each source; type f is the frequency of unique descriptors in each text (i.e., the number of individual descriptors without repetition); and CTTR is the corrected type-token ratio, a common index of lexical diversity (Carroll, 1964). See text for details.
Method of data collection
Because common terminology for timbre is often ambiguous and context-dependent, descriptive words were extracted manually by the author and two musically trained graduate research assistants. Repetitions, quotations from other authors, tessitura information (low, middle, or high register), and specific playing techniques (col legno strings, brass mutes, etc.) were all noted and included in the dataset, and words were tabulated by instrument, instrument family, and various blends. Intensifiers (very) and negations (not x) were omitted; explicitly acoustic descriptions were also excluded from analysis. 2
Most importantly, special care was taken to avoid the inclusion of adjectives that cross-map with other musical domains in passages that do not specifically refer to timbre, including terms that can apply equally to pitch (deep, low), dynamics (soft, big), rhythm and articulation (punctuated, crisp), and texture (thick, sparse). To be sure, timbre covaries with pitch and dynamics, so it could be argued that this separation is a false one (McAdams & Goodchild, 2017). Many authors specify these covariates when describing timbre; however, a good many passages in the corpus either omit this information or provide inadequate context to accurately evaluate it. For the sake of the present study, therefore, a fairly conservative extraction method was employed. There were still many ambiguous cases, which were discussed and resolved among the author and the two assistants. To illustrate some typical instances of timbre description and corresponding extraction methodology, consider the following randomly selected examples (with
“[Cello] timbre, on the upper strings [range-high], is one of the most
“[The tambourine is] capable of
“The quality of the notes [on stopped French horn[technique]] varies from a
“The very deep notes[range-low] of the double bassoon are remarkably
“All are in sixth position [on the violin], except that the initial B is better played on the more
Following the extraction process, raw wordlist data were analyzed using AntConc software (Anthony, 2014). Lemma (i.e., canonical or dictionary) forms of all descriptors were specified in order to avoid duplications, including the transformation of adverbial, verbal, and nominal derivations to adjectives (for example, the tokens brightness, brightly, brighter, and brighten were all subsumed under the lemma bright). A stop-list filtered out all irrelevant lexical content (particles, conjunctions, etc.).
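The lemmatization and stop-list steps can be illustrated with a minimal sketch. It does not reproduce AntConc's behavior; the lemma table (beyond the bright example given in the text), the stop-list entries, and the function name `lemmatize` are hypothetical and serve only to show the kind of mapping described.

```python
from collections import Counter

# Hypothetical lemma table: maps surface forms to an adjectival lemma.
# Only the "bright" entries come from the text; a full table would be
# compiled by hand for the whole wordlist.
LEMMAS = {
    "brightness": "bright",
    "brightly": "bright",
    "brighter": "bright",
    "brighten": "bright",
}

# Hypothetical stop-list of function words to discard.
STOP_WORDS = {"the", "a", "and", "of"}


def lemmatize(tokens):
    """Lowercase tokens, drop stop-list words, and map the rest to lemmas."""
    lemmas = []
    for token in tokens:
        word = token.lower()
        if word in STOP_WORDS:
            continue
        # Words without a table entry are kept unchanged.
        lemmas.append(LEMMAS.get(word, word))
    return lemmas


raw = ["Brightness", "brightly", "the", "brighter", "brighten", "bright", "dark"]
counts = Counter(lemmatize(raw))
print(counts.most_common())
```

Collapsing derived forms before counting keeps a single descriptor from being split across several wordlist entries, which would otherwise inflate the type count.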
Results
The extraction procedure resulted in 3666 total timbre descriptions (tokens), of which 879 (types) were unique. In addition to standard statistical procedures, the rest of this paper will employ a few basic measures common in corpus linguistic analysis (Gries, 2010). Tokens refer to the total frequency of observed timbre descriptors in a given dataset; % indicates the percentage of the total corpus represented by a given variable; and types are the unique observed descriptors within a given dataset.
As an index of lexical diversity (i.e., the relative level of redundancy or novelty of word choice), linguists often use a type-token ratio (TTR), the ratio of unique words to the total word count. For comparisons of uneven sample sizes, Carroll (1964) proposed a corrected TTR (CTTR), defined as the number of types divided by the square root of twice the number of tokens. 3 Lower relative scores thus indicate consistency/redundancy in descriptive vocabulary, while higher scores indicate breadth/diversity of word choice. Taken as a whole, the approximate lexical diversity for the corpus is high, CTTR = 10.3. Furthermore, lexical diversity is negatively correlated with year of publication, r(9) = –.70, p < .02, indicating that richness of vocabulary for timbre in orchestration texts has diminished over time. This historical trend is likely attributable in part to the aesthetic values of 19th-century musical romanticism, which tended to valorize poetic subjectivity and extra-musical, often literary themes (Berlioz was a foundational architect of program music). It should perhaps come as no surprise, then, that earlier authors exhibited a more expansive descriptive palette in characterizing timbre.
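As a check on the corpus-level figure, the two indices can be computed directly from the type and token counts reported above (879 types, 3666 tokens). This is a minimal sketch; the function names `ttr` and `cttr` are illustrative.

```python
import math


def ttr(types, tokens):
    """Type-token ratio: unique descriptors divided by total descriptors."""
    return types / tokens


def cttr(types, tokens):
    """Corrected TTR: types divided by the square root of twice the tokens."""
    return types / math.sqrt(2 * tokens)


# Counts reported for the full corpus.
corpus_types = 879
corpus_tokens = 3666

corpus_ttr = ttr(corpus_types, corpus_tokens)
corpus_cttr = cttr(corpus_types, corpus_tokens)
print(round(corpus_ttr, 2), round(corpus_cttr, 1))
```

Rounded to one decimal place, the corrected ratio agrees with the reported CTTR of 10.3.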
The 50 most frequent timbre descriptors are displayed in Table 2. Corroborating the high corpus-level CTTR value, results indicated a right-skewed, “long tail” distribution of word types: the most frequent descriptor, brilliant, occurs 112 times, although it accounts for only 3.1% of the corpus. The mean frequency of each descriptor is 4.51 (SD = 9.89), with 12% of the corpus consisting of words that occur only once (referred to as hapax legomena). In contrast, the top 50 words alone account for 49% of the corpus. This kind of skewed frequency distribution is roughly consistent with Zipf’s law, which states that the usage frequency of any word is inversely proportional to its rank in the frequency table and which Zipf attributed to a principle of least effort in natural language use (Baayen, 1992; Zipf, 1935). Thus, the lexicon of timbre description is diverse from an absolute standpoint because of a high proportion of words that appear only a few times, even though a small core of words dominates actual usage.
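The summary measures in this paragraph (share of the corpus covered by the top-ranked words, share of hapax legomena, and mean frequency per descriptor) can be sketched as follows. The word counts are invented toy data, not corpus values, and the function name `summarize` is illustrative. Because the text does not specify whether the hapax percentage is relative to tokens or to types, the sketch reports both.

```python
from collections import Counter


def summarize(counts, k):
    """Summarize a descriptor frequency list (a Counter of lemma -> frequency)."""
    tokens = sum(counts.values())
    types = len(counts)
    top_k = sum(freq for _, freq in counts.most_common(k))
    hapax = sum(1 for freq in counts.values() if freq == 1)
    return {
        "tokens": tokens,
        "types": types,
        "top_k_share": top_k / tokens,        # coverage of the k most frequent words
        "hapax_token_share": hapax / tokens,  # hapax legomena relative to tokens
        "hapax_type_share": hapax / types,    # hapax legomena relative to types
        "mean_frequency": tokens / types,     # average occurrences per descriptor
    }


# Toy frequency list (hypothetical counts for illustration only).
toy = Counter({
    "brilliant": 6, "rich": 4, "dark": 3, "nasal": 2,
    "velvety": 1, "ghostly": 1, "reedy": 1, "pinched": 1, "hollow": 1,
})
summary = summarize(toy, k=2)
print(summary)
```

Even in this toy list, two words cover half of all tokens while more than half of the types occur only once, which is the same long-tail shape described for the corpus.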
Top 50 timbre descriptors.
After generation of the master list, data were subdivided by instrument family and by individual instrument. The following will focus only on the 17 primary non-percussive instruments of the modern orchestra, though terms for percussion and more obscure instruments (e.g., ophicleide, basset horn) were included in the complete dataset.
As shown in Table 3, vocabulary grouped by instrument family reveals a good deal of redundancy, with many of the top 10 or so words from the master list (e.g., brilliant, rich) reshuffled for each of the families. High CTTR measures betoken the breadth of descriptive options available to writers at the family level, which is intuitive given the timbral diversity of each group. The possible exception here would seem to be the more homogeneous strings; one might expect violin and viola, for example, to be described in similar terms, though this appears not to be the case, with roughly the same level of diversity in the strings as the other families (CTTR = 8.2). Despite similar CTTR measures, in absolute terms woodwinds were the best represented in the corpus (f = 1140).
Descriptors by instrument family.
Note: The woodwind group consisted of data for nine individual instruments plus the full section and blends between woodwinds; brass consisted of four instruments plus section and blends; strings for four instruments plus section and blends; and percussion for 37 instruments plus piano. Descriptors listed in ascending rank order (1–10).
Table 4 displays frequency, diversity, and top 10 descriptors for each instrument. The clarinet was the most frequently described woodwind (f = 245) and also had the highest lexical diversity (CTTR = 6.6); among the brass family, this distinction went to the horn (f = 201), though the trombone showed greatest diversity (CTTR = 6.6). It is worth noting that many instruments are described using opposing, even contradictory terms: for example, the clarinet is frequently described as both rich/dark and brilliant/piercing depending on register. The lack of tessitura specificity in the lists below is an important factor driving some of this bipolarity; indeed, the results here collapse these perceptual distinctions for the sake of indexing simply the most common semantic correlates of the “constrained universe of timbres” available to each instrument (McAdams & Goodchild, 2017, p. 129).
Descriptors by selected instruments.
Note: Descriptors consist of all ranges, dynamics, and auxiliary playing techniques for each instrument. Descriptors listed in ascending rank order (1–10).
Categorization
Given the high variability in vocabulary, it is difficult to assess semantic norms from wordlist data alone. To investigate the conceptual groupings that characterize the dataset, therefore, categories were generated by the author through an inductive sorting process based on the complete set of timbre descriptors (including hapax legomena), then corroborated through an inter-rater agreement procedure (for more on this method see Slingerland & Chudek, 2011). In order to preserve the independence of each categorical variable for later statistical testing, each word was sorted according to its single category of best fit, although secondary and tertiary categories were noted. Close consideration of category membership also led to a reduction of the original dataset, as redundancies were eliminated (token f = 3571, type f = 792).
Inductive categorization resulted in seven basic conceptual groupings of timbre descriptors, described below. Inter-rater reliability analysis was then performed to determine the validity of this categorization schema. Three musically trained raters were given brief descriptions of the a priori categories; they then independently sorted the top 50 words into the categories. Cohen’s kappa (κ) was computed for each of the six coder pairs then averaged to provide a single index of agreement (Light, 1971). Kappa scores ranged between .60 and .79, with a mean score of κ = .68, p < .001. Although it must be acknowledged that the inference of conceptual categories from text corpus data is notoriously difficult and disagreement is inevitable, according to common interpretive practices this represents substantial agreement. Following reliability analysis, therefore, the seven-category typology was considered “final” and descriptive statistics were computed.
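The agreement index described here, pairwise Cohen's kappa averaged across coder pairs, can be sketched in a few lines. The category labels assigned below are invented toy data. The example uses four coders because four coders yield six pairs; how the six pairs arose in the study is an assumption on this sketch's part.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two coders' category labels over the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of the coders' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all coder pairs."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy data: four hypothetical coders sorting six words into categories.
coders = [
    ["affect", "matter", "cmc", "mimesis", "action", "affect"],
    ["affect", "matter", "cmc", "mimesis", "action", "matter"],
    ["affect", "matter", "cmc", "affect", "action", "affect"],
    ["affect", "matter", "matter", "mimesis", "action", "affect"],
]
print(len(list(combinations(coders, 2))), round(mean_pairwise_kappa(coders), 2))
```

Averaging pairwise values gives a single index for more than two coders while still correcting each pair for chance agreement.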
I have labeled the seven basic categories of timbre descriptors as follows (from most frequent to least): (a) Affect, (b) Matter, (c) Cross-modal correspondence (CMC), (d) Mimesis, (e) Action, (f) Acoustics, and (g) Onomatopoeia. To briefly define these categories, with examples (in
Affect: Words that refer to emotional and aesthetic properties of timbre, generally consisting of affectively valenced, evaluative, contrastive adjectives. Examples: mellow, noble, grotesque, gloomy, powerful, attractive, fine
“… nothing can equal the
Matter: Words that describe features of physical matter; i.e. objects or substances with weight, mass, shape, etc. Examples: thin, round, hollow, liquid, sharp, blunt, heavy
“In the lower register [the horn] is dark and brilliant;
CMC: Words borrowed cross-modally from other senses, encompassing an embodied conceptual transfer process by which an auditory target domain (timbre) is understood in reference to a non-auditory source domain (vision, touch, taste, smell) (Lakoff & Johnson, 1999). This category can be further subdivided by sense modality. Examples: bright, warm, sweet, smooth, dark, coarse, sparkling
“[These cello notes are]
Mimesis: Words that describe timbre metaphorically by way of comparison to other things, including direct sonic resemblance as well as more impressionist, poetic correspondences; i.e. it “sounds like” x, where x can be another instrument, the voice, nature, etc. 4
Examples: Flutey (flute-like), nasal, bell-like, throaty, growling, stormy, fairy-like
“[The violin is] the true
This category might be divided into four general subcategories: vocal terms (hoarse, breathy); instrument terms (gong-like, oboe-like); natural/weather sounds (thundering, windy); and poetic simile (sound “such as might come from a ghost if it had a pip in its throat”).
Action: Words describing physical actions or qualities of movement; i.e. sounds that act in certain ways or were produced through specific actions of the performer. This group often takes the form of verbal adjectives, but also frequently describes elements of performance technique (e.g., a pinched sound as the result of a pinched embouchure). Examples: Piercing, biting, lamenting, strained, mocking, muffled
“The characteristic sound of a trumpet with a fully assembled Harmon mute is
Acoustics: Words referring to specific auditory properties and/or the spatial contexts of sound production. Examples: Ringing, shrill, raspy, blaring, resonant, echoing, clangorous
“[Orchestra bells] will produce a beautifully
Onomatopoeia: Words that phonetically resemble the sounds they describe, either in lexical forms (buzz) or through vocables (/doo-ah/). Examples: Click, hiss, rattle, ping, screech, cluck, honk
“The quality of tone in pizzicato [can sound like] a
Table 5 lists the top-ranked words for each conceptual category. Confirming intuition, affective descriptions are the most frequent (35.5% of corpus): composers often use timbre to convey specific affective intentions, and timbre has long been considered a “sign-post for the emotions” in symphonic music (Boulez, 1987, p. 163). This category also contains the highest diversity of the seven (CTTR = 7.2), perhaps reflecting its more subjective character. In contrast, CMC and acoustic descriptions appear to have a much more constrained lexicon (CTTR < 2), likely driven by the preponderance of just a few words within each category. Onomatopoeia represents the least frequent conceptual type (2% of corpus), with most of its observations driven by a small handful of instruments, especially percussion. There is a likely cultural component to the paucity of onomatopoeia: writers may have been reluctant to describe the timbral repertory of the august symphony orchestra using such “wild” terminology, even in “tamer” lexical forms (Rhodes, 1994). In other musical communities—for instance, the vocabulary prevalent among professional sound engineers in popular music (Porcello, 2004)—onomatopoeia figures much more prominently.
Seven categories of timbre description with top 10 words in each.
As a first exploratory step, a hierarchical cluster analysis of category membership for the instruments was performed. The analysis employed a farthest neighbor clustering method using chi-squared measures for categorical data (Figure 1). Intriguingly, for the most part this clustering procedure revealed natural groupings organized by family and consistent with intuition: clarinet and bass clarinet clustered together, as did flute and piccolo, oboe and bassoon, and so forth. This result is consistent with the factor analysis of Wedin and Goude (1972), who found that instrument description is classified according to family membership. However, some unexpected pairs cluster together in the lower branches, including saxophone with tuba, and horn with double bass. It would seem that descriptive lexicons common to certain instruments are more independent of family membership than others: saxophone and tuba are both frequently described as heavy, a term that does not appear in the top-ten rankings of any other woodwinds (with one exception) or brass. Moreover, the bottommost stems, rather than being organized by family, appear to be grouped according to the prevalence of terms such as dark, rich, deep, and other conceptually related matter and CMC vocabulary.

Cluster analysis of categorizations by instrument.
Next, in order to determine whether the frequency of descriptors for instrument families and individual instruments was associated with the conceptual categories beyond the level of chance, a log-linear analysis was performed. Log-linear models are a special case of the generalized linear model (GLM) designed to predict the frequency of categorical outcomes as a function of a linear combination of independent variables, similar to an ANOVA but for “count” instead of interval data (Anderton & Cheney, 2004). The test revealed significant main effects of instrument family and category membership, as well as two-way interactions, χ²(42) = 4953, p < .001. To further investigate these effects, separate chi-squared tests were performed to determine the level of association between the four instrument families and each of the seven categories. There was a significant association between instrument family and affect, χ²(3) = 21.08, p < .001, Cramér’s V = .09; matter, χ²(3) = 12.19, p < .01, V = .07; mimesis, χ²(3) = 9.43, p < .02, V = .06; acoustics, χ²(3) = 19.78, p < .001, V = .09; and onomatopoeia, χ²(3) = 108.55, p < .001, V = .20. CMC and action were not significantly associated with instrument family. Figure 2 displays the percentage difference from mean category frequency used to describe each instrument family.
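To make the follow-up tests concrete, the sketch below computes a Pearson chi-squared statistic and Cramér's V for a contingency table. It assumes that each test crossed the four instrument families with a binary split (descriptor in the category or not), which would give the 3 degrees of freedom reported above. The table layout and all counts are hypothetical, and no p-value is computed.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic, degrees of freedom, and N for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, n


def cramers_v(stat, n, n_rows, n_cols):
    """Cramér's V effect size for an n_rows x n_cols contingency table."""
    return math.sqrt(stat / (n * min(n_rows - 1, n_cols - 1)))


# Hypothetical counts: rows are woodwinds, brass, strings, percussion;
# columns are [descriptors in the category, descriptors in other categories].
family_by_category = [
    [400, 740],
    [250, 500],
    [300, 450],
    [100, 400],
]
stat, df, n = chi_squared(family_by_category)
v = cramers_v(stat, n, len(family_by_category), len(family_by_category[0]))
print(round(stat, 2), df, round(v, 2))
```

Cramér's V rescales the statistic by sample size, which is why a highly significant chi-squared value in a corpus of this size can still correspond to a small effect.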

Instrument family description in relation to average mean frequency of family categorizations.
Instrument family membership thus appeared to have a systematic effect on category of verbal attributes. To find out whether frequency distributions of the descriptive categories differed between individual instruments, 15 instruments with a token frequency greater than 65 (i.e., 1 SD, or 60, below the mean f of 125) were included in another log-linear analysis (bass clarinet and contrabassoon were omitted based on this criterion). As with the instrument families, the model indicated significant main effects of instrument and categories along with two-way interactions, χ²(119) = 3622, p < .001. Chi-squared tests indicated that membership in each of the seven categories was significantly associated with instrument, mean χ²(14) = 27.58, p < .02, Cramér’s V > .12, with the exception of matter, χ²(14) = 22.99, p = .06. This suggests that variation in the description of orchestral instruments is not distributed in equal proportions among the seven conceptual categories, but rather reflects certain established conventions in timbre semantics that remain relatively consistent throughout the corpus.
As illustrated in Figure 3, each instrument has a unique descriptive profile relative to the others in its family. For instance, among the woodwinds (Figure 3(a)), the clarinet basically conforms to the categorical frequency distribution of the corpus; however, the contrabassoon is described with significantly more matter and onomatopoeia terms, and fewer affect and metaphor words than average. As a family, the woodwinds exhibited a greater variation in differences from mean category frequency than the others (as indicated here by larger x-axis scales in Figure 3(a)). Among the brass (Figure 3(b)), the trombone is described with a higher percentage of terms for affect and mimesis (mainly vocal) than the rest of the instruments. String description (Figure 3(c)), on the other hand, is fairly uniformly affect-driven.

Instrument description in relation to average mean frequency of instrument categorizations (by family). (a) Woodwinds. (b) Brass. (c) Strings.
Finally, a principal component analysis (PCA) was performed to determine whether any latent conceptual groupings structured the category results. Count data were converted to ratios for input, and sampling adequacy was confirmed (Bartlett). Varimax rotation generated three factors with eigenvalues >1. As shown in Table 6, these three factors accounted for 74.1% of total variance: Factor 1 (32.9%) loaded positively onto matter and onomatopoeia, versus the negatively loaded affect; Factor 2 (25.1%) consisted of positive loadings on CMC and acoustics; and Factor 3 (16.1%) loaded positively on mimesis and action, also with a strong negative loading on affect. Interestingly, affect (the most frequent descriptive category) loaded negatively onto all factors, indicating a strong inverse relationship between the frequency of affective descriptions of timbre and the major groupings revealed by the PCA.
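The general procedure described above (ratios as input, retention of components with eigenvalues greater than 1, varimax rotation) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis actually run: the counts are randomly generated placeholders, the PCA is computed on the correlation matrix, the `varimax` helper is a plain implementation without Kaiser normalization, and all variable names are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix toward the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        gradient = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop once the criterion no longer improves meaningfully.
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Placeholder data (NOT from the corpus): 15 instruments x 7 categories.
rng = np.random.default_rng(0)
counts = rng.integers(5, 60, size=(15, 7))

# Convert counts to within-instrument ratios.
ratios = counts / counts.sum(axis=1, keepdims=True)

# PCA on the correlation matrix of the seven category ratios.
corr = np.corrcoef(ratios, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain components with eigenvalues greater than 1.
keep = eigvals > 1.0
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
rotated = varimax(loadings)

explained = eigvals[keep] / eigvals.sum()
print("Retained components:", int(keep.sum()))
print("Proportion of variance explained:", np.round(explained, 3))
print("Rotated loadings:\n", np.round(rotated, 2))
```

Because the rotation is orthogonal, it redistributes variance among the retained factors without changing their combined total, which is why a single summed percentage can be reported alongside per-factor percentages.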
Principal component analysis by conceptual category.
Results of principal component analysis (PCA) using Varimax rotation with Kaiser normalization. All eigenvalues > 1; combined total variance explained was 74.1%. Large positive factor loadings (> .50) indicated in
Discussion
This article explored three main questions regarding timbre description in a corpus of orchestration books: (a) What words are the most frequently used to describe qualities of instrumental timbre? (b) Are descriptive conventions actually different from one instrument to the next? And (c) What does this lexicon reveal about timbre conceptualization and cognition more generally?
Beginning with the first, descriptive, goal of this study, it was found that timbre words are characterized by a Zipfian distribution (Zipf, 1935): a small subset of terms accounted for a large percentage of all utterances about timbre. The most frequent 50 words comprised around half of the corpus, while 12% consisted of hapax legomena (words that occur only once). Each descriptor appeared on average only 4.51 times in the corpus. Interpretation of this result must remain ambivalent. In absolute terms, the lexicon of timbre for orchestral instruments is varied and diverse, reflecting a rich repertory of descriptive strategies. On the other hand, this pool is for the most part shallow: a small handful of words account for the lion’s share of actual language use. We might conclude that while there exists a great deal of poetic latitude in timbre description, most of the time this extensive verbal inventory is untapped. There are likely cultural and historical explanations for this trend: as noted previously, lexical diversity in orchestration treatises has decreased markedly from the florid writing of Berlioz to the stark descriptions of more recent authors. The gradual standardization of timbre vocabulary may also be due to a broader reorientation beginning in the late 19th century away from the values of musical romanticism and toward psychoacoustic definitions of timbre (see Wallmark & Kendall, in press).
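The summary statistics cited above (share of tokens covered by the most frequent words, proportion of hapax legomena, and mean occurrences per descriptor) can be derived from a simple list of extracted descriptors. The sketch below is illustrative only: the function name and the toy token list are hypothetical. Because the text does not state whether the hapax percentage is taken over tokens or over types, both are returned.

```python
from collections import Counter


def lexicon_summary(tokens, top_n=50):
    """Summarize a list of descriptor tokens by type/token statistics."""
    counts = Counter(tokens)
    n_tokens = sum(counts.values())
    n_types = len(counts)
    top_tokens = sum(count for _, count in counts.most_common(top_n))
    hapax = sum(1 for count in counts.values() if count == 1)
    return {
        "types": n_types,
        "tokens": n_tokens,
        "top_share": top_tokens / n_tokens,        # coverage by top_n words
        "hapax_share_of_tokens": hapax / n_tokens,
        "hapax_share_of_types": hapax / n_types,
        "mean_tokens_per_type": n_tokens / n_types,
    }


# Toy descriptor list (NOT from the corpus).
toy_tokens = ["bright"] * 5 + ["dark"] * 3 + ["nasal"] * 2 + ["noble", "velvety"]

summary = lexicon_summary(toy_tokens, top_n=2)
for key, value in summary.items():
    print(key, value)
```

In a Zipf-like lexicon, the top-share value rises steeply for small values of `top_n` and then flattens, which is the pattern described above.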
Considering that English contains tens of thousands of adjectives, moreover, we might conversely ask ourselves why a paltry 800 or so verbal types suffice to comprise the entirety of the timbre lexicon in this corpus. Seen from this perspective, if timbre description is largely arbitrary and subjective, why was there not even greater diversity? (Admittedly, the opposite interpretation could also be drawn from this, with descriptive vagueness manifesting as lower lexical diversity as authors recycle the same small vocabulary to describe different instruments.) Either way, the simple fact that certain common English adjectives were absent in this corpus (e.g., diplomatic) while others were robustly present (bright) is evidence that some descriptors are viewed as better fits than others, resulting in a relatively high degree of consistency among authors. In future studies, it would be interesting to see if the spoken discourse about timbre reflects this same general pattern. To be sure, orchestration treatises are the products of professional musicians; it is not clear to what extent other discursive communities may employ similar norms in timbre description.
This result also suggests that timbre semantics is constrained by certain discrete conceptual schemas, which were explored through an inductive categorization procedure. These seven categories confirm and extend results from the timbre perception literature. In a sound retrieval study, Wake and Asahi (1998) identified three main descriptive strategies: naming the sound itself, sounding situations, and sound impressions. Sarkar, Vercoe, and Yang (2007) similarly proposed three types of timbre descriptions: material properties, sensory modalities other than hearing, and subjective impressions. Furthermore, Zacharakis et al. (2014, 2015) identified the three semantic factors of luminance, texture, and mass to explain a large proportion of variance in timbre description. Though the taxonomies above varied in stimuli, language, and methods, the present result would seem to largely support these broad schemas while also adding semantic granularity to them. This approximate convergence suggests that the descriptive routines outlined here may be active in musical “talk and text” beyond that of the symphony orchestra, and might characterize a broader swath of the discursive landscape for musical timbre in many linguistic and cultural contexts.
Second, we can infer from the distribution of words among the seven categories that each instrument family and individual instrument (with a couple of exceptions) has a strong association with the seven categories. This helps resolve whether there is a systematic relationship between certain types of descriptors and the instruments thus described: if timbre words were randomly distributed and the categories applied equally well to all instruments (e.g., the oboe was just as likely to be labeled noble, an affect word, as the French horn), we would expect all instruments to be described using similar proportions of each. This is not the case, confirming the hypothesis that variation in timbre semantics is likely systematic, and related to instrument and instrument family.
Finally, this study has implications for our understanding of the cognitive linguistics of timbre. PCA revealed a grouping of timbre terminology into three latent semantic dimensions. To visualize these relationships, Figure 4 plots the locations of the categories in a three-dimensional conceptual space. The first dimension loads positively onto onomatopoeia and matter, with a negative loading on affect. It could be argued that the two categories that exemplify this dimension reduce to an underlying conceptual schema that taps into the materiality of sound production; thus, we might label it the Material dimension of timbre description. This dimension relies upon a conceptualization of instrumental timbre as both the sound of material things (onomatopoeia, which foregrounds the ecological contingencies of sound production through phonetic imitation), as well as the physical properties of sounding materials themselves (matter).

Three-dimensional model of orchestral timbre conceptualization.
The second dimension, which is driven by positive loadings on CMC and acoustics, appears to be associated with sensory impressions both from the auditory environment and through non-auditory senses, particularly vision and touch. We could therefore label this the Sensory dimension, which is grounded in a conceptualization of timbre as cross-modal sensory perception. As Rimsky-Korsakov evinced, synesthetic adjectives for timbre are pervasive. In a behavioral study, Eitan and Rothschild (2010) reported tactile associations with a number of musical parameters, including pitch height, loudness, and timbre. The prevalence of cross-modal adjectives in timbre description might be accounted for by way of conceptual metaphor theory: as originally posited by Lakoff and Johnson (1980), we often make sense of abstract target domains through reference to concrete source domains (e.g., conceptualizing romantic relationships as journeys, as revealed in expressions such as “it’s been a long and bumpy road”). Conceptual metaphor theory has been applied fruitfully to a range of non-timbral musical dimensions (Cox, 2016; Larson, 2012; Zbikowski, 2002): the present result suggests that it also may be extended to timbre conceptualization. In addition to historical and cultural factors in timbre description, then, cognitive semantic constraints may influence which terms appear to fit a given timbral percept and which are considered incongruent.
Lastly, the third dimension loads onto action and mimesis. These categories are united by what we might label as Activity, which captures a crucial verbal (as compared to adjectival) component to timbre. Activity terms highlight the physical contingencies that go into the production and perception of sound (action), as well as the resemblances we actively project onto them (mimesis). As pointed out by Fales (2002), timbre is often considered what a sound is; in contrast, the Activity dimension captures what a sound does. Taken together, conceptual groundings in Material, Sensory, and Activity dimensions may be interpreted to support an embodied, ecological theory of timbre perception and cognition (Wallmark, Iacoboni, Deblieck, & Kendall, 2018).
There are a number of limitations to this study. First, manual extraction of timbre descriptions was admittedly cumbersome and prone to ambiguity. This was arguably necessary, however, until more precise learning algorithms can be used to automate the process. Additionally, linguistic and stimulus context was not considered (e.g., the oboe is nasal compared with the clarinet). Next, this study did not take into account the process of historical influence that undeniably affected word choice among these texts, which may confound in unpredictable ways the sampling assumptions upon which these statistical analyses were based. And finally, this study did not account for covarying musical domains in analyses of word frequency by instrument. A greater focus on the long-neglected role of tessitura and dynamics in timbre description promises to enliven future research.
Conclusion
This article used corpus-analytic methods to explore the conventions of timbre description in orchestration books. Drawing on over 3600 occurrences of timbre terms extracted from 11 texts, the study found that around half of all descriptions of instrumental sound used the same 50 words. Terms could be grouped into seven semantic categories: affect, matter, cross-modal correspondence, mimesis, action, acoustics, and onomatopoeia. The relative proportion of vocabulary from each of these categories varied systematically by instrument family and individual instrument. Furthermore, PCA revealed three semantic factors that conceptually ground timbre description in Material, Sensory, and Activity dimensions.
The current findings have important implications for the psychological and humanistic study of musical timbre (McAdams, 1993), orchestration (Dolan, 2012; Goodchild & McAdams, in press), and the cognitive linguistics of music (Johnson, 2007). Methodologically, this study demonstrated the novel analytical potential of using natural language data, as acquired through large text corpora, to explore issues in timbre perception and cognition typically approached using methods from experimental psychology (cf. Ferrer & Eerola, 2011; Kendall & Carterette, 1993). In showing certain linguistic consistencies in timbre description that complement the experimental literature, these findings could have implications for the development of music information retrieval systems (Leman, 2007) and computer music interfaces (Gounaropoulos & Johnson, 2006). Moreover, this approach could be used to quantitatively investigate other topics related to musical discourse within the fields of music psychology, historical musicology, and ethnomusicology.
Acknowledgements
I would like to thank Stephen McAdams and Roger Kendall for their helpful comments on an earlier draft of this article. I also wish to thank my graduate research assistants at the SMU MuSci Lab, Jessica Pinkham and Andrew Penney.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
