Abstract
The current study examined the relationship between fame and melodic originality among the refrains of over 500 early American popular songs. The main goals were to attempt to replicate results detailed by Simonton (1994), to compare different measures of melodic originality in the context of information theory, and to utilize hierarchical linear modeling in the analyses. The following hypotheses were tested: (1) melodic originality varies across historical time; (2) melodic originality is a positive function of composer age; and (3) fame is a curvilinear function of melodic originality. Results showed that melodic originality increased from 1916 to 1960—the period covered by the current corpus—which is consistent with hypothesis 1. Hypothesis 2 was not confirmed, as results showed a negative relationship between originality and age. The test of hypothesis 3 showed that a significant amount of the variation in fame could be attributed to a non-linear relationship with originality, but that the fame–originality relationship is moderated by genre (instrumental v. vocal). Implications for further studies of the psychomusical contributions to fame are discussed.
Researchers interested in studying creative development and achievement in composers often examine the distribution of high- and low-quality works across composers’ careers. Much of the research focuses on classical music (Kozbelt, 2005, 2008; Kozbelt & Burger-Pianko, 2007; Simonton, 1977, 1980a, 1980b, 1989, 1994) and on popular songs from the first half of the 20th century (e.g., Hass & Weisberg, 2009; Hass, Weisberg, & Choi, 2010). While classical music offers a larger corpus of work and a richer history (Simonton, 1994), American popular music published between 1900 and 1960 offers a relatively standardized corpus (cf. Wilder, 1972), meaning that there should be fewer genre-based differences in the latter corpus (Hass & Weisberg, 2009). Popular music, though sometimes dismissed by critics, is generally representative of cultural trends (Middleton, 1990), and has been used as a vehicle for understanding the relationship between music and emotion and memory (e.g., Krumhansl, 2010; Schellenberg & Scheve, 2012).
Despite the recent attention given to popular songs by music cognition researchers, less is known about how songs achieve public recognition. In an effort to add to this recent body of work on popular music, this article examines the relationship between information-theoretic measures of melodic originality and fame in the songs from the Great American Songbook. 1 The current study synthesizes the two-note transition–probability approach to melodic originality employed by Simonton (e.g., 1994) with recent interest in information-theoretic measures such as entropy and point-wise mutual information (e.g., Margulis & Beatty, 2008). Before describing the current analysis in full, I will first summarize several analyses of melodic originality in classical music and derive hypotheses. Following that, I will introduce definitions for entropy and point-wise mutual information used in the article. Finally, the analysis will be presented and fully discussed.
Melodic originality in classical music
Simonton (cf., 1994 2 ) embarked on a decade-long investigation into melodic originality across themes from a large corpus (> 15,000 themes) of classical instrumental and vocal music (Barlow & Morgenstern, 1948, 1976). He quantified melodic originality by first tabulating the two-note transition probabilities (i.e., relative frequencies of bigrams) across the first six notes of each theme in the corpus. He then assigned each theme an originality score, defined as:
\[ \text{Originality} = 1 - \frac{\sum P(i, j)}{5} \]
with P(i, j) representing the probability of each bigram (represented by relative frequencies), and dividing the sum by 5 creating an average probability across the five bigrams. Subtracting the average from 1 yields the complement of the average probability of the bigrams, and thus represents an “improbability” score, with the desired property of larger numbers signifying more originality. Simonton’s main justification for this measure was that Martindale and Uemura (1983) showed that themes that scored higher on Simonton’s originality measure were also rated as “higher in ‘arousal potential’” (Simonton, 1994: p. 34), by listeners.
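The scoring rule described above can be sketched in a few lines. This is a minimal illustration in Python, not Simonton's or the author's code: the function names and the three-theme toy corpus are hypothetical, and bigram probabilities are taken as relative frequencies over the whole toy corpus.

```python
from collections import Counter


def bigram_probabilities(themes):
    """Relative frequency of each two-note transition across all themes."""
    counts = Counter()
    for theme in themes:
        counts.update(zip(theme, theme[1:]))
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()}


def repertoire_originality(theme, probs):
    """1 minus the mean probability of the theme's bigrams (five for six notes)."""
    bigrams = list(zip(theme, theme[1:]))
    return 1 - sum(probs[b] for b in bigrams) / len(bigrams)


# Hypothetical toy corpus: three six-note themes coded as scale degrees.
toy_corpus = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 2, 1, 2],
    [5, 4, 3, 2, 1, 2],
]
probs = bigram_probabilities(toy_corpus)
scores = [repertoire_originality(theme, probs) for theme in toy_corpus]
```

Themes built from transitions that are common in the corpus receive lower scores, and themes built from rare transitions receive higher ones.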
Simonton’s (1994) summary also provides a number of interesting results regarding the relationship between melodic originality and a slew of other variables, most notably composer age, historical time, and fame of the composition as a whole. According to Simonton, originality should increase to a peak well into composers’ careers, and then dip slightly (see also Simonton, 1989). Originality also varies substantially across historical time, with Simonton showing support for a 5th-order polynomial (generally increasing) functional relationship between the two variables. However, a more recent study (Kozbelt & Meredith, 2010) showed that a linear increase in melodic originality between 1500 and 1950 is a better fit to the data. Regardless, the originality of Western tonal melodies seems to have increased across a history of more than 400 years.
Simonton also showed that the fame of a composition—measured in terms of the number of citations across an array of music anthologies, concert guides, and music appreciation textbooks—is a backward, inverted J-function of melodic originality (i.e., curvilinear). He concluded that the most famous songs were those that offered a medium amount of melodic originality. To add more depth, Simonton qualified the fame–originality relationship in a second paper (Simonton, 1980b) using a metric called Zeitgeist originality—essentially centering each song’s repertoire originality score around the yearly mean of originality scores for all songs contributing during that time period (mainly 5-year blocks). Simonton concluded that composers begin their careers by matching their styles to the Zeitgeist but, as their careers progress, they begin to deviate from the Zeitgeist, producing more and more original work relative to their time-period. In describing this, Simonton laments the early death of Mozart, who may have gone on to produce even more original works if only he had lived long enough.
Though the sheer volume of analyses performed by Simonton on this 15,000-theme dataset should not go unrecognized, Simonton’s claims have been recently challenged by studies using multilevel modeling (a.k.a. hierarchical linear modeling: Raudenbush & Bryk, 2002). In hierarchical linear modeling (HLM), variance can be partitioned to more precisely test hypotheses on datasets with observations nested in higher-order units. More specifically, HLM allows for regression coefficients of some variables to be functions of other, higher-level variables. For example, Kozbelt and Meredith (2011) constructed a hierarchical model in which melodic originality (level 1) was nested within composer-level variables (level 2) such as birth year and compositional style. In doing so, they allowed for variations in the relationships between age and originality, and originality and fame in different factors of the composer-level variables. They found that composers with longer careers (level-2 factor) exhibited larger linear increases in originality (level-1 dependent variable) than less prolific composers (level-2 factor) with shorter careers. Further analysis by Meredith and Kozbelt (2014) using HLM showed evidence against the swan-song phenomenon. Thus, further examination of the relationships between originality and time, and originality and fame, using a different corpus is necessary. In addition, analyses of datasets of this sort need to reflect the nested nature of the data, and so HLM was employed in appropriate phases of the current study.
The current study
To summarize, melodic originality seems to vary over time and within individual composers’ careers. Originality, defined both in terms of an overall corpus and in terms of specific years, may also predict the fame that a composition earns. Thus, the specific goals of this study were to attempt to replicate these effects in a corpus of American popular songs composed between 1900 and 1960.
There are a few notable differences between this and prior analyses. First, there is reason to believe that reference works regarding American popular music sometimes provide biased views of song quality (for a full discussion see Middleton, 1990). Second, it has been argued (e.g., Hass, 2011, 2013) that more reliable and valid information about creative impact can be culled from electronic databases, if they exist. Fortunately, music information researchers have compiled a freely accessible and dynamic database of citation counts of popular songs (www.secondhandsongs.com), which served as the sole indicator of fame for this analysis. 3 The database specifically tallies the number of different artists that have covered a song, and is curated by a team of music information-retrieval researchers, for the purpose of this type of analysis.
Count data of this sort also have an advantage over composite data, for they allow for the direct examination of distributions of rare events. Indeed, many works published by songwriters and composers go unnoticed, meaning that the production of a very famous song is a rare event (cf., Lotka, 1926). This is interesting because the generating process for rare events can be approximated with a Poisson distribution (e.g., Ross, 2010) and, thus, Poisson regression can be applied in the current case to more accurately analyze the relationship between melodic originality and fame. Indeed, Simonton (2003) hypothesized that a stochastic combinatorial process governs all creative productivity, and that the probability that a particular person creates a famous product is Poisson distributed (i.e., immensely good ideas are rare events). In prior research (e.g., Kozbelt & Meredith, 2011) the fame variable was transformed to conform to the assumption of normality for a general linear model. Such a transformation is not applicable to the current dataset because of the presence of zeros. In fact, if fame reflects an underlying Poisson process, then transforming the data would obscure our understanding of how and why high-fame songs come to exist. So another key aspect of this analysis is that it is the first of its kind to feature an untransformed, Poisson-distributed measure of fame.
Also, the current dataset, though composed of only 50 years of music, does have nested properties. Specifically, songs are nested in composers’ careers and within genres. As described, Kozbelt and Meredith (2011) showed that the use of HLM might be better than a disaggregated regression approach to this type of data (see also Meredith & Kozbelt, 2014). The relationships among originality, composer age, and historical time are likely nested within the genre (vocal or instrumental) to which a song belongs. Thus, a random intercept model was constructed for originality to adequately reflect the nesting. However, the fame variable (Poisson distributed) violates basic HLM assumptions, and requires a generalized HLM. The latter is more complex than a single-level generalized linear model, but an increase in computing capability over the last two decades enables good approximation to the quite complicated generalized HLM estimators (e.g., in the R statistical programming environment, see below).
Finally, a secondary aim of this study was a comparison of Simonton’s melodic originality metrics (repertoire and Zeitgeist) with some alternative measures. It should be noted that in his initial analyses, Simonton (1980a, 1980b) defined melodic originality as the sum of the transition probabilities (reverse-scored), which is just an approximation of a high-order Markov chain to the sequence of transitions. It is an approximation because accurate Markov chain calculations of higher order would require conditioning the probability of each transition on the prior state (Ross, 2010). More importantly, a Markov chain approximation assumes that each state (in this case, each transition to a new pitch-class) is independent of everything except the prior state, which is highly unlikely given the compositional conventions across the Western canon, including a preference for small intervals (Narmour, 1990), and correlation between pitch-class distributions and the tonal hierarchy (e.g., Huron, 2006; Krumhansl, 1990). Markov models have been used in music perception to calculate similarity between melodies but only to quantify the difference between a target melody and several transformations of that same melody (Schulkind & Davis, 2013).
In light of the possibility that other kinds of probability metrics may be useful in defining originality, the current analysis introduces three additional metrics related to information theory. First, one can compute the information content, or in this case, the negative base-2 logarithm of the product of the transition probabilities. Recent studies have shown that information content is directly related to the expectedness of melodic events (e.g., Hansen & Pearce, 2014). Second, it may be advantageous to examine originality in terms of pitch-classes rather than bigrams. For example, repetitive melodies contain less information than relatively random melodies in the sense that a person should find it easier to predict what the next note will be while listening to a repetitive melody (Cohen, 1962). Repetitiveness may be a way in which listeners gauge originality of a melody, and may contribute to the ultimate fame of a composition. The method section describes a simple procedure for capturing repetitiveness in terms of the entropy of each six-note melodic segment. Finally, another information-theoretic metric, point-wise mutual information (PMI), has emerged as a way to gauge the semantic relatedness of components of linguistic bigrams (Recchia & Jones, 2009), and the procedure was adopted here for gauging how formulaic each six-note segment was, conditioned on the distribution of pitch-classes in the entire corpus. In the following sections, variables will be defined in more depth, and the results will be compared with Simonton’s (1994) conclusions, as detailed above.
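As a rough illustration of the first of these metrics, information content can be computed by summing the negative base-2 logarithms of the bigram probabilities, which equals the negative logarithm of their product. The sketch below is in Python rather than the author's R, and the function name and the two probability lists are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def information_content(bigram_probs):
    """Negative base-2 log of the product of transition probabilities, in bits.

    Summing the logs is equivalent to taking the log of the product and
    avoids numerical underflow for very small probabilities.
    """
    return -sum(math.log2(p) for p in bigram_probs)


# Hypothetical probabilities for the five bigrams of two six-note segments.
common_segment = [0.25, 0.25, 0.5, 0.5, 0.25]
rare_segment = [0.125, 0.0625, 0.125, 0.25, 0.0625]

ic_common = information_content(common_segment)  # 2 + 2 + 1 + 1 + 2 bits
ic_rare = information_content(rare_segment)      # 3 + 4 + 3 + 2 + 4 bits
```

A segment made of less probable transitions carries more bits, so larger values correspond to less expected melodies.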
Method
Songs
A total of 553 songs served as the subjects of analysis. Songs were included based on two interdependent criteria: (1) the song’s inclusion in one of six “fake books” used by performing musicians (see appendix), and (2) the song having been published either as sheet music or via recording before the year 1960. The latter criterion was chosen because 1960 marks the end of the period covered in Wilder’s (1972) in-depth analysis of the Great American Songbook. The 1960s also mark the start of a shift in the focus of popular music from bands performing cover songs to bands writing and performing original material (Wald, 2009). The latter is important for the validity of using cover counts as the sole indicator of fame.
Fame
The fame variable was defined as the total number of entries for each song in the Second Hand Songs (SHS) database (www.secondhandsongs.com). A separate study on the entire careers of five of the most eminent composers in this sample (Hass & Weisberg, 2015) revealed that cover counts from SHS and another database (www.allmusic.com) correlate very highly (r ≈ .71). Again, the choice to use the counts from the SHS database as the only criterion was motivated in part by the shape of the distribution, which resembles a Poisson process (see Figure 1) and is consistent with studies of creativity in science (Simonton, 2003). SHS is also maintained by a team of music information researchers who constructed it for the purpose of this type of archival research. Thus, it is a reliable index of how many times a song was covered by another artist, either in the studio or on a live album. Again, this metric is quite similar to Simonton’s (e.g., 1994) composite citation index, compiled from several text-based reference sources.

Figure 1. Histogram of SHS citations for vocal and instrumental songs.
Transition-probability variables
Repertoire and Zeitgeist originality
In keeping with Simonton’s (1994) latest conventions, repertoire originality was computed by first calculating the average probability across the 5 bigrams, and then subtracting that result from 1. Zeitgeist originality was computed by regressing repertoire originality on historical time, saving the predicted scores, and calculating the difference between each repertoire originality score and its predicted score.
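The Zeitgeist computation amounts to taking residuals from a regression of repertoire originality on historical time. The following Python sketch assumes a simple linear regression on year (the source does not state the functional form), and the function name and data are hypothetical.

```python
def zeitgeist_originality(years, repertoire_scores):
    """Residuals from a simple linear regression of originality on year.

    Assumption: a single linear term for historical time.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(repertoire_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(years, repertoire_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Observed score minus the score predicted for that year.
    return [y - (intercept + slope * x)
            for x, y in zip(years, repertoire_scores)]


# Hypothetical publication years and repertoire originality scores.
years = [1920, 1930, 1940, 1950, 1960]
scores = [0.970, 0.978, 0.975, 0.985, 0.982]
residuals = zeitgeist_originality(years, scores)
```

A positive residual marks a song that is more original than expected for its year, and a negative residual marks one that is less original than its contemporaries.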
Information content
Repertoire and Zeitgeist originality are essentially probabilities of the union of a particular set of bigrams (i.e.,
Pitch-class variables
In addition to examining transition probabilities, the actual distribution of pitch classes was also tallied. Figure 2 shows that scale degrees 1, 5, and 8 (corresponding to the tonic, mediant, and dominant pitches) were most common among songs in the corpus. This is to be expected given Krumhansl’s (1990) discussion of work by Knopoff and Hutchinson (1983) showing that this distribution is common in Western classical music, and may be evidence of composers’ awareness of the tonal hierarchy. As such, two pitch-class variables were constructed, one that did not account for this underlying distribution (entropy) and one that did (average PMI).

Figure 2. Distribution of scale degrees among the melodies.
Entropy
Shannon’s (1948) equation for entropy can be used to assess the amount of uncertainty in a signal source (see also Cohen, 1962; Margulis & Beatty, 2008). Shannon entropy is defined as:
\[ H = -\sum_{i=1}^{M} P_i \log_2 P_i \]
where P_i refers to the probability of some symbol, i, in a set of M symbols. The base 2 of the logarithm means that uncertainty is measured in bits, or the number of binary digits needed to encode each symbol. Shannon entropy must be defined across a sample space that sums to 1. To apply the metric to this analysis, each of the six slots available in the melodic segments was assigned a probability value of 1/24. An algorithm written in R (see note 1) then scanned each segment and determined how many times (if at all) a note was repeated. If a note occurred only once, its posterior probability was increased by 3/24 once, resulting in a probability of 1/6 for that slot. However, if a note occurred n times, n × 3/24 was added to the initial 1/24 for the first slot, while each of the remaining slots with the repeated note remained at 1/24. This ensured that all of the six-note “distributions” would always sum to 1, thus satisfying the second Kolmogorov axiom of probability. 4
As an example, if a melody consisted of six unique notes, the posterior probability of each note would be 1/6 (or 1/24 + 3/24). The resulting entropy value is:
\[ H = -6 \times \tfrac{1}{6} \log_2 \tfrac{1}{6} = \log_2 6 \approx 2.585 \]
If a melody consisted of five unique notes and one duplicate, the resulting probabilities would be 7/24, 1/6, 1/6, 1/6, 1/6, and 1/24. Using the entropy equation, the resulting entropy value for that melody would be roughly 2.433.
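The slot-weighting procedure and the two worked examples above can be reproduced with a short script. This is a Python sketch of the procedure as described, not the author's R algorithm; the function names are hypothetical, and "repeated n times" is read as "occurs n times in the segment", which is the reading that yields the 7/24 value in the example.

```python
import math
from collections import Counter
from fractions import Fraction


def slot_probabilities(notes):
    """Assign a probability to each of the six slots of a melodic segment.

    Every slot starts at 1/24. The first slot holding a given note receives
    an extra 3/24 per occurrence of that note; later slots holding the same
    note stay at 1/24. The six values always sum to 1.
    """
    if len(notes) != 6:
        raise ValueError("expected a six-note segment")
    counts = Counter(notes)
    seen = set()
    probs = []
    for note in notes:
        if note in seen:
            probs.append(Fraction(1, 24))
        else:
            seen.add(note)
            probs.append(Fraction(1, 24) + counts[note] * Fraction(3, 24))
    return probs


def segment_entropy(notes):
    """Shannon entropy (bits) of the slot probabilities."""
    return -sum(float(p) * math.log2(float(p))
                for p in slot_probabilities(notes))


all_unique = segment_entropy([0, 2, 4, 5, 7, 9])     # six unique notes
one_duplicate = segment_entropy([0, 0, 2, 4, 5, 7])  # five unique, one duplicate
```

Under this scheme, more repetition concentrates probability in fewer slots and lowers the entropy of the segment.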
Average PMI
As previously described, PMI is a metric used to gauge the semantic relatedness between two words or documents (Recchia & Jones, 2009). It is calculated in the following way: given two tokens (e.g., words or pitch-classes) PMI is the ratio of the joint probability of the two tokens to the product of the probabilities of the tokens themselves. Thus, the formula for PMI between two symbols is given by:
with a and b corresponding, in this case, to two adjacent pitches. Average PMI was chosen as the best representation of the formulaic nature of each melody because it has been shown to correlate highly with originality when used to score items from tests of creative thinking (Harbison & Haarmann, 2014).
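A sketch of how average PMI could be computed for a six-note segment follows. It rests on several assumptions that the source does not confirm: that PMI is taken as the base-2 logarithm of the ratio described above (the standard definition), that the joint probability of two adjacent pitches is their bigram relative frequency in the corpus, and that the marginal probabilities are pitch-class relative frequencies in the corpus. The function names and the one-segment toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_tables(segments):
    """Corpus-wide relative frequencies of single pitches and adjacent pairs."""
    unigrams, bigrams = Counter(), Counter()
    for seg in segments:
        unigrams.update(seg)
        bigrams.update(zip(seg, seg[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_uni = {k: v / n_uni for k, v in unigrams.items()}
    p_bi = {k: v / n_bi for k, v in bigrams.items()}
    return p_uni, p_bi


def average_pmi(segment, p_uni, p_bi):
    """Mean PMI over the adjacent pitch pairs in one segment.

    Assumption: PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ).
    """
    pairs = list(zip(segment, segment[1:]))
    values = [math.log2(p_bi[(a, b)] / (p_uni[a] * p_uni[b]))
              for a, b in pairs]
    return sum(values) / len(values)


# Hypothetical one-segment corpus alternating between two pitches.
toy_segments = [[1, 2, 1, 2, 1, 2]]
p_uni, p_bi = pmi_tables(toy_segments)
avg = average_pmi(toy_segments[0], p_uni, p_bi)
```

Positive values indicate pairs that co-occur more often than their individual frequencies would predict, which is the sense in which a high average marks a formulaic segment.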
Other predictors and controls
A number of variables were constructed as control variables for the purposes of the regression analyses. Historical time was defined as the year of publication of each song, which in most cases was the sheet music copyright date. In some instances (e.g., the songs of Thelonious Monk) publishing copyrights were obtained much later than the dates of original recordings, and so songs’ dates were then checked for accuracy using two reference works (Cook & Morton, 2008; Suskin, 2010). Composer age was defined as the difference between the composer’s birth year and the year of publication for songs with one composer. For songs with multiple composers or lyricists, composer age was approximated by taking the average of the ages of the composers at the time of publication. Each song was also labeled according to the number of compositions contributed to the sample by the primary composer of the music—the first or only name credited with writing the music for each song.
Finally, songs were assigned a dummy genre coding of 0 for songs written and performed as instrumentals, and 1 for songs written with vocals. A song was designated as a vocal tune if it was originally written as a vocal tune. This was applied even in cases in which the vocals did not feature prominently in all cover recordings. Examples of the latter come from the career of Duke Ellington, with many of his songs published as vocal numbers, only later to be performed by his band and other musicians as instrumentals. The idea is that the initial melodic composition may have been constrained by the knowledge that a human voice was necessary for performance, which in turn might restrict the range and combination of pitches used in the melody (Simonton, 1994).
Results
All data preprocessing and analysis were completed using R (R Core Team, 2014), and the full dataset and R script (written in RStudio) are available on the author’s Open Science Framework account (see Author’s Note below). R software packages (in addition to the base packages) used in the analyses were the lme4 package (Bates, Maechler, Bolker, & Walker, 2014) and the psychometric package (Fletcher, 2010). Figures were constructed using the ggplot2 package (Wickham, 2009).
All variables were checked against the assumptions of regression analyses including normality, except fame, which is clearly Poisson distributed (see Figure 1). Number of compositions was positively skewed, mainly due to the fact that 56% of the composers included in this database only contributed a single song. Inspection of a Q-Q plot confirmed that the number of compositions variable was not fit for regression, and thus it was excluded from the subsequent analyses. For the purposes of testing the curvilinear trends detailed by Simonton (1994), the “poly” function in R’s stats package was used to create orthogonal linear and quadratic predictors for age, historical time, and the various originality metrics where necessary. In other models in which only linear hypotheses were tested, the predictors were centered on their respective grand means. Centering enables the use of the intercept value in HLM as an estimate of the true mean of the criterion variable.
Comparisons of originality metrics
Table 1 contains the Spearman Rank-order correlations for all variables in the analysis. The square box isolates the intercorrelations among the various originality measures. All three of the metrics based on the transition probabilities (repertoire originality, Zeitgeist originality, information content) correlated strongly with each other (r ⩾ .89). Entropy correlated significantly with the transition probability measures, but not with average PMI. Average PMI did not significantly correlate with any other originality variable.
Table 1. Descriptive statistics and intercorrelations among variables across genres. Correlation coefficients calculated using Spearman’s rho. Square box shows correlations among the originality metrics. Historical time and composer age appear here in raw form, though both variables were centered for the regression analyses.
*p < .05, **p < .01, n = 553 songs.
Originality as criterion
To evaluate Simonton’s results relating originality to historical time, repertoire originality, Zeitgeist originality, and information content were modeled with separate hierarchical models, each of which included age at composition (level 1) and genre (level 2) as predictors. In the repertoire originality and information content models, historical time was entered as a level-1 predictor. It was not considered a level-2 predictor because of the brief time-span of this corpus. Because Zeitgeist originality is mathematically defined in terms of the average level of originality per year, historical time was excluded from that model. Model testing proceeded in two phases: testing of the null model and examination of genre-based intraclass correlation (ICC), and random-intercepts model testing (Raudenbush & Bryk, 2002). The unweighted ICC formula was used following the guidelines set by Raudenbush & Bryk (2002) in order to ascertain the proportion of variance in each originality metric between the genres. It can also be used as a yardstick for assessing whether specification of level-2 variables is necessary (Woltman, Feldstain, MacKay, and Rocchi, 2012). ICC was calculated using the “ICC1.lme” function in the psychometric package (Fletcher, 2010). Finally, there is a controversy about the stability of p-values from the lme4 package (Bates, et al., 2014), so they are not reported in the text, though approximately significant results are flagged in Table 2.
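For readers unfamiliar with the ICC, the usual variance-components definition is the share of total variance that lies between groups. The Python sketch below illustrates that definition only; it is not the internals of the “ICC1.lme” function, and the function name and variance values are hypothetical.

```python
def intraclass_correlation(between_variance, within_variance):
    """Proportion of total variance attributable to the grouping factor."""
    total = between_variance + within_variance
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return between_variance / total


# Hypothetical variance components from an intercept-only (null) model.
icc = intraclass_correlation(between_variance=0.13, within_variance=0.87)
```

An ICC near zero suggests that the grouping factor explains little variance and that a single-level model may suffice.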
Table 2. Results of regression analyses for the three originality criterion variables. Models A and C are hierarchical linear models with a random intercept for each genre; the variance estimates of the random effect of genre are reported as standard deviations. Model B is an ordinary regression model. Unstandardized estimates reported for interpretability. See text for the equations used in Models A and C.
*p < .05; **p < .01.
Abbreviations: HLM = hierarchical linear model; OLS = ordinary least squares.
The general two-level model for each of the originality criteria below can be summarized as:
Level-1 Equation (year = historical time):
Level-2 Equations:
This notation is consistent with that used by Raudenbush & Bryk (2002) where the betas are level 1 coefficients, the gammas are level-2 coefficients, and r and u are level-1 and level-2 (respectively) error terms. Additionally, level-1 predictors are specified in mean deviation form so that the intercept is interpretable as an estimate of the true mean of originality scores. Thus, originality of a song is conceived of as a function of the average originality across songs (intercept), and the three level-1 predictors (historical time, age, and age-quadratic)—which in turn are a function of the level-2 effect of genre—along with the respective errors of estimation.
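A form of the two-level model consistent with the description above can be written as follows. This is a sketch under stated assumptions, not a confirmed transcription of the author's equations: it assumes a random intercept for genre j and fixed slopes for the three level-1 predictors, with i indexing songs.

```latex
% Level 1 (song i in genre j); predictors in mean-deviation form
\text{Originality}_{ij} = \beta_{0j}
  + \beta_{1j}\,\text{year}_{ij}
  + \beta_{2j}\,\text{age}_{ij}
  + \beta_{3j}\,\text{age}^{2}_{ij}
  + r_{ij}

% Level 2 (assumed: random intercept, fixed slopes)
\beta_{0j} = \gamma_{00} + u_{0j}, \qquad
\beta_{1j} = \gamma_{10}, \qquad
\beta_{2j} = \gamma_{20}, \qquad
\beta_{3j} = \gamma_{30}
```

Here r_{ij} is the song-level error and u_{0j} is the genre-level deviation from the grand intercept, matching the roles of r and u in the notation described above.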
Repertoire originality
The intercept-only model for repertoire originality revealed an ICC of .13, meaning that 13% of the variance in originality is at the genre level, and the remaining 87% is at the song level. Therefore, the two-level model for repertoire originality was tested and included age (linear and quadratic), and song date (centered on the grand mean) as song-level predictors, with random intercepts for the two genres (see above). The main results are summarized in Table 2A along with variance estimates. The estimated mean originality across genres was β0j = 0.98. Originality increased with the passage of historical time (β1j = 0.0017). Originality also significantly decreased as composers aged (β2j = −0.0264), with no significant quadratic trend. If we consider originality the opposite of probability, as historical time passed, compositions became more original by about 0.2 percent per year. Originality also dropped on the order of 3 percent per year of aging.
Zeitgeist originality
The intercept-only model for Zeitgeist originality revealed an ICC of .02, meaning that only 2% of the variance in originality is at the genre level, and the remaining 98% is at the song level. The lack of a strong genre component led to an ordinary least squares regression using the level-1 model above, and omitting genre as a predictor. Zeitgeist originality was thus regressed on age (linear and quadratic), yielding a significant overall result, albeit with a small effect size, F(2, 550) = 6.97, p = .001, adjusted R² = .02. Again, Zeitgeist originality decreased as composers aged (b = −0.03, p < .01), with no significant quadratic trend. The interpretation is the same as above.
Information content
The intercept-only model for information content revealed an ICC of .15, meaning that 15% of the variance in information content is at the genre level, and the remaining 85% is at the song level. Step 2 again included age and historical time (linear and quadratic) as song-level predictors, following the two-level equations above. The regression estimates are summarized in Table 2C, along with variance components. The estimated mean information content across genres was β0j = 29.67 bits. There was a significant increase in information content over time (β1j = 0.81), but a decrease in information content as composers aged (β2j = −11.07), with no significant quadratic trend. Information content represents the negative base-2 logarithm of the product of the transition probabilities for each melody. Thus, the coefficient for age represents a loss of 11 bits of information per melody as a composer ages, while passage of time is marked by an increase of 0.81 bits per year.
Fame as criterion
As described above, Figure 1 shows that fame scores resembled a Poisson distribution, meaning that Poisson regression is the natural starting point for these data. However, the variance-to-mean ratio for fame was 63.16, signifying overdispersion. Poisson regression holds this ratio at 1 because the mean and variance of a Poisson distribution are assumed to be equal (Ross, 2010). To test whether overdispersion would influence the modeling of fame, fame was regressed on information content, historical time, and composer age as a diagnostic check using a single-level quasi-Poisson generalized linear model. This procedure does not hold dispersion at 1, and the estimated dispersion parameter of the quasi-Poisson model was very high (estimated D = 43.91). This indicates that the variance in fame greatly outweighed the mean fame score per song.
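The raw dispersion check described above can be illustrated with a few lines of Python. The counts below are hypothetical and are not the SHS data; the function name is also hypothetical, and the sample variance is used.

```python
from statistics import mean, variance


def variance_to_mean_ratio(counts):
    """Sample variance divided by the mean; about 1 for Poisson-like counts."""
    return variance(counts) / mean(counts)


# Hypothetical cover counts: many obscure songs and a few heavily covered ones.
toy_counts = [0, 0, 1, 2, 3, 5, 8, 40, 120]
ratio = variance_to_mean_ratio(toy_counts)
```

A ratio far above 1, as in this toy example, is the pattern that motivates moving from Poisson regression to a model with a free dispersion parameter.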
To address an overdispersion issue in a count variable like fame, one can extend the generalized linear model to negative binomial regression (log link). However, introducing genre as a level-2 factor causes some problems fitting cross-level interaction terms, as the estimation procedure for the likelihood function becomes extremely complex (see Help page for “glmer” in the lme4 package). So instead of including genre as a random effect, the model for fame was fit using the “glm.nb” function from the MASS package (Venables & Ripley, 2002) in R. Specifically, this allowed for a simpler analysis of how genre might moderate the relationship between originality and fame. As shown in the prior analysis, there was a significant amount of variability across the two genres with regard to originality.
In addition to the stipulation that genre is a fixed effect in the final model, the analysis of fame was further simplified by having information content stand as the only transition-probability predictor. There are three reasons for this: (1) Table 1 shows that information content correlates nearly perfectly with both repertoire originality and Zeitgeist originality; (2) conceptually, information content seems theoretically more interpretable (i.e., bits of information in a melody) than the two originality criteria used by Simonton (1994); and (3) it is easier to compare the strength of the two new metrics—entropy and average PMI—as predictors of fame to a single transition-probability metric, rather than three metrics.
Entropy and PMI metrics
Before examining the explanatory power of all the predictors of fame, two models were compared with a likelihood ratio test to examine whether the introduction of entropy and average PMI as predictors was necessary. If a more parsimonious model, nested within a larger model, does not differ significantly from the larger model in likelihood, it is wiser to retain the more parsimonious (i.e., simpler) model. To make the comparison, fame was regressed on all of the predictors, and then again regressed on all of the predictors excluding entropy and average PMI. The likelihood ratio test of the two models was not significant, χ²(2) = 0.43, p = .81, and thus the more parsimonious model was retained. That means that entropy and average PMI are somewhat superfluous to this particular model of fame. Thus, the remaining analyses focused on information content as the only melodic originality variable.
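The arithmetic behind a likelihood ratio test with two degrees of freedom is simple enough to sketch directly. The Python below is an illustration only: the function names and log-likelihood values are hypothetical, and the closed-form p-value exp(−x/2) holds only for a chi-square distribution with exactly 2 degrees of freedom.

```python
import math


def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """Twice the gain in log-likelihood from the reduced to the full model."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df2(statistic):
    """Upper-tail p-value for a chi-square variate with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


# Hypothetical log-likelihoods chosen to give a statistic of 0.43.
stat = likelihood_ratio_statistic(loglik_reduced=-1500.215, loglik_full=-1500.0)
p_value = chi2_sf_df2(stat)
```

With a statistic of 0.43 on 2 degrees of freedom the p-value rounds to .81, in line with the value reported above.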
Additive fame model
In the first model, fame was regressed on genre (instrumental as comparison group), and the following in both linear and quadratic forms: information content, age, and historical time. Table 3A shows the results of the first model. Only genre (b = 0.87, p < .01) and historical time linear (b = −8.96, p < .01) significantly predicted fame. Because negative binomial regression has a log link function, exponentiation of the regression coefficient (e^0.87 = 2.38) yields an incidence rate ratio for the categorical predictor, genre. In this case, vocal compositions were 2.38 times more likely to achieve high fame than instrumentals (see Figure 1). In addition, fame decreased as a linear function of historical time, which is not surprising given that fame is a cumulative measurement, such that the longer songs have been “on the market,” the more likely they are to achieve cumulative fame.
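The conversion from a log-link coefficient to an incidence rate ratio is a single exponentiation, sketched below. The coefficient is the rounded value given in the text; exponentiating it gives approximately 2.39, so the reported 2.38 presumably comes from the unrounded estimate (an assumption, since the unrounded value is not given here).

```python
import math

b_genre = 0.87  # rounded coefficient for genre, as reported in the text

# With a log link, exp(b) is the multiplicative change in the expected count
# for the coded group (vocal) relative to the reference group (instrumental).
irr = math.exp(b_genre)
percent_change = (irr - 1.0) * 100.0

print(f"IRR = {irr:.2f}; expected count changes by about {percent_change:.0f}%")
```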
Table 3. Negative binomial regression results predicting fame for an (A) additive model and a (B) interaction model. The reference group for the Genre predictor is Instrumental. Unstandardized estimates reported for interpretability.
*p ⩽ .05; **p < .01.
Abbreviations: IC = information content; AIC = Akaike Information Criterion.
Interaction model
Because fame varied by genre, and the originality models showed that originality also varied by genre, a model with two interaction terms (genre × information content and genre × information-content-quadratic) was fit to compare with the additive model. The results are presented in Table 3B, which shows that the interaction model yielded a very slight reduction in AIC compared to the additive model. A more formal likelihood ratio test of the comparison between the interaction model and the additive model suggested the interaction model was a marginally better fit to the data (χ²(2) = 5.37, p = .06).
The estimates for genre and historical time are nearly identical to those in Table 3A. However, one striking difference emerged in the interaction results. The addition of the interaction term rendered significant both information content quadratic (b = −4.55, p < .01) and the interaction between genre and information content quadratic (b = 5.49, p = .02). The first result indicates that fame is an inverted U-function of information content, and the second indicates that this effect is much stronger in vocal music compared to instrumental music. As Simonton (1994) predicted, in this final model, fame has a nonlinear relationship to melodic information content.
Discussion
There were two purposes to these analyses: (1) to test whether melodic originality varied across historical time and across composers’ careers, and (2) to test whether melodic originality predicted the fame of a sample of American popular songs. All three transition-probability measures of originality (repertoire, Zeitgeist, and information content) decreased with age, while repertoire originality and information content increased with historical time. The relationship between originality and historical time is consistent with Kozbelt and Meredith’s (2011) reanalysis of Simonton’s (e.g., 1980a) data, showing a monotonic increase in originality of classical melodies across five centuries, in contrast to Simonton’s (1994) fifth-order polynomial results. The result for age is contrary to prior studies that show monotonic increases in originality as age increases, before a plateau (e.g., Simonton, 1989, 1994).
Also, repertoire originality, as defined by Simonton (e.g., 1980b), was almost perfectly correlated with information content, which was calculated by taking the base-2 logarithm of the product of the transition probabilities across bigrams. The latter is preferred both for the numeric range of the scores and because the concept of information content is used by other music cognition researchers (e.g., Hansen & Pearce, 2014). However, the other information-theoretic indicators (entropy and average PMI) did not relate strongly to either originality or fame. It is not clear why this is the case, except for the fact that in semantic analysis, PMI performs better when the training involves a larger-scale corpus (e.g., 1 million tokens) than the corpus presently used (3318 tokens). Another interpretation is that PMI only assesses how formulaic a melody is in theoretical terms and does not directly relate to how appealing the melody is. However, further analysis with entropy and PMI using entire melodies, rather than the first six notes, may yield different results. For example, Margulis and Beatty (2008) used entropy to aesthetically differentiate various classical music genres. However, the goal of that analysis was descriptive, and the authors defined the entropy measure across entire genres (e.g., Bach Chorales, Mozart String Quartets), rather than within a single song segment. Regardless, the utility of both entropy and PMI should continue to be explored in studies of music analysis and aesthetics.
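To make the information content calculation concrete, the sketch below estimates bigram transition probabilities from a toy corpus and takes the base-2 logarithm of their product for one melody. The three note sequences, the note-name encoding, and the function names are hypothetical and serve only as illustration; the study's actual encoding and corpus are not reproduced here. Reporting the negated log as "bits" is a common convention and is an assumption about sign, not something stated in the text.

```python
import math
from collections import Counter

def transition_probabilities(melodies):
    """Estimate P(next note | current note) from bigram counts in a corpus."""
    bigrams = Counter()
    firsts = Counter()
    for melody in melodies:
        for a, b in zip(melody, melody[1:]):
            bigrams[(a, b)] += 1
            firsts[a] += 1
    return {pair: n / firsts[pair[0]] for pair, n in bigrams.items()}

def log2_product(melody, probs):
    """Base-2 log of the product of bigram transition probabilities.

    Summing the logs is equivalent to logging the product and avoids underflow.
    """
    return sum(math.log2(probs[(a, b)]) for a, b in zip(melody, melody[1:]))

# Hypothetical toy corpus of short note sequences (not the study's data).
corpus = [
    ["C", "D", "E", "C"],
    ["C", "D", "C", "D"],
    ["C", "E", "D", "C"],
]
probs = transition_probabilities(corpus)
score = log2_product(["C", "D", "E", "C"], probs)
bits = -score  # assumed sign convention: rarer transitions carry more bits

print(f"log2 of product = {score:.2f}, information = {bits:.2f} bits")
```

In this toy corpus the three transitions have probabilities 3/4, 1/3, and 1/2, whose product is 1/8, so the melody carries 3 bits under the assumed sign convention.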
Perhaps the most interesting of the current results was that the inclusion of an interaction term to control for variations in information content by genre led to a confirmation of Simonton’s (1980b, 1994) hypothesis that a curvilinear relationship exists between fame and originality. This is evidence that genre may indeed be moderating the originality–fame relationship. Because the distribution of fame is non-normal, the relationship between fame and originality (conditioned on genre) is not a U-shaped function. Rather, Figure 3(a) shows that a high concentration of vocal songs exists at the low end of the originality spectrum, all of which have modest fame. Figure 3(b) shows a higher concentration of instrumental songs in the middle of the originality spectrum, and that the songs with the highest fame also fell within this midpoint. Comparison of the two figures also illustrates the variation in information content across genres. Instrumental songs were far more varied in their originality than were vocals.

Figure 3. Plots of information content scores and SHS citations for (a) vocal and (b) instrumental songs.
Finally, both figures show that the relationship between fame and originality is not a simple one, and further nonlinear analyses should be conducted on similar datasets to explore the complexities in this relationship. However, the current results do provide an important contribution by showing that information content is related to fame in an interpretable way, namely, that the relationship fits Simonton’s (1994) earlier assertions that there is an “optimal” level of melodic originality between highly predictable and highly unpredictable that relates to high fame.
Beyond the first six notes
In his analyses, Simonton (1994) raised the issue that an analysis of the first six notes of a melody might not capture the aesthetic experience of listening to the song. The current analysis did not measure aesthetic experience directly, but rather assessed fame in terms of citation counts. Though the relationship between fame and originality is not linear, a significant amount of variation in fame can be attributed to genre-based differences in information content. However, one important difference between this analysis and previous studies of classical music is that only one segment of music from each song was examined. It may be the case that analyses that examine different sections of popular songs (for example, the differences among refrains, verses, and bridges) will yield different results regarding the relationship between fame and originality.
However, there is another aspect of popular music that the current analysis does not examine. It can be argued that one of the main functions of popular music is that people dance to it (Wald, 2009). Indeed, Kozbelt and Burger-Pianko (2007) included metric and rhythmic variables in their analysis of the fame of lieder from Schubert’s career. Metric analysis may be less important in the Great American Songbook as nearly all of the songs in this dataset were written in 4/4, common time, or cut time. However, there are substantial rhythmic and stylistic variations across the songs, and also across different performances of each song. For example, Kurt Weill’s “Mack the Knife” (1928) was written as a kind of sinister ballad, with lyrics in German. Bobby Darin’s famous 1959 version featured a livelier feel based on Louis Armstrong’s version from 1956. The fact that the Darin version was hugely successful may owe both to the crooner’s singing style and to the swing feel, though the song was not performed as written. The current analytic system cannot differentiate between different versions of the songs, though there are efforts to do that in the music information retrieval community (e.g., Bertin-Mahieux & Ellis, 2012).
Of course, not all popular songs feature happy, lively, danceable tempos. Schellenberg and von Scheve (2012) found that over the past 50 years, popular songs have become progressively sadder sounding, featuring more minor modes and slower tempos. Indeed, since the recording industry boom in the 1960s, consumption of popular music has changed dramatically, including an increase in the identification of a song with a single, definitive recorded version (Wald, 2009). Many people can now listen to music privately with headphones, rather than relying on musicians to reproduce a song for them in the dancehall. Schellenberg and von Scheve concluded that the changes in tempo and emotional complexity in popular music are not likely attributable to a single cause. Instead, popular music content likely reflects an interaction among cultural and personal variables. Future studies should continue to examine rhythm and tempo in relation to popular song fame, as they may provide a more complete musical picture than analyses focusing simply on melody.
Analysis of lyrics
Finally, popular song lyrics are an important aspect missing from the current analysis. This is very important since the current analysis showed that vocal songs were two and a half times more likely to achieve high fame than instrumentals. Indeed, Wilder (1972) singled out Cole Porter’s lyrics as being even more important than his compositional skills. However, Schellenberg and von Scheve (2012) were skeptical about the results of studies that emphasized the importance of lyrics over musical content among more recent popular songs (e.g., Pettijohn & Sacco, 2009a, 2009b). Instead, the authors reasoned that more information could be gleaned from the mode and tempo of the songs. However, none of these studies focused on the relationship between lyrical content and fame, or long-term reception of the songs. Rather, the emphasis was on examining listeners’ recollections about the songs and surrounding life events (see also Krumhansl, 2010; Krumhansl & Zupnick, 2013). Again, Kozbelt and Burger-Pianko (2007) set a precedent for lyrical analysis by incorporating coding schemes from Martindale’s (e.g., 1990) Regressive Imagery Dictionary (RID), a text-mining tool designed to identify different psychoanalytic, emotional, and conceptual themes in text. Comparison of a text-only model to other models of fame revealed that the RID data did not account for much variance in recording counts. Instead, the authors concluded that the contributions of lyrics may be confounded by the fact that lyrics must appear within rhythm. At the same time, the RID is biased toward identifying Freudian primary and process imagery, and Kozbelt and Burger-Pianko (2007) only examined that data across the first six notes in each theme.
A far better approach to examining lyrical content is the use of either PMI scores or latent semantic analysis (Landauer & Dumais, 1997) to measure semantic similarity both within and across songs. This is much more in line with the melodic analysis systems covered in this article, and latent semantic analysis has been successfully applied to many real-world text-analysis tasks such as scoring student-authored essays (e.g., Foltz, Laham, & Landauer, 1999; Kintsch, 2002). However, one downside of both of these analyses is that they are not designed to examine emotional content, as is emphasized in recent studies of popular music lyrics (e.g., Pettijohn & Sacco, 2009a). Yet, it is easy to design natural language processing algorithms to search for keywords among lyrics, and even among song titles (e.g., “love,” “happiness”). Future studies should include these measures to more accurately assess the degree to which lyrics, over and above melodic and rhythmic attributes, influence the fame of popular songs.
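A keyword search of the kind described above might be sketched as follows. The keyword list, the sample text, and the function name are hypothetical and chosen only for illustration; a real study would need a validated word list and licensed lyric texts.

```python
import re
from collections import Counter

# Hypothetical keyword list, for illustration only.
EMOTION_KEYWORDS = {"love", "happiness", "blue", "lonely"}

def keyword_counts(text, keywords=EMOTION_KEYWORDS):
    """Count occurrences of each keyword in lowercased, tokenized text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token in keywords)

# Made-up line standing in for a lyric or song title.
counts = keyword_counts("Love is here; my love, no more lonely nights.")
print(counts)
```

Exact token matching of this kind misses inflected forms (e.g., “loved,” “loving”), so stemming or a broader dictionary would likely be needed in practice.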
Limitations
Despite the intriguing results that emerged from the analysis, a few limitations should be addressed. Though a substantial number of low-fame works were included in this sample, at most the dataset captured about 10% of the songs from any one songwriter. A concurrent study (Hass & Weisberg, 2015) of the full careers of five of the most prolific songwriters included in the database shows that about 60% of songs fail to achieve any fame at all. The current database was restricted to songs from well-known songbooks. Such books are designed for performing musicians, meaning that lesser-known songs might be excluded. At the same time, Figure 1 shows that the recording counts resembled the Poisson distribution that is to be expected for such data (e.g., Simonton, 2003).
As always, archival analyses such as this one suffer from generalizability issues. However, this corpus represents an excellent cross-section of songs written by a large number of composers and deliberately intended for mass consumption. Thus, the corpus represents a good testing ground for hypotheses about aesthetics, originality, and fame that could be generalized to similar product domains.
Conclusion
The current analysis sheds light on the relationship between fame and originality among American popular songs written in the first half of the 20th century. Two results from classical music studies—an increase in originality over historical time and a non-linear relationship between fame and originality—were replicated. However, originality and fame both varied substantially between vocal and instrumental melodies, and that difference seems to be moderating the fame–originality relationship. The most positive result from the current analysis is that information content is an excellent way to quickly assess the originality of a melody, and is also a construct well studied by music psychologists. Future work in this area will continue to shed light on this fascinating relationship between melodic content and song success.
Appendix
Sources of music:
The Real Book: Volume I. Hal Leonard: 2007
The Real Book: Volume II. Hal Leonard: 2005
The Standards Real Book. Sher Music Co: 2000
The New Real Book: Volume I. Sher Music Co: 1988
The New Real Book: Volume II. Sher Music Co: 2005
The New Real Book: Volume III. Sher Music Co: 2005
Author’s note
All of the data along with the algorithms for producing the dependent variables are available for download at the author’s Open Science Framework account (direct link: osf.io/k6cz8).
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
