Abstract
The majority of research on music aesthetics treats music and lyrics as discrete entities, despite the artistic imperative that they should relate to one another in some way. This research used computerized analysis of both the music and lyrics of all songs that reached the weekly UK top five singles chart from January 1999 to December 2013 (N = 1,414). The findings indicate that the typicality of a given set of lyrics relative to the corpus as a whole was associated with their popularity; that there were numerous associations between each of six mood scores assigned to the music and various aspects of the lyrics (e.g., passionate music was associated with lyrics addressing hardship and less concern with precise numerical terms); and that the relative contribution of the lyrics and music to overall popularity varied according to how popularity was operationalized so that, for instance, music and lyrics contributed equally to explaining peak chart position, whereas music outperformed lyrics in explaining the number of weeks spent in the top five. Pop music and its lyrics are related to one another, and the relationship can be explained to some extent via existing concepts in the aesthetics literature.
Advances in desktop computing power have facilitated a number of recent studies that present content analyses of music, or of the accompanying lyrics, based on an entire corpus or a large sample from one. The great majority of this work (and other research on music aesthetics) has treated the music per se and the accompanying lyrics as two discrete entities: in some cases this has led experimental researchers to employ only instrumental music, and in others it has led researchers in a number of specific fields simply to neglect the possible relationship between music and lyrics. This seems to lack ecological validity, particularly in the case of popular genres that usually do feature lyrics, and denies the artistic reality that lyrics and music are often written in the belief that they in some way complement one another, and that the lyrics must presumably, therefore, contribute to the popularity of the track in question. This research attempts to address this by directly considering various relationships between music and accompanying lyrics across 1,414 songs that reached the United Kingdom top five singles chart between 1999 and 2013, and considering these from an explicitly psychological perspective. Specifically, we aimed to identify whether the typicality of the lyrics can predict popularity (as typicality can predict the popularity of music), to map the relationship between musical and lyrical content and so determine what kinds of music tend to be paired with what kinds of lyrics, and to assess the relative contribution of music and lyrics to the popularity of a given track.
While the nascency of corpus-level work concerning music aesthetics means that the literature is inevitably disparate, three themes have emerged to date. These concern, respectively, content analyses that attempt to illustrate the psychological features of a given musical corpus (e.g., Czechowski, Miranda, & Sylvestre, 2016; de Clercq & Temperley, 2011; Everett, 1999; Gauvin, 2015; Jackson & Padgett, 1982; Kreyer & Mukherjee, 2009; Petrie, Pennebaker, & Sivertsen, 2008; Van Sickel, 2005); to predict the commercial success of music based on characteristics of the music and musicians (e.g., Bradlow & Fader, 2001; Giles, 2007; Hong, 2012; Pettijohn & Ahmed, 2010); and to consider the relationship between music (particularly pop music) and various social psychological and socioeconomic indicators (DeWall, Pond, Campbell, & Twenge, 2011; McAuslan & Waung, 2018; Neuman, Perlovsky, Cohen, & Livshits, 2016; Pettijohn, Eastman, & Richard, 2012; Pettijohn & Sacco, 2009; Zullow, 1991).
In addition to the disparate nature of the subject matter of this existing work, there is a corresponding lack of theoretical coherence between these studies. However, some indication of a possible fruitful theoretical approach is provided by a much larger body of corpus-level research carried out by Dean Simonton. He has demonstrated that the extent to which art works are original or typical of the corpus as a whole has implications for various aesthetic outcomes (see review by Simonton, 1997). Much of Simonton’s work concerning specifically music has focused on the concept of melodic originality, which was operationalized as the statistical probability of the transitions between notes within a given musical theme relative to the preponderance of these transitions across the corpus. Simonton (1980), for example, found increasing levels of melodic originality over the lifespan of 479 composers. The same research also found evidence of what he termed an “inverted backwards-J” shaped relationship between melodic originality of the 15,618 themes by these composers and their popularity. In a similar vein, Hass’s (2016) analysis of 500 early American popular songs found that melodic originality increased between 1916 and 1960, and that there was a curvilinear relationship between melodic originality and the popularity of the music.
A reasonable body of experimental evidence published from the 1980s onwards has taken a more theoretical approach in similarly indicating that positive aesthetic responses are predicted by the extent to which the artistic works in question are typical of the class from which they are drawn. Although various authors express this in slightly different ways, the common thread to all is that typical instances are more easily classified, and that it is this ease of classification that drives positive responses. In support of this, Martindale and Moore’s (1989) experimental research showed that typicality accounted for 51% of the variance in liking for music. On a larger scale, North, Krause, Sheridan, and Ritchie (2017) analyzed a larger database (from which a subset is employed here) showing that, among 143,353 pieces that had achieved any commercial success in the United Kingdom, there was a positive relationship between the extent to which each was typical of the corpus and the duration of commercial success. It is notable, moreover, that these findings parallel other recent research by Nunes, Ordanini, and Valsesia (2015) which presented experimental evidence that lyrics containing repetition can be processed more fluently, and corpus-level findings indicating that more easily-processed lyrics are more likely to reach number 1 positions in music sales charts, and do so more quickly. Therefore, the first hypothesis of this research was that the typicality of any given piece of music or set of lyrics relative to the corpus as a whole should each predict popularity, such that higher typicality is associated with higher popularity.
Second, numerous autobiographies and similar non-empirical sources describe attempts by musicians to compose lyrics and music that complement one another by expressing similar themes and moods (although see Simonton, 2000). The notion here is that musicians are subject to an artistic imperative to ensure that music and any lyrics in some way align with one another in order to facilitate communication, although we are not aware of any research on this. To provide just one well-known anecdotal example, however, John Lennon and Paul McCartney (The Beatles, 2000) have described how their early commercial releases (e.g., “From Me to You”) deliberately matched the relatively simple musical structures with lyrics focussing on first person pronouns, with the goal of maximizing immediate and direct communication with the listener. Given this, Hypothesis 2 was simply that we might also expect to find a positive relationship between the mood evoked by the music and the subject matter and mood evoked by the lyrics (across a large number of specific variables). Confirmation of such would, therefore, provide an initial mapping of the relationship between the content of music and lyrics.
This research also tests a third hypothesis concerning the relative contribution of music and lyrics in predicting popularity, given that much of the literature on music aesthetics explicitly ignores lyrics. Simonton (2000) considered this issue in the case of opera, using 911 works by 59 composers. He argued that although there are well-known exemplars of composers and librettists receiving equal credit for their work (e.g., Gilbert and Sullivan), opera audiences are often content to attend performances sung in a foreign language that would be understood by presumably only a (potentially small) proportion of them. In apparent concordance with this, Simonton showed that almost half of the variance in the degree of aesthetic success of the operas he considered could be explained by the identity of the composer, and that composers exerted a greater influence on the success of the work than do librettists. However, although there is no reason to doubt this conclusion in the context of the corpus of opera, music sales charts in many countries are dominated typically by lyrics sung in the predominant language(s) of the country in question, implying that these lyrics are important to listeners, and it seems reasonable to make the working assumption that lyrics are so prevalent in best-selling music partly because they provide an opportunity for direct and specific communication. Indeed, there is a small literature which explicitly indicates that poetry can, of course, elicit strong emotional responses and that these are analogous to responses to music (e.g., Zeman, Milton, Smith, & Rylance, 2013). As such, we might expect that when lyrics are (typically) in the predominant language of the audience, there is a greater scope for them to influence the popularity of the song in question. 
In short, the predominance of music over lyrics in predicting popularity may not apply (at least as strongly as in opera) to pop music sales charts, and the present data set presents an opportunity to test this. As such, Hypothesis 3a was that aspects of the music per se might predict popularity better than do aspects of the lyrics, consistent with Simonton’s findings concerning opera; although Hypothesis 3b was that this relationship might not be found, or even reversed, in the pop music considered here, such that lyrics predict popularity better than does music.
These issues were investigated using a database of 1,414 pieces of music, representing all those to have reached the top five on the weekly UK singles sales charts between 1999 and 2013. Both the lyrics and the music were computer analyzed according to a number of variables, and in the case of H1 and H3 were compared against four measures of popularity, given corresponding evidence in the experimental aesthetics literature showing that different measures of “hedonic tone” have different relationships with various predictor variables (e.g., Marin et al., 2016).
Method
This study employed a data set featuring all those individual songs that reached the weekly top five singles chart positions in the United Kingdom from January 1999 through December 2013. The top five (rather than, for instance, the top 10) was selected as the cut off simply to manage the workload associated with data collection. While previous research has addressed the song lyrics in order to investigate different hypotheses (e.g., Krause & North, 2019a, 2019b; North, Krause, Kane, & Sheridan, 2018), this study combines these with variables concerning the associated music per se (detailed below). Chart data were sourced from www.officialcharts.com, and reflect the charts used by the British Broadcasting Corporation (BBC): throughout the period in question, the BBC had the majority of radio audience share, and the chart formed the basis of the playlist employed in daytime music programming (by both the BBC and a large number of commercial radio competitors). Note that although 1,565 songs reached positions one to five on the weekly UK charts from 1999 to 2013, data concerning both the lyrics and the music were available for only 1,414 songs: for example, a number were instrumentals, and for a small number of others it was not possible to determine reliably which of several versions had achieved the greatest public prominence. The analyses were therefore run on this set of 1,414 songs.
Lyrics variables
As detailed in North et al. (2017), North, Krause, Kane, and Sheridan (2018) and North, Krause, Sheridan, and Ritchie (2019b), song lyrics were sourced from various web sites (e.g., www.azlyrics.com), corroborated against a second source, checked for completeness (i.e., through reinstatement of omitted redundancies arising from instances of “chorus x2” or “repeat first verse”), and processed for language consistency (i.e., to ensure correction of misspellings and consistent use of contractions and truncations). Computerized text analysis software, Diction 7.0 (Hart, Carroll, & Spiars, 2013), was then used to analyze each set of lyrics. Diction compared each set of lyrics against a set of approximately 10,000 words, organized into lists that serve, respectively, as 36 variables, that were originally developed via analysis of 20,000 texts (Abelman, 2014; Sydserff & Weetman, 2002). For each instance of a word occurring in the lyrics that also appeared in the word list for a given variable, 1 was added to the score for that variable. In addition to the 36 variables, Diction calculates five composite variables (known as “master variables,” namely, certainty, optimism, activity, realism, and commonality, respectively) via combinations of the main variables (Huffaker & Calvert, 2005; Short & Palmer, 2008): details of the calculation of the 5 composite variables and of the 36 discrete variables are presented in Table 1. For each song, the scores produced by Diction on each variable were divided by the number of words in the text in question (to control for this, given that the lyrics were of differing lengths), and then multiplied by 1,000 to facilitate presentation. Note that Cook and Krupar (2010) used Diction previously to analyze song lyrics from the Great Depression era, and that the software has been employed in over 300 published studies to date (www.dictionsoftware.com/published-studies/), several of which have employed a variety of media texts.
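The scoring and length normalization described above can be illustrated with a minimal sketch. It is not Diction itself: the tokenizer, the function name `diction_style_scores`, and the two toy word lists are assumptions made purely for illustration, and Diction's actual dictionaries and any additional processing it applies are not reproduced here.

```python
import re
from collections import Counter


def diction_style_scores(lyrics, dictionaries):
    """Count word-list hits and express them per 1,000 words of lyrics.

    `dictionaries` maps a variable name to a list of words (hypothetical
    stand-ins for Diction's word lists).
    """
    # Simple lower-case tokenizer (an assumption, not Diction's own)
    words = re.findall(r"[a-z']+", lyrics.lower())
    if not words:
        raise ValueError("lyrics contain no words")
    counts = Counter(words)
    scores = {}
    for name, word_list in dictionaries.items():
        # Add 1 to the variable's score for every occurrence of a listed word
        hits = sum(counts[w] for w in set(word_list))
        # Divide by text length, then multiply by 1,000 for presentation
        scores[name] = hits / len(words) * 1000
    return scores


# Hypothetical toy word lists and lyrics, for illustration only
toy_dictionaries = {
    "self_reference": ["i", "me", "my", "mine"],
    "motion": ["run", "walk", "fly"],
}
toy_lyrics = "I run and I fly to you, you stay with me"
scores = diction_style_scores(toy_lyrics, toy_dictionaries)
```

Dividing by the word count before scaling means that a long and a short lyric with the same proportion of, say, self-referential words receive the same score.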
Table 1. Summary of the “Diction” dictionaries (taken from Hart, 1997).
The measure of the typicality of the lyrics was based on that used by North et al. (2017) and North et al. (2019b) and employed the five composite dictionaries, since in conjunction they provide “the most general understanding of a given text,” and were created explicitly to facilitate comparison between texts (Hart et al., 2013, p. 4). In order to calculate typicality, mean values were calculated across the corpus of lyrics from 1999 to 2013 for each of the composite variables in turn. For each song, the difference was then calculated between its score on each composite variable and the corpus mean score for that variable. Any negative values were multiplied by −1 so that the score represented the magnitude of difference from the corpus irrespective of the direction of this difference. The typicality score for each set of lyrics was then calculated as the sum of the difference scores for each of the composite variables in turn. Note, therefore, that high scores indicate atypicality relative to the corpus, and low scores indicate typicality relative to the database.
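The typicality calculation described above amounts to a sum of absolute deviations from the corpus means. The following is a minimal sketch under that reading; the function name `atypicality_scores` and the three toy songs are hypothetical, and the values are invented for illustration rather than taken from the data set.

```python
from statistics import mean

# The five Diction composite ("master") variables named in the text
COMPOSITES = ["certainty", "optimism", "activity", "realism", "commonality"]


def atypicality_scores(songs, variables=COMPOSITES):
    """Sum of absolute differences from the corpus mean on each variable.

    High values indicate atypical lyrics; low values indicate typical lyrics.
    """
    corpus_means = {v: mean(s[v] for s in songs.values()) for v in variables}
    return {
        title: sum(abs(s[v] - corpus_means[v]) for v in variables)
        for title, s in songs.items()
    }


# Hypothetical toy corpus (invented values)
toy_corpus = {
    "song_a": {"certainty": 50, "optimism": 50, "activity": 50,
               "realism": 50, "commonality": 50},
    "song_b": {"certainty": 53, "optimism": 47, "activity": 50,
               "realism": 50, "commonality": 50},
    "song_c": {"certainty": 47, "optimism": 53, "activity": 50,
               "realism": 50, "commonality": 50},
}
typicality = atypicality_scores(toy_corpus)
```

In this toy corpus, `song_a` sits exactly at the corpus mean on every variable and so receives the lowest (most typical) score, whereas `song_b` and `song_c` deviate in opposite directions but receive the same score because only the magnitude of the difference is counted.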
Music variables
Data concerning the musical component of each song were sourced from an existing data set, created in partnership with a private sector music organization (see details in North et al., 2017; North, Krause, Sheridan, & Ritchie, 2018, 2019a, 2019b). As detailed in North, Krause, Sheridan, and Ritchie (2018) and North et al. (2019b), a trained artificial intelligence (AI) process used algorithms to analyze and produce scores for each track concerning its degree of energy, beats per minute (BPM), and the extent to which it represented each of six mood clusters (namely, clean, simple relaxing; happy, hopeful, ambition; passion, romance, power; mystery, luxury, comfort; energetic, bold, outgoing; and calm, peace, tranquility). Energy and mood scores were based on analysis of each piece in terms of 69 differing combinations of 11 sonic properties (e.g., pitch, rhythm). In the case of energy scores, the AI process was trained on the basis of 200 exemplar tracks containing what were thought to be calming and energetic pieces, which the AI then learned to classify. In the case of mood ratings, the AI was trained via human ratings of 300 seed tracks. In the case of both energy and mood ratings, the AI then assigned values to each piece in the database on the basis of its similarity with others in terms of the 69 combinations of 11 sonic properties. The process by which the AI was developed and validated is detailed in US Patent No. 20100250471 (2010) and US Patent No. 20080021851 (2008). The BPM was analyzed via an algorithm developed from an industry standard, open source C++ library (see http://essentia.upf.edu): measures were taken every 30 s, and the average was calculated to produce a single score per track. The typicality score for each piece of music was produced by first calculating a mean value across the corpus for each of energy, BPM, and the six respective mood scores. 
As with the lyrics, for each song, the difference was then calculated between its score on each variable in turn and the corpus mean for that variable, any negative values were multiplied by −1, and the typicality score for each piece of music was then calculated as the sum of the differences on each variable from the corpus mean. Note, therefore, again that high scores indicate atypicality relative to the corpus, and low scores indicate typicality relative to the database. There are four published papers, North et al. (2017, 2018, 2019a, 2019b) which have previously employed the AI process adopted here to quantify musical variables and the popularity of commercially released music: these used 204,506 pieces that had enjoyed commercial success in the United States and a further 143,353 pieces that had enjoyed commercial success in the United Kingdom, and showed that the popularity and emotional content of this music were broadly consistent with theoretical predictions based upon the literature in experimental aesthetics that has employed human participants.
Popularity
Given Marin et al.’s (2016) argument that hedonic tone (i.e., the favorableness of an aesthetic response) is not a unitary construct, the popularity of each track was operationalized in four ways. Two measures were based on chart performance during 1999–2013, namely, (a) the peak chart position reached (one to five) for each song and (b) the cumulative number of weeks each song spent in positions one to five. In addition, two popularity scores from the broader music data set (North et al., 2017) were employed, namely, “United Kingdom hit popularity” and “United Kingdom hit appearance,” which aimed to provide a wider-ranging indication of the popularity of the songs. As detailed by North et al. (2017), the hit popularity score is based on UK sales chart information, incorporating charts that are general, genre-specific, format-specific (i.e., singles charts and charts concerning sales of albums on which the given song featured), and regional (e.g., Scottish): to produce a single score for each song, these data are weighted by the generality of the chart in question (e.g., the UK singles chart was assigned a weighting of 1 whereas appearance of the song on an album that featured in the UK albums chart was assigned a weighting of 0.5), and the variable gives an overall picture of the popularity of the song in question across various sales charts. For each track per chart, popularity was then operationalized by calculating the sum of 1 divided by (peak chart position multiplied by chart weighting). The hit appearance score is calculated as simply the number of weeks a song appeared on the top 40 charts, irrespective of numeric position, and provides an overall indication of the duration of the commercial success of a given song.
Note that while data concerning peak chart position and number of weeks in positions one to five concern specifically the period from 1999 to 2013, the UK hit popularity and UK hit appearance measures draw on chart information dating back to 1962 to provide a more general overview of the cultural prominence of a given song over a very extended period of time.
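The hit popularity calculation can be sketched as follows. This follows the wording above literally, summing 1 / (peak chart position × chart weighting) across charts; whether the weighting was intended to sit in the denominator cannot be confirmed from the description alone, so this should be read as an assumption. The function name and the example chart entries are hypothetical.

```python
def hit_popularity(chart_entries):
    """Sum of 1 / (peak position * chart weighting) across charts.

    `chart_entries` is an iterable of (peak_position, chart_weighting)
    pairs, one pair per chart on which the song appeared.
    """
    total = 0.0
    for peak, weight in chart_entries:
        if peak < 1 or weight <= 0:
            raise ValueError("peak must be >= 1 and weighting must be > 0")
        total += 1 / (peak * weight)
    return total


# Hypothetical song: peaked at 2 on a chart weighted 1.0,
# and at 4 on a chart weighted 0.5
example_entries = [(2, 1.0), (4, 0.5)]
example_score = hit_popularity(example_entries)
```

Because the score uses the reciprocal of peak position, a number 1 peak contributes more than any lower peak on the same chart, and each additional chart appearance can only increase the total.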
Results
Hypothesis 1 was that the typicality of the music and lyrics should each predict popularity. The lyrics typicality score and music typicality score were used to predict each of the four popularity measures in turn, using one separate General Linear Mixed Model (GLMM) analysis for each respective measure of popularity (α < .013, that is, .05/4). The results are shown in Table 2. This shows that in the case of the number of weeks in the top five and UK hit appearance, the models were statistically significant, and the typicality scores concerning both the lyrics and the music were related negatively to popularity (and note the direction of scoring in the typicality variables, such that these negative relationships indicate that more typical music and lyrics were more popular). In the case of peak chart position, however, the GLMM model was non-significant although the lyrics typicality scores were related positively to popularity, and in the case of UK hit popularity, the model was non-significant, although typicality of the lyrics was related negatively to popularity.
Table 2. GLMM analysis results concerning Hypothesis 1.
GLMM: General Linear Mixed Model; CI: confidence interval.
Degrees of freedom for predictor variables = 1, 1338.
Overall model: F(2, 1338) = 2.366, p = .094, ηp2 = .004.
Overall model: F(2, 1338) = 7.169, p = .001, ηp2 = .011.
Overall model: F(2, 1338) = 2.897, p = .056, ηp2 = .004.
Overall model: F(2, 1338) = 8.53, p < .001, ηp2 = .013.
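As a consistency check on the overall model statistics above, partial eta squared can be recovered from an F value and its degrees of freedom using the standard identity ηp2 = (F × df1) / (F × df1 + df2). The sketch below assumes that the reported effect sizes follow this identity; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df1, df2, reported partial eta squared) for the four overall models
reported = [
    (2.366, 2, 1338, 0.004),
    (7.169, 2, 1338, 0.011),
    (2.897, 2, 1338, 0.004),
    (8.53, 2, 1338, 0.013),
]
recomputed = [
    round(partial_eta_squared(f, df1, df2), 3) for f, df1, df2, _ in reported
]
```

Under this identity, the recomputed values agree with the reported values to three decimal places.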
Hypothesis 2 was that we might expect to find a positive relationship between the mood evoked by the music and the subject matter and mood evoked by the lyrics. To test this, a series of GLMM analyses were carried out, with each analysis investigating the extent to which each of the six respective music mood scores could be predicted by the lyrics variables. For each of the music mood scores, first, separate GLMM analyses were conducted employing each of the 41 Diction variables individually as predictor variables (see Appendix 1). Only those Diction variables demonstrating a significant relationship (α < .05) with the criterion variable were retained for the second step, and the results of these analyses (α < .008, that is, .05/6) are detailed in Table 3. These show that scores for the music as “Clean, simple, relaxing” were related positively to the number of different words, self-reference (i.e., references to the first person), and motion (i.e., terms concerning movement, physical processes, journeys, and speed). Scores for the music as “happy, hopeful, ambitious” were related negatively to the lyrics demonstrating aggression (i.e., depictions of competition and forceful action), accomplishment (i.e., words concerning task completion and organized behavior), and commonality (i.e., language concerning agreed upon values of a group). Scores for the music conveying “passion, romance, and power” were related positively to lyrics containing instances of leveling (i.e., words that ignore individual differences and which convey completeness and assurance) and hardship (i.e., words concerning natural disasters, hostile action, and censurable behavior), and negatively to lyrics containing instances of numerical terms (i.e., instances of numbers, dates, arithmetical operations, and other quantitative terms), cooperation (i.e., words concerning behavioral interactions leading to a group product), and embellishment (i.e., a high ratio of adjectives to verbs). 
Scores for the music conveying “mystery, luxury, and comfort” were related positively to the number of different words, and negatively to the lyrics containing instances of aggression and diversity (i.e., words describing individuals or groups who differ from the norm). Scores for the music as “energetic, bold, and outgoing” were related positively to the lyrics conveying instances of collectives (i.e., singular nouns concerning plurality concerning social groups, task groups, and geographical entities), and negatively to the number of different words in the lyrics, and to them containing instances of self-reference, spatial awareness (i.e., words concerning geographical terms, physical distance, and measurement), and exclusion (i.e., words concerning the causes and consequences of social isolation). Finally, scores for the music conveying “calm, peace, and tranquility” were related positively to the number of different words in the lyrics, instances of them conveying ambivalence (i.e., words concerning hesitation or uncertainty) and leveling, and negatively to instances of them conveying satisfaction (i.e., words denoting positive affective states and nurturance).
Table 3. GLMM analysis results pertaining to Hypothesis 2 concerning mood.
GLMM: General Linear Mixed Model; CI: confidence interval.
Overall model: F(18, 1391) = 5.703, p < .001, ηp2 = .069. Predictor degrees of freedom = 1, 1391.
Overall model: F(24, 1385) = 17.858, p < .001, ηp2 = .236. Predictor degrees of freedom = 1, 1385.
Overall model: F(10, 1399) = 14.017, p < .001, ηp2 = .091. Predictor degrees of freedom = 1, 1399.
Overall model: F(20, 1389) = 5.655, p < .001, ηp2 = .075. Predictor degrees of freedom = 1, 1389.
Overall model: F(21, 1388) = 13.541, p < .001, ηp2 = .170. Predictor degrees of freedom = 1, 1388.
Overall model: F(10, 1399) = 19.335, p < .001, ηp2 = .121. Predictor degrees of freedom = 1, 1399.
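The decision rule behind the two-step procedure used for Hypothesis 2 can be sketched as follows. The sketch covers only the thresholds: the p values are supplied as inputs rather than estimated, and the function name, variable names, and p values shown are all hypothetical. In the actual analyses these values came from the GLMMs.

```python
def two_step_screen(step_one_p, step_two_p, n_tests,
                    screen_alpha=0.05, family_alpha=0.05):
    """Apply the screening and corrected thresholds described in the text.

    Step one: retain predictors whose individual p value is below
    `screen_alpha`. Step two: among retained predictors, flag those whose
    p value in the combined model is below family_alpha / n_tests.
    """
    retained = [v for v, p in step_one_p.items() if p < screen_alpha]
    corrected_alpha = family_alpha / n_tests
    significant = [v for v in retained if step_two_p[v] < corrected_alpha]
    return retained, corrected_alpha, significant


# Hypothetical p values for one music mood score
step_one = {"self_reference": 0.001, "motion": 0.03, "aggression": 0.40}
step_two = {"self_reference": 0.002, "motion": 0.02}

# Six mood scores, hence a corrected threshold of .05 / 6
retained, alpha, significant = two_step_screen(step_one, step_two, n_tests=6)
```

In this invented example, `aggression` is dropped at the first step, `motion` survives screening but misses the corrected threshold of approximately .008, and only `self_reference` is treated as significant.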
Hypotheses 3a and b concerned whether characteristics of the music predicted popularity better than did the characteristics of the lyrics or vice versa. To test this, all the variables concerning music and lyrics (excepting the typicality scores) were entered into GLMM analyses using the same two-step method used to test Hypothesis 2 (step one results are illustrated in Appendix 2). Separate analyses were carried out for each of the four measures of popularity (namely, peak chart position, number of weeks in the top five, UK hit popularity, and UK hit appearance, respectively), and the results are detailed in Table 4 (α < .013, that is, .05/4) along with the mean effect size for the music and lyrics variables within each test, respectively (based on the individual predictor variable effect sizes), so that the mean effect sizes demonstrate the relative utility of music and lyrics in predicting popularity. Music and lyrics contributed equally to explaining peak chart position; music outperformed lyrics in explaining the number of weeks spent on the top five; lyrics outperformed music in explaining UK hit popularity; and lyrics outperformed music in explaining UK hit appearance.
Table 4. Results of the GLMM analyses testing Hypothesis 3.
GLMM: General Linear Mixed Model; CI: confidence interval; BPM: beats per minute.
Overall model: F(12, 1396) = 15.132, p < .001, ηp2 = .115. Predictor variables degrees of freedom = 1, 1396.
Overall model: F(20, 1388) = 8.768, p < .001, ηp2 = .112. Predictor variables degrees of freedom = 1, 1388.
Overall model: F(11, 1329) = 6.277, p < .001, ηp2 = .049. Predictor variables degrees of freedom = 1, 1329.
Overall model: F(21, 1388) = 8.205, p < .001, ηp2 = .110. Predictor variables degrees of freedom = 1, 1388.
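The comparison of music and lyrics in Hypothesis 3 rests on the mean of the individual predictor effect sizes within each domain. A minimal sketch of that comparison is given below; the function name and the effect sizes are hypothetical and are not the values from Table 4.

```python
from statistics import mean


def mean_effect_by_domain(effects):
    """Average the predictor effect sizes separately for music and lyrics.

    `effects` is a list of (variable, domain, partial_eta_squared) tuples,
    where domain is either "music" or "lyrics".
    """
    by_domain = {"music": [], "lyrics": []}
    for _variable, domain, eta in effects:
        by_domain[domain].append(eta)
    # Skip a domain with no retained predictors rather than divide by zero
    return {d: mean(values) for d, values in by_domain.items() if values}


# Hypothetical effect sizes for one popularity measure
toy_effects = [
    ("energy", "music", 0.010),
    ("bpm", "music", 0.006),
    ("self_reference", "lyrics", 0.004),
    ("optimism", "lyrics", 0.002),
]
domain_means = mean_effect_by_domain(toy_effects)
```

With these invented values the music mean exceeds the lyrics mean, which corresponds to the pattern described as music outperforming lyrics for that measure.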
Discussion
In summary, there was evidence that the typicality of a given set of lyrics relative to the corpus as a whole was associated with their popularity; there were numerous associations between each of six mood scores assigned to the music and various aspects of the lyrics (e.g., passionate music was associated with lyrics addressing hardship and less concern with precise numerical terms); and the relative contribution of the lyrics and music to overall popularity varied according to how popularity was operationalized so that, for instance, music and lyrics contributed equally to explaining peak chart position, whereas music outperformed lyrics in explaining the number of weeks spent in the top five. In the following paragraphs, we unpack these findings in more detail and address their theoretical consequences.
Hypothesis 1 stated that the typicality of the music and lyrics of any given song relative to the corpus should predict each of the four measures of the popularity of the song in question. This hypothesis was based on earlier, predominantly lab-based, research indicating that typicality is related positively to aesthetic responses. Only the models concerning the number of weeks on chart and UK hit appearance were statistically significant. The pattern of results concerning these was consistent, however, illustrating that within the individual tests, the typicality scores concerning both the music and lyrics were negatively related to the popularity measure in question, so that more typical music and lyrics enjoyed more popularity. Thus, these findings partially support Hypothesis 1 and the lab-based findings of previous research that typicality should promote popularity. They do so in the context of much more naturalistic musical stimuli and measures of popularity than have been studied hitherto.
Hypothesis 2 stated that, as a consequence of artistic goals, we might expect that the subject matter and mood of lyrics should reflect properties of the music in a manner that implies that each is composed to complement the other. The results showed that each of the six mood scores assigned to the music could be predicted by the lyrics variables. Two aspects of these findings are particularly notable. First, there was clear evidence that musicians employ lyrics that either complement or compensate for the mood of the music in a rather literal manner. To provide some selective examples of this for the sake of clarity, happy music was associated with lyrics containing lower levels of aggression; passionate music was associated with lyrics addressing hardship and lower levels of concern with precise numerical terms, cooperation, and embellishment; mysterious and luxurious music was associated with lyrics containing a larger number of different words (which increases potential ambiguity) and lower levels of aggression; music that was energetic, bold, and outgoing was associated with lyrics that concerned collective groups of people and associated negatively with lyrics addressing exclusion; and music that was calm, peaceful, and tranquil was associated with lyrics that were ambivalent. The lack of previous research makes it very difficult to comment on the theoretical implications of this with any certainty. However, in the light of the findings concerning typicality (Hypothesis 1), one possibility is a good candidate for further research. As noted earlier, lab-based research on typicality has argued that this is positively related to aesthetic responses because typical stimuli are more easily processed. We might expect that complementary lyrics and music facilitate processing of one another and so enhance the listener’s understanding of the intended message. 
For instance, if music and lyrics complement one another, then we might expect to find greater agreement between listeners on the intended meaning of a given song, or that listeners would reach these judgments more quickly, than when the music and lyrics do not complement one another.
A second aspect of the findings concerning Hypothesis 2 is that there were also a number of relationships concerning other variables that cannot be explained in terms of musicians simply matching the qualities of the music to the qualities of the lyrics in a rather literal manner. Instead, the results provide a clear indication of how musicians have tended to match a number of specific musical properties to a number of specific lyrical properties in a more abstract, artistic manner. More simply, the quantity of significant relationships provides some detailed insight into the creative process concerning pop music by telling us which musical and lyrical properties musicians tend to “feel” are appropriately matched to one another, even though these specific relationships are not intuitive. For instance, Table 3 indicates that scores for the music as clean, simple, and relaxing were related positively to scores for the lyrics on self-reference; scores for the music as happy, hopeful, and ambitious were related negatively to scores for lyrics on accomplishment; and scores for the music as expressing mystery, luxury, and comfort were related negatively to scores for the lyrics on diversity. The nascency of research on the relationship between music and lyrics makes it very difficult to propose confident theoretical explanations as to why these relationships might exist, but the sheer fact of their existence across such a large cohort and range of variables which reflect the daily music listening of the United Kingdom means that these relationships should be a candidate for future theorizing. 
For instance, some specific hypotheses raised by the present findings, which may be tested by future work with practicing musicians, are that the tendency to pair clean, simple, and relaxing music with lyrics containing self-reference is because the undemanding nature of the music provides a clear opportunity for complex self-reflection; the tendency to pair happy, hopeful, and ambitious music with lyrics addressing commonality of values between people reflects a collectivist, utopian worldview on the part of musicians; the tendency to avoid pairing passionate, romantic, and powerful music with lyrics containing numerical terms and embellishment may reflect an attempt to convey a rousing call to action that lacks sophistication and qualification; the tendency to avoid pairing music that conveys mystery, luxury, and comfort with lyrics that address diversity may similarly reflect an attempt to deliberately avoid acknowledging any subtlety of argument and instead focus upon homogeneity; the tendency to pair music that is energetic, bold, and outgoing with lyrics concerning collective groups of people and lower numbers of different words again arguably reflects a deliberate strategy for producing an unsophisticated, rabble-rousing call to action; and the tendency to pair music conveying calm, peace, and tranquility with lyrics containing a larger number of different words and lower levels of satisfaction suggests that the song is used to produce an opportunity for expressing detailed and complex concerns.
Hypothesis 3a, following Simonton’s (2000) earlier research on opera, was that musical variables should outperform lyrical variables as predictors of popularity, whereas Hypothesis 3b was that lyrical variables may perform much better in predicting popularity given that the lyrics of the United Kingdom’s best-selling pop songs are usually in English. Mean effect sizes demonstrated that music variables outperformed lyrics variables in predicting the number of weeks spent in the top five, and music and lyrics variables performed equally in predicting peak chart position, whereas lyrics variables were better than music variables in predicting UK hit appearance and UK hit popularity. The relative importance of music and lyrics in predicting popularity therefore differs between the various predictor variables and according to the precise operationalization of popularity, which lends more weight to Hypothesis 3b than to Hypothesis 3a. However, the greater importance of the lyrics in predicting the two longer-term and more general measures of popularity (UK hit popularity and UK hit appearance) than in the two popularity measures derived solely from top five singles sales charts suggests that lyrics have a longer-term relationship with general popularity, whereas music per se is associated more closely with the shorter-term, very high levels of popularity that are required for appearance of the song in the top five singles chart.
Before concluding, we should note a number of limits to the generalizability of the present findings and the possibilities for further research that these raise. Music is of course a cultural product, and the present findings relate to only those songs that reached the weekly UK top five singles chart between 1999 and 2013. They may not be replicable in different countries or different historical periods. It is notable, however, that the top five singles represented the basis of radio broadcasting in the United Kingdom throughout the period in question, and so do provide good coverage of the music to have reached public prominence in that country. As such, the findings may well have relevance for market testing of new music prior to commercial release, and suggest that this should overtly address (a) the typicality of both music and lyrics and (b) the extent to which the vocabulary of the lyrics (and perhaps also the means of their delivery) complements the characteristics of the music. Nonetheless, the discrepancy between the present results and those of both Simonton (2000) concerning opera and lab-based research on typicality indicates the need for work of this nature to be carried out via a variety of research methods, on a number of different bodies of music, and potentially on a culture-by-culture basis. The present findings are perhaps of more value as an early indicator of what may be possible, rather than as an explicit guide concerning what should immediately be done by those working in the music industry. We note also that the means of measuring typicality employed here, which is reasonably novel apart from its use in North et al. (2017) and North et al. (2019b), may be a fruitful technique for the music industry to adopt, given that commercially available music is already digitized.
We should also highlight the small effect sizes associated with the significant results reported here. These seem tolerable for three reasons. First, a range of commercial factors distorts the market for pop music and militates against finding any relationships at all among the variables considered here: even small effect sizes are potentially very interesting in this commercial context. Second, given the complexity of music, it seems highly plausible that a very large number of variables could be implicated in the issues investigated here: when investigating the relationship between any two specific variables, it would be surprising if anything but small effect sizes resulted. Third, the reliance of this research on pre-existing data sources inevitably limits the adequacy with which more general theoretical concepts can be captured. For instance, the operationalization of typicality drew on only those variables described here, rather than the broader number of factors upon which any typicality influence is based during everyday music listening: given this limitation, we again feel it is appropriate to prioritize statistical significance over effect size. Nonetheless, the small effect sizes identified by this research again suggest the need for considerable refinement of the conclusions, and our hope is that the present findings and arguments provide some guidance for future research in this nascent field.
In the meantime, the present findings indicate that the typicality of the lyrics relative to the corpus can predict their popularity; that there are a number of associations between various aspects of the music and lyrics, and that these are readily interpretable; and that the relative contribution of music and lyrics to the popularity of commercially successful songs varies according to the precise means by which these are operationalized. There is a relationship between pop music and the lyrics of that music that is intuitive and which may be explicable to some extent through existing theoretical concepts in the literature on psychological aesthetics.
Appendix
Results of the first-step GLMM analyses concerning Hypothesis 3.
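| Predictor variable | Peak chart position: F | Peak chart position: p | Peak chart position: ηp² | Number of weeks on chart: F | Number of weeks on chart: p | Number of weeks on chart: ηp² | UK hit popularity: F | UK hit popularity: p | UK hit popularity: ηp² | UK hit appearance: F | UK hit appearance: p | UK hit appearance: ηp² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|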
| Number of different words | 1.265 | .261 | 0.001 | 0.007 | .934 | 0.000 | 0.000 | .992 | 0.000 | 2.113 | .146 | 0.001 |
| Numerical terms | 3.268 | .071 | 0.002 | 0.011 | .917 | 0.000 | 4.783 | .029 | 0.003 | 2.598 | .107 | 0.002 |
| Ambivalence | 0.963 | .327 | 0.001 | 0.006 | .940 | 0.000 | 2.616 | .106 | 0.002 | 0.596 | .440 | 0.000 |
| Self-reference | 2.002 | .157 | 0.001 | 0.006 | .940 | 0.000 | 0.066 | .798 | 0.000 | 2.720 | .099 | 0.002 |
| Tenacity | 4.197 | .041 | 0.003 | 10.568 | .001 | 0.007 | 1.500 | .221 | 0.001 | 1.746 | .187 | 0.001 |
| Leveling | 0.633 | .427 | 0.000 | 8.990 | .003 | 0.006 | 0.247 | .619 | 0.000 | 7.335 | .007 | 0.005 |
| Collectives | 23.124 | <.001 | 0.016 | 1.525 | .217 | 0.001 | 4.798 | .029 | 0.003 | 52.772 | <.001 | 0.036 |
| Praise | 0.032 | .858 | 0.000 | 1.556 | .212 | 0.001 | 0.746 | .388 | 0.001 | 1.265 | .261 | 0.001 |
| Satisfaction | 383.452 | <.001 | 0.214 | 77.422 | <.001 | 0.052 | 19.273 | <.001 | 0.014 | 44.898 | <.001 | 0.031 |
| Inspiration | 0.055 | .814 | 0.000 | 0.002 | .966 | 0.000 | 0.111 | .739 | 0.000 | 2.717 | .100 | 0.002 |
| Blame | 0.047 | .828 | 0.000 | 0.511 | .475 | 0.000 | 0.905 | .342 | 0.001 | 0.020 | .887 | 0.000 |
| Hardship | 1.269 | .260 | 0.001 | 3.677 | .055 | 0.003 | 0.057 | .811 | 0.000 | 0.679 | .410 | 0.000 |
| Aggression | 1.146 | .285 | 0.001 | 0.093 | .760 | 0.000 | 0.224 | .636 | 0.000 | 0.002 | .962 | 0.000 |
| Accomplishment | 0.239 | .625 | 0.000 | 1.350 | .245 | 0.001 | 2.679 | .102 | 0.002 | 2.399 | .122 | 0.002 |
| Communication | 0.226 | .635 | 0.000 | 0.049 | .824 | 0.000 | 0.340 | .560 | 0.000 | 2.033 | .154 | 0.001 |
| Cognitive terms | 2.478 | .116 | 0.002 | 0.011 | .918 | 0.000 | 0.027 | .869 | 0.000 | 4.442 | .035 | 0.003 |
| Passivity | 0.204 | .652 | 0.000 | 0.770 | .380 | 0.001 | 0.081 | .776 | 0.000 | 1.148 | .284 | 0.001 |
| Spatial awareness | 0.872 | .351 | 0.001 | 0.543 | .461 | 0.000 | 1.575 | .210 | 0.001 | 4.939 | .026 | 0.003 |
| Familiarity | 33.060 | <.001 | 0.023 | 6.362 | .012 | 0.004 | 0.002 | .966 | 0.000 | 7.483 | .006 | 0.005 |
| Temporal awareness | 0.430 | .512 | 0.000 | 1.176 | .278 | 0.001 | 4.393 | .036 | 0.003 | 7.636 | .006 | 0.005 |
| Present concern | 0.029 | .865 | 0.000 | 2.417 | .120 | 0.002 | 5.001 | .025 | 0.004 | 6.527 | .011 | 0.005 |
| Human interest | 3.759 | .053 | 0.003 | 6.227 | .013 | 0.004 | 3.349 | .067 | 0.002 | 0.806 | .370 | 0.001 |
| Concreteness | 3.351 | .067 | 0.002 | 13.571 | <.001 | 0.010 | 37.281 | <.001 | 0.026 | 0.053 | .819 | 0.000 |
| Past concern | 6.510 | .011 | 0.005 | 0.042 | .838 | 0.000 | 0.293 | .589 | 0.000 | 5.097 | .024 | 0.004 |
| Centrality | 2.259 | .133 | 0.002 | 1.585 | .208 | 0.001 | 0.526 | .469 | 0.000 | 1.028 | .311 | 0.001 |
| Rapport | 0.029 | .864 | 0.000 | 0.882 | .348 | 0.001 | 0.001 | .982 | 0.000 | 1.249 | .264 | 0.001 |
| Cooperation | 2.272 | .099 | 0.002 | 0.028 | .866 | 0.000 | 0.703 | .402 | 0.000 | 1.430 | .232 | 0.001 |
| Diversity | 0.000 | .986 | 0.000 | 2.419 | .120 | 0.002 | 2.417 | .120 | 0.002 | 1.897 | .169 | 0.001 |
| Exclusion | 0.509 | .476 | 0.000 | 0.001 | .971 | 0.000 | 7.175 | .007 | 0.005 | 10.000 | .002 | 0.007 |
| Liberation | 7.102 | .008 | 0.005 | 6.005 | .014 | 0.004 | 1.537 | .215 | 0.001 | 0.001 | .976 | 0.000 |
| Denial | 2.232 | .135 | 0.002 | 0.417 | .519 | 0.000 | 0.004 | .948 | 0.000 | 0.016 | .899 | 0.000 |
| Motion | 0.145 | .703 | 0.000 | 5.016 | .025 | 0.004 | 2.703 | .100 | 0.002 | 0.422 | .516 | 0.000 |
| Insistence | 0.716 | .398 | 0.001 | 1.665 | .197 | 0.001 | 0.050 | .824 | 0.000 | 3.952 | .047 | 0.003 |
| Embellishment | 0.746 | .388 | 0.001 | 0.045 | .831 | 0.000 | 0.471 | .493 | 0.000 | 12.017 | .001 | 0.008 |
| Variety | 4.450 | .035 | 0.003 | 13.684 | <.001 | 0.010 | 0.053 | .819 | 0.000 | 9.417 | .002 | 0.007 |
| Complexity | 0.857 | .355 | 0.001 | 9.013 | .003 | 0.006 | 0.201 | .654 | 0.000 | 5.629 | .018 | 0.004 |
| Activity | 0.678 | .411 | 0.000 | 7.527 | .006 | 0.005 | 0.136 | .712 | 0.000 | 5.411 | .020 | 0.004 |
| Optimism | 4.007 | .046 | 0.003 | 8.660 | .003 | 0.006 | 0.033 | .856 | 0.000 | 5.207 | .023 | 0.004 |
| Certainty | 0.114 | .736 | 0.000 | 11.170 | .001 | 0.008 | 0.244 | .621 | 0.000 | 6.591 | .010 | 0.005 |
| Realism | 0.917 | .338 | 0.001 | 7.993 | .005 | 0.006 | 0.103 | .748 | 0.000 | 4.985 | .026 | 0.004 |
| Commonality | 0.721 | .396 | 0.001 | 8.443 | .004 | 0.006 | 0.172 | .678 | 0.000 | 5.151 | .023 | 0.004 |
| Energy | 18.058 | <.001 | 0.013 | 8.122 | .004 | 0.006 | 1.012 | .315 | 0.001 | 0.010 | .922 | 0.000 |
| BPM | 0.210 | .647 | 0.000 | 0.245 | .621 | 0.000 | 9.642 | .002 | 0.007 | 2.187 | .139 | 0.002 |
| Mood 1 score | 3.486 | .062 | 0.002 | 4.563 | .033 | 0.003 | 6.173 | .013 | 0.004 | 4.962 | .026 | 0.004 |
| Mood 2 score | 0.305 | .581 | 0.000 | 0.975 | .324 | 0.001 | 19.755 | <.001 | 0.014 | 2.781 | .096 | 0.002 |
| Mood 3 score | 1.276 | .259 | 0.001 | 27.218 | <.001 | 0.019 | 1.134 | .287 | 0.001 | 0.025 | .874 | 0.000 |
| Mood 4 score | 6.554 | .011 | 0.005 | 15.765 | <.001 | 0.011 | 1.570 | .210 | 0.001 | 0.406 | .524 | 0.000 |
| Mood 5 score | 8.888 | .003 | 0.006 | 12.324 | <.001 | 0.009 | 12.834 | <.001 | 0.009 | 5.497 | .019 | 0.004 |
| Mood 6 score | 7.465 | .006 | 0.005 | 1.725 | .189 | 0.001 | 1.072 | .301 | 0.001 | 0.146 | .703 | 0.000 |
BPM: beats per minute.
For each analysis, degrees of freedom = 1, 1408 for all Diction variables; 1, 1411 for Energy; 1, 1342 for BPM; 1, 1412 for all mood scores.
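The partial eta squared values in the table can be checked against the reported F statistics and degrees of freedom. The following is a minimal sketch, assuming the standard identity for a single effect, ηp² = (F × df_effect) / (F × df_effect + df_error). The function name `partial_eta_squared` is hypothetical and introduced here only for illustration, and the software used for the GLMM analyses may have computed effect sizes by a different route.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Assumes the standard identity eta_p^2 = (F * df1) / (F * df1 + df2);
    this is an illustrative check, not necessarily the authors' procedure.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom copied from the appendix table and its note.
examples = {
    "Satisfaction, peak chart position": (383.452, 1, 1408),
    "Collectives, UK hit appearance": (52.772, 1, 1408),
    "Mood 3 score, number of weeks on chart": (27.218, 1, 1412),
}

for label, (f_value, df_effect, df_error) in examples.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: eta_p^2 = {eta:.3f}")
```

Under this assumption the three examples reproduce the tabled values of 0.214, 0.036, and 0.019, which illustrates how even a large F statistic corresponds to a small proportion of explained variance when the error degrees of freedom exceed 1,400.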
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
