Abstract
Musical preferences are a fundamental individual difference predicting a multitude of listening behaviors. For decades, researchers have investigated how musical preferences are organized but have been hindered by genre-based and self-report methodologies. Recently, researchers have begun to investigate musical preferences at the feature-level using stimuli, rather than at the genre-level using self-reports. However, these new methods have been experimental and limited in their ecological validity. To address these recent limitations, we use an ecologically valid behavioral approach based on one million people who listened to more than 200,000 songs from streaming services, which is to our knowledge the largest study to date on the structure of musical preferences. Individual musical preference was measured from song playback counts and analyzed using principal components analysis on the psychological and sonic music features. Our results showed that music-feature preferences had a three-dimensional structure confirming previous theory and research. These dimensions are Arousal (level of energy in music), Valence (spectrum of negative to positive emotions), and Depth (intellectual and emotional depth in music). These findings lay firm ground for future research on music-feature preferences and pave the way for social-psychological and neurobiological experiments with music.
Music is a culturally ubiquitous phenomenon that has immediate effects on cognition, affect, and the brain (Blacking, 1995; Levitin, 2006). Research suggests that people spend between 15% and 25% of their waking lives listening to music, and today the music recording industry is worth over $15 billion (Motion Picture Association of America, 2007; Rentfrow, 2012). Gone are the days when people need to wait patiently for the radio to play their favorite song or dig into their record collection to pick out a suitable song. Today, people’s daily music listening behavior is dictated by algorithms developed by streaming services that aim to suggest music pieces that meet the needs and preferences of their users.
Alongside the transformation of modern-day music listening habits, science has also brought the study of music more into focus. One of the emergent areas of music psychology over the past decade and a half has been the study of musical preferences. This is not surprising considering that musical preferences are at the very basis of musical behavior. They are a fundamental individual difference that influences how music will affect a person, and they provide a window into why people listen to music and the psychological needs that it fulfills. Indeed, musical preferences have been linked to a host of factors including age, personality, cognition, values, and most recently testosterone levels (Boer et al., 2011; Bonneville-Roussy, Rentfrow, Xu, & Potter, 2013; Doi, Basadonne, Venuti, & Shinohara, 2018; Greenberg, Baron-Cohen, Stillwell, Kosinski, & Rentfrow, 2015; Greenberg et al., 2016; Rentfrow, Goldberg, & Levitin, 2011).
One question that has plagued researchers is the underlying psychological organization of musical preferences. Initially, due to technological constraints, researchers were confined to assessing the structure of musical preferences through self-reported genres (Rentfrow & Gosling, 2003). However, genres are ill-defined and elusive constructs that misrepresent a listener’s musical taste. To overcome these limitations, research advanced beyond genres by assessing musical preferences via a careful selection of audio excerpts administered to participants (Rentfrow et al., 2012). This line of research identified that preferences for styles can be organized into five components: Mellow, Unpretentious, Sophisticated, Intense, and Contemporary. Furthermore, this research also discovered that musical preferences are driven by the specific musical attributes or features in the music (e.g., sad, deep, complex, thrilling, happy; Rentfrow et al., 2011, 2012).
Recently, Greenberg et al. (2016) used over 200 musical pieces representing 26 genres and subgenres and 38 musical attributes to show that perceived musical features across and within genres can be organized into three robust dimensions: Arousal (the degree of energy in music from low to high), Valence (the spectrum of negative to positive emotions), and Depth (the intellectual and emotional complexity in the music). Fricke and Herzberg (2017) replicated and extended these findings in a German sample using a separate set of musical attributes. Most recently, Fricke, Greenberg, Rentfrow, and Herzberg (2018) showed that the use of computer-extracted music features yields the same three-dimensional structure as previously found with human ratings. However, this prior line of research lacked ecological validity, because it relied on only a select number of musical excerpts chosen by experimenters.
Greenberg and Rentfrow (2017) recently argued that the next step for music research is to use big data (including data from streaming services) to address research questions with greater ecological validity. The present research aims to fill this gap by measuring music preference from behavioral data. In this study, we utilize playback counts of music pieces as a measurement for music preference. This method should retain the advantages of audio-based assessment over self-reported assessment, and should be able to avoid other response biases, such as the social desirability bias (e.g., Van de Mortel, 2008).
The goal of the present research was to examine the underlying organization of music-feature preferences using big data of the music listening behavior of over 1 million users across more than 200,000 songs.
Music features
Research on music preference first focused on musical features to explain why certain musical genres cluster together in preference assessments (Rentfrow & Gosling, 2003). Utilizing user-provided descriptions of songs, this research found that the participants used not only sound-related features (such as slow, fast, loud, electric) but also psychological characteristics (such as relaxing, romantic, or clever) to describe music pieces (Rentfrow et al., 2011; Rentfrow & Gosling, 2003). Other researchers examined preference patterns for specific music characteristics, such as a general preference for happy music (Schellenberg, Peretz, & Vieillard, 2008). It was also revealed that preferences for musical features relate to style preferences. For instance, mellow music styles, such as soft rock and r&b/soul, were found to correlate positively with relaxing and romantic features, and negatively with aggressive, loud, and fast features (Rentfrow et al., 2011). In contrast, sophisticated music was associated with psychological attributes including inspiring and intelligent features, but not with sound-related attributes such as electric or percussive features (Rentfrow et al., 2011). These clear correlational patterns suggested that it should be possible to use these musical features to measure musical preferences from the outset. Furthermore, these musical features avoid biases that might come with the use of genres or styles such as unfamiliarity with a genre, social desirability, or other social connotations. For instance, North and Hargreaves (1999) found that people use stated preference for musical genres to define their social identity, and thus might select music based on its social implications rather than preference for the music itself. Therefore, researchers have begun to investigate the organization of musical preferences based on the underlying musical features that pervade genres and styles.
The sets of musical features in this line of research stem from user statements (Rentfrow & Gosling, 2003) and have been enhanced by expert jurors (Rentfrow et al., 2011, 2012). They comprise sound-related and psychological variables, which can be further segmented into categories, such as auditory features (e.g., fast, loud), instruments (e.g., piano, woodwind), positive affect (e.g., fun, happy, romantic), negative affect (e.g., aggressive, sad), energy (e.g., calming, party music), and cerebral (e.g., deep, complex, intelligent). Preferences for these features were found to reveal a three-dimensional structure (Fricke & Herzberg, 2017; Greenberg et al., 2016). The relationship with the established five-dimensional structure of musical genre preferences was also examined: For instance, preference for high-valence music correlated mostly with preference for mellow and contemporary music styles; preference for high arousal music was correlated with intense but not contemporary music styles; and preference for depth features was correlated with sophisticated and mellow music genres (Fricke & Herzberg, 2017).
Another field of research concerned with the analysis, classification, and categorization of music pieces in terms of their music features is the field of Music Information Retrieval (MIR). MIR systems seek to extract different facets from music pieces, most often stated as pitch, tempo, harmony, timbre, as well as editorial, textual, and bibliographic facets (Downie, 2003). Some of these music features can be extracted directly from the raw audio data. For instance, the Beats per Minute (BPM) measure for tempo can be reliably extracted by calculating the differences between recurring rises of amplitude in the audio signal (Dixon, 2001). Such features obtained directly from the audio data are called low-level features. The other, more abstract high-level features, such as mood or genre classifications, are usually created through combinations of low-level features, which in turn are learned through machine learning algorithms. For instance, support vector machines (SVMs) can be trained on a collection of manually labeled ground-truth data (e.g., songs that are labeled to be sad or not-sad). The SVM then “learns” a linear classifier that is able to distinguish between sad and not-sad songs. The accuracy of these attributions can be examined using a validation dataset that has not been used during the training of the SVM. The ESSENTIA audio analysis library (Bogdanov et al., 2013), which we use in this study for music analysis, provides pre-trained SVMs for various music mood classification tasks, all of which achieved satisfactory classification accuracies (for an overview, see Bogdanov, 2013).
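To make the tempo example concrete, the following minimal Python sketch estimates BPM from the time differences between successive amplitude rises. It is a strong simplification and not the beat-tracking procedure of Dixon (2001): it assumes that the onset times (in seconds) have already been detected, and the function name `estimate_bpm` and the toy onset list are hypothetical.

```python
from statistics import median


def estimate_bpm(onset_times):
    """Estimate tempo (beats per minute) from beat onset times given in seconds.

    Assumption: the onsets are already detected and sorted in ascending order.
    """
    if len(onset_times) < 2:
        raise ValueError("at least two onsets are needed to measure an interval")
    # Differences between successive onsets (inter-onset intervals).
    intervals = [later - earlier
                 for earlier, later in zip(onset_times, onset_times[1:])]
    # The median keeps single missed or spurious onsets from dominating.
    return 60.0 / median(intervals)


# Hypothetical onsets spaced 0.5 s apart.
toy_onsets = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
print(estimate_bpm(toy_onsets))  # 120.0
```

Using the median rather than the mean interval is a design choice of this sketch; it makes the estimate less sensitive to an occasional missed beat.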
We aim to use these computationally extracted music features for the assessment of music preference. To that end, we use music playback events as an indicator for preference. However, measuring music preference from playback statistics differs from direct assessment. First, it is not clear if playback events indicate direct preference for the music. People might sometimes listen to songs that do not fit their preferences (for instance, when scouting for new music), or they might use streaming services for background music, and thus just tolerate the music but do not actively prefer it over other songs. However, we can still assume that most songs a user listens to at least do not oppose their preferences, and it is more likely that a person likes the songs they play back on music streaming services than not. Second, playback statistics lack negative feedback, that is, songs the user did not listen to. Lack of positive preference data could be interpreted as negative feedback (e.g., Jung, Hong, & Kim, 2002), but this is only viable for a very limited stimulus pool, as there is a virtually infinite number of music pieces a user did not listen to. Another form of negative feedback could include songs that were presented frequently but have been skipped every time (see Jung, Hong, & Kim, 2005). These data were unfortunately not available to us. However, even if sparse positive-only feedback might lead to less pronounced preference profiles, they should point in the right direction, and thus should be able to capture valid aspects of music preference.
Method
Music listening data
For our study, we used the data from the Million Song Dataset (MSD; Bertin-Mahieux, Ellis, Whitman, & Lamere, 2011). The MSD is a collection of metadata and high- and low-level audio features for exactly one million songs. The MSD project offers complementary datasets, including one on user listening data (the Taste Profile subset). This subset features over one million unique users who listened to over 380,000 unique songs, resulting in more than 48 million user-song-playcount datapoints (Bertin-Mahieux et al., 2011). The dataset does not report how long a user had to listen to a music piece for it to count as a playback, nor does it state when the data were collected or give any details on the sociodemographic properties of the sample and the users. The source of the music streaming statistics is also not disclosed, although the company providing the data, The Echo Nest, performed music-feature analysis for major music streaming providers and was acquired by Spotify in 2014 (The Echo Nest, 2014). The playback statistics are long-tail distributed at the song level, meaning that relatively few songs have been listened to very often (McFee, Bertin-Mahieux, Ellis, & Lanckriet, 2012). The playbacks per user were more evenly distributed (for a more thorough description of the dataset, see McFee et al., 2012).
The audio feature data that come with the original MSD are sparse compared to the output of modern music analysis software like ESSENTIA (Bogdanov et al., 2013), which we successfully used in previous studies (see Fricke et al., 2018). While other researchers enhanced the available feature data by providing further low-level analysis results from several music analysis software libraries (e.g., Schindler, Mayer, & Rauber, 2012), we wanted to ensure comparability of the results with our preceding research. We therefore decided to enhance the MSD ourselves by analyzing the songs from the Taste Profile subset with the ESSENTIA software library (Bogdanov et al., 2013). ESSENTIA was preferred over other music analysis software because it offers out-of-the-box classifiers for high-level musical features, such as genres, moods, and mood clusters.
First, we filtered the playcount data provided by the MSD for unique songs. We looked up the metadata of the resulting 384,546 songs in the MSD and extracted the artist, name, and release, as well as an identifier string for the release and track on the 7digital.com website. 7digital.com is an online music provider that offers mp3 audio downloads. The service also offers the download of preview snippets for a great number of songs. By looking up the respective IDs on the website, we were therefore able to gather audio excerpts for many of the songs of the MSD, as has been done in other research before (e.g., Schindler et al., 2012).
Some of the songs in the MSD were no longer available on 7digital.com. Also, we excluded 599 song snippets that had a length below 25 s. In total, we collected 203,717 audio snippets (53.0% of all unique songs in the Taste Profile subset) with a length of 30 (61.7%) or 60 s (37.0%; 1.3% had different lengths of over 25 s) and sample rates of 22,050 (44.2%) and 44,100 Hz (55.8%). We then proceeded to analyze the excerpts with ESSENTIA.
ESSENTIA provides classifiers for high-level music features, which we conceptually divided into three categories. The mood classifiers try to capture mood-related features of a song, such as aggressive, happy, or relaxed. ESSENTIA’s mood clusters do the same thing, but group multiple moods together into an overarching theme. The sound-related classifiers capture elements such as the average loudness, the danceability (as indicated by a pronounced, recurring beat), or whether a song is instrumental.
The distributions of the extracted binary high-level music features are visualized in Figure 1. As displayed, most features are centered around the 50% probability, with the exception of Instrumental, Aggressive, and Acoustic with lower mean probabilities of occurrence, and Tonal and Bright Timbre with higher mean probabilities of occurrence.

Figure 1. Violin plot of the distributions of the computed music features.
Subsequently, we integrated the playback statistics for each user with the audio analysis data by weighting the songs according to their playback statistics. For instance, when a user listened to a song with Danceability = .80 two times, and to another song with Danceability = .40 three times, we would multiply the Danceability measures by their playback counts and divide the sum by the total playback count, resulting in a Danceability preference of $(.80 \times 2 + .40 \times 3) / (2 + 3) = .56$.
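The playcount weighting described above amounts to a weighted mean per user and feature. The following Python sketch illustrates this under stated assumptions: the triplet layout (user, song, play count) mirrors the description of the Taste Profile subset, but the function name, the identifiers, and the toy values are hypothetical, and this is not the authors' actual processing code.

```python
from collections import defaultdict


def weighted_feature_preference(plays, song_feature):
    """Playcount-weighted mean of one music feature per user.

    plays: iterable of (user_id, song_id, play_count) triplets.
    song_feature: dict mapping song_id to a feature probability in [0, 1].
    Songs without feature data are skipped, so users with no analysed
    songs do not appear in the result.
    """
    weighted_sum = defaultdict(float)
    total_plays = defaultdict(int)
    for user_id, song_id, play_count in plays:
        if song_id not in song_feature:
            continue  # no audio snippet was analysed for this song
        weighted_sum[user_id] += song_feature[song_id] * play_count
        total_plays[user_id] += play_count
    return {user: weighted_sum[user] / total_plays[user] for user in weighted_sum}


# Hypothetical data reproducing the Danceability example from the text.
toy_plays = [("user_a", "song_1", 2), ("user_a", "song_2", 3), ("user_b", "song_3", 4)]
toy_danceability = {"song_1": 0.80, "song_2": 0.40}

preferences = weighted_feature_preference(toy_plays, toy_danceability)
print(round(preferences["user_a"], 2))  # 0.56
```

In this toy example, `user_b` only listened to a song without feature data and is therefore absent from the result.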
Overall, the Taste Profile subset collected data of 1,019,318 users. We only included the songs we had access to, and further eliminated 19,094 songs where the authors of the MSD suspected matching errors (see Bertin-Mahieux, 2012). Through this reduction, some users ended up not having listened to any songs we had data on. These users were excluded from further analysis. We also excluded 10,522 users who only listened to one song. In total, our final dataset contained weighted music preferences for 1,006,725 users, who listened to an average of 25.8 songs
Analysis
The Taste Profile subset only contains playback statistics but no other data on the users. We therefore focused on replicating the component structure of individual music-feature preference in the large MSD sample by performing a principal components analysis (PCA). Furthermore, we examined the inter-score correlations to see how the individual preferences for the components relate to each other.
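As an illustration of how a PCA yields the proportion of variance explained by each component, the following Python sketch extracts the leading eigenvalues of a correlation matrix by power iteration with deflation. This is a didactic sketch only: the software, extraction settings, and any rotation used in the actual analysis are not stated here, the function names are hypothetical, and the toy correlation matrix is invented.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (one row per user)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # nothing left to extract
            return 0.0, vec
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * matrix[i][j] * vec[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, vec


def explained_variance(corr, n_components):
    """Share of total variance captured by each of the first principal components."""
    p = len(corr)
    work = [row[:] for row in corr]
    shares = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(work)
        shares.append(value / p)  # the trace of a correlation matrix equals p
        # Deflation: remove the extracted component before searching for the next.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(p)]
                for i in range(p)]
    return shares


# Hypothetical correlation matrix for three feature preferences.
toy_corr = [[1.0, 0.5, 0.0],
            [0.5, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
print([round(s, 3) for s in explained_variance(toy_corr, 3)])  # [0.5, 0.333, 0.167]
```

For real data, `correlation_matrix` would be applied to the user-by-feature preference table first. With many features, an established linear algebra routine would be preferable to this plain-Python loop.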
The probabilities for ESSENTIA’s five mood cluster classifiers sum to one. Because the weighting of the songs by playback count preserves the classifier relationships within each song, this leads to perfect multicollinearity, that is, one of the mood clusters is perfectly described by the other four. To continue our analysis, we hence excluded the third mood cluster (Literate, Brooding) from the PCA. We chose the third cluster because it showed the largest correlation within the set of mood clusters with the fifth mood cluster,
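The multicollinearity argument can be checked numerically: if the five mood cluster probabilities of every song sum to one, their playcount-weighted means per user also sum to one, so any one cluster equals one minus the sum of the other four. The Python sketch below uses invented probabilities and a hypothetical function name purely for illustration.

```python
def weighted_profile(songs):
    """Playcount-weighted mean probability per mood cluster.

    songs: list of (play_count, [p1, ..., p5]) with probabilities summing to one.
    """
    total = sum(count for count, _ in songs)
    n_clusters = len(songs[0][1])
    return [sum(count * probs[k] for count, probs in songs) / total
            for k in range(n_clusters)]


# Hypothetical listening history: two songs, played two and three times.
toy_history = [
    (2, [0.10, 0.30, 0.20, 0.15, 0.25]),
    (3, [0.40, 0.10, 0.25, 0.05, 0.20]),
]

profile = weighted_profile(toy_history)
others = sum(profile[:2]) + sum(profile[3:])

print(round(sum(profile), 6))   # 1.0
print(round(profile[2], 6))     # 0.23
print(round(1.0 - others, 6))   # 0.23, cluster 3 is fully determined by the rest
```

Because this redundancy holds for every user, a correlation matrix over all five clusters would be singular, which is why one cluster has to be dropped before the PCA.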
Results
PCA
The Kaiser–Meyer–Olkin measure of sampling adequacy was high with 0.87, and Bartlett’s test of sphericity was significant,
The first component explained 38% of the variance, the second component 16%, and the third component 15%. Overall, 69% of the variance was explained. The first component loaded on items such as Aggressive,
Component loadings for measured music-feature preference using computer-extracted music features.
Primary component loadings are presented in bold typeface.
Mood clusters (MIREX) (Hu & Downie, 2007):
Cluster 1: Rowdy, Rousing, Confident, Boisterous, Passionate.
Cluster 2: Amiable/Good-Natured, Sweet, Fun, Rollicking, Cheerful.
Cluster 3: Literate, Wistful, Bittersweet, Autumnal, Brooding, Poignant.
Cluster 4: Witty, Humorous, Whimsical, Wry, Campy, Quirky, Silly.
Cluster 5: Volatile, Fiery, Visceral, Aggressive, Tense/Anxious, Intense.
The first two components clearly match those found in previous research (e.g., Fricke et al., 2018) and were named Arousal (relating to high-energy songs) and Valence (relating to songs with a positive mood). The third component is less distinct and only partly matches the Depth component of previous research. We nevertheless retained the name Depth and address this issue in the Discussion.
The individual scores on the three components showed small correlations. The Arousal component correlated with Valence,

Plot of the scores and inter-component correlations for a random subset of 10,000 users.
Discussion
We used big data from natural music listening environments to explore the underlying organization of musical preferences in more than one million users. The observations were based on people’s actual listening habits, providing an ecologically valid approach to studying musical preferences. The findings confirmed prior theory and research, showing with big data that three broad dimensions underlie people’s music-feature preferences: arousal, valence, and depth (AVD).
The arousal and valence components found in our analysis clearly match those of previous research (Fricke et al., 2018; Fricke & Herzberg, 2017; Greenberg et al., 2016). The arousal component describes preference for high-energy music, that is, music that is loud, rousing, and not acoustic. The valence component captures songs that are perceived as fun, cheerful, and often danceable. The depth dimension is less distinct. Previous research clearly found a dimension relating to cerebral elements, for example, preference for intelligent, complex, and reflective music. Such specific cerebral attributes are not available as classifiers in the ESSENTIA software library. The current results indicate that the third factor loads on attributes such as electronic, instrumental, not tonal, and not happy. While previous research also linked ESSENTIA’s depth factor to instrumental and not tonal music (Fricke et al., 2018), happy was originally not associated with this factor. The factor seems to capture dark and complex electronic music, which is related to the idea of the original depth component, which was argued to capture intellectual and emotional depth. The result complements prior work which showed that systemizers preferred both complex music features from avant-garde classical music and features from heavy metal, which can include dark and electronic features of varying complexities (Greenberg, Baron-Cohen, Stillwell, Kosinski, & Rentfrow, 2015). Thus, the third component does indeed include features related to the original concept of depth, with slight differences from the results of preceding research. This slight discrepancy should be addressed in future research and could be mitigated by introducing additional classifiers targeting these cerebral features.
This study was hindered by several limitations. First, psycho-demographic data for the users were not available; therefore, we could not examine how the three feature dimensions link to key individual differences such as gender, age, and personality. Thus, we are not able to estimate how well our results generalize to the general population. While the number of users would suggest that most characteristics should be normally distributed, a specific selection process underlying the data might foil these conclusions. Future research might help with this issue by examining these characteristics for a different sample and reporting the feature preference distributions. If they are similar to those of this study, we might be able to infer that generalizations drawn from this large sample are valid after all.
Second, the number of features extracted for each of the songs was relatively small when compared to prior scientific research (Greenberg et al., 2016) and industry-based feature extraction platforms (e.g., Gracenote). Future research should explore the facet and hierarchical nature of the AVD model by enhancing the set of features available for analysis.
The AVD model of music-feature dimensions has now been replicated three times using mixed methodologies and sampling, raising three important questions for future research. First, does the AVD model replicate in non-Western music? Second, does the same three-component structure emerge for creative art mediums outside of music, including art, literature, film, and television? Third, given recent evidence of the biological underpinnings of musical preferences (Doi et al., 2018; Wilkins, Hodges, Laurienti, Steen, & Burdette, 2014; Wallmark, Deblieck, & Iacoboni, 2018), to what extent is the AVD model rooted in biology compared to culture?
This research made extensive use of machine-learned music-feature classifiers. While the existing classifiers generally show satisfactory accuracies, they are not as diverse and detailed as human ratings. It is unclear whether more nuanced music features can be accurately learned with current music classification algorithms. However, we showed that the existing classifiers do recover the AVD model. Therefore, it should be possible to create classifiers targeting these three dimensions specifically. The creation of such classifiers built on the robust AVD model could potentially lead to a drastic simplification of music categorization and thus yield interesting applications in research and business alike.
