Abstract
Music is both universal, appearing in every known human culture, and culture-specific, often defying intelligibility across cultural boundaries. This duality has been the source of debate within the broad community of music researchers, and there have been significant disagreements both on the ontology of music as an object of study and the appropriate epistemology for that study. To help resolve this tension, I present a culture-cognition-mediator model that situates music as a mediator in the mutually constitutive cycle of cultures and selves representing the ways individuals both shape and are shaped by their cultural environments. This model draws on concepts of musical grammars and schema, contemporary theories in developmental and cultural psychology that blur the distinction between nature and nurture, and recent advances in cognitive neuroscience. Existing evidence of both directions of causality is presented, providing empirical support for the conceptual model. The epistemological consequences of this model are discussed, specifically with respect to transdisciplinarity, hybrid research methods, and several potential empirical applications and testable predictions as well as its import for broader ontological conversations around the evolutionary origins of music itself.
At some point, the adage that music and mathematics are intrinsically connected passed into the popular lexicon. When a mathematician is revealed to be a skilled musician, it elicits little surprise. On occasion, this connection is explained by referencing the intrinsically logical nature of mathematics that has made it a favorite mechanism for speculative fiction writers and researchers in the search for extraterrestrial intelligence alike to facilitate conversation between humanity and other species. This perception of mathematics as irreducible truth, or a kind of universal ontology, is then presented as analogous to a similarly popular perception of musical structure as equivalently logical. As if to bolster this argument, music has frequently been referred to as a “universal language,” an appellation often attributed to Longfellow (1835) that has survived to the present day, as evidenced by a recent spate of press releases and news articles proclaiming that new studies had demonstrated music’s universality (Reuell, 2018; Wida, 2020). Several aspects of music have been identified across cultures, implying the existence of some statistical universals of music (Mehr et al., 2019; Savage et al., 2015), particularly child-directed song such as lullabies (Trehub et al., 1993). Music has even been wielded as a message to hypothetical extraterrestrial encounters in the Voyager Golden Records (Chua & Rehding, 2021).
But in music’s case, this has the same internal logic as asserting that language itself is a universal language: Although music is everywhere, musics are often not mutually intelligible or even mutually recognized as music (Bohlman, 1999). Even the studies referenced in those press releases emphasize the ubiquity of some recognizably music-like phenomenon and certain function-specific similarities, not mutual intelligibility (Mehr et al., 2019), leaving ample room for significant differentiation between musics. This specificity is due to music’s status as a cultural product, or a “public, shared, and tangible [aspect] of culture” (Morling & Lamoreaux, 2008). Indeed, contemporary musicology often leverages music to investigate social or historical conditions (Casadei, 2016; Frolova-Walker, 2016; McClary, 1991), provide windows into distinctive features of other cultures (Fox, 2004; J. H. McDermott et al., 2016), or assess political climates (Rossman, 2004; Shepler, 2010). Recently, scholars have begun explicitly seeking cross-cultural approaches to music cognition as well (Jacoby et al., 2020), and ethnomusicologists increasingly argue against the perceived universality of a concept of music recognizable to Western scholars or interlocutors (Wald-Fuhrmann et al., 2021). In every case, they problematize the view of music as truly universal, although these discussions make far less headway in the popular press.
However, such a construction often situates music as a passive reflection of cultural norms and practices, a sort of measure or index for culture. This stands in stark contrast to the way culture is conceptualized in cultural and developmental psychology as cointegral with the psyche. In this view, cultural products do not simply reflect cultural norms but actively shape them (Koopmann-Holm & Tsai, 2014; Markus et al., 2006), influencing both the outsider’s perception of a culture and the internal states of that culture’s members. These scholars emphasize the bidirectional causality of nature and nurture, or of the individual and their environment, defining individuals as initially indiscriminate learners whose continued learning is shaped by their prior knowledge. Cultural products such as music, then, are crucially important in actively shaping both individuals’ psyche and their culture.
In this article, I argue that this idea—that music is fundamentally cointegral with culture, not a passive product of culture—should be centered in music scholarship, particularly work connecting music to other products or capabilities. To this end, I propose a specific ontological model of music as a key mediator between cultural patterns and cognitive capacities. This transition essentially defines music in terms of its functional role in the mutually constitutive cycle of individuals and environments rather than as a collection of specific artifacts or characteristics that can vary widely across or within cultures, de-emphasizing statistical universals in favor of functional universals. Although this is not itself an evolutionary theory of music, such an emphasis on the functional aspect of cultural products implies that an ontological theory of music driven by mutually constitutive ideas of culture and cognition must, in some sense, grapple with music’s evolutionary provenance.
Competing Theories of Musical Evolution
The evolution of music and musicality, which Honing (2018) defined as the biological capabilities that enable the creation and consumption of music, have long been topics for debate. Pinker’s (1997) offhand determination that music was “auditory cheesecake,” which was emblematic of a general hypothesis that music is a nonadaptive by-product of other evolved capacities, may have brought the debate, which had long occupied music scholars, to the scientific and popular mainstream and provoked several responses. Cross (2001) argued that music was crucial to the evolution of the human mind and that it is particularly relevant to humankind’s success as a social, communal species and the emergence of culture (Cross, 2008). Patel’s (2010) “transformative technology of the mind” theory itself evolved from an initial assertion that music is a “biologically powerful human invention” to an integration with the emerging concept of gene-culture coevolution (Patel, 2018), inspired in part by the work of theorists such as Killin (2016), who argued for blurring the distinction between a technology and an adaptation. Other theorists have argued that music is an evolved mechanism for navigating sexual selection, akin to avian plumage displays, by which women choose mates on the basis of males’ musical performance (Charlton, 2014; Miller & Todd, 1998; Miranda et al., 2003; Van Den Broek & Todd, 2009). Specific aspects of music, most notably rhythm perception and entrainment, have also been independently theorized in evolutionary terms (Patel, 2006; Patel & Iversen, 2014). The result is a sprawling mélange of theoretical frameworks that occasionally contradict but most often simply coexist.
A pair of recent articles attempted to synthesize these and other views on music’s evolutionary origins in two contrasting hypotheses. In their hypothesis of music and social bonding (MSB), Savage et al. (2021) proposed that modern music is the product of an iterative process of gene-culture coevolution. Specifically, they argued that music evolved as a series of separate “proto-musical components,” each of which had a positive impact on social bonding that opened the potential for further incremental adaptations. They described this process as a “virtuous spiral” of beneficial adaptations enabling other beneficial adaptations. Meanwhile, Mehr et al. (2021) argued that music evolved as a credible signal (i.e., a signal that can be trusted as conveying some information because of either its causal connection to that information or the cost associated with producing the signal) for coalition size and strength and for parental attention. Unlike the MSB theory, the credible-signaling approach focuses primarily on the question of adaptation, arguing that music meets the criteria for a Darwinian adaptation by emphasizing added survival value of specific music-enabled capacities.
Both theories are compelling and well supported by evidence in a variety of domains. Many of their associated commentaries addressed their strengths, including the MSB hypothesis’s discussion of groove (Ashley, 2021) and explanatory power for the development of mother-child interactions (Dissanayake, 2021), the credible-signaling theory’s clearly adaptationist framework (Pinker, 2021) and potential support from existing experimental frameworks such as the signaling-game model (Lumaca et al., 2021), and both hypotheses’ potential implications for the neurobiology of developmental disorders (Kasdan et al., 2021). Other commentaries have offered critiques, including defenses of the by-product theory of musical evolution (Lieberman & Billingsley, 2021; Stewart-Williams, 2021), calls to shift the terms of the theorized selection processes (Atzil & Abramson, 2021; Eirdosh & Hanisch, 2021), or an overall hesitance regarding theories of musical evolution claiming broad, unitary, explanatory power (Harrison & Seale, 2021).
However, the most convincing and common critiques of both models contend in some way with their shared search for universality. Both proposals are searching for universals in music’s provenance and therefore aim to account for potential variation across cultures to reveal the structural, cognitive, or physiological commonalities associated with musicality. Although both approaches succeed in this pursuit and provide extraordinarily fertile ground for further study (Margulis, 2021), as many commentaries have pointed out, they share two crucial pitfalls. First, they overspecify music in terms of its peculiar modern and primarily Western provenance (Brown, 2021; Cross, 2021; Wald-Fuhrmann et al., 2021) while making claims over evolutionary time (Honing, 2021; Iyer, 2021) that underemphasize the short-term effects of environment on cognition (Hannon et al., 2021) that are increasingly recognized in developmental psychology (Sameroff, 2010). Second, they disregard the need to engage with variability in music across cultures as ontologically or etiologically relevant (Eirdosh & Hanisch, 2021; Patel & von Rueden, 2021) and with the particular relationship between culture and cognition (Scott-Phillips et al., 2021).
The approach presented here contends with these issues by changing the level of analysis, largely abandoning the neural and physiological realm to focus directly on music’s interaction with culture as increasingly portrayed within cultural psychology. In so doing, I provide a theoretical framework for investigating precisely the cross-cultural variations in music-making, collective or individual, Patel and von Rueden (2021) highlighted while shifting the focus away from what Iyer (2021) indicated are circularly defined concepts of “music” and “musicality.” This builds on a logical foundation defining music in terms of its function rather than its form, essentially analyzing what Iyer described as experiences that “might have felt like music” and addressing Cross’s (2021) argument against cleaving music off from the rest of human experience. I then discuss extant evidence for my approach to music and conclude with empirically falsifiable predictions and both ontological and epistemological implications of the model. This theory does not explain the full panoply of musical experience, nor is it a theory of music’s evolutionary origins. In addition, it results in a potentially quite broad ontology of music, partly to address Wald-Fuhrmann et al.’s (2021) concerns about the specificity of how both MSB and credible-signaling define music and musicality. Like all models, including MSB and credible signaling, it is incomplete, amid its own ongoing process of evolution, but useful nonetheless.
Music as a Culture-Cognition Mediator
The model of music I propose here is built on a foundation of existing theories of music as a mediator. From Adorno’s (2002) dialectics to Born’s (2005) concept of music as a complex, multifaceted object that mediates multiple disparate social phenomena and the actor-network theories of Latour (1996, 2005), Hennion (2003; Hennion & Muecke, 2016), and Piekut (2014), framing music as a mediation has been very well established. These approaches to musical mediation are particularly focused on social, historical, or anthropological mediations and are principally concerned with the ways music stitches together a constellation (to use Adorno’s term) of human and nonhuman actors. The model presented here builds on this foundation, applying similar analytical techniques on a similar network structure while situating music explicitly as a mediator among cognitive capacities and cultural abstractions. In doing so, I hope to expand on Born’s (2011) examination of the materiality of music—and the related materiality of culture—by focusing on the nonmaterial and not necessarily material aspects of each. As a result, although I engage heavily with questions of the social, my frame of reference is psychological rather than anthropological.
I propose that musical features are culture-specific instantiations of universal cognitive constraints that are themselves observable in both musical and nonmusical contexts. These cognitive constraints include but are not limited to temporal prediction and pattern detection, short-term and working memory, autonomic arousal, and perceptual classification. Furthermore, exposure to music alters both listeners’ internalized cultural norms and the parameters of their cognition. A schematic of this model is shown in Figure 1.

Figure 1. The culture-cognition-mediator model.
This model is effectively an extension of the mutual constitution of cultures and selves (Markus & Kitayama, 2010). Because music is a cultural product that is frequently very culturally specific in its features, those features themselves are both dependent on and formative of the cognitive priors that make them effective. In other words, music’s psychological impact makes use of cognitive schema that music constructed. This results in a causal loop in which prior musical exposure shapes individuals’ response to new musical stimuli, which, in turn, alters their interpretation of their prior music exposure. It also situates music and other cultural products as mediators between culture and cognition, providing a concrete mechanism by which cultural norms can influence cognitive constraints and vice versa. In so doing, cultural products such as music allow learned, culture-specific parameters to, over time, become internalized as cognitive precepts themselves. For instance, memory for familiar or native musical styles is enhanced with respect to unfamiliar ones (Demorest et al., 2008), and recognition of culturally familiar changes in tonality is faster and more accurate than the recognition of culturally unfamiliar ones (Raman & Dowling, 2017).
By framing musical features as instantiations of cognitive facilities that are not music-specific, it may seem that I run the risk of defining music out of existence as a distinctive phenomenon in human society. However, this is, in truth, nothing more or less than a natural extension of the constructivist views that are increasingly common across the social and psychological sciences. In fact, it is a hybrid of developmental constructivism, with its focus on the ways individuals acquire and interpret knowledge (Gelman, 2009; Sameroff, 2010), and constructionist views of cognitive constructs, such as emotion, which posit that they are formed from the interaction of more fundamental psychological building blocks (Gendron & Feldman Barrett, 2009; Lindquist et al., 2013, 2015). In addition, framing music as a construct of cognitive, social, and cultural parameters does not make it any less unique; it simply ties music firmly to the operation of human society and the human brain. If music were not so constructed, then it would not be as efficacious as it is.
The musical-inference structure
Situating music at this particular intersection implies an inferential structure always incorporating both low-level sensory processes and cultural priors. This is because each musical feature is fundamentally an inference or decision made about the music. Seen this way, the distinction between relatively low-level musical features, such as dissonance, and more complex ones, such as allusion, can be implemented as differences in the weighting between immediate sensory data and learned priors. Crucially, priors are always a factor in these decisions; even concepts with some psychophysical justification, such as dissonance and octave equivalence, are music- and culture-contingent. Artists and creators implicitly invert this inferential process when crafting their musics, attempting to create a piece that leads to particular inferences on the part of the audience.
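The weighting between immediate sensory data and learned priors can be illustrated with a minimal sketch. It assumes, purely for illustration, that a listener's estimate of some musical feature combines a Gaussian prior with Gaussian sensory evidence, so that each source is weighted by its precision (inverse variance). The Gaussian form, the names `Gaussian`, `combine`, and `prior_weight`, and all numeric values are assumptions of this sketch and are not specified by the model.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """A belief about one musical feature (hypothetical scalar value)."""
    mean: float
    variance: float


def combine(prior: Gaussian, sensory: Gaussian) -> Gaussian:
    """Conjugate Gaussian update: precision-weighted mix of prior and evidence."""
    prior_precision = 1.0 / prior.variance
    sensory_precision = 1.0 / sensory.variance
    posterior_variance = 1.0 / (prior_precision + sensory_precision)
    posterior_mean = posterior_variance * (
        prior_precision * prior.mean + sensory_precision * sensory.mean
    )
    return Gaussian(posterior_mean, posterior_variance)


def prior_weight(prior: Gaussian, sensory: Gaussian) -> float:
    """Share of the posterior mean contributed by the learned prior."""
    prior_precision = 1.0 / prior.variance
    sensory_precision = 1.0 / sensory.variance
    return prior_precision / (prior_precision + sensory_precision)


# Illustrative (assumed) cases: a low-level feature where the sensory signal is
# sharp and the prior is diffuse, and a high-level feature where the reverse holds.
low_level = combine(Gaussian(0.2, 0.36), Gaussian(0.8, 0.04))
high_level = combine(Gaussian(0.2, 0.04), Gaussian(0.8, 0.36))
```

Under these assumed values, the low-level estimate lands near the sensory evidence (about 0.74) and the high-level estimate lands near the prior (about 0.26). In both cases the prior's weight is greater than zero, which mirrors the claim that priors are always a factor in these decisions.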
This process can itself be represented as its own cycle, drawing on Lerdahl’s (1992) theory of separate grammars, or sets of priors, for a music’s producer and its perceiver and synthesizing it with McClelland’s complementary learning-systems theory (Kumaran et al., 2016; McClelland, 1998; McClelland et al., 1995). The result is the system shown in Figure 2. Although the process depicted here is broadly similar to that described in the original schematic of the culture-cognition-mediator model, it serves to highlight two crucial aspects of this approach that the simpler schematic avoids. First, it explicitly separates the inference process of a music signal’s production and the inference process of that signal’s perception. This separation emphasizes the crucial role of music as a communicative mediator in the kinds of social interactions necessary for the development of culture and specifies that these interactions cannot happen without some medium, such as music. Second, it emphasizes the Bayesian foundation of the culture-cognition-mediator model by locating the inference at the intersection of short-term-memory and long-term-memory systems. By doing so, this representation firmly situates the culture-cognition-mediator model within broader disciplinary dialogues surrounding both the communicative role of the musical work (Cross, 2014; Feld, 1984) and the combined cognitive and contextual situatedness of inference and decision-making (Constant et al., 2021; Tomasello et al., 2005; Veissière et al., 2020).

Figure 2. Inference structure in the culture-cognition-mediator model (adapted from Fram, 2021).
Note that this inference process is relatively agnostic to the nature of the delineation between the production and perception of a musical signal. In contemporary Western music, for example, the producer may be a composer or performer, and the perceiver may be an audience member, whereas in a participatory musical style, each participant would simultaneously be the producer and perceiver of different musical signals.
Musical features
One notable facet of this model is how it defines musical features. Musicology scholarship has typically regarded musical features as properties of the music itself, ranging from low-level auditory characteristics, such as pitch, volume, or timbre, to higher-level features, such as melody, harmony, phrasing, and orchestration. These higher-level features are essentially patterns of low-level auditory parameters that change over time. Some theorists have defined music in terms of these patterns, such as Edgard Varèse’s rhetorical question, “What is music but organized noises?” (Varèse & Chou, 1966, p. 18). However, this definition relies on the ability to recognize this organization or to detect a pattern in a sequence of noises. This capacity is, cognitively speaking, two inferences about the sound: first, that it is organized, and second, the nature of that organization. Such a process applies to any musical feature above the level of pure auditory data, such as pitch, volume, and timbre, and even those may be inferences that are somewhat sensitive to learned patterns (Wong et al., 2012).
Situating what musicology and music theory consider “musical features” as the outcome of a cognitive inference erases the functional distinction between them and what have generally been considered either extramusical features or psychological responses to music. These include aspects such as the sociocultural function, genre, perceived or induced emotion, and preference. All of these are, in different ways, inferences derived from the application of learned schema about music to auditory data. The only distinction between these and canonical musical features is the nature of the schema, but particularly because the contextualized nature of music is increasingly acknowledged in music scholarship, the barrier between inferences about a piece of music and inferences about its context, emotional content, or extent to which it is liked has become increasingly arbitrary. This is especially true when considering the inferential process itself, which is largely identical across these categories. Therefore, the culture-cognition-mediator model dispenses with this distinction entirely, instead defining musical features as the outcomes of such cognitive inferences given learned, culturally variable schema.
Defining culture
Cultural psychology relies on a particularly dynamic ontology of culture as mutually constitutive with the self (Markus & Kitayama, 1991, 2010). Adams and Markus (2001) drew on Kroeber and Kluckhohn’s (1952) definition of cultures as “explicit and implicit patterns of historically derived and selected ideas and their embodiment in institutions, practices, and artifacts” (p. 357). To them, this formulation of culture preserves the fluidity necessary to avoid the trap of reification while retaining the dialogic properties they found so attractive in Hermans’s (2001) conceptualization. They also acknowledged an irony of interdisciplinarity: Just as they were proposing theoretical frameworks for the explicit study of culture, anthropology—a field that, as Shweder (1999) pointed out, is deeply intertwined with cultural psychology—was moving away from exactly such a formal analysis.
Computational approaches to social clustering have converged on similar frameworks. Some approaches have relied on an underlying clustered social structure akin to small-world networks (Girvan & Newman, 2002; Watts & Strogatz, 1998), whereas others, including Granovetter’s (1973) study of the role weak ties between individuals play in knitting together networks and Breiger’s (1974) analysis of structural roles in social networks (Lee & Martin, 2018), have emphasized the ways in which such clustered network topologies create different social roles for individuals to fill. However, these theories are all descriptive, aiming to analyze mature networks that resemble those observed in real social interactions.
Within the last few years, social-network theorists have begun devising models that can generate social clustering. Most relevant among these for my purposes is the associative-diffusion model, in which individuals alter their associations among values, characteristics, practices, and products on the basis of the associations of their connections and modify the strength of their connections on the basis of similarities and differences in their associations (Goldberg & Stein, 2018). This algorithm has been shown to generate clustered, small-world structures even when starting from unbiased connections and associations. The analogies between associative diffusion and cultural psychology’s mutually constitutive approach to culture are clear. Groups of people are not determined a priori but are, instead, associated with emergent patterns of associations between cultural products and practices.
Note that the association vectors produced by associative diffusion are numerical manifestations of cultural patterns meeting the standard laid out by Adams and Markus (2004). In this sense, associative diffusion is a computational operationalization of the mutual constitution of culture patterns and selves that produces graph structures similar to those observed in real-world social networks. In the culture-cognition-mediator model, culture is conceptualized as an emergent, dynamic set of patterns or associations derived from valenced social interactions that, in turn, exerts causal influence on the nature and interpretation of those social interactions themselves.
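The two coupled updates described for associative diffusion can be sketched as a toy simulation. This is not Goldberg and Stein's (2018) algorithm; it is a deliberately simplified stand-in that follows only the verbal description given here. Agents shift their association vectors toward those of a connection, and tie strength then moves toward the similarity of the two vectors. The function names, the cosine similarity measure, the update rate, and the population sizes are all assumptions of this sketch, and no claim is made that it reproduces the clustering results reported for the published model.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity, clamped to [-1, 1] to guard against rounding."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(-1.0, min(1.0, dot / norm)) if norm > 0 else 0.0


def simulate(n_agents=12, n_items=4, steps=2000, rate=0.1, seed=0):
    """Toy dynamics: associations drift toward partners; ties track similarity."""
    rng = random.Random(seed)
    # Each agent holds an association vector over hypothetical cultural items.
    assoc = [[rng.uniform(-1, 1) for _ in range(n_items)] for _ in range(n_agents)]
    # Unbiased start: every pair of distinct agents has the same tie strength.
    ties = [[0.0 if i == j else 0.5 for j in range(n_agents)]
            for i in range(n_agents)]

    for _ in range(steps):
        i = rng.randrange(n_agents)
        # Choose a partner with probability proportional to tie strength.
        j = rng.choices(range(n_agents), weights=ties[i])[0]
        # (1) Agent i shifts its associations toward those of connection j.
        assoc[i] = [a + rate * (b - a) for a, b in zip(assoc[i], assoc[j])]
        # (2) The tie moves toward the pair's similarity, rescaled to [0, 1].
        target = (cosine(assoc[i], assoc[j]) + 1) / 2
        ties[i][j] = ties[j][i] = ties[i][j] + rate * (target - ties[i][j])
    return assoc, ties
```

Calling `simulate()` returns the final association vectors and the symmetric tie matrix. The point of the sketch is the feedback between the two rules: who an agent learns from depends on tie strength, and tie strength depends on what the agents have already learned from one another.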
Specifying cognition
Note that at first glance, the culture-cognition-mediator model positions music between a component of individuals’ environment (culture) and a component of their biology (cognition). This positioning leaves it unclear where, precisely, the self is located. However, this mismatch is due to a lack of specificity in how cognition is defined. Contemporary approaches to cognition increasingly treat it as a much broader phenomenon than simply the actions of neural populations. Some modern philosophies of cognition explicitly account for phenomena such as embodied, extended, or social cognition as vital components of the broader cognitive process that are inextricable from neural activity (Clark, 2009), although the more radical of these extensions have been critiqued for their overgeneralization and implausibility (Wilson, 2002). This culture-cognition-mediator model follows in that vein of thought, framing cognition as the totality of conscious and unconscious processes, neurological and otherwise, that enable an individual to gather and interpret knowledge. Under this view, cognition is treated as though it comprises processes rather than the entities or substrates that facilitate them.
Even such a definition does not encapsulate the full panoply of processes and features that make up the self. However, it is not intended to. My argument here is not that music mediates the entire cycle of mutual constitution; rather, I claim only that music mediates part of it.
Evidence for Music as a Culture-Cognition Mediator
By construction, this model posits that music facilitates two broad causal effects: the ways in which the self, particularly cognitive aspects of the self, influence the patterns of culture and the ways in which those cultural patterns influence the cognitive aspects of the self. Although additional research explicitly grounded in this culture-cognition-mediator model is necessary to test nuances of the theory, there already exist ample data supporting music’s role in both directions of this mutually constitutive process.
From culture to cognition
Music’s broad conceptualization as a “technology of the self” (DeNora, 1999) is strikingly similar to the way music is framed in this half of the mutually constitutive cycle and has led to research establishing several core aspects of musical functionality. One of the most commonly cited functions of music is mood regulation. There is extensive evidence both that music is used for mood regulation at all stages of development (Saarikallio, 2011; Saarikallio & Erkkilä, 2007; Saarikallio et al., 2013) and that the specifics of this usage vary between cultures (Saarikallio et al., 2021). Furthermore, there are a variety of strategies for using music to cope with stressful external events, such as the COVID-19 pandemic (Carlson et al., 2021), and some of those strategies have been shown to have negative effects on mental health (Carlson et al., 2015). Much music listening has been shown to be positively tied to well-being, particularly when the music was chosen by the listener, even when it was secondary to some other activity (Sloboda et al., 2001). A meta-analysis of findings concerning musical functionality proposed that the primary functions of music are internally focused, aligning well with the use of music to mediate from cultural patterns to cognitive functions and states (Schäfer et al., 2013). However, this research generally operates on established cognitive parameters and does not reflect ways in which music mediates culture’s lasting impact on the self.
Perhaps the most fruitful strain of research providing evidence for music-mediated impacts of culture on the cognitive self concerns infant-directed singing. There appear to be acoustical and formal traits that identify infant-directed singing across cultures (Mehr et al., 2018; Trainor et al., 1997) as well as results indicating consistent infant responses to lullabies from unfamiliar cultures (Bainbridge et al., 2021). However, there is also evidence that infants prefer familiar songs, implying that the songs they are exposed to early on shape their preference behaviors (Kragness et al., 2022). In addition, familiarity with a piece of infant-directed music helps ease infants’ distress (Cirelli & Trehub, 2020), providing more evidence for interplay between musical exposure and underlying cognitive processes in infants. This musical exposure can vary widely between families (Ilari, 2005), and this lack of consensus on what constitutes “appropriate” music for infants implies a direct impact of parental beliefs, shaped by their own internalized cultural patterns, on the content of infant-directed singing. Some evidence indicates that cultural variation in musical environments may lead to altered rhythmic behaviors in infants (Ilari, 2015), although this is difficult to causally assess. There is also evidence for a primary role of familiarity in music-evoked emotions among adults (C. S. Pereira et al., 2011; Schäfer, 2016). As a result, these findings indicate a direct tie between culture and cognition that is mediated through specific musical selections.
Familiarity and predictability are related but distinct concepts. Whereas familiarity implies knowledge of a specific piece of music, predictability is a generalization beyond those musics that have been directly experienced by a listener. Prediction is a cognitive capacity that in the musical domain relies on both regularity or recurrence and more complex representations of musical structure, which are learned as patterns in music an individual experiences. In the vocabulary established in this article, this implies that there are aspects of musical predictability that are cultural in origin. This form of prediction is also observable in infant-directed singing. Infants as young as 2 months have demonstrated the ability to discriminate between similar melodies (Plantinga & Trainor, 2009), and there are known to be differences between how easily adults and children detect melodic alterations that stay within a tonal context (Trainor & Trehub, 1992), implying that the cognitive skills associated with deviance detection are contingent on learned, culture-dependent parameters.
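The distinction between familiarity and predictability can be made concrete with a small sketch. It assumes, for illustration only, that familiarity means a melody has been heard note for note, and that predictability is the average surprisal of a melody under first-order note-transition statistics learned from the melodies a listener has heard. The bigram model, the add-one smoothing, the function names, and the toy melodies are all assumptions of this sketch. Real musical expectation involves far richer structure than note-to-note transitions.

```python
import math
from collections import Counter, defaultdict

# Twelve pitch classes as the assumed note alphabet.
ALPHABET = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def train_bigrams(melodies):
    """Count note-to-note transitions across everything the listener has heard."""
    counts = defaultdict(Counter)
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts


def mean_surprisal(melody, counts, alphabet=ALPHABET):
    """Average -log2 P(next | previous), with add-one smoothing."""
    total = 0.0
    for prev, nxt in zip(melody, melody[1:]):
        seen = counts.get(prev, Counter())
        prob = (seen[nxt] + 1) / (sum(seen.values()) + len(alphabet))
        total += -math.log2(prob)
    return total / (len(melody) - 1)


def is_familiar(melody, heard):
    """Familiarity here means this exact melody has been heard before."""
    return melody in heard


# Hypothetical listening history and two melodies the listener has never heard.
heard = [("C", "D", "E", "C"), ("E", "F", "G", "E"), ("C", "D", "E", "F", "G")]
in_style = ("C", "D", "E", "F", "G", "E")
out_of_style = ("C", "F#", "D", "G#", "E", "A#")

model = train_bigrams(heard)
```

Neither `in_style` nor `out_of_style` is familiar in the sense used here, yet `mean_surprisal(in_style, model)` is lower than `mean_surprisal(out_of_style, model)`, because the first melody reuses transitions learned from prior exposure. That is the sense in which predictability generalizes beyond the specific pieces a listener knows.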
Neurocognitive data have established a long and complex developmental arc of the acquisition of higher-order musical syntax. By the age of 11, children with regular exposure to musics of tonal derivation (e.g., based on Western common practice) were able to make quality judgments about chords faster given a strong tonal-harmonic priming context regardless of musical training, whereas 8-year-olds were faster only if they had musical training (Schellenberg et al., 2005). Musical training has been shown to increase activity in cortical regions associated with linguistic and auditory processing when listening to harmonic sequences (Koelsch et al., 2005), and musical ability, irrespective of formal training, predicts language ability (Swaminathan & Schellenberg, 2020). Other studies have shown that different aspects of musical structure are enculturated at different paces and that behavioral evidence of musical enculturation lags neural evidence (Corrigall & Trainor, 2014, 2019). Likewise, although neural responses associated with music-syntactic violations are observable as early as age 30 months, neural signatures of harmonic integration are still absent (Jentschke et al., 2014). These demonstrated impacts of musical experience and enculturation on neural and behavioral processing that unfold throughout the course of child development offer even more precise evidence that music facilitates the impact of culture on cognition.
Among neurotypical adults, another clear instance of this direction of mediation concerns musics intended for specific kinds of group behaviors and contexts, such as political campaigns and religious ceremonies. Campaign songs are chosen to create associations between songs or artists, and all their attached affordances, and the politician or issue in question. Music has been associated with American political campaigns since the country’s earliest days (Schoening & Kasper, 2012). The title of Schoening and Kasper’s (2012) book, Don’t Stop Thinking About the Music, is itself a riff on Bill Clinton’s use of the Fleetwood Mac song “Don’t Stop (Thinking About Tomorrow)” in his 1992 campaign for the U.S. presidency, a usage that is a testament to the indelibility of that song’s connection to Clinton himself.
The use of music in political campaigns has been framed as a deliberate attempt to interface with constituents’ cultural identities and self-concepts with the aim of altering their interpretation of or attitude toward the candidate (Patch, 2016). Sometimes, this linkage has different effects depending on preexisting cultural understandings of particular songs; for instance, the dissociations between Donald Trump’s politics and his use of songs such as The Village People’s “Y.M.C.A.” have been frequent sources of material for late-night comics, whereas music critics have referred to his adoption of The Beatles’ “Revolution” and Neil Young’s “Rockin’ in the Free World” as examples of those songs as an empty signifier, 1 or a concept with a subjectively determined meaning (Richards, 2016). Incidentally, this framing explicitly follows on from Zizek’s (2007) assertion that Beethoven’s “Ode to Joy” has been used to celebrate such a baffling range of causes as to have lost intrinsic meaning.
Recent research has proposed that candidates rely on musically congruous songs, Trump’s notable stylistic disparity notwithstanding, although songs that are musically unlike the rest of the candidate’s playlist but reinforce or support their political ideology can have positive effects on their candidacy (Johnson et al., 2021). In every case, musical choices carry a clear intentionality: to draw on some enculturated understanding of a song’s affordances and implications to alter public understanding of the person being associated with it. In fact, it may be that Richards (2016) and Zizek (2007) are mistaken in their assertion that such jarring mismatches between the individual or cause and the music are instances of empty signification; instead, it may be closer to an appropriative act aimed at using the cultural significance of a musical piece to alter constituents’ cognitive responses to a candidate, political entity, or ideology. In other words, musical pieces are neither empty nor equivocal signifiers, but the use to which that signification is put is inconstant and tailored to individuals’ agendas.
From cognition to culture
Finding evidence of music-mediated impacts of cognition on culture is slightly harder, in part because culture is a difficult thing to directly assess. However, researchers can build on the basic insight driving associative diffusion—that people’s patterns of beliefs and behaviors tend to mirror the patterns of people in their social circles—to investigate the related developmental phenomenon of prosocial behavior. There is extensive evidence that making music together, particularly when this joint music-making involves physical entrainment or synchronized movement, leads to increased cooperation and heightened prosocial behaviors and empathy in young children (Rabinowitch et al., 2013; Rabinowitch & Knafo-Noam, 2015; Rabinowitch & Meltzoff, 2017). Such increased positive interaction between individuals can lead to stronger social ties (Morelli et al., 2017), which, in turn, contributes to increased sharing and mutual adaptation of cultural norms.
This particular facility of music has also been applied in therapeutic contexts. Neurologic music therapy is consistently related to improved outcomes in stroke rehabilitation (Thaut & McIntosh, 2014); improvements have been shown across motor function, emotion, and social communication (Fujioka et al., 2018). Other age-related disorders, such as dementia (O. McDermott et al., 2013; Särkämö et al., 2014; Sihvonen et al., 2017) and Parkinson’s disease (Nombela et al., 2013; Pacchetti et al., 2000), are positively affected by music therapies or regular musical activity, as are conditions such as clinical depression (Erkkilä et al., 2011), schizophrenia (Ceccato et al., 2006), and a wide range of other psychiatric diagnoses (Gold et al., 2009). For developmental disorders, such as autism, which include a strong preference for structure and predictability in their phenotypes, music provides a useful scaffold for effective therapy (Lense & Camarata, 2020). This leverages music’s predictability—drawing on known cognitive capacities—to encourage social bonding between individuals—a necessary condition for the entrainment of cultural patterns. Indeed, all infant-directed communications are known to demonstrate characteristics such as rhythmicity, which are generally stronger in music than in adult-directed speech (Fernald, 1989; Trainor et al., 2000), implying that the cognitive processes prominent in music are ideally suited for social bonding with infants. Music has also demonstrated efficacy in promoting multiple areas of development for children with Down syndrome (Gemma et al., 2020) and has potential for therapeutic applications in other neurodevelopmental disorders, such as Williams syndrome (Lense & Dykens, 2016; Lense et al., 2014; Levitin et al., 2004) and Rett syndrome (Sigafoos et al., 2009). 
This effect is not restricted to children with developmental disorders, given that music has been shown to lead to entrainment behaviors in typically developing children as well (Trainor & Cirelli, 2015).
Infant-directed singing, already shown to mediate the connection between the cultural patterns in the home and the child’s cognitive responses, facilitates the opposite causal effect as well, through a social mechanism similar to that of joint music-making and rhythmic entrainment. Infants have shown increased attention to strangers singing songs their parents have sung to them (Mehr et al., 2016) and are more likely to help adults who sang familiar songs (Cirelli & Trehub, 2018). In addition, although infants prefer accepting objects from adults who have previously sung a familiar song, their attention to those objects was predicated on whether their previous exposure to the song came from a recording or from a parent (Mehr & Spelke, 2018). Furthermore, children have been shown to prefer the company of other children who not only share their musical preferences but also know the same songs as them (Soley & Spelke, 2016). This evidence implies that familiar, predictable musical sequences are important markers for children’s social entrainment and capacity for joint attention, which is known to be a crucially important aspect of child development (Carpenter et al., 1998; Tomasello & Farrar, 1986). Furthermore, it appears that the bidirectional mediation is clearest early in childhood, during which the developmental processes involved are accelerated and more easily assessed, although there is evidence that song and infant-directed speech do positively affect language acquisition into adulthood (Ma et al., 2020).
Some scholars have suggested that this function of music is evidence for its evolutionary provenance as a way to transform cognitive parameters directly into cultural patterns (Cross, 2001, 2003). In fact, Cross’s (2001, 2003) view of how humans construct meaning from music, which involves a kind of interpretive indeterminacy that he calls “floating intentionality” (Cross, 2008), is very similar to the culture-cognition-mediator model. However, although Cross proposed music as a tool for promoting the kinds of “shared intentionality” he viewed as central to the human capacity for culture, my approach views music through a constructionist lens, framing it as a product of lower-level cognitive and cultural phenomena that is a necessary intermediary in a constantly unfolding, mutually constitutive cycle. Such a constructionist view of music opens useful parallels that bolster this argument, such as recent research demonstrating that the entrainment of speech rhythms—a notably musical phenomenon—is an effective signal of shared community membership (Polyanskaya et al., 2019). It also implies both that, as seen earlier in this article, music has efficacy over unrelated cognitive processes and that, as Rabinowitch (2020) argued, music is a powerful tool for social change that must be handled with delicacy and care.
Just as infant-directed singing appears in both causal directions, political music facilitates the effects of cognitive processes on cultural patterns and the inverse. Perhaps the clearest examples of this phenomenon are national anthems. Official anthems, or songs that have been explicitly chosen to represent a country, engender a complex set of cognitive and affective responses in their listeners. National anthems have been found to contribute to senses of national identity starting in later stages of adolescence (Winstone & Witherspoon, 2016), but the valence of responses to national anthems is associated with membership in cultural in-groups and out-groups (Gilboa & Bodner, 2009), indicating that these responses are connected to specific versions of national identity. They are often chosen specifically to inspire feelings of national pride and unity and can undergo modifications to best achieve this aim as the national zeitgeist shifts (Liao et al., 2012). Protest songs are, in some ways, the inversion of national anthems because they are designed to alter opinions of and evoke empathy with cultural out-groups. They have been shown to be remarkably effective at this (Ziv, 2018), implying they can shape the general perception of communities not part of the typical conception of national identity.
These anthems derive their power from fostering a sense of national or communal identity, drawing on cultural referents to evoke shared cognitive and affective responses, which, in turn, reinforce intracultural ties. In this way, national anthems conjure cognitive responses that both are derived from the listener’s specific engagements with cultural patterns and construct them. They also serve as a prime example of how inseparable the two directions of causality are and the role that music plays in the iterative coconstruction of cultures and selves.
Empirical Predictions and Future Directions
Much like the MSB and credible-signaling hypotheses of musical evolution (Fritz, 2021), the culture-cognition-mediator model is difficult to falsify directly. However, it makes several concrete, testable predictions around specific musical features that bear mentioning, particularly with respect to neural correlates of predictive coding, affective responses to music, and clinical applications for neurodevelopmental disorders. These predictions are, at this point, speculative because the culture-cognition-mediator model is a nascent theoretical framework, albeit one supported by a large collection of extant data. However, these may form the basis for further research and, regardless of whether the specific predictions of the culture-cognition-mediator model are borne out, will shed additional light on the intersection of cultural heterogeneity and cognitive generalizability that this framework represents.
Predictive coding
The idea that the perception of unpredictable musical events depends on prior musical experience, not just statistical properties of music itself, is not unique to the culture-cognition-mediator model. However, this framework indicates the potential for a much stronger claim: that there are systematic variations in this prior experience not necessarily captured by an individual’s own music-listening or music-making decisions. Essentially, the culture-cognition-mediator model implies that the influence of background cultural exposure (e.g., music in advertisements, from neighboring apartments, in grocery stores) on predictive encoding of musical stimuli is potentially quite large. Therefore, there may be variations in neural and behavioral correlates of music-specific temporal, harmonic, and melodic predictions across cultures above and beyond what is accounted for by musical preference or training.
Prior research has already demonstrated that several of these correlates, such as the early right anterior negativity (ERAN) and mismatch negativity, emerge at different points in development associated with different levels of enculturation. However, the culture-cognition-mediator model indicates that those event-related potentials (ERPs) will be different depending on the alignment between a musical signal and the listener’s prior exposure, so a stimulus that elicits an ERAN for one listener may not for another. Furthermore, these differences may be systematic so that patterns in music-specific ERP responses are related to the prevalence of musical styles within the background noise of a cultural environment and to the statistical patterns within those styles. In addition, individuals with similar patterns of ERP responses may have similar musical preferences, and people with strong affiliative attachment may, in turn, have similar patterns of ERP responses to musical stimuli. This last claim is both extraordinarily challenging to assess and likely to be incomplete at best, if not outright incorrect. However, neural synchronization has been observed in a variety of social and musical environments, and it is not entirely unreasonable to suggest that over time, these interpersonal alignments solidify into similar activation patterns to musical stimuli. At the very least, it is worth further study.
Affective response
Both the MSB and credible-signaling theories of music evolution emphasize music’s ability to engender and convey affective states. The cultural specificity of affect has been the topic of much debate, particularly in cultural psychology, paralleling the rapidly expanding body of work on cultural heterogeneity in music, especially in how music is used in various contexts. One fruitful avenue for study in this space implied by the culture-cognition-mediator model is investigating cultural variation in the affective states evoked by music, the valuation of those affective states, and culture-specific trends or patterns in music creation, use, and preference. A culture that prioritizes feelings of calmness, for example, may demonstrate different context-specific usage and preference patterns for music perceived as high or low arousal. There may also be variation in the levels of arousal a particular piece of music evokes in listeners across cultures. Together, these represent music-mediated influences of culture on cognition (prior musical exposure and learned associations affect core affective responses to musical stimuli) and music-mediated influences of cognition on culture (affective responses to music drive context-specific musical preferences and communal music behaviors).
The musical space most frequently found to be consistent across cultures is child-directed song. In the credible-signaling hypothesis (and several commentaries, including Trehub’s (2021) critique of credible signaling), child-directed singing is framed as a uniquely powerful avenue for building interpersonal bonds between children and caretakers, particularly mothers, and for inducing positive affective states. It is fascinating, then, that musical preferences and use cases differ so widely both between and within cultures by adulthood. The culture-cognition-mediator model indicates that this may be due to musical exposure outside the constraints of child-caretaker interactions or to changes in underlying cognitive capacities driven by nonmusical stimuli. Recent technological advances enabling researchers to record a child’s full acoustical environment may allow them to study these background musical contributions in longitudinal settings, especially in concert with improved cognitive-assessment methodologies designed to probe both sides of the culture/cognition dichotomy. In this case, the culture-cognition-mediator model indicates that variation in this acoustical background and development of other cognitive capacities, such as timing perception, melody or pitch discrimination, and auditory entrainment, may drive heterogeneity in the evolution of affective responses to music over a child’s development, indexed by aversion or attraction to a musical stimulus and measures of autonomic arousal, such as skin conductance, pulse rate, and pupil dilation.
Neurodevelopmental disorders
Behavioral therapies for neurodevelopmental disorders rely on the ability of environmental factors to alter internal cognitive capabilities, which is at the core of the culture-cognition-mediator model. Because of this parallel, the specificity of intersections between cognitive capacities and cultural patterns in music can be leveraged to guide the development of powerful, targeted interventions.
Impaired rhythm perception has been identified as a key vulnerability in several neurodevelopmental disorders (Ladányi et al., 2020; Lense et al., 2021). Rhythm itself is implicated in several musical features, such as hierarchical rhythmic structures. In child-directed song, these features tend to follow consistent statistical properties and are recognizable across cultures (Trehub et al., 1993, 2015; Unyk et al., 1992), but there is substantial divergence in musics not intended for children. As previously noted, musical preference diversifies as children age, so although music offers a powerful potential scaffold for rhythm perception and associated capabilities, the statistical properties of the most effective music will change over the life span. In particular, the culture-cognition-mediator model suggests that instantiations of effective music-based interventions will be more similar to each other early in childhood and will require more individualization in older individuals. This differentiation may emerge quite early in life, including before the characteristics of many neurodevelopmental disorders are easily observable. For this reason, the culture-cognition-mediator model implies that understanding the home musical environment is necessary to deploy effective musical interventions.
Similar logic extends to disorders that emerge later in life. Broadly speaking, the literature on music-enabled or music-enhanced recovery and rehabilitation from conditions such as stroke, Parkinson’s disease, or dementia focuses on either motor function or social/emotional communication and processing. Preference and familiarity, both of which are strongly enculturated, are linked to emotional outcomes, quality of life, and social connectedness (Baird & Samson, 2015; Baird & Thompson, 2018; Sedikides et al., 2022; Sung & Chang, 2005), so the implications of the culture-cognition-mediator model for social/emotional outcomes in age-related neurological conditions are both clear and broadly understood. Therefore, I focus here on novel implications for motor function. For instance, improved gait in Parkinson’s has been linked specifically to music with high beat salience (Leow et al., 2014, 2021), but the ability to induce a regular beat from music is in part learned on the basis of hierarchical temporal schema that vary across cultures (Nunes et al., 2015). Furthermore, this relationship between music and gait improvement may be tied to dance (A. P. S. Pereira et al., 2019), implying another level of cultural specificity, and recent evidence indicates that familiarity with music enhances the benefits for stride length and variability (Park et al., 2021). As a result, the culture-cognition-mediator model suggests that the set of musics useful for improving motor function for any given individual includes some pieces with weak beat salience but strong learned temporal schema and that individuals with more similar musical exposure will have greater overlap in these extended musical spaces.
Conclusions and Implications
Although the culture-cognition-mediator model is broadly supported by extant literature, it is by no means an exclusive account of musical ontology. The simple fact that it is possible to conduct musicological research without drawing on this kind of bidirectional mediation or while examining only music’s connection to either cognition or culture is sufficient demonstration of the limitations of the approach detailed here. And like all models, it will inevitably have overlooked some crucial data. However, it is an effective distillation of the reasons behind music’s well-established functional efficacy. Regardless of whether music evolved to enhance social function or as a tool for mood regulation, it is singularly effective at those tasks, among many others, precisely because of its status as a mediator between the cultural and the cognitive.
The culture-cognition-mediator model also has several implications for both the development of the ontology of music and potential epistemologies of musical inquiry. Although some of these implications are discussed explicitly here, this is not an exhaustive list. In addition, whereas some of them are novel contributions of this model, others are restatements or reinforcements of other existing conclusions.
Epistemological implications
Research under the culture-cognition-mediator model must be strongly transdisciplinary. Such a transdisciplinary scope has been central to music scholarship for well over a decade (Born, 2010; Parncutt, 2006; Waltham-Smith, 2020), so this particular implication is far from novel. However, it puts the culture-cognition-mediator model in line with current trends in musicology more broadly, verifying a necessary but not sufficient condition of its relevance in contemporary scholarship. Although there are valuable things to be learned from domain-specific investigations, it is increasingly apparent that theories of musical provenance, ontology, or functionality, particularly those that engage directly with intercultural research, must span multiple disciplines. Studies deriving from the culture-cognition-mediator model are no exception.
In addition, the culture-cognition-mediator model requires a nuanced and complex engagement with culture as both producer and product of music. This is a somewhat more powerful assertion than is typical in musicology and is certainly more expansive than psychological treatments of music, which tend to view it as a cultural product only. In marshaling evidence for both causal directions, I found that much of the research supporting music’s role in mediating the effect of cognition on culture relied heavily on linking hypotheses that have yet to be empirically tested, in large part because of significant methodological and ethical barriers to hypothesis testing in the cultural space. In addition, asserting that music facilitates cognition’s formative impact on culture runs the risk of either reifying culture or reducing it to an extension of neural function, either of which would be inaccurate. However, these barriers are not insurmountable and, in fact, offer particularly tantalizing avenues for interdisciplinary collaboration and methodological cross-pollination.
Finally, the culture-cognition-mediator model implies that the set of phenomena considered features of music must be expanded to include all music-related phenomena that draw on both cognitive capacities and cultural norms or that facilitate the action of one on the other. This view is directly at odds with the conventional understanding of musical features as observable aspects of the music-object itself and proposes some challenges for music theory in particular. At the most fundamental level, there is no a priori reason why musical features in this construction must be aesthetic at all. In fact, this instantiation of musical features explicitly problematizes the clarity of the distinction between musical and extramusical parameters. It is not at all clear where the line between them is drawn, so for a robust account of musical functionality, they ought not and, indeed, cannot be distinguished. This has unavoidable consequences for the ontology of music itself.
Ontological implications
If music research under the culture-cognition-mediator model considers both musical and extramusical parameters as musical features, this implies that “music” is perhaps better conceived as a broad set of behaviors, products, beliefs, affordances, and associations surrounding the object researchers typically think of as music. This ontological shift is not entirely new because it has been presaged by several researchers over the past half-century (Gourlay, 1984; Rouget, 2004; Small, 1998), who found broad intercultural differences in where the line between music and closely related practices, such as dance, lies. The culture-cognition-mediator model, however, expands musical ontology in a slightly different way: Rather than extending music into the domains of other related practices, it extends music to include aspects such as emotion, genre, and performance context as features, indistinguishable in kind from harmonic closure, syncopation, or melodic coherence. 2
This comparison between the culture-cognition-mediator model and extant theories of music such as Small’s (1998) concept of “musicking” implies a particular nuanced ontological shift: Cognitive or cultural distinctions between music and other cultural products, such as visual art, dance, literature, film, or theatre, are likely both smaller in scope and more relevant to ontological gaps among these products. I illustrate this with an example. Gourlay (1984) indicated that in many cultures, music and dance are not understood as separate concepts. Even in cultures that do make firm distinctions between the two phenomena, they often share performance contexts, forms, and sometimes even names. 3 Likewise, there is a sizable collection of research demonstrating that music directly evokes movement, specifically dance (Burger et al., 2014; Carlson et al., 2018, 2020; Hurley et al., 2014; Janata et al., 2012), as well as whole musical epistemologies centered on the embodied nature of music and musicality (Leman, 2008). Taken together, this implies both that the cognitive distinction between music and dance is minimal, if it exists at all, and that even cultures with separate concepts for music and dance invoke many of the same cultural norms to engage with them. As a result, ontological distinctions between music and dance are driven by a small number of relatively minor cultural and/or cognitive discrepancies, which appear to be almost entirely concerned with that ontological gap itself. A similar argument can be made for other pairs of cultural products.
Although it is not an explicitly evolutionary hypothesis, the culture-cognition-mediator model carries implications for the thriving research on the evolutionary origins of music and musicality. Much of this discussion has revolved around music’s status as an adaptation—an innate capacity that serves a specific, naturally selectable purpose—an exaptation—a technology or by-product of adaptations—or something akin to what Huron (2001) called “non-adaptive pleasure seeking” or Patel (2010, 2018) described as a “transformative technology of the mind.” The culture-cognition-mediator model, at first glance, appears to be something of an enigma because it views music through a constructionist lens that closely resembles Pinker’s nonadaptationist view (Pinker, 1997) but gives music the precise functionality that Cross and others have argued was adaptive (Cross, 2001, 2003, 2008; Honing, 2018; Killin, 2016).
Seen from this framework, the constructionist view of music as a product of cognitive traits not specific to music and the evolutionary view of music as progenitor of the human capacity for culture are not mutually exclusive; rather, the reality is likely a hybrid of the two, much like Killin (2016) and Patel (2018) argued. More precisely, although musicality may be a by-product of lower-level cognitive traits, music itself is inextricably tied to culture and human social functioning, operating alongside other cultural products to stitch together the mutually constitutive cycle of cultures and selves. This perspective also exposes the potential risks associated with relying too heavily on any specific ontology of music when making broad evolutionary claims. The MSB hypothesis, for instance, focuses on the evolutionary provenance of musicality, which Savage et al. (2021) defined as “the underlying biological capacities that allow us to perceive and produce music” (p. 2), rather than music, which they implied is too varied to have a consistent adaptive function. By shifting music’s ontology away from the musical object and toward music’s underlying cultural functionality, I could argue that music itself, alongside other cultural products, enhances survival value by enabling the complex fluency of human social interaction and, therefore, may be an adaptation. However, because such an ontological shift essentially defines music as “the set of auditory behaviors that accomplish a particular cultural task,” arguing that it is an evolutionary trait is circular, much like arguing that the visual apparatus evolved in order to see. The ontological specificity in both the MSB and credible-signaling hypotheses, which several commentaries criticized for artificially circumscribing the heterogeneity of human music and musicality, is necessary for their evolutionary claims to carry exclusive weight. 
Therefore, although both theories offer powerful frameworks for understanding the provenance and continued function of music in humankind, neither accounts for everything. Perhaps given the ontological instability of music itself, no current theory can, including the culture-cognition-mediator model.
Finally, I return to the question of whether music is or can be considered a universal language. The Voyager Golden Records contain music, carefully selected to represent what NASA believed was the best of humankind in the 1970s. That music, like all music, is inextricably entangled in the mutually constitutive cycle of cultures and selves, and in the absence of a cultural referent, it is reduced to its cognitive components. These components can interface with biological imperatives of perception and are clearly deliberately structured, implying consciousness on the part of the music’s creator. However, the resulting assemblage of auditory data has also lost its distinctiveness as music and likely would not be interpreted as such by any extraterrestrial intelligence that encounters it. In other words, the culture-cognition-mediator model predicts that musics of unknown or foreign provenance will not be recognized or interpreted as music at all. Therefore, even if the cognitive and cultural framework for music is shared and even if there is always something occupying that mediating role, its lack of intelligibility or even recognizability implies that it is not a universal language at all. Rather, it is a kind of translator between disparate substrates. Considered within the broad scope of the ongoing dialogue of culture and cognition, music is in the middle.
