Abstract
Starting with a brief review of the question, this article examines evidence stemming from classical, popular and ethnic music, as well as from the field of language-based taxonomies and other research areas, that seem to prove a strong correlation between aural environment and musical production. To explain these interesting phenomena, we propose the sonic affinity theory, which holds that the human musical brain is shaped by the action of surrounding sounds, which generate aural profiles and aesthetic patterns in complex reflective processes. The ability to unconsciously internalize auditory signals is the result of an adaptive evolutionary mechanism and develops into a sonic affinity.
Keywords
When Charles Darwin explored Tierra del Fuego in 1832, during his famous voyage on the Beagle, he was struck by the imitative capacity of indigenous Fuegians:
They are excellent mimics: as often as we coughed or yawned or made any odd motion, they immediately imitated us. […] They could repeat with perfect correctness each word in any sentence we addressed them, and they remembered such words for some time. […] All savages appear to possess to an uncommon degree this power of mimicry. (Darwin, 1839, p. 229)
Darwin would never state a specific connection between the mimetic skills observed and the musical phenomenon, but in passages such as those described the link might be implicit.
Humans are imitative animals by nature. We learn directly from our parents’ actions and surroundings. Within our complex sensory system, sound forms part of our life before birth; it is a sense that precedes that of sight, containing a remarkable psycho-emotional weight and influence on behaviour, and is usually kept until the last moment of existence. This article focuses on the confluence of the mimetic capacity and the processes of acoustic perception, decoding and production in the human brain. It is based on the assumption that the overall noise surrounding the person on a daily basis (ontogeny) through what is a probably an adaptive heritage over many millennia, and obeying a causality involved in evolutionary logic (phylogeny; see Parncutt, 2009), essentially conditions his/her musical culture in terms of an eventual affinity. A number of contributions from different disciplines and approaches are drawn on here to support this hypothesis.
The concept of ‘mimesis’ is central to the theory of art, literature and cognitive science. It is subject to innumerable terminological nuances, schools and disciplinary applications that provide contributions of great interest. For example, thanks to biology, we know that nature imitates itself somehow in the processes of species’ adaptation to the environment because those whose body temperature and skin colour match the ambient temperature and surrounding landscape tend to survive better (Cossins & Bowler, 1987; Foster & Kreitzman 2010; Johnston & Bennett, 2008; Osorio & Vorobyevb, 2008). In the field of sound and music, mimicry seems to have a core incidence too, already noticed in the history of musical thought since its first stages. We are referring to the Greek theory of ethos, which was deeply mimetic by postulating that the mood traits inherent to each musical mode caused a correlative emotional and inescapable reaction in the listener. Plato and Aristotle believed that certain types of melodies were uniquely suited to produce feelings of harmony, order and virtue (and the contrary), thereby acting as a means of moral law. For these classical philosophers, art was essentially an imitation of nature (Landels, 1999; West 1994). Similar approaches can be found in early Chinese thinkers, particularly Taoists (DeWoskin, 1993).
A contemporary pioneer in associating music with imitative principles was Vernon Lee (1932). More recently Schafer (1977/1994) coined the fertile concept of ‘soundscape,’ establishing some reflective links between the surrounding context and certain musical repertoires. In a striking exploration, Cumberland (2004, and personal communication, May 2011) showed how the aural atmosphere of an Alpine valley has a specific pitch, objectively measurable, that overtly impacts the auditory culture of its inhabitants. With a complementary orientation, Thibaud understands a sound ambiance as a ‘sensory background’ having a ‘prevailing emotional tonality’ (2011b, p. 206), identifying even a ‘tonic phenomena’ to conclude that ‘there is no radical break between living creatures and their environment […] ambiance reminds us that living organisms and their milieu form a continuum’ (p. 209; see also Thibaud, 2011a). The reflective character of human responses to those acoustic frames is noteworthy:
Evolution endowed the musical brain with a perception-production link that most mammals lack. This motor-action-imitation system gives us the ability to take something in one sensory domain and figure out how to create it with another. We hear music, then sing it. Every song you know how to sing, every word you speak, you reproduce with your own voice based on something you originally heard. (Levitin, 2009, pp. 266–267)
1
Accepting Levitin’s statements, the phenomenon of musical imitation follows a perception-production mechanism where the initial hearing becomes an almost irreplaceable step for the subsequent musical reaction. Therefore, if any concrete soundscape has a particular and measurable pitch that might act like an all-pervasive and empathetic sound-matrix, an important part of the human ‘creativity’ must take place conditioned by that environmental root, affecting various reflective attitudes. 2 For example, Milliman’s work shows how in imitative behaviours related to music, the categories of pitch, intensity, timbre, rhythm or melodic profile are not the only referential frames to be taken into account: tempo is also significant. In one experiment he found that in slow music conditions, customers in a supermarket shopped 15% more slowly than when fast music was played (Milliman, 1982). Subsequently, he studied 1,392 groups of restaurant customers over eight weekends, finding customers ate more slowly when slow music was played, completing their meal in 56 minutes on average, compared to 45 minutes when fast music was played (Milliman, 1986). Similarly, North and Hargreaves (1998) have convincingly demonstrated the power music exerts in the perception of non-musical realities: in a cafe, during 4 days, they played as background music pop, classical, easy listening, and no music. Each aural ambience led customers to perceive the cafe and the economic value of its products differently (see also Areni & Kim, 1993; Kämpfe, Sedlmeier, & Renkewitz, 2010; North, Hargreaves, & McKendrick, 1999).
A number of general approaches and related terminology show an increasing interest from scholars in these phenomena. Among them, the concept of echoic mimicry (Carter, 2005, pp. 46–49) deserves attention, as well as the notion of sonic effect (Augoyard & Torgue, 2005) and the central thesis by Changizi (2011) – the latter understanding music and language as a response to certain natural auditory impacts. In 2001, Cox launched a mimetic hypothesis that addressed the processes by which prior experiences become relevant to musical conceptualization. For Cox, these incorporated experiences motivate and influence the formation of basic musical significance (Cox, 2001). Ten years later, he emphasized imitation as an instinctive and automatic process, relying on the notion of embodied cognition (Cox, 2011, after Lakoff & Johnson, 1999). His work was based on how music becomes internalized into the bodies and minds of listeners, stressing the role of the body in the construction of meaning. Cox believed that ‘part of how we comprehend music is by way of a kind of physical empathy that involves imagining making the sounds we are listening to’ (Cox, 2011, p. 1). Additionally, Godøy and Leman stated that ‘we experience and understand the world, including music, through body movement’ (2010, p. IX). Consequently, it might be assumed the existence of a human embodied audition in which the perception of external sound stimuli is processed in accordance with grouping parameters (Bregman, 1990), is internalized according to natural principles and triggers musical sensorimotor reactions.
The genetic status of imitation could be demonstrated by brain injury patients; in fact for many people it remains inhibited, as described by Rizzolatti and Sinigaglia (2008), who outline the extreme case of echopraxia, in which patients have a tendency to mimic observed actions compulsively, no matter how unusual the actions may be. Some rationale for this strong and instinctive propensity for mimicry can be found in the exposure effect, a psychological mechanism by which people develop preferences through familiarity. The locution was coined and first studied half a century ago by the American psychologist Robert Zajonc. The preference for the known is probably universal and must be due to a functional reason because it is present in all living species: ‘ethologists have shown that all kinds of animals prefer familiar over less familiar stimuli’ (Huron, 2006, pp. 131–132).
Memetics was developed through the seminal work of English ethnologist Richard Dawkins in 1976 (The selfish gene). The ‘meme’ is a consistent unit of culture that settles in the mind of the individual and whose transmission presupposes a transference and imitative action from person to person, as a sort of reflective cultural gene. Seen in this light the meme is an evolutionary replicator, defined as information copied from person-to-person by imitation (Blackmore, 2001; see Aunger, 2007), and with repercussions for musical perception and production:
We mentally imitate sound-producing actions when we listen attentively to music or we may imagine actively tracing or drawing the contours of the music as it unfolds. (Godøy, 2003, p. 318)
In the early 1990s, there was a commotion in the scientific community when scholars from the University of Parma discovered mirror neurons in macaques. Today there is some consensus on the existence of mirror neurons in humans but the debate concerning their brain functions continues. Mirror neurons are groups of brain neurons that fire not only when the monkey performs an act but – and this is the point – when it looks at, or merely listens to, how another monkey carries it out. The consequences of this discovery have had a major impact on the understanding of the mechanisms of cerebral motor activity, because it may prove that the correlation between perceived image and action does not depend on specific muscle movements but on intentions assumed as such from an external stimulus. The mirror neurons will imitate the behaviour of another being in his action as if the observer him or herself was performing the action. Here, too, there is a plausible application directly concerning musical perception and production:
[M]usic, like language, involves an intimate coupling between the perception and production of hierarchically organized sequential information, the structure of which has the ability to communicate meaning and emotion. We propose that these aspects of musical experience may be mediated by the human mirror neuron system. (Molnar-Szakacs & Overy, 2006, p. 235)
These contributions are of great interest, because if, for instance, mirror neurons definitely have a substantial influence in the process of musical creation – and independently from conscious volition – then the reflective sound-based responses we are studying would be demonstrated, and our theory would consequently have biological and empirically firm support. However, the notion of a sonic affinity with the environment, as contemplated in the present work, points to a more flexible and dynamic interpretation of the principle of mimesis – or fixed imitation – than these remarkable perspectives show in general, as well as emphasizing a key causality involved.
Some musical references: From Mozart to the beep generation
Musicology provides some support for the thesis of a deep and not always obvious affinity between life and music. For example, the composer and theorist Murray Schafer – a pioneer in this kind of environmental approach – wrote that the sounds Mozart heard throughout his life were mainly ‘birds, human voices, carts with metal wheels moving through the cobbled streets, the coachman’s whip [thus] treble sounds.’ Schafer observed a correlation between this aural environment and the music of Mozart, full of ‘mid and high frequencies,’ where bass sounds are quite ‘light’ (1997, p. 85). In a doctoral course given in 2004 at the Autónoma University of Madrid, musicologist Roger Alier – an expert on classical opera – stated that in Rossini’s operas (half a century after Mozart), some sections with a strong rhythm had been conceived under the direct influence of the sonority coming from the then recently-born industrial machinery. Alier particularly stressed the repercussion of steam engines and spinning machines in the lively, hustling and sustained beat of the Italian composer.
3
In the same first decades of the 19th century, the orchestra started an unstoppable growth process, enlarging its different sections, introducing new instruments and widening their acoustic range and timbre capabilities (Dolan, 2013). The piano also progressed in several crucial aspects, adding new bass and treble keys (to more than 80) and incorporating the double repetition action (Sebastian Erard in 1824), allowing a much faster repetition of a note. As Hanslick stated at the end of the romantic period:
In Mozart’s and Haydn’s day the piano was a weak, thin box with a soft tone, scarcely audible as far as the front room . . . The full tone and carrying power of the modern piano arise from its great size, its colossal weight and the tensions of its metal-strengthened frame . . . The instrument has gained this offensive power and offensive character for the first time in our day. (Cited in Schafer, 1977/1994, p. 109. See also Bijsterveld & Schulp, 2004)
Similarly, it appears reasonable to make a connection between the new devices of those years and the invention and quick spread of the metronome among classical composers and performers, who were attracted by the accuracy it provided (Brown, 2009, p. 42). The pianola’s popularity around the same time may be likewise understood. These instrumental and technological novelties could be due, at least partially, to the psychological need of assuming and reflecting a wider and more complex and mechanical daily soundscape (see Coates, 2005; Loughridge, 2013).
Some works by Beethoven also deserve attention from the point of view of an affinity with the acoustic upheaval that took place in Europe throughout his life. For instance, several excerpts from his symphonies or the fast tempi in his sonatas popularly known as Pathetique (no. 8 in C minor, Op. 13, 1798) and Moonlight (no. 14 in C-sharp minor, Op. 27, no. 2, 1801) were unthinkable only a few decades before, not just thanks to the undoubted talent of the German composer but also because besides his deafness he had to perceive clearly the new sonorous universe of Vienna. Many of these changes were generated by the industrial revolution and the motorized, mechanical and aggressive soundscape that it conveyed, thus pointing to the end of the gallant and basically rural Europe.
As can be appreciated in this excerpt from Beethoven’s Fifth Symphony (Op. 67, in C minor, 1804–1808, Figure 1), the low-pitched instruments play a noticeable leading role when compared with standard scores of previous generations, and the whole orchestra works occasionally as an unstoppable locomotive, crushing everything in its wake.

Beethoven, 5th Symphony, II Andante, bars 114–121.
In most of the symphonies by Anton Bruckner, composed in the second half of the 19th century, the Scherzi produce a stunning mass effect. Throughout these long tempi, the brass section (the most vigorous of the orchestra) imposes its power with breathtaking melodies, usually in minor or chromatic modes. There are many examples from these decades, including outstanding compositions by Wagner, Liszt or Mahler where formal complexity and the increasing presence of dissonance reinforce the impressive timbre and harmony effects described. These composers musically articulated the gigantism and pervasiveness of 19th century soundscapes, far from the beautiful progressions that Handel or Vivaldi carried out in the previous century with comparable initial talent, inspiration and temperament.
What happens within other Western musical traditions? A genre like minimalism – based on endless and consonant repetitions – also looks to be conditioned by its historical time, echoing somehow the regular and uninterrupted sonority of many modern household appliances. It shares several rooted elements with techno music and has enjoyed a great success amidst the avant garde and experimental attempts of the 20th century. Not far from minimalism Brian Eno, composer of the first official ‘ambient album’ with his 1978 release Music for Airports, described how in that record ‘we wanted to use music in a different way – as part of the ambience of our lives – and we wanted it to be continuous, a surrounding’ (1996, p. 293). 4 Eno reflected thereby how modern urban soundscapes tend to be a uniform and constant rumble that pervades every aspect of life, leading the composer to isomorphic musical productions.
Thompson has labelled the sonority of big American cities in the early 20th century as the machine age, alluding to the accelerated ‘technological crescendo of the modern city’ (2002, p. 2). In fact, many people shared the belief that New York was defined by its din and ‘all perceived that they lived in an era uniquely and unprecedentedly loud’ (p. 120). Jazz and R&B had a strong relation with those soundscapes, recognized by its inhabitants:
The connection between jazz and the sounds of the city was evident to virtually all who listened in. ‘With its cowbells, auto horns, calliopes, rattles, dinner gongs, kitchen utensils, cymbals, screams, crashes, clankings and monotonous rhythm,’ [Joel] Rogers remarked in 1925, jazz ‘bears all the marks of a nerve-strung, strident, mechanized civilization’ (Thompson, 2002, pp. 130–131).
Jazz and the city evolved simultaneously; when Miles Davis arrived in New York in 1944, the pace of streets life was the fastest thing he had ever known:
Miles Davis tells of an accelerated city, somewhat chaotic. Perhaps it is precisely that frantic pace of a big city which, at times, seems to reflect bebop. With its songs performed in a fast time never before heard in jazz, bebop is a quick, nervous and intense music, developed in a quick, nervous and equally intense city. (Tejada, 2013, p. 46)
Rock was (and still is) the musical translation of combustion engines, motorbikes, drills, hammer blows, cement mixers and the subway, with so many other potential analogies between this broad label and everyday life sounds. Among its countless social descendants is the recent beep generation (or MP3 generation), our expression to designate the many young people who seem to have been born with a mobile phone in one hand and a computer mouse in the other, alternating with the Play Station, Nintendo, Sega and other console labels and online games (Gillis, 2011). The digital and artificial sounds generated by these gadgets – so representative of the web era – must have some connection with the overwhelming success of techno music, and many interrelated genres, that is currently taking place worldwide, forming a global cyberculture and an outstanding way to construct and maintain an ‘audible identity’ (Labelle, 2010, p. 148; see also Campos, 2014; Gray, Mentor, & Olivares, 2013; Mulder, Ter Bogt, Raaijmakers, Nic Gabhainn, & Sikkema, 2007; Schäfer & Sedlmeier, 2009; Werner, 2009).
The most aggressive and deafening variants of heavy metal and punk are in many cases located within industrial districts dominated by heavy sounds, such as traffic and factories. In these soundscapes, the presence of powerful bass sounds is noticeable, the complete opposite of the aural world of Mozart. Regarding rap, another popular modern genre, Cohen has suggested that as children, many rappers might have been left by their parents for extended periods in front of the TV (or within earshot of a radio), watching animated cartoons and children’s programmes in which characters usually speak long and loudly (personal communication, March 2011; see also Bickford, 2012; Cohen, 2000; Coyne, 2010; Greene, 2012). Rap performances seem to share with those TV broadcasts several unambiguous elements: the centrality of the image of the announcer/singer who addresses the viewer/audience face on, part of the body mimic and gestures and the essentially convincing and persuasive tone of the prosodic phrasing. The affinity of rap and hip-hop with the pervasive language of advertising must also be stressed, for it involves and even accentuates the features described. This type of auditory milieu may have formed a substantial part of the everyday life of many children ‘addicted’ to television, and in previous generations to using the remote control (which can limit exposure to advertising) and expansion of computers and other devices of the digital era. 5
The cover art of some recordings conveys its musical effects. In Conscious mix (Figure 2) the naked human figure looks to be dominated by a symmetrical, electric and oppressive sound matrix, suggesting a total sensory invasion/submission. The music is a compilation blending techno and hip-hop.

Conscious Mix, a Los Angeles producer DSCO compilation in 2012.
Preindustrial civilizations show a very close connection between the natural sounds of their environment and life itself, within which music, dance and associated rituals become a direct expression of that surrounding context. They are sonocentric cultures that tend to construct an animistic aural cosmology around their soundscapes through myth, oral literature and legend, with frequent impersonation of animals in popular songs and related rituals. For these people, sound has real life and its effects are never produced at random; they understand human existence as both a part and a mirror of those sounds (e.g. Blacking, 1977; Feld, 1985; Keeling, 2012; Mâche, 1992; Rath, 2003; Sabom, 2007). In the words of a shaman of Tuva (central Asia), Mongush Kenin-Lopsan, ‘Our music all began from imitating the sounds of animals […] For Tuvans, the sounds of nature have been our school, our university’ (cited in Levin & Süzükei 2006, p. 125). Imitating the environment using musical instruments is also common among many primitive peoples of the world (e.g. Racy, 1994; Rawcliffe, 2002; Schneider, 1957; Velázquez, 2002). The soundscape frame privileges the position of the human voice, because the first musical instruments were ‘melodic imitators of the human voice’ (Storr, 2002, p. 94; see also the next section).
Other references could be added here, but so far it seems plausible that classical composers were considerably influenced by the sonic context of their lives, and that similar phenomena might take place in the territory of urban popular genres as well as in non-Western musical cultures, pointing therefore to the possible universality of a common underlying process.
Childhood and linguistic data
Further supporting evidence comes from the fields of childhood and language studies, especially regarding the prenatal phase and early infancy. Here, we will contemplate oral language as a critical soundscape for individuals from the beginning of their lives. Accordingly, language shall be considered primarily in its acoustic form, as a means of communication ‘so overwhelmingly oral’ (Ong, 1987, p. 16; see Wang, 1991).
6
Focusing on the interaction between customary sounds and language, Csepregi has coined the concept of phonetic empathy to describe the influence of spoken language as a present and active soundscape and a moulder of aesthetic and audio profiles, particularly in childhood (2004). Complementarily, Augoyard and Torgue state that a vocal mimesis is an essential ingredient in the cognitive process of development and socialization of the child, regardless of his/her mother tongue (2005, p. 63; see also Rosenberg, 2013). Although in the following case the reference is not human language, the instinctive imitation of Aude, an infant girl, is impressive as well as revealing of an innate sense of affinity and integration with the daily soundscape:
Aude, five months old, is in a cradle placed near a door that creaks every time the mother’s helper enters [. . .] that is to say, about a hundred times a day! In a recording, we find the striking presence of this highpitched sound in Aude’s tears and babbles – a perfect example of osmosis between reception and emission on the level of sensation. (Bustarret, 1982: 51)
This infant was able to perform these three important actions: processing the sound of the door and distinguishing its parameters accurately; retaining the sound in memory; and finally imitating it, for which she had to experience it convincing to her own ear. Earlier in life, prenatal experience is a crucial period in the formation of our musical mind:
Ontogenetically, the infant is surprisingly sensitive to patterns of sound and movement that adults perceive as musical [. . .] . The origin and evolutionary function (if any) of this sensitivity is unclear. One possibility is that musical patterns are similar to perceptual patterns to which the fetus is regularly exposed before birth: the fundamental frequency trajectory of the mother’s voice, its relationship to breathing, and the rhythm of her heartbeat and footsteps. (Parncutt, 2012, p. 219; see Abrams et al., 1998; Dowling, 1999)
Among the many postnatal psychological consequences of this phase, we believe that the dominant sound presence of the mother’s voice (female) in the womb could be the reason why humans tend to identify as leading melodies those which are high-pitched rather than low-pitched, the latter often instinctively recognized as an accompanying background (although there is no apparent acoustic reason for this). We might also relate the sonic boom of birth (after James’ celebrated ‘booming buzzing confusion’) to that critical experience, when the sudden activation of the auditory system and the entrance of many medium and treble frequencies not audible before – due to the abrupt departure of protective amniotic fluid – together with the cries of the mother and extreme biological stress of the moment, must be perennially recorded in the deep consciousness of the individual. Soon thereafter, motherese, the particular instinctive semi-language between mother and newborn child, will come into play and this represents another stage considered by many as highly influential for the musical brain (e.g. Nakata & Trehub, 2004). It is noteworthy that motherese is dominated again by the mother’s (female) voice, thus high-pitched. 7
Early developments may influence the individual’s lifelong musical tastes and interests (Levitin, 2006), and those sounds that could be conceptualized as concordant stimulus are preferred. This means that when a stimulant mimetic-mnemonic connection is activated, it might be due to a ‘concordance’ with an auditory memory hierarchy established in the past. In other words, sonic concordance is an emotional and aesthetic phenomenon that takes place when the auditory signal basically coincides with the positive sonic profiles assumed by the brain throughout life. 8
What happens in adulthood when all these existing neural networks enter the field of musical composition? Evidences point to a palpable influence of the prosody of spoken language in instrumental music and a strong affinity between linguistic rhythm and musical structure. In 1953, linguist Robert Hall suggested a resemblance between Elgar’s music and the intonation of British English (Hall, 1953). Other authors have reached similar conclusions; for instance, Abraham stated that ‘The nature of a people’s language inevitably affects the nature of its music not only in obvious and superficial ways but fundamentally’ (1974, p. 62). Garfias (1989) noted that in the Hungarian language, each word starts with a stressed syllable and Hungarian musical melodies typically start on strong beats (i.e., anacrusis, or upbeat, is rare). MacDonald (1996) has shown a relationship between Gaelic song rhythms and the Scottish piper’s pibroch. Temperley and Temperley concluded that ‘linguistic rhythm is an important factor in the shaping of musical rhythm’ (2011, p. 62). Moreover, the rhythms of French and English speech appear to be reflected in the music of French and English composers respectively (Patel & Daniele, 2003), spoken French having a significantly lower pitch interval variability than spoken English – a pattern also mirrored by music (Patel, Iversen, & Rosenberg, 2006). In a later work, Patel wrote:
Composers, like other members of their culture, internalize these patterns [of language] as part of learning to speak their native language. [. . .] We suggest that when composers write music, linguistic rhythms are ‘in their ears,’ and they can consciously or unconsciously draw on these patterns in weaving the sonic fabric of their music. (2007, p. 165)
It seems very probable that the mother tongue – so pervasive from the prenatal stage – might exert a crucial influence in the musical brain and creative/aesthetic preferences of the composer during his mature life, by building a bridge (affinity) between the material orality of that language and the morphological traits of the music composed. Furthermore, the physicality of vocal sound can possibly reach its own meaning in music regardless of semantic-linear value of that same language taken in isolation:
[M]any systems of syllables for transmitting melodic intervals are far from being ‘musically speaking, arbitrary’ [. . .] there is a close and highly regular connection between sonic aspects of the mnemonic syllables and of the corresponding musical phenomena. I propose the admittedly awkward term acoustic-iconic mnemonic systems. (Hughes, 2000: 93)
This rather controversial feature (not accepted by all specialists in the field of language and music) may explain vocal music repercussions. As stated by another of its defenders, ‘Phonetic patterns such as rhyme, assonance, and alliteration create pleasing sequences of sound that are aesthetically meaningful in their own ways’ (Salley, 2011, p. 409). Some celebrated musicians seem to have understood the same idea from their own experience, like singer-songwriter Bob Dylan: ‘The semantic meaning is all in the sounds of the words’ (2004, pp. 173–174; see footnote 7 for remarks about scat singing).
Sonic affinity
The theory this article proposes is that there is a shared common reason for the existence of the aforementioned data and approaches related to generative environmental processes and functional causality involved: the human musical brain is shaped by the action of surrounding sounds, generating aural profiles and aesthetic patterns in complex reflective processes. The ability to unconsciously internalize auditory signals is the result of an adaptive evolutionary mechanism, and develops into a sonic affinity (see Campos, 2012b) It is very likely that this process starts in the prenatal phase of development.
We suggest the term ‘affinity’ as more appropriate for the theory than ‘mimesis’ – which is a main concept in the present article – because the latter suggests too close an imitation of the environmental root, and within this perspective the presence and action of both learned legacy and personal talent/inspiration play a crucial role, developing into a musical response whose traits might show a wide separation from any kind of previous root. Instead, the affinity described appears to be a logical and necessary outcome of a flexible adaptation to the environment, combining reflective reactions to its sounds with human singularity and creativity. Therefore, any notion of determinism associated with this scheme should be discarded. According to Lamont and Greasley:
[D]ifferent individuals have unique, preferred levels of arousal which explain their global music preferences for style. Temperament differences predict differences in preferential listening behaviour, even at 8 months of age. (2012, p. 162)
The graphic shown in Figure 3 develops the central contents of the theory, in diachronic vertical levels and synchronous horizontal axes. From left to right are the natural and cultural starting frames, followed by human perception and derived processes (centre), finishing with the musical results (preferences plus production) and, therefore, the gestation of a new soundscape. From top to bottom: the upper axis (blue – dark) mainly concerns the audience stages – although they also take place in performers. The central axis (green – intermediate) refers to the brain processes and mechanisms that rule the system, and the bottom axis (purple – clear) is primarily that of the composer and production. 9

Graphic of sonic affinity.
One of the major points in this model is causality. Where does our propensity to isomorphism with the environment come from? What does sonic affinity improve within the struggle for survival? How do mnemonic mechanisms operate in these processes? In principle, at least three areas of response to these questions are proposed. Firstly, within the block of offensive and defensive actions, the most immediately visible utility is cynegetic mimesis, which is present in many early civilizations and the most conscious and deliberate resource of those considered here. The notion of ‘alert listening’ should also be taken into account (Barthes, 1991) as well as the specialized mechanisms to extract critical sounds for detecting prey, predators, mates or a combination of the three (Bregman, 2008). Levitin observes a crucial need to perceive any danger in sounds (2006, pp. 180–182; see also Rath, 2003; Rawcliffe, 2002). Furthermore, Mâche believes that the imitation of animal sounds by groups of hunters was crucial to the origin of myth and music: hunters would echo natural sounds not only for hunting but also, previously, as a propitiatory rite or ‘sympathetic magic’ (1992, p. 40) similar – we believe – to the social and religious functionality of the paintings at Altamira and Lascaux.
Secondly, there is the cardinal necessity of communication, which is self-explanatory: children instinctively learn the language and forms of behaviour of their elders, a task requiring immeasurable uptake and mimicry abilities. Particularly between the fifth and tenth months of age, the infant makes a substantial effort to learn, understand and imitate every gesture and sound of his/her parents. This is necessary to acquire one’s mother tongue, to access the stage of comprehension and for socialization; these developments have been depicted in an extensive literature. With respect to future musical preferences and behaviour, on the one hand, the mimicry involved is probably crucial in assuming phonetic prosody as an aural frame where the future musician is formed, and on the other, the achievement of social integration – of a mainly assimilative-mimetic base – should remain on the individual in terms of successful psycho-emotional reward.
Finally, the sense of balance and orientation depends on the vestibular system of the inner ear, which helps orientate the body in space, and provides a sense of movement and balance – so-called equilibrioception. The perception and control of spatial and temporal structure facilitates access to objects and relationships of the outside world.
10
On a deeper level, the human being seems to need a parallel sense of rootedness, of emotional orientation, as a being-in-the-world (Dasein in the expression by Heidegger, 1996 [1927]; see also Becker, 2003), which relies very much on the aural experience. Thus orientation would keep a close relation with the need for and consequential construction of identity. For this reason, music can play a prominent role in the processes of formation and grouping of youth identities, among other ramifications, as it allows the location and recognition of the self due to sound reciprocity: ‘through sound imitation the external world can be influenced and, to a certain degree, controlled’ (Both, 2006, p. 319; see also Campos, 2012a). And, very importantly, that surrounding world can be understood by the individual because of its sounds:
Like a landscape, a soundscape is simultaneously a physical environment and a way of perceiving that environment; it is both a world and a culture constructed to make sense of that world. (Thompson, 2002, p. 1. See also the concept of personal sound space in Fluegge, 2011)
For instance, Leadley (2011) has described the problems of disorientation in people deprived of their natural (real) soundscape; sonic disruption – as it might be called – caused them anxiety and a strong feeling of dislocation, the dissociation between person and milieu (see also Perea, 2009). The core point here is the perception of and integration with the soundscape as a means for physical and psychological orientation:
Soundscapes, no less than landscapes, are not just physical exteriors, spatially surrounding or apart from human activity. Soundscapes are perceived and interpreted by human actors who attend to them as a way of making their place in and through the world. (Feld, 2003, p. 226; see also Finchum-Sung, 2012)
These analytical perspectives occasionally arise in personal terms, reinforcing the notion of an emotional and dialogical sonic affinity:
[S]ound places me in the midst of a world [. . .] when we enter a new place, we immediately feel its atmosphere and make sense of it. Our body responds to the place in some way or another. We tend to adopt its rhythm and its tonality. [. . .] With sounds – as with ambiances – we do not experience the world from the outside, in front of us, but through it, in accordance with it, as part of it. The sensing subject is nothing but a resonant body that gets in tune and in sync with his environment. [. . .] In a way, I become part of what I sense; I tend to merge with the ambiance; my level of tension adjusts to the one of the world. My voice – my way of speaking – tends to sound akin to what I hear. (Thibaud, 2011a)
In summary, the sonic affinity might have different causal backgrounds, some of them overtly practical but others embedded in deep psychological processes. The union of all of them may ultimately underline the necessity of a reflective integration with the soundscape to better overcome all types of survival problems throughout the long millennia of our prehistory, and directly concerning the formation of musical thinking.
Conclusions
This work aims to bridge a gap between aural phenomenology and neuroscience; it is an attempt to bring scientific method closer to the ethnographic interpretation of musical experience. Accordingly, with respect to the variety of authors and perspectives seen above, it understands sound studies as an open and fertile field of multidisciplinary research (perhaps a ‘transdiscipline:’ Meelberg, Cobussen, Stewart, & Nieuwenhuis, 2013) with notable contributions stemming from systematic musicology, ethnomusicology, social anthropology, psychoacoustics, sound perception/cognition studies and neuroscience. Partially for this reason the combination of all the viewpoints and results shown might contain several complex aspects; particularly dense and problematic is the issue of the causality involved, a threefold connection of offensive/defensive purposes, communication, and physical and psychological orientation that could point to the reason for a sonic affinity within the logic of evolution and survival. The range of influence that each separate sound or a stable audiosphere exerts on our musical psychology also needs accurately observing, because obviously not all sonic experiences have the same repercussion on the human being.
The notion of sonic affinity proposed here goes beyond a mere aural empathy with the medium; it considers an innate and simultaneously acquired process that is highly flexible, capable of evolving throughout the lifespan of the person and partially subject to his/her will, but with a noticeable initial reflective and unconscious root embedded in the logic of adaptive evolution. In this sense, human individuality is decisive in the volitional development and shaping of an almost overwhelming variety of musical tastes; however, the environment plays an equally influential role in the same direction, as much due to the ‘cultural/learned’ effect as to influences based on earlier life stages, and deeper and unconscious evolutionary mechanisms. The soundscapes may not have made our music, but they helped to shape it.
To conclude, sonic affinity probably obeys a practical adaptive reaction of the human being to better survive in concrete environments by means of achieving the psychological formation of a personal identity and the integration with the surrounding world and capacity to fight against its potential threats. It is a real and operative resource of our species that witnesses the underlying relationship between musical thought and certain natural laws, namely the permanent and dialogic interaction between soundscapes and the human ear – soundscapes understood as complex cognitive systems and even including paralinguistic aspects of speech – because music is not an essence beyond time and place but a tangible phenomenon closely related to the most elementary biological codes of the human species. For the same reason the aural productions derived from the mechanism of sonic affinity – music among the most principled ones – did not initially belong to any ‘art’ category we could consider as such, but were absolutely necessary for the survival and development of human civilizations.
Footnotes
Acknowledgements
I am grateful to Professor Annabel J. Cohen and sound researcher Michael Cumberland for their valuable help in several areas of this article. The remarks of the two anonymous peer reviewers of the journal have been very important. I am also grateful to the Editor, Alexandra Lamont, for her orientation during the editing process.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
