Abstract
To date, behavioural procedures adopted to assess sound preferences in young children have evaluated the responses of participants while listening to the stimuli administered by the experimenter. Due to the difficulties which may arise in the interpretation of the results, recent studies have suggested some limitations to these procedures, stimulating the further development of behavioural methods. Here, we introduce a new method for testing sound preferences in children, in which participants actively produce the stimuli during the experimental session. The apparatus consists of a musical lever which emits different sounds depending on its rotation around a hinge. The device was programmed to emit consonant and dissonant harmonic intervals. The procedure has been tested with 22 participants from 19 to 40 months of age. Results show that: (a) sound emission strongly stimulates toy manipulation; (b) the examined participants distinguished the two types of sounds, showing a preference for producing consonant over dissonant stimuli. This method could be used to study a wide range of sound qualities in young listeners, such as rhythm or pitch. Grounded in the mutual interaction between perception and action, this procedure is in line with recent research highlighting the role of embodiment in the perception of music.
Over the last several decades, different experimental methods have been adopted to investigate consonance and dissonance perception in humans. Technologically advanced procedures based on functional magnetic resonance imaging and electroencephalography have shown that consonant and dissonant stimuli generate different reactions in the brain (Koelsch, Fritz, Schulze, Alsop, & Schlaug, 2005; Koelsch & Mulder, 2002; Minati et al., 2008; Park, Park, Kim, & Park, 2011; Perani et al., 2010). At the same time, behavioural paradigms have attempted to demonstrate the same distinction in infants and children by judging their reactions (Masataka, 2006; Plantinga & Trehub, 2014; Schellenberg & Trainor, 1996; Schellenberg & Trehub, 1996; Trainor & Heinmiller, 1998; Trainor, Tsang, & Cheung, 2002). The results are largely consistent with the claim that Western listeners prefer consonance (e.g., Bones, Hopkins, Krishnan, & Plack, 2014; Schellenberg & Trehub, 1996; Trainor & Heinmiller, 1998; Trainor et al., 2002). Studies have shown that the human preference for consonance has a biological basis (Bowling & Purves, 2015; Cousineau, McDermott, & Peretz, 2012; Perani et al., 2009; Virtala, Huotilainen, Partanen, Fellman, & Tervaniemi, 2013) and is common to other species, such as primates and birds (Brooks & Cook, 2009; Chiandetti & Vallortigara, 2011; Izumi, 2000; Sugimoto et al., 2010).
Looking-time and head-turn procedures are most commonly adopted in behavioural studies on consonance and dissonance perception with children of different ages. Originally, the head-turn method was developed to assess auditory perception in children under six (Dix & Hallpike, 1947) and three years of age (Suzuki & Ogiba, 1960), and it evaluates children’s stimuli discrimination by measuring their head-turn movements. In the last decades, it has been primarily used with infants older than six months (ideally 6–10 months) who can control head movements (Plantinga & Trainor, 2009; Schellenberg & Trainor, 1996; Schellenberg & Trehub, 1996; Trainor & Trehub, 1992). Recently, the looking-time method has been preferred to the head-turn method due to the former being more easily adapted to younger participants. Looking-time methods are a preferential procedure in which the experimenters measure the amount of time that participants look at a target while listening to different stimuli (Spelke, 1985). In consonance and dissonance studies, this method is generally used with infants younger than six months (Masataka, 2006; Plantinga & Trehub, 2014; Trainor & Heinmiller, 1998; Trainor et al., 2002; Zentner & Kagan, 1998). However, visual attention measures have also been adopted with older participants in domains other than music perception, such as language development studies. For example, Van Heugten and colleagues (2015) examined the developmental trajectory of toddlers’ (between 19 and 26 months old) comprehension of unfamiliar regional accents by using participants’ looking times as an index of listening.
In music perception studies, visual attention procedures are based on the hypothesis that the longer a baby looks, the more he/she likes (or is interested in) what he/she is listening to. Although this seems a reasonable criterion for studying visual information processing, its reliability for studies of auditory preference has often been questioned (Hunter & Ames, 1988; Trainor & Heinmiller, 1998). Recently published studies have clearly stated the limitations of this method for music perception studies: why should the baby look at a target longer while listening to a particular sound? “Looking, the principal response measure in laboratory contexts, is hardly a musical behaviour” (Trehub, 2012, p. 40). Moreover, “the most frustrating aspect of the preferential listening procedure is that the absence of differential listening – a common occurrence, unfortunately – is uninterpretable and may not imply discrimination failure. In many cases, highly contrastive musical patterns do not result in contrastive listening times” (Trehub, 2012, p. 39). This ambiguity has led to different interpretations of the same behaviour: in object continuity and solidity studies (e.g., Spelke & Van de Walle, 1993), longer looking time has been interpreted as a reaction to unfamiliar stimuli, while in consonance/dissonance studies (Trainor & Heinmiller, 1998; Trainor et al., 2002; Zentner & Kagan, 1998), visual ability and form perception studies (Fantz, 1961; Langlois, Roggman, & Rieser-Danner, 1990), and intermodal perception studies (Spelke, 1976; Spelke, Smith Born, & Chu, 1983), longer looking time has been interpreted as a reaction to familiar (or coherent with the real world) stimuli. For these reasons, it would be useful to devise new methods for testing aspects of music perception that are independent of visual attention.
In the present study, we introduce a new method to assess auditory preferences in young children based on a musical toy that emits different intervals depending on how the child manipulates it. Using an action-based measure rather than a looking-time or head-turn procedure addresses a gap in the literature concerning the perception of consonance and dissonance in one- to three-year-old children. Interaction with a toy should be more engaging than passive listening for participants who may be less interested in tasks that require sitting for a long time, as occurs in visual attention or head-turn methods. Therefore, an experimental procedure based on children’s motor activity should overcome some of the limitations noted for standard visual/gaze methods.
The new method is based on the rationale that children’s exploratory behaviour is driven by curiosity to acquire new information (Berlyne, 1950; Kashdan & Silvia, 2009; Reeve & Nix, 1997). In the process of acquiring new knowledge, novel stimuli usually motivate children to explore their environment (Taffoni et al., 2014). During exploration, children experience the effects of their actions on the environment, thereby learning “action–outcome contingencies” (Kenward, Folke, Holmberg, Johansson, & Gredebäck, 2009). The free exploration of a new object is therefore highly motivating for the child and should be intrinsically rewarded by the curiosity to explore it (Litman, 2005). Thus, a behavioural response arising from free exploration through motor action should be considered a more reliable measure than standard gaze/attention paradigms, particularly among toddlers, who are developing the manipulation skills that are crucial for object exploration. Moreover, in recent years, neural mechanisms for auditory-motor integration in music have been increasingly elucidated (Hickok, Buchsbaum, Humphries, & Muftuler, 2003), particularly the pathway linking the primary auditory cortex to the posterior parietal cortex and, from there, to the motor areas of the frontal cortex (see Zatorre, Chen, & Penhune, 2007, for a review). Recent investigations (e.g., Komeilipoor, Rodger, Craig, & Cesari, 2015; Leman, 2007; Maes, Leman, Palmer, & Wanderley, 2014; Maes, Van Dyck, Lesaffre, Leman, & Kroonenberg, 2014) have focused on the role of the human motor system and body movements in music perception, demonstrating the relationships between music-driven gestures and music perception (Leman & Maes, 2014; Maes, 2016). This evidence and these hypotheses suggest that a novel behavioural protocol that tests music perception should be based on auditory stimuli that are intrinsically related to participants’ motor activity.
In the present study, the new method has been tested with 22 children between 19 and 40 months of age using a consonance versus dissonance discrimination procedure.
Method
Participants
The study involved 22 typically developing children between 19 and 40 months of age (M = 30±6 months; number of females = 13, number of males = 9). Three more children had been involved in the study, but they were not included in the results because they did not interact with the musical toy. Children were recruited from the daycare centre at Campus Bio-Medico University. The parents of the children signed informed consent forms that described the aim of the experiment as well as the apparatus and procedure. The experimental protocol and informed consent form were approved by the Institutional Review Board of Campus Bio-Medico University (Prot. 07.13 TS ComEt CBM, 29/11/2013).
Apparatus
The musical toy was designed to produce acoustic stimuli according to its orientation. It has a simple handle that allows rotations of ±90° relative to the resting position (vertical orientation at 0°; see Figure 1).

Figure 1. Musical toy: The mechanical external structure.
When children grasp the handle, they can rotate it around the hinge at its base. When children release the handle, a spring brings it back to the resting position. Rotations beyond −40° produce dissonant sounds, and rotations beyond +40° produce consonant sounds. In the interval between −40° and +40°, the device is silent. Possible movements are restricted to rotations in the vertical plane around the hinge. At the bottom of the handle, a cubic structure houses the electronic core (see Figure 2).
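The mapping from handle angle to sound class described above can be sketched as follows. This is a minimal illustration, not the device firmware: the function name, the region labels, and the treatment of exactly ±40° as silent are assumptions made for the example.

```python
def classify_region(angle_deg: float, threshold_deg: float = 40.0) -> str:
    """Map a handle angle (degrees from the vertical rest position) to a region.

    Positive angles beyond +threshold are treated as consonant and negative
    angles beyond -threshold as dissonant, following the description in the
    text. Treating exactly +/-40 degrees as silent is an assumption.
    """
    if angle_deg > threshold_deg:
        return "consonant"
    if angle_deg < -threshold_deg:
        return "dissonant"
    return "silent"


if __name__ == "__main__":
    for angle in (-75.0, -40.0, 0.0, 41.5, 90.0):
        print(angle, classify_region(angle))
```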

Figure 2. Musical toy electronic sensing core (left) and its functional diagram (right).
A magneto-inertial sensor, composed of a six-axis unit with accelerometers and gyroscopes (MPU-6050, InvenSense, Inc., San Jose, California, USA), measures linear accelerations and angular velocities, which are used to determine angular displacements with respect to gravity. A central logic unit, which is composed of an eight-bit microcontroller (PIC18F46J50, Microchip Technology, Inc., Chandler, Arizona, USA), acquires data from the sensor, calculates angular displacements, commands the audio module and sends data to a remote laptop through Bluetooth communication. An audio module (SOMO14d, 4D Systems Pty Ltd., Minchinbury, Australia) is used to generate the acoustic stimuli, and a Bluetooth interface (Parani-ESD200, Sena Technologies, Inc., Seoul, South Korea) is used for data transmission. The power supply is composed of a 500 mAh rechargeable lithium polymer cell, with an integrated protection circuit board to prevent short-circuits (GM652535-PCB). The technological components of the device are suitable for empirical research with children because they were previously used to assess spatial cognition in infants (Campolo et al., 2012; Campolo, Taffoni, Formica, Schiavone, Keller, & Guglielmelli, 2011).
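The text does not specify how accelerations and angular velocities are combined into an angular displacement. One common approach, shown here purely as an assumed illustration, is a complementary filter that blends the integrated gyroscope rate with the tilt implied by the gravity components. The axis convention (`a_along` along the handle, `a_across` perpendicular to it in the rotation plane), the function names, and the blending weight `alpha` are all hypothetical.

```python
import math


def accel_tilt_deg(a_across: float, a_along: float) -> float:
    """Tilt from vertical (degrees) implied by two gravity components.

    Assumed convention: with the handle at rest (vertical), gravity is read
    entirely on the along-handle axis, giving a tilt of 0 degrees.
    """
    return math.degrees(math.atan2(a_across, a_along))


def complementary_filter(prev_angle_deg: float,
                         gyro_rate_dps: float,
                         accel_angle_deg: float,
                         dt_s: float,
                         alpha: float = 0.98) -> float:
    """Blend the gyro-integrated angle with the accelerometer tilt estimate."""
    gyro_angle = prev_angle_deg + gyro_rate_dps * dt_s
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle_deg


if __name__ == "__main__":
    angle = 0.0
    # Hypothetical samples: (gyro rate in deg/s, a_across in g, a_along in g)
    samples = [(50.0, 0.10, 0.99), (50.0, 0.20, 0.98), (0.0, 0.26, 0.97)]
    for rate, ax, ay in samples:
        angle = complementary_filter(angle, rate, accel_tilt_deg(ax, ay), dt_s=0.01)
        print(round(angle, 3))
```

The gyroscope term tracks fast movements, while the accelerometer term slowly corrects drift; other fusion schemes would serve the same purpose.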
The device measures the time spent in the three different spatial regions. This information is sent to a remote laptop, where it is stored for subsequent analysis. Every 250 ms, the software checks the handle position: (a) if there is no change in orientation, then a different interval of the same class (i.e., consonant or dissonant) is played (or the same stimulus continues if it has not yet finished); and (b) if there is a change in orientation, the stimulus corresponding to the new position is played. The device has been coloured to appeal more to children.
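The 250 ms check and the per-region timing can be sketched as below. This is an assumed reconstruction for illustration only: the action names, the decision to stop playback on entering the silent region, and the convention that each sample stands for one full polling period are not stated in the text.

```python
POLL_INTERVAL_S = 0.25  # the text states that the position is checked every 250 ms


def tick_action(prev_region: str, region: str, stimulus_finished: bool) -> str:
    """Decide what to do at one polling tick.

    Regions are "consonant", "dissonant" or "silent" (hypothetical labels).
    """
    if region == "silent":
        return "stop"  # assumption: no sound inside the silent interval
    if region != prev_region:
        return "start_new_class"  # case (b): orientation changed
    # case (a): same orientation; let the current stimulus finish first
    return "next_in_class" if stimulus_finished else "continue"


def accumulate(regions):
    """Total time (seconds) per region, one polling period per sample."""
    totals = {"consonant": 0.0, "dissonant": 0.0, "silent": 0.0}
    for region in regions:
        totals[region] += POLL_INTERVAL_S
    return totals


if __name__ == "__main__":
    log = ["silent", "consonant", "consonant", "dissonant"]
    print(accumulate(log))
```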
Stimuli
In all experimental conditions, the audio stimuli used were the same as those used in Trainor and Heinmiller (1998), and they consisted of harmonic intervals. The consonant intervals are two fifths, A3–E4 and C4–G4, and two octaves, C4–C5 and E4–E5; the dissonant intervals are two tritones, Bb3–E4 and F4–B4, and two minor ninths, A3–B4 and E4–F5. Consonant and dissonant intervals were originally chosen for several reasons: they are similar in size but differ in quality; both the consonant and dissonant sets contain four different notes; and their range is identical (19 semitones). Thus, a preference for size, amplitude, and variety of tones is disentangled from consonance and dissonance (Trainor & Heinmiller, 1998).
Finally, in all musical treatises from the medieval period to the present, fifths and octaves are the only intervals that are always considered to be consonances, whereas tritones and minor ninths are consistently treated as dissonances. Moreover, the octave and the perfect fifth are reserved as prominent intervals in most scale systems that have been developed throughout recorded musical history worldwide (Harwood, 1976; Thompson, 2014).
Tones were played using the piano timbre in the Virtual Studio Technology (VST) tool “Kontakt” (version 4, Native Instruments, Berlin, Germany). The rhythmic pattern of the tones is depicted in Figure 3.

Figure 3. Rhythmic pattern.
All intervals were played at 120 bpm (each quarter note was equal to 500 ms in duration). A quasi-random procedure avoids consecutive repetitions of the same interval: when the device is in the consonant (or dissonant) position, the software checks the last interval played and returns a different interval of the same type.
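The tempo arithmetic and the quasi-random selection can be sketched as follows. The interval labels are copied from the Stimuli section as written; the function names, the dictionary layout, and the use of a uniform random choice among the remaining intervals are assumptions for illustration.

```python
import random

# Interval labels as listed in the Stimuli section.
INTERVALS = {
    "consonant": ["A3-E4", "C4-G4", "C4-C5", "E4-E5"],
    "dissonant": ["Bb3-E4", "F4-B4", "A3-B4", "E4-F5"],
}


def quarter_note_ms(bpm: float) -> float:
    """Duration of one quarter note in milliseconds (60 000 ms per minute)."""
    return 60_000 / bpm


def next_interval(sound_class: str, last_played, rng=random) -> str:
    """Pick an interval of the given class that differs from the last one played."""
    candidates = [i for i in INTERVALS[sound_class] if i != last_played]
    return rng.choice(candidates)


if __name__ == "__main__":
    print(quarter_note_ms(120))
    last = None
    for _ in range(5):
        last = next_interval("consonant", last)
        print(last)
```

Excluding only the most recent interval guarantees no immediate repetition while still allowing every interval of the class to recur later in the session.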
Procedure
The participants were individually tested in a quiet room at the daycare centre at University Campus Bio-Medico (Rome, Italy). The study involved a single experimental session in which the participant freely explored the toy, producing sounds as he/she liked. The experimental session was administered by two experimenters: one interacted with the child, and one operated the remote laptop. The experimental session was video-recorded. To ensure that the child was adequately familiar with them, the two experimenters interacted with him/her before the experimental session.
At the beginning of each session, the researchers invited the child to play with a new toy and showed him/her how it worked: one of the experimenters moved the handle and produced both consonant and dissonant sounds, according to the orientation. To ensure the free exploration of the participant, no verbal instruction was given. Finally, the toy was left to the child, who was free to play with it. The child could easily adjust both the orientation and the position of the toy (see Figure 4).
Each session lasted seven minutes and consisted of three phases (see Figure 5).
Phase 1: Three-minute duration. During this phase, the device was in the playing mode, and it emitted the experimental stimuli according to its rotation. The children had to understand that moving the lever produced a sound. Because Phase 1 was the first contact that the children had with the toy and sounds, we expected no significant differences in the average duration between the classes of sounds (i.e., consonant and dissonant) produced by the children. Therefore, Phase 1 served as a baseline from which to evaluate the discrimination and preferences of the children in the following phases.
Phase 2: Two-minute duration. During this phase, the toy was muted. However, data from the device were still recorded. In this phase, the children experienced the toy without sounds. The primary goal was to check whether sound emission actually influenced the children’s manipulation of the toy. We expected to observe a significant decrease of interest in the musical toy.
Phase 3: Two-minute duration. In the last phase, the device was switched to playing mode again. The aim of this phase was to verify whether the children distinguished between the two types of stimuli and whether there was a prevalence of consonant or dissonant sounds (i.e., by comparing the durations of the two classes of sound produced by the children).
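The three-phase schedule can be expressed as a small lookup, sketched here under assumed names. The durations and the muted middle phase come from the description above; the data layout, the function name, and the half-open phase boundaries are illustrative choices.

```python
# (name, duration in seconds, sound enabled), following the phase descriptions
PHASES = [
    ("Phase 1", 180, True),
    ("Phase 2", 120, False),  # toy muted, data still recorded
    ("Phase 3", 120, True),
]


def phase_at(elapsed_s: float):
    """Return (phase name, sound enabled) for a time since session start.

    Returns None once the session is over. Each phase covers the half-open
    interval [start, start + duration), which is an assumption.
    """
    start = 0
    for name, duration_s, sound_on in PHASES:
        if elapsed_s < start + duration_s:
            return name, sound_on
        start += duration_s
    return None


if __name__ == "__main__":
    total_s = sum(duration for _, duration, _ in PHASES)
    print(total_s / 60, "minutes in total")
    for t in (0, 179, 180, 300, 419, 420):
        print(t, phase_at(t))
```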

Figure 4. A child playing with the toy.

Figure 5. The different procedure phases and durations.
Results
Total consonance and dissonance times were recorded in each phase for each participant. The times were then normalized over the total time of each phase (total time = consonance time + dissonance time + silence time). Thus, the normalized duration could vary between 0 and 1. Data were then averaged based on the total number of participants. The mean consonant and dissonant durations were, respectively, 0.32 and 0.29 in Phase 1, 0.18 and 0.25 in Phase 2, and 0.36 and 0.25 in Phase 3.
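The normalization described above can be sketched as follows. The function name and the input values are hypothetical and are not data from the study; only the formula (each duration divided by the sum of consonance, dissonance, and silence time) follows the text.

```python
def normalize_durations(consonant_s: float, dissonant_s: float, silent_s: float):
    """Return (consonant, dissonant) durations as fractions of the phase total."""
    total_s = consonant_s + dissonant_s + silent_s
    if total_s <= 0:
        raise ValueError("total phase time must be positive")
    return consonant_s / total_s, dissonant_s / total_s


if __name__ == "__main__":
    # Hypothetical two-minute phase: 45 s consonant, 30 s dissonant, 45 s silent
    cons, diss = normalize_durations(45.0, 30.0, 45.0)
    print(cons, diss)    # fractions of the phase spent in each sounding region
    print(cons + diss)   # normalized manipulation time (consonance + dissonance)
```

Because silence time is part of the denominator, the two fractions sum to less than 1 whenever the handle spends time in the silent interval.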
A multifactorial ANOVA showed no main effect of sex (F(3, 18) = .465, p = .710) or age (F(6, 34) = .602, p = .727) on stimulus durations and no interaction between sex and age (F(6, 28) = .261, p = .950). To verify that sound emission influenced the children’s manipulation of the toy, we compared the mean manipulation time (consonance time + dissonance time) in Phase 2, when the device was muted, with the manipulation time in Phase 1 and Phase 3, when the device was in the playing mode. Figure 6 shows the mean manipulation time for each phase.

Figure 6. Mean manipulation time for each phase.
A one-way ANOVA showed that the manipulation time significantly varied across the phases (F(2, 63) = 9.58, p = .001,
We then investigated the effect of sound on the children’s use of the toy across the three phases using repeated-measures ANOVA, with phase (three levels) and type of sound (consonant, dissonant, two levels) as the within-subjects factors and consonant and dissonant stimuli durations as the dependent variables. We found a significant effect of phase (F(2, 20) = 9.39, p = .001,

Figure 7. Consonance and dissonance durations across the three phases.
We also recorded the durations of the longest consonant sequence and the longest dissonant sequence emitted by each child in Phase 3. A paired t-test showed that in Phase 3, the consonance sequences were, on average, significantly longer than the dissonance sequences (p = .01). These findings consistently support the prevalence of consonance in Phase 3.
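The paired t statistic used for comparisons of this kind can be computed with the standard library alone, as sketched below. The sequence lengths in the example are hypothetical and are not data from the study; the function name is illustrative. Obtaining the p value requires the t distribution, for example through `scipy.stats.ttest_rel`, which is not used here.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1


if __name__ == "__main__":
    # Hypothetical longest-sequence durations (seconds) for four children
    longest_consonant = [5.0, 7.0, 9.0, 6.0]
    longest_dissonant = [3.0, 4.0, 8.0, 5.0]
    t_value, df = paired_t(longest_consonant, longest_dissonant)
    print(round(t_value, 3), df)
```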
We also performed a video analysis (two participants were excluded from the analysis because their videos were missing due to technical problems). This additional analysis aimed to (a) identify any side bias in the manipulation of the toy, (b) assess the children’s hand preference, and (c) exclude the effects of boredom across the procedure.
The video analysis was performed by a blinded coder who was unaware of the aim of the study. For each participant, the coder counted (a) the number of movements toward the left and right sides, (b) the number of times that the child grasped the toy with his/her right and left hands, and (c) the number of movements toward the consonant and dissonant sides.
Paired t-tests revealed no significant difference in the number of left or right movements (from the viewpoint of the child) in Phase 1 or Phase 3 (p = .80 and p = .39, respectively). The absence of any boredom effect (previously excluded by the one-way ANOVA for manipulation time) was confirmed by a paired t-test showing that there was no significant difference in the number of average total movements in Phase 1 and Phase 3 (p = .34).
The paired t-tests also revealed no significant difference between the movements that produced consonance and dissonance in Phase 1 (p = .77) and Phase 3 (p = .94) (see Figure 8).

Figure 8. Average number of movements that produced consonance and dissonance in Phase 1 and Phase 3.
Finally, we found that, on average, children grasped the toy with their right hands approximately twice as often as with their left hands (p = .0009).
Conclusion
This study aimed to test a novel behavioural research method for auditory perception studies. In the present article, we focused on music perception. To date, the music perception literature has primarily focused on newborns, infants, and children older than four years of age. This new method was tested with toddlers ranging in age from 19 to 40 months, thereby increasing the amount of music perception research involving participants who may be less interested in tasks that require sitting for a long time, as is required for visual attention methods.
In the present study, the apparatus was programmed to emit consonant and dissonant stimuli. The results showed that the participants actively interacted with the toy in the sounding mode during the experimental session, that their performances varied across the phases and that they were influenced by the type of sound (i.e., consonant or dissonant). The video analysis data helped differentiate the preference for consonance or dissonance from the preference toward the left or right sides, thereby confirming that when it is considered separately from the type of sound, side or hand preference is not a relevant factor in this procedure. As shown in Figure 8, the prevalence of consonance in Phase 3 did not correspond to the prevalence of movements toward the consonant side; instead, it depended on the longer average time that the child kept the handle in the consonant position. Therefore, the results can be viewed as being consistent with the preference for consonance that has been widely reported in the literature on infants and children (Masataka, 2006; Schellenberg & Trehub, 1996; Trainor & Heinmiller, 1998; Trainor et al., 2002; Zentner & Kagan, 1998). Nevertheless, our results do not align with other findings, such as those of Plantinga and Trehub (2014). Plantinga and Trehub found that the looking behaviour of six-month-old infants was influenced more by the familiarity of the stimulus, regardless of whether the stimulus was consonant or dissonant. However, the methods used are very different: while in traditional looking-time or head-turn procedures children experience the acoustic stimuli before the experimental session, the present procedure does not involve any familiarization phase before the experimental phase. Therefore, both consonant and dissonant stimuli used in this study are unfamiliar or novel when participants interact with the toy.
In addition, the participants’ behaviours were evaluated based on comparisons between the mean durations of the consonant and dissonant stimuli produced by the child. In contrast, in the looking-time/head-turn procedure, stimulus duration depended on the experimenter’s interpretation. Finally, the duration of the experimental session (fixed by the software) ensures the same experimental exposure for each participant. For these reasons, the present method could help address the limitations of the existing paradigms.
The video analysis provided further fundamental information concerning the reliability of the method to test acoustic preferences. In particular, the analysis showed that the preference for type of sound can be separated from the preference of movement direction (i.e., toward left or right) and that auditory feedback seems to be the only factor that drives interaction with the toy. Consistent with the literature on hand preference and manipulation (Rat-Fischer, O’Regan, & Fagard, 2013; Sharoun & Bryden, 2014, for a review), we found that in this study, the children, on average, preferred to interact with the toy using their right hands. This finding may be attributed to the fact that playing with the toy requires some degree of coordination and strength; the literature has shown that right-handed individuals exert greater force with their preferred hand, whereas left-handed individuals do not show such a difference (Daniels & Backman, 1993). This result also suggests that hand preference does not influence children’s movements toward the left or right, thus ruling out hand preference as an explanation for sound preference.
In conclusion, the results of our study are largely consistent with the previous behavioural literature on consonance and dissonance perception, based on visual attention methods, which suggests that the present task can be considered a valid alternative measure of perceptual discrimination and preference in infancy. However, the fact that infants attend differently to (e.g., listen longer to) one stimulus or the other indicates only discrimination. What exactly motivates the preference is still unclear, and is a question that is not addressed by the results of the present study. In fact, different reasons may drive children to produce more consonant stimuli than dissonant stimuli, including aesthetic preference, novelty/familiarity, interest and curiosity. To address these issues, future experiments should focus on other stimuli to investigate different aspects of consonance (e.g., intervals, triads, melodies). Moreover, retesting the same sample with the same stimuli after a period of time should help differentiate true preference from other aspects of perception (e.g., familiarity/novelty/memory).
Furthermore, because the musical toy can be used to assess a variety of sound preferences in young listeners, it would be worthwhile not only to shift the investigation to other musical parameters (e.g., rhythm or pitch) but also to provide more in-depth analysis of motor performance through motion sensors and video analysis, thereby providing an “embodied” framework for music perception research (Leman & Maes, 2014). Moreover, an experimental system based on the simultaneous use of two devices could promote research on the interaction between two children playing with the same toy (e.g., Addessi & Pachet, 2005). In a related vein, music therapy research has investigated the effect of music playing on communication in children with autism (e.g., Gattino, Dos Santos Riesgo, Longo, Loguercio Leite, & Schüler Faccini, 2011).
More generally, the new method helps reveal preferences for one auditory pattern over another. Therefore, there are many potential applications for this research outside music. For example, in language development studies, the method can be used to assess the preference for known over unknown words, native dialect or accent over non-native (e.g., Van Heugten, Krieger, & Johnson, 2015), and grammatical over non-grammatical utterances or to investigate children’s reactions to infant-directed speech and singing. The toy could also be used to ascertain the sensitivity of special populations to various auditory distinctions. For example, deaf children with cochlear implants (often received at approximately 12 months of age) show a different pattern of language development (e.g., Miyamoto, Houston, Kirk, Perdew, & Svirsky, 2003). In this case, the toy could reveal discrimination between real versus nonsense speech. Finally, it could also be adopted to study the gender-correlated differentiation emerging in language development.
Acknowledgements
We are very grateful to Matteo D’Aria (ST-Microelectronics) for his precious help in developing the software for the device and to Lorenzo Rampa for helping in video coding. We are particularly grateful to Judy Plantinga, Frank Russo, and Sandra Trehub who read the manuscript and provided valuable comments. We thank the director, the coordinator, and the personnel of the UCBM daycare centre where children were recruited and the parents of the children who participated in the study.
Funding
This research was funded in part by the grant program “Embodiment” from Università Campus Bio-Medico di Roma.
