Abstract
Infant vocal production has been studied mainly from the perspective of language development. We studied it from the perspective of singing development by analyzing a 15-month-old’s imitations of songs. The infant wore a recording device that yielded a continuous, 16-hr audio recording of all the sounds produced by him and around him throughout the day. We listened to the audio file and identified instances in which his unprompted vocalizations resembled songs he had heard earlier. One imitation was recognized by his father, who then sang the song himself and engaged in imitative turn taking with the infant; the other imitation went unnoticed by his parents. Perceptual and acoustic analyses of the imitations and the song models showed that the infant imitated critical music features of the songs, including pitches, intervals, and rhythms. We discuss the use of new technologies for the study of singing development in infancy; such technologies facilitate the collection of spontaneous vocalizations that may go unnoticed by parents and make it possible to trace connections between music environment opportunities and specific singing outcomes in infants.
Infants’ vocal behavior undergoes remarkable changes during the first 2 years of life. Developmental processes and early experiences shape a trajectory in which infants’ sounds become increasingly similar to vocal sounds produced around them. Improvements in cognitive and perception abilities, more complex social interactions and communicative exchanges, and anatomical changes and greater motor control of the vocal apparatus all provide infants with the adequate framework for responding to and producing meaningful patterns of vocalizations (Goldstein & Schwade, 2008; Kent & Murray, 1982; Kuhl, 2004; Vihman & Boysson-Bardies, 1994). The linguistic environment that surrounds infants also contributes to the development of vocalizations: Even before they produce their first words at approximately 12 months (Bates et al., 1994), the acoustic and phonetic characteristics of infant sounds already reflect the influence of a specific language model (Boysson-Bardies et al., 1989; Boysson-Bardies & Vihman, 1991).
Imitation is one process by which infants master new speech sounds and learn to engage in communicative vocal exchanges (Kuhl & Meltzoff, 1982, 1996; Oller, 2000; Papoušek & Papoušek, 1989). For example, the imitation of speech sounds, such as vowels (Kuhl & Meltzoff, 1996; Legerstee, 1990), helps refine the articulatory movements necessary for speech production (Kuhl et al., 2008; Kuhl & Meltzoff, 1996); in turn-taking interactions, imitation of intonation patterns and reciprocal vocal matching by the communication partner provide auditory feedback and reinforce the communicative usage of sounds (Gratier & Devouche, 2011; Papoušek & Papoušek, 1989). Through such imitation and repetition in social interactions and private vocal play, infants learn to communicate through sounds very early in life.
There is limited research on infants’ vocal development from the perspective of music. However, it is not unreasonable to expect that infants would develop the necessary production skills to vocalize music sounds during the first years of life just as they do for speech. After all, many of the speech production skills that infants acquire during the first year, such as regulation of fundamental frequency (Kent & Murray, 1982) and control of duration (Lynch et al., 1995), also are related to the skills necessary to produce sequences of sounds imbued with the temporal and pitch organization of music. One also might expect infants’ music vocalizations to be supported by their music environment just as their babbling is supported by their language environment (Vihman & Boysson-Bardies, 1994). In fact, there is an abundance of research suggesting that infants are immersed in and attentive to music (Addessi, 2009; Costa-Giomi & Ilari, 2014; Costa-Giomi & Sun, 2016; Custodero & Johnson-Green, 2003; Ilari, 2005; Koops, 2014; Nakata & Trehub, 2004; Trehub et al., 1993, 1997; Young, 2008) and that they perceive and process music with much sophistication (for a review, see Trainor & Hannon, 2013).
It is clear that infants produce sounds that can be described in terms of musical characteristics (e.g., Davidson, 1985; Dowling, 1984; Kelley & Sutton-Smith, 1987; McKernon, 1979; Moog, 1968/1976b, 1976a; Papoušek & Papoušek, 1981; Reigado et al., 2011; Reigado & Rodrigues, 2017; Stadler Elmer, 2011; Tafuri, 2008; Tafuri & Villa, 2002) and that the musical content of infant vocalizations in the context of infant−caregiver vocal exchanges is valuable for conveying emotion and structuring communicative interactions (Costa-Giomi & Benetti, 2017; Malloch, 1999; Papoušek & Papoušek, 1981; Powers & Trevarthen, 2009). But in order to study the development of music vocalization in infancy systematically, it is necessary to chart how the behavior changes over time. Developmental investigations of infant music vocalization and singing are scarce, however, and research on singing acquisition has focused almost exclusively on children older than 2 (e.g., Flowers & Dunne-Sousa, 1990; Rutkowski, 1997; Welch, 1994, 2016). Furthermore, reports of infants’ singing are somewhat inconsistent across studies. An example of conflicting findings concerns the emergence of songs and imitation of tunes. McKernon (1979) found that the singing of four children from the ages of 12 to 18 months consisted entirely of glissandos with no discrete pitches; it was only after 18 months of age that children produced melodic patterns with clearly distinguishable pitches. The imitation of songs emerged after the age of 2 years; even then, children reproduced only the melodic contour and not the rhythm of familiar songs. Papoušek and Papoušek (1981) described a similar sequence of behaviors but displayed much earlier by their own daughter: At 10 months, she imitated sequences of tones in short melodies; at 11.5 months, she hummed melodies of nursery rhymes; and at 15 months, she sang conventional songs in her private vocal play.
On the other hand, Moog (1976a, 1976b) asserted that children’s initial focus when imitating songs was not on pitch at all; rather, children first imitated the words of songs, then the rhythm, and finally, the pitch. According to his cross-sectional observations of nearly 500 children, only a third of 1- to 2-year-olds could vocalize patterns that resembled songs that were sung to them. However, other studies show that 3- to 6-month-old infants can be successfully conditioned to match isolated pitches sung or played to them (Kessen et al., 1979) and that after hearing songs, 9- to 11-month-olds compress the frequency range of their vocalizations to match the range of songs (Reigado et al., 2011), suggesting that before the age of 1, infants already attend to and imitate musical pitch. Overall, the contradictory findings across studies indicate the need for additional research on singing behavior in the first years of life.
The conclusions of studies on infant music vocalization have been based on data collected in a variety of settings and contexts, including vocalizations elicited in the lab following training sessions (e.g., Kessen et al., 1979); recorded in a day care after a period of controlled exposure to stimuli (e.g., Reigado et al., 2011); recorded during home visits by the researcher while the children played and performed tasks, such as learning a song (e.g., Davidson, 1985); and sampled at home with a portable tape recorder when the researcher identified a behavior of interest (e.g., Dowling, 1984; Papoušek & Papoušek, 1981). Each method favored particular vocalization contexts while overlooking others. Procedures involving prompting or structuring by an adult did not capture infants’ spontaneous vocal behavior in the absence of those particular conditions. Procedures involving infant−adult interactions, such as teaching a song, did not capture infants’ vocal behavior in nonsocial contexts. More naturalistic approaches, such as recording the infant at home during routine activities, also involved selection processes: Perhaps a vocalization was selected because it was salient to the researcher for some reason and the recorder was nearby, or perhaps the researcher selected a specific moment or activity during the day because that situation was judged to be of interest to vocal behavior. Home recordings were also constrained by the technology available at the time that made it difficult to record for extended periods of time or to capture infants’ private behavior produced during solitary vocal play. But infant vocal behavior is not the same in all contexts. 
Spontaneous songs, for instance, seem to have different characteristics than singing that is elicited by researchers (Gudmundsdottir & Trehub, 2017; McKernon, 1979; Moog, 1976a), and infants appear to vocalize differently depending on how others interact with them; for example, they tend to gravitate toward vocalizing sounds that resemble speech in the presence of coordinated verbal feedback from adults (e.g., Bloom et al., 1987; Warlaumont et al., 2014). Additionally, as noted by Dowling (1984), it is possible that children change their behavior in the presence of a recording machine:
During the earlier years, before the children became used to the tape-recorder, the songs were sometimes hard to capture. Spontaneous songs are elusive in the sense that they occur at odd moments and the children are easily distracted from singing them. (p. 148)
In the present study, a 15-month-old male infant, James, wore a small, portable device that captured all sounds produced by him and around him throughout an entire day. This technology, unavailable at the time of many of the previous studies, yielded a 16-hr, continuous audio recording of the infant’s vocalizations as they occurred naturally in a variety of contexts, both in the presence of others (e.g., parent−infant dyad, interactions with family members) and in private (e.g., playing alone in his crib). This method of data collection allowed us to explore connections between the infant’s own vocalizations and the sounds of his environment. By documenting all the sounds that he was exposed to and that he produced, it was possible to identify instances in which the infant imitated songs he had heard earlier that day. We specifically focused our analyses on two such imitations. These imitations were contrasting in terms of their context and content and provided us with valuable examples of James’s imitative music behavior: One was embedded in a music interaction between James and his father and based on a song he had heard his mother sing earlier; the other one, based on a song played by a toy, passed unnoticed by his parents. By restricting our analyses to these two distinct imitations, we were able to describe in much detail the characteristics of the infant’s vocalizations as well as the characteristics of the song models he heard.
Method
Participant
James, a 15-month-old male infant, lived with his mother and father, 5-year-old brother, and 3-year-old sister in a middle-class urban neighborhood of a large city in the southwest region of the United States. The mother worked as a homemaker and the father as a software engineer. Both parents held graduate degrees and had studied music. The father had participated in band between 4th and 12th grades and had played saxophone and clarinet in a garage band while in high school. The mother had taken piano lessons as a child for 10 years. All three children participated in music classes for young children: James and the sister attended a Suzuki music class for children ages 0 to 3 and their parents, and the brother participated in a string program. It was through these classes that the family was recruited to participate in the study. We did not select James and his family for a particular motive; although it was clear that the parents were interested in having their children learn music, we had no reason to believe that James’s behavior would be in any way special. For recruitment purposes, the parents were told that the study focused on infants’ home sound environment. They were aware that the main areas of interest were music and language, but they were not told that the researchers were focusing specifically on infant singing. The parents provided consent to participate following institutional review board guidelines.
Data Collection
The infant wore a small and light recording device throughout an entire day. The recording device, known as Digital Language Processor (DLP), is part of a system developed by LENA (Language Environment Analysis; www.lena.org) for studying the home language environment of infants and young children. The DLP captures sounds produced within hearing range of the infant, including the infant’s own vocal production, for up to 16 continuous hours. The audio data are analyzed by the LENA software, which provides estimates of language indicators, such as total number of adult words, total number of conversational turns, child vocalization frequency, and total time of ambient sounds (e.g., sounds from electronic devices, television, and toys). Estimates of music data, such as ambient music, singing by adults, or infant vocal production of music, are not provided; it is necessary to listen to the audio file to identify and study such music indicators. Although originally developed for language research, the use of LENA technology is also valuable for gaining insight into infants’ and young children’s early musical experiences (Costa-Giomi, 2016; Costa-Giomi & Benetti, 2017; Costa-Giomi & Sun, 2016; Dean, 2015; Fausey & Mendoza, 2018).
In the present study, we investigated James’s music environment and imitative vocalizations by listening to the complete audio file of the infant recorded on a weekday during which the family stayed at home. The mother turned on the device when James woke up in the morning and turned it off 16 hr later. The parents provided a time log listing the main activities of that day, such as nap times and meals (see Table S3 included with the online version of the article).
Data Analysis
We first listened to the audio file and identified the events reported by the parents (see Table S3 in the online version of the article). We then divided the audio file into shorter segments to make listening to the 16-hr file more manageable. Each segment corresponded loosely to an event or a general interaction that occurred on that day (e.g., mom and children playing in the living room, dad and infant walking outside). Each segment was subjected to repeated listening, first to describe the general context, such as who was present and where they were in relation to the infant, and then to register in greater detail the characteristics of interactions, vocalizations, and any music heard by the infant. A description of each segment is also provided in Table S3. From these segments, we isolated smaller sections that contained music and/or infant vocalizations for in-depth listening. Infant vocalizations and music directed at James or in the background were described and transcribed using music symbols when appropriate. We also identified the repertoire James was exposed to (e.g., artist and song names) and the source of the music (e.g., sister singing, electronic toy, brother plucking cello). The entire procedure involved several months of repeated listening, segmentation of the audio file, taking notes, registering the characteristics of the music environment, and isolating sections for analysis.
It was through this systematic process that we identified the two song imitations that are the focus of the present study. To clarify, we defined song imitation as a vocalization that was perceptually similar to a song model present in the infant’s sound environment. To corroborate our pairing of the infant’s imitations and the song models he had heard, we examined whether listeners who were not involved in the study and were unaware of the source of the data and study objectives also would perceive the similarity between the infant vocalizations and the song models. The responses of the 65 adults who participated supported our identification of the imitations and song models. A full description of the questions participants answered, the stimuli they heard, and their answers is provided in the supplemental material (included with the online version of the article).
Perceptual and acoustic analyses
We identified the music and speech characteristics of the song models heard by the infant the day of data collection and of the infant imitations of the songs. We compared the characteristics of each model and imitation, including speech elements (e.g., consonants, vowels, syllables) and music elements (e.g., pitch, duration, interval and rhythmic structure, key, meter). We used Praat, Version 6.0.17 (Boersma & Weenink, 2016), to study the physical characteristics of the sounds by collecting estimates of fundamental frequency (F0), spectrum, duration, and intensity. To facilitate the comparison between the perceptual and acoustic analyses, F0 values are both reported in Hertz and converted to a logarithmic scaling based on semitones of the equal-tempered scale (A4 = 440 Hz; 1 semitone = 100 cents). For further detail about the acoustic analysis, please see the supplemental material included with the online version of the article.
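The conversion from Hertz to the semitone scale described above can be illustrated with a short sketch. This is not the procedure used in Praat or in the study; it is a minimal illustration that assumes the stated reference (A4 = 440 Hz), the MIDI convention for octave numbering (A4 = 69), and hypothetical function names (`hz_to_semitones`, `nearest_note`).

```python
import math

A4_HZ = 440.0  # reference tuning stated in the text
# Pitch-class spellings are an illustrative choice, not taken from the study.
NOTE_NAMES = ["C", "C#", "D", "E-flat", "E", "F",
              "F#", "G", "A-flat", "A", "B-flat", "B"]


def hz_to_semitones(f0_hz, ref_hz=A4_HZ):
    """Signed distance from the reference in equal-tempered semitones."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def nearest_note(f0_hz):
    """Return (note name with octave, deviation in cents) for an F0 in Hz."""
    # Assumption: MIDI numbering, in which A4 corresponds to note 69.
    midi = 69.0 + hz_to_semitones(f0_hz)
    nearest = round(midi)
    cents = 100.0 * (midi - nearest)  # 1 semitone = 100 cents
    name = NOTE_NAMES[nearest % 12]
    octave = nearest // 12 - 1
    return f"{name}{octave}", cents


# Example with an F0 value reported later in the article (233 Hz).
note, deviation = nearest_note(233.0)
print(note, round(deviation, 1))
```

Under these assumptions, an F0 of 233 Hz falls within a cent of the equal-tempered B-flat3, which agrees with the note label reported for that value in the Results.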
Results
The results of the analysis of the 16-hr recording showed that James was exposed to approximately 39 min of music that day, mostly sung by family members or produced by electronic toys. He heard all family members sing at least once during the day for a total of 14 min, including 10 min 43 s of infant-directed singing (i.e., singing produced specifically for the infant). Supplemental Tables S3 and S4 (included with the online version of the article) provide detailed information about the events of the day and James’s music environment. The LENA software analysis showed that James vocalized 30 min 38 s of speech-related sounds (e.g., babbling) and 29 min 15 s of nonspeech sounds (e.g., crying, vegetative sounds), for a total of 59 min 53 s. Note that the software does not provide information regarding music, including infant vocal production of music or adult singing, and it was through the detailed listening of the sound file that we were able to identify the infant’s imitations and the song models.
We present an in-depth analysis of two of the vocalizations that incorporated features of the song models that had been produced around him. The episodes are described chronologically to provide an overview of the sequence of events.
Imitation 1: “Rain Rain”
Shortly after lunch, James heard his mother sing “Rain Rain” twice. The music notation is shown in Figure 1a, with the fundamental frequency (F0) values of the two notes that compose the initial minor-third interval indicated below the notation. The first execution was in the key of A-flat major and the second execution was in the key of A major.

Figure 1. “Rain Rain” imitation. (a) Music notation representing the mother’s singing, (b) the infant’s first vocalization, (c) the father’s singing and infant imitation of the father, and (d) the father’s singing during the last part of the infant–father interaction. The pitch contours of individual infant sounds are shown below the musical notation in (b).
Six hours later, in the evening, when James was alone with his father, he vocalized a sequence of 11 discrete sounds that resembled the rhythm and pitches of “Rain Rain.” The sequence lasted approximately 7 s and comprised syllabic combinations of consonant-like sounds and vowels. The pitches of the first two sounds were E-flat4 and C4, respectively. The onsets of the sounds occurred at regular intervals except for the last three sounds, when James seemed to have some difficulty coordinating his breathing. The durations between consecutive onsets clustered around two categories of values in a ratio of 2:1, producing a temporal organization of sounds similar to that of the rhythm of the song. The predictability of timing induced the sensation of an isochronous pulse of approximately 78 beats per minute. James’s breaths occurred at the end of every second beat and naturally divided the sequence into four groups of approximately equal duration. The first sound in each group was the loudest, giving the impression of a metric accent at the beginning of the group.
The rhythmic pattern and the initial pitches of James’s sequence of sounds were perceptually similar to the rhythm and melody of the song “Rain Rain” that his mother sang earlier that day. Figure 1b depicts the infant’s sequence of sounds using music notation in the key of A-flat major, with the four groups represented as four measures and the duration categories as half notes and quarter notes. F0 values are indicated below the music notation, and the duration of each sound and each group are indicated above the notes and measures, respectively. In each of the four groups, the descending melodic contour was preserved. The pitch contours of each sound are shown below the music notation in Figure 1b. As can be seen in this figure, the pitch contour of each sound was either mostly stable (e.g., the contour of the first sound) or centered on a pitch target (e.g., the slight rise and fall of the contour of the third sound). The durations of the inter-onset intervals (i.e., time elapsed between the onsets of successive sounds) clustered around two values, 0.4 s and 0.8 s, which is consistent with the perception of two duration categories. The duration of each of the four groups was approximately equal, ranging from 1.63 s to 1.80 s (M = 1.69 s), which supports the perception of the vocalization as comprising four approximately equally timed groups.
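The notion of inter-onset intervals clustering around two duration categories can be sketched as follows. The onset times below are hypothetical values invented for illustration; they are not the study's measurements. Only the two nominal category values (0.4 s and 0.8 s) come from the text, and the nearest-value classification rule is an assumption rather than the authors' method.

```python
def inter_onset_intervals(onsets):
    """Time elapsed between the onsets of successive sounds (seconds)."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def classify(ioi, short=0.4, long=0.8):
    """Assign an interval to the nearer of the two nominal duration categories."""
    return "short" if abs(ioi - short) <= abs(ioi - long) else "long"


# HYPOTHETICAL onset times in seconds (not data from the recording).
onsets = [0.00, 0.82, 1.65, 2.04, 2.46, 3.30, 3.71, 4.10]

iois = inter_onset_intervals(onsets)
labels = [classify(ioi) for ioi in iois]

long_iois = [ioi for ioi, label in zip(iois, labels) if label == "long"]
short_iois = [ioi for ioi, label in zip(iois, labels) if label == "short"]

# Ratio of the mean long interval to the mean short interval.
ratio = (sum(long_iois) / len(long_iois)) / (sum(short_iois) / len(short_iois))
print(labels, round(ratio, 2))
```

With these illustrative onsets, the intervals fall into a long, long, short, short, long pattern with a long-to-short ratio near 2:1, which is the kind of temporal organization that would be notated with half notes and quarter notes.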
The father, who was next to James when he vocalized, identified the resemblance between the infant’s sounds and the song “Rain Rain”: Immediately after James finished the vocalization of interest, the father said, “It sounds like you’re singing . . .,” and proceeded to sing the song himself, in the key of G major and starting on D3 and descending to B2. Figure 1c shows the music notation for the father’s singing, with F0 values for the first two notes indicated below. James responded with a sequence of vowels that matched the rhythm and melodic contour of the last measure of the song sung by the father (Measure 4 of Figure 1c). Figure 1c also shows the infant’s vocalization represented with music notation and the corresponding F0 values. The first three sounds formed a descending interval pattern with durations equivalent to two quarter notes followed by a half note. James did not begin on E-flat4, as he had done in his initial vocalization, but rather on D4, an octave higher than his father’s starting pitch. The lower note of the interval was not clearly pitched, but it was perceptibly lower than D4. Despite the somewhat unclear pitch percept induced by the infant’s lower note, inspection of the waveform showed a periodic pattern for this sound, with F0 of 233 Hz (B-flat3). Thus, even though James was not entirely successful in matching the pitch class of the father’s lower note (B) as he was with the higher note of the minor third interval (D), the interval he produced was quite close to the father’s, and the descending melodic contour of the tune was preserved.
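The size of the interval James produced can be estimated in cents with a brief sketch. The lower note's F0 (233 Hz) is taken from the text; the upper note's F0 is not reported in this passage, so the sketch assumes the nominal equal-tempered frequency of D4 (A4 = 440 Hz). The function names are hypothetical, and the result is only as accurate as that assumption.

```python
import math


def interval_cents(f_high_hz, f_low_hz):
    """Size of the interval between two frequencies in cents."""
    return 1200.0 * math.log2(f_high_hz / f_low_hz)


def equal_tempered_hz(semitones_from_a4, a4_hz=440.0):
    """Frequency of the note a given number of semitones away from A4."""
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


# ASSUMPTION: nominal equal-tempered D4 (7 semitones below A4), about 293.7 Hz.
d4_hz = equal_tempered_hz(-7)
infant_low_hz = 233.0  # F0 of the lower note, as reported in the text

infant_interval = interval_cents(d4_hz, infant_low_hz)
minor_third = 300.0  # equal-tempered minor third, the father's D to B

print(round(infant_interval), round(infant_interval - minor_third))
```

Under this assumption, the infant's descending interval comes out at roughly 400 cents, about one semitone wider than the father's minor third, which is consistent with the lower note being heard as B-flat rather than B.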
James’s following two sounds maintained the rhythm of “Rain Rain,” but the sounds were partially obscured as the father interrupted and began singing again. This time, however, his father sang the melody a half step lower, in the key of G-flat major, starting on D-flat3 and descending to B-flat2 (Figure 1c; “Come a-gain a-no-ther day”). When the father finished, James produced three syllable sounds that matched the rhythm of the last part of his father’s singing (i.e., “-no-ther day”) but with unclear pitch (Figure 1c). The three syllables were perceptually similar to the sounds “no,” “der,” and “deh,” matching parts of the accompanying lyrics. James followed this with another two sounds that resembled “da da.” When James stopped vocalizing, his father resumed singing the same melody, maintaining the key of G-flat major but with variations in lyrics (Figure 1d). After his father’s fourth execution of the melody, James vocalized three sounds that resembled the rhythm of the last three notes sung by his father, followed by short moans of frustration and crying. The father sang the melody one last time, again with alternate lyrics. After a few seconds of silence, the infant changed the focus of the interaction by making “uh-oh” sounds, and the musical exchange between father and infant ended.
In summary, the results show that the infant produced an unprompted vocalization that resembled the song his mother had sung 6 hr earlier in interval structure, key, rhythm, and temporal organization (i.e., meter). He then engaged in an imitative interaction with his father that centered on the same song and reflected speech and musical elements, such as consonants and vowels from his father’s lyrics, interval structure, key, and rhythm.
Imitation 2: “Happy Birthday”
In the morning, James interacted with an electronic music toy that played the melody “Happy Birthday.” During the 10-min interaction, the toy played 11 executions of the entire melody and one partial execution, for a total of 128 s. The melody was in G major, starting on D4, and consisted of synthesized keyboard sounds and no lyrics (Figure 2a). The tempo was approximately 140 beats per minute (duration of quarter note = 0.43 s). In the evening, while with his parents, James vocalized 10 sounds that resembled the beginning of “Happy Birthday.” The entire vocalization lasted slightly over 4 s. The parents were talking among themselves while the infant vocalized.

Figure 2. “Happy Birthday” imitation. Music notation representing the (a) toy melody and (b) infant’s sounds.
The music notation of the infant’s sounds is shown below the toy melody in Figure 2b, and F0 values are shown below the notation. The infant’s first seven notes matched the pitches of the melody played by the toy. Of these seven notes, only G4 seemed to present some difficulty: After the jump from D4, James overshot but then corrected the note by gliding down a half step. Figure 2b also shows the durations of each of the infant’s notes. The temporal relationship between the note durations is represented using quarter notes in the musical notation. The tempo was approximately 130 beats per minute (average duration of quarter note = 0.46 s).
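The relationship between tempo and quarter-note duration reported for the toy and for the infant can be checked with a simple calculation. The sketch below assumes that the quarter note is the beat unit, as in the reported values; the function names are hypothetical.

```python
def quarter_note_seconds(bpm):
    """Duration of one beat (here, a quarter note) at a given tempo."""
    return 60.0 / bpm


def bpm_from_quarter(seconds):
    """Tempo in beats per minute implied by a quarter-note duration."""
    return 60.0 / seconds


toy_bpm = 140.0     # approximate tempo of the toy melody, from the text
infant_bpm = 130.0  # approximate tempo of the infant's vocalization, from the text

toy_quarter = quarter_note_seconds(toy_bpm)
infant_quarter = quarter_note_seconds(infant_bpm)
tempo_ratio = infant_bpm / toy_bpm

print(round(toy_quarter, 2), round(infant_quarter, 2), round(tempo_ratio, 2))
```

The computed beat durations round to 0.43 s and 0.46 s, matching the reported quarter-note durations, and the infant's tempo is about 93% of the toy's.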
James produced the first four notes (D4, D4, E4, and D4) with different vowel sounds. They were perceptually similar to the vowels of the words “bad” (D4), “bed” (D4), “bud” (E4), and “bead” (D4) in American English. Acoustic analysis showed changes in formant patterns consistent with the perception of different vowel qualities (see the supplemental material online for more details and results of acoustic analysis). The subsequent notes were produced with sounds similar to “ya,” except for the seventh note (D4), which was produced with a “ga” syllable.
In conclusion, perceptual and acoustic analyses of the infant’s sounds indicate that James reproduced the pitches of the beginning of the melody of the song “Happy Birthday” played by an electronic toy and incorporated different vowel sounds in his vocalization.
Discussion
We listened to one infant recorded throughout an entire day and identified and analyzed two vocalizations that resembled music produced around him. Perceptual and acoustic analyses of these imitations and the song models showed that the infant imitated a variety of music features, including pitches, intervals, melodic contours, and rhythms. James also incorporated phonetic elements into his imitations, such as consonant and vowel sounds associated with changes in the lyrics. In the discussion that follows, we first summarize the findings from our analysis of each imitation and then discuss these findings in the context of music learning.
“Rain Rain”
James vocalized a sequence of sounds that resembled the melody of “Rain Rain,” a song that his mother had sung 6 hr earlier. James’s unprompted vocalization displayed an organized temporal pattern and the prominent descending minor third characteristic of the song. The father heard James sing the melody and proceeded to sing the song himself; James and his father then engaged in an imitative interaction with the father singing various verses of the song and James imitating intervals and rhythms. There are several points worth emphasizing about James’s vocalizations, the parents’ vocalizations, and the interaction in general. First, concerning the infant’s vocalizations, (a) James’s initial vocalization expressed features relevant to musical organization, such as discrete sounds with stable pitch contours and with durations conforming to music rhythm. That the father labeled the song so promptly after hearing the infant’s vocalization suggests that James was able to express critical musical elements of the song, facilitating his father’s identification of the melody. (b) The infant’s second “Rain Rain” vocalization (the one following the father’s singing of the song) reflected the same descending melodic interval of the first vocalization but was sung in the key of G major instead of the original A-flat major. James transposed the melodic interval to match his father’s rendition of the song. We know from perceptual studies that infants can encode both absolute and relative pitch information from music melodies (Plantinga & Trainor, 2005; Saffran, 2003; Saffran et al., 2005; Saffran & Griepentrog, 2001; Trainor & Trehub, 1992) and that the learning and encoding of pitch depends on the information available to them (Saffran et al., 2005); here, we describe how one infant’s vocalizations reflected the transposed melodies he heard around him.
Second, regarding the parents’ singing, it is worth noting that the mother and father provided James with models of the same melody sung in four different keys (A major, A-flat major, G major, and G-flat major), by two different voices, and with different tempi—all on the same day. That James had multiple opportunities to hear the same song sung in different ways is important from a music development perspective because such variability supports the development of melodic categorization (Costa-Giomi, 2013). For example, by being exposed to the same song sung in various keys or by different voices, James had the opportunity to perceive the similarities of the melody despite differences in key or timbre. This type of input prioritizes the structural pitch information of the melody (Patel, 2008) because such information is what remains unchanged amid variations in pitch or timbre. Adding to further sources of input variation, the parents’ singing was not professional; that is, although the song was communicated effectively, there were moments in which notes were out of tune, the pitch drifted, and the timings were slightly off. Considering that the identity of a melody should be robust to all these eventual variations, it may be the case that input representative of natural singing with all its variability is particularly beneficial for music learning as it, too, reinforces the formation of music categories.
Finally, the “Rain Rain” episode is an illustration of a musical interaction common between infants and caregivers that emphasizes communication and emotional connection (Costa-Giomi & Benetti, 2017; Dissanayake, 2009; Malloch, 1999). James initiated the musical exchange by calling his father and vocalizing; the father acknowledged and elaborated on the infant’s music vocalizations, in effect affirming the infant’s vocalization as a successful and valid expression of music. The song then served as a framework for coordinated turn taking between infant and father, much like the communicative musicality of mother−infant dyads described in previous research (Malloch, 1999). The continuous recording of James’s soundscape revealed that the same song was part of two meaningful infant−parent events during that day, one with the mother after lunch and one with the father in the evening. In other words, the song was not an isolated occurrence and was not bound to one single interaction but was a musical object shared by at least three family members and embedded into James’s daily life. The results of the analysis of “Rain Rain” highlight the value of the particular methodology we used. Not only did it allow us to establish that James had heard the song sung by the mother earlier—information that, arguably, we could have obtained from a report by a very attentive parent with excellent memory—but it also let us determine the exact duration, key, and tempo of both the adults’ singing and the infant’s singing.
“Happy Birthday”
The second infant vocalization of interest consisted of a melody sung by James that matched the pitches of “Happy Birthday.” James had heard this melody produced by an electronic toy with which he had played 10 hr earlier. Although James heard the melody played by the toy with a synthesized timbre and no lyrics, he incorporated vowel sounds in the first four notes of his vocalization. Acoustic analysis showed changes in formant patterns consistent with the perception of different vowel qualities. Considering that James did not hear the melody sung with words on that day, the finding that he produced sounds resembling linguistic content in addition to musical content is notable. It is possible that the vowels he produced were simply the result of vocal exploration, but this argument does not detract from the fact that he produced perceptually and acoustically distinct vowel sounds as he imitated the pitches played by the toy. The possibility that he associated the melody from the toy with sung renditions of “Happy Birthday” cannot be dismissed. After all, “Happy Birthday” is a popular song, and it is likely that James heard it sung many times before the day of data collection.
Unlike live singing, toys provide many opportunities for repeated listening. At the kick of a crib mobile or the push of a button, infants can hear melodies at will as many times as they wish (Costa-Giomi & Merkow, 2015; Young, 2008). In fact, James was engaged in playing with the toy for 10 min and heard the melody that he later imitated played 12 times in quick succession. Considering the abundance of interactive music toys and devices in the homes of young children (Young, 2008), it seems important to study their impact on infant music learning and singing development. Studies of infants’ engagement with digital music toys suggest that adult–infant joint play with toys may provide meaningful musical experiences (e.g., Costa-Giomi & Merkow, 2015; Merkow, 2013; Young, 2008), but we know little about how infants’ independent play with such devices affects music learning and, more specifically, singing. The methodology we used in the present study allowed us to capture a moment of independent play and the vocalization that occurred hours later. Without this extended and continuous recording, we would have missed James’s rendition of the song, just as the parents did. Being able to study infants’ spontaneous singing as it occurs when they are not being observed is one of the advantages of using devices such as LENA.
Music Learning
Although much is known about perception of music by infants, little is known about how such perception skills interact and translate into observable manifestations of musical understanding. Arguably, when producing music, perceptual capabilities become crystallized: Models of accurate singing in adults propose that music imitation involves, among other things, perceiving the relevant features of the music model and organizing an appropriate motor response to match them (e.g., Berkowska & Dalla Bella, 2009; Pfordresher et al., 2015). We can speculate that James’s organized expression of identifiable music features was in part a reflection of his music perception abilities. He produced clearly articulated and resonant sounds with stable pitches, and his vocalizations showed temporal organization and a regular pulse. James’s imitations were not flawless renditions of the songs, but they were similar enough to the models that adult listeners, such as ourselves, the participants in the perception study, and James’s father, were able to perceive the similarity. The very fact that his vocal imitations were not on par with adult singing standards calls for a framework of early singing that contemplates a developmental trajectory of music vocalization, one that takes into account infants’ developmental status and their limited experience with music and music making.
The technology presented here may facilitate the systematic study of such vocalizations as they emerge during the first 2 years of life. It allows for studying manifestations of infant vocalization as they occur in their natural environments, including vocalizations that occur during moments of independent play or that may go unnoticed by parents. The technology also makes it possible to integrate the study of the music environment and the development of vocalizations. The connection between what infants hear and what they learn has been studied in the context of language but not in the context of music. For example, we know that infants learn to produce words that are part of their language environment and that their everyday language experiences are associated with specific language outcomes (Roy et al., 2015; Swingley & Humphrey, 2018). The analysis of all the sounds produced by and around the infant continuously may facilitate the study of these types of connections for the understanding of singing development.
The results of the present study allowed us to trace connections between music environment opportunities and specific singing outcomes in one infant. We recognize that it is possible that James’s music vocal behavior was exceptional and not representative of infant singing in general. His music environment was indeed particularly rich: He heard multiple instances of infant-directed models of songs; engaged in imitative, turn-taking singing with an adult (who acknowledged and celebrated his music vocalization); and had opportunities to produce music by himself without adult intervention. Would infants raised in homes with fewer opportunities for music engagement show comparable vocalization behaviors? Exploring the connections between the music environment and the vocalization of infants will provide answers to this question. Such research will not only inform us about how to shape infants’ music environment to optimize learning opportunities and support singing development but will also provide insight into the mechanisms and developmental processes underlying infants’ ability to learn from everyday music experiences.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
