Abstract
The purpose of this study was to test the effect of daily singing instruction on the singing accuracy of young children and whether accuracy differed across four singing tasks. In a pretest-posttest design over seven months we compared the singing accuracy of kindergarteners in a school receiving daily singing instruction from a music specialist to a control school receiving no curricular music instruction. All children completed four singing tasks at the beginning and end of the study: matching single pitches, matching intervals, matching short patterns, and singing a familiar song from memory. We found that both groups showed improvement on the pitch-matching tasks from pretest to posttest, but the experimental group demonstrated significantly more improvement. Performance on the familiar song task did not improve for either group. Students achieved the highest accuracy scores when matching intervals. Regular singing instruction seems to accelerate the development of accurate singing for young children, but the improvement was evident only in the pitch-matching tasks. It is possible that singing skill development proceeds from pitch-matching to the more difficult task of singing a song from memory. If so, this has implications for how we structure singing instruction in the early grades.
Singing is one of the most natural ways for a child to engage in making music, and singing activities play a central role in elementary music curricula (Campbell & Scott-Kassner, 2014; Phillips & Doneski, 2011). Singing is an entry-level skill for musical engagement, often developed before students learn to play instruments. Developmental studies have found that children’s ability to sing accurately emerges over time as age and musical experience increase (Welch, 2006). While singing ability seems to improve naturally for many children, there are those who struggle with pitch accuracy and are less inclined to participate in elective singing experiences, thus limiting their opportunities for improvement. Researchers have found that an inability to sing accurately can have devastating consequences for one’s musical self-image and lead many to think that their challenges as a singer reflect some deeper lack of musicality (Demorest, Kelley, & Pfordresher, 2017; Welch, 2006). Music teachers frequently discuss the challenges of dealing with students at all levels who have difficulty singing accurately, and the research literature details numerous attempts (see Svec, in press) to test strategies for helping students who may have been labeled with pejorative terms such as “monotone” or “tone deaf.” It would be helpful to know if there are approaches to early childhood music education that might promote the acquisition of singing skills and help children avoid these challenges later in life. The goal of this research was to explore the degree to which daily singing instruction could help young children to sing more accurately. Prior research in children’s singing has explored developmental trajectories for singing skill, variability of singing accuracy across different tasks, and the effect of various types of interventions.
Children’s singing development
Children’s singing development is characterized by a gradual age-related improvement in both accuracy and vocal register (Rutkowski & Miller, 2003; Welch, 2006). Within that larger trajectory, however, individual singing development is highly variable due to differences in family background, singing opportunities, amount and type of instruction, and attitude toward singing. Rutkowski and Miller (2003) tracked the singing voice development of a group of 28 first graders until the end of fifth grade using the Singing Voice Development Measure (SVDM), which measures changes in students’ accessible singing register. They found consistent improvement in use of singing voice as children matured, although they also found considerable variability within the age groups. Although the SVDM does not measure singing accuracy directly, scores on the SVDM are highly correlated with accuracy (Rutkowski, 2015). Welch and colleagues (Welch, Sergeant, & White, 1997) tracked the singing development of 184 five-year-olds for three years. They found that children’s accuracy improved overall but varied by task. Accuracy was much better for pitch-matching (glides and single pitches) than for singing a previously learned song. This was particularly true for boys, whose accuracy in singing a song actually declined over the three years.
Task variables that influence accuracy
As the Welch et al. (1997) study demonstrates, children’s singing accuracy can be heavily influenced by the tasks the children are asked to perform. Previous research has reported that singing accuracy is influenced by the timbre of the model (Green, 1990; Yarbrough, Green, Benson, & Bowers, 1991), the melodic context (Demorest & Clements, 2007; Geringer, 1983), the presence of other voices (Cooper, 1995; Goetze & Horii, 1989), the use of text vs. neutral syllables (Gault, 2002; Levinowitz, 1987), and the use of short pitch patterns vs. songs (Apfelstadt, 1984; Nichols, 2016a; Roberts & Davies, 1975; Van Zee, 1984; Welch et al., 1997). In general, younger children perform best on short patterns echoed on neutral syllables presented by either a child or female non-vibrato vocal model and perform worst when singing a learned song from memory. Data from a recent study of fourth graders (Nichols, 2016b) found that both pitch-matching and song singing tasks were good discriminators of singing accuracy. To get a complete picture of singing skills, it is best to include multiple assessments within a single study to identify where students are most and least accurate (Demorest et al., 2015; Nichols, 2016a).
Instructional interventions
A recent meta-analysis by Svec (in press) of 34 studies of singing instruction with young children found that effect sizes varied widely across studies, but her analysis supported the conclusion that children benefit from receiving instruction in how to sing as a part of their music education. This conclusion supports the ideas of several scholars who have advocated for a singing skills curriculum, one that teaches vocal technique, over the traditional song repertoire curriculum (Phillips & Doneski, 2011; Rutkowski, 1996; Welch, 2006).
Research on the effects of instruction on singing accuracy has focused on improving the singing of both the general student population and those identified as inaccurate singers with mixed results. Since Joyner’s (1969) categorization of some singers as “monotones,” a term no longer in use, instructional interventions have been attempted with inaccurate-singer samples of children, including the use of tape recorders (Klemish, 1974), vertical keyboards (Jones, 1971), and multiple discrimination training (Porter, 1977). Although remediation techniques may not help inaccurately-singing children perform quite as well as their more accurate peers (Van Zee, 1984), evidence suggests a training period as brief as eight weeks may produce some positive results in less able singers (Roberts & Davies, 1975).
Roberts and Davies (1975) used pitch-matching and song singing tasks to study the effects of remediation instruction in a sample of 6–8-year-old students identified by the classroom teacher as “monotone singers.” The sample was randomly assigned to a control group, a traditional group, or a remedial group, with 30 participants each. An additional group of 30 “normal” singers was added for comparison. After twice-weekly singing improvement sessions over eight weeks, both the normal singer group and the monotone singer groups improved on all measures of singing production. The remedial group showed greater improvement in single pitch and interval production, and their vocal range improved overall, but song singing did not. Studies of kindergarten students not classified into normal or monotone groups also found that pitch-matching accuracy may improve without a corresponding improvement in song singing accuracy (Apfelstadt, 1984; Welch et al., 1997).
The purpose of this study was to test the effect of daily group singing instruction versus no formal instruction on the singing accuracy of young children (5–7 years) from the general kindergarten population and to explore whether singing accuracy performance differed across tasks. The research questions were:
Does the accuracy of children receiving daily group singing instruction improve significantly compared to children receiving no school music instruction?
Are some tasks easier or harder for young children to sing accurately?
Is there differential improvement in accuracy based on task type or difficulty?
Method
Sample
The treatment group (n = 41) consisted of all of the students in three different kindergarten classrooms in one U.S. elementary school. Kindergarten students at this school each received 20 minutes a day of group instruction in a Kodály-based music classroom that emphasized the development of the singing voice in terms of tone, register, and accuracy. The control group (n = 38) consisted of all students in three different classrooms in a school matched for general SES and racial diversity where kindergarteners received no formal music instruction during the school day. The mean age of the entire sample at the beginning of the study was 5 years 7 months (age range = 5 years 0 months – 6 years 4 months) with no significant age differences between schools.
Stimuli
To maximize children’s accuracy performance, the test stimuli were recorded by an adult female who was asked to use minimal vibrato (Green, 1990). All pitches in the three conditions were in the range of a fifth from C4–G4. Stimuli consisted of three pitch-matching tasks that were based on the design of Pfordresher and Brown (2007): a single pitch condition, an interval condition, and a pattern condition, and each task condition contained five items preceded by a practice item. The single pitch condition consisted of a single pitch presented four times at approximately 60 bpm (e.g. C-C-C-C) on the syllable “doo”. Children responded by echoing or singing back the four repeated pitches. On different trials, children responded to each of the first five pitches of the C major scale in random order.
The interval condition was also presented as four pitches (e.g. G-G-E-E) and a total of five intervals were tested on different trials, two ascending and three descending. Likewise, the pattern condition was presented as four pitches using three unique pitches starting and ending on the same pitch (e.g. C-E-G-C) and a total of five items were tested. Lastly, the students were asked to sing a familiar song from memory, Twinkle, Twinkle Little Star, preceded by a middle C given from an electronic pitch pipe.
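The layout of the single pitch condition can be sketched in code. This is a minimal illustration only: the equal-tempered tuning with A4 = 440 Hz, the MIDI note numbers, and the function names are assumptions made here, since the stimuli in the study were recorded by a singer rather than synthesized.

```python
import random

A4_HZ = 440.0  # assumed tuning reference; the article does not state one
MIDI = {"C4": 60, "D4": 62, "E4": 64, "F4": 65, "G4": 67}  # first five pitches of C major


def note_hz(name):
    """Equal-tempered frequency of a named pitch (assumption: A4 = 440 Hz)."""
    return A4_HZ * 2 ** ((MIDI[name] - 69) / 12)


def single_pitch_trial(name, repeats=4, bpm=60):
    """One trial: the same pitch repeated, as (onset in seconds, name, Hz) tuples."""
    beat = 60.0 / bpm
    return [(i * beat, name, round(note_hz(name), 2)) for i in range(repeats)]


def single_pitch_block(seed=None):
    """Five trials, one per pitch, presented in random order."""
    order = list(MIDI)
    random.Random(seed).shuffle(order)
    return [single_pitch_trial(name) for name in order]


block = single_pitch_block(seed=1)
```

Each trial holds four onsets one second apart, which corresponds to the approximate 60 bpm presentation rate described above.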
Procedure
We were introduced to the classes at the beginning of the school year in September. We familiarized each classroom with the test procedures for the three pitch-matching tasks and sang through Twinkle, Twinkle Little Star in C. Within a week after familiarization we conducted the pretest measure. All participants were tested individually during the school day in a quiet room. Students sat facing a microphone placed approximately 12 inches from their mouth. Stimuli were presented over a high-quality portable stereo placed directly in front of the students about three feet away. The tests were all presented in the same order: single pitch, interval pitch, pattern pitch, followed by the familiar song task. The entire test lasted approximately seven minutes. Seven months later we again familiarized students with the testing procedure in their classes and then conducted individual posttests using the exact same stimuli and procedures.
Treatment
Kindergarten classes at the treatment school received music instruction for 20 minutes a day, 5 days a week. The music classes were singing-focused and began with a predictable routine of singing two songs that called the children into a circle to be seated. The musical material was taken primarily from collections based on a Kodály approach and district materials. The classes emphasized a balance between group and individual work with a focus on in-tune singing through familiar songs in a limited range, vocal experiments, and improvisation. Risk taking was encouraged, with errors viewed as opportunities for growth. Everything was taught through singing games that featured group and individual singing opportunities, and the teacher worked to establish a playful attitude toward the lessons. For the singing activities, a variety of methods and approaches were used, including repetition, imitation, dramatization, and following a model. Lessons employed aural, visual, and kinesthetic modalities, and musical concepts were prepared “sound before sight.” The teacher is a national board certified music educator with 25 years of teaching experience. She holds a master’s degree and a Kodály level 4 certification and trains other teachers in the Kodály approach.
Analysis
Students’ singing accuracy data were analyzed acoustically using a procedure adapted from Pfordresher and Mantell (2012). The procedure first determines a median F0 for each sung pitch from the middle 50% of that pitch, which avoids vocal fluctuations caused by scoops and consonant transitions, and then compares that value to the target pitch. The acoustic analysis yielded the proportion of correct pitches, that is, the proportion of pitches for each task where the median F0 fell within ±50 cents of the target pitch.
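The scoring rule described in this paragraph can be illustrated with a short sketch. It is not the authors' analysis code. The frame-based F0 tracks, the function names, the way the middle 50% is cut (dropping the first and last quarter of frames), and the inclusive ±50-cent boundary are all assumptions made for illustration. The deviation in cents between a sung frequency f and a target frequency f_t is 1200 · log2(f / f_t).

```python
import math
from statistics import median


def cents_between(f0_hz, target_hz):
    """Signed deviation of f0_hz from target_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f0_hz / target_hz)


def median_of_middle_half(f0_track):
    """Median F0 over the central 50% of a note's frames (assumed segmentation)."""
    n = len(f0_track)
    start, stop = n // 4, n - n // 4
    return median(f0_track[start:stop])


def proportion_correct(notes, targets_hz, tolerance_cents=50.0):
    """Share of sung notes whose median F0 lies within the tolerance of its target."""
    hits = 0
    for track, target in zip(notes, targets_hz):
        deviation = cents_between(median_of_middle_half(track), target)
        if abs(deviation) <= tolerance_cents:  # inclusive boundary is an assumption
            hits += 1
    return hits / len(targets_hz)


# Hypothetical F0 tracks (Hz) for one interval trial with targets G4-G4-E4-E4.
G4, E4 = 392.00, 329.63
trial = [
    [350, 380, 390, 392, 393, 391, 392, 370],  # scoop at onset, settles near G4
    [392, 394, 395, 393, 392, 391, 390, 389],  # close to G4
    [300, 305, 306, 307, 306, 305, 306, 300],  # flat by more than a semitone
    [328, 329, 330, 331, 330, 329, 330, 329],  # close to E4
]
score = proportion_correct(trial, [G4, G4, E4, E4])  # 3 of 4 notes in range
```

Taking the median of the central frames keeps the scooped onset of the first note from pulling its estimate flat, which is the motivation given above for trimming each pitch.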
Familiar song accuracy was scored using an 8-point scale (Wise & Sloboda, 2008) shown in Figure 1. Scores did not depend on whether the student started on the given C but only how well they stayed in tune with themselves. Pretest and posttest performances were scored by two independent raters who were experienced vocal music teachers and were blind to condition with an inter-rater reliability of r = .85. Scores of both raters were averaged to produce a single familiar song accuracy score for each student.

Figure 1. The singing accuracy scale used by Wise and Sloboda (2008).
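How two raters' scores might be combined and checked for agreement can be sketched as follows. The ratings are hypothetical, and the use of a Pearson correlation is an assumption, since the article reports only r = .85 without naming the coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings on the 8-point scale, not the study's data.
rater_a = [3, 5, 6, 4, 8, 2]
rater_b = [4, 5, 7, 4, 7, 3]

# One familiar song score per student: the mean of the two ratings.
song_scores = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
reliability = pearson_r(rater_a, rater_b)
```

Averaging the two ratings gives each student a score on the same 8-point scale while smoothing individual rater differences.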
Results
For the pitch-matching tasks, 69 of the 79 students successfully completed both the pretest and posttest. In addition, one outlier participant was removed from the analysis because the pretest–posttest difference was 2 standard deviations below the mean of the entire sample, leaving a total of 68 participant scores for analysis (86% of the original sample: 32 experimental and 36 control). Of those 68, only 60 successfully completed the song singing task (30 experimental and 30 control). Table 1 gives means and standard deviations for all the singing tasks by condition.
Table 1. Mean singing accuracy scores by task and condition (standard deviation in parentheses).
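The outlier screening and retention figures reported above can be sketched in a few lines. The gain scores are hypothetical, and the use of the sample standard deviation and a strict "below the cutoff" comparison are assumptions, as the article does not specify either.

```python
from statistics import mean, stdev


def flag_low_outliers(gains, n_sd=2.0):
    """Indices of gain scores lying more than n_sd standard deviations below the mean."""
    cutoff = mean(gains) - n_sd * stdev(gains)  # sample SD is an assumption
    return [i for i, gain in enumerate(gains) if gain < cutoff]


# Hypothetical pretest-to-posttest gains in proportion correct, not the study's data.
gains = [0.10, 0.15, 0.05, 0.20, 0.10, 0.00, 0.15, 0.10, 0.05, -0.60]
flagged = flag_low_outliers(gains)

# Retention reported in the text: 68 analyzed scores out of 79 enrolled students.
retained_percent = round(100 * 68 / 79)
```

With these illustrative values only the last student is flagged, and the retention arithmetic reproduces the 86% stated in the text.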
The singing accuracy data were analyzed in a 2 × 3 × 2 factorial ANOVA with time (pre, post) and pitch-matching task (single, interval, pattern) as within-subject variables and group (treatment, control) as the between-subject variable. There was a significant main effect of time on the mean proportion of correct pitches, F(1, 66) = 23.45, p < .001,

Figure 2. Mean pitch-matching scores by group pre to post (bars show standard error).

Figure 3. Mean pitch-matching scores by task (bars show standard error).
Because of the difference in the scoring procedure used, the familiar song data were analyzed separately in a 2 × 2 factorial design. As Figure 4 indicates, the experimental group demonstrated a marginal increase in accuracy, while the control group actually performed slightly worse on the posttest. However, these group differences were not statistically significant and there was no significant trial by condition interaction. Song accuracy was significantly correlated with pitch-matching accuracy in both the pretest, r(60) = 0.427, p = .001, and the posttest, r(60) = 0.458, p < .001, though the correlations were only moderate in strength.

Figure 4. Mean accuracy score pre to post by group for singing a familiar song.
Discussion
Research question 1 asked whether daily singing instruction could improve the singing accuracy of kindergarten-aged children when compared to the more typical no instruction condition. These results demonstrate that daily singing instruction can significantly aid the improvement of young children’s singing accuracy, though improvement was not seen on all tasks and the effect sizes were relatively small. Though both groups achieved a mean accuracy of just over 50% at posttest, this does not indicate a ceiling effect as there were students who achieved perfect scores on the measure at both pretest and posttest suggesting that students at this age are capable of performing at a high level. While it may seem obvious that daily instruction would be better than no instruction, American elementary schools often begin music instruction in grade 1 (age 6–7), thus benefits of instruction for younger children are an important consideration in determining the age at which formal instruction should begin. Our results support the idea that younger children can benefit significantly from attention to their singing skills over a time period as short as seven months.
The children in our control population also improved somewhat over time with maturation as many studies have suggested (Welch, 2006). However, that improvement may have been influenced by an unplanned intervention that occurred during the course of the study. Unbeknownst to us, the control school began offering a voluntary weekly after-school music program to kindergarten students approximately one month after our pretest. While the instruction in that program was focused more on movement and rhythm than on singing, it is possible that it could have influenced our control results. We felt it was important to explore the possible influence of this confound on the interpretation of our findings. We were able to identify that 21 of the 36 control participants elected to participate in the after-school program. Figure 5 shows the gain scores of the three groups: daily, weekly, and no instruction. While the weekly group did show greater gains than the no instruction group, the differences are negligible and both groups gained less than half of what the treatment group did. On a positive note, the confound may have provided an even stricter test of the benefits of daily instruction as our experimental group still improved more than the control even with the outside influence. Future research might explore the relationship between the frequency of singing instruction (daily, weekly) and improvement in accuracy over time.

Figure 5. Average gain in accuracy by frequency of instruction (bars show standard error).
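The gain-score comparison across daily, weekly, and no instruction can be illustrated with a small sketch. The records below are hypothetical, the group labels and function name are chosen here for illustration, and the standard error is computed from the sample standard deviation, which is an assumption about how the error bars were derived.

```python
import math
from statistics import mean, stdev


def gain_summary(records):
    """Return {group: (mean gain, standard error)} from (group, pre, post) rows."""
    gains = {}
    for group, pre, post in records:
        gains.setdefault(group, []).append(post - pre)
    return {
        group: (mean(values), stdev(values) / math.sqrt(len(values)))
        for group, values in gains.items()
    }


# Hypothetical proportion-correct scores, not the study's data.
records = [
    ("daily", 0.30, 0.60), ("daily", 0.40, 0.60), ("daily", 0.50, 0.75),
    ("weekly", 0.35, 0.45), ("weekly", 0.40, 0.50), ("weekly", 0.30, 0.45),
    ("none", 0.35, 0.40), ("none", 0.45, 0.55), ("none", 0.30, 0.40),
]
summary = gain_summary(records)
```

A summary like this supports only a descriptive comparison. Because students chose whether to attend the after-school program, the weekly and no instruction groups were self-selected, so differences between them should be read with caution.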
Research question 2 dealt with how difficult the various singing accuracy tasks were for young children. We found that some of our pitch-matching tasks were significantly harder than others for young children though all showed improvement pre to post. The task main effect yielded our largest effect size (
The task-based differences found in this and other studies make it clear that researchers interested in singing development need to employ multiple measures of singing performance to get an accurate picture of children’s abilities. To that end, researchers in music education and music cognition have begun developing standard measures of singing accuracy incorporating multiple tasks (Demorest et al., 2015). If singing researchers in various disciplines began using a standard measure, we could begin to compare findings across studies and form a clearer picture of how singing skills develop across the lifespan and the role of instruction in that development. Teachers should also consider the importance of multiple singing assessment tasks. As Demorest and Clements suggested, “For teachers, the choice of matching task could result in mislabeling students as uncertain who are capable of matching, or assuming that matching in one context automatically transfers to matching in all contexts” (2007, pp. 199–200).
Singing a song from memory was quite difficult for young children, with an average score of approximately 4.8 out of 8 and no measurable improvement pre to post. Roberts and Davies (1975) found improvement in pitch-matching tasks that were similar to ours, and their free song task also did not improve from pre to post. Their study dealt primarily with singers identified as inaccurate and provided eight weeks of training, whereas our study included all children and lasted seven months. They suggested that the lack of improvement in singing a familiar song might be due to the short training period, but our results suggest that improvement in singing whole songs may follow a much longer trajectory.
Singing a song from memory requires a considerably different set of cognitive skills than pitch-matching. The child must access their memory of the song and reproduce it in the absence of any supporting music. In addition, issues of text and rhythm can affect accuracy (Gault, 2002). Given the central role of song singing in elementary music, the development of song singing accuracy merits more study with careful attention to task parameters.
The results of this study indicate that young children can benefit from regular singing instruction and that such instruction may accelerate the development of accurate singing. It is also worth noting that this study only examined improvements in accuracy, but previous studies have found that singing instruction can also benefit other important aspects of vocal production (Phillips, 1985; Phillips & Aitchison, 1997, 1998). While young children may have many singing opportunities that are not related to formal instruction, other music educators have suggested that a singing-skills curriculum is superior to a song-based curriculum for students’ singing development (Phillips & Doneski, 2011; Welch, 2006). Future research needs to explore whether the singing gains found here were due more to the frequency of instruction or the focus on singing skills and whether the benefits of such instruction would continue as students mature or whether they level off. It would also be worth examining whether the more difficult task of singing a song would eventually show improvement if students were followed over a longer time period.
Singing is a foundational musical skill that can influence children’s perceptions of their musicality and even influence whether or not they continue music instruction (Demorest et al., 2017). These results have implications for both the scope and content of children’s music education. It would be beneficial for young children’s singing development if primary schools began regular music instruction in the kindergarten grades. The results also indicate that children’s development may be enhanced if music teachers addressed singing skills directly in their music curriculum. In addition, teachers should be encouraged to use multiple tasks when assessing children’s singing in order to get the most accurate picture of their skills.
Acknowledgements
The authors are grateful to Kelly Foster Griffin for her assistance with this research.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors received financial support from the NSF (Grant BCS-1256964) and from the AIRS project.
