Abstract
Previous research has suggested that training on a musical instrument is associated with improvements in working memory and musical pitch perception ability. Good working memory and musical pitch perception ability, in turn, have been linked to certain aspects of language production. The current study examines whether working memory and/or pitch perception ability are possible mediators of the effect of musical training on second language phonological production. Native English-speaking undergraduate participants were asked questions about their previous music and Spanish training, and were asked to complete tests of pitch perception, working memory, and Spanish pronunciation. Results indicated that although musical training was linked to both better working memory and better pitch perception, only pitch perception ability was a significant predictor of Spanish pronunciation. These results suggest that incorporating musical training into language classes may be one way to improve second language pronunciation.
A growing body of research suggests possible cognitive benefits of formal musical training (Hannon & Trainor, 2007; Schellenberg, 2006). One area of cognition that has been hypothesized to be affected by musical training is linguistic ability (e.g., Piro & Ortiz, 2009). The link between music and language has been proposed because of similarities in music and language structure (Patel, 2003), similarities in musical perception ability and language development (Anvari, Trainor, Woodside, & Levy, 2002; Lamb & Gregory, 1993), and some overlap in the brain regions responsible for processing music and language (Koelsch & Siebel, 2005; Levitin & Menon, 2003). In this study, we examine how musical training may increase both pitch perception ability and working memory capacity, either of which may then be related to production aspects of language.
A few related lines of research would suggest a link between musical training, pitch perception ability, and language production. For example, musicians have enhanced auditory discrimination skills that appear to involve both bottom-up changes in the auditory cortex and brainstem and top-down attention processes (see Kraus & Chandrasekaran, 2010, for a review). These changes appear to result not just in improved processing for musical perception (e.g., the processing of pitch, rhythm, tempo, contour, timbre, loudness, reverberation, meter, key, melody, or harmony) but also in increased sensitivity to speech sounds (Wong, Skoe, Russo, Dees, & Kraus, 2007). Neuroimaging studies suggest that the cortical areas involved in speech perception are active during speech production (Skipper, van Wassenhove, Nusbaum, & Small, 2007). One implication of this set of findings is that musical training could improve a person’s ability to produce language through changes in pitch perception ability.
As pointed out by Schellenberg & Peretz (2007), however, a proposed special relationship between musical training and language abilities is muddied by the fact that other general cognitive factors are also improved by musical training. In the context of the current study, this would imply that an observed relationship between musical training and language production could also be explained by changes in cognitive factors. One cognitive factor that is a plausible mediator between musical training and language production is working memory. Like pitch perception ability, working memory shows both improvements with musical training (Chan, Ho, & Cheung, 1998; Franklin et al., 2008; Ho, Cheung, & Chan, 2003) and links to language production (Acheson & MacDonald, 2009; Ellis, 1980). In addition, complex verbal working memory tasks are highly correlated with general fluid intelligence (Engle, Tuholski, Laughlin, & Conway, 1999), and are typically included in the broader definition of executive function (e.g., Miyake, Friedman, Emerson, Witzki, & Howerter, 2000). It is then possible that musical training could improve a person’s ability to produce language sounds through changes in working memory capacity.
While both the positive effects of musical training and the musical components of first language development have been well-documented, fewer studies have been conducted examining the relationship between musical training, pitch perception ability, working memory capacity, and L2 production. For example, Slevc & Miyake (2006) examined English language abilities in native Japanese speakers who had been living in the United States, and found that pitch perception ability predicted the ability to produce English language above and beyond other predictors of second language learning (including phonological short-term memory, a construct related to working memory). However, the study did not examine the potential causes of individual differences in pitch perception ability (like musical training). In addition, although phonological short-term memory was not a significant predictor of L2 productive phonology, short-term memory tasks (such as the digit span, used by Slevc & Miyake) are typically weaker predictors of higher order cognitive function than are more complex working memory tasks (e.g., reading span and operation span; Engle et al., 1999).
Therefore, the primary goal of the current study was to explore the relationship between musical training, pitch perception ability, working memory and L2 productive phonology. Specifically, we hypothesized that musical training would correlate with both pitch perception ability and working memory, but only pitch perception ability would then predict L2 phonological production. A secondary goal was to replicate and extend the previously observed relationship between pitch perception ability and L2 learning to a new population: English-speaking students learning Spanish as a second language.
Method
Participants
Participants were 45 native English-speaking undergraduate students (27 males), ages 18–23 (M = 18.8, SD = 1.12). Participants volunteered through the psychology department research pool and received course credit for participation. Each participant had previously enrolled in at least two semesters of Spanish language courses post-high school.
Materials
Operation span test
The operation span test consisted of 42 different arithmetic-word pairs. For each pair, participants solved a 2-step math problem involving addition and subtraction, and then read a word aloud. Once a card showing the word ‘Recall’ was shown, participants attempted to recall the set of words they had just read. The test began at two equation-word pairs per trial. After every three trials, an additional equation-word pair was added until the participant was attempting to recall five words during each trial. Testing was discontinued when the participant could no longer recall all the words at a particular list length. Working memory scores were determined by adding the total number of words correctly recalled. This test has a reliability of α = .80.
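The set structure and scoring rule described above can be sketched in a few lines. This is an illustration only: it assumes three trials at each list length from two to five, as the description implies, and the function names and example recall counts are hypothetical. The discontinuation rule is not modeled.

```python
# Operation span structure as described: list lengths 2-5, three trials each.
LIST_LENGTHS = [2, 3, 4, 5]
TRIALS_PER_LENGTH = 3  # assumption drawn from "after three trials, a pair was added"


def total_pairs(list_lengths=LIST_LENGTHS, trials=TRIALS_PER_LENGTH):
    """Number of equation-word pairs in the full test."""
    return sum(length * trials for length in list_lengths)


def span_score(recalled_per_trial):
    """Working memory score: total number of words correctly recalled."""
    return sum(recalled_per_trial)


# Hypothetical participant: words recalled on each administered trial.
example_recalls = [2, 2, 2, 3, 3, 2, 4, 3, 3]

print(total_pairs())                # 42, matching the stated number of pairs
print(span_score(example_recalls))  # 24
```

Under these assumptions, the maximum possible score equals the 42 pairs in the test.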
Pitch perception test
The pitch perception test was based on the Wing Measures of Musical Talents (Wing, 1968), which was used by Slevc & Miyake (2006). The test consisted of three sections: a tone-judgment test, a chord-judgment test, and a melody-judgment test.
The tone-judgment test required participants to judge 10 half-step pitch differences between two tones. Each tone was played once through stereo speakers for a duration of 2 seconds, with 4 seconds between tones. Participants indicated whether the second tone was higher, lower, or the same in pitch relative to the first tone.
The chord-judgment test required participants to indicate whether ten different 3- or 4-note chords matched the arpeggios that followed them. Each chord was played for 2 seconds, followed by a 3- or 4-note arpeggio with notes spaced one second apart. Participants listened to the chord/arpeggio sequence once, then decided whether the tones comprising the arpeggio were the same as the tones in the chord. If the participant indicated that a tone was altered, they then indicated whether the altered tone moved higher or lower in pitch.
The melody-judgment test required participants to judge which sequential note, if any, changed between two melodies each consisting of five notes. The tempo of each melody varied between 90 and 100 beats per minute throughout five trials. Melodies were played two seconds apart from each other on stereo speakers. Participants listened to both melodies, then determined which sequential note of the second melody, if any, was changed. Altered tones in this section varied by one half-step.
Pitch perception scores were determined by calculating the sum of each section’s total correct answers. As each section grew in difficulty, correct answers were weighted differently for each section. Correct answers in the tone, chord, and melody judgments were worth 2, 4, and 8 points respectively. Reliability for the pitch perception test was α = .81.
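The weighting scheme can be made concrete with a short sketch. The item counts (10 tone items, 10 chord items, 5 melody trials) and the point values come from the description above; the function name and the example numbers of correct answers are hypothetical.

```python
# Points per correct answer in each section, as stated in the text.
WEIGHTS = {"tone": 2, "chord": 4, "melody": 8}
# Items per section: 10 tone pairs, 10 chords, 5 melody trials.
N_ITEMS = {"tone": 10, "chord": 10, "melody": 5}


def pitch_score(correct):
    """Weighted sum of correct answers across the three sections."""
    for section, n in correct.items():
        if not 0 <= n <= N_ITEMS[section]:
            raise ValueError(f"{section}: {n} correct answers is out of range")
    return sum(WEIGHTS[section] * n for section, n in correct.items())


max_score = pitch_score(N_ITEMS)  # 20 + 40 + 40 = 100
example = pitch_score({"tone": 8, "chord": 6, "melody": 3})  # 16 + 24 + 24 = 64
print(max_score, example)
```

With these weights the chord and melody sections each contribute up to 40 points and the tone section up to 20, for a total possible score of 100.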
Spanish paragraph and productive phonology scoring
Each participant was recorded reading aloud the first two paragraphs of Little Red Riding Hood in Spanish (Caperucita Roja, 2009). Recordings were made using Alesis brand speakers, a Sterling Audio small condenser microphone, a TASCAM US-144 USB 2.0 Audio/MIDI Interface, and Sonar LE music production software. Participants were instructed to take their time reading the passage aloud with their best Spanish accent, without regard for speed. Participants were allowed to correct any mispronunciations as they read but were only permitted a single reading of the passage. No time limit was given for the reading’s completion.
Spanish diction was assessed by two doctoral level Spanish diction professors at Appalachian State University. The professors each used a list of twelve criteria to assess students’ diction:
1. Overall correct pronunciation of vowels
2. Overall correct pronunciation of consonants
3. Correct placement of accents
4. Correct stop/fricative alternation of the voiced dental stop consonant
5. Double L’s pronounced laterally
6. Trill double R’s and R at the beginning of a word
7. Grapheme ‘H’ is silent
8. Grapheme ‘Qu’ is pronounced without labialization
9. Tilde on ñ pronounced
10. Use of English /r/
11. Voicing of /s/ pronounced as [z]
12. Maintains fluidity of Spanish diphthongs within and between words
Each criterion was scored using a Likert scale from 0 (poor/absent) to 4 (outstanding/present). The L2 phonological score was assessed by reverse-scoring criteria 5, 10 and 11, then taking the sum of all 12 criteria. The sums given by both raters were then averaged to produce a final score. Inter-rater reliability for the total Spanish score was .77.
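The scoring procedure can be sketched as follows. The sketch assumes that reverse-scoring on the 0 to 4 scale means replacing a rating x with 4 − x, which is the usual convention but is not stated explicitly in the text; the function names and the two sets of ratings are hypothetical.

```python
REVERSED = {5, 10, 11}  # criteria that describe errors rather than correct forms
SCALE_MAX = 4           # ratings run from 0 to 4


def rater_total(ratings):
    """Sum of the 12 criterion ratings after reverse-scoring 5, 10 and 11."""
    if len(ratings) != 12:
        raise ValueError("expected one rating per criterion (12 in total)")
    total = 0
    for number, rating in enumerate(ratings, start=1):
        # Assumed reverse-scoring rule: x becomes SCALE_MAX - x.
        total += SCALE_MAX - rating if number in REVERSED else rating
    return total


def phonology_score(rater_a, rater_b):
    """Final score: mean of the two raters' totals."""
    return (rater_total(rater_a) + rater_total(rater_b)) / 2


# Hypothetical ratings for one participant, criteria 1-12 in order.
rater_a = [3, 3, 2, 2, 1, 2, 4, 3, 3, 1, 0, 2]
rater_b = [3, 2, 2, 2, 0, 3, 4, 3, 4, 1, 1, 2]
print(rater_total(rater_a), rater_total(rater_b))  # 34 35
print(phonology_score(rater_a, rater_b))           # 34.5
```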
Procedure
After giving informed consent, participants were asked at what age they began both musical and language training, and how many years they had studied. Testing then began with the operation span test. Participants were then administered the pitch perception test, followed by the Spanish reading.
Results
As may be seen in the top row of Table 1, the number of years of musical training was significantly correlated with both pitch perception ability and working memory capacity. However, only pitch perception ability was significantly correlated with L2 Spanish scores; working memory scores were not.
Table 1. Intercorrelations of study variables
Note: Correlations across the first row were computed with the full, continuous range of scores for the years of musical training. Correlations down the first column were computed with Musical Training dummy-coded as a dichotomous variable (No Training = 0, Training = 1). Years Music = years of instructor training with a musical instrument. Semesters Spanish = number of semesters of Spanish as a second language. Age Spanish = age at which student took first Spanish course. Working memory = operation span score (Total possible = 42). Pitch perception ability = pitch perception test score (Total possible = 100). Spanish Phonology = average Spanish productive phonology score given by raters (Total possible = 36). * p < .05
To further test whether pitch perception ability mediated the effect of musical training on Spanish productive phonology, we completed a series of multiple regression analyses using the steps proposed by Baron & Kenny (1986). Specifically, mediation tests whether the effect of one variable ‘A’ (e.g., musical training) on another variable ‘C’ (e.g., Spanish productive phonology) can be accounted for by an intervening variable ‘B’ (e.g., pitch perception ability or working memory). Mediation can be visualized as a cascade, in which A influences B, which in turn influences C (see Figure 1). To test whether B is a mediator of the A–C relationship, three conditions must occur: (1) A and B must be correlated, (2) A and C must be correlated, and (3) the relationship between A and C must be significantly reduced or eliminated when B is controlled, but the relationship between B and C must remain. Figure 1 shows the proposed mediation models for pitch perception and working memory.
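The three conditions can be illustrated with a minimal sketch that computes the standardized regression weights for each step. It is not the analysis code used in the study: it relies on the standard identity that, with two standardized predictors, the regression weights can be written in terms of the three pairwise correlations, it omits the significance tests for each path, and the data are made up so that B carries all of the A–C relationship.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mediation_paths(a, b, c):
    """Standardized weights for the three Baron & Kenny regression steps."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    denom = 1 - r_ab ** 2
    return {
        "a_to_b": r_ab,                                 # condition 1
        "a_to_c_total": r_ac,                           # condition 2
        "a_to_c_direct": (r_ac - r_ab * r_bc) / denom,  # condition 3: A with B controlled
        "b_to_c": (r_bc - r_ab * r_ac) / denom,         # B with A controlled
    }


# Hypothetical data: C depends on A only through B.
a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [1, 0, 1, 4, 5, 4, 5, 8]
c = [2 * v + 3 for v in b]
paths = mediation_paths(a, b, c)
print(paths)
```

In this constructed example the total A–C weight is large, but the direct A–C weight falls to zero once B is controlled while the B–C weight remains, which is the pattern described as complete mediation.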

Figure 1. Mediation models for pitch perception and working memory. Numbers are standardized regression weights for the relevant regression equations. Significant relationships are indicated by solid lines and single asterisks; non-significant relationships are indicated by dashed lines and the indication ns.
Total years of musical training was positively associated with pitch perception scores, β = .35, t(43) = 2.45, p = .02, and musical training was also associated with L2 Spanish productive phonology, β = .39, t(43) = 2.79, p < .01. When both years of training and pitch perception scores were added to the prediction of Spanish productive phonology, pitch perception scores were significantly positively related to productive phonology, β = .46, t(42) = 3.47, p < .01, but the relationship between training and productive phonology failed to reach significance, β = .23, t(42) = 1.72, p = .09. The Sobel test for mediation was significant, z = 1.98, p < .05, indicating that in the current sample, the effect of musical training on L2 production was completely mediated by the effects of training on pitch perception ability. In addition, adding the variables measuring Spanish exposure (age at the beginning of Spanish learning, semesters of Spanish) to the full prediction equation again yielded pitch perception ability as the only significant predictor of Spanish pronunciation, β = .49, t(40) = 3.84, p < .001.
Completing the above steps for working memory as a mediator indicated that although years of training was significantly associated with both Spanish production (see above) and working memory, β = .38, t(43) = 2.03, p = .04, when both years of training and working memory were added to the prediction equation only years of training reached statistical significance, β = .38, t(42) = 2.58, p = .01 (β = .03, t(42) = 0.19, p = .85 for working memory; z = .185, p = .85 for Sobel test).
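For readers unfamiliar with the Sobel test, the statistic can be sketched as below. The first function is the usual formula in terms of the two path coefficients and their standard errors; the second is an algebraically equivalent form in terms of the two t values, which is convenient here because only t values are reported. Function names are illustrative, and feeding in the rounded t values from the text is an approximation, so small differences from the reported z statistics should be expected.

```python
from math import erfc, sqrt


def sobel_z(a, se_a, b, se_b):
    """Sobel z from path coefficients a (A to B), b (B to C) and their SEs."""
    return (a * b) / sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


def sobel_z_from_t(t_a, t_b):
    """Equivalent form using t = coefficient / SE for each path."""
    return (t_a * t_b) / sqrt(t_a ** 2 + t_b ** 2)


def two_tailed_p(z):
    """Two-tailed p value for z under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))


# Rounded t values as reported in the text (training to mediator, mediator to Spanish).
z_pitch = sobel_z_from_t(2.45, 3.47)  # roughly 2.0
z_wm = sobel_z_from_t(2.03, 0.19)     # roughly 0.19
print(round(z_pitch, 2), round(two_tailed_p(z_pitch), 3))
print(round(z_wm, 2), round(two_tailed_p(z_wm), 2))
```

Both results fall close to the reported values (z = 1.98 for pitch perception and z = .185 for working memory), which is consistent with the reported statistics having been computed from unrounded estimates.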
The above analyses treated years of training as a continuous variable. About half (N = 23) of the subjects, however, had no musical training. The remaining 25 participants had between one and twelve years of instruction on a musical instrument (M = 4.76 years, SD = 3.19 years). Because of the dichotomous nature of the training variable in the current sample, we repeated the above analyses with training as a dummy-coded dichotomous variable (No Training = 0, Training = 1) in the correlation and regression analyses. The correlations between the dummy-coded training variable and the other study variables may be seen in the first column of Table 1.
The major change in this analysis was that the correlation between working memory and training dropped out; that is, there were no group differences in working memory between the Training and No Training groups. The group differences in pitch perception ability and Spanish pronunciation, however, remained. Repeating the mediational analyses for pitch perception ability with the dummy-coded variable indicated that training predicted both pitch perception ability, β = .42, t(42) = 3.02, p < .01, and Spanish pronunciation, β = .52, t(42) = 4.08, p < .001, but this time both training, β = .37, t(42) = 2.79, p < .01, and pitch perception ability, β = .39, t(42) = 2.98, p < .01, remained significant predictors of Spanish pronunciation. The Sobel test, however, remained significant, z = 2.11, p = .03, indicating that pitch perception ability was a significant mediator in this analysis as well, although the mediation was partial rather than complete because training remained a significant predictor. As in the previous analysis, controlling for the Spanish exposure variables did not significantly change the results: both pitch perception ability, β = .42, t(40) = 3.37, p < .01, and musical training, β = .32, t(42) = 2.55, p < .02, remained significant predictors of Spanish pronunciation.
Finally, we repeated the first analysis within the group of 25 individuals who had at least one year of musical training. Here we found a different pattern of correlations than when those with no musical training were included. Although pitch perception ability and Spanish pronunciation were still significantly correlated (r = .57, p = .003), neither pitch perception ability nor Spanish pronunciation was significantly correlated with years of instruction (rs of .10 and .04, respectively, ps > .6). The correlation between years of instruction and working memory capacity, however, remained moderate (r = .39), although the correlation did not reach statistical significance in this smaller sample (p = .06). Working memory still did not correlate with Spanish pronunciation (r = –.12, p = .53). Because the conditions for mediation were not met in this group, we did not conduct further analyses. Scatterplots of the relationship between years of musical training and Spanish pronunciation are shown in Figure 2, illustrating how this relationship changes depending on the inclusion of participants with no musical training.

Figure 2. Scatterplots of the relationship between Years of Musical Training and Spanish Productive Phonology. The top portion of the figure includes participants with no musical training. The bottom portion of the figure shows the relationship with these participants excluded.
Discussion
The results of this study support the hypothesis that pitch perception ability is a unique mediator between musical training and L2 Spanish productive phonology. Although participation in musical training predicted both working memory capacity and pitch perception ability, only pitch perception ability predicted L2 Spanish productive phonology. Moreover, pitch perception ability predicted L2 Spanish production above and beyond either the age the person started learning Spanish or the semesters of formal Spanish instruction. This finding supports previous research suggesting that there is a special link between pitch perception ability and productive phonology (e.g., Slevc & Miyake, 2006). It also extends those findings to suggest that musical training may improve pitch perception ability, which in turn improves productive phonology, and that the musical training/language production relationship cannot better be accounted for by the effects of music lessons on working memory.
We believe that second language productive phonology is linked to the ability to perceive minute differences in intonation, and that this accounts for the strong correlation between pitch perception ability and productive phonology in the current study. We acknowledge, however, that some of our measures of productive phonology (e.g., ‘Trill double R’s and R at the beginning of a word’) do not seem to directly tap into pitch perception. In addition, musical ability is multifaceted, and although there is some overlap in cortical regions for music perception and language, there is also evidence that pitch perception may involve neural regions distinct from those involved in language processing (e.g., Peretz & Zatorre, 2005). It is possible, therefore, that an unmeasured third variable related to general musical ability is driving the relationship between pitch processing ability and second language production. Future behavioral research could address the third variable problem by including multiple measures of musical ability as possible mediators, rather than simply pitch perception.
Relatedly, although working memory was not a significant mediator of the impact of musical training on L2 production, it may still be a significant mediator of other cognitive effects of musical training. For example, working memory is a well-known mediator of individual and developmental differences in general intelligence (e.g., Engle et al., 1999; Salthouse, 1993). This makes increases in working memory a likely mediator of music lessons’ impact on general intelligence. Similarly, although working memory was not a mediator of L2 production, it may mediate a relationship between musical training and other aspects of second language learning. For example, Slevc and Miyake’s (2006) study showed that short-term phonological memory predicted L2 receptive phonology and syntax, but not productive phonology. One reason for this may be the fact that working memory capacity contributes to syntax, sentence/word order, conveyed messages, grammar and response planning (Acheson & MacDonald, 2009).
There are several limitations to the current findings that should be examined in future research. First, as with much of the literature on the impact of music lessons, the current study is not a true experiment, and some caution about the causal mechanisms is warranted. In particular, it is possible that innate musical ability may affect who takes music lessons to begin with. A related point is that the years of training within the group of individuals who had taken music lessons was not correlated with either pitch perception ability or Spanish pronunciation. Some caution is warranted in the interpretation of this finding, given the small N in this analysis and the fact that previous research has found relationships between brain changes in sound perception and the years of musical training within groups of musicians (Kraus & Chandrasekaran, 2010). If the current finding is replicated, it would either support the alternative hypothesis of causality, or suggest that the impact of musical training on second language productive phonology is not dose-dependent, with initial exposure to musical training being the important factor.
In addition, both the current study and the Slevc & Miyake (2006) study examined ‘late’ second language learners who learned primarily through formal classroom instruction. It may be that the benefits of musical instruction are unique to this type of learner, who tends to be past the ‘window’ of a language learning sensitive period. A related limitation is that we only assessed the amount of formal training in Spanish, and did not ask about other sources of Spanish exposure participants may have had. It is possible that non-formal exposure to Spanish-speaking friends, television programs, or radio exposure could also influence a person’s ability to speak Spanish with a native accent.
Despite these limitations, this study brings us one step closer to understanding how musical training may influence certain aspects of language production. Although both working memory and pitch perception ability are improved as a result of musical training, only pitch perception ability mediates the effect of musical training on L2 productive phonology.
Acknowledgements
James Posedel, Department of Psychology, Appalachian State University; Lisa Emery, Department of Psychology, Appalachian State University; Benjamin Souza, Department of Foreign Language & Literatures, Appalachian State University; Catherine Fountain, Department of Foreign Language & Literatures, Appalachian State University. Benjamin Souza and Catherine Fountain contributed equally to this project, and authorship order was determined by a coin toss.
