Abstract
Absolute pitch (AP) refers to labelling individual pitches in the absence of an external reference. A widely endorsed theory regards AP as a privileged ability enjoyed by a select few with a rare genetic makeup and musical training starting in early childhood. However, recent evidence has shown that even adults can learn AP, and some can attain a performance level comparable to that of natural AP possessors. These training studies involved native tonal language speakers, whose acquisition of AP might be facilitated by tonal language exposure during early childhood. In this study, adults speaking non-tonal languages went through AP training that was 20 hr long, computerised, and personalised. Performance improved on average, accompanied by enhanced working memory for tones, whereas relative pitch judgement and sensitivity to small pitch differences remained unchanged. Notably, two out of 13 learned to label all 12 pitches within an octave, with accuracy and response time comparable to those of natural AP possessors. Overall, the findings suggest that tonal language exposure is not a prerequisite for AP learning in adulthood. The understanding of the origin of AP would benefit from considering the role of lifelong learning instead of focusing only on early childhood experience.
Introduction
Absolute pitch (AP) refers to the ability to label the pitch of an isolated tone. It is well known to be rare among musicians, with some suggesting a prevalence rate of only 1/10,000 in the population (Takeuchi & Hulse, 1993; Ward, 1999), whereas the rates among music students in colleges or in music conservatories ranged from 7% to around 50% in different countries (Deutsch et al., 2006; Miyazaki et al., 2012). A popular theory suggests that AP can only be acquired among individuals with rare genetic makeup as well as early learning experience within the critical period (Bachem, 1940; Baharloo et al., 1998; Chin, 2003; Levitin & Rogers, 2005; Takeuchi & Hulse, 1993; Zatorre, 2003). The role of genes is consistent with the finding that AP runs in families (Baharloo et al., 1998; Drayna, 2007; Gregersen et al., 2001), and support for the critical period comes from the observation that training AP in adults has enjoyed very limited success (Cuddy, 1968, 1970; Levitin & Rogers, 2005; Russo et al., 2003; Van Hedger et al., 2015; Ward, 1999).
A recent study, however, demonstrated that adults can learn AP through a computerised and intensive training protocol for 12–40 hr (Wong et al., 2020). The training required participants to perform pitch naming or verification, with the number of pitches gradually increased to 12 based on individual progress. It led to general performance improvement in all participants that lasted for 1–3 months. The generalisation of learning to new timbres and octaves depended on the variety of trained tones, a typical characteristic of perceptual learning (Nosofsky, 1986, 1987; Wong et al., 2011). Importantly, across three experiments, 14% (six out of 43) of participants learned to name all 12 pitches at 90% or above, a performance level comparable to natural AP possessors in the real world. This finding challenges the idea that AP acquisition is constrained by the critical period, and suggests that any gene required for AP acquisition would have to be more common in the population than previously thought. The understanding of AP acquisition would thus benefit from a learning perspective, and individual degrees of AP may simply reflect differences in exposure, attention, and motivation to acquire pitch names of tones in an absolute manner.
The primary goal of this study was to test whether tonal language exposure during early childhood is a necessary condition for learning AP in adulthood. In the aforementioned study that successfully trained AP in adults (Wong et al., 2020), all participants spoke Cantonese, a tonal language, as their first language. According to some theoretical proposals, their tonal language learning as a child has established absolute, precise, and stable tonal templates that are associated with words, which might have facilitated AP acquisition as a “second language” (Deutsch et al., 2004, 2006, 2009). This leaves open a possibility that AP acquisition in adulthood is impossible for non-tonal speakers who are not exposed to tonal languages during early childhood. Another recent study reported successful training of two out of six adults in an AP training that spanned 8 weeks (Van Hedger et al., 2019). They did not report the language background of the two successful candidates, and therefore, it was possible that the participants might be non-tonal language speakers. However, those two participants could perform AP judgement with an error well within 1 semitone before training for both the piano timbre and multiple timbre conditions (Van Hedger et al., 2019). This level of precision is one of the most common definitions for the possession of AP in the literature (Bermudez & Zatorre, 2009; Crozier, 1997; Loui et al., 2012). Therefore, these two participants could be considered “AP possessors” before training. Here, we tested whether non-tonal language speakers with low pitch naming performance to begin with would be able to learn AP in adulthood.
Another goal of this study was to explore the importance of various cognitive factors that might support the learning of AP. We identified three candidate abilities based on the literature, namely, tonal working memory, relative pitch, and pitch sensitivity, and examined whether they would change in the process of AP learning. First, in contrast to the proposal that AP possessors do not have better working memory for tones than others (Miyazaki & Rakowski, 2002; Takeuchi & Hulse, 1993), some findings suggest that tonal working memory is better among natural AP possessors (Ross et al., 2005; Siegel, 1974) and can predict individual degrees of practice effect in an AP task (Van Hedger et al., 2015). We, therefore, tested whether tonal working memory is enhanced when AP improves. Second, relative pitch refers to the ability to identify the relationship between pitches, or “interval,” in terms of the number of semitones. Relative pitch is commonly acquired by trained musicians, who perform relative pitch judgements by “feeling” the distance between the two tones without knowing their exact pitches (Levitin & Rogers, 2005). AP possessors have an additional route to relative pitch judgements: naming the two tones and then working out their relation based on musical knowledge. Interestingly, while it has been reported that AP possessors performed better than non-possessors on relative pitch judgements (Dooley & Deutsch, 2010, 2011), others reported that some AP possessors have difficulty making relative pitch judgements (Miyazaki & Rakowski, 2002), indicating that acquiring AP does not always make relative pitch judgements easier. Here, we clarify whether relative pitch is a separable ability independent of AP performance or whether it benefits when AP improves. Third, perceptual encoding of fine pitch categories has been considered one of the key processes in pitch naming (Zatorre, 2003).
Consistently, some AP possessors had superior ability to discriminate in-tune versus mistuned tones (Ross et al., 2005). We examined whether AP training is accompanied by improved sensitivity to small changes in pitch between two tones.
In this study, adult speakers of non-tonal languages went through a 20-hr AP training regimen that was computerised, gamified, individualised, and intensive. They performed identical tests before and after training that included measures of AP, tonal working memory, relative pitch, and pitch sensitivity. Participants were exchange students in Hong Kong, who were not exposed to tonal languages during childhood. Given the practical limitation that the participants had a very short period of stay in Hong Kong, we focused the AP training on developing certain critical aspects of AP abilities so that it would be more informative about the learning mechanisms of AP. Here, the end goal of the AP training required participants to perform fast and accurate naming of 12 pitches within an octave and with a single timbre, for a few reasons. First, passing this AP training would mean that one can name all pitches in a Western music scale accurately without an external reference tone, the basic definition of AP (Takeuchi & Hulse, 1993; Ward, 1999). Second, although this ability is limited to a single octave and might not qualify one as a “full-fledged AP possessor” for those favouring the use of multiple octaves during testing (Athos et al., 2007; Baharloo et al., 1998), it fits well with the screening criterion for real-world “AP possessors” commonly used in the literature (Keenan et al., 2001; Loui et al., 2011; Ward & Burns, 1982; Zatorre & Beckett, 1989). Third, while speeded responses have not been included as part of the basic definition of AP, they correspond well with what real-world “AP possessors” can typically do, namely perform automatic and effortless AP judgement (Levitin & Rogers, 2005).
Although response times for AP judgements of “AP possessors” varied from around 1.5 s (e.g., Carroll, 1975; Miyazaki, 1989, 1990; Takeuchi & Hulse, 1993) to more than 3 s (Bermudez & Zatorre, 2009), a slower response time of a few seconds does not match the presumed automaticity of “AP possessors.” Instead, it corresponds well with a common alternative strategy to AP, namely naming the pitch of a tone relative to references (Levitin & Rogers, 2005; Takeuchi & Hulse, 1993). Given that none of the previous studies had successfully demonstrated AP trained from scratch with response times faster than, or at least comparable to, those of real-world “AP possessors,” we decided to impose a very short response time window of 1.2 s in the AP training. This would be comparable to one of the fastest mean response times of “AP possessors” in the literature (1,216 ms; Miyazaki, 1990). Despite the fact that response time per se cannot ascertain whether an individual names every pitch absolutely (Wong et al., 2020), it at least ensures that the “degree of automaticity” of AP would be similar between the trained AP and real-world “AP possessors.” Overall, the task demand of fast and accurate AP naming provides a robust test of the learnability of AP in non-tonal speakers.
The predictions of the current training study were as follows. First, if tonal language learning in childhood was not the main reason for the success of previous AP training, then in this study we should also observe considerable AP improvement for non-tonal language speakers. Otherwise, AP improvement would be much smaller or even absent. Second, cognitive abilities (tonal working memory, relative pitch, and/or pitch sensitivity) that are important for supporting AP learning should be enhanced along with improved AP after training.
Methods
Participants
We aimed at recruiting as many exchange students as possible in a 3-month period, and ended up training 13 exchange students at the Chinese University of Hong Kong (six males, mean age = 21.23, SD = 3.98). The sample size required was estimated using GPower 3.1.9.2 based on our previous training study on tonal language speakers (Wong et al., 2020). In that study, a large effect size was observed for the improvement in adults during the generalisation test (pretest vs. posttest; average
Table 1. The music training and language background of the participants.
For participants who played more than one instrument, their major instrument was listed first.
Materials
Both the training and testing were conducted on computers using MATLAB (Natick, MA) with the PsychToolbox extension (Brainard, 1997; Pelli, 1997) within a 3-month period. Participants were requested to bring their own earphones and adjust the volume to a comfortable level before the training or testing started. Fourteen synthetic tones from A#4 to C#6 were used for training and testing. Each synthetic tone was the sum of the sinusoidal waveforms of the fundamental frequency and harmonics of the tones (Bermudez et al., 2009). F#5, G5, G#5, and A5, each with three deviant versions higher in frequencies by 1%, 2%, and 3%, respectively, were also generated in sine wave timbre for the pitch sensitivity test. The tone durations were different in different tests (see below), and an on/off-ramp of 10% in length was applied to all tones.
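The stimulus description above can be illustrated with a minimal sketch in plain Python (the study itself used MATLAB with PsychToolbox). The source specifies only that each tone summed the fundamental and its harmonics, that deviants were 1%, 2%, and 3% higher in frequency, and that an on/off ramp of 10% of the tone length was applied. The number of harmonics, the 1/k amplitude weighting, the linear ramp shape, the 44.1-kHz sampling rate, and all function names below are assumptions made for illustration.

```python
import math


def note_frequency(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def deviant_frequency(f0, percent):
    """Frequency of a deviant that is `percent` % higher than the standard."""
    return f0 * (1.0 + percent / 100.0)


def synth_tone(f0, duration_s, n_harmonics=4, sample_rate=44100,
               ramp_fraction=0.10):
    """Sum the fundamental and its harmonics, then apply on/off ramps.

    n_harmonics, the 1/k weighting, and the linear ramp are assumptions;
    the source states only that harmonics were summed and that the ramp
    was 10% of the tone length.
    """
    n_samples = int(round(duration_s * sample_rate))
    ramp_len = max(1, int(round(ramp_fraction * n_samples)))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(math.sin(2 * math.pi * k * f0 * t) / k
                    for k in range(1, n_harmonics + 1))
        # Gain rises linearly over the first 10% and falls over the last 10%.
        gain = min(1.0, i / ramp_len, (n_samples - 1 - i) / ramp_len)
        samples.append(value * gain)
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]  # normalised to the range [-1, 1]


# A 100-ms A5 standard and the frequency of its 2% deviant.
a5 = note_frequency(81)
standard = synth_tone(a5, 0.1)
deviant_hz = deviant_frequency(a5, 2)
```

Setting `n_harmonics=1` gives a pure sine wave, which is the timbre the source describes for the pitch sensitivity stimuli.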
Prepost test
Within 3 days before and after the training, an identical prepost test (Figure 1) was administered to the participants to track how the training modified different aspects of the participants’ pitch perception and memory. Two participants passed all 30 levels of the training and were invited to another prepost test 1 month later to examine whether their learning was sustained.

Figure 1. The design of the four tasks included in the prepost test: (a) AP test, (b) tonal working memory test, (c) relative pitch test, and (d) pitch sensitivity test.
In the AP test, participants heard a 1-s tone and named its pitch within 5 s by keypress (Figure 1a). The response mapping between the pitches and the keys (from “1” to “=” on a standard computer keyboard for the 12 pitches) was shown on the screen so that no memorisation was required. There were 36 trials, with the 12 synthetic tones in octave five each presented three times in a randomised order. Before testing, participants were allowed five practice trials with feedback. The actual trials did not provide feedback so as to avoid the use of reference tones from the previous trials. The practice block and the test block were separated by 20 s of a descending Shepard tone to disrupt any working memory of the tones presented during practice. Performance was indicated by the proportion of correct trials and the pitch error (deviation of the response from the actual pitch in semitones).
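The two AP measures can be sketched as follows. This is an illustration under stated assumptions, not the authors' scoring code: the pitch error is taken here as the plain absolute difference in semitones between the response and the presented pitch within the octave, every trial is assumed to have a response, and the function name and toy trials are hypothetical.

```python
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def score_ap_test(trials):
    """Return (proportion correct, mean pitch error in semitones).

    `trials` is a list of (presented, response) pitch-name pairs.
    Assumption: error is the absolute within-octave difference; the source
    does not say whether errors wrap around the octave.
    """
    errors = [abs(PITCH_NAMES.index(response) - PITCH_NAMES.index(presented))
              for presented, response in trials]
    proportion_correct = sum(e == 0 for e in errors) / len(errors)
    mean_pitch_error = sum(errors) / len(errors)
    return proportion_correct, mean_pitch_error


# Toy data (not from the study): two correct trials, one 1-semitone error,
# and one 2-semitone error.
toy_trials = [("C", "C"), ("D", "D#"), ("A", "G"), ("B", "B")]
accuracy, pitch_error = score_ap_test(toy_trials)
```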
In the tonal working memory test, participants adjusted the frequency of a tone to match the target (Van Hedger et al., 2015). During each trial, they heard a target tone for 250 ms, white noise for 1,000 ms, and a second tone for 250 ms (Figure 1b). They clicked the up or down arrows on the screen to adjust the second tone until it matched with the target, within a time limit of 15 s. There were two types of arrows. The large and small arrows changed the tone by 2/3 and 1/3 of a semitone, respectively. Each time when an arrow was clicked, a new tone was played to enable comparison. Participants clicked the “next tone” button to end the trial if the answer was ready before 15 s. The target tones were F#5, G5, G#5, and A5, whereas another eight standard tones ranging from D5 to C#6 served as the second tones. Four practice trials were provided with feedback before the 32 testing trials without feedback. Performance was indicated as the mean pitch errors from the target tones.
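The step sizes and the error measure in this task are both expressed in semitones, which map onto frequency logarithmically. A minimal sketch of the conversion, assuming twelve-tone equal temperament (a frequency ratio of 2^(1/12) per semitone); the function and constant names are illustrative and do not come from the source.

```python
import math

LARGE_STEP = 2.0 / 3.0  # semitones per click on a large arrow
SMALL_STEP = 1.0 / 3.0  # semitones per click on a small arrow


def shift_by_semitones(freq_hz, semitones):
    """Frequency after moving `semitones` up (positive) or down (negative)."""
    return freq_hz * 2.0 ** (semitones / 12.0)


def semitone_error(response_hz, target_hz):
    """Absolute deviation of the adjusted tone from the target, in semitones."""
    return abs(12.0 * math.log2(response_hz / target_hz))


# Starting 2 semitones below a 440-Hz target, three large clicks upwards
# land exactly on the target, so the error is (numerically) zero.
target = 440.0
tone = shift_by_semitones(target, -2.0)
for _ in range(3):
    tone = shift_by_semitones(tone, LARGE_STEP)
final_error = semitone_error(tone, target)
```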
The relative pitch test was identical to the AP test except for the following. Participants heard two 1-s tones drawn from Octave 5, one immediately after another (Figure 1c). They then named the interval between the two tones by keypress within 5 s. The keys were mapped to the 12 intervals, ranging from a minor second (1 semitone apart) to an octave (12 semitones apart). The 36 trials included the 12 intervals, each presented three times in randomised order. No Shepard tone was presented because an explicit reference was provided in this task during each trial. Performance was indicated by the pitch error (deviation of the reported interval from the actual interval in semitones).
In the pitch sensitivity test (Bonnel et al., 2003), participants judged whether two 100-ms tones were identical or different by keypress (“1” for identical and “2” for different; Figure 1d). A 200-ms blank was added between the tones. Four standard tones, F#5, G5, G#5, and A5, each with three deviant versions higher in frequencies by 1%, 2%, and 3%, respectively, were created in sine wave timbre. During the “same” trials, the standard tone was played twice. During the “different” trials, the standard tone was played first before the deviant. There were 48 trials, with an equal percentage of same and different trials, and the different trials were evenly distributed in the three frequency deviations. Eight practice trials were provided with feedback before the testing without feedback. Non-parametric sensitivity, A’, was the dependent measure (Stanislaw & Todorov, 1999).
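For reference, the non-parametric sensitivity index can be computed from the hit rate H and the false-alarm rate F with the formula given by Stanislaw and Todorov (1999): A’ = .5 + sign(H − F) × [(H − F)² + |H − F|] / [4 max(H, F) − 4HF]. The sketch below assumes that a hit is a “different” response on a different trial and a false alarm is a “different” response on a same trial; the function names are illustrative.

```python
def a_prime(hit_rate, false_alarm_rate):
    """Non-parametric sensitivity A' (Stanislaw & Todorov, 1999)."""
    h, f = hit_rate, false_alarm_rate
    if h == f:
        return 0.5  # chance performance; also avoids 0/0 when h == f is 0 or 1
    sign = 1.0 if h > f else -1.0
    return 0.5 + sign * ((h - f) ** 2 + abs(h - f)) / (4 * max(h, f) - 4 * h * f)


def a_prime_from_counts(n_hits, n_different, n_false_alarms, n_same):
    """A' from raw counts, e.g. 24 different and 24 same trials per test."""
    return a_prime(n_hits / n_different, n_false_alarms / n_same)


# Toy counts (not from the study): 18 of 24 different trials detected and
# 6 of 24 same trials wrongly called "different".
example = a_prime_from_counts(18, 24, 6, 24)
```

A value of .5 corresponds to chance and 1 to perfect discrimination.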
AP training
The training was adapted from a previous study (Wong et al., 2020). It was gamified and structured into 30 levels, organised into 10 three-level parts with increasing number of pitches (from 3 to 12; Figure 2a). At each level, an accuracy of 90% was required to progress to the next level; otherwise, participants stayed at the same level. Participants finished an hour of training per day, until they finished 20 hr of training or passed all 30 levels with 90% accuracy.

Figure 2. The design of the training: (a) the training regimen and (b) the training task.
Within each three-level part, the first level was a pitch-naming task, in which participants heard a 1-s tone and named it by keypress within 5 s (Figure 2b). Semitone errors were taken as incorrect, and trial-by-trial feedback was provided. The second level was identical to the first level except for the removal of feedback. This discouraged participants from using external reference, or relative pitch, to assist in pitch naming. The third level was identical to the second except that the response time window was shortened from 5 to 1.2 s to promote a fast, automatic mode of pitch naming. Hence, passing this level showed that the participants could name the particular set of pitches with high accuracy and speed in an absolute manner. When participants passed this third level with 90% accuracy or above, the three-level part would repeat, with another pitch added to the training set.
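The level structure described above (10 parts of three levels, 3 to 12 pitches, a 90% criterion) can be summarised as a small lookup. The class and function names are illustrative, and the sketch omits details such as the number of trials per level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelSpec:
    level: int                # 1 to 30
    n_pitches: int            # 3 to 12 training pitches
    feedback: bool            # trial-by-trial feedback on or off
    response_window_s: float  # 5 s, or 1.2 s on speeded levels


def level_spec(level):
    """Map a level number onto its place in the 10 three-level parts."""
    if not 1 <= level <= 30:
        raise ValueError("level must be between 1 and 30")
    part, stage = divmod(level - 1, 3)
    return LevelSpec(
        level=level,
        n_pitches=3 + part,                           # one pitch added per part
        feedback=(stage == 0),                        # first level of each part
        response_window_s=1.2 if stage == 2 else 5.0  # third level is speeded
    )


def next_level(level, accuracy, pass_threshold=0.90):
    """Advance on reaching the criterion; otherwise repeat the same level."""
    return level + 1 if accuracy >= pass_threshold else level
```

Under this scheme, `level_spec(30)` is the speeded, no-feedback level with all 12 pitches, and passing it completes the training.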
The training tones covered the 12 pitches between C5 and B5 in synthetic timbre. When the number of training pitches increased from 3 to 12, the number of trials per level also increased from 12 to 30 to ensure that all training pitches were covered at each level. For each set of training pitches (e.g., E, F & F#), two extra pitches right outside of the pitch boundary of the set (e.g., D# & G) were included, for which participants should press the spacebar to indicate that they were “out of bound.” Hence, at the highest levels, participants would hear 14 instead of 12 different pitches. This feature of the training rendered it more difficult for participants to infer the name of a pitch in a relative manner. For example, at a level with E, F, and F# as training tones, one could simply name the tone with the lowest pitch as E and that with the highest pitch as F#, which is a relative pitch strategy. Including distractor tones like D# and G would make it more difficult to do so.
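The “out of bound” rule can be sketched with MIDI note numbers. The sketch assumes that each training set is a run of adjacent semitones, as in the E, F, and F# example, and that the two distractors are the semitones immediately below and above that run; the note-number convention (C5 = 72) and the function names are illustrative.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(midi_note):
    """Name a MIDI note number, with C4 = 60 and therefore C5 = 72."""
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"


def out_of_bound_pitches(training_set):
    """The two distractors: one semitone below and one above the set."""
    return min(training_set) - 1, max(training_set) + 1


def expected_response(midi_note, training_set):
    """Correct answer for a tone: its pitch name, or the spacebar response."""
    if midi_note in training_set:
        return NOTE_NAMES[midi_note % 12]
    return "out of bound"


# The example from the text: E, F, and F# with distractors D# and G.
three_pitch_set = [76, 77, 78]
low, high = out_of_bound_pitches(three_pitch_set)
distractor_names = (midi_to_name(low), midi_to_name(high))

# The full set C5 to B5 yields 12 + 2 = 14 distinct tones.
full_set = list(range(72, 84))
full_distractors = tuple(midi_to_name(m) for m in out_of_bound_pitches(full_set))
```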
Before attempting a level, participants were allowed to listen to a specific pitch by pressing the corresponding key on a sample listening screen. A 20-s descending Shepard tone was played before the no-feedback level that was preceded by a with-feedback level, or after participants listened to sample tones. This was to disrupt participants’ pitch memory and interrupt pitch naming based on a previously provided reference.
To improve participants’ motivation and engagement in the training, participants were awarded 1, 2, or 3 tokens if they reached 60%, 75%, or 90% accuracy on a level, respectively. The accumulation of 10 tokens would give the participant a chance to launch, during the first (with-feedback) level of each three-level part, a special trial in which a correct answer would count double towards the final accuracy. Participants could only stock up to two such chances at any given time. In addition, the special trial also appeared at random within the first (with-feedback) level of each part with a 1/80 chance.
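The reward rule can be sketched as below. The accuracy thresholds, the 10-token price, and the cap of two stored chances come from the description above. How tokens are bookkept is not stated in the source, so the sketch assumes that 10 tokens are spent when a chance is granted and that surplus tokens are kept while the cap is reached; the function names are illustrative.

```python
def tokens_for_accuracy(accuracy):
    """Tokens earned on a level: 1, 2, or 3 at 60%, 75%, or 90% accuracy."""
    if accuracy >= 0.90:
        return 3
    if accuracy >= 0.75:
        return 2
    if accuracy >= 0.60:
        return 1
    return 0


def bank_tokens(tokens, chances, earned, cost=10, max_chances=2):
    """Add earned tokens and convert every 10 into a special-trial chance.

    Assumption: tokens are spent on conversion, and conversion pauses while
    the participant already holds the maximum of two chances.
    """
    tokens += earned
    while tokens >= cost and chances < max_chances:
        tokens -= cost
        chances += 1
    return tokens, chances
```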
A targeted exercise was additionally provided for the pitch with the lowest performance when the training set included five pitches or above. After every 15 attempted levels, the pitch with the lowest naming accuracy, computed by the hit rate minus the false positive rate, was identified. Two mandatory practice blocks were generated around this pitch. The first block included 12 trials with feedback, with four trials with the weakest pitch, six trials with the pitches adjacent to the weakest, and two trials with the pitches 2 semitones from the weakest. The second block included 22 trials without feedback, with six trials with the weakest pitch, 12 trials with the pitches adjacent to the weakest, and four trials with the pitches 2 semitones from the weakest. The response time window was 5 s for these exercises.
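The selection of the weakest pitch can be sketched as follows. The source defines naming accuracy as the hit rate minus the false positive rate; the sketch assumes that the hit rate for a pitch is the proportion of its presentations named correctly and that the false positive rate is the proportion of other tones given that pitch's name. The function name and toy trials are hypothetical.

```python
def weakest_pitch(trials, pitches):
    """Return (weakest pitch, score per pitch), score = hit rate - FP rate.

    `trials` is a list of (presented, response) pairs.
    """
    scores = {}
    for pitch in pitches:
        on_target = [resp for shown, resp in trials if shown == pitch]
        off_target = [resp for shown, resp in trials if shown != pitch]
        hit_rate = on_target.count(pitch) / len(on_target) if on_target else 0.0
        fp_rate = off_target.count(pitch) / len(off_target) if off_target else 0.0
        scores[pitch] = hit_rate - fp_rate
    return min(pitches, key=lambda p: scores[p]), scores


# Toy data (not from the study): D is named correctly only half the time.
toy_trials = [("C", "C"), ("C", "C"), ("D", "C"),
              ("D", "D"), ("E", "E"), ("E", "E")]
weakest, scores = weakest_pitch(toy_trials, ["C", "D", "E"])
```

Subtracting the false positive rate penalises a pitch name that is overused as a guess, which a raw hit rate would not detect.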
Results
Training progress
Overall, participants demonstrated substantial learning progress in pitch naming. By the end of training, they had learned to name, on average, 7.54 pitches (out of 12, range = 5–12, SD = 2.22) at an accuracy of 90% or above, with semitone errors regarded as incorrect (Figure 3). Notably, two out of 13 participants passed all levels of training and were able to name all of the 12 pitches at 90% accuracy. This level of pitch naming performance is comparable to the AP possessors’ performance in most of the empirical papers that adopted an objective performance-based definition of AP (Wong et al., 2020).

Figure 3. The number of pitches learned in the course of absolute pitch (AP) training. Number of pitches learned was defined as the number of pitches covered at a training level where the participants passed at 90% accuracy without trial-by-trial feedback. Each line refers to the course of training of one participant. For the two participants who attained the highest level (i.e., learned 12 pitches), the solid lines show the training sessions actually performed by the participants (three and nine sessions, respectively), whereas the dotted lines show the extrapolated learning progress of the following sessions for the calculation of the averaged trajectory of all participants (the black line).
Training effects on pre- and posttests
Prepost tests showed general improvement of AP after training (Figure 4a and b). Mean accuracy increased from .139 (SD = .063) at pretest to .378 (SD = .253) at posttest, t(12) = 3.21, p = .007, d = 1.30. Pitch error decreased from an average of 2.30 semitones at pretest (SD = .698) to 1.05 semitones at posttest (SD = .474), t(12) = −5.92, p < .0001, d = 2.09.

Figure 4. The pretest and posttest performance: (a) proportion correct in the AP naming task, (b) pitch error in the AP naming task, (c) pitch error in the tonal working memory task, (d) pitch error in the relative pitch naming task, (e) A’ in the pitch sensitivity task. Red and green dots mark the performance of the two participants who attained the highest level in AP training, as in Figure 3. Black horizontal lines denote the mean performance in each condition. Mean differences between the pretest and posttest of each task were tested for statistical significance.
Two participants passed all 30 levels of the training. The first participant (JL; data in red in Figures 3 and 4) needed 9 hr to pass all training levels, and reached naming performance comparable to AP possessors in the posttest (proportion correct = .917, pitch error = .250 semitone). When this participant was invited to a second posttest 1 month later, the training effect was largely sustained (proportion correct = .833; pitch error = .333 semitone). The other participant (TF; data in green in Figures 3 and 4) took 3 hr to pass all training levels, and also performed well in the first posttest (proportion correct = .917, pitch error = .167 semitone). During the second posttest a month later, while the accuracy dropped and pitch error increased (proportion correct = .278; pitch error = .778 semitone), the pitch error was still smaller than that of all the other participants at the first posttest. These two participants began musical training at the ages of 12 and 7, respectively, and had 8 and 24 years of active playing, respectively (Table 1).
Tonal working memory was enhanced after AP training (Figure 4c). The mean pitch error was reduced from 1.08 semitones at pretest (SD = .598) to .727 semitones at posttest (SD = .391), t(12) = −3.11, p = .009, d = .699. Training had no effect on the performance in the other two tasks (Figure 4d and e). For the relative pitch test, the average pitch error was similar at pretest (M = 2.03, SD = .749) and posttest (M = 2.00, SD = .897), t(12) = −.144, p = .888, d = .031. For the pitch sensitivity test, the A’ values at pretest (M = .692, SD = .142) and posttest (M = .674, SD = .182) were also similar, t(12) = −.488, p = .634, d = .111.
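The source does not state which formula was used for Cohen's d. One variant that reproduces the three significant effect sizes to within rounding, using only the reported means and SDs, divides the mean difference by the root mean square of the pretest and posttest SDs. The sketch below is therefore an inference from the reported numbers rather than a description of the authors' analysis, and the function name is illustrative.

```python
import math


def cohens_d_av(mean_pre, sd_pre, mean_post, sd_post):
    """|mean difference| divided by the root mean square of the two SDs."""
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2.0)
    return abs(mean_post - mean_pre) / pooled_sd


# Means and SDs as reported in the Results.
d_accuracy = cohens_d_av(0.139, 0.063, 0.378, 0.253)     # reported d = 1.30
d_pitch_error = cohens_d_av(2.30, 0.698, 1.05, 0.474)    # reported d = 2.09
d_working_memory = cohens_d_av(1.08, 0.598, 0.727, 0.391)  # reported d = .699
```

Because the inputs are themselves rounded, the recomputed values can differ from the reported ones in the last digit.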
Discussion
The current findings suggest that it is possible for non-tonal language speakers to learn AP in adulthood. At the group level, the average error of pitch naming dropped from 2.30 semitones in the pretest to 1.05 semitones in the posttest, a substantial improvement in precision. In addition, two out of 13 participants successfully passed all training levels and learned to name all 12 pitches within an octave, with a posttest performance level (error = .250 and .167 semitones) comparable to that of natural AP possessors. Besides AP naming, tonal working memory was also enhanced after training, whereas relative pitch performance and pitch sensitivity did not change.
The role of tonal language exposure in early childhood in AP learning
The substantial improvement in AP in non-tonal language speakers has important implications regarding the explanation of AP occurrence in terms of genes plus learning within the critical period (Bachem, 1940; Baharloo et al., 1998; Chin, 2003; Levitin & Rogers, 2005; Takeuchi & Hulse, 1993; Zatorre, 2003). Concerning the critical period, the considerable improvement for all participants, despite their lack of exposure to tonal languages before adulthood, suggests that exposure to tonal languages during the critical period is not a necessary condition for developing AP in adulthood. This is inconsistent with the hypothesis that tonal language exposure during early childhood is crucial for subsequent AP development (Deutsch et al., 2004, 2006, 2009). Concerning genes, the current findings do not speak directly to their role. It should be noted, however, that individual differences in training progress were considerable, with only a small proportion (<20%) of participants passing all training levels in our current and previous studies (Wong et al., 2020). Therefore, while our findings argue against the extreme 1 in 10,000 estimate of AP occurrence with respect to genes alone (Takeuchi & Hulse, 1993), it is conceivable that genes may play a role in affecting the rate and extent to which different individuals learn AP.
The AP training effects in adults speaking tonal and non-tonal languages in previous (Wong et al., 2020) and current studies highlight the role of lifelong learning in the development of AP. In this study, we did not observe any “glass ceiling” of AP learning. Instead, most participants were improving gradually towards the end of the training (Figure 3). Given sufficient motivation and persistence (Vansteenkiste et al., 2004), it is possible that more participants, if not all, could have passed the highest level of the training had it lasted longer. The learning perspective also naturally accommodates the existing findings that AP is modulated by experience. For example, AP performance is better for the highly exposed pitch “A4” (Levitin & Rogers, 2005), for the timbre of one’s own instrument (Takeuchi & Hulse, 1993), for pitches that are encountered more frequently (Deutsch et al., 2013), and in a familiar context similar to one’s musical training experience (Wong & Wong, 2014). Together with the successful AP training for young children (Crozier, 1997; Miyazaki & Ogawa, 2006; Sakakibara, 2014), our finding suggests that AP is learnable from early childhood to adulthood.
Was the trained AP as fast and automatic as that in natural AP possessors? Passing the AP training means that participants could perform accurate AP judgement within the 1.2-s time window imposed in certain training levels. Indeed, at the last training level, after which the two participants passed the AP training, the mean correct response time of the AP judgement was 1,036.4 ms (SD = 301.3) and 857.5 ms (SD = 232.2), respectively. This performance was faster than that of the natural AP possessors when they were requested to respond as quickly as possible in previous studies (response times ranging from 1,216 to 1,662 ms; Carroll, 1975; Miyazaki, 1989, 1990). This fast response also makes it less likely that relative pitch strategies were used, as calculating the pitch name of a tone based on the provided reference and the perceived interval between the tone and the reference is time-consuming (Takeuchi & Hulse, 1993). It is worth noting that to date there is no known way to ascertain whether all pitches were named in an absolute manner simply based on the accuracy and response time of AP judgement (Wong et al., 2020). With this caveat in mind about the difficulty of inferring the cognitive mechanisms underlying AP judgement, the comparable response times of the two participants and the natural AP possessors suggest a similar level of automaticity.
A limitation, however, is that only one octave and one timbre were involved in the training and prepost tests. As discussed in the introduction, the choice was made to devote the limited time to training the speed of AP naming rather than generalisation to multiple octaves and timbres. Although the trained AP fits with at least one of the widely adopted standards of defining natural AP (Keenan et al., 2001; Loui et al., 2011; Ward & Burns, 1982; Zatorre & Beckett, 1989), it can be said that what was trained does not align with the “full-fledged” versions of AP defined by multiple-octave AP tests. Specifically, one may question whether it is possible for non-tonal speakers to learn to name tones in another octave that share the same 12 pitch labels, that is, to acquire the “pitch class” or “chroma” of tones (e.g., C4, C5, and C6 all share the chroma of “C”). From the perceptual learning perspective, extending fast and accurate AP judgement from one octave to another is likely a matter of training time, since the difficulty of the additional learning (discriminating between pitches a semitone apart) stays similar. Having learned to name the 1st to 12th tones quickly and precisely, there is no reason to suspect that learning to name the 13th to 24th tones in another octave would be impossible. Also, the harmonics of each complex tone used in this study naturally include the frequency of the tones of the same pitch class in higher octaves (McDermott & Oxenham, 2008). Given this physical relationship between tones in the same pitch class that does not overlap with the frequencies in any other pitch classes, mapping tones in the same pitch class to a common label should not create another major obstacle in learning. Indeed, generalisation of AP training to new octaves and timbres has been demonstrated in our previous study involving tonal language speakers (Wong et al., 2020).
Hence, it is likely that the learned AP can be further extended and observed in more octaves and timbres for non-tonal language speakers. Future studies may examine this and also compare the extent of generalisation for speakers of tonal versus non-tonal languages so as to further clarify the impact of tonal language experience on AP learning. And while it is plausible that the trained AP can be extended to additional octaves with more training time, the same might not be true for speeded response. Specifically, extending an accurate yet unspeeded AP judgement to an accurate and fast one may involve fundamental changes in the underlying cognitive processes, for example, from controlled to automatic responses (Shiffrin & Schneider, 1977). This is why we intentionally chose to train speeded judgement rather than naming in multiple octaves.
Another issue with training on only one octave is that one cannot discourage the use of a relative pitch strategy during AP judgement by separating successive tones by more than an octave. This concern is not well substantiated, for two reasons. First, while many researchers believe that larger distances between tones discourage relative pitch strategies (Carroll, 1975; Deutsch, 1972; Miyazaki, 1988, 1989, 1990), to our knowledge there is no direct evidence that this is effective. The heart of the problem is that there is currently no agreed way to determine whether the cognitive processes involved in an AP judgement were relative or absolute. Second, speeded responses are arguably one of the best indicators of relatively “automatic and absolute judgment” because the comparison involved in relative pitch judgement takes time (Wong et al., 2020). Here, the trained AP was at least as fast as natural AP, and therefore relative pitch strategies can be considered as unlikely here as in natural AP.
The role of other cognitive abilities in AP learning
Our findings suggest an important role of tonal auditory working memory in AP development, which is consistent with previous findings in the literature. For example, tonal working memory has been found to be better among natural AP possessors (Ross et al., 2005). Also, individuals with better tonal working memory benefitted more from single-session AP learning (Van Hedger et al., 2015). Here, we showed improvement in tonal working memory after AP training. Importantly, the improvement was observable at the group level when most of the participants were far from fully acquiring AP, suggesting that tonal working memory was enhanced during the acquisition of AP rather than being an end product of AP or a fixed quality. Why is tonal working memory important in learning AP?
In the tonal working memory test, participants heard a tone for every keypress during the frequency adjustment of the second tone. In other words, this task measured one’s ability to hold a precise working memory of tones despite a series of interfering tones from the same category. This ability to maintain the content of working memory while resisting incoming interference may be critical for acquiring AP in at least two ways. First, many pairings between labels and tones were introduced together during the training. Reducing interference (e.g., proactive or retroactive; Keppel & Underwood, 1962) facilitates the maintenance of the learned associations. Second, one is constantly exposed to many pairings between pitches and speech sounds in everyday speech, during which a certain pitch (or similar ones) could have been associated with many other verbal labels. This may greatly reduce the distinctiveness of the learned associations and decrease memory retention (Nairne, 2002). The ability to resist such interference would allow consolidation of memory while it is still fragile. Consistent with this, the two adults who finished all training levels were also the two who performed best in the tonal working memory task before training, with pitch error approaching zero (Figure 4c). Their excellent tonal working memory might have facilitated their AP acquisition during training and might explain their faster learning progress during the initial stage of AP training compared with the other participants (Figure 3). This reasoning is also consistent with the observation that AP possessors had a larger auditory digit span than non-possessors, which might facilitate the process of associating speech labels with a tone (Deutsch & Dooley, 2013).
In contrast, relative pitch and pitch sensitivity did not change after AP training. There are two possible interpretations of these findings. On one hand, these abilities may be separable from AP development, such that the previously observed advantages in these abilities in natural AP were simply coincidental. On the other hand, these abilities may still be related to AP but be harder to modify than tonal working memory, given the moderate amount of AP learning at the group level. It is therefore possible that changes in these abilities would be observed once AP learning is more complete. This may be clarified in further studies where individuals with and without AP are compared in terms of these abilities, and in modified training studies where a larger number of individuals complete the AP training.
In conclusion, it is possible for adults to learn AP, whether they spoke tonal or non-tonal languages during early childhood. This challenges the view that tonal language exposure during early childhood plays a crucial role in AP acquisition. The role of lifelong experience should also be considered to better understand the origin and manifestation of AP. In terms of the cognitive abilities behind AP, tonal working memory may support the learning of AP by resisting incoming interference.
Footnotes
Data Accessibility Statement
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to thank the Department of Educational Psychology and the Department of Psychology at the Chinese University of Hong Kong for internal research funding support.
