Abstract
Absolute pitch (AP) is the ability to effortlessly identify or produce pitches without reference. However, behavioral research has shown that pitch perception and identification in certain timbres are more difficult for AP possessors. In this study, we investigate whether pitch identification and labeling in different timbres (piano and voice) would require different amounts of cognitive resource allocation. We measured accuracy, response time, and pupillary responses of 18 musicians with varying degrees of AP while performing a pitch identification test. We also examined whether behavioral and psychophysiological responses were related to aspects of musical experience, such as the age of onset of musical training, daily hours of practice, and years of musical training. Behavioral results revealed significantly longer response time for vocal tones compared to piano tones. However, there was no difference in accuracy when comparing pitch labeling in piano and vocal tones. On the psychophysiological level, pupillary responses were significantly different across timbre conditions, with larger pupil dilation for vocal tones than piano tones. We also observed an effect of key color (whether the tones corresponded to diatonic or chromatic tones in a C major scale) on pupil dilation, with greater dilation for pitches corresponding to black keys compared to white keys. These findings expand the current knowledge regarding how pitches in different timbres are processed by musicians with varying degrees of AP and open new avenues for the investigation of potentially different cognitive mechanisms involved in the processing of the human voice and musical timbres by AP possessors.
Absolute pitch (AP) has been described as the ability to effortlessly name or produce isolated pitches without the need for a reference pitch (Deutsch, 2013; Parncutt & Levitin, 2001; Takeuchi & Hulse, 1993; Ward, 1999). AP is evident primarily in individuals with musical training as it requires memory for specific pitches and a mental representation between pitches and their verbal labels (i.e., note names; Sergeant & Vraka, 2014; Takeuchi & Hulse, 1993). This ability is sometimes considered to be very rare, but its prevalence varies depending on the population studied and the methods and criteria used to establish the threshold for determining AP possession. While some researchers consider AP possession only for those who score above 85% or 90% accuracy in pitch-labeling tasks (Chavarria-Soley, 2016; Deutsch et al., 2006), others adopt more flexible criteria and consider performances above chance as an indicator of some degree of AP (Leite et al., 2016; Vanzella & Schellenberg, 2010). The option for the latter criterion is based on the notion that performances on pitch-naming tests vary substantially among people with musical training and are usually spread in a continuum rather than categorized in a discrete bimodal distribution (Baharloo et al., 1998; Leite et al., 2016; Levitin & Rogers, 2005; Vitouch, 2003). The adoption of a lower threshold for determining AP possession thus contributes to a better understanding of how this ability manifests itself in different cohorts and how specific aspects of musical experience (e.g., hours of practice, years of training) or the stimuli (such as timbre or register) would influence performance in pitch-labeling tasks (Bermudez & Zatorre, 2009; Vanzella & Schellenberg, 2010).
Research on AP dates back to the 19th century; however, there is still no standard method for the behavioral assessment of this ability (for a discussion on this topic, see Bermudez & Zatorre, 2009). As a result, a variety of methods has been used to determine AP ability in past research. For instance, pitch-labeling tests described in the literature vary greatly in their choice of stimuli timbre, using particularly piano timbre or synthesized tones (reviewed in Deutsch, 2013; Takeuchi & Hulse, 1993). However, it has been recently demonstrated that pitch-labeling ability is significantly influenced by the timbre of the stimuli (e.g., Marvin & Brinkman, 2000; Miyazaki, 1989; Pantev et al., 2001; Vanzella & Schellenberg, 2010). Research has shown, for instance, that pitch identification speed and accuracy are significantly worse for unfamiliar timbres and synthesized tones (Athos et al., 2007; Baharloo et al., 1998; Lockhead & Byrd, 1981; Miyazaki, 1990). Vanzella and Schellenberg (2010) tested a large sample of participants with varying degrees of AP on a pitch-labeling task that included piano, pure tone, natural (sung) voice, and synthesized voice. The study demonstrated that task performance was significantly worse with sung tones than piano or pure tones, suggesting that certain types of stimuli may require more effort to be processed. In other words, to accurately identify pitches in certain timbres, participants may need to resort to more costly relational strategies to compensate for a limitation in their AP ability. Such behavioral manifestations, revealed in terms of distinct patterns of accuracy and response time for different timbres, could, therefore, indicate that distinct neuropsychological processes are at play. To investigate this hypothesis, the present study examined AP processing of vocal and instrumental timbres using pupillometry.
It is well-established that pupillary responses are a sensitive and reliable indicator of the extent of central nervous system processing devoted to a task (Beatty, 1982; Granholm et al., 1996). Increases in pupil diameter correspond to increases in the amount of information processing and relate to the effort required to accomplish a given task (Granholm & Steinhauer, 2004). The positive correlations between mental effort and pupil diameter have been consistently shown in a variety of domains, such as arithmetic (Hess & Polt, 1964), visuospatial (Alnaes et al., 2014), language (Hyona et al., 1995; Zekveld et al., 2010), and memory (Kahneman & Beatty, 1966; Van Der Meer et al., 2003). One of the recognized advantages of pupillometry as an indirect measure of neural activity during cognitively demanding tasks is that pupil dilation is a high temporal-resolution measure, providing a dynamic indication of changes in cognitive effort over time during perception (Kang et al., 2014; Van Der Wel & Van Steenbergen, 2018).
To the best of our knowledge, only one study has used pupillometry to investigate the amount of resource allocation needed during the identification of musical pitches (Schlemmer et al., 2005). In that study, the authors analyzed pupil diameter as well as behavioral data (error rates and reaction times) of nine AP possessors (participants who scored higher than 70% in the applied test) while performing a pitch identification task that contained instrumental tones that were classified as more or less familiar. On the behavioral level, their results corroborated previous findings suggesting that accuracy is higher for familiar timbres compared to less familiar timbres (Marvin & Brinkman, 2000; Miyazaki, 1989; Takeuchi & Hulse, 1993). They also found that tones corresponding to the white keys of a piano (diatonic tones in the C major scale) were identified more accurately than tones that corresponded to the black keys (non-diatonic tones in the C major scale); however, results indicated that there were no significant timbre or key-color effects on response times (Miyazaki, 1990; Takeuchi & Hulse, 1991). On the psychophysiological level, they found that peak dilation of the pupil was significantly dependent on key color and that the effect of timbre on pupil dilation was only marginally significant, with less familiar instrumental timbres showing slightly larger pupil dilation than more familiar timbres. The authors concluded that frequent exposure to tones in specific timbres may enhance the cortical representation for these tones, thus facilitating their perception and identification (Schlemmer et al., 2005, p. 472). However, as vocal timbre was not tested in Schlemmer and colleagues’ study, the question of whether pitch identification and labeling in instrumental and voice timbres require different amounts of cognitive resource allocation is still open for investigation.
Studies have also reported that certain aspects of the AP possessors’ musical background may influence performance in pitch identification tasks (Deutsch, 2013). Early onset of musical training, for example, has been widely associated with higher accuracy (Miyazaki, 1988; Miyazaki et al., 2012; Parncutt & Levitin, 2001; Takeuchi & Hulse, 1993; Vanzella & Schellenberg, 2010), suggesting that “those who acquire AP early show effortlessness and automaticity not seen in those who acquire AP as adults” (Levitin & Zatorre, 2003, p. 109).
In light of these findings, this study investigated whether pitch identification and labeling in different timbres (piano and voice) would require different amounts of cognitive resource allocation in AP possessors. We hypothesized that during a pitch-labeling test with vocal and piano tones, processing of tones presented in the vocal timbre would demand higher amounts of cognitive resource allocation, resulting in greater pupil dilations and possibly longer response times and lower levels of accuracy than during processing and identification of tones in the piano timbre. To test this hypothesis, we measured accuracy, response times, and pupillary responses of 18 musicians with varying degrees of AP. We also examined the influence of key color (i.e., whether a pitch corresponds to black or white piano keys) on the accuracy, response time, and pupillary responses. This was motivated by previous findings indicating that AP possessors do not identify all 12 tones of a chromatic scale equally well, as pitches corresponding to white keys on the keyboard (C, D, E, F, G, A, and B) are generally identified with greater accuracy and speed than those corresponding to black keys (C#/Db, D#/Eb, F#/Gb, G#/Ab, and A#/Bb; Marvin & Brinkman, 2000; Miyazaki, 1988; Miyazaki et al., 2012; Takeuchi & Hulse, 1993; Vanzella & Schellenberg, 2010). In addition, since studies have shown that participants’ musical background can influence performance in pitch-labeling tasks (Miyazaki, 1988; Miyazaki et al., 2012; Parncutt & Levitin, 2001; Takeuchi & Hulse, 1993; Vanzella & Schellenberg, 2010), we also assessed whether behavioral and psychophysiological responses were associated with aspects of musical experience, such as the age of onset of musical training, daily hours of practice, and years of musical training.
Material and methods
Participants
Twenty-four musicians (16 females, aged between 18 and 34 years) enrolled in undergraduate and graduate music courses at universities in Sao Paulo, Brazil, were recruited through posters and e-mail lists. All participants undertook a pitch-labeling task to determine AP ability. Participants were presented with 18 piano tones and 18 vocal tones divided into six blocks and were asked to identify and label each tone as fast and as accurately as possible. Because semitone errors are commonly observed among individuals with AP abilities and the consistency of pitch-labeling accuracy may vary (Bermudez & Zatorre, 2009; Miyazaki, 1989; Ward, 1999; Wynn, 1993), we counted responses deviating by one semitone from the presented pitch as correct, in accordance with previous studies (Athos et al., 2007; Baharloo et al., 1998; Schulze et al., 2009; Vanzella & Schellenberg, 2010). According to this criterion, for each trial there were three correct responses out of 12 possibilities (considering the chromatic scale), hence a .25 probability of success for each trial. Therefore, an above-chance performance on 36 trials corresponded to 15 correct answers (41.67%) using the binomial distribution with a .25 probability of success per trial and p-value <.01. Participants who performed above chance were considered AP possessors for the purposes of this study and included in the sample. Six participants were excluded from the final study sample: three performed below chance on the pitch-labeling test, and pupil diameter was not properly recorded for another three due to technical issues.
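The chance-level reasoning in this paragraph can be checked with a few lines of code. The following is a minimal sketch, not the authors' screening script: the function names `is_correct` and `binomial_tail` are hypothetical, pitch classes are assumed to be coded as integers 0 to 11, and the semitone tolerance is assumed to wrap around the octave. Whether a given count clears a p < .01 criterion depends on whether the tail is taken inclusively and on the exact test used, so the printed tail probabilities are meant to be read alongside the threshold reported above rather than to replace it.

```python
from math import comb

def binomial_tail(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def is_correct(response_pc, target_pc, tolerance=1):
    """Score a response as correct if it lies within `tolerance` semitones
    of the target on the 12-step pitch-class circle (assumed coding 0-11)."""
    diff = abs(response_pc - target_pc) % 12
    return min(diff, 12 - diff) <= tolerance

# Three of the 12 pitch classes count as correct under the +/-1 semitone rule.
P_CHANCE = sum(is_correct(r, 0) for r in range(12)) / 12
N_TRIALS = 36  # 18 piano + 18 vocal screening tones (from the text)

# Inclusive upper-tail probabilities around the reported cutoff.
for k in (14, 15, 16):
    print(k, round(binomial_tail(k, N_TRIALS, P_CHANCE), 4))
```

Deriving `P_CHANCE` from the scoring rule, instead of hard-coding .25, keeps the chance level tied to the tolerance if a stricter or looser criterion is explored.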
Thus, the final study sample included 18 participants (11 women; age M = 23 years, SD = 4.8) with an average of 9.8 years of formal music training (SD = 4.5). Participants reported that formal music lessons started on average at the age of 8.7 years (SD = 3.9) and that average daily music practice was 2.9 hr (SD = 1.3). Five participants self-declared having AP, while eight were unsure whether they were AP possessors, and five self-reported having no AP ability. Participants played different musical instruments: eight musicians indicated that piano was their primary instrument, five participants played a string instrument, two indicated that voice was their primary instrument, and percussion/drums, acoustic guitar, and a wind instrument were each reported as the primary instrument by one participant. None of the participants had a history of neurological or psychiatric disorders, and none reported ophthalmologic problems other than correctable vision. All participants declared not having taken, in the 12-hr period before the experiment, any medication or substance that could influence pupil responses.
The experimental procedures conformed with the Declaration of Helsinki and were approved by the Research Ethics Committee of the Universidade Federal do ABC (Brazil). All participants were fully informed about the nature of the study and provided written informed consent prior to enrolment.
Auditory stimuli
The stimuli used in this study are a subset of the test stimuli used in a previous experiment (Vanzella & Schellenberg, 2010). The auditory stimuli consisted of 24 piano tones and 24 vocal (natural voice) tones comprising all notes from the chromatic scale between A3 (220 Hz) and G-sharp 5 (831 Hz). The piano stimuli were sounds of 1-s duration with a natural onset and a 10-ms linear offset, generated on a Roland FP-4 keyboard using the Grand Piano 1. The natural vocal stimuli were sung by a professional soprano singer. The singer heard each tone played on a piano and then sang the same pitch with the vowel /a/. Stimuli had a duration of 1 s with 10 ms linear onsets and offsets. To guarantee that all stimuli had equal overall energy, they were digitally edited using SoundEdit to normalize the root-mean-squared values. To analyze the amount of pitch variation in the steady portion of each test tone (from 250 to 750 ms), the fundamental frequency was calculated in 10 ms intervals and the standard deviation for each test tone was recorded in semitones. The analysis indicated that the piano tones had minor variations (mean SD <1/100 of a semitone), while the natural vocal tones had a mean standard deviation greater than 1/5 of a semitone. The original stimuli were low-pass filtered and downsampled from 44,100 Hz to 11,025 Hz sampling rate due to storage and processing time limitations (for further details on the stimuli, see Vanzella & Schellenberg, 2010).
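The stated range, from A3 (220 Hz) to G-sharp 5 (831 Hz), is consistent with 24 consecutive semitones in 12-tone equal temperament. As a rough illustration, the nominal frequencies can be regenerated as follows. The function name `tone_frequencies` and the sharp-only labels are assumptions made for this sketch, and the sung tones naturally deviate from these nominal values, as the pitch-variation analysis above indicates.

```python
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def tone_frequencies(f_start=220.0, n_tones=24):
    """Return (label, frequency in Hz) pairs for consecutive semitones
    starting at A3, assuming 12-tone equal temperament."""
    tones = []
    for i in range(n_tones):
        name = NOTE_NAMES[i % 12]
        # The octave number increases at each C: A3, A#3, B3, C4, ...
        octave = 3 + (i + 9) // 12
        tones.append((f"{name}{octave}", f_start * 2 ** (i / 12)))
    return tones

for label, freq in tone_frequencies():
    print(f"{label:4s} {freq:7.1f} Hz")
```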
Pitch-labeling task
Participants were asked to identify and label musical pitches in different timbres as fast and as accurately as possible. The task consisted of a total of 72 trials and was designed in a block paradigm. Each block consisted of six 1-s tones, with a 5-s interstimulus interval during which participants verbally labeled the perceived tone. Each pitch was labeled with its note name, with pitches corresponding to the black keys having two possible labels (e.g., C-sharp or D-flat). Tones with the same frequency or tones that were an octave apart were not presented in sequence. In total, there were 12 blocks: six for the vocal timbre and six for the piano timbre. The blocks were presented in a pseudo-randomized order, and each block was separated by a 30-s silent interval. A 200-ms white noise warning signal was presented 5 s prior to the beginning of each block. There were no breaks within blocks and no feedback was provided.
Apparatus and procedure
Before starting the experimental session, all participants were briefed on the nature of the study and completed a demographic questionnaire with information about their musical background and whether they thought they were AP possessors. Participants were then seated in a sound-treated room facing a white wall containing a 5-cm fixation mark at a 1-m viewing distance. Pupil diameter of the right eye was measured during the task at a 60-Hz sampling rate with an eye-tracker device (Mobile Eye-5 ASL). Participants were asked to fixate their gaze on the target mark during the presentation of the stimuli and response. The auditory stimuli were presented binaurally through headphones (Maxell model Solid2 Mid) and participants’ verbal (acoustical) responses (i.e., note names) were recorded at an 11,025-Hz sampling rate with a laptop (Apple MacBookPro 2012). A single dedicated program (written in MATLAB version R2015a) was used to present the stimuli and record the verbal responses. The whole procedure, including pre-screening test, eye-tracker calibration, and task performance, took approximately 30 min to be completed.
Data processing
To extract information regarding accuracy and response times from the audio recording of the participants’ verbal responses, a Voice Activity Detection algorithm (Freeman et al., 1989) was used to obtain the onset times of responses for each trial. Response time was measured as the elapsed time between the onset of the played tone and the onset of the participant’s verbal (acoustical) response to the perceived tone. Accuracy scores were calculated by comparing the participant’s responses to the tones played in each trial.
Pupil diameter was estimated for each subject using a one-eye camera. Pupil diameter traces were pre-processed to remove eye blinks by linear interpolation. Subsequently, we applied a low-pass filter with a 10-Hz cut-off and segmented the pupil diameter signal into 6-s segments, each spanning from 1 s before tone onset to 5 s after tone onset. The pupil dilation signal was then estimated by subtracting the average pupil diameter during the 1-s pre-stimulus period from each segment. For each participant, we performed three different signal-averaging procedures generating three types of pupil dilation curves: (1) average of the 6-s segments from all notes belonging to the same timbre; (2) average of the 6-s segments only from pitches corresponding to the white keys of a keyboard in each timbre; and (3) average of the 6-s segments only from pitches corresponding to the black keys of a keyboard in each timbre.
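The blink interpolation, segmentation, and baseline subtraction described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: blink samples are assumed to be marked as `None`, the function names are hypothetical, and the 10-Hz low-pass filtering step is omitted because the filter design is not specified in the text.

```python
FS = 60  # eye-tracker sampling rate in Hz (from the text)

def interpolate_blinks(trace):
    """Linearly interpolate over missing samples (None), e.g. during blinks.
    Gaps at the edges are filled with the nearest valid sample."""
    out = list(trace)
    valid = [i for i, v in enumerate(out) if v is not None]
    if not valid:
        raise ValueError("trace contains no valid samples")
    for i in range(len(out)):
        if out[i] is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = trace[right]
        elif right is None:
            out[i] = trace[left]
        else:
            w = (i - left) / (right - left)
            out[i] = trace[left] + w * (trace[right] - trace[left])
    return out

def baseline_corrected_epoch(trace, onset_idx, pre_s=1.0, post_s=5.0, fs=FS):
    """Cut a segment from pre_s before to post_s after tone onset and
    subtract the mean of the pre-stimulus samples."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    if onset_idx - pre < 0 or onset_idx + post > len(trace):
        raise ValueError("epoch extends beyond the recording")
    segment = trace[onset_idx - pre : onset_idx + post]
    baseline = sum(segment[:pre]) / pre
    return [v - baseline for v in segment]
```

With the default arguments, each epoch holds 360 samples (6 s at 60 Hz), of which the first 60 form the pre-stimulus baseline.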
Finally, as a metric for statistical comparison between conditions, we used the sum of the average pupil dilation curve points from the stimulus onset to 3 s afterward, herein referred to as the 3-s pupil dilation. We decided to analyze the pupil dilation within a 3-s time window because, on average, the pupil dilation curve, after reaching a peak approximately 1.5 s after the stimulus onset, roughly decayed to half the peak value after 3 s. Thus, the 3-s area-under-the-curve averages were used as metrics for statistical purposes, whereas the 6-s pupil epochs were used for visualization.
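The 3-s pupil dilation metric can be illustrated with a short sketch. It assumes baseline-corrected epochs of equal length that begin 1 s before stimulus onset and are sampled at 60 Hz, as described in the text. The function names `mean_curve` and `three_s_dilation` are hypothetical.

```python
def mean_curve(epochs):
    """Average equally long epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

def three_s_dilation(curve, fs=60, pre_s=1.0, window_s=3.0):
    """Sum the curve samples from stimulus onset to window_s afterward."""
    start = int(pre_s * fs)            # index of stimulus onset
    stop = start + int(window_s * fs)  # 3 s after onset
    return sum(curve[start:stop])

# Toy curve: flat baseline, a constant 1-unit dilation for 3 s, then other values.
toy = [0.0] * 60 + [1.0] * 180 + [5.0] * 120
print(three_s_dilation(toy))
```

Because the sampling interval is constant, the plain sum is proportional to the area under the curve, so the two descriptions of the metric in the text agree up to a scale factor.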
Statistical analysis
Statistical analyses were performed on the following variables: (1) demographic information regarding the musical background, including the age of onset of musical training, daily hours of practice, and years of formal training in music; (2) behavioral measures related to the pitch-labeling test: accuracy (%) and response time (ms); and (3) pupillary measures (3 s pupil dilation). A repeated-measures analysis of variance (ANOVA) was conducted to determine whether timbre (piano/voice) and key color (white/black keys) had a significant effect on labeling accuracy, response time, and pupillary response. Pearson’s correlation was applied to investigate possible associations between musical background variables (daily hours of training, years of formal training, age at the start of musical training, and main instrument of training) and the behavioral and pupillary responses at the pitch-labeling task for the entire sample (n = 18). Type I error was set at 5% for all tests.
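For readers who wish to reproduce this type of analysis, the sketch below shows two of the computations involved. It relies on a standard identity rather than on the authors' software: for a within-subject factor with only two levels, the repeated-measures F statistic with (1, n − 1) degrees of freedom equals the squared paired t statistic. In a 2 × 2 design, the main effect of one factor is obtained by first averaging each participant's scores over the other factor. The function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def two_level_main_effect_F(cond_a, cond_b):
    """F statistic and degrees of freedom for a two-level within-subject
    factor, computed as the squared paired t statistic."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / sqrt(var_d / n)
    return t ** 2, (1, n - 1)

# Toy per-participant means (hypothetical values, not study data).
voice = [2.0, 4.0, 6.0]
piano = [1.0, 2.0, 3.0]
F, df = two_level_main_effect_F(voice, piano)
print(F, df)
```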
Results
Behavioral results
Performance on the pitch-labeling task was assessed by measuring accuracy (deeming semitone errors as correct answers) and response time in trials with piano and vocal tones. Overall, the average accuracy across all participants and trials was 83.1% (SD = 16%), while the average response time on the pitch-labeling task was 1,933 ms (SD = 577 ms). Figures 1–3 depict participants’ average individual performance in relation to accuracy (correct labels and mean absolute deviations [MADs]) and response time across trials. No between-group analyses were conducted as no relevant performance differences were observed within our study sample in relation to accuracy and response time.

Figure 1. Performance Accuracy as Measured With the Percentage of Correctly Labeled Pitches Averaged Across Trials for Each Participant in the Pitch-Labeling Task.

Figure 2. Mean Absolute Deviations (MADs, Semitones) to Target Pitch Averaged Across Trials for Each Participant in the Pitch-Labeling Task. Error Bars Are Standard Error.

Figure 3. Average Performance Response Time (ms) of All Participants Across Trials in the Pitch-Labeling Task. Error Bars Are Standard Error.
Considering the overall performance accuracy, as measured by the percentage of correctly labeled pitches, results revealed no main effect of timbre, F(1, 17) = 1.410, p = .25. This suggests that there was no statistical difference between the number of correctly identified pitches in the piano tones (M = 85.3%, SE = 3.6) in relation to the vocal tones (M = 81.4%, SE = 4.5). We also examined the effect of key color (pitches corresponding to the black or white keys of a piano) on accuracy. Statistical analysis suggested no main effect of key color on accuracy, F(1, 17) = .080, p = .78, and no significant interaction between timbre and key color was identified, F(1, 17) = .025, p = .87. Table 1 displays the means and standard errors for accuracy and response time for each timbre and key color.
Table 1. Average Accuracy and Response Time for Each Timbre and Key Color.
Note. Values are expressed as mean (standard error).
To further examine and confirm the results regarding performance accuracy, we conducted an additional analysis using a more sensitive measure of pitch-labeling accuracy, the mean absolute deviation (MAD; Dohn et al., 2012; Leite et al., 2016). MAD was calculated by averaging all absolute distances between response and the target pitch (in semitones) across trials. The analysis indicated no main effect of timbre on pitch-labeling accuracy, F(1, 17) = 0.401, p = .53, but revealed a significant effect of key color, F(1, 17) = 11.789, p = .003, with smaller deviations for pitches corresponding to white keys than to black keys.
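The MAD computation can be sketched in a few lines. Because responses are note names without octave information, this sketch assumes that the distance between response and target is measured on the 12-step pitch-class circle, which caps it at 6 semitones. That wrap-around rule, the name-to-number mapping, and the function names are assumptions made for illustration and are not details given in the text.

```python
PITCH_CLASS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10,
    "B": 11,
}

def semitone_distance(response, target):
    """Absolute distance in semitones between two note names, assuming
    octave equivalence (shortest way around the pitch-class circle)."""
    diff = abs(PITCH_CLASS[response] - PITCH_CLASS[target]) % 12
    return min(diff, 12 - diff)

def mean_absolute_deviation(responses, targets):
    """Average absolute semitone distance across trials."""
    distances = [semitone_distance(r, t) for r, t in zip(responses, targets)]
    return sum(distances) / len(distances)

# Toy example: one exact answer, one semitone error, one enharmonic match.
print(mean_absolute_deviation(["C", "D", "A#"], ["C", "C#", "Bb"]))
```

Unlike the percentage-correct score, which treats a semitone error as a hit, this measure still registers each semitone of deviation, which is why it can be more sensitive.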
When analyzing response time in the pitch-labeling task for correct responses, the ANOVA revealed a significant main effect of timbre, F(1, 17) = 12.689, p = .002, with longer response times for vocal tones than for piano tones.
Pearson’s correlation analysis revealed a moderate negative correlation between response time and accuracy, indicating that longer response time was associated with lower accuracy, r = –.432, p < .001. We also investigated the possible associations between musical background (age at the start of musical training, daily hours of training, years of formal training, and main instrument of training) and pitch-labeling performance (accuracy and response time). Results indicated a significant positive correlation between daily hours of practice and accuracy, r = .241, p = .04, and a positive correlation between age of onset of musical training and response time, r = .271, p = .02. When we considered associations between performance and music background in relation to each timbre and key color, no significant associations between these variables were observed. In addition, participants’ age, gender, and instrument of training were not associated with task accuracy or response time.
Pupil dilation results
The analysis of the average pupil dilation, as shown by the size of the area under the curve, revealed that there was a significant main effect of timbre on the 3-s pupil dilation medians, F(1, 17) = 7.469, p = .01, with larger pupil dilation for vocal tones than for piano tones.

Figure 4. Grand Average (All Participants) of Pupil Dilation Measures (Arbitrary Units) During Pitch-Labeling of Voice Tones (Red Line) and Piano Tones (Blue Line) as a Function of Time. The 0-s Line Indicates the Stimuli Onset, While the 3-s Line Indicates the Endpoint for the Area-Under-the-Curve Measure.
Results also indicated a significant main effect of key color on the average pupil response, F(1, 17) = 5.523, p = .03, with greater dilation for pitches corresponding to black keys than to white keys.

Figure 5. Grand Average (All Participants) of Pupil Dilation Measures (Arbitrary Units) During Pitch-Labeling of Tones in the Voice Timbre for Pitches Corresponding to the White Keys (Red Line) and Black Keys (Blue Line) as a Function of Time. The 0-s Line Indicates the Stimuli Onset, While the 3-s Line Indicates the Endpoint for the Area-Under-the-Curve Measure.

Figure 6. Grand Average (All Participants) of Pupil Dilation Measures (Arbitrary Units) During Pitch-Labeling of Tones in the Piano Timbre for Pitches Corresponding to the White Keys (Red Line) and Black Keys (Blue Line) as a Function of Time. The 0-s Line Indicates the Stimuli Onset, While the 3-s Line Indicates the Endpoint for the Area-Under-the-Curve Measure.
Pearson’s correlations were conducted to determine whether there were possible associations between musical background and pupillary responses. Overall, results indicated a significant negative correlation between pupil dilation and the age of onset of musical training, r = –.298, p = .01, and a significant positive correlation between pupillary response and years of formal musical training, r = .314, p = .009. When considering associations between pupillary response and music background in relation to each timbre and key color (Table 2), there was a significant negative correlation between the age of the start of musical training and pupil dilation during the identification of pitches corresponding to black keys in the piano timbre, r = –.586, p = .01.
Table 2. Pearson’s Correlations Between Musical Background Variables (Age at the Start of Musical Training, Daily Hours of Training, Years of Formal Training) and Pupil Dilation (Area Under the Curve) as a Function of Timbre and Key Color.
Discussion
This study tested whether pitch identification and labeling in different timbres (piano and voice) would require different amounts of cognitive resource allocation in AP possessors. Based on previous behavioral research, we hypothesized that labeling vocal tones would demand greater cognitive effort than piano tones, which would be reflected behaviorally with differences in response time and accuracy as well as physiologically with greater pupil dilation.
Our behavioral results partially support this hypothesis. Our results revealed no statistically significant differences in accuracy when comparing pitch identification in piano and vocal tones, and an advantage for white-key pitches over black-key pitches was observed only when assessing labeling deviation to target pitch. On the other hand, our data revealed consistently longer response time for vocal tones in relation to piano tones. Overall, this result corroborates previous research suggesting that response latency in note-naming tasks by AP possessors is influenced by the timbre of the stimuli presented (e.g., Marvin & Brinkman, 2000; Miyazaki, 1989; Pantev et al., 2001; Vanzella & Schellenberg, 2010). Our results are also in line with studies demonstrating a key-color effect on pitch-labeling performance (Miyazaki, 1990; Takeuchi & Hulse, 1991), as participants were generally slower to label tones corresponding to the black keys than to white keys.
Timbre and key-color effects were also observed in our physiological data. We found that pupillary responses were significantly different across timbre conditions, with larger pupil dilation during pitch-labeling of vocal tones than piano tones. We also observed that pupil dilation was significantly greater when labeling tones that corresponded to black keys when compared to white keys, suggesting an influence of key color on pitch-labeling ability. These findings agree in part with, and extend, previous research (Schlemmer et al., 2005). Our results concur with Schlemmer and colleagues in relation to the key-color effect on pitch labeling for AP possessors, as they also reported a greater pupil dilation for pitches corresponding to the black keys in relation to white keys. However, the authors reported only a marginal difference in pupillary response relating to timbre, which was likely due to a lack of statistical power given the small study sample or to the fact that they used the peak of the pupil dilation response as a measure of pupil response, whereas we used the area under the curve. Therefore, the results reported in the present study expand previous findings by showing a significant difference in pupillary responses in relation to vocal and instrumental timbres during pitch identification and labeling for AP possessors.
Overall, our findings suggest that pitch identification in the natural voice may indeed require greater cognitive effort compared to instrumental timbres (i.e., piano) for AP possessors. One possible explanation for the note-naming difficulty for voices among AP possessors is that hearing voices or voice-like stimuli automatically activates neural pathways devoted to decoding linguistic and paralinguistic information and that this activation interacts with the identification of the pitch of stimuli produced by the human voice. More specifically, for AP possessors, voices may be inextricably linked with decoding meaning, which might then interfere with the decoding of non-referential information (pitch chroma), particularly when the task also requires mappings with atypical linguistic information (i.e., note names). Human voice sounds carry crucial non-linguistic information, aside from phonological information (Belin et al., 2002, 2004), and growing evidence suggests that non-linguistic properties of speech (such as the speaker’s identity or emotional state) are integrated into the decoding of meaning (Kreitewolf et al., 2014; Nygaard, 2005; Van Berkum et al., 2008). The finding that AP possessors’ ability to identify tones is affected by the timbre of the natural voice may indicate that non-linguistic information is not discarded in the process of pitch perception but is instead intrinsically represented, consequently demanding greater cognitive resource allocation to disentangle and decode non-referential information (pitch chroma) from vocal stimuli. In line with this hypothesis are studies showing that vocal sounds impair pitch processing in tasks requiring rapid note-naming (Vanzella & Schellenberg, 2010) and tuning judgments (Hutchins et al., 2012), but enhance memory for melodies (Weiss et al., 2012, 2015).
This interpretation is consistent with event-related potential studies showing that human voice elicits an early voice-specific response (Levy et al., 2001, 2003) that is distinctive from neural responses elicited by non-vocal sounds (Capilla et al., 2013; Charest et al., 2009; Gunji et al., 2003). There is also accumulating evidence for voice-sensitive cortical areas along the superior temporal sulci and superior temporal gyri that are more active in response to vocal sounds (whether speech or non-speech) than non-vocal sounds, such as environmental sounds or noise (Aglieri et al., 2018; Agus et al., 2017; Belin et al., 2000, 2002; Binder et al., 2000; Fecteau et al., 2004; Grandjean et al., 2005; Kriegstein & Giraud, 2004; Scott et al., 2000). When neural responses to voice and musical instruments are compared, research has demonstrated that the rapid brain response elicited by voices is indicative of an attentional enhancement generated by the significance of voice stimuli for human listeners (Levy et al., 2003). Previous research has suggested that greater pupil dilation for vocal than instrumental melodies could be indicative of heightened arousal due to the salience of the human voice (or vocal music) to human listeners (Weiss et al., 2016).
Alternatively, previous research has indicated that familiarity (Marvin & Brinkman, 2000; Takeuchi & Hulse, 1991) and years of exposure to a particular instrument (e.g., Brammer, 1951; Miyazaki, 1989, 1990; Schlemmer et al., 2005) can influence pitch-labeling ability. Specifically, it has been suggested that the key-color effect on the speed and accuracy of pitch identification may be linked with greater familiarity with and exposure to Western tonal music (Huron, 2006; Takeuchi & Hulse, 1991), where white-key pitches occur more frequently than pitches corresponding to the black keys (see also Ben-Haim et al., 2014; Simpson & Huron, 1994). It is also possible that early music training (Baharloo et al., 1998; Brown et al., 2002; Crozier, 1997; Deutsch et al., 2006; Levitin & Zatorre, 2003; Miyazaki, 1990; Russo et al., 2003; for review, see Deutsch, 2013) and years of exposure to a particular instrument (e.g., Brammer, 1951; Miyazaki, 1989, 1990; Schlemmer et al., 2005) may facilitate pitch labeling in specific timbres, as it is assumed that the learned association between a certain timbre and a fixed musical scale would result in the most readily accessible representation for the template of a pitch chroma. Research generally agrees that pitch identification is significantly worse for unfamiliar timbres (Athos et al., 2007; Baharloo et al., 1998; Lockhead & Byrd, 1981; Miyazaki, 1990). Evidence from neurophysiological investigations also indicates that musicians exhibit timbre-specific gamma-band activity while listening to musical stimuli (Shahin et al., 2008) and that music training enhances auditory cortical representations for the timbre of the instrument of practice (Pantev et al., 2001).
Our results also suggest possible associations between music background (i.e., hours of daily training, age of onset of music training, and years of formal musical training) and pitch-labeling performance for AP possessors, with a positive correlation between the number of hours of daily practice and accuracy as well as a positive association between the age at which participants started their formal training and response time. We also found indications of associations between musical experience and pupillary responses, with a negative correlation between the age of onset of musical training and pupil dilation and a positive correlation between years of formal musical training and pupil response. However, limited conclusions can be made regarding possible associations between musical background, pupil dilation, and pitch-labeling abilities in the present study given our relatively small sample. Thus, future research is needed to better understand to what extent aspects of musicians’ musical experience, such as hours of daily training or the age of onset of music training in a particular instrument (piano or voice), directly interact with pupil dilation and pitch-labeling ability in AP possessors.
It may also be of interest to examine potential differences in cognitive effort required during pitch identification and labeling in other instrumental timbres. While the present findings corroborate previous research indicating a particular effect of vocal sounds on note-naming difficulties in AP possessors (Vanzella & Schellenberg, 2010), there are indications that the piano timbre might play a particular role in AP (Brammer, 1951; Li, 2021). The piano timbre is frequently used in AP research and is one of the most common musical instruments used in this context (reviewed in Deutsch, 2013; Takeuchi & Hulse, 1993). In addition, previous studies suggest that piano tones are often the easiest to identify in comparison to other non-vocal timbres (both instrumental and artificial), regardless of the participant’s main instrument of training (Li, 2021; Marvin & Brinkman, 2000; Vanzella & Schellenberg, 2010). Collectively, these findings suggest an advantage of piano tones in AP processing that may be associated with familiarity and years of exposure. Nonetheless, our own research found no evidence of a special link between piano training and piano tone identification (Vanzella & Schellenberg, 2010). Furthermore, if familiarity and early experience are associated with AP abilities, one could argue that the human voice would be one of the most familiar timbres there is, since we are exposed to human voices daily from a very early age and our ability to decode linguistic and non-linguistic information contained in voices is key for human social interactions (Belin et al., 2004). Yet, as research results indicate, the human singing voice poses significant challenges for pitch identification and labeling in AP possessors.
Thus, future research may significantly benefit from further investigations on whether behavioral and physiological results vary significantly within this population, considering that AP is not an all-or-none ability and that various factors, including timbre, affect pitch-labeling ability. That may be done, for instance, by considering the influence of music training in instruments with fixed pitches (e.g., piano) and non-fixed pitches (e.g., voice, violin) on note-naming performance and pupil response.
Greater variability in targeting and maintaining specific pitches among singers compared to fixed-pitch instruments, such as the piano, may also play a role in the decrements in pitch-labeling performance for vocal sounds reported in the present study. It is well known that pitch production is more stable in fixed-pitch instruments owing to the configurations of resonators that support vibrations at one or more resonances (see discussion in the work of Schubert & Wolfe, 2013, p. 7). Our analysis indeed indicated greater amounts of pitch variation in the vocal tones than in the piano tones. However, Vanzella and Schellenberg (2010) demonstrated that AP possessors had more difficulty identifying the pitch of vocal tones (natural and synthesized) than non-vocal tones (piano or pure tones), which corroborates the hypothesis that vocal sounds may represent a special class of auditory stimuli. In addition, this previous study also indicated that AP possessors’ difficulty in identifying the pitch of voices could not be attributed solely to vibrato, which was much more pronounced in the natural compared to the synthesized vocal stimuli.
The current study presents some limitations that need to be considered. We acknowledge that, while pupillary responses are a sensitive and reliable indicator of cognitive effort, other factors can influence pupil dilation (e.g., loudness, salience, attention intensity). Future studies with control conditions are warranted to confirm that the effects on pupil dilation reported here are directly associated with changes in cognitive effort. Also, further research is needed with a larger sample to confirm the null effect of timbre on pitch-labeling accuracy and the associations between musical background variables (e.g., hours of daily training, age at onset of music training, and years of formal music training) and performance on this pitch-labeling task.
Conclusion
The present study provides behavioral and physiological evidence that pitch identification in the natural voice requires greater cognitive effort than in instrumental timbres (i.e., piano) for AP possessors, as indicated by results showing significantly longer response time and larger pupil dilation for tones presented in the vocal timbre compared to the piano timbre. These findings may be associated with various (not mutually exclusive) factors, including the activation of neural pathways devoted to linguistic decoding by the vocal timbre, which interferes with the processing of non-linguistic features, such as pitch chroma; higher salience of the human voice than non-vocal timbres for human listeners; pitch fluctuations during singing compared to fixed-pitch instruments such as the piano; and greater familiarity with and exposure to the piano than other timbres. Further research would be of interest to disentangle the contribution of these factors to pitch perception and identification of tones in different timbres. Overall, these findings expand the current knowledge regarding how pitches in different timbres are processed by musicians with varying degrees of AP and open new avenues for the investigation of potentially different cognitive mechanisms involved in the processing of the human voice and musical timbres by AP possessors.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
