Abstract
The purpose of this study was to examine the effects of score study and conducting gesture on collegiate musicians’ ability to detect errors in a choral score. Is there a combined effect of gesture and score study that impacts undergraduate conductors’ identification of errors in a score? Participants (N = 53) viewed a sequence of four choral score excerpts presented via Zoom video conferencing. We asked participants to identify errors under one of four conditions: score study with a correct model recording, conducting with a simple timekeeping pattern while listening, both score study and conducting, or neither. After listening to each excerpt, participants notified the researcher about the exact location, voice part, and error type of any error that they heard. There were significant differences among conditions, with post hoc tests indicating superior error detection scores for the score study conditions. Conducting during the error detection task resulted in lower error detection scores, especially when not preceded by score study with a correct aural model.
To prepare for rehearsals, music teachers frequently engage in score study to become more familiar with the music they will teach. Familiarity with the music score leads to a greater understanding of possible musical challenges for students when they perform the piece (Chaffin, 2011; Ellis, 1994). The ability to readily detect errors could yield more efficient rehearsals through teachers being more predisposed to responding to problems that occur in students’ performance of the music.
There are specific traits and skills that appear to assist listeners with error detection success. Being a pianist (Hopkins, 1991) and being an excellent sight-singer (Killian, 1991) provided advantages in error detection tasks, demonstrating how error detection is inextricably linked with aural skills proficiency (Stambaugh & Nichols, 2020). Receiving contextual sight-singing and aural skills training (Sheldon, 1998) were beneficial to musicians for this same reason. Degree status and experience also played a role. For instance, Byo (1997) found that graduate students were more accurate than undergraduates at detecting errors. Researchers in earlier studies have concluded that practicing error detection was the most important variable that contributed to success in error detection acuity (Deal, 1985; DeCarbo, 1982; Malone, 1985; Ramsey, 1979).
Researchers have found that particular musical elements affect error detection success. For example, music majors detected errors more easily when asked to do so in one-part as opposed to multiple-part excerpts (Byo, 1997). Crowe (1996) found the same to be true with beginning conductors, with error detection becoming more difficult as the number of parts increased. Errors have been more readily identified in single-timbre versus multiple-timbre excerpts (Byo, 1993) and in homorhythmic versus polyrhythmic textures (Byo, 1997). Rhythmic discrepancies were easier to detect than wrong pitches (Byo, 1993), and errors in outer voices were easier to detect than those in inner voices (Hayslett, 1992; Huron, 1989). Also, errors in a soprano instrumental or vocal line have consistently been more readily detected than those in a bass line (Byo, 1993; Napoles, 2012; Napoles et al., 2017; Sheldon, 2004; Williams, 2022).
Engaging in other physical activities while attempting to identify errors seems deleterious to error detection success. For example, singing while listening hindered undergraduates’ ability to detect pitch and rhythm errors in multiple-part textures (Byo & Sheldon, 2000). Playing the piano while listening for errors also inhibited preservice teachers’ error detection skills (Napoles et al., 2017); one explanation provided for this difficulty was that having to focus on two tasks simultaneously (i.e., where a skill is not sufficiently automatized) proved detrimental to participants’ accuracy. When investigating whether lip synching (mouthing words but not making audible noise—a common practice for choral teachers) would impact error detection in a choral score (Napoles, 2012), the amount of practice confounded results. Collegiate musicians performed better on the second task they were given irrespective of lip-sync condition. Taken together, contextual interference in error detection tasks seems to be mitigated by experience and/or added study. Component rehearsal skills must be sufficiently automatized so that a teacher can rapidly shift attention from one task to another.
Researchers have investigated effects of conducting and conducting gestures with respect to error detection tasks. In one of the earliest studies in this line of research, Forsythe and Woods (1983) found that the act of conducting hindered error detection skills, positing that “the usual process of attending separately to various aspects (ear training in theory, conducting in conducting classes) and then expecting the two to merge in rehearsal techniques classes . . . may be inadequate” (p. 31). Blocher (1986), however, did not find that conducting significantly affected the error detection ability of college band instrumentalists. Musicians in his study who were not assigned to conduct did perform better in error detection, but not significantly so, and graduate and upper-level undergraduate conducting students were more successful in detecting errors than were lower-level students. Building on this work, Stiffler (2004) found that students who did not conduct during error detection identified significantly more errors in a MIDI-generated listening stimulus than did those who did conduct. Using a within-subjects study design, Waggoner (2011) observed that undergraduate music education students were more successful detecting errors in a recording than when conducting a live ensemble. Accuracy of identifying pitch and rhythm errors varied according to ensemble texture. Error detection in conducting thus appears to be a skill-based activity affected by context, task, and experience.
Researchers have examined whether particular score study methods are more effective than others as manifested in conductors’ error detection skills. These score study approaches have included study with the score alone, study with the score and a correct aural reference (i.e., a recording of the music being studied), and score study at the keyboard. Crowe’s (1996) beginning conductors were significantly more effective when studying a band score with a correct aural example compared to study with the score alone. Similarly, Hopkins (1991) found that student conductors detected errors in a choral score with significantly more accuracy when provided with a correct recorded example compared to score study at the piano alone. College musicians who listened to an excerpt immediately before evaluating it for errors performed better on an error detection task than when they only sight-read the excerpt (de Stwolinski et al., 1988). It seems clear from these studies that strategies used to generate an aural image of the music contributed to improved error detection skills. Still, it seems prudent to also consider score study and error detection in the context of conducting activities—as what would ordinarily be done in a rehearsal context (cf. Montemayor et al., 2016; Montemayor & Moss, 2009). Note that participants in each of the aforementioned studies performed their respective error detection tasks in controlled settings without gesturing at the same time.
Score study notwithstanding, gesturing while attempting to detect and correct errors during rehearsal may inhibit musicians’ error detection abilities (Forsythe & Woods, 1983). However, we found no studies where conducting gesture and score knowledge were considered in combination with respect to musicians’ error detection acuity. Determining how and whether conductors’ knowledge of the score and their gesture interact while attempting to detect errors seems important for conductor teacher educators who help novice conductors identify ways to solve individual and ensemble errors, thus maximizing their rehearsal efficiency. Such an examination would lend empirical insight to recommendations seen in conducting, score study, and ensemble pedagogy textbooks (e.g., Battisti & Garofalo, 1990; Feldman et al., 2021; Labuta & Matthews, 2018), which, in general, call for limited use of both recordings and gesture during the score study process so that one might best develop an independent aural image and interpretation of the music being studied (and the skills to do the same). Therefore, the purpose of this study was to examine the effects of score study and conducting gesture on collegiate musicians’ ability to detect errors in a choral score. The following question guided this study: Is there a combined effect of gesture and score study that impacts undergraduate conductors’ identification of errors in a score?
Method
Participants
Participants (N = 53) were undergraduate music majors who had taken at least one conducting course at one of two large schools of music in the midwestern and southwestern United States. Demographic data indicated our participants’ gender identity (male, n = 27; female, n = 25; nonbinary, n = 1), year in school (sophomore, n = 5; junior, n = 24; senior, n = 24), major (music education, n = 45; music performance, n = 2; performance and education double, n = 5; music undecided, n = 1), performing emphasis (voice, n = 31; woodwind, n = 6 [one participant reported a voice and woodwind double emphasis]; brass, n = 13; strings, n = 3; percussion, n = 1), years of private piano lessons (M = 3.03 years, SD = 4.52), semesters of class piano instruction (M = 3.43 semesters, SD = 1.26), and completed semesters of conducting class (M = 1.75 semesters, SD = 0.68).
Before we recruited participants, we conducted an a priori power analysis using G*Power software, Version 3.1.9.2 (Faul et al., 2007) to identify a minimum sample size with an acceptable level of statistical power to identify an effect (Cohen, 1988). Given the use of a repeated measures analysis of variance (ANOVA), an intended test power of .80, a significance level of α = .05, and a small effect size defined by Cohen (1988) as d = 0.20, the results of the power analysis indicated a minimum sample size of 36.
Music Stimulus Materials
We used four excerpts of choral music that had been used in previous studies examining collegiate musicians’ error detection abilities (Napoles, 2012; Napoles et al., 2017). These four-part (soprano, alto, tenor, and bass) hymn tunes were mostly diatonic, eight measures in length, and featured moderate tempi and simple rhythms. We deemed the excerpts to be of equivalent difficulty and comparable to material that both vocal and instrumental music majors would encounter in their music theory coursework. Consistent with other previous studies in this area (Byo, 1993, 1997), the first measure was error-free to establish a clear tonality. Each of the four professionally recorded audio excerpts included two intentionally performed errors: a pitch error in the soprano voice and a rhythmic error in the bass voice. We utilized professional singers in the recording, two on a part, singing all excerpts on text. Errors were present only in the outer voice parts to facilitate error detection (Hayslett, 1992; Huron, 1989). The excerpts were transcribed using Sibelius (with errors corrected) and converted to PDF files for use as participants’ score excerpts.
Procedure
We used a within-subjects design in which all participants rotated through each condition, namely (a) score study (with correct aural referent) with no conducting gesture during error detection task, (b) score study (with correct aural referent) with conducting gesture during error detection task, (c) no score study with conducting gesture during error detection task, and (d) no score study/no conducting gesture during error detection task. To control for order effects, participants were assigned to one of four orders through the use of a 4 × 4 Latin square. Complete procedures by condition are presented in Table S1 (see online supplemental material).
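The counterbalancing described above can be illustrated with a cyclic 4 × 4 Latin square. This is a minimal sketch only: the article does not report which particular square was used, and the condition labels and the function name `latin_square` are hypothetical.

```python
def latin_square(items):
    """Return a cyclic Latin square: row i is `items` rotated left by i places."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical short labels for conditions (a) through (d).
conditions = ["study", "study+gesture", "gesture", "neither"]
orders = latin_square(conditions)

# Each row is one presentation order; each condition appears once per
# row (order) and once per column (serial position).
for number, order in enumerate(orders, start=1):
    print(f"Order {number}: {' -> '.join(order)}")
```

A cyclic square of this kind balances the serial position of each condition across orders, but it does not balance which condition immediately precedes which.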
Although we had initially planned for an in-person protocol, we adapted our procedures to a virtual format via Zoom as a result of the COVID-19 pandemic. We created four videos (one for each of the orders) with the primary author explaining procedures and inserting scores and audio directly into the videos so that the video would replace the procedures that would have taken place in person. Following pilot tests (n = 8), we made minor changes to strengthen the clarity in the protocols and language, and we confirmed the feasibility of the virtual tasks. Each of the videos was approximately 22 minutes in duration.
We recruited participants via email announcements to students enrolled in music education courses. Once they agreed to participate, we provided participants with Institutional Review Board approved consent forms via the survey platform Qualtrics and instructions for materials to bring to the Zoom session, which was scheduled individually with one of the members of the research team. At the beginning of this session, we ensured participants had the necessary materials (including a laptop or desktop computer, headphones or earbuds, and a pencil and scratch paper) and then shared our screen and audio to begin the procedures. In the video, we provided instructions and led participants through a practice example of one of the tasks they would be asked to complete (the no score study/conducting gesture condition but with different music) to familiarize them with the general research protocol. We explained that they were to listen specifically for pitch errors and for rhythm errors. We gave participants the opportunity to ask questions before continuing with the study.
Participants viewed a series of four choral music score excerpts while listening to a recording of the excerpt that contained errors. For the score study conditions, before listening to the recording with errors, they were given three opportunities to listen to a high-quality, correct model recording interspersed with two 30-second periods to silently study the score without gesturing or making audible sounds. For the conducting gesture conditions, while listening to the recording with errors, participants were asked to gesture along with the recording as if conducting with a simple timekeeping pattern. Also for the conducting gesture conditions, we preceded the audio excerpt with a one-measure click track to establish the tempo. We confirmed each participant’s continuous use of headphones or earbuds and their compliance with the studying and gesturing protocols during the Zoom session.
Participants listened to each excerpt with errors twice, with a 30-second pause in between. Following the second excerpt, the research team member asked the participant to verbally identify the location (measure, beat number, and voice part) of any errors that they heard and to specify each one as either a pitch error or a rhythm error. Participants were permitted to take notes on their scratch paper during the error detection task. We did not indicate to participants how many errors were embedded in each excerpt.
Scoring
Similar to previous studies (Napoles, 2012; Napoles et al., 2017), we awarded one point for identifying the correct location (i.e., measure and beat) of the error, and if that was answered correctly, we awarded one additional point for the correct type of error (pitch or rhythm) and another point for the correct voice part of the error (soprano or bass), for a total of three possible points per error. Given that each excerpt had two errors (one in the soprano part, one in the bass part), there was a total of six possible points per excerpt and 24 total possible points for the entire task. We did not award fractions of a point, nor did we deduct points for identifying phantom errors.
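The scoring rule above can be expressed as a short sketch. The dictionary keys, function names, and example error locations are hypothetical and chosen only for illustration; how a response was matched to an embedded error is our assumption (here, the best-scoring response per error).

```python
def score_response(response, error):
    """Score one reported error against one embedded error (0 to 3 points)."""
    # The location (measure and beat) must be correct before any points count.
    if (response["measure"], response["beat"]) != (error["measure"], error["beat"]):
        return 0
    points = 1
    points += int(response["type"] == error["type"])    # pitch or rhythm
    points += int(response["voice"] == error["voice"])  # soprano or bass
    return points


def score_excerpt(responses, errors):
    """Sum the best score per embedded error; phantom errors are not penalized."""
    return sum(
        max((score_response(r, e) for r in responses), default=0) for e in errors
    )


# Hypothetical embedded errors for one excerpt.
errors = [
    {"measure": 3, "beat": 2, "type": "pitch", "voice": "soprano"},
    {"measure": 6, "beat": 1, "type": "rhythm", "voice": "bass"},
]
# One response with the right location and type but the wrong voice,
# plus one phantom error that matches nothing.
responses = [
    {"measure": 3, "beat": 2, "type": "pitch", "voice": "alto"},
    {"measure": 5, "beat": 4, "type": "rhythm", "voice": "bass"},
]
print(score_excerpt(responses, errors))  # 2
```

With two errors per excerpt, a perfect excerpt earns 6 points, and four excerpts give the 24-point maximum.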
Results
Prior to our main analysis, we reviewed our data set for assumptions of ANOVA testing. Although results of Shapiro-Wilk tests for normality were significant for each variable, values of skewness and kurtosis measurements were within acceptable limits (i.e., all skew values were smaller than 0.46, and all excess kurtosis values were smaller than 1.71), and visual inspection of the histogram plots indicated reasonably even distributions. Given that the numbers of participants within each presentation order were nearly equal and that ANOVA testing is considered robust to minor violations of the normality assumption (Blanca et al., 2017; Mertler & Vannatta Reinhart, 2017; Schmider et al., 2010), we felt comfortable in proceeding with parametric tests. We also checked for significant differences among orders, between data collection sites, and between vocalists and instrumentalists on total error detection scores, and for any interactions thereof, and found none; nor did we find differences on error detection scores among the four excerpts (ps > .05 on all comparisons).
The score study only condition yielded the highest error detection score (M = 3.13, SD = 1.26); scores were just slightly lower when score study was followed with gesturing during the error detection task (M = 2.70, SD = 1.53). Scores were notably lower in the no score study conditions both when gesturing during the error detection task (M = 1.72, SD = 1.74) and when not gesturing (M = 1.91, SD = 1.57). We analyzed these data with a repeated measures analysis of variance test. Results indicated that these scores were significantly different from one another, F(3, 156) = 11.05, p < .001, ηp² = .17. Pairwise comparisons with a Bonferroni correction revealed that error detection scores for the score study only condition were significantly higher than those for both the gesture only (p < .001, 95% confidence interval [CI] = [0.63, 2.20]) and no study or gesture (p < .001, 95% CI = [0.43, 2.03]) conditions. Error detection scores for the score study with gesture condition were also significantly higher than those for the gesture only condition (p = .013, 95% CI = [0.15, 1.81]). No significant differences were seen between the score study with gesture and the score study without gesture conditions or between the no score study with gesture and the no score study without gesture conditions. The difference between the score study with gesture and the no score study without gesture conditions was also not statistically significant. These results are depicted in Figure 1.

Figure 1. Error Detection Scores According to Study and Gesture Condition.
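As a quick arithmetic check on the omnibus test, the degrees of freedom and the number of pairwise comparisons follow directly from the design (53 participants, four conditions). The sketch below assumes the standard repeated measures formulas with no sphericity adjustment; the condition labels are shorthand of our own.

```python
from itertools import combinations

n_participants = 53
conditions = ["study only", "study + gesture", "gesture only", "neither"]
k = len(conditions)

# One-way repeated measures ANOVA: df = (k - 1, (k - 1)(n - 1)).
df_effect = k - 1
df_error = (k - 1) * (n_participants - 1)
print(f"F({df_effect}, {df_error})")  # F(3, 156)

# Every unordered pair of conditions is one pairwise comparison.
pairs = list(combinations(conditions, 2))
print(len(pairs))  # 6
for first, second in pairs:
    print(f"{first} vs. {second}")
```

Four conditions therefore yield six pairwise comparisons, which is the family over which a Bonferroni correction would be applied.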
In an exploratory effort, we conducted a separate analysis of our data to see if our experimental conditions might have affected error detection scores differently according to error location and type (i.e., soprano pitch error or bass rhythm error). Results of a two-way ANOVA indicated that there was indeed a significant Condition × Location interaction, F(3, 416) = 4.54, p = .004, ηp² = .03. As depicted in Figure S2 (see online supplementary material), bass rhythm error detection mean scores were relatively stable across conditions, whereas scores for soprano pitch errors followed the general pattern found in our initial analysis.
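Partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity ηp² = (F × df effect) / (F × df effect + df error). The sketch below applies it to the interaction reported above; the function name is hypothetical, and the comment on how 416 arises is our reading of the design rather than something the analysis states.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Condition x Location interaction: F(3, 416) = 4.54.
# Assumption: 416 is consistent with 53 x 4 x 2 = 424 scores
# minus 4 x 2 = 8 design cells.
eta = partial_eta_squared(4.54, 3, 416)
print(round(eta, 2))  # 0.03
```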
Discussion
The purpose of this study was to examine the effects of score study and conducting gesture on collegiate musicians’ ability to detect errors in a choral score. We found that participants detected the most errors in the score study condition, when they were not being asked to engage in any additional physical tasks. This finding is congruent with previous research that showed singing (Byo & Sheldon, 2000), lip synching (Napoles, 2012), and playing the piano (Napoles et al., 2017) interfered with error detection acuity by competing for focus of attention. Other researchers had determined previously that score study with a correct aural referent was a superior method for detecting errors compared to score study at the piano (Hopkins, 1991), study with the score alone (Crowe, 1996), and sight-reading the excerpt (de Stwolinski et al., 1988). Montemayor and colleagues (Montemayor et al., 2016; Montemayor & Moss, 2009) also detected some distinct advantages to aural model-supported rehearsal preparation as evidenced by novice conductors’ discrete rehearsal behaviors.
The findings of our study are somewhat similar to those of Forsythe and Woods (1983), Stiffler (2004), and Waggoner (2011), all of whom observed that gesturing while attempting to detect errors inhibited musicians’ error detection abilities. Authors of all three of these studies questioned the practice of teaching conducting skills without simultaneous integration of error detection training. Figure 1 shows that the additional task of gesturing resulted in slightly lower scores (Ms = 3.13–2.70) for our participants. Blocher’s (1986) participants also scored higher in an error detection task when they did not conduct. Our study differs because we considered conducting gesture during error detection in combination with score study, a process that we believed added ecological validity to our experimental design and one that is often discussed by conducting pedagogues when preparing to lead an ensemble (Feldman et al., 2021; Labuta & Matthews, 2018).
Although error detection scores within each score study condition were highest when not accompanied by gesture, we recognize that within those conditions, the difference between each gesture and no gesture condition was not statistically significant. This finding may suggest some degree of gesture automaticity among even novice conductors such that deleterious effects of gesture were marginal. It may also reflect the relatively straightforward gestural task assigned to participants in that they were requested to show only a “simple timekeeping pattern.” Neither expressivity nor accuracy of gesture was needed or evaluated. Importantly, although participants were executing conducting gestures (successfully so, in our judgment), our procedures did not also incorporate other duties normally concomitant with conducting, such as establishing tempo, conveying musical character, or providing feedback through gesture (all components of the cognitive load associated with conducting; cf. Bodnar, 2017; Chaffin, 2011). Although participants were tasked with error detection, they were not also providing corrections for these errors in a live social/musical context. To that end, we see even our slight differences as instructive. It should be noted that the gesture condition did not enhance error detection acuity, as might be expected if conducting activated an array of monitoring skills. Such was not the case among these preservice teachers or among those in earlier studies. Altogether, we recognize the fundamentally dynamic and interpersonal nature of musical leadership, elements that warrant further systematic investigation with regard to error correction. We are also curious about how an error monitoring task might affect the quality of conducting gestures themselves, particularly among novice conductors whose skills in these regards are not yet automatized to the point of being able to rapidly shift attention from one task to another.
Given our finding that studying a music score with a correct model recording led to improved error detection and correction skill, we have three pedagogical suggestions for conducting and rehearsal techniques instructors to consider. First, error detection and correction activities within rehearsal techniques courses could be occasionally decoupled such that undergraduate conductors start the ensemble (or section) by verbally counting off or giving a preparatory gesture, followed by listening only to the performance without gesturing. This strategy could give confidence to novice conductors as they begin refining their error detection and correction skills by eliminating a source of interference. Second, basic conducting instructors could synthesize conducting gesture and error detection and correction from the beginning stages of conducting instruction. This synthesis might instill the idea that conducting involves a host of skills (e.g., leadership, error detection and correction, facial expression) rather than gesture alone. A future experiment comparing the error detection and correction skills of novice conductors who practiced while either conducting or not conducting may inform the curricular design of undergraduate conducting and rehearsal technique courses. Third, we suggest that instructors ask undergraduate conductors to compile a list of potential errors that could occur in the music that they are conducting and to provide rehearsal techniques to ameliorate these problems. This approach would help novices to anticipate problems in advance versus trying to identify and confront them for the first time during rehearsal.
As evidenced by participants’ error detection scores (see Figure 1), it appears that score study—always with an aural model, in our study—had the largest effect on error detection scores. When a correct aural referent was not present, our participants had a more difficult experience with the task. As found in previous studies (Napoles, 2012; Napoles et al., 2017), our participants did not excel at detecting errors overall. Therefore, we posit that repeatedly listening to a correct version of the score is advisable for undergraduates when endeavoring to detect errors in that score. Furthermore, because participants in our study were most successful detecting errors after they had studied the score, irrespective of gesture, perhaps conducting instructors should mandate processes whereby students must demonstrate knowledge of their scores before conducting or rehearsing an ensemble. Some examples could include a recorded discussion of the musical characteristics of the piece, completion of a score study self-checklist, or evidence of a marked score. Allowing novice conductors to rehearse without knowing the basic music elements found in their scores can likely only result in a poor musical experience for all.
We recognize that some professionals (e.g., Battisti & Garofalo, 1990) caution against the undue influence of recordings in the score study process as it relates to musical interpretation and long-term development of score reading skills—variables that, to our knowledge, have yet to be empirically examined (aside from self-report measures about musicians’ preferred use of recordings; see Volioti & Williamon, 2017, 2021). Still, in light of this finding and those from earlier research (Montemayor et al., 2016; Montemayor & Moss, 2009), using recordings seems to be a generally advisable practice for developing an aural image of the music in preparation for rehearsal. Conscientious conducting pedagogues and other music teacher educators might consider initially using recordings for this purpose, then gradually omitting them to help novice teachers develop independent score reading skills. In all cases, it would be advisable for conducting instructors to discuss characteristics of excellent performances, such as tone quality, balance, blend, and intonation. Providing lists of excellent collegiate and professional ensembles would help guide students to outstanding recordings.
As with previous studies using these excerpts (Napoles, 2012; Napoles et al., 2017), our participants experienced more difficulty locating bass rhythm errors compared with soprano pitch errors. In fact, our statistical analyses show that responses to our experimental conditions were different when participants were detecting bass and soprano errors. Successful identification of the soprano errors seemed particularly dependent on the score study condition, whereas bass errors were rarely identified irrespective of the additional tasks. Listeners’ proclivity to attend to top-voice errors has also been found in earlier research using multivoice instrumental music (Byo, 1993; Sheldon, 2004) and in a recent investigation by Williams (2022), who posited that difficulties in detecting bass errors could be a function of acoustic properties, visual placement of the melodic line, and/or a learned prioritization of melodic over harmonic material. Future investigations could explore these phenomena as they relate to gesture and score study, including examining whether differences in error detection skill are due to the voice part or the nature of the error. Additionally, it is unclear how participants utilized their trials when listening; future researchers might employ a think-aloud design that could further elucidate the analysis process and why bass errors were so difficult to detect.
The interaction among score study, conducting, and error detection and correction skills remains an important topic worthy of continued investigation. Although we know that the addition of multiple timbres (Byo, 1993) and an increase in rhythmic complexity (Byo, 1997) impede students’ ability to detect errors, future researchers might ask: Would the same effects be found when increasing the gestural demands of a piece of music, or would the effects be compounded by a lack of visual/auditory score preparation? How does the prescribed tempo of music influence error detection and correction skills? These are just some of the questions that could be explored in future research studies. As conducting instructors continue to refine undergraduate conducting and rehearsal techniques curricula, it seems important to identify ways in which novice conductors can be better prepared to teach, rehearse, and conduct the musicians in their ensembles.
Supplemental Material
sj-pdf-1-jrm-10.1177_00224294221090432 – Supplemental material for Effects of Score Study and Conducting Gesture on Collegiate Musicians’ Ability to Detect Errors in a Choral Score
Supplemental material, sj-pdf-1-jrm-10.1177_00224294221090432 for Effects of Score Study and Conducting Gesture on Collegiate Musicians’ Ability to Detect Errors in a Choral Score by Mark Montemayor, Jessica Nápoles, Brian A. Silvey and Lia Wiese in Journal of Research in Music Education
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
