Abstract
The purpose of this study was to validate previous research that suggests using movement in conjunction with singing tasks can affect intonation and perception of the task. Singers (N = 49) were video and audio recorded, using a motion capture system, while singing a phrase from a familiar song, first with no motion, and then while doing a low, circular arm gesture. Analysis of relationships between a circular singer arm gesture and changes in intonation indicated most singers (67.3%, n = 33) were closer to the target pitch when doing the low, circular gesture. Additionally, significant correlations were found between motion of the hand and face. Participant perceptions of singing with motion included “fuller tone” and “more breath” with the lower motion and singing without motion viewed as “easy” and “comfortable.” Results of this study suggest that singing with motion can affect intonation, other bodily movements, and perception of singing.
Many vocal teachers and choral conductors believe the whole body is the conducting gesture and singer response is based on the conductor’s visual model (Eichenberger & Thomas, 1994). While extensive research has focused on singer gesture (Krause, 1983; Linklater, 1976; Peterson, 2000; Pierce, 2007; Thurman & Welch, 2000; Wis, 1993), empirical data on the possible relationships of singer gesture to vocal sound are limited. Movement is an important pedagogical tool incorporated in widely used music methodologies, including Orff-Schulwerk, Kodály, and Dalcroze (Elliott, 2005), yet association between movement and music learning has not been widely studied using technologies such as motion capture.
Motion capture technology appeared as early as the 1850s and had its roots in the work of Eadweard Muybridge. Motion capture, motion tracking, or mocap are terms used to describe the process of recording or capturing movement and translating that movement onto a digital model. A series of three-dimensional infrared camera–tracked sensors are placed on various parts of the body, providing a three-dimensional digital representation of the participant. Motion capture technology is used in the television, film, and video gaming industries. In filmmaking, the systems record human motion data in order to animate digital character models as demonstrated in popular films such as The Lord of the Rings, Polar Express, and Avatar.
Investigation into motion capture technology has focused on extensive analysis and development (Nair, Gibbs, Arnold, Abboud, & Wang, 2010), has included descriptive research (Dariush, 2003), and has resulted in redesign of systems to fit the demands of the particular research being done (Darby, Li, & Costen, 2010; Suk, Sin, & Lee, 2010). Movement of actors (Huang, Hilton, & Starck, 2010), human bouncing and jumping (Racic, Brownjohn, & Pavic, 2010), and facial expression, as well as bodily movement while experiencing emotions (Crane, 2009) have been examined using motion capture technology. Some studies have targeted small fine-motor movements, such as mouth movement and jaw movement (Jiang, Alwan, & Keating, 2000; Jiang, Alwan, Keating, Auer, & Bernstein, 2002), along with the three-dimensional movement of speech (Samaan, 2009; Yehia, Kuratate, & Vatikiotis-Bateson, 2002).
The use of motion capture technology in research on music listening or performance has been somewhat limited. In 2008, Toiviainen used a motion capture system to examine the kinematic and kinetic characteristics of spontaneous movement to music. Results indicated that musical beat was most clearly represented by movements in the vertical direction and the beat tended to be associated with bursts of instantaneous muscular power. Schoonderwaldt (2009) examined the bowing performance of violin and viola players (N = 4) using an optical motion capture system that measured velocity, force, and distance. He found that the players adapted the bowing parameters to the physical properties of the string and instrument because of the playable control parameter space. Other motion capture studies in music performance have included the guitar (ElKoura, 2003; Norton, 2008), drums (Kawakami, Mito, & Watanuma, 2008), and piano (Palmer & Bella, 2004).
In recent years, vocal and choral music have emerged as areas of interest examined via motion capture technology. Livingstone, Thompson, and Russo (2009) reported results of two experiments on facial expressions during perception, planning, production, and postproduction of emotional singing. Participants (N = 7) were recorded with motion capture as they watched and imitated emotional singing as indicated by facial expression. Facial expressions were monitored before their imitation, during their imitation, and after their imitation. Results indicated a role of facial expressions in the perception, planning, production, and postproduction of emotional singing. In another study, Kun (2004) used a responsive/interactive system able to capture a conductor’s performance in three dimensions. Finally, Manternach (2012) used motion capture to examine mimicking effects of singers observing a videotaped conductor. Results indicated that participants imitated the conductor’s rounded lips, but most did not notice changes in conductor eyebrow lifting.
The present investigation aligns with Brunkan’s (2013) study of acoustic and psychoacoustic measurements of singing taken during three conducting conditions, singer gestural training, and singer gestural movement on individual singers’ (N = 49) performances of a sung /u/ vowel in the context of a familiar song. Results indicated beneficial effects on intonation when singers performed a low, circular gesture while singing a /u/ vowel.
Purpose Statement and Research Questions
The purpose of the present investigation was to (a) analyze differences in fundamental frequency (intonation) as measured by standard acoustical measures when singers performed a low, circular singer arm gesture versus no movement; (b) assess relationships between direction of motion (measured by motion capture technology) and changes in frequency contour; (c) explore relationships between singer hand, head, eyebrow, and mouth movement (measured by motion capture technology); and (d) examine participant perceptual responses related to singing with and without gesture. Singers (N = 49), ranging in age from 18 to 72 years, sang a phrase from a familiar song under two conditions: (a) a low, circular arm gesture and (b) no movement.
To that end, the following research questions guided this investigation:
1. Does singing while performing a low, circular arm gesture produce differences in measures of fundamental frequency (intonation)?
2. To what extent do motion data correlate to frequency contours as measured by a motion capture system?
3. To what extent does a low, circular arm gesture correlate to movement of the head, nose, eyebrows, or mouth as measured by a motion capture system?
4. What do participant comments suggest about perceptions of singing with and without a low, circular arm gesture?
Method
Participants (N = 49) were male (n = 14, 29%) and female (n = 35, 71%) singers ranging in age from 19 to 67 years (M = 30 years, SD = 11), with 27 participants (55.1%) in a choir at the time of the study. A majority of participants had choral experience (40; 81.60%), as an adolescent (36; 73.42%), in high school (38; 77.65%), and in college (40; 81.60%). This convenience sample was obtained on a volunteer basis from university classes. All participants completed consent forms and stated that they were familiar with the song excerpt.
The musical excerpt used consisted of the final phrase of the melody line of “Happy Birthday to You” in the key of D major. This phrase was chosen because it was a part of a well-known composition and it ended on a sustained /u/ vowel on the word “you.” The /u/ vowel was chosen for examination because singers and speakers enunciate it clearly by rounding the lips forward to some degree, which exaggerates the acoustic effect of backness and lengthens the vocal tract. This increase in vocal tract length allows clear analysis of vowel sound.
I gave participants an information form on entering the research room, as well as a code that identified their number, group, and ordering of conditions. Participants then completed a 12-item questionnaire on demographic information, such as singing experience, age, and training. Next, the musical excerpt was played once on a keyboard (MM = 85) in the key of D major. Participants rehearsed the phrase until they felt comfortable singing in both the given key and at the given tempo.
I then directed participants to place seven passive reflective sensors on their faces: one above each eyebrow above the inside corner of the eye, one on the top of the bridge of the nose, one on each corner of the lips, and one on the top and bottom of the outer edge of lips. Participants were asked to place four additional markers on the top and side of the first and second knuckle of the pinky finger (Figure 1). They were also fitted with a black headband with four reflective sensors required for the motion capture device being used.

Figure 1. Reflectors placed on participant’s hand.
Participants stood 6 feet from a projection screen (distance from the screen front to midline of the body) that was set on a table in the corner of the room (Figure 2). A three-dimensional infrared motion capture system (OptiTrack, 1996) with seven small infrared cameras (Model V100) was placed on the table facing the participants. The placement of the cameras was checked so that they did not obstruct the participants’ view of the conductor on the screen. The infrared cameras tracked the reflections from the participants’ sensors.

Figure 2. Participant standing in front of the projection screen and infrared motion capture cameras.
Potential variability in conductor behaviors and consistency of stimuli across participants was controlled through the use of videotaped conducting. Participants saw the same conductor doing the standard conducting pattern for each repetition of the sung phrase. The conductor used a metronome (MM = 85) and a mirror during taping of a standard conducting pattern in ¾ time.
A Master-Key pitch pipe (C–C range) was used to give a starting pitch (G4) prior to each repetition of the melody. Participants were asked to sing the excerpt a cappella and from memory as they viewed the videotaped conductor on a screen in front of them. The stimulus videotape was projected such that the conductor appeared life sized, as determined by having the conductor stand beside the projected image prior to the study. Distance from recording devices was consistent for all participants and across repetitions.
A short video of the conductor doing the low, circular gesture was then shown to participants. They were instructed to do the gesture to the tempo of the phrase. They were asked to perform the low, circular gesture while watching the videotaped conductor once before being recorded.
All participants sang the phrase six times while viewing a standard conducting gesture: (a) three times while singing and doing a low, circular arm gesture (two hands, with fingers together, moving upward and outward in large circles in front of torso from the level of the navel to the sternum) and (b) three times while doing no movement as instructed by the researcher. Participants sang the phrase while watching a conductor as well as doing the movements themselves. All participants were audio and video recorded.
Following the investigation, participants removed the headband and sensors and completed a short questionnaire that included perceptual questions. Participants were asked what differences, if any, they noticed in their singing when singing while doing the low, circular gesture versus no movement.
Measurement and Analyses Processes
Sound samples were edited using Praat (http://www.fon.hum.uva.nl/praat/) and loaded onto a Dell laptop computer for subsequent playback. The middle section of each /u/ vowel from the word “you” (0.46 seconds) was edited for each participant. Fundamental frequency was measured on each selection using Praat (Boersma & Weenink, 2014). Praat applied a Gaussian-like window to compute linear predictive coefficients through the Burg algorithm integrated in the software. Fundamental frequency (Fo) was recorded into an Excel spreadsheet for subsequent analysis. Deviation from target fundamental frequency (males 146.83 Hz; females 293.66 Hz) was then converted to cents for comparison of mean deviation in cents from target fundamental frequency.
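The conversion from a measured fundamental frequency to cents can be illustrated with the standard formula, cents = 1200 × log2(f / f_target). The following is a minimal sketch rather than the procedure actually run in Excel for this study; the function name and the example frequency are hypothetical, and only the two target frequencies come from the text.

```python
import math

# Target frequencies quoted in the text (D3 for males, D4 for females).
TARGET_HZ = {"male": 146.83, "female": 293.66}


def cents_from_target(f0_hz: float, target_hz: float) -> float:
    """Signed deviation of f0_hz from target_hz in cents.

    Positive values are sharp, negative values are flat;
    100 cents equal one equal-tempered semitone.
    """
    if f0_hz <= 0 or target_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(f0_hz / target_hz)


# Hypothetical measurement, for illustration only (not study data).
example_deviation = cents_from_target(295.0, TARGET_HZ["female"])
```

Taking the absolute value of the result gives the unsigned distance from the target, which is the kind of quantity that can be compared across the gesture and no-motion conditions.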
Using the computer software (Arena Motion Capture software, Version 1.4.0) developed to accompany the motion capture system, all the participant sensors, as recorded by six of the seven infrared cameras, were trajectorized on the X (horizontal), Y (vertical), and Z (depth) planes. The seventh camera was used for audio and video recording. The audio recording was opened with WaveSurfer acoustical software (Version 1.8.5), where labels were manually placed for each of the measurement windows. A temporal subset of the motion data was analyzed on each trial to account for a variable asynchrony between the motion and acoustic data inherent in the OptiTrack recording system.
Each marker (four on the headband, one over each eyebrow, one on the bridge of the nose, four around the lips, and four on the finger) was checked for numerical labeling in order to correlate the number of the marker with the row of data in the output file. Next, using Virtual Dub (Version 1.9.11), the audio and video files were separated and saved under the participant code (number, take, condition). Then, using Tcltk (Version 1), the output of the c3d file was converted into a text file. A custom program was used to merge the motion and derived auditory/acoustic measurement files.
Text files of the numbers corresponding to the motion data in three dimensions (x [side to side], y [up and down], and z [depth]) were loaded into Excel. The numbers were labeled corresponding to the placement of the marker, participant take/number, and the dimension recorded (e.g., head X [head movement on horizontal plane], nose Z [nose movement on depth plane], and hand Y [hand movement in vertical plane]). Those numbers were then used to create correlation matrices by each marker location and dimension in order to assess possible relationships. Weak (<.30), moderate (.30–.58), and strong (>.58) correlations were noted (Stevens, 2009, p. 251). Absolute values were then charted in order to assess strong relationships, positive or negative, of the various markers and dimensions. All matrices were then reevaluated in terms of relationships observed (e.g., nose x and head x). These expected relationships were recorded. Other correlations were recorded as well (e.g., eyebrow y and hand x). Movements were measured from the time the singer began singing the phrase until the end of the phrase (5.97 seconds).
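The correlation step described above can be sketched as follows. This is an illustrative reconstruction, not the spreadsheet workflow used in the study: the function names and the series labels are hypothetical, and the only values taken from the text are the cut-offs for weak, moderate, and strong correlations.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label |r| with the cut-offs quoted in the text."""
    magnitude = abs(r)
    if magnitude < 0.30:
        return "weak"
    if magnitude <= 0.58:
        return "moderate"
    return "strong"


def correlation_matrix(series):
    """Map each pair of labels, e.g. ("hand_x", "head_z"), to its r value."""
    labels = list(series)
    return {(a, b): pearson_r(series[a], series[b])
            for a in labels for b in labels}
```

Classifying on the absolute value mirrors the charting of absolute values in the text, so that strong negative relationships are flagged alongside strong positive ones.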
The motion capture system requires that a majority of the cameras track each sensor in order to trajectorize participant movement. Depending on the height, facial structure, and body movements of each participant, the sensors occasionally moved out of view of the required number of cameras. Therefore, the analysis used case-wise deletion, thus eliminating participants who were not deemed to have reliable measurements (i.e., at least 80% of data points for each marker recording window) for all the conducted conditions in a particular comparison. All participants (N = 49) were included for at least one of the following tests.
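A minimal sketch of this case-wise deletion rule follows. It assumes, hypothetically, that each marker's recording is a list of samples in which a dropped frame is stored as None; the data layout and function names are illustrative and are not taken from the study's custom software. Only the 80% threshold comes from the text.

```python
def tracked_fraction(samples):
    """Share of frames in which the marker was tracked (None = dropped frame)."""
    if not samples:
        return 0.0
    return sum(sample is not None for sample in samples) / len(samples)


def reliable_participants(recordings, needed_markers, threshold=0.80):
    """Return participants whose needed markers all meet the threshold.

    recordings: {participant_id: {marker_name: [sample or None, ...]}}
    A participant is dropped from the comparison if any needed marker
    is missing or was tracked in fewer than `threshold` of its frames.
    """
    kept = []
    for participant_id, markers in recordings.items():
        if all(
            name in markers and tracked_fraction(markers[name]) >= threshold
            for name in needed_markers
        ):
            kept.append(participant_id)
    return kept
```

Because the check is made per comparison, a participant excluded from one test (for example, one involving a lip marker) can still contribute to another test whose markers were tracked reliably.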
Results
The first research question asked whether there were differences in measures of fundamental frequency (intonation) when singers (N = 49) performed a low, circular arm gesture compared with singing with no gesture. Most singers (67.35%, n = 33/49) were closer to the target fundamental frequency when doing the low, circular motion (M = 19.01 cents, SD = 47.53 cents) compared with baseline measures. Findings also indicated that 93.94% (n = 31/33) of participants who came closer to the target fundamental frequency when doing movement made an audible change (>7 cents). Difference in measures of deviation in cents from target fundamental frequency is shown in Table 1.
Table 1. Mean Difference in Cents From Target Fundamental Frequency for Participants Who Sang Closer to Target Fundamental Frequency While Making a Low, Circular Gesture.
Results of a paired sample t test (p < .05) of mean deviation in cents from target fundamental frequency by condition were statistically significant, t(48) = 2.77, p = .01. Measures of deviation in cents from target fundamental frequency showed that overall, participant deviation from the target fundamental frequency was lowest (most “in tune”) when doing the low, circular gesture (M = 18.97, SD = 44.94) versus doing no motion (M = 29.01, SD = 52.69). The mean difference across participants was 10.04 cents, an audible difference.
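For readers who want to see how a paired-sample t statistic is formed, the sketch below computes it from two matched lists of deviations. It is a generic illustration with a hypothetical function name, not the analysis software used in the study; it returns the statistic and its degrees of freedom (number of pairs minus one) and leaves the p value to a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t(condition_a, condition_b):
    """Paired-sample t statistic and degrees of freedom.

    Each position in the two lists must belong to the same participant.
    """
    if len(condition_a) != len(condition_b) or len(condition_a) < 2:
        raise ValueError("need two matched lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample SD (n - 1).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

The test operates on the per-participant differences, which is why each singer's own baseline serves as the control for the gesture condition.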
The second research question inquired as to the extent of correlations between motion data (direction of body movement measured by sensors) and frequency contours (direction of sung pitch during the sung phrase). Results of Pearson product–moment correlation coefficients indicated no significant correlations between frequency contour and the hand moving in the x (side to side) dimension, r(20) = −.20, p = .39; the z (depth) dimension, r(20) = .25, p = .29; and y (vertical) dimension, r(20) = .33, p = .15, of the hand markers.
The third research question addressed correlations between movement of the head, nose, eyebrows, or mouth with that of the hand as measured by tracked markers. Moderate correlations were found between some directions of movement. Results of Pearson product–moment correlation coefficients indicated significant relationships between head markers in the z (depth) dimension with hand markers in the x (side to side) dimension, r(20) = .48, p = .03 as well as hand markers in the z dimension, r(20) = .46, p = .04. These results indicate that the forward or backward movement of the head is associated with similar and sideways movement of the hand. This finding suggests that singers may move their heads forward and backward when their hands move.
Moderate, positive correlations were seen between a variety of markers. Significant correlations were found between the eyebrow marker in the y dimension and the hand marker in the y dimension, r(20) = .47, p = .04. Similar to the head marker in the y (vertical) dimension, when the hand moved vertically, so did the eyebrow. There was also a significant relationship between the bottom lip marker in the x dimension and hand marker in the x dimension, r(20) = .50, p = .02. Again, similar to the y (vertical) dimension, this finding may indicate that when the hand moved in the x (horizontal) dimension, so did the bottom lip. Moderate, significant correlations between the bottom lip marker in the z dimension and hand marker in the x dimension, r(20) = .51, p = .02 and z dimension, r(20) = .55, p = .01, were also found.
Finally, the fourth research question inquired as to participant perceptual responses of singing with the low, circular arm movement or no movement. Participants wrote responses to the following prompt: “What difference(s), if any, did you notice in your singing when doing (a) low arm circles and (b) no motion?” Table 2 reports the five most frequent responses in order of decreasing frequency according to type of gesture (low arm circles, no motion). A research assistant independently analyzed participant responses. Reliability was calculated using the formula [agreements / (agreements + disagreements)] × 100. Observer agreement was found to be 87% for response analysis.
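The reliability calculation can be written out as a short function. This is a sketch of the percent-agreement formula only; the function name is hypothetical, and the counts of agreements and disagreements in the study are not reported, so any inputs are illustrative.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Percent agreement between two coders: agreements over all coded items."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no coded responses to compare")
    return 100.0 * agreements / total
```

Note that the sum of agreements and disagreements forms the denominator, so the result is always between 0 and 100.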
Table 2. Most Frequent Written Responses of Participants on Perceived Influence of Gestures on Singing.
The most frequent responses from participants with respect to the low, circular gesture had to do with breath and volume. Those who mentioned breath (n = 19/49, 39%) wrote that the influence of low circles was “more air production/support,” “lower breath support,” and the “breath flowed with ease.” Participants also gave responses categorized as having to do with volume or amplitude (n = 11/49, 22%), such as “fuller tone” and “more projection.” Other categories of comments included vibrato, tone color, and rhythm/beat.
The condition of doing no motion seemed to be most familiar for the participants. Participants commented on attention/focus (n = 8/49, 16%) in such ways as, “I was the most focused,” “I wasn’t concentrating on my singing as much,” and “I feel I sang best, less to worry about.” Physical or movement aspects included comments such as “more constricted than the low,” “comfortable, balanced,” and “stiff.” Comments were also made concerning vibrato and breath.
Discussion
Singer gesture has historically been used in vocal pedagogy to evoke changes in sound. Findings of this study seem to support previously reported evidence on the beneficial effects of singer gesture. Participants perceived the low, circular arm gesture positively, acoustical measures of fundamental frequency indicated significant differences in intonation while using the circular gesture, and the majority of participants (67%) came closer to the target fundamental frequency when doing the low, circular arm gesture. Thus, the findings of this study indicated that the low, circular arm gesture appears to contribute overall to in-tune singing, as found in previous research (Brunkan, 2013). It is also important to note that the magnitude of the differences would be noticeable to the human ear (>7 cents). With this finding in mind, future research might examine the plane of this gesture in relation to pitch. Perhaps doing the same size and direction of a circular motion at a higher plane would affect acoustical measures, particularly intonation, in a different manner.
Findings about frequency contour and movement markers of the motion capture system indicate some interesting trends. Motion markers of the nose, eyes, head, and lips showed similar relationships in the three dimensions. Overall, in the x (horizontal) and y (vertical) dimensions, most mean correlations were negative, possibly indicating that bodily motion and direction of frequency change do not move similarly in time or direction. Depth of movement, however, seems to indicate the opposite. This finding may be of particular interest to voice educators. Singer movement of the face and head indicated a variety of idiosyncratic movements while participants sang. Often, teachers and conductors indicate direction of pitch with a gesture; however, it may be useful to use a variety of gestures when communicating pitch direction to singers.
Participant movements were tracked by a variety of markers in this study. Correlations between markers of the hand, lips, head, and eyebrow offer interesting insights into the movements of singers. Findings indicated that horizontal movement of the hand accompanies similar movement of the lips and head for some singers. Participants in this study also moved their heads in the depth dimension (forward and backward) when the hand moved on a similar plane. As voice educators, awareness of the effect of singer movement is essential to fostering efficient vocal production, as secondary movements may affect vocal technique. It is important, therefore, to observe the effects of movement on other areas of the body.
Singer perception of sound and the sensation of singing are often important to the longevity and enjoyment of the task. Singer responses, overall, suggest perceptions of singing louder, taking a deeper breath, and producing a fuller tone while performing the low, circular gesture. As breath is essential to each of these observations, it follows that the low gesture might enable singers to focus more on their lower abdominal muscles, and thus, feel like they are getting a deeper, fuller breath that could support a louder, fuller tone. Singers with choral singing experience may find no movement most comfortable and familiar as this posture would normally be employed in a choral rehearsal. Therefore, future research could examine movement in relation to singer experience, familiarity with movement tasks, participation in choir, or voice lesson experience. However, the positive effects on intonation found in this investigation support use of gesture with singers even if there is a level of discomfort or unfamiliarity.
Several confounding variables may have affected the results of this study. Although participants were afforded gestural training and gesture usage was controlled as much as possible through instruction, participant performance varied depending on factors such as singer familiarity with movement, singer arm and hand shape, singer energy level, and other physical qualities of the singers. Furthermore, this research aimed to create a naturalistic setting similar to that of a voice studio or choral rehearsal; however, idiosyncratic singer movement could have contributed to slight differences in motion and subsequent variability in acoustic, motion, and/or perceptual measures. Such restrictions may also inhibit singers’ natural physical response and thus, their vocal production. Future researchers may well decide to assess level of familiarity with movement or explore singer-chosen movement instead of conductor-prescribed gestures. Although not included in this investigation, motion capture data on velocity and distance of movement may be examined in future research to further investigate the relationship of movement to sound.
The movement of the arms may have caused movement in other parts of the body, such as the torso. Movement of the torso could affect aspects of singing such as breath and therefore intonation. Although movement of the torso was not measured as a part of this investigation, measurement of breath and muscle activation may be areas of interest to future investigations.
Future research is needed to examine the effects of movement in a choral setting. The results of this solo singing study may be of interest to choral directors. Conglomerate choral sound as created by individuals in the choir is often more than a simple sum of each of its individual parts. If sufficient numbers of individual choristers with similar proclivities evidence desirable nuances in vocal production behaviors before singing in a group, the acoustical “chorusing” that occurs in choir-singing contexts may result in more robust differences in group sound than would be the case in solo sound.
Finally, singer perceptions of breath and tone quality in relation to gesture may suggest that future research in this area use dependent measures other than those that focus exclusively on intonation. Researchers could complete additional studies to examine the effects of singer gesture with technologies such as electromyography.
The primary contribution of this investigation is that a low, circular arm gesture may produce audible improvement in sung intonation. Moreover, participants perceived beneficial effects on their singing when the low, circular gesture was present. These findings on the use of singer gesture may encourage teachers and students in voice studios, as well as choral rehearsals, to explore movement as a tool to evoke certain sounds in the process of singer education.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
