Abstract
We examined the degree of loudness constancy using two methods of adjustment. One was “sound production,” by which listeners played a musical instrument as loudly as a model player. The other was “sound level adjustment,” by which listeners adjusted the loudness of the sound produced by a loudspeaker. The target sound was produced by an actual musical instrument performance. Sound pressure levels of the stimuli were approximately 60, 75, and 86 dB(A). The distances between the performer and the participant were 2, 8, and 32 m. In both conditions, participants were asked to produce a sound pressure level matching that of the stimulus. Results show that when visual cues of the musical performance were available, sound production showed more robust loudness constancy than sound level adjustment.
Perceptual constancy is the tendency to perceive an object as having constant loudness, size, and brightness despite changes in stimulus features. For instance, sounds are perceived as though loudness were determined directly by the intensity at the sound source. Loudness constancy refers to the phenomenon by which loudness remains constant despite substantial changes in the physical stimulus caused by varying the distance to the sound source (Florentine, Popper, & Fay, 2011). The phenomenon has been studied mainly using magnitude estimation (Altmann et al., 2013).
A clear association exists between accuracy of loudness perception for musical stimuli and musical expertise (Bishop, Bailes, & Dean, 2013). For instance, Geringer (1995) investigated loudness judgments of musician and non-musician listeners using the Continuous Response Digital Interface (CRDI) to indicate perceived loudness levels. Results showed that musicians judged a significantly smaller magnitude of dynamic change than did non-musicians. However, few studies have examined loudness constancy systematically and critically (Altmann et al., 2013; Zahorik & Wightman, 2001).
We examined the degree of loudness constancy using two methods of adjustment: “sound production,” where listeners played a musical instrument as loudly as a model player; and “sound level adjustment,” where listeners adjusted the loudness of the sound produced by a loudspeaker. Several studies have examined action-specific influences on perception (Proffitt, Bhalla, Gossweiler, & Midgett, 1995; Witt & Proffitt, 2008). For instance, Proffitt et al. (1995) demonstrated that verbal and visual measures of geographical slant showed great overestimation, whereas action measures yielded accurate assessments. These results, which revealed that the reproduction method affects geographical slant perception, suggest that our perception is characterized by how we react to the world around us. Nevertheless, few studies have examined how sound production and sound level adjustment each influence the degree of loudness constancy.
Humans learning to play musical instruments or to speak foreign languages often try to produce sounds that closely resemble those produced by teachers. Learning through observation and imitation can account for the natural acquisition of human behavior (Bandura, 1971; Browder, Schoen, & Lentz, 1986). In fact, earlier studies have elucidated motor facilitation following action observation (Bird, Osman, Saggerson, & Heyes, 2005; Porro, Facchin, Fusi, Dri, & Fadiga, 2007). Moreover, action observation can produce activity in premotor regions of the brain despite absence of motor execution (Calvo-Merino, Glaser, Grezes, Passingham, & Haggard, 2005; Hari et al., 1998).
For musical learning and skill acquisition, audiovisual perception and imitation are fundamentally important (Haslinger et al., 2005). In fact, previous studies have reported that visual information can affect listeners’ auditory perception, such as loudness judgment (Rosenblum & Fowler, 1991), and that motor learning can aid performers’ auditory recognition of music beyond auditory learning alone (Brown & Palmer, 2012). These findings indicate that the perceived loudness of a performance is characterized not only by the musician’s audiovisual information but also by the action information related to playing music. Therefore, we hypothesized that a higher degree of constancy might be found if listeners were asked to play musical instruments and imitate the performances of model players. To test the hypothesis, we conducted the following two experiments.
Experiment 1
Method
Participants were 14 college students (2 men, 12 women; age range, 18–42 years), each reporting normal or corrected-to-normal vision and no hearing impairment.
The experiment was conducted in the Yamanashi Eiwa College gymnasium, where the background noise was 32–35 dB(A). Stimuli were 2 s tones with G4 (393 Hz) pitch produced using a melodica (MX32C; Suzuki Musical Inst. Mfg. Co., Ltd.). Melodicas, which are also known as pianicas, melodyhorns, and key harmonicas (presented in Figure 1), are free-reed aerophones similar to harmonicas. This musical instrument is popular in music education in Japan (Adachi, 2013). For the present study, the target sound was produced by the second author’s actual musical instrument performance. During the experiment, the performer and the participant each sat on a chair in the gymnasium. In experiment 1, participants were able to observe the musical performance. The distances separating the performer and the participant were 2, 8, and 32 m, respectively corresponding to near, middle, and far distances. Sound pressure levels of the stimuli at the music performer were set as 60, 75, and 86 dB(A), respectively corresponding to soft, medium, and loud sounds. For each trial, the sound levels of the target stimuli were confirmed using a sound level meter (LA-3260; Ono Sokki Co. Ltd.) set 60 cm in front of the performer and placed 90 cm from the floor. During the experiment, the performer watched the sound level meter while producing the tones and controlled the sound levels of the target stimuli accordingly. The means ± standard errors were the following: soft, 60.3 ± 0.16 dB(A); medium, 75.0 ± 0.08 dB(A); and loud, 85.3 ± 0.15 dB(A).

Figure 1. Melodica. This musical instrument is a free-reed keyboard wind instrument with a musical keyboard on top. It is played by blowing air through the mouthpiece, which fits into a hole in the side of the instrument.
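To make the stimulus geometry concrete, the following minimal sketch estimates the level that would reach the listener at each distance. It assumes a point source in a free field (inverse-square law, about 6 dB per doubling of distance), which a reverberant gymnasium does not satisfy, so the values are best read as an upper bound on the attenuation rather than as levels measured in the study. The function name `free_field_level` and the use of the nominal levels are illustrative choices, not part of the original method.

```python
import math

# The sound level meter was set 60 cm in front of the performer (from the Method).
REF_DISTANCE_M = 0.6


def free_field_level(level_at_ref_db, distance_m, ref_distance_m=REF_DISTANCE_M):
    """Predicted level at distance_m for a point source.

    ASSUMPTION: free-field propagation (no reflections), so the level falls
    by 20*log10(distance ratio). A gymnasium adds reverberant energy, so the
    real level at the listener would be higher than this estimate.
    """
    return level_at_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


# Nominal target levels and distances reported in the Method.
nominal_levels = {"soft": 60.0, "medium": 75.0, "loud": 86.0}
distances_m = [2.0, 8.0, 32.0]

predicted = {
    name: [round(free_field_level(level, d), 1) for d in distances_m]
    for name, level in nominal_levels.items()
}

for name, values in predicted.items():
    print(name, dict(zip(distances_m, values)))
```

Under this assumption each fourfold increase in distance lowers the level at the listener by about 12 dB, so a listener who matched the proximal level rather than the source level would produce markedly softer responses at 8 m and 32 m.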
We measured the degree of loudness constancy using two methods of adjustment. One was “sound production,” by which listeners played a musical instrument as loudly as a model player. In other words, participants were asked to play the melodica (MX32C; Suzuki Musical Inst. Mfg. Co., Ltd.) at the same sound pressure level as the target stimulus. The other was “sound level adjustment,” where listeners adjusted the loudness of the sound produced by a loudspeaker (MX-B55; Yamaha Music Japan Co., Ltd. and Yamaha Corp.). Specifically, participants were asked to adjust the loudspeaker volume to match the target stimulus. The loudspeaker was set 50 cm in front of the participants and was placed 90 cm from the floor. In both conditions, the reaction sound was a G4 tone with the same timbre as the target stimulus. The sound pressure levels of the reaction sound were measured using a sound level meter (NL-62; Rion Co. Ltd.) set 90 cm in front of the participants and placed 120 cm from the floor.
The order of presentation of the sound pressure level of the target stimulus (soft, medium, and loud) was randomized. Condition (sound production and sound level adjustment) and distance (2, 8, and 32 m) were counterbalanced across participants. The experiment started with two practice sessions for both conditions before 90 test trials.
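The design described above can be sketched as a trial schedule. This is a minimal illustration under stated assumptions, not the authors' procedure: the text does not say how the 90 test trials were divided, so `REPS_PER_CELL = 5` is inferred from 90 / (2 × 3 × 3), and the rotation scheme used for counterbalancing and the blocking of distances within conditions are hypothetical.

```python
import random

CONDITIONS = ["sound production", "sound level adjustment"]
DISTANCES_M = [2, 8, 32]
LEVELS = ["soft", "medium", "loud"]

# ASSUMPTION: 90 test trials / (2 conditions x 3 distances x 3 levels) = 5.
# The number of repetitions per cell is not stated in the text.
REPS_PER_CELL = 5


def build_schedule(participant_index, seed=0):
    """Return a list of (condition, distance_m, level) test trials.

    ASSUMPTION: condition and distance are blocked and their order is rotated
    by participant index; the level order is shuffled within each block.
    """
    rng = random.Random(seed + participant_index)

    c = participant_index % len(CONDITIONS)
    condition_order = CONDITIONS[c:] + CONDITIONS[:c]

    d = participant_index % len(DISTANCES_M)
    distance_order = DISTANCES_M[d:] + DISTANCES_M[:d]

    trials = []
    for condition in condition_order:
        for distance in distance_order:
            block = LEVELS * REPS_PER_CELL
            rng.shuffle(block)  # randomized order of soft/medium/loud targets
            trials.extend((condition, distance, level) for level in block)
    return trials


schedule = build_schedule(participant_index=0)
print(len(schedule))
print(schedule[:3])
```

Rotating the block order covers only some of the possible distance orders; a full counterbalancing over all six permutations would be an equally plausible reading of the text.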
Results and discussion
Results are presented in Figure 2. Mean loudness judgments were calculated from the sound pressure levels of the reaction sound. A three-way within-subject analysis of variance (ANOVA) of the mean loudness judgment was conducted, considering the condition (sound production and sound level adjustment), the sound pressure level (soft, medium, and loud), and the distance (2, 8, and 32 m) as factors.

Figure 2. Results of Experiment 1. Error bars represent standard errors of the mean.
Results revealed a significant interaction between the condition and the distance, F(2, 26) = 6.69, p < .01. A simple main effect of distance was found in the sound level adjustment condition: F(2, 52) = 7.85, p < .01. Post-hoc comparisons (Ryan’s method) revealed that, in the sound level adjustment condition, the loudness judgment at 2 m was louder than that at the other distances (ps < .01). However, in the sound production condition, the loudness judgment did not differ among distances: F(2, 52) = 0.63, p = .54.
The interaction between the sound pressure level and the distance was also significant: F(4, 52) = 6.97, p < .001. Simple main effects of the sound pressure level were found for 2 m, F(2, 78) = 229.18, p < .001, 8 m, F(2, 78) = 205.45, p < .001, and 32 m, F(2, 78) = 170.15, p < .001. At each distance, the loud sound was judged as louder than the medium and soft sounds, and the medium sound was judged as louder than the soft sound. Simple main effects of the distance were found for the medium sound, F(2, 78) = 3.97, p < .05, and for the loud sound, F(2, 78) = 5.14, p < .01. For the medium and loud sounds, the loudness judgments at 2 m were significantly louder than those at 8 m and 32 m.
A significant main effect of the sound pressure level was also found: F(2, 26) = 230.26, p < .001. Post-hoc comparisons (Ryan’s method) revealed that the mean loudness judgments for the three sound pressure levels differed significantly from one another (M = 65.56 for the soft, M = 71.70 for the medium, and M = 78.31 for the loud). No other main effect or interaction was found to be significant.
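As a reading aid, the degrees of freedom reported above can be checked against the design (14 participants; 2 conditions, 3 sound pressure levels, 3 distances). The sketch below uses the standard formulas for a fully within-subject ANOVA. The helper name `rm_anova_df` is hypothetical, and the pooled-error reading of the simple main effects is an assumption: it reproduces the reported error terms of 52 and 78, but the text does not state which error term was used.

```python
def rm_anova_df(n_participants, *factor_levels):
    """Return (effect df, error df) for an effect in a fully within-subject design.

    For an effect involving factors with k1, k2, ... levels:
      effect df = (k1 - 1) * (k2 - 1) * ...
      error df  = effect df * (n_participants - 1)
    """
    effect_df = 1
    for k in factor_levels:
        effect_df *= k - 1
    error_df = effect_df * (n_participants - 1)
    return effect_df, error_df


N_EXP1 = 14  # participants in experiment 1

level_df = rm_anova_df(N_EXP1, 3)           # main effect of sound pressure level
distance_df = rm_anova_df(N_EXP1, 3)        # main effect of distance
cond_by_dist_df = rm_anova_df(N_EXP1, 2, 3)   # condition x distance
level_by_dist_df = rm_anova_df(N_EXP1, 3, 3)  # sound pressure level x distance

print(level_df)          # (2, 26), matches F(2, 26) = 230.26
print(cond_by_dist_df)   # (2, 26), matches F(2, 26) = 6.69
print(level_by_dist_df)  # (4, 52), matches F(4, 52) = 6.97

# ASSUMPTION: simple main effects tested against a pooled error term
# (main-effect error + interaction error).
pooled_distance_at_condition = distance_df[1] + cond_by_dist_df[1]
pooled_level_at_distance = level_df[1] + level_by_dist_df[1]
print(pooled_distance_at_condition)  # 52, as in F(2, 52)
print(pooled_level_at_distance)      # 78, as in F(2, 78)
```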
As expected, sound production showed more robust loudness constancy than sound level adjustment. Haslinger et al. (2005) pointed out that audiovisual perception and imitation are fundamentally important for music learning and skill acquisition, and our results are consistent with that claim. However, results of some recent studies suggest that motor facilitation occurs not only through action observation (Bird et al., 2005; Porro et al., 2007), but also through motionless listening (Lahav, Katz, Chess, & Saltzman, 2013). To examine whether the same pattern holds when visual cues of a musical performance are unavailable, we conducted experiment 2.
Experiment 2
Method
Participants were 13 college students (2 men, 11 women; age range, 18–43 years), each reporting normal or corrected-to-normal vision and no hearing impairment. Four (1 man, 3 women) of them had also participated in experiment 1.
The apparatus, the stimuli, the distance separating the performer and the participant, and the sound pressure levels of the stimuli were identical to those used in experiment 1. However, in experiment 2, the visual cues of musical performance were unavailable to participants. A stainless steel screen curtain (60 cm length, 180 cm height) was placed 1 m in front of the musical performer. Consequently, participants were able to listen to the sound stimuli, but they were unable to observe the musical performance at all. For each trial, the sound pressure levels of the target stimuli were confirmed using a sound level meter (LA-3260; Ono Sokki Co. Ltd.) set 60 cm in front of the performer and placed 90 cm from the floor. During the experiment, the performer watched the sound level meter while producing the tones and controlled the sound levels of the target stimuli accordingly. The means ± standard errors were the following: soft, 60.5 ± 0.13 dB(A); medium, 75.8 ± 0.14 dB(A); and loud, 86.0 ± 0.15 dB(A).
As with experiment 1, we measured the degree of loudness constancy using two methods of adjustment. One was “sound production,” where listeners played a musical instrument as loudly as a model player. The other was “sound level adjustment,” where listeners adjusted the loudness of the sound produced by a loudspeaker. The loudspeaker was set 50 cm in front of the participants and was placed 90 cm from the floor. In both conditions, the music tone of the reaction sound was identical to that used for experiment 1. The sound pressure levels of the reaction sound were measured using a sound level meter (NL-62; Rion Co. Ltd.) set 90 cm in front of the participants and placed 120 cm from the floor.
The order of presentation of the sound pressure level (soft, medium, and loud) was randomized. Condition (sound production and sound level adjustment) and distance (2, 8, and 32 m) were counterbalanced across participants. The experiment started with two practice sessions for both conditions before 90 test trials.
Results and discussion
Results are presented in Figure 3. Mean loudness judgments were calculated from the sound pressure levels of the reaction sound. A three-way within-subject analysis of variance (ANOVA) of the mean loudness judgment was applied, considering the condition (sound production and sound level adjustment), the sound pressure level (soft, medium, and loud), and the distance (2, 8, and 32 m) as factors.

Figure 3. Results of Experiment 2. Error bars represent standard errors of the mean.
Results showed that the main effect of the condition was significant, F(1, 12) = 48.86, p < .001, indicating that the mean loudness judgment in the sound production condition (M = 65.26) was lower than that in the sound level adjustment condition (M = 72.5). Furthermore, the main effects of the sound pressure level, F(2, 24) = 207.2, p < .001, and of the distance, F(2, 24) = 3.62, p < .05, were also significant. Post-hoc comparisons (Ryan’s method) revealed that the mean loudness judgments for the three sound pressure levels differed significantly from each other (M = 61.63 for the soft, M = 69.70 for the medium, and M = 75.30 for the loud). Regarding the distance, the mean loudness judgment at 2 m (M = 69.88) was significantly louder than that at the other distances (M = 68.22 at 8 m and M = 68.53 at 32 m). No other main effect or interaction was found to be significant.
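To put the size of the distance effect in perspective, the short calculation below compares the reported shift in mean responses between 2 m and 32 m with the change in level that a free-field point source would produce at the listener over the same distances. This is an illustrative calculation and not an analysis from the study: the free-field assumption ignores the reverberation of the gymnasium, and the ratio computed at the end is an ad hoc descriptive figure rather than an established constancy index.

```python
import math


def free_field_drop_db(near_m, far_m):
    """Level drop between two distances for a point source.

    ASSUMPTION: free-field (inverse-square) propagation; a reverberant room
    would produce a smaller drop at the listener.
    """
    return 20.0 * math.log10(far_m / near_m)


# Mean loudness judgments by distance reported for experiment 2, in dB(A).
reported_means = {2: 69.88, 8: 68.22, 32: 68.53}

predicted_drop = free_field_drop_db(2, 32)
observed_drop = reported_means[2] - reported_means[32]
ratio = observed_drop / predicted_drop

print(round(predicted_drop, 2))  # 24.08 dB predicted at the listener
print(round(observed_drop, 2))   # 1.35 dB shift in the mean response
print(round(ratio, 3))
```

On this rough comparison the responses moved by only a small fraction of the predicted proximal change, which is the pattern expected when listeners largely compensate for distance, even though the distance effect itself was statistically significant.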
General discussion
The present study examined the degree of loudness constancy using sound production and sound level adjustment. We hypothesized that a better degree of constancy for the sound production would be observed than for the sound level adjustment. Results show that when the participants watched the musical performance, sound production exhibited more robust loudness constancy than sound level adjustment. These results support the previous claim that audiovisual perception and imitation are necessary for musical learning and skill acquisition (Haslinger et al., 2005).
In experiment 2, in which the visual cues of musical performance were unavailable to participants, the mean loudness judgment of sound production was significantly lower than that of sound level adjustment. Earlier reports have described that motor facilitation occurs not only through action observation (Bird et al., 2005; Porro et al., 2007), but also through motionless listening (Lahav et al., 2013). Our results suggest that visual cues of musical performance played an important role in determining the degree of loudness constancy. Results of earlier studies suggest that visual information affects perceived loudness. For instance, Rosenblum and Fowler (1991) revealed that sounds paired with large clapping motions are judged as louder than when paired with the smaller motions used to produce softer claps, even though participants were asked to ignore visual information when rating loudness. In experiment 1, loudness judgments at 2 m for the medium and the loud sounds were significantly louder than the judgments at either 8 m or 32 m. The results also imply that sound production was characterized by visual cues such as breath control during the melodica performance. Therefore, our findings provide further evidence that audiovisual perception and imitation are necessary for musical learning and skill acquisition. In particular, our results show that the perceived loudness of the musical instrument was characterized not only by the musician’s audiovisual information but also by the action information related to playing music. Previous reports have described a clear association between accuracy of loudness perception for musical stimuli and musical expertise (Bishop et al., 2013; Geringer, 1995). It is difficult to apply our results to visually impaired musicians, but our findings provide important insight into how musical learners acquire skills such as accurate loudness perception through activities such as ensemble playing.
The present study has several limitations. Our experiments were conducted in a gymnasium. Earlier reports have described that loudness constancy is influenced by reverberation cues (Altmann et al., 2013; Zahorik & Wightman, 2001). For instance, Altmann et al. (2013) pointed out that sound reverberation cues can serve as direct cues for loudness constancy. Our results might therefore have been influenced by reverberation cues in the gymnasium environment. Future studies should include similar experiments conducted under different environmental conditions. It has also been reported that people perceive the environment in terms of their ability to act in it (Witt, 2011). Our participants might also have been better able to play the musical instrument than to control the loudspeaker volume. Future studies should investigate the relation between musical instruments and participants’ skills. Our experiments used 2 s tones with G4 (393 Hz) pitch produced using a melodica. It is also necessary to use other sound frequencies or musical instruments. For instance, gestures made by performers have been shown to influence the perceived duration of notes played by percussion instruments (Schutz & Lipscomb, 2007). Future studies should confirm whether other sound frequencies or musical instruments show phenomena similar to those observed in our experiments.
Footnotes
Acknowledgements
The authors thank Sonoko Mikami and Yuki Hirose for assistance in administering the experiment.
Funding
The author(s) received the following financial support for the research, authorship, and/or publication of this article: This research was supported by Ministry of Education, Culture, Sports, Science and Technology Grants-in-Aid for Scientific Research (A) (No. 487283 and No. 16H01736) and (B) (No. 26280078).
