Abstract
The aim of this work was to evaluate whether the angular elevation of a sound source can generate auditory cues that improve auditory distance perception (ADP) in a way similar to that previously reported for the visual modality. For this purpose, we compared ADP curves obtained with sources located at the level of the listeners’ ears and at ground level. Our hypothesis was that participants can interpret the relation between the elevation and the distance of ground-level sources (which are linked geometrically), so we expected them to perceive the distances of these sources more accurately than those of ear-level sources. However, the responses obtained with sources located at ground level were almost identical to those obtained at the height of the listeners’ ears, showing that, under the conditions of our experiment, auditory elevation cues do not influence auditory distance perception.
Introduction
In this Short Report, we present a preliminary study in which we evaluate whether the angular elevation of the sound source can generate auditory cues which in turn improve the auditory distance perception (ADP). ADP depends on a wide variety of acoustic and non-acoustic cues such as sound level, direct-to-reverberant energy ratio, spectral content, binaural cues, visual information, previous knowledge of the sound source, among others (see Fluitt et al., 2013; Kolarik et al., 2016; Zahorik et al., 2005, for a complete review). However, to our knowledge no study has yet assessed whether the elevation of the sound source influences the ADP.
We are motivated by a previous study which showed that the elevation of visual objects improves visual distance perception (VDP). Philbeck and Loomis (1997) conducted VDP experiments under low-vision conditions and observed that participants were less biased when perceiving the distance of objects located on the ground (a condition in which the elevation of the object varies with distance) than when estimating the distance of objects located at eye level (without elevation cues). These authors proposed that this improvement arises because the subjects could exploit the relationship between the elevation and the distance of floor-level targets (which are uniquely linked by a geometrical relation) to provide more accurate responses than those to eye-level targets. For this, the brain must be able to extrapolate a line from the head to the ground following the object’s elevation angle, resulting in a single distance value. In this way, the elevation of visual objects located on the ground works as an absolute distance cue. In this vein, Ooi et al. (2001) demonstrated that eye level serves as a reference for the visual system to compute the angular declination below the horizon. The authors observed that when the angular declination was increased by binocular viewing through base-up prisms, the observers underestimated the targets’ distances.
Given this background, we ask whether the results reported in previous VDP studies can be extrapolated to the auditory domain. The answer to this question is not obvious, since the visual and auditory modalities process space in completely different ways. While the brain receives high-resolution spatial visual information directly from the retina, the elevation of a sound source must be inferred from the acoustic signals generated by the interaction of the incident sound with the head and external ears (Blauert, 1997; King, 2009; Middlebrooks, 2015). We hypothesize that, if the brain manages to obtain relatively accurate information about the elevation of objects located on the ground, it will be able to infer their distance regardless of the way in which that spatial information was encoded. Several studies have demonstrated that humans are able to perceive the elevation of sound sources (Middlebrooks, 2015). Discrimination experiments have reported that the minimum audible angle in the vertical plane for sound sources located in front of the listener is around 3°–9° (Perrott & Saberi, 1990; Blauert, 1997; Grantham et al., 2003). Similar errors (∼4°–8°) were obtained in the mid-sagittal plane when measuring absolute elevation judgments (Oldfield & Parker, 1986; Lewald, 2002; Carlile et al., 1997). Interestingly, these errors were independent of the elevation angle of the sound source except for sources located above the listener’s head (Oldfield & Parker, 1984).
Here, we recreate the experiment of Philbeck and Loomis (1997) using sound sources instead of visual targets. To accomplish this, we compare ADP curves obtained with sound sources located at the level of the listener’s ears and at ground level. Because angular elevation provides absolute information about the distance of sources located at floor level, we expected that ADP responses to floor-level sources would be more accurate than responses to ear-level sources.
Methods
Testing Environment
Both the experiment and the acoustic measurements were performed on the stage of a theater at the Universidad Nacional de Tres de Febrero (length: 10 m; width: 7 m; height: 6 m; volume: 420 m³). At all times, a thick curtain separated the stage from the seating area. Thus, the room had walls covered by thick curtains, a fiberglass ceiling, and a wooden floor. The average reverberation time of the room (T30, A-weighted, measured with the LSS method) was 0.28 s at the participant’s position. The background noise of the room was 25 dBA (measured with an SVAN 959 sound level meter at the position of the listener).
Participants
A total of 18 students (four women) from the Universidad Nacional de Tres de Febrero voluntarily participated in the study. Ages of the participants ranged from 22 to 37 years, with a median age of 26 years (SD = 3.9 years). Participants reported normal or corrected vision. All participants had normal or near-normal hearing, based on audiograms measured using the procedure described by ANSI/ASA S3.21-2004 (R2019) and the American Speech-Language-Hearing Association (ASHA). Pure-tone average better-ear hearing thresholds over 0.125, 0.25, 0.5, 1, 1.5, 2, 3, 4, 6, and 8 kHz were less than or equal to 25 dB HL. Three participants could not take the test due to time constraints, although they verbally reported normal hearing. The experiments were undertaken with the understanding and written consent of each subject, following the Code of Ethics of the World Medical Association (Declaration of Helsinki), and were approved by the Ethics Committee of the Universidad Nacional de Tres de Febrero. Participants did not have previous knowledge of the experimental room nor of the goals of the study.
Experimental Procedure and Setup
Participants were received in a contiguous room where they were informed of the experimental task and its details before signing a consent form. Instructions were given in writing to avoid biases due to the experimenter’s instructions. Then, participants were blindfolded and guided by the experimenter into the testing room and to a chair, where they remained blindfolded (Figure 1A). The participant was comfortably seated 2 m from one of the short walls, slightly offset from the central line of the room in the perpendicular direction. Seat height was adjusted until the ears of the participant were 1.2 m from the floor. Subjects performed two counterbalanced ADP tasks. In both tasks, the sound sources were located in the mid-sagittal plane (0° azimuth) but at different elevations with respect to the listener’s ears: one task with sound stimuli at ear level and the other at floor level. The farthest test distance was chosen so that the elevation of the speaker located on the ground could still be perceived by the participants. This is relevant because the change in the elevation angle of ground-level sources decreases with distance. We therefore placed the farthest speaker 6 m from the listener since, for a seated person (ears at 1.2 m), this produces an elevation angle of about 11°, just above the absolute elevation error (between 4° and 8°) reported in previous works for similar conditions (Oldfield & Parker, 1986; Lewald, 2002; Carlile et al., 1997). The distances used in both conditions were D = 2.4, 3.6, 4.8, and 6 m from the participant’s ears (Figure 1C). For the ear level condition (0° elevation), the experimental setup consisted of a linear set of four speakers (Genelec 8020B). Each speaker was mounted on a metal support with its central axis positioned at the same height as the subject’s ears (∼1.2 m).
To avoid acoustic occlusion between the speakers, two experimenters manually positioned each of the sources in front of the participant according to the sequence of the experiment. For this, the experimenters could see the distance corresponding to each trial on a computer screen located behind the participant. Between trials, a broadband sound was presented through two loudspeakers located at both sides of the participant (Figure 1B) in order to mask possible noise related to the loudspeaker movement procedure. The experimental configuration for the floor level condition was the same as in the ear level condition but with the sound sources located on the floor. Because in this condition there was no occlusion between the loudspeakers, they remained in a fixed position throughout the experiment, and the same masking sound described for the previous condition was used. The central axis of each speaker was directed toward the midpoint between the ears, in the mid-sagittal plane of the subject. Thus, the elevation angles for the four speakers were 26.6°, 18.3°, 14°, and 11.3°.
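The geometrical link between the elevation and the distance of floor-level sources can be sketched as follows. This is only an illustrative calculation, assuming that the listed distances are horizontal distances along the floor and that the angle is taken as a magnitude relative to ear level; the function names are hypothetical. Under these assumptions the computed angles agree with the reported ones to within about 0.1°.

```python
import math

EAR_HEIGHT_M = 1.2                   # ear height of the seated listener (from the text)
DISTANCES_M = [2.4, 3.6, 4.8, 6.0]   # source distances (from the text), assumed horizontal

def elevation_angle_deg(distance_m, ear_height_m=EAR_HEIGHT_M):
    """Angle (magnitude, in degrees) between ear level and a source on the floor."""
    return math.degrees(math.atan2(ear_height_m, distance_m))

def distance_from_angle_m(angle_deg, ear_height_m=EAR_HEIGHT_M):
    """Inverse relation: the horizontal distance implied by a given angle."""
    return ear_height_m / math.tan(math.radians(angle_deg))

angles = [round(elevation_angle_deg(d), 1) for d in DISTANCES_M]
print(angles)  # [26.6, 18.4, 14.0, 11.3]
```

The inverse function makes the hypothesis explicit: for a fixed ear height, each angle maps onto a single distance, which is why elevation could in principle act as an absolute distance cue for sources on the ground.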

Figure 1. Illustration of the experimental setup. A: location of the blindfolded participant. Ears located 1.2 m high. B: linear set of four speakers (Genelec 8020B), located at D = 2.4, 3.6, 4.8, and 6 m for both condition levels (ear and floor). C: loudspeakers located at both sides of the participant in order to mask possible noise related to the loudspeaker movement procedure.
The auditory stimulus consisted of 500 ms white noise bursts (measured bandwidth 0.05–20 kHz) with onset and offset ramped by a 50 ms raised cosine. The stimulus bandwidth was chosen to maximize the availability of acoustical distance and elevation cues. The experimental procedure was controlled by a laptop running Matlab R2012a with Psychtoolbox (PTB-3). The stimuli were delivered to the speakers through an RME Fireface USB external sound card, with a sampling frequency of 44.1 kHz. Subjects reported their distance estimates verbally in meters (or centimeters), with one decimal. Each source distance was presented five times in random order. At the end of the task, the participants were helped out (blindfolded) by the experimenter to the contiguous room, where they took an 8-min break. After the break, the second task took place following the same procedure described above. Before starting each condition, and in order to ensure that the participant understood the task, four test trials were performed. The sequence used was D = 2.4, 6, 3.6, and 4.8 m.
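As an illustration of the stimulus description above, the following minimal sketch generates a 500 ms Gaussian white noise burst at 44.1 kHz with 50 ms raised-cosine onset and offset ramps. It is not the Matlab/Psychtoolbox code used in the experiment: the function names, the random seed, and the unit-variance amplitude are assumptions, and no band-limiting or level calibration is applied.

```python
import math
import random

FS = 44100          # sampling frequency in Hz (from the text)
DURATION_S = 0.5    # burst duration in seconds (from the text)
RAMP_S = 0.05       # raised-cosine ramp duration in seconds (from the text)

def raised_cosine_envelope(n_samples, n_ramp):
    """Envelope rising from 0 to 1 over n_ramp samples and falling symmetrically at the end."""
    env = [1.0] * n_samples
    for i in range(n_ramp):
        gain = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        env[i] = gain
        env[n_samples - 1 - i] = gain
    return env

def white_noise_burst(seed=0):
    """Gaussian white noise shaped by the envelope (seed and amplitude are assumptions)."""
    rng = random.Random(seed)
    n_samples = int(round(FS * DURATION_S))
    n_ramp = int(round(FS * RAMP_S))
    env = raised_cosine_envelope(n_samples, n_ramp)
    return [rng.gauss(0.0, 1.0) * g for g in env]

burst = white_noise_burst()
print(len(burst))  # 22050 samples = 0.5 s at 44.1 kHz
```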
Acoustical Measurements Techniques
To evaluate the behavior of both the ADP and the elevation spectral cues, we calculated the frequency response of the room at the listener’s ears for all distances and both conditions (ear and floor levels). Recordings were made using the same sound source as used for the experiments (Genelec 8020B) and a head and torso simulator (G.R.A.S. KEMAR Type 45DA) placed at the subject’s location. The test signal consisted of an exponential sweep (0.02–20 kHz, 20 s long), from which we obtained the binaural room impulse response for both floor and ear level sources at the following distances: 2, 3, 4, 5, 6, 7, and 8 m. The acoustic measurements showed suprathreshold spectral changes dependent on both the height of the source (up vs. down) and its distance, especially for sources located on the ground (see Figure 1 in Supplemental material). In addition, we calculated the binaural intensity and the direct-to-reverberant energy ratio as described in Spiousas et al. (2017). The binaural intensity for both conditions (ear and floor levels) is plotted as a function of distance in Supplemental Figure 2a. Since sound intensity can be considered a relative ADP cue (Mershon and Bowers, 1979), the global intensity for all bands was set to 0 dB at 2 m, letting us focus on the relative decay instead of on the absolute values. Binaural intensity showed a monotonic decay without abrupt jumps for both conditions. The direct-to-reverberant energy ratio for both conditions is plotted in Supplemental Figure 2b as a function of the physical distance to the sound source. The direct-to-reverberant energy ratio also showed a monotonic decay with distance, without abrupt jumps. In conclusion, we did not observe important differences between conditions in the distance-dependent decay of either the binaural intensity or the direct-to-reverberant energy ratio.
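The normalization of intensity to 0 dB at 2 m can be illustrated with a short sketch. The energies below are hypothetical values that simply follow an inverse-square law; they are not the measured data, and the function name is an assumption. The actual computation of binaural intensity follows Spiousas et al. (2017) and is not reproduced here.

```python
import math

def relative_level_db(energies, reference_index=0):
    """Express each energy in dB relative to the energy at the reference position."""
    ref = energies[reference_index]
    return [10.0 * math.log10(e / ref) for e in energies]

# Measurement distances from the text; the reference (0 dB) is the 2 m position.
distances_m = [2, 3, 4, 5, 6, 7, 8]

# Hypothetical energies following an inverse-square law, for illustration only.
energies = [1.0 / d ** 2 for d in distances_m]

levels_db = relative_level_db(energies)
for d, level in zip(distances_m, levels_db):
    print(f"{d} m: {level:.1f} dB re 2 m")
```

Expressing the level relative to the nearest measurement removes the dependence on playback gain, so only the shape of the decay with distance remains.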
Outliers
To detect and discard extreme estimates (outliers), we analyzed the collected data employing the median absolute deviation (MAD) method with a threshold of 3 MADs, using the Routliers package for R (Delacre & Klein, 2019; Leys et al., 2013). To this end, we calculated the collapsed signed and unsigned relative biases (see next subsection) for each subject in each block. As a result, no outliers were found.
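The MAD criterion can be sketched as follows. This is not the Routliers implementation: it is a minimal illustration that assumes the usual consistency constant of 1.4826 for scaling the MAD, and the bias values in the example are hypothetical.

```python
import statistics

def mad_outliers(values, threshold=3.0, constant=1.4826):
    """Return the indices of values lying more than `threshold` scaled MADs from the median."""
    med = statistics.median(values)
    mad = constant * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - med) > threshold * mad]

# Hypothetical collapsed signed relative biases (one value per subject).
biases = [-0.52, -0.48, -0.55, -0.60, -0.45, -0.50, 0.90]
print(mad_outliers(biases))  # [6]
```

Because the median and the MAD are themselves robust to extreme values, the criterion is less affected by the outliers it is trying to detect than a rule based on the mean and the standard deviation.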
Statistical Analysis
Statistical Models
We fitted linear models and linear mixed-effect models (LMEMs) to estimate the associations of the magnitudes (perceived distance, variability, and absolute and relative biases) with the condition and target distance. LMEMs were fitted when we tested for within-subject effects. LMEMs were fitted using the lme4 package (Bates et al., 2015) in R (R Core Team, 2020).
The parameter estimates and confidence intervals were calculated using the broom package in R (Robinson et al., 2021). We tested the fixed effects with an F-test (Satterthwaite’s method for degrees of freedom) using the ANOVA function in R.
We visually inspected LMEMs to check for normality of both residuals and random effects. The detailed syntax and formal definition of all the fitted models are also presented in the Supplemental Materials section.
Absolute and Relative Bias
We define the signed absolute bias as the difference between the perceived and the physical distance for a given trial. The signed relative bias is the signed absolute bias divided by the physical distance. Both signed biases indicate the overestimation (if positive) or underestimation (if negative) of the subjects’ perception, and are therefore useful for measuring the accuracy of the estimates. The signed relative bias is also useful for comparing estimates for sources that span a wide range of distances, as is the case in our study. The unsigned (absolute or relative) bias is obtained by taking the absolute value of the signed version. These measures indicate the estimation error without considering its direction.
We calculated the relative bias (signed and unsigned) for each subject, source position, presentation, and task, and averaged it across presentations and source positions, in order to obtain the individual collapsed bias for each task. This magnitude allowed us to measure the global accuracy of the subjects’ estimates. It is important to note that the collapsed unsigned bias is always zero or positive by definition, while the signed one can take any value, negative, zero, or positive, even when the unsigned bias is larger than zero.
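The definitions above can be sketched in a few lines. The trial values are hypothetical and the function names are assumptions; the example is chosen to show how the collapsed signed bias can be smaller in magnitude than the collapsed unsigned bias when over- and underestimations partly cancel.

```python
def signed_relative_bias(perceived_m, physical_m):
    """(perceived - physical) / physical: negative values indicate underestimation."""
    return (perceived_m - physical_m) / physical_m

def collapsed_biases(trials):
    """Average signed and unsigned relative bias over (physical_m, perceived_m) trials."""
    signed = [signed_relative_bias(perceived, physical) for physical, perceived in trials]
    unsigned = [abs(b) for b in signed]
    n = len(trials)
    return sum(signed) / n, sum(unsigned) / n

# Hypothetical trials for one subject and one task: (physical distance, reported distance).
trials = [(2.4, 1.2), (3.6, 1.8), (4.8, 7.2), (6.0, 3.0)]
signed, unsigned = collapsed_biases(trials)
print(f"signed: {signed:.2f}, unsigned: {unsigned:.2f}")  # signed: -0.25, unsigned: 0.50
```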
Auditory Distance Estimation: Response Variability
Intra-subject variability was analyzed by obtaining the standard deviation of individual responses for each target distance. Standard deviations were obtained from the mean variance across phases (i.e., the RMS of the standard deviation).
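Averaging variances and then taking the square root is equivalent to taking the root mean square (RMS) of the individual standard deviations. A minimal sketch, assuming the sample variance (n − 1 denominator) and using hypothetical responses; the function name is also an assumption:

```python
import math
import statistics

def rms_of_sds(groups):
    """Square root of the mean variance across groups, i.e., the RMS of the group SDs."""
    variances = [statistics.variance(g) for g in groups]  # sample variance, n - 1 denominator
    return math.sqrt(sum(variances) / len(variances))

# Hypothetical repeated responses (in m) to one target distance, in two sets of trials.
responses = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
print(round(rms_of_sds(responses), 3))  # 1.581
```

Averaging the standard deviations directly would give 1.5 for these values, slightly lower than the RMS, because the RMS gives more weight to the larger deviations.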
Data and Code Availability
The anonymized individual-level raw data are available as a CSV file in an OSF repository at the following link: https://osf.io/djw5q/. The repository also contains an RMarkdown document that allows replication of the analysis and figures of the present work.
Results
Distance Estimation
In Figure 2a, we show the verbally reported distance as a function of source position for both conditions. As can be seen, the responses obtained with sources located at ground level were almost identical to those obtained at the height of the listeners’ ears. In both conditions, the average reported distances were below the physical distances. In fact, the difference between the average distance estimate and the veridical response ranges from ∼1.2 to ∼3.4 m for both conditions, reflecting an underestimation of all target positions.

Figure 2. (a) Verbally reported auditory distance of the target as a function of target distance. Mean responses (±SEM: standard error of the mean) for the ear level condition (dark gray) and the floor level condition (dark yellow, or light gray in the printed version) are plotted as solid lines fitted by a linear mixed-effect model (LMEM). Each thin curve represents a single subject. The gray dashed line indicates perfect performance (response = true distance). (b) Signed relative bias collapsed across distances (mean ± SEM) for both conditions. Each point represents a single subject. (c) Intra-subject standard deviation as a function of target distance (mean ± SEM). Mean responses of participants are plotted as solid lines. (d) Standard deviation collapsed across distances (mean ± SEM) for both conditions. Each point represents a single subject.
Responses were well fitted by a power function of the form
Bias Analysis
Next, we present the mean signed relative bias. In Figure 2b, we show the individual data points (i.e., per subject) along with the between-subject mean for each condition and its standard error (SEM). Participants systematically underestimated the distance to the target in both conditions (M = −51.6%, 95% CI [−61.9, −41.3]% for the ear level condition and M = −53.6%, 95% CI [−63.9, −43.3]% for the floor level condition). After fitting a linear model with level condition as a fixed effect (parameter estimates of the model in SM), we found no significant difference in performance between conditions (β = −1.9%, 95% CI = [−16.6, 12.6]%, fixed-effect test: F(1,32) = 0.075, p = .78). We also analyzed the mean unsigned relative bias and again found no significant difference between conditions (β = 1.8%, 95% CI = [−12.6, 16.2]%, fixed-effect test: F(1,32) = 0.064, p = .80).
Response Variability
Finally, we analyzed the variability in the response across trials. Figure 2c displays the mean intra-subject standard deviation (±SEM) as a function of source distance. As the distance increases, the variability in the response of each subject increases for both conditions. This relationship is well described by a compressive power function. To quantify these relations, we took the logarithm of the standard deviation and of the target distances, and fitted an LMEM with condition and target distance as fixed effects and both random intercepts and slopes for each subject (parameter estimates of the model in SM). Both the parameter estimates and the p-values indicate that there are no significant differences in slopes between conditions (slope = 0.6, 95% CI = [0.27, 0.93] for the ear level condition and 0.64, 95% CI = [0.23, 1.04] for the floor level condition). The data show a clear pattern, which is confirmed by the ANOVA: the intra-subject variability increases with distance (fixed effects test: F(1, 16.7) = 17.77, p = .0005) and is not statistically different across condition levels (fixed effects test: F(1, 93.1) = 0.122, p = .72). There is no apparent interaction between condition and distance (fixed effects test: F(1, 92.2) = 0.03, p = .85). In addition, we analyzed the collapsed variability in the response for both conditions. In Figure 2d, we show the individual data points (i.e., per subject) along with the between-subject mean for each condition and its standard error (SEM). Participants showed similar collapsed variability in both conditions (M = 32.4%, 95% CI [25.7, 39.1]% for the ear level condition and M = 29.3%, 95% CI [22.6, 36]% for the floor level condition). After fitting a linear model with level condition as a fixed effect (parameter estimates of the model in SM), we found no significant difference in variability between conditions (β = −3%, 95% CI = [−12.6, 6.4]%, fixed-effect test: F(1,32) = 0.44, p = .51).
In conclusion, responses in both conditions showed similar intra-subject variability.
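The log-log approach used to quantify the growth of variability with distance can be illustrated with an ordinary least-squares fit. This sketch is a simplification, not the LMEM reported above: it ignores the random intercepts and slopes per subject, the function name is hypothetical, and the standard deviations are synthetic values generated from a known power function.

```python
import math

def loglog_fit(x, y):
    """Least-squares line on (log x, log y); returns (exponent, multiplicative constant)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)

distances_m = [2.4, 3.6, 4.8, 6.0]  # target distances (from the text)

# Synthetic standard deviations generated from sd = 0.3 * d ** 0.6, for illustration only.
sds_m = [0.3 * d ** 0.6 for d in distances_m]

exponent, constant = loglog_fit(distances_m, sds_m)
print(f"exponent: {exponent:.2f}, constant: {constant:.2f}")  # exponent: 0.60, constant: 0.30
```

In log-log coordinates a power function becomes a straight line whose slope is the exponent, so a slope below 1 corresponds to a compressive relation.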
Discussion
The main question of this study was whether changes in the elevation of a sound source affect auditory distance estimates. The study was motivated by previous results in the visual domain showing that the angular elevation of objects influences VDP. For example, Epstein (1966) showed that if two targets are presented in the dark, the relative height of each one in the visual field influences their perceived distance. In the same vein, Ooi et al. (2001) reported changes in the perceived distance of visual objects located at ground level when their angular declination was varied through prisms. Finally, Philbeck and Loomis (1997) showed that the typical biases in the perceived distance of visual objects located at eye level in the dark decrease when targets are located on the ground. Our hypothesis was that the auditory system is capable of using the auditory elevation cues of sources located on the ground as an absolute auditory distance cue. Thus, we expected to find less biased responses for sound sources located at ground level than for those located at the height of the listener’s ears (where an underestimation of auditory distance has classically been reported). However, responses obtained with sound sources located on the ground were almost identical to those obtained at the height of the listener’s ears. These observations were corroborated by the data analysis, where no significant differences were observed in the bias, the compression, or the variability of the responses.
The question that arises from this null result is whether the participants were unable to use elevation cues effectively to aid distance perception, or whether the result was due to another factor, such as the choice of stimuli, environment, configuration, etc. There are several reasons to believe that the participants were able to perceive the elevation of the ground-level sources. Hofman et al. (1998) reported that participants were able to differentiate the elevation of sound sources using stimuli similar to those used here. In our experiment, the elevation angles of all ground-level sources were above the auditory perceptual limit (Blauert, 1997; Oldfield and Parker, 1984; Oldfield and Parker, 1986; Perrott & Saberi, 1990). In addition, we used real broadband stimuli to ensure the presence of individualized spectral cues. In this regard, the acoustic measurements showed suprathreshold spectral changes dependent on both the height of the source (up vs. down) and its distance, especially for sources located on the ground (see Figure 1 in Supplemental material). Although our results suggest that the auditory information produced by sound sources located on the floor does not work as an ADP cue, our experimental design does not rule out a possible effect of auditory elevation on ADP. For example, it is possible that the available auditory distance cues (level, reverberation) were robust enough to provide reliable distance estimates, limiting the possible benefit of using elevation cues. Regarding this point, we consider that the responses obtained at ear level still left much room for improvement, as suggested by the high values of the relative bias (close to −50%). Another possibility is that an effect of auditory elevation was masked by the presence of more robust ADP cues.
It would be interesting to approach this topic through different experimental designs, for example, by testing stimuli and/or environments that limit the intensity and reverberation cues (using anechoic environments or roving the level cue) so that the participant depends more on auditory elevation information. An effect of elevation could also be sought with a simpler task, such as a discrimination task that compares relative changes and does not rely on internal representations of distance. Finally, increasing the elevation of the sources by using shorter distances, or conducting experiments with participants standing rather than sitting, might help to reveal an effect. We consider that these variants are worth addressing in a future study.
In conclusion, to our knowledge, this is the first time that the role of sound source elevation in ADP has been studied. Our results did not show an effect of source elevation on ADP. We consider that if source elevation were a robust auditory distance cue, we would have observed a decrease in ADP bias at least for the speaker located on the ground at the nearest distance (2.4 m), with an elevation of −26.6°. On the contrary, the responses in both conditions for this distance were almost identical. However, as our results come from a preliminary study, we are cautious in drawing conclusions in this regard. Future research is necessary to establish the role of acoustic elevation in ADP.
Supplemental Material
Supplemental material, sj-pdf-1-pec-10.1177_03010066221114589 for Is source elevation an auditory distance cue? A preliminary study by Esteban N. Lombera, Manuel A. Guevara and Ramiro O. Vergara in Perception
Footnotes
Acknowledgements
We thank Guillermo Bori and Leo Barraza for their technical assistance during the experiments.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported by grants from Universidad Nacional de Quilmes, Argentina (PUNQ 1394/15); Agencia Nacional de Promoción Científica y Tecnológica, Argentina (PICT 2016-0738); and Agencia Nacional de Promoción Científica y Tecnológica, Argentina (PICT 2018-4586).
