Abstract
Virtual reality systems are a popular tool in behavioral sciences. The participants’ behavior is, however, a response to cognitively processed stimuli. Consequently, researchers must ensure that virtually perceived stimuli resemble those present in the real world to ensure the ecological validity of collected findings. Our article provides a literature review relating to distance perception in virtual reality. Furthermore, we present a new study that compares verbal distance estimates within real and virtual environments. The virtual space—a replica of a real outdoor area—was displayed using a state-of-the-art head-mounted display. Investigated distances ranged from 8 to 13 m. Overall, the results show no significant difference between egocentric distance estimates in real and virtual environments. However, a more in-depth analysis suggests that the order in which participants were exposed to the two environments may affect the outcome. Furthermore, the study suggests that a rising experience of immersion leads to an alignment of the estimated virtual distances with the real ones. The results also show that the discrepancy between estimates of real and virtual distances increases with the incongruity between virtual and actual eye heights, demonstrating the importance of an accurately set virtual eye height.
Introduction
Over the past few years, tremendous progress has been made in the field of virtual reality (VR). The technology surrounding head-mounted displays (HMDs) has evolved in particular: Technological performance has continued to improve, while acquisition and operating costs have decreased significantly. This has also led to the increasing popularity of VR systems in behavioral research. However, these systems are often used without having gone through a sufficient validation process, although numerous studies have shown that relevant cognitive differences between real and virtual environments may arise (e.g., Bhagavathula et al., 2018; de Kort et al., 2003; Feldstein, 2019; Feldstein et al., 2018; Feldstein & Dyszak, 2020; Feldstein & Peli, 2020; Fink et al., 2007; Slater et al., 2000). The ecological validity of data collected with VR systems must be ensured through a proper validation process that compares the perception and behavior of users in the given virtual environment to perception and behavior in real environments.
VR systems that aim at generating perceptually realistic user experiences are required—among other things—to support depth perception. Different subsystems of modern VR systems target different depth cues:
The simulation software provides photorealistic rendering that ensures the accuracy of occlusions, object shadings, heights in the visual field, and relative sizes. Low-latency head tracking and low pixel persistence support the perception of motion parallax. High display resolution ensures the visibility of looming and relative texture density. Stereoscopic imaging enables cues from binocular disparity and vergence.
A general overview of different depth cues and their effective distance ranges can be found in Feldstein (2019).
Some user performances have been systematically worse in virtual environments than in real environments, including egocentric distance estimation. Although people tend to accurately judge distances to objects in real environments (e.g., Loomis & Knapp, 2003), a meta-analysis conducted by Renner et al. (2013) has shown that egocentric distance estimates in VR correspond to approximately 74% of the modeled distance. The continuous enhancement of HMD technology, however, also leads to perceptual improvements in virtual environments, warranting new experiments in this field.
In the next section, we provide readers with an overview of experiments that have investigated distance perception in virtual environments in the past. These findings form the basis for the design and parameter choices in our experiment that we report thereafter. We investigated distance perception using a state-of-the-art HMD and compared the findings with distance estimates in a real environment. Our experimental conditions differ from those of most reported experiments: We investigate somewhat farther distances and use an outdoor environment. Our intention was to broaden, reinforce, and update the knowledge that researchers have collected on this topic in the past and to reflect the status quo in state-of-the-art HMD technology.
Literature Review
Distance Estimation in Real and Virtual Environments
Distance estimates relative to the modeled distances (such as reported in the meta-analysis by Renner et al., 2013) primarily characterize the participants’ ability to judge distances in a given environment. Misjudging distances in virtual environments, however, is not necessarily critical per se, as long as the errors in real and virtual environments show similar characteristics. Accuracies in distance judgments in real environments have been observed to vary to some degree across different measurement methods and participant groups. Figure 1 shows the average distance estimation accuracies for 40 different publications over the last 40 years. Although overall distance judgments in real environments have been fairly accurate (M = 92%), there is notable variation across reports (SD = 9.5) (cf. Alexandrova et al., 2010; Andre & Rogers, 2006; Creem-Regehr et al., 2005; Foley et al., 2004; Fukusima et al., 1997; Grechkin et al., 2010; Interrante et al., 2006; Jones et al., 2008, 2011, 2016; Kelly et al., 2004, 2017; Klein et al., 2009; Knapp, 1999; Knapp & Loomis, 2004; Loomis et al., 1992, 1998; Messing & Durgin, 2005; Nguyen et al., 2010; Philbeck et al., 1997; Philbeck & Loomis, 1997; Phillips et al., 2009, 2010; Proffitt et al., 2003; Ries et al., 2009; Rieser et al., 1990; Sahm et al., 2005; Steenhuis & Goodale, 1988; Steinicke et al., 2010; Swan II et al., 2007; Thompson et al., 2004; Thomson, 1983; Willemsen et al., 2004, 2008, 2009; Witmer & Kline, 1998; Witmer & Sadowski, 1998; Wu et al., 2004).

Figure 1. Distance Estimates in Real Environments Relative to the Modeled Distances (retrieved from 40 different publications).
Consequently, when evaluating the quality of provided depth cues in virtual environments, the users’ relative performance between virtual and real estimates should be investigated, and not the users’ absolute performance in estimating distance.
Over the years, distance estimation performances have been measured in various VR systems and compared to distance estimates in real environments. Many of these studies aimed to identify potential factors that were postulated to be responsible for the distance compression observed in virtual environments with the ultimate goal of resolving this perceptual deficiency. Figure 2 shows the ratio of virtual distance judgments to those in real environments, as reported in 20 different studies since the 1990s (see Grechkin et al., 2010; Interrante et al., 2006; Jones et al., 2008, 2011, 2016; Kelly et al., 2017; Knapp, 1999; Messing & Durgin, 2005; Nguyen et al., 2010; Phillips et al., 2009, 2010; Ries et al., 2009; Sahm et al., 2005; Steinicke et al., 2010; Thompson et al., 2004; Willemsen et al., 2004, 2009; Willemsen & Gooch, 2002; Witmer & Kline, 1998). It should be noted that experiments that described estimation performance with several technological configurations are only represented by the most advanced technological condition, permitting a realistic mapping of the state-of-the-art performance at the respective time.

Figure 2. Distance Estimates in Virtual Environments Relative to Estimates in Real Environments (plotted against the year in which the study was published). The straight line shows the result of linear regression.
For better comparability, this graph only includes studies that investigated the perception of virtual distances in head-mounted systems. Studies that simulated technical limitations of HMDs, such as adding weights to the user’s head or reducing the field of view (FOV) in a real environment (e.g., Creem-Regehr et al., 2005; Knapp & Loomis, 2004), were not taken into account in the given graphic representation. Such artificial limitations in real environments showed tendencies toward distance compression similar to virtual environments, but not to the same extent. Experiments in which the virtual environments were projected onto large screen systems (e.g., Alexandrova et al., 2010; Klein et al., 2009; Naceri & Chellali, 2012; Ziemer et al., 2009) were also not included in the given comparison because of the substantial technological differences. Although all of the experiments in Figure 2 used a similar technological concept (i.e., HMDs), the study designs contained substantial differences, particularly with regard to the evaluation method and estimation distances (see Table 1). The graph must, therefore, be interpreted with caution and serves mainly to illustrate the performance improvement over the years, presumably connected to improvements in HMD technologies. A simple regression analysis confirmed the significant alignment of virtual estimates to estimates in real environments over time: F(1, 23) = 17.06, p < .001, R² = .423.
Table 1. Distance Judgments in Real and Virtual Environments (in chronological order).
The average ratio of virtual to real distance estimates was 77% in the analysis shown in Figure 2. The value is similar but not equal to the average 74% ratio of estimated to modeled values for the virtual environments (see Table 1). However, it must be mentioned that the corresponding estimates in the real environments were quite accurate, with an average accuracy of 95%. This high accuracy in the real environments may be explained by the fact that the vast majority of the 20 analyzed experiments used blind walking as the evaluation method; other evaluation methods that are known to produce larger underestimations in real environments (see Figure 1) are underrepresented in the analysis shown in Figure 2 due to a lack of available data.
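The relationship between the three averages reported above can be checked with simple arithmetic. The following minimal sketch divides the average accuracy in virtual environments (74%) by the average accuracy in real environments (95%). The function name is hypothetical, and the result is only a rough consistency check: a ratio of two averages need not equal the average of the per-study ratios, which is the 77% reported for Figure 2.

```python
def virtual_to_real_ratio(virtual_accuracy: float, real_accuracy: float) -> float:
    """Ratio of virtual to real estimates, both expressed relative to the modeled distance."""
    if real_accuracy <= 0:
        raise ValueError("real_accuracy must be positive")
    return virtual_accuracy / real_accuracy


# Average accuracies taken from the text: 74% (virtual) and 95% (real).
ratio_of_means = virtual_to_real_ratio(0.74, 0.95)
print(f"{ratio_of_means:.1%}")  # prints 77.9%
```

The value of roughly 78% lies close to, but not exactly at, the reported 77%, which is what one would expect when averaging per-study ratios instead of dividing two averages.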
Measurement Methodologies
Distance judgments are commonly assessed on the basis of verbal estimates, perceptual matching tasks, or visually directed actions. These different tasks demand different sets of skills and cognitive processes. Verbal estimates typically require participants to numerically judge the absolute distance, while matching tasks demand the ordinal or relative comparison of two or more distances. Visually directed actions, such as blind walking or blind throwing, additionally call for perceptual-motor skills. Combining visual estimation tasks with relevant locomotive actions has been shown to increase the accuracy of distance estimates (Maruhn et al., 2019; Schneider et al., 2018). The nature of the task (i.e., ordinal, relative, or absolute judgments) determines which depth cues are cognitively relevant for the participants’ distance judgments, with an overview of effective depth cues being reported in Thompson et al. (2011).
In real environments, participants typically show accurate judgments for distances of up to ten meters, with the exception of verbal estimates that tend to show some underestimation also at closer distances (see Figure 1, but also Creem-Regehr & Kunz, 2010; Cutting & Vishton, 1995; Loomis & Philbeck, 2008; Proffitt & Caudek, 2003; Renner et al., 2013). Figure 1 also shows that there is a clear tendency to underestimate distances rather than overestimate. The accuracy of distance judgments in real environments may significantly vary between different groups of participants. For example, direct blind walking has been reported to be quite precise in most studies (e.g., Creem-Regehr et al., 2005; Interrante et al., 2006; Loomis et al., 1992, 1998; Phillips et al., 2009, 2010; Ries et al., 2009; Rieser et al., 1990; Steenhuis & Goodale, 1988; Steinicke et al., 2010; Willemsen et al., 2004, 2009; Willemsen & Gooch, 2002; Wu et al., 2004), while some studies have reported notable underestimations (e.g., Andre & Rogers, 2006; Jones et al., 2008, 2011, 2016; Kelly et al., 2017; Witmer & Sadowski, 1998).
In virtual environments, users consistently underestimated distances, regardless of the measurement method (see Table 1, but also Alexandrova et al., 2010; Creem-Regehr et al., 2015; Durgin et al., 2002; Klein et al., 2009; Kunz et al., 2009; Z. Li et al., 2011; C. J. Lin et al., 2015; Loomis & Philbeck, 2008; Napieralski et al., 2011; Peer & Ponto, 2017; Plumert et al., 2005; Renner et al., 2013; Schneider et al., 2018; Willemsen et al., 2004; Witmer & Sadowski, 1998; Zhang et al., 2012; Ziemer et al., 2009).
Mean Values of Distance Estimates.
Since we are interested in comparing the perception in the virtual environment with the one in reality, the method of choice seems to be a relative pairwise comparison. However, due to the amount of time required for setting up the VR equipment and the subsequent necessary acclimatization period, it is not possible to rapidly switch between the two environments, disqualifying tasks that demand direct matching or discrimination between the real and virtual environments. Furthermore, most of the common action-based methods were not viable for the distance range intended in our study (8–13 m). The limited space in our VR laboratory would only have allowed for action-based methods that are cognitively complex, such as triangulated blind walking (e.g., Knapp & Loomis, 2004) or timed imagined walking (e.g., Plumert et al., 2005). Willemsen et al. (2004) and Grechkin et al. (2010), however, reported that distance judgments from triangulated blind walking and timed imagined walking are less accurate than those for direct blind walking. In addition, results from Klein et al. (2009) indicate that judgments made during triangulated blind walking deteriorated rapidly for distances of more than ten meters, disqualifying this method for farther distances. Verbal estimates, on the other hand, were reported to remain consistent even for distances of ten meters or more (Klein et al., 2009; Loomis & Philbeck, 2008). Loomis and Philbeck (2008) deduced from a series of eight different experiments a constant underestimation of about 20% for verbal estimates.
We chose verbal estimates for our study design despite the known estimation error in real environments (see Figure 1) because practicalities prevent a matching task involving a direct comparison of the two environments, and estimates in action-based methods deteriorate at farther distances. The verbal method may be applied to investigate whether underestimation effects were similar in both environments. The objective of our study was not to evaluate absolute estimation accuracy but the relative difference between estimates in the two environments. After all, we are simply interested in knowing whether participants perceive depth in our VR system the same way they do in reality.
Readers with a keen interest in comparisons of measurement methods relating to distance perception may find useful results in a recent study by Maruhn et al. (2019). Andre and Rogers (2006), Grechkin et al. (2010), Kelly et al. (2017), Klein et al. (2009), Loomis et al. (1998), Loomis and Philbeck (2008), Napieralski et al. (2011), Philbeck and Loomis (1997), Sahm et al. (2005), Swan II et al. (2007), and Willemsen et al. (2004, 2009) also compared different measurement methods with each other.
Factors Affecting Egocentric Distance Perception
Factors that influence egocentric distance estimates in virtual environments may be divided into three categories: human factors, compositional factors, and technical factors.
Human Factors
The way an environment is perceived and processed may be subject to human individuality. Proffitt (2006) found that nonvisual factors, such as the given goal and the participant’s physiological state, may also influence distance judgments. More specifically, it has been postulated that the feeling of presence, which differs among users due to varying susceptibility to perceived sensations, affects distance judgments (Interrante et al., 2006; Phillips et al., 2010, 2012; Steinicke et al., 2010). The participants in our study were asked to complete a presence questionnaire after their experience with the virtual environment, ultimately enabling the investigation of the relationship between the subjectively experienced immersion and the estimation performance.
Several studies suggest that male and female participants do not exhibit significant differences when judging distances (Creem-Regehr et al., 2005; Interrante et al., 2006; Naceri & Chellali, 2012). It has also been postulated that factors like age and height have no statistical influence on distance estimates (Murgia & Sharkey, 2009). Furthermore, Murgia and Sharkey (2009) reported that previous experience with virtual environments did not affect the accuracy of virtual distance estimates and that no increase in estimation accuracy was evident over the course of the experiment. Distance judgments that improved during experiments by Jones et al. (2011, 2012) were likely due to the poorly fitting HMD: The participants in their studies may have learned to take advantage of this deficiency during the visually directed action task by peeking at the real environment through the gap below the HMD. There is, however, evidence that it is possible to train the accuracy of distance estimates in virtual environments by providing participants with just a few minutes of feedback-based practice (see Altenhoff et al., 2012; Kelly et al., 2013, 2014; Mohler et al., 2006; Richardson & Waller, 2005, 2007; Waller & Richardson, 2008). This training phase should not be considered a solution to the distance compression occurring in virtual environments since learned estimation accuracy is different from perceived estimation accuracy. Researchers who intend to use blind-walking tasks should also consider that participants may grow in confidence in walking blindly over the course of the experiment, which consequently could affect the outcome (Philbeck et al., 2008).
Compositional Factors
The composition of virtual environments has been shown to affect distance estimates. Pictorial depth cues, such as linear perspective, foreshortening, and texturing, not only increase the realism of a scene but also improve distance perception (Murgia & Sharkey, 2009; Surdick et al., 1997). The superiority of highly realistic sceneries as opposed to nonphotorealistic graphical renderings for judging distances has been shown in multiple studies (Phillips et al., 2009, 2010, 2012; Phillips & Interrante, 2011; Vaziri et al., 2017). In particular, the presence of a realistic ground texture has been shown to improve distance judgments (Sinai et al., 1998). Overall, immersive depth cues, as well as high graphic quality, have been shown to affect the participants’ performance when distances had to be estimated (Kunz et al., 2009). In our experiment, we used a powerful simulation application that rendered a rich virtual environment.
Andre and Rogers (2006) demonstrated that the choice of whether an experiment is conducted indoors or outdoors might affect the accuracy of distance estimates. We chose to conduct our experiment in an outdoor environment for two reasons: One reason was the intention to use our newly developed VR system for outdoor experiments in the future, and the other reason was the sparse literature that exists on distance perception in outdoor environments in contrast to reported indoor experiments.
Scenery content itself has been shown to affect distance estimates (Lappin et al., 2006). In particular, the use of an environment that happened to be a replica of a familiar (real) environment has been demonstrated to positively affect distance judgments (Interrante et al., 2006, 2008; Phillips et al., 2012; Steinicke et al., 2010). We replicated our real-world setting in the virtual environment, allowing for a high consistency of monocular depth cues between the two environments, which is crucial when comparing real and virtual results.
Technical Factors
Systems that display immersive virtual environments have a slew of technical limitations that could be causes for distance compression. This is supported by the experiment carried out by Messing and Durgin (2005), who demonstrated that significant underestimations of distances could be observed with HMDs, even when a recorded real environment was displayed, indicating that the hardware components of the VR system contribute to the underestimation phenomenon.
Although binocular disparity is considered to be a highly effective depth cue (Cutting & Vishton, 1995), its relevance is limited to relatively close distances, with the effect diminishing with increasing distance (Cutting & Vishton, 1995; Foley, 1991; Lappin, 2014; Nagata, 1989; Palmisano et al., 2010). Studies that investigated tasks involving spatial judgment came to varying conclusions when stereoscopic viewing conditions were compared to monoscopic ones: Tasks in close proximity to the participant (< 2 m) benefitted from the availability of stereoscopic depth information (Bingham et al., 2001; Luo et al., 2009; van der Kamp et al., 1997), while tasks that focused on somewhat farther distances did not show an advantage of stereoscopic conditions over monoscopic ones (Creem-Regehr et al., 2005; Eggleston et al., 1996; Roumes et al., 2001; Willemsen et al., 2008). Participants in our study saw the virtual environment stereoscopically.
Providing users in virtual environments with accommodative depth cues remains an unsolved challenge to this day since all current systems use a single, invariable focal plane. Similar to convergence, accommodation is, however, only an effective depth cue for distances of up to two meters (Cutting & Vishton, 1995; Fisher & Ciuffreda, 1988; von Hofsten, 1976). Within this range, it has been shown that a conflict between vergence and accommodation does affect depth perception (Swenson, 1932), although other available depth cues may mitigate the relevance of accommodation (Hoffman et al., 2008; Mon-Williams & Tresilian, 1999, 2000). The lack of accommodation effects in our virtual environment may be considered negligible since we only investigated distances of eight meters and more.
Although previous work has been inconclusive on effects of FOV on distance perception (cf. Creem-Regehr et al., 2005; Knapp & Loomis, 2004; Loftus et al., 2004), the advent of HMD systems with higher FOV suggests that the FOV may be relevant to egocentric distance estimation (Buck et al., 2018; Jones et al., 2013, 2016; B. Li et al., 2015, 2016, 2018; Q. Lin et al., 2011). Klein et al. (2009), who compared a virtual environment that was displayed on a large tiled screen to a cave automatic virtual environment (CAVE) with a wide FOV, also attributed a vital role to FOV when it comes to making distance judgments.
Additionally, the weight and inertia of head-mounted systems have been postulated to contribute to the compression of perceived depth (Buck et al., 2018; Jones et al., 2008; Willemsen et al., 2004, 2009). However, the ergonomics of the HMD itself cannot explain the entire distance-compression phenomenon observed in virtual environments (Grechkin et al., 2010; Kelly et al., 2017; Willemsen et al., 2004, 2009). Nonetheless, more accurate distance judgments may be observed with newer HMDs, which are lighter and more ergonomic and also provide larger FOVs (Buck et al., 2018; Kelly et al., 2017; Young et al., 2014). Buck et al. (2018) compared a variety of modern HMDs with each other with a focus on distance judgments, affirming the technological advancement. Unfortunately, their experiments lacked a relative comparison to estimates in reality, which we consider to be crucial for the proper performance evaluation of the given technology.
Depth information retrieved from motion parallax requires real-time tracking of the user’s translational movements. A full-body motion-capture system was implemented in our study, even though several studies suggest that motion parallax is not crucial for accurately completing tasks relating to distance judgments (Jones et al., 2008; Luo et al., 2009; Narayan et al., 2005). The ability to move freely in the virtual environment, however, affects immersion and thereby the user’s experience of presence, which has been shown to affect user performance (see Bowman & McMahan, 2007; Creem-Regehr et al., 2015; Cummings & Bailenson, 2014; Hendrix & Barfield, 1995; Schuemie et al., 2001; Slater & Wilbur, 1997).
Users are not able to see their own bodies when using head-mounted VR systems, making the implementation of an avatar necessary. Movement-coupled avatars of one’s own body have not only been shown to increase the experience of presence (Leyrer et al., 2011; Slater, 2009; Slater & Usoh, 1994) but also to improve distance estimates (Mohler et al., 2008; Phillips et al., 2010; Ries et al., 2009). Since our study strived to isolate possible causes for distance compression, users were provided with an avatar mimicking their body movements.
Method
Experimental Concept
Participants were asked to verbally estimate the distance to an orange traffic cone with a height of 30 cm. This object possesses excellent visibility, and the cone dimensions are not known to the participants since typical traffic cones may vary between 20 cm and 100 cm in height. This prevents participants from deducing the object’s distance based on cues derived from familiar sizes (see Gogel, 1969; Gogel & Newton, 1969; Predebon, 1994).
The traffic cone was placed on the ground at distances of 8 m, 9 m, 10 m, 11 m, 12 m, and 13 m in pseudorandomized order. All of these distances are within the range of the action space (2–30 m) that requires a different set of depth cues than distances within personal space (< 2 m) or vista space (> 30 m) (see Cutting, 1997; Feldstein, 2019). The distances in our experiment were selected to contrast with most of the reported experiments on distance perception, which primarily investigated distances of at most seven meters.
Experimental Tools
Device
Our VR system involved six essential elements: a motion-capture suit, a stereoscopic HMD, stereo headphones with active noise control, a motion-capture system, a dynamic cable-routing system, and a control center (see Figure 3).

Figure 3. Illustration of the Virtual Reality System.
The simulation is run using the Silab software framework (WIVW, Würzburg, Germany), which was originally developed as traffic simulation software for driving simulation. This tool communicates with the Vicon motion-capture system (Vicon Motion Systems, Yarnton, U.K.) that relies on ten cameras covering a walking area of 4 × 2.5 m², a Vicon Giganet data preprocessing unit, and the Vicon Nexus software. This motion-capture system identifies and tracks the LED markers that are mounted on the motion-capture suit. In addition to the motion-tracking data processed by Vicon, the traffic simulation software also receives rotational head-motion data from the HMD’s built-in motion-capture unit. The HMD used in our study is an Oculus Rift Development Kit 2 (Oculus VR, Menlo Park, CA, U.S.) with a display resolution of 1,920 × 1,080 px². This HMD provides each eye with its own image in order to exploit the advantages of stereoscopic viewing. The effective FOV of the HMD depends on the configuration of the lenses and the user’s anatomy. In our study, the vertical FOV was determined to be approximately 94°.
Further technical details relating to our simulator setup can be found in Feldstein (2020), Feldstein et al. (2016, 2018), as well as in Lehsing and Feldstein (2018).
Virtual Environment
The virtual environment, created with the Silab software framework, was a replica of the surroundings of the campus of the Technical University of Munich in Garching. Figure 4 shows the real environment and the virtual replica used in our experiment. Note the high consistency of monocular depth cues in both environments.

Figure 4. Real and Virtual Environments of the Experiment.
Presence Questionnaire
The presence questionnaire (PQ) used in this study was initially developed by Witmer and Singer (1998) and later revised by Witmer et al. (2005). The factor structure that we used was developed by the Cyberpsychology Lab of the Université du Québec en Outaouais (UQO) and contained 22 questions (Robillard et al., 2002). The PQ allows for evaluating the subjectively experienced immersion within a given VR system. The questions were presented in their original English format, with participants expressing the perceived immersion on 7-point Likert-type scales for each of the 22 items.
Estimation Discrepancy Metric
Distance estimates depend to some extent on the participants’ abilities to judge the perceived distance. The study’s objective is not to assess the participants’ estimation accuracy but the estimation discrepancy between the real and virtual environments. Instead of comparing the verbal distance estimate with the actual distance, we require a key figure that compares the estimates in the two environments directly with each other. However, the absolute difference between the estimates made in the two environments may potentially grow with increasing estimation distance, consequently rendering the absolute numerical discrepancy meaningless. A possible way to judge the difference between the two environments consists of evaluating the ratio of the absolute environment discrepancy to the actual estimation distance. We defined this value as the relative estimation discrepancy and computed it as follows:
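The formula itself is not reproduced in this text. The following minimal sketch follows the verbal definition given above, that is, |estimate in the virtual environment − estimate in the real environment| / actual distance. Taking the unsigned difference is an assumption made here, and the function and parameter names are hypothetical.

```python
def relative_estimation_discrepancy(
    estimate_virtual_m: float,
    estimate_real_m: float,
    actual_distance_m: float,
) -> float:
    """Environment discrepancy relative to the actual distance.

    Assumption: the discrepancy is the unsigned difference between the
    estimates made in the virtual and in the real environment.
    """
    if actual_distance_m <= 0:
        raise ValueError("actual_distance_m must be positive")
    return abs(estimate_virtual_m - estimate_real_m) / actual_distance_m


# Illustrative values only: 8 m estimated in VR, 9 m in reality, cone at 10 m.
example = relative_estimation_discrepancy(8.0, 9.0, 10.0)
print(example)  # 0.1
```

Dividing by the actual distance makes discrepancies at 8 m and at 13 m comparable, which is the motivation stated in the paragraph above.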
Participants
The participants were selected from students and university staff, which resulted in the sample being relatively homogeneous. The collective consisted of 30 subjects with 20 male and 10 female participants, who were on average 21.2 y/o (SD = 2.0; [17–27 y/o]) and had an average height of 179.3 cm (SD = 10.0; [161–198 cm]). It should be noted that the initial sample consisted of 31 participants, with one being retroactively excluded since 58 out of 60 estimates by this participant were above the third standard deviation from the group mean values (at the specific conditions), thus showing an apparent overload with regard to the task at hand. For the remaining 30 participants, all of the given estimates were within the third standard deviation, with the exception of one single estimate out of the 1,800 values (30 participants × 60 estimates), affirming the overall consistency of the participants’ estimates.
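The exclusion criterion described above can be sketched as follows. This is only an illustration under stated assumptions: the values of one condition are compared against the mean and the sample standard deviation of that same group (including the value under test), deviations in both directions are flagged, and the data are toy values rather than measurements from the study. The function name is hypothetical.

```python
from statistics import mean, stdev


def flag_beyond_n_sd(values, n_sd=3.0):
    """Return one flag per value: True if it lies more than n_sd
    sample standard deviations away from the group mean."""
    group_mean = mean(values)
    group_sd = stdev(values)
    return [abs(v - group_mean) > n_sd * group_sd for v in values]


# Toy estimates (in meters) for one condition: 29 plausible values and one extreme value.
toy_estimates_m = [9.0, 10.0, 11.0] * 9 + [9.0, 10.0, 40.0]
flags = flag_beyond_n_sd(toy_estimates_m)
print(flags.count(True))  # 1
```

Applied per condition, the number of flagged estimates per participant could then be counted to identify a participant whose estimates are almost all flagged.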
Among the 30 participants, 23 did not require any visual aids, 5 wore contact lenses, and 2 wore glasses. The participants were not examined with regard to their stereoscopic acuity.
The sample size chosen for this experiment exceeded the required minimum size of n ≥ 20, which had been determined in a statistical power analysis (see Faul et al., 2007) for investigating the two-tailed main effect of environment (i.e., whether there is a difference in the distance estimates between the two environments or not). The power analysis was run based on data collected in a pilot study with seven participants, with the level of significance set to α = .05 and the power of the test to 1 – β = .8 (see Cohen, 1977), while the effect size was determined at
Experimental Procedure
A counterbalanced experimental design was necessary since all of the participants had to complete the distance estimates in the virtual environment as well as in the real one. Consequently, half of the participants started the experiment in the simulator, whereas the other half was asked to estimate the distances in the real world first. This exposure order led to a between-subject factor, hereinafter called order of environments.
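A counterbalanced assignment of this kind can be sketched in a few lines. How participants were actually allocated to the two orders is not described here, so the alternating scheme, the function name, and the labels below are assumptions for illustration.

```python
from collections import Counter


def assign_order_of_environments(participant_ids):
    """Alternate the starting environment so that both orders occur equally often."""
    orders = ("virtual_first", "real_first")
    return {pid: orders[index % 2] for index, pid in enumerate(participant_ids)}


assignment = assign_order_of_environments(range(1, 31))  # 30 participants
group_sizes = Counter(assignment.values())
print(group_sizes["virtual_first"], group_sizes["real_first"])  # 15 15
```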
In the virtual environment, participants were equipped with the HMD, the headphones, and the motion-capture suit, followed by a calibration process. During a short introduction, the participants had the chance to familiarize themselves with the system while being loaded into a virtual replica of the simulator laboratory. The participants could move around for a few minutes and explore the virtual environment. After this acclimatization period, the outdoor simulation was loaded, and the investigator briefly explained the task to the participants. Participants completed a sequence of 30 estimates from a predefined spot. Each of the six distances occurred five times in a pseudorandomized order. After finishing all of the 30 estimates in the virtual environment, the participant completed the PQ.
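A sequence of 30 trials with six distances occurring five times each can be generated as sketched below. The exact pseudorandomization scheme is not specified in the text, so a plain seeded shuffle without further constraints (for example, on immediate repetitions) is assumed; the names are hypothetical.

```python
import random
from collections import Counter

DISTANCES_M = (8, 9, 10, 11, 12, 13)
REPETITIONS = 5


def make_trial_sequence(seed):
    """Return a reproducible, shuffled list of 30 cone distances in meters."""
    rng = random.Random(seed)  # local generator, so the sequence depends only on the seed
    trials = [distance for distance in DISTANCES_M for _ in range(REPETITIONS)]
    rng.shuffle(trials)
    return trials


sequence = make_trial_sequence(seed=1)
print(len(sequence), Counter(sequence)[8])  # 30 5
```

Using a fixed seed per participant would allow the same sequence to be regenerated later, for example when matching estimates to conditions during analysis.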
The estimation procedure for the real environment was identical to the virtual one, albeit without the participant wearing the VR equipment. Furthermore, the traffic cone had to be moved manually by the investigator’s assistant. Participants were asked to close their eyes for approximately four seconds while the traffic cone was repositioned in the real environment or respawned in the virtual one. Participants were verbally informed when they were allowed to open their eyes again. This ensured that the participants could not visually follow the relative position change of the cone, thus reducing estimation bias.
The time for each estimate was measured with a stopwatch in both environments. The participants were not aware of their decisions being timed. They also did not receive any feedback on their estimation performance and accuracy.
Results
Data pretreatment and statistical analyses were mainly based on recommendations by Field (2013).
For the analyses of variance (ANOVAs), the five repeated estimates for the same distance and environment were merged to produce 12 values (2 Environments × 6 Distances) for each of the 30 participants. The distribution of the collected estimation values turned out to be right-skewed. The reason for this is the relative proximity of the estimation range to the participant in combination with the existence of a true zero point for distance estimates (i.e., all of the estimates remain above zero). Consequently, distance estimates were proximally limited and distally unlimited. Therefore, a logarithmic transformation of the estimation values was necessary prior to the ANOVA and t-test analyses in order to fulfill the requirements for a normal distribution. The effect sizes in these analyses were reported using Pearson’s correlation coefficient r.
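The aggregation and transformation described above can be sketched as follows. The article does not state how the five repetitions were merged or which logarithm base was used; averaging the repetitions first and then applying the natural logarithm are assumptions, as are the data layout and the function name.

```python
import math
from collections import defaultdict
from statistics import mean


def aggregate_log_estimates(trials):
    """Collapse repeated trials into one log-transformed value per cell.

    `trials` is an iterable of tuples
    (participant, environment, distance_m, estimate_m).
    Assumption: repetitions are averaged, then the natural log is taken.
    """
    cells = defaultdict(list)
    for participant, environment, distance, estimate in trials:
        if estimate <= 0:
            raise ValueError("estimates must be positive to take the logarithm")
        cells[(participant, environment, distance)].append(estimate)
    # One value per Participant x Environment x Distance cell.
    return {key: math.log(mean(values)) for key, values in cells.items()}


# Hypothetical data: five estimates by one participant for one distance.
example = [("P01", "real", 8, e) for e in (7, 8, 8, 9, 8)]
cell_values = aggregate_log_estimates(example)
```

Applied to the full dataset, this would yield the 12 values per participant (2 Environments × 6 Distances) that enter the ANOVA.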
The relative estimation discrepancies, which were generated from the untransformed estimation values (see the Estimation Discrepancy Metric section), and the measured decision-making times showed right-skewed distributions. Thus, these values were also transformed logarithmically. The subsequent regression analyses were based on 1,000 bootstrap samples, ultimately ensuring more robust results.
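A bootstrapped regression of this kind can be illustrated with the sketch below. The article only states that 1,000 bootstrap samples were used; resampling whole cases with replacement and reporting a percentile confidence interval for the slope are assumptions, and the data are synthetic.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, n_boot=1000, alpha=0.05, seed=None):
    """Return the sample slope and a percentile bootstrap interval for it."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # slope undefined for a constant predictor
            continue
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int(round(alpha / 2 * n_boot))]
    upper = slopes[int(round((1 - alpha / 2) * n_boot)) - 1]
    return ols_slope(xs, ys), (lower, upper)


# Synthetic example: a linear trend with a small alternating disturbance.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
slope, (ci_low, ci_high) = bootstrap_slope_ci(xs, ys, n_boot=1000, seed=42)
```

The percentile interval does not rely on normally distributed residuals, which is the sense in which bootstrapping makes the regression results more robust.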
Absolute Estimation Accuracy
An independent-samples t-test was run, comparing each estimate with the actual distance. Overall, the estimates differed significantly from the actual distances: t(1799) = 9.027, p < .001, r = .208. A statistically significant outcome was to be expected, however, given the large sample of 1,800 values. Similarly, significant differences to the actual distances were observed in both environments when analyzing the estimates separately by environment: t(1358.5) = 8.425, p < .001, r = .223 (real environment) and t(1271.1) = 6.921, p < .001, r = .191 (virtual environment). On average, the estimates amounted to 94% of the modeled distance in the real environment and 96% in the virtual one.
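The effect sizes r reported with these t-tests are consistent with the common conversion r = sqrt(t² / (t² + df)). The following sketch applies that conversion to the reported test statistics; that the authors used exactly this formula is an assumption, and the function name is illustrative.

```python
import math


def r_from_t(t, df):
    """Convert a t statistic and its degrees of freedom into the effect size r."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


# (t, df, reported r) as given in the text.
reported = [
    (9.027, 1799, 0.208),    # all estimates
    (8.425, 1358.5, 0.223),  # real environment
    (6.921, 1271.1, 0.191),  # virtual environment
]
recomputed = [round(r_from_t(t, df), 3) for t, df, _ in reported]
```

For all three tests, the recomputed value rounds to the reported one.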
Environment Comparison
The estimation values in the real and virtual environments were pairwise compared (within-subject) using a 2 × 6 (Environment × Distance) repeated-measures ANOVA. The results revealed no significant main effect of environment: F(1, 29) = 0.21, p = .650, r = .085. The interaction effect of Environment × Distance was also not significant, suggesting that overall there were no notable differences between the two environments in terms of distance perception: F(5, 145) = 1.18, p = .320. The distribution of the estimation values for each Environment and Distance is shown in Figure 5 with box plots. Table 2 reports the mean estimation values for the six different distances.

Figure 5. Box-and-Whisker Plots Showing the Distribution of the Distance Estimates. Values between the lower and upper quartile are represented by the box, while the whiskers identify estimates within 1.5 IQR of the lower and upper quartile. The horizontal line in the box shows the median, and the small square shows the arithmetic mean. The horizontal lines behind the plots show the actual distances.
A simple regression analysis suggests that the investigated distance did not affect the relative estimation discrepancy (see the Estimation Discrepancy Metric section), indicating that the relative estimation discrepancy remained the same for different distances: F(1, 898) = 0.15, p = .695, R² < .001. This implies that the absolute estimation discrepancy between the two environments increases linearly with increasing distance.
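The step from a constant relative discrepancy to a linearly growing absolute discrepancy can be written out briefly. The exact metric is defined in the Estimation Discrepancy Metric section and is not reproduced here. For illustration, assume that the relative discrepancy ρ is the absolute discrepancy Δ between the two environments divided by the distance d, and that ρ takes the same value c at every distance (the symbols ρ, Δ, and c are introduced only for this sketch):

```latex
\rho(d) = \frac{\Delta(d)}{d} = c
\quad\Longrightarrow\quad
\Delta(d) = c \, d
```

Under this assumption, the absolute discrepancy is a straight line through the origin with slope c.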
Gender
The influence of the participants’ gender on the distance estimation performance was investigated using a
The ANOVA compared the absolute distances that the male and female participants estimated. Additionally, we compared the discrepancy of the estimates between the real and virtual environments for the male and female participants using an independent-samples t-test. The outcome also suggests no significant gender difference: t(898) = −0.650, p = .514, r = .022.
Order of Environments
A 2 × 6 × 2 (Environment × Distance × Order of Environments) mixed-design ANOVA investigated the main effect of the between-subject factor Order of Environments, with the results suggesting the absence of a significant main effect: F(1, 28) = 0.05, p = .821, r = .002. The interaction effect of environment and environment order, however, turned out to be significant: F(1, 28) = 5.99, p = .021. The within-subject comparison showed that participants who started the experiment in the virtual environment had significantly different estimates between the two environments (p = .049). No significant difference was shown for those who started the experiment in the real environment (p = .172). The interaction effect of environment and environment order is shown in Figure 6.

Figure 6. Interaction Effect of Environment and Environment Order.
The between-subject test that compared the group starting in the real environment with the one starting in the virtual environment showed no significant differences in either the real environment (p = .788) or the virtual one (p = .527).
The second-order interaction effect of Environment × Environment Order × Distance also showed a significant effect: F(5, 140) = 4.14, p = .002. Figure 7 shows the mean estimates of the participants who started in the real environment, and Figure 8 shows the estimates of the participants who started in the virtual environment.

Figure 7. Mean Distance Estimates for the Participant Group Starting the Experiment in the Real Environment.

Figure 8. Mean Distance Estimates for the Participant Group Starting the Experiment in the Virtual Environment.
Experienced Immersion
Participants reported the subjectively experienced immersion using the PQ. The PQ values were compared to the relative estimation discrepancies between the two environments. A simple regression analysis suggests a significant linear relationship: F(1, 898) = 16.65, p < .001, R² = .018, with a standardized regression coefficient of β = −.135. Thus, a higher experienced immersion seems to lead to a lower discrepancy between the two environments, which may be described using the following predictive model:
Virtual Eye Height
The same virtual avatar with constant body height and thus constant viewing height was used for all of the participants, whose body height differed in reality. We, therefore, investigated the effect of the discrepancy between the real and virtual body heights on the relative estimation discrepancy between the two environments. The body height discrepancy was calculated analogously to the discrepancy of the distance estimates (see the Estimation Discrepancy Metric section), using the following formula:
A simple regression analysis suggests a significant linear relationship between these two factors: F(1, 898) = 7.63, p = .006, R² = .008, with a standardized regression coefficient of β = .092. The discrepancy between the estimates in the two environments increases with the discrepancy between the real and virtual body heights (i.e., viewing levels), which may be described using the following predictive model:
Learning Process
Possible learning effects over the duration of the experiment were analyzed using a multiple regression analysis: F(2, 897) = 2.23, p = .108, R² = .005. The analysis suggests that the relative estimation discrepancy between the two environments was neither affected by the number of estimates made (β = −.061, p = .095) nor influenced by the duration of the decision-making time (β = .025, p = .491).
The decision-making time for each estimate decreased as the number of completed estimates increased (see Figure 9). This was confirmed using a simple regression analysis: F(1, 1798) = 133.89, p < .001, R² = .069, with a standardized regression coefficient of β = .263. The significant relationship between these two factors may be described using the following predictive model:

Figure 9. Decision-Making Time Plotted Against the Sequential Number of Estimates.
A Wilcoxon signed-rank test suggests a significant difference between the required decision-making time in the real environment (Mdn = 3.37 s) and the decision-making time in the virtual one (Mdn = 3.20 s): T = 183571, p = .014, r = −.058. A look at the effect size puts this statement into perspective, however, since the significant result may simply be related to the large dataset comprising 1,800 estimates.
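The small effect size can be retraced from the reported test statistic with the normal approximation of the signed-rank test and the conversion r = Z / sqrt(N). The sketch below rests on several assumptions that the article does not state: the test paired the 900 estimates of one environment with the 900 of the other, no correction for ties or zero differences was applied, and N is the total number of observations (1,800).

```python
import math


def wilcoxon_effect_size(t_stat, n_pairs, n_observations):
    """Approximate Z, two-tailed p, and r for a Wilcoxon signed-rank statistic.

    Uses the large-sample normal approximation without tie or
    zero-difference corrections (assumption).
    """
    mean_t = n_pairs * (n_pairs + 1) / 4
    sd_t = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)
    z = (t_stat - mean_t) / sd_t
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    r = z / math.sqrt(n_observations)
    return z, p, r


z, p, r = wilcoxon_effect_size(183571, n_pairs=900, n_observations=1800)
```

Under these assumptions, the recomputed values round to the reported p = .014 and r = −.058, an effect well below the conventional threshold of .1 for a small effect.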
Discussion
General Observations
The replication of the real environment within the virtual environment led to similar monocular depth cues in both environments. A learning process during the experiment was evident in terms of decreasing decision-making time, but not in terms of an increasing estimation accuracy, which is consistent with the findings by Murgia and Sharkey (2009). The decision-making times decreased predominantly during the first few estimates, with an asymptotic approach to four seconds. This suggests that participants searched for reference points in both environments, with which they compared the current position of the traffic cone. In doing so, a large number of monocular depth cues (e.g., cars, trees, road texture) supported the participants in judging the distances. The similarity of depth cues in the two environments eventually contributed to the nonsignificant main effect between the real environment and the virtual one.
Order of Environments
Although the main effect of environment does not indicate differences between the estimates in the two environments, the interaction effect of the environment and environment order turned out to be significant. A more in-depth analysis of this interaction effect showed that the distance perception differs between the two environments for participants who started the experiment in the virtual environment, but not for those who started in the real world. The impact of the environment order on the distance estimates has already been observed in the study conducted by Interrante et al. (2006), although in their case, the distance underestimation was more prominent in the respective starting environment in contrast to our study (see Figures 7 and 8).
The between-subject factor order of environments led to sample sizes of 15 participants each, which limits reliable conclusions, although the measured significance may be expected to be reinforced with a larger sample. The impact of the starting environment on the estimates may be counteracted by increasing the break period between the two environments, thereby decreasing bias. In our study, participants had a break of only a few minutes between the two environments.
The findings indicate that the order of environments has an impact on the estimation discrepancy between the two environments. Two explanations are possible:
One may be that participants starting in the real environment carry over their experience from the real environment to the virtual environment. Thus, they achieve closer estimation results in the two environments, with the estimates in the virtual environment being influenced by the prior estimates in reality. Participants starting in the virtual environment, on the other hand, recalibrate their distance perception again in reality, independent of their prior virtual experience, with the significant difference between the two environments as an outcome. If this is the case, an increase in the break period between the two environments will elicit a decrease in the spillover effect for the group starting in the real world and will eventually produce a significant difference in estimates within this group as well.
A different potential explanation is that users of virtual environments experience a temporary change in underlying physiological functions that affect their distance perception afterward in reality. It has been shown that participants exhibited esophoric shifts in viewing after ten minutes of viewing through a conventional fixed-focus stereoscopic display (Mon-Williams et al., 1993), causing a temporary disturbance to the binocular function. This is also supported by experiments conducted by Waller and Richardson (2008), who observed increased difficulties in judging distances in the real world after having interacted with a virtual environment. The required time for convergence to return to a normal state is not known, but increasing the break period between viewing environments could reduce the observed difference for the group starting in the virtual environment in that case.
Future studies may investigate these effects by varying the duration of the break between the two environments, and ultimately determine whether the virtual environment affects the subsequent estimates in reality or whether the prior experience in the real environment helps to estimate distances in the virtual environment afterward.
Figures 7 and 8 show that the participants gave, on average, higher estimates in their starting environment. This holds for every distance, both for the group starting in the real environment and for the one starting in the virtual environment. The graphs also suggest, in particular for the group starting in the virtual environment, that the estimates in the two environments drift apart with increasing estimation distance. This raises the question of whether the differences between the two environments will increase for farther distances and become negligible for closer distances. The estimation difference between the real and virtual environments thereby follows, in an amended form, Weber’s Law (Fechner, 1860), which suggests that the perceptible difference between two physical stimuli (in this case, the perceived real distance and the perceived virtual distance) remains proportional to the baseline stimuli. The estimation difference between the two environments grew proportionally with increasing distance, as suggested by the regression analysis that investigated the relative estimation discrepancy across the different estimation distances (see the Environment Comparison section).
Experienced Immersion
The Witmer and Singer Presence Questionnaire is helpful when evaluating VR systems in terms of the subjective impression of provided immersion. In collaboration with the UQO Cyberpsychology Lab in Canada, several established and validated VR systems were evaluated with regard to their PQ scoring, revealing an average overall PQ score of
One possible explanation for the absence of an overall significant difference in distance perception may be the strong sense of presence experienced in our VR system. The regression analysis revealed a negative relationship between the PQ score and the relative estimation discrepancy. Consequently, the high PQ score of our simulator may explain the low overall discrepancy of the egocentric distance perception between the two environments. Factors that contributed to the high sense of presence in our study may include the use of an avatar for self-representation (see Slater, 2009) and the high number of depth cues integrated into the virtual environment (see Dinh et al., 1999).
Virtual Eye Height
The experiment used the same avatar for every participant, thus providing every participant with the same virtual height, while their real heights varied. The created avatar had an eye height of 180 cm. The participants, on the other hand, had an average body height of 179 cm (SD = 10.0), varying between 161 cm and 198 cm. Statistical data provided by Jürgens (1999) suggest the average eye height for our sample to be around 167 cm, thus approximately 13 cm below the set height in the virtual environment. The regression analysis indicates a correlation between the distance estimation discrepancy and the eye height discrepancy between the two environments. These results are in line with the findings by Leyrer et al. (2011), who showed that a manipulation of the virtual eye height influences the distance perception: They observed an increased compression of perceived distance when the virtual eye height was set 50 cm above the real eye height.
These findings underline the importance of individually adjusting the virtual eye height to the participants’ actual eye height. An approximate eye height that is deduced from the participant’s self-reported body height and statistical charts (e.g., Jürgens, 1999) should be sufficient and save some setup time. In fact, we analyzed Jürgens’ statistical data collected across different age groups, height groups, and sexes (including data on 4,350 male and 2,860 female subjects, aged 18–40 years) and deduced a fairly consistent ratio of 0.932 (SD = 0.005 across the 46 different sample groups) between the human eye level and the human body height. This reference value of 93.2% of the body height may be reliably applied for adult humans when setting up the eye height in a VR system.
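Applying this reference value is a one-line calculation, sketched below. The function and constant names are illustrative; only the ratio of 0.932 and the sample's mean body height of 179 cm are taken from the text.

```python
EYE_TO_BODY_HEIGHT_RATIO = 0.932  # ratio deduced from Jürgens (1999), as reported


def estimate_eye_height_cm(body_height_cm, ratio=EYE_TO_BODY_HEIGHT_RATIO):
    """Approximate standing eye height from self-reported body height."""
    if body_height_cm <= 0:
        raise ValueError("body height must be positive")
    return body_height_cm * ratio


# Mean body height of the sample (179 cm) as a worked example.
sample_mean_eye_height = estimate_eye_height_cm(179)
```

For the sample mean of 179 cm, this gives roughly 167 cm, which matches the average eye height stated in this section and lies about 13 cm below the avatar's eye height of 180 cm.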
Limitations of the Experiment
The experiment presented here has certain limitations, which is why the results should be viewed with caution.
Our study may only partially validate the distance perception in the given VR system since the investigated distances were all within the range of 8 m to 13 m. Distance perception, however, strongly depends on different sets of depth cues, which in turn depend on the given distance. A complete distance-perception validation must involve closer as well as farther distances. In particular, the perception of closer distances relies on a different set of depth cues (Feldstein, 2019). The relevance of different ranges is, moreover, task-dependent. For instance, the perception of very close distances may not be relevant to traffic investigations (e.g., Feldstein & Dyszak, 2020). Consequently, investigators using virtual environments for behavioral research must always consider the question of what distance ranges are relevant for the intended task and whether the accurate perception of these distances is validated.
It also must be noted that verbal estimates demand a different set of cognitive skills than other assessment methods (see the Measurement Methodologies section). Different assessment methods may lead to divergent results and conclusions, but the approach to how to apply a specific methodology must also be considered. For example, in our study design, the participants were required to indicate the distance estimates in full meters—discretely—which facilitated the task.
Finally, the moderate number of participants in our experiment may be considered a limitation for the validity of the obtained results. A sample of thirty participants is acceptable per se (see the power analysis in the Participants section). However, given that the group was counterbalanced with regard to the order of environments, and this environment order turned out to affect the distance estimates, two groups of 15 participants each may be considered somewhat small for obtaining reliable conclusions, which calls for further studies. Other between-subject factors, such as gender, are also hardly analyzable when the number of participants is low. Therefore, the results of our study hardly permit reliable conclusions with regard to gender differences.
Conclusion
Contrary to numerous studies carried out in the past and similar to some recent studies (see the Distance Estimation in Real and Virtual Environments section), this experiment did not show a significant difference between distance estimates in the real environment and in the virtual one: The verbal estimates in the two environments showed almost identical values. This may be considered an indicator of the continuous development made in VR technology, with the gap between experiences in real and virtual environments steadily decreasing. Numerous studies have demonstrated that technological improvements may indeed lead to more accurate virtual distance estimates (e.g., Buck et al., 2018; Creem-Regehr et al., 2015; Kelly et al., 2017; Phillips et al., 2009, 2010; Phillips & Interrante, 2011; Ries et al., 2009). The experiment presented here is intended to provide insight into the current performance of state-of-the-art HMDs. A complete assessment of depth perception within a specific virtual environment should consider a wider variety of different distances and also a variety of different tasks that may involve a combination of verbal estimates, comparative judgments, and action-based assessments. The size of the participant group should also be carefully considered, especially when fragmentations occur due to between-subject factors.
Footnotes
Acknowledgements
The authors would like to thank Georg N. Dyszak, who assisted in the experiment and participated in discussions, and Cécile Boudot, who provided valuable input to the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: I. T. F. was supported in part by the Fulbright Program and the Studienstiftung des Deutschen Volkes.
