Abstract
Humans can perceive depth when viewing with one eye, and even when viewing a two-dimensional picture of a three-dimensional scene. However, viewing a real scene with both eyes produces a more compelling three-dimensional experience of immersive space and tangible solid objects. A widely held belief is that this qualitative visual phenomenon (stereopsis) is a by-product of binocular vision. In the research reported here, we empirically established, for the first time, the qualitative characteristics associated with stereopsis to show that they can occur for static two-dimensional pictures without binocular vision. Critically, we show that stereopsis is a measurable qualitative attribute and that its induction while viewing pictures is not consistent with standard explanations based on depth-cue conflict or the perception of greater depth magnitude. These results challenge the conventional understanding of the underlying cause, variation, and functional role of stereopsis.
Viewing a picture with both eyes produces a clear perception of three-dimensional (3-D) objects and depth relations (Fig. 1a). However, the impression of depth lacks an important qualitative attribute experienced when viewing real scenes: a vivid sense of immersive space, tangible solid objects, and realism. This perceptual attribute has historically been referred to as stereopsis, from the Greek words for solid and appearance, and was famously described by Susan Barry (2009) after she recovered normal binocular vision in late adulthood: [I saw] palpable volume[s] of empty space . . . I could see, not just infer, the volume of space between tree limbs . . . the sink faucet popped out toward me . . . the grape was rounder and more solid than any grape I had ever seen . . . objects seemed more solid, vibrant, and real. (pp. 94–132)

Fig. 1. Example of a single pictorial image and stereoscopic image pairs. The image in (a) is a photograph of real objects. The three-dimensional (3-D) object shapes and the relative depth relations among them are clearly perceived because the picture replicates many of the visual cues available to a single eye when viewing a real scene. These monocular cues include interposition (e.g., the visual overlap between grapes), shading patterns (e.g., the change in brightness across the surface of a grape), texture gradients (e.g., the speckles on the pear), and perspective and relative size (e.g., the image of the pear in front is larger than the one behind). The images in (b) show stereoscopic pairs of the same scene (left pair for cross fusing, as indicated by the “X,” and right pair for parallel fusing, as indicated by the “II”¹). When either pair of images is fused correctly, 3-D shapes and depth relations similar to those in (a) are perceived, but there is also a more vivid sense of tangible solid objects and a phenomenal sense of real space. Notice the difference between the leftmost and center images and between the rightmost and center images. Fusing the images replicates the same small differences between the images of two eyes (binocular disparities) that would be obtained if one were viewing the real scene with both eyes. However, when the single picture in (a) is viewed with both eyes, the binocular disparities produced in the eyes are not consistent with the depicted scene but with the flat paper or screen.
Wheatstone’s (1838) invention of the stereoscope showed that this same impression could be simulated by presenting pictures of two slightly different views of a scene, one to each eye, mimicking the visual stimulation under binocular viewing of real scenes (Fig. 1b). Most readers will be familiar with this remarkable effect as experienced when watching a 3-D movie. The widely held belief is that this phenomenal impression is a by-product of binocular vision (Ponce & Born, 2008), and the term stereopsis is often used interchangeably with binocular depth perception. Here, we draw the important distinction between binocular depth perception, which is the capacity to perceive quantitative depth relations using the visual information from two eyes, and stereopsis, the qualitative vividness of depth that often obtains as a result of this capacity. The cause, variation, and functional role of this fundamental qualitative visual attribute remain largely unexplained despite more than 150 years of intense research in binocular depth perception.
The binocular theory of stereopsis is challenged by observations that monocular viewing by a moving observer or of a moving object can also produce an impression of stereopsis (Musatti, 1924; Rogers & Graham, 1982; Wallach & O’Connell, 1953; Wheatstone, 1838). This similarity has been attributed to the fact that under conditions of self-motion, the brain receives different views of the scene in temporal sequence (motion parallax) similar to the different views received simultaneously from the two eyes under binocular viewing (binocular parallax), which suggests that stereopsis is caused by visual parallax.
However, this explanation is contradicted by reports of stereopsis in the absence of visual parallax (Ames, 1925; Koenderink, 1998; Michotte, 1948/1991; Schlosberg, 1941; Wheatstone, 1838)—for example, under controlled monocular viewing of static pictures (Fig. 2). This important effect (monocular stereopsis) has been neglected in contemporary research for two possible reasons. First, previous reports were based only on expert-observer introspections and have never been verified empirically in naive observers. Second, the effect is often dismissed on the basis of a conventional understanding of the derivation of quantitative depth.

Fig. 2. A method for inducing monocular stereopsis. A single pictorial image is viewed with one eye through an oval aperture (approximately 1.2–1.5 cm in diameter) while the other eye is closed. The aperture is located in front of the eye such that it occludes the rectangular boundary of the image. For a sample image and further guidance for constructing an aperture, see Figure S1 in the Supplemental Material available online.
Perceived depth is generally thought to be derived by combining estimates from individual depth cues in some statistically optimal manner (e.g., Bülthoff & Mallot, 1988; Domini & Caudek, 2009; Frisby, Buckley, & Horsman, 1995; Landy, Maloney, Johnston, & Young, 1995). The conventional explanation of monocular stereopsis is based on the relative coherence or conflict among the depth values specified by each cue (Ames, 1925; Schlosberg, 1941). When an observer views a real scene with both eyes, all the depth cues coherently specify the same or similar depth values. However, when a picture is viewed with both eyes, the pattern of binocular disparities specifies a flat surface (the picture itself), which conflicts with the depth variation specified by the monocular cues in the picture’s contents (see Fig. 1b caption). According to the conventional explanation, the conflicting binocular information suppresses the estimates of depth from the monocular cues, causing a “flattening” of perceived depth and a lack of stereopsis. The impression of stereopsis obtained when viewing with one eye is ascribed to the elimination of this conflicting disparity cue, which results in a greater magnitude of perceived depth consistent with the monocular cues (Ames, 1925; Schlosberg, 1941). This explanation therefore links stereopsis with a greater coherence among depth cues and an associated increase in the magnitude of perceived depth.
Here, we provide evidence that is contrary to all three previous explanations of stereopsis: binocular vision, visual parallax, and cue-coherence/depth-magnitude. We do this by first empirically establishing the qualitative characteristics associated with stereopsis in naive observers and showing that the same visual characteristics perceived in binocular stereopsis can also be perceived under monocular-aperture viewing of static pictorial images. We then show that the qualitative impression of stereopsis can be measured but does not covary with changes in cue conflict as predicted by the cue-coherence/depth-magnitude hypothesis. Furthermore, we show that the perceived magnitude of depth (3-D shape) does not significantly differ for conditions in which stereopsis is present or absent when single pictures are viewed, a result that is also contrary to the cue-coherence/depth-magnitude hypothesis. We discuss these results in the context of an alternative conceptualization of stereopsis based on the distinction between absolute and relative depth perception.
Subjects in all experiments were paid undergraduate and postgraduate students naive to the purposes of the study, and were tested for stereoacuity (Titmus stereo test) and visual acuity (Snellen chart).
Experiment 1
We examined whether naive observers perceive the same qualitative visual characteristics for monocular and binocular stereopsis. We used open and closed questionnaires, following previous work that has established other qualitative perceptual effects (Botvinick & Cohen, 1998; Ehrsson, 2007).
Method
Subject screening
Thirty-one subjects were screened to ensure that they could experience and report on qualitative characteristics of binocular stereopsis. They compared monocular and binocular viewing of a real object (a potted flowering plant that was approximately 30–40 cm in all dimensions and was located 60 cm from the observer) by reporting if they perceived any differences in depth impression and verbally describing differences in their own words. As expected, the significant majority (n = 22) reported a better impression of depth in the binocular condition. Their verbal descriptions were consistent with descriptions of the impression of binocular stereopsis found in the literature, including a greater sense of three-dimensionality and a more definitive or greater sense of separation or space between objects (see Table S1). Eight subjects reported no differences and were not tested any further, and 1 subject reported better depth in the monocular condition (see Methodological Details in the Supplemental Material).
Stimuli
Images (31 cm × 23 cm) were color photographs of natural and man-made objects and environments (see Fig. S2) displayed on an LCD monitor viewed from 60 cm; subjects’ heads were stabilized using a chin rest. Under binocular viewing, the apparatus and monitor frame were visible. Monocular-aperture viewing was through an oval aperture (1.3 cm × 0.95 cm) located approximately 1.5–2 cm in front of the observer’s preferred eye, such that it occluded the image boundary.
Procedure
The 23 screened subjects compared monocular-aperture and binocular viewing of the photographs and made verbal reports on the perceived depth, size or scale, distance, object shape, image sharpness, and color or material appearance. With the aid of diagrams, depth was defined as the spatial separation between objects and parts and distance as the spatial separation between the observer and the objects. Apparent size of objects, or scale of the scene, was distinguished from retinal size so that observers could differentiate between depth and distance, as well as between retinal size and perceived size. Other terms were left open to the subjects’ interpretations and application of the most liberal criteria in identifying differences.
Results
A significant majority of the 23 subjects tested (n = 20) reported that they perceived a better impression of depth under monocular-aperture viewing than under binocular viewing, χ2(2, N = 23) = 28.125, p < .0001; 3 subjects reported no difference. Many subjects spontaneously expressed surprise at the strength of the effect and the fact they had not expected to see such visual enhancement when viewing photographs with one eye. Subjects’ descriptions are summarized in Table 1 and are highly consistent with previous reports of the qualitative impression of stereopsis under binocular viewing. A significant majority of the 20 subjects who perceived a better impression of depth in the monocular-aperture condition (n = 17) also reported changes in perceived distance or size, χ2(1, N = 20) = 9.8, p < .01—mostly that objects appeared closer, smaller, or both (Table 2). Surprisingly, the significant majority of these subjects (n = 17) reported no differences in the perceived shape of objects.
Table 1. Subjects’ Verbal Descriptions of the Difference in Perceived Depth Between Binocular and Monocular-Aperture Viewing of Photographic Images
Note: Descriptions are drawn from reports by 20 subjects who reported a better impression of depth in the monocular-aperture condition. Each description refers to the monocular-aperture condition unless otherwise indicated. Responses are grouped into descriptive categories roughly based on characteristics identified by recovered strabismics (Barry, 2009). Numerals indicate the number of subjects who gave a report consistent with a given category. Some descriptions have been minimally edited or paraphrased to simplify grouping. Most subjects provided descriptions that fell into more than one category.
Table 2. Subjects’ Verbal Descriptions of the Perceived Differences in Size or Distance Under Binocular and Monocular-Aperture Viewing of Photographic Images
Note: Descriptions are drawn from reports by 20 subjects who reported a better impression of depth in the monocular-aperture condition. Size and distance were treated as a single category because of known dependencies and interactions in reports of these two attributes (the size-distance paradox). Each description refers to the monocular-aperture condition unless otherwise indicated.
Binocular stereopsis has been associated with enhancement in visual attributes such as glossiness (Sakano & Ando, 2010). There are informal reports from recovered strabismics of a heightened sense of color and material properties and heightened overall visual sharpness (Barry, 2009). A significant majority of the 20 subjects who reported depth enhancement (n = 18) also reported enhancements in color/material perception and visual sharpness, χ2(1, N = 20) = 12.8, p < .001, and their verbal descriptions were highly consistent with these previous reports, including descriptions of enhanced glossiness and shininess, dynamic range, saturation, and contrast (Table 3).
Table 3. Subjects’ Verbal Descriptions of the Perceived Differences in Color and Material Perception Under Binocular and Monocular-Aperture Viewing of Photographic Images
Note: Descriptions are drawn from reports by 20 subjects. Each description refers to the monocular-aperture condition unless otherwise indicated.
To further validate the similarity between monocular and binocular stereopsis, we had two new groups of subjects fill out Likert-type questionnaires. Subjects affirmed or denied statements about 14 perceptual effects, presented in random order, using a 7-point visual analogue scale (Botvinick & Cohen, 1998; Ehrsson, 2007). The seven target statements were derived from the verbal reports of the previous group of naive subjects (Table 1), and the other seven served as controls for suggestibility and task compliance. One group (the monocular-aperture group; n = 16) compared monocular-aperture viewing and normal binocular viewing of single pictures. Subjects in this group were required to first write down, in their own words, the characteristic differences in depth impression they perceived before answering the questionnaire. The other group (the stereoscopic-anaglyph group; n = 16) compared viewing stereoscopic anaglyphs and normal binocular viewing of single pictures. The color anaglyphs and single pictures were viewed through red/green filter glasses.
Again, subjects’ independent written descriptions of the difference in depth impression between binocular and monocular-aperture viewing of single pictures (Table 4) were highly consistent with descriptions given by congenital strabismics after recovery of binocular vision (Barry, 2009).
Table 4. Verbatim Written Descriptions of the Visual Effect of Monocular-Aperture Viewing of Color Photographs of Real Objects for a Sample of 3 Naive Subjects
In their responses to items on the questionnaire, subjects in the monocular-aperture group affirmed the target descriptors and denied the controls. Ratings for all target descriptors were greater than 0 with a high degree of statistical significance (Fig. 3a). For control items, ratings were either significantly less than 0 or not significantly different from 0. The same results were found for the stereoscopic-anaglyph group (Fig. 3a). Critically, in both groups, average ratings for a change-in-shape control descriptor were not significantly different from 0.

Fig. 3. Results in Experiment 1. Panel (a) shows naive subjects’ responses to seven target statements (white regions of graphs) and seven distractor statements (gray regions of graphs) describing the difference between visual impressions, comparing either monocular-aperture viewing of a single picture (left plot; n = 16) or binocular viewing of stereoscopic anaglyphs (right plot; n = 16) to the control condition (binocular viewing of a single picture). Squares indicate median responses, and black circles indicate mean responses. Black error bars represent standard errors of the means, and light-gray error bars represent the range of responses. Descriptors shown are abbreviated versions of the statements used in the experiment. Panel (b) shows qualitative characteristics selected by subjects that most uniquely define the difference between monocular-aperture and binocular viewing of single pictures, in comparison with the enhancement of perceived depth due to manipulation of image cues alone. Gray bars indicate the cumulative frequency of selection for each descriptor. Black bars represent cumulative frequency, with each instance inversely weighted by the rank (1, 2, 3, or 4) of the order in which the descriptor was selected.
To determine whether the enhancement in depth impression they perceived was generic—in other words, that it was not necessarily related to the impression of stereopsis per se—we had the subjects in the monocular-aperture group further compare depth impression in two pairs of color photographs (binocular viewing) in which we had disrupted pictorial depth by either blurring the image or applying a filter that disrupted the shading cue to depth (using the Dry Brush tool in Photoshop; see Fig. S3). Subjects reported whether there was a difference between conditions in the impression of depth and, if so, which condition produced the better impression of depth. They reported if this change in depth impression was similar to or different from that perceived under the monocular-aperture manipulation. Those who reported that depth perception was different were then asked to specify a maximum of four descriptors (derived from target items on the original questionnaire) that best described the characteristics differentiating the improvement in depth impression under monocular-aperture viewing from that obtained in the image manipulations, ranking them in order of importance.
All subjects reported a better impression of depth in the original (nonfiltered) images. The significant majority of subjects (n = 13) reported that the improvement in depth impression under monocular-aperture viewing was different, χ2(1, N = 16) = 6.25, p = .012. The top six descriptors selected to identify this difference (Fig. 3b) were the same as the characteristics of binocular stereopsis described by recovered strabismics (Barry, 2009).
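The two-category majority tests reported in this experiment can be checked with a short calculation. The sketch below assumes a chi-square goodness-of-fit test against equal expected frequencies; that null hypothesis is an assumption of this sketch rather than a detail stated in the Method, although it reproduces the statistics reported for 17 of 20, 18 of 20, and 13 of 16 subjects. The function names are hypothetical.

```python
import math

def chi_square_equal_split(counts):
    """Goodness-of-fit statistic against equal expected frequencies (assumed null)."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def p_value_df1(statistic):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    # With df = 1, the statistic is the square of a standard normal deviate,
    # so the tail probability can be written with the complementary error function.
    return math.erfc(math.sqrt(statistic / 2.0))

# (majority, minority) counts taken from the text of Experiment 1
for counts in [(17, 3), (18, 2), (13, 3)]:
    stat = chi_square_equal_split(counts)
    print(counts, round(stat, 3), p_value_df1(stat))
```

The closed-form p value applies only to tests with one degree of freedom; tests with more categories would need a general chi-square distribution function.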
Experiment 2
We next tested whether the impression of stereopsis characterized in Experiment 1 could be measured and whether its variation was consistent with the cue-coherence/depth-magnitude hypothesis. Subjects rated the perceived within-pair difference in the impression of stereopsis for different pairs of four viewing conditions (monocular, binocular, monocular aperture, and binocular aperture).
Method
Subject screening
Twenty-three naive subjects were screened to ensure that they perceived monocular stereopsis. Subjects compared binocular and monocular-aperture viewing of a sample color photographic image and were required to verbally report perceptual differences consistent with at least two of the qualitative descriptors established previously. Eighteen subjects passed the screening, and the remaining 5 were not tested any further (see Methodological Details in the Supplemental Material). To help draw the screened subjects’ attention to the full gamut of qualitative aspects of stereopsis when they completed the rating task, we had them compare the viewing conditions again and fill out a questionnaire similar to that used in Experiment 1.
Stimulus and display
Subjects viewed four color photographic images (38 cm × 26 cm) on an LCD monitor from a distance of 50 cm. In the binocular and monocular conditions, the apparatus and frame of the display were visible. In the monocular-aperture and binocular-aperture conditions, subjects viewed the display through oval apertures (1.2 cm × 0.85 cm) that were matched to individual interpupillary distance and located 1.5 to 2 cm from the eye so that the image boundary was occluded in both conditions.
Procedure
Subjects numerically rated the difference in the impression of stereopsis within pairs of viewing conditions (e.g., binocular vs. monocular). A rating of 0 indicated no perceived difference, and a rating of 5 indicated the level of difference perceived in the reference binocular-versus-monocular-aperture comparison, which subjects viewed first for each tested image. For every pair, subjects were instructed to view images in each condition at least twice, for at least 5 s on each viewing (self-paced). The order of presentation of tested images and the comparison pairs was randomized between subjects.
Predictions
Under binocular and binocular-aperture viewing, binocular disparity strongly conflicted with the 3-D interpretation specified by the pictorial cues. In the monocular and monocular-aperture conditions, these conflicting binocular cues were eliminated. According to the cue-coherence/depth-magnitude hypothesis, a large difference in depth impression should therefore be obtained in the binocular-versus-monocular comparison and the binocular-aperture-versus-monocular-aperture comparison. However, the cue-coherence/depth-magnitude hypothesis predicts that there should be only small differences, if any, for the monocular-versus-monocular-aperture and binocular-versus-binocular-aperture comparisons. In these comparisons, the only change is the field-of-view restriction due to the aperture. This restriction eliminates the visibility of the image frame, but frame visibility is not a depth cue in the conventional cue-integration sense.²
Results
Distributions of the raw ratings for all comparison pairs and images tested were unimodal (Fig. S4). To test for internal consistency, we derived ratings for the reference binocular-versus-monocular-aperture comparison as the sums of ratings for two independent sets of comparisons (binocular-vs.-monocular rating plus monocular-vs.-monocular-aperture rating and binocular-vs.-binocular-aperture rating plus binocular-aperture-vs.-monocular-aperture rating). Consistent with the initially defined reference magnitude, both median sums were 5.0 (Ms = 4.56 and 4.69, respectively). This internal consistency and the unimodal distribution of ratings indicate that subjects were reporting on a stable perceptual variable.
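The internal-consistency check described above amounts to summing ratings along two routes from binocular to monocular-aperture viewing and comparing each sum with the reference value of 5. The following sketch illustrates the arithmetic only; the ratings are made-up placeholder values, not data from the experiment, and the variable names are hypothetical.

```python
from statistics import mean, median

REFERENCE = 5  # rating assigned to the binocular-vs.-monocular-aperture comparison

# Placeholder ratings (NOT experimental data), one dict per hypothetical subject.
ratings = [
    {"bin_vs_mon": 0, "mon_vs_monap": 5, "bin_vs_binap": 2, "binap_vs_monap": 3},
    {"bin_vs_mon": 1, "mon_vs_monap": 4, "bin_vs_binap": 1, "binap_vs_monap": 4},
    {"bin_vs_mon": 0, "mon_vs_monap": 4, "bin_vs_binap": 2, "binap_vs_monap": 2},
]

# Route 1: binocular -> monocular -> monocular aperture
route_via_monocular = [r["bin_vs_mon"] + r["mon_vs_monap"] for r in ratings]
# Route 2: binocular -> binocular aperture -> monocular aperture
route_via_binocular_aperture = [
    r["bin_vs_binap"] + r["binap_vs_monap"] for r in ratings
]

for name, sums in [
    ("via monocular", route_via_monocular),
    ("via binocular aperture", route_via_binocular_aperture),
]:
    print(name, "median:", median(sums), "mean:", round(mean(sums), 2),
          "reference:", REFERENCE)
```

If the ratings behave additively, both routes should yield sums close to the reference value, which is the pattern the reported medians show.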
However, the average ratings for the four comparison pairs tested were not consistent with the cue-coherence/depth-magnitude hypothesis (Fig. 4). The smallest difference ratings were obtained for the binocular-versus-monocular comparison, in which conflicting binocular cues were eliminated and therefore were predicted to produce a large shift in depth impression. Most frequently, subjects reported no difference (mode = 0). The largest difference was obtained for the monocular-versus-monocular-aperture comparison, in which conflicting binocular cues were not present in either viewing condition, and in which the only manipulation was the aperture. For this comparison, the mean difference rating was 3.46 (Mdn = 4), significantly greater than ratings for the binocular-versus-monocular comparison, F(1, 13) = 25.53, p < .001, and the binocular-versus-binocular-aperture comparison, F(1, 13) = 18.371, p < .001, and nearly significantly greater than that for the binocular-aperture-versus-monocular-aperture comparison, F(1, 13) = 4.65, p = .05. However, the introduction of the aperture alone cannot account for the ratings, given that the binocular-aperture-versus-monocular-aperture comparison, in which the aperture was present in both viewing conditions, yielded the second-highest difference rating (M = 2.71, Mdn = 3). One subject reported ratings contradictory to those of all other subjects and was excluded as an outlier (see Methodological Details in the Supplemental Material).

Fig. 4. Ratings of perceived difference in the impression of stereopsis for different pairs of viewing conditions (indicated graphically at right). Ratings were made in relation to the reference binocular-versus-monocular-aperture condition (not shown), which was given an arbitrary value of 5. Higher values indicate greater perceived differences between conditions in each pair. Error bars represent standard errors of the means. Light-gray regions roughly indicate ratings expected on the basis of predictions of the cue-coherence/depth-magnitude hypothesis. (Note that these expected ratings were not quantitatively derived but are used here merely as a visual aid to indicate the expected outcome for each pair.)
Experiment 3
A central prediction of the cue-coherence/depth-magnitude hypothesis is that monocular stereopsis should be associated with an increase in the magnitude of perceived depth. We tested this prediction directly by having subjects make magnitude estimates of curvature-in-depth under binocular and monocular-aperture viewing.
Method
Stimuli
Stimuli were elliptical hemicylinders defined by a pattern of randomly distributed dots (Young, Landy, & Maloney, 1993; Fig. 5). Each cylinder subtended a visual angle of 8.5° when viewed from 55 cm on an LCD monitor (24 in., 1920 × 1200 pixels). There were two versions of seven curvatures (elliptical cross-section axis ratios of 0.5–2.0; see Fig. 5), for a total of 14 unique stimuli that could be presented in horizontal or vertical orientation.

Fig. 5. Stimuli and results in Experiment 3. In panel (a), the leftmost and middle images are examples of textured elliptical hemicylinders, and the rightmost image is the magnitude-estimation probe. The curvature-in-depth is expressed as the axis ratio (a/b) of the elliptical cross section, where “a” represents the depth dimension and “b” represents the vertical or horizontal dimension (depending on cylinder orientation). Axis designations were not visible in the actual experiment. The graphs in panel (b) show the curvature-in-depth settings for binocular and monocular-aperture viewing, averaged over 7 naive subjects. Shading indicates average variability in settings (shaded regions = ±1 SEM). The images in the key to the right illustrate viewing in the binocular and monocular-aperture conditions.
Procedure
On each trial, subjects viewed a hemicylinder for 1.2 s. At stimulus offset, a response probe representing an elliptical cross section (Fig. 5) appeared that could be adjusted via keyboard presses to match the perceived curvature. Each session consisted of 28 trials (for each of the seven curvatures, two versions, and two orientations). Subjects completed a total of four (two binocular-viewing, two monocular-aperture-viewing) sessions. Half of the subjects first completed the binocular sessions, followed by the monocular-aperture sessions; for the other half of subjects, the order was reversed. Before starting each session, the subject was required to step through the full response range of the probe, which consisted of cross-section axis ratios (a/b) ranging from 0.2 to 2.0 in steps of 0.06.
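The session structure and probe range described above can be enumerated directly. This is a minimal sketch: the curvature levels are represented by indices because the seven individual axis ratios are not listed in the text, and the version labels are hypothetical.

```python
from itertools import product

curvature_levels = range(7)                # indices only; axis ratios span 0.5 to 2.0
versions = ("A", "B")                      # hypothetical labels for the two versions
orientations = ("horizontal", "vertical")

# One trial per combination of curvature, version, and orientation.
trials = list(product(curvature_levels, versions, orientations))

# Probe axis ratios (a/b) from 0.2 to 2.0 in steps of 0.06.
probe_steps = round((2.0 - 0.2) / 0.06)
probe_ratios = [round(0.2 + 0.06 * k, 2) for k in range(probe_steps + 1)]

print(len(trials), "trials per session")
print(len(probe_ratios), "probe settings from", probe_ratios[0], "to", probe_ratios[-1])
```

Rounding each probe value to two decimals avoids floating-point drift when the step size is accumulated.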
To ensure that subjects had perceived monocular stereopsis, we asked them after the main experiment to directly compare binocular and monocular-aperture viewing of a sample hemicylinder and a photograph of a real scene and to fill out a questionnaire for each. All subjects gave reports consistent with the perception of monocular stereopsis for both types of stimuli.
Results
One subject showed no modulation of perceived curvature with changes in base curvature for either viewing condition and was excluded from the analysis. For the remaining 7 subjects, on average, there were main effects of base curvature, F(6, 181) = 51.9, p < .0001, and cylinder orientation, F(1, 181) = 45.3, p < .0001 (see Fig. 5). The effect of orientation is consistent with the horizontal-vertical anisotropy in slant judgments previously reported only in the presence of disparity (e.g., Frisby et al., 1995). This effect shows that the task was sensitive enough to reveal modest shifts in perceived 3-D shape. However, contrary to the cue-coherence/depth-magnitude hypothesis, there was no main effect of viewing condition on perceived curvature, F(1, 181) = 0.41, p = .52. Data for individual subjects revealed no systematic pattern for monocular-aperture versus binocular viewing based on order of presentation (Fig. S5).
Discussion
Since Wheatstone’s invention of the stereoscope, a widely held assumption has been that stereopsis is a by-product of binocular vision. Here, we have provided the first empirical evidence that the impression of stereopsis can be induced in the absence of binocular disparity or visual parallax. A simple explanation for this effect of monocular stereopsis is based on the extent of conflict or coherence among depth cues. This hypothesis predicts that the impression of stereopsis when viewing single pictures occurs mainly because of the elimination of binocular cue conflict, and that it is associated with an increase in the magnitude of perceived depth relief.
In Experiment 2, which established the measurability of the impression of stereopsis, we found no evidence to support the first prediction: There were negligible shifts in perceived stereopsis when conflicting binocular cues were removed and large shifts when there were no changes in binocular cue conflict (see Fig. 4). Our results are also contradictory to the second prediction. Subjects in Experiment 1 reported no significant differences in perceived 3-D shape despite the removal of conflicting binocular cues and the resulting induction of stereopsis. This outcome was confirmed quantitatively in Experiment 3, in which, on average, curvature-in-depth settings were similar for binocular and monocular-aperture viewing, although depth impression was reported to be significantly different in these two conditions. This result shows that perceiving a stronger impression of stereopsis is not the same thing as perceiving a greater magnitude of depth or a different 3-D shape. Furthermore, it suggests that conflicting binocular cues are largely ignored in the perception of depth and 3-D shape in pictures.
If monocular stereopsis is not simply associated with reduction in cue conflict or the perception of a greater magnitude of depth relief, what explains its underlying cause and variation under different viewing conditions? A potential explanation relates to the distinction between absolute (egocentric) and relative (allocentric) depth perception (Glennerster, Rogers, & Bradshaw, 1996; Loomis, Philbeck, & Zahorik, 2002; Zimmerman, Legge, & Cavanagh, 1995). Consider Fig. 6a, in which the observer views a simple 3-D object. The depth cues in the retinal image (e.g., shading and perspective) can specify only relative depth relations (angles, θ, and depth ratios, d1:d2). These values uniquely define the 3-D shape of an object, but without information about its distance from the observer, the object’s size and absolute depth values (d1, d2) remain unspecified (Fig. 6b). In order for the size and absolute depth values to be derived, the retinal image and associated depth cues have to be scaled by an independent estimate of the distance of the object from the observer (Glennerster et al., 1996; Kaufman et al., 2006). If cues to object distance are available, the 3-D shape, scale, and absolute depth can all be derived (Fig. 6c). One possibility is that the impression of stereopsis is induced only when absolute depth values can be estimated, and that such an impression therefore depends on the availability of distance information to scale depth cues (Vishwanath, 2011).
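The scaling argument can be illustrated with simple viewing geometry. The sketch below is not the model proposed here or in the cited work; it assumes a single frontoparallel extent viewed from a point, with hypothetical function names and arbitrary example values, and shows only that the same visual angle is compatible with many size-distance pairs until a distance estimate is supplied.

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle subtended by a frontoparallel extent of a given size."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))

def size_from_angle(angle_deg, distance):
    """Absolute size recovered by scaling a visual angle with a distance estimate."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)

# Two different objects (arbitrary units) that produce the same retinal extent.
small_near = visual_angle_deg(size=10.0, distance=50.0)
large_far = visual_angle_deg(size=20.0, distance=100.0)
print(small_near, large_far)  # the two angles match, so size is ambiguous

# Supplying a distance estimate resolves the ambiguity.
print(size_from_angle(small_near, distance=50.0))
print(size_from_angle(small_near, distance=100.0))
```

The ratio between extents within the image is unaffected by the assumed distance, which corresponds to the point that relative depth relations can be specified without absolute scale.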

The image in (a) shows an observer viewing a real three-dimensional (3-D) object consisting of two planes connected at angle θ. The object’s absolute dimensions in depth, with respect to the observer, are d1 and d2. The retinal image produced in the observer’s eye is shown inside the dashed circle to the left. The image in (b) depicts a hypothetical situation in which the observer has no information about the distance of the object. The observer can derive the relative depth ratios (d1:d2) and angle (θ) on the basis of depth cues in the retinal image (e.g., perspective, shading), but the observer does not know how far away or how big the object is. The observer therefore perceives the 3-D shape of the object but not its size and absolute depth values (d1 and d2). As the image in (c) illustrates, if the observer has access to distance cues (as indicated by the dotted arrow), the depth cues can be scaled to derive an estimate of the absolute depth values. The observer therefore perceives the 3-D shape as well as the absolute size and depth values. The image in (d) shows the observer viewing a picture of the same object with two eyes. The picture surface is visible because of binocular cues and the visible picture frame. Distance information from binocular convergence, vertical disparity, and lens accommodation specifies the distance of the picture surface (as indicated by the dotted arrow). The observer therefore correctly perceives the size and location of the picture surface. The observer also perceives the 3-D shape (depth ratios) of the object depicted in the picture because the retinal image is the same as that in (a). However, because there are no cues within the picture to specify the distance of the object, the observer does not know its absolute size or depth values. The image in (e) shows the observer viewing the same picture monocularly through an aperture. The picture surface is no longer visible because binocular disparity and the visibility of the picture frame are eliminated. In the absence of a visible picture surface, the remaining distance information from cues such as accommodation (represented by the dotted arrow) could be assigned to the pictorial object, allowing a derivation of size and absolute depth values.
How can this explain the variations in perceived stereopsis when viewing a picture under different conditions? When a picture is viewed normally with both eyes, the picture’s surface is visible because of cues such as binocular disparity and the visible frame of the picture (Kubovy, 1986; Pirenne, 1970; Vishwanath, Girshick, & Banks, 2005). Distance cues such as binocular convergence, vertical disparity, and the accommodative state of the lens specify the distance of this visible picture surface (Ames, 1925; Watt, Akeley, Ernst, & Banks, 2005) rather than the pictorial contents (Vishwanath, 2011). There are no known optical cues that specify the distance of pictorial objects from the observer. Therefore, under binocular viewing of pictures, although 3-D object shapes can be clearly perceived, their scale and absolute depth should remain optically unspecified (Fig. 6d). Monocular-aperture viewing removes the main cues that specify the presence of the picture surface (binocular disparity and the visible frame), as well as binocular cues specifying its distance (convergence and vertical disparity). However, subsidiary distance cues, such as the accommodative state of the lens, are still present. In the absence of a visible picture surface, it is plausible that the brain attributes the accommodation response to the pictorial objects and assigns any associated distance information to them, thereby allowing absolute depth values to be derived and generating an impression of stereopsis (Fig. 6e; Vishwanath, 2011).
It is likely that viewing conditions that yield only a partial reduction in surface visibility (via removal of either binocular disparity or the visible frame) will result in a greater variability in the assignment of distance information to the pictorial objects and a correspondingly reduced sense of stereopsis. This conjecture appears consistent with our results from Experiment 2. All manipulations that reduced picture-surface visibility (those in the monocular, binocular-aperture, and monocular-aperture conditions) increased the impression of stereopsis, but monocular-aperture viewing, in which the surface visibility was maximally reduced (Vishwanath et al., 2005), yielded a significantly higher degree of stereopsis (Fig. 4). It is also consistent with two other key aspects of our results. First, given that the explanation links stereopsis to the disambiguation of absolute depth values and not to changes in relative depth values (3-D shape), it is consistent with our findings that the induction of monocular stereopsis did not appear to affect perceived 3-D shape (Experiments 1 and 3). Second, because people normally infer objects in pictorial space to be located at some distance beyond the picture plane (Fig. 6d), the reassignment of accommodation-based distance information to pictorial objects should make them appear to be located at or near the surface of the picture and, therefore, closer and smaller than they are inferred to be under normal (binocular) viewing (Fig. 6e). A significant majority of subjects reported a change in perceived object size, distance, or both accompanying the impression of stereopsis. Specifically, they reported that objects appeared closer, smaller, or both and, for some images (see Fig. S6), that they appeared “miniaturized” (Table 2).
Although we did not directly test it, this alternative explanation suggests that stereopsis from binocular image pairs with disparity (Fig. 1b) should be significantly stronger than monocular stereopsis (Fig. 1a viewed through an aperture), not because disparity is the main cue for the impression of stereopsis but because disparity can be scaled by binocular convergence, which is a more reliable distance cue than accommodation (Howard & Rogers, 2002), and should yield a more precise estimate of absolute depth.
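Why the reliability of the distance cue matters for scaling disparity can be illustrated with a short sketch. It is not part of the reported experiments. It assumes the standard small-angle approximation relating a relative disparity δ (in radians) to a depth interval Δd at viewing distance D for interocular separation I, namely Δd ≈ δD²/I, which holds when Δd is small relative to D. The function name `depth_from_disparity`, the 0.065 m interocular separation, and the disparity value are hypothetical choices made only for illustration.

```python
import math

def depth_from_disparity(disparity_rad, distance_m, interocular_m=0.065):
    """Depth interval (m) implied by a relative disparity under the
    small-angle approximation: depth ≈ disparity * D**2 / I."""
    return disparity_rad * distance_m ** 2 / interocular_m

disparity = 0.001  # radians; hypothetical value

# The same disparity signals different absolute depth at different distances.
depth_at_half_m = depth_from_disparity(disparity, 0.5)
depth_at_one_m = depth_from_disparity(disparity, 1.0)
distance_doubling_factor = depth_at_one_m / depth_at_half_m

# An error in the distance estimate is amplified in the depth estimate,
# because depth grows with the square of the assumed distance.
overestimate_factor = (
    depth_from_disparity(disparity, 1.1) / depth_from_disparity(disparity, 1.0)
)
```

Under this approximation, doubling the assumed distance quadruples the depth interval attributed to a given disparity, and a 10% overestimate of distance inflates the depth estimate by about 21%. A more reliable distance cue therefore gives a more precise estimate of absolute depth, in line with the suggestion above.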
The explanation of monocular stereopsis based on picture-surface visibility, distance estimation, and scaling of absolute depth provides a plausible mechanistic basis for previous claims that changes in picture-surface visibility underlie shifts in perceived stereopsis in pictures (Michotte, 1948/1991; Pirenne, 1970).
In conclusion, our results show that, contrary to long-held beliefs, stereopsis is not a simple by-product of binocular vision or visual parallax. These results point to the need for an alternative conceptualization of this fundamental visual property and suggest wide-ranging implications for mechanisms of 3-D perception, the evolution and ontogeny of binocular vision, and the development of display technologies and virtual reality.
Footnotes
Acknowledgements
The authors thank Johannes Burge, Martin Banks, Simon Watt, and Leda Blackwood for comments on an earlier version of this manuscript.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
