Abstract

We are all brought up with the Amesian viewpoint, deriving from the string-chair demonstration of Adelbert Ames III, that perspective information is interpretable as an infinite set of projections of real-world objects that cannot be resolved into a particular interpretation without additional constraints. Indeed, the situation is even less constrained than the string-chair constructions imply, because the thin strings making up the Ames chair structure were straight, whereas in the limit each line on the retina is itself ambiguous: it could have derived from an infinite array of curved and wiggly lines that happened, by some implausible accident, to wiggle only within the plane of projection through the eye, and hence to appear straight from that viewpoint. In the face of this array of arrays of interpretative possibilities, is there any sense in which the information provided by a perspective scene is interpretable as just one single structure in space?
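The accidental-view ambiguity can be made concrete with a small numerical sketch. Assuming an idealized pinhole eye (an illustrative simplification, not a model of the retina), the hypothetical helper below builds a 3-D curve whose depth wiggles arbitrarily while every point stays in the plane through the eye and a given image line; its projection is nonetheless straight to within floating-point rounding.

```python
import math

def project(point, d=1.0):
    """Pinhole projection of a 3-D point (X, Y, Z), with Z > 0, onto an image plane at distance d."""
    X, Y, Z = point
    return (d * X / Z, d * Y / Z)

def wiggly_curve_behind_line(p0, p1, n=50, d=1.0):
    """Hypothetical construction: take image points along the straight segment p0-p1
    and push each one out along its own line of sight to an arbitrary, wiggly depth.
    All the resulting 3-D points lie in the plane through the eye and the image line."""
    points = []
    for i in range(n):
        t = i / (n - 1)
        x = p0[0] + t * (p1[0] - p0[0])
        y = p0[1] + t * (p1[1] - p0[1])
        Z = 5.0 + 2.0 * math.sin(7.0 * t) + 0.5 * math.cos(23.0 * t)  # arbitrary depth profile, always > 0
        points.append((x * Z / d, y * Z / d, Z))
    return points

def max_deviation_from_line(points, p0, p1):
    """Largest perpendicular distance of 2-D points from the line through p0 and p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    return max(abs(dy * (x - p0[0]) - dx * (y - p0[1])) / length for x, y in points)

p0, p1 = (-0.3, -0.2), (0.4, 0.1)
curve = wiggly_curve_behind_line(p0, p1)
image = [project(p) for p in curve]
depth_range = max(p[2] for p in curve) - min(p[2] for p in curve)
print(f"depth range of the 3-D curve: {depth_range:.2f}")
print(f"max deviation of its image from a straight line: {max_deviation_from_line(image, p0, p1):.1e}")
```

Any depth profile could be substituted for the one above; the image stays straight because only the distance along each line of sight changes.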
Opposing the Amesian viewpoint is the Gibsonian view that there is sufficient information in the world to resolve such ambiguities, and that in general we can tell from the information available in the optic array how far away from us all the objects are located. Indeed, our general experience is that we know the distances of the objects we are navigating among and are rarely confused by Amesian ambiguities. In operational terms, then, Gibson must be right that there is sufficient information to resolve the cues into absolute distance perception, but it remains a puzzle what information we actually use to solve the task.
If we dissect it into its component types of information (the linear perspective array, the horizontal binocular disparity array, the optic flow array, the texture gradient array, and so on), each one provides only relative cues to the actual distances of objects in the scene. If we set up any one of these cues as an array on a computer screen, it provides relative depth information, but that information is scaled relative to the external cues of the screen. In some cases, however, the optic array does contain information that resolves this relativity to provide absolute distance. For binocular disparity, for example, it is the array of vertical disparities, introduced by the relative magnification of the images in the two eyes in peripheral view, that is unique to each distance. We can ask whether there is an equivalent absolute depth cue for the monocular perspective information in pictures.
In general, the answer is ‘no’. If we view a scene composed of arbitrary objects and material textures, there is nothing to go on. It is only when there are textural regularities and objects of matching size that we have any chance of resolving the depth structure of the scene. Consider, for example, what is arguably the earliest surviving perspective diagram (Figure 1), a study for a painting of the ‘Adoration of the Magi’ by Leonardo da Vinci (∼1481). The architectural elements are overlaid on a classic perspective pavimenti, with the orthogonals receding accurately to a well-defined vanishing point located at what may be the nostril of the rearing horse in the background. So da Vinci was evidently well aware of the vanishing point construction, which I like to think was explained to the young artist by the venerable Florentine mathematician and cartographer Paolo Toscanelli, who had worked with Brunelleschi on the original formulation of perspective half a century earlier.
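The convergence of the orthogonals can be checked with a minimal pinhole-projection sketch. The eye height, orthogonal spacing, and image distance below are hypothetical values; any floor lines running straight away from the viewer should project to image lines that all cross at the central vanishing point on the horizon (here the origin).

```python
def project(X, Y, Z, d=1.0):
    """Pinhole projection onto an image plane at distance d; the horizon is the line y = 0."""
    return (d * X / Z, d * Y / Z)

def line_intersection(p1, p2, q1, q2):
    """Intersection of the image line through p1, p2 with the image line through q1, q2."""
    (x1, y1), (x2, y2) = p1, p2
    (x3, y3), (x4, y4) = q1, q2
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / den,
            (a * (y3 - y4) - (y1 - y2) * b) / den)

h, s = 1.6, 0.5  # hypothetical eye height above the floor and spacing of the orthogonals
# Each orthogonal is the floor line X = k*s, Y = -h; sample two points on it at different depths.
orthogonals = [(project(k * s, -h, 2.0), project(k * s, -h, 10.0)) for k in range(-3, 4)]
crossings = [line_intersection(*orthogonals[i], *orthogonals[j])
             for i in range(len(orthogonals)) for j in range(i + 1, len(orthogonals))]
worst = max(abs(x) + abs(y) for x, y in crossings)
print(f"largest offset of any pairwise crossing from the central vanishing point: {worst:.1e}")
```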
Figure 1. Linear perspective study for ‘The Adoration of the Magi’, by Leonardo da Vinci (∼1481; Uffizi Gallery, Florence). The overlaid horizontal dot-dashed blue line indicates the horizon at the level of the central vanishing point. The overlaid black oblique line and cross-diagonals identify corresponding sets of intersections of the implied paving stones between the receding orthogonals and the transversals. Dashed black lines show the projection of the obliques to the distance point on the horizon line, which is separated from the distance point for the solid black line by a visual angle of 90° at the implied viewing distance of the viewer’s eye, thus fixing the actual visual angle of the objects in the scene.
What da Vinci was evidently not familiar with was the principle for placing the transversals that make up the ground texture. He has drawn a high density of them, assiduously parallel, but with a recessive spacing that he apparently judged by eye alone. The rule is that an oblique, such as the black line overlaid in Figure 1, crosses the equally spaced orthogonals at intervals that are also equal in depth, so that a transversal drawn through each intersection is correctly placed. Given da Vinci’s density of transversals, a suitable oblique should cross the orthogonals at, for example, every fourth transversal. However, although the perspective construction of the dashed vanishing lines, which cross the main oblique to form the diagonals of receding squares, should mark out equal numbers of transversals in Figure 1, the delineated equal ‘squares’ actually encompass 5, 4, 3, and 2 transversals, counting from left to right. As a result, the further steps of the two staircases look longer than the nearer steps, even though da Vinci has carefully arranged for the staircases to span equal numbers (4) of transversals (as they should if they were the same width and the transversals correctly spaced). It is, however, unclear what principle or construction method da Vinci used to space his transversals, because they undergo a regular diminution with distance, albeit at a slower rate than the intersection principle prescribes.
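The intersection principle can be sketched numerically, assuming a pinhole eye at height h above a flat floor, square stones, and transversals four times denser in depth than the orthogonals (all values hypothetical). A 45° oblique crosses the orthogonals exactly on every fourth correctly placed transversal, and the correct image spacing of the transversals shrinks steadily with depth:

```python
d, h = 1.0, 1.6   # hypothetical image distance and eye height
s = 0.5           # spacing of the orthogonals (paving-stone width)
Z0 = 2.0          # depth of the nearest transversal
step = s / 4      # transversals four times denser in depth than the orthogonals

def transversal_y(j):
    """Image height (negative = below the horizon) of transversal j, at depth Z0 + j*step."""
    return -d * h / (Z0 + j * step)

def oblique_crossing(k):
    """Image of the oblique's k-th crossing with the orthogonals; the oblique is the floor
    line X = Z - Z0, so it meets orthogonal X = k*s at depth Z0 + k*s."""
    X, Z = k * s, Z0 + k * s
    return (d * X / Z, -d * h / Z)

# Each crossing should fall exactly on every fourth transversal.
for k in range(5):
    x, y = oblique_crossing(k)
    print(k, round(y, 4), round(transversal_y(4 * k), 4))

# Correctly placed transversals diminish in the image roughly as 1/depth**2.
gaps = [transversal_y(j + 1) - transversal_y(j) for j in range(16)]
print([round(g, 4) for g in gaps])
```

As k grows, the oblique’s image approaches x = d on the horizon, which is its distance point.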
However, the main point of introducing this diagram is to ask: if it were correctly drawn, could it convey the absolute distance structure of the depicted scene? To answer ‘yes’, we would need at a minimum the assumptions that 1) the straight lines on the paper correspond to nonaccidental views of straight lines in the scene; and 2) the transversals do indeed depict equally spaced pavimenti elements in the scene. The answer, however, would still be ‘no’ without information about the aspect ratio of the depicted pavimenti elements, or paving stones; the structure of the scene would be resolvable only up to a one-parameter family of distance scalings.
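The one-parameter ambiguity can be illustrated by back-projecting the same drawn paving stone under different assumed viewing distances. In this sketch (hypothetical numbers, with eye height fixed at 1 as the unit of length) the reconstructed width of the stone never changes, but its depth scales in proportion to the assumed distance d, so the picture alone cannot decide between a square stone and an elongated one.

```python
def reconstruct(x, y, d, h=1.0):
    """Back-project an image point below the horizon (y < 0) onto a floor at height -h,
    assuming the eye sits at distance d from the picture. Returns floor coordinates (X, Z)."""
    Z = -d * h / y
    return (x * Z / d, Z)

# Image of one paving stone drawn for a hypothetical true viewing distance of 1 and eye height 1,
# with a unit-square stone whose near edge is at depth 2.
near_left, near_right, far_left = (0.0, -0.5), (0.5, -0.5), (0.0, -1 / 3)

for d in (0.5, 1.0, 2.0):  # candidate viewing distances the picture alone cannot rule out
    (xl, zn), (xr, _), (_, zf) = (reconstruct(*p, d) for p in (near_left, near_right, far_left))
    width, depth = xr - xl, zf - zn
    print(f"d = {d}: width = {width:.3f}, depth = {depth:.3f}, depth/width = {depth / width:.3f}")
```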
If we could make the additional assumption that 3) the paving stones were square, for example, then the absolute scale and distance of the scene would be unambiguous. This is shown by the horizon line passing through the central vanishing point (horizontal dot-dashed blue line) and the continuation of the obliques of each square to their intersections with it (dashed lines). These intersections define the ‘distance points’ for the obliques, which must necessarily be separated by 90° at the viewer’s eye if the paving stones are square (because the diagonals of a square are at 90° to each other). Since the distance points must be 90° apart when the viewer is at the correct distance, the visual angle of every depicted object at this defined distance is also known precisely. Then, given the visual angle and the depicted geometry of each (square) paving stone, the directions of its sides are constrained by their projection to the vanishing points, whose angular positions relative to us determine the orientation of the plane in which they lie. So we also know the stone’s precise orientation in space, while the separation of the transversals of the square is determined by the distance to that plane. (Of course, as drawn by da Vinci with the incorrect scaling of the transversals, the ground plane should appear as a curved surface rather than flat.) In summary, although Ames was right that the scene geometry is ultimately indeterminate, applying the three stated assumptions would allow the full scene geometry to be known absolutely, in a modified validation of Gibson’s contention.
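Under the square-stone assumption, the viewing distance follows directly from the distance points. A minimal sketch, assuming a horizontal horizon and hypothetical picture coordinates: the two distance points lie symmetrically at ±d about the central vanishing point, so d is half their separation, and with that d the sight lines to them meet at 90°. Once d is fixed, the visual angle of any depicted extent follows.

```python
import math

def viewing_distance(left_dp_x, right_dp_x):
    """Eye-to-picture distance implied by the two distance points of a square pavement.
    The diagonals of a square are 90° apart, so their vanishing points lie on the horizon
    at vp_x - d and vp_x + d; d is therefore half the separation of the distance points."""
    return (right_dp_x - left_dp_x) / 2.0

def angle_at_eye(x1, x2, vp_x, d):
    """Visual angle (degrees) between two points x1, x2 on the horizon, for an eye
    at distance d in front of the central vanishing point vp_x."""
    return math.degrees(abs(math.atan2(x2 - vp_x, d) - math.atan2(x1 - vp_x, d)))

# Hypothetical picture coordinates (e.g., in cm) of the central vanishing point and distance points.
vp_x, left_dp, right_dp = 10.0, -22.0, 42.0
d = viewing_distance(left_dp, right_dp)
print(d)                                         # 32.0
print(angle_at_eye(left_dp, right_dp, vp_x, d))  # 90 degrees, up to floating-point rounding
# With d fixed, so is the visual angle of any depicted extent, e.g. a 5-cm span beside the vanishing point:
print(angle_at_eye(vp_x, vp_x + 5.0, vp_x, d))
```

Here d comes out in picture units, which is what fixes the visual angles of the depicted objects.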
If da Vinci had only drawn the pavimenti, that would be the end of the story, but he has included some other features that allow the ambiguity to be resolved, namely the arches in two perpendicular views at the left-hand side of the diagram. Here, we can use the familiar-shape information that Renaissance arches were semicircular to infer that the archways are the same width and, by focusing on the feet of the arches, we can (just about!) see that they span 5 transversals but only 2½ orthogonals, so the paving stones were evidently intended to have a 2:1 aspect ratio. With this information in place of the third assumption, the assumptions are again sufficient to provide us with the absolute distance information in the scene. Note that the same would be true if we had views of the paving stones themselves at two different angles (such as on the floor and the walls). Thus, the textural regularity has to extend to surfaces in two viewable dimensions, in a defined relationship to each other, in order to resolve the absolute distance structure in a static perspective image.
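The arch argument reduces to a short calculation. Treating the counts in the text as numbers of spacings (5 transversal intervals across the receding arch, 2½ orthogonal intervals across the frontal one), equal arch widths give a width-to-depth ratio of 2 for the stones. A stone’s diagonal then vanishes at an offset of d·w/D from the central vanishing point, so the measured offset must be divided by the aspect ratio rather than used directly. The function names and the measured offset below are hypothetical:

```python
from fractions import Fraction

def tile_aspect_ratio(depth_spacings, width_spacings):
    """Width-to-depth ratio w/D of the paving stones, given that one arch width spans
    `depth_spacings` transversal intervals in the receding view and `width_spacings`
    orthogonal intervals in the frontal view. Equal arch widths imply
    depth_spacings * D == width_spacings * w."""
    return Fraction(depth_spacings) / Fraction(width_spacings)

def viewing_distance_from_oblique(dp_offset, aspect):
    """A stone's diagonal has floor direction (w, D), so it vanishes on the horizon at an
    offset d * w / D from the central vanishing point; invert that relation for d."""
    return dp_offset / float(aspect)

aspect = tile_aspect_ratio(5, Fraction(5, 2))
print(aspect)                                       # 2
# Hypothetical measured offset of a distance point from the central vanishing point:
print(viewing_distance_from_oblique(30.0, aspect))  # 15.0
```

For square stones the ratio is 1 and the offset equals d, which recovers the square-stone construction.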
Amusingly, however, da Vinci has played a final trick on us (or on himself!), creating the illusion that the frontmost foot of the set of arches in the left foreground of Figure 1 rises to a location behind the arch immediately to its rear. Actually, this is an erroneous juxtaposition, because the frontmost foot is the ruined stump of the completed arch with which it is aligned. But it is still difficult to make out how all the arches were intended to join up in the vaulted space behind it.
We may bring this analysis to bear on Gibson’s contention, and our everyday experience, that the environment is in general rich enough for us to resolve almost all of the distance ambiguities without knowledge of the familiar size of objects from prior exposure. The question at hand is whether this holds true for pictorial (static, one-eyed) information per se or requires the full panoply of the sensory apparatus. Would the three assumptions above provide sufficient information to resolve the absolute distance structure of visual scenes? It seems that the answer is still ‘no’, because natural textures in the world (such as a field of grass or a gravel path) tend to be three-dimensional rather than purely two-dimensional surface textures. So even if the physical structure of the material is uniform, its appearance changes with viewing angle; in this case, the very richness of the information violates the uniformity assumption required for accurate reconstruction of absolute distances. Only in the limit of the smoothly carpentered environment assumed by perspectivists such as da Vinci would Gibson’s implicit uniformity assumptions hold up to provide the kind of absolute depth interpretation that we typically experience when viewing a rich perspective scene.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
