Abstract
When people perceive the world, what they see is based on the physics of light reflecting off surfaces and entering their eyes. Their brain then processes the raw data so that photoreceptor activity becomes perceptual awareness. Most textbooks and chapters on sensation and perception follow this formula, building student understanding of perception as progressing from the raw data of light to the biological response of photoreceptors to more complex processing of edges and objects in the brain. This approach is often called bottom-up processing. Top-down processing, in contrast, occurs when people’s expectations, emotions, and bodies affect how they see the world. In this article, I review diverse evidence suggesting that perception is not solely a result of bottom-up processing. I also suggest ways that we should inform students of this complexity.
Most introductory psychology textbooks describe the process of visual perception as a clear series of orderly and insulated steps carried out by a visual system composed of the eyes and brain. First, light from a light source reflects off surfaces and objects in the world. Then, the biological optical structures in the eyes filter and focus the reflected light. Next, the photoreceptors on the retina detect that light and transform the light energy into neural energy. Finally, a series of brain areas process the pattern of neural activity emerging from the retina. In short, the process moves from sun to surface to eye to brain. Within each of these stages, there are more stages. Light first passes through the cornea, then through the pupil, and then through the lens, finally hitting the retina. Neural activity goes from the retina through the optic nerve, then to the thalamus, and then to the primary visual area in the occipital cortex. The primary visual cortex first processes basic qualities like color, edges, and motion before passing on neural signals to the temporal cortex for more complex objects and to the parietal cortex for visually guided actions.
This basic description of the visual system is an important first step in understanding how perception works. But this view of perception is a caricature. Perception is not that simple. Recent research, using both behavioral and neuroscientific techniques, has confirmed that the process of perception is more complicated than the initial cartoon many teachers present to their students. The problem confronting teachers of Introductory Psychology and Sensation and Perception is thus straightforward: How do teachers help their students gain a more complex and nuanced understanding of the process of perception without undermining the clear framework of knowledge students have built? In this article, I suggest complicating the caricature by highlighting and questioning two assumptions I emphasized in the first sentence above: (a) that perception is a series of steps carried out in order, from simple activity of photoreceptors to more complex processing and representation in the brain and (b) that the eyes and brain as a system are separate from the body.
Perception Is Not Simple and Linear
When perception includes cognitive processes such as long-term memory and expectations rather than just processing information from light and photoreceptors, it is known as top-down processing. Moreover, there is evidence that the state of our body influences perception. Although many researchers do not view effects of the body on perception as a type of top-down processing, these effects do provide further evidence that perception is not a simple linear process. Rather than perceptual experiences and appearances being passed along as “pure” information for action, this latter approach, often called embodied perception, maintains that intended actions affect perception itself. For example, objects appear closer when an observer is holding a tool that would enable her to reach them but only when she intends to use that tool (Witt, Proffitt, & Epstein, 2005). Overall, this view of perception complicates the picture that perceptual representations are gradually built from raw data in the environment. Perception is instead an interactive process, taking information from higher cognitive processes as well as from the state of the body. In the following pages, I will describe several forms of evidence supporting this type of perceptual processing.
Top-Down Processing
The view that visual processing proceeds from the raw data of the sensory receptors to a rich and comprehensive three-dimensional visual experience is known as bottom-up processing. In contrast, top-down processing moves in the other direction, starting with knowledge and expectations that affect more basic visual qualities. Another way of thinking about this distinction is that the bottom-up approach proposes that perception is a series of independent modules in which people process visual information in one direction. First, edges and colors are distinguished and then they are bound together into an object. Then, an object-recognition module recognizes an object by comparing it to other shapes in long-term memory. Finally, a language module gives that object a linguistic label. In contrast, a top-down approach suggests that, in some cases, long-term memory or language can affect “lower” processes such as object recognition or shape perception. Evidence of top-down processes would include examples of long-term memory affecting object-shape recognition or linguistic labels affecting basic color discrimination.
One of the first ways researchers assessed the existence of top-down processing was by having participants observe a common scene containing many objects. If the visual system only processes information using a bottom-up approach, humans would briefly view and identify the objects in a scene before processing more general, gist-like elements of the scene. Likewise, although basic perceptual qualities like background color and contrast might affect object identification, the meaning of the background or context should not influence basic processes of object recognition.
The earliest suggestion of the existence of top-down processing came from Biederman (1972) and Palmer (1975), both of whom found that participants were less accurate identifying objects when the objects were placed or primed with semantically inconsistent scenes (like identifying a mailbox in a kitchen). Davenport and Potter (2004) extended these results by showing foreground objects along with either consistent or inconsistent background scenes. They also found that participants were less accurate at identifying an object when the background scene was inconsistent with the object (like a priest on a football field or a football player in uniform in a cathedral). Davenport and Potter, however, also found that participants were less accurate identifying the background scene when an object was inconsistent with its background. Davenport and Potter thereby concluded that humans do not process objects before scenes, just as they do not process the gist of a scene before an object. Rather, humans process the object and gist interactively, so that consistency affects the perception of each. Brady, Shafer-Skelton, and Alvarez (2017) later suggested that humans process gist information quickly using basic textural information (e.g., Is it a scene with lots of small edges, like a furnished living room? Or is it a scene with a long, horizontal edge and few small ones, like a beach?). In contrast, humans do not process object information using basic textural configurations.
Long-term memory representations thereby seem to influence earlier steps in the perceptual process. But how far back could these feedback mechanisms reach? Kahan and Enns (2014) tested whether long-term memory representations exerted their influence after an object had been assembled from its edges or during the edge assembly phase itself. They briefly presented participants with very basic shapes that were either similar to letters or to unfamiliar shapes on a typical alarm clock. Kahan and Enns then presented masks that either interfered with or facilitated whole object- or edge-based recognition. By comparing processing based on edges to processing based on whole objects in the same paradigm, Kahan and Enns were able to show that the processing advantage given by familiarity in long-term memory occurs early in visual processing, before the edges have been assembled into a coherent object.
Lupyan (2017) extended this work with a simpler and more intuitive perceptual task: comparing the blur or sharpness of two sets of letters. Although not a common conscious perceptual task, perceiving blur (or relative sharpness) is a critical cue to depth perception and, indeed, also connects intuitively to what people mean when they say that their vision “works well.” The paradigm was simple: Observers viewed a target four-letter word (such as “seem,” “worn,” or “much”) at different levels of blurriness and adjusted the blurriness of a matched pseudo word (“mcuh”) directly below it until the two appeared equally blurry. This “blur” paradigm also has the advantage of using a method of adjustment and comparison, where observers see two stimuli and adjust one to match the level of sharpness of the other. Lupyan found that when matching a blurred word (such as “much”) to a blurred nonword made up of the same letters (“mcuh”), observers made the word more blurry, indicating that they saw it as sharper than the nonword. These results suggest that the meaning of the set of letters facilitated perceived sharpness even if the words were nearly identical in terms of other low-level visual characteristics (e.g., number of edges, orientation of edges).
Another striking example of top-down influence on perception is color perception. What could be more basic than distinguishing the colors of surfaces? For example, imagine two surfaces, one reflecting a 450-nm wavelength of light and another reflecting a 470-nm wavelength. The difference of 20 nm is the same as the difference between 460 and 480 nm. According to a bottom-up view of perceptual processing, these two tasks, each requiring a distinction between two colors 20 nm apart, would not differ depending on what the labels were for those colors in a person’s native language. But such a difference is exactly what Winawer et al. (2007) observed. They tested Russian speakers and found that the participants were faster in distinguishing colors that fell on either side of a language boundary (a blue color known as siniy compared to a blue color known as goluboy) than if each color was on the same side of the boundary (both siniy or both goluboy). In this case, it seemed that the top-down effect could be disrupted if observers were simultaneously completing a verbal task, suggesting that the linguistic processing that delayed the color discrimination was happening in real time. Maier and Abdel Rahman (2018) extended the finding of different color perception depending on color labels by collecting electroencephalogram (EEG) data within an attentional blink paradigm. They found that differences in language categories even affected early neural signatures of perceptual processing and the access to visual consciousness.
Emotional and motivational influences
Two classes of top-down influences on perception deserve special attention. The first is emotional and motivational influences. Although not necessarily considered an influence of knowledge on perception, the observation that emotion can change perception challenges the idea that perception is insulated from other mental processes. A classic study by Phelps, Ling, and Carrasco (2006) investigated whether differences in emotional state can influence basic perceptual sensitivity. Phelps et al. first primed participants with either a fearful or a neutral face (for 75 ms). Then, after a 50-ms delay, the stimulus for the task was presented. That stimulus, presented for 40 ms, consisted of four Gabor patches (a basic visual stimulus consisting of a blurry set of black and white lines oriented in a certain direction), one of which was tilted relative to the others. Participants were to note the direction of the tilt. The Gabor patches varied in contrast to yield a contrast sensitivity function, or an amount of contrast needed to detect the difference in orientation. With low contrast, performance was at chance for both types of cues. But with higher contrast, performance was more accurate for the trials cued by the fearful face. Participants’ responses were not due to the basic perceptual attributes of the face stimuli themselves because this effect was not present when the faces were upside down. Rather, a brief fearful emotional manipulation led to increased contrast sensitivity.
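The logic of comparing contrast sensitivity across cue types can be illustrated with a short Python sketch. It is not the analysis Phelps et al. (2006) reported. The accuracy values are hypothetical, the helper name threshold_contrast is invented for illustration, and the sketch assumes a two-alternative tilt judgment (chance at 50%) with a 75% correct criterion and simple linear interpolation between tested contrasts.

```python
def threshold_contrast(contrasts, accuracy, criterion=0.75):
    """Return the contrast at which accuracy first reaches the criterion.

    Uses linear interpolation between adjacent tested contrast levels.
    Returns None if accuracy never crosses the criterion.
    """
    points = list(zip(contrasts, accuracy))
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 < criterion <= a1:
            return c0 + (criterion - a0) * (c1 - c0) / (a1 - a0)
    return None


# Hypothetical proportion-correct data (NOT values from the study).
contrasts = [0.02, 0.04, 0.08, 0.16, 0.32]
neutral_cue = [0.50, 0.55, 0.65, 0.80, 0.95]
fearful_cue = [0.50, 0.58, 0.75, 0.90, 0.98]

neutral_threshold = threshold_contrast(contrasts, neutral_cue)
fearful_threshold = threshold_contrast(contrasts, fearful_cue)

# Contrast sensitivity is conventionally the reciprocal of the threshold.
neutral_sensitivity = 1 / neutral_threshold
fearful_sensitivity = 1 / fearful_threshold

print(f"neutral: threshold={neutral_threshold:.3f}, sensitivity={neutral_sensitivity:.1f}")
print(f"fearful: threshold={fearful_threshold:.3f}, sensitivity={fearful_sensitivity:.1f}")
```

In these made-up data, both cue types yield chance performance at the lowest contrast, but the fearful-cue condition reaches the criterion at a lower contrast, which corresponds to higher contrast sensitivity.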
Researchers have assessed other influences of emotion on people’s perception of the basic spatial layout of the world as well. For example, Riener, Stefanucci, Proffitt, and Clore (2011) found that hills appeared steeper to those in a sad mood. Similarly, Stefanucci and Proffitt (2009) found that the normative overestimation of height varied by both state- and trait-level fear of heights. Observers who tended to be scared of heights overestimated heights more, and those who rated their current fear of heights as high also overestimated more than those who were not scared of heights. Stefanucci and Storbeck (2009) extended this research by manipulating arousal (through arousing or nonarousing images). Observers who viewed arousing images overestimated heights more. Stefanucci and Storbeck also found that instructing observers to use emotional regulation strategies moderated emotional arousal. Finally, observers’ motivation to reach an object also influences its perceived distance. Balcetis and Dunning (2010), for example, found that thirsty participants perceived a water bottle as being closer to them.
Why do emotions have such effects on perception? One popular theory is that emotional affect serves as information in evaluating decisions and actions. Zadra and Clore (2011) reviewed the evidence on emotion influencing perception and argued that a direct influence of emotion and motivation on perception helps inform efficient action choices and avoidance of potential danger, without the need for further conscious, deliberative steps of evaluating costs, benefits, and consequences. When people adjust their perception of the basic visual characteristics of the world (e.g., distance, slant, height, and contrast) based on their emotional states, it facilitates more immediate and automatic decisions on how to act with the perceived objects and how to navigate the environment.
Researchers have also investigated the possible influence of more complex emotions on spatial perception. Zheng, Fehr, Tai, Narayanan, and Gelfand (2015) investigated the influence of offense and forgiveness on perception of hill slant. Zheng et al. randomly assigned participants to one of the two emotional stimulus groups before asking them to judge the slant of a hill. One group wrote about a time when they were seriously offended by another person but ultimately forgave that person. The other group wrote about a time when they were seriously offended but did not forgive. The group that wrote about having forgiven their offender reported the hills as less steep. In another study, Oishi, Schiller, and Gross (2013) manipulated felt understanding and misunderstanding and then assessed perception of hill slant. Participants first rated their own personalities (choosing from a list of 10 positive traits, 2 that described them most accurately and 2 that described them least accurately). Then, participants had a brief conversation with a partner, rated that partner’s personality traits, and then learned how their partner rated them. Importantly, though, the personality rating participants saw was experimentally manipulated to match or mismatch their own ratings (thus creating a felt understanding and a felt misunderstanding group). Although two of their measures, pain perception and estimated distance to an unseen location, were not based on visual perception, Oishi et al. found that estimates of slant were greater for those in the misunderstanding group. In the case of these emotions, social cues of offense and misunderstanding could undermine general feelings of self-efficacy for a variety of tasks as well as safety in one’s environment. Emotional states affect motivation to act and can also affect the body’s capacity to act. These results suggest that influencing potential actions also affects how people see the world in which those actions would take place.
Embodied perception
The suggestion that emotion affects action leads to a second source of evidence against a simple, linear, insulated view of perception: embodied perception. According to this view, our body’s capacity to act in the environment impacts our perceptions of the world. First reviewed by Proffitt (2006), the embodied approach to perception is supported by a variety of studies in which manipulations to participants’ bodies and their ability to act affect perception of the basic layout of the world. For example, studies on perception of hill slant have shown that hills appear steeper to those who are fatigued (Proffitt, Bhalla, Gossweiler, & Midgett, 1995), encumbered, or of old age and declining health (Bhalla & Proffitt, 1999). Researchers have also observed other effects of scaling with the body in spaces farther from the body. Stefanucci and Geuss (2009) showed that the width of participants’ bodies influenced estimates of the size of apertures such that differences in body size were related to differences in perceptual estimates. They also found that holding out one’s arms influenced the estimated width of apertures. Proffitt and Linkenauger (2013) argued that people achieve explicit or conscious visual awareness by rescaling visual information about spatial layout of the environment to metrics of the body. These “body units” then support decisions about which actions are possible. In other words, to be functional, perception must be scaled by some unit or ruler.
Witt and colleagues (Witt & Proffitt, 2008; Witt et al., 2005) extended this research beyond mere capacity to act and into the specific possible actions and intentions to act, often in the context of visually guided actions and sports. For example, in studies of distances close to the body (Witt & Proffitt, 2008), participants judged targets that were just beyond reach as closer when they held a tool. Witt and colleagues (Witt & Sugovic, 2013; Witt, Sugovic, & Taylor, 2012) showed that the effectiveness of interacting with an object in a video game situation (either blocking a ball with a paddle or catching a fish with a net) influenced participants’ perceived speed of the object. When the paddle was larger, and therefore more effective at blocking the ball, participants judged the ball to be moving more slowly. When the net was larger, participants perceived the fish as moving more slowly. In other studies, effective performers judged the perceptual characteristics of their targets as more favorable to the performance of the action (for reviews of this approach as well as a study showing the effects of action on perception, see Philbeck & Witt, 2015; Witt, 2011). For example, golfers who were playing better judged the hole to be bigger (Witt, Linkenauger, Bakdash, & Proffitt, 2008), and experienced parkour athletes judged walls to be shorter than novices (Taylor, Witt, & Sugovic, 2011). Overall, these studies suggest that perceptual representations are constructed to be applied to future action and thus are grounded in the body that will be performing the action.
Top-Down Processing: Dissenting Views
Not all vision scientists agree that the above examples offer clear evidence of top-down processing or cognitive penetrability of perception (in which cognitive processes “penetrate” early perceptual processes). Firestone and Scholl (2016) argued that many proposed top-down influences on perception are effects not on perception itself, but on responses or judgments, or on the input to perception, rather than on the early perceptual process. They argued that claims of top-down influences suffer from interpretive pitfalls. For example, observers are more likely to see familiar shapes as the figure in ambiguous figure-ground stimuli (such as the face-vase figure; see Peterson & Gibson, 1994). Firestone and Scholl claimed that the effects of knowledge are not necessary to explain greater familiarity and sensitivity to contours. Rather, increased perceptual sensitivity to certain familiar contours is an effect of perceptual experience.
One common, possible pitfall noted by Firestone and Scholl (2016) is that top-down effects change perceptual judgments but not perception itself. For example, although people may be able to perceive color or size directly, whether something is expensive is a judgment rather than a perception (even if such a judgment is based on visual characteristics). Even when the experimental task is apparently a low-level visual task, changes may be in judgment rather than in perception. For example, observers may report a hill as being steeper when they are wearing a heavy backpack, but they may not actually see it as steeper. In addition, Firestone and Scholl argued that if the existence of top-down effects transforms our understanding of perception, there should be compelling demonstrations plainly observable outside the laboratory. In other words, there should be demonstrations similar to motion aftereffects, in which people are plainly aware of changes in their perceptual experience. The experiments above do not result in obvious changes in perceptual experience (e.g., golfers do not see the hole growing as their play improves). Overall, Firestone and Scholl made a strong claim that despite hundreds of recent articles, there is little convincing evidence of top-down effects of cognition on perception.
Recent experiments have sought to overcome such limitations and criticisms. Taylor-Covill and Eves (2016) used an individual-difference paradigm to investigate the perceived slant of stairs. They observed that participants’ body fat percentage was positively correlated with perceived slant. Because there was no experimental manipulation involved, participants were far less likely to guess the intention of the researchers and to alter their judgments in the hypothesized direction. Likewise, Zadra, Weltman, and Proffitt (2016) found that perceived distance was related to physical fitness (as measured by VO2 max at blood lactate threshold). Again, because Zadra et al. did not manipulate any variables, the participants were likely unaware of the hypothesized relation. Similarly, Witt (2017) described how the video game paradigms described above carefully control the effectiveness of a visually guided action task and the perception of the elements of that task (i.e., the speed of the ball and the size of a virtual paddle). With a more carefully controlled computer paradigm, some of the realism of the previous sporting tasks is lost, but many possible pitfalls in prior research are addressed (see also Philbeck & Witt, 2015, for a discussion of future research directions).
Teaching Top-Down Perception in a Bottom-Up Class
How can teachers present these ideas to students without confusing them about the general importance of bottom-up processing? First, I like to introduce students to these ideas with visual examples from their own experience. Many students have had the experience of navigating a hill by car or bike or by walking. Most can remember noticing how steep a hill seems when they have to power their own locomotion. In contrast, the hill seems less steep when driving. Similarly, student-athletes who have had experience with injuries and crutches can relate to how the change in their capacity to move alters the world they see. And, most likely, all students can relate to how changes in their mood and motivational states affect many other judgments they make.
Such appeals to the phenomenology of changes in students’ own bodies and emotional states are also a good reminder that our own experiences are often good inspiration for scientific research in psychology. Researchers do, however, have to precisely and carefully design experiments to confirm these experiences. In the case of sensation and perception, evidence for top-down processing offers an opportunity to review the possibility of valid behavioral instruments in carefully controlled studies, despite the difficulty of measuring perception directly. How can researchers measure what you see and not just what you say you see? At the end of several weeks discussing a bottom-up approach to perception, the possibility of top-down processing allows teachers to return to some of the fundamental questions that students often have in the first few weeks: “Do you see the same red as I do?” and “How do you tell if someone is more sensitive to color than someone else?” Although perceptual researchers are unable to answer essential phenomenological questions like the first (“Is your red the same as my red?”), they can use psychophysical variation and measurement techniques to answer the second (“Who is more sensitive to color?”). Examples like Lupyan’s (2017) blurriness matching task or Proffitt et al.’s (1995) visual matching of hill slant show how some measures do a better job than others at assessing perception without cognitive judgments adding noise to the responses. The contrast between judging real physical doorways in Stefanucci and Geuss’s (2009) study and judging video game objects in Witt’s (2017) studies shows the value of designing experiments that can focus on internal and external validity. Overall, reviewing the literature on top-down processing can be an opportunity to model how measuring seemingly subjective psychological states can still be accomplished with careful methods of measurement and manipulation.
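One way to make the difference between what observers see and what they say they see concrete for students is a signal detection analysis. The article does not name this technique, so the following Python sketch is an illustrative assumption, not a method used in the studies reviewed. The response counts are hypothetical, the function name dprime_and_criterion is invented, and the log-linear correction (adding 0.5 to each count) is one common convention among several.

```python
from statistics import NormalDist


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Compute sensitivity (d') and response criterion (c) from trial counts.

    A log-linear correction keeps the rates away from exactly 0 or 1,
    where the inverse normal transform is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for two observers judging "same color" vs. "different color".
# Observer B says "different" more often overall, including on same-color trials.
d_a, c_a = dprime_and_criterion(hits=40, misses=10, false_alarms=10, correct_rejections=40)
d_b, c_b = dprime_and_criterion(hits=45, misses=5, false_alarms=25, correct_rejections=25)

print(f"Observer A: d'={d_a:.2f}, c={c_a:.2f}")
print(f"Observer B: d'={d_b:.2f}, c={c_b:.2f}")
```

In these made-up counts, Observer B detects more of the true differences but also reports many differences that are not there. The analysis therefore assigns Observer B a lower sensitivity and a more liberal response criterion than Observer A, which separates a change in responding from a change in discrimination.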
Conclusion
In conclusion, new findings on top-down perception need not fundamentally change the way we teach perception to introductory psychology students. These findings do, however, suggest that we should not be afraid to complicate the simplistic picture that processing moves from light to eye to sensation to perception. Of course, perception begins with light reflecting off surfaces in the world and entering our eyes. The research reviewed above, however, suggests that perception, like any other mental process, occurs in a rich context of language, memories, emotions, skilled actions, and navigation. Although the world we see may not startle us as it changes in context, the psychological context of top-down processes, emotion, and our visual system being a part of our bodies influences our basic perception. Such an admission not only complicates the overly simplistic view of perception but also prepares our students for a more sophisticated and interconnected approach to other psychological processes.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
