Abstract

The study of high-level visual perception is often divided into distinct branches of research focusing either on objects alone or on scenes. However, given that all input through the retina is in essence a scene, the study of scene perception arguably encompasses both domains. Further, with an increasing push toward more naturalistic stimuli and paradigms, understanding broader scene perception becomes more critical. In this context, Castelhano and Williams’ Elements of Scene Perception, published as part of the Cambridge Elements series, provides a highly accessible and comprehensive overview of decades of research into scene perception.
The book is firmly focused on human behavioral research—neuroimaging, brain stimulation, and neurophysiological studies are barely mentioned. But this is no bad thing, given the rich and varied studies highlighted in this volume, from rapid scene categorization to memory to eye fixations. As the authors note, while scene perception is often thought about in the context of navigating real-world environments, most research (and the bulk of the work covered here) has used two-dimensional images, a critical starting point. Individual chapters focus on specific areas of research including initial scene understanding (and the rapidity of scene processing), online representations, long-term memory, eye movements (and attention), as well as spatial representations and navigation. The broad scope of scene perception research is evident throughout the book.
Although each individual chapter is self-contained, several common themes emerge. Scenes can generally be thought of as containing both object information and spatial information; the latter includes the general layout (or structure) of the scene and its boundaries as well as the specific spatial arrangement of the objects. There are also statistical regularities and physical constraints, such as the arrangement of surfaces on which objects might occur, and semantic information, which informs the consistency of an object given a specific context or its likely location: a kettle is much more likely to be found in a kitchen than a bathroom, and on the counter than on the floor. The observer's goal or task is also critical. For example, it has long been known that the specific pattern of eye movements and fixations observed during scene viewing depends on the task being performed. Fixations are more widely dispersed during the encoding period of a memory task than during visual search for a specific object, and the fixation pattern strongly reflects the nature of the item being searched for. In revealing these common themes, the book helps synthesize the literature and provides much food for thought.
Throughout the text, Castelhano and Williams highlight the multiple factors, including visual properties, attention, eye movements, memory, and semantics, that all impact scene perception and often interact. The resulting complexity makes understanding scene perception a difficult challenge. At the same time, the work reviewed demonstrates how careful experimental paradigms and manipulation of the contents of scenes, including their visual and semantic properties, can be used to tease apart these different factors. Two-dimensional images might not entirely reflect real-world environments, but they provide the control necessary to isolate diverse influences.
One appealing feature of the book is the inclusion of an “Application in the Real World” section at the end of each of the main chapters. For example, the treatment of online scene representations and change blindness is followed by a brief discussion of film perception and the types of “cut” used by film editors. Castelhano and Williams discuss how the rules that filmmakers and editors have developed, such as cutting across action, are often intended to mask changes across scenes or increase change blindness. Similarly, the chapter on visual search in scenes includes a discussion of visual search in the context of radiology. Search for tumors in X-rays is complicated by the low frequency with which tumors appear, the limited spatial constraints over where such tumors may appear (in contrast to the spatial constraints in real-world scenes), and the need to scroll through depth in volumetric images. By highlighting these connections with real-world examples, Castelhano and Williams help bring the work out of the laboratory.
Inevitably, a short book like this can't review everything, and more recent work on topics such as deep neural network approaches to saliency (e.g., DeepGaze), the use of virtual reality or immersive 3D environments, and gaze reinstatement during scene recall is not covered. But this is partly a result of how rapidly the study of scene perception is growing. Given this growth, this book is particularly valuable: it is a systematic overview of different areas of behavioral research, providing a timely and informative guide that is an excellent resource for graduate students and those wanting a broad-based perspective on scene perception.
