Abstract

Reviewed by: Alasdair Clarke, University of Essex, Colchester, UK
At first glance, the world presents a seemingly intractable problem for our visual system. There are many different object categories that we need to discriminate between, and within each category, there are often specific individuals that we must be able to quickly learn to accurately recognise. When looking after my brother’s dog, it is not enough to discriminate between dogs and cats; I must also be able to tell Poppy apart from other golden retrievers at the park. On top of all this, we have to deal with the fact that an image of an object will drastically change as the viewpoint and illumination vary. Somehow we make sense of all this complexity, and I can recognise family, friends and pets.
Visual Cortex and Deep Networks by Poggio and Anselmi aims to explain the first 100 ms of perception, from light hitting the retina to the activation of the inferior temporal cortex. They put forward a mathematical framework, named i-theory, based on the idea that the aim of the ventral stream of the visual system is to learn invariant representations. For example, I can learn to recognise Poppy the golden retriever over a wide range of viewpoints and poses without requiring a huge amount of training data from these conditions. Similarly, changing the font of text does little to interfere with my ability to read. Experiments with simple computer vision algorithms lend support to this idea, with classification problems becoming many orders of magnitude easier if viewpoint, scale and illumination do not vary.
Poggio and Anselmi’s short book is split into five chapters over 64 pages, followed by a lengthy appendix (44 pp.) containing mathematical proofs for many of the results discussed in the main text. While the authors claim that this will allow for a broad audience, as less mathematically inclined readers can skip the mathematics without compromising their understanding, the main text still contains a large amount of formal mathematical notation, which I expect readers of Perception may struggle with. I admit that I found it hard going myself, despite having a degree in pure mathematics (although it has been many years since I last studied maths and I am clearly out of practice!).
The first chapter introduces and motivates the ideas behind i-theory. The algebra and group theory are linked to the filtering and pooling processes carried out by simple and complex cells, and the Hubel–Wiesel module is identified as the key computational component. The ideas discussed here have much in common with the linear–nonlinear–linear models (also referred to as filter-rectify-filter and ‘the back pocket model of texture segregation’), although Poggio and Anselmi focus more on the links with group theory and present several theorems relating to invariance. The question of how these functions could be implemented in neuronal circuits is discussed in Chapter 2.
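The filter-then-pool idea can be illustrated with a toy sketch. This is not the book's formalism: it assumes a one-dimensional "image", takes cyclic shifts as the transformation group, and uses mean, max and energy as pooling functions. The name `hw_signature` is hypothetical and chosen only for this illustration.

```python
import numpy as np

def hw_signature(image, template):
    """Toy Hubel-Wiesel-style module: filter with every shift of a template, then pool.

    Assumption: the transformation group is the set of cyclic shifts of a 1-D signal.
    """
    n = len(template)
    # "Simple cells": dot products of the image with each shifted copy of the template.
    responses = np.array([np.dot(image, np.roll(template, s)) for s in range(n)])
    # "Complex cell": pool over all shifts (mean, max and energy are illustrative choices).
    return np.array([responses.mean(), responses.max(), np.mean(responses ** 2)])

rng = np.random.default_rng(0)
image = rng.standard_normal(16)
template = rng.standard_normal(16)

sig_original = hw_signature(image, template)
sig_shifted = hw_signature(np.roll(image, 5), template)

# Shifting the image only permutes the simple-cell responses,
# so the pooled signature is unchanged up to floating-point error.
print(np.allclose(sig_original, sig_shifted))
```

Because pooling discards the order of the responses, the signature is the same for the original and the shifted image, which is the sense of invariance the chapter is concerned with, here in a deliberately simplified setting.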
With Chapter 3, however, we return to higher level concepts with a discussion of visual areas V1, V2 and V3. A large part of this chapter deals with the multiresolution nature of the retina and V1. While I am familiar with the use of common image processing techniques such as multiscale pyramids, I found this more theoretical treatment of the topic refreshing. (Although I should also add that I was surprised by how difficult I found it to follow.) Despite its difficulties, this chapter shows that Poggio and Anselmi’s approach has some clear strengths. They detail how i-theory offers an explanation for the architecture of the retina and V1 and gives rise to Bouma’s famous law about visual crowding. The authors end the chapter with several new predictions that await empirical testing! Chapter 4 continues this line of work up to area V4 and the IT stage.
The short final chapter is perhaps the most important. It starts with a discussion about the link between i-theory and the deep convolutional neural networks that have proved so successful recently in AI research. This is followed by a list of predictions that should hopefully inspire future empirical work.
I greatly admire the work described in this book, and the book’s main contribution is to collect a series of technical reports together. I’ve spoken to colleagues in the past about my belief that our field, and cognitive psychology in general, is held back by a lack of theoretical mathematical models. Understanding the human brain is surely one of the most complex problems in science, and I have always felt it naive to assume that psychologists will be able to do it without mathematical training comparable to that given in subjects such as physics. That said, I feel that Visual Cortex and Deep Networks will be a challenging introduction to this area unless the reader has a strong grounding in mathematics. I would recommend that interested vision scientists begin with another text (such as Zhaoping’s excellent book ‘Understanding Vision’) and then continue with Poggio and Anselmi’s book if they require a more in-depth understanding.
