Abstract
Research in visual object recognition has largely focused on mechanisms common to most people, but there is increased interest in whether and how people differ in the ability to recognize objects and faces. New tests with a variety of familiar categories are being created and validated to measure domain-specific abilities. Because variability in experience with familiar objects contributes to performance, tests with novel objects were designed; these tests provided evidence for a domain-general visual ability that is relatively independent from general intelligence. These advances have led to improvements in linking activity in some visual areas of the brain with domain-specific experience. Much remains to be done to uncover the neural correlates of domain-general visual ability and assess the predictive ability of visual abilities in real-world settings.
Everyone knows there are important individual differences in cognitive abilities and personality, but in areas without long-standing research traditions, intuitions about individual differences are less clear. Here, I discuss object-recognition skills, or the ability to learn to discriminate between visually similar objects, to learn which features are relevant for categorization and individuation, despite image differences such as changes in viewpoint (Palmeri & Gauthier, 2004). How people vary in these abilities has important practical implications; for instance, it is relevant to whether some people may be particularly capable of learning to match fingerprints, to perform radiological diagnosis, or to excel at forensic face matching.
Individual differences in cognition or personality deeply influence modern life, an influence that is to some extent accepted because people share the belief that differences in these areas are large. Therefore, I first sought to measure laypeople’s intuitions about visual skills. I asked 100 individuals on Amazon Mechanical Turk to judge how much people might vary on six different tasks, some visual and some nonvisual. The nonvisual skills were “learning a second language,” “learning new math skills,” and “learning produce codes in a grocery store”; visual skills were “learning to identify cars,” “learning to match fingerprints,” and “learning to identify mushrooms.” Participants rated how the variability in these skills would compare with typical variability in people’s height (Fig. 1b), using a scale from 1 (much less variation than in height) to 7 (much more variation than in height). I also asked people how they would expect someone who excels at one of these tasks to perform in another (i.e., how related the tasks were; Fig. 1a); participants responded using a scale from 1 (not at all) to 7 (highly). Respondents thought that visual tasks were fairly distinct from other tasks typically related to general intelligence. In addition, respondents judged that people vary less in visual skills than in nonvisual skills. Of course, these intuitions may be incorrect, and recent studies are beginning to address these and related questions, but it appears that people may be willing to consider that measuring visual skills could add value beyond how we currently compare people.

Fig. 1. Mean ratings of (a) the degree to which participants thought that people who excel at one type of skill would succeed in another and (b) the degree to which participants thought that variability in visual and nonvisual skills is related to variability in people’s height. Error bars represent within-subjects 95% confidence intervals.
Creating the Tests to Study Visual Abilities
A prerequisite to testing these intuitions is to create psychometrically useful tests. Efforts in research on high-level vision have only recently focused on creating and evaluating such measures. Although performance on many standard high-level visual tasks is highly sensitive to group-averaged effects (and to some extent because of this sensitivity; see Hedge et al., 2017), these tasks have low reliability as measures of individual differences (e.g., contrast Richler & Gauthier, 2014, and Ross, Richler, & Gauthier, 2015; see also Herzmann, Danthiir, Schacht, Sommer, & Wilhelm, 2008). Cognitive scientists may sometimes be too quick to label variability across people in their measurements as “individual differences,” but the systematic study of individual differences requires first showing that such variability reflects more than measurement error. Duchaine and Nakayama (2006) led the way by creating the first standard test of face-recognition ability for use in the general population, the Cambridge Face Memory Test (CFMT). The CFMT is a learning task in which people study six unfamiliar male faces and, on subsequent trials, select which of three faces is one of the original six faces. Trial difficulty is varied through changes in viewpoint and added image noise to allow discrimination among the best face recognizers.
Inspired by this work, my colleagues and I created similar tests for many object categories in the Vanderbilt Expertise Test (VET) battery. Each test requires subjects to study six targets followed by three-alternative forced-choice trials (McGugin, Richler, Herzmann, Speegle, & Gauthier, 2012; Van Gulick, McGugin, & Gauthier, 2016; see also Dennett et al., 2012). Beyond achieving reliability, scientists must show that their measures have validity. For instance, CFMT performance correlates with performance on other face tasks (Furl, Garrido, Dolan, Driver, & Duchaine, 2011; Richler, Cheung, & Gauthier, 2011) and separates control participants from patients with face-recognition deficits (Duchaine & Nakayama, 2006). VET performance for each category (e.g., cars, shoes, or birds) correlates with measures of semantic knowledge in the same category, even after controlling for other visual and semantic measures (Van Gulick et al., 2016). Once these basic issues are tackled, experimental researchers expanding into individual differences can benefit from learning new statistical approaches, including factor analysis, structural equation modeling (e.g., Anderson & Gerbing, 1988), and item-response theory (Cho et al., 2015; Hambleton, Swaminathan, & Rogers, 1991).
Domain-General Visual Ability Is Not Intelligence
Using this first generation of object-recognition tests, we can assess whether people truly vary less in object-recognition skills than in general intelligence. The coefficient of variation (i.e., the standard deviation divided by the mean) reflects the amount of variation relative to the mean; being unitless, it allows comparisons of different abilities. For reference, IQ tests have a coefficient of variation of .15 (SD/mean, 15/100). Face recognition, as measured by the CFMT (Duchaine & Nakayama, 2006) in a very large sample (> 44,000 individuals; Germine, Duchaine, & Nakayama, 2011) had a coefficient of variation of .16 (13/81). In 213 people, six VET categories (cars, planes, transformers, shoes, birds, leaves) had a coefficient of variation of .15 and only two categories showed less variation (dinosaurs and mushrooms, .07), possibly because fewer people had a lot of experience with these categories (Van Gulick et al., 2016). Contrary to people’s intuitions, evidence suggests that people vary as much in high-level visual skills as they do in cognitive skills.
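The comparison above rests on simple arithmetic, which can be sketched as follows. This is only an illustration: the function name `coefficient_of_variation` is my own, and the inputs are the standard deviations and means quoted in the text for IQ (15/100) and the CFMT (13/81).

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Return the standard deviation divided by the mean (a unitless ratio)."""
    if mean == 0:
        raise ValueError("mean must be nonzero")
    return sd / mean


# Values quoted in the text above.
iq_cv = coefficient_of_variation(15, 100)   # 0.15
cfmt_cv = coefficient_of_variation(13, 81)  # roughly 0.16

print(f"IQ:   {iq_cv:.2f}")
print(f"CFMT: {cfmt_cv:.2f}")
```

Because the ratio is unitless, scores on tests with different scales can be compared directly, which is the point of using it here.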
Laypeople also have the intuition that visual skills differ from general cognitive skills. Is this supported by evidence? The bulk of current research suggests that object recognition is independent from other cognitive skills. One of the first studies to suggest that face recognition was relatively independent from cognitive skills (Wilmer et al., 2010) reported that, despite sufficient range and reliability, memory for verbal material only weakly correlated with face-recognition ability (r = .17). Shakeshaft and Plomin (2015) measured general cognitive ability (g) as a composite of a verbal-ability test and Raven’s Progressive Matrices (Raven, 1998) and found that g showed only a small correlation with face and car recognition (r = .16 and r = .15, respectively; but see Gignac, Shankaralingam, Walker, & Kilpatrick, 2016). Using multivariate genetic analyses in twins, this study also found that although both face and car recognition are highly heritable, this heritability is mostly (> 90%) not shared with g. Richler, Wilmer, and Gauthier (2017) measured object recognition for eight categories. On average, object recognition and IQ were modestly correlated (r = .20) but, more importantly, regressing out IQ did not reduce the shared variance between ability for object recognition in one category (e.g., birds) and the average object-recognition ability in all seven other categories (cars, dinosaurs, leaves, mushrooms, planes, shoes, and transformers; average r = .47). Thus, although intelligence might influence performance on specific visual-ability tests, the underlying domain-general visual ability appears to be independent from intelligence.
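To see why regressing out a modestly correlated variable removes little of a larger correlation, consider the standard first-order partial-correlation formula. The sketch below is not a reanalysis of any study's data. The inputs are illustrative values chosen to fall in the range reported above, and the assumption that both object-recognition scores correlate .20 with IQ is mine.

```python
from math import sqrt


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear effect of z from both."""
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Illustrative (assumed) inputs: two object-recognition scores correlate .47
# with each other, and each correlates .20 with IQ.
r_partial = partial_correlation(r_xy=0.47, r_xz=0.20, r_yz=0.20)
print(f"partial r = {r_partial:.3f}")
```

Under these assumed inputs, the correlation between the two recognition scores changes only slightly once IQ is controlled, which is the pattern described in the text.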
Visual skills may be distinct from intelligence, but the study of visual abilities can benefit from theories in the study of intelligence. Horn and Cattell (1966) defined fluid intelligence as the ability to solve new problems and crystallized intelligence as knowledge accumulated from experience. In recent work (Richler, Wilmer, et al., 2017), we explored whether a domain-general visual-recognition mechanism could apply broadly and predict how people acquire new visual skills. Because variable experience with familiar categories can contribute to differences in performance, we created Novel Object Memory Tests (NOMTs; see Fig. 2), in a format similar to that of the CFMT and VET (Richler, Wilmer, et al., 2017). We found that performance on these tests varied just as much as on familiar-object tests (coefficient of variation between .13 and .15) but showed more shared variance across categories (about 25%) than is typically observed for familiar-object tests (about 11%). Note that we verified that the ability measured in the NOMTs is not explained by cognitive skills: Shared variance between NOMTs remained unchanged after controlling for performance on the various measures of general intelligence. Researchers attempting to estimate a general visual ability that is distinct from intelligence should consider using NOMTs or other tasks with novel objects to avoid complications from varying experience. Other work (Richler, Tomarken, et al., 2017) shows that tasks with novel objects can predict performance with familiar-object categories.
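The shared-variance percentages above can be related to correlation coefficients with a short sketch. It assumes that shared variance is the squared Pearson correlation, a common convention that the text does not state explicitly. The function names are hypothetical.

```python
from math import sqrt


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, taken as the squared correlation."""
    return r * r


def correlation_from_shared_variance(proportion: float) -> float:
    """Magnitude of the correlation implied by a shared-variance proportion."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    return sqrt(proportion)


# Percentages quoted in the text: about 25% (novel objects), about 11% (familiar objects).
novel_r = correlation_from_shared_variance(0.25)     # 0.5
familiar_r = correlation_from_shared_variance(0.11)  # roughly 0.33

print(f"novel-object tests:    r = {novel_r:.2f}")
print(f"familiar-object tests: r = {familiar_r:.2f}")
```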

Fig. 2. Sample procedure of a Novel Object Memory Test (NOMT). Shown in (a) are examples of novel objects in a NOMT (Richler, Wilmer, & Gauthier, 2017). Each object in each set has unique parts, although parts can be very similar across objects. There is no rule that can be generalized across categories. In each NOMT, six targets are studied individually in two views, then (b) all six are shown together to memorize for 20 s. This is followed (c) by a series of three-alternative forced-choice trials.
Much more work is needed to clarify the nature of the abilities measured by the NOMTs. We have not explored whether the ability they measure is purely visual, and ongoing projects explore discriminant validity against measures of working memory capacity (Z. Xu, Adam, Fang, & Vogel, 2017) and executive functions (Miyake & Friedman, 2012).
Cars (and Faces) Are Special
NOMT performance correlates less with face-recognition performance (average r = .33) than performance in different NOMT categories correlates with one another (average r = .48). This could lead us to conclude that face recognition is special. However, a second study considered the relation between NOMT scores and performance with another category, cars, using the Cambridge Car Memory Test (similar to the CFMT and VET-car; Dennett et al., 2012). Cars are often used to estimate object-recognition ability in studies using only a single object category (e.g., Shakeshaft & Plomin, 2015). However, in studies that use many familiar categories, performance with the car category dissociates from performance with other categories at least as much as performance with the face category does (McGugin, Richler, et al., 2012; Van Gulick et al., 2016). Likewise, the correlation between NOMT scores and car-recognition scores (average r = .32) was lower than the NOMT-score correlations among other categories (average r = .48). Thus, if faces are special because their recognition is poorly predicted by performance with other objects, then so are cars.
Finding that performance with faces and performance with cars are equally independent from performance with other categories (and relatively independent from each other too) was unexpected but important. It offers a new context for studies in which comparisons of performance with faces and cars have suggested that face recognition is “special” (Dennett et al., 2012; Shakeshaft & Plomin, 2015). A comparison between only two categories (e.g., faces and cars) is not sufficient to claim that one category (e.g., faces) is special (Gauthier & Nelson, 2001) because this comparison assumes cars are representative of all nonface object-recognition categories. Indeed, when considering a single correlation between two categories (e.g., r = .37 between face recognition and car recognition; Dennett et al., 2012), one can emphasize either common variance (suggesting overlapping mechanisms) or unshared variance unrelated to measurement error (suggesting independent mechanisms). For perspective, the mean pairwise correlation between any two nonface object-recognition tests (e.g., birds, cars, planes, dinosaurs, shoes) is about the same magnitude (r = .33; McGugin, Richler, et al., 2012; Van Gulick et al., 2016). Note that these pairwise correlations vary greatly (r = .0–.5), and performance for faces and cars, across different data sets, consistently shows the lowest correlations with performance for other categories (McGugin, Richler, et al., 2012; Richler, Tomarken, et al., 2017; Van Gulick et al., 2016).
If only faces are special, then an evolutionary account, although difficult to test, at least seems plausible. In contrast, car recognition may seem special for multiple reasons: People vary more in their car experience than in other categories, most people are very familiar with cars, and people have more semantic information about cars than they do for many categories. So far, we find little support for most of these hypotheses. No measure of car experience or car knowledge mediated the relation between car and other category performances (Richler, Wilmer, et al., 2017). However, measuring experience with a category is particularly challenging (Gauthier et al., 2014), and we measured only knowledge of car model names as a proxy for semantic car knowledge (Van Gulick et al., 2016).
One interesting way to measure experience is population density: One report found that face recognition increases with increases in hometown population density (Balas & Saville, 2015). Although population density may not influence experience for all categories, it would also apply to cars. Perhaps with increasing experience, people rely on different recognition mechanisms for faces and cars, reducing the contribution of domain-general variance in these cases. Of course, car and face recognition do not have to be “special” for the same reason (or reasons), but until we understand why car recognition is relatively distinct, we cannot rule out the idea that both face and car recognition become special for the same reason. After all, both car recognition and face recognition show a large and equivalent degree of heritability (Shakeshaft & Plomin, 2015; Wilmer et al., 2010), and car-recognition ability repeatedly correlates with neural responses to cars in the fusiform face area (FFA; Gauthier et al., 2000; McGugin, Gatenby, Gore, & Gauthier, 2012; McGugin, Van Gulick, Tamber-Rosenau, Ross, & Gauthier, 2015; Y. Xu, 2005). These results certainly raise new questions in a field often motivated by differences between face and nonface object recognition (e.g., Farah, Wilson, Drain, & Tanaka, 1998; Kanwisher, 2000).
Measuring Domain-Specific Visual Abilities
In the first studies looking at correlates of real-world expertise (Gauthier et al., 2000; Y. Xu, 2005), performance for a category of interest (e.g., cars) was compared with a baseline for another single category (e.g., birds), and the difference in performance was related to the difference in neural response for cars minus birds, and vice versa (i.e., in the same subjects, the birds-minus-cars calculation was used to quantify bird expertise and was related to the birds-minus-cars neural response). Although this led to replicable expertise effects for objects in the FFA, there are limitations to this approach. First, when using this subtraction approach, the control category can contribute as much variance as the main category, so a car-expertise score may be a car-expertise-relative-to-bird-expertise score. A more appropriate approach may be to regress out the control-category variance (DeGutis, Wilmer, Mercado, & Cohan, 2013). Second, even using regression, it is unlikely that a single control category could accurately represent “generic” object-recognition mechanisms. Third, using only a single task means that some individual differences could be task-specific rather than general. Thus, in recent studies of expertise, we use several control categories (e.g., comparing cars with objects in seven other categories) and several tasks (e.g., matching and learning tasks; McGugin et al., 2015). A definition of domain-specific expertise that transcends task and control category can increase validity of measurements and should lead to predictions of behavior and neural effects in a wider range of situations.
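The contrast between the subtraction approach and the regression approach can be made concrete with a minimal sketch. The scores below are hypothetical accuracy values invented for illustration, and the helper names `pearson` and `residualize` are my own. The sketch uses a simple one-predictor least-squares regression, which is an assumption about how "regressing out" might be implemented rather than a description of any published analysis.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(target, control):
    """Residuals of target after a simple least-squares regression on control."""
    mt, mc = mean(target), mean(control)
    slope = (sum((c - mc) * (t - mt) for c, t in zip(control, target))
             / sum((c - mc) ** 2 for c in control))
    intercept = mt - slope * mc
    return [t - (intercept + slope * c) for t, c in zip(target, control)]


# Hypothetical accuracy scores for six participants (not real data).
cars = [0.62, 0.71, 0.55, 0.90, 0.68, 0.77]
birds = [0.60, 0.65, 0.58, 0.70, 0.75, 0.66]

# Subtraction approach: the control score enters the index directly.
difference = [c - b for c, b in zip(cars, birds)]

# Regression approach: keep only the car variance not predicted by birds.
residual = residualize(cars, birds)

r_diff_control = pearson(difference, birds)
r_resid_control = pearson(residual, birds)  # zero up to floating-point error

print(f"difference score vs. control: r = {r_diff_control:.2f}")
print(f"residual score vs. control:   r = {r_resid_control:.2f}")
```

By construction, the residual score is uncorrelated with the control category, whereas the difference score need not be. This is one way to see why a subtraction-based index can partly reflect the control category.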
Future Directions
The work discussed here is a stepping-stone to several new avenues of research. For instance, past studies of expertise have uncovered neural substrates related to performance with a single category, such as cars, and thus point to the neural substrates of category-specific effects that are likely to be related to experience. New studies focusing on domain-general visual skills are needed to reveal the aspects of brain structure and activity that support the ability to acquire expertise in new categories. As we learn more about domain-general visual abilities, it will be important to assess their predictive validity in real-world domains such as medical imaging. As we take on this challenge, it is important to remember that the quality of our predictors cannot make up for poor measurements of the criteria we want to predict. Additional work may be required to achieve reliable measures of the quality of decisions in professional areas. In addition, experts are by definition rare, limiting sample sizes, and more longitudinal efforts are needed to study the trajectories from novice to expert, which may be constrained by visual abilities (Gegenfurtner et al., 2017).
Recommended Reading
Duchaine, B., & Nakayama, K. (2006). (See References). A landmark study in the creation of tests to study visual abilities in the normal population.
Hedge, C., Powell, G., & Sumner, P. (2017). (See References). A discussion of how tasks that are well suited for experimental work may be too unreliable for individual-differences research.
Richler, J. J., Wilmer, J. B., & Gauthier, I. (2017). (See References). A recent study offering the first tests designed to measure domain-general object recognition.
Shakeshaft, N. G., & Plomin, R. (2015). (See References). A twin study showing that face and car recognition are heritable and independent from intelligence.
Van Gulick, A. E., McGugin, R. W., & Gauthier, I. (2016). (See References). A study offering a battery of tests for visual abilities and semantic knowledge for several object categories that can be used in future work.
Acknowledgements
I thank Mackenzie Sunday for providing comments on the manuscript.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Funding
This work was supported by National Science Foundation Awards SBE-1640681 and BCS-1534866.
