Abstract
When viewing unfamiliar faces that vary in expressions, angles, and image quality, observers make many recognition errors. Specifically, in unconstrained identity-sorting tasks, observers struggle to cope with variation across different images of the same person while succeeding at telling different people apart. The use of ambient face images in this simple card-sorting task reveals the magnitude of these face recognition errors and suggests a useful platform to reexamine the nature of face processing using naturalistic stimuli. In the present study, we chose to investigate the impact of two basic stimulus manipulations (image blur and face inversion) on identity sorting with ambient images. Although these manipulations are both known to affect face processing when well-controlled, frontally viewed face images are used, examining how they affect performance for ambient images is an important step toward linking the large body of research using controlled face images to more ecologically valid viewing conditions. Briefly, we observed a high cost of image blur regardless of blur magnitude, and a strong inversion effect that affected observers’ sensitivity to extrapersonal variability but did not affect the number of unique identities they estimated were present in the set of images presented to them.
Introduction
Human face recognition abilities have frequently been described in terms of the high levels of performance most adults can achieve when presented with familiar faces. When viewing faces of individuals that they are familiar with, observers exhibit impressive tolerance for a wide range of appearance variations. These include changes in viewpoint (O’Toole, Edelman, & Bülthoff, 1998), lighting (Adini, Moses, & Ullman, 1997), and expression (Mian & Mondloch, 2012) as well as less ecologically valid distortions like image shear (Sandford & Burton, 2014) and nonisotropic expansion or compression (Hole, George, Eaves, & Rasek, 2002). These abilities are not representative of face recognition considered more broadly, however. Recently, there has been increased attention given to the profound differences between familiar face recognition and unfamiliar face recognition, the latter of which tends to be far less impressive than the former (Johnston & Edmonds, 2009). In general, matching different images of the same unfamiliar face is far more difficult than accomplishing the same feat with familiar faces. This is evident in small- to medium-size effects of personal familiarity on response latencies (Balas, Cox, & Conwell, 2007) and also in gross failures of recognition that occur when individuals are asked to work with unfamiliar faces in everyday contexts (Bruce et al., 1999; Bruce, Henderson, Newman, & Burton, 2001; Kemp, Towell, & Pike, 1997).
The observation that unfamiliar face recognition is generally quite poor is not new: What has motivated a great deal of recent work is the combination of an unconstrained card-sorting task as a means to characterize face recognition performance (Andrews, Jenkins, Cursiter, & Burton, 2015; Jenkins, White, Van Montfort, & Burton, 2011) with the use of ambient images that depict faces under a wide range of viewing conditions (Burton, 2013). Both of these are important contributions that deserve to be considered separately. The use of ambient images in face recognition tasks is an important step toward linking laboratory results to real-world face recognition. While most work on face recognition has made use of frontally viewed, well-lit faces that are frequently aligned or cropped to be more uniform, face recognition in the wild obeys no such constraints. Studying face recognition only via tightly controlled images therefore places serious limitations on the conclusions we can draw with regard to everyday face processing. Independent of this, the unconstrained card-sorting task popularized in many recent studies (Andrews et al., 2015; Balas & Pearson, 2017; Jenkins et al., 2011) offers a particularly useful paradigm for examining the errors observers make when attempting to sort faces according to identity. Briefly, in this task, observers are asked to sort a large set of ambient images according to identity such that each group of cards they create contains all of the images in the set that the observer believes depict the same person. Typically, observers substantially overestimate the number of unique people in the set of cards (an outcome that is exacerbated when other-race faces are used [Laurence, Zhou, & Mondloch, 2015; Yan, Andrews, Jenkins, & Young, 2016]) but rarely place cards depicting different people into the same group.
This pattern of results suggests that a key shortcoming of unfamiliar face recognition is an inability to tell people together, or generalize identity across large appearance variations. The combination of useful stimuli (ambient images) with an important methodology (unconstrained card sorting) has thus revealed properties of unfamiliar face recognition that were previously not known, due in large part to the limits of traditional methods for measuring face recognition abilities.
Applying ambient image stimuli and card-sorting methods revealed basic properties of unfamiliar face recognition performance, so it is also probably worth considering what other results derived from more traditional approaches may look like when reconsidered within the same framework. Critically, the goal of such investigations is not to more thoroughly understand how card-sorting tasks work, but to use such tasks as a convenient means of exploring how the recognition of naturalistic faces may differ from the recognition of face images that have limited variability. Specifically, there are several aspects of face recognition that are straightforward to understand in the context of laboratory studies but more challenging to apply to naturalistic stimuli. For example, consider the large literature describing face recognition as holistic or configural (see Maurer, Le Grand, & Mondloch, 2002 for a review). The former term typically refers to a presumed mode of processing in which the entire face pattern is processed as a unit, most often revealed by variations on the composite face effect (Richler, Palmeri, & Gauthier, 2012; Young, Hellawell, & Hay, 1987). This effect, in which the top or bottom half of a face image affects the appearance (and discriminability) of the other half even when it is task irrelevant, heavily depends on typical laboratory face stimuli that are usually forward facing, often cropped, and typically free of exogenous sources of variation. The latter term, configural processing, can refer to a number of different aspects of face processing. One such aspect, second-order configural processing, typically refers to a presumed mode of processing faces that depends on estimating geometric relationships within a face (e.g., interocular distance) and using these metric features, either in isolation or combined into ratios, to recognize individuals. 
Again, most demonstrations of impressive configural processing abilities rely heavily on well-controlled face stimuli. Indeed, there have been several demonstrations that these kinds of geometric features are both difficult to measure in ambient images (Gaspar, Bennett, & Sekuler, 2007) and unlikely to be useful even when they are measured (Brunelli & Poggio, 1993; Kleinberg, Vanezis, & Burton, 2007; Taschereau-Dumouchel, Rossion, Schyns, & Gosselin, 2010). In both cases, terms that have had a substantial influence on our understanding of face recognition are profoundly difficult to examine with natural images of faces that are more representative of our everyday experience. This is not to say that they are unimportant but rather to highlight the need to reexamine how face recognition works in naturalistic settings: Does face recognition work the same way with ambient images as it does with laboratory images?
In the present study, we examine this broad question through the lens of two aspects of face recognition that are relatively easy to translate into the domain of ambient images: (a) observers’ tolerance of image blur in face images and (b) the impact of planar inversion on face recognition. Both of these are well-studied aspects of face recognition but have been examined primarily with laboratory stimuli. That is, while we have substantial evidence that Gaussian blurring negatively impacts performance (Collishaw & Hole, 2000; Sandford, Sarker, & Bernier, 2018) and that face inversion also does so rather dramatically (Yin, 1969), these results have largely been obtained with front-facing, well-lit, unoccluded faces that were usually photographed within a narrow range of distances. Any one of these sources of variability could introduce substantial difficulty into a simple face recognition task (for a recent example of how distance-to-camera impacts appearance, see Noyes & Jenkins, 2017), which could have profound consequences for how face recognition abilities are affected by blur and inversion, among other manipulations, when more ecologically valid images are used. Here, we examined whether observers presented with ambient images affected by blur and inversion would exhibit patterns of behavior that are consistent with previous results using more controlled stimuli. In both cases, we use unconstrained card sorting as an index of unfamiliar face recognition ability, with some modifications introduced to allow for the stimulus manipulations we wished to investigate. Again, we emphasize that the goal of both experiments was to understand how the recognition of ambient face images may differ from that of limited-variability face images, using an established card-sorting task as a useful means to study face recognition using these stimuli.
Experiment 1
The goal of our first experiment was to determine how image blur affected the ability to sort faces by identity. Specifically, we applied varying levels of blur to ambient images of unfamiliar faces and asked observers to complete an unconstrained card-sorting task with these stimuli.
Methods
Participants
We recruited 86 undergraduates (55 females) to take part in this study from the North Dakota State University (NDSU) Undergraduate Study Pool. These participants all self-reported normal or corrected-to-normal vision and were unfamiliar with the celebrities used in our stimulus set. Participants received course credit for volunteering and were naïve to the purpose of the experiment. Prior to beginning the testing session, each participant gave written informed consent, consistent with procedures established by the NDSU institutional review board.
Stimuli
Our stimulus set comprised 20 ambient images of each of four Dutch celebrities (Doutzen Kroes, Dewi Driegen, Lara Stone, and Sylvie Van Der Vaart) that we expected would be unfamiliar to our undergraduate population. For each of these individuals, we collected ambient images via Google Image Search and cropped these to isolate the face region. In general, our cropping procedure tended to remove most of the hairline but retained the external contour along the temples, cheeks, and jawline. The reason for this was practical and motivated by the performance of pilot participants. Specifically, we found that the external contour of the hair in many of our images proved difficult to segment cleanly from the background. While one solution to this problem is to include a sufficiently large amount of the background itself to ensure that the entire hairline is visible, we found that participants had a tendency to try to use this contextual cue as a means of deciding how to group faces together. We therefore chose to adopt the tighter cropping strategy described above, though it does reduce the ecological validity of our stimulus set somewhat. Besides this concern about ecological validity, there has also been substantial work suggesting that internal face features are relied upon more heavily in familiar face recognition than unfamiliar face recognition (see Kramer, Young, & Burton, 2018 for a discussion of this point). Our cropping procedure may thus be removing visual features that participants are more likely to rely upon when trying to recognize our set of unfamiliar faces. However, recent results from Kramer, Manessi, Towler, Reynolds, and Burton (2018) from a card-sorting task similar to ours demonstrate that full faces (internal and external features) and faces containing only internal features are of approximately equal use for the purposes of identification, while external features are less useful.
While we agree that it is generally better to use face images that have not been edited to remove specific features, we take this recent result as evidence that our choice to crop images in the manner described earlier should not have a substantial impact on our results. Otherwise, we attempted to capture a wide range of viewing conditions including variation in viewpoint, expression, lighting, occlusion, and other sources of appearance variability. Our first viewing condition, original images, comprised these photos, resized for presentation in the form of easy-to-handle cards.
To impose varying levels of image blur on these images, we imported each image into MATLAB and resized each picture to a canonical size of 640 pixels × 800 pixels. After this rescaling procedure, we applied a low-pass Butterworth filter to each image and varied the cutoff frequency to obtain three different levels of image blur, which we refer to here as low-, medium-, and high-blur levels. Specifically, we imposed low-pass thresholds of 4, 8, and 16 cycles per image (Figure 1). We emphasize, however, that these values should only be taken as loose estimates of the spatial frequency content of the faces due to the variability inherent to sets of ambient images. We report these values here to be consistent with previous literature describing spatial frequency content in face images with reference to these units, but characterizing these levels of blur in qualitative terms is probably more meaningful.
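The filtering step can be illustrated with a minimal Python sketch of a Fourier-domain Butterworth low-pass filter. This is an illustration only, not the MATLAB code used in the study: the function name `butterworth_lowpass`, the filter order, and the choice to express frequency in cycles per image along each axis are all assumptions, since the text reports only the cutoff values. In this form of the filter, a lower cutoff removes more high spatial frequency content and therefore yields a blurrier image.

```python
import numpy as np

def butterworth_lowpass(image, cutoff_cycles, order=2):
    """Low-pass filter a 2-D grayscale image in the Fourier domain.

    cutoff_cycles: cutoff frequency in cycles per image.
    order: filter order (hypothetical default; not reported in the text).
    """
    rows, cols = image.shape
    # Frequency coordinates in cycles per image along each axis.
    # For non-square images this treats one cycle per image height and
    # one cycle per image width as equivalent (an assumption).
    fy = np.fft.fftfreq(rows) * rows
    fx = np.fft.fftfreq(cols) * cols
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # Butterworth gain: 1 at zero frequency, 0.5 at the cutoff,
    # falling smoothly toward 0 at higher frequencies.
    gain = 1.0 / (1.0 + (radius / cutoff_cycles) ** (2 * order))
    filtered = np.fft.ifft2(np.fft.fft2(image) * gain)
    # The gain is symmetric, so the imaginary part is numerical noise.
    return filtered.real
```

Because the gain at zero frequency is 1, mean luminance is preserved and only contrast at higher spatial frequencies is attenuated.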

Figure 1. An example of a single face image cropped and blurred according to the procedures described in Experiment 1. We emphasize that our full stimulus set incorporated substantial variability within the original, unblurred images, so the amount of blur in terms of approximate cycles per face within each level was also variable to some extent.
Following these image-processing procedures, each face image was placed on a white background, then printed onto a 2″ × 3″ card for sorting. We applied a unique letter string to the back of each card for easy coding after each experimental session, and these strings were chosen randomly so that participants could not attempt to use these codes during sorting.
Procedure
After providing written informed consent, participants were seated at a desk in a sound-attenuated testing room. We presented participants with a full deck of cards (80 images total) and instructed them to sort these cards by identity. Specifically, we told participants that the deck had a large number of cards, but that we would not disclose the number of unique individuals depicted in the deck. Participants were told that this meant that there could be 80 different pictures of only one person, 80 different people with one image each, or anything in between. They were asked to sort cards into piles based on identity, such that each pile they created contained all of the images that they believed depicted a unique individual in the deck, and no images of anyone else. Participants were told that they could work through the deck at their own pace and rearrange cards as they saw fit before deciding on their final solution.
Each participant was presented with cards at a single blur level (low [N = 22], medium [N = 21], high [N = 21], or no blur [N = 22]). Note that the number of participants was not equal across our four groups. Our target number of participants per condition was 20, but in each case, we inadvertently tested a small number of additional individuals and chose to retain their data rather than discard them. Participants who were given blurred images were told that the faces in the deck might be somewhat difficult to see because of blurriness, but that they should still do their best to group cards by identity. Participants were randomly assigned to blur conditions.
Results
As in our previous work with card sorting (Balas & Pearson, 2017), we analyzed the results of this experiment using a signal detection theory (SDT) framework. In this approach, we consider sorting performance in terms of all possible pairs of cards in the stimulus set, including card pairs depicting different people (extrapersonal variation) and card pairs depicting the same person (intrapersonal variation). By examining the fate of the constituent images in each pair (placed in the same group or not) relative to what was correct, we can characterize each individual participant’s sorting solution in terms of hit rates and false alarm rates for successful detection of extrapersonal variation, which subsequently allows us to compute both d′ and response criterion values for each participant. For a full description of this analysis approach, see Balas and Pearson (2017). Besides these SDT-based measures of performance, we also conducted an analysis of the number of groups made in each condition to be commensurate with prior work reporting the same (Andrews et al., 2015; Laurence et al., 2015). For all of these descriptors, we used Bayesian analyses of variance implemented in JASP (JASP Team, 2018) to determine the evidence favoring the null and alternative hypotheses in terms of Bayes factors. For each descriptor, we also include a more standard null hypothesis significance testing (NHST) analysis of the same data using analysis of variance to facilitate comparison between the current results and prior studies that did not use Bayesian analyses.
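To make the pairwise scoring concrete, the following Python sketch computes d′, c, and the number of groups from a single sorting solution. It is a sketch under stated assumptions rather than the exact procedure of Balas and Pearson (2017): the function name `sdt_from_sort` is hypothetical, the formulas are the standard equal-variance SDT definitions, and the log-linear correction for extreme rates is an assumption. A "hit" is a different-person pair placed in different groups, and a "false alarm" is a same-person pair placed in different groups. For the 80-card deck in Experiment 1 (4 identities × 20 images), this scoring runs over 3,160 pairs, of which 760 are same-person pairs and 2,400 are different-person pairs.

```python
from itertools import combinations
from statistics import NormalDist

def sdt_from_sort(identities, groups):
    """Score one sorting solution over all card pairs.

    identities[i] is the true identity of card i; groups[i] is the pile
    the participant placed it in. "Signal" = a different-person pair.
    """
    hits = misses = false_alarms = correct_rejections = 0
    for i, j in combinations(range(len(identities)), 2):
        different_people = identities[i] != identities[j]
        sorted_apart = groups[i] != groups[j]
        if different_people and sorted_apart:
            hits += 1
        elif different_people:
            misses += 1
        elif sorted_apart:
            false_alarms += 1
        else:
            correct_rejections += 1
    # Log-linear correction (an assumption) keeps rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    # Negative c indicates a bias toward sorting pairs into different groups.
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion, len(set(groups))
```

Under this convention, an observer who splits one person's images across many piles raises both the hit rate and the false alarm rate, which pushes c in the negative direction without necessarily changing d′.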
Sensitivity (d′)
Our analysis of observers’ sensitivity to extrapersonal variation indicated that the data provided substantially more evidence in favor of the alternative hypothesis than the null hypothesis (BF10 ≈ 3.4 × 10⁸). Post hoc tests (Bayesian paired-samples t tests) further revealed that this was due to differences between observers’ sensitivity in the original images condition (M = 1.00, SD = 0.5) and all three blurred image conditions (low blur: M = 0.24, SD = 0.27; medium blur: M = 0.16, SD = 0.26; high blur: M = 0.32, SD = 0.37). Comparisons between the unblurred condition and each of the three blurred conditions revealed substantially more evidence for the alternative hypothesis than the null hypothesis (BF10 > 1,500 in each case), while comparisons between the blurred conditions yielded Bayes factors that suggested more evidence for the null hypothesis (Figure 2). We therefore conclude that while the difference between unblurred ambient images and blurred images at all levels leads to a large effect on sensitivity to extrapersonal variability, there are at best probably only small differences between observers’ sensitivity across different levels of blurriness.

Figure 2. Average d′ values as a function of approximate image blur in Experiment 1. We found that observers performed substantially better with unblurred than blurred images. Further, the level of performance achieved with any amount of blur was generally low and did not differ much across conditions.
Our NHST analysis of these data reveals similar outcomes. We observed a significant effect of blur level, F(3, 82) = 24.23, p < .001, η² = 0.47, that was driven by significant differences between performance in the original images condition and each of the three levels of blur (Tukey’s test): original versus low blur, t(40) = 6.89, p < .001; original versus medium blur, t(41) = 7.53, p < .001; original versus high blur, t(41) = 6.08, p < .001. We observed no significant differences between any of the other pairs of conditions (p > .48 in each case).
Response Bias (c)
Our analysis of response bias (the tendency to place cards into different groups regardless of their true status) indicated that the data provided more evidence for the null hypothesis than the alternative hypothesis (BF01 = 2.32). That is, the data were more than twice as likely under the null hypothesis that there is no difference between conditions. In all cases, participants exhibited an overall tendency to label image pairs as depicting different people (consistent with the overfragmentation of groups observed in previous studies), which was reflected in negative values of the response criterion (c). However, the magnitude of c did not differ much across blur conditions (Table 1). Our NHST analysis of the same data also revealed no significant effect of blur on criterion values, F(3, 82) = 1.83, p = .15, η² = 0.063.
Table 1. Descriptive Statistics (Mean and Standard Deviation) for Average Response Criterion and Numerosity Values Across Participants.
Number of Groups
Our analysis of the number of groups made as a function of blur condition indicates that the data provide more evidence in favor of the null hypothesis than the alternative hypothesis (BF01 = 3.3). The data in this case are more than three times as likely under the null hypothesis that there is no difference between conditions. Thus, though we see a trend for greater group numerosity in the high-blur condition relative to the medium-blur, low-blur, and original images conditions, we conclude that blur does not greatly affect the number of groups that individuals tend to make while sorting these unfamiliar faces. An NHST analysis yielded a similar outcome, revealing no significant effect of blur level on the number of groups participants made, F(3, 82) = 1.46, p = .23, η² = 0.051.
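As a quick consistency check on the effect sizes reported for the three one-way analyses of variance, η² can be recovered from each F statistic and its degrees of freedom. The short Python sketch below assumes a one-way between-subjects design, for which η² = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared for a one-way between-subjects ANOVA."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F statistics reported for sensitivity, response criterion, and number of groups.
reported = [("d-prime", 24.23), ("criterion", 1.83), ("groups", 1.46)]
for label, f_value in reported:
    print(label, round(eta_squared_from_f(f_value, 3, 82), 3))
```

With df = (3, 82), the three F values reproduce the reported η² values of 0.47, 0.063, and 0.051 to rounding.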
Discussion
The results of our first experiment clearly demonstrate that image blur negatively impacts observers’ ability to discriminate identity in ambient face images. This by itself is not surprising: Unfamiliar face recognition has been shown to be sensitive to image blur in many different settings, including studies that relied on images with relatively limited variability (Ramon & Van Belle, 2016) and studies that relied on more naturalistic images (Bruce et al., 1999; Sandford et al., 2018). However, our results demonstrate several important features of how blur quantitatively affects performance for ambient face recognition that go further than basic demonstrations that blurring makes recognition more difficult.
First, we note that the level of blur appears to matter very little in our task. Both relatively minor blurring and major blurring led to more or less the same level of performance decrement relative to unblurred images. Moreover, that performance decrement was quite large. Unlike previous studies using more standard laboratory images, we found that imposing image blur of any amount led to markedly lower sensitivity to extrapersonal variation. Indeed, the d′ values we observed in all blur conditions are not far from chance, suggesting that blurring face images substantially compromises recognition in this setting. This differs from previous studies in which increasing blur led to increasingly poor performance on face recognition tasks (Costen, Parker, & Craw, 1996) or only led to poor performance once a threshold of pixelization was passed (Bachmann, 1991). Our results therefore represent an important distinction between the processes that support identity discrimination in ambient images and face images with limited variability. What specific differences in processing might lead to the results observed here? We propose several different accounts of our data, each of which highlights a different potential distinction between how ambient face images are recognized and how faces with limited appearance variability may be recognized. First, it could be the case that ambient face recognition for unfamiliar faces is sufficiently difficult that any reduction in the information available to an observer severely impacts performance. Another way to think about this is to say that ambient face recognition is more fragile than face recognition from limited-variability images, possibly because the latter can be bolstered by generic pattern-recognition mechanisms that are not easily applicable to faces that vary substantially in appearance. 
If the intrinsic difficulty of the task is to blame for the nonlinear dependence on blur that we have observed, then it should be the case that even limited-variability faces that are sufficiently hard to distinguish from one another should be subject to a similarly nonlinear effect of blur on recognition performance. A second possibility that our data suggest is that ambient face recognition may rely more heavily on high spatial frequencies than face recognition with limited-variability images. For example, if surface pigmentation is used to provide some amount of invariance to changes in expression, pose, or lighting, even mild amounts of image blur could conceivably remove critical texture features that were being used to guide performance. While pigmentation is known to be important for face recognition (Russell, Sinha, Biederman, & Nederhouser, 2006), to our knowledge there has not been any direct investigation of the relationship between spatial frequency subbands and surface pigmentation features that are used for face recognition in unfamiliar faces. Therefore, this hypothesis is necessarily speculative but does offer an additional account of how ambient face recognition may differ from limited-variability face recognition based on our results. In each case, we assume that these features of our data are not likely to be specific to card-sorting tasks but should be measurable in other recognition tasks using ambient face images, though this remains an important point to consider when evaluating recent results using this paradigm.
Another interesting feature of our data is that, in contrast to other studies using the card-sorting paradigm employed here, we did not observe an impact of our stimulus manipulation (image blur) on the number of groups created by participants. This dependent variable is often used as a proxy for recognition performance, in particular the ability to tell faces together (Andrews et al., 2015; Balas & Saville, 2017; Laurence et al., 2015). The fragmentation of an identity across different groups has been interpreted as a specific measure of how well observers are able to generalize identity across highly variable face images. For example, when observers are asked to sort other-race faces, the number of groups made increases significantly (Laurence et al., 2015), which has been interpreted as evidence supporting an other-race effect for ambient face recognition. We make two observations about this inference: (a) We have reported elsewhere that in our signal detection approach to analyzing card-sorting data, the number of groups created by participants appears to more closely reflect some aspect of response criterion rather than the sensitivity for distinguishing intrapersonal variability from extrapersonal variability (Balas & Pearson, 2017). (b) The robustness of this variable (and response criterion) to blur while sensitivity suffers substantial decrements suggests that these descriptors index distinct aspects of face recognition. Based on these observations, we suggest that our data specifically indicate that blur has a substantial negative impact on sensitivity to face identity but does not appear to impact the response criterion observers apply when deciding how to arrange pairs of faces. The application of a signal detection framework here allows us to draw clearer conclusions about what processes are affected by image blur and consider both telling people apart and telling people together in a common analytical framework.
We continue in our second experiment by examining how face inversion, a manipulation that is well known to substantially disrupt face recognition, also affects performance in a card-sorting task with ambient face images.
Experiment 2
The goal of our second experiment was to determine how face orientation (upright vs. inverted appearance) affected identity sorting for ambient images of unfamiliar faces.
Methods
Participants
Our final sample comprised 35 participants (20 females). Approximately half of these participants (N = 18) carried out the sorting task with upright faces, while the remaining participants (N = 17) carried out the task with inverted faces. We recruited an additional eight individuals who were not included in the final sample because they were familiar with at least one of the individuals depicted in our stimulus set. As in Experiment 1, all participants were recruited from the NDSU Undergraduate Study Pool and received course credit for participating. All other recruitment procedures were identical to those described in Experiment 1.
Stimuli
We used a subset of the images described in Experiment 1 for this task. Specifically, we used a total of 48 images (16 per individual) depicting Doutzen Kroes, Lara Stone, and Sylvie Van Der Vaart. This smaller stimulus set was necessary due to the practical limitations imposed by the manner in which we implemented our manipulation of face orientation during the task: To control whether participants viewed each face in an upright or inverted orientation, we mounted each 2″ × 3″ image used in this experiment within a small acrylic frame. These frames had a small base and a slightly receding slope such that each picture was easy to see within the frame when seated at a desk. This allowed us to mount each image in either the upright or inverted orientation and limited participants’ ability to flip the image to make the task easier to accomplish in the inverted condition (participants were also given explicit instructions not to flip the images over during the task). These frames took up a good bit more desktop space than the original cards, however, which is why we limited ourselves to a smaller set of stimuli.
Procedure
All testing procedures were identical to those described in Experiment 1. Participants were instructed to make groups of cards on the desk rather than piles but were otherwise given all the same instructions as described earlier.
Results
As in Experiment 1, we analyzed the results from the upright and inverted card-sorting conditions using SDT descriptors and the number of groups made by participants in each condition. We report NHST outcomes for these dependent variables alongside Bayesian analyses (here Bayesian t tests) carried out using JASP.
Sensitivity (d′)
A two-tailed independent samples t test revealed a significant difference, t(33) = 2.91, p = .006, 95% CI of the difference between means = [0.18, 1.05], Cohen’s d = 0.98, in sensitivity to extrapersonal variation between the upright (M = 1.33, SD = 0.88) and the inverted condition (M = 0.71, SD = 0.20). A Bayesian independent samples t test computed using JASP yielded a BF10 value of 7.01, indicating that the data are seven times more likely under the alternative hypothesis than the null hypothesis. We therefore conclude that face inversion does lead to a strong effect on observers’ ability to detect extrapersonal variation.
Response Criterion (c)
A two-tailed independent samples t test revealed that response criterion values did not differ significantly, t(33) = –1.52, p = .14, Cohen’s d = –0.51, between the upright (M = –1.29, SD = 0.50) and the inverted condition (M = –1.06, SD = 0.39). A Bayesian independent samples t test computed using JASP (2018) yielded a BF10 value of 0.78, indicating that the data are about 1.3 times more likely under the null hypothesis than the alternative hypothesis. These values do not provide much evidence for either the null or the alternative hypothesis, but at best suggest that the effect of inversion on response criterion is likely to be small.
Number of Groups
Finally, a two-tailed independent samples t test revealed that the number of groups made in sorting solutions did not differ significantly, t(33) = 1.26, p = .22, Cohen’s d = 0.42, between the upright (M = 11.8, SD = 6.8) and the inverted condition (M = 9.3, SD = 4.8). A Bayesian independent samples t test computed using JASP (2018) yielded a BF10 value of 0.60, indicating that the data are about 1.7 times more likely under the null hypothesis than the alternative hypothesis. From this, we conclude that face orientation does not meaningfully impact the number of groups observers make using these unfamiliar face images.
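The verbal interpretations of the Bayes factors above rest on a simple reciprocal: BF10 expresses how much more likely the data are under the alternative than under the null, and BF01 = 1/BF10 expresses the reverse. The short sketch below recomputes BF01 from the three BF10 values reported in this section. The helper name `bf01` and the dictionary keys are illustrative labels, not part of JASP.

```python
def bf01(bf10):
    """Evidence for the null relative to the alternative, given BF10."""
    return 1.0 / bf10


# BF10 values as reported in the Results above.
reported_bf10 = {"d_prime": 7.01, "criterion": 0.78, "n_groups": 0.60}

for measure, bf10 in reported_bf10.items():
    print(f"{measure}: BF10 = {bf10:.2f}, BF01 = {bf01(bf10):.2f}")
# d_prime: BF10 = 7.01, BF01 = 0.14
# criterion: BF10 = 0.78, BF01 = 1.28
# n_groups: BF10 = 0.60, BF01 = 1.67
```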
Discussion
As in our first experiment, the results of our second experiment both align with previous results concerning the way that well-controlled face images are processed and add to what we know regarding how ambient face images are recognized. We found that inverting ambient face images led to substantial decreases in sensitivity to extrapersonal variation (lower d′ values), consistent with the well-known face inversion effect observed when highly controlled images are used, and also when familiar, but highly variable, images are used (Kramer, Jenkins, Young, & Burton, 2017). However, as in Experiment 1, we also note that the manner in which this effect is manifest in our dependent variables is important insofar as it helps us understand which aspects of card-sorting performance are most meaningful when characterizing recognition ability. Specifically, we observed a negative impact on d′ values, but no effect of face inversion on either the number of groups participants made during the task or the response criterion calculated via our SDT analysis of the data. Again, we suggest that this highlights the importance of examining the nature of this particular task closely so that observers’ abilities are clearly characterized in terms of easily understandable dependent variables. While the number of groups participants made does reveal some interesting aspects of how faces are processed as a function of category, image quality, and so on, this descriptor probably has more to do with response criterion than with sensitivity to identity information. To our knowledge, there are no extensive discussions of how sensitivity and response bias are each affected by either blur or face inversion, but in any event, our current results offer insights into how complex unfamiliar face images are processed by applying old techniques for manipulating face appearance.
In the case of inverted faces, our results essentially confirm that ambient images lead to much the same kind of behavior as more tightly controlled laboratory images. In the case of image blur, however, our results indicate a far greater sensitivity to image quality than previous studies would have led us to believe. The recent emphasis on using images like these, with largely uncontrolled and broad variation across exemplars of the same face, is important largely for reasons of ecological validity. How much of what we have learned about face recognition from using standardized face databases really applies to real-world settings? Our experiments demonstrate that there is reason for both caution and optimism: Blur and inversion do indeed impact recognition (which we already knew), but the nature and extent of that impact may differ in some meaningful ways for ambient images.
These experiments are a useful first step toward reexamining face recognition within a more naturalistic context. We suggest that investigating the impact of a wide variety of image manipulations on performance with ambient images may yield additional important insights into how real faces are processed when variability is large. Indeed, several other recent studies have already begun to make important contributions to this goal, demonstrating that results obtained from limited-variability faces do not always generalize to ambient face images. We have already mentioned Kramer et al.’s (2018) study describing the utility of internal versus external features for ambient face recognition, in which the authors report that external features do not appear to be particularly useful in isolation for ambient unfamiliar face recognition, while faces containing only the internal features were of comparable utility relative to faces containing both sets of features. This result requires us to reconsider the idea of a processing shift favoring internal features over external features as a face becomes familiar and instead focus on how the internal features may be used more effectively in ambient images as familiarity increases. Similarly, Redfern and Benton’s (2017) results concerning the independence of emotion and identity in ambient face recognition also require us to reconsider existing models of face recognition given new results obtained from ambient face images. In this case, the authors report that expressiveness variability interfered with the recognition and learning of new face identities, suggesting that emotional expressions may be more tightly integrated into representations of face identity for ambient faces than the data from limited-variability faces imply.
Ambient face recognition therefore already appears to differ from limited-variability face recognition in a number of interesting ways, and continued characterization of the information that contributes to recognition in both settings will help us understand the computational differences in how face images are recognized in different experimental settings. For example, while we have only examined spatial frequency content here in the context of blur (a low-pass filtering operation), it would be worthwhile to characterize ambient face recognition with images that have been subject to band-pass filtering as well. It is currently accepted that mid-range spatial frequencies (8–16 cycles per face, Gold, Bennett, & Sekuler, 1999; Nasanen, 1999; Ruiz-Soler & Beltran, 2006) are most useful for recognition, but to what extent is this spatial frequency range an artifact of the kinds of images we have tended to use to characterize face recognition ability? Similarly, horizontal orientation energy carries more information for face recognition than vertical orientation energy (Dakin & Watt, 2009; Goffaux & Dakin, 2010; Goffaux, van Zon, & Schlitz, 2011), but could this also be due to the highly constrained face images the field traditionally presents observers with? While these studies may seem like old wine in a new bottle, we argue that the difference between ambient images and the content of typical face databases is substantial in terms of visual ecology, making these questions quite important. The roles of color (Yip & Sinha, 2002), contrast polarity (Russell et al., 2006), and many other manipulations of face appearance are all worth understanding in the context of face images that more closely match the range of images we are presented with in everyday encounters.
Our current results also raise a number of useful questions both with regard to the card-sorting paradigm itself and with regard to the way ambient faces are learned. For example, in several studies, revealing the number of unique identities in a set of faces was sufficient to dramatically improve task performance (Jenkins et al., 2011). Is this still true when images are subject to blur or inversion? We would predict not, given that fixing the number of groups in our analysis amounts to fixing a response criterion, which we would not expect to change observers’ sensitivity. Perhaps a more interesting question concerns the role of familiarity in promoting robustness to image manipulations like the ones we have employed here. Familiarity with a face tends to make observers resilient to the effects of blur, even in images collected with low-fidelity videos (Burton, Wilson, Cowan, & Bruce, 1999). However, once again we note that this may play out differently in the context of unconstrained grouping of ambient images and so is worth examining closely. Finally, all of these questions are just as easily applied to face learning: What does one need to see in terms of image content and critical features to build robust representations of individual appearance so that recognition can be carried out efficiently with novel images? Considering how various combinations of features support robust recognition after either incidental learning (Andrews et al., 2015) or explicit training (Baker, Laurence, & Mondloch, 2017) would be a further means of understanding how complex, unconstrained faces are processed in natural environments.
In conclusion, we have demonstrated that two basic manipulations of facial appearance lead to measurable effects on the recognition of ambient face images. These effects resemble previous results in some ways but also differ enough to suggest that face recognition in the wild may be somewhat different from face recognition in the lab. We suggest that exploring these differences, even if they are subtle, is an important path for face recognition research to take.
Acknowledgements
The authors thank Amanda Auen and Jamie Schmidt for their help with participant recruitment and Ganesh Padmanabhan and Dan Gu for technical assistance.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by NSF grant BCS-1348627 awarded to B. B.
