Abstract
How does the visual system recognize a camouflaged object? Obviously, the brain cannot afford to learn all possible camouflaged scenes or target objects. However, it may learn the general statistical properties of backgrounds of interest, which would enable it to break camouflage by comparing the statistics of a background with a target versus the statistics of the same background without a target. To determine whether the brain uses this strategy, we digitally created novel camouflaged scenes that had only the general statistical properties of the background in common. When subjects learned to break camouflage, their ability to detect a camouflaged target improved significantly not only for previously unseen instances of a camouflaged scene, but also for scenes that contained novel targets. Moreover, performance improved even for scenes that did not contain an actual target but had the statistical properties of backgrounds with a target. These results reveal that learning backgrounds is a powerful, versatile strategy by which the brain can learn to break camouflage.
Camouflage represents an extreme case of figure-ground segregation whereby a target object is effectively disguised against its background, making it hard to distinguish the target even when it is in “plain view” (Cott, 1948; Cuthill et al., 2005; Stevens & Merilaita, 2009; Thayer, 1923). Thus, camouflage—and breaking camouflage—is useful for studying figure-ground segregation in natural visual scenes (Brady & Kersten, 2003; Hauffen, Bart, Brady, Kersten, & Hegdé, in press).
The ability to break camouflage is often a matter of life and death in the wild, as well as in warfare (Cott, 1948; Cuthill et al., 2005; Stevens & Merilaita, 2009; Thayer, 1923). Discriminating hidden targets in complex backgrounds is also a critically important ability in many other areas of human endeavor, such as hunting, reconnaissance, radiology, and machine vision (Barrett & Abbey, 1997; Stevens & Cuthill, 2006; Thayer, 1923; Troscianko, Benton, Lovell, Tolhurst, & Pizlo, 2009; Van Trees, 2001). With training, it is possible to learn to break camouflage (Brady & Kersten, 2003; Cott, 1948; Stevens & Merilaita, 2009; Thayer, 1923). However, the mechanisms by which the visual system learns to break camouflage, and optimal training strategies for improving camouflage breaking or, more broadly, figure-ground segregation in natural visual scenes, remain poorly understood.
Signal detection theory suggests a potentially attractive strategy for breaking camouflage, whereby the viewer learns the statistical properties of backgrounds with versus without a foreground object of interest (or target object) and infers the presence of the target object by comparing the statistics of the given input image with the two learned statistical distributions (Barrett & Abbey, 1997; Green & Swets, 1974; Helstrom, 1968; Tuzlukov, 2001; Van Trees, 2001; also see Golz & MacLeod, 2002; Schwartz, Sejnowski, & Dayan, 2006). There are many advantages to this approach. First, its demands on processing and memory resources are comparatively modest: The observer need not know what the target looks like; one can simply break the target’s camouflage using an “odd-man-out” strategy (Barrett & Abbey, 1997). Second, the observer need know only the subset of the statistical properties of the scene that is relevant to detecting camouflaged foreground objects (e.g., local feature distribution in the neighborhood of the target) and not necessarily a complete statistical description of the background or the target (see Geisler, 2008; Golz & MacLeod, 2002; Sigman, Cecchi, Gilbert, & Magnasco, 2001; Simoncelli & Olshausen, 2001). Finally, this approach seamlessly relates camouflage breaking with learning to break camouflage: The brain simply uses the relevant learned distributions to break camouflage.
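The comparison of an input image against two learned distributions can be illustrated with a minimal sketch. It assumes, purely for illustration, that each scene is reduced to a single scalar summary statistic and that this statistic is normally distributed under both the target-absent and target-present hypotheses. Neither assumption comes from the study, and the function names and sample values below are hypothetical.

```python
import math
import statistics

def fit_gaussian(samples):
    """Return (mean, std) of a list of scalar summary-statistic values."""
    return statistics.fmean(samples), statistics.stdev(samples)

def log_pdf(x, mean, std):
    """Log density of a normal distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def target_present(x, absent_model, present_model, criterion=0.0):
    """Report 'present' when the log-likelihood ratio exceeds the criterion."""
    llr = log_pdf(x, *present_model) - log_pdf(x, *absent_model)
    return llr > criterion

# Hypothetical statistic values observed during learning (not data from the study).
absent_samples = [0.9, 1.1, 1.0, 0.95, 1.05]
present_samples = [1.9, 2.1, 2.0, 1.95, 2.05]
absent_model = fit_gaussian(absent_samples)
present_model = fit_gaussian(present_samples)

print(target_present(2.0, absent_model, present_model))
print(target_present(1.0, absent_model, present_model))
```

Note that the decision rule never refers to the appearance of the target itself; it depends only on how the scene statistic is distributed with versus without a target.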
Although learning scene statistics is an attractive computational strategy for learning to break camouflage, it is unclear whether the brain actually uses this strategy or whether it can even learn the relevant scene statistics to begin with. In the four experiments reported here, we tested the hypothesis that the visual system can learn to break camouflage at least in part by learning the statistical properties of the background. We found that the visual system can and does learn the statistical properties of the scene.
Experiment 1: Learning-Dependent Improvements in Camouflage Breaking
Method
Participants
Seven adult volunteers (4 females, 3 males) with normal or corrected-to-normal vision participated in Experiment 1.
Stimuli
Stimuli consisted of a complex background image that in some cases had a camouflaged foreground object. To generate instances of a given type of background, we used arbitrarily chosen photographs of natural backgrounds (e.g., foliage; see Fig. 1a) as inputs to the texture synthesizer developed by Portilla and Simoncelli (1999, 2000). For each input background, we generated a large number of instances (each of which was 800 × 600 pixels and subtended 22° × 16°). We used uniform backgrounds rather than backgrounds with distractor objects (e.g., we used foliage as opposed to foliage with birds or flowers) to minimize confounds related to clutter and crowding (Levi, 2008; Pelli & Tillman, 2008; Whitney & Levi, 2011). Two background types were used for each subject.

Fig. 1. Process used to create camouflaged scenes with (a) face targets and (b) digital embryo targets. To create the stimuli, photographs of different types of natural background (e.g., a bed of mushrooms or foliage) were used as inputs to a texture synthesizer to create new instances of those background types. Each face target was textured using the same background type that it would be displayed against, but with a different instance of that type. Face targets were created with different rotations so they could be shown in any orientation from left profile through right profile. Face and background were then combined to create a camouflaged scene. In some cases, novel, naturalistic three-dimensional objects called “digital embryos” were used instead of face targets to create camouflaged scenes. These scenes were created in the same manner as the scenes for face targets. In (a), the face target indicated by the asterisk is in the upper right quadrant of the camouflaged scene. In (b), the digital embryo targets indicated by the left and right asterisks are in the upper right and upper left quadrants, respectively, of the corresponding scenes.
The texture synthesizer uses a principled, “synthesis-by-analysis” approach based on steerable pyramids to create new gray-scale texture instances from a given gray-scale input image (see Portilla & Simoncelli, 1999, 2000, for details). Note that our experiments did not require that the synthesized instances necessarily be the best possible, or even accurate, statistical representations of the original input background. Nonetheless, the present algorithm yielded instances that were highly accurate representations of the input background by many objective measures (Portilla & Simoncelli, 1999, 2000).
Because the algorithm extracted hundreds of parameters for each input texture and because the parameter values by themselves are not altogether illuminating (see Portilla & Simoncelli, 1999, 2000), we will not enumerate the specific statistical parameters that the algorithm extracted from each given input texture and used for synthesizing new instances of that texture. Moreover, our study was not meant to address which statistical parameters were learned per se, but instead focused on testing whether they were learned at all.
The foreground object, or target, in a given image was either a familiar object (i.e., a face; synthesized using FaceGen, Singular Inversions, Toronto, Canada) or a naturalistic three-dimensional object called a “digital embryo” (Brady & Kersten, 2003; Hauffen et al., in press; Hegdé, Bart, & Kersten, 2008; Fig. 1b).
We created static camouflaged scenes with or without a target using the 3ds Max graphics toolkit (Autodesk, Montreal, Canada). To create a camouflaged scene with a target, we digitally placed a randomly chosen instance of the target type in front of a randomly chosen instance of a background type (Fig. 1a). To avoid transparency effects (Tankus & Yeshurun, 2009), we made the target opaque and applied the same texture type as, but a different instance of, the background texture. It is important to emphasize that the target was in the frontal plane “in plain view,” was not occluded in any way, and did not cast any shadows.
To make the target less easily detectable across trials, we used four different approaches. First, we used 70 different three-dimensional face models to create a large repertoire of target instances. Second, we varied the location of the target across images so it could be in any random location of the scene except the central 50 × 50 pixel region in which the subject was required to fixate at the start of the trial. Third, we varied the size of the target at three different scales (1.8°, 1.5°, or 1.2°). Finally, we rotated the targets randomly about their y-axis from −90° to +90° so that faces could be oriented in any manner from left profile to right profile.
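The four sources of variation listed above can be sketched as a parameter sampler. This is an illustration only: it assumes that each parameter is drawn uniformly and independently, and that the fixation exclusion applies to the target's center. The text states neither point, and the function and field names are hypothetical.

```python
import random

SCENE_W, SCENE_H = 800, 600      # scene size in pixels (from the text)
FIXATION_HALF = 25               # half-width of the central 50 x 50 pixel region
SIZES_DEG = (1.8, 1.5, 1.2)      # target sizes in degrees of visual angle
N_FACE_MODELS = 70               # number of three-dimensional face models

def sample_target(rng):
    """Draw one hypothetical set of target parameters for a target-present scene."""
    cx, cy = SCENE_W // 2, SCENE_H // 2
    while True:
        x, y = rng.randrange(SCENE_W), rng.randrange(SCENE_H)
        # Reject target centers that fall inside the central fixation region.
        in_center = (cx - FIXATION_HALF <= x < cx + FIXATION_HALF
                     and cy - FIXATION_HALF <= y < cy + FIXATION_HALF)
        if not in_center:
            break
    return {
        "model": rng.randrange(N_FACE_MODELS),   # which face model to render
        "x": x,
        "y": y,
        "size_deg": rng.choice(SIZES_DEG),
        "yaw_deg": rng.uniform(-90.0, 90.0),     # rotation about the y-axis
    }

print(sample_target(random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the sampled stimulus set reproducible across runs.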
Training and testing procedures
The experiment consisted of a pretraining test phase, a training phase, and a posttraining test phase. During the training phase, subjects learned to detect face targets (and never digital embryos) camouflaged against a given type of background. The background type used during training was counterbalanced across subjects.
Each training trial started with a 1,000-ms period during which subjects fixated a central spot, following which the fixation spot disappeared and a query image (a camouflaged scene) was presented for 500 ms. The query image was followed by a 50-ms white-noise mask. A face target was present in the scene in a randomly interleaved half of the trials and absent in the remaining half. During the subsequent 2,000-ms response period, subjects had to report whether or not a face target was present in the query image by pressing an appropriate button. Visual feedback was provided for 1,000 ms using a 2.2° green or red square, which indicated whether the response was correct or incorrect, respectively.
Note that the sole explicit requirement of the task was to report the presence or absence of a face target; subjects were not explicitly required or even indirectly encouraged in any way to learn the background. It is also worth noting that our camouflage-breaking task differed in three main respects from a conventional visual search task (Boot, Neider, & Kramer, 2009; Wolfe & Horowitz, 2004; Wolfe, Oliva, Horowitz, Butcher, & Bompas, 2002). First, in our case, the target was not easily distinguishable from the background, that is, it was camouflaged against the background. Second, many features of the target varied randomly from one trial to the next (e.g., orientation), which made the target hard to identify. Third, the background in our experiments consisted solely of a uniform texture and did not contain distractors or clutter (Levi, 2008; Pelli & Tillman, 2008; Whitney & Levi, 2011).
Each training block consisted of 120 trials. During the training phase of the experiment, each subject was shown six synthesized instances of one background type. Training blocks were repeated over multiple days until the subject was able to correctly identify faces during a minimum of 70% of trials.
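The training-trial timeline and block structure described above can be summarized in a short sketch. The phase durations, block length, and 70% criterion are taken from the text. Treating the criterion as a per-block proportion correct, and treating the 2,000-ms response period as always running to completion, are assumptions of this sketch, as are all names in the code.

```python
import random

# Phase durations in milliseconds, in presentation order.
TRIAL_PHASES = (
    ("fixation", 1000),
    ("query_image", 500),
    ("noise_mask", 50),
    ("response_window", 2000),
    ("feedback", 1000),
)
TRIALS_PER_BLOCK = 120
CRITERION = 0.70  # minimum proportion of correct trials

def make_block(rng, n_trials=TRIALS_PER_BLOCK):
    """Return a shuffled list of booleans; True marks a target-present trial."""
    n_present = n_trials // 2
    trials = [True] * n_present + [False] * (n_trials - n_present)
    rng.shuffle(trials)
    return trials

def max_trial_duration_ms():
    """Longest possible duration of one training trial."""
    return sum(duration for _, duration in TRIAL_PHASES)

def reached_criterion(n_correct, n_trials=TRIALS_PER_BLOCK):
    """Check whether a block meets the training criterion (assumed per block)."""
    return n_correct / n_trials >= CRITERION

block = make_block(random.Random(1))
print(sum(block), max_trial_duration_ms(), reached_criterion(84))
```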
During both testing phases, subjects were shown stimuli in five randomly interleaved conditions: a face target in a learned instance of a learned background type, a face target in a novel instance of a learned background type, a face target in a novel instance of a novel background type, a digital embryo target in a learned instance of a learned background type, and a digital embryo target in a novel instance of a learned background type.
The testing paradigm was identical to the training paradigm except as follows. First, no feedback was provided during any trial. Second, the query image was presented for three different durations: 50 ms, 200 ms, or 500 ms, depending on the trial. Third, the target in the query image, when present, was either a face or a digital embryo, depending on the trial. Subjects were tested for four blocks each during the pretraining and posttraining test phases. The actual camouflaged scene instances used in either testing phase were never shown in the training phase, and vice versa. During the posttraining test phase, the query images were never the same as those used during the pretraining test phase: The size, location, and the orientation of the target, as well as the background instance, varied randomly from one image to the next.
Data were analyzed using programs custom written in R (R Development Core Team, 2008) and MATLAB (The Mathworks, Natick, MA). Camouflage-breaking performance was measured separately for each condition as the discriminability (d′) of the target during that condition (Green & Swets, 1974; Helstrom, 1968; Tuzlukov, 2001).
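For readers unfamiliar with the measure, d′ is conventionally computed as the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below is not the authors' analysis code (which was written in R and MATLAB). It also assumes a log-linear correction, adding 0.5 to each count, to keep the rates away from 0 and 1; the text does not say how extreme rates were handled.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate) for one condition."""
    # Log-linear correction (an assumption of this sketch) avoids infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for a 120-trial block with 60 target-present trials.
print(d_prime(hits=45, misses=15, false_alarms=12, correct_rejections=48))
```

A d′ near zero means that "present" responses were about equally likely with and without a target, and a negative d′ arises when the false-alarm rate exceeds the hit rate.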
Results
Figures 2a and 2b show the changes in d′ of two representative subjects before, during, and after camouflage training. Prior to training, the subjects performed relatively poorly for face targets (d′ = 0.42 and −0.27 for Subjects 1 and 2, respectively; ps > .05). This indicates that our scenes met the conventional functional definition of camouflage, namely that the target could not be reliably detected without training (Cott, 1948; Cuthill et al., 2005; Stevens & Merilaita, 2009; Thayer, 1923).

Fig. 2. Discriminability (d′) for stimuli shown for 500 ms in Experiment 1. Graphs (a) and (b) show the performance of Subjects 1 and 2, respectively, as a function of block and condition. The five conditions differed in terms of the target type, background type, and instance of the background used. The graph in (c) shows average discriminability across all subjects in Experiment 1 (N = 7) in the pre- and posttraining phases as a function of condition. Error bars indicate ±1 SEM. Averaged performance during the training phase is not shown in (c) because the length of training differed across subjects.
During training, performance increased significantly (Spearman trend test, p < .05). Following training, camouflage-breaking performance improved substantially for both subjects. Discriminability (d′) during the posttraining test phase differed significantly from zero, and the improvement in discriminability from before to after training was also significant for both subjects (randomization tests, p < .05). By contrast, the subjects’ performance with a randomly chosen, novel background type showed no significant improvement (randomization tests, p > .05). The lack of improvement with the novel background type was not attributable to one background type being inherently more difficult (e.g., one in which camouflage was especially hard to break), because the background types were counterbalanced between subjects such that the novel background for Subject 1 was the learned background for Subject 2, and vice versa. Altogether, these results indicate that the improvements in performance were specific to the background in which the subjects learned to break camouflage.
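The text does not specify how the randomization tests were constructed. As one plausible illustration, the sketch below runs an exact one-sided randomization test on the difference between mean posttraining and mean pretraining d′, assuming that per-block d′ values are the exchangeable units. The function name and the example values are hypothetical.

```python
from itertools import combinations
from statistics import fmean

def randomization_test(pre, post):
    """Exact one-sided p value for the hypothesis mean(post) > mean(pre)."""
    pooled = list(pre) + list(post)
    observed = fmean(post) - fmean(pre)
    count = total = 0
    # Enumerate every way of relabeling len(post) pooled values as 'post'.
    for idx in combinations(range(len(pooled)), len(post)):
        chosen = set(idx)
        group_post = [pooled[i] for i in chosen]
        group_pre = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # A small tolerance guards against floating-point ties.
        if fmean(group_post) - fmean(group_pre) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total

# Hypothetical per-block d' values (four blocks per test phase, as in the design).
pre_blocks = [0.1, 0.3, 0.2, 0.4]
post_blocks = [1.1, 1.3, 1.2, 1.4]
print(randomization_test(pre_blocks, post_blocks))
```

With four blocks per phase there are only 70 possible relabelings, so the smallest attainable one-sided p value under this scheme is 1/70.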
The improvement in performance for the learned background was not specific to particular instances of the background, as indicated by the fact that the performance was statistically indistinguishable when tested with instances that the subjects had never encountered before (randomization test, p > .05). This transfer of learning across novel instances of the same texture further suggests that the subject learned the general statistical properties of the given type of background rather than specific instances of the background. This is because, as noted earlier, only the general statistical properties of background were common to all instances of this background. Note also that subjects were not explicitly required to learn the properties of the scene.
Learning also transferred to novel targets—that is, learning to detect a given target in a given background improved the ability to detect previously unseen targets in that background, regardless of whether learned or novel instances of the background were used (randomization tests, ps < .05). Thus, learning to break the camouflage of one type of target in a given background also improved camouflage breaking with novel targets in that background (Boot et al., 2009). These results indicate that the improvement in performance did not depend on the type of target per se, but it did depend on the type of background.
Across all subjects, performance was on average 2.46-fold (±0.8 SEM) higher after training than before training (Fig. 2c). A two-way analysis of variance (ANOVA) with subjects and training phase as factors revealed a significant main effect of training phase (p < .05). The training-dependent increase was statistically significant for subjects collectively and for each subject individually (randomization tests by individual subject, ps < .05). Moreover, when the background instances used during the pretraining phase were different from the ones used during the posttraining phase, improvement in performance was just as large (randomization test, p = .44). A one-way ANOVA revealed no significant effect of background type on the improvement in performance (p > .05), which indicates that the improvement with training was not attributable to a fortuitous choice of background texture. The learning-dependent improvements in performance as a function of stimulus duration in this experiment are described in the Supplemental Material available online.
Experiment 2: Transfer of Learning to Previously Unseen Background Instances
One potential concern about the results of Experiment 1 is that it is possible, however unlikely, that subjects learned to break camouflage by learning the individual instances of the background (e.g., what a given picture of foliage looked like) rather than the general statistical properties of the background (e.g., what foliage looks like). This is conceivable because, although the number of different camouflaged scenes used during the training was rather large (about 7,200 scenes per background type, on average), these scenes were created using only six instances per background type. To rule out the possibility that subjects may have learned individual instances, we ensured that the subject never saw the same image twice in Experiment 2. Thus, even if the subject learned every single image viewed, it would not help him or her to break the camouflage in the next scene, unless the subject learned something about what was common among the scenes, namely, the statistical properties of the background.
Method
Four subjects (2 men, 2 women) participated in Experiment 2. The methodology of Experiment 2 was identical to that in Experiment 1 in all but four respects. First, every trial throughout Experiment 2 featured a new instance of each background, so subjects never saw the same image or background instance twice. Second, a previously unseen background type (e.g., a heap of fruit) was used. Third, only two of the five conditions were used: face targets in novel instances of learned background types and digital embryos in novel instances of learned background types. Finally, only one of the three stimulus durations (500 ms) was used in the present experiment.
Results
Subjects showed significant improvement in performance following training (randomization tests, p < .05 for each subject; see Fig. 3). Furthermore, for both faces and digital embryos, the training-dependent increase in d′ in Experiment 2 was indistinguishable from the corresponding increase in Experiment 1 (randomization tests, p > .05). These results indicate that the increase in Experiment 1 was not attributable to subjects remembering individual backgrounds. This is particularly noteworthy because it implies that in Experiment 1, subjects were able to learn the background using a relatively small training set (i.e., six background instances).

Fig. 3. Discriminability (d′) in Experiment 2 as a function of block and condition (a) for an individual subject and (b) across all subjects (N = 4). Error bars indicate ±1 SEM. Averaged performance during the training phase is not shown in (b) because the length of training differed across subjects.
Experiments 3a and 3b: Background Learning by Itself Can Improve Camouflage Breaking
Together, the results of Experiments 1 and 2 indicate that learning to break camouflage results in the learning of the statistical properties of the background. However, they do not by themselves prove that camouflage breaking can be mediated by background learning alone. This is because the camouflaged scenes in these experiments contained an actual target, and it is conceivable that subjects learned features of the actual targets and somehow were able to use this information to break camouflage, with or without the statistical information about the background. Given the results of Experiments 1 and 2, it is all but impossible to envision a relevant scenario that does not require learning thousands of individual target objects or learning that is necessarily statistical. Nonetheless, it is desirable to establish that subjects could learn to break camouflage solely by learning statistical properties of the background.
In Experiments 3a and 3b, we tested the hypothesis that if statistical learning of backgrounds can by itself lead to improved camouflage breaking, then scenes that lack an actual target but nonetheless have the statistical properties of the background with a target should also be able to elicit camouflage learning and camouflage breaking.
Method
This experiment was identical to Experiment 1 in all but four respects. First, only one of the three stimulus durations (500 ms) was used. Second, only faces were used as targets. Third, only two of the five conditions were used: face targets with a novel background and face targets with a learned background. Fourth, during either the testing phases (Experiment 3a) or the training phase (Experiment 3b), the stimuli that nominally contained targets did not contain actual face targets but implicit face targets (Fig. 4a). To synthesize these stimuli, we composited raw images of a background type not used in the previous experiments. These background types included face targets textured with a different texture than used for the background. The composite images were then used as inputs to the texture synthesizer, so that the resulting stimuli reflected the statistical properties of both the background and the target but contained no actual target. Corresponding stimuli without targets were synthesized using the same input image.

Fig. 4. Example of the stimulus-creation process and results in Experiments 3a and 3b. In the example shown in (a), a raw background image of a pile of nuts and a face target were used as input to the texture synthesizer. Instances of the resulting output image (two with the target present and two with the target absent) are shown. The graphs show discriminability (d′) for stimuli with learned and novel backgrounds averaged across subjects in the pre- and posttraining blocks. Results are shown separately for (b) Experiment 3a and (c) Experiment 3b. Error bars indicate ±1 SEM.
The same 4 subjects (2 men, 2 women) participated in Experiments 3a and 3b. In the training phase of Experiment 3a, subjects were shown scenes with and without targets. Scenes with targets used stimuli that consisted of an actual target face camouflaged against a previously unseen background. But during the testing phase, the stimuli that nominally contained a target (50% of the stimuli) had an implicit rather than an actual target. Experiment 3b had a complementary design, in which half of the stimuli used in training had implicit targets but stimuli in the testing phase had actual targets.
Results
In Experiment 3a, subjects were unable to reliably distinguish between scenes with an implicit target and scenes without a target before training (Fig. 4b; randomization test, p > .05); this finding indicates that the two sets of images were visually similar without training. After training with the standard stimuli (i.e., stimuli with an actual face target or stimuli without a target), subjects’ performance for synthesized composites increased significantly for learned backgrounds, even though the subjects were not trained in, and had not even seen, the individual images used during the posttraining test phase (randomization test, p < .05). However, the training did not improve performance for the same implicit target with novel background types (p > .05). These results confirm the hypothesis that learning to break camouflage using images with an actual target leads to improved camouflage breaking when subjects are tested with images with an implicit target.
The results of Experiment 3b were similar, in that subjects showed a training-dependent improvement for the learned background (Fig. 4c; randomization test, p < .05), but not for the novel background (p > .05). These results confirm our hypothesis that learning to break camouflage using stimuli with an implicit target leads to improved camouflage breaking when subjects are tested with stimuli with an actual target.
Together, the results from Experiments 3a and 3b indicate that subjects were able to learn the statistics of the scene with an actual or an implicit target, and they were able to use this information in camouflage breaking. Specifically, subjects did not rely on detecting an actual target object in the scene. These results lend further support to the aforementioned notion that subjects used an odd-man-out strategy by comparing the statistics of the background with a target versus the statistics of the same background without a target.
General Discussion
The results of the four experiments reported here reveal two novel principles of learning to break camouflage. First, learning to break camouflage results in learning the statistical properties of the background. Second, the statistical learning of the background can by itself lead to improved camouflage breaking.
It is important to note that the learning was not specific to a given target but transferred to previously unseen targets in the same background, thus sparing the visual system the burden of having to learn to break the camouflage of each new target anew in a given background. Hence, our results establish, for the first time, that learning scene statistics is a strategy for learning to break camouflage.
From the computational viewpoint, camouflage breaking is simply a case of figure-ground segregation at its most effective (Brady & Kersten, 2003; Cott, 1948; Cuthill et al., 2005; Stevens & Merilaita, 2009; Thayer, 1923). In this sense, learning scene statistics is likely to play an important role in figure-ground segregation in natural visual scenes in general (see Fiser & Aslin, 2002; Schwarzkopf, Zhang, & Kourtzi, 2009; Yi, Olson, & Chun, 2006; Zhang & Kourtzi, 2010; also see Golz & MacLeod, 2002; Schwartz et al., 2006).
This is not necessarily to say, however, that background learning in and of itself accounts for all camouflage breaking, or more generally figure-ground segregation, in all natural visual scenes. For instance, natural scenes often contain other, behaviorally irrelevant distractor objects. From the computational viewpoint, it is eminently feasible to also use statistical learning to detect the target in a cluttered background (Barrett & Abbey, 1997; Hegdé, Thompson, Brady, & Kersten, 2012; Van Trees, 2001). However, because learning scene statistics generally becomes more computationally burdensome as the complexity of the scenes increases, it is possible that the visual system uses additional (or perhaps alternative) strategies to cope with such scenarios. Further studies are needed to clarify this issue.
It is important to emphasize that the visual system may not learn the full complement of the statistical properties used by the algorithm to synthesize the images. Instead, it may learn only the subset of the properties that are relevant to breaking camouflage (Kovacs & Julesz, 1994; Troscianko et al., 2009). For instance, the visual system may learn only the local changes in the visual features when a target is present versus when the target is absent. In this sense, even when subjects are nominally learning to perform a target-detection task, they may be actually learning to perform a background-discrimination task, in which the precise nature of the target is largely irrelevant. This may explain, at least in part, why the learning readily transfers to novel targets.
These caveats notwithstanding, the fact that the visual system can and does learn the statistics of the background, together with the fact that this learning readily transfers to previously unseen target objects, is a novel finding that reveals that background learning is a mechanism by which the visual system learns to break camouflage. Our results also suggest a new, powerful method for human camouflage training, in which subjects learn to break camouflage by learning the backgrounds of interest.
Footnotes
Acknowledgements
We thank Eugene Bart, Daniel Kersten, Mark Neider, John Tsotsos, Zhiyong Yang, and Greg Zelinsky for helpful discussions and suggestions. We thank Nicole Streeb for excellent technical assistance.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This work was supported by the U.S. Army Research Laboratory and the U.S. Army Research Office Grant W911NF-11-1-0105 to J. H.
