Abstract
It has been suggested that the temporary multidimensional representations of objects might be instantiated neurally as synchrony of firing between the neurons representing the features that co-occur in a given location. In this article, we direct attention to a logical problem that arises when certain synchrony assumptions are applied to real situations in which multiple multidimensional objects are presented. We demonstrate a new behavioral effect showing that this logical problem coincides with a genuine behavioral problem. Even when a display contains only a few objects characterized by features on two dimensions, representing the display becomes difficult when, according to these assumptions, the object representations cannot be synchronized on both features simultaneously. This article outlines a new principle that governs object representation, and the experimental results might provide unique behavioral evidence for a neural-based theory of feature binding.
Representing multifeatured objects is a fundamental task for the perceptual system. The initial representation of such an object seems to take the form of specialized feature “maps,” one for each feature dimension (e.g., color, shape, and orientation). The information in each map is at least partly independent of the information in other maps (e.g., Treisman, 2006; Treisman & Gelade, 1980; Treisman & Schmidt, 1982). For example, when a scene contains a blue car next to a red bicycle, neurons in the color map register the presence of red and blue, and neurons in the shape map register the shapes of a car and a bicycle. This kind of representation by itself does not tell the observer which color belongs to which object. In addition to activating the separate feature maps, the system must bind the features into multifeatured object representations. When this binding fails, observers experience binding errors, or illusory conjunctions (e.g., Treisman & Schmidt, 1982).
Theoretically, binding could be represented in two ways. First, there could be a preexisting conjunction detector for each combination. Because the possible combinations of features are practically unlimited, however, this solution is likely to work only for some familiar combinations, such as the color red and a strawberry shape or the color yellow and a banana shape. Second, for less frequent and less consistent feature combinations, a post hoc temporary representation of the conjunction must be formed (see Hommel & Colzato, 2009, for a discussion of this issue).
Object-file theory offers a framework for this kind of post hoc binding. The theory (Kahneman, Treisman, & Gibbs, 1992) proposes that the different features of the same object are temporarily attached to one another via their shared location, which also acts as the retrieval cue for accessing the object. The theory distinguishes between long-term stored knowledge of object categories (or types), represented as nodes in a recognition network with just one representation for each category, and the temporary episodic representations (object files) that mediate “seeing” the specific objects currently present in their particular instantiation. The spatial location of each object file is its unique retrieval cue, or tag, and objects in different locations receive distinct tags. For example, three separate objects receive three distinct location tags. A temporary object file created for each filled location specifies the features currently present in that location. These files can be matched to the types in the recognition network in order to identify the objects that are present. The theory also allows multiple identical objects to be represented as separate object files with identical contents but different locations, even though there is only one type node for any given stored category.
Recently, we (Goldfarb & Treisman, 2012) extended this theory and suggested that temporary object files act as the platform from which the arithmetic system counts particular subsets of the items present. The perception of number is a basic ability found in human adults, human infants, and even some animal species (e.g., Dehaene, Piazza, Pinel, & Cohen, 2003; McComb, Packer, & Pusey, 1994). People can generate a number label for the overall number of things in a scene (e.g., Piazza, Izard, Pinel, Le Bihan, & Dehaene, 2004; Piazza, Pinel, Le Bihan, & Dehaene, 2007), but they can also count the objects in specific subsets of those things. For example, an observer can notice that there are four objects in a scene and that two of them are cars and two are bicycles. The numbers in subsets are important not only for representing one’s surroundings, but also for performing basic mathematical calculations (e.g., Halberda, Mazzocco, & Feigenson, 2008). The overall number of items in a scene can be identified simply by counting the filled locations in a location map, without binding the features they contain. In contrast, when an observer needs to count a specified subset of similar or identical objects, the defining feature (or features) of that subset must be specified. Thus, within our framework, the countable units must be object files of bound features.
What might be the neural instantiation of these temporary multidimensional representations, or object files? One suggestion is that they might be encoded as synchrony of firing between the neurons representing the features that belong to the same object (e.g., Engel & Singer, 2001; Hommel & Colzato, 2009; Vogel, Woodman, & Luck, 2001; von der Malsburg, 1999). Support for this idea has been documented in several experiments with both human and animal participants (for a review of the evidence, see Engel & Singer, 2001, and Raffone & Wolters, 2001; but see Lamme & Spekreijse, 1998, and Shadlen & Movshon, 1999, for discussion of some controversial assumptions of this theory). In this account, an object file is represented by the synchronized firing of the neurons that represent the features, such as color, size, and shape, that occupy the same spatial location. Each object file is individuated by its location tag, but its features are integrated by sharing the same neural synchrony within that location. Importantly, binding occurs only within locations: Synchrony across different locations does not bind the features of different objects together.
We now provide an example of how the synchrony assumption might fit with the object-file theory and then derive a new prediction about counting items in subsets. We then report two experiments in which we tested this prediction.
In the example display shown in Figure 1a, the system needs to represent four letters that are in four different locations and printed in four different colors. The hypothesis is that neurons representing the color red and the X shape in Location 1 share a common synchrony, which we can label Synchrony Correlation a, or SCa. Similarly, the color green and the O shape in Location 2 share a common synchrony, SCb; the color blue and the T shape in Location 3 share a common synchrony, SCc; and the color brown and the S shape in Location 4 share a common synchrony, SCd. From this representation, the arithmetic system can abstract, for example, that there is one X or one green item. Now consider a second case, shown in Figure 1b. In this example, features are repeated. The identity of a feature can be represented only once, and each distinct object file that contains that feature is bound to this identity. This can be done by synchronizing the firing in each location with the identity of the feature the location contains. In the example in Figure 1b, the object files in Locations 1 and 4 can be synchronized with the X shape, and the object files in Locations 2 and 3 can be synchronized with the O shape. From this platform, the arithmetic system can conclude, for example, that there are two Xs or two Os.

Fig. 1. Applying the neural-synchrony explanation to object files. In each example stimulus, four letters appear in four locations. If each location has a unique shape and color (a), a specific shape, a specific color, and a specific location can share a common synchrony. For example, neurons representing the X shape and the color red in Location 1 can share a common synchrony, which can be labeled Synchrony Correlation a (SCa). If only letter shapes have to be represented and the letters are repeated (b), the letter shapes can be represented by synchrony in the firing of neurons representing shape and location; locations that contain the same shape have the same SC. However, if the same letter shape appears in different colors in different locations and both colors and shapes need to be represented (c; a mixed shape-color association), the stimulus is logically impossible to represent under these assumptions. Neural synchrony can represent which shape is in each location, and it can also represent which color is in each location; however, it cannot simultaneously synchronize both the colors and the shapes in all their locations.
According to this account, some situations are logically impossible to represent. In the case shown in Figure 1c, features are repeated, and in addition, both colors and shapes must be specified. As in Figure 1a, shape and color must be synchronized in each location; however, as in Figure 1b, features are repeated across locations. The synchrony associated with the X shape can be shared by Locations 1 and 4, and the synchrony associated with the O shape can be shared by Locations 2 and 3. Alternatively, the synchrony associated with the color red can be shared by Locations 1 and 3, and the synchrony associated with the color green can be shared by Locations 2 and 4. In this example, however, it is impossible to synchronize both the colors and the shapes in all their locations at the same time.
Notice that in Figure 1c, if the value SCa is associated with the X shape and Locations 1 and 4, and the value SCb is associated with the O shape and Locations 2 and 3, then the observer can see the string “XOOX.” However, the correct colors of this string cannot be synchronized with their locations at the same time. The first X in the string is red, so the X shape needs to be synchronized with the color red. However, if the system tries to synchronize red with the X by giving the color red the value SCa, then red will also be “seen” in Location 4, even though the object in that location is green. Hence, in this example, the specific color, shape, and location of each object cannot be synchronized simultaneously. It follows that the system can know either how many Xs there are or how many reds there are, but not both at the same time. Does this theoretical difficulty create a real behavioral problem? In the following experiments, we examined this question by comparing the perception of mixed color-shape associations with that of unique color-shape associations.
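The logic of Figure 1 can be made concrete with a small Python sketch. It encodes our reading of the assumptions above: each represented feature value (e.g., X or red) has a single identity and therefore a single synchrony label; all represented features at a location must share that location's label; and a feature is “seen” at every location that carries its label. A display counts as simultaneously synchronizable if no feature is seen where it is absent. The function name can_synchronize and the union-find formulation are illustrative choices, not part of the original account.

```python
def find(parent, x):
    """Return the root label of x, halving the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def can_synchronize(display, relevant):
    """Check whether a display can be synchronized under our reading of the
    synchrony assumptions (an illustrative model, not the authors' code).

    display  -- list of (shape, color) tuples, one per location
    relevant -- set of feature values that must be represented
    """
    # Represented features at each location.
    present = [set(obj) & relevant for obj in display]
    # One node (one potential synchrony label) per feature identity.
    parent = {f: f for feats in present for f in feats}
    # Features that share a location must share that location's label.
    for feats in present:
        members = list(feats)
        for f in members[1:]:
            parent[find(parent, f)] = find(parent, members[0])
    # Collect the features that ended up with each label.
    groups = {}
    for f in list(parent):
        groups.setdefault(find(parent, f), set()).add(f)
    # A location carrying a label "shows" every feature with that label,
    # so each such feature must actually be present there.
    for feats in present:
        if feats and not groups[find(parent, next(iter(feats)))] <= feats:
            return False
    return True


fig_1c = [("X", "red"), ("O", "green"), ("O", "red"), ("X", "green")]
print(can_synchronize(fig_1c, {"X", "O"}))                  # True
print(can_synchronize(fig_1c, {"red", "green"}))            # True
print(can_synchronize(fig_1c, {"X", "O", "red", "green"}))  # False
```

Applied to the Figure 1c display (red X, green O, red O, green X), the sketch accepts shapes alone or colors alone but rejects shapes and colors together. When only the two targets, X and red, must be represented, it accepts displays in which X and red never share a location or always share it, and rejects displays in which they share only some locations.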
Experiment 1
In Experiment 1, we asked participants to compare the numbers of instances in two subsets, each defined by a value on one of two different dimensions: a color or a shape. The participants’ task was to decide if there were more Xs than red objects, if there were more red objects than Xs, or if the number of Xs was equal to the number of red objects. The experiment had two conditions: In one, it was possible to synchronize the relevant features on the two dimensions simultaneously, according to the assumptions we have just described (i.e., there was a unique color-shape association for the relevant features, X and red; see Fig. 2a). Thus, both the Xs and the reds could be synchronized with their locations. In the other condition, such synchronization was impossible (i.e., there were mixed color-shape associations for X and red; see Fig. 2b). We compared both response times (RTs) and error rates in the two conditions.

Fig. 2. Illustration of the two conditions in Experiment 1. In one condition (a), the target feature X was uniquely associated with the color green, and the target feature red was uniquely associated with the O shape. Therefore, when both X and red needed to be represented, two different synchrony correlations (SCa and SCb) could simultaneously indicate the locations in which each target feature appeared. In the other condition (b), the target feature X was associated with both red and green, and the target feature red was associated with both X and O shapes. Therefore, it was not possible to simultaneously synchronize both the color red and the shape X in all their locations.
Method
Participants
Eleven undergraduate students from Princeton University participated in the experiment in partial fulfillment of course requirements. All had normal or corrected-to-normal vision.
Stimuli
The stimulus displays were strings of the letters X and O, colored red or green. Each letter subtended approximately 1.2° of visual angle. Each string of five letters appeared in the center of a white screen. Each feature (X, O, red, or green) appeared either two or three times. On some trials, there were more Xs than reds; on others, there were more reds than Xs; and on others, there were equal numbers of Xs and reds. In the unique-association condition, the Xs and the reds never shared a location (i.e., all Xs were green). In the mixed-associations condition, X and red co-occurred in at least one location: When there were more Xs than reds, either one or two of the three Xs were green; when there were more reds than Xs, either one or two of the three Os were red; and when there were equal numbers of Xs and reds, one X was green. Figure 3 shows the complete set of stimuli. The order of the letters was randomized in each display.
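The constraints in the preceding paragraph can be checked by enumerating every combination of red Xs, green Xs, red Os, and green Os in a five-letter string. The Python sketch below does so. The function names are ours, and the filter for equal-number mixed trials encodes our reading of “one X was green”; the exact sets shown in Figure 3 are not reproduced here.

```python
from itertools import product

N_LETTERS = 5
ALLOWED = {2, 3}  # each feature (X, O, red, green) appears two or three times


def comparison(n_x, n_red):
    """Correct response for a display with n_x Xs and n_red red letters."""
    if n_x > n_red:
        return "more Xs"
    if n_red > n_x:
        return "more reds"
    return "same"


def association(red_x, n_x, n_red):
    """Classify how the target features X and red relate across locations."""
    if red_x == 0:
        return "unique, different locations"  # no X is red
    if red_x == n_x == n_red:
        return "unique, same locations"       # every X is red, every red is an X
    return "mixed"                            # X and red share only some locations


def enumerate_sets():
    """All (red X, green X, red O, green O) counts allowed by the constraints."""
    sets = []
    for red_x, green_x, red_o in product(range(N_LETTERS + 1), repeat=3):
        green_o = N_LETTERS - red_x - green_x - red_o
        if green_o < 0:
            continue
        n_x, n_red = red_x + green_x, red_x + red_o
        # O and green counts are 5 - n_x and 5 - n_red, so they are in ALLOWED too.
        if n_x in ALLOWED and n_red in ALLOWED:
            sets.append({
                "counts": (red_x, green_x, red_o, green_o),
                "comparison": comparison(n_x, n_red),
                "association": association(red_x, n_x, n_red),
            })
    return sets


def experiment1_sets(sets):
    """Keep the two Experiment 1 conditions; equal mixed trials have one green X."""
    kept = []
    for s in sets:
        green_x = s["counts"][1]
        if s["association"] == "unique, different locations":
            kept.append(s)
        elif s["association"] == "mixed" and (s["comparison"] != "same" or green_x == 1):
            kept.append(s)
    return kept


all_sets = enumerate_sets()
print(len(all_sets), len(experiment1_sets(all_sets)))  # 12 9
```

Under these constraints, the enumeration yields 12 combinations. In 3 of them, X and red never share a location. In 2, every X is red and every red letter is an X, which is the kind of display added in Experiment 2. The remaining 7 are mixed. The Experiment 1 filter drops the two same-location combinations and the one equal-number mixed combination that has two green Xs. That leaves 9 combinations, which matches the number of stimulus sets given for Figure 3.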

Fig. 3. The nine sets of stimuli used in Experiment 1. In each of the two conditions (unique association, mixed associations), there were more Xs than reds on some trials, more reds than Xs on other trials, and equal numbers of Xs and reds on still other trials. The order of the objects in the strings (each shape and color combination) was randomized in the actual displays.
Procedure
Stimulus presentation and data collection were controlled by a Dell computer with an Intel Xeon central processor. Stimuli were presented on a Dell 19-in. monitor. A keyboard was placed on a table between the participants and the monitor, which was approximately 65 cm from where they were seated. Stickers with the labels “same,” “more Xs,” and “more reds” were pasted on the keys “g,” “h,” and “j,” respectively. Participants completed the experiment individually. They were instructed to compare the number of Xs with the number of red letters, and they were told that the possible responses were “more Xs,” “more reds,” or “the same.” Participants were asked to respond as fast as possible but to avoid mistakes. Each trial started with a 1,000-ms white display that was followed by a letter string. The letter strings were chosen randomly for each participant from a list in which 50% of the trials had equal numbers of Xs and reds, 25% had more reds than Xs, and 25% had more Xs than reds; half of the trials chosen belonged to the unique-association condition, and half belonged to the mixed-associations condition. The string on each trial disappeared when the participant responded, and then the next trial began. The computer registered the participant’s responses, as well as the RT (in milliseconds) from the onset of each string to the participant’s response.
Before the beginning of the experimental block, participants were given 4 practice trials, regardless of how many errors they made. They then performed a block of 80 experimental trials.
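The trial composition described above can be sketched as follows. This is a hypothetical reconstruction, not the original experiment script. It assumes that the 50/25/25 split held within each association condition (40 trials each). The count tuples (red Xs, green Xs, red Os, green Os) are inferred from the Stimuli description rather than copied from Figure 3. Each trial picks one tuple for its cell at random and shuffles the letter order.

```python
import random

# Hypothetical count tuples (red Xs, green Xs, red Os, green Os) per cell,
# inferred from the Stimuli description, not copied from Figure 3.
SETS = {
    ("unique", "same"): [(0, 2, 2, 1)],
    ("unique", "more reds"): [(0, 2, 3, 0)],
    ("unique", "more Xs"): [(0, 3, 2, 0)],
    ("mixed", "same"): [(1, 1, 1, 2), (2, 1, 1, 1)],
    ("mixed", "more reds"): [(1, 1, 2, 1), (2, 0, 1, 2)],
    ("mixed", "more Xs"): [(1, 2, 1, 1), (2, 1, 0, 2)],
}
# Assumption: the 50/25/25 split holds within each association condition.
PROPORTIONS = {"same": 0.50, "more reds": 0.25, "more Xs": 0.25}


def make_string(counts, rng):
    """Build one five-letter display as (letter, color) pairs in random order."""
    red_x, green_x, red_o, green_o = counts
    letters = ([("X", "red")] * red_x + [("X", "green")] * green_x
               + [("O", "red")] * red_o + [("O", "green")] * green_o)
    rng.shuffle(letters)  # letter order randomized on every display
    return letters


def make_trials(n_trials=80, seed=None):
    """Return a shuffled list of (condition, correct response, display) trials."""
    rng = random.Random(seed)
    per_condition = n_trials // 2  # half unique, half mixed
    trials = []
    for condition in ("unique", "mixed"):
        for answer, share in PROPORTIONS.items():
            for _ in range(round(per_condition * share)):
                counts = rng.choice(SETS[(condition, answer)])
                trials.append((condition, answer, make_string(counts, rng)))
    rng.shuffle(trials)  # intermix conditions and responses
    return trials


trials = make_trials(seed=1)
print(len(trials))  # 80
```

Each returned trial holds the association condition, the correct response, and the display as a list of (letter, color) pairs in presentation order.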
Results and discussion
The overall mean error rate was 0.5% for the unique-association condition and 6.7% for the mixed-associations condition. The difference between conditions was significant, t(10) = 3.34, p < .01. For the trials responded to correctly, the mean RT was calculated for each participant in each condition. A paired-samples t test was applied to these data. Responses to letter strings with unique color-shape associations were significantly faster (2,464 ms) than responses to letter strings with mixed color-shape associations (3,232 ms). This 768-ms effect was significant, t(10) = 4.77, p < .001.
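For reference, the paired-samples t statistic and its degrees of freedom are computed as follows. Here d_i is participant i's mean RT in the mixed-associations condition minus that participant's mean RT in the unique-association condition. Assuming the reported condition means are averages of the per-participant means, the mean of the differences equals the difference between those condition means. The standard deviation of the differences, s_d, is not reported.

```latex
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = 3232 - 2464 = 768~\text{ms}, \qquad
df = n - 1 = 11 - 1 = 10
```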
In sum, a large effect was observed in both the RT and the error data. Strings with unique color-shape associations, which according to our assumptions could be synchronized neurally, were counted faster and with fewer errors than strings with mixed color-shape associations, which according to our assumptions could not be synchronized simultaneously.
Experiment 2
In Experiment 1, the strings with mixed color-shape associations (the ones we assumed are impossible to synchronize) were composed of Xs (the target shape feature) that were printed both in red (the target color feature) and in green. The strings with unique color-shape associations (the ones we assumed are possible to synchronize) were composed of Xs that were never printed in the target color feature (red). This observation suggests a possible alternative explanation for the relative difficulty participants experienced in responding to the strings with mixed color-shape associations: If the perceptual system has difficulty in identifying or counting two target features when they share the same location, then the same result would be observed regardless of whether the strings theoretically can or cannot be synchronized. Experiment 2 was designed to rule out this shared-location explanation. In this experiment, we added another condition in which all Xs were printed in red. That is, the two target features always shared the same location. If the shared-location explanation is valid, then participants should have difficulty responding in this condition. However, if our synchrony assumptions are correct, and consistent color-shape associations are therefore represented more easily than mixed color-shape associations, then it should be easy for participants to count features in this condition (because both red and X can have the same SC value).
Method
Stimulus presentation and data collection were controlled by a Dell laptop computer (Latitude D830) with an Intel central processor and a 15-in. monitor. The method of Experiment 2 was the same as that of Experiment 1, except for the following changes. Three experimental conditions were included. In one condition, the letter strings had mixed color-shape associations (and were impossible to synchronize, according to our assumptions). In the other two conditions, the letter strings had unique color-shape associations (and were possible to synchronize, according to our assumptions). In one of these two conditions, the two target features always appeared in different locations, and in the other, the two target features always shared the same locations. The mixed-associations condition was identical to the mixed-associations condition in Experiment 1. The unique-association condition with target features in different locations was identical to the unique-association condition in Experiment 1. The unique-association condition with target features that shared the same locations was new. In this condition, the Xs were always red (e.g., the stimulus “XOXOO,” with the letters printed in red, green, red, green, and green, respectively). Note that in this kind of display, when the number of Xs is compared with the number of red objects, the correct response can only be “the same.” Hence, to compare the three experimental conditions on an equal footing, we analyzed only trials whose correct response was “the same” in the other two conditions as well.
Six volunteers with normal or corrected-to-normal vision participated in this experiment. The three conditions were randomly intermixed, with 20 trials in each condition. In addition, 20 filler trials with unequal numbers of Xs and reds were randomly mixed into the trial sequence.
Results and discussion
For trials on which the correct response was “the same,” the overall mean error rate was 8.4% for the mixed-associations condition, 1.8% for the unique-association condition with target features in different locations, and 0% for the unique-association condition with target features that shared the same locations. For trials responded to correctly, the mean RT was calculated for each participant in each condition. The mean RT across participants was 3,690 ms for the mixed-associations condition, 2,963 ms for the unique-association condition with target features in different locations, and 2,399 ms for the unique-association condition with target features that shared the same locations. A one-way analysis of variance, with condition as a within-participants factor, revealed a significant effect of condition on RT, F(2, 10) = 37.95, MSE = 66,253, p < .001. Replicating Experiment 1, RT was slower in the mixed-associations condition than in the unique-association condition with target features in different locations, F(1, 5) = 19.24, MSE = 82,342, p < .01. Moreover, RT in the unique-association condition with target features that shared the same locations was not slower than RT in the unique-association condition with target features in different locations, but rather faster, F(1, 5) = 50.46, MSE = 18,946, p < .001. This last result suggests that the findings of Experiment 1 cannot be explained by a difficulty in counting target features that share the same location. Thus, our results were again consistent with our prediction that features would be counted faster when the strings’ relevant features had unique rather than mixed associations.
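The degrees of freedom above follow from the design: k = 3 within-participants conditions and n = 6 participants. For a comparison of two conditions tested against its own error term, the standard contrast formula is shown below. Plugging in the rounded means and MSE values reported above approximately reproduces the reported F values. This suggests that the contrasts were computed this way, although the text does not say so explicitly.

```latex
\begin{aligned}
df_{\text{condition}} &= k - 1 = 2, \qquad df_{\text{error}} = (k - 1)(n - 1) = 10,\\
F(1,\, n - 1) &= \frac{n\,(\bar{x}_1 - \bar{x}_2)^2}{2\,MSE},\\
F_{\text{mixed vs. unique-different}} &= \frac{6\,(3690 - 2963)^2}{2 \times 82342} \approx 19.26,\\
F_{\text{unique-same vs. unique-different}} &= \frac{6\,(2963 - 2399)^2}{2 \times 18946} \approx 50.37.
\end{aligned}
```

Both values agree with the reported F(1, 5) = 19.24 and 50.46 to within the rounding of the reported means. Each such contrast is equivalent to a paired t test with 5 degrees of freedom, with t equal to the square root of F.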
General Discussion
The synchrony hypothesis of binding suggests that neurons that code a certain feature fire in synchrony with neurons that code other features of the same object (e.g., Engel & Singer, 2001; von der Malsburg, 1999). This hypothesis offers an elegant solution for binding different features together to form a temporary object-file representation. However, in this article, we have directed attention to a certain logical problem that arises when the synchrony assumptions are applied to situations in which the same features are paired differently in different objects. It seems that this logical problem results in a genuine behavioral problem.
In two experiments, we asked participants to compare the number of Xs with the number of red objects in each of a series of displays. Given that each feature appeared only two or three times in each display, this task was not arithmetically demanding. Although the required level of computation remained the same in all the experimental conditions, the stimuli that the perceptual system had to represent were manipulated. These stimuli, according to our assumptions, either allowed synchronization of features within the object files (i.e., when the objects in the strings had unique color-shape associations) or did not allow synchronization (i.e., when the objects in the strings had mixed color-shape associations). The results revealed that when the strings had mixed color-shape associations (and were therefore theoretically impossible to synchronize), participants had behavioral difficulties in counting the features.
This study demonstrates a new behavioral principle that governs object representation: When features are repeated across several locations and combine in mixed color-shape associations, the display is harder to represent, and subsets defined on both dimensions are counted more slowly and less accurately. In addition, we suggest that the findings might offer unique behavioral evidence for a neural-based theory of feature binding. According to the assumptions of the neural-synchrony hypothesis, letter strings with mixed color-shape associations should be impossible to synchronize simultaneously. The behavioral difficulty our participants demonstrated in the mixed-associations condition fits this prediction.
Although the findings fit nicely with our assumptions, we should consider whether they could also be explained by a hypothesis that does not involve neural synchrony. Because none of the alternative theories directly predicts the behavioral effect we observed, new assumptions would have to be added to them to accommodate these results. For example, it has been suggested that bound objects could be represented in a topographical saliency map (e.g., Koch & Ullman, 1985). According to this theory, a location becomes more salient the fewer features it shares with other locations, and the more salient a location is, the higher its priority in processing. In the experiments reported here, all the features were repeated, so it is not clear that salience would differ across locations. Although the conflict reflected in this new effect might arise in a saliency map, the effect is not directly predicted by this alternative theory.
Note that although participants had difficulty counting features in strings that were theoretically impossible to synchronize, they were eventually able to count them. Our hypothesis is that strings that cannot be synchronized simultaneously can nevertheless be synchronized for one feature after another. Each new synchronization requires new object files to be created, and this might explain the RT cost observed in this study.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This research was supported by fellowships from the Israel Science Foundation (Bikura), the Rothschild Foundation, and the Advancing Women in Science program of the Weizmann Institute of Science (L. G.) and by Grants 2004 2RO1 MH 058383-04A1 and EY016975 from the National Institutes of Health (A. T.).
