Abstract
Spatial resolution fundamentally limits any image representation. Although this limit has been extensively investigated for perceptual representations by assessing how neighboring flankers degrade the perception of a peripheral target with visual crowding, the corresponding limit for representations held in visual working memory (VWM) is unknown. In the present study, we evoked crowding in VWM and directly compared resolution in VWM and perception. Remarkably, the spatial resolution of VWM proved to be no worse than that of perception. However, mixture modeling of errors caused by crowding revealed the qualitatively distinct nature of these representations. Perceptual crowding errors arose from both increased imprecision in target representations and substitution of flankers for targets. By contrast, VWM crowding errors arose exclusively from substitutions, which suggests that VWM transforms analog perceptual representations into discrete items. Thus, although perception and VWM share a common resolution limit, exceeding this limit reveals distinct mechanisms for perceiving images and holding them in mind.
People’s perception of the visual world is limited by their ability to resolve its elements. This is easily demonstrated by comparing scene perception across visual eccentricities: Although people easily individuate and identify foveated items, the same image becomes blurry and amorphous in the periphery. Although limits on the spatial resolution of perceptual representations have been studied extensively (e.g., Anton-Erxleben & Carrasco, 2013; Whitney & Levi, 2011), this is not so for representations maintained in visual working memory (VWM) after sensory input has faded. A decade of research has revealed that object features are degraded in VWM relative to perception (Bays, Catalao, & Husain, 2009; Bays & Husain, 2008; Fougnie, Asplund, & Marois, 2010; Fougnie, Suchow, & Alvarez, 2012; van den Berg, Shin, Chou, George, & Ma, 2012; Wilken & Ma, 2004; Zhang & Luck, 2008), but it is unknown whether the spatial resolution of VWM is comparably degraded. Ben-Shalom and Ganel (2015) recently measured the precision of distance representations in VWM but not the spatial resolution of VWM, leaving unanswered the question of whether spatial proximity differentially impairs people’s ability to resolve items in VWM and perception.
The visual-crowding paradigm is a well-known means of assessing the spatial resolution of perception (Whitney & Levi, 2011) and attention (He, Cavanagh, & Intriligator, 1996). In crowding, flanking items degrade perceptual representations of targets presented in the periphery (Bouma, 1970; Levi, 2008; Whitney & Levi, 2011). Critically, the target-flanker distance regulates the degree of interference, revealing the limit of perceptual spatial resolution (Bouma, 1970; Levi, 2008; Levi, Hariharan, & Klein, 2002). Consequently, crowding represents a potentially excellent means for comparing spatial resolution in VWM and perception. Moreover, studying how crowding degrades items can reveal much about the nature of VWM representations, just as it has done for perceptual representations. In the case of visual perception, crowding is thought to degrade image representation in one or both of two ways (Levi, 2008; Whitney & Levi, 2011). First, target features may be averaged with or otherwise contaminated by flanker features (cross-item pooling error), which leads to greater imprecision. Second, targets and flankers may be correctly individuated but lack positional fidelity, which results in the report of a flanker as a target (substitution error). These two types of errors can be distinguished using mixture modeling, a technique that discerns the relative contributions to the overall response distribution of multiple sources of information and error. Indeed, recent studies suggest that both pooling and substitution errors underlie crowding in perception (Ester, Klee, & Awh, 2014; Freeman, Chakravarthi, & Pelli, 2012).
In the present study, our goal was to evoke crowding in VWM to characterize its spatial resolution and to compare the effects of VWM crowding with those of perceptual crowding. We adapted a standard perceptual-crowding paradigm to VWM and measured how errors in target reporting changed with the distance between targets and flankers. Strikingly, we found that the spatial-resolution limit of VWM was no worse than that of perception. However, mixture-modeling analyses (Bays et al., 2009; Zhang & Luck, 2008) of the consequences of exceeding such limits revealed the qualitatively distinct natures of perceptual and VWM representations.
Method
Subjects
Twelve subjects completed Experiment 1, and 6 subjects completed Experiment 2. In Experiment 1, an additional 3 subjects did not complete the experiment because they failed to fixate consistently; their partial data were excluded. In Experiment 2, an additional 2 subjects completed the experiment, but they failed to fixate consistently, so their data were excluded. No subject participated in both experiments. All subjects gave written informed consent as approved by the Vanderbilt University institutional review board. Subjects were paid $12 per hour for participation.
Eye tracking
We monitored eye position using an eye tracker (PC-60; Arrington Research, Scottsdale, AZ) controlled by ViewPoint software, the ViewPoint MATLAB toolbox, and custom MATLAB code. Trials in which we detected eye movements were excluded from all analyses. Detailed eye-tracking methods and analyses are provided in the Supplemental Material available online.
General task design and procedure
The basic task design consisted of a standard crowding paradigm in which subjects had to report a feature of one of three bars presented in the display (Fig. 1). Across trials, in a full factorial design, we varied the report feature (orientation vs. location), the representation level (perceptual vs. VWM), and the amount of crowding (i.e., interitem distance). Throughout this article, we refer to this interitem-distance manipulation as a manipulation of crowding, and we quantify the effects of crowding as differences in task performance across levels of interitem distance. We also varied which of the three bars served as the target on any given trial, to force subjects to maintain all item locations during VWM trials (unlike most crowding paradigms, in which only the central item is ever reported). This four-factor task design meant that we could acquire only a few trials per cell in each experimental session. To obtain sufficient trials for modeling, we therefore required subjects to perform numerous sessions, as detailed later.

Fig. 1. Examples of trial sequences in Experiments 1 and 2. Subjects were presented with a black screen with a white fixation dot in the center. In the images in the figure, the empty bottom half of the screen has been cropped from view for simplicity. The stimulus array of target and flanker items appeared at the top of the screen. The sequence in (a) illustrates an orientation-report perceptual trial. A stimulus array of three bars was presented and remained on-screen until the end of the trial. After 1 s, a 500-ms auditory cue signaled the subject to report the orientation of the target bar. After the auditory cue, the red dot cuing the target and the adjustment bar appeared. Subjects reported the orientation of a target bar (indicated by a red dot, which appeared directly beneath the target) by rotating a centrally presented adjustment bar until its orientation matched that of the target. (The bar was superimposed on the fixation dot, which remained partially visible.) Orientation-report visual working memory (VWM) trials were similar, except that the initial stimulus array disappeared from the screen after 1 s, and an additional delay of 800 ms preceded the auditory cue. The sequence in (b) illustrates a location-report perceptual trial. The trial sequence was similar to that in (a), except that the red dot and the adjustment bar reversed roles; the orientation of the adjustment bar signaled the orientation of the target bar, and subjects reported the target’s location by moving the red dot horizontally until it was directly beneath the target bar. Location-report VWM trials were similar, except that the initial stimulus array disappeared from the screen after 1 s, and an additional delay of 800 ms preceded the auditory cue.
The displays in panel (c) illustrate high-, medium-, and low-crowding conditions from Experiment 2; Experiment 1 included an identical high-crowding condition, and a low-crowding condition that was intermediate between the low- and medium-crowding conditions of Experiment 2. All items are to scale except that the stimulus lines and fixation dot have been enlarged for visibility. The yellow arrows indicating response adjustment did not appear on-screen and are included here for illustration only.
On each trial, subjects viewed a central white fixation dot and a peripheral stimulus array, followed by a visual cue and an adjustment item; they also heard an auditory cue. All stimuli were presented on a black background. The stimulus array consisted of three white bars (length = 1.69° of visual angle; width = 0.21° of visual angle) spaced equally along an imaginary horizontal line 12.20° of visual angle above the central fixation dot. Bar orientations varied pseudorandomly across items in the range ±45° from vertical, in increments of 10°, with the constraint that no two bars on the same trial had identical orientations. We chose to use this restricted range so that we could maximize the effects of crowding by presenting items as close as possible to one another but not touching. The horizontal center of the stimulus array was also varied randomly across trials.
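The orientation constraint described above can be illustrated with a minimal sketch. It assumes that "pseudorandomly" means sampling without replacement from the ten allowed values; the function name, the seed, and the use of Python (the experiment itself ran in MATLAB) are illustrative choices, not the authors' code.

```python
import random

# Allowed orientations: -45 to +45 degrees from vertical in 10-degree steps (from the text).
ORIENTATIONS = list(range(-45, 46, 10))

def sample_orientations(rng, n_bars=3):
    """Draw n_bars orientations with no repeats, so no two bars match on a trial."""
    # Assumption: uniform sampling without replacement implements the constraint.
    return rng.sample(ORIENTATIONS, n_bars)

rng = random.Random(0)  # fixed seed is an assumption, used only for reproducibility
trial = sample_orientations(rng)
```

Sampling without replacement guarantees distinct orientations on every trial, which is what later allows a flanker report to be told apart from a target report.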
Representation level: perceptual or VWM
On perceptual trials (Figs. 1a and 1b), the stimulus array was displayed until the subject responded. A 500-ms auditory cue indicating the feature to be reported (for orientation, the spoken word tilt; for location, the spoken word place) began after the stimulus was displayed for 1 s. Immediately after the offset of the auditory cue, subjects reported the orientation or location of the target bar, which was identified by a visual cue. At the same time, we presented an adjustment item that was used for target report (see the next section). The visual cue and adjustment item remained on-screen until the subject finalized his or her response.
On VWM trials, the stimulus array disappeared from the screen after 1 s. That offset was followed by an 800-ms blank delay before the onset of the 500-ms auditory cue (tilt or place) that signaled which feature to report. Immediately after the offset of the auditory cue, subjects reported the orientation or location of the target bar, which was identified by a visual cue. The adjustment item that was used for target report (see the next section) appeared simultaneously with the visual cue. Thus, the total VWM delay was 1,300 ms. As in perceptual trials, the visual cue and adjustment item remained on-screen until the subject finalized his or her response.
For both the perceptual and VWM conditions, the three target positions and two features (orientation vs. location) were cued with equal frequency. In the VWM condition, the paradigm enforced the use of VWM representations that maintained the original, crowded perceptual conditions during the delay interval, because subjects did not know which item was the target and which feature they would be asked to report until after the delay period.
After their responses, subjects in both experiments (save for the first 3 in Experiment 1) were presented with a 500-ms feedback screen reporting their error in degrees of rotation (for the orientation trials) or pixels (for the location trials). Subjects were instructed to use this feedback to do their best to minimize errors.
During the first session of each experiment, subjects performed at least three practice runs. These simplified and shortened runs gradually introduced various features of the task. Subjects were instructed to perform additional practice runs until they and the experimenter were confident in their understanding of all cues, stimuli, and trial types. Subjects completed a mean of 4.1 practice runs (SD = 0.5) in Experiment 1 and 3.3 practice runs (SD = 0.5) in Experiment 2. All practice-run data were discarded and not analyzed. All task displays and response acquisitions were accomplished via custom MATLAB code using the Psychophysics Toolbox (Version 3; Kleiner, Brainard, & Pelli, 2007).
Report feature: orientation or location
On orientation-report trials, a peripheral red dot was used as the visual cue and a centrally presented bar (similar to the peripheral target and flanker bars) was used as the adjustment item. The identity of the target bar was signaled by the location of the red dot, displayed at 1.83° of visual angle directly below the target item. The red dot thus indicated the spatial location of the target bar but was uninformative as to its orientation. Subjects reported the orientation of the target bar by adjusting the orientation of the adjustment bar displayed at fixation so that it matched the orientation of the target bar. The starting orientation of the adjustment bar was randomly chosen from the range ±55° from vertical in increments of 10°. To adjust its orientation, subjects repeatedly pressed the “<” or “>” key; each press rotated the adjustment bar by 10°. When subjects arrived at a satisfactory answer (i.e., when they had matched the orientation of the adjustment bar to the orientation of the perceived or memorized target), they finalized their responses by pressing the quotation-mark key.
On location-report trials, the identities of the visual cue and adjustment item were opposite those on the orientation-report trials: The bar at the center of the screen was the visual cue; its orientation signaled which bar in the stimulus array was the target, and subjects reported the target’s location by adjusting the position of the red dot. The central-bar visual cue had exactly the same orientation as the target bar but differed in that it was located at fixation. Thus, the central bar indicated only the target bar’s orientation, not its location. The appearance of the red dot was identical to that of the red dot on orientation-report trials, except that on location-report trials, the starting horizontal position of the red dot was chosen randomly (see the next section). To adjust the horizontal position of the red dot, subjects repeatedly pressed the “<” or “>” key; each press moved the red dot by 0.17° of visual angle. When subjects arrived at a satisfactory answer, they finalized their responses by pressing the quotation-mark key.
Experiment 1
The design of Experiment 1 was as described earlier, with the following additional characteristics: The horizontal center of the stimulus array was placed randomly within 5.43° of visual angle of the horizontal center of the screen. We employed two levels of crowding distance (high = 1.36° and low = 5.43° of visual angle). On location-report trials, the red dot’s starting location was random within 5.97° of visual angle of the horizontal center of the screen. Experiment 1 was a 2 (representation level: perceptual vs. VWM) × 2 (crowding distance: high vs. low) × 2 (report feature: orientation vs. location) × 3 (three potential target items) design with a total of 24 cells. Before trials were excluded because of breaks in fixation, there were approximately 12 trials per cell in an average hour-long task session. We thus required subjects to attend a series of sessions to obtain sufficient numbers of trials in each cell for modeling. Subjects performed a mean of 12.1 sessions (SD = 1.4), each session containing a mean of 3.6 (SD = 0.35) task runs of 80 trials per run. After rejecting trials on which fixation was broken, we obtained an average of 110.21 trials per cell per subject (SD = 20.76) for the critical cells of the design (central-target-orientation trials). Further details on trial counts, including trial-rejection rates as a result of fixation breaks, may be found in Table S1 in the Supplemental Material.
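The cell and trial counts for Experiment 1 follow from simple arithmetic, sketched below. The function and variable names are hypothetical, and multiplying the reported means together is only an approximation of the actual per-subject totals.

```python
from math import prod

def n_cells(levels):
    """Number of cells in a full factorial design with the given factor levels."""
    return prod(levels)

def trials_per_cell(runs_per_session, trials_per_run, cells):
    """Average trials per cell in one session, before fixation-based exclusions."""
    return runs_per_session * trials_per_run / cells

# Experiment 1: representation level x crowding distance x report feature x target item.
cells_exp1 = n_cells([2, 2, 2, 3])
per_session = trials_per_cell(3.6, 80, cells_exp1)   # reported mean runs and trials per run
# Rough total per cell across the reported mean of 12.1 sessions, before exclusions.
before_exclusion = per_session * 12.1
```

Under these assumptions the design has 24 cells, about 12 trials per cell per session, and roughly 145 trials per cell before exclusions, which sits plausibly above the reported post-exclusion average of 110.21 for the critical cells.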
Experiment 2
The purpose of Experiment 2 was to ensure that the results of Experiment 1 would generalize to other crowding distances. Thus, in Experiment 2, we employed three levels of crowding distance (high = 1.36°, medium = 3.39°, and low = 7.46° of visual angle). To accommodate the larger range of crowding distances, we expanded the range in which the horizontal center of the stimulus array could be placed to be within 7.46° of visual angle of the horizontal center of the display. On location-report trials, the red-dot adjustment item’s starting location was completely random within 8.20° of visual angle of the horizontal center of the screen. All other stimulus characteristics were identical to those in Experiment 1. Experiment 2 was a 2 (representation level: perceptual vs. VWM) × 3 (crowding distance: high vs. medium vs. low) × 2 (report feature: orientation vs. location) × 3 (three target items) design with a total of 36 cells. Before trials were excluded because of breaks in fixation, there were approximately 8 trials per cell in an average hour-long task session. We thus required subjects to attend a number of sessions to obtain sufficient trials in each cell for modeling. Subjects performed a mean of 24.3 sessions (SD = 4.4) of 2.8 task runs (SD = 0.15), each run containing 96 trials. After excluding trials on which fixation was broken, we obtained an average of 128.61 trials per cell per subject (SD = 23.09) for the critical cells of the design (central-target-orientation trials). Further details on trial counts, including trial-rejection rates as a result of fixation breaks, may be found in Table S2 in the Supplemental Material.
Analysis of report errors
All analyses were conducted separately for each experiment using custom code implemented in MATLAB. We analyzed only trials in which the central item was probed, because central targets exhibit the greatest visual-crowding effects (Levi, 2008). Although our design included location-report trials to force subjects to maintain the original location of the stimulus array in VWM, these trials were not analyzed because errors on location-report trials could be due either to location errors or to crowding of the orientation cue that indicated the target item. In addition, location-report trials are uninformative with regard to spatial resolution because increased crowding (by closer flankers) may also be construed as increased spatial information about the central location of the array (i.e., the target location).
First, we ran an analysis of variance (ANOVA) on the error magnitudes from orientation-report trials without considering the direction (clockwise vs. counterclockwise) of these errors. For the ANOVA, we broke down errors by representation level and crowding distance. We then fit the same orientation-report errors, this time considering their direction as well as their magnitude, with various mixture models (Bays et al., 2009; van den Berg et al., 2012; Zhang & Luck, 2008).
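The distinction between directional and nondirectional error can be made concrete with a short sketch. The sign convention (positive meaning a report rotated clockwise of the target) and the example values are assumptions for illustration only.

```python
def signed_error(report_deg, target_deg):
    """Directional error in degrees; the sign convention here is an assumption."""
    return report_deg - target_deg

def nondirectional_error(report_deg, target_deg):
    """Error magnitude in degrees, ignoring direction (the quantity entered into the ANOVA)."""
    return abs(report_deg - target_deg)

# Hypothetical (report, target) pairs in degrees from vertical.
pairs = [(-20, -25), (15, 25), (5, 5)]
signed = [signed_error(r, t) for r, t in pairs]
magnitudes = [nondirectional_error(r, t) for r, t in pairs]
mean_magnitude = sum(magnitudes) / len(magnitudes)
```

Because the orientation range does not wrap, no circular correction is applied here. The signed errors carry the extra information that mixture models need to relate a report to the flanker orientations.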
Model implementations
Models were modified from their original forms to remove parameters that varied with set size, because our stimulus arrays always contained three items (one target and two flankers). In addition, for target precision, each model was adapted to use truncated normal distributions rather than circular (von Mises) distributions. We chose to use truncated normal distributions because our paradigm, unlike that in most VWM mixture-modeling studies, required subjects to report feature values over a restricted range that did not “wrap” in a circle. Such a restricted orientation range was necessary to maximize item proximity and thus crowding. Unlike the circular normal probability distributions used in most implementations of these models, a truncated normal distribution is bounded. Preliminary examination of our data revealed that the most extreme response on any trial from any subject was rotated 65° (in absolute value) relative to vertical. Thus, we bounded our truncated normal distribution at ±75° relative to vertical (modeling using truncation at ±89° yielded similar results). This choice ensured that no data were excluded from analysis. It also enabled us to avoid using a wide distribution that would reduce the chance of obtaining a nonzero guess-rate parameter estimate in those models that included a guess rate.
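A truncated normal density bounded at ±75° can be sketched as follows. This is an illustration in Python using the standard textbook definition (a normal density renormalized to the mass inside the bounds), not the authors' MATLAB implementation; the function names and example parameter values are hypothetical.

```python
import math

LOWER, UPPER = -75.0, 75.0  # truncation bounds in degrees from vertical (from the text)

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal with mean mu and SD sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncnorm_pdf(x, mu, sigma, lower=LOWER, upper=UPPER):
    """Normal density renormalized so that it integrates to 1 on [lower, upper]."""
    if x < lower or x > upper:
        return 0.0
    z = (x - mu) / sigma
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    mass = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
    return density / mass

# Hypothetical example: target at +45 degrees, imprecision (sigma) of 20 degrees.
step = 0.01
n_steps = int(round((UPPER - LOWER) / step))
# Midpoint-rule check that the density integrates to about 1 over the bounds.
area = sum(truncnorm_pdf(LOWER + (i + 0.5) * step, 45.0, 20.0) * step for i in range(n_steps))
```

Renormalizing by the mass inside the bounds slightly raises the density near a target at the edge of the stimulus range relative to an untruncated normal with the same standard deviation.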
We adapted the model of Zhang and Luck (2008) and two variants of the model of Bays et al. (2009) to our paradigm. For all models, we fixed the parameter for the mean of the representation distribution at the actual orientation value of the target stimulus. Our implementation of the Zhang and Luck model included an imprecision parameter (i.e., the standard deviation of the truncated normal distribution of the representation of the target’s orientation) and a guess-rate parameter (i.e., the proportion of reports drawn from a uniform distribution). The guess rate corresponds to a complete failure to represent an item, which results in a random guess as to its orientation.
The first variant of the Bays et al. (2009) model (no-guess model) included an imprecision parameter (i.e., the standard deviation of the truncated normal distribution of the representation of the target’s orientation) and a substitution-rate parameter (i.e., the proportion of trials on which the report was drawn from the representational distribution of a flanker’s orientation rather than the target’s). Unlike Zhang and Luck’s (2008) model, the no-guess Bays et al. (2009) model also fit a separate distribution for each flanker representation. The target and flanker representations were constrained to share a common standard deviation (i.e., imprecision parameter), but they were centered on the actual target and flanker orientation values, respectively. In the second variant of the Bays model (combined model), we added a guess-rate (uniform distribution) parameter as in Zhang and Luck (2008). This model was otherwise identical to the no-guess Bays model.
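The three model variants can be written as a single mixture density with optional substitution and guess components. The sketch below is an illustration in Python rather than the authors' MATLAB code. It assumes that substitutions are split equally between the two flankers and that guesses are uniform over the ±75° truncation range; neither detail is stated in the text, and all names and trial values are hypothetical.

```python
import math

LOWER, UPPER = -75.0, 75.0  # truncation bounds in degrees from vertical (from the text)

def _phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncnorm_pdf(x, mu, sigma):
    """Normal density with mean mu and SD sigma, renormalized to [LOWER, UPPER]."""
    if not LOWER <= x <= UPPER:
        return 0.0
    dens = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    mass = _phi((UPPER - mu) / sigma) - _phi((LOWER - mu) / sigma)
    return dens / mass

def mixture_pdf(report, target, flankers, sigma, beta=0.0, guess=0.0):
    """Density of a report under the mixture.

    beta = 0 gives the Zhang and Luck variant, guess = 0 gives the no-guess
    Bays variant, and both free gives the combined variant.
    """
    p_target = (1.0 - beta - guess) * truncnorm_pdf(report, target, sigma)
    # Assumption: substitution probability is split equally between flankers.
    p_swap = sum(truncnorm_pdf(report, f, sigma) for f in flankers) * beta / len(flankers)
    # Assumption: guesses are uniform over the truncation range.
    p_guess = guess / (UPPER - LOWER)
    return p_target + p_swap + p_guess

def neg_log_likelihood(trials, sigma, beta=0.0, guess=0.0):
    """Negative log likelihood of (report, target, flankers) trials."""
    return -sum(math.log(mixture_pdf(r, t, fl, sigma, beta, guess)) for r, t, fl in trials)

# Hypothetical trials: (report, target, flanker orientations), all in degrees.
trials = [(-23.0, -25.0, [35.0, 5.0]), (34.0, -25.0, [35.0, 5.0]), (7.0, 15.0, [-45.0, 45.0])]
nll_no_swap = neg_log_likelihood(trials, sigma=8.0)         # imprecision only
nll_swap = neg_log_likelihood(trials, sigma=8.0, beta=0.3)  # no-guess Bays variant
```

In this toy data set the second report lies close to a flanker orientation, so allowing substitutions lowers the negative log likelihood. In practice the parameters would be estimated by minimizing this quantity separately for each subject and condition.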
We also considered additional model variants, such as a variable-precision model (van den Berg et al., 2012; see also Fougnie et al., 2012), but opted not to use them. Their multicomponent variable-precision parameters do not clearly map onto an interpretable cognitive construct in the way that a fixed imprecision does, or in the way that substitution rate (i.e., a confusion between two represented items in the Bays model) or guess rate (i.e., a failure to encode or maintain an item in the Zhang & Luck, 2008, model) does. Moreover, a preliminary application of the variable-precision model did not reveal any benefit to using this model compared with the others we tested.
Model selection
Although previous modeling of visual crowding (Ester et al., 2014) favored the combined Bays model, which included imprecision, substitution, and guess-rate parameters, the no-guess Bays model produced the most plausible and internally consistent parameter estimates for our data. Specifically, in models that included a guess parameter, guess rates and imprecision traded off idiosyncratically across subjects in many experimental conditions (see Figs. S2–S4 in the Supplemental Material; for further evidence that guessing does not drive the present results, see the Supplemental Method and Results and Figs. S5 and S6 in the Supplemental Material). Such parameter trade-offs are a hallmark of overfitting (i.e., fitting subject-specific noise variance rather than arriving at useful parameter estimates of the true signal; Pitt & Myung, 2002).
In addition to subject-specific overfitting, systematic inconsistencies emerged from the application of the Zhang and Luck (2008) and combined Bays models to our data. In particular, the parameter estimates suggested that VWM representational fidelity increased, albeit with implausibly high guess rates, as conditions became more crowded. These results were neither predicted nor realistic under any account of VWM or crowding of which we are aware. Guesses appear to play a much-reduced role in our data compared with that of Ester et al. (2014), perhaps because of the stimulus-presentation methodology: Ester et al. presented their stimuli for a 75-ms encoding period, which probably led to frequent trials in which some stimuli failed to be encoded at all. By contrast, we provided a minimum of 1 s of encoding time, thus reducing the likelihood of a total failure to encode any items (Bays et al., 2009). These extended viewing times do not abolish crowding (Intriligator & Cavanagh, 2001; Townsend, Taylor, & Brown, 1971).
Given that the no-guess Bays model provided the most plausible fit to the data, we used its parameter estimates for further statistical analyses. Specifically, we extracted subject-specific and condition-specific (i.e., Representation Level × Crowding Distance) parameter estimates from this model and subjected them to separate ANOVAs in which we treated subject as a random effect.
Results
Nondirectional error in target report
To assess whether crowding affected perceptual and VWM representations differently, we first examined error magnitudes without considering direction of error (Fig. 2; see also Fig. S1 in the Supplemental Material). We analyzed these magnitudes using ANOVAs with crowding distance and representation level (perceptual vs. VWM) as factors. Table 1 presents the results for Experiment 1, and Table 2 presents the results for Experiment 2.

Fig. 2. Nondirectional error in target report. Nondirectional error is graphed as a function of condition for Experiments 1 and 2. Error bars represent ±1 SEM. VWM = visual working memory.
Table 1. Results for Experiment 1: Analyses of Variance on Nondirectional-Error Magnitudes and Model Parameter Estimates
Table 2. Results for Experiment 2: Analyses of Variance on Nondirectional-Error Magnitudes and Model Parameter Estimates
In both experiments, the ANOVAs revealed a main effect of crowding, such that error magnitudes increased with shorter interitem distance, which is consistent with the typical crowding effect. There was also a main effect of representation level, such that errors were larger on VWM trials than on perceptual trials. Experiment 1 showed no evidence of an interaction: The amount of crowding in VWM was indistinguishable from that in perception. This interaction is the critical test for differential spatial resolution in perception and VWM because the interaction measures whether identical changes in interitem distance lead to different crowding effects in perception than in VWM.
In Experiment 2, although the interaction achieved statistical significance, it appears to have been driven by a floor effect on error in the low-crowding (i.e., high interitem distance) perceptual condition. This conjecture is bolstered by the absence of an interaction in a separate ANOVA that included only the medium- and high-crowding conditions (interaction: F(1, 5) = 1.24, p = .3157, ηp² = .1588; main effect of crowding: F(1, 5) = 149.28, p = 6.4950 × 10⁻⁵, ηp² = .7404; main effect of representation: F(1, 5) = 82.49, p = .0003, ηp² = .9053). The floor effect for the low-crowding perceptual condition was unsurprising given that the stimulus separation was 7.46°, which translates to 0.61 times the stimulus eccentricity. Because the critical distance for experiencing crowding in visual perception is typically reported to be 0.1 to 0.5 times the stimulus eccentricity (Bouma, 1970; Levi et al., 2002), error in the low-crowding condition in Experiment 2 should primarily have been driven by the limits of featural precision in peripheral vision for single items (these limits have been documented previously; see, e.g., Ester et al., 2014). Thus, the floor effect for the low-crowding perceptual condition (i.e., high interitem distance) of Experiment 2 leads to an underestimation of the size of the perceptual-crowding effect for low-crowding trials versus medium-crowding trials in Experiment 2.
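The spacing-to-eccentricity ratio quoted above can be checked for every interitem distance used in the two experiments. The sketch below takes the 12.20° eccentricity and the spacings from the Method section; the condition labels and variable names are hypothetical.

```python
ECCENTRICITY_DEG = 12.20     # distance of the stimulus array above fixation (from the text)
CRITICAL_RANGE = (0.1, 0.5)  # typical critical spacing as a fraction of eccentricity

# Interitem distances in degrees of visual angle, as reported for each experiment.
spacings = {
    "high (both experiments)": 1.36,
    "medium (Experiment 2)": 3.39,
    "low (Experiment 1)": 5.43,
    "low (Experiment 2)": 7.46,
}

# Express each spacing as a fraction of the stimulus eccentricity.
ratios = {name: deg / ECCENTRICITY_DEG for name, deg in spacings.items()}
# Conditions whose spacing exceeds the upper end of the typical critical range.
beyond_range = [name for name, ratio in ratios.items() if ratio > CRITICAL_RANGE[1]]
```

Only the low-crowding spacing of Experiment 2 exceeds the upper end of the typical critical range, consistent with the interpretation that perceptual error in that condition reflects single-item precision rather than crowding.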
We should also note that the 10° of rotational error in the low-crowding VWM condition in Experiment 2 should not be taken as evidence that VWM representations are strongly crowded in this condition. Instead, nondirectional error here primarily reflects the level of imprecision with which orientation is represented in VWM under conditions of minimal crowding, similar to measurement of the representational precision of a VWM target in isolation. Indeed, our VWM low-crowding conditions yielded measures of orientation representational fidelity roughly comparable to those previously obtained in VWM for isolated targets or targets with a single distractor (Bays & Husain, 2008; Wilken & Ma, 2004). Put differently, the error in each individual condition provides information about the fidelity of the representation of orientation in that condition, whereas the difference in error between conditions with different target-flanker spacing provides information about the influence of crowding (i.e., spatial resolution) on the representation of that feature.
To sum up the nondirectional-error results, we can safely conclude that, across the two experiments, interitem distance and representation level did not meaningfully interact. In other words, perception and VWM share a common spatial-resolution limit, at least for the granularity of the target-flanker spacings we have tested. Note that this absence of an interaction between crowding distance and representation level might not hold across all possible paradigmatic variations. For instance, the feature values of the flankers may influence the nondirectional error if substitution plays a significant role in crowding. We next report mixture-modeling results that explicitly take into account the feature values of each target and flanker.
Representational imprecision and substitution
Whereas the results of the nondirectional-error analysis suggested that manipulations of interitem distance had the same effect on perception and VWM, unpacking target errors into categories of imprecision (pooling) and substitution (see the Analysis of report errors section) using the no-guess Bays mixture model revealed that crowding affects perception and VWM in qualitatively distinct ways (Fig. 3; see also Fig. S5 in the Supplemental Material). Specifically, we ran separate ANOVAs on imprecision and substitution errors (Experiment 1: Table 1; Experiment 2: Table 2). The results revealed not only main effects in both experiments but also interactions between crowding distance and representation level for both imprecision and substitution errors. The pattern of these interactions was such that perceptual crowding increased both imprecision (Figs. 3a and 3c) and substitution (Figs. 3b and 3d), whereas VWM crowding increased only substitution, leaving the precision of features intact. One possible account of these results is that location representations might be less precise in VWM than in perception, which would lead subjects to confuse which item was cued on VWM trials and thus cause apparent substitution errors. If this were the case, nondirectional errors would be expected to show greater crowding effects in VWM than in perception. Instead, we observed comparable crowding in both representation levels, inconsistent with this trivial explanation of the modeling results.

Fig. 3. Parameter estimates from the no-guess Bays model. Imprecision, or σ, is the standard deviation of the truncated normal distribution. It is plotted as a function of condition for (a) Experiment 1 and (c) Experiment 2. Substitution rate, or β, is the proportion of trials on which a flanker feature was reported instead of the target’s feature. It is plotted as a function of condition for (b) Experiment 1 and (d) Experiment 2. Error bars represent ±1 SEM. VWM = visual working memory.
To better understand the interactions obtained from the mixture modeling, we next performed tests of simple main effects to separately assess the consequences of crowding in VWM and perception. We used paired t tests to assess Experiment 1, which had only two levels of crowding distance, and we used one-way ANOVAs to assess Experiment 2, which had three levels of crowding. We first assessed the simple main effect of crowding distance in VWM on imprecision parameter estimates. We observed no significant effect of crowding distance on imprecision in VWM in either experiment (Experiment 1: t(11) = 1.1901, p = .2591, Cohen’s d = 0.3436; Experiment 2: F(2, 10) = 1.2037, p = .3401, ηp² = .0241). However, we observed that large increases in imprecision were associated with perceptual crowding for both experiments (Experiment 1: t(11) = 7.1836, p = 1.7898 × 10⁻⁵, Cohen’s d = 2.0737; Experiment 2: F(2, 10) = 30.0764, p = 5.8854 × 10⁻⁵, ηp² = .9572). These results (along with the results from the main ANOVAs) support our conclusion that, unlike perceptual crowding, VWM crowding did not modulate representational precision; that is, it did not lead to pooling of target and flanker feature values.
We next performed parallel tests on substitution parameter estimates. We observed significant effects of VWM crowding distance on substitution in both experiments (Experiment 1: t(11) = 15.5697, p = 7.693 × 10⁻⁹, Cohen’s d = 4.4946; Experiment 2: F(2, 10) = 136.9534, p = 5.4215 × 10⁻⁸, ηp² = .7467). We also observed significant effects of perceptual crowding distance on substitution in both experiments (Experiment 1: t(11) = 5.0561, p = 3.6853 × 10⁻⁴, Cohen’s d = 1.4596; Experiment 2: F(2, 10) = 12.8569, p = .0017, ηp² = .8486). These results (along with the results from the main ANOVAs) support our conclusion that crowding modulated substitution in both VWM and perception, although the interactions in the main ANOVAs also indicate that crowding had a greater effect on substitutions in VWM than in perception.
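The Cohen’s d values reported for the Experiment 1 paired t tests are consistent with the standard paired-samples relation d = t/√n, taking n = 12 from the 11 degrees of freedom. The short check below, whose function name is hypothetical, recomputes d from each reported t value; it assumes that d was defined as the mean difference divided by the standard deviation of the differences, which the text does not state explicitly.

```python
import math


def cohens_d_from_paired_t(t, df):
    """Paired-samples Cohen's d implied by a t statistic: d = t / sqrt(n), n = df + 1."""
    return t / math.sqrt(df + 1)


# (reported t, reported d) for the four Experiment 1 simple-effects tests
reported = {
    "VWM imprecision": (1.1901, 0.3436),
    "Perception imprecision": (7.1836, 2.0737),
    "VWM substitution": (15.5697, 4.4946),
    "Perception substitution": (5.0561, 1.4596),
}

if __name__ == "__main__":
    for label, (t, d) in reported.items():
        recomputed = cohens_d_from_paired_t(t, df=11)
        print(f"{label}: recomputed d = {recomputed:.4f}, reported d = {d:.4f}")
```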
Discussion
In these experiments, we experimentally evoked VWM crowding for the first time, as far as we are aware, and showed that, contrary to expectations, the spatial resolution of VWM is no worse than that of perception. However, mixture modeling of report errors indicated that exceeding spatial-resolution limits degrades perceptual and mental representations in qualitatively different ways.
That VWM and perception are subject to similar spatial-resolution limits is consistent with the sensory-recruitment hypothesis, according to which VWM representations are perceptual representations maintained after stimulus offset (Ester, Serences, & Awh, 2009; Serences, Ester, Vogel, & Awh, 2009; see also Tsubomi, Fukuda, Watanabe, & Vogel, 2013). However, if VWM were simply time-extended perception, errors from perception and VWM would not be categorically distinct. Instead, we showed that identical crowding leads to dissociable errors for perception and VWM—both imprecision and substitution in perception but exclusively substitution in VWM. Thus, contrary to a strong form of the sensory-recruitment hypothesis, our results indicate that VWM representations may be significantly transformed from perceptual representations.
If substitution errors reflect report of a nontarget item, increased substitution errors resulting from exceeding the limits of spatial resolution in VWM could be taken as evidence in support of a “slot” model of VWM (e.g., Luck & Vogel, 1997). Slot models posit that VWM items are represented using one of a few discrete, indivisible units of resource. It is also possible, however, that substitutions reflect feature-binding errors such that the location of the target is erroneously bound to the orientation of a flanker (Levi, 2008; Pelli, Palomares, & Majaj, 2004), akin to the perceptual phenomenon of illusory conjunctions (Treisman & Schmidt, 1982; Wheeler & Treisman, 2002). Further research is necessary to resolve these alternatives.
It appears that exceeding spatial-resolution limits in VWM leads exclusively to substitutions of intact features, whereas exceeding spatial-resolution limits in perception also leads to pooling of feature values across items. How can crowding have such different effects on perception and VWM? We propose that VWM transforms continuous analog perceptual representations into discrete digital mental representations, and under crowded conditions, these discrete VWM representations suffer from all-or-none substitution of either individual features or entire objects. We further suggest that items that are less precisely represented in perception are more susceptible to substitution, which could account for the shift in error type with the transition from perception to VWM (see also Brady, Konkle, Gill, Oliva, & Alvarez, 2013). Thus, although perception and VWM share the same spatial resolution, the limit of this resolution reveals distinct mechanisms by which people perceive images and hold representations of those images in mind. It will be up to neurobiological research to reveal the nuts and bolts of these perceptual and VWM representations (e.g., Ester, Anderson, Serences, & Awh, 2013; Sprague, Ester, & Serences, 2014).
Footnotes
Acknowledgements
We thank Shea Littlepage and Mareike Eydt-Beebe for technical assistance. We used resources of the Vanderbilt Advanced Computing Center for Research and Education in the analysis of these data.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This work was supported by National Science Foundation Graduate Research Fellowship DGE1445197 (to A. R. Fintzi). The Vanderbilt Vision Research Center was supported by National Eye Institute Core Grant P30-EY008126.
