Abstract
A growing body of scientific evidence suggests that visual working memory and statistical learning are intrinsically linked. Although visual working memory is severely resource limited, in many cases, it makes efficient use of its available resources by adapting to statistical regularities in the visual environment. However, experimental evidence also suggests that there are clear limits and biases in statistical learning. This raises the intriguing possibility that performance limitations observed in visual working memory tasks can to some degree be explained in terms of limits and biases in statistical-learning ability, rather than limits in memory capacity.
Visual working memory is an extremely limited form of information storage. At present, the precise nature of the limit on memory capacity remains controversial, with competing theories favoring either discrete or continuous memory resources (for a review of this debate, see Brady, Konkle, & Alvarez, 2011). However, given the importance of visual working memory in myriad natural tasks (Hollingworth, Richard, & Luck, 2008), the existence of any form of strong limit suggests that the available capacity or resources should be used in an efficient manner. That is to say, visual working memory can be simultaneously limited and yet efficient, in the sense that it makes the most of its limited resources. The present article explores the implications of this point rather than the exact nature of the capacity limit.
In nearly all cases, “efficiency” in visual working memory can be understood as the ability to learn and exploit the statistical structure of the visual environment. Mathematical results on information coding and transmission from the field of information theory (Shannon & Weaver, 1949) demonstrate that efficient information storage and transmission is intrinsically linked to accurate knowledge of the statistics of the information source. This means that the structure of the visual world has strong implications for how information should be encoded in visual memory, and suggests that memory encoding and decoding processes should be sensitive and adaptive to this statistical structure. This perspective points to a deep and surprising connection between visual working memory and statistical-learning ability. An intriguing and open question is to what extent performance limits in visual working memory are caused by limits in statistical-learning ability, as opposed to capacity limits.
Efficient Coding and Statistical Learning in Visual Working Memory
Visual working memory can be conceptualized as an information channel that receives visual signals from the environment and stores this information for further processing at a later moment in time. This process is illustrated schematically in Figure 1. As shown at the left side of Figure 1, the statistical structure of the visual world can be described by a probability distribution over visual signals, indicated by

Fig. 1. A model of visual working memory as an adaptive and efficient communication channel (based on a model by Shannon & Weaver, 1949, Fig. 1). The input to visual memory, shown at the left side of the figure, has a probability distribution
Visual working memory is subject to noise, which fundamentally limits the precision of memory representations (Palmer, 1990; Wilken & Ma, 2004). Indeed, the presence of noise is unavoidable for any physical communication channel, but an efficient communication channel can adapt the encoding mechanism to minimize the negative consequences of this noise. In sensory neuroscience, this idea forms the basis of the efficient-coding hypothesis (Barlow, 1961; Geisler, 2008; Simoncelli & Olshausen, 2001). One basic prediction of this hypothesis is that if certain signals are more likely than others in an environment, the encoding mechanism should be optimized to convey more information about the signals that occur most frequently. For example, auditory neurons in the frog are extremely efficient—in fact, close to the theoretical bounds—at encoding naturalistic stimuli such as other frogs’ calls, but are much less efficient at coding unnatural stimuli such as white noise (Rieke, Bodnar, & Bialek, 1995).
We have recently shown that efficient encoding also plays an important role in human visual working memory (Sims, Jacobs, & Knill, 2012). According to results from rate–distortion theory (a subfield of information theory; Shannon & Weaver, 1949), the minimal capacity needed to store a visual feature in memory depends on two factors: the precision of the memory representation and the distribution of feature values in the current context (Fig. 2). Intuitively, storing a visual feature with lower precision requires less memory than storing the same feature with a higher precision (Fig. 2a). Less obvious is the fact that if memory capacity is fixed, increasing the variance of a visual feature in the environment should lead to an increase in the memory error for that feature (Fig. 2b). The intuitive explanation is that as the variance of a distribution increases, an encoding system with a fixed response range must be prepared to encounter a wider range of feature values, leading to a decrement in memory precision. Exactly this result was obtained in an experimental test of the framework (Sims et al., 2012).

Fig. 2. Rate–distortion theory defines the minimum memory capacity necessary for storing visual features from a given distribution (here, a Gaussian distribution with a mean of 0 and a standard deviation of 30) at a given level of memory error (a). The accuracy of memory depends on available capacity, as well as on the variance of the features in the current context (b). Human visual working memory has been shown to be consistent with both of these effects (Sims, Jacobs, & Knill, 2012).
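The two effects described above can be illustrated with the textbook rate–distortion function for a Gaussian source under squared-error distortion, R(D) = ½ log2(σ²/D). The sketch below is only an illustration under that assumption; it is not the model fitted by Sims et al. (2012), and the function names and the 2-bit capacity are hypothetical choices.

```python
import math


def gaussian_rate(sigma: float, distortion: float) -> float:
    """Minimum rate (bits) needed to store a Gaussian feature with
    standard deviation `sigma` at mean squared error `distortion`."""
    if distortion >= sigma ** 2:
        # Guessing the mean already achieves this error; no bits are needed.
        return 0.0
    return 0.5 * math.log2(sigma ** 2 / distortion)


def gaussian_distortion(sigma: float, rate_bits: float) -> float:
    """Smallest achievable mean squared error when capacity is fixed
    at `rate_bits` (the inverse of `gaussian_rate`)."""
    return sigma ** 2 * 2.0 ** (-2.0 * rate_bits)


# Effect (a): lower precision (larger tolerated error) requires less capacity.
for rmse in (5.0, 10.0, 20.0):
    print(f"SD 30, RMSE {rmse:4.1f} -> {gaussian_rate(30.0, rmse ** 2):.2f} bits")

# Effect (b): at a fixed capacity, error grows with the stimulus variance.
capacity_bits = 2.0  # hypothetical capacity, chosen for illustration only
for sigma in (15.0, 30.0, 60.0):
    rmse = math.sqrt(gaussian_distortion(sigma, capacity_bits))
    print(f"SD {sigma:4.1f}, {capacity_bits} bits -> RMSE {rmse:.2f}")
```

Under these assumptions, the root-mean-square error at a fixed capacity scales linearly with the standard deviation of the stimulus distribution: with 2 bits, a standard deviation of 30 gives an error of 7.5, and doubling the standard deviation doubles the error.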
Efficient information storage in visual working memory requires that the properties of the information channel adapt to the statistics of visual information in the current context. By adapting to statistical regularities in the visual environment, more information can be stored or transmitted in such a communication channel while using a fixed capacity. Brady, Konkle, and Alvarez (2009) demonstrated this by having participants store bicolored objects in visual working memory. When the objects’ colors contained statistical regularities, participants were able to store more objects. Brady et al. demonstrated that this improvement in performance was likely attributable not to an increase in the total capacity of visual working memory but, rather, to more efficient use of available memory resources, exactly as would be predicted from a data-compression standpoint. Similarly, Anderson, Vogel, and Awh (2013) demonstrated that the presence of perceptual grouping cues such as collinearity among visual features also leads to an increase in memory performance, in terms of both the number of items stored and their precision. From an information-theoretic standpoint, perceptual features such as collinearity introduce statistical redundancy between neighboring items and, hence, reduce the overall demand on memory.
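The data-compression argument can be made concrete by comparing the Shannon entropy of a display with and without regularities. The following sketch is a toy illustration, not the design or analysis of Brady et al. (2009): the four colors, the set of "frequent" pairs, the 80% probability mass, and the 12-bit capacity are all hypothetical values.

```python
import math
from itertools import product


def entropy_bits(probs):
    """Shannon entropy in bits: the average number of bits per object
    needed by an ideal code matched to the distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


colors = ["red", "green", "blue", "yellow"]  # hypothetical color set
pairs = list(product(colors, repeat=2))      # 16 possible bicolored objects

# No regularities: every color pair is equally likely.
uniform = {pair: 1.0 / len(pairs) for pair in pairs}

# Regularities: four hypothetical "frequent" pairs carry 80% of the mass.
frequent = {("red", "green"), ("green", "blue"),
            ("blue", "yellow"), ("yellow", "red")}
regular = {
    pair: 0.8 / len(frequent) if pair in frequent
    else 0.2 / (len(pairs) - len(frequent))
    for pair in pairs
}

h_uniform = entropy_bits(uniform.values())
h_regular = entropy_bits(regular.values())

capacity_bits = 12.0  # hypothetical fixed capacity
print(f"uniform pairs: {h_uniform:.2f} bits/object, "
      f"{capacity_bits / h_uniform:.2f} objects fit")
print(f"regular pairs: {h_regular:.2f} bits/object, "
      f"{capacity_bits / h_regular:.2f} objects fit")
```

In this toy setting, the capacity in bits is unchanged, but each object costs fewer bits once the regularities are known, so more objects fit in the same budget.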
Another prominent line of research shows that subjects are quite successful at extracting summary statistics from visual displays, such as mean orientation or mean size of a number of items (Chong & Treisman, 2003; Dakin & Watt, 1997). The ability to perceive and adapt to the statistical properties of the visual world is a fundamental precursor to having efficient memory representations. Chong and Treisman (2003) argued that storing summary statistics, rather than individual items in a display, might be one way to make efficient use of limited memory capacity. However, the research reviewed in this section suggests that sensitivity to visual statistics also enables more efficient storage of each individual item in memory.
Other research has shown that more complex statistical regularities (e.g., hierarchical dependencies between objects based on ensemble statistics; Brady & Alvarez, 2011; Brady & Tenenbaum, 2013; visual chunks defined by higher-order dependencies between a set of primitive shapes; Fiser & Aslin, 2005; Orbán, Fiser, Aslin, & Lengyel, 2008) can also be learned both in visual working memory tasks and other behavioral tasks. Figure 3 gives examples of different kinds of statistical regularities that can be learned and exploited by subjects to form efficient representations. In general, any statistical property of the input distribution can be regarded as a statistical regularity.

Fig. 3. Schematic illustration of a variety of statistical regularities that can be learned and exploited by subjects to form efficient representations of stimuli. Panel (a) shows example stimulus displays that illustrate different kinds of statistical regularities. In the left plot, the orientations of line segments are generated independently, but the mean and the variance of the stimulus distribution can be learned to increase the encoding efficiency, as in Sims, Jacobs, and Knill (2012). In the middle plot, the orientations are generated such that the entire configuration forms a smooth contour. This can be achieved by a stimulus distribution with statistical dependencies between the orientations of adjacent line segments. In the right plot, the stimuli are generated in clusters. This, again, corresponds to a stimulus distribution with statistical dependencies between different stimuli. Panel (b) shows an alternative way of visualizing different types of statistical regularities in the stimulus distribution. The three contour plots show three possible stimulus distributions,
Much of the work described in this section can be viewed as extending a long line of research on the role of chunking in working memory (Miller, 1956). As noted by Miller, more information can be stored in working memory when it is reorganized or recoded into familiar units: a single chunk in working memory can serve as a pointer to a much richer memory representation in long-term memory. The contribution of the work reviewed in this section is that it suggests that statistical regularities of visual stimuli in the environment can serve as the rich structure that is encoded in long-term memory; hence, statistical learning enables chunking of perceptual information.
Although statistical-learning ability plays a key role in defining the efficiency of visual working memory, as we discuss below, there are also clear limits to what types of statistical regularities can be successfully learned by individuals. This suggests that visual working memory capacity, as a theoretical construct, may not be easily separable from statistical-learning ability. Indeed, statistical learning may play a key role in explaining developmental effects in visual working memory. It has previously been shown that both the effective number of items stored (Cowan, AuBuchon, Gilchrist, Ricker, & Saults, 2011) and the precision of visual memory representations (Burnett-Heyes, Zokaei, van der Staaij, Bays, & Husain, 2012) increase with age in children. However, it remains to be seen how much of this increase can be accounted for by improvements in statistical-learning ability. Similarly, the empirical finding that experts in a visual domain, such as identifying cars or birds, have a higher visual working memory capacity than do nonexperts (Curby, Glazek, & Gauthier, 2009; Sørensen & Kyllingsbaek, 2012) may also be explainable from a statistical-learning standpoint.
Visual Working Memory as Probabilistic Inference
Visual sensory signals are inherently noisy and ambiguous. The rational response to this fact is to treat visual perception as a problem of probabilistic (or Bayesian) inference. A large body of research has demonstrated that the human visual system exhibits key properties that are consistent with near-optimal probabilistic inference (for reviews, see Kersten, Mamassian, & Yuille, 2004; Ma, 2012). For example, Bayesian inference suggests that when multiple information sources are available, they should be combined and weighted according to their relative reliabilities. In many perceptual tasks, the brain appears to achieve this optimal benchmark (e.g., Alais & Burr, 2004; Ernst & Banks, 2002; Jacobs, 1999; Knill & Saunders, 2003; Trommershäuser, Körding, & Landy, 2011).
Just as optimal cue combination in visual perception requires knowledge of the reliabilities of different information sources, humans are also capable of learning and exploiting the uncertainty inherent to their own visual memories. Brouwer and Knill (2007, 2009) conducted a series of experiments that required participants to reach and touch targets located in the visual periphery (illustrated in Fig. 4). In these experiments, participants had two sources of information to guide their hand movements: information about the target’s location stored previously in visual working memory, and sensory information available from the visual periphery. Each of these two information sources was uncertain, given sensory noise and imperfect memory, and therefore the optimal strategy was to aim for a location given by a weighted combination of the two signals (Fig. 4). Exactly this behavior was observed. Critically, computing the optimal motor plan required that participants possess (implicit) knowledge of both the reliability of the sensory information and the reliability of their visual working memory. Thus, even in simple tasks such as pointing to a target, visual working memory is part of a complex statistical-inference process.

Fig. 4. In a task used by Brouwer and Knill (2007, 2009), the goal is to reach toward and touch a target located in the visual periphery. Two noisy sources of information regarding the target’s location are available: the location stored in visual working memory and information available in the visual periphery. Both information sources are uncertain, leading to probability distributions regarding the true target location given memory and vision (indicated by the red and blue ellipses). The optimal aiming location is a weighted combination of these two sources.
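The weighted combination can be sketched with the standard reliability-weighted rule for two cues. The sketch assumes a one-dimensional target location and independent Gaussian noise on each cue; the function name and the numerical values are hypothetical and are not taken from Brouwer and Knill (2007, 2009).

```python
def combine_cues(mem_loc: float, mem_sd: float,
                 vis_loc: float, vis_sd: float):
    """Reliability-weighted combination of a remembered and a seen
    target location, assuming independent Gaussian noise on each cue.

    Returns (estimate, weight_on_memory, sd_of_estimate)."""
    mem_reliability = 1.0 / mem_sd ** 2  # reliability = inverse variance
    vis_reliability = 1.0 / vis_sd ** 2
    total = mem_reliability + vis_reliability

    w_mem = mem_reliability / total
    estimate = w_mem * mem_loc + (1.0 - w_mem) * vis_loc
    combined_sd = (1.0 / total) ** 0.5   # never larger than either cue's SD
    return estimate, w_mem, combined_sd


# Hypothetical values: memory is twice as noisy as peripheral vision.
estimate, w_mem, combined_sd = combine_cues(
    mem_loc=0.0, mem_sd=2.0, vis_loc=10.0, vis_sd=1.0
)
print(f"aim at {estimate:.2f} (memory weight {w_mem:.2f}, SD {combined_sd:.2f})")
```

With these values, memory receives a weight of 0.2 and the aiming point lies closer to the visual estimate. Computing the weights requires both standard deviations, which is why the observed behavior implies implicit knowledge of the reliability of memory as well as of vision.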
Limits to Adaptation in Visual Working Memory
As indicated above, efficient coding of information in visual working memory requires an accurate model of the input distribution to memory. In standard laboratory studies, the input distribution corresponds to the distribution used by the experimenter to generate the stimuli. An adaptive observer can learn a good model of the input distribution by simply observing samples from it during the course of an experiment. Indeed, the observer’s learning of statistical regularities in the input distribution leads to more efficient use of his or her limited memory resources (Brady et al., 2009; Sims et al., 2012). But are people capable of learning and adapting their memory-encoding process to arbitrary input distributions? A large body of evidence suggests that this is not the case. Certain types of statistical regularities are easier for subjects to learn than others (Backus, 2011; Fiser & Aslin, 2005; Michel & Jacobs, 2007; Schwarzkopf & Kourtzi, 2008; Seydell, Knill, & Trommershäuser, 2010). In general, subjects tend to learn more natural or ecologically valid statistical regularities more successfully.
This may be because subjects already have a well-developed model or template for more natural statistical regularities (Michel & Jacobs, 2007), whereas ecologically unrealistic statistical regularities require the learning of new models from scratch. For example, in natural images, collinearity of oriented elements is a strong cue to the presence of contours (as in Fig. 3a, middle). Schwarzkopf and Kourtzi (2008) showed that subjects were very sensitive to collinearity as a cue to the presence of contours, whereas learning to detect contours based on ecologically unrealistic regularities that contained the same amount of statistical information about the presence of contours required a significant amount of training. Even after training, subjects’ contour-detection performance for collinear arrangements was superior, suggesting a clear bias in favor of the more ecologically realistic cue.
Conversely, sometimes the actual input distribution may contain no statistical structure (as is generally the case in standard visual working memory studies), but the subject’s internal model of the input distribution may incorrectly assume such structure (Brady & Tenenbaum, 2013; Orhan & Jacobs, 2013). For example, subjects may assume a model in which feature values of different items in a trial are dependent or correlated (Brady & Tenenbaum, 2013; Jiang, Olson, & Chun, 2000; Orhan & Jacobs, 2013) when, in fact, they are generated independently. We have recently demonstrated in both continuous-recall and change-detection tasks that subjects’ responses are consistent with an internal model that exhibits statistical dependencies between feature values of different items whose strength increases with the similarity between the feature values (Orhan & Jacobs, 2013). The use of an internal model with statistical dependencies that does not match the independent and uniform input distribution used by the experimenter to generate the stimuli results in characteristic biases and dependencies in subjects’ estimates. Similarly, subjects may assume a model in which feature values of items presented in different trials are dependent (Huang & Sekuler, 2010) when, in fact, there are no such dependencies in the actual input distribution.
These types of mismatches between the actual input distribution used by the experimenter and the internal model used by the subject can have significant detrimental consequences for performance in visual working memory tasks (Orhan & Jacobs, 2014). Why, then, can the subject not always adapt his or her internal model to more closely reflect the actual input distribution used in the experiment? Again, a plausible explanation of this mismatch is that the subject’s model is “contaminated” by lifelong adaptation to a rich set of statistical regularities present in the natural visual environment that may be difficult to change during the course of an experiment. For example, orientations of nearby line segments tend to be highly correlated in natural images (Geisler, 2008). If the subject’s internal model partly reflects such statistical regularities observed in the natural visual environment, any input distribution used in a visual working memory task that does not display such statistical regularities will cause a mismatch with the subject’s internal model.
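One standard information-theoretic way to quantify the cost of such a mismatch is the Kullback-Leibler divergence: coding samples from a true distribution p with a code optimized for an assumed distribution q costs, on average, KL(p || q) extra bits per sample. The sketch below is an illustration of that general fact only. It is not the analysis of Orhan and Jacobs (2014); the four feature bins and the exponential similarity model are hypothetical.

```python
import math


def kl_bits(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits: the average number
    of extra bits per sample when data from p are coded with a code
    optimized for q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


n_bins = 4  # hypothetical number of discrete feature values per item
pairs = [(i, j) for i in range(n_bins) for j in range(n_bins)]

# Actual input: the two items' features are independent and uniform.
true_p = [1.0 / len(pairs)] * len(pairs)


def similarity_model(strength: float):
    """Hypothetical internal model in which two items are assumed more
    likely to have similar feature values; strength 0 means independence."""
    weights = [math.exp(-strength * abs(i - j)) for i, j in pairs]
    total = sum(weights)
    return [w / total for w in weights]


for strength in (0.0, 0.5, 1.0, 2.0):
    cost = kl_bits(true_p, similarity_model(strength))
    print(f"assumed dependency strength {strength:.1f}: "
          f"{cost:.3f} extra bits per display")
```

In this toy case, the cost is zero when the internal model matches the independent input and grows as the assumed dependency between items becomes stronger.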
This suggests that experiments using more natural stimulus statistics might yield qualitatively and quantitatively different results from standard visual working memory experiments, which typically use unnatural stimulus statistics. For example, if the subject’s internal model is at least partly adapted to natural stimulus statistics, experiments using unnatural stimulus statistics would underestimate the capacity of visual working memory, because an input distribution that does not match the one to which visual working memory is adapted leaves part of its capacity unused. It is also possible that some prominent phenomena reported in the literature, such as the decline in memory precision with set size or the variability in encoding precision, would either disappear or be much less prominent with more natural stimulus statistics (Orhan & Jacobs, 2014). Hence, researchers should be cautious about formulating general theories of the organization and capacity of visual working memory solely on the basis of experiments using unnatural stimulus statistics.
Conclusion
The study of visual working memory has historically focused on understanding and measuring capacity limits. However, this narrow focus obscures the fact that visual memory performance is a much richer phenomenon. In particular, memory capacity, as a theoretical construct, is of little value without the simultaneous consideration of memory efficiency, or how well the available capacity is utilized in a given setting. In many cases, memory makes efficient use of its available resources by adapting to statistical regularities in the input. However, there are also clear limitations to what types of statistical regularities, or lack thereof, can be successfully learned by subjects. These limitations lead to inefficient use of available resources relative to what can be achieved optimally under a fixed-capacity limit. Thus, capacity or resource limitations and the efficiency with which the available resources are used are two independent factors that jointly determine performance in visual working memory tasks.
The second factor is determined by the subject’s statistical-learning ability—in other words, how well he or she can adapt to statistical properties of the input distribution—and yet it is relatively neglected as a potentially significant contributor to performance in visual working memory tasks. Individual differences in visual working memory performance among the general population, differences in visual working memory performance between normal adults and individuals with cognitive disorders, and developmental and age-related changes in visual working memory may partly reflect differences or changes in how efficiently the available resources can be used (i.e., differences or changes in statistical-learning ability), in addition to any differences or changes in capacity or the amount of available resources.
To what extent these two factors (resource limitations and how efficiently the resources are used) contribute to performance limitations in visual working memory is an open empirical question. This question must be answered with sufficiently powerful experimental paradigms and computational models that carefully consider the role of statistical learning, as well as the relationship between stimulus statistics in the laboratory and visual statistics of the natural environment.
Footnotes
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This work was supported by research grants from the National Science Foundation (DRL-0817250) and the Air Force Office of Scientific Research (FA9550-12-1-0303).
