Abstract
Visuospatial bootstrapping is the name given to a phenomenon whereby performance on visually presented verbal serial-recall tasks is better when stimuli are presented in a spatial array rather than a single location. However, the display used has to be a familiar one. This phenomenon implies communication between cognitive systems involved in storing short-term memory for verbal and visual information, alongside connections to and from knowledge held in long-term memory. Bootstrapping is a robust, replicable phenomenon that should be incorporated in theories of working memory and its interaction with long-term memory. This article provides an overview of bootstrapping, contextualizes it within research on links between long-term knowledge and short-term memory, and addresses how it can help inform current working memory theory.
People can remember details of specific events many decades after they happened. These memories can often be effortless and automatic, with the information being retained without conspicuous effort. However, there are also memory processes with different characteristics. These include processes associated with “working memory,” a term that simultaneously implies the effortful nature of the retention alongside the idea that information can be processed or worked upon. Cowan (2005) has suggested a model of working memory that proposes a key role for attention in activating the contents of memory for the purposes of processing and manipulation, describing it as the “few temporarily active thoughts” (Cowan, 2010). Meanwhile, Baddeley (2000; Baddeley, Allen, & Hitch, 2011; see Fig. 1) argued for a dedicated “central executive” that is called into operation when information must be processed or manipulated in memory.

Fig. 1. The multimodal working memory model. The figure shows how two systems handling various subtypes of modality-specific information (the phonological loop and the visuospatial sketch pad) interact with a multimodal channel, the episodic buffer. These short-term storage components are subject to influence by processes within the central executive but can also be influenced by information in long-term memory. They are described as “slave systems” to indicate that they are subsidiary to the attentional and control processes of working memory. This diagram is our own attempt to combine the ideas presented by Baddeley (2000), which highlighted the interactions of long-term memory and fluid structures of working memory, with those of Baddeley, Allen, and Hitch (2011), which modified the relationship between the central executive and the episodic buffer as well as expanding how different stimulus subtypes may be accommodated within the model. “Artic” = articulatory rehearsal.
Whereas Cowan’s approach emphasized activation of specific task-related material within long-term memory, Baddeley emphasized discrete working memory processes with circumscribed roles. Computation, manipulation, and processing were separated from short-term storage and held to be carried out by a set of executive or attentional processes. The processes of short-term storage were allocated to “slave systems” in which information was subject to rehearsal. (The term “slave” here describes processes responsible for the retention, but not the manipulation, of information.) These processes were modality-specific, hence the common description of the model as “multimodal.”
There is evidence for the independence of visuospatial and verbal short-term memory (see Darling, Allen, Havelka, Campbell, & Rattray, 2012, and Darling & Havelka, 2010, for reviews) and for independence of long- and short-term memory (Baddeley, 2012). Nonetheless, there is also evidence that long-term memory influences short-term memory: Ease of naming of abstract patterns facilitates their retention (Brown, Forbes, & McConnell, 2006), and short-term-memory capacity is enhanced for sentence-based word sequences (Baddeley, Hitch, & Allen, 2009). Observations of this sort led Baddeley (2000) to propose a new component for the working memory model: an “episodic buffer.” Mnemonic processes mediating between short-term and long-term memory were within the remit of the episodic buffer, whereas processing and manipulation were thought to be within the remit of the central executive.
Binding tasks, which require the participant to recall the relationship between multiple aspects of a stimulus, are one class of tasks thought to recruit the episodic buffer. For example, Morey (2009) asked participants to remember what a letter was and where that letter was in an array, demonstrating verbal-spatial binding. These bindings were susceptible to concurrent verbal interfering tasks, whereas spatial memory was not, implying that the verbal-spatial binding took place outside the spatial memory system. However, short-term-memory binding tasks do not necessarily encapsulate the interaction with long-term knowledge that is a key feature of the episodic buffer.
Visuospatial Bootstrapping
Visuospatial bootstrapping tasks were designed to address this possible shortcoming. In such tasks, participants verbally recall random digit sequences immediately after they have been presented on a computer screen. This kind of task is an archetypical short-term-memory task. In an initial study, Darling and Havelka (2010) contrasted performance using three display conditions: a control condition in which digits were presented sequentially in the middle of the screen, a linear condition in which digits were presented in a horizontal array across the screen and highlighted sequentially, and a keypad condition, which used an array similar to that in the linear condition but based on the mobile phone “T9” keypad (see Fig. 2). Participants remembered sequences correctly on more trials in the keypad condition than in either of the other conditions. The term “visuospatial bootstrapping” was used to reflect the bootstrapping, or support, of verbal memory by visuospatial memory.
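The comparison between display conditions can be illustrated with a minimal scoring sketch. It assumes that a trial counts as correct only when every digit is recalled in its presented position, and that the bootstrapping benefit is expressed as the difference in proportion of correct trials between the keypad and control conditions. The function names and the toy responses are hypothetical and are not data from any study.

```python
from collections import defaultdict


def trial_correct(presented, recalled):
    """Strict serial scoring: every digit must be recalled in its presented position."""
    return list(presented) == list(recalled)


def proportion_correct(trials):
    """Proportion of fully correct trials per display condition.

    `trials` is an iterable of (condition, presented, recalled) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for condition, presented, recalled in trials:
        totals[condition] += 1
        if trial_correct(presented, recalled):
            correct[condition] += 1
    return {condition: correct[condition] / totals[condition] for condition in totals}


def bootstrapping_benefit(scores, baseline="control", display="keypad"):
    """Difference in proportion correct between a display condition and the baseline."""
    return scores[display] - scores[baseline]


# Made-up responses for illustration only (not data from any experiment).
toy_trials = [
    ("control", "5273", "5273"),
    ("control", "9164", "9146"),  # order error, so the whole trial is scored incorrect
    ("keypad", "3851", "3851"),
    ("keypad", "7402", "7402"),
]
scores = proportion_correct(toy_trials)
print(bootstrapping_benefit(scores))  # 0.5 for this toy input
```

A positive value under this scheme corresponds to better recall in the keypad condition than in the control condition.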

Fig. 2. A diagrammatic representation of the visuospatial bootstrapping task, showing five types of stimuli. In all conditions, digits are presented visually, at a rate of around one per second, for verbal recall when the “Recall” message is shown. The extra visuospatial information in the “typical” condition facilitates performance compared to the control condition, and this is what is referred to as “bootstrapping.” The linear condition was used in the study by Darling and Havelka (2010) and was not associated with better recall. The two random conditions have been used in several studies (Calia, Darling, Allen, & Havelka, 2015; Darling, Allen, Havelka, Campbell, & Rattray, 2012; Darling, Parker, Goodall, Havelka, & Allen, 2014; Race, Palombo, Cadden, Burke, & Verfaellie, 2015) and have never produced improved recall.
Parmentier and Andrés (2006) found evidence that recall of a sequence of locations in order is impaired when the path between them crosses itself many times, as would necessarily be the case in the linear array. This could potentially account for the lack of a bootstrapping effect on linear arrays. Darling et al. (2012) sought to replicate the bootstrapping effect using arrays where this issue could not confound recall. Alongside the original control and keypad conditions, two new “random” keypad conditions were added in this study (see Fig. 2). These had the same shape as the “typical” keypad, but the digits were mapped randomly to locations. Only the typical-keypad display was associated with improved memory performance. This pattern suggests that participants accessed knowledge of the keypad display held in long-term memory; otherwise, performance should also have been enhanced in the random-keypad conditions. Lack of facilitation in the random-keypad conditions also excludes the possibility that the apparent improvement in the typical-keypad condition instead reflected interference in the control condition caused by presenting every digit in the same location. Consequently, bootstrapping was argued to be supported by linkages between verbal short-term memory (the “phonological loop” slave system of the working memory model) and long-term visuospatial knowledge.
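The display conditions described so far can be sketched as mappings from digits to screen positions. This is an illustrative sketch only. It assumes digits 0 to 9 in the standard phone keypad arrangement, a left-to-right ordering for the linear array, and sequences drawn without repeated digits; none of these details is specified above, and all names are hypothetical.

```python
import random

DIGITS = "0123456789"

# Typical keypad: (row, column) cells in the standard phone arrangement (assumed).
TYPICAL_KEYPAD = {
    "1": (0, 0), "2": (0, 1), "3": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "7": (2, 0), "8": (2, 1), "9": (2, 2),
    "0": (3, 1),
}

# Control: every digit appears at the same single location.
CONTROL = {digit: (0, 0) for digit in DIGITS}

# Linear: one horizontal row; the left-to-right digit order is an assumption.
LINEAR = {digit: (0, column) for column, digit in enumerate(DIGITS)}


def random_keypad(rng):
    """Same cells as the typical keypad, with digits assigned to cells at random."""
    cells = list(TYPICAL_KEYPAD.values())
    digits = list(TYPICAL_KEYPAD.keys())
    rng.shuffle(digits)
    return dict(zip(digits, cells))


def make_sequence(length, rng):
    """Random digit sequence without repeated digits (an assumption)."""
    return rng.sample(list(DIGITS), length)


def presentation_schedule(sequence, layout):
    """Pair each digit with the cell where it would be shown or highlighted."""
    return [(digit, layout[digit]) for digit in sequence]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sequence = make_sequence(6, rng)
for name, layout in [("control", CONTROL), ("typical", TYPICAL_KEYPAD),
                     ("random", random_keypad(rng))]:
    print(name, presentation_schedule(sequence, layout))
```

The random layout keeps the spatial shape of the keypad and changes only the digit-to-location mapping, which is the contrast the random-keypad conditions rely on.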
Articulatory suppression, the continuous repetition of a syllable or phrase, is an effective inhibitor of activity in the phonological loop (Baddeley, 2000). When carried out during encoding in a bootstrapping task, it has been seen to decrease memory performance in both control and keypad display conditions (Allen, Havelka, Falcon, Evans, & Darling, 2015). However, the impact of articulatory suppression was greater in the control than in the keypad condition, and the positive impact of providing extra visuospatial information was greater when the verbal short-term-memory system was under heavy load due to articulatory suppression. Visuospatial systems potentially provided additional support to verbal short-term memory when it was challenged, and bootstrapping was not eliminated even when phonological-loop efficiency was compromised; consequently, bootstrapping must involve processes beyond the phonological loop.
In a second experiment, Allen et al. (2015) attempted to inhibit spatial encoding by requiring participants to touch a sequence of locations in a predetermined order, a task that is known to impair spatial rehearsal processes (Logie, 2003). This task completely eliminated the keypad-condition benefit, indicating the involvement of spatial short-term memory (the visuospatial sketch pad slave system) in bootstrapping. The role of spatial short-term memory was specific to encoding, because a third experiment showed that tapping at recall did not abolish the bootstrapping effect, ruling out the possibility that participants maintained separate memory traces and then combined them using knowledge in long-term memory at recall. In sum, the bootstrapping effect seems to suggest the storage of a multimodal representation linking traces in separate verbal and visual short-term stores with information in long-term memory. Any system enabling this would resemble the episodic buffer (Baddeley, 2000, 2012; Baddeley et al., 2011) and would also be consistent with results from Langerock, Vergauwe, and Barrouillet (2014), who concluded that the capacity limit for cross-domain verbal-spatial bindings was lower than the capacity limit for unbound features and that domain-specific resources were not involved in the maintenance of the cross-domain bindings.
There are reasons to predict that bootstrapping is independent of executive attention. First, there is increasing evidence that episodic-buffer binding processes are independent of attention (Baddeley et al., 2011; Langerock et al., 2014), and second, the fact that verbal-visuospatial binding in bootstrapping is incidental to task instructions implies a degree of automaticity. Finally, bootstrapping seems to be suited to implementation in a storage-focused component such as a buffer, being a short-term storage task quite different from the processing-oriented working memory span tasks used extensively to understand general functioning of working memory from a psychometric perspective (e.g., Daneman & Carpenter, 1980; Engle & Kane, 2004).
Calia, Darling, Allen, and Havelka (2015) found that similar bootstrapping effects were observed in younger (19- to 35-year-old) and older (55- to 76-year-old) adults, with no evidence of an interaction between age and display type once the decline in overall verbal span was adjusted for. There is evidence (see Park & Reuter-Lorenz, 2009) that older adults’ executive function is less effective than that of younger adults, so the persistence of a largely equivalent bootstrapping effect in older adults supports the claim that bootstrapping does not weigh heavily on attentional or executive resources. Incidentally, Darling, Parker, Goodall, Havelka, and Allen (2014) found that 9-year-old children showed evidence of bootstrapping, so it seems likely that bootstrapping, consistent with a mature episodic buffer, is present from at least age 9 and across the life span.
Visuospatial bootstrapping was also preserved in a group of patients with medial temporal lobe amnesia (Race, Palombo, Cadden, Burke, & Verfaellie, 2015) who showed severe difficulties with learning new material. In addition to the bootstrapping benefit, these amnesic patients showed increased recall for words presented in a sentence context (a sentence superiority effect), an effect previously argued to index episodic-buffer function in the verbal domain (e.g., Baddeley et al., 2009). These data clearly indicate short- to long-term memory links, consistent with the episodic buffer, that are preserved in the case of hippocampal damage. It is worth noting, too, that this study independently replicated bootstrapping in healthy older adult controls.
Bootstrapping is one of a number of phenomena that incorporate an element of “leveraging stored semantic knowledge to improve memory performance” (Race et al., 2015, p. 272). Further examples of this include sentence superiority (Baddeley et al., 2009), familiarity within digit sequences (Jones & Macken, 2015), and expert memory effects in chess (Chase & Simon, 1973), music (Sloboda, 1976), and abacus use (Hatano & Osawa, 1983). We also note work by Smith and Jarrold (2014), who showed that presenting pictorial representations of words alongside verbal presentations facilitated memory performance in typically developing children and children with Down syndrome. Compared to these other long-term-memory-leveraging tasks, bootstrapping invokes a more definitively spatial form of visuospatial processing, related explicitly to location. Furthermore, its key features have been systematically evaluated and are now well understood, enabling confidence in the claim that bootstrapping emerges from an episodic-buffer-like pattern of links between verbal short-term memory, visuospatial short-term memory, and long-term knowledge that are probably quite automatic in nature.
Some authors (Quak, London, & Talsma, 2015) have argued that it makes sense to adopt a multisensory perspective of working memory. Meanwhile, influential approaches to working memory such as the embedded-processes model (Cowan, 2005) and psychometric approaches (Engle & Kane, 2004) do not explicitly invoke dedicated modality-specific storage systems. Exploration of the bootstrapping phenomenon might illuminate this issue. First, processes linking multimodal long- and short-term memory that support bootstrapping are likely to lie below the level of attention and executive processing (Calia et al., 2015). Second, loading the individual modality-specific components of working memory with interfering tasks causes the effect to break down in a predictable way, consistent with modality specificity (Allen et al., 2015). On this basis, it would be premature to dispense with modality specificity. Beyond this particular issue, though, the visuospatial bootstrapping phenomenon has useful and novel implications for a range of working memory models, and theorists developing such models should consider how to incorporate these findings.
Future Directions and Applications
It is also worth briefly speculating about processing in the episodic buffer. Our bootstrapping data suggest a process that interacts with long-term knowledge to enable contextual integration across multiple independent stimulus modalities. Superficially, this may not seem very episodic in nature—certainly any one participant in a bootstrapping study may see upwards of 200 individual digits presented, and it seems quite unlikely that he or she would be encoding those individual episodes for future mental time travel (Tulving, 2002). Consider, though, that the episodic significance of a collection of individual co-occurring stimuli may be apparent only some time after an event; therefore, the systems that are necessary for the laying down of an episodic trace may not actually need to initially code those items as an episode. However, the co-occurring stimuli do need to be temporarily coagulated into a unit that has the potential for later encoding. Perhaps the role of the episodic buffer is to maintain constellations of stimulus attributes from across the range of short-term-memory systems, to allow potential encoding by long-term-memory systems depending on biological, situational, or processing needs. Bootstrapping results suggest that spatial information is perhaps particularly important in this kind of process, an assertion that is bolstered by recent work (Guida & Lavielle-Guida, 2014; van Dijck & Fias, 2011) asserting the importance of spatial information in order recall.
This speculation generates testable predictions. First, although its potential automaticity makes it difficult to disrupt the functioning of the episodic buffer independently of disrupting the modality-specific systems, two long-term-memory-leveraging tasks that ostensibly overlap little (e.g., sentence context and item-location binding) should mutually disrupt each other. Second, overloading the episodic buffer in this way might inhibit the formation of episodic traces in long-term memory. Finally, one might expect implicit bindings involving long-term memory, like bootstrapping, to be fairly short-lived in most situations.
It is not desirable for theoretical research to proceed in a vacuum, and the working memory model has a history of enabling considerable clinical advances to be made (e.g., Foley, Kaschel, Logie, & Della Sala, 2011; Gathercole & Alloway, 2008). However, the episodic-buffer hypothesis has so far been slow to yield applications. One key issue is that it has been difficult for researchers and clinicians to understand how the episodic buffer could be assessed in a practical way. The bootstrapping task is a potentially informative tool to achieve this. It would be simple to adapt the experimental tasks into a useful clinical and research tool that could be administered in minutes. Development of such a test awaits targeted research, but given the resilience of the bootstrapping phenomenon to aging (Calia et al., 2015) and amnesia (Race et al., 2015), this would be a promising avenue to follow.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
