Abstract
Experimental measures of working memory that minimize rehearsal and maximize attentional control best predict higher-order cognitive abilities. These tasks fundamentally differ from clinically administered span tasks, which do not control strategy use. Participants engaged in concurrent articulation (to limit rehearsal) or concurrent tapping (to limit attentional refreshing) during forward and backward serial recall with each of three distinct stimulus sets: digits, line drawings of common objects, and images of nonsense symbols. The span tasks used common clinical stopping and scoring procedures. Scores were highest for digits and lowest for novel symbols in all combinations of direction and concurrent task. Furthermore, concurrent articulation and concurrent tapping interfered with backward recall to the same degree. Together, these findings indicate that clinically administered immediate serial recall tasks depend on both rehearsal and long-term lexical knowledge, making it difficult to use these tasks to separate problems in language ability from problems in attention.
In the past three decades, the tasks used experimentally to uncover mechanisms of short-term and working memory have diverged from the tasks used clinically to make decisions about diagnoses and interventions. Since the advent of complex span tasks such as Reading Span (Daneman & Carpenter, 1980) and Counting Span (Case et al., 1982), experimental tasks have become increasingly varied. Nonetheless, a commonality is that these tasks introduce manipulations to control which mnemonic strategies—if any—are available to the test-taker. Meanwhile, clinical interpretations of short-term and working memory are typically based on standardized measures available at the local school or clinic [e.g., Kaufman Assessment Battery for Children (Kaufman & Kaufman, 2004); Wechsler Intelligence Scale for Children (WISC-III; Wechsler, 1991)]. These batteries tend to include adaptive immediate serial recall tasks—such as forward and backward digit span—which are based largely on procedures developed over a century ago by Ebbinghaus (see Dempster, 1981). Although immediate serial recall played a major role in establishing early models of short-term and working memory (e.g., Baddeley & Hitch, 1974; Estes, 1972; Miller, 1956), these clinical testing procedures rarely inform mechanisms within contemporary models. Researchers’ failure to systematically translate empirical findings to clinical settings constrains clinicians’ ability to interpret performance on both forward and backward span tasks with reference to contemporary understanding of short-term and working memory.
If interpretations can be translated directly from the experimental literature, poor performance on clinically administered forward and backward span tasks would be dissociated into short-term and working memory deficits, respectively. Short-term memory refers to the temporary storage of information; working memory reflects the additional manipulation of information currently maintained in short-term memory (see Adams et al., 2018, for a complete review). In clinical forward span tasks, the test-taker is asked to repeat, in order, a list of items read aloud by a clinician. The lists become increasingly longer until the test-taker misses a predetermined number of lists, at which point testing stops. Typically, the tests are scored by tallying the number of lists correctly recalled; thus, larger scores generally indicate that longer lists could be accurately stored in and retrieved from short-term memory. Administration procedures are similar for backward serial recall, except that the test-taker is asked to recall the list beginning with the last item presented and end recall with the initial item presented. Because backward serial recall requires both storage and processing of the items on the list, it is typically considered a test of working memory.
One challenge to interpreting clinically administered forward and backward span tasks is that experimental testing procedures which have informed contemporary theories of working memory differ from clinical testing procedures (e.g., Miyake et al., 2000). In particular, two features of clinical administration—(a) the adaptive stopping rule based on whole list scoring and (b) list reversal as the working memory processing component—prevent generalization of experimental interpretations to clinical procedures. In the current study, we assessed individual differences in performance under these standard clinical testing procedures, but with two manipulations common in experimental procedures. Specifically, we manipulated the strategy use by imposing dual tasks, and we manipulated the availability of long-term linguistic support by using multiple different stimulus sets. These manipulations allowed us to test whether two basic assumptions can be generalized to clinical testing procedures. The first assumption concerns strategy selection during short-term and working memory tasks. Forward recall is associated with phonological short-term memory; short-term memory tasks predominately invoke the maintenance strategy verbal rehearsal (Unsworth & Engle, 2007). Backward recall is associated with working memory; working memory tasks predominately invoke attention (Unsworth & Engle, 2007), and, thus, the maintenance strategy attentional refreshing. The second assumption concerns the interactions of working memory and long-term memory. Digit span tasks are thought to test memory for order information but require only minimal effort to maintain the identities of specific items in immediate memory (Baddeley, 2012). The logic underlying this assumption is that the digits 1 to 9 comprise a finite set of highly familiar items—all of which should be readily accessible to working memory without effortful retrieval from semantic long-term memory. 
Therefore, digit span tasks are considered benchmark measures of individual differences in short-term and working memory capacity independent of individual differences in language ability.
Rehearsal and Refreshing as Maintenance Strategies
Rehearsal, the repetition of the to-be-remembered items’ verbal labels, supports maintenance of phonological traces in short-term memory. Rehearsal can be used to maintain information presented in the auditory or visual modality—if that information is recoded into a verbal-phonological form (Baddeley et al., 1975). After rehearsal is fully developed, it requires some attentional resources to initiate but then occurs with minimal effort (C. C. Morey et al., 2013).
Rehearsal is differentiated from refreshing, a modality-independent strategy which has been described as effortfully attending to (or “thinking about”) a memory trace (Camos et al., 2011). A memory trace begins to decay once it leaves the focus of attention. Cycling a decaying memory trace back into the focus of attention reactivates the trace and protects against decay (Cowan, 2008). Although refreshing requires attention from childhood into adulthood, it remains domain-general so can be used for any type of material. Because refreshing relies on the domain-general focus of attention, it competes with other processing tasks for attentional resources (Camos, 2017).
Manipulations of rehearsal and refreshing in experimental settings have dissociated these strategies. During concurrent articulation, the participant is asked to repeat aloud an unrelated syllable or word (e.g., “blah”) which blocks rehearsal by occupying sensory-motor pathways needed for speech production and articulation (Baddeley et al., 1975). Blocking rehearsal with concurrent articulation dramatically reduces performance on forward digit and word span tasks, although it does not affect serial recall with spatial stimuli (Alloway, 2010) or performance on complex span tasks which measure working memory (Baddeley et al., 1975). Thus, rehearsal is considered a strategy for maintaining verbal materials in short-term memory but is considered less useful for storing nonverbal materials or supporting working memory. Consequently, scores on forward digit span should reflect individual differences in underlying limits of storage capacity plus any benefits gained from rehearsal.
Requiring participants to engage in a concurrent complex tapping task imposes attentional demands which limit refreshing (Kane & Engle, 2000). Tapping tasks have been shown to disrupt multiple executive functions associated with frontal cortical regions but not disrupt passive storage associated with the medial temporal lobe (Moscovitch, 1994). Concurrent tapping impairs the immediate forward recall of nonverbal spatial materials that relies on refreshing but does not disrupt forward recall of verbal materials (Alloway, 2010). Thus, refreshing is synonymous with the attention-based processing within working memory (Camos et al., 2009). Scores on backward digit span should then reflect individual differences in the pool of attentional resources available to refresh test items in the face of increasingly greater processing demands. Based on these posited connections between strategy use and short-term and working memory, a convenient interpretation of poor performance on forward serial recall could be a failure of phonologically based rehearsal strategies, whereas poor performance on backward recall could be a failure of attention-based refreshing. However, experimental manipulations of refreshing during working memory have utilized complex span tasks rather than the backward span tasks usually included in clinical testing batteries.
Differences Between Clinical and Experimental Procedures
Forward and backward digit spans remain popular measures of short-term and working memory for a variety of practical reasons. The instructions for forward and backward digit span are easy to understand. Adaptive stopping rules allow digit spans to be completed within 3–5 minutes with minimal distress from overly difficult testing demands. Despite their clinical advantages, adaptive forward and backward span tasks are less commonly included in experimental paradigms. These paradigms often eliminate serial recall altogether—thus minimizing the usefulness of rehearsal—by presenting simultaneous visual arrays of colored squares or rotated lines (e.g., Zhang & Luck, 2008). When an experimental task does require serial recall, the task also constrains participants’ rehearsal use. For example, rapidly presenting digit lists of unknown length leaves participants no time to engage rehearsal (Bunting et al., 2006). Similarly, complex working memory span tasks such as Operation Span or Reading Span impose a secondary information processing task between presentations of to-be-remembered stimuli. This secondary task occupies the delay in which rehearsal would otherwise occur (Unsworth & Engle, 2007).
Preventing rehearsal reveals underlying capacity limits of attention within the working memory system (Cowan, 2008). Those experimental paradigms that effectively isolate individual differences in attentional control are repeatedly shown to predict higher-level skills such as reading comprehension (Daneman & Carpenter, 1980), executive function abilities (Miyake et al., 2000), general intelligence (Turner & Engle, 1989), and college entrance exam scores (Unsworth & Engle, 2007). It is precisely the presence of rehearsal that attenuates the relationship between forward serial recall and more complex cognitive abilities. Only under specific testing and scoring procedures are individual differences in attentional control maximized relative to the contributions of rehearsal during immediate serial recall. Procedures that randomize the order of list lengths, include supraspan lengths, and score performance based on the proportion of items correctly recalled (rather than whole list scoring) are most sensitive to individual differences in attentional control (Unsworth & Engle, 2007). Incidentally, in free recall of supraspan lists, concurrent tapping impaired recall only of items presented in earlier list positions. Early list items are presumed to have been displaced from the phonological short-term store and so require working memory resources for maintenance and retrieval (Moscovitch, 1994). The finding that concurrent tapping disrupts recall only for items no longer in the short-term store reinforces the connection between refreshing and the attentional control component of working memory.
Despite the importance of supraspan lists and randomized list lengths, clinical testing procedures incorporate neither of these methodologies. Clinical procedures dictate that lists initially contain only two or three items and get progressively longer only as long as the test-taker recalls the list correctly. Thus, list lengths are presented in ascending rather than randomized order. Moreover, supraspan lists contribute little to clinical scoring procedures because stopping rules minimize testing at supraspan lengths. A supraspan list is unlikely to be correctly recalled in its entirety because, by definition, the number of items in the list exceeds the number of items an individual can encode and store in short-term memory. However, some proportion of items from supraspan lists may be correctly recalled, adding information when scored as proportion correct. Therefore, the adaptive stopping rule and whole list scoring procedure leave open the question as to whether the most common clinically administered measure of working memory—backward digit span—taps refreshing independent of rehearsal.
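The contrast between the two scoring procedures can be made concrete with a minimal sketch. The function names and the example list below are hypothetical illustrations, not materials from the study; the sketch assumes that "proportion correct" means the proportion of items recalled in their correct serial position.

```python
from typing import Sequence


def whole_list_score(presented: Sequence, recalled: Sequence) -> int:
    """Whole list scoring: 1 only if every item is in its correct position."""
    return int(list(presented) == list(recalled))


def proportion_correct(presented: Sequence, recalled: Sequence) -> float:
    """Proportion of items recalled in their correct serial position."""
    hits = sum(p == r for p, r in zip(presented, recalled))
    return hits / len(presented)


# Hypothetical seven-item list with two adjacent items transposed at recall.
presented = [3, 8, 1, 6, 2, 9, 4]
recalled = [3, 8, 1, 2, 6, 9, 4]

print(whole_list_score(presented, recalled))    # whole list scoring credits nothing
print(proportion_correct(presented, recalled))  # five of seven positions are correct
```

Under whole list scoring this trial contributes nothing to the score, whereas proportion correct scoring retains the information that most positions were recalled accurately.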
Isolating Short-Term and Working Memory From Linguistic Long-Term Knowledge
Digit span tasks, in particular, are considered a measure of short-term and working memory isolated from long-term memory (Baddeley, 2012). In theory, this makes forward and backward digit span useful in separating individual differences in cognition from individual differences in experience or environment, which can influence vocabulary development (Hoff, 2006). In immediate serial recall tasks, the test-taker must remember both the individual items on the list as well as the order (or reverse order) of those items. Long-term memory traces act as templates for the reconstruction of partially decayed short-term memory traces of individual items (Gathercole et al., 1999). Therefore, long-term memory is thought to be particularly beneficial when the test draws items without replacement from a large set of potential stimuli because a robust semantic network can be called upon to support item memory as a necessary precursor to remembering item order. In short-term memory tests that utilize large pools of stimuli, larger vocabularies provided faster and more accurate reconstruction (Gupta & Tisdale, 2009). In contrast, digit span is considered a purer test of item order because the possible items are highly familiar and restricted to a closed set, which repeats from list to list, eliminating uncertainty for item memory (Baddeley, 2012).
Despite the assumption that digits’ highly familiar, closed-set nature minimizes contributions from long-term memory, recent laboratory manipulations have demonstrated consistent effects of long-term memory on serial recall even when items are drawn from familiar, closed, repeating sets (Neath & Surprenant, 2019). During free recall, in which items may be recalled in any order, words that are drawn from a shared semantic category are recalled more often than unrelated words. Strong interitem semantic associations also appear to benefit immediate serial recall (Tse & Altarriba, 2007) even when rehearsal is prevented with articulatory suppression (Poirier & Saint-Aubin, 1995). However, Neath and Surprenant (2019) and Poirier and Saint-Aubin (1995) both used a consistent list length of six items; Tse and Altarriba (2007) used a consistent list length of seven items. Presenting consistent list lengths allowed these studies greater sensitivity to detect differences in attentional control because the proportion correct scoring procedure provides information even when list recall is not perfect. Consequently, it is unclear whether a clinical testing procedure, with adaptive stopping rules and whole list scoring, is also sensitive to differences in item frequency and semantic associations stored in long-term memory. If clinical testing procedures are not sensitive to these effects, the question would remain as to whether (a) lexicality effects emerge only when attentional control is engaged, or (b) lexicality effects are too subtle to be captured with whole list scoring procedures. However, if clinical testing procedures are sensitive to item-level differences in lexical long-term memory, these procedures may also be sensitive to individual differences in lexical long-term memory. This would pose a problem for test interpretation because short-term memory estimates may not be independent of language ability or experience.
Current Study
The first goal of the current study was to determine if clinical testing procedures for immediate serial recall, namely, adaptive stopping based on whole list scoring procedures, are sensitive to lexical characteristics of items, as is routinely observed in highly controlled laboratory studies with experimental task procedures. To this end, we administered adaptive forward and backward span tasks with three stimulus sets: visually presented digits, line drawings of concrete objects, and images of novel nonsense symbols. Performance was scored as the total number of lists correctly recalled. If clinical testing procedures can detect variation due to differences in stimulus characteristics, then participants who are not engaged in a concurrent task should benefit from the high word frequency and strong semantic associations of digits, resulting in higher scores on the digit-based tests relative to the other two sets of stimuli. Recall should benefit less when items are not strongly associated, even if their verbal labels are familiar—as is the case with line drawings of unrelated concrete objects. Finally, participants should find it most difficult to recall drawings of novel nonsense symbols as these—much like nonwords—have no preestablished lexical entries in long-term memory and thus no interitem associations.
A second goal was to determine if participants’ spontaneous selection of strategy (rehearsal or refreshing) is tied to direction of recall. Forward span—a test of short-term memory—should be associated with rehearsal while backward span—a test of working memory—should be associated with refreshing. To this end, separate groups of participants engaged in either concurrent articulation or concurrent tapping while completing forward and backward serial recall with each of the three stimulus sets. Alloway (2010) observed that concurrent articulation, but not concurrent tapping, interfered with forward serial recall of digits using clinical testing procedures. We anticipated replicating her finding and extending it to pictures of unrelated objects and drawings of novel nonsense symbols. Previous literature indicates that concurrent articulation minimally interferes with complex span tasks while concurrent tapping interferes with nonverbal materials and items from supraspan list positions. Therefore, if backward span taps attention similarly to other working memory tasks, then performance will be impacted by concurrent tapping more than concurrent articulation. Finally, the present design also allows us to explore which maintenance mechanism(s) give rise to lexicality effects. If short-term memory taps the lexical network via phonological processes, then only concurrent articulation should eliminate the recall advantage that digits hold over unrelated objects and novel nonsense symbols. However, if both concurrent tasks diminish lexicality effects, then access to both phonological processing and attention is necessary.
Method
Participants
Sixty-eight college students (34 females and 34 males), ages 18–25 (M = 20.4; SD = 1.7) participated in this experiment. Participants were recruited through a university subject pool and were compensated with either course credit or US$5 per half hour spent on the study. All participants reported having normal hearing and normal (or corrected-to-normal) vision. Two additional participants were excluded for being non-native speakers of English.
Procedure
Each participant was randomly assigned to either the no concurrent task (n = 22), concurrent articulation (n = 23), or concurrent tapping (n = 23) condition. If applicable, participants first practiced the concurrent task in isolation. All participants then completed the forward and backward adaptive immediate serial recall tasks with each of the three stimulus sets. All stimuli were presented visually on a touchscreen monitor (12.1 inch, 800 × 600 pixel, Model Keytech L1201S) and responses were recorded using the same monitor. Finally, each participant completed Form G of the Nelson–Denny Reading Test (Brown, Fishco, & Hanna, 1993) in a quiet testing room.
Dual Task Instructions
Participants in the concurrent articulation group were instructed to say “blah” aloud twice per second. The participants in the concurrent tapping group were instructed to tap stickers adhered to the corners of a 5 × 5 inch slip of paper, clockwise with their nondominant hand. Prior to the experiment, participants in the concurrent articulation and tapping conditions practiced speaking aloud or tapping in time with a flash that appeared on the computer monitor two times per second. The experimenter provided feedback on the participants’ speed during both the training trials and the experiment. Participants began the concurrent task before the experimenter initiated each trial and continued through both presentation and recall.
Measures
The Computerized Memory Span Tasks required participants to remember, either in forward or backward serial order, a list of images and then reproduce the sequence by pointing to the images on a computer screen. Three sets of images were used: the numerals 1 through 9 (digit span), a set of nine black-and-white line drawings of concrete, everyday objects (object span), and a set of nine black-and-white visual images with little-to-no agreed upon labels (symbol span). The three computerized span tasks were developed to match the primary administration parameters (e.g., rate of stimulus presentation; increasing list lengths; two trials per list length; stopping rule) of the WISC-III (Wechsler, 1991) auditory forward and backward digit span tasks, although all items were visually presented to minimize auditory masking during concurrent articulation.
Before the span tasks with a given stimulus set, each item from that set appeared, in isolation, in the center of the computer screen for 1 second after which a 3 × 3 response grid (450 × 450 pixels) containing all nine items from that stimulus set appeared (Figure 1). The participant was instructed to select the just-seen item from the response grid. Stimulus familiarization was followed by the corresponding forward and backward span tasks, each of which began with practice trials at list lengths 2 and 3.
Figure 1. Response arrays for digit span, object span, and symbol span.
For the actual task, two lists were presented at each list length, beginning with list length 2. Items appeared for 1 second, one-at-a-time in the center of the screen. The corresponding response grid appeared 250 ms after the offset of the final item. Participants were instructed to touch the monitor to select items in either forward or backward order. If the participant correctly reproduced at least one of the two lists, the subsequent lists increased by one item; if the participant failed to reproduce both lists at a given length, testing stopped. The score for each task was the total number of lists correctly recalled before testing was terminated.
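The stopping and scoring rule described above can be sketched as follows. This is an illustrative reconstruction, not the software used in the study: the function name, the `recall` callback standing in for a participant, the maximum list length of nine, and the sampling of items without replacement within a list are all assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence


def run_adaptive_span(
    recall: Callable[[List], List],   # hypothetical stand-in for a participant
    items: Sequence,
    backward: bool = False,
    start_length: int = 2,
    lists_per_length: int = 2,
    max_length: int = 9,              # assumed ceiling; not stated in the text
    rng: Optional[random.Random] = None,
) -> int:
    """Return the total number of lists correctly recalled before stopping."""
    rng = rng or random.Random()
    score = 0
    for length in range(start_length, max_length + 1):
        n_correct = 0
        for _ in range(lists_per_length):
            # Assumption: items are drawn without replacement within a list.
            presented = rng.sample(list(items), length)
            target = presented[::-1] if backward else presented
            if list(recall(presented)) == target:
                n_correct += 1
        score += n_correct
        if n_correct == 0:            # both lists at this length failed: stop
            break
    return score


# Simulated participant who reproduces at most five items, in forward order.
def five_item_participant(presented: List) -> List:
    return presented if len(presented) <= 5 else presented[:5]


digits = list(range(1, 10))
print(run_adaptive_span(five_item_participant, digits, rng=random.Random(1)))
```

In this simulation the participant passes both lists at lengths 2 through 5 and fails both at length 6, so testing stops with eight lists credited; no partial credit is given for the five items correctly recalled from each six-item list.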
The Nelson–Denny Reading Test (Brown et al., 1993) is a test of silent reading ability specifically designed to detect variability within high school and college students. The General Reading Ability composite raw scores were derived from the index scores on the vocabulary and comprehension subtests as described in the manual.
Statistical Analyses
Inferential analyses were performed with the BayesFactor package (version 0.9.12-4.2; R. D. Morey & Rouder, 2018) in R (R Core Team, 2018). The Bayes factor quantifies the strength of evidence in favor of one model relative to another, often the null model. Bayes factors near 1 reflect roughly equivocal support for both models. Increasing support for the alternative model is reflected by Bayes factors increasingly greater than 1; the Bayes factor also takes model complexity into account. As the Bayes factor decreases toward zero, evidence increasingly supports the null model over the alternative model. Therefore, Bayes factors provide a useful alternative to traditional frequentist significance testing when anticipated results include no difference between models (Rouder et al., 2017), as is anticipated for the no concurrent task and concurrent tapping groups on forward span and for the no concurrent task and concurrent articulation groups on backward span. Moreover, unlike traditional frequentist significance testing, which becomes unstable with small sample sizes, Bayesian tests are valid with small sample sizes except under a specific combination of factors not met in the current experiment (Wagenmakers et al., 2018).
For any test which included within-participant factors (i.e., repeated measures), the null model included a free intercept for each participant which was fixed across conditions. For comparisons not involving within-participant factors, the null model included only a single intercept which was fixed across participants and conditions. The anovaBF function provides the Bayes factors for each possible combination of parameters relative to a null model. The lmBF function calculates the Bayes factors for specified models relative to the null; because these Bayes factors have a common denominator (the null), these models can then be directly compared. The correlationBF function was constrained to test for positive relations among variables compared to a model of rho = 0. Each model was estimated with 100,000 samples. Otherwise, default prior settings were used.
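The reason a common denominator permits direct comparison is that the null model's marginal likelihood cancels: if BF_A0 = p(D | A) / p(D | null) and BF_B0 = p(D | B) / p(D | null), then BF_A0 / BF_B0 = p(D | A) / p(D | B). A minimal sketch of this arithmetic follows; the function name and the numeric values are hypothetical and are not results from this study.

```python
def relative_bayes_factor(bf_a_vs_null: float, bf_b_vs_null: float) -> float:
    """Bayes factor for model A over model B, given that both input
    Bayes factors were computed against the same null model."""
    if bf_a_vs_null <= 0 or bf_b_vs_null <= 0:
        raise ValueError("Bayes factors must be positive")
    return bf_a_vs_null / bf_b_vs_null


# Hypothetical values: model A is favored 120:1 and model B 40:1 over the null.
print(relative_bayes_factor(120.0, 40.0))  # A is favored over B by this factor
```

The same division underlies statements that one coding scheme is "preferred by a factor of" some value over another.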
Results
Lexicality Effects in Clinical Testing Procedures
Mean (and SD) Number of Lists Recalled by Group and Stimulus Set.
Changes to Overall Performance Due to Concurrent Task
Bayesian ANOVAs confirmed that the three groups performed similarly on the General Reading Ability subtest of the Nelson–Denny Reading Test, as a single-intercept null model was preferred to a model including a group parameter, BF = 0.2 ± 0.03%. Means and standard deviations were similar for the no concurrent control (M = 183.6, SD = 27.6), concurrent tapping (M = 183.3, SD = 29.3) and concurrent articulation (M = 175.3, SD = 25.6) groups. Therefore, we can be reasonably confident that random assignment adequately equated the three groups.
Of interest was how overall recall and the consistency of strategy use were altered in the presence of a concurrent task. Within-participant factors of direction (forward and backward) and stimulus set (digits, objects, and symbols), and a between-participant factor of group (no concurrent task, concurrent articulation, and concurrent tapping) were entered into a Bayesian ANOVA. The Bayes factors were calculated for models containing all combinations of parameters relative to a participant-level intercept-only null model. The model with the most support included all three main effects as well as all three two-way interactions (BF = 5.7 × 10^74 ± 3.8%).
Our main predictions regarding group and direction were that concurrent articulation would impair performance only in forward span, and concurrent tapping would impair performance only in backward span. Therefore, to simplify the analyses, we examined these predictions separately for each direction of recall. A Bayesian ANOVA of forward span indicated that the model with the most support included both stimulus and group as well as their interaction, BF = 3.8 × 10^50 ± 1.0%. To better understand how overall performance on forward span changed as a result of concurrent task, we built four models with alternative coding schemes for the group factor: (a) the same coding distinguishing between the three groups as used above; (b) coding such that the no concurrent task group was contrasted with the two concurrent task groups; (c) coding such that the no concurrent task and concurrent tapping were combined and contrasted to the concurrent articulation group; or (d) coding such that the no concurrent task and concurrent articulation groups were combined and contrasted to the concurrent tapping group. The Bayes factor for the third coding scheme (BF = 9.9 × 10^49 ± 1.1%) was preferred by a factor of 2.1 over the first (BF = 4.6 × 10^49 ± 1.1%), >600,000 over the second (BF = 1.4 × 10^44 ± 0.6%), and 30 billion over the fourth (BF = 2.4 × 10^39 ± 1.3%). Thus, consistent with our prediction, evidence strongly favored both models in which the concurrent articulation group recalled fewer lists during forward recall than the no concurrent task and concurrent tapping groups. The preferred model also combined the no concurrent task and concurrent tapping groups, although this evidence was less strong.
A Bayesian ANOVA of backward span performance supported the model which included both stimulus and group but not their interaction, BF = 2.6 × 10^22 ± 0.6%. The model without the interaction was preferred by a factor of 6 to the model with the interaction. The follow-up analysis indicated that the second scheme, BF = 1.9 × 10^23 ± 0.8% (no concurrent task group separated from the two concurrent task groups), was only weakly favored by a factor of 1.6 over the first scheme, BF = 1.0 × 10^23 ± 0.6% (three distinct groups), and by a factor of 2.3 over the third scheme, BF = 8.4 × 10^22 ± 0.2% (which was preferred for forward span). However, all three of those schemes were preferred over the fourth coding scheme, BF = 1.9 × 10^20 ± 1.2% (our prediction), in which only the concurrent tapping group's performance was impaired. Therefore, it is unclear to what extent concurrent tapping disrupted backward recall; however, the evidence strongly favors models in which concurrent articulation disrupts backward recall.
Bayesian Correlation Coefficients (r) and Bayes Factors (BF) across Forward and Backward Recall Instructions Within a Given Stimulus Set.
Note. All models were directional and compared to a point null.

Scatterplots across forward and backward recall instructions for each stimulus set within each concurrent task group.
Access to Lexical Long-Term Memory in the Presence of Competing Task Demands
In the earlier analysis which examined the sensitivity of clinical testing and scoring procedures with only the no concurrent task control group, evidence favored the coding scheme differentiating the three stimulus sets. However, the earlier Bayesian ANOVA of forward span supported the presence of an interaction between stimulus and concurrent task. Therefore, we repeated the Bayesian ANOVA of stimulus for both the concurrent task groups. In both cases, the evidence supported distinguishing among the three stimulus sets: for the concurrent tapping group (BF = 2.0 × 10^14 ± 0.3%), this scheme was favored over combining any stimulus sets by a factor >1 million; for the concurrent articulation group (BF = 1.8 × 10^7 ± 0.3%), by a factor of 10. Thus, lexicality effects remained for forward span in the presence of both concurrent tasks, though the strength of the evidence varied. In the earlier Bayesian ANOVA with backward span, evidence favored a model without an interaction between stimulus set and concurrent task. Accordingly, the models distinguishing the three stimulus sets under concurrent tapping (BF = 3.76 × 10^5 ± 0.25%) or concurrent articulation (BF = 3.79 × 10^8 ± 0.32%) were both favored by a factor of 2.6 relative to the model which differentiated digits from objects and symbols. This is within the same order of magnitude of support reported earlier for the no concurrent task control group.
Discussion
The present study examined performance on forward and backward serial recall using three stimulus sets—visually presented digits, line drawings of concrete objects, and images of novel nonsense symbols. One goal was to determine whether commonly used clinical administration and scoring procedures were sensitive to differences in long-term lexical knowledge at both the item and individual levels. Consistent with clinical procedures, administration began with short lists which increased in length until a predetermined number of incorrectly recalled lists resulted in the termination of testing. This procedure is fast and minimizes frustration from prolonged testing, but does not allow for the calculation of proportion correct because each participant completes a different number of test trials. Despite using small, nine-item stimulus sets with repeated presentation of the same items across lists, an approach which is assumed to minimize contributions from long-term memory, clinical testing procedures detected item-level lexicality effects in both forward and backward adaptive administrations with whole list scoring procedures. This finding is consistent with recent work in which concreteness and word frequency ratings influenced immediate serial recall with closed sets; although that study used six-item lists throughout the study—allowing for the more sensitive proportion correct scoring procedures (Neath & Surprenant, 2019). Thus, adaptive testing and total list scoring procedures capture variation in how the working memory system makes use of lexical long-term memory.
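The adaptive administration and whole-list scoring described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the protocol used in the study: the starting length, the two lists per length, and the rule of stopping after two failed lists at one length are hypothetical parameter choices, as are the names `administer_span` and `capped_participant`.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def administer_span(
    items: Sequence[str],
    recall: Callable[[List[str]], List[str]],
    backward: bool = False,
    start_length: int = 2,          # assumed starting list length
    lists_per_length: int = 2,      # assumed number of lists at each length
    max_errors_per_length: int = 2, # assumed discontinue rule
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Return (lists recalled correctly, lists administered)."""
    rng = rng or random.Random(0)
    correct = 0
    administered = 0
    for length in range(start_length, len(items) + 1):
        errors = 0
        for _ in range(lists_per_length):
            presented = rng.sample(list(items), length)  # closed set, no repeats within a list
            target = presented[::-1] if backward else presented
            administered += 1
            # Whole-list scoring: credit only if every item is in the right position.
            if recall(presented) == target:
                correct += 1
            else:
                errors += 1
        if errors >= max_errors_per_length:
            break  # stopping rule ends testing for this participant
    return correct, administered


def capped_participant(capacity: int, backward: bool = False):
    """Simulated participant who recalls perfectly up to `capacity` items."""
    def recall(presented: List[str]) -> List[str]:
        if len(presented) > capacity:
            return []  # any failure makes the whole list incorrect
        return presented[::-1] if backward else list(presented)
    return recall


digits = [str(d) for d in range(1, 10)]  # nine-item closed set
high = administer_span(digits, capped_participant(5))
low = administer_span(digits, capped_participant(3))
print(high, low)
```

Under these assumptions, the two simulated participants complete different numbers of lists before testing stops, which illustrates why the score is a count of correct lists and not a proportion correct over a fixed set of trials.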
Differences in recall across stimulus sets provide support for the conclusion that long-term lexical memory influences short-term and working memory performance, but they do not guarantee that clinical testing procedures are sensitive to individual differences. With open sets, long-term lexical memory provides templates to support item-level memory; larger vocabularies lead to more reconstruction and better recall (Gathercole et al., 1999; Gupta & Tisdale, 2009). To understand how individual differences in lexical long-term memory impact recall in clinical settings, we also examined the Nelson–Denny Reading Test as a measure of lexical knowledge. Vocabulary predicted forward recall, even with restricted stimulus sets and adaptive stopping rules based on whole list scoring. Moreover, vocabulary scores did not interact with stimulus type, indicating that the influence of individual differences in long-term lexical knowledge is not mitigated by using a closed set of highly familiar items such as digits.
Flexibility of Strategy Use
The second goal of this study was to better understand strategy use during these clinically administered protocols, particularly under varying levels of lexical long-term support. To this end, participants completed the forward and backward immediate serial recall tasks under concurrent articulation (which limits rehearsal), concurrent tapping (which limits refreshing), or with no concurrent task (control condition). Conventionally, forward digit span is considered a test of short-term memory with maintenance occurring primarily via rehearsal, whereas backward span is considered a test of working memory with maintenance occurring primarily via refreshing. The present results are only partly consistent with this dissociation. As predicted, forward span consistently decreased when rehearsal was prevented with concurrent articulation but not when refreshing was disrupted by concurrent tapping. Thus, the ability to rehearse is especially important to achieving expected performance on clinical tests of forward serial recall. Moreover, forward span alone is not sufficient to detect disturbances within the larger working memory system. Forward digit span performance may appear within normal limits even when attention is otherwise occupied, so long as a robust lexical network in long-term memory can give rise to and support a relatively automatic (i.e., attention-free) rehearsal process.
By classifying backward span as a working memory task, clinical test batteries diverge from contemporary models of working memory. In the current study, blocking rehearsal impaired backward recall to the same extent as preventing refreshing did, if not more. Although refreshing and rehearsal have been shown to independently contribute to working memory tasks (Camos, 2017; Camos et al., 2009, 2011), processing constraints and speeded presentation of complex and running span tasks sufficiently minimize rehearsal to reveal individual differences in attention. However, the current findings indicate that rehearsal remains the predominant strategy in backward span. Indeed, the current findings are consistent with previously observed self-report data of strategy use during forward and backward serial recall (Norris et al., 2019): when tested repeatedly at an individualized list length which was set to challenge participants, participants reported using more conscious strategies during backward digit span than forward digit span; nevertheless, rehearsal strategies (either an evenly paced rehearsal or a rehearsal that additionally clustered digits into small groups) were the most commonly endorsed strategies for both forward and backward tasks, regardless of whether digits were presented visually or aurally (Norris et al., 2019).
In the current study, adults with typical language skills appeared to spontaneously use rehearsal in both forward and backward span tasks despite the visual presentation of list items. Previous work demonstrates that auditory input gains obligatory access to a phonological maintenance system suitable for rehearsal, whereas visual input must be recoded into a phonological form (Salame & Baddeley, 1982). Nonetheless, participants’ strategy use shifted with a dual task manipulation. When one strategy was constrained by the concurrent task, the forward and backward versions of each stimulus type (digits, objects, and symbols) became strongly correlated, indicating that participants were able to flexibly employ the remaining available strategy on both tasks. The only exception to this pattern was forward and backward recall of digits, which were similarly correlated with no concurrent task and under articulatory suppression. Digits, having high familiarity and strong interitem associations, are more likely to promote automatic long-term phonological activation than the other two stimulus sets (Gathercole et al., 1999). Such phonological activation may have allowed some participants to continue recoding the visual stimuli using rehearsal during forward span, even under concurrent articulation demands.
Relationship Between Lexical Long-Term Memory and Strategy Use
Even under clinical testing procedures, recall consistently favored digits, then objects, and then symbols, regardless of secondary task or recall instructions. Even the group engaged in concurrent articulation displayed a lexicality advantage during forward span, though the advantage was less robust than that observed for the two other groups, whose access to phonological strategies was not hindered. It is possible that effects of lexical knowledge (at both the item and individual levels) are larger than they would be had we used auditory instead of visual presentation. Because visual presentation requires phonological recoding before rehearsal can be employed, the current study includes two steps at which lexical knowledge might influence recall. Still, a well-established long-term lexical network appears to benefit serial recall performance, even in the presence of attentional distraction and, to a smaller but still significant degree, during list reversal. Consequently, individual differences in lexical long-term memory may manifest as a relative impairment in an adaptive serial recall task (Tehan & Lalor, 2000), especially during forward recall. In other words, children with smaller vocabularies or slower word retrieval may receive less long-term linguistic support during immediate serial recall tasks and, thus, recall fewer lists than peers with more robust lexical networks who have faster, more efficient lexical retrieval (e.g., AuBuchon et al., 2019).
Conclusion
There is increasing divergence between clinical and theoretical goals in regard to measuring short-term and working memory capacity. Experimental paradigms are highly controlled to isolate individual components of attentional capacity and refreshing by minimizing contributions from rehearsal and long-term knowledge. Unfortunately, these paradigms are often impractical in a clinical setting that, instead, opts to use digits. The assumption is that digits are so highly familiar across all individuals that other developmental and cognitive differences due to experience or environment should be undetectable with digit span. However, even among college students, vocabulary knowledge predicted forward digit span performance. Therefore, we must move clinical interpretation away from the simple dichotomies of short-term memory/working memory, forward/backward span, or even verbal/visual memory, which were prevalent when immediate serial recall tasks were first developed (Adams et al., 2018).
Contemporary views of working memory focus on the active involvement of controlled attentional resources (e.g., Camos, 2017; Unsworth & Engle, 2007). However, the current dual task manipulations revealed rehearsal—not attentional refreshing—was the dominant strategy used by adults in both forward and backward serial recall. Importantly, rehearsal may require more controlled attentional resources for school-age children than adults (AuBuchon, McGill, & Elliott, 2019). Thus, measures of immediate serial recall performance in early childhood may have a predictive value for long-term outcomes precisely because rehearsal has not become automatic. However, as children get older, it becomes less clear whether poor digit span performance reflects differences in attentional resources or atypical rehearsal development (AuBuchon et al., 2016).
In clinical and other real-world settings, we are free to flexibly engage components of the working memory system. Such flexibility gives rise to potential compensatory mechanisms that may mask underlying working memory disruptions, at least in situations with few demands on the system. Conversely, disruptions in other cognitive skills—such as language development and reading—may contribute to problems in immediate serial recall rather than be a consequence of poor working memory (Cowan, 2008). Future laboratory-based research studies will be better situated to translate current theories of short-term and working memory into guidelines for interpreting clinical tests and developing novel interventions if they are designed with clinical goals and real-world complexities in mind.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
