Abstract
The standard view in cognition is that the identification of visually presented words, up to and including semantic activation, is automatic in various senses. The perspective favored here is that various kinds of attention are intimately involved in the identification of words. Some forms of attention are necessary, whereas others (i.e., executive attention) are recruited to optimize performance. We briefly review results from a variety of literatures that (a) support the latter perspective and (b) are difficult to reconcile with an automatic-processing account.
The identification of visually presented words, up to and including semantic activation, is widely thought to be automatic (e.g., see virtually all cognitive psychology textbooks; see also Augustinova & Ferrand, 2014). Centrally, for present purposes, the underlying processes are viewed as not needing any form of attentional resource and not subject to interference from nor open to control by other processes (e.g., Posner & Snyder, 1975, among many others). Recent evidence, however, does not support this view. In particular, we review investigations of the roles of spatial attention, central attention, and executive attention in word identification, all of which have yielded evidence inconsistent with the prevailing automatic-processing perspective.
Spatial Attention
Recent research has focused on the role of spatial attention (the focusing of attention at a particular location in space) in visual word identification. Two questions have arisen from this research that are central when considering the extent to which word identification requires spatial attention. First, is spatial attention manipulated between the word and another location (e.g., comparing cases when the word appears in an attended location vs. an unattended location) or within the word (e.g., comparing cases when spatial attention is distributed across the entire letter string vs. focused on a specific location/letter within the letter string)? Second, is word identification an explicit part of the task (e.g., reading aloud) or an implicit part of the task (e.g., as in the Stroop paradigm)? In the following, we highlight findings that speak to both of these distinctions.
When words are outside the focus of attention: Explicit word identification
If visual word recognition is automatic in that it does not require spatial attention, then task performance should be unaffected by spatial cues that draw attention away from the printed word. McCann, Folk, and Johnston (1992) examined the effects of spatial cuing in the lexical decision task. On each trial, a small rectangle appeared briefly either in the location where a subsequent target string would appear (a valid trial) or in a location where the target string would not appear (an invalid trial). McCann et al. reported robust cuing effects (faster responses on validly cued trials) that were equivalent for high- and low-frequency words. They therefore concluded that spatial attention, as indexed by the spatial-cuing effect, is a necessary preliminary to word identification. Their logic was that the word-frequency effect reflects a central component of lexical processing, and it was assumed that additivity of the spatial-cuing effect and word frequency implied that spatial attention influences a process that occurs prior to the effects of word frequency.
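The additivity logic above can be made concrete as a difference-of-differences contrast on the four cell means of a cue validity by word frequency design. The following is a minimal sketch only: the function name and the cell means are hypothetical, invented for illustration, and are not data from McCann et al. (1992).

```python
def interaction_contrast(rt):
    """Difference of differences for a 2 x 2 table of mean RTs (ms).

    rt maps (cue, frequency) -> mean RT, with cue in {"valid", "invalid"}
    and frequency in {"high", "low"}. A value near zero is the pattern
    described as additivity; a nonzero value indicates an interaction.
    """
    cue_effect_high = rt[("invalid", "high")] - rt[("valid", "high")]
    cue_effect_low = rt[("invalid", "low")] - rt[("valid", "low")]
    return cue_effect_low - cue_effect_high


# Hypothetical cell means (ms), made up for illustration only.
additive_example = {
    ("valid", "high"): 520, ("invalid", "high"): 560,
    ("valid", "low"): 580, ("invalid", "low"): 620,
}
interactive_example = {
    ("valid", "high"): 520, ("invalid", "high"): 560,
    ("valid", "low"): 580, ("invalid", "low"): 650,
}

additive_contrast = interaction_contrast(additive_example)
interactive_contrast = interaction_contrast(interactive_example)
```

In practice, a claim of additivity rests on a statistical test of the interaction term (and on adequate power to detect one), not on the raw contrast alone.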
A converging line of research has examined how various kinds of distractor words affect target processing in reading aloud, lexical decision, and semantic categorization. This work has yielded a remarkable consensus with regard to explicit word identification: Provided that the spatial location of the target is highly predictable and that the distractor appears at a location other than that of the target, there are no distractor effects (Besner, Risko, & Sklair, 2005; Lachter, Forster, & Ruthruff, 2004; Lien, Ruthruff, Kouchi, & Lachter, 2010; Waechter, Besner, & Stolz, 2011; although notably, there are such distractor effects with the same displays when distractors’ spatial locations are less predictable). Thus, readers are capable of focusing spatial attention on the target without being affected by a distractor word in tasks that require explicit word identification. These data are inconsistent with the view that distractor words can be processed without spatial attention.
When words are outside the focus of attention: Implicit word identification
A distinct line of research has examined the Stroop effect when a color patch and the irrelevant word occupy different spatial locations. In many studies, spatially cuing the color patch failed to prevent the distractor word from interfering with color naming (Brown, Gore, & Carr, 2002; Lachter, Ruthruff, Lien, & McCann, 2008; Waechter et al., 2011, Experiment 5), even under the same cuing conditions that had eliminated a distractor effect in explicit word-identification tasks (Lachter et al., 2004; Waechter et al., 2011, Experiments 1–4). These results are consistent with the argument that implicit word identification does not require spatial attention.
Waechter et al. (2011) proposed a different account of these results, according to which color processing (the target task) requires only a minimal amount of spatial attention, thus allowing it to be distributed across locations, including that of the irrelevant word. In other words, the target task does not demand enough spatial attention for us to be confident that participants are not distributing attention over all locations (i.e., the argument is that distractor words are attended to some extent). Robidoux and Besner (2015) confirmed this account by using a color-patch task that demanded more spatial attention by requiring participants to name the color that occupied the most space (see Fig. 1). Unlike the use of a homogeneous color bar in other studies, use of this color-patch task eliminated interference from an irrelevant color word in a different location.

Fig. 1. Sample stimulus from Robidoux and Besner (2015).
When spatial attention varies within a word: Evidence from the Stroop paradigm
Previous research has demonstrated that in experiments in which the word carrying the ink color is from the response set, modifying the display so that the color of a single letter is to be identified (e.g., by spacing, coloring, and spatially cuing a single letter within the word) drastically reduces the Stroop effect (Besner & Stolz, 1999; Besner, Stolz, & Boutilier, 1997). This implies that at least some word-identification processes are derailed in the standard Stroop task when attention is not distributed across the entire letter string. Recent work using the semantic Stroop effect, in which the irrelevant carrier word is semantically related to the color (e.g., the word sky printed in blue), however, has challenged the view that these kinds of attentional manipulations influence semantic processing. Specifically, Augustinova and colleagues (Augustinova, Flaudias, & Ferrand, 2010; Augustinova & Ferrand, 2014; Augustinova, Silvert, Ferrand, Llorca, & Flaudias, 2015) reported that the semantic Stroop effect was insensitive to single-letter coloring and spatial-cuing manipulations, and hence they concluded that semantic activation is automatic.
In contrasting the original work with this more recent research, one potentially critical methodological difference is notable. In Augustinova and colleagues’ work, the letters in the word were normally spaced, whereas in some of Besner and colleagues’ earlier work, the letters had a space added between them. This was important in that it might have increased participants’ ability to focus attention on a particular letter. To assess this general idea, we investigated whether coloring a single letter in combination with other cues to aid the focusing of attention within the word would eliminate the semantic Stroop effect. Labuschagne and Besner (2015) found that displays similar to that shown on the left in Figure 2 yielded a semantic Stroop effect remarkably similar in magnitude to those reported previously by Augustinova and colleagues, whereas displays similar to that shown on the right yielded no semantic Stroop effect. These results demonstrate that processes leading to semantic activation can be prevented, and hence by this criterion are not automatic.

Fig. 2. Sample stimuli from Labuschagne and Besner (2015).
Central Attention
Another important type of attention is central attention. This is the type of attention typically discussed when we consider, for example, the extent to which an individual can perform two tasks at the same time. Critically, if word identification is automatic, then it should not require central attention. We review evidence inconsistent with this notion below.
Before discussing research on the role of central attention in word identification, it is important to briefly describe extant accounts of the processes that are thought to underlie word identification, because some require central attention whereas others do not. Although we focus on models of explicit word identification, specifically those of reading aloud, the processes involved also support implicit word identification (Coltheart, Woollams, Kinoshita, & Perry, 1998), though there are likely differences in the processes involved across different reading tasks (e.g., silent reading vs. reading aloud). The most successful account of reading aloud is based on dual-route models (see Fig. 3) that contain both a “direct” lexical/semantic route to phonology (e.g., see Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; Perry, Ziegler, & Zorzi, 2007) and a distinct sublexical route that allows for the generation of pronunciations of words unknown to the reader. Correctly reading regular words (i.e., words with typical spelling-to-sound correspondences, such as save) can be achieved with either the lexical or the sublexical route, but the correct pronunciation of irregular words (i.e., words with atypical spelling-to-sound correspondences, such as have) relies, ultimately, on the output of the lexical/semantic route. The sublexical route is necessary in order to correctly read new words and nonwords. With this brief introduction to a dual-route architecture in hand, we turn now to a consideration of where in processing central attention is brought to bear.

Fig. 3. A localist dual-route class model of reading aloud.
The psychological refractory period paradigm
The psychological refractory period (PRP) paradigm is a dual-task paradigm that allows inferences as to whether or not various mental processes require central attention. (For a full description of this paradigm, see Pashler, 1992.) Again, if word identification is automatic, then none of its subprocesses should require central attention.
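The inferential logic of the PRP paradigm can be illustrated with the standard central-bottleneck account, in which the central stage of Task 2 cannot begin until the central stage of Task 1 has finished. The sketch below assumes three serial stages per task (precentral, central, postcentral); the function name and all stage durations are hypothetical values chosen for illustration. It shows that, at a short stimulus onset asynchrony (SOA), lengthening a stage that does not need central attention is absorbed into the waiting time, whereas lengthening a stage that does need central attention adds to Task 2 response time at every SOA.

```python
def task2_rt(soa, pre2, central2, post2, pre1=100, central1=300):
    """Task 2 response time (ms, measured from Task 2 stimulus onset)
    under a single central bottleneck. All durations are hypothetical."""
    # Task 1's central stage ends at pre1 + central1 (from Task 1 onset).
    # Task 2's central stage starts once both Task 2's precentral stage
    # (ending at soa + pre2) and Task 1's central stage are complete.
    central2_start = max(pre1 + central1, soa + pre2)
    return central2_start + central2 + post2 - soa


SHORT_SOA, LONG_SOA = 50, 800

# Add 50 ms to a precentral (attention-free) stage of Task 2.
pre_effect_short = task2_rt(SHORT_SOA, 150, 200, 100) - task2_rt(SHORT_SOA, 100, 200, 100)
pre_effect_long = task2_rt(LONG_SOA, 150, 200, 100) - task2_rt(LONG_SOA, 100, 200, 100)

# Add 50 ms to the central (attention-demanding) stage of Task 2.
central_effect_short = task2_rt(SHORT_SOA, 100, 250, 100) - task2_rt(SHORT_SOA, 100, 200, 100)
central_effect_long = task2_rt(LONG_SOA, 100, 250, 100) - task2_rt(LONG_SOA, 100, 200, 100)
```

With these made-up durations, the precentral manipulation has no effect at the short SOA but its full effect at the long SOA, whereas the central manipulation has the same effect at both SOAs. A factor whose effect shrinks as SOA decreases is therefore taken to influence a process that does not require central attention.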
Focusing first on the lexical route, Reynolds and Besner (2006) used long-lag repetition priming (with a lag of about 80 items) as an index of processing up to and including the orthographic input lexicon (e.g., Visser & Besner, 2001) in the context of the PRP paradigm. Their results demonstrated clearly that the benefit of repeating a word (when reading aloud) does not require central attention. That said, Ruthruff, Allen, Lien, and Grabbe (2008) have provided evidence that less skilled readers need central attention for lexical-level activation.
Reynolds and Besner (2006; see also O’Malley, Reynolds, Stolz, & Besner, 2008) also studied sublexical processing. Across all of their manipulations (including nonword repetition), processing along the sublexical route required central attention. The conclusion is clear: In skilled readers, lexical processing up to and including the orthographic input lexicon does not require central attention, whereas sublexical processing does.
Given that we know lexical activation up to the level of the orthographic input lexicon does not require central attention, what about semantic activation that follows such lexical activation? Using the Stroop task as an index of semantic processing in the PRP paradigm, Fagot and Pashler (1992) claimed that the standard Stroop effect required central attention. They therefore concluded that semantic processing cannot be capacity free. One problem with this conclusion is that the standard Stroop effect (when the irrelevant word belongs to the response set) indexes more than just semantic processing (it also indexes a large response-competition component). A direct test of the extent to which semantic processing requires central attention calls for the use of the semantic Stroop task described earlier. Besner and Reynolds (2014) and White and Besner (2015) addressed this issue with the semantic Stroop effect and found that, like the standard Stroop effect, it too requires central attention. This result is contrary to a central definition of an automatic process as not being capacity limited (e.g., as explicitly claimed by Neely & Kahan, 2001).
Executive Attention
A defining attribute of an automatic process is that it cannot be controlled. Thus, the processes underlying word identification should be insensitive to variations in context. Recent research, however, suggests a rather more dynamic view of the reading system that includes an executive attentional system that monitors and controls processing as a function of situational demands (e.g., Balota & Yap, 2006; Fernandez-Duque, Baird, & Posner, 2000).
Control over the balance between lexical and sublexical routes
As noted above, in dual-route models of reading aloud, the lexical pathway must be strong enough to read exception words correctly, and the sublexical pathway must be strong enough to read nonwords correctly. Simulations with a computational model (e.g., Coltheart et al., 2001) have shown that it is possible for a mixed list of exception words and nonwords to all be read accurately, confirming that a parameter set exists that satisfies the need for a strong-enough lexical route to avoid regularizations coupled with a strong-enough sublexical route to avoid lexicalizations. The unanswered question remains: Can the relative emphasis on the lexical versus sublexical pathways be controlled?
To address this question, Reynolds and Besner (2005, 2008) adopted a paradigm from the task-switching literature (see Shafiullah & Monsell, 1999) that involves participants predictably switching between two tasks in an AABB fashion. The presence of switch costs (i.e., A to B and B to A, relative to A to A and B to B) is used to index the involvement of executive control. In Reynolds and Besner’s adaptation of this paradigm, participants switched between reading different types of words (regular words, irregular words, nonwords) rather than switching between different tasks. The central issue here was whether there was a cost when switching from reading exception words to reading nonwords and vice versa. If there were, then this would imply that the relative emphasis on the lexical versus sublexical pathways can be controlled.
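The switch-cost measure described above can be sketched as the mean response time on trials whose stimulus type differs from that of the preceding trial minus the mean on trials whose type repeats. The function name and the trial values below are hypothetical and are not data from Reynolds and Besner (2005, 2008).

```python
def switch_cost(trials):
    """Mean RT on switch trials minus mean RT on repeat trials (ms).

    trials is a list of (stimulus_type, rt_ms) pairs in presentation
    order. The first trial has no predecessor and is excluded.
    """
    switch_rts, repeat_rts = [], []
    for (prev_type, _), (cur_type, rt) in zip(trials, trials[1:]):
        if cur_type != prev_type:
            switch_rts.append(rt)
        else:
            repeat_rts.append(rt)
    return sum(switch_rts) / len(switch_rts) - sum(repeat_rts) / len(repeat_rts)


# Hypothetical AABB sequence alternating exception words and nonwords.
example_trials = [
    ("exception", 600), ("exception", 580),
    ("nonword", 700), ("nonword", 660),
    ("exception", 640), ("exception", 590),
]
cost = switch_cost(example_trials)
```

A positive value is the pattern taken to index the involvement of executive control; a value near zero corresponds to the absence of a switch cost.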
Critically, Reynolds and Besner (2005, 2008) reported just such a switch cost. In contrast, no switch costs were observed when participants switched between reading regular words and irregular words, nor when they switched between reading regular words and nonwords. This observation makes sense theoretically because regular words can be read correctly regardless of which pathway dominates performance, and as such there is no need to reconfigure any parameters (i.e., no “switching” is necessary). Nonetheless, even when switch costs were absent, implying a consistent balance between routes across trials, the word-frequency effect was significantly larger for the same set of regular items when mixed with irregular words in one experiment than when mixed with nonwords in another experiment. This implies that the lexical route had more of an influence in the former case than in the latter one. The general conclusion is clear: Individuals are able to exert control over the relative strength of lexical and sublexical routines when print is converted to sound. These findings are at odds with the idea that reading is automatic in the sense of not being open to control.
Control over processing dynamics
More evidence consistent with controlled processing has come from work demonstrating contextual influences over how information flows through the word-processing system. All computational accounts to date have implemented cascaded processing along the lexical route. In such processing, activation accrual across modules overlaps in time; there are no discrete stages (McClelland, 1979). Consistent with this idea, word frequency and stimulus quality interact when reading aloud (O’Malley & Besner, 2008), and this interaction can be simulated in a cascaded model (Reynolds & Besner, 2004). However, this interaction holds only when the list consists entirely of words; when nonwords are randomly intermixed, word frequency and stimulus quality are additive factors (O’Malley & Besner, 2008; relatedly, see Besner, O’Malley, & Robidoux, 2010).
Clearly, how different factors jointly influence reading aloud changes as a function of the context, inconsistent with the idea that the processing dynamics are fixed. A module or modules that monitor the makeup of stimulus elements in the list are called for, along with control processes that determine how processing unfolds in response to list composition.
Conclusions
We have reviewed a number of examples of how various kinds of attention (i.e., spatial attention, central attention, executive attention) play a role in the explicit and implicit identification of individual words. Given these and other results, we suggest that it is time for the field to discard the view that lexical, sublexical, and semantic processing in visual word identification are automatic in the various senses noted here.
Future Directions
Moving away from the idea that word identification is “attention-free” opens a number of avenues for future research, including (a) determining how attention, in its various forms, contributes to word identification (see Risko, Stolz, & Besner, 2010, for one such attempt) rather than focusing on whether or not attention is involved; (b) exploring the role of reading skill level; (c) implementing attention, in its various forms, into extant computational models of reading; and (d) examining whether the conclusions drawn here about performance at the behavioral level also apply at the neural level when event-related potentials, fMRI, and other brain-based measures are employed.
Acknowledgements
We are grateful to D. Balota, M. Augustinova, and an anonymous reviewer for their comments.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
