Abstract
The three experiments reported here demonstrated a cross-modal influence of an auditory rhythm on the temporal allocation of visual attention. In Experiment 1, participants moved their eyes to a test dot with a temporal onset that was either synchronous or asynchronous with a preceding auditory rhythm. Saccadic latencies were faster for the synchronous condition than for the asynchronous conditions. In Experiment 2, the effect was replicated in a condition in which the auditory context stopped prior to the onset of the test dot, and the effect did not occur in a condition in which auditory tones were presented at irregular intervals. Experiment 3 replicated the effect using an accuracy measure within a nontimed visual task. Together, the experiments’ findings support a general entrainment perspective on attention to events over time.
Many sources of natural stimulation in the environment (e.g., biological motion, vocal communication) have explicitly rhythmic and predictable patterns (Winfree, 2000). Consider the temporal characteristics of a variety of commonplace activities, such as the rhythmic interplay of speech and gesture in a conversation, the coordinated movements of runners in a race, and the rhythmic and often synchronous movements of the audience at a rock concert. Recently, interest in what such rhythms can tell researchers about the temporal aspects of attention has increased (Barnes & Jones, 2000; Coull, Frith, Büchel, & Nobre, 2000; Large & Jones, 1999; Schroeder, Lakatos, Chen, Radman, & Barczak, 2009). Accumulating behavioral and neural evidence suggests that sensory rhythms have the potential to entrain (synchronize) attentional processes. The evidence suggests that perceivers give maximal attention to expected time points and that such entrainment occurs in both the auditory (e.g., the next beat in a song) and the visual (e.g., the next flash in a flashing red light) domains (Jones, 1976; Large & Jones, 1999). The implication of such findings is that sensory rhythms drive a periodic series of attentional peaks and troughs that occur at roughly equal temporal intervals (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999).
Rhythms in Auditory Attention
Behavioral evidence for attentional entrainment in the auditory modality comes from several sources. Some studies have examined overt motor tracking of tone sequences and revealed that individuals show less variability and greater accuracy in responding to rhythmically simple sequences than in responding to complex or irregularly timed sequences (Jones & Pfordresher, 1997; Large, Fink, & Kelso, 2002; Large & Palmer, 2002). Other studies have shown that when listeners are asked to detect or discriminate changes in event sequences, performance is better for rhythmically expected targets than for unexpected targets. Thus, the common behavioral finding across studies of entrainment of auditory attention is that rhythmically expected events (i.e., synchronous events) are better detected and discriminated than early or late events (i.e., asynchronous events; Jones, Boltz, & Kidd, 1982; Jones, Moynihan, MacKenzie, & Puente, 2002; Jones & Yee, 1997; McAuley & Jones, 2003). Converging work in neuroscience has shown that oscillations in the auditory cortex are hierarchically organized, with low-frequency oscillations modulating higher-frequency oscillations. This organization allows the auditory cortex to align its temporal activity with rhythmic inputs (Lakatos et al., 2005).
Rhythms in Visual Attention
There is also behavioral and neural evidence that visual rhythms can drive the temporal allocation of visual attention. For example, monitoring of visual sequences is enhanced in simple (compared with complex) rhythmic contexts (Jones & Skelly, 1993). Reaction times and sequence learning are similarly influenced by the rhythmic structure of visual event sequences (Olson & Chun, 2001). Moreover, microsaccadic movements of the eye occur rhythmically, and the temporal structure of these microsaccades predicts the speed of behavioral responses to stimulus change (Bosman, Womelsdorf, Desimone, & Fries, 2009). Further, more visual attention is allocated when a stimulus is temporally expected than when it is unexpected (Doherty, Rao, Mesulam, & Nobre, 2005). Converging neuroscience support comes from the finding that delta oscillations in V1 entrain to the rhythm of a stream of visual input if the input is rhythmic (Lakatos, Karmos, Mehta, Ulbert, & Schroeder, 2008).
Cross-Modal Links in Attention
General links between auditory and visual attention are extensive and have been demonstrated in the domains of spatial attention (Driver & Spence, 1998a, 1998b; Spence & Driver, 1997), perceptual processing (for a review, see Vroomen & de Gelder, 2004), and saccadic eye movements (Colonius & Arndt, 2001; Frens, Van Opstal, & Van der Willigen, 1995). Much of this research has focused on the spatial aspects of cross-modal attention rather than the temporal aspects that are the focus here. However, it has been suggested that sound can alter temporal aspects of vision such as perceived duration and rate (Walker & Scott, 1981). Furthermore, there is evidence that neural oscillations may be linked to early integrative multisensory processing in that gamma-band oscillations are sensitive to temporal aspects of multisensory stimuli, responding best to multisensory components that occur with the closest synchrony (Senkowski, Talsma, Grigutsch, Herrmann, & Woldorff, 2007).
Overview of the Current Study
The growing literature on the temporal allocation of attention has largely focused on cuing attention to specific temporal intervals during which stimuli are particularly likely to occur (Coull et al., 2000; for a review, see Nobre, 2010). Fewer studies have focused on the potential for attentional entrainment by stimulus rhythms (Doherty et al., 2005; Jones et al., 2002; Large et al., 2002; Sanabria, Capizzi, & Correa, 2011), and those that have considered attentional entrainment have been limited to a single modality. However, if entrainment is a general property of the attentional system (Jones, 1976; Large, 1999; Large & Palmer, 2002), then it should operate both within and across modalities. To date, no other researchers have specifically examined how entrainment by a rhythmic stimulus in one modality may influence the temporal allocation of attention in another modality. Toward this end, we considered the possibility of cross-modal interactions in the temporal allocation of attention, focusing on effects of entrainment by a rhythm in the auditory modality on the temporal allocation of attention to stimuli in the visual modality.
The possibility that such cross-modal interactions exist is consistent with the view of attention as a limited resource that is shared among multiple modalities (Kahneman, 1973). We reasoned that two different, specific patterns of influence might be consistent with this view. First, entrainment of attention in one modality may similarly serve to entrain attention in another modality such that attentional peaks are synchronized between the two modalities. We refer to this as the correspondence account of attentional entrainment. Conversely, a conflict account predicts that attentional entrainment in one modality will suppress attention in another modality, such that when attention is at its peak in the entrained modality, attention is minimized in the other modality. Thus, with respect to the current work, the term conflict simply refers to the idea that a limited attentional capacity is exhausted by the entrained modality, and that this exhaustion results in a reduction of attentional resources available for detection of the onset of a target in another modality. Note that both of these accounts predict a cross-modal effect of entrainment. Alternatively, an independent-effect account predicts that attentional entrainment in one modality will not affect the allocation of attention in another modality. This account is consistent with models of attention that emphasize independent processing of cross-modal stimuli (Bonnel & Hafter, 1998; Shiffrin & Grantham, 1974).
The current study consisted of three experiments that assessed these three predictions regarding cross-modal attentional entrainment. All three experiments examined the effect of entrainment by an auditory rhythm on the allocation of attention in vision. Experiment 1 explored whether auditory entrainment affects the allocation of visual attention by examining the effect of a predictable auditory rhythm on saccade latencies. Experiment 2 assessed whether the effect would be observed when the auditory rhythm needed to be extrapolated beyond its actual presentation and whether the effect requires the auditory stimulus to be rhythmic. Finally, Experiment 3 examined the influence of a rhythmic auditory stimulus on accuracy (rather than latencies) using a visual gap judgment task.
Experiment 1
Experiment 1 examined the effect of auditory attentional entrainment on visual attention. Figure 1a shows the paradigm. On each trial, participants were entrained to an auditory rhythm and then made a saccade from a central fixation point to a test dot that appeared at the end of the auditory rhythm. Critically, the onset of the test dot varied such that it was in synch with the final tone of the auditory rhythm or out of synch with it. Generally, visual attention is tied to gaze position and precedes the actual movement of the eyes to a new location (Deubel & Schneider, 1996; Hoffman & Subramaniam, 1995; Kowler, Anderson, Dosher, & Blaser, 1995). Thus, saccade latencies can be used as an index of visual attention, with faster saccade latencies indicative of enhanced visual attention to the saccade target.

Fig. 1. Schematics illustrating the paradigms used in the experiments. In Experiment 1 (a), observers saw a central fixation point, and 10 tones sounded. The interonset interval (IOI) between tones (other than the last) was fixed at 600 ms. At the end of the trial, a visual test dot was presented in one of the four corners of the display, and the fixation dot disappeared. Trials in the extrapolation condition of Experiment 2 (b) were identical to the trials in Experiment 1 except that the final tone was omitted. In the irregular-timing condition in Experiment 2 (c), the IOIs between tones were varied randomly, with the constraint that the timing of the onset of the last tone relative to the first tone was the same as in the extrapolation condition. Otherwise, the trials were the same as in the extrapolation condition of Experiment 2. The trial structure in Experiment 3 (d) was the same as the trial structure of the extrapolation condition in Experiment 2 except that there were 7 tones rather than 9 and the visual stimulus was a Landolt square rather than a test dot. Large boxes represent visual displays.
Method
Participants
Twenty undergraduates from the University of Notre Dame participated in the experiment in return for course credit. All had normal or corrected-to-normal vision.
Stimuli and design
Visual stimuli were dots displayed on a 19-in. computer monitor at a resolution of 1,024 × 768 pixels. Viewing distance was 56 cm from the center of the screen, with the total display subtending approximately 34° × 27° and the fixation dots subtending 0.6° × 0.6°. Test dots appeared in one of the four corners of the screen, with distance from the central fixation point held constant at 19.8°. A sequence of ten 60-ms tones occurred, with a fixed 600-ms interonset interval (IOI) between tones. The total duration of the auditory rhythm, including the duration of the last tone, was 5,460 ms (nine 600-ms IOIs plus the final 60-ms tone). The first tone coincided with the onset of the initial fixation dot. Across trials, the test dot appeared at one of five onset intervals, which defined three onset conditions: in synch (600 ms after the 9th tone), slightly out of synch (600 ± 21 ms after the 9th tone), and very much out of synch (600 ± 76 ms after the 9th tone; minus signs indicate onsets that occurred too early to fit the pattern established by the earlier tones, and plus signs indicate onsets that occurred too late).¹ The test dot occurred at each of the five intervals with equal frequency (i.e., each interval was used on 20% of trials), so the in-synch condition made up 20% of trials and each out-of-synch condition made up 40%. The dependent measure was saccade latency, the time from onset of the test dot to initiation of the saccade toward that dot.
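The timing arithmetic of an Experiment 1 trial can be made concrete with a short sketch. It assumes that all times are measured from the onset of the first tone and that trial order was shuffled (the article states only that each interval occurred on 20% of trials). The constant and function names are hypothetical; the numeric values come from the description above.

```python
import random

# Values from the Stimuli and design section (all times in ms).
TONE_COUNT = 10
TONE_DURATION_MS = 60
IOI_MS = 600
# Signed offsets of test-dot onset from the rhythmically expected time:
# negative = too early, positive = too late, 0 = in synch.
ONSET_OFFSETS_MS = (-76, -21, 0, 21, 76)


def rhythm_duration(count=TONE_COUNT, ioi=IOI_MS, tone_ms=TONE_DURATION_MS):
    """Time from the first tone's onset to the last tone's offset."""
    return (count - 1) * ioi + tone_ms


def dot_onset_ms(offset_ms, ioi=IOI_MS):
    """Test-dot onset: one IOI after the 9th tone (at 8 * IOI), shifted by offset_ms."""
    ninth_tone = 8 * ioi
    return ninth_tone + ioi + offset_ms


def condition_label(offset_ms):
    """Map a signed offset onto the three onset conditions."""
    if offset_ms == 0:
        return "in synch"
    if abs(offset_ms) == 21:
        return "slightly out of synch"
    return "very much out of synch"


def build_trial_list(n_trials=40, seed=None):
    """Balanced, shuffled list of offsets, each used on 20% of trials.

    Shuffling is an assumption; the article does not say how trial order
    was determined.
    """
    if n_trials % len(ONSET_OFFSETS_MS):
        raise ValueError("n_trials must be a multiple of the number of offsets")
    reps = n_trials // len(ONSET_OFFSETS_MS)
    trials = [off for off in ONSET_OFFSETS_MS for _ in range(reps)]
    random.Random(seed).shuffle(trials)
    return trials


print(rhythm_duration())                                 # 5460
print([dot_onset_ms(o) for o in ONSET_OFFSETS_MS])       # [5324, 5379, 5400, 5421, 5476]
```

With 40 test trials, each offset appears 8 times, so the in-synch condition contributes 8 trials and each out-of-synch condition contributes 16.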
Apparatus
Eye movements were recorded using a head-mounted EyeLink II eye tracker (SR Research, Chicago, IL) that recorded the position of the pupil of the left eye with a sampling rate of 500 Hz. Participants were positioned in stationary chairs to maintain viewing distance and listened to sounds through two mono speakers that were spatially centered.
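The article does not say how saccade onset was extracted from the 500-Hz gaze samples; EyeLink systems also provide their own online saccade detection, which may have been used instead. As an illustration only, the sketch below assumes a simple velocity-threshold rule: latency is the time from test-dot onset to the first sample at which gaze speed exceeds a hypothetical 30°/s criterion. The function name, the threshold, and the two-point velocity estimate are all assumptions.

```python
import math

SAMPLE_RATE_HZ = 500             # sampling rate reported in the Apparatus section
VELOCITY_THRESHOLD_DEG_S = 30.0  # hypothetical onset criterion


def saccade_latency_ms(x_deg, y_deg, target_index,
                       rate_hz=SAMPLE_RATE_HZ,
                       threshold=VELOCITY_THRESHOLD_DEG_S):
    """Latency from test-dot onset to the first supra-threshold sample.

    x_deg, y_deg: gaze position in degrees, one value per sample.
    target_index: index of the first sample at or after test-dot onset.
    Returns None if no sample after target onset exceeds the threshold.
    Movements before target_index are ignored here; premature saccades
    would need a separate check.
    """
    dt = 1.0 / rate_hz
    for i in range(max(target_index, 1), len(x_deg)):
        # Two-point velocity estimate between consecutive samples (deg/s).
        speed = math.hypot(x_deg[i] - x_deg[i - 1],
                           y_deg[i] - y_deg[i - 1]) / dt
        if speed > threshold:
            return (i - target_index) * 1000.0 / rate_hz
    return None
```

At 500 Hz each sample spans 2 ms, so this rule resolves latency only to the nearest 2 ms; real pipelines usually smooth velocity and add an acceleration criterion.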
Procedure
Each trial was preceded by a drift correction to ensure that the participant was fixating centrally. A sequence of auditory tones began playing as soon as the trial began, and the participant was not given an explanation for the presence of the tones. During the trial, the participant fixated a dot at the center of the screen. The fixation dot then disappeared, and a visual test dot appeared in one of the four corners of the screen. The participant moved his or her eyes to the test dot as quickly as possible. The next trial began after a saccade had been made from fixation and the participant had refixated for 1 s. Participants completed 10 practice trials and then 40 test trials. Errors were defined as saccades that were not directed toward the test dot or as premature saccades that left central fixation before the test dot appeared.
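The two error types can be expressed as a trial-level classification rule. The article does not state how "not in the direction of the test dot" was operationalized, so this sketch assumes a hypothetical angular tolerance of 45° between the saccade vector and the fixation-to-target vector. The `Saccade` record and the function names are illustrative, not the authors' code.

```python
import math
from dataclasses import dataclass

MAX_DIRECTION_ERROR_DEG = 45.0  # hypothetical tolerance


@dataclass
class Saccade:
    onset_ms: float  # saccade start relative to test-dot onset (negative = before it)
    dx_deg: float    # horizontal amplitude, positive = rightward
    dy_deg: float    # vertical amplitude, positive = upward


def angular_difference_deg(ax, ay, bx, by):
    """Smallest absolute angle (0 to 180 deg) between vectors (ax, ay) and (bx, by)."""
    diff = math.degrees(math.atan2(ay, ax) - math.atan2(by, bx))
    return abs((diff + 180.0) % 360.0 - 180.0)


def classify_trial(saccade, target_x_deg, target_y_deg,
                   tolerance=MAX_DIRECTION_ERROR_DEG):
    """Return 'premature', 'wrong direction', or 'valid'."""
    if saccade.onset_ms < 0:
        return "premature"  # left fixation before the test dot appeared
    if angular_difference_deg(saccade.dx_deg, saccade.dy_deg,
                              target_x_deg, target_y_deg) > tolerance:
        return "wrong direction"
    return "valid"  # saccade.onset_ms is then the saccade latency
```

For example, a hypothetical upper-right target at (14°, 14°) lies about 19.8° from fixation, matching the eccentricity given in the Stimuli and design section.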
Results and discussion
Figure 2a shows average saccade latency as a function of test-dot onset. A one-way repeated measures analysis of variance on saccade latencies revealed a main effect of onset condition, F(2, 38) = 9.62, p < .05, ηp² = .391. Saccade latencies were significantly faster when test-dot onset was in synch with the preceding rhythm (M = 215.3 ms, SE = 6.8) than when it was slightly out of synch (M = 231.1 ms, SE = 7.5) or very much out of synch (M = 229.5 ms, SE = 7.2), ps < .025 (Bonferroni corrected). These results are consistent with the correspondence account of cross-modal entrainment: A rhythmic auditory stimulus affected visual attention such that attention was maximal at the point in time at which the next event was expected according to the time structure of the auditory stimulus.²
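To make the reported statistics concrete, the sketch below computes a one-way repeated measures ANOVA from a participants × conditions table of mean latencies. With 20 participants and 3 onset conditions, the degrees of freedom are (3 − 1, (20 − 1)(3 − 1)) = (2, 38), as reported. It also shows the standard identity ηp² = F·df_effect / (F·df_effect + df_error). This is a minimal textbook implementation, not the authors' analysis software; it applies no sphericity correction, and the function names are hypothetical.

```python
from statistics import fmean


def rm_anova_oneway(data):
    """One-way repeated measures ANOVA.

    data[s][c] is participant s's mean in condition c (complete data).
    Returns (F, df_effect, df_error, partial eta squared).
    No sphericity correction is applied.
    """
    n, k = len(data), len(data[0])
    grand = fmean(x for row in data for x in row)
    cond_means = [fmean(row[c] for row in data) for c in range(k)]
    subj_means = [fmean(row) for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual
    df_effect, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_effect) / (ss_error / df_error)
    return f, df_effect, df_error, ss_cond / (ss_cond + ss_error)


def partial_eta_squared_from_f(f, df_effect, df_error):
    """Recover partial eta squared from a reported F and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# Check against the extrapolation condition of Experiment 2, F(2, 72) = 5.67:
print(round(partial_eta_squared_from_f(5.67, 2, 72), 3))  # 0.136
```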

Fig. 2. Results of Experiments 1 through 3. Mean saccade latency as a function of test-dot onset time is shown for (a) Experiment 1, (b) the extrapolation condition in Experiment 2, and (c) the irregular-timing condition in Experiment 2. Percentage of correct answers in Experiment 3 (d) is shown as a function of Landolt-square onset time. Error bars indicate standard errors. Asterisks indicate a significant reduction in reaction time or a significant improvement in the percentage of correct answers in the synch condition relative to the two out-of-synch conditions (p < .05).
Experiment 2
Experiment 2 was designed to rule out two alternative explanations of Experiment 1’s results. First, in Experiment 1, test-dot onset varied both with respect to the preceding rhythm and with respect to the actual occurrence of the 10th tone. That is, when the onset of the test dot was synchronized to the preceding rhythm, it was also synchronized with the final tone; when the test dot was somewhat out of synch with the preceding rhythm, it was also asynchronous with the final tone. Because the simultaneous presentation of visual and auditory stimuli has been shown to improve both identification of and reaction time to visual targets (McDonald, Teder-Salejarvi, & Hillyard, 2000; Vroomen & de Gelder, 2000), one alternative explanation is that the observed effects were not a direct result of entrainment, but rather were an artifact of the simultaneous presentation of cross-modal stimuli (test dot and final tone). To address this issue, we arranged the extrapolation condition of Experiment 2 so that the auditory context ended before the presentation of the test dot; thus, the presentation of the test dot was not synchronized with a tone. If the effects observed in Experiment 1 were due to entrainment, then the same pattern of results would be expected in Experiment 2.
Second, it is possible that participants centered their attentional focus on the mean interval across the series of tones; this would produce an attentional peak at the onset of the test dot in the in-synch condition and would account for the results of Experiment 1 independently of entrainment. Therefore, in the irregular-timing condition of Experiment 2, the average IOI between tones was the same as in Experiment 1, but the tones occurred at irregular, unpredictable intervals. If the effects observed in Experiment 1 were in fact due to entrainment, no difference in saccade latencies across onset conditions would be expected in the irregular-timing condition.
Method
Forty participants took part in Experiment 2. We used the same method and stimuli as in Experiment 1, except that we eliminated the final tone in both the extrapolation condition (see Fig. 1b) and the irregular-timing condition, and we presented irregularly timed tones (IOIs varying from 200 to 800 ms, with an average IOI of 600 ms) in the irregular-timing condition (see Fig. 1c). The order of these rhythm conditions (extrapolation condition first or irregular-timing condition first) was blocked and counterbalanced across participants. Each block contained 80 trials, and participants were allowed a break between blocks.
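One way to generate the irregular-timing sequences is sketched below. Both conditions present 9 tones (the 10th is omitted), so there are 8 IOIs. Per the Fig. 1 caption, the last tone must fall where it does in the extrapolation condition, at 8 × 600 = 4,800 ms after the first tone, which fixes the mean IOI at 600 ms. The article does not describe the sampling procedure. This sketch assumes rejection sampling of integer-millisecond IOIs with a final shuffle, and its function names are hypothetical.

```python
import random
from itertools import accumulate


def irregular_iois(n_intervals=8, mean_ioi=600, lo=200, hi=800,
                   rng=None, max_tries=100_000):
    """Integer-ms IOIs in [lo, hi] whose sum is n_intervals * mean_ioi.

    Assumed procedure: draw all but one IOI uniformly, set the remaining
    IOI so the sum is exact, and retry if it falls outside [lo, hi].
    """
    rng = rng or random.Random()
    total = n_intervals * mean_ioi
    for _ in range(max_tries):
        head = [rng.randint(lo, hi) for _ in range(n_intervals - 1)]
        last = total - sum(head)
        if lo <= last <= hi:
            iois = head + [last]
            rng.shuffle(iois)  # so the constrained IOI is not always the last one
            return iois
    raise RuntimeError("no valid IOI sequence found")


def onsets_from_iois(iois):
    """Tone onsets (ms) with the first tone at 0."""
    return [0, *accumulate(iois)]


iois = irregular_iois(rng=random.Random(7))
print(iois, onsets_from_iois(iois)[-1])  # final onset is always 4800
```

Because the sum is fixed, the accepted IOIs are not independent uniform draws; any scheme meeting the stated range and total would serve equally well for illustration.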
Results and discussion
The data of 3 participants were eliminated from the analysis because more than 20% of their trials resulted in errors. Removal of these participants’ data did not change the pattern of results. There was no effect of the order of the rhythm conditions, F(1, 35) = 1.90, p > .05, and rhythm-condition order did not reliably interact with rhythm type (extrapolation vs. irregular timing) or onset interval. Consequently, we collapsed the remaining analyses over rhythm-condition order. There was a significant interaction between rhythm type and onset condition on saccade latency, F(2, 72) = 3.36, p < .05. Therefore, we examined the simple effect of test-dot onset interval on saccade latency separately for each rhythm condition.
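The exclusion rule and the step from trials to participant-level condition means can be sketched as follows. The sketch assumes that the 20% criterion is applied to each participant's full set of test trials and that error trials are recorded with a latency of `None`; the function name and data layout are hypothetical. The resulting rows are the participant × condition means that a repeated measures ANOVA takes as input.

```python
from collections import defaultdict
from statistics import fmean

MAX_ERROR_RATE = 0.20  # exclusion criterion stated in the text ("more than 20%")


def summarize(trials, conditions, max_error_rate=MAX_ERROR_RATE):
    """Exclude high-error participants and average latencies per condition.

    trials: iterable of (participant_id, condition, latency_ms), where
    latency_ms is None on an error trial.
    Returns ({participant_id: [mean latency per condition]}, [excluded ids]).
    """
    by_participant = defaultdict(list)
    for pid, cond, latency in trials:
        by_participant[pid].append((cond, latency))

    kept, excluded = {}, []
    for pid, rows in by_participant.items():
        error_rate = sum(lat is None for _, lat in rows) / len(rows)
        if error_rate > max_error_rate:
            excluded.append(pid)
            continue
        # fmean raises StatisticsError if a condition has no valid trials.
        kept[pid] = [fmean(lat for c, lat in rows if c == cond and lat is not None)
                     for cond in conditions]
    return kept, excluded
```

A participant with exactly 20% errors is kept, because the stated criterion is "more than 20%".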
Figure 2b summarizes mean saccade latency as a function of test-dot onset in the extrapolation condition. There was a significant difference in saccade latency across onset conditions, F(2, 72) = 5.67, p < .05, ηp² = .136. Saccade latencies were significantly faster when test-dot onset was in synch with the preceding rhythm (M = 209.32 ms, SE = 4.75) than when test-dot onset was slightly out of synch with the preceding rhythm (M = 219.73 ms, SE = 4.2) or very much out of synch with the preceding rhythm (M = 218.81 ms, SE = 4.15), ps < .025 (Bonferroni corrected). Thus, the faster saccade latencies in Experiment 1 for the in-synch condition were not due to the synchronous presentation of the final tone and the test dot.
Figure 2c summarizes mean saccade latency as a function of test-dot onset in the irregular-timing condition. There was no effect of onset condition on saccade latency, F(2, 72) = 0.04, p > .05, and this lack of effect suggests that the effect in Experiment 1 was not due to the predictability of the stimulus onset as defined by the average interval between tones.
Experiment 3
In Experiments 1 and 2, both the independent variable (synchrony between the auditory rhythm and the onset of the visual test stimulus) and the dependent variable (saccade latency) were temporal measures. In Experiment 3, we sought to replicate the entrainment effect with an accuracy measure, using a gap-detection task. In this task, participants were briefly presented with a Landolt square (a square with a small gap in one side) and asked to report the side in which the gap occurred. Figure 1d illustrates the paradigm used in Experiment 3.
Method
Participants
Sixteen undergraduates from the University of Notre Dame participated in the experiment in return for course credit. All had normal or corrected-to-normal vision.
Stimuli and design
The stimuli were white on a black background. During each trial, a fixation dot subtending 0.3° × 0.3° was presented, followed by a standard Landolt square subtending 1° × 1°. The square could occur in one of four locations on the screen: 3° directly above, below, left of, or right of the location in which the fixation dot had been presented. The square had a gap in either its left or its right side. The gap subtended approximately 0.11° and was always vertically centered in the side in which it occurred. A postmask subtending 1.5° × 1.5° covered the Landolt square after 100 ms. The auditory stimuli consisted of a series of seven 60-ms, 440-Hz tones, all with an IOI of 600 ms. We used the same extrapolation paradigm as in Experiment 2: The Landolt square appeared around the time at which an eighth tone would have occurred (exactly at that time in the in-synch condition), and consequently, auditory and visual stimuli were never presented concurrently. The onset of the Landolt square was either in synch (600 ms after the 7th tone), slightly out of synch (600 ± 21 ms after the 7th tone), or very much out of synch (600 ± 76 ms after the 7th tone; minus signs indicate onsets that occurred too early to fit the pattern established by the earlier tones, and plus signs indicate onsets that occurred too late). The Landolt square appeared at each of these five intervals with equal frequency (i.e., each interval was used on 20% of trials).
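Stimulus sizes here are given in degrees of visual angle. Converting them to on-screen extents requires the viewing distance and screen geometry, which are reported only for Experiment 1. The sketch below therefore assumes the 56-cm distance from Experiment 1 and a hypothetical visible screen width. It uses the standard relations: extent = 2·d·tan(θ/2) for a stimulus centered on the line of sight, and offset = d·tan(θ) for an eccentricity such as the 3° placement of the square. All names are illustrative.

```python
import math


def size_cm(angle_deg, distance_cm):
    """Extent (cm) of a stimulus centered on the line of sight that subtends angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def angle_deg(extent_cm, distance_cm):
    """Visual angle (deg) subtended by a centered extent; inverse of size_cm."""
    return math.degrees(2 * math.atan(extent_cm / (2 * distance_cm)))


def eccentricity_cm(ecc_deg, distance_cm):
    """On-screen offset from fixation for a given eccentricity."""
    return distance_cm * math.tan(math.radians(ecc_deg))


def cm_to_px(extent_cm, screen_width_cm, screen_width_px):
    """Convert a physical extent to pixels given the visible screen width."""
    return extent_cm * screen_width_px / screen_width_cm


DISTANCE_CM = 56.0      # assumed: taken from Experiment 1
SCREEN_WIDTH_CM = 36.0  # hypothetical visible width of the display
for name, deg in [("fixation dot", 0.3), ("Landolt square", 1.0),
                  ("gap", 0.11), ("postmask", 1.5)]:
    cm = size_cm(deg, DISTANCE_CM)
    print(name, round(cm, 3), "cm", round(cm_to_px(cm, SCREEN_WIDTH_CM, 1024), 1), "px")
```

Under these assumptions the 0.11° gap is only about 1 mm wide, which suggests why the task was demanding enough to show accuracy differences across onset conditions.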
Procedure
Participants focused on a fixation dot in the middle of the screen until the square appeared. They indicated the side of the square containing the gap by pressing “1” for left and “0” for right. The postmask remained on-screen until the response. Participants completed 20 practice trials and then two test blocks of 80 trials each.
Results
Figure 2d summarizes mean accuracy as a function of onset condition. There was a significant difference in accuracy across visual onset conditions, F(2, 30) = 3.52, p < .05, ηp² = .196. Accuracy was significantly higher when the onset of the square was in synch with the preceding rhythm (M = 91.3%, SE = 1.9) than when the onset of the square was slightly out of synch with the preceding rhythm (M = 87.8%, SE = 2.5) or very much out of synch (M = 88.7%, SE = 1.9), ps < .025 (Bonferroni corrected). In sum, results from Experiment 3 show that the cross-modal effect of auditory entrainment extends to an untimed visual discrimination task.
General Discussion
Three experiments examined cross-modal effects of auditory entrainment on the temporal allocation of visual attention. In Experiments 1 and 2, participants moved their eyes to a test dot that appeared in one of four corners of the screen with a temporal onset that was either synchronous or asynchronous with the periodic continuation of an auditory rhythm. In both experiments, saccade latencies were faster for the synchronous condition than for the asynchronous conditions. Experiment 2 revealed that the cross-modal effects of auditory entrainment did not depend on having a tone synchronous with the onset of the test dot, but did depend on the rhythmic timing of the auditory precursor: Irregularly timed auditory precursors abolished the cross-modal effect on visual attention. Experiment 3 tested the effect of auditory entrainment on visual attention using a gap judgment task. Results were consistent with those of Experiments 1 and 2, as gap judgments were more accurate in the synchronous condition than in the asynchronous conditions.
Together, these experiments provide support for the correspondence account of cross-modal entrainment. That is, results consistently revealed that perceivers allocated more visual attention to points in time that would satisfy an extrapolation of the auditory rhythm than to other points in time. These data offer a strong behavioral demonstration that the temporal distribution of visual attention can be altered by a rhythmic auditory stimulus.
Broadly, the present findings show that the effects of entrainment on the temporal allocation of attention are not modality-specific, but rather are more general. That is, the attentional system is, in general, more prepared to respond to a stimulus that occurs at the expected time, even when the expected time is not cued by the modality of that stimulus. This view is consistent with the findings of Lange and Röder (2006), who examined cross-modal interactions in the temporal allocation of attention to auditory and tactile stimuli. Those authors found that attending to either a short or a long temporal interval cued by a tactile stimulus enhanced performance for both tactile and auditory stimuli presented at the to-be-attended time point.
The present study is the first to show that a task-irrelevant auditory rhythm can serve to orient visual attention to time points that would fit the extrapolation of the auditory rhythm. Notably, the effects observed support an entrainment account of attentional processing: They did not emerge simply because visual stimuli occurred after an expected time interval had elapsed (irregular-timing condition in Experiment 2) or because visual stimuli were more likely to occur at particular time points (each onset interval was equally frequent). Because the experiments used different measures (saccade latency in Experiments 1 and 2 and gap-detection accuracy in Experiment 3), the results leave open whether this enhancement of attention reflects a criterion shift toward responding at synchronous moments in time, increased perceptual sensitivity, or both (for a review, see Nobre, 2010). Additional research is needed to tease apart these possibilities.
Schroeder and Lakatos (2009) proposed that the attentional system switches between two operating modes: a continuous mode that is linked to sustained states of vigilance and a rhythmic mode. They argued that the rhythmic mode may be the preferred state of the attentional system, because it allows an animal to leverage the temporal structure of the environment to selectively enhance processing at important time points. From this perspective, the present study provides strong evidence that such a default rhythmic mode of attention can operate cross-modally.
Footnotes
Acknowledgements
We thank Ashley Herrmann, Beck Roan, and Corinne Swearingen for their assistance in gathering the data in these experiments.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
