Abstract
Previous studies have found that processing of a second stimulus is slower when the modality of the first stimulus differs, which is termed the modality shift effect. Moreover, people tend to respond more slowly to the second stimulus when the two stimuli are similar in the semantic dimension, which is termed the nonspatial repetition inhibition effect. This study aimed to explore the modality shift effect on nonspatial repetition inhibition and whether such modulation was influenced by different temporal intervals. A cue–target paradigm was adopted in which modality priming and identity priming were manipulated at three interstimuli intervals. The results showed that the response times under the modality shift condition were slower than those under the modality repeat condition. In trials with modality shift, responses to congruent cues and targets were slower than to incongruent cue–target combinations, indicating crossmodal nonspatial repetition inhibition. The crossmodal nonspatial repetition inhibition effect decreased with increasing interstimuli interval. These results provide evidence that the additional intervening event proposed in previous studies is not necessary for the occurrence of crossmodal nonspatial repetition inhibition.
To improve the efficiency of a visual search, individuals tend to be slower in directing their attention to a previously attended location than to a novel location that was not previously attended (Klein, 2000; Posner et al., 1985; Posner & Cohen, 1984). Nonspatially, this mechanism was examined by Law et al. (1995), who used a prime-neutral cue–target paradigm in which attention was first attracted to one cued color patch. The response times (RTs) to the target were slower if the color patch of the target was identical to that of the prime. However, this nonspatial inhibition could occur only when a neutral distractor was presented between the prime and the target. According to the episodic retrieval account (Grison et al., 2005; Tipper et al., 2003), the representation of a neutral event could inhibit the retrieval of the prime when the prime appeared as the target.
Nonspatial repetition inhibition could be observed not only in the unimodal domain (Q. Chen et al., 2010; Fox & de Fockert, 2001; Francis & Milliken, 2003; Zhou & Chen, 2008) but also in the crossmodal domain (Chi et al., 2014; A. Wang et al., 2020; L. Wang et al., 2012; Wu et al., 2019, 2020). For example, the response to a target, such as the Chinese sound \hong\ (\rɛd\ in English), is delayed if the prime was the Chinese word 红 (red in English). The neutral event, which differed from both the prime and the target, was the Chinese word 蓝 (blue in English). Such repetition inhibition could also be observed when the prime was a sound and the target was a word, which is termed crossmodal nonspatial repetition inhibition (L. Wang et al., 2012). To explain these results, the revised episodic retrieval account (L. Wang et al., 2012) was proposed. When the prime was presented, the identity and modality of the prime could be coded as an episodic representation (Kahneman et al., 1992). According to instance theory (Logan, 1995), such coding refers to the obligatory encoding process in which attention to the cue stimuli can be stored in memory. If the target stimulus is associated with the cue stimulus, attention to the target can be retrieved from memory, which is termed obligatory retrieval. However, if a neutral event is presented before the target, a new episodic representation of the neutral event, which competes for attentional resources in the same modality as the cue, would inhibit the retrieval of the old representation of the cue. Because the nonspatial representations inhibited by repetition inhibition are supramodal (L. Wang et al., 2012), the delayed response could be detected when the old representation, which was delivered via different modalities, appeared as the target again.
Although the episodic retrieval account could explain both unimodal and crossmodal nonspatial repetition inhibition, previous studies consistently found that the effect size of crossmodal nonspatial repetition inhibition was larger than that of unimodal nonspatial repetition inhibition (A. Wang et al., 2020; L. Wang et al., 2012; Wu et al., 2019). To further explain the difference in effect size, L. Wang et al. (2012) proposed that the participants adopted an attentional set to make it easier to create a new representation of the target when the target's modality differed from that of the prime. Other researchers suggested that the difference in the modality processing of the target resulted in larger effects under the crossmodal condition (A. Wang et al., 2020; Wu et al., 2019). For example, retrieving an inhibited representation delivered via the auditory modality was more difficult than retrieving one delivered via the visual modality when the prime modality was visual. However, these explanations merely focused on the modality of the prime (Chi et al., 2014) or compared different functions of the target modality (A. Wang et al., 2020; Wu et al., 2019). More importantly, the additional shift cost between the different modalities of the prime and the target, which does not occur in the unimodal effect, was not considered. Thus, this study proposed the assumption that larger crossmodal nonspatial repetition inhibition might be caused by not only an inhibitory response to repeated representation due to a neutral event but also the additional cost of modality shift between different prime and target modalities. To test this assumption, it was necessary to investigate whether crossmodal nonspatial repetition inhibition could occur when no neutral event is presented between the prime and the target.
Evidence suggests that the modality of a first stimulus can affect the processing of a second stimulus. For example, when a series of lights and tones were presented randomly, the response to an imperative stimulus presented in a different modality from the preceding stimulus was slower than the response to one presented in the same modality. This effect is called the modality shift cost (modality shift effect [MSE]; Cohen & Rist, 1992; Post & Chapman, 1991) and reflects the insufficient capacity to deploy attention to sequential modalities (Turatto et al., 2002). When researchers investigated the nonspatial attention shifts among visual, auditory, and tactile modalities, the MSE could be observed under both expected and unexpected conditions (Spence et al., 2001). Interestingly, such cognitive cost occurs not only during the early perception stage (Töllner et al., 2009) but also during the semantic or conceptual processing stage (Scerrati et al., 2015, 2017). Thus, previous results provided plausible evidence suggesting that compared with unimodal conditions, the modality shift under crossmodal conditions might cause an additional cost that contributes to larger nonspatial repetition inhibition.
Taken together, this study investigated whether crossmodal nonspatial repetition inhibition could occur in the absence of a neutral event. One of the most direct ways to investigate this question is to remove the neutral event when adopting a cue–target paradigm in which two stimuli (the cue and the target) are presented in series at a central location. To compare the priming effect under the unimodal condition, the modality of the prime could be the same as or different from that of the target. In addition, it has been demonstrated that the processing time of modalities plays an important role in the crossmodal priming effect (Y. C. Chen & Spence, 2011). Thus, the interstimuli intervals (ISIs) were considered in the experimental design. Based on these manipulations, we predicted that if the MSE did not cause an additional cost of shifting between the prime and the target, a facilitatory effect could be observed under the crossmodal condition similar to the unimodal condition (Fuentes et al., 1999; Law et al., 1995; Spadaro et al., 2012). In contrast, nonspatial repetition inhibition might occur across modalities due to the modality shift cost. Finally, various cue–target temporal intervals could affect the repetition effect (List & Robertson, 2007; Zhou & Chen, 2008). We predicted that the repetition effect size would decrease with increasing ISIs.
Material and Methods
Participants
Twenty-five participants (11 males, 14 females, age range from 18 to 25 years) participated in Experiment 1, and another 24 participants (9 males, 15 females, age range from 19 to 25 years) participated in Experiment 2. All participants had normal/corrected-to-normal vision, were right-handed, did not have mental illness or color blindness, and were naive to the purpose of the experiment. In accordance with the Helsinki Declaration, written informed consent was obtained prior to their participation, and payment was provided after their participation. This experiment was approved by the Academic Committee of the Department of Psychology, Soochow University, China.
Stimuli and Apparatus
The stimuli were displayed on a Dell monitor (resolution: 1,920 × 1,080; refresh rate: 60 Hz; size: 24 in; model: Dell E2216HV). The visual and auditory stimuli contained two written words and their verbal sounds, which were presented in the form of Chinese characters [红 (\hong\) and 蓝 (\lan\), corresponding to red (\rɛd\) and blue (\blu\) in English]. The fixation point (“+,” 0.5° × 0.5°) and written words (2° × 2°) were placed in the center of a square box (2.5° × 2.5°) with a black frame throughout the experiment. The verbal sounds were delivered binaurally via stereo headphones. All stimuli were displayed on a gray background.
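The stimulus sizes above are given in degrees of visual angle. As a rough aid for reproducing the display, the following sketch converts those angles into on-screen extents using the standard relation size = 2 · d · tan(θ/2). It assumes the approximately 60 cm viewing distance reported in the procedure; the function name is illustrative and does not come from the original study.

```python
import math

def visual_angle_to_cm(angle_deg, distance_cm=60.0):
    """Physical extent (cm) that subtends angle_deg at distance_cm.

    Uses size = 2 * d * tan(theta / 2). The 60 cm default is the
    approximate viewing distance reported in the procedure.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# Angles reported for the fixation point, written words, and square box.
for label, angle in [("fixation", 0.5), ("word", 2.0), ("box", 2.5)]:
    print(f"{label}: {visual_angle_to_cm(angle):.2f} cm")
```

Converting these extents to pixels would additionally require the physical width of the visible screen area, which is not reported here.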
Design and Procedures
Experiment 1 adopted a 2 (Modality Priming: Modality Repeat and Modality Shift) × 2 (Identity Priming: Identity Congruent and Identity Incongruent) × 3 (ISIs: Short, Medium, and Long) within-subjects design. Under the modality repeat condition, the cue and target stimuli were presented in the same modality; otherwise, the condition was a modality shift condition. Under the identity congruent condition, the identity of the cue was the same as that of the target; otherwise, the condition was an identity incongruent condition.
The procedures of the experiment are depicted in Figure 1. At the beginning of each trial, a fixation point inside a square box was presented for a random duration (750–850 milliseconds) at the center of the screen. The cue was subsequently presented for 300 milliseconds. After an interval of 100, 500, or 900 milliseconds, the target was presented for 300 milliseconds. During the subsequent blank response window, the participants performed a discrimination task as quickly and accurately as possible; the window ended with the response or after 1,000 milliseconds. If the target was 红 (\hong\), the participants were to press F. If the target was 蓝 (\lan\), the participants were to press J. The response buttons were counterbalanced across the participants. The intertrial interval randomly ranged from 1,500 to 2,000 milliseconds. The outline of the square box did not disappear until the judgment was complete. Before the formal experiment, the participants were informed of the instructions and task and completed a practice block that included 96 trials. The participants did not enter the formal experiment until their accuracy during the practice block was at least 90%. The procedures of the practice block were identical to those of the formal experiment. There were 10 blocks in the formal experiment, and each block consisted of 96 trials covering the 12 experimental conditions. The cue was uninformative regarding both the modality and identity of the target. The cue and the target could each be 红 or 蓝 (50% each) and could be randomly presented in the same or different modalities. The ISIs varied across trials in a randomly selected order. The participants sat approximately 60 cm from the screen and were required to maintain their attention on the fixation point. There was a 1-minute rest between blocks. The entire experiment lasted for approximately 1.5 hours.
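To make the trial structure concrete, the following sketch builds one 96-trial block. It is not the authors' experiment script. It assumes that trials were balanced by fully crossing cue modality, target modality, cue identity, target identity, and ISI (48 cells, each presented twice per block); the text states only that each block contained 96 trials and 12 conditions. All function and field names are hypothetical.

```python
import itertools
import random
from collections import Counter

MODALITIES = ("visual", "auditory")
IDENTITIES = ("红", "蓝")
ISIS_MS = (100, 500, 900)

def make_block(n_trials=96, seed=None):
    """Return one shuffled block of cue-target trials.

    Assumption: cue modality, target modality, cue identity, target
    identity, and ISI are fully crossed (2 * 2 * 2 * 2 * 3 = 48 cells),
    and every cell is repeated equally often within the block.
    """
    cells = list(itertools.product(
        MODALITIES, MODALITIES, IDENTITIES, IDENTITIES, ISIS_MS))
    reps, remainder = divmod(n_trials, len(cells))
    if remainder:
        raise ValueError("n_trials must be a multiple of 48")
    trials = []
    for cue_mod, tgt_mod, cue_id, tgt_id, isi in cells * reps:
        trials.append({
            "cue_modality": cue_mod,
            "target_modality": tgt_mod,
            "cue_identity": cue_id,
            "target_identity": tgt_id,
            # The two priming factors are derived from the cue-target pair.
            "modality_priming": "repeat" if cue_mod == tgt_mod else "shift",
            "identity_priming": "congruent" if cue_id == tgt_id else "incongruent",
            "isi_ms": isi,
        })
    random.Random(seed).shuffle(trials)
    return trials

block = make_block(seed=1)
counts = Counter(
    (t["modality_priming"], t["identity_priming"], t["isi_ms"]) for t in block)
print(len(block), len(counts), set(counts.values()))
```

Under this crossing, each of the 12 design conditions receives 8 trials per block, so the cue predicts neither the modality nor the identity of the target.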

Figure 1. Schematic Diagram of the Experimental Procedures. A visual word or an auditory sound was presented as a cue first. After a random ISI (100 milliseconds, 500 milliseconds, or 900 milliseconds), a visual word or an auditory sound was presented as the target. The cue and target could have the same or different semantic identities and could be delivered via either the same or different modalities. ISI = interstimuli interval.
Experiment 2 was a replication study that adopted a 2 (Modality Priming: Modality Repeat and Modality Shift) × 2 (Identity Priming: Identity Congruent and Identity Incongruent) × 2 (ISIs: Short and Long) within-subjects design. The procedures were the same as those described in Experiment 1, except that the total number of trials was 640, divided into 8 blocks. The whole experiment lasted for approximately 1 hour.
Results
Trials with extreme RTs (shorter than 100 milliseconds or longer than 1,000 milliseconds), incorrect responses, and RTs beyond ± 3 SDs were excluded from the analysis (5.6% in Experiment 1 and 4.7% in Experiment 2). The data from three participants (one male and two females) were excluded due to their low accuracy in Experiment 1 (all below 90%). The participants' mean RTs for correct responses were analyzed via a three-way within-subjects analysis of variance. For the two-way interaction between modality priming and identity priming, we predicted that a repetition priming effect would be observed under the modality repeat condition and a repetition inhibition effect under the modality shift condition. Thus, we compared the simple effects of identity priming at each level of modality priming. For the three-way interaction, we predicted that both the repetition priming effect size and the repetition inhibition effect size would decrease with increasing ISIs. Thus, we compared the repetition effects under different ISI levels. The data from Experiment 1 (see Table 1) and Experiment 2 (see Table 2) are plotted in Figures 2 and 3, respectively.
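The exclusion rules and the repetition effect can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that the ± 3 SD rule is applied to the trials that survive the accuracy and RT-window rules, pooled within whatever list is passed in; the text does not say whether SDs were computed per participant or per condition. The function names and the toy data are hypothetical.

```python
from statistics import mean, stdev

def clean_trials(trials, lo_ms=100, hi_ms=1000, n_sd=3.0):
    """Apply the three exclusion rules to a list of trial dicts.

    Each trial needs 'rt' (ms) and 'correct' (bool). Assumption: the
    3-SD rule uses the mean and SD of the trials left after removing
    errors and RTs outside the 100-1,000 ms window.
    """
    kept = [t for t in trials if t["correct"] and lo_ms <= t["rt"] <= hi_ms]
    if len(kept) < 2:
        return kept
    rts = [t["rt"] for t in kept]
    m, sd = mean(rts), stdev(rts)
    return [t for t in kept if abs(t["rt"] - m) <= n_sd * sd]

def repetition_effect(trials):
    """Mean RT on identity-congruent minus identity-incongruent trials."""
    congruent = [t["rt"] for t in trials if t["congruent"]]
    incongruent = [t["rt"] for t in trials if not t["congruent"]]
    return mean(congruent) - mean(incongruent)

# Hypothetical toy data for one participant in one modality/ISI cell.
toy = (
    [{"rt": 500, "correct": True, "congruent": False} for _ in range(10)]
    + [{"rt": 520, "correct": True, "congruent": True} for _ in range(9)]
    + [
        {"rt": 950, "correct": True, "congruent": True},    # beyond 3 SDs
        {"rt": 50, "correct": True, "congruent": True},     # too fast
        {"rt": 1200, "correct": True, "congruent": False},  # too slow
        {"rt": 510, "correct": False, "congruent": True},   # error trial
    ]
)
cleaned = clean_trials(toy)
effect = repetition_effect(cleaned)
print(len(cleaned), effect)
```

With this sign convention, a positive value indicates repetition inhibition (slower responses on congruent trials) and a negative value indicates repetition priming.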

Figure 2. Mean RTs and IOR effect size for each condition in Experiment 1. (A) Short ISI condition, (B) medium ISI condition, and (C) long ISI condition (the mean response times are a function of modality priming and identity priming in Experiment 1). (D) The cueing effect (identity_congruent minus identity_incongruent) under each experimental condition. Error bars represent the standard deviation of the mean. *p < .05, **p < .01, ***p < .001, ns p > .05.

Figure 3. Mean RTs for each condition in Experiment 2. (A) Short ISI condition and (B) long ISI condition (the mean response times are a function of modality priming and identity priming in Experiment 2). Error bars represent the standard deviation of the mean. *p < .05, **p < .01, ***p < .001, ns p > .05.
Table 1. Mean Response Times (M ± SD, Milliseconds) and Repetition Effect Sizes (identity_congruent minus identity_incongruent; Milliseconds) Under Each Condition in Experiment 1.
Note. ISI = interstimuli interval.
Table 2. Mean Response Times (M ± SD, Milliseconds) and Repetition Effect Sizes (identity_congruent minus identity_incongruent; Milliseconds) Under Each Condition in Experiment 2.
Note. ISI = interstimuli interval.
Experiment 1
With regard to our hypothesis, the interaction between identity priming and modality priming was significant, F(1, 21) = 100.76, p < .001, η2 = .84. This interaction revealed a repetition priming effect under the modality repeat condition and a repetition inhibition effect under the modality shift condition, F(1, 21) = 10.85, p < .01, η2 = .34 and F(1, 21) = 27.01, p < .001, η2 = .56, respectively. The three-way interaction was significant, F(2, 42) = 19.58, p < .001, η2 = .48. To examine how the ISIs modulated the identity priming effect (identity_congruent minus identity_incongruent), we compared the repetition priming effects or repetition inhibition effects at different ISIs. The results revealed that the repetition priming effect under the short ISI condition was larger than that under the medium and long ISI conditions, 900 milliseconds versus 500 milliseconds, t(21) = −2.56, p < .05, Cohen’s d = 0.56; 900 milliseconds versus 100 milliseconds, t(21) = −4.84, p < .001, Cohen’s d = 1.03; 500 milliseconds versus 100 milliseconds, t(21) = −2.87, p < .01, Cohen’s d = 0.6, and that the repetition inhibition effect under the short and medium ISI conditions was larger than that under the long ISI condition, 900 milliseconds versus 500 milliseconds, t(21) = 4.34, p < .001, Cohen’s d = 0.92; 900 milliseconds versus 100 milliseconds, t(21) = 2.59, p < .05, Cohen’s d = 0.53; 500 milliseconds versus 100 milliseconds, t(21) = −0.13, p = .9, Cohen’s d = 0.04.
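As a reading aid for the effect sizes, the following sketch recovers η2 from a reported F ratio and its degrees of freedom. It assumes that the reported η2 values are partial eta squared, which the text does not state explicitly; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio.

    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
    Assumption: the reported eta-squared values are partial eta squared.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F ratios and degrees of freedom as reported for Experiment 1.
reported = [
    ("repetition priming, modality repeat", 10.85, 1, 21),
    ("repetition inhibition, modality shift", 27.01, 1, 21),
    ("three-way interaction", 19.58, 2, 42),
]
for label, f_value, df1, df2 in reported:
    print(f"{label}: {partial_eta_squared(f_value, df1, df2):.2f}")
```

Under that assumption, these three F ratios reproduce the reported values of .34, .56, and .48.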
The main effects of the ISIs (faster RTs with long ISIs) and modality priming (slower RTs with a modality shift) were observed, F(2, 42) = 76, p < .001 and F(1, 21) = 15.16, p < .001, respectively. The main effect of identity priming did not reach significance, F(1, 21) = 0.77, p = .39. The interaction between the ISIs and modality priming was significant, F(2, 42) = 46.91, p < .001. This interaction revealed a significant MSE under the short ISI condition, F(1, 21) = 56, p < .001, and no MSE under the medium, F(1, 21) = 1.22, p = .28, and long, F(1, 21) = 0.9, p = .35, ISI conditions. The interaction between the ISIs and identity priming did not reach significance, F(2, 42) = 1.23, p = .30.
Experiment 2
With regard to our hypothesis, the interaction between identity priming and modality priming was significant, F(1, 23) = 88, p < .001, η2 = .79. This interaction revealed a repetition priming effect under the modality repeat condition and a repetition inhibition effect under the modality shift condition, F(1, 23) = 45.55, p < .001, η2 = .66 and F(1, 23) = 40.08, p < .001, η2 = .64, respectively. The three-way interaction was significant, F(1, 23) = 37.94, p < .001, η2 = .62. To examine how the ISIs modulated the identity priming effect, we compared the repetition priming effects or repetition inhibition effects at different ISIs. The results revealed that the repetition priming effect under the short ISI condition was larger than that under the long ISI condition, F(1, 23) = 23.8, p < .001, η2 = .51, and that the repetition inhibition effect under the short ISI condition was larger than that under the long ISI condition, F(1, 23) = 16.39, p < .001, η2 = .42.
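Because the ISI factor has only two levels in Experiment 2, each ISI comparison is reported as F(1, 23), whereas Experiment 1 reports t(21) values. For a two-level within-subjects contrast, the F ratio equals the square of the paired-samples t statistic computed on the same per-participant scores. The following sketch illustrates that relation with hypothetical numbers; the data and the function name are invented for illustration and are not taken from the study.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x versus y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_value, n - 1

# Hypothetical per-participant repetition effects (ms) at two ISIs.
short_isi = [31, 42, 28, 35, 40]
long_isi = [30, 40, 25, 31, 35]

t_value, df = paired_t(short_isi, long_isi)
f_value = t_value ** 2  # equivalent F(1, df) for the same contrast
print(f"t({df}) = {t_value:.2f}, F(1, {df}) = {f_value:.2f}")
```

The sign of t carries the direction of the difference, which the F ratio discards, so the direction must be read from the condition means.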
The main effects of the ISIs (faster RTs with long ISIs) and modality priming (slower RTs with a modality shift) were observed, F(1, 23) = 80.13, p < .001 and F(1, 23) = 20.83, p < .001. The main effect of identity priming did not reach significance, F(1, 23) = 1.33, p = .26. The interaction between the ISIs and modality priming was significant, F(1, 23) = 59.28, p < .001. This interaction revealed a significant MSE under the short ISI condition, F(1, 23) = 46.58, p < .001, and no MSE under the long ISI condition, F(1, 23) = 3.55, p = .07. The interaction between the ISIs and identity priming did not reach significance, F(1, 23) = 0.003, p = .96.
Discussion
The present experiments investigated whether crossmodal nonspatial repetition inhibition based on a modality shift could occur. A cue–target paradigm in which a neutral event was not presented was adopted. Three main results emerged in both experiments. First, the responses under the crossmodal condition were slower than those under the unimodal condition, indicating that a response cost occurs when shifting between modalities. Second, the modality shift cost could cause repetition inhibition under the crossmodal condition, while a facilitatory effect occurred under the unimodal condition. Third, the inhibitory effect became smaller or nonsignificant as the ISIs increased, indicating that the modality shift cost that mainly affected crossmodal nonspatial repetition inhibition was mediated by the temporal intervals. Taken together, these results provide new evidence suggesting that the MSE could cause an additional cost, which contributed to the difference between unimodal and crossmodal nonspatial repetition inhibition. A preparation effect is proposed to explain the dynamic changes with different ISIs.
Different Mechanisms of Unimodal and Crossmodal Nonspatial Repetition Inhibition
This study found that crossmodal nonspatial repetition inhibition could occur without the presence of a neutral event. However, a neutral event is critical for nonspatial repetition inhibition to occur in the unimodal domain. Researchers found that the facilitatory effect could become nonspatial repetition inhibition only when a neutral event that is uninformative regarding both the cue and the target was presented (Fuentes et al., 1999; Hilchey et al., 2017a, 2017b; Klein et al., 2015; Law et al., 1995; Spadaro et al., 2012; Spadaro & Milliken, 2013; A. Wang et al., 2020; Wu et al., 2020). Thus, a neutral event presented between the cue and the target resulted in a cost for the retrieval of the cue. According to the episodic retrieval account (Grison et al., 2005; Tipper et al., 2003), a new representation of the neutral event resulted in the inhibition of the old representation of the prime.
In contrast to the unimodal condition, the present studies indicate that a neutral event is not critical under the crossmodal condition. On the one hand, processing stimuli delivered via different modalities with a temporal interval can produce a shift cost for both relevant and irrelevant information because attention is limited and must be directed serially to each modality in turn (Turatto et al., 2002). On the other hand, whether the old episodic representation of the cue is updated or a new representation is created is determined by the attentional set (Lupiáñez et al., 2001). Thus, when participants are required to complete a discrimination task, they might adopt an attentional set that favors creating a new representation because integrating different representations of a cue and target that have different modalities could incur an additional cost. Consequently, participants tend to respond more quickly under the identity incongruent, modality shift condition than under the identity congruent, modality shift condition, indicating crossmodal nonspatial repetition inhibition. An alternative explanation comes from a previous study (Hommel et al., 2004), which demonstrated not only that RTs were faster in repetitions but also that RTs in trials with full switches (i.e., the modality shift and identity incongruent condition) were faster than RTs in trials with a switch in one dimension only (the modality shift and identity congruent condition).
Mechanism of the MSE Underlying the Repetition Effect
Without a neutral event between the prime and the target, the present experiments found a significantly faster response under the unimodal condition than under the crossmodal condition regardless of whether the identity was congruent or incongruent, indicating an MSE (Collins et al., 2011; Marques, 2006; Mühlberg et al., 2014; Scerrati et al., 2017; Spence et al., 2001; Spence & Driver, 1997). To explain the MSE, we propose that modality-specific areas might engage in processing visual and auditory information separately. Neuroimaging studies have shown that the visual cortex is active when processing color written words (Simmons et al., 2007) and that the auditory area is active when processing auditory concepts (Kiefer et al., 2008). Given that the cue stimuli presented in a limited amount of time could induce automatic attention (Scerrati et al., 2015; Spence et al., 2001), the lower level color sensation of the visual cue is activated in the occipital cortex (Simmons et al., 2007). Another study proposed that when participants are required to only attend to visual stimuli and auditory stimuli separately in a crossmodal interaction, dissociable neural networks are revealed (Q. Chen & Zhou, 2013). Altogether, these findings suggest that the properties (e.g., color) of stimuli delivered via visual or auditory modalities might be modality-specific. Thus, when stimuli are repeated with stimuli of the same modality, the activating information of the preceding cue could facilitate the subsequent processing of the target, indicating a facilitatory effect. In contrast, as the prime preactivates a specific sensory modality, the repetition of stimuli delivered via different modalities induces a modality shifting cost, indicating an inhibitory effect.
Mechanism Underlying the Unimodal Effect
Regarding the unimodal trials, the repetition effect resulted in faster responses to repeated targets compared with those to nonrepeated targets, indicating a facilitatory effect. This repetition benefit not only decreased as the ISIs increased but also eventually vanished under the long ISI condition. These results are consistent with previous studies in which a neutral event was not presented (Fuentes et al., 1999; Kwak & Egeth, 1992; Law et al., 1995; Lupiáñez et al., 2001; Milliken et al., 2000; Spadaro et al., 2012; Tanaka & Shimojo, 1996). Similarly, Hu and Samuel (2011) investigated color repetition and found that the facilitatory effect was strongest at the short stimulus-onset asynchronies (SOAs) and residual at the longer SOAs. The response to a repeated target is rapid due to S-R binding (Hommel, 1998; Kahneman et al., 1992; Pashler & Baylis, 1991), which provides a benefit by cueing the retrieval of a similarly encoded cue. The habituation-based model (Dukewich, 2009) proposes that if the cue and target are sufficiently close in properties (e.g., temporal, spatial, and modality), the processing of the target could benefit from the processing shared by the cue. The similarity between the cue and the target could enhance integration and ultimately increase habituation (Dukewich, 2009; Lupiáñez, 2010). In the present experiments, in the unimodal trials, the location and modality were sufficiently close between the prime and the target. Once the cue appeared, the episodic representation was activated. If the representation of the target was the same as that of the cue, the response would be enhanced. However, this enhancement was limited in time. Thus, the more similar the cue and the target in time and modality, the larger the facilitatory effect.
Preparation Effect for the Dynamic Changes in the Crossmodal Effect
Regarding the crossmodal trials, nonspatial repetition inhibition could occur based on the modality shift between the cue and the target. Specifically, there was no significant difference in nonspatial repetition inhibition between the short ISIs and medium ISIs, both of which significantly differed from the long ISIs. Although crossmodal nonspatial repetition inhibition was observed under the long ISI condition, the modality shift did not engage in the integration because there was no MSE under either the identity congruent condition or the identity incongruent condition. In contrast, crossmodal nonspatial repetition inhibition under the short ISI condition was entirely based on the modality shift because the MSE was significant under both the identity congruent condition and the identity incongruent condition. Here, we propose that a preparation effect exists in different modalities (Murray et al., 2009; Quinlan & Hill, 1999; Spence & Driver, 1997). In this study, the RTs after the long ISIs were significantly faster than those after the medium and short ISIs. Moreover, the manipulation of the ISIs affected the MSE, which led to reduced or nonsignificant shift costs at longer ISIs. The preparation effect was examined in a crossmodal situation during which attention shifted between bivalent stimuli (Lukas et al., 2010). In their experiments, the cue–target modality mapping was manipulated in a crossmodal task shift. The results showed that there were reduced modality shift costs at the long ISIs compared with the short ISIs. The present experiments extended the preparation effect of the modality shift task by showing that the manipulation of the modality repeat between the cue and the target under different ISIs was effective. At the long ISIs, we only found a shift benefit under the identity incongruent condition. Based on this evidence, the residual effect at the long ISIs was due to a longer preparation interval, which converted the shift costs into a shift benefit.
The temporal dynamics of preparation first influenced the MSE, which, in turn, affected nonspatial repetition inhibition. From this perspective, we propose that modality shifts mainly affect crossmodal nonspatial repetition inhibition during the early stage.
Conclusion
In this study, we propose that a neutral event is not necessary for the occurrence of crossmodal nonspatial repetition inhibition. Crossmodal nonspatial repetition inhibition based on a modality shift is mediated by the temporal interval.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Natural Science Foundation of China (31871092 and 31700939). A.W. was also supported by the Natural Science Foundation of Jiangsu Province (BK20170333) and the Ministry of Education Project of Humanities and Social Sciences (17YJC190024).
