Abstract
Pairing a single visual stimulus with multiple auditory stimuli will lead to the illusory perception of multiple visual stimuli, which is known as sound-induced flash illusion (SIFI). The present study adopted the classic SIFI paradigm to investigate whether value-associated tasks could affect the SIFI. By adjusting the sequence of reward and nonreward conditions, we also examined the effect of reward history on SIFI. The results showed that the fission illusion was reduced when associated with momentary reward, demonstrating significantly higher accuracy and discriminability than the nonreward condition. However, the fusion illusion was not affected by the momentary reward, and the explanation was that the fusion illusion was not as stable as the fission illusion and disappeared across different trials and conditions. Moreover, the robustness of reward history in the present study was not as strong as previous studies have suggested, indicating that the effect of sound on the perceptual representation of visual stimuli is strong and robust to reward history. These findings demonstrated that the reward could reduce the SIFI and broaden the existing dichotomy of SIFI. New evidence for the operation of value-driven attention mechanisms is also provided, suggesting that the underlying value-driven attention operates across multiple sensory systems.
Introduction
Multisensory integration is a fundamental perceptual process by which stimulus information arriving from different senses is received and registered and then finally combined to create a unified percept (Keil, 2020). The sound-induced flash illusion (SIFI) demonstrates an auditory-dominated multisensory integration phenomenon, in which pairing visual flash with an unequal number of auditory beeps in 100 ms successively or simultaneously will lead to the illusory perception of visual stimuli, the number of which is equal to that of auditory beeps (Shams et al., 2000, 2002). The SIFI can be divided into fission and fusion illusions. The fission illusion refers to combining two auditory stimuli with one visual stimulus that can induce the perception of two visual stimuli (Shams et al., 2000, 2002). The fusion illusion refers to combining one auditory stimulus with two visual stimuli that can induce the perception of one visual stimulus (Andersen et al., 2004).
Some studies highlighted the role of attention in SIFI and found that attentional distribution and endogenous attention both have an impact on SIFI (Andersen et al., 2004; Michail & Keil, 2018; Mishra et al., 2010). Looking back to the prominent models of attentional control, a dichotomy has been generally asserted between top-down and bottom-up control. The former is determined by current selection goals, and the latter is determined by physical salience (Awh et al., 2012). Available evidence supports the view that various manipulations of the constituting stimuli and the task can increase or decrease the likelihood of an illusion. For example, increased spatial frequency and visual complexity of the visual stimulus and reduced loudness of the auditory stimuli could decrease the illusion, whereas increased luminance contrast has the opposite effect (Pérez-Bellido et al., 2015; Takeshima & Gyoba, 2013, 2015). Moreover, top-down factors, such as increased cognitive load induced by an n-back task, have been found to result in higher illusion rates, whereas expectations regarding the proportion of SIFI trials could reduce illusion occurrence (Michail & Keil, 2018; Wang et al., 2019). Based on previous studies, factors that are considered to have an impact on SIFI can also be separated into top-down and bottom-up factors (Keil, 2020). However, this theoretical dichotomy fails to explain a growing number of cases in which neither goal-directed nor stimulus-driven frameworks can account for strong selection biases, such as the role momentary reward plays in attentional priority (Anderson, 2015; Anderson et al., 2011a, 2011b; Awh et al., 2012).
The notion of incentive motivation states that monetary gains can enhance perceptual and executive control processes to achieve more efficient goal-directed behavior (Pessoa & Engelmann, 2010). In this view, it is natural to presume that the role of reward is to provide motivational significance, and motivational significance evokes all the known consequences of voluntary attentional orienting to the rewarded items (Awh et al., 2012). However, more studies have shown that when equally salient stimuli are associated with reward predictors, these stimuli gain a competitive advantage that promotes attentional selection (Anderson, 2013; Anderson et al., 2011a, 2011b; Wang et al., 2013, 2014). Thus, value-driven attentional priority has been recently proposed to account for reward-induced selection biases in addition to salience-driven (bottom-up) and goal-driven (top-down) mechanisms (Anderson et al., 2011a, 2011b; Awh et al., 2012).
More research has suggested that value-driven attention reflects a broad principle of information processing that can be extended to other sensory modalities (Anderson, 2016a, 2016b; Pooresmaeili et al., 2014). It is now well established that the priming of a visual stimulus can be modulated by reward information (Hickey & Los, 2015). However, there is also evidence that auditory stimuli can affect value-driven attention in the visual domain. For example, a previous study found that learned associations with reward could modulate the sensory processing of a stimulus within modalities other than vision, and previously reward-associated sounds could facilitate the processing of visual stimuli through cross-modal interactions (Pantoja et al., 2007). Specifically, visual stimuli paired with reward-indicated sound will subsequently capture attention (Miranda & Palmer, 2014), and a sound previously associated with high reward interferes more strongly with the recognition of a visual target than a sound previously associated with relatively low reward (Anderson, 2016a, 2016b). These studies suggested that value-driven attention is not domain-specific; instead, it operates across multiple modalities and can bias cross-modal stimulus competition. This finding leaves the question of to what extent value-driven attention reflects a broad principle of information processing that extends across different modalities.
Thus, it becomes more desirable to investigate whether value-based attentional priority can influence cross-modal stimulus competition, such as SIFI. To our knowledge, one study first introduced the variable of reward in SIFI, investigating whether increasing motivation to perform accurately would render feedback training effective in reducing the magnitude of SIFI (Rosenthal et al., 2009). This finding suggested that monetary reward combined with feedback would lead to a statistically significant effect on the magnitude of the illusion, while the effect was not significant with feedback alone, indicating the interaction of reward and feedback functions. Given that this study aimed to investigate the impact of feedback on SIFI, a momentary reward was used to enhance participants’ motivation in feedback training rather than adopted as a factor that affects stimulus processing. Thus, the reward variable was not analyzed in this study, and its effect on the SIFI remains unclear.
To determine whether monetary rewards could affect SIFI, the present study investigated SIFI under two conditions: reward and nonreward. Reward was not given unless the participant's performance reached the standard. Participants completed all experimental conditions and were informed of the block attribute with the clue presented before the stimuli. In Experiment 1, we explored the effect of reward on the SIFI with the block presented in the order of the reward condition and then the nonreward condition thrice. To prevent the impact of the combination of reward and instant feedback, which was reported in Rosenthal et al.'s (2009) study, feedback was not given after each response in our experiment. Given that an increasing number of findings have recently demonstrated that visual selection is biased toward items associated with the previous reward, which is known as “reward history” (Anderson et al., 2011a, 2011b; Della Libera & Chelazzi, 2009; Della Libera et al., 2011; Krebs et al., 2010, 2011; Raymond & O'Brien, 2009; Rutherford et al., 2010), we adjusted the sequence of block presentation in Experiment 2, and the reward condition did not appear until nonreward conditions were presented.
On the one hand, previous studies have altered the physical salience of stimuli in SIFIs, such as the shape, color, brightness, and size of the visual flash; the frequency and intensity of the auditory sound; and the spatial location of audiovisual stimulation. The results showed that no disruption was found in the fission illusion, but a disruption of the fusion illusion was noted, suggesting that the fusion illusion could be less stable than the fission illusion (Abadi & Murphy, 2014; Kawabe, 2009; Shams et al., 2000, 2002, 2005a, 2005b; Watkins et al., 2007a, 2007b). Previous studies that examined top-down and bottom-up factors in SIFI indicated that fission illusion could be significantly affected by selection goals and physical salience, whereas minimal or no impact was found on fusion illusion (Abadi & Murphy, 2014; Wang et al., 2019). On the other hand, the regulatory mechanism of reward on selective attention also enables the stimuli to gain more attentional resources in competition (Anderson, 2016a, 2016b; Chelazzi et al., 2013), increasing the accuracy of discrimination. The impact of reward on audiovisual integration also supported that reward-associated sounds could affect the processing of visual stimuli, increasing the sensitivity of visual perception, even when the sounds and reward associations were both irrelevant to the visual task. Therefore, we hypothesized that the fission illusion in SIFI would be reduced under reward conditions, whereas the fusion illusion would not show consistent effects of reward because the fusion illusion effect is weak and easily disappears across trials.
Experiment 1
Method
Participants. The sample size was calculated using G*Power 3.1.9.7, with a 2 (reward type: reward vs. nonreward) × 2 (condition: F1B2 vs. F2B1) repeated measures analysis of variance as the statistical test. The parameter effect size ƒ was set to 0.4, and the probability of type I error (α err prob) was 0.05. The power (1 − β err prob) was 0.95, and the sample size was calculated to be 24. Thirty participants (14 males, aged 19–23 years, M = 20.37, standard deviation [SD] = 1.02) were recruited by advertisement to participate in this experiment. All participants were naive to the experimental procedure and were paid for their participation in the experiment. All participants reported normal or corrected-to-normal vision, and none had known neurological, psychiatric, or visual disorders. Before the experiments, all participants provided their written informed consent following the standard of the Declaration of Helsinki. The study protocol was approved by the Ethics Committee of Soochow University.
Apparatus and materials. All stimuli were presented on a View Sonic P220f VS10284 with a screen resolution of 1024 × 768 pixels and a refresh rate of 100 Hz. All visual stimuli in the experiment were presented on a black background (red–green–blue [RGB]: 0, 0, 0) by Presentation Software (Neurobehavioral Systems Inc.). The reward attribute indicator, that is, the clue (¥ or #), subtended 2° and was presented at an eccentricity of 5° below the central fixation point on a black background for 500 ms. The visual stimuli were white disks (visual angle of 2°, RGB: 255, 255, 255) presented at a visual angle of 5° below the central fixation point (RGB: 255, 255, 255) for 17 ms because, with the accompanying auditory stimuli, the visual stimuli had the greatest illusion effect in the peripheral field (Shams et al., 2002; Figure 1). The auditory stimuli in the experiment were presented through Audio-Technica headphones (ATH-WS99) at 75 dB and 3.5 kHz.

Experimental procedure and temporal profile of presentation of the stimuli. A clue indicating the reward attribute (500 ms) was presented before the stimuli. Each trial contained one or two visual flashes (17 ms) with zero, one, or two sounds (7 ms). The time interval between the two flashes was 66 ms, and the time interval between the two sounds was 76 ms. The interval between trials was random from 1500 to 2500 ms (in steps of 250 ms).
Experimental Design and Procedure. The experiment was a 2 (Reward Type: reward vs. nonreward) × 6 (Condition: F1 vs. F2 vs. F1B1 vs. F1B2 vs. F2B1 vs. F2B2) within-subject design. The six experimental conditions were composed of visual flash stimuli and auditory sound stimuli. For ease of discussion, the task types are all expressed in abbreviations, specifically: F stands for flash and B stands for sound. For example, “F1B1” refers to the task presenting one visual flash stimulus accompanied by one auditory sound stimulus; “F1B2” refers to that with one flash followed by two sounds, and “F1” refers to a trial with only one flash. The F1B2 and F2B1 conditions are thought to induce the participant's illusion.
Participants were asked to focus on the central fixation point throughout the experiment and press the correct button corresponding to the number of flashes with the index finger or middle finger. One button was used for a single flash, and the other button was used for a double flash. At the beginning of the experiment, participants were required to have sufficient practice to determine whether they understood the task and could discriminate between the beeps or flashes in isolation. Instant feedback was given after each practice trial, and the average response time was recorded as the baseline response for the formal experiment. After the practice, the participants were told that the formal experiment would be divided into two types: the reward condition and the nonreward condition. In each reward-related block, the participants were informed that quick (faster than the baseline response, according to the average reaction time in practice) and accurate (the correct rate reached 75%) responses would win them an additional ¥5 reward. However, under nonreward conditions, even if the participants’ response reached the standard, no additional reward was given.
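The reward rule described above can be written as a small function. This is only an illustrative sketch, not the authors' implementation: the name `block_bonus`, the use of the block's mean reaction time, and treating the 75% accuracy threshold as inclusive are all assumptions.

```python
from statistics import mean

ACCURACY_THRESHOLD = 0.75  # accuracy standard from the text (inclusive bound is an assumption)
BONUS_YUAN = 5             # additional reward for a qualifying reward block


def block_bonus(is_reward_block, rts_ms, correct, baseline_rt_ms):
    """Return the bonus (in yuan) earned for one block.

    rts_ms: reaction times of the block's trials, in ms
    correct: one boolean per trial
    baseline_rt_ms: mean reaction time measured during practice
    """
    if not is_reward_block:
        return 0  # nonreward blocks never pay, even if the standard is met
    fast_enough = mean(rts_ms) < baseline_rt_ms
    accurate_enough = sum(correct) / len(correct) >= ACCURACY_THRESHOLD
    return BONUS_YUAN if (fast_enough and accurate_enough) else 0
```

Under these assumptions, a reward block pays only when both the speed and the accuracy criteria hold, which matches the description that reward was contingent on performance rather than given unconditionally.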
The formal experimental procedure is shown in Figure 1. After the presentation of the central fixation point (duration time: 100 ms), a clue appeared and lasted for 500 ms. The clue was located directly below the central cross, indicating the attribute of the block. The ¥ clue indicated that this stage was a reward-related block and that the current quick and accurate response was related to additional rewards. When the clue was #, it indicated that this stage was a reward-independent block, specifying that the current reaction had nothing to do with additional rewards. The visual flash stimulus and auditory sound stimulus were simultaneously presented. The duration of the auditory sound stimulus was 7 ms, and that of the visual flash stimulus was 17 ms. The time interval between the two visual flash stimuli was 66 ms, and the time interval between the auditory sound stimuli was 76 ms (Andersen et al., 2004; Shams et al., 2002). The interval between trials was random from 1500 to 2500 ms in steps of 250 ms after the stimuli were presented. No instant feedback was provided after each trial, and feedback was only given at the end of each block. The experiment consisted of 96 trials in each block, amounting to a total of 576 trials ordered pseudorandomly. Participants completed all the experimental conditions. In most cases, the session took approximately 60 min.
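The trial structure described above can be sketched as a schedule generator. Several details are assumptions made only for illustration: that each block contains 16 trials per condition (96 / 6), that trials are shuffled within a block, and that the 66 ms and 76 ms intervals are measured from the offset of the first stimulus to the onset of the second. Under that last reading, the second flash and the second beep both start 83 ms after the first (17 + 66 and 7 + 76). All function names are hypothetical.

```python
import random

CONDITIONS = ["F1", "F2", "F1B1", "F1B2", "F2B1", "F2B2"]
FLASH_MS, BEEP_MS = 17, 7            # stimulus durations from the text
FLASH_GAP_MS, BEEP_GAP_MS = 66, 76   # intervals between paired stimuli
ITI_CHOICES_MS = list(range(1500, 2501, 250))  # 1500..2500 ms in 250 ms steps


def parse_condition(name):
    """'F1B2' -> (1 flash, 2 beeps); 'F2' -> (2 flashes, 0 beeps)."""
    n_flashes = int(name[1])
    n_beeps = int(name[3]) if len(name) == 4 else 0
    return n_flashes, n_beeps


def onsets(n, duration_ms, gap_ms):
    """Onset times in ms, treating the gap as offset-to-onset (assumption)."""
    return [i * (duration_ms + gap_ms) for i in range(n)]


def make_block(is_reward, rng, trials_per_condition=16):
    """Build one shuffled block of 6 x trials_per_condition trials."""
    trials = []
    for cond in CONDITIONS:
        n_flashes, n_beeps = parse_condition(cond)
        for _ in range(trials_per_condition):
            trials.append({
                "cue": "¥" if is_reward else "#",
                "condition": cond,
                "flash_onsets_ms": onsets(n_flashes, FLASH_MS, FLASH_GAP_MS),
                "beep_onsets_ms": onsets(n_beeps, BEEP_MS, BEEP_GAP_MS),
                "iti_ms": rng.choice(ITI_CHOICES_MS),
            })
    rng.shuffle(trials)
    return trials


def make_session(block_order, seed=0):
    """block_order: list of booleans, True for a reward block."""
    rng = random.Random(seed)
    return [make_block(is_reward, rng) for is_reward in block_order]


# Example: three nonreward blocks followed by three reward blocks.
session = make_session([False, False, False, True, True, True])
```

The block order is passed in as a parameter so that the same sketch covers either presentation sequence; only the order argument changes.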
Results
Accuracy
Figure 2 shows the accuracy of the judgment under each experimental condition in Experiment 1. We found that participants achieved higher accuracy in the F1, F2, F1B1, and F2B2 conditions regardless of the reward type, demonstrating that participants were able to perceive the number of flashes more accurately when the visual flash stimulus was presented without sounds or when the number of flashes was equal to that of auditory stimuli.

Accuracy under each experimental condition in Experiment 1. The different color bars are used to mark the reward type. Gray bar: the proportion of trials with rewards; silver bar: the proportion of trials with nonrewards. F1 refers to a single visual flash stimulus without sound and F2 refers to two flashes without sound. F1B1 stands for one flash stimulus with a single auditory sound and F2B2 stands for two flashes with two auditory sounds. F1B2 and F2B1 refer to illusion conditions, wherein one flash is paired with two auditory stimuli and two flashes are paired with one auditory stimulus. *p < .05.
For both reward and nonreward conditions, repeated measures analyses of variance were used to analyze the difference among the six conditions in SIFI, and the results showed significant differences for both reward types [F(1,29) = 18.33 and p < .001; F(1,29) = 18.58 and p < .001]. The results of post hoc tests showed that for the reward condition, the accuracy of F1B2 was significantly lower than that of the nonillusion conditions (ps < .05). The accuracy of F2B1 was also significantly lower than that of nonillusion conditions (ps < .001). For the nonreward condition, the accuracy of F1B2 was significantly lower (ps < .001) than that of the nonillusion conditions. The accuracy of F2B1 was also significantly lower than that of the nonillusion conditions (ps < .05). These results supported the appearance of fusion and fission illusions.
Considering that F1B2 and F2B1 are two different types of illusions and that the underlying mechanism and influencing factors also differ (Abadi & Murphy, 2014; Kostaki & Vatakis, 2016; Mishra et al., 2008), we compared the reward and nonreward types under these two illusion conditions. Figure 3a reveals the tendency of F1B2 and F2B1 in Experiment 1, and the line charts show that the accuracy of the fission illusion under the reward condition was higher than that under the nonreward condition. The accuracy of the fusion illusion under reward was slightly lower than that under nonreward, which might be explained by the instability of the fusion effect. Tukey's method was performed to test whether the reward could have a significant impact on them. For the reward and nonreward conditions in F2B1, t < 1. For the reward and nonreward conditions in F1B2, t(29) = 2.63, p = .014, and Cohen's d = 0.20. The results showed that the accuracy of F1B2 under the reward condition (M = 52%, SD = 0.25) was significantly higher than that under the nonreward condition (M = 46%, SD = 0.29). These results indicated that the reward increased the accuracy of participants’ judgment of the fission illusion.
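For readers who want to trace the reward versus nonreward comparison, the sketch below shows a paired-samples t statistic and one common variant of Cohen's d. The text does not state which variant of d was used; scaling the mean difference by the root mean square of the two SDs is an assumption. With the rounded F1B2 summary values reported above, it gives about 0.22, close to the reported 0.20. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for two equal-length lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def cohens_d_av(m1, sd1, m2, sd2):
    """Mean difference scaled by the root mean square of the two SDs (assumed variant)."""
    return (m1 - m2) / sqrt((sd1 ** 2 + sd2 ** 2) / 2)


# Reported F1B2 summary values in Experiment 1 (accuracy as proportions).
d_fission = cohens_d_av(0.52, 0.25, 0.46, 0.29)
```

Because the published means and SDs are rounded, a small discrepancy between this value and the reported effect size is expected and does not by itself indicate which formula was applied.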

Tendency of fission and fusion illusions. (a) Represents the mean accuracy (%) of fission and fusion effects in Experiment 1. (b) Represents the mean accuracy (%) of fission and fusion effects in Experiment 2.
Signal Detection Theory Analysis of Fission and Fusion Illusions
To determine whether the different magnitudes of the fission and fusion illusions in the reward versus nonreward were attributable to a change in the discriminability of the flashes and/or criterion for reporting the number of flashes induced by the presentation of the beeps, the data were analyzed in terms of signal detection theory (McCormick & Mamassian, 2008; Violentyev et al., 2005). Conditions were divided into fission conditions (F1B2 and F2B2) and fusion conditions (F1B1 and F2B1). For each of these conditions, sensitivity measure d′ and response bias c were calculated. Here d′ and c were calculated using the following equations (Macmillan & Creelman, 2004; Rosenthal et al., 2009; Stanislaw & Todorov, 1999):
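The equations themselves do not appear in this text. The cited sources give the standard equal-variance forms d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H is the hit rate, F is the false-alarm rate, and z is the inverse of the standard normal cumulative distribution function. The sketch below computes both measures under these forms. Two points are assumptions rather than details from the text: that a "two flashes" response counts as a hit on two-flash trials and as a false alarm on one-flash trials, and that a log-linear correction is applied to keep rates away from 0 and 1.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_measures(hits, n_signal, false_alarms, n_noise):
    """Return (d_prime, criterion) from response counts.

    The log-linear correction (add 0.5 to each count and 1 to each total)
    is an assumption; it avoids infinite z-scores for rates of 0 or 1.
    """
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    d_prime = _z(hit_rate) - _z(fa_rate)
    criterion = -0.5 * (_z(hit_rate) + _z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for one participant in the fission pair:
# "two" responses on F2B2 trials are hits, "two" responses on F1B2 trials are false alarms.
d_prime, c = sdt_measures(hits=44, n_signal=48, false_alarms=20, n_noise=48)
```

Under this coding, a stronger fission illusion produces more false alarms on F1B2 trials, which lowers d′ and shifts c toward reporting two flashes. The same function applies to the fusion pair (F2B1 and F1B1) with the corresponding counts.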

(a) and (b) Represent the mean discriminability (d′) and criterion measure (c) for Experiment 1. (c) and (d) Represent the mean d′ and c for Experiment 2. Error bars represent the standard error of the mean. *p < .05, **p < .01.
Experiment 2
Reward history has been reported to influence selection processes in visual search (Kiss et al., 2009; Kristjánsson et al., 2010), and learned stimulus–reward associations have also been shown to affect subsequent attentional selection. For example, when examining whether the learned value of a stimulus could modulate salience-based attentional priority, Anderson et al. (2011a, 2011b) discovered that a colored distractor associated with reward in the pretraining phase could still have an impact on participants’ target processing in the training phase, wherein the reward contingency had been eliminated and the target never appeared in the previously rewarded color. Thus, even when the reward was not related to the present task, the participants might mistake the task for a reward-associated task due to previous experiences. To further explore whether the effect of the fission illusion was completely affected by the reward itself or reward history also played a role in the influence, we performed Experiment 2.
Method
Participants. Thirty participants (7 males, aged 18–23 years, M = 19.50, SD = 1.41) were recruited by advertisement to participate in this experiment. All participants had not been involved in similar experiments and were paid for their participation in the experiment. All participants reported normal or corrected-to-normal vision, and all participants had no known neurological, psychiatric, or visual disorders. Before the experiments, all participants gave their written informed consent following the standard of the Declaration of Helsinki. The study protocol was approved by the Ethics Committee of Soochow University.
Apparatus and Materials. The apparatus and materials were consistent with Experiment 1.
Experimental Design and Procedure. The experiment was a 2 (Reward Type: reward vs. nonreward) × 6 (Condition: F1 vs. F2 vs. F1B1 vs. F1B2 vs. F2B1 vs. F2B2) within-subjects design. The sequence of block presentation was adjusted: three blocks under the nonreward condition were presented first, followed by three blocks under the reward condition. The task and procedure were similar to those of Experiment 1, and participants were again asked to respond quickly and accurately.
Results
Accuracy
In terms of accuracy in each condition of Experiment 2 (see Figure 5), the participants achieved higher accuracy in the F1, F2, F1B1, and F2B2 conditions regardless of the reward type, demonstrating that they could judge the number of flashes more accurately when the visual flash stimulus was presented alone or when the number of flashes was equal to that of auditory stimuli.
For both reward and nonreward conditions, repeated measures were used to analyze the difference of the six conditions in SIFI, and the results both showed significant differences in the two reward types [F(1,29) = 8.19 and p < .001; F(1,29) = 13.47 and p < .001]. Post hoc tests were further performed. For the reward condition, the accuracy of F1B2 was significantly lower than that of the nonillusion conditions, with ps < .001. The accuracy of F2B1 was significantly lower than that of the nonillusion conditions (ps < .05), except that of F1, with p = .055. For the nonreward condition, the accuracy of F1B2 was significantly lower (ps < .001) than that of the nonillusion conditions. The accuracy of F2B1 was significantly lower than that of all nonillusion conditions (ps < .05). These results suggested that fusion and fission illusions existed in Experiment 2.
Given that F1B2 and F2B1 represent the two classic illusion types in SIFI, line charts (see Figure 3b) were applied to reveal the tendency of F1B2 and F2B1 in Experiment 2, and the results showed that the accuracy of both fission and fusion illusions under the reward condition was greater than that under the nonreward condition. We further performed Tukey's method to test whether reward could have an impact on them. For the reward and nonreward conditions in F2B1, t(29) = 1.43, and p = .162. For the reward and nonreward conditions in F1B2, t(29) = 2.51, p = .018, and Cohen's d = 0.24. The results showed that the accuracy of F1B2 under the reward condition (M = 70%, SD = 0.26) was significantly greater than that under the nonreward condition (M = 64%, SD = 0.24), indicating that the reward factor increased the accuracy of participants’ judgment of the fission illusion.
Signal Detection Theory Analysis of Fission and Fusion Illusions
The d′ values (see Figure 4c) were submitted to paired-samples t-tests with the two reward types (reward and nonreward). For the fusion condition, the difference in d′ between reward type and nonreward type was not significant [t(29) = 1.80, p = .082]. For the fission condition, t(29) = 2.66, p = .013, and Cohen's d = 0.30, showing that d′ with reward (M = 2.65, SD = 0.96) was significantly higher than that with nonreward (M = 2.38, SD = 0.90). The c values (see Figure 4d) were also submitted to paired-samples t-tests with the two reward types (reward and nonreward). For the fusion condition, the c value in reward type was not significantly different from that in nonreward type [t(29) = −0.14, p = .889]. For the fission condition, the difference was also not significant [t(29) = 1.71, p = .098].
Given that the potential impact of reward history was eliminated in Experiment 2 by adjusting the sequence of blocks, we compared the accuracy of fission and fusion illusions to further examine whether reward history affects the SIFI. In Experiment 1, the difference in accuracy between the reward condition and the nonreward condition was calculated for F1B2, and the same calculation was also applied to F2B1. In Experiment 2, the difference values for F1B2 and F2B1 between reward and nonreward conditions were calculated in the same manner. A paired-samples t-test was performed, and the results showed no significant difference in either F1B2 or F2B1 between the two experiments (p = .849 and p = .161).

Accuracy under each experimental condition in Experiment 2. The different color bars are used to mark the reward type. Gray bar: the proportion of trials with rewards; silver bar: the proportion of trials with nonrewards. F1 refers to a single visual flash stimulus without sound and F2 refers to two flashes without sound. F1B1 stands for one flash stimulus with a single auditory sound and F2B2 stands for two flashes with two auditory sounds. F1B2 and F2B1 refer to illusion conditions, wherein one flash is paired with two auditory stimuli and two flashes are paired with one auditory stimulus. *p < .05.
Discussion
By incorporating monetary reward manipulation into the classic SIFI paradigm (Shams et al., 2000, 2002), the present study mainly focused on the effects of value-driven factors on SIFI, demonstrating that in addition to top-down and bottom-up factors, a reward could affect SIFI, as suggested by previous studies (Michail & Keil, 2018; Mishra et al., 2010). Experiment 1 explored the effect of reward on the SIFI by controlling the experimental order to eliminate the influence of reward history. Experiment 2 further investigated the reward effect by adjusting the sequence of each block. The two experiments consistently showed the classic SIFI (i.e., fission and fusion illusions) effect, which was similar to existing studies (Andersen et al., 2004; Shams et al., 2000, 2002, 2005a, 2005b; Watkins et al., 2006, 2007a, 2007b). Furthermore, the present study showed that value-driven attentional priority could affect SIFI, especially for the fission illusion. Taking the legacy effect of reward into account in this study, we also found that reward history did not have a strong impact on SIFI. This account gains additional support from our signal detection analysis: the presence of momentary reward increased participants’ discriminability (d′) of visual flashes.
The present results are consistent with the classic SIFI (Shams et al., 2000, 2002). In both Experiments 1 and 2, participants achieved higher accuracy in the F1, F2, F1B1, and F2B2 conditions regardless of the reward type, which illustrated that participants were able to perceive the number of visual flashes more accurately when the visual flash stimulus was presented without a beep or when the number of visual flashes was equal to that of auditory beep stimuli. Furthermore, the lower accuracy in the F1B2 and F2B1 conditions compared to that in the F1, F1B1, F2, and F2B2 conditions indicated that auditory dominance occurred; that is, participants incorrectly perceived the number of visual flashes due to the misleading number of auditory beep stimuli. The results also suggested that the number of fission illusions perceived by the participants was greater than the number of fusion illusions, which is consistent with the finding of Kumpik et al. (2014) by signal detection theory.
As previously assumed, the present study showed that the fission illusion was reduced with the reward factor, whereas the fusion illusion did not show consistent effects of reward because the fusion illusion effect was weak and unstable. The signal detection analysis in the present study also revealed that participants had higher d′ for the reward-associated fission illusion, demonstrating that participants were more sensitive to the “signal” in audiovisual perception when tasks were related to value. Previous studies found an interaction between multisensory integration and reward. By associating faces with or without monetary reward in the training phase, a recent study found that individuals could report more McGurk percepts in the subsequent test phase, demonstrating that multisensory integration was directly facilitated by reward association (Luo et al., 2020). This notion was also supported by a previous study, showing that multisensory regions may mediate the transfer of value signals across senses rather than classical reward regions in the cross-modal context (Pooresmaeili et al., 2014). In addition, by using representational similarity analysis, Hall-McMaster et al. (2019) detected that reward prospects could promote cognitive performance by strengthening neural coding of task rule information, helping to improve cognitive flexibility during complex behavior and increasing encoding of the active task rule in preparation for the target.
Moreover, the reward was suggested to have a positive impact on the interaction between orienting and executive control. Numerous studies have demonstrated that reward can produce an attention effect by driving its associated stimulus (Anderson et al., 2011a, 2011b; Della Libera & Chelazzi, 2009; Hickey et al., 2010; Raymond & O'Brien, 2009; Wang et al., 2013). Recently, one study found that reward associations have different effects on the three attentional networks and can enhance the interaction of orienting by executive control (Cao et al., 2021). Considering that the present study demonstrates that the value-driven factor has such an influence on the fission illusion, which was reported to be affected by the distribution of attention resources (Mishra et al., 2010), it is possible that the effect of reward on SIFI can be an interaction among multisensory integration, attentional orienting, and executive control.
Furthermore, it deserves mention that our results seem to be in contrast with Rosenthal et al.'s (2009) study, in which monetary reward was found to affect the degree of reported illusion only when feedback was provided. In their study, the effect of reward alone was not found by comparing the magnitude of the illusion and difference (p = .19 and .37, respectively) in criterion bias for the no-feedback phase in Experiment 3 with those in Experiment 1. It is worth mentioning, however, that in their Experiment 1, participants were randomly assigned to one of two groups (the feedback and the no-feedback groups), whereas in their Experiment 3, all participants received both feedback and no-feedback training. That is, apart from the reward factor, participants still received different treatments, and the nonsignificant result was obtained through a between-subjects comparison. Thus, this comparison seems unsatisfactory given the large interindividual variability in the susceptibility to the illusion (de Haas et al., 2012). Additionally, the sample sizes in that study might not have provided adequate statistical power, with only 15, 8, and 6 participants in the three experiments, respectively. In addition, since the feedback training effect was the main focus of Rosenthal et al.'s study, the reward factor just served as a motivation (participants could get it regardless of their performance). In the present study, by contrast, momentary reward acted as a value-driven factor (participants could achieve it only when they reached the standard). Thus, whether the function of these two reward types differs merits future systematic studies.
Interestingly, in our study, the effect of reward history was not as robust as previous studies have suggested. Numerous previous studies have found that a learned reward value can unconsciously bias attention to the rewarded stimulus feature even after the reward has been removed (Della Libera & Chelazzi, 2009; Failing & Theeuwes, 2014; MacLean & Giesbrecht, 2015). It has been hypothesized that reward learning induces visual cortical plasticity, which modulates early visual processing to capture attention (Tankelevitch et al., 2020). To eliminate the effect of reward history, the present study presented the conditions in the order “nonreward” and then “reward” in Experiment 2. However, the results of the two experiments did not differ. One possible explanation is that the SIFI, especially the fission illusion, is so stable that it does not easily disappear (Andersen et al., 2004; Kostaki & Vatakis, 2016).
In the present study, similar to many previous studies (Andersen et al., 2004; Innes-Brown & Crewther, 2009; Shams et al., 2000), we also found that the fission illusion generally had a stronger effect than the fusion illusion. One possible reason for the dissociation between the fusion and fission illusions is that they rely on different neural mechanisms. A transcranial direct current stimulation (tDCS) study of the SIFI found that the perception of “fission” was increased after anodal tDCS of the temporal cortex and decreased after anodal stimulation of the occipital cortex. In contrast, the fusion illusion was unaffected by tDCS (Bolognini et al., 2011). The absence of tDCS modulation of the fusion illusion suggests that although fission and fusion flash illusions may appear to be mutual psychophysical phenomena, their neurofunctional foundations may differ. This view has been supported by Watkins et al. (2007a, 2007b), who used high-field functional magnetic resonance imaging in humans and discovered very different activation patterns underlying these multisensory perceptual illusions. Specifically, the fission illusion was associated with higher activation levels in the primary visual cortex (V1), whereas the fusion illusion was associated with lower activation levels in V1. Hirst et al. (2020) concluded that the components associated with the fusion illusion occur later in time relative to fission effects, and modulations observed in V1 may lag in the parietal cortex (Innes-Brown et al., 2013; Meylan & Murray, 2007; Mishra et al., 2008). In addition, Bolognini et al. (2011) suggested that multiple visual flashes in fusion trials may maximally activate the visual cortical areas, leading to the failure of subthreshold excitability modulation to produce changes in the fusion illusion. Therefore, different neural mechanisms underlying the fission and fusion illusions may account for the dissociation effect found for reward in the present study.
To conclude, by dividing tasks into reward and nonreward conditions, we demonstrated that individuals reported fewer fission illusions in the reward conditions, indicating that value-driven reward enhanced the influence of visual information on SIFI perception. The signal detection analysis also revealed that when the perception tasks were associated with value, individuals were less likely to be disturbed by the sound when processing the SIFI stimuli. Recent studies have examined the influence of bottom-up and top-down factors on multisensory integration, suggesting that multisensory integration in the SIFI is not an automatic and rigid process and that stimulus characteristics, task instructions, and cognitive processes, such as attention and expectations, shape multisensory integration (Keil, 2020). Our study broadened the existing dichotomy of SIFI, adding value-driven attentional priority to the factors that influence the SIFI. Moreover, given that the SIFI is considered to be a bistable perception phenomenon across sensory modalities, our findings also provide evidence for value-driven attention mechanisms, demonstrating that the operations underlying value-driven attention are not domain-specific and instead operate across multiple sensory systems (Anderson, 2016a, 2016b). They also reveal a relationship among the SIFI, attention, and reward, supporting the view that value-driven factors can influence the SIFI by modulating attention and visual processing.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Jiangsu Provincial Key Constructive Laboratory for Big Data of Psychology and Cognitive Science (72592162005G) and the Japan Society for the Promotion of Science KAKENHI (20K04381) and the National Natural Science Foundation of China (31871092 and 31700939).
