Effects of Multimodal Association on Ambiguous Perception in Binocular Rivalry

Abstract

When the two eyes view dissimilar images, an observer typically reports an ambiguous perception called binocular rivalry, in which subjective perception fluctuates between the two inputs. This perceptual instability is typically composed of exclusive dominance of each image and a transition state, called the piecemeal state, in which the two images are intermingled in a patchwork manner. Herein, we investigated the effects of multimodal association of sensory congruent pairs, arbitrary pairs, and reverse pairs on the piecemeal state in order to see how each level of association affects ambiguous perception during binocular rivalry. To induce the multisensory associations, we designed a matching task with audiovisual feedback in which subjects were required to respond according to given pairing rules. We found that explicit audiovisual associations can substantially affect the piecemeal state during binocular rivalry and that this congruency effect, which reduces the amount of visual ambiguity, originates primarily from explicit audiovisual association training rather than from common sensory features. Furthermore, when one piece of information is associated with multiple pieces of information, recent and preexisting associations work collectively to influence perceptual ambiguity during rivalry. Our findings show that learned multimodal association directly affects the temporal dynamics of ambiguous perception during binocular rivalry by modulating not only the exclusive dominance but also the piecemeal state in a systematic manner.

Keywords

perceptual ambiguity binocular rivalry multimodal association disambiguation psychophysics

Introduction

When a pair of incompatible images is shown, one to each eye, subjective perception becomes ambiguous even though the optical input to each eye remains unchanged (Wheatstone, 1838). To resolve this unstable state, the brain endeavors to achieve a single unequivocal interpretation by constantly alternating between the two sensory inputs. Each percept takes its turn, occupying only a few seconds of the subject’s experience, and this competition continues as long as the rival stimuli are present (Blake, 2001). Because of this dissociation between physical input and subjective experience, binocular rivalry has been considered an important experimental tool for exploring the neural correlates of conscious visual awareness and how the brain deals with ambiguous information.

Perhaps one of the most striking differences between binocular rivalry and the perceptual bistability experienced with ambiguous figures (Boring, 1930; Necker, 1832; Rubin, 1921) is the existence of a mixed percept in which the two given images are perceived simultaneously (Paffen, Naber, & Verstraten, 2008). Also called the piecemeal state because of its patchwork-like appearance, this transient state is the common ground before exclusive selection or reversal takes place. Thus, inspecting the piecemeal state can offer insight into the mechanism of perceptual disambiguation. For example, it has been shown that the piecemeal state increases with larger visual angles, at which distributed zones undergo rivalry independently, suggesting that visual disambiguation transpires not globally between the eyes but in each separate receptive field of monocular vision (Blake, 2001). Another study using complementary patchworks of intermingled rivalrous images reported that subjects experienced competition between unscrambled coherent images rather than between the separate patchwork images (Kovacs, Papathomas, Yang, & Feher, 1996). This suggests that, rather than oscillating between the low-level optical inputs themselves, the brain first converts them into familiar subjective patterns and then tries to conjoin them in order to resolve the apparent paradox. These studies reveal that the dynamics of the piecemeal state are affected not only by low-level neural signals but also by higher level cognitive processes such as semantics, making piecemeal rivalry an intriguing phenomenon for testing the mechanism of perceptual disambiguation.

Over the decades, the mechanism of perceptual disambiguation in binocular rivalry has been thoroughly investigated by studying several bottom-up and top-down factors that influence the behavior of binocular rivalry (Blake & Logothetis, 2002; Brascamp, Klink, & Levelt, 2015; Tong, Meng, & Blake, 2006). Earlier studies using luminance difference (Fox & Rasche, 1969; Kakizaki, 1960), contrast polarity (Hollins, 1980; Whittle, 1965), or color difference (Carney, Shadlen, & Switkes, 1987; Hollins & Leung, 1978) showed that any difference between the left- and right-eye stimuli could instigate the ambiguity, establishing lateral inhibition and self-adaptation of optical neural signals as the main building blocks of ambiguity control (Wilson, 2003). Studies using top-down factors such as attention (Lack, 1969, 1974; Ooi & He, 1999; Paffen & Alais, 2011; von Helmholtz & Southall, 1924) or emotional salience (Alpers & Gerdes, 2007) showed the possibility of attentional control during the disambiguation process. Other studies in a similar vein using induced color difference (Andrews & Lotto, 2004; Hong & Shevell, 2008) or orientation aftereffect (Chopin, Mamassian, & Blake, 2012) showed that even physically identical stimuli could cause rivalry when they signify different information at the level of subjective experience, supporting the idea that the disambiguation process takes place in higher level neurons along the visual pathway rather than in low-level neural signals (Dijkstra, van de Nieuwenhuijzen, & van Gerven, 2016; Scocchia, Valsecchi, & Triesch, 2014). Furthermore, brain-imaging studies using functional magnetic resonance imaging, magnetoencephalography, or electroencephalography imply that there is no isolated cortical area selectively correlating with the participant’s current percept during binocular rivalry (Kornmeier & Bach, 2012; Lumer, Friston, & Rees, 1998; Maier et al., 2008; Polonsky, Blake, Braun, & Heeger, 2000; Srinivasan, Russell, Edelman, & Tononi, 1999; Tong, Nakayama, Vaughan, & Kanwisher, 1998; Wilke, Logothetis, & Leopold, 2006).

Recently, several studies reported the influences of even higher levels of information processing such as multimodal congruence (Chen, Yeh, & Spence, 2011; Conrad, Bartels, Kleiner, & Noppeney, 2010; Einhäuser, Methfessel, & Bendixen, 2017; Kang & Blake, 2005; Lunghi & Alais, 2013; Lunghi, Binda, & Morrone, 2010; Lunghi, Morrone, & Alais, 2014; Maruya, Yang, & Blake, 2007; Pápai & Soto-Faraco, 2017; Piazza, Denison, & Silver, 2018; van Ee, van Boxtel, Parker, & Alais, 2009) or prior experience (Klink, Boucherie, Denys, Roelfsema, & Self, 2017; Klink, Brascamp, Blake, & van Wezel, 2010). The fact that even cross-modal influence or prior experience can directly modulate the dynamics of rivalry implies that the mechanism of conscious univocal interpretation of given environmental cues is more plastic and integrative than previous psychophysics and brain-imaging studies have suggested. Moreover, these studies showed that previous experience as recent as laboratory-induced multimodal association tasks could significantly influence the contents of conscious perception under ambiguity.

However, these studies primarily focused on unitary rivalry, investigating the cross-modal influence on initial selection or dominance enhancement of exclusively congruent stimuli during multimodal binocular rivalry. Dissociating the effects of cross-modal processing on unitary rivalry and piecemeal rivalry will allow a better understanding of how multimodal interaction affects different states of perceptual ambiguity during binocular rivalry. Herein, we investigated the effects of novel multimodal association on the piecemeal state, the common ground of selection and alternation, in an attempt to shed more light on how perceptual ambiguity is resolved in a multidimensional environment.

Experiment 1

Objective

In our first experiment, we investigated the effects of multimodal association of sensory congruent audiovisual pairs on the piecemeal state. Specifically, we compared the disambiguation effects of low-level sensory congruence and high-level cognitive congruence by probing the temporal dynamics of the piecemeal state. Studies on multisensory integration show that visual and auditory information can significantly affect each other and dramatically change the perceptual interpretation of a given input (Alais & Burr, 2004; McGurk & MacDonald, 1976; Shimojo & Shams, 2001). Although some studies have argued for a perceptual synergy from low-level sensory congruence such as flicker frequency (Kang & Blake, 2005), motion direction (Conrad et al., 2010), or spatial orientation (Stein & Stanford, 2008), many agree that the multisensory congruence effect becomes stronger with high-level, more naturalistic audiovisual pairs formed through repeated experience in complex natural scenes over time (Conrad et al., 2013; Gilbert & Sigman, 2007; Piazza et al., 2018; Shams & Seitz, 2008; Stein, Stanford, & Rowland, 2014; Talsma, 2015). Although the mechanisms for forming new multisensory associations that may influence the contents of conscious perception are unclear, recent works utilizing explicit multisensory association showed that laboratory-induced audiovisual association tasks can significantly modulate predominance during unitary rivalry. Thus, in this experiment, we investigated the modulatory effect of auditory information on the piecemeal state before and after the audiovisual association task to compare the disambiguation effects of sensory congruence and cognitive congruence.

Figure 1.

Stimuli set and association rules for Experiment 1. For the low-level sensory congruent audiovisual stimuli, flickering sinusoidal gratings (1 Hz and 3 Hz) and amplitude-modulated tones (1 Hz and 3 Hz) were used. In the matching task, the audiovisual stimuli were paired in matching temporal frequencies.

Stimuli and Apparatus

For the low-level sensory congruent audiovisual stimuli, we designed a pair of flickering sinusoidal gratings (1 Hz and 3 Hz) and amplitude-modulated (AM) tones (1 Hz and 3 Hz) so that the audiovisual stimuli could form sensory congruent pairs at each temporal frequency (Figure 1). The sinusoidal gratings and AM tones were generated using MATLAB in conjunction with PsychToolBox-3 (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007). Both gratings had a spatial frequency of 2.5 c/deg and were oriented either horizontally (1 Hz flicker) or vertically (3 Hz flicker). The gratings appeared within a circular Gaussian mask. Both patches were surrounded by a black line-drawing box with a red fixation cross in the middle to help subjects maintain a stable stereoscopic match during the rivalry procedure. The fixation cross subtended 0.57° with a line width of 0.11° at a viewing distance of 50 cm. The contrast of both gratings ranged from 0 (invisible) to 1, with a π/4 phase shift to ensure that at least one of the stimuli would be visible at any given time. The horizontal and vertical gratings were 600 × 600 pixels in size, which subtended approximately 8.02° at a viewing distance of 50 cm. Relatively large visual stimuli were used on purpose in order to instigate more piecemeal rivalry during the sessions (Hollins & Hudnell, 1980). All the visual stimuli were presented on a gray background (the mid value of the color look-up table of the screen). The sampling rate of the AM tones was 48 kbps, and the carrier was a 500 Hz pure tone. The 1 Hz and 3 Hz AM tones were also phase shifted by π/4 to synchronize with the horizontal and vertical flickering gratings, respectively. Both tones were prepared for stereo playback.
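As an illustration of the AM tones described above, the following minimal sketch synthesizes a sinusoidally amplitude-modulated tone in plain Python rather than in MATLAB with PsychToolBox-3. The raised-sinusoid envelope, the placement of the π/4 shift in the envelope, and the reading of the reported sampling rate as 48,000 samples per second are assumptions made for this example; the function names `envelope` and `am_tone` are hypothetical and are not taken from the original stimulus code.

```python
import math


def envelope(mod_hz, t, phase=math.pi / 4):
    """Assumed modulation envelope: a raised sinusoid between 0 and 1."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * mod_hz * t + phase))


def am_tone(mod_hz, duration_s, sample_rate=48_000, carrier_hz=500.0,
            phase=math.pi / 4):
    """Return mono samples in [-1, 1] of an amplitude-modulated pure tone.

    sample_rate=48_000 is an assumption about the reported sampling rate.
    """
    n_samples = int(round(duration_s * sample_rate))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        carrier = math.sin(2.0 * math.pi * carrier_hz * t)
        samples.append(envelope(mod_hz, t, phase) * carrier)
    return samples


# One second of each tone used in the experiments (1 Hz and 3 Hz modulation).
tone_1hz = am_tone(1.0, duration_s=1.0)
tone_3hz = am_tone(3.0, duration_s=1.0)
```

For stereo playback, the same sample list could simply be duplicated across the left and right channels.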

The visual stimuli were presented on the 27-in. LCD display of an iMac with a screen resolution of 2,560 × 1,440. The subjects viewed the monitor through a mirror stereoscope, with each eye seeing only the left or right half of the monitor. The mirror stereoscope could be adjusted to fit each subject’s vantage point to ensure binocular superposition of the left and right visual fields at the center. An adjustable chin rest was used to help subjects keep their head still during the calibration and rivalry procedure. The auditory stimuli were presented using Sony noise-canceling headphones to block out irrelevant outside noise.
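The stimulus sizes in this section are given as visual angles at a 50 cm viewing distance. The sketch below uses the standard relation size = 2 · distance · tan(angle / 2) to convert the reported angles into on-screen extents; the helper names are hypothetical, and the physical pixel pitch of the display is not reported in the text, so no pixel conversion is attempted.

```python
import math


def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_from_size(size_cm, distance_cm):
    """Visual angle (deg) subtended by size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Reported values: 8.02 deg gratings and a 0.57 deg fixation cross at 50 cm.
grating_cm = size_from_visual_angle(8.02, 50.0)  # about 7.0 cm
cross_cm = size_from_visual_angle(0.57, 50.0)    # about 0.5 cm
```

If the 600-pixel stimulus size refers to physical display pixels, a 7.0 cm extent would correspond to roughly 0.117 mm per pixel; this is an inference from the reported numbers, not a value stated in the source.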

Subjects

Fourteen undergraduate volunteers (4 females, mean age = 21.2 years, standard deviation [SD] = 1.8 years, range 19–25) with normal or corrected-to-normal vision and normal hearing participated in this experiment in exchange for course credit. The subjects were naive as to the specific purpose of this study. The subjects were also naive to the visual and auditory stimuli and their combinations, meaning that they did not have any explicit knowledge of association from which to infer cross-modal congruence from the given auditory information. Note also that the subjects knew neither the exact temporal frequencies (1 Hz or 3 Hz) of the visual and auditory stimuli nor that the stimuli could be audiovisually paired by matching temporal frequencies. All subjects provided written informed consent before participating in this study, and the tenets of the Declaration of Helsinki were followed. Every aspect of this study was approved and carried out in accordance with the regulations of the Korea Advanced Institute of Science and Technology (KAIST) Institutional Review Board.

Procedure

Subjects were initially told to sit in a comfortable position and to adjust the mirror stereoscope and the chin rest to comfortably fit their view of the monitor. The subjects were trained in advance to report their subjective experience using three predesignated keyboard keys (‘d’ for vertical grating, ‘v’ for horizontal grating, and ‘f’ for piecemeal). Each key was to be pressed only once as soon as the corresponding percept came into dominance. Here, the criterion for the piecemeal state was defined as ‘when neither one of the stimuli is exclusively dominant.’

The recording procedure began with three 60-second rivalry sessions (no sound, with 3 Hz sound, and with 1 Hz sound), followed by an explicit matching task with given association rules, and finished with a repetition of the three 60-second rivalry sessions (no sound, with 3 Hz sound, and with 1 Hz sound) to assess the effects of sound and audiovisual association. Before each 60-second rivalry session, the guiding box and the red fixation cross appeared on a gray background to let the subjects adjust the mirror stereoscope and calibrate their view. After stereoscopic matching, the subjects pressed the space bar to begin the actual rivalry session. During the audiovisual rivalry, the visual and auditory stimuli were presented continuously from the beginning to the end of the session. The subjects were given sufficient break time between sessions in order to recover from any eye strain and minimize the effects of adaptation. Also, the dichoptic presentation of the visual stimuli was counterbalanced across the eyes for each subject in order to eliminate any effects of eye dominance.

In the audiovisual matching task, the subjects were told the exact temporal frequencies of the stimuli and that the stimuli could be audiovisually paired by matching frequencies. After learning the audiovisual association rules, the subjects went through the matching task. On each trial of the matching task, one of the four possible audiovisual combinations (1 Hz grating with 1 Hz sound, 1 Hz grating with 3 Hz sound, 3 Hz grating with 1 Hz sound, and 3 Hz grating with 3 Hz sound) was presented to the subjects. The subjects pressed the ‘O’ key when the temporal frequencies were congruent and the ‘X’ key when they were incongruent. Simple audiovisual feedback with an O/X image and a correct/incorrect tone was given after each keyboard response. This procedure was repeated until the subjects scored 20 consecutive correct answers, making them practice the rule exhaustively. If a subject made a mistake during the procedure, the score count was reset to 0, starting from the beginning. The average time taken by the subjects to pass the matching task was approximately 4 minutes (M = 251 seconds, SD = 32.70 seconds).
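The stopping rule of the matching task (20 consecutive correct answers, with the count reset to 0 on any mistake) can be sketched as follows. This is an illustrative reconstruction of the rule as described, not the original task code; the function names and the example response sequence are hypothetical.

```python
def is_congruent(grating_hz, sound_hz):
    """Pairing rule of Experiment 1: a pair matches when frequencies are equal."""
    return grating_hz == sound_hz


def trials_to_criterion(outcomes, criterion=20):
    """Return the 1-based trial number at which `criterion` consecutive
    correct responses is first reached, or None if it is never reached."""
    streak = 0
    for trial_number, correct in enumerate(outcomes, start=1):
        # A correct response extends the streak; a mistake resets it to 0.
        streak = streak + 1 if correct else 0
        if streak >= criterion:
            return trial_number
    return None


# Hypothetical subject: five correct answers, one mistake, then 20 correct.
example_outcomes = [True] * 5 + [False] + [True] * 20
trials_needed = trials_to_criterion(example_outcomes)
```

Under this rule a single mistake costs the subject all accumulated progress, which is why the task forces the pairing rule to be practiced until it is applied without error.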

Results

The duration of time for 3 Hz, 1 Hz, and piecemeal percepts was recorded for each subject. As some of the subjects occasionally pressed the same key twice by accident (average error response rate = 0.79%, SD = 0.020), the logged data were trimmed to exclude these events (both initial and repeated keypress events) leaving only the valid events. The mean dominance duration, total dominance, and switch rates were then analyzed using SPSS statistics software version 22.0. Two-way repeated measures analyses of variance (ANOVAs) were conducted on the influence of two independent variables (association and sound) to compare the main effects of association and sound and the interaction effects between association and sound on 3 Hz, 1 Hz, and piecemeal percepts and switch rate, respectively. Association included two levels (before and after), and sound consisted of three levels (no sound, 3 Hz sound, and 1 Hz sound). Here, the basic idea was that if a congruent auditory stimulus helps the selection of coinciding visual stimulus during intermittent ambiguous states, the overall mean and total durations of the piecemeal state should decrease with 3 Hz or 1 Hz sound in addition to its congruence effects on complete dominance. Also, if the low-level temporal congruence of visual and auditory stimuli themselves was sufficient to induce multisensory integration, the auditory modulatory effects should be present without any explicit association process. Should this not be the case, the piecemeal resolution should occur only after learning the audiovisual pairing rules and forming higher level cognitive congruence. However, if the intervention of sound or association cannot cause any difference in conscious perception during piecemeal state, neither sound nor association would result in significant main effects on piecemeal durations.
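The trimming and summary steps described above can be sketched as follows. The sketch rests on several assumptions that the text does not spell out: each keypress is taken to mark the onset of a percept that lasts until the next keypress (or the end of the session), and any event whose key equals that of an adjacent event is discarded, which removes both the initial and the repeated keypress. The function name and the example log are hypothetical.

```python
from collections import defaultdict


def summarize_session(events, session_end_s=60.0):
    """Summarize percept durations from a time-sorted list of (onset_s, key).

    Returns {key: {"mean_s": ..., "total_s": ..., "count": ...}} computed
    over valid events only (assumed trimming rule, see the text above).
    """
    durations = defaultdict(list)
    for i, (onset, key) in enumerate(events):
        has_next = i + 1 < len(events)
        offset = events[i + 1][0] if has_next else session_end_s
        prev_same = i > 0 and events[i - 1][1] == key
        next_same = has_next and events[i + 1][1] == key
        if prev_same or next_same:
            continue  # drop both the initial and the repeated keypress
        durations[key].append(offset - onset)
    return {
        key: {"mean_s": sum(ds) / len(ds), "total_s": sum(ds), "count": len(ds)}
        for key, ds in durations.items()
    }


# Hypothetical 12-second log using the Experiment 1 keys
# ('d' = vertical, 'v' = horizontal, 'f' = piecemeal); 'f' is pressed twice.
example_events = [(0.0, "f"), (2.0, "d"), (5.0, "f"), (6.0, "f"),
                  (7.5, "v"), (10.0, "d")]
summary = summarize_session(example_events, session_end_s=12.0)
```

In this example the doubled 'f' presses at 5.0 s and 6.0 s are excluded, so the piecemeal summary rests on the single valid episode that started at 0.0 s.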

First, the analysis of complete dominance of the 3 Hz and 1 Hz percepts revealed consistent enhancement effects of congruent auditory stimuli after association in mean and total dominance duration (Figure 2). In the mean dominance duration of the 3 Hz percept, there was a significant interaction effect between association and sound, F(2, 26) = 5.81, p = .008, $η_{p}^{2}$ = 0.31, with pairwise comparison showing a significant enhancement effect of the 3 Hz auditory stimulus (M = 1.80, SD = 0.63, p = .02, d = 0.61) after the association compared with the no sound condition (M = 1.46, SD = 0.48) with Bonferroni correction (all p values are Bonferroni corrected unless otherwise noted as uncorrected). However, there was no suppression effect of the 1 Hz auditory stimulus (M = 1.50, SD = 0.43, p = .40, uncorrected, d = 0.09). Similar results were found in total dominance duration, F(2, 26) = 3.48, p = .046, $η_{p}^{2}$ = 0.21, with pairwise comparison showing an enhancement effect of the 3 Hz sound (M = 24.39, SD = 5.07, p = .0003, d = 1.03) compared with the no sound condition (M = 18.89, SD = 5.61) and no suppression effect of the 1 Hz sound (M = 20.66, SD = 5.04, p = .10, d = 0.32). In the mean dominance duration of the 1 Hz percept, there was also a significant interaction effect, F(2, 26) = 4.28, p = .025, $η_{p}^{2}$ = 0.25, with pairwise comparison showing a significant enhancement effect of the 1 Hz auditory stimulus (M = 1.78, SD = 0.50, p = .0004, d = 0.84) after the association compared with the no sound condition (M = 1.42, SD = 0.36), while no suppression effect was revealed with the 3 Hz sound (M = 1.50, SD = 0.43, p = .69, d = 0.20). The total dominance results revealed a similar interaction effect, F(2, 26) = 3.61, p = .041, $η_{p}^{2}$ = 0.22, with pairwise comparison showing a significant enhancement by the 1 Hz sound (M = 22.39, SD = 3.71, p = .0001, d = 1.41) relative to the no sound condition (M = 16.38, SD = 4.81) and no suppression effect of the 3 Hz sound (M = 17.96, SD = 4.83, p = .36, d = 0.33).
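The text does not state which formula was used for the effect size d. As a hedged illustration, the sketch below computes Cohen's d with the two conditions' standard deviations pooled as the root mean of their variances. This particular choice is an assumption: it reproduces the two values shown here from the reported means and SDs, but it is not confirmed to be the formula used in the original analysis, and it ignores the within-subject correlation of the repeated-measures design.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with SDs pooled as sqrt((sd_a^2 + sd_b^2) / 2) (assumed)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return abs(mean_a - mean_b) / pooled_sd


# 3 Hz percept after association, 3 Hz sound versus no sound (reported values).
d_mean = cohens_d(1.80, 0.63, 1.46, 0.48)     # mean dominance duration
d_total = cohens_d(24.39, 5.07, 18.89, 5.61)  # total dominance duration
```

Rounded to two decimals, these come out at 0.61 and 1.03, in line with the reported effect sizes for these two comparisons.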

Figure 2.

Results of Experiment 1. Mean and total dominance duration of 3 Hz visual percept (top row), 1 Hz visual percept (second row), piecemeal percept (third row), and switch rate (bottom row). The small cross symbols represent individual data points. Error bars represent 1 standard error of the mean. ns = not statistically significant. *p < .05, **p < .01.

In the mean duration of piecemeal state, the results revealed a significant interaction effect between association and sound, F(2, 26) = 5.81, p = .008, $η_{p}^{2}$ = 0.31, meaning that the intervention of auditory stimulus on mean piecemeal duration differed between association time points. The pairwise comparison of effects of sounds revealed significant auditory modulatory effects of 3 Hz sound (M = 1.19, SD = 0.34, p = .04, d = 0.79) and 1 Hz sound (M = 1.24, SD = 0.36, p = .03, d = 0.67) compared with no sound condition (M = 1.55, SD = 0.57) after the association. Although the presentation of sounds slightly increased the piecemeal duration before the association as if the auditory cues acted as a distraction source, the effect was statistically insignificant for both 3 Hz (M = 1.73, SD = 0.57, p = .34, d = 0.35), and 1 Hz (M = 1.76, SD = 0.65, p = .62, d = 0.38) sounds compared with no sound condition (M = 1.53, SD = 0.57). Similar results were obtained with the total dominance duration of piecemeal state. The statistical analysis revealed a significant interaction effect, F(2, 26) = 4.91, p = .016, $η_{p}^{2}$ = 0.27. The pairwise comparison revealed significant auditory modulatory effects of 3 Hz sound (M = 13.99, SD = 6.93, p = .009, d = 0.94) and 1 Hz sound (M = 15.84, SD = 7.20, p = .009, d = 0.66) compared with no sound condition (M = 20.49, SD = 6.86) after the association, while the same pairwise comparison revealed nonsignificant effects of 3 Hz sound (M = 18.68, SD = 7.31, p = .34, uncorrected, d = 0.27) and 1 Hz sound (M = 18.76, SD = 5.75, p = .35, uncorrected, d = 0.31) compared with no sound condition (M = 16.52, SD = 8.63) before the association. Finally, the analysis on switch rate revealed a nonsignificant interaction effect, F(2, 26) = 0.62, p = .548, $η_{p}^{2}$ = 0.045, meaning that the reduction of mean piecemeal duration after association was not due to increased switch rate.

This implies that the matching temporal frequency without explicit association was insufficient for subjects to infer audiovisual congruence, whereas learning the audiovisual pairing rules could help the subjects break away from the visual ambiguity during the rivalry.
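As a consistency check on the effect sizes reported in the Results, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity $η_{p}^{2}$ = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to two of the interaction effects reported for the piecemeal state; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Interaction effects on the piecemeal state in Experiment 1.
eta_mean = partial_eta_squared(5.81, 2, 26)   # mean piecemeal duration
eta_total = partial_eta_squared(4.91, 2, 26)  # total piecemeal duration
```

Rounded to two decimals, these give 0.31 and 0.27, matching the reported values.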

Experiment 2

Objective

In the previous experiment, we showed that the multimodal association of congruent audiovisual pairs could significantly reduce the duration of the piecemeal state. However, one can speculate that this congruence effect emerged not entirely from the explicit association but also in part from the apparent sensory congruence itself. In this case, the cognitive congruence acquired through the association task could have merely raised the preexisting sensory congruence to significance. Also, as mentioned in the ‘Objective’ section of Experiment 1, the cross-modal congruence effect is expected to be stronger with ecologically valid audiovisual pairs, which in many cases bear intrinsic low-level sensory congruence such as spatiotemporal synchrony (De Gelder & Bertelson, 2003; Macaluso & Driver, 2005; Macaluso, George, Dolan, Spence, & Driver, 2004; Stevenson, Fister, Barnett, Nidiffer, & Wallace, 2012).

However, recent studies using motion or simple gratings paired with pure tones have demonstrated that learning the given rules in the laboratory could bias the initial perception or increase the dominance duration to some extent during binocular rivalry (Einhäuser et al., 2017; Piazza et al., 2018). This implies that arbitrary pairings of nonnaturalistic stimuli could also affect the temporal dynamics of piecemeal state. Thus, in Experiment 2, drawing from the results of Experiment 1, we did an in-depth investigation of multimodal association on piecemeal state using arbitrary audiovisual pairs void of any sensory congruence in order to isolate the effects of cognitive congruence.

Figure 3.

Stimuli set and association rules for Experiment 2. For the arbitrary audiovisual stimuli, images of animals (dog/duck) and amplitude-modulated tones (1 Hz/3 Hz) were used. In the matching task, the given audiovisual pairing rules were ‘Dog—3 Hz’ and ‘Duck—1 Hz.’

Stimuli and Apparatus

For the arbitrary audiovisual pairs without any apparent sensory congruence, we adopted line drawings of animals (dog/duck) and AM tones (1 Hz/3 Hz) (Figure 3). The line drawings were adapted from the ‘Snodgrass and Vanderwart-Like Objects’ (Rossion & Pourtois, 2004, stimulus set files #073, #081) to subtend approximately 8.02° at a viewing distance of 50 cm. To induce extended periods of the piecemeal state, the images were flipped horizontally to face opposite directions (dog looking right and duck looking left) so that relatively large portions of the images would not overlap. Both images were presented with the same black box and red fixation cross used in Experiment 1 for stable stereoscopic matching. All the visual stimuli were presented on a gray background (the mid value of the color look-up table of the monitor). The AM tones were generated using MATLAB with PsychToolBox-3 with the same specifications used in Experiment 1 but without the phase shift. The sampling rate of the AM tones was 48 kbps, and the carrier was a 500 Hz pure tone. Both tones were prepared for stereo playback.

The visual stimuli were presented on the 27-in. LCD display of an iMac with a screen resolution of 2,560 × 1,440. The subjects viewed the monitor through a mirror stereoscope with an adjustable chin rest. The auditory stimuli were presented with Sony noise-canceling headphones to minimize outside noise.

Subjects

Thirteen undergraduate volunteers (5 females, mean age = 22.5 years, SD = 2.3 years, range 19–26) with normal or corrected-to-normal vision and normal hearing participated in this experiment in exchange for course credit. The subjects were naive as to the specific purpose of this study. The subjects were also naive to the visual and auditory stimuli other than the fact that the images signify ‘dog’ and ‘duck’. All subjects provided written informed consent before participating in this study, and the tenets of the Declaration of Helsinki were followed. Every aspect of this study was carried out in accordance with the regulations of the KAIST Institutional Review Board.

Procedure

The experimental flow remained the same as in Experiment 1. The subjects first adjusted the mirror stereoscope and the chin rest to achieve a secure vantage point that perfectly matches the guiding cues (black box and red fixation cross). After the calibration process, the recording began with three 60-second rivalry sessions (no sound, with 3 Hz sound, and with 1 Hz sound) as a baseline period. The subjects were trained in advance to report their subjective experience using three keyboard keys (‘h’ for dog, ‘m’ for duck, and ‘j’ for piecemeal) pressing only once as soon as an image came into dominance. The criteria for piecemeal state were defined as ‘when neither dog nor duck is exclusively dominant.’

After the baseline recording, the subjects learned the pairing rules (dog with 3 Hz tone, duck with 1 Hz tone) and practiced the rules using the explicit matching task. On each trial of the matching task, one of the four possible combinations (dog with 3 Hz tone, dog with 1 Hz tone, duck with 3 Hz tone, and duck with 1 Hz tone) was randomly presented, and the subjects were told to press the ‘O’ key when the given stimuli matched the pairing rules and the ‘X’ key when they did not. Simple audiovisual feedback with an O/X image and a correct/incorrect tone was given after each keyboard response. This procedure continued until the subjects scored 20 consecutive correct answers, making them practice the rules exhaustively. Any mistake during the procedure reset the score count to 0. The average time taken by the subjects to pass the matching task was approximately 4 minutes (M = 233 seconds, SD = 23.34 seconds).

Finally, another set of three 60-second rivalry sessions (no sound, with 3 Hz sound, and with 1 Hz sound) was recorded to assess the effects of multimodal association. The subjects were given sufficient break time between sessions in order to recover from any eye strain and minimize the effects of adaptation.

Results

The dominance duration of each percept was recorded and trimmed as in Experiment 1 to eliminate any erroneous responses (average error response rate = 0.13%, SD = 0.003), leaving only the valid events. The mean duration, total duration, and switch rate values were then analyzed using SPSS statistics software. Two-way repeated measures ANOVAs were conducted on the influence of two independent variables (association and sound) to compare the main effects of association and sound and the interaction effects between association and sound on dog, duck, piecemeal percepts, and switch rate, respectively. Association included two levels (before and after), and sound consisted of three levels (no sound, 3 Hz sound, and 1 Hz sound).
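For a fully within-subject design such as this one, the degrees of freedom of the interaction term follow directly from the number of levels of each factor and the number of subjects. The sketch below assumes that no sphericity correction is applied and that all subjects contribute to every cell; the function name is hypothetical.

```python
def rm_anova_interaction_df(levels_a, levels_b, n_subjects):
    """Degrees of freedom (effect, error) for the A x B interaction in a
    two-way repeated measures ANOVA, assuming no sphericity correction."""
    df_effect = (levels_a - 1) * (levels_b - 1)
    df_error = df_effect * (n_subjects - 1)
    return df_effect, df_error


# Association (2 levels) x sound (3 levels).
df_exp1 = rm_anova_interaction_df(2, 3, n_subjects=14)
df_exp2 = rm_anova_interaction_df(2, 3, n_subjects=13)
```

With 14 subjects this gives (2, 26) and with 13 subjects (2, 24), which agrees with the F(2, 26) and F(2, 24) statistics reported for Experiments 1 and 2.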

Figure 4.

Results of Experiment 2. Mean and total dominance duration of dog visual percept (top row), duck visual percept (second row), piecemeal percept (third row), and switch rate (bottom row). The small cross symbols represent individual data points. Error bars represent 1 standard error of the mean. ns = not statistically significant. *p < .05, **p < .01.

First, the analysis on complete dominance of dog and duck percepts revealed consistent enhancement effects of explicitly paired auditory stimuli after association in mean and total dominance duration (Figure 4). In the mean dominance duration of dog percept, there was a significant interaction effect between association and sound, F(2, 24) = 6.82, p = .005, $η_{p}^{2}$ = 0.36, with pairwise comparison showing significant enhancement effect of 3 Hz auditory stimulus (M = 1.93, SD = 0.57, p = .002, d = 0.57) after the association compared with no sound condition (M = 1.63, SD = 0.48). However, there was no suppression effect by 1 Hz auditory stimulus (M = 1.59, SD = 0.45, p = .66, uncorrected, d = 0.09). Similar results were found in total dominance duration, F(2, 24) = 16.88, p = .00003, $η_{p}^{2}$ = 0.58, with pairwise comparison showing enhancement effect of 3 Hz sound (M = 22.96, SD = 3.95, p = .009, d = 1.10) compared with no sound condition (M = 17.73, SD = 5.60) and no suppression effects of 1 Hz sound (M = 14.76, SD = 3.61, p = .28, d = 0.64). In the mean dominance duration of duck percept, there was also a significant interaction effect, F(2, 24) = 6.16, p = .007, $η_{p}^{2}$ = 0.34, with pairwise comparison showing significant enhancement effect of explicitly paired 1 Hz auditory stimulus (M = 2.03, SD = 0.78, p = .03, d = 0.68) after the association compared with no sound condition (M = 1.58, SD = 0.54), while no suppression effect was revealed with 3 Hz sound (M = 1.42, SD = 0.44, p = .18, d = 0.33). The total dominance results revealed similar interaction effect, F(2, 24) = 6.06, p = .007, $η_{p}^{2}$ = 0.34, with pairwise comparison showing significant enhancement of 1 Hz sound (M = 21.11, SD = 4.29, p = .004, d = 1.11) from no sound condition (M = 16.45, SD = 4.11) and no suppression effect of 3 Hz sound (M = 14.56, SD = 5.15, p = .92, d = 0.41).

In the mean piecemeal duration, the results revealed a significant interaction effect between association and sound, F(2, 24) = 9.40, p = .001, $η_{p}^{2}$ = 0.44, suggesting that the auditory modulation effects on mean piecemeal duration were different before and after the association task. The post hoc pairwise comparison of each sound level before the association revealed that the difference in mean piecemeal duration was not statistically significant between no sound (M = 1.84, SD = 0.73) and 3 Hz sound (M = 1.77, SD = 0.74, p = .40, uncorrected, d = 0.10) or between no sound and 1 Hz sound (M = 1.99, SD = 0.74, p = .37, d = 0.20). This suggests that presenting irrelevant auditory information does not significantly modify the subjective perception of the ambiguous state during rivalry. However, the same pairwise comparison of each sound level after the association revealed that the auditory modulatory effect was significant both in the 3 Hz sound condition (M = 1.52, SD = 0.47, p = .004, d = 0.59) and in the 1 Hz sound condition (M = 1.53, SD = 0.40, p = .008, d = 0.61) compared with the no sound condition (M = 1.84, SD = 0.61). Similar results were obtained with the total dominance duration of the piecemeal state. The statistical analysis revealed a significant interaction effect, F(2, 24) = 5.35, p = .012, $η_{p}^{2}$ = 0.31. The pairwise comparison revealed significant auditory modulatory effects of the 3 Hz sound (M = 16.01, SD = 7.20, p = .003, d = 0.62) and the 1 Hz sound (M = 14.68, SD = 6.30, p = .0001, d = 0.84) compared with the no sound condition (M = 20.78, SD = 8.21) after the association, while the effects were not statistically significant for either the 3 Hz sound (M = 16.74, SD = 7.08, p = .69, uncorrected, d = 0.10) or the 1 Hz sound (M = 18.23, SD = 7.92, p = .70, uncorrected, d = 0.10) in comparison with the no sound condition (M = 17.47, SD = 7.05) before the association. Finally, the analysis of switch rate revealed a nonsignificant interaction effect, F(2, 24) = 0.07, p = .934, $η_{p}^{2}$ = 0.01, meaning that the reduction of mean piecemeal duration after association was not due to an increased switch rate.

This implies that, with an appropriate audiovisual matching procedure, previously irrelevant auditory information void of any low-level sensory congruence can become a relevant cue that induces piecemeal resolution. Furthermore, the results of Experiment 2 support the idea that the multisensory disambiguation effect is determined not primarily by intrinsic sensory congruence such as spatiotemporal synchrony but rather by explicit associations between distinct audiovisual information, which form higher level cognitive congruence. Also, note that the cognitive congruence here was acquired through repeated exposure to novel stimuli, judgment of the appropriate pairing patterns, and feedback that reinforced each decision, a process that resembles how we experience multisensory pairs in natural scenes.

Experiment 3

Objective

In the previous experiments, we showed that brief explicit associations can directly alter the ambiguous perception during binocular rivalry by modulating the duration of the piecemeal state, even with arbitrary audiovisual stimuli void of any apparent sensory congruence. Considering the results of this study thus far and previous reports on the effects of explicit associations on binocular rivalry (Einhäuser et al., 2017; Piazza et al., 2018), we noticed that the multisensory congruence acquired in these laboratory-designed tasks consisted of newly formed, recent associations, in contrast to common naturalistic associations. As mentioned in the ‘Objective’ section of Experiment 1, the influence of multisensory congruence is expected to become more evident over time with repeated experience (e.g., mouth and speech association leading to the ventriloquist illusion), possibly even becoming an automatic process. However, other studies on top-down effects in multisensory integration suggest, to the contrary, that automatic multisensory responses can be overruled depending on the current task or the expectations of the observer (Koelewijn, Bronkhorst, & Theeuwes, 2010; Macaluso et al., 2016; van Atteveldt, Formisano, Goebel, & Blomert, 2007).

Thus, in Experiment 3, we aimed to examine how recent audiovisual associations compare to already existing strong associations in terms of piecemeal modulation during binocular rivalry. Specifically, we adopted a pair of semantically congruent stimuli (images and soundtracks of animals) and designed a multimodal reverse association task to compare the modulatory effects of recent reverse associations and long-standing natural associations.

Figure 5.

Stimuli set and association rules for Experiment 3. For the semantically congruent audiovisual stimuli, images and soundtracks of a dog and a duck were used. Subjects were highly familiar with these stimuli and could correctly categorize each visual and auditory stimulus as either ‘dog’ or ‘duck’ without any instruction. In the matching task, reverse association rules were given, under which subjects had to pair the dog image with the crying duck sound and vice versa.

Stimuli and Apparatus

For the semantically congruent audiovisual stimulus pairs with robust natural associations, we chose images and soundtracks of a dog and a duck (Figure 5). The line drawings of the animals were adapted from the ‘Snodgrass and Vanderwart-Like Objects’ set, as in Experiment 2. The images were flipped horizontally to induce more piecemeal state, with each image subtending approximately 8.02° at a viewing distance of 50 cm. Both images were presented with the guiding cues (black box/red fixation cross) for stable stereoscopic matching. All visual stimuli were presented on a gray background. Real-life recordings of a barking dog and a crying duck were downloaded from the Soundsnap website (www.soundsnap.com) and prepared for 60-second stereo playback to ensure smooth, continuous presentation of the auditory context. Subjects could adjust the volume of each soundtrack before the presentation so that the two had similar subjective strength. The apparatus used to present the stimuli was the same as in the previous experiments.

Subjects

Fifteen undergraduate volunteers (3 females, mean age = 22.2 years, SD = 1.5 years, range 19–24) with normal or corrected-to-normal vision and normal hearing participated in this experiment in exchange for course credit. The subjects were naive to the specific purpose of this study but were so familiar with the semantic contents of the visual and auditory stimuli that they could easily categorize the stimuli into two groups (dog/duck) without any instruction. In other words, the subjects already retained the natural associations (dog image with barking dog soundtrack, and duck image with crying duck soundtrack) from previous experience. All subjects provided written informed consent before participating in this study, and the tenets of the Declaration of Helsinki were followed. Every aspect of this study was carried out in accordance with the regulations of the KAIST Institutional Review Board.

Procedure

The subjects first calibrated the mirror stereoscope to superimpose the left and right guiding cues. The rivalry report began with three 60-second recordings (no sound, with barking dog sound, and with crying duck sound) to see the effects of semantically congruent soundtracks in natural associations.

Then, the subjects underwent the matching task to learn the reverse associations. During the matching task, one of the four possible audiovisual combinations (dog image with barking dog soundtrack, dog image with crying duck soundtrack, duck image with barking dog soundtrack, and duck image with crying duck soundtrack) was presented at random. The subjects were told to press the ‘O’ key when the presented pair matched the given reverse association rules (either dog image with crying duck sound or duck image with barking dog sound) and the ‘X’ key when it did not. Every response was followed by audiovisual feedback consisting of an O/X image and a correct/incorrect tone. The matching task continued until the subjects scored 20 correct answers. Any mistake during the task reset the score count to 0, which kept the subjects attentive to the rules. The average task completion time was approximately 4 minutes (M = 245 seconds, SD = 28.16 seconds).
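The stopping rule of the matching task can be made concrete with a short sketch. The Python below is an illustration only, not the software used in the experiment: the function names, the `respond` callback, and the `max_trials` safety cap are hypothetical, and the sketch reads the reset-on-error rule as requiring 20 consecutive correct responses.

```python
import random

# The four (image, sound) combinations described in the procedure.
PAIRS = [("dog", "dog"), ("dog", "duck"), ("duck", "dog"), ("duck", "duck")]


def is_reverse_match(image, sound):
    """Reverse rule: a pair 'matches' when image and sound name different animals."""
    return image != sound


def run_matching_task(respond, target_streak=20, rng=None, max_trials=10_000):
    """Present random pairs until `target_streak` correct answers occur in a row.

    `respond(image, sound)` is a hypothetical stand-in for the subject and
    must return "O" (match) or "X" (mismatch). Any error resets the count
    to 0. Returns the number of trials presented.
    """
    rng = rng or random.Random()
    streak = 0
    trials = 0
    while streak < target_streak and trials < max_trials:
        image, sound = rng.choice(PAIRS)
        answer = respond(image, sound)
        correct = (answer == "O") == is_reverse_match(image, sound)
        streak = streak + 1 if correct else 0  # a mistake resets the score
        trials += 1
    return trials
```

Under this reading, a subject who never errs finishes in exactly 20 trials, and each error adds the trials completed since the last reset to the total.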

Next, the rivalry report was repeated in the same manner (no sound, with barking dog sound, and with crying duck sound) to see whether the same soundtracks would have different effects under the reverse associations. The criterion for piecemeal was ‘when neither dog nor duck is exclusively dominant’ for all rivalry sessions, and subjects could rest between sessions to recover from any eye strain and to minimize the effects of adaptation.

Figure 6.

Results of Experiment 3. Mean and total dominance duration of dog visual percept (top row), duck visual percept (second row), piecemeal percept (third row), and switch rate (bottom row). The small cross symbols represent individual data points. Error bars represent 1 standard error of the mean. ns = not statistically significant. *p<0.05, **p<0.01.

Results

The duration of time for each dominant percept was recorded for each subject. After eliminating accidental keypresses (average error response rate = 1.10%, SD = 0.023), the trimmed data were analyzed using SPSS statistics software. Two-way repeated measures ANOVAs were conducted on the influence of two independent variables (association and sound) to compare the main effects of association and sound and the interaction effects between association and sound on dog, duck, piecemeal percepts, and switch rate, respectively. Association included two levels (before and after), and sound consisted of three levels (no sound, barking dog sound, and crying duck sound).
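As a minimal sketch of how such a record can be reduced to the dependent measures, the following Python assumes that each session is stored as an ordered list of (percept, duration in seconds) episodes. That storage format and the function name are assumptions made for illustration, since the analysis itself was run in SPSS. Mean dominance duration is taken here as total duration divided by the number of episodes of that percept; switch rate is omitted because this section does not state how it was computed.

```python
from collections import defaultdict


def summarize_session(episodes):
    """Return total and mean duration per percept for one rivalry session.

    `episodes` is a list of (percept, duration_s) tuples in report order,
    e.g. [("dog", 2.0), ("piecemeal", 1.0), ("duck", 3.0)].
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for percept, duration in episodes:
        totals[percept] += duration
        counts[percept] += 1
    return {
        percept: {"total": totals[percept], "mean": totals[percept] / counts[percept]}
        for percept in totals
    }
```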

First, the analysis on complete dominance of dog and duck percepts revealed enhancement effects of semantically congruent auditory stimuli before reverse association in mean and total dominance duration (Figure 6). In the mean dominance duration of dog percept, there was a significant interaction effect between association and sound, F(2, 28) = 3.52, p = .043, $η_{p}^{2}$ = 0.20, with pairwise comparison showing significant enhancement effect of dog soundtrack (M = 1.76, SD = 0.37, p = .0003, d = 1.03) before the association compared with no sound condition (M = 1.44, SD = 0.25). However, there was no suppression effect by duck soundtrack (M = 1.43, SD = 0.32, p = .92, uncorrected, d = 0.04). The same pairwise comparison after the reverse association showed that the auditory modulation effects were statistically insignificant in both dog soundtrack (M = 1.54, SD = 0.24, p = .53, uncorrected, d = 0.20) and duck soundtrack (M = 1.63, SD = 0.39, p = .54, d = 0.44) compared with no sound condition (M = 1.49, SD = 0.25). Similar results were found in total dominance duration, F(2, 28) = 6.67, p = .004, $η_{p}^{2}$ = 0.32, with pairwise comparison showing enhancement effect of dog soundtrack (M = 21.75, SD = 4.97, p = .0001, d = 1.13) compared with no sound condition (M = 17.06, SD = 3.33) and no suppression effects of duck soundtrack (M = 17.15, SD = 3.88, p = .95, uncorrected, d = 0.02) before reverse association. The same pairwise comparison after the reverse association showed that the auditory modulation effects were statistically insignificant in both dog soundtrack (M = 18.07, SD = 5.33, p = .70, uncorrected, d = 0.06) and duck soundtrack (M = 19.85, SD = 4.37, p = .48, d = 0.44) compared with no sound condition (M = 17.75, SD = 5.26). 
In the mean dominance duration of duck percept, there was a significant interaction effect, F(2, 28) = 5.93, p = .007, $η_{p}^{2}$ = 0.30, with pairwise comparison showing significant enhancement effect of duck soundtrack (M = 2.06, SD = 0.35, p = .001, d = 1.55) before the reverse association compared with no sound condition (M = 1.54, SD = 0.32), while no suppression effect was revealed with dog soundtrack (M = 1.59, SD = 0.26, p = .56, uncorrected, d = 0.17). The same pairwise comparison after the reverse association showed that the auditory modulation effects were statistically insignificant in both dog soundtrack (M = 1.51, SD = 0.41, p = .58, uncorrected, d = 0.22) and duck soundtrack (M = 1.53, SD = 0.39, p = .67, uncorrected, d = 0.18) compared with no sound condition (M = 1.61, SD = 0.51). The total dominance results revealed similar interaction effect, F(2, 28) = 5.09, p = .013, $η_{p}^{2}$ = 0.27, with pairwise comparison showing significant enhancement of duck soundtrack (M = 22.57, SD = 5.08, p = .007, d = 0.62) from no sound condition (M = 19.19, SD = 5.81) and no suppression effect of dog soundtrack (M = 19.09, SD = 7.72, p = .95, uncorrected, d = 0.01) before reverse association. The same pairwise comparison after the reverse association showed that the auditory modulation effects were statistically insignificant in both dog soundtrack (M = 18.97, SD = 6.52, p = .27, d = 0.36) and duck soundtrack (M = 17.06, SD = 4.95, p = .99, uncorrected, d = 0.002) compared with no sound condition (M = 17.05, SD = 4.08).

In the mean piecemeal duration, the results revealed a significant interaction effect between association and sound, F(2, 28) = 10.41, p = .0004, $η_{p}^{2}$ = 0.43, meaning that the same animal soundtracks had different auditory modulatory effects before and after the reverse association. Post hoc pairwise comparisons between the sound conditions were statistically significant both before and after the reverse association, but the direction of the auditory modulation of the ambiguous state was reversed. First, before the reverse association task, presenting animal soundtracks significantly reduced the mean piecemeal duration in both the barking dog sound condition (M = 1.52, SD = 0.39, p = .04, d = 0.73) and the crying duck sound condition (M = 1.62, SD = 0.55, p = .01, d = 0.48) compared with the no sound condition (M = 1.92, SD = 0.70). Note that, unlike in the ‘before association’ conditions of Experiments 1 and 2, the subjects here already retained the long-standing natural associations between the visual and auditory stimuli (dog image with barking dog sound and duck image with crying duck sound), which could produce a semantic congruence effect. As the graph shows, the preexisting natural associations appear to help reduce the amount of visual ambiguity. However, after the reverse association, listening to the same animal soundtracks increased the mean piecemeal duration with both the barking dog sound (M = 2.30, SD = 1.01, p = .04, d = 0.86) and the crying duck sound (M = 2.52, SD = 1.16, p = .02, d = 1.05) compared with the no sound condition (M = 1.68, SD = 0.44). Similar results were obtained for the total dominance duration of the piecemeal state. The statistical analysis revealed a significant interaction effect, F(2, 28) = 20.02, p = .000004, $η_{p}^{2}$ = 0.59.
Pairwise comparisons revealed that, before the association, total piecemeal duration decreased significantly with the dog sound (M = 12.23, SD = 4.88, p = .005, d = 1.25) and the duck sound (M = 11.58, SD = 4.16, p = .003, d = 1.45) compared with the no sound condition (M = 19.83, SD = 7.24), whereas after the reverse association the same sounds significantly increased total piecemeal duration for both the dog sound (M = 23.37, SD = 8.64, p = .003, d = 1.17) and the duck sound (M = 23.95, SD = 8.85, p = .001, d = 1.23) compared with the no sound condition (M = 15.43, SD = 4.96). Finally, the analysis of switch rate revealed a nonsignificant interaction effect, F(2, 28) = 0.92, p = .41, $η_{p}^{2}$ = 0.06, meaning that the decrease or increase in piecemeal durations was not due to a change in switch rate.
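For readers checking the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity η_p² = F·df_effect / (F·df_effect + df_error). The Python sketch below, with hypothetical function names, applies this identity together with the usual degrees of freedom for the interaction in a fully within-subject two-way design (assuming no sphericity correction). It is a consistency check on the printed values, not the SPSS procedure used in the study.

```python
def interaction_dfs(n_subjects, levels_a, levels_b):
    """Degrees of freedom for the A x B interaction in a two-way
    repeated measures ANOVA with both factors within subjects."""
    df_effect = (levels_a - 1) * (levels_b - 1)
    df_error = df_effect * (n_subjects - 1)
    return df_effect, df_error


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

With 15 subjects, two association levels, and three sound levels, `interaction_dfs` gives (2, 28), and F = 10.41 yields a value that rounds to 0.43, in agreement with the interaction reported for mean piecemeal duration.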

This result raises several questions regarding how multisensory associations affect perceptual ambiguity and how recent novel associations interact with preexisting associations. We address these and related questions in the following discussion.

Discussions and Conclusion

Our findings show that a novel multisensory association formed by an explicit matching task can significantly affect not only complete dominance but also the piecemeal state during binocular rivalry. Specifically, in Experiment 1, we compared implicit sensory association and explicit cognitive association to investigate the mechanism of the cross-modal modulation effect. We showed that sensory congruence based on a shared low-level feature is not sufficient to guarantee the resolution of visual ambiguity, whereas a high-level cognitive association formed by the matching task could produce the multisensory congruence effect and significantly reduce the piecemeal duration. Experiment 2 extended these findings and further clarified that the observed multisensory congruence effect on the piecemeal state derived primarily from the arbitrary mappings between dissimilar sensory information. Finally, in Experiment 3, we used explicit reverse associations to examine how recent novel audiovisual associations compare to long-standing associations in terms of piecemeal modulation and showed that reversing preexisting associations results in a significant increase of the piecemeal percept during binocular rivalry. To our knowledge, this is the first systematic study of the effects of laboratory-induced multisensory association on the piecemeal state.

Although several previous studies have investigated multisensory congruence effects on binocular rivalry, our study design differs in a few critical ways. First, instead of treating binocular rivalry as a series of unitary alternations, we used large visual stimuli and focused on the piecemeal state, which not only differentiates binocular rivalry from other forms of multistable phenomena but also provides insight into how ambiguous sensory information is handled, as it is the bridge between exclusive selection and reversal during visual competition. This enabled us to see how different types of audiovisual associations could either mitigate or aggravate the visual confusion. Next, we separated the congruence paradigms into sensory congruence and cognitive congruence by designing appropriate stimulus pairs. Rather than using familiar audiovisual pairs and assuming the subjects’ ability to integrate distinct visual and auditory information, we presented novel stimuli and controlled the level of association to minimize confounds in the congruence effects. Last but not least, the procedure for inducing audiovisual association differed from other studies using multisensory learning. We used a more engaging explicit training with active judgment of match/mismatch for every given pair during the task, whereas previous studies used a go/no-go paradigm or passive statistical learning in which subjects either ignored the mismatched pairs or simply did not respond during the whole induction phase (Einhäuser et al., 2017; Piazza et al., 2018). Also, we presented audiovisual feedback at every iteration to reinforce the subjects’ decision. With these measures, and with as few as 20 total repetitions per training lasting 4 minutes, the association training time was considerably shorter than the 8 to 20 minutes reported previously.

It should be noted that, in our experimental design, the association had to be induced only after testing without explicit association, and our participants had to go through six consecutive binocular rivalry sessions, which altogether may raise issues of ordering effects, practice effects, and adaptation. First, to minimize the effects of adaptation, we not only counterbalanced the visual stimuli but also instructed the subjects to rest between sessions until they felt no eye strain and saw no afterimage. In addition to minimizing the adaptation carryover between sessions, we also tried to minimize adaptation within a session by limiting the length of a single session to 60 seconds. While keeping each session relatively short, we tried to obtain the best quality data within this period by having the subjects go through a warm-up session before the actual recording, repeated several times if needed, to ensure the subjects’ performance while minimizing the practice effect in the recorded data. Last but not least, we argue that ordering effects did not play a significant role in our current results, as several lines of previous evidence strongly support the idea that the temporal dynamics of binocular rivalry are mostly involuntary and robustly chaotic (Blake, 2001; Blake & Logothetis, 2002). However, despite these efforts, we acknowledge that we cannot fully rule out the possibility of ordering effects, and a larger data set may provide more comprehensive results.

There have been several speculations on how multisensory integration occurs and subsequently resolves a given perceptual ambiguity to achieve an unequivocal interpretation (Deroy, Spence, & Noppeney, 2016; Faivre, Filevich, Solovey, Kühn, & Blanke, 2018; Hsiao, Chen, Spence, & Yeh, 2012; Hu & Knill, 2010; Lunghi, Lo Verde, & Alais, 2017; Salomon et al., 2016; Salomon, Kaliuzhna, Herbelin, & Blanke, 2015; Salomon, Lim, Herbelin, Hesselmann, & Blanke, 2013; Shams & Beierholm, 2010; Smith, Grabowecky, & Suzuki, 2007). One highly plausible explanation is that some intrinsic commonality in one stimulus can bias the perception of the other stimulus. Such intrinsic features have included spatiotemporally synced motion (Conrad et al., 2010), temporal structure (Lunghi & Alais, 2015; Lunghi et al., 2014; van Ee et al., 2009), and semantic relatedness (Chen et al., 2011; Zhou, Jiang, He, & Chen, 2010). However, it is worth noting that for an intrinsic commonality such as spatiotemporal synchrony to take effect, near-perfect temporal congruence is required (Sekuler, Sekuler, & Lau, 1997; Shimojo & Shams, 2001). Furthermore, these studies were conducted with adult subjects, which makes it difficult to evaluate whether the intrinsic commonality arises from the sensory inputs themselves or from the subjects’ previous experience. Based on the findings of this study, we therefore suggest that the cross-modal information that may serve to modulate the weighting of elements in visual competition emerges from associations learned in an observer’s daily life.

Another aspect to consider is the existence of multisensory neurons and their neurophysiological behavior. Developmental studies of multisensory integration in the cat have shown that neither superior colliculus nor anterior ectosylvian sulcus neurons have multisensory properties at birth and that these neurons are incapable of generating enhanced multisensory responses (Stein, Labos, & Kruger, 1973; Wallace, Carriere, Perrault, Vaughan, & Stein, 2006; Wallace & Stein, 1997). Rather, the integrative capacity develops over time with cumulative cross-modal sensory experience and recalibration of the senses (Gori, Del Viva, Sandini, & Burr, 2008; Stein, Wallace, Stanford, & Jiang, 2002; Wallace & Stein, 2000). A study with human infants has also shown a delay in their ability to integrate visual and auditory cues for spatial localization, suggesting that humans might also acquire visual–auditory multisensory integration only after substantial postnatal experience of cross-modal stimuli (Neil, Chee-Ruiter, Scheier, Lewkowicz, & Shimojo, 2006). These studies suggest that although intrinsic commonality of cross-modal stimuli can be an effective factor for facilitating multisensory integration, experiencing those inputs is also essential for enhanced multisensory responses. Given that binocular rivalry is a visual competition between two different images within separate intraocular pathways, the mixed percept presumably takes place at a higher level of the neural pathway. Thus, the cross-modal influence on the mixed percept found in this study suggests that cross-modal disambiguation is likely a form of top-down mechanism based on learned associations rather than bottom-up modulation by multisensory neurons.

Several binocular rivalry studies using multisensory stimuli have argued for attentional effects of a congruent auxiliary input, which biases the predominance of visual perception either by boosting the coinciding stimulus (Conrad et al., 2010; Ooi & He, 1999; Shams, Kamitani, & Shimojo, 2000; Talsma, Senkowski, Soto-Faraco, & Woldorff, 2010; van Ee, Van Dam, & Brouwer, 2005) or by suppressing the dominance of the nonassociated stimulus (Brascamp et al., 2015). Although these studies do not comment directly on piecemeal dynamics, they do acknowledge that implicit or explicit attention can alter visual perception to some degree. The current findings of piecemeal duration modulation by explicit audiovisual association can also be explained in terms of response bias. In Experiments 1 and 2, novel visual and auditory stimuli could not interact at initial presentation, as there was no existing relation between them. However, once subjects were trained with the explicit matching task, the given rules between the visual and auditory cues could have caused response biases in favor of congruent percepts during the piecemeal state, leading to an overall decrease in the duration of the piecemeal percept. The same logic extends to the results of Experiment 3 with one caveat. In Experiment 3, it is worth noting that learning the reverse associations did not eliminate the previously existing natural associations, as the subjects could still clearly recognize and correctly attribute the semantic contents of the stimuli. Thus, learning a reverse association and creating a response bias that contradicts prior associations could have interfered with the natural association rules and led to an increased duration of the piecemeal percept. This also relates to one of the questions we aimed to answer in Experiment 3: how multiple associations, although formed on vastly different timescales, interact to affect ambiguous perception during rivalry.
In a statistical sense, when the visual stimuli are ambiguous and the auditory stimulus is unambiguous, the optimal use of environmental cues would be to select the auditory cue and put more weight on the perceptually certain interpretation (Alais & Burr, 2004; Einhäuser et al., 2017). However, under reverse association, the previously unambiguous auditory cues are now also ambiguous, as the barking dog sound refers to the duck image and vice versa, in addition to the natural association. For example, when hearing the barking dog sound, the subjects are not simply directed to the duck image as given by the reverse association rules; rather, they first recognize the semantic content of the auditory cue as dog and then redirect it to the duck image. This suggests two important points regarding how multisensory associations affect the piecemeal state during rivalry. First, although formed on vastly different timescales, a recent novel association does not simply overrule a preexisting association, if there is one; instead, the two work in tandem to exert a combined effect on the ambiguous state. Second, when multiple associations contradict each other (a single auditory cue associated with multiple visual stimuli), presenting an ambiguous auditory cue cannot selectively boost or suppress the visual stimuli, which results in an increase in piecemeal duration. Further research is required to clarify how different levels of congruency (arbitrary, sensory, semantic) with different timescales (short, long) and multiplicity of associations (singular, plural) affect perceptual ambiguity during binocular rivalry.
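The notion of weighting the more certain cue is commonly formalised as reliability-weighted (maximum-likelihood) cue combination, the model tested by Alais and Burr (2004). The expressions below give the standard textbook form under the assumption of independent Gaussian noise in each modality. The symbols are introduced here for illustration only, and no such model was fitted in the present study.

```latex
\hat{s}_{AV} = w_A \hat{s}_A + w_V \hat{s}_V, \qquad
w_A = \frac{1/\sigma_A^{2}}{1/\sigma_A^{2} + 1/\sigma_V^{2}}, \qquad
w_V = 1 - w_A, \qquad
\sigma_{AV}^{2} = \frac{\sigma_A^{2}\,\sigma_V^{2}}{\sigma_A^{2} + \sigma_V^{2}}
```

Here $\hat{s}_A$ and $\hat{s}_V$ are the auditory and visual estimates and $\sigma_A^{2}$ and $\sigma_V^{2}$ their variances. When the visual estimate is highly uncertain (large $\sigma_V$), $w_A$ approaches 1 and the auditory cue dominates. On this reading, an auditory cue that has itself become ambiguous corresponds to a larger $\sigma_A$, which would reduce that advantage.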

In conclusion, we demonstrate that explicit audiovisual associations with given rules substantially affect the piecemeal state during binocular rivalry. The congruency effect of a shared audiovisual representation, which can subsequently reduce the amount of visual ambiguity, originates primarily from repeated active judgment of given pairs rather than from common sensory features between different modalities. Furthermore, when one piece of information is associated with multiple others, recent and preexisting associations work collectively to influence perceptual ambiguity during binocular rivalry. These results suggest that the temporal dynamics of perceptual ambiguity, represented by the piecemeal state in this study, are determined by a highly plastic process that involves evaluating arbitrary mappings and the multiplicity of associations.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Alais

Burr

(2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14, 257–262. doi:10.1016/j.cub.2004.01.029

Alpers

G. W.

Gerdes

(2007). Here is looking at you: Emotional faces predominate in binocular rivalry. Emotion, 7, 495. doi:10.1037/1528-3542.7.3.495

Andrews

T. J.

Lotto

R. B.

(2004). Fusion and rivalry are dependent on the perceptual meaning of visual stimuli. Current Biology, 14, 418–423. doi:10.1016/j.cub.2004.02.030

Blake

(2001). A primer on binocular rivalry, including current controversies. Brain and Mind, 2, 5–38.

Blake

Logothetis

(2002). Visual competition. Nat Reviews Neuroscience, 3, 13–21. doi:10.1038/nrn701

Boring

E. G.

(1930). A new ambiguous figure. The American Journal of Psychology, 42, 444–445. doi:10.2307/1415447

Brainard

D. H.

(1997). The psychophysics toolbox. Spatial Vision, 10, 433–436. doi:10.1163/156856897x00357

Brascamp

J. W.

Klink

P. C.

Levelt

W. J.

(2015). The ‘laws’ of binocular rivalry: 50 years of Levelt’s propositions. Vision Research, 109, 20–37. doi:10.1016/j.visres.2015.02.019

Carney

Shadlen

Switkes

(1987). Parallel processing of motion and colour information. Nature, 328, 647. doi:10.1038/328647a0

10.

Chen

Y. C.

Yeh

S. L.

Spence

(2011). Crossmodal constraints on human perceptual awareness: Auditory semantic modulation of binocular rivalry. Frontiers in Psychology, 2, 212. doi:10.3389/fpsyg.2011.00212

11.

Chopin

Mamassian

Blake

(2012). Stereopsis and binocular rivalry are based on perceived rather than physical orientations. Vision Research, 63, 63–68. doi:10.1016/j.visres.2012.05.003

12.

Conrad

Bartels

Kleiner

Noppeney

(2010). Audiovisual interactions in binocular rivalry. Journal of Vision, 10, 27. doi:10.1167/10.10.27

13.

Conrad

Kleiner

Bartels

O’Brien

J. H.

Bülthoff

H. H.

Noppeney

(2013). Naturalistic stimulus structure determines the integration of audiovisual looming signals in binocular rivalry. PLoS One, 8, e70710. doi:10.1371/journal.pone.0070710

14.

De Gelder

Bertelson

(2003). Multisensory integration, perception and ecological validity. Trends in Cognitive Sciences, 7, 460–467. doi:10.1016/j.tics.2003.08.014

15.

Deroy

Spence

Noppeney

(2016). Metacognition in multisensory perception. Trends in Cognitive Sciences, 20, 736–747. doi:10.1016/j.tics.2016.08.006

16.

Dijkstra

van de Nieuwenhuijzen

M. E.

van Gerven

M. A.

(2016). The spatiotemporal dynamics of binocular rivalry: Evidence for increased top-down flow prior to a perceptual switch. Neuroscience of Consciousness, 2016, niw003. doi:10.1093/nc/niw003

17.

Einhäuser

Methfessel

Bendixen

(2017). Newly acquired audio-visual associations bias perception in binocular rivalry. Vision Research, 133, 121–129. doi:10.1016/j.visres.2017.02.001

18.

Faivre

Filevich

Solovey

Kühn

Blanke

(2018). Behavioral, modeling, and electrophysiological evidence for supramodality in human metacognition. Journal of Neuroscience, 38, 263–277. doi:10.1523/jneurosci.0322-17.2017

19.

Fox

Rasche

(1969). Binocular rivalry and reciprocal inhibition. Perception & Psychophysics, 5, 215–217. doi:10.3758/bf03210542

20.

Gilbert

C. D.

Sigman

(2007). Brain states: Top-down influences in sensory processing. Neuron, 54, 677–696. doi:10.1016/j.neuron.2007.05.019

21.

Gori

Del Viva

Sandini

Burr

D. C.

(2008). Young children do not integrate visual and haptic form information. Current Biology, 18, 694–698. doi:10.1016/j.cub.2008.04.036

22.

Hollins

(1980). The effect of contrast on the completeness of binocular rivalry suppression. Perception & Psychophysics, 27, 550–556. doi:10.3758/bf03198684

23.

Hollins

Hudnell

(1980). Adaptation of the binocular rivalry mechanism. Investigative Ophthalmology & Visual Science, 19, 1117–1120.

24.

Hollins

Leung

E. H.

(1978). The influence of color on binocular rivalry. In Armington

J. C.

Krausfopf

Wotten

B. R.

(Eds.), Visual psychophysics and physiology (pp. 181–190). New York, NY: Academic Press. doi:10.1016/b978-0-12-062260-3.50021-6

25.

Hong

S. W.

Shevell

S. K.

(2008). Binocular rivalry between identical retinal stimuli with an induced color difference. Visual Neuroscience, 25, 361–364. doi:10.1017/s0952523808080139

26.

Hsiao

J. Y.

Chen

Y. C.

Spence

Yeh

S. L.

(2012). Assessing the effects of audiovisual semantic congruency on the perception of a bistable figure. Consciousness and Cognition, 21, 775–787. doi:10.1016/j.concog.2012.02.001

27.

Knill

D. C.

(2010). Kinesthetic information disambiguates visual motion signals. Current Biology, 20, R436–R437. doi:10.1016/j.cub.2010.03.053

28.

Kakizaki

(1960). Binocular rivalry and stimulus intensity. Japanese Psychological Research, 2, 94–105. doi:10.4992/psycholres1954.2.94

29.

Kang

M. S.

Blake

(2005). Perceptual synergy between seeing and hearing revealed during binocular rivalry. Psichologija, 32, 7–15.

30.

Kleiner

Brainard

Pelli

(2007). What’s new in psychtoolbox-3? Perception, 36(supplement), 14.

31.

Klink

P. C.

Boucherie

Denys

Roelfsema

P. R.

Self

M. W.

(2017). Interocularly merged face percepts eliminate binocular rivalry. Scientific Reports, 7, 7585. doi:10.1038/s41598-017-08023-9

32. Klink, P. C., Brascamp, J. W., Blake, & van Wezel, R. J. (2010). Experience-driven plasticity in binocular vision. Current Biology, 20, 1464–1469. doi:10.1016/j.cub.2010.06.057

33. Koelewijn, Bronkhorst, & Theeuwes (2010). Attention and the multiple stages of multisensory integration: A review of audiovisual studies. Acta Psychologica, 134, 372–384. doi:10.1016/j.actpsy.2010.03.010

34. Kornmeier, & Bach (2012). Ambiguous figures—What happens in the brain when perception changes but not the stimulus. Frontiers in Human Neuroscience, 6, 51. doi:10.3389/fnhum.2012.00051

35. Kovacs, Papathomas, T. V., Yang, & Feher (1996). When the brain changes its mind: Interocular grouping during binocular rivalry. Proceedings of the National Academy of Sciences of the United States of America, 93, 15508–15511. doi:10.1073/pnas.93.26.15508

36. Lack, L. C. (1969). The effect of practice on binocular rivalry control. Perception & Psychophysics, 6, 397–400. doi:10.3758/bf03212798

37. Lack, L. C. (1974). Selective attention and the control of binocular rivalry. Perception & Psychophysics, 15, 193–200. doi:10.3758/bf03205846

38. Lumer, E. D., Friston, K. J., & Rees (1998). Neural correlates of perceptual rivalry in the human brain. Science, 280, 1930–1934. doi:10.1126/science.280.5371.1930

39. Lunghi, & Alais (2013). Touch interacts with vision during binocular rivalry with a tight orientation tuning. PLoS One, 8, e58754. doi:10.1371/journal.pone.0058754

40. Lunghi, & Alais (2015). Congruent tactile stimulation reduces the strength of visual suppression during binocular rivalry. Scientific Reports, 5, 9413. doi:10.1038/srep09413

41. Lunghi, Binda, & Morrone, M. C. (2010). Touch disambiguates rivalrous perception at early stages of visual analysis. Current Biology, 20, R143–R144. doi:10.1016/j.cub.2009.12.015

42. Lunghi, Lo Verde, & Alais (2017). Touch accelerates visual awareness. i-Perception, 8(1), 1–14. doi:10.1177/2041669516686986

43. Lunghi, Morrone, M. C., & Alais (2014). Auditory and tactile signals combine to influence vision during binocular rivalry. Journal of Neuroscience, 34, 784–792. doi:10.1167/14.10.434

44. Macaluso, & Driver (2005). Multisensory spatial interactions: A window onto functional integration in the human brain. Trends in Neurosciences, 28, 264–271. doi:10.1016/j.tins.2005.03.008

45. Macaluso, George, Dolan, Spence, & Driver (2004). Spatial and temporal factors during processing of audiovisual speech: A PET study. Neuroimage, 21, 725–732. doi:10.1016/j.neuroimage.2003.09.049

46. Macaluso, Noppeney, Talsma, Vercillo, Hartcher-O’Brien, & Adam (2016). The curious incident of attention in multisensory integration: Bottom-up vs. top-down. Multisensory Research, 29(6-7), 557–583. doi:10.1163/22134808-00002528

47. Maier, Wilke, Aura, Zhu, Frank, Q. Y., & Leopold, D. A. (2008). Divergence of fMRI and neural signals in V1 during perceptual suppression in the awake monkey. Nature Neuroscience, 11, 1193. doi:10.1038/nn.2173

48. Maruya, Yang, & Blake (2007). Voluntary action influences visual competition. Psychological Science, 18, 1090–1098. doi:10.1111/j.1467-9280.2007.02030.x

49. McGurk, & MacDonald (1976). Hearing lips and seeing voices. Nature, 264, 746–748. doi:10.1038/264746a0

50. Necker, L. A. (1832). LXI. Observations on some remarkable optical phænomena seen in Switzerland; and on an optical phænomenon which occurs on viewing a figure of a crystal or geometrical solid. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 1, 329–337. doi:10.1080/14786443208647909

51. Neil, P. A., Chee-Ruiter, Scheier, Lewkowicz, D. J., & Shimojo (2006). Development of multisensory spatial integration and perception in humans. Developmental Science, 9, 454–464. doi:10.1111/j.1467-7687.2006.00512.x

52. Ooi, T. L., & He, Z. J. (1999). Binocular rivalry and visual awareness: The role of attention. Perception, 28, 551–574. doi:10.1068/p2923

53. Paffen, C. L., & Alais (2011). Attentional modulation of binocular rivalry. Frontiers in Human Neuroscience, 5, 105. doi:10.3389/fnhum.2011.00105

54. Paffen, C. L., Naber, & Verstraten, F. A. (2008). The spatial origin of a perceptual transition in binocular rivalry. PLoS One, 3, e2311. doi:10.1371/journal.pone.0002311

55. Pápai, M. S., & Soto-Faraco (2017). Sounds can boost the awareness of visual events through attention without cross-modal integration. Scientific Reports, 7, 41684. doi:10.1038/srep41684

56. Piazza, E. A., Denison, R. N., & Silver, M. A. (2018). Recent cross-modal statistical learning influences visual perceptual selection. Journal of Vision, 18, 1–1. doi:10.1167/18.3.1

57. Polonsky, Blake, Braun, & Heeger, D. J. (2000). Neuronal activity in human primary visual cortex correlates with perception during binocular rivalry. Nature Neuroscience, 3, 1153. doi:10.1038/80676

58. Rossion, & Pourtois (2004). Revisiting Snodgrass and Vanderwart’s object pictorial set: The role of surface detail in basic-level object recognition. Perception, 33, 217–236. doi:10.1068/p5117

59. Rubin (1921). Visuell wahrgenommene figuren [Visually perceived figures] (Vol. 1). Moscow, Russia: Рипол Классик.

60. Salomon, Galli, Łukowska, Faivre, Ruiz, J. B., & Blanke (2016). An invisible touch: Body-related multisensory conflicts modulate visual consciousness. Neuropsychologia, 88, 131–139. doi:10.1016/j.neuropsychologia.2015.10.034

61. Salomon, Kaliuzhna, Herbelin, & Blanke (2015). Balancing awareness: Vestibular signals modulate visual consciousness in the absence of awareness. Consciousness and Cognition, 36, 289–297. doi:10.1016/j.concog.2015.07.009

62. Salomon, Lim, Herbelin, Hesselmann, & Blanke (2013). Posing for awareness: Proprioception modulates access to visual consciousness in a continuous flash suppression task. Journal of Vision, 13, 2–2. doi:10.1167/13.7.2

63. Scocchia, Valsecchi, & Triesch (2014). Top-down influences on ambiguous perception: The role of stable and transient states of the observer. Frontiers in Human Neuroscience, 8, 979. doi:10.3389/fnhum.2014.00979

64. Sekuler, A. B., & Lau (1997). Sound alters visual motion perception. Nature, 385, 308. doi:10.1038/385308a0

65. Shams, & Beierholm, U. R. (2010). Causal inference in perception. Trends in Cognitive Sciences, 14, 425–432. doi:10.1016/j.tics.2010.07.001

66. Shams, Kamitani, & Shimojo (2000). Illusions—What you see is what you hear. Nature, 408, 788–788. doi:10.1038/35048669

67. Shams, & Seitz, A. R. (2008). Benefits of multisensory learning. Trends in Cognitive Sciences, 12, 411–417. doi:10.1016/j.tics.2008.07.006

68. Shimojo, & Shams (2001). Sensory modalities are not separate modalities: Plasticity and interactions. Current Opinion in Neurobiology, 11, 505–509. doi:10.1016/s0959-4388(00)00241-5

69. Smith, E. L., Grabowecky, & Suzuki (2007). Auditory-visual crossmodal integration in perception of face gender. Current Biology, 17, 1680–1685. doi:10.1016/j.cub.2007.08.043

70. Srinivasan, Russell, D. P., Edelman, G. M., & Tononi (1999). Increased synchronization of neuromagnetic responses during conscious perception. Journal of Neuroscience, 19, 5435–5448. doi:10.1523/jneurosci.19-13-05435.1999

71. Stein, B. E., Labos, & Kruger (1973). Sequence of changes in properties of neurons of superior colliculus of the kitten during maturation. Journal of Neurophysiology, 36, 667–679. doi:10.1152/jn.1973.36.4.667

72. Stein, B. E., & Stanford, T. R. (2008). Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews Neuroscience, 9, 255–266. doi:10.1038/nrn2331

73. Stein, B. E., Stanford, T. R., & Rowland, B. A. (2014). Development of multisensory integration from the perspective of the individual neuron. Nature Reviews Neuroscience, 15, 520. doi:10.1038/nrn3742

74. Stein, B. E., Wallace, M. W., Stanford, T. R., & Jiang (2002). Book review: Cortex governs multisensory integration in the midbrain. The Neuroscientist, 8, 306–314. doi:10.1177/107385840200800406

75. Stevenson, R. A., Fister, J. K., Barnett, Z. P., Nidiffer, A. R., & Wallace, M. T. (2012). Interactions between the spatial and temporal stimulus factors that influence multisensory integration in human performance. Experimental Brain Research, 219, 121–137. doi:10.1007/s00221-012-3072-1

76. Talsma (2015). Predictive coding and multisensory integration: An attentional account of the multisensory mind. Frontiers in Integrative Neuroscience, 9, 19. doi:10.3389/fnint.2015.00019

77. Talsma, Senkowski, Soto-Faraco, & Woldorff, M. G. (2010). The multifaceted interplay between attention and multisensory integration. Trends in Cognitive Sciences, 14, 400–410. doi:10.1016/j.tics.2010.06.008

78. Tong, Meng, & Blake (2006). Neural bases of binocular rivalry. Trends in Cognitive Sciences, 10, 502–511. doi:10.1016/j.tics.2006.09.003

79. Tong, Nakayama, Vaughan, J. T., & Kanwisher (1998). Binocular rivalry and visual awareness in human extrastriate cortex. Neuron, 21, 753–759. doi:10.1016/s0896-6273(00)80592-9

80. van Atteveldt, N. M., Formisano, Goebel, & Blomert (2007). Top–down task effects overrule automatic multisensory responses to letter–sound pairs in auditory association cortex. Neuroimage, 36, 1345–1360. doi:10.1016/j.neuroimage.2007.03.065

81. van Ee, van Boxtel, J. J., Parker, A. L., & Alais (2009). Multisensory congruency as a mechanism for attentional control over perceptual selection. Journal of Neuroscience, 29, 11641–11649. doi:10.1523/jneurosci.0873-09.2009

82. van Ee, Van Dam, L. C. J., & Brouwer, G. J. (2005). Voluntary control and the dynamics of perceptual bi-stability. Vision Research, 45, 41–55. doi:10.1016/j.visres.2004.07.030

83. von Helmholtz, & Southall, J. P. (1924). Helmholtz’s treatise on physiological optics (Vol. 1). Rochester, NY: Optical Society of America.

84. Wallace, M. T., Carriere, B. N., Perrault, T. J., Vaughan, J. W., & Stein, B. E. (2006). The development of cortical multisensory integration. Journal of Neuroscience, 26, 11844–11849. doi:10.1523/jneurosci.3295-06.2006

85. Wallace, M. T., & Stein, B. E. (1997). Development of multisensory neurons and multisensory integration in cat superior colliculus. Journal of Neuroscience, 17, 2429–2444. doi:10.1523/jneurosci.17-07-02429.1997

86. Wallace, M. T., & Stein, B. E. (2000). Onset of cross-modal synthesis in the neonatal superior colliculus is gated by the development of cortical influences. Journal of Neurophysiology, 83, 3578–3582. doi:10.1152/jn.2000.83.6.3578

87. Wheatstone (1838). XVIII. Contributions to the physiology of vision—Part the first. On some remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical Transactions of the Royal Society of London, 128, 371–394. doi:10.1098/rstl.1838.0019

88. Whittle (1965). Binocular rivalry and the contrast at contours. Quarterly Journal of Experimental Psychology, 17, 217–226. doi:10.1080/17470216508416435

89. Wilke, Logothetis, N. K., & Leopold, D. A. (2006). Local field potential reflects perceptual suppression in monkey visual cortex. Proceedings of the National Academy of Sciences, 103, 17507–17512. doi:10.1073/pnas.0604673103

90. Wilson, H. R. (2003). Computational evidence for a rivalry hierarchy in vision. Proceedings of the National Academy of Sciences, 100, 14499–14503. doi:10.1073/pnas.2333622100

91. Zhou, Jiang, & Chen (2010). Olfaction modulates visual perception in binocular rivalry. Current Biology, 20, 1356–1358. doi:10.1016/j.cub.2010.05.059