Abstract
Is consciousness necessary for integration? Findings of seemingly high-level object-scene integration in the absence of awareness have challenged major theories in the field and attracted considerable scientific interest. Lately, one of these findings has been questioned because of a failure to replicate, yet the other finding was still uncontested. Here, we show that this latter finding—slowed-down performance on a visible target following a masked prime scene that includes an incongruent object—is also not reproducible. Using Bayesian statistics, we found evidence against unconscious integration of objects and scenes. Put differently, at the moment, there is no compelling evidence for object-scene congruency processing in the absence of awareness. Intriguingly, however, our results do suggest that consciously experienced yet briefly presented incongruent scenes take longer to process, even when subjects do not explicitly detect their incongruency.
Consciousness and integration are typically held to be closely related (e.g., Baars, 2005; Dehaene & Naccache, 2001; Tononi & Edelman, 1998). Yet this common assumption has recently been challenged by findings of seemingly high-level integration in the absence of awareness (for a review, see Mudrik, Faivre, & Koch, 2014). One such finding pertained to object-scene integration. In two studies, the relations between an object and the scene in which it appears were manipulated. Surprisingly, it was found that these relations could be deciphered even when the scenes were rendered perceptually invisible (Mudrik, Breska, Lamy, & Deouell, 2011; Mudrik & Koch, 2013) using either continuous flash suppression (CFS; Tsuchiya & Koch, 2005) or visual sandwich masking (Breitmeyer & Ogmen, 2000). In the CFS study (Mudrik et al., 2011), incongruent scenes broke suppression faster than congruent ones. In the masking study (Mudrik & Koch, 2013), incongruent scenes, serving as primes, slowed down performance on subsequent target scenes, in comparison with congruent primes.
Recently, this CFS finding was called into question by Moors, Boelens, van Overwalle, and Wagemans (2016), who failed to replicate it. They further used Bayesian statistics to show that this null result does not stem from lack of power or from inconclusive data. This joins other studies that show that subjects are unable to explicitly detect incongruent objects in scenes presented for less than 100 ms (Glanemann, Zwitserlood, Bölte, & Dobel, 2016), though their performance can be affected by such incongruencies (Greene, Botros, Beck, & Fei-Fei, 2015).
Given the above evidence and the widely discussed “replication crisis” in the field (e.g., Munafò et al., 2017), we reexamined our own previous findings in an attempt to settle this ongoing debate. We focused on visual masking for two main reasons. First, we no longer think that the breaking-CFS procedure is the best strategy for probing unconscious processing. It does not allow one to isolate unconscious processes from those that are involved in emergence to awareness because there is no adequate control condition for such emergence-related effects (Stein, Hebart, & Sterzer, 2011). Further, it can sometimes induce prolonged periods of partial awareness of low-level features, which may modulate high-level effects (Gelbard-Sagiv, Faivre, Mudrik, & Koch, 2016). Second, several researchers have claimed that when conducted correctly, CFS substantially reduces perceptual processing (Yuval-Greenberg & Heeger, 2013), especially in comparison with masking (e.g., Izatt, Dubois, Faivre, & Koch, 2014; Peremen & Lamy, 2014). Thus, we reasoned that using masking would increase our chances of finding an effect. We conducted three experiments in which congruent and incongruent scenes were presented as primes and as targets, attempting to reproduce the finding of slower reaction times (RTs) for targets preceded by incongruent than by congruent primes. In Experiment 1, we closely followed the Mudrik and Koch (2013) procedure but with a substantially larger sample size and more trials. In Experiment 2, we tried to overcome a potential limitation of the original study: There, subjects’ performance on the target was relatively low because of its impoverished presentation conditions, yielding a relatively low number of trials with correct responses for each experimental condition. To further enhance signal-to-noise ratio in the data, we presented targets in Experiment 2 with a longer duration and higher contrast to obtain more usable trials with fewer repetitions of each stimulus.
Finally, in Experiment 3, we examined whether the results of the previous two experiments could be explained by stimulus habituation or by too long a delay between prime and target.
Method
Subjects
One hundred thirty-five healthy Tel Aviv University students participated in this study for course credit or payment ($12 per hour), 45 in Experiment 1 (38 women; 41 right-handed; age: M = 22.56, SD = 1.95), 45 in Experiment 2 (30 women; 37 right-handed; age: M = 25.18, SD = 4.02), and 45 in Experiment 3 (24 women, 43 right-handed; age: M = 24.73, SD = 4.02). Thirty-four additional subjects were excluded from the analysis because they met at least one of the following predefined exclusion criteria: (a) fewer than 25 trials in each experimental cell for visibility 1 trials only (9 subjects in Experiment 1, 11 in Experiment 2, and 6 in Experiment 3), (b) performance higher than 65% for the prime task in visibility 1 trials in the visibility posttest (2 subjects in Experiment 1 and 6 in Experiment 3 out of those who passed the first criterion), and (c) performance lower than 70% correct for target classification as congruent or incongruent in Experiment 2 only (none of the subjects met this criterion).
Sample sizes were predefined as 45 subjects who did not meet any of the above exclusion criteria (following Simonsohn, 2015), which was 2.5 times the original sample size. Power analysis using the G*Power package (Faul, Erdfelder, Buchner, & Lang, 2009) and the original effect size of Mudrik and Koch (2013) showed that this sample size would give over 99% power to detect an effect. Experiment 1 was preregistered on the Open Science Framework (OSF; osf.io/bpj7m), including sample size and subject exclusion criteria—which were similar to those used in Experiment 2. Experiment 3 was also preregistered on the OSF (osf.io/y2exn). All subjects reported normal or corrected-to-normal vision. The study was approved by the ethics committee of Tel Aviv University, and informed consent was obtained after the experimental procedures were explained to the subjects.
Stimuli
Stimuli were 144 pairs of congruent and incongruent colored real-life scenes used in Mudrik and Koch (2013). The images depicted a person performing an action with a critical object, which could be either congruent (e.g., a car mechanic holding a mechanical device) or incongruent (e.g., a car mechanic holding a purse). Both types of objects (congruent, incongruent) were inserted from another image using Adobe Photoshop (see the Method section in Mudrik & Koch, 2013, for further details regarding image selection). The images appeared on a gray background (RGB: 128, 128, 128) at the center of the computer screen and subtended 5.20° (width) × 7.27° (height) of visual angle. The masks were scrambled images created by dividing the images into a 5 × 6 matrix and randomly shuffling the cells.
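The mask construction described above (cutting an image into a grid and shuffling the cells) can be illustrated with a minimal sketch. This is not the original stimulus-generation code: the function name `scramble_image`, the nested-list image representation, and the reading of "5 × 6" as 5 columns by 6 rows are all assumptions made for illustration.

```python
import random

def scramble_image(pixels, n_rows=6, n_cols=5, seed=None):
    """Return a copy of `pixels` (a list of equal-length rows) with its grid cells shuffled.

    Assumption: the grid has 6 rows and 5 columns; the source only says "5 x 6".
    """
    height, width = len(pixels), len(pixels[0])
    if height % n_rows or width % n_cols:
        raise ValueError("image size must be divisible by the grid size")
    cell_h, cell_w = height // n_rows, width // n_cols

    # Cut the image into cells, listed in row-major order.
    cells = [
        [row[c * cell_w:(c + 1) * cell_w] for row in pixels[r * cell_h:(r + 1) * cell_h]]
        for r in range(n_rows)
        for c in range(n_cols)
    ]

    # Randomly permute the cells (seeded for reproducibility).
    random.Random(seed).shuffle(cells)

    # Stitch the shuffled cells back together, one band of cells at a time.
    scrambled = []
    for r in range(n_rows):
        band = cells[r * n_cols:(r + 1) * n_cols]
        for y in range(cell_h):
            scrambled.append([px for cell in band for px in cell[y]])
    return scrambled

# Toy 12 x 10 "image" whose pixel values encode their original position.
image = [[y * 10 + x for x in range(10)] for y in range(12)]
mask = scramble_image(image, seed=1)
```

Because only cell positions change, the mask keeps the pixel content of the source image (and hence its overall luminance and color distribution) while destroying the scene layout.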
Apparatus
Stimuli were presented on an LCD monitor (23-in. SyncMaster, ASUS, Taipei, Taiwan; 1,920 × 1,080 resolution; 60 Hz refresh rate) using MATLAB (The MathWorks, Natick, MA) and the Psychophysics Toolbox Version 3 (Brainard, 1997). Subjects sat in a dimly lit room, and their heads were stabilized using a chin rest located 60 cm from the screen.
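For readers who want to relate the reported visual angles to physical and on-screen sizes, the following sketch applies the standard relation size = 2 × distance × tan(angle / 2) to the 60 cm viewing distance and the image dimensions given in the Stimuli section. The pixel-density step is an assumption: it treats the nominal 23-in. diagonal as the visible diagonal and assumes square pixels. The function names are hypothetical.

```python
import math

def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) that subtends `angle_deg` at a viewing distance of `distance_cm`."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def pixels_per_cm(diagonal_in, res_x, res_y):
    """Pixel density, assuming square pixels and that the nominal diagonal is the visible one."""
    diagonal_px = math.hypot(res_x, res_y)
    return diagonal_px / (diagonal_in * 2.54)

width_cm = size_from_visual_angle(5.20, 60)   # image width
height_cm = size_from_visual_angle(7.27, 60)  # image height
density = pixels_per_cm(23, 1920, 1080)       # assumed display geometry

print(f"image: {width_cm:.2f} cm x {height_cm:.2f} cm")
print(f"approx. {width_cm * density:.0f} px x {height_cm * density:.0f} px")
```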
Procedure
All experiments included three sessions: (a) calibration, (b) main experiment, and (c) visibility posttest. Stimulus presentation was identical in all three sessions (see Fig. 1): The prime appeared for 33 ms, preceded by two forward masks and followed by backward masks (i.e., sandwich masking); all masks were presented for 50 ms and followed by 17-ms blank interstimulus intervals (ISIs). In Experiments 1 and 2, two backward masks were presented. In Experiment 3, only one backward mask was presented to shorten the delay between prime and target and increase the chances of obtaining an effect. To strengthen masking power, we accompanied the first backward mask with six colored squares (two blue, two green, and two red), which overlapped the mask’s borders and extended outward (see Fig. 1). In Experiments 1 and 2, this sequence was presented three times to increase the strength of the prime signal while still keeping it suppressed (Macknik & Livingstone, 1998).

Fig. 1. Experimental paradigm and stimuli. On each trial (a), the prime appeared for 33 ms. It was preceded and followed by either two masks (Experiments 1 and 2; shown here) or one mask (Experiment 3; not shown here). The first backward mask was presented with six colorful squares that overlapped its edges and extended its boundaries. This sequence repeated either three times (Experiments 1 and 2) or once (Experiment 3) and was followed by an unmasked target that appeared for 33 ms (Experiments 1 and 3) or 500 ms (Experiment 2), at either a reduced or full contrast. Then questions were presented following a blank interval. In the main session, subjects were asked to classify whether the target was weird or normal, rate prime visibility on a 4-point scale, and report whether the prime was weird or not. In both the calibration and visibility posttest conditions, subjects reported whether the prime was upright or inverted, and then rated its visibility. The enlarged examples (b) show congruent and incongruent versions of both a prime and a target.
Experiment 3 tested whether the results could be explained by habituation effects stemming from the repeated presentation of the prime. Thus, in that experiment, the sequence was presented only once. After this stimulation sequence, a target image appeared for either 33 ms (Experiments 1 and 3) or 500 ms (Experiment 2) at a contrast of 0.5 (Experiments 1 and 3) or 1.0 (Experiment 2). In all experiments, the target was not masked and thus always visible, but in Experiments 1 and 3, its visibility was substantially reduced given the short presentation duration and the lower contrast. The target was followed by a 17-ms blank ISI, after which a series of questions appeared, defined according to each experimental session.
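The nominal durations reported above correspond to whole numbers of refresh cycles on the 60 Hz display described in the Apparatus section. The sketch below shows that arithmetic. Whether the experiment code scheduled stimuli by counting frames is an assumption, and the helper names are hypothetical.

```python
def ms_to_frames(duration_ms, refresh_hz=60):
    """Nearest whole number of refresh cycles for a nominal duration in milliseconds."""
    return round(duration_ms * refresh_hz / 1000)

def frames_to_ms(n_frames, refresh_hz=60):
    """Duration in milliseconds of a whole number of refresh cycles."""
    return n_frames * 1000 / refresh_hz

# Nominal durations taken from the procedure description.
NOMINAL_MS = {
    "prime": 33,
    "mask": 50,
    "blank ISI": 17,
    "target (Experiments 1 and 3)": 33,
    "target (Experiment 2)": 500,
}

for name, ms in NOMINAL_MS.items():
    frames = ms_to_frames(ms)
    print(f"{name}: {ms} ms nominal = {frames} frame(s) = {frames_to_ms(frames):.1f} ms")
```

At 60 Hz one frame lasts about 16.7 ms, so the reported 17, 33, and 50 ms values are rounded versions of one, two, and three frames.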
Calibration
The calibration phase was designed to determine prime and mask contrasts (defined by image transparency using MATLAB’s alpha function) for each subject separately, in order to reach the highest contrast of the prime for which subjects were still at chance performance. It included 72 trials in which primes were presented either upright or inverted (randomly intermixed, with the constraint that prime orientation was never the same in four consecutive trials). A staircase procedure (Levitt, 1971) was used, in which the initial mask’s contrast was set to 0.85 (Experiments 1 and 3) and 1.0 (Experiment 2), the initial prime contrast was set to 0.7 (Experiments 1 and 3) and 0.5 (Experiment 2), and the lowest possible prime contrast was set to 0.4 (Experiments 1, 2, and 3). Masks and prime contrasts were changed on the basis of subjects’ performance on an orientation judgment task, in which they had to determine whether the prime image was upright or inverted (for further details, see Mudrik & Koch, 2013). Across subjects, the averaged mask contrast was 0.98 (SD = 0.06), 0.96 (SD = 0.08), and 0.96 (SD = 0.08) and averaged prime contrast was 0.48 (SD = 0.08), 0.45 (SD = 0.04), and 0.56 (SD = 0.11), for Experiments 1, 2, and 3, respectively.
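The ordering constraint on calibration trials (prime orientation never the same in four consecutive trials) can be met in several ways. The sketch below uses simple rejection sampling: shuffle, check the longest run, and retry. It is an illustration rather than the procedure actually used. In particular, the equal split of 36 upright and 36 inverted trials and the function names are assumptions.

```python
import random

def longest_run(seq):
    """Length of the longest run of identical consecutive items in `seq`."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def constrained_order(n_trials=72, max_run=3, seed=None, max_tries=100_000):
    """Shuffle a balanced upright/inverted list until no run exceeds `max_run`.

    Assumption: orientations are balanced (n_trials / 2 each).
    """
    rng = random.Random(seed)
    trials = ["upright", "inverted"] * (n_trials // 2)
    for _ in range(max_tries):
        rng.shuffle(trials)
        if longest_run(trials) <= max_run:
            return list(trials)
    raise RuntimeError("no valid order found within max_tries")

order = constrained_order(seed=0)
print(order[:8], "longest run:", longest_run(order))
```

Rejection sampling is wasteful here because most random orders of 72 trials contain at least one run of four, but it draws uniformly from the valid orders and is easy to verify.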
Main session
The main session included three (Experiments 1 and 3) or two (Experiment 2) experimental blocks, each with 144 trials. All primes were upright and included either a congruent or an incongruent object in an intermixed order (with the constraint that prime and target congruency were never the same in four consecutive trials). Within each block, every image appeared both as congruent and as incongruent; the order was randomly determined per block, and thus, all images repeated three (Experiments 1 and 3) or two (Experiment 2) times throughout the experiment. Subjects’ task was threefold. First, they had to report target congruency as quickly and accurately as possible by pressing either the left arrow (“incongruent”) or the right arrow (“congruent”) on the keyboard. Because the target in Experiment 2 was presented for a long duration, subjects could respond even before the target disappeared. After judging target congruency, subjects were asked to subjectively rate prime visibility using the 4-point Perceptual Awareness Scale (Ramsøy & Overgaard, 2004), where 1 is “I didn’t see anything,” 2 signifies “I had a vague perception of something,” 3 represents “I saw a clear part of the image,” and 4 stands for “I saw the entire image clearly.” Finally, they were asked to report prime congruency using the arrow keys on the keyboard. If subjects did not know the answer, they were instructed to guess.
Visibility posttest
The visibility posttest was intended to further assess stimulus visibility during the main session. There, because of the high-level nature of the task, subjects could perform at chance in reporting prime congruency even if they were partially aware of the scene (Gelbard-Sagiv et al., 2016; Kouider & Dupoux, 2004). Chance performance could also stem from a memory failure because this was the last question out of three. Thus, in the visibility posttest (N = 104), primes from the main session were presented either upright or inverted, and subjects were asked to judge their orientation first and then rate their visibility.
Analysis
All analyses besides the post hoc analysis on target classification and comparisons between experiments were confirmatory (they either were explicitly mentioned in the preregistration form or exactly followed the analyses in Mudrik & Koch, 2013).
Exclusion criteria
Trials in which subjects pressed the wrong button or in which RTs were shorter than 0.3 s, were longer than 4 s, or deviated 3 standard deviations or more from the mean for each experimental condition in each block were excluded from all analyses (3.7% in Experiment 1, 3.8% in Experiment 2, and 3.4% in Experiment 3). Target-related analyses were performed on the remaining trials. All prime-related analyses were performed on trials in which subjects rated the visibility of the prime as “1” (which we refer to as visibility 1 trials) only. Out of these, RT effects were examined using correct trials only (for the percentage of visibility ratings and accuracy, see the Results section).
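The RT-based trial exclusion rule can be sketched for a single experimental cell (one condition within one block) as follows. This is an illustrative reading, not the authors' analysis script. It assumes the mean and standard deviation are computed over all RTs in the cell before any trial is removed, which the text does not specify, and the function name is hypothetical.

```python
from statistics import mean, stdev

def keep_trials(rts, min_rt=0.3, max_rt=4.0, sd_cutoff=3.0):
    """Return one boolean per trial: True if the RT (in seconds) survives exclusion.

    A trial is dropped if its RT is shorter than `min_rt`, longer than `max_rt`,
    or deviates `sd_cutoff` standard deviations or more from the cell mean.
    Assumption: mean and SD are taken over all RTs in the cell.
    """
    cell_mean, cell_sd = mean(rts), stdev(rts)
    return [
        min_rt <= rt <= max_rt and abs(rt - cell_mean) < sd_cutoff * cell_sd
        for rt in rts
    ]

# Toy cell: 30 ordinary RTs, one anticipation (0.25 s), one slow outlier (3.5 s).
cell_rts = [1.0, 1.1] * 15 + [0.25, 3.5]
mask = keep_trials(cell_rts)
kept = [rt for rt, keep in zip(cell_rts, mask) if keep]
print(f"kept {len(kept)} of {len(cell_rts)} trials")
```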
Measures of signal detection theory (SDT)
All SDT measures were calculated using the Palamedes toolbox (Prins & Kingdom, 2009). In extreme cases, hits and false alarm rates of 0 were replaced with 0.5/n, and those of 1 were replaced with (n – 0.5)/n, where n is the number of signal or noise trials, respectively (Stanislaw & Todorov, 1999).
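The correction for extreme rates and the resulting sensitivity and bias measures can be sketched with the standard equal-variance Gaussian formulas (Stanislaw & Todorov, 1999). This sketch does not reproduce the Palamedes toolbox code. The function names are hypothetical, and it uses only Python's standard library.

```python
from statistics import NormalDist

def corrected_rate(count, n):
    """Hit or false-alarm rate, with rates of 0 and 1 replaced by 0.5/n and (n - 0.5)/n."""
    if count == 0:
        return 0.5 / n
    if count == n:
        return (n - 0.5) / n
    return count / n

def sdt_measures(hits, n_signal, false_alarms, n_noise):
    """Return (d_prime, ln_beta) under the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(corrected_rate(hits, n_signal))
    z_fa = z(corrected_rate(false_alarms, n_noise))
    d_prime = z_hit - z_fa
    ln_beta = (z_fa ** 2 - z_hit ** 2) / 2
    return d_prime, ln_beta

# Toy example: 40 signal and 40 noise trials, performance near chance.
d_prime, ln_beta = sdt_measures(hits=21, n_signal=40, false_alarms=19, n_noise=40)
print(f"d' = {d_prime:.3f}, ln(beta) = {ln_beta:.3f}")
```

Without the correction, a hit rate of exactly 1 or a false-alarm rate of exactly 0 would map to an infinite z score, so d′ would be undefined for those subjects.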
Confidence intervals (CIs)
For t-test results, 95% CIs are given for the mean difference between the two conditions.
Bayesian analysis
Traditional null-hypothesis significance testing was complemented with Bayesian analysis, which was intended to quantify the evidence for the presence or absence of effects. We calculated the Bayes factor (BF), defined as the ratio of the probability of observing the data given the alternative hypothesis (H1) to the probability of observing the data given the null hypothesis (H0), using the BayesFactor package (Version 0.9.11-1; Morey & Rouder, 2015) for the R software environment (R Core Team, 2015). For mean comparisons, we used the BF t test with default settings (medium prior scale for paired observations). For factorial analyses, we used the analysis of variance (ANOVA) BF with default settings as well (medium prior for fixed effects and nuisance prior for random effects) while the subject factor was considered random. Notably, our model did not include item as another random factor because a model including item proved to be nonpreferable over the existing model (BFs for the comparison between models, i.e., the full model with item as a random factor divided by the full model without it, were smaller than 10⁻¹⁹ for all experiments).
In order to compute BFs for the main effects and interactions, we compared a full model that includes all main effects and interactions with a reduced model in which the effect of interest was not included. We adopted the convention that a BF less than 0.1 implies strong evidence for the lack of an effect (i.e., the data are at least 10 times more likely to be observed given H0 than given H1), a BF between 0.1 and 0.33 provides moderate evidence for the lack of an effect, a BF between 0.33 and 3 suggests insensitivity of the data (anecdotal evidence for the lack or presence of an effect, for 0.33 < BF < 1 or 1 < BF < 3, respectively), a BF between 3 and 10 denotes moderate evidence for the presence of an effect (i.e., H1), a BF between 10 and 100 implies strong evidence, and a BF greater than 100 suggests extreme evidence for the presence of an effect (Lee & Wagenmakers, 2013).
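The evidence categories listed above can be written as a small lookup, which is convenient when labeling many BFs at once. This is only a sketch of the stated convention. The function name is hypothetical, and the handling of values that fall exactly on a boundary (assigned to the upper category here) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Category boundaries from the convention described in the text.
BOUNDS = [0.1, 0.33, 1, 3, 10, 100]
LABELS = [
    "strong evidence for H0",
    "moderate evidence for H0",
    "anecdotal evidence for H0 (insensitive)",
    "anecdotal evidence for H1 (insensitive)",
    "moderate evidence for H1",
    "strong evidence for H1",
    "extreme evidence for H1",
]

def interpret_bf(bf):
    """Map a Bayes factor (larger values favor H1) to its verbal evidence category."""
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    return LABELS[bisect_right(BOUNDS, bf)]

# A few BFs reported in this article.
for bf in (0.14, 0.20, 1.87, 9.70, 114.36):
    print(f"BF = {bf}: {interpret_bf(bf)}")
```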
Results
Prime-related effects
Prime visibility
Subjective measure
Subjects’ visibility ratings in the main session confirmed that the masking procedure for the prime was effective in all experiments: 83.39% (95% CI = [80.21, 86.57]), 81.60% (95% CI = [77.65, 85.55]), and 90.30% (95% CI = [87.58, 93.02]) of the trials in Experiments 1, 2, and 3, respectively, were rated as “I saw nothing” (1); 13.37% (95% CI = [10.84, 15.91]), 14.53% (95% CI = [11.25, 17.82]), and 9.30% (95% CI = [6.67, 11.93]) of the trials in Experiments 1, 2, and 3, respectively, were rated as “I had a vague perception of something” (2); and only 3.24% (95% CI = [1.78, 4.64]), 3.87% (95% CI = [2.12, 5.55]), and 0.40% (95% CI = [0.14, 0.53]) of the trials in Experiments 1, 2, and 3, respectively, were rated as either “I saw a clear part of the image” (3) or “I saw the entire image clearly” (4).
To examine whether prime congruency modulated the probability of rating the prime as not seen (visibility 1), we performed a t test comparing the mean frequency of visibility 1 trials for congruent versus incongruent primes. In Experiment 1, incongruent primes (M = 84.21%, SD = 10.57%) were more often rated as not seen than were congruent primes (M = 82.59%, SD = 10.79%), t(44) = 3.09, p = .004, 95% CI = [−0.03, −0.01], BF = 9.70. Although this could have suggested differential processing of congruent and incongruent primes, this result was not replicated in Experiment 2 (incongruent: M = 81.53%, SD = 13.01%; congruent: M = 81.66%, SD = 13.55%, t < 1, BF = 0.17) or in Experiment 3 (incongruent: M = 90.16%, SD = 9.28%; congruent: M = 90.45%, SD = 8.95%, t < 1, BF = 0.22).
Objective congruency measure (main experiment)
In all three experiments, subjects performed at chance in classifying the primes as congruent or incongruent in visibility 1 trials (see Table 1). This chance performance did not differ between the three experiments, F(2, 132) = 1.29, p = .281, ηp² = .03, BF = 0.24. Performance on the objective congruency measure also did not differ between congruent and incongruent primes in Experiment 1, t(44) = 0.18, p = .858, 95% CI = [−0.16, 0.14], BF = 0.16; in Experiment 2, t(44) = 1.37, p = .177, 95% CI = [−0.27, 0.05], BF = 0.39; or in Experiment 3, t(44) = 1.74, p = .089, 95% CI = [−0.28, 0.02], BF = 0.65.
Table 1. Descriptive Statistics and Between-Condition Comparisons for the Two Objective Measures in All Three Experiments
Note: The table shows results for visibility 1 trials only. For means, standard deviations are given in parentheses; t tests were conducted against 0.5 for accuracy and against 0 for d′. BF = Bayes factor, CI = confidence interval.
Objective orientation measure (visibility posttest)
Overall, there was an increase in prime visibility in the visibility posttest in all three experiments (this was also observed in Mudrik & Koch, 2013): Visibility 1 trials now constituted only 70.06% (95% CI = [64.78, 75.35]), 68.54% (95% CI = [64.04, 73.04]), and 75.67% (95% CI = [69.87, 81.46]) of the trials in Experiments 1 to 3, respectively. The probability of visibility 1 trials was lower in the posttest than in the main task—Experiment 1: t(44) = 6.14, p < .0001, 95% CI = [8.96, 17.71], BF = 7.2 × 10⁴; Experiment 2: t(44) = 6.93, p < .0001, 95% CI = [9.26, 16.85], BF = 8.9 × 10⁵; Experiment 3: t(44) = 5.87, p < .0001, 95% CI = [9.61, 19.66], BF = 3.1 × 10⁴. As with objective performance in the main session, here, too, only visibility 1 trials were analyzed.
As in Mudrik and Koch (2013), subjects performed slightly yet significantly above chance in discriminating prime orientation in visibility 1 trials in Experiment 1, in Experiment 2 (only in d′, not in raw accuracy measures), and in Experiment 3 (Table 1). This implies that in the main session, too, there might have been some partial awareness of the primes, though not one that allowed subjects to become aware of prime congruency, which is the focus of this study.
Performance in the visibility posttest did not differ between congruent and incongruent primes in any of the experiments—Experiment 1: t(44) = 0.38, p = .704, 95% CI = [−0.18, 0.13], BF = 0.17; Experiment 2: t(44) = 0.15, p = .879, 95% CI = [−0.16, 0.14], BF = 0.16; and Experiment 3: t(44) = 0.09, p = .928, 95% CI = [−0.13, 0.12], BF = 0.18.
Reaction times
A Shapiro-Wilk normality test conducted on the combined distribution of RTs in all experimental conditions showed that RTs were normally distributed in Experiments 1 and 3 (W = 0.99, p = .336, and W = 0.99, p = .135, respectively), but not in Experiment 2 (W = 0.94, p < .0001). We therefore ran a log transformation on the RT data of Experiment 2, which rendered the distribution closer to normality (W = 0.98, p = .028). Accordingly, all analyses were conducted on log-transformed RT data. We ran the same transformation on the data from Experiments 1 and 3, too, to allow comparison between the three results patterns. Conducting the analyses on the raw data of Experiments 1 and 3 did not change the results.
A repeated measures ANOVA was conducted with prime congruency (congruent vs. incongruent) and prime-target repetition (same vs. different) as within-subjects variables. In Experiments 1 and 3, the interaction was significant, F(1, 44) = 34.75, p < .0001, ηp² = .44, BF = 3.3 × 10¹⁰, and F(1, 44) = 20.15, p < .001, ηp² = .31, BF = 2.7 × 10⁵, respectively. This interaction, in fact, reflects subjects’ longer RTs for incongruent targets rather than any effect caused by the primes: It shows that subjects were always slower in trials in which targets were incongruent (incongruent primes in same trials = incongruent target, or congruent primes in different trials = incongruent target; see Table 2). Such an interaction was not found in Experiment 2, where subjects had ample time to inspect the target scene (F < 1, BF = 0.22).
Table 2. Mean Reaction Time and Accuracy for Same and Different Trials in Experiments 1 Through 3
Note: The table shows results for visibility 1 trials only. Standard deviations are given in parentheses.
More critically, in all three experiments, none of the main effects—including that of prime congruency, which was found by Mudrik and Koch (2013)—were significant; Experiment 1—main effect of congruency: F < 1, BF = 0.20, main effect of repetition: F < 1, BF = 0.18; Experiment 2—main effect of congruency: F(1, 44) = 2.34, p = .134, ηp² = .05, BF = 0.23, main effect of repetition: F < 1, BF = 0.17; Experiment 3—main effect of congruency: F(1, 44) = 2.43, p = .126, ηp² = .05, BF = 0.32, main effect of repetition: F < 1, BF = 0.19. This pattern did not change when we analyzed the first block only, which did not include any image repetition; Experiment 1—main effect of congruency: F < 1, BF = 0.20, main effect of repetition: F < 1, BF = 0.25, interaction: F(1, 44) = 18.45, p < .001, ηp² = .30, BF = 4.0 × 10³; Experiment 2—main effect of congruency: F(1, 44) = 3.00, p = .090, ηp² = .06, BF = 0.33, main effect of repetition: F(1, 44) = 2.05, p = .159, ηp² = .04, BF = 0.30, interaction: F(1, 44) = 2.42, p = .127, ηp² = .05, BF = 1.34; Experiment 3—main effect of congruency: F < 1, BF = 0.22, main effect of repetition: F < 1, BF = 0.16, interaction: F(1, 44) = 4.43, p = .041, ηp² = .09, BF = 5.03.
Bayesian analysis was used to assess whether these null results stem from a genuine absence of effect or from inconclusive or underpowered data. Because our main goal was to replicate the main effect of prime congruency, we focus on it here: In all three experiments, BFs for this main effect were lower than 0.33 (Experiment 1: BF = 0.20, Experiment 2: BF = 0.23, Experiment 3: BF = 0.32), suggesting that H0 (i.e., the absence of an effect) was more probable than H1 (i.e., the presence of an effect). Importantly, when we combined the data from all three experiments (in a model that included experiment as an additional factor), the BF dropped to 0.14, further suggesting the lack of an effect. Following Moors et al. (2016), we conducted a post hoc sequential analysis in which we iteratively examined the data, each time adding one more subject from the sample, and recalculated BF for the main effect of congruency (see Fig. 2). This analysis shows how—aside from occasional deviations—the evidence for the lack of an effect becomes stronger the more data are obtained.

Fig. 2. Sequential analysis of the Bayes factor (BF; on log-transformed reaction times) for the main effect of congruency as a function of sample size (i.e., number of participants included in the analysis), separately for each of the three experiments. The background shades indicate whether the BFs provide moderate evidence for the null hypothesis (H0), provide moderate evidence for the alternative hypothesis (H1), or are inconclusive. Note that BFs are given in log10 values, so that 0.33 corresponds to −0.48, and 3 corresponds to 0.48, etc. Filled and empty shapes represent conclusive and inconclusive evidence, respectively.
Accuracy
A repeated measures ANOVA with prime congruency (congruent vs. incongruent) and prime-target repetition (same vs. different) as within-subjects variables mirrored the RT results. In Experiments 1 and 3, only the interaction was significant; Experiment 1—interaction: F(1, 44) = 22.42, p < .0001, ηp² = .34, BF = 7.0 × 10¹⁰, main effect of congruency: F(1, 44) = 3.50, p = .068, ηp² = .08, BF = 0.28, main effect of repetition: F(1, 44) = 1.44, p = .237, ηp² = .03, BF = 0.18; Experiment 3—interaction: F(1, 44) = 13.04, p < .001, ηp² = .23, BF = 2.9 × 10⁶, main effect of congruency: F(1, 44) = 3.90, p = .055, ηp² = .08, BF = 0.20, main effect of repetition: F(1, 44) = 1.88, p = .177, ηp² = .04, BF = 0.18. In other words, target classification was less accurate when the target was incongruent (incongruent prime – same target, congruent prime – different target). In Experiment 2, none of the effects were significant—main effect of congruency: F < 1, BF = 0.17, main effect of repetition: F < 1, BF = 0.17, interaction: F(1, 44) = 1.16, p = .287, ηp² = .03, BF = 0.83. This pattern did not change when we analyzed the first block only, which did not include any image repetition; Experiment 1—main effect of congruency: F < 1, BF = 0.17, main effect of repetition: F(1, 44) = 1.07, p = .306, ηp² = .02, BF = 0.20, interaction: F(1, 44) = 9.68, p = .003, ηp² = .18, BF = 6.6 × 10³; Experiment 2—main effect of congruency: F < 1, BF = 0.19, main effect of repetition: F < 1, BF = 0.19, interaction: F < 1, BF = 0.61; Experiment 3—main effect of congruency: F < 1, BF = 0.18, main effect of repetition: F < 1, BF = 0.16, interaction: F(1, 44) = 7.89, p = .007, ηp² = .15, BF = 1.7 × 10³.
Target-related effects
Aside from our critical examination of the original effect, which we now failed to replicate, we examined subjects’ response pattern for the targets. Notably, although the target was consciously perceived in all experiments, it was very briefly presented (33 ms) at a low contrast in Experiments 1 and 3, which affected subjects’ ability to detect its incongruency.
In all experiments, subjects were accurate in classifying the targets as congruent or incongruent (see Table 3). Accuracy and d′ rates in Experiments 1 and 3 were higher than chance, yet lower than in Experiment 2—one-way ANOVA for the difference in performance between the three experiments: F(2, 132) = 318.60, p < .0001, ηp² = .89, BF = 1.59 × 10¹⁶.
Table 3. Descriptive Statistics and Between-Condition Comparisons for Overall Performance and as a Function of Target Congruency in All Three Experiments
Note: The table shows results for all trials. For means, standard deviations are given in parentheses; t tests were conducted against 0.5 for accuracy and against 0 for d′ and lnβ. BF = Bayes factor, CI = confidence interval.
In Experiments 1 and 3, in which the target was degraded and more difficult to detect, subjects were more accurate at classifying congruent targets than incongruent ones—Experiment 1: t(44) = 5.41, p < .0001, 95% CI = [0.10, 0.23], BF = 7.23 × 10³; Experiment 3: t(44) = 4.25, p < .001, 95% CI = [0.06, 0.17], BF = 219.97. Subjects also classified congruent targets faster than incongruent ones—Experiment 1: t(44) = 3.40, p = .002, 95% CI = [−0.04, −0.01], BF = 21.00; Experiment 3: t(44) = 3.35, p = .002, 95% CI = [−0.04, −0.01], BF = 18.44 (note that these analyses were conducted on log-transformed data). This accuracy advantage for congruent targets may be explained by subjects’ response bias to classify targets as congruent under conditions in which the target was less detectable (Table 3).
This bias could potentially also explain the difference in RTs: If subjects are biased to press “congruent,” it may take them longer to switch from their preferred response to the nonpreferred “incongruent” response. Yet if this is the case, the RT difference should be manifested only in correct trials, where subjects’ responses were indeed different in congruent and incongruent trials. Alternatively, if this RT effect does not stem from response bias, it should remain the same also in trials where the response was identical, irrespective of target congruency (i.e., in trials where subjects responded “congruent” either because the target was congruent or because they did not explicitly recognize it was incongruent). To test these alternatives, we conducted a post hoc exploratory analysis including only congruent responses. There, too, subjects were slower for incongruent targets (M = 1.44 s, SD = 0.34) than for congruent ones (M = 1.39 s, SD = 0.31) in both Experiment 1, t(44) = 3.23, p = .002, 95% CI = [−0.06, −0.01], BF = 13.70, and Experiment 3 (incongruent: M = 1.34 s, SD = 0.31; congruent: M = 1.30 s, SD = 0.29), t(44) = 2.34, p = .024, 95% CI = [−0.04, −0.003], BF = 1.87 (Fig. 3). Critically, and in sharp contrast to the prime-related null results reported above, here when the data from the two experiments were combined (in a model that included experiment as an additional factor), the BF increased to 114.36, providing extreme evidence for the existence of an effect. This suggests that subjects were implicitly affected by target congruency even when not explicitly detecting it: Although they said the image was congruent (i.e., they were unaware of its incongruency), their slower RTs imply that they nevertheless processed that incongruency. 
Notably, the overall slower and worse performance for incongruent targets was not found in Experiment 2, where the target was clearly presented for 500 ms—RTs: t(44) = 0.61, p = .545, 95% CI = [−0.02, 0.04], BF = 0.19; accuracy: t(44) = 1.52, p = .137, 95% CI = [−0.01, 0.07], BF = 0.47.

Fig. 3. Implicit processing of target congruency, as reflected by reaction time (RT; raw data) for congruent and incongruent targets, in trials where subjects classified the target as congruent (i.e., correct congruent trials and incorrect incongruent trials). Each shape represents the average RT for an individual subject (congruent targets: x-coordinate; incongruent targets: y-coordinate), in Experiment 1 and Experiment 3. Shapes on the diagonal represent no difference between RTs for congruent and incongruent targets, shapes above the diagonal suggest slower performance for missed incongruent targets, and shapes below the diagonal suggest slower performance for correctly identified congruent targets. Darker shapes represent group means with lines representing within-subject 95% confidence intervals (Morey, 2008) for RTs following both congruent (vertical lines) and incongruent (horizontal lines) targets. The histograms at the lower left corner sum the number of shapes with respect to the diagonal line. Note that in both experiments, shape distribution is nonsymmetrical, so most shapes are above the diagonal.
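The within-subject confidence intervals mentioned in the caption (Morey, 2008) can be illustrated with a short sketch. This is an assumed, simplified implementation of the normalization approach with Morey's correction, not the code used to draw the figure; the function name and the input layout (one row of condition means per subject) are hypothetical, and the critical t value is supplied by the caller (for 44 degrees of freedom and a 95% interval it is roughly 2.02).

```python
import math
from statistics import mean, variance


def within_subject_ci(data, t_crit):
    """Half-widths of within-subject confidence intervals, one per condition.

    `data` is a list of per-subject rows; each row lists that subject's
    mean RT in every within-subject condition, in a fixed order.
    `t_crit` is the two-sided critical t value for n - 1 degrees of
    freedom, provided by the caller.
    """
    n = len(data)        # number of subjects
    m = len(data[0])     # number of within-subject conditions
    if n < 2 or m < 2:
        raise ValueError("need at least two subjects and two conditions")

    grand_mean = mean(value for row in data for value in row)

    # Normalization step: remove each subject's overall level, then add
    # the grand mean back so condition means are unchanged.
    normalized = [
        [value - mean(row) + grand_mean for value in row] for row in data
    ]

    # Morey (2008) correction: scale the variance by m / (m - 1).
    correction = m / (m - 1)

    half_widths = []
    for j in range(m):
        column = [row[j] for row in normalized]
        corrected_var = variance(column) * correction
        half_widths.append(t_crit * math.sqrt(corrected_var / n))
    return half_widths
```

With only two conditions, as in the figure, the normalized scores in the two conditions mirror each other, so both intervals have the same width.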
We further note that the latter exploratory analysis is exposed to the criticism of regression to the mean (Shanks, 2017) because it was based on a post hoc selection of some of the trials (i.e., those in which subjects classified the target as congruent). Though we cannot exclude this possibility, a key prediction of the regression-to-the-mean account is that the averaged effect in the selected cases (i.e., congruent – incongruent targets for congruent classification trials, calculated for the log-transformed RTs) would be greater than the averaged effect in the entire sample (i.e., congruent – incongruent targets, calculated for the log-transformed RTs), which was not the case here—Experiment 1: t(44) = 1.45, p = .155, 95% CI = [−0.004, 0.03], BF = 0.43; Experiment 3: t(44) = 0.19, p = .848, 95% CI = [−0.02, 0.01], BF = 0.16. Future research oriented at directly probing implicit processing of object-scene relations should further examine this potential problem, with a design directly aimed at avoiding post hoc selection.
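The comparison behind this regression-to-the-mean check can be sketched as follows. This is an illustrative reading of the text rather than the authors' script: the trial format and function names are hypothetical, and treating "the entire sample" as all trials regardless of response is an assumption. For each subject, the effect (congruent minus incongruent, on log-transformed RTs) is computed once for congruent-response trials only and once for all trials, and the two are subtracted.

```python
import math
from statistics import mean


def log_rt_effect(trials, only_congruent_responses):
    """Per-subject effect on log RTs: congruent minus incongruent targets.

    `trials` is an iterable of dicts with hypothetical keys 'subject',
    'target', 'response' (each 'congruent' or 'incongruent'), and 'rt'
    in seconds. If `only_congruent_responses` is True, trials answered
    "incongruent" are dropped (the post hoc selection).
    """
    cells = {}
    for trial in trials:
        if only_congruent_responses and trial["response"] != "congruent":
            continue
        subject_cells = cells.setdefault(
            trial["subject"], {"congruent": [], "incongruent": []}
        )
        subject_cells[trial["target"]].append(math.log(trial["rt"]))
    return {
        subject: mean(c["congruent"]) - mean(c["incongruent"])
        for subject, c in cells.items()
        if c["congruent"] and c["incongruent"]
    }


def selected_minus_full(trials):
    """Per-subject difference: effect in selected trials minus effect in all trials."""
    trials = list(trials)  # allow two passes over a generator
    selected = log_rt_effect(trials, only_congruent_responses=True)
    full = log_rt_effect(trials, only_congruent_responses=False)
    return {s: selected[s] - full[s] for s in selected.keys() & full.keys()}
```

The resulting per-subject differences could then be tested against zero with a paired or one-sample test; under this sign convention, how to read the direction of the difference depends on the sign of the effects themselves.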
Discussion
In three experiments, we failed to find evidence for integration of objects and scenes without awareness. Bayesian analysis confirmed that this lack of an effect did not stem from inconclusive data or from an underpowered design, providing evidence for the null hypothesis. Thus, our study joins that of Moors and collaborators (2016) in suggesting that consciousness, here assessed by stimulus visibility, may in fact be needed for integrating an object within the scene in which it appears. This is in line with a specific suggestion in that spirit (Koch & Tononi, 2011) and with the two prominent theories in the field of consciousness studies, namely, the global neuronal workspace (GNW) theory (Dehaene & Naccache, 2001; see also Baars, 2005) and the information integration theory (IIT; Tononi, Boly, Massimini, & Koch, 2016).
Both theories assign a special importance to integration. Experience is held to be irreducible to noninterdependent subsets of phenomenal distinctions; conscious processes are considered interactive, involving long-range projections and recurrent processes, whereas unconscious processes are local and encapsulated (Dehaene & Naccache, 2001; Tononi et al., 2016). Under these assumptions, unconscious processes cannot suffice for object-scene integration. It has been suggested that the latter may be mediated by recurrent, integrative connections between ventral and prefrontal areas (Bar, 2004), with ongoing interactions between scene- and object-identification mechanisms working in parallel (Torralba, Oliva, Castelhano, & Henderson, 2006; see also Mudrik, Lamy, & Deouell, 2010). Our current results are thus in line with both the IIT and GNW theories, suggesting that object-scene integration does not occur during unconscious processing (for two other examples of studies implying other types of limited integration in the absence of awareness, see Faivre & Koch, 2014; Harris, Schwarzkopf, Song, Bahrami, & Rees, 2011). Much like every null result, however, our results cannot prove that this is the case; future studies, using different strategies to tackle the problem, could always show otherwise.
Interestingly, our results do imply that although processing incongruencies may require conscious perception of the scenes, it may be independent of subjects’ awareness of scene incongruency. This was revealed by analyzing subjects’ performance on the consciously perceived yet highly impoverished target scenes in Experiments 1 and 3. In these experiments, the target scenes were unmasked, and thus visible, but were presented for only 33 ms at a low contrast (0.5). This resulted in relatively poor performance at detecting incongruent scenes, in line with previous studies (Glanemann et al., 2016; Greene et al., 2015). In other words, subjects consciously perceived the scenes and often managed to extract their gist, but they were not always able to detect incongruencies when they appeared (i.e., they classified the scene as congruent though it was incongruent). Critically, our results show that even in cases when subjects missed the incongruency, reporting the image as “normal” or congruent, their RTs were nevertheless slower (see Greene et al., 2015, for a similar result), either because they were slowed down by the incongruency of the scene or simply because they erred. Under both accounts, implicit processing of object-scene relations had to take place. This may reflect a genuine dissociation between awareness of the scene and awareness of its incongruency: The former seems to be needed for processing object-scene relations, whereas the latter is not. Even when subjects are not explicitly aware that the scene is incongruent, they still implicitly process its incongruency. One should note that the effect was not found in the original study that we aimed to replicate here (Mudrik & Koch, 2013) or in Experiment 2, but this could be explained by the designs of both experiments (in the original study, the sample size was too small, N = 14; in Experiment 2, long presentation durations might have rendered the RT measure less sensitive).
On a more methodological note, this study highlights the need for more rigorous research practices, especially in the study of unconscious processes, where the effects are typically weaker and more fragile than in conscious processing. In our case, unfortunately, the use of overly small sample sizes without an internal replication seems to have led to two false positive results in the same direction. This again demonstrates how small sample sizes are prone to extreme outcomes (Tversky & Kahneman, 1971) that sometimes lead researchers astray, a problem that is part of the ongoing replication crisis in the psychological and biomedical sciences (e.g., Munafò et al., 2017). As a result of this crisis, psychology and neuroscience are gradually transitioning toward new practices and norms, with respect to both the way studies are designed and predefined and the way the data are analyzed and evaluated (Cumming, 2014; Dienes, 2011). We consider the current work a part of this self-improvement wave in the field and hope others will follow in reevaluating their findings with the new practices that, in many ways, are transforming and improving the way psychological research is being conducted.
Footnotes
Acknowledgements
We thank Christof Koch, Nathan Faivre, Hagar Gelbard-Sagiv, and Shlomit Yuval-Greenberg for their insightful comments on the manuscript; Jeffrey Rouder for his assistance in using the BayesFactor package; Guido Hesselmann and David Shanks for their advice about regression to the mean; and Alex Manevitch, Yoav Roll, and Nada Yassin for their help with data collection.
Action Editor
Philippe G. Schyns served as action editor for this article.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Funding
The study was supported by the Israel Science Foundation (Grant No. 1847/16) and the Marie Skłodowska-Curie Individual Fellowships (Grant No. 659765-MSCA-IF-EF-ST).
Open Practices
All data have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/bpj7m/. The design and analysis plans for Experiments 1 and 3 were preregistered at https://osf.io/bpj7m/ and osf.io/y2exn, respectively. The complete Open Practices Disclosure for this article can be found at http://journals.sagepub.com/doi/suppl/10.1177/0956797617735745. This article has received the badges for Open Data and Preregistration.