Abstract
The flexible adjustment to changing demands is an astonishing human ability. One related phenomenon is the context-specific proportion congruency (CSPC) effect. Regarding response conflict, the CSPC refers to reduced response interference in contexts with a high conflict proportion as opposed to contexts with a low conflict proportion. Derived from previous research showing CSPCs in the visual domain, we here aim to investigate whether human voices (male vs. female) as auditory contexts trigger control adjustments. To this end, we used a numerical judgement task with number words spoken by a male or female voice. We created response conflict by presenting the words either to the left or right ear (Experiment 1), and we created different levels of processing fluency by presenting them clearly or with background noise (Experiment 2). For a given participant, either the female or the male voice was associated with a high proportion of incongruent/disfluent trials and a low proportion of congruent/fluent trials, respectively. Extending previous findings from the visual modality, we found that the frequency of challenging information within one auditory context (i.e., the voice) can lead to typical CSPC patterns. In two further experiments, using frequency biased and unbiased items, we found evidence for the contribution of associative learning. Limitations of context control associations will be discussed.
Introduction
The flexible adjustment to changing task demands highlights an outstanding ability of human action control. Adjustments of control may occur dynamically from one trial to the next, as, for example, in classical interference tasks. In the Eriksen–Flanker, Simon, or Stroop paradigm, an incongruent stimulus triggers the upregulation of control and, consequently, response interference is reduced in trial n + 1 (e.g., Botvinick et al., 2001; Gratton et al., 1992; Kerns et al., 2004; Stürmer et al., 2002; for a review, see Egner, 2014). Context-specific, rather than sequential, control adjustments in contrast describe the ability to adapt to specific demands in different contexts (Crump et al., 2006; Dreisbach, Fröber, et al., 2018; Dreisbach, Reindl, & Fischer, 2018; for reviews, see Bugg, 2012; Bugg & Crump, 2012). For example, if response incongruent stimuli occur predominantly at one location and congruent stimuli at another, the location associated with mostly incongruent stimuli triggers the upregulation of control. Consequently, response conflict is reduced at this specific location, but not at the other location. Previous research has shown context-specific adjustments to visual contexts such as colour, shape, and location of items (for a review, see Bugg & Crump, 2012), or even to more complex context features like faces (Cañadas et al., 2013, 2016; Jiménez-Moya et al., 2018). Derived from this research, we here aim to investigate whether human voices (male vs. female) as auditory contexts can also trigger control adjustments.
In the lab, context-specific adjustments are typically investigated using classical interference tasks like the Eriksen–Flanker, Simon, or Stroop task (for a review, see Bugg & Crump, 2012; for context-specific adjustments in task switching and dual tasking see, for example, Crump & Logan, 2010; Fischer et al., 2014; Surrey et al., 2017). In these tasks, a high proportion of incongruent trials (e.g., 80% incongruent trials) in a given context leads to decreased interference effects in this context, whereas a high proportion of congruent trials leads to larger interference effects (e.g., Logan & Zbrodoff, 1979; for a review, see Bugg & Crump, 2012).
If, for example, mostly congruent (MC) trials are presented at and associated with a location to the right of fixation and mostly incongruent (MI) trials, in turn, to the left of fixation, the resulting congruency effect is significantly smaller on the left as compared with the right side (e.g., Corballis & Gratton, 2003). This context-specific proportion congruency (CSPC) effect illustrates that the frequency of conflict at a particular location leads to context-specific control associations that support handling challenging information, as illustrated by attenuated interference effects in those contexts (e.g., Corballis & Gratton, 2003; Crump et al., 2006; Dreisbach, Reindl, & Fischer, 2018; Jacoby et al., 2003; Vietze & Wendt, 2009). Meanwhile, there are numerous studies replicating this CSPC for contextual features such as affect, colour, face, form, temporal foreperiod, and spatial location (e.g., Bugg et al., 2008, 2011; Cañadas et al., 2013; Crump et al., 2006, 2017; Crump & Milliken, 2009; Dreisbach, Fröber, et al., 2018; Dreisbach, Reindl, & Fischer, 2018; Heinemann et al., 2009; Lehle & Hübner, 2008; Vietze & Wendt, 2009; Wendt & Kiesel, 2011).
So far, however, CSPC effects have mainly been investigated in the visual modality, and barely in the auditory domain (but see Spapé & Hommel, 2008). Adaptation to our acoustical surroundings seems, however, a quite conceivable and adaptive feature of human action control. Imagine a student during lectures. From previous experiences, they might know that remarks from one fellow student are most often distracting and uninformative, whereas the remarks from another fellow student are often helpful and informative. Consequently, voice identity may serve as a contextual cue to direct attention towards useful sources and away from distracting sources of auditory information. In fact, there is already some evidence supporting the claim that voice identity can have a modulatory effect on control adjustments. However, so far it is restricted to sequential processing adjustments in a Stroop-like task (Spapé & Hommel, 2008). In that study, participants had to respond to high- or low-pitched tones by saying “high” or “low,” respectively. Simultaneously, they heard a voice speak the word “high” or “low,” which they had to ignore. In line with previous findings, performance was impaired if the presented word was incongruent with the required response. This Stroop-like effect was reduced after incongruent trials. Interestingly, this sequential modulation only occurred if the voice in the two successive trials was the same, whereas no modulation was obtained whenever the voice changed (Spapé & Hommel, 2008).
To extend this finding of sequential control to context-specific control adjustments, we aim to investigate whether participants are able to adapt conflict processing in an auditory Simon task with different proportions of congruency to the speaker identity. Voice identity, just like facial expression or gender (Cañadas et al., 2013, 2016), is a rather complex feature. In the visual domain, there have been investigations using plain perceptual and abstract features as contextual cues, but also more complex features, which extend the external validity of the findings. A generalisation to the auditory domain, especially to human voices, would further increase the relevance of control adjustments in daily life. In our paradigm, we presented number words (one to nine except for five) spoken by a male and a female voice via headphones. The number words were presented monaurally either to the participants’ left or right ear, and responses to the number magnitude had to be given manually pressing a lateralised button using the left or right index finger. That way, stimulus presentation was either response congruent (e.g., <5 presented to the left ear requiring a left key press) or response incongruent (e.g., >5 presented to the left ear requiring a right key press; cf. Simon & Rudell, 1967; for an overview, see Lu & Proctor, 1995). Critically, congruent and incongruent trials occurred with equal frequency, while one voice was associated with MI trials and the other voice was associated with MC trials. We predict a CSPC effect in terms of a smaller Simon effect for the mostly incongruent voice and a larger Simon effect for the mostly congruent voice. Statistically, we thus predicted an interaction of Congruency (congruent, incongruent) × Conflict Voice (MC, MI).
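The predicted interaction can be read as a difference of differences: the Simon effect (incongruent minus congruent RT) for the MC voice minus the Simon effect for the MI voice. The following Python sketch illustrates this reading only. The function name and the cell means are hypothetical and are not taken from the study.

```python
def cspc_contrast(cell_means):
    """Return (simon_mc, simon_mi, cspc) from four cell means in ms.

    cell_means maps (voice, congruency) -> mean RT, where voice is
    "MC" or "MI" and congruency is "congruent" or "incongruent".
    """
    simon_mc = cell_means[("MC", "incongruent")] - cell_means[("MC", "congruent")]
    simon_mi = cell_means[("MI", "incongruent")] - cell_means[("MI", "congruent")]
    # A positive contrast means the Simon effect is smaller for the MI voice,
    # which is the pattern the CSPC hypothesis predicts.
    return simon_mc, simon_mi, simon_mc - simon_mi


# Hypothetical cell means (ms), invented for this example only.
example = {
    ("MC", "congruent"): 600.0,
    ("MC", "incongruent"): 650.0,
    ("MI", "congruent"): 610.0,
    ("MI", "incongruent"): 640.0,
}
simon_mc, simon_mi, cspc = cspc_contrast(example)
print(simon_mc, simon_mi, cspc)  # 50.0 30.0 20.0
```

In the ANOVA, this contrast corresponds to the Congruency × Conflict Voice interaction term.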
Experiment 1—conflicting voice
Method
Participants
G*Power 3.1 software (Faul et al., 2009) revealed that a sample size of N = 32 is required to guarantee sufficient statistical power of 1–β = .95 with α = .05, and partial η2 = 0.31 (Dreisbach, Reindl, & Fischer, 2018; Experiment 1). Based on this analysis, 32 undergraduate students of the University of Regensburg (16 female; 27 right-handed; Mage = 21.6, SDage = 3.5, Rage = 18–35 years) participated in this study. Participants had normal hearing and were naive with respect to the hypothesis of the experiment. All participants signed an informed consent form and were debriefed and rewarded with partial course credit after the session. Data from one participant had to be excluded due to an error rate deviating more than 3 SDs from the sample mean.
Materials and procedure
As stimuli, we presented the spoken number words: One, two, three, four, six, seven, eight, and nine, spoken by a single female and a single male speaker at approximately 70 dBA. Participants were instructed to categorise the numbers as smaller or as larger than five. Accordingly, for numbers smaller than five, they pressed the “Y” key, and for numbers larger than five, they pressed the “M” key using a QWERTZ keyboard. This stimulus–response assignment was held constant across participants to avoid any influence of spatial associations with response hands as reflected in the spatial–numerical associations of response codes (SNARC) effect (Dehaene et al., 1993).
These stimuli, pseudo-randomised to avoid any stimulus repetition, were always presented to the left or the right ear via headphones (Sennheiser HD 201), thus creating an auditory Simon task (cf. Simon & Rudell, 1967; for an overview, see Lu & Proctor, 1995). Congruent trials were those where the stimulated ear (left/right) coincided with the lateral position of the correct response (one, two, three, and four presented to the left ear; six, seven, eight, and nine, presented to the right ear). The other trials were coded as incongruent. For a given participant, one voice was associated with MI trials (80% incongruent; 20% congruent) and the other voice was associated with MC trials (80% congruent; 20% incongruent). This association between voice and proportion congruency (PC) was kept constant within a given participant but counterbalanced across participants. Note that the voice varied randomly from trial to trial. Overall, there were 50% congruent and 50% incongruent trials in a given block.
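The proportion manipulation described above can be sketched as a trial list. This is a minimal illustration, not the authors' experiment script. It assumes that both voices occur equally often (80 trials each in a 160-trial block), draws number words at random, and does not enforce the no-repetition constraint. All names are hypothetical.

```python
import random

NUMBER_WORDS = ["one", "two", "three", "four", "six", "seven", "eight", "nine"]
SMALL = {"one", "two", "three", "four"}  # smaller than five: left response key


def build_block(mi_voice="male", mc_voice="female", trials_per_voice=80, seed=0):
    """Build one block with an 80/20 congruency split per voice.

    Assumption: both voices are equally frequent, which is what yields
    50% congruent and 50% incongruent trials overall.
    """
    rng = random.Random(seed)
    n_major = trials_per_voice * 4 // 5       # 64 trials of the frequent type
    n_minor = trials_per_voice - n_major      # 16 trials of the rare type
    plan = (
        [(mi_voice, False)] * n_major + [(mi_voice, True)] * n_minor
        + [(mc_voice, True)] * n_major + [(mc_voice, False)] * n_minor
    )
    trials = []
    for voice, congruent in plan:
        word = rng.choice(NUMBER_WORDS)
        response_side = "left" if word in SMALL else "right"
        if congruent:
            ear = response_side
        else:
            ear = "right" if response_side == "left" else "left"
        trials.append({"voice": voice, "word": word, "ear": ear, "congruent": congruent})
    rng.shuffle(trials)  # plain shuffle: voice varies randomly from trial to trial
    return trials


block = build_block()
print(len(block), sum(t["congruent"] for t in block))  # 160 80
```

Swapping the `mi_voice` and `mc_voice` arguments gives the counterbalanced assignment for the other half of the sample.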
Each trial started with the presentation of a white fixation cross (Courier New, 28 pt) on a black background for 300 ms followed by the imperative stimulus (i.e., spoken number word) to either the left or the right ear. The screen remained black until a response was given or 1300 ms had passed. If the response was correct, the next trial started after an inter-trial interval (ITI) of 550 ms on average. The ITI varied randomly between 100 and 1000 ms in 100 ms steps, to prevent rhythmic responding and thus prevent artificially blurred or created (Schmidt, 2016) effects. If the response was wrong or slower than 1300 ms, the German word for wrong (i.e., “falsch”) or too slow (i.e., “zu langsam”) was displayed in red (Courier New, 22 pt) on the screen for 300 ms. In total, the whole experiment consisted of a short block of 20 practice trials and three experimental blocks of 160 trials each. A 2 (Congruency: congruent, incongruent) × 2 (Conflict Voice: MI, MC) repeated measures design was applied. RTs and errors served as dependent measures.
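The jittered ITI can be illustrated with a short sketch. It assumes that each of the ten possible values is drawn with equal probability, which is consistent with the reported average of 550 ms but is not stated explicitly. The names are hypothetical.

```python
import random
from statistics import mean

# Possible inter-trial intervals: 100, 200, ..., 1000 ms.
ITI_VALUES_MS = list(range(100, 1001, 100))


def sample_iti(rng):
    """Draw one ITI, assuming a uniform distribution over the ten values."""
    return rng.choice(ITI_VALUES_MS)


rng = random.Random(1)
print(mean(ITI_VALUES_MS))                 # expected ITI under uniform sampling
print([sample_iti(rng) for _ in range(5)])  # five example draws
```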
Results
Reaction times
For statistical analysis, we excluded the first trial of each block, erroneous trials, and post-error trials (together 7.5%), as well as RTs that deviated by more than 3 SDs from the individual cell mean (1.6%). To investigate whether the Simon effect varies as a function of conflict voices associated with different degrees of proportion congruency, the remaining data were submitted to a 2 (Congruency: congruent, incongruent) × 2 (Conflict Voice: MI, MC) analysis of variance (ANOVA) with repeated measures on both factors.
The respective analysis revealed a significant main effect of Congruency, F(1, 30) = 63.17, p < .001,
Error rates
An analogous ANOVA on errors revealed a significant main effect of Congruency, F(1, 30) = 40.63, p < .001,
Discussion
This study for the first time hints at context-specific control adjustments in the auditory domain. More precisely, voice features that were associated with different levels of PC modulated the auditory Simon effect. That is, results show a significantly smaller Simon effect for voices associated with MI trials as compared with voices associated with MC trials.
This is in line with findings from the visual modality which showed PC effects in contexts like face or gender identity before (Cañadas et al., 2016; Jiménez-Moya et al., 2018). Just as in the visual modality, the frequency of conflicting information transferred by a particular voice leads to context (here: voice) control associations that are retrieved by the respective context (see Gottschalk & Fischer, 2017; for reviews, see Bugg, 2012; Bugg & Crump, 2012). Our results are also in line with findings from Lawo and Koch (2014), who also found a Simon effect in the auditory domain using human voices. More precisely, they also used digits presented by male and female voices dichotically to the left and right ear, with a visual cue announcing which of the two voices (male or female) participants had to attend to in the upcoming trial. Interestingly, while they found robust switch costs (e.g., switch from male to female voice), these switch costs did not interact with the Simon effect (which in fact was identical for voice repetitions and voice switches). Lawo and Koch (2014) consequently concluded that the process of stimulus selection (depending on the voice) can be dissociated from response selection (depending on the spoken number magnitude). In our paradigm, however, participants only had to attend to the stimulus content (and not the voice) to accomplish the task. But still, we found evidence that participants were able to use the voice to retrieve the appropriate control set. Before we further discuss potential underlying mechanisms of the CSPC effect observed here, we first aim to replicate the findings of Experiment 1 with a modified stimulus set.
Voices in our daily environment sometimes may convey conflicting information. But most of the time, voices vary in acoustical quality. We are often confronted with variations in processing fluency and the ease of processing due to background noises or bad phone connections. Furthermore, the theoretical account of conflicts as aversive signals (Dreisbach & Fischer, 2015, 2016) would predict context-specific control adaptations also to different levels of processing fluency, just as to different proportions of conflict. This is derived from the well-established finding that (perceptual) fluency of processing is hedonically marked, and “high fluency indicates a positive state of affairs, whereas low fluency indicates a negative state of affairs” (Winkielman et al., 2003, p. 203). Correspondingly, from the visual domain, there is already evidence that disfluent stimuli can trigger control adjustments as conflicts do. In fact, there is empirical evidence for a context-specific proportion fluency (CSPF) effect in the visual domain (Dreisbach, Reindl, & Fischer, 2018) as well as for sequential processing adjustments (Dreisbach & Fischer, 2011). To investigate whether context-dependent control adaptations can also be found for different signal-to-noise ratios in the quality of a particular voice, we manipulated the proportion of Fluency (fluent or disfluent) in Experiment 2.
Experiment 2—disfluent voice
Method
Participants
Thirty-two undergraduate students of the University of Regensburg (16 female; 27 right-handed; Mage = 21.8, SDage = 3.8, Rage = 18–30 years) participated in this study. None of them had participated in the first experiment, and all were naive with respect to the hypothesis of the experiment. All participants signed informed consent and were debriefed and rewarded with partial course credit after the session. Data from one participant had to be excluded due to a mean error rate deviating more than 3 SDs from the sample mean.
Material and procedure
Task and procedure were the same as in Experiment 1 except for the following changes: Instead of presenting the number words to the left or the right ear as in Experiment 1, they were now presented binaurally and were either presented in clear speech without any background noise or were embedded in so-called multi-speaker babble background noise by using Audacity® 2.1.2. (www.audacity.de). More precisely, the original number words spoken by a male or female voice were combined with convolving speech streams of 10 (five female, five male) different speakers (for further detail, see Obermeier et al., 2012) and were windowed with 10 ms linear onset and offset slopes. The signal-to-noise ratio of original number words and multi-speaker babble noise was –5 dB. For a given participant, one voice (male or female) was associated with mostly disfluent (MD) trials (80% disfluent, 20% fluent) and the other voice with mostly fluent (MF) trials (80% fluent, 20% disfluent). In this study, word repetitions were possible, but were excluded from further analyses. Instructions, trial, and block procedure remained the same as in Experiment 1. A 2 (Fluency: fluent, disfluent) × 2 (Fluency Voice: MF, MD) design with repeated measures was applied.
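To make the –5 dB signal-to-noise ratio concrete: it means that the babble noise carries roughly three times the power of the spoken word. The sketch below shows one way to scale a noise track to a target ratio, using mean squared amplitude as the power measure. It is an illustration only. The stimuli in the study were mixed in Audacity, and the function names and toy waveforms here are hypothetical.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def noise_gain_for_snr(signal, noise, target_snr_db):
    """Gain to apply to `noise` so that the signal-to-noise ratio hits the target."""
    target_ratio = 10 ** (target_snr_db / 10)   # P_signal / P_noise
    target_noise_power = mean_power(signal) / target_ratio
    return math.sqrt(target_noise_power / mean_power(noise))


def measure_snr_db(signal, noise):
    """Signal-to-noise ratio in dB, based on mean power."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Toy waveforms standing in for a spoken word and babble noise.
signal = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [((t * 37) % 11 - 5) / 5 for t in range(100)]

gain = noise_gain_for_snr(signal, noise, -5.0)
scaled_noise = [gain * n for n in noise]
mixture = [s + n for s, n in zip(signal, scaled_noise)]
print(round(measure_snr_db(signal, scaled_noise), 6))  # -5.0
```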
Results
Reaction times
For statistical analysis, we excluded the first trial of a given block, erroneous trials, post-error trials (together 5.72%), and number repetitions (11.57%), as well as RTs that deviated by more than 3 SDs from the individual cell mean (1.01%). A 2 (Fluency: fluent, disfluent) × 2 (Fluency Voice: MF, MD) ANOVA with repeated measures on both factors revealed a significant main effect of Fluency, F(1, 30) = 92.95, p < .001,
Error rates
An analogous ANOVA on error rates revealed a significant main effect of Fluency, F(1, 30) = 16.58, p < .001,

Reaction time (ms) and error rate (%) of congruent and incongruent trials in Experiment 1 (left panel) or fluent and disfluent trials in Experiment 2 (right panel) as a function of voice (mostly congruent/fluent [MC/MF] vs. mostly incongruent/disfluent [MI/MD]).
Discussion
Further extending context-specific adaptation in the auditory domain, Experiment 2 revealed control adaptations also for non-conflicting, disfluent stimuli (cf. Dreisbach et al., 2018; Dreisbach & Fischer, 2011). More precisely, manipulation of the fluency of auditory information conveyed by a particular voice led to smaller fluency effects for the voice that spoke mostly disfluently as opposed to another voice that spoke mostly fluently.
So far, we are tempted to argue that the observed context-specific adjustments can be taken as a sign of control adjustments: Participants learn to associate a certain human voice with the appropriate control set, which is then retrieved whenever the respective voice is presented. However, CSPC effects may just as well be the result of stimulus–response learning in the sense that participants learn that a certain stimulus presented by a certain voice has to be answered with a certain key. If, for example, the male voice was associated with MI/MD trials, then the number four spoken by a male voice will mostly be presented to the response incompatible right side. Participants may thus learn to associate certain stimuli with certain responses. In other words, it is necessary to investigate the underlying mechanisms of the effects observed here. As it stands, our paradigm may still create effects due to item-specific control adaptation, or, more critically, pure contingency learning (cf. Schmidt, 2016, 2019). The CSPC in the visual domain is an effect that can generalise to items that do not follow the PC manipulation, but appear in the respective context (Bugg et al., 2011; Cañadas et al., 2013, 2016; Crump & Milliken, 2009; Jiménez-Moya et al., 2018). In order to investigate whether the voice is a sufficient trigger of control adaptation, we re-ran Experiments 1 and 2, this time including frequency-unbiased items. If the effect in the auditory domain results from context-specific control adaptations, the CSPC should still be present for the diagnostic, frequency-unbiased items. Finding the effect for frequency-biased items only would suggest that context-specific adjustments to vocal stimuli result primarily from associative learning.
Experiment 3
Method
Participants
We collected data from 33 students of the University of Regensburg to meet the required sample size of N = 32 (see power analysis in section “Method” of Experiment 1). Normal or corrected hearing and no participation in any of the other experiments presented here were required to participate. All participants signed informed consent and were debriefed and rewarded with partial course credit or €4 after the session. Data from one participant had to be excluded due to a mean error rate deviating more than 3 SDs from the sample mean (14.70% at a sample mean of MER = 3.97%, equaling 3.22 SD). The final sample was aged Mage = 22.47, SDage = 2.91, Rage = 18–31 years. Twenty-three participants were female and 30 were right-handed.
Materials and procedure
The stimuli presented were the same as those used in Experiment 1. The only difference was that the proportion of congruent and incongruent trials per voice applied only to six of the eight number words. Two number words (four and six, three and seven, two and eight, or one and nine; counterbalanced across participants) were presented equally often congruent and incongruent in both voice contexts (50% congruent, 50% incongruent). These items are coded as unbiased. The procedure of trials and blocks was exactly as described in Experiment 1. Thus, one block consisted of 160 trials of which 120 followed the 80:20/20:80 proportion of congruency and 40 were unbiased (see Table 1).
Table 1. Frequency of trial types.
MC: mostly congruent; MF: mostly fluent; MI: mostly incongruent; MD: mostly disfluent.
All items in Experiments 1 and 2 were frequency-biased items (proportion congruency [PC] 80/20). In Experiments 3 and 4, PC for biased items was still 80/20 but 50/50 for unbiased items. The overall context-specific PC in Experiments 3 and 4 was therefore weaker with 72.5%/27.5%.
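The 72.5%/27.5% figure follows from weighting the two item types by their trial counts. The sketch below reproduces the arithmetic, assuming that the 120 biased and 40 unbiased trials of a block are split evenly across the two voices (60 and 20 per voice). The function name is hypothetical.

```python
from fractions import Fraction


def overall_pc(n_biased, n_unbiased,
               biased_pc=Fraction(4, 5), unbiased_pc=Fraction(1, 2)):
    """Proportion of the frequent trial type within one voice context."""
    frequent = n_biased * biased_pc + n_unbiased * unbiased_pc
    return frequent / (n_biased + n_unbiased)


# Per voice (assumed even split): 60 biased and 20 unbiased trials.
pc = overall_pc(60, 20)
print(float(pc), float(1 - pc))  # 0.725 0.275
```

With no unbiased items, as in Experiments 1 and 2, the same function returns the nominal 80% (`overall_pc(80, 0)`).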
We conducted a 2 (Congruency: congruent, incongruent) × 2 (Conflict Voice: MC, MI) × 2 (Frequency-biased: biased, unbiased) repeated measures ANOVA. The CSPC is expected to show independently of context bias. That is, we expect an interaction Congruency by Conflict Voice, which is not further modulated by the frequency bias. To avoid the prediction of a null effect, we will also report separate interaction contrasts of Congruency × Conflict Voice for biased and unbiased items, respectively.
Results and discussion
Data preprocessing
The data of the practice blocks were not included in any analyses. For the analysis of error rates, the first trial of each of the four experimental blocks (0.63%) and all number repetitions (11.88%) were excluded. Before RT analysis, errors (3.41%), post-error trials (2.99%), and all trials with RTs deviating more than ±3 SDs from the individual cell mean (1.25%) were excluded. This left a total of 87.50% of all trials for error analysis and 79.84% for RT analysis.
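The RT trimming step can be sketched as follows. This is a minimal illustration that assumes the sample standard deviation (n − 1 denominator) is computed once per participant and design cell. The source does not specify these details, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev


def trim_cell(rts, criterion=3.0):
    """Drop RTs deviating more than `criterion` SDs from the cell mean.

    `rts` holds the RTs (ms) of one participant in one design cell and
    must contain at least two values.
    """
    m = mean(rts)
    sd = stdev(rts)  # sample SD; an assumption, see lead-in
    return [rt for rt in rts if abs(rt - m) <= criterion * sd]


# Invented cell: 19 typical RTs and one extreme value.
cell = [500] * 19 + [2000]
trimmed = trim_cell(cell)
print(len(cell), len(trimmed))  # 20 19
```

Note that the criterion is applied in a single pass; the mean and SD are not recomputed after outliers are removed.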
Reaction times
The 2 (Congruency: congruent, incongruent) × 2 (Conflict Voice: MC, MI) × 2 (Frequency Bias: biased, unbiased) repeated measures ANOVA brought up a main effect of Congruency, F(1, 31) = 54.14, p ⩽ .001,
Mean RT (error rates) for Experiments 3 and 4.
RTs: reaction times; CSPC: context-specific proportion congruency; MC: mostly congruent; MI: mostly incongruent; CSPF: context-specific proportion fluency; MF: mostly fluent; MD: mostly disfluent.
significance at the .05 level.
Error rates
An analogous 2 (Congruency: congruent, incongruent) × 2 (Conflict Voice: MC, MI) × 2 (Frequency Bias: biased, unbiased) ANOVA on error rates revealed a main effect of Congruency, F(1, 31) = 35.80, p < .001,
Experiment 3 showed a reliable Simon effect for both item types and for both dependent variables. However, the CSPC was, just like in Experiments 1 and 2, only present in RTs. A closer look showed that it was significant only for biased items and was only descriptively observed (but not significant) for unbiased items. RTs thus suggest that associative learning may also have contributed to the effect (cf. Schmidt, 2019). Before further discussing the results, we test our findings analogously for the manipulation of Fluency.
Experiment 4
Method
Participants
Matching the other experiments reported here, a sample size of N = 32 was targeted. Thirty-three students of the University of Regensburg with normal or corrected hearing who were not part of the samples of the other experiments participated in the experiment. All participants signed informed consent and were debriefed and rewarded with partial course credit or €4 after the session. Data from one participant had to be excluded due to a mean error rate deviating more than 3 SDs from the sample mean (18.95% at a sample mean of MER = 4.14%, equaling 3.90 SD). The final sample was aged Mage = 22.13, SDage = 2.83, Rage = 18–31 years. A total of 27 of the participants were female and 30 of them were right-handed.
Materials and procedure
The stimuli presented were the same as in Experiment 2. The only difference was that the proportion of fluent to disfluent trials per voice context applied only to six of the eight number words. Analogously to Experiment 3, two number words (four and six, three and seven, two and eight, or one and nine; counterbalanced across participants) were presented equally often fluently and disfluently in both voice contexts. The procedure of trials and blocks was identical to the other experiments. Like in Experiment 3, one block consisted of 120 biased and 40 unbiased items (see Table 1).
RT and error rates were analysed in a 2 (Fluency: fluent, disfluent) × 2 (Fluency Voice: MF, MD) × 2 (Frequency Bias: biased, unbiased) repeated measures ANOVA. We expected a two-way interaction of Fluency by Fluency Voice independently of Frequency Bias. To avoid the prediction of a null effect, we will also report the interaction contrasts Fluency by Fluency Voice for biased and unbiased items separately.
Results and discussion
Data preprocessing
The data of the practice blocks were not included in any analyses. For the analysis of error rates, the first trial of each of the four experimental blocks (0.63%) and all number repetitions (11.96%) were excluded. Before RT analysis, errors (3.75%), post-error trials (3.29%), and all trials with RTs deviating more than ±3 SDs from the individual cell mean (1.04%) were excluded. Thus, 87.41% of all trials were left for error analysis and 79.33% for RT analysis.
Reaction times
The 2 (Fluency: fluent, disfluent) × 2 (Fluency Voice: MF, MD) × 2 (Frequency Bias: biased, unbiased) repeated measures ANOVA brought up a main effect of Fluency, F(1, 31) = 114.61, p ⩽ .001,
Error rates
The 2 (Fluency: fluent, disfluent) × 2 (Fluency Voice: MF, MD) × 2 (Frequency Bias: biased, unbiased) repeated measures ANOVA showed a significant main effect of Fluency, F(1, 31) = 15.29, p ⩽ .001,
The results show a stable fluency effect but no context-specific adaptation effects whatsoever. We can therefore neither conclude that participants identified the voice identity as context cue for the adjustment of cognitive control nor that any associative learning or binding processes have taken place. This finding contradicts evidence from the visual modality (Dreisbach, Reindl, & Fischer, 2018) and is not in line with results of Experiments 1 and 2 presented here.
General discussion
The experiments presented here were designed to further investigate the role of contextual demands on behavioural adjustments. More precisely, this study aimed to investigate whether the cognitive system adjusts to voice identity as an auditory context feature signalling a particular kind of demand, that is, conflict or (dis-) fluency of processing. To this end, we manipulated the contextual control demands by associating one voice (fe-/male) with mostly in/congruent (Experiment 1) or mostly dis/fluent trials (Experiment 2). To rule out alternative explanations for the effect, such as contingency learning (cf. Schmidt, 2016), we added frequency-unbiased items in a third (congruency manipulation) and fourth (fluency manipulation) experiment.
For Experiments 1 and 2, our results were as expected: We found a CSPC effect for the spoken number words: Irrespective of the exact nature of the challenge in a particular voice-context (response incongruent or hard to perceive), participants adapted to the particular features of the respective voice resulting in a smaller Simon/Fluency effect for voices associated with MI/MD trials compared with voices associated with MC/MF trials.
Experiment 3 by and large corroborates these findings. In line with previous research on visual stimulus material, we expected that context-unbiased items that appeared in the same task context would also show context-specific adjustment. Our results show only weak evidence in favour of this assumption. Even though the CSPC effect did not interact with Frequency Bias, a closer inspection of the data showed that the CSPC was significant for frequency biased, but only descriptively present for unbiased items. This suggests that the findings from Experiment 1 at least partly emerged due to associative learning.
In Experiment 4, in which we used the same Fluency manipulation as in Experiment 2, we no longer found a CSPC (i.e., CSPF) for either biased or unbiased items. This latter result is ambiguous as it may suggest that the effect found in Experiment 2 was a false positive. We, however, favour an alternative explanation: The lack of an effect in biased items of Experiment 4 may just as well speak against an item-specific explanation in Experiment 2. Note that the fluency manipulation used in Experiments 2 and 4 was different from the conflict manipulation used in Experiments 1 and 3 insofar as it harbours an additional component of the context feature. In the conflict Experiments 1 and 3, one of the voices (e.g., female) is mostly presented on the response congruent side and one (e.g., male) is mostly presented on the response incongruent side. Although the auditory experience of the voice identity features always stays the same (male or female), irrespective of the spatial occurrence, this may not be the case in the fluency manipulation. The voice identity feature (male or female) is distinct from the additional distracting feature (multi-speaker babble), which means that participants would have to learn an additional association between the voice and its background noise. The associative learning between voice identity, stimulus content (i.e., number magnitude), and corresponding response may, therefore, be facilitated in the conflict experiments as compared with the fluency experiments. The absence of a PC effect for unbiased items in Experiment 3 (conflict) speaks to this interpretation. The absence of a PC effect in both biased and unbiased items in Experiment 4 (fluency) clearly speaks against associative learning because associative learning should be unaffected by the inclusion of unbiased items. In particular, the results of Experiment 4 do not necessarily rule out context control associations in Experiment 2.
The question of whether context-specific control adjustments occur in the auditory domain with voice identities as contextual cues can therefore not be answered unambiguously by the presented research.
There is in fact much evidence in the literature that PC effects can be volatile and depend on the strength of contextual features (cf. Crump et al., 2017). It may be that the context manipulation was weakened by the inclusion of unbiased items and that therefore no contextual control adaptation emerged. Although many successful replications and variations of the CSPC exist, there are indications that it is subject to strict limitations. Hutcheon and Spieler (2017) tried to replicate the original findings of Crump and Milliken (2009), who first added unbiased items to their design to rule out any lower level processes in responding as the source of the effect. Their results from three experiments did not match the original results: They did not find any CSPC effects in either the frequency biased or the unbiased transfer items. They concluded that the context ought to provide meaningful and consistent information on how best to organise control and attention levels and that the addition of unbiased items may add too much ambiguity. In fact, this is an inherent problem of adding unbiased items, and one that is hard to circumvent: Any inclusion of unbiased items necessarily also changes the reliability of the context. That is, including unbiased items makes it less worthwhile to adjust to the context, unless unbiased items are presented so rarely that they do not substantially reduce the informative value of the context. This, however, would create problems at the other end, namely, in the analysis of the few unbiased items. In fact, the descriptively present but non-significant CSPC effect for unbiased items in Experiment 3, which is based on a much smaller subset of trials, supports this argument. In our Experiments 3 and 4, the inclusion of unbiased items certainly reduced the informational value of the context such that the adaptation to the now less reliable context was weakened (Experiment 3) or even absent (Experiment 4).
Besides the abovementioned trial-type frequency per context, other methodological factors may be important for the occurrence of CSPCs in laboratory designs. It is not uncommon for context-specific adjustments to show up in only one of two dependent variables. This is another hint at the fragility of the effect and at possible factors of influence that may change its size or hinder its development, but are not usually controlled or manipulated. Such factors may also lie in subtle differences in the composition of the samples or the instructions of experiments (cf. Cañadas et al., 2013), even if not manipulated explicitly. With strong differences in cognitive sets between possible underlying contexts, even additional processes such as attentional set-switching or affective context modulations (see Dreisbach, Fröber, et al., 2018) have been reported. In our paradigm, the mere switching between contexts (i.e., voices) presumably does not harbour unwanted confounds, as switching between voices is independent of response congruency effects (cf. Lawo & Koch, 2014). The possible methodological confound of rhythmic responding that may account for CSPCs in unbiased items (Schmidt, 2016) was prevented in our tasks by jittered ITIs. Many of these inconsistencies between methods probably have to be accepted as possible noise in these kinds of investigations. In the light of our and earlier results (cf. Crump et al., 2017; Hutcheon & Spieler, 2017), some methodological recommendations that can minimise a large part of this noise can be inferred. A different type of transfer manipulation, for example, can solve the issue of attenuated context cues. Using unique transfer items in separate blocks or after a certain time of context initiation might be a more robust method. A learning phase can boost the identification of different contexts.
Furthermore, salient differences in affective value between contexts, as well as task properties such as the possibility of rhythmic responding, should be avoided (see also Braem et al., 2019, for further recommendations).
The human organism presumably always strives to optimise behaviour while minimising resource consumption. When task-relevant and reliable context cues exist, it is supposedly beneficial for the organism to adapt behaviour in line with them. When no such overarching cues can be identified, more specific stimulus–response rules may guide behaviour in the most beneficial way (cf. Dreisbach, 2012). In this case, associative learning may underlie some proportion of the CSPC-like effects reported in the literature.
Conclusion
We cannot draw a clear conclusion in favour of context-driven control in auditory conflict and fluency interference tasks. Our results suggest that both associative learning and contextual control (provided that a relevant and meaningful context is identified) contribute to the observed adjustments to human voices. The research presented here can only be a first step in this direction. Given the omnipresence and everyday relevance of human voices in different contexts, further research is clearly needed. Investigating processing adjustments in the auditory domain in general, and in the context of human voices in particular, may be of great interest from a basic as well as an applied perspective.
Footnotes
Acknowledgements
We would like to thank Teresa Feilmeier, Sophia Geisenhofer, and Julia Hauke for data collection.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Research presented here was funded by Deutsche Forschungsgemeinschaft DFG, DR 392/6-3.
