Abstract
Studies to date have focused on the priming power of visual road signs, but not the priming potential of audio road scene instruction. Here, the relative priming power of visual, audio, and multisensory road scene instructions was assessed. In a lab-based study, participants responded to target road scene turns following visual, audio, or multisensory road turn primes that were congruent or incongruent with the targets in direction, or following control primes. All types of instruction (visual, audio, and multisensory) were successful in priming responses to a road scene. Responses to multisensory-primed targets (both audio and visual) were faster than responses to targets primed by audio or visual instruction alone. Incongruent audio primes did not affect performance negatively in the manner of incongruent visual or multisensory primes. Results suggest that audio instructions have the potential to prime drivers to respond quickly and safely to their road environment. Peak performance will be observed if audio and visual road instruction primes can be timed to co-occur.
Introduction
The use of in-car navigation systems has grown exponentially during the past few decades, and is expected to continue to grow (Broy, 2006). Traditionally, these systems have been used to provide navigational information to drivers. While these devices have been studied extensively in terms of their potential to distract drivers from the road scene (Dingus, McGehee, Hulse, Jahns, & Manakkal, 1995; Regan, Oxley, Godley, & Tingvall, 2001; Srinivasan & Jovanis, 1997), or benefit drivers by effectively providing route-guidance information (Barrow, 1991; Burnett, 2000; Burns, 1997; Fastenmeier, Haller, & Lerner, 1994), little work to date has looked at their potential to directly prime drivers to respond quickly and safely to the road environment. Considering the usefulness of other cues such as road signs in priming drivers to respond quickly and effectively to their road environment (Crundall & Underwood, 2001; Koyuncu & Amado, 2008), it is important to explore the potential of in-car devices to do the same.
The priming paradigm has become an important tool in measuring how a road cue or device affects driver behaviour. While driving studies had previously focused on a driver’s ability to recall or name a road sign (Johansson & Backlund, 1970; Milošević & Gajić, 1986), Fisher (1992) proposed that a more important measure of a road cue’s effectiveness is how it affects a driver’s response—that is, the extent to which it can successfully prime a driver such that subsequent responses to the road environment are faster and more accurate. Indeed, implicit measures of a road cue’s efficacy are considered to be more useful than measures examining explicit recall or description, as implicit measures are more closely tied to actual modifications in driver behaviour (e.g., slowing down; Summala & Hietamäki, 1984). On this basis, numerous lab-based studies have focused on the implicit processing of road sign information using a priming paradigm (e.g., Charlton, 2006; Crundall & Underwood, 2001; Koyuncu & Amado, 2008). Using this paradigm, a road cue is considered to be effective if its presence facilitates a reaction to a road scene in terms of speed and/or accuracy.
Drivers use several cues to respond appropriately to the road environment. Road signs are the most frequently used visual primes in preparing drivers to respond (Koyuncu & Amado, 2008), and they play a crucial role in road safety. Indeed, road signs indicating an upcoming intersection or curve in the road can decrease the number of crashes at these sites by as much as 30%-40% (Agent, Stamatiadis, & Jones, 1996; Creasy & Agent, 1985). In lab-based studies, both accuracy and response speed are improved by road sign priming. Crundall and Underwood (2001) presented participants with priming images of road signs (“turn left” or “turn right”) followed by images of road bends that were congruent or incongruent in direction. They report that for experienced drivers, congruent road sign priming significantly decreased response time to a subsequent road bend target, relative to incongruent or control priming (see Koyuncu & Amado, 2008 for similar results). Similarly, Charlton (2006) demonstrated that semantic road sign priming increased accuracy and improved response times to images of hazardous road scenes. These studies demonstrate the importance of the priming function of road signs in responding to a road environment.
It is becoming increasingly clear that the safest way to present in-car navigational information to drivers is in audio rather than visual form, as in-car glancing to perform secondary tasks such as looking at a GPS device increases drivers’ risk of crashing for both experienced (Klauer, Guo, Sudweeks, & Dingus, 2010; Liang, Lee, & Yekhshatyan, 2012) and novice drivers (Simons-Morton, Guo, Klauer, Ehsani, & Pradhan, 2014). The addition of auditory prompts can significantly improve driving performance compared to when information is presented visually on an in-car device (Liu, 2001; Walker, Alicandri, Sedney, & Roberts, 1991), and this is likely to be due to both a reduction in visual workload (Labaile, 1990; Walker et al., 1991; Xie, Zhu, Guo, & Zhang, 2013) and the reduction in glance-time away from the road scene (Jensen, Skov, & Thiruravichandran, 2010). In a comprehensive mixed methods study, Dalton, Agarwal, Fraenkel, Baichoo, and Masry (2013) performed the first large-scale investigation of how the auditory presentation of in-vehicle navigational information can affect driving performance. They report a memory advantage and a user preference for auditory over visual presentation of route information. However, that study focused largely on the navigational function, rather than the priming potential of audio presented in-vehicle guidance—an area which has been overlooked in the literature.
The redundant signals effect (Kinchla, 1974) describes a phenomenon whereby processing is speeded when information is presented in two different sensory channels compared to either channel alone. Whereas a race model (e.g., Raab, 1962) predicts this gain to be a result of activation of the faster of two responses, which are activated separately, Miller’s (1982) seminal paper demonstrated that multisensory input produces processing gains that cannot be explained by separate activation of the single senses involved. Instead, congruent input from different senses combines to produce a redundancy gain due to a response threshold being reached more quickly with input from two sources compared to one.
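The distinction between the race model and coactivation is usually tested with the race model inequality associated with Miller (1982): at every time point t, the cumulative probability of a response to the redundant stimulus should not exceed the sum of the cumulative probabilities for the two single-channel stimuli. The following is a minimal Python sketch of that check, offered for illustration only. The function names and the response times are hypothetical, and this analysis is not one reported in the present study.

```python
from bisect import bisect_right


def ecdf(sample, t):
    """Proportion of response times in `sample` that are <= t."""
    ordered = sorted(sample)
    return bisect_right(ordered, t) / len(ordered)


def race_model_violations(rt_audio, rt_visual, rt_multi, time_points):
    """Return the time points at which the multisensory CDF exceeds the
    race-model bound F_audio(t) + F_visual(t), capped at 1."""
    violations = []
    for t in time_points:
        bound = min(1.0, ecdf(rt_audio, t) + ecdf(rt_visual, t))
        if ecdf(rt_multi, t) > bound:
            violations.append(t)
    return violations


# Hypothetical response times in ms (illustrative only, not data from this study).
rt_audio = [520, 540, 560, 580, 600]
rt_visual = [500, 520, 540, 560, 580]
rt_multi = [430, 450, 470, 490, 510]

violations = race_model_violations(
    rt_audio, rt_visual, rt_multi, range(420, 620, 20)
)
print(violations)
```

Time points returned by the function are those at which the redundancy gain is larger than separate activation would allow, which is the pattern that a coactivation account predicts.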
Indeed, it has been proposed that presenting road environment cues in a multisensory manner (for example, in both visual and auditory form) should maximise their efficacy (Regan, 2004). Multisensory cues are generally better at capturing spatial attention during an attention-demanding task such as driving, compared to either visual or auditory cues alone (see Spence & Santangelo, 2009 for a review), and previous work has shown that the provision of multisensory auditory and vibrotactile cues leads to faster braking times than the presentation of auditory or vibrotactile cues alone (Ho, Reed, & Spence, 2007). Of particular interest to driving researchers are the processes contributing to making a decision in a road environment. The coactivation model outlined by Miller (1982) specifically suggests that the presentation of information in more than one modality should lead to speeded decision responses. As such, the multi-modal presentation of auditory and visual priming information should lead to improved driving performance in response to a road scene target compared to either auditory or visual priming cues alone. A further advantage of presenting road cue information aurally as well as visually is that the information is less likely to be missed than when it is carried by a solitary visual road sign, which may be obscured or not attended to.
Aims and rationale
No research to date has directly compared the effects of visual road sign priming and auditory priming on a road scene task. Considering that over 75% of drivers that use in-car navigational devices utilise the auditory function (Dalton et al., 2013), of interest is whether the addition of an audio prime can strengthen the priming effects traditionally observed for road signs. Similar to the method used by Crundall and Underwood (2001) and Koyuncu and Amado (2008), we used a lab-based priming paradigm to present participants with images of road bend signs followed by semantically related or unrelated images of target road bends. We modified this paradigm to include auditory primes to investigate the relative priming power of visual and audio road turn primes. Of interest was whether conventional in-car auditory instruction—“turn left” or “turn right”—primarily used to convey navigational information could be useful in priming drivers to respond rapidly to a road scene. Finally, we proposed that the concurrent presentation of multisensory visual and auditory road cue primes would produce additional benefit over the presentation of either type of prime alone.
In a naturalistic setting, a road turn sign will typically be viewed for 700 ms to 1 s (Mori & Abdel-Halim, 1981). We designed our study to mirror this real-world experience closely by presenting primes for 580 ms to 882 ms. Auditory in-vehicle navigational instructions are often presented 30 s in advance of a road event, with a reminder prompt optimally presented at 4 s-7 s in advance of a road turn (Green & George, 1995; Wu, Huang, & Wu, 2009). We are interested in the potential additional priming function of these instructions if presented at closer temporal proximity to the required road task. We focused here on semantic priming, whereby a road sign or auditory prime is followed by a target road turn image, as this kind of priming is naturally occurring in the driving experience.
Methods
Participants
A total of 32 participants (19 female) with a mean age of 29.3 years (standard deviation, SD, = 11.7) volunteered to take part in the study. Participants were required to have a full, clean driving licence with at least 2 years’ driving experience in the United Kingdom (average length of experience = 8.6 years, SD = 8.8). This research complied with the American Psychological Association Code of Ethics and received ethical approval from the Psychology Departmental Research Ethics Panel at Anglia Ruskin University. Informed consent was obtained from each participant. Participants were paid £7 for taking part in the experiment.
Stimuli
Prime stimuli for the visual prime condition comprised images of standard UK road signs indicating a left or right bend in the road, taken from the publicly available Highway Code: traffic signs publication (Department for Transport, 2007). The control stimulus for this condition was created by replacing the left or right bend arrow in the road sign with “XXX” using Adobe Photoshop. These stimuli were scaled to 600 × 524 pixels. Prime stimuli for the audio prime condition comprised a computer generated voice saying “turn left,” “turn right,” or “hello” (control condition). These stimuli were created using free online text-to-speech software and set to speak with a female UK voice, and were converted to mp3 files using Audacity software. Prime stimuli for the multisensory prime condition comprised pairings of visual and audio stimuli. These pairings were always congruent (e.g., the left bend road sign was always accompanied by the audio instruction “turn left”).
Target stimuli comprised eight images of local Cambridge road scenes (four rural, four urban) taken in similar lighting conditions using a Nikon D60 camera. The road scenes had a distinct road bend in either the left or right direction and were clear of people and cars. Each road scene was saved in its original form and a mirror reversed form using Adobe Photoshop to create 16 road scenes in total (eight left and eight right bends). These images were converted to greyscale and any distractors (such as road signs visible in the scene) were removed. The images were then cropped such that the centre of each road bend was central to the image, and were saved at 1100 × 884 pixels. Images were viewed on a 17-inch screen of a Dell PC. Images subtended a viewing angle of 11.66 by 12.96 degrees (primes) or 18.98 by 23.41 degrees (targets) when viewed from a distance of approximately 70 cm.
Procedure
Participants ran nine practice trials followed by six blocks of test trials. A trial comprised the presentation of a prime (visual, audio, or multisensory), followed by a fixation cross for 300 ms (stimulus onset asynchrony, SOA, = 1,182 ms), followed by the presentation of a target stimulus. The visual primes and the visual element of multisensory primes were always presented for 882 ms, whereas the audio primes and the audio element of multisensory primes lasted 882 ms (“turn left” and “turn right”) or 580 ms (“hello”). In choosing our prime durations, we were guided by the literature, which shows that road signs are typically viewed for a comparable length of time (700 ms to 1,000 ms; for example, Mori & Abdel-Halim, 1981). Because it is difficult to ensure that audio primes using different words have identical durations (in this instance, the congruent and incongruent audio primes had a duration of 882 ms, and the control audio prime had a duration of 580 ms), SOAs were held strictly at 1,182 ms for all trial types. The target stimulus was an image of a road scene, presented until the participant responded. Participants were required to press a button on the keyboard (“z” or “m”) to indicate whether the road scene depicted a left turn or a right turn, respectively. Each trial was followed by an inter-trial interval (ITI) varying randomly between 700 ms and 1,200 ms. See Figure 1 for an illustration of a single trial. Participants were instructed to respond to the road scene as quickly and as accurately as possible. Trials were balanced such that each prime medium type (visual, audio, and multisensory) was congruent to the target road scene in direction, incongruent to the road scene in direction, or a control an equal number of times. Trials were presented in a randomised order within each block.
Six testing blocks comprised 144 trials each, resulting in 864 trials in total (3 prime types × 3 prime-target congruencies × 16 road scenes × 6 repetitions each).
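The trial counts and timing described above can be checked with a short Python sketch. It is illustrative only and is not the software used to run the experiment. The scene labels, the fixed random seed, and the assumption that every block contains each prime type × congruency × road scene combination exactly once (which gives 144 trials per block) are ours.

```python
import itertools
import random

MEDIA = ["visual", "audio", "multisensory"]
CONGRUENCIES = ["congruent", "incongruent", "control"]
# Hypothetical labels: eight photographs, each in a left-bend and right-bend form.
ROAD_SCENES = [f"scene_{i:02d}_{d}" for i in range(1, 9) for d in ("left", "right")]
N_BLOCKS = 6

PRIME_MS = 882      # visual prime duration reported in the Procedure
FIXATION_MS = 300   # fixation cross duration
SOA_MS = PRIME_MS + FIXATION_MS  # prime onset to target onset


def build_block(rng):
    """One block: every medium x congruency x scene combination, shuffled."""
    trials = list(itertools.product(MEDIA, CONGRUENCIES, ROAD_SCENES))
    rng.shuffle(trials)
    return trials


rng = random.Random(0)  # fixed seed so the sketch is reproducible
blocks = [build_block(rng) for _ in range(N_BLOCKS)]
total_trials = sum(len(block) for block in blocks)

print(len(blocks[0]), total_trials, SOA_MS)
```

Under these assumptions each block holds 3 × 3 × 16 = 144 trials, six blocks give 864 trials, and the prime plus fixation durations sum to the reported SOA of 1,182 ms.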

Sequence of trial events. Primes were either visual (road sign), audio (computerised voice), or both (as illustrated in this example).
Results
Response time analyses
Response times for incorrect responses were excluded from analysis, and response times more than 2 SD from each participant’s mean were excluded as outliers (Ratcliff, 1993; 3.96% of correct responses). A 3 × 3 repeated measures analysis of variance (ANOVA) with factors of Medium (visual, audio, and multisensory) and Prime-Target Congruency (congruent, incongruent, and control) revealed a significant interaction between Prime-Target Congruency and Medium, F(4, 124) = 3.27, mean square error (MSE) = 136.39, p < .05, ηp2 = .095. Follow-up tests were carried out separately for visual, audio, and multisensory prime trials.
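For readers wishing to reproduce the exclusion rule, the following is a minimal Python sketch of a 2 SD trim applied to one participant’s correct response times. The function name and the example values are hypothetical. The sketch also assumes the sample standard deviation, computed across all of a participant’s correct trials rather than per condition, which the description above does not specify.

```python
from statistics import mean, stdev


def trim_outliers(correct_rts, n_sd=2.0):
    """Keep response times within n_sd sample SDs of the participant's mean."""
    m = mean(correct_rts)
    sd = stdev(correct_rts)  # sample SD (n - 1 denominator); an assumption
    return [rt for rt in correct_rts if abs(rt - m) <= n_sd * sd]


# Hypothetical correct-trial response times in ms for a single participant.
example_rts = [480, 490, 495, 500, 500, 505, 510, 515, 520, 485, 2000]
kept = trim_outliers(example_rts)
print(len(example_rts) - len(kept), "response(s) excluded")
```

Because the mean and SD are computed before trimming, a single extreme value inflates both, so a one-pass rule of this kind is conservative about what it removes.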
For trials containing both visual and audio priming information, participants responded significantly faster where the prime-target information was congruent compared to when this information was either incongruent, t(31) = 2.73, p < .05, or control, t(31) = 2.69, p < .05. There was no difference in response time to incongruent and control trials, t(31) = 1.17, n.s. All results hold after Bonferroni correction for three comparisons.
This pattern was repeated for trials containing only visual primes: responses to congruent trials were faster than responses to either incongruent, t(31) = 1.75, p < .05 (p = .045; approaches significance following Bonferroni alpha adjustment to .017 for three comparisons), or control trials, t(31) = 2.77, p < .05 (significant following Bonferroni correction for three comparisons), with no difference in reaction time between incongruent and control trials, t(31) = .06, n.s.
A different pattern was observed for audio prime trials. Here, responses were once more faster in response to congruent trials compared to either incongruent, t(31) = 3.48, p < .05, or control trials, t(31) = 7.03, p < .05. However, responses to incongruent audio prime trials were significantly faster than responses to control audio prime trials, t(31) = 2.42, p < .05. All results hold after Bonferroni correction for three comparisons. See Figure 2 for an illustration of these interaction effects.

Mean response times to correctly respond to a road turn when the prime stimulus was congruent (diagonal lines) or incongruent (checker) to the target road turn direction or a control (dots). Error bars represent the standard error of the mean.
Overall, a significant main effect of Prime-Target Congruency was observed, F(2, 62) = 11.71, MSE = 459.10, p < .001, ηp2 = .274, with follow-up tests showing a straightforward priming effect such that participants responded faster to congruent trials compared to either incongruent trials, t(31) = 3.06, p < .005, or control trials, t(31) = 5.26, p < .005. There was no difference in reaction time for incongruent and control trials, t(31) = 0.60, n.s. All results hold after Bonferroni correction for three comparisons. See Figure 3 for an illustration of this effect.

Mean response times to correctly respond to a road turn when the prime stimulus was multisensory (solid grey), visual (vertical lines), or audio (horizontal lines). Error bars represent the standard error of the mean.
A significant main effect of Medium was also observed, F(2, 62) = 38.72, MSE = 243.30, p < .001, ηp2 = .555. Follow-up tests revealed that participants responded significantly faster on trials where multisensory primes were presented, compared to trials where either visual primes, t(31) = 6.78, p < .001, or audio primes, t(31) = 7.71, p < .001, were presented alone. When presented on their own, participants responded faster to trials containing visual primes compared to audio trials, t(31) = 3.62, p < .001. All results hold after Bonferroni correction for three comparisons. See Figure 4 for an illustration of this effect.

Mean response times to correctly respond to a road turn following a visual prime, multisensory prime, or an audio prime when the prime stimulus was congruent (diagonal lines) or incongruent (checker) to the target road turn direction, or a control (dots). Error bars represent the standard error of the mean.
Accuracy analyses
As expected with a simple forced-choice task, accuracy performance was close to ceiling (average accuracy = 98.01%, SD = 0.58%). There was no main effect of Medium, F(2, 62) = 2.70, MSE = 2.31, n.s., nor was there an interaction between Medium and Prime-Target Congruency, F(4, 124) = 1.48, MSE = 2.47, n.s. There was a straightforward effect of Prime-Target Congruency, F(2, 62) = 7.99, MSE = 2.98, p < .01, ηp2 = .205, with planned comparisons showing participants scoring significantly less accurately on incongruent trials compared to both congruent trials, t(31) = 2.52, p < .05, and control trials, t(31) = 3.56, p < .05. There was no difference in accuracy between congruent and control trials, t(31) = 1.17, n.s.
Discussion
In a road scene task, we report strong priming effects for visual road sign primes, audio instruction primes and a combination of both of these types of prime. Responses to a road scene were fastest when priming information was presented concurrently using audio and visual primes, compared to using either visual or audio primes alone. Finally, we report that the presentation of incongruent audio information may not be detrimental to road scene performance in the same way that the presentation of incongruent visual information is.
Audio information can be useful in providing navigational information to drivers in a safe manner (Dalton et al., 2013; Klauer et al., 2010; Liang et al., 2012; Simons-Morton et al., 2014). An aim of this study was to investigate whether audio information can also be used to serve a priming function, improving reaction speed to a road scene. We report here that in-car audio priming (e.g., “turn left”) does indeed prompt participants to react more quickly to a subsequent road scene. In demonstrating the priming function of auditory road instruction for the first time, we highlight an additional potential benefit of audio in-car information systems, beyond the provision of navigational information.
A main finding from this experiment is that drivers respond most quickly to a road scene when visual and auditory prompts are presented concurrently. This was expected, considering the general benefits of presenting information in a multisensory manner (e.g., Spence & Santangelo, 2009). Novel to this study is the suggestion that, to be most effective, auditory road environment priming prompts should be timed to coincide with visual road sign prompts where possible. This result can be explained in terms of a redundancy gain, whereby the concurrent activation of auditory and visual senses with congruent input leads to a response threshold being reached more quickly compared to when the information is presented in either sensory modality alone (e.g., Miller, 1982).
Because visual road signs elicited faster responses to road scenes than did audio instruction alone, we should be cautious to avoid suggesting that audio primes should supplant visual driving primes. Rather, we highlight the supplementary role that audio information can play in road scene priming—bolstering the effects of visual road sign priming when audio primes are timed to coincide.
In-car audio instruction typically focuses on providing navigational information in a timely manner. In an on-road driving experiment, drivers selected 8 s-9 s before a turn is required as the ideal presentation time of audio navigation information (e.g., 148 metres from turn at 40 mph; Green & George, 1995). In our experiment, the prime was presented 1.2 s prior to the required road turn. Decreasing the time between a prime and a target is likely to improve its priming efficacy (e.g., Huber & O’Reilly, 2003). However, we should be cautious here and note that drivers need enough time to respond safely to any upcoming road turn. As such, we suggest that two audio prompts be delivered when approaching a road turn: one several hundred metres before the turn (serving a traditional navigational function) and another approximately 1 s-2 s before the turn response is required (serving the significant priming function that our results suggest). Indeed, this should have a cumulative priming effect, similar to that observed in road sign studies using repetitive priming (e.g., Crundall & Underwood, 2001; Koyuncu & Amado, 2008). Novel to the literature, our results suggest that this will be especially effective when accompanied by a visual road sign instruction; priming timing effects should also be taken into account in road sign placement.
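The relationship between distance, speed, and prompt lead time in this argument is simple arithmetic, sketched below in Python for illustration. The function names are hypothetical, and only the 148 m, 40 mph, 1.2 s, and 1 s-2 s figures come from the text above.

```python
METRES_PER_MILE = 1609.344


def mph_to_mps(speed_mph):
    """Convert miles per hour to metres per second."""
    return speed_mph * METRES_PER_MILE / 3600


def seconds_to_turn(distance_m, speed_mph):
    """Time remaining before a turn at a constant speed."""
    return distance_m / mph_to_mps(speed_mph)


def distance_for_lead_time(lead_s, speed_mph):
    """Distance before a turn at which a prompt with this lead time would sound."""
    return lead_s * mph_to_mps(speed_mph)


navigation_lead_s = seconds_to_turn(148, 40)        # the Green & George (1995) example
priming_distance_m = distance_for_lead_time(1.2, 40)  # the prime-target interval used here

print(round(navigation_lead_s, 2), round(priming_distance_m, 1))
```

At a constant 40 mph, 148 m corresponds to a little over 8 s, consistent with the 8 s-9 s preference cited above, whereas a priming prompt delivered 1 s-2 s before the turn would sound roughly 18 m to 36 m from it.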
In this experiment, we note differential effects for auditory primes compared to visual primes or a multisensory combination of both. While all prime medium types elicit strong priming effects, with congruent primes speeding responses to subsequent road scenes, incongruent primes were less detrimental to performance when information was presented aurally. Specifically, visual and audio–visual combination trials elicited classic priming effects, with fastest responses to congruent trials, followed by control and incongruent trials. For audio trials, a different pattern emerges. Again we see the fastest responses for congruent trials, but incongruent trials elicit faster responses than control trials. This suggests that even incorrect audio instruction can prime drivers to respond faster than a neutral phrase. This was not the case for trials containing visual information, and we propose that this may be due to the stronger visual ties between a road bend sign and a road bend, compared to the weaker association between the words “turn left” and a left road turn. In addition, up to one-third of people may experience mild difficulty when discriminating directional meaning from the words “left” and “right” (McMonnies, 1990), making audio directional instructions a weaker prompt.
Alternatively, the instruction to “turn” may provide a priming function in and of itself. That is, the audio command to “turn left” may speed responses because it contains an instruction to act (rather than serving a more descriptive “left turn” function). This might explain why even incongruent audio information (containing the commands “turn left” or “turn right”) led to faster response times than control audio information (“hello”). Interestingly, this does suggest that “getting it wrong” will be less detrimental to driving performance for audio instructions compared to visual instructions.
Laboratory driving studies have been found to yield comparable results to real-world driving studies (e.g., Lajunen, Hakkarainen, & Summala, 1996), and it is considered useful to use lab-based work to understand how drivers extract information from road sign cues (Castro & Horberry, 2004). Indeed, specific perceptual factors can be controlled and manipulated to a much greater extent in a lab setting. As such, much research on the efficacy of road signs is laboratory based, where we can tightly control variables such as duration of presentation, size, colour, and contrast of the sign (Wogalter & Laughery, 1996). In the case of this study, a lab-based experiment facilitated a clean comparison of the relative priming efficacy of different types of driving instruction. However, it is important to acknowledge that in a real-world driving environment, information presented using an in-car device—whether serving a navigational or priming purpose—will be competing for attentional demand with other visual and auditory stimuli. In particular, driving requires that a high level of visual attention be paid to the road scene, relative to auditory information. We should, therefore, be cautious in emphasising the superiority of visual priming over audio priming for road scenes on the basis of laboratory findings.
In-car navigational systems traditionally only provide information when there is a navigational choice to be made. That is, a driver is only presented with visual or audio “turn left” information when faced with a junction or motorway exit. This study investigated the advantages of audio, visual, and multisensory priming where a different type of road response is required—namely responding quickly and accurately to a road bend. Here, we were investigating whether the priming function served by road signs indicating a bend in the road ahead (e.g., Crundall & Underwood, 2001) could be replicated using in-car systems. Our results suggest that this should be the case. The response time advantages this type of priming confers would likely be most useful in precision driving situations. For example, this type of priming should be particularly useful in rally car driving, where extremely fast responses to the road environment are required. The priming function served by presenting this type of priming information—which is functionally different from navigational information currently provided by in-car systems—may confer small advantages in normal driving scenarios as well.
Responses in this experimental study were limited to button presses. In part, this design was chosen to control for extraneous perceptual variables to make a clean comparison between visual, audio and multisensory road instruction primes, as mentioned above. Another reason for this design choice was to closely replicate the conditions of Crundall and Underwood’s (2001) seminal study of the priming function of road signs. In doing so, we could test whether their findings of a priming effect of road signs might extend to in-car audio cues as well, a hypothesis which has not before been tested. Based on our findings, we recommend this as a promising line of investigation, and propose future simulator and real-world research to further investigate the priming potential of these audio cues.
Conclusion
With the increased availability of in-car information systems, the question of how these devices can best be used to prime drivers to respond, rather than simply provide navigational information, is becoming more pertinent. Our experiment demonstrates the efficacy of audio priming in responding to a road scene and suggests that the concurrent presentation of both visual and audio primes should improve reaction speed in a driving setting. Overall, we suggest a new research direction in the study of in-car information systems focusing on the priming potential of these devices.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
