Abstract
Objective:
We investigated whether providing an explanation for a takeover request in automated driving influences trust in automation and acceptance.
Background:
Takeover requests will be recurring events in conditionally automated driving that could undermine trust as well as acceptance and, therefore, the successful introduction of automated vehicles.
Method:
Forty participants were equally assigned to either an experimental group provided with an explanation of the reason for a takeover request or a control group without explanations. In a simulator drive, both groups experienced three takeover scenarios that varied in the obviousness of their causation. Participants rated their acceptance before and after the drive and rated their trust before and after each takeover situation.
Results:
All participants rated acceptance on the same high level before and after the drive, independent of the condition. The control group’s trust ratings remained unchanged by takeover requests in all situations, but the experimental group showed decreased trust after experiencing a takeover caused by roadwork. Participants provided with an explanation felt more strongly that they had understood the system and the reasons for the takeovers.
Conclusion:
A takeover request did not lower trust or acceptance. Providing an explanation for a takeover request had no impact on trust or acceptance but increased the perceived understanding of the system.
Application:
The results provide insights into users’ perception of automated vehicles and takeover situations as well as a foundation for future interface design for automated vehicles.
Introduction
Advances in passive and active safety technologies have led to a remarkable increase in traffic efficiency and safety (Kühn & Hannawald, 2016). Automated vehicles are currently being introduced to the consumer market, with the intention to provide an even higher standard (Watzenig & Horn, 2017). However, societal goals do not necessarily coincide with a driver’s personal goals (Adell, Várhelyi, & Nilsson, 2014a). Consequently, previous research that accompanied the introduction of advanced driver assistance systems (ADAS) has shown that to guarantee a successful introduction of a new technology, it is necessary to evaluate its deployment not only from a technological perspective but also from a driver-centered perspective (Bengler et al., 2014; Regan, Horberry, & Stevens, 2014). Whereas excellent system performance may be sufficient from a technical point of view, a system’s functionality must be known, understood, believed in, and valued by the driver in order for it to be accepted and used (Adell et al., 2014a; van der Laan, Heino, & de Waard, 1997). An unsystematic introduction without a driver-centric approach may give rise to issues such as information overload, overreliance, or negative behavioral adaptation to the technology (Broughton & Baughan, 2002; Mahr & Müller, 2011; Parasuraman & Riley, 1997). These issues can lead to low acceptance or even disuse of the new system after its introduction despite all the possible benefits (Lee & Seppelt, 2012).
Acceptance represents a multidimensional attitude that results from the fulfillment of the user’s individual needs and requirements. It consists of an affective as well as a rational-cognitive (e.g., perceived usefulness) component and is an antecedent of the intention to buy and to use a system (Adell et al., 2014a; Schade & Baum, 2007; van der Laan et al., 1997). We define acceptance as an attitude and follow Adell’s (2009) definition of acceptance as “the degree to which an individual intends to use a system and, when available, to incorporate the system in his/her driving” (p. 31). Acceptance is closely related to actual usage of a system because, as described in the theory of planned behavior (Ajzen, 1991), attitudes influence the intention to use a system and, thereby, actual behavior. Based on this theory, the technology acceptance model (Venkatesh, Morris, Davis, & Davis, 2003) has successfully explained the adoption of driver assistance systems or automated vehicles in several studies (Choi & Ji, 2015; Ghazizadeh, Peng, Lee, & Boyle, 2012; Meschtscherjakov, Wilfinger, Scherndl, & Tscheligi, 2009).
The introduction of driving automation will generate the claimed benefits only if the technology is accepted by the drivers and used appropriately (Najm, Stearns, Howarth, Koopmann, & Hitz, 2006). In contrast to manual driving, in conditionally automated driving (Level 3 in SAE International, 2016), the driver is removed from the driving task and a driving automation operates the vehicle. The driver merely acts as a fallback level and has to take over vehicle control at system limits. This concept of vehicle control represents a novelty for the majority of the driving population, which is why acceptance is not guaranteed and has to be investigated (Payre, Cestac, & Delhomme, 2014).
Trust as a Necessary Precondition of Acceptance
Given the close relationship between trust in automation and reliance on it (Bailey & Scerbo, 2007; Körber, Baseler, & Bengler, 2018), it seems reasonable to include trust in an acceptance framework. Indeed, previous research has shown that trust is a key determinant for the adoption of new technologies (Gefen, Karahanna, & Straub, 2003), the adoption of automation (Lee & Moray, 1992, 1994; Parasuraman & Riley, 1997), and the intention to use autonomous vehicles (Choi & Ji, 2015). The incremental value of investigating trust in studies on acceptance has been successfully shown by several studies, such as on an onboard monitoring system (Ghazizadeh, Peng et al., 2012), on ADAS (Trübswetter & Bengler, 2013), and on the reliance on and intention to use automated vehicles (Choi & Ji, 2015). Consequently, trust in automation as a determinant of acceptance of automation has been included in Arndt’s (2011) model of acceptance of ADAS and in the automation acceptance model (AAM) of Ghazizadeh, Lee, and Boyle (2012). In the AAM, trust partially mediates the effect of the operator’s beliefs and external variables on perceived usefulness and perceived ease of use but also has a direct effect on the behavioral intention to use an automation. Hence, trust in automation is a necessary condition that has to be fulfilled before acceptance may arise. Put simply, “operators tend to use automation that they trust while rejecting automation that they do not” (Pop, Shrewsbury, & Durso, 2015, p. 545). Therefore, it is necessary to include an assessment of trust in automation in a study on acceptance of automation.
Increasing Trust and Acceptance by Providing Explanations
Operator and automation are not isolated entities but act as a joint system, that is, as a team (Bengler, Zimmermann, Bortot, Kienle, & Damböck, 2012). Therefore, a driving automation cannot be considered in isolation from its users and must be designed following a human-centered approach to perform in conjunction with the human interacting with it (Billings, 1997; Christofferson & Woods, 2002; Sheridan & Parasuraman, 2005). In comparison with ADAS, a driving automation represents a more sophisticated automated system and an increase in autonomy and authority (Parasuraman, Sheridan, & Wickens, 2000). Although a status icon alone may be sufficient for a less complex function, such as a lane departure warning system, it may no longer be sufficient to support effective coordination with more complex machine agents, like a driving automation, that require more coordination (Norman, 1990; Sarter, 2008). Coordination needs an adequate model of the automation’s intentions and actions. In order to design automated systems as “cooperative partners rather than as mysterious and obstinate black boxes” (Christofferson & Woods, 2002, p. 4), they should act neither capriciously nor unobservably (Klein, Woods, Bradshaw, Hoffman, & Feltovich, 2004; Lee & Seppelt, 2009).
However, feedback alone is not enough; the interactions have to be as comprehensible for the driver as possible to create a common ground and, thereby, to ensure the construction of a correct mental model (Clark & Brennan, 1991). Drivers of automated vehicles will not be experts but laypersons who do not possess complete in-depth knowledge of the automation and must first build a mental model of its functioning (Walker, Stanton, & Salmon, 2016). A user generally builds his or her mental model based on the information provided by the system or interactions with it (Naujoks & Totzke, 2014). Hence, to ensure trust in driving automation, it is crucial to provide the driver with obvious and comprehensible information on its intentions, state, capacity, and upcoming actions to help him or her understand the automation and to make it predictable. Otherwise, the increase in autonomy and authority creates an opaque black box, where users cannot comprehend or retrace the actions (Dzindolet, Peterson, Pomranky, Pierce, & Beck, 2003; Verberne, Ham, & Midden, 2012).
Automation failures result in a drop of trust in the automated system (Lee & See, 2004); however, as Lewandowsky, Mundy, and Tan (2000) concluded, this drop represents more than a simple perception of whether an automation failure occurred, because the failure’s impact depends on its predictability rather than on its magnitude. A drop in trust in ADAS follows only if problems were omitted in a description of the system given beforehand (Beggiato & Krems, 2013) or if the failures were inconsistent with the perceived design of the system or occurred unpredictably (Lees & Lee, 2007). The attitude toward an automated system is, therefore, not purely based on performance (Lewandowsky et al., 2000). Even if the system exhibits high performance, a discrepancy between the operator’s expectations and the system’s behavior, that is, a large gulf of evaluation (Norman, 2013), can erode trust (Lee & See, 2004). If operators had prior knowledge of the magnitude of the failure (Riley, 1996) or if the failure was predictable or if its cause was comprehensible (Dzindolet et al., 2003), a decrease in trust did not occur. Accordingly, Gold, Körber, Hohenberger, Lechner, and Bengler (2015), as well as Hergeth, Lorenz, Krems, and Toenert (2015), observed a slight increase in trust after the experience of a takeover request (TOR) because the automation worked as described beforehand.
Dimensions such as predictability, understanding, or transparency have been proposed as a basis for trust in automation (Hoff & Bashir, 2015; Lee & See, 2004), which has been empirically shown in several studies (Choi & Ji, 2015; Muir & Moray, 1996; Seong & Bisantz, 2008). For example, Beller, Heesen, and Vollrath (2013) presented the uncertainty of an automation in an interface, which led to better knowledge of fallibility and, in consequence, to higher trust ratings and increased acceptance. Users rated an adaptive cruise control system that took over the driving task as more trustworthy and acceptable when it provided information on this action (Verberne et al., 2012). Forster, Naujoks, and Neukum (2017) found that the provision of auditory explanations of the automation’s actions led to higher reported trust.
Besides these aforementioned cognitive aspects, Adell, Várhelyi, and Nilsson (2014b) suggested investigating the emotional reactions of the driver, such as irritation or stress, in research on user acceptance. Beaudry and Pinsonneault (2010) already showed that anxiety is negatively related to the use of information technology. Individuals tend to search for or create explanations for unpleasant events afterward if no immediate reason can be deduced from the environment or prior knowledge, referred to as retrospective control (Thompson, 1981). Because unexpected TORs are rather stressful situations (Maule & Hockey, 2012), providing an explanation after the TOR might alleviate the negative affective reaction and promote a feeling of control. Accordingly, Koo et al. (2015) reported that providing information yielding reasons for the actions of an auto-brake function created the least anxiety and highest trust and was preferred by the drivers. Hence, avoiding negative emotions is essential in guaranteeing user acceptance.
In this study, we explicitly focus on takeover situations. We investigate if providing an explanation for a TOR increases system transparency and understanding and, in doing so, also increases trust in automation as well as acceptance of the automation. We expect that an explanation should avoid a decrease in trust and acceptance when a takeover situation occurs because it guarantees the construction of an appropriate mental model by helping to bridge the gulf of evaluation (Norman, 2013), enabling a driver to learn when a takeover situation is to be expected and how to react appropriately (Larsson, Kircher, & Andersson Hultgren, 2014). The created predictability and comprehension of the situation should mitigate the negative impact of a TOR on trust (Riley, 1996). An explanation also helps to avoid automation surprises (Sarter, Woods, & Billings, 1997) and negative emotional reactions caused by unexpected situations, which are known to reduce acceptance.
Depending on the situation, providing information can, however, also be counterproductive. Whereas an explanation beforehand is often not possible due to technological limits (e.g., sensor range; Gold & Bengler, 2014), a presentation simultaneous with the TOR might overload information-processing capacity and may result in a delayed reaction (Walch, Lange, Baumann, & Weber, 2015; Wickens, 2002). Besides a possible objective detrimental effect, subjective ratings of real-time feedback appear to be more negative as well. Koo et al. (2015) reported that the participants felt subjectively overstrained if too much information was presented during the automatic brake maneuver. Similarly, Roberts, Ghazizadeh, and Lee (2012) compared the acceptance of real-time with postdrive driving performance feedback. Drivers rated real-time feedback as more obtrusive, less useful, and less easy to use. To provide the explanation without a loss in performance and appraisal but still linked to the situation at hand, we suggest presenting the explanation directly after regaining vehicle control and stabilizing the vehicle, that is, after the situation is solved and when workload is at a sufficiently low level. To increase the generalizability of the results, we investigate the provision of explanations in situations with varying obviousness.
Prestudy
We conducted an online prestudy to evaluate if the chosen takeover situations were comprehensible and whether they differ in their obviousness of the reason for the takeover. In this survey, a total of 36 participants, 20 (55%) male, 16 (45%) female, between the ages of 18 and 51 (M = 25.60, SD = 6.30), watched videos of three different takeover scenarios (duration between 14 and 29 s, filmed in ego perspective). The three scenarios, which we expected to vary in their obviousness, were (a) GPS data missing (“GPS”; low obviousness), (b) missing lane markings (“missing lines”; medium obviousness), and (c) roadwork (high obviousness). The videos were presented in a resolution of 680 × 400 pixels. The TOR signal was a sharp sinusoidal tone (3000 Hz) and a blinking hands-on icon and was presented 9 s prior to a theoretical takeover. After every video, participants answered the following three questions on a 5-point rating scale from not at all (1) to very much (5):
“I think this TOR was a system failure.”
“It is obvious to me, why the TOR was triggered.”
“I would have wished for an explanation, why this TOR was triggered.”
The results, illustrated in Figure 1, show that the scenarios tend to differ for all three questions. In addition, the participants could elaborate as to why they thought the TOR was triggered. No participant could name the correct reason for the system limit for the GPS scenario, 35% answered correctly for the missing-lines scenario, and 78% could name the correct reason in the roadwork scenario. The results of this prestudy are described in further detail in Prasch and Tretter (2016).

Figure 1. Reported answers on the videos by question and takeover request.
Main Study
Experimental Design and Scenarios
In the main study, we used a 2 × 3 mixed design. The factor Explanation (between subjects) consisted of a control and an experimental group (“explanation”). We assigned the participants equally and randomly to both groups. The explanation group was provided with an explanation of the reason for the TOR after each takeover situation. This explanation was absent in the control group. The explanations conveyed the external reasons for the TOR as well as the internal implications for the system (Koo et al., 2015; Lombrozo, 2006). Every explanation had the same structure and wording, with the only difference being the respective cause and effect: “The takeover request was triggered because of [cause]. Due to [effect], driving in highly automated mode can temporarily not be continued.” The explanations were recorded by a female voice actor in a natural manner and friendly tone as recommended by Broadbent, Stafford, and MacDonald (2009). The explanations were presented on the mock-up speaker system at 68 dB 14 s after the presentation of the TOR. At the same time as the audio, a flashing icon was displayed in the head-up display (HUD) indicating the presence of an explanation. The participants of both groups carried out a non-driving-related task (NDRT), the Surrogate Reference Task (ISO 14198, 2012), while driving in conditionally automated mode (Level 3; SAE International, 2016).
The factor Scenario (within subjects) represented three takeover scenarios that each participant experienced in the course of the experimental drive: (a) GPS, (b) missing lines, and (c) roadwork. The scenarios were chosen to correspond to realistic takeover situations in automated driving (Aeberhard et al., 2015) and varied in the obviousness of the reason for the takeover, as tested in the prestudy. The GPS scenario represented a TOR caused by missing GPS data (Figure 2). Conditionally automated driving requires highly precise map data that are not yet available for every section of highway (Aeberhard et al., 2015). If these data are missing for the current section of the road, a TOR is emitted. In this scenario, no visible cue for the reason for the takeover was present. The missing-lines scenario represented a highway section where the right-lane markings were missing (Figure 3). Without lane markings, it is impossible for the vehicle to detect its exact position in the lane, and a TOR has to be emitted. This scenario contained a visible cue for the reason for the takeover in the form of the missing lane markings. The roadwork scenario (Figure 4) represented roadwork in the participant’s lane, which required bypassing in an alternative lane. In such an unpredictable situation and without map data, conditionally automated driving becomes unavailable and a TOR is emitted. In this scenario, the reason for the TOR (roadwork) was directly visible to the driver.

Figure 2. Schematic visualization of GPS scenario.

Figure 3. Schematic visualization of missing-lines scenario.

Figure 4. Schematic visualization of roadwork scenario.
Every scenario was exactly 1,000 m (30 s at a speed of 120 km/h) long and started with a TOR 9 s before the irregularity in the environment/the cue for the reason of the TOR (disappearing lane markings or yellow, swerving lanes in the roadwork scenario). This time budget corresponds to the time taken for a noncritical takeover process for the great majority of participants (Eriksson & Stanton, 2017). It was thereby ensured that all situations were experienced as noncritical to avoid a confounding influence of criticality. No other traffic was present during the TOR. After every scenario, the automation became available again, which was indicated by an icon in the instrument cluster. The order of the scenarios was permutated using a Latin square. In each situation, the NDRT was presented three times for 60 s, and the first presentation was interrupted by the TOR. In addition, to reduce the predictability of the TOR, the driving time prior to the TOR (ranging from 2.50 to 7.50 min) was manipulated by implementing up to two NDRT phases (Figure 5), also permutated according to Latin square.
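The counterbalancing and the scenario timing described above can be sketched as follows. This is a minimal illustration and not the authors’ procedure: the cyclic construction of the Latin square, the rotation of participants through its rows, and all function and variable names are assumptions, because the text states only that a Latin square was used.

```python
def cyclic_latin_square(items):
    """Build an n x n Latin square by cyclically shifting the item list.

    Assumption: a cyclic square; the study does not state which square was used.
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for(participant_index, square):
    """Hypothetical assignment rule: rotate participants through the rows."""
    return square[participant_index % len(square)]


scenarios = ["GPS", "missing lines", "roadwork"]
square = cyclic_latin_square(scenarios)

# Timing figures taken from the text: 120 km/h, 1,000 m per scenario, 9 s time budget.
speed_m_per_s = 120 / 3.6                    # km/h converted to m/s
scenario_duration_s = 1000 / speed_m_per_s   # time needed to drive one scenario
tor_lead_distance_m = 9 * speed_m_per_s      # distance covered within the time budget

for participant in range(3):
    print(participant, order_for(participant, square))
print(scenario_duration_s, tor_lead_distance_m)
```

With three scenarios, the square has three rows, so each scenario occupies each serial position exactly once across any three consecutive participants.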

Figure 5. Procedure of the experiment.
Hence, a participant encountered three situations that each included one of the scenarios (GPS, missing lines, roadwork), with the time between the scenarios being either short, medium, or long. Trust was measured before and after each of the three TORs. The design and procedure of this study were critically evaluated by the institute’s interdisciplinary internal ethical review entity.
Instructions and experimental track
The experiment was carried out in a driving simulator on a two-lane freeway. Prior to the experimental drive, the participants received a written introduction to the automation that explained the functionality of the automation and its interface (e.g., icons). The participants were instructed that the automation is capable of executing lateral as well as longitudinal control without the need to monitor it. They were also informed that the automation cannot solve every situation and that, in such a case, the driver is requested to take over vehicle control within a sufficient time budget. The participants then performed an approximately 5-min familiarization drive. In this drive, they were prompted to steer and to brake manually, to turn on the automation, and to observe the automation carrying out vehicle control. They also engaged in the NDRT and experienced a single TOR. The drive came to an end when the participants indicated that they felt comfortable using the driving simulator. The following experimental drive was a single drive of approximately 30 min and contained three TORs in the aforementioned scenarios. Following previous studies (Gold, Körber, Lechner, & Bengler, 2016; Körber, Gold, Lechner, & Bengler, 2016), each TOR was represented by a blinking hands-on icon in the HUD and a sharp double earcon (3000 Hz at 74 dB) via the mock-up speaker system with a time budget of 9 s.
Sample
A total of n = 40 participants, 20 (50%) female and 20 (50%) male, took part in the study. The participants were between the ages of 21 and 30 (M = 25.20 years, SD = 2.60). All of them were students or employees at the Technical University of Munich. Possession of a valid driver’s license was required for participation (mean duration of possession, M = 7.40 years, SD = 2.30). Participants completed an informed consent form and acknowledged their voluntary participation and consent with a signature. Twenty-four (60%) participants had already taken part in at least one driving simulator study. Annual mileage and acquaintance with automated driving are shown in Table 1. No participant reported an impairment relevant for driving. Participation was rewarded with candies. The three participants with the best performance in the NDRT were rewarded with vouchers for an online store worth €20, €30, and €50.
Table 1. Participants’ Annual Mileage and Reported Acquaintance With Automated Driving on a Rating Scale From 1 (Lowest) to 5 (Highest)
Apparatus and Measures
Driving simulator and driving automation
The study was conducted in a static driving simulator equipped with a BMW 6 Series mock-up. Seven projectors provided a front view of approximately 180°, side and rear mirrors, and a mock-up of a HUD. The implemented driving automation performed on SAE Level 3, conditional automation (SAE International, 2016). The participants were asked to attend to the NDRT whenever it was present. The automation could be toggled via a button on the steering wheel and was also shut off by steering or braking input. The participants were instructed to switch on the automation whenever it was available. Its status was displayed via an icon in the top center of the instrument cluster.
NDRT
While driving, participants had to perform an NDRT, the Surrogate Reference Task (ISO 14198, 2012), which is a visually and manually demanding task that simulates real-life situations in which drivers are strongly engaged in an NDRT during conditionally automated driving. In this task, the participants were presented a scatter of 50 white circles (diameter 40 pixels) in 18 columns and 15 rows on a black background. A single, larger circle (diameter 47 pixels) randomly placed in this scatter represented the target stimulus. The participants’ task was to find that larger circle and to highlight the respective column out of a total of six selectable columns. The task was presented for 60 s every 2.50 min on a 14-in. Lenovo ThinkVision monitor at a resolution of 1,366 × 768 pixels mounted on the center console and operated via an external numeric keypad next to the gear lever. To increase their motivation, participants were informed that their performance was being tracked and the best three participants would be rewarded with vouchers.
Acceptance questionnaire
Following previous studies on the acceptance of ADAS (Adell, Várhelyi, & Hjälmdahl, 2008; Törnros, Nilsson, Östlund, & Kircher, 2002), we measured acceptance of the driving automation using a questionnaire by van der Laan et al. (1997). It represents a semantic differential consisting of two scales, usefulness and satisfaction, each containing nine bipolar items (e.g., useful–useless) that are rated on 5-point rating scales from −2 to 2. The questionnaire was presented before and after the experimental drive via Google Forms.
Trust questionnaire
Trust in automation was measured with a single item that has been shown to be valid in previous studies (Beller et al., 2013; Brown & Galster, 2004; Hergeth, Lorenz, Vilimek, & Krems, 2016). The participants were prompted via an intercom system to rate their trust on a scale from 0 to 100 (“On a scale from 0 to 100, how much do you trust the system?”) after each engagement in the NDRT. We analyzed only the trust ratings reported directly before and after each takeover.
Understanding of the TOR
To assess if the explanation of the TOR had an effect on the predictability and understanding of the automation, we presented four statements, which could be answered on a rating scale. Participants could rate how much they felt safe during the takeover, how much they felt that they understood the system, and how much they would like to know more about the system.
Procedure
After they had been welcomed by the experimenter, the participants received the instructions and filled out a questionnaire on demographic data. Next, participants started the familiarization drive and practiced the NDRT. Afterward, the participants filled out the van der Laan questionnaire (van der Laan et al., 1997) for the first of two times. Subsequently, the experimental drive started. Upon completion, the same questionnaire was filled out for the second time, and the participants were interviewed with regard to their experience of the scenarios. At the end, the participants were debriefed and the reward for participation was given.
Data Analysis
We relied on Bayesian parameter estimation to quantify the uncertainty in the parameter estimates and to obtain a full probability distribution, summarized by a credible interval, for each parameter (Kruschke, 2015). For hypothesis testing, we relied on Bayes factors (BF; Rouder, Speckman, Sun, Morey, & Iverson, 2009), which represent the ratio of the probability of the data given one model to the probability of the data given a competing model and thus quantify whether the data are more compatible with a null model or an alternative (Schönbrodt, Wagenmakers, Zehetleitner, & Perugini, 2015). A BF, therefore, directly quantifies evidence as a likelihood ratio and also, contrary to a p value, is able to provide evidence for a null hypothesis as it can distinguish between uninformative results and results supporting the null hypothesis (Dienes, 2014). A BF10 of 3, for example, states that the data are 3 times more likely under the alternative model than under the null model. If it equals 1, both models predict the data equally well or the data are uninformative for a decision. Lee and Wagenmakers (2013) interpret a BF10 of 1 to 3 as anecdotal evidence, 3 to 10 as moderate evidence, and >10 as strong evidence. The data analysis was carried out with the BayesFactor package (Morey & Rouder, 2015) and scripts by Kruschke (2015) implemented in the statistical computer software R (R Core Team, 2016) and JAGS (Plummer, 2003).
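The interpretation scheme described above can be expressed as a small helper. The sketch below is illustrative only: the function name, the returned labels, and the handling of the boundary values 3 and 10 are assumptions, and values of BF10 below 1 are read as evidence for the null model by taking the reciprocal (BF01 = 1/BF10).

```python
def evidence_label(bf10):
    """Classify a Bayes factor BF10 with the thresholds quoted in the text.

    Returns (strength, favored_model). Boundary handling (<= 3, <= 10) is an
    assumption; the text gives only the ranges 1 to 3, 3 to 10, and > 10.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 == 1:
        return ("none", "neither")
    favored = "alternative" if bf10 > 1 else "null"
    # Express the evidence as a ratio >= 1 in favor of the preferred model.
    ratio = bf10 if bf10 > 1 else 1 / bf10
    if ratio <= 3:
        strength = "anecdotal"
    elif ratio <= 10:
        strength = "moderate"
    else:
        strength = "strong"
    return (strength, favored)


# Example values of the kind reported in the Results section.
for value in (0.23, 0.74, 0.08, 3.0):
    print(value, evidence_label(value))
```

Read this way, a BF10 of 0.23 corresponds to a BF01 of about 4.3, that is, moderate evidence for the null model, whereas a BF10 of 0.74 stays in the anecdotal range.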
A Cauchy distribution with r = 1/√2 was chosen as the prior distribution for the effect size δ of the alternative model in the Bayesian t test. This weakly informative prior was chosen as a trade-off between results that are completely determined by data and the expectation of a small to medium effect size and represents an anchor point in psychological research (Schönbrodt et al., 2015). With this prior, a p value of .05 in an independent-samples t test with t(40) = 2.021 corresponds to a BF10 = 1.49. We estimated the descriptive parameters with a normal prior and uninformative priors for its parameters (µ ~ N(x̄, 1/(100 · σ²)); σ ~ U(σ/1000, σ · 100)).
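The correspondence between a t value and a Bayes factor under this prior can be illustrated numerically. The sketch below computes the default (JZS) Bayes factor for an independent-samples t test in the form given by Rouder et al. (2009), using the fact that a Cauchy prior with scale r on δ is a normal prior whose variance follows a scaled inverse chi-square distribution with 1 degree of freedom. It is not the BayesFactor package’s implementation: the function name, the midpoint-rule integration, the truncation point z_max, and the group sizes of 21 each (chosen so that df = 40) are assumptions made for illustration.

```python
import math


def jzs_bf10(t, n1, n2, r=math.sqrt(2) / 2, steps=20000, z_max=10.0):
    """Approximate the JZS Bayes factor BF10 for a two-sample t statistic.

    The mixing variance is written as g = 1 / z**2 with z standard normal,
    so the marginal likelihood under the alternative is an expectation over
    a half-normal z, evaluated here with a simple midpoint rule.
    """
    nu = n1 + n2 - 2                  # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)       # effective sample size
    # Likelihood of t under the null model (common constants cancel).
    null_like = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)

    h = z_max / steps
    alt_like = 0.0
    for i in range(steps):
        z = (i + 0.5) * h                       # midpoint, never exactly 0
        a = 1 + n_eff * r ** 2 / z ** 2         # 1 + N * r^2 * g
        f = (1 / math.sqrt(a)) * (1 + t ** 2 / (a * nu)) ** (-(nu + 1) / 2)
        weight = 2 * math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)
        alt_like += f * weight * h
    return alt_like / null_like


# The text reports BF10 = 1.49 for t(40) = 2.021 with r = 1/sqrt(2).
print(round(jzs_bf10(2.021, 21, 21), 2))
```

The example makes the point of the paragraph concrete: a result that is just significant at p = .05 yields only anecdotal evidence for the alternative model under this prior.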
Results
Acceptance
We compared both scales of the questionnaire between the experimental group (with explanations) and the control group (without explanations) as well as within each group before and after the experimental drive. The descriptive statistics for the satisfaction scale are reported in Tables 2 and 3. With regard to the reports of satisfaction, we found no difference between the groups before (BF10 = 0.36) and after the experimental drive (BF10 = 0.42). There was also moderate evidence that the ratings did not change within the control group before and after the experiment (BF10 = 0.23). Data were inconclusive whether a slight decrease in the explanation group occurred (BF10 = 0.74). The results are visualized in Figure 6.
Table 2. Sample Description of the Scores on the Satisfaction Scale
Note. HDI = 95% highest-density interval; LL = lower limit; UL = upper limit; BF = Bayes factor; pre-exp = before the experimental drive; post-exp = after the experimental drive.
Table 3. Difference Pre-/Post-Takeover Situation on the Satisfaction Scale
Note. BF = Bayes factor.

Figure 6. Difference before and after the experimental drive on the satisfaction scale by group. Error bars = 95% highest-density interval.
To investigate the interaction between the conditions and the time of measurement, we conducted an ANOVA conceptualized as a hierarchical linear mixed model in which the levels are clustered within each factor, following the approach of Rouder, Morey, Verhagen, Swagman, and Wagenmakers (2016). Here, the effects of group and point of measurement are expressed in the effect size dᵢ, where each factor gets a shared prior for its levels. Consistent with the prior, the prior width for the expected range of effect sizes was set to r = .5 (medium), which corresponds to the prior width of r = 1/√2 for the Bayesian t test (Wagenmakers et al., 2017). Participant was included as a random factor. An ANOVA showed no interaction effect between group and point of measurement (BF10 = 0.08; Table 4).
Table 4. ANOVA for the Scores of the Satisfaction Scale With the Factors Group and Point of Measurement
Note. BF = Bayes factor; indicates comparison with a null model without any factors.
The data on the ratings of usefulness showed no difference between the groups before (BF10 = 0.39) and after the experiment (BF10 = 0.59) and also no change within a group (BF10 Control = 0.25, BF10 Explanation = 0.45; Tables 5 and 6; Figure 7). An ANOVA indicated no interaction effect between group and point of measurement (BF10 = 0.11; Table 7).
Table 5. Sample Description of the Scores on the Usefulness Scale
Note. HDI = 95% highest-density interval; LL = lower limit; UL = upper limit; BF = Bayes factor; pre-exp = before the experimental drive; post-exp = after the experimental drive.
Table 6. Difference Pre-/Post-Takeover Situation on the Usefulness Scale
Note. BF = Bayes factor.

Difference before and after the experimental drive on the usefulness scale by group. Error bars = 95% highest-density interval.
ANOVA for the Scores of the Usefulness Scale With the Factors Group and Point of Measurement
Note. BF = Bayes factor; indicates comparison with a null model without any factors.
Trust Ratings
The participants reported their subjective trust on a single item with a rating scale from 0 to 100 before and after experiencing each scenario. The results are visualized in Figure 8.

Differences in trust scores before and after the scenarios by group and by scenario. Error bars = 95% highest-density interval.
Scenario (a): GPS
We found no difference between the groups before (BF10 = 0.31) and after the experimental drive (BF10 = 0.33; Table 8) as well as no change within the groups (BF10 Control = 0.25, BF10 Explanation = 0.37; Table 9).
Sample Description of the Trust Scores in the GPS Scenario
Note. HDI = 95% highest-density interval; LL = lower limit; UL = upper limit; BF = Bayes factor.
Difference Pre-/Post-Takeover Situation in GPS Scenario
Note. BF = Bayes factor.
Scenario (b): Missing lines
The data also showed no difference between groups before (BF10 = 0.32) and after (BF10 = 0.31) the experimental drive (Table 10) as well as no change within the groups (BF10 Control = 0.28, BF10 Explanation = 0.24; Table 11).
Sample Description of the Trust Scores
Note. HDI = 95% highest-density interval; LL = lower limit; UL = upper limit; BF = Bayes factor.
Difference Pre-/Post-Takeover Situation in the Missing-Lines Scenario
Note. BF = Bayes factor.
Scenario (c): Roadwork
We found no difference between the groups before (BF10 = 0.36) and after the scenario (BF10 = 0.46) as well as no change within the control group (BF10 = 0.23; Tables 12 and 13). However, we found substantial evidence for a decrease in trust within the explanation group of Δ = 5.54 score points (5.98%; d = 0.60 [0.13, 1.08]; Table 13).
Sample Description of the Trust Scores
Note. HDI = 95% highest-density interval; LL = lower limit; UL = upper limit; BF = Bayes factor.
Difference Pre-/Post-Takeover Situation in the Roadwork Scenario
Note. BF = Bayes factor.
We carried out an ANOVA to evaluate the evidence for an interaction effect. Data yielded no interaction effect in the GPS (BF10 = 0.06) and missing-lines (BF10 = 0.04) scenarios but moderate support for an interaction of group and point of measurement in the roadwork scenario (BF10 = 2.64); this finding is consistent with the analysis in Table 13. Table 14 lists the results.
ANOVA With the Factors Group and Point of Measurement
Note. BF = Bayes factor; indicates comparison with a null model without any factors.
Independent of the scenario, we investigated whether the trust level changed over the course of the experiment. The data points in Figure 9 represent the mean of the pre- and post-scenario trust ratings. We found moderate evidence for an increase over the course of the experiment (BF10 = 3.89) and moderate evidence that this effect was independent of group (BF10 Interaction = 0.46; Table 15).

Development of the trust score over the course of the experiment by group. Error bars = 95% highest-density interval.
ANOVA With the Factors Group and Point of Measurement
Note. BF = Bayes factor; indicates comparison with a null model without any factors.
In an exploratory analysis, we compared the difference between the trust ratings before and after the TOR for participants who experienced roadwork as their first, second, or third scenario. Although there was no difference in the trust ratings if the participants experienced roadwork as their first scenario (MΔ = 2.17, BF10 = 0.42), the difference was already larger if it was the second scenario (MΔ = 3.57, BF10 = 1.50) and large if it was their last scenario (MΔ = 10.29, BF10 = 4.50, d = 1.24). This trend was not observable in the control group (BF10 Situation 1 = 0.60, BF10 Situation 2 = 0.71, BF10 Situation 3 = 1.41). However, the sample sizes (n = 7) for these calculations are too small to support reliable and valid inferential statistics.
Understanding of the TOR
After the experiment, we asked the participants to rate four statements on their experience with the takeover situations. We used an ordinal probit model for parameter estimation, which assumes an underlying normally distributed metric variable that is mapped to the empirical ordinal values via response thresholds (Liddell & Kruschke, 2015). There was no evidence for a difference in the ratings of Questions 1 and 4. However, the participants in the explanation group felt more strongly that it was clear why they had to take over (BF10 = 149.10) and that they had understood the system (BF10 = 14.71; Table 16).
Descriptive Results of the Four Questions After the Experimental Drive
Note. N = 20. BF = Bayes factor.
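The ordinal probit model described above can be illustrated with a minimal sketch of the mapping from the latent normal variable to ordinal response probabilities. It is an illustration under stated assumptions, not the authors' estimation code: the function names, the five-point scale, and the threshold values are hypothetical and were chosen only to show the mechanism.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ordinal_probit_probs(mu: float, sigma: float, thresholds: list[float]) -> list[float]:
    """Probability of each ordinal category given a latent normal variable.

    A response falls into category k when the latent value lies between
    threshold k-1 and threshold k; the outer bounds are -inf and +inf.
    """
    cuts = [-math.inf] + sorted(thresholds) + [math.inf]
    cdf = [normal_cdf(c, mu, sigma) for c in cuts]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


# Hypothetical thresholds for a five-point rating scale (assumption).
thresholds = [1.5, 2.5, 3.5, 4.5]

# Two hypothetical latent means: a higher mean shifts probability mass
# toward the higher response categories.
for latent_mean in (3.0, 4.0):
    probs = ordinal_probit_probs(latent_mean, 1.0, thresholds)
    print(latent_mean, [round(p, 3) for p in probs])
```

In such a model, group differences are expressed on the latent metric scale (mean and standard deviation) rather than on the ordinal response codes themselves.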
Discussion
In this study, we investigated the effect of providing an explanation of the reason for a TOR on trust and acceptance of driving automation. An experimental group provided with an explanation of the reason for each TOR and a control group given no explanations experienced three takeover situations that varied in the obviousness of the reason for the takeover.
Both groups indicated in the questionnaire prior to the experimental drive that they were satisfied with the system and found it useful. This appraisal did not change after experiencing the three takeover situations. Consistent with previous findings (Gold et al., 2015), it seems that participants do not view a TOR, as implemented in this study, as a threatening malfunction but rather as a legitimate warning of a system that is working correctly. In general, trust ratings increased slightly from experiencing the first takeover to experiencing the last takeover, independent of the condition. This increase in trust with increasing system experience and no experience of negatively evaluated events has also been reported in similar studies (Beggiato, Pereira, Petzoldt, & Krems, 2015; Hergeth et al., 2016). Accordingly, a takeover situation did not influence the trust rating, and we found no difference between the two groups in the GPS and missing-lines scenarios. However, we found substantial evidence for a decrease in trust in the explanation group in the roadwork scenario. A conceivable reason for this finding might be that the explanations led to a different evaluation of the automation’s competence. The provided explanations might have conveyed the image of a more complex and competent system in contrast to the system experienced by the control group, which merely performed lateral and longitudinal control. Therefore, it may have been surprising for the participants of the explanation group that roadwork, the most obvious reason for the TOR, could not be solved by the driving automation.
A similar finding was reported by Madhavan, Wiegmann, and Lacson (2006), who observed that automation errors in easy trials led to greater mistrust than errors in difficult trials. Even small errors of an automated system affect trust more than a large error if the error was unexpected (Muir & Moray, 1996), and trust erodes if the system does not behave as expected even if it shows high performance (Lee & See, 2004). Because the assessment of the automation’s competence requires some experience with the system and some exposure to the explanations, the effect should be the most pronounced in the last situation. Following this line of thought, we compared how much the trust ratings changed after experiencing the TOR for participants who experienced roadwork as either their first, second, or third scenario in an exploratory, descriptive analysis. There was no change in the trust ratings if the participants experienced roadwork as their first scenario, but a large decrease occurred if it was their third scenario. We did not observe this trend in the control group. Each of the three scenarios was implemented with a noncritical takeover time budget of 9 s. Whereas the road continued as a straight lane after the TOR in the GPS and missing-lines scenarios, roadwork was the only scenario that required steering after the 9 s to follow the alternative lane on the construction site (see Figure 4). Therefore, a miscalibration of trust might weigh more heavily than in the other scenarios, which might be the reason why a TOR had a different influence on trust in this scenario.
Nevertheless, all scenarios were easily solvable. The participants might therefore not have seen the explanations as overly helpful, given that no problem occurred that needed explaining to put them at ease. The lack of consequences and real risk in simulator driving might have alleviated the need for explanations as well. That being said, the explanations could have a stronger effect if the situations were more critical or more confusing. Last, the interaction with the automation was very short and limited to longitudinal and lateral control. Drivers might be more in need of transparency and explanation in more complex situations, such as an overtaking maneuver, crossroads, or entering a highway. The results also have to be interpreted in light of the fact that both acceptance and trust were on a very high level right from the beginning, although the automation’s functioning and limitations were explained in a neutral and accurate way prior to the experiment. A possible reason may be that the study was conducted at a technical university, with the majority of the participants being students. The affinity for and trust in technology may generally be on a very high level in such a sample. We, therefore, recommend repeating the study with a sample that has a lower affinity for technology and less experience with automated driving.
In their rating of their understanding of the TOR, participants in the explanation group felt more strongly than the control group that they had understood the system and that the reason for the takeover was clear to them. Hence, although the explanations had no systematic effect on trust and acceptance, the increase in transparency by the explanations seems to have been successful. Authors of future studies should explicitly investigate whether this subjective increase indeed reflects an improvement in the constructed mental model. For example, drivers should then be able to predict a TOR in a novel situation with higher accuracy. Furthermore, behavioral measures, such as takeover time or gaze behavior, may also function as an indicator of system understanding. For example, individuals generally react faster to a signal if they expect the signal to appear. A lower takeover time thus may indicate that the user understood the system and was able to predict the emission of the TOR (Larsson et al., 2014; Martens, 2004).
Limitations and Future Work
The study was conducted in a driving simulator to ensure that each participant experienced exactly the same scenarios. It is possible that the participants may have reported differently due to the lack of risk in a simulator, especially regarding their perceived safety during the takeover situations. Hence, providing an explanation could have a greater effect in a naturalistic drive. That being said, Eriksson and Stanton (2017) have shown that participants’ behavior and subjective ratings did not substantially differ between a naturalistic automated on-road drive and a high-fidelity simulator. We recruited a gender-balanced sample, but at the same time, mostly students from a technical university who were between 21 and 30 years old took part, which led to a homogeneous sample regarding affinity to technology, prior knowledge as well as experiences, and trust in automation (Körber et al., 2016). Recent research has revealed moderating covariates, such as age, that may influence attitudes toward automated driving (Hohenberger, Spörrle, & Welpe, 2016; Körber & Bengler, 2014; Payre et al., 2014). To increase the external validity of the results, we therefore recommend investigating attitudes toward automated driving with different demographics in future studies.
Key Points
Providing a post hoc explanation for a takeover request had little to no impact on trust in or acceptance of a driving automation.
Providing a post hoc explanation increased the perceived understanding of the system and of the reason for a takeover request.
Acknowledgements
The authors thank Joachim Vandekerckhove and Torrin M. Liddell for their statistical advice. We also thank Jonas Schmidtler for his comments on this manuscript.
Moritz Körber is a graduate research associate working with Klaus Bengler at the Chair of Ergonomics at the Technical University of Munich. After obtaining his diploma in psychology in 2012 at the University of Regensburg, his research interests now are automated driving and methodology.
Lorenz Prasch earned his BSc in cognitive science at the University of Tübingen and his master’s degree in human factors engineering at the Technical University of Munich with a focus on system ergonomics and interaction design. Since October 2016, he has been working at the Chair of Ergonomics as a research associate and continues to study the field of cooperation of highly automated vehicles.
Klaus Bengler graduated in psychology at the University of Regensburg in 1991 and received his PhD in 1995 in cooperation with BMW Group at the Institute of Psychology (supervisor: Prof. Dr. Zimmer). He is the head of the Chair of Ergonomics at the Technical University of Munich, which is active in research areas like digital human modeling, human–robot cooperation, driver assistance, HMI design, and human reliability.
