Abstract
Objective
The aim of this paper was to synthesize the experimental research on factors that affect takeover performance during conditionally automated driving.
Background
For conditionally automated driving, the automated driving system (ADS) can handle the entire dynamic driving task but only for limited domains. When the system reaches a limit, the driver is responsible for taking over vehicle control, which may be affected by how much time they are provided to take over, what they were doing prior to the takeover, or the type of information provided to them during the takeover.
Method
Out of 8446 articles identified by a systematic literature search, 48 articles containing 51 experiments were included in the meta-analysis. Coded independent variables were time budget, non-driving related task engagement and resource demands, and information support during the takeover. Coded dependent variables were takeover timing and quality measures.
Results
Engaging in non-driving related tasks degrades takeover performance, particularly if the tasks have resource demands that overlap with the driving task. Weak evidence suggests takeover performance is impaired with shorter time budgets. Current implementations of information support did not affect takeover performance.
Conclusion
Future research and implementation should focus on providing the driver more time to take over while automation is active and should further explore information support.
Application
The results of the current paper indicate the need for the development and deployment of vehicle-to-everything (V2X) services and driver monitoring.
Keywords
Introduction
Driving automation systems are becoming increasingly popular in modern automobiles. These systems take on at least part of the driving task and have the potential to increase safety by addressing a major factor present in 94% of fatal crashes: human error (National Highway Traffic Safety Administration, 2017). SAE International (2018) defines six levels of driving automation, from no driving automation (level 0) to full driving automation (level 5), as shown in Figure 1.

Figure 1. SAE J3016 levels of driving automation.
Numerous automobile manufacturers have become increasingly interested in developing higher levels of driving automation (i.e., level 3 and above) and plan to release them in the near future, though none have been released yet (Audi, 2017; BMW, 2020; Ford, 2016; General Motors, n.d.; Tajitsu, 2019). For level 3 driving automation (i.e., conditionally automated driving), the vehicle is equipped with an automated driving system (ADS) capable of performing the entire dynamic driving task on a sustained basis for limited operational design domains (ODDs), which are the operating conditions for which a driving automation system is designed (e.g., environmental, geographical, time-of-day, traffic, and roadway maintenance constraints; SAE International, 2018). Hence, level 3 driving automation does not require the driver to monitor the roadway like lower level automated systems, but the driver is expected to be available to resume control of the vehicle when the vehicle issues a takeover request (TOR) because it is exiting its ODD and when there is an evident vehicle system failure (SAE International, 2018). One of the biggest concerns for these higher levels of driving automation is that the driver must take over vehicle control following a period in which they were not engaged in the driving task, which could lead to a problem known as the out-of-the-loop performance problem.
Out-of-the-Loop Performance Problem and Time Budget
The out-of-the-loop performance problem is a phenomenon in which operators of automated systems are less capable of performing a task after they take over control of the system compared to manual operators (Endsley & Kiris, 1995). It is likely to occur when drivers are not required to monitor the roadway environment during driving automation (Gold et al., 2013; Körber et al., 2015; Louw et al., 2015). Endsley and Kiris (1995) attribute much of the out-of-the-loop performance problem to a loss of situation awareness. When operators take over control of the system, they must acquire information about the state of the system and environment before making an adequate decision and response, which requires time. In the context of conditionally automated driving, the time given to drivers to take over before reaching the system limit is called the time budget, and it is defined as the time between the moment the TOR is issued and the time at which the vehicle would have reached the system limit assuming no driver response (Gold et al., 2014; Happee et al., 2017).
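To make the definition of time budget concrete, the following minimal Python sketch computes it from the distance to the system limit at the moment the TOR is issued. It assumes, purely for illustration, that the vehicle keeps a constant speed and that the driver does not respond; the function and parameter names are hypothetical and do not come from the cited studies.

```python
def time_budget_s(distance_to_limit_m: float, speed_mps: float) -> float:
    """Time (s) from TOR onset until the vehicle would reach the system limit.

    Illustrative simplification: constant speed and no driver response.
    """
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    if distance_to_limit_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_to_limit_m / speed_mps


# Example: a TOR issued 210 m before an obstacle at 30 m/s (108 km/h).
print(time_budget_s(210.0, 30.0))  # 7.0
```

Under these assumptions, the time budget equals the time to collision at TOR onset; a decelerating ADS would lengthen it.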
For drivers to get back in the loop, they will require a sufficient time budget to acquire information on the state of the driving environment before making an adequate driving response to any roadway hazards. Eriksson and Stanton (2017) contend that noncritical takeovers will be more common than urgent takeovers, and for noncritical takeovers, the ADS could predict well in advance when a system limit is approaching. For example, if the ODD of an ADS is limited to a divided highway, the ADS will know that the system limit for this trip is the highway exit and can issue a TOR with ample time to take over (Miller et al., 2015; Petermeijer, Bazilinskyy, et al., 2017). However, the ADS will not always be able to predict a system limit far in advance. For example, debris in the road could reflect a system limit if the ODD assumes good roadway conditions, and the ADS may have to rely on its onboard sensors to detect this (e.g., camera, radio detection and ranging [RADAR], light detection and ranging [LIDAR]; Petermeijer, Bazilinskyy, et al., 2017). In these critical takeover situations, prior research has compared the takeover performance of a relatively shorter time budget (e.g., 5 and 6 s) and a relatively longer time budget (e.g., 7 and 8 s) to determine a minimum time budget required for a safe takeover (Gold et al., 2013; Mok et al., 2017; Wandtner et al., 2018). Whereas some studies found that relatively shorter time budgets were sufficient (Mok, Johns, Lee, et al., 2015), other studies found relatively longer time budgets were needed (Mok et al., 2017). More generally, narrative reviews suggested that only relatively longer time budgets are sufficient (Eriksson & Stanton, 2017; Radlmayr & Bengler, 2015).
Engaging in Non-Driving Related Tasks
Another factor that may affect takeover performance is engagement in non-driving related tasks (NDRTs). A recent meta-analysis found that hands-free cell phone use has moderate performance costs on manual driving (Caird et al., 2018). These driving performance costs putatively occur because the driver’s attention is diverted away from the driving task toward the cell phone conversation (Strayer & Johnston, 2001; Strayer et al., 2003). If drivers engage in NDRTs, such as a cell phone conversation, during conditionally automated driving, their attention would be diverted away from the roadway. A narrative review by de Winter et al. (2014) indicated that as level of automation increased, engagement in NDRTs increased, suggesting that drivers are more willing to engage in NDRTs during conditionally automated driving compared to lower levels of driving automation and manual driving. Although drivers are not required to monitor the roadway during conditionally automated driving, engaging in NDRTs may result in drivers becoming more out of the loop compared with not being engaged in NDRTs, because attention to the roadway environment is further reduced while engaging with NDRTs (Gold et al., 2016; Miller et al., 2015). Additionally, if a driver is engaged in an NDRT when a TOR is issued, the driver must switch tasks from the NDRT to manually controlling the vehicle. Research on task switching revealed a switch cost: responses are slower and more erroneous when switching between tasks (Monsell, 2003). This switch cost is attributed to the process of task set reconfiguration, which involves “shifting attention, retrieving task-specific goals and rules, or inhibiting and clearing out a prior task set” (Zeeb et al., 2017, p. 66). Thus, prior research on distracted driving and task switching suggests that drivers engaged with any NDRT will experience performance decrements during takeovers.
Such decrements may be exacerbated when the NDRT has overlapping resource demands with the driving task, according to multiple resource theory (Wickens, 1980, 2008; Wickens & Hollands, 2000, Chapter 11). NDRTs with a visual input modality will overlap more with roadway monitoring (also visual) than nonvisual NDRTs. In other words, drivers must take their eyes off the roadway during conditionally automated driving to engage in a visual NDRT, leading to reduced situation awareness and poorer takeover performance (Radlmayr et al., 2014). Although monitoring the roadway during conditionally automated driving is not required, the driver could choose to do it, nonetheless. Similarly, if NDRTs continue during the takeover, NDRTs with visual input or manual response demands will overlap with the manual driving task, resulting in poorer takeover performance (Wandtner et al., 2018). In short, multiple resource theory predicts that NDRTs with overlapping resource demands will cause more of a takeover performance decrement than NDRTs without these overlapping resources.
Alternatively, engaging in NDRTs may improve takeover performance. According to malleable attentional resources theory, mental underload leads to performance decrements because the operator’s attentional resources shrink in response to low task demands, leaving the operator unable to cope with sudden critical events (Young & Stanton, 2002). Automated driving has been shown to reduce workload (de Winter et al., 2014) and could reduce workload so much that mental underload is induced, resulting in diminished attentional resources. Consequently, the driver is unable to effectively cope with a takeover. However, engaging in NDRTs could counteract mental underload by increasing workload (Clark & Feng, 2017; Miller et al., 2015). Mental underload can also lead to performance decrements when low mental workload induces drowsiness. When a driver is not engaged in NDRTs during conditionally automated driving, they are likely to become drowsy (Johns et al., 2014; Schömig et al., 2015), but when drivers engage in NDRTs, they exhibit significantly fewer signs of drowsiness (Schömig et al., 2015). In short, according to malleable attentional resources theory, engagement in NDRTs during conditionally automated driving should lead to better takeover performance compared with not engaging with NDRTs.
Takeover Request Information Support
Another factor that may affect takeover performance is the level of information provided to the driver by the human–machine interface when a TOR is issued. Parasuraman et al. (2000) proposed a model in which four stages of human information processing (sensory processing, perception/working memory, decision-making, and response selection) have analogous functions in automated systems: information acquisition, information analysis, decision and action selection, and action implementation, respectively. These classes of functions can be automated and replace the human operator to various degrees, resulting in a continuum of levels of automation (Parasuraman et al., 2000). For conditionally automated driving, all four of these classes of functions are automated by the ADS when automation is engaged, but when a TOR is issued, the ADS will only continue implementing actions until the driver takes over or a limit is reached (e.g., time limit; Eriksson et al., 2019). However, the ADS may still provide information to the driver about the other three classes of functions to facilitate information processing during the takeover (Eriksson et al., 2019). For example, the human–machine interface could highlight the cause of the TOR (information acquisition), indicate whether the adjacent lane is open (information analysis), or recommend a specific action such as braking (decision and action selection). In contrast, a lower level of automation only informs the driver of the need to take over with an auditory alert (Eriksson et al., 2019). The higher levels of information help the driver process the current roadway environment and gain situation awareness (Eriksson et al., 2019; Lorenz et al., 2014). Thus, higher levels of information support should lead to better takeover performance.
Previous Reviews and Meta-Analyses
There are several narrative reviews on the takeover of vehicle control from an ADS (Eriksson & Stanton, 2017; Körber & Bengler, 2014; Radlmayr & Bengler, 2015; de Winter et al., 2014). In these narrative reviews, experts in the field read the relevant studies, summarize their findings, and draw conclusions. However, narrative reviews are limited because they lack transparency about the subjective decisions involved (e.g., what studies to include, how findings are weighted while drawing conclusions; Borenstein et al., 2009, Preface), may have difficulty resolving conflicting results (Rosenthal & DiMatteo, 2001), and often focus on statistical significance, which has frequently led narrative reviewers to misleading conclusions (Rosenthal & DiMatteo, 2001). In contrast, a meta-analysis involves an exhaustive search for eligible studies, quantitative procedures that can resolve inconsistent results, a focus on effect sizes instead of statistical significance, and reporting guidelines that emphasize transparency (Borenstein et al., 2009; Rosenthal & DiMatteo, 2001). Thus, the conclusions of a meta-analysis are more accurate and credible than those of a narrative review (Rosenthal & DiMatteo, 2001).
Only one meta-analysis currently exists on the effects of conditionally automated driving on takeover performance (Zhang et al., 2019). They found that drivers take over more slowly when provided a longer time budget or when drivers were engaged in NDRTs with visual or manual resource demands. Moreover, they found that a directional TOR (i.e., a TOR with information acquisition) did not affect takeover time. However, Zhang et al.’s meta-analysis has a few notable limitations. First, it included studies that examined the takeover of both level 2 and level 3 driving automation systems and collapsed its meta-analysis across these levels of automation. This is problematic because the driver’s responsibilities are different for these levels while automation is engaged. In particular, the driver is responsible for completing the dynamic driving task by monitoring the driving environment during level 2 driving automation, but the driver is not responsible for monitoring the driving environment or any other part of the dynamic driving task during level 3 driving automation (SAE International, 2018). Consequently, the issues the driver must confront during each level of automation may be different (e.g., vigilance decrement for level 2, out-of-the-loop performance problem for level 3), and Zhang et al.’s results may be harder to interpret because a given effect may differ based on level of automation (e.g., large effect for level 2 but a small effect for level 3). Second, Zhang et al.’s meta-analysis only considered takeover time as a measure of takeover performance even though there are many other measures. This is problematic because takeover time does not indicate how well a takeover is performed (i.e., neither a short nor long takeover time is itself indicative of better takeover performance). Only takeover quality can provide information about how well a takeover is performed (Gold et al., 2014). 
Prior research has emphasized the need to consider both takeover timing and takeover quality measures to obtain a complete view of takeover performance (Zeeb et al., 2016, 2017). Excluding these other measures in a meta-analysis limits conclusions that can be derived. Third, it is unclear how Zhang et al.’s meta-analysis handled multiple effect sizes from the same study. Such effect sizes introduce dependencies that violate the assumptions of traditional meta-analysis models, which can produce invalid estimates (Fisher & Tipton, 2015; Tanner-Smith et al., 2016).
Current Study
The current meta-analysis improves upon the previous meta-analysis by Zhang et al. (2019), by focusing on automated driving in which the driver is not required to monitor the roadway (i.e., level 3) and including as many takeover timing and quality measures as possible. Three important questions are addressed: (1) Does time budget affect takeover performance? (2) Does engaging in NDRTs affect takeover performance? (3) Does providing the driver with information support when the TOR is issued affect takeover performance? There are four hypotheses relating to the research questions:
Method
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines are a highly cited set of guidelines that aim to establish a checklist of essential items to report in systematic reviews and meta-analyses (Moher et al., 2009). This paper is formatted in accordance with these guidelines. No protocol was registered for the current meta-analysis.
Data Sources and Search Strategy
In an initial search, several popular scholarly databases (PsycINFO, PsycArticles, MEDLINE, Web of Science) were queried through October 2017 with the following search terms connected with the OR Boolean operator: “automated driving,” “automated vehicle,” “autonomous driving,” “autonomous vehicle,” “vehicle automation,” and “transition of vehicle control.” For Web of Science, the search results were further narrowed to relevant results by specifying the following categories: ergonomics, psychology applied, behavioral sciences, psychology, psychology experimental, and psychology multidisciplinary. Additionally, targeted journals (The Transportation Research Record: Journal of the Transportation Research Board), conference proceedings (Human Factors and Ergonomics Society; Association for Computing Machinery Special Interest Group on Human–Computer Interaction; International Driving Symposium on Human Factors in Driving Assessment, Training and Vehicle Design; and Transportation Research Board), and government publications (National Highway Traffic Safety Administration; Swedish National Road and Transport Research Institute) were queried with the same search terms to ensure a comprehensive search that included “gray” literature (i.e., original research from government, technical, and research reports that may or may not be peer-reviewed; American Psychological Association, 2020). To be thorough, all the references from the articles that fulfilled the inclusion criteria, regardless of whether they addressed the specific research questions, were considered for inclusion as well. Because of a limited number of relevant articles returned in the initial systematic search, a second systematic search was conducted. The original data sources were queried again with a date restriction from October 2017 (or from the year 2017 if restricting the search by month was not permitted) through June 2019. Additionally, all the studies included in the meta-analysis by Zhang et al. (2019) were included as an additional source.
Study Selection
In total, 8446 articles were considered for inclusion in the meta-analysis. In the first round, the abstracts from all articles captured by the search were screened using an initial criterion. Specifically, studies were required to examine a transfer of vehicle control of the entire dynamic driving task from an ADS to the human driver following a takeover request. If there was uncertainty as to whether a paper met this initial criterion based on the abstract alone, then that paper was passed on to the next round of review, where the full text was reviewed.
In the second round, the full-text papers were reviewed using an additional set of inclusion criteria. First, nonexperimental research (e.g., surveys, observational studies) was excluded. Second, a study had to have a measure of takeover time or takeover quality following the takeover. Third, papers had to be written in English. Finally, studies were coded into the meta-analysis if they addressed at least one of the research questions. Accordingly, they needed to have at least one of the following manipulations:
A relatively short time budget (operationally defined as 5 s ≤ time budget < 7 s) and a relatively long time budget (operationally defined as 7 s ≤ time budget < 9 s).
Drivers engaging in NDRTs and drivers not engaging in NDRTs.
Drivers engaging in visual NDRTs and drivers engaging in nonvisual NDRTs.
Drivers engaging in manual NDRTs and drivers engaging in nonmanual NDRTs.
Drivers engaging in visual-manual NDRTs and drivers engaging in nonvisual, nonmanual NDRTs.
A TOR accompanied by little information support (i.e., low information acquisition, low information analysis, and low decision and action selection) and a TOR accompanied by more information support (i.e., high information acquisition, high information analysis, or high decision and action selection).
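As an illustration of how the first manipulation in the list above could be checked during coding, the Python sketch below classifies a time budget using the operational definitions given (5 s ≤ short < 7 s; 7 s ≤ long < 9 s) and tests whether a study contains both categories. The function names are hypothetical, and the sketch is not the coding procedure actually used.

```python
from typing import Iterable, Optional


def classify_time_budget(seconds: float) -> Optional[str]:
    """Return 'short' for 5 s <= budget < 7 s, 'long' for 7 s <= budget < 9 s, else None."""
    if 5.0 <= seconds < 7.0:
        return "short"
    if 7.0 <= seconds < 9.0:
        return "long"
    return None


def has_time_budget_comparison(budgets_s: Iterable[float]) -> bool:
    """True if a study manipulated at least one short and one long time budget."""
    labels = {classify_time_budget(b) for b in budgets_s}
    return {"short", "long"} <= labels


print(classify_time_budget(6.9))            # short
print(classify_time_budget(7.0))            # long
print(has_time_budget_comparison([5, 7]))   # True
print(has_time_budget_comparison([3, 10]))  # False
```

Because the intervals are half-open, a 7 s time budget counts as long, and budgets outside 5 s to 9 s fall into neither category.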
When multiple articles used the same data, the most complete one was retained unless the subsequent article provided additional analyses, in which case it was attached to the original article as an addendum. The number of articles that progressed through each round of review is illustrated in Figure 2.

Figure 2. Flow chart of the inclusion and exclusion of articles.
Data Extraction and Coding
The current meta-analysis focused on takeover performance, which can be measured by takeover timing or quality measures. Takeover timing assesses the length of time between the start of a TOR and a specified action by the driver (Kerschbaum et al., 2014). Takeover quality, on the other hand, assesses how well a takeover is performed (Gold et al., 2014). It provides information about how much danger is present during the takeover process (Kerschbaum et al., 2014). As can be seen in Table 1, takeover timing and quality can be divided into smaller measurement categories. For studies eligible for inclusion, takeover performance measures shown in Table 1 were coded.
Table 1. Categories of Takeover Performance Measures Coded in the Meta-Analysis
Note. NDRT = non-driving related tasks. Ego-vehicle refers to the participant’s vehicle.
ᵃ All takeover timing measures start from the beginning of the TOR.
For the studies that were eligible for inclusion in the meta-analysis, various statistics relevant to the comparison of interest (e.g., mean, standard deviation) were extracted to calculate two effect sizes: raw mean difference (D) and Cohen’s d (Borenstein et al., 2009). Raw mean difference quantifies the effect size while retaining the original units. This is especially useful for interpretability when the dependent measure has meaningful units. Cohen’s d also quantifies the effect size but on a standardized scale, which becomes useful when one wants to compare two effect sizes that are calculated from measures not using the same scale. For specific takeover timing measures, both raw mean difference and Cohen’s d are reported because takeover timing is always measured in seconds, an interpretable unit of measure. For takeover quality measures, only Cohen’s d is reported because there are many different units, which can only be combined using a standardized effect size. For all measurement categories except for collisions, raw mean difference and Cohen’s d were calculated for each study and then input into the meta-analysis. However, for collisions, the log odds ratio was calculated as the effect size for each study because a collision either occurs or does not (i.e., binary variable). The log odds ratio from each study was then input into the meta-analysis. The output was converted to Cohen’s d in the reported results for comparability.
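The effect sizes described above can be sketched in Python as follows. This is a minimal illustration that assumes two independent groups and a collision table with no empty cells; the analyses reported here were run in R with formulas that also accounted for other experimental designs, so the sketch is not a reproduction of them. All function names and the example numbers are hypothetical.

```python
import math


def raw_mean_difference(mean1: float, mean2: float) -> float:
    """Raw mean difference D, in the original units of the measure."""
    return mean1 - mean2


def cohens_d_independent(mean1, sd1, n1, mean2, sd2, n2) -> float:
    """Cohen's d for two independent groups, standardized by the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def log_odds_ratio(events1: int, n1: int, events2: int, n2: int) -> float:
    """Log odds ratio for a binary outcome such as collisions.

    Assumes no cell of the 2 x 2 table is zero (no continuity correction here).
    """
    a, b = events1, n1 - events1
    c, d = events2, n2 - events2
    if min(a, b, c, d) <= 0:
        raise ValueError("zero cell: a correction would be needed")
    return math.log((a * d) / (b * c))


def log_odds_ratio_to_d(log_or: float) -> float:
    """Convert a log odds ratio to Cohen's d via d = ln(OR) * sqrt(3) / pi."""
    return log_or * math.sqrt(3) / math.pi


# Made-up takeover times (s): condition 1 vs. condition 2, 20 drivers each.
print(raw_mean_difference(2.5, 2.25))                    # 0.25
print(cohens_d_independent(2.5, 0.5, 20, 2.25, 0.5, 20))  # 0.5

# Made-up collision counts: 6 of 20 vs. 3 of 20 drivers.
print(log_odds_ratio_to_d(log_odds_ratio(6, 20, 3, 20)))
```

The first two calls show why both effect sizes are informative: D keeps the result in seconds, whereas d expresses the same difference in standard deviation units.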
When the means and standard deviations were available, they were used to calculate effect size (Caird et al., 2018; Simmons et al., 2017). If not reported in the text or in a table, estimates of mean and standard deviation were extracted from figures using WebPlotDigitizer (Rohatgi, 2019). If needed, the standard deviation was calculated from the standard error of the mean or confidence intervals of the mean (Cochrane Collaboration, 2011a). If the mean and standard deviation were still not available, multiple attempts were made to contact authors so that they could provide the necessary information. All 51 studies reported statistics in the text or in a table that were used to conduct the meta-analysis. We additionally needed to extract statistics from figures for 18 studies and requested information from the authors of 26 studies. Authors provided the requested information in all but two cases.
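The recovery of a standard deviation from a standard error or a confidence interval of the mean can be sketched as below. The sketch assumes a 95% interval built with a normal critical value of 1.96 unless the caller supplies another value (e.g., a t value for a small sample); the function names are hypothetical.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Standard deviation from the standard error of the mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, critical: float = 1.96) -> float:
    """Standard deviation from a confidence interval of the mean.

    `critical` is the critical value that built the interval: 1.96 for a 95%
    interval under a normal approximation. For small samples, a t value with
    n - 1 degrees of freedom would be the better choice (caller-supplied).
    """
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    se = (upper - lower) / (2 * critical)
    return sd_from_se(se, n)


print(sd_from_se(0.1, 25))                     # 0.5
print(round(sd_from_ci(2.0, 2.392, 25), 3))    # 0.5
```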
Data Analysis
Effect sizes were calculated in R using the formulas from Borenstein et al. (2009, Equations 4.2–4.4, 4.6, 4.12–4.14, 4.16, 4.18–4.21, 4.26, 4.28, 4.29, 5.8–5.11, 7.1, 7.2), which took into account experimental design. When there were at least two studies per measurement category, the calculated effect sizes were meta-analyzed using a random-effects model. Many studies that met the inclusion criteria contributed multiple effect sizes to the same measurement category. When this occurs, the effect sizes are not independent because they come from the same sample of participants. To handle these dependent effect sizes, the robust variance estimation method was employed to meta-analyze the data using the robumeta R package (Fisher & Tipton, 2015; Hedges et al., 2010; Tanner-Smith et al., 2016). This method can handle statistically dependent effect sizes in a meta-analysis without the loss of information that occurs with other methods for handling dependent effect sizes (e.g., creating one aggregate effect size per study). Using this method, mean effect sizes, 95% confidence intervals, and 95% credibility intervals were computed. A mean effect size is the weighted mean of the effect sizes from each study (Borenstein et al., 2009). A 95% confidence interval is a measure of the accuracy of the mean effect size such that the population mean effect size falls within the confidence interval in 95% of cases and indicates whether the estimated population mean effect size is significantly different from zero based on whether zero is captured in the confidence interval (Borenstein et al., 2009). A 95% credibility interval is a measure of the dispersion of effect sizes, such that the true effect in a new study will fall within the credibility interval in 95% of cases (Borenstein et al., 2009). Credibility intervals reflect the heterogeneity between studies and indicate if unique or uncoded moderators are present. 
Wide credibility intervals suggest the presence of moderators (Schmidt & Hunter, 2015).
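To illustrate how a mean effect size, confidence interval, and credibility interval relate to one another, the Python sketch below fits a conventional random-effects model with the DerSimonian-Laird estimator of between-study variance. This is a deliberately simpler, swapped-in technique: it treats every effect size as independent and therefore does not implement robust variance estimation or reproduce the robumeta analysis. The credibility interval is taken here as the mean ± 1.96 times the between-study standard deviation, which is one common convention and an assumption of this sketch.

```python
import math
from typing import Dict, Sequence


def random_effects_summary(effects: Sequence[float],
                           variances: Sequence[float]) -> Dict[str, object]:
    """DerSimonian-Laird random-effects summary for independent effect sizes."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effect sizes with matching variances")
    k = len(effects)

    # Fixed-effect (inverse-variance) weights and the heterogeneity statistic Q.
    w = [1.0 / v for v in variances]
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))

    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    tau = math.sqrt(tau2)

    z = 1.96  # normal critical value for 95% intervals
    return {
        "mean": mean,
        "tau2": tau2,
        "ci95": (mean - z * se, mean + z * se),
        "credibility95": (mean - z * tau, mean + z * tau),
    }


# Made-up effect sizes (Cohen's d) and sampling variances from two studies.
summary = random_effects_summary([0.0, 1.0], [0.01, 0.01])
print(summary)
```

In this made-up example the two studies disagree strongly, so the credibility interval is wider than the confidence interval, which is the pattern that would suggest uncoded moderators.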
Results
First, the characteristics of the included studies are described on both an aggregate and an individual study level. Second, the results from each study are meta-analyzed for each comparison in turn to provide a synthesis of results. Third, the presence of publication bias is tested using a meta-regression. Finally, two moderator analyses are conducted that examine whether the results depend on a moderating variable.
Unless noted otherwise, the α value used for all hypothesis tests is .05. As permitted by the American Psychological Association (2020) manual, nonsignificant trends are noted and discussed when the p value is between .05 and .10. Such trends describe a pattern in the data that is suggestive but not statistically significant at the predefined α value. When the degrees of freedom using the Satterthwaite approximation are less than four for an analysis using robust variance estimation, the distribution of the test statistic no longer approximates the t-distribution, and type I error can be larger than implied (Tanner-Smith et al., 2016). In these cases, a stricter α of .01 is used as suggested by Tanner-Smith et al. (2016). In tandem, the criterion for nonsignificant trends is also stricter in these cases: trends are only noted and discussed when the p value is between .01 and .05.
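The decision rule in the preceding paragraph can be summarized in a short Python sketch. The function name is hypothetical, and the handling of p values that fall exactly on a boundary (treated here as belonging to the less significant category) is an assumption, because the text does not specify it.

```python
def classify_result(p_value: float, satterthwaite_df: float) -> str:
    """Label a test as 'significant', 'trend', or 'nonsignificant'.

    Uses alpha = .05 (trend: .05 to .10) by default and the stricter
    alpha = .01 (trend: .01 to .05) when the Satterthwaite df are below four.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p value must lie in [0, 1]")
    if satterthwaite_df < 4:
        alpha, trend_upper = 0.01, 0.05
    else:
        alpha, trend_upper = 0.05, 0.10
    if p_value < alpha:
        return "significant"
    if p_value < trend_upper:
        return "trend"
    return "nonsignificant"


print(classify_result(0.03, satterthwaite_df=10.0))  # significant
print(classify_result(0.03, satterthwaite_df=3.2))   # trend
print(classify_result(0.07, satterthwaite_df=3.2))   # nonsignificant
```

The same p value of .03 is therefore significant with ample degrees of freedom but only a trend when the degrees of freedom are below four.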
Study Characteristics
In total, 48 articles containing 51 experiments were included in the meta-analysis as shown in Table 2. These studies were published between 2013 and 2019 and reported data from 1972 participants ranging in age from 15 to 86 years old (M = 33.59, SD = 14.80). For the studies that reported sex, there were 1139 males and 749 females.
Table 2. Studies Coded in the Meta-Analysis
Note. Exp. = experiment; F = female; M = male; man. = manual; NDRT = non-driving related task; N = number of participants; vis. = visual.
ᵃ Sample characteristics include, when available, the range, mean, and standard deviation of the participants’ age and the number of males and females in the sample.
ᵇ The visual and manual operator resource demands of NDRTs were coded using the description of the task(s) employed by each study. NDRTs were coded as visual if they required the driver to look at the NDRT for successful completion; otherwise, NDRTs were coded as nonvisual. NDRTs were coded as manual if they required the use of the driver’s hands for successful completion; otherwise, NDRTs were coded as nonmanual.
Synthesis of Results
The results from each comparison, including the number of effects (k), number of independent samples (s), total participants (N), weighted mean of Cohen’s d (d), weighted mean of raw mean difference (D), 95% confidence intervals, and 95% credibility intervals, are shown in Table 3. To interpret Cohen’s d, values of 0.2, 0.5 and 0.8 are used to reflect small, medium, and large effect sizes, respectively (Cohen, 1988).
Table 3. Meta-Analysis of the Effects of Time Budget, Non-Driving Related Task, and Information Support on Takeover Timing and Quality Measures
Note. CI = confidence interval for Cohen’s d; CrdI = credibility interval for Cohen’s d; D = weighted mean of raw mean differences; d = weighted mean of Cohen’s ds; k = number of effect sizes; N = total number of participants; s = number of independent samples. The D column for takeover timing is blank because it combines timing measures with different end points; the D column for takeover quality measures is blank because these measures have different units. For all effect sizes except for takeover quality, a positive effect size indicates a larger value for the first condition (i.e., the condition that appears to the left of vs.) compared to the second condition (i.e., the condition that appears to the right of vs.); for takeover quality, a positive effect size indicates poorer takeover quality for the first condition compared to the second condition. Effect sizes could only be meta-analyzed when there were at least two independent samples (i.e., studies). When fewer than two independent samples contributed effect sizes for a particular dependent measure, that dependent measure was omitted from the table. Information about how much a given study influenced the results is reported in the supplementary materials online.
ᵃ The degrees of freedom using the Satterthwaite approximation were less than four for these analyses. When this occurs, the distribution of the test statistic no longer approximates the t-distribution, and type I error can be larger than implied (Tanner-Smith et al., 2016). In these cases, a stricter α of .01 is used as suggested by Tanner-Smith et al. (2016).
†p < .10. *p < .05. **p < .01.
Short versus long time budget
Time budget is the time between the moment the TOR is issued and when the vehicle would have reached the system limit assuming no driver response. For the purposes of the current meta-analysis, a shorter time budget was defined as at least 5 s but less than 7 s, whereas a longer time budget was defined as at least 7 s but less than 9 s. This analysis examines whether these two time budget ranges affect takeover performance. Effect size was calculated as the value for the shorter time budget minus the value for the longer time budget (and divided by standard deviation for Cohen’s d). Accordingly, a positive effect size indicates a larger value for the shorter time budget compared to the longer time budget. None of the mean effect sizes were significantly different from zero, even when measures were collapsed into the broad categories of takeover timing and takeover quality. With this in mind, the mean effect sizes indicated the following. Time budget had a moderate effect size (d = −0.46) on takeover timing, which is a broad category reported separately from the specific takeover timing measures it includes (gaze reaction time, road fixation time, hands-on time, and takeover time). More specifically, there was a small effect size for hands-on time (d = −0.39, D = −0.22) and a moderate effect size for takeover time (d = −0.62, D = −0.42) such that a shorter time budget yielded faster hands-on and takeover times compared to a longer time budget. As we discuss later, faster takeover timing is not necessarily better. A shorter time budget also led to much harder braking (d = 3.99) and moderately more collisions (d = 0.49) compared to a longer time budget. With regard to lane positioning, there was a negligible effect (d = −0.07) for time budget. Overall, the broad category takeover quality, which includes minimum distance, braking magnitude, steering magnitude, lane positioning, and collisions, was slightly worse (d = 0.17) for a shorter time budget compared to a longer time budget.
NDRT versus no NDRT
Drivers may engage or not engage in NDRTs while automation is active. This analysis examines whether this affects the subsequent takeover performance when the ADS issues a TOR. Effect size was calculated as engaging in NDRTs minus not engaging in NDRTs (and divided by standard deviation for Cohen’s d). Accordingly, a positive effect size indicates a larger value for engaging in NDRTs compared to not engaging in NDRTs. Engagement in NDRTs did have a significant effect on takeover timing (d = 0.38). More precisely, there was a large effect on road fixation time (d = 3.75, D = 0.68), small effect on hands-on time (d = 0.34, D = 0.27), and small effect on takeover time (d = 0.33, D = 0.25) such that engaging in NDRTs led to later road fixation, hands-on, and takeover times compared to not engaging in NDRTs. However, the mean effect sizes for road fixation time and hands-on time were not significantly different from zero. Descriptively, each of the takeover quality measures showed engaging in NDRTs before a takeover led to worse takeover quality compared to not engaging in NDRTs. In particular, minimum distance (d = −0.20), braking magnitude (d = 0.17), and collisions (d = 0.19) all had small effect sizes such that there was a shorter minimum distance, harder braking, and more collisions when drivers engaged in NDRTs compared to when they did not engage in NDRTs, but none of these mean effect sizes were significantly different from zero. Lane positioning was significantly more variable (d = 0.55) when drivers engaged in NDRTs compared to not engaging in NDRTs. Steering magnitude had a negligible effect size (d = 0.06) and was not significantly different from zero. Altogether, engaging in NDRTs before a takeover led to significantly worse takeover quality (d = 0.29) compared to not engaging in NDRTs.
Visual versus nonvisual NDRT
If drivers engage in an NDRT while automation is active, the NDRT could have visual resource demands (e.g., watching a video) or not (e.g., listening to radio). This analysis examines whether this affects the subsequent takeover performance when the ADS issues a TOR. Effect size was calculated as engaging in a visual NDRT minus engaging in a nonvisual NDRT (and divided by standard deviation for Cohen’s d). Accordingly, a positive effect size indicates a larger value for engaging in a visual NDRT compared to engaging in a nonvisual NDRT. Visual resource demands by the NDRT had a significant effect on takeover timing (d = 0.47). Specifically, there was a large, medium, and small effect size for road fixation time (d = 0.83, D = 0.77), hands-on time (d = 0.50, D = 0.29), and takeover time (d = 0.30, D = 0.25), respectively, such that engaging in a visual NDRT led to later road fixation, hands-on, and takeover times compared to a nonvisual NDRT. However, the mean effect sizes for road fixation time and hands-on time were not significantly different from zero. Minimum distance (d = −0.28), braking magnitude (d = 0.23), and lane positioning (d = 0.22) all had small effect sizes such that there was a shorter minimum distance, harder braking, and more variable lane positioning when engaging in a visual NDRT compared to a nonvisual NDRT, but none of these mean effect sizes were significantly different from zero. There were moderately more collisions (d = 0.54) when engaging in visual NDRTs compared to nonvisual NDRTs, but this was also not significantly different from zero. Steering magnitude (d = 0.13) trended higher when engaging in a visual NDRT compared to a nonvisual NDRT. On the whole, takeover quality (d = 0.14) trended slightly worse when engaging in a visual NDRT compared to a nonvisual NDRT.
Manual versus nonmanual NDRT
If drivers engage in an NDRT while automation is active, the NDRT could have manual resource demands (e.g., playing a smartphone game) or not (e.g., watching a video). This analysis examines whether this affects the subsequent takeover performance when the ADS issues a TOR. Effect size was calculated as engaging in a manual NDRT minus engaging in a nonmanual NDRT (and divided by standard deviation for Cohen’s d). Accordingly, a positive effect size indicates a larger value for engaging in a manual NDRT compared to engaging in a nonmanual NDRT. When the NDRT had manual resource demands, takeover timing (d = 0.24) trended longer compared to a nonmanual NDRT. In particular, there was a small effect on both road fixation time (d = 0.24, D = 0.30) and hands-on time (d = 0.28, D = 0.19) and very small effect on takeover time (d = 0.10, D = 0.11) such that engaging in a manual NDRT led to later road fixation, hands-on, and takeover times compared to a nonmanual NDRT. However, these specific timing measures were not significantly different from zero. Minimum distance (d = −0.17), braking magnitude (d = 0.13), and steering magnitude (d = 0.18) had small to very small effect sizes not significantly different from zero such that there was a shorter minimum distance, harder braking, and larger steering magnitude when engaging in a manual NDRT compared to a nonmanual NDRT. There was a trend of more collisions (d = 0.41) when engaging in a manual NDRT compared to a nonmanual NDRT. Lane positioning (d = 0.04) had a negligible effect size. Altogether, takeover quality (d = 0.12) trended worse when the NDRT had manual resource demands compared to no manual resource demands.
Visual, manual versus nonvisual, nonmanual NDRT
If drivers engage in an NDRT while automation is active, the NDRT could have both visual and manual resource demands (e.g., playing a smartphone game) or neither (e.g., listening to radio). This analysis examines whether this affects the subsequent takeover performance when the ADS issues a TOR. Effect size was calculated as engaging in a visual, manual NDRT minus engaging in a nonvisual, nonmanual NDRT (and divided by standard deviation for Cohen’s d). Accordingly, a positive effect size indicates a larger value for engaging in a visual, manual NDRT compared to engaging in a nonvisual, nonmanual NDRT. When the NDRT had visual and manual resource demands, takeover timing (d = 0.58) was significantly longer compared to when the NDRT did not have either of these resource demands. Specifically, there was a medium and small effect size for hands-on time (d = 0.68, D = 0.38) and takeover time (d = 0.38, D = 0.34), respectively, such that engaging in a visual, manual NDRT led to longer hands-on and takeover times compared to a nonvisual, nonmanual NDRT. However, hands-on time was not significantly different from zero. Minimum distance (d = −0.36) and steering magnitude (d = 0.16) had small effect sizes such that the minimum distance trended smaller and steering magnitude trended larger when engaging in a visual, manual NDRT compared to a nonvisual, nonmanual NDRT. Braking magnitude (d = 0.23), lane positioning (d = 0.38), and collisions (d = 0.56) had small to medium effect sizes not significantly different from zero such that braking was harder, lane positioning was more variable, and there were more collisions when engaging in a visual, manual NDRT compared to a nonvisual, nonmanual NDRT. Overall, the takeover quality (d = 0.17) was descriptively worse when engaging in a visual, manual NDRT compared to a nonvisual, nonmanual NDRT, but it was not significantly different from zero.
Less versus more information support
To facilitate the transition of vehicle control, the ADS could provide more information to the driver in addition to the need to take over. For instance, it could highlight the cause of the TOR, indicate whether the adjacent lane is open, or recommend a specific action. This analysis examines whether these forms of higher information support during the takeover affect takeover performance. Effect size was calculated as less information support minus more information support (and divided by standard deviation for Cohen’s d). Accordingly, a positive effect size indicates a larger value for less information support compared to more information support. None of the measures analyzed were significantly different from zero even when measures were collapsed into the broad categories of takeover timing and takeover quality. Taking this into account, the mean effect sizes indicated the following. There were small effect sizes for both steering magnitude (d = 0.26) and collisions (d = 0.36) such that there was a greater steering magnitude and more collisions when drivers were provided with less information support compared to more information support. There was a very small effect on lane positioning (d = −0.14) such that lane positioning was less variable when drivers were provided with less information support compared to more information support. The effect sizes for takeover time (d = 0.05, D = −0.38), takeover timing (d = 0.05), minimum distance (d = 0.00), braking magnitude (d = 0.04), and takeover quality (d = 0.03) were negligible. These findings suggest information support did not have an effect on takeover performance.
Publication Bias
Publication bias is the phenomenon where studies with statistically significant results are more likely to be published than studies that do not have statistically significant results (Borenstein et al., 2009). Although publication bias in the current meta-analysis was minimized using thorough search methods, its presence was still checked by testing whether sample size (N) could predict effect size (Cohen’s d) using meta-regression within the robust variance estimation framework (Hedges et al., 2010; Tanner-Smith et al., 2016). Given that small studies are more vulnerable to not being published because they are less likely to reach statistical significance, a significantly negative relationship is indicative of publication bias because it suggests there are studies with small sample sizes and small effect sizes missing from the published body of literature. This method requires at least 10 studies for a given comparison (Cochrane Collaboration, 2011b). Accordingly, only comparisons with at least 10 studies were assessed using this method. As seen in Table 4, two-tailed t-tests showed the coefficient (b) for sample size was not significantly different from zero in 10 out of the 11 analyses, meaning sample size did not significantly predict effect size. However, there was a negative trend for takeover quality for the less versus more information support comparison. Altogether, there was little evidence that publication bias was present in the current meta-analysis.
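The logic of this check can be sketched with an ordinary least squares regression of Cohen’s d on sample size. This is a deliberately simplified stand-in, not the robust variance estimation meta-regression used in the paper: it ignores inverse-variance weights and the dependence among effect sizes from the same sample, and it yields no robust standard error or t-test. The function name and all values below are hypothetical and serve only to show what a negative coefficient (b) looks like.

```python
def ols_slope_intercept(x, y):
    """Closed-form simple linear regression of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-study sample sizes and effect sizes (10 studies, the stated minimum).
sample_sizes = [12, 20, 24, 30, 36, 40, 48, 60, 64, 72]
cohens_d = [0.90, 0.75, 0.70, 0.55, 0.50, 0.45, 0.40, 0.30, 0.30, 0.25]

b, intercept = ols_slope_intercept(sample_sizes, cohens_d)
# A negative b means smaller studies report larger effects, the pattern that
# would be consistent with small, small-effect studies missing from the literature.
print(f"b = {b:.4f}")
```

In these made-up data, effect size falls as sample size rises, so b is negative. In the actual analysis, whether such a slope counts as evidence of publication bias depends on the significance test reported in Table 4.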
Meta-Regression With Sample Size as Predictor
Note. NDRT = non-driving related tasks.
aThe degrees of freedom using the Satterthwaite approximation was less than four for these analyses. When this occurs, the distribution of the test statistic no longer approximates the t-distribution, and type I error can be larger than implied (Tanner-Smith et al., 2016). In these cases, a stricter α of .01 is used as suggested by Tanner-Smith et al. (2016).
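The decision rule in this note can be written as a short sketch. The cutoff of four degrees of freedom and the stricter α of .01 come from the note itself; the default α of .05 is an assumption inferred from the significance markers used in the tables, and the function names are hypothetical.

```python
def significance_threshold(df, default_alpha=0.05, strict_alpha=0.01, df_cutoff=4.0):
    """Return the alpha level to apply given Satterthwaite degrees of freedom."""
    # Below the cutoff the t approximation is unreliable, so a stricter alpha is used.
    return strict_alpha if df < df_cutoff else default_alpha


def is_significant(p_value, df):
    """Compare a p value against the df-dependent alpha level."""
    return p_value < significance_threshold(df)


print(is_significant(0.03, df=10.0))  # usual alpha of .05 applies
print(is_significant(0.03, df=2.5))   # stricter alpha of .01 applies
```

Under this rule, the same p value can be significant for an analysis with ample degrees of freedom and nonsignificant for one with fewer than four.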
Moderator Analysis
Many of the credibility intervals in Table 3 are wide, indicating there are moderators present (Schmidt & Hunter, 2015). Moderators are third variables that affect the relationship between an independent variable and a dependent variable. For example, Hypothesis 2.2 predicts that the relationship between NDRT engagement and takeover performance is affected by NDRT resource demands. Specifically, Hypothesis 2.2 predicts that when the NDRT has overlapping resource demands with the driving task, takeover performance will be worse compared with when the NDRT has no overlapping resource demands. This relationship is tested below. At least 10 studies are needed to test for moderators using meta-regression (Cochrane Collaboration, 2011b), which limited the moderator analyses that could be performed.
For the comparison of engaging in NDRTs versus not engaging in NDRTs, meta-regression was used to test whether effect size (Cohen’s d) was moderated by overlapping resource demands with the driving task (i.e., visual or manual resource demands) for takeover timing, takeover time, and takeover quality. To create the meta-regression model, overlapping resource demands were dummy coded such that NDRTs without overlapping resource demands were coded as the reference category (i.e., the intercept). As shown in Table 5, when the NDRT had overlapping resource demands the effect size (Cohen’s d) trended 0.50 larger for takeover timing and 0.38 larger for takeover time compared with when the NDRT had nonoverlapping resource demands. Takeover quality was not significantly different between conditions.
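The dummy coding described above can be illustrated with a minimal sketch. With a single 0/1 predictor, an unweighted least squares fit returns the mean effect size of the reference category as the intercept and the difference between the two category means as the coefficient. The sketch omits the weighting and robust variance estimation of the actual meta-regression, and the effect sizes are hypothetical values chosen for easy arithmetic, not results from Table 5.

```python
def dummy_regression(effect_sizes, overlapping):
    """Unweighted OLS of effect size on a 0/1 dummy; returns (intercept, coefficient)."""
    n = len(effect_sizes)
    mean_x = sum(overlapping) / n
    mean_y = sum(effect_sizes) / n
    sxx = sum((x - mean_x) ** 2 for x in overlapping)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(overlapping, effect_sizes))
    coefficient = sxy / sxx
    intercept = mean_y - coefficient * mean_x
    return intercept, coefficient


# Hypothetical Cohen's d values; 0 = nonoverlapping demands (reference), 1 = overlapping.
d_values = [0.10, 0.20, 0.30, 0.60, 0.70, 0.80]
overlap = [0, 0, 0, 1, 1, 1]

intercept, coefficient = dummy_regression(d_values, overlap)
print(f"intercept = {intercept:.2f}, coefficient = {coefficient:.2f}")
```

Here the intercept equals the mean d for the nonoverlapping studies and the coefficient is how much larger the mean d is for the overlapping studies, which is how the values of 0.50 and 0.38 above should be read.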
Meta-Regression for the Effect of Overlapping Resource Demands
Note. CI = confidence interval for Cohen’s d; CrdI = credibility interval for Cohen’s d; D = weighted mean of raw mean differences; d = weighted mean of Cohen’s ds; k = number of effect sizes; N = total number of participants; s = number of independent samples. The D column for takeover timing is blank because it combines timing measures with different end points; the D column for takeover quality measures is blank because these measures have different units. For all effect sizes except for takeover quality, a positive effect size indicates a larger value for the first condition (i.e., the condition that appears to the left of vs.) compared to the second condition (i.e., the condition that appears to the right of vs.); for takeover quality, a positive effect size indicates poorer takeover quality for the first group compared to the second group. The intercept reflects nonoverlapping resource demands (reference category).
aThe degrees of freedom using the Satterthwaite approximation was less than four for these analyses. When this occurs, the distribution of the test statistic no longer approximates the t-distribution, and type I error can be larger than implied (Tanner-Smith et al., 2016). In these cases, a stricter α of .01 is used as suggested by Tanner-Smith et al. (2016).
†p < .10. *p < .05. **p < .01.
For the comparison of less versus more information support, meta-regression was used to test whether the effect size (Cohen’s d) was moderated by level of higher information support (high information acquisition, information analysis, and decision selection) for takeover timing, takeover time, and takeover quality. To create the meta-regression model, high information acquisition was coded as the reference category (i.e., the intercept), and information analysis and decision selection each had their own dummy variable. As shown in Table 6, high information acquisition was not significantly different from information analysis or decision selection in terms of takeover timing, takeover time, or takeover quality.
Meta-Regression for the Effect of Level of Information Support
Note. CI = confidence interval for Cohen’s d; CrdI = credibility interval for Cohen’s d; D = weighted mean of raw mean differences; d = weighted mean of Cohen’s ds; k = number of effect sizes; N = total number of participants; s = number of independent samples. The D column for takeover timing is blank because it combines timing measures with different end points; the D column for takeover quality measures is blank because these measures have different units. For all effect sizes except for takeover quality, a positive effect size indicates a larger value for the first condition (i.e., the condition that appears to the left of vs.) compared to the second condition (i.e., the condition that appears to the right of vs.); for takeover quality, a positive effect size indicates poorer takeover quality for the first group compared to the second group. The intercept reflects high information acquisition (reference category).
aThe degrees of freedom using the Satterthwaite approximation was less than four for these analyses. When this occurs, the distribution of the test statistic no longer approximates the t-distribution, and type I error can be larger than implied (Tanner-Smith et al., 2016). In these cases, a stricter α of .01 is used as suggested by Tanner-Smith et al. (2016).
†p < .10. *p < .05. **p < .01.
Discussion
The current meta-analysis focused on three research questions that addressed time budget, NDRTs, and information support at takeover. The discussion is organized by research question.
Research Question 1: Does Time Budget Affect Takeover Performance?
Overall, the results from each of the measures for the time budget comparison suffered from a limited number of studies. There were three independent samples or fewer contributing to most measures. This could explain why none of the mean effect sizes were significantly different from zero. Descriptively, takeover timing was faster when drivers had to take over with the shorter time budget compared to the longer time budget, and takeover quality was a little worse when the time budget was shorter compared to longer, though neither mean effect size was significantly different from zero. The meta-analysis by Zhang et al. (2019) also found drivers took over faster when given a shorter time budget compared to a longer time budget. Gold et al. (2016) suggest that longer takeover times are indicative of better takeover performance because drivers take the time to regain situation awareness before taking over. The pattern of mean effect sizes from the current meta-analysis tentatively supports this argument and Hypothesis 1 because when drivers were given extra time to take over, they tended to take over more slowly, which corresponded with better takeover quality. These results appear to reflect a speed-accuracy tradeoff such that faster takeovers can only be achieved if takeover quality is sacrificed. This support is tenuous, though, because the results of this meta-analysis did not eliminate the possibility that time budget had no effect on takeover performance. Additionally, our conclusions are specific to the time budgets analyzed (5–9 s) and may not hold for time budgets beyond this range.
Research Question 2: Does Engaging in NDRTs Affect Takeover Performance?
The NDRT versus no NDRT results showed takeover timing was longer and takeover quality was worse when drivers engaged in NDRTs compared to not engaging in NDRTs. In particular, drivers took over vehicle control a little slower and had moderately more variability in lane positioning when engaging in NDRTs. Unlike the first research question results, a longer takeover time coincides with worse takeover quality. This finding suggests longer takeover times are not always better, and takeover time has only limited utility in assessing takeover performance. One theoretical explanation for this is that the extra time the driver took to take over was not used to regain situation awareness and instead was used to switch tasks, which may involve putting away the NDRT, shifting attention, and retrieving task-specific goals. Drivers may also become more out of the loop when engaging in NDRTs and consequently take longer to regain situation awareness. This could be tested in future research by querying drivers’ situation awareness (Durso & Dattel, 2004; Endsley, 1988) or using eye-tracking based methods to measure situation awareness (Samuel et al., 2016). Thus, even though the driver is engaging in the NDRT while they are not responsible for the driving task, engaging in NDRTs does impair takeover performance, supporting Hypothesis 2.1. This means that even if drivers were trying to optimize takeover performance by shedding NDRTs, they did not do so effectively because they still experienced a takeover performance decrement. The subsequent results show the degree of the takeover performance decrement depends on whether the NDRT has overlapping resource demands with the driving task.
The visual versus nonvisual NDRT results and the visual, manual versus nonvisual, nonmanual NDRT results showed that takeover timing was moderately longer and takeover time was a little longer when drivers engaged in NDRTs with overlapping resource demands compared to NDRTs without overlapping resource demands. Takeover timing for the manual versus nonmanual NDRT comparison was a little longer when drivers engaged in NDRTs with overlapping resource demands compared to NDRTs without overlapping resource demands as well, but it only trended in that direction. The moderator analysis also showed a trend that takeover timing was moderately longer and takeover time was a little longer when drivers engaged in NDRTs with overlapping resource demands compared to NDRTs without overlapping resource demands. Thus, drivers take longer to take over when NDRTs have overlapping resource demands with the driving task.
Specific takeover quality measures and takeover quality broadly did not reach significance for any of the three analyses that examine the effect of overlapping resource demand, but several measures were trending in one direction. In particular, steering magnitude was slightly greater for visual NDRTs compared to nonvisual NDRTs, collisions were slightly greater for manual NDRTs compared to nonmanual NDRTs, minimum distance was slightly shorter for visual-manual NDRTs compared to nonvisual, nonmanual NDRTs, and steering magnitude was slightly greater for visual-manual NDRTs compared to nonvisual, nonmanual NDRTs. Additionally, the visual versus nonvisual NDRT and manual versus nonmanual NDRT analyses showed takeover quality broadly was slightly worse when the NDRT had overlapping resource demands compared to NDRTs without overlapping resource demands. These findings favor the conclusion that engaging in NDRTs with overlapping resource demands with the driving task leads to worse takeover performance as predicted by Wickens’ multiple resource theory, supporting Hypothesis 2.2.
Implementing laws that restrict NDRT use during level 3 automated driving is one practical implication that might be drawn from these results. However, implementing such laws would undermine one of the objectives of level 3 automated vehicles, which is to free the driver from the driving task so that they can complete other tasks. Moreover, drivers may choose to ignore such laws and engage in NDRTs anyway. A better approach would be to ensure drivers have enough time to get back in the loop even if they are engaging in a visual-manual NDRT. This follows a user-centered design approach in which the system is designed around the user. Such an approach should be recommended by standards organizations such as SAE International (2018), which currently gives little guidance on how to determine how much time to give drivers.
One way to ensure drivers have enough time to get back in the loop is investing in the development and deployment of technologies that increase an ADS’s ability to predict system limits (Zhang et al., 2019). For example, implementing vehicle-to-everything (V2X; Chen et al., 2017; Harding et al., 2014) services would allow vehicles to wirelessly communicate with each other and roadway infrastructure about roadway conditions ahead that are not yet in sight (e.g., collisions, construction), and a level 3 automated vehicle or above could use this information to determine whether a system limit is approaching. For level 4 automated vehicles, automobile manufacturers could incorporate technologies to monitor the driver’s preparedness to take over vehicle control, taking into account engagement in NDRTs and their resource demands, and dynamically decide whether to issue a TOR or implement a minimal risk maneuver based on which poses less risk. For instance, if the ADS can no longer handle the driving task, having a driver who is listening to a podcast (which has no overlapping resource demands with the driving task) take over vehicle control in the middle of a multi-lane highway may pose less risk than having the ADS bring the vehicle to a stop in the middle of a multi-lane highway, which poses considerable risk.
Research Question 3: Does Providing the Driver With Information Support Affect Takeover Performance?
Overall, there was a lack of evidence that information support had an effect on takeover performance. None of the measures analyzed were significantly different from zero even when measures were collapsed into the broad categories takeover timing and takeover quality. Additionally, the mean effect sizes for the broad categories takeover timing and takeover quality were negligible. Thus, no support for Hypothesis 3 was found. In line with the current findings, Zhang et al. (2019) found information support had no effect on takeover time, though they examined a narrower question about whether TORs that indicate the direction of a hazard affect takeover time. They acknowledged that this could have been because information support only affects takeover quality, which they did not meta-analyze, but the current results show no support for this hypothesis.
It is possible that not all levels of information support assist the driver with processing information and coming back into the loop. In other words, it is possible that recommending a specific action such as braking (decision selection) supports the driver, but highlighting the cause of the TOR (high information acquisition) and indicating whether the adjacent lane is open (information analysis) does not support the driver. A moderator analysis was conducted to test this, and it found no significant differences between levels of information support. Interestingly, the two studies that compared decision selection to low information acquisition and measured takeover quality had conflicting results (Borojeni et al., 2017, 2018). Thus, it is unclear whether instructing the driver to initiate a specific action at takeover improves takeover performance. This situation in which the ADS instructs the driver to perform a specific action at takeover is paradoxical, though, because if the ADS knows the appropriate maneuver to perform, it is unclear why it would not perform the maneuver itself.
Another point to consider is the possibility that the specific implementations of information support that prior research has examined were inadequate, yet some other unexamined implementation of information support would improve takeover performance. The current results cannot resolve this issue and can only suggest that specific implementations of information support examined by prior research had no effect on takeover performance. Future research should explore this issue further before definitive conclusions are made.
Limitations
It is important to acknowledge limitations of this meta-analysis. There were a limited number of independent samples that contributed to some measures of some of the comparisons. The implication of this limitation is reduced power to detect real effects, reduced utility of the credibility intervals, and higher Type I error rate when the degrees of freedom dropped below four. One reason there were limited independent samples is that this is a new area of research. The earliest included study was published in 2013, and the most recently included study was published in 2019. Moreover, 94% of the included studies were published in 2015 or after, meaning only 5 years of research made up the overwhelming proportion of studies meta-analyzed. Another reason there were a limited number of independent samples for many of the measures is that some studies did not collect a particular measure. In some cases, this may be acceptable (e.g., collecting other takeover timing measures when takeover time is collected). However, some studies failed to collect any takeover quality measure, which is a limitation of these studies. Lastly, some studies did not adequately report summary statistics about a particular measure or provide them after being contacted.
Another limitation is that there are likely undetected moderators present given the width of the credibility intervals. Unfortunately, they could not be tested because many of the comparisons had fewer than 10 studies. Other driving meta-analyses often test whether research setting (e.g., simulator, on road) is a moderator (Caird et al., 2018; Simmons et al., 2017), but this was not possible in the current paper because only one included study was conducted on the road (Naujoks et al., 2019). These other meta-analyses found research setting had no effect on effect size. The fidelity of a driving simulator could also be a moderator, but this would be difficult to test due to the lack of details reported about the driving simulators used by the included studies. Two of the most common details reported were ability to emulate motion (28 fixed-base; 16 motion-base simulators) and horizontal field-of-view (M = 216.22°, Mdn = 180°, SD = 77.40°, n = 37).
Also, the current meta-analysis focused on mean takeover performance, but the edges of the distribution may be more pertinent to certain research questions (Eriksson & Stanton, 2017). For example, to establish the minimum time budget drivers need to take over, even the near-worst (e.g., 5th percentile) performing drivers should be able to take over safely. We were unable to meta-analyze the edges of the takeover performance distribution because only a few of the included studies reported such statistics (Happee et al., 2017; Naujoks et al., 2019; Wandtner, 2018). Future research should consider reporting statistics on the edges of the distribution.
In addition, we only examined papers written in English. Although many authors published their papers in English even when the official language of their country was not English (e.g., Borojeni et al., 2016, 2017, 2018; Gold et al., 2013, 2015, 2016; Wandtner, 2018), there were still some papers that were not published in English. For example, Zhang et al. (2019) found several German papers on takeovers.
Another limitation is that the included studies instructed participants to engage in a particular NDRT instead of allowing them to voluntarily engage in NDRTs. This was done to maintain experimental control, but it assumes that the effects of forced interaction on takeover performance are a reasonable predictor of the effects of voluntary engagement. Only limited research has addressed this assumption (Clark & Feng, 2017).
Conclusion
The current meta-analysis of 48 studies synthesized the research on the takeover of vehicle control following automated driving and specifically addressed research questions regarding time budget, NDRTs, and information support at takeover. Descriptively, the analysis of time budget showed takeover timing was faster and takeover quality was worse for the shorter time budget compared to the longer time budget. This pattern of results suggests that takeover performance was descriptively better when drivers were given a longer time budget. Engaging in NDRTs resulted in worse takeover performance, especially if the NDRTs had overlapping resource demands with the driving task. Finally, implementations of information support examined by prior research did not improve takeover performance.
Future Research
Future research and implementation should focus on ways of providing the driver more time to get back in the loop and explore alternative implementations of information support. Future research should also further explore whether forced NDRT engagement is reasonably predictive of voluntary NDRT engagement. Finally, researchers should measure takeover quality in addition to takeover timing and report summary statistics about their data to assist the reader and meta-analysts in understanding their results.
Key Points
Engaging in non-driving related tasks while automation is active and prior to a takeover degrades takeover performance.
Weak evidence showed takeover performance was better when drivers were given a longer time budget.
The results suggest that current implementations of providing information to the driver about roadway hazards or actions to take during the takeover did not affect takeover performance, but this needs to be explored in future research before definitive conclusions can be made.
Ensuring the driver has enough time to get back in the loop should be the focus of future research and application.
Footnotes
Acknowledgments
This article is based on the master’s thesis completed by Bradley W. Weaver. We thank the thesis committee members, Frederick L. Oswald and Philip Kortum, for their feedback on earlier drafts. We thank Frederick L. Oswald, Scott B. Morris, and Elizabeth Tipton for their input on the meta-analytic methods. We also thank Rosemary Yang and Sai Lammata for helping pull and organize the references from the articles.
Author Biographies
Bradley W. Weaver is a PhD student in the Human Factors and Human–Computer Interaction Program at Rice University. He obtained his MA in psychological sciences at Rice University in 2020.
Patricia R. DeLucia is a professor in the Department of Psychological Sciences at Rice University. She is a Fellow of the American Psychological Association, Association for Psychological Science, Human Factors and Ergonomics Society, and Psychonomic Society. She obtained her PhD in experimental psychology at Columbia University in 1989.
