Abstract
Objective
The present study examines the effect of an existing driver training program, FOrward Concentration and Attention Learning (FOCAL), on young drivers’ calibration, that is, their ability to estimate the length of their in-vehicle glances while driving, using two different measures: normalized difference scores and Brier scores.
Background
Young drivers are poor at maintaining attention to the forward roadway while driving a vehicle. Additionally, drivers may overestimate their attention maintenance abilities. Driver training programs such as FOCAL may train target skills such as attention maintenance but also might serve as a promising way to reduce errors in drivers’ calibration of their self-perceived attention maintenance behaviors in comparison to their actual performance.
Method
Thirty-six participants completed either FOCAL or a Placebo training program, immediately followed by driving simulator evaluations of their attention maintenance performance. In the evaluation drive, participants navigated four driving simulator scenarios during which their eyes were tracked. In each scenario, participants performed a map task on a tablet simulating an in-vehicle infotainment system.
Results
FOCAL-trained drivers maintained their attention to the forward roadway more and showed better calibration on the normalized difference measure than Placebo-trained drivers. However, the Brier scores did not distinguish the two groups on their calibration.
Conclusion
The study implies that FOCAL has the potential to improve not only young drivers’ attention maintenance skills but also their calibration of those skills.
Application
Driver training programs may be designed to train not only targeted higher cognitive skills but also driver calibration—both critical for driving safety in young drivers.
Introduction
Young novice drivers face a disproportionately higher fatal crash risk than experienced drivers. Young drivers aged 16–19 had fatal vehicular crash rates per 100 million miles driven that were five times higher than those of experienced drivers aged 30–70 (Insurance Institute for Highway Safety, 2018). McKnight and McKnight (2003) reported that over 65% of crashes involving young drivers were attributed to failures of cognitive skills, such as failing to maintain attention and searching poorly ahead, to the side, and to the rear, while risky behaviors such as high speed, following distance, and overtaking contributed only 7.6%. Recent driver training research has focused on such cognitive factors, including latent hazard anticipation (Pradhan et al., 2005; Unverricht et al., 2018b), latent hazard mitigation (Muttart et al., 2014), and attention maintenance (Chan et al., 2010; Divekar et al., 2013; Pradhan et al., 2009; Yamani et al., 2016). As yet, relatively underexplored are relationships between cognitive performance in these tasks and drivers’ perceived ability to perform the cognitive tasks, or driver calibration, both critical for young drivers’ road safety. The present driving simulator experiment aims to bridge this gap in the literature by examining the effect of an existing attention maintenance training program for young drivers on their performance and calibration.
Driver Calibration
Calibration is the difference between a subjective appraisal and an objective measure of the ability of interest (Horrey et al., 2015; Roberts et al., 2016). The smaller the difference between the subjective appraisal of one’s ability and the objective measure of one’s actual ability, the better calibrated an individual is. Appropriate calibration is an important aspect of safe driving (Kuiken & Twisk, 2001). Theories of demand regulation such as the task compatibility and interface model (TACM) state that drivers adjust their behaviors to balance driving demands with their self-assessed abilities (De Craen, 2010; Fuller, 2005). To engage in a successful and safe drive, drivers must be able to not only regulate their task demands in line with their own abilities but also have an accurate estimate of both. Errors in calibration can result in drivers performing strenuous maneuvers or failing to mitigate increased driving demands that surpass their ability, increasing crash risk (Deery, 1999).
The literature suggests that drivers overwhelmingly and consistently overestimate their own driving abilities (Amado et al., 2014; Freund et al., 2005; Horrey et al., 2015). For instance, an on-road study evaluating over 150 drivers found that roughly 95% of the drivers believed their own abilities to be better than their actual performance (Amado et al., 2014). Moreover, another study asked 181 drivers to self-appraise their own driving performance and found that they rated themselves higher than both their peers and the average driver across 18 different components of driving including overall skill, overall safety, and accident likelihood (Horswill et al., 2004).
Young drivers overestimate their driving abilities to a greater extent than experienced drivers (De Craen, 2010; Gregersen, 1996; Horrey et al., 2015; Matthews & Moran, 1986). Additionally, a longitudinal study found that drivers’ calibration did not improve during the first 2 years of their driving, suggesting that they continued to overestimate their abilities even with 2 years of driving experience (De Craen, 2010). This miscalibration is particularly dangerous for young drivers, as overestimation of driving skills is thought to be correlated with a high crash risk of young drivers (Gregersen, 1996; Matthews & Moran, 1986).
One potential reason why miscalibration might lead to increased crash risk is young and inexperienced drivers’ poor ability to estimate their own performance. The driver calibration framework (DCF) postulates that the stream of information that a driver processes, from selection, processing, and integration to response execution, impacts their perception of the state of the world and of their own current performance (Horrey et al., 2015). A key factor in the DCF is that feedback is essential for accurately assessing both the state of the world and the driver’s current performance, both necessary constituents for good calibration. Young drivers, who do not have a rich background of experience or precise feedback, might have underdeveloped skills and perceptions of those skills. However, one positive implication of the DCF is that limitations in performance or perceived performance can be improved through training and feedback.
Attention Maintenance and Training
One higher cognitive skill that is critical for safe driving is attention maintenance. Attention maintenance is the ability to maintain visual attention to the immediate forward roadway while controlling a vehicle. A 100-car naturalistic study showed increased crash risks when young drivers looked away from the forward roadway for more than 2 s, measured during a 6-s window that began 5 s before a crash or near crash and continued 1 s after it (Klauer et al., 2006). Several driving simulator studies have repeatedly shown that young drivers are especially poor at maintaining attention to the forward roadway while engaging in secondary in-vehicle tasks (Chan et al., 2010; Divekar et al., 2013; Pradhan et al., 2009; Yamani et al., 2016). In one study, for example, novice drivers made approximately 17% more in-vehicle glances longer than 2 s compared with experienced drivers (Pradhan et al., 2009).
FOrward Concentration and Attention Learning (FOCAL; Divekar et al., 2013; Pradhan et al., 2009, 2011) is a PC-based training program shown to be effective at training a driver to limit each in-vehicle glance to less than 2 s while controlling the vehicle. The training requires the driver to find a street on a map while viewing a series of video clips simulating the forward roadway during driving. The trainee must toggle between viewing the map or the forward roadway by pressing the space bar on the computer keyboard. Gradually, the trainee must limit the duration of “glances” at the map to less than 2 s. The training program features an error-based feedback training mechanism by allowing the user to make a mistake, mitigate the mistake through practice, and then master the target skills (3M approach). Such feedback may influence how drivers perceive and weigh information when making subjective appraisals of skill and performance within the DCF (Horrey et al., 2015).
FOCAL has been shown to be effective at decreasing the proportion and number of off-road glances longer than different threshold values including 2.5 s, 2 s, and 1.5 s (Divekar et al., 2013; Pradhan et al., 2011). For example, a driving simulator experiment found that FOCAL-trained drivers produced a 21% reduction in in-vehicle glances longer than 2 s compared with the control group (Divekar et al., 2013). Likewise, Pradhan et al. (2011) performed an on-road evaluation of FOCAL and found that FOCAL-trained drivers executed roughly 18% fewer in-vehicle glances longer than 2.5 s in comparison with the control group. Further, Divekar et al. (2016) showed that the training effect in reducing long in-vehicle glances persisted for up to 4 months after treatment.
However, driver training programs can cause a driver to become overconfident in their abilities, paradoxically decreasing their safety (Mayhew & Simpson, 2002). Previous research demonstrating FOCAL’s effectiveness cannot eliminate the possibility that the training might cause drivers to overestimate or underestimate the percentage of especially long in-vehicle glances they make. Though drivers may be learning to limit their percentage of glances over the 2-s threshold, they might not know how to recognize when they have successfully limited the percentage of excessively long glances. Thus, FOCAL-trained drivers who underestimate the percentage of excessively long glances they take are at risk of reverting to their pretraining baseline performance without being aware that they are doing so. However, FOCAL uses an error-feedback mechanism allowing drivers to make mistakes and correct them with specific feedback. Because of this copious feedback, the DCF suggests that FOCAL should improve not only drivers’ limiting of in-vehicle glances over a threshold but also their perception of their ability to limit excessively long in-vehicle glances. The current study investigates whether FOCAL trains drivers both to take shorter glances and to recognize when their glance exceeds the 2-s threshold.
Current Study
In the current study, thirty-six participants were randomly assigned to receive either a FOCAL or Placebo training program. Following the completion of the assigned program, they drove through four scenarios in a medium-fidelity driving simulator with their eyes tracked. In each scenario, they performed a mock map task on a tablet simulating an in-vehicle infotainment system (Unverricht et al., 2019a). As measures of driver calibration, normalized difference and the Brier score measures were used, following Roberts et al. (2016). After each drive, participants completed a questionnaire that assessed Brier scores. Once all drives were completed, the participants completed a final questionnaire that was used to calculate the normalized difference scores of driver calibration. We hypothesized that the proportion of off-road glances longer than 2 s would be lower for the FOCAL-trained drivers than the Placebo-trained drivers. Additionally, we hypothesized that FOCAL-trained drivers would be better calibrated, with their subjective perceptions of their attention maintenance behaviors matching their objective performance more closely, than the Placebo-trained drivers on both the normalized difference and the Brier score measures of calibration.
Method
Participants
Thirty-six undergraduate drivers between 18 and 21 years old were recruited from the community of Old Dominion University (ODU), Norfolk, VA. Eighteen drivers were randomly assigned to the FOCAL group (14 females, Mage = 18.47 years, SD = .78; mean years since licensure = 2.38 years, SD = .76) and 18 drivers to the Placebo group (16 females, Mage = 18.88 years, SD = .79; mean years since licensure = 2.31 years, SD = 1.51). All drivers held a valid driver’s license and received research credits for participation. This research complied with the tenets of the Declaration of Helsinki and was approved by the Institutional Review Board at ODU. Informed consent was obtained from each participant.
Apparatus and Materials
Driving simulator
A fixed-base medium-fidelity driving simulator (Real-time Technologies, Inc.) was used for the experiment. The simulator system consists of a built-up cabin, three 60” screens, and a dashboard screen with a 5.1 surround speaker system. The cab provides an adjustable seat for the driver, a steering wheel, pedals, seat belt, and gear shift. Each display projects a driving image generated at 120 Hz with a resolution of 1024 × 768 pixels. The distance between the driver and center screen was approximately 145 cm, resulting in a forward field of view of approximately 145°. Due to technical difficulties, no driving performance data were recorded.
Eye tracker
To record participants’ eye movements, a head-mounted ASL Mobile Eye (Applied Science Laboratories, Inc.) was used. The eye tracker consists of a spectacle-mounted unit (SMU) and a monocle. The SMU consists of two cameras: one records the external scene image, and the other records the eye, which is illuminated by infrared light that a set of LEDs emits and the monocle reflects into the eye. The eye camera tracks the eye movements by tracking a series of vectors produced by the relationship between the positions of the pupil and the corneal reflections of the LED lights. Eye Vision software was used to superimpose a crosshair indicating the driver’s gaze on the scene image.
Calibration questionnaires
Two questionnaires were used to measure participants’ calibration of their attention maintenance behaviors. Both were modeled after the NASA-Task Load Index (NASA-TLX; Hart & Staveland, 1988) and those used in Roberts et al. (2016). The first calibration questionnaire includes the following standard subscales from the NASA-TLX: mental demand, physical demand, temporal demand, frustration level, and effort, with the performance subscale further divided to offer more resolution in the specific tasks performed in the study. We asked participants to respond to these questions by marking a vertical line along a 10-cm ruler with low and high anchors. To measure calibration, participants were asked a single-item measure: “Please rate your performance on keeping your eyes on the forward roadway by limiting in-vehicle glances to less than 2 s during the Waze task.” This item in the questionnaire was used to compute the normalized difference scores between performance and self-appraisal (see Appendix A for the first calibration questionnaire).
The second questionnaire consisted of eight items allowing participants to self-appraise their performance across four metrics: attention maintenance, task performance, speed control, and lane positioning. Response options were the same as the first questionnaire. Participants responded to the questions by placing a mark along a continuum with low and high anchors. To measure calibration of attention maintenance, participants were asked a single-item measure: “Rate your performance on keeping your eyes to the forward roadway (Limiting in-vehicle glances to less than 2 s) during the Waze task,” followed by rating their confidence in their decision. This item in the questionnaire was used to compute the Brier score (see Appendix B for the second calibration questionnaire).
Map Task
Participants used a navigation application (Waze Mobile) via a Samsung Galaxy Tab E lite (Samsung Electronics America, Inc.) to report the distance between ODU and a target location. The tablet was placed approximately 57° from the driver’s line of sight. After the participants successfully entered the name of the target location, the distance found via the application appeared on the tablet display and remained present until the drive had ended. This task reflected those used in previous attention maintenance experiments (Bıçaksız et al., 2017; cf. Yamani et al., 2016). Each drive began with an auditory instruction asking participants to find the target location, followed by a beep indicating the beginning of the 15-s trial. Participants were asked to manually navigate the Waze application and verbally report the distance in miles between their current position at the university and the target location. Each trial took place approximately halfway through the drive (3,280 feet) on a straight road with no distractions or dynamic objects. Four target locations were used with varying distances away from ODU. After 15 s had passed, the simulator’s speaker system sounded another beep indicating the end of the trial and signaling participants to stop performing the task. If the participant did not report the correct distance by the second beep, their answer was coded as incorrect. An experimenter manually recorded the participant’s verbal response on each trial.
Driving Scenarios
Participants drove through four virtual environments, mirroring those used in previous work (Hamid et al., 2014; Yamani et al., 2016, 2018). All four environments (highway, residential, rural, and town) were 8,530 feet in length. Participants were instructed to drive following all normal traffic laws such as not exceeding the posted speed limits and stopping at all traffic lights. They were instructed to remain in their starting lane unless directed otherwise. To navigate through a scenario, drivers had full control of the vehicle functions and were to drive at the posted speed limit. There was no ambient traffic. Speed limits were either 35 mph or 45 mph, and the environments featured a variety of different configurations, as shown in Figure 1.

Figure 1. Top left: residential scenario. Top right: town scenario. Bottom left: rural scenario. Bottom right: highway scenario.
Training Programs
FOCAL
FOCAL is a computer-based training program created to train novice drivers to reduce the number of off-road glances longer than 2 s. FOCAL training took approximately 45 min to complete. Trainees were allowed to alternate using the space bar between two halves of the screen. The top half was a video representing the forward roadway and the bottom half a map. Progressively, trainees can make a mistake (look down for longer than 2 s), mitigate that mistake (learn how dangerous looking away for more than 2 s is), and master the target skill (practice until they limit all off-road glances to less than 2 s). For a full description of the training, see Pradhan et al. (2011).
Placebo
The Placebo program consists of information from the Virginia Driver’s Manual unrelated to attention maintenance (Sections 1, 4, and 5; https://www.dmv.virginia.gov/webdoc/pdf/dmv39.pdf). Trainees viewed PowerPoint slides and answered 10 multiple-choice questions at the end. The Placebo program took approximately 45 min to complete.
Procedure
All participants provided informed consent before participating in the experiment. They completed a demographics questionnaire and were randomly assigned to either the Placebo or FOCAL training group to receive the respective training program. After training, the participants were given instructions and three practice trials for the map task. Next, participants completed two practice drives to familiarize themselves with both the primary driving task and the secondary in-vehicle navigation task. The participants were instructed to drive as they normally would and to obey all traffic laws. The practice drives took approximately 3 min to complete. Participants were then fitted with the head-mounted eye tracker, which was calibrated using a nine-dot calibration procedure. Participants completed four experimental drives in a predetermined randomized order and filled out the calibration questionnaire for the Brier scores after each drive. After completing all of the experimental drives, participants completed the calibration questionnaire for the normalized difference scores and a driving history questionnaire. The experiment took approximately 2 hr to complete.
Dependent Variables
Objective attention maintenance
The ability to maintain attention to the forward roadway was measured by proportions of off-road glances longer than 2 s calculated for each trial. Each glance duration was defined as a time interval between the frame that the driver moves their eyes from the forward roadway and the frame that the driver’s eyes return to the forward roadway (e.g., Yamani et al., 2015). Any gaze that left the forward roadway and was directed toward the map task was counted as an off-road glance. Proportions of glances were calculated by dividing the total number of off-road glances longer than 2 s by the total number of off-road glances executed per trial. Eye glance data were only analyzed for the 15-s search task interval during each driving scenario.
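The proportion measure described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function name, the input format (a list of off-road glance durations in seconds for one 15-s trial), and the handling of trials with no off-road glances are all assumptions.

```python
def proportion_long_glances(durations_s, threshold_s=2.0):
    """Share of off-road glances in one trial that last longer than threshold_s.

    durations_s: off-road glance durations in seconds (hypothetical input format).
    Returns None when the trial has no off-road glances; how such trials were
    treated in the study is not stated, so this choice is an assumption.
    """
    if not durations_s:
        return None
    # "Longer than 2 s" is read here as a strict inequality.
    n_long = sum(1 for d in durations_s if d > threshold_s)
    return n_long / len(durations_s)


# Made-up trial with five off-road glances, two of them longer than 2 s.
example_trial = [0.8, 2.4, 1.5, 3.1, 1.9]
print(proportion_long_glances(example_trial))  # 0.4
```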
Map task performance
Map task performance was measured by calculating performance accuracy in the map task. If the participant reported an incorrect answer or did not report within the 15-s time limit, the trial was marked incorrect.
Driver calibration—normalized difference scores
Calibration scores were calculated using two different methods. The first measure of calibration required normalizing both the subjective and objective attention maintenance scores (proportion of glances longer than 2 s) using the max–min feature scaling method, which rescales each raw score into the restricted range of [0, 100]. The formula for normalizing scores is:
\[ x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \times 100 \]
where x is a raw score and x_min and x_max are the minimum and maximum values of that measure.
The difference between those two normalized scores determined the participant’s calibration score. The difference score was calculated by subtracting the normalized objective performance proportion from the normalized subjective performance proportion. Therefore, the formula for calculating the normalized difference measure of calibration is:
\[ \text{Calibration} = x_{\text{norm, subjective}} - x_{\text{norm, objective}} \]
Negative calibration scores suggest the driver underestimates their performance, whereas positive scores suggest the driver overestimates their performance. The closer to zero the participant’s score is, the better their calibration is.
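As a rough illustration of the normalized difference measure, the sketch below applies max–min scaling to one subjective and one objective score and subtracts them. Several details are assumptions rather than the authors' procedure: the function names, the use of scale bounds (instead of sample minima and maxima) as the normalization range, and the conversion of the objective score to "higher is better" (one minus the proportion of long glances) so that both scores point in the same direction before subtraction.

```python
def min_max_normalize(x, x_min, x_max):
    """Rescale x from [x_min, x_max] to [0, 100] (max-min feature scaling)."""
    if x_max == x_min:
        raise ValueError("x_max and x_min must differ")
    return (x - x_min) / (x_max - x_min) * 100.0


def calibration_score(subjective, objective, subj_range, obj_range):
    """Normalized subjective score minus normalized objective score.

    Positive values indicate overestimation, negative values underestimation,
    and values near zero indicate good calibration.
    """
    subj_norm = min_max_normalize(subjective, *subj_range)
    obj_norm = min_max_normalize(objective, *obj_range)
    return subj_norm - obj_norm


# Made-up example: a mark at 7.0 cm on the 10-cm ruler, and 36% long glances.
# Assumption: objective performance is expressed as 1 - proportion of long glances.
subjective_mark_cm = 7.0
objective_performance = 1.0 - 0.36
score = calibration_score(subjective_mark_cm, objective_performance,
                          subj_range=(0.0, 10.0), obj_range=(0.0, 1.0))
print(round(score, 2))  # positive, i.e., overestimation in this made-up case
```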
Driver calibration—the Brier score
The second measure of calibration used the Brier score (Brier, 1950). The Brier score is a measure of the accuracy of a probabilistic prediction (Brier, 1950; Lichtenstein & Fischhoff, 1977; Lichtenstein et al., 1982; Murphy, 1973) and provides insights into the calibration process by quantifying a driver’s skill and confidence as probabilistic judgments. The Brier score is a composite of three separate terms: knowledge, calibration, and resolution. Knowledge measures the participant’s ability to classify events. Calibration measures how accurately one’s self-appraisals of performance match their actual performance, while considering confidence. Resolution determines one’s ability to differentiate between different levels of uncertainty. The formula presented below was used to calculate the Brier score as seen in Roberts et al. (2016).
\[ \text{Brier score} = c(1 - c) + \frac{1}{N}\sum_{t=1}^{T} n_t (r_t - c_t)^2 - \frac{1}{N}\sum_{t=1}^{T} n_t (c_t - c)^2 \]
In the formula above, the three terms correspond to knowledge, calibration, and resolution, respectively; c represents the overall proportion of self-appraisals correctly identified compared with objective performance, N represents the total number of self-appraisals given, T represents the number of categories that self-appraisals are categorized into, t represents the category of objective performance, n_t represents the number of self-appraisals assigned to t, r_t represents the participant’s confidence in their self-appraisal, and c_t represents the proportion of self-appraisals correctly identified compared with objective performance for each level of t. Appendix C provides a worked example for calculating the Brier scores.
Total Brier scores can range between 0 and 1, with 0 being the desired score. The application of the Brier score required the driver to be able to make incorrect or correct subjective assessments. Therefore, the Brier score questionnaire’s response options were categorized into two different categories during coding: 0–50 and 50–100 (i.e., the parameter T).
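The three-term composition of the Brier score can be made concrete with a small sketch. It assumes the standard decomposition into knowledge, calibration, and resolution terms, groups judgments by their stated confidence, and checks the result against the plain mean squared difference between confidence and outcome. The function name, the input format, and the example data are hypothetical; how the questionnaire responses map onto confidences and correct/incorrect outcomes is described in Appendix C of the paper, not here.

```python
from collections import defaultdict


def brier_decomposition(confidences, outcomes):
    """Decompose the Brier score into knowledge, calibration, and resolution.

    confidences: stated probabilities in [0, 1], one per self-appraisal.
    outcomes: 1 if the self-appraisal was correct, 0 otherwise.
    Judgments sharing the same confidence value form one category.
    """
    n_total = len(outcomes)
    c = sum(outcomes) / n_total  # overall proportion correct

    by_confidence = defaultdict(list)
    for r, o in zip(confidences, outcomes):
        by_confidence[r].append(o)

    knowledge = c * (1 - c)
    calibration = 0.0
    resolution = 0.0
    for r_t, group in by_confidence.items():
        n_t = len(group)
        c_t = sum(group) / n_t  # proportion correct within this category
        calibration += n_t * (r_t - c_t) ** 2
        resolution += n_t * (c_t - c) ** 2
    calibration /= n_total
    resolution /= n_total

    return {
        "knowledge": knowledge,
        "calibration": calibration,
        "resolution": resolution,
        "total": knowledge + calibration - resolution,
    }


# Made-up data: four judgments, two confidence levels.
conf = [0.9, 0.9, 0.6, 0.6]
hit = [1, 0, 1, 1]
parts = brier_decomposition(conf, hit)

# The decomposed total should equal the mean squared error of the judgments.
mse = sum((r - o) ** 2 for r, o in zip(conf, hit)) / len(hit)
print(round(parts["total"], 4), round(mse, 4))
```

In this made-up example both quantities come to .285, which illustrates why a total of 0 is the desired score: it requires every confidence judgment to match its outcome exactly.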
Statistical analysis
Instead of the traditional null-hypothesis significance tests (NHSTs), we employed default Bayesian t-tests (Morey & Rouder, 2011; Rouder et al., 2009) with Bayes factors as the measure of evidence in place of p values. A Bayes factor is the ratio of the likelihood of the data under one hypothesis to their likelihood under another. Bayesian t-tests offer at least two advantages to the current study. First, Bayes factors allow researchers to provide evidence in favor of the null hypothesis. That is, while p values greater than α (typically .05) do not indicate the lack of an effect of interest within the NHST framework, if data support a statistical model without the effect more than a model with the effect, then the Bayes factor can be taken as evidence for the lack of the effect. Second, Bayes factors as likelihood ratios offer an effective means of interpreting data. That is, a Bayes factor of 10 in favor of the presence of the effect of FOCAL, for example, indicates that data are 10 times more likely to have arisen from a model including the effect of FOCAL than from one excluding the effect. Following Rouder et al. (2012), we report B10, with values greater than 1 indicating evidence for an effect of interest and those less than 1 indicating evidence against the effect. Last, Bayes factors greater than 3 are interpreted as substantial evidence for the presence of the effect, while those less than .33 are interpreted as substantial evidence against the presence of the effect (Jeffreys, 1961). The BayesFactor and BEST packages in R were used for Bayesian analysis, including calculation of the 95% highest density interval (HDI) for each mean, where values that exist within the HDI are more credible than those outside the HDI and points falling within the 95% HDI represent 95% of the posterior distribution (Kruschke, 2013).
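To show how a t statistic translates into B10, the sketch below numerically evaluates the JZS Bayes factor integral given in Rouder et al. (2009) for an independent-samples t-test. It is not the BayesFactor package's code. The Cauchy prior scale r = √2/2 is an assumption (the paper does not report the scale used), and the function names, the midpoint-rule integration, and the number of integration steps are illustrative choices.

```python
import math


def jzs_bf10_two_sample(t, n1, n2, r=math.sqrt(2) / 2, steps=20000):
    """Approximate B10 for an independent-samples t-test (JZS prior).

    t: observed t statistic; n1, n2: group sizes.
    r: Cauchy prior scale on effect size (assumed value, not from the paper).
    """
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size
    nu = n1 + n2 - 2             # degrees of freedom

    # Likelihood of t under the null (up to a constant shared with the alternative).
    null_like = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        a = 1 + n_eff * g
        like = a ** -0.5 * (1 + t ** 2 / (a * nu)) ** (-(nu + 1) / 2)
        # Inverse-gamma(1/2, r^2/2) prior on g, equivalent to a Cauchy(0, r) prior
        # on the standardized effect size.
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r ** 2 / (2 * g))
        return like * prior

    # Midpoint rule after mapping g in (0, inf) to x in (0, 1) via g = x / (1 - x).
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        g = x / (1 - x)
        total += integrand(g) / (1 - x) ** 2
    alt_like = total * h

    return alt_like / null_like


def interpret_bf10(bf10):
    """Label B10 using the thresholds of 3 and 1/3 described in the text."""
    if bf10 > 3:
        return "substantial evidence for the effect"
    if bf10 < 1 / 3:
        return "substantial evidence against the effect"
    return "inconclusive"


bf = jzs_bf10_two_sample(t=-2.99, n1=18, n2=18)
print(round(bf, 2), interpret_bf10(bf))
```

Because the Bayes factor depends on t only through its square, the sign of the t statistic does not affect the result; only the group sizes and the prior scale do.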
Results
Performance Accuracy in the Map Task
Data did not indicate evidence for or against differences in visual search performance between FOCAL- and Placebo-trained drivers, M = .40, 95% HDI [.29, .51] for the FOCAL group, M = .30, 95% HDI [.20, .38] for the Placebo group, mean difference = .10, 95% HDI [−.04, .24], independent-samples t(34) = 1.64, B10 = .90.
Proportions of Long Off-Road Glances
Consistent with prior work, FOCAL-trained drivers executed fewer off-road glances longer than 2 s than Placebo-trained drivers, M = .20, 95% HDI [.13, .28] for the FOCAL group, M = .36, 95% HDI [.26, .44] for the Placebo group, mean difference = −.15, 95% HDI [−.27, −.04], independent-samples t(34) = −2.99, B10 = 8.36. Figure 2 presents a complementary cumulative distribution function (CDF) for each group. A complementary CDF here displays the probability that off-road glance duration was longer than or equal to each of several specified glance duration thresholds (e.g., Yamani et al., 2015). Visual inspection of the complementary CDF indicates that FOCAL-trained drivers produced shorter off-road glances than Placebo-trained drivers across varying threshold levels, suggesting that the current findings generalize beyond the 2-s threshold.

Figure 2. A complementary cumulative distribution function for the FOCAL group (solid line) and the Placebo group (dotted line). FOCAL = FOrward Concentration and Attention Learning.
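A complementary CDF of the kind plotted in Figure 2 can be computed directly from a pool of glance durations. The sketch below is illustrative only: the function name, the threshold grid, and the example durations are made up, and pooling glances within a group before computing the proportions is an assumption about how the curves were built.

```python
def complementary_cdf(durations_s, thresholds_s):
    """For each threshold, the proportion of glances at least that long."""
    n = len(durations_s)
    if n == 0:
        raise ValueError("at least one glance duration is required")
    return [sum(1 for d in durations_s if d >= t) / n for t in thresholds_s]


# Made-up off-road glance durations (seconds) and thresholds.
glances = [0.8, 2.4, 1.5, 3.1, 1.9]
thresholds = [1.0, 1.5, 2.0, 2.5, 3.0]
print(complementary_cdf(glances, thresholds))  # [0.8, 0.8, 0.4, 0.2, 0.2]
```

A group whose curve lies below another's at every threshold produced proportionally fewer glances of at least that length, which is the pattern the text describes for the FOCAL group.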
Driver Calibration
Normalized difference measure
Figure 3 illustrates mean calibration scores using the normalized difference measure for the FOCAL- and Placebo-trained drivers.

Figure 3. Mean calibration scores for the FOCAL and Placebo groups. Scores to the right of zero indicate overestimation, and scores to the left indicate underestimation. Error bars indicate 95% HDIs (Kruschke, 2013). FOCAL = FOrward Concentration and Attention Learning. HDI = highest density interval.
Data provided substantial evidence that FOCAL-trained drivers showed lower calibration scores than Placebo-trained drivers, with scores closer to zero suggesting better calibration for FOCAL-trained drivers, M = −.14, 95% HDI [−.31, .03] for the FOCAL group, M = .24, 95% HDI [.05, .45] for the Placebo group, mean difference = −.38, 95% HDI [−.65, −.12], independent-samples t(34) = −3.04, B10 = 9.26. Note that the scores substantially differed from zero and were in the positive direction for Placebo-trained drivers, indicating that the Placebo-trained drivers overestimated their attention maintenance skills, one-sample t(17) = 2.84, B10 = 4.77. This finding was not observed for FOCAL-trained drivers, one-sample t(17) = 1.76, B10 = .88.
Brier score
Brier scores of FOCAL-trained drivers did not substantially differ from those of Placebo-trained drivers, M = .18, 95% HDI [.11, .25] for the FOCAL group, M = .14, 95% HDI [.09, .20] for the Placebo group, mean difference = .04, 95% HDI [−.05, .13], independent-samples t(34) = .88, B10 = .44. None of the three components of the Brier score showed reliable differences between the groups, .32 < all B10 < .36. Note that the scores did differ decisively from zero, indicating poor calibration for both Placebo-trained drivers, one-sample t(17) = 6.72, B10 = 5.5 × 10^3, and FOCAL-trained drivers, one-sample t(17) = 7.03, B10 = 9.4 × 10^3.
Discussion
In the current study, we examined the effect of an existing attention maintenance training program for young drivers, FOCAL, on drivers’ calibration of their attention maintenance performance in a driving simulator, using two different measures. The first measure was the calibration score derived from the difference between a normalized objective measure of proportion of long glances to the map task and a normalized subjective measure from the first questionnaire. The second measure was the Brier score. Results showed better objective attention maintenance performance in FOCAL-trained drivers than Placebo-trained drivers, with approximately 16% fewer in-vehicle glances longer than 2 s for FOCAL-trained drivers than Placebo-trained drivers. Map task performance was comparable between FOCAL- and Placebo-trained drivers. Critically, based on the normalized difference measure of calibration, FOCAL-trained drivers’ scores fell slightly on the side of underestimation without reliably differing from zero, whereas the Placebo-trained drivers overestimated their attention maintenance performance. The Brier score, however, did not show measurable differences in calibration between the FOCAL- and Placebo-trained drivers across the three components of knowledge, calibration, and resolution.
The effect of FOCAL on decreasing the proportion of excessively long in-vehicle glances is consistent with the findings of Pradhan et al. (2011). FOCAL-trained drivers in the current study executed approximately 16% fewer in-vehicle glances longer than 2 s than Placebo-trained drivers, almost double the difference found in Pradhan et al. (2011). However, in both Pradhan et al. and the current study, FOCAL-trained drivers produced almost exactly the same proportion of in-vehicle glances longer than the 2-, 2.5-, and 3-s thresholds. The difference between the current study and that of Pradhan et al. (2011) can be accounted for by a difference in task difficulty between the two studies. In the current study, Placebo-trained drivers executed almost 8% more in-vehicle glances longer than 2 s than those in Pradhan et al. (2011), indicating that the current map task might be more challenging than the in-vehicle tasks used in that study, as the map task required multiple interactions with the app.
Using the normalized difference measure, FOCAL-trained drivers did not demonstrate the same trend of overestimation of their attention maintenance performance as Placebo-trained drivers. In fact, FOCAL-trained drivers’ normalized difference scores were closer to zero, indicating better calibration. Placebo-trained drivers, however, markedly overestimated their own performance. In other words, FOCAL-trained drivers appear to have learned not only how to take shorter in-vehicle glances while controlling the vehicle but also how to better recognize when their glance exceeds a safety-critical threshold, such as the 2-s threshold used in the current study.
The DCF would explain FOCAL’s effectiveness in improving both attention maintenance and calibration skills through its 3M feedback training. Recall that FOCAL requires trainees to make mistakes (e.g., looking down longer than 2 s), explains why it is a problem (e.g., looking down longer than 2 s elevates crash risk), and provides opportunities to learn the target behavior (e.g., looking down for less than 2 s). Through the training process, even though calibration was not explicitly emphasized in the program, trainees may realize the miscalibration between their perceived performance and their actual performance. For example, trainees may perceive that they looked down for less than 2 s when they actually looked down longer than 2 s. In this way, FOCAL may provide an opportunity to improve not only objective attention maintenance skills but also calibration skills via a feedback mechanism. However, the present study does not provide direct evidence that this process occurred. Future research should further identify the psychological mechanisms that explain how FOCAL may improve driver calibration.
Using the Brier score measure of calibration, FOCAL-trained drivers were not significantly different from Placebo-trained drivers. However, the obtained Brier scores for both groups were significantly different from zero, indicating poor calibration. To apply the Brier score to the experimental surface transportation domain, responses on a continuous scale (e.g., time duration of off-road glances) were categorized into two discrete categories: scores less than 50 or scores greater than 50. The resulting reduction in variance could have prevented the detection of meaningful differences between the groups. To explore whether the number of categories influences the results, the responses were recategorized into three categories (less than 33%, 34%–66%, and 67%–100%) and four categories (less than 25%, 26%–50%, 51%–75%, and 76%–100%), and the Brier scores were recalculated for each, but the results showed no differences between the groups.
Historically, the Brier score has required numerous data points (Lichtenstein & Fischhoff, 1980), substantially more than were collected in the current experiment. Roberts et al. (2016) first implemented the Brier score in the surface transportation domain using 720 data points. The Brier score results in the current study trended in the same direction as the normalized proportion scores, but the statistical results did not converge. Even with repeated measures and an increased sample size, the current study yielded only 144 data points, substantially fewer than in Roberts et al. (2016). Successful implementation of the Brier score might require a study with multiple trials across days to supply enough data points.
There are several limitations to note. First, as with other driving simulator studies, the current findings may not be generalizable to real-world driving environments. Second, drivers were instructed to perform the map task at a given location for exactly 15 s, whereas drivers may strategically choose when to engage in such in-vehicle tasks while driving (strategic attention maintenance; Fisher et al., 2017). Third, more time spent looking toward the forward roadway does not by itself indicate sufficient visual sampling for detecting and responding to imminent hazards. Future research should use a variety of tasks and levels of visual demand to test the limits of FOCAL’s effectiveness in improving drivers’ attention maintenance performance. Furthermore, future research should clarify whether attention allocated to the forward roadway during the in-vehicle task is sufficient to detect imminent hazards. For example, a study might include hazard anticipation scenarios as an unobtrusive way of indicating whether an on-road glance is meaningful. It is surprising that in both the current study and that of Pradhan et al. (2011), approximately 8% of in-vehicle glances were longer than 3 s, even for drivers who were trained with FOCAL. That is, FOCAL-trained drivers looked down longer than 3 s for more than 8% of in-vehicle glances. This finding is of concern because this pattern of off-road glances appears regardless of the differences in participants’ age and driving experience across studies. Finally, the psychological mechanisms that underlie the effect of FOCAL on both objective attention maintenance and calibration of attention maintenance behaviors remain to be explored.
Further research should examine whether FOCAL improves driver calibration by training drivers to allocate more attentional resources to monitoring their own attention maintenance performance, through a shift from controlled to automatic processing of the driving and in-vehicle secondary tasks (e.g., Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977).
In sum, this study replicated the effect of FOCAL on attention maintenance performance and extended previous work by examining the effect of FOCAL on calibration in young drivers. We found that FOCAL-trained drivers were better at limiting their in-vehicle glances to less than 2 s than Placebo-trained drivers. FOCAL-trained drivers were also better calibrated to their attention maintenance performance than Placebo-trained drivers, as measured via normalized difference scores of driver calibration. Placebo-trained drivers overestimated their performance, whereas FOCAL-trained drivers did not. However, no differences were found between FOCAL- and Placebo-trained drivers using the Brier scores.
In practice, the present study demonstrates a promising way to train driver calibration of the targeted behaviors critical for driving safety, potentially protecting drivers from reverting to untrained behaviors and encouraging them to monitor their safe behaviors. Though further evaluation studies are necessary, it is likely that, in principle, other driver training programs, especially those using the 3M training method, would yield the added benefit of improved calibration skills. Designers and administrators of existing driver training programs may consider focusing on both objective driving performance and drivers’ calibration of that performance to further promote road safety for young drivers.
Key Points
Trained drivers curtailed their long glances away from the road while performing a secondary map task and reported better calibration than untrained drivers.
An existing driver training program has the potential to improve not only attention maintenance skills but also the calibration of skills for young drivers.
Driver training programs may be designed to train not only targeted higher cognitive skills but also driver calibration, both of which are critical for driving safety in young drivers.
Appendix A: Workload and Calibration Questionnaire
Appendix B: Brier Score Questionnaire
1. Rate your performance on keeping your eyes on the forward roadway (limiting in-vehicle glances to less than 2 s) during the Waze task.
1a. Rate your confidence in your decision.
2. Rate your performance on accurately reporting the distance between ODU and target location (within the 15-s time frame) on the Waze task.
2a. Rate your confidence in your decision.
3. Rate your performance on lane positioning (keeping your car straight) during the Waze task.
3a. Rate your confidence in your decision.
4. Rate your performance on vertical positioning (maintaining the same speed) during the Waze task.
4a. Rate your confidence in your decision.
Appendix C: Brier Score Calculation Example
Assume the example raw data in Table C1.
To place all scores on a common range of [0, 10], objective performance scores are multiplied by 10, yielding Table C2.
Because a high objective performance score (more in-vehicle glances longer than 2 s) indicates poor performance, whereas a high subjective performance score indicates good performance, each objective performance score is subtracted from 10 so that both scales run in the same direction, resulting in Table C3.
To calculate the Brier score, values on continuous variables need to be grouped into discrete categories. However, the raw data in the current experiment were all on continuous variables between 0 and 10. In this example, the data were classified into three categories (0–3.39, 3.40–6.69, and 6.70–10) as seen in Table C4.
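The rescaling and categorization steps above can be sketched as follows. The function names and input values are hypothetical, and the handling of scores that fall between the listed band edges (e.g., between 3.39 and 3.40) is an assumption; the cut points themselves are the three bands given above.

```python
def rescale_objective(prop_long_glances):
    """Map a proportion of long glances (0-1) to the 0-10 scale and reverse
    it, so that higher values mean better performance."""
    if not 0.0 <= prop_long_glances <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return 10.0 - 10.0 * prop_long_glances


def categorize(score):
    """Return 1, 2, or 3 for the bands 0-3.39, 3.40-6.69, and 6.70-10.

    Assumption: a score between two listed edges goes to the lower band.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must lie in [0, 10]")
    if score < 3.4:
        return 1
    if score < 6.7:
        return 2
    return 3


def appraisal_correct(subjective_score, prop_long_glances):
    """A self-appraisal is correct when both scores share a category."""
    objective_score = rescale_objective(prop_long_glances)
    return categorize(subjective_score) == categorize(objective_score)


# Hypothetical trial: 25% long glances, self-rating of 8 out of 10.
print(rescale_objective(0.25), appraisal_correct(8.0, 0.25))
```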
The formula for calculating the Brier score is:
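As a sketch, and assuming the equation is the standard three-term partition of the Brier score into knowledge, calibration, and resolution (the three components named in this article), it can be written with the symbols defined in the next paragraph as:

```latex
\mathrm{BS}
  = \underbrace{c\,(1 - c)}_{\text{knowledge}}
  + \underbrace{\frac{1}{N}\sum_{t=1}^{T} n_t\,(r_t - c_t)^2}_{\text{calibration}}
  - \underbrace{\frac{1}{N}\sum_{t=1}^{T} n_t\,(c_t - c)^2}_{\text{resolution}}
```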
where c represents the overall proportion of self-appraisals correctly identified compared with objective performance, N represents the total number of self-appraisals given, T represents the number of categories that self-appraisals are categorized into, t represents the category of objective performance, n_t represents the number of self-appraisals assigned to t, r_t represents the participant’s confidence in their self-appraisal, and c_t represents the proportion of self-appraisals correctly identified compared with objective performance for each category t.
Self-appraisals are defined as correct if the category of the subjective performance matches that of the objective performance. c equals .5 because 50% of the appraisals are correct. N equals 4 because the participant appraised their performance four times. T equals 3 because three categories are used for objective and subjective scores. n1 equals 2, n2 equals 0, and n3 equals 2 for the first, second, and third category, respectively, because two subjective performance scores fall within each of the first and third categories. r1 equals 0.26 because it is the mean of the confidence scores within the first category, expressed as a proportion [i.e., (3.6 + 1.6)/2 * (1/10)]. Similarly, r2 equals 0 and r3 equals 0.825. c3 equals 1 because self-appraisals were correct in both trial 3 and trial 4 within the third category, while c1 equals 0. Using the parameters above, the Brier score is computed using the formula as follows:
For the above example, knowledge is .25, calibration is .61, resolution is .18, and the Brier score is .68.
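For readers who want to experiment with this kind of calculation, the following is a minimal sketch that assumes the standard partition, Brier score = knowledge + calibration − resolution, with knowledge = c(1 − c), calibration = (1/N) Σ n_t (r_t − c_t)², and resolution = (1/N) Σ n_t (c_t − c)². The function name and the input values are hypothetical and do not reproduce the tables in this appendix; c is derived here from the per-category counts rather than supplied separately.

```python
def brier_partition(n, r, c_t):
    """Knowledge, calibration, resolution, and Brier score from per-category
    counts n, mean confidences r, and proportions correct c_t."""
    if not (len(n) == len(r) == len(c_t)):
        raise ValueError("n, r, and c_t must have one entry per category")
    total = sum(n)
    if total == 0:
        raise ValueError("need at least one self-appraisal")
    # Overall proportion correct, weighted by category size.
    c = sum(n_i * c_i for n_i, c_i in zip(n, c_t)) / total
    knowledge = c * (1.0 - c)
    calibration = sum(n_i * (r_i - c_i) ** 2
                      for n_i, r_i, c_i in zip(n, r, c_t)) / total
    resolution = sum(n_i * (c_i - c) ** 2
                     for n_i, c_i in zip(n, c_t)) / total
    return {
        "knowledge": knowledge,
        "calibration": calibration,
        "resolution": resolution,
        "brier": knowledge + calibration - resolution,
    }


# Hypothetical inputs for three categories (illustrative only).
result = brier_partition(n=[2, 0, 2], r=[0.3, 0.0, 0.8], c_t=[0.5, 0.0, 1.0])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Empty categories (n_t = 0) contribute nothing to either sum, so their r_t and c_t entries can be set to any placeholder value.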
Acknowledgments
Thanks are due to James Paulson for helpful comments on an earlier draft of the manuscript.
Author Biographies
James Unverricht is a PhD student in the Department of Psychology at Old Dominion University. He received his MS in Psychology from Old Dominion University in 2019.
Yusuke Yamani is an associate professor in the Department of Psychology at Old Dominion University. He earned his PhD in Psychology (Visual Cognition and Human Performance) at the University of Illinois at Urbana-Champaign in 2013.
Jing Chen is an assistant professor in the Department of Psychology at Old Dominion University. She earned her PhD in Cognitive Psychology and MS in Industrial Engineering at Purdue University in 2015.
William J. Horrey is traffic research group leader at AAA Foundation for Traffic Safety. He received his PhD in Engineering Psychology from the University of Illinois at Urbana-Champaign in 2005.
