A Meta-Analytic Examination of the Cost and Benefits of Overlay versus Separate Displays

Abstract

Costs and benefits are associated with how information is presented, depending on the type of task required. A meta-analysis was conducted to examine the costs and benefits associated with overlaying information on a single display, as is the case with a head-mounted display, or presenting information more separately, as is the case with a head-down display (e.g., a tablet). Seventeen studies contributed a total of 82 individual effect sizes that met the inclusion criteria for the meta-analysis. Overall, overlaid displays led to a response time benefit for integration tasks without compromising accuracy, whereas separate displays led to an accuracy benefit for focused attention tasks. The findings here speak favorably to the use of head-mounted displays for tasks that require integrating information on the display with information in the real world.

Keywords

Information Integration, Display Design, Meta-Analysis, Attention, Augmented Reality, Head-Mounted Displays, Proximity Compatibility Principle

Introduction

As augmented reality head-mounted displays (AR-HMDs) become more pervasive in domains such as the military, aviation, and medicine, it is crucial to understand any performance tradeoffs that occur when using these devices for different tasks. For example, a common task for a soldier on the battlefield is to identify an enemy target embedded within a real-world scene (e.g., urban terrain) that is located some distance away from the soldier’s current location. Information about where the enemy is located can be conveyed directly on an HMD with a cue overlaying location information onto the real world. Such overlay reduces the amount of scanning, and therefore effort, required to access information. However, overlay information can impose a cost of clutter, obscuring the view of non-salient elements in the world beyond, which could have negative consequences. An alternative display format is to present this information on a head-down display (HDD), such as a tablet. While this increases the amount of scanning required when integrating information on the HDD with the real world, the magnitude of the costs in comparison to the costs of overlay clutter may not be as severe (Yeh et al., 2003).

One meta-analysis (Fadden et al., 2000) examined the benefits of head-up displays (HUDs) relative to head-down displays (HDDs) in vehicles, specifically cars and airplanes. Overall, the evidence suggested that reduced scanning improved performance despite the costs of overlay clutter imposed by the HUD. Another key finding was that performance costs and benefits were moderated by the type of task. The advantage of the HUD overlay was amplified when displayed information had to be integrated with real-world information, relative to situations in which two information sources were processed independently, as is the case with dual-tasking, or in which focused attention was required on only one source of information. The interaction between display separation (overlay vs separate) and task type (integration vs focused attention) is a key aspect of the proximity compatibility principle. The principle proposes a tradeoff: close display proximity (e.g., HMD overlay) improves performance for tasks requiring integration between two or more sources of information (Wickens & Carswell, 1995; Kroft & Wickens, 2003), whereas tasks requiring focused attention benefit from reduced overlay clutter, suggesting that information should be presented more separately (e.g., on an HDD such as a tablet).

Previous literature suggests that HUD overlay has overall performance benefits compared to separate displays, particularly for integration (Fadden et al., 2000). However, little work has examined the issue of overlay clutter and increased display separation in meta-analytic form for HMDs. In addition, there is no meta-analytic examination of the differences between overlaid and separate sources of information presented on a 2D flat panel display (e.g., overlaying traffic and weather geospatial databases on a map). Furthermore, while some papers have reviewed the extensive applications of HMDs and reported their performance effects (Dey et al., 2018; Jeffri & Rambli, 2021), they have not explicitly contrasted HMD performance with that of counterpart HDDs or hand-held displays (HHDs).

The data presented here are preliminary analyses examining the relationship between display location and task type. Display location takes into consideration both information presented on an HMD and information presented on 2D flat panel displays where different databases of information are directly overlaid or presented separately (i.e., side-by-side or further separated). The goal of the present work is to empirically determine the costs and benefits of overlay versus separate displays and to examine the moderating variable of task type.

Method

Literature Accumulation

We conducted a literature search using various databases (e.g., Academic Search Premier, APA PsycInfo, and Behavioural Sciences), including U.S. Army Research Laboratory technical reports. We also used web-based search engines, like Google Scholar, to find additional relevant references. After gathering the initial articles, we reviewed the reference lists to find other relevant articles. The primary search terms included: database/information integration, display separation, overlay, superimposition, separated, HHD, HDD, HMD, HUD, proximity compatibility principle, and augmented reality HMDs. We found a total of 218 empirical research articles, conference publications, and technical reports.

Inclusion Criteria

All manuscripts were reviewed to ensure they met the following inclusion criteria: (1) reported an empirical comparison between overlaid and adjacent information presented on a desktop display, or between information presented on an HMD and information presented on either an HDD (e.g., tablet) or HHD (e.g., smartphone); (2) reported performance measures of response time and accuracy or error; (3) used a focused attention task, an information integration task, and/or a dual-task scenario; (4) used a within-subjects repeated-measures design; and (5) included sufficient statistics for computing effect sizes.

Using these inclusion criteria, 22 manuscripts were excluded after reviewing the abstract alone. Other reasons for exclusion included: no direct comparison between the displays of interest (n = 98), duplicate records (n = 4), wrong study design (n = 37), wrong dependent measures (n = 48), and insufficient data reported (n = 6). After reviewing the articles, a total of 6 examined the comparison between overlay and adjacent displays, and 11 examined the comparison between HMDs and HHDs/HDDs. An additional 12 articles compared HUDs and HHDs/HDDs but were excluded here to focus our analysis on the differences between HMDs and HHDs/HDDs. A total of 17 articles contributed to the present meta-analysis. All literature included in this analysis is marked in the reference list with an asterisk before the first author’s last name.

Effect Size Calculation and Variance Estimates

A total of 82 individual effects met the inclusion criteria for the display comparisons and were extracted from the articles. These effects included both significant and non-significant effects comparing either overlay versus adjacent displays or comparing HMDs versus HHDs/HDDs. After extracting the data for all the relevant effects in each study, each statistic was converted to a Cohen’s d effect size. Some studies only reported F-values while other studies only reported t-values or means and standard deviations (SD). To compute Cohen’s d from F-values, partial eta squared ($η_{p}^{2}$) was calculated from the F-value and the two degrees of freedom associated with the F-test (Lakens, 2013). The equation below shows how these effect sizes were computed:

η_{p}^{2} = \frac{F \times df_{\text{effect}}}{F \times df_{\text{effect}} + df_{\text{error}}}

In the formula above, F represents the reported F-value, and df_effect and df_error represent the between-groups and within-groups degrees of freedom, respectively. After $η_{p}^{2}$ was computed, it was converted to Cohen’s f using the formula below (Kim, 2016):

f = \sqrt{\frac{η_{p}^{2}}{(1 - η_{p}^{2})}}

Next, Cohen’s f values were converted to Cohen’s d using the formula below (Cohen, 1988):

d = 2 f
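
The three conversion steps above can be chained in a few lines. The following is a minimal sketch rather than the authors' actual analysis script; the function names and the example input are hypothetical. Note that an F-value carries no direction, so the sign of d (overlay benefit versus separate benefit) would presumably have to be assigned afterwards from the reported condition means.

```python
import math


def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from an F-value and its two degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_f_from_eta(eta_p2: float) -> float:
    """Cohen's f from partial eta squared; requires 0 <= eta_p2 < 1."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


def cohens_d_from_f_test(f_value: float, df_effect: float, df_error: float) -> float:
    """Chain the steps: F -> partial eta squared -> Cohen's f -> d = 2f."""
    eta_p2 = partial_eta_squared(f_value, df_effect, df_error)
    return 2.0 * cohens_f_from_eta(eta_p2)


# Hypothetical input (not taken from any included study): F(1, 16) = 4.0
# eta_p^2 = 4 / (4 + 16) = 0.2, f = sqrt(0.2 / 0.8) = 0.5, d = 2 * 0.5 = 1.0
d = cohens_d_from_f_test(4.0, df_effect=1, df_error=16)
print(round(d, 3))  # 1.0
```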

For all studies that reported t-values with no additional information, Cohen’s d was computed directly from the t-value and the sample size using the equation (Lakens, 2013; Rosenthal, 1991):

d = \frac{t}{\sqrt{n}}
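
As a brief illustration of the t-value conversion, the sketch below applies the formula to a made-up input. The function name and the example values are assumptions for illustration only.

```python
import math


def cohens_d_from_t(t_value: float, n: int) -> float:
    """Cohen's d from a t-value and the sample size n, as d = t / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return t_value / math.sqrt(n)


# Hypothetical input: t = 3.0 with n = 16 participants gives d = 3 / 4
print(cohens_d_from_t(3.0, 16))  # 0.75
```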

For all repeated measures studies that reported only means and SD, Cohen’s d was computed using the equation below (Lakens, 2013):

d = \frac{\bar{X}_{1} - \bar{X}_{2}}{\sqrt{SD_{1}^{2} + SD_{2}^{2} - 2 \times r \times SD_{1} \times SD_{2}}} \times \sqrt{2 (1 - r)}

In the formula above, the numerator is the difference between the means of the two conditions, and the denominator is the SD of the difference scores, computed from the SDs of the two conditions and the correlation between them (Lakens, 2013; Cohen, 1988). Here, r represents the correlation between the two measures and SD represents the standard deviation. The sampling variance of d was computed using the method reported in the literature (see Cooper et al., 2009; Wilson, 2017).
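
The repeated-measures formula can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name and the example values are hypothetical, and the correlation r must be supplied by the caller (the text does not state how r was obtained when a study did not report it).

```python
import math


def cohens_d_repeated_measures(mean1: float, mean2: float,
                               sd1: float, sd2: float, r: float) -> float:
    """Cohen's d from two condition means, their SDs, and their correlation r."""
    # SD of the difference scores, reconstructed from the two SDs and r
    sd_diff = math.sqrt(sd1 ** 2 + sd2 ** 2 - 2.0 * r * sd1 * sd2)
    if sd_diff == 0.0:
        raise ValueError("difference scores have zero variance")
    # Standardized mean difference, multiplied by the sqrt(2(1 - r)) factor
    return (mean1 - mean2) / sd_diff * math.sqrt(2.0 * (1.0 - r))


# Hypothetical input: means 10 and 8, both SDs equal to 2, r = 0.5
# sd_diff = sqrt(4 + 4 - 4) = 2, so d = (2 / 2) * sqrt(1) = 1.0
print(cohens_d_repeated_measures(10.0, 8.0, 2.0, 2.0, 0.5))  # 1.0
```

When the two SDs are equal, the expression reduces to the mean difference divided by that common SD, regardless of r, which gives a quick sanity check on extracted values.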

Most of the articles contributed multiple individual effects; therefore, effects measuring the same dependent variable were averaged and weighted to account for the non-independence of multiple effect sizes. Effect sizes were aggregated and separated based on the display characteristics, dependent measures, and the moderating variable of task type (described below). For example, if one study reported three effects showing a benefit for overlaid compared to adjacent displays for response time during a focused attention task, those three effects were averaged and weighted (i.e., multiplied by 1/3, since a total of 3 effects were included). All effect sizes were categorized as either a benefit (i.e., shorter response time or higher accuracy) of overlay or of HMDs (+d) or a benefit of adjacent or separate displays (-d). Specifically, a benefit of overlay (+d) means that overlaid displays decreased response time (i.e., faster response time) and increased accuracy compared to separate displays, whereas a benefit of separate (-d) means that separate displays decreased response time and increased accuracy compared to overlaid displays.
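
One way to read the within-study aggregation described above is sketched below: effects sharing the same study, dependent measure, and task type are each weighted by 1/k (k being the number of such effects) and summed, which yields their mean. The grouping key, function name, and effect values are all hypothetical, and this is only an interpretation of the described procedure.

```python
from collections import defaultdict


def aggregate_within_study(effects):
    """Collapse non-independent effects to one value per (study, measure, task).

    `effects` is an iterable of (study, measure, task, d) tuples. Each effect
    in a group of k effects is weighted by 1/k, so the result is the group mean.
    """
    groups = defaultdict(list)
    for study, measure, task, d in effects:
        groups[(study, measure, task)].append(d)
    return {
        key: sum(d * (1.0 / len(ds)) for d in ds)
        for key, ds in groups.items()
    }


# Hypothetical effects: three response time effects from one study, one from another
effects = [
    ("Study A", "response time", "focused attention", 0.2),
    ("Study A", "response time", "focused attention", 0.4),
    ("Study A", "response time", "focused attention", 0.6),
    ("Study B", "response time", "focused attention", -0.1),
]
for key, value in aggregate_within_study(effects).items():
    print(key, round(value, 2))
```

Each study then contributes a single value per measure and task type, so a study reporting many effects does not dominate the mean across studies.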

Moderator Variable

A critical moderator variable was the type of task. The tasks of interest included focused attention tasks, information integration tasks, and dual-tasking scenarios. Focused attention tasks required attention to only one source of information. Information integration tasks required participants to integrate information between two sources. Dual-tasks included a primary task (e.g., flight tracking, navigation, driving) and a secondary focused attention task (e.g., monitoring, detection, digit recall) that were completed simultaneously. Dual-tasks are hybrids of integration tasks and focused attention tasks: like integration tasks, they require processing information in each domain, but like focused attention tasks, they require sequential focus of attention on one database and then the other, with no integration. Table 1 presents examples of task details from some of the studies.

Table 1.

Unique examples of different types of tasks for each unique display comparison. Note that only dual-tasks consisted of both primary and secondary tasks.

Study	N	Display Comparison	Method	Task	Primary Task	Secondary Task	Task Details
Kroft & Wickens, 2001	15	2D Overlay vs Adjacent	2D Flat Panel Display	Focused Attention & Integration	Map Questions	na	Multiple choice questions about navigation and air hazard displays
Fleury et al., 2013	54	2D Overlay vs Adjacent	2D Flat Panel Display	Integration	Map Questions	na	Spot the difference (primary task) between 3 pairs of plans in succession
Wickens et al., 2004	14	2D Overlay vs Adjacent	Flight Simulator	Dual-task	Tracking	Detection	Flight tracking (primary task) and detection (secondary task) of altitude changes to traffic
Dixon et al., 2014	50	HMD vs Separated	Endoscopic Surgery Navigation	Dual-task	Integration	Detection	Localize an area via surgery navigation (primary task) while detecting unexpected items (secondary task)
Lindemann et al., 2019	18	HMD vs HDD	Driving simulator	Integration	Take Over Request	na	Manual driving take-over-request (TOR) from automation (primary task)
Yeh et al., 2003	16	HMD vs HHD	Target Cueing	Dual-task	Detection	Monitoring	Find friend/foe targets (primary task) while monitoring analog radio frequency display (secondary task)
Sawyer et al., 2014	40	HMD vs HDD	Driving simulator	Dual-task	Tracking	Texting	Driving control task (primary task) while text messaging (secondary task)

Results

The combined effect sizes from all studies were averaged to examine whether display type (overlay/HMD vs separate) or task had an impact on response time and accuracy or error. A result was considered significant if the mean effect size was at least two standard errors (2SE) away from zero.

Table 2 provides the mean effect sizes (Cohen’s d) for response time and accuracy, the number of effects (N), standard error (SE), and two SE (2SE) values for the display comparisons only: (1) overlay versus adjacent displays and (2) HMD versus HHD/HDD (separate) displays. Based on 2SE, there appear to be no significant benefits of either overlay/HMD or adjacent/separate displays.

Table 2.

Mean effect size (d) for each display comparison.

Measure	d	N	SE	2SE
2D Overlay vs Adjacent
Response Time	0.30	8	0.60	1.20
Accuracy/Error	-0.25	14	0.35	0.70
HMD vs HHD/HDD
Response Time	-0.50	18	0.38	0.76
Accuracy/Error	0.09	17	0.14	0.28
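
The 2SE criterion can be applied directly to the values in Table 2. The sketch below uses the reported d and SE values; the function and variable names are illustrative only.

```python
# (display comparison, measure, mean d, SE) as reported in Table 2
TABLE_2 = [
    ("2D Overlay vs Adjacent", "Response Time", 0.30, 0.60),
    ("2D Overlay vs Adjacent", "Accuracy/Error", -0.25, 0.35),
    ("HMD vs HHD/HDD", "Response Time", -0.50, 0.38),
    ("HMD vs HHD/HDD", "Accuracy/Error", 0.09, 0.14),
]


def exceeds_two_se(d: float, se: float) -> bool:
    """True if the mean effect is at least two standard errors from zero."""
    return abs(d) >= 2.0 * se


for comparison, measure, d, se in TABLE_2:
    verdict = "significant" if exceeds_two_se(d, se) else "not significant"
    print(f"{comparison} | {measure}: d = {d:+.2f}, 2SE = {2 * se:.2f} -> {verdict}")
```

None of the four rows meets the criterion, which matches the conclusion that neither display format shows an overall benefit.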

In order to examine the role that the moderating variable had on response time and accuracy, data were collapsed across display types such that overlay and HMD were categorized as ‘overlay’, and adjacent and HHD/HDD were categorized as ‘separate.’ Figure 1 shows the average weighted mean effect sizes for response time as a function of task type. The error bars represent 2 SE of the mean. Dual-tasks have a mean effect size that is nearing significance (d = -0.65, SE = 0.39, 2SE = 0.78), suggesting a possible benefit of adjacent displays. The effect size for focused attention tasks was not different from zero (d = 0.79, SE = 0.64, 2SE = 1.28). The non-significance of such large effect sizes for both dual-tasks and focused attention tasks may be due to too few studies and high variability, and thus warrants further investigation. For information integration tasks, there was a significant benefit for overlaid displays compared to separate displays (d = 0.86, SE = 0.32, 2SE = 0.64). Overall, these findings suggest that overlaid displays decrease response time for tasks that require information integration, but not for those that require focused attention or dual-tasking.

Figure 1.

Mean weighted effect sizes (Cohen’s d) for response time as a function of task type. Error bars represent two SEM. Positive and negative effects indicate a benefit for overlay and separate displays, respectively.

Figure 2 shows the average weighted mean effect sizes for accuracy as a function of task type. The mean effect size for dual-tasks neared significance but with a small effect size (d = 0.17, SE = 0.12, 2SE = 0.24), hinting at a minor potential accuracy benefit for overlaid displays. For focused attention tasks, there was a significant large benefit of separate compared to overlaid displays (d = -2.14, SE = 1.0, 2SE = 2.0). Integration tasks showed no benefit on accuracy for either type of display (d = 0.06, SE = 0.23, 2SE = 0.46). Separate displays therefore appear to improve accuracy for focused attention tasks.

Figure 2.

Mean weighted effect sizes (Cohen’s d) for accuracy as a function of task type. Error bars represent two SEM. Positive and negative effects indicate a benefit for overlay and separate displays, respectively.
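
The error bars in Figures 1 and 2 can be reconstructed from the values reported in the text as intervals of d ± 2SE. The sketch below does this for the six task-type effects; the data structure and function names are illustrative, while the d and SE values are those reported above.

```python
# (measure, task type) -> (mean d, SE), as reported in the Results text
TASK_EFFECTS = {
    ("Response Time", "Dual-task"): (-0.65, 0.39),
    ("Response Time", "Focused attention"): (0.79, 0.64),
    ("Response Time", "Integration"): (0.86, 0.32),
    ("Accuracy", "Dual-task"): (0.17, 0.12),
    ("Accuracy", "Focused attention"): (-2.14, 1.0),
    ("Accuracy", "Integration"): (0.06, 0.23),
}


def two_se_interval(d: float, se: float) -> tuple:
    """Lower and upper ends of the d +/- 2SE error bar."""
    return (d - 2.0 * se, d + 2.0 * se)


def bar_clears_zero(d: float, se: float) -> bool:
    """True if the effect lies at least two SE away from zero."""
    low, high = two_se_interval(d, se)
    return low >= 0.0 or high <= 0.0


for (measure, task), (d, se) in TASK_EFFECTS.items():
    low, high = two_se_interval(d, se)
    flag = "clears zero" if bar_clears_zero(d, se) else "spans zero"
    print(f"{measure} | {task}: [{low:+.2f}, {high:+.2f}] {flag}")
```

Only two intervals clear zero: the positive response time effect for integration tasks (overlay benefit) and the negative accuracy effect for focused attention tasks (separate benefit).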

Discussion

The present work aimed to synthesize the overall body of literature examining performance differences when using either 2D database overlay or head-mounted displays compared to separate displays (2D adjacent database, HDDs, or HHDs) for different types of tasks (dual-tasking, focused attention, and information integration). A total of 82 individual effect sizes from 17 articles were included in the meta-analysis. The collective wisdom from these effects conveys relatively clear findings consistent with the proximity compatibility principle.

Consistent with the principle, the collective effect for response time data indicates that overlay displays are best for information integration tasks, but not for focused attention tasks. This is likely attributable to reduced scanning and head movements between separate sources of information when integration is required. Also consistent with the principle, the collective effect for accuracy indicates that separate displays are best for focused attention tasks, but not for information integration tasks. This is likely due to the effects of overlay clutter when information in two domains (e.g., information on an HMD and information in the real-world scene) is superimposed. In this context, overlay clutter refers to the visual complexity of a scene or display of information that is created when more than one source of information is superimposed onto another. Such overlay clutter can disrupt the perception of visual information, causing issues with locating and extracting relevant information for a given task due to occlusion and crowding (Pankok & Kaber, 2018). In line with prior work, overlay clutter appears to have decreased accuracy for focused attention tasks, but not information integration (Kroft & Wickens, 2003).

In the Method section, dual-tasking was defined as a hybrid of integration and focused attention tasks: it requires processing information in each domain (integration) and often involves sequentially focusing on one database at a time (focused attention). The current results support this hybrid status. Specifically, dual-tasking never significantly benefited from either type of display configuration (overlay or separate). Furthermore, the dual-tasking results hint at a trend toward faster response times for separate displays and more accurate performance for overlay displays, suggesting a speed-accuracy tradeoff.

That overlay displays benefit response time without hindering accuracy for tasks requiring information integration speaks favorably for HMDs. For example, in the context of target cueing, a soldier is required to integrate location information indicated by a cue on an HMD with information in the real world as quickly and accurately as possible when deciding whether they have found an enemy target (Yeh et al., 2003). In such sensitive situations, reducing search time without imposing an accuracy cost is critical. In the context of augmented reality head-mounted displays (AR-HMDs), related work has shown a reduction in search time for 3D objects at specific locations in a room when cues guided attention to the location of a specific target (Warden et al., 2022). This benefit was greater for an arrow cue, which minimized overlay clutter, than for a minimap cue, which occluded more of the real-world scene. Fadden and colleagues (2000) also found an overall benefit of overlay for HUDs compared to separate displays, especially for conformal imagery, which is an analog to augmented reality. Thus, overlaying information on devices such as AR-HMDs appears to be a viable option for time-critical tasks that require integration across more than one domain of information.

Contrary to the prior meta-analysis examining the costs and benefits of HUDs compared to separate displays (Fadden et al., 2000), the collective findings here did not reveal an overall significant benefit of overlay compared to separate displays for either response time or accuracy. It may be the case that this lack of effect is due to the limited amount of research examining HMDs. Specifically, much of the literature does not examine the direct comparison between HMDs and separate display formats for tasks that require focused attention, integration, or dual-task scenarios (see Dey et al., 2018, for a review). Much of the current literature on HMDs not included in the present work examines other measures, such as user preferences, mental workload, or subjective reports, and the few studies that do examine performance measures do so for different types of tasks (Dey et al., 2018; Jeffri & Rambli, 2021).

Conclusions

The main finding that emerges from the present meta-analysis is that overlay displays improve response time for integration tasks and separate displays improve accuracy for focused attention tasks. However, no overall benefit of one display format over another was found. The key moderating variable of task type appears to drive the costs and benefits of one display format over another. Therefore, while the use of overlay displays may benefit response time for integration tasks, such use may hinder performance when a task requires focused attention on only one source of information. These findings have design implications for how to present information for various types of tasks.

Limitations & Future Directions

A common limitation of meta-analyses is that they include different types of studies and that the summary effects may not take into account differences between studies. While the work here established specific inclusion criteria, the studies inherently have fundamental differences. Another limitation is that the work here did not include effect sizes from research examining HUDs, the other design where overlay is critical. One reason for this was to explicitly look at the effects of HMDs, which has not been done before. However, further work could include effects from studies examining HUDs, given that HUDs can also be projected onto displays that are akin to HMDs.

Acknowledgements

This research was supported by the Office of Naval Research under grant numbers N00014-21-1-2949 and N00014-21-1-2580. Dr. Peter Squire was the scientific/technical monitor.

References

Cohen (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cooper, Hedges, L. V., & Valentine, J. C. (Eds.). (2009). The handbook of research synthesis and meta-analysis (2nd ed.). Russell Sage Foundation.

Dey, Billinghurst, Lindeman, R. W., & Swan, J. E. (2018). A systematic review of 10 years of augmented reality usability studies: 2005 to 2014. Frontiers in Robotics and AI, 5, 37.

*Dixon, B. J., Daly, M. J., Chan, Vescan, A. D., Witterick, I. J., & Irish, J. C. (2013). Surgeons blinded by enhanced navigation: the effect of augmented reality on attention. Surg Endosc.

*Dixon, B. J., Daly, M. J., Chan, Vescan, A. D., Witterick, I. J., & Irish, J. C. (2014). Inattentional blindness increased with augmented reality surgical navigation.

Fadden, Wickens, C. D., & Ververs (2000). Costs and Benefits of Head up Displays: An Attentional Perspective and a Meta Analysis. SAE Technical Paper 2000-01-5542. https://doi.org/10.4271/2000-01-5542

*He, McCarley, J. S., Crager, Jadliwala, Hua, & Huang (2018). Does wearable device bring distraction closer to drivers? Comparing smartphones and Google Glass. Applied Ergonomics, 70, 156-166.

Jeffri, N. F. S., & Rambli, D. R. A. (2021). A review of augmented reality systems and their effects on mental workload and task performance. Heliyon, 7(3), e06277.

Kim, H. Y. (2016). Statistical notes for clinical researchers: Sample size calculation 3. Comparison of several means using one-way ANOVA. Restorative Dentistry & Endodontics, 41(3), 231-234.

*Kroft & Wickens, C. D. (2001). The display of multiple geographical data bases: implications of visual attention. Aviation Research Lab Institute of Aviation, Technical Report ARL-01-2/NASA-01-2, NASA Ames Research Center.

Kroft & Wickens, C. D. (2003). Displaying multi-domain graphical database information: An evaluation of scanning, clutter, display size, and user interactivity. Information Design Journal, 11(1), 44-52. https://doi.org/10.1075/idj.11.1.06kro

Lakens (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863. https://doi.org/10.3389/fpsyg.2013.00863

*Lindemann, Müller, & Rigoll (2019). Exploring the use of augmented reality interfaces for driver assistance in short-notice takeovers. IEEE Intelligent Vehicles Symposium (IV), 804-809.

*Fleury & Jamet (2014). Facilitating the comparison of multiple visual items on screen: The example of electronic architectural plan correction. Applied Ergonomics, 45(3), 601-607.

*Morrison, J. B., & Zander, J. K. (2008). Evaluation of head mounted and head down information displays during simulated mine-countermeasures dives to 42 msw. Shearwater Human Engineering North Vancouver (B.C.).

*McKendrick, Parasuraman, Murtza, Formwalt, Baccus, Paczynski, & Ayaz (2016). Into the wild: neuroergonomic differentiation of hand-held and augmented reality wearable displays during outdoor navigation with functional near infrared spectroscopy. Frontiers in Human Neuroscience.

Pankok, Jr., & Kaber, D. B. (2018). An integrated measure of display clutter based on feature content, user knowledge and attention allocation factors. Ergonomics, 61(5), 682-696.

*Sawyer, B. D., Calvo, A. A., & Hancock, P. A. (2014). Google Glass: a driver distraction cause or cure? Human Factors, 56(7), 1307-1321.

*Sauer, Wastell, D. G., Hockey, R. J., Crawshaw, C. M., Ishak, & Downing, J. C. (2002). Effects of display design on performance in a simulated ship navigation environment. Ergonomics, 45(5), 329-347.

*Tippey, K. G., Sivaraj, & Ferris, T. K. (2017). Driving while interacting with Google Glass: investigating the combined effect of head up display and hands free input on driving safety and multitask performance. Human Factors, 59(4), 671-688.

Warden, A. C., Wickens, C. D., Mifsud, Ourada, Clegg, B. A., & Ortega, F. R. (2022). Visual search in augmented reality: Effect of target cue type and location. Proceedings of the 66th Annual Human Factors Society Annual Meeting, 66(1), 373-377.

Wickens, C. D., & Carswell, C. M. (1995). The proximity compatibility principle: Its psychological foundation and relevance to display design. Human Factors, 37(3), 473-494. https://doi.org/10.1518/001872095779049408

*Wickens, C. D., Kroft, & Yeh (2000). Data base overlay in electronic map design: testing a computational model. Proceedings of the IEA 2000/HFES 2000 Congress, 44(21), 451-454. https://doi.org/10.1177/154193120004402119

*Wickens, C. D., Alexander, A. L., Horrey, W. J., Nunes, & Hardy, T. J. (2004). Traffic and flight guidance depiction on a synthetic vision system display: the effects of clutter on performance and visual attention allocation. Proceedings of the Human Factors and Ergonomics Society.

*Wickens, C. D., & Ward, J. V. (2017). Cockpit Displays of Traffic and Weather Information: Effects of 3D Perspective Versus 2D Coplanar Rendering and Database Integration. The International Journal of Aerospace Psychology, 27(1-2), 44-56. https://doi.org/10.1080/10508414.2017.1366270

Wilson, D. B. (2016). Formulas used by the "Practical Meta-Analysis Effect Size Calculator". Practical meta-analysis.

Yeh & Wickens, C. D. (2001). Attentional filtering in the design of electronic map displays: A comparison of color coding, intensity coding, and decluttering techniques. Human Factors: The Journal of the Human Factors and Ergonomics Society, 43(4), 543-562. https://doi.org/10.1518/001872001775870359

*Yeh, Merlo, J. L., Wickens, C. D., & Brandenburg, D. L. (2003). Head up versus head down: the costs of imprecision, unreliability, and visual clutter on cue effectiveness for display signaling. Human Factors: The Journal of the Human Factors and Ergonomics Society, 45(3), 390-407. https://doi.org/10.1518/hfes.45.3.390.27249

*Yeh, Brandenburg, Wickens, C. D., & Merlo (2000). Up or down? A comparison of helmet mounted display and hand held display tasks with high clutter imagery. Technical report ARL-00-11/FED-LAB-00-3. US Army Research Laboratory Interactive Displays Federated Laboratory.

*Young, K. L., Stephens, A. N., Stephan, K. L., & Stuart, G. W. (2016). In the eye of the beholder: a simulator study of the impact of Google Glass on driving performance. Accident Analysis and Prevention, 86, 68-75.