Abstract
In this study, we validate and extend previous theoretical and empirical work on the design of visualizations that support decision makers using interfaces with high information density and evolving constraints. Prior research proposed and validated a theoretical framework of task complexity to facilitate the systematic evaluation of uncertainty visualizations in these contexts. We applied this approach to the evaluation of uncertainty visualizations. To improve the ecological validity of the experimental task, we displayed hurricane visualizations on a map rather than abstract symbols on a grid, manipulated the dispersion of visualizations, and varied an additional visuospatial element (visualization orientation). The results of this study provide evidence for the utility of a task complexity framework for research on visualization efficacy.
Decision makers in complex environments typically rely on interfaces to collect and integrate information from multiple sources. These interfaces often include different types of data and information visualization, including visualizations of uncertainty, which aid decision making by supporting perceptual and cognitive processing (Newton et al., 2017; Sedig & Parsons, 2013). These visualizations are designed to help users manage the ambiguity and impermanence of relevant information. Research in this area has explored whether and how visualization scaffolds decision making beyond numerical and textual information, as well as how the types and features of uncertainty representations produce differences in decision making performance (Bisantz et al., 2011; Kirschenbaum et al., 2014). However, communicating uncertainty effectively remains a challenge for visualization designers and researchers because users may, for example, misinterpret or misunderstand the information (Witt et al., 2020, 2022). Furthermore, there is a lack of consensus and consistency in the design and evaluation of uncertainty visualizations (MacEachren, 2015).
In an effort to move this research domain toward a more systematic assessment of visualization, Fiore and colleagues (2017) proposed and validated a theoretical framework for examining how variations in environmental factors influence decision making under uncertainty. Their work builds on Wood's (1986) theorizing about the forms of complexity associated with a task environment. In this conceptualization of task complexity, Wood defines component complexity and coordinative complexity. Component complexity is defined in terms of the number of distinct items associated with the task, whereas coordinative complexity is defined in terms of the degree to which those items need to be integrated for the task. Using abstract stimuli drawn from the visual semiotics work of MacEachren et al. (2012), Fiore and colleagues systematically varied uncertainty visualizations along these two dimensions and found that they produced differences in decision making performance as well as in the relationship between cognitive load and performance (i.e., cognitive efficiency). Their framework thus provides guidance on how to vary task context to study the efficacy of visualizations in support of decision making.
In this paper, we describe a study that builds on this prior work to understand how accuracy and workload differ when variations of uncertainty visualization are systematically evaluated using the theoretical framework proposed by Fiore et al. (2017). Although we applied the same framework, we improved the ecological validity of the stimuli to test whether changes in component complexity and coordinative complexity still alter outcomes. To do so, we used hurricane location uncertainty visualizations on a map. Our primary experimental manipulation was the dispersion of visualizations on each map. The uncertainty visualizations in Fiore et al. (2017) were uniformly distributed across all possible locations in a given grid. In the current study, we varied the distribution of uncertainty visualizations such that they were either uniformly or randomly dispersed across the map. Because weather events do not appear uniformly in natural environments, this manipulation enabled us to evaluate whether the same patterns in performance emerge with more ecologically valid stimuli.
We additionally varied the orientation of visualizations in our study. The grids used by Fiore et al. (2017) displayed symbols that shared the same orientation. Research on basic cognition and perception suggests that similarity of items can positively influence both visual working memory and perception (Peterson & Berryhill, 2013). As a result, the uniformity in the presentation of visualizations in Fiore et al. (2017) may have affected the observed differences in performance and cognitive load. Because the projected paths of hurricanes differ in direction, this manipulation also contributes to an improvement in ecological validity.
As our study is a validation and extension of Fiore et al. (2017), we first hypothesized that trials with the highest complexity (high component/high coordinative) would result in the lowest levels of performance and cognitive efficiency, whereas trials with the lowest complexity (low component/low coordinative) would result in the highest levels. Second, we hypothesized that similar dispersion and orientation would result in higher levels of performance and cognitive efficiency, whereas dissimilar dispersion and orientation would result in lower levels.
Method
Participants
Participants were recruited from Amazon's Mechanical Turk (AMT; N = 202; 124 males, 78 females). Ages ranged from 18 to 82 (mean age = 33.91 years). The majority of participants were White (n = 105), located in the USA (n = 130), and reported English as their first language (n = 181). All participants received a base payment of $3.00. To incentivize good performance, a bonus was awarded based on correct responses to the experimental task: each correct judgement earned one point, and every 20 points resulted in a $0.10 bonus. The experiment took approximately 30 minutes to complete.
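The bonus rule reduces to simple integer arithmetic. The sketch below assumes, as the wording suggests but does not state outright, that only complete blocks of 20 points earn a bonus; the function name `bonus_cents` is hypothetical, and amounts are kept in cents to avoid floating-point rounding.

```python
# Sketch of the bonus rule, ASSUMING only complete blocks of 20 points pay out
# (partial blocks earn nothing). Amounts are integer cents.

POINTS_PER_BLOCK = 20   # from the text: every 20 points ...
CENTS_PER_BLOCK = 10    # ... resulted in a $0.10 bonus


def bonus_cents(correct_judgements: int) -> int:
    """Return the bonus in cents; each correct judgement is worth one point."""
    if correct_judgements < 0:
        raise ValueError("number of correct judgements cannot be negative")
    completed_blocks = correct_judgements // POINTS_PER_BLOCK
    return completed_blocks * CENTS_PER_BLOCK


# With all 64 judgements correct, a participant completes three blocks of
# 20 points, i.e. a 30-cent bonus on top of the $3.00 base payment.
print(bonus_cents(64))  # prints 30
```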
Materials
Potential participants were invited to complete the survey on AMT, and the study was administered via Qualtrics. Participants who accepted the HIT on AMT were redirected to Qualtrics, where they reviewed an IRB-approved Informed Consent Document and documented their voluntary decision to complete the study. They then worked through instructional materials intended to familiarize them with the experimental task, and in particular with the visualizations used in the study. Alongside this material, participants completed a set of practice items with feedback.
Participants completed a series of experimental trials in which they were tasked with making a decision between two side-by-side images containing visualizations of uncertainty of a hurricane’s projected location overlaid on a map region. Specifically, participants selected the image which displayed the highest level of uncertainty. Immediately after each decision, they rated the level of difficulty associated with the decision on a 7-point scale, ranging from “Very Easy” to “Very Difficult.” Participants’ decision difficulty responses were used to assess cognitive workload.
Design
Independent Variables
This experiment used a mixed design. The between-subjects factor was visualization dispersion and the within-subjects factors were task complexity, visualization orientation, and judgement difficulty. Two types of commonly used and tested hurricane forecast visualizations were included in the experiment: the cone of uncertainty and ensemble track forecasts, or spaghetti plots (see Figure 1; Witt et al., 2020, 2022). In our experiment, participants were instructed that the width of the visualization represented the level of uncertainty associated with a given hurricane with greater width corresponding to a higher level of uncertainty.

Figure 1. Cone and spaghetti visualizations of uncertainty. Uncertainty is shown by the red zones. The width of the zone indicates the level of uncertainty: the wider the zone, the greater the uncertainty.
Task complexity varied along two dimensions: component complexity and coordinative complexity. Each complexity type had two levels. Component complexity was operationalized as the number of hurricane locations in each image, with 4 locations for low complexity and 8 locations for high complexity. Coordinative complexity was operationalized as the integration of different information visualizations, with a single visualization type (i.e., cone or spaghetti) as low complexity and a combination of visualization types (cone and spaghetti) as high complexity.
Experimental blocks were balanced across these complexity levels, resulting in four conditions: low component/low coordinative (LL), high component/low coordinative (HL), low component/high coordinative (LH), and high component/high coordinative (HH). The number of judgements by condition is provided in Table 1. Each participant completed a total of 64 judgements in the decision making task. The presentation of blocks and images was randomized and counterbalanced.
Table 1. Number of uncertainty judgements made by participants, broken down by complexity level, visualization type, and visualization orientation. Each complexity level consisted of 16 uncertainty judgements.
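The 2 × 2 crossing of the two complexity dimensions can be sketched as follows. Only the level definitions (4 vs. 8 locations; single vs. mixed visualization types) and the 16 judgements per condition come from the text; the data structures and the name `build_conditions` are illustrative and are not the software used to build the study's stimuli.

```python
from itertools import product

# Illustrative encoding of the two complexity dimensions described above.
# Component complexity: number of hurricane locations per image.
COMPONENT_LEVELS = {"L": 4, "H": 8}
# Coordinative complexity: one visualization type vs. a mix of types.
COORDINATIVE_LEVELS = {
    "L": "single type (cone or spaghetti)",
    "H": "mixed (cone and spaghetti)",
}
JUDGEMENTS_PER_CONDITION = 16  # per Table 1


def build_conditions():
    """Cross the two complexity dimensions into the four task conditions."""
    conditions = []
    for comp, coord in product(COMPONENT_LEVELS, COORDINATIVE_LEVELS):
        conditions.append({
            "label": comp + coord,  # first letter: component; second: coordinative
            "n_locations": COMPONENT_LEVELS[comp],
            "visualization_types": COORDINATIVE_LEVELS[coord],
            "n_judgements": JUDGEMENTS_PER_CONDITION,
        })
    return conditions


conditions = build_conditions()
for c in conditions:
    print(c["label"], c["n_locations"], c["visualization_types"])
print(sum(c["n_judgements"] for c in conditions))  # prints 64
```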
Within these blocks, visualization orientation and judgement difficulty were manipulated. Orientation was manipulated at two levels such that all uncertainty visualizations in an image were either similar in direction (fixed) or dissimilar in direction (varied; see Figure 2). As in Fiore et al. (2017), each trial was classified as either easy or difficult on the basis of the difference in uncertainty between the two images. First, each image was assigned a point value derived from the amount of uncertainty it displayed. Then, the difference between the point values of the two images in a pair was calculated. Pairs were classified as easy if the difference was relatively large (e.g., 8 versus 16) and difficult if the difference was relatively small (e.g., 8 versus 10).

Figure 2. Example stimuli for the high-high complexity condition with fixed orientation (left) and varied orientation (right).
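The easy/difficult classification of image pairs can be sketched as below, assuming each image has already been assigned its uncertainty point value (the derivation of those values is not specified here). The cut-off `EASY_THRESHOLD` is a hypothetical value chosen only so that the two examples in the text are classified as described; the study does not report its cut-off, and the helper names are likewise hypothetical.

```python
# Sketch of the easy/difficult classification of an image pair.
# ASSUMPTIONS: point values are given; EASY_THRESHOLD is hypothetical.
# Any threshold in (2, 8] classifies the text's examples (8 vs 16, 8 vs 10)
# as described.
EASY_THRESHOLD = 4


def classify_pair(points_a: float, points_b: float) -> str:
    """Label a pair 'easy' or 'difficult' by its difference in uncertainty points."""
    difference = abs(points_a - points_b)
    return "easy" if difference >= EASY_THRESHOLD else "difficult"


def correct_choice(points_a: float, points_b: float) -> str:
    """Return the image ('A' or 'B') with more uncertainty, i.e. the correct response.

    ASSUMPTIONS: a higher point value means more uncertainty, and the two
    images in a pair never tie.
    """
    return "A" if points_a > points_b else "B"


print(classify_pair(8, 16))   # easy (difference of 8)
print(classify_pair(8, 10))   # difficult (difference of 2)
print(correct_choice(8, 16))  # B
```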
Dependent Variables
We report performance and cognitive efficiency (CE) scores, which capture the effect of task context on performance and cognitive load. Performance scores were derived from responses to the decision making task: a participant's performance score for an experimental condition was the percentage of uncertainty judgements answered correctly in that condition. CE is a composite metric based on standardized workload scores and standardized performance scores (Fiore et al., 2006). CE scores "can be represented as the perpendicular distance from a line representing a level of zero efficiency (CE = [zp – zw]/√2)" (Fiore et al., 2017, p. 1195), where zp and zw are the standardized performance and workload scores. A positive score reflects cognitive efficiency: higher performance relative to reported workload. In contrast, a negative score reflects a lack of cognitive efficiency: lower performance with relatively higher workload.
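The sketch below shows how the performance and CE scores could be computed. It assumes that performance is expressed as a proportion correct (matching the means reported in the Results), that workload is a participant's mean difficulty rating for a condition, and that z-scores are computed over the pooled set of scores using the sample standard deviation; the text does not state which reference group was used for standardization. The function names are hypothetical and the input numbers are illustrative, not study data.

```python
import math
from statistics import mean, stdev

# Sketch of the performance and CE scores.
# ASSUMPTIONS: proportion-correct performance; z-scores computed over the
# pooled scores passed in, using the sample standard deviation.


def performance_score(responses: list[bool]) -> float:
    """Proportion of judgements answered correctly in one condition."""
    return sum(responses) / len(responses)


def z_scores(values: list[float]) -> list[float]:
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def cognitive_efficiency(performance: list[float], workload: list[float]) -> list[float]:
    """CE = (zp - zw) / sqrt(2): signed distance from the zero-efficiency line."""
    zp, zw = z_scores(performance), z_scores(workload)
    return [(p - w) / math.sqrt(2) for p, w in zip(zp, zw)]


# Illustrative participant-by-condition scores (not study data).
perf = [0.9, 0.8, 0.7]   # proportion correct
work = [2.0, 4.0, 6.0]   # mean difficulty rating on the 7-point scale
print(performance_score([True, True, False, True]))  # 0.75
print(cognitive_efficiency(perf, work))  # approx. [1.414, 0.0, -1.414]
```

High performance paired with low reported workload yields a positive CE, and the reverse yields a negative CE, matching the interpretation given above.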
Results
We present the results of analyses conducted to investigate the effect of the independent variables on performance accuracy and cognitive efficiency.
Performance Accuracy
A mixed-design repeated-measures ANOVA examined the effects of dispersion (between subjects) and of component complexity, coordinative complexity, judgement difficulty, and visualization orientation (within subjects) on performance accuracy. There was a significant main effect of component complexity on performance accuracy, F(1, 200) = 14.90, p < .0001, ηp² = .069, observed power = .970. Higher performance was observed in the low component complexity conditions (M = .88, SE = .01) than in the high component complexity conditions (M = .86, SE = .01). There was a significant main effect of coordinative complexity on performance accuracy, F(1, 200) = 11.81, p = .0001, ηp² = .056, observed power = .928. Higher performance was observed in the low coordinative complexity conditions (M = .88, SE = .01) than in the high coordinative complexity conditions (M = .86, SE = .01). There was a significant main effect of judgement difficulty on performance accuracy, F(1, 200) = 92.34, p < .0001, ηp² = .316, observed power = 1.00. Higher performance was observed for easy judgements (M = .90, SE = .01) than for difficult judgements (M = .85, SE = .01).
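The reported partial eta squared values can be related to the F statistics through the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The sketch below applies it to the three main effects on performance accuracy reported above. It is a consistency check under that identity, not a reconstruction of the authors' analysis software, and the function name is hypothetical.

```python
# Partial eta squared implied by an F test:
#     eta_p^2 = (F * df_effect) / (F * df_effect + df_error)


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Main effects on performance accuracy reported above, all tested with F(1, 200).
reported_f = {
    "component complexity": 14.90,
    "coordinative complexity": 11.81,
    "judgement difficulty": 92.34,
}
for effect, f in reported_f.items():
    print(effect, round(partial_eta_squared(f, 1, 200), 3))
# component complexity 0.069
# coordinative complexity 0.056
# judgement difficulty 0.316
```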
There was a significant interaction between component complexity and judgement difficulty, F(1, 200) = 9.43, p < .005, ηp² = .045, observed power = .864. The difference in performance between easy and difficult judgements was larger in the high component complexity conditions (M = .90, SE = .01 versus M = .83, SE = .01) than in the low component complexity conditions (M = .90, SE = .01 versus M = .86, SE = .01). There was a significant three-way interaction among component complexity, coordinative complexity, and dispersion, F(1, 200) = 15.14, p < .005, ηp² = .070, observed power = .972 (see Figure 3). The difference in performance between the lowest complexity level (low-low) and the highest complexity level (high-high) was larger in the random dispersion condition (M = .89, SE = .02 versus M = .85, SE = .02) than in the uniform dispersion condition (M = .88, SE = .02 versus M = .85, SE = .02). There was also a significant three-way interaction among component complexity, orientation, and dispersion, F(1, 200) = 5.95, p = .01, ηp² = .029, observed power = .680.

Figure 3. Participant performance by condition. The highest level of complexity (high-high) is at the top of the plot, and dispersion condition is encoded using color. In both dispersion groups, the highest performance is observed at the lowest level of complexity (low-low), but the trend is clearer for the random dispersion group (black bars).
Cognitive Efficiency
There was a significant main effect of component complexity on cognitive efficiency, F(1, 200) = 38.97, p < .0001, ηp² = .163, observed power = 1.00. Higher CE was observed in the low component complexity conditions (M = .04, SE = .07) than in the high component complexity conditions (M = -.10, SE = .07). There was a significant main effect of coordinative complexity on CE, F(1, 200) = 55.98, p < .0001, ηp² = .219, observed power = 1.00. Higher CE was observed in the low coordinative complexity conditions (M = .06, SE = .08) than in the high coordinative complexity conditions (M = -.12, SE = .07). There was a significant main effect of judgement difficulty on CE, F(1, 200) = 251.22, p < .0001, ηp² = .557, observed power = 1.00. Higher CE was observed for easy judgements (M = .16, SE = .08) than for difficult judgements (M = -.22, SE = .07).
There was a significant interaction between component complexity and dispersion, F(1, 200) = 39.55, p = .02, ηp² = .024, observed power = 1.00. The difference in CE scores between low and high component complexity was larger in the uniform dispersion group (M = .00, SE = .10 versus M = -.19, SE = .10) than in the random dispersion group (M = .08, SE = .10 versus M = -.02, SE = .10). There was a significant interaction between component complexity and difficulty, F(1, 200) = 11.42, p = .001, ηp² = .054, observed power = .920. The difference in CE scores between easy and difficult judgements was larger in the high component complexity conditions (M = .11, SE = .08 versus M = -.32, SE = .07) than in the low component complexity conditions (M = .20, SE = .08 versus M = -.12, SE = .07). There was a significant interaction between coordinative complexity and difficulty, F(1, 200) = 4.54, p = .03, ηp² = .024, observed power = .564. The difference in CE scores between easy and difficult judgements was larger in the high coordinative complexity conditions (M = .08, SE = .08 versus M = -.33, SE = .07) than in the low coordinative complexity conditions (M = .23, SE = .08 versus M = -.11, SE = .07). There was a significant interaction between orientation and dispersion, F(1, 200) = 4.00, p = .05, ηp² = .024, observed power = .512. In the random dispersion group, CE scores were positive in both the fixed (M = .01, SE = .10) and varied orientation conditions (M = .05, SE = .10), with higher CE for varied orientation. In the uniform dispersion group, CE scores were negative in both the fixed (M = -.07, SE = .10) and varied orientation conditions (M = -.10, SE = .10), with higher CE for fixed orientation.

Figure 4. Cognitive efficiency by condition. In both dispersion groups, the highest cognitive efficiency is observed in the conditions with the lowest levels of complexity (low-low and high-low) and the lowest difficulty, but a clearer trend across complexity and difficulty levels is observed for the random dispersion group (left facet).
Lastly, there was a significant four-way interaction among dispersion group, component complexity, coordinative complexity, and judgement difficulty, F(1, 200) = 8.32, p = .004, ηp² = .040, observed power = .819 (see Figure 4). In the random dispersion group, CE scores were positive for easy judgements and negative for difficult judgements at all levels of complexity. The uniform dispersion group showed a different pattern: regardless of judgement difficulty, CE scores were positive in the low-low complexity condition and negative in the high-high complexity condition, whereas in the low-high and high-low complexity conditions they were positive for easy judgements and negative for difficult judgements.
Discussion
We presented an extension of a theoretical framework for the experimental design and evaluation of uncertainty visualizations. This study tested how similarities in visualization placement interact with variations in task complexity. With more ecologically valid experimental stimuli (i.e., in the random dispersion group), we observed trends in performance and cognitive efficiency similar to those reported by Fiore et al. (2017). Although the main effect of dispersion was not statistically significant, we observed lower levels of performance and cognitive efficiency in the uniform dispersion group. Furthermore, the trend associated with complexity level was clearer in the random dispersion group for both performance and cognitive efficiency.
The results of this study suggest that the task complexity framework used to design experimental stimuli in uncertainty visualization research generalizes to more ecologically valid stimuli. Ecological validity was improved by dispersing decision points across a visual area, which more realistically mimics how decision makers encounter different pieces of information on a display. By testing how dispersion changes performance, we can more closely approximate the experience of decision makers. The promise of the framework for studying variations in visualization is apparent in the performance accuracy plot (Figure 3): the task complexity framework better captured differences in performance for the random dispersion condition.
Both our results and those of Fiore et al. (2017) can be related to the Gestalt principle of similarity as it applies to visual perception and cognition (e.g., Peterson & Berryhill, 2013): we observed higher performance in the low coordinative complexity conditions, in which a single visualization type was used. We also observed higher performance in the random dispersion group, in which the distance between visualizations varied. This finding can be associated with another Gestalt principle, namely proximity.
In sum, we described our validation and extension of the framework proposed by Fiore and colleagues (2017), moving beyond abstract and simple stimuli. We described how manipulating visual and spatial elements of experimental stimuli can improve ecological validity and can interact with task complexity to produce differences in performance and cognitive efficiency. We showed how experimental stimuli in visualization research can be systematically generated and evaluated to alter cognition and performance in decision making contexts. These findings offer developers of visualization-based decision support systems additional ways to test their technologies more rigorously and systematically.
Acknowledgements
This work was supported by the Office of Naval Research and Lockheed Martin Corporation. The views and opinions contained in this article are the authors' own and should not be construed as official or as reflecting the views of UCF, the Office of Naval Research, or the Lockheed Martin Corporation.
