Abstract
Autonomous rendezvous and docking (ARD) maneuvers are challenging tasks that require collaboration between a human and a spacecraft to be successful. As automation becomes more integrated into ARD systems, it is important to consider when and why a human may take control. Intrinsic human characteristics can influence these decisions. We consider how human spatial orientation capacity affects participants when monitoring a simulated ARD maneuver and initiating takeover when the system is perceived to be failing. Participants’ spatial reasoning capability was assessed and compared to their performance in the monitoring task and their perceived mental workload. While participants showed high rates of success in the task, they showed a wide range in spatial reasoning capacity and perceived mental demand. Spatial reasoning capacity did not predict participants’ mental workload, which has implications for the human as a supervisor. These results can inform future work on augmentative displays that may incorporate exocentric and egocentric views.
Introduction
Autonomous rendezvous and docking (ARD) of spacecraft is a challenging task for space operations such as resupply, inspection, and maintenance missions. ARD systems are highly complex, consisting of a multitude of navigation and control systems working together to acquire the target spacecraft, calculate relative position and attitude, and execute the docking maneuver (Woffinden & Geller, 2007). These docking maneuvers require highly skilled astronauts operating in a high-risk environment. ARD presents an opportunity to reduce the mental and physical workload on the astronaut by offloading tasks such as actuator control and motion planning onto an autonomous system, which could make it easier and safer for the astronaut to perform high-level tasks in spacecraft motion control. As autonomy is integrated into complex systems, human involvement in the control loop remains necessary but also introduces new challenges (Sarter, Woods, & Billings, 1997). Robotic systems are limited by the task and environment in which they are designed to operate. Thus, humans are critical for compensating for the performance shortcomings of the autonomous robot, resulting in a situation where the human and robot share control over the system. The goal in successful ARD operations is to exploit the accuracy and precision of the robot to reduce the mental workload on the astronaut (NASA, 2015). Integrating the human with the autonomous system poses several important challenges, including determining when and why a human may take control.
Since the beginning of the space race, ARD has been a focus area for human-robot interaction (Woffinden & Geller, 2007). The United States space program, which gravitated toward a manual control approach to ARD, differed from the Russian space program, which favored an autonomous control approach. These approaches began to blend over the course of the unprecedented space exploration programs of both nations. There is now extensive research on control strategies for ARD (Breger & How, 2008; Zappulla, Park, Virgili-Llop, & Romano, 2019). To ensure proper system performance, the human must often monitor the status of the robot as it performs its maneuver (Bainbridge, 1983). When the human monitoring a docking maneuver believes the spacecraft is in a critical state, they initiate a manual takeover, in which the human intervenes in the robot’s control loop and assumes control of the system.
Here we define manual takeover as an action by a human monitoring an autonomous system to take control of that system, triggered when the human’s situation awareness (Endsley, 1995) leads to the impression that the autonomous system is in a critical state. In response to the inference that the system will fail, the human intervenes in the autonomous control of the system and performs manual control to reduce the perceived probability of system failure. While takeovers have not been studied in spacecraft, they have been studied using autonomous vehicle simulators. Autonomous vehicle takeover studies (e.g., Weaver & DeLucia, 2020; McDonald et al., 2019) often analyze the robot’s request for manual takeover followed by the human’s action to take control of the simulated vehicle. The purpose of these studies is to improve the transition of control from the robot to the human when a critical situation is identified. For spacecraft docking, it is important to understand the human’s decision to take over, not only the process of taking over when requested. While takeover predictors could be used as selection criteria for human supervisors, their greater potential lies in improving ARD algorithm design and user interfaces so that system success depends less on having supervisors with a certain level of proficiency in a specific cognitive skill set.
Intrinsic aspects of the human may affect monitoring performance and the decision to initiate manual takeover in an autonomous docking task. Studies have investigated how a person’s spatial reasoning capacity may affect astronaut task performance. Liu et al. (Liu, Oman, Galvan, & Natapoff, 2013) explored how astronauts’ performance on spatial reasoning assessments can be used to identify high performers in robotic training assessments. Collins et al. (Collins, Tomlinson, Oman, Liu, & Natapoff, 2008) found that spatial reasoning affected performance metrics, such as participants’ navigation and alignment to the target, in a telerobotic arm operation task. These studies raise the question of how spatial reasoning may affect performance in monitoring autonomous docking.
This work analyzes the relationship between human spatial visualization ability and performance in an autonomous agent monitoring task. In this study, we develop a simulation of an autonomous spacecraft docking maneuver to a mock-up space station. The human monitors a variety of autonomous docking maneuvers and initiates a manual takeover when they perceive the system to be failing. We measure task success and consider it in the context of the participant’s spatial reasoning and perceived workload. We hypothesize that (1) higher spatial reasoning capacity will result in higher success rates in the monitoring task, (2) higher spatial reasoning capacity will correspond to lower perceived mental workload in the monitoring task, and (3) higher success rates in the monitoring task will correspond to lower perceived mental workload.
Methods
Participants
Participants provided written informed consent approved by the University of Michigan Institutional Review Board (HUM00219137). A total of
Simulation Environment
A simulation game was developed on a virtual reality headset platform. The simulated space environment (Fig. 1a) consisted of a mock-up International Space Station (ISS) and an autonomous spacecraft (referred to as the agent) that attempted a variety of autonomous docking maneuvers targeting two docking platforms. Participants monitored the agent performing each maneuver from the perspective of a pilot with an egocentric view of the environment on board the agent (Fig. 1b). The docking targets were designed as spherical platforms to avoid the need to reorient the agent during the final stages of the docking maneuver.

Fig. 1. Simulation environment in which a robotic agent performs an autonomous docking maneuver on a mock-up space station. (a) Exocentric view of the simulation. (b) The participant’s egocentric perspective when monitoring the docking maneuver.
Experimental Protocol
Participants completed the Vandenberg Mental Rotation Test (MRT) (Vandenberg & Kuse, 1978) to characterize their spatial reasoning capability. The test consisted of twenty items that assessed spatial reasoning using images of three-dimensional objects rotated into different orientations. For each item, the participant was given an image of a standard object to compare against four candidate images. Two of the four candidates depicted the same object as the standard, rotated to a different orientation, and the participant’s goal was to identify these two. The other two candidates, referred to as distractors, depicted objects that were not the same as the standard. The distractors were created using two distinct strategies: in half of the items, the distractors were mirror images of the standard object; in the other half, they were rotated versions of objects from different items.
The MRT was scored using a strategy adapted from the forty-point scale detailed in (Voyer, 1997). In this strategy, one point was awarded for each correct answer and one point was deducted for each incorrect answer; therefore, if one answer was correct and one was incorrect, zero points were awarded for the item. Our strategy differed from (Voyer, 1997) by also deducting points when an item had one or two wrong answers, which allows scores to go below zero and limits saturation effects at the lower end of the scale. The adapted score scale was
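To make the adapted rule concrete, the following is a minimal sketch that assumes each item response is recorded as the set of candidate indices (0 to 3) a participant selected, with the answer key as the set of the two correct indices. The function names are hypothetical. `score_item_floored` is only our reading of the unadapted rule (item scores floored at zero) and is not a confirmed reproduction of (Voyer, 1997).

```python
from typing import Sequence, Set


def score_item(selected: Set[int], key: Set[int]) -> int:
    """Adapted MRT item score: +1 per correct choice, -1 per incorrect choice.

    With two selections per item, the item score ranges from -2 to +2.
    """
    correct = len(selected & key)    # selected candidates that match the standard
    incorrect = len(selected - key)  # selected distractors
    return correct - incorrect


def score_item_floored(selected: Set[int], key: Set[int]) -> int:
    """Hypothetical floored variant: an item can contribute no less than zero."""
    return max(0, score_item(selected, key))


def score_test(
    responses: Sequence[Set[int]],
    keys: Sequence[Set[int]],
    floored: bool = False,
) -> int:
    """Sum item scores over all items (twenty in the MRT version used here)."""
    if len(responses) != len(keys):
        raise ValueError("responses and keys must have the same length")
    item_score = score_item_floored if floored else score_item
    return sum(item_score(r, k) for r, k in zip(responses, keys))
```

For example, with key `{0, 2}`, selecting `{0, 2}` scores 2, selecting `{0, 1}` scores 0, and selecting `{1, 3}` scores -2 under the adapted rule but 0 under the floored variant. The negative item scores are what allow low performers to be distinguished instead of all collapsing to zero.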
Before the simulation game began, demographic information was recorded, and the objectives and rules of the simulation game were presented to the participant. The participant then donned the virtual reality headset and completed an introductory phase to become familiar with the hardware, a training phase to learn the game protocol, and a validation phase to ensure that they understood both the hardware and the game rules. After passing the validation phase, participants performed 180 docking maneuver trials, split into three phases of 60 trials each to allow for breaks. The experiment was self-paced and could be paused and resumed at the participant’s discretion.
The agent used a variety of path designs in its approach to the docking stations. In each trial, the agent’s planned path ended at either the green dock or the red dock; the targeted dock was unknown to the participants. The participant’s goal was to initiate a takeover when they perceived the agent to be failing to dock properly. Participants were instructed to monitor each maneuver and initiate a takeover when they believed the agent was attempting to dock at the red station rather than the green one, or when the agent was at risk of colliding with the ISS. Participants were not asked to manually correct the path after initiating the takeover. Each trial ended after either a complete docking maneuver or a takeover initiation.
Successful trials were those that ended at the green dock (successful dock) and those in which takeover was initiated on a path headed toward the red dock (successful takeover). Failed trials were those in which takeover was not initiated on a path headed toward the red dock (failed dock), takeover was initiated even though the agent was heading to the green dock (unnecessary takeover), or the takeover decision came “too late” (late takeover). In a late takeover, takeover was initiated so close to the red dock that the agent could not feasibly correct its path given the space constraints. To foster game motivation, participants were scored on each trial outcome: successful dock (+2), successful takeover (+5), unnecessary takeover (0), late takeover (-5), and failed dock (-5). Because an unnecessary takeover neither earned nor lost points, participants were not penalized for being cautious with the safety of the system. Participants were not informed of the maximum achievable score, and the maximum score was varied across the three phases so that participants could not back-calculate trial outcomes.
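The outcome rules and point values above can be sketched as a small classifier. This is an illustrative sketch, not the simulation’s code. It assumes each trial is summarized by the dock the planned path ends at and, if a takeover occurred, the agent’s distance to the red dock at that moment. The threshold `min_safe_distance` that separates a late takeover from a successful one is a hypothetical parameter because its value is not reported. The collision-risk criterion is not modeled.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Outcome(Enum):
    SUCCESSFUL_DOCK = "successful dock"
    SUCCESSFUL_TAKEOVER = "successful takeover"
    UNNECESSARY_TAKEOVER = "unnecessary takeover"
    LATE_TAKEOVER = "late takeover"
    FAILED_DOCK = "failed dock"


# Points per outcome, as listed in the scoring scheme.
POINTS = {
    Outcome.SUCCESSFUL_DOCK: 2,
    Outcome.SUCCESSFUL_TAKEOVER: 5,
    Outcome.UNNECESSARY_TAKEOVER: 0,
    Outcome.LATE_TAKEOVER: -5,
    Outcome.FAILED_DOCK: -5,
}

# Outcomes counted as successful trials.
SUCCESSES = {Outcome.SUCCESSFUL_DOCK, Outcome.SUCCESSFUL_TAKEOVER}


def classify_trial(
    target: str,
    takeover_distance: Optional[float],
    min_safe_distance: float,
) -> Outcome:
    """Classify one trial.

    target: "green" or "red", the dock at which the planned path ends.
    takeover_distance: distance to the red dock when takeover was initiated,
        or None if the participant never initiated a takeover.
    min_safe_distance: hypothetical threshold; closer takeovers are too late.
    """
    if target not in ("green", "red"):
        raise ValueError("target must be 'green' or 'red'")
    if takeover_distance is None:
        return Outcome.SUCCESSFUL_DOCK if target == "green" else Outcome.FAILED_DOCK
    if target == "green":
        return Outcome.UNNECESSARY_TAKEOVER
    if takeover_distance < min_safe_distance:
        return Outcome.LATE_TAKEOVER
    return Outcome.SUCCESSFUL_TAKEOVER


def summarize(outcomes: List[Outcome]) -> Tuple[float, int]:
    """Return (success rate, total points) for a participant's trials."""
    if not outcomes:
        raise ValueError("no trials to summarize")
    n_success = sum(1 for o in outcomes if o in SUCCESSES)
    return n_success / len(outcomes), sum(POINTS[o] for o in outcomes)
```

Because the point values are kept in a single table separate from the classification logic, the scoring scheme can be changed without touching the outcome rules. The success rate counts unnecessary takeovers as failures even though they score 0 points.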
At the end of the experiment, the NASA Task Load Index (TLX) (Hart & Staveland, 1988) was used to characterize the participant’s perceived mental workload. We analyzed one of the six TLX subscales, mental demand. Participants responded to the prompt “How mentally demanding was the task?” on a scale from -10 (very low) to 10 (very high).
Data and Statistical Analysis
Each participant’s success rate was calculated by dividing the number of successful trials (successful docks plus successful takeovers) by the total number of trials attempted. Pearson correlation coefficients were calculated to test for linear relationships between MRT score and success rate, MRT score and mental workload, and success rate and mental workload. Spearman’s rank correlation coefficients were also calculated for these three pairs to test for monotonic relationships.
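The following dependency-free sketch shows how the two coefficients are computed. The function names are our own. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return the coefficients along with p-values, and this sketch does not compute p-values. Spearman’s coefficient is computed here as Pearson’s coefficient on ranks, with tied values (for example, several participants at 100% success) assigned the average of the ranks they occupy.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (strength of linear association)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (strength of monotonic association)."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

For a relationship that is monotonic but not linear, such as y = x² for x = 1..5, `spearman_rho` returns 1.0 while `pearson_r` returns about 0.98. Computing both coefficients distinguishes the two kinds of relationship.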
As a secondary analysis to inform interface design guidelines, we also considered whether MRT performance differed between the two types of distractors. We estimated the Cohen’s d effect size between mean performance on items with mirror distractors and mean performance on items with rotated distractors. The effect size was interpreted using the heuristic of a small effect
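The following sketch computes Cohen’s d using the pooled standard deviation. It assumes one value per participant for each distractor type, such as the proportion of correct responses on mirror-distractor items and on rotated-distractor items. The pooled form is an assumption here. Because each participant answered both item types, a paired variant that divides the mean within-participant difference by the standard deviation of those differences is also commonly used.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d for two samples using the pooled (n - 1) standard deviation.

    A positive value means sample `a` has the higher mean.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two values")
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)
```

Passing the rotated-distractor scores as `a` and the mirror-distractor scores as `b` makes a positive d indicate better performance on rotated distractors.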
Results
In the monitoring task, participants showed a mean success rate of 98% (standard deviation 1.4%). The minimum success rate was 94%, and six participants performed all trials successfully. Participants had a wide range of performance on the Vandenberg Mental Rotation Test, with a mean score of 18 (standard deviation 8.5, minimum -2, maximum 36). Participants reported a wide range of perceived mental workload on the task (mean 0.17, standard deviation 5.1, minimum -9, maximum 8).
The rate of correct responses to items with mirror distractors across participants was 0.5
There were no statistically significant correlations observed between the measures examined. There was no statistically significant correlation between participants’ MRT score and success rate (Fig. 2a), with a Pearson correlation coefficient of -0.019

Fig. 2. (Left) Participants’ success rate in monitoring autonomous docking vs. MRT score. (Middle) Participants’ NASA TLX mental demand score vs. MRT score. (Right) Participants’ NASA TLX mental demand score vs. success rate.
Discussion
The purpose of this study was to analyze how spatial reasoning capacity, as assessed by the MRT, affected performance in monitoring autonomous docking maneuvers. Performance was characterized by the participant’s success rate in the task and their reported mental workload. We hypothesized that (1) higher spatial reasoning capacity would result in higher success rates in the monitoring task, (2) higher spatial reasoning capacity would correspond to lower perceived mental workload, and (3) higher success rates would be associated with lower perceived mental workload.
The first hypothesis was not supported: there was no linear or monotonic relationship between participants’ MRT scores and success rates in the simulation game. For the autonomous monitoring task evaluated, MRT score was not a significant predictor of performance. While all success rates fell between 94% and 100%, whether these are acceptable performance levels depends on the context of the decision being made. In our task, a mean failure rate of 2% may not be acceptable. This result indicates that people with a wide range of spatial reasoning capability can perform the task at success rates of at least 94%. In the simulation, the participant monitors the autonomous agent from an egocentric view, the perspective of a pilot on board the agent. In contrast, the MRT assesses spatial reasoning about an object from an exocentric perspective. It is possible that the egocentric perspective that participants experienced in the simulation does not require high levels of exocentric spatial reasoning.
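To put these rates in concrete terms, assume every participant attempted all 180 trials. The success rate was normalized by trials attempted, so this is an assumption. Under it, the reported mean and minimum success rates correspond to roughly:

$$
\begin{aligned}
(1 - 0.98) \times 180 &\approx 3.6 \ \text{failed trials per participant on average},\\
(1 - 0.94) \times 180 &\approx 10.8 \ \text{failed trials for the lowest-performing participant}.
\end{aligned}
$$

Because the reported percentages are rounded, these counts are approximate.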
The second hypothesis was not supported: there was no linear or monotonic relationship between participants’ MRT scores and reported mental demand. We originally expected that higher spatial reasoning capacity would reflect a participant’s natural propensity for spatial orientation tasks, which would allow them to complete the monitoring task with lower mental workload. However, participants with high MRT scores still reported a range of perceived mental workload. This result indicates that higher spatial reasoning capacity is not necessarily a predictor of perceived mental workload in the task. It could also indicate that the task did not require high exocentric mental rotation capacity and that mental workload was driven by another factor, such as projecting the agent’s trajectory from a given position along the path.
The third hypothesis was not supported: there was no linear or monotonic relationship between success rate and reported mental demand. We originally expected that participants who performed well at the task would find it easier and would report lower mental demand. However, the range of reported mental demand shows that some participants perceived the mental workload of the task to be high yet were still successful in the monitoring task. These findings are consistent with those observed in the literature (Lee, Wickens, Liu, & Boyle, 2017). Overall, all participants achieved success rates of at least 94%, but they reported a wide range of perceived effort to complete the task. The wide range of perceived effort could indicate that the task required prolonged vigilance, which resulted in high mental demand. High levels of mental demand have implications for the human in a supervisory role. Multiple Resource Theory illustrates how performing multiple tasks that draw on the same resources concurrently can lead to performance deterioration (Zeeb, Buchner, & Schrauf, 2016). In this study, participants had no secondary tasks to attend to while monitoring. Studies using autonomous vehicle simulators have shown that takeover performance suffers when the driver engages in non-driving-related tasks such as writing emails and watching news (Zeeb, Buchner, & Schrauf, 2016) or quiz tasks (Merat, Jamson, Lai, & Carsten, 2012). In space applications, astronauts will divide their attention among other operational tasks while monitoring a docking maneuver. The mental workload of the operator should therefore be carefully considered when secondary tasks are added to the monitoring task.
Further analysis of MRT performance could inform interface designs that mitigate high mental workload. Participants showed a 24% increase in performance on items with rotated distractors compared to items with mirrored distractors. While this work did not focus on interface design specifically, these results should inform the design of augmentative displays that incorporate both egocentric and exocentric perspectives to support the monitoring task. If exocentric views are incorporated into an interface to support the monitor’s spatial orientation, monitors may be less accurate at spatial interpretation when presented with mirror-reversed views. Future work could analyze the challenges of incorporating exocentric displays, which may be rotated relative to the egocentric view, into the autonomous monitoring task.
Limitations and Future Work
In this study, we considered one intrinsic characteristic of the human: spatial reasoning. Future studies should consider other intrinsic characteristics, as well as aspects of the environment and agent that may influence takeover behavior. Simulation sickness was not measured directly, though its effects may have been reflected in participants’ TLX responses regarding mental effort and frustration. Participants showed a variety of reactions to the simulation: while some experienced moderate dizziness, others were comfortable.
The virtual reality platform gave participants a full field of view in which they could turn to locate their position relative to the ISS even when the docks were not in their direct line of sight. While this supported decision making, the study did not consider how restrictions on field of view can affect performance. Beyond the implications of these results for exocentric versus egocentric display design, future work should also consider the viewpoint presented to the monitor.
The docking task in this study was limited to motion within a 2D plane. Rendezvous trajectories can take more complex three-dimensional shapes that should be considered in future studies. Factors such as agent orientation and alignment at docking were not included in this study but could affect performance. Additionally, resource management of the agent, such as fuel monitoring, was not considered by the human or in the selected paths. The agent’s path plan was a polynomial spline, which is simplified compared with the complexity of ARD maneuvers applied in space today. Future work should consider these additional complexities, as well as additional factors related to the autonomous agent’s path design and the human’s decision to take over.
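The path representation is described here only as a polynomial spline in a 2D plane. As an illustration of what one such planar segment could look like, the following sketch assumes a single cubic Hermite segment defined by start and end positions and the tangents at those ends. The function name and parameterization are hypothetical and are not the simulation’s actual path generator.

```python
from typing import Tuple

Vec = Tuple[float, float]


def cubic_hermite(p0: Vec, v0: Vec, p1: Vec, v1: Vec, s: float) -> Vec:
    """Point on a planar cubic Hermite segment at normalized parameter s in [0, 1].

    p0, p1: start and end positions (for example, a start point and a dock).
    v0, v1: tangent vectors at the start and end of the segment.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    # Cubic Hermite basis polynomials.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    x = h00 * p0[0] + h10 * v0[0] + h01 * p1[0] + h11 * v1[0]
    y = h00 * p0[1] + h10 * v0[1] + h01 * p1[1] + h11 * v1[1]
    return (x, y)
```

Chaining such segments with matching positions and tangents at their joints gives a smooth piecewise-cubic path. A three-dimensional version would add a third coordinate with the same basis polynomials, which relates to the 3D extension suggested above.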
Conclusion
This work analyzed how human spatial orientation capacity related to performance in monitoring an autonomous spacecraft docking maneuver. We found that spatial orientation ability, as assessed by the Vandenberg MRT (an exocentric mental rotation test), did not significantly affect egocentric monitoring performance in this simulation study or the monitor’s perceived mental workload. The monitoring task was performed successfully by people with a range of spatial reasoning capacity, yet it required different levels of mental workload, which has implications when additional tasks are required. Participant performance on the MRT could inform future aspects of interface design for this simulation by suggesting perspective strategies for objects under evaluation. Future work will focus on how the autonomous agent and environment affect the monitoring task to better support human-autonomy interaction in ARD tasks.
