Abstract
Augmented reality (AR) provides a mobile platform that could be used for a variety of human motor control assessments. Sensorimotor evaluations, directed towards hand-eye coordination, have widespread applications in many domains. This study investigated the effect of modality on a sensorimotor assessment by comparing button press location accuracy when administered through a touchscreen and through AR. Participants performed a multidirectional tapping task under two levels each of Task Difficulty and Modality. Average accuracy decreased as the difficulty of the task increased, as expected. Average accuracy also decreased when performing the task using AR as compared to using the touchscreen. There was no significant interaction between Task Difficulty and Modality. While AR can be used to assess hand-eye coordination, care should be taken when comparing AR measures with those collected through other modalities. Further work will evaluate additional metrics to characterize the differences and similarities between these interfaces and to support design recommendations.
Introduction
Augmented Reality (AR) has been used in many fields, such as rehabilitation (Sveistrup, 2004; Monge et al., 2018), medical education (Dhar et al., 2021), retailing (Caboni et al., 2019), and classroom teaching (Billinghurst et al., 2012). Sensorimotor assessments are a form of evaluation that measures an individual's capability to integrate sensory inputs with motor outputs to accomplish a given task and may be related to hand-eye coordination, balance, and proprioception. The assessments may involve tasks such as reaching for and grasping objects, standing on one leg, and tracking moving objects with the eyes. Sensorimotor assessments are widely used to identify impairments in sensory and motor functions that may affect an individual's ability to perform a task safely and effectively. For example, sensorimotor assessment can help to monitor and evaluate the recovery of patients after an accident or a stroke (Schwarz et al., 2019), and can also help to assess balance, hand-eye coordination, and other motor control capabilities in healthy aging populations. Sensorimotor assessments have been performed using motion tracking devices, motion capture systems, ankle perturbation devices, and visual observation (Riemann et al., 2002; Shirota et al., 2019). Participants may be required to perform sensorimotor tasks while wearing multiple sensors to quantify metrics of performance. These methods often require substantial resources, equipment, and trained clinicians to conduct the evaluations and interpret the data.
Extended reality technology is a viable, cost-effective alternative to physical assessments because it replaces physical objects with virtual content. AR can also streamline intricate tasks by providing contextual information, resulting in faster and more cost-effective setup of the task environment. Furthermore, AR can overlay instructions, labels, and annotations directly onto objects, thereby enhancing task intuitiveness and accessibility. However, there have been few efforts to examine the potential of extended reality for hand-eye coordination assessments. Previous studies developed sensorimotor tasks in either virtual reality (VR) or AR, thereby demonstrating the feasibility of using extended reality for hand-eye coordination assessments (Kanzler et al., 2020; Shen et al., 2012; Park et al., 2008). It is still unknown how human performance on hand-eye coordination tasks differs between extended reality and real-world tactile devices such as a touchscreen. Understanding how user accuracy varies within AR can support the future development of AR-based sensorimotor tasks.
Fitts (1954) implemented a targeted pointing task that has been widely used within the literature to evaluate human-computer interactions and a user’s hand-eye coordination. Recent standards (Soukoreff et al., 2004; ISO, 2002) have extended Fitts’ paradigm to evaluate target acquisition in multiple directions, i.e., a multidirectional tapping task, with a circular array of targets equidistantly arranged. Alternative hand-eye coordination assessments require participants to perform pursuit tracking of a target with a pointing device or to acquire targets presented randomly (Park et al., 2008). Regardless of the target or task type, the principal goal of hand-eye coordination assessments is to tap the targets as quickly and accurately as possible.
The goal of this study was to compare performance of a sensorimotor hand-eye coordination task, in the form of a multidirectional tapping task, when deployed on a standard touchscreen and in AR. We evaluated the button press location accuracy of performing the same task under both modalities. We hypothesized that accuracy would be affected by (1) index of difficulty and (2) test modality. The results identify the effect of AR-based targets on user accuracy and hand-eye coordination by comparing performance with normative data collected with a touchscreen device. These findings provide evidence of the feasibility of using AR for sensorimotor assessment tasks in place of traditional methods. More importantly, they inform the interpretation of user accuracy across different modalities, which can support assessments and design recommendations.
Methods
Participants
Thirty-two participants (13 female and 19 male, mean age 22.6 years ± 3.3 SD, min: 18, max: 31) were recruited from the University of Michigan to participate in the current study. Participants reported normal (n=16) or corrected-to-normal (n=16) vision and reported no musculoskeletal, auditory, or vestibular disorders. Participants were informed of the procedures before participating and provided written consent, which was approved by the University of Michigan Institutional Review Board for Health Sciences and Behavioral Sciences. All participants reported daily usage of touchscreen devices, such as tablets and smartphones. Eleven participants reported no previous experience with extended reality devices (i.e., AR or VR). Twenty-one participants reported limited previous experience (between 1 and 10 hours) with extended reality devices.
Task Selection
The hand-eye multidirectional tapping task was adapted from the ISO 9241-9 standard (ISO, 2002) (using 16 targets rather than 25) and is an extension of the Fitts paradigm (Fitts, 1954). The hand-eye task features 16 targets arranged equidistantly in a circular array with the principal aim of tapping the targets as quickly and accurately as possible (Figure 1). The sequence in which the participant acquired the targets followed a predefined tapping pattern that alternated the active target across the full diameter of the array while progressing clockwise. The active target illuminated for the participant to follow, with the starting and ending positions at the apex of the array. The task was implemented within an AR environment on the Microsoft HoloLens 2 device and on a 27-inch touchscreen monitor.

Figure 1. Experimental multidirectional tapping task with 16 target selections. Participants start at the target indicated with the plus sign. Targets illuminate in the order shown by the dotted line.
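The layout and ordering described above can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, and the exact ordering (start at the apex, alternate across the array while advancing one target clockwise, finish back at the apex) is an assumption consistent with the description, not the study's implementation.

```python
import math

def target_centers(n_targets, diameter):
    """Centers of n_targets spaced equally on a circle.

    Index 0 is at the apex (top) and indices increase clockwise.
    """
    radius = diameter / 2
    centers = []
    for k in range(n_targets):
        angle = math.pi / 2 - 2 * math.pi * k / n_targets  # clockwise from the top
        centers.append((radius * math.cos(angle), radius * math.sin(angle)))
    return centers

def tapping_sequence(n_targets):
    """Assumed ordering: start at the apex, cross the array, step one
    target clockwise, and repeat until the sequence returns to the apex."""
    half = n_targets // 2
    order = [0]  # starting position (not counted as a selection)
    for k in range(half):
        order.append(k + half)                       # target across the array
        order.append(k + 1 if k + 1 < half else 0)   # next clockwise target, or apex
    return order

centers = target_centers(16, 0.152)  # array diameter in meters (ID 1)
sequence = tapping_sequence(16)
print(sequence)
```

Under this assumed ordering, the starting position is followed by 16 target selections, and every target in the array is selected exactly once per round.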
The width of the targets (target size) and the diameter of the array (movement distance) were manipulated to establish two distinct indexes of difficulty. Index of difficulty was determined as
$$ID = \log_2\left(\frac{D}{W} + 1\right) \quad (1)$$
where D is the diameter of the array and W is the width of the target. The easier index of difficulty case (ID 1) had a diameter of 0.152 meters and a target width of 0.025 meters, yielding an index of difficulty of 2.824 bits. The more challenging index of difficulty case (ID 2) had a diameter of 0.229 meters and a target width of 0.019 meters, yielding an index of difficulty of 3.700 bits. The indexes of difficulty selected were in line with the 2-8 bits range of IDs used within the literature (MacKenzie et al., 2001; Norman et al., 2010; Soukoreff et al., 2004).
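As a quick check of the two difficulty levels, the sketch below recomputes the index of difficulty from the reported dimensions. It assumes the Shannon formulation, ID = log2(D/W + 1), which is consistent with the reported values; the function name is hypothetical.

```python
import math

def index_of_difficulty(diameter_m, width_m):
    """Index of difficulty in bits, assuming the Shannon formulation."""
    return math.log2(diameter_m / width_m + 1)

id_1 = index_of_difficulty(0.152, 0.025)  # easier condition
id_2 = index_of_difficulty(0.229, 0.019)  # harder condition
print(f"ID 1: {id_1:.3f} bits")
print(f"ID 2: {id_2:.3f} bits")
```

With the rounded dimensions given in the text, the computed values agree with the reported 2.824 and 3.700 bits to within about 0.01 bits.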
In the touchscreen environment, the array of 16 targets was presented with the center of the array aligned with the participant's eye height. In the AR environment, the array of 16 targets was holographically projected 0.5 meters from the user’s head position, also aligned at eye height. In both settings, participants were permitted to move as closely as needed within the physical and augmented environments to complete the task. Participants were instructed to perform the task with their dominant index finger. Target acquisition was dependent on the participant’s tracked fingertip location relative to the bounds of the target. The next target within the sequence activated as soon as the current active target was pressed by the tip of the user’s index finger. The order in which the participants performed the tasks across the Task Difficulty and Modality conditions was randomized to reduce bias and strengthen internal validity.
Experimental Protocol
This study included a demographics survey, training with the hardware and multidirectional tapping task, an evaluation phase, and a post-task survey. The study duration was one hour. Each participant was randomly assigned one of four task orders in terms of Modality (AR vs. Touchscreen) and Task Difficulty (ID 1 vs. ID 2). Each order was performed by eight participants. For each combination of Task Difficulty and Modality, the participant performed fifteen rounds of the sixteen-target tapping sequence, yielding 240 taps per condition per participant.
During the training section, participants were first trained with the HoloLens 2 AR headset, where the training included a built-in tool for learning interactions with holograms and navigating through the application interface to enable assessment protocols. Training also included a demonstration of the target acquisition sequence using the touchscreen device. Participants were permitted to train for up to five rounds of the sixteen target selections within each modality prior to testing. After the training section was completed, participants entered the evaluation phase. The evaluative testing required participants to perform three testing blocks of the sensorimotor assessment for each index of difficulty in each of the AR and Touchscreen modalities (six blocks per modality). One block consisted of five rounds of 16 target acquisitions. The participant was permitted to rest 20 seconds between trials and 45 seconds between blocks. Upon completion of all conditions, the participant answered a post-task survey concerning their perceived workload and fatigue, as well as their preferred modality to perform the task.
Performance Measures and Statistical Analysis
All data processing and statistical analyses were completed in MATLAB (Mathworks, Natick, MA). Standard measures of Fitts’ law (Soukoreff et al., 2004) were calculated, including accuracy, precision, movement time, throughput, and error rates. In this paper, accuracy is presented and was defined as the percent difference in the distance between the tap position and the center of the target. For both modalities, accuracy was calculated from the distance between the tap position and the center of the target, projected on the XY plane, as
$$\text{Accuracy} = \frac{R - P}{R} \times 100\% \quad (2)$$
where R is the radius of the target (W/2) and P represents the distance between the tap position and the center of the target. In the touchscreen modality, zero accuracy represents a tap position on the edge or outside of the target. Although the target array was presented in a 2D plane, the three-dimensional component present in AR required additional zero accuracy criteria. In AR, zero accuracy was defined as a tap position, represented in the XY plane, located on the edge of the target, or an unsuccessful tap action as a result of the fingertip not intersecting the plane of the AR target (z-axis).
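The scoring rules above can be illustrated with a short sketch. It assumes accuracy scales linearly from 100% at the target center to 0% at the edge, i.e. (R − P)/R × 100, and is clamped at zero. The function name and the `hit_plane` flag (standing in for whether the fingertip intersected the AR target plane) are hypothetical.

```python
import math

def tap_accuracy(tap_xy, center_xy, width_m, hit_plane=True):
    """Percent accuracy of one tap: 100 at the target center, 0 on or
    beyond the edge, and 0 for an AR tap that never reached the plane."""
    if not hit_plane:
        return 0.0  # assumed handling of an unsuccessful AR tap (z-axis)
    radius = width_m / 2                      # R = W / 2
    distance = math.dist(tap_xy, center_xy)   # P, measured in the XY plane
    return max(0.0, (radius - distance) / radius * 100.0)

center_tap = tap_accuracy((0.0, 0.0), (0.0, 0.0), 0.025)
offset_tap = tap_accuracy((0.005, 0.0), (0.0, 0.0), 0.025)
edge_tap = tap_accuracy((0.0125, 0.0), (0.0, 0.0), 0.025)
missed_plane = tap_accuracy((0.0, 0.0), (0.0, 0.0), 0.025, hit_plane=False)
```

The clamp treats a tap outside the target the same as a tap on its edge, which matches the zero accuracy criterion described for the touchscreen.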
In the context of input modalities, a fundamental distinction between touchscreen and AR lies in the capacity to obtain the tap position outside of the target. In touchscreen environments, the tap position outside of the target can be obtained directly, whereas in AR environments, identification of false attempts requires an indirect approach. We defined tap attempts by detecting rapid changes in the z-axis position of the fingertip that were not aligned with the record of successful taps. Our study leveraged hand-eye tracking data, obtained from the headset’s embedded sensors, to record the finger position of the participant in the x-axis, y-axis, and z-axis at a sampling interval of approximately 0.02 seconds (60 Hz). We observed that whenever the participant attempted to click the target, a rapid change in the z-axis from low to high and back to low occurred. Furthermore, the visualization of the z-axis position revealed an inverted U-shaped curve, characterized by an initial rise, a peak, and a subsequent decline as the participant attempted to tap and retract their finger. To identify these curves, we examined the z-axis data using MATLAB’s findpeaks function and adjusted the thresholds such that noise fluctuations were not captured. Each peak was associated with the nearest timestamp of a successful tap data point. Any peak not matched with a successful tap data point was considered an out-of-target attempt.
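The attempt-detection procedure can be sketched as follows. This is not the study's code: the analysis used MATLAB, and the simple local-maximum detector here is a pure-Python stand-in for a peak-finding routine. The height threshold, minimum separation, matching tolerance, and sample data are all hypothetical values chosen for illustration.

```python
def find_z_peaks(z, min_height, min_separation):
    """Indices of local maxima in z that reach min_height and lie at least
    min_separation samples after the previously accepted peak."""
    peaks = []
    for i in range(1, len(z) - 1):
        is_local_max = z[i - 1] < z[i] >= z[i + 1]
        if is_local_max and z[i] >= min_height:
            if not peaks or i - peaks[-1] >= min_separation:
                peaks.append(i)
    return peaks

def classify_attempts(peak_times, tap_times, tolerance_s):
    """Pair each peak with the nearest unused successful-tap timestamp.
    Peaks with no tap within tolerance_s count as out-of-target attempts."""
    unused = list(tap_times)
    matched, missed = [], []
    for t in peak_times:
        nearest = min(unused, key=lambda tap: abs(tap - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance_s:
            matched.append((t, nearest))
            unused.remove(nearest)
        else:
            missed.append(t)
    return matched, missed

SAMPLE_INTERVAL_S = 0.02  # approximate sampling interval reported in the text
z = [0.00, 0.01, 0.05, 0.01, 0.00, 0.00, 0.002, 0.00, 0.01, 0.06, 0.02, 0.00]
peaks = find_z_peaks(z, min_height=0.03, min_separation=3)
peak_times = [i * SAMPLE_INTERVAL_S for i in peaks]
matched, missed = classify_attempts(peak_times, tap_times=[0.05], tolerance_s=0.04)
```

In this toy trace, the small bump at sample 6 falls below the height threshold and is ignored as noise, the first excursion pairs with the logged tap, and the second is left unmatched and would be counted as an out-of-target attempt. The matching tolerance and the one-to-one pairing are assumptions; the text specifies only that each peak was associated with the nearest successful tap.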
A four-way ANOVA model was fit for the dependent variable accuracy to assess the effects of the independent variables: Index of Difficulty (ID 1, ID 2), Modality (Touchscreen, AR), Participant (random effect), and Task Order (1-4).
Results
One participant’s data were excluded from the analysis due to excessive noise and fluctuations in the recorded hand tracking data. All participants were instructed to use their dominant index finger throughout the testing protocol; however, one participant switched hands during trials. For this participant, the four sequences of AR data during which the hand switch occurred were not included in the analysis.
Average accuracy decreased as the difficulty of the task increased (change from ID 1 to ID 2) and decreased when using AR compared to Touchscreen (Figure 2). There was no interaction effect for Index of Difficulty with the Modality (Table 1) and both Task Difficulty and Modality were significant main effects. The ANOVA model supported a significant interaction effect of the Modality and Task Order. This interaction was further investigated, and it was found that within Order, the effect of Modality followed the same trend, but had a differing magnitude. This interaction between Modality and Order did not affect the findings inferred from the main effects.

Figure 2. Box plots of the average accuracy of all the participants (n=32) with respect to Index of Difficulty and Modality. All conditions were significantly different from each other (p<0.001).
Table 1. The ANOVA model fit for the dependent variable of accuracy.
Discussion
In this study, participants performed a two-dimensional multidirectional tapping task deployed in two different modalities – AR using the Microsoft HoloLens 2 device and a 27-inch touchscreen monitor. We hypothesized that accuracy would be affected by (1) index of difficulty and (2) test modality. The results support our hypotheses that there would be an effect of Task Difficulty and Modality on accuracy. There was no interaction effect between index of difficulty and modality.
The results of our study demonstrated that the average accuracy decreased as the difficulty of the task increased (ID 1 compared to ID 2). This result was expected for both modalities because as the difficulty increased, the size of the target decreased (W in Equation 1 and R in Equation 2) and the diameter of the array increased (D in Equation 1). Because the accuracy examined was scaled to the target size, the larger array diameter and smaller target size require the user to tap closer to the center of the target to maintain the same accuracy at the higher degree of difficulty. Therefore, even if participants had a higher absolute accuracy (i.e., a shorter distance to the target center) in the task, they might still have a lower scaled accuracy because the radius of the target was smaller. Further evaluations should consider the importance of both the scaled accuracy and the absolute accuracy.
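A short worked example illustrates the distinction between scaled and absolute accuracy. The tap offsets are hypothetical, and the calculation assumes accuracy is scaled as (R − P)/R × 100%. Consider a tap 5.0 mm from the center of an ID 1 target (R = 12.5 mm) and a tap 4.5 mm from the center of an ID 2 target (R = 9.5 mm):

$$A_{\mathrm{ID\,1}} = \frac{12.5 - 5.0}{12.5} \times 100\% = 60.0\%, \qquad A_{\mathrm{ID\,2}} = \frac{9.5 - 4.5}{9.5} \times 100\% \approx 52.6\%$$

Under this assumption, the ID 2 tap lands closer to the target center in absolute terms yet receives the lower scaled accuracy.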
The results of our study also indicate that the average accuracy was higher in the touchscreen modality than in the AR modality. This outcome was not unexpected, as none of the participants had extensive prior experience with the AR environment and the task required the participants to tap the target with high accuracy and speed. The lack of familiarity with the device may have contributed to lower accuracy, as participants were more accustomed to interactions with a touchscreen environment. Accuracy may also be lower with AR due to difficulties in perception, lack of tactile feedback, and reliability of gesture recognition in the AR environment, which have all been observed in a previous AR user study (Weiss et al., 2023).
Accuracy measures may have been impacted by participants touching the edge of targets in favor of decreased movement distances. Participants were permitted to emphasize speed and accuracy as needed to acquire targets. While one block of training was provided within AR, this may not be sufficient for users to learn how to effectively interact within AR; however, there were no observed improvements in performance across evaluation of the testing blocks. Another consideration is that finger tracking requires the index finger to be visible by the headset’s cameras. Arm fatigue may have influenced the user’s ability to maintain the appropriate fingertip position to support finger tracking.
Future work includes the analysis of additional performance metrics, such as precision, movement time, throughput, and error rates, as well as evaluations of the eye-tracking and finger-tracking strategies from the AR system. Additionally, evaluating the differences in accuracy performance when the participants are instructed to precisely touch the center of the target may provide further insight into accuracy across modalities. For this study, participants were not explicitly instructed to tap the center of the target but rather to perform the task as quickly and accurately as possible. Furthermore, it may be important to assess these relationships in alternative modalities to ensure the AR data can be appropriately validated and compared. Additional future studies could evaluate the effect of tactile feedback, the lack of which has been identified as a difficulty in AR interactions (Weiss et al., 2023), by situating the holographic array against a wall.
While the button press accuracy performance was lower in AR, the environment may still be a viable tool for the evaluation of hand-eye coordination with the additional benefit of hand and eye-tracking capabilities to support assessments. The use of AR for longitudinal evaluations should be evaluated further. The effect of index of difficulty (lower accuracy with increasing task difficulty) is consistent across modalities; therefore, larger targets (>0.025 meters) are recommended for AR-based target acquisition tasks to counteract the inherent difficulties present within AR interactions. Ultimately, care should be taken when comparing accuracy performance for sensorimotor tasks across modalities.
Acknowledgements
This study was supported in part by the National Aeronautics and Space Administration (NASA) Human Research Program Award 80NSSC20K0409.
