Abstract
BACKGROUND:
Inertial self-motion perception is thought to depend primarily on otolith cues. Recent evidence demonstrated that vestibular perceptual thresholds (including inertial heading) are adaptable, suggesting novel clinical approaches for treating perceptual impairments resulting from vestibular disease.
OBJECTIVE:
Little is known about the psychometric properties of perceptual estimates of inertial heading, such as test-retest reliability. Here we investigate the psychometric properties of a passive inertial heading perceptual test.
METHODS:
Forty-seven healthy subjects participated across two visits, performing an inertial heading discrimination task. The point of subjective equality (PSE) and thresholds for heading discrimination were identified for same-day and across-day tests. Paired t-tests determined whether the PSE or thresholds changed significantly, and a mixed intraclass correlation coefficient (ICC) model examined test-retest reliability. Minimum detectable change (MDC) was calculated for PSE and threshold for heading discrimination.
RESULTS:
Within a testing session, the heading discrimination PSE score test-retest reliability was good (ICC = 0.80) and did not change (t(1,36) = –1.23, p = 0.23). Heading discrimination thresholds were moderately reliable (ICC = 0.67) and also stable (t(1,36) = 0.10, p = 0.92). Across testing sessions, heading direction PSE scores were moderately correlated (ICC = 0.59) and stable (t(1,46) = –0.44, p = 0.66). Heading direction thresholds had poor reliability (ICC = 0.03) and were significantly smaller at the second visit (t(1,46) = 2.8, p = 0.008). MDC for heading direction PSE ranged from 6–9 degrees across tests.
CONCLUSION:
The current results indicate moderate reliability for heading perception PSE and provide clinical context for interpreting change in inertial vestibular self-motion perception over time or after an intervention.
Introduction
The vestibular system detects linear and angular head motion which controls balance, posture, vestibulo-ocular reflex (VOR) gaze stabilization, and perceptual estimates [16]. Self-motion perception refers to one’s ability to accurately perceive heading or other motion trajectories, which is essential for spatial navigation [2, 7]. Peripheral cues from the semicircular canals and otolith organs influence awareness of head and body spatial positioning relative to the external environment [51]. Accurate and precise self-motion perception relies heavily on cue integration of both visual and peripheral vestibular signals [16, 68]. Recent evidence suggests that self-motion perception depends on direct sensory inputs and indirect information, influenced by the interaction between otoliths and canals [35, 62]. Patients with vestibular disorders often report perceptual symptoms such as imbalance, vision problems, dizziness, and abnormal self-motion perception [4, 60]. However, one-third of individuals with subjective vestibular symptoms have normal vestibular diagnostic test results, revealing a discrepancy between diagnostic testing and subjective symptom reports [22]. The discrepancy between symptoms and diagnostic test results may be due to diagnostic tests primarily assessing the VOR. The disconnect suggests a need to develop additional diagnostic capabilities, since conventional vestibular reflex testing fails to evaluate central integration processing [22, 48]. Self-motion perceptual assessments may close the gap between diagnostics and patient complaints [37].
While vestibular self-motion perception has been studied extensively [10, 58], little is known about the psychometric properties or reliability of vestibular perceptual measures beyond high reliability for 1 Hz yaw rotation thresholds in healthy adults [39]. Recent evidence demonstrated that vestibular perceptual thresholds are adaptable [20, 63], suggesting novel approaches to treating perceptual impairments. There is a need to determine the psychometric properties for vestibular perceptual measures to improve diagnostic capabilities and quantify treatment outcomes for vestibular perceptual symptoms.
Combining visual and/or auditory signals with inertial self-motion narrowed vestibular perceptual threshold estimates [11, 65], thus increasing precision. Since self-motion perception is influenced by multisensory signals, assessment techniques must limit the contribution of non-vestibular cues when examining the contributions of vestibular signals to self-motion perception [12, 20]. Similarly, path stability is affected in patients with vestibular hypofunction [7, 66]. Since path stability depends on integration of linear self-motion cues (proprioception, vision, and inertial heading), tests examining heading perception may provide clinical relevance and mechanistic insight regarding path instability for individuals with vestibular disease [1, 48].
Despite substantial understanding of self-motion perception and classification techniques, the diagnosis and treatment of patients with vestibular disorder remains challenging due to the lack of clinically relevant comparative measurements for perceptual thresholds. Establishing psychometric properties and minimum detectable change values for heading perception will enhance diagnostic and treatment methods for individuals with path stability symptoms. This study reports on the psychometric properties of a passive inertial heading test, which provides clinical context for interpreting change in vestibular self-motion perception.
Methods
Subjects
Forty-nine subjects participated in the study (12 males and 36 females; one subject identified as neither male nor female). Mean age was 25.0±10.6 years (mean±SD, range 18–68). The protocol involved two visits, 1–3 weeks apart, in which the subjects were tested up to 3 times (see Fig. 1). In one visit subjects were tested once; in the other visit subjects were tested twice, with the two tests separated by a 20-minute period during which they looked at their phone or relaxed. The order of the first and second visit was determined by coin flip at the time of the first session. Each testing session lasted approximately 15–20 minutes and each visit lasted between 30 and 60 minutes.

Diagram representing the experimental flow across visits. Subjects were randomized to participate in a single inertial heading perceptual test or two inertial heading perceptual tests on their first visit. They participated in the other test on the second visit. 9 subjects declined a second test on their initial visit and are not represented in the same-day test-retest analysis.
Due to time constraints, not all subjects were available to participate in the second test during the same visit; however, all subjects completed two visits in which they were tested at least once per visit. A subset of 40 subjects (11 males, 29 females, mean age 24.4±9.1, range 18–63) completed both the same-day and across-week test-retest protocol. All subjects were naïve to the experiment design and did not receive any feedback. All subjects were healthy and denied any history of dizziness, neurological or vestibular disease. All participants provided written informed consent. The protocol was approved by the University of Rochester Research Science Review Board.
Equipment
Inertial motion stimuli were delivered using a 6-degree-of-freedom (6-DOF) motion platform (Moog, East Aurora NY model 6DOF2000E). The setup has been previously described for heading estimation experiments and is commonly used in human motion perception studies [56, 57]. The subject sat in an automotive-style racing seat and wore a helmet. The helmet was coupled to the motion platform to stabilize the head (Fig. 2). Tests were performed in complete darkness, and a white noise stimulus was played during movement from two speakers facing the subject on either side to mask the mechanical sounds of the platform motion. The noise was the same regardless of motion direction. Subjects indicated directional responses with a handheld three-button control box. Subjects were instructed on how to operate the control box to record responses; however, no further training was given, in order to minimize learning effects. The center button was used to initiate each stimulus presentation after an auditory cue. The right or left buttons were used to indicate the perceived direction of the inertial motion prior to the platform returning to the origin.

Study design representation. A) Experimental set-up, subjects sat in a chair on the Moog that translated linearly from the origin in the horizontal plane to either the left (black arrow) or right (white arrow) of center (gray arrow). B) Exemplar motion profiles showing sinusoidal acceleration, velocity, and displacement corresponding to a 2 second motion stimulus. C) Subjects always responded either left (black circle) or right (gray circle) to each motion. D) Example data from a representative subject showing fitting binary data using logistic regression to determine PSE and Threshold. Filled gray circles indicate average subject responses (either 0 or 1) and size of the circle corresponds to the number of presentations at that heading angle. The dashed black line represents the fit function. PSE is the x-coordinate value at the y-value = 0.5, threshold is the width of the fitted function.
Inertial stimuli were presented in the horizontal plane with a 2 s (0.5 Hz) sine wave in acceleration. Movement corresponded to 15 cm of displacement, with peak acceleration of ±23.6 cm/s/s and a peak velocity of 15 cm/s. This movement profile is at least an order of magnitude higher than human linear motion perception thresholds. The right/left stimuli spanned a 0±50-degree inertial heading range and were selected with an adaptive staircase design (Fig. 2). A 2-alternative forced choice paradigm was derived from two interleaved staircases with 25 steps each, with step size adapted based on the preceding movement stimulus and the subject’s response to that stimulus.
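The stated displacement, peak velocity, and peak acceleration are mutually consistent for a single cycle of sinusoidal acceleration, which can be checked with a short calculation. The sketch below assumes the acceleration has the form a(t) = A sin(2πt/T) over one period T and starts from rest; the function names are illustrative and are not taken from the study's software.

```python
import math

def motion_profile(displacement_cm=15.0, duration_s=2.0):
    """Peak acceleration and peak velocity for one cycle of sinusoidal acceleration.

    Assumes a(t) = A * sin(2*pi*t/T) starting from rest, so that
    total displacement d = A*T^2 / (2*pi) and peak velocity = A*T/pi = 2*d/T.
    """
    peak_acc = 2.0 * math.pi * displacement_cm / duration_s ** 2
    peak_vel = 2.0 * displacement_cm / duration_s
    return peak_acc, peak_vel

def position(t, displacement_cm=15.0, duration_s=2.0):
    """Displacement (cm) at time t, from integrating the acceleration twice."""
    peak_acc, _ = motion_profile(displacement_cm, duration_s)
    w = 2.0 * math.pi / duration_s  # angular frequency (rad/s)
    return (peak_acc / w) * (t - math.sin(w * t) / w)

peak_acc, peak_vel = motion_profile()
print(round(peak_acc, 2), "cm/s/s peak acceleration")  # 23.56
print(round(peak_vel, 2), "cm/s peak velocity")        # 15.0
print(round(position(2.0), 2), "cm displacement")      # 15.0
```

Under these assumptions a 15 cm, 2 s movement yields a peak acceleration of about 23.6 cm/s/s and a peak velocity of 15 cm/s, matching the values given above.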
For staircases that started with a stimulus 50° to the right, after each rightward response the next stimulus was shifted 8° to the left. After a leftward response the step size was decreased by half (e.g. from 8° to 4°) and the stimulus was shifted to the right. With subsequent reversals, step sizes could be reduced to a minimum of 1°, or increased after three responses in the same direction. For the staircase starting to the left, adjustments in heading angle were similar. Each staircase could step through zero. This method resulted in the majority of stimuli late in the staircase being focused near the point of subjective equality (PSE), at which subjects were nearly equally likely to respond with left or right. After each stimulus was delivered, subjects reported using the button box whether they moved to the right or left. Correct responses resulted in subsequent heading directions closer to the PSE; incorrect responses resulted in subsequent heading directions farther from the PSE. If no direction was entered within 2 s, no response was recorded and the stimulus was re-presented the next time that staircase was active. These types of lapses were rare, occurring less than 1% of the time.
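The staircase rules described above can be sketched for a single staircase as follows. This is a minimal illustration, not the study's implementation. Several details are assumptions made only for this sketch: positive headings denote rightward motion, the step size doubles after three consecutive responses in the same direction, the step size is capped at the initial 8°, and headings are clamped to ±50°. The interleaving of the two staircases and the re-presentation of lapsed trials are not modeled, and `respond` is a hypothetical callback standing in for the subject.

```python
import math
import random

def p_right(heading_deg, pse_deg, sigma_deg):
    """Probability of a 'right' response for a cumulative Gaussian observer."""
    z = (heading_deg - pse_deg) / (sigma_deg * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def run_staircase(start_deg, respond, n_trials=25, step0=8.0,
                  min_step=1.0, limit=50.0):
    """Run one adaptive staircase; returns a list of (heading, went_right)."""
    heading, step = start_deg, step0
    last, run = None, 0  # previous response and length of the current run
    trials = []
    for _ in range(n_trials):
        went_right = respond(heading)
        trials.append((heading, went_right))
        if last is not None and went_right != last:
            # Reversal: halve the step, down to the 1 degree minimum.
            step = max(min_step, step / 2.0)
            run = 1
        else:
            run += 1
            if run >= 3:
                # Assumed rule: double the step, capped at the initial size.
                step = min(step0, step * 2.0)
                run = 0
        last = went_right
        # A 'right' response moves the next stimulus left, and vice versa.
        heading += -step if went_right else step
        heading = max(-limit, min(limit, heading))
    return trials

# Simulated observer with a hypothetical PSE of 2 degrees and sigma of 5 degrees.
rng = random.Random(0)
simulated = run_staircase(50.0, lambda h: rng.random() < p_right(h, 2.0, 5.0))
print([h for h, _ in simulated])
```

With a noise-free observer the stimuli settle within a degree of the PSE, which illustrates why late-staircase stimuli cluster where left and right responses are nearly equally likely.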
The platform returned to the origin before initiating the next stimulus. The inter-trial interval was at least 3 seconds after the motion platform returned to the origin, but each subsequent motion trial was triggered by the subject pressing a button indicating readiness.
Analysis

Average (gray bar graphs) and individual (open circles) data representing PSE and threshold values for same day test-retest (A and B respectively) and across visit test-retest (C and D respectively). Black lines connect circles to highlight changes within an individual between respective tests.
PSE and perceptual thresholds were calculated using custom functions in MATLAB (Mathworks, Natick, MA). The proportion of right/left responses was fit to a cumulative distribution function based on a bootstrapped sampling distribution to characterize the relationship between heading amplitude and the binary response. The point of subjective equality (PSE) was the mean of the psychometric function and represents the stimulus at which the subjects are equally likely to perceive heading in either direction. The threshold was defined as the width of the psychometric function and corresponds to the smallest discernible movement. Paired t-tests examined the hypothesis that PSE and thresholds did not change within or across days. A two-way mixed random effects model intraclass correlation coefficient (ICC) determined test-retest reliability for both PSE and threshold across visits and within same-day tests. As an exploratory analysis, paired t-tests and a two-way mixed random effects model ICC were also performed on the absolute value of the PSE. The test-retest reliability was interpreted as poor (ICC < 0.50), moderate (ICC 0.50–0.75), good (ICC 0.75–0.90), and excellent (ICC > 0.90) using the guideline given by [38]. Statistical analyses were performed with Stata 14 (StataCorp, College Station, Texas). The standard error of measurement (SEM) was calculated using correlation coefficients to find the minimal detectable change (MDC) for PSE and thresholds [61]. We conducted an unplanned post-hoc repeated measures ANCOVA to determine if visit order (two tests/session first, yes or no) influenced PSE or thresholds, as the second visit for some subjects included their third exposure to this heading perception task. We also adjusted for age and sex in this model, although since age was significantly skewed (only 3 subjects older than 40) the effects of age are not interpreted. Alpha was specified as 0.05 for all tests as they were independent questions.
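As an illustration of how a PSE and threshold can be read from binary responses, the sketch below fits a cumulative Gaussian by maximum likelihood over a parameter grid. It is not the custom MATLAB code used in the study: the bootstrapping step is omitted, the grid search is a simplification chosen to avoid external dependencies, and taking the Gaussian standard deviation as the "width" is an assumption. All function names are hypothetical.

```python
import math

def p_right(heading, mu, sigma):
    """Cumulative Gaussian: probability of a 'right' response at a heading."""
    return 0.5 * (1.0 + math.erf((heading - mu) / (sigma * math.sqrt(2.0))))

def neg_log_likelihood(trials, mu, sigma, eps=1e-9):
    """Negative log-likelihood of (heading, went_right) trials."""
    nll = 0.0
    for heading, went_right in trials:
        # Clip probabilities so that log() never receives 0.
        p = min(1.0 - eps, max(eps, p_right(heading, mu, sigma)))
        nll -= math.log(p) if went_right else math.log(1.0 - p)
    return nll

def fit_psychometric(trials, mu_grid, sigma_grid):
    """Return (PSE, threshold) as the grid point with the lowest NLL.

    PSE is the mean (mu) of the fitted function; threshold is taken here
    as its standard deviation (sigma), an assumed definition of 'width'.
    """
    _, mu, sigma = min(
        (neg_log_likelihood(trials, m, s), m, s)
        for m in mu_grid
        for s in sigma_grid
    )
    return mu, sigma

# Hypothetical responses: mostly 'left' below about 3 degrees, 'right' above.
example = [(-10, False), (-6, False), (-2, False), (0, False), (2, False),
           (2, True), (4, True), (4, False), (6, True), (10, True)]
mu_grid = [x * 0.5 for x in range(-20, 21)]        # -10 to 10 degrees
sigma_grid = [1.0 + x * 0.5 for x in range(29)]    # 1 to 15 degrees
pse, threshold = fit_psychometric(example, mu_grid, sigma_grid)
print(pse, threshold)
```

A gradient-based optimizer would normally replace the grid search; the grid is used here only to keep the example self-contained.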
Results
All subjects completed two visits, with 13.8 (7.6) days on average between testing days. All subjects completed at least one test during each visit. While the complete protocol includes a second test during one of the visits, due to time constraints and subject availability, not all subjects completed a second same-day test (see Fig. 1). Two subjects (5%) were excluded from the across visit analyses (n = 47) and three subjects were excluded from the same day analyses (n = 37) due to inconsistent data suggesting inattention to the task (such as switching to only “leftward” responses regardless of movement direction in the second test).
Same visit
The average heading errors and thresholds are presented in Table 1. Within a testing session, PSE score test-retest reliability was good (ICC = 0.80 (F(36, 36) = 5.18, p < 0.001)) and did not change from test one to test two within the session (t(1,36) = –1.23, p = 0.23). When treating heading error as magnitude rather than directional signed error, PSE had moderate reliability (ICC = 0.66) and there was no difference between the first and second test (t(1,36) = 0.50, p = 0.62). Visit order (two tests/session first, yes or no) did not have a significant effect on PSE (F(1,79) = 2.96, p = 0.09). There was not a significant interaction between visit order and time (F(1,79) = 0.29, p = 0.60) for PSE. In the ANCOVA model, PSEs were also not significantly different within a session (F(1,79) = 1.49, p = 0.23). PSEs did not differ significantly across sex (F(1,79) = 0.09, p = 0.77).
Average (SD) PSE and thresholds for heading discrimination within (n = 37) and across (n = 47) visits. ICC coefficients, standard error of measurement (SEM) and minimal detectable change (MDC). Numbers 1 and 2 within the variable names correspond to the first and second heading perception test for the respective comparison (same visit vs. across visits)
Within a testing session, thresholds were moderately reliable (ICC = 0.67 (F(36, 36) = 2.93, p < 0.001)) and also stable within a testing session (t(1,36) = 0.10, p = 0.92). Visit order (two tests/session first, yes or no) did not have a significant effect on thresholds (F(1,79) = 0.17, p = 0.69), after controlling for age and sex. There was not a significant interaction between visit order and time (F(1,79) = 0.09, p = 0.77) for thresholds. In the ANCOVA model, thresholds were also not significantly different within a session (F(1,79) = 0.08, p = 0.78). Thresholds did not differ significantly across sex (F(1,79) = 0.07, p = 0.79).
For the heading discrimination PSE, the MDC based on the first test was 6.2 degrees and the MDC based on the second test was 7.9 degrees. In this cohort of 37 subjects, 6 subjects had an absolute PSE shift exceeding 6.2 degrees and 5 subjects had a PSE shift exceeding 7.9 degrees. Although heading discrimination thresholds did not significantly change within a visit, the MDC for threshold based on the first test was 7.0 degrees and the MDC for threshold based on the second test was 5.8 degrees. In this cohort, 7 individuals experienced an absolute threshold shift exceeding 5.8 degrees and 2 individuals experienced an absolute threshold shift exceeding 7.0 degrees.
Across visits
Across testing sessions, PSE scores were moderately correlated (ICC = 0.59 (F(46, 46) = 2.40, p = 0.002)) and stable (t(1,46) = –0.44, p = 0.66). When treating heading error as magnitude rather than directional signed error, PSE became poorly correlated (ICC = 0.28) and there was no difference between the first and second test visits (t(1,46) = 1.17, p = 0.25). Visit order (two tests/session first, yes or no) did not have a significant effect on PSE (F(1,95) = 0.20, p = 0.66). There was not a significant interaction between visit order and time (F(1,95) = 1.32, p = 0.26) for PSE. In the ANCOVA model, PSEs were not significantly different across sessions (F(1,95) = 0.12, p = 0.74). PSEs did not differ significantly across sex (F(1,95) = 1.97, p = 0.17).
Across testing sessions, thresholds had poor reliability (ICC = 0.03 (F(42, 42) = 1.03, p = 0.459)) and were significantly smaller at the second visit (t(1,46) = 2.8, p = 0.008). Visit order (two tests/session first, yes or no) had no significant effect on threshold (F(1,91) = 0.27, p = 0.60). There was not a significant interaction between visit order and time (F(1,91) = 0.18, p = 0.68) for thresholds. In the ANCOVA model, thresholds were not significantly different across sessions (F(1,91) = 2.22, p = 0.14). Thresholds did not differ significantly across sex (F(1,91) = 0.03, p = 0.86).
The MDC for the PSE based on the first test was 9.62 degrees and the MDC for the PSE calculated based on the second test was 9.26 degrees. In this cohort, 5 subjects had an absolute PSE shift exceeding 9.62 degrees and 5 had an absolute PSE shift greater than 9.26 degrees between visits. The MDC for threshold based on the first test was 13.44 degrees and the MDC for threshold calculated based on the second test was 9.03 degrees. In this cohort of 47 subjects, 5 individuals experienced a threshold shift that exceeded 9.03 degrees and 2 individuals exceeded 13.44 degrees.
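The relationship between reliability and detectable change can be made concrete with a short calculation. The sketch below uses the conventional distribution-based formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; the text does not spell out the exact formulas, so these are assumed. The SD in the example is hypothetical and is not taken from Table 1. The helper `icc_from_f` reflects an inference from the reported numbers only: the ICC and F values given above are numerically close to ICC = (F − 1)/F, the average-measures consistency form, although the text does not state which form was used.

```python
import math

def sem(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sd, icc):
    """Minimal detectable change at 95% confidence (assumed formula)."""
    return 1.96 * math.sqrt(2.0) * sem(sd, icc)

def icc_from_f(f_stat):
    """Average-measures consistency ICC implied by an ANOVA F statistic."""
    return (f_stat - 1.0) / f_stat

# Hypothetical SD of 5.0 degrees combined with the reported same-visit PSE ICC.
print(round(mdc95(5.0, 0.80), 2), "degrees")

# Reported (F, ICC) pairs from the results above, for comparison.
for f_stat, reported in [(5.18, 0.80), (2.93, 0.67), (2.40, 0.59), (1.03, 0.03)]:
    print(f_stat, reported, round(icc_from_f(f_stat), 3))
```

Because the MDC scales with √(1 − ICC), a metric with low reliability requires a much larger observed change before that change can be distinguished from measurement error.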
Discussion
Based on recent findings that vestibular perception is adaptable [20, 63], some authors have advocated for the inclusion of perceptual testing in the diagnostic workup [27, 37]. To address this need we examined the psychometric properties for two measures of inertial heading self-motion perception as a first step toward clinical interpretation in the event of change over time. Overall, the PSE for heading discrimination demonstrated moderate to good test-retest reliability and was also relatively stable across tests. Threshold measures for heading discrimination had moderate test-retest reliability within a visit. Although thresholds did not significantly change within a visit, they were significantly lower (suggesting increased precision) at the second visit which likely contributed to the noted poor across visit test-retest reliability for thresholds. The moderate to good test-retest reliability associated with heading discrimination stands in contrast to the high test-retest reliability recently reported for yaw rotation [39]. Differences in physiologic and process noise between the semicircular canals and otoliths may contribute to the lower reliability reported here. Additionally, inherent in heading discrimination is two-dimensional motion perception which may lead to noisier state estimates than 1-dimensional movements such as single axis rotations.
By contrast, some clinical diagnostic tests for vestibular function have better test-retest reliability. Video head impulse testing to quantify the vestibulo-ocular reflex gain has been reported to have good to excellent test-retest reliability [5, 59], at least for the horizontal semicircular canals. Greater variability has been reported for vertical canals [59]. Otolith function as assessed by vestibular evoked myogenic potentials has varied reliability depending on the organ assessed (saccule versus utricle) and the method used (taps, clicks, tone burst), with most studies reporting fair to excellent reliability [3, 50]. It makes sense for hard-wired reflexive behaviors to be more systematic, leading to higher reliability for retesting. However, discriminatory perceptual testing requires higher cognitive processing and is likely to be more susceptible to sensory and/or process noise [52], leading to lower reliability.
The literature demonstrating improvement in perceptual thresholds indicates that many repetitions are needed to achieve a change in the perceptual threshold [36, 63]. Due to differences in testing order, some subjects had a third exposure to the inertial heading task by the time they completed the second across visit test. The total number of exposures was still well below current exposures with feedback that lead to a shift in threshold [63]. Although no feedback of any kind or reference point was provided to the subjects, there was concern that the testing order may contribute to the decrease in threshold we observed in the across visit study analysis. An exploratory post-hoc analysis demonstrated that testing order did not have any effect on PSE. However, prior experience with the experimental paradigm (participation in the single testing session first and the two test session second) did result in lower thresholds during the second testing session. Thus, increased exposure across visits may explain the improved precision. We did not control for gaze point, which has been shown to influence heading perception [13, 15]. In fact, some subjects reported they had their eyes closed during testing, but it is unlikely their gaze was as eccentric as in previous studies where gaze was intentionally manipulated. Divided attention has been shown to reduce perceptual precision [43]; therefore, it is possible that subjects were attending more to the task on the second testing date. There was no difference in the task or instructions, so any change in attention would be on an individual basis. It seems unlikely that as a cohort most subjects would increase their task attention or change their gaze point in a manner that would systematically increase heading precision. Precision is influenced by both measurement and physiologic error; therefore, it seems likely that precision benefited from prior experience in a way that negatively impacted reliability [33].
Clinically, there is increased interest in quantifying vestibular perception, as that may be the missing link improving the connection between subjective complaints and physical difficulties with movement for individuals with vestibular disease [28, 37]. Self-motion perception has recently been linked to postural control [34]; specifically, roll-tilt thresholds were correlated with standing balance ability on foam with eyes closed [6, 32]. In fact, the stimulus frequency of 0.5 Hz, which represents canal-otolith integration, was the only threshold stimulus associated with medio-lateral sway in healthy young adults [62]. Others have reported a positive relationship between pitch-tilt perceptual thresholds and balance ability [19], but only for older adults. Older adults are known to experience age-related decline in both otolith and semicircular canal function [40, 41]. Interestingly, a separate study failed to show any change in postural control after successful training to improve inertial heading perception [20]. This suggests a potential functional specificity to self-motion perception.
Functionally, inertial heading may be more related to walking since heading direction may be derived from linear acceleration cues and is interpreted in head coordinates [8, 44]. Individuals with reduced otolith function have demonstrated difficulty with spatial navigation, which may represent heading perception impairments [2, 66]. In fact, heading discrimination thresholds have been shown to increase 10-fold after bilateral vestibular loss [24]. Individuals with vestibular disease also perform worse on a timed, eyes closed, straight-line walking task compared to healthy individuals [23], but it is not clear if this represents a navigation (heading) problem or fear of falling/balance-related anxiety interfering with walking performance during eyes closed walking [17, 67]. Future studies are needed to further explore the relationship between spatial navigation accuracy and heading perception while controlling for effects of balance-related anxiety.
Limitations
Here we report on a distribution-based MDC which provides an initial perspective on change in inertial heading perception. MDC differs from minimally important change, which largely depends on a criterion anchor [61]. No consensus exists in the literature on a meaningful anchor defining important change for inertial heading perception. In the absence of such an anchor, the MDC values presented here may be used to interpret detectable change in inertial heading with 95% confidence. It should be noted that the MDC values and the lower ICC for heading perception threshold may reflect both physiologic and measurement error. Including more steps in the staircase may improve precision (reduce variability) in the threshold estimate [33], and future studies with different methods may demonstrate better reliability for thresholds. The age group studied here is a biased sample of convenience consistent with recruitment on a university campus and may not adequately reflect reliability or MDCs for inertial heading in older adults. Others have demonstrated changes in self-motion perception between young and older adults [6, 20]. The current movement paradigm consisted of supra-threshold passive inertial heading, and may not be representative of other styles of passive vestibular self-motion perception designed to identify perceptual motion thresholds. Three subjects were excluded from the analyses due to excessive response variability suggesting a change in the performed task within the test, such as changing from correctly identifying right and left directions to consistently reporting “left” for all remaining movements, including erroneous “left” responses for previously correctly identified “right” movements. This underscores another source of variability for perceptual measures, not likely to be present with reflex testing.
Conclusion
Based on these results, PSE may be a better metric for inertial heading perception than heading direction thresholds when considering change over time. PSEs for heading perception were moderately reliable and stable both within and across repeated testing days. Heading thresholds were moderately reliable and stable within a session, but had poor reliability and decreased across testing days. Changes in PSE and thresholds for inertial heading will need to exceed 6–10 degrees and 6–14 degrees respectively to be greater than chance. These results provide initial clinical context for interpreting change in inertial heading perception.
Acknowledgments
This work was supported in part by the National Institutes of Health [NIDCD K23 DC018303 & 2 R01 DC013580].
Conflict of interest
The authors declare no competing financial interests.
