Abstract
The diagnosis of autism spectrum disorder is based on clinical judgment, as there are no clearly identified markers to determine the presence of this condition. Gaze patterns have been proposed as a potential biomarker for autism. This study aims to conduct an exploratory analysis of the eye-tracking data collected during a virtual reality-based intervention for social cognition in autistic children. Specifically, we evaluated the variations in social orientation toward social stimuli, the association of gaze patterns with autistic traits, theory of mind (ToM) and task performance, and the mediating effect of attentional mechanisms on the relationship between social cognition performance and autism symptomatology. Our findings identified an increase in social orientation time toward social stimuli, but eye-tracking measures did not significantly predict autism symptom severity or ToM ability. The mediation analysis also failed to find a significant mediating effect of gaze patterns on the relationship between task performance and autism severity. This study points to VR as a promising tool for improving social orienting in autistic children, although there is a need to further investigate the potential of eye-tracking measures as a behavioral marker for predicting social cognition performance.
Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by differences in social communication and restricted and repetitive behaviors. Although its precise causes remain unclear, converging evidence suggests interactions between genetic and environmental factors, resulting in a highly heterogeneous phenotype. Diagnosis relies on clinical judgment, as there are no clearly identified markers to determine the presence of this condition, 1 prompting growing interest in identifying genetic, cognitive, physiological, and behavioral markers.2–4
Eye-tracking has emerged as a promising behavioral tool for studying attentional processes in autism due to its feasibility, non-invasive nature, and sensitivity to atypical gaze patterns. It provides objective indices of social attention and attentional flexibility, both widely studied in ASD.5–8 Social attention—the tendency to orient toward socially relevant cues such as faces—emerges early in development and supports subsequent cortical specialization. 9 Autistic individuals, however, show reduced preference for social stimuli.10–13 Attentional flexibility, an executive function domain, refers to shifting attention according to contextual demands. Research consistently reports differences in gaze allocation in autism,14–23 including reduced eye fixation and increased focus on the mouth area or background during face viewing,24–27 suggesting altered prioritization of socially relevant information.28–30
Theory of Mind (ToM)—the ability to attribute mental states to others—is another core component of social cognition frequently affected in autism. 31 Reduced ToM abilities may influence how individuals interpret others’ intentions, emotions, and actions.
Virtual reality (VR) offers controlled yet ecologically valid social environments for studying and training social cognition. 32 VR-based interventions have demonstrated improvements in social skills, cognition, imitation, and interaction in autistic individuals, with some longitudinal benefits.33–36 However, studies integrating VR with eye-tracking to examine dynamic social attention are scarce. It remains unclear whether gaze patterns could change after a VR intervention, how they relate to autistic traits, ToM, and task performance, or whether attentional mechanisms mediate links between social cognition and symptom severity.
This exploratory study addressed these gaps by examining gaze behavior in autistic children completing a semi-immersive VR social cognition intervention. We investigated (a) pre–post changes in social orienting, (b) associations between gaze, autistic traits, ToM, and task performance, and (c) whether attentional mechanisms mediate relations between social cognition and symptom severity.
Methods
Participants
Thirty-five participants were recruited from a cohort of children aged 6–8 years undergoing clinical follow-up at Hospital Universitari Mútua Terrassa (Barcelona, Spain). Two withdrew prior to the intervention due to anxiety and nine did not complete the program, resulting in 24 intervention completers. Inclusion criteria were (a) a diagnosis of ASD according to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision; (b) classification as autistic based on the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2); (c) age 6–8 years; (d) an intelligence quotient (IQ) >85 assessed with the Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V) or Wechsler Preschool and Primary Scale of Intelligence, Fourth Edition, with adequate verbal comprehension; and (e) normal or corrected vision. Ethical approval was obtained from the Comité de Ética de Investigación con Medicamentos de la Fundació Assistencial Mútua Terrassa (no. 10/2023). Written informed consent and child assent were obtained in accordance with the Declaration of Helsinki.
Measures
Participants were assessed with the ADOS-2 37 and the WISC-V, 38 and the ADOS Calibrated Severity Score (ADOS-CSS) was used to measure autism severity. All instruments demonstrated strong psychometric properties, with interrater and test–retest reliability ≥0.80 and high internal consistency (α ≈ 0.88–0.96).37,38
The Social Responsiveness Scale-2 (SRS-2)39,40 and the Theory of Mind Task Battery (ToMTB) 41 were administered pre- and postintervention. The SRS-2 shows excellent internal consistency (α ≈ 0.90–0.97) and test–retest reliability (r ≈ 0.80–0.90). The ToMTB demonstrates acceptable internal consistency (α ≈ 0.60–0.80).40,41 Baseline clinical and demographic data were available for 33 participants (see the Supplementary Data).
Apparatus
The semi-immersive VR intervention took place at Hospital Universitari Mútua de Terrassa using an 86-inch display, central computer, laptop, and an Azure Kinect DK device. 42 Display geometry was defined using AprilTag 2 markers. 43 Gaze patterns were collected using a Pupil Labs Invisible device. 44
Procedure
The analyzed task was part of a broader VR-based adaptive intervention targeting social cognition within the ADAPTEA protocol (see Supplementary Data). For the present study, participants completed a fixed, non-adaptive version. Analyses focused on an emotion-recognition task.
Eye-tracking measures
Gaze data were categorized into social (human figures) and non-social (background) areas of interest (AOIs). Social orienting was indexed as percentage of fixation time on social AOIs.5,14,19,27 Attentional mechanisms were quantified using time to first fixation on social AOIs, peak fixation duration for social and background AOIs,6,14,19,22 and fixation and saccade counts.8,28–30 These metrics align with established findings of reduced social orienting and altered attentional patterns in autism.14,19,23,27,29 Peak fixation duration was prioritized over mean fixation duration to mitigate the influence of artifacts. 45
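As an illustration of how these AOI-based measures can be derived from a list of labelled fixations, the following minimal Python sketch computes social orienting, time to first social fixation, peak fixation durations, and fixation counts for a single trial. The `Fixation` record, its field names, the millisecond units, and the example values are assumptions made for illustration; they are not the study's data format or processing code, and saccade counts are omitted.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: float     # onset relative to trial start (hypothetical unit)
    duration_ms: float  # fixation duration (hypothetical unit)
    aoi: str            # "social" or "background"


def gaze_metrics(fixations):
    """Summarise one trial's fixations into AOI-based measures."""
    social = [f for f in fixations if f.aoi == "social"]
    background = [f for f in fixations if f.aoi == "background"]
    total_ms = sum(f.duration_ms for f in fixations)
    social_ms = sum(f.duration_ms for f in social)
    return {
        # Percentage of total fixation time spent on social AOIs
        "social_orienting_pct": 100.0 * social_ms / total_ms if total_ms else None,
        # Onset of the earliest fixation on a social AOI
        "time_to_first_social_ms": min((f.start_ms for f in social), default=None),
        # Longest single fixation within each AOI category
        "social_peak_ms": max((f.duration_ms for f in social), default=None),
        "background_peak_ms": max((f.duration_ms for f in background), default=None),
        "social_fixations": len(social),
        "background_fixations": len(background),
    }


# Invented example trial, for illustration only
trial = [
    Fixation(0, 200, "background"),
    Fixation(250, 400, "social"),
    Fixation(700, 150, "background"),
    Fixation(900, 250, "social"),
]
metrics = gaze_metrics(trial)
print(metrics)
```

In this invented trial, 650 of 1,000 ms of fixation time falls on social AOIs, so social orienting is 65 percent and the first social fixation begins at 250 ms.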
Data processing and statistical analyses
Eye-tracking data were extracted and preprocessed in Visual Studio Code and analyzed in IBM SPSS Statistics. Raw gaze streams were visually inspected for artifacts. Fixations and saccades were identified using Pupil Labs’ dispersion-based algorithm with manual verification. AOIs were manually defined and validated (10 percent frame review per participant).
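For readers unfamiliar with dispersion-based fixation detection, the sketch below shows a generic dispersion-threshold (I-DT style) detector in Python. It is not the Pupil Labs implementation used in this study; the function name, the sample format `(t, x, y)`, and the threshold values are hypothetical and chosen only to make the example run.

```python
def detect_fixations_idt(samples, max_dispersion, min_duration):
    """Generic dispersion-threshold fixation detection.

    samples: list of (t, x, y) tuples sorted by time.
    Returns a list of (start_t, end_t, centroid_x, centroid_y).
    """
    def dispersion(window):
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow an initial window that spans at least min_duration
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # remaining samples cannot cover the minimum duration
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while dispersion stays under the threshold
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((samples[i][0], samples[j][0], cx, cy))
            i = j + 1
        else:
            i += 1  # slide past a sample that belongs to a saccade
    return fixations


# Invented gaze stream: stable gaze, two saccade samples, stable gaze again
stream = [(10 * k, 100 + k % 2, 100) for k in range(11)]
stream += [(110, 200, 150), (120, 300, 150)]
stream += [(10 * k, 400, 200) for k in range(13, 24)]

found = detect_fixations_idt(stream, max_dispersion=5, min_duration=80)
print(found)
```

With these invented samples, the detector returns two fixations separated by the two saccade samples. In practice the dispersion and duration thresholds depend on the device and the coordinate units, which is one reason manual verification of the kind described above is useful.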
Statistical analyses included paired-samples t-tests, Pearson correlations, hierarchical multiple regression, and mediation analysis, using 5,000 bootstrap resamples. For hierarchical regressions, predictors were entered in three steps: demographic covariates (age, IQ, presence of attention-deficit/hyperactivity disorder [ADHD]) in Step 1; social-cognitive variables in Step 2; and eye-tracking variables in Step 3.
Sample size and analytic samples
Thirty-five participants were recruited to account for anticipated attrition and data loss. Twenty-four completed the intervention; 18 provided valid paired eye-tracking data for primary pre–post analyses. With n = 18, paired-sample t-tests (α = 0.05) provide ∼80 percent power to detect within-subject effects of dz ≈ 0.70. Regression and mediation analyses were considered exploratory due to sample size constraints.
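The power statement for the paired comparison can be checked approximately by simulation. The Python sketch below draws standardized difference scores with mean dz = 0.70 and unit standard deviation for n = 18 and counts how often a two-tailed paired t-test exceeds the critical value for 17 degrees of freedom (approximately 2.110 at α = 0.05). The function name, the number of simulations, and the seed are assumptions for illustration; this is not the procedure the authors used to obtain their estimate.

```python
import random


def simulate_paired_power(n, dz, t_crit, n_sims=20000, seed=0):
    """Monte Carlo estimate of power for a two-tailed paired t-test."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Difference scores with true standardized mean dz and SD 1
        diffs = [rng.gauss(dz, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        t = mean / ((var / n) ** 0.5)
        if abs(t) > t_crit:
            rejections += 1
    return rejections / n_sims


# t_crit is the two-tailed 0.05 critical value of t with 17 degrees of freedom
power = simulate_paired_power(n=18, dz=0.70, t_crit=2.110)
print(round(power, 2))
```

The simulated value should land close to 0.80, in line with the figure reported above. Smaller within-subject effects would be detected far less reliably at this sample size.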
Eye-tracking analyses required successful calibration, sufficient tracking ratio, and valid trial completion. Sample sizes varied by analysis: pre–post comparisons included participants with valid paired data, correlations included all available cases (maximum n = 22), and regression/mediation models used complete cases.
Results
Paired-sample t-test
Social orienting (percentage of fixation time on social stimuli) increased significantly from baseline to the last session, t(17) = −2.123, p = 0.049, d = −0.50. Time to first fixation on social stimuli was also significantly longer at the last session than at baseline, t(17) = −3.915, p = 0.001, d = −0.923.
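The reported effect sizes are consistent with the standardized mean difference for paired samples, dz = t / √n, where n = df + 1. The short Python sketch below applies this conversion to the two reported t values. It assumes that d was computed as dz, which the text does not state explicitly, and the function name is illustrative. The negative signs are consistent with differences taken as baseline minus last session.

```python
import math


def cohens_dz_from_t(t, df):
    """Convert a paired-samples t statistic to Cohen's dz (n = df + 1)."""
    n = df + 1
    return t / math.sqrt(n)


dz_social_orienting = cohens_dz_from_t(-2.123, 17)
dz_first_fixation = cohens_dz_from_t(-3.915, 17)
print(round(dz_social_orienting, 3), round(dz_first_fixation, 3))
```

Both conversions agree with the reported values of −0.50 and −0.923 to rounding.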
Correlation analyses
ADOS-CSS was significantly associated with longer peak fixation duration on background AOIs (p < 0.05). SRS Social Cognition correlated with autism severity (p < 0.05). ToMTB scores were negatively correlated with the number of fixations and saccades. Social orienting was strongly associated with peak fixation duration on social AOIs (r = 0.844) and negatively associated with peak fixation duration on background AOIs and with the number of fixations and saccades (all p < 0.05). Longer first-fixation latency to social stimuli was associated with diminished social orienting and more fragmented visual exploration (see Tables 1, 2, and 3 for more details).
Table 1. Correlation Coefficients Among Clinical Measures (n = 22 Participants)
Correlation is significant at the 0.01 level (two-tailed).
Correlation is significant at the 0.05 level (two-tailed).
ADOS, Autism Diagnostic Observation Schedule; ADOS-CSS, Autism Diagnostic Observation Schedule—Calibrated Severity Score; SRS, Social Responsiveness Scale; ToMTB, Theory of Mind Task Battery.
Table 2. Correlation Coefficients Among Eye-Tracking Measures (n = 22 Participants)
Correlation is significant at the 0.01 level (two-tailed).
Correlation is significant at the 0.05 level (two-tailed).
% of social stimuli, percentage of time looking at social stimuli; % right answers, percentage of right answers; AOIs, areas of interest; background peak duration time, peak duration time toward nonsocial stimuli; social peak duration time, peak duration time toward social stimuli; total fixations (background AOIs), number of fixations toward nonsocial stimuli; total fixations (social AOIs), number of fixations toward social stimuli.
Table 3. Correlation Coefficients Between Clinical Measures and Eye-Tracking Variables (n = 22 Participants)
Correlation is significant at the 0.05 level (two-tailed).
Predictors of autism symptom severity
In Step 1, the control variables did not significantly predict ADOS-CSS, F(3, 20) = 0.176, p = 0.911. Step 2 did not significantly improve the model, ΔR² = 0.228, F(2, 18) = 1.227, p = 0.337. Adding eye-tracking variables in Step 3 did not significantly improve prediction, F(10, 13) = 0.996, p = 0.492 (Tables 4 and 5).
Durbin–Watson (final model) = 1.90.
Table 5. Hierarchical Regression Predicting ADOS-CSS with Bootstrapped Estimates (95% CIs)
a Unless otherwise noted, bootstrap estimates are based on 5,000 samples.
b Based on 4,992 bootstrap samples.
ADHD, attention-deficit/hyperactivity disorder; AOIs, areas of interest; IQ, intelligence quotient; social orienting, percentage of total fixation time toward social stimuli; SRS, Social Responsiveness Scale; ToMTB, Theory of Mind Task Battery.
Predictors of ToM ability
In Step 1, the control variables did not significantly predict ToM ability, F(3, 20) = 2.335, p = 0.105. Adding ADOS-CSS in Step 2 did not significantly improve the model. In Step 3, the inclusion of eye-tracking variables increased the explained variance to 44.3 percent, but this change was not significant, ΔR² = 0.145, F(5, 14) = 0.731, p = 0.612 (Tables 6 and 7).
Table 6. Hierarchical Regression Models Predicting Theory of Mind Performance
Durbin–Watson (final model) = 2.48.
Table 7. Hierarchical Regression Predicting Theory of Mind Performance (Bootstrapped Estimates, 95% CIs)
a Unless otherwise noted, bootstrap estimates are based on 5,000 samples.
b Based on 4,999 bootstrap samples.
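The test of each hierarchical step rests on the standard F-change statistic, F = (ΔR² / df₁) / ((1 − R²_full) / df₂), where df₁ is the number of predictors added and df₂ is the residual degrees of freedom of the larger model. As a worked check under that standard formula, the Python sketch below applies it to the Step 3 values reported for the ToM model (ΔR² = 0.145, final R² = 0.443, df = 5 and 14). The function name is illustrative, and the inputs are the rounded values from the text.

```python
def f_change(delta_r2, r2_full, df_num, df_den):
    """F statistic for the increment in R-squared between nested models."""
    return (delta_r2 / df_num) / ((1.0 - r2_full) / df_den)


f_step3 = f_change(delta_r2=0.145, r2_full=0.443, df_num=5, df_den=14)
print(round(f_step3, 2))
```

The result is approximately 0.73, which matches the reported F(5, 14) = 0.731 up to rounding of the inputs.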
Predictors of social orienting
Model 1 was not significant, F(3, 20) = 0.587, p = 0.630. Adding social cognition variables in Step 2 did not significantly improve prediction, F(5, 18) = 0.719, p = 0.618, R² = 0.166. The final model, including ADOS-CSS, was also not statistically significant, F(6, 17) = 1.037, p = 0.436 (Tables 8 and 9).
Table 8. Hierarchical Regression Models Predicting Social Orienting
Durbin–Watson (final model) = 1.90.
Table 9. Hierarchical Regression Predicting Social Orienting (Bootstrapped Estimates, 95% CIs)
Unless otherwise noted, bootstrap estimates are based on 5,000 samples.
IQ, intelligence quotient.
Predictors of task performance
In Step 1, the control variables did not significantly predict performance, F(3, 18) = 0.350, p = 0.790. Step 2 did not significantly improve the model, ΔR² = 0.008, F change(2, 16) = 0.214, p = 0.951. Model 3 was also not statistically significant, ΔR² = 0.130, F change(1, 15) = 2.425, p = 0.140 (Tables 10 and 11).
Table 10. Hierarchical Regression Predicting Task Performance
Durbin–Watson (final model) = 1.718.
Table 11. Hierarchical Regression Predicting Task Performance (Bootstrapped Estimates, 95% CIs)
Bootstrapped estimates are based on 5,000 samples, except where noted.
Step 3 coefficients based on 4,944 samples.
Mediation analysis
A mediation analysis (PROCESS Model 4) tested whether background peak duration time mediated the association between percentage of correct answers during the VR intervention and ADOS-CSS. Percentage of correct answers significantly predicted background peak duration time (b = −5,961.01, p = 0.0047), but background peak duration time did not significantly predict ADOS-CSS (b = 0.0001, p = 0.2235). The direct effect was nonsignificant (b = −0.2620, p = 0.7099). The indirect effect was b = −0.4854 (BootSE = 0.7006), 95 percent CI [−1.7546, 0.2649]; because this interval includes zero, there was no evidence of mediation (Figure 1).

Figure 1. Mediation analysis.
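To make the logic of the bootstrapped indirect effect concrete, the following Python sketch estimates a simple mediation model (X → M → Y, with X also predicting Y directly) and builds a percentile bootstrap confidence interval for the product of paths a and b. It is a simplified illustration and not the PROCESS macro. The data are synthetic, and the function names, sample size, and number of resamples are assumptions chosen to keep the example small and fast.

```python
import random


def _cov(u, v):
    """Sample covariance (variance when u is v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a*b, with a from M ~ X and b the M coefficient in Y ~ M + X."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_indirect(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with no variance
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return indirect_effect(x, m, y), lo, hi


# Synthetic data in which M genuinely mediates part of the X-Y association
gen = random.Random(1)
x = [gen.gauss(0, 1) for _ in range(200)]
m = [0.8 * xi + gen.gauss(0, 1) for xi in x]
y = [0.5 * mi + gen.gauss(0, 1) for mi in m]

point, ci_low, ci_high = bootstrap_indirect(x, m, y, n_boot=1000)
print(round(point, 3), round(ci_low, 3), round(ci_high, 3))
```

In this synthetic example the interval excludes zero because mediation was built into the data. An interval that includes zero, as in the analysis reported above, provides no evidence for an indirect effect.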
Discussion
This exploratory study examined changes in social orientation following a VR social cognition intervention and explored associations between gaze patterns, autistic traits, ToM, and task performance. Social orientation increased significantly across sessions, supporting the hypothesis that the intervention may enhance social attention. The expected association between ADOS-2 scores and the SRS Social Cognition subscale was replicated.46,47 ADOS-2 and ADOS-CSS were associated with greater baseline attention to background stimuli, suggesting reduced social motivation. 14 Longer time to first social fixation was linked to lower engagement and more fragmented attention, consistent with prior findings. 48
Although autism severity was correlated with longer peak fixation durations on background AOIs, regression analyses revealed no significant predictive relationships between gaze, social cognition, and autistic traits. This dissociation supports models proposing that implicit attentional processes (eye-tracking) and explicit reasoning (ToM tasks, clinical ratings) reflect distinct but interacting systems.49,50 The limited sensitivity of ADOS-CSS 51 and the absence of direct social cognition assessment in ADOS Module 3 may partly explain these null findings. ToM performance depends on language and higher-order cognition,52–54 which may not be captured by gaze metrics. In addition, eye movements are influenced by scene salience and task demands,55,56 potentially obscuring individual differences.
The mediation model aligned with this interpretation: background fixation duration did not explain links between performance and symptom severity, although limited sample size likely reduced power. Overall, the dissociation between clinical severity, social cognition, and gaze behavior supports partially independent but interacting systems underlying social functioning. Eye-tracking captures rapid, context-sensitive processes, whereas ADOS-CSS and ToM index more stable traits.
Despite limited predictive associations, the increase in social orienting highlights eye-tracking’s sensitivity to short-term attentional shifts. The association between background attention and autism severity suggests that non-social gaze allocation may represent a feasible biomarker. VR combined with eye-tracking shows promise as a complementary tool for profiling social attention. Future research should examine whether gaze changes predict treatment response, whether adaptive VR enhances engagement, and whether integrating eye-tracking with physiological or neural measures improves characterization of social-cognitive profiles.
Limitations
The small sample reduced statistical power, and ceiling effects in ToMTB scores limited the detection of change. Underrepresentation of autistic females, exclusion of individuals with intellectual disabilities or behavioral disorders, and absence of a control group restrict generalizability. The social orienting metrics were also imprecise, as gaze to specific facial features could not be analyzed.
Conclusions
VR appears promising for capturing social orienting in autistic children, although eye-tracking measures did not yield strong predictive associations. These findings highlight the need to distinguish between task-specific attentional dynamics and trait-level clinical features when interpreting gaze behavior in autism. Larger, more diverse samples are needed to further evaluate eye-tracking as a behavioral marker of autism-related social differences.
Footnotes
Acknowledgments
The authors acknowledge the valuable contribution of Luna Maddalon and Maria Eleonora Minissi for their assistance in facilitating access to the data used in this study and for their support throughout the entire process.
Authors’ Contributions
S.G.M.: Conceptualization, data curation, formal analysis, methodology, investigation, visualization, writing—original draft. G.L.R.: Resources, writing—review and editing. M.A.R.: Conceptualization, methodology, funding acquisition, supervision, validation, and writing—review and editing. A.H.Z.: Conceptualization, methodology, funding acquisition, project administration, supervision, validation, and writing—review and editing.
Author Disclosure Statement
The authors declare no potential conflicts of interest.
Funding Information
This work was supported by the Ministerio de Ciencia, Innovación y Universidades (Proyectos de I + D + i Retos Investigación, grant number PID2020-116422RB-C22).