Abstract
We compared scanpath similarity in response to repeated presentations of social and nonsocial images representing natural scenes in a sample of 30 participants with autism spectrum disorder and 32 matched typically developing individuals. We used scanpath similarity (calculated using ScanMatch) as a novel measure of attentional bias or preference, which constrains eye-movement patterns by directing attention to specific visual or semantic features of the image. We found that, compared with the control group, scanpath similarity of participants with autism was significantly higher in response to nonsocial images, and significantly lower in response to social images. Moreover, scanpaths of participants with autism were more similar to scanpaths of other participants with autism in response to nonsocial images, and less similar in response to social images. Finally, we also found that in response to nonsocial images, scanpath similarity of participants with autism did not decline with stimulus repetition to the same extent as in the control group, which suggests more perseverative attention in the autism spectrum disorder group. These results show a preferential fixation on certain elements of social stimuli in typically developing individuals compared with individuals with autism, and on certain elements of nonsocial stimuli in the autism spectrum disorder group, compared with the typically developing group.
The two primary diagnostic features of autism are the socio-communicative disorder, manifesting in poor social skills and atypicalities in verbal and nonverbal communication, and repetitive and limited patterns of behavior (Diagnostic and Statistical Manual of Mental Disorders, 5th ed.; DSM-5; American Psychiatric Association, 2013), which may manifest as repetitive movements but also as intense preferences and interest in certain types of activities or stimuli. These patterns of behavior characteristic of autism could perhaps be summarized as a preference for the nonsocial over the social world (Baron-Cohen, 2009), manifesting differently depending on the developmental and intellectual level of the individual with autism, and on the level of severity of their autistic characteristics. Here, we argue that this preference is also manifested in the way that autistic people view the world and in the attentional decisions they make regarding which stimuli are fixated on and hence selected for further processing.
We show the existence of this attentional bias by measuring scanpath similarity for repeated images. We argue that higher scanpath similarity for repeated images means that eye-movement patterns are constrained by an attentional bias for certain elements of the image. This bias reflects an internal preference for certain types of stimuli, which directs an individual’s attentional choices, which, in turn, manifest in their looking behavior.
Social and nonsocial world preferences in autism and typical perception
Weakened social preferences in individuals with autism spectrum disorder (ASD) are generally supported by empirical studies (Chita-Tegmark, Arunachalam, Nelson, & Tager-Flusberg, 2015; Frazier et al., 2017; Schultz, 2005; Wang et al., 2014), although Guillon, Hadjikhani, Baduel, and Rogé (2014) argue that decreased attention to social stimuli is context-dependent and, therefore, cannot be seen as a generalized deficit in ASD (see also Chevallier et al., 2015). It is also less clear whether it is accompanied by a generalized preference for the nonsocial world. There is some initial evidence that individuals with ASD preferentially look at nonsocial rather than social stimuli even in the early stages of their development. For example, preference for geometric patterns was found to be related to increased symptom severity in autistic toddlers (Moore et al., 2018; Pierce et al., 2016), while 2-year-old toddlers with autism were found to be highly sensitive to the presence of nonsocial contingency in the stimulus while ignoring its social aspect (Klin, Lin, Gorrindo, Ramsay, & Jones, 2009).
This pattern of atypical allocation of attention was also detected in adults with ASD, with decreased attention to the eyes and overall face area accompanied by increased attention to the body and nonsocial elements (Chita-Tegmark, 2016). Sasson, Elison, Turner-Brown, Dichter, and Bodfish (2011), Sasson, Turner-Brown, Holtzclaw, Lam, and Bodfish (2008), and Sasson and Touchstone (2014) showed that both children and adults with autism looked preferentially at objects related to circumscribed interests typical for ASD (such as trains) over social images. Recently, these results were replicated in a large study by Manyakov et al. (2018). However, these findings were limited to specific objects known to be of special interest to the ASD population.
Why do looking preferences matter? According to the social motivation theory of autism (Chevallier, Kohls, Troiani, Brodkin, & Schultz, 2012), decreased social motivation may be a primary deficit in autism, initiating the path of disrupted development of both social skills and social cognition, such as the deficit of theory of mind. Preferential looking at certain classes of stimuli may, with time, lead to an accumulation of experience about those stimuli and their correlates, which may eventually lead to expertise and enhanced skills pertaining to these stimuli. Therefore, it is possible that a consistent preference for nonsocial over social stimuli could, over time, shift the developmental trajectory toward acquiring expertise in the nonsocial world at the expense of the social world. For example, a decreased preference for looking at the eyes in toddlers with autism correlated with a greater level of social disability (Jones, Carr, & Klin, 2008). Thus, decreased preference for social stimuli and rewards may lead to inadequate social experience, which eventually leads to poor social skills and social cognition. Similarly, increased preference for nonsocial stimuli may lead to increased expertise in that domain, manifesting, for example, as heightened abilities to infer the cause of a physical event or movement (folk physics; Baron-Cohen, Wheelwright, Spong, Scahill, & Lawson, 2001).
Scanpath similarity as a measure of attentional bias and looking preferences
A scanpath is a sequence of eye movements in response to a visual stimulus (Noton & Stark, 1971). It is an omnibus measure consisting of all image-related single eye movements, that is, saccades and fixations in order of their appearance. Unlike single eye-movement statistics, scanpaths provide a measure of the entire spatial and temporal viewing pattern and, as such, can be seen as the viewer’s eye-movement response to the image.
Scanpaths naturally reflect salient (Foulsham & Underwood, 2008) and semantic (Henderson & Hayes, 2018; Xu, Jiang, Wang, Kankanhalli, & Zhao, 2014) features of the image, but also top-down factors such as the task of the observer (Borji & Itti, 2014; Coco & Keller, 2014; Yarbus, 1967). However, there is evidence that they may also reflect individual characteristics and biases. For example, scanpaths have been reported to reflect individual differences, such as intelligence (Hayes, Henderson, Bors, Henderson, & van de Weijer, 2017), or diagnoses of developmental disorders (such as ASD, dyslexia, or attention deficit disorder; Hayes & Henderson, 2018). Therefore, scanpaths can be seen as signatures of the intrinsic characteristics of the observer’s mind.
Noton and Stark (1971) were the first to observe that scanpaths were individual to the viewer and to some extent stable between different viewings of the same image. Scanpath similarity over different presentations of the same stimulus for the same observer has been confirmed by several studies (Foulsham et al., 2012; Foulsham & Underwood, 2008; Mannan, Wooding, & Ruddock, 1997) and to some extent can be attributed to genetic causes, given the significant degree of scanpath similarity in monozygotic and dizygotic twins (Kennedy et al., 2017). Therefore, scanpath similarity over repeated presentations of the same image, apart from the obvious bottom-up similarity stemming from the image repetition itself, reflects stable characteristics of the observer (Foulsham et al., 2012). This could be due to a general looking tendency, such as the image center bias in individuals with ASD (Wang et al., 2015), but also to more complex viewing strategies, reflecting the observer’s goals and preferences.
Here, we argue that scanpath similarity can be seen as a signature of perceptual or attentional biases. If a preference for a certain class of stimuli exists, then we should be able to observe higher similarity in the scanpaths in response to repetition of an image containing the preferred class of stimuli. In such a case, the preferred stimulus acts as an “attractor,” capturing the observer’s attention over other elements of the image and leading to a more constrained scanpath. For example, our tendency to preferentially look at meaningful or salient (i.e. highly visually noticeable) elements of the image creates a degree of scanpath similarity for repeated images containing objects or salient features, compared with images containing no such points of interest. On the other hand, lower scanpath similarity means that eye-movement patterns are less constrained by the presence of a certain stimulus class and, it follows, that the observer has no preference for that particular class of stimuli. To summarize, scanpath similarity for repeated images means that looking patterns are constrained by the existence of looking strategies or biases, and as such can be used to measure a preference or bias for certain types of stimuli. Thus, we propose scanpath similarity as a measure of preferences and biases. We measure scanpath similarity both by comparing different scanpaths belonging to the same individual (within-individual similarity) and scanpaths belonging to different members of the same group (within-group similarity). Within-individual scanpath similarity reveals the presence of individual biases in the looking behavior that the observer may display. Within-group scanpath similarity can help to verify whether these biases are idiosyncratic to each member of the group, or whether they are common among the members of the same group.
Aims and design of this study
The aim of the study was to show that autistic and typically developing (TD) individuals systematically differ with regard to the elements of visual stimuli they fixate on, revealing their different interests, which ultimately shape their expertise in the perceptual world.
Specifically, we hypothesized that individuals with autism, compared with TD individuals, not only display weaker biases in the way they look at social stimuli, but also display stronger biases when looking at nonsocial stimuli.
To show these biases in the eye-movement patterns, we compared the scanpath similarity between two groups of participants: individuals with ASD diagnoses and an age-, sex- and IQ-matched control group, who viewed repeated presentations of the same images. We used photographs of natural scenes as stimuli, to test whether it would be possible to detect a nonsocial bias not only for specific types of stimuli related to circumscribed interests, but in a more naturalistic situation. We manipulated the social content of the images, with some representing social scenes, while others represented only inanimate objects.
First, we compared the scanpath similarity between different presentations of the same image for each person (within-individual similarity), to test how similar the scanpaths of the same observer are when the same image is viewed repeatedly. We hypothesized that in participants with ASD, there will be a higher degree of scanpath similarity for nonsocial stimuli and a lower degree of similarity for social stimuli, reflecting their preference for the nonsocial over social world.
However, the question remains whether each autistic individual has their own idiosyncratic way of looking at the world, or whether there are patterns in their looking behavior that are common to the whole group. For example, if a nonsocial bias is observed in the group with autism, is it because each person with autism has their own preferences of looking at specific types of nonsocial stimuli (e.g. one person may be interested in vehicles or pieces of electronic equipment, another in letters or specific visual patterns), or is it because individuals with autism, as a group, prefer certain specific types of nonsocial stimuli, reflecting their group-wide interests?
To answer this question, we compared the scanpath similarity between different members of each group (within-group similarity), to test how similar members of the same group (either the ASD group or the control group) are to one another in terms of their looking behavior. We hypothesized that individuals with autism will be more similar to one another when looking at nonsocial stimuli, while TD individuals will be more similar to other members of their group when looking at social stimuli. Such a pattern of results would reveal group-specific biases for certain elements of natural images in each of the groups. In other words, increased within-group scanpath similarity for nonsocial stimuli in the group of participants with autism would show that their way of looking at nonsocial stimuli is not entirely idiosyncratic, but reveals preferences and interest for certain elements of such images common to the whole group.
Finally, it is important to note that, because we are comparing scanpaths that contain both spatial and temporal information, the biases that this analysis can reveal are not confined to preferential looking at certain elements of the image (e.g. faces), but also include any biases in the temporal ordering of fixations. For instance, suppose that, across presentations of an image, ASD participants tend to look at the same objects as TD participants, but the order in which they do so varies, whereas TD participants look at the objects in the same order (e.g. always look at a person’s face first). This might suggest a greater homogeneity among TD participants in terms of the importance associated with different objects (in this case, the face being the most important object for this group of participants).
Method
Participants
The sample consisted of 30 participants diagnosed with ASD and 32 TD age-, sex-, and IQ-matched participants. All participants in both groups had normal or corrected-to-normal vision and no neurological conditions. None of the TD participants had a history of autism spectrum or other neurodevelopmental disorder. Intelligence was measured using the Wechsler Scales of Intelligence: depending on the participant’s age, either the Wechsler Intelligence Scale for Children (WISC-R; Wechsler, 1974) or the Wechsler Adult Intelligence Scale (WAIS-R; Wechsler, 1983). All participants had full-scale IQs in the average range (>85).
All participants in the ASD group had clinical diagnoses prior to their enrolment in the study, based on the criteria outlined in ICD-10 (International Classification of Diseases, Tenth Revision). All diagnoses were additionally confirmed by our team (M.E.K. and a research assistant trained in the Autism Diagnostic Observation Schedule (ADOS), who has both research and clinical experience with individuals on the autism spectrum) using the ADOS-2 (Lord et al., 2012), which is a standardized, validated instrument for the assessment of autism spectrum disorder. Sample characteristics and p values for the between-group differences are shown in Table 1.
Table 1. Sample characteristics.
ASD: autism spectrum disorder; TD: typically developing.
Between-group differences were tested with the Mann–Whitney test, the t-test, or Pearson’s chi-square test.
The study was approved by the faculty Research Ethics Committee, and was conducted in accordance with the Helsinki Declaration. Written consent was obtained either from the adult participants themselves or, in the case of underage participants, from their guardians in addition to oral consent from the underage participants.
Stimuli and design
We used color images of 800 × 600 px resolution, taken from the database of 700 photographs created by Xu et al. (2014). For the stimuli list, please see the Supplemental Material. Apart from the images themselves, the database comprises annotation data of 5551 segmented objects with fine contours (see Figure 1 for an example), as well as eye fixation data of neurologically typical participants who looked once at each image.

Figure 1. An example of a “social” image (containing people) with superimposed areas of interest contours and a legend listing the associated semantic attributes, all based on the Xu, Jiang, Wang, Kankanhalli, and Zhao (2014) dataset.
To be able to show the same set of images repeatedly to each participant (taking into account potential fatigue and attentional difficulties, particularly in the ASD group), we selected a small subset of 24 images from the available 700. Half of the images contained social elements, defined as the presence of at least one large human figure with a clearly visible face, while the other half contained a number of clearly visible objects but no social elements as defined above, and additionally no people or animals of any kind or size, including pictures, figurines, or toys resembling people or animals. To make sure that the images were complex enough to elicit sufficiently long scanpaths, we chose images that elicited at least nine fixations on average in the original study by Xu et al. (2014). There was no significant difference in the average number of fixations between the social set (M = 9.79, SD = 0.43) and the nonsocial set (M = 9.77, SD = 0.38), p = 0.89. In addition, based on the areas of interest (AOIs) for each image according to the semantic object segmentation data provided by Xu et al. (2014), objects accounted for 23.5% of the area of the social images (this included facial objects covering 4.5% of the said area), while for the nonsocial images this figure was 23.1% (which, by definition, did not include any facial objects).
Procedure
Participants were seated 65 cm in front of the screen of a 15-in Dell Precision M4800 workstation (screen resolution 1280 × 720; each 800 × 600 image was displayed centrally on a white background), with their heads placed on a chinrest. Their eye movements were recorded using an SMI RED250Mobile remote eye tracker, with a sampling rate of 250 Hz and gaze position accuracy of 0.4°. The experiment was programmed in C#. Participants completed a 5-point calibration and 4-point validation procedure. In addition, eye data quality was periodically checked using in-house software, which prompted participant repositioning when required.
The experiment consisted of three sessions (with breaks in between), with 24 trials each (one per image). In each trial, preceded by a 500 ms fixation cross, one of the images was displayed for 3000 ms. After the presentation of the stimulus, a “What have you seen?” question prompt appeared on the screen, and participants gave their response orally to the research assistant. The question was devised as an attention check and the responses were not further analyzed. The trial ended with a 500 ms blank screen. Each session comprised the same 24 images (one per trial) displayed in random order. Unfortunately, one of the nonsocial images was incorrectly copied to the computer displaying the images and had to be removed from further analyses.
Data analysis
We used the SMI event detector high-speed velocity-based detection algorithm with standard settings (required fixation duration of 100 ms and a velocity threshold of 50°/s) to extract the participants’ eye fixations while viewing the images.
We defined the AOIs for each image according to the semantic object segmentation data provided by Xu et al. (2014), with each of the key image objects annotated in the database constituting a single AOI. Note that, across the stimuli we selected for our study, nonbackground objects accounted for 23.5% of the area of the social images (this included human face objects covering 4.5% of the said area), while for the nonsocial images this figure was 23.1% (which, by definition, did not include any facial objects). For each trial, we assigned each fixation to an AOI/object that it was located within, thus constructing a scanpath composed of a sequence of integer numbers identifying the subsequent objects that the participant looked at, together with the corresponding fixation durations.
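The mapping from fixations to an AOI-coded scanpath can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes that the object segmentation has been rasterized into an integer label map (one AOI identifier per pixel), and that fixations falling outside every annotated object receive a background code of 0. Both the `build_scanpath` name and the background convention are assumptions made here for illustration.

```python
import math


def build_scanpath(fixations, label_map, background=0):
    """Turn fixations into a sequence of (aoi_id, duration_ms) pairs.

    fixations: list of (x, y, duration_ms) in image pixel coordinates.
    label_map: 2D list where label_map[row][col] is the AOI id at that pixel.
    background: code for fixations outside the image or outside every AOI
                (an assumption; the source does not state this convention).
    """
    height = len(label_map)
    width = len(label_map[0])
    scanpath = []
    for x, y, duration_ms in fixations:
        col, row = math.floor(x), math.floor(y)
        if 0 <= row < height and 0 <= col < width:
            aoi_id = label_map[row][col]
        else:
            aoi_id = background
        scanpath.append((aoi_id, duration_ms))
    return scanpath


# Toy 4 x 3 label map with two AOIs (1 and 2) on a background of 0.
toy_map = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
]
toy_fixations = [(2.4, 0.7, 180), (1, 2, 240), (9, 9, 120)]
print(build_scanpath(toy_fixations, toy_map))
```

In this toy case the third fixation lies outside the image and is therefore coded as background.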
To compare different scanpaths, we used the ScanMatch method (Cristino, Mathôt, Theeuwes, & Gilchrist, 2010), which has the advantage of being able to account for spatial, temporal, and sequential similarity between scanpaths. The technique is well suited to revealing within-individual idiosyncrasies, and particularly similarities between repeated viewings of an image by the same observer (Foulsham et al., 2012). It has been shown to be a remarkable improvement over the less sophisticated scanpath comparison methods (such as the “edit distance,” or measures that only consider spatial similarities, e.g. “cross-recurrence”); it is also distinguished by the fact that it naturally incorporates semantic information into similarity score computation (Anderson, Anderson, Kingstone, & Bischof, 2015).
The method proceeds by first temporally binning the scanpath elements, that is, by encoding each element of the scanpath with a number of AOI identifiers proportional to the duration of the corresponding fixation. As recommended by Cristino et al. (2010), we used a temporal bin of 50 ms (one copy of the identifier per 50 ms of fixation duration), so that a fixation that lasted 200 ms is replaced by four identifiers of its target AOI. Having thus taken fixation durations into account, the next step is to find the optimal global alignment of the resulting sequences using the Needleman–Wunsch algorithm, with a predefined substitution matrix that can incorporate both spatial and semantic relationships between the AOIs. In our case, we use the semantic information provided in the Xu et al. (2014) database, whereby each AOI is encoded with 12 binary semantic attributes (e.g. whether it represents a face or includes text). Among the social images selected for our study, the most common AOI attributes were “face” (appeared in all 12 images), “touched” (9 out of 12), “taste” (7), “emotion” (7), “watchability” (5), “gazed” (4), “motion” (4), “text” (3), and “sound” (3). Among the nonsocial images, the most common ones were “watchability” (10), “text” (7), “taste” (6), “smell” (5), “operability” (5), and “touch” (4).
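The temporal binning step can be illustrated with a short sketch. It assumes a scanpath given as (AOI identifier, duration in ms) pairs and a 50 ms bin; the function name `bin_scanpath` and the choice to drop any remainder shorter than one bin are assumptions made here, not details taken from the ScanMatch toolbox.

```python
def bin_scanpath(fixations, bin_ms=50):
    """Expand (aoi_id, duration_ms) pairs into a temporally binned AOI sequence.

    Each fixation contributes one copy of its AOI identifier per full bin.
    Discarding the remainder below one bin is an illustrative assumption.
    """
    binned = []
    for aoi_id, duration_ms in fixations:
        n_copies = int(duration_ms // bin_ms)
        binned.extend([aoi_id] * n_copies)
    return binned


# A 200 ms fixation on AOI 3 followed by a 100 ms fixation on AOI 7.
print(bin_scanpath([(3, 200), (7, 100)]))
```

The 200 ms fixation is thus represented by four identifiers, in line with the example in the text.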
For any two AOIs present in an image, we can therefore determine not only their spatial relationship (particularly whether they are adjacent, i.e. share a common boundary), but also whether or not they are compatible in terms of at least one attribute (e.g. both represent a face). If two aligned elements of the compared sequences correspond to different AOIs, but are both semantically and spatially similar (adjacent), then they still receive half of the full (unit) premium for aligning two identical AOIs. Analogously, if the aligned AOIs are either semantically or spatially similar (but not both), then half of the full (unit) penalty for aligning different and unrelated AOIs applies. For example, in the image in Figure 1, if two (aligned) scanpath sequences both begin with the woman’s face, this will add +1 to the alignment score; if one begins with the salad she eats and the other with the orange juice glass, this will add +1/2, because the two are different AOIs but both feature the “taste” attribute and are adjacent (next to each other); if one begins with the woman’s face and the other with that of the man, this will add −1/2, because both AOIs feature the “face” attribute but are not adjacent; finally, if one begins with the man’s face and the other with the salad, this will add −1, as the two AOIs are neither semantically related nor adjacent. Put simply, in comparing two scanpaths, we take into account not only whether exactly the same AOIs were looked at in the same order, but also use sequence alignment to measure the extent to which the order differs, and apply “mitigating circumstances,” whereby looking at different but semantically related or neighboring AOIs does not reduce the similarity score as much.
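The scoring rules above, combined with a global alignment, can be sketched as follows. This is an illustrative reimplementation rather than the ScanMatch toolbox code: the AOI names, the attribute sets, and the adjacency relation encode only the Figure 1 example from the text, and the gap penalty is a free parameter whose value is not stated in the text (the value used in the demonstration is purely hypothetical).

```python
def substitution_score(a, b, attributes, adjacent):
    """Score for aligning AOI a with AOI b, following the rules in the text."""
    if a == b:
        return 1.0
    semantic = bool(attributes[a] & attributes[b])  # share at least one attribute
    spatial = frozenset((a, b)) in adjacent         # share a common boundary
    if semantic and spatial:
        return 0.5
    if semantic or spatial:
        return -0.5
    return -1.0


def alignment_score(seq1, seq2, score_fn, gap):
    """Needleman-Wunsch optimal global alignment score of two AOI sequences."""
    n, m = len(seq1), len(seq2)
    # dp[i][j] holds the best score for aligning seq1[:i] with seq2[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score_fn(seq1[i - 1], seq2[j - 1]),
                dp[i - 1][j] + gap,  # gap in seq2
                dp[i][j - 1] + gap,  # gap in seq1
            )
    return dp[n][m]


# Hypothetical encoding of four AOIs from the Figure 1 example.
attributes = {
    "woman_face": {"face"},
    "man_face": {"face"},
    "salad": {"taste"},
    "juice": {"taste"},
}
adjacent = {frozenset(("salad", "juice"))}


def score(a, b):
    return substitution_score(a, b, attributes, adjacent)


print(score("woman_face", "woman_face"), score("salad", "juice"))
print(score("woman_face", "man_face"), score("man_face", "salad"))
print(alignment_score(["woman_face", "salad"], ["woman_face", "juice"], score, gap=-1.0))
```

Because the substitution rule is passed in as a function, the same alignment routine could be reused with a purely spatial or purely semantic scoring scheme.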
Once the similarity score is obtained, to safeguard against it being biased by varying scanpath lengths, the score is normalized by dividing it by the length of the longer of the two compared sequences. For example, if one temporally binned scanpath sequence contains 30 AOI identifiers and the other 40, then the ScanMatch similarity score obtained for this pair of scanpaths is divided by 40 (the length of the globally aligned sequence; see Cristino et al., 2010, for more detail). Otherwise, shorter temporally binned scanpaths might generally be classed as more similar (due to having fewer mismatched elements), which would particularly apply to the ASD participants. Specifically, the average total fixation duration per trial (which determines the temporally binned scanpath length) was 2.314 s (SD = 0.669 s) for TD participants while looking at social images and 2.263 s (SD = 0.817 s) while looking at nonsocial ones. For ASD participants, it was 2.163 s (SD = 0.992 s) for social images and 2.112 s (SD = 0.899 s) for nonsocial ones. This variation justifies the need for normalization, and so only the final, normalized ScanMatch scores were used as the dependent variables in both models presented in the following section.
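The normalization described above amounts to a single division, sketched below. The second helper, which maps a normalized score from [−1, 1] to [0, 1], reflects the rescaling mentioned in the table notes; the linear form (s + 1) / 2 is an assumption, as the text does not spell out the mapping. Both function names are hypothetical.

```python
def normalize_score(raw_score, len_a, len_b):
    """Divide a raw alignment score by the length of the longer binned sequence."""
    longer = max(len_a, len_b)
    if longer == 0:
        raise ValueError("cannot normalize a comparison of two empty scanpaths")
    return raw_score / longer


def rescale_to_unit(normalized_score):
    """Map a score from [-1, 1] to [0, 1] (assumed linear rescaling)."""
    return (normalized_score + 1.0) / 2.0


# Example from the text: sequences of 30 and 40 identifiers, so divide by 40.
# The raw score of 12.0 is a made-up value for illustration.
normalized = normalize_score(12.0, 30, 40)
print(normalized, rescale_to_unit(normalized))
```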
Results
Descriptive statistics
We present the average number of fixations per image and the average fixation duration per image in Table 2.
Table 2. The average number of fixations per image and the average fixation duration per image.
TD: typically developing; ASD: autism spectrum disorder.
Before reporting the main results, based on the ScanMatch comparison, we also present a descriptive summary of how the participants’ attention was distributed over the AOIs. The purpose is to see whether similarities in gaze patterns across different presentations of an image can be identified based solely on the spatial (but not temporal) allocation of gaze. To this end, for each image and each of its AOIs, and separately for TD and ASD participants, we calculated the average total fixation time on the given AOI per trial. Next, for each image, we selected the four AOIs with the highest values of that average. Finally, we averaged across all social and all nonsocial images, so that, for each of the four combinations of participant group and image type, we obtained a distribution of fixation time among the four AOIs that were fixated the longest. This is presented in Figure 2.

Figure 2. The distribution of fixation time among the four longest-fixated AOIs, separately for ASD (top) and TD (bottom) participants, and for nonsocial (left) versus social images (right).
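The per-image part of this descriptive summary can be sketched as follows, assuming each trial is available as a list of (AOI identifier, fixation duration) pairs for one image and one participant group. The function name is hypothetical, and expressing the result as shares of the time spent on the selected AOIs is an assumption about how the distribution was summarized.

```python
from collections import defaultdict


def top_aoi_distribution(trials, k=4):
    """Return the k longest-fixated AOIs of one image for one group.

    trials: list of scanpaths, each a list of (aoi_id, duration_ms) pairs.
    Returns (aoi_id, mean_total_ms_per_trial, share_among_top_k) tuples,
    ordered from the longest-fixated AOI downward.
    """
    totals = defaultdict(float)
    for trial in trials:
        for aoi_id, duration_ms in trial:
            totals[aoi_id] += duration_ms
    n_trials = len(trials)
    averages = {aoi_id: total / n_trials for aoi_id, total in totals.items()}
    top = sorted(averages.items(), key=lambda item: item[1], reverse=True)[:k]
    top_sum = sum(avg for _, avg in top)
    return [(aoi_id, avg, avg / top_sum) for aoi_id, avg in top]


toy_trials = [
    [(1, 1000), (2, 500)],
    [(1, 600), (3, 400), (2, 300)],
]
for row in top_aoi_distribution(toy_trials, k=2):
    print(row)
```

Averaging the resulting shares by rank across all social and all nonsocial images would then give one distribution per combination of group and image type.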
It appears that, compared with ASD participants, TD participants’ gaze is more heavily concentrated in their “favorite” (longest-fixated) image areas. In this sense, in terms of their spatial properties, fixation patterns of TD participants accompanying different presentations of the same image seem more homogeneous than those of the ASD participants. However, compared with the social images, this tendency appears to be, at least partly, overturned for the nonsocial ones. Thus, in the following sections, we conduct formal statistical testing of this variation, taking into account not only spatial, but also temporal similarities between the scanpaths, and also discriminating between similarities within versus between participants.
Within-individual scanpath similarity
For each participant and for each image they viewed, we compared (pairwise) the scanpaths accompanying the three presentations of that image, that is, we calculated the normalized similarity score from comparing its presentation during the first versus the second session, the first versus the third, and the second versus the third session. We then calculated the average of the three obtained scores, which constituted the dependent variable, while the set of independent variables included the participant’s full-scale IQ (“FIQ”), group membership (“ASD,” 1 = autism, 0 = TD), and indicators of whether the given image included people (“social” = 1 and “nonsocial” = 0) or not (“social” = 0 and “nonsocial” = 1). The average similarity scores across all participants and images for each of the four combinations of “ASD” and “social” are shown in Table 3.
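The construction of the dependent variable can be sketched as follows. The similarity function is passed in as an argument; the `toy_similarity` used in the demonstration is a simple positional-overlap stand-in and not ScanMatch. Returning `None` when fewer than two sessions have a usable scanpath is an assumption, since the text does not describe how partially missing sets of trials were averaged.

```python
from itertools import combinations
from statistics import mean


def within_individual_similarity(scanpaths_by_session, similarity):
    """Average pairwise similarity across one participant's viewings of one image.

    scanpaths_by_session: dict mapping session number to a binned scanpath.
    similarity: function taking two scanpaths and returning a normalized score.
    """
    sessions = sorted(scanpaths_by_session)
    if len(sessions) < 2:
        return None  # nothing to compare (assumed handling of missing trials)
    scores = [
        similarity(scanpaths_by_session[a], scanpaths_by_session[b])
        for a, b in combinations(sessions, 2)
    ]
    return mean(scores)


def toy_similarity(a, b):
    """Stand-in score: share of positions with identical AOIs (not ScanMatch)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


viewings = {1: [1, 1, 2], 2: [1, 2, 2], 3: [1, 1, 2]}
print(within_individual_similarity(viewings, toy_similarity))
```

With three sessions, `combinations` yields exactly the three comparisons named in the text: first versus second, first versus third, and second versus third.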
Table 3. Average within-individual scanpath similarity scores for each of the four combinations of ASD group membership and image content.
ASD: autism spectrum disorder; TD: typically developing.
Normalized similarity scores rescaled from [−1;1] to [0;1].
To evaluate the statistical significance of the differences seen in Table 3, we estimated a mixed-effects model with fully crossed random intercept and slope effects clustered by both participant and item (image), and with variables specified as above. Using this type of model instead of, for example, repeated-measures analysis of variance (ANOVA) has the advantage of coping more effectively with missing observations, which in our case were trials with no registered fixations. Such trials are especially likely to occur when testing ASD participants, who might have trouble concentrating and can occasionally look away from the screen (this happened in 4.4% of the trials attempted by ASD participants, compared with 1.4% for TD participants). In addition, the mixed-model procedure allowed us to assess the significance of ASD and image content relative to random idiosyncratic effects due to specific images or participants. The resulting fixed-effect estimates are shown in Table 4.
Table 4. Mixed-effects model summary table for the average normalized within-individual similarity scores (N = 1413).
ASD: autism spectrum disorder; SE: standard error.
*p < 0.05.
Note that, despite having two interaction terms in the model (“social × ASD” and “nonsocial × ASD”), the underlying model structure is exactly the same as if we replaced the “nonsocial × ASD” term with “ASD,” thus making “social” the single “nature of the stimuli” variable. The only difference is in which comparisons are statistically evaluated. Our two-pronged hypothesis was that (1) for social images, scanpaths of TD participants accompanying different presentations of the same image will be more similar to one another than scanpaths of ASD participants and (2) for nonsocial images, the opposite will hold. This is what our two interaction terms test. First, the “social × ASD” term tests part (1) of the hypothesis, as it captures the difference in similarity between the two groups of participants for social images: for TD participants, the expected similarity is $\beta_{\text{intercept}} + \beta_{\text{intelligence}} \cdot \text{FIQ} + \beta_{\text{social}}$, and for ASD participants it is $\beta_{\text{intercept}} + \beta_{\text{intelligence}} \cdot \text{FIQ} + \beta_{\text{social}} + \beta_{\text{social} \times \text{ASD}}$, so that the difference between the two is $\beta_{\text{social} \times \text{ASD}}$. Similarly, it is easy to show that $\beta_{\text{nonsocial} \times \text{ASD}}$ captures the analogous similarity difference between the groups for nonsocial images, thus testing part (2) of the hypothesis.
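The equivalence of the two parametrizations can be verified in a few lines. The derivation below is a sketch of the fixed-effects part only (random effects are omitted); the symbols $y$ for the similarity score, $S$ and $N$ for the “social” and “nonsocial” indicators, and $A$ for the “ASD” indicator are notation introduced here for illustration.

```latex
\begin{align*}
\mathbb{E}[y]
  &= \beta_{\text{intercept}} + \beta_{\text{intelligence}}\,\text{FIQ}
     + \beta_{\text{social}}\,S
     + \beta_{\text{social}\times\text{ASD}}\,S A
     + \beta_{\text{nonsocial}\times\text{ASD}}\,N A,
     \qquad N = 1 - S \\
  &= \beta_{\text{intercept}} + \beta_{\text{intelligence}}\,\text{FIQ}
     + \beta_{\text{social}}\,S
     + \beta_{\text{nonsocial}\times\text{ASD}}\,A
     + \left(\beta_{\text{social}\times\text{ASD}}
             - \beta_{\text{nonsocial}\times\text{ASD}}\right) S A \\[6pt]
\mathbb{E}[y \mid S = 1, A = 1] - \mathbb{E}[y \mid S = 1, A = 0]
  &= \beta_{\text{social}\times\text{ASD}} \\
\mathbb{E}[y \mid S = 0, A = 1] - \mathbb{E}[y \mid S = 0, A = 0]
  &= \beta_{\text{nonsocial}\times\text{ASD}}
\end{align*}
```

The second line is the conventional specification with a main effect of “ASD” and a single “social × ASD” interaction, so both forms span the same model; the two-interaction form simply makes each group difference a directly tested coefficient.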
The results indicate that there was no significant effect of intelligence ($\beta_{\text{intelligence}}$ = 0.001, p = 0.372) on the similarity between scanpaths accompanying the three different viewings of an image by the same participant. However, for nonsocial images, the similarity scores were significantly higher for ASD participants than for the control participants ($\beta_{\text{nonsocial} \times \text{ASD}}$ = 0.058, p = 0.038), which corresponds to the average score shown in Table 3 increasing from 0.352 to 0.380. In contrast, for social images, the similarity scores were significantly lower for ASD participants than for the control participants ($\beta_{\text{social} \times \text{ASD}}$ = −0.061, p = 0.029), which corresponds to the average score shown in Table 3 decreasing from 0.372 to 0.342. At the same time, among the control participants (ASD = 0), the difference in within-individual scanpath similarity between social and nonsocial images was not significant ($\beta_{\text{social}}$ = 0.041, p = 0.434), possibly because the increase from 0.352 to 0.372 observed in Table 3 was too small relative to the idiosyncratic random effects within each set of images.
In sum, we found that, compared with control participants, ASD participants exhibit lower scanpath similarity across different viewings of the same social image, but that this tendency is reversed for nonsocial images.
Within-group similarity
For each participant-image-session combination, we compared the scanpaths accompanying the given viewing of the image by the participant in question with those accompanying the viewings of the same image in the same session by all other participants in the same group (i.e. we compared control participants with other control participants and ASD participants with other ASD participants). Specifically, we computed the average normalized similarity score across all pairwise comparisons between the scanpath of the participant on one hand and each of those of their counterparts on the other. In addition to the resulting dependent variable, and the independent variables used in the within-individual model in the previous section, we now included an independent variable encoding the session number (i.e. whether the participants viewed the image for the first, second, or third time, treating the last session as the regression’s reference category by subtracting 3 from the session number). The average similarity scores across all combinations of session, ASD, and image content are shown in Table 4.
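The averaging step and the session coding described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names and the letter-string encoding of scanpaths are hypothetical, and `toy_similarity` is only a stand-in for a real scanpath measure such as ScanMatch.

```python
from statistics import mean
from typing import Callable, Dict


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a scanpath similarity measure (NOT ScanMatch).

    Returns the share of fixation positions at which both scanpaths
    land in the same region (one letter per fixation).
    """
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def within_group_score(
    target: str,
    group_scanpaths: Dict[str, str],
    similarity: Callable[[str, str], float] = toy_similarity,
) -> float:
    """Average similarity between one participant's scanpath and each other member's.

    `group_scanpaths` maps participant IDs to scanpaths recorded for the same
    image in the same session, for members of a single group (ASD or TD).
    """
    own = group_scanpaths[target]
    return mean(
        similarity(own, other)
        for pid, other in group_scanpaths.items()
        if pid != target
    )


def session_code(session: int) -> int:
    """Code sessions 1, 2, 3 as -2, -1, 0 so that the last session is the reference."""
    return session - 3
```

For example, `within_group_score("p1", {"p1": "AAB", "p2": "AAB", "p3": "ABB"})` averages the comparisons of p1 with p2 and of p1 with p3; the participant is never compared with themselves.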
Average within-group similarity scores for each combination of session, ASD group membership, and image content.
ASD: autism spectrum disorder; TD: typically developing.
Normalized similarity scores rescaled from [−1;1] to [0;1].
To evaluate the significance of the differences seen in Table 4, once again, we estimated a mixed-effects model with fully crossed random intercept and slope effects clustered by both participant and image, and with variables specified as above. The resulting fixed-effect estimates are shown in Table 6. To test the results’ robustness against potential cross-cluster dependencies, we also estimated an alternative model specification, in which 1 in 3 (10) participants were randomly excluded from each group, and the remaining ones were all compared with the same set of their excluded counterparts’ scanpaths. The results were qualitatively identical to the ones presented below.
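The hold-out step of this robustness check can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually used: the function name, the seed handling, and the participant IDs are hypothetical; only the group sizes and the number of excluded participants (10 per group) come from the text. A plausible reading of the motivation is that, once every retained participant is scored against the same excluded set, no pairwise comparison contributes to the scores of two retained participants.

```python
import random
from typing import List, Sequence, Tuple


def split_group(
    participants: Sequence[str], n_held_out: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly set aside `n_held_out` participants as a common comparison set.

    Returns (retained, held_out). Retained participants would then each be
    compared only with the held-out participants' scanpaths.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    held_out = rng.sample(list(participants), n_held_out)
    held_out_set = set(held_out)
    retained = [p for p in participants if p not in held_out_set]
    return retained, held_out


# Hypothetical IDs; group sizes follow the sample described in the abstract.
asd_ids = [f"ASD{i:02d}" for i in range(1, 31)]
td_ids = [f"TD{i:02d}" for i in range(1, 33)]

asd_retained, asd_held_out = split_group(asd_ids, n_held_out=10)
td_retained, td_held_out = split_group(td_ids, n_held_out=10)
```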
Mixed-effects model summary table for the average normalized within-group similarity scores (N = 4157).
ASD: autism spectrum disorder; SE: standard error.
p < 0.05.
As in the within-individual model, we found that there was no significant effect of intelligence (βintelligence = 0.001, p = 0.987) on scanpath similarity. However, during the third exposure to nonsocial images (nonsocial = 1 and social = session = 0; recall that we treat the last session as the reference for the other two), the similarity between the scanpaths of different ASD participants looking at the same image was greater than between those of different control participants looking at that image (βnonsocial × ASD = 0.078, p = 0.001). This tendency was significantly strengthened by repeated exposure to the image, that is, it was stronger in the later than in the earlier sessions (βnonsocial × session × ASD = 0.041, p < 0.001). This can be seen in Table 4, where, in the leftmost cells, the average similarity score among the control participants viewing nonsocial images for the first time (0.370) is greater than among the ASD participants (0.360). However, in the two later sessions, this tendency is reversed, with similarity being 0.032 higher among ASD participants than among their control counterparts. Put differently, the significant tendency for similarity to decline among control participants with repeated exposure to nonsocial images (βnonsocial × session = −0.069, p < 0.001) was significantly weaker for ASD participants (βnonsocial × session × ASD = 0.041, p < 0.001), as evidenced by the fact that the average scores of control participants in the top-left quarter of Table 4 are more strongly reduced following the initial presentation than the corresponding scores of ASD participants in the bottom row.
Comparing the above effects with those registered for social images, there was no effect of image content among the control participants during the final session (βsocial = 0.075, p = 0.181). As for nonsocial images, there was also a tendency for similarity to decline among control participants with repeated exposure to social images (βsocial × session = −0.033, p < 0.001). However, as opposed to nonsocial images, in the final session similarity between the scanpaths of different ASD participants looking at the same social image was smaller than between those of different control participants looking at that image (βsocial × ASD = −0.074, p = 0.002). Moreover, this tendency was not significantly different in the other two sessions (βsocial × session × ASD = 0.003, p = 0.758). In Table 4, this is evidenced by the fact that, for social images, the values in the bottom row are lower than the ones at the top, but the difference between the corresponding top and bottom numbers remains relatively stable across the three sessions.
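To make the session coding concrete, the reported coefficients can be combined into a model-implied group gap for each session. The sketch below assumes that session enters the model linearly as (session − 3), as described for the within-group model, holds intelligence fixed, and ignores random effects. The function name is hypothetical, and the resulting values are implied by the fixed-effect estimates, not observed cell means.

```python
def asd_minus_td(beta_group: float, beta_group_by_session: float, session: int) -> float:
    """Model-implied ASD-minus-TD similarity gap for one image type.

    `beta_group` is the image-type x ASD coefficient (the gap in session 3);
    `beta_group_by_session` is the image-type x session x ASD coefficient.
    """
    return beta_group + beta_group_by_session * (session - 3)


# Coefficients as reported in the text for the within-group model.
nonsocial_gap = [round(asd_minus_td(0.078, 0.041, s), 3) for s in (1, 2, 3)]
social_gap = [round(asd_minus_td(-0.074, 0.003, s), 3) for s in (1, 2, 3)]

print(nonsocial_gap)  # [-0.004, 0.037, 0.078]
print(social_gap)     # [-0.08, -0.077, -0.074]
```

Under these assumptions, the nonsocial gap is close to zero in the first session and grows in favor of the ASD group afterwards, whereas the social gap stays at roughly the same negative value across all three sessions.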
In sum, we found that, while looking at social stimuli, the scanpaths of different ASD participants were less similar to each other than those of control participants, but that this tendency was reversed for nonsocial images that have already been seen before.
Discussion
Controlling for intelligence, we calculated scanpath similarity between repeated presentations of social and nonsocial images in participants with ASD and the TD control group. First, we measured within-individual similarity, to assess scanpath similarity between repeated presentations of the same stimulus in each participant. We found that participants with ASD, in comparison with the TD group, displayed lower scanpath similarity across different viewings of the same social image, but higher similarity for nonsocial images. Second, we measured within-group similarity, to assess how similar different members of each group (ASD vs TD) are to other members of the same group in terms of their eye-movement patterns. We found that scanpaths of participants with ASD (compared with the TD group) were more similar to one another when they were looking at nonsocial images, whereas TD participants were more similar to one another when they were viewing social images.
Within-individual similarity
In the first analysis, for each participant, we calculated scanpath similarity between the three presentations of the same image, separately for social and nonsocial images. We found that in response to nonsocial images, scanpath similarity was significantly higher for participants with ASD than for TD participants, while for social images the opposite was true, that is, scanpath similarity was significantly higher for TD participants.
We argue that scanpath similarity in such a case can be interpreted as a reflection of attentional bias constraining eye-movement patterns. In the absence of any biases, the eyes would move freely, exploring each region of the image with equal probability, which, over repeated presentations of the image, should lead to completely dissimilar scanpaths. This, of course, is never the case, as observers have many biases constraining their eye-movement patterns, such as a preference for salient or meaningful features of the image. However, an increased level of scanpath similarity for a certain class of stimuli may be interpreted as a signature of a perceptual or attentional bias.
While decreased social interests in the ASD population are well documented (Dawson et al., 2004; Guillon et al., 2014; Schultz, 2005), there is less evidence regarding stronger preferences for nonsocial stimuli. In other words, the question is whether ASD is a case of decreased social interests only, or of both decreased social interests and increased nonsocial interests. In our study, we found both higher within-individual similarity for nonsocial images and lower within-individual similarity for social images in the ASD group, when compared with the TD group. This suggests the presence of both a preference for certain aspects of nonsocial stimuli and a decreased preference for some aspects of social stimuli in the ASD group. An example of such bias in our study can be observed in Figure 3 (note, however, that this is for illustration purposes only: we did not actually compare the number of fixations to social and nonsocial AOIs within a single image). This is consistent with research reports by Sasson et al. (2011), Sasson et al. (2008), Sasson and Touchstone (2014), Elison, Sasson, Turner-Brown, Dichter, and Bodfish (2012) and Manyakov et al. (2018), who found that decreased attention to social stimuli was accompanied by an increase in fixations to nonsocial objects related to circumscribed interests that frequently occur in ASD. However, in this study, we used photographs of natural scenes, similar to Wang et al. (2015), which suggests that the nonsocial bias is not limited to objects of special interest to the autistic population.

Differences in the likelihood of looking at each AOI by the TD versus ASD groups (warmer colors indicate a larger likelihood of the AOI being looked at by TD participants relative to the ASD participants). For example, the AOI marked in red (girl’s face) means that TD participants were at least 15% more likely to look at it than ASD participants, while the AOI encircled in blue (yellow picture stand) means that participants with ASD were at least 8% more likely to look at it than the TD group.
Within-group similarity
In the second analysis, we compared within-group scanpath similarity for the two groups in the study, individuals with ASD and the TD control group, to measure commonalities in looking behavior stemming from group membership, that is, from traits that members of the group have in common. Higher within-group similarity means that the eye-movement patterns are more constrained by group membership, or to put it simply, that the person looks in a way similar to other members of their group. While within-individual similarity indicated whether the individual tends to look at certain stimuli in a constrained, biased way, measuring within-group similarity can tell us whether this bias or constrained way of looking is common to the whole group, or whether members of the group look at the stimuli in a constrained but idiosyncratic way.
First, in the control group (which was the reference category in the model) we observed a significant decline in within-group scanpath similarity from the earlier to the last presentation, both for social and nonsocial images (as evidenced by the social × session and nonsocial × session interactions). In other words, after repeated presentations of the same stimulus, eye-movements become less constrained and more idiosyncratic. Thus, when we view an image for the first time, the initial scanpath is probably determined by the most semantically salient features such as prominent objects, with the aim of general scene interpretation. With repeated presentations, that is, after the scene has been interpreted, scanpaths may reflect other processes, such as exploration, which is more idiosyncratic and determined by top-down factors, such as preferences and goals.
We found no significant difference between the last presentations of social and nonsocial images in terms of the within-group scanpath similarity in the control group. In other words, control participants did not differ significantly in terms of their eye-movement similarity to other control participants, whether they looked at social or nonsocial images. Thus, measurement of within-group similarity did not provide evidence for an attentional bias stemming from preferential treatment of social stimuli in the TD group. However, there was a significant difference between the TD and ASD groups in terms of their within-group similarity. We found that in the last presentation (which was the reference category in the model), the scanpaths of the ASD group were significantly more similar to one another than scanpaths in the TD group for nonsocial images, while for social images, this pattern was reversed and their scanpaths were significantly less similar to one another than scanpaths in the TD group. In other words, participants with ASD were more similar to one another in terms of their looking patterns when they were looking at nonsocial stimuli and less similar when they were looking at social stimuli (as evidenced by the significant social × ASD and nonsocial × ASD interactions). This means that the scanpaths of the ASD group were less constrained than those of the TD group in response to social images and more constrained in response to nonsocial images. Thus, participants with autism looked at social stimuli in a way that was more exploratory and less restricted by the presence of socially salient elements of the image. In contrast, their way of looking at nonsocial stimuli was less exploratory. We can shed some additional light on this result by looking at the significant three-way interaction nonsocial × session × ASD.
The direction of the interaction means that the difference between the groups for nonsocial images was significantly smaller in the earlier presentations than in the last presentation. As discussed above, in the control group, we observed a decline in within-group scanpath similarity between the earlier and last presentations. Thus, the significance of the three-way interaction for nonsocial images means that the decline in within-group scanpath similarity observed in the control group was weaker in the ASD group. In other words, while scanpaths of the control group became less constrained and more idiosyncratic with repeated presentations, this effect was weaker in the ASD group, but only for nonsocial images. This means that scanpaths of participants with autism remained more constrained with repeated presentations of nonsocial images than scanpaths in the control group. Thus, prolonged within-group scanpath similarity may be a reflection of a decreased tendency for exploration and more perseverative attention in individuals with autism. This result is consistent with the results reported by Sasson et al. (2008), who found more perseverative and less exploratory eye-movement patterns in the ASD population.
Limitations and suggestions for future research
Scanpath similarity is a nonspecific measure of the degree of recurring patterns in two sets of scanpaths. It acts like a pattern finder, by identifying regularities in the data, but on its own it does not pinpoint the sources of those similarities. Such similarities could be spatial or temporal in nature, given that scanpaths contain both of these types of information. In other words, similarity between scanpaths could be a consequence of commonalities in terms of what the observer looks at, but also of the order of looking at certain elements of the image. Moreover, those regularities could be related to various levels of processing: low-level factors such as visual salience or image center bias, higher-level factors such as semantically meaningful objects, or higher-order cognitive processing (e.g. reading facial expressions and finding cues for problem-solving). Again, the scanpath similarity measure on its own cannot tell us where the commonalities between the scanpaths originate. This bears significance for the interpretation of the results of this study. First, it means that it is difficult to pinpoint the exact source of scanpath similarity, further than stating that TD participants, compared with individuals with ASD, had more regularities in their looking patterns when viewing social images, while in the case of participants with ASD, there were more regularities in their scanpaths when they viewed nonsocial images. For example, it is impossible to say whether those regularities were a consequence of recurring fixations to specific types of objects or of the order in which certain elements of the image were fixated.
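The distinction between spatial and temporal sources of similarity can be illustrated with a toy example. The sketch below is purely illustrative and makes several assumptions: scanpaths are encoded as hypothetical strings of region labels, one letter per fixation, and neither function is ScanMatch. `region_overlap` ignores order, while `order_sensitive_similarity` uses Python's standard `difflib.SequenceMatcher` as a generic order-dependent stand-in.

```python
from difflib import SequenceMatcher


def region_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the sets of regions visited, ignoring visiting order."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


def order_sensitive_similarity(a: str, b: str) -> float:
    """Generic sequence similarity that depends on the order of visited regions."""
    return SequenceMatcher(None, a, b).ratio()


first_viewing = "ABCD"   # hypothetical region labels, one letter per fixation
second_viewing = "DCBA"  # the same regions, visited in reverse order

spatial = region_overlap(first_viewing, second_viewing)
temporal = order_sensitive_similarity(first_viewing, second_viewing)
print(spatial, temporal)
```

Here the two viewings cover exactly the same regions, so the order-free overlap is maximal, while the order-sensitive score is well below its maximum. A single similarity score that blends both kinds of information cannot, on its own, tell these two sources apart.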
Moreover, while interpreting the results in terms of an attentional bias seems to be parsimonious and well grounded theoretically, it is possible that a different cognitive mechanism may be responsible for the differences in scanpath similarity between the groups, such as a perceptual bias or a difference in information processing strategy or speed.
Second, it raises questions regarding stimulus control in the meaningful interpretation of scanpath similarity results. We controlled for stimulus complexity by ensuring that there were no statistically significant differences between the stimuli in terms of the average number of fixations elicited by each image. However, given the low number of stimuli we used (due to the specificity and clinical nature of our sample), it is possible that the social and nonsocial stimulus groups differed in terms of some characteristic to which people with ASD and TD individuals respond differently, giving rise to the statistically significant differences between the groups. Traditionally, the remedy for this problem is very tight stimulus control (not always easy with images of natural scenes), measurement of potentially confounding stimulus characteristics, and increasing the stimulus set to decrease the chances of nonsystematic significant differences between the stimulus groups. However, in this study we have controlled for this problem statistically, using mixed-effects modeling and treating both participants and images as fully crossed random factors, in line with Judd, Westfall, and Kenny (2012). This procedure takes into account the variance of the ASD group versus the TD group across different stimuli of a given type. The fact that we obtained a statistically significant effect means that this variance is sufficiently low to conclude that this effect is stable across different images and, therefore, that it is likely to generalize to other images as well.
Conclusion
We found significant differences in eye-movement behavior between participants diagnosed with ASD and TD controls. Participants with ASD had lower scanpath similarity in response to social images and higher scanpath similarity in response to nonsocial images. Moreover, scanpaths of participants with ASD, compared with those of the control group, were more similar to one another when they were viewing nonsocial images and less similar to one another when they were viewing social images. These results suggest the presence of an attentional bias for nonsocial stimuli and a weakened preference for social stimuli in individuals with autism spectrum diagnoses. In addition, we found that in response to repetition of nonsocial stimuli, the scanpath similarity of participants with autism did not decline to the same extent as in the control group, which suggests more perseverative attention in the ASD group.
Taken together, these results show that autism is characterized not only by weakened preference for the social world, but also heightened preference for the nonsocial world.
Supplemental Material
AUT865809_Lay_Abstract – Supplemental material for Scanpath similarity measure reveals not only a decreased social preference, but also an increased nonsocial preference in individuals with autism
Supplemental material, AUT865809_Lay_Abstract for Scanpath similarity measure reveals not only a decreased social preference, but also an increased nonsocial preference in individuals with autism by Magdalena Ewa Król and Michał Król in Autism
Supplemental Material
AUT865809_Supplemental_material – Supplemental material for Scanpath similarity measure reveals not only a decreased social preference, but also an increased nonsocial preference in individuals with autism
Supplemental material, AUT865809_Supplemental_material for Scanpath similarity measure reveals not only a decreased social preference, but also an increased nonsocial preference in individuals with autism by Magdalena Ewa Król and Michał Król in Autism
Footnotes
Acknowledgements
The authors thank Kinga Ferenc and Barbara Płońska for their help with conducting the study. They also thank the wonderful participants and their families for coming to the laboratory and taking part in the study.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Centre in Poland (Narodowe Centrum Nauki) under Grant 2017/27/B/HS6/00169.
Supplemental material
Supplemental material for this article is available online.
References
