Abstract
This meta-analysis tested whether autistic people show a marked, isolated difficulty with mentalising when assessed using the Frith-Happé Animations, an advanced test of mentalising (or ‘theory of mind’). Effect sizes were aggregated in multivariate meta-analysis from 33 papers reporting data for over 3000 autistic and non-autistic people. Relative to non-autistic individuals, autistic people underperformed, with a small effect size on the non-mentalising control conditions and a medium effect size on the mentalising condition. This indicates that studies have reliably found mentalising to be an area of challenge for autistic people, although the group differences were not large. It remains to be seen how important mentalising difficulties are in accounting for the social difficulties diagnostic of autism. As autistic people underperformed on the control conditions as well as the mentalising condition, it is likely that group differences on the test are partly due to domain-general information processing differences. Finally, there was evidence of publication bias, suggesting that true effects on the Frith-Happé Animations may be somewhat smaller than reported in the literature.
Lay abstract
Autistic people are thought to have difficulty with mentalising (our drive to track and understand the minds of other people). Mentalising is often measured by the Frith-Happé Animations task, where individuals need to interpret the interactions of abstract shapes. This review article collated results from over 3000 people to assess how autistic people performed on the task. Analysis showed that autistic people tended to underperform compared to non-autistic people on the task, although the scale of the difference was moderate rather than large. Also, autistic people showed some difficulty with the non-mentalising as well as mentalising aspects of the task. These results raise questions about the scale and specificity of mentalising difficulties in autism. It also remains unclear how well mentalising difficulties account for the social challenges diagnostic of autism.
Autism has often been associated with difficulties in mentalising, that is, our drive to track and understand the minds of other people (Frith, 2001). Several tests have been devised to measure mentalising skills; the most fundamental is the false-belief task, which requires individuals to track a character’s false belief to predict their behaviour. While such tasks can capture mentalising difficulties in younger children, they are not sensitive to the real-world social difficulties of older autistic children and adults (Frith, 1994). This has led to the development of ‘advanced’ mentalising tests, including those which explicitly ask test-takers to interpret mental states, such as the Strange Stories (Happé, 1994) and the Reading the Mind in the Eyes Test (Baron-Cohen et al., 2001). Other advanced mentalising tests are more implicit: they measure a person’s tendency to attribute mental states without explicitly requiring them to interpret behaviour. The most commonly used of these tests is the Frith-Happé Animations (Abell et al., 2000; Castelli et al., 2002). This test has two advantages over other tests. First, it includes well-matched control conditions that help rule out non-mentalising accounts of performance. Second, it may be a relatively pure measure of mentalising: unlike the Reading the Mind in the Eyes Test, it does not place heavy demands on emotion perception, a rather different skill from mentalising (Oakley et al., 2016), and its nonverbal materials may reduce the high verbal demands of tests such as the Strange Stories (performance on which shows a moderate relationship with verbal ability; e.g. Devine & Hughes, 2016).
In the Frith-Happé Animations, several cartoon clips of moving triangles are presented, and the test-taker is prompted to consider what is happening in each clip, responding either through verbal descriptions or in a multiple-choice format (White et al., 2011). There are three conditions: one targets mentalising, while the other two control for more general skills in perceiving and interpreting movement and action. In these control conditions, the triangles either move randomly (e.g. swirling around the screen) or in a simple goal-directed fashion (e.g. chasing each other across the screen). In the critical mentalising condition, the triangles interact as if they are trying to influence each other’s mental states (e.g. by persuasion or deception). Given the theory that autism involves a core impairment in mentalising (Frith, 2001), autistic people might be expected to underperform specifically in the mentalising condition. This view was supported by the original studies using the animations with adults diagnosed with Asperger’s (Castelli et al., 2002) and less cognitively able autistic children (Abell et al., 2000). Both groups were less likely than their non-autistic comparison groups to provide ‘appropriate’ descriptions, specifically of the mentalising clips. Example descriptions from the Castelli et al. (2002) study include: ‘The big triangle was trying to make the little one go out, but he doesn’t want to’ (appropriate description of the ‘coaxing’ mentalising clip) and ‘The two triangles are obviously angry with each other – they are fighting’ (inappropriate description of the same clip). In later research, an objective multiple-choice version of the task was found to be ‘as sensitive as the traditional subjective method in demonstrating the well-established mentalizing impairment in autism’ (White et al., 2011, p. 152), and this version has since been developed as a ‘fast and straight forward measure of ToM [theory of mind] in autistic and neurotypical adults, to be used in future research and clinical settings’ (Livingston et al., 2019). However, recent work sounds a cautionary note about the sensitivity of the Frith-Happé Animations: a large sample of autistic adults scored lower than non-autistic adults, but only slightly, on both the mentalising and the control animations (Wilson & Bishop, 2020). This is more consistent with a subtle domain-general difficulty in inferring meaning than with a marked, but isolated, difficulty with mentalising.
Given these inconsistencies, this study presents a meta-analysis to evaluate the extent to which other studies support, or undermine, the hypothesis that the Frith-Happé Animations test reveals a specific mentalising difficulty in autistic people.
Methods
A meta-analysis was carried out by screening the papers listed in Web of Science as citing the original studies that developed the Frith-Happé Animations (Abell et al., 2000; Castelli et al., 2002), as well as the papers citing a more recent multiple-choice version of the test (White et al., 2011). After removing duplicates, 1781 titles/abstracts were screened, and of these, 121 papers were read to determine eligibility. Papers were excluded because they were theoretical/review papers (n = 15), did not present the Frith-Happé Animations (n = 53), did not include autistic and control groups (n = 9), only presented fMRI data (n = 3), only presented protocols (n = 2), had overlapping samples (n = 4), did not supply relevant data (n = 1) or predominantly included less verbally able individuals (n = 1). In total, 33 papers were included in the meta-analysis.
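The eligibility counts above can be checked with a few lines of arithmetic. This is a minimal sketch; the dictionary keys are shorthand labels introduced here for illustration, not category names used in the review.

```python
# Full-text papers assessed for eligibility (from the screening described above)
full_text_assessed = 121

# Exclusion reasons and counts; keys are illustrative shorthand labels
exclusions = {
    "theoretical_or_review": 15,
    "no_frith_happe_animations": 53,
    "no_autistic_and_control_groups": 9,
    "fmri_data_only": 3,
    "protocol_only": 2,
    "overlapping_samples": 4,
    "no_relevant_data": 1,
    "mainly_less_verbally_able": 1,
}

excluded = sum(exclusions.values())
included = full_text_assessed - excluded
print(f"excluded={excluded}, included={included}")  # excluded=88, included=33
```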
Studies were included if they presented the Frith-Happé Animations to a group of autistic children or adults and to a control group. Participants needed either (1) to produce verbal descriptions of the animations that were rated according to the method of Abell et al. (2000) or similar criteria set out by Castelli et al. (2002) or (2) to complete the multiple-choice questions (MCQs) devised by White et al. (2011). Under the criteria of Castelli et al. (2002), verbal descriptions are rated for appropriateness and use of mental state language (although the latter is only meaningful for the mentalising animations). The MCQ version includes two question types: classification (of each animation as showing random, physical or mental interaction) and identification of feelings (represented in the mentalising animations). As this review compared performance across animation types, the relevant performance indices were appropriateness ratings and accuracy on the classification MCQs. Studies were eligible if the task was presented in full or in part (e.g. without the control animations). Only one study has used the Frith-Happé Animations with less cognitively able individuals (Abell et al., 2000; mean full-scale intelligence quotient (FSIQ) of 74). To reduce heterogeneity, this study was excluded so that the review focused on samples of individuals with average-range general ability.
The following data were extracted from all papers: sample characteristics (sample size, age, gender, verbal ability and use of autism assessments), task characteristics (number and type of animations presented, scoring criteria used and the presence of an inter-rater reliability check), and means and SDs for performance on each animation type (in terms of appropriateness ratings and/or the classification MCQs, depending on the procedure used in each paper). Where means and SDs were not available in a paper, effect sizes were calculated from test statistics or the authors were emailed. Authors were also emailed to confirm that samples were largely independent where it was uncertain whether samples overlapped, for example, if the same authors published more than one paper in quick succession.
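The review does not state which conversion formulas were used when only test statistics were available. As a hedged sketch, the standard textbook conversion from an independent-samples t statistic (or a two-group F statistic with one numerator degree of freedom, where t = √F) to Hedges’ g is shown below. The function names are hypothetical, and the sign of a converted F must be taken from the direction of the difference reported in the paper.

```python
import math

def hedges_correction(n_aut: int, n_con: int) -> float:
    """Approximate small-sample correction factor J for df = n_aut + n_con - 2."""
    df = n_aut + n_con - 2
    return 1 - 3 / (4 * df - 1)

def g_from_t(t: float, n_aut: int, n_con: int) -> float:
    """Hedges' g from an independent-samples t statistic.

    t must follow the convention autistic minus non-autistic, so that
    lower autistic scores give a negative g.
    """
    d = t * math.sqrt(1 / n_aut + 1 / n_con)
    return hedges_correction(n_aut, n_con) * d

def g_from_f(f: float, n_aut: int, n_con: int, autistic_lower: bool) -> float:
    """Hedges' g from a two-group F(1, df); F carries no sign, so direction is supplied."""
    t = math.sqrt(f)
    return g_from_t(-t if autistic_lower else t, n_aut, n_con)

# Hypothetical study: t(38) = -2.0 with 20 participants per group
print(round(g_from_t(-2.0, 20, 20), 3))  # -0.62
```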
A meta-analysis was carried out in R (R Core Team, 2020) using the metafor package (Viechtbauer, 2010). Data and code are available on the Open Science Framework (https://osf.io/qa8p2/). For each animation type reported in each paper, the standardised mean difference between autistic and non-autistic groups (Hedges’ g) was computed; these are all shown in Figure 1. All individual effect sizes were aggregated in a multivariate meta-analysis to test the hypothesis that autistic people have a specific difficulty with processing mentalising animations compared to goal-directed and random animations. Animation type (goal-directed, random, mentalising) was included as a fixed effect in the meta-analysis, and the hypothesis was tested through the significance of the three levels of this fixed effect. The goal-directed condition was taken as the reference level of the fixed effect, as it offers a high-level control for the mentalising condition. Since most samples contributed more than one effect size to the meta-analysis (as most people were presented with more than one animation type), study was included as a random effect to model the dependencies between effect sizes calculated from the same groups of people. For the same reason, the sampling errors of effect sizes from the same paper are likely to be correlated. As papers typically have no reason to report these correlations, cluster-robust estimation was used to allow for unknown covariances, as suggested by Hedges et al. (2010).
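To make the effect-size metric concrete, here is a minimal sketch of Hedges’ g computed from group means and SDs, assuming the usual pooled-SD standardiser and the common approximation to the small-sample correction. The sign convention (autistic minus non-autistic, so lower autistic scores give a negative g) matches the negative values reported in the Results. This is an illustration only; the review’s own analysis code is the R/metafor code on the OSF page, and the numbers below are hypothetical.

```python
import math

def hedges_g(mean_aut: float, sd_aut: float, n_aut: int,
             mean_con: float, sd_con: float, n_con: int) -> float:
    """Hedges' g for autistic minus non-autistic scores (negative = autistic group lower)."""
    df = n_aut + n_con - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n_aut - 1) * sd_aut ** 2 + (n_con - 1) * sd_con ** 2) / df)
    d = (mean_aut - mean_con) / sd_pooled
    # Approximate small-sample correction factor J
    j = 1 - 3 / (4 * df - 1)
    return j * d

# Hypothetical appropriateness ratings on the mentalising clips
g = hedges_g(mean_aut=10.0, sd_aut=2.0, n_aut=20, mean_con=12.0, sd_con=2.0, n_con=20)
print(round(g, 3))  # -0.98
```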

Figure 1. Forest plots for group differences between autistic and non-autistic people on the Frith-Happé Animations by condition.
This meta-analysis depends on the close matching of autistic and non-autistic groups across studies. While most studies had attempted to match participants for verbal ability, age and gender, there were some small discrepancies, as can be seen in Table 1, so the robustness of the results to these participant variables was assessed. For all studies, it was possible to determine the standardised mean difference in verbal IQ (or an equivalent measure) between the autistic and non-autistic groups, so this control variable was added to the initial meta-analysis as a fixed effect alongside animation type to control for any group differences in verbal ability. Differences in age and in the proportion of males across groups could not always be calculated from the studies. Therefore, these factors were assessed in subsequent models that re-ran the meta-analysis on the subsets of studies reporting this information. For age, the standardised mean difference between groups was computed; for gender, the proportion of males in the autistic group was divided by the proportion of males in the non-autistic group.
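The study-level matching covariates described above can be sketched as follows. This assumes an uncorrected pooled-SD standardised mean difference for verbal ability and age (the review does not say whether a small-sample correction was applied) and implements the gender covariate literally as a ratio of proportions. Function names and example values are hypothetical.

```python
import math

def standardised_difference(mean_aut, sd_aut, n_aut, mean_con, sd_con, n_con):
    """Pooled-SD standardised mean difference, autistic minus non-autistic (no small-sample correction)."""
    df = n_aut + n_con - 2
    sd_pooled = math.sqrt(((n_aut - 1) * sd_aut ** 2 + (n_con - 1) * sd_con ** 2) / df)
    return (mean_aut - mean_con) / sd_pooled

def male_proportion_ratio(males_aut, n_aut, males_con, n_con):
    """Proportion of males in the autistic group divided by that in the non-autistic group."""
    return (males_aut / n_aut) / (males_con / n_con)

# Hypothetical study: verbal IQ 100 vs 105 (SD 15 in both groups), 18/20 vs 15/20 males
verbal_smd = standardised_difference(100, 15, 20, 105, 15, 20)
gender_ratio = male_proportion_ratio(18, 20, 15, 20)
print(round(verbal_smd, 3), round(gender_ratio, 2))  # -0.333 1.2
```

A ratio of 1 indicates that the two groups had the same proportion of males; values above 1 indicate relatively more males in the autistic group.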
Table 1. Sample characteristics.
ADOS: Autism Diagnostic Observation Schedule; DISCO: Diagnostic Interview for Social and Communication Disorders; ADI-R: Autism Diagnostic Interview-Revised; DAWBA: Development and Wellbeing Assessment; 3Di: Developmental, Diagnostic and Dimensional Interview.
For each study, the first row relates to the autistic group and the second row to the non-autistic group. All studies required autistic participants to have a clinical diagnosis of autism based on Diagnostic and Statistical Manual of Mental Disorders (DSM)/International Classification of Diseases (ICD) criteria. Some studies gave participants or their families a diagnostic interview to confirm the diagnosis; this information is shown in the final column. For age, means (SDs) are given in years; months. For verbal ability, standard scores on norm-referenced tests of verbal intelligence are given.
Verbal ability for this paper was measured using raw scores from the Spot the Word Task.
Authors kindly supplied data for the Frith-Happé Animations relating to a larger sample size than reported in the paper (exclusions in these papers were made based on fMRI criteria). Descriptive statistics given here for age, sex and verbal ability relate to the slightly smaller samples reported in the paper.
These studies did not provide descriptive statistics for verbal ability, so full-scale IQ scores have been reported here.
For this paper, verbal ability was measured using a non-standardised vocabulary test.
After running the meta-analysis, Cook’s distance was used to identify samples exerting undue influence on the results. Then, three further fixed effects were included to investigate whether these possible moderators accounted for heterogeneity in individual effect sizes: age-group (adult or child sample), format (verbal description or MCQs) and the inverse of the sample size. The inverse of the sample size was included as a moderator to assess publication bias: if smaller studies were published only when they found a large effect, smaller studies would tend to show larger effects.
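The logic of the publication-bias moderator can be illustrated with a simplified, single-predictor version: a weighted least-squares regression of effect sizes on 1/N, weighting each effect by its inverse sampling variance. The intercept is then the predicted effect for a hypothetical study of unlimited size. This is a sketch under strong simplifying assumptions (independent effect sizes, fixed weights, no random effects and no other moderators); the review’s actual model was a multilevel metafor model that also included animation type, age-group and format. The function name and data are hypothetical.

```python
def wls_on_inverse_n(effects, variances, sample_sizes):
    """Weighted least squares of effect size on 1/N; returns (intercept, slope).

    Simplified illustration: assumes independent effects and weights of 1/variance.
    """
    x = [1 / n for n in sample_sizes]
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, effects))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # predicted effect as 1/N approaches 0
    return intercept, slope

# Hypothetical effects that become more negative as samples get smaller
sizes = [20, 40, 80, 160]
effects = [-0.3 - 10 / n for n in sizes]
variances = [0.2, 0.1, 0.05, 0.025]
intercept, slope = wls_on_inverse_n(effects, variances, sizes)
print(round(intercept, 2), round(slope, 1))  # -0.3 -10.0
```

A negative slope here means smaller studies report larger (more negative) group differences, which is the pattern expected under publication bias; the intercept is correspondingly smaller in magnitude than the raw average effect.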
Results
Data from 1530 autistic and 1569 non-autistic people, drawn from 33 papers, were included in the meta-analysis. There were 2138 adults (1067 autistic and 1071 non-autistic) and 961 children (463 autistic and 498 non-autistic). Due to incomplete reporting, it was not possible to determine the exact gender distribution, but among adults approximately 68% of the autistic and 59% of the non-autistic individuals were male, and among children the equivalent percentages were approximately 83% and 74%. Tables 1 and 2 present sample and task characteristics for each paper, and Figure 1 shows forest plots of the effect sizes for each animation type from each paper.
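The adult counts follow from the reported totals by subtraction, which is how the autistic and non-autistic adult numbers can be recovered and cross-checked:

```python
# Totals reported above
autistic_total, non_autistic_total = 1530, 1569
autistic_children, non_autistic_children = 463, 498

# Adults are the remainder in each group
autistic_adults = autistic_total - autistic_children
non_autistic_adults = non_autistic_total - non_autistic_children
print(autistic_adults, non_autistic_adults)  # 1067 1071
print(autistic_adults + non_autistic_adults, autistic_total + non_autistic_total)  # 2138 3099
```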
Table 2. Study characteristics.
This table shows the type of session used for data collection; the number of animations used; the rating criteria (Abell et al., 2000; Castelli et al., 2002) used for verbal descriptions (with or without inter-rater checks); and use of the multiple-choice questions (MCQs) of White et al. (2011).
Whereas effect sizes in the meta-analysis were generally calculated on the basis of means and SDs reported in papers, these studies did not present SDs but did give effect sizes. Therefore, these effect sizes were directly aggregated in the meta-analysis, with sampling variances calculated on the basis of the effect size and sample size.
Although behavioural and fMRI sessions were used in this study, data on the verbal descriptions have yet to be fully published, so only effect sizes based on data collected on the MCQs in the fMRI session have been included in the meta-analysis. Note, however, that the authors reported no group difference on total scores on the verbal descriptions, so results are likely to be similar to the null results collected in the fMRI session.
Although the MCQs were administered in both these studies alongside the verbal description paradigm, the papers did not report scores broken down by animation type, so only data on verbal descriptions are analysed in this review.
These studies did not present means or SDs, so effect sizes were calculated on the basis of test statistics.
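For the studies noted above that reported effect sizes but no SDs, the sampling variance had to be derived from the effect size and the group sizes. The review does not spell out the exact expression used, so the sketch below assumes a commonly used large-sample approximation for the variance of a standardised mean difference; the function name and example values are hypothetical.

```python
def sampling_variance_g(g: float, n_aut: int, n_con: int) -> float:
    """Approximate sampling variance of a standardised mean difference from g and the group sizes."""
    return 1 / n_aut + 1 / n_con + g ** 2 / (2 * (n_aut + n_con))

# Hypothetical study reporting g = -0.5 with 20 participants per group
v = sampling_variance_g(-0.5, 20, 20)
print(round(v, 6))  # 0.103125
```

The variance grows with the magnitude of g and shrinks as either group gets larger, so larger studies receive more weight in the pooled estimate.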
Animation type (goal-directed movement, random movement, mentalising) was investigated as a fixed effect predicting autistic participants’ interpretation of the clips. Each level of the fixed effect was significant, with autistic people giving less normative interpretations than non-autistic people. Controlling for group differences in verbal ability, absolute effect sizes [95% confidence intervals (CIs)] were small for the animations with random movement, g = −0.35 [−0.51, −0.19], and goal-directed movement, g = −0.35 [−0.48, −0.22], and medium for the mentalising animations, g = −0.62 [−0.74, −0.50], all p < 0.001. To test the hypothesis that autistic people have specific difficulty with the mentalising animations, relative differences in effect size were assessed. Compared to the reference level (goal-directed movement), there was no significant difference for the random condition, p = 0.997, and a small additional group difference for the mentalising condition, g = −0.27 [−0.41, −0.14], p < 0.001. Table 3 shows these effect sizes separately for the child and adult samples. As noted in the Methods, imperfect matching for age and gender across studies might have influenced the results, so the meta-analysis was re-run across the subsets of studies that had reported this information. Study-level differences in age, p = 0.633, and gender, p = 0.799, did not significantly predict effect sizes. Overall, the analyses indicate that autistic people had some difficulty with all types of animation, but somewhat more with the mentalising ones, when controlling for group differences in verbal ability, gender and age.
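To show how the reference-level coding links the absolute and relative estimates, the sketch below builds dummy-coded rows with goal-directed movement as the reference level and reconstructs the absolute effects from the rounded estimates reported above (intercept −0.35, mentalising contrast −0.27). The random contrast is set to 0.00, as implied by the identical rounded absolute estimates for random and goal-directed movement. The verbal-ability covariate is omitted, so predictions correspond to a study with no group difference in verbal ability. This illustrates the parameterisation only; it is not a refit of the model.

```python
# Animation types, with goal-directed movement as the reference level
REFERENCE = "goal-directed"
CONTRASTS = ("random", "mentalising")

def design_row(animation_type: str) -> list[float]:
    """Intercept plus one dummy variable per non-reference animation type."""
    if animation_type != REFERENCE and animation_type not in CONTRASTS:
        raise ValueError(f"unknown animation type: {animation_type}")
    return [1.0] + [1.0 if animation_type == level else 0.0 for level in CONTRASTS]

# Rounded estimates from the text: intercept (goal-directed), random contrast, mentalising contrast
coefficients = [-0.35, 0.00, -0.27]

for animation in (REFERENCE, *CONTRASTS):
    predicted = sum(b * x for b, x in zip(coefficients, design_row(animation)))
    print(animation, round(predicted, 2))
# goal-directed -0.35
# random -0.35
# mentalising -0.62
```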
Table 3. Effect sizes across the child and adult samples.
The outlying study (Clemmensen et al., 2016) involved children and has been removed from analysis. The random movement condition was only presented in studies involving adults.
However, there was significant heterogeneity in effect sizes, Q(64) = 205.17, p < 0.001, so sources of heterogeneity were explored, first through influential case diagnostics and then through moderator analysis. One study (Clemmensen et al., 2016) had a high Cook’s distance. Excluding this study made little difference to the effect sizes: random, g = −0.24 [−0.38, −0.11]; goal-directed, g = −0.34 [−0.48, −0.20]; and mentalising, g = −0.58 [−0.70, −0.46]. The difference in effect size between the goal-directed and mentalising animations remained significant, p = 0.002. With this study still excluded, three moderators were added to the model: age-group (child, adult), task format (verbal description, MCQ) and the inverse of the sample size. As the inverse of the sample size approaches zero, the model describes a hypothetical sample of unlimited size and precision, so the coefficients for the other fixed effects reflect predicted effects under this scenario. Table 4 shows the results of this moderator analysis. The inverse of the sample size was a significant predictor, and with it in the model the coefficients for the conditions of the Frith-Happé Animations were somewhat smaller, indicating publication bias. The moderator analysis also showed a trend-level association between use of the multiple-choice (rather than verbal) task format and slightly smaller group differences between autistic and non-autistic samples. Effect sizes did not differ between children and adults.
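For readers less familiar with the heterogeneity statistic, Cochran’s Q in its simplest form (independent effects pooled with inverse-variance weights, df = k − 1 for k effect sizes) is sketched below. The Q(64) reported above is the residual heterogeneity test from the multilevel model, whose degrees of freedom are the number of effect sizes minus the number of model coefficients, so this sketch illustrates the idea rather than reproducing that test. The function name and data are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q for independent effect sizes pooled with inverse-variance weights.

    Returns (Q, degrees of freedom, pooled estimate).
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1, pooled

# Hypothetical effect sizes with equal sampling variances
q, df, pooled = cochran_q([-0.2, -0.6, -1.0], [0.04, 0.04, 0.04])
print(round(q, 2), df, round(pooled, 2))  # 8.0 2 -0.6
```

Under homogeneity, Q is expected to be close to its degrees of freedom; a Q far above its df, as in the review, signals more variation between effect sizes than sampling error alone would produce.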
Table 4. Results of the moderator analysis.
The intercept reflects performance on the goal-directed movement condition of the Frith-Happé Animations. Hedges’ g values for the other conditions represent relative differences compared to the intercept rather than the absolute effect size.
Discussion
Across the studies collated in this review, autistic people experienced a gradient of difficulty on the Frith-Happé Animations, with a small effect size difference between autistic and non-autistic people on the control conditions and an additional small increase in effect size on the mentalising animations. Analysis indicated that similar effects were found across children and adults on the spectrum and that there was evidence of publication bias slightly inflating these effects. It has been claimed that ‘impairments in individuals with autism can be revealed in characteristic inaccuracies in mental state attribution to animated shapes’ (Castelli et al., 2002, p. 1845). On the one hand, this meta-analysis did find a reliable difference between autistic and non-autistic people in mentalising skills as measured by the task. On the other hand, there are questions about the scale and specificity of the difference.
In the first study using the Frith-Happé Animations with autistic adults (Castelli et al., 2002), there was a very substantial difference between autistic and non-autistic people on the mentalising animations (g = −5.75), but replication across many studies has not shown such a stark difference. After accounting for differences in verbal ability and performance on the control animations (including the higher-level ‘goal-directed’ control condition), there was a small but reliable difference between autistic and non-autistic groups. Other meta-analyses have shown much larger group differences on mentalising tasks (e.g. Chung et al., 2014; Velikonja et al., 2019), but, crucially, unlike this review, they did not control for cognitive processes other than mentalising, so we should be sceptical about the size and specificity of those group differences. There are two ways of interpreting the small effect established in this study. First, it may represent a modest difference between the average autistic and non-autistic person in mentalising, which, along with other cognitive factors, may have real-world impacts on social learning and behaviour. However, there is limited evidence that mentalising tests predict autistic features or day-to-day social skills (as reviewed by Gernsbacher & Yergeau, 2019), so the real-world significance of a modest difference in mentalising remains difficult to assess. Alternatively, the group-level effect size may disguise heterogeneity, with substantial mentalising difficulty experienced only by a subgroup of autistic people. In the present analysis, this view is speculative, but it has received empirical support from a large-scale heterogeneity analysis that found mentalising differences only in some autistic subgroups (Lombardo et al., 2016).
This is in line with the theory that there are multiple cognitive influences on autism that vary in their impact from one individual to another (e.g. Brunsdon & Happé, 2014).
A different interpretation of the results would revolve around the task: the Frith-Happé Animations might not be sensitive to individual differences in mentalising. First, it should be noted that restricted range is not a problem for the task in general-population or clinical groups, as no study included in this review reported a ceiling effect. Therefore, the question is not whether there are individual differences on the task, but what these represent. On the one hand, the Frith-Happé Animations and other mentalising tasks do not tend to correlate highly, if at all, suggesting that we cannot be confident about what precisely accounts for variance in performance on these tests. Gernsbacher and Yergeau (2019) collate evidence for the poor convergence between different mentalising tests and conclude that mentalising lacks construct validity. On the other hand, we could equally argue that mentalising is not a single ability but a set of multiple, specific skills, a view that is supported by neural accounts of the ‘social brain’ (Schaafsma et al., 2015). Indeed, it is within the social neuroscience literature that we find the strongest argument for the validity of the Frith-Happé Animations as a test of mentalising. Studies show that the mentalising condition reliably activates social-cognitive networks in the brain that partially overlap with activation patterns observed for other mentalising tasks (see Schurz et al., 2014 for a meta-analysis). In the largest neuroimaging study of the Frith-Happé Animations to date (n = 394; Moessnang et al., 2020), there was robust activation across the mentalising network, bilaterally in the posterior superior temporal sulcus (pSTS) and surrounding regions, as well as in the precuneus, inferior frontal gyrus (IFG), medial prefrontal cortex (mPFC) and temporal poles. Furthermore, this neuroimaging study found no mean differences between autistic and non-autistic people in task-related activation in the mentalising network (Moessnang et al., 2020).
This is consistent with the idea that the social difficulties diagnostic of autism might not be well-accounted for by mentalising difficulties.
As noted above, in the meta-analysis autistic people tended to underperform on the control animations. This raises the question of whether difficulties on the Frith-Happé Animations can be explained to some extent by more general difficulties with interpreting motion, at a perceptual or a higher cognitive level, which manifest across all animation types. Perceptually, autistic people have been found to perform differently on tasks involving detection and discrimination of global and biological motion without mentalising demands (e.g. Klin et al., 2009; Milne et al., 2002; Robertson et al., 2014). Interestingly, in their meta-analysis of global and biological motion tasks, Van der Hallen et al. (2019) found that autistic people underperformed to a degree very similar to that seen on the control animations in the present study, g = −0.30 [−0.44, −0.17], with no differences by task or motion type. These sensory-perceptual differences may affect performance on a more complex task involving moving stimuli, such as the Frith-Happé Animations. Alternatively, difficulties could emerge at a higher cognitive level, as the task involves assigning a ‘narrative’ to moving stimuli. Consistent with the central coherence hypothesis (Happé & Frith, 2006), autistic individuals may show less tendency to integrate the actions and interactions of animated shapes into a central narrative, and similar domain-general processing differences may apply across conditions. In line with this view, the supplemental material to this article shows continuity in performance across the control and mentalising conditions (r = 0.66, 95% CI [0.43, 0.89]) in almost 200 autistic and non-autistic adults, suggesting that the conditions share cognitive demands.
This meta-analysis presented evidence that autistic people show less difficulty in understanding social narratives in abstract animations than early reports indicated. This suggests that we should be cautious about suggesting that mentalising is necessarily an area of marked difficulty for autistic people, although we should also not underplay the subtle but reliable difference in mentalising that did emerge, which may impact on the behavioural phenotype in autism. Given that group differences between autistic and non-autistic people also emerged on the control conditions of the Frith-Happé Animations, it is possible that performance across the task is influenced, in addition to the mentalising demands, by domain-general abilities in perceiving and assigning meaning to motion.
Supplemental Material
Supplemental material, sj-pdf-1-aut-10.1177_1362361321989152 for Do animated triangles reveal a marked difficulty among autistic people with reading minds? by Alexander C Wilson in Autism
Footnotes
Acknowledgements
The author would like to thank Professor Dorothy Bishop for comments on this article.
Funding
The author received no financial support for the research, authorship and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
