Abstract
Children and adults with autism spectrum disorder are less accurate in facial emotion recognition, which is thought to contribute to impairment in social functioning. Although many interventions have been developed to improve facial emotion recognition, there is no consensus on how to best measure facial emotion recognition in people with autism spectrum disorder. This lack of agreement has led to wide variability in how facial emotion recognition is measured and, subsequently, inconsistent findings regarding the impact of interventions targeting facial emotion recognition impairment. The purpose of this review is to synthesize the extant research on measurement of facial emotion recognition in the context of treatment. We conducted an electronic database search for relevant, peer-reviewed articles published between January 1998 and November 2019 to identify studies evaluating change in facial emotion recognition in autism spectrum disorder. Sixty-five studies met inclusion criteria, utilizing a total of 36 different assessment measures for facial emotion recognition in individuals with autism spectrum disorder. Only six of the measures were used in multiple studies conducted by different investigative teams. The outcomes of the studies are reported and summarized with the goal of informing future research.
Lay Abstract
Children and adults with autism spectrum disorder show difficulty recognizing facial emotions in others, which makes social interaction challenging. While there are many treatments developed to improve facial emotion recognition, there is no agreement on the best way to measure such abilities in individuals with autism spectrum disorder. The purpose of this review is to examine studies that were published between January 1998 and November 2019 and have measured change in facial emotion recognition to evaluate the effectiveness of different treatments. Our search yielded 65 studies, and within these studies, 36 different measures were used to evaluate facial emotion recognition in individuals with autism spectrum disorder. Only six of these measures, however, were used in different studies and by different investigators. In this review, we summarize the different measures and outcomes of the studies, in order to identify promising assessment tools and inform future research.
Facial emotion recognition (FER), a component of social cognition (Adolphs, 2002), is fundamental for effective social communication and interaction (Ekman, 1992; Wang et al., 2004). Valid and clinically sensitive assessment of FER is critical to effectively address FER impairments. Typically, the ability to discriminate discrete emotions based on facial expression alone develops early in childhood. By 7 months of age, babies are able to discriminate dynamic happy and angry faces (Soken & Pick, 1992), and by 4 years of age, typically developing children can verbally label most basic, prototypical emotions with accuracy (Widen & Russell, 2003). Given the predictable onset, timing, and trajectory of this process in typically developing children, delay in FER may be meaningfully related to certain forms of atypical development and psychopathology. Consistent with this idea, FER impairments have been documented in several clinical conditions, including externalizing disorders (e.g. Aspan et al., 2013) and depression (e.g. Jenness et al., 2014). FER atypicalities are perhaps most widely documented in youth with autism spectrum disorder (ASD; e.g. Lozier et al., 2014).
The FER atypicalities documented in youth with ASD vary according to identifiable dimensions of emotional expression including valence, arousal, and the characteristics of actors displaying emotion. For example, relative to typically developing peers, young children with ASD experience more difficulty with recognition of certain expressions depending on the overall affective valence of the displayed emotion (e.g. Rump et al., 2009) and are more adept at recognizing emotions in familiar people relative to strangers (Shanok et al., 2019). Thus, measurement strategies aimed at quantifying FER should, at minimum, consider known dimensions of complexity including valence, intensity, and specificity. However, a meta-analysis of studies exploring FER by Lozier et al. (2014) highlights the variability of results in terms of emotion-specific FER deficits in ASD. While some studies suggest deficits exist across a variety of emotions (e.g. Rump et al., 2009), others indicate deficits primarily for negative emotions (Ashwin et al., 2006; Wingenbach et al., 2017), and still others point to specific emotions such as sadness (e.g. Boraston et al., 2007) or fear (e.g. Pelphrey et al., 2002). Most research indicates that from 12 years of age through adulthood, individuals with ASD do not show impairment in recognizing basic emotions (Capps et al., 1992; Grossman et al., 2000); they do, however, show difficulty when stimuli are more subtle or complex, and when presented briefly (Humphreys et al., 2007). Difficulty recognizing emotions, especially complex ones, seems to stem, in part, from altered functioning of the social brain (Black et al., 2017), such as attending to the mouth more than the eye region of a stimulus (Black et al., 2020). In everyday social interactions, emotional expressions tend to be subtle, complex, and brief; as such, any type of FER impairment, even if mild, is likely a direct contributor to social problems.
As summarized in a review of behavioral and neuroimaging studies of FER in ASD by Harms et al. (2010), the inconsistency in research findings on FER among intellectually able individuals with ASD is due, in part, to differences in how FER is measured. Inconsistency in measurement also has likely affected the extant intervention research. Published studies of FER in ASD have yielded inconsistent findings regarding responsiveness to interventions targeting FER among individuals with ASD. A recent review, for instance, summarized that emotion recognition training is promising, but there is little data on generalizability of intervention effects (Berggren et al., 2018). In a systematic review of the effectiveness of technology-based interventions in improving FER in individuals with ASD, Lee, Lam et al. (2018) highlighted the difficulty in drawing conclusions regarding effectiveness of the intervention programs when outcome measures vary considerably across studies. Given the implications of FER impairments, it is important to identify valid measures of FER that are sensitive to change. The primary aim of this systematic review is to identify and summarize commonly used FER assessment tools in outcome research in individuals with ASD, with the goal of informing future research. Because sensitivity of a measure is inextricably linked to the effect of the intervention that measure is being used to gauge, we make an assumption that the interventions studied are roughly equivalent in terms of potency, in order to make comparisons of sensitivity across the measures. However, we fully acknowledge that this is a circular argument, as degree of (non)equivalence is not known, in part due to use of different FER measures.
Methods
Inclusion criteria
To be included in the review, the articles had to be published in scholarly, peer-reviewed journals and written in English. The study had to measure change in FER ability in individuals with ASD, specifically assessing FER as an outcome variable. Only data-based papers were included; as such, reviews, commentaries, and conference papers were excluded. In addition, in order to determine which FER measures are most sensitive to change regardless of the intervention provided, studies that utilized the same measure (i.e. protocol or stimuli) to conduct the intervention as well as to assess FER outcome were excluded. No participant age restrictions were implemented.
Search methods
We conducted an electronic database search of PubMed and Web of Science to identify relevant, peer-reviewed articles, published between January 1998 and November 2019. Search keywords included [ASD, AUTISM, or ASPERGER] and [EMOTION RECOGNITION, FAC* RECOGNITION, FAC* AFFECT] and [FAC* EXPRESSION]. One of the authors screened all titles and abstracts to exclude clearly irrelevant articles and reviewed full papers of the remaining articles to determine which articles met all inclusion criteria, including use of an intervention. Another author reviewed full papers of all of the included articles and 10% of the excluded articles in order to confirm eligibility or exclusion. When the first reviewer was unsure of eligibility (n = 14 articles), the secondary reviewer assigned a rating as well. When discrepancy occurred, a third rater determined the final eligibility for inclusion. After the database search, references of all included articles were reviewed for articles that were not identified in the initial database search (cf., snowball search; Greenhalgh & Peacock, 2005).
Variable definitions and coding
All included articles were coded for measures used to assess change in FER (Table 1). For study-specific FER measures with no given name, we first provided a descriptive label. Second, we provided a brief description of the measure, including stimuli-specific factors such as color versus black & white, static versus dynamic, adult versus child, and whole face versus partial face. Third, we indicated which emotions were conveyed. Fourth, we cited the studies that utilized each FER measure. In the case of multiple articles by the same author or author group and with the same sample, only the article with the largest sample size was reported to avoid confounding the findings. Fifth, we summarized findings in terms of change in FER from pre-intervention to post-intervention. When available, effect size was included. When an effect size was not reported and the requisite data were provided, Cohen’s d was calculated with the following formula: d = (Mpost − Mpre)/SDpre. Given differences in study design and statistical power across the studies, we emphasized indices of clinical significance and effect size over statistical significance. We also included a brief description of treatment intensity (i.e. number of sessions or duration of program) for each intervention. Finally, we reported participant characteristics for each study, including sample size, age, and intelligence quotient (IQ) (when provided) for all participants that were administered the FER measure, irrespective of condition, diagnosis, or group assignment.
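As an illustration only, the pre-to-post effect size described above can be sketched in a few lines of Python. The function name and the example scores are hypothetical and are not taken from any reviewed study; the sketch assumes that the pre-intervention standard deviation is the standardizer, as stated in the formula.

```python
def cohens_d_pre_post(mean_pre: float, mean_post: float, sd_pre: float) -> float:
    """Pre-to-post Cohen's d standardized by the pre-intervention SD.

    Implements d = (M_post - M_pre) / SD_pre. Positive values indicate
    higher scores after the intervention.
    """
    if sd_pre <= 0:
        raise ValueError("sd_pre must be positive")
    return (mean_post - mean_pre) / sd_pre


# Hypothetical example: mean accuracy rises from 20 to 23 points,
# with a pre-intervention SD of 6 points.
d = cohens_d_pre_post(mean_pre=20.0, mean_post=23.0, sd_pre=6.0)
print(round(d, 2))  # 0.5
```

Because the standardizer is the pre-intervention SD alone, this index does not account for the correlation between pre- and post-intervention scores or for change in a control group.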
Table 1. Measures of change in FER in individuals with ASD.
IQ: intelligence quotient, PIQ: performance intelligence quotient, NVIQ: nonverbal intelligence quotient, VIQ: verbal intelligence quotient; FER: facial emotion recognition; ASD: autism spectrum disorder; DANVA: Diagnostic Analysis of Nonverbal Accuracy; CAM-C: Cambridge Mindreading Face-Voice Battery for Children; ADHD: attention deficit hyperactivity disorder.
Emotions conveyed: emotion expressions presented to participants: H = happy, A = angry, S = sad, F = fear, D = disgust, Su = surprised, Sc = scared, N = neutral. Evidence of change: when available, effect size of change is reported. When data are available, Cohen’s d is calculated with the following formula: (Mpost − Mpre)/SDpre. When effect size is unavailable, a general description of outcome is provided. Only results related to change due to intervention are provided. NS designates non-significant result. Treatment intensity: length and number of treatments as provided in the manuscript. Participants: total number of participants that were administered the FER measure, irrespective of condition, diagnosis, or group assignment with age range and IQ reported. When age range was not available, mean and standard deviation (in years) are provided. NA designates studies for which data is not available. Studies that included participants with IQ below 70 are marked with *.
Results
Search results
The search resulted in a total of 1971 articles after removal of duplicates (Figure 1). From this pool, 56 articles were identified; an additional eight articles were included after review of the references, and one article was included after colleague suggestion. Across the 65 included articles, 36 specific measures were identified (Table 1). Drawing from the rubric established for determination of evidence-based treatments (Chambless et al., 1998), evaluation by two or more research teams is considered a more rigorous standard of examination. Therefore, the measures in this review were evaluated and subsequently ranked based on degree of use in the field: measures utilized across more than one study and more than one research team (Level 1), measures utilized across more than one study but only one research team (Level 2), and measures only identified in one study (Level 3).
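The three-level ranking can be expressed as a simple decision rule. The following Python sketch is illustrative only; the function name and the input counts in the usage lines are hypothetical and do not reproduce the coding procedure actually used for Table 1.

```python
def usage_level(n_studies: int, n_teams: int) -> int:
    """Rank a measure by degree of use in the field.

    Level 1: more than one study and more than one research team.
    Level 2: more than one study but only one research team.
    Level 3: identified in a single study.
    """
    if n_studies < 1 or n_teams < 1:
        raise ValueError("a measure needs at least one study and one team")
    if n_studies > 1 and n_teams > 1:
        return 1
    if n_studies > 1:
        return 2
    return 3


# Hypothetical counts for three measures.
print(usage_level(n_studies=5, n_teams=3))  # 1
print(usage_level(n_studies=3, n_teams=1))  # 2
print(usage_level(n_studies=1, n_teams=1))  # 3
```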

Figure 1. Flowchart demonstrating method of study identification and screening.
Level 1 measures: measures utilized across multiple studies and research teams
Of the 36 identified measures, only 6 were used in multiple studies by different research labs. Most of these measures are commonly utilized in research with diverse clinical populations and standardized, with reported psychometric properties.
Almost one-third of the included studies (n = 21) utilized the DANVA2 to assess change in FER following intervention. Of the 21 studies, only seven reported significant change in FER ability as assessed by the DANVA/DANVA2. These studies utilized a variety of intervention approaches, including a comprehensive school-based intervention (Lopata et al., 2012, 2013), a comprehensive psychosocial treatment (Lopata, Thomeer, et al., 2015), a modified computerized dynamic facial emotion training tool, the MiX (Russo-Ponsaran et al., 2016), a social adjustment enhancement intervention (Solomon et al., 2004), and a group-based Social Competence Intervention (Schmidt et al., 2011; Stichter et al., 2010). Of the seven studies, three utilized a control group design (Lopata, Thomeer, et al., 2015; Russo-Ponsaran et al., 2016; Solomon et al., 2004). In addition to these studies that reported statistically significant changes, Baghdadli and colleagues (2013) found that children exhibited fewer errors for recognition of anger (Adult Faces only) following treatment. This effect was not found for the Child Faces or for the other emotions. While these articles reported findings in either child, adult, or both tests, the majority of studies (n = 13, 61.90%) using the DANVA or DANVA2 did not find significant change following the intervention.
In our review, six studies utilized only the Revised (adult) version of the Reading the Mind in the Eyes Test (RMET), four studies utilized only the child version of the RMET, and one study utilized both versions of the measure to assess change in FER following an intervention. Support for the ability of the measure to detect change in FER is mixed. Out of the seven studies utilizing the adult version, three indicated no significant change in FER and four indicated significant improvement. The studies that indicated a significant improvement on the RMET utilized a wide range of interventions, including intranasal oxytocin (Anagnostou et al., 2012; Guastella et al., 2010), neurofeedback training (Friedrich et al., 2015), and an interactive multimedia program called Mind Reading (Golan & Baron-Cohen, 2006). Of these four studies, three utilized a control condition (Anagnostou et al., 2012; Golan & Baron-Cohen, 2006; Guastella et al., 2010). Of the three studies that found no significant improvement on the RMET, two utilized the same or similar interventions to those noted above. Guastella and colleagues (2015) and Quintana and colleagues (2017) found no significant effect of intranasal oxytocin on performance on the RMET. In addition, Golan and Baron-Cohen (2006) found no significant time × group interaction for the RMET in the first experiment, in which a group of adults with ASD used the software at home; in the second experiment, in which individuals used the system at home and also met in a group with a tutor, they found that individuals in the treatment group improved on the RMET following the intervention.
Only one of the five studies which utilized the child version of RMET found a significant improvement, using a group-based Social Competence Intervention (Stichter et al., 2010). However, three other studies utilized the Social Competence Intervention and found no significant improvement on the RMET (i.e. Schmidt et al., 2011; Stichter et al., 2012, 2014).
Only one study utilized the Cambridge Mindreading Face-Voice Battery (CAM) to evaluate change in FER. Golan and Baron-Cohen (2006) found significant change in FER in adults with ASD following a computerized intervention (i.e. Mind Reading: The Interactive Guide to Emotions), an interactive guide to emotions and mental states. Eight studies (12.31%) were identified that utilized the CAM-C to evaluate change in FER following an intervention. In all eight studies, the authors found an improvement in FER as measured by CAM-C following the intervention. Only three out of the eight studies utilized a control condition (Lopata et al., 2016, 2019; Thomeer et al., 2015). However, all but one of these studies evaluated the same computerized intervention (i.e. Mind Reading: The Interactive Guide to Emotions) (Lacava et al., 2007, 2010; Lopata et al., 2012, 2013, 2016; Thomeer et al., 2015), with Lopata and colleagues (2019) evaluating Mind Reading as one part of a comprehensive school-based intervention program. The only study that utilized CAM-C outside of the Mind Reading intervention was an open pilot study of a comprehensive outpatient psychological treatment for children with ASD (Lopata et al., 2017), in which they found significant improvement on CAM-C scores following intervention.
Ten studies (15.38%) utilized the NEPSY-II Affect Recognition (AR) subtest as an outcome measure. Six studies reported no significant pre-to-post difference in scores, and four studies reported significant improvement in AR scores following the intervention. The studies that found significant improvement in NEPSY-II AR evaluated different interventions, which included a modified, computerized dynamic facial emotion training tool, the MiX by Humintell© (Russo-Ponsaran et al., 2016), a computer-based social skills intervention called FaceSay (Rice et al., 2015), a home-based intervention using a video, called The Transporters (R. L. Young & Posselt, 2012), and Virtual Reality Social Cognition Training (Didehbani et al., 2016). Three of the four studies that found a significant improvement utilized a control condition (Rice et al., 2015; Russo-Ponsaran et al., 2016; R. L. Young & Posselt, 2012). The six studies that found no significant improvement on the NEPSY-II AR task used a variety of intervention approaches, including some previously mentioned interventions (e.g. The Transporters). While R. L. Young and Posselt (2012) found significant change utilizing The Transporters, B. T. Williams et al. (2012) found no significant intervention by time effects on the NEPSY-II AR with the same intervention. It is important to note, however, that B. T. Williams and colleagues utilized a sample of participants with IQ below 70, while R. L. Young and Posselt (2012) included only participants with average cognitive abilities.
Only three identified studies (4.62%) utilized the Penn Emotion Recognition Task (ER40) to evaluate change in FER following an intervention. Mehling et al. (2017) utilized the measure to evaluate the Hunter Heartbeat Method (Hunter, 2014), which is a drama-based social skills intervention. The authors reported no significant differences in pre-to-post-intervention scores on the measure; however, half of the participants obtained high scores (>85%) prior to treatment, indicating minimal room for improvement. Eack and colleagues (2013) utilized the ER40 during their evaluation of the feasibility and initial efficacy of Cognitive Enhancement Therapy (CET). While the authors did not detect significant pre-to-post-intervention change in overall emotion recognition, they found improvement in accuracy for sad faces. J. S. Lee, Kang, et al. (2018) utilized the ER40 to investigate the effect of social skills training on facial emotion recognition in children with ASD and Attention Deficit Hyperactivity Disorder (ADHD). The authors reported no significant change in FER in either group after the training.
Only three identified studies (4.62%) utilized the Situation-Facial Expression Matching task to assess change in FER following an intervention. Golan and colleagues (2010) found that the treatment group significantly improved on all three tasks. Similarly, Gev et al. (2017) utilized the Situation-Facial Expression Matching task as a FER measure to evaluate The Transporters and found a significant time × treatment interaction and significant pre-treatment to post-treatment change. Yan and colleagues (2018) utilized The Transporters to improve FER in Chinese children with ASD. They utilized a subset of five items covering five basic emotions (happiness, anger, sorrow, fear, and astonishment), selected based on frequency of emotional words used by children in China. They found that the intervention significantly improved children’s FER compared to their pre-intervention scores.
Level 2 measures: measures utilized across several studies but within one research team
Five out of the 36 identified measures (13.89%) were utilized in multiple studies. However, they were utilized by the same research team evaluating one intervention, therefore limiting generalizability.
In their initial study describing the development and evaluation of a computer-based treatment program, Bölte and colleagues (2002) found significant improvement on the FEFA for both eyes and whole face following the intervention. Similarly, Bölte and colleagues (2006, 2015) found significant improvement on the FEFA for the training group following administration of the FEFA. While all three of these studies showed significant change in FER utilizing the FEFA measure, all studies were completed by one research group. Moreover, they evaluated change in FER after the same intervention (i.e. FEFA), which utilizes stimuli similar to those used in the FEFA assessment.
In their pilot study (Thomeer et al., 2011) as well as their follow-up RCT (Thomeer et al., 2015) of the Mind Reading program with in vivo rehearsal, Thomeer and colleagues found significant improvement in parent rating on the ERDS. In their RCT evaluating the efficacy of Mind Reading as a component of a comprehensive psychosocial treatment, Lopata et al. (2016) found that, while there were no significant time × treatment condition interactions, there was a main effect of time for parent and clinician ratings, with both groups (i.e. summer treatment, summer treatment with Mind Reading component) improving. While these three studies show the ERDS to be a promising measure for evaluating FER change in individuals with ASD, it has been utilized by only one group and only to evaluate programs with the Mind Reading component.
Didehbani and colleagues (2016) used Ekman60 Faces in addition to the NEPSY-II AR task described above to measure affect recognition in their evaluation of the impact of a Virtual Reality Social Skills Intervention to enhance social skills in children with ASD. While they found significant increases in the NEPSY-II AR task following the intervention, they did not find statistically significant changes on Ekman60. Kandalaft et al. (2013), however, found significant changes on Ekman60 following this intervention, even though they did not find significant differences on the RMET measure described above.
Level 3 measures: measures used in only one study
Each of the remaining 25 measures was utilized in a single study. However, many of these studies utilized the same stimuli (i.e. the Pictures of Facial Affect stimuli; Ekman & Friesen, 1976) to create different measures or protocols to assess FER.
Of note, significant improvement in FER was seen on only seven of the 17 measures. Three of these measures include static images: the Emotion Recognition Test (ERT) utilized by Bölte et al. (2015); the Faces Task developed by Baron-Cohen et al. (1997) and utilized by R. L. Young and Posselt (2012); and the Overt Emotion Sensitivity Task, with stimuli derived from the Karolinska Directed Emotional Faces database, utilized by Quintana and colleagues (2017). Petrovska and Trajkovski (2019) utilized visual material from multiple sources to form the Emotion Comprehension Test (ECT), in which examinees evaluated photographs of facial expressions, pictograms, and situation-based emotional scenes in three tasks. The other three measures included colored cartoons of emotions (Emotion Recognition Cartoons utilized by Silver & Oakes, 2001), facial expression video clips (Face task from Mind Reading utilized by Fridenson-Hayo et al., 2017), and a complex emotion scale including pictures in the form of cards (Cheng et al., 2018). Five studies found only a significant main effect of time when evaluating FER measures (i.e. Beaumont & Sofronoff, 2008; Chung et al., 2016; Dadds et al., 2014; Miyahara et al., 2010; Silver & Oakes, 2001).
Discussion
FER difficulties have been widely documented in individuals with ASD (e.g. Lozier et al., 2014). Numerous interventions have been designed to specifically address FER difficulties in this population and yet, studies have yielded mixed results. A multitude of assessments are used to examine change in FER with intervention in individuals with ASD. Differences in stimuli (e.g. static vs dynamic), modality (e.g. questionnaire, computer task), and demands (e.g. task duration) of these assessments make it difficult to determine the degree to which FER impairment is, or is not, amenable to therapeutic remediation. Of note, in order to compare sensitivity of the measures across studies, we had to presuppose that the interventions were of approximately equal potency. An assumption of equivalence is almost certainly flawed; as such, results should be interpreted in that context. However, our categorization into three levels based on use by different teams and in multiple studies partially mitigates this concern. For instance, the most widely used measures have been applied across different treatments.
In the context of the rich history of FER intervention research in ASD, results of this systematic review highlight limited agreement with respect to how to assess this process. In the 65 identified articles, 36 different measures were utilized. However, only six measures (i.e. DANVA2, RMET, CAM/-C, NEPSY-II AR, Penn Emotion Recognition Task, Situation-Facial Expression Matching Task) were utilized across study teams. Of note, the Penn Emotion Recognition Task was utilized in only three studies, none of which showed significant change in overall FER following the intervention. The Situation-Facial Expression Matching Task was used in three studies all evaluating the same intervention. Research utilizing the other four main measures has found medium-to-large effects with respect to change in FER (DANVA2: d = 0.52–1.27; RMET: d = 0.35–1.2; CAM-C: d = 0.25–1.64; NEPSY-II AR: d = 0.40–1.70). Even within these measures, however, the results are inconsistent, as some studies find significant effects while others do not. This pattern of mixed results is seen across both within-subject and between-subject (control group) designs. Additional small-to-medium effects were found for many studies, even when the change was not statistically significant. These results suggest that FER is in fact modifiable in individuals with ASD, and therefore, additional concerted work on measurement in this domain is needed.
There is little evidence pointing to a single robust measure that is sensitive to change when evaluating treatment. The CAM/CAM-C is the only measure for which all nine identified studies found a significant improvement in FER following the intervention, suggesting its potential utility. However, as only four of the studies included a control condition, and all but one evaluated the same computerized intervention in itself or as part of a larger intervention, its utility in judging the impact of treatments and comparing relative efficacy of different interventions is limited. In addition, similar stimuli from the Mind Reading intervention program are utilized to measure change in FER, limiting generalizability. The NEPSY-II AR, however, has been evaluated utilizing various interventions by different research teams. While fewer than half of the 10 identified studies found significant change in FER, the effects were medium to large, and three of these four studies included a control condition. Given the small number of studies utilizing this measure, however, further research into its utility is necessary. This highlights the need for additional work in this area, in order to facilitate judgments regarding the impact of treatment or comparisons of relative efficacy.
Twenty-four of the 65 identified articles utilized multiple FER measures to assess change. In eight of these 24 articles, variability was seen across the measures, such that while improvement was seen in one measure, no change was observed on another measure. For example, Anagnostou et al. (2012) found no significant improvement following intervention on the DANVA2 but found significant improvement on the RMET. Guastella and colleagues (2015) report an opposite pattern, with no significant drug group × time interaction for RMET but a significant interaction observed for the DANVA2. Even within the same intervention and measure, inconsistent patterns emerge. For example, while R. L. Young and Posselt (2012) found significant change in NEPSY-II AR following The Transporters intervention, B. T. Williams and colleagues (2012) found no significant intervention by time effects on the same measure, using the same intervention. It is unclear what factors contribute to these inconsistencies.
The FEFA and MiX are both Level 2 measures that show promise given the significant improvements reported across studies, even though the studies were conducted by the same research team. For both the FEFA and MiX measures, however, the stimuli that are utilized to measure change in FER are similar to those presented to the participants during the intervention phase. These types of measures may detect change in the taught skills; however, generalization to other stimuli, which is the ultimate goal of any intervention program, is not demonstrated.
Overall, the current body of literature suggests that the DANVA is the most commonly used measure; however, it includes only static photos of adult and child facial expressions and focuses on only a narrow set of basic emotions. The RMET is also commonly used; however, it includes black and white pictures of only the eye region of a face, and performance is dependent on age and on full-scale and verbal IQ (Peñuelas-Calvo et al., 2019). For colored photographs of children’s faces specifically, the NEPSY-II AR may be a good option for experimenters. To measure FER dynamically, the CAM-C should be considered as it is the only measure utilizing dynamic, colored video clips that has been used across more than one study and more than one research team. However, in order to assess generalization of the skill above and beyond the immediate content taught in the intervention, a task separate from the one evaluated by the intervention, or a multi-layered assessment, rather than a single measure, may be required.
Limitations
This review should be considered in light of several limitations. Given the inconsistency in what information is presented in the identified articles, there is variability in the level of detail provided for the measures. In addition, given differences in methodologies (e.g. use of control group), data analytic approach, and presentation of results, firm conclusions about “evidence of change” across studies cannot be drawn. A variety of intervention approaches were used as well, and an in-depth examination of the interventions (e.g. dosage, format) is beyond the scope of this review. However, whenever possible, an effect size was reported to allow for comparisons across studies. Evaluating a measure based on sensitivity to change with intervention is complicated; even a sound measure cannot detect change with an ineffective intervention. In addition, although the list of keywords used to search the databases was constrained to a relatively small number, use of a snowball search approach should have helped mitigate risk of missed studies. Finally, the majority of the studies utilized samples of cognitively able participants, limiting generalizability.
Future directions
There are inconsistencies regarding the effects of interventions targeting FER impairments in ASD. Given the current state of research, it is unclear whether this inconsistency is due to noise in the measurement, methodological differences, or true differences in the intervention impact. In order for the field to advance and more efficiently converge regarding FER intervention efficacy and outcome assessment, researchers are encouraged to utilize existing measures whenever possible, without alteration, in order to make interpretation more straightforward. In addition, future studies should evaluate measures that show promise across studies (i.e. Level 1), such as CAM-C (Golan et al., 2015). Overall, the findings from this systematic review suggest that FER is modifiable, and more concerted work on measurement of FER abilities in individuals with ASD is needed, in order to allow for sensitive evaluation of impact and determination of relative efficacy of different interventions targeting FER in individuals with ASD.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The funding source is R21/R33MH100268.
