Measuring change in facial emotion recognition in individuals with autism spectrum disorder: A systematic review

Abstract

Children and adults with autism spectrum disorder are less accurate in facial emotion recognition, which is thought to contribute to impairment in social functioning. Although many interventions have been developed to improve facial emotion recognition, there is no consensus on how to best measure facial emotion recognition in people with autism spectrum disorder. This lack of agreement has led to wide variability in how facial emotion recognition is measured and, subsequently, to inconsistent findings regarding the impact of interventions targeting facial emotion recognition impairment. The purpose of this review is to synthesize the extant research on measurement of facial emotion recognition in the context of treatment. We conducted an electronic database search for relevant, peer-reviewed articles published between January 1998 and November 2019 to identify studies evaluating change in facial emotion recognition in autism spectrum disorder. Sixty-five studies met inclusion criteria, utilizing a total of 36 different assessment measures for facial emotion recognition in individuals with autism spectrum disorder. Only six of the measures were used in multiple studies conducted by different investigative teams. The outcomes of the studies are reported and summarized with the goal of informing future research.

Lay Abstract

Children and adults with autism spectrum disorder show difficulty recognizing facial emotions in others, which makes social interaction challenging. While there are many treatments developed to improve facial emotion recognition, there is no agreement on the best way to measure such abilities in individuals with autism spectrum disorder. The purpose of this review is to examine studies that were published between January 1998 and November 2019 and have measured change in facial emotion recognition to evaluate the effectiveness of different treatments. Our search yielded 65 studies, and within these studies, 36 different measures were used to evaluate facial emotion recognition in individuals with autism spectrum disorder. Only six of these measures, however, were used in different studies and by different investigators. In this review, we summarize the different measures and outcomes of the studies, in order to identify promising assessment tools and inform future research.

Keywords

autism spectrum disorder, facial emotion recognition, intervention, measurement

Facial emotion recognition (FER), a component of social cognition (Adolphs, 2002), is fundamental for effective social communication and interaction (Ekman, 1992; Wang et al., 2004). Valid and clinically sensitive assessment of FER is critical to effectively address FER impairments. Typically, the ability to discriminate discrete emotions based on facial expression alone develops early in childhood. By 7 months of age, babies are able to discriminate dynamic happy and angry faces (Soken & Pick, 1992), and by 4 years of age, typically developing children can verbally label most basic, prototypical emotions with accuracy (Widen & Russell, 2003). Given the predictable onset, timing, and trajectory of this process in typically developing children, delay in FER may be meaningfully related to certain forms of atypical development and psychopathology. Consistent with this idea, FER impairments have been documented in several clinical conditions, including externalizing disorders (e.g. Aspan et al., 2013) and depression (e.g. Jenness et al., 2014). FER atypicalities are perhaps most widely documented in youth with autism spectrum disorder (ASD; e.g. Lozier et al., 2014).

The FER atypicalities documented in youth with ASD vary according to identifiable dimensions of emotional expression including valence, arousal, and the characteristics of actors displaying emotion. For example, relative to typically developing peers, young children with ASD experience more difficulty with recognition of certain expressions depending on the overall affective valence of the displayed emotion (e.g. Rump et al., 2009) and are more adept at recognizing emotions in familiar people relative to strangers (Shanok et al., 2019). Thus, measurement strategies aimed at quantifying FER should, at minimum, consider known dimensions of complexity including valence, intensity, and specificity. However, a meta-analysis of studies exploring FER by Lozier et al. (2014) highlights the variability of results in terms of emotion-specific FER deficits in ASD. While some studies suggest deficits exist across a variety of emotions (e.g. Rump et al., 2009), others indicate deficits primarily for negative emotions (Ashwin et al., 2006; Wingenbach et al., 2017), and still others point to specific emotions such as sadness (e.g. Boraston et al., 2007) or fear (e.g. Pelphrey et al., 2002). Most research indicates that from 12 years of age through adulthood, individuals with ASD do not show impairment in recognizing basic emotions (Capps et al., 1992; Grossman et al., 2000); they do, however, show difficulty when stimuli are more subtle or complex, and when presented briefly (Humphreys et al., 2007). Difficulty recognizing emotions, especially complex ones, seems to stem, in part, from altered functioning of the social brain (Black et al., 2017), such as attending to the mouth more than the eye region of a stimulus (Black et al., 2020). In everyday social interactions, emotional expressions tend to be subtle, complex, and brief; as such, any type of FER impairment, even if mild, is likely a direct contributor to social problems.

As summarized in a review of behavioral and neuroimaging studies of FER in ASD by Harms et al. (2010), the inconsistency in research findings on FER among intellectually able individuals with ASD is due, in part, to differences in how FER is measured. Inconsistency in measurement also has likely affected the extant intervention research. Published studies of FER in ASD have yielded inconsistent findings regarding responsiveness to interventions targeting FER among individuals with ASD. A recent review, for instance, summarized that emotion recognition training is promising, but there is little data on generalizability of intervention effects (Berggren et al., 2018). In a systematic review of the effectiveness of technology-based interventions in improving FER in individuals with ASD, Lee, Lam et al. (2018) highlighted the difficulty in drawing conclusions regarding effectiveness of the intervention programs when outcome measures vary considerably across studies. Given the implications of FER impairments, it is important to identify valid measures of FER that are sensitive to change. The primary aim of this systematic review is to identify and summarize commonly used FER assessment tools in outcome research in individuals with ASD, with the goal of informing future research. Because sensitivity of a measure is inextricably linked to the effect of the intervention that measure is being used to gauge, we make an assumption that the interventions studied are roughly equivalent in terms of potency, in order to make comparisons of sensitivity across the measures. However, we fully acknowledge that this is a circular argument, as degree of (non)equivalence is not known, in part due to use of different FER measures.

Methods

Inclusion criteria

To be included in the review, articles had to be published in scholarly, peer-reviewed journals and written in English. The study had to measure change in FER ability in individuals with ASD, specifically assessing FER as an outcome variable. Only data-based papers were included; as such, reviews, commentaries, and conference papers were excluded. In addition, in order to determine which FER measures are most sensitive to change regardless of the intervention provided, studies that utilized the same measure (i.e. protocol or stimuli) to conduct the intervention as well as to assess FER outcome were excluded. No participant age restrictions were implemented.

Search methods

We conducted an electronic database search of PubMed and Web of Science to identify relevant, peer-reviewed articles, published between January 1998 and November 2019. Search keywords included [ASD, AUTISM, or ASPERGER] and [EMOTION RECOGNITION, FAC* RECOGNITION, FAC* AFFECT] and [FAC* EXPRESSION]. One of the authors screened all titles and abstracts to exclude clearly irrelevant articles and reviewed full papers of the remaining articles to determine which articles met all inclusion criteria, including use of an intervention. Another author reviewed full papers of all of the included articles and 10% of the excluded articles in order to confirm eligibility or exclusion. When the first reviewer was unsure of eligibility (n = 14 articles), the secondary reviewer assigned a rating as well. When discrepancy occurred, a third rater determined the final eligibility for inclusion. After the database search, references of all included articles were reviewed for articles that were not identified in the initial database search (cf., snowball search; Greenhalgh & Peacock, 2005).

Variable definitions and coding

All included articles were coded for measures used to assess change in FER (Table 1). First, we recorded the name of each measure; for study-specific FER measures with no given name, we provided a descriptive label. Second, we provided a brief description of the measure, including stimuli-specific factors such as color versus black & white, static versus dynamic, adult versus child, and whole face versus partial face. Third, we indicated which emotions were conveyed. Fourth, we cited the studies that utilized each FER measure. In the case of multiple articles by the same author or author group and with the same sample, only the article with the largest sample size was reported to avoid confounding the findings. Fifth, we summarized findings in terms of change in FER from pre-intervention to post-intervention. When available, effect size was included. When effect size was not reported and the requisite data were provided, Cohen’s d was calculated as d = (M_post − M_pre)/SD_pre. Given differences in study design and statistical power across the studies, we emphasized indices of clinical significance and effect size over statistical significance. We also included a brief description of treatment intensity (i.e. number of sessions or duration of program) for each intervention. Finally, we reported participant characteristics for each study, including sample size, age, and intellectual quotient (IQ) (when provided) for all participants who were administered the FER measure, irrespective of condition, diagnosis, or group assignment.
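The effect size calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name `cohens_d_pre_post` and the example values are hypothetical, and the sketch assumes that the pre- and post-intervention means and the pre-intervention standard deviation are available from the published report.

```python
def cohens_d_pre_post(mean_pre: float, mean_post: float, sd_pre: float) -> float:
    """Standardized pre-post change: d = (M_post - M_pre) / SD_pre.

    Hypothetical helper for illustration; the pre-intervention SD is the
    standardizer, as in the formula given in the text.
    """
    if sd_pre <= 0:
        raise ValueError("sd_pre must be a positive number")
    return (mean_post - mean_pre) / sd_pre


# Illustrative (made-up) values: mean accuracy rises from 20 to 23 items
# correct, with a pre-intervention SD of 6 items.
d = cohens_d_pre_post(mean_pre=20.0, mean_post=23.0, sd_pre=6.0)
print(round(d, 2))  # 0.5
```

With this sign convention, a positive d means the post-intervention mean is higher than the pre-intervention mean. Whether that reflects improvement depends on how the measure is scored (e.g. number correct versus number of errors).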

Table 1.

Measures of change in FER in individuals with ASD.

Level 1 measures	Description	Emotions conveyed	Study	Evidence of change	Treatment intensity	Participants
Diagnostic Analysis of Nonverbal Accuracy (DANVA; Nowicki & Duke, 1994; DANVA2; Nowicki, 1997)	Colored static photos of adult and child facial expressions	A, F, H, S	Anagnostou et al. (2012)	No significant change in performance (d = 0.33 NS)	2×/day for 6 weeks	N = 19 Age: 33.2 (13.3) IQ: 107 (24)
			Baghdadli et al. (2013)	No significant between-group difference post-tx and no significant change in total score between baseline and post-tx. For anger, adult expression only, fewer errors after tx; post-tx median gain was significantly different between the two groups (d = −0.8)	90 min/week for 20 sessions	N = 13 Age: 8–12 VIQ: 92.7 (24.2), 97.6 (20.3)
			Barnhill et al. (2002)	No statistically significant change between pre- and posttest measures on the child or adult faces (child d = −0.40 NS; adult d = −0.46 NS)	1 h/week for 8 weeks	N = 8 Age: 13–18 IQ: NA
			Guastella et al. (2015)	No significant interactions observed between drug group and time (post d = 0.21 NS; follow-up d = 0.14 NS)	2×/day for 8 weeks	N = 44 Age: 12–18 IQ: 80.0 (19.2)*, 93.1 (21.1)
			Guli et al. (2013)	No significant effects for between-group comparisons in change over time (R² = 0.001 NS)	1.5 h/session 2×/week for 8 weeks or 2 h/session 1×/week for 12 weeks	N = 31 Age: 8–14 IQ: 104.17 (15.47), 107.50 (14.04)
			Lerner et al. (2011)	Intervention group did not demonstrate significant improvement relative to the comparison group on child or adult faces (all R² < 0.021 NS)	145 h over 29 sessions	N = 17 Age: 11–17 IQ: NA (ID excluded)
			Lopata et al. (2008)	No significant main effect for either the child or the adult faces and no time × tx condition interaction (child d = −0.19 NS; adult d = −0.15 NS)	6 h/day, 5 days/week for 6 weeks	N = 36 Age: 6–13 IQ: 99.1 (15.7)
			Lopata et al. (2010)	No significant effect after application of Bonferroni correction for multiple comparisons, although obtained effect size suggested a possible medium effect favoring the tx group (d = 0.53 NS)	5 days/week for 5 weeks	N = 36 Age: 7–12 IQ: 103.0 (14.0)
			Lopata et al. (2012)	Significant pre–post difference for child faces but not for adult faces (child: d = 1.27; adult d = 0.45 NS)	3 weeks, 5×/week preparation followed by 10 months	N = 12 Age: 6–9 IQ: 102.3 (16.0)
			Lopata et al. (2013)	Significant pre–post change for child faces and adult faces (child d = 0.58; adult d = 0.78)	3 weeks, 5×/week preparation followed by 10 months	N = 12 Age: 6–9 IQ: 104.0 (16.6)
			Lopata, Thomeer, et al. (2015)	No significant time × tx (high vs low intensity) interaction and no main effect for child faces. However, significant main effect for time for adult faces (adult faces: ω² = 0.17; child faces: ω² = 0.05 NS)	5 days/week for 5 weeks	N = 47 Age: 7–12 IQ: 103.9 (15.7), 109.0 (13.9)
			Lopata, Toomey, et al. (2015)	No significant pre–post change for the Child Faces (d = −0.18 NS)	5 days/week for 5 weeks	N = 28 Age: 7–10 IQ: 105.8 (16.3)
			Richard et al. (2015)	No significant group effect at post-test for Child Faces. From pre to post, tx group had trend toward greater improvement than control group, but not significant (tx: d = 0.35 NS; control: d = 0.07 NS)	One session	N = 19 Age: 8–14 IQ: NA
			Russo-Ponsaran et al. (2014)	No significant pre–post change as only one out of the three participants improved score (d = 0.3 NS)	1 h, 2×/week for 8 weeks	N = 3 Age: 9–11 IQ: 109.3 (NA)
			Russo-Ponsaran et al. (2016)	Tx group showed improved performance on Child Faces test (post d = 0.52, maintenance d = 0.54)	1 h/session, 2× per week for 8 weeks	N = 25 Age: 8–15 IQ: 98.7 (22.2), 106.6 (18.9)
			Schmidt et al. (2011)	Significant pre–post improvement (d = NA)	20 h total: 2×/week for 10 weeks	N = 6 Age: 12–13 IQ: 103.3 (15.9)
			Solomon et al. (2004)	Significant interaction of group and time for adult faces and child faces with tx group scoring higher at post-test (adult d = 1.09; child d = 1.07)	1.5-h session, 1×/week for 20 weeks	N = 18 Age: 8–12 IQ: 75–143
			Stichter et al. (2010)	Significant pre–post improvement (improvement translates to one additional correct response from pre to post-intervention) (d = 0.55)	20 h total: 2×/week for 10 weeks	N = 27 Age: 11–14 IQ: 103.8 (17.0)
			Stichter et al. (2012)	No significant improvement in ability to correctly identify the emotional state (d = 0.07 NS)	20 h total: 2×/week for 10 weeks	N = 20 Age: 6–10 IQ: 99.3 (15.2)
			Stichter et al. (2014)	No significant pre–post difference on the Child Faces (d = 0.33 NS)	31, 45-min lessons over five units	N = 11 Age: 11–14 IQ: 99.6 (16.8)
			Thomeer et al. (2012)	No significant group difference at post-assessment controlling for pretest scores (d = 0.26 NS)	5 days/week for 5 weeks	N = 35 Age: 7–12 IQ: 103.8 (13.5)
Reading the Mind in the Eyes Test, Revised (RMET; Baron-Cohen, Wheelwright, Hill, et al., 2001)	Black and white pictures of eye region of a face	Complex emotions including cautious, confident, doubtful, nervous, playful, and skeptical, among others	Anagnostou et al. (2012)	Significant improvement after 6 weeks (d = 1.2)	2×/day for 6 weeks	N = 19 Age: 33.2 (13.3) IQ: 107 (24)
	Black and white pictures of eye region of a face		Friedrich et al. (2015)	Significant pre–post improvement (η² = 0.4)	1 h, 2–3×/week for 6–10 weeks (average 8 weeks)	N = 13 Age: 6–17 IQ: 93.3 (22.6), 91.3 (30.1)*
			Golan & Baron-Cohen (2006)	Ex 1: No significant time × group interactions (Intervention: d = 0.10 NS; Control: d = −0.13 NS); Ex 2: Significant time × group interaction (software and tutor d = 0.35; social skills d = −0.41)	2 h/week for 10–15 weeks	Ex 1: N = 41 Age: 17–52 VIQ: 108.3 (13.3), 109.7 (10.0); Ex 2: N = 26 Age: 17–50 VIQ: 105.7 (16.1), 96.5 (15.5)
			Guastella et al. (2010)	In comparison with performance under placebo, 60% of participants in oxytocin group improved (d = NA)	One dose	N = 16 Age: 12–19 IQ: NA
			Guastella et al. (2015)	No significant interactions between drug group and time (post d = 0.22 NS; follow-up d = 0.15 NS)	2×/day for 8 weeks	N = 41 Age: 12–18 IQ: 80.0 (19.2)*, 93.1 (21.1)
			Kandalaft et al. (2013)	No significant pre–post change (d = 0.31 NS)	10 sessions over 5 weeks	N = 8 Age: 18–26 IQ: 111.9 (8.5)
			Quintana et al. (2017)	No significant effect of treatment or time on performance (d = NA)	Two doses	N = 17 Age: 19–35 IQ: 109.8 (12.1)
Reading the Mind in the Eyes Test–Child (RMET-C; Baron-Cohen, Wheelwright, Spong et al., 2001b)	28 black and white images of eyes depicting emotion states, with forced choice between four mental-state terms for each	Affective and cognitive mental states including H, Sc, S, worried, friendly, interested, and serious, among others	Guastella et al. (2015)	No significant interactions between drug group and time (post d = 0.21; follow-up d = 0.33 NS)	2×/day for 8 weeks	N = 41 Age: 12–18 IQ: 80.0 (19.2)*, 93.1 (21.1)
			Schmidt et al. (2011)	No significant pre–post assessment change (d = NA)	20 h total: 2×/week for 10 weeks	N = 6 Age: 12–13 IQ: 103.3 (15.9)
			Stichter et al. (2010)	Significant pre–post improvement (improvement translates to one additional correct response from pre- to post-intervention) (d = 0.35)	20 h total: 2×/week for 10 weeks	N = 27 Age: 10–14 IQ: 103.8 (17.0)
			Stichter et al. (2012)	No significant improvement in ability to correctly label someone’s emotional or mental state (d = 0.13 NS)	20 h total: 2×/week for 10 weeks	N = 20 Age: 6–10 IQ: 99.3 (15.2)
			Stichter et al. (2014)	No significant pre–post difference (d = 0.02 NS)	31, 30- to 45-min lessons	N = 11 Age: 11–14 IQ: 99.6 (16.8)
Cambridge Mindreading Face-Voice Battery (CAM; Golan et al., 2006)	Colored video clips of male and female adults	20 complex emotions including appalled, appealing, grave, insincere, resentful, stern, uneasy, and vibrant, among others	Golan & Baron-Cohen (2006)	Ex 1: Significant time × condition (intervention, control) effect, with greater improvement for the intervention condition (intervention d = 0.70; control d = 0.27); Ex 2: Significant improvement in software and tutor but not social skills group (software and tutor d = 0.47, social skills d = 0.26 NS)	2 h/week for 10–15 weeks	Ex 1: N = 41 Age: 17–52 VIQ: 108.3 (13.3), 109.7 (10.0); Ex 2: N = 26 Age: 17–50 VIQ: 105.7 (16.1), 96.5 (15.5)
Cambridge Mindreading Face-Voice Battery for Children (CAM-C; Golan et al., 2015)	Colored video clips of male and female children and adults	A, D, F, H, S, Su, loving, embarrassed, undecided, unfriendly, bothered, nervous, disappointed, amused, and jealous	Lacava et al. (2007)	Significant pre–post scores for the CAM-C Faces subtest (d = 0.76)	10 weeks; average 10 h/week	N = 8 Age: 8–11 IQ: NA
	Colored video clips of male and female children and adults		Lacava et al. (2010)	All participants improved ER tests scores from pre- to post-testing (d = NA)	7–10 weeks; average 12.3 h total	N = 4 Age: 7–9 IQ: NA
			Lopata et al. (2012)	Significant pre–post improvement for face and voice composite score (d = 1.64)	3 weeks followed by 10 months	N = 12 Age: 6–9 IQ: 102.3 (16.0)
			Lopata et al. (2013)	Significant pre–post improvement for face and voice composite score (d = 0.94)	10 months	N = 12 Age: 6–9 IQ: 104.0 (16.6)
			Lopata et al. (2017)	Significant pre–post change for faces (d = 0.25)	Two 90-min sessions/week for 18 weeks	N = 44 Age: 7–12 IQ: 108.6 (14.5)
			Lopata et al. (2016)	Significant time × treatment condition interaction observed favoring the augmented group over regular group (ω² = 0.19)	5 days/week for 5 weeks	N = 36 Age: 7–12 IQ: 105.0 (13.3), 106.3 (13.3)
			Lopata et al. (2019)	Significant treatment effect with significantly greater increase for tx compared to control group (d = 1.41)	160–210 min/week for school year	N = 102 Age: 6–12 IQ: 103.8 (12.9), 0.9 (14.8)
			Thomeer et al. (2015)	Significant group difference at both posttest and follow-up (between group: ω² = 0.23; posttest d = 1.34; follow-up d = 0.86)	Two 90-min sessions/week for 12 weeks	N = 43 Age: 7–12 IQ: 102.6 (13.2), 101.6 (15.3)
Affect Recognition subtest of NEPSY-II (Korkman et al., 2007)	Colored photographs of children’s faces	A, D, F, H, N, S	Corbett et al. (2011)	No significant pre–post change (d = 0.62 NS)	2 h/day, 1–4 days/week, for 3 months	N = 8 Age: 6–17 IQ: 82.4 (16.4)*
	Colored photographs of children’s faces	A, D, F, H, N, S	Corbett et al. (2014)	No significant pre–post change (d = −0.18 NS)	4 h/day, 5 days a week for 2 weeks	N = 12 Age: 8–17 IQ: 74–118
			Didehbani et al. (2016)	Significant pre–post change for total sample but no significant difference between ADHD + ASD combined compared to ASD only group (d = 0.58)	Ten 1-h sessions over 5 weeks	N = 30 Age: 7–16 IQ: 112.6 (12.1)
			Lordo et al. (2017)	No significant pre–post difference in ASD group (d = 0.21 NS)	90 min/session, one session/week for 14 weeks	N = 29 Age: 12–17 IQ: 95.6 (14.5), 103.5 (10.8)
			Rice et al. (2015)	Significant difference in post-test score between experimental and control groups, controlling for pre-test score (η² = 0.42)	One 25-min session/week for 10 weeks	N = 31 Age: 5–11 IQ: 101 (14.5)
			Russo-Ponsaran et al. (2016)	Significant difference at post and follow-up between tx and control groups; no significant group × time interaction (post d = 0.40; follow-up d = 0.70)	45–60 min, 2×/week for ~6 sessions	N = 25 Age: 8–15 IQ: 98.7 (22.2), 106.6 (18.9)
			Voss et al. (2019)	No significant pre–post change but larger positive mean change in tx compared to control participants (d = NA)	Four 20-min sessions, for 6 weeks	N = 71 Age: 6–12 IQ: 77.8 (21.5), 75.7 (20.7)
			Wieckowski & White (2020)	No significant pre–post change (r = 0.04)	10 sessions total: 2×/week for 5 weeks	N = 8 Age: 9–12 IQ: 110.9 (13.6)
			B. T. Williams et al. (2012)	No significant intervention by time effects (Intervention group d-post = 0.52 NS)	15 min/day for 4 weeks	N = 55 Age: 4–7 IQ: 77.9 (14.0), 74.6 (13.6)
			R. L. Young & Posselt (2012)	Significant interaction between time and intervention type; tx group improved significantly pre–post (interaction: partial η² = 0.53; tx: d = 1.70)	Three 5- to 10-min episodes a day for 3 weeks	N = 25 Age: 4–8 IQ: NA
Penn Emotion Recognition Test (Gur et al., 2002; Kohler et al., 2003)	Colored images of male and female facial expressions	A, D, F, H, N, S	Eack et al. (2013)	No significant pre–post change on 40 stimuli subset, although trend-level effect in emotion perception due to improvement in accuracy of sad faces (overall: d = 0.24 NS; sad: d = 0.61)	60 h computer; 45, 1.5 h group sessions over 18 months	N = 14 Age: 18–45 IQ: 117.7 (16.8)
			J. S. Lee, Kang, et al. (2018)	No significant pre–post change on 40 stimuli subset for ASD or ADHD group. After controlling for variables, ADHD group showed more improvement compared to ASD group (ASD pre–post d = −1.00 NS)	1.5 h/session, one session per week, 24 sessions total	N = 23 Age: 7–10 IQ: 82.4 (21.0)
			Mehling et al. (2017)	No significant pre–post change (effect size r = 0.23)	1 h/week for 10 weeks	N = 14 Age: 10–13 IQ: NA
Situation-Facial Expression Matching (Golan et al., 2010)	Colored photo depicting a scene without facial expression and below, three video clips of character’s facial expressions	A, D, F, H, S, Su, excited, tired, unfriendly, kind, sorry, proud, jealous, joking, ashamed, worried	Golan et al. (2010)	Significant pre–post improvement in treatment group on all tasks (group × time partial η² = 0.45–0.56)	At least three episodes/day for 4 weeks	N = 56 Age: 4–8 VIQ: 76–116
			Gev et al. (2017)	Significant time × treatment condition interaction and pre–post change; significant time × series interaction (time: η² = 0.20; series η² = 0.17; time × series: η² = 0.24)	10 min/day for 8 weeks	N = 59 Age: 4–7 IQ: NA
		H, A, F, sorrow, astonishment	Yan et al. (2018)	Significant main effect of time and interaction of group × time. Only the ASD intervention group showed significant pre–post change (group: η² = 0.09 NS; time η² = 0.34; group × time: η² = 0.47)	40 min/day, 5 days/week for 6 weeks	N = 21 Age: 5.59 (0.91) IQ: NA
Level 2 measures	Description	Emotions conveyed	Study	Evidence of change	Treatment intensity	Participants
Frankfurt Test for Facial Affect Recognition (FEFA; Bölte et al., 2002)	Black and white photograph of eye regions and whole faces	A, D, F, H, N, S, Su	Bölte et al. (2015)	Significant improvement for ASD training group after training (d = 0.88)	60 min/week for 8 weeks	N = 57 Age: 14–33 NVIQ: 105.7 (12.0), 109.0 (12.2)
			Bölte et al. (2002)	Significant improvement for treatment group only for face and eyes test (d = NA)	2 h/week for 5 weeks	N = 10 Age: 16–40 NVIQ: 58–126*
			Bölte et al. (2006)	Significant improvement in trained compared with the untrained sample for both the face and eyes test (face: η² = 0.59; eyes: η² = 0.88)	2 h/week for 5 weeks	N = 10 Age: 29.4 (5.9) NVIQ: 94.3 (18.9), 98.6 (19.2)
MiX (Humintell©; Matsumoto & Hwang, 2011)	Colored dynamic videos of adult faces	A, D, F, S, Su, joy, contempt	Russo-Ponsaran et al. (2016)	Significant difference between tx and control group post-intervention and follow-up (post d = 2.02; follow-up d = 1.68)	45–60 min, 2×/week for ~6 sessions	N = 25 Age: 8–15 IQ: 98.7 (22.2), 106.6 (18.9)
MiX (Humintell©; Matsumoto & Hwang, 2011)	Colored dynamic videos of adult faces	A, D, F, S, Su, joy, contempt	Russo-Ponsaran et al. (2014)	Significant pre–post change as all three participants increased their performance (d = 4.8)	1 h, 2×/week for 8 weeks	N = 3 Age: 9–11 IQ: 109.3 (NA)
Comprehensive Affective Testing System (CATS; Weiner et al., 2006), Name Affect and Three-Faces subtests	Black and white photographs of adult faces expressing different emotions	A, D, F, H, N, S, Su	Russo-Ponsaran et al. (2016)	No significant pre–post change on subtests, but small-to-large effects (0.38 ⩽ d_post ⩽ 0.80)	45–60 min, 2×/week for ~6 sessions	N = 25 Age: 8–15 IQ: 98.7 (22.2), 106.6 (18.9)
		A, D, F, H, N, S, Su	Russo-Ponsaran et al. (2014)	All three participants increased their performance on both subtests (Name Affect: d = 5.1; Three Faces d = 1.5)	1 h, 2×/week for 8 weeks	N = 3 Age: 9–11 IQ: 109.3 (NA)
Emotion Recognition and Display Survey (ERDS; Thomeer et al., 2011)	Rating scale that assesses ability of children to recognize and display emotions	35 basic and complex emotions (e.g. H, S, silly, upset, tired)	Lopata et al. (2016)	No significant time × treatment condition interactions, although main effect of time was significant for parent and clinician ratings (parent ω² = 0.39; clinician ω² = 0.37)	5 days/week for 5 weeks	N = 36 Age: 7–12 IQ: 105.0 (13.3), 106.3 (13.3)
			Thomeer et al. (2015)	Significant between-group difference at follow-up, but not at post (between group ω² = 0.08; post-test d = 0.46 NS; follow-up d = 0.73)	Two 90-min sessions/week for 12 weeks	N = 43 Age: 7–12 IQ: 102.6 (13.2), 101.6 (15.3)
			Thomeer et al. (2011)	Significant pre–post change in parent rating (d = 0.95)	90 min/session, 12 sessions over 6 weeks	N = 11 Age: 7–12 IQ: 101.3 (17.4)
Facial Expressions of Emotion Stimuli and Tests (Ekman60; A. W. Young et al., 2002)	Black and white pictures of adults	A, D, F, H, S, Su	Didehbani et al. (2016)	No significant pre–post change, or between group (ASD only and ASD + ADHD) difference at post (d = 0.29 NS)	Ten 1-h sessions over 5 weeks	N = 30 Age: 7–16 IQ: 112.6 (12.1)
	Black and white pictures of adults	A, D, F, H, S, Su	Kandalaft et al. (2013)	Significant improvement following treatment (d = 0.44)	10 sessions over 5 weeks	N = 8 Age: 18–26 IQ: 111.9 (8.5)
Level 3 measures (Ekman & Friesen (1976) stimuli adaptations)	Description	Emotions conveyed	Study	Evidence of change	Treatment intensity	Participants
Modified version of Ekman Pictures of Facial Affect Series (Ekman, 1993)	Male and female faces of emotions	A, D, F, H, N, S, Su	Beadle-Brown et al. (2017)	Significant improvement pre to follow-up in number of emotions correctly identified; trend was apparent pre–post, but not significant (follow-up d = 2.12; post: d = 0.82 NS)	45 min/week for 10 weeks	N = 22 Age: 7–12 IQ: 29–87*
Photographs of extracted eye and mouth region (from Ekman & Friesen (1976) set of faces)	Black and white photos of eye and mouth region followed by correct or incorrect emotional label	A, D, F, H, S, Su	Domes et al. (2014)	Significant drug × group interaction, with stronger effects of oxytocin for ASD group compared with controls; significant effect of oxytocin for emotion recognition from the eyes in participants with autism only (d = 0.75)	One dose	N = 28 Age: 24.0 (6.9), 23.6 (5.4) IQ: 122.4 (24.1), 125.6 (15.4)
Modifications of NimStim Emotional Face Stimuli (Tottenham et al., 2009) presented alongside Ekman Stimuli (Ekman & Friesen, 1976)	Split-screen presentation with still images at 40% intensity from NimStim stimuli on one side and images from Ekman stimuli on the other side	A, F, H, N	Hadjikhani et al. (2015)	Significant improvement for bumetanide treatment group in accuracy in emotion matching (d = 0.59)	1 mg/day for 10 months	N = 7 Age: 14–28 PIQ: 104.4 (14.5)
Ekman and Friesen’s (1975) photos and schematic drawings	Black and white photographs of woman’s facial expressions and schematic drawings designed to depict emotions	A, D, F, H, S, Su	Hopkins et al. (2011)	Significant difference between tx group and controls for photos and drawing together, and for photos only for both low functioning (LF) and high functioning (HF) children. However, significant difference was found for schematic drawing only for HF children following intervention (LF total d = 0.44; LF picture d = 0.53, LF drawing d = 0.26; HF total d = 0.49, HF picture d = 0.28, HF drawing d = 0.47)	12 sessions total: 10–25 min/session for 6 weeks	N = 49 Age: 6–15 IQ: 75.7 (27.3)*
Picture emotion recognition (Ekman & Friesen, 1976), pictures from Mindreading software library, and schematic cartoon faces (Howlin et al., 1999)	Black and white pictures, color pictures, and black and white schematic cartoon faces	Not reported	Lacava et al. (2010)	All subjects showed reliable pre–post improvement (d = NA)	7 to 10 weeks; average 12.3 h	N = 4 Age: 7–9 IQ: NA
Emotion Recognition Test (ERT) using photographs from Ekman’s Pictures of Facial Affect (Ekman & Friesen, 1976)	Black and white laminated photographs	A, D, F, H, S, Su	Ryan & Charagáin (2010)	Significant pre–post difference and between-group post scores (training program full sample d = 1.42)	1 h/session, one session/week for 4 weeks	N = 30 Age: 6–14 IQ: 104.6 (17.4), 98.6 (20.2)
Emotion identification task using pictures of Facial Affect (Ekman & Friesen, 1976)	Black and white photographs of adult models displaying six basic facial expressions	A, D, F, H, S, Su	B. T. Williams et al. (2012)	Significant effect of intervention by time for the identification of anger only (Intervention group Total d_post = 0.19 NS; Anger d_post = 0.28)	15 min/day for 4 weeks	N = 55 Age: 4–7 IQ: 77.9 (14.0), 74.6 (13.6)
Emotion matching task using pictures of Facial Affect (Ekman & Friesen, 1976)	Black and white photographs of adult models displaying six basic facial expressions	A, F, H, S	B. T. Williams et al. (2012)	Significant effect of intervention by time for the matching of all four expressions of emotion and specifically for the matching of anger (Intervention group Total d_post = 0.18; Anger d_post = 0.40)	15 min/day for 4 weeks	N = 55 Age: 4–7 IQ: 77.9 (14.0), 74.6 (13.6)
Level 3 measures (other measures)	Description	Emotions conveyed	Study	Evidence of change	Treatment intensity	Participants
Assessment of Perception of Emotion from Facial Expression (S. H. Spence, 1995)	Black and white photographs of facial expressions of four children and adults	A, D, F, H, S, nicely surprised	Beaumont & Sofronoff (2008)	Significant main effect of time but no main effect of group or time × group interaction (time: η² = 0.31; group: η² < 0.01 NS; time × group: η² = 0.05 NS)	2 h/week for 7 weeks	N = 49; Age: 7–11; IQ: 107.2 (11.9), 107.4 (14.2)
Emotion Recognition Test (ERT: Merten, 2005)	Pictures of seven emotions displayed in faces	A, D, F, H, S, Su, contempt	Bölte et al. (2015)	Significant improvement for ASD training group after training (d = 0.50)	60 min/week for 8 weeks	N = 57; Age: 14–33; NVIQ: 105.7 (12.0), 109.0 (12.2)
Complex-Emotion Scale based on Facial Affect Scoring Technique, Ekman et al. (1971)	Four complex-emotion pictures and situation-pictures in the form of cards	Not reported	Cheng et al. (2018)	Significant improvement in tx group at post-test compared to control group (tx d = 5.64)	Three sessions, each lasting 40 min, over 21-day period	N = 24; Age: 9–12; IQ: 75–88
Facial emoticons (Chung et al., 2016)	Facial emoticons of pleasant and unpleasant faces	45 pleasant and 15 unpleasant	Chung et al. (2016)	Significant pre–post change in both tx and control groups; no significant difference in the degree of improvement between groups (tx d = 1.02; control d = 0.74)	1 h/day, 3 days/week for 6 weeks	N = 20; Age: 13–18; IQ: 80.0 (4.7), 80.4 (8.0)
UNSW Facial Emotion task (Dadds et al., 2004)	PowerPoint presentation of facial emotions (two adult, two adolescent, two child)	A, D, F, H, N, S	Dadds et al. (2014)	Significant pre–post change; no main effect for group (oxytocin, placebo), or time × group interaction (Oxytocin d = 0.31; Placebo d = 0.43)	One dose/day for 4 days	N = 38; Age: 7–16; IQ: 88.6 (8.0), 90.5 (11.7)
Face task from Mindreading (Baron-Cohen et al., 2004)	Facial expression video clips of male and female actors of various age groups and ethnicities	A, D, F, H, S, Su, interested, bored, excited, worried, disappointed, kind, frustrated, proud, ashamed, joking, unfriendly, hurt	Fridenson-Hayo et al. (2017)	Significant time × group interaction. Significant improvement over time was found for intervention group but not control group (Intervention d = 0.66, 0.89; Control d = 0.14, 0.30)	2 h/week for 8–12 weeks	N = 74; Age: 6–9; IQ: NA (subtest scores are in average range)
Reading the Mind in Films Test-Children’s Version (RMF-C; Golan et al., 2008)	Characters expressing emotions in short social scenes taken from four children’s movies	22 complex emotions including guilty, lonely, upset, mean, caring and excited, among others	Lacava et al. (2007)	No pretest scores presented; however, no statistically significant difference from Golan et al.’s (2006) group of children who received no intervention or group of children who received intervention (d = NA)	10 weeks; average 10 h/week	N = 8; Age: 8–11; IQ: NA
Photographs from Japanese facial expressions (JACFEE: Matsumoto & Ekman, 1997)	Color photographs of facial expressions	A, D, F, H, S, Su, contempt	Miyahara et al. (2010)	No significant group × time interaction; both time and group main effects were significant (interaction: η² = 0.24 NS; time η² = 0.21; group η² = 0.24)	One session	N = 43; Age: 17–26; IQ: NA
Emotion Comprehension Test (ECT: Face Task) (taken from Warsaw Set of Emotional Facial Expression Pictures) (Olszanowski et al., 2015), Picto Task and Situation Task (stimuli modified from Teaching Children with Autism to Mind-Read: A Practical Guide) (Howlin et al., 1999)	Colored photographs of facial expressions; colored graphic representations of facial expressions; pictures of emotional scenes	H, S, A, F	Petrovska & Trajkovski (2019)	Significant overall main effect for group (partial η² = 0.65) and intellectual functioning (partial η² = 0.21) on post-tx scores. Main effect for group for post-tx scores for all three tasks (Face task: partial η² = 0.40; Picto task: partial η² = 0.66; Situation task: partial η² = 0.19)	12 h over 6 weeks	N = 32; Age: 7–15; IQ: 62.5% diagnosed with mild or moderate ID (IQ 35–70)*
Overt emotion sensitivity task with stimuli derived from Karolinska Directed Emotional Faces database	Images of male and female faces of ambiguous emotions. Asked “how happy/angry is this person?”	A, H	Quintana et al. (2017)	Significant condition (treatment, control) effect on perception of happiness in ambiguous faces (d = 0.63)	Single dose sessions	N = 17; Age: 19–35; IQ: 109.8 (12.1)
Facial Expression Photographs from S. Spence (1980)	10 black and white photographs of facial expressions	Not reported	Silver & Oakes (2001)	There was only a time effect as both groups improved their scores over time and the effect of the intervention was not significantly greater (d = NA)	Ten 30-min sessions, over 2 weeks	N = 22; Age: 12–18; IQ: NA
Emotion Recognition Cartoons (from Teaching Children with Autism to Mind-Read: A Practical Guide) (Howlin et al., 1999)	Colored cartoons of emotions	Eight situation-based, six desire-based, and eight belief-based emotions	Silver & Oakes (2001)	Significant time × group (experimental, control) interaction in number of errors made (d = NA)	Ten 30-min sessions, over 2 weeks	N = 22; Age: 12–18; IQ: NA
Face Emotion Identification Test (FEIT; Kerr & Neale, 1993)	Black and white photographs of faces expressing basic emotions	A, F, H, S, Su, ashamed	Turner-Brown et al. (2008)	Significant main effect of group (treatment, control); main effect for time and group × time interactions were not significant (treatment group d = 0.94)	One 50-min session a week, for 18 weeks	N = 11; Age: 25–55; IQ: 113.3 (20.0), 110.6 (14.7)
Emotion Guessing Game (EGG; Voss et al., 2019)	40 facial expressions expressed by live human actor (five examples of eight emotions)	Not reported	Voss et al. (2019)	No significant pre–post change but larger positive mean change in tx compared to control participants (d = NA). Significant gains at 6-week follow-up for tx group (d = NA); however, control data were lacking	Four 20-min sessions, for 6 weeks	N = 71; Age: 6–12; IQ: 77.8 (21.5), 75.7 (20.7)
Facial expressions from EU-Emotion Stimulus Set (O’Reilly et al., 2016)	12 colored videos of facial expressions; half high intensity and half low intensity	F, A, D, H, S, N	Wieckowski & White (2020)	No significant change from first to last tx session (r = 0.26)	10 sessions total: 2×/week for 5 weeks	N = 8; Age: 9–12; IQ: 110.9 (13.6)
Video Emotion Recognition Task (Wieckowski & White, 2017)	Colored video of adult expressing emotions	H, S, F, A, Su, D	Wieckowski & White (2020)	No significant pre–post change (r = 0.18)	10 sessions total: 2×/week for 5 weeks	N = 8; Age: 9–12; IQ: 110.9 (13.6)
Faces Task (Baron-Cohen et al., 1997)	20 black and white photographs of faces showing basic and complex affect	H, S, A, F, Su, D, distressed, scheming, proud, guilty, thoughtful, admiring, bored, quizzical, flirting, interested	R. L. Young & Posselt (2012)	Significant group × time interaction; significant pre–post change in tx group (interaction partial η² = 0.31; tx d = 0.92)	Three 5- to 10-min episodes a day for 3 weeks	N = 25; Age: 4–8; IQ: NA

IQ: intelligence quotient, PIQ: performance intelligence quotient, NVIQ: nonverbal intelligence quotient, VIQ: verbal intelligence quotient; FER: facial emotion recognition; ASD: autism spectrum disorder; DANVA: Diagnostic Analysis of Nonverbal Accuracy; CAM-C: Cambridge Mindreading Face-Voice Battery for Children; ADHD: attention deficit hyperactivity disorder.

Emotions conveyed: emotion expressions presented to participants: H = happy, A = angry, S = sad, F = fear, D = disgust, Su = surprised, Sc = scared, N = neutral. Evidence of change: when available, effect size of change is reported. When data are available, Cohen’s d is calculated with the following formula: (M_post − M_pre)/SD_pre. When effect size is unavailable, a general description of the outcome is provided. Only results related to change due to intervention are provided. NS designates a non-significant result. Treatment intensity: length and number of treatments as provided in the manuscript. Participants: total number of participants who were administered the FER measure, irrespective of condition, diagnosis, or group assignment, with age range and IQ reported. When age range was not available, mean and standard deviation (in years) are provided. NA designates studies for which data are not available. Studies that included participants with IQ below 70 are marked with *.
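The effect-size calculation described in the table note, the pre-to-post mean difference divided by the pretest standard deviation, can be illustrated with a minimal Python sketch. The function names and the example scores are hypothetical and are not taken from any reviewed study; the use of the sample standard deviation (n − 1 denominator) is an assumption, since the note does not specify which estimator was used.

```python
from statistics import mean, stdev


def cohens_d_from_summary(m_pre, m_post, sd_pre):
    """Standardized pre-post change: (M_post - M_pre) / SD_pre."""
    if sd_pre <= 0:
        raise ValueError("SD_pre must be positive for d to be defined")
    return (m_post - m_pre) / sd_pre


def cohens_d_from_scores(pre, post):
    """Same statistic computed from raw scores.

    Assumption: SD_pre is the sample standard deviation (n - 1 denominator).
    """
    if len(pre) < 2:
        raise ValueError("need at least two pretest scores to estimate SD_pre")
    return cohens_d_from_summary(mean(pre), mean(post), stdev(pre))


# Hypothetical accuracy scores for illustration only
pre_scores = [10, 12, 14, 16, 18]
post_scores = [13, 15, 17, 19, 21]
d = cohens_d_from_scores(pre_scores, post_scores)
print(f"d = {d:.2f}")
```

Because the denominator is the pretest standard deviation alone, this statistic is not the same as a d based on a pooled standard deviation or on the standard deviation of change scores, so values computed this way should be compared with other effect sizes cautiously.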

Results

Search results

The search resulted in a total of 1971 articles after removal of duplicates (Figure 1). From this pool, 56 articles were identified; an additional eight articles were included after review of the references, and one article was included following a colleague’s suggestion. Across the 65 included articles, 36 specific measures were identified (Table 1). Drawing from the rubric established for determination of evidence-based treatments (Chambless et al., 1998), evaluation by two or more research teams is considered a more rigorous standard of examination. Therefore, the measures in this review were evaluated and subsequently ranked based on degree of use in the field: measures utilized across more than one study and more than one research team (Level 1), measures utilized across more than one study but only one research team (Level 2), and measures only identified in one study (Level 3).
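The three-level ranking rule described above can be expressed as a short Python sketch. This is an illustration under stated assumptions, not the procedure the authors used: the function name, the (study, team) pair representation, and the placeholder identifiers are all hypothetical.

```python
def classify_measure(uses):
    """Assign a level to one measure.

    `uses` is a list of (study_id, team_id) pairs, one per study that
    administered the measure (a hypothetical representation).
    """
    if not uses:
        raise ValueError("a measure must appear in at least one study")
    studies = {study for study, _ in uses}
    teams = {team for _, team in uses}
    if len(studies) > 1 and len(teams) > 1:
        return 1  # more than one study and more than one research team
    if len(studies) > 1:
        return 2  # more than one study, but a single research team
    return 3      # identified in only one study


# Placeholder identifiers, not real studies or teams
examples = {
    "measure_A": [("study_1", "team_X"), ("study_2", "team_Y")],
    "measure_B": [("study_3", "team_X"), ("study_4", "team_X")],
    "measure_C": [("study_5", "team_Z")],
}
levels = {name: classify_measure(uses) for name, uses in examples.items()}
print(levels)
```

Deciding whether two studies come from the same research team is a judgment that this sketch takes as given through the `team_id` label.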

Figure 1.

Flowchart demonstrating method of study identification and screening.

Level 1 measures: measures utilized across multiple studies and research teams

Of the 36 identified measures, only six were used in multiple studies by different research labs. Most of these measures are standardized, have reported psychometric properties, and are commonly utilized in research with diverse clinical populations.

Diagnostic Analysis of Nonverbal Accuracy (DANVA; Nowicki & Duke, 1994; DANVA2; Nowicki, 1997). DANVA2 Child Faces and Adult Faces subtests assess the ability to identify four basic emotions (i.e. happy, sad, angry, fearful) in colored photographs. Stimuli include high and low levels of emotional intensity and both male and female faces. The examinee is asked to view the face and then select which of the four emotions was displayed. The Child Faces subtest includes 24 photographs of child male and female facial expressions while the Adult Faces subtest includes 24 photographs of adult male and female facial expressions. The manual provides normative mean values and standard deviations for child and adult faces based on a compilation of over 20 studies of typically developing children broken down by age (Nowicki & Duke, 2008). For the Child Faces, coefficient alpha ranged from 0.69 to 0.81 with modal alpha of 0.76 for children aged 4–16 years and reported test–retest reliability was 0.74 for third-grade children. For the Adult Faces, coefficient alpha was 0.64–0.90 across studies with typically developing children through college students, and test–retest reliability was reported to be 0.84 in college students (Nowicki & Duke, 2008).

Almost one-third of the included studies (n = 21) utilized the DANVA or DANVA2 to assess change in FER following intervention. Of the 21 studies, only seven reported significant change in FER ability as assessed by the DANVA/DANVA2. These studies utilized a variety of intervention approaches, including a comprehensive school-based intervention (Lopata et al., 2012, 2013), a comprehensive psychosocial treatment (Lopata, Thomeer, et al., 2015), a modified computerized dynamic facial emotion training tool, the MiX (Russo-Ponsaran et al., 2016), a social adjustment enhancement intervention (Solomon et al., 2004), and a group-based Social Competence Intervention (Schmidt et al., 2011; Stichter et al., 2010). Of the seven studies, three utilized a control group design (Lopata, Thomeer, et al., 2015; Russo-Ponsaran et al., 2016; Solomon et al., 2004). In addition to these studies that reported statistically significant changes, Baghdadli and colleagues (2013) found that children exhibited fewer errors for recognition of anger (Adult Faces only) following treatment. This effect was not found for the Child Faces or for the other emotions. While these articles reported findings for the Child Faces subtest, the Adult Faces subtest, or both, the majority of studies (n = 13, 61.90%) using the DANVA or DANVA2 did not find significant change following the intervention.

Reading the Mind in the Eyes Test, Revised (RMET, Revised; Baron-Cohen, Wheelwright, Hill, et al., 2001) and Reading the Mind in the Eyes Test, Child Version (Baron-Cohen, Wheelwright, Spong, et al., 2001): the Revised version of the RMET contains 36 black and white photographs of the eye region. After viewing each photograph, the examinee is asked to choose the word that most accurately describes the portrayed emotion. Although developed as a test of theory of mind, since it requires participants to interpret complex facial cues in order to infer the emotion, it is also often used as a measure of FER. The child version of RMET contains 28 black and white images of eyes depicting emotion states with forced choice between four mental-state terms for each. The target emotion words and foils are adapted from RMET.

In our review, six studies utilized only the Revised version of the RMET, four studies utilized only the child version of the RMET, and one study utilized both versions of the measure to assess change in FER following an intervention. Support for the ability of the measure to detect change in FER is mixed. Out of the seven studies utilizing the adult version, three indicated no significant change in FER and four indicated significant improvement. The studies that indicated a significant improvement on the RMET utilized a wide range of interventions, including intranasal oxytocin (Anagnostou et al., 2012; Guastella et al., 2010), neurofeedback training (Friedrich et al., 2015), and an interactive multimedia program called Mind Reading (Golan & Baron-Cohen, 2006). Of these four studies, three utilized a control condition (Anagnostou et al., 2012; Golan & Baron-Cohen, 2006; Guastella et al., 2010). Of the three studies that found no significant improvement on the RMET, two utilized the same or similar interventions to those noted above. Guastella and colleagues (2015) and Quintana and colleagues (2017) found no significant effect of intranasal oxytocin on performance on the RMET. In addition, Golan and Baron-Cohen (2006) found no significant time × group interaction for the RMET in the first experiment, in which a group of adults with ASD used the software at home; in the second experiment, in which individuals used the system at home and also met in a group with a tutor, they found that individuals in the treatment group improved on the RMET following the intervention.

Only one of the five studies which utilized the child version of RMET found a significant improvement, using a group-based Social Competence Intervention (Stichter et al., 2010). However, three other studies utilized the Social Competence Intervention and found no significant improvement on the RMET (i.e. Schmidt et al., 2011; Stichter et al., 2012, 2014).

Cambridge Mindreading Face-Voice Battery (CAM; Golan et al., 2006) and Cambridge Mindreading Face-Voice Battery for Children (CAM-C; Golan et al., 2015): the CAM measures emotion recognition for 20 complex emotions and mental states using faces and voices taken from the Mind Reading: The Interactive Guide to Emotions program (Baron-Cohen et al., 2004). The CAM-C measures FER for 15 emotion concepts (six basic and nine complex emotions) using video clips of morphing facial expressions and speech audio clips. While CAM and CAM-C are two different measures that differ in the number and nature of emotions covered and the age group for which measures are used, they share the same structure and nature of the stimuli. For both measures, examinees view or listen to a clip and select one of the four emotion words that reflects the emotion expressed by the person in the clip. The CAM-C effectively discriminates between intellectually able children with ASD and typical children, especially when considering the complex emotions (Golan et al., 2015). Test–retest reliability for CAM-C (administrations 10–15 weeks apart) was 0.74 (Golan et al., 2015).

Only one study utilized CAM to evaluate change in FER. Golan and Baron-Cohen (2006) found significant change in FER in adults with ASD following a computerized intervention (i.e. Mind Reading: The Interactive Guide to Emotions), an interactive guide to emotions and mental states. Eight studies (12.31%) were identified that utilized the CAM-C to evaluate change in FER following an intervention. In all eight studies, the authors found an improvement in FER as measured by CAM-C following the intervention. Only three out of the eight studies utilized a control condition (Lopata et al., 2016, 2019; Thomeer et al., 2015). However, all but one of these studies evaluated the same computerized intervention (i.e. Mind Reading: The Interactive Guide to Emotions) (Lacava et al., 2007, 2010; Lopata et al., 2012, 2013, 2016; Thomeer et al., 2015), with Lopata and colleagues (2019) evaluating Mind Reading as one part of a comprehensive school-based intervention program. The only study that utilized CAM-C outside of the Mind Reading intervention was an open pilot study of a comprehensive outpatient psychological treatment for children with ASD (Lopata et al., 2017), in which they found significant improvement on CAM-C scores following intervention.

Affect Recognition subtest of NEPSY-II (Korkman et al., 2007). The Affect Recognition (AR) subtest assesses discrimination of happy, sad, anger, fear, disgust, and neutral emotions from colored photographs of children’s faces in four different tasks: (1) determining whether two photographs depict the same emotion, (2) selecting two faces with the same affect, (3) selecting a face that depicts the same expression as a face shown, and (4) selecting two photographs that depict the same expression as a previously shown face. Internal consistency has been reported to be 0.87 for children age 7–12 years, and test–retest was found to be 0.60 (Korkman et al., 2007).

Ten studies (15.38%) utilized the NEPSY-II AR subtest as an outcome measure. Six studies reported no significant pre-to-post difference in scores, and four studies reported significant improvement in AR scores following the intervention. The studies that found significant improvement in NEPSY-II AR evaluated different interventions, which included a modified, computerized dynamic facial emotion training tool, the MiX by Humintell© (Russo-Ponsaran et al., 2016), a computer-based social skills intervention called FaceSay (Rice et al., 2015), a home-based intervention using a video, called The Transporters (R. L. Young & Posselt, 2012), and Virtual Reality Social Cognition Training (Didehbani et al., 2016). Three of the four studies that found a significant improvement utilized a control condition (Rice et al., 2015; Russo-Ponsaran et al., 2016; R. L. Young & Posselt, 2012). The six studies that found no significant improvement on the NEPSY-II AR task used a variety of intervention approaches, including some previously mentioned interventions (e.g. The Transporters). While R. L. Young and Posselt (2012) found significant change utilizing The Transporters, B. T. Williams et al. (2012) found no significant intervention by time effects on the NEPSY-II AR with the same intervention. It is important to note, however, that B. T. Williams and colleagues utilized a sample of participants with IQ below 70, while R. L. Young and Posselt (2012) included only participants with average cognitive abilities.

Penn Emotion Recognition Test (Gur et al., 2002; Kohler et al., 2003): the Penn Emotion Recognition Test is a computer-based test. The stimuli include 96 colored images of faces. Participants are asked to determine and select what emotion the face is showing from the following options: happy, sad, anger, fear, disgust, and no emotion. There are eight low-intensity and eight high-intensity expressions presented for each emotion, as well as 16 neutral expressions. ER40, a subset of the Penn Emotion Recognition Test, includes 40 photographs with examinees choosing one of five options (happiness, sadness, anger, fear, and neutral).

Only three identified studies (4.62%) utilized the Penn Emotion Recognition Test to evaluate change in FER following an intervention. Mehling et al. (2017) utilized the measure to evaluate the Hunter Heartbeat Method (Hunter, 2014), which is a drama-based social skills intervention. The authors reported no significant differences in pre-to-post-intervention scores on the measure; however, half of the participants obtained high scores (>85%) prior to treatment, indicating minimal room for improvement. Eack and colleagues (2013) utilized ER40 during their initial evaluation of the feasibility and initial efficacy of Cognitive Enhancement Therapy (CET). While the authors did not detect significant pre-to-post-intervention change in overall emotion recognition, they found improvement in accuracy for sad faces. J. S. Lee, Kang, et al. (2018) utilized ER40 to investigate the effect of social skills training on facial emotion recognition in children with ASD and Attention Deficit Hyperactivity Disorder (ADHD). The authors reported no significant change in FER in either group after the training.

Situation-Facial Expression Matching Task (Golan et al., 2010): created to evaluate the treatment program The Transporters, the Situation-Facial Expression Matching Task consists of three tasks, including a task in which children must match video clips depicting characters’ facial expressions to a photographic scene that excluded character facial expressions (Golan et al., 2010). The three tasks tested three levels of generalization: familiar close generalization (using situations taken from the intervention series), unfamiliar close generalization (matching novel situations with novel expressions from The Transporters characters), and distant generalization (matching novel situations with novel expressions using non-Transporters faces taken from the Mind Reading software; Baron-Cohen et al., 2004).

Only three identified studies (4.62%) utilized this measure to assess change in FER following an intervention. Golan and colleagues (2010) found that the treatment group significantly improved on all three tasks. Similarly, Gev et al. (2017) utilized the Situation-Facial Expression Matching task as a FER measure to evaluate The Transporters and found significant time × treatment interaction and significant pre-treatment to post-treatment change. Yan and colleagues (2018) utilized The Transporters to improve FER in Chinese children with ASD. They utilized a subset of five items covering five basic emotions (happiness, anger, sorrow, fear, and astonishment), selected based on frequency of emotional words used by children in China. They found that the intervention significantly improved children’s FER compared to their pre-intervention scores.

Level 2 measures: measures utilized across several studies but within one research team

Five out of the 36 identified measures (13.89%) were utilized in multiple studies. However, they were utilized by the same research team evaluating one intervention, therefore limiting generalizability.

Frankfurt Test for Facial Affect Recognition (FEFA; Bölte et al., 2002): the stimuli include black and white photographs of female and male adult faces expressing emotions. The photographs were taken from the Ekman and Friesen (1978) set. In addition to whole faces, the eye region was cropped and assessed to evaluate a different level of FER. The whole face as well as just the eye region images were then compiled to create a computer-aided program to train and test FER on different levels. Unlike the Level 1 measures described above, FEFA was developed for both assessment and intervention. The FEFA has been shown to possess good psychometric properties in a normative sample (internal consistency: 0.91–0.95; retest reliability: 0.89–0.92).

In their initial study describing the development and evaluation of a computer-based treatment program, Bölte and colleagues (2002) found significant improvement on the FEFA for both eyes and whole face following the intervention. Similarly, Bölte and colleagues (2006, 2015) found significant improvement on the FEFA for the training group following FEFA training. While all three of these studies showed significant change in FER utilizing the FEFA measure, all studies were completed by one research group. Moreover, they evaluated change in FER after the same intervention (i.e. FEFA), which utilizes stimuli similar to those used in the FEFA assessment.

Microexpression Recognition Training Tool (MiX; Humintell©; Matsumoto & Hwang, 2011). The MiX™ provides FER training and testing of seven emotions (i.e. joy, sadness, anger, fear, surprise, disgust, and contempt). MiX™ displays dynamic videos of Facial Action Coding System (FACS)¹-coded emotional faces from Ekman and Friesen (1978), which include individuals of varying gender and ethnicity. The MiX includes didactic instruction of facial emotions, in addition to testing modules. Therefore, the MiX was used as both an assessment measure and the intervention in the following two studies. Measures utilizing MiX were included in the review even though MiX utilized the same measure to conduct the intervention as well as to assess FER outcome (an exclusion criterion for this study), because the modules used to test the progress of individuals using MiX were different from the actual training stimuli; it therefore assesses FER independent of what was specifically taught during the intervention. In both a small pilot study (Russo-Ponsaran et al., 2014) as well as a larger randomized controlled trial (RCT) (Russo-Ponsaran et al., 2016), experimenters found statistically significant improvement on the MiX assessment following the intervention. Although both studies showed significant change in FER utilizing the MiX assessment measure, they were completed by one research group and evaluated the change in FER after the same intervention (i.e. coach-assisted MiX).

Comprehensive Affective Testing System (CATS; Weiner et al., 2006), Name Affect and Three Faces subtests: CATS is a commercially available measure of FER utilizing the Ekman and Friesen (1976) stimuli. Prior studies have demonstrated acceptable internal consistency in typically developing children (α = 0.61; McKown et al., 2009; Russo-Ponsaran et al., 2015) and children with ASD (Name Affect α = 0.55, Three Faces α = 0.55; Russo-Ponsaran et al., 2015) for the nonverbal awareness subtests of the CATS. The Name Affect subtest asks participants to label emotions in adult face photographs by choosing one of the feelings provided. The Three Faces subtest also uses adult faces and asks participants to match faces displaying differing intensity levels of the same emotion. The two subtests of the CATS were utilized by Russo-Ponsaran and colleagues (2014, 2016) together with the MiX, DANVA, and NEPSY-II AR to evaluate the coach-assisted MiX program. Similar to the results for the MiX, in a pilot study of three children, all improved on the CATS following the intervention, and the effect was large. However, in the follow-up study, no significant pre-to-post-treatment change on the CATS subtests was found, despite small to large effects (Russo-Ponsaran et al., 2016). The inclusion of multiple FER measures in this study makes it possible to compare the ability of the measures to detect change. While participants showed FER improvement on the DANVA, NEPSY-II AR, and MiX, they did not show significant improvement on the CATS subtests, which suggests that the CATS may be less sensitive to FER change with intervention, compared to the other measures.

Emotion Recognition and Display Survey (ERDS; Thomeer et al., 2011): the ERDS is unique in the context of the other instruments included in this review, as it is not a program or stimulus set, but rather a rating scale used to assess the child’s ability to recognize and display emotions. Thomeer and colleagues (2011) developed the 54-item measure for an open-trial pilot of Mind Reading with in vivo rehearsal. The parent provides ratings for the child’s ability to recognize and display 27 emotions, ranging from 1 (almost never) to 5 (almost always). Both basic and complex emotions are included. Thomeer and colleagues reported high internal consistency (0.90) for the Recognition scale for their sample.

In their pilot study (Thomeer et al., 2011) as well as their follow-up RCT (Thomeer et al., 2015) of the Mind Reading program with in vivo rehearsal, Thomeer and colleagues found significant improvement in parent rating on the ERDS. In their RCT evaluating the efficacy of Mind Reading as a component of a comprehensive psychosocial treatment, Lopata et al. (2016) found that, while there was no significant time × treatment condition interaction, there was a main effect of time for parent and clinician ratings, with both groups (i.e. summer treatment, summer treatment with Mind Reading component) improving. While these three studies show ERDS to be a promising measure for evaluating FER change in individuals with ASD, it has been utilized only by one group and only to evaluate programs with the Mind Reading component.

Facial Expressions of Emotion Stimuli and Test (Ekman60; A. W. Young et al., 2002): Ekman60 consists of 60 black and white images taken from the Ekman and Friesen (1976) stimuli set. The set consists of ten example photographs of facial expressions for each of six basic emotions (happy, sad, surprise, anger, fear, and disgust). The faces are presented one at a time for 5 s each, followed by a blank screen, at which point the participant is asked to pick the emotion name that best describes the facial expression. Ekman60 has high test–retest reliability (rs = 0.77; C. Williams et al., 2009).

Didehbani and colleagues (2016) used Ekman60 Faces in addition to the NEPSY-II AR task described above to measure affect recognition in their evaluation of the impact of a Virtual Reality Social Skills Intervention to enhance social skills in children with ASD. While they found significant increases in the NEPSY-II AR task following the intervention, they did not find statistically significant changes on Ekman60. Kandalaft et al. (2013), however, found significant changes on Ekman60 following this intervention, even though they did not find significant differences on the RMET measure described above.

Level 3 measures: measures used in only one study

Each of the remaining 25 measures was utilized in a single study. However, many of these studies utilized the same stimuli (i.e. Ekman & Friesen, 1976 Pictures of Facial Affect stimuli) to create different measures or protocols to assess FER.

Adaptations of the Ekman and Friesen (1976) stimuli: the Pictures of Facial Affect by Ekman and Friesen consist of black and white photographs of facial expressions that have been widely used for the past four decades. The stimuli include males and females displaying the six basic emotions, as well as neutral expressions. Eight measures across seven identified studies utilized these stimuli in different ways. In addition to the Ekman60 Faces described in the prior section, studies utilized the Ekman and Friesen stimuli alongside additional stimuli, such as schematic drawings (Hopkins et al., 2011), color pictures from Mind Reading software and black and white schematic cartoon faces (Lacava et al., 2010), cartoon/emoticon faces (Beadle-Brown et al., 2017), and NimStim stimuli (Tottenham et al., 2009) presented at the same time on the other side of the screen (Hadjikhani et al., 2015). Ryan and Charragáin (2010) utilized 24 images from the stimuli set that they laminated and presented in a book format. In addition, the stimuli have been used to create Emotion Identification and Emotion Matching tasks (B. T. Williams et al., 2012), and one study adapted these stimuli by extracting eye and mouth regions from the Ekman stimuli (Domes et al., 2014). While the Ekman and Friesen stimuli have been widely used, no direct comparisons across the actual measures can be made, given task differences.

Other assessment measures. The remaining 17 identified measures, which did not utilize the Ekman and Friesen (1976) stimuli, were each utilized in only one study of FER change in individuals with ASD. The studies are briefly described in Table 1.

Of note, significant improvement in FER was seen on only seven of the 17 measures. Three of these measures include static images: the Emotion Recognition Test (ERT) utilized by Bölte et al. (2015); the Faces Task developed by Baron-Cohen et al. (1997) and utilized by R. L. Young and Posselt (2012); and the Overt Emotion Sensitivity Task, with stimuli derived from the Karolinska Directed Emotional Faces database, utilized by Quintana and colleagues (2017). Petrovska and Trajkovski (2019) utilized visual material from multiple sources to form the Emotion Comprehension Test (ECT), in which examinees evaluated photographs of facial expressions, pictograms, and situation-based emotional scenes in three tasks. The other three measures included colored cartoons of emotions (Emotion Recognition Cartoons utilized by Silver & Oakes, 2001), facial expression video clips (Face task from Mind Reading utilized by Fridenson-Hayo et al., 2017), and a complex emotion scale including pictures in the form of cards (Cheng et al., 2018). Five studies found only a significant main effect of time when evaluating FER measures (i.e. Beaumont & Sofronoff, 2008; Chung et al., 2016; Dadds et al., 2014; Miyahara et al., 2010; Silver & Oakes, 2001).

Discussion

FER difficulties have been widely documented in individuals with ASD (e.g. Lozier et al., 2014). Numerous interventions have been designed to specifically address FER difficulties in this population and yet, studies have yielded mixed results. A multitude of assessments are used to examine change in FER with intervention in individuals with ASD. Differences in stimuli (e.g. static vs dynamic), modality (e.g. questionnaire, computer task), and demands (e.g. task duration) of these assessments make it difficult to determine the degree to which FER impairment is, or is not, amenable to therapeutic remediation. Of note, in order to compare sensitivity of the measures across studies, we had to presuppose that the interventions were of approximately equal potency. An assumption of equivalence is almost certainly flawed; as such, results should be interpreted in that context. However, our categorization into three levels based on use by different teams and in multiple studies partially mitigates this concern. For instance, the most widely used measures have been applied across different treatments.
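The three-level categorization referred to above can be expressed as a simple rule. The following is a minimal sketch, assuming each measure is described by a list holding one research-team label per study that used it; the function name, the team labels, and the example records are hypothetical placeholders and do not reproduce the reviewed data.

```python
def measure_level(study_teams):
    """Return 1, 2, or 3 for a measure, given one team label per study that used it."""
    if not study_teams:
        raise ValueError("a measure must appear in at least one study")
    if len(study_teams) == 1:
        return 3  # used in only one study
    if len(set(study_teams)) == 1:
        return 2  # used in several studies, all by the same team
    return 1  # used in several studies by different teams


# Hypothetical usage records (placeholder names, not the reviewed measures).
usage = {
    "measure_a": ["team_1", "team_2", "team_3"],
    "measure_b": ["team_4", "team_4"],
    "measure_c": ["team_5"],
}
levels = {name: measure_level(teams) for name, teams in usage.items()}
print(levels)
```

Under this rule, a measure moves to Level 1 only once at least two distinct teams have used it, which is the property that allows a measure to be examined across different treatments.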

In the context of the rich history of FER intervention research in ASD, results of this systematic review highlight limited agreement with respect to how to assess this process. In the 65 identified articles, 36 different measures were utilized. However, only six measures (i.e. DANVA2, RMET, CAM/-C, NEPSY-II AR, Penn Emotion Recognition Task, Situation-Facial Expression Matching Task) were utilized across study teams. Of note, the Penn Emotion Recognition Task was utilized in only two studies, both of which showed no significant change in FER following the intervention. The Situation-Facial Expression Matching Task was used in three studies all evaluating the same intervention. Research utilizing the other four main measures has found medium-to-large effects with respect to change in FER (DANVA2: d = 0.52–1.27; RMET: d = 0.35–1.2; CAM-C: d = 0.25–1.64; NEPSY-II AR: d = 0.40–1.70). Even within these measures, however, the results are inconsistent, as some studies find significant effects while others do not. This pattern of mixed results is seen across both within-subject and between-subject (control group) designs. Additional small-to-medium effects were found for many studies, even when the change was not statistically significant. These results suggest that FER is in fact modifiable in individuals with ASD, and therefore, additional concerted work on measurement in this domain is needed.
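For readers less familiar with the effect size metric reported above, the following is a minimal sketch of one common way to compute Cohen's d for two independent groups, using a pooled standard deviation. It is an illustration only: the reviewed studies may have used other variants (e.g. within-subject formulas), the scores are invented for the example, and the labels follow the conventional benchmarks of 0.2, 0.5, and 0.8.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups, using a pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def label_effect(d):
    """Map |d| onto the conventional small/medium/large benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical post-treatment FER scores (number correct); not data from any reviewed study.
treatment_post = [22, 25, 19, 24, 27, 21]
control_post = [20, 21, 18, 23, 22, 19]

d = cohens_d(treatment_post, control_post)
print(round(d, 2), label_effect(d))
```

Because d is expressed in standard deviation units, it can be compared across measures with different scoring ranges, although such comparisons remain only as sound as the assumption that the underlying interventions and designs are comparable.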

There is little evidence pointing to a single robust measure that is sensitive to change when evaluating treatment. The CAM/CAM-C is the only measure for which all nine identified studies found a significant improvement in FER following the intervention, suggesting its potential utility. However, as only four of the studies included a control condition, and all but one evaluated the same computerized intervention, either on its own or as part of a larger intervention, its utility in judging the impact of treatments and comparing relative efficacy of different interventions is limited. In addition, the stimuli used to measure change in FER are similar to those in the Mind Reading intervention program, limiting generalizability. The NEPSY-II AR, however, has been evaluated utilizing various interventions by different research teams. While only four of the 10 identified studies found significant change in FER, the effects were medium to large, and three of these four studies included a control condition. Given the small number of studies utilizing this measure, however, further research into its utility is necessary. This highlights the need for additional work in this area, in order to facilitate judgments regarding the impact of treatment or comparisons of relative efficacy.

Twenty-four of the 65 identified articles utilized multiple FER measures to assess change. In eight of these 24 articles, variability was seen across the measures, such that improvement was seen on one measure while no change was observed on another. For example, Anagnostou et al. (2012) found no significant improvement following intervention on the DANVA2 but found significant improvement on the RMET. Guastella and colleagues (2015) reported the opposite pattern, with no significant drug group × time interaction for the RMET but a significant interaction for the DANVA2. Even within the same intervention and measure, inconsistent patterns emerge. For example, while R. L. Young and Posselt (2012) found significant change in NEPSY-II AR following The Transporters intervention, B. T. Williams and colleagues (2012) found no significant intervention by time effects on the same measure, using the same intervention. It is unclear what factors contribute to these inconsistencies.

The FEFA and MiX are both Level 2 measures that show promise given the significant improvements reported across studies, even though the studies were conducted by the same research team. For both the FEFA and MiX measures, however, the stimuli that are utilized to measure change in FER are similar to those presented to the participants during the intervention phase. These types of measures may detect change in the taught skills; however, generalization to other stimuli, which is the ultimate goal of any intervention program, is not demonstrated.

Overall, the current body of literature suggests that the DANVA2 is the most commonly used measure; however, it includes only static photos of adult and child facial expressions and focuses on only a narrow set of basic emotions. The RMET is also commonly used; however, it includes only black and white pictures of the eye region of the face, and performance depends on age and on full-scale and verbal IQ (Peñuelas-Calvo et al., 2019). For colored photographs of children’s faces specifically, the NEPSY-II AR may be a good option for experimenters. To measure FER dynamically, the CAM-C should be considered, as it is the only measure utilizing dynamic, colored video clips that has been used across more than one study and more than one research team. However, in order to assess generalization of the skill above and beyond the immediate content taught in the intervention, a task separate from the one evaluated by the intervention, or a multi-layered assessment, rather than a single measure, may be required.

Limitations

This review should be considered in light of several limitations. Given the inconsistency in what information is presented in the identified articles, there is variability in the level of detail provided for the measures. In addition, given differences in methodologies (e.g. use of control group), data analytic approach, and presentation of results, firm conclusions about “evidence of change” across studies cannot be drawn. A variety of intervention approaches were used as well, and an in-depth examination of the interventions (e.g. dosage, format) is beyond the scope of this review. However, whenever possible, an effect size was reported to allow for comparisons across studies. Evaluating a measure based on sensitivity to change with intervention is complicated; even a sound measure cannot detect change with an ineffective intervention. In addition, although the list of keywords used to search the databases was constrained to a relatively small number, use of a snowball search approach should have helped mitigate the risk of missed studies. Finally, the majority of the studies utilized samples of cognitively able participants, limiting generalizability.

Future directions

There are inconsistencies regarding the effects of intervention targeting FER impairments in ASD. Given the current state of research, it is unclear whether this inconsistency is due to noise in the measurement, methodological differences, or true differences in the intervention impact. In order for the field to advance and more efficiently converge regarding FER intervention efficacy and outcome assessment, researchers are encouraged to utilize existing measures whenever possible, without alteration, in order to make interpretation more straightforward. In addition, future studies should evaluate measures that show promise across studies (i.e. Level 1), such as the CAM-C (Golan et al., 2015). Overall, the findings from this systematic review suggest that FER is modifiable and that more concerted work on measurement of FER abilities in individuals with ASD is needed, in order to allow for sensitive evaluation of impact and determination of relative efficacy of different interventions targeting FER in this population.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The funding source is R21/R33MH100268.

ORCID iD

Andrea Trubanova Wieckowski

References

1. Adolphs (2002). Neural systems for recognizing emotion. Current Opinion in Neurobiology, 12(2), 169–177.

2. Anagnostou, Soorya, Chaplin, Bartz, Halpern, Wasserman, … Hollander (2012). Intranasal oxytocin versus placebo in the treatment of adults with autism spectrum disorders: A randomized controlled trial. Molecular Autism, 3(1), 16.

3. Ashwin, Chapman, Colle, & Baron-Cohen (2006). Impaired recognition of negative emotions in autism: A test of the amygdala theory. Social Neuroscience, 1(3–4), 349–363.

4. Aspan, Vida, Gadoros, & Halasz (2013). Conduct symptoms and emotion recognition in adolescent boys with externalization problems. The Scientific World Journal, 2013, Article 826108. http://doi.org/10.1155/2013/826108

5. Baghdadli, Brisot, Henry, Michelon, Soussana, Rattaz, & Picot, M. C. (2013). Social skills improvement in children with high-functioning autism: A pilot randomized controlled trial. European Child & Adolescent Psychiatry, 22(7), 433–442.

6. Barnhill, G. P., Tapscott Cook, Tebbenkamp, & Smith Myles (2002). The effectiveness of social skills intervention targeting nonverbal communication for adolescents with Asperger syndrome and related pervasive developmental delays. Focus on Autism and Other Developmental Disabilities, 17(2), 112–118.

7. Baron-Cohen, Golan, Wheelwright, & Hill, J. J. (2004). Mind reading: The interactive guide to emotions. Jessica Kingsley.

8. Baron-Cohen, Jolliffe, Mortimore, & Robertson (1997). Another advanced test of theory of mind: Evidence from very high functioning adults with autism or Asperger syndrome. Journal of Child Psychology and Psychiatry, 38(7), 813–822.

9. Baron-Cohen, Wheelwright, Hill, Raste, & Plumb (2001). The “Reading the Mind in the Eyes” Test revised version: A study with normal adults, and adults with Asperger syndrome or high-functioning autism. Journal of Child Psychology and Psychiatry, 42(2), 241–251.

10. Baron-Cohen, Wheelwright, Spong, Scahill, & Lawson (2001). Are intuitive physics and intuitive psychology independent? A test with children with Asperger Syndrome. Journal of Developmental and Learning Disorders, 5, 47–78.

11. Beadle-Brown, Wilkinson, Richardson, Shaughnessy, Trimingham, Leigh, … Himmerich (2017). Imagining Autism: Feasibility of a drama-based intervention on the social, communicative and imaginative behaviour of children with autism. Autism, 22(8), 915–927.

12. Beaumont & Sofronoff (2008). A multi-component social skills intervention for children with Asperger syndrome: The Junior Detective Training Program. Journal of Child Psychology and Psychiatry, 49(7), 743–753.

13. Berggren, Fletcher-Watson, Milenkovic, Marschik, P. B., Bölte, & Jonsson (2018). Emotion recognition training in autism spectrum disorder: A systematic review of challenges related to generalizability. Developmental Neurorehabilitation, 21(3), 141–154.

14. Black, M. H., Chen, N. T., Iyer, K. K., Lipp, O. V., Bölte, Falkmer, … Girdler (2017). Mechanisms of facial emotion recognition in autism spectrum disorders: Insights from eye tracking and electroencephalography. Neuroscience & Biobehavioral Reviews, 80, 488–515.

15. Black, M. H., Chen, N. T., Lipp, O. V., Bölte, & Girdler (2020). Complex facial emotion recognition and atypical gaze patterns in autistic adults. Autism, 24(1), 258–262.

16. Bölte, Ciaramidaro, Schlitt, Hainz, Kliemann, Beyer, … Walter (2015). Training-induced plasticity of the social brain in autism spectrum disorder. The British Journal of Psychiatry, 207(2), 149–157.

17. Bölte, Feineis-Matthews, Leber, Dierks, Hubl, & Poustka (2002). The development and evaluation of a computer-based program to test and to teach the recognition of facial affect. International Journal of Circumpolar Health, 61(Suppl. 2), 61–68.

18. Bölte, Hubl, Feineis-Matthews, Prvulovic, Dierks, & Poustka (2006). Facial affect recognition training in autism: Can we animate the fusiform gyrus? Behavioral Neuroscience, 120(1), 211–216.

19. Boraston, Blakemore, S. J., Chilvers, & Skuse (2007). Impaired sadness recognition is linked to social interaction deficit in autism. Neuropsychologia, 45(7), 1501–1510.

20. Capps, Yirmiya, & Sigman (1992). Understanding of simple and complex emotions in non-retarded children with autism. Journal of Child Psychology and Psychiatry, 33, 1169–1182.

21. Chambless, D. L., Baker, M. J., Baucom, D. H., Beutler, L. E., Calhoun, K. S., Crits-Christoph, … Johnson, S. B. (1998). Update on empirically validated therapies, II. The Clinical Psychologist, 51(1), 3–16.

22. Cheng, Luo, S. Y., Lin, H. C., & Yang, C. S. (2018). Investigating mobile emotional learning for children with autistic spectrum disorders. International Journal of Developmental Disabilities, 64(1), 25–34.

23. Chung, U. S., Han, D. H., Shin, Y. J., & Renshaw, P. F. (2016). A prosocial online game for social cognition training in adolescents with high-functioning autism: An fMRI study. Neuropsychiatric Disease and Treatment, 12, 651–660.

24. Corbett, B. A., Gunther, J. R., Comins, Price, Ryan, Simon, … Rios (2011). Brief report: Theatre as therapy for children with autism spectrum disorder. Journal of Autism and Developmental Disorders, 41(4), 505–511.

25. Corbett, B. A., Swain, D. M., Coke, Simon, Newsom, Houchins-Juarez, … Song (2014). Improvement in social deficits in autism spectrum disorders using a theatre-based, peer-mediated intervention. Autism Research, 7(1), 4–16.

26. Dadds, M. R., Hawes, & Merz (2004). The UNSW facial emotion task. Department of Psychology, University of New South Wales.

27. Dadds, M. R., MacDonald, Cauchi, Williams, Levy, & Brennan (2014). Nasal oxytocin for social deficits in childhood autism: A randomized controlled trial. Journal of Autism and Developmental Disorders, 44(3), 521–531.

28. Didehbani, Allen, Kandalaft, Krawczyk, & Chapman (2016). Virtual reality social cognition training for children with high functioning autism. Computers in Human Behavior, 62, 703–711.

29. Domes, Kumbier, Heinrichs, & Herpertz, S. C. (2014). Oxytocin promotes facial emotion recognition and amygdala reactivity in adults with Asperger syndrome. Neuropsychopharmacology, 39(3), 698–706.

30. Eack, S. M., Greenwald, D. P., Hogarty, S. S., Bahorik, A. L., Litschge, M. Y., Mazefsky, C. A., & Minshew, N. J. (2013). Cognitive enhancement therapy for adults with autism spectrum disorder: Results of an 18-month feasibility study. Journal of Autism and Developmental Disorders, 43(12), 2866–2877.

31. Ekman (1992). An argument for basic emotions. Cognition & Emotion, 6(3–4), 169–200.

32. Ekman (1993). Facial expression and emotion. American Psychologist, 48(4), 384–392.

33. Ekman & Friesen, W. V. (1975). Unmasking the face. Englewood Cliffs, NJ: Prentice Hall.

34. Ekman & Friesen, W. V. (1976). Measuring facial movement. Environmental Psychology and Nonverbal Behavior, 1(1), 56–75.

35. Ekman & Friesen, W. V. (1978). Manual for the facial action coding system. Consulting Psychologists Press.

36. Ekman, Friesen, W. V., & Tomkins, S. S. (1971). Facial affect scoring technique: A first validity study. Semiotica, 3(1), 37–58.

37. Fridenson-Hayo, Berggren, Lassalle, Tal, Pigat, Meir-Goren, … Golan (2017). “Emotiplay”: A serious game for learning about emotions in children with autism: Results of a cross-cultural evaluation. European Child & Adolescent Psychiatry, 26(8), 979–992.

38. Friedrich, E. V., Sivanathan, Lim, Suttie, Louchart, Pillen, & Pineda, J. A. (2015). An effective neurofeedback intervention to improve social interactions in children with autism spectrum disorder. Journal of Autism and Developmental Disorders, 45(12), 4084–4100.

39. Gev, Rosenan, & Golan (2017). Unique effects of the transporters animated series and of parental support on emotion recognition skills of children with ASD: Results of a randomized controlled trial. Autism Research, 10(5), 993–1003.

40. Golan, Ashwin, Granader, McClintock, Day, Leggett, & Baron-Cohen (2010). Enhancing emotion recognition in children with autism spectrum conditions: An intervention using animated vehicles with real emotional faces. Journal of Autism and Developmental Disorders, 40(3), 269–279.

41. Golan & Baron-Cohen (2006). Systemizing empathy: Teaching adults with Asperger syndrome or high-functioning autism to recognize complex emotions using interactive multimedia. Development and Psychopathology, 18(2), 591–617.

42. Golan, Baron-Cohen, & Golan (2008). The “reading the mind in films” task [child version]: Complex emotion and mental state recognition in children with and without autism spectrum conditions. Journal of Autism and Developmental Disorders, 38(8), 1534–1541.

43. Golan, Baron-Cohen, & Hill (2006). The Cambridge mindreading (CAM) face-voice battery: Testing complex emotion recognition in adults with and without Asperger syndrome. Journal of Autism and Developmental Disorders, 36(2), 169–183.

44. Golan, Sinai-Gavrilov, & Baron-Cohen (2015). The Cambridge Mindreading Face-Voice Battery for Children (CAM-C): Complex emotion recognition in children with and without autism spectrum conditions. Molecular Autism, 6, Article 22.

45. Greenhalgh & Peacock (2005). Effectiveness and efficiency of search methods in systematic reviews of complex evidence: Audit of primary sources. British Medical Journal, 331(7524), 1064–1065.

46. Grossman, J. B., Klin, Carter, A. S., & Volkmar, F. R. (2000). Verbal bias in recognition of facial emotions in children with Asperger syndrome. Journal of Child Psychology and Psychiatry, 41, 369–379.

47. Guastella, A. J., Einfeld, S. L., Gray, K. M., Rinehart, N. J., Tonge, B. J., Lambert, T. J., & Hickie, I. B. (2010). Intranasal oxytocin improves emotion recognition for youth with autism spectrum disorders. Biological Psychiatry, 67(7), 692–694.

48. Guastella, A. J., Gray, K. M., Rinehart, N. J., Alvares, G. A., Tonge, B. J., Hickie, I. B., … Einfeld, S. L. (2015). The effects of a course of intranasal oxytocin on social behaviors in youth diagnosed with autism spectrum disorders: A randomized controlled trial. Journal of Child Psychology and Psychiatry, 56(4), 444–452.

49. Guli, L. A., Semrud-Clikeman, Lerner, M. D., & Britton (2013). Social Competence Intervention Program (SCIP): A pilot study of a creative drama program for youth with social difficulties. The Arts in Psychotherapy, 40(1), 37–44.

50. Gur, R. C., Sara, Hagendoorn, Marom, Hughett, Macy, … Gur, R. E. (2002). A method for obtaining 3-dimensional facial expressions and its standardization for use in neurocognitive studies. Journal of Neuroscience Methods, 115(2), 137–143.

51. Hadjikhani, Zürcher, N. R., Rogier, Ruest, Hippolyte, Ben-Ari, & Lemonnier (2015). Improving emotional face perception in autism with diuretic bumetanide: A proof-of-concept behavioral and functional brain imaging pilot study. Autism, 19(2), 149–157.

52. Harms, M. B., Martin, & Wallace, G. L. (2010). Facial emotion recognition in autism spectrum disorders: A review of behavioral and neuroimaging studies. Neuropsychology Review, 20(3), 290–322.

53. Hopkins, I. M., Gower, M. W., Perez, T. A., Smith, D. S., Amthor, F. R., Wimsatt, F. C., & Biasini, F. J. (2011). Avatar assistant: Improving social skills in students with an ASD through a computer-based intervention. Journal of Autism and Developmental Disorders, 41(11), 1543–1555.

54. Howlin, Baron-Cohen, & Hadwin (1999). Teaching children with autism to mind-read: A practical guide for teachers and parents. John Wiley.

55. Humphreys, Minshew, Leonard, G. L., & Behrmann (2007). A fine-grained analysis of facial expression processing in high-functioning adults with autism. Neuropsychologia, 45, 685–695.

56. Hunter (2014). Shakespeare’s heartbeat: Drama games for children with autism. Routledge.

57. Jenness, J. L., Hankin, B. L., Young, J. F., & Gibb, B. E. (2014). Misclassification and identification of emotional facial expressions in depressed youth: A preliminary study. Journal of Clinical Child and Adolescent Psychology, 44(4), 559–565. https://doi.org/10.1080/15374416.2014.891226

58. Kandalaft, M. R., Didehbani, Krawczyk, D. C., Allen, T. T., & Chapman, S. B. (2013). Virtual reality social cognition training for young adults with high-functioning autism. Journal of Autism and Developmental Disorders, 43(1), 34–44.

59. Kerr, S. L., & Neale, J. M. (1993). Emotion perception in schizophrenia: Specific deficit or further evidence of generalized poor performance? Journal of Abnormal Psychology, 102(2), 312–318.

60. Kohler, C. G., Turner, T. H., Bilker, W. B., Brensinger, C. M., Siegel, S. J., Kanes, S. J., … Gur, R. C. (2003). Facial emotion recognition in schizophrenia: Intensity effects and error pattern. American Journal of Psychiatry, 160(10), 1768–1774.

61. Korkman, Kirk, & Kemp (2007). NEPSY-II: Clinical and interpretive manual. Psychological Corporation.

62. LaCava, P. G., Golan, Baron-Cohen, & Smith Myles (2007). Using assistive technology to teach emotion recognition to students with Asperger syndrome: A pilot study. Remedial and Special Education, 28(3), 174–181.

63. LaCava, P. G., Rankin, Mahlios, Cook, & Simpson, R. L. (2010). A single case design evaluation of a software and tutor intervention addressing emotion recognition and social interaction in four boys with ASD. Autism, 14(3), 161–178.

64. Lee, C. S., Lam, S. H., Tsang, S. T., Yuen, C. M., & C. K. (2018). The effectiveness of technology-based intervention in improving emotion recognition through facial expression in people with autism spectrum disorder: A systematic review. Review Journal of Autism and Developmental Disorders, 5(2), 91–104.

65. Lee, J. S., Kang, N. R., Kim, H. J., & Kwak, Y. S. (2018). Discriminative effects of social skills training on facial emotion recognition among children with attention-deficit/hyperactivity disorder and autism spectrum disorder. Journal of the Korean Academy of Child and Adolescent Psychiatry, 29(4), 150–160.

66. Lerner, M. D., Mikami, A. Y., & Levine (2011). Socio-dramatic affective-relational intervention for adolescents with Asperger syndrome & high functioning autism: Pilot study. Autism, 15(1), 21–42.

67. Lopata, Lipinski, A. M., Thomeer, M. L., Rodgers, J. D., Donnelly, J. P., McDonald, C. A., & Volker, M. A. (2017). Open-trial pilot study of a comprehensive outpatient psychosocial treatment for children with high-functioning autism spectrum disorder. Autism, 21(1), 108–116.

68. Lopata, Thomeer, M. L., Lipinski, A. M., Donnelly, J. P., Nelson, A. T., Smith, R. A., … Volker, M. A. (2015). RCT examining the effect of treatment intensity for a psychosocial treatment for high-functioning children with ASD. Research in Autism Spectrum Disorders, 17, 52–63.

69. Lopata, Thomeer, M. L., Rodgers, J. D., Donnelly, J. P., & McDonald, C. A. (2016). RCT of mind reading as a component of a psychosocial treatment for high-functioning children with ASD. Research in Autism Spectrum Disorders, 21, 25–36.

70. Lopata, Thomeer, M. L., Rodgers, J. D., Donnelly, J. P., McDonald, C. A., Volker, M. A., … Wang (2019). Cluster randomized trial of a school intervention for children with autism spectrum disorder. Journal of Clinical Child & Adolescent Psychology, 48(6), 922–933.

71. Lopata, Thomeer, M. L., Volker, M. A., Lee, G. K., Smith, T. H., Rodgers, J. D., … Toomey, J. A. (2013). Open-trial pilot study of a comprehensive school-based intervention for high-functioning autism spectrum disorders. Remedial and Special Education, 34(5), 269–281.

72. Lopata, Thomeer, M. L., Volker, M. A., Lee, G. K., Smith, T. H., Smith, R. A., & Toomey, J. A. (2012). Feasibility and initial efficacy of a comprehensive school-based intervention for high-functioning autism spectrum disorders. Psychology in the Schools, 49(10), 963–974.

73. Lopata, Thomeer, M. L., Volker, M. A., Nida, R. E., & Lee, G. K. (2008). Effectiveness of a manualized summer social treatment program for high-functioning children with autism spectrum disorders. Journal of Autism and Developmental Disorders, 38(5), 890–904.

74. Lopata, Thomeer, M. L., Volker, M. A., Toomey, J. A., Nida, R. E., Lee, G. K., … Rodgers, J. D. (2010). RCT of a manualized social treatment for high-functioning autism spectrum disorders. Journal of Autism and Developmental Disorders, 40(11), 1297–1310.

75. Lopata, Toomey, J. A., Thomeer, M. L., McDonald, C. A., Fox, J. D., Smith, R. A., … Lipinski, A. M. (2015). Community trial of a comprehensive psychosocial treatment for HFASDs. Focus on Autism and Other Developmental Disabilities, 30(2), 115–125.

76. Lordo, D. N., Bertolin, Sudikoff, E. L., Keith, Braddock, & Kaufman, D. A. (2017). Parents perceive improvements in socio-emotional functioning in adolescents with ASD following social skills treatment. Journal of Autism and Developmental Disorders, 47(1), 203–214.

77. Lozier, L. M., Vanmeter, J. W., & Marsh, A. A. (2014). Impairments in facial affect recognition associated with autism spectrum disorders: A meta-analysis. Development and Psychopathology, 26(4), 933–945. https://doi.org/10.1017/S0954579414000479

78. Matsumoto & Ekman (1997). Japanese and Caucasian facial expressions of emotion (JACFEE): Reliability data and cross-national differences. Journal of Nonverbal Behavior, 21, 3–21.

79. Matsumoto & Hwang, H. S. (2011). Evidence for training the ability to read microexpressions of emotion. Motivation and Emotion, 35, 181–191. https://doi.org/10.1007/s11031-011-9212-2

80. McKown, Gumbiner, L. M., Russo, N. M., & Lipton (2009). Social-emotional learning skill, self-regulation, and social competence in typically developing and clinic-referred children. Journal of Clinical Child & Adolescent Psychology, 38(6), 858–871.

81. Mehling, M. H., Tassé, M. J., & Root (2017). Shakespeare and autism: An exploratory evaluation of the Hunter Heartbeat Method. Research and Practice in Intellectual and Developmental Disabilities, 4(2), 107–120.

82. Merten (2005). Culture, gender and the recognition of the basic emotions. Psychologia, 48(4), 306–316.

83. Miyahara, Ruffman, Fujita, & Tsujii (2010). How well can young people with Asperger’s disorder recognize threat and learn about affect in faces? A pilot study. Research in Autism Spectrum Disorders, 4(2), 242–248.

84. Nowicki (1997). Instructional manual for the receptive tests of the Diagnostic Analysis of Nonverbal Accuracy 2. Peachtree.

85. Nowicki & Duke, M. P. (2008). Manual for the receptive tests of the diagnostic analysis of nonverbal accuracy 2 (DANVA2). Department of Psychology, Emory University.

86. Nowicki & Duke, M. P. (1994). Individual differences in the nonverbal communication of affect: The Diagnostic Analysis of Nonverbal Accuracy Scale. Journal of Nonverbal Behavior, 18(1), 9–35.

87. Olszanowski, Pochwatko, Kuklinski, Scibor-Rylski, Lewinski, & Ohme, R. K. (2015). Warsaw set of emotional facial expression pictures: A validation study of facial display photographs. Frontiers in Psychology, 5, Article 1516. https://doi.org/10.3389/fpsyg.2014.01516

88. O’Reilly, Pigat, Fridenson, Berggren, Tal, Golan, … Lundqvist (2016). The EU-emotion stimulus set: A validation study. Behavior Research Methods, 48(2), 567–576.

89. Pelphrey, K. A., Sasson, N. J., Reznick, J. S., Paul, Goldman, B. D., & Piven (2002). Visual scanning of faces in autism. Journal of Autism and Developmental Disorders, 32(4), 249–261. https://doi.org/10.1023/A:1016374617369

90. Peñuelas-Calvo, Sareen, Sevilla-Llewellyn-Jones, & Fernández-Berrocal (2019). The “Reading the Mind in the Eyes” Test in autism-spectrum disorders comparison with healthy controls: A systematic review and meta-analysis. Journal of Autism and Developmental Disorders, 49(3), 1048–1061.

91. Petrovska, I. V., & Trajkovski (2019). Effects of a computer-based intervention on emotion understanding in children with autism spectrum conditions. Journal of Autism and Developmental Disorders, 49(10), 4244–4255.

92. Quintana, D. S., Westlye, L. T., Hope, Nærland, Elvsåshagen, Dørum, … Stensønes (2017). Dose-dependent social-cognitive effects of intranasal oxytocin delivered with novel Breath Powered device in adults with autism spectrum disorder: A randomized placebo-controlled double-blind crossover trial. Translational Psychiatry, 7(5), 1–9.

93. Rice, L. M., Wall, C. A., Fogel, & Shic (2015). Computer-assisted face processing instruction improves emotion recognition, mentalizing, and social skills in students with ASD. Journal of Autism and Developmental Disorders, 45(7), 2176–2186.

94. Richard, D. A., & Joy, S. P. (2015). Recognizing emotions: Testing an intervention for children with autism spectrum disorders. Art Therapy, 32(1), 13–19.

95. Rump, K. M., Giovannelli, J. L., Minshew, N. J., & Strauss, M. S. (2009). The development of emotion recognition in individuals with autism. Child Development, 80(5), 1434–1447.

96. Russo-Ponsaran, N. M., Evans-Smith, Johnson, J. K., & McKown (2014). A pilot study assessing the feasibility of a facial emotion training paradigm for school-age children with autism spectrum disorders. Journal of Mental Health Research in Intellectual Disabilities, 7(2), 169–190.

97. Russo-Ponsaran, N. M., Evans-Smith, Johnson, J. K., Russo, & McKown (2016). Efficacy of a facial emotion training program for children and adolescents with autism spectrum disorders. Journal of Nonverbal Behavior, 40(1), 13–38.

98. Russo-Ponsaran, N. M., McKown, Johnson, J. K., Allen, A. W., Evans-Smith, & Fogg (2015). Social-emotional correlates of early stage social information processing skills in children with and without autism spectrum disorder. Autism Research, 8(5), 486–496.

99. Ryan & Charragáin, C. N. (2010). Teaching emotion recognition skills to children with autism. Journal of Autism and Developmental Disorders, 40(12), 1505–1511.

100.

Schmidt

Stichter

J. P.

Lierheimer

McGhee

O’Connor

K. V.

(2011). An initial investigation of the generalization of a school-based social competence intervention for youth with high-functioning autism. Autism Research and Treatment, 2011, Article 589539. https://doi.org/10.1155/2011/589539

101.

Shanok

N. A.

Jones

N. A.

Lucas

N. N.

(2019). The nature of facial emotion recognition impairments in children on the autism spectrum. Child Psychiatry & Human Development, 50(4), 661–667.

102.

Silver

Oakes

(2001). Evaluation of a new computer intervention to teach people with autism or Asperger syndrome to recognize and predict emotions in others. Autism, 5(3), 299–316.

103.

Soken

N. H.

Pick

A. D.

(1992). Intermodal perception of happy and angry expressive behaviors by seven-month-old infants. Child Development, 63(4), 787–795.

104.

Solomon

Goodlin-Jones

B. L.

Anders

T. F.

(2004). A social adjustment enhancement intervention for high functioning autism, Asperger’s syndrome, and pervasive developmental disorder NOS. Journal of Autism and Developmental Disorders, 34(6), 649–668.

105.

Spence

(1980). Social skills training with children and adolescents: A counsellor’s manual. NFER-Nelson.

106.

Spence

S. H.

(1995). Assessment of perception of emotion from facial expression. In Social skills training: Enhancing social competence with children and adolescents: Photocopiable resource book. NFER-Nelson.

107.

Stichter

J. P.

Herzog

M. J.

Visovsky

Schmidt

Randolph

Schultz

Gage

(2010). Social competence intervention for youth with Asperger syndrome and high-functioning autism: An initial investigation. Journal of Autism and Developmental Disorders, 40(9), 1067–1079.

108.

Stichter

J. P.

Laffey

Galyen

Herzog

(2014). iSocial: Delivering the social competence intervention for adolescents (SCI-A) in a 3D virtual learning environment for youth with high functioning autism. Journal of Autism and Developmental Disorders, 44(2), 417–430.

109.

Stichter

J. P.

O’Connor

K. V.

Herzog

M. J.

Lierheimer

McGhee

S. D.

(2012). Social competence intervention for elementary students with Asperger’s syndrome and high functioning autism. Journal of Autism and Developmental Disorders, 42(3), 354–366.

110.

Thomeer

M. L.

Lopata

Volker

M. A.

Toomey

J. A.

Lee

G. K.

Smerbeck

A. M.

…Smith

R. A.

(2012). Randomized clinical trial replication of a psychosocial treatment for children with high-functioning autism spectrum disorders. Psychology in the Schools, 49(10), 942–954.

111.

Thomeer

M. L.

Rodgers

J. D.

Lopata

McDonald

C. A.

Volker

M. A.

Toomey

J. A.

…Gullo

(2011). Open-trial pilot of mind reading and in vivo rehearsal for children with HFASD. Focus on Autism and Other Developmental Disabilities, 26(3), 153–161.

112.

Thomeer

M. L.

Smith

R. A.

Lopata

Volker

M. A.

Lipinski

A. M.

Rodgers

J. D.

…Lee

G. K.

(2015). Randomized controlled trial of mind reading and in vivo rehearsal for high-functioning children with ASD. Journal of Autism and Developmental Disorders, 45(7), 2115–2127.

113.

Tottenham

Tanaka

J. W.

Leon

A. C.

McCarry

Nurse

Hare

T. A.

…Nelson

(2009). The NimStim set of facial expressions: Judgments from untrained research participants. Psychiatry Research, 168(3), 242–249.

114.

Turner-Brown

L. M.

Perry

T. D.

Dichter

G. S.

Bodfish

J. W.

Penn

D. L.

(2008). Brief report: Feasibility of social cognition and interaction training for adults with high functioning autism. Journal of Autism and Developmental Disorders, 38(9), 1777–1784.

115.

Voss

Schwartz

Daniels

Kline

Haber

Washington

…Feinstein

(2019). Effect of wearable digital intervention for improving socialization in children with autism spectrum disorder: A randomized clinical trial. JAMA Pediatrics, 173(5), 446–454.

116.

Wang

A. T.

Dapretto

Hariri

A. R.

Sigman

Bookheimer

S. Y.

(2004). Neural correlates of facial affect processing in children and adolescents with autism spectrum disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 43(4), 481–490.

117.

Weiner

S. G.

Gregory

Froming

K. B.

Levy

C. M.

Ekman

(2006). Emotion processing: The comprehensive affect testing system user’s manual. Psychology Software.

118.

Widen

S. C.

Russell

J. A.

(2003). A closer look at preschoolers’ freely produced labels for facial expressions. Developmental Psychology, 39(1), 114–128.

119.

Wieckowski

A. T.

White

S. W.

(2017). Eye-gaze analysis of facial emotion recognition and expression in adolescents with ASD. Journal of Clinical Child & Adolescent Psychology, 46(1), 110–124.

120.

Wieckowski

A. T.

White

S. W.

(2020). Attention modification to attenuate facial emotion recognition deficits in children with autism: A pilot study. Journal of Autism and Developmental Disorders, 50, 30–41.

121.

Williams

B. T.

Gray

K. M.

Tonge

B. J.

(2012). Teaching emotion recognition skills to young children with autism: A randomised controlled trial of an emotion training programme. Journal of Child Psychology and Psychiatry, 53(12), 1268–1276.

122.

Williams

Daley

Burnside

Hammond-Rowley

(2009). Measuring emotional intelligence in preadolescence. Personality and Individual Differences, 47(4), 316–320.

123.

Wingenbach

T. S.

Ashwin

Brosnan

(2017). Diminished sensitivity and specificity at recognizing facial emotional expressions of varying intensity underlie emotion-specific recognition deficits in autism spectrum disorders. Research in Autism Spectrum Disorders, 34, 52–61.

124.

Yan

Liu

(2018). Using animated vehicles with real emotional faces to improve emotion recognition in Chinese children with autism spectrum disorder. PLOS ONE, 13(7), Article e0200375.

125.

Young

A. W.

Perrett

Calder

Sprengelmeyer

Ekman

(2002). Facial Expressions of Emotion: Stimuli and Tests (FEEST). Thames Valley Test Company.

126.

Young

R. L.

Posselt

(2012). Using the transporters DVD as a learning tool for children with autism spectrum disorders (ASD). Journal of Autism and Developmental Disorders, 42(6), 984–991.