Abstract
The comprehension of dynamically unfolding social situations is made possible by the seamless integration of multimodal information merged with rich intuitions about the thoughts and behaviors of others. We examined how high-functioning adults with autism spectrum disorder and neurotypical controls made a complex social judgment (i.e. rating the social awkwardness of scenes from a television sitcom) across three conditions that manipulated presentation modality: visual alone, transcribed text alone, or visual and auditory together. The autism spectrum disorder and control groups collectively assigned similar mean awkwardness ratings to individual scenes. However, individual participants with autism spectrum disorder tended to respond more idiosyncratically than controls, assigning ratings that were less correlated with the ratings of the other participants in the sample. We found no evidence that this group difference was isolated to any specific presentation modality. In a comparison condition, we found no group differences when participants instead rated the happiness of characters (a more basic social judgment) in full audiovisual format. Thus, although we observed differences in the manner in which high-functioning adults with autism spectrum disorder make social judgments compared to controls, these group differences may depend on the social dimension being judged, rather than on the specific modality of presentation.
Introduction
Individuals with autism spectrum disorder (ASD) often show impairment in how they perceive and comprehend social stimuli. From a very young age, individuals with ASD show atypical patterns of attention to biological and social stimuli (Bebko et al., 2006; Chawarska et al., 2012; Dawson et al., 2005; Klin et al., 2009; Swettenham et al., 1998). Although some social abilities improve to typical levels by adolescence or adulthood (e.g. Foxe et al., 2015), other differences appear to persist throughout development and include abnormal facial recognition (Blair et al., 2002), difficulties recognizing emotional facial expressions (Ashwin et al., 2006; Wallace et al., 2008; for a review, see Harms et al., 2010), relative inattention to people and faces (Klin et al., 2002), and a decreased understanding of and engagement in reciprocal social situations (Orsmond et al., 2013).
Especially in studies of high-functioning adults, however, not all social tasks elicit differences in ASD, and the reason for this is unclear. Understanding the factors that give rise to behavioral impairment is important for at least two reasons: (1) it would be useful to have tasks that are sensitive enough to measure and characterize impairments that remain present in high-functioning adults with ASD and (2) understanding the factors that give rise to these impairments would be critical for designing effective interventions targeted throughout the developmental trajectory of the condition.
The social complexity of a given experimental task may be one important factor determining whether individuals with ASD will perform typically. Especially in high-functioning adults who have honed compensatory mechanisms over many years, one may need to employ a more challenging test of social comprehension—going beyond the social ABC’s presented in oversimplified settings—to observe and study the still-present social impairments affecting these individuals. Past studies of children and adolescents with ASD have found deficits in their ability to recognize deception (Baron-Cohen, 1992), semantic ambiguity (Le Sourn-Bissaoui et al., 2011), and faux pas (Baron-Cohen et al., 1999). Compared to inferring the most basic emotions (happiness, sadness, fear, anger, disgust, and surprise), the appreciation of these more nuanced social cues likely involves a deeper understanding of social context and convention, possibly making impairment in these skills more persistent. Consistent with this account, some studies reporting typical performance on social tasks in individuals with ASD have tested the recognition of only the basic emotions (e.g. Adolphs et al., 2001; Baron-Cohen et al., 1993), whereas differences in the judgment of social awkwardness have been found at both the neural (Pantelis et al., 2015) and behavioral (Heavey et al., 2000) levels.
It is possible that the ability of ASD participants to make typical social judgments is not particularly influenced by the specific modality of the experimental stimuli. Participants with ASD are significantly worse than controls at inferring mental states from visual cues such as pictures of eyes (Baron-Cohen et al., 1997; Kleinman et al., 2001) and facial expressions (Harms et al., 2010), but these deficits also have been replicated in non-visual modalities (Kleinman et al., 2001; Loveland et al., 1995; Rutherford et al., 2002). That said, observed social deficits could potentially be more pronounced in one modality compared to another. Furthermore, there has also been some experimental evidence that integration across auditory and visual modalities may function atypically in ASD, for example, in the service of speech perception (Mongillo et al., 2008). It is possible that making social judgments in the face of dynamic, complex sensory input received simultaneously via multiple modalities—as in, for example, everyday life—may be especially challenging for individuals with ASD (Collignon et al., 2013; Grossman et al., 2015).
Finally, the dependent measure and analysis method may be critical. Even when individuals in a sample have difficulty making normative judgments in a social task, group performance may appear to be quite typical when averaging over all participants and treating idiosyncrasy as random noise. Furthermore, allowing the participant to respond with a Likert-type scale with respect to the social feature being judged (e.g. “How socially awkward does the target character feel?”) may potentially reveal more subtle deviations from typical response patterns than can be detected when the participant makes coarser categorical distinctions (e.g. “Is the target character happy, sad, afraid, angry, surprised, or disgusted?”).
In this study, we present individuals with ASD and matched controls with a series of relatively naturalistic social interactions (selected from a television sitcom), to examine how these individuals make social judgments in a manner that better approximates real-world conditions. Although semi-naturalistic, dynamic, and unconstrained stimuli are often difficult to manipulate systematically, their use in experiments serves an important complementary role alongside more controlled, reductionist studies of how individuals with ASD process static social stimuli, and arguably allows for stronger ecological validity (Bush and Kennedy, 2015; Hasson et al., 2009; Pantelis et al., 2015). Indeed, the gap between experimental and real-life conditions has been a recurrent obstacle in reliably assessing social abilities in ASD (Volkmar et al., 2004). Experiments employing these sorts of stimuli (e.g. audio and/or video recordings) may be especially sensitive to impairments in social comprehension in participants with ASD (Dziobek et al., 2006; Golan et al., 2006; Heavey et al., 2000; Roeyers et al., 2001; Rosenblau et al., 2015; Rutherford et al., 2002), perhaps because their demands are more representative of the social conditions encountered in daily life.
In this experiment, participants with and without ASD assign ratings of social awkwardness to scenes from a sitcom, a complex social task that we hypothesize will have the potential to evoke a group difference. These clips are presented via three different modalities (transcribed text-only, visual-only, and full audiovisual), allowing us to assess how individuals with ASD use verbal, visual, and combined audiovisual information to inform the inferences they make about complex social interactions. Based on previous research findings (Pantelis et al., 2015), it is quite plausible that the ASD and control groups will assign a similar average rating to any given scene. Furthermore, unless participants with ASD exhibit a consistent bias in one direction (i.e. consistently rating all scenes to be more awkward, or less awkward), there will be no way to detect their “errors” when averaging over the individual differences present within each group. Such a procedure will cancel out any non-systematic (i.e. idiosyncratic) individual differences in response pattern, and the consensus awkwardness ratings assigned by the two groups to any given clip may be in apparent agreement. Thus, as a more sensitive measure of group differences in performance, our primary analyses focus on the ability of individuals in either group to consistently provide ratings in line with the tendencies of the other participants (measured via inter-subject correlation). We hypothesize that on a trial-by-trial basis, individuals with ASD will be less able to rate clips in a manner consistent with the other participants. We further test whether impairments are most apparent when the participant must process audio and visual information simultaneously, in the condition most similar to real-world experiences.
Methods
To examine group differences in the comprehension of social information presented via three modalities, we presented individuals with ASD and neurotypical (NT) controls with scenes from a sitcom (The Mindy Project) and asked these participants to rate how socially awkward a target character from each scene felt. The institutional review board (IRB) of Indiana University approved the study, and all the participants provided written informed consent.
Participants
A total of 16 adults with ASD (age 17–39 years, M = 23.0 years; 14 males and 2 females) of average or above-average IQ (97–129, M = 113.5) were recruited from the Bloomington, Indiana area. Participants responded to flyers posted around the community, on the university campus, on Craigslist, or were referred by word of mouth; several had already participated in previous experiments conducted in the laboratory. All 16 ASD participants had previously received clinical diagnoses of autism, Asperger’s syndrome, or pervasive developmental disorder, not otherwise specified (PDD-NOS). A Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR) diagnosis was confirmed using the Autism Diagnostic Observation Schedule-2 (ADOS-2) Module 4 (Lord et al., 2000) and review of background information and neuropsychological assessment (including IQ testing, Autism Spectrum Quotient (AQ), Beck Depression Inventory, State-Trait Anxiety Inventory, and self-reported clinical history), together with clinical judgment. All ADOS administrations and scoring were performed by research-reliable administrators. The suggested ADOS cutoff score is 7; however, in light of empirical evidence that the ADOS Module 4 has very similar specificities at thresholds of 6 and 7 when administered to high-functioning adults (Bastiaansen et al., 2011), we included participants with an ADOS score of 6 if all other available information supported an ASD diagnosis. Three potential ASD participants were initially tested on the primary experimental protocol, but were excluded from the sample because they did not return to the laboratory for further diagnostic and neuropsychological assessments. All ASD participants were compensated at a rate of US$15 per hour.
A total of 19 approximately age-matched (19–34 years, M = 21.5 years), gender-matched (16 males and 3 females), and IQ-matched (96–132, M = 113.7) controls (see Table 1) were either recruited from the Department of Psychological and Brain Sciences at Indiana University in exchange for course credit or recruited from the Bloomington area and compensated at a rate of US$15 per hour. One additional control was excluded from the final sample because of apparent confusion with the experimental instructions, and another was excluded because he did not return for additional neuropsychological assessment.
Table 1. Characteristics of autism spectrum disorder (ASD) and neurotypical (NT) control samples.
VIQ: verbal IQ; PIQ: performance IQ; FSIQ: full-scale IQ; AQ: Autism Spectrum Quotient; RME: Reading the Mind in the Eyes; TONCK: Test of Nonverbal Cue Knowledge; SD: standard deviation; ASD: autism spectrum disorder; NT: neurotypical.
Not every subject completed every measure, so degrees of freedom vary slightly across measures. See section “Methods.”
The analyses we report were performed with respect to these samples of 16 ASD participants and 19 controls. However, we also reanalyzed the data using an ASD sample that reflects the application of even more stringent inclusion criteria (i.e. excluding any subject from the ASD sample with ADOS < 7 or AQ < 27). The significance (or non-significance) of all between-group (ASD vs NT) comparisons reported in section “Results” below still holds, even with respect to this smaller ASD sample (n = 10).
Measures
The following battery of assessments was administered to nearly all participants in the final sample of 16 ASD participants and 19 controls, in order to thoroughly characterize the two samples. See Table 1 for detailed information about the characteristics of both the ASD and NT samples.
Wechsler Abbreviated Scale of Intelligence
The Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999) is a measure of verbal, nonverbal, and general cognitive ability that consists of four subtests. It is used to obtain an estimated full-scale IQ (FSIQ), verbal IQ (VIQ), and performance IQ (PIQ). IQ was assessed for 15 of 16 ASD participants and all 19 controls in the final sample.
AQ
The AQ (Baron-Cohen et al., 2001b) is a self-report questionnaire that is used as a screening instrument for detecting traits characteristic of ASD. It consists of 50 statements assessing characteristics commonly observed in affected individuals, across several domains: social skills, communication, imagination, attention to detail, and flexibility. All 16 of the ASD participants and 18 of the 19 controls completed the AQ, with ASD participants (M = 32.3) scoring significantly higher than controls (M = 15.0; t(32) = 7.75, p < 0.001).
Revised “Reading the Mind in the Eyes” test
The Reading the Mind in the Eyes (RME) (Baron-Cohen et al., 2001a) aims to measure how well participants can infer complex emotional states (as opposed to basic emotions such as happiness, sadness) of other people from photographs of their eyes. Participants are shown 36 photographs of eyes and asked to attribute a feeling (e.g. “Jealousy,” “Amused,” “Hostile”) to the character from among four word choices. For the purposes of this study, this task was accompanied with a glossary of each word choice option to mitigate the possible confound of group differences in vocabulary. All 16 of the ASD participants and 17 of the 19 controls were assessed with the RME. On average, controls (M = 27.0) scored slightly higher than ASD participants (M = 24.8), a marginally significant group difference (t(31) = 1.88, p = 0.07).
Test of Nonverbal Cue Knowledge
The Test of Nonverbal Cue Knowledge (TONCK) (Rosip and Hall, 2004) consists of 81 true/false questions about nonverbal human behavior (e.g. “Rapid head nods are a signal to the speaker to finish quickly”). It has been shown to correlate with performance on implicit measures of nonverbal knowledge such as the Profile of Nonverbal Sensitivity (PONS; Rosenthal et al., 1979) and the Diagnostic Analysis of Nonverbal Accuracy (DANVA; Nowicki and Duke, 1994). Fifteen of the 16 ASD participants and 18 of the 19 controls were assessed with the TONCK. Controls (M = 66.1) scored significantly higher than ASD participants (M = 61.2; t(31) = 2.12, p < 0.05).
Stimuli
The experiment was presented on a 27-in display iMac computer, via Qualtrics Online Survey Software (www.Qualtrics.com) and the Safari web browser. Headphones were provided for the conditions that required sound.
A total of 25 brief (8–34 s in duration, M = 18.8 s) segments featuring social interactions were selected from various episodes of the first season of the television sitcom The Mindy Project, to be rated on the dimension of social awkwardness. We chose The Mindy Project instead of other more popular TV shows of this genre to decrease the possibility that some individuals would be more familiar with the show than others. Indeed, 23 of 24 participants (ASD and controls) who were explicitly asked about their familiarity with the show reported having never seen it (the remaining participant reported being vaguely familiar with the show, but had only seen a few episodes). Scenes were chosen to represent a wide range of emotions, characters, situations, and levels of social awkwardness. A total of 10 additional segments (11–32 s, M = 16.9 s) were selected as stimuli for a comparison condition, in which participants would assess the happiness of a target character.
For each of the 25 scenes selected as stimuli for the assessment of comprehension of social awkwardness, three sets of stimuli were constructed: a full audiovisual condition (i.e. the unaltered video clip with sound), a visual-only condition (the unaltered video clip without accompanying sound), and a text-only condition (a transcribed version of the video, presented as a one-panel comic strip). Limiting the number of conditions to three served to keep the experiment at a manageable duration, preserving participant motivation. Although the inclusion of an audio-only condition may provide important information in future studies, we chose not to administer it alongside the text-only condition, reasoning that exposure to conversation content and character affect, whether presented before or after the text-only stimuli, likely would have strongly influenced participants’ ratings across these conditions, making interpretation of the results difficult. Thus, we decided only to include one of these two conditions and chose the text-only condition because (1) we believed that the inclusion of the audio-only condition would be more likely to taint participants’ judgments in the other conditions (and vice versa) and (2) the audio-only condition would have been less practical to implement, because it would be potentially difficult for the participant to identify the target character among multiple speakers. See Figure 1 for a representation of one scene in each of the three presentation formats.

Figure 1. An example stimulus presented in each of the three modalities: text-only (left), visual-only (center), and full audiovisual (right).
In the text-only condition, characters were depicted as identical, faceless stick figures. Their dialogue appeared in bubbles (without any inflection cues, such as exclamation points), ordered from top to bottom. In order to allow participants to identify and follow the target character (the one whose feelings they would be rating), character names were written on the characters’ stick figure avatars and different characters’ text bubbles were presented in different colors. If the target character did not speak in the scene, he or she was still identified and presented in the scene, with an indication that the character was a bystanding observer.
Procedure
Participants were presented with four different task blocks, three of which consisted of 25 trials (the social awkwardness trials), and one of which consisted of only 10 trials (the happiness trials). Instructions and a definition of social awkwardness (“Social awkwardness can arise from someone saying or doing something they shouldn’t have [for instance, something inappropriate], or can arise from something happening that makes someone feel uncomfortable around another person or group of people”) were displayed and read aloud to each participant before each block. During the first and second blocks, participants saw the text-only and visual-only versions of the 25 scenes (which version came first was counterbalanced across participants). During the third block, participants viewed the scenes in their full audiovisual format. Within each of the blocks, trials were presented in a randomized order for each subject.
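The block ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_session`, the use of participant-index parity to decide which single-modality block comes first, and the integer scene IDs are all hypothetical. The happiness block is left out because its position in the session is not specified here.

```python
import random

def build_session(participant_index, n_scenes=25, seed=None):
    """Return the ordered awkwardness blocks for one participant.

    Illustrative only: alternating by participant-index parity is an
    assumed counterbalancing scheme, not necessarily the one used.
    """
    rng = random.Random(seed)

    # Counterbalance which single-modality block comes first.
    first_two = ["text-only", "visual-only"]
    if participant_index % 2 == 1:
        first_two.reverse()

    # The full audiovisual block always follows the two single-modality blocks.
    block_order = first_two + ["audiovisual"]

    session = []
    for modality in block_order:
        scene_ids = list(range(1, n_scenes + 1))
        rng.shuffle(scene_ids)  # fresh random trial order within each block
        session.append((modality, scene_ids))
    return session

# Example: block order for two consecutive (hypothetical) participants.
for idx in (0, 1):
    print(idx, [modality for modality, _ in build_session(idx, seed=idx)])
```

Drawing a fresh shuffle for every block means that the position of a scene in one block carries no information about its position in another.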
Prior to each visual-only and full audiovisual scene, an image of the target character to be rated was presented for identification purposes. This image showed the designated character along with his or her name, wearing the same outfit as in the scene and displaying a neutral or smiling facial expression. After viewing this image, participants were instructed to click on it to view the scene. In the text-only condition, the target character was identified by name and surrounded with a dotted box, and participants were simply instructed to read the comic strip. After the conclusion of each scene presentation, participants were asked how awkward (or happy) the identified target character felt at the end of the scene.
Awkwardness of the target character in a clip was rated on a discrete scale from 1 (“Not at all socially awkward”) to 9 (“Extremely socially awkward”), as shown in Figure 1. In the comparison condition, consisting of 10 full audiovisual scenes, participants were instructed to rate the happiness of an identified character from 1 (“Not at all happy”) to 9 (“Extremely happy”). This condition allowed us to assess the social comprehension of both groups with respect to a more basic social task, while also providing a separate task to gauge how inclined each subject was to use the full rating scale.
Results
Across the three experimental conditions, we first investigated whether there were overall differences or biases in the average (i.e. consensus) social awkwardness ratings assigned by either group to scenes, although we did not hypothesize that group differences would be found using this measure (see Introduction). More critically, our primary analyses assessed how consistently individuals’ ratings in either group matched typical ratings provided by the rest of the participants, using inter-subject correlation analyses.
Response biases
We first analyzed whether the general tendency to rate scenes as more or less awkward was influenced by group (ASD vs control) or presentation modality (text-only, visual-only, or audiovisual). A mixed (between- and within-subjects) analysis of variance (ANOVA) revealed a main effect of presentation modality (F[2,66] = 3.74, p < 0.05), with post hoc t-tests revealing a small but consistent difference in the overall level of awkwardness assigned to scenes presented in full audiovisual (M = 5.20) versus visual-only (M = 4.85; t(34) = 2.85, p = 0.007) format. The text-only condition (M = 5.00) fell in between and did not differ significantly from either of the other two conditions (t(34) = 1.70, p = 0.098; t(34) = 1.19, p = 0.244).
There was no significant main effect of group (F[1,33] = 1.14, p = 0.293) or interaction between presentation modality and group (F[2,66] = 0.606, p = 0.549). The overall mean ratings provided by NT participants (5.16) and ASD participants (4.86) were very near the middle of the supplied scale and did not differ significantly. Across group and condition, participants’ ratings also had similar levels of variance on average, with no significant effect of group (F[1,33] = 0.23, p = 0.637), condition (F[2,66] = 2.15, p = 0.125), or interaction of group and condition (F[2,66] = 0.09, p = 0.914). In summary, participants in both groups tended to rate scenes presented in full audiovisual format as slightly more socially awkward, but there was no overall bias in the manner in which one group rated the scenes compared to the other.
Group consensus ratings
We next analyzed the extent to which the groups collectively agreed with one another, with respect to the consensus (i.e. average) level of social awkwardness assigned to individual scenes by each group. As illustrated in Figure 2, the mean ratings assigned by either group were strongly correlated, indicating that the two groups collectively agreed upon the relative levels of social awkwardness among these 25 scenes. Correlation was highest in the full audiovisual condition (r(23) = 0.962, p < 0.001), although the respective differences between the magnitude of this correlation and the text-only (r(23) = 0.899, p < 0.001) and visual-only (r(23) = 0.895, p < 0.001) conditions were only marginally significant (p = 0.091 and p = 0.080, respectively).

Figure 2. (a) Mean rating of social awkwardness provided by each group for each of 25 clips (each individual clip is represented by a black dot) that were presented as comic strips (i.e. “Text-Only”). (b) Mean rating of social awkwardness provided by each group for each of 25 clips that were presented visually, without sound (i.e. “Visual-Only”). (c) Mean rating of social awkwardness provided by each group for each of 25 clips that were presented in full audiovisual format. (d) Mean rating of happiness provided by each group for each of 10 clips that were presented in full audiovisual format. The red (ASD) and blue (control) dashed lines in each scatterplot represent the grand mean rating provided by each group for the condition.
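The group consensus comparison amounts to averaging ratings within each group, scene by scene, and then correlating the two vectors of scene means. The sketch below shows that computation on made-up toy ratings, not data from this study. The function names and the nested-list data layout are assumptions introduced for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def scene_means(ratings):
    """ratings: one list per participant, one rating per scene.
    Returns the group's mean (consensus) rating for each scene."""
    n_scenes = len(ratings[0])
    return [sum(p[s] for p in ratings) / len(ratings) for s in range(n_scenes)]

def consensus_correlation(group_a, group_b):
    """Correlate the per-scene mean ratings of two groups."""
    return pearson_r(scene_means(group_a), scene_means(group_b))

# Toy example (3 scenes, 2 raters per group): individual raters disagree,
# yet both groups arrive at the same scene means.
toy_group_a = [[1, 5, 9], [3, 5, 7]]
toy_group_b = [[2, 4, 9], [2, 6, 7]]
print(consensus_correlation(toy_group_a, toy_group_b))
```

In the toy data no two raters give identical ratings, yet the group means coincide. This is the situation in which a consensus measure alone would hide individual idiosyncrasy.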
Inter-subject agreement
For a given condition, to compute a participant’s degree of agreement with the other participants, we correlated his or her ratings across scenes with the ratings of each of the remaining 34 participants in the total sample.
Previous analyses demonstrated that the two groups tended to assign the same aggregate ratings to individual scenes. Thus, when assessing an individual’s pattern of responses with respect to what was typical, these data justified comparing the individual with all other participants’ responses. An alternative approach would have been to compare a participant’s responses only to those assigned by the “normative” sample of controls, also using a leave-one-out procedure where appropriate. It should be noted that performing the analyses in this alternative manner does not change the results of the statistical tests performed in this section. 1
These raw correlations were then Fisher z-transformed, a statistical procedure that is recommended when averaging correlations, especially with relatively small samples (Silver and Dunlap, 1987). The mean Fisher z-transformed correlation between a participant’s responses and those of the other 34 participants was our metric of the participant’s average level of agreement with the other participants. Because we had no “ground truth” measure of social awkwardness for each scene, we use this inter-subject agreement metric as a stand-in for an individual’s trial-by-trial accuracy.
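A minimal sketch of the inter-subject agreement metric follows, assuming ratings are stored as a dictionary mapping a participant ID to that participant's ratings across scenes. The function names, the data layout, and the clipping of correlations just below ±1 (to keep the Fisher transform finite) are assumptions of this sketch, not details taken from the analysis described above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; undefined if either rater gave constant ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r, limit=0.999999):
    """Fisher z-transform, z = atanh(r). Clipping at +/-limit is an
    assumption made here so that |r| = 1 does not produce an error."""
    r = max(-limit, min(limit, r))
    return math.atanh(r)

def mean_agreement(ratings):
    """ratings: dict of participant ID -> list of ratings (one per scene).
    Returns dict of participant ID -> mean Fisher z of that participant's
    correlations with every other participant."""
    agreement = {}
    for pid, own in ratings.items():
        zs = [fisher_z(pearson_r(own, other))
              for other_id, other in ratings.items() if other_id != pid]
        agreement[pid] = sum(zs) / len(zs)
    return agreement

# Toy data: three hypothetical raters, five scenes.
toy = {
    "p1": [1, 2, 3, 4, 5],
    "p2": [2, 1, 4, 3, 5],
    "p3": [5, 3, 4, 1, 2],
}
for pid, z in mean_agreement(toy).items():
    print(pid, round(z, 3))
```

With 35 participants, each person's score is the mean of 34 transformed correlations. The per-participant scores can then be compared across groups and conditions.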
We performed a mixed (between- and within-subjects) ANOVA of the effects of group (ASD vs control) and presentation format (text-only, visual-only, and full audiovisual) on how well an individual’s social awkwardness ratings agreed with the rest of the participants in the sample. The ANOVA revealed significant main effects of group (F[1,33] = 6.57, p = 0.015) and of condition (F[2,66] = 84.73, p < 0.001). Post hoc t-tests revealed that participants across both groups were best correlated with one another in the full audiovisual condition, compared to both the text-only and visual-only conditions (t(34) = 15.83, p < 0.001, d = 2.68; t(34) = 9.59, p < 0.001, d = 1.62, respectively). There was also a difference in performance between the visual-only and text-only conditions (t(34) = 2.36, p = 0.026, d = 0.39), which was only marginally significant after accounting for multiple comparisons via Bonferroni correction (an adjusted alpha level of 0.05/3 ≈ 0.017).
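The arithmetic behind the post hoc comparisons can be checked with a short sketch. Two assumptions are made here: that the Bonferroni-adjusted alpha is 0.05 divided by the three pairwise comparisons, and that the within-subject effect sizes follow d = t / sqrt(n) with n = 35 participants. The second is an inference from the reported numbers, not a stated method. It reproduces the first two reported d values to two decimals and gives about 0.40 for the third.

```python
import math

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha level under Bonferroni correction."""
    return alpha / n_comparisons

def paired_d_from_t(t, n):
    """Within-subject effect size d_z = t / sqrt(n) for a paired t-test.
    Using this formula for the reported d values is an assumption."""
    return t / math.sqrt(n)

n_participants = 35                      # 16 ASD + 19 controls, so df = 34
alpha_adj = bonferroni_alpha(0.05, 3)    # three pairwise condition comparisons

# (label, t statistic, reported p value) as given in the text above;
# p values reported as "< 0.001" are entered as 0.001, an upper bound.
comparisons = [
    ("audiovisual vs text-only", 15.83, 0.001),
    ("audiovisual vs visual-only", 9.59, 0.001),
    ("visual-only vs text-only", 2.36, 0.026),
]

print(f"adjusted alpha = {alpha_adj:.4f}")
for label, t, p in comparisons:
    d = paired_d_from_t(t, n_participants)
    print(f"{label}: d = {d:.2f}, survives correction: {p < alpha_adj}")
```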
The significant main effect of group demonstrated that the social awkwardness ratings of ASD participants tended to show lower agreement with the ratings of other participants (compared to controls), an effect that was most apparent in the full audiovisual condition (d = 0.95 vs d = 0.41 (text-only) and d = 0.54 (visual-only)), although there was no statistically significant interaction of group and presentation format (F[2,66] = 1.25, p = 0.292; see Figure 3) (see footnote 1).

Figure 3. The average correlation between the individual’s ratings and those provided by each of the other participants in the same condition.
Rating happiness
In a separate task, participants rated the happiness of target characters across 10 scenes, presented in full audiovisual format. As in the case of rating social awkwardness, aggregate ratings provided by the NT and ASD groups were very strongly correlated (r(8) = 0.99, p < 0.001; see Figure 2(d)). However, participants’ responses tended to be much more consistent with one another on this more basic social comprehension task (t(34) = 16.94, p < 0.001, d = 2.86 when comparing inter-subject correlation in this task versus in the social awkwardness (audiovisual) condition; see Figure 3). With participants in both groups likely approaching ceiling performance on this task, no group differences in inter-subject agreement were detected (t(33) = 0.13, p = 0.900, d = 0.04).
Discussion
When individuals with ASD and their NT peers rated the social awkwardness of characters in scenes from a television sitcom, both groups tended to arrive at a similar consensus rating for each scene, and this was true for each of the presentation modalities (text-only, visual-only, and full audiovisual conditions). However, the ability of each individual to consistently provide ratings in line with those of other participants was influenced significantly by presentation modality and was significantly diminished in the ASD group. The observed group difference was unlikely to be due to simple bias in the use of the rating scale (since performance was equal in the happiness condition, which used the same Likert-style scale), nor was it necessarily isolated to any particular modality (although it was most pronounced in the full audiovisual condition).
Previous studies have hypothesized that individuals with ASD have difficulty with the integration of information via multiple modalities, a deficit that may or may not be specific to the social domain (Collignon et al., 2013; Mongillo et al., 2008). Individuals with ASD have also been previously found to have difficulties attending to multiple objects at once when faced with complex stimuli (Fletcher-Watson et al., 2006; O’Hearn et al., 2011). Although our participants with ASD rated awkwardness less consistently than controls when presented with scenes in full audiovisual format, we did not find conclusive evidence that this group difference was larger than that which was observed in single-modality conditions (i.e. text-only and visual-only). However, a comparison condition revealed that group differences were apparently limited to the judgment of social awkwardness, as no differences were observed in the less complex social task of rating levels of happiness. Rating character happiness consistently with others’ ratings was not challenging to either group—that is, inter-subject agreement was extremely high. Tasks that are more taxing to social intuition—for example, rating social awkwardness—may be necessary to elicit differences in experimental performance in ASD.
That said, impairment in individuals with ASD when processing even the basic emotions (including happiness) has sometimes been observed (Bölte and Poustka, 2003; Celani et al., 1999), although in both of these studies participants were younger than those in this experiment. Our result is consistent with other findings (e.g. Adolphs et al., 2001; Baron-Cohen et al., 1997) examining this ability in older individuals, suggesting high-functioning individuals with ASD can reliably recognize the basic emotions (happiness, sadness, anger, fear, surprise, and disgust), although perhaps still differ in more subtle aspects of emotion discrimination (Kennedy and Adolphs, 2012). As in this study, these studies did find impairments in high-functioning adults’ performance on more advanced social judgments, such as the assessment of the trustworthiness and approachability of faces (Adolphs et al., 2001) and complex mental states (e.g. “concerned,” “noticing you,” “decisive”) based on pictures of eyes (Baron-Cohen et al., 1997).
Given previous hypotheses regarding theory of mind impairments in ASD, it is possible that poor intuitions about social awkwardness created a limit to how well ASD participants could evaluate this construct, regardless of the modality of the information provided to them. That is, the real limiting factor for participants with ASD may not have been in the processing of incoming information, but in the integration of this information with sound intuitions about subtle, complex emotions (such as social awkwardness, explored here), and social interactions. This interpretation is consistent with the accompanying (marginal) group difference observed in the RME task (which assessed the participants’ ability to infer mental states from static eye cues) and the observed group difference in judging social norms on the TONCK. This interpretation also would be consistent with previous research suggesting that emotions such as awkwardness, embarrassment, and shame are particularly difficult for individuals with ASD to evaluate accurately (Golan et al., 2006; Heavey et al., 2000, 2003).
One benefit of the experimental design we employed is that each participant assessed the same scenes in each of the three formats. However, this within-subject design also imposed several constraints. For example, it would have been desirable to include an audio-only condition, resulting in a full complement of modalities (alongside the text-only, visual-only, and full audiovisual conditions). In addition, presenting the happiness comparison condition in each of the modalities likely would have been beneficial. However, the greater the number of modalities included, the higher the risk that participant motivation would decline over time, and the greater the likelihood that exposure to a clip in one condition would influence the participant’s performance in another. We decided to include a text-only condition over an audio-only condition because of concerns that information conveyed in audio clips (i.e. intonation, character affect) would increase the probability of such carryover between conditions. To further limit order effects, we randomized video clips within each condition and alternated between the visual-only and text-only conditions as the first task. Even so, the audiovisual task was always completed last, following the visual-only and text-only conditions. We ordered the experimental conditions in this manner because it was unlikely that participants could use information from the visual-only condition to inform their responses in the text-only condition (or vice versa), or could even surmise that scenes from one condition corresponded to scenes from the other. In contrast, seeing the full audiovisual condition first would likely have tainted both of these conditions, so we presented the full audiovisual condition last.
As a result, each participant experienced each of the scenes twice (once as a text-only comic strip and once as a muted video) prior to beginning the audiovisual condition, introducing the possibility that observed group differences in the audiovisual condition could have stemmed from a differential ability to exploit prior experience with degraded versions of the scenes presented in the two previous conditions.
Another limitation of this study is that only one complex social emotion (awkwardness) was assessed. To determine whether these results are specific to social awkwardness, future studies should examine other complex social emotions, such as the self-conscious emotions (e.g. embarrassment and shame) suggested by Heerey et al. (2003), and should include a nonsocial comparison condition. Future research should also consider adding eye-tracking procedures to further unpack the perceptual and attentional mechanisms involved in recognizing these and other complex emotions (Bush and Kennedy, 2015).
Conclusion
Social comprehension requires the integration of multimodal information with prior intuition about the nature of others’ minds, often in real time. For individuals with ASD, a reduced ability to efficiently perform this integration may play a critical role in the social and communicative deficits present even in high-functioning adults, particularly when considering complex social judgments that extend past basic emotion recognition.
Footnotes
Acknowledgements
We would like to thank Susannah Burkholder for help with recruiting and testing participants.
Funding
This research was funded by the National Institute of Mental Health (R00-MH094409) and the Brain and Behavior Research Foundation NARSAD Young Investigator Award.
