Abstract
Nonliteral language represents a complex form of communication that can be interpreted in numerous different ways. Our study explored how individual differences in personality and communication styles affect the evaluation of literal and nonliteral language in the context of assumptions made by the Tinge Hypothesis (Dews & Winner, 1995). Participants watched videos of social interactions focusing on positive, negative, sarcastic, and jocular statements. They evaluated speaker intentions and social impressions and completed several personality and communication style questionnaires. Individual differences in empathy, defense style, and sarcasm use correlated with the accuracy of identifying speaker intent. Additionally, positive statements were rated as friendlier when compared to jocular statements, thereby supporting the Tinge Hypothesis. However, literal negative statements were rated as more friendly than sarcastic statements, which is inconsistent with the Tinge Hypothesis. The current results provide novel evidence for the Tinge Hypothesis using multimodal, dynamic stimuli and highlight the role of the individual personality of the recipient in evaluating sarcasm and jocularity.
1 Introduction
Imagine you cooked for hours to prepare a delicious meal for your guests. However, one guest comments “Wow, that looks so great” while rolling their eyes. How does this make you feel? Are you offended, or confused? Your reaction might depend on many factors, including your relationship with the guest, whether you like sarcasm, whether you use sarcasm often yourself, and whether you are easily offended. On the other hand, you might not be offended if you do not recognize the sarcastic intention. Indeed, studies have shown that our evaluation of nonliteral language depends on tone of voice (e.g., Rockwell, 2000), the closeness of the relationship (e.g., Keltner et al., 1998), personality factors (Ivanko et al., 2004), and communication preferences (Holtgraves & Holtgraves, 1997).
The complexity of nonliteral intentions relates not only to how they are perceived by the communication partner, but also to how they are expressed in the first place. During social interactions, what is said does not always equal what is meant. Intonation, body language, and facial cues are used to convey social intentions beyond lexical content, further adding to the complexity of comprehension (Attardo et al., 2003). In situations where we fail to pick up on these cues, we might end up with misunderstandings that lead to conflicts, a lack of further communication, or hurt feelings (McDonald et al., 2006).
To understand how we can aid individuals with compromised social skills or poor interpretation abilities, we need to study social communication in its complexity. There is a demand for experiments that combine dynamic, ecologically valid stimuli, as well as an appropriate set of individual differences measures. The Relational Inference of Social Communication (RISC; Rothermich & Pell, 2015) video inventory has been developed to study how speaker intentions are understood by individuals with psychiatric disorders, neurodegenerative diseases, as well as healthy adults across the lifespan. Besides neurological conditions, individual differences concerning personality traits and communication style preferences can influence an individual’s evaluation of nonliteral stimuli (Jakobson et al., 2018; Kieckhäfer et al., 2019). In order to decide which factors correlate with pragmatic inference making, our study seeks to examine the different types of personality measures, and how they relate to people’s evaluations of literal and nonliteral intentions.
One common form of nonliteral language is sarcasm, which usually involves a literally positive utterance that expresses a negative or critical attitude toward a person or event (Kreuz & Glucksberg, 1989). The type of sarcasm we study in this experiment is also sometimes referred to as “ironic criticism.” It is often accompanied by auditory and visual markers such as reduced pitch, smirking, or eye rolls (e.g., Attardo et al., 2003). Sarcasm has been associated with many different social functions: it can increase the perceived politeness of criticism, create a humorous atmosphere, and is often used to express critique, provoke anger, or enhance condemnation (e.g., Caucci & Kreuz, 2012). In the example stated at the beginning, the response “Wow, that looks so great” would be considered sarcastic, as marked by a change in prosody and other nonverbal cues. In contrast, jocularity is often described as a literally negative statement that is meant to be taken positively, accompanied by pleasant facial and vocal cues, such as laughter. In the dinner example, a jocular response would be “Wow, that looks so terrible!” spoken in a friendly tone. We adopt the term “jocularity” from Seckman and Couch (1989); this type of statement is also sometimes referred to as an “ironic compliment” (Gibbs, 2000). It is important to note that in daily life, sarcasm is more common than jocularity (Giora, 1997).
The nature of jocularity and sarcasm comprehension is a matter of debate, and several models exist, such as the Standard-Pragmatic Model (Grice, 1975), the Graded Salience Hypothesis (e.g., Giora et al., 2007), the Direct Access Hypothesis (Gibbs, 1994, 2002), the Constraint-Satisfaction Model (Pexman et al., 2000), and the Tinge Hypothesis (Dews & Winner, 1995). While some models assume that nonliteral language is processed differently from literal language (Standard-Pragmatic Model and Constraint-Satisfaction Model), the Direct Access and the Graded Salience Models assume that processing depends on the salience of a comment. The constraint-based theory of irony comprehension can be used to frame results about the influence of individual differences (Katz, 2005; Pexman, 2008). The theory takes into account numerous factors that affect irony and sarcasm perception, such as different contextual cues. It further assumes that individuals process these cues in parallel to comprehend nonliteral language, depending on the strength of the cue and context (Whalen et al., 2020). While it is difficult to make predictions about specific associated factors based on the theory, it can be used to frame our data, for example by identifying traits that are involved when processing nonliteral language (Turcan & Filik, 2016; Pexman, 2008).
The Tinge Hypothesis also considers contextual and social factors. It suggests that the positive lexical meaning of sarcasm communicates a more positive tone than literal negative statements. However, the negative surface meaning of jocularity makes those statements appear less positive than literal positive utterances (Pexman & Zvaigzne, 2004). The assumption is that the surface meaning of a sarcastic statement is processed regardless of the intended meaning, “tinging” a positive interpretation (Pexman & Olineck, 2002).
Pexman and Olineck (2002) confirmed the Tinge Hypothesis and found that sarcasm was evaluated to be more mocking and more polite when compared to literal negative statements, which speaks for a muting effect, or in other words, a softening impact of using sarcasm. In contrast, jocularity was judged as more mocking and less polite than literal positive statements. The Tinge Hypothesis is the most relevant model for our study as it suggests an influence of individual differences, such as gender (Pexman & Olineck, 2002). However, to our knowledge, only a few studies have tested the influence of other individual differences, such as personality traits.
Individual differences contribute to a person’s evaluation of sarcastic and jocular utterances, both of which are forms of irony. For example, Ivanko et al. (2004) found that participants who rate themselves higher in terms of sarcasm usage were quicker to recognize the same in others. In bilingual subjects, Tiv and colleagues (2019) found that greater second language proficiency leads to more frequent usage of general sarcasm in daily life. Thus, we asked participants to complete the Self-Reported Sarcasm Scale in order to assess how it might influence the evaluation of sarcasm and jocularity (Ivanko et al., 2004). Similarly, self-reported empathy traits seem to predict higher accuracy in recognizing speaker intentions (Jakobson et al., 2018). Another factor that is known to influence the evaluation of nonliteral language is conversation style differences. The Conversational Indirectness Scale (CIS) was created by Holtgraves (Holtgraves & Holtgraves, 1997) to measure differences in expressing oneself directly and/or indirectly as well as understanding other people’s indirect statements. Using the CIS, Ivanko et al. (2004) found that using irony indirectly to criticize a conversation partner can be seen as being polite, and participants with higher CIS-production scores recognized this politeness.
Additionally, we use the Interpersonal Reactivity Index (IRI) to compare empathy measures to a person’s perception of nonliteral language. The questionnaire assesses various aspects of empathy on four subscales: perspective taking, empathic concern, personal distress, and fantasy scales (Davis, 1983). Data from studies in children suggest that there is a relationship between empathy skills and the interpretation of nonliteral language (Nicholson et al., 2013), and that empathic components such as perspective taking are prerequisites for the understanding of sarcasm and jocularity (see Pexman et al., 2019). While the influence of empathy on nonliteral language perception has been studied less in adults, a study by Jakobson and colleagues (2018) using the RISC database found that low scores on the Fantasy subscale of the IRI led to lower accuracy in identifying speaker intentions. This subscale tests the ability to empathize with a fictional character in a book or movie, and as such might predict the evaluation of the RISC videos.
The Defense Style Questionnaire (DSQ) has not yet been incorporated into studies that examine the evaluation of nonliteral language; however, it might prove to be informative for this study. The DSQ looks at immature, mature, and neurotic communication behaviors and is often used for assessing personality and affective disorders (Savilahti et al., 2018). People with immature defense styles exhibit behaviors such as passive-aggressiveness, complaining, regression, and withdrawal. They might also act out based on thoughts such as “I get openly aggressive when I feel hurt.” People who lean toward a more immature defense style might be less likely to label the statements given in this study as appropriate or friendly, especially when they encounter sarcasm or jocularity. Indeed, the more immature the defense style, the more cognitive distortion there is and the more negative thinking is reinforced (Bowins, 2018); thus, negative attitudes such as sarcasm might not be favored.
2 Current study
As communicators, we are aware that people use and perceive nonliteral speaker intentions differently (Ivanko et al., 2004). While the influence of individual differences in interpreting social intentions has been considered previously, studies often fail to account for the full dynamic and multimodal nature of communicative intentions (Jakobson et al., 2018). In the current study, we use the Relational Inference of Social Communication Video Inventory (RISC; Rothermich & Pell, 2015) to examine the influence of personality and communication preference on the interpretation of literal and nonliteral language, as well as on how the speaker is perceived. Rothermich and Pell (2015) developed the RISC to test the perception of nonliteral language under ecologically valid circumstances. The inventory makes it possible to study common forms of nonliteral language, such as sarcasm and jocularity, and to compare them with both positive and negative literal utterances that differ only in the way speakers’ intentions are conveyed via verbal (e.g., intonation) and nonverbal (e.g., gestures) cues.
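In the current study, we focus on a task that asks whether the speaker is being literal (speaker intent) and on social impression scales that ask whether the response is nice or mean (friendliness), whether it is socially appropriate (appropriateness), and whether the speaker is likable (likeability). We also aim to test the Tinge Hypothesis and to explore the usefulness of individual difference measures for studying nonliteral language processing with dynamic video materials. Our main hypotheses are as follows:
H1: In line with the Tinge Hypothesis, sarcastic statements will be rated as more appropriate than literal negative statements, and jocular statements will be rated as less positive than literal positive statements.
H2: Participants with higher empathy scores (based on the IRI) will be better at identifying nonliteral language as insincere.
H3: Higher scores on the Self-Reported Sarcasm Scale (SSS) will positively predict the accuracy of identifying sarcasm as insincere, and participants who report using sarcasm often will also find it more appropriate, friendly, and likable.
H4: Participants with high CIS-production scores (i.e., who prefer to communicate indirectly) will judge sarcasm as more appropriate, friendly, and likable.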
3 Method
3.1 Participants
Forty native English speakers with typical or corrected-to-normal hearing and vision were recruited from a participant pool within a large university in New England. All participants were undergraduate college students at the time of participation. They were compensated with course credit for their psychology courses, and the study was approved by the local Institutional Review Board. Three participants had to be excluded due to missing data, leaving a final sample of 37 participants (24 female, 13 male, mean age = 18.89 years, SD = 1.05 years). All participants in the final sample completed the IRI, the SSS, the DSQ, and the CIS (see Appendix A for mean scores by personality trait questionnaire).
A post hoc power analysis (using the pwr package in R; Champely et al., 2020) was conducted to determine the required sample size. With an effect size of 0.35, a target power of 0.8, and 15 predictors, the analysis indicated that n = 51 would be needed, which suggests that our study is underpowered for detecting large effects.
3.2 Dynamic stimuli
We obtained 192 short videos from the RISC database (Rothermich & Pell, 2015). Figure 1 shows screenshots for the four different intentions. The RISC database contains 600 videos in total; they have been used successfully as a tool for research in healthy adults across the lifespan and in children (Giles et al., 2019; Jakobson et al., 2018; Rothermich et al., 2021a; Rothermich et al., 2020; Rothermich et al., 2019). The videos in the published RISC inventory had been validated previously with 31 young adult participants (mean age = 23.21 years, SD = 3.88). For the specific subset of 192 videos used in the current study, young adults in the original study (Rothermich & Pell, 2015) identified the speaker’s intention in a four-alternative forced-choice task with an average accuracy of 85.37% correct (literal positive: M = 86.49%, SD = 17.94%; literal negative: M = 93.62%, SD = 4.63%; sarcastic: M = 79.84%, SD = 14.07%; teasing: M = 81.52%, SD = 10.64%).

Figure 1. Screenshot of one example scene taken from the RISC video database.
The videos depicted 5–10 second social interactions between two actors; in total, four different actors assumed a unique fictional identity they portrayed consistently over all vignettes (two female, mean age in years = 19.50, SD = 0.50). Various paired relationship types were constructed between the actors, including a mixed-sex couple (Paul and Lisa), female friends (Lisa and Anna), mixed-sex colleagues (Anna and Peter), and a male boss with their employee. The presented scenarios consisted of one person asking a question and the other person responding in one of four ways: literal positive, literal negative, jocularity, or sarcasm. For example, one scenario had one actor holding a plate of cookies and asking if the other actor would like one of them. The other actor either gave a
Each participant may see the same scenario (topic content in the video, for example cookies) multiple times, but they never see the same item twice due to the pseudo-randomization of the order of trials. Keeping the lexical content between the scenes of the same topic identical and only changing the manner in which the actors responded (literal positive vs. sarcastic and literal negative vs. jocular) allowed us to explore the effects of the communicative intention of the response and how those may be influenced by individual differences, as well as participants’ perceptions of the responding actor’s sincerity, social appropriateness, likeability, and friendliness.
3.3 Procedure
Each participant was presented with 192 videos from the RISC database (12 scenes with four relationships and four intentions). Each video was followed by a yes or no question (“Was the response sincere?”) and three rating questions. The rating questions were as follows: “How socially appropriate was the response?” (Answer options: 1 = not at all appropriate, 2, 3, 4, 5 = very appropriate), “How friendly was the response?” (Answer options: 1 = not at all friendly, 2, 3, 4, 5 = very friendly), “How likable was the person responding?” (Answer options: 1 = not at all likable, 2, 3, 4, 5 = very likable). Before and after watching the videos, participants completed the four questionnaires; the order of questionnaires was counterbalanced. We also tracked the participant’s eye movements during the experiment for a companion study (see Rothermich et al., 2021b).
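The factorial structure of the stimulus set described above (12 scenes, four relationships, four intentions) can be sketched in a few lines of Python. This is a minimal illustration only: the scene and relationship labels, the function name `build_trial_list`, and the plain seeded shuffle are assumptions made here, and the actual pseudo-randomization constraints used in the experiment are not specified in the text.

```python
import itertools
import random

# Labels are hypothetical placeholders; only the counts come from the text.
SCENES = [f"scene_{i:02d}" for i in range(1, 13)]  # 12 scenes
RELATIONSHIPS = ["couple", "friends", "colleagues", "boss_employee"]
INTENTIONS = ["literal_positive", "literal_negative", "sarcasm", "jocularity"]


def build_trial_list(seed=0):
    """Return every scene x relationship x intention combination once,
    in a shuffled order (a simple stand-in for pseudo-randomization)."""
    trials = list(itertools.product(SCENES, RELATIONSHIPS, INTENTIONS))
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trial_list(seed=1)
print(len(trials))        # number of trials presented to one participant
print(len(set(trials)))   # number of distinct items among those trials
```

Because each combination appears exactly once, a participant can encounter the same scene topic repeatedly without ever seeing the same video twice.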
3.4 Data analysis
The results for self-reported empathy (IRI questionnaire) were collapsed into two subscales, cognitive empathy (Perspective Taking and Fantasy Scale) and affective empathy (Empathic Concern and Personal Distress). For the analysis of friendliness, appropriateness, and likeability, we only included items in the analysis where participants correctly identified the intention (“Was the response sincere?” yes/no). On average, 11.34% of the items were excluded. All data points were included for the analysis of accuracy.
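The two preprocessing steps described above can be sketched as follows. The field names, the helper names, and the toy data are hypothetical, and averaging the two subscales is an assumption: the text states only that the IRI subscales were collapsed, not how. A trial counts as correct when a literal response is judged sincere or a nonliteral response is judged insincere.

```python
from statistics import mean

LITERAL = {"literal_positive", "literal_negative"}


def is_correct(intention, answered_sincere):
    """Literal items should be judged sincere; sarcasm and jocularity should not."""
    return answered_sincere == (intention in LITERAL)


def filter_correct(trials):
    """Keep correctly identified trials for the rating analyses and
    report the percentage of trials that were excluded."""
    kept = [t for t in trials if is_correct(t["intention"], t["answered_sincere"])]
    excluded_pct = 100.0 * (len(trials) - len(kept)) / len(trials)
    return kept, excluded_pct


def collapse_iri(scores):
    """Collapse four IRI subscales into cognitive and affective empathy.
    Averaging is an assumption made for this sketch."""
    return {
        "cognitive": mean([scores["perspective_taking"], scores["fantasy"]]),
        "affective": mean([scores["empathic_concern"], scores["personal_distress"]]),
    }


# Toy data for illustration only (not from the study).
toy_trials = [
    {"intention": "sarcasm", "answered_sincere": False, "friendliness": 2},
    {"intention": "jocularity", "answered_sincere": True, "friendliness": 4},
    {"intention": "literal_positive", "answered_sincere": True, "friendliness": 5},
    {"intention": "literal_negative", "answered_sincere": True, "friendliness": 2},
]
kept, excluded_pct = filter_correct(toy_trials)
print(len(kept), excluded_pct)
```

Note that the accuracy analysis itself uses all trials; the filter applies only to the friendliness, appropriateness, and likeability ratings.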
The data were analyzed using R (R Core Team, 2013) by means of a glmer analysis for the accuracy results and an ordinal logistic regression for the Likert scale data. A glmer model was built for the dependent variable accuracy with the fixed effect intention (LITERAL POSITIVE, LITERAL NEGATIVE, SARCASM, JOCULARITY) and the random effects subject, scene, and relationship. Models were compared based on Akaike information criterion (AIC; Hu, 2007), χ², and p-values. For post hoc comparisons we report β-, z/t-, and p-values. To test the influence of personality traits, we ran an LMER on the averaged accuracy data with subjects as a random factor and the 11 traits as fixed factors. The results include p-values, odds ratios and confidence intervals.
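For readers less familiar with this model class, the accuracy model and the model comparison can be written out as below. This is a sketch of a standard logistic mixed-effects specification with crossed random intercepts for subject ($i$), scene ($j$), and relationship ($k$). The notation, and the restriction to random intercepts, are assumptions made here rather than details reported in the analysis. $p$ denotes the number of estimated parameters, and $\hat{L}_0$ and $\hat{L}_1$ are the maximized likelihoods of the models without and with the factor intention.

```latex
\begin{align}
\operatorname{logit} P(\text{correct}_{ijk} = 1)
  &= \beta_0 + \beta_{\text{intention}[ijk]} + u_i + v_j + w_k, \\
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
v_j &\sim \mathcal{N}(0, \sigma_v^2), \quad
w_k \sim \mathcal{N}(0, \sigma_w^2), \\
\mathrm{AIC} &= 2p - 2 \ln \hat{L}, \\
\chi^2 &= 2\left(\ln \hat{L}_1 - \ln \hat{L}_0\right).
\end{align}
```

A lower AIC together with a significant likelihood-ratio $\chi^2$ indicates that adding intention improves the model.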
To test the effect of intention on perceived appropriateness, friendliness and likeability, we implemented an ordinal logistic regression analysis in R using the function polr in the R package MASS (Ripley et al., 2013). Subject was added as a random factor, while intention was included as a fixed factor. Models were compared based on AIC and residual deviance, and we report β-, z/t-, and p-values for post hoc comparisons. To test the influence of personality traits, we ran an ordinal logistic regression on the averaged Likert scale data with subjects as a random factor and the 11 traits as fixed factors; this was done for each intention separately.
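To make the ordinal model more concrete, the sketch below shows how a proportional-odds (cumulative logit) model turns a linear predictor into probabilities for the five rating categories. It uses the parameterization logit P(Y ≤ k) = ζ_k − η, which is the one documented for polr. The threshold values and the function name `category_probabilities` are hypothetical and chosen for illustration; they are not estimates from the study.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, thresholds):
    """Return P(Y = k) for k = 1..K under logit P(Y <= k) = zeta_k - eta.

    `thresholds` holds the K - 1 increasing cut points (zeta); `eta` is the
    linear predictor, e.g. the coefficient for one intention condition.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(z - eta) for z in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical cut points for a 5-point scale (illustration only).
zeta = [-2.0, -1.0, 1.0, 2.0]
baseline = category_probabilities(0.0, zeta)
shifted = category_probabilities(1.5, zeta)  # positive eta favors higher ratings
print([round(p, 3) for p in baseline])
print([round(p, 3) for p in shifted])
```

Under this parameterization, a positive coefficient shifts probability mass toward higher ratings, and exponentiating a coefficient gives the corresponding odds ratio.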
4 Results
4.1 Accuracy
On average, participants had high accuracy scores when identifying speaker INTENTION (M = 88.51%, SD = 18.35%). Accuracy was highest when identifying SARCASM items (96% correct), then LITERAL POSITIVE items (90% correct), LITERAL NEGATIVE items (88% correct), and JOCULARITY items (79% correct, see Figure 2). Including the factor INTENTION in the accuracy model improved it significantly. Post hoc comparisons revealed significantly higher accuracy for SARCASM versus JOCULARITY items. We also found higher accuracy for LITERAL POSITIVE compared to JOCULARITY items, as well as higher accuracy for LITERAL POSITIVE compared to LITERAL NEGATIVE items. Accuracy was also significantly predicted by self-reported affective empathy: participants with lower affective empathy appear to be better at identifying the sincerity of speaker intentions (see Table 1 for details).

Figure 2. Violin plots and boxplots depicting mean accuracy scores for identifying speaker intent.
Table 1. GLMER models and post hoc results for accuracy. Italics indicate significant results. P-values were adjusted using the Tukey method.
Note. AIC = Akaike Information Criterion.
4.2 Appropriateness
Participants rated literal positive utterances as most appropriate (M = 4.50, SD = 0.68), followed by jocular (M = 3.41, SD = 0.96), literal negative (M = 2.87, SD = 1.11), and sarcastic responses (M = 2.03, SD = 0.84). Including the factor INTENTION in the appropriateness model improved it (see Figure 3 for details). Post hoc comparisons revealed significantly lower appropriateness ratings for SARCASTIC versus LITERAL POSITIVE, LITERAL NEGATIVE, and JOCULARITY items. We also found significantly lower appropriateness ratings for LITERAL NEGATIVE and JOCULAR compared to LITERAL POSITIVE scenes. A significant difference was also found when comparing JOCULAR to LITERAL NEGATIVE responses (see Table 2a for model details).

Figure 3. Violin plots and boxplots depicting mean appropriateness ratings.
We examined the individual predictors for appropriateness ratings. For literal positive items, DSQ40 - Immature, SSS - Face-saving, and SSS - Frustration Diffusion were significant positive predictors in the model, while CIS Speak and SSS - General sarcasm were negative predictors (see Table 2b for details). DSQ40 - Immature was a significant negative predictor for literal negative, sarcastic, and jocular responses. Appropriateness ratings of sarcastic items were also negatively predicted by SSS - Frustration Diffusion scores.
Table 2a. Ordinal logistic regression and post hoc results for appropriateness. Italics indicate significant results. P-values were adjusted using the Tukey method.
Table 2b. Logistic regression for appropriateness and personality traits. Italics indicate significant results.
4.3 Friendliness
Participants rated literal positive utterances as most friendly (M = 4.40, SD = 0.74), followed by jocular (M = 3.63, SD = 0.92), literal negative (M = 2.45, SD = 0.95), and sarcastic responses (M = 1.88, SD = 0.80). Including the factor INTENTION in the friendliness model improved it significantly (see Figure 4 and Table 3a for details). Post hoc comparisons reveal significantly lower friendliness for SARCASTIC versus LITERAL POSITIVE, LITERAL NEGATIVE, and JOCULARITY items. We also found lower friendliness ratings for LITERAL NEGATIVE and JOCULAR compared to LITERAL POSITIVE scenes. A significant difference was also found when comparing JOCULAR to LITERAL NEGATIVE responses (see Table 3a for model details).

Figure 4. Violin plots and boxplots depicting mean friendliness ratings.
Table 3a. Ordinal logistic regression and post hoc results for friendliness. Italics indicate significant results. P-values were adjusted using the Tukey method.
We examined the individual predictors for friendliness ratings. For literal positive items, DSQ40 - Neurotic and IRI affective were significant positive predictors in the model (see Table 3b for details). For literal negative responses, only DSQ40 - Immature and SSS - Face-saving were significant negative predictors. Friendliness ratings of jocular items were positively predicted by DSQ40 - Mature and negatively by DSQ40 - Immature scores. No significant predictors were found for sarcastic items.
Table 3b. Logistic regression for friendliness and personality traits. Italics indicate significant results.
4.4 Likeability
Participants rated literal positive utterances as most likable (M = 4.34, SD = 0.82), followed by jocular (M = 3.73, SD = 0.84), literal negative (M = 2.66, SD = 0.99) and sarcastic responses (M = 1.81, SD = 0.78). Including the factor INTENTION in the likeability model improved it significantly (see Figure 5 and Table 4a for details). Post hoc comparisons reveal significantly lower likeability for SARCASTIC versus LITERAL POSITIVE, LITERAL NEGATIVE, and JOCULARITY items. We also found lower likeability ratings for LITERAL NEGATIVE and JOCULAR compared to LITERAL POSITIVE scenes. A significant difference was also found when comparing JOCULAR to LITERAL NEGATIVE responses (see Table 4a for model details).

Figure 5. Violin plots and boxplots depicting mean likeability ratings.
Table 4a. Ordinal logistic regression and post hoc results for likeability. Italics indicate significant results. P-values were adjusted using the Tukey method.
We examined the individual predictors for likeability ratings. For literal positive items, DSQ40 - Immature, SSS - Frustration Diffusion, and SSS - Face-saving were significant positive predictors in the model, while SSS - General sarcasm was a negative predictor (see Table 4b for details). Likeability ratings of jocular items were positively predicted by DSQ40 - Immature scores. No significant predictors were found for literal negative or sarcastic items.
Table 4b. Logistic regression for likeability and personality traits. Italics indicate significant results.
5 Discussion
The Relational Inference of Social Communication Inventory (RISC) was developed to test how we perceive common forms of nonliteral language, more specifically social interactions involving sarcasm and jocularity (Rothermich & Pell, 2015). In this study, we used the RISC inventory to test assumptions of the Tinge Hypothesis (Dews & Winner, 1995) and to examine how individual differences may influence people’s interpretations of speaker intentions during literal and nonliteral communication. Based on previous research (e.g., Jakobson et al., 2018; Rockwell, 2003), we hypothesized that individual differences correlate with participants’ evaluations of nonliteral language. Our findings will help support our understanding of the traits related to difficulties with nonliteral language comprehension. Our findings will further inform the types of personality surveys that are most relevant when testing healthy and clinical populations, such as individuals with Autism Spectrum Disorders, Parkinson’s disease, or schizophrenia.
5.1 Accuracy
Overall, accuracy for determining speaker intention was highest when identifying sarcastic items, followed by literal positive items, literal negative items, and finally, jocularity items. Interestingly, participants were better at identifying whether the speaker was sincere for sarcastic responses, compared to literal positive responses. This finding is in contrast to earlier studies (e.g., Jakobson et al., 2018; Rothermich & Pell, 2015) which revealed literal statements were easier to identify than nonliteral statements. We assume that differences in our sample compared to earlier samples, such as geographical region (Dress et al., 2008) and gender distribution (Ivanko et al., 2004), might have contributed to the current findings. Furthermore, in the 2015 study by Rothermich and Pell, participants were asked to identify the intentions of the speakers via a forced-choice task. They had the options “literal,” “sarcastic,” “teasing,” and “lying” with the exception of prosocial lies, which are part of the RISC database but were not studied in the current experiment. This difference could have led to the changes in accuracy when using a binary question like in the current study. Accuracy with respect to each individual difference measure will be discussed in the sections below.
5.2 Appropriateness, friendliness and likeability
We also asked participants about their social impressions when watching the videos, that is, how appropriate, friendly, and likable the response was for each vignette. Participants rated literal positive utterances as the most appropriate, most friendly, and most likable, followed by jocular and literal negative utterances. In contrast, sarcastic responses were rated as the least appropriate, friendly, and likable. The difference between literal positive and sarcastic utterances is worth noting since the lexical content was identical for these conditions and only the mannerisms that the actors used to respond changed (such as prosody, body language, etc.).
Our findings demonstrate how nonverbal cues are capable of influencing speaker perceptions during social communication. In line with the Tinge Hypothesis, we hypothesized (H1) that sarcastic statements would be rated as more appropriate compared to literal negative statements. However, in our study, sarcastic statements were rated as less friendly/appropriate when compared to literal negative statements, which stands in contrast with the Tinge Hypothesis. Similarly, Colston (1997) could not support the Tinge Hypothesis and showed that sarcasm was often used to decrease as opposed to increase approval. Both Pexman and Olineck (2002) as well as Colston (1997) referred to differences in methodology as the reason for varying results; for example, the distinction between tasks that target the intentions of the speaker or the social evaluation by the listener. They also mention the importance of prosody and other acoustic parameters for ratings of sarcasm and jocularity.
Most studies supporting the Tinge Hypothesis used written or auditory materials (Harris & Pexman, 2003; Matthews et al., 2010; Milanowicz et al., 2017; Thompson et al., 2016) or cartoons (Dews et al., 1996), making them less ecologically valid and lacking important cues, such as facial expressions and/or prosody. In contrast, the RISC videos provide multiple prosodic, facial, and body language cues that help get the speaker’s intention across. Thus, the actor’s usage of negative nonverbal cues (e.g., eye rolling, smirking) might override the muting effect of the literal meaning and participants in our study were influenced by the auditory and visual cues when deciding if a statement is appropriate or friendly. It is also possible that as participants are witnessing a conversation between two people and observe the positive/negative reactions to literal and nonliteral statements, they are more inclined to judge sarcasm as inappropriate, unfriendly and/or unlikable. Thus, having direct access to the consequences of a sarcastic comment might override a part of the proposed muting function of sarcasm.
We find support for the Tinge Hypothesis as participants overall judged literal positive statements as the most friendly, appropriate, and likable, compared to jocularity, sarcasm, and literal negative statements (H1). This is similar to results by Pickering and colleagues (2018), as well as Pexman and Olineck (2002), who also found that participants seem to favor the “full” positive statements, that is, literal positive statements that are accompanied by friendly facial and acoustic cues. In line with the results by Pickering et al. (2018), our results confirm the Tinge Hypothesis and reveal that jocularity is judged as less positive than literal positive statements. We would argue that the literal positive intention is the most expected, socially accepted form of communication, and therefore preferred above the other intentions we tested. Future studies should systematically test the assumptions of the Tinge Hypothesis with different modalities (written, auditory only, and audio-visual), varying tasks (speaker intent vs. social impressions), and conditions in which the reaction of the listener is or is not visible.
5.3 Individual differences
We hypothesized (H2) that participants with higher empathy scores (based on the IRI) would be better at identifying nonliteral language as insincere due to advantages in cognitive and affective perspective taking. This could not be confirmed in the current study. In contrast, we found that participants with higher affective empathy scores find it harder to decode if a statement is meant as sincere or insincere. When testing individuals with schizophrenia, Sparks and colleagues (2010) found that performance on identifying sarcasm was predicted by affective empathy scores, revealing higher accuracy for participants with lower personal distress. In our sample, participants with higher affective empathy seem to have a harder time correctly identifying if somebody was sincere maybe because some of the intentions, such as literal negative or sarcastic items, create discomfort. Future studies will need to be conducted to further test the influence of affective empathy on identifying speaker intentions.
Affective empathy also affected judgments of friendliness for literal positive statements; in other words, participants with higher affective empathy scores judged literal positive items as friendlier. It is possible that participants with high affective empathy are drawn to the positive attitude displayed in these items. In the videos, actors receiving a literal positive comment often show contentment via nonverbal cues (e.g., smiling). This might affect participants with higher affective empathy more than those with lower levels.
Based on previous findings (Ivanko et al., 2004; Polk et al., 2009), we hypothesized (H3) that higher scores on the Self-Reported Sarcasm Scale (SSS) would positively predict the accuracy of identifying sarcasm as insincere, and that participants who reported using sarcasm often would also find it more appropriate, friendly, and likable. Our findings showed no significant results for accuracy. However, people with high SSS Frustration Diffusion (FD) and high SSS Face-Saving (FS) scores found literal positive items to be more appropriate and likable. It is possible that participants who are concerned about their self-image (saving face) value positive statements that are unambiguous and non-threatening. On the other hand, we found that SSS General Sarcasm (GS) scores negatively predicted the appropriateness and likability of literal positive responses. This suggests that the SSS subscales reflect different tendencies of self-perceived sarcasm use: the less participants reported using sarcasm in general, the more appropriate and likable they perceived literal positive comments to be.
In the current study, we hypothesized that people with high CIS-production scores (i.e., who prefer to communicate indirectly) would judge sarcasm as more appropriate, friendly, and likable (H4). We found that individuals who rated themselves as likely to use indirect speech rated literal positive comments as less appropriate. In other words, if somebody uses more direct speech in everyday life, they tend to evaluate literal positive comments as more appropriate. Directness of communication can vary between communities of practice or cultural backgrounds; in another study using the RISC, we tested individuals from communities that prefer direct communication (e.g., Germany) or indirect communication (e.g., China; Giles et al., 2019). We found that German participants evaluated literal negative responses as more appropriate in certain situations, while Chinese participants preferred a more indirect way of communicating.
To our knowledge, no studies have directly investigated the extent to which individual differences related to defensive communication styles may influence nonliteral language processing. We speculated (H5) that participants with higher scores on the DSQ immature subscale would be poorer at identifying nonliteral language as insincere due to disadvantages in ignoring irrelevant information as well as cognitive misrepresentations (Bowins, 2018). Individuals with a higher score on the DSQ immature subscale were less likely to find the statements appropriate, friendly, or likable, especially when encountering sarcasm or jocularity. The more immature an individual's level of defense, the greater the cognitive distortion, which leads to negative thoughts (Bowins, 2018). This could explain a negative bias when evaluating rude or inappropriate statements. These results might call for this questionnaire to be used in more studies, especially when testing specific populations that have trouble with nonliteral language.
Taken together, we find that certain personality traits and communication preferences, for example empathy or sarcasm use, influence the way individuals judge literal and nonliteral language. These individual differences could also affect the Tinge Hypothesis; for example, people who fall into the high sarcasm use category (as measured by the SSS) could find sarcasm more friendly than literal negative statements. However, when exploring this possibility post hoc, we did not find any indication that this is the case. For all tested individual difference measures, we found that both high and low groups consistently rated sarcasm as less friendly, appropriate, and likable (see Appendix B).
6 Limitations
One limitation of our study is the small sample size (n = 37), which is lower than the sample size estimated by a post hoc power analysis (n = 51 needed for a power of 0.8). A replication of our results is needed to determine if the patterns persist with larger samples. Another limitation of our study is that all our individual difference measures were obtained via self-report. In comparison to performance-based tasks or online measures, self-reported measures might reflect the goal to be seen positively by others and might not objectively represent somebody’s real behavior (Richter & Kunzmann, 2011). Future studies should explore performance-based measures or online measures to study the evaluation of nonliteral language. A further limitation is that all our participants were young, healthy adults in college, and our sample is therefore lacking diversity in terms of education, age, and other demographic variables, such as geographic location (Dress et al., 2008).
7 Conclusions
Many of the hypotheses and frameworks concerning sarcasm and jocularity are based on studies that have used static images and/or written materials, without consideration of the wealth of social cues that accompany these types of statements in everyday life. Using video materials, we could show that certain personality traits and communication preferences seem to correlate with the identification of speaker intent and the judgment of social impressions. Moreover, although meaningful and ecologically valid information is communicated through written form every day, the use of video materials has the advantage of reflecting real-world communication. For example, individuals who use sarcasm often seem to have advantages when processing multimodal, dynamic stimuli. The Tinge Hypothesis could only partially be confirmed, most likely because it was developed mostly on the basis of written materials, and future studies need to expand it to include assumptions about situations in which a multitude of cues are available. In our study, sarcasm was judged as the least appropriate, friendly, and liked intention, and the positive surface meaning seems to be outweighed by the presence of negative audio-visual cues, thus failing to mute criticism.
Supplemental Material
Supplemental material, sj-docx-1-las-10.1177_00238309211010859 for No, No One Had Fun. Individual Differences in Nonliteral Language Perception by Gitte Henssel Joergensen, Pavitra Rao Makarla, Matthew Fammartino, Lauren Benson and Kathrin Rothermich in Language and Speech
Footnotes
Acknowledgements
We would like to thank Maria Lattanzi, Julia Mocciola, and Tom Pietruszewski for help with data acquisition, as well as Gerry Altmann and Eiling Yee for sharing their laboratory equipment. We would further like to thank Havan Leigh Harris for comments on an earlier draft of this manuscript.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.