Abstract
Religious affiliation is an important identifying characteristic for many individuals and relates to numerous life outcomes including health, well-being, policy positions, and cognitive style. Using methods from computational linguistics, we examined language from 12,815 Facebook users in the United States and United Kingdom who indicated their religious affiliation. Religious individuals used more positive emotion words (β = .278, p < .0001) and social themes such as family (β = .242, p < .0001), while nonreligious people expressed more negative emotions like anger (β = −.427, p < .0001) and categories related to cognitive processes, like tentativeness (β = −.153, p < .0001). Nonreligious individuals also used more themes related to the body (β = −.265, p < .0001) and death (β = −.247, p < .0001). The findings offer directions for future research on religious affiliation, specifically in terms of social, emotional, and cognitive differences.
Religion is a ubiquitous part of human life. While western, educated, industrialized, rich, and democratic (WEIRD) countries are generally becoming more secular, around 77% of the U.S. population remain religious (Pew Research Center, 2015). Moreover, over 80% of the world’s population identifies with some type of religion—a trend that appears to be on the rise (Pew Research Center, 2012). Religious affiliation is associated with meaningful life outcomes; in general, those with a religious affiliation (compared to those who define themselves as agnostic or atheists) tend to have better health (Koenig, 2004), higher well-being (Joseph, Linley, & Maltby, 2006; Lewis & Cruise, 2006; Pargament, 2002), and longer life (McCullough, Hoyt, Larson, Koenig, & Thoresen, 2000; Powell, Shahabi, & Thoresen, 2003). On the other hand, religious affiliation has been associated with increased rates of obesity (Kim, Sobel, & Wethington, 2003), racism (Hall, Matz, & Wood, 2009), and antiscientific attitudes (Gauchat, 2008). While positive and negative outcomes related to religious affiliation are subject to variation based on sect and one’s particular community (Pargament, 2002), the question is no longer whether religious affiliation relates to life outcomes, but how.
A growing number of studies have investigated potential psychological mechanisms linking religious affiliation and life outcomes. Given the starkly different metaphysical pictures of reality proposed by religious and nonreligious belief systems, it should come as little surprise that religious affiliation is associated with a number of psychological differences. Religious affiliation correlates with more agreeable and conscientious personalities (Saroglou, 2002) and more self-control (McCullough & Willoughby, 2009). Religious affiliation also correlates with prosocial behavior (Shariff & Norenzayan, 2007), marital stability (Mahoney, Pargament, Tarakeshwar, & Swank, 2001), and better health-related behaviors (e.g., Strawbridge, Shema, Cohen, & Kaplan, 2001). Religious belief influences attitudes toward certain political issues such as abortion (Minkenberg, 2002). In terms of cognition, more religious individuals have been shown to tend toward a more intuitive—as opposed to analytical—thinking style (Gervais & Norenzayan, 2012). In general, religious affiliation is a widely measured variable that predicts a number of outcomes of psychological interest and is emerging as a variable of particular interest to well-being research.
Religions themselves can be considered different cultural groups. Religions provide coherent sets of social norms for group membership (Cohen & Hill, 2007). Even for nonreligious individuals (e.g., atheists and agnostics), there are norms around how to respond to religion, which create cultures of their own. These differences are apparent both between religious and nonreligious orientations and among the sects within each of these categories. Notably, language plays an important role both in reflecting a culture and in influencing and forming the norms and perspectives of that culture (Cohen, 2009).
Behavioral differences associated with religious affiliation also appear in online environments. By analyzing language from social networking sites, algorithms can predict individuals’ religious affiliations with a high degree of accuracy (Kosinski, Stillwell, & Graepel, 2013; Nguyen & Lim, 2014; Wagner, Asur, & Hailpern, 2013). For example, one study examined language differences underlying the polarization between Islamist and secular groups in Egypt during the so-called Arab Spring, identifying, among many other factors, the different media outlets preferred by each group (Weber, Garimella, & Batayneh, 2013).
A study by Ritter, Preston, and Hernandez (2013) compared linguistic differences between Christians and Atheists on Twitter, a popular online social networking site. The religious affiliation of users was inferred from whether they followed updates from Christian or Atheist leaders. The authors report that those who followed Christian leaders used a greater number of positive emotion words (e.g., “love” and “nice”) and words related to social connectedness (e.g., “mate” and “friend”), while those who followed Atheist leaders used more negative emotion words (e.g., “hurt” and “nasty”) and words related to cognitive processes (e.g., “think” and “consider”). These categories (positive emotion, social connection, and thinking style) were taken, in slightly modified form, from categories created for the Linguistic Inquiry and Word Count (LIWC) program. LIWC contains a number of other categories that Ritter et al. did not test.
Unlike self-report surveys, language posted on social media sites such as Facebook and Twitter offers more spontaneously generated language data potentially relevant to a person’s personality, thoughts, attitudes, and behaviors. Analyzing language data from these platforms has been shown to predict traits like personality (e.g., Kern et al., 2014a; Schwartz et al., 2013a), age (Kern et al., 2014b), and gender (Park et al., 2016), among many other factors (Kosinski et al., 2013). Analysis of language from social media works about as well as other methods of evaluation for outcomes including personality (Park et al., 2015), life satisfaction (Schwartz et al., 2013b), and heart disease (Eichstaedt et al., 2015). Notably, the words that correlate most strongly with these outcomes can illustrate how these constructs reveal themselves in natural language. Words, phrases, and linguistic themes that differentiate religious from nonreligious affiliations can provide insights into the kinds of topics that people from these groups tend to express.
Analyzing social media language data can follow either top-down theory–based approaches or bottom-up data–driven approaches (Kern et al., 2016). In general, top-down approaches use researcher-created language categories, whereas bottom-up approaches utilize computational clustering algorithms to derive language clusters. Ritter and colleagues (2013), in their study of religious affiliation on Twitter, used a top-down approach, in which they tested a few preselected language categories. This approach limited their opportunity to discover other linguistic correlates of religious affiliation unanticipated a priori. A top-down approach that examined more language topics would potentially provide unanticipated findings. Also, a particular benefit of bottom-up approaches is the possibility of identifying topics (language clusters) and patterns that arise from the data itself rather than from researcher-created categories, potentially yielding greater coverage of the links underlying religious affiliation and other life outcomes.
In the present study, we analyzed differences in language use between people who reported religious or nonreligious affiliations on Facebook. Whereas Ritter et al.’s (2013) study relied on an assumed religious affiliation, we were able to determine affiliation based on participants’ direct report. We analyzed linguistic differences between religious and nonreligious individuals using two computational linguistic methods. First, replicating Ritter et al.’s top-down approach, we used the LIWC program (Pennebaker, Francis, & Booth, 2001) to test whether their findings would generalize to a different online context (i.e., Facebook rather than Twitter) with participants who self-reported their religious affiliation (rather than having it inferred). Additionally, rather than limiting our analysis to a few language categories, we examined all of the LIWC categories for differences across religious affiliation. Extending the linguistic analysis further, we also used differential language analysis (DLA; Schwartz et al., 2013a), a bottom-up, data-driven approach that uses language clusters based on semantic similarity as variables rather than researcher-determined categories. We used both of these methods to examine words, phrases, and topics associated with religious and nonreligious affiliation.
Method
The source of our language and religious affiliation data was Facebook, a popular online social networking website (Duggan & Smith, 2013). Specifically, we used data from the MyPersonality application, which asked Facebook users to consent to allow researchers to analyze their written online posts and other self-reported information (Kosinski et al., 2013). The MyPersonality application was available from 2007 to 2012, and its authors can be reached through www.MyPersonality.org (Kosinski, Matz, Gosling, Popov, & Stillwell, 2015). Several studies have concluded that findings from Internet and social media studies can be generalized to the population (Back et al., 2010; Gosling, Vazire, Srivastava, & John, 2004). In particular, the MyPersonality sample has been found to be generally representative, although slightly younger than the general population (Kosinski et al., 2013; Schwartz et al., 2013b).
The sample comprised MyPersonality participants who wrote at least 1,000 words across their statuses and had written an answer in the Facebook “religion” prompt for a group that exceeded 100 participants. Most participants were from the United States (87%) and United Kingdom (11%). Of the total 12,815 participants, 10,359 were considered “religious” (Christian, Hindu, Muslim, and Buddhist), and the majority of these religious individuals were Christian (8,913). This “Christian” category contained 2,426 self-identified Catholics, 1,118 Baptists, 336 Lutherans, 219 Pentecostals, 265 Methodists, 248 Protestants, and 4,301 users who identified simply as Christians. The rest of the sample (2,456) was “nonreligious,” which included 1,219 self-identified Atheists and 1,237 Agnostics. These figures are generally representative of the religious and nonreligious landscape of the U.S. population. As we were underpowered to compare between cultures (United States vs. United Kingdom) or between different religious groups, we limited our analysis to comparing religious and nonreligious individuals.
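The inclusion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`religion`, `word_count`), the lowercase label sets, and the choice to count group sizes after applying the word threshold are hypothetical and are not taken from the MyPersonality data or its documentation.

```python
from collections import Counter

# Hypothetical label sets; the grouping mirrors the categories named in the text.
RELIGIOUS = {"christian", "catholic", "baptist", "lutheran", "pentecostal",
             "methodist", "protestant", "hindu", "muslim", "buddhist"}
NONRELIGIOUS = {"atheist", "agnostic"}


def _norm(label):
    """Normalize a free-text affiliation label (assumed convention)."""
    return label.strip().lower()


def select_sample(users, min_words=1000, min_group_size=100):
    """Keep users with enough text whose stated affiliation group is large enough."""
    # Assumption: group sizes are counted among users who already meet the word threshold.
    eligible = [u for u in users if u["religion"] and u["word_count"] >= min_words]
    group_sizes = Counter(_norm(u["religion"]) for u in eligible)
    return [u for u in eligible if group_sizes[_norm(u["religion"])] > min_group_size]


def code_affiliation(label):
    """Return 1 for religious, 0 for nonreligious, None for labels outside both sets."""
    key = _norm(label)
    if key in RELIGIOUS:
        return 1
    if key in NONRELIGIOUS:
        return 0
    return None
```

The binary code returned by `code_affiliation` corresponds to the single religious versus nonreligious variable used in the analyses.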
Procedure
We began by tokenizing Facebook posts (Potts, 2011) to extract words (including misspellings of common words), punctuation, and emoticons, as well as two- and three-word phrases (see Kern et al., 2016, and Schwartz et al., 2013a, for detailed methodology). In essence, this process involves separating words from their textual context in order to cluster them into linguistic variables.
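As a rough illustration of this step, the following Python sketch splits a status update into words, emoticons, and punctuation and then counts one- to three-token phrases. It is not the tokenizer cited above (Potts, 2011); the regular expression and the function names are simplified assumptions made for this example.

```python
import re
from collections import Counter

# Simplified, hypothetical token pattern: emoticons first, then words
# (keeping contractions together), then numbers, then any other single symbol.
TOKEN_RE = re.compile(
    r"[:;=]-?[)(DPp/\\]"           # basic emoticons such as :) ;-( =D
    r"|[A-Za-z]+(?:'[A-Za-z]+)?"   # words, including contractions like can't
    r"|\d+"                        # numbers
    r"|[^\sA-Za-z0-9]"             # any remaining non-space symbol (punctuation)
)


def tokenize(text):
    """Split text into tokens; words are lowercased, emoticons are left as written."""
    tokens = TOKEN_RE.findall(text)
    return [t.lower() if t[0].isalpha() else t for t in tokens]


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=3):
    """Count every one- to max_n-token phrase in a piece of text."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts
```

In this sketch, phrases are built from all consecutive tokens, including punctuation; restricting phrases to word tokens would be an equally reasonable choice.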
Using a top-down approach, we first examined linguistic differences between religious and nonreligious users using LIWC (Pennebaker, Booth, & Francis, 2007). LIWC includes numerous predefined categories (e.g., positive emotion, including words such as “happy,” “joy,” and “love”) and counts the number of times words from each category are used. The program then provides the relative frequency of each language category (i.e., frequency adjusted by the total number of words). Replicating Ritter et al. (2013), we tested whether those with a religious affiliation used more positive emotion and social words, and fewer cognitive process words, by using logistic regression to relate each category score to a single binary-coded religious affiliation variable. Extending Ritter et al.’s study, we also explored whether other LIWC categories differentiated the two groups.
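The relative-frequency score described above can be illustrated with a short Python sketch. The lexicon here is a hypothetical stand-in containing a few of the example words mentioned in this article; it is not the LIWC dictionary, and the trailing-asterisk prefix convention is an assumption made for the example.

```python
# Hypothetical mini-lexicon for illustration only; not the actual LIWC dictionary.
# A trailing "*" is treated as a prefix wildcard (an assumed convention).
EXAMPLE_LEXICON = {
    "posemo": ["love", "good", "happ*"],
    "family": ["aunt", "uncle", "mother*"],
}


def category_scores(tokens, lexicon):
    """Return, per category, the share of tokens that match a lexicon entry."""
    total = len(tokens)
    scores = {}
    for category, entries in lexicon.items():
        exact = {e for e in entries if not e.endswith("*")}
        prefixes = tuple(e[:-1] for e in entries if e.endswith("*"))
        # A token counts once per category if it matches an exact entry or a prefix.
        hits = sum(1 for t in tokens if t in exact or t.startswith(prefixes))
        scores[category] = hits / total if total else 0.0
    return scores
```

Each user's category scores would then serve as predictors of the binary affiliation variable, one category at a time.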
Next, using a bottom-up approach, we compared the language of religious individuals to that of nonreligious individuals using DLA. DLA takes an atheoretical approach, as it is not limited by researcher-created categories. Specifically, DLA creates a preset number of clusters of words and phrases based on their degree of semantic similarity, which then become language variables that can be correlated with outcomes of interest. This data-driven approach allows for a more transparent view of the words that differentiate the two groups. First, we correlated users’ religious or nonreligious identification against all the one- to three-word phrases they had written to identify the words and phrases most positively and negatively associated with being religious or nonreligious on Facebook. Second, we used a set of previously created topics (Schwartz et al., 2013a), which were derived through a clustering algorithm called latent Dirichlet allocation (LDA) that groups semantically related, co-occurring words into topics (Blei, Ng, & Jordan, 2003). In accordance with the precedent set in our previous work (Schwartz et al., 2013a), the number of topics was set to 2,000 (see also Kern et al., 2016, for rationale). We then correlated users’ religious or nonreligious identification with LDA topics.
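To make concrete how a fixed set of topics can become per-user language variables, the sketch below scores a user's topic usage as the sum, over the words the user wrote, of an assumed probability of the topic given the word multiplied by the word's relative frequency. The scoring rule, the function name, and the toy weights are assumptions for illustration; the text above does not specify how topic scores were computed.

```python
from collections import Counter


def topic_usage(tokens, topic_given_word):
    """Score each topic as sum over words of p(topic | word) * p(word | user).

    topic_given_word maps a topic name to a dict of word -> assumed p(topic | word).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    usage = {}
    for topic, weights in topic_given_word.items():
        if total == 0:
            usage[topic] = 0.0
            continue
        # Words absent from a topic contribute nothing to that topic's score.
        usage[topic] = sum(weights.get(w, 0.0) * c / total for w, c in counts.items())
    return usage


# Toy weights, invented for this example; real topics would come from a fitted LDA model.
EXAMPLE_TOPICS = {
    "prayer": {"pray": 0.8, "family": 0.1},
    "surprise": {"odd": 0.9},
}
```

With 2,000 topics, each user would receive 2,000 such scores, which can then be related to the affiliation variable in the same way as the dictionary categories.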
As age and gender impact word use (Kern et al., 2014b; Park et al., 2016; Pennebaker & Stone, 2003), we controlled for these demographics in all analyses by including them as covariates in logistic regression. In accordance with conventional linguistic analysis reporting, we used a p value of <.05, after adjusting for multiple comparisons using Simes’s (1986) multitest correction, as a heuristic for identifying potentially meaningful correlations.
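The multiple-comparison step can be sketched as follows. One common way to apply a Simes-style correction across many tests is the step-up adjustment that is computationally identical to the Benjamini-Hochberg procedure; whether this exact variant was used here is an assumption, and the function name is hypothetical.

```python
def simes_adjust(p_values):
    """Step-up adjusted p values based on Simes's inequality.

    Assumption: this is the Benjamini-Hochberg-style adjustment, in which the
    k-th smallest of m p values is scaled by m / k and monotonicity is enforced
    from the largest p value downward.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p to largest
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, alpha=0.05):
    """Flag tests whose adjusted p value falls below alpha."""
    return [p < alpha for p in simes_adjust(p_values)]
```

Under this adjustment, a language feature is treated as potentially meaningful only if its adjusted p value stays below .05 after accounting for the number of features tested.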
Results
For religious individuals, religion was the most correlated LIWC category (β = .283; top words include “devil,” “blessing,” and “praying”), providing some face validity to our approach. Replicating Ritter et al.’s (2013) findings with Twitter, individuals with a religious orientation were more likely to use words belonging to the positive emotion (β = .278; e.g., “love,” “good,” and “happy”), family (β = .242; e.g., “mothers,” “uncle,” and “aunt”), and social (β = .189; e.g., “speaking,” “we,” and “they”) dictionaries. Nonreligious individuals were more likely to use words in the anger (β = −.427; e.g., “hate,” “lying,” and “sucks”), negative emotion (β = −.317; e.g., “bad,” “hate,” and “cried”), cognitive processes (β = −.085; e.g., “expected,” “figured,” and “barely”), and insight (β = −.081; e.g., “figured,” “noticed,” and “reasons”) categories. Exploring other LIWC dictionaries, nonreligious individuals were also more likely to use words in the swearing (β = −.402; e.g., “piss,” “screw,” and “heck”), body (β = −.265; e.g., “heads,” “neck,” and “chest”), and death (β = −.247; e.g., “die,” “dead,” and “died”) categories. Other significantly correlated categories are summarized in Table 1.
Table 1. Linguistic Correlates of Religious (Christian, Muslim, Buddhist, Hindu, n = 7,416) and Nonreligious (Atheist, Agnostic, n = 1,633) Affiliations.
Note. p values are corrected for multiple comparisons using Simes’s (1986) method.
Linguistic analysis using DLA between religious and nonreligious individuals resulted in a similar pattern of findings. The 75 most distinctive words and phrases are visualized in Figure 1. The religious group shows numerous religious words (“church”), positive emotion (“love”), and social words (“family”), as well as words suggesting gratitude (“blessed,” “thank,” and “thankful”). The nonreligious group shows many swear words (“fucking”) and words related to drug use (“drunk”) and death (“dead”). Additionally, nonreligious individuals showed words that have been associated with a more nuanced cognitive style (“apparently,” “possibly,” “probably,” and “thinks”).

Figure 1. Words and phrases that most distinguish religious affiliation (top) versus nonaffiliation (bottom). The size of the word indicates the correlation strength and the color indicates frequency (red is more frequent, and gray is less frequent).
Taking a more granular approach, the specific LDA topics most strongly associated with the language of religious participants (Figure 2) revolve around prayer (“pray” and “prayer”), a sense of generalized gratitude (“thankful” and “blessed”), and social language (“family” and “friend”). Gratitude (“grateful”) is also apparent in several religious topics, perhaps reflecting more positive and socially oriented expressions. In contrast, LDA topics associated with nonreligion involve cursing and anger. Additionally, nonreligious language is indicative of the processing of unexpected information (e.g., “odd,” “unusual,” and “strange”) and forming conclusions (e.g., “discovered,” “found,” and “realized”), perhaps reflecting an analytic attitude and approach to the world.

Figure 2. Statistically significant topics differentiating those with a religious affiliation (top) versus nonreligious (bottom). The larger the word, the more prevalent it is in the topic; the colors are random (see Schwartz et al., 2013).
Discussion
Comparison of the language used by religious and nonreligious individuals on the Facebook social media platform revealed several significant differences. Replicating Ritter et al.’s (2013) study of Twitter users, religious individuals tended to use more positive emotion and social themes, whereas nonreligious individuals used more negative emotion and cognitive themes. By exploring more categories, we also found that nonreligious individuals swore more often and discussed death, the body, and sex more frequently than religious individuals. Extending these findings further using a bottom-up approach revealed additional insights. For example, the language of religious individuals was more prosocial in nature and involved words indicative of appreciation and gratitude.
A number of studies find positive associations between religion and well-being (e.g., Levin & Chatters, 1998; Lewis & Cruise, 2006; Myers, 2000; Seybold & Hill, 2001). Emmons and Crumpler (2000) describe a conspicuous emphasis on the positive emotion of gratitude in religious groups, and another study shows that gratitude is correlated with religiousness (Emmons & Kneezel, 2005). Fredrickson (2002) theorizes that positive emotions may mediate the observed relationship between religion and well-being. The origin of greater positive emotion expressed by religious individuals is not clear, although some research suggests it may derive from increased levels of social support from religious communities (Salsman, Brown, Brechting, & Carlson, 2005). The language itself supports this perspective, as words associated with religious individuals tended to reflect social themes and positive emotions generally as well as gratitude in particular.
The inverse finding, that nonreligious individuals experience more anger and negative emotions, has also been observed in previous research (Kimble & McFadden, 2003). Pargament (2002) argues that religion can support coping strategies through a process that includes healthy regulation of negative emotion. Also, religious people report less anxiety (Inzlicht, McGregor, Hirsh, & Nash, 2009), which may generalize to fewer negative emotions in general. In addition, it has been observed that individuals who are not religious may be less likely than those who are religious to engage in certain cognitive strategies that are known to lift mood (Buffone, Gabriel, & Poulin, 2016).
However, it is also possible that religion merely exerts a social pressure that discourages the verbal expression of negative emotion while promoting the verbal expression of positive emotion. That is, religion may encourage people to present a more positive facade, despite whatever emotions are actually being experienced. One study examining this topic, though, found that religious individuals do not appear to repress negative emotions (Bullard & Park, 1998). However, discrepancies between the feeling and expression of emotion are a fundamental issue in linguistic analysis (and self-report in general) and, perhaps, particularly salient in the study of religion. The influence of social norms in the context of religious affiliations (Cohen, 2009; Cohen & Hill, 2007) remains an important area for future research.
A core pathway through which religion may influence well-being and health outcomes is through positive social relationships. The well-being benefits from religious group membership have been described in previous research (Ysseldyk, Matheson, & Anisman, 2010), and some have argued that social cohesiveness may mediate some of the well-being benefits of religion (Salsman et al., 2005). Diener and Seligman (2002) suggest that the association between religion and well-being can be almost entirely explained by its correlation with higher quality relationships. Some have argued that the benefits and maybe even the evolutionary origins of several aspects of religion reside in its ability to foster closer social bonds (Graham & Haidt, 2010; Wilson, 2002; Yaden, Haidt, Hood, Vago, & Newberg, 2017).
The language used by religious individuals was not only social in nature, but the frequent mention of “grateful” may reflect a sense of appreciation for others. Gordon, Impett, Kogan, Oveis, and Keltner (2012) suggest that gratitude helps to maintain positive romantic relationships, in that feeling appreciated motivates one to work harder at maintaining the relationship, in turn helping the partner to feel appreciated and motivated to hold on to the relationship as well. A similar way of responding to other relationships might help develop a strong social network, which provides a sense of belonging and connectedness, as well as avenues for social support when needed.
The finding that nonreligious affiliation is associated with linguistic markers of cognitive processes has likewise been observed in previous research. Gervais and Norenzayan (2012) found that the tendency to engage in analytic thinking to override initial, intuitive responses to critical thinking puzzles correlates with atheism. This is supported by a number of findings that suggest religious belief may supervene on several intuitive cognitive processes such as anthropomorphism and the tendency to perceive intentionality (Barrett, 2000; Bloom, 2012; Boyer, 2008). If nonreligious affiliation involves engaging in analytic forms of reasoning to override more intuitive beliefs, then this may help explain the correlation between linguistic categories related to cognitive processes and nonreligion.
The finding that nonreligious individuals swear more and use sexual words more frequently may have to do with the absence of taboos against mentioning such topics (Jay, 2009). Previous linguistic analysis studies have found that those who are less agreeable swear more often (Yarkoni, 2010), and as religious individuals are more agreeable, this personality trait could help explain the relationship. Similarly, the finding that ingestion and body categories are correlated with nonreligion may involve more mention of “sensual” topics (e.g., “drinking”). Many studies have found that religion is associated with reduced drug and alcohol use (e.g., Wallace & Forman, 1998), which may help to explain the reduced mention of these topics.
A less obvious finding is the relationship observed between nonreligious individuals and mentions of words related to death. To investigate this further, we examined which specific words within the LIWC “death” category were driving the result, identifying several key words: “die,” “dead,” “died,” “dying,” “war,” and “alive.” Within these mentions, the topic of death arose in a wide range of capacities spanning not only literal but also figurative references to death. For instance, in addition to references to specific people dying, death-related words were frequently used in hyperbole, jokes and chain posts, social problems involving mortality, and technology (objects ceasing to work). In general, the topic of death appears to be more frequently discussed by nonreligious Facebook users in a variety of contexts. This observation might also be explained by the absence of religious taboo. It may be that, like sexual topics and swearing, death is also seen as a vulgar or “profane” topic among religious groups. Another theory often raised in the context of mortality salience is “terror management theory” (Greenberg & Arndt, 2011), which argues that people seek out means to avoid dwelling on death. However, it is unclear how the theory applies in this instance. It could be that religion guards against mortality salience, thus obviating the need to discuss death while also providing a boon to psychological well-being. On the other hand, previous research has shown that priming people with thoughts about death increases God beliefs (Norenzayan & Hansen, 2006). Further unpacking this and the other language findings remains a task for future work.
Limitations and Future Directions
This study was limited in several ways. First, participants provided their religious affiliation but they did not indicate their degree of religiosity. The majority of individuals were from the United States, where people commonly claim a Christian orientation in name only but do not practice regularly. Future research might consider the degree of religiosity as a continuous variable, identifying differences between those for whom their belief system is central to their lives as opposed to those who hold it as a peripheral concern. Cross-cultural differences might also be considered, especially considering different religious traditions. This study, while employing a much larger than average sample size, would require an even larger sample to compare different cultural contexts (countries) or to do subclass analysis between various religions and religious sects. While we were underpowered to perform these analyses in this study, we plan to do so when sufficient data become available.
Another limitation involves self-censorship. It is possible that due to self-monitoring and image management, expressed words reflect social norms more than actual attitudes. Religious individuals might be more cognizant of monitoring their language in socially appropriate ways, as reflected by a lower use of swear words, less discussion of sexuality, and reduced use of other somewhat socially taboo topics, like death. While we acknowledge that some self-monitoring is likely occurring in this sample, several studies suggest that people do portray their real personalities online (Kern et al., 2014a). Language use is difficult to monitor and control long term, especially its more subtle features, so the observed differences likely reflect more than mere self-censorship.
There were also many more individuals with a religious affiliation than without. This breakdown reflects average trends in religiosity in the United States, but the unbalanced sizes could skew results toward the dominant group. Further, religion is a multidimensional construct including aspects such as affiliation, practices, rituals, and experiences (Yaden, Iwry, & Newberg, 2016). While other linguistic studies have examined specific aspects of religion and spirituality, such as experiences (Yaden et al., 2015, 2017), future linguistic analysis studies should examine convergences and differences between various aspects of religion. For example, how do those who practice meditation or prayer on a daily basis but have low levels of belief compare with those who have high levels of belief but who do not engage in any practices? These questions will also require sample sizes larger than the present study.
Lastly, while this study reports quantitatively derived correlations between linguistic features and religious affiliation, interpretation of the meaning of these relationships is necessarily qualitative in nature. The pattern of results maps onto other studies linking religion with life outcomes and identifies possible mechanisms such as sociability and gratitude, but the processes involved are not directly tested. The correlates identified here provide hypotheses that can be tested in future studies using different methodologies.
Conclusion
Links between religious affiliation and life outcomes are multifaceted and occur through the accumulation of attitudes and behaviors across the life span. Language used on social media provides one marker of daily behaviors and persisting commitments. In this study, replicating and extending previous findings, the language of religious individuals was more emotionally positive and socially oriented, whereas that of nonreligious individuals was more emotionally negative and contained some indications of a more analytical cognitive style. Nonreligious individuals also referenced the body and death more often. While it is unclear whether these phenomena result from a genuine difference in psychological orientation or from linguistic norms enforced through social taboos, the content and magnitude of these differences warrant further investigation into the links between life outcomes and religion.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
