Abstract
Social media platforms have been struggling to moderate at scale. In an effort to better cope with content moderation, discussion has turned to the role that automated machine-learning (ML) tools might play. The development of automated systems by social media platforms is a notoriously opaque process, and public values that pertain to the common good are at stake within these often-obscured processes. One site in which social values are being negotiated is the framing of what is considered ‘toxic’ by platforms in the development of automated moderation processes. This study takes into consideration differing notions of toxicity – community, platform and societal – by examining three measures of toxicity and community health (the ML tool Perspective API; Reddit’s 2020 Content Policy; and the Sense of Community Index-2) and how they are operationalised in the context of r/MGTOW, an antifeminist group known for its misogyny. Several stages of content analysis were conducted on the top posts and comments in r/MGTOW to examine how these different measures of toxicity operate. This paper provides insight into the logics and technicalities of automated moderation tools, platform governance structures, and frameworks for understanding community metrics to interrogate existing uses of ‘toxicity’ as applied to cultural or social subcommunities online. We make a distinction between two commonly used terms: civility and toxicity. Our analysis points to a tension between current social framings and operationalised notions of ‘toxicity’. We argue that there is a clear distinction between civility and toxicity – incivility is a measure of internal perceptions of harm within a community, whereas toxicity is a measure of the capacity for social harms outside of the bounds of the community. This nuanced understanding will enable more targeted interventions to be developed to destabilise the internal conditions that make groups like r/MGTOW internally ‘healthy’ yet externally toxic.
Over the past few years, social media platforms have been struggling to moderate at scale (Gillespie, 2020). At the same time, they have come under fire for failing to mitigate the risks of perceived ‘toxic’ content on their platforms (Gillespie et al., 2020). In an effort to better cope with content moderation and to combat hate speech, ‘dangerous organisations’ and other bad actors present on platforms, discussion has turned to the role that automated machine-learning (ML) tools might play (Blackwell et al., 2017; Gorwa et al., 2020). Automated tools and processes provide one of the only means of addressing, at scale, the overwhelming amount of content that is created and shared on social media platforms. Shifting to automated systems to at least partially help with content moderation is an inevitable step in the governance of large-scale platforms (Rieder and Skop, 2021). The concerns, as such, are not about whether automated systems are ‘good’ or ‘bad’ or whether they should be used or not, but rather how, and in which part of the process, they should be employed, and what the cultural implications of this inevitable shift are (Gillespie et al., 2020; Rieder and Skop, 2021).
The development of automated systems by corporate social media platforms is a notoriously opaque process, and as Van Dijck et al. (2018) make clear, public values that pertain to the common good are at stake within these often-obscured processes. One such site in which social values are being negotiated is the framing of what is considered ‘toxic’ by corporate platforms in the development of automated moderation tools. This paper makes a distinction between two commonly used terms in discussions of online behaviour: civility and toxicity. As Rieder and Skop (2021: 12) argue, the definition for ‘civility’ mirrors that of ‘toxicity’ and was a defining concept adopted by the initial partners in the Perspective API project: the New York Times and Wikipedia. The conceptualising and operationalisation of a term such as ‘toxicity’ at the heart of automated moderation processes by corporate platforms results in the embedding of biased values that serve economic interest over the public good (Binns et al., 2017). The obscuring and ‘black box’ nature of these automated systems also works to shut down and prevent the type of broad civil discourse between a variety of actors (including users, governments, regulators, public interest groups) needed in the negotiation of values for a collective social good (Suzor et al., 2019).
It is precisely because of the concerns of scholars such as Suzor et al. (2019) that in this paper we argue that there is a clear distinction between civility and toxicity – incivility is a measure of internal perceptions of harm within a community, whereas toxicity is a measure of the capacity for social harms outside of the bounds of the community. This distinction is essential if we are to work towards and negotiate a healthier platform society (Van Dijck et al., 2018). We draw on intersecting fields of platform studies, online community management, and digital methods to examine the r/MGTOW subreddit, a group known for its misogyny (Jones et al., 2020). The paper provides insight into the logics and technicalities of automated moderation tools, platform governance structures and frameworks for understanding community metrics. We interrogate three operationalisations of ‘toxicity’ as applied to cultural or social subcommunities online by applying them to analyse the content generated by r/MGTOW.
To provide a comparison of three different assessments of ‘toxicity’, we begin with an analysis of Jigsaw’s Perspective (one of the leading automated moderation tools) as operationalised by the Chrome extension Tune. We then conducted a qualitative content analysis of the comments based on a framework developed from the Sense of Community Index-2 (SCI-2) and Reddit’s internal community guidelines. The SCI-2 measures community solidarity (sense of community) across four broad categories: membership, influence, reciprocity and shared emotional connection (McMillan and Chavis, 1986) and is used by community management professionals and sociologists as a measure of community ‘health’.
The subreddit for MGTOW (also known as Men Going Their Own Way) was chosen as a site for examination because, at the time of data collection, they were one of the largest growing communities belonging to the ‘manosphere’ (a loose confederacy of men’s groups). Our study incorporated several stages of analysis of the top ten threads of all time, including all comments, on r/MGTOW as of October 2019. However, as of 6 February 2020 r/MGTOW was quarantined, and by August 2021, r/MGTOW was banned on Reddit (Thalen, 2021). A Reddit spokesperson publicly stated that the subreddit was banned in accordance with a policy that prohibits content that ‘incites violence or promotes hate based on identity or vulnerability’ (Thalen, 2021) – a policy that Reddit updated in June 2020, indicating a more nuanced and socially responsible approach to moderation that takes into consideration the distinction between civility and toxicity we outline above. Commenting on the updated policy, Reddit CEO Steve Huffman stated, ‘I have to admit that I’ve struggled with balancing my values as an American, and around free speech and free expression, with my values and the company’s values around common human decency’ (Newton, 2020), demonstrating the clash of competing interests and values in the negotiation and boundary-making of online social values.
Our analysis points to a tension between current social framings and operationalised notions of ‘toxicity’, ‘civility’ and sociological understandings of community health as framed by the SCI-2. Understanding these tensions exposes issues around the use of ML tools for the automated moderation of community spaces within platform cultures. Of particular interest is the finding that automated ML tools may be good tools for measuring individual community health, or ‘civility’, making them potentially useful to online community management practitioners. Community health in this sense, however, is not what we consider when we talk about ‘toxicity’ and the effects of groups such as MGTOW within the broader fabric of society. By incorporating sense of community measures into this study, we also open up new frameworks for understanding how groups such as MGTOW are able to successfully nurture, radicalise and recruit members online. This more nuanced understanding will enable more targeted interventions to be developed and put in place to destabilise the internal conditions that make these communities appealing to newcomers and a source of thriving sociality online.
Reddit, the manosphere and automated moderation
The ‘manosphere’ has been described by various scholars as a loose confederacy (Ging, 2017) of misogynistic men’s groups that roughly focus on ‘men’s issues’. A few of these groups include incels and Men’s Rights Activists (MRAs), both of which have gained attention in mainstream media for their responsibility for several violent attacks offline (Ging, 2017). Another subcommunity within the manosphere is the Men Going Their Own Way (MGTOW) group. This group is less explicitly violent, as its members are not motivated to ‘take action’. However, previous research (Jones et al., 2020; Wright et al., 2020) demonstrates that MGTOW is responsible for producing online harassment and misogyny. These groups are distinct factions that can be hostile to each other but are tied together by a shared ideology that has become known as The Red Pill (Zuckerberg, 2018).
Reddit, alongside 4chan, has received much attention for its role in hosting groups associated with the manosphere. Massanari’s (2017) analysis of the infrastructure of Reddit highlights how Reddit’s karma points and subreddit systems, along with the ease of account creation and the platform’s governance structure, combine to create an environment in which ‘toxic technocultures’ can proliferate. Rafail and Freitas (2019) also highlight the importance of Reddit as a platform, claiming that its anonymous and quasi-anonymous capabilities provide these men’s groups with powerful tools for networking and mobilising. Reddit as a platform retains the culture of a traditional message board and is an aggregate of user-generated content consisting of posts and comment threads. There are very few restrictions on registering an account, with users allowed to create multiple accounts with pseudonymous usernames. As such, the structure and culture of Reddit (at least historically) provide an optimal space for manosphere groups to post and share contentious content in a self-reinforcing community (Rafail and Freitas, 2019).
Reddit has had an increasingly tenuous relationship with the manosphere in part due to mounting public attention triggered by several fatal attacks instigated by self-identifying incels. The media attention surrounding these attacks pressured Reddit to respond to the threat posed by radical manosphere groups. As a result, Reddit banned several incel groups and associated subreddits in the period 2017–2019 (Solon, 2017). In addition, on 7 November 2017 Reddit updated their user policy to prohibit content that ‘encourages, glorifies, incites or calls for violence or physical harm against an individual or group’ (Solon, 2017) and again in 2020 to more explicitly address violence ‘based on identity or vulnerability’ (Thalen, 2021). While these more explicitly ‘violent’ groups were the first to feel the brunt of Reddit’s new policies, communities such as r/MGTOW have also found themselves falling foul of Reddit’s community guidelines, indicating a shift in Reddit’s framing of ‘toxicity’ as an outward facing social construct.
Potts and Harrison (2013) argue for the importance of understanding the rhetorical constructions of platforms like Reddit as they are reflective of the cultures they support. One aspect they identify as key to influencing and shaping the rhetoric on Reddit, especially in specific subreddit communities, is the fact that the ‘makers and maintainers’ of the subreddits are participants themselves (Potts and Harrison, 2013: 1). This point is essential for understanding the manosphere subreddits – the moderators of these communities maintain different community standards and ideologies than the Reddit administrators. Hence, the quarantining or banning of a subreddit is a decision made by Reddit administrators and not the community moderators. Community moderators, on the other hand, enforce the boundaries, norms and practices that are deemed acceptable within the specific subreddit community.
There have been several efforts by researchers to theorise and measure toxicity, hate speech and misogyny in manosphere subcommunities to pave the way for detecting misogynistic hate speech. Marwick and Caplan (2018) conducted a critical discourse analysis of the word ‘misandry’ and examined how MRA terminology evolves, disseminates and permeates through social media. Zuckerberg (2018) examined the rhetoric within the manosphere, specifically focusing on how its members draw on antiquities to justify their ideology, and Jones et al. (2020) developed categories of online harassment produced by popular MGTOW users on Twitter.
There have been several recent developments in terms of deploying automated tools and ML processes to detect toxicity within the manosphere. In 2019, Farrell and colleagues conducted a large quantitative study of the misogynistic language used across several manosphere subcommunities on Reddit. They recognised that existing automated methods for analysing online misogyny are ‘scarce’ (Farrell et al., 2019) and created their own lexicons grounded within feminist theory and feminist critiques of language. Farrell et al. (2019) found that newer manosphere subcommunities (such as incels and MGTOW) contained more extreme violence than older, more traditional manosphere groups (such as MRAs and PUAs). In particular, they found r/MGTOW contained the equal highest amount of misogynistic content, violent attitudes, sexual violence and physical violence (Farrell et al., 2019: p. 93).
Expanding on their previous work, Farrell et al. (2020) adopt a socio-linguistic approach to analyse the use of jargon and ‘neologisms’ within manosphere groups to develop a better understanding of community development and identification. What their work signifies is a push toward developing automated tools that can better detect and interpret neologisms, that is newly created words, and specific rhetoric. This thinking is in line with earlier work on communities in which ‘shared language’ is one of the mechanisms used to demarcate in-group (member) and out-group status, reinforcing the sense of belonging amongst the group (McMillan and Chavis, 1986).
While there is a growing field of academics developing automated tools to detect misogynistic hate speech, existing tools such as Perspective are not specifically attuned to these types of lexicons. What sets Farrell et al.’s (2019, 2020) work apart is that their method is grounded in feminist theory, whereas tools such as Perspective are developed by corporate platforms and are entangled in competing economic interests. With these considerations in mind, this paper explores the interplay between competing measurements of toxicity for the moderation of misogynistic subcommunities on Reddit. We consider how toxicity is operationalised by an automated moderation tool, and how additional perspectives such as platform interests and broader social pressures influence metrics of toxicity and ultimately moderation outcomes.
Defining toxicity and community health
At the crux of online moderation (both human and automated) is the notion of what constitutes ‘toxicity’ within online expressions and communities. Here, we review the literature surrounding existing definitions of toxicity used for moderation and that underlie current automated moderation techniques as well as the theory surrounding measures of community health and how we can understand the implications for the broader social good.
There has not been a universally agreed upon definition of what constitutes toxicity online with scholars labelling unwanted behaviours as ‘hate speech’, ‘online harassment’, ‘trolling’, ‘abuse’ and more. What constitutes hate speech has been the focus of numerous studies that have sought to define the phenomenon within online contexts. For example, Warner and Hirschberg (2012, p. 19) define hate speech as ‘abusive speech targeting specific group characteristics, such as ethnic origin, religion, gender, or sexual orientation’, while Malmasi and Zampieri (2018) work to create a distinction between general profanity and hate speech and the struggle to differentiate between the two. In addition, a host of research grounded in feminist theory driven by Emma Jane has set out to conceptualise the gendered nature of hate speech, which Jane conceptualises as ‘e-bile’ (2014) and has also been referred to as online harassment (Jones et al., 2020).
Some of this research has examined the potential for ML tools to detect different understandings of hate speech and toxicity online. Numerous scholars have drawn attention to the limitations of these tools in practice and the propagation of racial and gender biases encoded into these automated tools. Park and Fung (2017) discuss the difficulty of automating abusive language detection because of the subjective and individualised nature of annotating examples to develop an initial dataset to train ML tools. They identify how previous attempts have failed due to the struggles of defining ‘abusive language’, which makes it difficult for non-experts to annotate datasets without specific domain knowledge. Waseem and Hovy (2016) also raise issues in terms of using annotators who lack specialised knowledge, particularly around race and gender theory and how privilege intersects with the experience of hate speech, and how current automated hate speech detection processes are not effectively grounded in gender and race theory. The difficulties these scholars identify are largely a result of the fact that current ML tools measure civility via comment-level analysis, which demonstrates internal group health rather than outward-facing toxicity that may represent a group’s ideology.
More broadly, there have been notable critiques highlighting the dangers of relying on automated tools, including the inequalities that automated punitive approaches lead to (Eubanks, 2018) and how automated tools reproduce and reinforce oppressive structures (Noble 2018). Safiya Noble’s (2018) influential work emphasises the significance and impact of the obscured and less visible technological and platform infrastructures that shape and govern content and communication online and demonstrates how these automated processes have ramifications for broader social, cultural, political, regulatory and economic issues. These concerns tie into the existing debates that draw attention to platform politics and how platforms attempt to frame themselves as politically neutral intermediaries, yet through their design, they enact and co-produce political views (Gillespie, 2018).
In pointing out that it remains tempting to study the social dynamics on platforms while ignoring the platforms themselves, Gillespie (2015, p. 1) raises awareness of the ‘socio-technical dynamics, context-specific realities, and political economic dynamics of social media,’ making it clear how the ‘technical design, economic imperatives, regulatory frameworks, and public character’ of platforms have distinct impacts in shaping what users do and how they behave. An emerging field of literature has developed in an attempt to identify the often non-transparent processes of moderation that platforms engage in (Gillespie, 2018). However, Gillespie (2015, p. 1) criticises platforms for regularly downplaying these efforts of intervention, except in the specific circumstances when it is beneficial for the platforms to trumpet their interventions. This is reflected in Reddit’s announcement of their updated anti-harassment policy, strategically timed in the lead up to the 2020 US presidential election and the banning or quarantining of manosphere groups in reaction to fatal attacks.
What is apparent in the cacophony of research attempting to shed light on different dimensions of hate speech or toxicity online is that what counts as toxic is under constant negotiation by the public and end users, and by the platforms and technological developers. In pointing out the tensions this produces, this paper aligns with Gillespie’s (2020, p. 3) assertion that labelling something hate speech is ‘not an act of classification’ but rather ‘a social and performative assertion that something should be treated as hate speech, and by implication, about what hate speech is.’
To this end, our study takes into consideration differing notions of toxicity – community, platform and societal. Specifically, we propose that a community health framework, and the notion of civility can help us understand and interpret how toxicity is measured and operationalised by ML tools as opposed to platform and social understandings of toxicity (as described within platform community guidelines and public reactions). In keeping with the work outlined above (Farrell et al., 2019; Park and Fung 2017), we argue that what ML tools like Perspective currently measure best is in-group or community health as figured by the Sense of Community Index-2 (Chavis et al., 2008), while recent moderation outcomes on Reddit demonstrate an outward facing understanding of toxicity (toxic to society at large).
In the original theoretical Sense of Community (SOC) framework, McMillan and Chavis (1986, p. 9) define SOC as ‘a feeling that members have of belonging, a feeling that members matter to one another and to the group, and a shared faith that members’ needs will be met through their commitment to be together’. For the most part, scholarly literature and the community management profession agree that a strong sense of community or, in the context of online communities, sense of virtual community (SOVC), means a healthy community (Bess et al., 2002: p. 14). Interestingly, for the context of this paper, past research has criticised SOC for having a ‘dark side’ (14) in which ‘racist groups such as the Ku Klux Klan, would potentially meet the criteria for a healthy community’ (14). This thinking, however, fails to understand that SOC effectively measures ‘in-group’ health – civility – in relational communities rather than the group’s health within a broader social framework, that is, their toxicity.
While the four pillars proposed by McMillan and Chavis (1986) – feelings of membership, feelings of influence, integration and fulfilment of needs, and shared emotional connection – appear to provide a one-size-fits-all approach to measuring community health, their expression differs from community to community. For example, in a community such as the aforementioned Ku Klux Klan, language and rituals that express a commitment to white supremacy are actually modes of affiliation (membership) with the in-group, and the same applies to expressions of misogyny in groups such as MGTOW. In other words, these expressions represent cohesion to social norms within the group, which Blanchard’s research (2007; Blanchard and Markus, 2004) has shown is a strong measure in determining SOVC. It is precisely for this reason that online community management professionals measure community health contextually and why the distinction made in this paper between civility and toxicity is so important if we are to address the harms these communities cause.
Method
Our study is structured around an analysis of the r/MGTOW subcommunity on Reddit. We used the Python Reddit API Wrapper (PRAW) to collect all comments from the top 10 sub-threads for the year listed on r/MGTOW, on 13 July 2019, equalling a dataset of 922 comments. We then conducted a two-fold analysis of this dataset, which was contextualised by a digital ethnographic analysis of the broader r/MGTOW subreddit.
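A collection step of this kind could be scripted roughly as follows. This is a minimal sketch and not the script used in the study: the function names, the record fields and the credential placeholders are assumptions made for illustration, and only the PRAW calls (`subreddit(...).top`, `comments.replace_more`, `comments.list`) belong to the library. Because r/MGTOW has since been banned, the same request would no longer return data.

```python
from typing import Any, Dict, List


def build_client(client_id: str, client_secret: str, user_agent: str) -> Any:
    """Create a read-only PRAW client (credentials are placeholders here)."""
    import praw  # third-party package: pip install praw

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def collect_top_thread_comments(
    reddit: Any,
    subreddit_name: str = "MGTOW",
    n_threads: int = 10,
    time_filter: str = "year",
) -> List[Dict[str, Any]]:
    """Return one record per comment in the top `n_threads` submissions."""
    records: List[Dict[str, Any]] = []
    top_threads = reddit.subreddit(subreddit_name).top(
        time_filter=time_filter, limit=n_threads
    )
    for submission in top_threads:
        # Resolve every "load more comments" stub so the full tree is available.
        submission.comments.replace_more(limit=None)
        # .list() flattens the comment tree into a single list of comments.
        for comment in submission.comments.list():
            records.append(
                {
                    "thread_id": submission.id,
                    "comment_id": comment.id,
                    "body": comment.body,
                    "score": comment.score,
                }
            )
    return records
```

Passing the client in as an argument keeps the collection logic separate from authentication, so the same function can be exercised against a stub object without network access.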
The first stage of the analysis involved running Perspective to return a ‘toxicity’ rating for each comment in the dataset to measure toxicity as defined by a leading automated moderation tool. To do this, we used the Chrome extension Tune, which Alphabet released in March 2019 and which operationalises Perspective by rating and filtering comments on several popular platforms including Facebook, Reddit, Twitter, YouTube and Disqus (Jigsaw, 2019b). We then manually browsed the top ten threads and recorded the rating displayed by Tune for each comment.
There are several settings that demonstrate the categories of toxicity Tune evaluates: ‘quiet’, ‘low’, ‘medium’, ‘loud’ and ‘blaring’. No official definitions are provided by Jigsaw for each of these categories, but we can assume they correlate with the numerical toxicity score (between 0 and 1) that Perspective returns (e.g. scores of 0.8 or more would be labelled ‘blaring’, while scores greater than 0 but equal to or less than 0.2 would be considered ‘quiet’). We recorded how Tune filtered comments for each of the toxicity levels.
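The assumed correspondence between scores and Tune's labels can be written down explicitly. The sketch below is illustrative only: the ‘quiet’ and ‘blaring’ thresholds follow the assumption stated above, whereas the cut-offs at 0.4 and 0.6 are a further assumption of equal-width bands, and treating a score of exactly 0 as ‘unfiltered’ is our own convention. The function names are hypothetical and nothing here reflects Jigsaw's actual implementation.

```python
from collections import Counter
from typing import Dict, Iterable


def tune_label(score: float) -> str:
    """Map a Perspective toxicity score in [0, 1] to an assumed Tune label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("toxicity score must lie between 0 and 1")
    if score == 0.0:
        return "unfiltered"  # assumed: not rated as toxic at all
    if score >= 0.8:
        return "blaring"     # threshold assumed in the text
    if score <= 0.2:
        return "quiet"       # threshold assumed in the text
    if score <= 0.4:
        return "low"         # assumed equal-width band
    if score <= 0.6:
        return "medium"      # assumed equal-width band
    return "loud"            # remaining band, above 0.6 and below 0.8


def label_shares(scores: Iterable[float]) -> Dict[str, float]:
    """Return the percentage of comments that fall under each label."""
    labels = [tune_label(s) for s in scores]
    counts = Counter(labels)
    return {label: 100 * n / len(labels) for label, n in counts.items()}
```

Applied to the recorded scores for a dataset, `label_shares` would give the kind of per-category breakdown reported in the results.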
Part two of the data analysis involved a qualitative content analysis on the same dataset. We based our analytical and coding approach on Schreier’s work (2012), which involves developing a coding frame and then qualitatively coding the data. Our concept-driven coding system was based on the Sense of Community Index-2 (Chavis et al., 2008) and Reddit’s 2020 Content Policy guidelines. While a single researcher undertook the coding, a multiphase approach was taken that included several stages of review in which three researchers coded a sample of the data and cross-checked the results for consistency at different points in time.
Reddit’s Content Policy, as listed in June 2020, contains nine rules, which we adapted into the qualitative content analysis framework before coding all of the comments to understand how the content may or may not breach Reddit’s content policy. These nine rules detailing the type of content that is prohibited on the platform are as follows: illegal; involuntary porn; sexual or suggestive content involving minors; encourages or incites violence; threatens, harasses, bullies or encourages others to do so; promotes hate based on identity or vulnerability; personal or confidential information; impersonates an individual or entity to mislead/deceive; solicit transaction or gift involving certain goods or services.
The SCI-2 provides a descriptive framework for understanding the different dimensions that interrelate to produce a sense of community. Although SOC and the original Sense of Community Index (SCI) were developed for offline community contexts, the SCI-2 has been adapted to include factors unique to online communities. This is in keeping with the work done by several scholars on adapting the SOC measures for virtual communities (Abfalter et al., 2012; Blanchard and Markus, 2004; Blanchard, 2007). This work on a sense of virtual community (SOVC) highlights several different environmental factors that impact how SOVC manifests and expresses online (Abfalter et al., 2012) and recognises the importance and need for adapting any existing measures of SOC for a virtual context (Abfalter et al., 2012; Blanchard and Markus, 2004).
Since McMillan and Chavis’ initial conception of SOC in 1986, Chavis et al. (2008) redefined the measures of SOC in the development of the SCI-2. The SCI-2 has been further tested and refined on a range of different types of communities (Abfalter et al., 2012: p. 401) and consists of 24 closed-ended items based around the original four dimensions. While normally delivered as a survey to community members, we have adapted the SCI-2 into a framework to guide a qualitative content analysis of the posts and comments made within the community. As ‘outsiders’ studying a community known to be hostile (particularly to women) we present this methodological approach as a way for academics to research extremist or hostile online communities without putting themselves at risk and within an organic setting in which the researchers do not influence the communication within the community. Feminist researchers have previously recognised the particular challenges of researching men as women and ‘outsiders’ (Vogels, 2019) entering a male-dominated space. These challenges helped inform the design of the study along with several broader ethical concerns raised in the study of online communities (Franzke et al., 2020).
Measurements of Toxicity: Perspective
To understand how toxicity is defined and operationalised by Perspective and within Reddit’s Content Policy, the results from the first stage of the analysis are detailed below. Figure 1 illustrates the results returned by Perspective, which demonstrate that roughly 30% of all comments were deemed not toxic and thus were unfiltered, while the largest share of comments (roughly 46%) were labelled as having quiet to low levels of toxicity (meaning they received a score greater than 0 and less than or equal to 0.4). Only 0.67% of comments received the highest rating of toxicity (blaring, or ≥0.8). Overall, approximately 70% of all comments were rated by Perspective as toxic across a scale from quiet to blaring.
Figure 1. Toxicity results of comments as measured by Perspective via the Tune extension.
It is important to note that subreddits are generally considered to be ‘self-policed’ (Chandrasekharan et al., 2022) in that they have their own moderators who are themselves members of the community and who enforce the community’s specific guidelines. These moderators are separate from Reddit’s official administrators but are expected to comply with the broader platform guidelines. If subreddit moderators fail to comply and moderate their community effectively, the subreddit becomes at risk of quarantining and banning. The comments within this dataset had already been subjected to moderation by the subreddit moderators and deleted or removed comments were not analysed. There were also discussions within the subreddit that conveyed fear and a heightened awareness about the potential of being quarantined or banned. As a result, we can assume that the most toxic or extreme content had been removed by subreddit moderators hence the low percentage of blaringly toxic content and that the remaining content was considered acceptable from the community’s standpoint.
The results demonstrate that comments containing common profanities register a higher toxicity rating than comments that use in-group rhetoric associated with the red pill ideology (which often received zero to low toxicity scores). The comments that received the highest toxicity rating (that of ‘blaring’) did not just include frequent profanities but also expressed aggression targeted toward Reddit or other users (e.g. ‘Fuck Reddit. Fuck this generation’ and ‘Fuck all of you’). This contrasted with some of the comments that received low to no toxicity ratings, which expressed aggression towards targets outside of the subcommunity and received support within the community via upvotes and replies expressing agreement.
Comments that were not filtered included those that made fun of ‘fat people’, ‘soy boys’ and ‘manginas’, while several comments that registered low levels of toxicity encouraged or even incited violence outside of the community (for example, one comment encouraged a user to go and ‘dump some beta rage’ – although this was not endorsed by many users within the community). Many of the unfiltered comments did contain strong positive sentiment, including supportive statements such as ‘Awesome. Keep up the good work,’ ‘A hero to me,’ ‘Good work, sir,’ ‘Congrats, mgtow’. These examples reinforce existing findings about the limitations of ML tools for detecting hate speech and toxicity and the need for greater contextual understanding around language and image use within these communities (Farrell et al., 2019; Marwick and Caplan, 2018; Park and Fung, 2017; Zuckerberg, 2018).
On the Perspective website, ‘toxic’ is defined as ‘a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion’ (Jigsaw, 2019a). The stated purpose and aspiration for Perspective is to minimise comments that shut down conversation with the goal of bettering and enhancing conversations online. Further, the developer information reveals that Perspective scores comments based on ‘the perceived impact a comment might have on a conversation’ (Jigsaw, 2019a); in other words, it scores a comment’s discursive function rather than the attitudes and beliefs depicted within the comment. This is remarkably similar to Reddit’s framing of the problem of toxicity online within their content policy, although Reddit’s policy goes further, incorporating behaviours such as harassment and explicit threats.
There have been a number of recent updates and changes to Reddit’s Content Policy. On 30 September 2019, Reddit introduced a major change in the development of an anti-bullying and harassment policy in which harassment was defined as ‘anything that works to shut someone out of the conversation through intimidation or abuse, online or off’ and that would result in ‘discouraging a reasonable person from participating on Reddit’ (Robertson, 2019). This shift resulted in the banning of several subreddits, including a popular incel subreddit. Further, on 30 June 2020 there was another update, this time with an explicit statement that ‘communities and users that promote hate based on identity or vulnerability will be banned’ (Reddit, 2020). As a direct result of this update, 200 subreddits were banned.
It is this updated rule that arose as a concern within our dataset. There were no instances in our dataset that breached six of the nine rules listed in Reddit’s Content Policy; however, in terms of the ‘promotion of hate based on identity or vulnerability’, we recorded 392 (42.5%) instances. Reddit administrators listed three criteria in terms of how this most recent policy is measured: abusive titles and descriptions; high ratio of hateful content (based on reporting and ML filtering); and positively received hateful content (high upvote ratio on hateful content) (Reddit, 2020). Other breaches of the Content Policy rules were minimal, with one instance of sexual or suggestive content involving minors and 16 comments that were classified as ‘threatens, harasses, bullies or encourages others to do so’.
While there are obvious commercial interests involved in how Reddit (and Perspective) define toxicity, it is clear from the updates to Reddit’s policy and the subsequent quarantining of r/MGTOW that Reddit’s policies have both an internal and an external approach to considering toxicity. The behaviours they list frame toxicity both from an internal community perspective and from a broader societal viewpoint, while the ‘hateful content’ category frames toxicity at a societal level. A community health perspective goes some way to explaining how communities such as r/MGTOW may appear to have low toxicity scores whilst maintaining a socially toxic ideology.
A community health perspective
The notion of ‘community health’ offers a way to elucidate the disparity between Perspective’s relatively low toxicity score and Reddit’s own measures of toxicity, resulting in the quarantine and ultimate banning of the r/MGTOW subreddit. This is particularly apt as Reddit itself has often been referred to as a ‘community of communities’ (Massanari 2017: p. 331). Our analysis of r/MGTOW using an adapted version of the SCI-2 survey enabled us to explore the four key dimensions of community health – membership, influence, reciprocity and emotional connection – that lead to an overall sense of belonging in a community. Our results here align with Perspective’s reading of the community as having low levels of toxicity and highlight the need for a distinction between civility and toxicity.
Within McMillan and Chavis’ (1986) framework, membership within a community involves several criteria including shared language, shared rituals, defined boundaries of membership (who is in, who is out) and how one becomes a member. The r/MGTOW community shows clear signs of a well-developed and policed sense of membership. Over time, the idea of what constitutes a member has been distilled as the community has split from the original MRA and incel groups. This is expressed through in-jokes, stories of how people have evolved, and dismissive attitudes towards those they perceive are in earlier stages of the red pill ‘awakening’.
The use of specific language, such as references to the ‘plantation’ 1 and ‘monkey-branching’ 2, is evident in this community, and although these concepts convey a highly misogynistic ideology, they were not picked up by Perspective. Language is a particularly important aspect of community (McMillan and Chavis, 1986) as it demarcates not only what can and cannot be said in a community, but also how it can be articulated. Language was used to further differentiate MGTOW from earlier MRA groups. Pseudonymous usernames were frequently used to signal one’s membership, with many usernames incorporating MGTOW along with other prefixes and suffixes. These usernames worked to signal one’s commitment to the community and the MGTOW lifestyle (references to being a single or divorced man), one’s current stage in their MGTOW journey (‘going monk’ or ‘ghost’) or even indicators of their stage in their ‘awakening’ (depressed, ‘a mess’ or a ‘disaster’). Usernames were also used to signal members’ self-proclaimed roles within the community (joker, ‘memester’ or philosopher). User roles are often ways to signal community influence, and recognition of users’ roles ties into another important facet of community health: influence. Online communities typically have multi-dimensional influence: members of the group can influence one another, and the community itself aspires to influence the wider world.
The fact that MGTOW is a separatist movement generates an interesting phenomenon, which was evident within our dataset: MGTOWs are MGTOWs because they feel they have no external influence. Community members talk about actively surrendering their influence in wider society by abandoning the traditional way of interacting with women and are defeatist in their views of society, believing that the activism engaged in by MRAs is pointless. Occasionally within the group, a new user posts a call to action, encouraging members to get involved in an antifeminist protest; however, these rallying calls are met with derision and a clear boundary is drawn separating MGTOW identity from those aligned with men’s rights activism. In this way, the perceived absence of influence is an existential feature of membership and group identity.
At first glance, the community appears to score badly on the internal influence portion of the SCI-2 framework. That is not to say that there are not influential members within the group, or that their absence means r/MGTOW is less ‘healthy’. Research has shown that influence may not be as important a factor in SOVC (Blanchard and Markus, 2004). That said, there was evidence of past influential members within the r/MGTOW community, and they were often evoked to add weight to discussions. These figures also serve the useful function of distinguishing ‘community elders’ who remember them from the newer members who found their way to the community as their older communities were closed down in previous Reddit purges. Importantly, the mythologising of such figures also provides a sense of shared history within the community (Anderson, 1983).
The lack of clear influencers within r/MGTOW does not mean an absence of influencers within the broader MGTOW community. It was clear from commentary that there are several major MGTOW influencers whose thinking helps to set the norms in this community. In most cases, these influential MGTOWs had popular YouTube channels in which they discussed the MGTOW lifestyle, philosophy, cultural artefacts they deemed to be MGTOW (such as ancient stoic texts – see Zuckerberg, 2018) and promoted their own content (e.g. self-published books). References to these popular MGTOWs and their YouTube channels often arose when group members discussed the future of the community and reflected on their own MGTOW journey, recalling when they first started ‘going their own way’ and crediting these founding members for starting their ‘real’ education.
The sharing of these stories and the strong sense of commitment to and policing of the MGTOW life shown by the members of the r/MGTOW community indicate a strong sense of ‘integration and needs fulfilment’ (McMillan and Chavis, 1986; Gui, 2018). The stories themselves reinforce the community identity and shared values as well as providing new members a way of proving their fit and commitment to the community (McMillan, 1996). Responses by older members to requests for advice regarding their ‘awakening’ and venting from these newer MGTOWs, as well as boundary maintenance as members came across from other communities, create an atmosphere not only of internal competence within the community (Gui, 2018) but also of a broader social contract within the MGTOW movement.
Several challenges have resulted from the quarantining and banning of associated subreddits as r/MGTOW received an influx of new members of the manosphere. This has resulted in r/MGTOW community elders reminding members of the values and goals of the MGTOW movement and, in some instances, suggesting that new members would fit elsewhere. It is unsurprising then that discussions of the future of the community frequently arise in response to the changing content guidelines by Reddit along with any quarantining or banning of related groups. There is also concern about the possibility of a future ban for the subreddit and so the community has a strong undercurrent of priming for a move – a prediction that came true in August 2021.
As a result, members of the community attempted to create backup forums on owned spaces or alternative platforms. Posts about concerns for the future of the community in the face of stricter enforcement of Reddit’s platform guidelines express that MGTOW is greater than any forum as they position it as a lifestyle and philosophy. These statements work to reinforce community membership as a commitment beyond the platform and beyond a specific forum. The discussions about migrating the user generated content from Reddit to another platform also highlight the importance and value of the community to its members in terms of the drive and motivation to preserve what are essentially cultural artefacts.
Our findings show a community with high levels of shared values, priorities, needs and goals as well as solidarity. This extends beyond the sharing of information about living the MGTOW life into general life advice and support. This brings us to the final pillar of the SCI-2: shared emotional connection. Shared emotional connection is in many ways the foundation of a community, and it is generated from the shared history and storytelling within the group and the support that members give to one another (Gui, 2018). It allows the formation of deep bonds within the community leading to a strengthening of SOC and commitment to the community (McMillan and Chavis, 1986; Gui, 2018).
There is a distinct homosociality within MGTOW spaces in which MGTOW members actively encourage a focus on self-care and self-improvement, often about improving their career and physical appearance (there are many references to going to the gym for instance). Users sharing their career successes with the community are met with support and encouragement by other members. Some members seek out career advice while others inspire each other to find new hobbies or new motivation to re-engage with past interests. However, amongst these expressions of support and care were consistent references to women broadly.
In her work examining pickup artist communities, Zuckerberg (2018, p. 124) argues, ‘anything that reinforces homosocial bonds between men opens itself to suspicions of homoeroticism’. She outlines how the ‘atypical performance of masculinity’ that emerges from a hyperfocus on self-care and grooming that is present within PUA spaces results in an insecurity about one’s masculinity and gives way to expressions of homophobia (Zuckerberg, 2018: p. 125).
We find a similar dynamic within the MGTOW community in which misogyny and homophobia are often employed, in the form of comments that deride women or effeminate men peppered throughout conversations, as a way of disrupting and also justifying intensely homosocial moments. For example, in a thread congratulating and commending one user’s career achievements, several users qualified their congratulatory comments with misogynistic statements about women in the workplace. This allowed them to express care and emotional support for each other while preventing or minimising the risks of homosexual accusations. In this way, misogynistic comments were often used as a way of forging emotional connections with each other.
Discussion: The interplay of toxicity from three perspectives
In this paper we examine the r/MGTOW community using three separate measures of ‘toxicity’: Jigsaw’s ML tool Perspective; top-down community understandings of toxicity (from Reddit); and community health as measured via the SCI-2. Both Perspective and the SCI-2 measures show a healthy and strong community that is non-toxic in a way that is more appropriately indicative of civility. Reddit’s community guidelines, on the other hand, indicate high levels of toxicity associated with r/MGTOW. We propose, then, that measures of ‘toxicity’ must account for the nuances between these three measures, and that researchers should consider whether they are examining intra-community toxicity (civility) or extra-community toxicity when discussing online communities more broadly.
The analysis also points to the challenges inherent in detecting and managing the rise of hateful, anti-science and conspiracy-based online groups across platforms. The success of these communities is largely that they are in fact ‘healthy’ discursive spaces within the contexts in which they are formed. The danger inherent in this is that the strength of social support from like-minded people is a strong component in the radicalisation process and the propagation and spread of hateful ideologies (Frissen, 2021). As these communities grow and mature, these ideologies may become more distilled, and radicalised members may form splinter communities (Irreberri and Leroy, 2009) that are more prone to physical violence.
The health of these communities also often makes them difficult to find, particularly where they operate as closed groups on platforms. ML tools such as Perspective tend to read these groups as having low levels of toxicity not only because of the strength of interpersonal bonds amongst members, but also because these groups develop their own languages – textual and visual – that are highly contextual in nature and shift to elude ML moderation tools. The other issue is that platforms often rely on user reporting as part of their discovery process. Where communities are healthy but breach platform guidelines, these mechanisms fail, because members are unlikely to report content that the community itself considers acceptable.
The strength of the connections within the group, the boundary policing of members and the focus on disenfranchisement at the hands of a feminist outgroup are all key factors in the first step toward radicalisation (Birdwell, 2020). Community health metrics then offer us insight into the appeal of communities like MGTOW because they enable us to better understand the mechanisms by which members are supported both ideologically and emotionally once they enter. In turn, this may allow for a more targeted approach to the destabilisation of these communities and for intervention programmes.
Reddit provides an interesting case study, not only because it is known as a ‘community of communities’ but also because it has recently changed, and continues to change, its community guidelines in an attempt to counter the growth of communities such as r/MGTOW on the platform. Its combination of top-down and bottom-up moderation structures, in which a greater agency toward self-governance is given to communities on the platform than on other platforms such as Facebook, indicates a platform attempting to generate a better balance between centralised and decentralised control. At the same time, it is clear that Reddit is focused not only on fostering intra-community health but also extra-community health, recognising that these things are sometimes in conflict, as with the r/MGTOW community.
This study points to a complex interplay between competing frameworks of toxicity. Perhaps the tension and pressure exerted on platforms such as Reddit by a broader society articulating and negotiating what is considered toxic, pushing them to update their content policy and ban groups that have been publicly framed as toxic or dangerous, is an indicator of a healthy democratic process. We note, however, that there needs to be some nuance in understanding how communities negotiate these guidelines.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
