Abstract
Stigma negatively impacts minority college students’ mental health, yet little is known about how stigma processes differ across minority groups. This study examines how key components of stigma—labeling, stereotyping, separation, status loss, and discrimination—emerge across minority statuses (gender, race, religion, profession) and how they intensify through intersecting identities. We analyzed 331,353 posts from r/college using a Stereotype BERT model with stance detection. We extended the model to identify status loss and discrimination using semantic distance measures. Posts referencing professional identity were most associated with stereotyping, while those indicating racial identity showed higher frequencies of status loss and discrimination. Posts with intersecting identities showed the strongest associations with compounded stigma. Findings underscore the importance of intersectional approaches to understanding stigma, especially in addressing the compounded vulnerabilities faced by racial minorities and multiply-marginalized students.
Introduction
Higher education in the United States has diversified in gender, race/ethnicity, nativity, and religion.1,2 In 2022, college enrollment rates were 36 percent for Black students and 33 percent for Hispanic students compared with 41 percent for White students, and women (44 percent) outpaced men (34 percent). 3 In addition, more than 1.1 million international students from varied racial, ethnic, and religious backgrounds enrolled during the 2023–2024 academic year. 4 Despite this growing diversity, minority students continue to face stigma, defined as “an attribute that is deeply discrediting.” 5 Stigma arises from intersecting identities such as gender, race, religion, and institutional affiliation, and it often leads to stereotyping and exclusion, with adverse effects on academic performance and mental health.6,7 Minority students who experience stigma report poorer mental health than their nonminority peers and often conceal these struggles. 8
Framing stigma as a dynamic process rather than a static condition, Link and Phelan’s framework 9 conceptualizes stigma as a process involving four interrelated stages: labeling, stereotyping, separation, and status loss and discrimination. This framework has been applied to various marginalized groups,10–12 and prior studies have examined these components individually across gender, race, religion, and profession.13–17 However, traditional research methods, which typically rely on small-scale surveys or interviews,18,19 often fail to capture the complexity and nuance of stigma experiences. Although a few studies20,21 have applied computational approaches to detect stigma in online data, research specifically focusing on college students remains limited, and scholars have noted that stigma is frequently framed too narrowly in individualistic terms. 20
To address this gap, our study has two main objectives. First, drawing on Link and Phelan’s conceptualization of stigma 9 as a multistage process, it investigates how the key elements of stigma—labeling, stereotyping, separation, status loss, and discrimination—manifest across different minority identities (e.g., gender, race, religion, and professional affiliation) in online college communities. Second, it examines how stigma elements interact across intersecting minority identities, highlighting the compounded experiences of students facing multiple forms of stigmatization. To achieve these aims, we analyze large-scale discussions on Reddit, an anonymous, text-based platform where minority students can openly seek information, share experiences, and build communities—features that make it particularly suitable compared with other social media platforms—and thus offer novel insights into their mental health and social challenges.22–26
Data and Methods
Data collection
We use the subreddit r/college as our primary data source to examine college students’ experiences through the lens of Link and Phelan’s stigma theory. 9 Dedicated to discussions about college life—including academics, campus experiences, and student concerns—r/college had around 1.5 million members as of November 2024. It ranks 429th in membership and 70th in posting activity out of ∼138,000 subreddits, 27 placing it in the top 0.05 percent by activity. These metrics underscore its relevance and vitality as a platform for understanding student discourse. The subreddit encompasses a total of 331,353 posts and 1,973,710 comments with 144,756 unique authors over a date range spanning March 2008–December 2022. These statistics demonstrate the breadth and depth of discussions, offering extensive insights into the college experience.
Data preparation
To ensure data quality and usability for analysis, we apply a series of preprocessing steps to the submission datasets. For clarity, although “submission” is the official term used on Reddit, we refer to them as “posts” throughout our article. First, we remove records with titles or body texts marked as [deleted], [removed], [deleted by user], or left blank, thereby retaining only relevant and valid entries. Next, we merge the “title” and “selftext” fields so that both components can be analyzed together. In addition, we extract the year, month, and day from the “post_created_utc” column to generate a new “date” column in the YYYY-MM-DD format (see Table 1), which provides a clear temporal reference for each post.
Table 1. Post Dataset Examples After Preprocessing
In this study, we focus on posts as our primary unit of analysis, excluding comments, as our main interest lies in understanding the concerns active members express about college life and how these concerns relate to stigma. While we acknowledge that comments contain valuable dialogue, our primary objective is to identify what students proactively choose to share as their concerns rather than how these concerns are discussed or resolved.28,29 By narrowing our scope to posts, we capture the issues members raise directly and obtain ample, high-quality data to address the objectives of this research.
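The preprocessing steps described in this section can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each record is a dictionary with the "title", "selftext", and "post_created_utc" fields named in the text, that "post_created_utc" holds Unix epoch seconds, and that a record is dropped when either text field is a removal marker or blank. The helper names `is_valid` and `preprocess` are hypothetical.

```python
from datetime import datetime, timezone

# Placeholder values that mark a title or body as unusable.
REMOVED_MARKERS = {"[deleted]", "[removed]", "[deleted by user]", ""}


def is_valid(text):
    """Return True when a title or body holds real content."""
    return text is not None and text.strip() not in REMOVED_MARKERS


def preprocess(records):
    """Filter invalid posts, merge title and body, and derive a date column."""
    cleaned = []
    for rec in records:
        title, body = rec.get("title"), rec.get("selftext")
        # Assumption: drop the record if EITHER field is a marker or blank.
        if not (is_valid(title) and is_valid(body)):
            continue
        # Assumption: post_created_utc is a Unix timestamp in seconds.
        created = datetime.fromtimestamp(rec["post_created_utc"], tz=timezone.utc)
        cleaned.append({
            "text": f"{title.strip()} {body.strip()}",
            "date": created.strftime("%Y-%m-%d"),
        })
    return cleaned


# Toy records for illustration only, not study data.
sample = [
    {"title": "Roommate advice", "selftext": "How do I bring this up?",
     "post_created_utc": 1640995200},
    {"title": "Help", "selftext": "[removed]", "post_created_utc": 1640995200},
]
posts = preprocess(sample)
```

Here the second toy record is discarded because its body is a removal marker, and the first is kept with its title and body merged into one text field.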
Methods
We focus on four key labels—gender, religion, profession, and race—because these societal attributes have a strong influence on perceptions and biases in online communities. 30 We detail our procedure for identifying the stages of the stigma process (labeling, stereotyping, separation, and status loss/discrimination; see Table 2 for definitions and Table 3 for examples) by mapping posts into a vector space and visualizing them in two dimensions.
Table 2. Definitions of Stigma Stages in Link and Phelan’s Framework
Table 3. Examples of Stigma Stages Based on Link and Phelan’s Framework
Examining stigma
We analyze Reddit posts based on the four stages of stigma outlined by Link and Phelan 9: (1) labeling, (2) stereotyping, (3) separation, and (4) status loss and discrimination. To detect stereotypes, we apply Stereotype BERT, a multiclass classifier built on Bidirectional Encoder Representations from Transformers (BERT) and designed to capture both explicit and implicit stereotypes. 15 Trained on the Multi-Grain Stereotype dataset, which integrates diverse sources such as StereoSet and CrowS-Pairs, 31 the model demonstrates strong generalizability across linguistic and social contexts. It outperforms existing classifiers, including the GPT series, in precision, recall, and F1-score. Its reliability is further supported by explainable artificial intelligence (AI) tools such as SHAP 32 and BERTViz, 33 which confirm its focus on meaningful linguistic patterns. We also validated the model on our dataset: its predictions show a 90.67 percent average agreement with the labels assigned by three human coders. Interrater reliability among the three coders, measured by Fleiss’ kappa, reaches 0.72, indicating a substantial level of agreement. 34
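For readers unfamiliar with the reliability statistic, the standard Fleiss' kappa formula can be sketched as below. The function name `fleiss_kappa` and the toy count matrix are illustrative assumptions; they are not the study's coding data and do not reproduce the reported value of 0.72.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix where counts[i][j] is the number of
    coders who assigned item i to category j (same coder count per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Per-item agreement: share of coder pairs that agree on item i.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in proportions)

    return (observed - expected) / (1 - expected)


# Toy example: 4 posts, 3 coders, 2 categories (stereotype / no stereotype).
toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
kappa = fleiss_kappa(toy_counts)
```

In the toy matrix, two posts have unanimous codes and two have a 2-to-1 split, so observed agreement is 2/3 against a chance level of 1/2.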
Labeling
We assign labels based on the presence of positive or negative stances toward four minority categories: gender, race, profession, and religion, following established methods in prior research. Posts reflecting a clear stance toward any minority group receive the corresponding label. The labeling scheme includes nine classes: four positive, four negative, and one for unrelated content. If the unrelated score exceeds all others, the post is unlabeled. Otherwise, it is assigned to the class(es) with the highest score(s). Since the classes are not mutually exclusive, a post may carry multiple labels—for example, addressing both gender and racial bias or showing both positive and negative stances within a single category. This multilabel approach captures the complexity and overlap common in social discourse.
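The assignment rule in this paragraph can be sketched as follows, assuming the classifier returns one score per class. The class names, the `assign_labels` helper, and the tie tolerance `tol` are hypothetical; the source does not specify how near-ties are resolved.

```python
CATEGORIES = ("gender", "race", "profession", "religion")
STANCES = ("positive", "negative")
# Hypothetical class names: four positive, four negative, one unrelated.
CLASSES = [f"{c}_{s}" for c in CATEGORIES for s in STANCES] + ["unrelated"]


def assign_labels(scores, tol=1e-6):
    """Return the top-scoring class(es), or [] when 'unrelated' wins."""
    related = {name: s for name, s in scores.items() if name != "unrelated"}
    best = max(related.values())
    # A post stays unlabeled if the unrelated score exceeds all others.
    if scores["unrelated"] > best:
        return []
    # Assumption: classes within `tol` of the best score count as tied.
    return sorted(name for name, s in related.items() if best - s <= tol)


# Toy scores for illustration only.
example = dict.fromkeys(CLASSES, 0.01)
example.update({"gender_negative": 0.40, "race_negative": 0.40, "unrelated": 0.10})
labels = assign_labels(example)

off_topic = dict.fromkeys(CLASSES, 0.02)
off_topic["unrelated"] = 0.84
no_labels = assign_labels(off_topic)
```

The first toy post receives two labels because two classes tie for the highest score, which mirrors the multilabel case described above.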
Stereotyping
Following the initial labeling process, we detect stereotypes in each post. We treat only negative stances as evidence of stereotyping. For each category, we set a threshold at the mean plus 1.68 standard deviations of the scores, which corresponds to approximately the top five percent of posts. A post is labeled as demonstrating a specific stereotype if its score exceeds this threshold.
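A minimal sketch of this thresholding step is shown below. It assumes the scores for one category are available as a list of floats and uses the population standard deviation; whether the study used the population or sample form is not stated. The function names are hypothetical.

```python
from statistics import mean, pstdev


def stereotype_threshold(scores, k=1.68):
    """Mean plus k standard deviations of one category's scores."""
    # Assumption: population standard deviation (pstdev).
    return mean(scores) + k * pstdev(scores)


def flag_stereotypes(scores, k=1.68):
    """Return True for each post whose score exceeds the threshold."""
    cutoff = stereotype_threshold(scores, k)
    return [s > cutoff for s in scores]


# Toy scores: nineteen low values and one high value.
toy_scores = [0.1] * 19 + [0.9]
flags = flag_stereotypes(toy_scores)
```

With these toy scores the cutoff falls between 0.1 and 0.9, so only the last post is flagged.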
Separation
To identify linguistic markers of separation, we apply a semantic similarity approach. We curate 12 reference sentences from Link and Phelan’s work and subsequent studies grounded in stigma theory.9,35–39 These sources provide definitions and examples that explicitly describe separation as arising from negative stereotypes—for example, “Labeled persons are placed into categories to accomplish separation of us from them.” Both posts and references are embedded using a pretrained Sentence-Transformer model, which involves tokenization with padding/truncation, mean pooling over token embeddings (weighted by attention masks), and L2 normalization. We compute cosine similarity between each post and all reference sentences, using the maximum similarity (i.e., minimum distance) as the separation score for each post.
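The pooling and similarity steps can be illustrated with plain Python on toy vectors. This sketch assumes token embeddings have already been produced by some encoder; it does not call the Sentence-Transformer model itself, and the two-dimensional vectors are invented for illustration. The helper names are hypothetical.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine(u, v):
    """For unit-length vectors, cosine similarity equals the dot product."""
    return sum(a * b for a, b in zip(u, v))


def separation_score(post_vec, reference_vecs):
    """Maximum cosine similarity between a post and any reference sentence."""
    return max(cosine(post_vec, ref) for ref in reference_vecs)


# Toy input: two real tokens and one padding token that the mask excludes.
post_tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
post_mask = [1, 1, 0]
pooled = mean_pool(post_tokens, post_mask)
post_vec = l2_normalize(pooled)
references = [l2_normalize([1.0, 0.0]), l2_normalize([1.0, 1.0])]
score = separation_score(post_vec, references)
```

Because the padded position is masked out, the pooled vector depends only on the two real tokens, and the score is taken from whichever reference sentence is closest.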
We classify posts into four groups, ranging from strong alignment (Group 1) to no alignment (Group 4), using thresholds at μ ± 1.68σ. To test robustness, we also evaluated μ ± 1.96σ and μ ± 2.58σ; human coders judged μ ± 1.68σ to provide the most appropriate thresholds.
For binary classification, we label posts in Groups 1 and 2 as 1 (separation) and Groups 3 and 4 as 0 (no separation).
Status loss and discrimination
We extend our approach to the fourth stage of stigma, status loss and discrimination, using 18 reference sentences selected by the same rationale from prior studies by Link and Phelan and from studies grounded in their stigma theory.9,36,38,40 Examples include:
“The full execution of disapproval, rejection, exclusion, and discrimination.” “The almost immediate consequences of successful negative labeling and stereotyping are a general downward placement of a person in a status hierarchy.”
Using the same embedding pipeline and cosine similarity measure, we compute the minimum distance from each post to the set of status loss/discrimination reference sentences. We then define thresholds (again, using the mean and standard deviation of the distance distribution) to partition posts into four groups ranging from strong to no alignment. Posts with distances in the strong or moderate groups are assigned a binary label of 1 (indicating the presence of status loss/discrimination), and those in the weak or no alignment groups are assigned 0.
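The partition into four alignment groups and the binary label can be sketched as follows. The exact cut points are not given in the text, so this sketch assumes three boundaries on the distance distribution (μ − kσ, μ, and μ + kσ, with k = 1.68) and the population standard deviation. The group names and function names are hypothetical.

```python
from statistics import mean, pstdev

# Ordered from smallest distance (closest match) to largest.
GROUPS = ("strong", "moderate", "weak", "none")


def partition(distances, k=1.68):
    """Assign each distance to one of four alignment groups."""
    mu, sigma = mean(distances), pstdev(distances)
    # Assumption: three cut points; the source states only mu +/- k*sigma.
    cuts = (mu - k * sigma, mu, mu + k * sigma)
    labels = []
    for d in distances:
        index = sum(d >= c for c in cuts)  # number of cut points at or below d
        labels.append(GROUPS[index])
    return labels


def to_binary(labels):
    """Strong or moderate alignment -> 1; weak or no alignment -> 0."""
    return [1 if g in ("strong", "moderate") else 0 for g in labels]


# Toy distances for illustration only.
toy_distances = [0.05, 0.42, 0.44, 0.46, 0.48, 0.49,
                 0.51, 0.52, 0.54, 0.56, 0.58, 0.95]
groups = partition(toy_distances)
binary = to_binary(groups)
```

Smaller distances mean closer alignment with the reference sentences, so the lowest toy distance lands in the strong group and the highest in the no-alignment group.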
Keyword analysis
We develop an analytical framework combining text processing, network analysis, and co-occurrence metrics to explore intersecting stigmatized identities among college students. After preprocessing posts by tokenizing, lowercasing, and removing punctuation, we extract high-frequency keywords related to identity and stigma. These keywords form a network graph, with nodes representing terms and edges showing their co-occurrence. This network highlights clusters of intersecting stigmas. In addition, we compute co-occurrence frequencies and apply statistical measures to assess the strength of associations, providing a nuanced understanding of compounded stigma in online discourse.
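A minimal sketch of the co-occurrence step is given below. The text does not name the association measure, so the Jaccard index used here is an assumption chosen for illustration, and the keyword list and posts are toy data. The function names are hypothetical.

```python
import string
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def cooccurrence(posts, keywords):
    """Count posts per keyword (nodes) and per keyword pair (edges)."""
    keywords = set(keywords)
    node_counts, edge_counts = Counter(), Counter()
    for post in posts:
        present = sorted(keywords & set(tokenize(post)))
        node_counts.update(present)
        # Each unordered pair is stored as an alphabetically sorted tuple.
        edge_counts.update(combinations(present, 2))
    return node_counts, edge_counts


def jaccard(pair, node_counts, edge_counts):
    """Assumed association measure: shared posts over posts with either term."""
    a, b = pair
    both = edge_counts[pair]
    return both / (node_counts[a] + node_counts[b] - both)


# Toy posts for illustration only.
toy_posts = [
    "My roommate needs help with the exam!",
    "Roommate asked for help.",
    "Exam tomorrow",
]
nodes, edges = cooccurrence(toy_posts, ["roommate", "help", "exam"])
strength = jaccard(("help", "roommate"), nodes, edges)
```

The node and edge counts can be passed to any graph library for layout; this sketch stops at the counts so that it depends only on the standard library.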
Findings
Using Stereotype BERT, we classify posts into four primary categories (gender, race, profession, and religion) and six intersectional ones (e.g., gender and race). Table 4 shows example posts and label distributions. Gender-related posts often address dorm life dynamics, such as roommate issues involving overnight guests. Race stigma appears in posts referencing non-American ethnicities, including interactions with Chinese professors or students. Profession labels surface in critiques of academic versus industry experience among faculty. Religion-related posts typically mention Christian colleges or campus religious events. Posts with combined labels reveal more complex stigma. For instance, a gender and profession post describes a professor grading “pretty girls” higher than more qualified students, while a race and religion post highlights the challenges an openly gay student might face at a conservative Christian university.
Table 4. Examples of Posts of Each Label
Figure 1 presents the proportion of stigma processes across labeling, stereotyping, separation, and status loss and discrimination, showing that the proportion of posts decreases substantially at each successive stage across all minority categories. When examining single minority status, profession exhibits the highest overall vulnerability to stigma. However, when considering stigma as a progressive process, gender- and profession-based minorities predominantly experience negative stereotyping (an early stage of stigma), while racial and religious minorities are more likely to encounter later stages such as status loss and discrimination. This pattern suggests that U.S. college campuses exhibit a form of exclusivity, in which statuses considered part of the “in-group” (e.g., profession and gender) are primarily subjected to early-stage stereotyping, whereas more immutable statuses (e.g., race and religion) face more severe consequences in later stages.

Figure 1. Distribution of stigma processes (stereotyping, separation, and status loss and discrimination) across single-category minority groups. Note: Each panel represents the full set of posts (100 percent) within a single category (gender, race, profession, or religion) where minority status was identified. The colored segments illustrate the sequential stages of stigmatization: blue indicates the proportion of posts that involve labeling (identification of minority status), green shows the proportion progressing to stereotyping (a subset of labeling), red represents the proportion advancing to separation (a subset of stereotyping), and purple indicates the proportion reaching status loss and discrimination (a subset of separation). Each successive stage is shown as a percentage of the total posts in that category, illustrating the hierarchical progression from labeling through stereotyping and separation to status loss and discrimination.
Figure 2 presents the proportion of stigma processes across labeling, stereotyping, separation, and status loss and discrimination for six intersectional minority categories, revealing three notable patterns: (1) gender and race shows exceptionally high separation rates (9.80 percent) compared with other intersections, (2) race and religion demonstrates elevated rates across all stages, particularly in separation (9.18 percent) and status loss (5.92 percent), and (3) intersections involving religion consistently show higher proportions in later stigma stages than those without religious identity.

Figure 2. Distribution of stigma processes (stereotyping, separation, and status loss and discrimination) across combined categories of minority groups. Note: Each panel represents the full set of posts (100 percent) within an intersectional category (e.g., gender and race, gender and profession) where minority status was identified. The colored segments illustrate the sequential stages of stigmatization: blue indicates the proportion of posts that involve labeling (identification of minority status), green shows the proportion progressing to stereotyping (a subset of labeling), red represents the proportion advancing to separation (a subset of stereotyping), and purple indicates the proportion reaching status loss and discrimination (a subset of separation). Each successive stage is shown as a percentage of the total posts in that intersectional category, illustrating the hierarchical progression from labeling through stereotyping and separation to status loss and discrimination.
In the stereotype keyword network (see Fig. 3), keywords common across all four subgroups are shown in red, while those tied to three, two, or one subgroup are in green, blue, or yellow, respectively. For instance, gender posts include terms such as “friend” and “girl,” religion posts feature phrases such as “Christian school” and “god,” race posts are linked to “Greek life” and “studying abroad,” and profession posts include “engineering major” and “nurse.”

Figure 3. Stereotype keyword network. The red nodes at the center represent keywords that appear in all four labeled subgroups. Note: Yellow nodes represent keywords associated with a single subgroup, blue nodes indicate connections to two subgroups, green nodes to three subgroups, and red nodes to all four main subgroups (religion, race, gender, and profession).
The network reveals overlap among minority statuses: gender and religion share keywords such as “roommate” and “help,” gender and profession share “exam,” and gender and race share terms such as “school” and “make friend.” Religion and race intersect with “ivy league,” religion and profession include “textbook” and “degree,” and profession and race share terms such as “internship” and “engineering.” Posts combining gender, religion, and race use keywords such as “campus” and “tuition,” and those with religion, race, and profession share “admission” and “university.” Gender, religion, and profession posts include “class” and “graduate,” while terms such as “online class” and “student loan” are common across all labels.
In analyzing subreddit posts reflecting separation, the keyword patterns reveal a pervasive sense of disconnection and exclusion experienced by students across contexts (see Fig. 4). Posts with themes of separation reveal keywords such as “fake id” and “freshman,” highlighting both physical and perceived isolation.

Figure 4. Separation keyword network. Nodes near the center of the graph represent keywords that frequently appear when more minority groups overlap. Note: Yellow nodes represent keywords associated with a single subgroup, blue nodes indicate connections to two subgroups, green nodes to three subgroups, and red nodes to all four main subgroups (religion, race, gender, and profession).
When multiple stigmas intersect, language becomes more charged, signaling intensified exclusion. Posts combining gender, race, and profession feature keywords such as “depression,” “transition socially,” and “struggle immensely,” indicating heightened feelings of isolation. These students express separation due to both institutional practices and self-stigmatization, deepening the “us” versus “them” divide. For example, posts about gender, race, and religion often include words such as “affirmative action” and “bullying,” highlighting a dual process of external separation and internal self-stigmatization. This layered stigma leads to anger, frustration, and a cycle of negative emotions. Even single-axis experiences of stigma, such as gender or race, are marked by loneliness and a lack of connection, with keywords such as “friend” and “international student.”
We analyze discrimination-related posts by comparing top keywords across different overlapping identity combinations (see Fig. 5). Common terms such as “failure” and “rejection” highlight experiences of exclusion framed within discriminatory contexts.

Figure 5. Discrimination keyword network. Graph for the emergence or overlap of one or more minority statuses. Note: Yellow nodes represent keywords associated with a single subgroup, blue nodes indicate connections to two subgroups, green nodes to three subgroups, and red nodes to all four main subgroups (religion, race, gender, and profession).
Race, profession, and religion-related posts frequently feature keywords such as “fraud” and “discrimination,” focusing on unfair treatment in professional and racial contexts.
Discussion
This study examines how stigma manifests and interacts across different and intersecting minority identities in online college communities. Our findings suggest several important implications for addressing stigma among minority students in higher education. Rather than affecting minority students uniformly, stigma operates in stage-specific ways across distinct identity groups and combinations of identities.
First, the significant proportion of discourse surrounding minority status found on platforms such as Reddit indicates the need for academic institutions to incorporate stigma-awareness training and promote intersectional equity practices. Higher education institutions should implement practical measures to prevent and address discrimination, including accessible reporting channels, targeted student support services, and faculty training on recognizing and mitigating stigma. In addition, institutions could offer workshops, establish peer or professional support networks for affected students, and integrate antistigma content into curricula to raise awareness of the complex experiences of minority students.
Second, although professional minorities do not reach later stigma stages, the early prevalence of stereotyping indicates symbolic or discursive stigma and underscores the need for early intervention. Awareness campaigns and bias-reduction training targeted at both students and staff could challenge stereotypes before they escalate. This could prevent progression to more harmful stages such as separation, status loss, and discrimination.
Third, our findings highlight that racial minorities, particularly those with intersecting identities, are most vulnerable to experiencing the later, more damaging stages of stigma—status loss and discrimination. Specifically, students with intersecting racial and religious minority identities demonstrate the highest vulnerability to status loss and discrimination. Historically, higher education institutions have sought to admit students from diverse racial and ethnic backgrounds. However, they have paid less attention to how structural systems and institutional cultures exclude minority students, especially those with intersecting identities. Since the later stages of stigma could have long-term negative impacts on both academic success and mental well-being, 37 specific interventions are needed, such as connecting students with mentors who share similar intersecting identities and providing support programs that include religion-sensitive spaces alongside culturally responsive resources.
There are several limitations to consider. First, this study may include selection bias due to convenience sampling and the online nature of the data. Although Reddit is relatively racially diverse, prior work shows it remains biased toward male users and those with higher digital engagement. 41 Consequently, Reddit users do not represent the broader college student population, and stigma experiences shared on the platform may overrepresent groups that are more visible or active online. Second, while we included minority groups in our study, it was not possible to determine whether the posts were written by individuals from these minority groups or simply discussed these groups. Third, although our deep learning-based approach opened new avenues for understanding stigma theory at scale, it inherently introduced some degree of hallucinations, which may lead to incorrect interpretations. Finally, our study focused on minority groups based on gender/sexual orientation, race, religion, and profession. Other important minority identities, such as disability status, nativity, and socioeconomic status, could also influence stigma processes in higher education and were not accounted for in this research; later studies could examine them to provide a broader understanding.
Building on our study, future research could further examine stigma processes by analyzing both single and intersecting minority statuses across various contexts beyond education. Stigma is not confined to schools, and exploring its occurrence in workplaces, health care, or social services would provide a broader understanding of how it operates across domains. Stigma may manifest differently depending on the broader societal context, influenced by historical, cultural, and political factors that shape the reception of minority groups. In addition, future studies could use mixed-methods approaches by applying computational thresholds to identify theoretically salient or extreme cases and conducting in-depth qualitative analyses to examine how stigma processes are narratively constructed in context. Finally, longitudinal designs would be valuable for tracing how stigma processes evolve over time, particularly how early mechanisms such as stereotyping may develop into discrimination and status loss, and how these dynamics affect individuals.
Authors’ Contributions
C.H.: Conceptualization, formal analysis, methodology, writing—original draft, and writing—review and editing. S.Y.: Conceptualization, formal analysis, methodology, writing—original draft, and writing—review and editing. H.Y.: Conceptualization, writing—original draft, and writing—review and editing. S.H.J.: Conceptualization, writing—original draft, and writing—review and editing.
Footnotes
Author Disclosure Statement
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding Information
This study was supported by a faculty research grant from the College of Liberal Arts at Korea University in 2025.
Ethical Approval and Consent to Participate
We analyzed publicly available anonymized data that are accessible to everyone.
