Abstract
Purpose:
Identify and examine the associations between health behaviors and increased risk of adolescent suicide attempts, while controlling for socio-economic and demographic differences.
Design:
A data-driven analysis using cross-sectional data.
Setting:
Communities in the state of Montana from 1999 to 2017. Montana was selected because it has persistently ranked among the top 3 vulnerable states in the U.S. over the past years.
Subjects:
A total of 22,447 adolescents, of whom 1,631 attempted suicide at least once.
Measures:
Overall 29 variables (predictors) accounting for psychological behaviors, illegal substances consumption, daily activities at schools and demographic backgrounds were considered.
Analysis:
A library of machine learning algorithms along with the traditionally-used logistic regression were used to model and predict suicide attempt risk. Model performances—goodness-of-fit and predictive accuracy—were measured using accuracy, precision, recall and F-score metrics. Additionally, χ2 analysis was used to evaluate the statistical significance of each variable.
Results:
The non-parametric Bayesian tree ensemble model outperformed all other models, with 80.0% accuracy in goodness-of-fit (F-score: 0.802) and 78.2% in predictive accuracy (F-score: 0.785). Key health-behaviors identified include: being sad/hopeless (p < 0.0001), followed by safety concerns at school (p < 0.0001), physical fighting (p < 0.0001), inhalant usage (p < 0.0001), illegal drugs consumption at school (p < 0.0001), current cigarette usage (p < 0.0001), and having first sex at an early age (below 15 years of age). Additionally, the minority groups (American Indian/Alaska Natives, Hispanics/Latinos) (p < 0.0001), and females (p < 0.0001) are also found to be highly vulnerable to attempting suicides.
Conclusion:
The significant contribution of this work is an understanding of the key health-behaviors and health disparities associated with a higher frequency of suicide attempts among adolescents, while accounting for the non-linearity and complex interactions among the outcome and the exposure variables. The findings provide insights into key health-behaviors that can be viewed as early warning signs/precursors of suicide attempts among adolescents.
Purpose
Suicide is ranked as the second leading cause of death among individuals aged 10-34 years in the United States. 1 Previous literature has demonstrated that suicidal behaviors among youths are associated with family-related factors 2 (e.g., family history of psychiatric disorder and suicidal acts, parental discord and disharmony, poor family communication), mental disorders 2,3 (e.g., depression, borderline or antisocial personality disorder, anxiety disorders, anorexia nervosa), negative life events 4,5 (e.g., physical/sexual abuse, bullying at schools, relationship break-ups, dating violence), health risk behaviors 6 (e.g., alcohol addiction, substance abuse) and so forth. A meta-analysis revealed that suicide risk is complex in nature, and suggested that future investigations should go beyond the traditional linear models (e.g., Linear/Logistic regression) to characterize its nonlinear associations. 7
Some knowledge gaps, however, still remain in identifying the concerning health-behaviors on a daily basis that can be regarded as precursors to future suicide attempts in adolescents, and proposing the statistical model that can best capture the non-linear associations. Therefore, this paper aims to: 1) examine the confounding effects of multiple health-behaviors and their interactions on suicide attempt risk, and 2) identify the key health-behaviors (precursors) that can better predict the risk of suicide attempts among U.S. adolescents, leveraging a library of advanced statistical learning models.
Methods
Participants
The Youth Risk Behavior Survey (YRBS) is a biennial school-based survey conducted by the Centers for Disease Control and Prevention (CDC) in collaboration with states, territories, and local education and health agencies in the U.S. to collect self-reported data on health-related behaviors from students in grades 9-12. 8,9 In our study, we used this secondary data on health-behaviors and demographics from the YRBS for the state of Montana during 1999-2017. We also collected relevant information on the socioeconomic condition of the state from the Bureau of Labor Statistics for the same time period. We selected Montana because it has one of the highest youth suicide rates in the country, persistently ranking among the top 3 vulnerable states over the years. 10 However, our proposed data-driven methodology is general enough to be applied to any geographical region (contingent on data availability) to identify health-behaviors instrumental to suicide attempt risk and to predict the likelihood that an adolescent will attempt suicide.
Data collection was followed by data cleaning and variable selection in order to remove: 1) highly correlated variables (correlation coefficient |ρ| > 0.9) to avoid the “masking effect” and aid interpretation of the nexus between health-behaviors and suicide attempts; 2) variables with over 90% missing values; and 3) observations with missing entries. Consequently, the final dataset includes a total of 22,447 observations and 30 variables (29 predictors and the response).
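The three cleaning rules can be illustrated with a minimal sketch on toy data. This is not the authors' code: the column names, the toy values, the order in which the rules are applied, and the greedy choice of which variable in a correlated pair to drop are all assumptions made for illustration. Pairs with |ρ| above 0.9 are treated as highly correlated.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def clean(rows, columns, max_missing=0.9, max_abs_corr=0.9):
    """rows: list of dicts keyed by column name; None marks a missing entry."""
    # Drop variables with more than 90% missing values.
    kept = [c for c in columns
            if sum(r[c] is None for r in rows) / len(rows) <= max_missing]
    # Drop observations that still have a missing entry.
    complete = [r for r in rows if all(r[c] is not None for c in kept)]
    # Greedy filter (assumption): keep a variable only if it is not highly
    # correlated with any variable already kept.
    selected = []
    for c in kept:
        col = [r[c] for r in complete]
        if all(abs(pearson(col, [r[s] for r in complete])) <= max_abs_corr
               for s in selected):
            selected.append(c)
    return [{c: r[c] for c in selected} for r in complete], selected

# Hypothetical toy data: "b" nearly duplicates "a"; "d" is entirely missing.
columns = ["a", "b", "c", "d"]
rows = [
    {"a": 1.0, "b": 2.0, "c": 5.0, "d": None},
    {"a": 2.0, "b": 4.1, "c": 3.0, "d": None},
    {"a": 3.0, "b": 6.0, "c": 6.0, "d": None},
    {"a": 4.0, "b": 8.2, "c": 1.0, "d": None},
    {"a": 5.0, "b": None, "c": 2.0, "d": None},
]
cleaned, selected = clean(rows, columns)
print(selected, len(cleaned))
```

On this toy input, "d" is removed for missingness, the last row is removed for its missing entry, and "b" is removed as a near-duplicate of "a".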
Measures
The response variable (a.k.a. dependent variable) in this study, referring to a subject’s self-reported suicide attempts during the past 12 months from the instance of taking the survey, consists of 2 groups: “
The predictors (a.k.a. independent variables) include a variety of individual-level as well as state-level information. Individual-level information include participants’ demographics (sex, age, race, education, body height and weight), health behaviors such as illegal substances consumption (usage of alcohol, marijuana, cocaine, inhalant, steroid) and assaultive behaviors (physical fighting, threatening, weapon carrying, safety concerns at school), sexual activities, risk perception (seat-belt/helmet use, riding with drunk drivers), and emotional wellbeing (being sad/hopeless) of the participants. State-level information mostly includes socioeconomic factors such as gross domestic product (GDP), per capita income and unemployment rate, all of which are used as control variables in the analysis.
Data Analysis
Among the 22,447 observations in the final dataset, 20,816 cases belonged to
Results
Comparing all the models' results, we found that BART (see Online Appendix for model illustration) outperformed all other models, both in terms of model fit (overall accuracy = 80.0%, F-score = 0.802) and predictive accuracy (overall accuracy = 78.2%, F-score = 0.785). Note that BART performed better than the traditionally used linear logistic regression (overall predictive accuracy = 77.4%, F-score = 0.776), a difference of approximately 0.8 percentage points in overall predictive accuracy. This indicates that our proposed analytic approach (BART) classifies the adolescents who are vulnerable to suicide attempts better than the traditional logistic regression model does (8 more correct predictions out of 1000 total examples). Considering that 19% of Montana’s overall population (∼1.07 M as of July 01, 2019) is adolescent, the proposed model would correctly predict the vulnerability to suicide attempt risk for a higher number of adolescents (∼1625 more) than the traditionally used logistic regression model, which will support better-informed decision making aimed at minimizing the overall suicide attempt risk among adolescents. Thus, this study provides evidence that the associations between risk factors (a.k.a. predictors/precursors) and suicide attempt risk (a.k.a. response variable) are nonlinear and cannot be adequately captured by the traditional linear regression models.
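The arithmetic behind the "8 more per 1000" and "∼1625 more adolescents" figures can be retraced as follows. The inputs are the numbers quoted in the paragraph above; the variable names are illustrative, and the extrapolation assumes, as the text does, that the out-of-sample accuracy gap carries over unchanged to Montana's whole adolescent population.

```python
# Figures quoted in the text.
bart_accuracy = 0.782        # overall predictive accuracy of BART
logit_accuracy = 0.774       # overall predictive accuracy of logistic regression
montana_population = 1_070_000
adolescent_share = 0.19

# Accuracy gap, expressed per 1000 predictions.
gap = bart_accuracy - logit_accuracy
extra_per_1000 = round(gap * 1000)

# Scale the gap to the estimated adolescent population (assumption: the gap
# measured on held-out survey data applies to every adolescent in the state).
adolescents = montana_population * adolescent_share
extra_adolescents = gap * adolescents

print(f"{extra_per_1000} more correct predictions per 1000")
print(f"about {extra_adolescents:.0f} more adolescents classified correctly")
```

The result lands at roughly 1,626, in line with the approximate figure quoted in the text.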
To better understand how strongly each predictor is associated with the response variable, variable inclusion proportions (VIPs) (note 5) were computed for each predictor. A larger VIP indicates higher importance of the variable in explaining and predicting the response variable. 13,14 The ranking of all 29 predictors, based on their VIPs from high to low, is shown in Figure 1.

Figure 1. Variable importance ranking of all 29 predictors, including health behaviors and socioeconomic and demographic factors, as measured by variable inclusion proportions.
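As a rough illustration of how a variable inclusion proportion can be obtained, the sketch below assumes the common definition for tree ensembles such as BART: in each posterior draw, a predictor's share of all splitting rules, averaged over draws. The predictor names and split counts are hypothetical, and the authors' exact computation is the one described in their cited sources, not this sketch.

```python
def inclusion_proportions(split_counts_per_draw):
    """Average, over posterior draws, of each predictor's share of splitting rules.

    split_counts_per_draw: list of dicts mapping predictor name to the number
    of splitting rules that used it in one draw of the tree ensemble.
    """
    names = sorted({name for draw in split_counts_per_draw for name in draw})
    totals = {name: 0.0 for name in names}
    for draw in split_counts_per_draw:
        n_rules = sum(draw.values())
        for name in names:
            totals[name] += draw.get(name, 0) / n_rules
    n_draws = len(split_counts_per_draw)
    return {name: totals[name] / n_draws for name in names}

# Hypothetical split counts from two posterior draws.
draws = [
    {"sad_hopeless": 30, "school_safety": 12, "sex": 8},
    {"sad_hopeless": 26, "school_safety": 14, "sex": 10},
]
vip = inclusion_proportions(draws)
ranking = sorted(vip, key=vip.get, reverse=True)  # high to low, as in Figure 1
for name in ranking:
    print(f"{name}: {vip[name]:.2f}")
```

Because each draw's shares sum to 1, the averaged proportions also sum to 1, so a VIP is read relative to the other predictors and not as an absolute effect size.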
For the sake of brevity, this paper discusses further only the top 10 predictors (from Figure 1) associated with adolescent suicide attempt risk. The statistical significance of the top 10 predictors is shown in Table 1, where the predictors are categorized into 2 groups: health-related behaviors (7 variables) and demographic characteristics (3 variables) of the adolescents. Note that all of the top 10 predictors are categorical with different levels (values) (note 6) and are statistically significant (p < 0.0001). Table 1 also provides detailed information on the distribution of each predictor across its levels with respect to the response variable.
Table 1. Top 10 Factors (7 Health-Related Behaviors and 3 Demographic Factors) Associated With the Adolescent Suicide Attempt Risk.
The 7 most important health-related behaviors associated with higher suicide attempt risk in adolescents are shown in Table 1 with VIPs from high to low. The foremost predictor being sad or hopeless reported whether a subject had felt sad/hopeless persistently for almost every day over 2 or more weeks in the past year during which they stopped doing their usual activities. Our analysis revealed that 78% of the subjects who attempted suicides (
The 3 most significant demographic variables indicated in Table 1 include sex, race and education (in descending order by VIP). Our results indicated that females are more vulnerable to attempting suicide than males—70% of Group-1 subjects were females, compared to 52% female subjects in Group-0. Minority groups (e.g., American Indian and Alaska native; Hispanic/Latino) are also found to be more vulnerable compared to their White or Asian peers. For example, 10.5% of Group-1 subjects were American Indian and Alaska native compared to their Group-0 counterparts (4.7%). Similarly, 8.3% of Group-1 subjects were Hispanic/Latino compared to their Group-0 counterparts (4.9%). Adolescents in 9th and 10th grades are also found to be at a higher risk of suicide attempt than those in 11th and 12th grades.
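To make the reported significance level concrete, the sketch below runs a Pearson χ2 test of independence on a 2×2 table of sex by group. The cell counts are reconstructed from the rounded percentages above (70% and 52% female) and from group sizes of 1,631 and 22,447 − 1,631 = 20,816, so they are approximations, and the uncorrected 2×2 form of the test is an assumption about how the analysis was run.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Approximate counts rebuilt from rounded percentages (assumption).
group1_n, group0_n = 1631, 20816
group1_female = round(0.70 * group1_n)
group0_female = round(0.52 * group0_n)
table = [
    [group1_female, group1_n - group1_female],  # Group-1: female, male
    [group0_female, group0_n - group0_female],  # Group-0: female, male
]
stat, p_value = chi_square_2x2(table)
print(f"chi-square = {stat:.1f}, p < 0.0001: {p_value < 0.0001}")
```

With groups this large, an 18-point difference in the share of females yields a very large statistic, which is consistent with the p < 0.0001 reported for sex.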
Discussion
Summary
The major findings from this study indicate that certain health-related behaviors among adolescents are strongly associated with the risk of suicide attempts. Being sad or hopeless is the most important predictor of adolescent suicide attempt risk: 78% of the adolescents who attempted suicide had persistent feelings of sadness and hopelessness that hindered them from doing their usual activities. Similar findings were reported in a study of South African secondary school learners. 15 Other health-related behaviors, mostly observed in a school environment, such as being absent from school due to safety concerns, getting involved in physical fighting, and using illegal drugs on school property, also play critical roles in predicting the risk of suicide attempt. For instance, those who were absent from school over 6 days had a 50% likelihood of attempting suicide. These concerning school-based behaviors can be easily tracked on a daily basis and could be regarded as early warning signs of suicide attempt risk. A higher frequency of addictive substance consumption (e.g., inhalants, cigarettes) can also indicate that adolescents are at a higher risk of attempting suicide. Engaging in sexual activity during early adolescence is another key predictor that showed significant associations with increased risk of suicide attempt.
Besides health-related behaviors, this study establishes that the risk of suicide attempt significantly differs by sex, race and education. Health disparities exist where certain demographic subgroups are relatively vulnerable to attempting suicide, such as adolescent females, American Indian and Alaska native and Hispanic/Latino adolescents, and youths studying in 9th and 10th grades.
Limitations
Health-behaviors identified in this study illustrate the correlations/associations, but not necessarily the causality of suicide attempt risk. More information related to adolescents’ family background, pre-existing mental/physical health conditions, school GPA, and/or social determinants of health (e.g., social norms and attitudes) may help to better understand the causality of suicide attempt risk in adolescents.
Significance
Unlike previous studies, this study identified, examined and ranked a multitude of health-related behaviors and demographic characteristics of adolescents associated with suicide attempt risk while capturing their nonlinear and complex interactions, leveraging an advanced data-driven approach. In contrast to traditionally used linear models, our proposed nonlinear model (BART) demonstrates higher accuracy in predicting suicide attempt risk and provides an importance ranking of the predictors, indicating how strongly they are associated with the response. We found that a majority of adolescents who attempted suicide had persistent feelings of sadness and/or hopelessness for over 2 weeks. This study also established that certain school-based behaviors are key precursors to suicide attempt risk. In this view, timely identification of suicide attempt precursors by adolescents’ families and school staff, together with monitoring of students’ mental and physical wellbeing, will help to develop informed suicide prevention strategies that minimize the overall suicide risk. A higher frequency of addictive substance usage (e.g., inhalants, cigarettes) is also found to be a key predictor of the growing risk of suicide attempt; this indicates that local governments and communities might need to impose stricter surveillance to restrain sales of addictive substances to youth.
So What? (Implications for Health Promotion Practitioners and Researchers)
What is already known on this topic?
Previous studies modeled the associations of adolescent suicide attempts with a single or a few selected health-behaviors assuming a linear form.
What does this article add?
Our study provides evidence that the associations between health-behaviors and suicide attempt risk are nonlinear and can be best captured by a nonparametric tree-based ensemble model. This study also identified and ranked the key health-behaviors that can be viewed as precursors or early warning signs of future suicide attempts among adolescents. Key predictors include persistent feelings of sadness and/or hopelessness, school absenteeism, physical fighting, illegal drug usage, and underage sexual activity.
What are the implications for health promotion practice or research?
The identified key health-behaviors and their ranking could help stakeholders (e.g., adolescents’ families, school staff, school districts/boards, and government/community leaders) to make informed suicide prevention strategies and promote the mental health of adolescents. Strategies may range from monitoring concerning or risky behaviors within the school environment and regulating drug sales at the community level to addressing health inequity among vulnerable populations such as females and minority racial groups. This study paves a new path for predicting future suicide attempt risk by leveraging advanced data-driven machine learning models.
Supplemental Material
Supplemental Material, sj-pdf-1-ahp-10.1177_0890117120977378 for Health-Behaviors Associated With the Growing Risk of Adolescent Suicide Attempts: A Data-Driven Cross-Sectional Study by Zhiyuan Wei and Sayanti Mukherjee in American Journal of Health Promotion
Footnotes
Author’s Note
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Youth Risk Behavior Surveillance System (YRBSS). Zhiyuan Wei contributed to data collection and preprocessing, data analysis, results interpretation, and writing the draft manuscript. Sayanti Mukherjee contributed to conceptualizing and designing the research, supervision, preliminary data analysis, results interpretation, and writing and finalizing the manuscript.
Acknowledgments
The authors would like to acknowledge Mr. Shivam Dave, former Master's student of Industrial and Systems Engineering, University at Buffalo (SUNY), for his help with preliminary collection of the data. The authors would also like to acknowledge the SUNY Research Seed Grant Program and the University at Buffalo, The State University of New York (SUNY) for providing funding for this research study.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors received financial support from the SUNY Research Seed Grant Program and the University at Buffalo, The State University of New York (SUNY) for the research, authorship, and/or publication of this article.
