Abstract
Suicidal ideation among teenagers is on the rise, which underscores how crucial it is to recognize and understand the variables that contribute to it. Deep learning models such as convolutional neural networks (CNNs), which can capture intricate relationships in data, are one possible strategy for addressing this problem. In this study, we employed a CNN-LSTM hybrid model to explore the relationships between teen suicidal ideation and risk variables including depression, anxiety, and social support, analysing a substantial dataset of mental health surveys for patterns and risk factors associated with suicidal thoughts. Our objective was to identify adolescents prone to suicidal ideation. With 24 parameters and a sample of 3075 subjects, our model achieved an F1-score of 97.8%. These findings highlight the important patterns and risk variables related to suicidal thoughts and offer direction for developing preventive interventions that effectively address adolescent suicidal ideation.
Keywords
Introduction
India’s teenage population is highly susceptible to suicidal thoughts and attempts, making this a serious public health concern. Scholastic stress, familial conflicts, social isolation, and psychological problems such as anxiety and depression are among the factors that lead teenagers in India to suicidal thoughts. Suicidal ideation describes a wide range of emotions and ideas related to considering or preparing to take one’s own life. It involves varying degrees of distress, ranging from melancholy and hopelessness to more serious thoughts and plans of suicide. Suicidal individuals are commonly categorized into two groups: those who have engaged in self-destructive conduct and those who have attempted suicide. The relationship between these groups is still debated among researchers. Some studies have found no conclusive evidence linking suicidal thoughts to progression to suicide attempts; rather, risk factors like hopelessness, frustration, and despair may serve as indicators of suicidal ideation [1].
Studies show that teens are prone to suicidal thoughts and behaviours, which is a serious public health concern. According to reports, 12% of high school students seriously considered suicide in the preceding year. It is important to keep in mind that suicidal thoughts can turn into suicide attempts, which can have terrible outcomes and lingering implications on individuals, families, and communities. Understanding and addressing the factors that lead to suicidal ideation is crucial for the effectiveness of prevention and intervention efforts.
Teens with a history of mental illness, substance misuse, family strife, exposure to violence or trauma, a weak social support system, a history of self-harm, or access to deadly weapons are more likely to consider suicide. Preventing and treating adolescent suicidal thoughts should focus on addressing risk factors, strengthening protective factors such as healthy relationships and coping mechanisms, and providing appropriate mental health services. Potential interventions that can improve teenage mental health and well-being while reducing suicidal thoughts include crisis hotlines, school-based prevention initiatives, and mental health screening and treatment programs [2].
Changes in mood, behaviour, or sleep are frequent warning signs of teenage suicidal thoughts. Parents, teachers, and psychotherapists should be aware of these indicators and take appropriate action if they think a child is at risk of suicide. Open communication, empathy, and support can be very beneficial for adolescents who are struggling with suicidal thoughts and feelings. Deep learning approaches, which have achieved strong results in computer vision and pattern recognition, have also outperformed traditional text classification methods. Deep learning relies on dense vector representations and neural networks, which are widely used in NLP tasks, and achieves better performance than classical machine learning, which depends on manually designed features [3].
Our study aims to analyse discussions of suicidal ideation using advanced deep learning techniques. Specifically, we use a CNN-LSTM model and test whether combining the two architectures improves classification performance for suicide-related topics. We demonstrate that the CNN-LSTM model identifies and classifies discussions of suicidal ideation more accurately than individual CNN classifiers and conventional machine learning systems.
Literature review
Because of the rise in suicide rates in recent years, a number of studies have focused on suicide detection. Suicide has many different causes, each linked to a complicated network of interconnected factors. Researchers studying suicidal thoughts have developed numerous computational techniques.
Kyu Sung Choi et al. [4] developed a model using Graph Neural Networks to improve the sensitivity of predicting the risk of suicide in emerging adults. The model was able to address previous low sensitivity issues by combining multidimensional core characteristics surveys, which comprise resilience, self-esteem, depression, anxiety, and clinicodemographic data, within a graph structure dataset. Each person was represented by a separate graph, and the input features included personality questionnaires and hospital-based data.
Chao Yu et al. [5] discuss how reinforcement learning (RL) solves sequential decision-making problems using sampled, evaluative, and delayed feedback simultaneously, in contrast to conventional supervised learning algorithms, which typically rely on one-shot, exhaustive, and supervised reward signals. Owing to these qualities, RL is well suited to developing practical solutions in healthcare domains where diagnoses or treatment plans are often defined by a prolonged, sequential process.
Roy et al. [6] created an algorithm called “SAIPH” to predict future suicidal thoughts by analysing Twitter data. Neural networks were trained on tweets about psychological topics such as anxiety, stress, loneliness, hopelessness, and depression. Data from 283 suicidal ideation cases and 2655 controls were then used to train a random forest model to predict suicidal ideation status. With an AUC of 0.88, the model predicted a roughly seven-fold increased likelihood of suicidal thoughts during the following ten days. Using regionally collected Twitter data for validation, significant relationships were established between SAIPH scores and county-wide suicide death rates, particularly for younger persons.
Gareth Harman et al. [7] built statistical models of suicidal ideation in a cohort of children aged 9 to 10 years, using factors previously linked to risk in older teenage and adult populations. For this case-control study, the Adolescent Brain Cognitive Development (ABCD) study provided data collected at 21 research centres, resulting in 11,369 cases in total. Training features were selected with the R Boruta package, whose feature selection method trains a Random Forest on all features, both original and shadow, and produces a permutation importance score for every feature. As a result, the system was able to identify both suicidal ideation (SI) and suicide attempt (SA) cases. The random forest built on the selected features was able to distinguish the group of teenagers who had considered suicide or had made plans to take their own life from the controls with an AUC
According to Chang Su et al. [8], it is crucial, though challenging, to estimate a child or adolescent’s risk of suicide precisely within manageable time frames. To bridge this gap, the authors used deidentified electronic health records (EHRs) from Connecticut Children’s Medical Centre, covering 41,721 children aged 10 to 18 from October 2011 to September 2016. Their machine learning algorithms used longitudinal clinical records, accounting for both short- and long-term risk factors, to predict suicidal behaviour. The candidate predictors comprised laboratory test results, medications, diagnoses, and demographics, with prediction windows ranging from 0 to 365 days. At 90% specificity, the predictive models detected 53–62% of suicide-positive subjects, with AUCs ranging from 0.81 to 0.86 across all prediction windows. Shorter prediction windows performed better, and the value of individual predictors changed between windows, highlighting both immediate and long-term risks. The study demonstrates the use of routinely collected EHRs to build accurate prediction models for adolescent and pediatric suicide risk.
H.H. Theyazn et al. [9] address the challenging but crucial task of identifying suicidal ideation from social media posts. They present a method based on publicly accessible Reddit datasets and an experimental study. Text representation techniques such as TF-IDF and Word2Vec are used, while hybrid deep learning and machine learning algorithms are used for classification. Specifically, CNN-BiLSTM and XGBoost models classify social posts as suicidal or non-suicidal based on textual and LIWC-22-based attributes. Two experiments were conducted to evaluate the models’ performance, using standard metrics such as F1-score, accuracy, precision, and recall. The findings show that the CNN-BiLSTM model outperformed the XGBoost model in detecting suicidal thoughts when textual features were used, with 95% accuracy as opposed to 91.5%. On the other hand, XGBoost fared better than CNN-BiLSTM when LIWC features were used.
Lei Cao et al. [10] note that many people worldwide experience suicidal thoughts for a variety of reasons. Such thoughts may surface on social media platforms, which are frequently used as spaces for self-expression and communication. Nevertheless, identifying them reliably is challenging because of incomplete information and complex data. To detect suicidal ideation on social media, the authors integrate deep neural networks with a high-level suicide-oriented knowledge structure while accounting for psychological insights. They also employ a two-layered attention mechanism to pinpoint the critical risk factors influencing each individual’s ideation. On data from microblogs and Reddit, their approach achieves over 93% accuracy. Among the detected personal factors, posts, personality traits, and experiences stand out as the key indicators, and posted text, stress levels, and rumination play crucial roles in the detection process.
According to Christianah Oyewale et al. [11], anticipating suicidal thoughts is crucial in mental health assessment, especially given victims’ reluctance to seek help and the lengthy case review processes of doctors. Deep learning models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have shown promise in this area. Nonetheless, it remains challenging to determine which word embeddings best vectorize text. The work employs a deep learning architecture with a CNN and a bidirectional LSTM layer, combined with two word embedding techniques: Word2Vec and FastText. Experiments on a Reddit dataset of 232,074 posts yielded test set F1-scores of 90% with Word2Vec and 94% with FastText. FastText performs noticeably better, with less overfitting, than Word2Vec.
Usman Naseem et al. [12] observe that social media offers valuable information in this area and that sequential learning approaches show potential in identifying suicidal individuals. However, these methods can overlook global features that are necessary for accurate identification. The authors propose a Graph-Based Hierarchical Attention Network (GHAN) that uses graph convolutional neural networks with an ordinal loss to enhance suicide risk identification. GHAN builds three graphs to capture semantic, syntactic, and sequential contextual information, examines the textual material with an attentive transformer encoder, and hierarchically optimizes the suicide risk levels. Experimental results on a publicly available Reddit dataset show that GHAN predicts suicide risk better than state-of-the-art methods.
Proposed methodology
This section describes the pre-processing procedures and offers a thorough examination of the dataset. The dataset was prepared from questionnaire-based surveys covering various factors pertaining to suicide. In particular, it was collected from a survey of persons in the age group 10–19, and 23 attributes were taken into consideration. The deep neural network is trained on these attributes to predict suicidal tendency. The target field refers to the prediction of suicidal tendency in the adolescent and is integer valued: 0 (no risk) or 1 (suicidal). Experiments with the dataset have concentrated on distinguishing suicidal (value 1) from non-suicidal (value 0) subjects.
A hybrid model that combines a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) was used to identify suicidal ideation in teens. In this model, the convolutional layer that follows the LSTM layer takes the LSTM layer’s output vector as its input. Figure 1 depicts the architecture of the LSTM-CNN hybrid model. The model classifies the input data as suicidal or non-suicidal. The goal of this technique is to increase the accuracy and effectiveness of identifying suicidal ideation through questionnaire-based assessments.
Figure 1. LSTM-CNN model for suicide prediction.
The model architecture shown in the figure, a Sequential model with multiple layers, illustrates the proposed methodology. A dropout layer is employed to prevent overfitting by randomly deactivating some neurons during training. The LSTM layer captures long-range dependencies within the textual input, whereas the convolutional layer extracts relevant features [13]. The pooling layer down-samples the feature map, condensing the extracted features into a smaller representation. The Adam optimizer is used to update the network weights. The model’s architecture and associated parameters are summarized in Fig. 2.
Figure 2. LSTM-CNN model summary.
Dropout is a technique incorporated in neural networks to address overfitting and promote generalization by preventing co-adaptation among hidden units. It works by randomly deactivating a portion of the activations during training, introducing noise that improves the model’s ability to generalize. Typically, a dropout rate of 0.5 is used, meaning that, on average, half of the activations are dropped during training.
Dropout introduces a form of noise or uncertainty into the network by randomly deactivating neurons, which helps prevent the network from depending excessively on particular features or co-adapting to particular patterns in the input. By lowering the interdependencies between neurons, this regularisation strategy encourages the network to develop more reliable and generalizable representations [14].
When examining social elements associated with suicidal ideation, the dataset may contain subtle patterns that are critical for prediction. Dropout keeps the model from becoming highly specialized to the training input by randomly deactivating a subset of neurons in each training iteration. This pushes the model to acquire redundant and more generalized features that remain effective on fresh, previously unseen data, making it more robust in real-world applications where responses can vary greatly. Given the sensitive nature of the data and the possibility of overfitting due to complicated sociological inputs, dropout contributes to the model’s accuracy and reliability when applied to various teenage populations. Dropout improves the model’s predictive accuracy and reliability by preventing it from memorizing precise features of the training data, which is critical for effectively detecting suicidal ideation.
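The mechanism described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the helper name `dropout`, the rescaling of the surviving activations by 1/(1 - rate) (so-called inverted dropout), and the fixed random seed are assumptions made for the example.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout on a list of activations (illustrative helper)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time every activation passes through unchanged.
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # Keep each unit with probability keep_prob and rescale the survivors
    # so that the expected activation matches the no-dropout case.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


hidden = [1.0] * 1000                      # hypothetical hidden activations
dropped = dropout(hidden, rate=0.5, rng=random.Random(0))
n_zeroed = sum(1 for v in dropped if v == 0.0)
print(n_zeroed, "of", len(hidden), "units were deactivated")
```

With a rate of 0.5, roughly half of the units are zeroed on each call and the remaining ones are doubled, so downstream layers see the same expected input during training and inference.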
Long short-term memory
LSTM has advantages over conventional RNNs: it is more robust and can capture dependencies over long time spans. This is accomplished by incorporating a memory cell that regulates the flow of information through the network.
Our LSTM component consists of a single layer with 100 LSTM units. Each LSTM unit has four gates, each carrying out a separate computation. The input sequences used for these computations are represented by $d$-dimensional word embedding vectors ($x_t$). Furthermore, the parameter $H$ denotes the number of hidden-layer nodes in the LSTM structure.
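For reference, the computations of a standard LSTM cell can be written as follows. This is the common textbook formulation, assumed here rather than taken from the authors' implementation. The weight matrices $W_\ast \in \mathbb{R}^{H \times d}$ and $U_\ast \in \mathbb{R}^{H \times H}$, the biases $b_\ast \in \mathbb{R}^{H}$, and the names $i_t$, $f_t$, $o_t$, $\tilde{c}_t$ are notation introduced for this sketch; $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$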
In the equations mentioned above, the symbol
Originally designed for image recognition tasks, the convolutional layer is a key component of Convolutional Neural Networks (CNNs) known for their strong performance. However, in recent years, CNNs have proven to be highly versatile models that can achieve remarkable results across various text classification tasks. Convolutional neural networks (CNNs) exhibit the ability to identify and learn patterns that may be challenging for traditional feed-forward networks when applied to well-organized and structured text data. CNNs excel in capturing nuanced distinctions, such as the different sentiments conveyed by the word “down” in phrases like “down to earth” and “feeling down.” Additionally, CNNs can extract features from any position within a sentence without being constrained by their specific location. Unlike cyclic connections in other types of neural networks, CNNs operate with individual neurons representing regions within input samples, such as image segments or text fragments [13].
The convolutional filter $F \in \mathbb{R}^{j \times k}$ covers a window of $j$ words, where $k$ is the dimension of the word embedding vector. At each time step $t$, the convolutional filter is applied to the current window of words.
Here, $F$ and $b$ are the parameters of a single filter, with $b$ acting as the bias term. Applying this filter produces a feature map.
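Under the usual formulation of a one-dimensional text convolution, which is assumed here, the feature produced at step $t$ and the resulting feature map can be written as below. The window $x_{t:t+j-1}$ of $j$ consecutive embeddings, the sequence length $n$, and the nonlinearity $f$ (for example ReLU) are notation introduced for this sketch.

$$
c_t = f\left(F \cdot x_{t:t+j-1} + b\right), \qquad c = [c_1, c_2, \dots, c_{n-j+1}]
$$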
In this experiment, we use multiple convolutional filters with different initializations. These filters extract multiple feature maps from the text.
The pooling layer plays a major role in reducing the dimensionality of the rectified feature maps while preserving valuable information. Its primary objective is to combine and consolidate input representations so that they are easier to handle. This effectively lowers the number of parameters and computations in the network, which helps to avoid overfitting [16].
Pooling layers help the model focus on the essential sociological elements that may predict suicidal thoughts by summarizing the most significant features. This selective preservation of features improves the model’s accuracy by emphasizing the most important parts of the input data. Pooling layers also improve the model’s robustness by introducing spatial invariance. This means that the model can cope better with changes and distortions in the input data, such as differences in language or emphasis in questionnaire responses. This resilience is crucial for ensuring that the model works consistently across diverse teenage groups and sociological circumstances.
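The convolution and pooling steps described above can be illustrated with a small pure-Python sketch. It is a toy example under stated assumptions, not the model used in the study: the function names, the ReLU nonlinearity, the pooling window of 2, and the hand-picked embeddings and filter values are all hypothetical.

```python
def conv1d_feature_map(x, F, b):
    """Slide a j x k filter F over n embedding vectors x (each of length k)."""
    j = len(F)
    features = []
    for t in range(len(x) - j + 1):
        # Sum of element-wise products between the filter and x[t : t + j].
        total = b
        for row in range(j):
            total += sum(w * v for w, v in zip(F[row], x[t + row]))
        features.append(max(0.0, total))  # ReLU (assumed nonlinearity)
    return features


def max_pool1d(features, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]


# Five hypothetical 2-dimensional embeddings and one 2 x 2 filter.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
F = [[1.0, 0.0], [0.0, 1.0]]
b = -1.0

feature_map = conv1d_feature_map(embeddings, F, b)
pooled = max_pool1d(feature_map, size=2)
print("feature map:", feature_map)
print("after max pooling:", pooled)
```

The pooled output is half the length of the feature map but still retains the strongest response in each window, which is the dimensionality reduction the text refers to.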
Flatten layer
The flatten layer of the CNN passes a column vector to the neural network as input for the classification task. A reshape function is applied to the pooled feature maps to flatten them into a single consolidated feature vector [17].
Sociological elements from questionnaires are frequently treated as text data, which can take advantage of CNNs’ local feature extraction capabilities. After convolution and pooling, the flatten layer converts these 2D feature maps into a format that LSTM layers can analyse sequentially. This is critical for capturing temporal dependencies as well as the progression of thoughts and feelings over time, both of which are required to understand patterns in adolescent suicide ideation.
Output layer
The fundamental objective of the output, or fully connected, layer is to determine the probability that a text is suicidal or non-suicidal. This layer takes as its input the text feature vector created by the convolutional and pooling layers. Appropriate activation functions are used to address exploding or vanishing gradients. Trained on the labelled dataset, these activation functions are crucial in deciding the final classification outcome. In our experiment, the output layer uses the softmax activation function [18].
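As a brief illustration of how softmax turns the output layer's raw scores into class probabilities, consider the sketch below. The two logits are hypothetical values chosen for the example, and the convention that index 0 means non-suicidal and index 1 means suicidal is an assumption that mirrors the target encoding described earlier.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)                       # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.0, 2.0]                       # hypothetical scores: [class 0, class 1]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print("probabilities:", probs, "predicted class:", predicted)
```

For two classes, the softmax probability of class 1 equals the sigmoid of the difference between the two logits, which is why either activation can serve a binary classifier.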
Algorithm
Import the necessary packages: these cover pre-processing the data, setting up the system, organising data, and handling directories.
The dataset consists of 3075 samples, of which 1497 have factors of suicidal tendency and the remaining 1578 are non-suicidal, stored under the class labels ‘0’ and ‘1’. It was collected from Bangalore Urban schools and colleges.
The data are processed in three phases: training, validation, and testing. 70% of the data is used for training, 15% for validation, and another 15% for testing. The shapes of x_train, x_val, and x_test are then inspected and displayed.
The dataset contains sociological features along with demographic features, including Name, Age, Class, Gender, Relationship, Media Exposure, Physical abuse, Sexual abuse, Exam failure, Academic performance, Academic stress/pressure, Freedom to move, Expression of opinion, Communication with parents, Communication with friends, Confront wrong acts, Discussion about relationship, Death of loved ones, Family problems, Parental Divorce/Separated, Family illness, Parental abuse, Relationship problems, Peer pressure, Lack of parental guidance, Parental pressure.
The model is planned with (input layer
The models are trained for 120 epochs with a batch size of 64, and the validation accuracy is recorded for every model that is created.
The activation functions used in this model are ReLU and softmax, the optimizer is Adam, and the loss function is binary cross-entropy (binary_crossentropy).
The details of the trained network are stored in a history object, which records the loss, accuracy, and F1 score for each training, validation, and test batch. Plots of the accuracy and loss are displayed.
The optimal model is selected based on the highest validation accuracy across all age groups, and the metrics are evaluated for the chosen hyperparameters.
Finally, the accuracy, precision, and F1 score derived from the confusion matrix are reported for the test and validation datasets.
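The 70/15/15 partition mentioned in the steps above could be produced along the following lines. This is a minimal sketch using only the standard library: the function name, the fixed seed, and the choice to split shuffled row indices (rather than, say, a stratified split) are assumptions, since the source does not describe how the partition was drawn.

```python
import random


def train_val_test_split(n_samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Return three disjoint lists of row indices (train, validation, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_val = round(n_samples * val_frac)
    n_test = round(n_samples * test_frac)
    n_train = n_samples - n_val - n_test   # remainder goes to training
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = train_val_test_split(3075)
print(len(train_idx), len(val_idx), len(test_idx))
```

The returned index lists can then be used to select the corresponding rows of the feature matrix and target vector, giving the x_train, x_val, and x_test arrays referred to above.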
Assessment metrics
We assess the performance of our proposed classification technique using key metrics such as accuracy (Acc.) and F1-score (F1). The recall (R) and precision (P) of each test are derived from a confusion matrix and used to compute the F1-score, which is a suitable summary metric.
By taking into account both recall (the proportion of actual positive examples that are correctly identified) and precision (the proportion of predicted positive examples that are actually positive), the F1-score offers a comprehensive assessment. A higher F1-score indicates a better balance between recall and precision. TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives, and false negatives in the confusion matrix [19]. The most straightforward evaluation statistic is accuracy, which has the following definition:
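In terms of the confusion-matrix counts, accuracy takes its standard form, given first below; precision, recall, and the F1-score are written in the same notation. These are the usual textbook definitions, assumed to match the ones the authors used.

$$
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2PR}{P + R}
$$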
A 70%-30% split was used to divide the 3075 samples into training and testing data. The following choices were made during the compilation of the model:
Optimizer: Adam was chosen as the optimizer for this work partly due to its bias-correction mechanism, which improves optimisation, particularly in the later training stages when gradients tend to become sparse. Faster convergence is also facilitated by this optimizer selection [20].
Loss function: The sparse categorical cross-entropy loss function was selected for this task. This loss function is applicable when there are multiple label classes expressed as integers.
Metrics: Accuracy was chosen as the primary metric for estimating the model’s performance. This metric is widely used and allows for comparison with results from existing literature. Additionally, precision, recall, and F1-score were also evaluated for a comprehensive analysis.
Activation functions: Softmax and ReLU are two of the activation functions employed in the model. ReLU activation, which is especially advantageous for convolutional layers, was used to speed up training and address the vanishing gradient problem. The sigmoid function is suitable for binary classification [21].
Number of epochs: After examining a number of relevant models, 120 epochs were found to be the right number for training the model, with a batch size of 64.
Charts displaying the loss and accuracy of the training across 120 epochs.
For this research, we utilized a custom-developed questionnaire dataset obtained from survey forms. Specifically, the dataset consists of 3075 samples, each associated with 24 different characteristics or features. Figure 4 depicts the raw data. Figure 5 shows the statistical analysis of the data.
Correlation between suicide and the parameters
Figures 6 and 7 show the correlation between suicide and the parameters used for identification. Heatmaps are particularly useful for visually identifying patterns or relationships in large datasets. They allow for quick identification of high or low values, clusters, and gradients within the data.
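Each cell of such a heatmap is a correlation coefficient between two columns. The sketch below computes a Pearson correlation between one binary feature and the target; the coefficient type is an assumption, since the source does not state which one was used, and the column name `exam_failure` and its six values are made-up illustrative data, not records from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy columns (1 = yes / suicidal, 0 = no / non-suicidal).
exam_failure = [1, 0, 1, 1, 0, 0]
target = [1, 0, 1, 0, 0, 0]
r = pearson(exam_failure, target)
print(f"correlation with target: {r:.3f}")
```

Repeating this computation for every pair of columns yields the matrix that a heatmap visualizes.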
Performance metrics
Figure 4. Raw data.
Figure 5. Statistical analysis of the data.
Figure 6. Correlation of features with suicide.
Figure 7. Heat map representation of features.
Table 1 shows the performance metrics of the designed CNN-LSTM model in terms of F1 score, Cohen’s kappa, accuracy, ROC AUC, precision, and recall.
Confusion matrix of suicide prediction.
The model’s performance can be further understood by looking at the confusion matrix, which also reveals the types of errors the model produces (false positives and false negatives) as well as how well it predicts (true positives and true negatives). These numbers can be used to calculate a variety of evaluation metrics, including recall, precision, accuracy, and the F1 score. A more comprehensive and in-depth view of the model’s predictions is provided by the confusion matrix.
True negatives: 422 negative (class 0) instances were correctly predicted by the model.
False positives: 3 instances were wrongly classified by the model as positive (class 1) despite being negative.
False negatives: 8 instances were positive but were mispredicted by the model as negative (class 0).
True positives: 490 positive (class 1) instances were correctly predicted by the model.
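The four counts listed above are enough to recompute the usual evaluation metrics. The sketch below does so with the standard binary (positive-class) definitions; the function name is hypothetical, and the Cohen's kappa formula is the standard two-class form, assumed rather than taken from the authors' code.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Counts quoted in the text for the confusion matrix.
metrics = classification_metrics(tp=490, fp=3, tn=422, fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

These values depend only on the four counts quoted above, so they need not coincide with tabulated figures obtained with a different averaging scheme or data split.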
ROC curve: This curve plots the true positive rate against the false positive rate, showing how well a binary classification model performs across different classification thresholds.
ROC-AUC curve.
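The area under the ROC curve can be computed without drawing the curve, because it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise formulation; the function name and the four example scores are hypothetical and do not come from the study's predictions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]                 # hypothetical ground truth
scores = [0.1, 0.4, 0.35, 0.8]        # hypothetical predicted probabilities
auc = roc_auc(labels, scores)
print(f"ROC AUC: {auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes; the pairwise loop is quadratic in the number of samples, which is acceptable for an illustration of this size.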
Table 2 displays the evaluation metric values obtained for the various baseline models. Compared with the other baseline models, the combined CNN-LSTM model produces the best classification results in our experiment. It performed noticeably better than the other models, reaching an accuracy of 97.8% with the improved parameters.
Performance metrics of the classification model
We also recognize that accuracy is not a suitable metric when the dataset is unbalanced. We therefore additionally assessed the models using F1-score, precision, and recall. The proposed model showed 96.3% precision, 95.3% recall, and 97.8% F1-score. These findings show that the proposed combined model outperforms the individual LSTM and CNN classifiers. Macro-averaging is used to determine the accuracy, recall, and precision of the proposed model. We note that the proposed model’s recall and precision values are higher than its accuracy. This may be caused by the high recall values for the severe-risk and no-risk classes, the relatively low recall values for the low-risk and moderate-risk classes, and the high precision values for the severe-risk and moderate-risk classes. The high recall and precision of these classes help to sustain high overall recall and precision, despite their lower accuracy.
Our research introduces and tests a method for assessing the mental health of teenagers. By identifying indications of suicidal intent in questionnaire responses, our methodology aims to reduce the increasing suicide rate in society. By identifying individuals at risk, the model aims to facilitate timely intervention and appropriate medical assistance for those in need. Furthermore, based on the outcomes derived from the model, educational programs and awareness sessions can be offered on coping with mental health challenges such as stress and anxiety. People today are increasingly sensitive to everyday events and hardships, which can lead to devastating consequences for families, friends, and the wider community. Our research therefore endeavours to develop a model that effectively detects suicidal ideation and contributes to reducing the incidence of suicide. Using the LSTM-CNN combination model, we examined the submissions and paid particular attention to the parts of the data that indicated whether a submission showed suicidal tendencies. Our proposed model achieved an F1-score of 97.8%, outperforming the baseline approaches in every aspect of performance. The model’s input data was based on questionnaires collected from students, focusing solely on sociological factors.
However, future studies could enhance the model by incorporating both biological and psychological factors with different attention mechanisms, particularly the Hierarchical Attention Networks (HAN).
Ethical approval
My research guide reviewed and ethically approved this manuscript for publishing in this Journal.
Competing interests
The authors declare that there is no conflict of interest.
Research funding
Not applicable.
Human and animal rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed consent
I certify that I have explained the nature and purpose of this study to the above-named individual, and I have discussed the potential benefits of this study participation. The questions the individual had about this study have been answered, and we will always be available to address future questions.
Acknowledgments
I express my gratitude to my respected supervisor and head of the department for their guidance.
