Analytical framework for mental health feature extraction methods in social networks

Abstract

With the development of internet technology, new kinds of social relations and interactions have formed in social networks. Through these networks, users share many types of content, including personal information, text, images, video, music, and poems, which express their mental states, emotions, feelings, and thoughts. A new and essential aspect of human life is thus taking shape in the virtual space of social networks, and it must be explored from several viewpoints, including mental disorders. Analyzing mental disorders from social network data can suggest new approaches to improving public health. To this end, developing mental health feature extraction (MHFE) methods for social networks is essential and is becoming an active research area. This paper therefore reviews existing MHFE techniques and methods and provides a comprehensive framework to classify them. Furthermore, to analyze and evaluate each approach, an appropriate set of functional criteria is proposed, which leads to a more accurate understanding and correct use of these methods.

Keywords

Feature extraction, mental health, mental disorders, social network

1. Introduction

Mental health (MH) is the level of psychological well-being or an absence of mental illness. It is the state of someone who is “functioning at a satisfactory level of emotional and behavioral adjustment”. From the perspective of positive psychology or holism, mental health may include an individual’s ability to enjoy life and create a balance between life activities and efforts to achieve psychological resilience [1].

Currently, symptoms of mental disorders such as depression are monitored by physicians in the office through self-assessment and interview-based methods. However, the shortage of mental health professionals and the limited resources available to physicians preclude close monitoring of symptoms, which delays optimal treatment and consequently prolongs patients’ suffering. Passive recording of behavioral data is recognized as a potentially practical method for long-term monitoring of mental disorders. A combination of social network analysis and machine learning can provide an accurate and instantaneous measurement of the details of behavior and of changes in mental disorders.

Social network analysis and especially the use of machine learning techniques can guide us to gain new approaches to improve the public health of the whole society by interventions that can be in the form of advertising, information links, online advice, or cognitive behavioral therapy. For example, Facebook intends to help users with suicidal risk in real-time [2].

Mental disorders are conditions associated with a change in thinking, mood, or behavior that causes distress or impaired functioning. Users with mental illness will disclose their symptoms on social networks or through membership in online forums, and they can be detected by monitoring user activities and discovering patterns in their online activities [3]. The process of exploring social networks is a subset of a large set of activities called Social Network Analysis (SNA), which, according to the structure and characteristics of this environment, is divided into two groups: general structural analysis and content analysis. The first group focuses on social networks from the point of view of links and structure, and the second group focuses on the analysis of texts, images, and other possible content types [3, 4, 5].

Facebook, with about 1.7 billion active users per month, and Twitter, with about 310 million active accounts, are two of the largest social networks. Each of these social networks has produced massive data that can be used to explore significant patterns in user behavior [6]. Up to now, multiple measures have been considered to analyze social network users’ mental health, including activity, social capital, emotion, and linguistic style in participants’ Facebook data in prenatal and postnatal periods [7]. Alongside the substantial body of work on mental health analysis in social networks, feature extraction is now becoming an active research area.

Mental health feature extraction (MHFE) is the process of finding the best features from a variety of existing features to analyze the mental health of individuals. This process is important because it helps to automate MH investigations and to apply artificial intelligence methods in this field. Furthermore, extending and improving existing MHFE methods leads to better MH analysis. Although several MHFE techniques have been developed up to now, the lack of a survey is felt by researchers entering the field, and a new taxonomy is needed, which motivates this paper. In addition, using each MHFE method requires appropriate criteria to understand its role in the functionality of extraction methods.

In this paper, a review of existing techniques and methods in MHFE is presented, and a comprehensive framework is provided to classify these approaches. Furthermore, to analyze and evaluate each approach, appropriate functional criteria are proposed, which lead to a more accurate understanding and correct use of these methods. The main contributions of this paper are therefore twofold. First, a new taxonomy and classification of MHFE methods is presented. Second, appropriate functional criteria are proposed to analyze and evaluate each approach.

The rest of the paper is organized as follows. The second section reviews related work. The third section describes the proposed MHFE framework. The fourth section describes the functional criteria for studying and analyzing the role of each feature. Finally, the last section presents the conclusion and highlights future work.

2. Related works

There is much activity in the field of feature extraction in social networks, and many researchers have presented review papers. For example, Chandrashekar and Sahin have presented a survey on feature extraction methods [8]. Molina et al. have focused on the same topic and have also presented an experimental evaluation of feature extraction methods [9]. Xue et al. have presented a survey on evolutionary computation methods for feature extraction [10]. Diaz-Chito et al. have presented an overview of incremental feature extraction methods based on linear subspaces [11]. Yıldız has focused on feature extraction in discrete space [12]. AlNuaimi et al. have presented a survey on streaming feature extraction algorithms for big data [13]. But to the best of our knowledge, there is no survey in the field of mental health feature extraction (MHFE), and the current paper is the first attempt to present an overall review of MHFE approaches.

However, to analyze mental health in the context of social networking and deal with its challenges, some articles have attempted to provide a framework for improving efficiency by combining relevant resources. Wongkoblap et al. have presented a systematic review to determine the scope and limits of the machine learning techniques that researchers have used for prediction and to review related issues [6]. Guntuku et al. have studied mental illness in social networks: users with mental illness disclose their symptoms on Twitter or through membership in online forums, and they can be detected by monitoring users and discovering patterns in their language and online activities [3]. Choi et al. have presented important features for identifying the social support needs of users based on the knowledge gathered from survey data. This study has also provided guidelines for a technical framework that can be used to predict the social support needs of users based on raw data collected from online health social networks (OHSNs) [14]. Using Twitter posts, Choudhury et al. have quantified postpartum changes in 376 mothers along dimensions of social engagement, emotion, social network, and linguistic style [5]. Choudhury et al. have also considered multiple measures including activity, social capital, emotion, and linguistic style in participants’ Facebook data in pre- and postnatal periods. This study includes detecting and predicting the onset of postpartum depression (PPD) [7]. Park et al. have focused on Facebook to discern any correlations between the platform’s features and users’ depressive symptoms [15]. Hu et al. have built both classification and regression models based on linguistic and behavioral features acquired from 10,102 social media users and showed that users’ depression could be predicted via social media [16]. Tsugawa et al. have extensively evaluated the effectiveness of using a user’s social media activities for estimating the degree of depression [17]. Liu and Zhu have utilized a deep learning algorithm to build a feature learning model for personality prediction, which performs unsupervised extraction of the Linguistic Representation Feature Vector (LRFV) from text actively published on the Sina microblog [18]. Pedersen has screened Twitter users for depression and PTSD with lexical decision lists [19]. Guan et al. have identified Chinese microblog users with high suicide probability using internet-based profiles and linguistic features [20]. Liu et al. have proposed to detect suicide risk on social media using a Chinese suicide dictionary [21]. Wu et al. have analyzed Facebook status updates to determine the extent to which users’ emotional expression predicted their subjective well-being (SWB), specifically their self-reported satisfaction with life [22].

Building on the above studies, the following sections of this paper present a comprehensive framework for MHFE methods and propose appropriate functional criteria to analyze and evaluate each approach.

3. Proposed MHFE framework

There are many techniques for extracting features that could be used to predict the mental health issues of social network users. In this paper, a comprehensive framework for feature set collection is presented based on a set of features that, according to the studies reviewed, fall into four main groups: linguistic style, contextual, structural, and user features. Each of these classes has several sub-classes, which are separated based on finer details. In the following, the proposed framework is described class by class, together with the sub-classes of each. Figure 1 shows the proposed framework for MHFE.

Figure 1.

The proposed framework for MHFE.

3.1 Linguistic style

Many of the studied articles have analyzed the content of social networks [18, 20, 22, 23], and each considered a specific language for analysis. Most of these analyses are done in English and Japanese, and some in Spanish and Korean. Different methods have been used to analyze linguistic features, and they are discussed below. These methods search social network data for relationships between word usage and mental health features.

3.1.1 Count vector

Consider a corpus C consisting of D documents $\left\{{d_{1},d_{2},\ldots,d_{D}}\right\}$ and the N unique tokens extracted from C. The count matrix M then has size D $\times$ N. Row $i$ of M contains the frequency of each token in document $d_{i}$ .

There are several variations in how the matrix M is prepared. They generally concern two choices: the way the dictionary is prepared and the way the count is taken for each word.

The way the dictionary is prepared. In real-world applications, a corpus may contain millions of documents, from which hundreds of millions of unique words can be extracted. The resulting matrix would be very sparse and inefficient for any computation. An alternative to using every unique word as a dictionary element is to pick, say, the top 10,000 words by frequency and then prepare the dictionary from them.

The way the count is taken for each word. The entry in the count matrix M may be either the frequency (the number of times a word has appeared in the document) or the presence (whether the word has appeared in the document). Generally, the frequency method is preferred.

Figure 2 is a representational image of the matrix M for easy understanding.

Figure 2.

Matrix M in the count vector method.
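As a minimal sketch of the count vector method, the following Python function builds the matrix M from a list of documents. The function name `count_matrix`, the whitespace tokenizer, and the parameters `max_vocab` and `binary` are assumptions made for illustration; they are not taken from the reviewed works.

```python
from collections import Counter

def count_matrix(documents, max_vocab=None, binary=False):
    """Build a D x N count matrix from a list of document strings.

    max_vocab: keep only the most frequent tokens (None keeps all).
    binary:    store presence (0/1) instead of frequency.
    """
    # Assumption: lower-casing and whitespace splitting stand in for a real tokenizer.
    tokenized = [doc.lower().split() for doc in documents]
    corpus_freq = Counter(tok for doc in tokenized for tok in doc)
    vocab = [tok for tok, _ in corpus_freq.most_common(max_vocab)]
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        if binary:
            rows.append([1 if counts[tok] > 0 else 0 for tok in vocab])
        else:
            rows.append([counts[tok] for tok in vocab])
    return vocab, rows

docs = ["he is lazy", "he is not lazy he is smart"]
vocab, M = count_matrix(docs)
```

Each row of `M` corresponds to one document and each column to one dictionary token; passing `max_vocab=10000` would restrict the dictionary to the most frequent words, as discussed above.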

3.1.2 Total emotion count (TEC)

This feature captures the number of words in a document that associate with emotion. Given document d, its corresponding feature vector is denoted by $d_{\textit{TEC}}$ . The feature value for the $j^{\text{th}}$ emotion is computed as follows:

$\displaystyle d_{\textit{TEC}}(e_{j})=\sum\limits_{w\in d}I\left(e_{j}=\mathop{\arg\max}\limits_{k}\textit{Lex}(w,k)\right)\times\textit{count}(w,d)$ (1)

where $I(\cdot)$ is an indicator function that equals 1 when its argument is true and 0 otherwise, $\textit{Lex}(w,k)$ is the score that the lexicon assigns to word $w$ for emotion $k$ , and $\textit{count}(w,d)$ is the number of occurrences of word $w$ in document $d$ . Note that TEC only captures the popular emotion context of a word suggested by the lexicon (i.e., the emotion with the highest score in the lexicon). However, not all words associate with just a single emotion. For example, even if the word beautiful is associated moderately with both of the emotions joy and love, the TEC feature forces the word to contribute a count of 1 towards one of these emotions (depending on the scores from the lexicon Lex) and 0 towards the other. Therefore, it is important to develop features that incorporate the relations between a word and multiple emotions.

Figure 3.

Co-occurrence method with constant window [23].

Figure 4.

An example of the co-occurrence matrix.

3.1.3 Total emotion intensity (TEI)

This is the sum of the emotion intensity scores of words present in a document. Unlike the coarse integer counts in TEC features, here, word-level emotion intensity scores offered by a domain-specific emotion lexicon (DSEL) are used to capture the emotional orientation of documents along with multiple emotion concepts (classes). Accordingly, the $d_{\textit{TEI}}$ term is the feature vector corresponding to the document $d$ . The feature value for the $j^{\text{th}}$ emotion is computed as follows:

$\displaystyle d_{\textit{TEI}}(e_{j})=\sum\limits_{w\in d}\textit{Lex}(w,e_{j})\times\textit{count}(w,d)$ (2)
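To make the difference between TEC and TEI concrete, the following sketch computes both feature vectors for a tiny document. The lexicon `LEX`, its scores, and the emotion list are hypothetical values chosen only for illustration; a real DSEL would be learned from data.

```python
from collections import Counter

# Hypothetical toy lexicon: LEX[word][emotion] is the intensity score Lex(w, e).
LEX = {
    "beautiful": {"joy": 0.6, "love": 0.5, "sadness": 0.0},
    "cry": {"joy": 0.1, "love": 0.0, "sadness": 0.8},
}
EMOTIONS = ["joy", "love", "sadness"]

def tec(tokens):
    """Total emotion count: each word votes only for its top-scoring emotion."""
    feats = {e: 0 for e in EMOTIONS}
    for w, c in Counter(tokens).items():
        if w in LEX:
            top = max(EMOTIONS, key=lambda e: LEX[w][e])
            feats[top] += c
    return feats

def tei(tokens):
    """Total emotion intensity: each word contributes its score to every emotion."""
    counts = Counter(tokens)
    return {
        e: sum(LEX[w][e] * c for w, c in counts.items() if w in LEX)
        for e in EMOTIONS
    }

doc = "beautiful beautiful cry".split()
tec_features = tec(doc)
tei_features = tei(doc)
```

In this toy case the word beautiful adds nothing to love under TEC, while under TEI it contributes its love score for each occurrence, which is the behavior the two definitions are meant to contrast.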

3.1.4 Max emotion intensity (MEI)

Given a document d, and its corresponding feature vector $d_{\textit{MEI}}$ , the feature value for the $j^{\text{th}}$ emotion is computed as follows:

$\displaystyle d_{\textit{MEI}}(e_{j})=\max\limits_{w\in d}\textit{Lex}(w,j)$ (3)

3.1.5 Graded emotion count (GEC)

Both TEC and TEI consider all the words in a document regardless of the intensity with which they convey emotion. However, it is useful to understand the impact of high-intensity words on emotion classification. GEC is similar in principle to TEC, except that it counts only the words in a document whose association with the emotion reaches a threshold value. The GEC features are extracted using the DSELs at three threshold values. Given a document d, and its corresponding feature vector $d_{\textit{GEC}}$ , the feature value for the $j^{\text{th}}$ emotion is computed as follows:

$\displaystyle d_{\textit{GEC}}(e_{j})=\sum\limits_{\begin{subarray}{c}w\in d\\ \textit{Lex}(w,j)\geqslant\delta\end{subarray}}I\left(e_{j}=\mathop{\arg\max}\limits_{k}\textit{Lex}(w,k)\right)\times\textit{count}(w,d)$ (4)

where $\delta$ is an experimentally chosen threshold.

3.1.6 Graded emotion intensity (GEI)

Given a document d, and its corresponding feature vector $d_{\textit{GEI}}$ , the feature value for the $j^{\textit{th}}$ emotion is computed as follows:

$\displaystyle d_{\textit{GEI}}(e_{j})=\sum\limits_{\begin{subarray}{c}w\in d\\ \textit{Lex}(w,j)\geqslant\delta\end{subarray}}\textit{Lex}(w,j)\times\textit{count}(w,d)$ (5)
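The remaining lexicon-based features can be sketched in the same way. The code below assumes that MEI takes the largest score of any word in the document for each emotion, and that GEC and GEI apply the threshold $\delta$ to the score of the emotion being computed. The lexicon `LEX` and the threshold values are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon: LEX[word][emotion] is the intensity score Lex(w, e).
LEX = {
    "beautiful": {"joy": 0.6, "love": 0.5, "sadness": 0.0},
    "cry": {"joy": 0.1, "love": 0.0, "sadness": 0.8},
}
EMOTIONS = ["joy", "love", "sadness"]

def mei(tokens):
    """Max emotion intensity: highest score of any document word per emotion."""
    return {
        e: max((LEX[w][e] for w in tokens if w in LEX), default=0.0)
        for e in EMOTIONS
    }

def gec(tokens, delta):
    """Graded emotion count: TEC restricted to words scoring at least delta."""
    feats = {e: 0 for e in EMOTIONS}
    for w, c in Counter(tokens).items():
        if w in LEX:
            top = max(EMOTIONS, key=lambda e: LEX[w][e])
            if LEX[w][top] >= delta:
                feats[top] += c
    return feats

def gei(tokens, delta):
    """Graded emotion intensity: TEI restricted to scores of at least delta."""
    counts = Counter(tokens)
    return {
        e: sum(LEX[w][e] * c for w, c in counts.items()
               if w in LEX and LEX[w][e] >= delta)
        for e in EMOTIONS
    }

doc = "beautiful beautiful cry".split()
mei_features = mei(doc)
gec_features = gec(doc, delta=0.55)
gei_features = gei(doc, delta=0.55)
```

Raising `delta` removes moderately emotional words from the graded features, which illustrates the loss of coverage that comes with keeping only high-intensity words.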

3.1.7 TF-IDF

Another frequency-based method that counts differently is TF-IDF, which considers the occurrence of a word not only in a single document but also across the whole corpus. Ideally, we want to reduce the weight of words that appear in most documents and increase the importance of words that appear in only a subset of the documents.

$\displaystyle\textit{TF}=\frac{\#\text{ times the word }t\text{ appears in a document}}{\#\text{ words in that document}}$ (6)

$\displaystyle\textit{IDF}=\log\left(\frac{\#\text{ documents}}{\#\text{ documents that the word }t\text{ appears in}}\right)$ (7)

If a word appears in all the documents, its association with any specific document is likely to be low. But if it appears in only a subset of the documents, the word is probably associated with the documents in which it appears. The TF-IDF weight of a word in a document is the product of the two quantities, $\textit{TF}\times\textit{IDF}$ .
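A small sketch of the computation follows, using the common convention that IDF is the logarithm of the number of documents divided by the number of documents containing the word, and that the TF-IDF weight is the product of TF and IDF. The function names and the toy corpus are assumptions for illustration.

```python
import math

def tf(term, doc_tokens):
    """Term frequency: occurrences of term divided by document length."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """Inverse document frequency: log(N / df); 0.0 if the term never occurs."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

corpus = [
    "he is lazy".split(),
    "he is smart".split(),
    "he is not lazy".split(),
]
weight_he = tf_idf("he", corpus[1], corpus)
weight_smart = tf_idf("smart", corpus[1], corpus)
```

Here the word he occurs in every document and receives weight zero, while smart occurs in only one document and receives a positive weight.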

Figure 5.

Individual Behavioral Patterns when Selecting UserNames [26].

3.1.8 Co-occurrence

The main idea of co-occurrence is that similar words tend to appear in the same context; for example, apple and mango both tend to appear in texts about fruit. For a given text, the co-occurrence of a pair of words w1 and w2 is the number of times they appear together within a context window. The context window is specified by a size and a direction. Figure 3 shows an example of a context window of size 2 (around) [22].

Words with a dashed box are a 2 (around) context window for the word ‘Fox’ and for calculating the co-occurrence, only these words will be counted. Now, let us take an example corpus to calculate a co-occurrence matrix.

Corpus $=$ He is not lazy. He is intelligent. He is smart. Figure 4 shows the co-occurrence matrix for this example.

Two cells of the matrix in Figure 4 are highlighted, one with a solid box and one with a dashed box. The solid box holds the number of times ‘He’ and ‘is’ have appeared together within a context window of size 2, and it can be seen that the count turns out to be 4.
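The count for the example corpus can be reproduced with a short sketch. It assumes a symmetric window of size 2, ignores sentence boundaries, and treats word pairs as unordered; the function name `cooccurrence` is hypothetical.

```python
from collections import Counter

def cooccurrence(tokens, window=2):
    """Count unordered word pairs that appear within `window` positions."""
    counts = Counter()
    for i, w in enumerate(tokens):
        # Looking only to the right visits every pair of positions once.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            pair = tuple(sorted((w, tokens[j])))
            counts[pair] += 1
    return counts

tokens = "He is not lazy He is intelligent He is smart".split()
counts = cooccurrence(tokens, window=2)
he_is = counts[("He", "is")]
```

For this corpus the pair ('He', 'is') is counted three times as adjacent words and once across the gap in "is intelligent He", which gives the value 4 stated above.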

3.1.9 n-grams

The n-gram model examines ‘n’ contiguous words or sounds from a given sequence of text or speech. This model helps to predict the next item in a sequence. A unigram is an n-gram of size 1, a bigram is an n-gram of size 2, and a trigram is an n-gram of size 3. Higher-order n-grams are called four-grams, five-grams, and so on [24]. N-grams can reflect semantics that cannot be captured by looking at words individually.
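Extracting word n-grams from a token sequence takes only a few lines. The following is a minimal sketch; the function name `ngrams` and the whitespace tokenization are assumptions for illustration.

```python
def ngrams(tokens, n):
    """Return all runs of n contiguous tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "he is not lazy".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

The bigram ('not', 'lazy') shows why n-grams carry information that single words miss: the unigram lazy alone would hide the negation.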

3.1.10 Part-of-speech (POS)

Part-of-speech tagging on non-social media data sets is done using the Stanford POS tagger, whilst the Twitter NLP tool from Carnegie Mellon University is used for tagging social media data sets.

3.2 Contextual

Though standard words can convey the emotional intention of the author, additional expressions such as punctuation marks and emoticons are often used on social media to express emotions. Furthermore, sentiment-bearing words can indicate the emotion in the text and also alter its orientation from positive emotion (e.g., joy) to negative emotion (e.g., sadness) or vice versa. The following contextual features have been used in sentiment and emotion classification:

• Capitalized words: This feature counts the number of words in a document written entirely in upper-case characters.
• Elongated words: This feature counts the number of words with characters repeated two, three, or four times.
• Punctuation: Emotions are intensified on social media using exclamation marks and question marks. Two features were included to model the occurrence of question marks and exclamation marks in a document.
• Emoticons: Emoticons are facial expressions captured pictorially and are often used on social media to convey emotions. A binary feature is designed to model the presence/absence of emoticons in a document. The emoticon list is adopted from earlier work in emotion classification.
• Negation: Although the role of negation has not been extensively studied for emotion classification, it is included following its usefulness for sentiment classification.
• Sentiment: An exhaustive list of positive and negative words is created by merging the aforementioned lexicons to extract sentiment features from the documents.

Figure 6.

An example of FOAF.
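Several of the contextual features listed above can be sketched with simple string operations. The emoticon and negation lists below are hypothetical placeholders, and the elongation rule (a letter repeated at least three times in a row) is one possible reading of the feature, not the definition used in the reviewed works.

```python
import re

# Hypothetical, deliberately small lists for illustration only.
EMOTICONS = {":)", ":(", ":D", ";)", ":-)", ":-("}
NEGATIONS = {"not", "no", "never"}
ELONGATED = re.compile(r"([A-Za-z])\1{2,}")  # same letter three or more times

def contextual_features(text):
    tokens = text.split()
    words = [t.strip(".,!?") for t in tokens]
    return {
        # Words of two or more letters written entirely in upper case.
        "capitalized": sum(
            1 for w in words if len(w) > 1 and w.isalpha() and w.isupper()
        ),
        "elongated": sum(1 for w in words if ELONGATED.search(w)),
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        "has_emoticon": int(any(t in EMOTICONS for t in tokens)),
        "negations": sum(1 for w in words if w.lower() in NEGATIONS),
    }

features = contextual_features("I am NOT sad, I am SO happyyy today!!! Really? :)")
```

The single-letter word "I" is excluded from the capitalized count on purpose, since it is upper case in ordinary English and carries no emphasis.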

3.3 User

3.3.1 Behavior

Analysis of user behaviors and activities on the social network can also help to analyze user mental health.

$\bullet$ User behavior trajectory along a temporal dimension

User behavior trajectory refers to all the social behavior of a user exhibited on the platforms along the timeline, e.g., befriend, follow/unfollow, retweet, thumb-up/thumb-down, etc.

Both empirical and social behavior studies demonstrate that, over a sufficiently long period of time, a user’s social behavior exhibits a surprisingly high level of consistency across different platforms [25].

$\bullet$ UserName selection

Most sites maintain the anonymity of users by allowing them to freely select usernames instead of their real identities, and also different websites employ different user-naming and authentication systems. In terms of information availability, usernames seem to be the minimum common factor available on all social media sites. Usernames are often alphanumeric strings or email addresses, without which users are incapable of joining sites. Selecting the usernames is a behavioral action that may have its behavioral patterns. Figure 5 depicts a summary of these behavioral patterns observed in individuals when selecting usernames [26].

As depicted in this figure, the behavior of username selection is shaped by human limitations, exogenous factors, and endogenous factors. Human limitations comprise time & memory limitations and knowledge limitations. Endogenous factors comprise personal attributes & traits and habits. Time & memory limitations are divided into using the same usernames, username length likelihood, and username uniqueness likelihood. Knowledge limitations are divided into limited vocabulary and limited alphabet. Exogenous factors are divided into typing patterns and language patterns. Personal attributes & traits are divided into personal information and username randomness. Finally, habits are divided into modifying previous usernames, creating similar usernames, and username observation likelihood.

3.3.2 Interaction

Network analysis is an approach that is mainly useful for discovering hidden relationships, communication, and the processes of complex systems through mathematical and graphical techniques. However, given the multiplicity of these methods, the use of SNA methods is complicated and confusing. To support the use of the SNA methodology in health research, [4] offers a categorization of SNA methods. Structural analysis over discrete time intervals describes the network topology, the role of specific nodes, communities, and subgroups in the network, and so on.

On social networks, knowing how many people a user is connected to can help estimate the number of people exposed to mental disorders and discover hidden populations and social locations. Individual differences in personality construct sensitivity to social disconnection [27].

$\bullet$ Tagging

The similarity between users, taking into account the tags they share, can be calculated from tag weights defined as follows [28]:

$\displaystyle w_{ut}=\sum\limits_{\begin{subarray}{c}i\in I\\ t\in M_{ui}\end{subarray}}\frac{1}{\left|{M_{ui}}\right|}$ (8)

where $w_{ut}$ denotes the weight of tag $t$ labeled by user $u$ , $M_{ui}$ is the tag list that user $u$ gave to item $i$ , and $\left|{M_{ui}}\right|$ is the number of tags in that list. Then the correlation between users and items can be computed by mapping them into the tag space model, with item weights defined as:

$\displaystyle w_{jt}=\sum\limits_{\begin{subarray}{c}u\in U\\ t\in M_{uj}\end{subarray}}\frac{1}{\left|{M_{uj}}\right|}$ (9)

where $w_{jt}$ denotes the weight of tag $t$ of item $j$ , $M_{uj}$ is the tag list that user $u$ gave to item $j$ , and $\left|{M_{uj}}\right|$ is the number of tags in that list.
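The two tag weights can be sketched as a single function that accumulates 1 / (number of tags in the list) for every tag list containing the tag, grouped either by user or by item. The data layout (a dictionary keyed by (user, item) pairs), the function name `tag_weights`, and the sample tags are assumptions for illustration.

```python
from collections import defaultdict

def tag_weights(assignments, by="user"):
    """Accumulate 1/|M| for each tag, keyed by (user, tag) or (item, tag).

    assignments maps (user, item) to the list of tags that the user gave
    to the item; each list is assumed to contain no duplicate tags.
    """
    weights = defaultdict(float)
    for (user, item), tags in assignments.items():
        owner = user if by == "user" else item
        for tag in tags:
            weights[(owner, tag)] += 1.0 / len(tags)
    return dict(weights)

# Hypothetical tag assignments M_ui.
M = {
    ("u1", "i1"): ["sad", "lonely"],
    ("u1", "i2"): ["sad"],
    ("u2", "i1"): ["sad", "tired"],
}
user_weights = tag_weights(M, by="user")
item_weights = tag_weights(M, by="item")
```

A tag that a user applies alone to an item counts fully, while a tag applied alongside others is discounted, so the weight reflects how central the tag is to each labeling act.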

3.3.3 Profile

Features such as age, gender, geographic location, education, and general user profiles can also be used to analyze their mental health. In most articles that have been designed to analyze mental health, especially in social networks, to enhance the accuracy of the model, user profile features are combined with other features.

$\bullet$ UserName

A username or display name is a string of numbers, characters, and letters. Intuitively, the longer the common substring or common subsequence shared by a pair of usernames or display names, the more similar the two accounts are [29].
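As one possible way to turn this intuition into a number, the sketch below finds the longest common substring of two usernames with Python's standard `difflib` module and normalizes its length by the longer name. The normalization and the function names are assumptions for illustration, not the measure defined in [29].

```python
from difflib import SequenceMatcher

def longest_common_substring(a, b):
    """Longest contiguous block of characters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]

def username_similarity(a, b):
    """Length of the longest common substring over the longer name's length."""
    if not a or not b:
        return 0.0
    return len(longest_common_substring(a, b)) / max(len(a), len(b))

shared = longest_common_substring("john_smith", "jsmith88")
score = username_similarity("john_smith", "jsmith88")
```

A score near 1 means one name is almost entirely contained in the other, while a score near 0 means the names share only short fragments.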

$\bullet$ Description

This is the short write-up / ‘bio’ / ‘about me’ which users provide about themselves [30].

$\bullet$ FOAF (Friend Of A Friend)

It is a machine-readable semantic vocabulary describing people, their relationships, and activities. It is written in XML syntax and adopts the conventions of the Resource Description Framework (RDF) to define a set of attributes [31]. Default metrics to compute the similarity of each FOAF attribute are provided in Figure 6.

Various techniques can be used to measure the similarity score between two textual/string values and can be grouped into two main categories:

Syntactic-based similarity approaches: provide exact or approximate lexicographical matching of two values. Using exact similarity techniques can lead to poor similarity results since frequent variations of a word exist and typing errors are common. Thus, approximate string matching techniques can be used to compute the distance between two values that have a limited number of different characters.

Semantic-based similarity approaches: are used to measure how two values, lexicographically different, are semantically similar.

3.4 Structural

3.4.1 Structure similarity

Given two users $u_{i}$ and $u_{j}$ , their structure similarity can be calculated as follows [32]:

$\displaystyle S_{ij}=\textit{Sim}(u_{i},u_{j})=\left|N_{u_{i}}\cap N_{u_{j}}\right|$ (10)

The structure similarity is measured by two users’ common friends. $N_{u_{i}}$ represents neighbors of the user $u_{i}$ . $\left|{N_{u_{i}}\cap N_{u_{j}}}\right|$ represents the number of $u_{i}$ ’s and $u_{j}$ ’s common friends [24].
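With neighbor sets stored per user, the structure similarity is the size of a set intersection. The following minimal sketch uses a hypothetical friendship dictionary.

```python
def structure_similarity(neighbors, u_i, u_j):
    """Number of common friends of u_i and u_j."""
    return len(neighbors[u_i] & neighbors[u_j])

# Hypothetical friend lists, stored as sets of user names.
neighbors = {
    "alice": {"bob", "carol", "dave"},
    "erin": {"carol", "dave", "frank"},
    "bob": {"alice"},
}
s_alice_erin = structure_similarity(neighbors, "alice", "erin")
s_alice_bob = structure_similarity(neighbors, "alice", "bob")
```

Because the measure is a raw count, users with many friends tend to score higher; whether to normalize it is a design choice that this sketch leaves open.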

3.4.2 Closeness centrality ($CC_{u}$)

This measure requires considering the distance between two vertices $u$ and $v$ , defined as the length $SP(u,v)$ of the shortest path (geodesic distance) connecting them. The closeness centrality of $u$ is defined as the reciprocal of the sum of all distances from $u$ to all other vertices in the network [33]:

$\displaystyle CC_{u}=\frac{1}{\sum\limits_{v\in V,\,v\neq u}SP(u,v)}$ (11)
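For an unweighted network, the shortest-path lengths can be obtained with a breadth-first search. The sketch below assumes a connected, undirected graph given as an adjacency dictionary; the function names and the sample graph are hypothetical.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(graph, u):
    """Reciprocal of the summed distances from u to every reachable vertex."""
    dist = shortest_path_lengths(graph, u)
    total = sum(d for v, d in dist.items() if v != u)
    return 1.0 / total if total > 0 else 0.0

# Path graph a - b - c - d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cc_a = closeness(graph, "a")
cc_b = closeness(graph, "b")
```

The inner vertex b is closer to the rest of the path than the end vertex a, so it receives the higher centrality.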

3.4.3 Betweenness centrality ($\textit{BC}_{v}$)

Given any three distinct vertices $v$ , $u$ , and $w$ , let $\sigma_{uw}$ be the number of shortest paths from $u$ to $w$ and let $\sigma_{uw}(v)$ be the number of shortest paths from $u$ to $w$ passing through $v$ . The betweenness centrality $\textit{BC}_{v}$ of $v$ is defined as follows [34]:

$\displaystyle\textit{BC}_{v}=\sum\limits_{v\neq u\neq w\in V}\frac{\sigma_{uw}(v)}{\sigma_{uw}}$ (12)
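A direct, unoptimized sketch of this definition for unweighted graphs follows. It counts shortest paths with breadth-first search and uses the fact that a vertex v lies on a shortest path from u to w exactly when the distances from u to v and from v to w add up to the distance from u to w. The sum runs over ordered pairs (u, w), so each undirected pair is counted twice; that choice, the function names, and the sample graphs are assumptions for illustration.

```python
from collections import deque

def bfs_counts(graph, source):
    """Distances from source and number of shortest paths to each vertex."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma

def betweenness(graph, v):
    """Sum over ordered pairs (u, w) of the share of shortest paths through v."""
    info = {s: bfs_counts(graph, s) for s in graph}
    dist_v, sigma_v = info[v]
    total = 0.0
    for u in graph:
        dist_u, sigma_u = info[u]
        for w in graph:
            if len({u, v, w}) < 3:
                continue
            if v not in dist_u or w not in dist_u or w not in dist_v:
                continue
            if dist_u[v] + dist_v[w] == dist_u[w]:
                total += sigma_u[v] * sigma_v[w] / sigma_u[w]
    return total

path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
bc_path_b = betweenness(path, "b")
bc_cycle_b = betweenness(cycle, "b")
```

On the path graph every route from a to c or d must pass through b, whereas on the four-cycle only half of the shortest paths between a and c do, which lowers the score. For large networks a dedicated algorithm such as Brandes' method would be preferable to this quadratic loop.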

3.4.4 Homophily

Homophily is defined as the tendency of individuals to become friends with those who are similar to themselves. The homophily concept can be considered a promising tool to analyze the structure of social networks. In other words, homophily is one of the most basic notions, which provides us an illustration of how a social network’s surrounding contexts can drive the formation of relationships [35].

Often, when we look at a network, such contexts capture some of the dominant features of its overall structure. Of course, there are strong interactions between intrinsic and contextual effects on the formation of any single link [35].

Suppose we have a network in which a fraction $p$ of all individuals are male and a fraction $q$ of all individuals are female. Consider a given edge in this network. If we independently assign each node the gender male with probability $p$ and the gender female with probability $q$ , then both ends of the edge will be male with probability $p^{2}$ , and both ends will be female with probability $q^{2}$ . On the other hand, if the first end of the edge is male and the second end is female, or vice versa, then we have a cross-gender edge, which happens with probability $2pq$ . So we can summarize the test for homophily according to gender as follows:

Homophily Test: If the fraction of cross-gender edges is significantly less than 2pq, then there is evidence for homophily [35].
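The two quantities compared in the homophily test can be computed as in the sketch below. The node labels, the edge list, and the function names are hypothetical, and the sketch only compares the two numbers; deciding whether the difference is significant would need a statistical test that is not shown here.

```python
def cross_group_fraction(edges, group):
    """Fraction of edges whose two endpoints belong to different groups."""
    cross = sum(1 for u, v in edges if group[u] != group[v])
    return cross / len(edges)

def expected_cross_fraction(group, label):
    """Baseline 2pq, where p is the share of nodes carrying `label`."""
    p = sum(1 for g in group.values() if g == label) / len(group)
    q = 1.0 - p
    return 2 * p * q

# Hypothetical network: three male and three female nodes, six edges.
gender = {1: "M", 2: "M", 3: "M", 4: "F", 5: "F", 6: "F"}
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (3, 4)]
observed = cross_group_fraction(edges, gender)
baseline = expected_cross_fraction(gender, "M")
```

In this toy network only one edge in six crosses genders, well below the baseline of one half, which is the pattern the test treats as evidence for homophily.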

Homophily can help us to investigate the structural formation of social networks based on every feature. This concept can also be used to evaluate mental health features.

3.4.5 Fuzzy homophily

Fuzzy homophily is a fuzzy inference system that uses the distributions of attributes and inner links as input and infers the homophily measure as output. It is assumed that the attributes have only two values, for example, male and female. Therefore D1 is used for male distribution and D2 is used for female distribution. Similarly, L1 is male-to-male total links and L2 is female-to-female total links [36].

Fuzzy homophily is an attribute that compares two individual distributions and indicates the relationships between them, for example, between the male and female genders. It is based on fuzzy inference systems and indicates whether relationships are normal or abnormal. Under this view, normal relations are taken to indicate mental health in that society, whereas abnormal relations are taken to indicate mental disorders with respect to gender.

Figure 7.

Fuzzy homophily [36].

4. Criteria and evaluation of MHFE methods

4.1 Criteria

Feature extraction is highly subjective and depends on the type of problem. There is no generic feature extraction technique that works in all cases. As with classifiers, it is not possible to say which feature extraction technique is the best; it depends heavily on the application [37]. In this section, we introduce five functional criteria to study and evaluate the various techniques of mental health feature extraction in social networks: accuracy, speed, scalability (how each technique performs on large data), generality for the overall analysis of the mental health of an individual in the social network, and computational complexity [20, 22, 24, 38].

4.1.1 Accuracy

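According to this criterion, the accuracy of the model built on the features of the proposed method is calculated. This criterion is the ratio of correct predictions of a user’s mental disorder (true positives, TP) to all predictions of a disorder (true positives plus false positives, FP), as shown in Eq. (13). Note that this ratio considers only positive predictions and is also known as precision.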

$\displaystyle\text{Accuracy}=\frac{\textit{TP}}{\textit{TP}+\textit{FP}}$ (13)

4.1.2 Speed

According to this criterion, the speed of extraction is measured through its execution time:

$\displaystyle\textit{Speed}=(\#\textit{instructions})\times t$ (14)

where $t$ is the execution time per instruction.

4.1.3 Scalability

Usually, a machine-learning algorithm or feature extraction method works well on small data and is then extended to large data sets. At that point, many problems occur, such as the curse of dimensionality and growth in processing time, computational complexity, and learning time, which can increase exponentially. Algorithms should therefore also work well on large data sets.

4.1.4 Generality

This criterion captures whether a feature has general rather than specific validity. In other words, a feature extraction algorithm must remain valid for new data that were not present in the training data. This criterion guards against overfitting the model.

4.1.5 Computational complexity

The computational complexity, or simply complexity, of an algorithm is the amount of resources required to run it. The computational complexity of a problem is the minimum of the complexities of all possible algorithms for this problem [39].

4.2 Evaluation and discussion

In this section, we evaluate mental health feature extraction methods against a number of key challenges, together with their key advantages and disadvantages. This qualitative assessment rates how well each method addresses each challenge on three levels: High (H), Medium (M), and Low (L). The results of this qualitative analysis are shown in Tables 1–3.

Table 1
Proposed qualitative analysis of mental health feature extraction for linguistic style methods based on key challenges

Category	Method	Accuracy	Speed	Scalability	Generality	Computational complexity	Advantages	Disadvantages
Linguistic style	Count vector	M	M	M	M	M	• Models each document by counting the number of times each word appears	• It produces a sparse matrix that is inefficient for computation. • The frequencies of the terms remain intact, although grammar and order are lost. • It is biased toward longer documents, since terms are likely to occur more often in them. • It does not take into account the frequency of terms across all documents in the corpus.
	Total Emotion Count (TEC)	M	H	H	H	L	• Captures the popular emotion context of a word suggested by the lexicon	• Not all words are associated with just a single emotion. • Considers all the words in a document regardless of the intensity with which they convey an emotion.
	Total Emotion Intensity (TEI)	M	H	M	M	L	• Captures the emotional orientation of documents along with multiple emotion concepts (classes)	• Considers all the words in a document regardless of the intensity with which they convey an emotion.
	Max Emotion Intensity (MEI)	M	H	H	H	M	• Captures the emotional orientation of documents along with multiple emotion concepts (classes)	• Considers all the words in a document regardless of the intensity with which they convey an emotion.
	Graded Emotion Count (GEC)	H	M	M	M	H	• Captures the number of words in a document that are associated with an emotion above a threshold value • Helps to understand the impact of high-intensity words on emotion classification	• Uses only high-intensity emotion words from a DSEL, resulting in a drop in coverage.
	Graded Emotion Intensity (GEI)	H	M	M	M	M	• Captures the sum of intensity scores of words in a document that exceed a threshold	• Uses only high-intensity emotion words from a DSEL, resulting in a drop in coverage.
	TF-IDF	H	L	H	H	H	• Reduces the weight of words that appear in most documents • Increases the importance of words that appear in a subset of documents • Normalizes for document length and takes the rarity of terms across all documents into consideration	• May be slow for large vocabularies. • Makes no use of semantic similarities between words.
	n-grams	H	M	H	M	H	• Helps to predict the next item in a sequence • Can reflect semantics that cannot be captured by looking at words individually	• Inability to capture the underlying emotion semantics, resulting in overall performance degradation. • Computational complexity.
	Part-of-Speech (POS)	H	M	H	H	M	• Leads to marginal improvements over n-grams in emotion classification	• The ineffectiveness of POS features suggests that emotions are expressed more implicitly and not just by direct words.
	Co-occurrence	M	M	M	H	M	• Maps count-based statistics of the same event between neighboring words to a small, dense word vector	• It does not work well on its own and therefore requires other intelligent methods to work on its results.

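Table 2
Proposed qualitative analysis of mental health feature extraction for contextual and structural methods based on key challenges

Category	Method	Accuracy	Speed	Scalability	Generality	Computational complexity	Advantages	Disadvantages
Contextual	Capitalized words	L	H	M	L	L	• Sentiment-bearing words could indicate the emotion in the text and also alter its orientation from positive emotion (e.g. joy) to negative emotion (e.g. sadness) or vice versa	• Loss of generality • Simple and needs to be combined with other features
	Elongated words	L	H	M	L	L	• Sentiment-bearing words could indicate the emotion in the text and also alter its orientation from positive emotion (e.g. joy) to negative emotion (e.g. sadness) or vice versa	• Loss of generality • Simple and needs to be combined with other features
	Punctuation	H	M	H	M	M	• Sentiment-bearing words could indicate the emotion in the text and also alter its orientation from positive emotion (e.g. joy) to negative emotion (e.g. sadness) or vice versa	• Loss of generality • Simple and needs to be combined with other features
	Emoticons	H	M	H	M	M	• Sentiment-bearing words could indicate the emotion in the text and also alter its orientation from positive emotion (e.g. joy) to negative emotion (e.g. sadness) or vice versa	• Loss of generality • Simple and needs to be combined with other features
	Negation	M	M	H	M	M	• Sentiment-bearing words could indicate the emotion in the text and also alter its orientation from positive emotion (e.g. joy) to negative emotion (e.g. sadness) or vice versa	• Loss of generality • Simple and needs to be combined with other features
	Sentiment	M	L	M	M	M	• Sentiment-bearing words could indicate the emotion in the text and also alter its orientation from positive emotion (e.g. joy) to negative emotion (e.g. sadness) or vice versa	• Loss of generality • Simple and needs to be combined with other features
Structural	Structure similarity	M	L	H	H	H	• Refers to the user’s position in the network graph, with the following features: vertex degree, number of triangles, clustering coefficient, eigenvector centrality, average shortest path length, and so on	• Computational complexity • Does not give a perfect view of the most influential nodes in a graph, but rather a good representation
	Closeness centrality	H	M	M	H	H	• The reciprocal of the sum of all distances from v to all other vertices in the network	
	Betweenness centrality	H	M	M	H	H	• Detects the amount of influence a node has over the flow of information in a graph • Finds nodes that serve as a bridge from one part of a graph to another	
	Homophily	M	M	H	M	L	• One of the most basic notions, which illustrates how a social network’s surrounding contexts can drive the formation of relationships	
	Fuzzy homophily	M	L	H	M	M	• An attribute that compares two individual distributions and indicates relationships between the two distributions	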

Table 3
Proposed qualitative analysis of mental health feature extraction for user methods based on key challenges

Category	Subcategory	Method	Accuracy	Speed	Scalability	Generality	Computational complexity	Advantages	Disadvantages
User	Behavior	User behavior trajectory	H	M	M	L	L	• Refers to all the social behavior of a user exhibited on the platforms along the timeline, e.g., befriend, follow/unfollow, retweet, thumb-up/thumb-down, etc. • Over a sufficiently long period, a user’s social behavior exhibits a surprisingly high level of consistency across different platforms	• Loss of generality
		Username selection	M	M	L	L	M	• The minimum common factor available on all social media sites • A behavioral action that may have its own behavioral patterns	• Loss of generality
		Tagging	H	L	M	M	H	• Discrete-time intervals describe network topology, the role of specific nodes, communities, and subgroups in the network, and so on	• Loss of generality • Complicated and confusing
		Username	M	H	L	M	L	• Enhances the accuracy of the model when combined with other features	• Simple and needs to be combined with other features
		Description	M	M	M	H	H	• Enhances the accuracy of the model when combined with other features	• Simple and needs to be combined with other features
		FOAF (Friend Of A Friend)	H	L	H	H	M	• Allows groups of people to describe social networks without the need for a centralized database.	• Complicated and confusing

A comprehensive taxonomy of MHFE techniques is needed for a proper understanding and correct use of them. These techniques can be classified from several points of view, and several criteria are needed to analyze and evaluate each approach. In this paper, four classes and several sub-classes are proposed as a general framework and are evaluated qualitatively in terms of the accuracy of the model built on the features of each technique, the speed of extraction, the scalability of each technique, the generality of each technique for the overall analysis of an individual’s mental health in the social network, and the computational complexity.

The linguistic style class considers a specific language for analysis and is based on the content of social networks; it includes Count Vector, TEC, TEI, MEI, GEC, GEI, TF-IDF, Co-occurrence, N-grams, and POS [18, 20, 22, 23]. The contextual class consists of Capitalized Words, Elongated Words, Punctuation, Emoticons, Negation, and Sentiment [1, 2, 3, 4, 5, 7, 15, 16, 19]. The user class consists of Behavior [17, 18, 25], Username Selection [26], Interaction [27, 28], and Profile [29, 30, 31]. The structural class consists of Structure Similarity [24, 32], Closeness Centrality [33], Betweenness Centrality [34], and Homophily [35, 36].

In the linguistic style class, Count Vector is at a moderate level on all metrics. Total Emotion Count (TEC) has moderate accuracy, high speed, scalability, and generality, and low computational complexity. Total Emotion Intensity (TEI) matches TEC in accuracy, speed, and computational complexity but has moderate scalability and generality.

Max Emotion Intensity (MEI) has moderate accuracy and computational complexity and high speed, scalability, and generality. Graded Emotion Count (GEC) and Graded Emotion Intensity (GEI) share high accuracy and moderate speed, scalability, and generality but differ in computational complexity, which is high for GEC and moderate for GEI.

TF-IDF scores high on accuracy, scalability, generality, and computational complexity but has low speed.

Because n-grams need a very large statistical set, each of which contains a set of word vertices with the relationships between them, they rate high on accuracy and computational complexity; scalability is also high, while generality and speed are at a moderate level [24].

Part-of-Speech (POS) scores high in accuracy, scalability, and generality and moderate in speed and computational complexity. Co-occurrence is the same as Count Vector except in generality, which is high.
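The ratings for the linguistic style class can also be queried programmatically. The sketch below transcribes the H/M/L levels from Table 1 into a small lookup structure and filters methods by required levels; the encoding as five-letter strings and the helper names are assumptions made for illustration, and the TEI and MEI values are read from their paired row in Table 1.

```python
CRITERIA = ("accuracy", "speed", "scalability", "generality", "complexity")

# Levels transcribed from Table 1, in the order given by CRITERIA.
TABLE1 = {
    "Count vector": "MMMMM",
    "TEC": "MHHHL",
    "TEI": "MHMML",
    "MEI": "MHHHM",
    "GEC": "HMMMH",
    "GEI": "HMMMM",
    "TF-IDF": "HLHHH",
    "n-grams": "HMHMH",
    "POS": "HMHHM",
    "Co-occurrence": "MMMHM",
}


def rating(method, criterion):
    """Return 'H', 'M', or 'L' for one method and one criterion."""
    return TABLE1[method][CRITERIA.index(criterion)]


def methods_where(**required):
    """List methods whose rating equals the required level on every criterion."""
    return [
        method
        for method in TABLE1
        if all(rating(method, crit) == level for crit, level in required.items())
    ]


print(methods_where(accuracy="H", generality="H"))
print(methods_where(complexity="L"))
```

Such a lookup makes it easy to shortlist candidate features for a given constraint, for example high accuracy under a limited computational budget.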

In the contextual category, Capitalized words and Elongated words share low accuracy, generality, and computational complexity, high speed, and moderate scalability. Punctuation and Emoticons share high accuracy and scalability and moderate speed, generality, and computational complexity. Negation has moderate accuracy, speed, generality, and computational complexity, and high scalability. Sentiment has moderate accuracy, scalability, generality, and computational complexity, and low speed.

In the structural category, Structure Similarity has moderate accuracy, low speed, and high scalability, generality, and computational complexity. Closeness Centrality and Betweenness Centrality are rated the same in Table 2, with high accuracy, generality, and computational complexity and moderate speed and scalability. Homophily and fuzzy homophily share moderate accuracy and generality and high scalability but differ in speed and computational complexity: homophily has moderate speed and low computational complexity, while fuzzy homophily has low speed and moderate computational complexity.

In the user category, there are three subcategories: behavior, interaction, and profile. User behavior trajectory has high accuracy, moderate speed and scalability, and low generality and computational complexity. Username selection has moderate accuracy, speed, and computational complexity and low scalability and generality. Tagging has high accuracy and computational complexity, low speed, and moderate scalability and generality. Username has moderate accuracy and generality, high speed, and low scalability and computational complexity. Description has moderate accuracy, speed, and scalability, and high generality and computational complexity. Finally, FOAF (Friend of a Friend) has high accuracy, scalability, and generality, low speed, and moderate computational complexity.

5. Conclusions

In this paper, a coherent framework is presented for feature extraction methods applied to social network data related to the mental health of individuals, together with an evaluation of each method. Furthermore, by identifying and introducing these methods and their challenges, appropriate functional criteria are provided for analyzing feature extraction methods. Using the proposed criteria to analyze different methods leads to a more accurate understanding of these methods, their proper use, and the possibility of accurate comparison and evaluation of these techniques. What stands out in this framework is the use of structural features of the network, because of their high correlation with personality traits and with the behaviors and interactions observed on the social network; combining these features with linguistic features can address accuracy and generality, which are the most important metrics for mental health analysis techniques. Despite the efforts made in this research, many challenges remain to be addressed by researchers in the future. Specifically, it is suggested that future work use analytical methods as well as implementations on valid benchmarks, and that each of the proposed methods and criteria be re-evaluated.

References

1. Singh. Study of mental health and emotional improvement. Anish Kumar Verma, 2017.

2. Callison-Burch, Guadagno, Davis. Building a safer community with new suicide prevention tools. Facebook Newsroom, 2017.

3. Guntuku, Yaden, Kern, Ungar, Eichstaedt. Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences. 2017; 18: 43-9.

4. Benhiba, Loutfi, Abdou, Idrissi. A classification of healthcare social network analysis applications. In: 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies; 2017; 5: pp. 147-158.

5. De Choudhury, Counts, Horvitz. Predicting postpartum changes in emotion and behavior via social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. 2013, pp. 3267-3276.

6. Wongkoblap, Vadillo, Curcin. Researching mental health disorders in the era of social media: systematic review. Journal of Medical Internet Research. 2017; 19(6): e228.

7. De Choudhury, Counts, Horvitz, Hoff. Characterizing and predicting postpartum depression from shared facebook data. In: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, 2014, pp. 626-638.

8. Chandrashekar, Sahin. A survey on feature selection methods. Computers & Electrical Engineering. 2014; 40(1): 16-28.

9. Molina, Belanche, Nebot. Feature selection algorithms: A survey and experimental evaluation. In: 2002 IEEE International Conference on Data Mining, 2002. pp. 306-313.

10. Xue, Zhang, Browne, Yao. A survey on evolutionary computation approaches to feature selection. IEEE Transactions on Evolutionary Computation. 2015; 20(4): 606-26.

11. Diaz-Chito, Ferri, Hernández-Sabaté. An overview of incremental feature extraction methods based on linear subspaces. Knowledge-Based Systems. 2018; 145: 219-35.

12. Yıldız. On the feature extraction in discrete space. Pattern Recognition. 2014; 47(5): 1988-93.

13. AlNuaimi, Masud, Serhani, Zaki. Streaming feature selection algorithms for big data: A survey. Applied Computing and Informatics. 2020.

14. Choi, Kim, Lee, Kwon, Choo, Huh. Toward predicting social support needs in online health social networks. Journal of Medical Internet Research. 2017 Aug 2; 19(8): e7660.

15. Park, Lee, Kwak, Cha, Jeong. Activities on Facebook reveal the depressive state of users. Journal of Medical Internet Research. 2013; 15(10): e217.

16. Heng, Zhu. Predicting depression of social media user on different observation windows. In: 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). 2015; 1: 361-364.

17. Tsugawa, Kikuchi, Kishino, Nakajima, Itoh, Ohsaki. Recognizing depression from twitter activity. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. 2015. pp. 3187-3196.

18. Liu, Zhu. Deep learning for constructing microblog behavior representation to identify social media user’s personality. PeerJ Computer Science. 2016; 2: e81.

19. Pedersen. Screening Twitter users for depression and PTSD with lexical decision lists. In: Proceedings of the 2nd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality. 2015. pp. 46-53.

20. Guan, Hao, Cheng, Yip, Zhu. Identifying Chinese microblog users with high suicide probability using internet-based profile and linguistic features: classification model. JMIR Mental Health. 2015; 2(2): e4227.

21. Liu, Tov, Kosinski, Stillwell, Qiu. Do Facebook status updates reflect subjective well-being? Cyberpsychology, Behavior, and Social Networking. 2015; 18(7): 373-9.

22. Zhou, Huang. A fuzzy logic-based text classification method for social media data. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2017. pp. 1942-1947.

23. Grayson, Wade, Meaney, Greene. The sense and sensibility of different sliding windows in constructing co-occurrence networks from literature. In: International Workshop on Computational History and Data-Driven Humanities. 2016; pp. 65-77.

24. Tripathy, Agrawal, Rath. Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications. 2016; 57: 117-26.

25. Liu, Wang, Zhu, Zhang, Krishnan. Hydra: Large-scale social identity linkage via heterogeneous behavior modeling. In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 2014. pp. 51-62.

26. Zafarani, Liu. Connecting users across social media sites: a behavioral-modeling approach. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. 2013. pp. 41-49.

27. Kortink, Weeda, Crowley, Moor, van der Molen. Community structure analysis of rejection sensitive personality profiles: A common neural response to social evaluative threat? Cognitive, Affective, & Behavioral Neuroscience. 2018; 18(3): 581-95.

28. Sun, Han, Huang, Wang, Zeng, Wang, Yan. Recommender systems based on social networks. Journal of Systems and Software. 2015; 99: 109-19.

29. Peng, Zhang, Yin. Matching user accounts across social networks based on username and display name. World Wide Web. 2019; 22(3): 1075-97.

30. Malhotra, Totti, Meira Jr., Kumaraguru, Almeida. Studying user footprints in different online social networks. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2012. pp. 1065-1070.

31. Raad, Chbeir, Dipanda. User profile matching in social networks. In: 2010 13th International Conference on Network-Based Information Systems. 2010. pp. 297-304.

32. Zou, Yang, Zhang. Microblog sentiment analysis using social and topic context. PloS One. 2018; 13(2): e0191163.

33. Agreste, De Meo, Ferrara, Piccolo, Provetti. Trust networks: Topology, dynamics, and measurements. IEEE Internet Computing. 2015; 19(6): 26-35.

34. Sapountzi, Psannis. Social networking data analysis tools & challenges. Future Generation Computer Systems. 2018; 86: 893-913.

35. Easley, Kleinberg. Networks, crowds, and markets. Cambridge: Cambridge University Press, 2010.

36. Heidarpour, Emami, Shirazi. Fuzzy homophily in social networks. In: 2015 4th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS). 2015. pp. 1-4.

37. Shuai, Shen, Yang, Lan, Lee, Philip, Chen. A comprehensive study on social network mental disorders detection via online social media mining. IEEE Transactions on Knowledge and Data Engineering. 2017; 30(7): 1212-25.

38. Compagnon, Olliver. Graph embeddings for social network analysis, state of the art (Doctoral dissertation, Master’s thesis). INSA Lyon, 2016. p. 22.

39. Arora, Barak. Computational complexity: a modern approach. Cambridge University Press, 2009.
