Abstract
Because text expressions are highly diverse, sentiment classification algorithms based on semantic understanding struggle to build a complete sentiment dictionary and a complete set of sentence matching templates, which strongly limits them. The difficulty is particularly evident in the classification of student sentiment. This paper therefore analyzes a student sentiment classification model based on a neural network algorithm and takes the student group as an example to explore the application of neural network models to sentiment classification. A regularization term is added to the loss function of the LSTM so that the output at each time step is related to the output at the previous time step. In addition, the sentiment drift distribution of sentiment words over the sentiment labels is added to the regularizer, and the sentiment information is fused with a bidirectional LSTM so that the model itself can choose the forward or reverse direction. Finally, the performance of the proposed model is evaluated experimentally. The results show that the proposed model has better overall performance than the traditional model and can meet the practical needs of student sentiment classification.
Introduction
Research and analysis show that text sentiment classification algorithms based on semantic understanding are relatively simple. They cannot easily capture complex and changeable sentiment patterns in text, and it is difficult to build a complete sentiment dictionary and sentence pattern matching templates for them, which strongly limits these algorithms. Machine learning algorithms for text sentiment classification depend largely on manual feature engineering. Traditional manual feature engineering suffers from semantic loss, matrix sparseness, and dimensional explosion, and algorithms of this type rely on shallow classification models that struggle to capture deep features, so their generalization ability is weak. Text classification algorithms based on deep learning obtain better text representations through neural network language models and can capture deep features of text. Compared with the first two types of algorithm, they therefore achieve better classification results. However, most deep learning algorithms are data-driven and ignore prior knowledge about sentiment and weights in natural language, so they require a large number of training samples to obtain good results [1].
Text sentiment analysis is an important task in natural language processing. It has a wide range of applications, which are not limited to those aimed at the general public. Sentiment analysis is also very helpful for special groups. For example, by analyzing the words and texts of autistic children, we can learn their preferences and sentiment changes, help parents enter their inner world, and support the treatment of autistic children. Similarly, sentiment modeling of prison inmates can strengthen prison officers' understanding of prisoners. This can not only help prisoners to reform, but also predict their sentiment fluctuations and so prevent extreme behavior. As a sub-task of sentiment analysis, sentiment classification plays an important role in natural language processing. Its goal is to divide text, according to the polarity of its sentiment, into positive and negative classes or into finer-grained classes such as very positive, positive, neutral, negative, and very negative. People realized the importance of sentiment classification and proposed various classification methods even before the advent of the information age. Since the start of the 21st century, with the explosive growth of information and the rapid development of technology, machine learning has progressed rapidly [2]. The methods of sentiment classification have changed just as quickly: in addition to traditional machine learning classifiers, various deep learning neural network models have appeared.
As a major research direction in natural language processing, sentiment analysis aims to analyze and judge the sentiment of a sentence. The sentiment tendency can be simply positive or negative, or it can be one of several specific sentiments, such as sadness, collapse, joy, or panic. Sentiment analysis has developed from simple but shallow methods based on dictionaries and rules, to machine learning methods based on labeled data, and then to supervised and unsupervised methods based on deep learning [3].
In this paper, the neural network algorithm is used to analyze the student sentiment classification model, and the student group is taken as an example to explore the application of the neural network model in sentiment classification.
Related work
The earliest and simplest approach to sentiment analysis is based on a sentiment dictionary. It determines the sentiment intensity of each word or phrase from the dictionary and then combines these intensities into the sentiment tendency of the text, which completely ignores the semantic connections between words. The literature [4] studied how to build a semantic dictionary from a corpus. The literature [5] studied the sentiment tendency of English words on the basis of the sentiment relationships expressed by conjunctions. The literature [6] used the existing WordNet English sentiment dictionary to study sentiment tendencies. The literature [7] used WordNet to measure the semantic correlation between words and took it as the similarity of sentiment words when calculating the sentiment tendency of words.
Deep learning originates from traditional machine learning but outperforms it. Its models and computations are more complex, but its computing and learning capabilities are stronger. It trains neural network structures that are deeper than those of earlier machine learning and optimizes algorithms by changing the model. In recent years, deep learning frameworks have been widely used and studied in image processing, automatic speech recognition, and natural language processing. This article mainly introduces the application of deep learning to sentiment analysis and to the identification of evaluation objects in natural language processing. To advance the study of sentiment analysis, the recursive autoencoder model proposed in the literature [8] learns vector representations of phrases and entire sentences and can extract the hierarchical structure of phrases from unlabeled data sets. Many studies now capture the semantic relationships between words through word vectors, and the recursive neural network model proposed in the literature [9] builds on these studies and methods. Compared with other models, it captures finer-grained semantic relationships. Its advantage over traditional recurrent neural networks is that it can learn combined representations of phrases and sentences of any length through a matrix and a vector: the vector captures the meaning of a word, and the matrix captures how the word modifies the meaning of the words it combines with. The model follows a syntactic parse tree and combines words recursively from the bottom up, so that it can be applied to longer phrases or sentences. This method achieved 79% accuracy in experiments on a movie review data set. The semantic vector space model has its own advantages, but its disadvantage is that it cannot fully express longer phrases.
In real sentiment detection tasks, sentiment often needs a finer-grained division, such as very good, good, general, bad, and very bad, so a more complex model must be trained, which is a new challenge for sentiment analysis. To this end, the literature [10] proposed a further recursive neural network model that can handle input of arbitrary length. In this model, the basic unit of text is represented by a word vector and a parse tree, and the vector of each parent node is computed by a tensor-based composition function. The model was validated on the Stanford sentiment data set. The methods introduced above are sentiment analysis methods based on tree-structured recursive neural networks, which give many language problems a reasonable and feasible explanation through parse trees. Another solution uses recurrent neural networks for natural language processing tasks. A recurrent neural network can be regarded as a special form of recursive neural network. Because recurrent neural networks and their variants achieve better results than recursive neural networks on most natural language processing problems, they are now more widely used. The literature [11] proposed a recurrent neural network model that can process texts of different lengths. The literature [12] proposed a tree-structured long short-term memory (LSTM) network that makes better use of the correlations between semantics. The tree-structured LSTM differs from the standard LSTM in that it relies on multiple child units to update the memory state and the gate vectors. In addition, convolutional neural networks are more commonly used in computer vision and image processing problems.
In recent years, more attempts have been made to extend convolutional neural networks to natural language processing tasks. The main feature of a convolutional neural network is that it extracts and learns local features through convolution kernels. The literature [13] used convolutional neural networks for semantic analysis and achieved good experimental results. The literature [14] proposed a convolutional neural network architecture that applies convolution filters to word vector representations to generate new features and then classifies sentences based on these features. The literature [15] proposed a deep convolutional neural network model for sentiment analysis of short texts. The model uses two convolutional layers to extract the relevant features, so that the sentiment information of a short text is extracted from the character level up to the sentence level. In tests on a public sentiment analysis data set, it achieved an accuracy of 85.7% on two-class classification and 48.3% on fine-grained classification. To avoid input that is unrelated to the task, the parameter initialization of a convolutional neural network is very important for the accuracy of the whole training process. The literature [16] proposed a deep convolutional neural network model that uses a sentence matrix to initialize the network parameters.
LSTM model fusing sentiment information
In the histogram, --, -, 0, +, and ++ indicate very negative, negative, neutral, positive, and very positive, respectively.
LSTM is a sequence model: a new word is fed in at every time step until the whole short text has been entered. To handle the five kinds of input word (non-sentiment words, sentiment words, negative words, degree adverbs, and syntactic conjunctions) when fusing sentiment information into the LSTM, this article abstracts the effect of each kind as a KL divergence (relative entropy) and adds it to the loss function of the LSTM as a regularization term, so that the output at each time step is related to the output at the previous time step.
Take the sentence in Fig. 1 as an example. When the input is the sentiment word "like", the corresponding output p2 changes relative to the output p1 at the previous time step: it moves in the polarity direction of "like", that is, the positive probability of p2 increases relative to p1. This phenomenon is called sentiment drift. When the input is the non-sentiment word "the", the corresponding output p3 is similar to the output p2 at the previous time step. When the input is the degree adverb "very", which strengthens the degree, the corresponding output p5 should drift toward very positive relative to the output p4 at the previous time step. All of this is realized by changing the LSTM loss function [17].

Fig. 1. Output change of the LSTM with sentiment information at each time step.
The original loss function of the LSTM is the cross-entropy loss. The regularized loss adds to it a term L_{t,i} for every sentence i and every word position t, weighted by α. Here, L_{t,i} is the regularizer that carries the sentiment information for the word at position t of sentence i, t is the position of the word in the sentence, and α is the weight of the regularizer.
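The regularized loss described above can be sketched as follows. This is a minimal illustration, assuming a cross-entropy summed over sentences plus the α-weighted sum of the per-word regularizers; the function name `regularized_loss` and its arguments are hypothetical, and any weight-decay term is left out.

```python
import math

def regularized_loss(probs, labels, regularizers, alpha):
    """Cross-entropy over sentences plus alpha times the summed regularizers.

    probs[i]        : predicted label distribution for sentence i
    labels[i]       : index of the gold label for sentence i
    regularizers[i] : list of L_{t,i} values, one per word position t
    """
    cross_entropy = -sum(math.log(p[y]) for p, y in zip(probs, labels))
    penalty = sum(sum(per_word) for per_word in regularizers)
    return cross_entropy + alpha * penalty
```

With α = 0 the function reduces to the plain cross-entropy loss, which makes the contribution of the sentiment regularizers easy to isolate.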
Since there are five different situations in the input words, there are five regularizers in the LSTM model that integrates sentimental information. The five regularizers are described in detail below.
When the input at time t is a non-sentiment word, the regularizer keeps the sentiment prediction distribution unchanged from the previous time step [18]. It is expressed in terms of the following quantities: M is the margin hyperparameter; p_t is the sentiment prediction distribution at time t, that is, the normalized output of the hidden layer at time t; h_t is the output of the hidden layer at time t; and D_KL is the symmetric KL divergence between two distributions p and q over the sentiment labels, where C is the number of labels.
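A minimal sketch of the non-sentiment regularizer follows. It assumes the standard symmetric KL divergence, 0.5 (KL(p||q) + KL(q||p)), and a hinge form max(0, D_KL(p_t, p_{t-1}) - M); both are assumptions that match the description above rather than a confirmed reproduction of the original equations, and the function names are hypothetical.

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two label distributions (assumed form)."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)

def non_sentiment_regularizer(p_t, p_prev, margin=0.1):
    """Zero while p_t stays within `margin` of p_prev, positive otherwise."""
    return max(0.0, symmetric_kl(p_t, p_prev) - margin)
```

The margin M lets the prediction move slightly when a non-sentiment word such as "the" is read, and only larger changes are penalized.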
When the input at time t is a sentiment word, the sentiment prediction distribution at the current time step should change relative to the previous time step. For example, the sentence "I like the starring very much" in Fig. 1 has the input "I" at time t = 1 and the input "like" at time t = 2, so the sentiment prediction distribution at time 2 should be more positive than the prediction distribution at time 1 [19].
To realize the sentiment drift of sentiment words, the model adds to the regularizer the sentiment drift distribution S_C ∈ R^C of sentiment words over the sentiment labels. Each sentiment word has a sentiment drift distribution over the five labels (very negative, negative, neutral, positive, very positive), and these distributions are learned by the model. The main function of the sentiment word regularizer is therefore as follows: when the input at time t is a sentiment word, the sentiment prediction distribution should drift significantly relative to the prediction distribution at the previous time step, by the amount given by the drift distribution of that word.
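The sentiment word regularizer can be sketched as below. The sketch assumes that the previous distribution p_{t-1} is shifted by the drift vector of the sentiment word and renormalized, and that the regularizer is a hinge on the symmetric KL divergence between p_t and this drifted target. The renormalization step, the hinge form, and all function names are assumptions made for illustration.

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two label distributions (assumed form)."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)

def drifted_distribution(p_prev, drift):
    """Shift p_prev by the drift vector, floor at a tiny value, renormalize."""
    shifted = [max(pi + si, 1e-12) for pi, si in zip(p_prev, drift)]
    total = sum(shifted)
    return [x / total for x in shifted]

def sentiment_regularizer(p_t, p_prev, drift, margin=0.1):
    """Penalize p_t when it is far from the drifted previous distribution."""
    target = drifted_distribution(p_prev, drift)
    return max(0.0, symmetric_kl(p_t, target) - margin)

# Hypothetical drift vector for a positive word such as "like":
# mass moves from the negative labels to the positive labels.
like_drift = [-0.1, -0.1, 0.0, 0.1, 0.1]
```

In a trained model the drift vector would be a learned parameter for each sentiment word; here it is fixed by hand only to show the direction of the drift.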
When the input at time t is a negative word (a negation word such as "not"), the sentiment prediction distribution p_t at time t should change polarity relative to the prediction distribution p_{t-1} at the previous time step. Different negative words change the sentiment polarity differently when they modify different phrases. To make full use of negative words in sentiment classification, the model therefore provides a transformation matrix T ∈ R^{C×C} for each negative word, and this matrix is trained and learned by the model [21–23].
The negative word regularizer implements the following function: when the input at time t is a negative word, the prediction distribution p_{t-1} at the previous time step is transformed by the transformation matrix of that word, and p_t is required to stay close to the transformed distribution.
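A minimal sketch of the negative word regularizer is given below. It assumes that the transformed distribution is softmax(T p_{t-1}) and that the regularizer is again a hinge on the symmetric KL divergence; the softmax normalization, the hand-written matrix, and the function names are assumptions for illustration only.

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two label distributions (assumed form)."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def transformed_distribution(p_prev, T):
    """Apply the C x C matrix T of the negative word to p_prev, then normalize."""
    product = [sum(t_ij * p_j for t_ij, p_j in zip(row, p_prev)) for row in T]
    return softmax(product)

def negation_regularizer(p_t, p_prev, T, margin=0.1):
    """Penalize p_t when it is far from the transformed previous distribution."""
    target = transformed_distribution(p_prev, T)
    return max(0.0, symmetric_kl(p_t, target) - margin)

# Hypothetical matrix for "not": a scaled anti-diagonal that mirrors the labels,
# so "very positive" mass is mapped to "very negative" and vice versa.
C = 5
T_not = [[5.0 if j == C - 1 - i else 0.0 for j in range(C)] for i in range(C)]
```

The same structure would serve for degree adverbs and syntactic conjunctions, with a different learned matrix for each word.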
A degree adverb has an effect similar to that of a negative word: it changes the sentiment prediction distribution p_t at time t relative to the prediction distribution p_{t-1} at the previous time step. The difference is that negative words completely change the polarity of the sentiment, while degree adverbs only change the intensity of the sentiment expression. The design of the negative word regularizer can therefore be reused: a transformation matrix T ∈ R^{C×C} is provided for each degree adverb, and the prediction distribution p_{t-1} at the previous time step is transformed by this matrix. Because different degree adverbs influence sentiment expression to different degrees, each degree adverb has its own transformation matrix, which is trained and learned by the model.
The regularizer of degree adverbs follows from this and has the same form as the negative word regularizer.
In the sequence model, syntactic conjunctions act like a combination of negative words and degree adverbs. When the input at time t is a syntactic conjunction, the sentiment prediction distribution p_t at time t should change relative to the prediction distribution p_{t-1} at the previous time step. This change may reverse the polarity, or it may only change the degree. In addition, different conjunctions affect sentiment differently. The model therefore also provides a trainable transformation matrix T ∈ R^{C×C} for each syntactic conjunction, and the regularizer again takes the same form as the negative word regularizer.
The LSTM model that incorporates sentiment information has a limitation: it must be known whether the negative words, degree adverbs, or syntactic conjunctions in the sentence stand to the left or to the right of the sentiment word, in order to decide whether the sentence should be input in the forward or the reverse direction. For example, "I like the starring very much" requires forward input, so that the degree adverb regularizer amplifies the positive sentiment, whereas "I don't like the starring" requires reverse input, so that the negative word regularizer can transform the positive sentiment. If the latter sentence is input in the forward direction, its sentiment cannot be predicted correctly. To solve this problem, this paper fuses the sentiment information with a bidirectional LSTM and lets the model itself choose the forward or reverse direction.
The original loss function of the bidirectional LSTM is likewise the cross-entropy loss. The bidirectional LSTM model fusing sentiment information is similar to the LSTM model fusing sentiment information: both use regularization to integrate the sentiment information, and the change to the loss function and the regularizers for all five cases carry over. The difference is that, at every time step, the bidirectional model must compare the current prediction with the context on both sides. In the loss function of the improved bidirectional LSTM model, L_{t,i} is the regularizer added for sentence i, t is the position of the word in the sentence, and α is the weight of the regularizer.
Because the bidirectional LSTM has two hidden layers, which receive the preceding and the following context respectively, two regularizers rather than one are added at each time step. The drift operation can thus be applied to the sentiment prediction distributions both before and after the current word, and each result is compared with the current prediction. This applies, for example, when the input at time t is a non-sentiment word.
L_t is the regularizer added to the loss function, and it takes the smaller of the two values. The smaller value is chosen because L_t represents the gap between the sentiment prediction distribution at the current time step and the drifted sentiment distribution at the previous time step. The smaller the gap, the better the sentiment drift fits the current word and the more accurate the sentiment prediction.
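The choice of the smaller regularizer can be sketched for the non-sentiment case as follows. The sketch assumes hinge regularizers on the symmetric KL divergence, computed once against the forward neighbor p_{t-1} and once against the backward neighbor p_{t+1}; the function names and the hinge form are assumptions.

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two label distributions (assumed form)."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)

def bidirectional_regularizer(p_t, p_prev, p_next, margin=0.1):
    """Keep the smaller of the forward and backward regularizers."""
    forward = max(0.0, symmetric_kl(p_t, p_prev) - margin)
    backward = max(0.0, symmetric_kl(p_t, p_next) - margin)
    return min(forward, backward)
```

Taking the minimum means that a word is only penalized when its prediction disagrees with the context on both sides, which is how the model can pick the direction that suits each word.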
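The regularizers in the other cases are improved in the same way. When the input at time t is a sentiment word, the forward and backward prediction distributions are each drifted by the drift distribution of the word and compared with the current prediction. When the input at time t is a negative word, the forward and backward prediction distributions are each transformed by the transformation matrix of the word and compared with the current prediction.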
The regularizers of degree adverbs and syntactic conjunctions are similar to that of negative words: both multiply the forward and backward sentiment prediction distributions by the transformation matrix corresponding to the word x_t input at time t. Their expressions follow formulas (22), (23) and (24) for the negative word, so they are not repeated here.
Many machine learning tasks map a sequence in one style to a sequence in another style, such as translation between different languages, conversion between different forms of language, or summarization of a piece of text into a short overview. These tasks can be regarded as Seq2Seq tasks, that is, tasks that map one sequence to another sequence. Seq2Seq tasks are often solved with the Encoder-Decoder model.
Neural machine translation is a widely used approach to machine translation, and most proposed neural machine translation models are built on a framework called the encoder-decoder, as shown in Fig. 2.

Fig. 2. Frame structure.
The model uses a three-layer CNN and then uses the corresponding deconvolution structure as the decoder part, which fits the sequence-to-sequence learning paradigm.
The model is named LSTM-MCNN-decoder. It is a multi-layer LSTM-CNN model combined with the Encoder-Decoder framework, and its structure is shown in Fig. 3. The model uses the LSTM-CNN model as its basic architecture, which gives the whole network a memory function, lets it learn the context information of the text well, and gives it a strong ability to learn local features. The basic model is then treated as the encoder, and an external decoder is attached to reconstruct the input matrix of the encoder. This reconstruction forces the encoder to extract and learn more essential features more efficiently, which leads to better final performance.

Fig. 3. LSTM-MCNN-decoder model.
The attention mechanism can be combined with many existing models by inserting it between two layers. The attention model accepts the output of the previous layer and a context parameter; this context is the key to the attention mechanism. It then filters the context-related part out of the output of the previous layer and passes the result on as the input of the next layer. The filtering does not have to delete content directly; it can be done through a weighted average. Fig. 4 is a schematic diagram of the attention mechanism.
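The weighted-average filtering can be illustrated with a small sketch. It assumes dot-product scoring between each output of the previous layer and the context vector, followed by a softmax; the scoring function and the name `attention` are assumptions, since the text does not fix a particular form.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(outputs, context):
    """Weight each previous-layer output by its relevance to the context.

    outputs : list of vectors produced by the previous layer
    context : context vector of the same dimension
    Returns the weighted average vector and the attention weights.
    """
    scores = [sum(o * c for o, c in zip(out, context)) for out in outputs]
    weights = softmax(scores)
    dim = len(outputs[0])
    attended = [sum(w * out[d] for w, out in zip(weights, outputs))
                for d in range(dim)]
    return attended, weights
```

Outputs that align with the context receive weights close to 1, while unrelated outputs are suppressed rather than removed.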
In NLP, most existing research on the attention mechanism models the correlation between different sentences, such as the word alignment between the source language and the target language in machine translation, or the word similarity between a question and an answer. There is little research on the correlation between words within a sentence. Existing work has proposed an attention-CNN (ATT-CNN), in which the attention mechanism automatically captures the context information and relationships of non-contiguous words without any external grammatical information.
Taking syntactic rules into account, the word vector of each feature word is computed first, and the sentiment dictionary and rules created in this paper are then used to calculate the sentiment intensity value of each sentiment word in the text. Next, the text word vector and the text sentiment value are spliced together and used as the text feature vector. Finally, a machine learning algorithm classifies the sentiment of the text. The specific process is shown in Fig. 5.
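The splicing step can be sketched as below. The sketch assumes that the text word vector is the average of the word vectors in the text and that the sentiment value is a single number appended to it; the averaging and the function name `text_feature_vector` are assumptions, because the text does not state how the word vectors are pooled.

```python
def text_feature_vector(word_vectors, sentiment_value):
    """Average the word vectors and append the text sentiment value."""
    dim = len(word_vectors[0])
    count = len(word_vectors)
    mean = [sum(vec[d] for vec in word_vectors) / count for d in range(dim)]
    return mean + [sentiment_value]
```

The resulting vector has one more dimension than the word vectors and can be passed to any standard classifier.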

Fig. 4. Attention mechanism.

Fig. 5. Flow chart of text classification.
This article takes the semantic rules into account and obtains word vectors through Word2vec training. However, word vectors do not contain sentiment tendencies, and the word vectors of semantically similar words do not necessarily have similar sentiment tendencies. For example, the Euclidean distance between the word vectors of "like" and "hate" may be smaller than that between "like" and "fast". The following rules are therefore proposed to process the text: (1) Degree adverb feature: when a degree adverb appears in front of a sentiment word, the feature value of the degree adverb is multiplied by each dimension of the word vector of the sentiment word. (2) Negative word feature: the negative words that appear before the sentiment word are counted, and their feature values are multiplied together. If the result is 1.0, the word vector of the sentiment word itself is used to calculate the total word vector. If the result is -1.0, the sentiment word vector is replaced, for subsequent calculation, by the word vector of the sentiment word whose sentiment intensity value is closest to the opposite of the intensity value of the original sentiment word. (3) Transitional sentence feature: when a transitional conjunction of the first type appears before the sentiment word, its feature value, -1.0, is multiplied by the feature values of the negative words and degree adverbs that modify the sentiment word. If the result is greater than 0, the value is multiplied by each dimension of the word vector of the sentiment word. If the result is less than 0, the sentiment word vector is replaced, for subsequent calculation, by the word vector of the sentiment word whose sentiment intensity value is closest to the opposite of the intensity value of the original sentiment word.
When a transitional conjunction of the second type appears in front of the sentiment word, the text is processed according to rules (1) and (2) above.
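Rules (1) and (2) can be sketched as follows. Every lexicon entry, feature value, and word vector in the sketch is hypothetical and chosen only for illustration, and applying the degree scale to the replacement vector when both a degree adverb and a negative word are present is an assumption that the rules do not spell out.

```python
# Hypothetical lexicons and 2-dimensional word vectors.
DEGREE = {"very": 1.5, "slightly": 0.5}
NEGATION = {"not": -1.0, "never": -1.0}
INTENSITY = {"like": 0.8, "hate": -0.8, "good": 0.6, "bad": -0.6}
VECTORS = {"like": [0.2, 0.7], "hate": [0.3, -0.6],
           "good": [0.5, 0.4], "bad": [0.4, -0.5]}

def opposite_word(word):
    """Sentiment word whose intensity is closest to the opposite of `word`."""
    target = -INTENSITY[word]
    candidates = [w for w in INTENSITY if w != word]
    return min(candidates, key=lambda w: abs(INTENSITY[w] - target))

def sentiment_word_vector(modifiers, word):
    """Apply the degree adverb and negative word rules to a sentiment word."""
    scale = 1.0   # product of degree adverb feature values, rule (1)
    sign = 1.0    # product of negative word feature values, rule (2)
    for modifier in modifiers:
        if modifier in DEGREE:
            scale *= DEGREE[modifier]
        elif modifier in NEGATION:
            sign *= NEGATION[modifier]
    # A product of -1.0 swaps in the vector of the opposite sentiment word.
    base = word if sign > 0 else opposite_word(word)
    return [scale * x for x in VECTORS[base]]
```

For example, `sentiment_word_vector(["not"], "like")` returns the vector of "hate", while two negative words cancel out and return the vector of "like" itself.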
To verify the proposed model, its performance is studied experimentally. The proposed model is named GRU-IDF and is compared with a neural network model, which is named NNM. The two models are analyzed in terms of the accuracy and the recognition speed of sentiment classification. The data source is a sub-classroom student sentiment database, from which 5000 records are extracted and divided into 50 groups. Both models are used to classify the sentiments, and their classification accuracy and recognition speed are compared. The classification accuracy results are shown in Table 1 and Fig. 6.
Statistical table of classification accuracy (%)
Statistical table of classification speed
Figure 6 is a statistical diagram of classification accuracy. It shows that the classification accuracy of the proposed model is nearly 20% higher than that of the neural network model and is above 90% in all groups.

Fig. 6. Statistical diagram of classification accuracy.
As can be seen from Fig. 7, the recognition time of the proposed model is between 60 ms and 80 ms, while that of the traditional neural network is between 150 ms and 180 ms. The proposed model therefore has a clear advantage in recognition speed.

Fig. 7. Statistical diagram of classification speed.
On the whole, the model proposed in this study has certain advantages in sentiment classification accuracy and sentiment classification speed. Therefore, compared with traditional models, the model proposed in this study has good comprehensive performance and can meet the actual needs of students’ sentiment classification.
As an important task in natural language processing, text sentiment classification is applied in all aspects of life. After a downturn dominated by rules and a period of slow development under early machine learning, natural language processing has entered a period of rapid development driven by deep learning. Against this background, this article has studied student sentiment classification and analyzed the effect of sentiment information on sentiment classification. Sentiment information includes sentiment words, negative words, degree adverbs, and syntactic conjunctions. As the direct carriers of sentiment expression, sentiment words play a decisive role in sentiment classification, and negative words are typical sentiment polarity converters that completely change the sentiment polarity of the phrases they modify. To exploit sentiment information, a regularization-based way of integrating it into neural networks is proposed. The role of each kind of sentiment information word is abstracted into a regularization formula and added to the loss function of the neural network, so that the network is guided by the prior sentiment information during training.
Acknowledgments
Research on social science development in Chengde in 2018: Research on the innovation of ideological and political education in colleges and universities in the Chengde area (project number: 20182065, general project).
