A fuzzy convolutional neural network for text sentiment analysis

Abstract

For artificial intelligence (AI) to mimic humans effectively, it must understand humans and, more specifically, human emotion. Sentiment analysis aims to automatically uncover the underlying sentiment or emotions that humans hold towards an entity. Emotion expressed in text data is highly ambiguous. In this paper, we consider the sentence-level sentiment classification task and propose a novel convolutional neural network combined with fuzzy logic, called the Fuzzy Convolutional Neural Network (FCNN), together with its associated learning algorithm. The new model integrates a modified Convolutional Neural Network (CNN) into the fuzzy logic domain. It uses fuzzy membership degrees to produce more refined outputs, thereby reducing the ambiguity in the emotional aspects of sentiment classification, and it extracts high-level emotional features through its convolutional neural representation. We compare the performance of the proposed approach with a conventional CNN for sentiment classification. The experimental results indicate that the proposed FCNN outperforms the conventional CNN on the sentiment classification task.

Keywords

Sentiment analysis, fuzzy logic, convolutional neural network, convolutional neuro-fuzzy network

1 Introduction

In the Internet era, and with the influx of social media, it has become necessary to extract essential and meaningful information from the vast amount of data available online. Such data usually come in the form of text and explicitly state the user’s feelings about content such as products, services, books, hotels, etc. Text is a good medium for expressing ideas, emotions and feelings: language is used not only for communication but also to convey the emotion associated with it. Sentiment analysis of such texts is pivotal for clearly grasping the opinions and emotions expressed within an online mention.

Sentiment extraction from texts has developed remarkably in the past few years [11, 21]. Sentiment analysis applies natural language processing (NLP), text analysis and computational linguistics to online texts in order to categorize a piece of text as expressing a positive or a negative emotional state. However, ambiguous sentiment polarity (positive versus negative) and sarcasm in text can make sentiment classification a challenge for machine learning.

The Convolutional Neural Network (CNN) is a machine learning model that has shown remarkable results on recognition and classification problems, including in the field of NLP [1, 6]. A deep CNN can extract high-level features from the input, which improves classification accuracy [8]. However, such classifiers treat sentiment classification in a “black-and-white” manner and do not deal with the ambiguities inherent in linguistic labels. In addition, the features extracted by a deep CNN are not readily interpretable by humans.

Fuzzy logic has long been used to deal with practical problems that involve ambiguous linguistic labels. Unlike a deep CNN, fuzzy logic can extract the degree to which a document contains a specific sentiment. Fuzzy rules can be derived from a large amount of training data by automatically learning the fuzzy membership functions. Based on the predicted classes (e.g. positive and negative) and the corresponding fuzzy membership values, the system can infer more refined categories (e.g. neutral) or intensities (e.g. somewhat positive, somewhat negative) of sentiment without the need to define more classes. Both neural networks and fuzzy logic can represent data efficiently, and many successful fuzzy-neural models have been developed over the past few decades. In the fuzzy-neural network (FNN), input signals, weights and output signals are fuzzified and expressed in the fuzzy domain [7, 19]. The FNN can handle linguistic labels such as low, medium and high, as well as fuzzy values, which enhances its ability to process ambiguous data [19].

In this paper, we propose a new Fuzzy Convolutional Neural Network (FCNN) model that integrates fuzzy logic into a traditional CNN framework. The combination brings together the advantages of both fuzzy logic and CNN models by extracting useful high-level features from ambiguous text data. A preliminary version of this work was presented as a conference paper [18], in which the FCNN model was tested on sentiment analysis tasks and showed better performance than the conventional CNN model. In the current version, we perform a detailed experimental analysis using five different datasets. We analyze the contribution of fuzzy operators to feature extraction with detailed feature set visualization at different layers. We also verify the robustness of the proposed FCNN through experiments with noisy data.

The rest of the paper is organized as follows. In the next section, we present related work. In Section 3, we explain the architecture of the proposed FCNN model. In Section 4, we present the results of our experiments. In the last section, we summarize our results and discuss our future plans.

2 Related works

Sentiment extraction from texts has made considerable progress in the past few years [11, 21]. For text sentiment analysis, lexicon-based dictionaries, bag-of-words representations and word embeddings [16] have been used in combination with classifiers such as SVM [23] or deep neural networks [5, 12].

CNNs have been applied to a broad set of text applications, including language modeling [16], sentiment analysis [11], syntactic parsing [13] and machine translation [15]. Deep learning networks carry out automatic feature extraction but do not have the capacity to represent the uncertainties inherent in linguistic labels. This makes deep learning on its own poorly suited to problems of data uncertainty and ambiguity.

With the development of deep learning models, many studies have attempted to combine fuzzy logic with deep learning, with remarkable results. Examples include the fuzzy restricted Boltzmann machine (FRBM) [2], which replaces the RBM’s parameters with fuzzy numbers, and the fused fuzzy deep neural network (FDNN) [3], which concatenates a fuzzy inference system with a multilayer neural network in parallel. Another attempt uses a fuzzy logic correction module to improve the performance of a CNN [22]. However, these models merely combine fuzzy logic and neural modules in a successive or parallel way and do not truly integrate the two models to take full advantage of both.

3 FCNN architecture

Figure 1 shows the proposed FCNN architecture for text sentiment analysis. First, the embedding layer maps the input sentence to a real-valued matrix. Then the fuzzification layer transforms the input matrix into the fuzzy domain. Next, the fuzzy representation is convolved in the fuzzy convolutional layers, which work as filters to extract high-level features from the data. After the fuzzy convolutional stage, the defuzzification layer converts the extracted feature set into crisp values. Finally, the fully connected layer works as the output classifier of the FCNN.

Fig.1

FCNN for text sentiment analysis.

3.1 Embedding level

For sentiment classification of a sentence, the FCNN calculates a score for each emotional label. To do this, the model first receives the input as a sequence of words and passes it through its layers. At each layer, higher-level features are extracted and passed on to the next layer. In this way, the model extracts features from the vector level of the word up to the sentiment level of the sentence.

For computation in the FCNN, the words in a sentence must be expressed as numerical values. Word-level embedding is the first step: it maps every word in a sentence to a d-dimensional vector, so that each sentence is transformed into a matrix of size m × d, where m is the sentence length (number of words) and d is the dimension of the embedded vector. The sentences in a dataset usually differ in length. For convenience, we pad each sentence with a special word at its end so that all sentences have the same length.

For a sentence consisting of M words (w₁, w₂, …, w_m, …, w_M), each word w_m is transformed into a vector u_m = [u₁, u₂, …, u_d]. We use a dictionary V of fixed size |V|, so the embedding matrix is D ∈ R^(d×|V|). The mapping from w_m to the vector u_m is given by Equation (1).

$u_{m} = D v^{w}$ (1) where v^w is a vector of dimension |V| that has the value 1 at index w and 0 at all other positions, and w is the index of the word w_m in the dictionary V. The matrix D is initialized randomly, and its values are then learned during the training of the model.
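
Equation (1) can be illustrated with a minimal pure-Python sketch. The toy sizes, the random initialization range and the word index below are assumptions made only for this example; the point is that multiplying D by the one-hot vector v^w simply selects column w of D.

```python
import random

def one_hot(index, vocab_size):
    """Return v^w: a |V|-dimensional vector with 1 at `index` and 0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]

def embed(D, v):
    """Compute u = D v for a d x |V| matrix D stored as a list of d rows."""
    return [sum(d_ij * v_j for d_ij, v_j in zip(row, v)) for row in D]

random.seed(0)
d, vocab_size = 4, 6  # toy sizes, chosen only for illustration
# Random initialization of D; the range is an assumption of this sketch.
D = [[random.uniform(-1.0, 1.0) for _ in range(vocab_size)] for _ in range(d)]

w = 2  # hypothetical dictionary index of some word
u = embed(D, one_hot(w, vocab_size))
column_w = [row[w] for row in D]  # direct lookup of column w
```

In practice the product is never formed explicitly; the column lookup gives the same vector at a fraction of the cost.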

3.2 The FCNN classifier level

After each input is embedded into a matrix X, each element of the input matrix is assigned multiple linguistic labels based on membership functions. A fuzzy membership function calculates the grade that describes the membership of the input node in a particular fuzzy set. The fuzzy set $\tilde{X}$ in Equation (2) is obtained through the relation in Equation (3), which is calculated using the max-product operation. The resulting grades are the possibilities that the input and output data belong to the predefined reference fuzzy numbers ${\tilde{MF}}_{ij}$ in the universe of discourse. $\tilde{X} = fuzzification (x_{ij} | {cx}_{ij})$ (2) where i, j are the indices of element x in the input matrix X, and cx is the center of the input fuzzy membership function.

$\begin{aligned} x_{ij} &= possibility (x_{ij} | {\tilde{MF}}_{ij}) \\ &= \max_{x \in X} ({\tilde{MF}}_{ij} δ (x - x_{ij})) \end{aligned}$ (3) where δ (x - x_ij) is the Kronecker delta function.
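
A minimal sketch of the fuzzification step follows, assuming Gaussian membership functions (the type listed for the fuzzification layer in Table 1). Because the Kronecker delta in Equation (3) is non-zero only at x = x_ij, the max-product reduces to evaluating the membership function at the crisp input value. The two shared centers and widths below are hypothetical; the model described above learns a center per element.

```python
import math

def gaussian_mf(x, center, sigma):
    """Membership grade of crisp value x in a Gaussian fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def fuzzify(X, centers, sigmas):
    """Map each element of X to one grade per membership function.

    Returns one matrix per fuzzy set, each with the same shape as X.
    """
    return [
        [[gaussian_mf(x, c, s) for x in row] for row in X]
        for c, s in zip(centers, sigmas)
    ]

X = [[-1.0, 0.0], [0.5, 1.0]]  # toy 2 x 2 "embedded sentence"
centers = [-1.0, 1.0]          # hypothetical centers of two fuzzy sets
sigmas = [1.0, 1.0]            # hypothetical widths
fuzzy_X = fuzzify(X, centers, sigmas)
```

An input that coincides with a center receives grade 1 for that fuzzy set, and the grade decays smoothly with distance from the center.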

Each fuzzy convolution layer includes three processing stages, namely a fuzzy convolution stage, a nonlinearity stage and a pooling stage. The fuzzy convolution stage applies fuzzy convolutional filters to the original 2D data as in Equation (4), in which the fuzzy convolutional filter W_μ is calculated as in Equation (5), where W is the original convolution filter. $x_{ij} = \sum_{a = 0}^{m - 1} \sum_{b = 0}^{d - 1} W_{μ} x_{(i + a) (j + b)}$ (4) $W_{μ} = fuzzification (W)$ (5)
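
The three stages can be sketched in pure Python under stated assumptions: the kernel is fuzzified element-wise with a Gaussian membership function (an assumption about the form of Equation (5)), the kernel is as wide as the input so that it only slides along the word axis, ReLU is the nonlinearity, and pooling takes the maximum over the whole feature map. The toy input and kernel are illustrative and are not taken from the experiments.

```python
import math

def gaussian_mf(w, center, sigma):
    """Gaussian membership grade of a crisp weight w (assumed membership shape)."""
    return math.exp(-((w - center) ** 2) / (2.0 * sigma ** 2))

def fuzzify_kernel(W, center, sigma):
    """W_mu = fuzzification(W), as in Equation (5), applied element-wise."""
    return [[gaussian_mf(w, center, sigma) for w in row] for row in W]

def fuzzy_conv(X, W_mu):
    """Equation (4) for a full-width kernel: one output per valid row offset i."""
    h, d = len(W_mu), len(W_mu[0])
    return [
        sum(W_mu[a][b] * X[i + a][b] for a in range(h) for b in range(d))
        for i in range(len(X) - h + 1)
    ]

def relu(values):
    """Nonlinearity stage."""
    return [max(0.0, v) for v in values]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]  # toy 5 x 2 input
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # toy 3 x 2 kernel lying exactly on the center
W_mu = fuzzify_kernel(W, center=0.0, sigma=1.0)  # every grade equals 1.0 here
feature_map = relu(fuzzy_conv(X, W_mu))          # 5 - 3 + 1 = 3 positions
pooled = max(feature_map)                        # pooling stage: maximum over the map
```

With stride 1 and no padding, an input of m rows and a kernel of h rows give m - h + 1 output positions, which pooling then collapses to a single value per filter.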

Equation (6) is a non-linear transformation of the output obtained from the fuzzy convolution stage. The last stage is pooling (e.g., max pooling), which computes a summary statistic of nearby results after the feature extraction stage. This stage helps the representation to be invariant to translations of the input and may also reduce the size of the input to the next fuzzy convolutional layer. $y_{ij} = σ (x_{ij})$ (6) where σ (.) is the activation function of the convolution layer.

The fully connected layer of the FCNN works as a classifier whose input features are the crisp values z_i obtained from the defuzzification process using the center of gravity method in Equation (7), where C_y is the center of the defuzzification membership function. In Equation (8), ${\hat{y}}_{i}$ is the output of the classifier and W_fc is the weight matrix of the fully connected layer. $z_{i} = defuzz (x_{i}) = \frac{\sum C_{y} x_{i}}{\sum x_{i}}$ (7) ${\hat{y}}_{i} = W_{fc} z_{i}$ (8)
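
Equations (7) and (8) can be sketched as follows. The membership grades, centers and weights are hypothetical values picked so that the arithmetic is easy to follow by hand; they are not parameters of the trained model.

```python
def defuzzify(grades, centers):
    """Center of gravity, Equation (7): grade-weighted mean of the centers C_y."""
    total = sum(grades)
    if total == 0.0:
        raise ValueError("all membership grades are zero; the centroid is undefined")
    return sum(c * g for c, g in zip(centers, grades)) / total

def fully_connected(W_fc, z):
    """y_hat = W_fc z, Equation (8), for a weight matrix stored as a list of rows."""
    return [sum(w * z_i for w, z_i in zip(row, z)) for row in W_fc]

centers = [-1.0, 1.0]                           # hypothetical centers C_y
grades_per_feature = [[0.25, 0.75], [0.5, 0.5]]  # hypothetical grades for two features
z = [defuzzify(g, centers) for g in grades_per_feature]
W_fc = [[1.0, 0.0], [-1.0, 0.0]]                # hypothetical 2 x 2 weight matrix
y_hat = fully_connected(W_fc, z)
```

A feature whose grades lean towards the set centered at 1 yields a positive crisp value, while equal grades cancel out to 0.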

3.3 FCNN training

Cross entropy is the loss function used to evaluate the output error, as shown in Equation (9), where y is target, $\hat{y}$ is output of classifier and N is the number of samples. $E = - \frac{1}{N} \sum_{n = 1}^{N} [y_{n} \log ({\hat{y}}_{n}) + (1 - y_{n}) \log (1 - {\hat{y}}_{n})]$ (9)
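
Equation (9) translates directly into code. The clipping constant `eps` is an addition of this sketch, used only to avoid taking the logarithm of zero; it is not part of the equation.

```python
import math

def binary_cross_entropy(targets, outputs, eps=1e-12):
    """Mean cross-entropy E over N samples, following Equation (9)."""
    total = 0.0
    for y, y_hat in zip(targets, outputs):
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat)
    return -total / len(targets)

uninformed = binary_cross_entropy([1.0, 0.0], [0.5, 0.5])  # equals log(2)
confident = binary_cross_entropy([1.0, 0.0], [0.9, 0.1])   # smaller loss
```

The loss falls as the outputs move towards the targets and grows without bound as a confident prediction becomes wrong.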

The parameters of the model are trained by the conventional back-propagation learning algorithm with the cross-entropy loss function. The weights of the fully connected layer are updated as shown in Equation (10), where α_fc is the learning rate of the fully connected layer. $W_{fc} (k + 1) = W_{fc} (k) - α_{fc} \frac{\partial E}{\partial W_{fc}}$ (10)
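
The update rule in Equation (10) can be sketched for a single scalar weight. This example assumes a sigmoid output, for which the gradient of the cross-entropy loss with respect to the weight is the mean of (ŷ - y) z; the sigmoid, the toy data and the learning rate are assumptions of the sketch.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_wrt_weight(w, zs, ys):
    """dE/dw for y_hat = sigmoid(w * z) under the cross-entropy loss E."""
    return sum((sigmoid(w * z) - y) * z for z, y in zip(zs, ys)) / len(zs)

def sgd_step(w, grad, alpha):
    """Equation (10): W(k + 1) = W(k) - alpha * dE/dW."""
    return w - alpha * grad

zs = [-2.0, -1.0, 1.0, 2.0]  # toy crisp features
ys = [0.0, 0.0, 1.0, 1.0]    # toy binary targets
w, alpha_fc = 0.0, 0.5       # hypothetical initial weight and learning rate
history = [w]
for _ in range(20):
    w = sgd_step(w, grad_wrt_weight(w, zs, ys), alpha_fc)
    history.append(w)
```

On this separable toy data the gradient stays negative, so every step moves the weight in the direction that lowers the loss.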

The centers C_y (k) of the defuzzification membership functions are updated as in Equation (11), where α_{cy} denotes the learning rate for updating the centers, and y_{k+1} and ${\hat{y}}_{k + 1}$ are the target output and the model’s actual output, respectively. $C_{y} (k + 1) = C_{y} (k) + α_{cy} \nabla_{C_{y}}$ (11)

The center value C_w and variance σ of the fuzzification membership function of the convolution layer’s weights are calculated by Equations (12)–(15) with learning rate α_{cw}: $C_{w} (k + 1) = C_{w} (k) + α_{cw} \nabla_{W_{μ}}$ (12) and $σ_{cw} (k + 1) = σ_{cw} (k) + α_{cw} \nabla_{W_{μ}}$ (13) where $δ_{k} = (W_{μ k})^{T} δ_{k}^{(3)} f^{'} (x_{k})$ (14) $\nabla_{W_{μ k}} = \sum y_{ij} * rot 180 (δ_{k})$ (15)

To update the mean and variance of the fuzzification layer’s membership function, we use Equations (16) and (17), where α_{cx} is the learning rate of the fuzzification layer. $C_{x} (k + 1) = C_{x} (k) + α_{cx} \nabla_{C_{x}}$ (16) $σ_{cx} (k + 1) = σ_{cx} (k) + α_{cx} \nabla_{C_{x}}$ (17)

3.3.1 Algorithm

The process of training the FCNN with back-propagation is summarized in Algorithm 1. For a given training dataset with feature set X and target y, we apply a mini-batch training procedure to obtain the best parameters for the FCNN and the CNN. Hyper-parameters such as the learning rates, batch size, dropout rate and number of training epochs are chosen empirically.

4 Experiments

4.1 Model configuration

The structure and detailed parameter settings of the FCNN model are listed in Table 1. Note that we adopt dropout after the fully connected layer to prevent overfitting. For comparison, we also use a variant of the proposed model that has the same structure as the regular FCNN, including the fuzzification and defuzzification stages, but replaces the fuzzy convolution layers with conventional convolution layers. We name this model FCNN w/o FuzzConv and employ it to analyze how the fuzzy convolutional layer contributes to the sentiment classification task.

Table 1
CNN and FCNN Model Parameters for text sentiment analysis

Layer	CNN parameters	FCNN parameters
Input	56×128	56×128
Fuzzification	None	2 × (56×128) Gaussian MF
Conv3×d	(in_channels=1, out_channels=200, kernel_size=(3×128), stride=1, padding=0)	2 × (in_channels=1, out_channels=100, kernel_size=(3×128), stride=1, padding=0)
	ReLU, Max_Pooling(55×1)	ReLU, Max_Pooling(55×1) (Fuzzified kernel)
Conv4×d	(in_channels=1, out_channels=200, kernel_size=(4×128), stride=1, padding=0)	2 × (in_channels=1, out_channels=100, kernel_size=(4×128), stride=1, padding=0)
	ReLU, Max_Pooling(54×1)	ReLU, Max_Pooling(54×1) (Fuzzified kernel)
Conv5×d	(in_channels=1, out_channels=200, kernel_size=(5×128), stride=1, padding=0)	2 × (in_channels=1, out_channels=100, kernel_size=(5×128), stride=1, padding=0)
	ReLU, Max_Pooling(53×1)	ReLU, Max_Pooling(53×1) (Fuzzified kernel)
Defuzzification	None	2×128
FC1	128×2, ReLU, Dropout(0.5)	128×2, ReLU, Dropout(0.5)

4.2 Dataset and preprocessing

The proposed FCNN model with the above parameters is tested on five different datasets with varied properties like number of classes, data size and length of the sentence. Table 2 gives the summary statistics of five datasets used.

Algorithm 1

Training procedure for FCNN

Input:	Training samples X and corresponding labels Y; hyper-parameters: learning rates α_fc, α_cx, α_cw and α_cy; batch number B; dropout rate; training epoch number N; number of convolutional layers L
Output:	Trained parameters of FCNN
Initialization:	Randomly initialize weight W and centers of membership functions C_x, C_w, C_y
	for e = 1 to N do
	    for b = 1 to B do
	        $\hat{X} \leftarrow fuzzification (X)$;
	        $W_{μ} \leftarrow fuzzification (W)$;
	        for l = 1 to L do
	            ${\hat{X}}^{l + 1} \leftarrow conv (W_{μ}^{l}, {\hat{X}}^{l})$;
	        end
	        $Z \leftarrow defuzz ({\hat{X}}^{L})$;
	        $\hat{Y} \leftarrow fullyconnected (Z)$;
	        $E \leftarrow crossentropy (Y, \hat{Y})$;
	        $(W_{fc}, C_{y}, C_{w}, C_{x}) \leftarrow update (W_{fc}, C_{y}, C_{w}, C_{x})$;
	        $(σ_{w}, σ_{x}) \leftarrow update (σ_{w}, σ_{x})$
	    end
	end

Table 2

Summary statistics of the datasets used. c: number of classes, l: average sentence length, N: number of samples in the dataset, Test: size of the test set (CV: 10-fold cross-validation is used instead)

Dataset	c	l	N	Test
MR	2	20	10662	CV
Twitter-1	3	15	9513	CV
Twitter-2	2	15	7625	CV
SST-1	5	18	11855	2210
SST-2	2	19	16185	1821

The Movie Review (MR) dataset was first used in [20] and contains 10,662 review snippets, 5,331 positive and 5,331 negative. Each sample in this dataset corresponds to a single review, which usually contains one sentence. Each sentence contains fewer than 56 words, and all sentences are lower-cased. The samples are labeled as negative or positive. Some examples of sentences in this dataset are shown in Table 3. The Stanford Sentiment Treebank (SST), which includes SST-1 and SST-2, is an extended version of MR with a standard separation into train/dev/test sets. SST-1 has 5 classes, while SST-2 has 2 classes. The Twitter sentiment dataset is a small dataset extracted from Twitter, a popular microblogging service. Twitter users can post status messages, called tweets, of at most 140 characters. The Twitter dataset used for our experiments has a total of 9,513 tweets, each with an associated emotional label of positive, neutral or negative. Twitter-2 is the same as the Twitter-1 dataset, but with the neutral class removed.

Table 3

Some samples of sentences in MR dataset

Positive	Negative
If there’s a way to effectively teach kids about the dangers of drugs, i think it’s in projects like the paid.	Though excessively tiresome, the uncertainty principle, as verbally pretentious as the title may be, as long as you discount its ability to bore.
Though everything might be literate and smart it never took off and always seemed static. Cantet perfectly captures the hotel lobbies, two-lane highways, and roadside cafe’s that almost spooky in her sulky, calculating lolita turn.	Despite juliet stevenon’s attempt to bring cohesion to pamela’s emotional roller coaster life. It’s tough to be startled when you’re almost dozing.
While it would be easy to give crush the new title of two weddings and a funeral, it’s a far more thoughtful film than any slice of hugh grant whimsy.	Humorless that it becomes a chore to sit through despite some first-rate performances by its lead.

For the datasets that do not come with a standard train/dev/test split, we perform 10-fold cross-validation, in which the classifier is trained on 90 percent and tested on the remaining 10 percent of the dataset. This is repeated so that each sample in the dataset is used exactly once for testing, and the average accuracy is then calculated. As the baseline, we select a CNN derived from [5] with a structure and complexity similar to those of the proposed model. To make sure that the performance of the proposed model and the baseline is comparable, we apply the same training hyper-parameters, such as pooling probabilities, dropout rate and learning rate of the classification layer. The other hyper-parameters of each model are chosen empirically by trial and error.
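
The cross-validation protocol can be sketched with the standard library alone. The function names, the shuffling seed and the use of the population standard deviation are assumptions of this sketch; `evaluate` stands for any hypothetical callable that trains on the first index list and returns the accuracy on the second.

```python
import random
import statistics

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]

def cross_validate(n_samples, evaluate, k=10):
    """Return mean and (population) standard deviation of the per-fold scores."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return statistics.mean(scores), statistics.pstdev(scores)

# Fold sizes for a dataset with as many samples as MR (10,662).
splits = list(k_fold_indices(10662, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
```

For 10,662 samples the ten test folds hold 1,066 or 1,067 samples each, and every fold is trained on the remaining nine.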

4.3 Results and discussion

Table 4 compares the per-fold classification accuracy of the CNN and the FCNN on the MR dataset under cross-validation. The average accuracies of the emotion recognition system are 74.03% and 76.34% for the CNN and FCNN classifiers, respectively.

Table 4
Comparison of classification accuracy of CNN and FCNN for MR dataset using cross-validation

Cross-validation	CNN (%)	FCNN (%)
1	73.47	76.83
2	73.94	75.61
3	73.10	75.98
4	74.69	77.65
5	74.56	76.27
6	73.82	75.92
7	74.93	75.75
8	73.75	76.66
Average	74.03	76.34
STD	1.26	0.70

The classification accuracy is summarized in Table 5 for three models (CNN, FCNN and FCNN w/o FuzzConv) on the five datasets. For MR, Twitter-1 and Twitter-2, which are tested with 10-fold cross-validation, the standard deviation is also reported. The FCNN has the highest accuracy on every dataset. The proposed model outperforms the vanilla CNN by almost 2 percentage points on average, with a lower standard deviation. The performance of FCNN w/o FuzzConv is lower than that of the FCNN on all datasets and lower than that of the CNN on all datasets except SST-1. This result emphasizes the effectiveness of the fuzzy convolutional layer, combined with the fuzzification and defuzzification stages, in extracting higher-level features. The lower standard deviation of the FCNN under 10-fold cross-validation also indicates that the proposed FCNN is more stable than the conventional CNN model.

Table 5

Summary of classification accuracy of CNN, FCNN and FCNN w/o FuzzConv for all tests

Dataset	CNN (%)	FCNN (%)	FCNN w/o FuzzConv (%)
MR	74.03±1.26	76.83±0.70	73.95±0.92
Twitter-1	77.03±1.86	78.85±1.62	76.87±1.75
Twitter-2	62.70±1.67	63.02±1.65	62.56±1.76
SST-1	78.10	80.56	78.15
SST-2	37.12	39.16	36.91

To verify the robustness of the proposed FCNN model, we test it on a noisy dataset, created by replacing a number of randomly chosen words in every sentence of the dataset. We then train and test both models on this noisy dataset. From the results in Table 7, we conclude that, although both CNN and FCNN are affected by noise, the FCNN shows a smaller relative reduction in accuracy.
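
One possible way to inject this kind of noise is sketched below. The text does not state how the replacement words are chosen, so drawing them uniformly from a vocabulary is an assumption, as are the function name, the toy vocabulary and the example sentence.

```python
import random

def add_word_noise(sentence, vocabulary, n_replace, rng):
    """Return a copy of `sentence` with up to `n_replace` words replaced at random."""
    words = list(sentence)
    n = min(n_replace, len(words))  # short sentences cannot lose more words than they have
    for pos in rng.sample(range(len(words)), n):
        words[pos] = rng.choice(vocabulary)  # the replacement may equal the original word
    return words

rng = random.Random(0)
vocabulary = ["film", "plot", "actor", "dull", "bright", "slow"]  # hypothetical vocabulary
sentence = "the film is a quiet and moving drama".split()
noisy = add_word_noise(sentence, vocabulary, n_replace=2, rng=rng)
```

The sentence length is preserved, so the noisy sentences can be padded and embedded exactly like the clean ones.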

The sentences in Table 6 express ambiguous sentiments, which are sometimes confusing even for humans. The CNN misclassified the sentiment of these sentences, whereas the FCNN recognized it correctly. This can be explained by the higher-level features that the FCNN extracts from ambiguous data.

Table 6

Some samples of ambiguous sentences in the MR dataset

Sentences	Classified result
Its burns visuals characters and his punchy dialogue not his plot that carry way downtown	Ground truth: Positive; CNN: Negative; FCNN: Positive
A disturbing examination of what appears to be the definition of a “bad” police shooting	Ground truth: Positive; CNN: Negative; FCNN: Positive
The only excitement comes when the credits finally roll and you get to leave the theater	Ground truth: Negative; CNN: Positive; FCNN: Negative
Even accepting this in the right frame of mind can only provide it with so much leniency	Ground truth: Negative; CNN: Positive; FCNN: Negative
A sugar coated rocky whose valuable messages are forgotten 10 minutes after the last trombone honks	Ground truth: Negative; CNN: Positive; FCNN: Negative

Table 7

Comparison of the reduction in performance caused by adding noise to the MR dataset

# of replaced words	CNN Acc (%)	CNN % reduced	FCNN Acc (%)	FCNN % reduced
0	74.03	0	76.83	0
5	72.51	2.1	75.32	1.9
10	69.94	5.6	72.86	5.2
20	66.35	10.9	69.59	9.8

4.4 Feature set visualization

In order to analyze the contribution of fuzzy operators to feature extraction and classification, we visualize the feature sets extracted by the fuzzy convolutional layers and compare them with those of the CNN at the same level. Trained CNN and FCNN models for text sentiment analysis are applied to the same test dataset. We use the visual analytic method in [24]. First, we feed 1,066 samples of the test set to both the trained CNN and FCNN models and extract the output feature sets of the given layers. After extracting all the feature sets, we project each of them onto a 2D scatter plot, as shown in Fig. 2.

Fig.2

Projection of scatter plots of test input samples.

The projection uses t-distributed Stochastic Neighbor Embedding (t-SNE) [14], with the same parameters for both models. t-SNE is a popular method for visualizing high-dimensional data: it represents multi-dimensional data features in 2 or 3 dimensions. Each point on the plot corresponds to one sample and is labeled with the class it belongs to. The main reason for using t-SNE is to show how the proposed FCNN can better identify the hidden patterns in the data.

t-SNE converts the high-dimensional distances between data points into conditional probabilities that represent the similarity between data points under a Gaussian distribution. If two points are close to each other relative to the predefined variance, the probability that they are neighbors under the same Gaussian distribution is higher. t-SNE then seeks a low-dimensional embedding whose pairwise similarities match these probabilities, by minimizing the sum of the Kullback-Leibler divergences over the data points using gradient descent.
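
For reference, the quantities described above can be written out in the standard t-SNE formulation of [14]. The symbols are not used elsewhere in this paper: x_i are the high-dimensional feature vectors, y_i their low-dimensional images, σ_i the per-point Gaussian bandwidth and n the number of samples. The conditional similarity of x_j to x_i is

$p_{j|i} = \frac{\exp (- \| x_{i} - x_{j} \|^{2} / 2 σ_{i}^{2})}{\sum_{k \neq i} \exp (- \| x_{i} - x_{k} \|^{2} / 2 σ_{i}^{2})}$

which is symmetrized as

$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2 n}$

The similarity in the low-dimensional map uses a Student t-distribution with one degree of freedom:

$q_{ij} = \frac{(1 + \| y_{i} - y_{j} \|^{2})^{- 1}}{\sum_{k \neq l} (1 + \| y_{k} - y_{l} \|^{2})^{- 1}}$

and the map points y_i are found by gradient descent on the cost

$C = KL (P \| Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$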

By comparing the t-SNE distributions of CNN and FCNN in Fig. 3, it can be seen that both the local and global structure of the data are retained better by the proposed approach. In the FCNN, the data points are less sparse than in the CNN and form better-defined clusters. Figure 3 also confirms that the data points in the FCNN case are grouped into better-separated neighborhoods.

Fig.3

Projection scatter plots of output feature set extracted by convolutional layers in both CNN (first column) and FCNN (second column). Each model was tested using kernels of sizes 3 × d, 4 × d and 5 × d, where d is the dimension of the embedded vector.

The scatter plot of the projection of the input samples is shown in Fig. 2. After passing through the convolutional layers, higher-level feature sets are extracted, so the feature distributions of the classes become more separate, as shown in Fig. 3. Figure 4 shows the distribution of the final features of the convolutional stage of the two models. These plots indicate that the fuzzy convolutional filters extract features that are more useful for classification.

Fig.4

Projection scatter plots of feature set extracted by convolutional stage in both (a) CNN and (b) FCNN.

For a quantitative analysis of the separability of the output feature sets, we employ Silhouette analysis [25], which can be used to study the separation distance between the resulting clusters. The Silhouette plot displays a measure of how close each point in one cluster is to the points in the neighboring clusters, and thus provides a way to visually assess parameters such as the number of clusters. This measure has a range of [-1, 1]. Silhouette coefficients near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters, and negative values indicate that the sample might have been assigned to the wrong cluster. The Silhouette scores for each layer of the two models are compared in Table 8. Compared to the CNN, the fuzzy convolutional filters of the FCNN show a higher ability to extract discriminative features, so the output feature set is more separable and has a higher Silhouette score.
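
The Silhouette coefficient of a sample is s = (b - a) / max(a, b), where a is its mean distance to the other samples of its own cluster and b is its mean distance to the samples of the nearest other cluster. The sketch below averages s over all samples using Euclidean distance; the distance metric and the toy points are assumptions of this example, it needs at least two clusters, and it requires Python 3.8 or later for `math.dist`.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # The distance from p to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0.0 else (b - a) / denom)
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]  # toy 2D features
well_separated = silhouette_score(points, [0, 0, 1, 1])
badly_assigned = silhouette_score(points, [0, 1, 0, 1])
```

Labels that follow the two visible groups give a score close to +1, while labels that cut across the groups give a negative score.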

Table 8

Comparison of Silhouette score

Feature extracted at layer	Silhouette score
	CNN	FCNN
Input	8.5e-5	8.5e-5
Conv 3 × d	0.0146	0.126
Conv 4 × d	0.0131	0.135
Conv 5 × d	0.0106	0.130
Conv_out	0.0135	0.136

5 Conclusion

This paper draws on deep learning, fuzzy modeling and neural networks to propose a hybrid deep fuzzy-neural model, the Fuzzy Convolutional Neural Network (FCNN), which integrates fuzzy logic and the CNN for text sentiment classification. The FCNN generates more discriminative features and achieves better classification accuracy on emotional data than conventional approaches such as the CNN. The proposed model addresses the ambiguity of linguistic labels, which is relevant for emotion identification in sentiment analysis tasks. In future work, we intend to explore a deeper architecture by adding fuzzy convolution layers to the proposed framework. We also intend to integrate features from multiple modalities such as text, video and audio, and to use the proposed FCNN model for emotion recognition and sentiment analysis. In addition, we intend to develop an explainable AI that indicates the roles of hidden layers and neurons in producing an output. Finally, we are working on other fuzzy operators for deep fuzzy convolutional neural networks, as well as a deep fuzzy recurrent neural network.

Acknowledgments

This work was partly supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government(MSIT) (2016-0-00564, Development of Intelligent Interaction Technology Based on Context Awareness and Human Intention Understanding) (50%) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2016R1A2A2A05921679) (50%).

References

1. Wehrmann, Becker, Cagnini H.E.L. and Barros R.C., A character-based convolutional neural network for language-agnostic Twitter sentiment analysis, 2017 International Joint Conference on Neural Networks (IJCNN), (2017), pp. 2384–2391.
2. Chen C.L., Zhang C.Y., Chen and Gan, Fuzzy restricted Boltzmann machine for the enhancement of deep learning, IEEE Transactions on Fuzzy Systems 23 (2015), 2163–2173.
3. Deng, Ren, Kong, Bao and Dai, A hierarchical fused fuzzy deep neural network for data classification, IEEE Transactions on Fuzzy Systems 6706 (2016), 1.
4. Dos Santos C.N. and Gatti M., Deep convolutional neural networks for sentiment analysis of short texts, Coling 2014 (2014), 69–78.
5. Irsoy and Cardie, Opinion mining with deep recurrent neural networks, EMNLP (2014), 720–728.
6. Kim, Convolutional neural networks for sentence classification, arXiv preprint arXiv:1408.5882, (2014).
7. Koivo, Adaptive Neuro-Fuzzy Inference System - ANFIS, (2000), pp. 1–25.
8. LeCun, Bottou, Bengio and Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998), 2278–2323.
9. Lee, Lee S.-Y. and Park C.H., A new neuro-fuzzy identification model of nonlinear dynamic systems, International Journal of Approximate Reasoning 10 (1994), 29–44.
10. Lee S.C. and Lee E.T., Fuzzy Neural Network, (1975).
11. Liu, Sentiment analysis and opinion mining, Synthesis Lectures on Human Language Technologies 5 (2012), 1–167.
12. Liu and Zhang, Distributed representations of words and phrases and their compositionality, Mining Text Data, Springer U.S., (2012), 415–463.
13. Lopez M.M. and Kalita, Deep learning applied to NLP, arXiv preprint arXiv:1703.03091, (2017).
14. Maaten L.V.D. and Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008), 2579–2605.
15. Meng, Lu, Wang, Li, Jiang and Liu, Encoding source language with convolutional neural network for machine translation, arXiv preprint arXiv:1503.01838, (2015).
16. Mikolov, Sutskever, Chen, Corrado and Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems 2635 (2013), 3111–3119.
17. Mitra, Fuzzy multi-layer perceptron, Inferencing and Rule Generation 6 (1995), 51–63.
18. Nguyen T.-L., Kavuri and Lee, A convolutional neuro-fuzzy network for sentiment analysis, International Conference on Green and Human Information Technology 2018 (2018), 164–167.
19. Pal S.K. and Mitra, Multilayer perceptron, fuzzy sets, and classification, IEEE Transactions on Neural Networks 3 (1992), 683–697.
20. Pang and Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, in Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2005, pp. 115–124.
21. Pang and Lee, Opinion mining and sentiment analysis, in Information Retrieval 2 (2008), 1–135.
22. Popko and Weinstein, Fuzzy logic module of convolutional neural network for handwritten digits recognition, Journal of Physics: Conference Series 738 (2016), 012123.
23. …, Wu and Yuan, Sentiment analysis with the exploration of overall opinion sentences, in Proceedings of the International Joint Conference on Neural Networks, vol. 2017-May, 2017, pp. 2392–2399.
24. Rauber P.E., Fadel S.G., Falcao A.X. and Telea A.C., Visualizing the hidden activity of artificial neural networks, IEEE Transactions on Visualization and Computer Graphics 23 (2017), 101–110.
25. Rousseeuw P.J., Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics 20 (1987), 53–65.
