CLSA-CapsNet: Dependency based concept level sentiment analysis for text

Abstract

Extracting useful information from the immense amount of unstructured data on the internet is a critical issue in identifying public opinion, and it is difficult to extract relevant concepts from such huge amounts of data. Concept-level semantic parsing improves on word-based analysis because it preserves the semantic information associated with multi-word expressions. Such semantic representations offer a better understanding of textual data and can considerably increase the accuracy of many text mining tasks. The extraction of concepts from textual data is therefore a significant step in concept-level content analysis. We present CLSA-CapsNet, a method that extracts concepts from natural language text and then applies the extracted concepts in Capsule Networks (CapsNet). The integration of Concept Level Sentiment Analysis (CLSA) and CapsNet has not yet been applied to the hotel review dataset; this is the first attempt to use capsule networks to develop classification models for hotel reviews. The results demonstrate excellent performance, with a prediction accuracy of 86.6% for the CLSA-CapsNet model. We also compare our technique with other deep learning algorithms, such as RNN-LSTM. Overall, the success obtained by CLSA-CapsNet in this investigation highlights its suitability for the hotel review dataset. We also report experimental results in which the proposed CLSA-CapsNet model outperforms state-of-the-art models.

Keywords

Capsule network; Sentiment analysis; CNN; RNN; LSTM; Concept level

1 Introduction

Two kinds of textual data are available on the Web: facts and opinions. Facts are objective sentences that describe entities and do not express any sentiment, whereas opinions describe people’s feelings toward entities and are subjective in nature. Sentiment analysis research is increasing owing to an extensive range of business and other applications, such as securing multimedia data [6], e-health and e-learning [4, 5] in human-centric environments [3], and air quality modelling [7]. It covers both multimodal [1, 2] and text-based analysis. Understanding what others think has always been an important input to decision-making. People tend to ask friends for suggestions when buying a new gadget, but nowadays, with the evolution of the Web, people share their experiences, product reviews and feelings on social networks, blogs, forums, etc.

The importance of understanding and analysing these reviews and other online user-generated data has gradually increased. Purchase decisions [8] can be based on the merits and demerits of a product as reported by existing users. E-commerce companies have adopted this trend and analyse user reviews to improve their products and services. Identifying feedback in online movie reviews, finding the best gadgets and discovering the music that most users like are a few examples of sentiment analysis applications. In sentiment classification of product reviews, a document is assigned a polarity category, such as positive, neutral or negative, based on its descriptive content. Sentiment can also be expressed in different modalities, and recent research on multimodal analysis using deep learning is gaining popularity. Every day, massive amounts of data become available through public media platforms such as WhatsApp, microblogs, YouTube, Facebook and Twitter [47]. Machine learning techniques are also used in the medical field. For example, with the development of machine learning techniques it is now possible to identify insulin resistance without clinical procedures: in [9], insulin resistance is detected using non-invasive procedures and machine learning methods for individuals with high triglycerides and a low HDL-c ratio. Chakradar et al. [10] defined an ontological prototype of a diverse learning system for children with distinct learning challenges, used in the modelling of educational activities. In a preference-based learning environment, it serves the needs of students who require differentiated instruction, and it can help students identify which learning domains and criteria are most important to them.
The learning resources and questionnaires for diagnosing problems are structured and combined with multiple learning methodologies to provide a variety of learning settings, including case-based, game-based, practice-based, and visual-based learning environments [10].

Common-sense knowledge refers to the basic understanding that people acquire through experience [11]. Automatic common-sense reasoning applies reasoning algorithms, such as machine learning and predicate logic, to common-sense ontologies. In particular, common sense is required to classify sentiment in natural language text correctly. A typical example is that a short queue at a department store checkout is considered positive, whereas short legroom on a flight is considered negative. Machine learning is also widely used in other domains. For example, Kumar et al. [12] used Deep Neural Networks (DNNs) to fully automate brain tumour segmentation. The authors examined many different designs that use K-Nearest Neighbours (KNN), such as DNNs designed for image analysis, and implemented a novel KNN that differs from those used in computer vision. The proposed KNN handles both local neighbourhood features and more globally applicable features. In addition, a way of dealing with the imbalance among tumour types is offered. The efficiency of the approach is demonstrated by results computed on the BRATS 2017 test dataset. Images are used not only in sentiment analysis but also in forensic analysis, which is the process of identifying, analysing and authenticating the source and repercussions of a security incident and of violations of organisational rules. Manipulation of digital media has become common in recent years; as one of the key vehicles of communication, digital media, particularly photographs, can be easily altered. Current research in digital image forensics is centred on verifying image authenticity. Kumar et al. [13] demonstrated a digital image forensic technique based on determining the image’s light source(s).

In machine learning, if classification performance degrades after a particular feature is removed, that feature is relevant for classification. Irrelevant features do not contribute to classification [15, 16], and redundant features degrade classification performance. Redundancy is detected by measuring the relationships between features [17]. Because unnecessary features reduce the efficiency of machine learning methods, removing redundant and irrelevant features can improve classification overall. A performance analysis of artificial intelligence algorithms for big data analysis is presented in [18]. A key characteristic of a salient feature is that it carries information that allows the classifier to discriminate between classes and thus predict the class of a new sample [20]. Feature selection methods eliminate irrelevant features and noise, retaining only the features necessary for classification, so that a class attribute can be represented by a minimal number of salient features. Feature selection has two main benefits: (1) improved classification accuracy, together with a clearer view of the sentiment-bearing arguments [14] and of the salient class features; and (2) improved efficiency of machine learning methods [15, 20], since computation is faster on the reduced feature vector. Feature selection thus allows the data to be represented more efficiently in a lower-dimensional feature space. Separately, cloud computing provides compute, storage and application capabilities, offering a platform on which users can access computational resources on a “pay per use” basis.
The protocol proposed in [21] employs inter-cloud resource management, in which a cloud leader is chosen to interface with other clouds and make decisions on virtual machine (VM) migration.

Aspect-based sentiment analysis (ABSA) consists of three main subtasks: aspect term extraction, opinion term extraction and aspect sentiment classification, which are usually performed either separately or jointly. However, previous approaches do not exploit the interactions among the three subtasks or the readily available sentiment data, which limits their performance. To address these problems, Yunlong et al. [48] proposed the Iterative Multi-Knowledge Transfer Network (IMKTN) for complete aspect-based sentiment analysis. Feature selection techniques such as mutual information (MI) and information gain (IG) do not remove redundant features, whereas the minimum Redundancy Maximum Relevance (mRMR) technique selects only relevant, non-redundant features, which is an advantage for feature selection. Other selection techniques focus mainly on relevant features without considering redundant information. Chunning et al. [49] presented a capsule network with an EM routing method to construct feature word vectors and cluster features. In addition, an attention mechanism is incorporated into the capsule routing method to model the possible relationships between aspect terms and the context, and iterative routing allows a sentence to be encapsulated from a global perspective.
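To make the contrast between relevance-only ranking (MI/IG) and mRMR concrete, the following minimal sketch greedily selects features that maximise mutual information with the class minus the mean mutual information with the features already selected (the difference form of mRMR). It assumes small discrete features such as word-presence indicators; the function names and toy data are hypothetical and are not part of the proposed method.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR: relevance to the labels minus mean redundancy with selected features."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 1 = positive review, 0 = negative review.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "good":      [1, 1, 1, 0, 0, 0, 0, 0],  # word "good" present
    "good_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of "good" (fully redundant)
    "bad":       [0, 0, 0, 0, 1, 1, 0, 0],  # word "bad" present
}

# Relevance-only ranking keeps the redundant duplicate; mRMR replaces it with "bad".
by_relevance = sorted(features, key=lambda f: mutual_information(features[f], labels), reverse=True)[:2]
selected = mrmr(features, labels, k=2)
print(by_relevance)  # ['good', 'good_copy']
print(selected)      # ['good', 'bad']
```

The redundancy term is what penalises "good_copy": its mutual information with the already selected "good" equals the full entropy of the feature, which outweighs its relevance.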

We propose a new technique termed CLSA-CapsNet. It is based on a semantic parser that extracts concepts from natural language text using dependency relations between clauses. The machine learning model is trained on the features extracted by the two proposed algorithms, and documents are classified as favourable or unfavourable according to their concept patterns. This paper thus proposes a capsule network-based concept-level sentiment analysis model that addresses the shortcomings of existing approaches and shows excellent performance on two datasets. The method uses SenticNet to identify sentiment values. SenticNet values, together with the features produced by the part-of-speech, syntactic dependency and concept extraction algorithms, are fed into the CLSA-CapsNet model, which can combine features from different perspectives and better capture the internal associations within sentences. With the help of these effective features, the model is better able to handle semantically complex sentences, and its overall performance improves. The core contributions of this work are as follows:

We propose a part-of-speech-based bigram algorithm and an event concept extraction algorithm that extract concepts from natural language text based on dependency relations between clauses.

We design an attention mechanism into which the SenticNet polarity values and the outputs of the part-of-speech-based bigram and event concept extraction algorithms are fed, so that the computed word representation is associated only with the target aspect.

We integrate the proposed algorithms with a capsule network and apply them to the hotel review dataset, to which this combination has not previously been applied. We demonstrate the feasibility of the proposed CLSA-CapsNet through experiments on two real-world review datasets.

2 Related work

There are four main categories of recent approaches to measuring sentiment: statistical methods, keyword spotting, lexical affinity and concept-level approaches. Keyword spotting is the most naive approach and the most common one, owing to its accessibility and low cost. Text is classified into affect categories based on the presence of fairly unambiguous affect words such as happy, sad, bored and scared. The method has two weaknesses: poor recognition of affect when negation is involved, and reliance on surface features. Regarding the first weakness, while the approach can correctly classify the sentence “Yesterday was a wonderful day for me” as happy, it is likely to fail on a sentence such as “Yesterday was not at all a happy day for me”. Regarding the second weakness, the approach relies on the presence of obvious affect words, which are only surface features of the text. In practice, many sentences convey affect through their underlying meaning rather than through affect adjectives. For instance, the sentence “my husband just submitted for divorce, and he is trying to take away custody of my kids.” conveys strong emotions but contains no affect keywords, so it cannot be classified by keyword approaches.
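Both weaknesses can be reproduced with a minimal keyword spotter. The small affect lexicon below is hypothetical and only for illustration; the point is that a lookup of surface words ignores negation and misses affect that is carried by meaning.

```python
import re

# Hypothetical affect lexicon mapping unambiguous affect words to categories.
AFFECT_WORDS = {
    "happy": "happy", "wonderful": "happy",
    "sad": "sad", "bored": "bored", "scared": "scared",
}


def spot_affect(sentence):
    """Return the affect categories whose keywords occur in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sorted({AFFECT_WORDS[t] for t in tokens if t in AFFECT_WORDS})


print(spot_affect("Yesterday was a wonderful day for me"))          # ['happy']
print(spot_affect("Yesterday was not at all a happy day for me"))   # ['happy']: negation ignored
print(spot_affect("my husband just submitted for divorce, and he is "
                  "trying to take away custody of my kids."))       # []: no affect keyword
```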

Lexical affinity is more advanced than simple keyword spotting: rather than only detecting obvious affect words, it assigns arbitrary words a probabilistic “affinity” for a particular sentiment. For instance, “accident” may be assigned a 75% probability of indicating a negative affect, as in “car accident” or “being hurt by accident”. Such probabilities are typically learned from linguistic corpora. Although this approach often outperforms pure keyword spotting, it has two major flaws. First, because it operates only at the word level, lexical affinity can easily be tricked by sentences such as “I prevented an accident” and “I met my girlfriend by accident”. Second, lexical affinity probabilities are skewed towards text of a particular genre, as determined by the source of the linguistic corpora.
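The following minimal sketch estimates a word’s negative affinity as the fraction of labelled corpus documents containing it that are negative, then scores a sentence by averaging the affinities of its known words. The four-sentence corpus and the stopword list are hypothetical; they are chosen so that “accident” receives the 75% figure used above, and the output shows the word-level flaw.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "in", "on", "i", "my", "by"}  # hypothetical stopword list


def tokens(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def learn_affinity(corpus):
    """P(negative | word), estimated from document-level occurrence counts."""
    total, negative = Counter(), Counter()
    for text, label in corpus:
        for word in set(tokens(text)):
            total[word] += 1
            negative[word] += label == "neg"
    return {w: negative[w] / total[w] for w in total}


def negative_score(sentence, affinity):
    """Mean negative affinity of the sentence's known words (None if no word is known)."""
    known = [affinity[w] for w in tokens(sentence) if w in affinity]
    return sum(known) / len(known) if known else None


# Hypothetical labelled corpus.
corpus = [
    ("a car accident on the highway", "neg"),
    ("hurt in an accident", "neg"),
    ("a terrible accident", "neg"),
    ("a lucky accident", "pos"),
]
affinity = learn_affinity(corpus)
print(affinity["accident"])                                         # 0.75
print(negative_score("I prevented an accident", affinity))          # 0.75: still looks negative
print(negative_score("I met my girlfriend by accident", affinity))  # 0.75: still looks negative
```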

Matsumoto et al. [23] used the relationships between words and their subsequences for sentiment classification, applying text mining techniques to extract such relationships. Pang et al. [22] generated unigram and bigram features for a movie review dataset. Pak and Paroubek [24] used dependency tree subgraphs extracted from parsed sentences as features for sentiment classification; on a movie review dataset, an SVM classifier with the extracted subgraphs outperformed n-gram and bag-of-words features. Mongia et al. [25] used a supervised learning technique to predict the numbers of confirmed, recovered and death cases of COVID-19 in India, and also forecast death and recovery rates. Since governments urgently need tools for the early identification, prevention and mitigation of infectious diseases, these predictions are intended to support the relevant authorities in implementing effective preventive measures. Nakagawa et al. [26] used linguistic dependency trees for sentiment analysis and outperformed bag-of-words features. Xia et al. [27] investigated aspects of word relationships, including practical aspects of conditioning. URoffoff et al. [28] used the entry level to formally define different kinds of lexical symptoms. Rose et al. [29] examined syntactic features for sentiment differentiation and identified common dependency patterns that made mining more effective.

Adjectives, pronouns and nouns have also been considered as independent features for sentiment classification with machine learning methods. Mejova and Srinivasan [30] experimented with the effectiveness of POS tags as features and verified that features based on individual POS tags, such as adjectives and nouns, overlapped with other features. Building on Turney’s semantic orientation method [33] and Osgood’s theory of semantic differentiation [32], Mullen and Collier [31] proposed a new method to expand the feature set. Dang et al. [34] improved the efficiency of sentiment classification by combining a lexicon-based semantic orientation method with machine learning, and their newly proposed lexicon-based features improved sentiment classification. Other mining-related work focuses on conceptualisation. Gelfand et al. [35] established a semantic relationship graph method to retrieve concepts throughout a document, using associations among words, drawn from lexical knowledge, to create concepts.
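Turney’s semantic orientation [33], which the bi-tagged features below build on, compares how strongly a phrase co-occurs with “excellent” versus “poor”: SO(phrase) = log2[hits(phrase NEAR excellent) · hits(poor) / (hits(phrase NEAR poor) · hits(excellent))]. The sketch below assumes the hit counts are already available (no search engine is queried) and adds a small smoothing constant to the NEAR counts to avoid division by zero, in the spirit of Turney’s 0.01 adjustment; all counts shown are hypothetical.

```python
import math


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor, smoothing=0.01):
    """Turney-style semantic orientation from (hypothetical) co-occurrence hit counts.

    Positive values suggest positive orientation, negative values negative orientation.
    """
    return math.log2(
        ((near_excellent + smoothing) * hits_poor)
        / ((near_poor + smoothing) * hits_excellent)
    )


# A phrase seen 4 times as often near "excellent" as near "poor" (equal base counts).
print(semantic_orientation(8, 2, 100, 100, smoothing=0))  # 2.0
```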

3 Concept based sentiment analysis using senticnet and capsule networks

Sentic computing is a multidisciplinary approach to opinion mining that straddles the line between common-sense computing and affective computing [14], drawing on both the computer and social sciences to better recognise, interpret and process opinions, feelings and perceptions over the internet. Relying on common-sense reasoning and affective ontologies, sentic computing enables text to be analysed not only at the document, page or paragraph level but also at the sentence, clause and concept level [37]. Sentic Album is a concept-based online personal photo management system that intelligently annotates, organises and retrieves online personal photos using both data and metadata [19].

Specifically, sentic computing involves the use of artificial intelligence and Semantic Web techniques, for knowledge representation and inference; mathematics, for carrying out tasks such as graph mining and multi-dimensionality reduction; linguistics, for discourse analysis and pragmatics; psychology, for cognitive and affective modelling; sociology, for understanding social network dynamics and social influence; and finally ethics, for understanding related issues about the nature of mind and the creation of emotional machines.

The part-of-speech-based bigram algorithm (Fig. 1) identifies object concepts such as “fruits”, “bazaar” and “some vegetables”. To detect event concepts, the normalised verb chunks are linked with these object concepts. This is done by exploiting a parse graph that maps the multi-word expressions found in the knowledge base; that is, an unweighted directed graph is used to define complex concepts easily. Algorithm 2 then extracts event concepts such as “buy fruits”, “go to the bazaar”, “buy vegetables” and “buy some vegetables” for further processing. We also used a sentic computing system in this work, in which the semantic parser decomposes text into natural language concepts that are then interpreted and analysed using the capsule network technique.
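The verb-object pairing step described above can be sketched as follows. This is a simplified stand-in, not Algorithm 2 itself: it assumes the verb chunks and their object phrases have already been found (here they are written by hand for a sentence such as “I went to the bazaar to buy some fruits and vegetables”), that the quantifier “some” distributes over both coordinated objects, and that a hypothetical quantifier list decides which modifiers may be dropped to produce a more general concept.

```python
QUANTIFIERS = {"some", "many", "few"}  # hypothetical list of droppable modifiers


def event_concepts(verb_chunks):
    """verb_chunks: (verb phrase, [object phrases as word lists]) pairs."""
    concepts = []
    for verb, objects in verb_chunks:
        for obj in objects:
            concepts.append(" ".join([verb] + obj))
            stripped = [w for w in obj if w not in QUANTIFIERS]
            if stripped != obj:
                # Also emit the more general concept without the quantifier.
                concepts.append(" ".join([verb] + stripped))
    return concepts


chunks = [
    ("go to", [["the", "bazaar"]]),
    ("buy", [["some", "fruits"], ["some", "vegetables"]]),
]
print(event_concepts(chunks))
# ['go to the bazaar', 'buy some fruits', 'buy fruits', 'buy some vegetables', 'buy vegetables']
```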

3.1 Feature extraction

The machine learning model is built on several sets of features for sentiment analysis, which are described in the following subsections.

3.2 Unigrams

Unigrams are the individual words of the bag of words obtained by splitting the text on spaces and removing noise characters. For example, in “The Taj Mahal is beautiful.”, the words “The”, “Taj”, “Mahal”, “is” and “beautiful” are all unique unigrams.

3.3 Bigrams

Bigrams are pairs of successive words in a sentence. For example, in the sentence “BMW is an expensive car”, the bigram features are “BMW is”, “is an”, “an expensive” and “expensive car”. Bigrams can therefore capture some contextual information.
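Unigram and bigram extraction, as defined in the two subsections above, can be sketched in a few lines. The tokenisation rule (keeping runs of letters and digits, so punctuation counts as noise) is an assumption of this sketch.

```python
import re


def unigrams(sentence):
    """Split on spaces and noise characters, keeping word tokens in order."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def bigrams(sentence):
    """Pairs of successive words."""
    words = unigrams(sentence)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


print(unigrams("The Taj Mahal is beautiful."))  # ['The', 'Taj', 'Mahal', 'is', 'beautiful']
print(bigrams("BMW is an expensive car"))       # ['BMW is', 'is an', 'an expensive', 'expensive car']
```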

3.4 Bi-tagged

Bi-tagged features are fixed patterns extracted using part-of-speech (POS) information. Bigrams containing adjectives are considered more likely to carry strong sentiment, and adverbs tend to be descriptive [33, 36]. Hence, we use sentiment-rich features extracted from POS-based information. Turney [33] proposed two-word features in which one of the words is an adjective or an adverb, and we adopted the POS-based patterns from Turney’s study. We also observed that verbs contain sentiment information, so we additionally extract verb-based patterns, which yields more sentiment-bearing features. The rules for extracting Turney’s features and the verb-based features are given in Table 1 and Table 2, respectively.

Table 1
Rules for extracting Turney’s features

S. No	First word	Second word	Third word	Example
1.	JJ	–	JJ	Piercing_JJ, open_JJ
2.	RB / RBS / RBR	–	JJ	Perfectly_RB, minimized_JJ
3.	JJ	–	NN / NNS	Big_JJ, dog_NN
4.	RB / RBS / RBR	–	VB / VBG / VBD	In_RB, hand_VB
5.	NN / NNS	–	JJ	Time_NN, chased_JJ
6.	NN / NNS	CC	NN / NNS	House_NN, but_CC, yard_NN, mess_NN

Table 2

Rules for extracting verb-based features

S. No	First word	Second word	Third word	Example
1.	VBN	–	NN / NNS	Shot_VBN, hand_NN
2.	JJ	–	VBN / VBG	Black_JJ, eaten_VBN
3.	VV / VBG / VBP	–	JJ / JJS / JJR	Eating_VBG, white_JJ
4.	RB / RBR / RBS			finally_RB
5.	VV / VBG / VBP	CC	NN / NNS	tidy_VBG, but_CC, yard_NN, mess_NN
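Applying the Table 1 rules to an already tagged sentence can be sketched as a sliding-window match over POS tags. Two assumptions are made here: a “–” in the Second word column is read as “no middle word” (a two-word pattern), and the sentence is tagged by hand rather than by a real tagger. Table 2 patterns could be added to the same list.

```python
RB = {"RB", "RBR", "RBS"}
NN = {"NN", "NNS"}
VB = {"VB", "VBG", "VBD"}

# Table 1 rules as tag-set sequences ("–" in the Second word column = no middle word).
TABLE1_PATTERNS = [
    [{"JJ"}, {"JJ"}],
    [RB, {"JJ"}],
    [{"JJ"}, NN],
    [RB, VB],
    [NN, {"JJ"}],
    [NN, {"CC"}, NN],
]


def bi_tagged(tagged, patterns=TABLE1_PATTERNS):
    """Return word sequences whose POS tags match one of the patterns."""
    features = []
    for i in range(len(tagged)):
        for pattern in patterns:
            window = tagged[i:i + len(pattern)]
            if len(window) == len(pattern) and all(
                tag in allowed for (_, tag), allowed in zip(window, pattern)
            ):
                features.append("_".join(word for word, _ in window))
    return features


# Hand-tagged example sentence (hypothetical tags, Penn Treebank style).
sentence = [("a", "DT"), ("big", "JJ"), ("room", "NN"), ("and", "CC"),
            ("really", "RB"), ("clean", "JJ"), ("bathroom", "NN")]
print(bi_tagged(sentence))  # ['big_room', 'really_clean', 'clean_bathroom']
```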

3.5 Dependency features

Dependency features capture the semantic relationships between the words of a sentence for the sentiment analysis model. Such semantic patterns are very useful in subjectivity detection [24–26]. The dependency parse tree captures the information contained in the internal structure of a sentence.

3.6 Syntactic N-grams (sn-grams)

For sentiment classification [37], patterns derived from the syntactic structure of the text can be used as features by machine learning algorithms; these features are called syntactic n-grams (sn-grams). An sn-gram is built from the nodes [38] contained in a subtree of the sentence, so that the dependency tree is represented by vectors of sn-grams. Because they are formed from syntactically related words, sn-grams are less noisy and more explanatory than traditional n-grams. They go a long way towards semantically meaningful representations of a sentence, since they convey the relationships between its words. All possible sn-grams are generated by following the tree down to its leaf nodes. For example, from the sentence “the food is delicious.” the extracted features are: delicious food, food is delicious, food, the delicious. Syntactic n-grams thus convey semantic information to the machine learning model.
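A minimal sketch of sn-gram generation, following the common path-based definition (an sn-gram of size n is a downward head-to-dependent path of n nodes in the dependency tree). The tree for “the food is delicious” is written by hand in the copula style, where the adjective is the root, and words are assumed to be unique so they can serve as node keys. It reproduces “delicious food” from the list above; the remaining features listed there depend on the authors’ exact extraction scheme, which is not specified.

```python
# Hand-written dependency tree (head -> dependents); assumed, not parser output.
TREE = {
    "delicious": ["food", "is"],
    "food": ["the"],
    "is": [],
    "the": [],
}


def sn_grams(tree, n):
    """All downward head-to-dependent paths of exactly n nodes."""
    def paths_from(node, length):
        if length == 1:
            return [[node]]
        return [[node] + rest
                for child in tree[node]
                for rest in paths_from(child, length - 1)]

    return [" ".join(path) for node in tree for path in paths_from(node, n)]


print(sn_grams(TREE, 2))  # ['delicious food', 'delicious is', 'food the']
print(sn_grams(TREE, 3))  # ['delicious food the']
```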

4 Main algorithm (Proposed Approach)

Our processing system is represented in Figs. 1 and 2. First, we extract dependency relations among the terms of a sentence. Then we use these dependency relations to build complex relations and concepts that carry semantics. After these concepts have been extracted, we acquire common-sense knowledge about them and process them with capsule networks. Only the main concepts are retained. The Stanford dependency parser [39] was used to build the dependency relations between words. Figure 3 shows the proposed capsule network architecture for concept-level sentiment classification, and Fig. 4 shows the flow diagram of our polarity detection technique.

Fig. 1

Algorithm 1: Part-of-speech-based bigram algorithm.

Fig. 2

Algorithm 2: Algorithm for extracting event concepts.

Fig. 3

Proposed capsule network architecture for concept level sentiment classification.

Fig. 4

Flow diagram depicting our method to sentiment analysis at the concept level.

4.1 Semantic parsing

The semantic parser aims to decompose text into clauses and then into concepts. Knowledge of text semantics, together with some additional information (affect) associated with these semantics, is often sufficient to perform emotion recognition and polarity detection quickly.

The POS-based bigram algorithm for semantic parsing [1] is one of the well-known algorithms used to perform textual sentiment analysis. Figure 1 outlines this part-of-speech-based bigram algorithm.
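Because the body of Algorithm 1 appears only in Fig. 1, the following is a hypothetical simplification, not the authors’ algorithm: every noun is taken as a single-word concept, and a noun preceded by an adjective or another noun also forms a bigram concept. The input is assumed to be a hand-tagged sentence.

```python
def pos_bigram_concepts(tagged):
    """Hypothetical simplification: nouns, plus ADJ-NOUN and NOUN-NOUN bigram concepts."""
    concepts = []
    for i, (word, tag) in enumerate(tagged):
        if not tag.startswith("NN"):
            continue
        concepts.append(word)  # single-word noun concept
        if i > 0:
            prev_word, prev_tag = tagged[i - 1]
            if prev_tag.startswith("JJ") or prev_tag.startswith("NN"):
                concepts.append(f"{prev_word}_{word}")
    return concepts


sentence = [("the", "DT"), ("spacious", "JJ"), ("room", "NN"), ("and", "CC"),
            ("the", "DT"), ("swimming", "NN"), ("pool", "NN")]
print(pos_bigram_concepts(sentence))
# ['room', 'spacious_room', 'swimming', 'pool', 'swimming_pool']
```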

We used the Stanford dependency parser to build dependency relations between words. Examples of the part-of-speech output and the bracketed parse tree produced by the Stanford parser are given below.

4.1.1 Part-of-speech

4.1.2 Bracketed parse tree

Figure 5 shows an example of part-of-speech tags for a sentence, and Figure 6 shows an example of a bracketed parse tree. “Bracketed parse (tree)” is the name of the Stanford Parser’s output format. It is meant to be read as a tree in which words and phrases are nodes, nodes carry labels, and edges encode the hierarchy. The root node is usually a dummy ROOT node.
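To show how the bracketed format encodes the tree, the sketch below parses a Penn-style bracketed string into nested (label, children) tuples and reads the POS tags off the pre-terminal nodes. The input string is hand-written in the format the Stanford Parser prints, not actual parser output.

```python
import re


def parse_bracketed(text):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # leaf: a word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def pos_tags(tree):
    """Collect (word, tag) pairs from pre-terminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    return [pair for child in children if not isinstance(child, str)
            for pair in pos_tags(child)]


tree = parse_bracketed(
    "(ROOT (S (NP (DT The) (NN car)) (VP (VBZ looks) (ADJP (JJ beautiful))) (. .)))"
)
print(tree[0])         # ROOT
print(pos_tags(tree))  # [('The', 'DT'), ('car', 'NN'), ('looks', 'VBZ'), ('beautiful', 'JJ'), ('.', '.')]
```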

Fig. 5

An example of part-of-speech for a sentence.

Fig. 6

An example of bracketed parse tree.

4.2 Creation of concepts using dependency relations

Concept-based techniques such as sentic computing are considered more powerful than word-based techniques, which are limited because they do not take sentence structure into account. The bag-of-concepts model outperforms the bag-of-words model because it can represent the semantics of a natural language sentence: the bag-of-words technique splits a multi-word concept such as “grid computing” into two separate words, losing the meaning of the expression. However, although a bag model can extract the concepts “house”, “tidy”, “yard” and “mess” from a sentence such as “the house is tidy but the yard is a mess”, it cannot correctly determine the polarity of the sentence, which is negative.
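The “house is tidy but the yard is a mess” case can be made concrete with a minimal sketch. The two polarity values are hypothetical (they are not taken from SenticNet), and the adversative rule, in which the clause after “but” determines the polarity, illustrates the kind of structural rule that sentic patterns encode; the exact rule used in this work is not specified here.

```python
# Hypothetical polarity values for illustration only (not SenticNet scores).
LEXICON = {"tidy": 0.5, "mess": -0.6}


def clause_polarity(words):
    """Bag model: mean polarity of the known words."""
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def sentence_polarity(sentence):
    """Adversative rule: if 'but' occurs, the clause after it decides the polarity."""
    words = sentence.lower().replace(".", "").split()
    if "but" in words:
        return clause_polarity(words[words.index("but") + 1:])
    return clause_polarity(words)


text = "the house is tidy but the yard is a mess"
bag = clause_polarity(text.split())
print(round(bag, 2))            # -0.05: nearly neutral
print(sentence_polarity(text))  # -0.6: negative, as expected
```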

To this end, we continue to develop and use structured linguistic patterns, and we create a methodology for extracting concepts from the syntactic structures of sentences. A dependency relation consists of three elements: the relation type, the head word and the dependent word. The relation type specifies the syntactic relationship between the two words in the sentence. The head carries the pair’s important syntactic and semantic features, and the word that depends on the head is called the dependent [40, 41].
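The three-element structure of a dependency relation maps naturally onto a small record type. The sketch below parses lines in the textual format in which the Stanford parser prints typed dependencies, such as `nsubj(tedious-4, cartoon-2)`, where the trailing number is the token position. The example lines are hand-written for “The cartoon is tedious”, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    relation: str   # relation type, e.g. "nsubj"
    head: str       # head word
    dependent: str  # dependent word


# Matches e.g. "nsubj(tedious-4, cartoon-2)"; the "-N" suffix is the token index.
LINE = re.compile(r"^([\w:]+)\((.+)-\d+, (.+)-\d+\)$")


def parse_dependencies(text):
    """Parse one typed dependency per line, skipping lines that do not match."""
    deps = []
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match:
            deps.append(Dependency(*match.groups()))
    return deps


parsed = parse_dependencies("""
det(cartoon-2, The-1)
nsubj(tedious-4, cartoon-2)
cop(tedious-4, is-3)
root(ROOT-0, tedious-4)
""")
for dep in parsed:
    print(dep)
```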

4.3 Universal Rules

4.3.1 Subject noun rule

Trigger: when the active token is found to be the syntactic subject of a verb.

Behaviour: the rule is used to compute the polarity of the relation when the multi-word concept (s-t) is found in the sentence.

Example: The cartoon is tedious.

In the above example, “cartoon” is in a subject relationship with “tedious”. As a result, the concept (tedious cartoon) is extracted.
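A minimal sketch of the subject noun rule over (relation, head, dependent) triples. The triples are hand-written in Stanford Dependencies style, where the copula construction makes the adjective “tedious” the head of nsubj; joining head and dependent in that order is an assumption chosen to reproduce the concept in the example.

```python
def subject_noun_concepts(deps):
    """For every nsubj(head, dependent) triple, extract the concept 'head dependent'."""
    return [f"{head} {dep}" for rel, head, dep in deps if rel == "nsubj"]


# Hand-written triples for "The cartoon is tedious." (assumed, not parser output).
deps = [("det", "cartoon", "The"), ("nsubj", "tedious", "cartoon"), ("cop", "tedious", "is")]
print(subject_noun_concepts(deps))  # ['tedious cartoon']
```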

4.3.2 Conjunctions

Trigger: when parallel elements are connected by a conjunction.

Behaviour: if a word h is connected to a word t by a conjunction, the concept (h-t) is extracted.

Example: We are campers, tired but happy.

4.3.3 Joint subject noun and adjective complement rule

Trigger: when the active token is found to be the syntactic subject of a verb and the verb is in an adjective complement relation with another word.

Behaviour: if a word h is in a subject noun relationship with a word t, and t is in an adjective complement relationship with a word w, then the concept (w-h) is extracted.

Example: The car looks beautiful.

In the above example, “car” is the subject of “looks”, and “looks” is in an adjective complement relationship with “beautiful”. So, the concept (beautiful car) is extracted.
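The joint rule can be sketched by first collecting subject relations and adjective complements per verb, then combining them. The triples for “The car looks beautiful.” are hand-written in Stanford Dependencies style (nsubj and acomp); variable names follow the rule above (h = subject noun, t = verb, w = complement).

```python
def joint_subject_acomp_concepts(deps):
    """For nsubj(t, h) and acomp(t, w), extract the concept 'w h'."""
    complements = {}
    for rel, verb, word in deps:
        if rel == "acomp":
            complements.setdefault(verb, []).append(word)
    return [f"{w} {h}"
            for rel, t, h in deps if rel == "nsubj"
            for w in complements.get(t, [])]


# Hand-written triples for "The car looks beautiful." (assumed, not parser output).
deps = [("det", "car", "The"), ("nsubj", "looks", "car"), ("acomp", "looks", "beautiful")]
print(joint_subject_acomp_concepts(deps))  # ['beautiful car']
```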

4.3.4 Direct nominal objects

Trigger: when the active token is the head verb of a direct object dependency relation.

Behaviour: if a word h is in a direct object relationship with a word t, the concept (h-t) is extracted.

Example: He played the game in 3D.

In the above example, the method extracts the concept (play game).
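A minimal sketch of the direct object rule. Producing “play game” from “played” requires lemmatisation; the one-entry lemma table below is a hypothetical stand-in for a real lemmatiser, and the triples are hand-written for “He played the game in 3D.”

```python
# Hypothetical lemma table; a real system would use a lemmatiser.
LEMMAS = {"played": "play"}


def direct_object_concepts(deps):
    """For every dobj(h, t) triple, extract the concept 'lemma(h) t'."""
    return [f"{LEMMAS.get(h, h)} {t}" for rel, h, t in deps if rel == "dobj"]


# Hand-written triples for "He played the game in 3D." (assumed, not parser output).
deps = [("nsubj", "played", "He"), ("dobj", "played", "game"),
        ("det", "game", "the"), ("prep_in", "played", "3D")]
print(direct_object_concepts(deps))  # ['play game']
```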

4.3.5 Negation rule

Trigger: when a word is negated. Negation is a key element of natural language text and generally reverses the meaning of the text, so the rule checks whether a word is negated.

Behaviour: if a word h is negated by a negation marker t, the concept (t-h) is extracted.

Example: They don’t like to park the vehicle in a no-parking area.

When the multi-word concept (s-t) is found in the sentence, it is used to compute the polarity of the relation.
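A minimal sketch of the negation rule over hand-written triples for the example sentence, where the parser would attach the marker “n’t” to “like” through a neg relation. Normalising contracted markers to “not” is an assumption of this sketch.

```python
# Assumed normalisation of negation markers.
NEGATION_MARKERS = {"not": "not", "n't": "not", "never": "never", "no": "no"}


def negation_concepts(deps):
    """For every neg(h, t) triple, extract the concept 'marker h'."""
    return [f"{NEGATION_MARKERS.get(t.lower(), t)} {h}" for rel, h, t in deps if rel == "neg"]


# Hand-written triples for "They don't like to park ..." (assumed, not parser output).
deps = [("nsubj", "like", "They"), ("aux", "like", "do"),
        ("neg", "like", "n't"), ("xcomp", "like", "park")]
print(negation_concepts(deps))  # ['not like']
```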

4.3.6 Adjectival complement

Trigger: when a verb takes an adjectival complement, i.e. an adjectival phrase that acts as its complement.

Behaviour: the rule is used to compute the polarity of the relation when a verb takes an adjectival complement.

Example: He looks very beautiful.

4.3.7 Adverbial and adjectival modifiers

Trigger: when the active token is modified by an adverb or an adjective.

Behaviour: if a word w is modified by a word t, the concept (t-w) is extracted.

Example: John is a good boy.

In the above example, the concept “good boy” is extracted.
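A minimal sketch of the modifier rule, treating amod (adjectival) and advmod (adverbial) relations alike; the triples for “John is a good boy.” are hand-written in Stanford Dependencies style.

```python
MODIFIER_RELATIONS = {"amod", "advmod"}


def modifier_concepts(deps):
    """If a word w is modified by a word t, extract the concept 't w'."""
    return [f"{t} {w}" for rel, w, t in deps if rel in MODIFIER_RELATIONS]


# Hand-written triples for "John is a good boy." (assumed, not parser output).
deps = [("nsubj", "boy", "John"), ("cop", "boy", "is"),
        ("det", "boy", "a"), ("amod", "boy", "good")]
print(modifier_concepts(deps))  # ['good boy']
```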

4.3.8 Relative pronoun in the adjectival clause with subordinating conjunctions

Trigger: when a clause is joined to the main-sentence element that it modifies, either through a relative pronoun or through a subordinating conjunction.

Behaviour: the parser creates a (prep_with) dependency relation between the verb “hit” and the noun “mallet”.

Example: Peter hit John with a mallet.

4.3.9 Noun compound modifier

Trigger: when the active token is the head of a noun compound, i.e. a noun phrase made up of multiple nouns. A noun compound modifier of an NP is any noun that modifies the head noun.

Behaviour: if a noun w is modified by another noun t, the complex concept (t-w) is extracted.

Example: Camera life of the laptop is not worthy.
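For noun compounds, a head may have several noun modifiers, so the sketch below keeps token positions and assembles all modifiers of a head in sentence order before the head. It accepts both the Stanford Dependencies label nn and the Universal Dependencies label compound; the five-field tuples and the “hotel room service” example are hypothetical.

```python
COMPOUND_RELATIONS = {"nn", "compound"}


def noun_compound_concepts(deps):
    """deps: (relation, head, head_index, dependent, dependent_index) tuples."""
    modifiers = {}
    for rel, head, h_idx, dep, d_idx in deps:
        if rel in COMPOUND_RELATIONS:
            modifiers.setdefault((head, h_idx), []).append((d_idx, dep))
    concepts = []
    for (head, _), mods in modifiers.items():
        words = [word for _, word in sorted(mods)] + [head]  # modifiers in sentence order
        concepts.append(" ".join(words))
    return concepts


# Hand-written relations (assumed, not parser output).
laptop = [("nn", "life", 2, "Camera", 1), ("prep_of", "life", 2, "laptop", 5)]
hotel = [("nn", "service", 3, "hotel", 1), ("nn", "service", 3, "room", 2)]
print(noun_compound_concepts(laptop))  # ['Camera life']
print(noun_compound_concepts(hotel))   # ['hotel room service']
```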

4.3.10 Single word concepts

A sentence is a collection of words, each of which belongs to a part of speech such as verb, noun, adverb or adjective. These single words are also taken as concepts and used to extract relevant information from the context.

4.3.11 Adverbial clause modifier

This rule handles complete clauses that act as modifiers of a verb. Standard examples include conditional and temporal clauses.

Trigger: when the active token is a verb modified by an adverbial clause. The dependent is the head of the modifying clause.

Behaviour: if SenticNet contains the bi-word concept (h-d), its polarity is used to calculate the score. Otherwise, the rule assigns polarity by considering the dependent d first, then the head h.

Example: computer work slows down when the best games are playing.
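The behaviour of this rule can be sketched as a lookup with fallbacks, under one plausible reading of “considering the dependent d first, then the head h”: use the bi-word concept if SenticNet knows it, otherwise the dependent’s polarity, otherwise the head’s. The lexicon entries are hypothetical, and SenticNet-style underscore notation is assumed for multi-word concepts.

```python
def advcl_polarity(head, dependent, lexicon):
    """Polarity for advcl(head, dependent): bi-word concept first, then dependent, then head."""
    for concept in (f"{head}_{dependent}", dependent, head):
        if concept in lexicon:
            return lexicon[concept]
    return None  # no polarity information available


# Hypothetical lexicons for advcl(slows, playing) in the example sentence.
without_biword = {"playing": 0.4, "slows": -0.3}
with_biword = {"slows_playing": -0.5, "playing": 0.4, "slows": -0.3}
print(advcl_polarity("slows", "playing", without_biword))  # 0.4
print(advcl_polarity("slows", "playing", with_biword))     # -0.5
```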

4.3.12 An appositive phrase

An appositive phrase consists of an appositive and its modifiers. Appositives may be either essential or non-essential. An essential appositive provides the information needed to identify the noun or pronoun it refers to, and the sentence would be incomplete without it. A non-essential appositive gives additional but unimportant detail about a noun or pronoun in the sentence. Non-essential appositives are set off by commas.

Example: The dog who lives next door, Frankie, enjoys going for walks in the park.

Sentiment (from the SenticNet demo at Senticnet.net/demos/): negative

Concept extractor output:

frankie_dog

dog

go_for_walk

walk

walk_in_park

park

4.4 SenticNet

The emotion classification system is designed to take as input a natural language concept represented in an M-dimensional space, and to give as output the corresponding sentic levels (Figure 7) for the four affective dimensions: sensitivity, pleasantness, aptitude and attention. The dimensionality M of the input space stems from the architecture of AffectiveSpace. As for the output, each affective dimension is characterised by a value in the range [-1, 1] that reflects the intensity of the emotion felt.

Fig. 7

Flowchart of the sentence-level polarity detection method. The natural-language text is first broken down into concepts using the Stanford CoreNLP parser. Sentic patterns are applied if these concepts are found in SenticNet; if none of the concepts are available in SenticNet, the capsule network classifier is used instead.
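The decision flow in Figure 7 can be sketched as follows. This is a simplified illustration, not the authors' pipeline: averaging the polarities of known concepts stands in for the full set of sentic patterns, a dictionary stands in for SenticNet, and `stub_classifier` is a hypothetical placeholder for the trained capsule network.

```python
from typing import Callable, Dict, List


def sentence_polarity(
    concepts: List[str],
    senticnet: Dict[str, float],
    fallback_classifier: Callable[[List[str]], float],
) -> float:
    """Use the lexicon when any concept is known; otherwise use the classifier."""
    known = [senticnet[c] for c in concepts if c in senticnet]
    if known:
        # Simplification: average the known polarities instead of
        # applying the dependency-based sentic patterns.
        return sum(known) / len(known)
    return fallback_classifier(concepts)


def stub_classifier(concepts: List[str]) -> float:
    """Placeholder for the trained capsule network; always predicts negative."""
    return -1.0


toy_senticnet = {"walk": 0.2, "park": 0.4}  # made-up scores
print(sentence_polarity(["walk", "park", "frankie_dog"], toy_senticnet, stub_classifier))
print(sentence_polarity(["frankie_dog"], toy_senticnet, stub_classifier))
```

Passing the classifier as a callable keeps the lexicon path and the learned fallback independent, so either can be swapped out without touching the other.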

4.5 Capsule networks

Capsule networks (Figures 8 and 9) are a relatively new type of deep neural network architecture. Since being proposed by Hinton and colleagues [41], they have shown promising results in many fields, particularly computer vision and natural language processing. A capsule is a group of neurons whose activities represent different properties of a particular type of entity, such as its location, size, and hue; together, these neurons form an activity vector with one element per neuron.

Fig. 8

Architecture of capsule networks (CapsNet).

Fig. 9

Routing algorithm [41].

A Capsule Neural Network (CapsNet) is a machine learning framework for modelling hierarchical relationships more effectively; the approach attempts to mimic biological neural organisation more closely [41]. The activity vector of each capsule holds the instantiation parameters of the entity it represents, such as its size, colour, and location. CapsNets replace the scalar-output feature detectors of conventional convolutional networks with vector-output capsules, and replace max-pooling with routing-by-agreement. The Conv-CapsNet has four layers: a hidden feature layer, a convolutional layer, a PrimaryCaps layer, and a DigitCaps layer. The PrimaryCaps layer consists of eight capsules u_m, each containing an eight-dimensional feature vector. We use Eq. (1) to calculate the prediction $\hat{u}_{n|m}$ that each capsule u_m in PrimaryCaps contributes to capsule v_n in DigitCaps. Because capsules are independent, agreement among many of them greatly increases the probability of a correct detection.

$\hat{u}_{n|m} = W_{mn} u_{m}$ (1)

The length of a capsule's output vector should represent the probability that the entity it represents is present in the current input. We therefore apply a non-linear "squashing" function, which shrinks short vectors to almost zero length and long vectors to a length slightly below 1. We rely on discriminative learning to make good use of this non-linearity. $v_{n} = \frac{\| s_{n} \|^{2}}{1 + \| s_{n} \|^{2}} \frac{s_{n}}{\| s_{n} \|}$ (2)

Here, the vector output of the capsule n is denoted by v_n and the total input is denoted by s_n.
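Eq. (2) translates directly into NumPy. The sketch below adds a small constant `eps` (an assumption not present in Eq. (2)) inside the square root so that a zero input does not cause a division by zero.

```python
import numpy as np


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Eq. (2): keep the direction of s and map its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    # eps guards against division by zero for all-zero inputs.
    return scale * s / np.sqrt(sq_norm + eps)


short_v = squash(np.array([0.01, 0.0]))
long_v = squash(np.array([100.0, 0.0]))
print(np.linalg.norm(short_v), np.linalg.norm(long_v))
```

A vector of length 0.01 is shrunk to a length of about 1e-4, while a vector of length 100 ends up just below 1, which is the behaviour described above.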

For all but the first layer of capsules, the total input s_n of a capsule is a weighted sum over all "prediction vectors" $\hat{u}_{n|m}$, each obtained by multiplying the output u_m of a capsule in the layer below by a weight matrix W_mn. $s_{n} = \sum_{m} c_{mn} \hat{u}_{n|m}, \quad \hat{u}_{n|m} = W_{mn} u_{m}$ (3)

The coupling coefficients c_mn are determined by the iterative dynamic routing process. The coupling coefficients between capsule m and all the capsules in the layer above sum to one and are computed by a "routing softmax" whose initial logits b_mn are the log prior probabilities that capsule m should be coupled to capsule n. $c_{mn} = \frac{\exp(b_{mn})}{\sum_{o} \exp(b_{mo})}$ (4)

The log priors can be learned discriminatively at the same time as all the other weights. They depend on the location and type of the two capsules, but not on the current input. The initial coupling coefficients are then refined iteratively by measuring the agreement between the current output v_n of each capsule n in the layer above and the prediction $\hat{u}_{n|m}$ made by capsule m. The agreement is the scalar product $a_{mn} = v_{n} \cdot \hat{u}_{n|m}$. This agreement is treated as if it were a log likelihood and added to the initial logit b_mn before the new values of all the coupling coefficients linking capsule m to higher-level capsules are computed. In a convolutional capsule layer, each capsule outputs a vector to each capsule type in the layer above, using a different transformation matrix for each grid member and each capsule type.
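Equations (1) to (4) and the agreement update combine into the routing-by-agreement loop. The following NumPy sketch covers one fully connected capsule layer under several assumptions: the logits start at zero (as in the routing procedure of [41]) instead of learned log priors, three iterations are used, and the shapes (eight 8-dimensional input capsules, two 16-dimensional output capsules) are illustrative choices rather than the configuration used in the experiments.

```python
import numpy as np


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Eq. (2), with eps added to avoid division by zero."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u: np.ndarray, W: np.ndarray, iterations: int = 3) -> np.ndarray:
    """Routing-by-agreement between two capsule layers.

    u: (M, d_in) outputs u_m of the lower-level capsules.
    W: (M, N, d_out, d_in) weight matrices W_mn.
    Returns v: (N, d_out) outputs v_n of the higher-level capsules.
    """
    # Eq. (1)/(3): prediction vectors u_hat[m, n] = W_mn @ u_m.
    u_hat = np.einsum("mnoi,mi->mno", W, u)
    M, N, _ = u_hat.shape
    b = np.zeros((M, N))  # initial logits b_mn
    for _ in range(iterations):
        # Eq. (4): routing softmax over the higher-level capsules n.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        # Eq. (3): s_n = sum_m c_mn * u_hat_{n|m}.
        s = np.einsum("mn,mno->no", c, u_hat)
        v = squash(s)  # Eq. (2)
        # Agreement a_mn = v_n . u_hat_{n|m} is added to the logits.
        b = b + np.einsum("no,mno->mn", v, u_hat)
    return v


rng = np.random.default_rng(0)
u = rng.normal(size=(8, 8))                 # eight 8-D primary capsules
W = 0.1 * rng.normal(size=(8, 2, 16, 8))    # two 16-D output capsules
v = dynamic_routing(u, W)
print(v.shape, np.linalg.norm(v, axis=1))
```

In a classifier, the length of each output capsule would be read as the probability of the corresponding class, for example positive or negative sentiment.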

5 Machine learning support

Although the proposed model is more effective than the bag-of-words technique, it remains limited by the available NLP research and by the collection of dependency rules it relies on. NLP applications use sentiment analysis to extract the emotions associated with raw text, such as tweets or Facebook posts collected over time, in order to gauge the sentiment of an audience. We extract concepts to build a richer understanding of the text, and when no sentic pattern matches, we fall back on capsule networks to perform the sentiment classification. We used two well-known sentiment analysis datasets, and the capsule network classifies the sentiment of each text as either positive or negative.

5.1 Dataset used

5.1.1 Movie review dataset

We used the movie review corpus introduced by Pang and Lee [42]. It contains 1,000 positive and 1,000 negative movie reviews written by professional movie critics. The dataset was collected from rottentomatoes.com; all text was lemmatized and lowercased, and HTML elements were removed. Each review was labelled manually by Pang and Lee. Socher et al. [43] later annotated this dataset at the sentence level: they extracted 11,855 sentences from the movie reviews and labelled them using a fine-grained inventory of five sentiment labels (strongly positive, positive, neutral, negative, and strongly negative). In this work, we focus on binary classification.

5.1.2 Trip advisor hotel review dataset

We also conducted experiments on a dataset of 8,000 TripAdvisor hotel reviews to evaluate the performance of sentiment analysis. This TripAdvisor dataset [45], provided by [44], has a vocabulary of nearly 4,750 words. As preprocessing, we removed stop words and punctuation to reduce the complexity of the original text.

5.2 Representation of the feature set

5.2.1 Features of common-sense knowledge

Common-sense knowledge features consist of concepts represented by means of AffectiveSpace. Concepts extracted from the text by the semantic parser are encoded as 300-dimensional real-valued vectors, which are then aggregated into a single vector representing the sentence by coordinate-wise summation: $x_{i} = \sum_{j = 1}^{N} x_{ij}$

where N is the number of concepts extracted from the sentence, x_i is the i-th coordinate of the sentence feature vector (for i = 1 to 100), and x_ij is the i-th coordinate of the vector of the j-th concept [46].
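The coordinate-wise summation amounts to summing the concept vectors row by row. The sketch below uses made-up 3-dimensional vectors in place of real AffectiveSpace embeddings; `sentence_vector` is a hypothetical helper name.

```python
import numpy as np


def sentence_vector(concept_vectors: np.ndarray) -> np.ndarray:
    """x_i = sum_j x_ij: sum the N concept vectors (rows) coordinate by coordinate."""
    return concept_vectors.sum(axis=0)


# Two toy concept vectors (rows); real AffectiveSpace vectors are much longer.
concepts = np.array([[0.1, -0.2, 0.3],
                     [0.4, 0.0, -0.1]])
print(sentence_vector(concepts))
```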

5.2.2 Features of part of speech

The part-of-speech feature is given by the number of nouns, adverbs, conjunctions, interjections, and adjectives in the sentence.
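This count feature can be computed from a POS-tagged sentence. The sketch assumes Penn Treebank tags and a hypothetical mapping from tag prefixes to the five categories. Only coordinating conjunctions (`CC`) are counted, because the Penn tag for subordinating conjunctions (`IN`) is shared with prepositions.

```python
from collections import Counter
from typing import List, Tuple

# Assumed mapping from Penn Treebank tag prefixes to the counted categories.
CATEGORY_PREFIXES = {
    "noun": ("NN",),          # NN, NNS, NNP, NNPS
    "adverb": ("RB",),        # RB, RBR, RBS
    "conjunction": ("CC",),   # coordinating conjunctions only
    "interjection": ("UH",),
    "adjective": ("JJ",),     # JJ, JJR, JJS
}


def pos_feature(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return [nouns, adverbs, conjunctions, interjections, adjectives]."""
    counts: Counter = Counter()
    for _, tag in tagged:
        for category, prefixes in CATEGORY_PREFIXES.items():
            if tag.startswith(prefixes):
                counts[category] += 1
    return [counts[c] for c in CATEGORY_PREFIXES]


sentence = [("The", "DT"), ("hotel", "NN"), ("was", "VBD"), ("very", "RB"),
            ("clean", "JJ"), ("and", "CC"), ("quiet", "JJ")]
print(pos_feature(sentence))  # [1, 1, 1, 0, 2]
```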

5.2.3 Features of senticnet

The polarity score of each concept extracted from the sentence was obtained from SenticNet, and these scores were combined into a single scalar feature.
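The text does not say how the concept polarities are combined. The sketch below assumes they are summed and that concepts missing from the lexicon contribute zero; the dictionary and its scores are made-up stand-ins for SenticNet.

```python
from typing import Dict, List


def senticnet_feature(concepts: List[str], lexicon: Dict[str, float]) -> float:
    """Sum the polarities of known concepts (summation is an assumption)."""
    return sum(lexicon.get(c, 0.0) for c in concepts)


toy_lexicon = {"walk": 0.25, "park": 0.5}  # invented scores
print(senticnet_feature(["walk", "park", "frankie_dog"], toy_lexicon))  # 0.75
```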

5.2.4 Feature of modification

This is a single binary feature. We obtained the dependency tree of each sentence from the dependency parser and used it to decide whether any word is modified by an adjective, noun, conjunction, interjection, or adverb. The feature was set to 1 if any such modification relation was found in the sentence, and to 0 otherwise.
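The binary modification feature can be sketched as a scan over dependency triples. The set of relation labels treated as "modification" is an assumption (Stanford-style `amod`, `advmod`, `nn`, and the newer `compound` label); the paper does not list the exact relations it checks.

```python
from typing import Iterable, Tuple

# Assumed dependency labels that count as a modification relation.
MODIFIER_RELATIONS = {"amod", "advmod", "nn", "compound"}


def modification_feature(triples: Iterable[Tuple[str, str, str]]) -> int:
    """Return 1 if any (relation, head, dependent) triple is a modifier relation."""
    return int(any(rel in MODIFIER_RELATIONS for rel, _, _ in triples))


parse = [("nsubj", "slow", "computer"), ("amod", "game", "best")]
print(modification_feature(parse))  # 1
```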

5.2.5 Features of negation

Similarly, the negation feature is a single binary feature that records whether the sentence contains any negation. This matters because negation can reverse the polarity of a phrase.
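A minimal token-based sketch of the negation feature follows. The list of negation cues is a hypothetical choice; a dependency-based variant could instead check for a `neg` relation in the parse.

```python
from typing import Iterable

# Hypothetical negation cue list; real systems often use larger lexicons.
NEGATION_CUES = {"not", "no", "never", "n't", "nothing", "nobody", "none", "neither", "nor"}


def negation_feature(tokens: Iterable[str]) -> int:
    """Return 1 if any token is a negation cue, else 0."""
    return int(any(t.lower() in NEGATION_CUES for t in tokens))


print(negation_feature(["The", "room", "was", "n't", "clean"]))  # 1
```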

5.3 Classification methods

LSTM-RNN and capsule network classifiers were trained on the sentence feature set of the training portion of the movie review dataset. The capsule network outperformed the LSTM-RNN in both accuracy and processing time: on the test portion of the movie review dataset, we achieved 88.5% accuracy with the capsule network, compared with 79.4% with the LSTM-RNN classifier. When the same method was tested on the sentences of the TripAdvisor hotel review dataset, it achieved 86.6% accuracy with the capsule network and 81.7% with the LSTM-RNN. If SenticNet cannot interpret a sentence, the trained capsule network classifier is used to make a better estimate of the sentence's polarity from the available features.

6 Experimental results and discussion

The proposed approach was evaluated on two datasets: the movie review dataset described in Section 5.1.1 and the TripAdvisor dataset described in Section 5.1.2. As the results below show, the best accuracy is achieved with the capsule network and deep learning approach. After preprocessing and semantic extension, the review text was typically shorter than 300 characters. Accordingly, the pre-trained Google News word vectors (GoogleNews-vectors-negative300.bin) [41], which provide 300-dimensional embeddings, were chosen to generate a text vector for each review.
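The text does not specify how word vectors are combined into a text vector. One common choice, assumed here, is to average the vectors of the in-vocabulary tokens. The function accepts any object supporting `in` and `[]`, so the toy dictionary below could be replaced by gensim `KeyedVectors` loaded with `KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)`; the toy vectors themselves are invented.

```python
import numpy as np


def text_vector(tokens, vectors, dim: int = 300) -> np.ndarray:
    """Average the vectors of known tokens; return zeros if none are known.

    `vectors` may be a dict or a gensim KeyedVectors object, e.g.
    KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True).
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0)


# Toy 300-D vectors standing in for the pre-trained embeddings.
toy_vectors = {"clean": np.ones(300, dtype=np.float32),
               "room": np.zeros(300, dtype=np.float32)}
v = text_vector(["clean", "room", "zzz"], toy_vectors)
print(v.shape, v[:3])
```

Averaging keeps the text vector at 300 dimensions regardless of review length, which suits a fixed-size classifier input.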

6.1 Results

Both datasets (movie reviews and hotel reviews) were used to evaluate the proposed CLSA-CapsNet model, the plain capsule network model, and a traditional deep learning model. Table 3 reports the accuracy of the binary classification experiments, with one column per dataset; Tables 4 and 5 additionally report precision, recall, and F1 score for the movie review and hotel review datasets, respectively.

Table 3
Classification accuracy of all models

Model	Movie Review Dataset	Hotel Review Dataset
Deep learning (lstm-rnn)	79.4%	81.7%
Capsule Network	81.9%	83.3%
CLSA+Capsule Network	86.4%	84.6%
CLSA+SenticNet+Capsule Network (with Dropout Layer)	88.5%	86.6%


A common challenge in machine learning and deep learning is co-adaptation: neurons become highly dependent on one another, so that they strongly influence each other and are not as independent in their outputs as they should be. It is also common for some neurons to have much greater predictive power than others.

These effects must be avoided, and the weights must be distributed more evenly to prevent overfitting. Co-adaptation and the dominance of individual neurons can be regulated in several ways; one of the best known is dropout. Without SenticNet, our approach achieved an accuracy of 84.6% on the hotel review dataset and 86.4% on the movie review dataset. With SenticNet and a dropout layer, it achieved 86.6% and 88.5%, respectively.
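Dropout can be sketched in a few lines. The version below is the widely used "inverted" formulation: during training, each unit is zeroed with probability `rate`, and the surviving units are scaled by 1 / (1 - rate) so that the expected activation is unchanged at inference time. The rate of 0.5 in the example is arbitrary; the paper does not report the value it used.

```python
import numpy as np


def dropout(x: np.ndarray, rate: float, rng: np.random.Generator,
            training: bool = True) -> np.ndarray:
    """Inverted dropout; `rate` is the drop probability and must be < 1."""
    if not training or rate == 0.0:
        return x  # identity at inference time
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones(10_000)
dropped = dropout(activations, 0.5, rng)
print(np.mean(dropped == 0), dropped.mean())
```

Because a different random subset of units is silenced at every training step, no neuron can rely on the presence of particular other neurons, which counteracts the co-adaptation described above.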

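As Tables 4 and 5 show, the CLSA+SenticNet+Capsule Network model achieved the highest F1 scores on both datasets, while the deep learning baseline (LSTM-RNN) produced similar results on the two datasets. Overall, our CLSA+SenticNet+Capsule Network model performed best among all models.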

Table 4

Results of movie review dataset

Model	Accuracy	Precision	Recall	F1
Deep learning (lstm-rnn)	79.4%	78.6%	79.8%	79.7%
Capsule Network	81.9%	80.7%	81.6%	82.4%
CLSA+Capsule Network	86.4%	85.3%	85.7%	85.8%
CLSA+SenticNet+Capsule Network (with Dropout Layer)	88.5%	87.4%	89.2%	88.9%

Table 5

Results of hotel review dataset

Model	Accuracy	Precision	Recall	F1
Deep learning (lstm-rnn)	81.7%	80.9%	81.6%	81.9%
Capsule Network	83.3%	82.6%	84.2%	83.6%
CLSA+Capsule Network	84.6%	84.1%	84.3%	85.2%
CLSA+SenticNet+Capsule Network (with Dropout Layer)	86.6%	85.8%	87.2%	86.7%
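The metrics in Tables 4 and 5 follow the standard definitions for binary classification. The sketch below computes them for the positive class from confusion-matrix counts. The counts are hypothetical and not taken from the experiments, and the paper does not state whether its precision, recall, and F1 values are averaged over the two classes.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 test sentences.
print(binary_metrics(tp=45, fp=5, fn=15, tn=35))
```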

7 Conclusions

Only a few thousand exabytes of information were created between the beginning of humanity and 2003; today, a huge amount of data is generated every week. Social media have given the public new ways to create and share their content, thoughts, and views with millions of Internet users in a timely and cost-effective manner. However, this vast amount of useful data is largely unstructured: it was created primarily for human consumption and is therefore not directly machine-processable. Unlike earlier word-based methods, concept-level sentiment analysis uses web ontologies or semantic networks to perform a semantic analysis of text, which allows it to aggregate the concepts and affective knowledge associated with natural-language opinions. Concept-level sentiment analysis is, however, limited by the richness of its knowledge base, and the bag-of-concepts model, although more advanced than the bag-of-words model, misses essential discourse-structure information that is needed to correctly detect the polarity of natural-language opinions. In this paper, we introduced a novel approach to CLSA that combines semantics, common-sense computing, and capsule networks to increase the accuracy of polarity detection. We demonstrated that CLSA+SenticNet+CapsNet improves performance on relatively long text. The proposed CLSA+SenticNet+CapsNet model achieved 88.5% and 86.6% accuracy in binary classification on the movie review and hotel review datasets, respectively. The accuracy, precision, recall, and F1 scores of the classification models are shown in Figures 10, 11, 12, and 13.

Fig. 10

Accuracy of classification models.

Fig. 11

Precision of classification models.

Fig. 12

Recall of classification models.

Fig. 13

F1 Score of classification models.

References

1. Cambria and Hussain, Sentic album: content-, concept-, and context-based online personal photo management system, Cognitive Computation 4(4) (2012), 477–496.

2. Poria et al., Towards an intelligent framework for multimodal affective data analysis, Neural Networks 63 (2015), 104–116.

3. Howard and Cambria, Intention awareness: improving upon situation awareness in human-centric environments, Human-centric Computing and Information Sciences 3(1) (2013), 1–17.

4. Cambria et al., Sentic computing for patient centered applications, IEEE 10th International Conference on Signal Processing Proceedings, IEEE, 2010.

5. Cambria et al., Towards crowd validation of the UK national health service, WebSci 10 (2010), 1–5.

6. Kumar et al., Secure video communication using firefly optimization and visual cryptography, Artificial Intelligence Review (2021), 1–21.

7. Raheja et al., Modeling and simulation of urban air quality with a 2-phase assessment technique, Simulation Modelling Practice and Theory 109 (2021), 102281.

8. Liu, Sentiment analysis and opinion mining, Synthesis Lectures on Human Language Technologies 5(1) (2012), 1–167.

9. Thapliyal et al., A differentiated learning environment in domain model for learning disabled learners, Journal of Computing in Higher Education (2021), 1–23.

10. Chakradar et al., A Non-invasive Approach to Identify Insulin Resistance with Triglycerides and HDL-c Ratio Using Machine learning, Neural Processing Letters (2021), 1–21.

11. Cambria et al., Common sense computing: From the society of mind to digital intuition and beyond, European Workshop on Biometrics and Identity Management, Springer, Berlin, Heidelberg, 2009.

12. Kumar et al., Automatic Brain Tumor Detection Using Machine Learning and Mixed Supervision, Evolving Role of AI and IoMT in the Healthcare Market, Springer, Cham (2021), 247–262.

13. Kumar and Srivastava, Identifying photo forgery using lighting elements, Indian J Sci Technol 9(48) (2016), 1–5.

14. Cambria et al., An ELM-based model for affective analogical reasoning, Neurocomputing 149 (2015), 443–455.

15. Guyon and Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (2003), 1157–1182.

16. Agarwal and Mittal, Enhancing performance of sentiment analysis by semantic clustering of features, IETE Journal of Research 60(6) (2014), 414–422.

17. Peng, Long and Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8) (2005), 1226–1238.

18. Punia S.K. et al., Performance analysis of machine learning algorithms for big data classification: Ml and ai-based algorithms for big data analysis, International Journal of E-Health and Medical Communications (IJEHMC) 12(4) (2021), 60–75.

19. Agarwal and Mittal, Prominent feature extraction for review analysis: an empirical study, Journal of Experimental & Theoretical Artificial Intelligence 28(3) (2016), 485–498.

20. Hoque Dhruba, Bhattacharyya and Jugal Kalita, MIFS-ND: A mutual information-based feature selection method, Expert Systems with Applications 41(14) (2014), 6371–6385.

21. Srivastava et al., CGP: cluster-based gossip protocol for dynamic resource environment in cloud, Simulation Modelling Practice and Theory 108 (2021), 102275.

22. Pang, Lee and Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques, arXiv preprint cs/0205070 (2002).

23. Matsumoto, Takamura and Okumura, Sentiment classification using word sub-sequences and dependency sub-trees, Pacific-Asia conference on knowledge discovery and data mining, Springer, Berlin, Heidelberg, 2005.

24. Pak and Paroubek, Text representation using dependency tree subgraphs for sentiment analysis, International Conference on Database Systems for Advanced Applications, Springer, Berlin, Heidelberg, 2011.

25. Mongia et al., Prediction of COVID-19 epidemic curve of India using supervised learning approach, International Journal of Computer Applications in Technology 66(3-4) (2021), 433–441.

26. Nakagawa, Inui and Kurohashi, Dependency tree-based sentiment classification using CRFs with hidden variables, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010.

27. Xia, Zong and Li, Ensemble of feature sets and classification algorithms for sentiment classification, Information Sciences 181(6) (2011), 1138–1152.

28. Riloff, Patwardhan and Wiebe, Feature subsumption for opinion analysis, Proceedings of the 2006 conference on empirical methods in natural language processing, 2006.

29. Joshi and Rosé, Generalizing dependency features for opinion mining, Proceedings of the ACL-IJCNLP 2009 conference short papers, 2009.

30. Mejova and Srinivasan, Exploring feature definition and selection for sentiment classifiers, Fifth international AAAI conference on weblogs and social media, 2011.

31. Mullen and Collier, Sentiment analysis using support vector machines with diverse information sources, Proceedings of the 2004 conference on empirical methods in natural language processing, 2004.

32. Osgood C.E., George Suc and Percy Tannenbaum, The measurement of meaning, No. 47, University of Illinois press, 1957.

33. Peter Turney, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews, arXiv preprint cs/0212032 (2002).

34. Dang, Zhang and Chen, A lexicon-enhanced method for sentiment classification: An experiment on online product reviews, IEEE Intelligent Systems 25(4) (2009), 46–53.

35. Gelfand, Wulfekuler and Punch W.F., Automated concept extraction from plain text, AAAI 1998 Workshop on Text Categorization, 1998.

36. Hatzivassiloglou and McKeown, Predicting the semantic orientation of adjectives, 35th annual meeting of the association for computational linguistics and 8th conference of the european chapter of the association for computational linguistics, 1997.

37. Sidorov et al., Syntactic n-grams as machine learning features for natural language processing, Expert Systems with Applications 41(3) (2014), 853–860.

38. Sidorov, Non-continuous syntactic n-grams, Polibits 48(1) (2013), 67–75.

39. De Marneffe M.-C. and Christopher Manning, The Stanford typed dependencies representation, Coling 2008: proceedings of the workshop on cross-framework and cross-domain parser evaluation, 2008.

40. Poria et al., Sentic patterns: Dependency-based rules for concept-level sentiment analysis, Knowledge-Based Systems 69 (2014), 45–63.

41. Sabour, Frosst and Geoffrey Hinton, Dynamic routing between capsules, Advances in neural information processing systems, 2017.

42. Pang and Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, ACL, Ann Arbor (2005), 115–124.

43. Socher, Perelygin, Wu J.Y., Chuang, Manning C.D., Ng A.Y. and Potts, Recursive deep models for semantic compositionality over a sentiment treebank.

44. Kasper and Vela, Sentiment analysis for hotel reviews, Computational linguistics-applications conference, Vol. 231527, 2011.

45. http://times.cs.uiuc.edu/wang296/Data

46. Poria et al., Sentic patterns: Dependency-based rules for concept-level sentiment analysis, Knowledge-Based Systems 69 (2014), 45–63.

47. Mahendhiran P.D. and Kannimuthu, Deep learning techniques for polarity classification in multimodal sentiment analysis, International Journal of Information Technology & Decision Making 17(03) (2018), 883–910.

48. Liang et al., An Iterative Multi-Knowledge Transfer Network for Aspect-Based Sentiment Analysis, arXiv preprint arXiv:2004.01935 (2020).

49. et al., Capsule network with interactive attention for aspect-level sentiment classification, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
