A quantum-like text representation based on syntax tree for fuzzy semantic analysis

Abstract

To mine more semantic information from text, it is important to exploit the different semantic correlations between words. Focusing on the different degrees of modifying relations between words, this article proposes a quantum-like text representation based on the syntax tree for fuzzy semantic analysis. Firstly, a quantum-like text representation based on the density matrix of individual words is generalized to represent the modification relationship between words. Secondly, a fuzzy semantic membership function is constructed to describe the different degrees of modifying relationships between words based on the syntax tree. Thirdly, the tensor dot product is defined as the sentence semantic similarity by combining the operation rules of the tensor, so that the semantic information of all elements in the quantum-like sentence representation is exploited effectively. Finally, extensive experiments on STS’12, STS’14, STS’15, STS’16 and SICK show that the proposed model outperforms the baselines, especially on data sets containing many long-sentence pairs, which supports the existence of fuzzy semantic associations between words.

Keywords

Quantum-like text representation, fuzzy semantic analysis, fuzzy semantic membership function, neural networks, syntax tree

1 Introduction

Sentence embedding is an important topic in NLP because it passes knowledge efficiently to downstream tasks, such as cross-modal retrieval [1–3], emotion annotation [4, 5] and semantic analysis [6–11]. Different models rely on different mechanisms. Among the unsupervised methods, the models with better semantic expression are paraphrastic sentence embedding (PP) [12], siamese continuous bag of words (SCBOW) [13] and their variants [14]. Using word2vec as the input vector of a deep neural network has achieved very good experimental results in many downstream NLP tasks [12–15]. Quan et al. [15] established an association matrix between words based on the constituency parse and combined word embeddings with an attention weight mechanism to construct an attention constituency vector tree (ACVT) kernel for sentence similarity. The transformer-based model establishes a direct connection between any two words in a sentence through the self-attention mechanism, making long-distance dependent features easier to capture. BERT-based word models were dissected to form a new sentence embedding by analyzing the space spanned by the word embeddings [16].

Although the existing text representation models have achieved good experimental results in many NLP tasks, such as semantic analysis, two issues still need to be solved.

Firstly, the existing literature lacks the description of semantic association between words based on quantum density matrix. The tensor product of the word vector and its transposed conjugate vector was defined as the density matrix of the word, and the sum of the density matrices of the words in the sentence is defined as the semantic representation of the sentence [17 –19]. The density matrix of the text is input into the neural network to extract features, thus realizing the deep neural network analysis based on quantum language. Therefore, the existing quantum-like text representation models based on density matrices treat words as independent individuals, ignoring the semantic associations between words.

Secondly, these existing text representation models regard the words in a text as individuals, ignoring the different degrees of association between words, and cannot reflect the grammatical modification structure of sentences. Moreover, words in different positions in the same grammar tree are closely related to each other, yet the existing models do not consider the different degrees of association between them. The transformer-based model establishes a direct connection between any two words through the self-attention mechanism, but is not suitable for semantic similarity tasks [20].

Therefore, this article aims to construct a quantum-like text representation (QTR) based on syntax tree to discuss the influence of fuzzy associations between words on text semantics.

Firstly, from the perspective of quantum computing, the density matrix can not only represent the quantum states in an isolated system, but also describe the quantum states of subsystems in a composite system. Further, from the perspective of text representation, the expression of text semantics is not only related to the semantics of words, but also determined by the modified associations between words. Therefore, we can extend the density matrix representation based on independent words to the reduced density matrix representation that can reflect the semantic association between words.

Secondly, from the perspective of fuzzy set theory, the semantic information expressed by words in different contexts needs to be measured according to the grammatical information between words. To reflect the influence of different degrees of association between words on text semantics, the fuzzy semantic membership function based on the smallest subtree is defined as the semantic correlation coefficient and combined with the reduced density matrix to form a weighted semantic association matrix between words. The linear superposition of all weighted semantic association matrices of the input sentence forms the sentence tensor representation. To use the semantic information of all elements of the sentence tensor representation as fully as possible, the dot product of the tensor representations of two input texts is defined as the semantic similarity matrix and is input into a convolutional neural network (CNN) to extract features. The flowchart of our model is shown in Fig. 1.

Fig. 1

Framework of the provided model for sentence representation.

The rest of this paper is organized as follows. Section 2 summarizes some related literature on sentence embedding, the application of fuzzy rough set in NLP and the application of quantum computing in NLP. The proposed fuzzy semantic text representation model is constructed in Section 3. Section 4 demonstrates the experimental results and discussions. In Section 5, some conclusions are drawn.

2 Related works

2.1 Sentence embedding

How to represent a text effectively is a difficult problem that natural language processing (NLP) scholars have long explored. In recent years, many researchers have proposed textual representations with different mechanisms to capture the semantics of sequences of words. Sentence embedding is an important topic in NLP because it passes knowledge efficiently to downstream tasks, such as cross-modal retrieval [1–3], emotion annotation [4, 5] and semantic analysis [6–11]. Using word2vec as the input vector of a deep neural network has achieved very good experimental results in many downstream NLP tasks [12–15]. The transformer-based model establishes a direct connection between any two words in a sentence through the self-attention mechanism, making long-distance dependent features easier to capture. BERT-based word models were dissected to form a new sentence embedding by analyzing the space spanned by the word embeddings [16]. BERT [20] and its variants [21–25] are regarded as effective models with outstanding performance in many NLP tasks [26–28]. However, a comparison of the layer-wise evolution of word representations in deep contextualized models showed that BERT outputs and the [CLS] token embedding are appropriate for classification tasks but not for semantic similarity tasks [16]. The transformer-based sentence representation considers the correlations between all words in a sentence through the self-attention mechanism, at the cost of a sharp rise in the number of pre-training parameters [29–31]. In brief, text representation based on deep learning depends on large-scale training sets and a huge number of parameters. Accordingly, it is fundamental to use existing knowledge and a limited number of parameters to establish a text representation model that can comprehensively capture the semantic, grammatical and word order information between words.

2.2 Application of fuzzy rough set in NLP

The application of fuzzy sets in NLP attracts more and more attention, for example in text representation [32–34], sentiment analysis [35, 36] and e-commerce logistics [37, 38]. According to the research objects, the existing text representation models based on fuzzy rough sets fall mainly into two categories in semantic analysis. One is the study of words with fuzzy semantics [39–41], whose basic idea is to measure the similarity between words according to the similarity of their word vectors. To measure the semantic similarity between words, multiple pieces of semantic information of a word in WordNet were encoded into a vector space [42]. Standard similarity measures and new directives were implemented to generate the approximation degree by the proximity equations linking two words [43]. The other is fuzzy decision analysis for semantic overlap [44–46]. In query language tasks, a fuzzy decision mechanism was combined with the dependency graph to improve the model decision [47], and a fuzzy semantic representation was presented for rare words [48]. Top-k word selection was used to detect and return the k words most similar to a candidate set [49]. A fuzzy set representation was provided to measure fuzzy words through fuzzy set similarity [50]. In conclusion, existing NLP models based on fuzzy sets mainly study the semantic distributions of texts from the perspective of fuzzy sets, but lack research on the semantic enhancement caused by fuzzy modifications between words. Hence, it is necessary to study the different degrees of modification between words in the text.

2.3 Application of quantum computing in NLP

Quantum-inspired text representations [51, 52] mainly use the mathematical framework of quantum theory to represent the semantics, syntax, and emotion of texts in NLP, so that text representations are more consistent with human cognitive information. In recent years, quantum-inspired text representation models can be roughly divided into two categories: one is the theoretical study of text representation based on quantum language [53–58], and the other is deep neural network models based on density matrices and traces of matrices [17, 59–61]. The text representation based on the density matrix is input into the neural network to extract features, thus realizing deep neural network analysis based on quantum language [62–65]. As deep neural networks show increasingly prominent advantages in various fields, more and more researchers combine the text representation model based on the quantum density matrix with other basic knowledge of quantum theory to form a text representation that better reflects human cognition, and apply it to different NLP tasks. Given the excellent performance of the transformer in different NLP tasks, some scholars have input the density matrix of n consecutive words into the multi-head transformer to extract more semantic features [66–69]. In short, the existing NLP models based on quantum computing mainly regard words as independent individuals, and research that considers the semantic and grammatical associations between words is lacking. Therefore, it is worth extending sentence representation based on the density matrix of independent words to the semantic association between words.

In short, text representation models have different focuses as they are based on different frameworks. Consequently, it is essential to use the mathematical framework of quantum theory and fuzzy mathematics to study the modification relations among words, the degree of modification and the influence of word order on the semantics and syntax of text.

3 Fuzzy semantic quantum-like text representation

The general flowchart of the fuzzy semantic quantum-like text representation is shown in Fig. 2, and the definitions of the parameters used are listed in Table 1.

Fig. 2

The general flowchart of the proposed method for preprocessing the sentence representation.

Table 1

The definition of parameters

Parameter	Definition
S_T	subtree set of sentence S
T_ij	correlation matrix between the ith word and the jth word
$\overset{⇀ ⇀}{S_{1} S_{2}}$	correlation matrix between the sentences S₁ and S₂
∣w_i〉	the word2vec embedding of the ith word
$e_{j}^{i}$	the value of the jth dimension in the ith word’s embedding
(t_ml)_ij	the value in row m and column l of T_ij
$S_{mt}^{ij}$	the value in row m and column t of $\overset{⇀ ⇀}{S_{i} S_{j}}$
c_ij	semantic association function of the ith word and the jth word
$Δ h_{i, j}^{{ST}_{s}}$	depth difference of the words w_i and w_j in the smallest subtree ST_s
Δd_i,j	distance difference of words w_i and w_j
d	dimension of the word embedding
p	total number of subtrees of sentence S
λ	membership constraint constant of the semantic association

3.1 Correlation matrix between words

All words in the sentence S are stored in the word set

$W = \{ |w_{i}\rangle \}, \quad i = 1, 2, \dots, n$ (1)

where ∣w_i〉 is the normalized embedding vector of the ith word, $|w_{i}\rangle = \begin{bmatrix} e_{1}^{i} & e_{2}^{i} & \dots & e_{d}^{i} \end{bmatrix}^{\top}$.

The correlation matrix between w_i and w_j is defined as follows,

$\begin{aligned} T_{i, j} &= |w_{i}\rangle \times \langle w_{j}| \\ &= \begin{bmatrix} e_{1}^{i} & e_{2}^{i} & \dots & e_{d}^{i} \end{bmatrix}^{\top} \times \begin{bmatrix} e_{1}^{j} & e_{2}^{j} & \dots & e_{d}^{j} \end{bmatrix} \\ &= \begin{bmatrix} t_{11} & t_{12} & \dots & t_{1d} \\ t_{21} & t_{22} & \dots & t_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ t_{d1} & t_{d2} & \dots & t_{dd} \end{bmatrix}_{i, j}, \end{aligned}$ (2)

where $(t_{ml})_{i, j}$ is defined as $(t_{ml})_{i, j} = e_{m}^{i} \times e_{l}^{j}$.

If the order of two associative words is exchanged, the correlation matrix is changed as follows,

$\begin{aligned} T_{j, i} &= |w_{j}\rangle \times \langle w_{i}| \\ &= \begin{bmatrix} e_{1}^{j} & e_{2}^{j} & \dots & e_{d}^{j} \end{bmatrix}^{\top} \times \begin{bmatrix} e_{1}^{i} & e_{2}^{i} & \dots & e_{d}^{i} \end{bmatrix} \\ &= \begin{bmatrix} t_{11}^{'} & t_{12}^{'} & \dots & t_{1d}^{'} \\ t_{21}^{'} & t_{22}^{'} & \dots & t_{2d}^{'} \\ \vdots & \vdots & \ddots & \vdots \\ t_{d1}^{'} & t_{d2}^{'} & \dots & t_{dd}^{'} \end{bmatrix}_{j, i}, \end{aligned}$ (3)

where $(t_{ml}^{'})_{j, i}$ is defined as $(t_{ml}^{'})_{j, i} = e_{m}^{j} \times e_{l}^{i}$.

Since, in general, $(t_{ml}^{'})_{j, i} = e_{m}^{j} \times e_{l}^{i} \neq (t_{ml})_{i, j} = e_{m}^{i} \times e_{l}^{j}$, one obtains T_i,j ≠ T_j,i (the two matrices are transposes of each other and coincide only when the matrix is symmetric), which means that the correlation matrix between the word embeddings can reflect the word order.
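The order sensitivity of Eqs. (2) and (3) can be checked numerically. The following is a minimal sketch in plain Python, assuming real-valued embeddings (so the conjugate transpose reduces to the transpose); the two-dimensional vectors and the helper names `normalize`, `correlation_matrix` and `transpose` are illustrative assumptions, not part of the original model code.

```python
import math


def normalize(v):
    """Scale a vector to unit length, as assumed for |w_i> in Eq. (1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def correlation_matrix(w_i, w_j):
    """T_ij = |w_i><w_j| for real vectors: entry (m, l) is e_m^i * e_l^j."""
    return [[a * b for b in w_j] for a in w_i]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


# Toy 2-dimensional "embeddings" (hypothetical values, for illustration only).
w1 = normalize([3.0, 4.0])
w2 = normalize([1.0, 0.0])

T12 = correlation_matrix(w1, w2)
T21 = correlation_matrix(w2, w1)

print(T12 != T21)             # swapping the word order changes the matrix
print(T21 == transpose(T12))  # the two matrices are transposes of each other
```

Because T_j,i is the transpose of T_i,j, the two only coincide when the outer product happens to be symmetric, for example when both words share the same embedding.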

3.2 Fuzzy semantic association

The constituency parse can reflect phrases with different modifications from different combinations of words 1 . Each subtree in a constituency parse saves the complete grammatical information, denoting a phrase with a certain modification. The size of the smallest subtree can reflect the different degrees between related words with different modifications.

Take a subtree of Sen₁ for example, shown in Fig. 3.

Fig. 3

A subtree of the sentence constituency parse shown in Fig. 1.

There are five NPs with different phrase combinations listed in Table 2.

Table 2

The sequences of NPs in Sen₁ with different phrase combinations

Sequence	Phrase combination
Seq₁:	NP(NP(DT(the) NN(question)) PP(IN(of) NP(NP(DT(the) NN(integration)) NP(IN(of) NP(DT(the) JJ(Russian) NN(population))))))
Seq₂:	NP(DT(the) NN(question))
Seq₃:	NP(NP(DT(the) NN(integration)) NP(IN(of) NP(DT(the) JJ(Russian) NN(population))))
Seq₄:	NP(DT(the) NN(integration))
Seq₅:	NP(IN(of) NP(DT(the) JJ(Russian) NN(population)))

Clearly, Seq₁ contains the other four NPs, that is, Seq₂, Seq₃, Seq₄ and Seq₅ are descendants of Seq₁. Seq₃ is the smallest subtree that contains both Seq₄ and Seq₅, and it does not contain Seq₂, which means that the semantic correlation between the words in Seq₄ and Seq₅ is stronger than their correlation with the words in Seq₂. Therefore, a membership function of semantic association between words based on a subtree is established to reflect the different degrees of correlation.

3.2.1 Subtree set

Input all the words of the sentence S to generate a constituency parse and construct a set of subtrees according to the constituency parse. The algorithm for generating the subtree set based on the constituency parse is listed in Alg. 1. Record the positions of all words in the sentence. For example, w_i is the ith word of the sentence S and w_j is the jth word.

Algorithm 1 Algorithm for generating the subtree set based on the constituency parser.

Require: Input sentence, Sen; StanfordCoreNLP;
Ensure: Obtain the subtree set of Sen, S_T;
1: Generate the constituency parse T of Sen, traverse T in preorder and get the sequence T_x
2: S_T = [], i = 0, T_x(i) is the ith subtree of T
3: while i < len(T_x) do
4:   if the root label of T_x(i) ∈ {NP, VP, PP, IP, S, Root} then
5:     S_T.append(T_x(i))
6:   end if
7:   i = i + 1
8: end while
9: return S_T
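Alg. 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: instead of calling StanfordCoreNLP, it reads a parse written in the bracket style of Table 2, and the names `Node`, `parse`, `subtree_set` and `leaves` are hypothetical helpers introduced here.

```python
from dataclasses import dataclass, field

# Phrase labels kept by Alg. 1 ("ROOT" is added as an assumed spelling variant).
PHRASE_LABELS = {"NP", "VP", "PP", "IP", "S", "Root", "ROOT"}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # empty for a word (leaf)


def parse(text):
    """Parse a string such as 'NP(DT(the) NN(question))' into Node objects."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        node = Node(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1                      # consume "("
            while tokens[pos] != ")":
                node.children.append(read())
            pos += 1                      # consume ")"
        return node

    return read()


def subtree_set(root):
    """Preorder traversal that keeps only phrase-level subtrees (Alg. 1)."""
    kept, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.children and node.label in PHRASE_LABELS:
            kept.append(node)
        stack.extend(reversed(node.children))  # visit children left to right
    return kept


def leaves(node):
    """Words covered by a subtree, in sentence order."""
    if not node.children:
        return [node.label]
    return [word for child in node.children for word in leaves(child)]


seq3 = parse("NP(NP(DT(the) NN(integration)) "
             "NP(IN(of) NP(DT(the) JJ(Russian) NN(population))))")
for subtree in subtree_set(seq3):
    print(subtree.label, leaves(subtree))
```

Applied to Seq₃ from Table 2, the sketch returns four NP subtrees in preorder, with Seq₃ itself first.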

3.2.2 Smallest subtree

Find the smallest subtree ST_s that contains leaf nodes w_i and w_j, and record the depths of leaf nodes w_i and w_j in the subtree, denoted respectively as $F_{w_{i}}^{{ST}_{s}}$ and $F_{w_{j}}^{{ST}_{s}}$ . The algorithm for finding the smallest subtree of word pair (w_i, w_j) from the subtree set S_T is listed in Alg. 2.

Algorithm 2 Algorithm for finding out the smallest subtree ST_s of word pair (w_i,w_j) from the subtree set S_T.

Require: Input the word pair (w_i, w_j) and the subtree set S_T
Ensure: Obtain the smallest subtree ST_s of the word pair (w_i, w_j)
1: q = 0, ST_s = S_T(0)
2: k = len(S_T(0)), k is the length of the smallest subtree found so far
3: while q < len(S_T) do
4:   cst = S_T(q)
5:   if w_i ∈ cst and w_j ∈ cst then
6:     n = len(cst)
7:     if n < k then
8:       ST_s = cst
9:       k = n
10:     end if
11:   end if
12:   q = q + 1
13: end while
14: return ST_s
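The search in Alg. 2 and the depth bookkeeping it feeds can be sketched as below. This is an illustrative sketch under assumptions: a subtree is a `(label, children)` tuple, a word is represented by its integer position in the sentence (so that repeated words such as "the" stay distinct), and the depth of a word is taken to be the number of edges from the subtree root down to that leaf. The source does not fix these conventions, and the function names are hypothetical.

```python
def positions(tree):
    """Word positions covered by a subtree; a leaf is an int position."""
    if isinstance(tree, int):
        return [tree]
    _label, children = tree
    return [p for child in children for p in positions(child)]


def depth(tree, target, level=0):
    """Edges from the subtree root down to the leaf `target`, else None."""
    if isinstance(tree, int):
        return level if tree == target else None
    for child in tree[1]:
        found = depth(child, target, level + 1)
        if found is not None:
            return found
    return None


def smallest_subtree(i, j, subtrees):
    """Scan the subtree set and keep the shortest subtree covering i and j."""
    best = None
    for cst in subtrees:
        covered = positions(cst)
        if i in covered and j in covered:
            if best is None or len(covered) < len(positions(best)):
                best = cst
    return best


# Seq3 of Table 2 with word positions:
# 0 the, 1 integration, 2 of, 3 the, 4 Russian, 5 population
np_a = ("NP", [("DT", [0]), ("NN", [1])])
np_c = ("NP", [("DT", [3]), ("JJ", [4]), ("NN", [5])])
np_b = ("NP", [("IN", [2]), np_c])
seq3 = ("NP", [np_a, np_b])
subtree_set = [seq3, np_a, np_b, np_c]

st = smallest_subtree(2, 4, subtree_set)   # "of" and "Russian"
print(st is np_b, depth(st, 2), depth(st, 4))
```

Words inside the same small phrase ("Russian", "population") resolve to a small subtree, whereas "integration" and "population" only meet in Seq₃, which is the behaviour the membership function is meant to exploit.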

3.2.3 Membership function of the semantic association

In fuzzy set theory, a membership degree equal to 0.5 carries the least amount of information, that is, it is the most ambiguous. To reflect the most ambiguous semantic association between words, we define the membership function of semantic association between words piecewise as follows.

If $F_{w_{i}}^{{ST}_{s}} + F_{w_{j}}^{{ST}_{s}} \leq α_{1}$ and $| F_{w_{i}}^{{ST}_{s}} - F_{w_{j}}^{{ST}_{s}} | \leq β_{1}$:

$c_{i, j} = λ Sim (w_{i}, w_{j}),$ (4)

if $F_{w_{i}}^{{ST}_{s}} + F_{w_{j}}^{{ST}_{s}} \leq α_{1}$ and $| F_{w_{i}}^{{ST}_{s}} - F_{w_{j}}^{{ST}_{s}} | > β_{1}$:

$c_{i, j} = λ (0.5 + \frac{1}{Δ h_{i, j}^{{ST}_{s}} + 1}) Sim (w_{i}, w_{j}),$ (5)

if $F_{w_{i}}^{{ST}_{s}} + F_{w_{j}}^{{ST}_{s}} > α_{1}$ and $| F_{w_{i}}^{{ST}_{s}} - F_{w_{j}}^{{ST}_{s}} | \leq β_{1}$:

$c_{i, j} = λ (0.5 + \frac{1}{Δ d_{i, j}}) Sim (w_{i}, w_{j}),$ (6)

if $F_{w_{i}}^{{ST}_{s}} + F_{w_{j}}^{{ST}_{s}} > α_{1}$ and $| F_{w_{i}}^{{ST}_{s}} - F_{w_{j}}^{{ST}_{s}} | > β_{1}$:

$c_{i, j} = λ (0.5 + \frac{1}{Δ h_{i, j}^{{ST}_{s}} + 1}) (0.5 + \frac{1}{Δ d_{i, j}}) Sim (w_{i}, w_{j}).$ (7)

Here $F_{w_{i}}^{{ST}_{s}}$ denotes the depth of the word w_i in the smallest subtree ST_s, α₁ and β₁ are constants with α₁ > β₁, and Δd_i,j = |i - j| is the distance difference between words w_i and w_j.

The difference between the depths of the words w_i and w_j in the smallest subtree ST_s is defined as the depth difference $Δ h_{i, j}^{{ST}_{s}}$ between the two words,

$Δ h_{i, j}^{{ST}_{s}} = | F_{w_{i}}^{{ST}_{s}} - F_{w_{j}}^{{ST}_{s}} |.$ (8)
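The four cases of Eqs. (4) to (7) share a common structure: the depth factor is applied only when the depth difference exceeds β₁, and the distance factor only when the depth sum exceeds α₁. A minimal Python sketch of this reading follows; the function name `membership` and its argument names are assumptions made here, and the defaults α₁ = 4, β₁ = 2 and λ = 1 are the values quoted in the caption of Table 6.

```python
def membership(depth_i, depth_j, i, j, sim, alpha=4, beta=2, lam=1.0):
    """Fuzzy semantic membership c_ij of Eqs. (4)-(7).

    depth_i, depth_j: depths of w_i and w_j in their smallest subtree
    i, j:             word positions in the sentence (i != j is assumed)
    sim:              similarity Sim(w_i, w_j) of the two word embeddings
    """
    delta_h = abs(depth_i - depth_j)   # Eq. (8)
    delta_d = abs(i - j)               # distance difference
    weight = lam
    if delta_h > beta:                 # depth factor, Eqs. (5) and (7)
        weight *= 0.5 + 1.0 / (delta_h + 1)
    if depth_i + depth_j > alpha:      # distance factor, Eqs. (6) and (7)
        weight *= 0.5 + 1.0 / delta_d
    return weight * sim


print(membership(1, 2, 0, 1, sim=0.8))  # Eq. (4): shallow and level
print(membership(3, 4, 1, 5, sim=0.5))  # Eq. (6): deep but level
print(membership(1, 5, 0, 4, sim=1.0))  # Eq. (7): deep and uneven
```

Adjacent words in a shallow phrase keep their full similarity, while distant or unevenly nested words are damped toward half of it.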

3.3 Sentence embedding

The fuzzy membership function of the semantic association is regarded as the correlation coefficient between the related words. Therefore, the sentence representation is obtained by linear superposition of the correlation matrices of all the associated word pairs in the input sentence,

$S = \sum_{i, j}^{n} c_{i, j} T_{i, j} = [S_{mt}],$ (9)

where n is the total number of words in the sentence S, m = 1, 2, . . . , d and t = 1, 2, . . . , d. The framework of the constituency parse-based sentence embedding of fuzzy semantic association is listed in Alg. 3.

Algorithm 3 Framework of the constituency parse-based sentence embedding of fuzzy semantic association.

Require: Input sentence, Sen; StanfordCoreNLP;
Ensure: Obtain the semantic association matrix S of sentence Sen;
1: w_i is the ith word in Sen, w_j is the jth word in Sen, Δd_i,j denotes the distance difference between w_i and w_j, α₁ and β₁ are constants
2: S_T = Alg.1(Sen)
3: i = 0, S = 0 (the d × d zero matrix)
4: while i < len(Sen) do
5:   j = i + 1
6:   while j < len(Sen) do
7:     Δd_i,j = |i - j|
8:     ∣w_i〉 = normal(w_i)
9:     ∣w_j〉 = normal(w_j)
10:     〈w_j∣ = conjugate_transpose(∣w_j〉)
11:     T_i,j = ∣w_i〉 × 〈w_j∣
12:     ST_s = Alg.2(w_i, w_j, S_T)
13:     $Δ h_{i, j}^{{ST}_{s}} = | F_{w_{i}}^{{ST}_{s}} - F_{w_{j}}^{{ST}_{s}} |$
14:     $h_{t} = F_{w_{i}}^{{ST}_{s}} + F_{w_{j}}^{{ST}_{s}}$
15:     if $h_{t} \leq α_{1}$ and $Δ h_{i, j}^{{ST}_{s}} \leq β_{1}$ then
16:       u_i,j = λ
17:     else if $h_{t} \leq α_{1}$ and $Δ h_{i, j}^{{ST}_{s}} > β_{1}$ then
18:       $u_{i, j} = λ (0.5 + \frac{1}{Δ h_{i, j}^{{ST}_{s}} + 1})$
19:     else if $h_{t} > α_{1}$ and $Δ h_{i, j}^{{ST}_{s}} \leq β_{1}$ then
20:       $u_{i, j} = λ (0.5 + \frac{1}{Δ d_{i, j}})$
21:     else ($h_{t} > α_{1}$ and $Δ h_{i, j}^{{ST}_{s}} > β_{1}$)
22:       $u_{i, j} = λ (0.5 + \frac{1}{Δ h_{i, j}^{{ST}_{s}} + 1}) (0.5 + \frac{1}{Δ d_{i, j}})$
23:     end if
24:     c_i,j = u_i,j × Sim(w_i, w_j)
25:     S = S + c_i,j × T_i,j
26:     j = j + 1
27:   end while
28:   i = i + 1
29: end while
30: return S
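The accumulation loop of Alg. 3 can be sketched in plain Python as follows. The sketch assumes that the depths of each word pair in its smallest subtree have already been computed and are passed in as a dictionary `depths[(i, j)] = (F_i, F_j)`, that Sim is the cosine similarity, and that embeddings are real-valued; the function names, the toy vectors and the depth values are illustrative assumptions rather than the authors' implementation.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def weight(f_i, f_j, delta_d, alpha=4, beta=2, lam=1.0):
    """u_ij of Alg. 3: the membership of Eqs. (4)-(7) without Sim."""
    delta_h = abs(f_i - f_j)
    u = lam
    if delta_h > beta:
        u *= 0.5 + 1.0 / (delta_h + 1)
    if f_i + f_j > alpha:
        u *= 0.5 + 1.0 / delta_d
    return u


def sentence_matrix(vectors, depths, alpha=4, beta=2, lam=1.0):
    """S = sum over pairs i < j of c_ij * |w_i><w_j| (Eq. (9), Alg. 3)."""
    words = [normalize(v) for v in vectors]
    d = len(words[0])
    S = [[0.0] * d for _ in range(d)]
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            f_i, f_j = depths[(i, j)]
            # Cosine similarity of unit vectors is their inner product.
            sim = sum(a * b for a, b in zip(words[i], words[j]))
            c = weight(f_i, f_j, j - i, alpha, beta, lam) * sim
            for m in range(d):
                for t in range(d):
                    S[m][t] += c * words[i][m] * words[j][t]
    return S


# Three toy 2-dimensional embeddings and hypothetical pair depths.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depths = {(0, 1): (1, 1), (0, 2): (1, 1), (1, 2): (1, 1)}
print(sentence_matrix(vectors, depths))
```

The result is always a d × d matrix regardless of sentence length, which is what allows two sentences of different lengths to be compared element by element afterwards.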

3.4 Tensor projection

The components of a word embedding express the co-occurrence distribution of words in the training corpus. Because different words co-occur with different words, their basis vectors differ, and so do the basis vectors of the semantic association tensor built from the word2vec embeddings. To highlight the association semantics and reduce redundant information, the semantic association tensor is projected onto a matrix core and the maximum value is then extracted to reduce the dimension of the semantic association tensor, as shown in Fig. 4. Because word2vec embeddings contain negative elements, this model modifies the pooling layer of the CNN to extract the value with the largest absolute value and restores its original sign in the output layer.

Fig. 4

Flowchart of the dimension of sentence tensor representation reduced by tensor projection.

A sentence tensor of the proposed model is taken as an example to illustrate the tensor projection, as shown in Fig. 5. Figure 5 illustrates how the value with the maximum absolute value is extracted from the sentence representation when the stride and the kernel size are both equal to 2. Among the four values in the blue box, the one with the largest absolute value is -0.04576. If the max pooling of a classical CNN were used to extract features, 0.02723 would be obtained. Considering that both the magnitudes and the signs in word2vec embeddings reflect the distribution characteristics of the co-occurring words, -0.04576 carries stronger semantic characteristics than 0.02723. Therefore, tensor projection can extract the elements with stronger semantic features in the sentence representation.

Fig. 5

A sentence tensor to illustrate the tensor projection.
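The sign-preserving pooling described above can be sketched in a few lines of Python. Only the two values -0.04576 and 0.02723 come from the Fig. 5 example; the remaining matrix entries and the function name `signed_max_abs_pool` are assumptions made for illustration.

```python
def signed_max_abs_pool(matrix, kernel=2, stride=2):
    """Keep, per window, the entry with the largest absolute value,
    returned with its original sign."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - kernel + 1, stride):
        out_row = []
        for c in range(0, cols - kernel + 1, stride):
            window = [matrix[r + dr][c + dc]
                      for dr in range(kernel) for dc in range(kernel)]
            out_row.append(max(window, key=abs))
        pooled.append(out_row)
    return pooled


# Top-left window mirrors the Fig. 5 example; other entries are made up.
tensor = [
    [0.02723, -0.04576, 0.3, 0.1],
    [0.01, -0.02, -0.2, 0.25],
    [1.0, -2.0, 0.0, 0.0],
    [3.0, -1.0, 0.0, -0.5],
]
print(signed_max_abs_pool(tensor))
```

With classical max pooling the top-left window would yield 0.02723; selecting by absolute value keeps -0.04576 instead, sign included.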

3.5 Sentence similarity

The dot product between the tensors of sentence embeddings S₁ and S₂ is seen as the correlation tensor of S₁ and S₂.

$\begin{aligned} \overset{⇀ ⇀}{S_{1} S_{2}} &= [S_{mt}^{12}] \\ &= \begin{bmatrix} S_{11}^{1} & S_{12}^{1} & \dots & S_{1d}^{1} \\ S_{21}^{1} & S_{22}^{1} & \dots & S_{2d}^{1} \\ \vdots & \vdots & \ddots & \vdots \\ S_{d1}^{1} & S_{d2}^{1} & \dots & S_{dd}^{1} \end{bmatrix} \otimes \begin{bmatrix} S_{11}^{2} & S_{12}^{2} & \dots & S_{1d}^{2} \\ S_{21}^{2} & S_{22}^{2} & \dots & S_{2d}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ S_{d1}^{2} & S_{d2}^{2} & \dots & S_{dd}^{2} \end{bmatrix} \\ &= \begin{bmatrix} S_{11}^{12} & S_{12}^{12} & \dots & S_{1d}^{12} \\ S_{21}^{12} & S_{22}^{12} & \dots & S_{2d}^{12} \\ \vdots & \vdots & \ddots & \vdots \\ S_{d1}^{12} & S_{d2}^{12} & \dots & S_{dd}^{12} \end{bmatrix}, \end{aligned}$ (10)

where $S_{mt}^{12} = S_{mt}^{1} \times S_{mt}^{2}$ . The dimension of $\overset{⇀ ⇀}{S_{1} S_{2}}$ is d × d and the order is 2 + 2 -2 = 2.

Similarly, the correlation tensor of S₁ and S₁ is

$\overset{⇀ ⇀}{S_{1} S_{1}} = [S_{mt}^{11}],$ (11)

and the correlation tensor of S₂ and S₂ is

$\overset{⇀ ⇀}{S_{2} S_{2}} = [S_{mt}^{22}].$ (12)

The semantic similarity between sentences S₁ and S₂ is

$Sim (S_{1}, S_{2}) = \frac{| \overset{⇀ ⇀}{S_{1} S_{2}} |}{\sqrt{| \overset{⇀ ⇀}{S_{1} S_{1}} | \times | \overset{⇀ ⇀}{S_{2} S_{2}} |}},$ (13)

where $| \overset{⇀ ⇀}{S_{1} S_{2}} | = \sum_{m, t}^{d} (S_{mt}^{12})^{2}$, $| \overset{⇀ ⇀}{S_{1} S_{1}} | = \sum_{m, t}^{d} (S_{mt}^{11})^{2}$, and $| \overset{⇀ ⇀}{S_{2} S_{2}} | = \sum_{m, t}^{d} (S_{mt}^{22})^{2}$.
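Eqs. (10) to (13) amount to an element-wise product followed by a sum of squares. A minimal sketch in plain Python follows, with sentence matrices given as nested lists; the function names and the toy matrices are assumptions made here.

```python
import math


def tensor_norm(A, B):
    """|A B| = sum over (m, t) of (A_mt * B_mt)^2, as defined after Eq. (13)."""
    return sum((a * b) ** 2
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))


def sentence_similarity(S1, S2):
    """Sim(S1, S2) of Eq. (13)."""
    return tensor_norm(S1, S2) / math.sqrt(
        tensor_norm(S1, S1) * tensor_norm(S2, S2))


S1 = [[1.0, 2.0], [0.0, 0.0]]
S2 = [[2.0, 1.0], [0.0, 0.0]]
print(sentence_similarity(S1, S1))  # identical sentences
print(sentence_similarity(S1, S2))  # 8 / 17 for these toy matrices
```

By the Cauchy-Schwarz inequality applied to the squared entries, the value lies between 0 and 1. Because every term is squared, the measure as written does not distinguish a matrix from its negation.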

4 Experimentation and results

4.1 Experimental settings

The 300-dimensional word2vec is loaded from the public library 2. Experiments are executed on Sentences Involving Compositional Knowledge (SICK) 3 and the corpora released by the SemEval Semantic Textual Similarity (STS) tasks 4, covering the years 2012, 2014, 2015 and 2016, as shown in Table 3. In Table 3, the figures in parentheses give the number of sentence pairs in the test set of the corresponding corpus. In addition, the number in parentheses after STS’12 is the total over all 2012 test sets in the STS corpus. The totals for STS’14, STS’15, and STS’16 are obtained in the same way.

Table 3
Datasets and the total number of sentence pairs in each dataset

Dataset (number of sentence pairs)
STS’12.MSRpar (750)	STS’14.deft-forum (450)	STS’15.answers-forums (375)	STS’12 (3108)
STS’12.MSRvid (750)	STS’14.deft-news (300)	STS’15.answers-students (750)	STS’14 (3750)
STS’12.SMTeuroparl (459)	STS’14.headlines (750)	STS’15.belief (375)	STS’15 (3000)
STS’12.OnWN (750)	STS’14.images (750)	STS’15.images (750)	STS’16 (1186)
STS’12.SMTnews (399)	STS’14.OnWN (750)	STS’15.headlines (750)	SICK (4500)
	STS’14.tweet-news (750)

These corpora consist of sentence pairs and their annotated textual similarity scores, rescaled to the range from 0.0 to 1.0 (the original scores are divided by 5).

For the same value of |x_i - y_i|, the relative error differs with y_i: the larger y_i, the smaller the relative error, and the smaller y_i, the larger the relative error. To obtain a unified expression for different y_i, a constraint function is defined as follows,

$f (x_{i}) = | x_{i} - y_{i} | - ɛ y_{i},$ (14)

where x_i denotes the experimental similarity of the ith sentence pair, y_i is the annotated similarity and ɛ is a constant. When f (x_i) ≤ 0, the calculation result is valid; when f (x_i) > 0, the calculation result is invalid and a tensor projection is introduced to optimize the tensor representation of the input sentence. When y_i ≥ γ, the stride of the tensor projection is set to the size of the matrix core; when σ ≤ y_i < γ, the stride of the tensor projection is set to 2.
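The effect of Eq. (14) is that the same absolute error can be accepted for a high annotated score and rejected for a low one. A small Python sketch follows; the value ɛ = 0.1 and the similarity scores are hypothetical, chosen only to illustrate the behaviour, and the function names are assumptions.

```python
def constraint(x, y, eps):
    """f(x_i) = |x_i - y_i| - eps * y_i from Eq. (14)."""
    return abs(x - y) - eps * y


def is_valid(x, y, eps):
    """A computed similarity is valid when f(x_i) <= 0."""
    return constraint(x, y, eps) <= 0


EPS = 0.1  # hypothetical value; the source does not state the constant

# Same absolute error (about 0.05), different annotated similarity.
print(is_valid(0.85, 0.80, EPS))  # tolerance 0.08 covers the error
print(is_valid(0.25, 0.20, EPS))  # tolerance 0.02 does not
```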

4.2 Experimental results

4.2.1 Comparison of unsupervised models

According to the results listed in Table 4, QTR scores higher than the baselines on every corpus except SICK, and the average of QTR over these five corpora is higher than that of all the baselines. The longer the sentence, the more complex the correlations between words and the more related phrases QTR obtains, which makes the semantic and grammatical expression of the sentence more accurate. Therefore, the semantic mining effect of QTR is more obvious for corpora with more long sentences. For example, the result of QTR is much higher than the baselines on STS’12. In SICK, however, most sentences are short, many sentence pairs are labeled zero, and the two sentences of a pair often share many identical words and differ only in individual words replaced by synonyms. As a result, QTR generates fewer discriminative associated word pairs and produces similarity values larger than the labeled values, which lowers the Spearman’s rank correlation coefficient.

Table 4
Comparison of some unsupervised models on Spearman rank correlation coefficient (Src) in each dataset

Model	STS’12	STS’14	STS’15	STS’16	SICK	Avg.
Bert_base	0.397	0.497	0.660	0.662	0.621	0.5674
Bert_base-flow [21]	0.584	0.609	0.752	0.712	0.645	0.6604
ConSERT_bert [22]	0.646	0.691	0.797	0.760	0.673	0.7134
SimCSE_bert [23]	0.684	0.744	0.809	0.786	0.722	0.7490
SCPSE_bert [24]	0.643	0.705	0.785	0.757	0.687	0.7154
DCPCSE_bert [24]	0.730	0.767	0.842	0.797	0.700	0.7672
TENC [25]	0.719	0.764	0.829	0.807	0.712	0.7662
QTR	0.756	0.797	0.854	0.808	0.649	0.7728

4.2.2 Comparison of other models based on word2vec

The comparison of the results listed in Table 5 shows that almost all results of QTR are better than those of the comparison models, the exception being STS’14.deft-news, i.e., on 15 of 16 corpora, which indicates that QTR can effectively express the semantic and grammatical information of the text. If the logical language structures of most sentences in the corpus are clear and the information-based coherence is strong, especially for corpora containing more long sentences, the result of QTR exceeds all the comparison models by a clear margin, as on STS’12.MSRpar, STS’12.SMTeuroparl and STS’15.answers-forums. This shows that QTR based on the constituency parse can effectively establish the correlation between words by combining the distance difference, the depth difference in the grammatical tree and the cosine similarity between words as the influencing factors of the semantic membership function. In STS’14.deft-news, some texts are phrases with clauses as modifiers. The words in the clauses do not have much semantic relation to the subject of the phrase. However, QTR assigns a semantic association between every word in the clauses and the subject, which increases the semantic error and makes the calculated value smaller. Therefore, the Pearson correlation coefficient (Pcc) on STS’14.deft-news is significantly lower than that of the comparison models.

Table 5
Comparison of other methods based on word2vec on the Pearson correlation coefficient in each dataset

Year	Dataset	BERT		ACVT [15]	PP [14]			SCBOW [14]			QTR
		base	large		PP-base	PP-tfidf	PP-best	base	ATT-CCG	SCBOW-best	
2012	MSRpar	0.329	0.335	0.58	0.426	0.465	0.499	0.438	0.419	0.437	0.639
2012	MSRvid	0.597	0.595	0.83	0.745	0.792	0.846	0.452	0.734	0.734	0.852
2012	SMTeuroparl	0.485	0.440	0.43	0.473	0.521	0.521	0.450	0.552	0.552	0.611
2012	OnWN	0.603	0.618	0.70	0.706	0.725	0.727	0.644	0.696	0.696	0.781
2012	SMTnews	0.554	0.547	0.54	0.584	0.658	0.666	0.390	0.557	0.557	0.703
2014	deft-forum	0.315	0.251	0.48	0.478	0.544	0.558	0.408	0.421	0.421	0.560
2014	deft-news	0.456	0.710	0.74	0.731	0.738	0.755	0.591	0.733	0.733	0.670
2014	headlines	0.571	0.581	0.72	0.697	0.713	0.723	0.636	0.668	0.668	0.769
2014	images	0.657	0.664	0.81	0.785	0.808	0.831	0.650	0.761	0.761	0.834
2014	OnWN	0.553	0.755	0.87	0.788	0.810	0.841	0.607	0.693	0.729	0.905
2014	tweet-news	0.610	0.627	0.75	0.764	0.773	0.791	0.732	0.723	0.754	0.846
2015	answers-forums	0.306	0.479	0.69	0.683	0.684	0.694	0.218	0.512	0.531	0.847
2015	answers-students	0.669	0.704	0.79	0.782	0.785	0.785	0.367	0.732	0.732	0.851
2015	belief	0.456	0.602	0.70	0.762	0.782	0.784	0.477	0.603	0.603	0.848
2015	images	0.708	0.751	0.79	0.814	0.838	0.853	0.256	0.787	0.727	0.863
2015	headlines	0.621	0.642	0.82	0.748	0.768	0.774	0.215	0.727	0.787	0.816
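
The two evaluation metrics used in this section, Pcc and MSE, can be illustrated with a minimal sketch. It assumes that both the predicted and the gold similarity scores are scaled to [0, 1]; the function names and the toy scores are hypothetical and are not taken from any STS dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (Pcc) of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mse(xs, ys):
    """Mean squared error between predicted and gold scores."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy scores (assumption: both already scaled to [0, 1]).
predicted = [0.10, 0.40, 0.35, 0.80]
gold = [0.00, 0.50, 0.30, 0.90]

print("Pcc:", round(pearson(predicted, gold), 4))
print("MSE:", round(mse(predicted, gold), 6))
```

A higher Pcc and a lower MSE both indicate closer agreement with the gold scores, which is how the tables in this section should be read.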

4.2.3 Influence of the membership function

All results in Table 6 are computed by the sentence representation model based on constituency parse. Comparing where the maximum Pcc and the minimum MSE occur shows that the membership function of the semantic association between words that considers the depth difference Δh and ignores the distance difference Δd mines semantics better than the other two, including the function that considers both Δh and Δd, especially on corpora containing a large number of long sentences, such as STS’15.belief. For a corpus of short texts mainly composed of phrases or multiple short sentences, considering only Δh is also better than additionally considering Δd; examples are STS’14.OnWN, STS’14.deft-forum, STS’15.belief and STS’15.answers-forums. However, for a corpus with diverse sentence structures, complete sentence patterns, and rhetorical accuracy between words in the text, it is better to consider both Δh and Δd. For example, on all five corpora of STS’12 the Pcc reaches its maximum and the MSE reaches its minimum when both are considered.

Table 6
Influence of the membership function c_{i,j} on each dataset with λ = 1. The c_{i,j} of the smallest subtree is defined by Eqs. 4, 5, 6 and 7 with α₁ = 4 and β₁ = 2; the other two c_{i,j} are computed directly from Δh_{i,j}, Δd_{i,j} and Sim(w_i, w_j) without any other conditions attached

Year	Dataset	smallest subtree (α₁ = 4, β₁ = 2)		$(0.5 + \frac{1}{\Delta h_{i,j} + 1}) Sim(w_{i}, w_{j})$		$(0.5 + \frac{1}{\Delta h_{i,j} + 1}) (0.5 + \frac{1}{\Delta d_{i,j}}) Sim(w_{i}, w_{j})$	
		Pcc	MSE	Pcc	MSE	Pcc	MSE
2012	MSRpar	0.6353	0.0268	0.6394	0.0263	0.6417	0.0259
2012	MSRvid	0.8477	0.0346	0.8491	0.0346	0.8519	0.0340
2012	SMTeuroparl	0.5818	0.0136	0.5947	0.0136	0.6113	0.0122
2012	OnWN	0.7792	0.0189	0.7754	0.0194	0.7811	0.0189
2012	SMTnews	0.6991	0.0182	0.6904	0.0180	0.7033	0.0179
2014	deft-forum	0.5518	0.0684	0.5682	0.0659	0.5603	0.0666
2014	deft-news	0.6558	0.0489	0.6655	0.0416	0.6698	0.0408
2014	headlines	0.7678	0.0393	0.7692	0.0392	0.7688	0.0394
2014	images	0.8286	0.0290	0.8332	0.0283	0.8339	0.0298
2014	OnWN	0.9053	0.0282	0.9066	0.0276	0.9049	0.0283
2014	tweet-news	0.8473	0.0223	0.8444	0.0229	0.8461	0.0226
2015	answers-forums	0.8432	0.0244	0.8477	0.0238	0.8474	0.0236
2015	answers-students	0.8558	0.0254	0.8523	0.0267	0.8508	0.0263
2015	belief	0.8444	0.0291	0.8547	0.0266	0.8478	0.0286
2015	images	0.8663	0.0279	0.8661	0.0278	0.8632	0.0285
2015	headlines	0.8148	0.0412	0.8151	0.0413	0.8161	0.0414
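
The two closed-form membership functions in the header of Table 6 can be written out as a minimal sketch. The function names are hypothetical, and the guard on Δd assumes that the distance difference between two distinct words is at least 1; the smallest-subtree variant depends on Eqs. 4 to 7 and is not reproduced here.

```python
def membership_depth_only(delta_h, sim):
    """c_ij = (0.5 + 1 / (Δh + 1)) * Sim(w_i, w_j); Δd is ignored."""
    return (0.5 + 1.0 / (delta_h + 1)) * sim


def membership_depth_and_distance(delta_h, delta_d, sim):
    """c_ij = (0.5 + 1 / (Δh + 1)) * (0.5 + 1 / Δd) * Sim(w_i, w_j)."""
    # Assumption: two distinct words are at least one position apart,
    # so Δd >= 1 and the division below is well defined.
    if delta_d < 1:
        raise ValueError("delta_d must be at least 1 for two distinct words")
    return (0.5 + 1.0 / (delta_h + 1)) * (0.5 + 1.0 / delta_d) * sim


# Hypothetical word pair: depth difference 3, distance difference 4,
# cosine similarity 0.8.
c_h = membership_depth_only(3, 0.8)
c_hd = membership_depth_and_distance(3, 4, 0.8)
print(c_h, c_hd)
```

Both factors shrink towards 0.5 as Δh or Δd grows, so the second function additionally down-weights word pairs that are far apart in the sentence, while the first depends only on how far apart they sit in the tree.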

4.2.4 Influence of the tensor projection

For sentence pairs satisfying 1 > y_i > 0.8 and ɛ > 0.35, the dimension of the text representation is reduced by matrix projection. When the feature matrix is projected onto the semantic relation matrix to extract the semantic features, the stride of the feature matrix is set equal to the size of its core. As the core of the feature matrix grows, the three corpora follow the same trend, as illustrated in Fig. 6: Pcc first increases and then remains unchanged, and MSE first decreases and then remains stable. This shows that once the dimension of the association matrix has been reduced to a certain value, reducing it further amounts to excessive dimension reduction, which may lead to data sparseness.

Fig. 6

Influence of the tensor projection with the stride equal to the matrix core.
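
The effect of setting the stride equal to the core size can be illustrated with a minimal sketch. The exact projection operator is defined earlier in the article and is not reproduced here; the sketch assumes a convolution-style inner product between a k × k core and non-overlapping k × k blocks of a square relation matrix. All names and values are hypothetical.

```python
def project_with_stride(relation, kernel):
    """Project `kernel` (k x k) onto `relation` (n x n) with stride k.

    Each output entry is the inner product of the kernel with one
    non-overlapping k x k block, so the output is (n // k) x (n // k).
    """
    k = len(kernel)
    n = len(relation)
    out_size = n // k  # rows/columns that do not fill a whole block are dropped
    out = []
    for bi in range(out_size):
        row = []
        for bj in range(out_size):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * relation[bi * k + i][bj * k + j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4 x 4 relation matrix and 2 x 2 core of ones.
relation = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[1, 1], [1, 1]]

print(project_with_stride(relation, kernel))  # [[14.0, 22.0], [46.0, 54.0]]
```

Because the blocks do not overlap, a core of size k reduces an n × n matrix to roughly (n / k) × (n / k), which is why a larger core reduces the dimension faster.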

5 Conclusions and future work

This article provides a quantum-like text representation based on a syntax tree to analyse the fuzzy modifying relationships between words for semantic similarity computing. The main advantages and differences compared to the existing models are summarized as follows. 1) To highlight the semantic relationship between words, the quantum-like text representation based on the density matrix of individual words is generalized to represent the relationship of modification between words. 2) Drawing on fuzzy mathematics, a fuzzy semantic membership function is constructed to discuss the different degrees of modifying relationships between words based on the syntax tree. 3) To extract the semantic information carried by the elements with larger absolute values in the quantum-like sentence representation, we alter the pooling layers of the CNN. 4) To effectively exploit the semantic information of all elements in the quantum-like sentence representation, the tensor dot product is defined as the sentence semantic similarity by combining the operation rules of the tensor.

Note that this study has certain limitations. This article mainly uses the computational method of quantum mechanics to preprocess the text representation in order to discuss the fuzzy semantic association between words. The dimension of the sentence representation obtained by linear superposition of the density matrices between words expands from the d × N of word2vec to d², thus increasing the computational complexity of the model. Future research will focus on reducing the computational complexity of this model and applying it to other natural language processing tasks, such as fuzzy emotional association between words, multimodal fuzzy semantics and emotional associations.

Footnotes

https://nlp.stanford.edu/software/lex-parser.shtml

http://code.google.com/archive/p/word2vec

http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en

http://groups.google.com/group/STS-semeval

References

1. Wang, Li, Huang, et al., Learning two-branch neural networks for image-text matching tasks, IEEE Trans Pattern Anal Mach Intell 41(2) (2019), 394–407.

2. Yang, Wang, Xie, et al., Multi-sentence auxiliary adversarial networks for fine-grained text-to-image synthesis, IEEE Trans Image Process 30 (2021), 2798–2809.

3. Zhang, Zhou, Wang, et al., Deep relation embedding for cross-modal retrieval, IEEE Trans Image Process 30 (2021), 617–627.

4. Canales, Strapparava, Boldrini, et al., Intensional learning to efficiently build up automatically annotated emotion corpora, IEEE Trans Affect Comput 11(2) (2020), 335–347.

5. Luo, Zhang, Qin, et al., Tourism attraction selection with sentiment analysis of online reviews based on probabilistic linguistic term sets and the IDOCRIW-COCOSO model, Int J Fuzzy Syst 23 (2021), 295–308.

6. , Zhou, Xu, et al., SEA: Sentence encoder assembly for video retrieval by textual queries, IEEE Trans Multim 23 (2021), 4351–4362.

7. Khameneh, Kilicman, Ali, Transitive fuzzy similarity multigraph-based model for alternative clustering in multi-criteria group decision-making problems, Int J Fuzzy Syst 2022.

8. Ben, Pan, Li, et al., Unpaired image captioning with semantic-constrained self-learning, IEEE Trans Multim 2021, 1–1.

9. Gao, Xu, Huang, et al., Jointly learning topics in sentence embedding for document summarization, IEEE Trans Knowl Data Eng 32(4) (2020), 688–699.

10. Huang, Wang, Few-shot image and sentence matching via aligned cross-modal memory, IEEE Trans Pattern Anal Mach Intell 2021, 1–1.

11. , Kang, Jiang, Semantic key generation based on natural language, Int J Intell Syst 2021, 1–24.

12. John, Mohit, Kevin, et al., Towards universal paraphrastic sentence embeddings, in Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, 2016.

13. Tom, Alexey, Maarten de, Siamese CBOW: Optimizing word embeddings for sentence representations, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, 2016.

14. Wang, Zhang, Zong, Learning sentence representation with guidance of human attention, in Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI 2017, 2017, 4137–4143.

15. Quan, Wang, Le, et al., An efficient framework for sentence similarity modeling, IEEE ACM Trans Audio Speech Lang Process 27(4) (2019), 853–865.

16. Wang, Jay, Kuo, SBERT-WK: A sentence embedding method by dissecting BERT-based word models, IEEE ACM Trans Audio Speech Lang Process 28 (2020), 2146–2157.

17. Dimitris, Shahram, Massimo, et al., Quantum cognitively motivated decision fusion for video sentiment analysis, in Proceedings of the 35th Conference on Artificial Intelligence, Online, 2021, 827–835.

18. Wang, Zhao, Christina, et al., Encoding word order in complex embeddings, in Proceedings of the 9th International Conference on Learning Representations (ICLR2020), Addis Ababa, 2020, 1–15.

19. Zhang, Niu, Su, et al., End-to-end quantum-like language models with application to question answering, in Proceedings of the 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), Louisiana, 2018, 5666–5673.

20. Devlin, Chang, Lee, et al., Bert: Pre-training of deep bidirectional transformers for language understanding, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, 2019, 4171–4186.

21. , Zhou, He, et al., On the sentence embeddings from pre-trained language models, in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Online, 2020, 9119–9130.

22. Yan, Li, Wang, ConSERT: A contrastive framework for self-supervised sentence representation transfer, in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021, 5065–5075.

23. Gao, Yao, Chen, SimCSE: Simple contrastive learning of sentence embeddings, in Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2021), Online, 2021, 6894–6910.

24. Jiang, Wang, Deep continuous prompt for contrastive learning of sentence embeddings, 2022, arXiv:2203.06875v1.

25. Liu, Jiao, Massiah, et al., Trans-encoder: unsupervised sentence-pair modelling through self- and mutual-distillations, in Proceedings of the 11th International Conference on Learning Representations (ICLR 2022), online, 2022, arXiv:2109.13059v4.

26. Lan, Chen, Goodman, et al., ALBERT: A lite BERT for self-supervised learning of language representations, in Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, 2020.

27. Alexis, Guillaume, Cross-lingual language model pretraining, in Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 2019, 7057–7067.

28. Liu, Ott, Goyal, et al., RoBERTa: A robustly optimized BERT pretraining approach, CoRR, abs/1907.11692, 2019. http://arxiv.org/abs/1907.11692

29. , Dai, Guerin, et al., BERT-hLSTMs: BERT and hierarchical LSTMs for visual story telling, Comput Speech Lang 67 (2021), 101169.

30. Arase, Tsujii, Transfer fine-tuning of BERT with phrasal paraphrases, Comput Speech Lang 66 (2021), 101164.

31. Brown, Mann, Ryder, et al., Language models are few-shot learners, in Proceedings of Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, 2020.

32. Zhao, Mao, Fuzzy bag-of-words model for document representation, IEEE Trans Fuzzy Syst 26(2) (2018), 794–804.

33. Karpagam, Manikandan, Multi-level fuzzy based Renyi entropy for linguistic classification of texts in natural scene images, Int J Fuzzy Syst 22 (2020), 438–449.

34. Qiu, Zheng, Improving TextRank algorithm for automatic keyword extraction with tolerance rough set, Int J Fuzzy Syst 2021.

35. Vashishtha, Susan, Sentiment cognition from words shortlisted by fuzzy entropy, IEEE Trans Auton Ment Dev 12(3) (2020), 541–550.

36. Cardone, Di Martino, Senatore, Improving the emotion-based classification by exploiting the fuzzy entropy in FCM clustering, Int J Intell Syst 36(11) (2021), 6944–6967.

37. Mehlawat, Gupta, Khaitan, Multiobjective fuzzy vehicle routing using Twitter data: reimagining the delivery of essential goods, Int J Intell Syst 36(7) (2021), 3566–3595.

38. Orouskhani, Shi, Cheng, A fuzzy adaptive dynamic NSGA-II with fuzzy-based borda ranking method and its application to multimedia data analysis, IEEE Trans Fuzzy Syst 29(1) (2021), 118–128.

39. Dang, Luk, Allan, A principled approach using fuzzy set theory for passage-based document retrieval, IEEE Trans Fuzzy Syst 29(7) (2021), 1967–1977.

40. Chang, Lee, Subtopic segmentation for small corpus using a novel fuzzy model, IEEE Trans Fuzzy Syst 15(4) (2007), 699–710.

41. Valerie, Valeria, Keeley, et al., Ontological and fuzzy set similarity between perception-based words, in Proceedings of 2019 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2019, 2019, 1–6.

42. Zhao, Zhu, Han, A novel model for semantic similarity measurement based on wordnet and word embedding, J Intell Fuzzy Syst 40(5) (2021), 9831–9842.

43. Pascual, Fernando, Implementing WordNet measures of lexical semantic similarity in a fuzzy logic programming system, Theory Pract Log Program 21(2) (2021), 264–282.

44. Das, Ghosh, Reducing parameter value uncertainty in discrete Bayesian network learning: a semantic fuzzy Bayesian approach, IEEE Trans Emerg Top Comput Intell 5(3) (2021), 361–372.

45. Belkhatir, Fuzzy-logic-based integration of web contextual linguistic structures for enriching conceptual visual representations, IEEE Trans Emerg Top Comput Intell 3(4) (2019), 351–356.

46. , Zhang, Ge, et al., Uncertainty measurement for a fuzzy relation information system, IEEE Trans Fuzzy Syst 27(12) (2019), 2338–2352.

47. , Li, et al., A comprehensive exploration on spider with fuzzy decision text-to-SQL model, IEEE Trans Ind Informatics 16(4) (2020), 2542–2550.

48. Yang, Liu, Chen, et al., A hierarchical clustering approach to fuzzy semantic representation of rare words in neural machine translation, IEEE Trans Fuzzy Syst 28(5) (2020), 992–1002.

49. Liu, Huang, Xuan, et al., A fuzzy word similarity measure for selecting top-k similar words in query expansion, IEEE Trans Fuzzy Syst 2020, 1–1.

50. Cross, Mokrenko, Crockett, et al., Using fuzzy set similarity in sentence similarity measures, in Proceedings of the 29th IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2020, 2020, 1–8.

51. Dominic, Kirsty, Trevor, Quantum mathematics in artificial intelligence, J Artif Intell Res 72 (2021), 1307–1341.

52. Diederik, Jonito, Lester, et al., Context and interference effects in the combinations of natural concepts, in Proceedings of Modeling and Using Context-10th International and Interdisciplinary Conference 10257 (2017), 677–690.

53. Aerts, Beltran, Quantum structure in cognition: human language as a Boson gas of entangled words, Found Sci 25 (2020), 755–802.

54. Aerts, Beltran, Geriente, et al., Quantum-theoretic modeling in computer science: a complex Hilbert space model for entangled concepts in corpuses of documents, Int J Theor Phys 99 (2019), 1–15.

55. , Wang, Song, et al., A dependency parser for spontaneous Chinese spoken language, ACM Trans Asian Low Resour Lang Inf Process 17(4) (2018), 28:1–28.

56. Diederik, Aerts, Lester, et al., Towards a quantum world wide web, Theor Comput Sci 752 (2018), 116–131.

57. Trevor, Dominic, Bringing order to neural word embeddings with embeddings augmented by random permutations (EARP), in Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL2018), Brussels, 2018, 465–475.

58. Liu, Zhang, Li, et al., What does your smile mean? Jointly detecting multi-modal sarcasm and sentiment using quantum probability, in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP2021), Online, 2021, 871–880.

59. Jiang, Zhang, Gao, et al., A quantum interference inspired neural matching model for Ad-hoc retrieval, in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2020), Online, 2020, 19–28.

60. Zhang, Chen, Wang, et al., Quantum-based subgraph convolutional neural networks, Pattern Recogn 88 (2019), 38–49.

61. Zhang, Song, Li, et al., A quantum-like multimodal network framework for modeling interaction dynamics in multiparty conversational sentiment analysis, Inform Fusion 62 (2020), 14–31.

62. Zhang, Song, Zhang, et al., A quantum-inspired sentiment representation model for twitter sentiment analysis, Appl Intell 49(8) (2019), 3093–3108.

63. Zhang, Song, Zhang, et al., A quantum-inspired multimodal sentiment analysis framework, Theor Comput Sci 752 (2018), 21–40.

64. Dimitris, Li, Yu, et al., An entanglement-driven fusion neural network for video sentiment analysis, in Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI-21), Online, 2021, 1736–1742.

65. Zhang, Liu, Li, et al., CFN: A complex-valued fuzzy network for sarcasm detection in conversations, IEEE Trans Fuzzy Syst 29(12) (2021), 3696–3710.

66. Zhang, Zhang, Ma, et al., A generalized language model in tensor space, in Proceedings of the 33rd Conference on Artificial Intelligence, the 31st Innovative Applications of Artificial Intelligence Conference, and the 9th Symposium on Educational Advances in Artificial Intelligence, Hawaii, 2019, 7450–7458.

67. , Zhang, et al., A tensorized transformer for language modeling, in Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, 2019, 2229–2239.

68. Zhang, Zhang, Ma, et al., TensorCoder: Dimension-wise attention via tensor representation for natural language modeling, in Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Online, 2020, 1–11.

69. Jiang, Zhang, Gao, et al., A match-transformer framework for modeling diverse relevance patterns in Ad-hoc retrieval, Aust J Intell Inf Process Syst 17(1) (2019), 39–47.
