Abstract
As a fundamental task of natural language processing, semantic role labeling (SRL) has attracted much attention from researchers in recent years. However, as more and more features are added, the performance gains of SRL systems are gradually slowing down, so new ways must be found to improve the performance of semantic analysis. Word sense information is useful for the SRL task, but how to make effective use of it is a key issue. Referring to synergetics, we regard the semantic analysis process as a competitive process among many semantic order parameters under the coherent action and interactive collaboration of semantic role-related features and word sense-related features. Accordingly, we propose a semantic role labeling model with word sense information based on an improved synergetic neural network (SNN). Our contributions are three-fold. Firstly, role-related features and word sense-related features are used to configure the semantic order parameters of the SNN. Secondly, the network parameters are reconstructed so that they reflect how the various linguistic features drive and restrain each other. Finally, we use an improved quantum particle swarm optimization (QPSO) algorithm, which has stronger search ability and faster convergence speed, to optimize the network parameters. Experiments on the OntoNotes 2.0 corpus show that the proposed model leads to higher SRL performance.
Introduction
Automatic semantic analysis is an important task in natural language processing research. However, given the current level of technology, deep semantic analysis remains difficult. Semantic role labeling [1, 2] is a simplified form of semantic analysis: it does not analyze the complete semantics of the entire sentence, but only labels the semantic roles related to each predicate. Semantic role labeling has several advantages, such as a clear problem definition and ease of manual annotation, and thus has broad application prospects. Solving the semantic role labeling problem well will help other natural language processing tasks such as machine translation [3], question answering [4, 5] and information extraction [6].
At present, mainstream studies of semantic analysis employ a variety of statistical machine learning techniques and linguistic features for semantic recognition. However, as more and more features are added, the improvement in semantic analysis performance gradually slows down. Some scholars adopt detailed and rich feature engineering to improve the performance of SRL systems. These features are primarily derived from lexical information, part of speech information and syntactic information. For labeling models, however, features extracted from the lexical information of words are too sparse, while features extracted from part of speech are too general. Word sense-related information may capture relationships that lie between lexical information and part of speech information and that neither describes well. Introducing word sense will therefore help to determine semantic roles and further improve the performance of the labeling system. Currently there are still few studies that apply word sense information to semantic role labeling, and only a small number of scholars have made attempts. Surdeanu et al. [7] first proposed a predicate recognition model in CoNLL 2008, and this task can be regarded as word sense disambiguation. Meza-Ruiz and Riedel [8] attempted to improve the performance of semantic role labeling with predicate word sense information. Che et al. [9] considered applying the word sense information of all words in a semantic role labeling system, and discussed the impact of predicate-related features and word-related features on system performance.
In the semantic role labeling process, the introduction of word sense information is expected to achieve better results. But how to better reflect the relationship between the various semantic features is a key issue. In current mainstream SRL systems (SVM-based model [10]; MaxEnt-based model [11, 12]; decision tree model [13]; AdaBoost [14]), semantic role information and word sense information are considered to be independent, so possible dependencies between them are ignored and the semantic information cannot be fully used.
To let the two kinds of information inform each other, this paper introduces synergetic pattern recognition [15] into semantic analysis. It can better represent the global constraints between semantic role information and word sense information, so that semantic role labeling can reach a globally optimal solution in the integration process. Compared with other pattern recognition methods, the synergetic neural network has greater advantages in fuzzy matching applications such as face recognition [16], image retrieval [17] and mangrove classification [18]. Incomplete contextual information often occurs in the semantic annotation process, for example missing part of speech, word sense or semantic role information. Applying a synergetic neural network to the semantic annotation problem is therefore expected to achieve better performance. An improved SNN-based SRL model is proposed and implemented in this paper; it can better handle fuzzy matching of semantic information and thereby improve the performance of SRL.
This paper is organized as follows. In Section 2, some related work is introduced. A brief introduction to SNN and the SNN-based SRL model is given in Section 3. An improved SRL model with word sense is introduced in Section 4. Experimental results and conclusions are given in the last sections.
Related work
In general, semantic role labeling and word sense disambiguation are considered to be independent, and word sense information is rarely used in semantic role labeling. In recent years, a few researchers have carried out related research. Dang and Palmer [19] used semantic role information in verb sense disambiguation; their experiments showed that the performance of word sense disambiguation can be improved by leveraging information about predicate arguments. Hajič et al. [20] used predicate sense information to help semantic role labeling. Dahlmeier et al. [21] reported a joint model for word sense disambiguation of prepositions and semantic role labeling of prepositional phrases, which can lead to an improvement for both subtasks. Che and Liu [22] proposed a joint model of semantic role labeling and word sense disambiguation based on Markov logic, which can further improve the performance of semantic analysis.
The synergetic neural network [15] is a pattern recognition method proposed by the German physicist Professor Haken. It extends synergetic theory into cognitive science and computer science by taking advantage of the deep similarities between spontaneous pattern formation and pattern recognition. The synergetic neural network has no pseudo-states, has relatively strong tolerance to noise and defects, and its order parameter evolution conforms to human cognitive processes. In recent years, extensive research has been carried out on learning algorithms for SNN, especially on the setting of attention parameters [23–25], the selection of prototype pattern vectors [26], and reconstruction algorithms for order parameters [27].
In the learning process of a synergetic neural network, attention parameters play an important role in biased regulation. By adjusting the synergetic attention parameters, the self-learning ability of the network can be fully utilized to improve its recognition results. Fang et al. [23] adopt an award-penalty learning method for attention parameters: based on the misidentification of test patterns during learning, the attention parameters of the classes that should have been recognized are increased and those of the misidentified classes are decreased. This dynamic adjustment of attention parameters enables the system to achieve its best recognition performance. Ma et al. [24] argue that the adjustment of synergetic parameters is a global behavior; they introduce an immune clone algorithm into the optimization of synergetic attention parameters and obtain better recognition performance by setting the mutation probability and cloning scale. However, this method has an enlarged search space and a slow convergence rate. Zou et al. [25] proposed a parameter optimization algorithm based on differential evolution, which improves the global and local optimization ability of the algorithm. Nevertheless, the method converges slowly, which is undesirable in some practical applications.
In this paper, an improved quantum particle swarm optimization [27, 28] is proposed to optimize the parameters of the synergetic neural network. It avoids some shortcomings of traditional evolutionary algorithms, such as poor global convergence ability and slow convergence rate.
An integrative semantic analysis method based on synergetic neural network
A brief introduction to SNN
Synergetics mainly studies how the individuals in a complex system bring about a qualitative change of the system state through mutual synergies. The basic idea is as follows: a cooperative system is composed of various subsystems, and when the subsystems are far from equilibrium, synergies and coherence effects are generated until a certain critical point is reached, at which the system state changes fundamentally in time and space, from a disordered state into an ordered state. Professor Haken pointed out that the new patterns and new features formed by self-organization are governed by the same principles.
A certain number of order parameters can be constructed from the test pattern q and the prototype patterns. They evolve according to the kinetic equation and eventually drive the test pattern q from the intermediate state q(t) into a prototype pattern v_k, thus completing the identification of q, as shown in formula (1).
Haken noted that the synergetic pattern recognition process can be described with a potential kinetic equation. If the fluctuation force F(t) and the transient component are ignored, and M is the number of prototype pattern vectors, then the potential function equation is:
Corresponding kinetic equation is:
Substituting the order parameter into formula (2) yields the corresponding kinetic equation of the order parameter (4) and the potential equation (5).
According to the above synergetic theory, kinetic equation of order parameter can be discretized into formula (6).
In the formula, n represents the current iteration number and r is the step size. λ_k is the attention parameter, while B and C are given parameters.
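The discretized iteration can be illustrated with a minimal sketch. It assumes the standard form of Haken's discrete order parameter equation, ξ_k(n+1) − ξ_k(n) = r(λ_k − D + Bξ_k²(n))ξ_k(n) with D = (B + C)Σ_k' ξ_k'²(n), which may differ in detail from formula (6). The function name, default values and example inputs are illustrative assumptions.

```python
def evolve_order_parameters(xi0, lam, B=1.0, C=1.0, r=0.1, n_iter=2000):
    """Iterate the (assumed) discrete order parameter dynamics.

    xi0 : initial order parameters, one per prototype pattern
    lam : attention parameters lambda_k, same length as xi0
    B, C: given network parameters; r: step size
    """
    xi = list(xi0)
    for _ in range(n_iter):
        # D couples every order parameter to the squared sum of all of them.
        D = (B + C) * sum(x * x for x in xi)
        xi = [x + r * (l - D + B * x * x) * x for x, l in zip(xi, lam)]
    return xi


# With equal attention parameters, the largest initial order parameter
# is expected to win the competition and the others decay towards zero.
final = evolve_order_parameters([0.3, 0.5, 0.4], lam=[1.0, 1.0, 1.0])
winner = max(range(len(final)), key=lambda k: final[k])
```

With balanced attention parameters the competition is decided by the initial order parameters alone; unequal values of λ_k bias the competition towards particular patterns.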
This section constructs SRL model based on synergistic interaction between semantic roles, as shown in Fig. 1, which mainly contains order parameter configuration network and order parameter evolving network. In order parameter configuration network, configuration of semantic role chain order parameter is achieved. In order parameter evolving network, evolution equation configuration and network parameter configuration are implemented. Through iterative competition of semantic order parameters in dynamics equation, marked role chain is ultimately obtained.
The order parameter configuration network should complete the following tasks: obtain feature vectors from the training corpus and construct prototype pattern vectors; obtain feature vectors from the test corpus and construct test pattern vectors; configure the order parameters that reflect the similarity between prototype pattern vectors and test pattern vectors; sort the order parameters to obtain the N-best candidate roles; assemble the semantic roles into candidate role chains and calculate the order parameter of each role chain.
In the order parameter evolving network, the following tasks must be completed: configure the dynamic evolution equation; set the network parameters; obtain the final labeling pattern through the evolution of the order parameter equation.
The semantic role labeling process can be seen as a competition among the order parameters of many role chains. The order parameter with the largest support wins, which yields the final labeling result.
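The N-best selection and chain assembly steps can be sketched as follows. This is only an illustration: the role labels and values are hypothetical, and the rule used to score a chain (the product of its members' order parameters) is an assumption, since the text does not state how the chain order parameter is computed.

```python
from itertools import product
from math import prod


def n_best(candidates, n):
    """Keep the n candidate roles with the largest order parameters."""
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:n]


def candidate_chains(per_argument, n=2):
    """Assemble role chains from the N-best roles of each argument.

    per_argument: list of dicts mapping role label -> order parameter.
    The chain score is ASSUMED to be the product of its members' order
    parameters; the source does not specify this combination rule.
    """
    pools = [n_best(c, n) for c in per_argument]
    chains = []
    for combo in product(*pools):
        roles = tuple(role for role, _ in combo)
        score = prod(value for _, value in combo)
        chains.append((roles, score))
    return sorted(chains, key=lambda rs: rs[1], reverse=True)


# Hypothetical order parameters for two arguments of one predicate.
args = [{"A0": 0.7, "A1": 0.2, "AM-TMP": 0.1},
        {"A1": 0.6, "A2": 0.3, "A0": 0.1}]
chains = candidate_chains(args, n=2)
```

The resulting chain scores would then serve as initial order parameters for the competition in the evolving network.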
Semantic role labeling based on improved SNN with word sense
How to make full use of word sense information is a problem to be solved in SNN-based SRL systems. Order parameter conversion aims to give the initial order parameter of the corresponding pattern a dominant position in the competition, whereas adjusting the regulation parameters can let a pattern whose initial order parameter is not the largest win the competition. How to coordinate the two during optimization therefore deserves further study.
In this section, we implement a synergetic semantic role labeling model with word sense information, as shown in Fig. 2. First, role-related features and word sense-related features are added to the semantic vector space model. Second, the network parameters are reconstructed according to the logical constraints that exist in semantic role chains, to better reflect the logical relationship between word senses and semantic roles. Finally, the best role chain is obtained through the evolution of the semantic order parameters.
Feature engineering
The features used in our model comprise basic features and extended features, which have been shown to be effective for the SRL task [30, 31]. To better reflect the role of word sense information, we also select a number of word sense-related features that have been introduced in an existing SRL system [22]. The features include:
Basic features: Predicate lemma, Predicate part of speech (POS), Predicate voice, Dependency, Predicate path (the dependency chain from the current node to the current predicate), The dependency chain of predicate child, Predicate subclass framework, Head word and Head word path (the sequence of headword nodes relative to the current predicate).
Expansion features: Syntactic path, Syntactic path length, Partial syntactic path length, Relation path, Partial relation path, Partial syntactic path, Dependency chain of predicate brothers, Predicate relation, Predicate+syntactic path, Syntactic subclasses framework of predicate, Head word chain of predicate brothers, Predicate+head word, POS of head word, Head word+POS and Current relation of head word.
Word sense-related features:
Lemma+Sense: For example, sheep.n.1 (n indicates that the word is a noun, and 1 is the corresponding sense number in WordNet).
Hypersense(n): The n-layer hypernym sense, where the variable n denotes the layer. For example, the first-layer hypernym sense of “sheep.n.1” is “bovine.n.1”.
For example, WordNet gives bovine.n.1 as the hypernym sense of sheep.n.1, and the root sense is eventually reached by continuing the search. The hypernym sense of bovine.n.1 is ruminant.n.1, and if this process continues, the root sense entity.n.1 is eventually found. For n = 1 and n = 2, the hypernym senses of “sheep” and “cow” are the same, namely “bovine.n.1” and “ruminant.n.1”. The word sense paths of sheep.n.1 and cow.n.1 are:
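The hypernym lookup behind Hypersense(n) can be sketched with a small hand-written table. The table below is a toy stand-in that only encodes the senses named in the example above and is deliberately truncated at ruminant.n.1; a real system would query WordNet instead. The function names and the string format of the combined features are assumptions made for illustration.

```python
# Toy hypernym table following the example in the text; a real system would
# query WordNet instead. The chain is truncated at ruminant.n.1 on purpose.
HYPERNYM = {
    "sheep.n.1": "bovine.n.1",
    "cow.n.1": "bovine.n.1",
    "bovine.n.1": "ruminant.n.1",
}


def hypersense(sense, n):
    """Return the n-th layer hypernym sense, or None if the table runs out."""
    current = sense
    for _ in range(n):
        current = HYPERNYM.get(current)
        if current is None:
            return None
    return current


def sense_features(lemma, sense, max_layer=3):
    """Build Lemma+Sense, Hypersense(n) and Lemma+Hypersense(n) features."""
    features = {"Lemma+Sense": sense}
    for n in range(1, max_layer + 1):
        hyper = hypersense(sense, n)
        if hyper is not None:
            features[f"Hypersense({n})"] = hyper
            features[f"Lemma+Hypersense({n})"] = f"{lemma}+{hyper}"
    return features
```

Because sheep.n.1 and cow.n.1 share their first two hypernym layers in this table, the Hypersense(1) and Hypersense(2) features let the model generalize across the two lemmas.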
Order parameters reconstruction with word sense
The order parameter reflects the degree of similarity between a prototype pattern vector and the test pattern vector. The closer the two vectors are, the greater the corresponding order parameter and the greater the chance of winning the competition. The quality of order parameter construction therefore greatly affects the recognition performance of the system. Traditional synergetic pattern recognition uses the pseudo-inverse method to construct order parameters, but this method requires a lot of computing power, which makes it impractical for pattern recognition on large-scale data. Some researchers instead reconstruct the order parameter: they avoid the pseudo-inverse method and construct the order parameter directly from the degree of similarity between the input pattern vector and the prototype pattern vector, thus saving time and space.
After the introduction of word sense information, how to construct the order parameter is a key problem. Suppose the test pattern vector with word sense information is q_s and r_s is one of its possible candidate roles; then the order parameter ξ(q_s, r_s) of candidate role r_s can be calculated by Equation (7):
Here, f_i(q_s, r_s) is a feature function and ω_i represents the weight of f_i(q_s, r_s). Suppose q_s = (q_s1, q_s2, …); then the expression of f_i(q_s, r_s) is:
As long as ω_i can be obtained, the semantic order parameter ξ(q_s, r_s) can be calculated by formula (7). The weights ω_i may be estimated with the sequential conditional generalized iterative scaling algorithm (SCGIS, [32]), which seems to be very efficient. SCGIS is a model selection or parameter estimation algorithm; we use it to train the parameters of the model iteratively.
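How weighted feature functions can be combined into an order parameter is sketched below. The normalized log-linear form and the binary indicator features are assumptions, chosen because SCGIS is typically used to train maximum-entropy models; they are not taken from Equation (7). The feature names and weights are hypothetical.

```python
import math


def order_parameters(active_features, weights, roles):
    """Score every candidate role with an ASSUMED log-linear model.

    active_features: feature names extracted from the test pattern q_s
    weights: dict mapping (feature name, role) -> weight omega_i
    roles: candidate role labels r_s
    Each f_i is treated as a binary indicator that fires when its feature
    is present together with the candidate role.
    """
    scores = {}
    for role in roles:
        total = sum(weights.get((feat, role), 0.0) for feat in active_features)
        scores[role] = math.exp(total)
    z = sum(scores.values())  # normalise so the order parameters sum to 1
    return {role: s / z for role, s in scores.items()}


# Hypothetical weights, as if produced by an estimator such as SCGIS.
weights = {("Lemma+Sense=sheep.n.1", "A0"): 1.2,
           ("Hypersense(1)=bovine.n.1", "A0"): 0.4,
           ("Lemma+Sense=sheep.n.1", "A1"): 0.3}
xi = order_parameters(["Lemma+Sense=sheep.n.1", "Hypersense(1)=bovine.n.1"],
                      weights, ["A0", "A1"])
```

Under these assumptions a role supported by more heavily weighted features starts the competition with a larger order parameter.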
The dynamical evolution Equation (4) reflects the dominant principle of synergetics. It can be rewritten as:
In formula (8), λ_k ξ_k is the self-excitation term, which represents feedback excitation of the model. The second term is the self-inhibition term, which reflects the model’s inhibition of excessive growth. The third term is the lateral inhibition term, which reflects mutual inhibition between the models and acts as a penalty term. λ_k is the attention parameter, while B and C are given parameters.
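For reference, the split into these three terms can be written out under the assumption that Equation (4) has the standard form given by Haken for order parameter dynamics. The regrouping is purely algebraic and introduces no new parameters; it may differ in notation from formula (8).

```latex
\begin{aligned}
\dot{\xi}_k &= \lambda_k \xi_k
  - B \sum_{k' \neq k} \xi_{k'}^{2}\,\xi_k
  - C \Big( \sum_{k'=1}^{M} \xi_{k'}^{2} \Big) \xi_k \\
&= \underbrace{\lambda_k \xi_k}_{\text{self-excitation}}
  - \underbrace{C\,\xi_k^{3}}_{\text{self-inhibition}}
  - \underbrace{(B + C) \sum_{k' \neq k} \xi_{k'}^{2}\,\xi_k}_{\text{lateral inhibition}}
\end{aligned}
```

The second line follows by separating the k' = k term from the sum multiplied by C.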
The network parameters (λ_k, B, C) have an important influence on the recognition performance of the synergetic neural network. In this section, an efficient reconstruction and optimization algorithm for the network parameters is proposed.
Suppose q_1, q_2, …, q_n are the patterns that remain to be recognized. A number of features f_j(q_i) (i = 1, 2, …, n; j = 1, 2, …, m) and g_k(q_i) (i = 1, 2, …, n; k = 1, 2, …, m) are extracted from q_1, q_2, …, q_n, where f_j(q_i) (j = 1, 2, …, m) are the features associated with role information and syntactic information, and g_k(q_i) (k = 1, 2, …, m) are the features associated with word sense. n and m are the numbers of corresponding feature functions. Then
We use semantic information and word sense information to construct the network parameters, which can better describe the weight of each role chain.
4.3.2.1. Introduction to QPSO. Particle swarm optimization (PSO) is a population-based optimization algorithm [33, 34]. Quantum particle swarm optimization (QPSO) [28, 29], proposed by Sun et al. in 2004, is a probabilistic particle swarm optimization algorithm based on quantum computation theory. It has been proved that the QPSO algorithm is globally convergent. QPSO does not need the velocity information of the particles, and has the advantages of fewer control parameters, simple operation and fast convergence. It is a general global optimization technique that is suitable for many kinds of complex optimization problems. In recent years, a series of papers have focused on applications of QPSO, such as traveling salesman problems [35], streamflow forecasting [36], RLV reentry [37] and economic load dispatch [38].
In the QPSO algorithm, the updated position of the particle is:
Here, t is the current iteration number, x_i = (x_i1, x_i2, …, x_iD) is the current position of particle i, p_i = (p_i1, p_i2, …, p_iD) is the attractor of particle i in the evolutionary iteration process, and D is the dimension of the problem to be solved. φ_ij(t) and μ_ij(t) are random numbers uniformly distributed in [0,1]. The control of L_ij(t) has a critical impact on the convergence rate and performance of the algorithm; it can be calculated by introducing the mean best position mbest.
Here, β is the contraction-expansion factor used to adjust the convergence rate of the algorithm, and its calculation formula is:
Here, T is the maximum number of iterations and t is the current iteration number. The calculation formula of mbest is:
The final updating equation of the particle is then:
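A basic QPSO loop of the kind described in this subsection can be sketched as follows. It uses the commonly published update x_ij(t+1) = p_ij(t) ± β|mbest_j − x_ij(t)| ln(1/u), with mbest taken as the mean of the personal best positions. The linear schedule for β (from 1.0 down to 0.5), the clamping to the search box, the sphere test function and all names are assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def qpso(fitness, dim, bounds, n_particles=20, n_iter=200, seed=0):
    """Minimise `fitness` with a basic QPSO loop (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_f = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    history = [gbest_f]

    for t in range(1, n_iter + 1):
        # Assumed schedule: beta decreases linearly from 1.0 to 0.5.
        beta = 0.5 + 0.5 * (n_iter - t) / n_iter
        # mbest is the mean of all personal best positions.
        mbest = [sum(p[j] for p in pbest) / n_particles for j in range(dim)]
        for i in range(n_particles):
            for j in range(dim):
                phi = rng.random()
                u = 1.0 - rng.random()  # in (0, 1], keeps log(1/u) finite
                attractor = phi * pbest[i][j] + (1.0 - phi) * gbest[j]
                step = beta * abs(mbest[j] - x[i][j]) * math.log(1.0 / u)
                x[i][j] = attractor + step if rng.random() < 0.5 else attractor - step
                x[i][j] = min(max(x[i][j], lo), hi)  # clamp to the search box
            f = fitness(x[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = x[i][:], f
        history.append(gbest_f)
    return gbest, gbest_f, history


def sphere(v):
    """Simple convex test function with its minimum at the origin."""
    return sum(c * c for c in v)


best, best_f, history = qpso(sphere, dim=3, bounds=(-5.0, 5.0))
```

Note that no velocity vector is stored: each new position is sampled directly around the attractor, which is the property that distinguishes QPSO from standard PSO.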
4.3.2.2. The improved QPSO algorithm (IQPSO). Traditional QPSO has some defects: it easily falls into local optima and converges slowly. Researchers have put forward a number of corresponding improvements. To maintain the diversity of the population, Coelho [39] proposed a Gaussian quantum particle swarm optimization method, which realizes mutation through a Gaussian probability distribution operator. Xi et al. [40] obtain the mean best position mbest through a weighted average, which can improve the global search ability of the algorithm. These remedies for premature convergence are based on the idea of maintaining the diversity of the particle swarm. They improve global search ability, but reduce the convergence rate to varying degrees.
In the quantum particle swarm algorithm, the local attractor is jointly determined by the local optimum P_ij(t) and the global optimum P_gj(t):
We note that if different convergence velocities and amplitudes are used in different stages, premature convergence can also be effectively prevented, so that better convergence performance is obtained. The two accelerating factors c_1 and c_2 not only affect the convergence speed, but can also lead to premature convergence. When the diversity is good, c_1 should be set a little larger and c_2 slightly smaller. When the diversity is poor, the opposite applies.
The degree of aggregation of the particle swarm can be described by its diversity, which is denoted by α = [α_1, α_2, …, α_T], where α_t is the adaptive diversity of the swarm at iteration t and T is the maximum number of iterations. Population diversity depends on the average of the current particle positions in the swarm and on the optimal group position.
In the formula, N represents the number of particles in the swarm; |L| represents the maximum diagonal length of the search space; x_id represents the d-th dimension component of the position of particle i; D indicates the dimension of the problem to be solved; and the averaged term is the mean of the d-th dimension components of all particle positions.
Assuming that Uc and Lc are the upper and lower bounds of the parameters respectively, then accelerating factor function can be obtained:
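The diversity measure and its use for setting the accelerating factors can be illustrated with a hedged sketch. The diversity function follows the description above (mean distance of the particles from the swarm centroid, scaled by |L|). The linear map from diversity to c_1 and c_2 and the random convex combination used for the attractor are assumptions that merely respect the stated rule (larger c_1 and smaller c_2 when diversity is high); they are not the paper's accelerating factor functions.

```python
import math


def swarm_diversity(positions, diag_length):
    """Mean distance of particles from the swarm centroid, scaled by |L|."""
    n = len(positions)
    dim = len(positions[0])
    mean = [sum(p[d] for p in positions) / n for d in range(dim)]
    total = sum(math.sqrt(sum((p[d] - mean[d]) ** 2 for d in range(dim)))
                for p in positions)
    return total / (n * diag_length)


def accelerating_factors(alpha, lc, uc):
    """ASSUMED linear map: high diversity -> larger c1 and smaller c2."""
    a = min(max(alpha, 0.0), 1.0)  # keep the diversity value inside [0, 1]
    c1 = lc + (uc - lc) * a
    c2 = uc - (uc - lc) * a
    return c1, c2


def local_attractor(p_ij, p_gj, c1, c2, r1, r2):
    """Attractor as a random convex combination of personal and global best.

    r1 and r2 are random numbers in (0, 1]; they must not both be zero.
    """
    phi = (c1 * r1) / (c1 * r1 + c2 * r2)
    return phi * p_ij + (1.0 - phi) * p_gj
```

With this mapping a spread-out swarm leans on each particle's own best position, while a collapsed swarm is pulled more strongly towards the global best.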
Algorithm 1. The improved QPSO algorithm
Initialize the particle swarm;
For t = 1 to maximum iteration T
    Compute the mean best position;
    For i = 1 to population size M
        If f(x_i) < f(P_i) then P_i = x_i; Endif
        P_g = min(P_i);
        For j = 1 to dimension D
            update c_1 and c_2 by Equations (19) and (20);
            update φ_ij(t) by Equation (17);
            update p_ij(t) by Equation (16);
            update x_ij(t + 1) by Equation (15);
        Endfor
    Endfor
Endfor
The key problem in optimizing the network parameters with an intelligent optimization algorithm is the choice of the fitness function. In the SRL system, we use the F1 value as the performance index, so we set the fitness function of each particle as:
The reconstruction and optimization of network parameters based on improved QPSO can be described as Algorithm 2.
Algorithm 2. Reconstruction and optimization of network parameters based on improved QPSO
1) Use role-related features and word sense-related features to construct the semantic vector space model;
2) Obtain feature vectors from the training corpus and test corpus, and construct test patterns q_l (l = 1, 2, …) and possible roles r_k (k = 1, 2, …);
3) Combine all possible roles of q_l (l = 1, 2, …) to obtain all possible role chains R_i (r_i1, r_i2, …);
4) Set λ_i by Equation (9);
5) Obtain the optimal solution of the parameters (α_1, α_2, …, α_m, β_1, β_2, …, β_n, B, C) with the improved QPSO algorithm;
6) Obtain the final labeled role chain through iterative competition of the semantic order parameters in the dynamics equation.
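The F1-based fitness used in step 5 can be sketched as follows. F1 is computed in the usual way from precision and recall. Returning 1 − F1 so that the value can be minimized, in line with the comparison f(x_i) < f(P_i) in Algorithm 1, is an assumption; the argument names are illustrative.

```python
def f1_score(n_correct, n_predicted, n_gold):
    """F1 over labeled arguments; returns 0.0 when nothing can be scored."""
    if n_correct == 0 or n_predicted == 0 or n_gold == 0:
        return 0.0
    precision = n_correct / n_predicted
    recall = n_correct / n_gold
    return 2 * precision * recall / (precision + recall)


def fitness(n_correct, n_predicted, n_gold):
    """Fitness to MINIMISE, matching the f(x_i) < f(P_i) test in Algorithm 1.

    Using 1 - F1 is an assumption; maximising F1 directly is equivalent.
    """
    return 1.0 - f1_score(n_correct, n_predicted, n_gold)
```

In practice each particle encodes one setting of the network parameters, and the counts come from labeling the development set with that setting.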
Experiment
Data description
In the experiments, the OntoNotes Release 2.0 corpus [41] is used to test the performance of semantic role labeling. The corpus has been annotated with predicate argument structure and word senses in three languages, and thus provides both word sense information and semantic role information. OntoNotes has seven data sets: ABC, CNN, MNB, NBC, PRI, VOA and WSJ. For each data set, we use approximately 60% as the training set, 20% as the development set and 20% as the test set. To obtain word sense information, the corpus must first be preprocessed.
(1) Syntax Conversion
The OntoNotes 2.0 corpus adopts phrase structure-based syntactic analysis. To study dependency-based semantic role labeling, the phrase-based syntactic structure must be converted into a dependency-based syntactic structure. We adopt the Constituent-to-Dependency Conversion Tool Kit for the conversion.
(2) Word sense Information
In the OntoNotes 2.0 corpus, many words appear in plural or past tense form, for which there is no corresponding lexical information or word sense information. This information can be obtained by calling the API functions provided by WordNet 2.1.
Experiment results
In order to carry out a detailed comparison, we use six strategies. SVM: semantic role labeling based on a support vector machine. SNN: semantic role labeling based on SNN. SNN+BWS: SNN-based semantic role labeling with basic word sense information (Lemma+Sense). SNN+BWS+MWS: SNN-based semantic role labeling with word sense information (Lemma+Sense, Hypersense(1), Hypersense(2), Hypersense(3), Lemma+Hypersense(1), Lemma+Hypersense(2), and Lemma+Hypersense(3)). SNN+IQPSO: SNN-based semantic role labeling based on improved QPSO. SNN+BWS+NWS+IQPSO: semantic role labeling with word sense information based on the improved SNN.
The comparison of the different methods is shown in Tables 1 to 6.
As can be seen from Table 1, the F1 values obtained by the SNN model on all seven data sets are better than those of the SVM-based SRL model. The largest improvement is on the MNB corpus, where F1 increases by 0.45 (84.76-84.31), and the smallest is on the NBC corpus, where F1 increases by 0.09 (81.57-81.48). In terms of recall, the results on the VOA and WSJ corpora are inferior to the SVM-based SRL model, but the difference is small. The experimental results show that obtaining the optimal role chain through order parameter evolution is effective and can improve the performance of a semantic role labeling system.
As can be seen from Table 2, system performance improves on all seven corpora after basic word sense information is added. The largest improvement is on the ABC corpus, where F1 increases by 0.53 (85.25-84.72), and the smallest is on the WSJ corpus, where F1 increases by 0.05 (84.54-84.49).
To further discuss the role of word sense information, we add Lemma+Sense, Hypersense(1), Hypersense(2), Hypersense(3), Lexical+Hypersense(1), Lexical+Hypersense(2), and Lexical+Hypersense(3) in the experiment. As can be seen from experiment results of Table 3, after addition of these semantic features, the performance is further improved. It can be seen that word sense information can better describe semantic relations that cannot be reflected by lexical information and part of speech information, thus improving the performance of semantic role labeling system.
To further analyze the impact of word sense-related features on semantic role labeling performance, we ran further experiments on the whole corpus (the test sets of all seven data sets combined as the experimental test set). Table 4 compares the performance of the different word sense features. The feature with the largest improvement is Hypersense(1), and the feature with the smallest improvement in F1 is Hypersense(3).
From Table 5, we can see that SNN+IQPSO outperforms the SNN model, with better F1 values on all seven data sets. The largest improvement is on ABC, where F1 increases by 0.62 (85.34-84.72), and the smallest is on MNB, where F1 increases by 0.12 (81.79-81.57). This demonstrates that optimization of the model parameters is very important, and that obtaining the model parameters through an optimization algorithm can further improve system performance.
As can be seen from Table 6, compared with the SVM-based SRL model, SNN+BWS+NWS+IQPSO achieves better semantic role labeling performance. The F1 values obtained on all seven corpora are improved to some extent. The models proposed in this paper significantly enhance the efficiency and effectiveness of semantic analysis.
Figure 3 shows the convergence curves for three different data sets, which indicate that the IQPSO algorithm can select the model parameters of the synergetic semantic analysis model more effectively, so as to improve the performance of the SRL system.
Conclusion
An improved SNN-based SRL model with word sense information is proposed and implemented in this paper. Experiments show that synergetic semantic analysis achieves higher performance for SRL.
We draw the following conclusions. The improved SNN model can effectively utilize word sense information, which further improves the performance of semantic analysis. The improved QPSO algorithm based on adaptive accelerating factor selection has better global convergence ability and can effectively regulate the parameters of the semantic analysis model.
In future work, we will consider using deep learning [42] to construct the order parameters of SNN, and thus better describe the semantic relations in the semantic chain.
Acknowledgments
This work was supported by the Science and technology project of Fujian Provincial Education Department (Grant No. JA15026) and the Quanzhou science and technology project (Grant No. 2013Z17 and 2015Z113).
