Analysis of OWA operators for automatic keyphrase extraction in a semantic context

Abstract

Automatic keyphrase extraction from texts is useful for many computational systems in the fields of natural language processing and text mining. Although a number of solutions to this problem have been described, semantic analysis is one of the least exploited linguistic features in the most widely-known proposals, causing the results obtained to have low accuracy and performance rates. This paper presents an unsupervised method for keyphrase extraction, based on the use of lexico-syntactic patterns for extracting information from texts, and a fuzzy topic modeling. An OWA operator combining several semantic measures was applied to the topic modeling process. This new approach was evaluated with Inspec and 500N-KPCrowd datasets. Several approaches within our proposal were evaluated against each other. A statistical analysis was performed to substantiate the best approach of the proposal. This best approach was also compared with other reported systems, giving promising results.

Keywords

Automatic keyphrase extraction, linguistic patterns, topic modelling, graph-based method, semantic processing, OWA operator

1. Introduction

The exponential growth of textual and unstructured data in digital format has led to a significant challenge in textual information processing: distilling the most important information from the large amount of information available. The development of computational solutions based on the application of natural language processing (NLP) and text-mining techniques has emerged as the most promising option for dealing with this challenge.

In this context, a high-level description of a document can be obtained through relevant words or phrases that are strongly related to the main topic(s) addressed in the document, so automatic keyphrase extraction is an essential task for many text-mining solutions [19, 10]. Keyphrases provide a concise understanding of a text, enabling one to grasp the central idea and the main topics discussed in a text document, and they facilitate the construction of text-representation models, such as graph-based models. Several automatic keyphrase extraction models have been created over the last few years, some following a supervised approach [18, 11] and others using unsupervised techniques [3, 16, 20, 25, 26, 27, 28].

In this study we focus on unsupervised keyphrase extraction, which does not require human-annotated training data for a machine-learning algorithm. The solutions reported still have low rates of accuracy and performance [10, 19], and semantics is one of the least-exploited linguistic features in the most widely-reported proposals, especially in unsupervised approaches.

According to [19], it is essential to focus on semantically and syntactically correct phrases and make sure that the keyphrases are semantically relevant to the document topic and context. Topic modeling for keyphrase extraction from texts is reported in [26, 25, 3]; however, semantic analysis has not been considered in those proposals, or at least not in all its possible dimensions, which constitutes a weakness. The semantic analysis of textual content, at the level of word meaning or relationships between words, is usually influenced by subjectivity, vagueness and imprecision, due to the inherent ambiguity of natural language, which constitutes a challenge for computational solutions that require intensive semantic processing. Fuzzy logic offers a number of techniques for dealing with these problems, such as fuzzy set techniques, fuzzy clustering algorithms, and aggregation operators, among others. Despite these advantages, few keyphrase extraction proposals using a fuzzy logic approach to semantic analysis have been identified [26].

This paper proposes an unsupervised method for automatic keyphrase extraction from a single document. The method was conceived through the combination of lexico-syntactic patterns with graph-based topic modeling, which is carried out from the fuzzy logic perspective. In this sense, syntactic and semantic measures are combined, applying the aggregation operator OWA (Ordered Weighted Averaging) [31], to increase the semantic processing level of the candidate phrases in topic identification. The description of the method includes an example, using a document of the Inspec dataset. In this example, the state of the text is shown at each stage of the proposal. The method was evaluated with the Inspec [11] and 500N-KPCrowd [18] datasets, and the performance was measured using the precision, recall, and F-measure metrics. Several experiments were carried out with the purpose of providing a deeper grounding for the contribution of the proposal. Different topic modeling approaches were evaluated in the proposal, taking into account single and multi-criteria approaches. A comparison of multi-criteria approaches was performed with different numbers of keyphrases, in order to analyze the behavior of the proposal for different outputs. The best approach of the proposal was substantiated through a statistical analysis, using the well-known Friedman test and a post-hoc test. The approaches with the best results were compared with other state-of-the-art unsupervised proposals, improving on the results of the systems included in the comparison.

Specifically, the contributions of this paper are the following: (1) we propose a new way of processing the semantic information in topic-modeling based keyphrase-extraction solutions, applying a fuzzy aggregation operator (OWA), and (2) we show, on two datasets, that the fuzzy topic modeling proposed can improve accuracy in the unsupervised automatic keyphrase-extraction process.

The rest of the paper is organized as follows: Section 2 summarizes the analysis of related work; Section 3 sets out the theoretical background of the main concepts; Section 4 describes the proposed method; Section 5 presents the datasets, metric description and experimental results and the corresponding analysis. Conclusions and future lines of work are given in Section 6.

2. Related works

Solutions for automatic keyphrase extraction in text documents are usually designed in four phases: pre-processing, identification and selection of candidate phrases, keyphrase determination, and evaluation [19]. Unsupervised keyphrase extraction approaches typically follow a standard three-stage process [10]. The first stage involves choosing the candidate lexical units with respect to some heuristic, such as the exclusion of stop words or the choice of words that are nouns or adjectives. The second stage is ranking these lexical units by measuring their importance through co-occurrence statistics or syntactic rules. The final stage concerns keyphrase formation, where the top-ranked lexical units are used either as keywords or as components of keyphrases. The unsupervised approach has the advantage of using only the information contained in the input text to determine the keyphrases [3, 16, 20, 25, 26, 27, 28].

The common baseline approach for unsupervised keyphrase extraction is tf-idf [12]. It ranks phrases in a particular document according to their frequency in this document (tf), multiplied by the inverse of their frequency in all documents of a collection (idf). Recently, Florescu and Caragea [8] proposed an approach for combining tf-idf with any other word-scoring approach. In their approach, a phrase’s score is computed by multiplying its frequency within the document (tf) with the mean of the scores of the words in the phrase.
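As an illustration of the tf-idf baseline described above, the following minimal sketch scores the terms of one document against a small collection. The function name `tf_idf`, the toy corpus, and the logarithmic form of idf are assumptions made for this example; the systems cited here may use other variants.

```python
import math

def tf_idf(term, doc, corpus):
    """Score `term` in `doc` (a list of tokens) against `corpus` (a list of token lists)."""
    tf = doc.count(term)                       # raw frequency of the term in the document
    df = sum(1 for d in corpus if term in d)   # number of documents that contain the term
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)           # assumed log-scaled inverse document frequency
    return tf * idf

# Hypothetical three-document collection; the first document is the one being scored.
corpus = [
    ["ion", "exchange", "model", "ion"],
    ["numerical", "model", "methods"],
    ["graph", "model", "topic"],
]
scores = {t: tf_idf(t, corpus[0], corpus) for t in set(corpus[0])}
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 4))
```

In this toy collection, a term that occurs in every document ("model") receives a score of zero, while a term that is frequent in the scored document and rare elsewhere ("ion") ranks first.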

Studies often extract keyphrases by collecting adjacent important adjectives and nouns. In [13], a statistical study of four public corpora shows that about 15% of keyphrases contain other kinds of words. This proposal introduces words other than adjectives and nouns to keyphrases, which improves the performance of extraction. It describes a novel approach to extracting keyphrases by collecting noun phrases (NPs) as candidate keyphrases using syntactic information, i.e. chunks and constituent syntactic parsing trees. Hence, the well-formedness of keyphrases is ensured by noun phrases from chunks and parsing trees. In addition, words other than adjectives and nouns are also considered to be part of the keyphrase pattern if they appear in candidate noun phrases.

The n-grams resulting from removing all stop words from the text constitute the list of candidate keywords in SMAF Extractor [1]. For each keyword in that list, the algorithm fetches its synonyms from an external source. Then the synonyms in the new list are compared to the keywords of the original list; if there is a match, the original list of keywords is updated and the weights are recalculated to include the new frequencies. To decide which keywords to extract, SMAF Extractor relies on combined statistical metrics: the traditional term frequency measure, the keyword heading weight, and the keyword first occurrence position weight.

YAKE [6] also uses a statistical analysis over the candidate terms. The statistics of each term are computed considering structure, term frequencies, and co-occurrence. From these terms, the most relevant keywords are selected. The relevance of each candidate term is computed by aggregation of a number of statistical features: casing; term position; term frequency normalization; term relatedness to context and term occurrences in different sentences.

In TextRank [20] the candidate terms and their relationships are represented in an unweighted and undirected graph, whose vertices represent the terms and whose arcs represent the co-occurrence relationships between them. An algorithm similar to PageRank [4] is applied to the constructed graph to determine the relevance of each vertex. Next, a third of the vertices of the whole graph are chosen as the most relevant vertices. Finally, the relevant terms are marked in the text and the sequences of adjacent words are selected as keyphrases. A similar solution is considered in the Salience Rank algorithm [28], where the ranking of the words in the document is combined with other salience measures in the context of an LDA (Latent Dirichlet Allocation) based topic modeling approach. [26] describes the co-occurrence graph of the words of the input text, which is customized for each topic by using the semantic information obtained from the topic model (built from Wikipedia articles) to form the topic graphs. Next, the communities and central nodes of these topical graphs are identified. This is done using the fuzzy modularity criterion for measuring the goodness of overlapped community structures. [27] (RAKE) also applied a co-occurrence graph built with all individual words found in the candidate keyphrases. The word score is calculated through the word degree as well as the word frequency. For multiple-word expressions, the weights are calculated by summing up the weights of their members.

WikiRank [34] uses a topic annotator to identify meaningful sequences of words (concepts) in the text and link them to a related Wikipedia page. In addition, WikiRank uses noun-phrase patterns to identify candidate keyphrases. A semantic graph is built linking the concepts with the candidate keyphrases that contain them. The candidate keyphrases with the most links to concepts are selected as keyphrases. [33] proposes another graph-based approach. The proposed graph uses the identified words and sentences as vertices. Edges correspond to three kinds of relationships: sentence-to-sentence, word-to-word and sentence-to-word. In this approach a document can be grouped with many topics. [29] contains a multi-centrality index (MCI) approach, which aims to find the optimal combination of word rankings according to the analysis of nine centrality measures (Betweenness, Clustering Coefficient, Closeness, Degree, Eccentricity, Eigenvector, K-Core, PageRank, Structural Holes) for identifying keywords in co-occurrence word-graph representations of documents.

TopicRank [3] proposes a strategy based on the identification and analysis of topics to extract the relevant phrases. In this method, the longest sequences of nouns and adjectives in the text are extracted as candidate phrases, and the syntactically similar noun phrases are clustered into a theme or topic, using a hierarchical agglomerative clustering (HAC) algorithm [22]. Next, a graph is constructed where each vertex represents a topic and the arcs are labeled with a weight that represents the strength of the contextual relationship in the text between the candidate phrases contained in a topic and those that were grouped into another topic to which it relates. Finally, only one keyphrase from each topic is selected, which is a weakness because a topic can be represented by more than one keyphrase in the same text. This proposal is improved in [25], which conceives a more flexible procedure for keyphrase selection from topics and incorporates the definition of a “distance-between-phrases” function into the candidate-phrases clustering process, although semantic processing remains limited, as in the case of [3]. Liu et al. [16] also consider the clustering of candidate phrases to represent the document’s themes, and a co-occurrence-based relatedness measure is applied for computing the semantic relatedness of candidate terms in this process.

According to the related studies analyzed, graph-based term representation and topic modeling appear as promising alternatives for unsupervised keyphrase extraction from text. The unsupervised methods offer more significant strengths than supervised ones; nevertheless, a weakness is that the graph-based approach does not guarantee that the extracted keyphrases represent all the main topics of the document, and it fails to reach a reasonable coverage level of the text document [19]. The good keyphrases of a document should be semantically relevant to the document theme or topic and cover the whole document well [16]. In this sense, the analyzed work shows a low use of semantic analysis in the clustering and topic modeling process carried out, or in any other task included. This semantic processing has focused on computing the co-occurrence relatedness [16, 20, 26, 27] or distance-based contextual relationship [25]. However, there are other levels and measures of semantic analysis, such as semantic similarity and semantic relatedness measures, which have not been explored. Our work is aimed at assessing the benefits of these other semantic measures in topic modeling from the fuzzy logic perspective to improve the outcomes of the unsupervised keyphrase extraction process.

3. Background

Although a number of automatic keyphrase extraction solutions have been developed over the last few years, semantic issues are one of the least exploited linguistic features, as can be seen in the analysis of related works. The process of semantic analysis can be conceived, fundamentally, from two perspectives: (1) considering only the textual content, for example, exploiting the contextual or co-occurrence relationship between terms, or (2) taking advantage of an external knowledge base, such as WordNet [21].

WordNet [21] is a lexical database widely used to capture the underlying semantics of texts, whose basic structure is the synset (short for synonym set). A synset defines the meaning of a set of words that share a sense; synsets are interconnected by several types of lexical and semantic relations, forming a semantic network. Through this semantic network, it is possible to determine the semantic similarity or relatedness between two words by analyzing the path of synsets formed by the different relations that connect them, directly or otherwise. WordNet::Similarity [24] is a freely available software package developed for this purpose, which makes it possible to measure the semantic relatedness of a pair of concepts (or word senses). In our proposal, two types of semantic relatedness measures from WordNet::Similarity were evaluated in the topic modeling process, specifically, LCH (Leacock & Chodorow) and JCN (Jiang & Conrath), with the last being the most promising, according to [5].

In the automatic keyphrase extraction process, topic modeling refers to a clustering process of candidate phrases which may be strongly linked from different perspectives, and are parts of the core topics. According to Liu et al. [16], the good keyphrases of a document should be semantically relevant to the document’s theme or topic and cover the whole document well. Therefore, the semantic processing in topic modeling reaches a higher level of importance. The semantic analysis of textual content, at the level of word meaning or relationships between words, is usually prone to problems of subjectivity, vagueness and imprecision, due to the inherent ambiguity of natural language. Fuzzy logic offers several techniques for dealing with these problems, such as fuzzy sets, fuzzy clustering algorithms, aggregation operators, and others. On the other hand, subjectivity and imprecision suggest that the semantic analysis in the identification of topics from candidate phrases should be complemented with other linguistic features (e.g. syntactic aspects) and context analysis, transforming topic detection into a multi-criteria decision problem.

Many aggregation operators have been developed to aggregate information [30], including the Ordered Weighted Averaging (OWA) operator, which has been widely used as a solution to multi-criteria decision problems. Aggregation refers to the process of combining values (numerical or non-numerical) into a single value, so that the final result of aggregation takes into account, in a given fashion, all the individual aggregated values [9]. Therefore, the OWA operator can be very useful in combining semantics with other linguistic aspects, through weightings assigned to each measure to be aggregated. This operator allows clusters of phrases to be found that are strongly related from different semantic dimensions, and at the same time, to achieve a wide coverage of the whole document in the topic modeling process.

Definition. An OWA operator of dimension n is a mapping $f_{\textit{owa}}:R^{n}\rightarrow R$ with an associated n-dimensional weight vector $W=[w_{1},w_{2},\ldots,w_{n}]^{T}$ such that $w_{i}\in[0,1]$ and $\sum_{i}w_{i}=1$ . The function $f_{\textit{owa}}$ is defined according to Eq. (1), with $b_{j}$ the $j^{\text{th}}$ largest element in the collection $a_{1}\ldots a_{n}$ .

$\displaystyle f_{\textit{owa}}(a_{1},\ldots,a_{n})=\sum_{j=1}^{n}w_{j}b_{j}$ (1)
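The definition in Eq. (1) can be sketched in a few lines of Python. The function name `owa` and the input values are hypothetical and serve only to show that the weights are attached to ordered positions rather than to particular arguments.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to the values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    ordered = sorted(values, reverse=True)      # b_j is the j-th largest a_i
    return sum(w * b for w, b in zip(weights, ordered))

a = [0.2, 0.9, 0.5, 0.4]                        # hypothetical measure values
print(owa(a, [1.0, 0.0, 0.0, 0.0]))             # all weight on b_1: behaves like max
print(owa(a, [0.0, 0.0, 0.0, 1.0]))             # all weight on b_n: behaves like min
print(owa(a, [0.25, 0.25, 0.25, 0.25]))         # uniform weights: arithmetic mean
```

The three calls show how the choice of weight vector moves the operator between the maximum, the minimum and the plain average of the aggregated values.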

There are different methods for determining the weights to be used in an OWA operator and the use of linguistic quantifiers is one of them [35], e.g. RIM (Regular Increasing Monotone) quantifiers. Yager proposed a method to calculate the weights of an OWA by means of RIM quantifiers [32], which is defined in Eq. (2). In our proposal, four RIM quantifiers were evaluated (see Table 1), as the first approach to measure the performance of the OWA operator in the keyphrase extraction problem.

Table 1

RIM linguistic quantifiers

Id.	Linguistic term	Membership function	Orness
1	Average	$Q(x)=x$ if $0\leqslant x\leqslant 1$	0.5
2	At least half	Eq. (3)	0.75
3	Most (Pasi) [23]	Eq. (4)	0.35
4	Most (Feng) [7]	Eq. (5)	0.33

Specifically, in our proposal, we apply the RIM quantifier “Most” (Feng & Dillon) reported in [7] (see Eq. (5)), as the first approach to measure the performance of the OWA operator in the keyphrase extraction problem.

$\displaystyle w_{j}=Q\left(\frac{j}{n}\right)-Q\left(\frac{j-1}{n}\right)$ (2)

$\displaystyle Q(x)=\left\{\begin{array}{ll}2x&\text{if }0\leqslant x\leqslant 0.5\\ 1&\text{if }0.5<x\leqslant 1\end{array}\right.$ (3)

$\displaystyle Q(x)=\left\{\begin{array}{ll}0&\text{if }0\leqslant x\leqslant 0.4\\ 2(x-0.4)&\text{if }0.4<x\leqslant 0.9\\ 1&\text{if }0.9<x\leqslant 1\end{array}\right.$ (4)

$\displaystyle Q(x)=\left\{\begin{array}{ll}0&\text{if }0\leqslant x\leqslant 0.5\\ (2x-1)^{0.5}&\text{if }0.5<x\leqslant 1\end{array}\right.$ (5)
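To make the weight computation of Eq. (2) concrete, the sketch below derives OWA weights from the four quantifiers of Table 1 for $n=4$ aggregated measures. The function names are hypothetical, and the last branch of the "Most (Pasi)" quantifier is assumed to start at 0.9 so that $Q$ is continuous and covers the whole interval $[0,1]$.

```python
import math

def q_average(x):
    return x

def q_at_least_half(x):          # Eq. (3)
    return 2 * x if x <= 0.5 else 1.0

def q_most_pasi(x):              # Eq. (4); upper branch assumed to start at 0.9
    if x <= 0.4:
        return 0.0
    return 2 * (x - 0.4) if x <= 0.9 else 1.0

def q_most_feng(x):              # Eq. (5)
    return 0.0 if x <= 0.5 else math.sqrt(2 * x - 1)

def rim_weights(q, n):
    """Eq. (2): w_j = Q(j/n) - Q((j-1)/n) for j = 1..n."""
    return [q(j / n) - q((j - 1) / n) for j in range(1, n + 1)]

for name, q in [("Average", q_average), ("At least half", q_at_least_half),
                ("Most (Pasi)", q_most_pasi), ("Most (Feng)", q_most_feng)]:
    print(name, [round(w, 4) for w in rim_weights(q, 4)])
```

Because each $Q$ satisfies $Q(0)=0$ and $Q(1)=1$, the weights telescope to a sum of 1. Quantifiers with lower orness, such as the two "Most" variants, place their weight on the lower-ranked (smaller) values, so a high aggregated score requires most of the measures to be high.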

4. Keyphrase extraction using OWA operator

The proposed method was conceived through the combination of lexico-syntactic patterns with topic modeling carried out from a fuzzy perspective. This method has four phases, as shown in Fig. 1: (1) text pre-processing, (2) fuzzy topic identification, (3) relevance evaluation of topics, and (4) keyphrase selection. Lexico-syntactic patterns were defined for extracting candidate phrases from the text, and a fuzzy clustering of candidate phrases is proposed for identifying the main topics in the texts, to improve the semantic analysis with respect to other proposals [3, 25, 26]. It also incorporates a more flexible mechanism of keyphrase selection from the relevant topics identified, which allows the extraction of more than one keyphrase per topic and solves the weakness identified in TopicRank [3].

Figure 1.

Process of keyphrases extraction.

For a better understanding of the proposal, an example is developed step by step throughout the description of the method. The example text, shown below, is a document selected from the Inspec dataset. The gold keyphrases are highlighted in bold.

Example Text: Inverse problems for a mathematical model of ion exchange in a compressible ion exchanger. A mathematical model of ion exchange is considered, allowing for ion exchanger compression in the process of ion exchange. Two inverse problems are investigated for this model, unique solvability is proved, and numerical solution methods are proposed. The efficiency of the proposed methods is demonstrated by a numerical experiment.

4.1 Text pre-processing

At this stage, different NLP tasks are carried out to extract from the text the syntactic information which is required for the candidate phrase extraction process. Initially, plain text from the input file is extracted, segmented into paragraphs and sentences, and the set of tokens (e.g., words, numbers, and others) is obtained from each sentence. Subsequently, a deep syntactic analysis is carried out using the Freeling parser.

The extraction of candidate phrases is based on the identification of conceptual phrases and a set of lexico-syntactic patterns defined for this purpose, such as: [D | P | Z] $+$ [<s-adj>] $+$ NN; [D | P | Z] $+$ [<s-adj>] $+$ NN $+$ NN; [Z] $+$ <sn>; NN $+$ [IN] $+$ NN; VBN $+$ NN; JJ $+$ NN $+$ [NN], in a similar way to that reported in [25]. These patterns have been defined according to the grammar labeling used by Freeling, and they combine a set of relevant grammatical categories in the composition of concepts. Most of these patterns have their origins in the most frequent patterns identified in the concepts included in the ontological knowledge resources analyzed, e.g. the ontology of the DBpedia project [14], which has more than 1000 concepts in different domains from Wikipedia. Through these patterns the coverage of the text in this process is increased with respect to other proposals that only consider noun phrases. Next, we show the segmentation of the example text into paragraphs and sentences:

Paragraph 1 (#1 sentences): (#1) Inverse problems for a mathematical model of ion exchange in a compressible ion exchanger.

Paragraph 2 (#3 sentences): (#1) A mathematical model of ion exchange is considered, allowing for ion exchanger compression in the process of ion exchange. (#2) Two inverse problems are investigated for this model, unique solvability is proved, and numerical solution methods are proposed. (#3) The efficiency of the proposed methods is demonstrated by a numerical experiment.

Table 2 shows the list of candidate phrases and the corresponding patterns resulting from the pre-processing phase. The candidate phrases highlighted in bold are those that match the reference keyphrases, and the two underlined ones are those that contain a reference keyphrase (ion exchange). In this case, the patterns used do not identify that reference keyphrase exactly, although it is included in two other candidate phrases, and only one reference keyphrase cannot be identified at all.

Table 2
Candidate phrases and patterns

Phrases	Patterns
Inverse problems	JJ NNS
Model of ion exchange	NN IN NN NN
Compressible ion exchanger	JJ NN NN
Process of ion exchange	NN IN NN NN
Model	NN
Proposed methods	VBN NNS
Numerical experiment	JJ NN
Unique solvability	JJ NN
Numerical solution methods	JJ NN NNS
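The way tag patterns select candidates such as those in Table 2 can be illustrated with a rough sketch. It is not the pipeline used in the method: the tokens are tagged by hand instead of by Freeling, only a simplified subset of the patterns is included (with optional elements expanded explicitly), plural nouns are folded into NN, and matching is greedy and non-overlapping. All names are hypothetical.

```python
# Simplified, explicitly expanded subset of the lexico-syntactic patterns (assumption).
PATTERNS = [
    ("NN", "IN", "NN", "NN"),
    ("JJ", "NN", "NN"),
    ("NN", "IN", "NN"),
    ("JJ", "NN"),
    ("VBN", "NN"),
]

def normalize(tag):
    """Fold plural nouns into NN so that 'methods' matches an NN slot."""
    return "NN" if tag == "NNS" else tag

def extract_candidates(tagged):
    """Greedy left-to-right matching, trying the longest patterns first."""
    tags = [normalize(tag) for _, tag in tagged]
    by_length = sorted(PATTERNS, key=len, reverse=True)
    candidates, i = [], 0
    while i < len(tagged):
        for pattern in by_length:
            if tuple(tags[i:i + len(pattern)]) == pattern:
                candidates.append(" ".join(w for w, _ in tagged[i:i + len(pattern)]))
                i += len(pattern)
                break
        else:
            i += 1                      # no pattern starts at this token
    return candidates

# Last sentence of the example text, tagged by hand for illustration.
sentence = [("The", "DT"), ("efficiency", "NN"), ("of", "IN"), ("the", "DT"),
            ("proposed", "VBN"), ("methods", "NNS"), ("is", "VBZ"),
            ("demonstrated", "VBN"), ("by", "IN"), ("a", "DT"),
            ("numerical", "JJ"), ("experiment", "NN")]
candidates = extract_candidates(sentence)
print(candidates)
```

For this sentence the sketch returns the two candidates that Table 2 lists for it, "proposed methods" (VBN NNS) and "numerical experiment" (JJ NN).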

4.2 Fuzzy identification of topics

The topic identification process is carried out using a hierarchical agglomerative clustering algorithm [22] over the extracted candidate phrases, which is addressed as a fuzzy logic problem for reinforcing the semantic analyses in the phrases clustering. The hierarchical agglomerative clustering algorithm was selected following the approach of TopicRank [3]. This algorithm assumes a similarity function to determine the similarity between two instances (or candidate phrases), which is perfectly suited to our problem. The need for a clustering algorithm that does not restrict the number of clusters is another point in its favor, given that a changing number of topics may be present in different texts, especially when it comes to achieving a general solution, i.e. for documents of different lengths.

Although the use of clustering algorithms for topic modeling has also been reported in [3, 25, 26], semantic analysis is not considered in those proposals, or at least not in all its possible dimensions. This is a weakness considering the assumption that a topic could be modeled through the clustering of concepts that frequently appear together as well as concepts with similar meanings or that are semantically related. To address this weakness, in our new unsupervised approach the phrase clustering process is carried out considering the score that results from combining the syntactic similarity and distance-between-phrases measures reported in [25] with a further two semantic similarity measures, applying a fuzzy aggregation operator. Moreover, the average distance (in words) between the words of each pair of candidate phrases is calculated by Eq. (6), where $a$ is each word of one phrase, which has $|A|$ words in total, and $b$ is each word of the other phrase, which has $|B|$ words in total.

$\displaystyle\textit{ave\_dist}(A,B)=\frac{1}{|A||B|}\sum_{a\in A}\sum_{b\in B}\textit{dist}(a,b)$ (6)

The two semantic similarity measures were conceived according to the sentence-to-sentence similarity metric reported in [15], using two word-to-word semantic relatedness metrics from the WordNet::Similarity package, specifically the Jiang & Conrath (JCN) and Leacock & Chodorow (LCH) metrics [24]. Additionally, the word-distance metric reported in [25] was redefined (see Eq. (7)).

$\displaystyle D(F_{1},F_{2})=\left\{\begin{array}{ll}1&\text{if }F_{1}\text{ and }F_{2}\text{ appear in the same paragraph}\\ 1-\frac{\textit{ave\_dist}(F_{1},F_{2})}{TW}&\text{in other cases}\end{array}\right.$ (7)

where $\textit{ave\_dist}(F_{1},F_{2})$ is the average distance [25] in words between the words included in the pair of phrases $F_{1}$ and $F_{2}$ , and TW is the total number of words in the text. In this method, the OWA operator [31] is applied to aggregate the numerical values ( $a_{i}$ ) resulting from the four defined measures into a single similarity relatedness score (SRS) for a pair of candidate phrases. These measures represent features with different semantic meaning for the phrase clustering, as well as different relevance levels for the decision making in this process.
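The distance measures of Eqs (6) and (7) can be sketched as follows. The sketch assumes that each phrase is represented by the token positions of its words in the text and that $\textit{dist}(a,b)$ is the absolute difference of those positions; both choices, the function names and the positions used are assumptions made for illustration.

```python
def ave_dist(pos_a, pos_b):
    """Eq. (6): mean distance, in words, over every pair of word positions."""
    total = sum(abs(a - b) for a in pos_a for b in pos_b)   # assumed dist(a, b) = |a - b|
    return total / (len(pos_a) * len(pos_b))

def phrase_distance(pos_a, pos_b, same_paragraph, total_words):
    """Eq. (7): 1 if both phrases share a paragraph, else 1 - ave_dist / TW."""
    if same_paragraph:
        return 1.0
    return 1.0 - ave_dist(pos_a, pos_b) / total_words

# Hypothetical positions: a two-word phrase and a three-word phrase in a 50-word text.
f1, f2 = [0, 1], [10, 11, 12]
print(ave_dist(f1, f2))
print(phrase_distance(f1, f2, same_paragraph=False, total_words=50))
print(phrase_distance(f1, f2, same_paragraph=True, total_words=50))
```

Under these assumptions the measure stays close to 1 for phrases that appear near each other and decreases as the average gap approaches the length of the text, so it can be aggregated with the similarity measures on a common scale.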

The hierarchical agglomerative clustering process is carried out by creating a square symmetric matrix of size n (the total number of candidate phrases identified), where each topic identifies a row and a column, and the intersection between each pair of topics contains the SRS (weight value) between the pair of candidate phrases that represent the corresponding topics. The relatedness matrix created from the example text is shown in Table 3.

Table 3

Matrix creation

Topics	Inverse problems	Model of ion exchange	Compressible ion exchanger	Process of ion exchange	Model	Proposed methods	Numerical experiment	Unique solvability	Numerical solution methods
Inverse problems	0	0	0	0	0	0	0	0	0
Model of ion exchange		0	0.2886	0.7500	0.3893	0	0	0	0
Compressible ion exchanger			0	0.2886	0	0	0	0	0
Process of ion exchange				0	0	0	0	0	0
Model					0	0	0	0	0
Proposed methods						0	0	0	0.4958
Numerical experiment							0	0	0.4082
Unique solvability								0	0
Numerical solution methods									0

Table 4

Topics clustering: First iteration

Topics	Inverse problems	Model of ion exchange, process of ion exchange	Compressible ion exchanger	Model	Proposed methods	Numerical experiment	Unique solvability	Numerical solution methods
Inverse problems	0	0	0	0	0	0	0	0
Model of ion exchange, process of ion exchange		0	0.2886	0.1946	0	0	0	0
Compressible ion exchanger			0	0	0	0	0	0
Model				0	0	0	0	0
Proposed methods					0	0	0	0.4958
Numerical experiment						0	0	0.4082
Unique solvability							0	0
Numerical solution methods								0

Initially, each candidate phrase is considered as a topic. In each iteration, the pair of topics with the highest weight value is merged. The average of the weight values is used as the clustering strategy for a pair of topics because it represents a balance between complete linking and single linking. Through the use of average linking, the weight of the relation between the newly formed topic $T(x)$ and the topic $T(y)$ is calculated by Eq. (8):

$\displaystyle\textit{Rel}(T(x),T(y))=\frac{\textit{Rel}(T(i),T(y))+\textit{Rel}(T(j),T(y))}{2}$ (8)

where $\textit{Rel}(T(x),T(y))$ is the weight of the relation between $T(x)$ and $T(y)$ , and $T(i)$ and $T(j)$ are the topics that were merged to produce $T(x)$ . In each iteration, when two topics are grouped into a new topic, the weights associated with the relations of the new topic are recalculated. Table 4 shows the resulting matrix after the first iteration over the matrix of Table 3. In this iteration, the topics merged into a new topic were [model of ion exchange] and [process of ion exchange], with the best weight 0.75.
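As a check of Eq. (8) against the example, take $T(i)=$ [model of ion exchange] and $T(j)=$ [process of ion exchange] with the weights of Table 3. For the two topics that had a non-zero relation with either of them, the recalculated weights are:

$\displaystyle\textit{Rel}(T(x),[\text{compressible ion exchanger}])=\frac{0.2886+0.2886}{2}=0.2886$

$\displaystyle\textit{Rel}(T(x),[\text{model}])=\frac{0.3893+0}{2}=0.19465$

These agree with the corresponding entries of Table 4, where the second value is reported as 0.1946.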

Table 5 represents the second iteration, on the matrix of Table 4. For the example text, the second iteration is the last one, and thus the topics are formed. In this iteration, the topics merged into a new topic were [proposed methods] and [numerical solution methods], with the best weight 0.4958.

Table 5

Topics clustering: Second iteration

Topics	Inverse problems	Model of ion exchange, process of ion exchange	Compressible ion exchanger	Model	Proposed methods, numerical solution methods	Numerical experiment	Unique solvability
Inverse problems	0	0	0	0	0	0	0
Model of ion exchange, process of ion exchange		0	0.2886	0.1946	0	0	0
Compressible ion exchanger			0	0	0	0	0
Model				0	0	0	0
Proposed methods, numerical solution methods					0	0.2041	0
Numerical experiment						0	0
Unique solvability							0

Algorithm 1. Fuzzy Identification of Topics

Input: $C$ : Candidates Key-phrases
Output: $T$ : Topics List
1: n $\leftarrow$ C.Size
2: for $i\leftarrow 1$ to $n-1$ do
3:     for $j\leftarrow i+1$ to $n$ do
4:         matrix $[i][j]$ $\leftarrow$ Distance(C[i],C[j])
5:     end for
6: end for
7: th $\leftarrow$ Threshold(matrix)
8: min $\leftarrow$ MinValue(matrix)
9: while th $<$ min do
10:     T $\leftarrow$ MergePairOfTopics(min, matrix)
11:     min $\leftarrow$ MinValue(matrix)
12: end while
13: return T

The clustering process is shown in Algorithm 1. The algorithm input is the list of candidate keyphrases resulting from the pre-processing phase, and the output is the list of all identified topics. First, the matrix is filled out (lines 2 to 6), where the distance (line 4) between each pair of phrases ( $C[i]$ and $C[j]$ ) is calculated; this distance corresponds to the value calculated by the fuzzy aggregation function, i.e. the OWA operator of Eq. (1) with the weights obtained from the quantifier of Eq. (5). After building the matrix (e.g. Table 3), the clustering process begins. A threshold is calculated by the function Threshold(matrix) (line 7), which works as the stop condition of the clustering; then the function MinValue(matrix) (line 8) chooses the minimum distance in the matrix. The function MergePairOfTopics(min, matrix) (line 10) merges into a new topic the row-column pair that represents the topics $C[i]$ and $C[j]$ , as mentioned above (explanation of Table 5). Each iteration of this process results in a new matrix from which a new minimum is selected (line 11) to continue the clustering process (lines 9 to 12) until the stop condition is met.
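The clustering loop can be sketched in Python using the non-zero SRS values of Table 3. Note the assumptions: the sketch works directly on similarity weights and merges the pair with the highest weight, as described for Tables 4 and 5, rather than on the distance and minimum used in Algorithm 1; the stop threshold of 0.45 is a hypothetical fixed value, since the Threshold function is not specified here; and all names are illustrative.

```python
def cluster_topics(phrases, sims, threshold):
    """Average-link agglomerative clustering over pairwise similarity weights.

    `sims` maps (i, j) index pairs to a weight; missing pairs count as 0.
    """
    topics = {i: [p] for i, p in enumerate(phrases)}   # every phrase starts as a topic
    rel = {frozenset(pair): w for pair, w in sims.items()}
    next_id = len(phrases)
    while rel:
        pair, best = max(rel.items(), key=lambda kv: kv[1])
        if best < threshold:                           # assumed stop condition
            break
        i, j = sorted(pair)
        del rel[pair]
        merged, next_id = next_id, next_id + 1
        members = topics.pop(i) + topics.pop(j)
        for y in topics:                               # Eq. (8): average of the two old weights
            w = (rel.pop(frozenset((i, y)), 0.0) + rel.pop(frozenset((j, y)), 0.0)) / 2
            if w > 0:
                rel[frozenset((merged, y))] = w
        topics[merged] = members
    return list(topics.values())

PHRASES = ["Inverse problems", "Model of ion exchange", "Compressible ion exchanger",
           "Process of ion exchange", "Model", "Proposed methods",
           "Numerical experiment", "Unique solvability", "Numerical solution methods"]
# Non-zero entries of Table 3, keyed by phrase index.
SIMS = {(1, 2): 0.2886, (1, 3): 0.7500, (1, 4): 0.3893,
        (2, 3): 0.2886, (5, 8): 0.4958, (6, 8): 0.4082}

topics = cluster_topics(PHRASES, SIMS, threshold=0.45)
for topic in topics:
    print(topic)
```

With this threshold the loop performs the two merges described for Tables 4 and 5 and then stops, leaving seven topics.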

The stage concludes by generating a graph representation of the text, in which the identified topics are represented as vertices, linked by arcs labeled with the weight of the relation between them. Each weight represents the strength of the semantic relationship between the pair of topics. Topics A and B have a strong semantic relationship if the candidate phrases included in these topics frequently appear close together in the text. The weight $W_{i,j}$ is calculated according to Eqs (9) and (10). Equation (10) refers to the reciprocal distance between the positions of the candidate phrases $c_{i}$ and $c_{j}$ in the text, where $\textit{pos}(c_{i})$ represents all positions $(p_{i})$ of $c_{i}$ .

$\displaystyle W_{i,j}=\sum_{c_{i}\in T_{i}}\sum_{c_{j}\in T_{j}}D(c_{i},c_{j})$ (9)

$\displaystyle D(c_{i},c_{j})=\sum_{p_{i}\in\textit{pos}(c_{i})}\sum_{p_{j}\in\textit{pos}(c_{j})}\frac{1}{|p_{i}-p_{j}|}$ (10)
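As an illustration of Eqs (9) and (10), the edge weight between two topics can be computed as below. The sketch assumes that each candidate phrase is mapped to a list of integer token offsets (the `positions` dictionary and its values are hypothetical), and it skips pairs that share the same offset to avoid a division by zero, a case the equations do not address.

```python
def reciprocal_distance(pos_i, pos_j):
    """Eq. (10): sum of 1/|p_i - p_j| over all pairs of positions."""
    # Pairs at the same position are skipped to avoid dividing by zero (assumption).
    return sum(1.0 / abs(p - q) for p in pos_i for q in pos_j if p != q)


def edge_weight(topic_i, topic_j, positions):
    """Eq. (9): accumulate D(c_i, c_j) over the candidate phrases of both topics."""
    return sum(
        reciprocal_distance(positions[ci], positions[cj])
        for ci in topic_i
        for cj in topic_j
    )


# Hypothetical token offsets for three candidate phrases.
positions = {
    "model of ion exchange": [4, 40],
    "process of ion exchange": [12],
    "inverse problems": [8],
}
w = edge_weight(
    ["model of ion exchange", "process of ion exchange"],
    ["inverse problems"],
    positions,
)
print(w)  # 1/4 + 1/32 + 1/4
```

Phrases that occur close together contribute large terms, so topics whose phrases co-occur within a few tokens receive the heaviest edges.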

Figure 2 shows a sample of the graph built considering the identified topics from the example text and illustrates the output of this stage. The vertices correspond with the topics identified and the edge weights correspond to those calculated according to Eq. (9).

Figure 2.

Graph of topics.

The example shows the clusters produced by the clustering approach. Indeed, the clustering succeeds in grouping “model of ion exchange” and “process of ion exchange”, which share a high semantic content.

4.3 Relevance evaluation of topics

At this stage, the relevance of each topic represented in the constructed topic graph is evaluated using the TextRank [20] model.

The relevance score computed for each topic $T_{i}$ is based on the concept of “voting” (inspired by the PageRank algorithm [4]): the adjacent topics of $T_{i}$ with the highest scores contribute more to the relevance evaluation of the topic $T_{i}$ . The relevance score $S(T_{i})$ is obtained via Eq. (11), where $V_{i}$ is the set of adjacent topics of $T_{i}$ in the graph, and $\lambda$ is a damping factor that is usually 0.85 [4].

$\displaystyle S(T_{i})=(1-\lambda)+\lambda\ast\sum_{T_{j}\in V_{i}}\frac{W_{i,j}\ast S(T_{j})}{\sum_{T_{k}\in V_{j}}W_{j,k}}$ (11)
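The score of Eq. (11) is defined recursively, so in practice it is approximated by iterating until the values stabilize. The following sketch shows one way to do this, with each neighbour $T_{j}$ passing on a share of its own score, as in TextRank [20]. The initial score of 1.0, the iteration cap and the tolerance are assumptions of the sketch, and the three-topic graph is hypothetical.

```python
def rank_topics(weights, damping=0.85, iterations=100, tol=1e-6):
    """Iterate Eq. (11) on an undirected weighted graph.

    weights: dict mapping each topic to a dict {neighbour: edge weight}.
    """
    scores = {t: 1.0 for t in weights}  # assumed starting value
    for _ in range(iterations):
        updated = {}
        for i in weights:
            vote = 0.0
            for j, w_ij in weights[i].items():
                out_j = sum(weights[j].values())  # total weight leaving T_j
                if out_j > 0:
                    vote += w_ij * scores[j] / out_j
            updated[i] = (1 - damping) + damping * vote
        converged = max(abs(updated[t] - scores[t]) for t in weights) < tol
        scores = updated
        if converged:
            break
    return scores


# Hypothetical graph: B is linked to both A and C, with a stronger link to A.
graph = {
    "A": {"B": 2.0},
    "B": {"A": 2.0, "C": 1.0},
    "C": {"B": 1.0},
}
scores = rank_topics(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph the topic with the most (and strongest) connections, B, ends up ranked first, followed by A and then C.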

The process of edge and vertex weighting of the complete graph is described in Algorithm 2. The input is a topic list (unweighted topics) $T$ , resulting from the previous stage, and the output is a list of evaluated topics (weighted topics) $R$ . First, the edge weight (lines 2 to 7) corresponding to each pair of vertices $T[i]$ , $T[j]$ is calculated with the function $\textit{Distance}(T[i],T[j])$ (line 4), which corresponds to Eq. (9), and each weighted edge is saved in a list (line 5). The score or weight of each vertex is then calculated (lines 8 to 11) using the edge list that links it to its adjacent vertices. The weight of vertex $i$ is calculated with the function VertexScore (i, edgeList) (line 9), which corresponds to Eq. (11). Each weighted vertex is saved in a list (line 10). The algorithm returns the list of weighted vertices, or topics (line 12).

Algorithm 2. Topic Evaluation
Input: $T$ : Topics List
Output: $R$ : Topics Rank
1: n $\leftarrow$ T.Size
2: for $i\leftarrow 1$ to $n-1$ do
3:     for $j\leftarrow i+1$ to $n$ do
4:         edgeWeight $\leftarrow$ Distance(T $[i]$ , T $[j]$ )
5:         edgeList.Add(edgeWeight)
6:     end for
7: end for
8: for each $i$ in T do
9:     vertexWeight $\leftarrow$ VertexScore(i, edgeList)
10:     R.Add(vertexWeight)
11: end for
12: return R

Table 6 shows an example of the ranking of topics identified from the text shown above. From these topics, the keyphrases will be selected.

Table 6

Ranking of topics

Topics	Weight
Model of ion exchange, process of ion exchange	0.5129
Inverse problems	0.3724
Compressible ion exchanger	0.3478
Proposed methods, numerical solution methods	0.2310
Unique solvability	0.2234
Model	0.2216
Numerical experiment	0.1745

The best-weighted topics are those with the strongest semantic relationship to other topics; therefore, the best keyphrases should be identified from these topics. As can be seen in Section 5, the keyphrases that coincide with the reference keyphrases are identified from the best-valued topics.

5. Keyphrases selection

The selection of keyphrases from the most relevant topics identified in the previous phases is carried out according to the following criteria: (1) the candidate phrase that first appears in the text; (2) the most frequently used candidate phrase; and (3) the candidate phrase that has the closest relationship with the others in each topic (centroid role). A mechanism that allows the three criteria to be combined has been implemented in our proposal, offering the possibility of extracting more than one keyphrase from each topic and greater flexibility in its execution with respect to the approach reported in [3], where only one of the criteria is considered, which affects the coverage of the keyphrase extraction process. If more than one candidate phrase (associated with a topic) with the same highest frequency is identified, and the frequency value is higher than one, then all of them are selected. Otherwise, only the first candidate phrase that appears in the text is chosen.

The process of keyphrase selection is described in Algorithm 3. The input is a ranked topic list $R$ and the output is the list of selected keyphrases $K P$ . First, the $n$ best-ranked topics are selected using the function SelectNBestRankedTopics (R) (line 1), which sorts the topics in descending order and selects the first $n$ . Then, the keyphrases from each of the $n$ best topics (lines 2 to 6) are selected using the three criteria. With the frequency criterion, the most frequent phrase is selected as a keyphrase via the function mostFrequentKeyPhrases (i) (line 3). With the position criterion, the keyphrase is the first phrase that appears in the text, identified through the function firstKeyPhrase (i) (line 4). The last criterion allows the most similar phrases in the topic to be selected as keyphrases using the function centroidKeyPhrases (i) (line 5). The algorithm returns the list of selected keyphrases $K P$ (line 7).

Algorithm 3. Key-phrases Selection
Input: $R$ : Topics Rank
Output: $K P$ : Key-phrases List
1: rankedTopics $\leftarrow$ SelectNBestRankedTopics(R)
2: for each $i$ in rankedTopics do
3:     KP $+=$ mostFrequentKeyPhrases(i)
4:     KP $+=$ firstKeyPhrase(i)
5:     KP $+=$ centroidKeyPhrases(i)
6: end for
7: return KP
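The frequency and first-occurrence rules described above can be sketched as follows. This is one possible reading of the selection rule, not the authors' code: every phrase sharing the highest frequency is kept when that frequency exceeds one, otherwise the earliest phrase in the text is chosen. The centroid criterion is omitted, and the function name, the data layout and the token offsets are hypothetical (the topic weights are taken from Table 6).

```python
def select_keyphrases(ranked_topics, positions, n_best):
    """Pick keyphrases from the n_best highest-scoring topics.

    ranked_topics: list of (phrases, score) pairs.
    positions: phrase -> list of token offsets where the phrase occurs.
    """
    best = sorted(ranked_topics, key=lambda item: item[1], reverse=True)[:n_best]
    selected = []
    for phrases, _score in best:
        top_freq = max(len(positions[p]) for p in phrases)
        if top_freq > 1:
            # Frequency criterion: keep every phrase sharing the highest frequency.
            chosen = [p for p in phrases if len(positions[p]) == top_freq]
        else:
            # Fallback: the phrase that appears first in the text.
            chosen = [min(phrases, key=lambda p: min(positions[p]))]
        for phrase in chosen:
            if phrase not in selected:
                selected.append(phrase)
    return selected


# Topic weights from Table 6; the offsets below are made up for illustration.
ranked = [
    (["numerical experiment"], 0.1745),
    (["model of ion exchange", "process of ion exchange"], 0.5129),
    (["inverse problems"], 0.3724),
]
offsets = {
    "model of ion exchange": [4, 40],
    "process of ion exchange": [12],
    "inverse problems": [1],
    "numerical experiment": [60],
}
keyphrases = select_keyphrases(ranked, offsets, n_best=2)
print(keyphrases)
```

Keeping the rules in a single function makes it easy to add the centroid criterion later as a third branch without changing the callers.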

The keyphrases obtained by applying the method to the example are listed below. In this list, those that match the reference are highlighted in bold, and those that include a reference’s keyphrase are underlined:

model of ion exchange, inverse problems, compressible ion exchanger, numerical solution methods, unique solvability, model

For this example, there is a high level of matching (4.5 out of 7) between the keyphrases identified by our proposal and the reference’s keyphrases, where the phrase that includes a reference’s keyphrase is counted as a partial match. Accordingly, the Precision and Recall metrics (addressed later in Subsection 6.2) are 75% (4.5 of the 6 extracted keyphrases) and 64.3% (4.5 of the 7 reference keyphrases), respectively.

6. Experiments and results

6.1 Datasets description

To evaluate the effectiveness of our proposal, we used two standard and publicly available datasets characterized by different types and sizes of documents. The Inspec dataset [11] consists of 2000 abstracts of scientific journal papers in computer science collected between 1998 and 2002, divided into sets of 1000, 500, and 500 as training, validation and test datasets, respectively. Each document has two lists of keywords assigned by humans: controlled keywords, which are assigned by the authors, although restricted by the Inspec thesaurus, and uncontrolled keywords, which are freely assigned by expert readers. The controlled keywords are mostly abstractive, and therefore may not appear in the document, whereas the uncontrolled ones are mostly extractive.

The 500N-KPCrowd dataset [17] consists of 500 English broadcast news stories in 10 different categories (e.g. Politics, Sports) with 50 docs per category. The ground truth or gold standard is defined using Amazon’s Mechanical Turk service to recruit and manage taggers. Multiple annotators were required to look at the same news story and assign a set of keywords from the text itself. The final ground truth consists of keywords selected by at least 90% of the taggers. A statistical characterization of these datasets is shown in Table 7.

Table 7
Datasets characterization

Dataset	Type of doc.	#Doc	#Words/doc	#Gold keys/doc
Inspec	Paper abstracts	500 (test collection)	124.4	9.8
500N-KPCrowd	News stories	500	432.7	39.9

6.2 Evaluation metrics

The performance of the method was measured using the precision (P), recall (R), and F-measure (F) metrics.

• Precision (see Eq. (12)) is a measure of the probability that, if a phrase is selected as a keyphrase by a method, it is a correct keyphrase. Precision gives the proportion of correctly extracted keyphrases among all the extracted phrases.

$\displaystyle\textit{Precision}=\frac{\text{correctly extracted keyphrases}}{\text{total extracted keyphrases}}$ (12)

• Recall (see Eq. (13)) is a measure of the probability that, if a phrase is a gold standard keyphrase, the method will extract it. Recall gives the proportion of correctly extracted keyphrases among all the gold standard keyphrases.

$\displaystyle\textit{Recall}=\frac{\text{correctly extracted keyphrases}}{\text{total gold standard keyphrases}}$ (13)

• F-measure (see Eq. (14)) is the trade-off between precision and recall. A high F-measure value implies a reasonably high score on both precision and recall, because it is their harmonic mean.

$\displaystyle\textit{F-measure}=\frac{2\times\text{precision}\times\text{recall}}{\text{precision}+\text{recall}}$ (14)
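Eqs (12) to (14) translate directly into code. The short sketch below applies them to the worked example of Section 5 (4.5 matches, 6 extracted keyphrases, 7 reference keyphrases); the function name and the zero-division guards are assumptions of the sketch.

```python
def precision_recall_f(correct, extracted, gold):
    """Eqs (12)-(14) for one document; returns (precision, recall, F-measure)."""
    precision = correct / extracted if extracted else 0.0
    recall = correct / gold if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


p, r, f = precision_recall_f(correct=4.5, extracted=6, gold=7)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.750 R=0.643 F=0.692
```

Because the F-measure is a harmonic mean, it stays closer to the lower of the two values, which is why a method with unbalanced precision and recall is penalized.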

6.3 Experimental setup

In this section, we will describe the experimental setup that was considered for both datasets and used to evaluate the effectiveness of the OWA-based topic modeling in keyphrase extraction. In our experiment, we consider the uncontrolled keywords from Inspec as gold-standard keyphrases to guarantee that the keywords appear in the text. In the case of the 500N-KPCrowd, the most and second most selected keyphrases in the ranked keyphrases of each document were considered as gold standard. For each document and each algorithm, we compute the macro-averaged precision, recall and F-measure for measuring the algorithm’s performance. The following experimental tasks were performed:

1. Evaluate four variants of the OWA operator, using four different linguistic quantifiers, and compare them with two other variants based on single semantic relatedness metrics, in order to select the quantifier that provides the best results and verify the benefits of the aggregation metric in semantic processing.
2. Evaluate the impact of each variant of the OWA-based solution according to the Top N keyphrases extracted, which adds detail to the evaluation of the quantifiers under assessment.
3. Compare the results obtained by the best OWA solution identified in the previous tasks with the results obtained by other proposals.

Friedman’s Test was performed to validate the results obtained. From each dataset, 250 texts were randomly selected to constitute the sample group, up to 50% of the processed texts. In each test, the confidence level was 95%, which means that the null hypothesis is rejected when the $p_{\textit{value}}\leqslant$ 0.05. The Holm, Finner and Li procedures were performed as post-hoc tests to find significant differences between the solutions evaluated.

6.4 Results and discussion

Table 8 shows the results of the first experimental task, where the evaluated solutions are grouped into single criteria (topic modeling based on single semantic relatedness metrics) and multi-criteria (topic modeling through the OWA-based aggregation metrics); the best results are highlighted in bold. As shown in Table 8, the precision obtained with the LCH or JCN measures in Inspec, as well as the recall in 500N-KPCrowd, were good, although not sufficient. This experiment showed that, with single semantic relatedness metrics, although the precision increases, the recall diminishes significantly when the size of the documents increases. This behavior does not appear in the OWA-based solutions for topic modeling, which reached higher values in most of the evaluation metrics on both datasets, specifically the OWA ${}_{\textit{Most(Feng)}}$ and OWA ${}_{\textit{Most(Pasi)}}$ solutions. These approaches achieve a more balanced effectiveness between precision and recall for documents of different sizes.

In [25], the contribution of single metrics of syntactic similarity or distance between phrases to keyphrase extraction was demonstrated for short and long texts. Then, based on those reported results, and in order to obtain a more general proposal that offers good results for different types of documents, these metrics were aggregated with the semantic measures in the topic modeling process, which is the key process in our approach. According to the results shown in Table 8, that objective can be reached using the OWA ${}_{\textit{Most(Feng)}}$ and OWA ${}_{\textit{Most(Pasi)}}$ operators.

Table 8
Results with Inspec and 500N-KPCrowd according to the topic modelling approaches

Topic modelling approach	Metrics	Inspec			500N-KPCrowd
		P	R	F	P	R	F
Single criteria	LCH	38.2	53.2	44.5	59.0	19.0	28.6
	JCN	39.5	55.1	46.2	55.0	18.9	28.1
Multi-criteria (aggregating metrics)	OWA ${}_{\textit{Average}}$	41.5	40.8	41.1	60.7	20.7	30.9
	OWA ${}_{\textit{AtLeastHalf}}$	42.4	37.4	39.7	60.8	20.7	30.9
	OWA ${}_{\textit{Most(Feng)}}$	42.5	61.9	50.4	57.3	44.1	49.9
	OWA ${}_{\textit{Most(Pasi)}}$	42.8	62.0	50.7	57.4	43.9	49.8
	Average	42.3	50.5	45.5	59.0	32.3	40.4

To verify the effectiveness of these solutions, they were evaluated considering the Top N (N $=$ 5, 10, 15 and 20) keyphrases extracted. Table 9 shows these results. This comparison gives a better understanding of the behavior of the OWA operators for different output sizes. The results show that, as N increases, the evaluation metrics improve significantly, and OWA ${}_{\textit{Most(Feng)}}$ and OWA ${}_{\textit{Most(Pasi)}}$ are the best solutions according to the F-measure. It is also important to point out that, as N increases, the precision and recall of both solutions also increase in a balanced way, which is very encouraging.

Table 9

Impact of the OWA-based solution proposed according to Top N keyphrase extracted

#Keyphrases	OWA	Inspec			500N-KPCrowd
		P	R	F	P	R	F
Top 5	OWA ${}_{\textit{Average}}$	27.9	26.8	27.3	28.5	9.1	14.0
	OWA ${}_{\textit{AtLeastHalf}}$	30.1	26.4	28.2	29.9	9.2	14.1
	OWA ${}_{\textit{Most(Pasi)}}$	21.9	30.6	25.5	15.6	9.6	11.9
	OWA ${}_{\textit{Most(Feng)}}$	22.1	30.9	25.8	16.0	9.6	12.0
Top 10	OWA ${}_{\textit{Average}}$	39.2	38.0	38.6	45.1	14.6	22.1
	OWA ${}_{\textit{AtLeastHalf}}$	41.2	36.3	38.6	46.6	14.6	22.2
	OWA ${}_{\textit{Most(Pasi)}}$	33.7	47.5	39.4	26.9	16.8	20.7
	OWA ${}_{\textit{Most(Feng)}}$	33.9	47.5	39.6	27.5	16.8	20.9
Top 15	OWA ${}_{\textit{Average}}$	41.2	40.0	40.6	53.9	17.7	26.7
	OWA ${}_{\textit{AtLeastHalf}}$	42.2	37.2	39.5	55.1	17.5	26.5
	OWA ${}_{\textit{Most(Pasi)}}$	39.4	56.1	46.3	34.5	22.3	27.1
	OWA ${}_{\textit{Most(Feng)}}$	39.6	56.2	46.5	35.1	22.1	27.1
Top 20	OWA ${}_{\textit{Average}}$	41.3	40.2	40.8	58.1	19.3	29.0
	OWA ${}_{\textit{AtLeastHalf}}$	42.4	37.4	39.7	58.4	18.8	28.4
	OWA ${}_{\textit{Most(Pasi)}}$	41.8	60.0	49.0	40.4	26.8	32.2
	OWA ${}_{\textit{Most(Feng)}}$	42.1	60.1	49.5	40.8	40.8	40.8

These results were validated through statistical tests, applying Friedman’s Test to obtain the ranking of the solutions, complemented with Post-Hoc Tests to find significant differences between them. Based on the previous results, the following hypotheses were stated:

• H ${}_{0}$ (Inspec): The OWA ${}_{\textit{Most(Feng)}}$ is not the best solution for keyphrase extraction in Inspec;

• H ${}_{1}$ (Inspec): The OWA ${}_{\textit{Most(Feng)}}$ is the best solution for keyphrase extraction in Inspec;

• H ${}_{0}$ (500N-KPCrowd): The OWA ${}_{\textit{Most(Pasi)}}$ is not the best solution for keyphrase extraction in 500N-KPCrowd;

• H ${}_{1}$ (500N-KPCrowd): The OWA ${}_{\textit{Most(Pasi)}}$ is the best solution for keyphrase extraction in 500N-KPCrowd.

First, the statistical tests were performed to analyze the linguistic quantifiers in the OWA-based approaches, and the results obtained from the sample group of each dataset are shown in Table 10. Friedman’s Test showed that the best solutions are OWA ${}_{\textit{Most(Feng)}}$ and OWA ${}_{\textit{Most(Pasi)}}$ (each one with better results on a specific dataset), according to the ranking and using the values of the F-measure as a reference. Next, Post-Hoc Tests were carried out, using OWA ${}_{\textit{Most(Feng)}}$ and OWA ${}_{\textit{Most(Pasi)}}$ as control solutions for the texts from Inspec and 500N-KPCrowd, respectively. The results showed that the null hypotheses (H ${}_{0}$ (Inspec) and H ${}_{0}$ (500N-KPCrowd)) should be rejected, because all p-values obtained from the Post-Hoc procedures are less than or equal to 0.05. The existence of significant differences between the selected control solutions and the rest of the solutions compared (on each dataset) was also shown.
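The ranking reported by Friedman’s Test is the mean rank of each solution over the sampled documents. The following pure-Python sketch computes these mean ranks and the Friedman chi-square statistic for a small matrix of F-measure values. The data are made up for illustration, the statistic is computed without tie correction, and obtaining a p-value would additionally require the chi-square distribution, which is left out here.

```python
def average_ranks(row):
    """Rank one document's scores: best (highest) gets rank 1, ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda k: -row[k])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def friedman(scores):
    """scores: one row per document, one F-measure per solution in each row."""
    n, k = len(scores), len(scores[0])
    mean_ranks = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(row)):
            mean_ranks[j] += rank / n
    # Friedman chi-square statistic (no tie correction).
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Hypothetical F-measures of three solutions on three documents.
sample = [
    [0.5, 0.4, 0.3],
    [0.6, 0.5, 0.2],
    [0.7, 0.3, 0.4],
]
mean_ranks, chi2 = friedman(sample)
print(mean_ranks, chi2)
```

The solution with the lowest mean rank is the natural choice of control for the post-hoc comparisons.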

Table 10

Results of the Friedman Test and Post-Hoc procedures applied to each OWA-based proposal

Configuration Test	OWA-based proposal	Ranking	Post-Hoc
			p ${}_{\textit{Holm}}$	p ${}_{\textit{Finner}}$	p ${}_{\textit{Li}}$
Dataset: Inspec	OWA ${}_{\textit{Most(Feng)}}$	1.920	–	–	–
	OWA ${}_{\textit{Most(Pasi)}}$	1.952	0.0500	0.0500	0.0500
Quality metric: F-measure	OWA ${}_{\textit{AtLeastHalf}}$	3.052	0.0257	0.0336	0.0115
	OWA ${}_{\textit{Average}}$	3.076	0.0167	0.0169	0.0115
Dataset: 500N-KPCrowd	OWA ${}_{\textit{Most(Pasi)}}$	1.542	–	–	–
	OWA ${}_{\textit{Most(Feng)}}$	1.626	0.0500	0.0500	0.0500
Quality metric: F-measure	OWA ${}_{\textit{AtLeastHalf}}$	3.302	0.0250	0.0336	0.0281
	OWA ${}_{\textit{Average}}$	3.530	0.0167	0.0169	0.0281

Having identified the best OWA solution for each dataset, a similar statistical evaluation was performed, this time comparing these solutions with the single semantic measures; the results are shown in Table 11. In this case, Friedman’s Test shows that the best solutions are OWA ${}_{\textit{Most(Feng)}}$ and OWA ${}_{\textit{Most(Pasi)}}$ , with a similar performance on both datasets. The results of the Post-Hoc Tests show that the null hypotheses (H ${}_{0}$ (Inspec) and H ${}_{0}$ (500N-KPCrowd)) should be rejected too, because in most of the tests the $p$ -values obtained are less than or equal to 0.05. Significant differences between the OWA-based solutions and the single semantic measures were also shown.
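For reference, Holm’s step-down procedure, one of the post-hoc tests used here, can be sketched as follows. The sketch assumes unadjusted p-values from the pairwise comparisons against the control solution; the input values are hypothetical, and the Finner and Li procedures are not shown.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: returns one reject/keep flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # smallest p-value first
    reject = [False] * m
    for step, k in enumerate(order):
        # The i-th smallest p-value is compared with alpha / (m - i + 1).
        if p_values[k] <= alpha / (m - step):
            reject[k] = True
        else:
            break  # once one hypothesis is kept, all remaining ones are kept
    return reject


# Hypothetical unadjusted p-values for three comparisons against a control.
flags = holm_reject([0.01, 0.04, 0.03])
print(flags)
```

With three comparisons and a significance level of 0.05, the successive thresholds are 0.0167, 0.025 and 0.05.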

Table 11

Results of the Friedman Test and Post-Hoc procedures applied to the best OWA-based solution and the single criteria approach

Configuration Test	Solutions	Ranking	Post-Hoc
			p ${}_{\textit{Holm}}$	p ${}_{\textit{Finner}}$	p ${}_{\textit{Li}}$
Dataset: Inspec	OWA ${}_{\textit{Most(Feng)}}$	1.686	–	–	–
Quality metric: F-measure	LCH	2.066	0.050	0.050	0.0500
	JCN	2.248	0.025	0.0253	0.0503
Dataset: 500N-KPCrowd	OWA ${}_{\textit{Most(Pasi)}}$	1.182	–	–	–
Quality metric: F-measure	LCH	2.376	0.050	0.050	0.0500
	JCN	2.442	0.025	0.0253	0.0526

Having shown that the best approach of the proposal is the OWA-based solution, its results are compared with those of other algorithms reported in the state of the art. This comparison is shown in Table 12, where the proposal gives promising results.

Table 12

Comparative results with other algorithms reported

	Systems	Inspec			500N-KPCrowd
		P	R	F	P	R	F
Group A	Liu et al. [16]	35.0	66.0	45.7	–	–	–
	Thi et al. [13]	38.1	46.1	41.7	–	–	–
	WikiRank [34]	28.4	25.9	27.0	–	–	–
	EmbedRank [2]	31.5	49.2	38.4	–	–	–
Group B	SMAF Extractor [1]	–	–	–	42.7	24.8	29.8
	YAKE [6]	–	–	–	25.1	6.3	10.1
Group C	TextRank [20]	31.2	43.1	36.2	26.5	6.3	10.3
	TopicRank [3]	36.4	39.0	35.6	26.2	23.9	25.0
	TSAKE [26]	40.1	20.3	26.9	14.3	46.6	21.9
	Salience Rank [28]	26.5	29.8	26.6	25.3	22.2	22.9
	Ying et al. [33]	43.0	40.2	39.6	48.7	49.8	47.8
	RAKE [27]	33.7	41.5	37.2	12.0	3.8	5.8
	Vega et al. [29]	49.2	51.8	50.5	44.8	44.3	44.5
	Avg. (Baseline)	35.7	41.2	36.8	19.3	23.0	21.7
	OWA ${}_{\textit{Most(Pasi)}}$	42.8 (III)	62.0 (II)	50.7	57.4	43.9 (V)	49.8 (II)
	OWA ${}_{\textit{Most(Feng)}}$	42.5 (IV)	61.9 (III)	50.4 (III)	57.3 (II)	44.1 (IV)	49.9

On Inspec, our proposal shows a slight improvement in F-measure over the state of the art (just 0.2 percentage points), represented by Vega et al. [29]. According to Table 12, the precision of OWA ${}_{\textit{Most(Pasi)}}$ is the third best and its recall is the second best. The best recall on Inspec, achieved by Liu et al. [16] (66%), is due to their clustering based on term semantic relatedness, which guarantees that the extracted keyphrases have good coverage of the document. Nevertheless, our proposal generally achieves the best results.

The recall achieved on 500N-KPCrowd was the least satisfactory result of our proposal, although the precision and F-measure were significantly better than those obtained by the other proposals. Although the recall of Ying et al. [33] is the highest on 500N-KPCrowd, its precision is approximately 9 percentage points lower, and its F-measure 2 percentage points lower, than those of our method.

The low recall value obtained can be explained by the high number of named entities annotated as keyphrases in 500N-KPCrowd. The identification of named entities as candidate phrases was not considered within the patterns defined in the pre-processing phase of the proposed approach, because this type of phrase is not often identified as a keyphrase. In addition, the OWA operator applied in the proposed fuzzy modeling of topics aggregates several semantic measures, which may fail in the case of named entities. This situation suggests a specific analysis for this type of phrase in subsequent applications of our proposal. Nevertheless, the experiments show the improvement in effectiveness achieved by our method and by the proposed fuzzy-based semantic processing in automatic keyphrase extraction from two types of texts: paper abstracts and news stories.

7. Conclusions and future works

This paper presents a new unsupervised method for automatic keyphrase extraction from text, which combines the use of lexico-syntactic patterns to identify the candidate phrases with a fuzzy modeling of topics. The use of linguistic patterns allowed more possibilities for identifying the candidate phrases and improved the coverage of the text. Several syntactic and semantic measures for modeling the most relevant linguistic features of the candidate phrases were aggregated by applying an OWA aggregation operator. The aggregation of these measures through the OWA operator strengthened the semantic processing of the candidate phrases in the topic identification, which is a little-considered aspect in most of the existing proposals. The proposed method was evaluated on two datasets with different types of texts, and the results obtained were compared with those from other unsupervised schemes. From the different approaches analyzed in the proposal, it was possible to demonstrate that the aggregation of several semantic measures (multi-criteria) achieves better results than the use of these measures independently (single criteria). This was shown not only by the differences between the results; the statistical analysis also confirmed the benefits of using a multi-criteria approach. It was also concluded that increasing the number of identified keyphrases can improve the results.

A slight improvement in F-measure was achieved on both datasets compared to the other proposals reported in the state of the art. The most significant result was obtained on 500N-KPCrowd, where a remarkable improvement in Precision was found compared to the other proposals. Although in general the best values of Precision and Recall were not achieved, it was possible to obtain a better balance of these metrics, which contributed to the improvement of the F-measure values. The results obtained with the proposed method are promising, demonstrating the contribution of the applied fuzzy topic modeling to improving the keyphrase extraction process, both in paper abstracts and in more general texts, such as news stories. Improving the recall on general domain texts is a challenge to be addressed in the future, considering a specific analysis for named entities. Additionally, other linguistic quantifiers applied to the OWA operator will be assessed and their performance in the keyphrase extraction process will be measured.

Acknowledgments

This work has been partially supported by FEDER and the State Research Agency (AEI) of the Spanish Ministry of Economy and Competition under grant MERINET: TIN2016-76843-C4-2-R (AEI/FEDER, UE).

References

1. Abdou, Salah and AbdelGaber, Unsupervised automatic keywords and keyphrases extractor for web documents, International Journal of Computer Science and Information Security (IJCSIS) 15(10) (2017).
2. Bennani-Smires, Musat, Hossmann, Baeriswyl and Jaggi, Simple unsupervised keyphrase extraction using sentence embeddings, arXiv preprint arXiv:1801.04470, 2018.
3. Bougouin, Boudin and Daille, TopicRank: Graph-based topic ranking for keyphrase extraction, in: Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, Oct. 2013, pp. 543–551. Asian Federation of Natural Language Processing.
4. Brin and Page, Reprint of: The anatomy of a large-scale hypertextual web search engine, Computer Networks 56(18) (2012), 3825–3833.
5. Budanitsky and Hirst, Evaluating wordnet-based measures of lexical semantic relatedness, Computational Linguistics 32(1) (2006), 13–47.
6. Campos, Mangaravite, Pasquali, Jorge, Nunes and Jatowt, Yake! Keyword extraction from single documents using multiple local features, Information Sciences 509 (2020), 257–289.
7. Feng and Dillon T.S., Using fuzzy linguistic representations to provide explanatory semantics for data warehouses, IEEE Transactions on Knowledge and Data Engineering 15(1) (2003), 86–102.
8. Florescu and Caragea, Positionrank: An unsupervised approach to keyphrase extraction from scholarly documents, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017, pp. 1105–1115.
9. Grabisch, Orlovski S.A. and Yager R.R., Fuzzy aggregation of numerical preferences, in: Fuzzy Sets in Decision Analysis, Operations Research and Statistics, Springer, 1998, pp. 31–68.
10. Hasan K.S. and Ng, Automatic keyphrase extraction: A survey of the state of the art, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014, pp. 1262–1273.
11. Hulth, Improved automatic keyword extraction given more linguistic knowledge, in: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2003, pp. 216–223.
12. Jones K.S., A statistical interpretation of term specificity and its application in retrieval, Journal of documentation, 1972.
13. T.T.N., Le Nguyen and Shimazu, Unsupervised keyphrase extraction: Introducing new kinds of words to keyphrases, in: Australasian Joint Conference on Artificial Intelligence, Springer, 2016, pp. 665–671.
14. Lehmann, Isele, Jakob, Jentzsch, Kontokostas, Mendes P.N., Hellmann, Morsey, Van Kleef, Auer et al., Dbpedia – a large-scale, multilingual knowledge base extracted from wikipedia, Semantic Web 6(2) (2015), 167–195.
15. McLean, Bandar Z.A., O’shea J.D. and Crockett, Sentence similarity based on semantic nets and corpus statistics, IEEE Transactions on Knowledge and Data Engineering 18(8) (2006), 1138–1150.
16. Liu, Zheng and Sun, Clustering to find exemplar terms for keyphrase extraction, in: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, Association for Computational Linguistics, 2009, pp. 257–266.
17. Marujo, Gershman, Carbonell, Frederking and Neto J.P., Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization, arXiv preprint arXiv:1306.4886, 2013.
18. Marujo, Ribeiro, de Matos D.M., Neto J.P., Gershman and Carbonell, Key phrase extraction of lightly filtered broadcast news, in: International Conference on Text, Speech and Dialogue, Springer, 2012, pp. 290–297.
19. Merrouni Z.A., Frikh and Ouhbi, Automatic keyphrase extraction: An overview of the state of the art, in: 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt), IEEE, 2016, pp. 306–313.
20. Mihalcea and Tarau, Textrank: Bringing order into text, in: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004, pp. 404–411.
21. Miller G.A., WordNet: An electronic lexical database, MIT press, 1998.
22. Müllner, Modern hierarchical, agglomerative clustering algorithms, arXiv preprint arXiv:1109.2378, 2011.
23. Pasi and Yager R.R., Modeling the concept of majority opinion in group decision making, Information Sciences 176(4) (2006), 390–414.
24. Pedersen, Patwardhan, Michelizzi et al., Wordnet::Similarity-measuring the relatedness of concepts, in: AAAI, Vol. 4, 2004, pp. 25–29.
25. Pérez-Guadarramas, Rodríguez-Blanco, Simón-Cuevas, Hojas-Mazo and Olivas J.Á., Combinando patrones léxico-sintácticos y análisis de tópicos para la extracción automática de frases relevantes en textos, Procesamiento del Lenguaje Natural (59) (2017), 39–46.
26. Rafiei-Asl and Nickabadij, Tsake: A topical and structural automatic keyphrase extractor, Applied Soft Computing 58 (2017), 620–630.
27. Rose, Engel, Cramer and Cowley, Automatic keyword extraction from individual documents, Text Mining: Applications and Theory 1 (2010), 1–20.
28. Teneva and Cheng, Salience rank: efficient keyphrase extraction with topic modeling, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2017, pp. 530–535.
29. Vega-Oliveros D.A., Gomes P.S., Milios E.E. and Berton, A multi-centrality index for graph-based keyword extraction, Information Processing & Management 56(6) (2019), 102063.
30. and Da Q.-L., An overview of operators for aggregating information, International Journal of Intelligent Systems 18(9) (2003), 953–969.
31. Yager R.R., On ordered weighted averaging aggregation operators in multicriteria decisionmaking, IEEE Transactions on systems, Man, and Cybernetics 18(1) (1988), 183–190.
32. Yager R.R., Quantifier guided aggregation using owa operators, International Journal of Intelligent Systems 11(1) (1996), 49–73.
33. Ying, Qingping, Qinzheng, Ping and Panpan, A graph-based approach of automatic keyphrase extraction, Procedia Computer Science 107 (2017), 248–255.
34. and Ng, Wikirank: Improving keyphrase extraction based on background knowledge, arXiv preprint arXiv:1803.09000, 2018.
35. Zadeh L.A., A computational approach to fuzzy quantifiers in natural languages, in: Computational Linguistics, Elsevier, 1983, pp. 149–184.
