EDCOSUM: Text extractive summarization framework based on edge information with coreference resolution

Abstract

Automatic Text Summarization (ATS) is valuable because the volume of textual data is vast and manual summarization is time-consuming. To improve single-document ATS on large datasets, this paper proposes a new extractive graph framework, the text extractive SUMmarization framework based on EDge information with COreference resolution (EDCOSUM), which relies on coreference resolution, edge information added to a word-level graph, and a sentence-ranking strategy. EDCOSUM combines graph-based and statistical extractive summarization methods. It is a general method applicable to any document and is not limited to a specific domain. Moreover, two ranking strategies (the sentence ranking strategy and the LSA ranking strategy) are proposed for sentence selection. Extensive experiments on CNN/Daily Mail and NEWSROOM are conducted to investigate the proposed method. The widely used automatic evaluation tool Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is used to evaluate EDCOSUM. Compared with state-of-the-art ATS methods, EDCOSUM achieves competitive results, improving on the highest scores in the literature for ROUGE-1, ROUGE-2 and ROUGE-L.

Keywords

Text extractive summarization; Graph theory; Coreference resolution; Word-level graph; Ranking strategy

1 Introduction

According to de Kunder [1], the estimated size of the web in 2021 is around 4.51 billion pages, three decades after its invention. Text such as news articles, electronic books and scientific papers grows at a particularly fast pace. The purpose of text extractive summarization is to condense the input text while retaining its core meaning.

In [2], Radev et al. define a summary as follows: “A summary can be loosely defined as a text that is produced from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually significantly less than that. Text here is used rather loosely and can refer to speech, multimedia documents, hypertext, etc.” In other words, text summarization is the process of creating a compressed version of the original document that removes irrelevant information and keeps the significant parts.

Automatic Text Summarization (ATS) can be divided into multiple categories depending on the criterion used for comparison. According to the input size, ATS can be classified into multi-document or single-document summarization. Single-document summarization extracts several sentences from a single document, while multi-document summarization produces a summary from a cluster of related documents. In light of the output type, ATS can be divided into generic and query-based methods. Generic methods do not require any query, since they aim to cover all the information. Query-based methods aim to answer the query concisely. Moreover, in terms of the applied method, ATS can be categorized into extractive, abstractive and hybrid summarization. In abstractive summarization, the summary is built from concepts taken from the document and contains new sentences. By comparison, extractive approaches select a subset of sentences from the source document, thereby enjoying better fluency and efficiency [3]. Because it selects existing sentences, extractive summarization is closer to a human-written summary. Hybrid summarization combines the extractive and abstractive approaches. Extractive summarization is simpler than abstractive and hybrid methods. EDCOSUM is a single-document, generic and extractive summarization method. The structure of text extractive summarization is shown in Fig. 1.

Fig. 1

The structure of text extractive summarization.

The field of text extractive summarization still needs extensive research because generated summaries remain far from human-written summaries. Therefore, in this paper, we propose a text extractive SUMmarization framework based on EDge information with COreference resolution (EDCOSUM), whose contributions include:

Representation of sentences in the form of a graph.

A new methodology based on word-level graph to obtain an extractive summarization.

The rest of this paper is organized as follows: Section 2 presents a literature review of text extractive summarization. Section 3 describes the proposed method, EDCOSUM. The evaluation methods, datasets, experiments and results are discussed in Section 4. Finally, Section 5 concludes the paper.

2 Related work

Various methods have been proposed for extractive summarization in the literature: linguistic and statistical-based, LSA, graph-based and reinforcement learning methods.

Linguistic and statistical-based methods: These methods use widely adopted statistical and linguistic features to identify important words or sentences. Important sentences are identified by cues such as a prominent position or the presence of the most frequent words. The selected statistical and linguistic features determine the weight of a sentence [4]. Feature weight calculation is a strategy that yields the score of each sentence [5]. Afsharizadeh et al. [6] use statistical features to extract summaries. This approach does not require excessive computation capacity or memory. However, it suffers from redundancy among the sentences in the summary. Meanwhile, a sentence that contains important information may be excluded from the summary because of a low score. Therefore, EDCOSUM applies statistical features such as keywords, bigrams and the title feature to extract the summary, and takes redundancy into consideration.

Fig. 2

The example of coreference resolution.

Latent Semantic Analysis (LSA) methods: LSA is a commonly used semantic-based extractive ATS method. The sentence scoring algorithm creates an input matrix and applies Singular Value Decomposition (SVD) to identify the relationship between words and sentences [7]. Improved models of LSA include Explicit Semantic Analysis (ESA) and Semantic Role Labelling (SRL). Mohamed et al. [8] propose a text extractive summarization method based on ESA and SRL using the Wikipedia knowledge dataset. Extractive summarization based on LSA can generate a semantically related summary. However, these methods depend on the quality of the semantic representation. We use LSA as part of the sentence ranking strategy, combining the semantic feature to rank sentences and produce the summary.

Graph-based methods: Graph-based methods can be regarded as ranking sentence vertices in a graph. They treat a document as a graph in which sentences are vertices and the similarity between a pair of sentences is the edge between the vertices. The sentence ranking algorithm is essentially a way of deciding the importance of sentences within a graph based on information drawn from the graph structure [9]. The essence of GraphSum [10] is a ranking strategy that discriminates between positive and negative term correlations. Dutta et al. [11] find the most informative sentences by mapping information clusters after building a graphical representation of the entire document. Betweenness centrality has also been used in summarization [12]. Mallick [13] proposes an extractive summarization method based on a modified PageRank. Graph-based methods enhance coherence and reduce the redundancy of the summary. However, approaches that treat the sentence as a node in the graph neglect the diversity of words. Therefore, EDCOSUM adjusts word weights to decide the importance of a sentence.

Reinforcement learning methods: Reinforcement learning trains an agent that interacts with a given environment to maximize a reward [14]. In extractive summarization approaches, the agent decides whether a unit is selected for the summary. REFRESH [15] applies reinforcement learning to extractive summarization as sentence ranking. The summarization structure, viewed as the agent, is updated by the reinforcement algorithm. RNES [16] optimizes the coherence and informativeness of the summary with a reinforcement algorithm based on a pre-trained Neural Extractive Summarization (NES) model. BanditSum [17] regards text summarization as a contextual bandit (CB) problem. For agents, the rewards influence the actions. However, a model based on reinforcement learning is tied to the domain of its training data, whereas EDCOSUM is a generic method for all domains.

Additionally, several mentions in a text may refer to the same entity. Coreference resolution determines which mentions refer to each real-world entity in the document. It aims to enrich the semantic information and to reduce the number of noun nodes that refer to the same entity. As a preprocessing step, coreference resolution with rich semantic information has been successfully applied to many downstream tasks. The process of coreference resolution is depicted in Fig. 2. Therefore, coreference resolution is taken into consideration in EDCOSUM.

3 Research methodology

In this paper, we present an extractive summarization approach based on edge information with coreference resolution. The structure of EDCOSUM is depicted in Fig. 3. The methodology consists of text preprocessing, graph construction, node weight calculation, candidate edge selection and candidate summary ranking.

3.1 Text preprocessing

Documents from the datasets may contain non-English characters and need to be preprocessed. As illustrated in Fig. 3, preprocessing contains four steps: (1) sentence segmentation, (2) word lemmatization, (3) removal of non-English characters, and (4) word frequency calculation.
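The four preprocessing steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `preprocess` is hypothetical, sentence segmentation is a naive regular expression, non-English characters are approximated by non-ASCII characters, and lemmatization is replaced by lowercasing because the source does not name the lemmatizer it uses.

```python
import re
from collections import Counter


def preprocess(document):
    """Return (sentences, tokens per sentence, word frequencies)."""
    # Step 3 (assumption): treat non-ASCII characters as non-English and drop them.
    text = document.encode("ascii", errors="ignore").decode("ascii")
    # Step 1: naive sentence segmentation after '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Step 2 (assumption): lowercasing stands in for real lemmatization.
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    # Step 4: word frequency over the whole document.
    freq = Counter(word for sent in tokens for word in sent)
    return sentences, tokens, freq


sentences, tokens, freq = preprocess(
    "The law came into force. Critics say the law is broad."
)
print(sentences)
print(freq["law"])
```

A production pipeline would swap the regular expressions for a proper sentence splitter and lemmatizer; only the order of the steps is taken from the text.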

Fig. 3

The structure of EDCOSUM.

3.2 Graph construction

EDCOSUM models a document as a directed, unweighted graph to capture the sequence between word nodes. Given a document D = {S₁, S₂, . . . , S_n}, the directed unweighted graph G(V, E) of the input document is built, where V is the set of nouns in D and E is the collection of edges between pairs of vertices. A sentence is represented as a path from the beginning token "CLS#" to the destination token "SEP#" that follows the word sequence. We assume that contextual information can be retained by the graph structure. Algorithm 1 illustrates the procedure of graph construction. The process starts with a document and produces a document graph over text spans (nouns) through coreference resolution, node selection and edge construction. The graph constructed by EDCOSUM represents the meaningful relations among words.

Algorithm 1
Graph Construction

Input: G, sentence_dict

Output: word graph nodes sentence_node

1: Given sentence_dict and G, construct its word graph G_w

2: beginning node cls_node = "CLS#"

3: sentence order edge_order = 0

4: edge information edge_label = ""

5: ending node sep_node = ""

6: word graph node sentence_node = []

7: # Step 1: coreference resolution for sentence_dict

8: sentence_dict_core = coreference_solution(sentence_dict)

9: tag = pos_tagging_text(sentence_dict_core)

10: for all word ∈ tag do

11: if tag ∈ nouns then

12: word_lemma = get_word_lemma(tag[word][0])

13: sep_node = word_lemma

14: if tag[word][0] ∉ sentence_node then

15: add_node(G, word_lemma)

16: add_edge(cls_node, sep_node, sentence, edge_order)

17: edge_order += 1

18: cls_node = word_lemma

19: edge_label = ""

20: else if tag ∈ non_noun then

21: edge_label = edge_label + " " + tag[word][0]

22: end if

23: end if

24: end for

25: return word graph nodes sentence_node

Coreference Resolution Coreference resolution identifies the mentions in the document that refer to the same real-world entity. It helps EDCOSUM discover the central subject of the document. We integrate coreference resolution, which discovers coreference chains, with extractive summarization to extract informative sentences.

We choose NeuralCoref to find coreference chains because of its simple network. First, coreferences are recognized through NeuralCoref. Subsequently, pronouns are replaced with the corresponding nouns. This allows the coreference resolution step to be built without complex models.
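The pronoun replacement step can be illustrated independently of the coreference model. The sketch below assumes the coreference chains have already been found and are given as token spans; the function `resolve_pronouns`, the `PRONOUNS` set and the span format are hypothetical and are not the NeuralCoref interface.

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def resolve_pronouns(tokens, clusters):
    """Replace single-token pronoun mentions with the first mention of their chain.

    tokens:   list of word strings
    clusters: list of chains; each chain is a list of (start, end) token spans
              (end exclusive) with the representative mention first (assumption).
    """
    replacement = {}
    for chain in clusters:
        head_start, head_end = chain[0]
        head = tokens[head_start:head_end]
        for start, end in chain[1:]:
            if end - start == 1 and tokens[start].lower() in PRONOUNS:
                replacement[start] = head
    resolved = []
    for index, token in enumerate(tokens):
        resolved.extend(replacement.get(index, [token]))
    return resolved


tokens = "The law came into force . It allows arrests .".split()
clusters = [[(0, 2), (6, 7)]]  # "The law" <- "It"
resolved = resolve_pronouns(tokens, clusters)
print(" ".join(resolved))  # The law came into force . The law allows arrests .
```

After this step the second sentence contributes the noun "law" to the graph instead of a pronoun, which is the effect the text describes.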

Node Selection EDCOSUM is based on the hypothesis that a sentence containing more nouns is more important. Therefore, we use the nouns in each sentence to construct the graph of the text.

After coreference resolution, we inspect the Part-Of-Speech (POS) tag of each word, which is a feature of the node. This paper uses this feature as part of graph construction. We classify word tags into two types: (1) noun and (2) non-noun. If the tag is one of [NN, NNS, NNP, NNPS], the lemmatized word is added to V. Otherwise, the word is regarded as part of an edge in E.

Edge Construction Every two vertices are linked by an edge labeled with all the non-noun words between these two consecutive nouns in the parsed sentence. We transform the labeled edges into directed edges. To illustrate more clearly how the edges are constructed, the following sentences are used as an example in Fig. 4, where rectangles represent the nodes and arrows between two nodes represent the directed edges.

Fig. 4
Illustration of graph construction. First, the noun vertices are selected into V. Subsequently, the noun vertices are connected through the sentence. For example, the sentence “A new security law in Malaysia came into force on Monday one that critics say gives the government broad unchecked powers.” contains the words “Malaysia” and “Monday”. The edge between “Malaysia” and “Monday” is “came into force on”.

S0: A new security law in Malaysia came into force on Monday one that critics say gives the government broad unchecked powers.

S1: It allows for authorities to create a so-called security zone inside which arrests and seizures can happen without warrants.

S2: Some say the government could use this law to ward off political and legal challenges.
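Node selection and edge construction can be sketched as follows. The code is a simplified reading of the procedure, not the authors' implementation: it assumes the sentences are already coreference-resolved and POS-tagged (the tags below are written by hand), uses lowercasing in place of lemmatization, and the function name `build_word_graph` and the edge tuple layout are hypothetical.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def build_word_graph(tagged_sentences):
    """Build noun nodes and labeled directed edges from POS-tagged sentences.

    Each edge is (source, target, label, sentence_id, order), where label
    holds the non-noun words found between two consecutive nouns.
    """
    nodes = {"CLS#", "SEP#"}
    edges = []
    for sentence_id, tagged in enumerate(tagged_sentences):
        previous, label, order = "CLS#", [], 0
        for word, tag in tagged:
            if tag in NOUN_TAGS:
                noun = word.lower()  # stand-in for lemmatization (assumption)
                nodes.add(noun)
                edges.append((previous, noun, " ".join(label), sentence_id, order))
                previous, label, order = noun, [], order + 1
            else:
                label.append(word)
        # Close the sentence path at the destination token.
        edges.append((previous, "SEP#", " ".join(label), sentence_id, order))
    return nodes, edges


tagged = [[("A", "DT"), ("new", "JJ"), ("security", "NN"),
           ("law", "NN"), ("in", "IN"), ("Malaysia", "NNP")]]
nodes, edges = build_word_graph(tagged)
for edge in edges:
    print(edge)
```

Each sentence becomes a path from "CLS#" to "SEP#", and the sentence id stored on every edge is what later allows selected edges to be mapped back to sentences.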

3.3 The calculation of node weight

Each word node has a word frequency in the input document. From the calculation of word frequency, we find that the median word frequency is equal to 1, which is also the minimum word frequency. This makes it difficult to select candidate nodes in the next step. To address the problem, we combine text features with a bias to calculate the node weight. Previous extractive summarization models incorporate a wide variety of features, such as keywords, proper nouns and bigrams. However, how these features contribute to the node weight calculation is not explicit. We aim to identify the effective features and use them to adjust the importance of words in sentences. The features are as follows:

Keywords Keywords serve as a dense summary of documents. Keywords, as immediate products, have been widely used in domains such as online search engines and libraries to support text categorization [18–20], text clustering [21–23], speech recognition [24, 25] and automatic text summarization [26]. Therefore, we use Yake, a lightweight unsupervised automatic keyword extraction method. Yake relies on statistical text features extracted from single documents to select the most important keywords of a text.

Proper Words Named Entity Recognition (NER) recognizes special entities including person names, location names, institution names, etc. In general, an entity is an object of the document that is more essential than other words. Thus, we resort to Stanford NLTK for NER and treat the recognized entities as the proper words of documents.

Bigrams Bigrams are consistent with ROUGE-2, which is widely used for summarization evaluation [27]. We define bigrams as phrases consisting of an adjective and a noun, identified by POS tagging.

Synonym Replacement Synonym replacement based on the widely used knowledge base WordNet [28] is essential for unifying identical terms and reducing the number of nodes with similar semantics.

As introduced above, the text information covers keywords, bigrams, the title and proper words, and each word may belong to one or more of these essential text features. Additionally, we define a bias list, which is domain-specific for special texts. Each word in the bias list receives an additional weight to reflect its importance. The text information with the bias is used in the node weight calculation, which is computed as:

$Weight(noun) = word\_frequence(noun) + W_{T} \cdot {Av}_{T} + W_{K} \cdot {Av}_{K} + W_{B} \cdot {Av}_{B} + W_{P} \cdot {Av}_{P} + W_{Bias}$ (1)

where word_frequence(·) represents the word frequency as the initial value. W_P, W_T, W_K and W_B are the weights applied to words that appear in proper nouns, the title, keywords and bigrams, respectively. However, a constant weight is not suitable for all documents, and a manually set value may bias the downstream work. To solve the problem, we propose an adjusted value to reduce the bias: Av_T, Av_K, Av_P and Av_B adjust the constant weights. The adjusted values are computed as:

${Av}_{T} = Abs(AW - M(title))$ (2)

${Av}_{K} = Abs(AW - M(keywords))$ (3)

${Av}_{P} = Abs(AW - M(proper))$ (4)

${Av}_{B} = Abs(AW - M(bigram))$ (5)

where AW represents the average value of word frequency, M(·) is the median function, and Abs(·) is the absolute value function.
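A minimal sketch of Eq. (1) to Eq. (5) is given below. Several points are assumptions made for illustration because the text leaves them open: M(feature) is read as the median frequency of the words belonging to that feature, a feature term is added only when the noun belongs to the feature, and the function names, the feature dictionary and the weight values are hypothetical.

```python
from statistics import mean, median


def adjusted_value(freq, feature_words):
    """Eq. (2)-(5): |AW - M(feature)|, with AW the average word frequency."""
    values = [freq[word] for word in feature_words if word in freq]
    if not values:
        return 0.0
    return abs(mean(freq.values()) - median(values))


def node_weight(noun, freq, features, feature_weights, bias_words, w_bias):
    """Eq. (1): word frequency plus adjusted feature weights plus bias."""
    weight = float(freq[noun])
    for name, words in features.items():
        if noun in words:  # assumption: the term applies only to feature members
            weight += feature_weights[name] * adjusted_value(freq, words)
    if noun in bias_words:
        weight += w_bias
    return weight


freq = {"law": 3, "malaysia": 1, "monday": 1, "government": 2, "power": 1}
features = {
    "title": {"law"},
    "keywords": {"law", "government"},
    "bigrams": set(),
    "proper": {"malaysia", "monday"},
}
feature_weights = {"title": 1.0, "keywords": 1.0, "bigrams": 1.0, "proper": 1.0}

law_weight = node_weight("law", freq, features, feature_weights, set(), 0.0)
power_weight = node_weight("power", freq, features, feature_weights, set(), 0.0)
print(round(law_weight, 2), power_weight)  # 5.3 1.0
```

Here AW = 8/5 = 1.6, so "law" gains |1.6 - 3| = 1.4 from the title and |1.6 - 2.5| = 0.9 from the keywords, while "power" keeps its raw frequency. The example shows how the adjustment separates words that would otherwise share the same low frequency.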

3.4 The candidate edge selection

Based on the calculated node weights, a search algorithm is proposed for candidate edge selection. We assume that a word node connected to more edges is more essential for text summarization, since such words frequently co-occur in the document [29]. In addition, we consider both long and short sentences to be important. Therefore, the number of edges connected to a node is considered in addition to the word weight. We propose an edge selection method that takes the number of edges connected to the node into account, so Eq. (1) is extended to Eq. (6):

$Weight\_edge(noun) = Weight(noun) + \sum {Num}_{edge}$ (6)

where Weight_edge(·) is a function that calculates the node weight including the edge information, and Num_edge represents the number of edges connected to the node. That is, the more edges a node connects, the more vital the node is.
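Eq. (6) can be sketched as a small helper. The text does not state whether incoming edges, outgoing edges or both are counted, nor whether repeated edges count once; the sketch below assumes distinct edges in both directions, and the function names are hypothetical.

```python
def edge_counts(edges):
    """Count the distinct edges touching each node (incoming plus outgoing)."""
    counts = {}
    for source, target in set(edges):
        counts[source] = counts.get(source, 0) + 1
        counts[target] = counts.get(target, 0) + 1
    return counts


def weight_with_edges(weights, edges):
    """Eq. (6): add the number of connected edges to each node weight."""
    counts = edge_counts(edges)
    return {node: weight + counts.get(node, 0) for node, weight in weights.items()}


weights = {"law": 3.0, "malaysia": 1.0, "monday": 1.0}
edges = [("law", "malaysia"), ("malaysia", "monday"), ("law", "malaysia")]
edge_weights = weight_with_edges(weights, edges)
print(edge_weights)  # {'law': 4.0, 'malaysia': 3.0, 'monday': 2.0}
```

The repeated edge between "law" and "malaysia" is counted once under this assumption, so "malaysia", which touches two distinct edges, gains the most.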

The criterion of edge selection is to choose the node with the maximum weight among the nodes connected by edges. Provided that a sentence has at least one selected edge, the sentence can be selected for the candidate summary. The candidate edge selection is achieved using Algorithm 2. An example of candidate edge selection (the letters represent the nouns in each sentence) is as follows:

S0: E-G-D-A-G-D-B-C

S1: L-G-J-D-B

S2: H-F-J-I-B-I-K

S3: L-J-B

First, the edge information is added to each node weight. Specifically, the weight of “CLS#” is set to 0. Second, the destination node with the maximum weight is visited in turn, and the corresponding edge is selected.

Algorithm 2

Candidate Edge Selection

Input: G, node weights graph_nodes_weights

Output: candidate summary candidate_summary

1: Given node weights graph_nodes_weights and G, obtain candidate summary candidate_summary

2: candidate edges candidate_edge_list = []

3: # add the edge information to the node weights

4: for node ∈ graph_nodes_weights do

5: if node ∈ G.nodes() then

6: graph_nodes_weights[node] += calculate_edge_num(node)

7: end if

8: sorted_weight_list = sorted(graph_nodes_weights)

9: end for

10: for node ∈ sorted_weight_list do

11: if node not visited then

12: visited(node)

13: out_edge = G.edges(node, data=True)

14: out_edge_visited = []

15: max_weight = 0

16: for out ∈ out_edge do

17: if graph_nodes_weights[out[1]] >= max_weight then

18: max_weight = graph_nodes_weights[out[1]]

19: end if

20: out_weight = graph_nodes_weights[out[1]]

21: if out_weight >= max_weight and out_weight > 0 then

22: sent = G.edges[out[0], out[1]]['sentence']

23: tup = (out[0], out[1], {'sentence': sent, 'order': G.edges[out[0], out[1]]['order']})

24: candidate_edge_list.append(tup)

25: end if

26: end for

27: end if

28: end for

29: candidate_summary = sentence(candidate_edge_list)

30: return candidate summary candidate_summary

The node weights with edge information are shown in Table 1. The process of candidate edge selection is depicted in Fig. 5. The candidate edges are L-G, G-D, D-B and B-C. Therefore, the candidate summary contains S0 and S1 in this example.

Table 1

The calculation of node weight

Node	CLS#	A	B	C	D	E	F	G	H	I	J	K	L	SEP#
No edge information	0	1	5	3.5	4	1	3	2.5	2	2	3	2	4	0
Add edge information	0	2	8	4.5	7	2	4	5.5	3	4	5	3	5	3->0

Fig. 5

The processing of the candidate edge selection.
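The worked example can be reproduced with a short sketch. It is a simplified reading of the selection step rather than a line-by-line transcription of Algorithm 2: starting from "CLS#", it repeatedly moves to the unvisited successor with the maximum weight and records the edge between two noun nodes. The weights are the "Add edge information" row of Table 1; the function names and the tie-breaking rule are assumptions.

```python
from collections import defaultdict


def build_edges(sentences):
    """Map each directed edge (u, v) to the ids of the sentences containing it."""
    edges = defaultdict(set)
    for sentence_id, nouns in sentences.items():
        path = ["CLS#"] + nouns + ["SEP#"]
        for source, target in zip(path, path[1:]):
            edges[(source, target)].add(sentence_id)
    return edges


def greedy_candidate_edges(edges, weights, start="CLS#", end="SEP#"):
    """Walk from start to end, always moving to the heaviest unvisited successor."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)
    selected, visited, node = [], {start}, start
    while node != end:
        options = sorted(t for t in successors[node] if t not in visited)
        if not options:
            break
        # Ties are broken alphabetically (assumption).
        best = max(options, key=lambda t: weights.get(t, 0))
        if node != start and best != end:
            selected.append((node, best))
        visited.add(best)
        node = best
    return selected


sentences = {
    "S0": list("EGDAGDBC"),
    "S1": list("LGJDB"),
    "S2": list("HFJIBIK"),
    "S3": list("LJB"),
}
weights = {"CLS#": 0, "A": 2, "B": 8, "C": 4.5, "D": 7, "E": 2, "F": 4,
           "G": 5.5, "H": 3, "I": 4, "J": 5, "K": 3, "L": 5, "SEP#": 0}

edges = build_edges(sentences)
selected = greedy_candidate_edges(edges, weights)
candidate = sorted({sid for edge in selected for sid in edges[edge]})
print(selected)   # [('L', 'G'), ('G', 'D'), ('D', 'B'), ('B', 'C')]
print(candidate)  # ['S0', 'S1']
```

The walk visits L, G, D, B and C in turn, giving the candidate edges L-G, G-D, D-B and B-C; only S0 and S1 contain at least one of these edges, which matches the example in the text.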

3.5 The ranking of candidate summary

The final summary is extracted from the candidate summary according to the sentence ranking strategy, in the following steps:

Step 1: define the ranking strategy.

Step 2: sort the sentence based on ranking strategy.

Step 3: apply the K-means clustering algorithm to obtain a set of clusters where similar sentences are grouped.

Step 4: select a sentence in each cluster to form the final summary.
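Steps 2 to 4 can be sketched as below, assuming the ranking scores and the K-means cluster labels have already been computed elsewhere. The text does not say which sentence is taken from each cluster; picking the highest-ranked one is an assumption, and the function name `select_summary` is hypothetical.

```python
def select_summary(scores, clusters):
    """Pick the highest-scoring sentence of each cluster, in document order.

    scores:   {sentence_index: ranking score}
    clusters: {sentence_index: cluster_id}, e.g. labels produced by K-means
    """
    best = {}
    for index, cluster_id in clusters.items():
        if cluster_id not in best or scores[index] > scores[best[cluster_id]]:
            best[cluster_id] = index
    return sorted(best.values())


scores = {0: 2.5, 1: 1.0, 2: 3.0, 3: 0.5}
clusters = {0: 0, 1: 0, 2: 1, 3: 1}
summary_indices = select_summary(scores, clusters)
print(summary_indices)  # [0, 2]
```

Taking one sentence per cluster limits redundancy, because sentences grouped together are similar to each other.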

The ranking strategy is an essential part of ranking the candidate summary. As demonstrated in Section 3.3, the word weight with edge information already contains the weights of the title, proper nouns, keywords, bigrams and bias words. In this section, we propose two ranking strategies to deal with word weight:

The value of a sentence consists of the word weight in the sentence.

The topic word weight is added to sentence weight.

We suppose that the first K sentences are more important than the following sentences, because lead-3, which selects the first three sentences as the final summary, is a widely used baseline method. The sentence value is calculated as:

$sentence\_value = \frac{{num}_{sentence} - {value}_{order}}{{num}_{sentence}}$ (7)

where num_sentence indicates the number of sentences in the candidate summary and value_order is the order of the sentence in the document. Eq. (7) expresses that the order of a sentence determines its value.

To select sentences for the summary, the sentence ranking strategy and the LSA ranking strategy are used, as shown in Eq. (8) and Eq. (9):

$Rank1(sentence) = sentence\_value + \sum_{n = 0}^{{len}_{s}} word\_frequence(noun)$ (8)

$Rank2(sentence) = Rank1(sentence) + LSA\_topics\_weight(sentence)$ (9)

where LSA_topics_weight(·) represents the weight of the words in the sentence that are listed in each topic obtained with LSA. Rank2 builds on Rank1 by adding the topic word weight of the sentence. The topic words have corresponding weights in the matrix generated by LSA. After that, all words of the different topics are grouped into a list of "topic words".

$LSA\_topics\_weight(sentence) = \frac{\sum {Weight}_{LSA\_topic}}{{maximum}_{LSA\_weight}}$ (10)

where Weight_{LSA_topic} represents the weight of a topic word in the sentence and maximum_{LSA_weight} is the maximum value of the topic weights.
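The two ranking strategies of Eq. (7) to Eq. (10) can be sketched as follows. The sketch assumes the LSA step has already produced a dictionary of topic-word weights; the SVD itself is not shown, and the function and variable names are hypothetical.

```python
def sentence_value(order, num_sentences):
    """Eq. (7): earlier sentences receive a larger value."""
    return (num_sentences - order) / num_sentences


def rank1(nouns, order, num_sentences, word_freq):
    """Eq. (8): position value plus the summed frequency of the sentence nouns."""
    return sentence_value(order, num_sentences) + sum(word_freq.get(n, 0) for n in nouns)


def lsa_topics_weight(nouns, topic_weights):
    """Eq. (10): summed topic-word weight divided by the largest topic weight."""
    if not topic_weights:
        return 0.0
    total = sum(topic_weights.get(n, 0.0) for n in nouns)
    return total / max(topic_weights.values())


def rank2(nouns, order, num_sentences, word_freq, topic_weights):
    """Eq. (9): Rank1 plus the LSA topic weight of the sentence."""
    return rank1(nouns, order, num_sentences, word_freq) + lsa_topics_weight(nouns, topic_weights)


word_freq = {"law": 3, "malaysia": 1}
topic_weights = {"law": 0.5, "government": 0.25}  # assumed output of an LSA step
nouns = ["law", "malaysia"]

r1 = rank1(nouns, order=0, num_sentences=4, word_freq=word_freq)
r2 = rank2(nouns, order=0, num_sentences=4, word_freq=word_freq, topic_weights=topic_weights)
print(r1, r2)  # 5.0 6.0
```

For the first of four candidate sentences the position value is 1.0, the noun frequencies add 4, and the single topic word "law" contributes 0.5 / 0.5 = 1.0, so Rank1 is 5.0 and Rank2 is 6.0.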

4 Experiment and evaluation

We conduct several experiments to evaluate the effectiveness of the different parts of EDCOSUM. Moreover, a set of comparative experiments is organized to assess the proposed framework. This section provides the evaluation of EDCOSUM compared with state-of-the-art ATS systems.

4.1 Dataset

CNN/Daily Mail: CNN/Daily Mail is the most widely used benchmark dataset for single-document summarization. The dataset consists of online CNN and Daily Mail news articles and was originally built for question answering systems. We follow the version of the dataset [30] that replaces the anonymized entities with their actual values to create the non-anonymized version.

NEWSROOM [31]: This corpus, a dataset with 1.3 million news articles and human-written summaries, is the most recent large-scale dataset from different domains introduced to text summarization. We regard the variety of sources as a diversity of summarization styles.

4.2 Evaluation metrics

Following previous works [30, 33], we evaluate on CNN/Daily Mail and NEWSROOM with ROUGE, a widely used automatic evaluation tool. It calculates the n-gram word overlap between the reference summary and the system summary. ROUGE-1, ROUGE-2 and ROUGE-L are the most widely used measures in the literature. ROUGE-N calculates the N-gram overlap between the system and reference summaries, while ROUGE-L calculates their longest common subsequence. The final score reported for a summarization model is the average of the scores over all input documents.
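To make the metrics concrete, the sketch below computes ROUGE-N and ROUGE-L F-measures for a single reference and system summary. It is a simplified illustration, not the official ROUGE toolkit: there is no stemming or stop-word handling, ROUGE-L is computed over the whole token sequence rather than sentence by sentence, and all function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_measure(overlap, reference_total, system_total):
    """Harmonic mean of precision and recall."""
    if overlap == 0 or reference_total == 0 or system_total == 0:
        return 0.0
    recall = overlap / reference_total
    precision = overlap / system_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, system, n):
    """ROUGE-N F-measure from clipped n-gram overlap."""
    ref, hyp = ngrams(reference, n), ngrams(system, n)
    overlap = sum((ref & hyp).values())
    return f_measure(overlap, sum(ref.values()), sum(hyp.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, 1):
            current.append(previous[j - 1] + 1 if x == y else max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(reference, system):
    """ROUGE-L F-measure from the longest common subsequence."""
    return f_measure(lcs_length(reference, system), len(reference), len(system))


reference = "the law came into force".split()
system = "the law gives broad powers".split()
r1, r2, rl = rouge_n(reference, system, 1), rouge_n(reference, system, 2), rouge_l(reference, system)
print(round(r1, 2), round(r2, 2), round(rl, 2))  # 0.4 0.25 0.4
```

The two summaries share the unigrams "the" and "law" (2 of 5), the bigram "the law" (1 of 4) and a longest common subsequence of length 2, which yields the three scores shown.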

4.3 Comparison model

A set of classic and well-known summarization methods, namely lead-3, the Oracle method, Relevance Measure, LSA, TextRank, LexRank and a TextRank variation, is used to assess the proposed framework on CNN/Daily Mail and NEWSROOM. In addition, a set of state-of-the-art ATS systems has been selected for comparison: SumRunner, NEUSUM, REFRESH, VHTM, Exconsumm, SUMO and LATENT. These existing extractive summarization methods are selected because they obtain the best results in the literature on CNN/Daily Mail, which is the most common dataset for single-document summarization.

Lead-3: Lead-3, the baseline method, extracts the first three sentences of the document as the summary.

Oracle method [34]: The Oracle method, which represents the upper bound, extracts the three longest sentences of the document as the extractive summary.

Relevance Measure(RM) [35]: RM strives to select sentences by standard IR method and LSA.

TextRank [36]: TextRank builds a document graph from the content similarity among sentences [37].

LexRank [38]: LexRank makes full use of inter-sentence cosine similarity to build the connectivity graph.

LSA [39]: Text summarization based on LSA uses SVD to reduce the dimension of the sentence vectors and select the sentences to be included in the summary.

TextRank variation [40]: The combination of TextRank with a modern IR ranking function creates a robust method that connects edges with a new calculation method for automatic summarization.

SumRunner [32]: SumRunner is a recurrent neural network based on a seq2seq model for extractive summarization of documents.

REFRESH [15]: REFRESH explores the space of candidate summaries while learning to optimize a reward function which is relevant to the task.

Exconsumm [41]: Exconsumm obtains a summary by first extracting sentences from a document and then compressing them.

LATENT [42]: LATENT proposes a latent variable extractive model and views labels of sentences in a document as binary latent variables.

SUMO [43]: SUMO provides a new perspective on extractive summarization, conceptualizing it as a tree induction problem.

VHTM [44]: VHTM takes advantage of a joint model that combines hierarchical topic-aware inference and summarization in an end-to-end manner.

NEUSUM [45]: NEUSUM uses an end-to-end neural network framework to perform sentence scoring and sentence selection.

4.4 Result

In order to evaluate EDCOSUM fully, we designed two parallel experiments. The first one is conducted with CNN/Daily Mail. We further add the second experiment to assess the performance of the proposed method on NEWSROOM.

4.4.1 Experiment 1: CNN/Daily Mail

We examine the effect of coreference resolution, edge information and the ranking strategy on CNN/Daily Mail. Moreover, to evaluate the effectiveness of EDCOSUM, we compare EDCOSUM with state-of-the-art methods.

Fig. 6 illustrates the effectiveness of edge information. The experimental result demonstrates that adding edge information has a positive influence on extractive summarization. Fig. 7 shows the results of the ranking strategies and the importance of coreference resolution. As Fig. 7 shows, the result improves with coreference resolution compared to the method without it.

Fig. 6

The result of methods by adding edge information on CNN/Daily Mail.

Fig. 7

The result of methods by coreference resolution and LSA ranking strategy on CNN/Daily Mail.

According to the above experiments, we configure the proposed method with coreference resolution, edge information and the sentence ranking strategy to achieve the best performance. To evaluate the effectiveness of the proposed method, we compare its results with state-of-the-art text summarization methods. Table 2 presents the comparison among the various text extractive summarization methods, evaluated on CNN/Daily Mail by F-measure. In Table 2, EDCOSUM_LSA denotes the proposed method with coreference resolution, edge information in the graph and the LSA ranking strategy, while EDCOSUM denotes the proposed method with coreference resolution, edge information and the sentence ranking strategy. The best results in this set of experiments are in boldface.

Table 2

Comparison among the state-of-the-art extractive summarization methods for F-measure on CNN/Daily Mail

Models	ROUGE-1	ROUGE-2	ROUGE-L
Lead-3	40.34	17.62	36.57
Oracle [34]	52.59	31.24	48.87

RM [35]	35.50	13.00	17.30
TextRank [36]	38.30	14.50	19.60
LexRank [38]	38.50	14.00	20.70
LSA [39]	34.40	12.20	17.00
TextRank variation [40]	39.10	15.00	19.20
SumRunner [32]	39.60	16.20	35.20
Exconsumm [41]	41.70	18.60	37.80
REFRESH [15]	40.00	18.20	36.60
LATENT [42]	41.00	18.70	37.50
SUMO [43]	41.00	18.40	37.20
VHTM [44]	40.50	18.00	37.10
NEUSUM [45]	41.59	19.01	37.98
EDCOSUM_LSA	41.39	18.94	37.52
EDCOSUM	41.86	19.38	38.90

Overall, the empirical results show that the proposed summarization framework achieves the best performance in terms of ROUGE-1, ROUGE-2 and ROUGE-L on CNN/Daily Mail. Compared to the sentence-level graph methods (TextRank, LexRank, TextRank variation), EDCOSUM shows a rise of almost 9%. We attribute the advantage of the proposed framework over sentence-level graph methods to the word-level graph. Moreover, there is an improvement compared to SUMO, which suggests that the word-level graph is better than the tree structure. Compared to the simple machine learning methods (REFRESH, LATENT, NEUSUM, Exconsumm), the proposed method takes first place with an improvement of nearly 2%. We consider that coreference resolution has a positive impact on the proposed method. Therefore, the results indicate that word-level graph construction and coreference resolution have a positive influence on the proposed method. We attribute the improved result to the reduction of reference relations.

Notably, the best result is obtained by the method with coreference resolution and the sentence ranking strategy. Therefore, the sentence ranking strategy has a positive influence on text extractive summarization. Although the proposed method with the LSA ranking strategy scores lower than the one with the sentence ranking strategy, it is better than most other state-of-the-art approaches. We conjecture that the LSA ranking strategy overlaps with the statistical features, which lowers the summarization result.

In conclusion, the coreference resolution is beneficial for the text summarization based on word-level graph. Moreover, the ranking strategy positively influences extractive summarization results. Finally, adding edge information in graph is helpful for summarization.

4.4.2 Experiment 2: NEWSROOM

NEWSROOM is a large-scale summarization dataset. We try to evaluate the large-scale dataset by using EDCOSUM. Moreover, to investigate the importance of each part of EDCOSUM, we choose NEWSROOM dataset to conduct experiments.

Fig. 8 shows the comparison of the proposed method with and without edge information. The results for the ranking strategies and coreference resolution are depicted in Fig. 9. The experiments show that the proposed method with coreference resolution, edge information and the sentence ranking strategy performs best among the compared variants. This conclusion agrees with that of the above experiments.

Fig. 8

Results of the methods with and without edge information on NEWSROOM.

Fig. 9

Results of the methods with coreference resolution and the LSA ranking strategy on NEWSROOM.

In addition, we examine whether the proposed method can be applied to a large-scale dataset. On the basis of the above experiments, Table 3 compares state-of-the-art methods using the average F-measure values evaluated on NEWSROOM. As shown in Table 3, the proposed method still performs well on NEWSROOM: it outperforms Lead-3 and SummaRunner, although it remains below Exconsumm and the Oracle. We attribute the applicability of EDCOSUM to NEWSROOM to the word-level graph.

Table 3

Comparison between the lead-3 and Oracle on NEWSROOM

Method	ROUGE-1	ROUGE-2	ROUGE-L
Lead-3	53.1	49.0	52.4
Oracle [34]	68.1	64.5	67.3
SummaRunner [32]	48.96	44.33	49.57
Exconsumm [41]	68.4	62.9	67.3
EDCOSUM_LSA	53.24	48.23	56.78
EDCOSUM	54.54	49.24	57.89
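
To make the "average F-measure" reported in Table 3 concrete, the following is a minimal sketch of a ROUGE-N style score in Python. It is an illustration only, not the evaluation tool used in the experiments: it assumes lowercased whitespace tokenization with no stemming or stop-word handling, so its numbers will generally differ from those of the official ROUGE toolkit. The function names `ngrams`, `rouge_n` and `average_f` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F-measure) of n-gram overlap.

    Assumption: simple lowercased whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def average_f(pairs, n=1):
    """Average the F-measure over (candidate, reference) summary pairs."""
    scores = [rouge_n(c, r, n)[2] for c, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

For a dataset, `average_f` would be called with one (system summary, reference summary) pair per document; ROUGE-L, which is based on the longest common subsequence rather than n-gram counts, is not covered by this sketch.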

5 Conclusion

As a significant and popular application of graph theory, graph-based extractive text summarization underpins a great diversity of work. The main contribution of this paper is a word-level graph framework that addresses the generic single-document extractive summarization problem. In addition, coreference resolution is incorporated in EDCOSUM. Edge information is added to the word-level graph, whose elementary unit is a word. The vertices are combined with statistical features as word weights. Two ranking strategies are proposed in EDCOSUM.

The experiments on two datasets (CNN/Daily Mail and NEWSROOM) show that EDCOSUM, which combines coreference resolution and edge information, compares favorably with the state-of-the-art extractive summarization methods. The comparison experiments show that the method with coreference resolution, edge information and the sentence ranking strategy yields the best result. Although the proposed method with the LSA ranking strategy is not better than the one with the sentence ranking strategy, it still improves on most state-of-the-art methods. Additionally, the experiments show that the proposed method can be applied to a large-scale dataset. It may be possible to further improve the results by considering word-level features of a document. Both experiments demonstrate that the proposed method EDCOSUM is better than the baseline models.

Footnotes

Acknowledgments

This work is partially supported by the Joint Fund of Science & Technology Department of Liaoning Province and State Key Laboratory of Robotics, China (2020-KF-12-11), the Fundamental Research Funds for the Central Universities (N181706001, N2017009, N2017008, N182608003, N181703005), and the National Natural Science Foundation of China (61902057).

NeuralCoref:

Yake:

NLTK:

References

1. de Kunder, Daily estimated size of the world wide web. https://www.worldwidewebsize.com/.

2. Radev D.R., Hovy and McKeown, Introduction to the special issue on summarization, Computational Linguistics 28(4) (2002), 399–408.

3. Cao, Li and Wei, Improving multi-document summarization via text classification, In Proceedings of the AAAI Conference on Artificial Intelligence 31(1) (2017), 3053–3059.

4. Gupta and Lehal G.S., A survey of text summarization extractive techniques, Journal of Emerging Technologies in Web Intelligence 2(3) (2010), 258–268.

5. Gambhir and Gupta, Recent automatic text summarization techniques: a survey, Artificial Intelligence Review 47(1) (2017), 1–66.

6. Afsharizadeh, Ebrahimpour-Komleh and Bagheri, Query-oriented text summarization using sentence extraction technique, In 2018 4th international conference on web research (ICWR), pages 128–132. IEEE, (2018).

7. Al-Sabahi, Zhang, Long and Alwesabi, An enhanced latent semantic analysis approach for arabic document summarization, Arabian Journal for Science and Engineering 43(12) (2018), 8079–8094.

8. Mohamed and Oussalah, Srl-esa-textsum: A text summarization approach based on semantic role labeling and explicit semantic analysis, Information Processing & Management 56(4) (2019), 1356–1372.

9. Mihalcea, Graph-based ranking algorithms for sentence extraction, applied to text summarization, In Proceedings of the ACL interactive poster and demonstration sessions, (2004), 170–173.

10. Baralis, Cagliero, Mahoto and Fiori, Graphsum: Discovering correlations among multiple terms for graph-based summarization, Information Sciences 249 (2013), 96–109.

11. Dutta, Das A.K., Mallick, Sarkar and Das A.K., A graph based approach on extractive summarization, In Emerging Technologies in Data Mining and Information Security, 179–187. Springer, (2019).

12. De la Pena Sarracen G.L. and Rosso, Automatic text summarization based on betweenness centrality, In Proceedings of the 5th Spanish conference on information retrieval, (2018), 1–4.

13. Mallick, Das A.K., Dutta, Kumar Das and Sarkar, Graph-based text summarization using modified textrank, In Soft computing in data analytics, 137–146. Springer, (2019).

14. Paulus, Xiong and Socher, A deep reinforced model for abstractive summarization, In International Conference on Learning Representations, (2018).

15. Narayan, Cohen S.B. and Lapata, Ranking sentences for extractive summarization with reinforcement learning, In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1747–1759, (2018).

16. and Hu, Learning to extract coherent summary via deep reinforcement learning, In Thirty-Second AAAI Conference on Artificial Intelligence (2018), 5602–5609.

17. Dong, Shen, Crawford, van Hoof and Cheung J.C.K., Banditsum: Extractive summarization as a contextual bandit, In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (2018), 3739–3748.

18. Sakakibara and Misue, Building of a document classification tree by recursive optimization of keyword selection function, (1995). US Patent 5,463,773.

19. McCallum and Nigam, Text classification by bootstrapping with keywords, em and shrinkage, In Unsupervised Learning in Natural Language Processing, (1999).

20. Zhang, Zincir-Heywood and Milios, Narrative text classification for automatic key phrase extraction in web document corpora, In Proceedings of the 7th annual ACM international workshop on Web information and data management, (2005), 51–58.

21. Beil, Ester and Xu, Frequent term-based text clustering, In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, (2002), 436–442.

22. Steinbach, Karypis and Kumar, A comparison of document clustering techniques. (2000).

23. Tonella, Ricca, Pianta and Girardi, Using keyword extraction for web site clustering, In Fifth IEEE International Workshop on Web Site Evolution, 2003. Theme: Architecture. Proceedings., pages 41–48. IEEE, (2003).

24. Rose R.C. and Paul D.B., A hidden markov model based keyword recognition system, In International Conference on Acoustics, Speech, and Signal Processing, 129–132. IEEE, (1990).

25. Wilpon J.G., Rabiner L.R., Lee C.-H. and Goldman E.R., Automatic recognition of keywords in unconstrained speech using hidden markov models, IEEE Transactions on Acoustics, Speech, and Signal Processing 38(11) (1990), 1870–1878.

26. Liu Z.Y., Research on keyword extraction using document topical structure, New Technology of Library and Information Service 9 (2013), 30–34.

27. Alsharman N.M. and Pivkina I.V., Generating summaries through unigram and bigram: Text summarization, International Journal of Information Technology and Web Engineering (IJITWE) 15(1) (2020), 64–74.

28. Miller G.A., Wordnet: a lexical database for english, Communications of the ACM 38(11) (1995), 39–41.

29. Perc, The matthew effect in empirical data, Journal of The Royal Society Interface 11(98) (2014), 20140378.

30. See, Liu P.J. and Manning C.D., Get to the point: Summarization with pointer-generator networks, In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (2017), 1073–1083.

31. Grusky, Naaman and Artzi, Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies, In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), (2018), 708–719.

32. Nallapati, Zhai and Zhou, Summarunner: a recurrent neural network based sequence model for extractive summarization of documents, In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (2017), 3075–3081.

33. Chen Y.-C. and Bansal, Fast abstractive summarization with reinforce-selected sentence rewriting, In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (2018), 675–686.

34. Liu and Lapata, Text summarization with pretrained encoders, In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), (2019), 3730–3740.

35. Gong and Liu, Generic text summarization using relevance measure and latent semantic analysis, In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 19–25, (2001).

36. Mihalcea and Tarau, Textrank: Bringing order into text, In Proceedings of the 2004 conference on empirical methods in natural language processing, (2004), 404–411.

37. Page, Brin, Motwani and Winograd, The pagerank citation ranking: Bringing order to the web, Technical report, Stanford InfoLab, (1999).

38. Erkan and Radev D.R., Lexrank: Graph-based lexical centrality as salience in text summarization, Journal of Artificial Intelligence Research 22 (2004), 457–479.

39. Ozsoy M.G., Alpaslan F.N. and Cicekli, Text summarization using latent semantic analysis, Journal of Information Science 37(4) (2011), 405–417.

40. Barrios, Lopez, Argerich and Wachenchauzer, Variations of the similarity function of textrank for automated summarization, In Argentine Symposium on Artificial Intelligence (ASAI 2015)-JAIIO 44 (Rosario, 2015), (2015).

41. Mendes, Narayan, Miranda, Marinho, Martins A.F.T. and Cohen S.B., Jointly extracting and compressing documents with summary state representations, In Proceedings of NAACL-HLT, (2019), 3955–3966.

42. Zhang, Lapata, Wei and Zhou, Neural latent extractive document summarization, In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018), 779–784.

43. Liu, Titov and Lapata, Single document summarization as tree induction, In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), (2019), 1745–1755.

44. , Wang, Zhang, Wei and Yang, Document summarization with vhtm: Variational hierarchical topic-aware mechanism, In Proceedings of the AAAI Conference on Artificial Intelligence 34(5) (2020), 7740–7747.

45. Zhou, Yang, Wei, Huang, Zhou and Zhao, Neural document summarization by jointly learning to score and select sentences, In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 654–663, (2018).