Determining the importance of sentence position for automatic text summarization

Abstract

Automatic Extractive Summarization (AES) methods use features of the sentences of the original text to extract the most important information to be included in the summary. It is known that the first sentences of a text tend to be more relevant than the rest (taking them as the summary is the heuristic called baseline), so the position of the sentence (in reverse order) is commonly used to determine its relevance, which means that the last sentences have practically no possibility of being selected. In this paper, we present a way to soften the importance of sentences according to their position. Comprehensive tests were carried out on one of the best AES methods, using the bag of words and n-gram models with the DUC01 and DUC02 data sets, to determine the importance of sentences.

Keywords

Automatic text summarization, n-gram model, bag of words model, slope calculation, genetic algorithm

1 Introduction

Currently, information is growing exponentially, and so is the time needed to process it. Therefore, it is essential to have methods for Automatic Extractive Summarization (AES). The purpose of AES methods is to generate summaries as similar as possible to those written by humans. Summaries are presently used in many areas. They are employed to condense, for example, videos [1], newspapers [2–4], scientific papers [5], and social networks such as Twitter [6, 7] or blogs [8], where information changes rapidly and technologies are required to access real-time information in reduced form.

According to Ladda Saunmali [9], the purpose of a text summary is to present the most important information in a shorter version of the original text, maintaining its main content and helping the user to quickly understand a large volume of information. According to Alfonseca, Berker, Da Cunha Fanego, among others [9–17], summaries are classified by their condensation strategy into abstractive and extractive summaries. Abstractive summaries are generated from an understanding of the document and describe its content with words or sentences that are sometimes not in the original text. Extractive summaries, in contrast, are generated by selecting key phrases, sentences, or paragraphs considered essential to the original text, so they do not require understanding the document.

Among the methods proposed for AES are those that need a large number of language resources [18–23], so they are highly language-dependent or require sophisticated processes to generate a summary. There are also methods that use only the structure and distribution of the original text, so they are less dependent on language [2, 24–29]. Language-dependent methods may show better results than language-independent ones. However, research on language-independent methods has grown because they can be applied to a wide range of languages. In this study, only language-independent extractive methods are considered.

AES methods consider the structure and distribution of sentences to select the most important ones [2, 28]. The two most used features are the frequency of terms and the position of the sentence in the text.

The frequency of terms depends on the text model used, for example, bag of words or n-grams. The bag of words model is easy to build, since only the distinct words of the document are extracted. However, its terms (words) tend to lose their meaning. The n-gram model treats the document as a set of fixed-length sequences of terms, which allows it to preserve the meaning of its terms (n-grams).

According to state-of-the-art methods, the importance of a sentence can be determined from its position in the original text, so this feature is one of the most used in AES research. The usual hypothesis is that, in a text with n sentences, sentence i is weighted as i/n (where i is counted in reverse order). For example, in a small text with 50 sentences, the first sentence is worth 1, while the last sentence is worth 0.02. This gives the last sentence a very low chance of appearing in the summary, even when it has other important features such as high similarity with the title, great length, frequent terms, numerical data, or proper names, among others. Some papers already propose other ways to calculate the position weight, as in [28], which attempts to soften the importance of sentences according to position by using $\sqrt{1 / j}$, where j is the position of the sentence (without reversing the order).
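As a small numeric illustration of the two weightings just described, the following sketch computes the reverse-order weight and the softened weight $\sqrt{1 / j}$ for a text with 50 sentences. The function names are illustrative and are not taken from the cited works.

```python
import math


def reverse_linear_weight(position, n):
    """Weight of the sentence at 1-based `position` in a text of n sentences:
    its reverse-order index (n - position + 1) divided by n."""
    return (n - position + 1) / n


def sqrt_weight(position):
    """Softened weight sqrt(1 / j) for the sentence at 1-based position j."""
    return math.sqrt(1 / position)


if __name__ == "__main__":
    n = 50
    for j in (1, 25, 50):
        # Compare how quickly each weighting decays toward the end of the text.
        print(j, round(reverse_linear_weight(j, n), 3), round(sqrt_weight(j), 3))
```

With the linear weighting the last of 50 sentences receives 0.02, while the square-root weighting still leaves it roughly 0.14, which is the softening effect the text refers to.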

Recently, García-Hernández & Ledeneva [2] used a new formula to soften the relevance of sentences according to position: in a document with n sentences, sentence i has the weight m (i - x) + x, where x = 1 + (n - 1)/2 and m is the slope, with m ∈ [-1, 0]. The works [28] and [2] have reported some of the best results for AES on the DUC-2002 data set. However, both report results obtained on this data set in a single experiment. Since genetic algorithms are stochastic, more than one run on different data sets is needed to show that the results are robust and reliable. It is also known that the n-gram text model can help to obtain better results [30], so in this paper the method proposed in [2] is evaluated to determine the importance of sentences according to their position with two text models: bag of words and n-grams. In addition, the parameters of the genetic algorithm used in [2] are adjusted, and a way to determine the population size of the genetic algorithm from the size of the input document is proposed. Finally, different models proposed in the state of the art to calculate the importance of sentences are tested and compared against the model proposed in this paper.

2 Background

AES methods that use genetic algorithms rely on features of the text to determine the importance of sentences. Among the most used are:

Sentence position. According to [31], the relevant information in a document, regardless of its domain, tends to be found in certain sections such as titles, headings, the leading sentences of paragraphs, the opening paragraphs, and others. Many state-of-the-art methods calculate the importance of sentences under the hypothesis that the first sentences are more important than the others. Studies that use sentence position to construct summaries include [9–11, 32–41].

Title similarity. It is assumed that the title contains the main concepts of the document. So, the more words a sentence has in common with the title, the more it is assumed to be related to the main theme of the document. Studies that use title similarity to construct summaries include [9, 38–41].

Sentence length. According to [42], some studies have concluded that the shortest sentences of a document ought to be less likely to appear in the summary, so longer sentences are preferred over shorter ones. Studies that use sentence length to construct summaries include [9–11, 41].

Frequency of terms. It is important because terms that appear frequently in a document are related to the same topic. Many works consider the frequency of terms under the restriction that the summary is not repetitive, which is also called coverage. Studies that use the frequency of terms to construct summaries include [9, 40].

Reference/cue phrases. This feature is used to determine the importance of a word for the document. Sentences that contain key phrases with important definitions are assigned to the important class [43]. Studies that use reference/cue phrases to construct summaries include [11, 41].

Proper names. Usually, a sentence that contains more proper nouns is an important one and is most likely to be included in the summary. Studies that use proper names to construct summaries include [9, 41].

Numerical data. A sentence containing numerical data is considered important and is more likely to appear in the summary. Studies that use numerical data to construct summaries include [9, 40].

Centrality. The centrality of a sentence is its similarity to the other sentences [44]. This feature indicates the importance of a sentence for a document when the entire document content is considered. Studies that use centrality to construct summaries include [11, 41].

Similarity to a query. This feature is regularly used when a custom summary is generated. It gives greater importance to the words entered in a query made by the user. Studies that use similarity to a query to construct summaries include [10, 35].

Trigger words. These are words that are not related to the central topic of the document but indicate that the phrase may contain important information and should be part of the summary, such as “important”, “essential”, or “to conclude” [43]. One of the main drawbacks of trigger words is their dependence on both the domain and the language. Studies that use trigger words to construct summaries include [20, 39].

Cohesion. It is the degree of relationship among the sentences that make up the summary [13]; it determines the degree of coupling between sentences. Studies that use cohesion to construct summaries include [13, 41].

Similarity with snippets. The similarity between the sentence and given snippets is measured. Generally, the snippets are provided by the data set or by the user. Studies that use similarity with snippets to construct summaries include [13, 35].

Sentence-to-sentence similarity. It is used to compute the similarity between each sentence (S) and every other sentence. Studies that use sentence-to-sentence similarity to construct summaries include [9, 40].

Format. In this feature, the format of the text is considered. Studies that use the format to construct summaries include [20, 39].

In addition to the features mentioned above, there are others that are rarely used, such as sentiment and similarity with the first sentence [32]; word length, polysyllabic words, and occurrence of nouns [34]; pronoun, adjective, weekday/month, and quotation [35]; indicator of main concepts and occurrence of non-essential information [41]; conjunctions [36]; and synonym links [11]. Approximately 28 features are used in AES, of which the most used is sentence position.

3 Related work

The method proposed in [2] is one of those that have obtained the best results. It is based on a genetic algorithm and uses the bag of words model. The fitness function uses two main features, which are described below:

–Position of sentences. The first sentences of a text are considered more important and are therefore candidates to be part of the summary. One way to give more importance to the first sentences would be to assign the first sentence an importance of n, the second an importance of n - 1, and so on until the last one, which would have an importance of 1. However, this could be too drastic: in a text of 30 sentences, the first sentence would appear to be 30 times more important than the last, which could correspond to the conclusions and would have no possibility of appearing in the summary. It was therefore proposed to soften the importance of sentences using the general equation of a straight line with slope m. The slope indicates the importance given to the first or the last sentences. If it is negative, the first sentences are more important (–1 means that the line descends to the right at a 45-degree angle); zero means that all sentences have the same importance; and if it is positive, the last sentences are more important (1 means that the line rises to the right at a 45-degree angle).

For a text with n sentences, if sentence i is selected for the summary (that is, its gene in the chromosome is set, |C_i| = 1), then its relevance is defined as m (i - x) + x, where x = 1 + (n - 1)/2 and m is the slope to be discovered. To normalize the sentence position measure (δ), the relevance of the first k sentences is calculated, where k is the number of selected sentences.

The formula to calculate the sentence position measure is then as follows: $δ = \frac{\sum_{| C_{i} | = 1}^{n} (m (i - x) + x)}{\sum_{j = 1}^{k} (m (j - x) + x)}, \quad x = 1 + \frac{n - 1}{2} .$ (1)
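The position measure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the normalizer in the denominator is the summed relevance m (j - x) + x of the first k sentences, as the text describes, and the function names are hypothetical.

```python
def relevance(i, n, m):
    """Slope-based relevance m * (i - x) + x of the sentence at 1-based position i."""
    x = 1 + (n - 1) / 2
    return m * (i - x) + x


def position_measure(chromosome, m):
    """Sentence position measure (delta) of a binary chromosome for slope m.

    Assumption: the denominator sums the relevance of the first k sentences,
    where k is the number of selected sentences.
    """
    n = len(chromosome)
    selected = [i for i, bit in enumerate(chromosome, start=1) if bit == 1]
    k = len(selected)
    if k == 0:
        return 0.0
    numerator = sum(relevance(i, n, m) for i in selected)
    denominator = sum(relevance(j, n, m) for j in range(1, k + 1))
    return numerator / denominator


if __name__ == "__main__":
    # Sentences 1, 2, 4 and 5 of a 7-sentence text, with slope m = -0.5.
    print(position_measure([1, 1, 0, 1, 1, 0, 0], -0.5))
```

Under this reading, a summary made of exactly the first k sentences scores 1 for any slope, and with m = 0 every selection scores 1 because all sentences have the same relevance.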

–Frequency of terms. This feature evaluates whether the summary covers different ideas, that is, that it is not repetitive, while still containing important words (precision-recall). To measure both, the fitness function sums the frequencies of the words that appear in the summary, {word ∈ S}. To weigh how significant these words are, the same sum is computed over the original text, frequency (w, T); in this case only the most frequent words are counted, up to the maximum number of words m (not to be confused with the slope m of Equation 1). This weighting is the precision-recall measure: $β = \frac{\sum_{p \in \{word \in S\}}^{m} frequency (p, T)}{\sum_{q \in \{word \in T\}}^{m} frequency (q, T)}$ (2)

Finally, to obtain the value of the fitness function, the following formula is applied: $fitness = β * δ$ (3)
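The term-frequency measure and the final fitness can be sketched as follows. This is one possible reading of Equation 2, stated as an assumption: the numerator sums the document frequencies of the distinct terms in the summary, and the denominator sums the frequencies of the `max_words` most frequent terms of the document. All names are hypothetical.

```python
from collections import Counter


def term_frequency_measure(summary_terms, text_terms, max_words):
    """Beta of Equation 2 under the reading described above (an assumption)."""
    freq = Counter(text_terms)
    # Document frequency of every distinct term that appears in the summary.
    numerator = sum(freq[term] for term in set(summary_terms))
    # Frequency mass of the max_words most frequent terms of the document.
    denominator = sum(count for _, count in freq.most_common(max_words))
    return numerator / denominator if denominator else 0.0


def fitness(beta, delta):
    """Fitness of Equation 3: the product of the two measures."""
    return beta * delta


if __name__ == "__main__":
    text = ["a", "a", "a", "b", "b", "c"]
    summary = ["a", "c"]
    beta = term_frequency_measure(summary, text, max_words=2)
    print(beta, fitness(beta, 0.5))
```

Because the two measures are multiplied, a summary that scores poorly on either term frequency or position receives a low fitness even if the other measure is high.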

4 Proposed method

As shown in Section 2, different features can be implemented in an AES method. However, it has been shown that using more than two features does not guarantee better results [28, 45]. Therefore, this work builds on the method proposed by García-Hernández & Ledeneva (2013) [2], which has shown competitive results compared with other state-of-the-art works.

Next, we explain the proposed genetic algorithm and the modifications made to the method of [2] to perform an improvement based on the positional characteristic of the sentences.

Text model and preprocessing. Before the text is used in the genetic algorithm, it must be preprocessed to give it the appropriate format. The preprocessing consists of lexical analysis, stop word removal, and, finally, application of the well-known Porter stemmer. In the lexical analysis, symbols were cleaned and labels were assigned to numbers and emails to differentiate them from words. Stop words were removed to eliminate words that are not relevant and do not provide information. Finally, stemming was applied to normalize the text.

According to Romyna Montiel [46], text representation models are techniques based on the extraction of the terms of a text or document. Text modeling consists of selecting the terms to be extracted and converting them into a pattern that can be analyzed later. The difference between models is the type of term that is extracted from the document. This paper uses the bag of words and n-gram models. For the n-gram model, n = 2, 3, 4, and 5.
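The two text models can be illustrated with a short sketch. It assumes the text has already been preprocessed into a list of stemmed tokens and that n-grams are contiguous token sequences; whether n-grams are extracted per sentence or over the whole document is not stated here, so the functions simply take a token list. The names are illustrative.

```python
def bag_of_words(tokens):
    """Distinct terms of the token list, in order of first occurrence."""
    return list(dict.fromkeys(tokens))


def ngrams(tokens, n):
    """All contiguous sequences of n tokens (empty if there are fewer than n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = ["posit", "sentenc", "text", "sentenc"]
    print(bag_of_words(tokens))
    for n in (2, 3, 4, 5):
        print(n, ngrams(tokens, n))
```

The bag of words keeps each term once and discards order, while an n-gram keeps the local word order inside each term.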

Chromosome encoding. The representation of the individual depends on the problem to be solved. According to [9], there are two types of representation: floating point and bit string (0 and 1). For the problem of automatic summary generation, we use the binary representation. Each gene represents a sentence of the original text. If the bit is 1, the sentence is part of the summary; otherwise, the sentence has not been selected.

If bit = 1, the sentence is selected; if bit = 0, the sentence is not selected.

S1	S2	S3	S4	S5	S6	S7
1	1	0	1	1	0	0

Sentences 1, 2, 4, and 5 are selected.

Initial population. Once the representation of the individuals has been determined, the initial population is generated, considering some parameters of the AES task, in particular the number of words (l) that the summary should have.

Size of the initial population. The size of the initial population was determined according to the size of the input document. An individual has as many genes as the text has sentences. To automate the calculation of the size of the initial population, the number of individuals (I) in a generation was defined as double the number of sentences (S) (see Equation 4). $I = 2 \sum_{i = 1}^{n} S_{i}$ (4) Here each sentence $S_{i}$ counts once, so a document with n sentences yields I = 2n individuals.
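A minimal sketch of the population sizing rule follows. The rule "twice the number of sentences" comes from the text; how each random individual is built is not described, so the construction below (adding randomly chosen sentences until a minimum word count is reached) is an assumption, as are all names.

```python
import random


def population_size(num_sentences):
    """Number of individuals: double the number of sentences."""
    return 2 * num_sentences


def random_individual(sentence_lengths, min_words, rng):
    """Binary chromosome built by switching on random sentences until the
    summary reaches min_words (an assumed construction, not from the source)."""
    n = len(sentence_lengths)
    chromosome = [0] * n
    words = 0
    for i in rng.sample(range(n), n):  # sentence indices in random order
        if words >= min_words:
            break
        chromosome[i] = 1
        words += sentence_lengths[i]
    return chromosome


def initial_population(sentence_lengths, min_words, seed=0):
    rng = random.Random(seed)
    size = population_size(len(sentence_lengths))
    return [random_individual(sentence_lengths, min_words, rng) for _ in range(size)]


if __name__ == "__main__":
    lengths = [20, 30, 25, 15, 10, 40, 35]  # words per sentence (made-up values)
    population = initial_population(lengths, min_words=100)
    print(len(population), population[0])
```

Tying the population size to the document length gives longer documents, which have a larger search space, more individuals per generation.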

Fitness function. The fitness function implemented in this paper is based on the proposal of [2], which softens the importance of sentence position by means of the slope of a line (see Fig. 1).

Fig. 1

Graphical representation of the slope value of the line.

The slope values considered to calculate the importance of the sentences were m = -0.25, -0.3, -0.375, -0.45, -0.5, -0.55, -0.6, -0.625, -0.65, -0.7, -0.75, -0.8, -0.85, and -0.9. These values were chosen arbitrarily.

Parent selection. In this step, individuals of the population are selected according to their fitness. The type of selection implemented in this research is roulette-wheel selection, which gives the strongest individuals (those with a higher fitness value) a greater probability of being selected, and the weakest a lower one.
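Roulette-wheel selection can be sketched as follows. This is a generic textbook version, assuming non-negative fitness values; it is not taken from the implementation evaluated in this paper, and the names are illustrative.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Degenerate case: no fitness information, fall back to a uniform choice.
        return rng.choice(population)
    spin = rng.uniform(0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if spin < cumulative:
            return individual
    return population[-1]  # guard against floating-point rounding


if __name__ == "__main__":
    rng = random.Random(42)
    population = ["weak", "strong"]
    picks = [roulette_select(population, [1.0, 3.0], rng) for _ in range(1000)]
    print(picks.count("weak"), picks.count("strong"))
```

Weak individuals keep a small chance of being chosen, which helps preserve diversity in the population.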

Crossover. The crossover used is crossover with gene priority, based on the work of [2]. It is a crossover specialized for automatic summary generation: because this task usually must meet a given number of words, common crossover operators cannot be applied. The new chromosome is created by randomly choosing genes from both parents, considering only those with value 1. In this way, a gene that has value 1 in both parents has a higher probability of being selected for the child chromosome. Each time a gene is selected for the child chromosome, the minimum number of words for the summary is checked.
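One way to realize the crossover described above is sketched below. It is an interpretation under stated assumptions: candidate genes are pooled from both parents, so a gene set in both appears twice and is more likely to be drawn, and drawing stops once the child reaches the minimum word count. The function and parameter names are hypothetical.

```python
import random


def priority_crossover(parent_a, parent_b, sentence_lengths, min_words, rng):
    """Child chromosome built only from genes that are 1 in at least one parent."""
    # An index appears once per parent that selects it, so a gene set to 1 in
    # both parents is twice as likely to be drawn as a gene set in only one.
    pool = [i for parent in (parent_a, parent_b)
            for i, bit in enumerate(parent) if bit == 1]
    child = [0] * len(parent_a)
    words = 0
    while pool and words < min_words:
        i = rng.choice(pool)
        pool = [j for j in pool if j != i]  # remove every copy of the drawn gene
        child[i] = 1
        words += sentence_lengths[i]  # check the word count after each selection
    return child


if __name__ == "__main__":
    rng = random.Random(7)
    lengths = [20, 30, 25, 15, 10, 40, 35]  # words per sentence (made-up values)
    a = [1, 1, 0, 1, 0, 0, 0]
    b = [0, 1, 1, 0, 1, 0, 0]
    print(priority_crossover(a, b, lengths, min_words=60, rng=rng))
```

If the parents together do not contain enough words, this sketch returns a child below the limit; a full implementation would need a repair step for that case.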

Mutation. The mutation used is mutation by double inversion, based on the work of [2]. Like the crossover, this mutation takes into account the minimum number of words that the summary should have. It consists of applying the inversion operator twice to the child chromosome: the first time, only genes with value 1 are considered for inversion; the second time, only genes with value 0 are considered. After that, the number of words in the summary is checked. If it does not reach the number of words specified by the user, another gene with value 0 is inverted, and this process continues until the number of words specified by the user is satisfied.
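The double-inversion mutation can be sketched as follows. Details that the description leaves open are filled in as assumptions: each inversion flips exactly one randomly chosen gene, and the second inversion does not pick the gene that was just switched off. Names are hypothetical.

```python
import random


def double_inversion_mutation(chromosome, sentence_lengths, min_words, rng):
    """Return a mutated copy of a binary chromosome."""
    child = list(chromosome)
    ones = [i for i, bit in enumerate(child) if bit == 1]
    zeros = [i for i, bit in enumerate(child) if bit == 0]
    if ones:
        child[rng.choice(ones)] = 0   # first inversion: a gene with value 1
    if zeros:
        child[rng.choice(zeros)] = 1  # second inversion: a gene with value 0

    def word_count():
        return sum(n for n, bit in zip(sentence_lengths, child) if bit == 1)

    # Repair: keep inverting genes with value 0 until the word limit is met.
    while word_count() < min_words:
        remaining = [i for i, bit in enumerate(child) if bit == 0]
        if not remaining:
            break
        child[rng.choice(remaining)] = 1
    return child


if __name__ == "__main__":
    rng = random.Random(11)
    lengths = [20, 30, 25, 15, 10, 40, 35]  # words per sentence (made-up values)
    print(double_inversion_mutation([1, 1, 0, 1, 1, 0, 0], lengths, 60, rng))
```

Swapping one selected sentence for one unselected sentence keeps the summary length roughly stable, and the repair loop restores the word limit when the swap shortens the summary too much.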

Stop condition. The stop condition is the condition that must be met for the algorithm to stop evolving and present the best solution found. In this research, Equation 5, proposed in [45], is used: the genetic algorithm runs until it reaches the maximum number of generations for each document, which depends on its number of sentences (NS) and a base number (NG). $maxGenerations = \sqrt{4 * NG * NS}$ (5)
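Equation 5 can be evaluated directly, as in the sketch below. The value of NG is not given in the text, so the values used here are placeholders, and rounding the square root up to a whole number of generations is an assumption.

```python
import math


def max_generations(num_sentences, base_number):
    """Equation 5, rounded up to a whole number of generations (an assumption)."""
    return math.ceil(math.sqrt(4 * base_number * num_sentences))


if __name__ == "__main__":
    # NG = 25 is a placeholder, not a value reported in the paper.
    for ns in (10, 25, 50):
        print(ns, max_generations(ns, base_number=25))
```

Because of the square root, the number of generations grows more slowly than the document length: quadrupling the number of sentences only doubles the generation limit.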

5 Experimentation and results

This section describes the experiments and the results obtained with the proposed method. As mentioned, the method is based on a genetic algorithm. The results of genetic algorithms vary from one execution to the next, so in this investigation two executions are made for each experiment, and the average of these executions is presented. In this paper, we use the data sets provided by DUC.

5.1 Data sets

The experiments were performed with DUC01 and DUC02 data sets. A description of each of them is shown in Table 1.

Table 1
Description of the data sets

	DUC01	DUC02
Clusters	30	59
Documents	309	567
Task	Task 1	Task 1
Summary length	100 words	100 words

5.2 Evaluation procedure

For the evaluation, the ROUGE tool is used, an automatic system for the evaluation of summaries proposed by Lin [45], which measures the similarity of an automatic summary to one created by a human and thus its quality. Our evaluation uses the n-gram (1, 1) setting of ROUGE, which was found to have the highest correlation with human judgments, at a confidence level of 95%. ROUGE reports the f-measure, which is a balance (not an average) of recall and precision. The results are presented for the ROUGE-1 and ROUGE-2 metrics with summaries of 100 words.
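To make the metric concrete, the sketch below computes a simplified ROUGE-N f-measure from n-gram overlap. It is not the ROUGE tool: it assumes a single reference summary, pre-tokenized input, no stemming or stop word handling, and an equally weighted (harmonic) balance of precision and recall.

```python
from collections import Counter


def rouge_n_f(candidate_tokens, reference_tokens, n=1):
    """Simplified ROUGE-N f-measure against a single reference summary."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = grams(candidate_tokens), grams(reference_tokens)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    candidate = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()
    print(rouge_n_f(candidate, reference, n=1))
    print(rouge_n_f(candidate, reference, n=2))
```

ROUGE-1 counts matching words, while ROUGE-2 counts matching word pairs and is therefore more sensitive to word order.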

5.3 Text model

The text models used in the experiments are bag of words and n-grams with n = 2, 3, 4, and 5. The results obtained for each of them on the DUC01 and DUC02 data sets are described below. The f-measure is reported with ROUGE-1.

Table 2 shows the results for different values of the slope (m) for the bag of words and n-gram text models. For DUC01, the best result, 0.45253, is obtained with the bag of words model. Based on these results, it can be concluded that for this data set the first sentences are more important than the rest of the document, since the slope values for which the best results were obtained lie between –0.7 and –0.9.

Table 2
Results of f-measure for DUC01 with ROUGE-1

Value of m	Bag of words	n-grams
		n = 2	n = 3	n = 4	n = 5
–0.25	0.43008	0.44135	0.44156	0.44430	0.44155
–0.3	0.43608	0.44388	0.43936	0.44308	0.44255
–0.375	0.43615	0.44176	0.43974	0.44240	0.44287
–0.45	0.43719	0.44494	0.44137	0.44374	0.44366
–0.5	0.43721	0.44452	0.44542	0.44448	0.44161
–0.55	0.43821	0.44738	0.44549	0.44285	0.44313
–0.6	0.43826	0.44813	0.44464	0.44338	0.44217
–0.625	0.43901	0.44728	0.44501	0.44329	0.44265
–0.65	0.43961	0.44840	0.44548	0.44328	0.44351
–0.7	0.44067	0.45032	0.44555	0.44400	0.44241
–0.75	0.44211	0.44653	0.44506	0.44562	0.44310
–0.8	0.44474	0.44757	0.44354	0.44272	0.44860
–0.85	0.44481	0.44823	0.44393	0.44440	0.44256
–0.9	0.45253	0.45095	0.44415	0.44361	0.44307

Fig. 2 shows the best result of each text model. For DUC01, the bag of words and n-gram (n = 2) models obtain the best results. The trend line also shows that the larger the value of n, the lower the f-measure.

Fig. 2

Graph of the best result by text model with DUC01.

The same experiment was carried out for the DUC02 data set. The results are shown in Table 3. For DUC02, the best text model is bag of words, with an f-measure of 0.48183. Based on these results, it can be concluded that for this data set the first sentences are more important than the rest of the document, since the slope values for which the best results were obtained range from –0.75 to –0.9.

Table 3

Results of f-measure for DUC02 with ROUGE-1

Value of m	Bag of words	n-grams
		n = 2	n = 3	n = 4	n = 5
–0.25	0.47648	0.47101	0.47527	0.47677	0.47675
–0.3	0.47689	0.47246	0.47637	0.47641	0.47820
–0.375	0.47828	0.47669	0.47612	0.47663	0.47833
–0.45	0.47755	0.47681	0.47664	0.47704	0.47716
–0.5	0.47912	0.47804	0.47756	0.47601	0.47851
–0.55	0.47980	0.47693	0.47782	0.47659	0.47661
–0.6	0.47846	0.47721	0.47748	0.47683	0.47868
–0.625	0.47838	0.47556	0.47771	0.47612	0.47805
–0.65	0.47818	0.47670	0.47818	0.47717	0.47745
–0.7	0.47858	0.47772	0.47823	0.47641	0.47779
–0.75	0.48183	0.47853	0.47787	0.47628	0.47732
–0.8	0.48032	0.47894	0.47814	0.47721	0.47910
–0.85	0.48029	0.47731	0.47809	0.47786	0.47816
–0.9	0.47984	0.47844	0.47841	0.47767	0.47727

For DUC02, it is clearer that the best results are obtained with the bag of words and n-gram (n = 2) text models (see Fig. 3).

Fig. 3

Graph of the best result by text model with DUC02.

5.4 Comparison to state-of-the-art models

The proposed method was tested with the different models proposed in the state-of-the-art (MSA) to calculate the importance of the position of the sentences, described below.

MSA1. Proposed in [47], it uses a discrete function that gives the value one (1) if the sentence is the first in the text and zero (0) otherwise.

MSA2. Proposed in the works of [32] and [28]. $P = \sum_{\forall S_{i} \in Summary} \sqrt{\frac{1}{q_{i}}}$ (6) where $q_{i}$ indicates the position of the sentence $S_{i}$ in the document, and P is the result of the calculation over all sentences of the summary. In this equation, P has high values when the sentences in the summary are among the first sentences of the document, and low values when they are among the last.

MSA3. In the work of [33], two ways of calculating the importance of sentences are proposed (see Equations 7 and 8). $(N - i + 1) / N$ (7)

$\frac{1}{i}$ (8)

MSA4. The work of [35] assigns higher scores to terms in the first 4 paragraphs.

MSA5. The works of [11, 40] and [48] consider up to 5 positions from the top of the document. For instance, the first sentence in a paragraph has a score of 5/5, the second sentence has a score of 4/5, and so on (see Equation 9). $S = \begin{cases} 5/5 & \text{for the 1st sentence} \\ 4/5 & \text{for the 2nd sentence} \\ 3/5 & \text{for the 3rd sentence} \\ 2/5 & \text{for the 4th sentence} \\ 1/5 & \text{for the 5th sentence} \\ 0/5 & \text{for the other sentences} \end{cases}$ (9)

MSA6. In [27], three models are proposed to calculate the importance of sentences (see Equations 10, 11, and 12). The results shown in this work use Equation 12. In the first model, the sentence score is proportional to its closeness to the end of the document: $(S_{i}) = i$ (10)

where i is the sequential number of the sentence in the document. In the second model, as in [33], the sentence score is proportional to its closeness to the beginning of the document: $(S_{i}) = \frac{1}{i}$ (11)

Finally, in the third model, the sentence score is proportional to its closeness to the borders of the document: $(S_{i}) = \max \left( \frac{1}{i}, \frac{1}{n - i + 1} \right)$ (12)

where n is the total number of sentences.

MSA7. In [49], a specialized study is made for the DUC01 and DUC02 corpora to determine the best way to calculate the importance of the sentences in these data sets. Equation 13 is used for DUC01 and Equation 14 for DUC02. $f (i) = \frac{114 X - 233 X^{2}}{166 - 192 X (\sqrt{X + \frac{1}{103 \frac{1}{25} X})}}$ (13)

$\begin{matrix} f (i) = \frac{41.3294}{15.53 + X^{2} - 2 X + \frac{704.55}{X}} \\ + \frac{X^{2} + 4 X - 663.61}{- 2 X^{3} - 665.61 X - \frac{31.06}{X}} \end{matrix}$ (14)
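The simpler position models above (MSA1, MSA2, MSA3, MSA5, and the border model of MSA6, Equations 6 to 12) can be written as short functions for comparison. This is an illustrative sketch with hypothetical names; positions are 1-based, and for MSA5 the position is taken as given, whether it is counted within a paragraph or within the document.

```python
import math


def msa1(i):
    """1 for the first sentence of the text, 0 otherwise."""
    return 1.0 if i == 1 else 0.0


def msa2(summary_positions):
    """Equation 6: sum of sqrt(1 / q_i) over the positions of the summary sentences."""
    return sum(math.sqrt(1 / q) for q in summary_positions)


def msa3_linear(i, n):
    """Equation 7: (N - i + 1) / N."""
    return (n - i + 1) / n


def msa3_inverse(i):
    """Equation 8: 1 / i."""
    return 1 / i


def msa5(i):
    """Equation 9: 5/5, 4/5, ..., 1/5 for the first five positions, 0 afterwards."""
    return max(0, 6 - i) / 5


def msa6_borders(i, n):
    """Equation 12: closeness to either border of the document."""
    return max(1 / i, 1 / (n - i + 1))


if __name__ == "__main__":
    n = 10
    for i in (1, 5, 10):
        print(i, msa1(i), msa3_linear(i, n), msa3_inverse(i), msa5(i), msa6_borders(i, n))
```

All of these models fix the shape of the position curve in advance, whereas the slope-based model leaves one parameter free to be tuned per data set.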

In addition to the state-of-the-art models for determining the importance of sentences, there is a heuristic for the task of automatic summary generation called baseline:first. Baseline:first consists of taking the first n sentences of the document as the summary [23]. For state-of-the-art methods and systems, the goal is to outperform this heuristic. For news texts in particular, its score turns out to be very high, since this type of text contains the most important information at the beginning of the document; hence the importance of comparing against this heuristic.

Table 4 compares the different models proposed in the state of the art to determine the importance of sentences on the DUC01 corpus. For this data set, the proposed method, which tunes the value of the slope, obtains the best result. It is worth mentioning that the model proposed in MSA7 was built especially for DUC01, and even so the proposed method obtains better results. It can also be observed that three of the state-of-the-art models do not surpass the baseline:first heuristic.

Table 4

Comparison with different models to determine the importance of sentences with DUC01

Model	ROUGE-1	ROUGE-2
Proposed	0.45095	0.19763
MSA7	0.44940	0.19450
MSA3	0.44870	0.19358
MSA3	0.44654	0.18997
MSA4	0.44370	0.19279
MSA5	0.44309	0.19261
Baseline:first	0.44272	0.19701
MSA2	0.41280	0.14079
MSA1	0.41053	0.14078
MSA6	0.35803	0.09545

Table 5 compares the state-of-the-art models for calculating the position of sentences on the DUC02 data set. The proposed method ranks second for this data set. However, the model proposed in MSA7 was built especially for DUC02, and the difference between the proposed method and the first is small. For DUC02, only one model fails to surpass the baseline:first heuristic.

Table 5

Comparison with different models to determine the importance of sentences with DUC02

Model	ROUGE-1	ROUGE-2
MSA7	0.48470	0.22792
Proposed	0.48213	0.22333
MSA2	0.47877	0.22169
MSA3	0.47807	0.21983
MSA3	0.47788	0.21921
MSA6	0.47788	0.22208
MSA5	0.47773	0.22494
MSA4	0.47546	0.21889
Baseline:first	0.47294	0.22216
MSA1	0.44604	0.17432

5.5 Comparison to related works

The proposed method is compared to other approaches that have used the DUC01 and DUC02 collections and the ROUGE-1 and ROUGE-2 evaluations. These approaches are briefly described below:

GA-4feature [45] proposes four features: similarity with the title (δ), position of sentences (β), sentence length (γ), and coverage (α), and also calculates the weight that each one must have. This work, by Vázquez, is based on a genetic algorithm.

UnifiedRank [50] is a method that proposes a novel unified approach to simultaneous single-document and multi-document summarization, which uses a graph-based representation.

DE [51] is a summarization approach based on clustering sentences. It uses a discrete Differential Evolution algorithm to optimize the objective function, selecting representative sentences of each cluster. The summary sentences are selected under a recursive scheme that takes into account the degree of membership of each sentence in the corresponding group, measuring the centrality of each sentence to the group it belongs to, based on the Normalized Google Distance.

FEOM [52] proposes a Fuzzy Evolutionary Optimization Model. In this approach, sentences are categorized in terms of their content, and then the most important sentences are selected from each cluster. FEOM uses genetic algorithms to generate the solution vectors with the groups and applies three control parameters to regulate the probability of crossover and mutation of each solution.

NetSum [53] is an approach that uses the RankNet learning algorithm to train a pair-based sentence ranker, score every sentence in the document, and thus identify the most important sentences. This method performs automatic summarization based on neural networks.

CRF [54] proposes a framework that takes the output of previous methods as features and seamlessly integrates them. It treats the summarization task as a sequence labeling problem. The framework is based on Conditional Random Fields.

GA-2feature [2] proposes a genetic algorithm for extractive summarization that uses two sentence features: sentence position (slope-based linear equation) and term frequency (precision-recall).

It is worth mentioning that more state-of-the-art methods have been tested on the DUC01 and DUC02 data sets. However, they split sentences differently from the segmentation stipulated in the data sets, so they are not considered in this paper.

Below are the results corresponding to ROUGE 1 and ROUGE 2 obtained in each data set for the state-of-the-art methods and the proposed method. In Table 6, the results of data set DUC01 are presented and in Table 7, the results of data set DUC02.

Table 6
ROUGE-1 and ROUGE-2 F-measure scores of the related works on DUC01

Method	ROUGE-1	ROUGE-2
Proposed	0.45095	0.19763
GA-4feature	0.45058	0.19619
FEOM	0.47728	0.18549
DE	0.47856	0.18528
NetSum	0.46427	0.17697
UnifiedRank	0.45377	0.17649
CRF	0.45512	0.17327

Table 7

ROUGE-1 and ROUGE-2 F-measure scores of the related works on DUC02

Method	ROUGE-1	ROUGE-2
UnifiedRank	0.48478	0.21462
GA-4feature	0.48423	0.22471
Proposed	0.48213	0.22333
DE	0.46694	0.12368
FEOM	0.46575	0.1249
NetSum	0.44963	0.11167
CRF	0.44006	0.10924

Table 8 shows how often each method and system occupies each position according to the results obtained for each measure. The global ranking was calculated as proposed in [55] (Equation 15):

$Ran = \sum_{r=1}^{n} \frac{(n - r + 1) R_{r}}{n}$ (15)

Table 8

Global ranking estimated from the partial rankings obtained in each evaluation of ROUGE-1 and ROUGE-2 on the DUC01 and DUC02 data sets

Method	R(r)							Global Ranking
	1	2	3	4	5	6	7
UnifiedRank	1	0	1	0	1	1	0	2.4
GA-4feature	1	2	0	0	0	0	1	2.8
Proposed	1	1	1	0	0	1	0	2.8
DE	1	0	0	2	1	0	0	2.5
FEOM	0	1	1	1	1	0	0	2.5
NetSum	0	0	1	0	1	2	0	1.7
CRF	0	0	0	1	0	0	3	1

In Equation 15, n is the number of methods and systems involved in the comparison, and R_r is the number of times that the method or system occupies the r-th position.
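As an illustration, the global ranking can be recomputed from the scores in Tables 6 and 7. The following Python sketch is not the authors' implementation; the function names `position_counts` and `global_ranking` are hypothetical, and it assumes that a higher ROUGE score means a better position and that no two methods tie on a measure (which holds for the values reported here).

```python
# Scores copied from Tables 6 and 7: method -> (ROUGE-1, ROUGE-2).
DUC01 = {
    "Proposed": (0.45095, 0.19763),
    "GA-4feature": (0.45058, 0.19619),
    "FEOM": (0.47728, 0.18549),
    "DE": (0.47856, 0.18528),
    "NetSum": (0.46427, 0.17697),
    "UnifiedRank": (0.45377, 0.17649),
    "CRF": (0.45512, 0.17327),
}
DUC02 = {
    "UnifiedRank": (0.48478, 0.21462),
    "GA-4feature": (0.48423, 0.22471),
    "Proposed": (0.48213, 0.22333),
    "DE": (0.46694, 0.12368),
    "FEOM": (0.46575, 0.1249),
    "NetSum": (0.44963, 0.11167),
    "CRF": (0.44006, 0.10924),
}


def position_counts(tables):
    """Return {method: [R_1, ..., R_n]}: how often each method takes position r."""
    methods = list(tables[0])
    n = len(methods)
    counts = {m: [0] * n for m in methods}
    for table in tables:
        for measure in range(2):  # 0 = ROUGE-1, 1 = ROUGE-2
            # Best score first; assumes no ties within a measure.
            ordered = sorted(methods, key=lambda m: table[m][measure], reverse=True)
            for position, method in enumerate(ordered):
                counts[method][position] += 1
    return counts


def global_ranking(r_counts):
    """Equation 15: sum over r of (n - r + 1) * R_r / n."""
    n = len(r_counts)
    return sum((n - r + 1) * r_counts[r - 1] for r in range(1, n + 1)) / n


counts = position_counts([DUC01, DUC02])
ranking = {m: global_ranking(c) for m, c in counts.items()}
for method, score in sorted(ranking.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:12s} {counts[method]} {score:.3f}")
```

The position counts produced this way match the R(r) columns of Table 8. The unrounded values (for example, 20/7 for both the proposed method and GA-4feature) agree with the table if its last column is truncated to one decimal place.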

The ranking matrix allows us to determine which method is the best; in this case, the GA-4feature method and the proposed method are the ones with the best results.

6 Discussions and conclusions

In this paper, it was possible to determine the importance of sentence position for the English corpora DUC01 and DUC02, as well as the appropriate text model for AES on these two data sets.

For DUC01, the best result was obtained with the n-gram text model with n = 2 and a slope of –0.9. For DUC02, the best result was obtained with the bag-of-words text model and a slope of –0.75. For both data sets, the bag-of-words text model obtains good results, and the observed trend is that the larger the value of n (n-grams), the lower the quality of the summaries. The positional feature of sentences is one of the most widely used in AES because of its importance and its contribution to the state-of-the-art methods. In this paper, different models proposed in the state of the art were tested to calculate the importance of sentences, and it was determined that the model based on the slope of a line is the best one; it is independent of the language and the domain. Because the slope can be adjusted, this model can be implemented in any method and with any data set.

Additionally, the proposed method was compared with the state-of-the-art methods. It ranks first, together with the method proposed in [45].

Acknowledgments

This work was done with partial support from the Mexican Government (CONACyT and SNI). The authors thank the Autonomous University of the State of Mexico for its assistance.

References

1. Yahiaoui, Merialdo and Huet, Comparison of multiepisode video summarization algorithms, EURASIP Journal on Advances in Signal Processing 2003 (2003), 613895.

2. García-Hernández R.A. and Ledeneva, Single extractive text summarization based on a genetic algorithm, in: Springer, 2013, pp. 374–383.

3. Mihalcea and Tarau, A language independent algorithm for single and multiple document summarization, (2005).

4. Mao, Yang, Huang, Liu and Li, Extractive summarization using supervised and unsupervised learning, Expert Systems with Applications 133 (2019), 173–181.

5. Qazvinian, Radev D.R., Mohammad S.M., Dorr, Zajic, Whidby and Moon, Generating extractive summaries of scientific paradigms, Journal of Artificial Intelligence Research 46 (2013), 165–201.

6. Nichols, Mahmud and Drews, Summarizing sporting events using twitter, in: Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, ACM, 2012, pp. 189–198.

7. Yang, Cai, Tang, Zhang, Su and Li, Social context summarization, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2011, pp. 255–264.

8. Joshi, Fidalgo, Alegre and Fernández-Robles, SummCoder: An unsupervised framework for extractive text summarization based on deep auto-encoders, Expert Systems with Applications 129 (2019), 200–215.

9. Suanmali, Salim and Binwahlan M.S., Genetic Algorithm Based Sentence Extraction for Text Summarization, International Journal of Innovative Computing 1 (2011). http://ijic.fc.utm.my/index.php/ijic/article/view/6 (accessed March 20, 2018).

10. Alfonseca and Rodríguez, Generating extracts with genetic algorithms, in: Springer, 2003, pp. 511–519.

11. Berker, Using genetic algorithms with lexical chains for automatic text summarization, (2011).

12. da Cunha Fanego, Hacia un modelo lingüístico de resumen automático de artículos médicos en español, Proyecto de Investigación, Universidad Pompeu Fabra, Instituto Universitario de Lingüística Aplicada, Doctorado En Ciencias Del Lenguaje y Lingüística Aplicada.<Http://Www.Upf.Edu/Pdi/Iula/Iria. Dacunha/# 0. 202 (2005) 07–04.

13. Qazvinian, Hassanabadi L.S. and Halavati, Summarising text with a genetic algorithm-based sentence extraction, International Journal of Knowledge Management Studies 2 (2008), 426–444.

14. Ledeneva, Automatic Language-Independent Detection of Multiword Descriptions for Text Summarization, National Polytechnic Institute, (2008).

15. Montiel, Generación automática de resúmenes mediante aprendizaje no supervisado, Instituto Tecnológico de Toluca, (2009).

16. Plaza, Uso de grafos semánticos en la generación automática de resúmenes y estudio de su aplicación en distintos dominios: biomedicina, periodismo y turismo, Universidad Complutense de Madrid, (2011).

17. García-Hernández R.A., Ledeneva, Mendoza G.M., Dominguez Á.H., Chavez, Gelbukh and Fabela J.L.T., Comparing commercial tools and state-of-the-art methods for generating text summaries, in: Artificial Intelligence, 2009. MICAI 2009. Eighth Mexican International Conference On, IEEE, 2009, pp. 92–96.

18. Miranda-Jiménez, Gelbukh and Sidorov, Summarizing conceptual graphs for automatic summarization task, in: International Conference on Conceptual Structures, Springer, 2013, pp. 245–253.

19. Lloret and Palomar, COMPENDIUM: Una herramienta de generación de resúmenes modular, Procesamiento Del Lenguaje Natural. (2011).

20. Mateo P.L., González J.C., Villena and Martínez J.L., Un sistema para resumen automático de textos en castellano, Procesamiento Del Lenguaje Natural 31 (2003), 29–36.

21. Bhargava, Sharma and Sharma, Atssi: Abstractive text summarization using sentiment infusion, Procedia Computer Science 89 (2016), 404–411.

22. Bing, Li, Liao, Lam, Guo and Passonneau R.J., Abstractive multi-document summarization via phrase selection and merging, ArXiv Preprint ArXiv:1506.01597. (2015).

23. Genest P.-E. and Lapalme, Framework for abstractive summarization using text-to-text generation, in: Proceedings of the Workshop on Monolingual Text-To-Text Generation, Association for Computational Linguistics, 2011, pp. 64–73.

24. Ledeneva, García-Hernández R.A. and Gelbukh, Graph ranking on maximal frequent sequences for single extractive text summarization, in: Springer, 2014, pp. 466–480.

25. Ledeneva, Hernández, Soto, Reyes and Gelbukh, EM clustering algorithm for automatic text summarization, Advances in Artificial Intelligence (2011), 305–315.

26. Ledeneva, Gelbukh and García-Hernández R.A., Terms derived from frequent sequences for extractive text summarization, in: Springer, 2008, pp. 593–604.

27. Last and Litvak, Language-independent Techniques for Automated Text Summarization, (2010), 207–237.

28. Mendoza, Bonilla, Noguera, Cobos and León, Extractive single-document summarization based on genetic operators and guided local search, Expert Systems with Applications 41 (2014), 4158–4169.

29. Saggion, Using SUMMA for Language Independent Summarization at TAC 2011, in: (2011).

30. Montiel Soto, Ledeneva, García-Hernández R.A. and Cruz Reyes, Comparación de tres modelos de texto para la generación automática de resúmenes, Procesamiento Del Lenguaje Natural. (2009).

31. Lin C.-Y. and Hovy, Identifying topics by position, in: Fifth Conference on Applied Natural Language Processing, (1997).

32. Bossard, Généreux and Poibeau, Description of the LIPN System at TAC 2008: Summarizing Information and Opinions, in: 2008, pp. 282–291.

33. Ouyang, Li, Lu and Zhang, A study on position information in document summarization, in: Association for Computational Linguistics, 2010, pp. 919–927.

34. Nandhini and Balasundaram S.R., Extracting easy to understand summary using differential evolution algorithm, Swarm and Evolutionary Computation 16 (2014), 19–27.

35. Lin C.-Y., Training a selection function for extraction, in: ACM, 1999, pp. 55–62.

36. Hirao, Isozaki, Maeda and Matsumoto, Extracting important sentences with support vector machines, in: Association for Computational Linguistics, 2002, pp. 1–7.

37. Katragadda, Pingali and Varma, Sentence position revisited: a robust light-weight update summarization 'baseline' algorithm, in: Association for Computational Linguistics, 2009, pp. 46–52.

38. Uddin M.N. and Khan S.A., A study on text summarization techniques and implement few of them for Bangla language, in: Computer and Information Technology, 2007. ICCIT 2007, 10th International Conference On, IEEE, 2007, pp. 1–4.

39. Orǎsan, An evolutionary approach for improving the quality of automatic summaries, in: Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering-Volume 12, Association for Computational Linguistics, 2003, pp. 37–45.

40. Babar and Patil P.D., Improving Performance of Text Summarization, Procedia Computer Science 46 (2015), 354–363.

41. Kiyoumarsi, Evaluation Of Automatic Text Summarizations Based On Human Summaries, Procedia-Social and Behavioral Sciences 192 (2015), 83–91.

42. Kupiec, Pedersen and Chen, A trainable document summarizer, in: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 1995, pp. 68–73.

43. Edmundson H.P., New methods in automatic extracting, Journal of the ACM (JACM) 16 (1969), 264–285.

44. Erkan and Radev D.R., Lexrank: Graph-based lexical centrality as salience in text summarization, Journal of Artificial Intelligence Research 22 (2004), 457–479.

45. Vazquez Vazquez, García-Hernández R.A. and Ledeneva, Sentence Features Relevance for Extractive Text Summarization using Genetic Algorithms, Journal of Intelligent & Fuzzy Systems Applications in Engineering and Technology 35 (2018), 353–365.

46. García-Hernández, Montiel, Ledeneva, Rendón, Gelbukh and Cruz, Text summarization by sentence extraction using unsupervised learning, MICAI 2008: Advances in Artificial Intelligence (2008), 133–143.

47. Belkebir and Guessoum, A supervised approach to arabic text summarization using adaboost, in: New Contributions in Information Systems and Technologies, Springer, 2015, pp. 227–236.

48. Suanmali, Salim and Binwahlan M.S., Genetic algorithm based sentence extraction for text summarization, International Journal of Innovative Computing 1 (2011).

49. Vazquez Vazquez, García-Hernández and Ledeneva, Learning Relevant Models using Symbolic Regression for Automatic Text Summarization, Computation and System 1 (2019).

50. Wan, Towards a unified approach to simultaneous single-document and multi-document summarizations, in: Association for Computational Linguistics, 2010, pp. 1137–1145.

51. Aliguliyev R.M., A new sentence similarity measure and sentence based extractive technique for automatic text summarization, Expert Systems with Applications 36 (2009), 7764–7772.

52. Song, Choi L.C., Park S.C. and Ding X.F., Fuzzy evolutionary optimization modeling and its applications to unsupervised categorization and extractive summarization, Expert Systems with Applications 38 (2011), 9112–9121.

53. Svore, Vanderwende and Burges, Enhancing single-document summarization by combining RankNet and third-party sources, in: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), (2007).

54. Shen, Sun J.-T., Li, Yang and Chen, Document Summarization Using Conditional Random Fields, in: IJCAI, 2007, pp. 2862–2867.

55. Aliguliyev R.M., Performance evaluation of density-based clustering methods, Information Sciences 179 (2009), 3583–3602.
