Multi-document summarization based on sentence cluster using non-negative matrix factorization

Abstract

Multi-document summarization aims to produce a concise summary that contains salient information from a set of source documents. Many approaches use statistics and machine learning techniques to extract sentences from documents. In this paper, we propose a new multi-document summarization framework based on sentence clustering using Nonnegative Matrix Tri-Factorization (NMTF). The proposed framework employs NMTF to cluster sentences using inter-type relationships among documents, sentences and terms, and incorporates the intra-type information through manifold regularization. The most informative sentences are selected from each sentence cluster to form the summary. When evaluated on the DUC2004 and TAC2008 datasets, the performance of the proposed framework is comparable with that of the top three systems.

Keywords

Multi-document summarization; sentence clustering; cluster-based ranking; non-negative matrix tri-factorization; manifold ranking

1 Introduction

The exponential growth in the volume of documents available on the Internet makes it increasingly unlikely that a single document can meet a user’s complex information need. To address this problem, multi-document summarization, which reduces the size of documents while preserving their important semantic content, is in high demand. Most summarization work to date follows the sentence extraction framework, which ranks sentences according to various pre-specified criteria and selects the most salient sentences from the original documents to form summaries [1].

Though traditional feature-based ranking approaches and graph-based approaches employ quite different techniques to rank sentences, they have at least one point in common: all of them focus on sentences only and ignore the information beyond the sentence level. In a given document set, there usually exist a number of themes (or topics), each represented by a cluster of highly related sentences [2]. These theme clusters differ in size and, more importantly, in how much they help users understand the content of the whole document set. The cluster-level information is therefore expected to influence sentence ranking.

In order to enhance the performance of summarization, cluster-based ranking approaches have been explored in the literature [3]. Normally these approaches apply a clustering algorithm to obtain the theme clusters first and then rank the sentences within each cluster or by exploring the interaction between sentences and obtained clusters. So the sentence ranking performance is inevitably influenced by the sentence clustering result.

The key part in sentence clustering is to estimate the similarity between two sentences [4]. Many similarity measures traditionally used for document clustering cannot be directly applied to sentence clustering. The solutions that rely on term overlaps can be effective when dealing with documents because documents about the same topic may share many terms. However, sentences with very similar meanings do not necessarily share enough terms, owing to their short length and the limited content that they contain. As a result, similarity measures based on term overlaps alone fail to perform well in sentence clustering. To help alleviate this problem, Nonnegative Matrix Factorization (NMF), which utilizes the relationships between sentences and terms or between sentences and documents, has been used in sentence clustering [5]. However, there also exist relationships between sentences and sentences, terms and terms, documents and documents, documents and terms, and documents and sentences.

In this paper, we propose a new multi-document summarization framework based on sentence clustering using Nonnegative Matrix Tri-Factorization (NMTF). One benefit of the framework is that the inter-type relationships among three types of text objects, i.e., documents, sentences and terms, can be fully utilized. Moreover, the intra-type information between documents and documents, sentences and sentences, and terms and terms can also be incorporated through manifold regularization. The sentence clustering performance can thus be further enhanced via these relationships.

The main contributions of this paper are threefold: (1) a Nonnegative Matrix Tri-Factorization based sentence clustering framework is developed; (2) two sentence clustering approaches are developed, which allow terms and/or documents to play an explicit role in sentence clustering as independent text objects so that sentence clustering can benefit from the inter-relationships and intra-relationships among them; and (3) thorough experimental studies are conducted to verify the effectiveness of the proposed framework.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents Nonnegative Matrix Tri-Factorization based sentence clustering framework that enhances sentence clustering using inter-type relationships among documents, sentences and terms, and incorporates the intra-type information through manifold regularization. Section 4 addresses other issues in theme-based summarization. Section 5 addresses experiments and evaluations. Conclusions are presented in Section 6.

2 Related work

A variety of summarization approaches have been proposed in the literature. These approaches are either extractive or abstractive. Extractive summarization assigns a significance score to each sentence and extracts the sentences with the highest scores to form the summaries. Abstractive summarization, on the other hand, involves a certain degree of understanding of the content conveyed in the original documents and creates the summaries based on information fusion and/or language generation techniques [6]. Like most researchers in this field, we follow the extractive summarization framework [7] in this work.

Under the framework of extractive summarization, sentence ranking [8] is the issue of most concern. Beyond that, when the given documents are all supposed to be about the same topic, they are very likely to repeat some important information in different documents or in different places in the same document. Therefore, effectively clustering the sentences with the same or very similar content is necessary. Sentence clustering has recently been applied successfully in cluster-based summarization [2, 3]. These cluster-based summarization approaches utilize the clustering results to select the representative sentences in order to generate summaries. Alternatively, the clustering results can be used to improve or refine the sentence ranking results. Wan and Yang [9] proposed two models to incorporate the cluster-level information into the process of sentence ranking for generic multi-document summarization, namely ClusterCMRW and Cluster-HITS. The ClusterCMRW (Cluster-based Conditional Markov Random Walk) model incorporates the cluster-level information into the text graph and manipulates clusters and sentences equally, while the Cluster-HITS model treats clusters and sentences as hubs and authorities in the HITS algorithm. Zheng et al. [10] proposed a novel approach for multi-document summarization based on sentence clustering, which detects redundancy when selecting representative sentences. Zhang et al. [11] proposed a density peaks sentence clustering approach for multi-document summarization, which measures representativeness, diversity and conciseness at the same time.

Steinberger et al. [12] proposed an LSA-based multi-document summarization approach. Li et al. [13] extended generic multi-document summarization using LSA to query-based document summarization. Zha [14] proposed generic summarization using sentence clustering and the mutual reinforcement principle (MRP). This method clusters the sentences of documents into several topical groups. Sentences are extracted from each topical group by their saliency scores, which are computed using the MRP, a modified LSA method. This method guarantees that the elements of a singular vector with respect to semantic feature values will be only positive values, even though the semantic features do not necessarily identify subtopics. Yeh et al. [15] proposed a document summarization method using LSA and a text relationship map (TRM). Their method uses LSA to derive the semantic matrix of a document and uses the semantic representation of a sentence to construct a semantic text relationship map. The TRM is constructed using the similarity between semantic representations, and important sentences are extracted by using the number of links in the TRM. This method does not consider subtopics for document summarization.

Wang et al. [13] proposed a language model to simultaneously cluster and summarize documents. Nonnegative factorization is performed on the term-document matrix using the term-sentence matrix as the base so that the document-topic and sentence-topic matrices can be constructed, from which the document clusters and the corresponding summary sentences are generated simultaneously. Lee et al. [5] proposed an unsupervised method using Non-negative Matrix Factorization (NMF) to select sentences for automatic generic document summarization. Park et al. [16] used relevance feedback (RF) and NMF to distill the contents of the documents with respect to a given query in order to generate a document summary. The approach can reduce the semantic gap between the low-level feature representation in the vector model and the user’s high-level perception by means of iterative relevance feedback. Shen et al. [17] proposed Bi-mixture PLSA with sentence bases to simultaneously cluster and summarize the documents, utilizing the mutual influence of the document clustering and summarization procedures. Tan et al. [1] proposed a joint matrix factorization and manifold-ranking framework for topic-focused multi-document summarization, which aims at learning better sentence similarity scores and better sentence ranking scores simultaneously.

These approaches utilize either topic information together with information about the sentences themselves, or the inter-relationships between sentences and terms and between sentences and documents, while ignoring the intra-relationships among them. We propose a Nonnegative Matrix Tri-Factorization based sentence clustering framework, which uses not only the inter-relationships between sentences and terms, sentences and documents, and terms and documents, but also the intra-relationships among documents, sentences and terms.

3 Non-negative matrix tri-factorization based sentence clustering framework

3.1 Problem formulation

First, we introduce the document-sentence-term tri-type graph model for a given set of documents, on which the sentence clustering framework is built. Let G = <V, E, R, W>, where V is the set of vertices consisting of the document set D = {d₁, d₂, …, d_b}, the sentence set S = {s₁, s₂, …, s_n} and the term set T = {t₁, t₂, …, t_m}, i.e., V = D ∪ S ∪ T, where b is the number of documents, n is the number of sentences and m is the number of terms. Each term vertex is represented by the sentence that WordNet gives as the description of the term; the first sense listed in WordNet is used instead of the term itself 1 . E is the set of edges that connect the vertices in V, i.e., E = {<v_i, v_j> | v_i, v_j ∈ V}. R is the set of inter-type relationship matrices $R_{ST} \in R^{n \times m}$ , $R_{SD} \in R^{n \times b}$ and $R_{DT} \in R^{b \times m}$ , in which R_ST (i, j) is the cosine similarity between the sentence s_i and the term t_j, R_SD (i, j) is the cosine similarity between the sentence s_i and the document d_j, and R_DT (i, j) is the cosine similarity between the document d_i and the term t_j. Moreover, $R_{ST} = R_{TS}^{T}$ , $R_{SD} = R_{DS}^{T}$ and $R_{DT} = R_{TD}^{T}$ , as the relationships between sentences and terms, sentences and documents, and documents and terms are symmetric. W is the set of intra-type relationship matrices $W_{SS} \in R^{n \times n}$ , $W_{DD} \in R^{b \times b}$ and $W_{TT} \in R^{m \times m}$ , in which W_SS (i, j) is the cosine similarity between the sentences s_i and s_j, W_DD (i, j) is the cosine similarity between the documents d_i and d_j, and W_TT (i, j) is the cosine similarity between the terms t_i and t_j. Figure 1 shows the tri-type document graph G, in which solid lines illustrate the inter-type relationships and dashed lines illustrate the intra-type relationships.
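As an illustration of how the six relationship matrices might be assembled, the following minimal sketch computes pairwise cosine similarities between row vectors. The toy bag-of-words vectors, the shared five-word vocabulary and the helper name `cosine_matrix` are assumptions made for this example only; in particular, the term vectors merely stand in for the WordNet descriptions of the terms.

```python
import numpy as np

def cosine_matrix(A, B):
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    na = np.linalg.norm(A, axis=1, keepdims=True)
    nb = np.linalg.norm(B, axis=1, keepdims=True)
    # Guard against all-zero rows so that the division is always defined.
    na[na == 0] = 1.0
    nb[nb == 0] = 1.0
    return (A / na) @ (B / nb).T

# Hypothetical toy data: n = 4 sentences, b = 2 documents, m = 3 terms,
# all represented as count vectors over a shared 5-word vocabulary.
S_vec = np.array([[1, 1, 0, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 0, 1, 1, 0],
                  [0, 0, 0, 1, 1]])
# Each document is taken to be the sum of its sentences (an assumption).
D_vec = np.array([S_vec[0] + S_vec[1], S_vec[2] + S_vec[3]])
# Stand-ins for the vectors of the WordNet descriptions of three terms.
T_vec = np.array([[1, 0, 0, 0, 0],
                  [0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 1]])

# Inter-type relationship matrices.
R_ST = cosine_matrix(S_vec, T_vec)  # n x m
R_SD = cosine_matrix(S_vec, D_vec)  # n x b
R_DT = cosine_matrix(D_vec, T_vec)  # b x m
# Intra-type relationship matrices.
W_SS = cosine_matrix(S_vec, S_vec)  # n x n
W_DD = cosine_matrix(D_vec, D_vec)  # b x b
W_TT = cosine_matrix(T_vec, T_vec)  # m x m
```

All entries are nonnegative because the input vectors are nonnegative, which is what the nonnegative factorizations in the following sections require.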

Fig.1

The tri-type graph is composed of documents, sentences and terms, involving (1) inter-type relationships (solid lines): the relations between objects of different types; and (2) intra-type relationships (dashed lines): the relations between different text objects of the same type.

3.2 Sentence clustering using two types of text objects

3.2.1 Using inter-type relationships

Let us first consider clustering sentences via the relationships between sentences and terms. The close connection between Nonnegative Matrix Factorization (NMF) and clustering provides the theoretical basis for developing the method. Ding et al. proposed to use NMTF [19] to simultaneously cluster the rows and columns of an input nonnegative relationship matrix by decomposing it into three nonnegative factor matrices. We apply it to the R_ST matrix by minimizing the following objective:

$\begin{aligned} \min\ J_{1} = & \| R_{ST} - G_{S} X_{ST} G_{T}^{T} \|^{2}, \\ \text{s.t.}\ & G_{S} \geq 0,\ G_{T} \geq 0,\ X_{ST} \geq 0 \end{aligned}$ (1)

where || · || denotes the Frobenius norm of a matrix, $G_{S} \in R_{+}^{n \times c_{S}}$ and $G_{T} \in R_{+}^{m \times c_{T}}$ are the cluster indicator matrices for S and T respectively, and c_S and c_T are the cluster numbers of S and T respectively. $X_{ST} \in R_{+}^{c_{S} \times c_{T}}$ absorbs the different scales of R_ST, G_S and G_T.

Simultaneous clustering on S and T is then achieved by solving Equation (1). As the rows of G_S (with normalization) can be interpreted as the posterior probability for clustering on S [19], the cluster label of s_i is obtained by

$l (s_{i}) = \underset{j}{\arg\max}\ G_{S} (i, j)$ (2)
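The label assignment can be sketched in a few lines: each row of G_S is normalized to sum to one and the cluster index with the largest value in that row is taken as the label. The function name `cluster_labels` and the toy matrix are hypothetical, and labels are 0-based here.

```python
import numpy as np

def cluster_labels(G):
    """Return, for every row of G, the index of its largest normalized entry."""
    G = np.asarray(G, dtype=float)
    row_sums = G.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0      # leave all-zero rows untouched
    P = G / row_sums                   # rows can be read as cluster posteriors
    return P.argmax(axis=1)            # arg max over the cluster index j

# Hypothetical indicator matrix for 3 sentences and 2 clusters.
G_S = np.array([[0.9, 0.1],
                [0.2, 0.6],
                [0.0, 0.3]])
labels = cluster_labels(G_S)
```

Row normalization does not change which entry of a row is largest, so it matters only when the normalized values themselves are to be interpreted as probabilities.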

Let us move to clustering sentences using inter-type relationship between sentences and documents. We replace R_ST with R_SD. Likewise, we use the equation similar to Equation (1) to solve the problem.

3.2.2 Incorporating intra-type relationships via graph regularization

The optimization objective in Equation (1) only involves the inter-type relationships of two types of text objects, whereas the intra-type information, which is deemed important in sentence clustering [20], has not been used. In this section, we incorporate the intra-type information through Laplacian regularization. For the two types of text objects, i.e., sentences and terms, given the intra-type information in the form of the pairwise affinity matrices W_SS and W_TT, we can incorporate them into Equation (1) as follows:

$\begin{aligned} \min\ J_{2} = & \| R_{ST} - G_{S} X_{ST} G_{T}^{T} \|^{2} + λ [tr (G_{S}^{T} L_{S} G_{S}) + tr (G_{T}^{T} L_{T} G_{T})] \\ \text{s.t.}\ & G_{S} \geq 0,\ G_{T} \geq 0,\ X_{ST} \geq 0 \end{aligned}$ (3)

where D_S and D_T are diagonal degree matrices with D_S (i, i) = ∑_j W_SS (i, j) and D_T (i, i) = ∑_j W_TT (i, j), and L_S = D_S - W_SS and L_T = D_T - W_TT are the corresponding graph Laplacians. Because L_S and L_T are discrete approximations of the Laplace-Beltrami operator on the underlying data manifold [21], the last two terms reflect the label smoothness of the two types of objects. The smoother the data labels are with respect to the underlying data manifolds, the smaller their values will be.
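To make the smoothness interpretation concrete, the sketch below builds the graph Laplacian L = D - W and evaluates tr(Gᵀ L G) for two candidate indicator matrices. It also checks the standard identity tr(Gᵀ L G) = ½ ∑_ij W (i, j) ||g_i - g_j||², which holds for symmetric W, where g_i is the i-th row of G. The affinity values and function names are illustrative assumptions.

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W, where D is the diagonal degree matrix of W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def smoothness(G, W):
    """Regularization term tr(G^T L G)."""
    return float(np.trace(G.T @ graph_laplacian(W) @ G))

def pairwise_form(G, W):
    """Equivalent form 0.5 * sum_ij W_ij * ||g_i - g_j||^2 (symmetric W)."""
    diff = G[:, None, :] - G[None, :, :]
    return 0.5 * float((W * (diff ** 2).sum(axis=2)).sum())

# Hypothetical affinities: sentences 0 and 1 are similar, sentence 2 is not.
W_SS = np.array([[1.0, 0.9, 0.1],
                 [0.9, 1.0, 0.1],
                 [0.1, 0.1, 1.0]])

G_smooth = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # 0 and 1 together
G_rough = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # 0 and 1 separated

penalty_smooth = smoothness(G_smooth, W_SS)
penalty_rough = smoothness(G_rough, W_SS)
```

Placing the two highly similar sentences in different clusters yields the larger penalty, which is the behaviour the regularizer is meant to encourage.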

In the following, we present the solution to Equation (3). The minimization in Equation (3) is with respect to X_ST, G_T and G_S jointly, and no closed-form solution is available. We therefore use an alternating scheme to optimize the objective: we optimize the objective with respect to one variable while fixing the other variables, and repeat this procedure until convergence.

Computation of X_ST

Optimizing Equation (3) with respect to X_ST is equivalent to optimizing

$\begin{aligned} \min\ J_{2} = & \| R_{ST} - G_{S} X_{ST} G_{T}^{T} \|^{2} \\ \text{s.t.}\ & X_{ST} \geq 0 \end{aligned}$ (4)

Setting $\frac{\partial J_{2}}{\partial X_{ST}} = 0$ leads to the following updating formula

$X_{ST} = (G_{S}^{T} G_{S})^{- 1} G_{S}^{T} R_{ST} G_{T} (G_{T}^{T} G_{T})^{- 1}$ (5)

Computation of G_T

Optimizing Equation (3) with respect to G_T is equivalent to optimizing

$\begin{aligned} \min\ J_{2} = & \| R_{ST} - G_{S} X_{ST} G_{T}^{T} \|^{2} + λ \cdot tr (G_{T}^{T} L_{T} G_{T}) \\ \text{s.t.}\ & G_{T} \geq 0 \end{aligned}$ (6)

We introduce the Lagrangian multiplier $α \in R^{m \times c_{T}}$ , which has the same size as G_T; the Lagrangian function is then

$L (G_{T}) = \| R_{ST} - G_{S} X_{ST} G_{T}^{T} \|^{2} + λ \cdot tr (G_{T}^{T} L_{T} G_{T}) - tr (α G_{T}^{T})$ (7)

Setting $\frac{\partial L (G_{T})}{\partial G_{T}} = 0$ , we obtain

$α = 2 λ L_{T} G_{T} - 2 A + 2 G_{T} B$ (8)

where $A = R_{ST}^{T} G_{S} X_{ST}$ and $B = X_{ST}^{T} G_{S}^{T} G_{S} X_{ST}$ .

Using the Karush-Kuhn-Tucker condition [22] $α (i, j) G_{T} (i, j) = 0$ , we get

$[λ L_{T} G_{T} - A + G_{T} B] (i, j) \cdot G_{T} (i, j) = 0$ (9)

Introducing $L_{T} = L_{T}^{+} - L_{T}^{-}$ , $A = A^{+} - A^{-}$ and $B = B^{+} - B^{-}$ , where $A^{+} (i, j) = (| A (i, j) | + A (i, j)) / 2$ and $A^{-} (i, j) = (| A (i, j) | - A (i, j)) / 2$ [23], and where the positive and negative parts of L_T and B are defined in the same way, we get

$\begin{matrix} [λ L_{T}^{+} G_{T} - λ L_{T}^{-} G_{T} - A^{+} + A^{-} \\ + G_{T} B^{+} - G_{T} B^{-}] (i, j) \cdot G_{T} (i, j) = 0 \end{matrix}$ (10)

Equation (10) leads to the following updating formula

$\begin{matrix} G_{T} (i, j) \leftarrow G_{T} (i, j) \\ \sqrt{\frac{[λ L_{T}^{-} G_{T} + A^{+} + G_{T} B^{-}] (i, j)}{[λ L_{T}^{+} G_{T} + A^{-} + G_{T} B^{+}] (i, j)}} \end{matrix}$ (11)

Computation of G_S

Optimizing Equation (3) with respect to G_S is equivalent to optimizing

$\begin{aligned} \min\ J_{2} = & \| R_{ST} - G_{S} X_{ST} G_{T}^{T} \|^{2} + λ \cdot tr (G_{S}^{T} L_{S} G_{S}) \\ \text{s.t.}\ & G_{S} \geq 0,\ X_{ST} \geq 0 \end{aligned}$ (12)

As in the computation of G_T, we introduce the Lagrangian multiplier $β \in R^{n \times c_{S}}$ , which has the same size as G_S; the Lagrangian function is then

$L (G_{S}) = \| R_{ST} - G_{S} X_{ST} G_{T}^{T} \|^{2} + λ \cdot tr (G_{S}^{T} L_{S} G_{S}) - tr (β G_{S}^{T})$ (13)

Setting $\frac{\partial L (G_{S})}{\partial G_{S}} = 0$ , we obtain

$β = 2 λ L_{S} G_{S} - 2 P + 2 G_{S} Q$ (14)

where $P = R_{ST} G_{T} X_{ST}^{T}$ and $Q = X_{ST} G_{T}^{T} G_{T} X_{ST}^{T}$ .

Using the Karush-Kuhn-Tucker complementarity condition [35] $β (i, j) G_{S} (i, j) = 0$ , we get

$[λ L_{S} G_{S} - P + G_{S} Q] (i, j) \cdot G_{S} (i, j) = 0$ (15)

Introducing $L_{S} = L_{S}^{+} - L_{S}^{-}$ , $P = P^{+} - P^{-}$ and $Q = Q^{+} - Q^{-}$ , we obtain

$\begin{matrix} [λ L_{S}^{+} G_{S} - λ L_{S}^{-} G_{S} - P^{+} + P^{-} \\ + G_{S} Q^{+} - G_{S} Q^{-}] (i, j) \cdot G_{S} (i, j) = 0 \end{matrix}$ (16)

Equation (16) leads to the following updating formula

$\begin{matrix} G_{S} (i, j) \leftarrow G_{S} (i, j) \\ \sqrt{\frac{[λ L_{S}^{-} G_{S} + P^{+} + G_{S} Q^{-}] (i, j)}{[λ L_{S}^{+} G_{S} + P^{-} + G_{S} Q^{+}] (i, j)}} \end{matrix}$ (17)
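The alternating scheme can be sketched as follows. The sketch takes the least-squares solution of Equation (4) for X_ST and then applies multiplicative updates of the form of Equations (11) and (17), with the positive parts of the Laplacians in the denominators. Several details are assumptions made only for illustration: the factors are initialized randomly (the complexity analysis indicates that k-means is used in the paper), a pseudo-inverse and a small constant `eps` guard against singular or zero denominators, a fixed number of iterations replaces a convergence test, and the toy data and the function name `regularized_nmtf` are hypothetical.

```python
import numpy as np

def pos(M):
    """Positive part: (|M| + M) / 2."""
    return (np.abs(M) + M) / 2.0

def neg(M):
    """Negative part: (|M| - M) / 2."""
    return (np.abs(M) - M) / 2.0

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def regularized_nmtf(R, W_SS, W_TT, c_S, c_T, lam=0.1, n_iter=100,
                     eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = R.shape
    # Random positive initialization (an assumption of this sketch).
    G_S = rng.random((n, c_S)) + 0.1
    G_T = rng.random((m, c_T)) + 0.1
    L_S, L_T = laplacian(W_SS), laplacian(W_TT)

    def solve_x(G_S, G_T):
        # Least-squares solution for X_ST with G_S and G_T fixed.
        return (np.linalg.pinv(G_S.T @ G_S) @ G_S.T @ R @ G_T
                @ np.linalg.pinv(G_T.T @ G_T))

    for _ in range(n_iter):
        X = solve_x(G_S, G_T)

        # Multiplicative update of G_T.
        A = R.T @ G_S @ X
        B = X.T @ G_S.T @ G_S @ X
        num = lam * neg(L_T) @ G_T + pos(A) + G_T @ neg(B)
        den = lam * pos(L_T) @ G_T + neg(A) + G_T @ pos(B)
        G_T = G_T * np.sqrt(num / (den + eps))

        # Multiplicative update of G_S.
        P = R @ G_T @ X.T
        Q = X @ G_T.T @ G_T @ X.T
        num = lam * neg(L_S) @ G_S + pos(P) + G_S @ neg(Q)
        den = lam * pos(L_S) @ G_S + neg(P) + G_S @ pos(Q)
        G_S = G_S * np.sqrt(num / (den + eps))

    return G_S, solve_x(G_S, G_T), G_T

# Hypothetical toy data: 6 sentences and 8 terms with two planted blocks.
rng = np.random.default_rng(1)
R = np.kron(np.eye(2), np.ones((3, 4))) + 0.05 * rng.random((6, 8))
W_SS = R @ R.T        # stand-in for sentence-sentence cosine similarities
W_TT = R.T @ R        # stand-in for term-term cosine similarities
G_S, X, G_T = regularized_nmtf(R, W_SS, W_TT, c_S=2, c_T=2)
```

Because every factor in the update ratios is nonnegative, G_S and G_T remain nonnegative throughout the iterations without any explicit projection.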

3.2.3 Complexity analysis

Let N₁ denote the number of iterations of the sentence clustering approach that uses the inter-relationships and intra-relationships of sentences and terms, and let C_S and C_T denote the cluster numbers c_S and c_T. The input data R_ST is an n × m matrix. The cost of Equation (5) is O (nN₁C_SC_T), the cost of Equation (11) is O (mC_T) and the cost of Equation (17) is O (nC_S). The cost of obtaining the initial sentence clusters is $O ({nt}_{1} C_{S}^{2})$ and the cost of obtaining the initial term clusters is $O ({nt}_{1} C_{T}^{2})$ , where t₁ denotes the number of iterations of the k-means algorithm. Since C_S < C_T, the overall cost of the approach is $O ({nN}_{1} C_{S} C_{T} + {mC}_{T} + {nC}_{S} + {nt}_{1} C_{T}^{2})$ .

We also normalize the final matrix G_S, and the cluster label of s_i is then obtained by Equation (2). For the convergence proof of the algorithm, please refer to [24]. Clustering sentences using the inter-type relationship between sentences and documents together with their intra-type relationships follows the same solution.

3.3 Sentence clustering using three types of text objects

3.3.1 Using inter-type relationships

Since adding either specific term semantics or the document context can enhance sentence clustering, we believe that using both of them should further boost the clustering performance. To go one step further, we cluster sentences using the information of documents, sentences and terms together. This time, we generalize the objective in Equation (1) to simultaneously cluster three types of text objects, that is, to solve the following optimization problem [25]:

$\begin{aligned} \min\ J_{3} = & \| R_{SW} - G_{S} X_{SW} G_{W}^{T} \|^{2} + \| R_{SD} - G_{S} X_{SD} G_{D}^{T} \|^{2} + \| R_{DW} - G_{D} X_{DW} G_{W}^{T} \|^{2} \\ \text{s.t.}\ & G_{S} \geq 0,\ G_{W} \geq 0,\ G_{D} \geq 0,\ X_{SW} \geq 0,\ X_{SD} \geq 0,\ X_{DW} \geq 0 \end{aligned}$ (18)

Here the subscript W refers to the term (word) type, so R_SW, R_DW, G_W and c_W correspond to R_ST, R_DT, G_T and c_T in the preceding sections.

However, it is not straightforward to solve Equation (18) by generalizing the existing iterative multiplicative NMTF solution algorithms. Motivated by [26], which deals with bipartite graphs, we propose to solve the optimization problem in Equation (18) by solving an equivalent Symmetric NMTF (S-NMTF) problem.

We first present the following lemma.

Lemma 1. The optimization problem in Equation (1) can be equivalently solved by the following S-NMTF problem: $\min J_{4} = \| R - {GXG}^{T} \|^{2},\ \text{s.t.}\ G \geq 0, X \geq 0$ (19) in which

$R = \begin{bmatrix} 0^{n \times n} & R_{SW}^{n \times m} \\ R_{WS}^{m \times n} & 0^{m \times m} \end{bmatrix}, \quad G = \begin{bmatrix} G_{S}^{n \times c_{S}} & 0^{n \times c_{W}} \\ 0^{m \times c_{S}} & G_{W}^{m \times c_{W}} \end{bmatrix}, \quad X = \begin{bmatrix} 0^{c_{S} \times c_{S}} & X_{SW}^{c_{S} \times c_{W}} \\ X_{WS}^{c_{W} \times c_{S}} & 0^{c_{W} \times c_{W}} \end{bmatrix}$ (20)

where the superscripts denote the matrix sizes, $0^{n \times n}$ is an n × n matrix with all zero entries, $R_{WS} = R_{SW}^{T}$ and $X_{WS} = X_{SW}^{T}$ .
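The equivalence can be checked numerically. With the block matrices above, GXGᵀ has zero diagonal blocks and the off-diagonal blocks G_S X_SW G_Wᵀ and its transpose, so the symmetric objective equals exactly twice the two-type objective and both share the same minimizers. The sketch below verifies this with random nonnegative factors; the sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c_S, c_W = 5, 7, 2, 3            # arbitrary illustrative sizes

R_SW = rng.random((n, m))
G_S = rng.random((n, c_S))
G_W = rng.random((m, c_W))
X_SW = rng.random((c_S, c_W))

# Block matrices of the symmetric formulation.
R = np.block([[np.zeros((n, n)), R_SW],
              [R_SW.T, np.zeros((m, m))]])
G = np.block([[G_S, np.zeros((n, c_W))],
              [np.zeros((m, c_S)), G_W]])
X = np.block([[np.zeros((c_S, c_S)), X_SW],
              [X_SW.T, np.zeros((c_W, c_W))]])

# Two-type objective and symmetric objective (squared Frobenius norms).
J_two_type = np.linalg.norm(R_SW - G_S @ X_SW @ G_W.T) ** 2
J_symmetric = np.linalg.norm(R - G @ X @ G.T) ** 2
```

The constant factor of two does not affect where the minimum is attained, which is why solving either problem yields the same G_S, G_W and X_SW.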

Based on Lemma 1, we have the following theorem.

Theorem 1. It is equivalent to solve Equation (18) and to solve $\min J_{4} = \| R - {GXG}^{T} \|^{2},\ \text{s.t.}\ G \geq 0, X \geq 0$ (21) in which

$\begin{aligned} R & = \begin{bmatrix} 0^{n \times n} & R_{SW}^{n \times m} & R_{SD}^{n \times b} \\ R_{WS}^{m \times n} & 0^{m \times m} & R_{WD}^{m \times b} \\ R_{DS}^{b \times n} & R_{DW}^{b \times m} & 0^{b \times b} \end{bmatrix}, \\ G & = \begin{bmatrix} G_{S}^{n \times c_{S}} & 0^{n \times c_{W}} & 0^{n \times c_{D}} \\ 0^{m \times c_{S}} & G_{W}^{m \times c_{W}} & 0^{m \times c_{D}} \\ 0^{b \times c_{S}} & 0^{b \times c_{W}} & G_{D}^{b \times c_{D}} \end{bmatrix}, \\ X & = \begin{bmatrix} 0^{c_{S} \times c_{S}} & X_{SW}^{c_{S} \times c_{W}} & X_{SD}^{c_{S} \times c_{D}} \\ X_{WS}^{c_{W} \times c_{S}} & 0^{c_{W} \times c_{W}} & X_{WD}^{c_{W} \times c_{D}} \\ X_{DS}^{c_{D} \times c_{S}} & X_{DW}^{c_{D} \times c_{W}} & 0^{c_{D} \times c_{D}} \end{bmatrix} \end{aligned}$ (22)

where $R_{SW} = R_{WS}^{T}$ , $R_{DW} = R_{WD}^{T}$ , $R_{SD} = R_{DS}^{T}$ and $X_{SW} = X_{WS}^{T}$ , $X_{SD} = X_{DS}^{T}$ , $X_{DW} = X_{WD}^{T}$ .

Theorem 1 presents a general framework via NMTF to simultaneously cluster three types of text objects using the inter-type relationship matrices. The cluster label of s_i can then be obtained by Equation (2).

3.3.2 Incorporating intra-type relationships via graph regularization

Under R, X and G defined in Equation (22), we denote $W = [\begin{matrix} W_{SS}^{n \times n} & 0^{n \times m} & 0^{n \times b} \\ 0^{m \times n} & W_{TT}^{m \times m} & 0^{m \times b} \\ 0^{b \times n} & 0^{b \times m} & W_{DD}^{b \times b} \end{matrix}]$ (23)

We approach sentence clustering using information of three types of text objects by solving the following optimization problem:

$\begin{aligned} \min\ J (G) = & \| R - {GXG}^{T} \|^{2} + 2 λ\, tr [G^{T} (D - W) G] \\ \text{s.t.}\ & G \geq 0,\ X \geq 0 \end{aligned}$ (24)

where D is the diagonal degree matrix with D (i, i) = ∑_j W (i, j).

Computation of X

We introduce the Lagrangian multipliers 4Λ and minimize the Lagrangian function as follows:

$L (G, X) = J + tr (4 Λ G^{T})$ (25)

which gives

$\frac{\partial L}{\partial G} = - 4 RGX + 4 {GXG}^{T} GX - 4 λ WG + 4 λ DG + 4 Λ$ (26)

$\frac{\partial L}{\partial X} = - 2 G^{T} RG + 2 G^{T} {GXG}^{T} G$ (27)

Fixing G and letting $\frac{\partial L}{\partial X} = 0$ , from Equation (27) we get

$X = (G^{T} G)^{- 1} G^{T} RG (G^{T} G)^{- 1}$ (28)

Computation of G

Before computing G, we first introduce two lemmas which have been proved in [27].

Lemma 2. For any matrices $A \in R_{+}^{n \times n}$ , $B \in R_{+}^{k \times k}$ , $X \in R_{+}^{n \times k}$ and $X^{'} \in R_{+}^{n \times k}$ , where A and B are symmetric, the following inequality holds: $\sum_{ip} \frac{({AX}^{'} B) (i, p) X^{2} (i, p)}{X^{'} (i, p)} \geq tr (X^{T} AXB)$ (29)

Lemma 3. For any nonnegative symmetric matrices $A \in R_{+}^{k \times k}$ and $B \in R_{+}^{k \times k}$ , and for any $H \in R_{+}^{n \times k}$ and $H' \in R_{+}^{n \times k}$ , the following inequality holds:

$\begin{array}{l} t r (H A H^{T} H B H^{T}) \leq \\ \sum_{i k} (\frac{H' A H'^{T} H' B + H' B H'^{T} H' A}{2}) (i, k) \frac{H^{4} (i, k)}{H'^{3} (i, k)} \end{array}$ (30)

Based on the above lemmas, we prove the following theorem.

Theorem 2. Let

$J (G) = tr (- 2 {RGXG}^{T} + {GXG}^{T} {GXG}^{T} + 2 λ G^{T} DG - 2 λ G^{T} WG)$ (31)

Then the following function

$\begin{aligned} Z (G, G') = & - 2 \sum_{i j k l} G' (j, i) X (i, l) G' (k, l) R (k, j) (1 + \log \frac{G (j, i) G (k, l)}{G' (j, i) G' (k, l)}) \\ & + \sum_{i j} (G' X G'^{T} G' X) (i, j) \frac{G^{4} (i, j)}{G'^{3} (i, j)} + λ \sum_{i j} (D G') (i, j) \frac{G^{4} (i, j) + G'^{4} (i, j)}{G'^{3} (i, j)} \\ & - 2 λ \sum_{i j k} G' (j, i) W (j, k) G' (k, i) (1 + \log \frac{G (j, i) G (k, i)}{G' (j, i) G' (k, i)}) \end{aligned}$

is an auxiliary function of J (G). Furthermore, it is a convex function in G and its global minimum is

$G (i, j) \leftarrow G (i, j) {[\frac{(RGX + λ WG) (i, j)}{({GXG}^{T} GX + λ DG) (i, j)}]}^{\frac{1}{4}}$ (32)

The algorithm of sentence clustering using inter-relationships and intra-relationships among three types of text objects via NMTF is presented in Table 2. The final cluster labels of s_i are obtained from the resulting G_S using Equation (2).
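A minimal sketch of the three-type procedure is given below: X is recomputed from the closed form X = (GᵀG)⁻¹GᵀRG(GᵀG)⁻¹ and G is updated multiplicatively with the fourth-root rule G ← G ∘ [(RGX + λWG) / (GXGᵀGX + λDG)]^(1/4). The following points are assumptions of this sketch rather than details taken from the paper: random block-diagonal initialization, a pseudo-inverse in place of the inverse, clipping of negative entries of X to zero so that the ratio stays nonnegative, a small `eps` in the denominator, a fixed iteration count, and all toy sizes and helper names.

```python
import numpy as np

def block_diag(*blocks):
    """Place the given 2-D arrays on the diagonal of a zero matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks),
                    sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

def regularized_snmtf(R, W, G0, lam=0.1, n_iter=50, eps=1e-9):
    G = G0.copy()
    D = np.diag(W.sum(axis=1))                 # degree matrix of W
    X = None
    for _ in range(n_iter):
        GtG_inv = np.linalg.pinv(G.T @ G)
        X = GtG_inv @ G.T @ R @ G @ GtG_inv    # closed-form X for fixed G
        X = np.maximum(X, 0.0)                 # assumption: keep X nonnegative
        num = R @ G @ X + lam * (W @ G)
        den = G @ X @ G.T @ G @ X + lam * (D @ G) + eps
        G = G * (num / den) ** 0.25            # fourth-root multiplicative update
    return G, X

# Hypothetical toy problem: 6 sentences, 5 terms, 2 documents.
rng = np.random.default_rng(0)
n, m, b = 6, 5, 2
c_S, c_T, c_D = 2, 2, 1
R_ST, R_SD, R_DT = rng.random((n, m)), rng.random((n, b)), rng.random((b, m))
R = np.block([[np.zeros((n, n)), R_ST, R_SD],
              [R_ST.T, np.zeros((m, m)), R_DT.T],
              [R_SD.T, R_DT, np.zeros((b, b))]])

def random_symmetric(k):
    M = rng.random((k, k))
    return (M + M.T) / 2.0

W = block_diag(random_symmetric(n), random_symmetric(m), random_symmetric(b))
G0 = block_diag(rng.random((n, c_S)) + 0.1,
                rng.random((m, c_T)) + 0.1,
                rng.random((b, c_D)) + 0.1)

G, X = regularized_snmtf(R, W, G0)
sentence_labels = G[:n, :c_S].argmax(axis=1)   # labels read from the G_S block
```

Since the update is multiplicative, entries of G that start at zero stay at zero, so the block-diagonal structure of G is preserved and the sentence block G_S can be read off directly.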

Table 1

Cluster quality evaluation on the DUC2004 dataset

Method	Cluster Quality
NMTF-based inter + intra (Doc-Sen-Term)	0.581
NMTF-based inter (Doc-Sen-Term)	0.557
NMTF-based inter + intra (Sen-Term)	0.528
NMTF-based inter + intra (Doc-Sen)	0.512
NMTF-based inter (Sen-Term)	0.472
NMTF-based inter (Doc-Sen)	0.460
Context-based	0.451
LSA-based	0.417
WordNet-based	0.402
Surface word matching-based	0.386

Sen = sentence; Doc = document; LSA = Latent Semantic Analysis.

Table 2

Cluster quality evaluation on the TAC2008 dataset

Method	Cluster Quality
NMTF-based inter + intra (Doc-Sen-Term)	0.625
NMTF-based inter (Doc-Sen-Term)	0.618
NMTF-based inter + intra (Sen-Term)	0.586
NMTF-based inter + intra (Doc-Sen)	0.579
NMTF-based inter (Sen-Term)	0.561
NMTF-based inter (Doc-Sen)	0.558
Context-based	0.516
LSA-based	0.498
WordNet-based	0.476
Surface word matching-based	0.433

Sen = sentence; Doc = document; LSA = Latent Semantic Analysis.

3.3.3 Complexity analysis

Let N₂ denote the number of iterations of the sentence clustering approach that uses the inter-relationships and intra-relationships among sentences, documents and terms. The input data G is an (n + m + b) × (C_S + C_W + C_D) matrix, where C_W = C_T is the number of term clusters. The cost of Equation (28) is O ((n + m + b) ² × (C_S + C_W + C_D)) and the cost of Equation (32) is O ((n + m + b) ³ × (C_S + C_W + C_D)). The cost of obtaining the initial sentence clusters is $O ({nt}_{2} C_{S}^{2})$ , the cost of obtaining the initial term clusters is $O ({nt}_{2} C_{T}^{2})$ and the cost of obtaining the initial document clusters is $O ({nt}_{2} C_{D}^{2})$ , where t₂ denotes the number of iterations of the k-means algorithm. Since C_D < C_S < C_T, the overall cost of the approach is $O (N_{2} ((n + m + b)^{2} \times (C_{S} + C_{W} + C_{D}) + (n + m + b)^{3} \times (C_{S} + C_{W} + C_{D})) + {nt}_{2} C_{T}^{2})$ .

4 Theme-based summarization

4.1 Cluster number estimation

Our aim is to cluster sentences and select sentences from each cluster to form a summary. Note that the proposed NMTF-based sentence clustering framework requires predefined cluster numbers c_S, c_T and c_D. To avoid an exhaustive search for a proper cluster number for each document set, we employ the spectral approach introduced in [28] to predict the number of expected clusters. Let β_i (i = 1, 2, …, n) be the eigenvalues of the sentence similarity matrix W_SS normalized with the 1-norm. The ratio φ_i = β_{i+1}/β_2 (i ≥ 1) is defined. If φ_i - φ_{i+1} > 0.05 and φ_i is still close to 1, then we set c_S = i + 1. The term cluster number c_T and the document cluster number c_D are set in the same way.
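The rule can be sketched as follows. Two points are interpretations rather than details given in the text: the 1-norm normalization is taken to mean dividing each row of the similarity matrix by its sum, and "close to 1" is implemented with a hypothetical threshold parameter `near_one`. The toy similarity matrix with three planted groups is likewise an assumption.

```python
import numpy as np

def estimate_cluster_number(W, gap=0.05, near_one=0.5):
    """Estimate the cluster number from the eigenvalue ratios phi_i."""
    W = np.asarray(W, dtype=float)
    # Assumption: 1-norm normalization means each row is scaled to sum to 1.
    P = W / W.sum(axis=1, keepdims=True)
    # Eigenvalues sorted in descending order: beta[0] is beta_1, and so on.
    beta = np.sort(np.linalg.eigvals(P).real)[::-1]
    # phi[i] holds phi_i = beta_{i+1} / beta_2 for i >= 1 (phi[0] is unused).
    phi = beta / beta[1]
    for i in range(1, len(beta) - 1):
        if phi[i] - phi[i + 1] > gap and phi[i] >= near_one:
            return i + 1
    return 1  # fallback when no clear gap is found (an assumption)

# Hypothetical similarity matrix: 3 groups of 4 sentences, similarity 1.0
# inside a group and 0.01 across groups.
blocks = np.kron(np.eye(3), np.ones((4, 4)))
W_SS = blocks + 0.01 * (1.0 - blocks)
c_S = estimate_cluster_number(W_SS)
```

For this matrix the first three eigenvalues are close to one and the remaining ones are close to zero, so the first large drop in φ occurs after i = 2 and the rule returns three clusters.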

4.2 Summary generation

In multi-document summarization, the number of documents to be summarized can be very large. This makes information redundancy more serious in multi-document summarization than in single-document summarization, so redundancy control is necessary. Two popular techniques for avoiding redundancy in summarization are maximal marginal relevance (MMR) [29] and clustering [3]. In MMR, the determination of redundancy is based mainly on the textual overlap between the sentence that is about to be added to the output and the sentences that are already in the output. MMR has been modified by many researchers [30]. Clustering offers an alternative in which the summarization system clusters the input textual units before starting the selection process. This step allows analyzing one or a few representative units from each cluster instead of all textual units.

We combine the two aforementioned methods to generate the summary as follows. We first obtain sentence clusters with the proposed framework. Then we define a coefficient θ_k to evaluate the importance of each cluster C_k (1 ≤ k ≤ c_S), which is formulated as the normalized cosine similarity between a theme cluster and the whole document set for generic summarization, or between a theme cluster and a given query for query-based summarization, with θ_k ∈ [0, 1] and $\sum_{k = 1}^{c_{S}} θ_{k} = 1$ . According to θ_k, we order the sentence clusters from the most salient to the least salient. After that, we use LexRank [2] to get the ranking scores of the sentences in each cluster. In addition, we consider the position of each sentence in its document by multiplying the score of each sentence by a weight: the weight of the first sentence in a document is 1/1, the weight of the second sentence is 1/2, the weight of the third sentence is 1/3, and so on. This is based on the hypothesis that the first sentence in a document is the most important and that importance decreases as a sentence gets further away from the beginning. We thus obtain the final ranking score of each sentence in each cluster. Finally, we choose the most salient sentence of each cluster, going from the most salient cluster to the least salient cluster, then the second-most salient sentence of each cluster, and so on, which is similar to C-LexRank [3].
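The selection procedure can be sketched as a round-robin over clusters ordered by θ_k. In this sketch the per-sentence ranking scores are assumed to be given (for example by LexRank), the summary length limit is simplified to a hypothetical `max_sentences` parameter instead of a byte or word budget, and all numbers in the example are made up.

```python
import numpy as np

def select_sentences(scores, positions, labels, theta, max_sentences):
    """Pick sentence indices cluster by cluster, best-ranked sentences first."""
    # Position weighting: the p-th sentence of a document gets weight 1/p.
    final = np.asarray(scores, dtype=float) / np.asarray(positions, dtype=float)
    labels = np.asarray(labels)
    # Clusters ordered from the most salient (largest theta) to the least.
    cluster_order = [int(k) for k in np.argsort(-np.asarray(theta), kind="stable")]
    # Sentences of each cluster sorted by decreasing final score.
    by_score = [int(i) for i in np.argsort(-final, kind="stable")]
    ranked = {k: [i for i in by_score if labels[i] == k] for k in cluster_order}

    summary, rank = [], 0
    while len(summary) < max_sentences and any(rank < len(v) for v in ranked.values()):
        for k in cluster_order:
            if rank < len(ranked[k]) and len(summary) < max_sentences:
                summary.append(ranked[k][rank])
        rank += 1
    return summary

# Hypothetical example: 5 sentences in 2 clusters.
scores = [0.5, 0.4, 0.9, 0.3, 0.8]   # assumed ranking scores within clusters
positions = [1, 2, 1, 3, 2]          # 1-based position of each sentence in its document
labels = [0, 0, 1, 1, 0]             # cluster label of each sentence
theta = [0.3, 0.7]                   # cluster importance, sums to 1
summary = select_sentences(scores, positions, labels, theta, max_sentences=4)
```

In this example cluster 1 is the more salient one, so its best sentence is chosen first, followed by the best sentence of cluster 0, then the second-best sentence of each cluster.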

5 Experiments and evaluations

5.1 Experiment setup

To provide a point of comparison for the proposed clustering framework, we use surface word matching as the baseline of the experiments. We apply WordNet, LSA and context-based matching to test the influence of sentence representation on sentence clustering [20]. We further apply the proposed NMTF-based sentence clustering framework to test whether the sentence clustering results can be improved when the inter-relationships and intra-relationships among terms, sentences and documents are used.

We conduct a series of experiments on the DUC 2004 generic multi-document summarization dataset and TAC2008 query-focused summarization datasets. According to the task definitions, systems are required to produce a concise summary for each document set (without or with a given query description) and the length of summaries is limited to 665 bytes in the DUC2004 and 100 words in TAC2008.
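
As a small illustration of the two length limits, the sketch below truncates a candidate summary to a byte budget and to a word budget. The helper names are hypothetical, and the choices of UTF-8 encoding and whitespace tokenization are assumptions; the official evaluation scripts may count bytes and words differently.

```python
def truncate_bytes(text, limit):
    """Keep at most `limit` bytes of the UTF-8 encoding of `text`.

    A multi-byte character cut in half at the boundary is dropped.
    """
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def truncate_words(text, limit):
    """Keep at most `limit` whitespace-separated words of `text`."""
    return " ".join(text.split()[:limit])


# Limits stated in the task definitions above.
DUC2004_BYTE_LIMIT = 665
TAC2008_WORD_LIMIT = 100
```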

5.2 Evaluation methods

5.2.1 Intrinsic cluster quality evaluation

To evaluate the sentence cluster quality, we first construct a sentence graph model G₁ (S, E_S), where S = {s₁, s₂, ⋯ , s_n} is the set of vertices representing the sentences in a document set and E_S is the set of edges connecting pairs of sentences. Every edge in E_S is associated with a weight equal to the cosine similarity between the corresponding two sentences. Newman and Girvan [31] define the modularity measure Q as follows: $Q = \sum_{i} (e_{ii} - a_{i}^{2}) = \mathrm{Tr}\, e - \| e^{2} \|$ (33) where e is a K × K symmetric matrix whose element e_ij is the fraction of all edges in the network that link vertices in community i to vertices in community j (K is the number of communities in the network), a_i = ∑_j e_ij represents the fraction of edges that connect to vertices in community i, Tr e = ∑_i e_ii is the trace of e, and ||x|| denotes the sum of the elements of the matrix x. The traditional modularity measure is applied to disconnected graphs, whereas the constructed sentence graph is connected. We therefore redefine the element e_ij of the matrix e as the fraction of the total edge weight in G₁ that connects vertices in cluster c_i to vertices in cluster c_j. The generated sentence clusters are then evaluated with this modified modularity measure.
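
The modified modularity measure can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the evaluation code used in the paper: the function name `weighted_modularity` is hypothetical, the similarity matrix is assumed to be symmetric and non-negative, self-similarities on the diagonal are ignored, and every undirected edge is counted once in each direction so that e is symmetric and its elements sum to 1.

```python
def weighted_modularity(sim, labels):
    """Weighted modularity Q = sum_i (e_ii - a_i^2).

    sim:    symmetric n x n list of lists of non-negative edge weights
            (e.g. cosine similarities between sentences).
    labels: labels[i] is the cluster index (0 .. K-1) of sentence i.
    """
    n = len(sim)
    k = max(labels) + 1
    e = [[0.0] * k for _ in range(k)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ignore self-similarity
            w = sim[i][j]
            e[labels[i]][labels[j]] += w
            total += w
    if total == 0.0:
        return 0.0
    # e_ij: fraction of the total edge weight between clusters i and j.
    e = [[v / total for v in row] for row in e]
    # a_i: fraction of the edge weight attached to cluster i.
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(k))


# Toy graph, invented for illustration: two tightly linked pairs.
toy_sim = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
q_good = weighted_modularity(toy_sim, [0, 0, 1, 1])  # clusters match the pairs
q_single = weighted_modularity(toy_sim, [0, 0, 0, 0])  # one big cluster
```

On the toy graph, the clustering that matches the two pairs reaches Q = 0.5, while putting all sentences into a single cluster gives Q = 0, which reflects the fact that higher Q values indicate better-separated clusters.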

5.2.2 Extrinsic summarization evaluation

Our final aim is to generate more accurate summaries. We evaluate the generated summaries with the ROUGE evaluation toolkit [32], which has long been adopted by DUC for automatic summarization evaluation. It measures summary quality by counting overlapping units between system-generated summaries and human-written reference summaries. We report three common ROUGE scores in this paper, namely ROUGE-1, ROUGE-2 and ROUGE-SU4, which are based on unigram matches, bigram matches and skip-bigram matches, respectively. Documents and queries are pre-processed by segmenting sentences and splitting words. Stop words² are removed and the remaining words are stemmed using the Porter stemmer³.
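
To make the notion of counting overlapping units concrete, the sketch below computes an n-gram recall in the spirit of ROUGE-N. It is only the counting core, written under simplifying assumptions: inputs are already tokenized, no stop-word removal or stemming is applied, and counts are simply summed over the references. The official toolkit offers further options and should be used for reported scores; the function names here are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n):
    """Matched reference n-grams divided by the total number of reference n-grams.

    candidate:  list of tokens of the system summary.
    references: list of token lists, one per human-written summary.
    """
    cand_counts = ngrams(candidate, n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        # Clipped counts: an n-gram is matched at most as often as it
        # occurs in the candidate.
        matched += sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0


# Toy sentences, invented for illustration.
toy_candidate = "the cat sat on the mat".split()
toy_reference = "the cat lay on the mat".split()
toy_rouge_1 = rouge_n_recall(toy_candidate, [toy_reference], 1)
toy_rouge_2 = rouge_n_recall(toy_candidate, [toy_reference], 2)
```

In the toy pair, five of the six reference unigrams and three of the five reference bigrams are matched.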

5.3 Performance of sentence clustering based on different measures

We first experiment on the DUC2004 dataset. We vary the parameter λ from 0.1 to 0.9 with a step size of 0.05 and find that NMTF-based sentence clustering using inter-relationships and intra-relationships between two kinds of text objects achieves its best performance when λ = 0.1. NMTF-based sentence clustering using inter-relationships and intra-relationships among three kinds of text objects also performs best when λ = 0.1. We therefore also set λ = 0.1 on the TAC2008 dataset. The cluster quality obtained with the different measures on the DUC2004 and TAC2008 datasets is shown in Tables 1 and 2, from which the relative performance of each measure can be read directly. Table 3 lists the reasons for the performance of the different sentence clustering measures.

Table 3
Reasons for the performance of different sentence clustering measures

Approach	Reason
NMTF-based inter + intra (Doc-Sen); NMTF-based inter + intra (Sen-Term); NMTF-based inter + intra (Doc-Sen-Term)	NMTF-based sentence clustering using both inter-relationships and intra-relationships among text objects performs better because it exploits not only the inter-relationships among sentences, terms and documents but also the intra-relationships within each of the three types of text objects.
NMTF-based inter (Doc-Sen); NMTF-based inter (Sen-Term); NMTF-based inter (Doc-Sen-Term)	Clustering with sentences and documents is inferior to clustering with sentences and terms because document information is coarser than term information with respect to a sentence. Moreover, using inter-relationships among documents, sentences and terms performs better than using inter-relationships only between documents and sentences or only between sentences and terms.
Context-based	Sentence clustering based on context enrichment achieves better performance than that based on concept enrichment, which corresponds to the fact that more related original information can assist in expressing a sentence’s meaning.
LSA-based	LSA can be used to construct specific, corpus-driven knowledge about words. Although Islam et al. [46] claimed that each dimension of the singular vector space captures a base latent semantics of the given document set and that each sentence in the document is jointly indexed by the base latent semantics in this space, negative values in some of the dimensions generated by the SVD make this interpretation less meaningful. Thus, LSA cannot capture the exact semantic meaning of each sentence, which may reduce the accuracy of the sentence clustering result.
WordNet-based	The synonyms that WordNet provides for the words in the given document set are limited. In addition, some words do not exist in WordNet, and these words are usually named entities that can carry important information for summarization. In our experiments, about 8.73% of the words in the DUC2004 dataset and 7.76% of the words in the TAC2008 dataset do not exist in WordNet. WordNet also provides only very general domain knowledge about words. All of these factors affect the accuracy of sentence clustering to some extent.
Surface word matching-based	The vector representation of a sentence is very sparse and therefore does not provide enough context for computing the cosine similarity of two sentences.

We further conduct paired t-tests using the ROUGE-2 recall scores, the primary DUC evaluation criterion, on all 50 DUC2004 document sets and 48 TAC2008 document sets. The null hypothesis is that “the first algorithm is equal to or inferior to the second one in ROUGE-2,” and the significance level is 5%. All the p-values presented in Table 4 are below 0.05, so every null hypothesis is rejected, which means that the first algorithm in each pair is superior to the second one. These results further support our analysis.

Table 4

Hypothesis testing (paired t test) on the DUC2004 and TAC2008 datasets

Compared approaches	p-value (DUC2004)	p-value (TAC2008)
LSA-based vs. K-means	0.02136	0.02715
NMTF-based inter (Doc-Sen) vs. LSA-based	0.02245	0.02841
NMTF-based inter (Sen-Term) vs. NMTF-based inter (Doc-Sen)	0.02361	0.03102
NMTF-based inter + intra (Doc-Sen) vs. NMTF-based inter (Sen-Term)	0.03631	0.03316
NMTF-based inter + intra (Sen-Term) vs. NMTF-based inter + intra (Doc-Sen)	0.03346	0.02857
NMTF-based inter + intra (Doc-Sen-Term) vs. NMTF-based inter (Doc-Sen-Term)	0.03797	0.03679
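
The one-sided paired t-test behind Table 4 can be sketched as below, assuming one ROUGE-2 recall score per document set and per algorithm. The code is an illustration only: the function names are hypothetical, the upper-tail probability of Student's t distribution is approximated with Simpson's rule to keep the example free of external dependencies, and the toy scores are invented and unrelated to the values in Table 4.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t), integrating the density on [0, |t|] with Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # `steps` must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t_test_greater(first, second):
    """Test H0: 'first is equal to or inferior to second' on paired scores.

    Returns (t statistic, one-sided p-value). Assumes at least two pairs
    and that the pairwise differences are not all identical.
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, t_upper_tail(t, n - 1)


# Invented per-document-set scores, for illustration only.
toy_first = [0.10, 0.12, 0.09, 0.11, 0.13]
toy_second = [0.08, 0.10, 0.08, 0.09, 0.10]
toy_t, toy_p = paired_t_test_greater(toy_first, toy_second)
```

The null hypothesis is rejected at the 5% level when the returned p-value is below 0.05, which is the case for the toy scores.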

5.4 Further analysis on cluster quality’s improvement

The aim of the NMTF-based sentence clustering framework is to generate more accurate sentence clusters by iteratively refining the clustering of two types of text objects (i.e., sentence clustering and term clustering, or sentence clustering and document clustering) or of three types of text objects (i.e., sentence clustering, term clustering and document clustering).

Figure 2 plots the value of Q (the sentence cluster quality) at each iteration of the four proposed NMTF-based sentence clustering algorithms on the DUC2004 dataset. The Q values at iterations greater than zero show the quality of the sentence clusters generated by the four algorithms, which use, respectively, inter-relationships between sentences and terms; inter-relationships and intra-relationships between sentences and terms; inter-relationships among documents, sentences and terms; and inter-relationships and intra-relationships among documents, sentences and terms. The curves clearly show that the quality of sentence clustering using only inter-relationships between sentences and terms is much lower than that of the other three approaches. An increase in Q indicates an improvement in cluster quality.

Fig.2

Trends of the cluster quality with increased iteration numbers on the DUC2004 dataset.

While Q directly evaluates the quality of the generated clusters, we are also interested in whether the improved cluster quality further enhances the quality of sentence ranking and thus raises summarization performance. We therefore also evaluate the ROUGE scores at each iteration of the four algorithms. Figure 3 illustrates the increase of the ROUGE-1, ROUGE-2 and ROUGE-SU4 results on the same DUC2004 dataset. The figure further shows that the summaries generated using only inter-relationships between sentences and terms are inferior to those generated by the other three algorithms. For a fair comparison, all processes other than the clustering algorithms (i.e., preprocessing, redundancy control and summary generation) remain the same. Figure 3 demonstrates the significant role of the proposed NMTF-based sentence clustering algorithms in summarization.

Fig.3

Trends of the ROUGE scores with increased iteration numbers on the DUC2004 dataset.

6 Conclusion

Sentence clustering relies heavily on sentence similarity measures, but sentences with similar meanings often share few common words, so traditional bag-of-words cosine similarity is not well suited to measuring sentence similarity. In this paper, we propose a new multi-document summarization framework based on sentence clustering using Nonnegative Matrix Tri-Factorization (NMTF). The proposed framework employs NMTF to cluster sentences using inter-type relationships among documents, sentences and terms, and incorporates the intra-type information through manifold regularization. Experimental results show that the framework is able to generate more reasonable sentence clusters, which in turn lead to better summarization performance. In future studies, we will investigate the influence of other suitable context definitions, besides terms and documents, on sentence clustering in order to further enhance its performance.

Footnotes

1. The senses of words in WordNet are ranked according to frequency. The first sense is more likely than the second, the second is more likely than the third, etc. So we use the first sense of each word from WordNet.

2. Stop words are words that are filtered out before or after the processing of natural language data (text).

3. http://tartarus.org/~martin/PorterStemmer/index.html

Acknowledgments

The work described in this paper was supported by China Postdoctoral Science Foundation Funded Project (No. 2017M613205), Basic Research Project Funded by Shenzhen Science and Technology Innovation Committee (No. 201703063000511) and Scientific Research Foundation of Northwestern Polytechnical University (Nos. 3102016QD009 and 3102016QD010).

References

1. Kageback, Mogren, Tahmasebi and Dubhashi, Extractive summarization using continuous vector space models, Proceedings of the 2nd Workshop on Continuous Vector Space Models and Their Compositionality (CVSC) @ EACL, 2014, pp. 31–39.

2. Erkan and Radev, LexRank: Graph-based centrality as salience in text summarization, Journal of Artificial Intelligence Research 22(1) (2004), 457–479.

3. Qazvinian and Radev D.R., Scientific paper summarization using citation summary networks, Proceedings of the 22nd International Conference on Computational Linguistics (COLING2008), 2008, pp. 689–696.

4. Aliguliyev R.M., A new sentence similarity measure and sentence based extractive technique for automatic text summarization, Expert Systems with Applications 36 (2009), 7764–7772.

5. Lee J.H., Park, Ahn C.M. and Kim, Automatic generic document summarization based on non-negative matrix factorization, Information Processing and Management 45 (2009), 20–34.

6. Barzilay and McKeown K.R., Sentence fusion for multi-document news summarization, Computational Linguistics 31(3) (2005), 297–327.

7. Mihalcea, Graph-based ranking algorithms for sentence extraction, applied to text summarization, Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL’04), 2004, pp. 20–24.

8. Ferreira, Cabral L.S., Freitas, Lins R.D., Silva G.F., Simske S.J. and Favaro, A multi-document summarization system based on statistics and linguistic treatment, Expert Systems with Applications 14(3) (2014), 5780–5787.

9. Wan and Yang, Multi-document summarization using cluster-based link analysis, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’08), 2008, pp. 299–306.

10. Zheng H.T., Gong S.Q., Chen, Jiang and Xia S.T., Multi-document summarization based on sentence clustering, Neural Information Processing, Lecture Notes in Computer Science 8835 (2014), 429–436.

11. Zhang, Xia Y.Q., Liu and Wang W.M., Clustering sentences with density peaks for multi-document summarization, Proceedings of the 2015 Annual Conference of the North American Chapter of the ACL, 2015, pp. 1262–1267.

12. Steinberger and Kristan, LSA-based multi-document summarization, Proceedings of 8th International PhD Workshop on Systems and Control, 2007.

13. […], Li and Wu, Query focus guided selection strategy for DUC 2006, Proceedings of the Document Understanding Conference (DUC’06), 2006.

14. Zha, Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering, Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’02), 2002, pp. 113–120.

15. Yeh J.Y., Ke H.R., Yang W.P. and Meng I.H., Text summarization using a trainable summarizer and latent semantic analysis, Information Processing and Management 41 (2005), 75–95.

16. Park, Lee J.H., Kim D.H. and Ahn C.M., Document summarization using non-negative matrix factorization and relevance feedback, Proceedings of the 2008 International Conference on Convergence and Hybrid Information Technology, 2008, pp. 301–306.

17. Shen, Li and Ding H.Q., Integrating clustering and multi-document summarization by bi-mixture probabilistic latent semantic analysis (PLSA) with sentence bases, Proceedings of the 17th AAAI Conference on Artificial Intelligence, 2011, pp. 914–920.

18. Tan J.W., Wan X.J. and Xiao J.G., Joint matrix factorization and manifold-ranking for topic-focused multi-document summarization, Proceedings of the 38th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’15), 2015, pp. 987–990.

19. Ding, Li, Peng and Park, Orthogonal nonnegative matrix t-factorizations for clustering, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’06), 2006, pp. 126–135.

20. Cai X.Y. and Li W.J., Enhancing sentence-level clustering with integrated and interactive frameworks for theme-based summarization, Journal of the American Society for Information Science and Technology 62(10) (2011), 2067–2082.

21. Belkin, Niyogi and Sindhwani, On manifold regularization, Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS’05), 2005.

22. Boyd and Vandenberghe, Convex optimization, Cambridge University Press, Cambridge, 2004.

23. Ding C.H., Li and Jordan M.I., Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence 32(1) (2010), 45–55.

24. […] Q.Q. and Zhou, Co-clustering on manifolds, Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 359–368.

25. Wang D.D., Zhu S.H., Li, Chi and Gong Y.H., Integrating clustering and multi-document summarization to improve document understanding, Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM’08), 2008, pp. 1435–1436.

26. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’01), 2001, pp. 269–274.

27. Wang, Huang and Ding, Simultaneous clustering of multi-type relational data via symmetric nonnegative matrix tri-factorization, Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM’11), 2011, pp. 279–284.

28. […] W.Y., Ng W.K., Liu and Ong K.L., Enhancing the effectiveness of clustering with spectra analysis, IEEE Transactions on Knowledge and Data Engineering 19(7) (2007), 887–902.

29. Carbonell J.G. and Goldstein, The use of MMR, diversity-based reranking for reordering documents and producing summaries, Proceedings of the 21st Annual International Conference on Research and Development in Information Retrieval (SIGIR’98), 1998, pp. 335–336.

30. McKeown K.R., Klavans J.L., Hatzivassiloglou, Barzilay and Eskin, Towards multi-document summarization by reformulation: Progress and prospects, Proceedings of the 16th National Conference on Artificial Intelligence and the 11th Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence, 1999, pp. 453–460.

31. Newman M.E.J. and Girvan, Finding and evaluating community structure in networks, Physical Review E 69(2) (2004), 8577–8582.

32. Lin C.Y. and Hovy, The automated acquisition of topic signature for text summarization, Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 2000, pp. 495–501.