A citation recommendation method based on context correlation

Abstract

Researchers need to report their achievements in research papers, and representative references are essential to a high-quality paper. Academic citation recommendation provides authors with suggested citations while they write. With its help, researchers can write papers more efficiently and are less likely to omit important related literature. Several methods have been proposed to achieve this goal. Many of them use citation networks to learn paper representations and choose references, but they tend to ignore the content properties of papers. Other methods use only some of the properties of papers to recommend citations, and their performance can be further improved. In this paper, we propose a citation recommendation method based on context correlation. We use two neural network models to learn the representations of papers and their references, and then calculate the context similarity between them. We also introduce the publishing time and the authority of papers, two key properties for citation evaluation. In the experiments, we compare our method with other methods and evaluate how different choices of properties affect its performance. The results show that our method outperforms several baselines and that the combination of time, authority and context performs better.

Keywords

Citation recommendation, context correlation, neural networks, citation network, authority

1. Introduction

Researchers often need to read large volumes of literature while conducting an academic study and determining the overall research goals, for example in project proposals. Subsequent scientific research activities usually include two stages: carrying out the research and summarizing its results. In this paper, we focus on the latter stage, in which a researcher summarizes the results of the research. When writing a paper, the researcher needs to cite suitable references from the numerous papers read before. However, it is unnecessary to cite all the literature the researcher has read. Citation recommendation helps researchers pick suitable references accurately. Citation recommendation is a major research direction. According to the recommendation scenario, it can be divided into the following categories: global citation recommendation, local citation recommendation, potential citation recommendation, and cross-language citation recommendation. Global citation recommendation recommends citations that are highly consistent with the paper, based on an analysis of the whole article. Local citation recommendation recommends citations that are highly consistent with some part of the article. Potential citation recommendation searches for citations that have an indirect citation relationship with literature already cited in the article. Cross-language citation recommendation also recommends highly consistent citations; the difference is that the recommended citations are written in a different language.

The concept of citation recommendation was first proposed by Strohman et al. in 2007. Their work is the rudiment of global citation recommendation [29]. They used some features of papers to make recommendations. The performance of their approach was not satisfactory, but it attracted a great deal of interest from researchers. After that, plenty of citation recommendation approaches were proposed.

He et al. learned the closed-form estimate of density matrices, calculated document similarity for global citation recommendation, and calculated the relevance of documents to the context for local citation recommendation [11]. Ebesu et al. used a neural network model trained with context information and author information to score citations and then chose the top K documents [7]. Both used the context of papers to recommend citations, but they hardly considered structural features of citation networks other than directly connected edges (i.e. first-order proximity). However, the edges in a citation network are sparse, so it is difficult to obtain good representations that preserve the global network structure using first-order proximity alone. Tang et al. explored second-order proximity between vertices to capture more structural information [30]. Dong et al. further designed a heterogeneous network representation that can represent different types of vertices and links, so that it can be used to calculate paper similarity [6]. Jiang et al. used Relation Type Usefulness Distributions to construct hierarchical random walks and then learned a hierarchical representation on heterogeneous graphs for cross-language citation recommendation [14]. Because a heterogeneous graph contains different types of vertices and links, preserving both the structural information and the context information incurs extra computation.

We propose a novel citation recommendation method that reduces computation while preserving the above information. Our method preserves first-order proximity, second-order proximity and degree properties. We also use the paper context and the publishing time to improve the accuracy of the similarity computation. We use a two-part neural network to score candidate papers: one part trains the weights and the other learns the context representation. The latter combines first-order proximity, second-order proximity and the context. As our method is designed for local citation recommendation, this part trains two representation models, one for the input partial paper and one for the candidate papers. We also use degree centrality, betweenness centrality and eigenvector centrality to measure the authority of candidate papers, so that the degree properties of the citation network are preserved and utilized. We design experiments on the benchmark datasets used in [26, 32], compare our method with several citation recommendation methods, and explore the effect of different property selections on the performance of our method.

The main contributions of our work are summarized as follows:

1. We propose a new neural network for citation recommendation that performs much better than several baselines. The model saves considerable time by reducing the computation over different types of vertices and links. It can also be used to recommend technical documentation in the computer field.
2. Our method preserves some common structural information of the citation network. We use first-order proximity and second-order proximity to help train the context representation model, which, to the best of our knowledge, has not been done before. Our experiments demonstrate the effectiveness of this approach.
3. We explore the effect of different property combinations on our method and select the better properties for citation recommendation.

The rest of the paper is organized as follows. In Section 2, we discuss related work. In Section 3, we define the problems and concepts studied in this paper. In Section 4, we introduce the proposed method. In Section 5, we introduce the datasets and design experiments to demonstrate the effectiveness and efficiency of our method and to show that the combination of time, authority, and context performs better. Finally, we conclude the paper and discuss future work in Section 6.

2. Related work

2.1 Citation recommendation

For citation recommendation, some researchers tapped into the deeper meaning of literature: they applied topic models to obtain topic distributions and recommended citations according to literature topics. If the latent topics of a paper can be found, the topic model offers a low-dimensional document representation. Tang et al. proposed the RBM-CS model to capture the relationship between papers and their citations and employed the term-topic-citation relationship to recommend citations [31]. Likewise, He et al. employed a topic model in citation recommendation. However, they divided recommendation units into global contexts, based on abstracts and titles, and local contexts, based on specific paper texts, and extracted topic models for recommendation in each of the two contexts [11]. Following this approach, Dai et al. proposed a novel probabilistic topic model to automatically recommend citations for researchers. They also calculated the community relevance among authors for effective citation recommendation [4].

It can be difficult to find convincing citations using text topics alone. Therefore, researchers began to recommend citations using other methods. For example, McNee et al. recommended citations for users by leveraging the links between different literature [19]. Zhou et al. built a directed unweighted graph to reflect the correspondence between literature and authors as well as the relation between literature and journals. By combining these relations, they assessed the similarity between literature and hence recommended citations [34]. Galke et al. utilized adversarial autoencoders to reconstruct sparse item vectors, and these autoencoders can be applied to citation recommendation. They focused on the problem of data sparsity but ignored the structural information of the citation network, which is considered in our method [9]. Further, based on the citation network, Shi et al. defined co-authorship and co-citation rules to compute the strength of nodes and cluster citations. Depending on the clustering result, they looked for candidate papers similar to the input paper and recommended them to the user [28]. Besides, Zarrinkalam and Mohsen used six types of relations between different literature, which together with literature text features constituted the semantic distance between different literature. By computing the semantic distance between input texts and the papers in references, their system recommended the citations with smaller semantic distance to the user [33]. The methods mentioned above are mainly designed for global citation recommendation, while our method mainly serves local citation recommendation. Our method can be adapted to global citation recommendation, and even to cross-language citation recommendation. Another difference is that our method uses more structural information of the citation network than those methods, which allows it to achieve higher accuracy.

2.2 Network embedding

Network embedding is used to learn the representation of the nodes in a network. It can represent many types of networks, and the citation network is one of them. In this field, the key point is to learn a representation that supports reconstructing the original network and making inferences. Perozzi et al. learned a latent representation (DeepWalk) by constructing short random walks that extract the information in a network [23]. The frequency with which nodes appear in random walks of a network is similar to that of words in natural languages, so they could borrow methods from natural language processing. Tang et al. proposed a method related to DeepWalk that learns the representation by preserving first-order proximity and second-order proximity [30]. Our method also uses these two important properties, so that the representation of paper contexts we learn can be used to reconstruct the original context. The network inference ability is demonstrated by the experiments. We can use the latent representation of a context to find context-related candidate papers.

2.3 Autoencoder

An autoencoder is an unsupervised neural network that learns a representation of its input. The input can take various forms. Li et al. trained an LSTM autoencoder for paragraphs and documents [16], Pan et al. used an autoencoder to learn a graph embedding [24], and Hong et al. used one to capture image features and map between 2D images and 3D poses so that human pose can be recovered from monocular videos [12]. Inspired by these works, we use an autoencoder to learn the representation of paper context because of its representation ability.

3. Problem definition

We first introduce the notation used throughout the paper. In a local citation recommendation system (LCRS), all the documents constitute the document set $D=\{d_{1},d_{2},d_{3},\ldots,d_{n}\}$, and the words appearing in all the documents make up a bag of words $WS=\{w_{1},w_{2},w_{3},\ldots,w_{m}\}$. Let $CD=\{cd_{1},cd_{2},cd_{3},\ldots,cd_{s}\}$ denote the set of candidate documents that can be recommended as citations. A document $d=\{p_{1},p_{2},p_{3},\ldots,p_{t}\}$ consists of many paragraphs, and a paragraph $p=\{w_{1},w_{2},w_{3},\ldots,w_{o}\}$ consists of many words. All the documents and their relations form a citation network $CN=(V,E)$, where $V=\{v_{1},v_{2},v_{3},\ldots,v_{r}\}$ is the vertex set, in which a vertex stands for a document, and $E\subseteq(V\times V)$ is the set of edges in the citation network. Our task is to recommend some $cd$ for a given paragraph $p$. To find a solution for our task, we need to solve the following subproblems.

SUBPROBLEM1 How to design a standard to choose some $cd$ from $CD$ as citations.

In our task, there are many candidate documents, and we should choose some of them as the citations. This can be seen as a TopK problem or a classification problem. If it is a TopK problem, we need a method that can score these documents and choose K documents as the result, where the parameter K can be set as needed. If we treat it as a classification problem, a document is classified as related or unrelated, and the number of recommended documents cannot be controlled. Thus we treat it as a TopK problem.
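The TopK formulation can be sketched in a few lines of Python. This is only an illustration: the function name `recommend_top_k`, the candidate ids and the scores are hypothetical, and the score function stands in for the score model introduced in Section 4.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def recommend_top_k(candidates: Iterable[str],
                    score: Callable[[str], float],
                    k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring candidates as (id, score) pairs, best first."""
    scored = ((doc_id, score(doc_id)) for doc_id in candidates)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical scores standing in for the output of a trained score model.
toy_scores = {"cd1": 0.91, "cd2": 0.15, "cd3": 0.62, "cd4": 0.78}
top2 = recommend_top_k(toy_scores, toy_scores.get, k=2)
print(top2)  # [('cd1', 0.91), ('cd4', 0.78)]
```

Unlike a related/unrelated classifier, the size of the result is fixed by `k` regardless of how many candidates receive a high score.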

SUBPROBLEM2 In the representation model trained, how to preserve the structure of the citation network and the context relations of documents.

The structure of the citation network shows the relations between documents, and document properties show the features of documents. We should find an efficient representation method that reduces the complexity of the heterogeneous network representation, so that we can find the target documents accurately and efficiently.

4. Context-based citation recommendation method

We develop our work following the order of the subproblems in Section 3. The inputs of the LCRS are a document set $D$ and a candidate document set $CD$, where $CD\subseteq D$. We first preprocess the data to improve its quality, then extract and filter the word terms from the documents in $D$ to obtain a bag of words ($WS$). $D$ is also used to construct the citation network ($CN$) from the citation relationships between documents, which can be accessed in the dataset according to [26]. Then, we receive a paragraph as the input and extract the context representation of the paragraph using $WS$ and an autoencoder. We search for the target citations in $CD$, and $CN$ is used to calculate the authority of a reference and to provide structural information for the autoencoder. We use a feedforward neural network to calculate the probability that a candidate document is selected and take this probability as its score. The documents with the top K scores are chosen as the final citations and recommended to the user.

4.1 Preprocessing and preparation

From AAN (ACL Anthology Network) and DBLP (DataBase systems and Logic Programming), we can get the text content, the metadata and the citation relationships of a document. We first remove incomplete literature, including “No content”, “No abstract” and “No introduction” papers and those whose relevant references are not in the data set. According to the citation relationships and the publication time, we delete documents that were published more than 1 year ago and have never been cited. We regard these documents as low-quality documents, and they should not be used as guidelines for our method. If we learn the patterns these papers follow when choosing citations, papers written under the guidance of our trained model are likely to become low-quality papers. There is another situation that should not appear in our training data. The documents we download were published at different times, and a document published in 2009 cannot cite a document published in 2010, so such non-reference documents should not be used as negative examples in our training set.
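The two filtering rules above can be sketched as follows. This is a minimal illustration, not the preprocessing code used in the paper: the `Paper` record, the function names and the toy corpus are hypothetical, years are used as the time unit, and a negative example is assumed to be strictly older than the citing paper. The removal of incomplete papers is not covered.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Paper:
    paper_id: str
    year: int
    references: Set[str] = field(default_factory=set)  # ids of cited papers


def filter_corpus(papers: List[Paper], current_year: int) -> List[Paper]:
    """Drop papers published more than one year ago that no paper cites."""
    cited = {ref for p in papers for ref in p.references}
    return [p for p in papers
            if p.paper_id in cited or current_year - p.year <= 1]


def valid_negatives(citing: Paper, papers: List[Paper]) -> List[str]:
    """Papers the citing paper did not cite but could have cited.

    Assumption: a usable negative must be strictly older than the citing paper.
    """
    return [p.paper_id for p in papers
            if p.paper_id != citing.paper_id
            and p.paper_id not in citing.references
            and p.year < citing.year]


# Hypothetical toy corpus.
corpus = [
    Paper("a", 2010, {"b"}),
    Paper("b", 2008),
    Paper("c", 2005),          # old and never cited: removed
    Paper("d", 2011, {"e"}),
    Paper("e", 2009),
]
kept = filter_corpus(corpus, current_year=2011)
kept_ids = [p.paper_id for p in kept]
negatives_for_a = valid_negatives(corpus[0], kept)
```

In this toy corpus, paper `d` is never offered as a negative example for paper `a`, because it was published later and could not have been cited.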

We create a word bag from the high-quality documents filtered from $D$. We use the natural language processing method introduced in [18] to segment and stem words, and these words make up the word bag. Besides removing stop words, we also remove author names and publication times from the word bag, because in practice a paragraph for which we recommend a document does not contain this information. As our method works for local citation recommendation, we need the paragraph-document citation relationship. In the AAN dataset, the Parscit file contains several xml files. If the “valid” value of the “citation” tag is “true” in an xml file, the file contains the context information corresponding to the citation, and this citation relationship is a paragraph-document citation relationship. The other citation relationship is the document-document citation relationship, which can be used to construct a citation network. When constructing the citation network, a vertex denotes a document, and if two documents have a document-document citation relationship, we link the two vertices with a directed edge.

4.2 Score model

In our method, we use a feedforward neural network as the score model. The network receives three inputs: the first is the publication time of the document, the second is the document authority, and the last is the context similarity between the input paragraph and the candidate document. The output is the score of the candidate document, and documents are ranked according to their scores. The network consists of an input layer, an output layer and several hidden layers. The number of hidden layers and the number of neurons in each hidden layer are hyperparameters. In the experiment part, we contrast the effects of different hyperparameters and select the best values.

Because the effect of the three inputs on the score is non-linear, we use the sigmoid function as the activation function to introduce non-linearity. To simplify the operation, we also map the output value of the output layer to the interval [0, 1]. In the training process, the expected output of a positive sample is 1, while that of a negative sample is 0. The loss function is the square loss, and we repeatedly adjust the weights of the network by minimizing it.
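A score model of this kind can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the network used in the experiments: it uses a single hidden layer with four neurons, a learning rate of 0.5 and two hand-made samples, all of which are hypothetical choices; the class name `ScoreNet` is likewise invented for the example.

```python
import math
import random
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ScoreNet:
    """One hidden layer, sigmoid activations, scalar output in (0, 1)."""

    def __init__(self, n_in: int = 3, n_hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def score(self, x: List[float]) -> float:
        return self.forward(x)[1]

    def sgd_step(self, x: List[float], target: float, lr: float = 0.5) -> float:
        """One SGD update on the square loss (y - target)^2; returns the loss."""
        h, y = self.forward(x)
        loss = (y - target) ** 2
        delta_out = 2.0 * (y - target) * y * (1.0 - y)
        for j, hj in enumerate(h):
            # Hidden-layer gradient uses the output weight before it is updated.
            delta_h = delta_out * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * delta_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
            self.b1[j] -= lr * delta_h
        self.b2 -= lr * delta_out
        return loss


# Inputs are (time, authority, context similarity); values are made up.
samples = [([1.0, 0.9, 0.8], 1.0),   # positive sample: expected output 1
           ([0.1, 0.2, 0.1], 0.0)]   # negative sample: expected output 0
net = ScoreNet()
for _ in range(2000):
    for features, target in samples:
        net.sgd_step(features, target)
pos_score = net.score(samples[0][0])
neg_score = net.score(samples[1][0])
```

After training, the positive sample should receive a higher score than the negative one, which is all the TopK ranking step needs.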

In the following parts, we will introduce how to calculate the document time, document authority and document-paragraph similarity.

4.3 Time

When we search for papers in academic search engines, we find that search sites often offer options for filtering by time. The default tabs include documents from the last year, the last two years or the last five years. This is because the relevance of a document often depends on its publication time: as time goes by and technology evolves, old documents are less likely to be referenced. In the preprocessing and preparation part, we have explained that the paragraphs are all newer than their citations and their non-reference documents. The effect of time is non-linear, and the closer the publication time of a document is to the current time, the more pronounced the difference is. We therefore use the reciprocal of the year gap as the time representation that is input into the score model.
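The time feature can be written as a one-line function. The sketch below assumes that years are integers and that the candidate is strictly older than the citing paragraph, as stated above; the function name `time_feature` is hypothetical.

```python
def time_feature(citing_year: int, candidate_year: int) -> float:
    """Reciprocal of the year gap between a paragraph and a candidate."""
    gap = citing_year - candidate_year
    if gap <= 0:
        # Assumption: candidates are strictly older than the citing paragraph.
        raise ValueError("candidate must be older than the citing paragraph")
    return 1.0 / gap


recent_drop = time_feature(2015, 2014) - time_feature(2015, 2013)  # gaps 1 and 2
old_drop = time_feature(2015, 2006) - time_feature(2015, 2005)     # gaps 9 and 10
```

One extra year changes the feature by 0.5 between gaps of 1 and 2, but only by about 0.011 between gaps of 9 and 10, which matches the intended non-linear effect.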

4.4 Authority

Among all the documents in $D$, some documents are cited more frequently than others, and these documents are often of high quality and high authority. In the academic field, the achievements in a new document usually inherit and develop those of previous documents. When a document refers to other documents, a citation relationship is built, and a network built on such citation relationships is called a citation network. In the preprocessing and preparation part, we have introduced how we construct the citation network. The citation network reflects knowledge inheritance, and by analyzing it we can learn much important information about knowledge relevance, domain development trends and so on. Citation network analysis can help acquire latent knowledge, so some researchers have applied citation networks to citation recommendation [10, 32]. The frequently used first-order proximity and second-order proximity are used when calculating the context similarity in our method, which we introduce in the following part. In this part, we mainly use centrality to calculate an authority for each document. In network analysis, the centrality of a node is often used to denote its importance. There are many ways to define centrality to identify the effect of each node, including degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality [5, 17]. They are introduced as follows:

1. Degree centrality: In a directed graph, the centrality can be the indegree centrality or the outdegree centrality. In the citation network, the indegree centrality of a node stands for how frequently the document is cited by others. The greater the indegree centrality is, the more important the document is. Each document has a similar outdegree, so we do not use outdegree as a measurement of document importance.
2. Betweenness centrality: It is measured by the number of shortest paths that pass through a node in the citation network. By connection density, the citation network can be divided into several communities, where a community represents a research group [15, 21]. If a node has high betweenness centrality, it may link different knowledge communities and has the potential to be cited by these communities.
3. Closeness centrality: It is a structural measure of the importance of a node in a network, which is based on the ensemble of its distances to all other nodes [3] and reflects how close a node is to the other nodes in the network. A node with higher closeness centrality is at a more central position and can reach other nodes easily.
4. Eigenvector centrality: It is a measure of the importance of a node that depends on the number of its adjacent nodes as well as their importance. A node with higher eigenvector centrality has a greater influence in the citation network [2, 27].
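As a small illustration of the first measure, the indegree centrality of every document can be computed directly from the directed citation edges. The sketch below uses a hypothetical six-node graph and the common normalization by $n-1$; neither is taken from the datasets used in the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def indegree_centrality(nodes: List[str],
                        edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized indegree; an edge (u, v) means document u cites document v."""
    counts = Counter(cited for _, cited in edges)
    denom = max(len(nodes) - 1, 1)
    return {v: counts[v] / denom for v in nodes}


# Hypothetical citation edges (citing, cited).
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "c"), ("a", "b"), ("f", "d"), ("d", "c"), ("e", "c")]
centrality = indegree_centrality(toy_nodes, toy_edges)
most_cited = max(centrality, key=centrality.get)
```

Here document `c` is cited three times and therefore has the highest indegree centrality, while documents that are never cited receive 0.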

In addition to these centrality attributes, our method also uses publisher information to calculate the authority of documents. In the academic field, different conferences and journals often have different rankings or impact factors, and the authority of these publication venues also has a certain impact on the authority of documents.

We propose a new method for computing document authority. A high-authority document has the following features:

1. It is frequently cited by other documents, which is represented by a higher indegree centrality in the citation network.
2. It is a core representative document in several subfields. Such a document can connect different subfields and thus can be cited by documents in different subfields. This property can be represented by betweenness centrality in the citation network.
3. It has some authoritative followers. If many authoritative documents cite the same document, then this document can be seen as an authoritative document. This property can be represented by eigenvector centrality in the citation network.
4. It is often published by a prestigious publisher. As prestigious publishers have strict review processes, their documents are more likely to be of high quality and to be cited by other documents.

We use the three kinds of centrality other than closeness centrality. Closeness centrality requires the distance from one node to all other nodes. When we calculate it, the distance between nodes that are not connected is set to a large value. As the citation network is very sparse, the total distance is the sum of many large values and a few small values. The large values dominate, so the differences between the results for different nodes are small. Therefore, closeness centrality is excluded from our method. Next, we introduce how these factors are calculated and how we combine them to calculate the authority of documents:

1. The indegree centrality equals the citation frequency, which can be obtained through a scholarly search engine. We get the citation frequency of a document from the datasets and use this number as the indegree centrality of the document in the citation network.
2. Generally speaking, the betweenness centrality can be calculated using the shortest path method [1]. But in the sparse citation network, calculating the betweenness centrality by distance is not efficient enough, because the notion of near and far is not clear enough. So we use a graph community detection method to cluster the nodes in the citation network and calculate the betweenness centrality of a node by counting the associations between the node and different clusters [8]. The algorithm for calculating the betweenness centrality is shown in Algorithm 1.

Algorithm 1. Calculating betweenness centrality
Input: $M$, citation network adjacency matrix
Output: betweenness centrality set
function GetBCentrality($M$)
    $\textit{cluster}\leftarrow$ GraphCommunity($M$)
    $k\leftarrow 0$, $l\leftarrow\textit{M.length}$
    while $k<l$ do
        $\textit{list}\leftarrow\{\}$, $i\leftarrow 0$
        while $i<l$ do
            if ($M[k,i]=1$ or $M[i,k]=1$) and $\textit{cluster}[i]$ not in $\textit{list}$ then
                add $\textit{cluster}[i]$ to $\textit{list}$
            end if
            $i$++
        end while
        $\textit{result}[k$++$]\leftarrow\textit{list.length}$
    end while
    return $\textit{result}$
end function
function GraphCommunity($M$)
    $k\leftarrow 0$, $l\leftarrow\textit{M.length}$
    // Initial cluster of nodes, a node belongs to a class
    while $k<l$ do
        $\textit{cluster}[k]\leftarrow\textit{class}_{k}$, $k$++
    end while
    // $E$: number of cells with 1 in $M$, $d_{i}$: degree of node $i$, $\textit{class}_{i}$: class of node $i$
    // $\delta(a,b)$ is 1 when $a=b$, 0 when $a\neq b$
    $Mo\leftarrow\frac{1}{E}\sum_{i,j=1}^{l}\left(M[i,j]-\frac{d_{i}d_{j}}{E}\right)\delta(\textit{class}_{i},\textit{class}_{j})$
    while nodes in $\textit{cluster}$ belong to more than one class do
        $\Delta Mo\leftarrow 0$
        for each $(\textit{class}_{i},\textit{class}_{j})$ in all class pairs do
            $\textit{class}_{\textit{new}}\leftarrow\textit{class}_{i}+\textit{class}_{j}$
            $Mo_{\textit{new}}\leftarrow\frac{1}{E}\sum_{i,j=1}^{l}\left(M[i,j]-\frac{d_{i}d_{j}}{E}\right)\delta(\textit{class}_{i},\textit{class}_{j})$
            $\Delta Mo\leftarrow\max(Mo_{\textit{new}}-Mo,\Delta Mo)$
        end for
        $Mo\leftarrow Mo+\Delta Mo$
        // Record $Mo$ and corresponding cluster
    end while
    select the biggest $Mo$ and its $\textit{cluster}$
    return $\textit{cluster}$ // $Mo$ is biggest when choosing this cluster
end function
3. The eigenvector centrality is related to the adjacent nodes, and we use the idea of the PageRank algorithm to calculate it [22]. Every document in the citation network gets an initial $DR$ value. Next, the random access probability is introduced and the $DR$ value is updated according to the adjacent nodes. The formula is shown in Eq. (1).

$\displaystyle DR(d_{i})=\alpha\sum\limits_{d_{j}\in A(d_{i})}\frac{DR(d_{j})}{L(d_{j})}+\frac{1-\alpha}{N}$ (1)

Herein, $A(d_{i})$ is the set of nodes adjacent to document $d_{i}$, $L(d_{j})$ is the outdegree of document $d_{j}$, $\alpha$ is the random access probability, and $N$ is the number of nodes. When the $DR$ values become stable at the end of the iteration, we obtain the eigenvector centrality. The algorithm for calculating the eigenvector centrality is shown in Algorithm 2.

Algorithm 2. Calculating eigenvector centrality
Input: $M$, citation network adjacency matrix
Output: eigenvector centrality set
function GetECentrality($M$)
    $\textit{result}\leftarrow\{1,1,\ldots,1\}$ // Length of result is $l$
    $l\leftarrow\textit{M.length}$, $\textit{minErr}\leftarrow 0.001$, $\textit{error}\leftarrow 1$
    while $\textit{error}>\textit{minErr}$ do
        $\textit{error}\leftarrow 0$, $k\leftarrow 0$
        while $k<l$ do
            // $\alpha$: random access probability, $Od_{i}$: outdegree of node $i$
            $\textit{temp}\leftarrow\textit{result}[k]$
            $\textit{result}[k]\leftarrow\alpha\sum_{i=1}^{l}\frac{M[i,k]\cdot\textit{result}[i]}{Od_{i}}+\frac{1-\alpha}{l}$
            $\textit{error}\leftarrow\textit{error}+|\textit{result}[k]-\textit{temp}|$, $k$++
        end while
        $\textit{error}\leftarrow\frac{\textit{error}}{l}$
    end while
    return $\textit{result}$
end function
4. The publisher-related authority depends on the quality of the journal or conference. Journals have impact factors ($IF$) that can be used directly. The $IF$ values of conferences are assigned according to their rank in the CCF (China Computer Federation) list [13]: conferences with Rank A, B and C are set to 4, 2 and 1 respectively.
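The cluster-counting step of Algorithm 1 and the iteration of Eq. (1) can be sketched together in Python. This is a simplified illustration under several assumptions: the cluster assignment is supplied directly instead of being produced by community detection, $\alpha=0.85$ is a conventional PageRank-style value rather than one reported in the paper, a `max_iter` guard is added, and the four-node graph is hypothetical. The function names are also invented for the example.

```python
from typing import List


def cluster_betweenness(adj: List[List[int]], cluster: List[int]) -> List[int]:
    """For each node, count the distinct clusters among its neighbours.

    adj[i][k] == 1 means document i cites document k. Both citing and
    cited neighbours are considered, as in Algorithm 1.
    """
    n = len(adj)
    result = []
    for k in range(n):
        seen = set()
        for i in range(n):
            if adj[k][i] == 1 or adj[i][k] == 1:
                seen.add(cluster[i])
        result.append(len(seen))
    return result


def dr_scores(adj: List[List[int]], alpha: float = 0.85,
              min_err: float = 1e-3, max_iter: int = 1000) -> List[float]:
    """Iterate Eq. (1) in place until the mean absolute change is small."""
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    result = [1.0] * n
    for _ in range(max_iter):
        error = 0.0
        for k in range(n):
            old = result[k]
            # Only citing nodes contribute, so out_deg[i] >= 1 here.
            incoming = sum(result[i] / out_deg[i]
                           for i in range(n) if adj[i][k] == 1)
            result[k] = alpha * incoming + (1.0 - alpha) / n
            error += abs(result[k] - old)
        if error / n <= min_err:
            break
    return result


# Hypothetical graph: 0 -> 2, 1 -> 2, 3 -> 1, 3 -> 2.
toy_adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
toy_cluster = [0, 0, 1, 1]  # assumed output of a community detection step
bc = cluster_betweenness(toy_adj, toy_cluster)
dr = dr_scores(toy_adj)
```

In this toy graph, document 2 is cited by every other document and ends up with the largest $DR$ value, while documents 2 and 3 each touch two clusters.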

Next, we train a model to calculate the authority of a document. The inputs are the indegree centrality, the betweenness centrality, the eigenvector centrality and the publisher-related authority, and the output is the authority. The target authority consists of the citation frequency and the popularity of the document on the Internet. We measure the popularity with a web spider that browses blog websites, forum websites and education websites in the related fields. We train a model to calculate the authority instead of obtaining it directly from the citation frequency and the popularity, because the workload of obtaining the popularity, especially the workload of the web spider, is enormous. We only use the popularity to construct a dataset for training and testing, and then use the trained model. This model is also a feedforward neural network. It differs from the score model in the number of network layers and the number of neurons in each layer.

4.5 Context similarity

Context similarity in our paper is the similarity between the contexts of paragraphs and documents. We learn the representation of a paragraph as rp and the representation of a document as dp, where rp and dp are located in the same vector space. The cosine distance between rp and dp can be calculated to show the similarity between a paragraph and a document: the smaller the cosine distance, the more similar they are. We use two different autoencoders to learn rp and dp respectively. As mentioned above, we use first-order proximity and second-order proximity when constructing positive samples.

In our method, first-order proximity denotes the relationship between a paragraph and its citations, and second-order proximity denotes the relationship between a paragraph and the documents that share one or more references with it. We use Fig. 1 to explain the two notions. The paragraph and its four citations ([2, 6, 8, 16] in Fig. 1) have first-order proximity. Another document, [25], cites the third citation in Fig. 1 but is not cited by the paragraph, so the paragraph and [25] have second-order proximity.

Figure 1.

Example of paragraphs citing documents.

Figure 2 shows a citation network in which the arrows represent citation relationships. As described in [30], the relations (a, c), (a, b), (f, d), (d, c) and (e, c) in Fig. 2 are first-order proximity, and the relations (a, f), (a, d) and (a, e) are second-order proximity. First-order proximity and second-order proximity in our paper are slightly different, because our problem is to recommend citations for paragraphs.

Figure 2.

An example of citation network.
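The paragraph-level proximity defined above can be sketched as two set operations. The sketch is illustrative only: the function name `proximity_sets` is hypothetical, the paragraph's citations follow the example of Fig. 1, and the reference lists of documents 25 and 30 are invented.

```python
from typing import Dict, Set, Tuple


def proximity_sets(paragraph_cites: Set[str],
                   doc_refs: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (first-order, second-order) documents for one paragraph.

    First-order: documents the paragraph cites.
    Second-order: other documents sharing at least one reference with it.
    """
    first_order = set(paragraph_cites)
    second_order = {doc for doc, refs in doc_refs.items()
                    if doc not in first_order and refs & paragraph_cites}
    return first_order, second_order


paragraph = {"2", "6", "8", "16"}            # citations of the paragraph
other_docs = {"25": {"8", "40"},             # hypothetical reference lists
              "30": {"41"}}
first, second = proximity_sets(paragraph, other_docs)
```

Document 25 shares reference 8 with the paragraph and is therefore a second-order neighbour, whereas document 30 shares nothing and is ignored.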

The model for calculating context similarity is shown in Fig. 3. The model contains two autoencoders: one learns the representation of paragraphs and the other learns the representation of documents. The learned representations are used to calculate the context similarity.

Figure 3.

The model for calculating context similarity.

The input, output and learned representation of the left model in Fig. 3 are $p_{1}$, $p_{2}$ and $p^{\prime}$ respectively. Those of the right model are $d_{1}$, $d_{2}$ and $d^{\prime}$ respectively. The loss function is shown in Eq. (2).

$\displaystyle L=\left(1-\frac{p_{1}\cap p_{2}}{p_{1}\cup p_{2}}\right)+\left(1-\frac{d_{1}\cap d_{2}}{d_{1}\cup d_{2}}\right)-\frac{\vec{d^{\prime}}\cdot\vec{p^{\prime}}}{\|d^{\prime}\|\cdot\|p^{\prime}\|}$ (2)

We use the Jaccard distance as the loss function of the autoencoders (the first two terms of Eq. (2)) and the negative cosine similarity between the paragraph representation and the document representation as the loss of the context similarity calculation part (the third term of Eq. (2)). The Jaccard similarity coefficient in Eq. (2) is the ratio of the sum of the minimum values of each word term to the sum of the maximum values in two vectors. In the training step, we use stochastic gradient descent (SGD) to minimize the loss function and update the weights in the model. The inputs of the two autoencoders are word terms. Both the global word terms in all documents and the word terms in a paragraph are obtained by the method in [18]. As a word term is mentioned in different documents, we use the TF-IDF (term frequency-inverse document frequency) value of a term in the input vector instead of a one-hot encoding to distinguish different documents and paragraphs [33]. After training, we use the two encoders to represent a paragraph and a document, and use the cosine distance between the two representations to measure the context similarity between the paragraph and the document.
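The loss in Eq. (2) can be computed for a single paragraph-document pair as in the following sketch. It uses NumPy, assumes non-negative TF-IDF input vectors and non-zero representations, and leaves out the autoencoders themselves; the function names are illustrative only.

```python
import numpy as np


def weighted_jaccard(x, y):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.maximum(x, y).sum()
    # Two all-zero vectors are treated as identical (assumption).
    return float(np.minimum(x, y).sum() / denom) if denom > 0 else 1.0


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def context_loss(p1, p2, d1, d2, p_rep, d_rep):
    """Eq. (2): two Jaccard distances minus the cosine similarity."""
    return (
        (1.0 - weighted_jaccard(p1, p2))
        + (1.0 - weighted_jaccard(d1, d2))
        - cosine_similarity(d_rep, p_rep)
    )


# Perfect reconstructions and identical representations give the minimum, -1.
loss = context_loss(
    p1=[1.0, 0.0, 2.0], p2=[1.0, 0.0, 2.0],
    d1=[0.0, 3.0, 1.0], d2=[0.0, 3.0, 1.0],
    p_rep=[0.5, 0.5], d_rep=[0.5, 0.5],
)
```

Under this formulation the loss decreases both when each autoencoder reconstructs its input more faithfully and when the representations of a positive paragraph-document pair become more aligned.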

Using the score model with the time score, authority score and context similarity score, we obtain a score for each document in the candidate document set. Documents with higher scores are recommended.

5. Experiments

In the experimental section, we use AAN (https://www.aclweb.org/anthology/) and DBLP (https://dblp.uni-trier.de/db/) as the datasets. The ACL Anthology was established by the Association for Computational Linguistics. It provides full-text retrieval, paper metadata, citation metadata and citation context. We download documents published in the last decade from the ACL Anthology. The papers in the digital library are connected through citation links. DBLP was developed and is maintained by Michael Ley of the University of Trier. It only provides metadata of scientific documents in the computer science domain. We download documents published in the last decade based on the metadata.

After the cleaning described in the preprocessing and preparation part, we get a cleaned AAN dataset with 5642 documents and a cleaned DBLP dataset with 24672 documents. Paragraphs in documents are similar to the one shown in Fig. 1, with several citations for each paragraph. The number of documents cited by a paragraph is far smaller than the number of documents not cited by it, so the positive and negative samples are unbalanced. We therefore use random under-sampling when sampling. We perform 10-fold cross-validation to evaluate the methods.
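Random under-sampling of the negative samples can be sketched as follows. The 1:1 ratio between positive and negative samples, the fixed seed and the function name are assumptions for illustration; the ratio used in our experiments is not specified here.

```python
import random


def undersample_negatives(positives, negatives, seed=0):
    """Randomly keep at most as many negatives as there are positives.

    Assumption: a 1:1 positive-to-negative ratio is targeted.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return list(positives), rng.sample(list(negatives), k)


# Toy example: one paragraph with 3 cited and 100 uncited candidate documents.
cited = ["d1", "d2", "d3"]
uncited = [f"n{i}" for i in range(100)]
pos, neg = undersample_negatives(cited, uncited)
```

After under-sampling, the toy paragraph contributes three positive and three negative pairs instead of three positive and one hundred negative pairs.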

To evaluate our method, the context correlation based method (CCB), we select four evaluation metrics: mean average precision (MAP), mean reciprocal rank (MRR), recall and normalized discounted cumulative gain (NDCG). For all four, higher values indicate better results. MAP is the mean, over all paragraphs, of the average precision of the recommended list. MRR averages the reciprocal of the rank at which the most relevant true reference appears. Recall@K is the ratio of the number of cited documents in the recommended TopK list to the number of all cited documents. NDCG@K measures the performance of a method by the relevance score and the position of documents in the TopK list.

The formulas for the metrics are shown in Eqs (3)–(6). $P$ is the paragraph set, $p_{i}$ is a paragraph, $TD$ is the true reference set, $RD$ is the recommended ranking list, $d_{i}$ is a document, $n(d_{i})$ is the number of true reference papers that rank higher than $d_{i}$, $d_{r}$ is the most relevant paper in the true references, $S(i)$ is the score of rank $i$ in the ranking list, and $F_{k}$ is the normalization factor that guarantees that the NDCG of a perfect ranking is 1.

$\displaystyle\textit{MAP}=\frac{1}{|P|}\sum_{p_{i}\in P}\sum_{d_{i}\in(TD\cap RD)}\frac{n(d_{i})+1}{n(d_{i})}$ (3)

$\displaystyle\textit{MRR}=\frac{1}{|P|}\sum_{p_{i}\in P}\frac{1}{\textit{rank}(d_{r})}$ (4)

$\displaystyle\textit{RECALL}=\frac{|TD\cap RD|}{|TD|}$ (5)

$\displaystyle\textit{NDCG@K}=F_{k}\sum_{i=1}^{K}\frac{2^{S(i)}-1}{\log(i+1)}$ (6)
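For a single paragraph, the reciprocal rank, Recall@K and NDCG@K can be computed as in the sketch below. It assumes binary relevance (a document is relevant if it is a true reference), takes the first true reference in the ranking as $d_{r}$, and realizes $F_{k}$ as the reciprocal of the DCG of an ideal ranking. These choices and the function names are assumptions for illustration. Because NDCG is a ratio, the base of the logarithm cancels.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none is ranked."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def recall_at_k(ranked, relevant, k):
    """Share of the true references that appear in the top-k list."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary relevance scores S(i) in {0, 1}."""
    gains = [1.0 if doc in relevant else 0.0 for doc in ranked[:k]]
    dcg = sum((2 ** g - 1) / math.log2(i + 1)
              for i, g in enumerate(gains, start=1))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy ranking for one paragraph with two true references.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
rr = reciprocal_rank(ranked, relevant)
r2 = recall_at_k(ranked, relevant, 2)
ndcg4 = ndcg_at_k(ranked, relevant, 4)
```

MRR is then the mean of the reciprocal ranks over all paragraphs in $P$, and the reported Recall@K and NDCG@K are the corresponding means.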

In our method, the number of hidden layers in the score model and in the authority model is set to 4, and the representation layers in the autoencoders have 32 nodes. The learning rate is initially set to 0.01. Because we recommend citations for paragraphs, the number of citations is smaller than that for a whole document, so we set $K$ to 5, 10 and 20 for the TopK list.

In the experiments, we first compare the performance of different combinations of properties in our method. For convenience, we use T for time, A for authority, and C for context similarity. We compare CCB-C, CCB-CA, CCB-CT and CCB-CAT. CCB-C uses context similarity only, CCB-CA uses context similarity and authority, and the remaining two are named in the same way. The experimental results are shown in Table 1. The recall and NDCG of the different property combinations are shown in Figs 4 and 5 respectively.

Table 1

Experimental results of different properties combinations

Dataset	Approach	MAP	MRR	Recall@5	Recall@10	Recall@20	NDCG@5	NDCG@10	NDCG@20
AAN	CCB-C	0.286	0.713	0.485	0.542	0.602	0.337	0.361	0.416
	CCB-CA	0.302	0.734	0.511	0.564	0.611	0.349	0.381	0.441
	CCB-CT	0.291	0.715	0.490	0.548	0.605	0.344	0.368	0.425
	CCB-CAT	0.320	0.744	0.521	0.570	0.616	0.359	0.388	0.446
DBLP	CCB-C	0.281	0.702	0.467	0.529	0.588	0.334	0.341	0.399
	CCB-CA	0.301	0.730	0.485	0.538	0.597	0.347	0.360	0.415
	CCB-CT	0.293	0.708	0.471	0.533	0.593	0.340	0.357	0.409
	CCB-CAT	0.322	0.739	0.503	0.545	0.602	0.353	0.368	0.423

From Table 1, we find that the combination of time, authority and context performs best and context alone performs worst, while the combination of authority and context performs better than the combination of time and context. We therefore use the combination of time, authority and context in our method.

To verify the effectiveness and correctness of our method, we compare it with five baseline models: context-aware [11], NCN [7], LINE [30], DeepWalk [23] and MMRQ [20].

Figure 4.

Recall of different method properties combinations.

Figure 5.

NDCG of different method properties combinations.

Figure 6.

Recall of different methods.

Figure 7.

NDCG of different methods.

Experimental results are shown in Table 2. The recall and NDCG of different methods are shown in Figs 6 and 7 respectively.

From Table 2, we find that our method outperforms the others. The context-aware method has the worst performance. Even CCB-C performs better than it, because we calculate context similarity using some structural information (first-order proximity and second-order proximity). DeepWalk and LINE are both network embedding approaches, and LINE performs better due to its additional local network structure. NCN utilizes author information in addition to content information, so NCN is better than the context-aware approach. MMRQ utilizes a three-layered graph including the author layer, the paper layer and the word layer, so MMRQ outperforms NCN. Compared with these approaches, our method uses context information, publisher information, structural information and time information.

Table 2

Experimental results of different recommendation approaches

Dataset	Approach	MAP	MRR	Recall@5	Recall@10	Recall@20	NDCG@5	NDCG@10	NDCG@20
AAN	Context-aware	0.268	0.679	0.466	0.519	0.574	0.306	0.360	0.398
	DeepWalk	0.272	0.682	0.475	0.534	0.577	0.319	0.365	0.405
	LINE	0.286	0.693	0.483	0.542	0.588	0.327	0.370	0.419
	NCN	0.295	0.716	0.497	0.553	0.597	0.341	0.378	0.427
	MMRQ	0.311	0.727	0.511	0.561	0.608	0.348	0.384	0.438
	CCB-CAT	0.320	0.744	0.521	0.570	0.616	0.359	0.388	0.446
DBLP	Context-aware	0.266	0.668	0.449	0.497	0.565	0.302	0.329	0.385
	DeepWalk	0.275	0.677	0.461	0.509	0.571	0.315	0.334	0.391
	LINE	0.288	0.695	0.469	0.515	0.576	0.327	0.340	0.398
	NCN	0.298	0.706	0.486	0.531	0.588	0.335	0.347	0.409
	MMRQ	0.317	0.721	0.492	0.539	0.596	0.347	0.354	0.417
	CCB-CAT	0.322	0.739	0.503	0.545	0.602	0.353	0.368	0.423

6. Conclusions

When writing a research paper, a researcher has to select suitable references from literature read before. However, the researcher has normally read a huge amount of literature over a long time, so relevant work is easily omitted or cited improperly. Citation recommendation is therefore valuable for writing papers.

To help the researcher spend less time selecting documents to cite and improve the efficiency of writing academic papers, we formulate a local citation recommendation problem and design a multi-unit score model that combines citation network structure, context information, publishers and publishing time to solve it. This model can utilize a variety of information to improve the accuracy of citation recommendation. To evaluate the performance of our method and find the best combination, we conduct experiments on the AAN and DBLP datasets. The results show that our method outperforms the baselines on all reported metrics.

In our future work, we plan to expand the score model to combine with author network information and test our model using more datasets.

Acknowledgments

The completion of this paper is owed to many people who have offered us selfless support.

First, we want to express our gratitude to the Association for Computational Linguistics for providing the open dataset. This has saved us a lot of time in finding and organizing data.

Second, we want to thank the professors and friends in our school, especially Prof. Minbo Li, Shi Pu, Hongbo Zhao, Rongbin Zhu and Yuanwen Hu. They gave us technical and intellectual help.

Third, this work was supported by the National Natural Science Foundation of China [grant number 61671157], to which we give our heartfelt thanks.

Last, we are extremely grateful to our family members. It is their understanding and support that helped us finish this paper.

References

1. Brandes, A faster algorithm for betweenness centrality, Journal of Mathematical Sociology 25(2) (2001), 163–177.

2. Ruhnau, Eigenvector-centrality – a node-centrality? Social Networks 22(4) (2000), 357–365.

3. Cohen et al., Computing classic closeness centrality, at scale, Eprint Arxiv, 2014, 37–50.

4. Dai et al., Explore semantic topics and author communities for citation recommendation in bipartite bibliographic network, Journal of Ambient Intelligence and Humanized Computing 9(4) (2018), 957–975.

5. de Arruda G.F., Rodrigues F.A. and Moreno, Fundamentals of spreading processes in single and multilayer complex networks, Physics Reports, 2018, 1–59.

6. Dong, Chawla N.V. and Swami, metapath2vec: Scalable representation learning for heterogeneous networks, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 135–144.

7. Ebesu and Fang, Neural citation network for context-aware citation recommendation, in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 1093–1096.

8. Fortunato, Community detection in graphs, Physics Reports 486(3–5) (2010), 75–174.

9. Galke et al., Multi-modal adversarial autoencoders for recommendations of citations and subject labels, in: Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, 2018, pp. 197–205.

10. Guan et al., Document recommendation in social tagging services, in: Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 391–400.

11. et al., Context-aware citation recommendation, in: Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 421–430.

12. Hong et al., Multimodal deep autoencoder for human pose recovery, IEEE Transactions on Image Processing 24(12) (2015), 5659–5670.

13. https://www.ccf.org.cn/, China Computer Federation, 2019.11.20.

14. Jiang et al., Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph, in: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018, pp. 635–644.

15. Kim and Hastak, Social network analysis: characteristics of online social networks after a disaster, International Journal of Information Management 38(1) (2018), 86–96.

16. Luong M.T. and Jurafsky, A hierarchical neural autoencoder for paragraphs and documents, arXiv preprint arXiv:1506.01057, 2015.

17. Maharani and Gozali A.A., Degree centrality and eigenvector centrality in twitter, in: 2014 8th International Conference on Telecommunication Systems Services and Applications (TSSA), IEEE, 2014.

18. Manning C.D. et al., The Stanford CoreNLP natural language processing toolkit, in: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014, pp. 55–60.

19. McNee S.M. et al., On the recommending of citations for research papers, in: Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work, 2002, pp. 116–125.

20. et al., Query-focused personalized citation recommendation with mutually reinforced ranking, IEEE Access 6 (2017), 3107–3119.

21. Oldham et al., Consistency and differences between centrality metrics across distinct classes of networks, arXiv preprint arXiv:1805.02375, 2018.

22. Page et al., The PageRank citation ranking: Bringing order to the web, Stanford InfoLab, 1999.

23. Perozzi, Al-Rfou and Skiena, Deepwalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 701–710.

24. Pan et al., Adversarially regularized graph autoencoder for graph embedding, arXiv preprint arXiv:1802.04407, 2018.

25. Pang and Lee, Foundations and Trends® in Information Retrieval 2(1–2) (2008), 1–135.

26. Radev D.R. et al., The ACL anthology network corpus, Language Resources and Evaluation 47(4) (2013), 919–944.

27. Roffo and Melzi, Ranking to Learn, International Workshop on New Frontiers in Mining Complex Patterns, Springer, Cham, 2016.

28. Shi et al., A citation recommendation method based on multiple factors, J. Comput. Res. Dev.(s2), 2011.

29. Strohman, Croft W.B. and Jensen, Recommending citations for academic papers, in: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2007, pp. 705–706.

30. Tang et al., Line: Large-scale information network embedding, in: Proceedings of the 24th International Conference on World Wide Web, International World Wide Web, 2015, pp. 1067–1077.

31. Tang and Zhang, A discriminative approach to topic-based citation recommendation, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer Berlin Heidelberg, 2009.

32. West J.D., Wesley-Smith and Bergstrom C.T., A recommendation system based on hierarchical clustering of an article-level citation network, IEEE Transactions on Big Data 2(2) (2016), 113–123.

33. Yun-tao, Ling and Yong-cheng, An improved TF-IDF approach for text classification, Journal of Zhejiang University-Science A 6(1) (2005), 49–55.

34. Zarrinkalam and Kahani, SemCiR: a citation recommendation system based on a novel semantic distance measure, Program 47(1) (2013), 92–112.

35. Zhou et al., Learning multiple graphs for document recommendations, in: Proceedings of the 17th International Conference on World Wide Web, 2008, pp. 141–150.