Towards effective link prediction: A hybrid similarity model

Abstract

Link prediction is an important research direction in complex network analysis and has drawn increasing attention from researchers in various fields. So far, a plethora of structural similarity-based methods have been proposed to solve the link prediction problem. To achieve stable performance on different networks, this paper proposes a hybrid similarity model for link prediction. In the proposed model, the Grey Relational Analysis (GRA) approach is employed to integrate four carefully selected similarity indexes, each designed around a different structural feature. In addition, to adaptively estimate the weight of each index from the observed network structure, a new weight calculation method is presented that considers the distribution of similarity scores. Because it combines several separate similarity indexes, the proposed method is applicable to many different types of networks. Experimental results on 10 benchmark networks show that the proposed method outperforms other prediction methods in terms of accuracy and stability.

Keywords

Complex networks; link prediction; node similarity; hybrid model; Grey Relational Analysis

1 Introduction

In the last decade, the problem of link prediction in complex networks has attracted growing attention from various disciplines [1–4], owing not only to the incompleteness of many available real-world networks [5, 6], but also to its theoretical and practical importance [7–9]. In biological networks, link prediction helps to find potential interactions between proteins [10, 11]. In social networks, it is applied to friend recommendation [12, 13]. In co-authorship networks, it can be used to predict collaborations [14, 15]. Moreover, in e-commerce networks, it supports personalized product recommendation [16, 17].

Link prediction aims to uncover potential links or identify spurious links using the known information in a network [1, 18]. Generally speaking, potential links include missing links and future links; this paper focuses on missing links. To find missing links, traditional attribute-based methods measure the connection likelihood of node pairs using external features, such as demographic or historical information about specific nodes. These methods assume that individuals tend to form links if they share more common features [19]. However, in real scenarios many node features are inaccessible or unreliable because of privacy policies [1, 8]. Structural similarity-based methods offer an alternative: they use only the observed network structure, such as common neighbors [20–22], paths [23–25], and triangles [26–28], and are therefore not affected by the lack of private information. Usually, these methods use only one or two features to measure node similarity and assume that these features work for all networks. However, different networks tend to have different internal structural characteristics [29, 30], so these methods are not robust across networks.

To address this issue, this paper proposes a new link prediction method based on a hybrid similarity model, in which the technique of Grey Relational Analysis (GRA) [31] is adopted to fuse different similarity features. The GRA model was originally developed by Deng [31]; as part of grey system theory, it is suitable for solving problems with complicated interrelationships between multiple factors and variables [32, 33]. GRA has since become a well-known model for multiple-attribute decision-making (MADM) and has received growing attention from researchers [34–37]. In this paper, link prediction is regarded as an MADM problem in which potential links are alternatives and similarity indexes are attributes, and GRA is employed to identify the missing links by solving this MADM problem. To the best of our knowledge, this paper is the first work to apply GRA to link prediction.

In the proposed method, four well-known similarity indexes, namely LP [20, 23], RA [20], CAR [28], and LNB_AA [38], are carefully chosen as the attributes of GRA. These indexes are selected because each is designed around a different but prominent structural feature. Fusing them with GRA requires a weight for each index. To adaptively assign these weights on various networks, a new weight calculation method is designed by borrowing ideas from the precision-to-noise ratio (PNR) [39], which was proposed to estimate the connection likelihood of potential links from the distribution of their similarity scores. To verify the prediction performance of the proposed method, it is experimentally compared with eight baselines on 10 networks from various fields. The experimental results show that the proposed method is superior to the other methods in terms of accuracy and robustness.

The main contributions of this work are summarized as follows:

This work formulates link prediction as an MADM problem and solves it using the technique of GRA.

A new weight calculation method is proposed to adaptively weigh different attributes in our MADM problem.

Extensive experiments on 10 benchmark networks show that the proposed method outperforms the compared methods in both accuracy and robustness.

The remainder of the paper is organized as follows. Section 2 introduces the problem description, evaluation metrics, baselines, and the Friedman test. Section 3 describes the proposed method in detail. The experimental results and analysis are given in Section 4. Finally, Section 5 concludes the paper.

2 Preliminaries

2.1 Problem description and evaluation metrics

Consider an undirected and unweighted network G(V, E), where V denotes the set of nodes and E the set of links. Multi-links and self-loops are not allowed. Let N = |V| be the number of nodes in G, and let U be the universal set containing all $\frac{N(N - 1)}{2}$ possible links. In this study, the task of link prediction is to find the missing links in the set of non-existent links U - E. To this end, every unconnected node pair in U - E is assigned a similarity score by a similarity measure, and the pairs are sorted in descending order of score. The node pairs ranked at the top are considered the most likely to be missing links [1]. Because the actual missing links are unknown in practice, the prediction accuracy of a method is evaluated by randomly dividing the observed link set E into a training set E_tr and a testing set E_ts, such that E = E_tr ∪ E_ts and E_tr ∩ E_ts = ∅ [1]. Two standard metrics, AUC [1] and Precision [40], are employed in this paper.

(1) AUC can be interpreted as the probability that the similarity score of a randomly selected missing link (i.e., a link in E_ts) is higher than that of a randomly selected non-existent link (i.e., a link in U - E). In implementation, if among n independent comparisons the missing link has the higher score n′ times and the two scores are equal n″ times, AUC is defined as

$AUC = \frac{n' + 0.5\, n''}{n}.$ (1)

Note that AUC ≈ 0.5 if all similarity scores are assigned at random. Therefore, the extent to which AUC exceeds 0.5 indicates how much better a prediction method performs than chance.
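The sampling procedure behind Eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' code: `sampled_auc` is a hypothetical name, `score` stands for any similarity function (such as the indexes in Section 2.2), and the two candidate lists are assumed to be given.

```python
import random

def sampled_auc(score, missing, nonexistent, n=10_000, seed=0):
    """Estimate AUC (Eq. 1) from n random missing-vs-non-existent comparisons.

    score       -- callable mapping a node pair (x, y) to its similarity score
    missing     -- list of node pairs in the testing set E_ts
    nonexistent -- list of node pairs in U - E
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    higher = ties = 0                  # n' and n'' in Eq. (1)
    for _ in range(n):
        s_miss = score(rng.choice(missing))
        s_non = score(rng.choice(nonexistent))
        if s_miss > s_non:
            higher += 1
        elif s_miss == s_non:
            ties += 1
    return (higher + 0.5 * ties) / n
```

A constant scorer produces only ties and therefore gives exactly 0.5, matching the random-guess baseline mentioned above.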

(2) Precision considers only the top-ranked links in the prediction list. If l of the top-L links are correctly predicted (i.e., belong to E_ts), Precision is defined as

$Precision = \frac{l}{L}.$ (2)

A higher Precision value corresponds to higher prediction accuracy.
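A minimal sketch of Eq. (2), assuming the candidate pairs and their scores are held in a dict; the function name `precision_at_L` is hypothetical, and ties at the cut-off are broken by insertion order, which is only one possible convention.

```python
def precision_at_L(scores, test_links, L):
    """Precision (Eq. 2): share of the top-L ranked candidate pairs that are in E_ts.

    scores     -- dict mapping each candidate pair (x, y) in U - E_tr to its score
    test_links -- set of node pairs in E_ts
    """
    # Rank candidates by descending score; equal scores keep their insertion order.
    top = sorted(scores, key=scores.get, reverse=True)[:L]
    hits = sum(1 for pair in top if pair in test_links)   # l in Eq. (2)
    return hits / L
```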

2.2 Baselines

To date, many link prediction methods based on measuring node similarity in complex networks have been presented. This paper chooses several state-of-the-art methods as baselines for performance comparison. They are briefly described as follows.

(1) Resource allocation (RA) index [20]. Motivated by the resource allocation process that takes place on networks, RA defines the similarity of two nodes as the amount of resource one node receives from the other through their shared neighbors:

$RA(x, y) = \sum_{z \in Ω(x, y)} \frac{1}{k_z},$ (3)

where k_z is the degree of node z, and Ω(x, y) is the set of common neighbors of nodes x and y.
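A minimal sketch of Eq. (3), assuming the network is stored as a Python dict that maps each node to the set of its neighbours (the representation used by all sketches in this section); `build_adj` and `ra_index` are hypothetical helper names, not from the paper.

```python
from collections import defaultdict

def build_adj(edges):
    """Adjacency sets of an undirected, unweighted graph; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def ra_index(adj, x, y):
    """RA(x, y) = sum over common neighbours z of 1 / k_z (Eq. 3)."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])
```

For example, if x and y share two neighbours of degree 2 and 3, RA(x, y) = 1/2 + 1/3.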

(2) Local path (LP) index [20, 23]. When computing the similarity between two nodes, this index takes into account the paths of length 2 and 3 connecting them. Formally, LP is defined as

$LP(x, y) = \sum_{i=2}^{3} λ^{i-2} |P^{i}(x, y)|,$ (4)

where |Pⁱ(x, y)| is the number of paths of length i between nodes x and y, and λ is a free parameter, set to 0.001 in this paper.
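A sketch of Eq. (4) using neighbour-set intersections instead of matrix powers; `lp_index` is a hypothetical name. It counts walks (the (x, y) entries of A² and A³). For a non-adjacent pair, which is the only kind scored in link prediction, every walk of length 2 or 3 from x to y visits distinct nodes, so these counts equal |P²(x, y)| and |P³(x, y)|.

```python
def lp_index(adj, x, y, lam=0.001):
    """LP(x, y) = |P^2(x, y)| + lam * |P^3(x, y)| (Eq. 4); lam = 0.001 as in the text.

    adj maps each node to the set of its neighbours.
    """
    p2 = len(adj[x] & adj[y])                        # walks x - z - y
    p3 = sum(len(adj[u] & adj[y]) for u in adj[x])   # walks x - u - v - y
    return p2 + lam * p3
```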

(3) Local Naïve Bayes (LNB) method [38]. This method employs the Local Naïve Bayes model to calculate the connection likelihood between two nodes. It defines the likelihood score as

$r(x, y) = s^{-1} \prod_{z \in Ω(x, y)} s R_z,$ (5)

where $s = \frac{|U|}{|E_{tr}|} - 1$ is a constant, and $R_z = \frac{N_{Δz} + 1}{N_{Λz} + 1}$, in which N_Δz and N_Λz denote the numbers of connected and disconnected node pairs, respectively, whose common neighbors include z. In [38], an exponent f(k_z), a function of the degree of node z, is applied to the term sR_z in Eq. (5), i.e., $r(x, y) = s^{-1} \prod_{z \in Ω(x, y)} (s R_z)^{f(k_z)}$. Taking the logarithm of both sides and neglecting the constant term -log s contributed by s^{-1}, a linear formula for the likelihood score is obtained:

$r'(x, y) = \sum_{z \in Ω(x, y)} f(k_z) (\log s + \log R_z).$

In this paper, the function $f(k_z) = \frac{1}{\log k_z}$ is used; the corresponding method is called LNB_AA [38]:

$\mathrm{LNB\_AA}(x, y) = \sum_{z \in Ω(x, y)} \frac{1}{\log k_z} (\log s + \log R_z).$ (6)
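A sketch of Eq. (6), assuming the graph passed in is the training network (so its edge count is |E_tr|), |U| = N(N - 1)/2, and the graph is not complete (so s > 0). N_Δz and N_Λz are computed as the numbers of linked and unlinked pairs among z's neighbours. Natural logarithms are used; since the same base appears in log s + log R_z and in 1/log k_z, the score does not depend on the base. The name `lnb_aa_index` is hypothetical.

```python
import math
from itertools import combinations

def lnb_aa_index(adj, x, y):
    """LNB_AA(x, y) of Eq. (6) on the training graph adj = {node: set(neighbours)}."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2            # |E_tr|
    s = (n * (n - 1) / 2) / m - 1                            # s = |U| / |E_tr| - 1
    score = 0.0
    for z in adj[x] & adj[y]:
        k = len(adj[z])                                      # k_z >= 2: z neighbours x and y
        linked = sum(1 for u, v in combinations(adj[z], 2) if v in adj[u])  # N_Δz
        unlinked = k * (k - 1) // 2 - linked                 # N_Λz
        r = (linked + 1) / (unlinked + 1)                    # R_z
        score += (math.log(s) + math.log(r)) / math.log(k)  # f(k_z) = 1 / log k_z
    return score
```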

(4) CAR index [28]. This method is derived from both node-based and link-based viewpoints. It suggests that two nodes are more likely to link if their common neighbors are members of a local community. The CAR index scores node pair (x, y) as

$CAR(x, y) = |Ω(x, y)| \sum_{z \in Ω(x, y)} \frac{|Γ(x) \cap Γ(y) \cap Γ(z)|}{2},$ (7)

where Γ(x) denotes the set of neighbors of node x.
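A sketch of Eq. (7); `car_index` is a hypothetical name. For each common neighbour z, Γ(x) ∩ Γ(y) ∩ Γ(z) is simply the set of z's neighbours that are themselves common neighbours of x and y, so the sum counts the local-community links among the common neighbours.

```python
def car_index(adj, x, y):
    """CAR(x, y) = |Ω(x, y)| * sum over z in Ω(x, y) of |Γ(x) ∩ Γ(y) ∩ Γ(z)| / 2 (Eq. 7)."""
    common = adj[x] & adj[y]                                   # Ω(x, y)
    lcl = sum(len(adj[z] & common) / 2 for z in common)       # local-community links
    return len(common) * lcl
```

If the common neighbours are not linked to each other, CAR returns 0 even when |Ω(x, y)| > 0.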

(5) Adaptive fusion model based on logistic regression (LR) [29]. This method is an adaptive fusion model that predicts missing links using logistic regression. It rests mainly on two assumptions: (i) different structural features play different roles in different networks; (ii) even within the same network, the role of a structural feature differs across modules [29]. The connection probability of node pair (x, y) is defined as

$P(x, y) = \max\{P_{M_1}(x, y), \dots, P_{M_k}(x, y)\},$

where $P_{M_k}(x, y)$ denotes the connection probability of (x, y) in module M_k, given by

$P_{M_k}(x, y) = \frac{1}{1 + e^{-(β_0 + \sum_{l=1}^{L} β_l S_{M_k}^{F_l}(x, y))}},$ (8)

where $S_{M_k}^{F_l}(x, y)$ represents the similarity score of (x, y) in module M_k according to feature F_l. The parameters β₀, β₁, ⋯, β_L are estimated from the known (observed) links. The features used in [29] are the common neighbors (CN), preferential attachment (PA), and degree difference (DD) indexes, defined respectively as

$CN(x, y) = |Ω(x, y)|,$ (9)

$PA(x, y) = k_x \cdot k_y,$ (10)

$DD(x, y) = |k_x - k_y|.$ (11)

Three module scenarios were considered in [29], and the corresponding indexes were denoted LR₁, LR₂, and LR_m, where LR_m is a compromise between LR₁ and LR₂. LR_m is taken as a baseline in the experiments of this work.
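The scoring side of Eqs. (8) to (11) can be sketched as below. This is only an illustration under stated assumptions: the coefficient vectors are taken as given inputs (in [29] they are fitted by logistic regression per module), the module partition is not modelled, and the names `sigmoid`, `cn_pa_dd`, `module_probability`, and `lr_probability` are hypothetical.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + e^(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def cn_pa_dd(adj, x, y):
    """Feature vector [CN, PA, DD] of Eqs. (9)-(11)."""
    kx, ky = len(adj[x]), len(adj[y])
    return [len(adj[x] & adj[y]), kx * ky, abs(kx - ky)]

def module_probability(features, betas):
    """Eq. (8) for one module; betas = [beta_0, beta_1, ..., beta_L]."""
    beta0, *coefs = betas
    return sigmoid(beta0 + sum(b * s for b, s in zip(coefs, features)))

def lr_probability(module_features, module_betas):
    """P(x, y) = max over modules of P_{M_k}(x, y)."""
    return max(module_probability(f, b) for f, b in zip(module_features, module_betas))
```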

(6) Adaptive degree penalization (ADP) index [30]. This method estimates the best-performing degree penalization from the average clustering coefficient observed in the network. The similarity measure is calculated as

$ADP(x, y) = \sum_{z \in Ω(x, y)} k_z^{-βC},$ (12)

where C is the average clustering coefficient of the network, and β is a constant. As suggested in [30], this paper sets β = 2.5.
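A sketch of Eq. (12). It assumes the node clustering coefficient of Eq. (14) and, as an assumed convention, assigns 0 to nodes of degree below 2; C is computed once and passed in rather than recomputed per pair. The helper names are hypothetical.

```python
from itertools import combinations

def clustering(adj, z):
    """CC_z = 2 t_z / (k_z (k_z - 1)); degree < 2 gives 0 by assumption."""
    k = len(adj[z])
    if k < 2:
        return 0.0
    t = sum(1 for u, v in combinations(adj[z], 2) if v in adj[u])   # t_z
    return 2.0 * t / (k * (k - 1))

def average_clustering(adj):
    """C, the mean of CC_z over all nodes."""
    return sum(clustering(adj, z) for z in adj) / len(adj)

def adp_index(adj, x, y, c, beta=2.5):
    """ADP(x, y) = sum over common neighbours z of k_z^(-beta * C) (Eq. 12)."""
    return sum(len(adj[z]) ** (-beta * c) for z in adj[x] & adj[y])
```

When C = 0 every term equals 1 and ADP reduces to the common-neighbour count; larger C penalizes high-degree common neighbours more strongly.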

(7) Clustering coefficient for link prediction (CCLP) index [27]. The clustering coefficient of a node reflects the density of links among its neighbors. The CCLP index uses the clustering coefficients of the shared neighbors to estimate the similarity between two nodes:

$CCLP(x, y) = \sum_{z \in Ω(x, y)} CC_z,$ (13)

where the clustering coefficient of node z, CC_z, is defined as

$CC_z = \frac{2 t_z}{k_z (k_z - 1)},$ (14)

in which t_z denotes the number of links between the neighbors of z.
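A sketch of Eqs. (13) and (14); the names are hypothetical. A common neighbour always has degree at least 2, so the degree-below-2 convention in `clustering` never affects CCLP scores.

```python
from itertools import combinations

def clustering(adj, z):
    """CC_z = 2 t_z / (k_z (k_z - 1)) (Eq. 14); degree < 2 gives 0 by assumption."""
    k = len(adj[z])
    if k < 2:
        return 0.0
    t = sum(1 for u, v in combinations(adj[z], 2) if v in adj[u])   # t_z
    return 2.0 * t / (k * (k - 1))

def cclp_index(adj, x, y):
    """CCLP(x, y) = sum of CC_z over common neighbours z (Eq. 13)."""
    return sum(clustering(adj, z) for z in adj[x] & adj[y])
```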

(8) Mutual information (MI) index [41]. This index evaluates node similarity from the viewpoint of information theory: it measures the connection likelihood of two unconnected nodes by the conditional self-information of their connection given their common neighbors. The MI index defines the similarity between nodes x and y as

$MI(x, y) = -I(L_{xy}^{1} | Ω(x, y)) = \sum_{z \in Ω(x, y)} I(L_{xy}^{1}; z) - I(L_{xy}^{1}),$ (15)

where $I(L_{xy}^{1})$ is the self-information of the event that node pair (x, y) is connected, and $I(L_{xy}^{1}; z)$ is the mutual information between the event that (x, y) is connected and the event that the pair shares neighbor z. They are computed as

$I(L_{xy}^{1}) = -\log_2 \left(1 - \prod_{i=1}^{k_y} \frac{(|E| - k_x) - i + 1}{|E| - i + 1}\right),$

$I(L_{xy}^{1}; z) = \frac{1}{k_z (k_z - 1)} \sum_{u, v \in Γ(z), u \neq v} \left(I(L_{uv}^{1}) + \log_2 CC_z\right).$
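A sketch of the MI index following Eq. (15). Assumptions, all labelled in the code: the self-information is taken as -log2 of the connection probability, so that it is non-negative; |E| is the edge count of the graph passed in; and `eps` is a hypothetical guard against log2(0) when CC_z = 0, a case the text does not address. Function names are hypothetical. Scoring each pair is quadratic in the degrees of its common neighbours, so this is meant for small graphs.

```python
import math
from itertools import combinations

def self_information(m, kx, ky):
    """I(L^1_xy) = -log2(1 - prod_{i=1}^{k_y} (m - k_x - i + 1) / (m - i + 1)), m = |E|."""
    prod = 1.0                 # probability that x and y stay unlinked under random placement
    for i in range(1, ky + 1):
        prod *= (m - kx - i + 1) / (m - i + 1)
    return -math.log2(1.0 - prod)

def clustering(adj, z):
    k = len(adj[z])
    if k < 2:
        return 0.0
    t = sum(1 for u, v in combinations(adj[z], 2) if v in adj[u])
    return 2.0 * t / (k * (k - 1))

def mi_index(adj, x, y, eps=1e-6):
    """MI(x, y) = sum_z I(L^1_xy; z) - I(L^1_xy) over common neighbours z (Eq. 15)."""
    m = sum(len(nb) for nb in adj.values()) // 2
    deg = {v: len(nb) for v, nb in adj.items()}
    score = -self_information(m, deg[x], deg[y])
    for z in adj[x] & adj[y]:
        k = deg[z]
        log_cc = math.log2(max(clustering(adj, z), eps))  # eps: hypothetical guard
        # The ordered sum over u != v counts each unordered pair twice.
        total = sum(2 * (self_information(m, deg[u], deg[v]) + log_cc)
                    for u, v in combinations(adj[z], 2))
        score += total / (k * (k - 1))
    return score
```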

2.3 Friedman test

In the experiments, the Friedman test [42] is used to assess the statistical significance of the differences between the proposed method and the baselines. This non-parametric hypothesis test compares multiple methods over a group of datasets [43]. Suppose k methods are compared on N datasets. On each dataset, the methods are sorted in descending order of accuracy and assigned ranks 1, 2, ⋯, k in turn; in case of ties, average ranks are assigned. Let r_ij denote the rank of the ith method on the jth dataset, and let $R_i = \frac{1}{N} \sum_{j} r_{ij}$ be the average rank of the ith method. The null hypothesis of the Friedman test is that all methods perform equivalently. The Friedman statistic is

$χ_F^2 = \frac{12N}{k(k + 1)} \left[\sum_i R_i^2 - \frac{k(k + 1)^2}{4}\right].$ (16)

When k and N are sufficiently large, the Friedman statistic approximately follows a χ² distribution with k - 1 degrees of freedom.

However, the Friedman statistic can be undesirably conservative, so Iman and Davenport proposed a better statistic [43]:

$F_F = \frac{(N - 1) χ_F^2}{N(k - 1) - χ_F^2}.$ (17)

F_F follows the F-distribution with (k - 1) and (k - 1)(N - 1) degrees of freedom. When F_F exceeds the critical value of F((k - 1), (k - 1)(N - 1)), the null hypothesis of the Friedman test is rejected [43]; that is, the performance of the methods differs significantly. A post-hoc test, the Bonferroni-Dunn test [44], is then adopted to further distinguish the differences between the proposed method and the others. In the Bonferroni-Dunn test, the critical difference is defined as

$CD = q_α \sqrt{\frac{k(k + 1)}{6N}},$ (18)

where q_α is the critical value for the post-hoc test [43]. If the difference between the average ranks of two methods exceeds CD, the performance of these two methods is significantly different.
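The ranking procedure and Eqs. (16) to (18) can be sketched in plain Python as follows. The function names are hypothetical, and q_α is passed in as an argument because it must be looked up in a table for the chosen post-hoc test and significance level; the sketch does not supply those tables.

```python
import math

def average_ranks(results):
    """results[j][i] = accuracy of method i on dataset j; rank 1 is best, ties share the mean rank."""
    k = len(results[0])
    totals = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda i: -row[i])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1                          # extend the block of tied methods
            mean_rank = (pos + end) / 2 + 1       # mean of ranks pos+1 .. end+1
            for t in range(pos, end + 1):
                totals[order[t]] += mean_rank
            pos = end + 1
    return [t / len(results) for t in totals]

def friedman_chi2(avg_ranks, n):
    """Eq. (16) for k = len(avg_ranks) methods on n datasets."""
    k = len(avg_ranks)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)

def iman_davenport(chi2, k, n):
    """Eq. (17)."""
    return (n - 1) * chi2 / (n * (k - 1) - chi2)

def critical_difference(q_alpha, k, n):
    """Eq. (18); q_alpha comes from the critical-value table of the post-hoc test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))
```

With k = 9, N = 10, and the χ²_F value reported in Section 4.3, `iman_davenport` and `critical_difference(3.102, 9, 10)` return values that round to the stated F_F and CD.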

3 Methodology

To predict missing links accurately, a multitude of similarity indexes have been proposed from different perspectives. A typical similarity index computes the similarity scores of node pairs from only one or two structural features and implicitly assumes that these features apply to all networks. As a result, its performance is unstable across networks with diverse internal structural features. To address this issue, this work proposes a novel link prediction method that fuses multiple similarity indexes built on different structural features. To synthesize the indexes, the technique of Grey Relational Analysis (GRA) [31] is adopted: each similarity index is treated as an attribute, and each potential missing link is regarded as an alternative. Specifically, four classic similarity indexes, namely LP [20, 23], RA [20], LNB_AA [38], and CAR [28], are employed. For convenience, the proposed method is named LP_GRA.

3.1 GRA method

Since it was originally developed by Deng [31], the GRA technique has been widely applied to multiple-attribute decision-making problems [34–37]. As part of grey system theory, GRA is appropriate for problems with complicated interrelationships between multiple factors and variables [32, 33]. Many variants of the GRA method exist in the literature [33, 45–47]. This paper uses a simple and efficient GRA method, as in [33, 47], which borrows ideas from TOPSIS (technique for order preference by similarity to an ideal solution) [48, 49].

Suppose there are m alternatives and n attributes, and let x_ij denote the value of the ith alternative under the jth attribute. The decision matrix X = {x_ij}_{m×n} is then given by Eq. (19).

$X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix}$ (19)

Based on the decision matrix X, the GRA method ranks alternatives according to the following steps [33, 47].

Step 1: Normalize the decision matrix X using vector normalization. Let Y = {y_ij}_{m×n} be the normalized decision matrix, where y_ij, the normalized value of the ith alternative under the jth attribute, is computed as

$y_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i = 1}^{m} x_{ij}^{2}}} .$ (20)

Step 2: Compute the weighted normalized decision matrix Z = {z_ij}_{m×n} according to Eq. (21).

$z_{ij} = w_j \cdot y_{ij},$ (21)

where w_j is the weight of the jth attribute.

Step 3: Obtain the positive ideal solution S⁺ and the negative ideal (anti-ideal) solution S⁻. Both solutions are represented as

$S^{+} = \{s_1^{+}, s_2^{+}, \dots, s_n^{+}\},$ (22)

$S^{-} = \{s_1^{-}, s_2^{-}, \dots, s_n^{-}\}.$ (23)

The positive ideal solution consists of the maximal (minimal) value under each benefit (cost) criterion, and the negative ideal solution consists of the minimal (maximal) value under each benefit (cost) criterion.

Step 4: Calculate the grey relational coefficients. The grey relational coefficient between the ith alternative and the positive ideal solution S⁺ with respect to the jth attribute is calculated as

$γ_{ij}^{+} = \frac{m^{+} + ε M^{+}}{Δ_{ij}^{+} + ε M^{+}},$ (24)

where $Δ_{ij}^{+} = |z_{ij} - s_j^{+}|$, $m^{+} = \min_{i,j} Δ_{ij}^{+}$, $M^{+} = \max_{i,j} Δ_{ij}^{+}$, and ε ∈ (0, 1) is the distinguishing coefficient. As in [33, 46], ε is set to 0.5 in this paper.

Similarly, the grey relational coefficient between the ith alternative and the negative ideal solution S⁻ with respect to the jth attribute is computed as

$γ_{ij}^{-} = \frac{m^{-} + ε M^{-}}{Δ_{ij}^{-} + ε M^{-}},$ (25)

where $Δ_{ij}^{-} = |z_{ij} - s_j^{-}|$, $m^{-} = \min_{i,j} Δ_{ij}^{-}$, and $M^{-} = \max_{i,j} Δ_{ij}^{-}$.

Step 5: Calculate the grey relational grades. The grey relational grade is the average value of relational coefficients of all attributes, which is an overall evaluation of alternatives. The grey relational grade of the ith alternative from the positive ideal solution is defined as

$G_i^{+} = \frac{1}{n} \sum_{j=1}^{n} γ_{ij}^{+}.$ (26)

Likewise, the grey relational grade of the ith alternative from the negative ideal solution is

$G_{i}^{-} = \frac{1}{n} \sum_{j = 1}^{n} γ_{ij}^{-} .$ (27)

Step 6: Estimate the relative grey relational grade. For the ith alternative, its relative grey relational grade is computed as

$G_{i} = \frac{G_{i}^{+}}{G_{i}^{+} + G_{i}^{-}} .$ (28)

The relative grey relational grade is used to measure the relationship between an alternative and the positive ideal solution. An alternative with higher G_i is assumed to be a better solution.
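The six GRA steps can be sketched in plain Python as follows. This is a minimal sketch, not the authors' implementation. It assumes that all attributes are benefit criteria (as in LP_GRA), that the distances Δ in Step 4 are measured on the weighted matrix Z so that the Step 2 weights take effect, and that the minimum and maximum of Δ are taken over all i and j. The name `gra_grades` and the handling of degenerate inputs are hypothetical.

```python
import math

def gra_grades(X, weights, eps=0.5):
    """Relative grey relational grades G_i (Eqs. 20-28) for an m x n matrix of benefit criteria.

    X       -- list of m rows (alternatives), each with n attribute values
    weights -- n attribute weights w_j
    eps     -- distinguishing coefficient (0.5 in the text)
    """
    m, n = len(X), len(X[0])
    # Step 1 (Eq. 20): vector normalization of each column; an all-zero column stays zero.
    norms = [math.sqrt(sum(X[i][j] ** 2 for i in range(m))) or 1.0 for j in range(n)]
    # Step 2 (Eq. 21): weighted normalized matrix Z.
    Z = [[weights[j] * X[i][j] / norms[j] for j in range(n)] for i in range(m)]
    # Step 3 (Eqs. 22-23): every attribute is a benefit criterion.
    s_pos = [max(row[j] for row in Z) for j in range(n)]
    s_neg = [min(row[j] for row in Z) for j in range(n)]

    def grades(ideal):
        # Step 4 (Eqs. 24-25): grey relational coefficients w.r.t. one ideal solution.
        delta = [[abs(row[j] - ideal[j]) for j in range(n)] for row in Z]
        lo = min(min(d) for d in delta)
        hi = max(max(d) for d in delta)
        if hi == 0:                       # all alternatives identical: assumed coefficient 1
            return [1.0] * m
        # Step 5 (Eqs. 26-27): grade = mean coefficient over the n attributes.
        return [sum((lo + eps * hi) / (d + eps * hi) for d in row) / n for row in delta]

    g_pos, g_neg = grades(s_pos), grades(s_neg)
    # Step 6 (Eq. 28): relative grey relational grade.
    return [gp / (gp + gn) for gp, gn in zip(g_pos, g_neg)]
```

In an LP_GRA setting, each row of X would hold the LP, RA, LNB_AA, and CAR scores of one unconnected node pair, `weights` would come from the weight calculation method, and pairs would then be ranked by descending grade.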

3.2 The proposed method

This section describes the LP_GRA method in detail. LP_GRA treats link prediction as a multiple-attribute decision-making (MADM) problem [50], in which potential missing links are the alternatives and similarity indexes are the attributes. The procedure of LP_GRA is outlined as follows.

Step 1: Determine the decision matrix.

In this paper, the LP, RA, LNB_AA, and CAR indexes are used to compute the similarity scores of potential links. Suppose there are m unconnected node pairs; from their similarity scores, the decision matrix X is established as

$X = \begin{bmatrix} LP_1 & RA_1 & \mathrm{LNB\_AA}_1 & CAR_1 \\ LP_2 & RA_2 & \mathrm{LNB\_AA}_2 & CAR_2 \\ \vdots & \vdots & \vdots & \vdots \\ LP_m & RA_m & \mathrm{LNB\_AA}_m & CAR_m \end{bmatrix},$

where LP_i (i = 1, 2, ⋯, m) denotes the similarity score of the ith unconnected node pair under the LP index, and similarly for the other columns.

Step 2: Normalize and weigh the decision matrix.

In this step, the normalized decision matrix Y and weighted normalized decision matrix Z can be obtained by using Eqs. (20) and (21), respectively. To get matrix Z , the weights of all attributes (i.e., similarity indexes) must be given. This work designs a new algorithm to estimate these weights, which will be described in Section 3.3.

Step 3: Obtain the positive ideal and negative ideal solutions.

Since higher similarity scores are better, all attributes are benefit criteria. The positive ideal solution S⁺ consists of the maximum of each similarity index, whereas the negative ideal solution S⁻ consists of the minimum of each similarity index. Mathematically,

$S^{+} = \{\max_i LP_i, \max_i RA_i, \max_i \mathrm{LNB\_AA}_i, \max_i CAR_i\},$

$S^{-} = \{\min_i LP_i, \min_i RA_i, \min_i \mathrm{LNB\_AA}_i, \min_i CAR_i\}.$

Step 4: Calculate the grey relational coefficients and grades.

For each alternative (i.e., potential missing link), the grey relational coefficients and grey relational grade from the positive ideal solution are calculated according to Eqs. (24) and (26). Likewise, the coefficients and grade from the negative ideal solution are computed using Eqs. (25) and (27).

Step 5: Obtain the relative grey relational grade.

Compute the relative grey relational grades for all potential missing links by using Eq. (28), and then rank them in descending order according to their grades. The links at the top are most likely to be missing ones.

Algorithm 1 Weight calculation

Input: G: observed network; I: set of n similarity indexes; K: bin size;

Output: weight vector W ;

1: Randomly sample 10% of the links in G as the validation set E^V;

2: for i = 1 → n do

3: Compute similarity scores of all node pairs in G by I_i;

4: Divide these scores into H uniform bins with size K;

5: for h = 1 → H do

6: n_e = number of existing links in bin h;

7: n_n = number of non-existing links in bin h;

8: $PNR (h) = \frac{n_{e}}{n_{n}}$ ;

9: end for

10: w_i = 0;

11: for e ∈ E^V do

12: s_e = score of e computed by I_i;

13: Suppose s_e lies in bin h;

14: w_i = w_i + PNR (h);

15: end for

16: end for

17: for i = 1 → n do

18: $w_{i} = \frac{w_{i}}{S}$, where $S = \sum_{j = 1}^{n} w_{j}$ is computed once, before line 17;

19: end for

20: W = {w₁, w₂, ⋯ , w_n};

21: return W ;

3.3 Weight calculation method

A key step in the proposed LP_GRA method is determining the weight of each similarity index. To this end, a new weight calculation method is presented based on the precision-to-noise ratio (PNR) defined in [39]. PNR was proposed to estimate the connection likelihood of missing links at different score levels by jointly analyzing the score distributions of existing and non-existing links.

The proposed weight calculation method determines the weight of each similarity index from the observed network structure. Algorithm 1 outlines the process. The input consists of the observed network G, the set of similarity indexes I, and the bin size K. In implementation, G is the training network (i.e., E_tr). Suppose n similarity indexes are used in the proposed method, and let I = {I₁, I₂, ⋯, I_n} be the set of these indexes. Following PNR [39], the similarity scores of all node pairs are divided into uniform bins, where parameter K denotes the number of distinct scores in each bin. The output of the algorithm is a weight vector W. Line 1 randomly samples 10% of the links in G to form a validation set, which is used to measure the weight of each index. Lines 3 and 4 compute the similarity scores of all existing and non-existing links in G under index I_i and divide these scores into H uniform bins of size K, i.e., $H = \frac{η}{K}$, where η is the number of distinct scores obtained in line 3. Lines 5 to 8 compute the PNR value of each bin, i.e., the ratio of the number of existing links to the number of non-existing links in that bin. Lines 10 to 15 estimate the weight of index I_i; the underlying intuition is that a better index places validation links in bins with higher PNR values. Lines 17 to 19 normalize all weights, and line 20 assembles the weight vector.
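A runnable sketch of Algorithm 1, with comments pointing to the listing's line numbers. Several details are assumptions not stated in the listing and are labelled in the code: the validation links are removed from G before scoring; bins are formed by grouping K consecutive distinct scores in ascending order; a bin with no non-existing links gets PNR = n_e; and uniform weights are returned if every raw weight is zero. The name `pnr_weights` is hypothetical, node labels must be sortable, and every node pair is scored, so the sketch only suits small graphs.

```python
import random
from itertools import combinations

def pnr_weights(adj, indexes, K=1, val_frac=0.1, seed=0):
    """Sketch of Algorithm 1 (PNR-based weight calculation).

    adj     -- training network G as {node: set(neighbours)}
    indexes -- list of scoring functions f(adj, x, y), the set I
    K       -- number of distinct scores per bin
    """
    rng = random.Random(seed)
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    val = rng.sample(edges, max(1, int(val_frac * len(edges))))      # line 1
    # Assumption: hold the validation links out of G before scoring.
    g = {u: set(nb) for u, nb in adj.items()}
    for u, v in val:
        g[u].discard(v)
        g[v].discard(u)
    pairs = list(combinations(sorted(g), 2))
    raw = []
    for f in indexes:                                                 # lines 2-16
        scores = {p: f(g, *p) for p in pairs}                         # line 3
        distinct = sorted(set(scores.values()))
        bin_of = {s: pos // K for pos, s in enumerate(distinct)}     # line 4
        n_bins = (len(distinct) + K - 1) // K
        n_e, n_n = [0] * n_bins, [0] * n_bins
        for (x, y), s in scores.items():                              # lines 5-7
            if y in g[x]:
                n_e[bin_of[s]] += 1
            else:
                n_n[bin_of[s]] += 1
        # Line 8; assumed guard: a bin without non-existing links gets PNR = n_e.
        pnr = [e / n if n else float(e) for e, n in zip(n_e, n_n)]
        raw.append(sum(pnr[bin_of[scores[e]]] for e in val))          # lines 10-15
    total = sum(raw)
    if total == 0:                                                    # assumed fallback
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]                                   # lines 17-20
```

Dividing by a total computed once keeps every index normalized by the same sum, which is what lines 17 to 19 intend.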

4 Experimental analysis

This section evaluates the performance of the LP_GRA method based on 10 real-world networks compared with eight baselines.

4.1 Datasets

To evaluate the accuracy of the link prediction methods fairly, 10 real-world networks from various fields, including social, biological, and technological networks, are used in the experiments. (1) Jazz [54]: a network of jazz musicians. (2) USAir [1]: a network of the US air transportation system. (3) Email [55]: an email network between members of a university. (4) Facebook (FBK) [56]: a social network collected from https://www.facebook.com/. (5) Dolphin [57]: a social network of 62 dolphins in a community living off Doubtful Sound, New Zealand. (6) Football [58]: the network of American football games between Division IA colleges during the regular season of Fall 2000. (7) Polblog [59]: a blogging network about US politics. (8) Infectious (INF) [60]: a network of face-to-face contacts between people during the exhibition "Infectious: Stay Away" in 2009 at the Science Gallery in Dublin. (9) NetScience (NS) [61]: a co-authorship network of scientists working on network theory and experiment. (10) C. elegans (CE) [51]: the neural network of the nematode Caenorhabditis elegans.

In this work, all networks are treated as undirected and unweighted, and only the giant component of each network is used. The basic topological features of these giant components are listed in Table 1, which shows that the structural characteristics of the networks vary widely. For example, Jazz, FBK, and Polblog have high average degrees, while Dolphin and NS have low average degrees; Jazz is a relatively dense network, whereas Email and FBK are sparse.

Table 1
The basic topological features of these 10 networks. N and M are the total numbers of nodes and links, respectively. 〈k〉 is the average degree and 〈d〉 is the average shortest path distance. C and r denote the clustering coefficient [51] and the assortativity coefficient [52], respectively. $ρ = \frac{2M}{N(N - 1)}$ denotes the network density, $H = \frac{〈k^{2}〉}{〈k〉^{2}}$ is the degree heterogeneity [1], and e is the network efficiency [53]

Dataset	N	M	〈k〉	〈d〉	C	ρ	r	H	e
Jazz	198	2742	27.697	2.235	0.617	0.141	-0.020	1.395	0.513
USAir	332	2126	12.807	2.738	0.625	0.039	-0.208	3.464	0.406
Email	1133	5451	9.622	3.606	0.220	0.009	0.078	1.942	0.300
FBK	4015	87,882	43.777	3.985	0.071	0.011	-0.128	2.427	0.294
Dolphin	62	159	5.13	3.357	0.303	0.084	-0.044	1.327	0.379
Football	115	613	10.66	2.508	0.403	0.094	0.162	1.007	0.450
Polblog	1222	16,714	27.36	2.738	0.320	0.022	-0.221	2.971	0.398
INF	410	2765	13.49	3.631	0.456	0.033	0.033	1.388	0.323
NS	379	914	4.82	6.042	0.741	0.297	-0.082	1.663	0.203
CE	297	2148	14.465	2.455	0.292	0.049	-0.163	1.801	0.445

4.2 Estimate parameter K

Algorithm 1 has a parameter K, which denotes the number of distinct similarity scores in each bin. This experiment determines an appropriate value of K under the AUC metric. Fig. 1 shows how AUC varies with K; the results are averaged over 50 independent runs with |E_ts|/|E| = 0.1. As the figure shows, AUC fluctuates only slightly as K changes; that is, LP_GRA is not sensitive to K. In the following experiments, K is simply fixed at 1.

Fig. 1

AUC values obtained by LP_GRA with different values of K.

4.3 AUC results

Table 2 reports the prediction accuracy of the nine methods under the AUC metric on the 10 networks. The results are averaged over 50 independent runs on each network; in each run, the network is randomly partitioned into a training set E_tr and a testing set E_ts with |E_tr| : |E_ts| = 9 : 1. The best accuracy for each network is shown in boldface. LP_GRA achieves the best performance on Email, FBK, Dolphin, Football, INF, and NS, and the second best on Polblog and CE, showing that it attains competitive accuracy. Table 1 shows that Email is a very sparse network; consequently, the common neighbor-based methods are less accurate on it than LP, which benefits from the additional information supplied by paths of length 3 [20, 23]. Nevertheless, by exploiting GRA and adaptively determining the weights of the different indexes, LP_GRA attains the best AUC value on Email.

Table 2
AUC values of different methods on 10 networks. The results are averaged over 50 independent runs with |E_tr| : |E_ts| = 9 : 1. The best performance for each network is shown in boldface

Dataset	RA	ADP	CAR	CCLP	LP	MI	LR_m	LNB_AA	LP_GRA
Jazz	0.9699	0.9711	0.9528	0.9579	0.9499	0.9455	0.9723	0.9635	0.9651
USAir	0.9516	0.9517	0.9127	0.9381	0.9268	0.9122	0.9454	0.9477	0.9475
Email	0.8461	0.8465	0.6968	0.8421	0.9005	0.8514	0.8957	0.8465	0.9014
FBK	0.9944	0.9943	0.9842	0.9921	0.9923	0.9891	0.9935	0.9930	0.9947
Dolphin	0.7748	0.7762	0.6346	0.7698	0.7937	0.6457	0.7689	0.7741	0.7939
Football	0.8464	0.8464	0.8146	0.8420	0.8613	0.7967	0.8531	0.8425	0.8646
Polblog	0.9230	0.9232	0.8921	0.9209	0.9289	0.9238	0.9383	0.9226	0.9300
INF	0.9444	0.9444	0.8622	0.9379	0.9575	0.9159	0.9604	0.9425	0.9616
NS	0.9588	0.9587	0.8154	0.9287	0.9576	0.8509	0.9597	0.9584	0.9640
CE	0.8680	0.8676	0.7650	0.8651	0.8628	0.8321	0.8828	0.8653	0.8724

LR_m also performs well: it obtains three best (Jazz, Polblog, and CE) and two second-best (INF and NS) AUC values. As with LP_GRA, LR_m performs well because it aggregates several structural features. The other baselines, however, do not always give satisfactory results. Take the LP index as an example: it ranks second on Email, Dolphin, and Football, but seventh on USAir and eighth on Jazz. This instability stems from LP's free parameter, whose optimal value differs considerably from network to network [20, 23], and determining the optimal parameter for each network is time-consuming and impractical. In short, the proposed method is more stable than the baselines across different networks.
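To illustrate why LP's parameter matters, the sketch below computes the Local Path index in its usual form S = A^2 + εA^3 [23] on a hypothetical toy graph; the graph, `build_adj`, and `lp_scores` are illustrative assumptions, not part of the paper. The pair (x, y) has one common neighbor but three paths of length 3, whereas (u, v) has two common neighbors and no path of length 3. Hence (u, v) ranks higher for ε = 0.01 (about 1.03 versus 2), while (x, y) ranks higher for ε = 0.5 (2.5 versus 2).

```python
from itertools import combinations


def build_adj(edges):
    """Adjacency sets for an undirected, unweighted graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def lp_scores(adj, eps):
    """Local Path index S = A^2 + eps * A^3 for every non-adjacent node pair."""
    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already linked, so not a candidate
        a2 = len(adj[x] & adj[y])  # (A^2)_xy: number of common neighbours
        # (A^3)_xy: walks x-a-b-y, i.e. sum over neighbours a of x of |N(a) & N(y)|
        a3 = sum(len(adj[a] & adj[y]) for a in adj[x])
        scores[(x, y)] = a2 + eps * a3
    return scores


# Hypothetical toy graph: (x, y) share one neighbour and are joined by three
# length-3 paths; (u, v) share two neighbours and have no length-3 path.
EDGES = [("x", "c"), ("c", "y"),
         ("x", "a1"), ("a1", "b1"), ("b1", "y"),
         ("x", "a2"), ("a2", "b2"), ("b2", "y"),
         ("x", "a3"), ("a3", "b3"), ("b3", "y"),
         ("u", "d1"), ("d1", "v"), ("u", "d2"), ("d2", "v")]

if __name__ == "__main__":
    adj = build_adj(EDGES)
    for eps in (0.01, 0.5):
        s = lp_scores(adj, eps)
        print(eps, s[("x", "y")], s[("u", "v")])
```

The same pair can thus move up or down the ranking purely because of ε, which is consistent with LP ranking very differently across networks.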

Furthermore, the Friedman test [43] is employed to analyze whether the differences between the baselines and LP_GRA in the above AUC results are significant. From Table 2, we obtain $\chi_F^2 = 53.573$ and $F_F = 18.245$. With 9 methods and 10 networks, F_F follows the F-distribution with 8 and 72 degrees of freedom. For α = 0.05, the critical value of F(8, 72) is 2.070. Since F_F = 18.245 > 2.070, the null hypothesis of the Friedman test is rejected; in other words, the methods are not equivalent. A post-hoc Bonferroni-Dunn test is then conducted to assess the significance of the differences between LP_GRA and the baselines. For α = 0.05, the critical difference is $CD = 3.102 \times \sqrt{\frac{9 \times (9 + 1)}{6 \times 10}} = 3.80$. The results are shown graphically in Fig. 2, where the methods are ordered by rank from left to right along the axis, with the best rank on the left. Figure 2 shows that LP_GRA has the best performance among all methods and is significantly better than CCLP, MI, and CAR.
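As a cross-check of these statistics, the following minimal Python sketch (helper names are our own) ranks each row of Table 2 with tied scores sharing the mean rank, computes the Friedman statistic from the average ranks, the Iman-Davenport statistic $F_F = (N-1)\chi_F^2 / (N(k-1) - \chi_F^2)$, and the critical difference. Recomputing from the four-decimal values in Table 2 gives an average rank of 1.8 for LP_GRA but $\chi_F^2 \approx 53.8$ rather than 53.573, presumably because of rounding or tie handling in the unrounded scores; F_F is therefore evaluated from the reported $\chi_F^2$. The multiplier 3.102 is taken from the text; in general $q_\alpha$ depends on the post-hoc procedure and on the number of methods (see [43]).

```python
import math

METHODS = ["RA", "ADP", "CAR", "CCLP", "LP", "MI", "LR_m", "LNB_AA", "LP_GRA"]
# AUC values copied from Table 2 (rows: networks, columns: METHODS).
TABLE2 = {
    "Jazz":     [0.9699, 0.9711, 0.9528, 0.9579, 0.9499, 0.9455, 0.9723, 0.9635, 0.9651],
    "USAir":    [0.9516, 0.9517, 0.9127, 0.9381, 0.9268, 0.9122, 0.9454, 0.9477, 0.9475],
    "Email":    [0.8461, 0.8465, 0.6968, 0.8421, 0.9005, 0.8514, 0.8957, 0.8465, 0.9014],
    "FBK":      [0.9944, 0.9943, 0.9842, 0.9921, 0.9923, 0.9891, 0.9935, 0.9930, 0.9947],
    "Dolphin":  [0.7748, 0.7762, 0.6346, 0.7698, 0.7937, 0.6457, 0.7689, 0.7741, 0.7939],
    "Football": [0.8464, 0.8464, 0.8146, 0.8420, 0.8613, 0.7967, 0.8531, 0.8425, 0.8646],
    "Polblog":  [0.9230, 0.9232, 0.8921, 0.9209, 0.9289, 0.9238, 0.9383, 0.9226, 0.9300],
    "INF":      [0.9444, 0.9444, 0.8622, 0.9379, 0.9575, 0.9159, 0.9604, 0.9425, 0.9616],
    "NS":       [0.9588, 0.9587, 0.8154, 0.9287, 0.9576, 0.8509, 0.9597, 0.9584, 0.9640],
    "CE":       [0.8680, 0.8676, 0.7650, 0.8651, 0.8628, 0.8321, 0.8828, 0.8653, 0.8724],
}


def average_ranks(row):
    """Rank one network's scores, 1 = highest AUC; tied scores share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def friedman_chi2(avg_ranks, n):
    """Friedman statistic from the average ranks of k methods over n data sets."""
    k = len(avg_ranks)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


def iman_davenport(chi2, n, k):
    """F_F statistic, F-distributed with (k-1) and (k-1)(n-1) degrees of freedom."""
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


def critical_difference(q_alpha, k, n):
    """CD = q_alpha * sqrt(k (k + 1) / (6 n)) for rank-based post-hoc tests."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


N, K = len(TABLE2), len(METHODS)
per_network = [average_ranks(row) for row in TABLE2.values()]
avg_rank = [sum(r[j] for r in per_network) / N for j in range(K)]

chi2_recomputed = friedman_chi2(avg_rank, N)  # from the rounded Table 2 values
f_f = iman_davenport(53.573, N, K)           # from the chi^2 reported in the text
cd = critical_difference(3.102, K, N)         # q_alpha as used in the text
best = avg_rank[METHODS.index("LP_GRA")]
worse = [m for m, r in zip(METHODS, avg_rank) if r - best > cd]

if __name__ == "__main__":
    for m, r in sorted(zip(METHODS, avg_rank), key=lambda t: t[1]):
        print(f"{m:7s} {r:.2f}")
    print(f"chi2 (recomputed) = {chi2_recomputed:.3f}, F_F = {f_f:.3f}, CD = {cd:.2f}")
    print("significantly worse than LP_GRA:", worse)
```

With these recomputed ranks, CAR, CCLP, and MI are the only methods whose average rank exceeds LP_GRA's by more than CD, which agrees with the conclusion drawn from Fig. 2. If SciPy is available, `scipy.stats.f.ppf(0.95, 8, 72)` should return a value close to the critical value of 2.070 quoted above.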

Next, Fig. 3 shows how the AUC of all prediction methods changes as the proportion of the training set E_tr in E varies from 0.7 to 0.9. As the proportion increases from 0.7 to 0.9, the AUC scores clearly trend upward. This is expected: a larger training set provides more structural information, whereas the lower the E_tr ratio, the more difficult link prediction becomes. For this reason, results for lower proportions of E_tr, such as 0.6 and 0.5, are not reported. More importantly, Fig. 3 shows that the AUC values of the proposed method are either the best or close to the best on all networks. Figure 3(k) depicts the average ranks of the different methods over all networks for each training-set proportion, and the proposed method always has the best average rank. Additionally, the significance of the differences is analyzed separately for |E_tr|/|E| = 0.8 and 0.7. The corresponding values of F_F are 20.605 and 20.095, both greater than 2.070 (the critical value of F(8, 72)), which implies that the methods are not equivalent. The results of the Bonferroni-Dunn test, shown graphically in Fig. 4, again confirm that the proposed method performs best.

Fig. 2

Comparison of LP_GRA against the others with the Bonferroni-Dunn test. This comparison is based on the results in Table 2. All methods with ranks outside the marked interval are significantly different from LP_GRA.

Fig. 3

AUC results on 10 networks with different proportions of training set E_tr.

Fig. 4

The Bonferroni-Dunn test for |E_tr|/|E| = 0.8 and 0.7. All methods with ranks outside the marked interval are significantly different from LP_GRA.

4.4 Precision results

Figure 5 compares the link prediction methods under the Precision metric on the 10 networks for different values of L. Unlike AUC, which evaluates the ranking as a whole, Precision measures the accuracy of only the top-L predicted links. The results in Fig. 5 show that LP_GRA is consistently at or near the top on most networks, whereas the performance of the baselines fluctuates considerably across networks. For example, RA has the best precision on NS but the worst on Polblog. In addition, the Precision scores of most methods tend to decline as L increases: a larger L admits lower-ranked candidate links, which are less likely to be missing links, so the fraction of correct predictions among the top L decreases.
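Precision at L can be sketched as below, assuming the standard definition Precision = L_r / L, where L_r is the number of links from E_ts among the top-L ranked candidates. `precision_at_l` is a hypothetical helper, and breaking ties by insertion order is an assumption, since the text does not state how equal scores are ordered.

```python
def precision_at_l(scores, test_edges, top_l):
    """Precision = (number of E_ts links among the top-L candidates) / L.

    scores: dict mapping each candidate (unobserved) node pair to its similarity.
    Equal scores keep dictionary insertion order, because sorted() is stable.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_l]
    test = {frozenset(e) for e in test_edges}  # undirected: (a, b) == (b, a)
    hits = sum(1 for pair in ranked if frozenset(pair) in test)
    return hits / top_l


if __name__ == "__main__":
    # Hypothetical scores for four candidates; (1, 2) and (3, 4) are missing links.
    scores = {(1, 2): 0.9, (1, 3): 0.8, (2, 4): 0.5, (3, 4): 0.1}
    test_edges = [(1, 2), (4, 3)]
    for top_l in (1, 2, 4):
        print(top_l, precision_at_l(scores, test_edges, top_l))
```

In this toy case Precision drops from 1.0 at L = 1 to 0.5 at L = 2, because the second-ranked candidate is not a missing link; this mirrors the decline with L discussed above.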

Fig. 5

Precision of different methods on 10 networks with different values of L. The results are averaged over 50 independent runs with |E_ts|/|E| = 0.1. Because E_ts for Dolphin contains only 15 links, the largest L is 15; similarly, the largest L for Football is 60.

Finally, Fig. 6 depicts how Precision changes with the training-set ratio (|E_tr|/|E| from 0.7 to 0.9), with L = |E_ts| for all networks. Fig. 6 shows that the trend of Precision differs from, and is even opposite to, that of AUC: as the proportion of the training set increases from 0.7 to 0.9, the Precision scores gradually decline. This behavior was explained in [62]. In the calculation of AUC (see Eq. (1)), a smaller training set yields a smaller n′ and a larger n″, which lowers the AUC value [62]. On the other hand, a smaller training set means a larger testing set; since L = |E_ts|, there are correspondingly more missing links to be found among the top-L predictions, so the probability of hitting relevant items increases and more missing links are revealed [62].
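The role of n′ and n″ can be made concrete with the standard link-prediction definition AUC = (n′ + 0.5n″)/n, where n′ counts comparisons in which a missing link scores higher than a nonexistent link, n″ counts ties, and n is the number of comparisons; we assume this matches Eq. (1), which is not reproduced in this section. The sketch compares all pairs by default or draws n random comparisons. The scores are hypothetical, chosen so that the second call has fewer wins and more ties, which lowers AUC from about 0.72 to about 0.67.

```python
import random


def auc(missing_scores, nonexistent_scores, n_samples=None, seed=None):
    """AUC = (n' + 0.5 n'') / n.

    n'  : comparisons in which the missing link (from E_ts) scores higher
    n'' : comparisons in which both scores are equal
    n   : all pairs if n_samples is None, else n_samples random draws
    """
    if n_samples is None:
        pairs = [(m, u) for m in missing_scores for u in nonexistent_scores]
    else:
        rng = random.Random(seed)
        pairs = [(rng.choice(missing_scores), rng.choice(nonexistent_scores))
                 for _ in range(n_samples)]
    n_higher = sum(1 for m, u in pairs if m > u)
    n_equal = sum(1 for m, u in pairs if m == u)
    return (n_higher + 0.5 * n_equal) / len(pairs)


if __name__ == "__main__":
    # Illustrative scores; the second set has more tied (zero) scores.
    print(auc([2, 1, 0], [0, 0, 1]))  # n' = 5, n'' = 3 over 9 comparisons
    print(auc([1, 0, 0], [0, 0, 0]))  # n' = 3, n'' = 6 over 9 comparisons
```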

Fig. 6

Precision results on 10 networks with different proportions of training set E_tr.

Overall, the proposed method outperforms the baselines and is hence applicable to a wider range of networks. The key characteristic of LP_GRA is that it integrates multiple structural features of a network via GRA, and its weight calculation method allows it to adapt automatically to different networks and maintain stable performance.
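For readers unfamiliar with GRA, the sketch below shows a textbook formulation (in the spirit of Deng [31]) applied to link prediction: candidate links are alternatives, similarity indexes are attributes, and candidates are ranked by their weighted grey relational grade. This is not the paper's exact procedure. Min-max normalisation, an ideal reference value of 1.0 for every index, the distinguishing coefficient ρ = 0.5, and caller-supplied weights are all assumptions; in particular, the paper's own weight calculation method is not reproduced here.

```python
def grey_relational_grades(matrix, weights, rho=0.5):
    """Weighted grey relational grade of each alternative (textbook GRA).

    matrix[i][j]: score of candidate link i under similarity index j
                  (every index treated as a benefit attribute: larger is better).
    weights[j]:   weight of index j, supplied by the caller (assumption).
    rho:          distinguishing coefficient, conventionally 0.5.
    """
    cols = list(zip(*matrix))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # Min-max normalisation per index; a constant index maps every candidate to 1.0.
    norm = [[(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 1.0
             for j in range(len(cols))] for row in matrix]
    # Deviation from the reference (ideal) sequence, which is 1.0 for every index.
    delta = [[1.0 - v for v in row] for row in norm]
    d_min = min(min(row) for row in delta)
    d_max = max(max(row) for row in delta)
    if d_max == 0:  # every candidate already equals the ideal sequence
        return [float(sum(weights))] * len(matrix)
    grades = []
    for row in delta:
        # Grey relational coefficient per index, then the weighted grade.
        xi = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * x for w, x in zip(weights, xi)))
    return grades


if __name__ == "__main__":
    # Three hypothetical candidate links scored by two indexes that disagree.
    scores = [[3, 1], [1, 9], [2, 5]]
    grades = grey_relational_grades(scores, weights=[0.7, 0.3])
    ranking = sorted(range(len(scores)), key=lambda i: grades[i], reverse=True)
    print(grades, ranking)
```

Swapping the weights to [0.3, 0.7] moves the second candidate ahead of the first, which illustrates why an adaptive weighting of the indexes matters for the final ranking.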

5 Conclusion

This paper proposed LP_GRA, a new link prediction method that aggregates the results of several similarity indexes. LP_GRA regards link prediction as a multiple attribute decision making (MADM) problem, in which potential links are the alternatives and similarity indexes are the attributes. Grey Relation Analysis, a well-known MADM method, is adopted to rank the potential links by solving this MADM problem. Fusing different indexes requires a weight for each index; to this end, a new weight calculation method was designed that adaptively assigns weights to all indexes based solely on the observed network structure. The accuracy and stability of the proposed method were investigated experimentally on 10 benchmark networks under the AUC and Precision metrics. The experimental results demonstrate that the proposed method is superior to the baselines in terms of accuracy and stability. In addition, the experimental analysis suggests that a hybrid similarity model is a feasible way to solve the link prediction problem.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. 61602225) and the Fundamental Research Funds for the Central Universities (No. lzujbky-2019-90).

References

1. Lü and Zhou, Link prediction in complex networks: A survey, Physica A 390(6) (2011), 1150–1170.

2. Haghani and Keyvanpour M.R., A systemic analysis of link prediction in social network, Artificial Intelligence Review 52(2) (2019), 1961–1995.

3. Pandey, Bhanodia P.K., Khamparia and Pandey D.K., A comprehensive survey of edge prediction in social networks: Techniques, parameters and challenges, Expert Systems with Applications 124 (2019), 164–181.

4. Kumar, Singh S.S., Singh and Biswas, Link prediction techniques, applications, and performance: A survey, Physica A 553 (2020), 124289.

5. Sprinzak, Sattath and Margalit, How reliable are experimental protein–protein interaction data? Journal of Molecular Biology 327(2) (2003), 919–923.

6. Stumpf M.P.H., Thorne, de Silva, Stewart, An H.J., Lappe and Wiuf, Estimating the size of the human interactome, Proceedings of the National Academy of Sciences 105(2) (2008), 6959–6964.

7. Zhang Q.-M., Xu X.-K., Zhu Y.-X. and Zhou, Measuring multiple evolution mechanisms of complex networks, Scientific Reports 5 (2015), 10350.

8. , Fang, Bai, Xu, Cheng and Chen, Effective link prediction based on community relationship strength, IEEE Access 7 (2019), 43233–43248.

9. Daud N.N., Ab Hamid S.H., Saadoon, Sahran and Anuar N.B., Applications of link prediction in social networks: A review, Journal of Network and Computer Applications 166(February) (2020), 102716.

10. Guimerà and Sales-Pardo, Missing and spurious interactions and the reconstruction of complex networks, Proceedings of the National Academy of Sciences 106(2) (2009), 22073–22078.

11. Bhowmick S.S. and Seah B.S., Clustering and summarizing protein-protein interaction networks: A survey, IEEE Transactions on Knowledge and Data Engineering 28(2) (2016), 638–658.

12. Cheng, Zhang, Zou, Huang and Zhang, Friend recommendation in social networks based on multi-source information fusion, International Journal of Machine Learning and Cybernetics 10(2) (2019), 1003–1024.

13. , Zhou and Zhang H.-F., Playing the role of weak clique property in link prediction: A friend recommendation model, Scientific Reports 6 (2016), 1–12.

14. Chuan P.M., Son L.H., Ali, Khang T.D., Huong L.T. and Dey, Link prediction in co-authorship networks based on hybrid content similarity metric, Applied Intelligence 48(2) (2018), 2470–2486.

15. , Long, Lv, Shao, He and Duan, Predicting co-author relationship in medical co-authorship networks, PLoS ONE 9(2) (2014), e101214.

16. , Zeng, Gillard and Medo, Network-based recommendation algorithms: A review, Physica A 452 (2016), 192–208.

17. Wang, Liu, Zhang, Chen and Lu, Mixed similarity diffusion for recommendation on bipartite networks, IEEE Access 5 (2017), 21029–21038.

18. Pan, Zhou, Lü and Hu C.-K., Predicting missing links and identifying spurious links via likelihood analysis, Scientific Reports 6(2) (2016), 22955.

19. Lin, An information-theoretic definition of similarity, in: Proceedings of the Fifteenth International Conference on Machine Learning, ICML ’98, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998, pp. 296–304.

20. Zhou, Lü and Zhang Y.C., Predicting missing links via local information, European Physical Journal B 71(2) (2009), 623–630.

21. Newman M.E.J., Clustering and preferential attachment in growing networks, Physical Review E 64(2) (2001), 4.

22. Ahmad, Akhtar M.U., Noor and Shahnaz, Missing Link Prediction using Common Neighbor and Centrality based Parameterized Algorithm, Scientific Reports 10(2) (2020), 364.

23. Lü, Jin C.H. and Zhou, Similarity index based on local paths for link prediction of complex networks, Physical Review E 80(2) (2009), 046122.

24. , Qian, Wang, Luo and Chen, Accurate similarity index based on activity and connectivity of node for link prediction, International Journal of Modern Physics B 29(2) (2015), 1550108.

25. Ayoub, Lotfi, El Marraki and Hammouch, Accurate link prediction method based on path length between a pair of unlinked nodes and their degree, Social Network Analysis and Mining 10(2) (2020), 9.

26. Bai, Li, Cheng, Xu and Chen, Predicting missing links based on a new triangle structure, Complexity 2018 (2018), 1–11.

27. , Lin, Wang and Gregory, Link prediction with node clustering coefficient, Physica A 452 (2016), 1–8.

28. Cannistraci C.V., Alanis-Lobato and Ravasi, From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks, Scientific Reports 3(2) (2013), 1613.

29. , Bao Z.-K. and Zhang H.-F., Improving link prediction in complex networks by adaptively exploiting multiple structural features of networks, Physics Letters A 381(2) (2017), 3369–3376.

30. Martínez, Berzal and Cubero J.-C.C., Adaptive degree penalization for link prediction, Journal of Computational Science 13 (2016), 1–9.

31. Deng J.L., Introduction to grey system theory, Journal of Grey System 1(2) (1989), 1–24.

32. Wei, Grey relational analysis model for dynamic hybrid multiple attribute decision making, Knowledge-Based Systems 24(2) (2011), 672–679.

33. Wang, Zhu and Wang, A novel hybrid MCDM model combining the SAW, TOPSIS and GRA methods based on experimental design, Information Sciences 345 (2016), 27–45.

34. Yazdani, Kahraman, Zarate and Onar S.C., A fuzzy multi attribute decision framework with integration of QFD and grey relational analysis, Expert Systems with Applications 115 (2019), 474–485.

35. Lei, Lu, Wei, Wu, Wei and Guo, GRA method for waste incineration plants location problem with probabilistic linguistic multiple attribute group decision making, Journal of Intelligent & Fuzzy Systems 39(2) (2020), 2909–2920.

36. Wei, Lu, Wei and Wu, Probabilistic linguistic GRA method for multiple attribute group decision making, Journal of Intelligent & Fuzzy Systems 38(2) (2020), 4721–4732.

37. Baranitharan, Ramesh and Sakthivel, Multi-attribute decision-making approach for Aegle marmelos pyrolysis process using TOPSIS and Grey Relational Analysis: Assessment of engine emissions through novel Infrared thermography, Journal of Cleaner Production 234 (2019), 315–328.

38. Liu, Zhang Q.-M., Lü and Zhou, Link prediction in complex networks: A local naïve Bayes model, EPL (Europhysics Letters) 96(2) (2011), 48007.

39. Zhou M.-Y., Liao, Xiong W.-M., Wu X.-Y. and Wei Z.-W., Connecting patterns inspire link prediction in complex networks, Complexity 2017 (2017), 1–12.

40. Herlocker J.L., Konstan J.A., Terveen L.G. and Riedl J.T., Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems 22(2) (2004), 5–53.

41. Tan, Xia and Zhu, Link prediction in complex networks: A mutual information perspective, PLoS ONE 9(2) (2014), e107056.

42. Friedman, A comparison of alternative tests of significance for the problem of m rankings, The Annals of Mathematical Statistics 11(2) (1940), 86–92.

43. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7(2) (2006), 1–30.

44. Dunn O.J., Multiple comparisons among means, Journal of the American Statistical Association 56(2) (1961), 52–64.

45. Liou J.J., Hsu C.C., Yeh W.C. and Lin R.H., Using a modified grey relation method for improving airline service quality, Tourism Management 32(2) (2011), 1381–1388.

46. Kuo M.S. and Liang G.S., Combining VIKOR with GRA techniques to evaluate service quality of airports under fuzzy environment, Expert Systems with Applications 38(2) (2011), 1304–1312.

47. and Song, Study on effectiveness evaluation of weapon systems based on grey relational analysis and TOPSIS, Journal of Systems Engineering and Electronics 20(2) (2009), 106–111.

48. Hwang C.-L. and Yoon, Multiple Attribute Decision Making, Vol. 186 of Lecture Notes in Economics and Mathematical Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 1981.

49. Chen S.-J. and Hwang C.-L., Fuzzy Multiple Attribute Decision Making Methods, in: Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer Berlin Heidelberg, Berlin, Heidelberg, 1992, pp. 289–486.

50. Rao R.V., Introduction to Multiple Attribute Decision-making (MADM) Methods, in: Decision Making in the Manufacturing Environment, Springer London, London, 2007, pp. 27–41.

51. Watts D.J. and Strogatz S.H., Collective dynamics of ‘small-world’ networks, Nature 393(2) (1998), 440–2.

52. Newman M.E.J., Mixing patterns in networks, Physical Review E 67(2) (2003), 026126.

53. Latora and Marchiori, Efficient behavior of small-world networks, Physical Review Letters 87(2) (2001), 198701.

54. Gleiser P.M. and Danon, Community structure in jazz, Advances in Complex Systems 06(2) (2003), 565–573.

55. Guimerà, Danon, Díaz-Guilera, Giralt and Arenas, Self-similar community structure in a network of human interactions, Physical Review E 68(2) (2003), 065103.

56. McAuley and Leskovec, Discovering social circles in ego networks, Vol. 8, Association for Computing Machinery, New York, NY, USA, 2014.

57. Lusseau, Schneider, Boisseau O.J., Haase, Slooten and Dawson S.M., The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations: Can geographic isolation explain this unique trait? Behavioral Ecology and Sociobiology 54(2) (2003), 396–405.

58. Girvan and Newman M.E.J., Community structure in social and biological networks, Proceedings of the National Academy of Sciences 99(2) (2002), 7821–7826.

59. Adamic L.A. and Glance, The political blogosphere and the 2004 U.S. election, in: Proceedings of the 3rd international workshop on Link discovery – LinkKDD ’05, ACM Press, New York, New York, USA, 2005, pp. 36–43.

60. Isella, Stehlé, Barrat, Cattuto, Pinton J.-F. and den Broeck W.V., What’s in a crowd? Analysis of face-to-face behavioral networks, Journal of Theoretical Biology 271(2) (2011), 166–180.

61. Newman M.E.J., Finding community structure in networks using the eigenvectors of matrices, Physical Review E 74(2) (2006), 036104.

62. Yang and Zhang X.-D., Predicting missing links in complex networks based on common neighbors and distance, Scientific Reports 6(2) (2016), 38208.