An adaptive node embedding framework for multiplex networks

Abstract

Network embedding (NE) has emerged as a powerful tool in many applications. Many real-world networks contain multiple types of relations between the same entities and are therefore naturally modeled as multiplex networks. However, random walk-based embedding studies for multiplex networks have paid very little attention to the problems of sampling bias and imbalanced relation types. In this paper, we propose an Adaptive Node Embedding Framework (ANEF) for multiplex networks based on cross-layer node sampling strategies. ANEF is the first framework to focus on the bias issue of sampling strategies. By using Metropolis-Hastings random walk (MHRW) and forest fire sampling (FFS), ANEF is less likely to be trapped in local structures around high-degree nodes. We utilize a fixed-length queue to record previously visited layers, which balances the edge distribution over different layers in the sampled node sequences. In addition, to adaptively sample the cross-layer context of nodes, we propose a node metric called the Neighbors Partition Coefficient (NPC). Experiments on real-world networks in diverse fields show that our framework outperforms state-of-the-art methods in application tasks such as cross-domain link prediction and mutual community detection.

Keywords

Multiplex networks; node embedding; network sampling; neighbors partition coefficient

1. Introduction

Networks are ubiquitous data structures used to model relations among entities, as in social networks, co-authorship networks, and biological networks. Network analysis and mining has become one of the most active research fields. In the real world, information about a given entity can be collected from various sources and in different scenarios [57]. Take social networks as an example. Social relationships are diverse in both platform and type, e.g., Facebook, Twitter, LinkedIn [19], Lunch, Leisure, Co-authorship and Work [2]. These multi-relational social networks carry rich information and can reflect comprehensive and accurate user profiles. Therefore, by combining data of different types and sources that describe the same entities, we can obtain a more accurate and nuanced picture of network structure than any single source can provide alone [39]. In relation extraction from multi-source and multi-modal data [38, 14, 49], in particular, networks can be extracted from video, text, and audio, respectively. Each network reflects the connectivity among nodes in a single view only. Therefore, data analysis results can easily be misinterpreted if we rely on data from a single source or modality. In the fusion and expansion of domain-specific knowledge graphs [4], knowledge graphs from many other domains are used to expand the relationships of an existing knowledge graph. In social network analysis [56, 55], many new online social networks have emerged and started to provide services, but the information available about users in these emerging networks is usually very limited. The abundant information available in mature networks can be useful for link prediction and community detection in the emerging networks. In biological multi-omics research [33, 25], researchers can use individuals’ expression in each omic to construct a network structure for the different omics. 
One can integrate different data types by constructing a network of samples (rather than genomic features) for each data type, and then fuse these networks into one comprehensive network. Using this network can achieve more accurate prediction and analysis. To model such heterogeneous information networks, a multiplex network is an effective and reliable model.

A multiplex network is made up of multiple layers, each of which can represent a given social scenario, relationship type, view, dimension, or temporal instance. Therefore, we use a multiplex network to model multi-source, multi-relational and multi-view networks, in which different types of entity interactions are regarded as different layers. Multi-source networks are multiple networks constructed from different sources. Multi-view networks are also multiple networks, each being a different view captured from a comprehensive network or dataset. A multiplex network models such network data with each type of relation or view as one layer, and each layer has the same set of nodes. In a multiplex network, fusing the information of a node across multiple layers is a fundamental issue for the joint analysis of networks. In addition, network embedding is an effective method for analyzing and mining networks: it projects nodes (or networks) into a continuous low-dimensional space. Intuitively, a straightforward approach is to model the information fusion problem of nodes as a feature fusion problem. Based on the fused embedding, we can further mine the network data for node classification, link prediction, node clustering, and visualization [45, 46]. Thus, in this paper, we focus on multiplex network embedding. Moreover, we use the learned latent representations to carry out cross-domain link prediction and mutual community detection tasks in multiplex networks.

In the past few years, several methods have been proposed for multiplex network embedding. According to the resulting node embeddings, we divide these methods into two categories: fused embedding and separate embedding. Among fused embedding methods, to capture the structure of a node in multiple layers as a single comprehensive representation, Liu et al. [26] extended Node2vec [11] to multilayer network representation learning. Zitnik and Leskovec [60] then proposed the OhmNet framework to learn protein features in different tissues. OhmNet applies Node2vec to construct network neighborhoods for each node in each layer. Among separate embedding methods, Bagavathi and Krishnan [1] presented a fast and scalable embedding technique for multilayer networks. To model within-layer connections and cross-layer network dependencies simultaneously, Li et al. [23] designed a unified optimization framework for multilayer network embedding. Lu et al. [27] proposed four multilayer network embedding algorithms based on non-negative matrix factorization. However, the above algorithms leave some gaps. On the one hand, scalability is a major bottleneck of factorization-based methods, which are memory intensive, computationally expensive, and sometimes infeasible [53]. On the other hand, network sampling-based methods such as random walk (RW), breadth-first sampling (BFS) and depth-first sampling (DFS) are not only biased towards high-degree nodes (hubs) but also have limited capacity to capture complex interactions among nodes. Furthermore, once a hub is picked, every node connected to the hub is selected in the next step. The sampler is therefore more likely to get stuck in a local structure and less likely to jump around the network [16, 21, 42]. 
Besides, in the multiplex network sampling process, existing methods control cross-layer sampling with a fixed parameter and therefore struggle to balance different relationship types, which can result in the loss of a certain relationship type. Solving these problems raises three major challenges. First, how can the bias of sampling strategies be avoided? Second, how can the types of sampled relations be balanced gracefully? Third, how can the cross-layer context of nodes be sampled adaptively?

In response to these challenges, we propose an adaptive node embedding framework for multiplex networks (ANEF). ANEF samples the cross-layer context of a node by Metropolis-Hastings random walk (MHRW) and forest fire sampling (FFS). Both sampling methods not only avoid the bias towards high-degree nodes but can also sample several nodes simultaneously to improve efficiency. They are less likely to be trapped locally and thus generate more accurate node sequences. Specifically, MHRW fits symmetric and asymmetric distributions and samples complex network structures through an acceptance/rejection function. FFS can adjust the balance between staying local and jumping around the network by changing the burning probability parameter. In addition, inspired by the JUST algorithm [17], we construct a fixed-length queue to balance the edge distribution over different layers. To achieve adaptive cross-layer sampling, we propose a neighbors partition coefficient (NPC) as an indicator for cross-layer jumping. The main contributions of this paper are as follows:

1. We propose an adaptive node embedding framework for multiplex networks, named ANEF. We implement two representation learning methods for multiplex networks as instances of the framework, using the more effective MHRW and FFS sampling methods to avoid bias towards high-degree nodes.
2. We modify two sampling algorithms (MHRW and FFS) to balance the edge distribution over different layers in sampled node sequences. We utilize a fixed-length queue to record previously visited layers. The cross-layer jumping operations of MHRW and FFS are constrained by this queue.
3. We propose a neighbors partition coefficient, named NPC, which is intended to reveal the link modes of a node in different layers. By considering the distribution of node neighbors in different layers, NPC can supervise the cross-layer jump process in an adaptive manner.
4. We compare our results on cross-domain link prediction and mutual community detection tasks with those of state-of-the-art methods, using six real networks as datasets. The experiments verify that our methods outperform existing state-of-the-art approaches.

Furthermore, ANEF can not only be applied to the embedding of multi-relational networks but can also be easily extended to the analysis of dynamic networks. For dynamic network embedding, the additional condition is to restrict the order of cross-layer jumping. The remainder of the paper is organized as follows. In Section 2, we summarize related work on network embedding and network sampling. In Section 3, we give the definitions needed to understand the problem, framework, and algorithm. In Section 4, we introduce the proposed framework and algorithms. In Section 5, we describe our experimental setup and evaluate the proposed approach. Finally, in Section 6, we draw our conclusions and discuss future research directions.

2. Related work

In this section, we first briefly describe the ideas of network embedding and network sampling, as well as their correlation. Then, we respectively introduce related works about traversal-based network embedding and network sampling methods. Finally, we review related work on network sampling and analyze the bias of familiar sampling algorithms.

Figure 1.

The relation between network embedding and network sampling. Traversal-based network embedding methods have the same goal as network sampling, which is to preserve the original network structural information by sampling path or nodes sequences.

The structural properties of networks can be divided into three categories: microscale, mesoscale, and macroscopic structure. They mainly include the in/out degree distribution, path length distribution, clustering coefficient, eigenvalues, community structure, motifs, shrinking diameter, higher-order connection types, node type distribution, average clustering coefficient over time, intra-link and inter-link types, and so on. Fig. 1 shows that network sampling methods and network embedding methods share common parts. Both processes need to generate sampling paths or node sequences, and these paths or sequences need to preserve the properties of the original network. Many network sampling methods have been proposed to capture the microscale, mesoscale and macroscopic structure of the original network. Not every network sampling method is a chaining process like random walk; some, like forest fire sampling, are diffusion processes.

2.1 Traversal-based network embedding

Embedding techniques using random walk to obtain node representations have been proposed: DeepWalk [40] is the first algorithm based on random walk to learn node representation. Node2vec [11] is based on Breadth-First-Search (BFS) and Depth-First Search (DFS). Both algorithms are traditional network embedding examples. Perozzi et al. introduced DeepWalk, inspired by language modeling, more precisely by word2vec [35] algorithm with the goal of learning representation of nodes. Grover et al. presented Node2vec, an extension of DeepWalk algorithm. Their model is based on the design of a biased random walk mechanism controlled by two parameters $p$ and $q$ . These two parameters are allowed to tune the bias of the random walk, from exploring only neighbors to being able to visit node sequences even farther from the root node. Although their high scores of learning efficiency and accuracy have drawn much attention, they still lack in transparency. Lyu et al. [28] analyzed the characteristics and the downsides of random-walk based sampling strategies and proposed a network embedding framework combining both structural similarity and neighborhood information. Hu et al. [15] proposed URGE, a proximity preserved embedding method for uncertain graph embedding. They investigated expected Jaccard similarity and probabilistic random walk with restart proximities, as well as efficient algorithms for computing. Recently Gu et al. [12] proposed an approach based on the open-flow network model to reveal the underlying flow structure and its hidden metric space of different random walk strategies on networks. It shows that the essence of network embedding by random walk is the latent metric defined on the open-flow network.

In order to learn the representation of multi-relations heterogeneous information networks based on traversal-based methods, the following algorithm is proposed recently. Dong et al. [8] proposed a strategy for random walk sampling from heterogeneous networks, where the random walk is restricted to transition between particular types of nodes. This strategy allows many methods to be applied to heterogeneous graphs and is complementary to the idea of taking type-specific encoders and decoders into account. Leonardo et al. [43] presented Struc2vec, a novel and flexible framework with the target to learn latent representations for the structural identity of nodes. The framework uses a hierarchy to measure node similarity at different scales, and constructs a multilayer graph to encode structural similarities and generate a structural context for node. It constructs a weighted multilayer graph based on measuring structural similarity where all nodes in the network are present in every layer. In some cases, graphs have multiple “layers” that contain copies of the same nodes. In these cases, it can be beneficial to share information across layers so that a node’s embedding in one layer can be informed by its embedding in other layers. Qu et al. [41] proposed an attention based method (MVE) to learn the weights of views for different nodes with a few labeled data. MVE can get the robust node representations across different views by vote strategy. Zitnik et al. [60] proposed OhmNet framework to learn features of proteins in different tissues. They represented each tissue as a network, where nodes represent proteins. Individual tissue networks act as layers in a multilayer network, where they use a hierarchy to model dependencies between the layers (i.e., tissues). Recently, Liu et al. [26] extended a standard graph mining into the area of multilayer network. 
The proposed methods (“network aggregation”, “results aggregation” and “layer co-analysis”) can project a multilayer network of a continuous vector space. On one hand, without leveraging interactions among layers, “network aggregation” and “results aggregation” apply the standard network embedding method on the merged graph or each layer to find a vector space for multilayer networks. On the other hand, in order to consider the influence of interactions among layers, “layer co-analysis” expands arbitrary single-layer network embedding method to a multilayer network. Aakas et al. [58] proposed highly scalable node embedding for link prediction in large-scale networks. The method learns the co-occurrence features of node pairs to embed a node into a vector by a damping-based random walk algorithm. In the nodes sampling process, there is a bias problem with these existing methods that samples are trapped in a local structure. In addition, cross-layer sampling heavily depends on fixed parameters, which is an inflexible manner. Ma et al. [29] implemented node embedding for multi-dimensional networks with hierarchical structure. They simply added up node embedding in multiple dimensions as the fusion feature of nodes in multiple networks. Matsuno et al. [34] presented a multilayer network embedding method (MELL) that captures and characterizes each layer’s connectivity. The method utilizes the overall structure to consider similar or complementary structure of the layer. Finally, the fusion feature learning of nodes in multiplex networks is obtained by combining node embedding in each layer with layer vectors. Zhang et al. [54] proposed a scalable multiplex network embedding method (MNE), which assumes that the same nodes in multiple networks preserve certain common features and unique features of each layer. Thus, the common and unique embedding of nodes in each layer is learned by the Deepwalk algorithm separately. Sun et al. 
[44] presented MEGAN, a framework for multi-view network embedding based on a generative adversarial network, which aims to preserve the information from the individual network views while accounting for connectivity across different views. Wei et al. [47] proposed an attributed node random walk framework, which not only incorporates both topology and attribute information flexibly but also deals easily with missing data and can be applied to large networks. For the multiple network alignment problem, Chu et al. [5] proposed a cross-network embedding method (CrossMNA). CrossMNA defines two categories of embedding vectors for each node, the inter-vector and the intra-vector. The idea of CrossMNA is the same as that of MNE. The authors argue that, owing to the semantics, the intra-vector contains both the commonness among counterparts and the specific local connections in its selected network. Cen et al. [3] focused on embedding learning for attributed multiplex heterogeneous networks, where different types of nodes might be linked with multiple different types of edges, and each node is associated with a set of different attributes. Their method, GATNE, splits the overall node embedding of GATNE-I into three parts: base embedding, edge embedding, and attribute embedding. GATNE-T contains only the first two parts.

2.2 Network sampling

In the above methods based on random walks, these researchers utilize random walk to generate the context of nodes that preserve the local topology properties of nodes. The random walk method is one of network sampling approaches. The sampling of large networks is a fundamental data mining problem. When the network is huge and it is costly or infeasible to process the network in its entirety, the network sampling is often the most realistic option to infer network properties and to obtain estimates about basic topological properties [10, 51]. The sampling methods have been successfully used for many network-related measurements, ranging from estimating the size of the network or even to higher-order properties such as the motif, clustering coefficient, community structure [31, 18], and more. Actually, the goal of network sampling method is similar to that of network representation. Both the local and global topological structural properties of the network are preserved by the generated node sequence or edges chain. The network sampling depends on appropriate criteria for choosing sampling methods to measure the properties more accurately.

Crucial sampling strategies include BFS, DFS, FFS, snowball sampling, random walk, Metropolis-Hastings random walk, reweighted random walk and respondent-driven sampling. The BFS method begins with a random node and visits its neighbors iteratively. DFS is analogous to BFS but is derived from the depth-first search algorithm. Snowball sampling is another traversal-based sampling method. It first randomly selects a starting node and puts it into the current node set; then all nodes connected to any node in the current node set are chosen and added to the set recursively until the required number of nodes has been selected. Simple random walk, BFS, DFS, and snowball sampling are biased towards high-degree nodes. Another example is the forest fire sampling (FFS) method [21], which takes advantage of partial BFS where only a fraction of neighbors is followed for each node. The algorithm starts by picking a node uniformly at random and adding it to the sample. Then it “burns” a random proportion of its outgoing links and adds those edges, along with the incident nodes, to the sample. Metropolis-Hastings random walk (MHRW) is an example of a node sampling-based algorithm. In the node sampling process, nodes are sampled independently and uniformly at random.
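
The forest fire procedure described above can be illustrated with a minimal single-layer sketch. It is not the implementation used in this paper: the function name `forest_fire_sample`, the adjacency-dictionary input, the parameter `p_burn`, and the choice of a geometrically distributed number of burned neighbors are all assumptions made for illustration.

```python
import random
from collections import deque


def forest_fire_sample(adj, n_target, p_burn=0.5, seed=None):
    """Return a list of n_target distinct nodes sampled by a forest-fire-style spread.

    adj maps each node to a list of its neighbours (assumed input format).
    """
    if not 0 <= p_burn < 1:
        raise ValueError("p_burn must lie in [0, 1)")
    rng = random.Random(seed)
    nodes = list(adj)
    n_target = min(n_target, len(nodes))
    sample, seen, frontier = [], set(), deque()
    while len(sample) < n_target:
        if not frontier:
            # Ignite (or re-ignite) the fire at a random node not yet sampled.
            start = rng.choice([v for v in nodes if v not in seen])
            seen.add(start)
            sample.append(start)
            frontier.append(start)
            continue
        u = frontier.popleft()
        unburned = [v for v in adj[u] if v not in seen]
        rng.shuffle(unburned)
        # Assumption: the number of neighbours to burn is geometric,
        # with mean p_burn / (1 - p_burn).
        k = 0
        while rng.random() < p_burn:
            k += 1
        for v in unburned[:k]:
            if len(sample) >= n_target:
                break
            seen.add(v)
            sample.append(v)
            frontier.append(v)
    return sample
```

A larger `p_burn` makes the fire spread more widely around each burning node before it moves on, while a smaller value makes it die out quickly and restart elsewhere.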

However, existing network embedding and network sampling algorithms are difficult to extend adaptively to multiplex networks, and the relationship types they sample are imbalanced. Hence, we propose two adaptive multiplex network sampling algorithms based on MHRW and FFS to solve these problems. Each algorithm has different biases [32]. MHRW avoids the bias of random walk towards high-degree nodes. FFS captures the module structure information of the network. Also, to achieve a more efficient representation of multiplex networks, we propose a traversal-based network embedding learning method based on NPC. The detailed definitions and implementation steps are presented in the following sections. Noboa et al. [9] studied the influence of network structure and sampling on relational classification. The network structure factors include heterophilic/homophilic networks, link density, degree assortativity, and the selection of seed nodes.

3. Problem definition

In this section, we describe related concepts and definitions in detail. Firstly, the basic concepts of network sampling and multiplex networks are as shown below. Secondly, we formulate generalized random walk in multiplex networks. Lastly, we define a node embedding problem for multiplex network embedding.

Network sampling. Let $G=\{V,E\}$ denote the input graph. A sampling technique is a function $f:G\rightarrow G_{s}$ with sampling rate $0<\varphi<1$, where $G_{s}=\{V_{s},E_{s}\}$ is the sampled network in which $V_{s}\subseteq V$, $E_{s}\subseteq E$ and $|V_{s}|=\varphi\times|V|$.
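
As a concrete instance of this definition, the sketch below draws $V_{s}$ uniformly at random and keeps the edges induced on it. Uniform node selection, rounding $\varphi\times|V|$ to the nearest integer, and the name `sample_induced_subgraph` are illustrative assumptions; the definition itself does not prescribe how $f$ chooses nodes.

```python
import random


def sample_induced_subgraph(nodes, edges, phi, seed=None):
    """Return (V_s, E_s) with |V_s| = round(phi * |V|) and E_s induced on V_s."""
    if not 0 < phi < 1:
        raise ValueError("the sampling rate phi must satisfy 0 < phi < 1")
    rng = random.Random(seed)
    node_list = list(nodes)
    # Keep at least one node so the sampled graph is never empty (assumption).
    k = max(1, round(phi * len(node_list)))
    v_s = set(rng.sample(node_list, k))
    # Keep only edges whose two endpoints were both sampled.
    e_s = {(u, v) for (u, v) in edges if u in v_s and v in v_s}
    return v_s, e_s
```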

Multiplex networks. Consider a multiplex network with $L$ layers and $N$ nodes, in which each node can interact with the others through $L$ different kinds of relationships. An aligned multiplex network $G=\{G^{\alpha}(V,E^{\alpha}),\ \alpha\in[1,L]\}$ is made up of $L$ layers with $N=|V|$ nodes and $M=\sum_{\alpha=1}^{L}|E^{\alpha}|$ edges.

$i^{\alpha}$ denotes the node $i$ at layer $\alpha\in[1,L]$, and $e_{i,j}^{\alpha}$ denotes the edge between nodes $i^{\alpha}$ and $j^{\alpha}$ at layer $\alpha$. The edge $e^{\alpha,\beta}_{i}$ indicates that nodes $i^{\alpha}$ and $i^{\beta}$ are duplicates of the same entity $i$ in different layers. $e_{i,j}^{\beta}$ denotes an anchor link between node $i$ and node $j$ in layer $\beta$. Besides intra-layer edges, an inter-layer edge $e_{i,j}^{\alpha,\beta}$ indicates that nodes $i$ and $j$ are connected across the two layers $\alpha$ and $\beta$ by the inter-layer edge $e^{\alpha,\beta}_{i}$ followed by the intra-layer edge $e_{i,j}^{\beta}$. For the adjacency matrix $A^{\alpha,\beta}\in R^{N\times N}$, the element $A_{i,j}^{\alpha,\beta}=1$ if node $i$ and node $j$ are connected across layers; otherwise, $A_{i,j}^{\alpha,\beta}=0$.

Generalized random walk in multiplex networks. Suppose the walker starts at node $i^{\alpha}$ in layer $\alpha$ and then walks to $j^{\beta}$. A selection probability $p_{i,j}^{\alpha,\beta}$ determines the neighbor $j^{\beta}$ that is reached through $i^{\beta}$, the duplicate in layer $\beta$ of node $i^{\alpha}$, where $\alpha,\beta\in[1,L]$ are layers of graph $G$. If $\alpha=\beta$, nodes $i$ and $j$ are in the same layer. If $\alpha\neq\beta$, $i^{\alpha}$ and $j^{\beta}$ are connected through the duplicate $i^{\beta}$ of node $i^{\alpha}$; $i^{\alpha}$ and $i^{\beta}$ are the same entity at different layers. Therefore, $\forall i,j\in V$, $p_{i,j}^{\alpha,\beta}$ is given by

$\displaystyle p_{i,j}^{\alpha,\beta}=\begin{cases}f_{s}(i,j),&\alpha=\beta,\\ f_{d}(i,j),&\alpha\neq\beta,\\ 0,&e_{i,j}^{\alpha}\vee e_{i,j}^{\beta}\notin E,\end{cases}$ (1)

where $f_{s}(i,j)$ is the probability that node $i$ jumps to neighbor $j$ when $i$ and $j$ are in the same layer. The specific implementation of this function can be found in [21]. $f_{d}(i,j)$ is the probability that node $i$ jumps to neighbor node $j$ in a different layer. If there is no edge between the node pair, the jumping probability is equal to 0. $f_{s}$ and $f_{d}$ jointly control the direction of the walker.
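
Eq. (1) can be read as a simple dispatch between $f_{s}$ and $f_{d}$. The sketch below is a hedged illustration rather than the paper's code: `transition_probability` and the `layers` structure (a list of dictionaries mapping a node to its neighbor set, indexed from 0) are hypothetical, and the zero case is interpreted as "the edge $(i,j)$ is missing in the target layer $\beta$".

```python
def transition_probability(i, j, alpha, beta, layers, f_s, f_d):
    """Evaluate Eq. (1) for a move from node i in layer alpha to node j in layer beta.

    layers[k] maps each node to the set of its neighbours in layer k.
    f_s and f_d are caller-supplied functions (i, j) -> probability.
    """
    # The walker reaches j in layer beta through the duplicate of i in that
    # layer, so the edge (i, j) has to exist in layer beta (assumed reading).
    if j not in layers[beta].get(i, set()):
        return 0.0
    # Same layer: intra-layer probability; different layers: cross-layer one.
    return f_s(i, j) if alpha == beta else f_d(i, j)
```

For example, with two layers in which node 0 has neighbor 1 in layer 0 and neighbor 2 in layer 1, a move from layer 0 to node 2 in layer 1 is scored by `f_d`, while a move to node 1 in layer 1 has probability 0.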

Node embedding on multiplex network. Suppose the methods can make use of a real-valued adjacency tensor $A\in R^{N\times N\times L}$ (e.g., representing text or additional information with nodes). The goal of node embedding is to learn the mapping function $\textit{ENC}:A\rightarrow R^{N\times d}$.

ENC is a mapping function that learns a $d$-dimensional representation of node $i$ from $A_{i}$, where $A_{i}$ is the vector of adjacency matrices of node $i$, $A_{i}=\{A^{1}_{i},\ldots,A^{\alpha}_{i},\ldots,A^{L}_{i}\}$, $A^{\alpha}_{i}$ is the $d$-dimensional vector, and $d\ll N$. Notice that all definitions above can be easily extended to the case of weighted multiplex networks.

Inspired by a family of random walk-based network embedding methods [26, 11, 40, 43, 22, 52, 24, 59], we summarize the process of random walk-based representation learning for multiplex networks in the following three steps:

The first step is to calculate the probabilities of intra-layer and inter-layer jumps between nodes according to certain topological properties $\phi$ and $\psi$, respectively. The probability of intra-layer jumping determines which neighbor node the walker should jump to. In a single-layer network, $\phi$ can be degree, centrality, feature similarity or structural proximity. In multilayer networks, $\psi$ should be added to measure the multiplexity property. $\psi$ can be the edge overlap number, the intra-class correlation coefficient (ICC), the pairwise Pearson correlation coefficients, the entropy of the multiplex degree or the multiplex participation coefficient. It determines the probability of staying in the original layer or jumping across layers.

The second step aims at generating node context by a sampling method with an instance of $\phi$ and an instance of $\psi$ . In other words, after the node property $\phi$ and $\psi$ are determined, a multiplex network sampling function is also determined. Executing this sampling function can produce the sample node sequences which are regarded as the context of nodes.

The third step is to feed the node sequences to a Skip-Gram [20] model for learning embedding, including the optimization of learning model and the update of parameters. Empirically, the quality of the generated node context directly affects the quality of the final learned embedding vectors.

Take the method of [26] as an example: $\phi$ is embodied by the two parameters $p$ and $q$ of Node2vec, and $\psi$ is embodied by a fixed parameter. This method lacks flexibility and does not take into account the type imbalance of sampled edges. Algorithm 4 presents the pseudo-code of the generalized node embedding for multiplex networks based on truncated traversal sampling methods.

Algorithm 4. The generalized multiplex network embedding based on truncated network sampling
Input: graph $G=\langle V,E,L\rangle$, truncate length $\tau$, number of iterations $iters$
Assumptions: $L_{s}$: sequence of selected nodes with size $\tau$; $St$: visited nodes list for sampling multiplex networks; $E_{g}$: the matrix of node embeddings
$L_{s}=$ None
for node in $V$ do
    for 1 $\rightarrow$ $iters$ do
        $St=\textit{Sampling\_Algorithm}$($G$, node, $\tau$)
        Append($L_{s}$, $St$)
$E_{g}=\textit{Skip-gram}(L_{s})$
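
The sampling loop of Algorithm 4 can be sketched as follows. This is only an illustration under stated assumptions: the sampler is passed in as a function, `uniform_walk_sampler` is a plain single-layer random walk used as a stand-in for Sampling_Algorithm, and the Skip-gram training step is left out, since the resulting list of node sequences can be handed to any Skip-gram implementation.

```python
import random


def uniform_walk_sampler(adj, seed=None):
    """Build a stand-in sampler: a truncated uniform random walk on one layer.

    adj maps each node to a list of neighbours (hypothetical input format).
    """
    rng = random.Random(seed)

    def sampler(start, tau):
        walk = [start]
        while len(walk) < tau:
            neighbours = adj.get(walk[-1], [])
            if not neighbours:  # dead end: return the shorter walk
                break
            walk.append(rng.choice(neighbours))
        return walk

    return sampler


def build_node_sequences(nodes, sampler, tau, iters):
    """Collect iters truncated sequences of length tau per start node."""
    sequences = []
    for node in nodes:
        for _ in range(iters):
            sequences.append(sampler(node, tau))
    return sequences
```

Keeping the sampler as a parameter mirrors the role of Sampling_Algorithm in the pseudo-code: replacing `uniform_walk_sampler` with a multiplex-aware sampler changes the node context without touching the outer loop.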

4. Adaptive sampling-based multiplex network embedding

Figure 2.

Our sampling-based node embedding framework for multiplex networks. This framework contains two components: A. Node sample process and B. Representation Learning process.

In this section, we first define the neighbors partition coefficient, which enables adaptive cross-layer jumping. Then we propose a method that uses a fixed-length queue to solve the type imbalance problem of sampled edges. We implement two adaptive node embedding methods for multiplex networks based on MHRW and FFS, named MHME and FFME respectively. Both alleviate the bias problem of random walk, DFS, BFS, etc. Finally, the optimization of network representation learning is given for iteratively updating vectors. The two parts of Fig. 2 show the intra-layer sampling, NPC, and cross-layer jumping in ANEF.

4.1 Adaptive cross-layer jump problem

In order to achieve adaptive cross-layer sampling of nodes, we propose a neighbors partition coefficient (NPC). It measures the multiplexity property of a node in a multiplex network and is used to decide whether the walker stays in the original layer or jumps to another layer. The original multiplexity measure of a node in a multiplex network considers only the number of neighbors of the node and does not distinguish its overlapping neighbors.

Neighbors Partition Coefficient (NPC) of a node in multiplex networks. Supposing $\widetilde{L}$ is the set of layers in which node $i$ is not isolated, we can formulate the NPC of node $i$:

$\displaystyle\textit{NPC(i)}=\frac{2}{|\widetilde{L}|(|\widetilde{L}|-1)}\sum\limits_{\alpha<\beta}^{E}\frac{\delta(\phi(\Gamma_{i}^{\alpha},\Gamma_{i}^{\beta}))}{\delta(\Gamma_{i}^{\alpha})},$ (2)

where $\Gamma_{i}^{\alpha}$ is the set of node $i$’s neighbors in layer $\alpha$, and $\alpha$ and $\beta$ denote layers in which node $i$ is located. $\delta$ is a function that calculates the size of a set. $\phi$ is a function that calculates the intersection of two sets. Intuitively, if the neighbor sets of node $i$ in different layers are pairwise disjoint, $\textit{NPC(i)}$ is equal to 0; if they are identical in every layer, it is equal to 1.
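
A minimal sketch of Eq. (2) is given below. It assumes that the sum runs over all ordered layer pairs $\alpha<\beta$ in $\widetilde{L}$ and, as a convention that the definition does not cover, returns 1 when the node is active in fewer than two layers (so the walker stays put). The function name `npc` and the list-of-sets input are hypothetical.

```python
from itertools import combinations


def npc(neighbors_by_layer):
    """Neighbors partition coefficient of one node.

    neighbors_by_layer is a list of sets; entry k holds the node's
    neighbours in layer k (an empty set means the node is isolated there).
    """
    # Keep only layers in which the node is not isolated (the set L-tilde).
    active = [gamma for gamma in neighbors_by_layer if gamma]
    m = len(active)
    if m < 2:
        # Assumption: with no second active layer there is nowhere to jump.
        return 1.0
    # combinations() preserves layer order, so a is layer alpha, b is layer beta.
    total = sum(len(a & b) / len(a) for a, b in combinations(active, 2))
    return 2.0 * total / (m * (m - 1))
```

For a node with neighbors $\{1,2\}$ in one layer and $\{2,3\}$ in another, there is one layer pair with overlap $1/2$, so the coefficient is $0.5$.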

In terms of node properties, the distribution of a node’s neighbors across layers reveals the node’s link mode. In addition, the impact on the whole network of deleting a specific node in a certain layer differs accordingly. In the node sampling process, nodes with larger NPC should have a smaller cross-layer jumping probability. The goal of this setting is to make the sampling path contain as many different neighbors as possible by jumping frequently between layers. Therefore, the NPC of a node is regarded as an indicator of staying in the original layer for the next sampling operation.

4.2 Type imbalance of sampling edges problem

For this problem, we use a fixed-length queue to memorize the layers that were previously visited. We will select a layer that is not in the queue to treat as a target layer will jump.

.

Cross-layer jumping strategy. We define a fixed-length queue $Q_{s}$ of size $s$ to memorize up-to- $s$ previously visited layers. According to NPC of a node, the probabilities of walker cross-layer jumpy is given by:

$\displaystyle P_{i,j}^{\alpha,\beta}=\left\{\begin{array}{ll}\textit{NPC(i)}+\frac{1}{L},&\beta=\alpha\\ \frac{1-\textit{NPC(i)}}{L-|Q_{s}|},&\beta\in[(0,L)-Q_{s}]\\ \frac{(L-|Q_{s}|-1)(1-\textit{NPC(i)})}{L},&\beta\in Q_{s}\\ 0,&\textit{otherwise}.\end{array}\right.$ (3)

To balance the types of edges sampled among different layers without missing the information of some layers, the strategy considers both the NPC of the node and the queue of visited layers. If the queue $Q_{s}$ is not empty, we choose to jump to a layer outside it. Then, we update $Q_{s}$ in a first-in-first-out (FIFO) manner, dropping the oldest layer if $Q_{s}$ is full. Instead of introducing a new parameter, this strategy generates the context of nodes via adaptive cross-layer sampling.
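The interplay between NPC and the FIFO queue can be sketched as follows. This is a simplified reading of the strategy rather than an exact implementation of Eq. (3): the walker stays with probability NPC and otherwise picks uniformly among layers outside the queue. The function name `next_layer`, its parameters, and the fallback used when every layer is already in the queue are assumptions of this sketch.

```python
import random
from collections import deque


def next_layer(current, layers, npc_value, visited, rng=random):
    """Choose the layer for the next sampling step.

    current:   the walker's current layer
    layers:    list of all layer ids
    npc_value: NPC of the current node, used directly as the stay probability
    visited:   deque(maxlen=s) of recently visited layers (the queue Q_s)
    """
    if rng.random() < npc_value:
        return current  # stay in the current layer
    # Record the layer being left; a deque with maxlen drops its oldest
    # entry automatically when full, which gives the FIFO behaviour.
    visited.append(current)
    candidates = [layer for layer in layers if layer not in visited]
    if not candidates:
        # Assumption: if all layers are in the queue, allow any other layer.
        candidates = [layer for layer in layers if layer != current] or [current]
    return rng.choice(candidates)
```

With three layers and a queue of size two, a walker that always jumps cycles through all three layers before returning to the first one, which is the balancing effect the queue is meant to achieve.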

4.3 Node embedding based on adaptive sampling

For the Sampling_Algorithm function in Algorithm 4, we provide implementations for different purposes based on MHRW and FFS. The MHRW algorithm uses an acceptance/rejection function to capture complex network structure information while avoiding the bias of random walk [50]. The FFS algorithm can not only avoid the bias problem but also capture the modular property of networks. Consequently, we propose the MHME and FFME methods for node embedding of multiplex networks, based on MHRW and FFS respectively.

4.3.1 MHME

To take advantage of MHRW, we propose a truncated Metropolis-Hastings random walk based on NPC and $Q_{s}$ for network embedding, called MHME.

Metropolis-Hastings random walk. The algorithm applies the Metropolis-Hastings (MH) strategy to random walk for unbiased graph sampling. The probability of node $j$ being selected from node $i$'s neighbors is given by

$\displaystyle f^{mh}_{s}(i,j)=\left\{\begin{array}{ll}\textit{NPC(i)}/d_{i}\times\min(1,d_{i}/d_{j}),&\textit{if}\ j\in\Gamma(i)\\ \textit{NPC(i)}\times(1-\sum_{i\neq v}p_{i,v}^{mh}),&\textit{if}\ i=j\\ 0,&\textit{otherwise}\end{array}\right.$ (4)

where $\min(1,d_{i}/d_{j})$ is the acceptance function of MH, $j\in\Gamma(i)$ is a neighbor of node $i$, and $d_{i}$ denotes the degree of node $i$. MHRW uses a modified random walk to draw nodes from the graph. In particular, it modifies the walk probabilities so that the walk converges to a uniform distribution over nodes. This suggests that MHRW can overcome the large bias of random walk and BFS. In fact, Eq. (4) is the product of NPC(i) and the MH transition probability. Based on the definition above, we present an instance method (MHME) of the ANEF model that uses MHRW for node embedding of multiplex networks. The MHME pseudo-code is shown in Algorithm 7.

Algorithm: The Sampling_Function based on MHRW

Input: Graph $G=\langle V,E,L\rangle$, initial node, truncate length $\tau$

Assumptions: St: the sequence of selected nodes with size $\tau$. $Q_{s}$: the queue of visited layers, of length $s$. curlayer: the current layer of the walk. $\gamma$: the probability of staying in the current layer, based on NPC.

Output: St

$v=\textit{node}$; $\textit{St}=$ empty; $Q_{s}=$ empty; $\textit{next\_layer}=0$
$\textit{curlayer}=$ randomly select a layer in which node is not isolated
$\gamma$ is calculated by Eq. (2)
while length(St) $<\tau$ do
    $\textit{St}=\textit{St}\cup v$
    Select node $u$ according to $v$ and Eq. (4) in layer curlayer
    $\textit{cross\_layer}=$ AliasSample($\gamma(v)$)
    if $\textit{cross\_layer}\neq\textit{curlayer}$ then
        $Q_{s}=Q_{s}\cup\textit{curlayer}$
        Select next_layer according to Eq. (3)
    else
        $\textit{next\_layer}=\textit{curlayer}$
    end if
    $v=u$; $\textit{curlayer}=\textit{next\_layer}$
end while
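The transition rule of Eq. (4) can be sketched for a single layer as follows. The adjacency format (a dictionary from node to neighbor set, without self-loops) and the function names are assumptions made for this illustration; `mh_transition` tabulates the probabilities of Eq. (4), and `mh_step` draws one plain Metropolis-Hastings move without the NPC factor.

```python
import random


def mh_transition(adj, i, npc_i=1.0):
    """Transition probabilities out of node i according to Eq. (4).

    adj:   dict node -> set of neighbors in the current layer (no self-loops)
    npc_i: NPC of node i; with npc_i = 1.0 this is the plain MH random walk
    """
    d_i = len(adj[i])
    probs = {}
    move_mass = 0.0
    for j in adj[i]:
        # Propose j with probability 1/d_i, accept with min(1, d_i/d_j).
        p_mh = min(1.0, d_i / len(adj[j])) / d_i
        probs[j] = npc_i * p_mh
        move_mass += p_mh
    # Rejected proposals keep the walker at node i.
    probs[i] = npc_i * (1.0 - move_mass)
    return probs


def mh_step(adj, i, rng=random):
    """Draw one Metropolis-Hastings move from node i (NPC factor omitted)."""
    j = rng.choice(sorted(adj[i]))
    if rng.random() < min(1.0, len(adj[i]) / len(adj[j])):
        return j
    return i
```

On a star graph, a leaf proposes the hub but accepts the move only with probability 1/3 when the hub has degree 3, which is how the walk avoids over-visiting high-degree nodes.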

4.3.2 FFME

In terms of the ability to capture modules, FFS outperforms other sampling strategies in network visualization. Moreover, Wu et al. [48] validated that FFS is modularity sensitive. We therefore propose another node embedding instance method (FFME) of ANEF based on adaptive FFS. FFS first chooses a node $v$ uniformly at random. We then generate a random number $x$ that is geometrically distributed with mean $p_{f}/(1-p_{f})$. Node $v$ selects $x$ links incident to nodes that are not yet visited. Let $w_{1}$, $w_{2}$, $\ldots$, $w_{x}$ denote the other ends of these selected links. We then apply this step recursively to each of $w_{1}$, $w_{2}$, $\ldots$, $w_{x}$ until enough nodes have been burned. FFME uses a truncated forest fire sampling based on NPC and $Q_{s}$ to implement the Sampling_Algorithm function for network embedding. The FFME pseudo-code is shown in Algorithm 4.3.2.

Algorithm: The Sampling_Function based on FFS

Input: Graph $G=\langle V,E,L\rangle$, initial node, truncate length $\tau$

Assumptions: St: the sequence of selected nodes with size $\tau$. $Q_{s}$: the queue of recently visited layers, of length $s$. curlayer: the current layer of the walk. $\gamma$: the probability of staying in the current layer, based on NPC.

Output: St

$v=\textit{node}$; $L_{s}=$ empty; $Q_{s}=$ empty; $\textit{next\_layer}=0$
$\textit{seed\_quence}=[v]$; $\gamma$ is calculated by Eq. (2)
$\textit{curlayer}=$ randomly select a layer in which node is not isolated
while length(St) $<\tau$ do
    Select fire_seed from layer curlayer
    Select next node $u$ from fire_seed in layer curlayer
    $\textit{seed\_quence}=\textit{seed\_quence}\cup\textit{fire\_seed}$
    $\textit{St}=\textit{St}\cup v$
    $\textit{cross\_layer}=$ AliasSample($\gamma(v)$)
    if $\textit{cross\_layer}\neq\textit{curlayer}$ then
        $Q_{s}=Q_{s}\cup\textit{curlayer}$
        Select next_layer according to Eq. (3)
    else
        $\textit{next\_layer}=\textit{curlayer}$
    end if
    $v=u$; $\textit{curlayer}=\textit{next\_layer}$
    seed_quence.remove(v)
end while
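The basic single-layer forest fire step described in this subsection can be sketched as below. This is an illustrative version only: it ignores NPC and the layer queue, it processes burning nodes iteratively with a queue instead of by recursion, and the names `geometric`, `forest_fire` and `max_nodes` are assumptions of the example. The burning probability must satisfy $0\le p_{f}<1$ for the geometric draw to terminate.

```python
import random
from collections import deque


def geometric(p_f, rng):
    """Number of successes before the first failure; mean p_f / (1 - p_f)."""
    x = 0
    while rng.random() < p_f:
        x += 1
    return x


def forest_fire(adj, seed, p_f, max_nodes, rng=None):
    """Burn up to max_nodes nodes starting from seed.

    adj: dict node -> set of neighbors (single layer)
    p_f: forward burning probability, 0 <= p_f < 1
    """
    if not 0.0 <= p_f < 1.0:
        raise ValueError("p_f must be in [0, 1)")
    rng = rng or random.Random()
    burned = [seed]
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(burned) < max_nodes:
        v = frontier.popleft()
        unvisited = [w for w in sorted(adj[v]) if w not in seen]
        # Burn x links to unvisited neighbors, capped by what is available.
        x = min(geometric(p_f, rng), len(unvisited))
        for w in rng.sample(unvisited, x):
            if len(burned) >= max_nodes:
                break
            seen.add(w)
            burned.append(w)
            frontier.append(w)
    return burned
```

Because the fire spreads outward from each burned node to several of its neighbors at once, the sampled sequence tends to stay inside densely connected regions, which is consistent with the modularity sensitivity noted above.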

In summary, in the pseudo-code of Algorithms 7 and 4.3.2, a larger $\gamma$ means that the neighbors of a node are similar across layers, so the walker has a smaller jump probability. A smaller $\gamma$ indicates that the neighbors of a node differ from layer to layer. Therefore, in order to preserve complete information of nodes, the walker needs a larger probability of jumping between layers.

4.4 Optimization

We run the MHME and FFME methods to generate sequences of nodes. Then, we apply Skip-Gram to the sequences to learn node embeddings of a given dimension $d$. For a node $v$ appearing at position $j$, we define $\Gamma(v)=\{v_{j-c},\ldots,v_{j+c}\}$ as the context of $v$, where $2c$ is the window size. We model this problem as the minimization of the following negative log-likelihood.

$\displaystyle-\log P(v_{j-c},\ldots,v_{j-1},v_{j+1},\ldots,v_{j+c}|v_{j}).$ (5)

Hence, we need to minimize Eq. (6):

$\displaystyle-\log\prod\limits_{u\in\Gamma(v)}P(u|v)=-\sum\limits_{u\in\Gamma(v)}\log P(u|v).$ (6)

For each $P(u|v)$ , a softmax function is used to define the probability:

$\displaystyle P(u|v)=\frac{\exp(X_{u}^{T}X_{v})}{\sum\limits_{n\in V}\exp(X_{n}^{T}X_{v})},$ (7)

where $X_{v}$ represents the input embedding of node $v$, and $X_{u}$ and $X_{n}$ represent the context vectors. To reduce the computational cost of the normalization over all nodes in Eq. (7), we adopt the negative sampling approach proposed in [20], which samples multiple negative edges according to some noise distribution for each edge. We replace each $-\log P(u|v)$ with

$\displaystyle O(X_{u},X_{v})=-\log\sigma(X_{u}^{T}X_{v})-\sum\limits_{v_{n}\in N_{e}}\log\sigma(-X_{v_{n}}^{T}X_{v}),$ (8)

where $\sigma(x)=\frac{1}{1+\exp(-x)}$ is the sigmoid function and $N_{e}$ denotes the set of negative samples. We adopt Stochastic Gradient Descent (SGD) to optimize the problem. In each step, the contexts of nodes are sampled according to their weights. For each sampled edge, the gradients with respect to $X_{v}$, $X_{u}$ and $X_{v_{n}}$ are

$\displaystyle\frac{\partial O(X_{u},X_{v})}{\partial X_{v}}=-(1-\sigma(X_{u}^{T}X_{v}))X_{u}+\sum\limits_{v_{n}\in N_{e}}(1-\sigma(-X_{v_{n}}^{T}X_{v}))X_{v_{n}},$ (9)

$\displaystyle\frac{\partial O(X_{u},X_{v})}{\partial X_{u}}=-(1-\sigma(X_{u}^{T}X_{v}))X_{v},$ (10)

$\displaystyle\frac{\partial O(X_{u},X_{v})}{\partial X_{v_{n}}}=(1-\sigma(-X_{v_{n}}^{T}X_{v}))X_{v},\quad v_{n}\in N_{e}.$ (11)

Then, we update node embedding vectors, as follows:

$\displaystyle X_{v}=X_{v}-\eta\frac{\partial O(X_{u},X_{v})}{\partial X_{v}},$ (12)

$\displaystyle X_{u}=X_{u}-\eta\frac{\partial O(X_{u},X_{v})}{\partial X_{u}},$ (13)

$\displaystyle X_{v_{n}}=X_{v_{n}}-\eta\frac{\partial O(X_{u},X_{v})}{\partial X_{v_{n}}},$ (14)

where $\eta$ denotes the learning rate. Suppose that the multiplex network has $L$ layers and $N$ nodes, the number of positive nodes is $N_{p}$, the number of negative samples is $N_{e}$, and $M$ denotes the number of edges. In the preprocessing step, the time complexity of computing NPC is $O(M+L^{2}\cdot N)$. We adopt the alias method to randomly sample a neighbor of the target node from a discrete distribution. The complexity of one optimization pass with negative sampling is $O(L\cdot N\cdot(1+N_{e}))$. If the number of training iterations is $t$, the total complexity of the optimization process is $O(t\cdot L\cdot N\cdot(1+N_{e}))$.
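A single update of the kind described by Eqs. (12)-(14) can be sketched in plain Python as follows. The sketch assumes the standard negative-sampling objective in which both the positive and the negative terms are log-sigmoids, and it chooses the gradient signs so that the update decreases that loss; the function names and the list-based vector format are assumptions of the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def loss(x_u, x_v, negatives):
    """Negative-sampling loss for one (u, v) pair and its negative vectors."""
    value = -math.log(sigmoid(dot(x_u, x_v)))
    for x_n in negatives:
        value -= math.log(sigmoid(-dot(x_n, x_v)))
    return value


def sgd_step(x_u, x_v, negatives, eta):
    """One in-place gradient-descent update of x_v, x_u and each negative."""
    g_pos = 1.0 - sigmoid(dot(x_u, x_v))
    g_negs = [1.0 - sigmoid(-dot(x_n, x_v)) for x_n in negatives]
    # All gradients are evaluated at the current parameters before updating.
    grad_v = [-g_pos * x_u[k]
              + sum(g * x_n[k] for g, x_n in zip(g_negs, negatives))
              for k in range(len(x_v))]
    grad_u = [-g_pos * value for value in x_v]
    grad_negs = [[g * value for value in x_v] for g in g_negs]
    for k in range(len(x_v)):
        x_v[k] -= eta * grad_v[k]
        x_u[k] -= eta * grad_u[k]
    for x_n, grad_n in zip(negatives, grad_negs):
        for k in range(len(x_n)):
            x_n[k] -= eta * grad_n[k]
```

Repeating `sgd_step` over the (node, context) pairs extracted from the sampled sequences pulls co-occurring nodes together and pushes the sampled negatives away; with a small learning rate, a single step should reduce `loss` for the pair it was applied to.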

5. Simulation results

In this section, we validate the performance of two instances of the ANEF model (the source code is available at: https://github.com/Brian-ning/ANEF). One is Algorithm 7 (MHME), based on Metropolis-Hastings random walk. The other is Algorithm 4.3.2 (FFME), based on forest fire sampling. Both sampling methods are extended by the neighbors partition coefficient (NPC) for adaptive cross-layer sampling. We use real-world multiplex network benchmark datasets (https://comunelab.fbk.eu/data.php), whose details are shown in Table 1. Furthermore, both experiments are performed on an Intel(R) Core(TM) i5-3470 CPU at 3.20 GHz with 12.0 GB RAM. We use a unified hyperparameter setting for training: $\textit{walk\_number}=20$, $\textit{walk\_length}=25$, $(p,q)=\{(1,1),(1,2)\}$, $d=100$, $j=0.8$, $\rho=0.8$. Here, walk_number denotes the number of walks for each node, and walk_length denotes the length of walks or the length of samples. $(p,q)=\{(1,1),(1,2)\}$ denotes the hyperparameters of DeepWalk and Node2Vec, respectively. $d$ is the dimension of the embedding vector. $j$ is the cross-layer jump parameter of LCAE. For FFS, $\rho$ is the forward burning probability, and the backward burning ratio is $1-\rho$. With the above setting, to test the performance and scalability of these two algorithms, we learn node embeddings in multiplex networks through different sampling methods for various types of datasets.

Table 1

Properties of the datasets

Dataset Name	Nodes	Edges	Layers	Undirected	Weights
CS-Aarhus	61	620	5	Yes	Unweighted
CKM	246	1551	3	No	Unweighted
Pierreauger	514	7153	16	Yes	Weighted
ArXiv	14489	59026	13	Yes	Weighted

CS-Aarhus: This dataset is a social multiplex network [30] that consists of five kinds of online and offline relationships (Facebook, Leisure, Work, Co-authorship, Lunch) between the employees of the computer science department at Aarhus.

CKM: This dataset is a physicians multiplex network [6] collected by Coleman, Katz and Menzel on medical innovation, considering physicians in four towns in Illinois: Peoria, Bloomington, Quincy and Galesburg. It is concerned with the impact of network ties on the physicians’ adoption of a new drug, tetracycline.

Pierreauger: This dataset is a working relationship multiplex network [7] that consists of layers corresponding to different working tasks within the Pierre Auger Collaboration. The multiplex network considers all submissions between 2010 and 2012. Reports are assigned to 16 layers according to their keywords and content, with manual disambiguation to avoid spurious results from an automated process.

ArXiv: This dataset is a multiplex network [7] that consists of layers corresponding to different arXiv categories. To restrict the analysis to a well-defined topic of research, it only includes papers with “networks” in the title or abstract up to May 2014.

5.1 Comparing methods

Deepwalk [40]: first applies random walks to a single network for node embedding. It feeds the sampled node sequences to the Skip-gram algorithm to train node embeddings.

Node2Vec[11]: simulates biased random walks by adding a pair of parameters to control BFS and DFS sampling process on a single-layer network. It improves embedding performance by capturing the role of nodes, such as hubs or tail users.

LCAE [26]: introduces a walk jump probability based on information distance. This method not only uses first- and second-order random walks to traverse a layer, but can also traverse between layers by leveraging their interactions. In the original evaluation, “layer co-analysis” achieved the best performance on most of the datasets compared with regular link prediction methods.

Ohmnet[60]: is based on hierarchy structure and Node2vec. Given any multilayer network and a hierarchy describing relationships between the layers, it embeds nodes in each layer into a low dimensional space.

MELL[34]: is a novel embedding method for multiplex networks, which incorporates an idea of layer vector that captures and characterizes each layer’s connectivity. This method exploits the overall structure effectively and embeds both directed and undirected multiplex networks, whether their layer structures are similar or complementary.

MNE[54]: is a joint learning model based on high-dimensional common embedding and lower-dimensional additional embedding for each type of relation. It constructs a unified network embedding model to learn multiple relations jointly.

GATNE-T [3]: is a representation learning method based on self-attention mechanism for multiplex heterogeneous networks with both transductive and inductive settings. The base embedding is shared among edges of different types, while the edge embedding is computed by aggregation of neighborhood information with the self-attention mechanism.

In this paper, we apply Deepwalk and Node2Vec to the network obtained by merging the layers of each multiplex network. For the LCAE algorithm, we use the “layer co-analysis” method, which has the best performance, as the comparison algorithm. As to MELL, we fuse the node embeddings of each layer into a comprehensive embedding to predict on the test dataset. In terms of parameter setting, we set $\beta=$ 1, $\gamma=$ 1, $\lambda=$ 10, $k=$ 3. In the case of MNE, we regard the common embedding of MNE as the global embedding for prediction. GATNE-T is run with its default parameters, and its base embedding serves as the global feature of each node. For all of the above algorithms, we use the same embedding dimension, the same number of training iterations, and the same validation and test datasets. The purpose of this experiment setting is to demonstrate fairly that ANEF is effective.

5.2 Cross-domain link prediction

In this section, we perform a cross-domain link prediction task on the multiplex networks. Given a target layer in a multiplex network, the cross-domain link prediction task is to infer whether a node pair in this layer is connected on the basis of the node embeddings learned from the other layers. The task can also be used to evaluate the interdependence and interaction between the relationships of the same node pairs in different layers. Through this task, we can mitigate the cold-start problem of recommendation systems and the problem that the information available for users tends to be limited in emerging networks.

We follow the experimental setup of link prediction in multilayer networks in [13]. We remove a certain layer from the original multiplex network and use Area Under Curve (AUC) and Accuracy scores to evaluate how well the algorithms predict the missing edges in the target layer. The removed node pairs are regarded as positive examples. As negative examples, we randomly sample an equal number of node pairs from this layer that have no edge connecting them. First, our ANEF model learns the latent vector representation of nodes on the residual network. Then, we use logistic regression as the prediction model. Finally, we use a 5-fold stratified cross-validation testing strategy: for each test, we use 80% of the data for training and the remaining 20% for testing. Because the prediction items are sorted in the data, stratification ensures that each fold contains approximately the same percentage of samples of each target class as the complete set. For every test set, the node pairs are ranked according to the scores returned by the classifier for the positive class label, i.e., an existing link. Subsequently, we use the AUC score computed from the resulting curves, averaged across all folds, to indicate the relative performance on each task. We are interested in the fraction of positive examples correctly classified instead of the fraction of negative examples incorrectly classified.
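Two ingredients of this protocol, balanced negative sampling and the AUC score, can be sketched without external libraries as follows. The function names are hypothetical, the pairwise AUC definition used here is the standard rank-based one rather than anything specific to this paper, and the exhaustive enumeration of non-edges is only practical for small layers.

```python
import random
from itertools import combinations


def sample_negative_pairs(nodes, edges, k, rng=None):
    """Sample k unordered node pairs that are not connected in the layer."""
    rng = rng or random.Random()
    present = {frozenset(edge) for edge in edges}
    non_edges = [pair for pair in combinations(sorted(nodes), 2)
                 if frozenset(pair) not in present]
    return rng.sample(non_edges, k)


def auc_score(pos_scores, neg_scores):
    """Probability that a positive pair is scored above a negative pair.

    Ties count as one half, as in the usual rank-based AUC definition.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

In this setting, `pos_scores` would hold the classifier's scores for the removed edges of the target layer and `neg_scores` the scores for the sampled non-edges, so that a score of 0.5 corresponds to random guessing.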

Table 2
The averaged Accuracy score of cross-domain link prediction

Method	CS-Aarhus		CKM		Pierreauger		ArXiv
Walk length	20	30	20	30	20	30	20	30
DeepWalk	0.5372	0.6314	0.7272	0.7711	0.7550	0.7125	0.7550	0.7707
Node2Vec	0.5758	0.6597	0.7313	0.7797	0.6800	0.6875	0.7778	0.7609
LCAE	0.7528	0.7794	0.8640	0.8485	0.6700	0.7875	0.8194	0.8303
Ohmnet	0.7697	0.7572	0.8483	0.8495	0.6700	0.725	0.8164	0.8303
MELL	0.5572	0.5706	0.6442	0.6110	0.6300	0.6400	–	–
MNE	0.7395	0.6640	0.7759	0.7856	0.6450	0.7450	0.8362	0.8313
GATNE-T	0.7701	0.7063	0.8434	0.8539	0.6719	0.7760	0.8433	0.8659
FFME	0.8618	0.8750	0.8638	0.8606	0.6400	0.8125	0.8462	0.8364
MHME	0.8340	0.8739	0.8645	0.8610	0.8350	0.8300	0.8472	0.8422

Figure 3.

The AUC scores of different walk length in cross-domain link prediction task.

Table 2 and Fig. 3 compare the performance of MHME and FFME with Deepwalk, Node2vec, LCAE, Ohmnet, MELL, MNE and GATNE-T on the same four datasets and under the same evaluation conditions. We use Accuracy and AUC as the two performance indicators, reported in Table 2 and Fig. 3, respectively. In Table 2, we compare the average Accuracy scores (5 folds) of the different methods for walk lengths of 20 and 30. From Table 2, we can observe that the two instances of our framework (MHME and FFME) outperform the other compared methods in most settings. Fig. 3 shows that MHME generally exhibits excellent performance in terms of both AUC and Accuracy. In detail, we consider MHME, FFME and LCAE as a comparison group: by addressing the bias and type imbalance problems in the node sampling process, MHME and FFME improve the quality of node embedding. From another perspective, comparing ANEF with the single-layer merged-network methods (DeepWalk and Node2vec) reveals that our framework can effectively fuse the multiple features of the same node in different layers. This comprehensive feature uses information from other layers to enhance the link prediction performance of MHME and FFME. In contrast, simply aggregating the edges of different layers introduces noisy information.

Ohmnet is an extension of Node2vec. However, unlike Node2vec on a single-layer network, Ohmnet cannot accurately capture node roles by adjusting a pair of parameters, because aggregating a multiplex network loses the node role information of the original network. In addition, Ohmnet requires a hierarchy describing the relationships between the layers, and these relationships are hard to capture.

As for MELL, MNE and GATNE-T, each of these three algorithms produces a representation of a node in each layer. Hence, we need to fuse these multiple features or use the base embedding as a comprehensive feature for the cross-domain link prediction task. GATNE-T achieves good performance on the ArXiv dataset when the walk length is set to 30. In additional experiments, other runs of our algorithm obtained better accuracy than this best result. Moreover, GATNE-T was proposed for heterogeneous networks, whereas our experimental datasets are homogeneous, so they do not meet some assumptions of the algorithm. Generally, we think that MELL, MNE and GATNE-T are unsuitable for this task.

Overall, these results indicate that the two adaptive cross-layer sampling methods, MHRW and FFS extended with the neighbors partition coefficient (NPC), can outperform state-of-the-art methods such as MELL, MNE and GATNE-T. From the comparison of the two instances of the proposed ANEF model (MHME and FFME), we believe that the proposed methods capture the multiplexity and adjacency properties of nodes in multiplex networks more accurately. Note that the MELL results on ArXiv are missing because we encountered an Out-of-Memory exception.

5.3 Mutual community detection

Node clustering aims to group similar nodes together so that nodes in the same group are more similar to each other than to those in different groups. The Mutual Community Detection (MCD) task is to distill relevant information from other aligned social networks as complementary information. It can improve clustering or community detection while preserving the distinct characteristics of each individual network. After representing nodes as vectors, traditional clustering algorithms can be applied to the node embedding matrix. To evaluate the result of mutual community detection, we use the k-means algorithm to cluster nodes and the cosine function to compute similarities between vectors. In terms of the evaluation metric, [37] developed a generalized modularity framework for studying the community structure of multiplex networks. We adopt this generalized modularity value as the evaluation criterion; it measures the tightness of connections within communities after node clustering.

$\displaystyle Q_{\textit{multislice}}(p)=\frac{1}{2\mu}\sum\limits_{\scriptstyle i,j\in c\atop\scriptstyle k,l:1\to\alpha}\left\{\left(A_{ij}^{k}-\lambda_{k}\frac{d_{i}^{k}d_{j}^{k}}{2m^{k}}\right)\delta_{k,l}+\delta_{i,j}C_{j}^{k,l}\right\}\delta(g_{i}^{k},g_{j}^{l})$ (15)

where $A_{ij}^{k}$ is the adjacency between nodes $i$ and $j$ in slice $k$, $d_{i}^{k}$ is the degree of node $i$ in slice $k$, $m^{k}$ is the number of edges in slice $k$, $g_{i}^{k}$ is the community of node $i$ in slice $k$, $\lambda_{k}$ denotes a resolution factor, $\mu=\sum_{k:1\to\alpha}{{m^{k}}}$ denotes a normalization factor which is introduced to cope with the modularity resolution problem, $i$ and $j$ range over all nodes, $k$ and $l$ range over all slices, and $C_{j}^{k,l}$ is the link between node $j$ in slice $k$ and node $j$ in slice $l$ (inter-layer edges). With this definition, many modularity optimization methods based on Eq. (15) can be directly applied to community discovery in multiplex networks. In this paper, we set the number of communities of each multiplex network to two.
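For intuition, the intra-layer part of Eq. (15) can be computed as in the sketch below. It makes several simplifying assumptions that are not taken from the paper: every node has the same community label in all layers, the inter-layer coupling term $C_{j}^{k,l}$ is dropped, all layers share one resolution factor, and each layer is an undirected, unweighted edge list without self-loops or duplicate edges. The function name is hypothetical.

```python
def multislice_modularity(layers, community, resolution=1.0):
    """Intra-layer part of Eq. (15), with the coupling term omitted.

    layers:    list of edge lists, one list of (a, b) pairs per layer
    community: dict node -> community label, shared by all layers
    """
    mu = sum(len(edges) for edges in layers)  # mu = sum over k of m^k
    if mu == 0:
        return 0.0
    total = 0.0
    for edges in layers:
        m_k = len(edges)
        if m_k == 0:
            continue
        degree = {}
        for a, b in edges:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        # Each intra-community edge contributes A_ij + A_ji = 2.
        intra = sum(2 for a, b in edges if community[a] == community[b])
        # Sum over i, j in c of d_i * d_j equals (total degree of c) squared.
        deg_sum = {}
        for node, d in degree.items():
            label = community[node]
            deg_sum[label] = deg_sum.get(label, 0) + d
        expected = sum(s * s for s in deg_sum.values()) / (2.0 * m_k)
        total += intra - resolution * expected
    return total / (2.0 * mu)
```

For a layer made of two disconnected triangles, labeling each triangle as its own community gives a score of 0.5, while putting all six nodes in one community gives 0.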

Table 3

The modularity scores of compared algorithms

Algorithm	CS-Aarhus	CKM	Pierreauger	ArXiv
Deepwalk	0.3265	0.3368	0.4747	0.5686
Node2vec	0.3188	0.3387	0.4934	0.4429
LCAE	0.3184	0.3068	0.3924	0.5325
Ohmnet	0.5156	0.3598	0.5677	0.6945
MNE	0.3319	0.3447	0.6725	0.5834
GATNE-T	0.3485	0.3588	0.6908	0.6072
FFME	0.3808	0.3611	0.6912	0.7234
MHME	0.2382	0.2445	0.3157	0.4036

Table 3 illustrates that FFME generally outperforms the other methods, including Ohmnet, MNE and GATNE-T. On the CS-Aarhus dataset, Ohmnet obtains better results because the network is relatively sparse and its communities are small. Moreover, in our experiment setting, a walk length of 25 can distort the community information. The result of FFME also indicates that FFS can capture the modular property. In Table 3, the modularity score of FFME tends to increase with the size of the network, which suggests that FFME is suitable for detecting mutual communities in large-scale networks. However, MHME avoids the sampling bias problem by sampling each node with the same probability, which leads to the loss of network community structure information. Therefore, MHME performs worst in the mutual community detection task. This also verifies that, in community detection tasks, the performance of ANEF depends on the characteristics of the specific sampling algorithm implemented.

To summarize, the proposed MHME can preserve the local adjacency information of nodes, and FFME can precisely detect the community structure of multiplex networks. With a suitable sampling algorithm, our model outperforms state-of-the-art baselines on real network datasets. Different sampling strategies are competitive to different degrees for different goals.

5.4 Parameter sensitivity study

In this section, we investigate the impact of the node embedding dimension and the sampling length walk_length on the quality of node embeddings on the CS-Aarhus social multiplex network. Similar results are also observed on the node clustering task.

Figure 4.

Impact of three parameters on Area Under Curve (AUC) scores in link prediction task: (a) node embedding dimension in the left column; (b) walk length of node sampling in the right column; (c) burning probability parameter $\rho$ of FFS in the bottom.

Node embedding dimension. By fixing the other parameters to their optimal values, we investigate the impact of the node embedding dimension by increasing it from 10 to 230 with a step of 20. Figure 4a shows the AUC for link prediction with respect to different dimensions. We observe that the AUC scores of FFME and MHME tend to increase first and then stabilize when the dimension reaches 100. Once the dimension is larger than 100, our node embedding framework is able to capture sufficient information. If the dimension continues to increase, noise can be introduced into the node embeddings. It should be noted that the studies in [36] have empirically demonstrated that network features extracted from three-hop neighborhoods contain the most useful information.

Walk length of sampling. By fixing the other parameters, we study the walk length of node sampling by increasing it from 10 to 50 with a step of 2. Figure 4b shows the relation between the AUC for link prediction and the walk length. The results indicate that the quality of the node embedding vectors increases with the walk length. After the walk length reaches 15, the performance of the ANEF model tends to be stable. Therefore, we empirically select $\textit{walk\_length}=$ 25 in all previous experiments.

Burning probability parameter of FFS. We investigate the impact of the forward burning probability $\rho$ by increasing it from 0.05 to 1 with a step of 0.05. Figure 4c shows that FFME yields stable performance as $\rho$ gradually increases within the specified range. When the value of $\rho$ exceeds this range, the performance of FFME tends to decrease. A possible reason is that FFME then tends to preserve global network properties.

6. Conclusion

Multi-source, multi-relational and multi-dimensional networks are ubiquitous in the real world. We model these kinds of networks as multiplex or multilayer networks. In this paper, we focus on multiplex network representation learning and propose a novel adaptive node embedding framework (ANEF). We implement the node sampling process of ANEF based on the Metropolis-Hastings random walk method (MHRW) and the forest fire sampling method (FFS), respectively. These two sampling methods capture complex interaction patterns and module information of the network, respectively, and are also less likely to be trapped in local structure around high-degree nodes. First, to construct node sequences with the proposed adaptive network sampling algorithms, we propose an efficient neighbors partition coefficient (NPC) as a jumping indicator. Then, ANEF also takes the imbalanced relation type problem into consideration in the process of cross-layer sampling, using a fixed-length queue to memorize previously visited layers. Finally, the embedding vectors of nodes are obtained by feeding the visited node sequences to the Skip-Gram model. We evaluate our methods and the baseline methods on cross-domain link prediction and mutual community detection tasks. The experimental results show that our methods have superior performance in different tasks on four real-world datasets. For future work, we will make use of other effective sampling algorithms and strategies to preserve structure information of different scales in multiplex networks. In addition, few current works on network representation learning address the sensitivity of the algorithm to noise or the vulnerability of the network, which is also an interesting research direction.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (2018YFC0831500), the National Natural Science Foundation of China under Grant No.61972047 and the NSFC-General Technology Basic Research Joint Funds under Grant U1936220.

References

Bagavathi

and Krishnan

, Multinet: Scalable multilayer network embeddings, arXiv preprint arXiv:1805.10172, 2018.

Cardillo

Gómez-Gardenes

Zanin

Romance

Papo

Del Pozo

and Boccaletti

, Emergence of network features from multiplexity, Scientific Reports 3 (2013), 1344–1350.

Cen

Zou

Zhang

Yang

Zhou

and Tang

, Representation learning for attributed multiplex heterogeneous network, In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 1358–1368.

Chen

and Chen

, Research on knowledge graph application technology, In Journal of Physics: Conference Series, volume 1187, IOP Publishing, 2019, p. 042083.

Chu

Fan

Yao

Zhu

Huang

and Bi

, Cross-network embedding for multi-network alignment, In The World Wide Web Conference, 2019, pp. 273–284.

Coleman

Katz

and Menzel

, The diffusion of an innovation among physicians, Sociometry 20(4) (1957), 253–270.

De Domenico

Lancichinetti

Arenas

and Rosvall

, Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems, Physical Review X 5(1) (2015), 011027(1)–011027(11).

Dong

Chawla

N.V.

Swami

Dong

Chawla

N.V.

and Swami

, metapath2vec: Scalable representation learning for heterogeneous networks, In Acm Sigkdd International Conference on Knowledge Discovery & Data Mining, 2017, pp. 135–144.

Espín-Noboa

Wagner

Karimi

and Lerman

, Towards quantifying sampling bias in network inference, In Companion Proceedings of the The Web Conference 2018, 2018, pp. 1277–1285.

10.

Gjoka

Kurant

Butts

C.T.

and Markopoulou

, Walking in facebook: A case study of unbiased sampling of osns, In Infocom, IEEE, 2010, pages 1–9.

11.

Grover

and Leskovec

, node2vec: Scalable feature learning for networks, In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2016, pp. 855–864.

12.

Gong

Lou

and Zhang

, The hidden flow structure and metric space of network embedding algorithms based on random walks, Scientific Reports 7(1) (2017), 13114–13125.

13.

Hristova

Noulas

Brown

Musolesi

and Mascolo

, A multilayer approach to multiplexity and link prediction in online geo-social networks, EPJ Data Science 5(1) (2016), 24.

14.

and Flaxman

, Multimodal sentiment analysis to explore the structure of emotions, In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 350–358.

15.

Cheng

Huang

Fang

and Luo

, On embedding uncertain graphs, In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, ACM, 2017, pp. 157–166.

16.

and Lau

W. C.

, A survey and taxonomy of graph sampling, CoRR, 2013.

17.

Hussein

Yang

and Cudré-Mauroux

, Are meta-paths necessary: Revisiting heterogeneous graph embeddings, In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, ACM, 2018, pp. 437–446.

18. Jalali Z.S., Rezvanian and Meybodi M.R., Social network sampling using spanning trees, International Journal of Modern Physics C 27(5) (2016).

19. Jalili, Orouskhani, Asgari, Alipourfard and Perc, Link prediction in multiplex online social networks, Royal Society Open Science 4(2) (2017), 160863–160873.

20. and Mikolov, Distributed representations of sentences and documents, In International conference on machine learning, 2014, pp. 1188–1196.

21. Leskovec and Faloutsos, Sampling from large graphs, In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2006, pp. 631–636.

22. Wang, Yang, Zhang and Zhou, PPNE: property preserving network embedding, In International Conference on Database Systems for Advanced Applications, Springer, 2017, pp. 163–179.

23. Chen, Tong and Liu, Multi-layered network embedding, In Proceedings of the 2018 SIAM International Conference on Data Mining, SIAM, 2018, pp. 684–692.

24. Zhu and Zhang, Discriminative deep random walk for network classification, In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 1004–1013.

25. Liang, Chen, Zhu, Fan and Lu, Integrating data and knowledge to identify functional modules of genes: a multilayer approach, BMC Bioinformatics 20(1) (2019), 225.

26. Liu, Chen P.-Y., Yeung, Suzumura and Chen, Principled multilayer network embedding, In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, 2017, pp. 134–141.

27. Xuan, Zhang and Luo, Structural property-aware multilayer network embedding for latent factor analysis, Pattern Recognition 76 (2018), 228–241.

28. Lyu, Zhang and Zhang, Enhancing the network embedding quality with structural similarity, In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, ACM, 2017, pp. 147–156.

29. Ren, Jiang, Tang and Yin, Multi-dimensional network embedding with hierarchical structure, In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 387–395.

30. Magnani, Micenkova and Rossi, Combinatorial analysis of multiple networks, arXiv preprint arXiv:1303.4986, 2013.

31. Maiya A.S. and Berger-Wolf T.Y., Sampling community structure, In Proceedings of the 19th international conference on World wide web, ACM, 2010, pp. 701–710.

32. Maiya A.S. and Berger-Wolf T.Y., Benefits of bias: Towards better characterization of network sampling, In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2011, pp. 105–113.

33. Malod-Dognin, Petschnigg, Windels S.F., Povh, Hemmingway, Ketteler and Pržulj, Towards a data-integrated cell, Nature Communications 10(1) (2019), 805.

34. Matsuno and Murata, MELL: effective embedding method for multiplex networks, In Companion Proceedings of the The Web Conference 2018, International World Wide Web Conferences Steering Committee, 2018, pp. 1261–1268.

35. Mikolov, Chen, Corrado and Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781, 2013.

36. Mostafavi, Goldenberg and Morris, Labeling nodes using three degrees of propagation, PLoS One 7(12) (2012), e51947.

37. Mucha P.J., Richardson, Macon, Porter M.A. and Onnela J.-P., Community structure in time-dependent, multiscale and multiplex networks, Science 328(5980) (2010), 876–878.

38. Nassar and Gleich D.F., Multimodal network alignment, In Proceedings of the 2017 SIAM International Conference on Data Mining, SIAM, 2017, pp. 615–623.

39. Newman, Network structure from rich but noisy data, Nature Physics 14(6) (2018), 542–545.

40. Perozzi, Al-Rfou and Skiena, DeepWalk: Online learning of social representations, In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2014, pp. 701–710.

41. Tang, Shang, Ren, Zhang and Han, An attention-based collaboration framework for multi-view network representation learning, In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1767–1776.

42. Ran Wei, On Estimation Problems in Network Sampling, PhD thesis, 2016.

43. Ribeiro L.F., Saverese P.H. and Figueiredo D.R., struc2vec: Learning node representations from structural identity, In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 385–394.

44. Sun, Wang, Hsieh T.-Y., Tang and Honavar, MEGAN: A generative adversarial network for multi-view network embedding, arXiv preprint arXiv:1909.01084, 2019.

45. Wang, Mezlini A.M., Demir, Fiume, Brudno, Haibe-Kains and Goldenberg, Similarity network fusion for aggregating data types on a genomic scale, Nature Methods 11(3) (2014), 333.

46. Wang, Chen and Li, Predictive network representation learning for link prediction, In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2017, pp. 969–972.

47. Wei, Pan, Yang and Zhou, Attributed network representation learning via DeepWalk, Intelligent Data Analysis 23(4) (2019), 877–893.

48. Cao, Archambault, Shen and Cui, Evaluation of graph sampling: A visualization perspective, IEEE Transactions on Visualization & Computer Graphics 23(1) (2016), 401–410.

49. Xue and Zhang, Cross-domain network representations, Pattern Recognition 94 (2019), 135–148.

50. Yildirim, Bayesian inference: Metropolis-Hastings sampling, Dept. of Brain and Cognitive Sciences, Univ. of Rochester, Rochester, NY, 2012.

51. Yousuf M.I. and Kim, List sampling for large graphs, Intelligent Data Analysis 22(2) (2018), 261–295.

52. Zhang, Yin, Zhu and Zhang, User profile preserving social network embedding, In IJCAI International Joint Conference on Artificial Intelligence, 2017.

53. Zhang, Yin, Zhu and Zhang, Network representation learning: A survey, IEEE Transactions on Big Data, 2018.

54. Zhang, Qiu and Song, Scalable multiplex network embedding, In IJCAI, volume 18, 2018, pp. 3082–3088.

55. Zhang, Xia, Zhang, Cui and Philip S.Y., BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder, In 2017 IEEE International Conference on Data Mining (ICDM), IEEE, 2017, pp. 605–614.

56. Zhang and Yu P.S., Community detection for emerging networks, In Proceedings of the 2015 SIAM International Conference on Data Mining, SIAM, 2015, pp. 127–135.

57. Zhang and Yu P.S., Broad Learning Through Fusions: An Application on Social Networks, Springer, 2019.

58. Zhiyuli, Liang and Chen, HSEM: highly scalable node embedding for link prediction in very large-scale social networks, World Wide Web, 2018, pp. 1–26.

59. Zhou, Liu and Gao, Scalable graph embedding for asymmetric proximity, In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

60. Zitnik and Leskovec, Predicting multicellular function through multi-layer tissue networks, Bioinformatics 33(14) (2017), i190–i198.

1 The source code is available at: https://github.com/Brian-ning/ANEF.

Table 2 The averaged Accuracy score of cross-domain link prediction
