Abstract
Network representation learning aims at learning a low-dimensional vector for each node in a network, which has attracted increasing research interests recently. However, most existing approaches only use topology information of each node and ignore its attributes information. In this paper, we propose an Improved Attributed Node Random Walks(IANRW) framework, which constructs the neighborhood of an attributed node and then leverages the skip-gram model to perform node embeddings. The method can be able to flexibly incorporate both the topology and attribute information. Additionally, it can easily deal with missing data and be applied to large networks. Extensive experiments on six datasets show that IANRW outperforms many state-of-the-art embedding models and can improve various attributed networks mining tasks.
Introduction
Network representation learning aims at learning a low-dimensional distributed representation vector of nodes in a network, which has proved extremely useful for a wide variety tasks [1, 2, 3, 4, 5]. As shown in Fig. 1, the network representation learning is a bridge that connects the network’s original data and network application tasks. The main idea of node embeddings is to distill the high-dimensional information about a node to a dense vector embedding. The vector embeddings can then be used to deal with tasks such as node classification, link prediction, clustering and social recommendation [2, 3, 4]. Node classification and link prediction problems in networks both require to construct a vector representation for the nodes. Usual methods are to learn the vector by defining an objective function and solving the corresponding optimization problem. These methods must make a trade-off in balancing computational efficiency and predictive accuracy. Alternatively, some design an objective which only focus on preserving local neighborhoods of nodes.
The flow chart of network representation learning.
Up to now, most of previous works such as DeepWalk [6], LINE [7], Node2Vec [8] and SDNE [9] have been designed for plain networks. They only investigate the topology information of networks and ignore the node attributes which are potentially complementary in learning. They assume that nodes with similar topology context should be distributed closely in the learned representation space. However, in real-word networks, there always exit attributes information of nodes. For example, users in social networks may have profiles like gender, age and address. Node attributes information are important to measure the similarity between nodes. It is very likely that two users share very similar interests, but they are not connected in social networks. As the node attribues encode much information of networks, integrating them into the embedding process is expected to achieve a better performance. Moreover, by utilizing the auxiliary attributes information, the link sparsity can also be alleviated. Some recent works try to incorporate node attributes and community structure into the network embeddings [10, 11, 12, 13]. TADW [14] as the first attempt to incorporate the text features of nodes into network embedding process under a framework of matrix factorization. However, it is very time and memory consuming and cannot be used to large networks. How do we effectively preserve the concept of “word-context” among attributed networks? Can random walks, as used in DeepWalk and node2vec, be applied to attributed networks? Can we directly apply network-oriented embedding architectures (e.g., skip-gram) to attributed networks?
To address the above challenges, in this paper, we propose the Improved Attributed Node Random Walks (IANRW) framework which is based on attributed random walks [35]. Instead of learning individual embeddings for each type based on functions that map node attributes to types in attributed random walks, IANRW learns individual embeddings for each node. The goal of IANRW is to maximize the likelihood of preserving both the structures and attributes of nodes in networks. In IANRW, we define two kinds of neighbors. One is topology neighbor, another is attribute neighbor. The topology neighbors of a node
Our main contributions are defining a flexible notion of a node’s two kinds of neighbors and proposing a biased random walk between topology neighbors and attribute neighbors. Moreover, our method allows nodes to miss the links or attributes, which is suitable for incomplete networks. The proposed method is flexible by controlling over the search space through tunable parameters. The parameters can be easily interpreted and govern our search strategy. These parameters can be determined in advance or learned directly using a tiny fraction of labeled data. Our experiments mainly including multi-label classification, link prediction and node clustering tasks demonstrate that IANRW outperforms state-of-the-art methods on multi-label classification, link prediction and node clustering. IANRW is also robust to perturbations in the form of missing edges and missing attributes of nodes.
To summarize, our work makes the following contributions:
We propose IANRW, an efficient scalable framework for attributed network embedding. The framework flexibly incorporates both network topology and node attributes information, and it can be easily interpreted and govern our search strategy. To utilize the attribute similarity information, we define two kinds of neighbors. One is topology neighbor, another is attribute neighbor. These two kinds of neighbors make our random walk easier in attributed networks. We show how IANRW can easily deal with missing data, regardless of missing node’s edges or node’s attributes. We extensively evaluate our approach through multi-class classification, link prediction and node clustering on several real-world datasets. Experimental results show the superior performance of IANRW over state-of-the-art embedding methods.
The rest of the paper is structured as follows. In Section 2, we survey related work in network representation learning. Section 3 defines the problem of improved attributed node random walks. Section 4 introduces the details of the proposed IANRW framework. Section 5 presents the experimental results. Finally, we conclude this work and vision the future works in Section 6.
Network embedding aims to learn a distributed representation for each node in a network. Most of earlier works mainly use matrix factorization techniques. They extract the leading eigenvectors of affinity matrix as the representations of nodes [16]. provides a computationally efficient approach to nonlinear dimensionality reduction that has locality-preserving properties and a natural connection to clustering. CENI [17] adopts clustering method on the projected space can better preserve the structure hidden in the cascades and generate more accurately inferred links [18] provides some new theoretical results on random projections, which is a randomized approximate algorithm widely used in machine learning and data mining. LLE [19] maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima [20] efficiently computes a globally optimal solution, and, for an important class of data manifolds, is guaranteed to converge asymptotically to the true structure.
Recently, the Skip-gram model [21, 22] which is originally introduced for learning vector representations of words has been used for network embedding. DeepWalk [23] is first to utilize random walks to get a set of node sequences to learn the network representations by Skip-gram model. Tang et al. propose LINE [24], which preserves both the local and global structures information of networks by using first-order proximity and second-order proximity. GraRep [1], which can be regarded as an extension of LINE, considers high-order information. SDNE [25] is the first deep model which preserves both the first and second order proximity to capture the highly nonlinear structural information of networks. Node2Vec [26] introduces hyperparameters to tune the explored search space, which is a mixture of breadth-first and width-first search procedures and is flexible to sample nodes from a network. Most recently [27], presents a semi-supervised learning framework based on network embeddings. It uses deep learning to enhance the learned embedding.
However, the above discussed methods cannot effectively handle multi-view data such as node attributes. Zhu et al. [28]. combined both the links and node attributes, and proposed a matrix factorization model to learn embeddings. TADW [14] incorporates the text features of nodes into network embedding process. Huang et al. [29]. proposed an approach for attributed networks, which combines with the label information to help learn better feature representation. NEEC [30] is a general framework for attributed network embedding. Instead of directly modeling expert cognition, NEEC learns it from the oracle by performing a number of concise but effective queries. MVC-DNE [31] incorporates both the network structures and the node properties based on deep network embedding methods, which is the first time to study the problem of network embedding on incomplete networks. GCN [32] is designed for semi-supervised learning, which requires the full graph Laplacian information during training. Recently, Hamilton et al. proposed an inductive network embedding method GraphSAGE [33], which can be generalized to unseen nodes. SEANO [34] is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Nesreen et al. [35, 36, 37] proposed a flexible framework based on the notion of attributed random walks. Instead of learning individual embeddings for each node, it learns embeddings for each type based on functions that map node attributes to types. However, as it learns embeddings for each type, nodes with the same attributes will get the same embeddings, which is unreasonable for nodes with the same attributes but very different topologies. Furthermore, this method cannot handle missing data well, such as missing edges or attributes of nodes. In this paper, we propose an improved attributed node random walks framework which is based on attributed random walks. It flexibly incorporates both network topology and node attributes information and can easily deal with missing data.
Problem definition and preliminaries
In this section, we will define our research problems formally. Attributed network
Topology neighbors and attribute neighbors describe nodes from two different views. When clustering nodes based on attributes, we can choose any suitable clustering method.
The main challenge of our problem is how to use the attributes of nodes. It is hard to directly apply the attributes to the previous network embedding models. Previous works preserve the proximity between a node and its neighborhood. In an attributed network, how do we define and model this ‘node-neighborhood’ concept? Furthermore, how do we optimize the embedding model that effectively maintains the structures and attributes of nodes?
Nesreen et al. proposed a flexible framework based on the notion of attributed random walks [35, 36, 37]. The framework consists of two general steps:
Function mapping nodes to types: The first step is to learn a function
Attributed random walks: The second step uses the types derived by the function
The set of attributed random walks can then be given as input into the Skip-Gram model to learn embeddings for the node types instead of the nodes themselves.
Skip-gram model: Skip-Gram is a model used to produce word embeddings. It takes a large corpus of text and produces a corresponding vector for each word in the corpus. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the vector space. As shown in Fig. 2, The input
Skip-gram model.
In this section, we present the details of the proposed IANRW framework. First, we introduce our model framework. Second, the objective function contains both the topology information and attributes information is introduced in this framework. Finally, we present the specific details of IANRW and discuss some practical issues.
Framework
Figure 3 shows the framework of IANRW. We can see that each node in the network is associated with attributes. In fact, the model allows some nodes to lose attributes information. Our IANRW framework mainly contains three steps. First, topology matrix and attribute matrix are constructed from the given network. Topology matrix is the adjacency matrix. Each row in attribute matrix presents the attributes of a node. Second, nodes are clustered according to their attributes or using a function
Framework of IANRW.
Given a text corpus, word2vec [15] model learns the distributed representations of words in a corpus. Inspired by it, DeepWalk [23] and node2vec [26] extended the Skip-gram model to networks. Both methods leverage random walks to generate node sequences and aim to map the word-context concept into the node sequences. Here, we still apply this idea to attributed networks. Given a network
where
The objective function Eq. (1) can be rewritten as follows by replacing
where
where
In random walks, given a source node
where
However, this does not allow us to account for the network structure and guide our search procedure to explore different types of network neighborhoods. Intuitively, the nodes that belong to a node’s topology neighbors and the attribute neighbors simultaneously should be more similarities with it, which should be easier to walk to. Additionally, in real-world networks, attributes and topology are not always equally important. Our random walks should accommodate for the fact in different networks. Based on the above facts, we define the unnormalized transition probability in networks as the following:
where
Illustration of the random walk procedure in IANRW. The walk now resides at node u and valuating its next step. Edge labels indicate search biases.
Parameter
Dealing with missing data. A significant advantage of our modes is that it works well with missing data. Going back to Fig. 4 again. If node
Compared with the attributed node random walks. First, attributed node random walks learns embeddings for the node types instead of the nodes themselves. Nodes with the same attributes will get the same embeddings, which is unreasonable for nodes with the same attributes but very different topologies. While IANRW learns individual embeddings for each node. Second, IANRW uses parameter
The pseudocode for IANRW is given in Algorithm 1. Algorithm 1 assumes the nodes clusters are learned apriori. At every step of the walk, sampling is done based on the transition probabilities
In this section, we demonstrate the efficacy of the presented IANRW frameworks for attributed network representation learning.
Datasets. We conduct experiments on two social networks and four paper citation networks. The two social networks are Flickr and Google+. In social networks, nodes refer to users and links represent friend relationships among users. In Google+ network, each node has 5476 dimensions features, such as institution, place, university. We choose the institution of each user as the categories and select top 20 popular institutions as the final categories. Flickr is an online community. People in the communities can share photos. Photographers can follow each other and form a network. The groups that photographers joined are the labels and the tags specified on the photos are the attributes information. The four paper citation networks are Cornell, Cora, Texas and Citeseer. Nodes in the networks refer to papers and links refer to the citation relationships among papers. Papers are classified according to the belonged domains. Features of papers are binary values indicating whether each word in the vocabulary is present (indicated by 1) or absent (indicated by 0) in the papers. Table 1 shows the detailed information of the six datasets.
Baseline Methods. IANRW is measured against the following baseline methods:
DeepWalk [6]: is a topology-only network embedding method, which combines random walks and skip-gram to learn network representations. Node2vec [8]: an algorithmic framework for learning feature representations for nodes in networks, which defines a flexible notion of a node’s network neighborhood. Node2vec designs a biased random walk procedure. Line [7]: is a network embedding model with the first order and second order proximity preserved. TADW [14]: is the first shallow model incorporates rich text information of the network. Grarep [1]: which can be regarded as an extension of LINE, considers high-order information. Attributed node random walks [35]: is an approach learns embeddings for each type based on functions that map feature vectors to types. In the experiment we abbreviated attributed node random walks as ANRW.
For all embedding methods, we follow the suggestions of the original papers to set the parameters of all these baselines. In IANRW, we use spectral clustering to divide the community and the number of communities divided equals real number of groups in the data set.
Detailed information of the six datasets
In real-world networks, only a subset of nodes in networks are labeled. Thus, node classification can be used to predict the labels of unlabeled nodes. In this section, the learned node representations by various network embedding methods are used as the input features of multi-class node classification model. We randomly sample a portion of labeled nodes as the training data, and the rest are the test data. The portion
Tables 3 and 3 show the multi-class classification results of the six datasets. From the results, it is evident that IANRW consistently outperforms other baseline methods. On Cora, all the methods perform very well in addition LINE. Their F1score are more than 0.7. This is mainly because the nodes in Cora are densely connected. Methods based on edge connection can achieve good results. On Citeseer, Flickr and Google+, IANRW completely beat other methods. These networks are sparsely connected and attribute information can be very important for network representation learning. TADW is also for attributed network and also performs well. In the small network such as Cornell and Texas, our method IANRW completely beat all the other methods. Over all, we can see that IANRW has the best performance and TADW has quite good performance. However, TADW cannot be applied to large-scale networks. On the other hand, the methods only based on topology information cannot get a stable performance.
Parameter sensitivity. IANRW has a number of parameters like the node2vec algorithm. We conduct a sensitivity analysis of IANRW to these parameters on the Cornell and Google+ datasets. When a parameter is tested, all the other parameters assume default values. We fix the training proportion to 50% and measure the Micro-F1 score.
Figure 5 shows the classification performances with different parameters. The classification performances improve as
| Methods | Micro (Cornell) | Macro (Cornell) | Micro (Citeseer) | Macro (Citeseer) | Micro (Cora) | Macro (Cora) | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
10% | 30% | 50% | 10% | 30% | 50% | 10% | 30% | 50% | 10% | 30% | 50% | 10% | 30% | 50% | 10% | 30% | 50% |
| DeepWalk | 0.28 | 0.36 | 0.38 | 0.24 | 0.24 | 0.23 | 0.49 | 0.56 | 0.58 | 0.46 | 0.52 | 0.54 | 0.75 | 0.79 | 0.81 | 0.73 | 0.78 | 0.79 |
| Line | 0.27 | 0.36 | 0.38 | 0.15 | 0.15 | 0.15 | 0.25 | 0.29 | 0.31 | 0.23 | 0.26 | 0.27 | 0.38 | 0.44 | 0.47 | 0.28 | 0.40 | 0.42 |
| Node2vec | 0.27 | 0.41 | 0.41 | 0.20 | 0.23 | 0.24 | 0.48 | 0.57 | 0.58 | 0.45 | 0.53 | 0.53 |
|
0.79 | 0.81 | 0.74 | 0.77 | 0.79 |
| TADW | 0.36 | 0.54 | 0.58 | 0.23 | 0.26 | 0.31 | 0.56 | 0.61 | 0.63 | 0.53 | 0.58 | 0.61 |
|
0.80 | 0.81 | 0.73 | 0.77 | 0.79 |
| Grarep | 0.32 | 0.45 | 0.51 | 0.23 | 0.27 | 0.34 | 0.50 | 0.54 | 0.55 | 0.46 | 0.48 | 0.48 | 0.74 | 0.78 | 0.78 | 0.73 | 0.76 | 0.76 |
| ANRW | 0.57 | 0.62 | 0.63 | 0.38 | 0.41 | 0.43 | 0.37 | 0.37 | 0.38 | 0.16 | 0.47 | 0.47 | 0.37 | 0.38 | 0.38 | 0.23 | 0.26 | 0.27 |
| IANRW |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Classification performance on Texas, Flickr and Google+ datasets
Parameter sensitivity on Cornell and Google+.
continued.
In Fig. 5, it also shows how the number of features
In link prediction, the task is to predict the edges that will be added in the future time. We randomly remove 50% of edges in the network, which are considered as the positive samples. To obtain negative samples, we also sample an equal number of edges which are not in the network. Based on the residual network, we learn the node representation by different methods. Then we calculate the cosine similarity of each node pair. AUC [39] is used to evaluate the performance of methods as it can find the optimal threshold to predict positive and negative links automatically.
We summarize the link-prediction results in Table 4. One can see that IANRW outperforms all the baselines except on Google+. On Google+, the DeepWalk, Line and Node2vec slightly outperforms IANRW. On Flickr, the performance of DeepWalk and Node2vec are as good as IANRW. On Cornell, Citeseer, Cora and Texas, our method completely beat all the baselines, which demonstrates that IANRW is effective in learning good node vectors for the task of link prediction.
Link prediction results (AUC) on the six datasets
Link prediction results (AUC) on the six datasets
In this section, we also conduct node clustering tasks to measure the performance of the methods. We apply the learned embeddings to a clustering model. We leverage the
Table 5 shows the results. As we can see, IANRW achieves the best results in all networks. Especially in Sparse network such as Cornell and Texas, IANRW achieves about 30% improvement compared with the second-best results, which demonstrates the superior performance of our model for the task of node clustering. This is mainly because the attributes information is also important in this networks. A flexible combination of the links and attributes provides the best performance.
Node clustering results (NMI) on the six datasets
Node clustering results (NMI) on the six datasets
In this subsection, we conduct experiments to evaluate the performance of our proposed method on missing data. It includes nodes have no edges or have no attributes. To construct an incomplete network with missing links, we randomly sample a portion of nodes to remove their all edges. The portion
Table 6 shows the results of incomplete network with missing links. One can see that with the increase of removed link proportion, the classification accuracy score decrease. However, the decline trends of IANRW is smaller than other methods, and they tend to be stable finally. It demonstrates that our method can learn nodes representation well in incomplete network with missing links. Table 7 shows the results of incomplete network with missing attributes. We can see that IANRW consistently outperform baseline methods. From the two experiments, it demonstrates that IANRW can easily deal with missing data.
Classification performance on Cora with missing links
Classification performance on Cora with missing links
Classification performance on Cora with missing attributes
In this paper, we define the representation learning problem in attributed networks. To address the attributed network challenge, we propose the IANRW framework. The framework is developed from the attributed random walks and can be able to capture both the topology and attribute information. We formalize the improved attributed node random walks, which enable the Skip-gram-based maximization of the network probability in the context attributed nodes. The search strategy in IANRW is flexible exploring network neighborhood through parameters
Footnotes
Acknowledgments
Our work is supported by the National Key Research Development Program of China (No. 2017YFB0802800). The authors would like to thank the Editor-in-Chief and anonymous reviewers for their insightful and constructive commendations that have led to an improved version of this paper.
