Temporally-aware node embeddings for evolving networks topologies

Abstract

Static node embedding algorithms applied to snapshots of real-world application graphs are unable to capture their evolving process. As a result, the absence of information about the dynamics in these node representations can harm the accuracy and increase the processing time of machine learning tasks related to these applications. To fill this gap, we propose a biased random walk method named Evolving Node Embedding (EVNE). EVNE leverages the sequential relationship of graph snapshots by incorporating historic information when generating embeddings for the next snapshot. It learns node representations through a neural network, but differs from existing methods as it: (i) incorporates previously run walks at each step; (ii) starts the optimization of the current embedding from the parameters obtained in the previous iteration; and (iii) uses two time-varying parameters to regulate the behavior of the biased random walks over the process of graph exploration. Through a wide set of experiments we show that our approach generates better embeddings, outperforming baselines by up to 20% in a downstream node classification task. EVNE’s embeddings achieve better performance than others, based on experiments with four classifiers and five datasets. In addition, we present seven variations of our model to show the impact of each of EVNE’s mechanisms.

Keywords

Node embeddings; Evolving graphs; Representation learning

1. Introduction

Real-world data and processes are often modeled as graphs, since these abstractions can express the relationships between entities of interest in an intuitive and useful way. Relationships in online social networks ([17]), protein-protein interactions in biological systems ([33]), telco churn prediction ([20]), location-based networks and route graphs in roadmaps ([3,24]), and credit card fraud detection ([29]) are some examples of data intuitively modeled as graphs (also referred to as networks).

However, in many real networks, access to data associated with nodes and edges is often difficult and/or costly. For this reason, it is not uncommon that, at any given time, a large fraction of the data we desire to model is unobserved. In most cases, data is acquired through some sort of online search or exploration of the graph, which can be seen as an evolving process that increases the available knowledge about the network as the search progresses. At each step, information about the topology, nodes and edges is collected, but full knowledge may never be attained ([21]). Analyzing these networks is essential to extract hidden knowledge and meaningful patterns, since many machine learning tasks on graphs involve making predictions over nodes and edges ([8]). Effective graph analysis can consequently improve both these predictions and the performance of machine learning algorithms, which are highly influenced by how discriminative the representation of the information extracted from the observed data is ([1]).

In this work, we focus on addressing issues that arise with evolving graphs, whose network topology is partially unknown throughout the graph exploration. In this scenario, we start from a small set of nodes and edges and, at each step of the search (i.e. network snapshot), nodes and edges may be added or removed. As an example of evolving graphs, we have the process of spreading fake news on social media ([34]). In this context, imagine that a few Twitter users issue an initial tweet with a piece of fake news. Each of these users would then be modeled as a node in the graph, which is composed only of nodes at the beginning of the evolving process. After some time, other users start posting or retweeting the same fake news. In this case, users posting the fake news are modeled as new nodes in the network, and users retweeting the fake news are not only modeled as new nodes but also create an edge with the users from the original tweet. As the dissemination of the fake news goes on, the graph keeps evolving, adding new nodes and edges. Capturing this evolution process can be decisive in several applications, as this example illustrates. Therefore, it is essential that models are capable of not only capturing this temporal evolution but also doing so as quickly as possible, even when the dispersion event (and hence the spread network it generates) is still small. Acting quickly makes it possible to avoid greater damage and to exert more effective damage control in these scenarios.

By modeling the way entities in the data relate to each other as a graph, we can extract useful information that cannot be obtained by analyzing the data in isolation. As an example, in PageRank, the importance of structure goes beyond first and second order neighborhoods: the centrality of a node is a recursive function of the entire graph ([2]). In this direction, a slew of machine learning methods have been developed for graph data in the past few years, mostly for tasks involving predictions over nodes and edges ([8]), by improving node and graph representation learning. Effective representations can improve both these types of predictions and the performance of machine learning algorithms, which depend on how discriminative the information extracted from the observed data is ([1,5]).

In the context of graph analytics problems, graph embeddings have proven to be an excellent alternative for graph representation ([17]). Graph embedding is a framework for building low dimensional representations of the entire graph, or parts of it, e.g. nodes, edges and sub-graphs, while preserving structural information and graph properties ([5]). In this work, we focus on node embedding techniques, which aim to represent each node as a d-dimensional feature vector (d is an input) that accurately captures its relationships to other nodes ([5]).

Seminal node embedding methods focused on applying linear and non-linear transformations to a graph similarity or adjacency matrix, aiming to reduce the high dimensionality of non-relational data modeled as networks. These methods perform poorly on real-world networks due to high computational cost and lack of robustness ([10,27]). More recently, other node embedding approaches for static graphs ([10,23,27]) were proposed in order to learn feature representations for each node scalably and with a more consistent performance. These approaches outperform the seminal methods in all prediction tasks for graphs. Among these methods, node2vec ([10]), a static embedding technique for node representations, is often the best performing for node classification, although it also performs well on other tasks.

node2vec builds on the Skip-Gram model ([18]) by generating “sentences” of nodes through sampling sequences of nodes using a biased random walk, and learning node embeddings via an optimization process. Biased random walks can balance between behaving more as a BFS (breadth first search) or as a DFS (depth first search) depending on how node2vec parameters are chosen, defining a flexible notion of neighborhood ([10]). It is worth mentioning that there are also static node embedding techniques based on recurrent and convolutional neural networks, such as GE ([12]) and SDNE ([32]), but these tend to work better when larger volumes of data are available (which is not a characteristic of the scenarios hereby explored) and are, therefore, out of the scope of this paper.

All of the methods mentioned above assume that the graph topology is fully observable and static (i.e., it does not change over time), and thus do not consider evolving processes. In the scenario described above, where the graph evolves as it is explored, applying one of these methods to generate node embeddings consists of running it on snapshots taken from the network. In this case, each snapshot and respective embedding represents an independent network. This approach disregards the fact that each snapshot may contain relevant information about the next, and that incorporating past information into the learning procedure could improve the accuracy of downstream inference tasks. Moreover, the resources spent in computing previous embeddings are entirely wasted. As a result, this approach leads to decreased performance in machine learning tasks, mainly because nodes and edges are added to or removed from the evolving graph, which directly affects the stability of the generated embeddings: they can change significantly from one snapshot to the next.

To address the limitations and gaps mentioned above, in this work:

Proposal: We propose a technique named Evolving Node Embedding (EVNE) that, instead of learning the node embeddings in each snapshot from scratch, incrementally builds the embedding at time t from the embedding at time $t - 1$. This is achieved by the development of three mechanisms that are incorporated into node2vec: (i) we provide a strategy for varying node2vec parameters during the evolving process in order to balance the exploration bias of random walks throughout the snapshots (the values depend on the progress of the exploration); (ii) we leverage random walks computed during step $t - 1$ in step t, allowing the paths observed in these snapshots to be complementary; and (iii) before learning the embedding for snapshot t, we initialize it as the embedding learned for $t - 1$. These mechanisms allow EVNE to generate embeddings for evolving networks faster and more effectively than existing static approaches.

Hypothesis: We design experiments to answer four research questions:

RQ1: Among the existing static node embeddings under consideration, which one achieves the best node classification performance in the described evolving network scenarios?

RQ2: Are EVNE’s mechanisms effective for improving node classification performance in evolving network scenarios? Can EVNE outperform the baselines?

RQ3: Is EVNE more robust than the baselines? Are the results of EVNE consistent over time?

RQ4: Can we quantify and understand the effect of the modifications on node2vec over time?

Research design and results:

We conduct a comprehensive experimental study to evaluate the quality of the embeddings obtained for an evolving graph by EVNE in comparison to those obtained by node2vec, LINE ([27]) and DeepWalk ([23]). More precisely, we consider a sequence of graphs coming from an online network exploration process and, for each snapshot, we train a model that classifies nodes as belonging to one of two classes.

We evaluate the embeddings according to their performance on the node classification task using five different network datasets. Furthermore, we conduct an ablation study to show the impact of each of the new mechanisms incorporated in our model.

Our results show that our approach generates better embeddings: in a downstream node classification task, our embeddings achieve better performance than node2vec and the other baselines in almost all scenarios, with numbers up to 20% better.

The rest of the paper is organized as follows. In Section 2, we review the related work. We develop our method in Section 3 and report the experimental results in Section 4. We conclude the paper in Section 5.

2. Related work

Graph embedding techniques have drawn a lot of attention in the recent past, becoming the standard feature engineering paradigm for node features in graphs ([1,5]). In essence, they provide low dimensional representations based on large adjacency matrices. These techniques condense information from the nodes and edges of a graph into reduced structures, called embeddings.

Seminal works on generating features for nodes include linear and non-linear dimensionality reduction algorithms, such as PCA, ISOMAP and Laplacian Eigenmaps ([1]), which have been applied to graphs’ adjacency or similarity matrices. These methods became popular due to their relative simplicity and effectiveness. However, they rely on finding eigenvectors of the (often sparse) affinity matrix, which can be extremely costly. The complexity of such methods is at least quadratic in the number of nodes, which does not scale well to large networks ([27]). Moreover, such approaches are not very robust, since using these features as input to machine learning algorithms can reduce accuracy in different tasks ([3,8]).

In the last decade, a range of methods were introduced to address the high cost and low robustness of these seminal works by applying non-linear techniques, mainly inspired by random walks and deep learning, to learn representations for nodes in a graph ([10,23,27]). These approaches generate node embeddings which represent each node of a graph as a vector in a low dimensional space. These methods are known to scale well for very large networks, containing up to millions of nodes and billions of edges, and to outperform the seminal methods in many machine learning tasks, such as node classification, link prediction and anomaly detection.

The remainder of this section discusses the state-of-the-art methods for two classes of node embedding methods: (i) the static node embeddings, which generate a single representation for each node of the graph; and (ii) the dynamic node embeddings, which generate a temporal representation for each node through a series of vectors.

2.1. Static node embeddings

The very first methods for static node embeddings were proposed in the past five years, when researchers in this field shifted their focus from traditional dimensionality reduction algorithms to scalable graph embedding techniques that leverage the sparsity of real-world networks ([8]). Since then, a number of methods for generating embeddings have been proposed, such as HOPE ([22]), SDNE ([32]), and DNGR ([4]). Both SDNE and DNGR generate static embeddings based on deep learning models, which require a reasonably large network to be trained correctly. HOPE, in turn, was built to preserve high-order proximities of large-scale graphs and to capture asymmetric transitivity. As the scenario in this work involves networks evolving from extremely small graphs (from 20 nodes onwards), these models are not applicable as baselines.

Among the most prominent methods, we select three methods which were used as baselines during our experimental study: DeepWalk ([23]), Large-scale Information Network Embedding (LINE) ([27]) and node2vec ([10]).

DeepWalk: It is an unsupervised feature learning model that learns latent representations of nodes in a network in a continuous vector space with a small number of dimensions. Representations are learned using information obtained from truncated random walks. Such representations encode social relations, i.e., features that capture neighborhood similarity and community membership, among nodes of the graph. DeepWalk captures the structure of the graph regardless of node label distributions, allowing the same representation to be used across various classification problems.

LINE: One of its main characteristics is to preserve the local and global structures of the graph, defined respectively in terms of first-order and second-order proximity. First-order proximity corresponds to links between nodes, whereas second-order proximity reflects their shared neighborhood structures. The model learns similar representations for nodes with high first and second order proximities. DeepWalk, unlike LINE, preserves neither the local nor the global structure of the graph.

node2vec: Similarly to DeepWalk, node2vec uses biased random walks to learn a low-dimensional embedding of a graph, preserving the neighborhoods of nodes, in a semi-supervised fashion. This framework generates a mapping of the nodes of a network to a low-dimensional space while preserving the network structure and the characteristics of the nodes’ neighborhoods. The neighborhood is explored through biased random walks, which define a flexible notion of neighborhood and condense its information into the dimensions of the embedding ([10]). The model uses walks to relate nodes that belong to the same community (homophily) or nodes that play similar structural roles in the network (structural equivalence). These concepts define similarity in node2vec (analogously to proximity in LINE). To capture both types of similarity, the random walk used by node2vec interpolates between two sampling strategies: breadth-first search (BFS) and depth-first search (DFS). In BFS, the neighborhood of a node is restricted to the nodes directly connected to it, whereas DFS considers nodes at increasing distances from the source node. In addition, random walks are used to define the neighborhoods of the nodes in the network, which are then used as input to an extension of the Skip-Gram model ([18]). From the extended model, node2vec learns a feature representation for each node by observing its neighborhood.

The methods described above have been developed for scenarios where the graph topology is completely observable upfront. We call these “static graph scenarios” because the topology remains fixed. However, it has been shown that most large scale real-world networks are rarely fully available ([21,25]). Furthermore, they may evolve over time, for instance, with the addition or removal of nodes and edges ([7]). The most common way of representing evolving networks is through a sequence of snapshots, each containing the nodes and edges observed until that specific step of the exploration process.

Previous works have applied methods for generating static embeddings to evolving graph datasets. In this case, an embedding is generated for each snapshot of a network, and, at the end, the embeddings are compared to identify possible changes over the exploration of the network ([7,8]). However, in this case, each snapshot is treated by the algorithm as an independent network, not leveraging any information regarding the evolution of the network. In addition, the absence of relevant information related to the temporal evolution of the network impairs the learning procedure and the performance of downstream machine learning algorithms for node classification, link prediction and other tasks ([8,9]).

2.2. Dynamic node embeddings

Dynamic node embedding techniques have been proposed as extensions to methods originally designed for static embeddings in order to capture the temporal evolution of real-world networks. The main idea behind these methods is to update the embeddings using information from the previous embeddings along with information from the current state of the graph. By accounting for these two sources of information, these methods attempt to capture the time patterns present in the evolution of the graph. When these patterns get encoded in the embeddings, the performance of downstream machine learning tasks is expected to improve ([7]).

Several methods for generating embeddings in dynamic graphs have been proposed in the past years. Some of them are random walk based methods adapted to work with dynamic networks, notably dynnode2vec ([15]), HIN–DRL ([16]), DNE ([6]), tcc2vec ([19]) and tNodeEmbed ([26]). However, such approaches do not outperform the standard node2vec on well-known datasets; refer to ([30]) for more details. On the other hand, most of the recent methods are based on deep learning techniques, particularly DynGEM ([9]), dyngraph2vec ([7]), DyRep ([28]), SGNN ([14]) and GraphSAGE ([12]). Although they present competitive results in machine learning tasks when compared to static counterparts, the applicability of deep neural networks and deep autoencoders is restricted to graphs that are large enough to allow for learning a large number of parameters without overfitting. Hence, although they are a good alternative for dynamic graphs, they are only feasible for sufficiently large graphs. In the graph exploration scenario, the observed network can be rather small in the beginning, so these techniques cannot be applied.

Our contributions. Unlike previous works, EVNE is a viable alternative for networks that evolve from very small graphs. As the proposed method does not rely on deep learning architectures, it is possible to learn useful embeddings regardless of the graph size. EVNE is built on top of node2vec, a well-known method for generating static embeddings. We design mechanisms to deal with information that evolves and is updated as the network expands. EVNE updates the embeddings using information from the previous embeddings together with information from the current state of the graph without starting from scratch, as traditional dynamic node embeddings do. Yet, it preserves the asymptotic complexity of the static methods while incorporating evolving information regarding the graph into the generation of evolving embeddings. Therefore, EVNE can handle small and large graphs with the addition and removal of nodes and edges. In sum, we explore the sequential relationship between graph snapshots by incorporating information from past snapshots into the generation of the embedding corresponding to the current snapshot. Although EVNE uses node2vec as a building block, its mechanisms can be applied to any Skip-Gram based Network Embedding and Random Walks based methods in a dynamic setting. Table 1 highlights the most important differences among the main baselines for node embedding generation and EVNE.

Table 1
Comparison of baselines for node embedding generation. By ^∗, we refer to methods such as those in ([7,9,12,14,28]).

	DeepWalk ([23])	LINE ([27])	Dynamic DNN^∗	node2vec ([10])	EVNE (ours)
Fits to evolving graphs	-	-	✓	-	✓
Works with small graphs	✓	✓	-	✓	✓
Deals with the exploration–exploitation trade-off	-	-	-	✓	✓
Stable	-	✓	-	✓	✓
Updated run walks	-	-	-	-	✓
Updated neural network weights	-	-	-	-	✓
Evolving exploitation parameters	-	-	-	-	✓
Scalable	✓	✓	✓	✓	✓

3. Evolving Node Embedding

This section introduces the definition of evolving graph used in this paper and describes the proposed model, named Evolving Node Embedding (EVNE).

3.1. Definitions

Let $G = (V, E)$ be a graph where V is the set of nodes and E is the set of edges. We define a discrete time evolving network as a sequence of graphs $G = \{G_{t}\}_{t = 1}^{T}$, in which $G_{t} = (V_{t}, E_{t})$ is the network snapshot at exploration step $t = 1, \dots, T$.

Given a graph $G = (V, E)$, a node embedding is a mapping function f from nodes to feature vector representations, $f : V \to \mathbb{R}^{d}$, for some $d \ll | V |$. The function f is defined so that a similarity measure (e.g., cosine similarity) between $f (u)$ and $f (v)$ encodes some notion of proximity between $u, v \in V$. As a slight abuse of notation, we denote by $f_{t} (G_{t}) \in \mathbb{R}^{| V_{t} | \times d}$ the embedding matrix of all nodes in $G_{t}$. Hence, for evolving networks, we have $F = \{f_{1}, f_{2}, \dots, f_{T}\}$ as a time series of $(| V_{t} | \times d)$-dimensional matrices, where $f_{t}$ is the graph embedding matrix for $G_{t}$.
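As a small illustration of this definition, the sketch below stores one snapshot's embedding as a dictionary from nodes to d-dimensional vectors and compares nodes with cosine similarity. The node names and vector values are invented for the example and do not come from the paper.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical embedding f_t for a three-node snapshot, with d = 2.
f_t = {
    "u": [1.0, 0.0],
    "v": [2.0, 0.0],  # same direction as u: treated as close to u
    "w": [0.0, 3.0],  # orthogonal to u: treated as far from u
}

sim_uv = cosine_similarity(f_t["u"], f_t["v"])
sim_uw = cosine_similarity(f_t["u"], f_t["w"])
```

Under this toy embedding, `sim_uv` is larger than `sim_uw`, which is the sense in which the embedding encodes proximity between nodes.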

In particular, we consider evolving graphs that start relatively small and evolve by the addition or removal of nodes and edges. This process can represent, for instance, social networks where users accept new friends (addition of edges) or remove contacts when they move to a new neighborhood (removal of edges), or even other networks that are being discovered on-the-fly through a search algorithm.

3.2. Proposed method

EVNE is an evolving node embedding algorithm that can be applied to both small and large evolving networks. As previously mentioned, our method builds upon the well-known static node embedding method node2vec ([10]) to handle evolving networks. The purpose of the embedding generated by EVNE is (i) to preserve the aggregation of local and global structural characteristics from the exploration of the graph; (ii) to account for information about the evolution of the network; and (iii) to avoid the typical process of visiting and storing information about all network snapshots.

EVNE’s challenge is to incorporate information from previous graphs in the generation of the embedding for a given snapshot. That is, it must update the embedding for $G_{t}$ from the embedding generated for $G_{t - 1}$. In particular, EVNE extends node2vec through three modifications that allow for an evolving representation learning: (i) it concatenates the previous iteration’s biased random walks to the current iteration’s walks; (ii) it loads the previous snapshot’s embeddings as the initial weights of the extended Skip-Gram model; and (iii) it uses a strategy for varying parameters p and q throughout the iteration over the snapshots.

In contrast to mechanism (i), in the original node2vec, random walks are independently generated for each time step, rendering it unable to capture temporal patterns across graph snapshots. To address this issue, we use both the walks sampled in step $t - 1$ and those sampled in step t when learning the embedding for $G_{t}$ . This mechanism enables the inclusion of additional structural information identified in the previous step in the current embedding generation, improving structural knowledge and adding dynamic information to the embeddings generated by EVNE.

Mechanism (ii) changes the way we start the training phase of Skip-Gram. The original Skip-Gram initializes the learning process from a vector of random weights. Instead, we initialize the weight vectors at time t using the embeddings generated at time $t - 1$. A similar idea has been applied to detect changes in language across time in ([14]). This allows EVNE to capture structural changes in network nodes that were previously seen.
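The warm start of mechanism (ii) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `warm_start`, the dictionary representation and the uniform range used for unseen nodes (borrowed from common word2vec practice) are all assumptions.

```python
import random


def warm_start(prev_embedding, nodes, d, seed=0):
    """Build the initial vectors for snapshot t.

    Nodes already embedded at t - 1 start from their previous vectors;
    nodes seen for the first time start from small random values.
    Nodes that disappeared are simply not carried over.
    """
    rng = random.Random(seed)
    init = {}
    for node in nodes:
        if node in prev_embedding:
            # Copy so that training at step t does not overwrite f_{t-1}.
            init[node] = list(prev_embedding[node])
        else:
            # Assumed initialization range; the paper only says "randomly".
            init[node] = [rng.uniform(-0.5 / d, 0.5 / d) for _ in range(d)]
    return init


# Node "a" was embedded at t - 1; node "b" is new at step t.
f_prev = {"a": [0.1, 0.2]}
init_t = warm_start(f_prev, nodes=["a", "b"], d=2)
```

Copying the previous vectors rather than aliasing them keeps the stored embedding for $t - 1$ intact while the vectors for step t are being optimized.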

node2vec provides a flexible notion of neighborhood through the use of biased random walks that interpolate between a BFS and a DFS-like behavior ([8]). This interpolation is governed by two parameters: the return parameter p controls how far from the source node the walk will go, by defining the likelihood of returning to a node that was visited in the previous step; the in-out parameter q directs the walk towards nodes that are further away from the source node. Random walks are in turn used to define the neighborhoods of the nodes by feeding the sequences of visited nodes to an extension of the Skip-Gram model ([18]). The extended model tries to learn a feature representation f for each node v that minimizes the negative log-likelihood of observing a neighborhood $N (v)$ given $f (v)$ .
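To make the roles of p and q concrete, the following sketch computes the unnormalized second-order transition weights used by node2vec on an unweighted graph: 1/p for returning to the previous node, 1 for moving to a common neighbor of the previous and current nodes, and 1/q for moving further away. The adjacency-dictionary representation and the function names are choices made for this illustration only.

```python
import random


def transition_weights(adj, prev, cur, p, q):
    """Unnormalized weights for leaving `cur`, having arrived from `prev`."""
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p  # step back to the previous node
        elif nxt in adj[prev]:
            weights[nxt] = 1.0      # stay at distance 1 from prev (BFS-like)
        else:
            weights[nxt] = 1.0 / q  # move to distance 2 from prev (DFS-like)
    return weights


def biased_walk(adj, start, length, p, q, rng):
    """Sample one walk with at most `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: isolated node
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbors))
            continue
        w = transition_weights(adj, walk[-2], cur, p, q)
        walk.append(rng.choices(neighbors, weights=[w[n] for n in neighbors])[0])
    return walk


# Toy undirected graph: triangle 0-1-2 plus a pendant node 3 attached to 1.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
example_weights = transition_weights(adj, prev=0, cur=1, p=2.0, q=0.5)
example_walk = biased_walk(adj, start=0, length=10, p=2.0, q=0.5, rng=random.Random(1))
```

With p = 2 and q = 0.5, the walk standing at node 1 after arriving from node 0 favors the outward move to node 3 over staying in the triangle or going back, which is the DFS-like regime.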

We modify the original p and q parameters to change their values dynamically, allowing the balance between BFS and DFS biases to change over time. At the beginning, when the graph is small, we weaken the DFS behavior, reinforcing the BFS bias. Conversely, when the observed network becomes larger, EVNE reinforces the DFS bias in order to explore a larger region of the graph, while dimming the BFS behavior. This change can improve the quality of the embeddings by making the exploration more consistent over time, based on the size of the graph. We define the values of p and q in two ordered lists of values ( $P$ and $Q$ ), used as input into the algorithm. The values must be manually set in the input file. Also, the values of p and q change according to the state of evolution of the network. This means that, for example, if a network is observed for 200 timestamps and we have 4 different values for p in the input list $P$ , then the initial value will be set at iteration 0 and updated at iterations 50, 100 and 150.
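The schedule in the example above can be sketched as follows, assuming snapshots are numbered $t = 1, \dots, T$ and the list position is $\lceil \frac{t}{T} L \rceil$, as in the description of Algorithm 1. The function name and the values in `Q` are hypothetical; they were chosen only so that q starts above 1 (BFS-like) and ends below 1 (DFS-like).

```python
def scheduled_value(values, t, T):
    """Return the entry of `values` in effect at snapshot t (1 <= t <= T)."""
    L = len(values)
    position = (t * L + T - 1) // T  # integer form of ceil(t * L / T), 1-indexed
    return values[position - 1]


T = 200
Q = [4.0, 2.0, 1.0, 0.5]  # hypothetical list with L = 4 values

# Zero-based iterations at which the scheduled value changes.
change_points = [
    t - 1
    for t in range(2, T + 1)
    if scheduled_value(Q, t, T) != scheduled_value(Q, t - 1, T)
]
```

For these inputs, `change_points` is `[50, 100, 150]` when iterations are counted from 0, matching the example in the text.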

Algorithm 1 presents EVNE. It receives as inputs a set of snapshots $G = \{G_{t}\}_{t = 1}^{T}$, the embedding dimension d, the number of random walks r to be sampled per node, the walk length ℓ, the context window size for Skip-Gram k and the lists of values for parameters p and q, $P$ and $Q$. It outputs a sequence of node embeddings $E = \{f_{t}\}_{t = 1}^{T}$, one for each graph $G_{t}$.

Algorithm 1: Evolving Node Embedding algorithm

When processing snapshot t, $p_{t}$ and $q_{t}$ are set to the elements at position $\lceil \frac{t}{T} L \rceil$ of the lists $P$ and $Q$, where $L = | P | = | Q |$. At step t, the transition probability matrix $\Pi_{t}$, which governs the random walks, is initialized according to $p_{t}$, $q_{t}$ and the edge set $E_{t}$. Next, for each node $u \in V_{t}$, we sample r walks of length ℓ using node2vec’s biased random walk with transition probabilities given by $\Pi_{t}$ (line 9). These walks are stored in $curWalks$ (line 10). Then, using stochastic gradient descent to minimize Skip-Gram’s loss function, we obtain embedding $f_{t}$ for snapshot $G_{t}$ using both $prevWalks$ (from $t - 1$) and $curWalks$ (from t), initializing the representations of all nodes observed in $t - 1$ as $f_{t - 1}$ and those of new nodes randomly (line 13). Last, we append $f_{t}$ to the output list $E$ and update $prevWalks$ (line 14). Note that none of these modifications increases the asymptotic complexity of node2vec.
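The control flow described in this paragraph can be sketched as follows. This is an illustration reconstructed from the prose, not the authors' listing, and its line structure does not correspond to the line numbers cited above. `sample_walks` and `train_skipgram` are hypothetical callables standing in for node2vec's walk sampler (with r and ℓ fixed) and for warm-started Skip-Gram training (with d and k fixed).

```python
def evne(snapshots, P, Q, sample_walks, train_skipgram):
    """Return one embedding per snapshot, reusing walks and weights from t - 1.

    sample_walks(graph, p, q) -> list of walks          (hypothetical signature)
    train_skipgram(walks, graph, init) -> embedding     (hypothetical signature)
    """
    T = len(snapshots)
    L = len(P)
    embeddings = []
    prev_walks = []      # no walks exist before the first snapshot
    prev_embedding = {}  # no warm start for the first snapshot
    for t, graph in enumerate(snapshots, start=1):
        # Mechanism (iii): pick p_t and q_t at position ceil(t * L / T).
        idx = (t * L + T - 1) // T - 1
        p_t, q_t = P[idx], Q[idx]
        cur_walks = sample_walks(graph, p_t, q_t)
        # Mechanism (i): train on the walks from t - 1 together with those from t.
        # Mechanism (ii): start the optimization from the embedding of t - 1.
        f_t = train_skipgram(prev_walks + cur_walks, graph, prev_embedding)
        embeddings.append(f_t)
        prev_walks = cur_walks
        prev_embedding = f_t
    return embeddings
```

Passing the sampler and the trainer as arguments keeps the sketch independent of any particular Skip-Gram library; only the walks of the immediately preceding snapshot are retained, in line with the description above.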

4. Experimental analysis

In this section we describe the experiments performed to evaluate EVNE. The experimental setup section provides details about the datasets used in the experiments and the process for generating the set of snapshots for each dataset. Next, we detail the node classification task used for evaluating the node embeddings and the baselines used for comparison. Finally, we perform experiments to answer four important research questions related to the performance of the proposed method.

4.1. Experimental setup

The proposed method was analyzed using five network datasets representing undirected and unweighted networks. The classification task is defined by choosing one node subpopulation of interest and defining its members as targets and the remaining nodes as non-targets, yielding highly unbalanced classes.

Table 2 summarizes the network characteristics of each dataset below:

Table 2
Description and basic statistics of each network. “Targets” refers to the subpopulation of interest, $| G |$ is the number of snapshots, $| V |$ and $| E |$ are the numbers of nodes and edges in the last snapshot, respectively, and $| V_{+} | / | V | \%$ is the percentage of target nodes in the last snapshot of the network

ID	CS	DBP	DC	KS	WK
Dataset	CiteSeer	DBpedia	DonorsChoose	Kickstarter	Wikipedia
Nodes	Papers	Places	Donors	Donors	Wikipages
Edges	Citations	Hyperlinks	Co-donors	Co-donors	Links
Targets	Top venue	Adm.regions	P donors	DFA donors	OOP pages
$| G |$	1482	677	133	680	377
$| V |$	3825	4931	677	18644	4536
$| E |$	31984	44204	7406	439970	39302
$| V_{+} | / | V | \%$	41.38	14.72	8.27	7.89	4.47

CiteSeer is a citation network in which each node represents a paper published in one of the top 10 venues in Computer Science. The target nodes are papers published at NIPS.

DBpedia is a network of 5,000 populated places from the DBpedia ontology. Places linked on Wikipedia in either direction, are linked together by an edge in the graph. Target nodes are places labeled as “administrative regions”.

DonorsChoose is a network of donors from the DonorsChoose.org crowdfunding website, in which teachers of US public schools can post classroom projects in need of donations. Projects added within 2007 and 2012 were used to compose a donor-to-donor network in which edges are placed between two donors if they donated to the same project within a time window of 48 hours. Target nodes are the donors of a project P, which received the largest number of donations in 2013.

Kickstarter is another network from a crowdfunding website. Sponsors that have donated to the same projects in the past were connected by an edge in this donor-to-donor network. Target nodes are the donors of the project with the largest number of donors in this dataset (named DFA).

Wikipedia is a dataset where nodes represent pages about topics related to programming languages. An edge between two nodes exists if one page links to the other, in either direction. Target nodes are pages related to Object Oriented Programming.

4.2. Graph snapshots

The evolving networks were modeled as a sequence of graph snapshots representing the graphs previously described at consecutive observation steps ([13]). We consider snapshots generated by two different network search algorithms: Selective Harvesting ([21]) and a modified version of Breadth-First Search (BFS).

Selective Harvesting (SH) performs a type of online network search. Its goal is to maximize the number of nodes found that belong to a certain target subpopulation, given a partial view of the graph and assuming that the cost of querying node labels is high. At each step, SH predicts, based on the graph structure observed so far and the queried node labels, which unlabeled node is most likely to belong to the target subpopulation. It covers scenarios where good temporal embeddings for nodes could boost performance in a given machine learning task.

In this sense, SH resembles a DFS with the addition of a guiding procedure that decides which node to query next. Figure 1(left) illustrates two consecutive steps of an SH execution. In the left snapshot, four nodes (black) have already been queried for labels. Four other nodes (gray) are known but unqueried, and the remaining nodes (white) are still unknown at this point. Solid edges represent the graph structure known at this step; dashed edges are still unseen. In the next snapshot, on the right, a node is queried for its label, which also reveals its outgoing edges and neighboring nodes. Nodes above the traced line are included in the current snapshot.

The second algorithm modifies the original BFS procedure so that it explores one node of the same level at each query. Figure 1(right) illustrates two consecutive steps of the BFS procedure. On the left, the gray node is the one to be explored in the current query. After the query, the explored node turns black and, on the right, the gray nodes indicate the nodes to be queried next.
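The one-node-per-query exploration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each query explores exactly one node in BFS order, reveals all of its incident edges, and produces one snapshot. The function name `bfs_snapshots`, the adjacency-dictionary input and the toy graph are hypothetical and are not taken from the released implementation.

```python
from collections import deque


def bfs_snapshots(adj, start):
    """Yield (queried_nodes, known_edges) after each single-node query.

    adj: dict mapping node -> iterable of neighbors (undirected graph).
    Assumption: one query explores one node and reveals all of its edges.
    """
    queried = []             # nodes already explored (black in Fig. 1)
    seen = {start}           # nodes known so far (black or gray)
    known_edges = set()      # edges revealed so far (solid in Fig. 1)
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()        # next node in BFS order
        queried.append(node)
        for nb in adj[node]:
            known_edges.add(frozenset((node, nb)))
            if nb not in seen:           # newly discovered node becomes gray
                seen.add(nb)
                frontier.append(nb)
        # Copy the state so that each snapshot is independent of later queries.
        yield list(queried), set(known_edges)


# Toy 4-cycle used only for illustration.
toy = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
snaps = list(bfs_snapshots(toy, 0))
```

Each element of `snaps` is one snapshot; an embedding method would then be run on the graph induced by `known_edges` at that step.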

Fig. 1.

Example of two consecutive steps of the selective harvesting (left) and BFS (right) search algorithms. Black, gray and white colors represent queried, unqueried and unknown nodes, respectively. Solid and dashed lines represent known and unknown edges, respectively.

Figures 2 and 3 show the growth in the number of observed nodes and edges, respectively, across snapshots for SH and BFS. With either algorithm, we observe that the evolving process in the network snapshots starts with very few nodes and edges for all datasets. Thus, as mentioned before, existing dynamic embedding techniques cannot be applied to this scenario, since most of them are based on convolutional and recurrent neural networks, which require large training sets in order to avoid overfitting.

Fig. 2.

Node count for each dataset through the snapshots for the SH (top) and BFS (bottom) search procedures.

Fig. 3.

Edge count for each dataset through the snapshots for the SH (top) and BFS (bottom) search procedures.

The number of snapshots varies for each search procedure. As the search begins from a random node, the neighborhood exploration differs for each node. Table 3 presents the number of generated snapshots for both SH and BFS. For reproducibility purposes, all graph snapshots used in this work are publicly available: the EVNE implementation and all the datasets, snapshots, generated embeddings and results are available at http://afterthepublication.com/EVNE.

Table 3

Number of snapshots generated for each search procedure

ID	CS	DBP	DC	KS	WK
#SH	1482	677	133	680	379
#BFS	832	39	10	107	133

4.3. Baselines and experimental setup for node embedding generation

We obtain node embeddings for each of the snapshots of the five networks using four static node embedding models as baselines, namely DeepWalk, LINE $1^{st}$ and LINE $2^{nd}$ (first and second order, respectively), and node2vec. Refer to Table 1 for some characteristics of the baselines.

The embeddings were trained with 128 dimensions. DeepWalk and node2vec have three common parameters, namely the size of the context window and the number and length of the random walks. The size of the context window was set to $d = 80$ . The parameters controlling the number and length of the random walks sampled by DeepWalk and node2vec were also set to the node2vec defaults ( $r = 10$ and $ℓ = 80$ , respectively). Node2vec’s return parameter p and in-out parameter q were both set to 1.

We tested all variants of our method using the same parameters as node2vec. The only difference lies in the parameters p and q, which are static in node2vec and adaptive in our method. We designed a strategy for changing these parameters as a function of the knowledge about the network. We set the initial values to $q = 4$ and $p = 0.25$ , and then multiply p by 2 and divide q by 2 after every $\frac{1}{5}$ of the total number of snapshots. The rationale behind this strategy is as follows. When the embedding generation process begins, the graph is very small, so the sampling strategy should bias the search towards nodes that are close to the initial node; hence, we set a higher value for q and a lower value for p. As the graph evolves (nodes and edges can be added or removed) and the knowledge about the network increases, the sampling strategy should reach nodes that are further from the initial node; thus, we set higher values for p and lower values for q.
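The schedule described above can be written as a small function. The sketch below is one possible reading, not the released implementation: it assumes that the snapshot sequence is split into five equal stages, that the update is applied when a new stage starts (four updates in total), and that snapshots are indexed from 0. The name `pq_schedule` and its arguments are illustrative.

```python
def pq_schedule(snapshot_idx, n_snapshots, p0=0.25, q0=4.0,
                n_stages=5, factor=2.0):
    """Return (p, q) for a 0-based snapshot index.

    Assumption: p is multiplied and q is divided by `factor` each time the
    evolution enters a new fifth of the snapshot sequence.
    """
    # Stage 0 covers the first fifth of the snapshots, stage 4 the last fifth.
    stage = min(snapshot_idx * n_stages // n_snapshots, n_stages - 1)
    return p0 * factor ** stage, q0 / factor ** stage


# Values of (p, q) at the start of each stage for a run with 100 snapshots.
stages = [pq_schedule(i, 100) for i in (0, 20, 40, 60, 80)]
```

Under these assumptions the walk starts with $p = 0.25$ and $q = 4$ (local exploration), passes through the node2vec setting $p = q = 1$ in the middle stage, and ends with $p = 4$ and $q = 0.25$ (outward exploration).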

4.4. Node classification

Our experimental study focuses on the node classification task. We use the node embeddings generated for each snapshot and the corresponding node labels to train a classifier along the graph evolution process. We consider four standard classifiers, namely AdaBoost (AD), Logistic Regression (LG), Naive Bayes (NB) and Random Forest (RF). Classifiers were run with their default parameters, since the goal is to compare the embedding techniques. The performance of each combination dataset × embedding technique × classifier was computed over 5 executions of a 10-fold cross-validation procedure, each corresponding to a sequence of snapshots obtained from a different initial node. Since the classes are highly unbalanced, we chose Macro-F1 as our evaluation metric. The F1 measure is the harmonic mean of precision and recall. Macro averaging computes the metric for each class and takes the unweighted mean.
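To make the choice of metric concrete, the following Python sketch computes Macro-F1 from scratch on a toy unbalanced labeling. The function name `macro_f1` and the label vectors are illustrative assumptions; classes with no predicted or no true examples are given an F1 of 0, which is one common convention.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall (0 if both are 0).
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example: 2 targets (1) among 10 nodes, classifier predicts "non-target"
# for every node.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0] * 10
score = macro_f1(y_true, y_pred)
```

In this toy case the accuracy is 0.8, whereas Macro-F1 is 4/9 (about 0.44), because the target class contributes an F1 of 0. This is why Macro-F1 is better suited than accuracy to highly unbalanced classes.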

Also, to provide a holistic assessment of the embeddings over the entire set of experiments, we compute two evaluation metrics that account for all executions over all datasets used in the experimental study: Mean Penalty and Mean Rank. These metrics were proposed in ([11,31]) as a simple and intuitive way of comparing ensemble methods. Lower values of Mean Rank and Mean Penalty imply better performance. We computed a value of Mean Rank and Mean Penalty for each classifier. In our results, we present the sum of all Mean Ranks and Mean Penalties. We describe these metrics below.

Mean Rank: Let $R_{e, d}$ be the rank of embedding method e on dataset $d \in D$ , where D is the set of all datasets on which we run a classifier. The Mean Rank ${MR}_{e}$ of embedding method e is given by ${MR}_{e} = \frac{1}{| D |} \sum_{d \in D} R_{e, d}$ .

Mean Penalty: Let E be the set of embedding methods and $S_{e, d}$ be the score achieved by embedding method e on dataset d for a classifier. The Mean Penalty ${MP}_{e}$ is given by ${MP}_{e} = \frac{1}{| D |} \sum_{d \in D} \left( \max_{e^{'} \in E} S_{e^{'}, d} - S_{e, d} \right)$ .

Roughly speaking, Mean Rank determines which embedding yields the best results most often. Mean Penalty, in turn, determines which embedding is best when the scores themselves are taken into account. Therefore, an embedding e can achieve the lowest Mean Penalty even if e does not yield the best results for all datasets and classifiers, as long as its score difference to the best embedding is never too large. Hence, Mean Penalty identifies embeddings that are robust and consistent under changes in datasets and classifiers throughout the evolution process. By robustness, we mean that the method should show small variations of Mean Penalty across different datasets and classifiers. By consistency, we mean that a robust method should also behave similarly regardless of changes in the network’s evolution scenario, i.e., the method should be consistent over time.
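The two metrics can be illustrated with a short Python sketch that follows the definitions above for a single classifier. The function name, the method names A, B and C, and the scores are hypothetical and chosen only for illustration; ties in rank are broken by input order, which is an assumption, since the tie-handling rule is not specified here.

```python
def mean_rank_and_penalty(scores):
    """scores: dict method -> dict dataset -> score (higher is better).

    Returns two dicts: Mean Rank and Mean Penalty per method.
    """
    methods = list(scores)
    datasets = list(next(iter(scores.values())))
    rank_sum = {m: 0.0 for m in methods}
    penalty_sum = {m: 0.0 for m in methods}
    for d in datasets:
        # Rank 1 goes to the highest score on dataset d.
        ordered = sorted(methods, key=lambda m: scores[m][d], reverse=True)
        best = scores[ordered[0]][d]
        for position, m in enumerate(ordered, start=1):
            rank_sum[m] += position
            penalty_sum[m] += best - scores[m][d]
    n = len(datasets)
    mean_rank = {m: rank_sum[m] / n for m in methods}
    mean_penalty = {m: penalty_sum[m] / n for m in methods}
    return mean_rank, mean_penalty


# Hypothetical Macro-F1 scores of three methods on two datasets.
scores = {
    "A": {"d1": 0.70, "d2": 0.60},
    "B": {"d1": 0.69, "d2": 0.64},
    "C": {"d1": 0.50, "d2": 0.65},
}
mr, mp = mean_rank_and_penalty(scores)
```

In this toy example all three methods have the same Mean Rank (2.0), but method B, which is never the best, has the lowest Mean Penalty (about 0.01) because it always stays close to the best score.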

Next, we present and discuss the results found for the proposed research questions.

4.5. RQ1: Among the existing static node embeddings under consideration, which one achieves the best node classification performance in the described evolving network scenarios?

More specifically, which static methods achieve the best results in the node classification task? Does node2vec still provide better results than the classical baselines DeepWalk, LINE $1^{st}$ and LINE $2^{nd}$ ? Which of the 4 static methods is the most robust and consistent?

Figure 4 presents the Mean Rank and Mean Penalty results for all 5 datasets and each classifier using DeepWalk, LINE $1^{st}$ , LINE $2^{nd}$ , and node2vec. The results are presented for different iteration steps of the evolution process, namely 25%, 50%, 75% and 100% of the total number of steps (indicated at the end of each row of the figure as 0.25, 0.5, 0.75 and 1).

Figure 4(left) indicates that, for BFS, node2vec outperforms all other methods except at the first search stage, and that it is superior to LINE $1^{st}$ and LINE $2^{nd}$ in all cases. For SH, node2vec outperforms LINE $2^{nd}$ in all cases, LINE $1^{st}$ in 2 cases and DeepWalk in 1 out of 4 cases. In particular, node2vec outperforms LINE $2^{nd}$ in all cases, for both SH and BFS. Thus, we conclude that node2vec works best for BFS. For SH, DeepWalk yields competitive to superior results when compared to node2vec.

Figure 4(right) presents the Mean Penalty results for each baseline and search procedure. Note that the methods’ performance is consistent across classifiers and iteration steps. From the reported results, node2vec achieves the lowest Mean Penalty in 5 out of 8 cases. In particular, node2vec’s performance tends to be more robust than that of the other baselines; in other words, the deviation associated with the other baselines’ performance is greater than the deviation associated with node2vec’s performance. All baselines are consistent over time.

From the results discussed above, we find that node2vec displays great robustness across datasets and classifiers, and great consistency over time. For this reason, from now on we focus on comparing our method with node2vec.

Fig. 4.

Results of mean rank (left) and mean penalty (right) for all datasets on all classifiers, regarding 4 iteration steps and 2 search methods (BFS and SH). Dw, l1, l2 and n2v represent, respectively, DeepWalk, LINE $1^{st}$ , LINE $2^{nd}$ and node2vec. Lower values represent better performance.

4.6. RQ2: Are EVNE’s mechanisms effective for improving node classification performance in evolving network scenarios? Does EVNE outperform the baselines?

We showed in the previous section that node2vec is typically the best among the static methods and, since EVNE extends node2vec, we use it as our main baseline henceforth. Recall that EVNE introduces three new mechanisms to node2vec. In order to understand the impact of each, we analyze all seven combinations of these mechanisms. The mechanisms are referred to by the following acronyms: dpq for dynamic p and q parameters, rw for the evolving random walks, and wgt for initializing weights using those of the previous snapshot. Combinations that include 2 mechanisms are denoted by their acronyms separated by a “+” (that is, “modification 1 + modification 2”), whereas the combination including all three mechanisms is denoted by all.
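As a quick check of the count, the seven variants are the non-empty subsets of the three mechanisms. The short Python sketch below enumerates them using the labels adopted in this section; the variable names are illustrative.

```python
from itertools import combinations

mechanisms = ("dpq", "rw", "wgt")

variants = []
for k in range(1, len(mechanisms) + 1):
    for combo in combinations(mechanisms, k):
        # The full combination is labeled "all"; the others join their acronyms.
        label = "all" if k == len(mechanisms) else "+".join(combo)
        variants.append(label)
```

The resulting list contains three single-mechanism variants, three pairs and the full model, i.e. $2^{3} - 1 = 7$ variants.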

Figure 5(left) presents the Mean Rank results for the seven combinations of EVNE and for node2vec. We computed the Mean Rank over the 4 classifiers, in order to avoid bias of looking at a single classifier at a time.

The BFS results indicate that, at each iteration step, at least one variation of EVNE performs better than node2vec. For the SH search, we find the same behavior, except at the 100% stage, where node2vec outperforms all seven combinations of EVNE. In short, EVNE substantially outperforms node2vec in 7 out of 8 cases.

From the results in Fig. 5(left), we observe that the EVNE variations dpq+wgt and rw+wgt perform better than the other variations for BFS. In addition, dpq+wgt outperforms node2vec in all 4 scenarios (BFS and SH, for both the Mean Rank and Mean Penalty metrics). Regarding the SH search process, the EVNE variations dpq+wgt, rw and wgt perform better than the other variations. Additionally, variations dpq+wgt and wgt outperform node2vec in 3 out of 4 scenarios.

Fig. 5.

Mean rank (left) and mean penalty (right) of node2vec and the seven combinations of EVNE by classifier, iteration step and search method. The smaller the bar the better.

From this analysis, we highlight 3 key facts: (i) any improvement in classification performance results only from the embedding obtained and not from the classifier used; (ii) the mechanisms of EVNE are responsible for improving node classification performance in evolving scenarios; and (iii) EVNE wgt is the variation with the best results considering all iteration steps, which makes it the best variation of the proposed model in a general representation scenario.

4.7. RQ3: Is EVNE more robust than the baselines? Are the results of EVNE consistent over time?

We analyze the Mean Penalty results to assess the consistency and robustness of all EVNE variations and node2vec over time. Low Mean Penalty values indicate that the errors obtained are low relative to other embeddings in the evaluated scenarios.

Figure 5(right) presents the Mean Penalty results for all EVNE variations and node2vec. Two interesting facts should be highlighted: (i) the Mean Penalty values are very low for both BFS and SH, which indicates great robustness of the proposed method variants; and (ii) when comparing Fig. 5(right) with Fig. 4(right), we observe a large difference between the Mean Penalty values of the EVNE variations and those of the baselines previously analyzed. EVNE is considerably more robust than the baselines evaluated.

The similar values of Mean Penalty observed at different iteration steps indicate that changes during the evolution of the network do not interfere with the consistency of the method over time. In particular, we observe that our method and its variations are substantially more consistent than the other baselines analyzed. According to Fig. 5(right), EVNE variations tend to be more consistent than node2vec in all scenarios.

4.8. RQ4: Can we quantify and understand the effect of the modifications on node2vec over time?

From Fig. 5, we observe that leveraging information on the network evolution process through each of EVNE’s mechanisms can have a positive impact on the node classification performance for all classifiers. Our analysis indicates 2 main results: (i) most of EVNE’s variants outperform node2vec w.r.t. Macro-F1 based on the Mean Rank evaluation; and (ii) EVNE provides results that are more robust and more consistent throughout the network evolution process.

Therefore, to answer RQ4: (i) we analyze how the aggregation of evolving information affects the node classification performance during the exploration process when compared to node2vec and (ii) we quantify this effect at different points of the evolving process. We show Macro-F1 values for different points in time in Tables 4 and 5.

Table 4
Numerical results for the BFS search method, on four different iteration steps. In bold the best results of each dataset for each embedding method according to 4 different classifiers

Search method	ID	Embed.	25%				50%				75%				100%
			AD	LG	NB	RF	AD	LG	NB	RF	AD	LG	NB	RF	AD	LG	NB	RF
BFS	CS	n2v	1.000	0.716	1.000	1.000	0.623	0.713	0.792	0.644	0.650	0.651	0.687	0.619	0.689	0.681	0.687	0.674
		rw	1.000	0.736	1.000	1.000	0.693	0.710	0.830	0.616	0.660	0.701	0.710	0.582	0.686	0.676	0.697	0.670
		wgt	1.000	0.739	1.000	1.000	0.576	0.725	0.801	0.594	0.680	0.731	0.665	0.604	0.688	0.712	0.607	0.645
		dpq	0.733	0.718	0.756	0.691	0.664	0.703	0.713	0.624	0.688	0.702	0.720	0.631	0.677	0.673	0.714	0.651
		rw+wgt	1.000	0.732	1.000	1.000	0.566	0.716	0.799	0.545	0.666	0.738	0.699	0.588	0.699	0.712	0.695	0.649
		dpq+rw	1.000	0.716	1.000	1.000	0.669	0.716	0.835	0.577	0.664	0.717	0.688	0.625	0.691	0.682	0.670	0.680
		dpq+wgt	1.000	0.737	1.000	1.000	0.664	0.721	0.804	0.576	0.661	0.736	0.713	0.592	0.686	0.705	0.687	0.665
		all	1.000	0.736	1.000	1.000	0.577	0.721	0.808	0.511	0.645	0.732	0.703	0.595	0.686	0.710	0.695	0.649
	DC	n2v	0.540	0.529	0.542	0.484	0.548	0.553	0.541	0.521	0.576	0.567	0.608	0.521	0.582	0.563	0.633	0.525
		rw	0.519	0.516	0.513	0.487	0.536	0.545	0.560	0.526	0.556	0.592	0.604	0.502	0.578	0.587	0.607	0.535
		wgt	0.550	0.557	0.456	0.493	0.551	0.561	0.513	0.503	0.567	0.594	0.566	0.494	0.562	0.603	0.616	0.525
		dpq	0.533	0.547	0.563	0.526	0.540	0.563	0.596	0.494	0.532	0.542	0.590	0.518	0.547	0.577	0.621	0.537
		rw+wgt	0.551	0.517	0.424	0.505	0.559	0.555	0.536	0.499	0.556	0.579	0.544	0.498	0.571	0.577	0.592	0.526
		dpq+rw	0.540	0.541	0.556	0.502	0.551	0.557	0.533	0.507	0.549	0.569	0.612	0.498	0.590	0.572	0.617	0.545
		dpq+wgt	0.506	0.530	0.474	0.483	0.556	0.575	0.535	0.508	0.584	0.586	0.580	0.510	0.556	0.577	0.590	0.517
		all	0.543	0.531	0.470	0.483	0.540	0.577	0.645	0.493	0.573	0.620	0.622	0.502	0.551	0.592	0.532	0.498
	DBP	n2v	0.633	0.631	0.643	0.649	0.735	0.755	0.726	0.747	0.744	0.758	0.738	0.761	0.757	0.763	0.748	0.771
		rw	0.631	0.632	0.643	0.645	0.744	0.753	0.725	0.754	0.754	0.760	0.737	0.763	0.758	0.765	0.748	0.777
		wgt	0.637	0.592	0.643	0.653	0.729	0.740	0.725	0.749	0.744	0.745	0.738	0.766	0.756	0.754	0.748	0.775
		dpq	0.634	0.628	0.643	0.652	0.733	0.744	0.724	0.750	0.750	0.747	0.738	0.765	0.762	0.754	0.748	0.771
		rw+wgt	0.634	0.593	0.643	0.653	0.738	0.744	0.725	0.748	0.752	0.751	0.737	0.768	0.754	0.757	0.748	0.777
		dpq+rw	0.630	0.626	0.643	0.647	0.728	0.738	0.725	0.744	0.744	0.732	0.737	0.764	0.756	0.742	0.745	0.776
		dpq+wgt	0.638	0.601	0.643	0.655	0.739	0.733	0.725	0.753	0.749	0.732	0.738	0.768	0.761	0.745	0.748	0.783
		all	0.631	0.589	0.643	0.642	0.742	0.718	0.725	0.747	0.745	0.720	0.736	0.770	0.760	0.721	0.746	0.783
	KS	n2v	0.487	0.477	0.529	0.493	0.493	0.526	0.574	0.498	0.510	0.517	0.573	0.505	0.512	0.509	0.540	0.509
		rw	0.497	0.486	0.531	0.500	0.502	0.519	0.569	0.491	0.511	0.513	0.590	0.503	0.511	0.508	0.558	0.506
		wgt	0.495	0.489	0.533	0.485	0.504	0.518	0.550	0.485	0.517	0.518	0.561	0.498	0.515	0.507	0.560	0.503
		dpq	0.491	0.527	0.573	0.496	0.513	0.514	0.572	0.510	0.513	0.508	0.550	0.510	0.524	0.510	0.572	0.513
		rw+wgt	0.500	0.488	0.543	0.488	0.502	0.514	0.589	0.490	0.521	0.516	0.592	0.501	0.521	0.511	0.552	0.498
		dpq+rw	0.492	0.490	0.529	0.489	0.495	0.516	0.568	0.492	0.515	0.521	0.580	0.501	0.521	0.511	0.541	0.510
		dpq+wgt	0.499	0.490	0.513	0.488	0.508	0.516	0.568	0.489	0.523	0.514	0.576	0.494	0.524	0.515	0.564	0.500
		all	0.499	0.493	0.521	0.489	0.512	0.517	0.562	0.486	0.516	0.515	0.555	0.493	0.529	0.509	0.538	0.493
	WK	n2v	0.505	0.423	0.422	0.492	0.500	0.427	0.435	0.489	0.505	0.439	0.438	0.499	0.514	0.463	0.449	0.493
		rw	0.504	0.443	0.423	0.489	0.514	0.428	0.435	0.490	0.510	0.445	0.434	0.503	0.522	0.462	0.452	0.500
		wgt	0.503	0.458	0.423	0.489	0.506	0.456	0.433	0.489	0.505	0.463	0.439	0.499	0.504	0.481	0.451	0.501
		dpq	0.517	0.420	0.419	0.491	0.515	0.427	0.433	0.495	0.509	0.444	0.437	0.494	0.508	0.451	0.448	0.500
		rw+wgt	0.510	0.454	0.423	0.493	0.515	0.446	0.402	0.489	0.508	0.457	0.434	0.501	0.505	0.470	0.451	0.496
		dpq+rw	0.494	0.424	0.424	0.489	0.509	0.436	0.436	0.490	0.514	0.441	0.435	0.498	0.509	0.467	0.447	0.495
		dpq+wgt	0.509	0.452	0.422	0.489	0.514	0.450	0.434	0.489	0.510	0.464	0.434	0.497	0.505	0.476	0.451	0.501
		all	0.498	0.458	0.420	0.489	0.510	0.449	0.332	0.491	0.507	0.463	0.421	0.497	0.518	0.466	0.417	0.494

Table 5

Numerical results for the SH search method, on four different iteration steps. In bold the best results of each dataset for each embedding method according to 4 different classifiers

Search method	ID	Embed.	25%				50%				75%				100%
			AD	LG	NB	RF	AD	LG	NB	RF	AD	LG	NB	RF	AD	LG	NB	RF
SH	CS	n2v	0.701	0.705	0.700	0.747	0.691	0.685	0.685	0.758	0.681	0.694	0.682	0.757	0.702	0.710	0.704	0.770
		rw	0.701	0.702	0.702	0.748	0.680	0.685	0.684	0.750	0.682	0.697	0.692	0.752	0.701	0.711	0.699	0.762
		wgt	0.700	0.712	0.612	0.755	0.678	0.672	0.618	0.750	0.687	0.703	0.633	0.749	0.704	0.708	0.671	0.758
		dpq	0.697	0.692	0.694	0.747	0.683	0.682	0.683	0.752	0.694	0.692	0.696	0.759	0.701	0.713	0.705	0.763
		rw+wgt	0.692	0.694	0.647	0.746	0.669	0.687	0.626	0.733	0.679	0.700	0.617	0.739	0.690	0.712	0.642	0.745
		dpq+rw	0.695	0.697	0.686	0.743	0.679	0.683	0.679	0.748	0.681	0.695	0.685	0.745	0.692	0.711	0.702	0.749
		dpq+wgt	0.680	0.698	0.664	0.743	0.683	0.700	0.633	0.747	0.688	0.701	0.627	0.751	0.689	0.711	0.665	0.746
		all	0.666	0.705	0.624	0.731	0.681	0.686	0.616	0.728	0.673	0.701	0.602	0.734	0.686	0.709	0.626	0.734
	DC	n2v	0.595	0.663	0.609	0.515	0.646	0.652	0.658	0.517	0.615	0.709	0.670	0.522	0.630	0.710	0.690	0.555
		rw	0.638	0.705	0.661	0.564	0.612	0.698	0.696	0.511	0.624	0.721	0.691	0.497	0.627	0.675	0.672	0.487
		wgt	0.648	0.660	0.673	0.561	0.674	0.730	0.735	0.558	0.680	0.728	0.703	0.482	0.638	0.676	0.728	0.489
		dpq	0.622	0.650	0.618	0.498	0.628	0.646	0.640	0.552	0.647	0.695	0.658	0.523	0.673	0.712	0.703	0.518
		rw+wgt	0.613	0.663	0.692	0.517	0.613	0.675	0.665	0.532	0.673	0.689	0.720	0.512	0.635	0.677	0.672	0.479
		dpq+rw	0.600	0.662	0.638	0.497	0.637	0.687	0.666	0.520	0.636	0.707	0.709	0.493	0.699	0.707	0.682	0.497
		dpq+wgt	0.649	0.679	0.689	0.563	0.635	0.718	0.726	0.585	0.652	0.697	0.734	0.485	0.699	0.701	0.756	0.485
		all	0.595	0.678	0.651	0.549	0.639	0.641	0.678	0.486	0.645	0.668	0.743	0.491	0.636	0.692	0.726	0.490
	DBP	n2v	0.734	0.743	0.688	0.726	0.765	0.776	0.700	0.742	0.766	0.792	0.744	0.760	0.791	0.815	0.756	0.766
		rw	0.744	0.751	0.697	0.729	0.754	0.773	0.681	0.726	0.795	0.802	0.757	0.779	0.809	0.825	0.767	0.796
		wgt	0.739	0.758	0.722	0.733	0.766	0.782	0.708	0.741	0.791	0.809	0.756	0.778	0.816	0.821	0.781	0.814
		dpq	0.735	0.746	0.704	0.741	0.757	0.764	0.713	0.725	0.743	0.793	0.738	0.688	0.748	0.776	0.724	0.665
		rw+wgt	0.733	0.758	0.725	0.722	0.765	0.784	0.702	0.739	0.796	0.808	0.768	0.784	0.811	0.827	0.768	0.784
		dpq+rw	0.742	0.746	0.703	0.725	0.767	0.783	0.689	0.724	0.787	0.804	0.752	0.763	0.782	0.816	0.743	0.742
		dpq+wgt	0.734	0.750	0.724	0.733	0.763	0.778	0.700	0.749	0.805	0.808	0.755	0.808	0.753	0.779	0.738	0.753
		all	0.733	0.756	0.743	0.681	0.770	0.779	0.685	0.740	0.800	0.810	0.776	0.767	0.788	0.802	0.757	0.757
	KS	n2v	0.544	0.565	0.565	0.508	0.539	0.537	0.557	0.496	0.539	0.528	0.573	0.495	0.529	0.530	0.602	0.492
		rw	0.546	0.567	0.551	0.496	0.539	0.537	0.577	0.501	0.535	0.535	0.596	0.494	0.536	0.535	0.612	0.494
		wgt	0.550	0.565	0.560	0.487	0.542	0.540	0.549	0.506	0.536	0.529	0.480	0.492	0.539	0.531	0.478	0.498
		dpq	0.558	0.565	0.567	0.516	0.543	0.539	0.566	0.504	0.540	0.524	0.584	0.495	0.536	0.520	0.593	0.491
		rw+wgt	0.544	0.568	0.563	0.500	0.533	0.536	0.487	0.498	0.536	0.525	0.478	0.496	0.539	0.529	0.461	0.506
		dpq+rw	0.552	0.567	0.545	0.505	0.537	0.543	0.581	0.500	0.547	0.532	0.592	0.495	0.539	0.525	0.599	0.489
		dpq+wgt	0.544	0.562	0.547	0.500	0.550	0.541	0.515	0.508	0.533	0.526	0.489	0.499	0.532	0.521	0.496	0.501
		all	0.543	0.561	0.541	0.500	0.535	0.539	0.499	0.505	0.535	0.534	0.466	0.503	0.544	0.526	0.474	0.502
	WK	n2v	0.642	0.654	0.628	0.570	0.665	0.664	0.639	0.530	0.664	0.676	0.638	0.520	0.672	0.674	0.617	0.529
		rw	0.652	0.647	0.652	0.533	0.633	0.670	0.646	0.529	0.677	0.684	0.642	0.517	0.656	0.666	0.639	0.537
		wgt	0.621	0.632	0.667	0.540	0.653	0.662	0.658	0.515	0.681	0.683	0.649	0.521	0.646	0.654	0.653	0.524
		dpq	0.659	0.648	0.651	0.573	0.664	0.674	0.654	0.526	0.670	0.671	0.642	0.546	0.641	0.667	0.600	0.514
		rw+wgt	0.641	0.643	0.661	0.521	0.642	0.667	0.678	0.498	0.680	0.682	0.656	0.530	0.659	0.667	0.653	0.535
		dpq+rw	0.635	0.646	0.616	0.569	0.654	0.681	0.650	0.517	0.679	0.682	0.649	0.518	0.675	0.668	0.616	0.504
		dpq+wgt	0.636	0.657	0.658	0.528	0.654	0.677	0.651	0.508	0.676	0.687	0.664	0.512	0.690	0.676	0.669	0.527
		all	0.623	0.649	0.646	0.525	0.661	0.674	0.660	0.501	0.662	0.680	0.632	0.518	0.671	0.664	0.664	0.530

Table 4 shows the Macro-F1 achieved at four different iteration steps, after 25%, 50%, 75% and 100% of the evaluation period for the BFS search process. For each dataset, classifier and snapshot, we highlight in bold the best performing embedding method. We group the analysis of the table according to each snapshot.

At 25% of the graph exploration: in all scenarios, across datasets and classifiers, we observe that at least one variation of EVNE performs substantially better than node2vec. The differences are up to 6%. The variations dpq and wgt present the best results.

At 50% of the graph exploration: at least one of the EVNE variations performs better than node2vec in all scenarios, except for CS running RF, DBP running LG and NB, and KS running LG. EVNE outperforms node2vec by up to 20%. Variations rw and dpq present the best results.

At 75% of the graph exploration: once again, EVNE provides superior results in most of the evaluated scenarios. node2vec performs better than EVNE in 1 out of 20 evaluated scenarios: DC running RF. EVNE’s results are up to 8% better than node2vec’s. Variations dpq and rw present the best results again.

At 100% of the graph exploration: node2vec outperforms EVNE only on DC running NB. In all other cases, at least one EVNE variation achieves a higher Macro-F1 than node2vec, by up to 4%. Variations wgt and dpq+rw present the best results.

Table 5 shows the Macro-F1 achieved at four different stages (25%, 50%, 75% and 100%) of the evaluation period for the SH search process. For each dataset, classifier and snapshot, we highlight the results of the best performing embedding method. We organize the analysis of the table by snapshot (25%, 50%, 75% or 100%).

At 25% of the graph exploration: in all 20 scenarios, at least one variation of EVNE outperforms node2vec. EVNE’s Macro-F1 results are up to 8% higher. Variations wgt and rw present the best results.

At 50% of the graph exploration: EVNE yields higher values of Macro-F1, except in 4 cases: CS running AD, NB and RF, and WK running RF. EVNE achieves Macro-F1 up to 8% higher than node2vec’s. Variations wgt and dpq+wgt give the best results.

At 75% of the graph exploration: EVNE outperforms node2vec in all 20 scenarios, by up to 7%. The variations dpq+wgt, wgt and dpq present the best results.

At 100% of the graph exploration: node2vec performs better than EVNE in only 2 out of 20 scenarios: CS with RF and DC with RF. Variations dpq+wgt, wgt and dpq yield the best results, outperforming node2vec by up to 6%.

To provide an overview of EVNE’s performance, we show in Fig. 6 the Macro-F1 values for all classifiers through all stages of the evolving process, for both the BFS (left) and SH (right) search methods.

Fig. 6. Mean absolute macro-F1 results of node2vec and the seven variations of EVNE, by classifier and search method, at all iteration steps.

Macro-F1 values for the BFS search process are generally lower than those for the SH search process. In addition, Macro-F1 tends to increase over time for both BFS and SH.

Also from Fig. 6, we observe that, for both SH and BFS, the variations dpq, dpq+wgt and wgt tend to provide the best results, while the variation all tends to provide lower values of Macro-F1. Since EVNE is a skip-gram model based on biased random walks, it has inherent randomness, which may be responsible for part of the variation in the results across EVNE variants. Overall, EVNE wgt is the best variation of the proposed model, showing better results than node2vec and the other EVNE variations in most cases, for both the BFS and SH search processes.
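To illustrate where this randomness enters, the following is a minimal sketch of a node2vec-style second-order biased walk on an unweighted graph. It is an illustration under stated assumptions rather than EVNE's implementation: the function name `biased_walk`, the adjacency-dictionary representation, the toy graph and the parameter values are all hypothetical.

```python
import random


def biased_walk(adj, start, length, p, q, rng):
    """Sample one node2vec-style walk; adj maps each node to a set of neighbours."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])            # sorted for reproducibility across runs
        if not nbrs:
            break                          # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)    # return to the previous node
            elif x in adj[prev]:
                weights.append(1.0)        # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)    # move outward, to distance 2
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy graph, assumed for illustration only
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(biased_walk(adj, 0, 10, p=1.0, q=0.5, rng=random.Random(7)))
print(biased_walk(adj, 0, 10, p=1.0, q=0.5, rng=random.Random(8)))
```

Two runs with the same seed produce the same walk, whereas different seeds generally produce different walks and therefore different training corpora for the skip-gram model, which is one source of run-to-run variation in downstream Macro-F1.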

5. Final remarks

In this work we proposed EVNE, a new node embedding technique that learns representation vectors from evolving networks. Unlike previous works, EVNE is a viable alternative for networks that evolve from very small graphs. As the proposed method does not rely on deep learning architectures, it is possible to learn useful embeddings regardless of the graph size. EVNE is built on top of node2vec, a well-known method for generating static embeddings. We design mechanisms to deal with information that evolves and is updated as the network expands. Although EVNE uses node2vec as a building block, its mechanisms can be applied to any Skip-Gram based Network Embedding (SGNE) or random-walk-based method in a dynamic setting.

The mechanisms we propose for EVNE are generic enough to be applied to any SGNE or random-walk-based method in a dynamic setting. In this work, EVNE incorporates three mechanisms that extend the node2vec algorithm to leverage information contained in previous snapshots of the network without increasing the asymptotic complexity of the original method. The result is a node embedding technique able to generate embeddings for both small and large evolving networks, with the asymptotic complexity of a traditional static method and without the need to train a large set of recurrent and convolutional neural network parameters.
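As a rough illustration of how such mechanisms can wrap a static skip-gram method, the sketch below accumulates walks across snapshots and initializes each snapshot's embeddings from the previous ones. It is a simplified outline under stated assumptions, not EVNE's actual code: the names `warm_start` and `evolve`, and the caller-supplied `sample_walks` and `train` functions, are hypothetical, and the schedule for the time-varying walk parameters is left to `sample_walks`, which receives the snapshot index `t`.

```python
import random


def warm_start(prev_emb, nodes, dim, rng):
    """Initial vectors for a snapshot: reuse known nodes, randomly initialize new ones."""
    emb = {}
    for v in nodes:
        if v in prev_emb:
            emb[v] = list(prev_emb[v])   # continue from the previous parameters
        else:
            emb[v] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    return emb


def evolve(snapshots, sample_walks, train, dim=8, seed=0):
    """Return one embedding dict per snapshot.

    sample_walks(adj, t, rng) -> list of walks (hypothetical; may vary p and q with t)
    train(corpus, init)       -> trained embedding dict (hypothetical skip-gram trainer)
    """
    rng = random.Random(seed)
    corpus, emb, history = [], {}, []
    for t, adj in enumerate(snapshots):
        corpus.extend(sample_walks(adj, t, rng))     # keep walks from earlier snapshots
        init = warm_start(emb, adj.keys(), dim, rng)
        emb = train(corpus, init)                    # optimization starts from init
        history.append(emb)
    return history


# Stub walk sampler and trainer, for illustration only
snapshots = [{0: {1}, 1: {0}}, {0: {1}, 1: {0, 2}, 2: {1}}]
history = evolve(snapshots,
                 sample_walks=lambda adj, t, rng: [[v] for v in adj],
                 train=lambda corpus, init: init,
                 dim=4)
print(sorted(history[-1]))
```

In this outline each snapshot adds only its own new walks and reuses the existing vectors, so no earlier snapshot is re-sampled or retrained from scratch.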

EVNE yields embeddings capable of adapting to structural changes that take place in evolving graphs. Through a wide set of experiments involving several combinations of embedding techniques, network datasets and classifiers, we demonstrated that EVNE generates better embeddings: in a downstream node classification task, our embeddings achieved better performance than the static baseline in most cases. Our results show the value of aggregating temporal information in the generation of the embedding. EVNE and its variations are robust and consistent over the evolving process. In addition, in a general representation scenario, we suggest EVNE wgt as the best variation of the proposed model.

After an extensive analysis of objective results on different types of datasets, we conclude that EVNE is an effective alternative for generating dynamic network embeddings, even when the networks are still extremely small. Furthermore, applying the proposed method in this type of scenario could facilitate, for instance, the identification of fake news spreading on Twitter while the dispersion process is still recent and the information has not yet spread widely.

There are several directions for future work. EVNE uses two time-varying parameters to regulate the behavior of the biased random walks over the process of graph exploration. We plan to improve the generation of these parameters, based on a dynamic adaptation strategy. We also plan to use EVNE representations in a hybrid scenario where an evolving graph becomes large enough that a deep learning method can be applied. Thus, EVNE representations could be used as inputs to deep learning methods. Finally, we intend to extend our evaluation to larger datasets with different characteristics.

Acknowledgements

This work was partially funded by projects EUBra-BIGSEA (H2020-EU.2.1.1 690116, Brazil/MCTI/RNP GA-000650/04), ATMOSPHERE (H2020 777154 and MCTIC/RNP 51119), FAPEMIG (grant no. CEX-PPM-00098-17), MPMG (project Analytical Capabilities), CNPq (grant no. 310833/2019-1), CAPES, MCTIC/RNP (grant no. 51119) and H2020 (grant no. 777154).

References

1. Bengio, Courville and Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8) (2013), 1798–1828. doi:10.1109/TPAMI.2013.50.

2. Bojchevski, Klicpera, Perozzi, Kapoor, Blais, Rózemberczki, Lukasik and Günnemann, Scaling graph neural networks with approximate pagerank, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 2464–2473. doi:10.1145/3394486.3403296.

3. Cai, V.W. Zheng and K.C.-C. Chang, A comprehensive survey of graph embedding: Problems, techniques, and applications, IEEE Transactions on Knowledge and Data Engineering 30(9) (2018), 1616–1637. doi:10.1109/TKDE.2018.2807452.

4. Cao, Lu and Xu, Deep neural networks for learning graph representations, AAAI (2016).

5. Chen, Y.-C. Wang, Wang and C.-C.J. Kuo, Graph representation learning: A survey, Transactions on Signal and Information Processing 9 (2020).

6. Du, Wang, Song, Lu and Wang, Dynamic network embedding: An extended approach for skip-gram based network embedding, in: IJCAI, 2018, pp. 2086–2092.

7. Goyal, S.R. Chhetri and Canedo, Dyngraph2vec: Capturing network dynamics using dynamic graph representation learning, Knowledge-Based Systems 187 (2020), 104816. doi:10.1016/j.knosys.2019.06.024.

8. Goyal and Ferrara, Graph embedding techniques, applications, and performance: A survey, Knowledge-Based Systems 151 (2018), 78–94. doi:10.1016/j.knosys.2018.03.022.

9. Goyal, Kamra, He and Liu, Dyngem: Deep embedding method for dynamic graphs, arXiv:1805.11273, 2018.

10. Grover and Leskovec, node2vec: Scalable feature learning for networks, ACM SIGKDD (2016).

11. Gurukar, Vijayan, Srinivasan, Bajaj, Cai, Keymanesh, Kumar, Maneriker, Mitra, Patel et al., Network representation learning: Consolidation and renewed bearing, arXiv:1905.00987, 2019.

12. Hamilton, Ying and Leskovec, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems (2017).

13. Khurana and Deshpande, Efficient snapshot retrieval over historical graph data, ICDE (2013).

14. Ma, Guo, Ren, Zhao, Tang and Yin, Streaming graph neural networks, arXiv:1810.10627, 2018.

15. Mahdavi, Khoshraftar and An, dynnode2vec: Scalable dynamic network embedding, in: IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 3762–3765.

16. Meilian and Danna, HIN_DRL: A random walk based dynamic network representation learning method for heterogeneous information networks, Expert Systems with Applications 158 (2020), 113427. doi:10.1016/j.eswa.2020.113427.

17. Mendoza, Parra and Á. Soto, GENE: Graph generation conditioned on named entities for polarity and controversy detection in social media, Information Processing & Management (2020), 102366. doi:10.1016/j.ipm.2020.102366.

18. Mikolov, Sutskever, Chen, G.S. Corrado and Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems (2013).

19. Mitrović, Baesens, Lemahieu and De Weerdt, Tcc2vec: RFM-informed representation learning on call graphs for churn prediction, Journal of Information Science 477 (2019), 11–53.

20. Mitrović and De Weerdt, Churn modeling with probabilistic meta paths-based representation learning, Information Processing & Management 57(2) (2020), 102052. doi:10.1016/j.ipm.2019.06.001.

21. Murai, Rennó, Ribeiro, G.L. Pappa, Towsley and Gile, Selective harvesting over networks, Data Mining and Knowledge Discovery 32(1) (2018), 187–217. doi:10.1007/s10618-017-0523-0.

22. Ou, Cui, Pei, Zhang and Zhu, Asymmetric transitivity preserving graph embedding, ACM SIGKDD (2016).

23. Perozzi, Al-Rfou and Skiena, Deepwalk: Online learning of social representations, ACM SIGKDD (2014).

24. Qiao, Luo, Li, Tian and Ma, Heterogeneous graph-based joint representation learning for users and POIs in location-based social network, Information Processing & Management 57(2) (2020), 102151. doi:10.1016/j.ipm.2019.102151.

25. Sahu, Mhedhbi, Salihoglu, Lin and M.T. Özsu, The ubiquity of large graphs and surprising challenges of graph processing, Proceedings of the VLDB Endowment 11(4) (2017), 420–431. doi:10.1145/3186728.3164139.

26. Singer, Guy and Radinsky, Node embedding over temporal graphs, in: Proceedings of the 28th International Joint Conference on Artificial Intelligence, AAAI Press, 2019, pp. 4605–4612.

27. Tang, Qu, Wang, Zhang, Yan and Mei, Line: Large-scale information network embedding, WWW (2015).

28. Trivedi, Farajtabar, Biswal and Zha, Representation learning over dynamic graphs, arXiv:1803.04051, 2018.

29. Van Belle, Mitrović and De Weerdt, Representation learning in graphs for credit card fraud detection, in: Workshop on Mining Data for Financial Applications, Springer, 2019, pp. 32–46.

30. C.O. Vázquez, Mitrović, De Weerdt and vanden Broucke, A Comparative Study of Representation Learning Techniques for Dynamic Networks, World Conference on Information Systems and Technologies, Springer, 2020, pp. 523–530.

31. Vijayan, Chandak, M.M. Khapra and Ravindran, Fusion graph convolutional networks, Mining and Learning with Graphs (MLG), ACM SIGKDD (2018).

32. Wang, Cui and Zhu, Structural deep network embedding, ACM SIGKDD (2016).

33. Zhang, Yu, Xia and Wang, Protein–protein interactions prediction based on ensemble deep neural networks, Neurocomputing 324 (2019), 10–19. doi:10.1016/j.neucom.2018.02.097.

34. Zhang and A.A. Ghorbani, An overview of online fake news: Characterization, detection, and discussion, Information Processing & Management 57(2) (2020), 102025. doi:10.1016/j.ipm.2019.03.004.