Abstract
The application of Machine Learning techniques over networks, such as prediction tasks over nodes and edges, is becoming often crucial in the analysis of Complex systems in a wide range of research fields. One of the enabling technologies in that sense is represented by Node Embedding, which enables us to learn features automatically over the network. Among the different approaches proposed in the literature, the most promising are DeepWalk and Node2Vec, where the embedding is computed by combining random walks and neural language models. However, characteristic limitations with these techniques are related to memory requirements and time complexity. In this paper, we propose a distributed and scalable solution, named ActorNode2vec, that keeps the best advantages of Node2Vec and overcomes the limitations with the adoption of the actor model to distribute the computational load. We demonstrate the efficacy of this approach with a large network by analyzing the sensitivity of walk length and number of walks parameters and make a comparison also with Deep walk and an Apache Spark distributed implementation of Node2Vec. Results show that with ActorNode2vec computational times are drastically reduced without losing embedding quality and overcoming memory issues.
Keywords
Introduction
In a wide range of disciplines it is possible to find real systems often characterized by heterogeneous or similar entities that interact with each other in a complex way: In fact, these so-called Complex systems are pervasive in several research fields, such as Sociology, Biology, Genetics, Physics, Computer science and Finance and their analysis for knowledge discovery is still challenging. Complex system analysis involves often the use of graphs (or networks) to model the behavior of the system with the basic idea of representing entities as nodes and their interactions and dynamics as (un)directed edges. For decades the study of graph-data has been limited to analysis of the network topology with structural metrics that are capable of extracting connectivity patterns among the system components. More recently, with the progress of Machine Learning techniques, it is emerged the idea of taking advantages from this kind of structures, also to perform prediction tasks or clustering. For example in [1] the authors modeled the interaction between proteins as a network, with the aim of automatic predicting a correct label for each protein describing its functionalities. In [2] the authors uses a temporal network to model the US stock market in order to discover correlations among the dynamics of stocks’ cluster and to predict economic crises. In [3] the authors analyze a social group of patients to extract new knowledge about their emotional state and disease temporal pattern by modeling them in two attributed networks: an interaction network and a friendship network. However, the application of Machine Learning directly on graph-data is made it difficult because of the necessary manually feature extraction. This context can be seen as very similar to Text Mining where the same problem has been addressed with the development of Neural language models [4, 5] that perform a Representation Learning task over sentences, called Embedding, in order to learn a vector representation of each word. In light of this, several works have highlighted the importance of performing a similar task with the same models also to extract automatically features from networks that can preserve local and global information of nodes. The first successful attempt in this direction is represented by the algorithm DeepWalk [6] who combines the Skip-gram model and the generation of unbiased random walks. However, to the best of our knowledge the State of Art in this field is represented by the Node2Vec algorithm [7] which differs from the previous one for the adoption of biased random walks. However, these approaches present some limitations related to memory constraints and time complexity both of them dependent on the number of nodes and edges in analysis. These issues make sometimes infeasible the application of these techniques on real and large networks. In this paper, we propose a distributed and scalable solution, named ActorNode2vec, that keeps the best advantages of Node2Vec but overcoming these kind of constraints with the adoption of the actor model to distribute the computational load. We demonstrate the efficacy of this approach with a real large network by analyzing the sensitivity of walk length and number of walks parameters and make a comparison also with Deep walk and an Apache Spark distributed implementation of Node2Vec. Results shows that with ActorNode2vec computational times are drastically reduced without losing embedding quality and overcoming memory issues. Our proposal, has been developed on ActoDeS [8], a framework for the development of large concurrent and distributed systems. We demonstrate the efficacy of this approach with a real large network (Enron network [9]) by analyzing the sensitivity of two algorithm’s parameters (walk length and number of walks). Results show a significant improvement in terms of reduction of computational times and the overcome of the memory constraints in the case of a real large network.
State of the art
In the last years, combining machine learning and graph analysis techniques to work with networks represented a challenge that has been faced with two main approaches:
Graph embedding techniques can be divided in two general sectors:
DeepWalk
DeepWalk aims to learn latent representation of nodes in a network in a continuous vector space that can be easily exploited by traditional Machine Learning models. This algorithm generalizes the advancements in word embedding obtained with Word2Vec [5] by combining its neural language model, known as Skip-gram [14] and the generation of truncated random walks on the network that play a role equivalent to sentences in text mining. In order to achieve this result it optimizes a neighborhood preserving objective function using the Stochastic Gradient Descent with a Hierarchical softmax in the form of a shallow Neural network. More details about the optimization function are discussed in the next section, because also Node2Vec is based on the same function but with a different generation policy of random walks. From a computational point of view, DeepWalk results to be a lightweight algorithm in terms of required memory for two main reasons: the first one is that random walks are generated considering only edges weights and using a uniform probability distribution; the second one is that to avoid memory limits, it implements a streaming strategy to deal with large networks. However this last aspect is the responsible for the grown in terms of time complexity and for less quality in terms of representation. The main parameters of the algorithm are the number of dimension for the embedding space d, the number of walks to be generated for each node γ and the walk length ω .
Node2Vec
Node2Vec [7] extends the previous node embedding algorithm DeepWalk.
From the last one, Node2Vec retrieves the Skip-gram model [14] and the idea of learning embeddings by analyzing the relationships among the entities in the form of sequences and the use of a shallow neural network model. In fact, Word2Vec learns word embeddings by analyzing the context of each word in a corpus. It predicts the current word from a sliding window of surrounding context words. The key idea of applying this model in the graph domain is to build something similar to a text corpus (a set of word sequences) by performing random walks on the graph. The result can be thought as a set of word sentences where words are substituted with nodes that compose the walk. The innovation of Node2Vec with respect to DeepWalk is in the method that is used to build biased random walks on the network. In the previous one in fact, random walking is obtained by a uniform random sampling over the linked nodes, while Node2Vec combine two different strategies for the network exploration: Depth-First-Search (DFS) and Breadth-First-Search (BFS), see Figure 1. It uses also a second-order Markov chain and the Alias sampling to select the next node to be visited in the walk. In order to learn latent representations, Node2Vec optimizes a neighborhood preserving function by also redefining the concept of the neighborhood as a set of visited nodes depending on the combination of BFS and DFS. The considered optimization problem is the following:

BFS and DFS search strategies from node S.
Probabilities computation over edges with a second-order Markov chain. Alias sampling and random walks generation. Embedding with the Neural network model.
In order to feed the neural network for the embedding step, Node2Vec has to perform a fixed number of random walks starting from each node, with a fixed length. These walks are based on a second-order Markov chain that takes in consideration the parameters p and q to mix the two different search strategies introduced previously. To achieve this result, the first duty of the algorithm is to generate all of the required probabilities over the edges for the next step of random walks generation. Formally, given a source node

Example of the calculus of probabilities over edges with a second-order Markov chain.
Once that a set of probabilities is associated to each edge depending on the second-order neighborhood of each node, Node2Vec is ready to perform effectively the random walks based on the Number of walks and Walks length parameters. To sample the edges to be traversed from the probability distribution obtained in the previous step, the algorithm uses the Alias method [15]. This method enables the algorithm to sample efficiently from the discrete probability distribution by building and consulting two tables: the probabilities table and the alias table. To take a decision the algorithm uses a system based on a biased coin to be flipped. At the end of this phase, random walks get collected. The collected sequences of nodes play the same role as sentences in a text mining corpus. In this analogy, sentences are built from the nodes that have been traversed by each walk.
Embedding with neural networks
The neural network model used by Node2Vec is exactly the same of Word2Vec [5]. The basic idea is to train a simple neural network with a single hidden layer to predict the most probably word that follows the input one, but then the algorithm does not use this model for the task we trained it on. Instead, the goal is actually just to learn the weights of the hidden layer that will represent our euclidean representation of the input. The training set is composed by words pairs (x,y) where x is the input word and y is a nearby word of x in a fixed window in each sentence of the corpus. In the case of Node2Vec the operation is the same but the algorithm uses node pairs and the corpus is composed by the previous computed random walks.
ActoDeS overview
ActoDeS (Actor Development System) is a Java software framework for the development of concurrent and distributed applications that has been experimented in different application domains (see, for example, [16–20]). This software framework allows the definition of distributed applications exploiting concurrent objects (actors) [21]. These actors are implemented on the top of some preexisted Java software libraries and solutions for supporting concurrency and distribution. These actors interact by exchanging asynchronous messages and, after their creation, can change several times their behaviors until they kill themselves. Each behavior has the main duty of processing the incoming messages that match some message patterns. Therefore, if an unexpected message arrives, then the actor maintains it in its mailbox until a behavior will be able to process it, or a behavior kills the actor. The processing of the incoming messages is performed through some message handlers. In response to a message, an actor uses the associated handler for: sending other messages, creating new actors, changing its local state or its behavior, setting a timeout within receiving a new message and finally killing itself. In particular, when a timeout fires, the actor sends automatically a timeout message to itself and the corresponding message handler is executed. Depending on the complexity of the application and on the availability of computing and communication resources, an application can be distributed on one or more actor spaces. Each actor space corresponds to a Java virtual machine and so a system distributed on some actor spaces can be deployed on one or more computational nodes. An actor space acts as “container” for a set of actors and provides them the necessary services for their execution. An actor space performs its tasks through three main run-time components, (i.e., controller, dispatcher and registry) and through two run-time actors (i.e., the executor and the service provider). The controller manages the execution of an actor space. It configures and starts the run-time components and the actors of the application, and manages its activities until the end of its execution by using different communication technologies (the current release of the software supports ActiveMQ, MINA RabbitMQ, RMI, and ZeroMQ). The registry is a run-time component that supports actor creation and message passing. In fact, it creates the references of new actors and supports the delivery of those messages that come from remote actors, by mapping each remote reference onto the local reference of the final destination of the message. The executor actor manages the execution of the actors of an actor space and in some cases creates its initial set of actors. Finally, the service provider actor provides an extensible set of services to the other actors of the application that allows them to perform new kinds of actions (e.g., to broadcast or multi-cast a message and to move from an actor space to another one). The development of a standalone or distributed application consists in the definition of the behaviors assumed by its actors and in the definition of few configuration parameters that allow the selection of the more appropriate implementation of the actors and of the other run-time components of the application. This solution allows the optimization of some or others execution attributes of an application (e.g., performance, reliability and scalability). Moreover, the deployment of an application on a distributed architecture is simplified because an actor can move to another computational node and can transparently communicate with remote actors.

The actor space structure in ActoDeS.
In this section, we present our contribution in the development of a distributed actor-based version of the Node2Vec algorithm, ActoNode2Vec, with the aim of improving performance in terms of the required computational times and scalability. ActorNode2Vec implements two different strategies to distribute the computational bottleneck of the original algorithm depending on the need of the context. The first one aims to distribute with specialized actors only the probabilities computation part (
Computational limits of Node2Vec
Although Node2Vec is presented as a scalable algorithm, the original formulation and implementation involve the use of a multi-threading solution only for the second phase of the random walk generation (workers parameter) to reduce the computational times. However experimenting Node2Vec with large networks it is immediately clear that the first phase of the probabilities computation is the most critical point of the architecture. This issue is due to the construction of the required probability distribution presented in equation 2. The use of a second-order Markov chain considering the second-order neighborhood of each node, inevitably has the consequence of leading to a heavy dependency on the number of nodes and in particular on the network density. Increasing the dimension of the network (number of nodes and edges) in fact, leads to two main critical issues:
In light of this, this phase of the algorithm represents a first considerable bottleneck of the algorithm. Our proposal is to overcome this issue using an actor-based architecture that enables a reformulation of this step. The key idea is to distribute the computation of the probabilities distribution in equation 2 using several specialized actors that take the burden of this computation each one for a subset of actors without the need of a preliminary ordering of nodes in the graph.
The actor-based architecture
In order to implement an actor-based solution, we have defined different actor behaviors and elements that compose the software architecture:
Starting and initialization steps
The execution of the algorithm is performed by the Remote Launcher (RL) introduced in the previous section. The RL represents a server application that has the duty of reading the input network from the memory, of choosing the embedding parameters and of initializing different ActoDeS actor spaces in the distributed network. It initializes the primary actor space, requiring the creation of the Node2Vec initiator on it and by sending to this actor the graph and the parameters. At the same time, the RL initializes an ActoDeS broker on a secondary actor space to establish the communication among the different actor spaces that build the architecture (Figure 4). After this step the RL main duties are concluded and it can wait a notification of the algorithm’s termination. At the same time, the Node2Vec Initiator actor requests the creation of an Actorspace Initiator on each secondary actor space and sends to each one the entire graph to be embedded and the Node2Vec parameters. The previous ActoDeS broker become another Actorspace initiator for its actor space. These different Initiators generate several Probabilities Manager actors (PM) to prepare the architecture for the next steps. The number of actors that are created is a parameter, but the default value is equal to the number of available logical CPUs on the network node. Each PM actor receives the entire graph and the other parameters. This design choice is motivated by the next steps of the algorithm and in particular to do not have to order the nodes of the graph,that would make computational complexity exponential. Not only, this choice is due also to take advantages from the shared memory of the actors on a single node because operation on the graphs are read-only. (See Figure 5).

Starting step and creation of the actor spaces.

Initialization step of ActorNode2Vec.
This step represents the key difference between the original version of the algorithm and our proposal. In light of the limits presented previously, we propose a reformulation of this step where the required probability distribution is computed not on a single thread and neither with a multi-thread solution but dividing and distributing the entire computation. The simple multi-thread solution on a single computational node has been excluded because it does not permit to overcome the memory issue presented in section A in the case of a large network. The two models implemented in the algorithm, PC and PC-RWC start to differ exactly from this point. In both case we defined a specialized typology of actor’s behavior (Probabilities Manager or PM) that has the duty of computing probabilities and walks for a subset of the graph nodes depending on the selected mode of operation. These specialized actors are created in the actor spaces on request of the Coordinator actor. This last actor has the duty of executing the following algorithm steps: It receives all of the PM’s references from the Actorspace Initiators and after that it kills the latters as they are no longer necessary. Secondly, the Coordinator sends actors references to each PM actor. This operation is necessary to enable the PMs to understand which subset of nodes are under their responsibilities for the probabilities computation. Figure 6 describes this phase of the algorithm.

Distributed and actor-based probabilities computation.
Probabilities Managers have the duty of computing probabilities following the second-order Markov chain model presented in equations 2 and 3. Each one is responsible for a subset of the graph nodes. PMs know the references of each other in order to identify the number of actors involved in that calculus and to divide the nodes set. This choice permits to avoid nodes ordering that is an operation that grows exponentially with the number of nodes. In this way, each PM has the visibility of the entire graph to consider the neighborhood of each node and it is computationally efficient using the advantages of the shared memory among the actors on a single working node. The nodes set is divided by each PM using an ordered list of the other involved PMs references, so each actor is able to detect the subset under its responsibility and who are the responsible for the other nodes. Once probabilities have been computed by each actor using for the first-order neighborhood the edges weights and for the second-order the Markov chain model, they also prepare the probability distribution to be sampled with the alias method. The output of each PM in fact, are two data structures, one for the probabilities and one for the related alias as the method requires [15]. At the end each PM sends these two structures to the Coordinator, which has the duty of collecting and joining them in a single probability distribution. To summarize, Probabilities Managers compute probabilities for a subset of nodes and edges but they compute also the alias probabilities to improve the following sampling process.
Random walks generation and embedding phase
Once the Coordinator received all of the required probabilities, it builds the probability distribution to be sampled for the random walks generation. It implements a biased coin system to sample from the probabilities and generates the random walks from each node with a fixed length. After that, the Coordinator collects the random walks in a corpus and feed the Neural Network model based on Word2Vec described previously. The embedding phase can represent the second bottleneck of the system in terms of time complexity, in particular depending on which technologies are used to implement the model. Some approaches to resolve this issue are presented in the future developments section.
PC and RWC operating mode
In this case, Probabilities Managers have a different behavior after probabilities computation. Dealing with large networks can be problematic in terms of memory although the distributed probabilities computation. This can happen when in a large networks the edge-density is very high as well as the average nodes degree. In these case the computational node where the Coordinator lives could not have enough memory to receive walks from the probabilities Managers. In this case is it possible to use ActorNode2Vec in the PC and RWC operating mode. The first step remains the same as in the other mode, but once terminated with the probabilities, each PM starts to compute random walks for its nodes, sampling its partial probabilities distribution. When the next node to be visited is not in its responsibilities set, the PM looks for the target PM in the environment and ask it to continue the walk with its information. When the walk reaches the fixed length determined by the algorithm parameters it is sent to the Coordinator. This last one, free of the random walks generation duty, proceeds directly to the embedding phase once collected all of the walks.
Experimentation and results
We experimented ActorNode2Vec with a well-known large network "Enron", to demonstrate the benefits of the actor-based solution and its efficiency in terms of time complexity. Memory issues are overcome by the scalability of the actor-based solution. We experimented both operation mode of ActorNode2Vec: PC and PC-RWC. In our experimentation we compared our solution with the original Node2Vec implementation in Python and with the Scala-based implementation for Apache Spark, both of them are public and available on the Stanford SNAP laboratory web site. In particular, the comparison with the Scala version is interesting because it implements a distributed version of the algorithm. We compared our solution also with DeepWalk and finally, with a Java version of Node2Vec. As previously mentioned DeepWalk does not present memory issues because it does not compute a probabilities distribution. However to deal with large networks or with high algorithm parameters, DeepWalk use a streaming policy from the main memory to build the truncated random walks. The comparison with this algorithm is therefore more interesting because with a less quality of the embedding, DeepWalk can be thought as a lightweight version of Node2Vec.
The enron email dataset
The Enron Email dataset is a large collection of over 600,000 emails generated by the past employees of the Enron Corporation and acquired by the Federal Energy Regulatory Commission during its investigation after the company bankruptcy in 2001. This collection has been processed for several scientific studies (e.g [9], [22]), analyzing discussion threads and the content of the email. These works have identified the most important component in about 37 thousands of email addresses. In particular in [22] they modeled this component as a graph by considering each email address as a node (36692 nodes), and the emails exchanges as undirected edges (183831 edges). Currently it is considered one of the most important standard network in Network Science.
Experiments setup
In order to compare the other algorithms with ActorNode2Vec with the Enron network we used a computer cluster with 4 linux nodes. The most performing one (Node 1) has been used to execute Node2Vec, DeepWalk, Java Node2Vec, the master in Apache Spark and The coordinator of ActorNode2Vec, the others to execute the Remote Launcher, the ActoDeS broker and the secondary actor spaces for the probabilities computation. Hardware specifications are the followings:
Results
A first analysis concerned an overall comparison between Node2Vec and ActorNode2Vec operating in the PC mode. In order to achieve this result, we choose to compute the embedding of the Enron graph with the following initial configuration of the parameters: Embedding dimensions: 100 P: 1 Q: 1 Walk length: 10 Numbers of walks: 10
In particular with parameters p and q equal to one, we can assume that the DFS and BSF strategies are used uniformly in the random walks generation. This particular choice of P and Q parameters enable us also a next fair comparison with Deep Walk algorithm where the two searching strategies are balanced. However p and q do not affect the algorithm’s performance. Results of this first comparison are in Table 1. In this first experiment ActorNode2vec required about the 80 % less time to embed the Enron network. In this first case the improvements seem to be related both to less time in probabilities computation and in the embedding phase. We have further analyzed this case by increasing the number of embedding dimension to 300. In such way the embedding phase take several minutes in both versions of the algorithm because it represents the number of neurons in the hidden layer of the adopted Multi-layer perceptron. It is also to report that training and inference of the statistical model could be more efficient in Python with the Gensim implementation of the neural network than in Java. Results of this case are in Table 2. In fact, in this case ActorNode2Vec took about the 28 % less than Node2Vec, however the improvements are related only to a minus required time of the actor-based probabilities computation.
Run times with the initial parameters configuration
Run times with the initial parameters configuration
Run times with embedding dimensions @300
In order to evaluate how the actor-based solution of ActorNode2Vec improves the time complexity in its two operating modes, we compared our solution also with a distributed Scala implementation of Node2Vec available for Apache Spark, with DeepWalk and with a Java version of Node2Vec. These three additional implementation are further analyzed in a fair way with a sensitivity analysis of parameters. In light of the results presented in the previous section, we experimented ActorNode2Vec with the same initial configuration set of the parameters, but further analyzing the results by changing the number of walks and the walks length in a range between 1 and 30. These two analysis are necessary to understand how the actor-based solution of ActorNode2Vec improves the time complexity of the algorithm. Walk length and the number of walks are the main parameters that affect the probabilities computation phase in terms of required time. Results concerning the walk length sensitivity are presented in Figure 7, while the others concerning the numbers of walks in Figure 8.

Sensitivity analysis of the Walk length parameter.

Sensitivity analysis of the Number of walks parameter.
The sensitivity analysis highlights a first result in the comparison between the different modes of ActorNode2Vec. As previously anticipated the PC mode results to be the more efficient when probabilities computation can be the bottleneck in terms of memory and the computational node where the the Coordinator is executed is able to manage the probability distribution. However in case of a more dense and large network when also the random walks generation has to be distributed the PC-RWC mode requires in average only the 3.9% 10.12% more of time than PC by varying the Number of walks and Walks length respectively.
Comparison with Node2Vec
The main important result concerns the comparison between ActorNode2Vec with Node2Vec on a single machine and its distributed version on Apache Spark. As it possible to notice in Figure 7 and 8 both modes of ActorNode2Vec require less time than Node2Vec. In particular ActorNode2Vec PC requires on average the 73% less time than Node2Vec and the 72% less than the Spark version by varying the walks length. Considering the number of walks Actornode2vec requires respectively the 81.56 % less and the 74% less than Node2Vec and the Spark version. This results highlights also that the Node2Vec implementation on Spark offers better performance than the original one only when the value of the parameters are below a threshold.
Comparison with DeepWalk
DeepWalk differs from Node2Vec for the absence of a probabilities computation step and implements also a streaming strategy from the main memory to avoid memory issues. However varying the two parameters of interest as in the previous cases it is possible to see that below a very small threshold its behavior is similar to ActorNode2Vec, but above that value both modes of our solution perform better. In particular the PC mode requires on average the 19% less time in the number of walks case and the 10.5 % by varying the walks length.
Comparison with a Java version of Node2Vec
In order of further analyze if the better performance of ActorNode2Vec are due to the actor model or to the different Java platform, we implemented the original Node2Vec in Java. This version exploits common parts of ActorNode2Vec but it does not provide distributed probabilities computation and random walking as the original Python implementation. In both cases (walk length and number of walks) ActorNode2Vec requires less computational time especially with the grow of the parameters value. In particular the PC mode requires on average the 35% less time in the number of walks case and the 25.2% less by varying the walks length.
Conclusions and future developments
In this paper, we propose a scalable and distributed version of the Node2Vec algorithm called ActorNode2vec, developed on an actor-based architecture that allows to overcome some limitations in terms of memory requirements and time complexity when applied to large networks. Results show important reduction with respect to other available algorithms in terms of the required computational times, while memory issues are overcome with the scalability of the solution. Some future developments are related to the second bottleneck of the algorithm, that concerns with the time complexity of the Neural Network adopted for the embedding phase. For example an implementation with DeepLearning4j could improve required training times, as the choice of distributing also the training phase. Some other future developments are related to the use of the actors also for a distributed generation of the random walks and the use of different strategies for the network exploration. Finally, other future research activities will be dedicated to ActoDeS. We are working about the improvement of the support for knowledge and ontology based applications [23, 24] and in the development of tool that has the aim of simplifying the design of applications [25, 26].
