Abstract
Networks are a fundamental and flexible way of representing various complex systems. Many domains such as communication, citation, procurement, biology, social media, and transportation can be modeled as a set of entities and their relationships. Temporal networks are a specialization of general networks where every relationship occurs at a discrete time. The temporal evolution of such networks is as important to understand as the structure of the entities and relationships. We present the Independent Temporal Motif (ITeM) to characterize temporal graphs from different domains. ITeMs can be used to model the structure and the evolution of the graph. In contrast to existing work, ITeMs are edge-disjoint directed motifs that measure the temporal evolution of ordered edges within the motif. For a given temporal graph, we produce a feature vector of ITeM frequencies and the time it takes to form the ITeM instances. We apply this distribution to measure the similarity of temporal graphs. We show that ITeM has higher accuracy than other motif frequency-based approaches. We define various ITeM-based metrics that reveal salient properties of a temporal network. We also present importance sampling as a method to efficiently estimate the ITeM counts. We present a distributed implementation of the ITeM discovery algorithm using Apache Spark and GraphFrame. We evaluate our approach on both synthetic and real temporal networks.
Introduction
Networks have been widely used to represent entities, relationships, and behaviors in many real-world domains including power grids [10], social networks [22], microbial interaction networks [49], corporate networks [51], the food web [19], and modeling adversarial activities [11]. These complex systems do not show a temporal or structural continuum, but rather show a characteristic non-linear dynamic behavior [4, 52]. Count-based metrics such as the number of entities (i.e., nodes), the number of interactions (i.e., edges), and the average connectivity of the entities in the network (i.e., degree) are important measures that represent the population and the interaction density of the entities involved in the network. However, these measures are limited in their ability to describe non-linear, localized, and dynamic properties of the systems. To uncover structural, temporal, and functional insights of complex systems, network motifs have been used extensively in recent years as they provide a tractable approximation of the networks that can be measured and updated within given memory and compute constraints. Holme et al. [15] define a library of motifs that can represent six fundamental interactions types in synchronous and asynchronous communication [45], as shown in Fig. 1. In the case of synchronous communication, both participants are active and engaged, whereas, in the case of asynchronous communication, the receiver may not be present when the sender initiates the message. In addition to the structure of a graph, [8, 25], the motifs can also measure temporal patterns in a graph. Recent temporal motif-based approaches [40, 17] count all the isomorphic instances of a motif that are formed within a given duration, and use the count to characterize the graph. In addition to the global temporal patterns of the graph, it is also desirable to discover the local temporal evolution of the graph which is important for many domains such as social networks, communication networks, and terrorism activity knowledge graphs. Table 1 shows a notional example of two events. Both the events that take 30 time units to form and can be represented as star temporal motifs but {e1, e2, e3} and {e6, e4, e5} differ in the order they are formed. Such an ordering of events is a critical piece of information to measure the spread of misinformation in social networks, packet switching in communication networks, etc.
Temporal ordering of events
Temporal ordering of events
Fundamental interaction types define by motifs.
We present the
The rest of the paper is organized as follows. Section 2 presents related work and highlights out research contributions of this work. Section 3 lays out various definitions. We start by defining Atomic Motif and extend it to Temporal Motif and Independent Temporal Motif. We also define various ITeM-based metrics used to characterize salient properties of a temporal graph. We present our core approach and distributed algorithms in Section 4. Section 5 shows our experimentation with synthetic and real-world datasets to summarize the temporal networks and measure their similarity. Section 6 presents conclusions and future work.
Extensive research has been done on the appropriate definition of network motifs [37, 53] and their application to various network analytical tasks. Cao et al. [8] use network motifs to define the network backbone, which is a collection of relevant nodes and edges in the large-scale network. They define a motif-based extraction method to extract the functional backbone of the complex network. The functional backbone is indicative of certain functional properties of the network that cannot be explained by centrality-based backbones. Similarly, Shen et al. [49] use a weighted motif to cluster microbial interaction networks. Network motifs can also be used to identify the exchange of emotions in online communication networks, such as Twitter [25], using emotion-exchange motifs. The emotion-exchange motifs containing reciprocal edges manifest anger or fear, either in isolation or in any combination with other emotions. Conversely, positive emotions are characteristic of one-way motifs. Jin et al. [18] define TrendMotif that describes a recurring subgraph of weighted vertices and edges in a dynamic network over a user-defined period. The TrendMotif can indicate the increasing and/or decreasing intervals for the weighted vertices or edges over the time period. Borgwardt et al. [7] extend pattern mining on static graphs to time series of graphs where each graph has the same set of vertices and observed addition and deletion of edges.
A temporal network is a generalization of a static network that changes with time. Many system modeling approaches model time as an attribute of the entity or the interaction, which makes temporal graphs a special case of attributed graphs. We interchangeably use network and graph in this paper. Incorporating time into static graphs has given rise to a new set of important and challenging problems that cannot be modeled as a static graph problem [21, 36]. Network motifs are also used to visualize and summarize large dynamic graphs [32]. TimeCrunch [47, 46] discovers five different temporal patterns of some common substructures and summarizes the network in terms of a sequence of substructures that minimizes the Minimum Description Length (MDL) cost of describing the graph. Adhikari et al. [2] use local substructures to condense a temporal network. Liu et al. [29] propose a Bayesian framework to estimate the number of temporal motifs in communication networks. A majority of the prior research does not account for the temporal evolution of the motif. Recent work [40] defines
We propose the
Symbols and their descriptions
Symbols and their descriptions
We present the ITeM-based approach to characterize a temporal network. In the following sections, we present definitions and algorithms used by ITeM to model a temporal network. We also review the Maximum Independent Set (MIS) problem, which is a subproblem of the proposed algorithm. MIS has been proved to be an NP-complete problem, and we present a heuristic-based approach to finding the lower bound on the ITeM frequency [33]. We also outline a sampling method to estimate the true frequency of a temporal motif in the network. The sampling approach is based on the importance of the sampled network [30].
A temporal graph is a specialization of a static graph, where each edge of the static graph appears at a time unit such as second, day, year, etc. Various representations of temporal graphs that are useful in different scenarios are proposed [35]. We use a window-based representation, where each window corresponds to a temporal sub-graph between two time-steps.
.
Temporal Graph: A temporal graph
This definition allows for the representation of a large graph with a single window. It is useful for datasets that are small in size and cover a small period of time.
Atomic motif
Atomic motifs are small subgraphs that serve as interesting indicators for complex networks. They can reveal patterns of association among entities in the network. Figure 2 shows a library of atomic motifs used in the current work. A
Atomic motifs.
We can define atomic motifs of any number of vertices and edges, but the larger motifs are more difficult to search for in a network due to the intractability of the subgraph isomorphism, leading to an exponential increase in the runtime. Previous work shows that the computational cost of motif counting increases exponentially with k in
We limit our motif library to 4-order motifs. The selection of d-order motifs to include in the search library has been influenced by previous research in this area, functional interpretation of the motifs in real-world domains, and computational pragmatism. In addition to the higher-order motifs (d
Dyads and triads are the most used motifs to model complex networks. Larger acyclic and dense patterns do not uniquely explain different phases of temporal diffusion, whereas both the triads and linear chains do a better job [43]. Motifs such as
Example input graph.
.
Temporal Motif: A Temporal Motif
Edges have a temporal ordering such that for an edge
A Temporal Motif is a specialization of the atomic motif, where every interaction between two vertices occurs at a specific time-step. The time step
ITeMs for example input graph.
Schreiber and Schwobbermeyer [44] describe three different ways to measure the frequency of any pattern in a graph. They categorize them as
A major contribution of our work is the ITeM, which is an edge-disjoint temporal motif such that no two motif instances share any edge between them. ITeM is different from the temporal network modeling approaches mentioned in the related work, which use overlapping motif instances where some of them can share any number of edges. Overlapping motif instances can be used to model a network where it is common to have nodes and edges participate in multiple functional processes such as biological networks [9] but fails to model a network where each edge represents one transaction between two entities as in communication and financial networks. Independent motif instances capture a more accurate state of the network as no two transactions are part of any two motifs. Figure 4 shows a set of ITeM instances discovered for the input graph shown in Fig. 3. ITeM provides a lossless characterization of the input graph and can be used to reconstruct the input graph. Algorithms presented in Section 4 use the order defined in Fig. 5 to discover ITeM instances. ITeMs also define the temporal ordering of the edges within a motif. Such an ordering is independent of the actual timestamps used to define the temporal edges. This flexible approach allows us to use ITeMs with different granularities of the temporal graph. Additionally, for a temporal network, ITeM can be used to model the rate at which the network grows as it distinguishes between adding a transaction using new nodes to the network and reusing them for multiple future transactions. Overlapping instances fail to capture this phenomenon as they compute all the isomorphic instances of a motif type. This restriction also poses a greater complexity issue as finding temporal motifs is proved to be an NP-Complete problem [30]. In the following sub-sections, we define some key concepts used by ITeM to model a temporal network.
Library of temporal motifs used in this work.
We define the birth-time of a vertex in the temporal network as the time of the first transaction involving the vertex. The birth of a vertex increases the network size by one vertex. For the rest of the life of the network, that entity is treated as reused and it never increases the network population.
Structural contribution
Structural Contribution of an ITeM instance is a measure of the growth in the graph size as a result of adding the instance. The Structural Contribution of an independent temporal motif in terms of the number of edges is always equal to the number of temporal edges in the temporal motif. Figure 5 shows a set of temporal motifs and their structural contributions. As shown in Fig. 5, every instance of
Motif orbit
An orbit of a motif is defined as distinct positions in which a vertex can appear within the motif. An
Independence
We also define Independence of a temporal motif as a measure of its uniqueness in a given temporal graph. The independence can be measured for temporal motifs, temporal edges, or vertices of the temporal graph. The edge-disjoint concept defined above leads to maximal independent temporal edges because every edge has a bijection to the set of independent temporal motifs. We define the independence of a temporal motif and a vertex as follows:
.
Motif Independence: For a given temporal motif
where
This frequency-based metric identifies unique temporal motifs in the graph. Highly independent motifs exhibit the lower average cost of finding isomorphic combinatorial instances because of their uniqueness.
.
Vertex Independence: For a given temporal motif
where
Temporal motifs with high vertex independence lead to high structural contribution, whereas low vertex independence leads to co-located independent temporal motifs with a higher number of shared vertices among them.
Approach
In this section we present the algorithm to count ITeM frequency. We also present a variant of it using window-based Importance sampling.
Algorithm to count ITeM frequency
ITeM discovery algorithm flow diagram.
blueFor every motif instance label
return
Figure 6 shows a simple flow diagram for ITeM discovery. The ITeM discovery process takes a temporal graph and a library of atomic motifs as input and iteratively measures the count and temporal evolution of the motifs. ITeM is a simple and efficient approach to graph characterization because of an abundance of small motifs in real-world systems. Algorithms 4.1–6 present the pseudocode to find independent temporal motif instances in a given temporal graph. Algorithm 4.1 inputs a temporal graph and a set of atomic motif types to discover as shown in Figs 3 and 2 respectively. Line 1 discovers all overlapping motif instances of a given motif type
Algorithm 6 presents the pseudocode of a distributed implementation of the MIS algorithm. We use Pregel API, available in Apache Spark, to implement Luby’s Algorithm [33]. We initialize all vertices in their own independent set as shown in lines 2–4. At lines 5–9 of Algorithm 6, each vertex exchanges messages with its neighbors and updates its independent set value based on the minimum values received from all neighbors. This process stops when no vertex in the graph changes its independent set.
Performance of Algorithms 4.1–6 is influenced by different structural properties such as degree distribution and clustering of a real-world graph. Worst-case performance is observed for complete large graphs, but the real-world graphs are not fully connected and instead show domain dependent characteristics such as power-law degree distribution, low clustering coefficient, and shrinking diameter as the graph evolves. For the maximum number of edges
Importance sampling to count ITeM frequency
Our approach includes three major algorithmic components: searching for overlapping temporal atomic motifs, finding independent temporal motifs, and computing information content and temporal evolution of such motifs. Out of the three components, finding independent temporal motifs is an NP-Complete problem, and we use a heuristic to find a lower bound of the actual count. As explained in the previous section, we construct a motif overlap graph where every vertex is a motif instance and an edge between two vertices exists if the corresponding motif instances share an edge in the original temporal graph
Importance sampling for motifs is presented by Liu et al. [30]. The importance sampling is based on the assumption that each distribution has some interesting or important regions and the samples drawn from those regions must be normalized to get an unbiased estimate [38]. Window-based importance sampling splits the time series dataset into multiple temporal windows and performs computation on each window. We create window graphs with equal temporal window size, each with a different number of edges within the window. Each window is assigned an importance, based on the fraction of all the edges present in the window. The importance is used to normalize the computed metric across all randomly-selected windows. The normalization reduces the overall variance for real-world domains that do not show a burst. The current approximation approach does not model such anomalies in the ITeM distribution but allows us to model the evolution of a network as shown in Section 5. Future work will address this challenge using an importance decay approach that gives more importance to recent windows. We compute the distribution of all temporal motifs present in the window graph. At the end of all the windows, we compute the weighted average of all the distributions, which gives a lower bound estimate of the distribution that can maintain a relative error tolerance of 5% in the count [30].
For a given temporal graph
We also define a random variable
where
Experiments
To evaluate the performance and scalability of our approach, we analyzed a rich set of synthetic and real-world temporal datasets. Datasets used in this study exhibit real-world properties of different domains such as sparsity, preferential attachment, and long-tail degree distribution. The synthetic dataset is generated with Gaussian noise in temporal edges. The real datasets show temporal bursts, observed in social networks. The experiment provides support for our following core contributions:
ITeMs are a novel way of capturing discerning temporal properties of a temporal network that cannot be measured using static and overlapping temporal motifs. ITeMs outperform the Stanford SNAP temporal motif algorithm (referred as Our approach is scalable and configurable to analyze a temporal network as one large graph or a sequence of windows using sampling.
All the experiments are done on a cluster using Apache Spark 2.3.0 and GraphFrame 0.7.0. All the algorithms are implemented in Scala 2.11.8, and the source code is available at
ITeMs can efficiently model the evolution of a temporal network using the properties defined in the section above. To present the accuracy of modeling temporal changes in the network using ITeMs, we generate a set of synthetic temporal graphs using a stochastic generation method and measure the change in the similarity as the networks evolve. We benchmark against
Figure 7 shows the rate of the addition of temporal edges to the graph. We also show a zoomed-in version (right) of
Synthetic graphs.
Temporal graph similarity.
Figure 8 shows the change in normalized graph similarity as a function of the difference in the time duration of the synthetic graphs. A point (i, j) on the plot represents the average Euclidean distance
DG also characterizes a temporal network in terms of graphlet count for the entire network and individual nodes. DG also provides a
Figure 8 (right) shows the result comparing DG and ITeM. As shown in the Fig. 7, the base graph shifts from a stochastic base model to a Gaussian distribution based temporal network, which explains the initial sharp increase in the graph distance measured by both algorithms. Both the approaches also show sub-linear trends afterward but only ITeM continues as the time difference between graphs increases. DG shows sudden exponential changes in the distance (or similarity) that do not correspond to the linear temporal evolution of the graphs as shown in Fig. 7 (right). Overall, both the approaches exhibit similar trends that show the importance of modeling temporal variations and orbital information of the graph, in addition to the frequency count.
Temporal graphs datasets
We analyze various real-world networks and measure the difference in their temporal evolution. The following list introduces all the datasets used for the experiments. Table 3 describes their static and temporal scale. We generate tITeM distribution and use it for the measurement. We also use the change in the distribution over time to detect an event in the network.
ITeM distribution (log10) of different datasets.
Real-world graph similarity (y-axis) using ITeM, SNAP, and DG (x-axis) for CM, BA, EE, TT, and IA.
Motif and vertex independence of different datasets. X-axis represents motif-id and y-axis represents motif independence (left) and vertex independence (right).
Figure 9 shows the independent temporal motif distribution of different datasets. Similarly, Fig. 11 shows motif independence and vertex independence for the datasets. These results give initial clues that similar domain networks such as
We also compute network similarity across all network pairs, using Euclidean distance between the normalized frequency vectors. Figure 10 shows the similarity between each pair of real-world datasets using the three approaches. Both DG and ITeM identify CollegeMsg (
ITeM can also model the temporal evolution of a network using a sequence of temporal graphs, each with a given time window. We use the Higgs Twitter (
ITeM frequency changes in the Higgs Twitter (HT) temporal network.
ITeM Independence changes in the Higgs Twitter (HT) temporal network.
ITeM runtime analysis on Email-EU (EE) dataset: Single Graph.
ITeM runtime analysis on Reddit (RH) dataset: Sequence of temporal graphs.
Figure 13 shows motif independence over time for the same window of the
ITeM frequency changes in the Wiki-talk (WT) temporal network.
A major contribution of this paper is a distributed algorithm to analyze a large temporal graph or a sequence of temporal graph windows. All the algorithms are developed using the Apache Spark 2.3.0, GraphFrame 0.7.0, and Scala 2.11.8 environment. This allows the use of scalable distributed data structures to handle large graphs in the order of millions of edges and to iteratively update the temporal-structural and orbital properties of the graph. To analyze the scalability of the core algorithm, we use a Snakemake [20] based automation pipeline and a SLURM [55] based resource manager. We experiment with different combinations of hardware resources and distributed partitions. Figure 14 shows the results of the scalability experiment using the EmailEU dataset. ITeM shows initial speed-up up to a maximum of 32 cores available to the Spark application. Beyond this point, the application suffers from communication and data serialization overhead. A similar trend was observed as we increased the number of data partitions, keeping the maximum number of cores fixed. The run-time sharply decreases as we increase the executor memory from 2 GB to 6 GB, and the decrease slows down after that.
Temporal analysis of an evolving network using a window-based approach poses memory constraints and scalability challenges as the number of windows increases. We preserve minimum information across the windows to maintain a global summary of the temporal network and to save window-specific summaries and vertex features to files, to be used by other analytic processes. This allows us to use our method in a longer running streaming fashion. Although we do not observe a strong sub-linear trend as the windows progress, as shown in Fig. 15, further analysis of the window graph structure using ITeM suggests that the run times depend on both the window size and the fringe structure of the graph. The runtime of Window 5 and 10 decreases even as the graph size increases because those windows have a higher number of multi-edges in comparison to windows of similar size, which leads to aggressive subgraph reduction while discovering larger motifs. Future work will perform a more detailed analysis of the impact of a specific ITeM count on the runtime. For all three approaches, overall run-time complexity depends on enumerating larger motifs in the network but
ITeM-based analysis of real-world COVID-19 indicators
In this section, we present a real-world use case and ITeM-based analysis to extract actionable knowledge from it. Coronavirus Disease 2019 (COVID-19) is an ongoing outbreak and the latest threat to global health. Understanding the implications of social interaction on COVID-19 indicators is an important research objective to help formulate policies and guidelines by governments and local authorities. We use curated state-level COVID-19 indicators [48] such as Active Cases, Deaths, and Hospitalization Rates for the United States. We also curate domestic US air travel data as shown in Fig. 17a and present its impact on COVID-19 indicators.
Temporal Graph representing domestic US air-travel.
We create a dynamic graph to model air travel between different US states as shown in Fig. 17b. Each US state is modeled as a node in the graph. We create edges using the total air-travel passenger count. We create a logarithmic bin to reduce the number of edges between any two states for a given day. We also create a sequence of weekly temporal graphs to measure temporal trends in the COVID-19 air-travel dataset. We generate weekly ITeM distributions to observe temporal patterns in the COVID-19 travel graph. The ITeM allows us to measure the air-travel magnitude and also the travel behavior between US states. In addition to weekly ITeM distributions, we also compute the weekly distribution of COVID-19 indicators such as Active Cases, Deaths, and Hospitalization Rates for the United States. As shown in Fig. 18, we qualitatively observe three different classes of real-world COVID-19 indicators. The “Hospitalization Rate” and the absolute number of people hospitalized in the US averaged over a week show a similar trend. In contrast, indicators such as “Active Cases”, “Confirmed Cases”, “Deaths” show a similar upward trend. The “Mortality Rate” does show an initial sharp upward and slowly decreasing trend in contrast to all the other indicators.
Weekly real-world Indicators and ITeM trends from April 2020–Aug 2020.
We use Dynamic Time Warping (DTW) [6] to quantify ITeM similarity with the rest of the indicators. DTW measures the similarity (or distance) between two time series by computing optimal matches between them. We compute the pair-wise distance between all real-world indicators and ITeM distribution for each week from April 2020 to August 2020. We used the degree distribution of the travel graph as the baseline similarity with each indicator. ITeM distribution shows high similarity with average travel and “Hospitalization Rate” as shown in Fig. 19.
COVID-19 real-world indicator similarity with ITeM distribution.
ITeM can also be used as an early indicator of a shift in a real-world indicator. COVID-19 contact tracing research has estimated a 2 to 14 day incubation period for the virus [1]. Lauer et al. [26] predicts that more than 97% of people who contract SARS-CoV-2 show symptoms within 11.5 days of exposure. So the early identification of real-world indicators is beneficial to contact tracing and local government policy recommendations. Baseline measures such as aggregated travel statistics take about a month to accurately predict the trend as shown in Fig. 20. In contrast, ITeM provides a shorter prediction window where real-world indicators show similar trends within three weeks, as shown in Fig. 21. The gain of one week to estimate the indicators based on higher-order analysis such as ITeM is significant and can improve policy planning.
Average travel trend follows COVID-19 real-world indicators with a shift of one month. 
ITeM-based travel trend follows COVID-19 real-world indicators with a shift of three weeks. 
Complex temporal networks are observed in the real world, and a better understanding of them is required to effectively handle real-world applications. We present Independent Temporal Motif (ITeM) as a building block to characterize temporal graphs. ITeM reveals many salient features of the temporal graph, such as its core structure, fringe vertices and edges, temporal evolution, and uniqueness. Graphs from different domains are found to exhibit varied structural and temporal distributions. Likewise, graphs from similar domains are found to exhibit similar structural properties, but many of them show varied temporal characteristics. We use these observations to characterize individual graphs and define a metric to quantitatively measure the similarity among them. We also present the importance sampling based approach to analyze a large graph as a sequence of smaller windows. We use this to show a change in the distribution that exhibits a behavioral shift in the way entities interact in a transactional graph, such as a social network.
The rate at which temporal motifs are formed can also be used to generate synthetic graphs that exhibit similar evolution as a given real-world graph, as shown in [41]. Additionally, these features can also be used in a diverse set of applications, such as approximate sub-graph matching, graph mining, and network embedding learning. We will compare ITeM to other temporal network embedding generation techniques to measure the benefits of ITeM over other approaches for use in such applications. Future work will also address scalability challenges by estimating the number of ITeMs using specialized algorithms for different motif classes and perform a sensitivity analysis of the sampling approach.
Footnotes
Acknowledgments
We thank the DARPA Modeling Adversarial Activity (MAA) program for funding this project. The associated PNNL project number is 69986. A portion of the research was performed using PNNL Institutional Computing (PIC) at Pacific Northwest National Laboratory. We also thank Patrick Mackey and Joseph Cottam for providing feedback and help setting up the experimentation.
