Abstract
Abstract
1. Introduction
D
There have been numerous pairwise network alignment approaches, such as PathBlast (Kelley et al., 2004), MaWISh (Koyutürk et al., 2006), NetworkBlast (Kalaev et al., 2008), NetAligner (Pache et al., 2012), PINALOG (Phan and Sternberg, 2012), SPINAL (Aladag and Erten, 2013), MI-GRAAL (Kuchaiev et al., 2010), Græmlin (Flannick et al., 2006), IsoRank (Singh et al., 2007), Natalie 2.0 (El-Kebir et al., 2011), GHOST (Patro and Kingsford, 2012), and Node Fingerprinting (Radu and Charleston, 2014), which compute pairwise alignments with varying degrees of success on varying datasets, but this problem remains a challenging task. There has also been an approach to directly optimize edge conservation while the alignment is constructed, known as MAGNA (Saraph and Milenković, 2013). While pairwise alignment is a great tool for comparing two networks, multiple alignment is more appropriate at highlighting highly conserved interaction patterns across networks. There are several alignment algorithms that are able to perform multiple network alignment, but they are still in their infancy. These are IsoRankN (Singh et al., 2008), SMETANA (Sahraeian and Yoon, 2013), and NetCoffee (Hu et al., 2014). While these algorithms can align multiple networks, the accuracy is low, resource requirements are high, and the size and number of networks is highly limited.
Further, there exists a paucity of accuracy metrics for assessing multiple network alignment. This makes the evaluation of alignments, and the comparison of alignment algorithms very difficult. The accuracy measures for pairwise alignment such as edge correctness (EC) (Kuchaiev et al., 2010), induced conserved structure (ICS) (Patro and Kingsford, 2012), and symmetric substructure Score (S3) (Saraph and Milenković, 2013), demonstrate a substantial reliance on the number of conserved interactions across all networks. Therefore, they do not scale well when applied to multiple networks, where the conservation or absence of interactions across a subset of networks is highly informative. Additionally, these metrics are unable to discriminate between a set of networks that are all highly dissimilar and a set of networks that contain multiple highly similar networks and one highly dissimilar network. Published accuracy measures devised for multiple alignment—such as specificity, the number of correct nodes (Sahraeian and Yoon, 2013), as well as the mean entropy and the mean normalized entropy (Hu et al., 2014)—rely heavily on knowing the correct alignment, which is rarely available.
The use of one-to-one or many-to-many mappings varies throughout the network alignment community. This further complicates the requirements of an ideal alignment metric. We define one-to-one mappings for cases in which a vertex in one network is mapped to at most one vertex from each of the other networks being aligned. We define many-to-many mappings for cases in which a set of nodes from one network is mapped to a set of vertices from each of the other networks, where any set may contain more than one node. There are currently no widely accepted methods to deal fairly with these two biologically reasonable mapping approaches. The ideal metric should penalize many-to-many mappings only in cases where there is ambiguity in the mapping and the network structure does not fully support the many-to-many mapping over several one-to-one alignments. It should also reward many-to-many alignments in cases where they are correctly mapping sets of nodes that are structurally homologous, because in such cases the interactions common across networks will be correctly identified. A good example of this is the mapping of a set of vertices {u,v,w} that are adjacent to the node x in network A to the set of vertices {a, b, c, d, e} that are adjacent to the node y in network B where x and y are mapped. We propose a new method for analyzing alignment accuracy that is both fair in these senses, and scalable, further on in this article.
2. Approach
We propose a modification to Node Fingerprinting (NF) (Radu and Charleston, 2014) to allow it to handle multiple network alignment. We call this new algorithm Node Handprinting (NH). Instead of comparing all networks directly, NH compares only networks that are adjacent in input order. In cases where the adjacent network has been completely mapped, the next adjacent network will be compared.
NH uses a similar progressive alignment technique as NF. Within each run of the progressive alignment step, candidate node pairings are chosen based on adjacency to already mapped nodes. If there are no nodes available as candidates, either because the mapping has just begun or the compared networks contain a disconnected component, the progressive alignment strategy selects the most highly connected, currently unmapped nodes, while ensuring that the minimum degree of these node sets is kept similar across all networks. Once the candidate pairings have been selected, the similarity score will be calculated using the same score function as NF (Radu and Charleston, 2014). Once all scores are calculated, we build up a multiple network alignment based on these scores by modeling it as a layered weighted bipartide matching problem (Fig. 2), which is then solved to calculate a multiple network mapping. Our instance of a layered weighted bipartide problem has a few unique properties. The number of nodes at each level is equal; this is modeled by adding “virtual” nodes with edges that have a weight of zero. This facilitates the bypassing of a network in the mapping. There may be disconnected components at each level, but any connected components per layer are always complete. The path calculated through these nodes indicates a possible mapping, with the sum of the edge weights being the mapping score. Only above average score mappings are accepted by the progressive alignment strategy in order to calculate the best alignment. Through this approach, it is possible to require only a linear increase in complexity as the number of networks increase by separating the layers and disconnected components, however, in our current implementation of the layered weighted bipartide matching solver, we show a quadratic complexity increase. We have observed that this complexity is acceptable in practice, and a reduction to a linear complexity is in preparation.

An example of the layered weighted bipartide matching problem. The nodes in this problem represent the nodes in the respective graphs and the edges represent the similarities that have been calculated with the weights being the similarity scores. In cases where the number of candidate nodes in each network is not the same, virtual “blank” nodes are added. This also allows for a network to be completely skipped by the alignment algorithm if it is drastically different from all other networks.
A potential limitation of this approach is that input order influences the accuracy of the alignment. We have experimentally discovered that while there is a variation in accuracy, there is no strong pattern as to the ordering of input networks (Fig. 1). While this is not an ideal situation, our approach allows for alignment of many networks and still compares favorably with current methods, and given the relatively low computational requirements of our approach, it is possible to perform sampling of input ordering and calculate several multiple alignments before other approaches can complete a single alignment. This is especially true for large sets of networks (see Results).

Accuracy variation based on network ordering. To observe the variation in accuracy based on input ordering, we aligned a set of networks with increasing differences in all possible ordering permutations. These different networks were created by mutating networks with a given number of mutations. The mutations performed on the networks were node addition/deletion, and edge addition/deletion with each node or edge, and the choice of addition or deletion selected at uniform random. While there is a difference in alignment accuracy as the input order varies, there is no clearly discernible pattern, especially as the difference between network increases.
2.1. Measures of accuracy
Our network alignment accuracy metrics use a parameterized accuracy function to calculate the proportion of edges that are conserved across a given number of networks. This generates an accuracy “curve” that we believe is superior to a single numerical representation, as it is able to provide a more granular portrayal of the similarity between multiple networks.
The “fair assessment” of many-to-many alignments posed a challenge, but we believe that our approach has resolved this satisfactorily. We deal with these alignments through the use of “expected” edges. That is, if there is an observed edge between node u and node x in network A, and the mapping has defined node x and node y as structurally indistinguishable in network A, then there should also be an edge between u and y in A. If this “expected” edge does not exist, then we define this edge as induced but not observed and define it as an edge conserved across zero networks. In this manner, many-to-many alignments are penalized only in cases where the ambiguity in the mapping will generate multiple induced edges that are not observed, and are given a better score in comparison to one-to-one mappings in cases where there is no ambiguity and the set of vertices is truly homologous.
We present induced multiple edge correctness (IMEC) as the accuracy metric devised for multiple network alignment. IMEC is a curve made up of a number of parameterized IMEC m scores, where 1 ≤ m ≤ k and k is the number of networks being aligned.
Consider k networks
Let S(f) be a network (V*, E*) where V* = {νi}, that is, f determines V* directly and S(f) represents the alignment. E* is such that
Let
A variant of this score curve, the comprehensive multiple edge correctness (CMEC) score, was also devised to include edges that are not in the generated alignment network defined above, but exist in the input networks. The calculation steps of the score are identical to the calculation of IMEC, but the divisor for the CMEC m score includes the number of edges that are in the input networks but are not included in the alignment network.
Let
3. Methods
3.1. Experimental procedure
The experiments were originally performed on a standard desktop machine with four physical cores running at 3.4 GHz and 8 GB of RAM. We discovered that some algorithms required more RAM than was available on this machine, so we performed these experiments on a server class machine with 32 physical cores at 2 GHz and 512 GB of RAM to ensure memory requirements were adequately satisfied. However, these experiments took over 72 hours and were aborted, with some algorithms exhibiting a memory requirement exceeding 200 GB of RAM. All results presented are from executions on the standard desktop machine.
Experiments one (E1-HuEtal1), two (E2-HuEtal2), and three (E3-HuEtal3) are a replication of the experiments performed by Hu et al. (2014). With these experiments we wished to compare the accuracy of existing algorithms on an already analyzed dataset. Experiment four (E4-HerpesVirus) is the alignment of four highly similar, small networks. We expected the accuracy to be high and the resource requirements low for all algorithms on this dataset. We devised experiment five (E5-Stress) as a “stress” test for the alignment algorithms. In this experiment there is little similarity between the organisms being compared; network sizes vary greatly, and there are a large number of networks, aggregating into a challenging alignment situation. We expected the accuracy to be low and the resource requirements very high for this experiment. Experiment six (E6-Large) contains only networks with more than 2500 unique proteins. The organisms are distantly related, but the variation in network sizes is less than with other experiments. We expected that the accuracy will be higher than experiment five, but still remain low. Experiment seven (E7-Mammals) contains only mammals, and as such we expected that the accuracy will be relatively high. The variation in network sizes is a clear indication that the accuracy may not be as high as we might expect with more balanced network sizes.
The ordering of inputs for NH was done arbitrarily (alphabetically) for all experiments. We did not perform a range of alignments with different orderings to choose the most accurate, however, in general, we have found that the ordering does not make a big difference (Fig. 1).
3.2. Data
The protein–protein interaction networks were obtained from the supplementary material of the NetCoffee publication (Hu et al., 2014), and inferred from the protein–protein interaction data from BioGRID (accessed June 2014). The networks derived from the interaction data from BioGRID contain only unique proteins and unique interactions. We accepted all forms of experimental evidence of a protein–protein interaction in this dataset. Network statistics as well as experiment participation are given in Table 1.
The networks belonging to the NetCoffee dataset are not guaranteed to have unique interactions, whereas the networks derived from BioGRID have unique interactions.
4. Results
We discovered that NH compares favorably in terms of accuracy, runtime, and memory requirements with current algorithms. From Table 2, we note that while NH is not the fastest or the most memory efficient approach in experiments E1-HuEtal1, E2-HuEtal2, or E3-HuEtal3, which use the NetCoffee dataset (Hu et al., 2014), it does exhibit substantially better EC, ICS, and S3 scores. The runtime and memory requirements are still within a respectable bound of the fastest and most lightweight algorithms for these experiments. From Figure 3, it is clear that NH has the shallowest IMEC and CMEC curve, indicating a more accurate global alignment.

Comparison of existing algorithms: NetCoffee dataset and experiments using IMEC and CMEC curves.
We note that while NH is not always the fastest alignment algorithm, it is able to calculate a much better alignment using reasonable computational resources. NH is also the only algorithm that is capable of handling the BioGRID dataset within a reasonable amount of time. NH was able to complete all alignments using the moderate resources of a standard desktop machine and does not require specialized hardware to perform the alignments.
E4-HerpesVirus is the only experiment using the BioGRID (Stark et al., 2006) dataset that all alignment algorithms completed within 72 hours. With this dataset we see NH being quicker and more memory efficient than the other alignment algorithms. Using EC, ICS, and S3, NH is far more accurate, with only SMETANA of the remaining methods able to calculate an alignment with a score above 0%. With E5-Stress, E6-Large, E7-Mammals, NH is the only alignment algorithm that is able to complete the alignment within 72 hr. Other alignment algorithms also required copious amounts of RAM (see Table 2), whereas NH could compute the alignment on a modest desktop machine. Over these experiments, we note that EC, ICS, and S3 are extremely low, especially ICS and S3. From these values, along with the IMEC and CMEC curves shown in Figure 4, we deduce that these networks have very different structures. We do note that the alignments of closely related organisms such as those in E4-HerpesVirus and E7-Mammals have shallower curves than experiments that include distantly related organisms. We believe that the reason E7-Mammals exhibits curve features that would be more in line with an alignment of distantly related organisms is that the sizes of the networks being compared are so dissimilar. We expect the accuracy to improve, and more common features to be discovered as more interactions are measured in the nonhuman mammals.

Alignment of the networks from BioGrid.
5. Discussion
5.1. Accuracy
We compared the alignments using both modified metrics from pairwise alignment and our new metrics. While the classic measures of accuracy, extended into the multiple alignment space, are still capable of providing a notion of the similarity of the networks being aligned, they do not provide the depth of information that IMEC and CMEC are able to. NH is able to perform very competitively with state-of-the art algorithms, even with the variance of accuracy based on input ordering. We would also like to note that it is possible to apply sampling to the order of inputs and get multiple alignments out of NH before any of the other algorithms are able to finish one alignment in some experiments. We also present a simple visualization in Figure 5 of the alignment resulting from E4-HerpesVirus by both NH and NetCoffee to compare the difference in accuracy visually. We selected the resulting alignment for this experiment largely by the size of the input networks. Current visualization techniques do not scale well with large networks in terms of both runtime and usability. We did not visualize the alignment given by IsoRankN and SMETANA because our visualization solution was unable to handle many-to-many alignments gracefully.

Visualization of the alignment of E4-HerpesVirus using
5.2. Computation time
While NH is not the fastest algorithm compared across all experiments, it was always within a reasonable margin of the fastest method. NH was the only alignment algorithm that is capable of performing the strenuous experiments.
A tight upper bound of the time complexity of the algorithm remains an open problem as it depends on the structure, density, similarity of the networks being compared, as well as the uniqueness of the structures within each network. We offer an approximation of the computational complexity but stress that this is very rough.
We consider k networks
Once the nodes are scored, we use a layered weighted bipartide matching solver to combine these pairwise “fragments” into a multiple alignment. In our initial implementation, this step has a complexity of ρn2k2. This gives the program an overall complexity of ρ3nk + ρn2k2, with a worst case O(n3k(n + k)). We performed several experiments to validate our rough complexity approximation, the results of which are shown in Figures 6 and 7.

Resource usage scaling with varying the number of networks aligned (k).

Number of score function calls that must be performed with varying k.
5.3. Memory requirements
The memory requirements of NH are not the lowest in all experiments performed, but were always within a reasonable margin of the most memory efficient alignment algorithm. NH has the best memory scaling however, and is the only alignment algorithm that is able to run on a standard desktop machine while allowing a realistic number of large networks to be aligned. While aligning networks from BioGRID dataset, NH requires less memory than other alignment algorithms. We believe that, as with runtime, the memory requirement of NH is highly dependent on the structure of networks being compared.
6. Conclusion
We have presented NH, a fast, accurate, lightweight, and highly scalable method for global alignment of multiple biological networks. We have compared our method to current multiple network alignment approaches and have found that NH is able to compute alignments with a favourable accuracy while using reasonable computational resources. It is also the only algorithm that is capable of aligning the larger networks from BioGrid. While NH is not guaranteed to give an optimal alignment, and the alignment quality may vary based on input ordering, it does provide a good alignment using minimal computational resources.
Footnotes
Acknowledgment
Funding: This work was supported by the Australian Postgraduate Award to A. Radu.
Author Disclosure Statement
The authors declare that no competing financial interests exist.
