Churn handling strategies for structured overlay networks: A survey

Abstract

Churn handling has become an active area of research due to increased complexity, high dynamicity of participating nodes and its effect on the performance of overlay network. In this paper, we systematically analyse churn, its causes and implications along with churn handling strategies employed in various state of the art structured overlay network. These churn handling strategies are classified into two main categories based on addressing effects of churn: Churn Resilience and Dynamic repair strategies. These techniques are further classified into sub categories based on their inherent mechanism for addressing a particular concern. In addition, this paper pinpoints various concerns related to the future research for handling participant’s churn.

Keywords

Structured overlay churn maintenance stabilization resilience availability

1. Introduction

The structured overlay networks are realized as an effective approach for management and utilization of a large number of available high configuration, interconnected computing devices, which otherwise remain idle. These high configuration devices voluntarily pool their resources for sharing them to a group of machines using structured overlay networks. The idea of structure here refers to the maintenance of tightly controlled topology as the network evolves and deterministic placement of resources on the overlay nodes. These structured overlay networks serve as a platform for many popular applications including file sharing (bit torrent), media streaming and internet telephony. The recent study has revealed that the overlay networks are the biggest consumers of the internet bandwidth and has a great future [1].

The operating environment of structured overlay networks becomes more inhospitable with the participation of multitude of heterogeneous, dynamic and mobile participants. This heterogeneity and dynamicity of overlay participants raises serious questions about the reliability of structured overlay networks under churn. The churn is a condition, where a large number of participants simultaneously join or leave the network such that stabilization of a network results in a significant down time. The churn becomes more severe due to presence of malicious users on the structured overlay network, who can plan random or strategic attack against legitimate users by exploiting the structural properties of overlay network. Thus, churn handling in structured overlay networks is the most critical issue to provide high and predictable level of service. This motivates the research community to focus their research interests towards churn handling strategies to guarantee the performance of structured overlay networks in highly dynamic environments [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14].

In this paper, we overview a number of research works on the maintenance of structured overlays, while categorizing them according to their approaches. These diverse research efforts of research community can be divided two main classes based on their underlying mechanism of handling churn: churn resilience and Dynamic Repair. The churn resilience strategy is based on the idea of maintaining the acceptable performance of overlay network even under multiple node failures. The second class of topology maintenance strategy is dynamic repair that concentrates on reducing the impact of churn on overlay topology by dynamically updating the routing table in the face of churn. These categories are further divided into subcategories according to their working behaviour.

Most of the previous review papers [15, 16, 17, 18, 19, 20, 21, 22, 23] focused on the survey of different overlay network schemes or their applications, security or vulnerabilities of overlay networks. However, this review paper concentrates on effectiveness of existing churn handling strategies of structured overlay networks to ensure its performance in dynamic environments. These churn handling strategies are the backbone mechanism of structured overlay network to maintain its structure in the face of participant dynamics. Below are some of the key advantages of our systematic reviews over other available surveys in this field:

This review answers the most basic question of overlay network community “Is there any protocol that provides resilient overlay network, which regain system performance in the face of churn without compromising the openness of the system”.

A comprehensive detailing of churn issues in structured overlay network to simplify the understanding the pool of related work.

A taxonomy analysis of different topology maintenance schemes.

This article is organized as follows. In the Section 2, we will discuss the basic concepts such as structured overlay networks, different churn definitions, factors affecting the churn and evolution of normal churn to intentional churn attack to target the availability of services on the structured overlay networks. This section also discusses the various design challenges and characteristics of an effective churn handling strategy. Section 3 provides the taxonomy of different churn handling mechanism. Sections 4 and 5 discuss the individual approaches of the proposed taxonomy in detail. Section 6 concludes this paper along with the discussion of the main research issues that need to be handled in future.

2. Basic concepts

The correctness and the performance of structured overlay networks mainly depend on its up to date routing table entries that together form an overlay topology with specific structural constraints (e.g. ring structure in the chord overlay network). But, due to their open nature in the real world, where nodes may join or leave the overlay network at any time. This continuous dynamism of peer participants in the form of continuous joining, leaving, and failure from the overlay network is known as churn. Churn has significant impact on the performance of structured overlay networks as it may generate a considerable traffic to accommodate the rebalancing of data among overlay participants and to update the routing table entries accordingly. Thus, significant churn may result in blocking of the normal search operation and result in lookup failures or inconsistent lookups. The most common causes of node’s unavailability are network failures, mobility, overload, crashes, or a nightly-shutoff schedule. Before discussing the existing overlay maintenance strategies, we provide a brief introduction to the basic concepts of structured overlay network in relation to the topology maintenance. It includes churn definition; churn attack, churn patterns of overlay participants in the real world applications and the need of the churn handling mechanism to aid the better understanding of the different churn handling procedures.

2.1 Churn

Figure 1.

Lifetime of a node.

The churn can be defined as the participation dynamics of autonomic overlay network participants. Each participating node may join or leave the overlay at any time. This continuous join and leave operations in a structured overlay network result in content unavailability and outdated routing tables. Because of this dynamics, the routing topology and content availability need to be maintained in order to guarantee overlay network performance. Autonomic computing principles are an attractive solution to tackle the problem of self organization of dynamic network, mainly due to node heterogeneity and high churn. There are mainly two definitions of churn available in the literature:

Definition (Global Churn): Churn can be defined as a measure of the total number of nodes joining or leaving the overlay network over a given time, also known as node turnover.

Definition (Local Churn): Churn is defined as several online (participating in overlay) and offline sessions in a node’s lifetime (starts with initial creation and ends with permanent departure) as shown in Fig. 1.

Churn is a constant problem often offsetting the advantages of structured overlay networks. The literature on the structured overlay networks addresses churn as a legitimate failure scenario to be handled by the inbuilt fault tolerance or self management capabilities of structured overlay networks. Researchers have conducted different studies [24, 25, 26, 27, 28, 29, 30] to understand the churn behaviour of participating nodes. These studies have reported the daily or weekly periodic uptime pattern in different file sharing applications [24, 25, 27, 28], instant messaging [30] and distributed computing applications [29]. Douceur studies the churn behaviour of Microsoft machines and observed that, some of the participating hosts have always on characteristic, while others have cyclic on/off patterns. This cyclic behaviour pattern of overlay nodes is exploited to propose a predictive stabilization strategy to schedule the stabilization operation appropriately.

2.2 Churn attack

Churn can be exploited as a tool to attack, the availability of overlay network by generating peers, joining and leaving the network quickly in order to corrupt the functionality of the overlay network. Churn attack is a peer to peer version of Denial of services (DoS) attacks in which malicious peers frequently join and leave the overlay network to induce a large amount of communication and processing overhead to make it incapable to serve legitimate overlay participants. These efforts of malicious attackers can be categorized into two main categories based on attacker’s capability: Random churn attack and Strategic churn attack.

In random intentional churn attack, an attacker can exploit the churn effect by triggering the fast join/ leave of a number of slave nodes to destabilize the routing infrastructure. This attack is a type of DDoS (Distributed Denial of services) attack on overlay networks and its impact can be amplified by Sybil nodes.

In strategic churn attack, an intelligent adversary can plan a strategic attack against a structured overlay network by continuously attacking its weakest part or pinpointing the attack to specified targets only. Such an attacker learns the topology of the overlay by inserting a crawler and then plans its attack accordingly to partition the overlay network.

Churn attacks are an artificially induced churn with potentially high rates to cause bandwidth consumption due to overlay maintenance. This leads to the worst case to denial of service or service degradation. Cooperative web caching [31] is an attractive application of structured overlay networks to eliminate the use of proxy servers by storing meta-information in the overlay nodes. However, this application is vulnerable to churn attack as an attacker can easily mount a distributed denial of service attack by crippling the sharing mechanism. For this, attackers can pose extreme stress in the form of maintenance overhead generated by a large number of concurrent nodes join/leave procedures, which otherwise considered non-malicious.

2.3 Churn patterns

The recent years have witnessed a lot of research from the overlay networks community to make them more robust in the churn periods. But, most of the researchers have presumed that the overlay nodes will join and leave the overlay networks unpredictably and their churn behaviour has no specific patterns. Thus; most of the research efforts focus on the development of the different ways to recover, maintain connectivity or to ensure availability of services in the face of churn.

However, several studies [25, 32, 33] have shown that the churn behaviour usually obeys a particular pattern and the state of the overlay node can be predicted from their online time history [29]. These works clearly show the recurrence in the online and offline behaviour of overlay nodes. The dynamic behaviour of overlay nodes depends on many factors such as underlying user’s online behaviour, geographical position, time of the day, day of the week etc. as reported in many overlay applications. Based on this observation, overlay nodes will have daily and weekly behaviour patterns based on the working habits of their underlying users. Different users will have different behaviour patterns.

The different overlay network applications will have different online patterns of overlay nodes, depending on their use. The news and communication applications may have their nodes online during the working hours of the region they belong, whereas media streaming applications and social network users may be online during non-working hours, weekends or during their breaks. The main aspect of this observation is that, these patterns are recurrent as users will have the same routine every day, every week or other repeating period. So, we can easily predict the future state of the overlay node based on the past observations. We can exploit these characteristics for efficient maintenance of structured overlay networks by implementing an intelligent prediction framework based on this human factor based workload for setting stabilization parameters appropriately. Researchers [34, 35, 36] have utilized this recurrent behaviour patterns for availability prediction and intelligent replica placement algorithm.

Most of the state of the art structured overlay networks [3, 4, 5, 6, 11] does nothing in order to ensure the connectivity of the overlay nodes ahead of time, rather they only react to the disconnection caused by the churn behaviour of the participant nodes. Some of the researchers [37] have proposed a statically resilient structured overlay network, but at the cost of larger routing table overhead.

Churn is an inherent dynamic characteristic of the overlay networks. Thus; it is a good idea to prepare for it ahead of time. Intelligent prediction of neighbouring node dynamism is helpful to improve the resilience and can reduce the overhead associated with topology maintenance of structured overlay networks. The short-term availability prediction of neighbouring nodes can serve as an important basis for scheduling maintenance operation on a particular structured overlay network node. This review paper proposes the use of a sophisticated predictive stabilization approach to schedule the next stabilization action by assessing the risk of failure of the neighbouring node for some time in the future.

2.4 Need of churn handling

The overlay networks are fundamentally open networks, where nodes may join or leave the overlay network at any time. The joining of a new node have a necessary condition to contact an existing node, whereas existing nodes may leave the overlay network ungracefully (without informing other nodes). This dynamicity raises the need of monitoring and maintenance of network connections to keep the routing table and the overlay structure up to date. The main idea is to enable the self organization of the overlay nodes in an autonomic manner.

Without employing a maintenance mechanism, the routing table entries may get stale and results in performance degradation as new nodes join or existing nodes leave the overlay network. The stabilization mechanism comprises of two parts: failure detection and node replacement. The failure detection mechanism periodically pings the neighbours to discover ungraceful leaves. After the detection of the failed routing table entry, it will be replaced by another appropriate alive node in the network. The method of detection and replacement of the failed node to ensure the structural integrity of the overlay network is known as stabilization mechanism.

A stabilization strategy is required to maintain the structural integrity by keeping the routing table entries (finger tables and successor pointers in case of chord) up to date in case of peer failure. The stabilization algorithm ensures the timely detection of node failures and keeps the routing table full with up-to-date entries. Thus, the stabilization mechanism is defined as the process of maintaining geometric and routing properties of an overlay network experiencing churn.

2.5 Design issues of a churn handling mechanism

The structured overlay networks are self organizing networks capable to handle high node dynamism. This self organization property ensures the legitimate configuration of overlay network in dynamic situation to maintain acceptable performance to its participants. But, to achieve this self organization, practical churn handling techniques are needed, which provide guaranteed overlay services without compromising the openness of overlay networks. Here, we highlight the issues and requirements of practical churn handling mechanism for the structured overlay network. The main issues while designing a practical churn handling mechanism are:

Overlay networks are practically open networks in which any new node can join or leave the network at any time. Thus a practical churn handling mechanism is needed that provide resilient overlay network structure in dynamic scenarios without compromising the openness of the system.

The majority of the internet devices connected to the internet is mobile devices with wireless links. So, most of the overlay network participants are mobile. Thus, an effective churn handling mechanism should be able to handle churn in mobile environment.

The overlay networks are composed of a large number of geographically distributed peers. Heterogeneous internet connectivity patterns and underlying user behaviour results in the variation of node behaviour. Whereas, most of the churn handling mechanisms either be designed by considering the capabilities of the weakest component (negative in mind) or by strongest component (positive in mind).

The overlay network is a set of independent nodes with no central administration. This decentralized nature of peer to peer networks offers many advantages, but it makes the monitoring and resilience to failure a complex task. Thus, while designing a churn handling mechanism; we need to keep in mind this decentralization and lack of global knowledge about the arrival/departure of nodes into account.

Due to the open nature of structured overlay networks, malicious nodes can easily become the part of the legitimate overlay network and can perform various malicious activities such as a DDoS attack, network partitioning and churn attack. Thus, along with the inherent churn feature of participants an effective maintenance strategy should be robust to this malicious behaviour.

2.6 Requirements of a practical churn handling mechanism

The churn handling mechanism requires efficient methods to find and replace stale routing table entries to retain the desired structure of the overlay network. The main requirements of a practical churn handling mechanism are as follows:

A practical and effective churn handling strategy must not induce high maintenance overhead for maintaining the structure of the overlay network. The excessive bandwidth requirements for the maintenance traffic will make the system impractical. Thus, a practical churn handling strategy has to maintain a balance between the maintenance traffic imposed, and desired performance level of the network.

A good churn handling strategy should be able to detect and correct the stale routing table entries timely to ensure the acceptable level of performance in terms of average number of successful query lookups.

Some researchers advocate the fact that an effective maintenance strategy should react to the changes in the environment, e.g. the churn handling strategy should behave according to the varying levels of churn. But, this feature could be exploited by the attackers to perform an attack on the availability of the overlay network by engaging its maximum bandwidth for maintenance action only. So, a good maintenance strategy should maintain a balance between reactivity to changes and performance of a structured overlay network.

The churn handling strategy should be efficient enough to maintain the desired performance level of structured overlay network without constraining their ever evolving nature. That is, a good churn handling strategy should not bind the arbitrary behaviour of overlay network participants.

A good churn handling strategy should be able to differentiate between malicious user induced churn and a legitimate churn. The inability to make this differentiation result in an inefficient maintenance mechanism that itself collapses the whole network.

A practical churn handling strategy should be decentralized that is, it should be able to maintain the structure of overlay network without requiring any centralized stabilization agent.

Figure 2.

Classification of churn handling strategies.

3. Taxonomy of churn handling strategies

Structured overlay networks are used for maintaining membership and exchanging queries and responses. In this class of overlays, the main goal of the overlay construction is reachability and resiliency and therefore the overlays are often richly connected. The following are some examples of the overlay network churn handling strategies, categorized by their underlying principle of handling churn: (1) Churn Resilience and (2) Dynamic repair strategies. These categories are further divided into sub-categories based on their underlying mechanism of topology maintenance. The Fig. 2 classifies the existing churn handling strategies based on their underlying principle.

4. Churn resilience strategies

Churn resilience is the ability of maintaining system invariants (i.e., overlay topology, lookup performance, etc.) despite the node arrival and departure. In other words, the arrival of new nodes or departure of participating nodes must not disrupt the functionality of the overlay networks. Existing churn resilience strategies are classified into two main approaches based on the churn behaviour they are addressing: Resilience to legitimate node failures and Resilience to churn attack. The resilience to node failure approach considers the legitimate churn behaviour of participants that is they may go offline due to their inherent nature or due to occasional failure. Whereas, the resilience to churn attack approaches makes the system capable to defend against intentional churn attack orchestrated by a malicious attacker.

4.1 Resilience to legitimate node failures

The resilience to node failures approaches will maintain the system performance under the legitimate node dynamics. These approaches can be classified into two main categories based on their underlying strategy to maintain the functional structured overlay network in the face of churn: static resilience and dynamic resilience approaches. The static resilience approach is based on the idea to maintain an acceptable level of performance in the churn scenario. That is, although the system is not fully repaired in churn scenario, but it should be fully functional. Whereas dynamic resilience approaches concentrate on improving the overlay network resilience dynamically by providing flexibility in the selection of certain design parameters such as routing table size and neighbour selection.

4.1.1 Static resilience

The functioning of structured overlay networks relies on unreliable peers that may fail or leave the network at any time. So, the designers of these networks focus on the formation of the resilient geometry of the structured overlay networks that can cope with legitimate node failures without any recovery mechanism. The static resilience is the structural resilience that is how robust is the topology of a structured overlay network to the failure of the participating nodes. Thus, the idea here is to introduce topologically rich architectures to improve the structural resilience in churn scenarios. Inherently resilient overlay architectures need less frequent stabilization routines for topology maintenance.

Table 1
Summary of the state of art structured overlay network geometries

Overlay geometry	Definition	Advantages	Drawbacks
Ring	Overlay nodes are connected to each other in one dimensional circle by maintaining a pointer to its successor node	Flexibility in neighbor and route selection	Low performance due to skewed IDs
Butterfly	Logarithmic diameter and constant degree network Each node maintains three set of neighbours: Predecessor and successor nodes (general ring pointers) Right and left connection to the next level along with one connection to the upper level of the butterfly network (butterfly pointers) Predecessor and successor links of the level ring (level pointers)	Small and constant size routing table	Poor neighbor and route selection flexibility
Hypercube	n-dimensional cube with 2n corners Each corner is connected to its n adjacent corners with the same edge length	Great flexibility in route selection	No flexibility in neighbor selection
Tree	Hierarchical organization of overlay nodes in order to provide efficient routing	Good neighbor selection flexibility	No route selection flexibility
XOR	Symmetric unidirectional tree structure based on the exclusive OR of 160 bit node ids	Good neighbor and route selection flexibility	Do not provide independent alternate path
Hybrid (e.g. Pastry)	Combination of two or more other geometries	Good route and neighbor selection flexibility Supports proxim- ity neighbor selection

Overlay geometry

Definition

Advantages

Drawbacks

Ring

Overlay nodes are connected to each other in one dimensional circle by maintaining a pointer to its successor node

Flexibility in neighbor and route selection

Low performance due to skewed IDs

Butterfly

Logarithmic diameter and constant degree network

Each node maintains three set of neighbours:

Predecessor and successor nodes (general ring pointers)

Right and left connection to the next level along with one connection to the upper level of the butterfly network (butterfly pointers)

Predecessor and successor links of the level ring (level pointers)

Small and constant size routing table

Poor neighbor and route selection flexibility

Hypercube

n-dimensional cube with 2n corners

Each corner is connected to its n adjacent corners with the same edge length

Great flexibility in route selection

No flexibility in neighbor selection

Tree

Hierarchical organization of overlay nodes in order to provide efficient routing

Good neighbor selection flexibility

No route selection flexibility

XOR

Symmetric unidirectional tree structure based on the exclusive OR of 160 bit node ids

Good neighbor and route selection flexibility

Do not provide independent alternate path

Hybrid (e.g. Pastry)

Combination of two or more other geometries

Good route and neighbor selection flexibility

Supports proxim- ity neighbor selection

Till date, various researchers have studied the static resilience of different routing geometries [4, 38, 39, 40]. Some of the researchers also study the resilience of state of the art geometries to random failure of the nodes and targeted attacks. These works have classified the overlay geometries into six main categories: Ring, Hypercube, Butterfly, Tree, XOR and Hybrid geometry. These geometries along with their advantages and disadvantages are discussed in Table 1. It is beyond the scope of our review to discuss these geometries in detail.

The Table 2 compares the routing performance of these geometries with respect to the following criteria’s: Flexibility of Neighbour Selection, Flexibility of Route Selection and their Fault Tolerance Capability. In this subsection, we discuss the robustness property of the state of the art flat geometries to accommodate the dynamic behaviour of its participating nodes.

Flexibility in neighbour selection: The flexibility of neighbour selection is defined as the number of choices available for the selection of its neighbouring nodes. This freedom of neighbour selection boosts the resiliency of structured overlay network to dynamic peer behaviour.

Flexibility in route selection: The route selection flexibility is defined as the freedom of selection of next hop in order to forward the lookup query towards the destination. The resiliency of a structured overlay network is more for more flexible overlay geometry.

Routing table size: The size of the routing table is defined as the number of overlay node’s connections that a particular node has to maintain in the form of routing address. The more is the number of connections an overlay node has to maintain more will be the flexibility of route selection. However, the higher degree corresponds to higher maintenance overhead in case of joining, leaving or failure of nodes.

The static resilience of the DHTs (Distributed Hash Table) is analysed based on their underlying geometries. The main feature to determine the robustness of the geometry is the offered alternating paths to a destination. The more is the number of alternating paths available, more will be the resilience to churn. However, the number of alternate path corresponds to the high node degree, which implies more overhead in case of joining, leaving or failure of nodes. Table 2 summarizes the route and neighbour selection flexibility provided by different geometries. The flexibility to choose neighbours allows the selection of closer and highly available peers, thus improve routing latency or may result in dynamic resilience to failures in case of dynamic systems. While, the route selection flexibility will result in increased fault tolerance in case of route failures due to departed neighbours and have a small impact on the routing latency too.

Table 2

Routing performance comparison of structured overlay geometries

Overlay geometry	Flexibility of route selection	Flexibility of neighbor selection	Routing table size
Ring	c ${}_{1}$ $\log$ N	n ${}^{\log n/2}$	$\log$ N
Butterfly	1	1	1
Hypercube	c ${}_{1}$ $\log$ N	1	2d
Tree	1	n ${}^{\log n/2}$	$\log_{B}$ N
XOR (Kademlia)	c ${}_{1}$ $\log$ N	n ${}^{\log n/2}$	B $\log_{B}$ N $+$ B
Hybrid (Pastry)	c ${}_{2}$ $\log$ N	n ${}^{\log n/2}$	( $\log_{B}$ N)

The experimental results of [39] confirm the hypothesis that more paths, offer better resilience to the dynamic overlay participants. Their analysis results clearly specify that, the ring and hypercube geometries have high resilience, whereas the tree and butterfly geometries have low resilience among the analysed DHTs (Distributed Hash Table).

4.1.2 Dynamic resilience

Dynamic resilience is the resilience achieved by the flexible selection of design parameters such as the intelligent selection of neighbouring nodes and node degree of the overlay network participants to increase the routing redundancy to sustain acceptable performance in the face of churn. The selection of these parameters has a different impact on the resilience of structured overlay networks in churn scenarios.

The implementation of structured overlay networks in the practical environment has to consider a trade-off between the routing performance and maintenance cost by the appropriate selection of routing table size and expected hop count. At one end of the spectrum, there are designs that route lookups within the logarithmic bounds (logarithmic to the size of the network), with the maintenance of small routing tables per node. Thus, enhances the scalability of the architecture and ensures the lower maintenance overhead in the face of churn (nodes join and leave the network). The examples of structured overlay networks with logarithmic lookup bounds include Chord [5], Pastry [4] Kademlia [41] and Tapestry [3] with the routing table size O( $\log$ N), where N is the size of the network.

Table 3
Taxonomy of notable structured overlay networks

Structured	Topology	Lookup hop count	Join/leave	Routing table size
overlay network			performance
Chord	Circular Node-ID space, logarithmic degree mesh	O( $\log$ N), where N is the number of peers	( $\log$ N) ${}^{2}$	$\log$ N
CAN	d-dimensional Cartesian coordinate space	(d/4)*n ${}^{1/d}$ , N is number of nodes, and d is dimension	2d	2d
Pastry	Plaxton-style mesh network, prefix routing	O( $\log_{B}$ N), where N is number of peers, and B $=$ 2b, b is number of bits of NodeID	( $\log_{B}$ N)	Blog ${}_{B}$ N $+$ Blog ${}_{B}$ N
Tapestry	Uniformly at random from a large identifier space (typically 160 bit with a globally defined radix)	O( $\log_{\beta}$ N), where N is the size of the identifier space, and $\beta$ is the radix used	( $\log_{B}$ N)	$\log_{B}$ N
Kademlia	XOR metric for distance between points in the key space	O( $\log_{B}$ N) $+$ c, Where N is number of peers, B $=$ 2b, b is number of bits of Node-ID, and c is a small constant	( $\log_{B}$ N) $+$ c	Blog ${}_{B}$ N $+$ B
Viceroy	Butterfly network with connected ring of predecessor and successor links; data managed by Servers.	O( $\log$ N)	$\log$ N	$\log$ N
P-Grid	Binary tree	O( $\log$ N), where N is number of data items in the overlay	$\log$ N	Each peer in P-Grid contains a routing table which contains an entry for every bit in the binary tree
Epichord	Circular node ID space, logarithmic degree mesh	O(1) under lookup intensive workloads, and O(logN) in the worst case	$\log$ N	n
D1HT	Hashing keys and peers into an ID space [0, N], N $\gg$ n	O(1)	1	n

At the other end of the spectrum, there are overlay network designs with larger routing table size and have a constant lookup hop count to route a lookup query to the destination node. However, this larger routing table size requires more bandwidth to maintain the routing state up to date under churn. Thus; are not suitable for the applications that are prone to churn. Examples of these structured overlay network designs include EpiChord [11] and D1HT [37], that maintain sufficient routing information to resolve lookups in a single hop count irrespective of the size of the overlay. The performance comparison of different structured overlay network designs in terms of the lookup hop count, join/leave cost and routing table size is provided in Table 3.

For the performance improvement of structured overlay networks in churn scenario, researchers [42, 43, 44] have analysed the impact of node degree, neighbour selection and neighbour replacement strategies on (selection of neighbours is random instead of sequential) the state of the art structured overlay networks. The resilience of structured overlay networks is very sensitive to these low level design choices, but this feature is still least explored in the peer to peer literature. So, in this section we will discuss these design choices and their impact on the performance of the structured overlay networks.

Node degree or dynamic routing table size: The overlay network designers can tune the node degree of the overlay nodes to achieve different tradeoffs between lookup latency and maintenance traffic in the churn scenario. Most of the proposed structured overlay network designs offer some flexibility in the size of per node routing table. Thus, overlay designers can exploit this feature to adjust the fault resilience and efficiency of the overlay network at the cost of maintaining the larger routing table per node. The higher node degree usually induces strong connectivity and therefore good resilience in the face of churn. However, they will require more maintenance traffic to keep them up to date. So, a careful parameter choice will result in demanding fault tolerance with acceptable maintenance traffic under a certain percentage of node failure.

Several researchers [45, 46] have explored this trade-off to suggest a fixed degree overlay network with provably minimum lookup latency and maximum connectivity with bounded bandwidth overhead in the face of churn. The first study is conducted by Liben-Nowell et al. in 2002 to examine the fault resilience of the chord overlay network to derive lower bounds on the node degree of overlay nodes to maintain a sufficient level of performance under concurrent join/leave of nodes. In 2005, Li et al. have proposed a bandwidth efficient routing table management protocol that adapts its routing table size to provide best performance according to the user’s bandwidth budget. This system mainly maintains large routing tables for small and stable overlay networks and shrinks the size of the overlay network with increased churn in order to reduce the maintenance overhead.

Neighbour selection strategies: The flexibility of neighbour selection property can be exploited to improve the performance of structured overlay networks in the churn scenario by intelligent or randomized selection of neighbouring nodes (fingers in case of chord) instead of using predefined node identifier combinations. This flexibility provides the opportunity to consider different factors such as node lifespan, availability, etc. other than node identifiers to reduce the impact of churn.

The researchers have compared the performance of different intelligent neighbour selection policies [44] including randomized neighbour selection [9, 39, 43, 47] policies in terms of their lookup performance and resilience to churn. The main factors that will influence the selection of neighbouring nodes are the proximity and their liveness information. The liveness information of the overlay nodes can be used to predict their remaining uptime or to consider the node with the longest current uptime as the most reliable one to be the neighbour. These factors results in three main neighbour selection policies: Proximity Neighbour Selection (PNS), Random Neighbour Selection (RNS) and Liveness Neighbour Selection (LNS). The proximity neighbour selection policy improves the lookup performance by choosing physically close nodes as neighbours whereas random neighbour selection algorithm uses a uniform random node selection policy instead of using any predefined criteria. The Liveness neighbour selection policy populates its neighbouring table with the nodes that tends to stay alive for a long time. The LNS (Liveness Neighbour Selection) policy will result in stable neighbourhood and reduces the bandwidth consumption due to periodic probing and timeouts due to neighbour failures. The careful analysis of these policies shows the significant performance improvement by using LNS (Liveness Neighbour Selection) policy in churn scenario. These policies mainly find rules for intelligently selecting $k$ nodes from a network of n nodes where $n>k$ such that the churn rate gets minimized.

In 2013, the researchers [48] have come up with a new idea of couple nodes to eliminate the churn effect by utilizing the complementary nature of their join and leave behaviour. In this scheme coupled nodes cache resources and monitor each other. In case of departure of one node, its spouse node will hold its responsibility for resource services temporarily.

Parallel Lookups: The phenomenon of churn degrades the lookup performance and increases the lookup latency of overlay networks either due to route failure or node failure. A large number of researchers [11, 41, 46] have examined the improvement in the performance of structured overlay networks by employing lookup parallelism in the churn scenario. In a parallel lookup scenario, the requestor will initiate multiple lookups simultaneously, some of these lookup requests will fail due to stale routing table entries, and the lookup process can continue process without being blocked. However, the selection of optimal degree of parallelism is important for the balancing between message overhead and latency reduction. In 2007, Wu et al. [49] have provided a simulation study to select the optimal degree of parallelism. However, existing studies still are not sufficient to answer the optimal parallelism degree and whether periodic pinging to check routing entry liveness is a more efficient use of bandwidth than sending parallel lookups.

4.2 Resilience to churn attack

Churn is an attractive tool for an adversary to destabilize the structured overlay network [31]. In this attack, a large number of malicious users will join and leave the network frequently in order to increase the bandwidth consumption due to overlay maintenance. The state of the art structured overlay network architectures [4, 5, 9, 50, 51, 52] consider churn as a legitimate node behaviour and provide simple maintenance mechanisms with significant recovery time to handle a set of node failures [44, 48, 53]. Researchers have been aware of this attack for quite a while [18, 54, 55, 56, 57] and various solutions have been presented to thwart this attack [3, 51, 58, 59, 60] but until recently no solution can provably cope with this attack without compromising the openness of the overlay network. Moreover, most of the proposed solutions [58, 59] are static as they can handle only bounded number of node failures. Kuhn et al. [58] have proposed an efficient but a complex architecture to counter an intelligent adversary by continuously shifting newly joined nodes to less sparse areas. In contrast, our proposed technique is simpler and addresses more realistic random intentional churn attack by allowing the new nodes to join as a leaf node and thereby isolating the effect of leave of malicious nodes without affecting the rest of the overlay.

A large number of researchers are currently working on structuring the overlay networks in a hierarchical manner [61, 62, 63, 64, 65, 66, 67, 68, 69, 70] in order to achieve better efficiency, performance, maintenance cost and load balancing. Mariela et al. [69] have evaluated the hierarchical DHTs in the churn scenario of mobile nodes and clearly state the effectiveness of hierarchical DHTs as compared to flat DHTs. However, they have not considered the effect of malicious adversary, who maliciously triggers a large number of join requests for limited lifetime to bring the entire system down.

Table 4 summarizes the comparison of performance of different state of the art flat and hierarchical structured overlay networks in terms of lookup hop count and cost of join and leave in the form of messages sent per join/leave event.

5. Dynamic repair strategies

The dynamic participation of overlay network nodes results in stale routing table entries of the structured overlay network. The dynamic repair is the process of dynamically updating these routing table entries to maintain the structural integrity of the overlay network to accommodate the churn behaviour of the overlay network participants. The static and dynamic resilience of structured overlay networks deals with their ability to survive the churn behaviour before taking any repair action. The comparison of these three strategies is provided in Table 5.

The dynamic repair, also known as the maintenance strategy is needed to maintain the structural integrity by compensating for the changes in the overlay structure due to peer failure or network connection failure. The structured overlay network implementation has an explicit node join mechanisms, but leaving is implicit as nodes may simply go offline or crash. Both graceful leave and unexpected failure of participating nodes require the implementation of a maintenance strategy to repair the routing tables of online nodes. Thus, a maintenance strategy can be defined as a routine to maintain the structural integrity in case of peer or network connection failure. This maintenance routine will be responsible for ensuring node failures are detected, and the routing tables are kept as full and up-to-date as possible.

Table 4
Comparison of performance of different overlay architectures

Technique	Topology	Join	Leave	Hops
CHORD	Flat	O( $\log^{2}$ n)	O( $\log^{2}$ n)	O( $\log$ n)
Pastry	Flat	O( $\log^{2}$ n)	O( $\log^{2}$ n)	O(n $\log$ n)
TLS	Flat	O( $\log$ n)	O( $\log$ n)	O( $\log$ n)
BATON	Flat	O(6 $\log$ n)	O( $\log$ n)	O( $\log$ n)
Chordella	Hierarchical	O( $\log^{2}$ N)	O( $\log^{2}$ N)	O( $\log$ n)

In case of super peer failure. Where N is the number of super peers.

Table 5

Comparison of different churn handling approaches

Techniques	Static resilience	Dynamic resilience	Dynamic repair
Definition	How well overlay networks work in the presence of node failures and before the restoration of the routing tables	Intelligent selection of the overlay design parameters to survive the churn	Process of repopulating the routing tables of live nodes with the updated neighbours list to accommodate the churn
Approach	Resilient geometry formation	Flexible selection of overlay design parameters	Repair action to remove the failed nodes and to incorporate the new neighbours
Nature	Proactive	Proactive	Proactive/Reactive
Measurement	Performance evaluation of overlay	Tuning of different parameters	Execution of the dynamic repair
technique	architecture with static routing table	under different workloads and network conditions	strategies under different churn scenarios
Contribution	Measure of desirable quickness of the dynamic repair algorithm	Optimized performance at the cost of higher node degree, intelligent neighbour selection and implementation of the parallel lookup procedure	To achieve stable dynamic equilibrium for different conditions while ensuring successful routing

The dynamic repair can be characterized into two main types: proactive and reactive repair. The proactive repair strategy executes the maintenance routine periodically whereas in the reactive maintenance strategy the maintenance routine is triggered immediately after the failure detection.

5.1 Proactive repair

The proactive repair [3, 4, 5] strategies maintain the overlay network structure by running a stabilization routine periodically. Periodic maintenance is employed by using scheduled probing to check the health of neighbouring nodes, even when it is not needed. The main purpose of the periodic stabilization is to incorporate the new nodes and remove the failed nodes from the overlay network. But, this joining and leaving of node may occur at any time and do not have any fixed interval. So, the performance of the periodic stabilization depends on the period (stabilization interval chosen). The choice of stabilization interval further classifies the proactive repair in two categories: Fixed interval and Adaptive Stabilization.

5.1.1 Fixed interval stabilization

The fixed interval proactive maintenance [3, 4, 5] approaches keep-up the overlay network structure by using a periodic stabilization mechanism, where period is chosen without taking into consideration the churn rate of the overlay network. The performance of the periodic stabilization [3, 4, 5] technique depends on the stabilization interval chosen. If stabilization interval is selected pessimistically, it may result in performance degradation in case interval chosen is longer as compared to the frequency of network change. However, the optimistic choice of stabilization interval may lead to higher overhead as it will select frequent stabilization. Thus, the appropriate selection of stabilization interval is critical for the achievement of desirable reliability and performance.

The current approach for configuring the structure overlay network is static. That is, design parameters such as stabilization interval, routing table size etc. are selected statically. This setting will result in optimal performance, if either the perfect information about the future is available or the network and operating conditions remain static throughout the system lifetime. As the structure overlay network evolves over time, thus static configuration will result in a low lookup success ratio, high failure rate and more communication overhead [13].

State-of-the-art structured overlay networks [3, 4, 5] employ a periodic stabilization scheme, where each node periodically contact its neighbours to check whether they are online or offline and update their routing table in order to improve the robustness under churn. This approach uses a fixed stabilization interval without taking churn rate of the network into account. However, various studies [13] argue that churn rate is not fixed. So, selection of appropriate stabilization interval has significant impact on the performance of overlay both in terms of robustness and control overhead.

Researchers [9, 13] have analyzed the various inconsistencies in the performance of structured overlay networks employing proactive stabilization in the face of churn. Krishnamurthy et al. [14] analytically studied the chord [5] architecture under churn and observed a significant degradation in the routing efficiency due to incorrect or dead entry in the longest finger. Thus, the stabilization mechanism needs to be triggered frequently to ensure the correct and efficient working of overlay networks.

5.1.2 Adaptive stabilization

Due to the criticality of the period selected in fixed interval stabilization techniques, several researchers have come up with the idea of adaptive stabilization that will choose the period for stabilization routine by considering the present network conditions and adjust it dynamically. The adaptive stabilization procedure considers the continuous evolution of network conditions of structured overlay networks and tries to adapt to them by continuous collection and processing of statistical data about network dynamics and adjusting its design parameters accordingly. This technique is quite effective as compared to the fixed interval stabilization where period is chosen passively [13].

Several researchers [71, 72] have worked on improving the performance fixed period stabilization techniques by adjusting the stabilization rate, according to churn conditions or by sharing the probe load among the nodes having common neighbours or by limiting the sending of stabilization messages to its direct successors and failure of successor will be broadcasted to all the routing table entries. In the adaptive stabilization framework [13], a node will estimate the churn, build a stochastic network model and then adjust stabilization rate by using statistical analysis to satisfy the QoS requirements.

Researchers [73] have extended the periodic stabilization process to handle extreme events like network partitions in the face of churn by introducing a new mechanism of merger. The merger messages will die out when the overlay architecture converges to a ring again. In 2015, Paul et al. [74] has modified this mechanism to trigger the merger in a proactive manner by maintaining a knowledge base at each participating node.

Table 6
Comparison of proactive and reactive repair strategies

Features	Proactive	Reactive
Basic approach	Periodic stabilization mechanism to check the health of neighbouring nodes, even when it is not needed	Triggered stabilization, where, the stabilization will be performed only after the detection of the failure
Nature	Periodic	Triggered
Liveness	Live	Event driven
Adaptability to varying churn rate	No	Yes
Parameter selection	Stabilization Interval can be chosen optimistically, pessimistically or statistically	Failure detection algorithm
Performance	Depends on the choice of the stabilization interval	Driven by the success of the failure detection algorithm used
Resilient to churn attack	Yes	No (make the induced churn attack more severe by congesting the overlay network with maintenance traffic)
Maintenance overhead	High	Low
Safety	Lookup inconsistencies and uncorrected false suspicions can be introduced	Yes
Decentralization	Yes	Yes
Loss of P2P benefits	No	No

5.2 Reactive repair

In contrast to proactive repair, the reactive repair strategies [7, 10, 11] are based on the concept of triggered stabilization in view of sporadic churn behaviour of overlay participants. Thus; maintenance routine will be called only after the detection of node failure. The Table 6 provides the comparison of the proactive and reactive repair strategies. The reactive repair strategies involve two phases: failure detection phase and failure recovery phase. In the recovery phase, the missing entry in the routing table of the peer is replaced by sending a contact request to a new peer. The reactive repair strategies can be classified into two categories based on the execution time of recovery phase: Eager and Lazy maintenance. The eager techniques execute the recovery action as soon as a change is detected. Whereas, the lazy repair strategies delay the implementation of recovery action either up to the failure to deliver a data packet or until all the routing table entries of a node become stale. In this sub-section, we will discuss different reactive maintenance strategies in detail.

5.2.1 Eager repair

The eager repair strategy will trigger the maintenance action as soon as a failure is detected. This strategy is also known as correction-on-change strategy as the maintenance action will be taken as soon as any change in the form of join/leave/failure of a node is detected. The maintenance action comprises of updating of successor and predecessor pointers along with the triggered update of finger table. In Correction-on-change- technique, whenever a change is detected, all nodes that depend on the node where the change occurred are corrected [7].

However, Krishnamurthy et al. [14] highlighted that the eager repair strategy is vulnerable to network congestion as it can create a positive feedback loop and exacerbating existing problems.

5.2.2 Lazy repair

The lazy repair strategy delays the maintenance action, even after the detection of failure of few nodes. This technique tries to remove the vulnerability of eager repair techniques to attack the structured overlay network’s availability by triggering the maintenance action simultaneously over many nodes. The lazy repair strategy is further of two types: Correction on use or correction on failure. These two techniques, correction on use and correction on failure, update the routing tables lazily after detecting the failure of data packet during system use or when all the routing table entries become stale.

In the correction-on-use technique, the failure of the neighbouring node is detected, when a message sent to the neighbour is not acknowledged within the given time. This failure should be tackled by performing finger table updating in the form of piggybacked maintenance messages on the lookup requests. So, the more network usage will result in more accurate finger tables without incurring extra maintenance message overhead. This scheme will not apply any probe based ping mechanism; instead the failure of the node will only be detected on the use of each route for message delivery. This technique is also known as the opportunistic maintenance strategy, as it performs maintenance opportunistically by piggybacking the maintenance messages over application traffic. In 2003, Alima et al. [7] have provided a new infrastructure for overlay networks that eliminate the unnecessary bandwidth consumption for maintenance by correcting routing table entries on-the-fly while performing lookups. As an example, Epichord [11] have used the opportunistic routing table maintenance. The main opportunistic aspect of Epichord’s routing table maintenance algorithm is the addition of new neighbours to the routing table only upon receiving the lookup request from them and removing peers after considering them dead. In 2005, Li et al. have proposed a new routing protocol Accordion [46], which adjusts its routing table size according to node’s bandwidth limit and network churn rates. The adjustment of the routing table size is done using opportunistic aspect, as each node will learn about its neighbours based on the ordinary lookup messages. Each Accordion node will keep only those neighbours in its routing tables from which it has recently been heard from and a new neighbour will only be added upon receiving a lookup request only.

In the correction-on-failure maintenance technique, the maintenance technique relies on the high degree of redundancy in the routing tables. Thus, instead of repairing it immediately or opportunistically, this scheme will perform maintenance only when the routing fails or when all the routing table entries of a node becomes stale and further routing is not possible. This scheme is very suitable for the networks, where the nodes will only be temporarily offline and will reappear after some time.

5.3 Proactive vs. reactive repair

A good maintenance strategy should be adaptable to the varying levels of churn with minimum overhead. Performing periodic maintenance, when it is not needed leads to increased overhead and reactive maintenance occurs only after causing routing inconsistencies. Although, reactive maintenance has minimum overhead and triggered immediately after the detection of the failure, but this reactive maintenance strategy can bring down the network instead of repairing it by congesting the network with maintenance traffic or can easily be exploited by a malicious attacker.

The main driving factor behind the implementation of the predictive stabilization is to provide affordable, preventive maintenance for the structured overlay networks. For predictive maintenance, researchers [75, 76] have proposed two main data driven approaches for prediction: Hidden Markov Model (HMM) and Neuro-Fuzzy, and compare their performance to forecast the failure of neighbouring nodes in order to schedule the next stabilization routine. The HMM is a statistical model, which finds the relationship between different variables to predict the failure of nodes. Whereas, Neuro-Fuzzy is an artificial intelligence based models that learns from the data instead of using explicitly programmed instructions.

The Table 6 concludes that, both periodic and reactive maintenance approaches have drawbacks that make them unsuitable in certain situations. Performing periodic maintenance, when it is not needed leads to increased overhead and reactive maintenance occurs only after causing routing inconsistencies. Thus, recently researchers have [75, 76] have proposed a new class of maintenance mechanism: Predictive Maintenance. The predictive maintenance strategy comes into play after predicting the failure of neighbouring nodes and before causing any routing inconsistency unlike reactive stabilization. The ability to predict or estimate the node failure enables to perform preventive maintenance on abnormal nodes instead of all nodes and results in affordable system operation by providing triggered maintenance.

6. Conclusion and future directions

In this paper, we have analyzed the problem of churn in the decentralized structured overlay networks along with the detailed study of the existing churn handling mechanism by categorizing them according to their underlying principle of handling churn: Churn resilience and Dynamic Repair. We have also explored the issues involved in designing and implementing efficient maintenance techniques to improve the performance and reliability of structured overlay networks. These discussions lead to some interesting conclusions. First, although static resilience is an important first step while choosing structured overlay network’s ability to handle node failures and an important design parameter, but it does not provide the clear picture of system resilience unde continuous churn and repair mechanisms. The main function of static resilience is to determine the desirable quickness of dynamic repair algorithm to ensure the trouble free operation of the overlay networks. Second, the main purpose of the dynamic resilience is to enhance the resilience of the system under node failures by the appropriate selection of design parameters. This can be achieved by tuning different overlay design parameters under different work loads and network conditions to determine the competitive performance of overlay network architecture. The dynamic resilience provides the clear idea of system resilience under continuous churn and repair mechanism. However, both, the static and dynamic resilience of structured overlay networks deals with their ability to survive the churn behaviour and do not accommodate it by taking any repair action. Third, the dynamic repair strategies can accommodate the churn behaviour of overlay nodes by repopulating the routing tables of live nodes with updated neighbour list. This repair action can be scheduled proactively or reactively for structural integrity maintenance. However, the cost of proactive repair and vulnerability of reactive repair strategies to churn attack highlight the need of predictive repair mechanism for the affordable and preventive maintenance of structured overlay networks. Fourth, the open nature of the overlay networks makes it vulnerable to malicious attackers, who can easily exploit the churn handling mechanisms to attack the availability of the structured overlay networks. So, appropriate measures should be taken to propose churn handling mechanisms robust to such attacks. The research issues discussed in this paper highlights some issues that should be considered in the further research. First, the heterogeneity and mobility properties of the overlay nodes results in a complex operating environment for overlay applications. Thus; more sophisticated behaviour studies of the overlay nodes are required for understanding and modelling of churn behaviour. These behaviour studies will form an important basis for the design of churn handling mechanisms. Second, the overlay network applications involve a large number of geographically distributed overlay hosts with different time zones. Depending on these time zones, different overlay nodes will have different business hours and their online behaviour may vary accordingly and results in a different churn patterns. Thus, researchers can utilize this diversity for the churn handling and replica placement for the development of more fault-tolerant overlay architectures. Third, Churn is an attractive tool for an adversary to destabilize the structured overlay network. Researchers have been aware of this attack since quite a while and various solutions have been presented to thwart this attack, but until recently no solution can provably cope with this attack without compromising the openness of the overlay network. Fourth, the maintenance protocol of structured overlay networks itself is attackable. For example, the reactive maintenance strategy can be exploited to bring down the network instead of repairing it by congesting the network with maintenance traffic by generating a false detection. Similarly, the maintenance protocol can also be exploited to launch an eclipse attack against the overlay network by routing table poisoning. Thus, the future research work should concentrate on the development of churn aware routing table updation for the development of efficient structured overlay networks and appropriate measures should be taken to secure the overlay network against malicious attackers without compromising its open nature.

Footnotes

Authors’ Bios

Ramanpreet Kaur received a Bachelor of Engineering degree in Computer Science from SBSCET, Ferozpur in 2007 and M.Tech in information security from National Institute of Technology, Jalandhar in 2009. She has submitted her Ph.D in the Department of CSE, NIT Jalandhar in network security. She was previously associated with Punjab Technical University, Jalandhar as a Lecturer. She is currently working as an Assistant Professor in the Department of CSE at Jaypee University of Information Technology, Waknaghat. Her research interests are P2P networks, Maintenance & security, Network Security and DDoS Attacks.

A.L. Sangal is professor in the Department of Computer Science and Engineering, at Dr B.R. Ambedkar National Institute of Technology Jalandhar (Punjab), India since 2008. He obtained his B.Tech Degree from Panjab Engineering College Chandigarh, India in the year 1985 and Master Degree in the year 1991 from Thapar Institute of Engineering & Technology Patiala (Punjab), India and Ph.D. in the field of Scientific Computing from Dr B.R. Ambedkar National Institute of Technology Jalandhar (Punjab), India in 2008. He has served as a faculty at Thapar Institute of Engineering & Technology Patiala and Dr B.R. Ambedkar National Institute of Technology Jalandhar and served as Head of the Department of Computer Science and Engineering since 1992–2006. He has published many research papers in the field of Scientific Computing and Wireless Networks in International Journal and Conferences and presently guiding six Research Scholars of Ph.D/ M.Tech.

Krishan Kumar received B.Tech (1995) in Computer Science & Engineering from National Institute of Technology NIT, Hamirpur, Himachal Pradesh, M.S (2001) in Computer Science and Engineering from Birla Institute of Technology and Science BITS, Pilani, Rajasthan and Ph.D in “Protection from Distributed Denial-of-Service DDoS attacks in ISP Domain” from Department of Electronics & Computer Engineering, Indian Institute of Technology (IIT), Roorkee (2008). His research area Intrusion Detection, Protection from Internet Attacks, Web performance, Network architecture/protocols, and Network measurement/modeling.

References

Zhao

B.Y.

Kubiatowicz

and Joseph

A.D.

, Tapestry: An infrastructure for fault-tolerant wide-area location and routing, Technical Report UCB//CSD-01-1141, U C Berkeley (2001), 1–28.

Kim

, Spectrum Future, Accessed 4 April 2017, https://spectrumfutures.org/5g-requires-virtualization-of-network-core-and-edge/.

Zhao

B.Y.

Kubiatowicz

and Joseph

A.D.

, Tapestry: An infrastructure for fault-tolerant wide-area location and routing, U C Berkeley, Technical Report (2001), 1–28.

Rowstron

and Druschel

, Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems, in: IFIP/ACM International Conference on Distributed Systems Platforms and Open Distributed Processing, Springer (2001), 329–350.

Stoica

Morris

Liben-Nowell

Karger

D.R.

Kaashoek

M.F.

Dabek

and Balakrishnan

, Chord: A scalable peer-to-peer lookup protocol for internet applications, IEEE Transactions on Networking (TON) 11(1) (2003), 17–32.

Kaashoek

M.F.

and Karger

D.R.

, Koorde: A simple degree-optimal distributed hash table, in: International Workshop on Peer-to-Peer Systems, Springer (2003), 98–107.

Alima

L.O.

El-Ansary

Brand

and Haridi

, DKS (N, k, f): A family of low communication, scalable and fault-tolerant infrastructures for P2P applications, in: 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, IEEE (2003), 344–350.

Aberer

Cudré-Mauroux

Datta

Despotovic

Hauswirth

Punceva

and Schmidt

, P-Grid: A self-organizing structured P2P system, ACM SIGMOD Record 32(3) (2003), 29–33.

Castro

Druschel

Y.C.

and Rowstron

, Topology-aware routing in structured peer-to-peer overlay networks, in: Future Directions in Distributed Computing, Springer (2003), 103–107.

10.

Ghodsi

Alima

L.O.

and Haridi

, Low-bandwidth topology maintenance for robustness in structured overlay networks, in: Proceedings of the IEEE 38th Annual Hawaii International Conference on System Sciences, IEEE (2005), 302a-312a.

11.

Leong

Liskov

and Demaine

E.D.

, Epichord: Parallelizing the chord lookup algorithm with reactive routing state management, Computer Communications 29(9) (2006), 1243–1259.

12.

Porter

Taiani

and Coulson

, Generalized repair for overlay networks, in: 25th IEEE Symposium on Reliable Distributed Systems, IEEE (2006), 132–142.

13.

Ghinita

and Teo

Y.M.

, An adaptive stabilization framework for distributed hash tables, in: IEEE 20th International Parallel and Distributed Processing Symposium, IEEE (2006), 10–20.

14.

Krishnamurthy

El-Ansary

Aurell

and Haridi

, Comparing maintenance strategies for overlays, in: The 16th IEEE Euro Micro Conference on Parallel, Distributed and Network-Based Processing, IEEE (2008), 473–482.

15.

Kurian

and Sarac

, A survey on the design, applications, and enhancements of application-layer overlay networks, ACM Computing Surveys (CSUR) 43(1) (2010), 5–39.

16.

Lua

E.K.

Crowcroft

Pias

Sharma

and Lim

, A survey and comparison of peer-to-peer overlay network schemes, IEEE Communications Surveys & Tutorials 7(2) (2005), 72–93.

17.

Malatras

, State-of-the-art survey on P2P overlay networks in pervasive computing environments, Journal of Network and Computer Applications 55(1) (2015), 1–23.

18.

Wallach

D.S.

, A survey of peer-to-peer security issues, in: Software Security-Theories and Systems, Springer (2003), 42–57.

19.

Saroliya

and Shrivastava

, Security problems and their upshots in routing protocols of DHT based overlay networks, Journal of Theoretical and Applied Information Technology 1(1) (2005), 19–23.

20.

Ding

Balasingham

and Bouvry

, Management of overlay networks: A survey, in: Third International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, IEEE (2009), 249–255.

21.

Pathan

A.M.K.

and Buyya

, A taxonomy and survey of content delivery networks, Grid Computing and Distributed Systems Laboratory, University of Melbourne, Technical Report (2007), 4–48.

22.

Pallis

and Vakali

, Insight and perspectives for content delivery networks, in: Communications of the ACM 49(1) (2006), 101–106.

23.

Liu

Guo

and Liang

, A survey on peer-to-peer video streaming systems, Peer-to-peer Networking and Applications 1(1) (2010), 18–28.

24.

Chu

J.C.

Labonte

K.S.

and Levine

B.N.

, Availability and locality measurements of peer-to-peer file systems, in: ITCom: The Convergence of Information Technologies and Communications (2002), 310–321.

25.

Bhagwan

Savage

and Voelker

, Understanding availability, in: International Workshop on Peer-to-Peer Systems II, Springer (2003), 256–267.

26.

Douceur

J.R.

, Is remote host availability governed by a universal law? in: ACM SIGMETRICS Performance Evaluation Review 31(3) (2003), 25–29.

27.

Castro

Costa

and Rowstron

, Performance and dependability of structured peer-to-peer overlays, in: IEEE International Conference on Dependable Systems and Networks, IEEE (2004), 9–18.

28.

Guha

Daswani

and Jain

, An experimental study of the Skype peer-to-peer voip system, in: Proceedings of the 5th International Workshop on Peer-to-Peer Systems (IPTPS ’06), ACM (2006), 1–6.

29.

Mickens

J.W.

and Noble

B.D.

, Exploiting availability prediction in distributed systems, in: Proceedings of the 3rd conference on Networked Systems Design & Implementation, ACM (2006), 6–6.

30.

Javadi

Kondo

Vincent

J.M.

and Anderson

D.P.

, Mining for availability models in large-scale distributed systems: A case study of seti@ home, in: IEEE International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, IEEE (2009), 276–283.

31.

Linga

Gupta

and Birman

, A churn-resistant peer-to-peer web caching system, in: Proceedings of the ACM workshop on Survivable and Self-regenerative Systems: In association with the 10th ACM Conference on Computer and Communications Security, ACM (2003), 1–10.

32.

Saroiu

Gummadi

Krishna

and Steven

, Measurement study of peer-to-peer file sharing systems, in: Electronic Imaging International Society for Optics and Photonics 4673(1) (2002), 156–171.

33.

Stutzbach

and Rejaie

, Understanding churn in peer-to-peer networks, in: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, ACM (2006), 189–202.

34.

Le Blond

Fessant

F.L.

and Merrer

E.L.

, Finding good partners in availability-aware P2P Networks, Anglais, Technical Report (2009), 472–484.

35.

Pace

Quema

and Schiavoni

, Exploiting node connection regularity for DHT replication, in: IEEE Symposium on Reliable Distributed Systems, IEEE (2011), 111–120.

36.

Song

Kim

and Seo

, Replica placement algorithm for highly available peer-to-peer storage systems, in: Proceedings of the First International Conference on Advances in P2P Systems (AP2PS), ACM (2009), 160–167.

37.

Monnerat

L.R.

and Amorim

C.L.

, D1HT: A distributed one hop hash table, in: 20th IEEE International Parallel and Distributed Processing Symposium, IEEE (2006), 1–10.

38.

Ratnasamy

Francis

Handley

Karp

and Shenker

, A scalable content-addressable network, in: Proceedings of the ACM SIGCOMM Computer Communication Review 31(4) (2001), 161–172.

39.

Gummadi

Gribble

Ratnasamy

Shenker

and Stoica

, The impact of DHT routing geometry on resilience and proximity, in: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM (2003), 381–394.

40.

Loguinov

Kumar

Rai

and Ganesh

, Graph-theoretic analysis of structured peer-to-peer systems: Routing distances and fault resilience, in: Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM (2003), 395–406.

41.

Maymounkov

and Mazieres

, Kademlia: A peer-to-peer information system based on the XOR metric, in: International Workshop on Peer-to-Peer Systems, Springer (2002), 53–65.

42.

Chun

B.G.

Zhao

B.Y.

and Kubiatowicz

J.D.

, Impact of neighbour selection on performance and resilience of structured P2P networks, in: International Workshop on Peer-to-Peer Systems, Springer (2005), 264–274.

43.

Manku

G.S.

Naor

and Wieder

, Know the neighbour’s neighbour: The power of look ahead in randomized P2P networks, in: Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing, ACM (2004), 54–63.

44.

Godfrey

Shenker

and Stoica

, Minimizing churn in distributed systems, in: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM (2006), 147–158.

45.

Liben-Nowell

Balakrishnan

and Karger

, Analysis of the evolution of peer-to-peer systems, in: Proceedings of the Twenty-first Annual Symposium on Principles of Distributed Computing, ACM (2002), 233–242.

46.

Stribling

Morris

and Kaashoek

M.F.

, Bandwidth-efficient management of DHT routing tables, in: Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation, ACM (2005), 99–114.

47.

Zhang

Goel

and Govindan

, Incrementally improving lookup latency in distributed hash table systems, in: ACM SIGMETRICS Performance Evaluation Review 31(1) (2003), 114–125.

48.

Meng

Chen

and Ding

, Using the complementary nature of node joining and leaving to handle churn problem in P2P networks, Computers & Electrical Engineering 39(2) (2013), 326–337.

49.

Tian

and Ng

K.W.

, An analytical study on optimizing the lookup performance of distributed hash table systems under churn, Concurrency and Computation: Practice and Experience 19(4) (2007), 543–569.

50.

Hildrum

Kubiatowicz

J.D.

Rao

and Zhao

B.Y.

, Distributed object location in a dynamic network, in: Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures, ACM (2002), 41–52.

51.

Rhea

Geels

Roscoe

and Kubiatowicz

, Handling churn in a DHT, in: Proceedings of the USENIX Annual Technical Conference (2004), 127–140.

52.

Saia

Fiat

Gribble

S.D.

Karlin

A.R.

and Saroiu

, Dynamically fault-tolerant content addressable networks, in: Peer to Peer Systems 1 (2002), 270–279.

53.

Bustamante

F.E.

and Qiao

, Designing less-structured P2P systems for the expected high churn, IEEE/ACM Transactions on Networking (TON) 16(3) (2008), 617–627.

54.

Sit

and Morris

, Security considerations for peer-to-peer distributed hash tables, in: International Workshop on Peer-to-Peer Systems, Springer (2002), 261–269.

55.

Srivatsa

and Liu

, Vulnerabilities and security threats in structured overlay networks: A quantitative analysis, in: 20th Annual Computer Security Applications Conference, IEEE (2004), 252–261.

56.

Pretre

, Attacks on peer-to-peer networks, Department of Computer Science, Swiss Federal Institute of Technology (ETH) Zurich, 2005.

57.

Kwok

Y.K.

, Autonomic peer-to-peer systems: Incentive and security issues, in: Autonomic Computing and Networking Denko

M.K.

Yang

L.T.

and Zhang

, eds, Springer, 2009, pp. 205–236.

58.

Kuhn

Schmid

and Wattenhofer

, Towards worst-case churn resistant peer-to-peer systems, in: Distributed Computing 22(4) (2010), 249–267.

59.

Misra

and Plaxton

C.G.

, Active and concurrent topology maintenance, in: International Symposium on Distributed Computing, Springer (2004), 320–334.

60.

Dick

M.E.

Pacitti

Akbarinia

and Kemme

, Building a peer-to-peer content distribution network with high performance, scalability and robustness, in: Information Systems 36(2) (2011), 222–247.

61.

Yang

B.B.

and Garcia-Molina

, Designing a super-peer network, in: Proceedings of the 19th IEEE International Conference on Data Engineering, IEEE (2003), 49–60.

62.

Garces-Erice

Biersack

E.W.

Ross

K.W.

Felber

P.A.

and Urvoy-Keller

, Hierarchical peer-to-peer systems, Parallel Processing Letters 13(4) (2003), 643–657.

63.

Nejdl

Wolpers

Siberski

Schmitz

Schlosser

Brunkhorst

and Löser

, Super-peer-based routing and clustering strategies for RDF-based peer-to-peer networks, in: Proceedings of the 12th International Conference on World Wide Web, ACM (2003), 536–543.

64.

Sacha

Dowling

Cunningham

and Meier

, Discovery of stable peers in a self-organising peer-to-peer gradient topology, in: IFIP International Conference on Distributed Applications and Interoperable Systems, Springer (2006), 70–83.

65.

Zöls

Hofstätter

Despotovic

and Kellerer

, Achieving and maintaining cost-optimal operation of a hierarchical DHT system, in: Proceedings of the IEEE International Conference on Communications, IEEE (2009), 2194–2199.

66.

Trifa

and Khemakhem

, A novel replication technique to attenuate churn effects, in: Journal of Peer-to-Peer Networking and Applications 9(2) (2016), 344–355.

67.

Korzun

and Gurtov

, Hierarchical architectures in structured peer-to-peer overlay networks, in: Journal of Peer-to-Peer Networking and Applications 7(4) (2014), 359–395.

68.

Koskela

Harjula

Kassinen

and Ylianttila

, Robustness of a P2P community management system based on two-level hierarchical DHT overlays, in: IEEE Symposium on Computers and Communications (ISCC), IEEE (2011), 881–886.

69.

Mariela

Rocamora

Rhodette

and Pedrasa

, Evaluation of hierarchical DHTs to mitigate churn effects in mobile networks, Computer Communications 85(1) (2016), 41–57.

70.

Ganesan

Gummadi

and Garcia-Molina

, Canon in G major: Designing DHTs with hierarchical structure, in: Proceedings of the 24th International Conference on Distributed Computing Systems, IEEE (2004), 263–272.

71.

Freedman

M.J.

Lakshminarayanan

and Stoica

, Non-transitive connectivity and DHTs, in: WORLDS 5(1) (2005), 10–10.

72.

Ghodsi

, Distributed k-ary system: Algorithms for distributed hash tables, Doctoral Dissertation, The Royal Institute of Technology, 2006.

73.

Shafaat

T.M.

, Partition tolerance and data consistency in structured overlay networks, Doctoral Dissertation, KTH, School of Information and Communication Technology, 2013.

74.

Paul

R.R.

Roy

P.V.

and Vlassov

, Interaction between network partitioning and churn in a self-healing structured overlay network, in: 21st International Conference on Parallel and Distributed Systems (ICPADS), IEEE (2015), 232–241.

75.

Kaur

Sangal

A.L.

and Kumar

, Modelling and simulation of adaptive neuro-fuzzy based intelligent system for predictive stabilization in structured overlay networks, Engineering Science and Technology, An International Journal 20(1) (2017), 310–320.

76.

Kaur

Sangal

A.L.

and Kumar

, Performance analysis of predictive stabilization for churn handling in structured overlay networks, in: Proceedings of the International Conference on Big Data and Advanced Wireless Technologies, ACM (2016), 39–45.

77.

Kaur

Sangal

A.L.

and Kumar

, A persistent structured hierarchical overlay network to counter intentional churn attack, in: Journal of Computer Networks and Communications (2016), 1–15.

Churn handling strategies for structured overlay networks: A survey

Abstract

Keywords

1. Introduction

2. Basic concepts

2.1 Churn

2.3 Churn patterns

2.4 Need of churn handling

2.5 Design issues of a churn handling mechanism

2.6 Requirements of a practical churn handling mechanism

4. Churn resilience strategies

4.1 Resilience to legitimate node failures

4.1.1 Static resilience

Table 1 Summary of the state of art structured overlay network geometries

Table 3 Taxonomy of notable structured overlay networks

5. Dynamic repair strategies

Table 4 Comparison of performance of different overlay architectures

5.1.1 Fixed interval stabilization

5.1.2 Adaptive stabilization

Table 6 Comparison of proactive and reactive repair strategies

5.2.1 Eager repair

5.2.2 Lazy repair

5.3 Proactive vs. reactive repair

6. Conclusion and future directions

Footnotes

Authors’ Bios

References

Table 1
Summary of the state of art structured overlay network geometries

Table 3
Taxonomy of notable structured overlay networks

Table 4
Comparison of performance of different overlay architectures

Table 6
Comparison of proactive and reactive repair strategies