An optimal storage and repair mechanism for Group Repair Code in a distributed storage environment

Abstract

This paper aims to reduce the storage space required in a distributed storage environment while providing optimal repair bandwidth when a node fails. Previous literature suggests various approaches, such as replication, erasure codes, local reconstruction codes and regenerating codes, to recover from system failure. These approaches are applied to archival storage, cloud storage and similar systems to provide data availability and reliability. Although these approaches have proved efficient, each has its own strengths and weaknesses: some improve storage efficiency, while others focus on providing an effective repair mechanism. In this paper, we present a new approach, Group Repair Codes (GRC), which provides optimal repair bandwidth by replicating the data nodes and calculating parity nodes for smaller groups. In comparison to the approaches that provide optimal repair (hybrid and double coding), it uses less storage space. Moreover, it improves fault tolerance, disk reads and the amount of data transferred by the system when nodes fail. The study considers various existing approaches, namely replication, erasure codes, LRC, hybrid coding and double coding, that have been implemented to manage big data. The results reported in the paper support the suitability of our approach. We also discuss the significance of intelligent systems for the present study, and we intend to propose an intelligent system for Group Repair Codes in the near future. We believe that our research will be beneficial for several communities, such as cloud storage, big data and distributed storage.

Keywords

Network coding, storage, repair bandwidth, distributed storage system, artificial intelligence, replication, erasure coding, fault tolerance, double coding, hybrid coding

1. Introduction

Today we are surrounded by social networking, digitization, e-commerce, data analytics and many other services that are available online. The mechanisms used to share this information generate an enormous amount of data, and researchers have suggested different methods to store and manage it. The data is stored in a distributed fashion with the goal of providing 100% availability and preventing data loss due to permanent or temporary failures caused by natural or man-made disasters. Peer-to-peer systems and distributed and cloud storage systems implement various coding methods to store massive amounts of data [1]. Popular real-world systems include OceanStore [2], Windows Azure [3], Amazon, Google File System [4], HDFS-RAID Xorbas [5] and NCCloud [6].

The simplest method that can be applied in distributed storage is replication. Replication stores the same data at different locations. Maintaining multiple copies/replica at different nodes makes data readily available to the users by connecting to the nearest location where the data is stored. In case any node gets damaged, its replica stored at some other location can serve the users’ requirements and the lost data of the failed node can be recovered from any of its replica nodes [7, 8]. Though it is easier to maintain, the biggest drawback of this scenario is that it requires a lot of space because the same data has to be stored multiple times.

To address the storage issue discussed above, network coding [9] has been used. It divides the original data into smaller units, applies a code to them to obtain a few redundant units, and then stores the encoded data (both the original fragments and the redundant units) on different nodes. To obtain the original data, it has to be collected from multiple nodes and then reconstructed. If any node gets damaged, the remaining nodes are used to recover the data of the failed node. Although network coding reduces the storage requirements considerably, it introduces other significant considerations, such as 1) encoding and decoding complexity, 2) the number of nodes accessed to reconstruct the original data, 3) the number of nodes required to recover the data of failed nodes, 4) the number of faults that the system can tolerate simultaneously, 5) the amount of data read and transferred by each node for reconstruction or repair, and 6) the bandwidth required to repair a failed node.

Erasure codes [7, 10], a type of network coding technique, are efficient in terms of storage. Reed-Solomon erasure codes use the minimum storage space, but they do not deal efficiently with the other issues. Subsequent research aimed to improve the other parameters; as a result, local reconstruction codes [3, 5] were introduced to reduce the number of nodes accessed for repair and the repair bandwidth, at the cost of slightly more storage. A new class of codes introduced by Dimakis, called regenerating codes [11, 12], is available in two flavors: minimum storage and minimum bandwidth. That work explains the use of functional repair to obtain the minimum storage and minimum repair bandwidth values. Other methods of storing data are hybrid coding [13] and double coding [14, 15], which combine replication and erasure codes. These codes have optimal repair bandwidth and better fault tolerance, node access, data read and data transfer, but their storage requirement is almost double that of erasure codes.

The problem identified in all these methods is the trade-off between storage and repair [16], which are the two main parameters of concern. Erasure codes have minimum storage requirements but did not improve upon repair bandwidth. Other mentioned codes improved repair bandwidth by slightly increasing the storage requirement. In simple words, if the cost of repair is reduced, the storage is increased and if the cost of storing the data is reduced, the cost of repair is increased.

Motivated by this, we propose an algorithm whose objective is to provide optimal repair bandwidth, as in hybrid and double codes, while reducing the amount of storage used by these systems. The main contributions of our approach are: 1) optimal repair bandwidth on failure of a node, 2) reduced storage space required by the system, 3) a large number of faults tolerated simultaneously, and 4) fewer nodes accessed for repair, resulting in 5) reduced disk reads and data transfer.

The proposed GRC method can be applied to big data centers, cloud storage and distributed storage systems. The encoding algorithm used in Group Repair Codes (GRC) and its decoding mechanism are discussed in this paper. A comparison with existing literature and a quantitative analysis show GRC to be efficient for the parameters mentioned above. Further, the GRC algorithm can be combined with machine learning techniques to identify confidential or sensitive data so that it can be protected from public access and given higher security. Also, storing the data intelligently adds value to the data and may provide support for data analytics and efficient data retrieval.

The remainder of this paper is structured as follows: the relevant background and related work by researchers is discussed in Section 2. The proposed work is explained in Section 3 along with the algorithm used for encoding in Group Repair Code. This section also gives the decoding mechanisms and presents the main outcomes of our research. In Section 4, we analyze the performance of the proposed approach by comparing it with existing approaches. The result of quantitative analysis proves GRC to be efficient on multiple parameters out of which reducing the storage is our major objective. Section 5 gives the overview of how artificial intelligence can be combined with data storage and benefit us. Finally, we conclude the paper in Section 6 with future research directions.

2. Related work

This section highlights the related research. Network coding has been applied in various fields such as network communication, satellite communication, wireless sensor networks and distributed storage, to name a few. Our focus here is distributed storage systems, in which data is stored in big data centers to provide availability and reliability. Different approaches have been applied to store the data. The simplest of these is replication [7, 8], which maintains copies of the same data at different locations. Users can access their data from the nearest location. In case any node gets damaged, its data can be recovered from other nodes where the same data is stored. Replication has been successfully used in Amazon Dynamo, Google File System [4], Facebook's Cassandra and other file storage systems. Maintaining only two replicas may result in data loss, because the only surviving replica may itself be damaged while the data of the failed node is being recovered from it. So, considering the mean time between failures of a node and the time required to recover the data, the minimum number of replicas required for reliable storage is three. It has also been reported on a PlanetLab trace that maintaining three replicas prevents data loss [8]. Triple replication is thus widely used, but it triples the storage requirement: to store $M$ bytes of data three copies are maintained, using a total of $3M$ bytes. For recovery, only one node containing a replica is accessed and the complete data is transferred from the replica node to the recovery node. The bandwidth used for repair is thus $M$.

The large storage requirement and repair bandwidth mean that replication consumes a lot of resources. With data volumes predicted to grow from terabytes to petabytes in the near future, replication is not an efficient way to store data, so storage systems have adopted network coding techniques. Erasure codes [7, 10] reduce the storage requirements to a large extent. Popular erasure coding techniques such as Reed-Solomon Codes [17], EVENODD Codes [18], STAR Codes [19], Row Diagonal Parity (RDP) Codes [20], X-Code [21] and Fountain Codes [22] have been used in storage systems like OceanStore [2], Cleversafe, Facebook-HDFS RAID, HDFS [23], HP and IBM [16, 24]. In recent research, erasure codes have also been used for hot storage and caching systems; before that, their use was limited to cloud and archival data [25]. Procodes, a proactive erasure code, has been implemented in cloud storage on Facebook's HDFS RAID; it predicts disk failure prior to its occurrence and so protects the data [26]. Erasure codes divide the data into $k$ smaller fragments, calculate $r$ redundant fragments from them and then store the data on $n=k+r$ nodes. The original data can be reconstructed by decoding any $k$ out of $n$ nodes. Maximum Distance Separable (MDS) erasure codes, such as Reed-Solomon codes, can thus tolerate up to $r$ faults and recover the data using any $k$ surviving nodes. For storing data of size $M$ bytes over $n$ nodes, a total of $n\cdot\frac{M}{k}$ bytes are required, each node storing $\frac{M}{k}$ bytes. Although this method uses the minimum storage space and increases the number of faults that can be tolerated, the bandwidth required to recover the data of a failed node is still $M$. This is because, to recover the data of a single node, $k$ of the surviving nodes each transfer the $\frac{M}{k}$ bytes stored in them, resulting in the same repair bandwidth as replication. In other words, to recover the $\frac{M}{k}$ bytes of a node, the repair bandwidth is $k\cdot\frac{M}{k}=M$, which is very high.
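The storage and repair figures quoted above for triple replication and for an MDS erasure code can be checked with a short calculation. The following is a minimal sketch; the function names `replication_cost` and `erasure_cost` and the object size of 600 bytes are illustrative assumptions, not part of any cited system.

```python
from fractions import Fraction


def replication_cost(M, copies=3):
    """Return (total storage, repair bandwidth) for `copies`-way replication."""
    # One full replica of M bytes is read to rebuild a lost copy.
    return copies * M, M


def erasure_cost(M, k, r):
    """Return (total storage, repair bandwidth) for an MDS code with k data and r redundant fragments."""
    n = k + r
    fragment = Fraction(M, k)  # bytes stored on each node
    # k surviving nodes each send one fragment to rebuild a single lost fragment.
    return n * fragment, k * fragment


M = 600  # hypothetical object size in bytes
storage_rep, repair_rep = replication_cost(M)
storage_ec, repair_ec = erasure_cost(M, k=6, r=4)
print(storage_rep, repair_rep)
print(float(storage_ec), float(repair_ec))
```

With these assumed values the erasure code needs far less storage than replication, yet both move $M$ bytes to repair one node, which is the limitation discussed above.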

The repair bandwidth was reduced in Local Reconstruction Codes (LRC) [3, 5], Locally Repairable Codes [11, 12] and Regenerating Codes [9, 10]. In LRC, the $k$ data fragments are divided into $l$ local groups and a local parity is calculated for each group, resulting in $l$ local parities along with $r$ global parities. At any time the $k$ fragments serve the users' demand for data; in case of failure, the system first tries to recover the data locally using the local parity, and otherwise recovers it globally. For storing data of size $M$ bytes the total storage is $n\cdot\frac{M}{k}$ bytes, where the total number of nodes is $n=k+l+r$, $k$ is the number of data fragments, $l$ is the number of local parities and $r$ is the number of global parities. Clearly, LRC requires slightly more storage space than erasure codes, but to repair a node only $\frac{k}{l}$ nodes within a group are accessed, each transferring $\frac{M}{k}$ bytes, resulting in a repair bandwidth of $\frac{M}{l}$. LRC is not an MDS code, so it does not tolerate arbitrary $n-k$ failures; instead it tolerates any $r+1$ failures and up to a maximum of $r+l$ failures. LRC is applied in cloud storage systems, the XORBAS version of Hadoop and Windows Azure Storage (WAS) [3]. Binary locally repairable codes have been used in HDFS [28, 29].
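As a rough check of the LRC figures above, the sketch below evaluates the total storage $n\cdot\frac{M}{k}$ and the local repair bandwidth $\frac{M}{l}$. The function name `lrc_cost` and the 600-byte object are assumptions made only for illustration.

```python
from fractions import Fraction


def lrc_cost(M, k, l, r):
    """Return (total storage, local repair bandwidth) for LRC with k data, l local and r global parities."""
    n = k + l + r
    fragment = Fraction(M, k)    # bytes stored on each node
    helpers = Fraction(k, l)     # nodes read inside one local group
    return n * fragment, helpers * fragment


storage, local_repair = lrc_cost(600, k=6, l=2, r=2)
print(float(storage), float(local_repair))
```

For these assumed parameters the local repair reads half of the object ($\frac{M}{l}$ with $l=2$) instead of all of it.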

Figure 1.

System model for Group Repair Code (GRC).

Regenerating Codes (RC) [11, 12] are of two major types: Minimum Storage Regeneration (MSR) and Minimum Bandwidth Regeneration (MBR). They divide the data into $k$ fragments, apply functions on each data fragment, calculate redundancies and then store the result on $n$ different nodes. For data of size $M$ bytes each node stores $\geqslant\frac{M}{k}$ bytes. MSR uses the minimum storage of $n\cdot\frac{M}{k}$ bytes at the cost of a higher repair bandwidth, while MBR provides the minimum repair bandwidth by using storage $\geqslant n\cdot\frac{M}{k}$ bytes. The users' data is obtained by connecting to any $k$ out of $n$ nodes. The data of a failed node is recovered by transferring a fraction of the data from each of $d$ surviving nodes, where $k\leqslant d\leqslant n-1$. RC codes have been applied in NCCloud [6], P2P backup systems and archival storage.

The erasure code, LRC and RC approaches mentioned above reduce either storage or repair bandwidth, but the optimal repair bandwidth is obtained with hybrid coding [13] and double coding [14, 15]. These codes combine replication and erasure codes. The hybrid code maintains one original copy of the data along with the erasure-coded data, ending up with a total storage of $M+n\cdot\frac{M}{k}$ bytes. On failure of an erasure-coded node, its data may be recovered from the other erasure-coded nodes or by accessing the original copy. If recovery is done from the original copy, only one node is accessed and only the part of the data that was stored in the damaged node is transferred, thus reducing the repair bandwidth to the optimal value of $\frac{M}{k}$ [13]. Hybrid codes have been applied in DHT [13], OceanStore [2] and P2P systems. Double coding first applies an erasure code to distribute the data over $n$ different nodes and then replicates each node, thus using $2\cdot n$ nodes and $2\cdot n\cdot\frac{M}{k}$ bytes of storage. The storage requirement of the double code is twice that of the erasure code. To recover a failed node, its corresponding replica is accessed, giving a repair bandwidth of $\frac{M}{k}$. If both replicas fail, the data is recovered by the erasure coding method. In both codes replication helps in faster recovery of failed nodes: only one node is accessed and no decoding is done while repairing replica nodes. The major drawback of hybrid and double coding is the increased storage due to replication. We thus need an approach that provides optimal repair bandwidth and at the same time stores the data more efficiently. In this paper we propose an algorithm, Group Repair Codes, which is motivated by the optimal repair of hybrid and double coding.
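The storage overhead of hybrid and double coding relative to a plain erasure code follows directly from the expressions above. The sketch below is illustrative only; `hybrid_cost`, `double_cost` and the sample parameters are assumptions.

```python
from fractions import Fraction


def hybrid_cost(M, k, n):
    """Return (total storage, single-node repair bandwidth) for hybrid coding."""
    fragment = Fraction(M, k)
    # One full copy of M bytes plus n erasure-coded fragments.
    return M + n * fragment, fragment


def double_cost(M, k, n):
    """Return (total storage, single-node repair bandwidth) for double coding."""
    fragment = Fraction(M, k)
    # Every one of the n erasure-coded nodes is stored twice.
    return 2 * n * fragment, fragment


M, k, n = 600, 6, 10  # hypothetical sizes, matching an RS(6,10)-style layout
print([float(v) for v in hybrid_cost(M, k, n)])
print([float(v) for v in double_cost(M, k, n)])
```

Both schemes repair a single node with $\frac{M}{k}$ bytes, but under these assumed parameters they store 1600 and 2000 bytes respectively, against 1000 bytes for the erasure code alone.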

3. Approach adopted: Group Repair Codes

The roots of the proposed work, Group Repair Codes (GRC), lie in replication and erasure codes, with the objective of providing optimal repair bandwidth along with reduced storage. GRC divides the data into fragments that form smaller groups, in which they are further replicated and coded. Before explaining GRC, let us first recall what the Reed-Solomon code RS(6,10) does. It divides the data to be coded into 6 equal fragments, distributes them over 6 data nodes and then computes 4 parity nodes from all these data nodes [17]. The total number of nodes here is 10 and each data node stores one sixth of the data. The original data is obtained by combining the data of any 6 nodes and, in case any node fails, its data is recovered by decoding the data of any 6 of the 10 nodes. We explain the GRC approach using a similar example. In Group Repair Codes, the original data is divided into 6 equal fragments and distributed over 6 nodes. These nodes are divided equally into 2 groups. Within each group one parity node is computed from the data nodes and the data nodes are replicated. Each group thus contains 7 nodes (3 data nodes, 3 replicated data nodes and 1 parity node) and the system with 2 groups has a total of 14 nodes, as shown in Fig. 1.

Here, replication allows a failure to be repaired by accessing a single replica node and directly transferring its data, thus reducing the repair bandwidth. If only two replica nodes are used without any parity node and both replicas fail simultaneously, the data is lost. Using three replica nodes is the same as triple replication, in which the cost of storage is very high. So the GRC approach maintains two copies of the same data along with a parity node to prevent data loss, which reduces the storage requirement to a large extent. In GRC, the replica nodes help in faster recovery and provide the optimal repair bandwidth. The purpose of the parity node is that, if both replica nodes fail simultaneously, the data can be recovered from the remaining nodes and the parity node of that group. In this case too, recovery is done with a comparatively low repair bandwidth because only a few nodes within the group are accessed.

3.1 System model

Consider that the data of size $M$ bytes has to be stored using Group Repair Codes. The notation used in this approach is given in Table 1. The GRC consists of a set of $k$ nodes named as $x_{1},{x}_{2},\ldots,x_{i},\ldots,x_{k}$ , another set of $k$ nodes named ${x}_{1}^{\prime},{x}_{2}^{\prime},\ldots,{x}_{i}^{\prime},\ldots,{x}_{k}^{\prime}$ and $g$ nodes named as $p_{1},p_{2},\ldots,p_{j},\ldots,p_{g}$ . The total number of nodes used in this algorithm is thus $n=2k+g$ where $k$ indicates the number of fragments into which the data is divided and $g$ indicates the number of groups. We choose the value of $k$ and $g$ such that $k$ is a multiple of $g$ .

$\textit{GRC}(k,g)$ first divides the data of size $M$ bytes into $k$ equal-size fragments and then stores each fragment on one of $k$ different nodes $x_{1},{x}_{2},\ldots,x_{i},\ldots,x_{k}$. Each node now has $m=\frac{M}{k}$ bytes of data. These $k$ nodes are grouped into $g$ groups. Within each group, every node is replicated onto another set of $k$ nodes by copying the data of node $x_{i}$ to node ${x}_{i}^{\prime}$ for $i=1$ to $k$, and the data of the parity node is calculated from the data nodes of the group. Each group has $\frac{k}{g}$ original data nodes, $\frac{k}{g}$ replica nodes and one parity node. The system model for GRC is shown in Fig. 1.
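The grouping described in the system model can be made concrete with a small sketch that only enumerates node labels. The helper `grc_layout` and the dictionary keys are hypothetical names chosen for illustration; consecutive indices are assumed to share a group, as in the $k=6$, $g=2$ example.

```python
def grc_layout(k, g):
    """Return one dict per group listing its data, replica and parity node labels."""
    if k % g != 0:
        raise ValueError("k must be a multiple of g")
    size = k // g  # data nodes per group
    groups = []
    for j in range(g):
        ids = range(j * size + 1, (j + 1) * size + 1)
        groups.append({
            "data": [f"x{i}" for i in ids],
            "replica": [f"x{i}'" for i in ids],
            "parity": f"p{j + 1}",
        })
    return groups


layout = grc_layout(6, 2)
for group in layout:
    print(group)
# Total node count n = 2k + g
print(sum(len(gr["data"]) + len(gr["replica"]) + 1 for gr in layout))
```

For GRC(6, 2) this yields two groups of seven nodes each, 14 nodes in total.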

Table 1
Notations used for Group Repair Codes

Notation	Description
$M$	Size (in bytes) of original data to be encoded and stored
$k$	Number of fragments in which original data is divided
$g$	Number of smaller groups
$n$	Total number of nodes after encoding
$x_{i}$	Data nodes having part of original data $(i=1\ldots k)$
$x^{\prime}_{i}$	Data nodes having replicated data $(i=1\ldots k)$
$p_{j}$	Parity node $(j=1\ldots g)$

3.2 Encoding in Group Repair Code

We now introduce the algorithm used for encoding data in Group Repair Code. In order to build the entire system encoding is done using the following algorithm:

Encoding Algorithm: Group Repair Code, GRC$(k,g)$

Consider the data of size $M$ bytes stored on node $X$, which has to be coded and stored at various locations to provide availability and reliability of the data. The total number of nodes required for GRC is $n=2k+g$. The nodes are labeled as $x_{1},x_{2},\ldots,x_{i},\ldots,x_{k},x^{\prime}_{1},x^{\prime}_{2},\ldots,x^{\prime}_{i},\ldots,x^{\prime}_{k},p_{1},p_{2},\ldots,p_{g}$, where $k$ is the number of nodes into which the original data is divided and $g$ is the number of groups formed.

1. Determine the values of $g$ and $k$ depending on the size of the data such that $k$ is a multiple of $g$ (for example, for $g=2$, the possible values of $k$ can be 6, 8 or 10), so that each group has $\frac{k}{g}$ fragments of data.
2. Divide the data of node $X$ into $k$ equal-size blocks such that each block has $m=\frac{M}{k}$ bytes of data.

// The $k$ nodes labeled $x_{1},x_{2},\ldots,x_{i},\ldots,x_{k}$ contain the first replica, the next set of $k$ nodes labeled $x^{\prime}_{1},x^{\prime}_{2},\ldots,x^{\prime}_{i},\ldots,x^{\prime}_{k}$ contain the second replica of the data, and the $g$ nodes labeled $p_{1},p_{2},\ldots,p_{j},\ldots,p_{g}$ contain parity data. So the total number of nodes is $n=2k+g$.
3. Initially all nodes are empty: data of nodes $[x_{1},x_{2},\ldots,x_{i},\ldots,x_{k},x^{\prime}_{1},x^{\prime}_{2},\ldots,x^{\prime}_{i},\ldots,x^{\prime}_{k},p_{1},p_{2},\ldots,p_{j},\ldots,p_{g}]=0$.
4. $i=1$, $count=0$, $j=1$, $ptr=$ first address of the storage node of size $M$ bytes.
5. Do
a. $count++$
b. data $x_{i}\left[1..m\right]=$ data ${x^{\prime}}_{i}\left[1..m\right]=$ data $X[ptr..ptr+\left(m-1\right)]$ // copy and replicate the data of size $m$ bytes from node $X$ into nodes $x_{i}$ and $x^{\prime}_{i}$
c. $ptr=ptr+m$ // ptr points to the beginning of the next block
d. data $p_{j}=$ data $p_{j}\oplus$ data $x_{i}$ // calculating the data of the parity node
e. $i++$
f. If $\left(count==k/g\right)$ then $j++$ and $count=0$
6. while $(i\leqslant k)$
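The encoding steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the authors' implementation: the function name `grc_encode`, the use of in-memory `bytes` objects to stand in for storage nodes, and the requirement that the data length be a multiple of $k$ are all assumptions.

```python
def grc_encode(data: bytes, k: int, g: int):
    """Split data into k blocks, replicate each block, and XOR each group into a parity block."""
    if k % g != 0:
        raise ValueError("k must be a multiple of g")
    if len(data) % k != 0:
        raise ValueError("data length must be a multiple of k (pad beforehand)")
    m = len(data) // k   # bytes per node, m = M / k
    per_group = k // g   # data nodes per group

    # Steps 5b and 5c: cut the data into k blocks and replicate each one.
    x = [data[i * m:(i + 1) * m] for i in range(k)]
    x_prime = list(x)    # bytes objects are immutable, so sharing them is safe

    # Step 5d: the parity of a group is the XOR of its data blocks.
    p = []
    for j in range(g):
        parity = bytearray(m)  # starts as all zeros (step 3)
        for block in x[j * per_group:(j + 1) * per_group]:
            for b, value in enumerate(block):
                parity[b] ^= value
        p.append(bytes(parity))
    return x, x_prime, p


x, x_prime, p = grc_encode(bytes(range(12)), k=6, g=2)
print(len(x) + len(x_prime) + len(p))  # n = 2k + g nodes
```

The sketch replaces the single do-while loop with one pass per group, which produces the same blocks and parities while avoiding the explicit `count` and `ptr` bookkeeping.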

We describe the encoding algorithm with $k=6$ and $g=2$. The original data of size $M$ bytes stored in node $X$ is divided into $k=6$ fragments, each having $\frac{M}{k}$ bytes of data, and distributed over the nodes $x_{1},x_{2},x_{3},x_{4},x_{5}$ and $x_{6}$. These nodes form $g=2$ groups in which the data nodes are replicated and one parity node per group is calculated. The first group has seven nodes in total, of which three nodes $x_{1},x_{2},x_{3}$ hold original data, three nodes $x^{\prime}_{1},x^{\prime}_{2},x^{\prime}_{3}$ hold replicas and node $p_{1}$ is a parity node. Similarly, the other group comprises three original nodes, $x_{4},x_{5},x_{6}$, their replica nodes, $x^{\prime}_{4},x^{\prime}_{5},x^{\prime}_{6}$, and a parity node $p_{2}$. Here, $x^{\prime}_{1},x^{\prime}_{2},x^{\prime}_{3},x^{\prime}_{4},x^{\prime}_{5}$ and $x^{\prime}_{6}$ are exact copies of their corresponding nodes $x_{1},x_{2},x_{3},x_{4},x_{5}$ and $x_{6}$ respectively. So if any of these nodes fails, its data can be recovered using its corresponding copy. In case both copies are damaged, the parity node comes into the picture. For instance, if node $x^{\prime}_{2}$ fails then its data can be recovered from its corresponding replica node $x_{2}$ and vice versa. But if both $x_{2}$ and $x^{\prime}_{2}$ fail, then one of them, say $x_{2}$, is recovered using the data nodes $x_{1},x_{3}$ and the parity node $p_{1}$ of its group. The data of node $x_{2}$ is then replicated to node $x^{\prime}_{2}$ for its recovery.

3.3 Decoding in Group Repair Codes

The data that is encoded and stored using the GRC encoding algorithm needs to be decoded in two cases. First, decoding for reconstruction is done when the user requests the data. For this, $k$ nodes, each holding a different fragment of the data, transfer their data to the user, where it is combined to obtain the original data. Second, decoding for regeneration is done when the data of a failed node has to be recovered using other nodes. The repair bandwidth required to recover the data depends on the type of failure. If a data node fails, GRC achieves the optimal repair bandwidth, which is equal to the amount of data that is recovered: only a single node is accessed and its data is directly copied to the required node. But if both replicas fail, or the parity node fails, decoding is done using the other nodes of the group that were used for encoding. Below are a few theorems that give the repair bandwidth for different failure patterns in the GRC approach.

Theorem 1. The repair bandwidth for failure of a data node is equal to $\frac{M}{k}$ which is the optimal value.

Proof: In GRC, each node stores $m=\frac{M}{k}$ bytes of data. In case of failure of a data node ${x}_{i}$ or ${x^{\prime}}_{i}$ , its data can be recovered from its corresponding replica data node. Only one node is accessed for repair and no calculation is required while decoding because the data is just copied from one node to the other. For example, consider that the node ${x}_{\mathrm{1}}$ fails, its data can be recovered from its exact replica ${x}_{1}^{\prime}$ by transferring the entire data of this node. The amount of data transferred in this case is the data stored in that node i.e. $m=\frac{M}{k}$ bytes. Hence, the repair bandwidth for recovery of data node is $\frac{M}{k}$ . This value is optimal because the amount of data transferred is equal to the amount of data stored in that data node.

Theorem 2. The repair bandwidth for failure of a parity node is $\frac{M}{g}$ .

Proof: During encoding, the data of the parity node is calculated from the data nodes in a group. If the parity node $p_{j}$ fails, it can be recovered from the $\frac{k}{g}$ data nodes of that group. This requires $\frac{M}{k}$ bytes of data to be transferred from each of the $\frac{k}{g}$ data nodes in the group. Thus the repair bandwidth becomes $\frac{k}{g}\cdot\frac{M}{k}$, which is equal to $\frac{M}{g}$. For example, for GRC(6,2), if $p_{1}$ fails then its data can be recovered by calculating the parity of the 3 nodes $x_{1},x_{2}$ and $x_{3}$. Each node transfers the $\frac{M}{6}$ bytes of data stored in it, resulting in a repair bandwidth of $3\cdot\frac{M}{6}=\frac{M}{2}$.

Theorem 3. The repair bandwidth for failure of both replicas of the data node simultaneously is $\frac{M}{g}+\frac{M}{k}$ .

Proof: If both replica nodes $x_{i}$ and $x_{i}^{\prime}$ fail, then one replica node is recovered by decoding the parity node and the surviving data nodes of that group. The second replica node is then recovered by copying the data from the repaired node. Repairing the first data node requires a bandwidth of $\frac{M}{g}$, because the $\frac{k}{g}-1$ surviving data nodes and the parity node of the group, $\frac{k}{g}$ nodes in total, each transfer $\frac{M}{k}$ bytes; repairing the second data node requires a bandwidth of $\frac{M}{k}$. Thus the total repair bandwidth is $\frac{M}{g}+\frac{M}{k}$. For example, if both $x_{2}$ and $x_{2}^{\prime}$ fail, then $x_{2}$ is recovered by decoding the data nodes $x_{1},x_{3}$ and the parity node $p_{1}$. All three nodes in this case transfer $\frac{M}{6}$ bytes each, thus transferring $3\cdot\frac{M}{6}$ bytes in total. After node $x_{2}$ has been recovered, its data is copied to node $x_{2}^{\prime}$ in order to recover it, with a bandwidth of $\frac{M}{6}$.
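The double-failure repair of Theorem 3 can be illustrated with XOR parity, counting the bytes moved along the way. This is a sketch under the assumption that the parity is a plain byte-wise XOR of the group's data blocks; `xor_blocks` and `repair_double_failure` are hypothetical helper names.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for b, value in enumerate(block):
            out[b] ^= value
    return bytes(out)


def repair_double_failure(group_data, parity, lost):
    """Rebuild block `lost` after both of its copies failed; return (block, bytes transferred)."""
    # Helpers: the other data blocks of the group (from either copy) plus the parity block.
    helpers = [blk for idx, blk in enumerate(group_data) if idx != lost] + [parity]
    recovered = xor_blocks(helpers)
    # First repair reads every helper; second repair copies the rebuilt block to the replica.
    transferred = sum(len(h) for h in helpers) + len(recovered)
    return recovered, transferred


group = [b"\x00\x01", b"\x02\x03", b"\x04\x05"]  # x1, x2, x3 with m = 2 bytes
p1 = xor_blocks(group)
block, moved = repair_double_failure(group, p1, lost=1)
print(block == group[1], moved)
```

With $m=2$ bytes per node in a GRC(6, 2) layout, $M=12$ bytes, so the expected total is $\frac{M}{g}+\frac{M}{k}=6+2=8$ bytes, which is what the sketch counts.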

Based on the above theorems for repair bandwidth we can compare GRC with hybrid and double code which have the same minimum repair bandwidth of $\frac{M}{k}$ . Considering different patterns of node failures we show the performance of GRC for repairing failures.

Table 2
Comparison of repair bandwidth based on four different failure patterns for hybrid code, double code and GRC.

Cases	Failure pattern applied	Hybrid coding	Double coding	Group repair code
Case 1	Data node fails	$M/k$	$M/k$	$M/k$
Case 2	Parity node fails	$M$	$M/k$	$M/g$
Case 3	Two non-replica data nodes and one parity node fail	$\frac{2M}{k}+M$	$\frac{3M}{k}$	$\frac{2M}{k}+\frac{M}{g}$
Case 4	Both replica nodes fail	$2M$	$M+\frac{M}{k}$	$\frac{M}{g}+\frac{M}{k}$

Case 1:

One data node fails. The repair bandwidth is minimum in all three approaches if the failed node is a data node. Only one node is accessed, and it transfers the $\frac{M}{k}$ bytes of data that were stored in the failed node. For example, if $M=$ 10 Mb is distributed over $k=$ 10 nodes, each node stores 1 Mb of data and, on failure, only 1 Mb is transferred to repair a data node in all three approaches.

Case 2:

One parity node fails. To recover the data of a parity node in the hybrid code, any $k$ nodes transfer $\frac{M}{k}$ bytes each to decode the data of the parity node, resulting in a repair bandwidth of $M$. The double code maintains a replica of both data and parity nodes, so a parity node is recovered by accessing a single node that transfers $\frac{M}{k}$ bytes. According to Theorem 2, the repair bandwidth for failure of a parity node in GRC is $\frac{M}{g}$. This shows that the repair bandwidth for failure of a parity node in GRC is lower than in the hybrid code but higher than in double coding. For example, for $M=$ 10 Mb, $k=10$ and $g=2$, only 1 Mb of data is transferred in double coding, 5 Mb in GRC and 10 Mb in hybrid coding.

Case 3:

Two non-replica data nodes and one parity node fail. Consider that 3 nodes fail in total, two of which are data nodes that do not hold the same data and one of which is a parity node. The hybrid code recovers the two data nodes by transferring $\frac{M}{k}$ bytes each and recovers the parity node with a repair bandwidth of $M$. So the repair bandwidth for this scenario in the hybrid code is $\frac{2M}{k}+M$. The double code transfers $\frac{M}{k}$ bytes to recover each node, so the repair bandwidth is $\frac{3M}{k}$. GRC recovers the two data nodes by transferring $\frac{M}{k}$ bytes each and recovers the parity node within its group. The repair bandwidth is thus $\frac{2M}{k}+\frac{M}{g}$, which is less than that of the hybrid code but greater than that of double coding. For $M=$ 10 Mb, $k=10$ and $g=2$ the amount of data transferred for GRC, double and hybrid code is 7 Mb, 3 Mb and 12 Mb respectively.

Case 4:

Both replica nodes fail. The hybrid code maintains one complete replica of $M$ bytes, so if both replicas fail the repair bandwidth is very high, since recovery requires multiple rounds of decoding the erasure-coded data. First, the erasure-coded replica is recovered with a bandwidth of $M$, and then each data node transfers $\frac{M}{k}$ bytes to recover the replica node. The minimum repair bandwidth for this scenario is $2M$. For double coding, the first replica is recovered using erasure coding, which requires a repair bandwidth of $M$ because any $k$ nodes transfer $\frac{M}{k}$ bytes each. The second replica node is then recovered from the newly recovered node, so the total repair bandwidth is $M+\frac{M}{k}$. By Theorem 3, the repair bandwidth in GRC is $\frac{M}{g}+\frac{M}{k}$. For this type of failure, GRC performs better than the other two approaches. For $M=$ 10 Mb, $k=10$ and $g=2$, only 6 Mb of data is transferred in GRC, whereas the data transferred in double and hybrid code is 11 Mb and 20 Mb respectively.

Table 2 shows the repair bandwidth for these three approaches in all 4 cases of different failure patterns. Assuming $M=$ 10 Mb, $k=10$ and $g=2$, the values for hybrid, double and GRC are plotted for each case in Fig. 2. From this, we conclude that the repair bandwidth of GRC is lower than that of the hybrid code in every case except Case 1, where all three approaches are equal. Compared with double coding, GRC is lower when both replicas fail (Case 4) but higher when a parity node is involved (Cases 2 and 3).
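The worked numbers quoted for the four cases can be reproduced by evaluating the expressions of Table 2 directly. The sketch below does only that; the function name `repair_bandwidth_table` and the dictionary keys are illustrative assumptions.

```python
from fractions import Fraction


def repair_bandwidth_table(M, k, g):
    """Evaluate the Table 2 expressions for hybrid coding, double coding and GRC."""
    frag = Fraction(M, k)   # M/k, data stored on one node
    group = Fraction(M, g)  # M/g, data read from one GRC group
    return {
        "Case 1": {"hybrid": frag, "double": frag, "grc": frag},
        "Case 2": {"hybrid": M, "double": frag, "grc": group},
        "Case 3": {"hybrid": 2 * frag + M, "double": 3 * frag, "grc": 2 * frag + group},
        "Case 4": {"hybrid": 2 * M, "double": M + frag, "grc": group + frag},
    }


# Same parameters as in the text: M = 10 Mb, k = 10, g = 2
for case, row in repair_bandwidth_table(10, 10, 2).items():
    print(case, {name: float(value) for name, value in row.items()})
```

For these parameters the sketch gives 1/1/1, 10/1/5, 12/3/7 and 20/11/6 Mb for hybrid, double and GRC in Cases 1 to 4, matching the values stated above.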

Figure 2.

Graph showing repair bandwidth values for four different failure patterns in hybrid, double and GRC codes.

Figure 3.

Graph comparing the total storage required by different approaches in a distributed storage system.

Table 3

Total storage (in Mb) required by different approaches to store 50 Mb, 100 Mb, 150 Mb, 200 Mb, 250 Mb and 300 Mb of data

Approaches	Total storage	Data of 50 Mb	Data of 100 Mb	Data of 150 Mb	Data of 200 Mb	Data of 250 Mb	Data of 300 Mb
Replication	$3M$	150	300	450	600	750	900
Erasure code RS(6, 10)	$n\cdot\frac{M}{k}$	83.33	166.67	250	333.33	416.67	500
Local regeneration code LRC(6, 2, 2)	$n\cdot\frac{M}{k}$	83.33	166.67	250	333.33	416.67	500
Hybrid code	$M\left(\frac{n}{k}+1\right)$	133.33	266.67	400	533.33	666.67	800
Double code	$2\cdot n\cdot\frac{M}{k}$	166.67	333.33	500	666.67	833.33	1000
Group Repair Code	$\frac{M\left(2k+g\right)}{k}$	116.67	233.33	350	466.67	583.33	700

Theorem 4. The total storage required for Group Repair Code is $\frac{M\left(2k+g\right)}{k}$ .

Proof: The Group Repair Codes divide the original data of size $M$ bytes into $k$ fragments, each holding $\frac{M}{k}$ bytes of data. The fragments form $g$ smaller groups, and the system holds $k$ original data nodes, $k$ replica data nodes and one parity node per group, i.e. $g$ parity nodes. The total number of nodes utilized is $n=k+k+g$. All data nodes and parity nodes store $m=\frac{M}{k}$ bytes each. So the total storage can be computed as:

$\displaystyle\textit{Total Storage}=k\cdot\frac{M}{k}+k\cdot\frac{M}{k}+g\cdot\frac{M}{k}=\frac{M\left(2k+g\right)}{k}.$

Using this value, the storage requirement of GRC can be compared with that of other approaches used for distributed storage, namely replication, erasure code, LRC, hybrid and double coding. Table 3 gives the total storage required by all these approaches to store 50 Mb, 100 Mb, 150 Mb, 200 Mb, 250 Mb and 300 Mb of data. The graph obtained by plotting these storage requirements is shown in Fig. 3. Double code and replication require the most storage, while erasure code and LRC require the least. If we compare only the three approaches that achieve the optimal repair bandwidth (hybrid, double and GRC), it is easily seen that GRC requires the least storage. The aim of optimal repair bandwidth along with a reduced storage requirement is thus achieved. In GRC(6, 2) there are 12 data nodes and only 2 parity nodes. If all nodes are equally likely to fail, data-node failures occur more often than parity-node failures, so the most frequent repair is that of a data node, which has the optimal repair bandwidth.
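The entries of Table 3 follow directly from the storage formulas in its second column. As a minimal sketch, assuming the parameters used in the table ($k=6$, $n=10$, $g=2$ and three-way replication), they can be recomputed as follows; the function name `total_storage` and its keyword arguments are illustrative only.

```python
def total_storage(M, k=6, n=10, g=2, replicas=3):
    """Total storage (same unit as M) per approach, using the Table 3 formulas.

    Defaults mirror the table: RS(6, 10), LRC(6, 2, 2), GRC(6, 2) and
    three-way replication.
    """
    return {
        "replication": replicas * M,
        "erasure": n * M / k,
        "lrc": n * M / k,
        "hybrid": M * (n / k + 1),
        "double": 2 * n * M / k,
        "grc": M * (2 * k + g) / k,
    }


for M in (50, 100, 150, 200, 250, 300):
    row = total_storage(M)
    print(M, {name: round(value, 2) for name, value in row.items()})
```

Among the three approaches with optimal repair bandwidth, `min(("hybrid", "double", "grc"), key=total_storage(100).get)` returns `"grc"`, in line with the comparison made above.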

Theorem 5. The GRC system is resilient to at most $(k+g)$ node failures.

Proof: Suppose that all $g$ parity nodes and $k$ data nodes are damaged in such a way that one of the two replicas in every replica pair is safe; then no data is lost. This means that if, out of the total $2k+g$ nodes, $k+g$ nodes fail and such a set of $k$ nodes survives, the complete system can be rebuilt from the surviving nodes by replicating the data and recalculating the parities. If more than $k+g$ nodes fail, the complete data cannot be recovered. This makes the system resilient to at most $(k+g)$ node failures. For example, in GRC(6, 2), if the eight nodes $x_{1}^{\prime},x_{2}^{\prime},x_{3}^{\prime},x_{4}^{\prime},x_{5}^{\prime},x_{6}^{\prime},p_{1}$ and $p_{2}$ fail, then the data of these nodes can be recovered using the remaining six data nodes $x_{1},x_{2},x_{3},x_{4},x_{5}$ and $x_{6}$.
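The failure pattern used in the proof can be checked mechanically. The sketch below is illustrative only and rests on two assumptions that this section does not state: fragments are assigned to the $g$ groups contiguously ($k/g$ fragments per group), and a group's single parity node can rebuild at most one fragment whose two copies are both lost. The function `is_recoverable` and the node names (`x1`, `x1'`, `p1`) are hypothetical.

```python
def is_recoverable(failed, k, g):
    """Return True if all k fragments can still be rebuilt.

    failed: set of failed node names, e.g. {"x1", "x1'", "p1"}.
    Assumptions (hypothetical): k is divisible by g, group j holds
    fragments (j*k/g + 1) .. ((j+1)*k/g), and one parity per group can
    repair at most one fragment that has lost both of its copies.
    """
    size = k // g
    for group in range(g):
        fragments = range(group * size + 1, (group + 1) * size + 1)
        # Fragments for which both the original and the replica are gone.
        lost = [i for i in fragments
                if f"x{i}" in failed and f"x{i}'" in failed]
        if len(lost) > 1:
            return False  # a single parity cannot rebuild two fragments
        if len(lost) == 1 and f"p{group + 1}" in failed:
            return False  # the only repair source for that fragment is gone
    return True


# Pattern from the proof for GRC(6, 2): all replicas and both parities fail.
theorem_case = {f"x{i}'" for i in range(1, 7)} | {"p1", "p2"}
print(is_recoverable(theorem_case, k=6, g=2))
print(is_recoverable({"x1", "x1'", "p1"}, k=6, g=2))
```

Under these assumptions the first call reports a recoverable system after $k+g=8$ failures, while the second shows that a much smaller failure set is fatal when it removes both copies of a fragment together with its group parity. The bound of Theorem 5 therefore holds only for failure patterns that leave every fragment reachable.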

Figure 4.

(a) Graph for total storage required for various approaches with different values of $k$ . (b) Graph for maintenance bandwidth for various approaches with different values of $k$ .

Table 4

Comparison of all the approaches based on various parameters for $k=6$ and $k=10$

Parameter (data of size $M=100$ Mb)	Replication	Erasure codes, RS(6, 10)	Erasure codes, RS(10, 14)	LRC(6, 2, 2)	LRC(10, 2, 4)	Hybrid coding, $k=6$	Hybrid coding, $k=10$	Double coding, $k=6$	Double coding, $k=10$	Group repair codes, $k=6$	Group repair codes, $k=10$
Total storage (Mb)	300	166.67	140	166.67	160.00	266.67	240	333.33	280	233.33	220
Repair bandwidth (Mb)	100	100	100	50	50	16.67	10	16.67	10	16.67	10
Disk read (Mb)	100	100	100	50	50	100	100	16.67	10	16.67	10
Maximum number of faults tolerated	2	4	4	$\leqslant$ 4	$\leqslant$ 6	10	14	12	18	8	12

4. Quantitative analysis

In this section, we compare the performance of Group Repair Code with replication, erasure code, LRC, hybrid and double coding for two different values of the number of fragments, $k=6$ and $k=10$, for storing data of size $M=100$ Mb. These approaches can be compared on a number of parameters. The values obtained for storage, repair bandwidth, disk read and fault tolerance are given in Table 4. From the table, it is clear that replication and double coding utilize the maximum storage space. Erasure codes and LRC are efficient in storage, but their repair bandwidth is high. GRC has the minimum repair bandwidth, equal to that of hybrid and double coding, which shows that GRC is optimal in terms of repair. Among these three approaches, GRC occupies the minimum storage space, as discussed in the previous section. The table also shows that the disk read (the amount of data read for repairing a failed node) of GRC equals that of double coding and is lower than that of all other approaches, and that GRC tolerates more faults than replication, erasure codes and LRC, although fewer than hybrid and double coding.

We take a set of values for the number of fragments, $k=\{6,8,10,16\}$, and compare the storage requirement and minimum repair bandwidth of all the approaches in Fig. 4a and b respectively. The storage for replication remains the same for all values of $k$. Erasure code has the minimum value, and the value for double code is exactly twice that of erasure codes. GRC stores the data more efficiently than replication, double and hybrid codes. Figure 4b shows that hybrid, double and GRC codes improve substantially on the repair bandwidth of replication, erasure codes and LRC. Both figures also show that as the number of fragments $k$ increases, the storage of the coded approaches and the repair bandwidth of hybrid, double and GRC codes decrease.
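The trend over $k$ can be explored numerically. The sketch below assumes that every erasure-coded configuration uses four parity nodes, i.e. $n=k+4$ (as in RS(6, 10) and RS(10, 14)), and that $g=2$ throughout. These assumptions reproduce the Table 4 columns for $k=6$ and $k=10$; the rows for $k=8$ and $k=16$ are derived under the same assumptions and are not values reported in the paper. The function name `sweep` is illustrative.

```python
def sweep(M=100, ks=(6, 8, 10, 16), parities=4, g=2):
    """Storage and single-node repair bandwidth (in units of M) for each k."""
    rows = []
    for k in ks:
        n = k + parities  # assumed code length, e.g. RS(6, 10), RS(10, 14)
        rows.append({
            "k": k,
            "replication": 3 * M,              # independent of k
            "erasure": n * M / k,
            "hybrid": M * (n / k + 1),
            "double": 2 * n * M / k,
            "grc": M * (2 * k + g) / k,
            "repair_bw": M / k,                # hybrid, double and GRC
        })
    return rows


for row in sweep():
    print({key: round(value, 2) for key, value in row.items()})
```

In this sketch the storage of every coded approach and the repair bandwidth $\frac{M}{k}$ fall as $k$ grows, while the storage of replication stays at $3M$.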

5. Intelligent data storage: A future perspective

In the current study we have presented an approach for efficient data storage and optimal repair, but we have not made it intelligent. Supervised and unsupervised machine learning and artificial intelligence have been applied in recent studies to improve systems by adding some form of intelligence, but they have not yet been combined with a distributed storage environment [30]. In this section, we highlight the importance of artificial intelligence for storage and related areas; we aim to extend our work with machine learning and artificial intelligence in the near future.

The relation between artificial intelligence (AI) and data storage is bidirectional. AI needs a vast amount of data to train a model that gives accurate predictions, and in return AI is needed to maintain the vast storage and extract valuable data from it. Managing the stored data is challenging. When AI is applied to data storage, it can differentiate between data that is used most often and data that is never used, so that each can be placed in cache or archival storage accordingly.

Furthermore, data stored on a public cloud is accessible to all users, which makes it insecure for storing the private and confidential data of an organization. In response, Amazon Macie (a machine learning-powered security service) has been introduced; it uses machine learning to detect certain keywords in the data, classify it and protect it from unwanted access. Combining machine learning with distributed storage can bring significant improvements to the system. We therefore aim to carry our work forward with machine learning and artificial intelligence in the near future.

6. Conclusion and future work

This paper investigates various network coding techniques, such as replication, erasure codes, LRC, regenerating codes, hybrid and double codes, that are applied in distributed storage to make data easily available and reliable. To achieve this, data is replicated and/or coded. It has been challenging for researchers to design an approach that uses less storage and is capable of efficient recovery of data during failures. We have therefore proposed an approach named Group Repair Codes, which provides optimal repair and efficient storage. Like hybrid and double coding, it uses two types of architecture (replication and erasure coding) for storing the data and has the optimal repair bandwidth, with the added advantage of a reduced storage requirement. It divides the data into fragments, replicates the data and forms smaller groups to calculate parities within each group. GRC $\left(k,g\right)$ uses a total of $2k+g$ nodes, each holding $\frac{M}{k}$ bytes of data, and thus utilizes a total storage of $M\cdot\frac{\left(2k+g\right)}{k}$ bytes. The algorithm for encoding the data in GRC and the decoding process for reconstruction and repair of a data or parity node are also discussed in the paper. GRC is compared with hybrid and double codes for different failure patterns; the comparison shows that for multiple failures GRC uses less repair bandwidth than hybrid code, and less than double coding when both replicas of a fragment fail. The GRC approach is also compared with replication, erasure code, LRC, hybrid and double code in terms of storage, repair bandwidth, disk read and fault tolerance, which demonstrates its efficiency. The proposed approach can be applied in distributed storage systems and cloud storage systems to enhance their performance. Beyond storage, it can also be applied in other fields where network coding is used, such as communication and wireless sensor networks. In addition, GRC can be combined with artificial intelligence to provide security for the data and to support better data analytics and retrieval.

References

1. Singal, Rakesh, Matam. Coding strategies to avoid data loss in cloud storage systems. 4th International Conference on Parallel, Distributed and Grid Computing (PDGC). 2016; 22.

2. Kubiatowicz, Bindel, Chen, Czerwinski, Eaton, Geels, Gummadi, Rhea, Weatherspoon, Weimer, Wells, Zhao. OceanStore: An architecture for global-scale persistent storage. ACM ASPLOS. 2000.

3. Huang, Simitci, Ogus, Calder, Gopalan, Yekhanin. Erasure coding in Windows Azure storage. USENIX ATC. 2012.

4. Ghemawat, Gobioff, Leung. The Google file system. 19th ACM Symposium on Operating System Principles. 2003 Oct.

5. Sathiamoorthy, Asteris, Papailiopoulos, Dimakis, Vadali, Chen, Borthakur. Xoring elephants: Novel erasure codes for big data. Proc. of the VLDB Endowment. 2013.

6. Chen HCH, Lee PPC, Tang. NCCloud: Applying network coding for the storage repair in a cloud-of-clouds. 10th USENIX Conf. on File and Storage Tech. (FAST’12). San Jose. 2012 Feb.

7. Weatherspoon, Kubiatowicz. Erasure coding vs. replication: A quantitative comparison. Peer-to-Peer Systems: First International Workshop. IPTPS 2002; LNCS 2429: 328.

8. Chun, Dabek, Haeberlen, Sit, Weatherspoon, Kaashoek, Kubiatowicz, Morris. Efficient replica maintenance for distributed storage systems. NSDI. 2006.

9. Cai, Yeung. Network coding and error correction. IEEE Information Theory Workshop, Bangalore, India. 2002 Oct; 119.

10. Plank. Erasure codes for storage applications. FAST-2005: 4th Usenix Conference on File and Storage Technologies (San Francisco, CA). 2005 Dec.

11. Dimakis, Godfrey, Wainwright, Ramchandran. Network coding for distributed storage systems. IEEE Trans. on Information Theory. 2010 Sept; 56(9): 4539.

12. Dimakis, Godfrey, Wainwright, Ramchandran. The benefits of network coding for peer-to-peer storage systems. 3rd Workshop on Network Coding, Theory, and Applications. 2007.

13. Rodrigues, Liskov. High availability in DHTs: Erasure coding vs. replication. Proc. IPTPS. 2005.

14. Araujo, Giroire, Monteiro. Hybrid approaches for distributed storage systems. Proceedings of Fourth International Conference on Data Management in Grid and P2P Systems (Globe’11), Toulouse, France. 2011 Sep.

15. Mohan, Harold, Caneleo PIS, Parampalli, Harwood. Benchmarking the performance of Hadoop triple replication and erasure coding on a nation-wide distributed cloud. Network Coding (NetCod), 2015 International Symposium on Network coding. 2015 Jun; 61-65.

16. Singal, Rakesh, Matam. Storage vs repair bandwidth for network erasure coding in distributed storage systems. International Conference on Soft Computing Techniques & Implementations (ICSCTI-2015). 2015 Oct; 27-32.

17. Reed, Solomon. Polynomial codes over certain finite fields. J. SIAM. 1960; 8(10): 300.

18. Blaum, Brady, Bruck, Menon, Vardy. The EVENODD code and its generalization. In High Performance Mass Storage and Parallel I. 2002; 187.

19. Huang. STAR: An efficient coding scheme for correcting triple storage node failures. IEEE Transactions on Computers. 2008; 57(7): 889-901.

20. Corbett, English, Goel, Grcanac, Kleiman, Leong, Sankar. Row-diagonal parity for double disk failure correction. Proc. of USENIX FAST, San Francisco, CA, USA. 2004 Mar. 31 to Apr. 2.

21. X-code: MDS array codes with optimal encoding. IEEE Transactions on Information Theory. 1999 Jan; 45(1): 272-276.

22. MacKay DJC. Fountain codes. Capacity Approaching Codes Design and Implementation Special Section, IEEE Proceedings – Communication. 2005 Dec; 152(6).

23. Xia, Saxena, Blaum, Pease. A tale of two erasure codes in HDFS. Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST’15). 2015.

24. Rakesh, Tyagi. Failure recovery in XOR’ed networks. 2012 IEEE International Conference on Signal Processing, Computing and Control (ISPCC-2012). 2012 Mar. 1-6.

25. Halalai, Felber, Kermarrec, Taïani. Agar: A caching system for erasure-coded data. IEEE 37th International Conference on Distributed Computing Systems. 2017; 23-33.

26. Stones, Wang, Liu. ProCode: A proactive erasure coding scheme for cloud storage systems. IEEE 35th Symposium on Reliable Distributed Systems. 2016; 219-228.

27. Papailiopoulos, Dimakis. Locally repairable codes. IEEE Trans Inf Theory. 2014 Oct; 60(10): 5843-5855.

28. Shahabinejad, Khabbazian, Ardakani. An efficient binary locally repairable code for Hadoop distributed file system. IEEE Commun Lett. 2014 Aug; 18(8): 1287.

29. Goparaju, Calderbank. Binary cyclic codes that are locally repairable. IEEE International Symposium on Information Theory (ISIT). 2014 Jun: 676-680.

30. Zhou, Pan, Wang, Vasilakos. Machine learning on big data: Opportunities and challenges. Neurocomputing. 2017; 237: 350-361.

Table 1
Notations used for Group Repair Codes

Table 2
Comparison of repair bandwidth based on four different failure patterns for hybrid code, double Code and GRC.