Abstract
The storage software and product architectures of different storage vendors are often incompatible. This incompatibility among heterogeneous storage systems makes remote disaster recovery between them infeasible, which wastes storage resources and leads to duplicated investment. To solve this incompatibility problem, this paper proposes a remote disaster recovery solution for heterogeneous storage. The feasibility of the solution has been verified through the implementation of a practical case. The results indicate that the scheme can reach disaster recovery level six and meets the requirements for recoverability, reliability and real-time performance.
Introduction
Information security and data protection have become major issues that affect the overall situation and long-term development of a country [1]. In recent years, as people have learned more about the business damage and losses that enterprises suffer from disasters such as the “9.11” attacks, the Kobe earthquake in Japan, the Southeast Asia tsunami and the “5.12” Wenchuan earthquake, information systems and data security have attracted more and more attention. Therefore, off-site disaster recovery for data and information systems, which aims to limit the damage caused by disasters, has become a necessary security measure for industries that depend on information systems [2].
In the current disaster recovery market, vendors, driven by commercial interests, usually implement remote disaster recovery on top of their own product architectures. Moreover, even products from the same manufacturer raise complex management issues. In large storage environments in particular, the life cycle of information is very long while storage technologies and products develop quickly, which exacerbates the problem. Because disaster recovery service providers often offer only a single technical solution based on the storage devices they have purchased [3–5], heterogeneous storage devices and software from different vendors cannot perform remote disaster recovery with each other, which greatly restricts the development of the disaster recovery industry. How to deal with the incompatibility between heterogeneous storage devices has therefore become an urgent problem. To our knowledge, related work is limited, which motivates this study.
In this paper, we study a remote disaster recovery method based on heterogeneous storage. First, we analyze the problem of storage incompatibility and find that its main cause is the heterogeneity of storage, which ties many disaster recovery systems to particular equipment brands. A disaster recovery centre needs to serve multiple users with different storage brands, so solving the storage incompatibility problem is the key. Second, this paper proposes a cost-effective solution. We have also applied the proposed solution to a practical system. The results show that the method is stable and effective.
A heterogeneous storage solution
Currently, the traditional solutions for remote disaster recovery of heterogeneous storage devices are mainly virtual storage, virtual gateways and remote mount [6, 7].
Virtual storage combines multiple storage modules (such as disks and disk arrays) into a storage pool in which all modules are centralized and managed in a unified way. Although it is compatible with different brands of storage and makes disaster recovery technically feasible, the scheme is very costly and requires some downtime while the user reinitializes the storage [17].
A virtual gateway (VG) bridges the transmission of voice and the associated signalling among multiple endpoints that are behind the same or different NATs or firewalls. It can be compatible with different brands of storage, but it shares two major disadvantages with virtual storage: its cost is high, and it requires the user to shut down in order to replan the network.
Remote mount uses the software of the operating system itself to attach the disks of the disaster recovery centre directly to the production host. Generally, it is not used on its own as a disaster recovery strategy, but serves the data disaster recovery centre together with volume management software and disaster recovery software. Although remote mount places very high demands on the network and offers low overall security, it may be a viable backup scheme for some users who are highly cost-sensitive [8].
Considering the shortcomings of the above options, this paper presents a new disaster recovery scheme for heterogeneous storage that can be extended flexibly. The idea is to add, at the user terminal, a new low-cost storage device that is compatible with the storage of the off-site disaster recovery centre. Storage replication with the disaster recovery centre is then performed through this compatible low-cost storage. As there are no compatibility issues between these two storage devices, storage replication does not face the heterogeneity problem. Since the transmission distance between the new low-cost storage device and the storage device of the production system at the user terminal is short, volume management software or backup software can be used to synchronize the data between them [16]. The user only needs to add one relatively inexpensive storage device (about $10,000 for roughly 10 TB of capacity) to implement remote disaster recovery between heterogeneous storage devices. Moreover, the scheme is acceptable to users because the added storage device also provides local protection at the user terminal. The disaster recovery scheme is shown in Fig. 1.
Experiment result
The remote off-site disaster recovery scheme in this article is verified mainly in a real environment at the Data disaster recovery centre of Shandong. The centre is located in Jinan, provides professional third-party services and fully meets the environmental requirements for disaster recovery verification. The user is the Shandong data centre in Weihai, in the eastern part of Shandong, about 500 km from Jinan. In this practical case, data disaster recovery is carried out from the user to the disaster recovery centre in order to verify indicators including network bandwidth, synchronization time, server utilization and packet loss rate [9, 10]. To verify the feasibility of the scheme, which is shown in Fig. 2, the paper compares its performance with that of remote mount.
Test environment for verification
To implement remote off-site disaster recovery, it is necessary to consider the amount of data transmitted, the transmission speed and the reliability of the transmission. The test environment for the heterogeneous storage scheme designed in this paper is as follows. At the user terminal (the production side), the application storage is an AS500N produced by Inspur Group and the server is an HP C7000 blade server. The local backup storage is a Huawei S2600T, a low-cost device that is compatible with the off-site backup storage. The off-site backup storage is a Huawei S5600T. The test environment for the traditional heterogeneous storage scheme is as follows: the application storage at the user terminal is an AS500N produced by Inspur Group, the server is an HP C7000 blade server, and the off-site backup storage is a Huawei S5600T. A simple schematic diagram of the test environments of the two schemes is shown in Fig. 3.
From Fig. 3, we can see that, except for the local disaster backup, the links and parameters from the application storage device to the backup server are the same for both schemes [11]. The two schemes also use the same backup software. The proposed scheme differs in that it adds one low-cost storage device between the backup server and the off-site backup storage device, together with the backup software that comes with that storage device [12–14]. Overall, most parameters are kept consistent during the test so that the differences between the two schemes can be discussed relatively easily.
In line with the actual situation, the disaster recovery strategy used is incremental backup, so the amount of data transferred is not large. In the experiment, the sizes of the data transferred are therefore 1 GB, 2 GB, 5 GB, 10 GB and so on. The corresponding test results are as follows.
Network throughput
In the experimental environment, the paper tests the performance of the network communication system. Specifically, data sets of 1 GB, 2 GB, 5 GB and 10 GB are each backed up separately from the user terminal to the off-site backup storage device with a backup tool, and the network throughput and transfer time of the traditional scheme and the proposed scheme are measured. In the traditional scheme, the backup tool is Fast Copy. Taking data transmissions of 1 GB and 10 GB as examples, the measurement results are shown in Fig. 4, from which the two parameters can be read directly.
In the scheme proposed by this paper, the test uses Fast Copy as the backup tool between the application storage and the local backup storage, and the backup software that comes with the storage device between the local backup storage and the off-site backup storage. The transmission time is therefore the sum of the times of the two procedures, and the network throughput is then calculated from this total. Taking data transmissions of 2 GB and 5 GB as examples, the measurement results are shown in Fig. 5.
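The calculation for the two-stage transfer can be sketched as follows. This is a minimal illustration, assuming that the two stages run one after the other and that the end-to-end throughput is the data size divided by the total time; the function name and the timings are hypothetical and are not measurements from the paper.

```python
def two_stage_throughput(size_mb, local_copy_s, replication_s):
    """Effective end-to-end throughput in MB/s for two sequential transfers.

    local_copy_s:  time for application storage -> local backup storage
    replication_s: time for local backup storage -> off-site backup storage
    """
    total_s = local_copy_s + replication_s
    if total_s <= 0:
        raise ValueError("total transfer time must be positive")
    return size_mb / total_s


# Hypothetical timings for a 2 GB transfer (illustration only)
size_mb = 2 * 1024
rate = two_stage_throughput(size_mb, local_copy_s=100.0, replication_s=300.0)
print(f"effective throughput: {rate:.2f} MB/s")
```

Because the stage times add, the effective throughput is always lower than the throughput of either stage taken alone.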
To obtain accurate measurement results, each group is tested five times. We calculate the average of the five measurements, and the results are shown in Tables 1 and 2, respectively.
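Averaging repeated runs can be sketched as below. The sample values are hypothetical, and reporting the standard deviation alongside the mean is an addition for illustration; the paper reports only the average.

```python
from statistics import mean, stdev


def summarize(runs):
    """Return (mean, sample standard deviation) of repeated measurements."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    return mean(runs), stdev(runs)


# Hypothetical transfer times in seconds for five runs of one data size
times_s = [231.0, 228.0, 235.0, 230.0, 226.0]
avg, spread = summarize(times_s)
print(f"mean = {avg:.1f} s, standard deviation = {spread:.2f} s")
```

A small spread relative to the mean suggests that five runs are enough for a stable average.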
In a practical environment, this paper focuses mainly on the feasibility of the two schemes when incremental backup is used, so a comparison of the initial synchronization time is not of much significance. Because the data transmission times differ, data transmission time is plotted against data transmission volume to show in which cases each scheme is feasible. Table 1 shows that the transmission time of both schemes doubles when the amount of data transmitted doubles. When the amount of data transmitted is 10 GB or less, the transmission time of both schemes is within minutes. For incremental backup, the amount of data is very small and far below the GB level, so both schemes can meet the requirements.
As can be seen from Table 2, the network throughputs of the two schemes are 4.4 M/s and 5.6 M/s, respectively. Because the transmission delay and propagation delay of both schemes are almost the same during transmission, the main difference lies in the processing delay. That is to say, the difference in throughput results from the different nodes that the data passes through during transmission.
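The relation between data volume, throughput and transfer time can be illustrated with a small calculation. The sketch assumes that "M/s" means megabytes per second, that 1 GB is 1024 MB, and that the rate stays constant during a transfer; the function name is illustrative, and the outputs are estimates rather than the values in Table 1.

```python
def transfer_time_s(size_gb, rate_mb_per_s):
    """Estimated transfer time in seconds at a constant rate (1 GB = 1024 MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1024 / rate_mb_per_s


for rate in (4.4, 5.6):            # throughputs reported for the two schemes
    for size_gb in (1, 2, 5, 10):  # data volumes used in the experiment
        minutes = transfer_time_s(size_gb, rate) / 60
        print(f"{size_gb:>2} GB at {rate} MB/s: about {minutes:.1f} min")
```

Under these assumptions, doubling the data volume doubles the estimated time.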
The correctness of the measured data can be checked against the following theory. The theoretical data transmission time = propagation delay + transmission delay + processing delay [15]; therefore, the actual data throughput Ep is:
The transmission delay refers to the time a data block takes to pass from a node into the transmission medium, that is, the time required from the start of sending the first bit of the data block to the completion of sending the last bit. This parameter is related to the TCP window size and the link bandwidth. In the test environment of this paper, the link bandwidth is 100 Mbps. The TCP window size field is 16 bits, so the maximum window size is 65535 bytes. However, the TCP protocol provides an option to expand the window. When the network requires a larger window to provide maximum throughput, this option can be used to extend the TCP window from 16 bits to 32 bits.
The propagation speed of an optical or electrical signal is fixed at approximately 300000 km per second (in fact, the propagation speed of light in fibre optic cables is only about 200000 km per second, and the propagation speed of signals in electrical cables is approximately 210000 km per second). The distance from Weihai to Jinan is 500 km, so the propagation delay is 500/300000 s.
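As a worked example of the two delays, the following uses notation introduced here for illustration only: d is the distance, v the signal speed, S the amount of data sent and B the link bandwidth. The time for the signal to travel the distance is d/v, and the time to put S bytes onto the link is 8S/B.

```latex
\[
\frac{d}{v} = \frac{500\ \text{km}}{300000\ \text{km/s}} \approx 1.67\ \text{ms},
\qquad
\frac{500\ \text{km}}{200000\ \text{km/s}} = 2.5\ \text{ms (at the speed in optical fibre)},
\]
\[
\frac{8S}{B} = \frac{8 \times 65535\ \text{bit}}{100 \times 10^{6}\ \text{bit/s}} \approx 5.24\ \text{ms}
\quad \text{for one unscaled 65535-byte window at 100 Mbps}.
\]
```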
Therefore, the actual network throughput can be written as equation (1):
where V is the network throughput, S is the TCP window size, and t denotes the processing delay.
If the processing delay caused by the various repeaters and forwarding or protocol conversion equipment along the link is ignored, the curve of maximum network throughput against TCP window size is obtained, as shown in Fig. 6. The first plot of Fig. 6 shows that the maximum network throughput increases as the TCP window size increases. Once the TCP window reaches a certain size, the network throughput reaches its maximum value and stabilizes, which appears as the turning point in the plot. In the test environment, the turning point of the TCP window size is about 1 MB, and the corresponding maximum network throughput is approximately 6 M/s.
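The qualitative shape of such a curve can be sketched with a simple model. It assumes that one TCP window of S bytes is delivered per interval of transmission delay plus propagation delay plus processing delay, following the delay decomposition stated in the text. This is only an illustrative model and is not claimed to be identical to equation (1); its values need not match Fig. 6, and all names in the code are hypothetical.

```python
LINK_BANDWIDTH_BPS = 100e6    # 100 Mbps link, from the test environment
DISTANCE_KM = 500.0           # Weihai to Jinan
SIGNAL_SPEED_KM_S = 300000.0  # propagation speed used in the text


def model_throughput(window_bytes, processing_delay_s=0.0):
    """Throughput in bytes/s when one window is delivered per
    (transmission + propagation + processing) interval. Illustrative only."""
    if window_bytes <= 0:
        raise ValueError("window size must be positive")
    transmission_s = window_bytes * 8 / LINK_BANDWIDTH_BPS
    propagation_s = DISTANCE_KM / SIGNAL_SPEED_KM_S
    return window_bytes / (transmission_s + propagation_s + processing_delay_s)


for window in (16 * 1024, 65535, 256 * 1024, 1024 * 1024):
    print(f"window {window:>8} B: {model_throughput(window) / 1e6:.2f} MB/s")
```

In this model the throughput rises with the window size and levels off below the link capacity of 12.5 MB/s; a nonzero processing delay lowers it further.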
In the results of this test, the measured network throughput is less than 6 M/s, so it does not exceed the theoretical maximum network throughput, and it is similar to the previous cases. Therefore, it can be inferred that the measured data are consistent with the theory.
For the packet loss test, we adopt a simple method: using the PING command on the backup server to ping the firewall connected to the off-site storage [18]. Taking 10 MB of data as an example, the data is divided into 20480 packets of 512 bytes each. The test results are shown in Fig. 7.
Similarly, to obtain accurate measurement results, each group is tested five times. The average of the five measurements is calculated, and the results are shown in Table 3.
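The packet arithmetic and the loss rate can be sketched as follows. The packet count follows from the figures in the text (10 MB taken as 10 × 1024 × 1024 bytes, 512-byte packets); the number of received packets is hypothetical, and the function names are illustrative.

```python
def packet_count(total_bytes, packet_bytes):
    """Number of packets needed to carry total_bytes (ceiling division)."""
    return -(-total_bytes // packet_bytes)


def loss_rate(sent, received):
    """Fraction of packets that were sent but not received."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("invalid packet counts")
    return (sent - received) / sent


sent = packet_count(10 * 1024 * 1024, 512)   # 20480 packets, as in the text
received = sent - 2                          # hypothetical: two packets lost
print(f"sent={sent}, loss rate={loss_rate(sent, received):.4%}")
```

With only a few lost packets out of 20480, the loss rate stays around one hundredth of a percent, which matches the description of a loss rate that is approximately zero.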
Table 3 shows that only a small number of packets are lost, so the data loss rate is approximately zero. Thus, both schemes are feasible. Note that the network environment in this test is ideal. When the network environment is very poor, the proposed scheme is expected to outperform the traditional backup solution in terms of packet loss.
In the proposed scheme, the application storage is connected to the local storage through a backup server that the user controls. Backup service providers can back up the data using the backup software that comes with the storage, but they cannot attach servers to the storage, because the link directly interconnects the backup storage devices and all operations are transparent to both the user terminal and the service provider. Therefore, the scheme ensures the relative safety of the data.
Conclusions
This paper has proposed a storage replication disaster recovery solution that can be flexibly extended to meet the needs of users, compared it with the existing remote disaster recovery solutions for heterogeneous storage devices on the current market, and verified its feasibility, reliability and security through the implementation of a practical case. The scheme not only solves the incompatibility issue of heterogeneous storage devices but also provides extended storage, enhances the value of IT investments and promotes the development of the disaster recovery industry. More importantly, this paper provides a practical remote disaster recovery solution for heterogeneous storage devices that addresses the biggest bottleneck in remote disaster recovery and improves storage efficiency. Moreover, the cost of the scheme is relatively low and security is also ensured.
Acknowledgments
This research was partially supported by Youth Fund Project of Shandong Academy of Sciences No. 2014QN012.
