Abstract
Pre-stack reverse time migration (RTM) is a full wave field imaging method. RTM involves heavy computation and data storage, so it requires powerful data analysis and processing capabilities to meet the needs of actual production. In this paper, we first discuss seismic data regularization and the impact of regularized seismic data on RTM imaging results. Then, to improve the efficiency of RTM data processing, we adopt cloud computing technology and study parallel data processing methods for RTM in a cloud computing environment. We apply CPU/GPU parallel computation to RTM and design C-GMR, a MapReduce parallel computing model suitable for multi-GPU nodes. The experimental results show that these measures greatly improve computational efficiency while preserving the high-precision imaging of RTM, and that they provide good technical support for applying RTM to massive seismic data.
Introduction
In current seismic data processing, conventional migration techniques have limited accuracy, and the resulting amplitude and phase are inaccurate. They struggle with imaging problems that involve strong lateral velocity variations and steeply dipping structures, and they are increasingly unable to meet the needs of actual production [1]. The RTM technique, based on the two-way wave equation, offers higher accuracy and accurate phase, and it avoids the separation of up-going and down-going waves required by one-way wave migration methods. It makes few approximations to the wave equation and places no restrictions on dip angle or migration aperture, so it can effectively handle media with drastic vertical and lateral velocity changes. It is regarded as the migration method with the highest imaging accuracy [2, 3].
During field seismic data acquisition, complex environments and construction conditions may lead to missing seismic data. Irregular sampling in actual exploration reduces the quality of migration imaging [4]. Therefore, before high-precision processing such as wave equation migration, the irregular seismic data must be regularized to reconstruct the missing data. Regularization improves the signal-to-noise ratio and eliminates the spatial aliasing caused by irregular sampling. The imaging profile obtained by reverse time migration of the regularized seismic data reflects the actual subsurface geological structure more accurately and clearly.
Although RTM imaging has high precision and adapts well to velocity variations, it is still applied only on a very small scale in actual production, mainly because its huge computation and storage demands are not easy to meet [5, 6, 7].
In recent years, GPU/CPU cooperative parallel computing has rapidly become a mainstream approach in high-performance, compute-intensive applications. Combining cloud computing technology with RTM makes it possible to integrate and use existing resources effectively. This not only accelerates the computation but also reduces the cost of seismic data processing, and it helps improve the processing capability and efficiency of migration imaging. It is of great significance and practical value for oil and gas exploration in complex areas in China.
Research status of RTM data processing
In RTM, most of the run time is spent on the forward and reverse continuation of the seismic wave field. Efficient wave field computation and efficient data access are the two major challenges in RTM data processing.
Because early computing devices had low performance, most early studies focused on improving or optimizing RTM algorithms to raise computational efficiency by reducing the amount of computation. At present, there are two approaches to improving the computational efficiency of RTM: optimized algorithms that reduce the amount of computation, and high-performance computing. Villarreal and Scales used a large-scale cluster to implement parallel forward simulation of the wave equation with the finite difference method [8]. With the advent of CPU/GPU cooperative parallel acceleration, Li et al. implemented the reverse time computation with a CPU/GPU cooperative parallel algorithm and greatly improved the computational efficiency [9]. Liu et al. adopted a multi-GPU-card computing method on a CPU/GPU cluster platform to implement three-dimensional RTM [10]. Zhao et al. exploited the strong computing power of the CPU/GPU heterogeneous platform, accepting additional computation in exchange for reduced storage and I/O, and implemented pre-stack seismic RTM [11].
Large survey blocks, complex geology, and increased data dimensions lead to ever larger amounts of computation and storage in RTM data processing. Frequent reading and writing of storage devices during processing also creates I/O bottlenecks [12]. Therefore, this paper studies the parallel computation of RTM data, focusing on RTM data processing methods in a cloud computing environment. CPU/GPU cooperative computing is applied to RTM and, based on an in-depth study of GPU technology and the MapReduce programming model, a MapReduce parallel computing model for multi-GPU nodes in a cloud computing environment, C-GMR, is proposed. Finally, the correctness and effectiveness of the method are verified by experiments.
Theoretical research framework of RTM data processing method
The principle of RTM
Pre-stack RTM is essentially a process of wave field continuation and imaging. For each shot, the RTM calculation is divided into three steps: forward calculation, reverse time extrapolation, and application of the imaging condition [13]. The specific steps are as follows [14]:
(1) Wave field forward calculation. The source wave field is propagated from time zero to the maximum time, and the wave field at each time step is saved.
(2) Reverse time extrapolation. The receiver (detection point) wave field is propagated from the maximum time back to time zero, and the wave field at each time step is saved.
(3) Imaging. The two wave fields are read at the same time step, and the imaging condition is applied to them.
After the migration of each single shot is completed, the migration results of all shots are stacked to obtain the final pre-stack RTM image volume.
Finite difference method
Throughout the RTM process, the core of the wave field numerical simulation is the solution of the wave equation [15]. In this paper, a high-order finite difference method is used to discretize the wave equation numerically. The finite difference expression of the forward wave field continuation can be expressed as:
The wave field backward continuation formula is:
In the formula,
Taking second-order differencing in time and high-order differencing in space as an example, the stability condition in the two-dimensional case is as follows:
where
The principle of the PML boundary condition is shown in Fig. 1. The attenuation factor is 0 in the effective calculation area O, and the attenuation factor is greater than zero in the absorption layers A, B, and C. The cosine attenuation factors in the
In the
In the
where
Fig. 1. Schematic diagram of PML.
The idea of the random boundary condition is to destroy the coherence of the waves reflected from the artificial boundary, so that the boundary reflections cannot be imaged [16]. The random boundary function can be expressed as follows:
Where
The most direct way to implement the time-consistent imaging principle is the zero-lag cross-correlation between the source wave field and the receiver wave field, that is, between the forward-propagated and reverse-propagated wave fields [17]:
Where
Throughout the RTM process, taking into account factors such as computation, storage, and low-frequency noise, random boundary conditions and perfectly matched layer (PML) absorbing boundary conditions are used in the wave field forward modeling. Finally, a Gaussian filter and a Laplacian filter are used for denoising after imaging [18, 19].
The flow chart of the RTM algorithm is shown in Fig. 2. The RTM algorithm involves three key techniques: forward continuation of the source wave field, reverse continuation of the receiver wave field, and a suitable imaging condition. The source wave field is computed forward in time, whereas the wave field of the shot record is computed backward in time. To apply the zero-lag cross-correlation to the two wave fields at the same time step, one of them must therefore be saved. This is the storage problem that RTM must face.
Fig. 2. The flow chart of RTM algorithm.
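The zero-lag cross-correlation imaging condition and the stacking of single-shot images described above can be illustrated with a minimal NumPy sketch. The array layout `(nt, nz, nx)` and the function names are assumptions made for illustration only; they are not taken from the paper's implementation.

```python
import numpy as np

def crosscorrelation_image(source_wf, receiver_wf):
    """Zero-lag cross-correlation: I(z, x) = sum over t of S(t, z, x) * R(t, z, x)."""
    if source_wf.shape != receiver_wf.shape:
        raise ValueError("wave fields must share the same (nt, nz, nx) shape")
    return np.sum(source_wf * receiver_wf, axis=0)

def stack_shots(shot_wavefields):
    """Sum the single-shot images into one stacked image."""
    images = [crosscorrelation_image(s, r) for s, r in shot_wavefields]
    return np.sum(images, axis=0)

# Tiny synthetic example: 2 shots, 4 time steps, 3 x 3 grid (random placeholder data).
rng = np.random.default_rng(0)
shots = [(rng.standard_normal((4, 3, 3)), rng.standard_normal((4, 3, 3)))
         for _ in range(2)]
image = stack_shots(shots)
```

Both wave fields have to be available at the same time step for the product, which is why one of them is normally stored or recomputed.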
To address the problem of data regularization, many scholars have proposed and developed reconstruction methods based on the wave equation, predictive filtering theory, various mathematical transform domains, matrix rank reduction theory, and, more recently, the popular compressed sensing theory [20]. Compressed sensing theory states that any signal with a sparse representation can be recovered with high precision from a small number of samples by solving a suitable optimization problem, provided that the sampling matrix is incoherent with the sparse basis of the signal [21]. Commonly used sparse transforms include the Fourier transform, the wavelet transform, and the curvelet transform.
Regularization is not an end in itself in seismic exploration. The ultimate goal of seismic data processing is to improve RTM calculation efficiency and obtain high-precision seismic imaging profiles. Therefore, this paper uses an iterative thresholding algorithm that constrains the seismic signal in the Fourier domain to regularize the data before RTM imaging. We then focus on how to improve the efficiency and accuracy of the reverse time migration calculation.
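To make the idea of iterative thresholding in the Fourier domain concrete, the sketch below fills in missing samples of a 1-D signal. It is a POCS-style variant with a geometrically decaying hard threshold, which is an assumption: the paper does not specify the exact algorithm, the threshold schedule, or the parameter values, and the function and parameter names are hypothetical.

```python
import numpy as np

def fourier_threshold_reconstruct(observed, mask, n_iter=50,
                                  tau_start=0.9, tau_end=0.01):
    """Fill missing samples by iterative hard thresholding in the Fourier domain.

    observed : 1-D array with zeros at the missing positions
    mask     : boolean array, True where a sample was actually recorded
    """
    x = observed.astype(float).copy()
    peak = np.abs(np.fft.fft(observed)).max()
    # Threshold decays geometrically from tau_start*peak to tau_end*peak.
    for tau in peak * np.geomspace(tau_start, tau_end, n_iter):
        spectrum = np.fft.fft(x)
        spectrum[np.abs(spectrum) < tau] = 0.0     # keep only strong coefficients
        estimate = np.fft.ifft(spectrum).real
        x = np.where(mask, observed, estimate)     # re-insert the recorded samples
    return x

# Synthetic test signal: one cosine, roughly 40 % of the samples removed.
n = 128
t = np.arange(n)
clean = np.cos(2 * np.pi * 5 * t / n)
mask = np.random.default_rng(0).random(n) > 0.4
observed = np.where(mask, clean, 0.0)
recovered = fourier_threshold_reconstruct(observed, mask)

err_zero_fill = np.linalg.norm((observed - clean)[~mask])
err_recovered = np.linalg.norm((recovered - clean)[~mask])
```

For real seismic gathers the same loop would run on multi-dimensional transforms of the data; the 1-D case is used here only to keep the example short.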
Analysis of data processing in RTM
In RTM processing of seismic data, each data processing job consists of multiple data processing modules, and the data are usually processed shot by shot. Each cycle includes the input, calculation, and output of shot data. The flow chart of data processing is shown in Fig. 3.
Fig. 3. The data processing of RTM.
As Fig. 3 shows, RTM data processing consists of a data input module, a data processing module, and a data output module. First, the seismic data files are read. The index of the seismic data is searched to obtain the offset of each seismic gather in the file, and the gathers are assembled into shot data according to the shot records. Then the shot data are read into memory and processed, and the results of all single-shot calculations are stacked. Finally, a new seismic data file is generated. Traditionally, RTM data processing runs in serial mode. Because the data volume is huge and all operations are floating-point, this leads to long run times and low computational efficiency. After processing, the data are often stored on a local disk array. Such centralized storage of large amounts of data may cause I/O bottlenecks, which result in low data access efficiency.
At present, exploiting the parallelism of the RTM algorithm, RTM in a cluster environment usually uses parallel processing with multiple computing nodes and multiple service modules to improve data processing efficiency, as shown in Fig. 4.
Fig. 4. The data parallel processing of RTM.
CPU/GPU cooperative RTM parallel computing
We introduce the CUDA programming model into the reverse time wave field continuation and correlation imaging algorithms, so that the RTM computation can be handled in CPU/GPU cooperative mode [22]. The main idea is that the GPU performs the parallel computing tasks, such as wave field continuation and correlation imaging, while the CPU handles data input/output and other related operations.
Thanks to the powerful computing capability of GPU devices, CPU/GPU cooperation achieves high parallel efficiency, but its own drawbacks also limit its application in large-scale computing [23]. In practice, a single GPU device cannot provide a one-to-one correspondence between computing grid points and threads. This leads to excessive per-thread load and reduces the advantage of GPU parallel computing [24].
To address the limitations of single-card computation, this paper designs a multi-GPU scheme to process large-scale computing tasks in parallel. Large data sets are stored in blocks and the computing tasks are dispatched block by block, which realizes cooperative parallel computing among multiple cards. The main idea is to divide the calculation data into blocks according to the number of devices. To limit the data exchange delay between GPU devices during the calculation, the data are divided along the direction that gives the smallest block cross section. The data storage and computing tasks assigned to each GPU device should take the performance of that device into account. When the performance parameters of the GPU devices are similar, the tasks can be allocated approximately evenly. Using multiple GPUs also overcomes the memory limit of a single GPU.
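The block division described above can be sketched as follows. The code reads "smallest block cross section" as "cut along the axis whose interface plane is smallest", which is one interpretation of the text, and it splits the planes in proportion to a per-device performance weight. The function names, the grid size, and the weights are hypothetical.

```python
def choose_split_axis(shape):
    """Pick the axis whose cut plane (block cross section) is smallest.

    Cutting across axis a exposes an interface of prod(shape) / shape[a] points,
    so the smallest interface corresponds to the longest axis.
    """
    return max(range(len(shape)), key=lambda a: shape[a])

def split_by_performance(n, weights):
    """Split n grid planes into contiguous (start, stop) blocks proportional to weights."""
    total = float(sum(weights))
    bounds, acc = [0], 0.0
    for w in weights:
        acc += w
        bounds.append(round(n * acc / total))
    bounds[-1] = n  # guard against rounding drift
    return list(zip(bounds[:-1], bounds[1:]))

grid = (200, 400, 100)                                    # hypothetical 3-D grid
axis = choose_split_axis(grid)                            # longest axis
even_blocks = split_by_performance(grid[axis], [1, 1, 1, 1])   # four similar GPUs
uneven_blocks = split_by_performance(grid[axis], [2, 1, 1])    # one faster GPU
```

A finite difference stencil also needs a few overlapping planes (a halo) exchanged between neighbouring blocks at every time step; that exchange is omitted here.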
Fig. 5. The data processing of RTM in cloud environment.
Based on the characteristics of RTM data processing, we design the data processing flow in the cloud environment, as shown in Fig. 5.
RTM data processing in the cloud environment adopts multi-node, multi-task parallel processing. It includes data input, data segmentation, task allocation, calculation, result merging, and data output. First, the large-scale seismic data file is input and divided into many small data blocks. Each small data block is defined as a calculation task, and the task scheduler distributes the tasks to the computing nodes for parallel processing. After a node receives a task, it completes the forward continuation of the source wave field and the reverse continuation of the receiver wave field, and applies the imaging condition to complete the migration imaging calculation. Finally, after all tasks are processed, the imaging data from all computing nodes are stacked to complete the reverse time migration imaging of the entire seismic data set, and the imaging results are output as files.
Parallel computing model C-GMR based on GPU-MapReduce
C-GMR design ideas
At present, the GPU has gradually become the main technology for graphics computing and is widely used in high-performance computing fields such as seismic exploration data processing. In recent years, the combination of GPU technology and the MapReduce parallel computing model has also been studied extensively [25, 26]. Based on an in-depth study of GPU technology and the MapReduce programming model, this paper proposes a MapReduce parallel computing model for multiple GPU nodes in a cloud environment, called C-GMR (Cloud GPU MapReduce). The main idea of C-GMR is to split the traditional MapReduce process into three task processing stages, Map, LocalReduce, and Reduce, and to combine the GPU resources of multiple nodes for cooperative computing. Each node completes its own Map and LocalReduce tasks, merging its single-node results into intermediate result data. Then Reduce tasks executed across the nodes merge the intermediate result data into the final result data.
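The three-stage Map, LocalReduce, Reduce flow can be sketched in plain Python as below. This is only a sequential illustration of the data flow: the key name `"image"`, the round-robin assignment of shots to nodes, and the placeholder "migration" (a zero-lag correlation of two arrays) are all assumptions, and none of the function names come from the C-GMR implementation.

```python
import numpy as np

def map_shot(shot):
    """Map: process one shot and emit a (key, partial_image) pair."""
    _shot_id, (source_wf, receiver_wf) = shot
    return "image", np.sum(source_wf * receiver_wf, axis=0)

def local_reduce(pairs):
    """LocalReduce: merge the partial images produced on one node."""
    return "image", sum(img for _, img in pairs)

def reduce_nodes(node_results):
    """Reduce: merge the per-node intermediate images into the final image."""
    return sum(img for _, img in node_results)

def run_three_stage(shots, n_nodes):
    # Round-robin assignment of shots to nodes (assumed policy).
    node_shots = [shots[i::n_nodes] for i in range(n_nodes)]
    node_results = [local_reduce([map_shot(s) for s in part])
                    for part in node_shots if part]
    return reduce_nodes(node_results)

rng = np.random.default_rng(1)
shots = [(i, (rng.standard_normal((4, 3, 3)), rng.standard_normal((4, 3, 3))))
         for i in range(6)]
image_4_nodes = run_three_stage(shots, n_nodes=4)
image_1_node = run_three_stage(shots, n_nodes=1)
```

Because stacking is a plain sum, the result does not depend on how the shots are grouped, which is what allows the per-node merge to run before the global one.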
The task function is defined as:
The Map function receives the key-value pair
C-GMR model structure
Following the C-GMR data processing idea, the model structure is designed by considering factors such as node management, data segmentation, task scheduling, and calculation methods. The logical structure of the C-GMR model consists of three parts: the user layer, the management layer, and the calculation layer, as shown in Fig. 6.
Fig. 6. The logic structure diagram of C-GMR model.
User layer: It is mainly responsible for sending calculation task requests to the management layer. After a calculation task is completed, the management layer returns the result to the user layer.
Management layer: The master node is responsible for managing the status and load information of all nodes. The JobNode, as the work management node, is responsible for data segmentation, the distribution and scheduling of computing tasks, and the monitoring of task execution.
Calculation layer: The TaskNode is the task execution node. After receiving a task request, it calls the GPU Task to perform the parallel computing task and sends the result to the execution node after completion.
The C-GMR model design considers the cooperative working mechanism among multiple nodes, including inter-node and intra-node parallelism and the integration of the parallel processing results from different nodes. In addition, a dynamic data distributor is designed to monitor, in real time, the GPU memory usage of each node and the execution status of the distributed tasks, so that, according to specific requirements, each divided data block can be dynamically allocated to the most suitable GPU resource for processing.
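One simple way to picture such a distributor is a greedy rule that always hands the next block to the GPU with the most free memory. The sketch below implements that rule with a heap. It is an assumption about the policy, not the paper's distributor: it works from a static snapshot of free memory rather than real-time monitoring, and the function name, block sizes, and memory figures are hypothetical.

```python
import heapq

def distribute_blocks(block_sizes, gpu_free_mem):
    """Greedy sketch: assign each block to the GPU that currently has the most free memory.

    block_sizes  : list of block sizes (same unit as the memory figures, e.g. MB)
    gpu_free_mem : dict mapping gpu_id -> free memory
    Returns a dict mapping gpu_id -> list of block indices.
    """
    # heapq is a min-heap, so store negated free memory to pop the largest first.
    heap = [(-free, gpu) for gpu, free in gpu_free_mem.items()]
    heapq.heapify(heap)
    plan = {gpu: [] for gpu in gpu_free_mem}
    # Place the largest blocks first.
    for idx in sorted(range(len(block_sizes)), key=lambda i: -block_sizes[i]):
        neg_free, gpu = heapq.heappop(heap)
        free = -neg_free
        if block_sizes[idx] > free:
            raise MemoryError(f"block {idx} does not fit on any GPU")
        plan[gpu].append(idx)
        heapq.heappush(heap, (-(free - block_sizes[idx]), gpu))
    return plan

plan = distribute_blocks([3000, 2000, 2000, 1000], {"gpu0": 6000, "gpu1": 6000})
```

A real distributor would refresh the free-memory figures as tasks finish and could also weigh the task execution status mentioned above.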
CPU/GPU cooperative acceleration analysis
In the RTM experiment, we use a finite difference scheme of second order in time and tenth order in space to complete the forward calculation of the source wave field and the reverse extrapolation of the receiver wave field. According to the time-consistent imaging principle, the cross-correlation imaging condition is used to extract the image points. After imaging, a Laplacian filter is applied for denoising to obtain the final imaging result [28].
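A scheme of second order in time and tenth order in space can be sketched in NumPy as below for the 2-D constant-density acoustic wave equation. The coefficients are the standard central-difference weights for a tenth-order second derivative. The function names, the `(nz, nx)` array layout, and the choice of the acoustic equation are assumptions for illustration, so this is not necessarily identical to the scheme used in the experiments.

```python
import numpy as np

# Central-difference weights for the second derivative, tenth-order accurate.
C10 = np.array([-5269/1800, 5/3, -5/21, 5/126, -5/1008, 1/3150])
H = len(C10) - 1   # stencil half-width: 5 points on each side

def laplacian_10th(u, dz, dx):
    """Tenth-order Laplacian on the interior of a 2-D (nz, nx) array; edges stay zero."""
    nz, nx = u.shape
    lap = np.zeros_like(u, dtype=float)
    acc = C10[0] * u[H:nz-H, H:nx-H] * (1.0/dz**2 + 1.0/dx**2)
    for k in range(1, H + 1):
        acc = acc + C10[k] * (u[H+k:nz-H+k, H:nx-H] + u[H-k:nz-H-k, H:nx-H]) / dz**2
        acc = acc + C10[k] * (u[H:nz-H, H+k:nx-H+k] + u[H:nz-H, H-k:nx-H-k]) / dx**2
    lap[H:nz-H, H:nx-H] = acc
    return lap

def step(u_prev, u_curr, vel, dt, dz, dx):
    """Second-order time update: u_next = 2*u_curr - u_prev + (vel*dt)^2 * Laplacian(u_curr)."""
    return 2.0*u_curr - u_prev + (vel*dt)**2 * laplacian_10th(u_curr, dz, dx)

def max_stable_dt(vmax, dz, dx):
    """Largest time step allowed by the von Neumann bound for this stencil."""
    s = np.abs(C10[0]) + 2.0*np.abs(C10[1:]).sum()
    return 2.0 / (vmax * np.sqrt(s * (1.0/dz**2 + 1.0/dx**2)))

# Sanity check on u = z^2 + x^2, whose Laplacian is exactly 4.
z = np.arange(16) * 0.5
x = np.arange(16) * 0.5
u = z[:, None]**2 + x[None, :]**2
lap = laplacian_10th(u, 0.5, 0.5)
```

Because the update is symmetric in time, the same `step` function advances the receiver wave field backward when the roles of the previous and next time levels are exchanged.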
In the experimental environment, the test node is configured with one CPU and one GPU card. The GPU card is an NVIDIA GDDR5 (3072 Core/6 GB), and the CPU processor is Intel
In the experiment, the Sigsbee velocity model was chosen. The model data consisted of 300 shots, the sampling interval was 8 ms, the grid size was
For the Sigsbee model, pre-stack reverse time migration was carried out separately on the CPU and on the GPU, and the CPU calculation and the GPU parallel acceleration were timed separately. The migration of a single shot takes 46,870 s on the CPU and 562 s on the GPU. At these rates, migrating all 300 shots takes 234,350 min on the CPU and 2810 min on the GPU. Compared with traditional serial CPU computing, CPU/GPU cooperative parallel computing is therefore about 83 times faster for RTM. The introduction of GPU parallel acceleration into the reverse time migration algorithm thus greatly improves the computational efficiency. The comparison of calculation time between CPU and GPU is shown in Fig. 7.
Fig. 7. The comparison of calculation time between CPU and GPU.
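The figures quoted above follow directly from the two single-shot timings, assuming that every one of the 300 shots costs the same as the measured shot:

```latex
\begin{align*}
\text{speedup} &= \frac{46\,870\ \text{s}}{562\ \text{s}} \approx 83.4,\\
T_{\text{CPU}} &= 300 \times 46\,870\ \text{s} = 14\,061\,000\ \text{s} = 234\,350\ \text{min},\\
T_{\text{GPU}} &= 300 \times 562\ \text{s} = 168\,600\ \text{s} = 2810\ \text{min}.
\end{align*}
```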
We apply the C-GMR parallel computing model to RTM and test the performance of the parallel reverse time migration algorithm. The 120-shot Marmousi model data and the 300-shot Sigsbee model data were used for the pre-stack RTM experiment. In the experiment, a finite difference scheme of second order in time and tenth order in space was used to complete the forward calculation of the source wave field and the reverse extrapolation of the receiver wave field. The cross-correlation imaging condition was used for imaging, and a Laplacian filter was then applied to obtain the final migration imaging result.
There are four compute nodes, and each node is equipped with one NVIDIA GDDR5 (3072 Core/6 GB) GPU and one Intel
Because the computing nodes have the same configuration, the model data are partitioned evenly. The resulting data block tasks are assigned to the corresponding computing nodes, and the four GPU nodes carry out the migration calculation together. After the computation is complete, the imaging results of all nodes are stacked to generate the final reverse time migration image.
Over the whole experiment, a single-core CPU, a single GPU card, two GPU nodes, and four GPU nodes were each used to run the pre-stack RTM test on the two sets of model data. We recorded the time cost of each run and the total RTM time for each calculation method. The computing times are shown in Table 1.
Table 1. Test computing time (s)
The table shows that, for both models, RTM on the CPU is very time-consuming, whereas GPU calculation greatly reduces the calculation time and achieves a speedup of nearly 60 times.
When the C-GMR model is used for parallel computing across nodes, the execution time of the RTM calculation decreases as the number of computing nodes increases, and the computational efficiency of RTM improves greatly.
Based on the migration computing times for the two data sets, the speedup ratios of the four methods above are plotted in Fig. 8.
Fig. 8. The speed ratio of test computing.
From the above results, it can be concluded that single-card GPU computing is more than 60 times faster than single-core CPU computing, which reflects the advantage of the GPU in RTM computation. With two C-GMR nodes, the speedup is more than 150 times relative to a single CPU core. With four C-GMR nodes, the speedup exceeds 250 times relative to a single CPU core and reaches 280 times in the Sigsbee model experiment. Using multiple C-GMR GPU nodes to complete the RTM calculation can therefore greatly improve the computational efficiency, and the more nodes are used, the more pronounced the effect.
Conclusion
This paper focuses on parallel data processing for RTM. The calculation method of pre-stack RTM imaging is analyzed. To address the large amount of calculation and the low efficiency, a multi-node cooperative processing method in a cloud computing environment is introduced into the RTM calculation. A CPU/GPU cooperative RTM calculation model and a multi-GPU-card joint RTM calculation model are proposed. A MapReduce parallel computing model for multi-GPU nodes in a cloud computing environment, C-GMR, is designed and applied. After the data are regularized, the proposed method is used to accelerate the computation in parallel; the arc artifacts in the RTM result are significantly reduced, which benefits data interpretation. The experimental results show that the C-GMR cloud computing method does not affect the imaging accuracy, alleviates the computational problems of current seismic data processing, and effectively improves the calculation efficiency of RTM.
To further improve the accuracy of migration imaging, our future work will focus on the theory and methods of data regularization and reconstruction that yield seismic data with high signal-to-noise ratio, high resolution, and high fidelity.
Acknowledgments
This study was supported by the Scientific Research Project of Hubei Provincial Department of Education (No: B2017438).
