General Paradigm of Edge-Based Internet of Things Data Mining for Geohazard Prevention

Abstract

Geological hazards (geohazards) are geological processes or phenomena formed under externally induced factors that cause losses to human life and property. Geohazards are sudden, cause great harm, and have broad ranges of influence, which bring considerable challenges to geohazard prevention. Monitoring and early warning are the most common strategies to prevent geohazards. With the development of the internet of things (IoT), IoT-based monitoring devices provide rich and fine-grained data, making geohazard monitoring and early warning more accurate and effective. IoT-based monitoring data can be transmitted to a cloud center for processing to provide credible data references for geohazard early warning. However, the massive number of IoT devices consumes most of the resources of the cloud center, which increases the data processing delay. Moreover, limited bandwidth restricts the transmission of large amounts of geohazard monitoring data. Thus, in some cases, cloud computing cannot meet the real-time requirements of geohazard early warning. Edge computing processes data close to the data source rather than in the cloud center, which provides the opportunity for rapid processing of monitoring data. This article presents the general paradigm of edge-based IoT data mining for geohazard prevention, especially monitoring and early warning. The paradigm mainly includes data acquisition, data mining and analysis, and data interpretation. Moreover, a real case is used to illustrate the details of the presented general paradigm. Finally, this article discusses several key problems for the general paradigm of edge-based IoT data mining for geohazard prevention.

Introduction

Geological hazards (geohazards) are geological phenomena formed under the influence of external factors that damage human life, property, and the environment.^1,2 Generally, geohazards include landslides, debris flows, rock collapses, surface subsidence, surface cracks, and surface collapses. Geohazards tend to occur in groups, arise suddenly in susceptible areas, carry high risks, and are difficult to forecast. They cause significant casualties and property losses, which create considerable challenges for geohazard prevention. For example, in 2006, a landslide in the Philippines destroyed a village and killed >1100 people.³ In 2008, the Wenchuan earthquake severely damaged >100,000 km² and killed 69,227 people.⁴

Monitoring and early warning are essential strategies in geohazard prevention.^5–7 They aim to collect geohazard-related data, predict the development trend of the monitored target, and provide geohazard early warning information. However, the suddenness of geohazards and their hostile monitoring environments make monitoring and early warning challenging. Traditional manual measurement and prevention can no longer meet the current needs of geohazard monitoring and early warning.

The rapid development of internet of things (IoT) technology^8,9 makes geohazard monitoring and early warning more accurate and effective. An IoT-based geohazard early warning platform includes three layers: the perception layer, the transmission layer, and the application layer.^10,11 IoT-based monitoring devices collect changing geohazard-related indicators, such as displacement, rainfall, and water level. The monitoring data are then transmitted to the application layer through wired or wireless links. Finally, the application layer processes the data and sends the results to the command center, providing adequate data support for geohazard early warning. The application layer is usually built on cloud computing, which plays a key role in geohazard monitoring and early warning because of its huge computing capabilities.

The explosive growth of IoT monitoring data increases the load on cloud computing, causing inevitable time delay.¹² Moreover, geohazard monitoring data usually have the characteristics of big data:¹³ the data types are increasingly diverse, and the data volume keeps growing. Limited network bandwidth and the high load of the cloud center increase the delay of cloud computing in geohazard early warning. Because geohazard early warning requires rapid processing of monitoring data, cloud computing becomes unreliable when this delay is high. In addition, where the monitoring environment is particularly harsh, it can be difficult to transmit monitoring data to the cloud center because transfer devices such as base stations are hard to install and costly. Thus, choosing a more suitable data processing strategy to cope with the suddenness and great harm of geohazards is a challenge.

Edge computing can effectively reduce the delay by processing tasks near the data source.¹⁴ The monitoring data collected by IoT devices do not have to be transmitted to the cloud center, which reduces the delay of data processing and provides more time for geohazard early warning. Compared with cloud computing, edge computing has the advantages of lower delay, higher security, stronger scalability, location awareness, lower costs, and lower traffic.^15,16 Therefore, edge computing has excellent potential in geohazard monitoring and early warning.

However, edge computing has not been widely introduced into the application scenario of geohazard prevention, although it has achieved many successes in intelligent medicine,^17,18 automatic driving,^19,20 smart homes,²¹ industrial IoT,²² and other scenarios.^23,24

To fill the above gap, this article introduces edge computing to the field of geohazard monitoring and early warning. Our contributions can be summarized as follows. (1) A general paradigm of edge-based IoT data mining for geohazard prevention is presented. It especially focuses on geohazard monitoring and early warning, including data acquisition, data mining and analysis, and data interpretation. (2) A real case of surface deformation induced by coal mining is used to illustrate the details of the presented general paradigm. (3) Challenges and potential future work in edge-based geohazard prevention are summarized.

This article is organized as follows. The Background section provides a brief presentation of geohazard prevention and edge computing. The Method: Mining and Analysis of Edge-Based IoT Data for Geohazard Prevention section presents the general paradigm of data mining and analysis in geohazard monitoring and early warning. The Application: A Practical Case section uses a practical case to illustrate the details of the presented general paradigm. The Discussion section discusses the challenges and potential future work in edge-based geohazard prevention. The Conclusion section summarizes this article.

Background

Geohazard prevention

Geohazard prevention^25,26 uses practical strategies to intervene in the occurrence of geohazards such as landslides and surface subsidence. Roughly, geohazard prevention strategies fall into two categories. First, geological engineering measures are used to control geohazards directly, for example, anchor bars that reinforce a landslide. Second, IoT-based devices are used for geohazard monitoring and early warning.^27,28 By monitoring changes in the monitored body and in external inducing factors, these devices raise an alarm before or when a geohazard occurs, and residents in the danger area are notified to evacuate in time through various communication channels.

Edge computing

Edge computing is an emerging computing mode that extends cloud computing to the network edge through edge devices. Edge computing and cloud computing are not independent; they are closely related to each other. The edge computing architecture includes the terminal layer, edge layer, and cloud layer^29,30 (Fig. 1).

FIG. 1.

An illustration of edge computing. Color images are available online.

Terminal layer

The terminal layer is composed of various IoT devices, such as sensors and card readers. The terminal devices aim to collect the original data and transmit them to the upper layer for computing and storage.

Edge layer

The edge layer is composed of many edge nodes, such as routers, mobile phones, and gateways. Edge nodes can compute and store the data uploaded by the terminal devices. Because edge nodes are close to users, they can serve delay-sensitive scenarios to satisfy the real-time needs of users. Edge nodes can also preprocess the collected data and then transmit the preprocessed data to the cloud layer, thus enhancing the efficiency of the cloud center.
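The division of labor between the edge and cloud layers described above can be sketched as follows. This is a minimal illustration under assumed conditions, not the architecture of any specific platform: the `EdgeNode` class, its alarm threshold, and the four-reading upload window are hypothetical. The node decides on alarms locally (the delay-sensitive path) and forwards only a compact summary to the cloud layer (the preprocessing path).

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class EdgeNode:
    """Hypothetical edge node: local alarm decisions plus summarized uploads."""
    alarm_threshold: float  # assumed local alarm limit (e.g., displacement in mm)
    window: int = 4         # assumed number of readings aggregated per upload
    buffer: List[float] = field(default_factory=list)
    alarms: List[float] = field(default_factory=list)
    uploads: List[Dict[str, float]] = field(default_factory=list)

    def ingest(self, reading: float) -> None:
        # Delay-sensitive path: the alarm decision is made locally,
        # without a round trip to the cloud center.
        if reading >= self.alarm_threshold:
            self.alarms.append(reading)
        # Preprocessing path: buffer raw readings and upload only a summary.
        self.buffer.append(reading)
        if len(self.buffer) == self.window:
            self.uploads.append({
                "min": min(self.buffer),
                "mean": mean(self.buffer),
                "max": max(self.buffer),
            })
            self.buffer.clear()


node = EdgeNode(alarm_threshold=10.0)
for r in [1.0, 2.0, 3.0, 12.0, 4.0]:
    node.ingest(r)
```

Here the reading of 12.0 raises a local alarm immediately, while the cloud receives one summary record instead of four raw readings.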

Cloud layer

The cloud layer provides huge computing power and storage capacity, enabling it to execute complicated computing tasks. The cloud layer and edge layer are not independent; the cloud layer can schedule edge nodes from a global perspective, making the best use of the computing power at the edge.

Method: Mining and Analysis of Edge-Based IoT Data for Geohazard Prevention

Overview

This article presents the general paradigm of edge-based IoT data mining for geohazard prevention, especially focusing on monitoring and early warning. Geohazard monitoring and early warning are conducted according to the IoT monitoring data. The process can be divided into three phases: (1) data acquisition, in which various targeted monitoring sensors based on IoT technology can continuously monitor the status of hidden danger points, providing extremely rich and fine data for geohazard early warning; (2) data transmission, in which monitoring data are transmitted to the computing center through wired or wireless technology; and (3) data application, including data mining and analysis and data interpretation, which can be conducted in the cloud center or edge computing devices.

The mining and analysis of geohazard monitoring data are the core of geohazard early warning. Cloud computing or edge computing can comprehensively analyze the monitoring data to predict the state of the monitored target and effectively prevent geohazards.^11,31

As the above process shows, cloud computing and edge computing differ in where monitoring data are mined and analyzed. Edge computing performs this processing as close as possible to where the data are collected, which reduces data processing time and suits geohazard prevention scenarios with strict low-delay requirements.

In the following subsections, the three main phases of edge-based IoT data mining for geohazard prevention, including data acquisition, data mining and analysis, and data interpretation, are presented.

Phase 1: data acquisition

IoT technology can provide adequate monitoring data for geohazard prevention despite several challenges, including wide monitoring areas, harsh monitoring environments, long monitoring periods, and diverse inducing factors.

IoT-based sensors work well in geohazard data acquisition. Sensors can continuously monitor hidden danger points and realize automatic monitoring. Moreover, the monitoring frequency can be set remotely according to the needs of each hidden danger point, which helps overcome the harsh environment and long monitoring period. In addition, various geohazard monitoring devices are available, such as rainfall gauges, displacement meters, stress meters, inclinometers, and video systems, and different devices can be selected according to different needs, which addresses the problem of many inducing factors. Multiple sensors can also form a sensor network, such as cellular IoT (Fig. 2).³² Sensor networks can meet large-scale and long-term geohazard monitoring requirements because of their broad coverage, low power consumption, and low costs.^33–36 An IoT-based sensor network thus provides the data foundation for edge computing.

FIG. 2.

An illustration of a sensor network for monitoring landslide. Color images are available online.

Phase 2: data mining and analysis

The mining and analysis of geohazard monitoring data can estimate the probability of geohazards from data collected continuously over the same area at different times. Adequate monitoring data and effective data models are the core of data mining and analysis. The data preprocessing and data modeling processes are presented in the following paragraphs.

Data preprocessing

Data preprocessing is an essential step in geohazard monitoring data mining and analysis. Monitoring data are large in volume, diverse in type, and of great value. However, because of the harsh monitoring environment, the original data collected by monitoring devices are often dirty, incomplete, redundant, and fuzzy, and can hardly meet the requirements of data mining algorithms directly. Thus, it is challenging to obtain clean, accurate, and concise data.³⁷

For geohazard monitoring data, this article summarizes four common preprocessing methods: data cleaning, data integration, data selection, and data transformation.^38,39

Data cleaning

Device failures and human factors may introduce noise into geohazard monitoring data or cause data loss, affecting the quality of data sources. How incorrect data are processed depends on the specific situation. When a large proportion of data is missing, the affected data can be removed, because filling in so many values would easily introduce inaccurate data and produce wrong results in the mining task. When only a small amount of data is missing, regression and interpolation methods⁴⁰ can be used to fill in the missing values. For noisy data, binning, clustering, and regression can be used.⁴¹
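The missing-data rules above can be sketched as follows. This is an illustration, not the article's procedure: the 30% removal threshold is an assumed value, linear interpolation stands in for the interpolation methods cited, and bin-mean smoothing is applied to consecutive (time-ordered) readings, which is one simple variant of binning.

```python
from typing import List, Optional

MAX_MISSING_RATIO = 0.3  # assumed cutoff; the article gives no specific value


def should_drop(values: List[Optional[float]], max_ratio: float = MAX_MISSING_RATIO) -> bool:
    """Drop a series when too large a share of it is missing (None)."""
    missing = sum(v is None for v in values)
    return missing / len(values) > max_ratio


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill gaps by linear interpolation between the nearest observed neighbors;
    leading or trailing gaps take the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(values[right])
        elif right is None:
            filled.append(values[left])
        else:
            w = (i - left) / (right - left)
            filled.append(values[left] + w * (values[right] - values[left]))
    return filled


def smooth_by_bin_means(values: List[float], bin_size: int) -> List[float]:
    """Replace each reading by the mean of its bin of consecutive readings."""
    smoothed: List[float] = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        smoothed.extend([sum(chunk) / len(chunk)] * len(chunk))
    return smoothed
```

In practice, a series would first pass through `should_drop`, and only the retained series would be interpolated and smoothed.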

Data integration

The data types and structures derived from geohazards are various and complicated. Moreover, the data formats of monitoring devices are different. To better conduct data mining and analysis, different monitoring data need to be integrated.⁴²
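A minimal integration sketch, assuming two hypothetical device formats: a rain gauge that reports ISO-8601 UTC timestamps with rainfall in millimeters, and a displacement meter that reports (epoch seconds, centimeters) tuples. Each record is converted to a common schema keyed by timestamp, with units unified to millimeters; the field names are illustrative only.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Tuple

Parsed = Tuple[int, Dict[str, float]]


def from_rain_gauge(record: dict) -> Parsed:
    # Hypothetical format: {"time": "2011-05-08T10:00:00Z", "rain_mm": 12.5}
    ts = datetime.strptime(record["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(ts.timestamp()), {"rainfall_mm": float(record["rain_mm"])}


def from_displacement_meter(record: Tuple[float, float]) -> Parsed:
    # Hypothetical format: (epoch_seconds, displacement_cm); converted to mm.
    epoch, disp_cm = record
    return int(epoch), {"displacement_mm": disp_cm * 10.0}


def integrate(*streams: Tuple[Callable[..., Parsed], Iterable]) -> List[dict]:
    """Merge records from several devices into one row per timestamp."""
    merged: Dict[int, Dict[str, float]] = {}
    for parser, records in streams:
        for rec in records:
            ts, fields = parser(rec)
            merged.setdefault(ts, {}).update(fields)
    return [{"timestamp": ts, **merged[ts]} for ts in sorted(merged)]
```

Adding a new device type then only requires a new parser function; the merge step is unchanged.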

Data selection

Geohazards have many inducing factors, and their monitoring is long term and uninterrupted; thus, large amounts of data of multiple types accumulate. To ensure accuracy and low delay, valuable data must be selected from the data sources for mining. For example, most landslides in mountainous areas are triggered by heavy or continuous rainfall, so rainfall and water level changes should be used as input data for the prediction model.
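One simple way to rank candidate inputs is a correlation filter. The sketch below is an assumption for illustration, not the article's method: it ranks hypothetical factor series by the absolute Pearson correlation with a target series (here, displacement) and keeps the top k. Correlation only captures linear association, so domain knowledge such as the rainfall example above should still guide the final choice.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_factors(candidates: Dict[str, Sequence[float]], target: Sequence[float], k: int) -> List[str]:
    """Keep the k factors most strongly (linearly) associated with the target."""
    ranked = sorted(candidates, key=lambda name: abs(pearson(candidates[name], target)), reverse=True)
    return ranked[:k]


# Hypothetical aligned series for one monitoring point.
displacement = [1.0, 2.0, 3.0, 4.0, 5.0]
factors = {
    "rainfall": [2.0, 4.0, 6.0, 8.0, 10.0],
    "temperature": [3.0, 1.0, 4.0, 1.0, 5.0],
    "water_level": [5.0, 4.0, 3.0, 2.0, 1.0],
}
chosen = select_factors(factors, displacement, k=2)
```

The absolute value keeps strongly negative associations as well, since they are just as informative for prediction.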

Data transformation

The selected geohazard monitoring data must then be transformed into a form that can be mined. Because many factors are monitored, the indicators differ in value range and dimension (unit), so data transformation, such as normalization and discretization, is essential.³⁹
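Both transformations can be sketched briefly. Min-max normalization rescales each indicator to [0, 1] so that indicators with different units become comparable; discretization maps continuous values to ordered classes. The rainfall class edges and labels in the test are hypothetical, chosen only to show the mechanics.

```python
from bisect import bisect_right
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values: Sequence[float], edges: Sequence[float], labels: Sequence[str]) -> List[str]:
    """Map each value to a label; edges are ascending and len(labels) == len(edges) + 1.
    A value equal to an edge falls into the higher class."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]
```

For example, with edges `[10.0, 50.0]` and labels `["light", "moderate", "heavy"]`, a daily rainfall of 25.0 falls into the "moderate" class.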

Data modeling

Data models are used to mine existing monitoring data and provide a reference for geohazard early warning. For geohazard monitoring data, four typical data models^43,44 are introduced: classification and regression models, association rules models, clustering models, and time-series models.

Classification regression models

Classification and regression models address prediction problems. Classification constructs a model, learned from labeled training data, that maps input attribute values to predefined categories. Regression establishes a continuous function model that predicts the value of a dependent variable from given independent variables. Typical algorithms include regression analysis,⁴⁵ decision tree,⁴⁶ naive Bayes,⁴⁷ support vector machine,⁴⁸ and artificial neural network.⁴⁹
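As a minimal instance of the regression side, the sketch below fits a one-variable ordinary least-squares line and uses it to predict a new value. The rainfall and displacement numbers are hypothetical; real geohazard models would use more factors and one of the algorithms listed above.

```python
from typing import Sequence, Tuple


def fit_simple_linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: cumulative rainfall (mm) vs. displacement (mm).
rainfall = [0.0, 10.0, 20.0, 30.0]
displacement = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear_regression(rainfall, displacement)
predicted = intercept + slope * 40.0  # extrapolate to 40 mm of rainfall
```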

Association rules models

Association rules models find associations or relationships hidden between data items; that is, the occurrence of one data item can be used to infer the occurrence of others. Typical algorithms include the Apriori algorithm⁵⁰ and the Éclat algorithm.⁵¹
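A compact Apriori sketch follows. It assumes that monitoring records have already been discretized into event labels (labels such as `heavy_rain` are hypothetical), finds frequent itemsets level by level while pruning candidates that have an infrequent subset, and then computes the confidence of a rule.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every frequent itemset with its support (share of transactions containing it)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset: Itemset) -> float:
        return sum(itemset <= b for b in baskets) / n

    items = sorted({i for b in baskets for i in b})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
    return frequent


def confidence(frequent: Dict[Itemset, float], antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """confidence(A -> B) = support(A and B) / support(A); both itemsets must be frequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical monitoring days, each reduced to a set of discretized events.
days = [{"heavy_rain", "high_water", "fast_disp"},
        {"heavy_rain", "high_water", "fast_disp"},
        {"heavy_rain", "high_water"},
        {"light_rain"}]
freq = apriori(days, min_support=0.5)
rule_conf = confidence(freq, {"heavy_rain", "high_water"}, {"fast_disp"})
```

A rule such as "heavy rain and high water level imply fast displacement" would then be kept or discarded according to a chosen minimum confidence.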

Clustering models

Clustering models are unsupervised learning algorithms that divide data into several groups according to distance or similarity, without predefined classes. Typical algorithms include K-means⁵² and spectral clustering.⁵³
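A plain K-means sketch, assuming numeric feature vectors (for example, normalized slope and rainfall per grid cell; the sample points in the test are hypothetical). For reproducibility it initializes centroids with the first k points, which is a simplification; practical implementations usually use random or k-means++ initialization with several restarts.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _nearest(p: Point, centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    centroids = [list(p) for p in points[:k]]  # simplified deterministic initialization
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    labels = [_nearest(p, centroids) for p in points]
    return centroids, labels
```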

Time-series models

The primary purpose of a time-series model is to predict future changes based on the existing time-series data. Typical algorithms include the long short-term memory (LSTM) model⁵⁴ and the autoregressive integrated moving average (ARIMA) model.⁵⁵

Phase 3: data interpretation for geohazard early warning

In this section, we show how the combination of monitoring data and data mining algorithms can be used to realize the geohazard early warning through specific examples.

Geohazard susceptibility evaluation

Geohazard susceptibility indicates the probability of geohazard events occurring in a specific area according to the local monitoring data. Monitoring a specific area, analyzing its data, and evaluating its susceptibility are the first steps in addressing geohazards in advance. Based on different monitoring data and mining algorithms, this article divides geohazard susceptibility evaluation methods into two types.

(1)

Supervised methods: Supervised methods use previous geohazard monitoring data together with the known state of geohazard points to train a prediction model. When new monitoring data are input, the model can predict the stability state of the geohazard monitoring points.⁵⁶

In this article, we take landslides as an example. Landslides have many inducing factors, including static indicators such as slope and lithology and dynamic indicators such as groundwater level and rainfall. This article uses the naive Bayes model and the decision tree model to illustrate the modeling process.

Naive Bayes model

The basic principle of the naive Bayes model is to estimate the probability of each landslide stability state given the current monitoring data. The inputs are the inducing factors and labels of historical landslides. For example, inducing factors such as slope, rainfall, and displacement can be used as input data, and the labels of historical landslides include stable, unstable, and intermediate states. A landslide classifier is trained on a large amount of such input data. Given new landslide monitoring data, the classifier computes the probability of each stability state conditioned on the observed data, and the state with the maximum probability is the predicted state of the landslide.
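A categorical naive Bayes sketch of the procedure above, assuming that the inducing factors have already been discretized (for example, by the discretization step described under data transformation). The factor values, labels, and training records are hypothetical, and add-one (Laplace) smoothing is our choice to avoid zero probabilities for unseen values.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def train_naive_bayes(samples: List[Sequence[str]], labels: List[str]):
    class_counts = Counter(labels)
    value_counts: Dict[Tuple[str, int], Counter] = defaultdict(Counter)  # (state, factor) -> value counts
    factor_values: Dict[int, set] = defaultdict(set)  # distinct values seen for each factor
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[(y, i)][v] += 1
            factor_values[i].add(v)
    return class_counts, value_counts, factor_values


def predict_proba(model, x: Sequence[str]) -> Dict[str, float]:
    """Posterior P(state | factors) under the naive independence assumption."""
    class_counts, value_counts, factor_values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # prior P(state)
        for i, v in enumerate(x):
            # Likelihood P(factor_i = v | state) with add-one smoothing.
            p *= (value_counts[(c, i)][v] + 1) / (n_c + len(factor_values[i]))
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


# Hypothetical discretized records: (slope, rainfall, displacement) -> state
samples = [("steep", "heavy", "fast"), ("steep", "heavy", "fast"),
           ("gentle", "light", "slow"), ("gentle", "light", "slow"),
           ("steep", "light", "slow")]
labels = ["unstable", "unstable", "stable", "stable", "stable"]
model = train_naive_bayes(samples, labels)
probs = predict_proba(model, ("steep", "heavy", "fast"))
prediction = max(probs, key=probs.get)  # state with the maximum posterior probability
```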

Decision tree model

Figure 3a shows the classification process of the decision tree.⁵⁷ Each landslide to be classified is assigned a category by the decision tree classifier. The classifier is trained on training data, and the attribute tested at each decision node is chosen by the information gain or Gini index criterion to improve prediction accuracy. The trained classifier can then be used to classify new landslides.
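The split criteria named above can be computed directly. This sketch, using hypothetical discretized landslide records, picks the factor with the highest information gain for a decision node; the Gini index is included for comparison (a CART-style tree would instead choose the split with the lowest weighted Gini index).

```python
from collections import Counter
from math import log2
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], labels: List[str], factor: str) -> float:
    """Entropy of the parent node minus the weighted entropy of the child nodes."""
    groups: Dict[str, List[str]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[factor], []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_factor(rows: List[Dict[str, str]], labels: List[str]) -> str:
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))


rows = [{"rainfall": "heavy", "slope": "steep"}, {"rainfall": "heavy", "slope": "gentle"},
        {"rainfall": "light", "slope": "steep"}, {"rainfall": "light", "slope": "gentle"}]
labels = ["unstable", "unstable", "stable", "stable"]
root = best_factor(rows, labels)  # factor tested at the root node
```

A full tree would apply `best_factor` recursively to each child subset until the nodes are pure or a depth limit is reached.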

FIG. 3.

Data interpretation for geohazard early warning. (a) Decision tree model in landslide prediction. (b) Regional division of geohazards: an illustration. (c) Neural network for geohazard prediction. Color images are available online.

(2)

Unsupervised methods: Unsupervised methods are usually used for the regional division of geohazards. When researchers investigate geohazards in an area, a fundamental problem is determining which parts of it are prone to geohazards.^58,59 Clustering models are usually used to address this problem. Regional data, such as topography, lithology, rainfall, groundwater level, and historical experience, can be used to divide the survey region into several risk levels: very low, low, moderate, high, and very high (Fig. 3b). The risk levels established by the clustering model provide a reference for later geohazard investigation and prevention.

Geohazard real-time prediction

The geological conditions of a geohazard point determine its susceptibility to geohazards, but most geohazards are triggered by other factors, such as rainfall and groundwater level. When rainfall, groundwater level, or other inducing factors reach a certain threshold, a geohazard can occur. To better control geohazards, single variables such as rainfall,⁶⁰ groundwater level, and displacement⁶¹ must be predicted. Moreover, the development of geohazards needs to be predicted dynamically in real time using a variety of factors (Fig. 3c).^62,63

Application: A Practical Case

Underground mining can cause surface cracks, surface subsidence, and surface collapse. These occur because underground mining destroys the original stress balance, causing the rock strata to deform and fail. The movement of the internal rock mass extends to the surface and eventually leads to surface deformation, which threatens the safety of underground mining and can damage surface buildings.⁶⁴

An important strategy for preventing surface deformation is early warning based on monitoring data. Mine disasters are generally sudden and carry a high risk of casualties, so faster warning gives workers more time to evacuate the mine. As discussed above, edge computing has lower delay than cloud computing; thus, combining edge computing with underground mining monitoring is worthwhile. The Yangjiacun coal mine is used to illustrate the procedures of the general paradigm presented in the Method: Mining and Analysis of Edge-Based IoT Data for Geohazard Prevention section. The following subsections give a brief introduction to the study area, data acquisition, mining and analysis of monitoring data, and data interpretation for the early warning of surface deformation.

Brief introduction to the study area of the practical case

The Yangjiacun coal mine is located in the Dongsheng coalfield, Ordos City, China. The mine has recoverable reserves of 441.82 million tons (mt) and a design production capacity of 5.0 mt per year. Mining started in May 2011 and ended on December 31, 2015. The monitoring area is located in the central and western part of the mine, 400 m from the Ordos prison, which lies to the south. The mine lies in a mountainous area with sparse vegetation and deeply incised terrain. Mining has resulted in subsidence basins, cracks, and collapse pits in the monitoring area (Fig. 4b).

FIG. 4.

Illustrations of study area. (a) Monitoring area and monitoring points. (b) Surface deformation in the monitoring area. Color images are available online.

Data acquisition

Because the monitoring scope is large, multiple monitoring points are needed to record subsidence in the monitoring area. To make the monitoring data representative, 4 observation lines and 146 monitoring points were arranged over the working face (Fig. 4a). We selected 38 of these monitoring points, A1–A18 and R1–R20, to analyze the law of surface deformation. Monitoring data were recorded from May 8, 2011, to February 12, 2012, with routine monitoring once a week, for a total of 41 regular monitoring campaigns. The data show that all 38 monitoring points eventually reached a stable state.

GPS receivers were used to monitor the subsidence. With edge computing, the monitoring data do not have to be transmitted to the cloud layer for processing, which leaves more time for emergency response to surface collapse.

Mining and analysis of monitoring data for prevention

The idea of monitoring data mining and analysis is to derive the subsidence law from existing data and then predict future settlement. The goal is a subsidence law and a prediction model that provide a reference for preventing surface subsidence caused by subsequent mining.

Analysis of subsidence characteristics

The cumulative subsidence of monitoring points was surveyed to determine the degree of subsidence in the monitoring area (Fig. 5). For A1–A18, there are five monitoring points with accumulated subsidence greater than 1000 mm. The trend indicates that the closer to the central line of the monitoring area a point is, the greater the subsidence, and the maximum cumulative subsidence is close to 2500 mm. For R1–R20, there are 16 monitoring points with accumulated subsidence greater than 1000 mm, and 15 of them are greater than 3000 mm, which shows that the degree of subsidence on the central line of the monitoring area is relatively high.

FIG. 5.

Final subsidence of the monitoring points. (a) A01–A18. (b) R01–R20. Color images are available online.

Figure 6 presents the subsidence curve of each monitoring point. The surface was subsiding during the first 30 monitoring periods and reached a stable state thereafter. Moreover, the surface subsidence caused by coal mining follows a regular pattern and can be divided into three stages: subsidence development, sufficient subsidence, and subsidence attenuation.

FIG. 6.

Subsidence law of monitoring points. (a) A01–A09. (b) A10–A18. (c) R01–R10. (d) R11–R20. Color images are available online.

Mining and analysis of subsidence data

Effective data models are needed to predict subsequent mining subsidence from the subsidence law. The cumulative subsidence induced by coal mining can be regarded as a nonstationary time-series. Based on the additive time-series model,⁶⁵ the cumulative subsidence is decomposed into trend items and periodic items.

Because the surface subsidence induced by coal mining is related to the mining frequency and mining mode, seasonal factors are not considered. The additive model can be expressed as follows:

$$y_t = T_t + C_t$$

where $y_t$ is the cumulative subsidence, $T_t$ is the trend item, and $C_t$ is the periodic item. The cumulative subsidence results from the interaction of geological conditions and inducing factors. The trend item is governed by the geological conditions of the rock strata and grows monotonically. The periodic item, driven by external factors, fluctuates, so the time-series shows a regular, step-like (laddering) pattern.

In this article, four monitoring points, including A05, A10, R10, and R20, are selected to conduct the time-series prediction.

(1)

Prediction of trend items

The trend item reflects the inevitable evolution of surface subsidence, which is governed by its internal conditions. In this article, the trend items are extracted by the simple moving average method,⁶⁶ as follows. Let $X_t$ ($t = 1, 2, \dots, N$) be the observed time-series. The simple moving average of period $n$ is

$$M_t = \frac{X_t + X_{t-1} + \dots + X_{t-n+1}}{n}, \quad t = n, n+1, \dots, N$$

where $X_{t-n+1}$ is the subsidence of the monitoring point in period $t - n + 1$ and $n$ is the averaging period; $n = 2$ in this article. The first 30 periods are used as training data and the remaining periods as testing data. Several functions were fitted to the trend items, and the Boltzmann function gave the best fit. The fitting function, correlation coefficient, and predicted values of the monitoring points are shown in Figure 7.
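The trend-extraction step can be sketched as follows, with a period of 2 as in the article. Following the additive model $y_t = T_t + C_t$, the residual after removing the trend gives the periodic item; in this sketch the moving average itself stands in for $T_t$. The `boltzmann` helper uses the common four-parameter Boltzmann sigmoid form, which is our assumption since the article does not state its parameterization; fitting its parameters (for example, with `scipy.optimize.curve_fit`) is omitted so the sketch stays dependency-free. The subsidence values are hypothetical.

```python
from math import exp
from typing import List, Sequence


def simple_moving_average(series: Sequence[float], n: int = 2) -> List[float]:
    """M_t = (X_t + X_{t-1} + ... + X_{t-n+1}) / n, defined from the n-th observation on."""
    return [sum(series[t - n + 1:t + 1]) / n for t in range(n - 1, len(series))]


def periodic_items(series: Sequence[float], trend: Sequence[float], n: int = 2) -> List[float]:
    """C_t = y_t - T_t, aligned with the trend (the first n - 1 observations have no average)."""
    return [y - T for y, T in zip(series[n - 1:], trend)]


def boltzmann(x: float, a1: float, a2: float, x0: float, dx: float) -> float:
    """Boltzmann sigmoid: moves from a1 to a2, centered at x0 with width dx (assumed form)."""
    return a2 + (a1 - a2) / (1.0 + exp((x - x0) / dx))


cumulative = [0.0, 10.0, 30.0, 60.0]  # hypothetical cumulative subsidence (mm)
trend = simple_moving_average(cumulative, n=2)
periodic = periodic_items(cumulative, trend, n=2)
```

The sigmoid shape suits cumulative subsidence because it rises during subsidence development and levels off as the surface stabilizes.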

(2)

Prediction of periodic items

According to the additive time-series model, the periodic items are obtained by subtracting the trend items from the accumulated subsidence. The periodic items are related to the periodic disturbance of underground mining. Because the periodic items differ widely between monitoring periods, we processed them for better visualization (Fig. 8).

FIG. 7.

Fitting and prediction of trend items. (a) A05. (b) A10. (c) R10. (d) R20. Color images are available online.

FIG. 8.

Prediction of periodic items. (a) A05. (b) A10. (c) R10. (d) R20. Color images are available online.

The LSTM model is used to predict the periodic items. LSTM is a deep learning model that inherits the recurrent structure and temporal modeling ability of a recurrent neural network. LSTM effectively overcomes the vanishing gradient problem and can retain both long- and short-term information.⁶⁷

In this article, the LSTM-based prediction model is established in three steps. (1) Data preprocessing: the data set is divided into a training set and a testing set at a ratio of 3:1, the training set is normalized with the minmaxscaler() function, and the sequence length is set to 10, that is, each input sequence covers ten weeks of monitoring data. (2) Creating the LSTM model: the model includes three layers, namely the input layer, the hidden layer, and the output layer. Because the training data contain only the accumulated subsidence, the input and output dimensions are both 1, and the number of neurons is set to 100. (3) Model training: a linear activation function and the mean squared error are used as the activation function and loss function, respectively. The Adam algorithm is chosen as the optimizer, with a learning rate of 0.005 and 500 iterations.
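The preprocessing in step (1) can be sketched without a deep learning framework. Assumptions: the series is a plain list of weekly values, the 3:1 split keeps chronological order, and the min-max scaler is fitted on the training part only and reused for the testing part. Each window of 10 scaled values is paired with the next value as its target. The 41-value series is synthetic.

```python
from typing import Callable, List, Sequence, Tuple


def split_train_test(series: Sequence[float], train_ratio: float = 0.75) -> Tuple[list, list]:
    """Chronological split; 0.75 corresponds to the 3:1 ratio."""
    cut = int(len(series) * train_ratio)
    return list(series[:cut]), list(series[cut:])


def fit_min_max(train: Sequence[float]) -> Tuple[Callable[[float], float], Callable[[float], float]]:
    """Return scale/inverse functions fitted on the training data only."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # guard against a constant series
    return (lambda v: (v - lo) / span), (lambda s: s * span + lo)


def make_windows(series: Sequence[float], seq_len: int = 10) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of seq_len consecutive values with the value that follows it."""
    X, y = [], []
    for i in range(len(series) - seq_len):
        X.append(list(series[i:i + seq_len]))
        y.append(series[i + seq_len])
    return X, y


weekly = [float(v) for v in range(41)]  # synthetic stand-in for 41 weekly observations
train, test = split_train_test(weekly)
scale, inverse = fit_min_max(train)
X_train, y_train = make_windows([scale(v) for v in train], seq_len=10)
```

With 41 observations, the split yields 30 training and 11 testing values, matching the first 30 periods used for training. If Keras were used (the article does not name its framework), a model matching steps (2) and (3) could look roughly like `Sequential([LSTM(100, input_shape=(10, 1)), Dense(1, activation="linear")])`, compiled with `optimizer=Adam(learning_rate=0.005)` and `loss="mse"`.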

We used the prediction model to forecast the next 10 periods (Fig. 8). The results indicate that LSTM can capture the regularity of the periodic items. However, some errors remain because of the limited monitoring data and the weakening of the periodic disturbance in the later monitoring periods.

Discussion of the above results

The above results indicate that (1) the surface subsidence caused by coal mining can be divided into three stages and the degree of subsidence on the central line of the monitoring area is relatively high; and (2) the Boltzmann function and LSTM model can capture the laws of the trend items and periodic items.

The following can also be observed:

(1)

Because GPS surveys were conducted only once a week, the monitoring interval is long, which limits the amount of data and causes errors in the periodic items. Therefore, monitoring data of finer temporal granularity should be collected to enhance the prediction accuracy.

(2)

Surface subsidence displacement directly reflects the law of surface displacement, but displacement is related to many factors. Using only a few inducing factors limits the generalization ability of the prediction model. Therefore, subsidence prediction should combine multiple factors to enhance prediction accuracy.

Data interpretation for the prevention of surface subsidence

As mentioned several times, monitoring and early warning are important strategies to prevent surface subsidence induced by coal mining, and early warning is based on monitoring data. Mining the monitoring data reveals the deformation law of surface subsidence and its development trend in a timely manner, providing early warning information for subsequent mining.

The subsidence characteristics indicate that the degree of subsidence near the central line of the monitoring area is relatively high and that subsidence mainly occurs in the subsidence development and sufficient subsidence stages. Thus, more monitoring devices should be arranged near the central line of the monitoring area. Moreover, to better monitor the surface deformation trend, the data acquisition frequency of the monitoring devices should be increased.

The mining of trend items and periodic items indicates that the Boltzmann function and the LSTM model can predict the development of subsidence. This prediction model provides data support for anticipating the surface subsidence induced by subsequent mining. Warnings can then be issued earlier, before geohazards occur, which facilitates the evacuation of mine workers and the protection of essential facilities.
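To show how predicted values could drive earlier warnings, the sketch below maps a forecast subsidence series to warning levels. It uses two sets of thresholds: one on cumulative subsidence and one on the increment per period. The threshold values and level names are hypothetical placeholders. Real limits would come from the mine's own engineering criteria.

```python
from typing import List, Sequence, Tuple

# Hypothetical thresholds; subsidence is treated as a positive magnitude in mm.
DEPTH_LIMITS: Tuple[float, float] = (100.0, 300.0)  # (attention, alarm) on cumulative subsidence
RATE_LIMITS: Tuple[float, float] = (10.0, 30.0)     # (attention, alarm) on per-period increment
LEVELS = ("normal", "attention", "alarm")


def warning_levels(forecast: Sequence[float], last_observed: float) -> List[str]:
    """Assign a warning level to each forecast period: the more severe of the
    level implied by cumulative subsidence and the level implied by its rate."""
    levels: List[str] = []
    previous = last_observed
    for value in forecast:
        rate = value - previous
        previous = value
        severity = 0
        for measure, (attention, alarm) in ((value, DEPTH_LIMITS), (rate, RATE_LIMITS)):
            if measure >= alarm:
                severity = max(severity, 2)
            elif measure >= attention:
                severity = max(severity, 1)
        levels.append(LEVELS[severity])
    return levels


print(warning_levels([50.0, 62.0, 70.0, 105.0, 110.0], last_observed=45.0))
# -> ['normal', 'attention', 'normal', 'alarm', 'attention']
```

Checking the rate as well as the depth lets a sudden acceleration trigger a warning before cumulative subsidence reaches its limit.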

Geohazards usually occur in groups, and surface subsidence may induce secondary geohazards. The monitoring area is located in a mountainous region, where surface subsidence may trigger other geohazards, such as landslides and rock collapses. Understanding the subsidence law of the monitoring area is critical for preventing secondary geohazards. The monitoring of secondary geohazards should be strengthened in regions with a high degree of subsidence, and edge-based devices should be deployed there for monitoring and early warning.

Discussion

Edge-based IoT data mining and analysis have shown great potential in geohazard monitoring and early warning. Edge computing already serves intelligent medicine, autonomous driving, and other scenarios. In the same way, it can play an essential role in geohazard monitoring and early warning, which require low delay. The difference is that the monitored objects are rock and soil, the monitoring area is larger, and the monitoring environment is harsher. These factors impose new requirements on edge computing. In the following subsections, several key problems in edge-based IoT data mining for geohazard prevention are discussed.

Key techniques in data acquisition

Sufficient monitoring data are an essential basis for geohazard prevention. Sensor networks are well suited to geohazard monitoring because of their large scale, self-organization, dynamic topology, and reliability. However, the geohazard monitoring environment is harsh and complex and may cause some sensor nodes to fail. The sensor network therefore needs high fault tolerance so that it keeps working when individual nodes fail.^68,69 Moreover, geohazard monitoring periods are long, so sensor nodes should have low power consumption to ensure continuous monitoring.^70–72 In addition, the monitoring range of geohazards is wide and the monitoring types are diverse. This calls for low-cost sensors^73,74 and for new types of sensors. Within a given monitoring area, a reasonable distribution and configuration of sensors⁷⁵ helps capture all aspects of its geological information.
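Low power consumption is usually achieved by duty cycling: a node sleeps most of the time and wakes briefly to sample and transmit. A rough lifetime estimate follows from the average current:

$$I_{avg} = d \cdot I_{active} + (1 - d) \cdot I_{sleep}$$

The lifetime is then approximately the battery capacity divided by $I_{avg}$. The battery capacity and currents below are hypothetical. The estimate also ignores self-discharge and temperature derating, which matter under harsh field conditions.

```python
def average_current_ma(duty: float, active_ma: float, sleep_ma: float) -> float:
    """Time-weighted average current (mA) of a duty-cycled sensor node."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty cycle must be in [0, 1]")
    return duty * active_ma + (1.0 - duty) * sleep_ma


def lifetime_days(capacity_mah: float, duty: float, active_ma: float, sleep_ma: float) -> float:
    """Idealized battery lifetime in days (no self-discharge or derating)."""
    return capacity_mah / average_current_ma(duty, active_ma, sleep_ma) / 24.0


# Hypothetical node: 2400 mAh battery, 20 mA when awake, 0.02 mA asleep.
for duty in (0.01, 0.001):
    print(f"duty {duty:.3f}: about {lifetime_days(2400.0, duty, 20.0, 0.02):.0f} days")
```

With these assumed figures, a 1% duty cycle lasts roughly 455 days and a 0.1% duty cycle roughly 2,500 days. Lowering the sampling rate extends lifetime considerably, which trades off against the need for fine-grained data discussed in the case study.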

Because the geohazard monitoring environment is harsh, some areas, such as mountains and canyons, are difficult to cover with sensor networks, so additional monitoring methods need to be developed. Satellite remote sensing works well for geohazard monitoring. However, its revisit period is long and its data are updated slowly, which limits its flexibility. In recent years, unmanned aerial vehicle (UAV) remote sensing systems have been widely used in geohazard monitoring and early warning. Their advantages are strong mobility, high adaptability, low-altitude flight, fast acquisition, high resolution, and low cost. The remote sensing images and video data they collect provide a data basis for monitoring and early warning,^76,77 and UAVs are a powerful supplement to edge computing. UAVs can also serve as carriers of edge computing. Combining the two makes geohazard prevention flexible and low-delay.

Ground sensor networks and low-altitude UAV remote sensing systems can provide first-hand monitoring data for geohazard prevention and lay a solid data foundation for subsequent processing.

Utilization of 5G for data transmission

Geohazard monitoring data have big-data characteristics, so their transmission inevitably introduces delay. Fifth-generation communication technology (5G) offers high bandwidth, low latency, and high interactivity, providing better connectivity and lower latency for edge-based applications.^78–80 Geohazard prevention involves a heavy workload, a demand for low delay, and a harsh environment. To cope with these, data transmission devices will become more complex and face higher performance requirements. The emerging 5G technology can help overcome these data transmission problems in edge computing.
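A simple delay model shows why both link bandwidth and edge-side data reduction matter: transfer time ≈ latency + payload size / throughput. The payload sizes, link rates, and latency below are hypothetical round numbers chosen for illustration. They are not measured 4G or 5G performance figures.

```python
def transfer_seconds(payload_mb: float, throughput_mbps: float, latency_ms: float) -> float:
    """Idealized transfer time: fixed latency plus serialization time.
    payload_mb is in megabytes, throughput_mbps in megabits per second."""
    if throughput_mbps <= 0:
        raise ValueError("throughput must be positive")
    return latency_ms / 1000.0 + payload_mb * 8.0 / throughput_mbps


# Hypothetical scenarios: raw video vs. an edge-side summary, on a slower and a faster link.
for label, size_mb in (("raw video", 500.0), ("edge summary", 0.5)):
    for rate in (20.0, 200.0):
        print(f"{label:>12} @ {rate:5.0f} Mbit/s: {transfer_seconds(size_mb, rate, 50.0):8.2f} s")
```

Under these assumptions, a tenfold faster link cuts the raw-video transfer from about 200 s to about 20 s. Sending only an edge-computed summary shrinks it to a fraction of a second on either link, which is why bandwidth and edge processing complement each other.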

Application of emerging prediction models

Establishing an effective prediction model is the core task of accurate geohazard prediction.^81,82 Geohazard formation is a very complicated dynamic process, so an early warning model must be both reliable and timely.

Because geohazards have many inducing factors and a wide range of influence, their early warning needs to be conducted from an overall perspective. The deployment of the sensor network should meet the requirements of geohazard monitoring. Each sensor node stores its own monitoring data. However, the data at different monitoring points are interdependent, and the data from different sensors are correlated. Therefore, the data need to be mined from an overall perspective.
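A first step toward mining the data from an overall perspective is to quantify how strongly readings at different monitoring points move together. The sketch computes a Pearson correlation matrix over aligned sensor series using only the standard library. The sensor IDs and readings in the test are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two aligned series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned series of length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations between all monitoring points."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}
```

Strongly correlated pairs suggest which monitoring points should be modeled jointly. This also motivates representing the network as a graph, as discussed next.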

The graph neural network (GNN)^83,84 provides a new way to address this problem. A GNN is a neural network that operates directly on graph structures. A graph is a data structure composed of two parts: vertices and edges. Edges can carry weights and directions, which represent the connection between two vertices and the direction of information flow. The sensor nodes in the monitoring area can be taken as the vertices of the graph. Each vertex carries its node's monitoring data, such as the monitoring time, geographical location, and inducing indicators. The vertices are linked to form a connected network, and each edge weight can be derived from the geographical distance between its two points. The closer two points are, the more similar their hazard conditions tend to be. GNN models have been widely applied in many fields, such as traffic flow prediction.^85,86 We look forward to applying new algorithms, such as GNNs, in geohazard monitoring (Fig. 9).

FIG. 9.

Graph neural network in geohazard prevention.
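The graph construction described above can be sketched as follows. Closer points are expected to behave more similarly, so the sketch converts distance into a weight that decreases as distance grows. It uses a Gaussian kernel with an assumed bandwidth `sigma` and an assumed neighbor `cutoff`. It then performs one round of weight-normalized neighborhood aggregation, the basic message-passing step on which graph convolutional layers build. The learnable weight matrix and nonlinearity of a real GNN layer are omitted. The coordinates (assumed to be in a projected, metric system) and the features are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # planar coordinates in metres (assumed projected system)


def gaussian_adjacency(coords: Dict[str, Coord], sigma: float,
                       cutoff: float) -> Dict[str, Dict[str, float]]:
    """Edge weight exp(-d^2 / sigma^2) for node pairs within `cutoff`; self-loops get 1."""
    adj: Dict[str, Dict[str, float]] = {u: {u: 1.0} for u in coords}
    for u, (xu, yu) in coords.items():
        for v, (xv, yv) in coords.items():
            if u == v:
                continue
            d = math.hypot(xu - xv, yu - yv)
            if d <= cutoff:
                adj[u][v] = math.exp(-(d * d) / (sigma * sigma))
    return adj


def aggregate(adj: Dict[str, Dict[str, float]],
              features: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """One propagation step: each node's new feature vector is the weight-normalized
    average of its own vector and its neighbours' vectors."""
    out: Dict[str, List[float]] = {}
    for u, nbrs in adj.items():
        total = sum(nbrs.values())
        dim = len(features[u])
        out[u] = [sum(w * features[v][k] for v, w in nbrs.items()) / total for k in range(dim)]
    return out
```

After one step, a node's features blend in information from nearby nodes in proportion to their proximity, while distant nodes beyond the cutoff stay unaffected. Stacking such steps with learned transformations is what lets a GNN exploit the spatial correlation between monitoring points.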

Cloud computing and edge computing in geohazard prevention

Cloud computing and edge computing are different modes of big data computing. Cloud computing focuses on the overall situation and acts more like a command center, while edge computing plays an auxiliary role. The two need to cooperate closely to meet the needs of geohazard prevention and to increase the application value of both. For example, edge computing devices execute monitoring and early warning. They also collect and preprocess the high-value data that cloud computing requires. Conversely, cloud computing can train the early warning model through data mining and distribute it to the network edge. Edge devices then conduct early warning based on the received model.
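This division of labor, with training in the cloud and inference at the edge, can be sketched as a versioned model hand-off. `CloudTrainer`, `EdgeNode`, and `WarningModel` are hypothetical names. The "model" is deliberately simplified to a per-point alarm threshold learned from historical data, set to the mean plus k standard deviations. It stands in for a real trained early warning model, and the deployment channel is reduced to a method call.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WarningModel:
    version: int
    thresholds: Dict[str, float]  # hypothetical per-point alarm thresholds


class CloudTrainer:
    """Cloud side: learns from the full history and publishes new model versions."""

    def __init__(self) -> None:
        self.version = 0

    def train(self, history: Dict[str, List[float]], k: float = 3.0) -> WarningModel:
        self.version += 1
        thresholds = {p: statistics.mean(v) + k * statistics.pstdev(v) for p, v in history.items()}
        return WarningModel(self.version, thresholds)


@dataclass
class EdgeNode:
    """Edge side: keeps the newest model received and evaluates readings locally."""
    model: Optional[WarningModel] = None
    high_value: List[Dict[str, float]] = field(default_factory=list)

    def receive(self, model: WarningModel) -> None:
        if self.model is None or model.version > self.model.version:
            self.model = model  # ignore stale or duplicate versions

    def check(self, readings: Dict[str, float]) -> List[str]:
        if self.model is None:
            return []
        # Points unknown to the model never alarm (threshold defaults to +inf).
        alarms = [p for p, x in readings.items() if x > self.model.thresholds.get(p, float("inf"))]
        if alarms:
            self.high_value.append(readings)  # keep anomalous data for later upload to the cloud
        return alarms


cloud, edge = CloudTrainer(), EdgeNode()
edge.receive(cloud.train({"P1": [1.0, 1.0, 1.0, 1.0], "P2": [0.0, 2.0, 0.0, 2.0]}, k=1.0))
alarms = edge.check({"P1": 1.5, "P2": 1.5})
```

The edge decides locally and immediately, while the cloud periodically retrains on the accumulated history, including the high-value records the edge sets aside.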

The geohazard early warning system based on edge computing and cloud computing includes three layers: the terminal layer, the edge computing layer, and the cloud computing layer. The terminal layer contains the sensor network monitoring devices. These devices collect geohazard monitoring data, such as displacement, rainfall, and surveillance videos. The edge computing layer is mainly composed of various edge computing nodes. These nodes process the collected data, generate early warning information, and send messages to the relevant users. Based on historical data and currently processed data, the cloud computing layer conducts unified scheduling and deployment for geohazard prevention.
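The three-layer flow can be sketched as a small pipeline. The edge layer smooths terminal-layer readings, warns users immediately, and forwards only a compact summary to the cloud layer. The cloud layer then ranks sites for unified scheduling. The function names, the trailing moving-average smoothing, the fixed displacement limit, and the site names are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Sequence


def edge_process(readings: Sequence[float], limit: float, window: int,
                 notify: Callable[[str], None]) -> Dict[str, float]:
    """Edge layer: smooth raw terminal readings with a trailing moving average,
    warn local users if the smoothed value exceeds `limit`, and return a compact
    summary for the cloud layer instead of the raw series."""
    if not readings or window < 1:
        raise ValueError("need readings and a positive window")
    recent = readings[-window:]
    smoothed = sum(recent) / len(recent)
    if smoothed > limit:
        notify(f"warning: smoothed displacement {smoothed:.1f} exceeds {limit:.1f}")
    return {"count": float(len(readings)), "latest_smoothed": smoothed, "max": max(readings)}


def cloud_aggregate(summaries: Dict[str, Dict[str, float]]) -> List[str]:
    """Cloud layer: rank edge sites by their latest smoothed value for unified scheduling."""
    return sorted(summaries, key=lambda site: summaries[site]["latest_smoothed"], reverse=True)


# Terminal layer: hypothetical displacement readings (mm) from two sites.
messages: List[str] = []
summary_a = edge_process([1.0, 2.0, 9.0, 11.0], limit=8.0, window=2, notify=messages.append)
summary_b = edge_process([0.5, 0.7, 0.6], limit=8.0, window=2, notify=messages.append)
ranking = cloud_aggregate({"site_a": summary_a, "site_b": summary_b})
```

The warning is raised at the edge without waiting for the cloud, and only a three-field summary per site travels upstream.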

Coordinating cloud computing and edge computing can maximize the advantages of both, and the combined system can be flexibly deployed for geohazard prevention (Fig. 10).

FIG. 10.

Cloud computing and edge computing in geohazard prevention. NPU, neural network processing unit.

Conclusion

This article illustrates a general paradigm of edge-based IoT data mining for geohazard prevention. Compared with cloud computing alone, combining edge computing with geohazard prevention offers low delay, security, and reliability, and thus has great potential. However, edge computing has not yet been widely applied to geohazard monitoring and early warning. This article therefore presents a general paradigm of edge-based IoT data mining for geohazard monitoring and early warning. The paradigm includes data acquisition, data mining and analysis, and data interpretation. A real case of surface deformation induced by coal mining is used to illustrate the details of the presented paradigm. The challenges and potential future work in edge-based geohazard prevention are summarized as follows. (1) The harsh geohazard monitoring environment imposes high requirements on the fault tolerance, low power consumption, and networking of sensor networks. (2) Several new techniques, such as UAV monitoring, 5G communication technology, and the GNN, could be integrated with edge computing. (3) Cloud computing and edge computing could collaborate to maximize their advantages in geohazard prevention. The presented general paradigm of edge-based IoT data mining can contribute to geohazard prevention.

Footnotes

Authors' Contributions

Conceptualization: J.Q. and G.M. Methodology: J.Q., G.M., and Z.M. Writing—original draft preparation: J.Q. and G.M. Writing—review and editing: G.M. and F.P.

Acknowledgments

The authors thank the editor and the reviewers for their contributions.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This research was jointly supported by the National Natural Science Foundation of China (Grant Nos. 11602235 and 41772326), the Fundamental Research Funds for China Central Universities (2652018091), and Major Program of Science and Technology of Xinjiang Production and Construction Corps (2020AA002).


References

1. Vilas, Kathleen, Beavers, et al. Natural hazards review. Virginia, USA: American Society of Civil Engineers. 2000.

2. Wang, Shun, XianJun, et al. The review and prospects of the study on Chinese Geological Disaster. Appl Mech Mater. 2012; 166:2597–2600.

3. Evans, Guthrie, Roberts, et al. The disastrous 17 February 2006 rockslide-debris avalanche on Leyte Island, Philippines: A catastrophic landslide in tropical mountain terrain. Nat Hazards Earth Syst Sci. 2007; 7:89–101.

4. Yin, Wang, Ping. Landslide hazards triggered by the 2008 Wenchuan earthquake, Sichuan, China. Landslides. 2009; 6:139–152.

5. Anderson. Early warning for geological disasters: Scientific methods and current practice. Environ Eng Geosci. 2014; 20:404.

6. Yuan. Analysis of monitoring and early warning for sudden geological hazards. China: World Nonferrous Metals. 2018.

7. Zhang, Shan. Early warning and prevention of geo-hazards in China. Landslides. 2005; 36:285–289.

8. Atzori, Iera, Morabito. The Internet of Things: A survey. Comput Netw. 2010; 54:2787–2805.

9. Gubbi, Buyya, Marusic, et al. Internet of Things (IoT): A vision, architectural elements, and future directions. Future Gen Comput Syst. 2013; 29:1645–1660.

10. Yong-Qiang, Juan. Dynamic monitoring and early warning system of geo-hazards based on the technology of internet of things. Chin J Geol Hazard Control. 2013; 24:90–93+99.

11. Mei, Xu, Qin, et al. A survey of internet of things (IoT) for geohazard prevention: Applications, technologies, and challenges. IEEE Internet Things J. 2020; 7:4371–4386.

12. Ahmed, Yaqoob, Hashem, et al. The role of big data analytics in Internet of Things. Comput Netw. 2017; 129:459–471.

13. , Ju, Qiang, et al. Automated data processing and integration of large multiple data sources in geohazards monitoring. Int J Georesour Environ. 2017; 3:9–21.

14. Salman, Elhajj, Kayssi, et al. Edge computing enabling the Internet of Things. In: 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT). Milan, Italy: IEEE, 2015, pp. 603–608.

15. Satyanarayanan. The emergence of edge computing. Computer. 2017; 50:30–39.

16. Shi, Jie, Quan, et al. Edge computing: Vision and challenges. IEEE Internet Things J. 2016; 3:637–646.

17. Dong, Ning, Obaidat, et al. Edge computing based healthcare systems: Enabling decentralized health monitoring in Internet of Medical Things. IEEE Netw. 2020; 34:254–261.

18. Pustokhina, Pustokhin, Gupta, et al. An effective training scheme for deep neural network in edge computing enabled Internet of Medical Things (IoMT) systems. IEEE Access. 2020; 8:107112–107123.

19. Wang, Wei, Kong, et al. ECASS: Edge computing based auxiliary sensing system for self-driving vehicles. J Syst Architect. 2019; 97:258–268.

20. Yuan, Zhou, Li, et al. Toward efficient content delivery for automated driving services: An edge computing solution. IEEE Netw. 2018; 32:80–86.

21. Chakraborty, Datta. Home automation using edge computing and Internet of Things. In: 2017 IEEE International Symposium on Consumer Electronics (ISCE), Malaysia, IEEE, 2018, pp. 47–49.

22. , Zeng, Yang, et al. A novel blockchain framework for industrial IoT edge computing. Sensors. 2020; 20:2061.

23. Barthélemy, Verstaevel, Forehead, et al. Edge-computing video analytics for real-time traffic monitoring in a Smart City. Sensors. 2019; 19:2048.

24. Zhang, Ying, Zhang. Forest fire monitoring system based on edge computing. Big Data Res. 2019; 5:79–88.

25. Shi. Analysis, Prevention and treatment of the geological hazards to the Dushanzi-Lalati Section of the Duku Highway. Traffic Engineering & Technology for National Defence. 2008; 168:269–273.

26. Zhang. Prevention countermeasures of geological hazards. Chin J Geol Hazard Control. 2001; 79–82.

27. Ginzburg, Svalova, Nikolaev, et al. Early-warning landslide monitoring system. Natl Hazards Risk Res in Russia. 2015; 5:63–85.

28. Liu, Liu, Wen, et al. Early warning for regional geo-hazards during 2003–2012, China. Chin J Geol Hazard Control. 2015; 26:1–8.

29. , Hui, Xu, et al. Serving at the edge: A scalable IoT architecture based on transparent computing. IEEE Netw. 2017; 31:96–105.

30. Lopez, Montresor, Epema, et al. Edge-centric computing: Vision and challenges. ACM SIGCOMM Comput Commun Rev. 2015; 45:37–42.

31. Wang, Yan, Xianguo, et al. Development of geological hazards monitoring system based on IoT and application in Guizhou province. China Meas Test. 2017; 43:94–99.

32. Mangalvedhe, Ratasuk, Ghosh. NB-IoT deployment study for low power wide area cellular IoT. In: IEEE International Symposium on Personal, Spain, IEEE, 2016, pp. 1–6.

33. Kotta, Rantelobo, Tena, et al. Wireless sensor network for landslide monitoring in Nusa Tenggara Timur. Telecommun Comput Electr Control. 2011; 9:9–18.

34. Kui, Qin, Chun, et al. Layout program of landslide monitoring network. China: Geospatial Information. 2009.

35. Ohbayashi, Nakajima, Nishikado, et al. Monitoring system for landslide disaster by wireless sensing node network. In: 2008 SICE Annual Conference, Japan, 2008, pp. 1704–1710.

36. Giri, Ng, Phillips. Wireless Sensor Network System for Landslide Monitoring and Warning. IEEE Trans Instrum Meas. 2019; 68:1210–1220.

37. Long, Guang-Lia, Gao, et al. Application of the data preprocessing methods to the correlation analysis of landslide displacement. Geol Sci Technol Inf. 2012; 31:122–127.

38. Crone, Lessmann, Stahlbock. The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing. Eur J Oper Res. 2006; 173:781–800.

39. García, Luengo, Herrera. Data preprocessing in data mining. Germany: Springer Publishing Company, Incorporated. 2016.

40. Gao, Mei, Cuomo, et al. Adaptive RBF interpolation for estimating missing values in geographical data. In: Sergeyev YD, Kvasov DE (Eds). Numerical Computations: Theory and Algorithms, Italy, Springer, 2020, pp. 122–130.

41. Patel, Bhojak, Shah, et al. Comparison of various data cleaning methods in mining. Int J Adv Res Eng Sci Technol. 2016; 3:1–41.

42. Halevy, Rajaraman, Ordille. Data integration: The teenage years. In: Proceedings of the 32nd International Conference on Very Large Data Bases, Korea, VLDB Endowment, vol. 8, 2006, pp. 9–16.

43. Chen, Han, Yu. Data mining: An overview from a database perspective. IEEE Trans Knowl Data Eng. 1996; 8:866–883.

44. Han, Kamber. Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann Publishers. 2006.

45. Winship, Radbill. Sampling weights and regression analysis. Sociol Methods Res. 1994; 23:230–257.

46. Safavian, Landgrebe. A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern. 1991; 21:660–674.

47. Rish. An empirical study of the naive Bayes classifier. J Univ Comput Sci. 2001; 1:127.

48. Suykens JAK, Vandewalle. Least squares support vector machine classifiers. Neural Process Lett. 1999; 9:293–300.

49. Haykin. Neural networks: A comprehensive foundation, 3rd ed. New York, USA: Macmillan. 1994.

50. Toivonen. Apriori algorithm. New York, USA: Springer. 2010.

51. Zhang, Xiong, Geng, et al. Analysis and improvement of Eclat algorithm. Comput Eng. 2010; 36:28–30.

52. Meng, Shang, Ling. The application on intrusion detection based on K-means cluster algorithm. Washington: IEEE Computer Society. 2009.

53. Long, Zhang, Xiaoyun, et al. Spectral clustering for multi-type relational data. In: Cohen W, Moore A (Eds). Proceedings of the 23rd International Conference on Machine Learning, Association for Computing Machinery, New York, NY: vol. 8, 2006, pp. 585–592.

54. Gers. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000; 12:2451–2471.

55. Contreras, Espinola, Nogales, et al. ARIMA models to predict next-day electricity prices. IEEE Trans Power Syst. 2003; 18:1014–1020.

56. Bui, Pradhan, Lofman, et al. Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and Naïve Bayes models. Math Probl Eng. 2012; 2012:26.

57. Yeon, Han, Ryu. Landslide susceptibility mapping in Injae, Korea, using a decision tree. Eng Geol. 2010; 116:274–283.

58. Meng, Pei, Liu, et al. GIS-Based susceptibility assessment of geological hazards along the road from Dujiangyan to Wenchuan by factor analysis. Chin J Geol Hazard Control. 2016; 27:106–115.

59. Huo, Zhang, Yu-Dong, et al. Method of classification for susceptibility evaluation unit for geological hazards: A case study of Huangling County, Shaanxi, China. J Jilin Univ. 2011; 41:523–528+535.

60. Ramana, Krishna, Kumar, et al. Monthly rainfall prediction using wavelet neural network analysis. Water Resour Manage. 2013; 27:3697–3711.

61. Cheng, Zeng, Wei, et al. Multiple neural networks switched prediction for landslide displacement. Eng Geol. 2015; 186:91–99.

62. Korup, Stolle. Landslide prediction from machine learning. Geol Today. 2014; 30:26–33.

63. Kuradusenge, Kumaran, Zennaro. Rainfall-induced landslide prediction using machine learning models: The Case of Ngororero District, Rwanda. Int J Environ Res Public Health. 2020; 17:4147.

64. Shao, Jun. Geological hazards types induced by mining and their characteristics in Guizhou province. Chin J Geol Hazard Control. 2011; 22:56–60.

65. Zarnowitz, Ozyildirim. Time series decomposition and measurement of business cycles, trends and growth cycles. J Monet Econ. 2006; 53:1717–1739.

66. Seng. A new approach of moving average method in time series analysis. In: New Media Studies, Indonesia, IEEE, 2014, pp. 1–4.

67. Hochreiter, Schmidhuber. Long short-term memory. Neural Comput. 1997; 9:1735–1780.

68. Shirazipourazad, Sen, Bandyopadhyay. Fault-tolerant design of wireless sensor networks with directional antennas. Pervasive Mob Comput. 2014; 13:258–271.

69. Hoblos, Staroswiecki, Aitouche. Optimal design of fault tolerant sensor networks. In: IEEE International Conference on Control Applications, USA, IEEE, 2018, pp. 467–472.

70. Lee, Ke, Fang, et al. Open-source wireless sensor system for long-term monitoring of slope movement. IEEE Trans Instrum Meas. 2017; 66:767–776.

71. Ong, Yang, Mukherjee, et al. A wireless sensor network for long-term monitoring of aquatic environments: Design and implementation. Sensor Lett. 2004; 2:48–57.

72. Yang, Li. Design and implementation of low-power wireless sensor networks for environmental monitoring. In: IEEE International Conference on Wireless Communications, China, IEEE, 2010, pp. 593–597.

73. Abraham, Li. Design of a low-cost wireless indoor air quality sensor network system. Int J Wirel Inf Netw. 2016; 23:57–65.

74. Hakala, Kivelä, Ihalainen, et al. Design of low-cost noise measurement sensor network: Sensor function design. In: 2010 First International Conference on Sensor Device Technologies and Applications, Italy, IEEE, 2010, pp. 172–179.

75. Rabaey, Ammer, Silva, et al. PicoRadio supports ad hoc ultra-low power wireless networking. Computer. 2000; 33:42–48.

76. Torrero, Seoli, Molino, et al. The use of micro-UAV to monitor active landslide scenarios. In: Lollino G, Manconi A, Guzzetti F, et al. (Eds). Engineering Geology for Society and Territory, Vol 5. Cham, Switzerland: Springer, 2014, pp. 701–704.

77. Lindner, Schraml, Mansberger, et al. UAV monitoring and documentation of a large landslide. Applied Geomat. 2015; 8:1–11.

78. Mobile edge computing towards 5G: Vision, recent progress, and open challenges. China Commun. 2016; 13(Supplement2):89–99.

79. Khodashenas PSKS, Ruiz, Siddiqui MSSS, et al. The role of Edge Computing in future 5G mobile networks: Concept and challenges. In: Markakis E, Mastorakis G, Mavromoustakis C, Pallis E (Eds). Cloud and Fog Computing in 5G Mobile Networks: Emerging Advances and Applications. Beijing, China: The Institution of Engineering and Technology (The IET). DOI: 10.1049/PBTE070E_ch13.

80. Tran, Hajisami, Pandey, et al. Collaborative mobile edge computing in 5G networks: New paradigms, scenarios, and challenges. IEEE Commun Mag. 2017; 55:54–61.

81. Pradhan, Kang, Kim. Hybrid landslide warning model for rainfall triggered shallow landslides in Korean Mountain. In: Workshop on World Landslide Forum, Slovenia, Springer, 2017, pp. 193–200.

82. Xiong, Xie, Kuang. Design and implementation of geological hazard forecast system based on WebGIS. In: 2009 International Conference on Computational Intelligence and Software Engineering, China, IEEE, 2009.

83. Scarselli, Gori, Tsoi, et al. The graph neural network model. IEEE Trans Neural Netw. 2009; 20:61.

84. Yuting, Ming, Chicheng, et al. Graph neural network. Sci Sin Math. 2020; 50:367.

85. Zhao, Song, Zhang, et al. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans Intell Transp Syst. 2019; 21:3848–3858.

86. Zhishuai, Yisheng, Xiong. Short-term traffic flow prediction based on graph convolutional neural network and attention mechanism. J Transp Eng. 2019; 19:16–19+28.