Pseudo grey metabolic Markov model and its application in urban rainfall prediction

Abstract

Purpose:

The purpose of this paper is to propose a pseudo grey metabolic Markov model for prediction problems in which the original sequences are oscillation sequences.

Design/methodology/approach:

First, the original sequences were processed with the accelerated translational transformation and the weighted mean generation transformation to make them smoother. Then, the mean GM (1, 1) model was applied to the multi-step prediction of the pre-processed data sequences. Finally, with the help of the optimal partitioning method, the pseudo grey metabolic Markov model was used to correct the prediction results and determine the final prediction values.

Findings:

The results demonstrate that the accuracy of this model is significantly higher than that of the traditional grey Markov model, which further verifies the rationality of the proposed model. Therefore, scientific and reasonable prediction of urban rainfall is of great theoretical significance and application value for the government and decision-making departments to formulate drought prevention and disaster mitigation measures.

Originality/value:

The model in this paper not only provides new ideas for the data preprocessing problem of the grey Markov model, but also reduces the errors caused by individual subjectivity in state interval division. It provides a novel idea for the development of grey prediction models. The rationality and validity of the model are illustrated by taking Zhengzhou City of Henan Province as an example.

Keywords

Prediction, GM (1, 1) model, Markov model, rainfall

1 Introduction

In the twenty-first century, water has become one of the scarcest resources, and it is also a key resource for human survival [1]. Rainfall plays a critical role in agricultural production, industrial construction, and water resources and flood control [2]. Meanwhile, rainfall is one of the main causes of natural disasters such as floods and droughts. Rainfall changes are closely related to the development and utilization of regional water resources and even affect the development of society and the economy. In recent years, environmental problems caused by human activities and natural factors have become increasingly severe and the frequency of droughts and floods has gradually increased, causing huge losses to society and the economy. Therefore, accurate prediction of rainfall can provide scientific guidance for flood and drought prevention and reduce the losses caused by natural disasters. The rainfall process is highly random owing to the variability and complexity of meteorological conditions, so predicting rainfall more accurately has become a popular topic. A considerable amount of literature on rainfall prediction has been published. These studies show that the main methods for rainfall prediction include the ARIMA prediction method [3], classical correlation analysis [4] and Markov models. These models and methods have been investigated from different perspectives and have yielded many meaningful results on rainfall prediction.

Raval first proposed machine learning algorithms and compared the performance of different models [5]. He then developed a prediction model using a neural network. Finally, he carried out a comparative study of new and existing prediction techniques using Australian rainfall data. The results showed that both traditional and neural network-based machine learning models can predict rainfall with more precision. Fan developed a hydrological forecasting system using merged rainfall information from ground-based telemetric gauges and real-time TRMM satellite rainfall estimates [6]. The feasibility of the model was verified by analyzing the flood predictions for the 2011/2012 rainy season and comparing the results of deterministic and ensemble forecasts for the major flood of 2012/2013. He et al. attempted to combine rainfall prediction from a high-resolution meso-scale weather model and a radar-based rainfall model [7]. Two rainfall forecasting methods were selected and examined: the weather research and forecasting model (WRF) and a translation model (TM). The results from WRF were very useful as an advisory of an anticipated heavy rainfall event, whereas those from TM, which used information on rain cells already appearing on the radar screen, were more accurate for rainfall nowcasting, as expected. All the above models can effectively predict rainfall, which lays the foundation for the development of rainfall prediction.

In 1907, Markov chains were first introduced by the Russian mathematician A. A. Markov. Gabriel investigated the sequence of daily rainfall occurrences and found that the daily rainfall model was consistent with Markov chains [8]. Over the following decades, the application of Markov chains to rainfall developed considerably. Pereira constructed a two-state Markov chain model [9]. In that work, the Bayesian Information Criterion (BIC) was used to establish the order of the Markov chain that best fitted the data. The data generated using the optimal order were closer to the real data than before, except for two sites. Lu built a cellular automata-based Markov chain model based on fuzzy set theory and multi-criteria evaluation [10]. This study highlighted the need for integrating spatiotemporal modeling analyses, such as a statistical downscaling model driven by climate change, with remote sensing and GIS. The research findings indicated that the mean rainfall will increase in the future near New York City. Wang proposed a scheme based on the improved residual multivariable grey model and the Markov process [11]. First, on the basis of analyzing the multivariable grey model (MGM) and the Markov process, the improved residual MGM-Markov theory was expounded in detail. Second, a dual pre-warning scheme for transmission line icing was proposed based on the improved residual MGM-Markov theory. Finally, the scheme was applied to predict transmission line icing in a province of China. The error was less than 5%, which demonstrated its accuracy and applicability.

All of the above studies have greatly contributed to the development of Markov chains. Scholars have conducted the following studies for the problem of dividing the state intervals of Markov models and pre-processing the original data.

(1) In the division of state intervals: Ye introduced the central-point triangular whitenization weight function into state division to calculate the possibilities of research values in each state, which objectively reflected the preference degrees for different states [12]. The method effectively solved the problem that state partitioning in the traditional grey Markov prediction model was mostly based on subjectively chosen real numbers, which affected the accuracy of the predicted values. Gong divided 61 annual low-temperature weighted indexes of 1951–2011 into five states using the mean standard deviation method [13]. The feasibility of the model was verified by applying both the weighted Markov method and the grey weighted Markov method to the real low-temperature states of 2010 and 2011. Lu divided the data series into several states according to the principle of equal probability, then applied the model to the prediction of Beijing subway passenger flow and verified its applicability [14]. The above models introduce numerical calculation methods, which provide a new idea for state interval division. However, in application, they suffer from high time complexity and computational complexity.

(2) In the preprocessing of the raw data: Wang used a weighted Markov chain model for prediction. The model treats the data as a sequence of dependent random variables, in which the correlation coefficients of each order describe the dependence between values at various step lengths and its strength [15]. The optimal partitioning algorithm was used to determine the grading criteria. The annual rainfall data from 1951 to 2004 in Wuhan City were used to predict the rainfall situation in 2008, which demonstrated the feasibility of the model. Chen proposed an improved weighted Markov chain prediction method to address the problem of smoothness of Markov chains [16]. The annual rainfall data from 1952 to 1998 from Hequ hydrological station in Shaanxi province were used for testing, which demonstrated the feasibility of the model. Miao established a weighted Markov chain model [17]. The model used the affiliation vector as the initial state vector in prediction and the normalized autocorrelation coefficients of each order as weights. Taking the annual rainfall of Yulin City from 2009 to 2015 as an example, the relative errors of all prediction results were within 10%, which indicated that the model is reasonable and feasible.

The above models are weighted Markov models; in other words, the raw data are treated as a sequence of dependent random variables. To some extent, this promotes the diversification of Markov research directions. However, how to handle the case in which the original data are non-dependent variables remains an open question.

The factors affecting rainfall mainly include geographical location, ocean currents, vegetation and hydrological conditions. The extent, time and intensity of rainfall are uncontrollable, which leads to greyness in the statistical process of data. Grey system theory was first proposed by Professor Deng and includes the grey prediction model, the grey decision-making model and the grey association model; it is a useful tool for analyzing small samples from different perspectives [18, 19]. The main advantage of the grey prediction model is that only a few data are required to obtain accurate prediction results [20]. In Deng’s work, the GM (1, 1) model is proposed, which is a grey model with first-order accumulation and one variable. Additionally, Markov models are suitable for forecasting problems with high randomness and volatility. The combination of both is used to construct the grey Markov model [21]. In state interval partitioning, the existing grey Markov models suffer from high time complexity and computational complexity. In addition, when a grey model needs to be constructed, most scholars process the original data sequences by means of the weighted grey Markov model or the unbiased grey Markov model. Therefore, conducting research that starts from the data themselves is an urgent problem to be solved.

Therefore, this paper constructs the following model based on the existing research. First, under the framework of the grey Markov model, we apply the accelerated translational transformation and the weighted mean generation transformation to the highly volatile original data sequences so that the sequences become smoother; second, we use the optimal partitioning method to determine the state intervals; finally, we apply this model to the annual rainfall prediction of Zhengzhou City, which verifies the feasibility and validity of the model.

Methodologically speaking, this model not only reduces the computational complexity of existing models, but also makes the division of state intervals more scientific and reasonable. The cluttered data series are appropriately transformed and then fitted with the mean GM (1,1) model, which improves the goodness of fit and the prediction accuracy. In terms of application, the main influencing factors of urban rainfall are temperature, precipitation, unit surface water resources and unit ground water resources. Since the scope, time and intensity of drought are uncontrollable, the statistical data often show the coexistence of greyness and randomness. It is therefore reasonable to apply the model to urban rainfall prediction, and the model in this paper can also be used for similar prediction problems.

This paper is organized as follows. In Section 2, the construction of the novel grey model and the modeling steps are given. In Section 3, a practical case based on Zhengzhou’s annual rainfall data from 2009 to 2017 is presented, followed by a further prediction for 2018 to 2020. The practical case is used to validate the novel model. Section 4 concludes and summarizes the whole research.

2 Model construction

Markov models have been widely used in various fields since they were proposed. Some scholars have extended them to high-order Markov models and combined them with fuzzy theory or grey theory, and these methods demonstrated better performance. To address the existing problems of the grey Markov model, scholars have improved it from the following two perspectives. On the one hand, state intervals are delineated by optimized methods, such as the hierarchical clustering method using Euclidean distance, the mean standard deviation method and the equal probability principle. On the other hand, the raw data are preprocessed. Most scholars used the weighted Markov model, in which the raw data are a sequence of dependent random variables. However, the former suffers from computational complexity, and in the latter the processing of raw data is homogeneous. Thus, the question of how to handle sequences of non-dependent variables remains to be answered.

This paper proposes a pseudo grey metabolic Markov model to tackle the two questions above. The model first processes the raw data appropriately and then applies a modified grey Markov model for prediction. Optimal partitioning is a clustering method that partitions n state data into k classes such that the intra-segment variation is minimized while the inter-segment variation is maximized. This paper applies the optimal partitioning method to delineate the state intervals, which is the main difficulty of this study. Given that actual data series are mostly random oscillation series whereas the mean GM (1,1) model is most suitable for modeling non-exponential growth series, the novelty of this paper lies in processing the raw data first. Next, the pseudo grey metabolic Markov model and the traditional models are applied to predict the annual rainfall in Zhengzhou City, which indicates that the accuracy of this model is improved compared with the traditional models.

2.1 Data processing

Theoretically, a simple model that describes a monotonic trend can hardly describe the variation of oscillation sequences. Rainfall data are characterized by volatility and randomness. Meanwhile, the mean GM (1, 1) model has a considerable advantage in modeling non-exponential growth sequences. For these reasons, we adopt the accelerated translational transformation and the weighted mean generation transformation to preprocess the oscillation data sequences.

Definition 1 [22] Let X = (x(1), x(2), ⋯, x(n)) be a sequence.

If ∀k = 2, 3, ⋯, n, x(k) - x(k - 1) > 0, the sequence X is a monotonically increasing sequence.

If the inequality sign above is reversed, the sequence X is a monotonically decreasing sequence.

If ∃k, k′ ∈ {2, 3, ⋯, n} such that x(k) - x(k - 1) > 0 and x(k′) - x(k′ - 1) < 0, then the sequence X is a random oscillation sequence. M - m is the amplitude of the random oscillation sequence, where M = max {x(k) | k = 1, 2, ⋯, n} and m = min {x(k) | k = 1, 2, ⋯, n}.

Definition 2 [23] Let X = (x(1), x(2), ⋯, x(n)) be a sequence and T = M - m. The transformation $x(k) d_{1} = x(k) + (k - 1) T, k = 1, 2, \dots, n$ (1) is called the accelerated translational transformation, denoted as D₁.

Every sequence can be translated into a monotonically increasing sequence after the accelerated translational transformation.

Definition 3 [24] Let X = (x(1), x(2), ⋯, x(n)) be a sequence. The transformation $x(k) d_{2} = \frac{1}{k} \sum_{i = 1}^{k} x(i), k = 1, 2, \dots, n$ (2) is called the weighted mean generating transformation, denoted as D₂.

The weighted mean generation transformation can weaken the randomness of the original data sequences. It also means that the volatility is diminished.
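The two transformations can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that D₂ is applied to the output of D₁, which is the reading under which the first rows of Table 1 are reproduced.

```python
def accelerated_translation(x):
    """D1: x(k) + (k - 1) * T, with T = max(x) - min(x) and k starting at 1."""
    amplitude = max(x) - min(x)
    return [value + index * amplitude for index, value in enumerate(x)]


def mean_generation(x):
    """D2: mean of the first k terms, for k = 1, ..., n."""
    result, running_sum = [], 0.0
    for k, value in enumerate(x, start=1):
        running_sum += value
        result.append(running_sum / k)
    return result


# Raw annual rainfall of Zhengzhou, 2009-2017 (Table 1, mm).
raw = [762.5, 600.3, 706.5, 498.7, 353.2, 551.6, 689.1, 833.0, 598.8]
d1 = accelerated_translation(raw)
d2 = mean_generation(d1)
print([round(v, 1) for v in d1[:4]])  # [762.5, 1080.1, 1666.1, 1938.1]
print([round(v, 1) for v in d2[:4]])  # [762.5, 921.3, 1169.6, 1361.7]
```

Here T = 833.0 - 353.2 = 479.8, so the second transformed value is 600.3 + 1 × 479.8 = 1080.1.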

2.2 State interval division

The traditional Markov model divides the state intervals according to the concentration of the error range so that the objective law of state change is satisfied in each interval. However, this division process is highly subjective. Different people judge the concentration of the relative simulation errors differently, which makes the delineated intervals largely arbitrary. To weaken this arbitrariness, we apply the optimal partitioning method to the division of state intervals. Optimal partitioning is a clustering method that partitions n state data into k classes. This clustering method minimizes the intra-segment data variation and maximizes the inter-segment variation. The detailed process is as follows:

Define the state set E = {e₁, e₂, ⋯, e_n}, and divide the n states into k classes. Class 1 is {e₁, e₂, ⋯, e_p}, class 2 is {e_{p+1}, e_{p+2}, ⋯, e_q}, …, class k - 1 is {e_r, e_{r+1}, ⋯, e_s}, and class k is {e_{s+1}, e_{s+2}, ⋯, e_n}. Let {e_i, e_{i+1}, ⋯, e_j} denote one of the classes. The mean value of its state data is $\bar{e}_{ij} = \frac{1}{j - i + 1} \sum_{l = i}^{j} e_{l}$ (3) and its variation is $D(i, j) = \sum_{l = i}^{j} (e_{l} - \bar{e}_{ij})^{2}$ (4) Define the total variation of the k classes as S(k), where

$S(k) = D(1, p) + D(p + 1, q) + \dots + D(r, s) + D(s + 1, n)$ (5) The optimal segmentation is the one for which the total variation S(k) is smallest.

Namely, the minimum of S(k) is obtained recursively, as expressed by Equations (6)–(8): calculate the minimum value of D(s + 1, n) and solve for the segmentation point t_k to obtain class k, denoted as [t_k, e_n]. In the same way, we get the remaining k - 2 segmentation points t_{k-1}, t_{k-2}, ….

From the theoretical perspective, the traditional Markov model state intervals are divided according to the concentration of the error range, which has a strong subjectivity. This method reduces the randomness caused by artificial selection so that the divided state intervals are more scientific.
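The recursion behind the optimal partitioning method can be sketched as a small dynamic program. This is a minimal sketch under stated assumptions, not the exact procedure of the paper: the function names and the sample data are hypothetical, the data are assumed to be already sorted, and no minimum class size is enforced.

```python
def within_class_variation(data, i, j):
    """D(i, j): sum of squared deviations of data[i..j] (0-based, inclusive) from their mean."""
    segment = data[i:j + 1]
    mean = sum(segment) / len(segment)
    return sum((value - mean) ** 2 for value in segment)


def optimal_partition(data, k):
    """Split ordered data into k contiguous classes with minimal total variation S(k)."""
    n = len(data)
    inf = float("inf")
    # best[c][j]: minimal total variation when data[0..j] is split into c classes.
    best = [[inf] * n for _ in range(k + 1)]
    # start[c][j]: index at which the last of those c classes begins.
    start = [[0] * n for _ in range(k + 1)]
    for j in range(n):
        best[1][j] = within_class_variation(data, 0, j)
    for c in range(2, k + 1):
        for j in range(c - 1, n):
            for s in range(c - 1, j + 1):
                # S_min(c) = S_min(c - 1) + D(s, j)
                cost = best[c - 1][s - 1] + within_class_variation(data, s, j)
                if cost < best[c][j]:
                    best[c][j], start[c][j] = cost, s
    # Walk back through the stored segmentation points.
    classes, end = [], n - 1
    for c in range(k, 0, -1):
        s = start[c][end] if c > 1 else 0
        classes.append(data[s:end + 1])
        end = s - 1
    return classes[::-1], best[k][n - 1]


# Hypothetical ordered data, chosen only to make the grouping obvious.
sample = [1.0, 1.1, 5.0, 5.2, 9.0, 9.1]
classes, total = optimal_partition(sample, 3)
print(classes)  # [[1.0, 1.1], [5.0, 5.2], [9.0, 9.1]]
```

The triple loop evaluates every admissible position of the last segmentation point, which keeps the sketch short at the cost of recomputing D(i, j) many times; for the nine state data of a case like this one, that cost is negligible.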

2.3 Determine the one-step transfer probability matrix

The one-step transfer probability matrix is $P = [\begin{matrix} p_{11} & p_{12} & \dots & p_{1 n} \\ p_{21} & p_{22} & \dots & p_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n 1} & p_{n 2} & \dots & p_{nn} \end{matrix}]$ , where $p_{ij} = f_{ij} / \sum_{l = 1}^{n} f_{il}$ , i, j = 1, 2, ⋯, n. Here p_ij represents the probability that the sequence is transferred from state i to state j in one step, and f_ij represents the frequency with which the sequence is transferred from state i to state j in one step, so that each row of P sums to 1.

2.4 Marginality test

Let E = (e₁, e₂, ⋯, e_n) be the set of states, and let f_ij represent the frequency with which the sequence is transferred from state i to state j in one step, where i, j = 1, 2, ⋯, n. The sum of column j of the transfer frequency matrix divided by the sum of all its entries is called the “marginal probability”, denoted as p_j, where $p_{j} = \sum_{i = 1}^{n} f_{ij} / \sum_{i = 1}^{n} \sum_{j = 1}^{n} f_{ij}$ . The statistic χ² is calculated as $χ^{2} = 2 \sum_{i = 1}^{n} \sum_{j = 1}^{n} f_{ij} | \ln (\frac{p_{ij}}{p_{j}}) |$ , which follows the χ² distribution with (n - 1)² degrees of freedom. Given the significance level α, the critical value $χ_{α}^{2} ((n - 1)^{2})$ is read from the χ² table. When $χ^{2} > χ_{α}^{2} ((n - 1)^{2})$ , the sequence is considered to pass the marginality test, that is, it can be treated as having the Markov property.
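The statistic can be evaluated with a short routine. The sketch below is illustrative only: the function name and the frequency counts are hypothetical, and entries with f_ij = 0 are skipped on the assumption that they contribute nothing to the sum.

```python
import math


def markov_chi_square(freq):
    """Return (statistic, degrees of freedom) for a square transfer frequency matrix."""
    n = len(freq)
    total = sum(sum(row) for row in freq)
    # Marginal probability p_j: column sum divided by the grand total.
    marginal = [sum(freq[i][j] for i in range(n)) / total for j in range(n)]
    statistic = 0.0
    for i in range(n):
        row_sum = sum(freq[i])
        for j in range(n):
            if freq[i][j] == 0:
                continue  # a zero count is assumed to contribute nothing
            p_ij = freq[i][j] / row_sum
            statistic += 2 * freq[i][j] * abs(math.log(p_ij / marginal[j]))
    return statistic, (n - 1) ** 2


# Hypothetical two-state counts in which each state tends to persist.
statistic, dof = markov_chi_square([[8, 2], [3, 7]])
print(round(statistic, 3), dof)
```

The returned statistic is then compared with the critical value of the χ² distribution for the returned degrees of freedom at the chosen significance level.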

Fig. 1

Modeling process of the novel model.

2.5 Probability criterion for determining state transfer

Let the sequence be in state e_i at the initial moment, so that the initial state vector is $α (i) = (0, 0, \dots, 0, 1, 0, \dots, 0)$ , whose i-th component is 1. The state vector reached from state e_i in one step is β (j), where $β (j) = α (i) P = (p_{i 1}, p_{i 2}, \dots, p_{in})$ , i.e. the i-th row of P. The maximum p_ij in this row is selected, so that the state of the next step is e_j. If two or more entries of β (j) = (p_i1, p_i2, ⋯, p_in) share the maximum value, the tie is resolved according to the position of the row. If i = 1, take the tied entry p_ik with the smallest column index k; if i = n, take the tied entry p_iq with the largest column index q; if i ≠ 1 and i ≠ n, take the tied entry p_im whose column index m minimizes |i - m|. In other words, if the row is the first or the last row, the outermost tied entry on the corresponding side is adopted; if the row is a middle row, the tied entry closest to the diagonal is adopted.
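One reading of this selection rule can be written down directly. The sketch is an interpretation rather than a definitive implementation: the function name is hypothetical, states are numbered from 1, and when a middle row has two tied entries at the same distance from the diagonal, the lower index is taken, which is an assumption the text does not settle.

```python
def next_state(transfer, i):
    """Return the 1-based state reached from state i (1-based) in one step."""
    n = len(transfer)
    row = transfer[i - 1]  # beta(j) = alpha(i) P is row i of P
    top = max(row)
    tied = [j + 1 for j, p in enumerate(row) if p == top]
    if i == 1:
        return min(tied)  # first row: smallest tied index
    if i == n:
        return max(tied)  # last row: largest tied index
    # Middle rows: nearest tied state; equal distances fall to the lower index (assumption).
    return min(tied, key=lambda j: (abs(i - j), j))


# One-step transfer probability matrix of the case study (Section 3.2).
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.5, 0.5, 0.0],
]
print(next_state(P, 1))  # 1
```

With this matrix every row has two tied maxima, so the tie-breaking rule decides each transfer; starting from state 1 the sketch returns state 1.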

2.6 Revised predicted values

The revised value of a predicted value depends on the state to which the sequence is transferred next. Following the idea of metabolism, the latest revised value is appended to the data sequence as soon as it is available, without deleting the oldest data.

Assume that the predicted object is next transferred to the state e_j. The revised value of the predicted value is $\tilde{x} (k) = \frac{{\hat{x}}^{0} (k)}{1 \pm 0.5 | E_{j 1} + E_{j 2} |}$ (9) where ${\hat{x}}^{0} (k)$ is the value predicted by the mean GM (1, 1) model, and E_j1 and E_j2 are the lower and upper bounds of the state interval e_j.

2.7 Algorithm of the model

Step 1: Transform all the raw data by applying Equations (1) and (2).

Step 2: Establish the mean GM (1, 1) model by regarding the transformed data as the raw data.

Step 3: Test the model’s accuracy and applicability.

Step 4: Delineate state intervals using the relative simulation error results of the fitting model as the set of state.

Step 5: Calculate the one-step transfer probability matrix.

Step 6: Test the Marginality of the model.

Step 7: Correct the predicted values by applying Equation (9).

The recursion of the optimal partitioning method (Section 2.2), which is used in Step 4, reads:

$\min [S (k)] = \min [D (1, p) + D (p + 1, q) + \dots + D (r, s) + D (s + 1, n)]$ (6)

$S (k) = S (k - 1) + D (s + 1, n)$ (7)

$S_{\min} (k) = S_{\min} (k - 1) + D_{\min} (s + 1, n)$ (8)

3 Case study

3.1 Study area and data sources

Zhengzhou, the capital city of Henan Province, is located in the central north of Henan Province and in the middle and lower reaches of the Yellow River, and belongs to the Huai River basin. It covers over 7446 square kilometers (Fig. 2). Zhengzhou spans 112.42°-114.14°E and 34.16°-34.58°N and has a north temperate continental monsoon climate with four distinct seasons: summers are hot, winters are cold and dry, and rainfall is unevenly distributed across the seasons. The rainfall in Zhengzhou is mainly concentrated in summer and varies greatly from year to year, so the rainfall process is highly random and the rainfall data fluctuate strongly. Since the rainfall sequence is full of uncertainty, we apply the model below to predict the rainfall of Zhengzhou more precisely.

Fig. 2

Location of Zhengzhou City.

The data in this paper are from the China Statistical Yearbook (2010–2020) and the National Weather Science Data Center (2020).

3.2 Application of the model

1. Data processing

All the raw data are transformed by applying Equations (1) and (2) in turn, so that d₂ (X) is obtained from d₁ (X). The processed data are shown in Table 1 and Fig. 3.

Table 1
Annual rainfall of Zhengzhou City in 2009–2020 (Unit: mm)

Year	Raw data X	Data d₁ (X) of transformation d₁	Data d₂ (X) of transformation d₂
2009	762.5	762.5	762.5
2010	600.3	1080.1	921.3
2011	706.5	1666.1	1169.6
2012	498.7	1938.1	1361.7
2013	353.2	2272.4	1543.8
2014	551.6	2950.6	1778.3
2015	689.1	3567.9	2034.0
2016	833	4191.6	2303.7
2017	598.8	4437.2	2540.7
2018	609.5	4927.7	2779.4
2019	633.4	5431.4	3020.5
2020	681.7	5959.5	3265.4

Table 2

Annual rainfall fitting results from 2009 to 2017

Year	Actual value	Simulated value	Residual	Relative simulation error
2009	762.500	762.500	0	0
2010	921.300	1021.957	–100.657	–10.926%
2011	1169.567	1168.077	1.49	0.127%
2012	1361.700	1335.089	26.611	1.954%
2013	1543.840	1525.981	17.859	1.157%
2014	1778.300	1744.167	34.133	1.919%
2015	2033.957	1993.549	40.408	1.987%
2016	2303.663	2278.588	25.075	1.089%
2017	2540.700	2604.382	–63.682	–2.506%

Table 3

States to which the relative simulation errors belong

Year	Relative simulation error	Affiliated status interval
2009	0	1
2010	–10.926%	1
2011	0.127%	2
2012	1.954%	4
2013	1.157%	3
2014	1.919%	3
2015	1.987%	4
2016	1.089%	2
2017	–2.506%	1

Table 4

Table of marginal probabilities

State	1	2	3	4
p _j	1/4	1/4	1/4	1/4

As can be seen from Fig. 3, the raw data X are significantly volatile. Compared with the raw data, the data sequence d₁ (X) shows an obvious trend of monotonic growth. Compared with d₁ (X), the randomness and volatility of the data sequence d₂ (X) are weakened. Overall, these results indicate that the sequence d₂ (X) is the best choice for building the mean GM (1, 1) model because it is smoother than the other two.

Fig. 3

Comparison of two transformations.

2. Establish the mean GM (1, 1) model

We build the mean GM (1, 1) model by replacing the raw data with the transformed data sequence d₂ (X). The results are shown in Table 2.

We apply the mean GM (1, 1) model to the transformed annual precipitation sequence from 2009 to 2017. Figure 4 indicates that the simulated values are consistent with the actual values except for 2010. The average relative simulation error is 2.708%, which indicates a good fit.
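The fitting step can be sketched as follows. The sketch assumes the usual definition of the mean GM (1, 1) model, x⁽⁰⁾(k) + a z⁽¹⁾(k) = b with z⁽¹⁾(k) the mean of consecutive accumulated values and parameters estimated by least squares, which the text does not spell out; the function names are hypothetical. Under that assumption the parameters should come out close to the reported a = –0.134 and b = 853.290.

```python
import math


def fit_mean_gm11(x0):
    """Estimate (a, b) of x0(k) + a * z1(k) = b by ordinary least squares."""
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)  # first-order accumulated sequence
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]  # mean sequence
    y = x0[1:]
    m = len(y)
    z_mean, y_mean = sum(z1) / m, sum(y) / m
    slope = (sum((z - z_mean) * (v - y_mean) for z, v in zip(z1, y))
             / sum((z - z_mean) ** 2 for z in z1))
    a = -slope                 # y = -a * z + b
    b = y_mean + a * z_mean
    return a, b


def simulate_gm11(x0, a, b, steps):
    """Return `steps` restored values, the first being the first observation."""
    c = x0[0] - b / a
    values = [x0[0]]
    for k in range(1, steps):
        # Difference of the time response c * exp(-a * k) + b / a at k and k - 1.
        values.append(c * (1 - math.exp(a)) * math.exp(-a * k))
    return values


# Transformed sequence d2(X), 2009-2017 (actual values of Table 2).
x0 = [762.5, 921.3, 1169.567, 1361.7, 1543.84, 1778.3, 2033.957, 2303.663, 2540.7]
a, b = fit_mean_gm11(x0)
fitted = simulate_gm11(x0, a, b, steps=10)  # nine fitted values plus one step ahead
errors = [abs(x - s) / x * 100 for x, s in zip(x0[1:], fitted[1:9])]
print(round(a, 3), round(sum(errors) / len(errors), 2))
```

The tenth element of `fitted` is the one-step-ahead value for 2018, which is the quantity that the Markov correction later revises.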

Fig. 4

Fitting image of the mean GM (1, 1) model.

3. Model applicability and model accuracy test

The calculation results show that the development coefficient of the grey model is a = –0.134 and the grey action quantity is b = 853.290. Since –a = 0.134 < 0.3, the model can be used for medium and long-term forecasting.

The calculation results indicate that the average relative simulation error is 2.708%, i.e. $\bar{Δ} = 0.02708$ , and Δ_n = 0.02506. Taking α = 0.05, we get $\bar{Δ} < α$ and Δ_n < α, so the model accuracy is level 2. The absolute degree of correlation between the actual sequence and the corresponding simulation sequence of the prediction model is 99.9%, i.e. ɛ = 0.999. Taking ɛ₀ = 0.9, we get ɛ > ɛ₀, so the model accuracy is level 1. The mean squared error ratio is C = 0.079. Taking C₀ = 0.35, we get C < C₀, so the model accuracy is level 1. The small error probability is p = 1. Taking p₀ = 0.95, we get p > p₀, so the model accuracy is level 1. Therefore, the model can be used for prediction.

4. State division

The relative simulation errors between the simulated and actual values of the mean GM (1,1) model yield the state data E = {0, –0.10926, 0.00127, 0.01954, 0.01157, 0.01919, 0.01987, 0.01089, –0.02506}. There are nine state data. We propose dividing them into four state intervals so that each state interval contains at least two values. The fourth interval is determined by the variation values, which give the fourth segmentation point t₄ (the indices below refer to the state data sorted in ascending order). It is calculated as follows: D(8,9) = 2.7225×10⁻⁸, D(7,9) = 2.31267×10⁻⁷, D(6,9) = 4.77923×10⁻⁵, and D(8,9) < D(7,9) < D(6,9). Therefore, the fourth-best segmentation point is e₈, i.e. 0.01954. The remaining three segmentation points are obtained in the same way. In summary, the final segmentation results are as follows: the first class: [–0.10926, 0]; the second class: [0.00127, 0.01089]; the third class: [0.01157, 0.01919]; and the fourth class: [0.01954, 0.01987].

5. Calculate the one-step transfer probability matrix

The one-step transfer probability matrix is $(p_{ij})_{4 \times 4} = [\begin{matrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ \frac{1}{2} & 0 & 0 & \frac{1}{2} \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 \end{matrix}]$ .
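The matrix can be recovered by counting the one-step transfers in the state sequence of Table 3. In this sketch the function name is hypothetical and exact fractions are used so that the entries print as 1/2 rather than 0.5.

```python
from fractions import Fraction


def transfer_matrices(states, n):
    """Return (frequency matrix, one-step probability matrix) for 1-based states."""
    freq = [[0] * n for _ in range(n)]
    for current, following in zip(states, states[1:]):
        freq[current - 1][following - 1] += 1
    prob = []
    for row in freq:
        row_sum = sum(row)
        # A state that is never left keeps an all-zero row (assumption of this sketch).
        prob.append([Fraction(count, row_sum) if row_sum else Fraction(0) for count in row])
    return freq, prob


states = [1, 1, 2, 4, 3, 3, 4, 2, 1]  # affiliated states for 2009-2017 (Table 3)
freq, prob = transfer_matrices(states, 4)
for row in prob:
    print([str(p) for p in row])
# ['1/2', '1/2', '0', '0']
# ['1/2', '0', '0', '1/2']
# ['0', '0', '1/2', '1/2']
# ['0', '1/2', '1/2', '0']
```

Nine states give eight transfers, two leaving each state, which is why every non-zero probability equals 1/2.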

6. Marginality test

The transfer frequency matrix (f_ij)_4×4 and the one-step transfer probability matrix (p_ij)_4×4 can be obtained from Table 3. $(f_{ij})_{4 \times 4} = [\begin{matrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \end{matrix}]$ (10) $(p_{ij})_{4 \times 4} = [\begin{matrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ \frac{1}{2} & 0 & 0 & \frac{1}{2} \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 \end{matrix}]$ (11)

The corresponding marginal probabilities, obtained from (f_ij)_4×4, are shown in Table 4.

The statistic is $χ^{2} = 2 \sum_{i = 1}^{n} \sum_{j = 1}^{n} f_{ij} | \ln \frac{p_{ij}}{p_{j}} | = 22.1808$ . Given the significance level α = 0.05, the critical value read from the χ² table is $χ_{α}^{2} (9) = 16.919$ . Since $χ^{2} > χ_{α}^{2} ((n - 1)^{2})$ , the sequence passes the marginality test.

7. Revise the forecast value

We apply the mean GM (1, 1) model to predict the annual rainfall in 2018, and the predicted value is 2976.8. Then we apply the Markov model to correct the predicted value. Since 2017 is in the first state and the two largest entries in the first row of the transfer probability matrix are tied, the criterion of Section 2.5 gives the first state as the most likely state for 2018. Consequently, the correction value is 2976.8/(1 + 0.5|–0.10926 + 0|) = 2823.3. We add the revised value of 2018 to the original sequence to predict the value of 2019, and the predicted value is 3293.9. We then obtain the one-step transfer probability matrix $P_{1} = [\begin{matrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ \frac{1}{2} & 0 & 0 & \frac{1}{2} \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 \end{matrix}]$ . Similarly, the revised value of 2019 is 3124.1 and the revised value of 2020 is 3447.3 (Table 5).

Table 5 and Fig. 5 indicate that the average relative simulation error is 12.691% when the mean GM (1, 1) model is used to predict directly. Correcting the predicted values with the grey Markov model improves the accuracy, reducing the average relative simulation error to 6.881%. Considering the impact of new information on the sequence and the idea of metabolism, we add the corrected value of 2018 to the sequence, without deleting the data of 2009, when predicting the value of 2019. In the same way, we predict and correct the value of 2020. As a result, the prediction accuracy is further improved and the average relative simulation error is reduced to 3.285%.

Table 5

Predicted values of annual rainfall from three models in 2018–2020

Year	Actual value	Mean GM (1,1) model: predicted value	Mean GM (1,1) model: relative simulation error	Grey Markov model: correction value	Grey Markov model: relative simulation error	Pseudo grey metabolic Markov model: predicted value after adding the correction value	Pseudo grey metabolic Markov model: correction value	Pseudo grey metabolic Markov model: relative simulation error
2018	2799.4	2976.7	–6.337%	2823.3	–0.854%	2976.7	2823.3	–0.854%
2019	3020.5	3402.4	–12.644%	3227.0	–6.837%	3293.9	3124.1	–3.430%
2020	3265.4	3888.8	–19.091%	3688.3	–12.951%	3634.7	3447.3	–5.571%
MAPE			12.691%		6.881%			3.285%
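The MAPE row is the mean of the absolute relative simulation errors listed for each model. The short check below recomputes it from the percentages in Table 5; the function and dictionary names are hypothetical.

```python
def mape(relative_errors_percent):
    """Mean absolute percentage error from signed relative errors given in percent."""
    return sum(abs(e) for e in relative_errors_percent) / len(relative_errors_percent)


# Relative simulation errors for 2018-2020, copied from Table 5 (percent).
table5_errors = {
    "mean GM (1,1)": [-6.337, -12.644, -19.091],
    "grey Markov": [-0.854, -6.837, -12.951],
    "pseudo grey metabolic Markov": [-0.854, -3.430, -5.571],
}
for model, errors in table5_errors.items():
    print(model, round(mape(errors), 3))
# mean GM (1,1) 12.691
# grey Markov 6.881
# pseudo grey metabolic Markov 3.285
```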

Fig. 5

Predicted values of three models.

4 Conclusion

Urban rainfall data are inevitably random and volatile, while the traditional grey Markov model suffers from unsatisfactory prediction accuracy. To improve the situation, most scholars have tried to improve the model from the perspectives of weighting and unbiasedness, whereas the problem of modeling the raw data directly has remained unresolved.

Based on previous research, this paper uses the accelerated translational transformation and the weighted mean generating transformation to weaken the randomness and volatility of the raw data, constructs a pseudo grey metabolic Markov model based on the mean GM (1, 1) model, and adopts the Markov model with the optimal partitioning method to correct the prediction results. The main goal of the study is to improve prediction accuracy by starting from the raw data themselves. We apply this model to the rainfall prediction of Zhengzhou City from 2018 to 2020. The results indicate that the prediction accuracy of the novel model is higher than that of the traditional grey Markov model, with the average relative simulation error falling from 6.881% to 3.285%. The model in this paper not only contributes a novel method for smoothing precipitation sequences, but also provides theoretical support for the government and scientific researchers to predict urban rainfall scientifically and reasonably.

Acknowledgments

The work was supported by the National Natural Science Foundation of China under Grant 51979106; the Scientific and Technological Plan Project of Henan Province under Grant 182102310014; the Key Research Project of Henan Universities under Grant 18A630030; the Quality Curriculum Construction Project of Postgraduate Education in Henan Province (Grey Systems Theory) under Grant HNYJS2015KC02; and the Postgraduate Innovative Project of North China University of Water Resources and Electric Power (No. YK2021-113).

References

1. Dong Qianjin, Ai Xueshan, Cao Guangjing, et al. Study on risk assessment of water security of drought periods based on entropy weight methods[J], Kybernetes 39(06) (2010), 864–870.

2. Gao Chao, Guan Xinjan, Booij Martijn, et al. A new framework for a multi-site stochastic daily rainfall model: coupling a univariate Markov chain model with a multi-site rainfall event model[J], Journal of Hydrology (598) (2021), 126478.

3. Unnikrishnan Poornima and Jothiprakash, Hybrid SSA-ARIMA-ANN Model for Forecasting Daily Rainfall[J], Water Resources Management 34(11) (2020), 3609–3623.

4. Sun Xiaoting, Ren Ganghong, Du Kun, et al. Monthly rainfall prediction based on grey correlation method[J], Journal of Irrigation and Drainage 38(01) (2019), 90–95.

5. Raval, Sivashanmugam, Pham, et al. Automated predictive analytics tool for rainfall forecasting[J], Scientific Reports 11(01) (2021), 9565.

6. Fan F.M., Collischonn, Quiroz K.J., et al. Flood forecasting on the Tocantins River using ensemble rainfall forecasts and real-time satellite rainfall estimates[J], Journal of Flood Risk Management 9(03) (2016), 278–288.

7. He Shan, Raghavan Srivatsan, Nguyen Ngco Son, et al. Ensemble rainfall forecasting with numerical weather prediction and radar-based nowcasting models[J], Hydrological Processes 27(11) (2013), 1560–1571.

8. Gabriel K.R. and Neumann J.A., A Markov chain model for daily rainfall occurrence at Tel[J], Quarterly Journal of the Royal Meteorological Society 88(375) (1961), 90–95.

9. Pereira A.G.C., Sousa F.A.S., Andrade B.B., et al. Higher Order Markov chain Model for synthetic generation of daily stream flows[J], TEMA 19(03) (2018), 449–464.

10. Lu Qi, Joyce Justin, Imen Sanaz, et al. Linking socioeconomic development, sea level rise, and climate change impacts on urban growth in New York City with a fuzzy cellular automata-based Markov chain model[J], Environment and Planning B-Urban Analytics City Science 46(03) (2019), 551–572.

11. Wang Yan, Yao Duoxi, Lu Haifeng, Mine gas emission prediction based on grey markov prediction model[J], Open Journal of Geology 8(10) (2018), 939–946.

12. Ye Jing, Dang Yaoguo, Li Bingjun, Grey-Markov prediction model based on background value optimization and central-point triangular whitenization weight function[J], Communications in Nonlinear Science and Numerical Simulation (54) (2018), 320–330.

13. Gong Zaiwu, Chen Caiqin, Ge Xinming, Risk prediction of low temperature in Nanjing city based on grey weighted Markov model[J], Natural Hazards 71(02) (2014), 1159–1180.

14. Lu Qian, Wang Yafei, Yang Ling, et al. Passenger Flow Forecasting for Subways Based on Equidimensional New Information Grey Markov[J], China Production Safety Science and Technology 17(01) (2021), 54–60.

15. Wang Yan, Mao Zhiming, Fan Jing, et al. Application of Weighted Markov Chain Determined by Optimal Partitioning Method to Rainfall Forecasting[J], Statistics and Decision (11) (2009), 17–18.

16. Chen Wen, Lu Wangyong, Dai Juan, et al. Improved Weighted Markov Chain Prediction Method and Applications[J], Statistics and Decision 37(10) (2021), 179–183.

17. Miao Zhiwei, Xu Ligang, Precipitation prediction of weighted Markov chain based on affiliation correction[J], Journal of Yangtze River Scientific Research Institute 35(01) (2018), 40–46.

18. Deng Julong, Introduction to grey system theory[J], Journal of Grey System 1(01) (1989), 1–24.

19. Liu Sifeng, Lin Yun, Grey Systems: Theory and Applications[M]. Springer, Berlin, 2010.

20. Liu Yitong, Yang Yang, Xue Diyu, A novel fractional discrete grey model with an adaptive structure and its application in electricity consumption prediction[J], Kybernetes (2021).

21. Wang Junhua, Liu Shiqi, Shao Jianwen, Study on dual pre-warning of transmission line icing based on improved residual MGM-Markov theory[J], IEEJ Transactions on Electrical and Electronic Engineering 13(04) (2018), 561–569.

22. Liu Sifeng, Grey Systems: Theory and Application[M]. Beijing: Science Press, 2020.

23. Qian Wuyong, Dang Yaoguo, GM (1, 1) model based on oscillation sequences[J], Systems Engineering-Theory and Practice 29(03) (2009), 149–154.

24. Dang Yangguo, Liu Sifeng, Mi Chuanqi, A study of the properties of reinforced buffer operators[J], Control and Decision (07) (2007), 730–734.