Energy big data demand prediction model based on fuzzy rough set

Abstract

Energy is an indispensable material resource for human production and life, and a key guarantee of human survival and of sustainable economic and social development. As the economy develops rapidly, energy demand keeps growing, energy consumption has risen sharply within a short period, and the security of energy supply has come under increasing pressure. Predicting energy demand is therefore especially important. However, because there are many influencing factors and energy data are incomplete, energy demand predictions carry considerable uncertainty. To address these problems, this paper proposes an energy big data demand prediction model based on the fuzzy rough set model. First, the factors affecting energy demand are determined from the data, and the fuzzy C-means clustering algorithm is used to discretize the data in line with the characteristics of fuzzy rough sets. A decision table is then established, the attribute importance is calculated, and the neighborhood rough set is used for attribute reduction. Correlation rules are then extracted to build the prediction model. The proposed model is compared with the existing grey prediction method and the energy elasticity coefficient method. The results show that the proposed method predicts changes in energy big data demand more accurately. Finally, based on the experimental results, strategies for optimizing the energy structure are proposed as a reference for the optimization and development of energy demand.

Keywords

Energy big data, fuzzy rough set, demand prediction, structure optimization

1 Introduction and literature review

The international political and economic environment is becoming increasingly complex. Maintaining steady and rapid economic growth in this environment depends on a long-term, stable supply of energy [1]. Ensuring national energy security is a strategic issue for the development and progress of the entire country [2]. Coal is the main source of energy in China’s energy consumption. Owing to environmental concerns, governments around the world are now promoting sustainable development and low-carbon economic models, and most of the world’s existing energy reserves are controlled by the governments of developed countries. These factors have led to a surge in the investment demand of energy suppliers [3, 4]. Energy sources such as oil play an increasingly important role in the world economy. To support socio-economic development and management, it is necessary to forecast energy demand from energy big data and thereby provide more accurate reference results for the complex energy market [5].

Experts and scholars have carried out extensive research on demand forecasting, and research abroad has been fruitful across many fields. Arias M B and Bae S [6] used South Korea’s historical traffic data and weather data to cluster traffic patterns and establish classification criteria with decision trees, thereby constructing a big-data-based prediction model that shows the different charging loads of electric vehicles in residential and commercial locations. For short-term air passenger demand, Kim S and Shin D H [7] combined search engine query data with basic regression analysis to construct a model for predicting the number of passengers at Incheon International Airport in South Korea. See-To Eric W K et al. [8] analyzed customer reviews to establish short-term demand allocation and sales forecasting models that help managers make better decisions in supply chain operations. Perea R G et al. [9] used a dynamic artificial neural network architecture and a genetic algorithm for short-term prediction of daily irrigation water demand in the Bembézar MD irrigation area. Sagaert Y R et al. [10] used a LASSO method to process a large number of variables and improve the accuracy of tactical sales forecasts of major suppliers in the tire industry. Domestic scholars have also conducted related research. Chen et al. [11] introduced the belief-desire-intention model into the modeling of individual travel agents, considering heterogeneity and variable traffic scenarios at the micro-level of travel. Zhu Liang et al. [12] introduced the Bernstein Copula function to describe the sequence-related structure of China’s inbound tourism demand and carry out tourism prediction. Hu Xiaojian et al. [13] constructed a multivariate nonlinear combination regression prediction model for logistics demand through quantitative analysis of total social logistics and total logistics costs. Fan Sixia et al. [14] trained the samples separately with a global kernel function and a local kernel function, and dynamically extracted the bottom function of the composite kernel model from the training results to build a prediction model. Wang Zixing et al. [15] proposed a demand forecasting method for discontinuous spare parts of carrier-based aircraft based on the grey system and time aggregation. Gong Wenwei et al. [16] built a demand forecasting model based on grey theory and exponential smoothing to address the large inventories in China’s manufacturing industry caused by inaccurate demand forecasting. Chen Rui et al. [17] used the LEAP model to predict the energy demand of Changsha City, Hunan Province under different scenarios from 2015 to 2020, and offered constructive opinions on the industrial structure and national energy conservation policies. Liang Huizhen [18], based on China’s energy consumption over the years and using the self-help (bootstrap) method to resample the calculation errors, established a self-contained energy consumption life cyclotron prediction model. Tang Limin et al. [19] constructed a dynamic model of the road transportation energy demand system in Liaoning Province by analyzing the interaction between road transportation and the economic, population and energy subsystems.

From the above, domestic and foreign scholars have made great achievements in research on demand forecasting models, but studies on energy demand forecasting remain few. Moreover, the existing prediction models perform poorly, mainly in three respects: (1) in medium- and long-term prediction, the deviation of the results grows with time; (2) the actual influencing factors are not fully considered, so the constructed model cannot flexibly correct its predictions; (3) information sources are insufficiently fused, and the role of big data in the prediction model is ignored.

The fuzzy rough set was proposed by Dubois and Prade in 1990 and is a model that combines fuzzy sets with rough set theory [20]. Fuzzy rough set theory plays a key role in dealing with uncertain, inaccurate and incomplete data, and it has been applied in many fields. Xie Song et al. [21] applied fuzzy rough sets to the evaluation of the oil-paper insulation state of transformers, which provided a new approach with practical value in engineering applications. Zhang Chao et al. [22] applied fuzzy rough sets to multi-attribute decision-making problems such as occupational assessment, using the advantages of fuzzy sets and rough sets in uncertain decision-making to provide valuable models for decision-makers. Guo Rongchao et al. [23] used the fuzzy rough set model for feature selection in a multi-label classification task and obtained good classification results. Xiao Bai et al. [24] applied fuzzy rough sets to space load forecasting to characterize unbalanced and uncoordinated load development accurately, showing that the method is practical and effective. Cao Yuyuan [25] combined fuzzy rough sets with support vector machines for aero-engine fault diagnosis; the method shows strong diagnostic ability and greatly shortens the calculation time without affecting the diagnosis rate. In summary, fuzzy rough sets are widely applied in data mining, especially for problems involving uncertain, incomplete and fuzzy information. This paper therefore applies the fuzzy rough set with attribute reduction to the demand forecast of energy big data.

This paper proposes an energy big data demand prediction model under the fuzzy rough set model. First, the factors affecting energy demand are determined from the China Statistical Yearbook from 1992 to 2018, the China Energy Big Data Report, and the China Energy Statistics Yearbook. In line with the characteristics of fuzzy rough sets, the fuzzy C-means clustering algorithm is used to discretize the data. A decision table is then established, the attribute importance is calculated, and the neighborhood rough set is used for attribute reduction. Next, correlation rules are extracted to establish the prediction model. To verify the validity of the proposed model, it is compared with the grey prediction method and the elastic coefficient method. Comparing the predicted results with historical energy data shows that the prediction accuracy of this method is as high as 99% and that the prediction results are relatively stable. Finally, based on the prediction results for coal, oil and natural gas, strategies for optimizing the energy structure are proposed as a reference for the optimization and development of energy demand.

2 Forecast model and method for energy big data demand based on fuzzy rough set model

2.1 Energy demand prediction theory

The prediction of energy demand refers to scientific and reasonable prediction of energy demand in the coming period. Under the guidance of scientific and reasonable forecasting methods, based on comprehensive consideration of the factors affecting energy demand, the data of past energy consumption and related influencing factors are mined and analyzed to find out the relationship and general development rules among the data. The energy demand prediction has the following characteristics:

(1) Energy demand prediction must proceed from the perspective of the normal and stable development of the national economy and society as a whole; it cannot be carried out from the standpoint of a single industry or of a few aspects alone.

(2) The results are inherently inexact, because the consumption demand for energy is affected by many factors.

The prediction step of the energy demand can be performed according to the flow shown in Fig. 1.

2.2 Basic theory of fuzzy rough sets

Fig.1

Energy demand prediction steps.

In the Pawlak rough set model, not every classical set A in the universe U can be described exactly by the knowledge in the knowledge base (U, R). In that case, A is described by a pair of lower and upper approximations with respect to (U, R). In real life, however, many concepts are themselves fuzzy and imprecise. Fuzzy rough set theory addresses this situation.

Definition. Let (U, R) be a Pawlak approximation space, that is, R is an equivalence relation on the universe U. If A is a fuzzy set on U, then the lower and upper approximations of A with respect to (U, R) are defined as a pair of fuzzy sets on U: $\begin{matrix} {\underline{A}}_{R} (x) = \min \{ A (y) \mid y \in [x]_{R} \}, \quad x \in U, \\ {\bar{A}}_{R} (x) = \max \{ A (y) \mid y \in [x]_{R} \}, \quad x \in U \end{matrix}$

where [x]_R denotes the equivalence class containing x, and ${\underline{A}}_{R}$ and ${\bar{A}}_{R}$ are respectively the lower and upper approximations of the fuzzy set A with respect to (U, R).

${\underline{A}}_{R}$ and ${\bar{A}}_{R}$ are also fuzzy sets on U. If ${\underline{A}}_{R}$ = ${\bar{A}}_{R}$ , then A is a definable fuzzy set; otherwise A is a fuzzy rough set. When A is a classical set, ${\underline{A}}_{R}$ and ${\bar{A}}_{R}$ reduce to the Pawlak lower and upper approximations of A with respect to (U, R). The fuzzy rough set can therefore be understood as a generalization of the rough set.
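The definition can be illustrated with a minimal Python sketch. The function name, the object labels and the membership values below are hypothetical and chosen only for illustration; the partition stands for the equivalence classes U/R.

```python
def fuzzy_rough_approximations(membership, partition):
    """Return the (lower, upper) approximations of a fuzzy set.

    membership: dict mapping each object x to its degree A(x) in [0, 1].
    partition:  list of equivalence classes (sets of objects) induced by R.
    """
    lower, upper = {}, {}
    for eq_class in partition:
        degrees = [membership[y] for y in eq_class]
        lo, hi = min(degrees), max(degrees)
        for x in eq_class:
            lower[x] = lo  # min of A(y) over the class [x]_R
            upper[x] = hi  # max of A(y) over the class [x]_R
    return lower, upper


# Hypothetical universe of five objects grouped into two equivalence classes.
A = {"u1": 0.2, "u2": 0.6, "u3": 0.9, "u4": 0.9, "u5": 1.0}
U_over_R = [{"u1", "u2"}, {"u3", "u4", "u5"}]
lower, upper = fuzzy_rough_approximations(A, U_over_R)

# A is definable only if both approximations coincide everywhere.
is_definable = lower == upper
```

In this example the two approximations differ (for instance 0.2 versus 0.6 on the first class), so A is a fuzzy rough set rather than a definable fuzzy set.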

2.3 Energy big data demand prediction model based on fuzzy rough sets

The process of establishing the energy big data demand prediction model based on fuzzy rough sets mainly includes: determining the energy influencing factors, discretizing the data, establishing the decision table and calculating attribute importance, attribute reduction, and rule generation.

2.3.1 Selection of energy impact factors

The demand for energy is affected by many factors. According to the China Energy Big Data Report (2018) and other energy-related data, factors such as gross domestic product, industrial structure, household income, science and technology, investment, final consumption, total exports, heavy industry, and the urban population affect energy demand to varying degrees. This paper therefore takes energy consumption as the quantity to be predicted and selects gross domestic product, industrial structure, residence income, science and technology, investment, final consumption expenditure, total export, heavy industry ratio, and urban population as the influencing factors of the energy big data demand model.

2.3.2 Discretization

Rough set theory handles discrete data most readily. In this paper, FCM (fuzzy C-means clustering) is used to discretize the continuous data.

(1) Membership function

The membership function indicates the degree to which an object x belongs to the set A and is usually denoted $\mu_{A} (x)$. Its domain is all objects that may belong to the set A (that is, all the points in the space where the set is located), and its range is [0, 1], that is, $0 \le \mu_{A} (x) \le 1$. $\mu_{A} (x) = 1$ means that x belongs fully to A, which is equivalent to x ∈ A in the classical set sense.

(2) Fuzzy C-means clustering (FCM)

The fuzzy C-means clustering algorithm is partition-based. Its idea is to maximize the similarity between objects assigned to the same cluster and to minimize the similarity between different clusters. FCM uses membership degrees to determine the extent to which each data point belongs to each cluster. FCM divides n vectors x_j (j = 1, 2, ..., n) into c fuzzy groups and finds the cluster center of each group so that the value function of the dissimilarity index is minimized. The specific steps are as follows:

Step 1: Initialize the membership matrix U with random numbers between 0 and 1, subject to the constraint that the memberships of each data point sum to 1: $\sum_{i = 1}^{c} u_{ij} = 1$ for j = 1, ⋯ , n.

Step 2: Calculate the c cluster centers c_i, i = 1, ⋯ , c using the formula $c_{i} = \frac{\sum_{j = 1}^{n} u_{ij}^{m} x_{j}}{\sum_{j = 1}^{n} u_{ij}^{m}}$ . Here u_ij is between 0 and 1, and m ∈ [1, ∞) is a weighting exponent (the update in Step 4 requires m > 1).

Step 3: Calculate the value function according to $J (U, c_{1}, \dots, c_{c}) = \sum_{i = 1}^{c} J_{i} = \sum_{i = 1}^{c} \sum_{j = 1}^{n} u_{ij}^{m} d_{ij}^{2}$ . Here d_ij = ∥ c_i - x_j ∥ is the Euclidean distance between the i-th cluster center and the j-th data point. If J is less than a given threshold, or if its change from the previous value is less than a given threshold, the algorithm stops.

Step 4: Otherwise, calculate the new matrix U according to $u_{ij} = \frac{1}{\sum_{k = 1}^{c} {(\frac{d_{ij}}{d_{kj}})}^{2 / (m - 1)}}$ and return to Step 2.
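The four steps can be sketched in plain Python for a single continuous attribute. This is an illustrative sketch rather than the implementation used in this paper: the function names, the default parameters (m = 2, the tolerance, the seed) and the sample values are assumptions, and mapping each value to the cluster with the largest membership is one possible way to obtain discrete levels.

```python
import random


def fcm_1d(data, c, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Fuzzy C-means on a list of floats; returns (centers, U), U[i][j] = u_ij."""
    rng = random.Random(seed)
    n = len(data)
    # Step 1: random memberships, normalised so each point's memberships sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= col
    prev_J = None
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means.
        centers = [
            sum(U[i][j] ** m * data[j] for j in range(n))
            / sum(U[i][j] ** m for j in range(n))
            for i in range(c)
        ]
        # Distances d_ij, floored at a tiny value to avoid division by zero.
        d = [[max(abs(centers[i] - data[j]), 1e-12) for j in range(n)]
             for i in range(c)]
        # Step 3: value function J and stopping test on its change.
        J = sum(U[i][j] ** m * d[i][j] ** 2 for i in range(c) for j in range(n))
        if prev_J is not None and abs(prev_J - J) < tol:
            break
        prev_J = J
        # Step 4: membership update, then back to Step 2.
        for j in range(n):
            for i in range(c):
                U[i][j] = 1.0 / sum(
                    (d[i][j] / d[k][j]) ** (2.0 / (m - 1.0)) for k in range(c)
                )
    return centers, U


def discretize(data, c, **kwargs):
    """Map each value to the rank (0 = lowest center) of its strongest cluster."""
    centers, U = fcm_1d(data, c, **kwargs)
    order = sorted(range(c), key=lambda i: centers[i])
    rank = {cluster: level for level, cluster in enumerate(order)}
    return [rank[max(range(c), key=lambda i: U[i][j])] for j in range(len(data))]


# Hypothetical values of one attribute, discretized into two levels.
values = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
levels = discretize(values, c=2)
```

Ranking the clusters by their centers keeps the discrete levels ordered, which makes the resulting decision table easier to read.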

2.3.3 Establish decision tables and calculate attribute importance

The data in this paper, namely energy consumption, gross domestic product, industrial structure, residence income, science and technology, investment, final consumption expenditure, total export, heavy industry ratio, and urban population, all come from the China Statistical Yearbook, the China Energy Big Data Report and the China Energy Statistics Yearbook, and cover the period from 1992 to 2018. To find out how the various influencing factors affect energy demand, the data are divided into overlapping periods of up to ten years: 1992–2001, 1997–2006, 2002–2011, 2007–2016 and 2012–2018. Dividing the data by period dynamically reflects the impact of different factors on energy demand in different periods.

The basic tool for expressing and processing knowledge in rough set theory is the information table. A decision table is a special and important kind of information table that indicates how decisions should be made when certain conditions are met. The data are discretized with the FCM algorithm, and a decision table is established. The rows of the table correspond to the objects of the study, or tuples. The columns correspond to the attributes of the objects, and the information about an object is represented by the values of its attributes. The decision attribute of the decision table is energy consumption (Q). The condition attributes are gross domestic product (GDP), industrial structure (IS), residence income (RI), science and technology (ST), investment (INV), final consumption expenditure (FCE), total export (TE), heavy industry ratio (HIR), and urban population (UP). The information entropy method is used to find the attribute importance.
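The text does not give the entropy formula, so the following Python sketch assumes one common definition: the importance of a condition attribute a is the increase in the conditional entropy of the decision attribute when a is removed, H(Q | C − {a}) − H(Q | C). The function names and the small decision table are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, cond_attrs, decision):
    """H(decision | cond_attrs) for a list of dict rows with discrete values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in cond_attrs)].append(row[decision])
    n = len(rows)
    h = 0.0
    for labels in groups.values():
        p_group = len(labels) / n
        for count in Counter(labels).values():
            p = count / len(labels)
            h -= p_group * p * log2(p)
    return h


def attribute_importance(rows, cond_attrs, decision):
    """Assumed definition: entropy gained when one attribute is dropped."""
    base = conditional_entropy(rows, cond_attrs, decision)
    return {
        a: conditional_entropy(rows, [b for b in cond_attrs if b != a], decision) - base
        for a in cond_attrs
    }


# Hypothetical discretized decision table with two condition attributes.
table = [
    {"GDP": 0, "HIR": 0, "Q": 0},
    {"GDP": 0, "HIR": 1, "Q": 1},
    {"GDP": 1, "HIR": 0, "Q": 0},
    {"GDP": 1, "HIR": 1, "Q": 1},
]
importance = attribute_importance(table, ["GDP", "HIR"], "Q")
```

In this toy table Q is fully determined by HIR, so removing HIR raises the conditional entropy to 1 bit while removing GDP changes nothing.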

2.3.4 Attribute reduction

The purpose of attribute reduction is to eliminate attributes that are unimportant and have little effect on the results. The reduction algorithm based on the neighborhood rough set proceeds as follows:

Step 1: Input the decision system NDS = (U, A ∪ D), the neighborhood radius set, and an appropriate lower limit of importance.

Step 2: Initialize the reduced set red to the empty set, red = ∅, and take the whole sample as the initial sample set, smp = U.

Step 3: For each remaining attribute a_i, compute the corresponding positive domain ${Pos}_{red - a_{i}}^{smp} (D)$ .

Step 4: Compare all the positive domains found and select the largest one, Pos_k (D).

Step 5: Compare the importance obtained at this point with the lower limit of importance set at the beginning. If the importance obtained is greater than the lower limit, record the value of k, set red = red + a_k and smp = smp - Pos_k, and return to Step 3; otherwise, output the reduction result and end the algorithm.
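As an illustration, the sketch below implements the forward greedy scheme commonly used for neighborhood rough set reduction, in which the attribute with the largest positive region is added as long as its contribution exceeds the lower limit. It may differ in detail from the procedure used in this paper. The function names, the Euclidean neighborhood, the use of the positive-region ratio as the importance measure, and the sample data are all assumptions.

```python
def positive_region(X, y, attrs, delta, samples):
    """Samples whose delta-neighborhood over `attrs` contains a single decision class."""
    pos = set()
    for i in samples:
        pure = True
        for j in range(len(X)):
            dist = sum((X[i][a] - X[j][a]) ** 2 for a in attrs) ** 0.5
            if dist <= delta and y[j] != y[i]:
                pure = False
                break
        if pure:
            pos.add(i)
    return pos


def forward_reduction(X, y, delta, lower_limit=0.0):
    """Greedy forward attribute reduction; returns the selected attribute indices."""
    n_attrs = len(X[0])
    red, smp = [], set(range(len(X)))
    while smp:
        candidates = [a for a in range(n_attrs) if a not in red]
        best_attr, best_pos = None, set()
        for a in candidates:
            pos = positive_region(X, y, red + [a], delta, smp)
            if len(pos) > len(best_pos):
                best_attr, best_pos = a, pos
        # Stop when no attribute adds more than the lower limit of importance.
        if best_attr is None or len(best_pos) / len(X) <= lower_limit:
            break
        red.append(best_attr)
        smp -= best_pos
    return red


# Hypothetical normalised data: attribute 0 separates the two classes,
# attribute 1 is noise and attribute 2 is constant.
X = [
    [0.10, 0.50, 0.3], [0.15, 0.90, 0.3], [0.20, 0.10, 0.3],
    [0.80, 0.55, 0.3], [0.85, 0.95, 0.3], [0.90, 0.15, 0.3],
]
y = [0, 0, 0, 1, 1, 1]
reduct = forward_reduction(X, y, delta=0.2)
```

Here attribute 0 alone places every sample in the positive region, so the reduction keeps only that attribute.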

2.3.5 Establishment of the demand prediction model

(1) Extract correlation rules

The basic correlation rule set is obtained from the reduced sample data:

$L = \{ r : A_{[x]} \to B_{[y]} \mid [x] \in U / R_{c}, [y] \in U / R_{d} \}$

Here A_[x] and B_[y] denote the factor variation characteristics and the economic indicator change characteristics of the equivalence classes [x] ∈ U/R_c and [y] ∈ U/R_d, respectively. If a correlation rule is deterministic, the factor characteristics completely determine the characteristics of the economic indicator. Each basic correlation rule has a degree of confidence, defined as:

$P (r) = P (A_{[x]} \to B_{[y]}) = \frac{\mathrm{card} ([y] \cap [x])}{\mathrm{card} ([x])}$

Table 1
Attribute importance of condition variables relative to decision variables in each period

Period	GDP	INV	IS	FCE	RI	TE	HIR	ST	UP
1992–2001	0.087	0.078	0.034	0.037	0.051	0.029	0.063	0.051	0.031
1997–2006	0.093	0.062	0.042	0.032	0.065	0.031	0.854	0.055	0.054
2002–2011	0.115	0.03	0.073	0.025	0.068	0.029	0.125	0.067	0.067
2007–2016	0.113	0.05	0.051	0.024	0.053	0.037	0.231	0.054	0.075
2012–2018	0.129	0.04	0.064	0.041	0.062	0.022	0.240	0.068	0.092
1992–2018	0.112	0.05	0.057	0.032	0.062	0.031	0.146	0.058	0.072

By definition:

1) When P (A_[x] → B_[y]) =1, it means that A_[x] → B_[y] is a deterministic rule, that is, when the factor characteristic of the object u is A_[x], the economic indicator d is B_[y];

2) When P (A_[x] → B_[y]) =0, it means that the rule A_[x] → B_[y] is not established, that is, when the factor characteristic of the object u is A_[x], the economic indicator d is not B_[y];

3) When 0 < P (A_[x] → B_[y]) <1, it indicates that A_[x] → B_[y] is an uncertainty rule. When the factor characteristic of object u is A_[x], the economic indicator d is B_[y] and the confidence is P (A_[x] → B_[y]).
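The rule set L and the confidence P(r) can be computed directly from a discretized decision table. The Python sketch below is illustrative: the function name and the table contents are hypothetical, and each rule is keyed by its pair (A_[x], B_[y]).

```python
from collections import Counter, defaultdict


def extract_rules(rows, cond_attrs, decision):
    """Map (A_[x], B_[y]) to the confidence card([y] ∩ [x]) / card([x])."""
    by_condition = defaultdict(list)
    for row in rows:
        by_condition[tuple(row[a] for a in cond_attrs)].append(row[decision])
    rules = {}
    for features, decisions in by_condition.items():
        for d_value, count in Counter(decisions).items():
            rules[(features, d_value)] = count / len(decisions)
    return rules


# Hypothetical decision table with discretized levels 0, 1 and 2.
table = [
    {"GDP": 2, "HIR": 2, "Q": 2},
    {"GDP": 2, "HIR": 2, "Q": 2},
    {"GDP": 1, "HIR": 1, "Q": 1},
    {"GDP": 1, "HIR": 1, "Q": 2},
    {"GDP": 0, "HIR": 0, "Q": 0},
]
rules = extract_rules(table, ["GDP", "HIR"], "Q")
deterministic = {r for r, p in rules.items() if p == 1.0}
```

In this example the condition class (2, 2) yields a deterministic rule, while (1, 1) yields two uncertainty rules with confidence 0.5 each. Rules with confidence 0 never occur in the table and are therefore not stored.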

(2) Building prediction model

Let the influencing factor values of the prediction period T be c_1,T, c_2,T, ⋯ , c_m,T, let the corresponding object be u_T, and let its equivalence class under the knowledge system U/R_c be [x_T], with feature description A_{[x_T]} = (c_1,T, c_2,T, ⋯ , c_m,T) ^T. The economic indicator d_T of u_T is predicted from the combined information of the indicator change patterns attached to the rules that are most credible for u_T. This requires a measure of credibility between the object u_T and a rule r. For any r ∈ L, r : A_[x] → B_[y], with $A_{[x]} = (c_{1}^{r}, c_{2}^{r}, \dots, c_{m}^{r})$ and B_[y] = d^r, the weighted distance between the factor feature vectors is used to define the credibility of u_T with respect to r: $\delta (u_{T}, r) = \sum_{i = 1}^{m} \lambda_{i} (c_{i, T} - c_{i}^{r})^{2}$ , where the λ_i are the weights of the factors and a smaller distance means a more credible rule. Solve $\min_{r \in L} \delta (u_{T}, r) = \delta (u_{T}, r^{*})$

Let M^* be the set of factor characteristics of all rules r^* satisfying the above formula. The prediction model for $\hat{d}_{T}$ is then: $\hat{d}_{T} = \sum_{A_{[x]} \in M^{*}} \sum_{[y] \in U / R_{d}} B_{[y]} P (A_{[x]} \to B_{[y]})$
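A minimal Python sketch of this prediction step follows. The function name, the rule values and the target vector are hypothetical. Using the 1992–2018 importance values of GDP and HIR from Table 1 as the weights λ_i is an assumption made only for the example, since the text does not state how the weights are chosen.

```python
def predict(rules, weights, target):
    """Predict the indicator from rules {(features, d_value): confidence}.

    features and target are equal-length tuples of numbers; weights holds
    the lambda_i of the weighted distance.
    """
    def delta(features):
        # Weighted squared distance between the target and a rule's features.
        return sum(w * (t - f) ** 2 for w, t, f in zip(weights, target, features))

    best = min(delta(features) for features, _ in rules)
    # M*: feature vectors of all rules attaining the minimum distance.
    m_star = {features for features, _ in rules if delta(features) == best}
    # Sum B_[y] * P(A_[x] -> B_[y]) over the rules whose features lie in M*.
    return sum(d * p for (features, d), p in rules.items() if features in m_star)


# Hypothetical rules: two condition classes with representative indicator values.
rules = {
    ((2.0, 2.0), 30000.0): 1.0,
    ((1.0, 1.0), 28000.0): 0.5,
    ((1.0, 1.0), 29000.0): 0.5,
}
weights = (0.112, 0.146)  # assumed lambda_i (GDP and HIR importance, 1992-2018)
estimate = predict(rules, weights, target=(1.0, 1.2))
```

The target is closest to the condition class (1.0, 1.0), so the estimate is the confidence-weighted combination of its two indicator values, 0.5 × 28000 + 0.5 × 29000 = 28500.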

3 Data source and model evaluation

3.1 Data source

The data used in this paper are from the China Statistical Yearbook from 1992 to 2018, the China Energy Big Data Report and the China Energy Statistics Yearbook. The data include energy consumption, gross domestic product, industrial structure, residence income, science and technology, investment, final consumption expenditure, total export, heavy industry ratio, and urban population.

3.2 Model evaluation

The energy big data demand forecasting model based on the fuzzy rough set model proposed in this paper is designed and implemented on an ordinary PC. The relative error and accuracy are used to evaluate the prediction results of the prediction model on historical data.

Relative error: $\text{Relative error} = \frac{| \text{Prediction value} - \text{Actual value} |}{\text{Actual value}}$

Accuracy: $\text{Accuracy} = 1 - \text{Relative error}$
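As a worked example, the two measures can be evaluated in Python for the 2010 coal figures reported in Table 2 (actual value 29027.6, value predicted by the method of this paper 29059.5). The function names are chosen for illustration.

```python
def relative_error(predicted, actual):
    """|prediction - actual| / actual, as a fraction."""
    return abs(predicted - actual) / actual


def accuracy(predicted, actual):
    """1 - relative error, as a fraction."""
    return 1.0 - relative_error(predicted, actual)


err = relative_error(29059.5, 29027.6)
acc = accuracy(29059.5, 29027.6)
print(f"relative error = {err:.2%}, accuracy = {acc:.2%}")
# prints "relative error = 0.11%, accuracy = 99.89%"
```

The result matches the 0.11% relative error and 99.89% accuracy listed for 2010 in Table 2.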

4 Comparison of the demand forecasting model

Table 2
Comparison of prediction errors of each method (10,000 tons)

Year	Actual value	Prediction method	Prediction value	Relative error (%)	Accuracy (%)
2010	29027.6	Grey prediction	27515.3	5.21	94.79
		Elastic coefficient method	28380.3	2.23	97.77
		Method of this paper	29059.5	0.11	99.89
2011	28915.09	Grey prediction	27174.4	6.02	93.98
		Elastic coefficient method	28377.27	1.86	98.14
		Method of this paper	28706.9	0.72	99.28
2012	29838.46	Grey prediction	27403.6	8.16	91.84
		Elastic coefficient method	30559.3	2.36	97.64
		Method of this paper	29713.1	0.42	99.58
2013	30137.84	Grey prediction	31695.9	5.17	94.83
		Elastic coefficient method	31045.0	3.01	96.99
		Method of this paper	30143.87	0.02	99.98
2014	30396.74	Grey prediction	32603.5	7.26	92.74
		Elastic coefficient method	29627.70	2.53	97.47
		Method of this paper	30630.8	0.77	99.23

Result 1: attribute importance

The known conditional attributes are gross domestic product (GDP), industrial structure (IS), residence income (RI), science and technology (ST), investment (INV), final consumption expenditure (FCE), total export (TE), Heavy industry ratio (HIR), and urban population (UP). The importance of the above conditional attributes relative to the decision attribute energy consumption (Q) is obtained, as shown in Table 1.

To visually show the difference of each variable, according to the importance of each condition attribute during 1992– 2018, the radar chart is drawn as shown in Fig. 2.

It can be seen from Table 1 and Fig. 2 that, over all the sample data from 1992 to 2018, the conditional attributes rank from most to least important with respect to the decision variable as follows: HIR, GDP, UP, RI, ST, IS, INV, FCE, TE. The attribute importance index reflects the dependence of energy consumption on the various influencing factors. The heavy industry ratio (HIR) and gross domestic product (GDP) have the greatest impact on energy consumption, with importance values of 0.146 and 0.112. The importance of science and technology (ST) and industrial structure (IS) reached 0.058 and 0.057, respectively. Investment (INV), final consumption expenditure (FCE) and total export (TE) have less impact on energy consumption than the other conditional attributes.

Fig.2

Attribute importance radar chart.

Result 2: Comparison of prediction accuracy of each method

To verify the validity of the prediction model, the historical energy data of the five years from 2010 to 2014 are compared with the predicted values. The method proposed in this paper is compared with the gray prediction method and the elastic coefficient method. Taking coal prediction as an example, the forecast situation is shown in Table 2.

To further observe the difference between the data predicted by each method and the actual data, the T-test is performed on the above-predicted data and the existing historical data, and the T-test value is shown in Table 3.

Fig.4

Prediction of Coal consumption.

It can be seen from Table 2 that the prediction accuracy of the proposed method is above 99% in every year. The grey prediction method performed worst, with an average accuracy of 93.636%, while the elastic coefficient method had an average accuracy of 97.602%. In Table 3, comparing the predicted data with the original historical data shows that the T value is smallest for the proposed method, at 0.0268, indicating that its results are the most consistent with the actual situation. The grey prediction method has the lowest average accuracy and the largest T value. The likely reason is that the grey prediction method relies on fitting and suits data with particular characteristics, whereas energy big data demand is affected by many factors. The energy demand prediction model based on the fuzzy rough set proposed in this paper is well suited to handling uncertain, inaccurate and missing data, and thus obtains better prediction results.

The mean-variance of the coal demand predictions of the grey prediction method, the elastic coefficient method and the method of this paper is shown in Fig. 3. As can be seen from the figure, the demand for coal shows an upward trend over time.

Result 3: Comparison of the method in this paper on various types of energy prediction.

Technology development affects energy supply and demand, so a baseline scenario and a high-tech factor scenario (high scenario) are set. The baseline scenario (base) simulates a case with little electric energy substitution, in which all technical factors are set to zero. The high scenario (high) simulates the case where electric energy replacement technology develops and effectively replaces fossil energy on the demand side. The technical factors of coal, oil and natural gas consumption in the high scenario are set to 3%, 2%, and 1%, respectively, and characterize the degree of development of electric energy substitution technologies in different fields. This paper assumes that there is no major policy adjustment in the future, and the policy factors of the two scenarios are both set to 0.1%. Since China will be in the stage of energy structure adjustment for a long time, the structural adjustment factors for coal, oil and natural gas are set at 4%, 0.2% and 0.1%, respectively. The prediction results of the method for coal, oil and natural gas are shown in Figs. 4, 5 and 6.

Table 3
T-test results

Method	Grey prediction	Elastic coefficient method	Method of this paper
T value	0.3156	0.1048	0.0268

Fig.3

Mean-variance of the predicted values of the three methods in different years.

Fig.5

Prediction of oil consumption.

Fig.6

Forecast of natural gas consumption.

As can be seen from Fig. 4, coal still dominates China’s energy consumption in both scenarios. Under the baseline scenario, total coal consumption continues to rise until 2035, while the share of coal in primary energy consumption slowly decreases. Under the high scenario, total coal consumption peaks at around 2028 and then gradually declines, and it is always lower than in the baseline scenario. As can be seen from Fig. 5, China’s oil consumption will increase steadily in the future and is slightly lower in the high scenario than in the baseline scenario. Under the baseline scenario, the share of oil in primary energy consumption increases slowly, while in the high scenario it slowly declines. As can be seen from Fig. 6, natural gas consumption and its share of primary energy in China will increase rapidly in the future, and both are higher in the high scenario than in the baseline scenario. The fossil energy structure is gradually shifting from coal to higher-quality energy such as oil and gas. Climate change has attracted much attention, and carbon dioxide emissions have become an important factor restricting energy production and demand. According to the data, for the same amount of energy, carbon dioxide emissions from coal are about 30% higher than those from oil and about 70% higher than those from natural gas. In the future, China’s natural gas consumption will maintain a relatively high growth rate, and the growth rate in the high scenario is higher than in the baseline scenario. By 2036, the growth rate of natural gas consumption is 3.05% in the baseline scenario and 4.45% in the high scenario. This means that China’s demand for natural gas will continue to grow.

5 Discussions of the results

The relative error of the fuzzy rough set based prediction model proposed in this paper is within the acceptable range. By comparing with the gray prediction method and the elastic coefficient method in energy prediction, the experimental results show that the prediction method proposed in this paper has achieved very high precision. Energy demand is affected by many uncertain factors. There is an extremely complex nonlinear correlation between energy consumption indicators and influencing factors, which is difficult to accurately represent with an analytical mathematical model. The fuzzy rough set based prediction method starts from the data and reduces the redundant information without losing information, and finds the influence relationship between the energy prediction index and the explanatory variable, and simultaneously deals with qualitative, quantitative factors and uncertain factors.

In the coal, oil and natural gas forecasts, the demand for coal and oil keeps increasing but its growth rate slows, while the demand for natural gas increases significantly. This reflects the transformation of the country’s energy structure in the later part of the forecast period. Coal and oil are not clean energy sources; they produce carbon dioxide, which exacerbates the greenhouse effect. Natural gas is a comparatively clean energy source, so its demand will maintain a high growth rate. The vigorous development of clean energy is the trend of the times. Large-scale development of power sources such as thermal power, hydropower and wind power can be promoted to ensure the effective development of the renewable energy market.

6 Conclusions

Sustainable energy development is an important issue for national economic and social development. Scientific and reasonable prediction of China’s energy supply and demand is of great significance for exploring China’s alternative energy paths and promoting the transformation of its energy strategy. Medium- and long-term energy demand forecasting has been both a focus and a difficulty in the field of energy prediction in recent years. This paper establishes a model for predicting energy big data demand based on the fuzzy rough set model, and aims to explore the development path of China’s energy structure transformation and electric energy substitution by predicting future energy demand.

Acknowledgments

This work was supported by Yunnan Local Colleges Applied Basic Research Projects (No. 2018FH001-055) and Scientific Research Foundation of Yunnan Education Department (No. 2018JS477).

Attribute importance by time period
Time period	GDP	INV	IS	FCE	RI	TE	HIR	ST	UP
1992–2001	0.087	0.078	0.034	0.037	0.051	0.029	0.063	0.051	0.031
1997–2006	0.093	0.062	0.042	0.032	0.065	0.031	0.854	0.055	0.054
2002–2011	0.115	0.03	0.073	0.025	0.068	0.029	0.125	0.067	0.067
2007–2016	0.113	0.05	0.051	0.024	0.053	0.037	0.231	0.054	0.075
2012–2018	0.129	0.04	0.064	0.041	0.062	0.022	0.240	0.068	0.092
1992–2018	0.112	0.05	0.057	0.032	0.062	0.031	0.146	0.058	0.072
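
The attribute importance values in the table above can be ranked per time period to see which factors weigh most in each window. The sketch below only sorts the tabulated values. The dictionary layout and the helper name `rank_attributes` are assumptions made for illustration, and two of the six periods are included to keep it short.

```python
# Attribute importance values copied from the table above (two of six periods).
ATTRIBUTES = ["GDP", "INV", "IS", "FCE", "RI", "TE", "HIR", "ST", "UP"]
IMPORTANCE = {
    "1992-2001": [0.087, 0.078, 0.034, 0.037, 0.051, 0.029, 0.063, 0.051, 0.031],
    "1992-2018": [0.112, 0.05, 0.057, 0.032, 0.062, 0.031, 0.146, 0.058, 0.072],
}

def rank_attributes(period):
    """Return (attribute, importance) pairs for a period, most important first."""
    pairs = zip(ATTRIBUTES, IMPORTANCE[period])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

for period in IMPORTANCE:
    top_three = ", ".join(f"{name} ({value:.3f})" for name, value in rank_attributes(period)[:3])
    print(f"{period}: {top_three}")
```

Over the full 1992–2018 window the three highest-ranked attributes are HIR, GDP and UP, whereas in the 1992–2001 window GDP ranks first.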