Carbon emissions forecasting based on temporal graph transformer-based attentional neural network

Abstract

In the field of electricity-related carbon emissions (“electric carbon”), we study the mapping between carbon emission flow calculation and power flow calculation by combining current trajectory tracking, carbon flow trajectory analysis, power system flow calculation methods, and electric network analysis theory. Examining the mechanism that links the two clarifies how they are correlated. In addition, by using time series data, attention-based graph neural networks (GNN), distributed computing technology, and spatiotemporal computing engines, carbon emission fluctuations can be decomposed and a high-frequency “energy-electricity-carbon” integrated dynamic emission factor can be obtained. From the spatiotemporal distribution patterns of this dynamic factor in multiple dimensions, the carbon emissions of key industries in cities can be calculated accurately. In this paper, the LSTM-GAT model is used as the core of a carbon emission prediction model for key urban industries. The study focuses on power plants and the chemical, steel, transportation, and construction industries, which are high energy-consuming industries with an annual electricity consumption of more than 100 million kWh in a major city of China. By analyzing the entire life cycle from power generation to electricity consumption and conducting current flow analysis, we calculate carbon emissions on monthly, weekly, and daily scales. Other factors, such as the industrial development index, GDP, coverage area of power generation enterprises, regional population, and the size and type of power-consuming units, are also included in the calculation to build a measurement system. Experiments on historical data show that the LSTM-GAT model outperforms the single models GCN, GAT, LSTM, GRU, and RNN, with lower error values and higher accuracy. The LSTM-GAT model is better suited for predicting carbon emissions and related indicators, with an accuracy rate of 89.5%. Our predictions show that carbon emissions will grow slowly in the future, while carbon emission intensity will decrease. This information can provide a scientific basis for government decision-making.

Keywords

Electric carbon; dynamic emission factor; LSTM-GAT; predict carbon emissions

1. Introduction

With economic development and improving living standards, carbon emissions [1, 2] have risen in tandem. China’s carbon emissions statistics show that from 1970 to 2002, emissions increased by 5.5% per year on average. From 2002 to 2008, however, the average annual increase more than doubled to 11.5%. By 2006, China’s carbon emissions had surpassed those of the United States and accounted for 20% of the world’s total. China’s rapid economic growth and high energy demand, coupled with an undiversified energy structure, inevitably led to a sharp increase in carbon emissions. Consequently, energy conservation [3, 4] and emission reduction have become important national development goals. Emission reduction brings environmental benefits, protects natural resources, and helps ease the tension between carbon emissions and environmental protection. Using scientific and effective prediction methods to anticipate carbon emissions can help prevent irreversible environmental damage caused by blind economic development and safeguard the survival of future generations. Investing in environmental protection and development can bring greater economic benefits in the future, and forecasting carbon emissions [5, 6] can provide technical support for sustainable development and a scientific basis for formulating emission reduction plans.

With the continuous development of big data technology, numerous emerging technologies have brought new ideas and methods to data prediction. Data has become more abundant and available, and computing power has increased dramatically, creating the conditions for the rise of deep learning. In 2006, Hinton et al. proposed a model called “Deep Belief Networks”, which provided an effective method for training deep neural networks. Since then, deep learning has developed rapidly, and a series of improved algorithms and architectures have been proposed, such as the convolutional neural network (CNN) for image recognition, the recurrent neural network (RNN) for sequence data processing, and the long short-term memory network (LSTM) and gated recurrent unit (GRU) for handling long-term dependencies in sequence data. Deep learning [7, 8], a branch of machine learning [9, 10], can handle highly complex tasks and often outperforms traditional machine learning methods. The learning capacity of a deep model generally grows with its depth, so a deeper model tends to have a stronger learning capacity [11]. However, as depth increases, the number of parameters to process also increases, and the time required to train the model grows accordingly. As a result, deep learning often involves a significant amount of computation.

Rosenblatt introduced the perceptron model [12] in 1958 as a powerful tool for handling linearly separable problems. However, when it came to linearly inseparable problems, the perceptron’s efficacy was limited and often failed to deliver desired results. In 1986, Rumelhart [13] combined the perceptron model with error backpropagation to create the Back Propagation Network, or BP neural network, marking a significant milestone in the development of artificial neural networks. This amalgamation allowed for the creation of a network capable of updating parameters and processing nonlinear problems, thus paving the way for the emergence of the multi-layer perceptron model. As a result, neural networks also became known as multi-layer perceptron [14, 15], initiating the era of shallow learning. Support vector machines and other algorithms also emerged during this period. However, the BP neural network [16, 17] encountered challenges such as slow parameter convergence and over-fitting during the learning process, impeding the progress of neural network development for some time. In 2006, Hinton et al. [18] proposed the multi-hidden layer artificial neural network, which marked a major breakthrough in the field of deep learning.

As previously noted, a deep neural network is a computationally demanding algorithm whose training requires a very large number of parameter updates and therefore substantial computational resources. With the advancement of computer technology, powerful computing equipment can now handle the complex calculations and parameter updates of deep learning. With its robust learning ability, deep learning plays an increasingly important role in production and everyday life. Long Short-Term Memory (LSTM) [19] and the Recurrent Neural Network (RNN) [20] are two prominent deep learning models. LSTM adaptively analyzes related data earlier and later in a sequence, extracting the dependencies between them. It exhibits excellent learning ability in time series analysis and is widely used in the field. RNN is the foundation of LSTM, but it is difficult to train because of vanishing and exploding gradients.

The Graph Neural Network (GNN) [21] is a neural network model that processes graph data. Unlike traditional neural networks, GNN can process non-Euclidean data such as social networks and chemical molecules [22]. By learning the features of nodes and edges in graph data and summarizing them, GNN can obtain the feature representation of the entire graph, which can be used to predict various properties such as node classification, edge classification, and graph classification [23, 24]. GNN helps process graph data and extracts useful features for prediction and decision-making [25].

As carbon emission prediction models have developed, the effectiveness of any single model has become increasingly limited. Scholars are paying growing attention to combining multiple models or algorithms so that they complement each other and further improve the prediction accuracy of carbon emissions. Composite models, whether involving deep learning algorithms or graph algorithms, often outperform their individual components, so combined models and algorithms are attracting more and more research.

In this study, we introduce an LSTM-GAT model that combines the long short-term memory network (LSTM) with the graph attention network (GAT) [26]. Our contribution is to calculate carbon emissions on different time scales (months, weeks, days) by analyzing the flow of electricity throughout the complete life cycle from generation to consumption. We also consider other key factors, such as the industrial development index, GDP, the coverage area of power generation companies, regional population, and the size and type of power consumers. Our research therefore not only predicts carbon emissions but also analyzes them in multiple dimensions over the entire life cycle. On this basis, we built a calculation system to predict the carbon emissions of a large city in China. Because it considers various factors in multiple dimensions, the method is adaptable and gives a broader perspective on carbon emission prediction. In experiments, the proposed LSTM-GAT model achieves higher prediction accuracy than the traditional methods it is compared against. This supports the effectiveness of our method for complex carbon emission prediction problems and offers ideas for future research on similar problems.

2. Related work

Due to the significant impact of production on carbon emissions, many scholars have conducted research on this topic from various perspectives, such as statistical models, time series models, and machine learning models. Although carbon emissions have become a popular research topic in recent years, the number of domestic and foreign scholars working on it is still relatively low. The following outlines the research directions and achievements in this field. The methods of time series analysis, including the Autoregressive Integrated Moving Average (ARIMA) method based on linear analysis and the ARCH method based on nonlinear analysis, were systematically explained in “Time Series Analysis: Forecasting and Control”, co-published by Box and Jenkins in 1970. However, these methods place strict distributional requirements on the data, and their prediction effect is limited. In 1994, Thomas [27] attempted to use artificial neural networks to analyze time series. The network was mainly used to predict univariate time series, and the author argued that artificial neural networks require more data than traditional time series analysis methods. However, the network still had issues with parameter optimization. In 1997, Hochreiter and Schmidhuber proposed the LSTM model, which added gated units to the recurrent neural network architecture, enabling it to learn long-range dependencies in data. The memory unit in LSTM [28] is capable of retaining relevant information and disregarding irrelevant data.

The input variables were identified to directly impact carbon dioxide emissions, including industrial power consumption and carbon dioxide emissions resulting from coal combustion. To evaluate the model’s performance, cross-validation techniques were utilized, with the root mean square error as the chosen evaluation metric. The SVM model’s best parameters were determined using a trial-and-error approach, resulting in the model’s ability to monitor industrial carbon emissions. In 2020, Nie [29] conducted an analysis of the primary factors impacting China’s carbon dioxide emissions, examining the country’s energy consumption structure and industrial trends. Two main models were constructed, namely, the carbon dioxide emission growth rate regression model and the carbon dioxide emission prediction model. To simplify the calculation process, 15 years of data (2000 to 2014) from the China Statistical Yearbook were selected. The principal component analysis method was used to calculate the growth rate, which was treated as the dependent variable. The contribution rate of each influencing factor was then calculated, and three factors (international trade, industrial structure, and energy consumption structure) with a combined contribution rate of over 85% were selected to replace the original eight influencing factors in further analysis. For the prediction model, high-fit equations for the trend Fourier function of the three principal components were obtained using principal components and data from 2000 to 2014. These equations were combined with the regression model to create the final carbon emission prediction model. The model was verified using carbon dioxide emission data from 2015 to 2016. Finally, China’s current economic development and energy consumption structure were considered, and a resource allocation method based on quantitative analysis was proposed, along with a low-carbon emission reduction strategy suggestion.

China is a high-energy-consuming nation in both production and daily life. Its carbon emissions are influenced by different factors than those of other countries, which prompted Chinese scholars to develop predictive models tailored to national conditions. These models were based on numerous carbon emission studies conducted in line with domestic circumstances. In 2003, the Institute of Energy Research’s Research Group conducted an “Analysis of China’s Sustainable Development Energy and Carbon Emissions” [30] using scenario analysis. This involved analyzing potential scenarios of energy demand based on China’s economic growth rate and development goals. Scenario analysis aims to explore and analyze all possible situations, unlike predictive analysis, which focuses on the most likely situation of the object of analysis. Nonetheless, both have the same research purpose, which is to predict the future. In 2013, Du Qiang and others [31] used the Logistic model to predict carbon emission data from various provinces in China. They used the K-means clustering method to divide the carbon emissions of each province from 1987 to 2010 into five categories and drew the emission curve for each category. The training set used carbon emission data from 2002 to 2010, and the test set used carbon emission data from 2011 to 2020. The model’s prediction error was low overall, but some areas had large errors. In 2016, Kou Jing and others [32] focused on China’s carbon emission intensity. They used carbon emission data from 1978 to 2013 to obtain carbon emission intensity data and established an ARIMA model to predict its development trend. According to their analysis, the overall carbon emission intensity in China is on a downward trend and has dropped significantly. However, the single model used for forecasting achieved only moderate predictive performance.
In 2020, Liang and colleagues, using stabilized data, established an Adaboost-SVR prediction model and compared its performance with several other machine learning algorithms, including Adaboost, DT, SVR, and BP neural network. The Adaboost-SVR model was found to have the best prediction accuracy. The study also suggested adjustments to China’s energy structure to promote energy saving and emissions reduction based on the analysis of carbon emission intensity. From the above descriptions, it can be seen that traditional time series analysis [33] has some predictive value for carbon emissions. Compared with machine learning methods, however, it is less accurate and does not make full use of the data. Therefore, this paper proposes the use of deep learning and graph-based methods to predict carbon emissions, in line with the mainstream direction of data analysis.

3. LSTM-GAT model

This section introduces a novel LSTM-GAT model that merges the LSTM and GAT neural network models. The objective of this model is to leverage past carbon emission data and external factors, such as weather patterns, holidays, and multiple time frames, to identify the intrinsic features of carbon emissions as well as the influence of external factors. Specifically, an LSTM subnet is established for each time frame, and a GAT module is embedded so that the outputs remain temporally consistent after graph attention network processing, which improves the predictive accuracy of the LSTM-GAT model as a whole. The following subsections give an overview of the complete model.

3.1 Model structure

The structure of the proposed LSTM-GAT model is depicted in Fig. 1. It comprises two distinct submodules: (1) an LSTM submodule that receives time-series carbon emissions data and applies the LSTM model to process it, and (2) a GAT submodule that consists of the GAT neural network and the LSTM model. This submodule groups the carbon emission data according to three time frames, namely daily, weekly, and biweekly, as well as external features. Next, it feeds the groups into three sets of LSTM subnets to extract daily, weekly, and biweekly features together with holiday, weather, and other characteristics separately. Finally, the features are combined, and the prediction is generated by the fully connected layer.

Figure 1.

LSTM-GAT model structure diagram.

Before training the model, data preprocessing is necessary, which involves several steps. The first step splits the timestamp of each electric power carbon emission record into month, day, hour, and minute. Additionally, holidays are encoded as 1 or 0, where 1 represents a holiday and 0 a non-holiday. Moreover, Kelvin temperature is converted to Celsius, and textual weather descriptions are processed separately into feature categories. The second step applies the exponential smoothing algorithm $\bar{X}(t)=\bar{X}(t-1)+K(X(t)-\bar{X}(t-1))$ for data denoising, where $X(t)$ is the observed value, $\bar{X}(t)$ is the smoothed value, and $K$ is the smoothing coefficient, followed by the min-max linear transformation ${X}_{i}=({X}-{X}_{\min})/({X}_{\max}-{X}_{\min})$ for data normalization. The final step divides the dataset into training and testing sets. The training set is processed along three different time axes: day (D), week (W), and bi-weekly (BW).
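The denoising and normalization steps can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the smoothing coefficient `k=0.5`, the initialisation of the smoothed series with the first observation, and the sample values are all assumptions made for illustration.

```python
import numpy as np

def exponential_smoothing(x, k=0.3):
    """Smoothed series s with s[t] = s[t-1] + k * (x[t] - s[t-1])."""
    x = np.asarray(x, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]  # assumption: start from the first observation
    for t in range(1, len(x)):
        s[t] = s[t - 1] + k * (x[t] - s[t - 1])
    return s

def min_max_normalize(x):
    """Rescale x linearly to [0, 1] using (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:  # constant series: avoid division by zero
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Hypothetical emission readings, used only to exercise the two functions.
raw = [10.0, 12.0, 11.0, 15.0, 14.0]
smoothed = exponential_smoothing(raw, k=0.5)
normalized = min_max_normalize(smoothed)
```

A larger `k` follows the raw data more closely, while a smaller `k` removes more noise at the cost of a slower response to real changes.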

After data processing, the time-carbon emission flow features are extracted from the original electricity carbon emission data and fed into the LSTM subnet for training. During testing and prediction, the test set is fed into the LSTM subnet, and the prediction accuracy is evaluated. Next, carbon emission features along multiple time axes, namely day (D), week (W), and bi-weekly (BW), are extracted and fed into the GLSTM subnet (the GAT submodule combined with LSTM) for training. During testing and prediction, the external factor test set is fed into the GLSTM subnet, and the prediction accuracy is evaluated. Finally, the features from the LSTM subnet and the GLSTM subnet are combined and fed into an FCN with two hidden layers for nonlinear transformation. The predicted sequence is then obtained from the output layer of the FCN, allowing joint prediction over multiple time axes and external features.

3.2 Model submodules

In the LSTM-GAT model, the GAT component assumes that different neighbors influence the central node to different degrees, and it learns these weights automatically through attention, thereby improving representation ability. This matches our power grid knowledge graph, in which the information of each plant is different. For example, new energy plants, thermal power plants, and nuclear power plants have different carbon emission coefficients for their output power, so their impact on the carbon emissions of the central node also differs. The attention mechanism in GAT is analogous to the Key-Value attention mechanism. In the graph structure, the central node is the Query, the information of all neighbor nodes is the Source, and the Attention Value is the feature vector of the central node after aggregation. Key and Value are the same, namely the feature vectors of the neighbor nodes. The goal is to learn the weights of the neighbor nodes (Source) with respect to the central node (Query), and then add the weighted sum to the central node to form a new feature vector (Attention Value). At the same time, carbon emission patterns vary strongly with time, so we introduce LSTM into the monthly, weekly, and daily carbon emission calculation model to realize time-series carbon emission prediction for key industries. The LSTM-GAT model comprises two sub-modules: the LSTM network, which fits time series data well, and the GLSTM model, which is designed for complex external factors. As GAT is a type of graph neural network, it preserves the adjacency relationships of the input data while retaining a certain time correlation. Consequently, feeding data processed by the GAT network into the LSTM model for further processing helps preserve the time correlation in the carbon emission data, leading to improved prediction performance.
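The Query/Key/Value reading of graph attention described above can be sketched for a single central node. This follows the standard single-head GAT scoring rule (a shared linear map, a LeakyReLU score on concatenated features, and a softmax over neighbors). Whether the authors use exactly this form is an assumption, and the shapes, random features, and function names are purely illustrative.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_aggregate(h_center, h_neighbors, W, a):
    """Single-head graph attention for one central node.

    h_center:    (F,)        features of the central node (the Query)
    h_neighbors: (N, F)      features of its neighbors (Keys and Values)
    W:           (F, F_out)  shared linear map
    a:           (2*F_out,)  attention vector
    """
    q = h_center @ W                       # transformed query, (F_out,)
    kv = h_neighbors @ W                   # transformed keys/values, (N, F_out)
    pairs = np.concatenate([np.tile(q, (len(kv), 1)), kv], axis=1)
    scores = leaky_relu(pairs @ a)         # unnormalised scores, (N,)
    scores = scores - scores.max()         # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over neighbors
    return alpha, alpha @ kv               # weights and aggregated features

# Hypothetical example: one plant node with three neighboring nodes.
rng = np.random.default_rng(0)
h_center = rng.normal(size=4)
h_neighbors = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=4)
alpha, h_new = gat_aggregate(h_center, h_neighbors, W, a)
```

The weights `alpha` sum to one, so a neighbor with a distinctive emission profile can dominate or be suppressed depending on the learned `W` and `a`.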

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) that have gained significant popularity in recent years due to their ability to overcome the limitations of traditional RNNs. One of the main challenges with RNNs is their inability to effectively capture long-term dependencies in data, resulting in poor performance on sequences longer than a few time steps. LSTMs address this problem by incorporating memory cells and gates, which enable them to selectively remember or forget information over time. Figure 2 depicts the internal structure of an LSTM neuron, which includes a memory cell that serves as a long-term storage unit for information. In addition to the memory cell, there are three gates that control the flow of information into and out of the neuron: the input gate, the output gate, and the forget gate. The input gate determines how much new information should be added to the memory cell, based on the current input and the previous hidden state. The forget gate controls how much information from the previous memory cell should be discarded, based on the current input and the previous hidden state. Finally, the output gate determines how much information from the memory cell should be used to produce the output, based on the current input and the previous hidden state. The forget gate is a crucial component of the LSTM architecture, as it allows the network to selectively retain or discard information from the memory cell based on its relevance to the current task. The forget gate is computed using a sigmoid activation function, which outputs a value between 0 and 1. A value of 1 indicates that all information from the previous memory cell should be retained, while a value of 0 indicates that all information should be discarded. It is calculated from the current input and the previous hidden state, using a set of learned weights and biases.

Figure 2.

Internal structure diagram of LSTM.

In this paper, LSTM directly processes the time-carbon emission flow data. The gates and states of each unit in the LSTM layer are computed as follows:

$\displaystyle i[t]=G(w_{i}[h_{t-1};x_{t}]+b_{i})$ (1)

$\displaystyle f[t]=G(w_{f}[h_{t-1};x_{t}]+b_{f})$ (2)

$\displaystyle g[t]=\tanh(w_{h}[h_{t-1};x_{t}]+b_{g})$ (3)

$\displaystyle o[t]=G(w_{o}[h_{t-1};x_{t}]+b_{o})$ (4)

$\displaystyle c[t]=f[t]\odot c[t-1]+i[t]\odot g[t]$ (5)

$\displaystyle h[t]=o[t]\odot\tanh(c[t])$ (6)

where $i[t]$ is the input gate, $x_{t}$ is the input of the neural network at time $t$, $h_{t-1}$ is the hidden state at the previous time step, $f[t]$ is the forget gate, $g[t]$ is the candidate cell state, $o[t]$ is the output gate, $c[t]$ is the cell state, $h[t]$ is the hidden state (the output of the unit), $\odot$ denotes element-wise multiplication, and $G$ and tanh are the sigmoid and hyperbolic tangent activation functions, respectively. The weights $w$ and biases $b$ of the LSTM are the trainable parameters, namely $w_{i}$, $w_{f}$, $w_{h}$, $w_{o}$, $b_{i}$, $b_{f}$, $b_{g}$, $b_{o}$. Finally, we define the output of the LSTM layer as ${Y}^{L}$.
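Equations (1) to (6) can be read as a single time-step update. The following NumPy sketch follows them line by line. The parameter names mirror the symbols above, while the layer sizes, random initialisation, and input sequence are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (1)-(6)."""
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}; x_t]
    i = sigmoid(p["w_i"] @ z + p["b_i"])   # Eq. (1): input gate
    f = sigmoid(p["w_f"] @ z + p["b_f"])   # Eq. (2): forget gate
    g = np.tanh(p["w_h"] @ z + p["b_g"])   # Eq. (3): candidate cell state
    o = sigmoid(p["w_o"] @ z + p["b_o"])   # Eq. (4): output gate
    c = f * c_prev + i * g                 # Eq. (5): new cell state
    h = o * np.tanh(c)                     # Eq. (6): new hidden state
    return h, c

# Hypothetical sizes: 2 input features, 3 hidden units, 5 time steps.
n_in, n_hid, n_steps = 2, 3, 5
rng = np.random.default_rng(0)
params = {k: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
          for k in ("w_i", "w_f", "w_h", "w_o")}
params.update({k: np.zeros(n_hid) for k in ("b_i", "b_f", "b_g", "b_o")})

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(n_steps, n_in)):
    h, c = lstm_step(x_t, h, c, params)
```

Because the forget gate multiplies the previous cell state element-wise, a value near 1 carries information forward almost unchanged, which is what lets the unit retain long-range dependencies.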

Graph Attention Network (GAT) is a type of GNN model based on graph attention mechanism [34, 35]. It can establish a graph based on the relationships between nodes and model the interactions between them in order to perform information propagation and aggregation. In GAT, the connections between nodes are implemented by the attention mechanism, which can adaptively assign different weights to each node, allowing the model to better integrate information between different nodes. Compared to other GNN models, GAT shows better performance and flexibility in handling complex graph-structured data due to its use of attention mechanisms. The GLSTM sub-module in this paper inputs the divided data sets (D, W, BW) into the GLSTM model for processing respectively. Taking the algorithm process of the data set D as an example, the formal expression of the processing flow is as follows:

$\displaystyle\textit{GLSTM}_{f}({D})=\emptyset({W}_{f}[{h}_{t-1},{D}]+{b}_{f})$ (7)

$\displaystyle\textit{GLSTM}_{i}({D})=\emptyset({W}_{i}[{h}_{t-1},{D}]+{b}_{i})$ (8)

$\displaystyle\textit{GLSTM}_{c}({D})=\textit{GLSTM}_{f}({D}){C}_{t-1}+\textit{GLSTM}_{i}({D})\tanh({W}_{c}[{h}_{t-1},{D}]+{b}_{c})$ (9)

$\displaystyle\textit{GLSTM}_{o}({D})=\emptyset({W}_{o}[{h}_{t-1},{D}]+{b}_{o})$ (10)

$\displaystyle\textit{GLSTM}_{h}({D})=\textit{GLSTM}_{o}({D})\tanh(\textit{GLSTM}_{c}({D}))$ (11)

Here, $\cdot$ represents the causal convolution operation, $\emptyset$ represents the ReLU activation function, and ${W}_{f}$, ${W}_{i}$, ${W}_{o}$, ${W}_{c}$ represent the weights of each unit. We define the output of the GLSTM on data set $D$ as ${Y}^{D}$. The processing of W and BW is similar: the output of the GLSTM on data set W is ${Y}^{W}$, and the output of the GLSTM on data set BW is ${Y}^{BW}$.

In order to better predict the short-term carbon emissions, we sequentially connect the output ${Y}^{L}$ , ${Y}^{D}$ , ${Y}^{W}$ , ${Y}^{{BW}}$ of the two sub-networks of the model into a feature vector:

$\displaystyle Y^{\prime}=\textit{concatenate}(Y^{L},Y^{D},Y^{W},Y^{BW})$ (12)

Then input the fused features into the fully connected layer for prediction, the formula is expressed as:

$\displaystyle Y^{\prime\prime}=\textit{FCN}(Y^{\prime})$ (13)

where FCN means fully connected layer.
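The fusion in Eq. (12) and the fully connected prediction in Eq. (13) amount to a concatenation followed by a small feed-forward network. The sketch below assumes two hidden layers, as stated in Section 3.1, with ReLU activations and a linear output. The feature widths, layer sizes, activation choice, and random weights are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fcn_forward(y_fused, layers):
    """Fully connected network: ReLU on hidden layers, linear output layer."""
    a = y_fused
    for W, b in layers[:-1]:
        a = relu(a @ W + b)
    W_out, b_out = layers[-1]
    return a @ W_out + b_out

rng = np.random.default_rng(0)
batch, width = 4, 8  # hypothetical batch size and sub-network output width

# Stand-ins for the sub-network outputs Y^L, Y^D, Y^W, Y^BW.
Y_L, Y_D, Y_W, Y_BW = (rng.normal(size=(batch, width)) for _ in range(4))

# Eq. (12): concatenate along the feature axis.
Y_fused = np.concatenate([Y_L, Y_D, Y_W, Y_BW], axis=1)

# Eq. (13): two hidden layers (16 and 8 units) and one output unit.
sizes = [Y_fused.shape[1], 16, 8, 1]
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
Y_pred = fcn_forward(Y_fused, layers)
```

Concatenating rather than summing keeps the contribution of each time axis separable, so the first dense layer can learn how much weight each sub-network output deserves.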

The goal of the prediction model is to minimize the prediction error. Usually, a loss function is used to measure the difference between the predicted value and the real value. In this paper, the mean square error is used. We define the labels of all data as $Y$, and the model is updated iteratively after the loss is calculated. The loss is computed as follows:

$\displaystyle\textit{loss}=\frac{1}{n}\sum(Y^{\prime\prime}-Y)^{2}$ (14)

In this paper, we use the Adam algorithm to optimize the final objective function.
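To make the training objective concrete, the sketch below computes the mean square error of Eq. (14) and applies the standard Adam update rule to a one-parameter toy regression. The toy data, learning rate, and iteration count are assumptions chosen only to show the mechanics. The paper does not report its Adam settings, so the defaults here are the commonly used ones.

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean square error, as in Eq. (14)."""
    return float(np.mean((y_pred - y_true) ** 2))

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (hypothetical): fit y = theta * x, true theta = 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x
theta = np.array([0.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)

initial_loss = mse_loss(theta * x, y)
for t in range(1, 501):
    grad = np.array([np.mean(2.0 * (theta * x - y) * x)])  # d(loss)/d(theta)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
final_loss = mse_loss(theta * x, y)
```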

3.3 Model training

During the model training process, the parameters and weights of each unit are updated using the training set, resulting in adjusted model parameters. Subsequently, the performance of the model is evaluated by comparing the error between its predicted values and the actual values from the test set. The entire process is depicted in Fig. 3.

Figure 3.

Flow chart of model training and evaluation.

Typically, before training a deep neural network model, several sets of hyperparameters are predefined. As different hyperparameter settings can influence model performance, it is necessary to tune them for the predefined model. This entails using a validation set to monitor the prediction error of the model and adjusting the hyperparameters until the error falls within a predefined range. The set of hyperparameters that yields the best model performance within that range is retained, so that the model attains the best training outcome. The process of parameter selection is depicted in Fig. 4. The quantities to be chosen include the settings of the gradient descent algorithm, the historical time series length N, and the number of hidden layers of the model, among others. In this paper, we employ the grid method (grid search) to select them, a method that is also commonly used to solve nonlinear programming problems. It is highly intuitive, easily adaptable, and involves minimal calculation in subsequent experiments. The experimental process of this model involves denoising and normalizing the carbon emission data set, followed by employing mathematical operations to simplify the characteristics of Q, K, V, and time T, thereby establishing a linear relationship between time and flow. Prior to the experiment, the data are divided into a training set and a test set and fed into the model to extract the temporal and spatial characteristics of the data. Finally, the predicted value is output through the fully connected layer.

Figure 4.

Model parameter selection flowchart.
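The grid-based parameter selection described above can be sketched as an exhaustive loop over candidate settings that keeps the one with the lowest validation error and checks it against the predefined error range. The candidate values, the `max_error` threshold, and the `toy_validation_error` function are hypothetical stand-ins. In practice the evaluation function would train the model and return its error on the validation set.

```python
from itertools import product

def grid_search(param_grid, evaluate, max_error=None):
    """Try every combination in param_grid and keep the lowest-error one."""
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    # Report whether the best setting meets the predefined error range.
    within_range = max_error is None or best_error <= max_error
    return best_params, best_error, within_range

def toy_validation_error(params):
    """Hypothetical stand-in for 'train the model, score it on validation data'."""
    return (abs(params["history_length"] - 14) / 14
            + 0.1 * abs(params["hidden_layers"] - 2))

grid = {"history_length": [7, 14, 28], "hidden_layers": [1, 2, 3]}
best_params, best_error, ok = grid_search(grid, toy_validation_error, max_error=0.05)
```

The cost grows with the product of the candidate counts, so the grid is usually kept coarse and restricted to the few hyperparameters that matter most.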

4. Experiment

4.1 Datasets

This paper is a comprehensive analysis of the carbon emissions in a major city of China. The primary objective of this study is to provide a clear understanding of the factors that contribute to carbon emissions and their impact on the city’s environment. To achieve this objective, we have analyzed data sets for power plants and the chemical, steel, transportation, and construction industries. Our study reveals that power plants are the most significant contributors to the city’s total carbon emissions, mainly due to their reliance on fossil fuels such as coal. The chemical and steel industries also have substantial emissions because their energy and raw material sources depend heavily on fossil fuels, making it challenging to transition to clean energy sources. Furthermore, the transportation industry is a significant contributor to carbon emissions, given the surging demand for transportation services in the city and the use of conventional fuels in most vehicles. The construction industry also plays a significant role in the city’s carbon footprint, as building energy largely relies on fossil fuels. To forecast the carbon emissions for the entire year of 2022, we rely on three years of historical data for training. Our iterative training approach trains the model on the available data and then uses it to make predictions. For instance, to predict emissions in January 2022, we train our model on data from January 2019 through December 2021. Similarly, to forecast emissions in December 2022, we use data from December 2019 through November 2022 for training. The final forecast accuracy is calculated as the average across all months of 2022. This study emphasizes the need for a clean energy transition in the city’s power plants, chemical industry, and steel industry. It also highlights the need for awareness and technology development for building energy conservation.
Moreover, the transportation industry must shift towards clean energy sources to reduce its carbon emissions. The results of this study provide a framework for policymakers and city planners to develop effective strategies for reducing carbon emissions in the city. The adoption of renewable energy sources and the implementation of energy-efficient practices can significantly reduce the city’s carbon footprint, promoting sustainable economic growth and environmental protection. In the experimental data set, we adopted an innovative method to map the information of each power station in the power system to node information, and established a complex and organic relationship network through the association of lines and currents. In this network, each power station information is not only regarded as a node, but its associated electricity customers are also included, thus presenting the intricate connection between different elements in the power system. In addition to the basic node and association information, we also introduce exogenous information and integrate it into the relational network. These external sources of information can be environmental factors, market conditions, etc. Based on the above nodes, associations and exogenous information, we successfully constructed a spatiotemporal power network knowledge graph. This map not only presents the complex relationships among various elements within the power system, but also reflects the changes of these relationships over time and space. In the training process, we first divide the data set into a training set and a test set, choose to use 90% of the data set as the training set, and use the remaining 10% of the data as the test set, and use cross-validation to train and test the performance of the model.
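The rolling training scheme described above can be sketched as follows. This is a minimal illustration, assuming a fixed 36-month window that ends in the month before each forecast month; the helper names `add_months` and `rolling_windows` are hypothetical and not taken from the paper.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def rolling_windows(first_target: date, n_targets: int, window_months: int = 36):
    """Yield (train_start, train_end, target) triples, one per forecast month.

    train_start and train_end are inclusive first-of-month dates.
    """
    for k in range(n_targets):
        target = add_months(first_target, k)
        train_end = add_months(target, -1)
        train_start = add_months(target, -window_months)
        yield train_start, train_end, target


# One window per month of 2022, matching the two examples given in the text.
windows = list(rolling_windows(date(2022, 1, 1), n_targets=12))
for start, end, target in (windows[0], windows[-1]):
    print(f"forecast {target:%Y-%m}: train on {start:%Y-%m} .. {end:%Y-%m}")
```

Each triple identifies the slice of history used to retrain the model before forecasting the corresponding month; the per-month accuracies are then averaged.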

4.2 Baseline models

In this paper, we benchmark the proposed model against graph-based and sequence-based deep learning models: GCN, GAT, RNN, LSTM, and GRU.

Graph Convolutional Network (GCN) [36] belongs to the family of Graph Neural Network (GNN) models and was introduced by Thomas N. Kipf and Max Welling in 2016. It learns node representations by applying a convolution operation to the graph structure: by incorporating information from neighboring nodes, the representation of each node becomes more comprehensive. In each layer, GCN aggregates the features of a node and its neighbors, applies a linear transformation, and passes the result through a nonlinear activation function to derive a new representation for the node. The aggregation is a weighted average whose weights are fixed by the graph structure, namely the edge weights between nodes and the node degrees used for normalization.
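As a rough illustration of the aggregation just described, the sketch below implements one GCN propagation step in plain Python, using the commonly cited symmetric normalization with self-loops. The toy three-node graph, the feature values, and the function name `gcn_layer` are assumptions made for this example; they are not the paper's data or implementation.

```python
import math


def gcn_layer(adj, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W), with dense Python lists."""
    n = len(adj)
    in_dim, out_dim = len(weight), len(weight[0])
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation: the aggregation weights are fixed by the graph.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(in_dim)]
           for i in range(n)]
    # Linear transformation followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]


# Toy path graph 0 - 1 - 2 with one feature per node (illustrative values only).
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0], [2.0], [3.0]]
weight = [[1.0]]
out = gcn_layer(adj, features, weight)
print(out)
```

Because the weights come only from the adjacency structure, every neighbor with the same degree contributes equally; this is the property that GAT relaxes.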

Graph Attention Network (GAT) [37] is a member of the Graph Neural Network (GNN) family, first proposed by Petar Veličković et al. in 2018. GAT introduces an attention mechanism to model the relationships between nodes, which enables more flexible learning of the connections between nodes and improves the expressiveness of node representations. As in GCN, the new representation of a node is a weighted average of the features of the node and its neighbors, but here the weights are learned via attention. Specifically, the attention mechanism scores the importance of each neighbor to a node by computing a similarity between their features, and this similarity function is itself learned by the neural network.
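The attention weighting can be illustrated with a single-head sketch in plain Python. It assumes the node features have already been linearly projected and splits the attention vector into two halves, `a_src` and `a_dst`; the node names, feature values, and attention parameters are hypothetical placeholders, not values from the paper.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gat_attention(neighbors, h, a_src, a_dst):
    """Return attention weights alpha[i][j] for one GAT head.

    h holds projected node features; the score for (i, j) is
    LeakyReLU(a_src . h_i + a_dst . h_j), normalised by a softmax over j.
    """
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = {j: leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j])) for j in nbrs}
        top = max(scores.values())  # subtract the maximum for numerical stability
        exp = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exp.values())
        alpha[i] = {j: e / total for j, e in exp.items()}
    return alpha


def gat_aggregate(alpha, h):
    """Weighted average of neighbour features using the attention weights."""
    dim = len(next(iter(h.values())))
    return {i: [sum(w * h[j][d] for j, w in row.items()) for d in range(dim)]
            for i, row in alpha.items()}


# Hypothetical graph: each node lists itself and its neighbours.
neighbors = {"plant": ["plant", "steel", "chem"],
             "steel": ["steel", "plant"],
             "chem": ["chem", "plant"]}
h = {"plant": [1.0, 0.0], "steel": [0.0, 1.0], "chem": [1.0, 1.0]}
alpha = gat_attention(neighbors, h, a_src=[0.5, -0.5], a_dst=[1.0, 0.0])
updated = gat_aggregate(alpha, h)
print(alpha["plant"])
```

In a trained model `a_src`, `a_dst`, and the projection would be learned parameters; with all-zero attention parameters the weights reduce to a uniform average over the neighborhood.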

A Recurrent Neural Network (RNN) [38] is a neural network designed to process sequence data by retaining previous information and updating its state dynamically based on the current input. Elman introduced the RNN concept in 1990. RNNs are naturally suited to time series data and are widely used in fields such as speech recognition, natural language processing, and machine translation. The fundamental idea is to carry the state of the previous time step into the next one: the state at the current time step is computed from the current input and the previous state. Every time step therefore has a state, and states are passed along and updated through a shared weight matrix.
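The state update can be written compactly. The sketch below is a minimal Elman-style cell in plain Python, assuming a tanh activation; the weights and the input sequence are arbitrary illustrative values.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """Elman update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[k], x))
            + sum(w * hi for w, hi in zip(w_hh[k], h_prev))
            + b_h[k]
        )
        for k in range(len(b_h))
    ]


def rnn_forward(sequence, w_xh, w_hh, b_h):
    """Run the cell over a sequence, carrying the state from step to step."""
    h = [0.0] * len(b_h)
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


# A single input pulse followed by zeros: the state decays but is not forgotten at once.
states = rnn_forward([[1.0], [0.0], [0.0]], w_xh=[[1.0]], w_hh=[[0.5]], b_h=[0.0])
print(states)
```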

Long Short-Term Memory (LSTM) [39] is a variant of RNN used to model sequence data. Hochreiter and Schmidhuber proposed LSTM in 1997 to address the vanishing and exploding gradient problems that traditional RNNs face on long sequences. The primary idea is to control the flow of information with a gating mechanism. At each time step, the current input, the previous state, and the output are managed by a set of gates that regulate which information is retained and which is forgotten. The core structure of LSTM consists of a memory cell and three gates: the input gate, the forget gate, and the output gate. The memory cell stores and transfers information, while the three gates control how information enters, persists in, and leaves the cell.
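To make the gating concrete, here is a sketch of one LSTM step in plain Python using the standard gate equations. The parameter layout (a dictionary mapping gate names to `(W, U, b)` triples) is an assumption chosen for readability, and the all-zero weights in the example are placeholders.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(w, u, b, x, h):
    """Compute W x + U h + b row by row."""
    return [
        sum(wi * xi for wi, xi in zip(w_row, x))
        + sum(ui * hi for ui, hi in zip(u_row, h))
        + bk
        for w_row, u_row, bk in zip(w, u, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params maps 'i', 'f', 'o', 'g' to (W, U, b) triples."""
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate memory
    # The memory cell keeps part of the old content and admits part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# With zero weights every gate is 0.5 and the candidate is 0, so the cell halves.
zero = ([[0.0]], [[0.0]], [0.0])
params = {gate: zero for gate in "ifog"}
h, c = lstm_step([1.0], [0.0], [2.0], params)
print(h, c)
```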

The Gated Recurrent Unit (GRU) [40] is a gated recurrent neural network that improves on the traditional RNN, introduced by Cho et al. in 2014. Like LSTM, GRU uses a gating mechanism to better capture dependencies in long sequences and to mitigate vanishing and exploding gradients. Compared with LSTM, GRU has a simpler gate structure with only two gates: the reset gate and the update gate. The reset gate regulates how much of the previous state is used when forming the candidate state at the current time step, while the update gate controls how the previous state and the candidate state, which depends on the current input, are blended. In effect, the new state is a weighted average of the previous state and the candidate state, with the weights set by the update gate. Compared with traditional RNNs, GRU better captures long-term dependencies in time series data.
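A single GRU step can be sketched in the same style. The blending convention below, `(1 - z) * h_prev + z * candidate`, is one of two equivalent conventions found in the literature; the parameter layout and the zero weights are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(w, u, b, x, h):
    """Compute W x + U h + b row by row."""
    return [
        sum(wi * xi for wi, xi in zip(w_row, x))
        + sum(ui * hi for ui, hi in zip(u_row, h))
        + bk
        for w_row, u_row, bk in zip(w, u, b)
    ]


def gru_step(x, h_prev, params):
    """One GRU step; params maps 'z', 'r', 'h' to (W, U, b) triples."""
    z = [sigmoid(v) for v in affine(*params["z"], x, h_prev)]  # update gate
    r = [sigmoid(v) for v in affine(*params["r"], x, h_prev)]  # reset gate
    gated = [rk * hk for rk, hk in zip(r, h_prev)]             # reset applied to old state
    cand = [math.tanh(v) for v in affine(*params["h"], x, gated)]
    # Blend old state and candidate; some references swap the roles of z and 1 - z.
    return [(1.0 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h_prev, cand)]


# With zero weights both gates are 0.5 and the candidate is 0, so the state halves.
zero = ([[0.0]], [[0.0]], [0.0])
h_next = gru_step([1.0], [0.8], {"z": zero, "r": zero, "h": zero})
print(h_next)
```

Compared with the LSTM step, there is no separate memory cell and one gate fewer, which is the simplification the text refers to.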

4.3 Influencing factors

The primary factors that influence changes in carbon emissions across various industries on the urban power consumption side include weather patterns, seasonal fluctuations, festivals, industry-specific characteristics, power consumption unit size, power consumption unit type, power consumption unit development index, and user energy consumption product ratio. In addition to these factors, periodic industry activities, industry tidal flow, engineering overtime, and maintenance shutdowns may also directly impact electricity consumption and subsequently affect carbon emission levels. However, since these factors do not exhibit a discernible pattern of influence in this project, they are not considered in the model.

Among the five industries studied, we found that the transportation industry and the construction industry are the most affected by the weather. For instance, when it rains, private car and taxi usage in the transportation industry tends to increase, and because the amount of rainfall follows a long-tailed distribution, the effect on carbon emissions varies accordingly. Similarly, in the construction industry, rain can impede work progress, which in turn affects carbon emissions. To address this, we add a correction coefficient (a carbon emission fluctuation factor) to the model prediction. Based on the weather elements relevant to each industry, we define separate weather carbon emission fluctuation factor coefficients for the construction industry, the transportation industry, the chemical industry, and the steel industry.

In addition, among the five industries studied, we found that carbon emissions in the transportation industry also increase in summer and winter because of air-conditioning use in vehicles, and the construction industry shows a similar seasonal effect. As above, the weather carbon emission fluctuation factor coefficients of the construction, transportation, chemical, and steel industries are added to the model prediction to account for this.
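One way such a correction could be applied is sketched below. The paper does not state whether the fluctuation factor is additive or multiplicative, nor does it report coefficient values, so this example assumes a multiplicative adjustment and uses purely hypothetical numbers; `FLUCTUATION_FACTORS` and `corrected_forecast` are illustrative names.

```python
# Hypothetical fluctuation coefficients (relative change in emissions).
# The paper does not report values; these numbers are placeholders only.
FLUCTUATION_FACTORS = {
    ("transportation", "rain"): 0.05,
    ("construction", "rain"): -0.10,
    ("transportation", "summer"): 0.03,
    ("transportation", "winter"): 0.03,
    ("construction", "summer"): 0.02,
    ("construction", "winter"): 0.02,
}


def corrected_forecast(industry, base_forecast, conditions):
    """Scale a model forecast by the summed coefficients of the active conditions.

    Unknown (industry, condition) pairs contribute 0, leaving the forecast unchanged.
    """
    factor = sum(FLUCTUATION_FACTORS.get((industry, c), 0.0) for c in conditions)
    return base_forecast * (1.0 + factor)


print(corrected_forecast("transportation", 100.0, ["rain", "summer"]))
print(corrected_forecast("steel", 100.0, ["rain"]))
```

Keeping the coefficients in a lookup table keyed by industry and condition makes it straightforward to add entries for the chemical and steel industries once their values are estimated.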

Table 1
Parameters of the different models

Method	LSTM-GAT	GCN	GAT	RNN	LSTM	GRU
Batch size	64	64	64	64	64	64
Learning rate	0.001	0.001	0.001	0.001	0.001	0.001
Training epochs	100	80	80	70	70	70
Model layers	5	5	5	2	2	2
Hidden layer units	100	60	60	80	80	100
Optimizer	Adam	Adam	Adam	Adam	Adam	Adam
Loss function	MSE	MSE	MSE	MSE	MSE	MSE

4.4 Model parameters

Model parameters must be evaluated carefully during the experimental phase so that the results are reliable, particularly for complex models that take significant time and computational resources to train. Table 1 lists the parameters used for each model. Training time varies considerably with model complexity and data set size: some models converge quickly, while others take longer. We therefore monitored training progress closely and terminated training once convergence was reached, which avoids unnecessary computation on large data sets. The initial values of the model parameters also need to be chosen carefully, because a poor initialization can leave the model stuck in a local optimum, where it fails to capture the underlying patterns and relationships in the data and produces unreliable predictions. For each model, we analyzed the parameters and selected the combination of values that best helped the model converge.
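Terminating training once convergence is reached is commonly implemented as early stopping. The sketch below is a generic illustration, not the paper's procedure: the `patience` and `min_delta` parameters and the loss values are assumptions.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=1e-4):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than min_delta
    for `patience` consecutive epochs. Epochs are numbered from 1.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)


# Illustrative loss curve: improvement stalls after epoch 4.
losses = [1.0, 0.6, 0.4, 0.35, 0.36, 0.37, 0.36, 0.38, 0.40]
best, stopped = early_stopping_epoch(losses, patience=3)
print(best, stopped)
```

The weights saved at `best` would be the ones kept for evaluation; the epochs after `stopped` are never run, which is where the savings in computation come from.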

4.5 Results and analysis

Table 2 summarizes the results of the models evaluated over the months of 2022. The proposed LSTM-GAT model achieves the highest accuracy (89.5%) and the lowest MAE (1.5) and RMSE (0.12) of all models. The GAT model performs second best, with an accuracy of 83.4%, while the GRU model, considered the simplest of the models, performs worst, with the lowest accuracy (78.1%) and the highest MAE (4.5) and RMSE (0.41).

We attribute these differences mainly to the features each model extracts. LSTM-GAT extracts both the relational features among nodes and the time-series features across days, which yields the highest accuracy and the lowest errors. GAT extracts only the relational features among nodes, so its accuracy is lower. GRU, with fewer trainable parameters and learned features, does not capture the complexity of the data or extract the features needed for accurate prediction. These results indicate that LSTM-GAT is the most effective of the models considered for this data set, and they underline both the importance of the feature extraction method and the limitations of simpler models.

Table 2
Experimental results of different models

Method	LSTM-GAT	GCN	GAT	RNN	LSTM	GRU
Accuracy	89.5%	82.5%	83.4%	80.3%	82.9%	78.1%
MAE	1.5	3.4	2.8	3.9	3.1	4.5
RMSE	0.12	0.26	0.16	0.32	0.23	0.41
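For reference, the error metrics in Table 2 can be computed as sketched below. MAE and RMSE follow their standard definitions. The paper does not define how accuracy is calculated, so the `accuracy` function here assumes one minus the mean absolute percentage error, which is only one plausible reading; the sample values are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true, y_pred):
    """Assumed definition: 1 - MAPE. Requires non-zero true values."""
    mape = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)
    return 1.0 - mape


# Illustrative values, not data from the paper.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), accuracy(y_true, y_pred))
```

When both are computed on the same scale, RMSE is never smaller than MAE, so the MAE and RMSE rows of Table 2 are presumably reported on different scales or normalizations, which the paper does not specify.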

Fig. 5 shows the forecast accuracy of every model for each month of 2022. In every month, the LSTM-GAT method outperforms all the other models; the GAT model ranks second and the GRU model ranks last. These results support the approach proposed in this paper: combining the relational information of the graph with the time-series information captured by the LSTM improves the accuracy of carbon emission prediction. We evaluated the models under different conditions and monitored their training progress to ensure that the comparison is reliable. We hope that these findings encourage further research on graph-based models for carbon emission prediction, which can help governments and organizations make informed decisions about reducing carbon emissions.

Figure 5. Experimental results of different models for each month.

4.6 Parametric analysis

In addition to the accuracy of the models, we analyzed how the number of training epochs affects performance. Fig. 6 shows accuracy as a function of the number of training epochs. For all models, accuracy initially increases with more epochs, as the model gradually learns from the training data. Beyond a certain threshold, however, accuracy starts to decline. We attribute this decline to overfitting: the model fits the training data too closely and loses its ability to generalize to new, unseen data. The optimal number of epochs is not a universal value; it depends on the intrinsic characteristics of the model and on the specific parameters chosen for it, so the number of epochs and the other hyperparameters must be selected carefully. The rate at which accuracy improves also varies between models: some improve sharply in the early stages of training, while others improve more gradually, which we attribute to differences in model complexity and in the data used for training. By analyzing the accuracy trend over different numbers of epochs, we can identify the optimal number of epochs for a specific model and parameter combination.
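Selecting the epoch budget from such a curve amounts to taking the point of highest held-out accuracy. The sketch below assumes accuracy has been recorded at a few candidate epoch counts; the model names and numbers are made-up placeholders, not values read from Fig. 6.

```python
def best_epoch_budget(curve):
    """Return the (epochs, accuracy) pair with the highest accuracy.

    `curve` maps an epoch count to the accuracy measured after that many epochs.
    Ties are resolved in favour of fewer epochs.
    """
    return max(sorted(curve.items()), key=lambda item: item[1])


# Placeholder curves that rise and then fall, as overfitting sets in.
curves = {
    "model_a": {20: 0.70, 40: 0.78, 60: 0.82, 80: 0.80},
    "model_b": {20: 0.65, 40: 0.74, 60: 0.74, 80: 0.71},
}
chosen = {name: best_epoch_budget(curve) for name, curve in curves.items()}
print(chosen)
```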

Figure 6. Experimental results of different models for different training epochs.

5. Conclusion

This paper presents a novel approach for predicting carbon emissions using a hybrid model based on LSTM and GAT. The proposed model maps power plant information to node information and establishes a relationship network among the nodes. The node information comprises the plant station details, the associated electricity customers, and the external source information embedded in the network. Using this knowledge graph as input, we use GAT to extract relational features and then LSTM to extract temporal features, and we combine these features to make predictions. The experimental results demonstrate that the LSTM-GAT model outperforms the single models GCN, GAT, LSTM, GRU, and RNN, with higher accuracy and lower error values. The accuracy of the proposed model reaches 89.5%, making it more suitable for predicting carbon emissions and related indicators. The model predicts that the carbon emissions of the major city in China will exhibit a slow growth trend, while the carbon emission intensity will decrease. This prediction provides a scientific basis for government decision-making.

Acknowledgments

This project is supported by the State Grid Corporation of China’s Science and Technology Project as well as the Research on Key Technologies for Carbon Emission Monitoring and Diagnosis of Urban Energy and Power Based on Multi-source Heterogeneous Data Fusion and Sharing (No.5700-202290184A-1-1-ZN).

References

1. Shen, Malik, et al. Does green investment, financial development and natural resources rent limit carbon emissions? A provincial panel analysis of China. Science of the Total Environment. 2021; 755: 142538.
2. Zhao, Zuo, Wang, et al. Review of green and low-carbon ironmaking technology. Ironmaking & Steelmaking. 2020; 47(3): 296-306.
3. Kim, Choi, Kang, et al. A systematic review of the smart energy conservation system: From smart homes to sustainable smart cities. Renewable and Sustainable Energy Reviews. 2021; 140: 110755.
4. Ren, Wang, et al. Can carbon emission trading scheme achieve energy conservation and emission reduction? Evidence from the industrial sector in China. Energy Economics. 2020; 85: 104590.
5. Qader, Khan, Kamal, et al. Forecasting carbon emissions due to electricity power generation in Bahrain. Environmental Science and Pollution Research. 2021; 1-12.
6. Huang, Shen, Liu. Grey relational analysis, principal component analysis and forecasting of carbon emissions based on long short-term memory in China. Journal of Cleaner Production. 2019; 209: 415-423.
7. Dong, Wang, Abbas. A survey on deep learning and its applications. Computer Science Review. 2021; 40: 100379.
8. Zhou, Zhang, Liu, et al. Application of deep learning in food: A review. Comprehensive Reviews in Food Science and Food Safety. 2019; 18(6): 1793-1811.
9. Janiesch, Zschech, Heinrich. Machine learning and deep learning. Electronic Markets. 2021; 31(3): 685-695.
10. Brynjolfsson, Mitchell. What can machine learning do? Workforce implications. Science. 2017; 358(6370): 1530-1534.
11. Jha, Gupta, Ward, et al. Enabling deeper learning on big data for materials informatics applications. Scientific Reports. 2021; 11(1): 4244.
12. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review. 1958; 65(6): 386.
13. Rumelhart, Hinton, Williams. Learning representations by back-propagating errors. Nature. 1986; 323(6088): 533-536.
14. Gardner, Dorling. Artificial neural networks (the multilayer perceptron) – a review of applications in the atmospheric sciences. Atmospheric Environment. 1998; 32(14-15): 2627-2636.
15. Tang, Deng, Huang. Extreme learning machine for multilayer perceptron. IEEE Transactions on Neural Networks and Learning Systems. 2015; 27(4): 809-821.
16. Paola, Schowengerdt. A review and analysis of backpropagation neural networks for classification of remotely-sensed multi-spectral imagery. International Journal of Remote Sensing. 1995; 16(16): 3033-3058.
17. Ding. An optimizing BP neural network algorithm based on genetic algorithm. Artificial Intelligence Review. 2011; 36: 153-162.
18. Hinton, Salakhutdinov. Reducing the dimensionality of data with neural networks. Science. 2006; 313(5786): 504-507.
19. Van Houdt, Mosquera, Nápoles. A review on the long short-term memory model. Artificial Intelligence Review. 2020; 53: 5929-5955.
20. et al. A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation. 2019; 31(7): 1235-1270.
21. Pan, Chen, et al. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems. 2020; 32(1): 4-24.
22. Zhu, Cheng, Luo, et al. SI-News: Integrating social information for news recommendation with attention-based graph convolutional network. Neurocomputing. 2022; 494: 33-42.
23. Xiang, Wen, Cheng, et al. General graph generators: experiments, analyses, and improvements. VLDB Journal. 2021; 1-29.
24. Niu, Zhong. A review on the attention mechanism of deep learning. Neurocomputing. 2021; 452: 48-62.
25. Cheng, Chen, Wang, et al. Efficient top-k vulnerable nodes detection in uncertain graphs. IEEE Transactions on Knowledge and Data Engineering. 2023; 35(2): 1460-1472.
26. Dong, Liu, et al. Weighted feature fusion of convolutional neural network and graph attention network for hyperspectral image classification. IEEE Transactions on Image Processing. 2022; 31: 1559-1572.
27. Torres, Hadjout, Sebaa, et al. Deep learning for time series forecasting: a survey. Big Data. 2021; 9(1): 3-21.
28. Graves, Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks. 2005; 18(5-6): 602-610.
29. Yao. A new regression model: modal linear regression. Scandinavian Journal of Statistics. 2014; 41(3): 656-671.
30. Forecasting Chinese carbon emissions based on a novel time series prediction method. Energy Science & Engineering. 2020; 8(7): 2274-2285.
31. Sun, Liu. Prediction and analysis of the three major industries and residential consumption CO2 emissions based on least squares support vector machine in China. Journal of Cleaner Production. 2016; 122: 144-153.
32. Sun, Huang. Predictions of carbon emission intensity based on factor analysis and an improved extreme learning machine from the perspective of carbon emission efficiency. Journal of Cleaner Production. 2022; 338: 130414.
33. Wang, Mao, Wang, et al. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering. 2017; 29(12): 2724-2743.
34. Zhu, Cheng, Luo, et al. Leveraging enterprise knowledge graph to infer web events’ influences via self-supervised learning. Journal of Web Semantics. 2022; 74: 100722.
35. Cheng, Yang, Xiang, et al. Financial time series forecasting with multi-modality graph neural network. Pattern Recognition. 2022; 121: 108218.
36. Sun, Zhao, Gilvary, et al. Graph convolutional networks for computational drug development and discovery. Briefings in Bioinformatics. 2020; 21(3): 919-935.
37. Lee, Rossi, Kim, et al. Attention models in graphs: A survey. ACM Transactions on Knowledge Discovery from Data. 2019; 13(6): 1-25.
38. Schuster, Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing. 1997; 45(11): 2673-2681.
39. Van Houdt, Mosquera, Nápoles. A review on the long short-term memory model. Artificial Intelligence Review. 2020; 53: 5929-5955.
40. Chen, Jing, Chang, et al. Gated recurrent unit based recurrent neural network for remaining useful life prediction of nonlinear deterioration process. Reliability Engineering & System Safety. 2019; 185: 372-382.
