Abstract
Accurate taxi demand forecasting is important for estimating changes in demand and supporting informed decisions. Although deep learning methods have been widely applied to taxi demand forecasting, they neglect the complexity of taxi demand data and the impact of event occurrences, which makes it hard to model taxi demand effectively in highly dynamic areas (e.g., areas with frequent event occurrences). Therefore, to achieve accurate and stable taxi demand forecasting in highly dynamic areas, a novel hybrid deep learning model is proposed in this study. First, to reduce the complexity of the taxi demand time series, seasonal-trend decomposition procedures based on loess are employed to decompose the time series into three simpler components (i.e., seasonal, trend, and remainder components). Then, different forecasting methods are adopted for the different components to obtain robust forecasting results. Moreover, considering the instability and nonlinearity of the remainder component, this study proposes fusing event features (in particular, text data) to capture the unusual fluctuation patterns of the remainder component and solve its extreme value problem. Finally, a genetic algorithm is applied to determine the optimal weights for integrating the forecasting results of the three components into the final taxi demand forecast. The experimental results demonstrate that the proposed model is more accurate and reliable than the baseline forecasting models.
Introduction
Complex transportation systems and instability due to multiple factors have rendered taxi demand forecasting extremely challenging. Accurate taxi demand forecasting is important for alleviating traffic pressure and avoiding wasted resources [46]. From the perspective of taxi-calling platforms such as Uber, knowing the future demand for taxis in advance allows them to assign the appropriate number of taxi drivers. Analogously, from the perspective of shared car platforms such as Car2Go, it allows suppliers to reallocate their vehicles at night based on forecast demand [35]. With accurate taxi demand forecasting, taxi drivers know where many passengers are waiting and can pick them up as soon as possible. Additionally, passengers find taxis already waiting for them [45]. Therefore, efficient scheduling allows taxi drivers and passengers to find each other in the least amount of time, thus benefitting both parties.
In the past decade, various forecasting methods have been proposed to forecast traffic volume, taxi demand, and travel time [18]. The typical methods are statistical models such as autoregressive integrated moving average (ARIMA) [19, 26] and Kalman filtering [10], and nonlinear methods such as support vector machine (SVM) [1]. However, these methods cannot extract features of time series automatically, and the obtained features are often superficial. Fortunately, deep learning models are good at extracting the deep abstract features automatically, which has been proven in our previous study [2]. Recently, some advanced and powerful deep learning models have been proposed for traffic forecasting, including long short term memory network (LSTM) [24], echo state network [34], deep belief network [13], and convolutional neural network (CNN) [44].
Although these previous works can obtain promising traffic forecasting results to some extent, they ignore the complexity of traffic data and the effect of event occurrences, making it difficult to realize accurate taxi demand forecasting in highly dynamic areas. Highly dynamic areas are characterized by the frequently and irregularly held special events (e.g., gala performances, concerts, and sport games). Therefore, to forecast the taxi demand more accurately and stably in highly dynamic areas, a novel hybrid deep learning model (HDLM) is proposed in this study.
The main contributions of this study are as follows: (1) A new model using a hybrid deep learning structure is proposed to handle the complex taxi demand data from highly dynamic areas. It improves the robustness of the taxi demand forecasting model by simplifying the forecasting problem and exploiting the effect of event features on taxi demand. (2) A deep learning model is combined with time series decomposition techniques to model the taxi demand time series more effectively. Specifically, seasonal-trend decomposition procedures based on loess (STL) [3] are employed to decompose the global time series into simpler components, and different forecasting methods are adopted for the different components, which reduces the complexity of the taxi demand time series and helps capture its variation patterns. (3) Considering that event occurrences are closely related to the unusual fluctuations in the remainder component, fusion of event features (in particular, text data) is proposed to help the deep learning model capture the unusual fluctuation patterns of the remainder component and solve its extreme value problem. (4) A genetic algorithm (GA) [12] is applied to determine the optimal weights for integrating the forecasting results of the three decomposed components, which further improves the forecasting accuracy. (5) The superiority of the proposed model is verified on two datasets. Experimental results indicate that the proposed model outperforms state-of-the-art baseline models in highly dynamic areas.
The remainder of this study is outlined as follows: Section 2 reviews the relevant literature. Section 3 provides a detailed description on the taxi demand forecasting model proposed in this study. Section 4 discusses and analyzes the experimental results. Conclusions and future research directions are presented in Section 5.
Related work
The time series decomposition technique and deep learning have attracted significant attention from researchers. This section provides a brief review of the time series decomposition technique and deep learning.
Decomposition of time series
A time series contains many constituents that affect each other. Modeling a global time series is not easy. It is important to decompose a time series into simpler parts that can be better understood to simplify the forecasting problem. For example, Pal and Kar [31] obtained promising results in forecasting the electricity market’s price by using fuzzy transformation to decompose the time series. Lin et al. [20] proposed to use empirical mode decomposition method to decompose time-series data of visitors, and obtained good results in forecasting tourist capacity. Qin et al. [34] used STL with two improvement strategies to forecast monthly passenger flow in China, and achieved the desired results.
Compared with fuzzy transformation and empirical mode decomposition, STL presents distinctive advantages. STL can generate robust components that exhibit strong adaptability to outliers. Furthermore, STL is based on numerical methods and does not require complex parameters, and is therefore easy to implement [39]. Many researchers have used STL to solve forecasting problems. For example, Nguyen and Novák [29] decomposed a time series into three components using two methods, STL and ARIMA, and measured their respective performance. Luo et al. [23] proposed to decompose the monthly streamflow time series with STL and combined SVM and a generalized regression neural network to model the seasonal and trend components. Liu et al. [21] proposed a method for forecasting dengue fever time series data based on STL decomposition. Gao et al. [6] adopted STL to remove the noise and seasonal fluctuation of Visible Infrared Imaging Radiometer Suite data. However, most STL-based studies do not fully utilize the hidden information in the remainder component, which affects the forecasting accuracy because the remainder component is part of the global time series and represents some significant patterns. In particular, the extreme value problem of the remainder component has not been well addressed [38].
Realizing that the occurrence of events is closely related to the unusual fluctuations in the remainder component, this study factors the remainder component into the forecasting process and explores the event influencing feature and event text feature to solve the extreme value problem.
Deep learning in taxi demand forecasting
Taxi demand forecasting has attracted significant attention recently owing to its inherent value. Over the last decade, deep learning has been proven successful in taxi demand forecasting. Wang et al. [44] proposed to use random forest and CNN to detect unauthorized ridesharing cars, which helps to curb unregulated activities. Kuang et al. [16] used attention-based LSTM and 3D CNN to forecast taxi demand by generating feature embeddings and capturing the correlation between taxi pick-up and drop-off. Wang et al. [43] proposed a deep neural network structure called Deep supply-demand, which considers environmental factors such as weather and traffic condition data. Vanichrujee et al. [41] integrated the forecasting results from LSTM, gated recurrent unit, and extreme gradient boosting (XGBoost) to obtain the final taxi demand with high accuracy. Liu et al. [22] used a backpropagation neural network (BPNN) and XGBoost to combine different information and explore the correlation between taxi demand and online taxi-hailing demand.
Although the studies described above demonstrate the power of deep learning for taxi demand forecasting, few have considered the effect of events on taxi demand. It is possible to improve the forecasting accuracy by combining additional information [33, 42]. Recently, Rodrigues et al. [35] used deep learning to combine text data regarding events with time series to forecast taxi demand, and found that the fusion of event information increases forecasting accuracy. However, the complexity of the global taxi demand time series was not well explored in their study, which may limit the benefit of text data fusion. Furthermore, they used fixed GloVe word vectors [32] trained on other text data to represent the current words, which may result in inconsistencies and conflicts.
To solve these limitations, the deep learning approach is applied to a decomposed time series in this study, and the word2vec [25] and embedding layer [17] are used to generate dynamic word vectors that are consistent with and suitable for the current text data, thereby minimizing the loss of hidden information in the text.
Methodology
A novel HDLM that decomposes a time series and fuses the event influencing feature and the event text feature is proposed in this study. Before the modeling framework is described, background knowledge regarding text data processing, STL, LSTM, and the multi-layer perceptron (MLP) is presented.
Background
Processing and conversion of text data
To fuse unstructured text data extracted from the web, including the description and title of events, word2vec is adopted to generate structured word vectors because of its powerful feature representation capability [25]. Text data conversion includes applying lowercase transformation, separating sentences into words, removing high-frequency and stop words, converting words to integer identifiers, and initializing an embedding layer with pre-trained vectors generated by word2vec. The embedding layer is fine-tuned during the training process.
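The conversion steps listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stop-word list, the rule for dropping high-frequency words (the `n_most_frequent_to_drop` most common tokens), the padding identifier 0, and all function names are hypothetical. Initializing the embedding layer from word2vec vectors is not shown.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not say which list was used.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "at", "to", "vs"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(token_lists, stop_words=STOP_WORDS, n_most_frequent_to_drop=0):
    """Map each kept word to an integer identifier (0 is reserved for padding)."""
    counts = Counter(
        tok for tokens in token_lists for tok in tokens if tok not in stop_words
    )
    # Assumed rule: treat the top-n most common words as "high frequency".
    dropped = {word for word, _ in counts.most_common(n_most_frequent_to_drop)}
    kept = sorted(word for word in counts if word not in dropped)
    return {word: idx + 1 for idx, word in enumerate(kept)}


def encode(tokens, vocab, max_len):
    """Convert tokens to identifiers, then truncate or zero-pad to max_len."""
    ids = [vocab[tok] for tok in tokens if tok in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


docs = ["Brooklyn Nets vs Houston Rockets", "The Nets at home"]
token_lists = [tokenize(doc) for doc in docs]
vocab = build_vocab(token_lists, n_most_frequent_to_drop=1)
encoded = [encode(tokens, vocab, max_len=5) for tokens in token_lists]
```

Each encoded row is the one-dimensional vector of integer identifiers that would be fed to the embedding layer.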
Seasonal-trend decomposition procedures based on loess
STL with its component patterns is significantly more flexible than traditional single parameter methods for time series decomposition. It includes inner and outer loops, with the former including steps for detrending, smoothing cycle-subseries, low-pass filtration, deseasonalizing, and trend smoothing, and the latter including steps for calculating the remainder and robustness weights [3]. Following these steps, the global time series is decomposed into seasonal, trend, and remainder components. The decomposition of STL can be defined as follows [3]:
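For reference, the additive form of the STL decomposition in [3] can be written with the component symbols used in this paper as shown below. The symbol Y_d for the observed taxi demand on day d is an assumption introduced here for illustration.

```latex
Y_d = S_d + T_d + R_d
```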
The seasonal-cycle method is adopted to forecast the seasonal component, and the formula is defined as follows:
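One common reading of a seasonal-cycle forecast is that the seasonal value for a future day equals the value observed one full period earlier. The sketch below implements that reading; the rule itself, the function name, and the example period are assumptions and may differ from the authors' formula.

```python
def seasonal_cycle_forecast(seasonal, period, horizon=1):
    """Forecast the seasonal component by repeating its last full cycle.

    seasonal: seasonal values up to and including day d.
    period:   cycle length in days (e.g. 7 for a weekly pattern, assumed).
    horizon:  number of future days to forecast, starting at day d + 1.
    """
    if period <= 0 or len(seasonal) < period:
        raise ValueError("need at least one full seasonal cycle")
    last_cycle = seasonal[-period:]
    # The forecast for day d + 1 + i is the value one period before it.
    return [last_cycle[i % period] for i in range(horizon)]


example = seasonal_cycle_forecast([1, 2, 3, 1, 2, 3, 1], period=3, horizon=4)
```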
LSTM has been widely applied to speech recognition, image captioning, and time series forecasting. LSTM incorporates memory units, and learns when to forget and when to update memories by means of an input gate, a forget gate, and an output gate. The processes of the three gates and the output of the cell are defined as follows [24]:
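For reference, the standard LSTM formulation is reproduced below. The notation is an assumption and may differ from that of [24]: x_t is the input, h_t the hidden state, c_t the cell state, W, U, and b are learned parameters, \sigma is the logistic sigmoid, and \odot is element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```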
MLP is a kind of artificial neural network composed of several fully connected layers [30]. It comprises three parts: an input layer, a hidden layer, and an output layer. The computation of an MLP can be defined as follows [8]:
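For reference, a generic MLP with a single hidden layer can be written as below. The notation is an assumption and may differ from that of [8]: x is the input vector, W^{(1)}, b^{(1)}, W^{(2)}, and b^{(2)} are learned parameters, and f and g are the hidden and output activation functions.

```latex
\begin{aligned}
h &= f\left(W^{(1)} x + b^{(1)}\right) \\
\hat{y} &= g\left(W^{(2)} h + b^{(2)}\right)
\end{aligned}
```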
Figure 1 shows an illustrative example in which event occurrences are closely related to the unusual fluctuations in the remainder component. As can be seen, extreme values in the remainder component usually coincide with event occurrences. For example, on February 4, 2014, a basketball game was held and the value of the remainder component is higher than those at the surrounding time points. Therefore, this study proposes to fuse event features to help the deep learning model capture the unusual fluctuation patterns in the remainder component.

Part of the remainder component with event occurrences.
The framework of the proposed HDLM is shown in Fig. 2 and consists of three modules: time series decomposition, extraction of event features, and modeling and forecasting. In the time series decomposition module, STL is employed to decompose the global time series into simpler components, namely the seasonal, trend, and remainder components. In the extraction of event features module, the event features, including the event influencing feature (EI) and the event text feature (ET), are extracted from event text data and are then used to help LSTM capture the unusual fluctuation patterns in the remainder component. In the modeling and forecasting module, different forecasting methods (seasonal-cycle, MLP, and LSTM) are adopted to model the corresponding components. Finally, GA is employed to determine the optimal weights for integrating the forecasting results of the three components to obtain the final taxi demand. The detailed process of the model is described below.

The process of global time series decomposition is shown in Part A of Fig. 2. The series is decomposed by STL into three components: the seasonal component (S_d), the trend component (T_d), and the remainder component (R_d). Each of these components has distinct characteristics compared with the global time series.

As shown in Part B of Fig. 2, to express the different effects of events occurring at different times, the event influencing feature is analyzed and represented with variables. The extraction process of the event text feature is also shown in Part B of Fig. 2. The event text feature only represents the text data regarding the event that occurs on the current day. It is a one-dimensional vector composed of integer identifiers. The event text vectors are input into a word embedding layer that is initialized by word2vec, and are then converted to the word vector matrix [25]. The word2vec in this study uses the continuous bag-of-words model, in which the generated word vector dimension is set to 400, the training window is set to 5, the number of iterations is set to 10, the sampling rate is set to 0.001, and words with a frequency of less than 2 are filtered out. The size of the word vector matrix is ML×400, where ML is the maximum length of a word sequence. All the parameters in word2vec are set using the trial-and-error method [7]. The word vector matrix is subsequently passed to five one-dimensional convolutional layers of a CNN with 30 filters of size 3 and corresponding max-pooling layers of sizes 3, 3, 2, 2, and 2. ReLU is used as the activation function of each convolutional layer because it converges easily and mitigates the vanishing gradient problem. All the hyperparameters in the CNN structure are determined using the grid search method, which can effectively select the optimal combination of hyperparameters. A dropout layer with a dropout rate of 0.5 is placed between the convolutional layers [37]. The output of the last layer is fused with the remainder component.

To forecast the trend and remainder components on day d + 1, the values T_d, T_{d-1}, ..., T_{d-h+1} and R_d, R_{d-1}, ..., R_{d-h+1} from the past h days are used. As shown in Part C of Fig. 2, the vectors {T_d, T_{d-1}, ..., T_{d-h+1}} are input into the MLP to obtain the partial result.
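As a rough consistency check on the event-text branch described above, the sketch below tracks the sequence length through the five convolution and max-pooling stages. It assumes unpadded ("valid") convolutions with stride 1 and non-overlapping pooling, neither of which is stated in the text, and the function name `cnn_output_length` is hypothetical.

```python
def cnn_output_length(max_len, kernel_size=3, pool_sizes=(3, 3, 2, 2, 2)):
    """Sequence length left after each conv (valid, stride 1) + max-pool stage."""
    length = max_len
    for pool in pool_sizes:
        length = length - kernel_size + 1  # unpadded convolution (assumed)
        if length <= 0:
            return 0
        length //= pool  # non-overlapping max pooling (assumed)
    return length


steps_left = cnn_output_length(350)   # time steps after the fifth pooling layer
flat_features = steps_left * 30       # 30 filters per convolutional layer
```

Under these assumptions, a maximum length ML of 350 leaves 3 time steps (90 features after flattening), and any ML below 206 would be reduced to nothing. This suggests that either ML is fairly large or padded convolutions are used.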

Framework of the proposed taxi demand forecasting model.
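Based on the description of the integration step, the final forecast can plausibly be written as a weighted sum of the three component forecasts. The hat notation and the symbol \hat{Y}_{d+1} for the final taxi demand forecast on day d + 1 are assumptions introduced here.

```latex
\hat{Y}_{d+1} = \alpha_1 \hat{S}_{d+1} + \alpha_2 \hat{T}_{d+1} + \alpha_3 \hat{R}_{d+1}
```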
where α1, α2, and α3 are the weights. Moreover, to obtain the
In this section, the number of past days h is set to 10. To ensure a fair model comparison and experimental effectiveness, some hyperparameter settings in the baseline deep learning models are the same as those of HDLM. Specifically, their batch size, number of epochs, learning rate, and objective function are set to 64, 500, 0.01, and mean squared error, respectively. STL is implemented in the R language. Data processing and model construction are implemented in Python version 3.7, and deep learning structures are implemented in Keras (https://keras.io/). The core configuration of the computer includes an Intel Core i7-8700 3.2 GHz CPU, 32 GB memory, and an NVIDIA GeForce RTX 2080 graphics card.
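With h = 10, each training sample pairs the previous ten daily values of a component with the value on the following day. A minimal sketch of this windowing is given below; the function name `make_windows` and the plain-list representation are illustrative assumptions.

```python
def make_windows(series, h=10):
    """Build (past h values, next value) pairs from a daily component series."""
    inputs, targets = [], []
    for end in range(h, len(series)):
        inputs.append(series[end - h:end])  # values for days d-h+1, ..., d
        targets.append(series[end])         # value for day d+1
    return inputs, targets


inputs, targets = make_windows(list(range(12)), h=10)
```

The same windowing applies to both the trend and the remainder components before they are passed to the MLP and the LSTM.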
Data description
This study mainly focuses on the taxi demand forecasting task in areas with frequent event occurrences. Therefore, two highly dynamic areas, i.e., Barclays Center and Terminal 5, are evaluated. Barclays Center is located in Brooklyn, USA, has 18,000 seats, and regularly hosts large musical shows and basketball games. Terminal 5 is a large venue located in Manhattan, USA, that regularly hosts many concerts. All the taxi pickups in these two areas were collected and used as the taxi demand data. For Barclays Center, the event data were extracted from its official website and include 751 events (e.g., Brooklyn nets vs Houston rockets). For Terminal 5, the event data were obtained from Facebook and include 315 events (e.g., cultural prophetic 20th anniversary concert). Each event contains features such as date, start time, title, provenance, and description. The complete experimental dataset, including taxi demand data and event text data, can be obtained from Rodrigues et al. [35].
Data preprocessing
The original taxi demand data were obtained every half hour. To forecast the daily taxi demand, half-hourly taxi demand data were aggregated to obtain the daily taxi demand data. Moreover, to deal with the different orders of magnitude for different features, the data normalization was performed to scale the original data into the range [0, 1] according to the following Equation:
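The aggregation and scaling steps can be sketched as follows. Min-max scaling is the standard transformation that maps data into [0, 1], namely x' = (x - x_min) / (x_max - x_min); the function names and the assumption of 48 half-hour slots per day are illustrative.

```python
def aggregate_daily(half_hourly, slots_per_day=48):
    """Sum half-hourly pickup counts into daily totals."""
    if len(half_hourly) % slots_per_day != 0:
        raise ValueError("series must cover whole days")
    return [
        sum(half_hourly[start:start + slots_per_day])
        for start in range(0, len(half_hourly), slots_per_day)
    ]


def min_max_scale(values, lo=None, hi=None):
    """Scale values to [0, 1]; pass lo/hi from the training set to reuse them."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(value - lo) / (hi - lo) for value in values]


daily = aggregate_daily([1] * 48 + [2] * 48)
scaled = min_max_scale([10, 20, 30])
```

Passing the training-set minimum and maximum as `lo` and `hi` when scaling validation and test data avoids leaking information from those sets, although values outside the training range then fall outside [0, 1].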
For a fair comparison with the work of Rodrigues et al. [35], the partition of experimental datasets was the same as that of Rodrigues et al. [35]. For the Barclays Center dataset, data from January 2013 to September 2014 were used as the training set, data from October 2014 to December 2014 as the validation set, and data from January 2015 to June 2016 as the test set. For the Terminal 5 dataset, data from January 2013 to December 2014 were used as the training set, data from January 2015 to December 2015 as the validation set, and data from January 2016 to June 2016 as the test set.
For event data, some simple text preprocessing methods were implemented: (1) removing some trivial features in the event data (i.e., date, start time, and provenance); (2) converting the uppercase letters in the title and description of event data to the lowercase letters; (3) separating sentences into words and deleting highly frequent and stop words.
To evaluate and compare the performances of the proposed taxi demand forecasting models, some typical forecasting errors are employed as indices of performance assessment, including mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2), absolute percentage error (APE) and mean absolute percentage error (MAPE). Their formulas are defined as follows:
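The standard definitions of these metrics can be sketched in Python as follows. Here APE and MAPE are returned as percentages and R2 as a fraction; whether the paper's tables report R2 as a percentage is an assumption based on the magnitudes quoted in the text.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def r2(actual, predicted):
    """Coefficient of determination."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def ape(actual, predicted):
    """Absolute percentage error of each observation (actual must be nonzero)."""
    return [100.0 * abs(a - p) / abs(a) for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error."""
    errors = ape(actual, predicted)
    return sum(errors) / len(errors)


y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
```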
To evaluate the contributions of the different functional modules, an incremental analysis of the proposed deep learning structure was adopted: starting with the simplest model, functional modules were added incrementally until the final forecasting model was obtained. The models obtained through incremental analysis include STL-SVR, STL-SVR-EI, STL-MLP, STL-MLP-EI, STL-MLP-EIET, STL-LSTM, STL-LSTM-EI, STL-LSTM-EIET, STL-MLP-LSTM-EIET, and STL-MLP-LSTM-EIET-GA (i.e., HDLM). In all models except STL-MLP-LSTM-EIET-GA, the weights α1, α2, and α3 are all set to 1. In STL-MLP-LSTM-EIET-GA, the weights α1, α2, and α3 are determined by the genetic algorithm. In addition, several state-of-the-art baseline models were compared in this experiment: traditional machine learning models (XGBoost and LASSO), an ensemble learning model (Ensemble model), and deep learning models (BPNN, DL-LSTM L+W+E+T, and DL-FC L+W+E+T). All models under comparison and their specific descriptions are listed in Table 1. STL denotes seasonal-trend decomposition procedures based on loess. XGBoost, SVR, LSTM, FC, and MLP denote extreme gradient boosting, support vector regression, long short term memory network, fully-connected dense layers, and multi-layer perceptron, respectively.
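The GA-based choice of α1, α2, and α3 can be illustrated with the small real-coded genetic algorithm below. The paper does not specify the GA operators, so the truncation selection, uniform crossover, Gaussian mutation, weight bounds, and mean-squared-error fitness used here are all assumptions, and the function names are hypothetical. The all-ones weights of the other models are seeded into the initial population and the best half survives unchanged, so the result is never worse than that baseline on the data used for fitting.

```python
import random


def weighted_mse(weights, seasonal, trend, remainder, actual):
    """Mean squared error of the weighted sum of the three component forecasts."""
    a1, a2, a3 = weights
    return sum(
        (a1 * s + a2 * t + a3 * r - y) ** 2
        for s, t, r, y in zip(seasonal, trend, remainder, actual)
    ) / len(actual)


def ga_weights(seasonal, trend, remainder, actual, pop_size=30, generations=50,
               bounds=(0.0, 2.0), mutation_scale=0.1, seed=0):
    """Search for (a1, a2, a3) with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds

    def fitness(weights):
        return weighted_mse(weights, seasonal, trend, remainder, actual)

    # Seed the population with the all-ones baseline plus random candidates.
    population = [(1.0, 1.0, 1.0)] + [
        tuple(rng.uniform(lo, hi) for _ in range(3)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation, clipped to bounds.
            child = tuple(
                min(hi, max(lo, rng.choice(pair) + rng.gauss(0.0, mutation_scale)))
                for pair in zip(parent_a, parent_b)
            )
            children.append(child)
        population = elite + children
    return min(population, key=fitness)


s_hat = [1.0, 2.0, 3.0, 4.0]
t_hat = [2.0, 1.0, 0.0, 1.0]
r_hat = [0.5, -0.5, 1.0, 0.0]
y_val = [0.9 * s + 1.1 * t + 1.3 * r for s, t, r in zip(s_hat, t_hat, r_hat)]
best = ga_weights(s_hat, t_hat, r_hat, y_val)
```

In practice the fitness would be evaluated on validation-set forecasts so that the weights are not tuned on the test data.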
The models and their descriptions in the experiment
Forecasting results in Barclays Center
In the experiment, the average results and standard deviations are obtained after 30 executions. After the global time series is decomposed into seasonal, trend, and remainder components, different forecasting methods are applied to each component.
Table 2 shows the average results of the trend and remainder components using different deep learning modules in the Barclays Center dataset. STL-SVR and STL-SVR-EI use SVR to forecast the trend and remainder components. STL-LSTM, STL-LSTM-EI, and STL-LSTM-EIET use LSTM to forecast the trend and remainder components. STL-MLP, STL-MLP-EI, and STL-MLP-EIET use MLP to forecast the trend and remainder components. For the trend component, models using MLP outperform those using SVR or LSTM on all metrics. For the remainder component, models using LSTM outperform those using SVR or MLP on all metrics. These results demonstrate that MLP is best at forecasting the trend component and LSTM is best at forecasting the remainder component. Therefore, the HDLM proposed in this study uses MLP and LSTM to forecast the trend and remainder components, respectively.
Average results and standard deviations among different methods in Barclays Center
Table 3 shows the forecasting results of all models in the Barclays Center dataset. Initially, the LSTM, STL-LSTM, MLP, and STL-MLP are compared to validate the STL. For example, the MAE, RMSE, MAPE, and R2 obtained by LSTM are 137.1, 182.6, 16.2, and 42.4, respectively. After the time series is decomposed via STL, the forecasting results are improved, with the MAE, RMSE, MAPE, and R2 of STL-LSTM obtained as 98.4, 140.2, 11.6, and 66.1, respectively. Figure 3 shows the comparison results between LSTM and STL-LSTM with truth and heatmaps. Figure 4 shows the comparison results between MLP and STL-MLP with truth and heatmaps. In these heatmaps, each orange square represents the APE value for a day, e.g., the orange square X represents the APE value for January 11th, 2015, and the orange square Y represents the APE value for January 17th, 2015. The vertical coordinate represents the day of the week. The darkness of the orange square in the heatmaps represents the error level of the forecasting results, i.e., the lighter orange square represents the smaller error. As shown in Figs. 3 and 4, STL-LSTM and STL-MLP can obtain better forecasting results and lower APEs than LSTM and MLP, respectively.
Average results and standard deviations among different models in Barclays Center
Significant values are boldfaced.

Comparison of results between LSTM and STL-LSTM with truth and heatmaps in Barclays Center.

Comparison of results between MLP and STL-MLP with truth and heatmaps in Barclays Center.
Additionally, Table 3 shows that the fusion of the event influencing feature (EI) reduces the forecasting error of all models effectively. For example, the MAE, RMSE, and MAPE obtained by STL-MLP are 96.5, 139.5, and 11.4, respectively. When the event influencing feature is fused into the models, the forecasting error is reduced, with the MAE, RMSE, and MAPE of STL-MLP-EI obtained as 85.9, 126.5, and 10.0, respectively.
However, it is insufficient to merely consider the effect of event occurrences. Without the event text description, the models cannot identify the degree of effect of various events on taxi demand. Table 3 demonstrates that the fusion of the event text feature (ET) can further improve the forecasting results of all models. For instance, the MAE, RMSE, and R2 obtained by STL-LSTM-EI are 83.6, 124.0, and 73.5, respectively. By fusing the event text descriptions, the MAE and RMSE of STL-LSTM-EIET are reduced to 82.3 and 121.9 respectively, and the R2 is improved to 74.4. Figure 5 shows the comparison results among STL-LSTM, STL-MLP, and STL-MLP-LSTM-EIET with truth and heatmaps. It is obvious that the forecasting results of STL-MLP-LSTM-EIET have the smallest gaps with the truth, and the orange square in the heatmap is lighter, indicating that using MLP and LSTM to forecast the trend and remainder components respectively can obtain better forecasting results. Further, by comparing the forecasting results of STL-MLP-LSTM-EIET and STL-MLP-LSTM-EIET-GA, it can be concluded that using GA to determine the optimal weights for integrating the forecasting results of three decomposed components can further improve the performance of the proposed model.

Comparison of results among STL-LSTM, STL-MLP and STL-MLP-LSTM-EIET with truth and heatmaps in Barclays Center.
Moreover, the superiority of HDLM is further verified by comparing it with several state-of-the-art models, including XGBoost, LASSO, Ensemble model, BPNN, DL-LSTM L+W+E+T, and DL-FC L+W+E+T. As shown in Table 3, the proposed model performs better than these models on all the metrics. For example, the RMSE of STL-MLP-LSTM-EIET is 94.2, 65.2, 69.2, 41.1, 19.6, and 12.4 lower than that of XGBoost, LASSO, Ensemble model, BPNN, DL-LSTM L+W+E+T, and DL-FC L+W+E+T, respectively.
Table 4 shows the MAE and MAPE of different models for both event days and no event days in the Barclays Center dataset. By comparing the forecasting results, it is clear that the proposed HDLM yields the best forecasting performance for both types of days among all models under comparison. For example, the MAE of DL-FC L+W+E+T and HDLM for event days are 108.4 and 93.3, respectively. The MAPE of BPNN and HDLM for no event days are 103.2 and 70.4, respectively.
Average results and standard deviations for both event and no event days in Barclays Center
Significant values are boldfaced.
For the Terminal 5 dataset, Table 5 shows the forecasting results of the trend and remainder components using different deep learning modules. Similar to the forecasting results from the Barclays Center area, the MLP can obtain better results in the trend component and LSTM can obtain better results in the remainder component for the Terminal 5 dataset. The forecasting results of all models are shown in Table 6, which demonstrates the same conclusion as that from the Barclays Center dataset regarding the contributions of STL, event influencing feature and event text feature.
Average results and standard deviations among different methods in Terminal 5
Average results and standard deviations among different models in Terminal 5
Significant values are boldfaced.
Figures 6, 7, and 8 show the comparison results more visually. The orange square Z represents the APE value for January 17th, 2016, and the orange square U represents the APE value for January 23rd, 2016. The forecasting errors are significantly reduced by applying STL and fusing the event influencing feature and event text feature, and the performance of the model is further improved by using MLP and LSTM to forecast the trend and remainder components, respectively. In addition, as shown in Table 6, the forecasting results of HDLM are superior to those of the other state-of-the-art models on most metrics.

Comparison of results between LSTM and STL-LSTM with truth and heatmaps in Terminal 5.

Comparison of results between MLP and STL-MLP with truth and heatmaps in Terminal 5.

Comparison of results among STL-LSTM, STL-MLP and STL-MLP-LSTM-EIET with truth and heatmaps in Terminal 5.
The MAE and MAPE for event days and no event days in the Terminal 5 dataset are shown in Table 7. Similar to the forecasting results from the Barclays Center area, the proposed HDLM yields the best forecasting performance for two types of days among all models under comparison.
Average results and standard deviations for both event and no event days in Terminal 5
Significant values are boldfaced.
Inspired by Demšar [4], the statistical test was adopted to further improve the reliability of the experiment. First, the Friedman test [5] was used to test the significance of all models. Then, if the null hypothesis is rejected, the Nemenyi test [28] would be used. The significance test results of six state-of-art models and the proposed HDLM are presented in Table 8. The steps of Friedman test include: (1) ranking seven selected models based on four evaluation metrics; (2) using model ranking to calculate the significance results of the Friedman test.
Significance test results of the Friedman test
Table 8 shows that the significance results of the Friedman test on all evaluation metrics are higher than the critical value (i.e., 4.28), so the null hypothesis (i.e., that all models perform the same) is rejected. The Nemenyi test was then used to further compare the performance of the seven models. The Nemenyi test results are shown graphically in Fig. 9, where the critical distance (CD) denotes the difference in mean ranking scores required for two models to differ significantly. The further to the right a model lies on the coordinate axis, the better its performance. As can be seen, the proposed HDLM achieves superior performance compared to the other models.
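The ranking-based procedure can be sketched as follows, using the formulas given by Demšar [4] for the Friedman chi-square statistic, its F-distributed (Iman-Davenport) variant, and the Nemenyi critical distance. Which variant the authors report is not stated, so both are computed. The function names are hypothetical, and the critical value q_alpha must be looked up in a table for the chosen number of models and significance level.

```python
import math


def average_ranks(scores, higher_is_better=False):
    """Mean rank of each model; scores[i][j] is the value of model j in case i."""
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for row in scores:
        order = sorted(range(n_models), key=lambda j: row[j], reverse=higher_is_better)
        start = 0
        while start < n_models:
            stop = start
            while stop + 1 < n_models and row[order[stop + 1]] == row[order[start]]:
                stop += 1
            rank = (start + stop) / 2 + 1  # tied models share the average rank
            for model in order[start:stop + 1]:
                totals[model] += rank
            start = stop + 1
    return [total / len(scores) for total in totals]


def friedman_statistics(mean_ranks, n_cases):
    """Return the Friedman chi-square statistic and its F-distributed variant."""
    k = len(mean_ranks)
    chi2 = 12 * n_cases / (k * (k + 1)) * (
        sum(rank ** 2 for rank in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    denominator = n_cases * (k - 1) - chi2
    f_stat = math.inf if denominator <= 0 else (n_cases - 1) * chi2 / denominator
    return chi2, f_stat


def nemenyi_cd(q_alpha, n_models, n_cases):
    """Critical distance between mean ranks for the Nemenyi test."""
    return q_alpha * math.sqrt(n_models * (n_models + 1) / (6 * n_cases))


errors = [[1.0, 2.0, 3.0], [1.0, 3.0, 2.0]]  # toy error values, lower is better
ranks = average_ranks(errors)
chi2, f_stat = friedman_statistics(ranks, n_cases=len(errors))
```

Two models are considered significantly different when their mean ranks differ by at least the critical distance.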

Nemenyi test results with graphical representation.
Taxi demand forecasting can provide effective guidance for the reasonable allocation of traffic resources and benefit both passengers and taxi drivers. To address the difficulty of accurately forecasting taxi demand in highly dynamic areas, a new hybrid deep learning model based on decomposition of time series and fusion of text data is proposed in this study. To verify the superiority of the proposed HDLM, the Barclays Center and Terminal 5 datasets are explored. The experimental results lead to the following conclusions: (a) the forecasting performance can be improved by decomposing the global time series into simpler components, which are subsequently modeled through a hybrid deep learning structure; (b) fusion of the event influencing feature and event text feature can better capture the patterns of the remainder component and solve its extreme value problem caused by unusual fluctuations; (c) using GA to determine the optimal weights for integrating the forecasting results of the three decomposed components can further improve the final forecasting accuracy; (d) compared with the baseline models, the proposed model demonstrates better forecasting accuracy for both event and no event days.
In future work, more unstructured data, such as images and voice, will be used for time series forecasting. Additionally, various deep learning structures can be considered to obtain better performance. Moreover, more effective data fusion techniques will be investigated to fully utilize unstructured data. Finally, other advanced heuristic algorithms will be investigated to determine the optimal weights more effectively.
Acknowledgments
The work has been supported by National Natural Science Foundation of China (No. 51975512, No. 51875503, No. 61972336), Zhejiang Key R & D Project of China (No.2021C03153), and Zhejiang Natural Science Foundation of China (No. LZ20E050001).
