Abstract
The tourism literature has shown that a combination of tourism forecasting models can provide better performance than individual models. In the literature, the models to be combined are usually subjectively selected and authors focus on the combination method. In this study, a new direction is represented by the integration of forecasts from several models using a two-stage forecasting system. In stage I, a subset of the best available models is objectively selected according to their performance. Then, in stage II, linear and nonlinear combination methods are utilized to integrate the forecast values of the optimal subset. The empirical study using tourist arrivals in Hong Kong from five major source markets indicates that the proposed two-stage tourism forecasting system with linear combination substantially improves forecasting performance compared with benchmarks. This study also finds that when combining forecasts from different models, the linear combination method is more suitable than nonlinear AI models.
Introduction
Tourism is a key sector in many advanced and emerging economies; before the COVID-19 pandemic, it was growing faster than the world economy (UNWTO, 2020). There were 1.5 billion international tourist arrivals in 2019, and the challenge of providing tourism services to satisfy this huge demand is regularly faced by tourism management departments and tourism enterprise managers (Chandra and Menezes, 2001). Accurate tourism demand forecasting is essential for the tourism industry to develop strategies for allocating limited resources (Lin et al., 2015; Liu et al., 2021a; Ma et al., 2023). In particular, in the post-COVID-19 period, the accurate forecasting of tourism demand can help the tourism industry respond to market uncertainty and improve its resilience, allowing policymakers to map out the road to recovery (Athanasopoulos, Hyndman, Kourentzes & O’Hara-Wild, 2023; Yang et al., 2022). Over the last two decades, many quantitative methods have been applied to tourism demand forecasting, such as time-series analysis, econometric models, and artificial intelligence (AI) algorithms (Kulshrestha et al., 2020; Li et al., 2020a, 2020b, 2020c; Song et al., 2019; Xie et al., 2020; Zhang et al., 2020; Zhao et al., 2022). However, each method has its own advantages and disadvantages. There is no single prediction method that can produce the most accurate prediction in all cases (Li et al., 2019b; Wu et al., 2017), so combining several models might be the most appropriate choice when there is uncertainty about which model is likely to produce a more accurate prediction (Song et al., 2009; Wu et al., 2024).
A model that combines two (or more) statistical AI techniques or other methods is called a combination model (Fajardo-Toro et al., 2019); the fundamental goal of a combination model is to resolve the disadvantages of individual models and achieve a synergistic impact on the prediction. Combinatorial predictive thinking was first proposed by Bates and Granger (1969). Since then, many empirical studies have demonstrated that combining the advantages of different models can lead to more accurate and robust predictions than using individual models (Bas et al., 2015; Chen, 2011; Clemen, 1989; Song et al., 2009; Yolcu and Lam, 2017; Yu and Huarng, 2008). Fritz et al. (1984) were the first to use a combination model in the field of tourism economic forecasting. They obtained a combined model forecast through two simple weighting methods and showed that this forecast could provide improved predictions of Florida airline users. Wong et al. (2007) used three linear combination techniques to combine four models and found that the combined model’s prediction clearly outperformed the worst single model’s prediction, avoiding the risk of complete prediction failure. In the study by Cang (2011), nonlinear methods were used to combine nine models and were also compared with linear combination methods in terms of forecasting performance. It should be noted that the researchers subjectively selected the models for combination, and they focused on the combination process, carried out by taking all available models’ forecasts as inputs; however, the number of models used was limited. In contrast, Cang and Yu (2014) proposed an optimal subset selection algorithm applied to individual models using information theory. They used nine models as the model pool and determined the optimal subset by deleting the models repeatedly until there was a nonsignificant improvement in forecast error. They concluded that the optimal model subset should contain two to five models. 
However, their research encountered the same problem as previous studies, namely the alternative model pool was limited. This can lead to biases in selecting the optimal subset and the optimal number of models. This research phenomenon raises some important questions. First, can we integrate model selection and model combination into a two-stage system by enlarging the alternative model pool? Second, does an enlarged model pool improve forecasting performance? Finally, when combining model forecasts, which type of method can produce better results—linear or nonlinear combination methods?
This study proposes a new two-stage forecasting method based on model selection and combination. In the first stage, focusing on the traditional time-series model, a large number of seasonal autoregressive integrated moving average (SARIMA) models are taken as the model pool, from which several well-performing forecasting models are selected to construct the optimal model subset. The second stage generates linear and nonlinear combined predictions from the optimal subset. An empirical study forecasting monthly tourism demand in Hong Kong from five source countries/regions is carried out to verify the forecasting effectiveness of the new two-stage combination model. This study contributes to the methodological development of tourism demand forecasting by proposing a two-stage method that includes best-performing model pool selection and combinatorial prediction generation. The empirical results indicate that the proposed method produces competitive forecasting results. Another contribution of this study lies in the field of forecast combination: it integrates a large number of models to avoid deviations and selects a subset of models according to their forecasting performance.
Literature review
Tourism demand forecasting based on a single model
Single models for tourism demand forecasting can be divided into time-series models, econometric models, AI models, and judgment models (Song et al., 2019). Typical time-series models such as the NAÏVE model, autoregressive (AR) model, single exponential smoothing (ETS) model, moving average (MA) model, autoregressive integrated moving average (ARIMA) model, and SARIMA model have been widely used in tourism demand forecasting research. These models focus on the seasonality, periodicity, and trends of historical data to form predictions, and they are often used as benchmark models for forecast evaluation (Hu et al., 2021a; Pan et al., 2012; Wu et al., 2022). Chen et al. (2019) proposed a multiseries structural time-series method with a data restacking technique and identified it as an effective approach to seasonal tourism demand forecasting. In their study of key papers published between 1968 and 2018, Song et al. (2019) pointed out that the ARIMA and SARIMA models exhibit better predictive performance than other models. Econometric models, such as autoregressive distributed lag models (ADLM), error correction models (ECM) (Song and Lin, 2010), vector autoregressive (VAR) models (Song and Witt, 2006; Wu et al., 2021a), time-series models with explanatory variables (ARX/ARMAX/ARIMAX/SARMAX, among others) (Gunter and Önder, 2016; Pan et al., 2012; Yang et al., 2015), and mixed data sampling (MIDAS) (Bangwayo-Skeete and Skeete, 2015; Wu et al., 2023), have been favored by scholars because they are grounded in economic theory, modeling tourism demand together with its explanatory variables. ADLM and ECM are the most widely used econometric models, and numerous studies have confirmed their superior performance in tourism demand forecasting (Song et al., 2019). Due to their flexible form, both ADLM and simplified ADLM can be used with other tools, such as time-varying parameters (TVP) (Li et al., 2006) and MIDAS (Hirashima et al., 2017).
AI algorithms have the advantage of capturing nonlinear relationships among data structures and can accurately reflect tourism demand (Bi et al., 2022; Silva et al., 2019; Sun et al., 2019; Wu et al., 2024; Zheng et al., 2021). Artificial neural networks (ANNs) are the most commonly used AI models and have been shown to be widely applicable and flexible in handling almost all types of nonlinear relationships (Song et al., 2019).
However, due to the high complexity of tourism demand data, with a variety of trends, seasonality, and irregularities, it is impossible to extract all of the features of the data at the same time and with a single method. Time-series models and econometric models are often biased by irregularities in the sequences, which make it impossible to accurately and effectively determine the lag period (Law et al., 2019). Although models based on AI algorithms can theoretically achieve accurate modeling, they are prone to overfitting (Guyon and Elisseeff, 2003) and encounter local minimization problems in practice (Zhang et al., 2021), and there is a “black box” dispute between the input layer and the output layer of the algorithm (Zhang et al., 1998).
Tourism demand forecasting using combinatorial methods
The different models described above have their own advantages and disadvantages. A simple solution is to integrate several individual models to combine their advantages and resolve their limitations by applying combinatorial methods (Hadavandi et al., 2011). According to the literature on tourism demand forecasting, the application of combinatorial methods can be divided into two categories: one involves the combination of forecast results from multiple models, and the other involves hybridizing a model with other models or technologies.
Combination model obtained by integrating single models’ forecast results
Fritz et al. (1984) presented the first study using a combination model in the field of tourism demand forecasting. They selected a traditional econometric model and the ARIMA model and combined their forecasts through two simple weighting methods. The results showed that the combined forecasting model could better predict Florida airline users. Wong et al. (2007) proposed a test to determine the need for combination methods, called “to combine or not to combine in tourism forecasting.” They used three combination techniques, namely simple combination, variance–covariance weighting, and discounted mean square error weighting, to combine four models, namely ARIMA, ADL, ECM, and VAR, suggesting that a combination forecasting model can considerably reduce the risk of forecasting failure. Song et al. (2009) also generated a similar result whereby combinatorial methods provided more accurate predictions than a single model, and they further found that the prediction accuracy of the combinatorial methods did not improve with an increase in the number of models. Their findings support those of Winkler and Clemen (1992) to some extent. In their work, Winkler and Clemen (1992) noted that the strong correlations between the forecast errors generated by the different models may have been the reason for their poor forecasting performance. Recently, combined forecasts were presented by a European team in a tourism forecasting competition that took place during the COVID-19 pandemic, which used a simple average approach to combined forecasting (Liu et al., 2021b). While the aforementioned studies used a linear combination of forecast models, Cang (2011) utilized a nonlinear combination model; they nonlinearly combined several models and showed that a nonlinear combination model can provide better performance in predicting arrivals than benchmarks.
Tourism demand forecasting based on models hybridized with other models or technologies
Qiu et al. (2021) found that a stacking method based on a number of single models can produce more accurate forecasts than 11 single benchmark models in the context of an unexpected crisis. To combine the advantages of linear and nonlinear models, Wu, Ji, He, and Tso (2021b) constructed a hybrid model based on the SARIMA model and the long short-term memory (LSTM) neural network model. They then embedded the LSTM into a convolutional neural network (CNN) model, forming a new SARIMA–CNN–LSTM hybrid prediction framework (He et al., 2021). In this hybrid framework, the linear model can capture the characteristics, trends, and development laws of variables in a sequence. Models based on AI algorithms can be used to explore the irregular components of sequences. The combination of two types of models integrates the advantages of learning, memory, and induction associated with machine learning algorithms, while retaining the advantages of linear models, thus achieving more accurate tourism demand forecasting (Zhang, 2003). Among hybrid models, Fourier series and deep neural networks (Shabri et al., 2020), time-varying jackknife model averaging (TVJMA) (Sun et al., 2023b), empirical mode decomposition (EMD) (Cao et al., 2016; Wang, 2015), ensemble empirical mode decomposition (EEMD) (Li and Law, 2020; Zhang et al., 2017a, 2017b), wavelet decomposition (Cao et al., 2016), and other AI techniques have been widely used in tourism demand forecasting.
In addition to the model combinations described above, some scholars have used optimization algorithms to identify a model’s optimal parameters and to model and forecast tourism demand with them. This method results in improvements in the model’s forecasting accuracy. The most commonly used algorithms include genetic algorithms (Chen et al., 2015; Hong et al., 2011), fruit fly optimization algorithms (Li et al., 2019a), and particle swarm optimization algorithms (Li et al., 2020; Xu and Wang, 2022). Empirical studies have demonstrated that models with optimal parameters can achieve better performance.
Rationale for this study
Most studies on tourism demand forecasting have used combinatorial forecasts (Cang, 2011; Nor et al., 2018; Song et al., 2009; Wong et al., 2007). Song and Li (2021) indicated that combination forecasting, which was proposed by a European team in a tourism forecasting competition during the COVID-19 pandemic, can integrate the merits of different methods. In general, studies subjectively select the models for combination and focus on the combination method. The manually selected models used in combination are subjective and limited; manual selection may result in missing the best-performing model and including a poorly performing model. There is room for improvement in the combination method via the optimal selection of models from a model pool. Only Cang and Yu (2014) utilized a selection process to generate candidate combination models via information theory, but their model pool was limited. A limited pool also leads to model selection bias. It is therefore worthwhile to explore whether objectively selecting a combination model and including a large number of models lead to a more accurate result.
Regarding the use of a two-stage model in forecasting, much of the non-tourism literature has shown that two-stage forecasts improve forecasting accuracy (Du et al., 2022; Hao and Tian, 2019). In tourism research, when considering different hybrid models, Guo and Shang (2021) showed that a two-stage prediction model combining the traditional fruit fly optimization algorithm with an echo state network offered a more robust prediction effect, faster convergence speed, and higher prediction accuracy. From a feature selection perspective, Sun, Hu, Wang, and Zhang (2023a) focused on determining the most accurate keyword combination for prediction and proposed a two-stage feature selection-based methodology. For the model combination method, it is reasonable to utilize a two-stage model including a combination model selection process and then combine the best models. It is useful to address the research gap regarding whether a two-stage combination forecasting model based on model selection and combination offers better combination forecasting methods and procedures.
Methodology
An integrated framework (see Figure 1) is proposed that employs a two-stage model to forecast tourist arrivals in Hong Kong and verify its performance compared with that of a single model. This framework includes four steps. The first is to collect tourist arrival data. The second is to specify the models. In our case, the two-stage model generates results based on model selection and combination. The first stage of model selection involves selecting the optimal model subset; the second stage aims to combine the models in the subset via linear and nonlinear combination methods. The linear combination methods include the simple combination method, the variance–covariance method, and the discounted mean square forecast error (MSFE) method. The nonlinear combination methods include neural networks (NN), random forest (RF), and support vector machine (SVM); the benchmark models include the SARIMA, the SNAÏVE, the best single model in the SARIMA family, and the average error of the selected models. The third step in the framework is to construct the models and generate forecasts, and the fourth is to evaluate their forecasting accuracy ex post.
Figure 1. Research framework.
Data collection
In this study, we select Hong Kong as the tourism destination and consider five major source regions. Monthly tourist arrivals from the different source regions are obtained from the Hong Kong Tourism Board (https://www.discoverhongkong.com/). Monthly tourist arrivals for the source regions from January 2005 to May 2019 are shown in Figure 2.
Figure 2. Monthly tourist arrivals in Hong Kong from five source countries/regions.
Model specification
Seasonal autoregressive integrated moving average model
The seasonal ARIMA $(p, d, q)(P, D, Q)_m$ process is given by
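In standard backshift notation, a SARIMA process of this order is commonly written as shown below. The symbols $B$, $\phi_p$, $\Phi_P$, $\theta_q$, $\Theta_Q$, and $\varepsilon_t$ are conventional notation assumed here, and the form is the textbook one without a constant term, which may differ in detail from the authors' own parameterization.

```latex
\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t
  = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t ,
\qquad
\begin{aligned}
\phi_p(B)   &= 1 - \phi_1 B - \dots - \phi_p B^p, &
\Phi_P(B^m) &= 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm},\\
\theta_q(B) &= 1 + \theta_1 B + \dots + \theta_q B^q, &
\Theta_Q(B^m) &= 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm}.
\end{aligned}
```

Here $y_t$ denotes tourist arrivals in month $t$, $B$ is the backshift operator ($B y_t = y_{t-1}$), $m$ is the seasonal period ($m = 12$ for monthly data), and $\varepsilon_t$ is a white-noise error term.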
Seasonal NAÏVE model
The seasonal NAÏVE (SNAÏVE) model is expressed as follows:
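A common way to write the seasonal naive forecast is given below; the notation ($\hat{y}_{t+h|t}$, $h$, $m$, $k$) is assumed here for illustration and is not taken from the source.

```latex
\hat{y}_{t+h \mid t} = y_{\,t+h-m(k+1)},
\qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor
```

With monthly data ($m = 12$) and a horizon of $h \le 12$, this reduces to $\hat{y}_{t+h|t} = y_{t+h-12}$: each forecast equals the observed value for the same month one year earlier.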
Two-stage tourism demand forecasting system
The main phases of the two-stage model are model selection and model combination, which involve the following: (I) selecting the optimal subset and (II) taking the forecasts of the optimal subset as inputs and generating the combination results.
Stage I selects the optimal SARIMA models as the input models. The SARIMA family has been commonly used in tourism demand forecasting (Hu and Song, 2020; Li et al., 2020; Pan and Yang, 2017; Zhang et al., 2020). When different parameters (p, d, q, P, D, and Q) are taken, different models are obtained. In particular, a range of values for each of the method parameters (p, d, q, P, D, and Q) is used to construct the candidate SARIMA models. The ranges are listed in parentheses as follows: p: (0, 3), d: (0, 1), q: (0, 3), P: (0, 3), D: (0, 1), and Q: (0, 3). Based on the notion of optimization, after removing the models that cannot be fitted, we sort the 100 remaining models from lowest to highest based on their in-sample root mean square error (RMSE). When constructing the optimal model subset, we start with the best model and increase the number of models in the set sequentially until we achieve a minimized RMSE on the validation set.
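The stage I procedure can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it assumes that each range such as (0, 3) means the integers 0 to 3 inclusive, that the candidate subset is scored on the validation set by the RMSE of its equal-weight average forecast, and that every subset size is scanned before the minimum is taken. The function names and the input dictionaries are hypothetical, and model fitting itself is left out.

```python
from itertools import product
from math import sqrt


def candidate_orders(max_order=3, max_diff=1):
    """Enumerate (p, d, q, P, D, Q) tuples, assuming inclusive integer ranges."""
    ar_ma = range(max_order + 1)   # p, q, P, Q in 0..3
    diff = range(max_diff + 1)     # d, D in 0..1
    return list(product(ar_ma, diff, ar_ma, ar_ma, diff, ar_ma))


def rmse(actual, forecast):
    """Root mean square error between two equal-length sequences."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def select_optimal_subset(in_sample_rmse, val_forecasts, val_actual):
    """Rank models by in-sample RMSE, then grow the subset one model at a time.

    in_sample_rmse: dict mapping model name -> in-sample RMSE
    val_forecasts:  dict mapping model name -> forecasts on the validation set
    Returns the subset (best-ranked models first) with the lowest validation
    RMSE of the equal-weight average forecast, together with that RMSE.
    """
    ranked = sorted(in_sample_rmse, key=in_sample_rmse.get)
    running_sum = [0.0] * len(val_actual)
    best_size, best_error = 0, float("inf")
    for size, name in enumerate(ranked, start=1):
        running_sum = [s + f for s, f in zip(running_sum, val_forecasts[name])]
        error = rmse(val_actual, [s / size for s in running_sum])
        if error < best_error:
            best_size, best_error = size, error
    return ranked[:best_size], best_error
```

Under the inclusive-range assumption the grid contains 4 × 2 × 4 × 4 × 2 × 4 = 1,024 candidate orders before unfittable models are removed.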
Stage II generates inputs for the combination model based on the optimal subset selected in stage I. These models’ forecasts are taken as inputs for the combination model. Following Wong et al. (2007) and Song et al. (2009), we use the simple combination method, the variance–covariance method, and the discounted MSFE method as the linear combination methods. The nonlinear combination process is executed using NN (Jiao and Chen, 2019). We adopt a three-layer simplified NN structure containing one hidden layer, and we use the greedy search method to search exhaustively for the optimal NN parameters that produce the lowest level of in-sample forecast error, which we measure using the mean absolute percentage error (MAPE). Moreover, we construct nonlinear combination models based on RF and SVM to compare their forecasting performance with that of the linear combination models. RF is an efficient and popular algorithm that is based on model aggregation (Peng et al., 2021); SVM tends to be combined with other forecasting methods (Abellana et al., 2020). Both have been used in tourism demand forecasting.
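The three linear combination schemes can be illustrated with the sketch below. It is a simplified reading rather than the exact formulation used in the study: the variance–covariance weights assume uncorrelated forecast errors, so they reduce to normalized inverse sums of squared errors, and the discount factor `beta` in the discounted MSFE weights is a hypothetical default. All function names are illustrative.

```python
def simple_average_weights(errors):
    """Equal weights; `errors` is a list of rows, one past period per row,
    with one forecast error per model in each row."""
    n_models = len(errors[0])
    return [1.0 / n_models] * n_models


def _inverse_normalise(losses):
    """Turn per-model losses into weights proportional to 1 / loss.
    A model with zero loss would need special handling (division by zero)."""
    inverse = [1.0 / loss for loss in losses]
    total = sum(inverse)
    return [value / total for value in inverse]


def variance_covariance_weights(errors):
    """Inverse sum-of-squared-errors weights (error correlations ignored)."""
    n_models = len(errors[0])
    sse = [sum(row[i] ** 2 for row in errors) for i in range(n_models)]
    return _inverse_normalise(sse)


def discounted_msfe_weights(errors, beta=0.9):
    """Inverse discounted squared-error weights; with beta < 1 the most
    recent period (last row) receives the largest discount factor, beta**0."""
    n_periods, n_models = len(errors), len(errors[0])
    losses = [
        sum(beta ** (n_periods - 1 - t) * errors[t][i] ** 2 for t in range(n_periods))
        for i in range(n_models)
    ]
    return _inverse_normalise(losses)


def combine(forecasts, weights):
    """Weighted sum of the individual models' forecasts for one period."""
    return sum(w * f for w, f in zip(weights, forecasts))
```

For example, if one model's past errors are consistently half the size of another's, both error-based schemes give it a weight of 0.8 against 0.2, whereas the simple combination keeps both at 0.5.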
The structure of the two-stage model based on model selection and combination is shown in Figure 3.
Figure 3. The structure of the two-stage forecasting model.
Estimation and forecasting
The two-stage forecasting model and the benchmark models are implemented and used to generate forecasts for five countries/regions. For the benchmark models, monthly tourist arrivals between January 2005 and May 2017 are initially taken as the estimation dataset. In addition, 24-month one-step-ahead forecasts between June 2017 and May 2019 are generated recursively with an expanding window. The two-stage forecasting model divides the data into three datasets: a training set, a validation set, and a test set. The initial dataset division is shown in Figure 4. The training dataset is used to generate the input for the linear combination models and AI models using the SARIMA models’ forecast results. The initial estimation set for the SARIMA models covers the period from January 2005 to May 2009 and is then extended. The specific validation set comprises data for the 12 months closest to the forecast point, which are used to select the optimal model subset in stage I. At the same time, this dataset is used to determine the parameters of the two-stage model using nonlinear combination, and the test set is used to evaluate the forecasting performance of the two-stage model using linear combination or nonlinear combination.
Figure 4. Division of the dataset for the two-stage combination model.
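The expanding-window scheme for the benchmark models can be sketched as index arithmetic on the monthly series. This is only an illustration of the one-step-ahead case; the helper names are hypothetical, and index 0 is assumed to correspond to January 2005.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of monthly observations from the start month to the end month."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


def expanding_window_splits(n_obs, n_initial):
    """Yield (training slice, target index) pairs for one-step-ahead forecasts.

    The model is re-estimated on observations [0, target) and then forecasts
    the observation at `target`; the window grows by one month each step.
    """
    for target in range(n_initial, n_obs):
        yield slice(0, target), target


n_obs = months_inclusive(2005, 1, 2019, 5)       # full sample
n_initial = months_inclusive(2005, 1, 2017, 5)   # initial estimation set
splits = list(expanding_window_splits(n_obs, n_initial))
```

With these dates the sample holds 173 monthly observations, the initial estimation set holds 149, and the loop produces the 24 forecast origins covering June 2017 to May 2019.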
Forecasting accuracy evaluation
The forecasting performance of the two-stage model and the benchmark models is evaluated based on MAE, MAPE, RMSE, and RMSPE. These metrics have been commonly used in tourism demand forecasting (Önder et al., 2020; Wen et al., 2021). For these measures, a smaller value indicates better performance, and they are calculated as follows:
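In their standard forms, the four measures can be written as below, where $y_t$ is the actual number of arrivals, $\hat{y}_t$ is the forecast, and $n$ is the number of out-of-sample forecasts. This notation is assumed here for illustration, and the percentage scaling follows the usual convention.

```latex
\begin{aligned}
\mathrm{MAE}   &= \frac{1}{n}\sum_{t=1}^{n}\left|y_t-\hat{y}_t\right|, &
\mathrm{MAPE}  &= \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t-\hat{y}_t}{y_t}\right|,\\[4pt]
\mathrm{RMSE}  &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^{2}}, &
\mathrm{RMSPE} &= 100\%\times\sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\frac{y_t-\hat{y}_t}{y_t}\right)^{2}}.
\end{aligned}
```

MAE and RMSE are expressed in the units of the series (arrivals), whereas MAPE and RMSPE are scale-free, which makes them easier to compare across source markets of very different sizes.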
At the same time, the relative improvement percentage is calculated to compare two models (Hu et al., 2021b). Taking the MAE as an example, when Model 1 is the two-stage combination model and Model 2 is the benchmark model, the improvement percentage of Model 1 compared with Model 2 is calculated as follows:
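One common definition of the relative improvement is shown below. The sign convention is an assumption chosen so that a positive value means Model 1 has the smaller error, which is consistent with positive improvements being highlighted in the results tables.

```latex
\mathrm{Improvement}_{\mathrm{MAE}}
  = \frac{\mathrm{MAE}_{2}-\mathrm{MAE}_{1}}{\mathrm{MAE}_{2}}\times 100\%
```

As a purely hypothetical illustration, a benchmark MAE of 200 and a two-stage MAE of 150 would give an improvement of $(200-150)/200 \times 100\% = 25\%$.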
Results
Table 1. One-step-ahead out-of-sample forecasting accuracy measures of the two-stage model using linear combination.
Note: For simplicity, China refers to Mainland China and Korea to South Korea. X1 is the best single model in the SARIMA family. Average is the average error of the selected models. Bold numbers indicate a positive relative improvement. The same applies below.
Table 2. Three-step-ahead out-of-sample forecasting accuracy measures of the two-stage model using linear combination.
Table 3. Six-step-ahead out-of-sample forecasting accuracy measures of the two-stage model using linear combination.
Table 4. 12-step-ahead out-of-sample forecasting accuracy measures of the two-stage model using linear combination.
The results in Table 1 demonstrate that, compared with the benchmark models, the two-stage model with linear combination has higher forecasting accuracy for all source regions in one-step-ahead forecasting. In particular, when comparing the forecasting performance of the two-stage model with linear combination with the SARIMA and the SNAÏVE, the former demonstrates higher forecasting accuracy in all source regions for all measurements, except Japan, as reflected in the RMSPE. These consistent results verify the positive role of the two-stage model in tourism demand forecasting. Moreover, the similar forecasting performance of the different two-stage models with linear combination indicates that the choice of linear combination method does not significantly affect model performance. The superior forecasting ability of the two-stage model with linear combination can be attributed to the optimal selection process of the two-stage system. Furthermore, the comprehensive results of the two-stage model compared with the best single model in the SARIMA family and the average error of the selected models show that the proposed two-stage model with linear combination outperforms the other models in terms of average error, including the best-performing model. These results further confirm the advantages of the combination method. Selecting models before combining forecasts from single models results in more accurate predictions of tourism demand. The comparison results of the three-, six-, and 12-step-ahead forecasts in Tables 2–4 also reveal similar results, whereby the proposed two-stage model with linear combination outperforms the benchmark models in most source regions and for different measurements. It is also noted that in three-, six-, and 12-step-ahead forecasts, the proposed method occasionally fails to beat other models in short-haul tourism markets. This may be due to the high volatility of tourism demand in these markets, where tourists often make their decisions quickly. In other words, the proposed two-stage model with linear combination may be more effective in the contexts of short-term forecasting and low-volatility long-haul tourism markets.
One-step-ahead forecasting accuracy comparisons across source markets.
Three-step-ahead forecasting accuracy comparisons across source markets.
Six-step-ahead forecasting accuracy comparisons across source markets.
12-step-ahead forecasting accuracy comparisons across source markets.
Diebold-Mariano (DM) test between the two-stage model with linear combination and benchmark models.
Note: *, **, and *** indicate significance at the 10%, 5%, and 1% levels, respectively. A significant negative value indicates that the forecasting performance of the two-stage model with linear combination is significantly better than that of the other model. The same applies below.
Diebold-Mariano (DM) test between the two-stage model with linear combination and two-stage model with nonlinear combination.
Conclusion
By integrating the forecasts of several optimal models into a tourism demand forecasting system, resulting in two-stage forecasting, this study forecasts tourist arrivals in Hong Kong from five major source countries/regions. This empirical study demonstrates that the two-stage forecasting model based on model selection and linear combination can significantly improve forecasting performance. The similar forecasting performance of the two-stage model with different linear combination methods indicates that differences in these methods do not significantly affect the performance of the model. The superior forecasting ability of the two-stage model with linear combination is attributed to the optimal selection process. The results confirm that the integration of different models is an effective means of improving their predictive performance (Hibon and Evgeniou, 2005; Khashei and Bijari, 2011), and the optimal subset obtained from individual models shows good performance and robustness in general (Cang and Yu, 2014). In addition, the two-stage model using linearly combined forecasts outperforms the benchmark model in one-step-ahead forecasting, and its result is also superior for long-term forecasts (three-, six-, and 12-step-ahead), suggesting that a two-stage model may be beneficial for both short- and long-term tourism forecasting. The results also show that AI models do not exhibit superior forecasting ability in the combination stage, and linear combinations may be more suitable when combining model outcomes.
A key contribution of this study is to benefit model development in tourism demand forecasting by proposing a new and effective two-stage method that incorporates optimal single model pool selection and a linear combination process. Second, this study contributes to the forecast combination field by integrating a large number of models to avoid deviations and selecting a subset of models according to their forecasting performance. Third, this study provides empirical evidence that the linear combination method is more suitable than nonlinear AI models when combining forecasts from different models.
The implications of this work lie in two aspects. First, the above results provide a strong indication that tourism practitioners should use a two-stage model based on model selection and linear combination to improve forecasting accuracy. In the case that policymakers are primarily concerned with the severity of tourism demand forecast errors or demand forecasts are used to assess the feasibility of short-term investment, this study has important managerial implications for destination managers. Second, this study compares the forecasting performance of two-stage models using linear combination methods and those using nonlinear combination methods in terms of forecasting accuracy and finds that the use of linear combination methods after model selection leads to the best performance. This suggests that researchers or practitioners can benefit from the use of linear combination methods after model selection to combine model forecasts.
This study has several limitations. First, we do not use data collected during the COVID-19 pandemic. The main reason for this is that international tourism was halted after the COVID-19 outbreak. Future research could test the performance of the proposed two-stage model after the international tourism field has recovered. Second, we only compare three AI-integrated methods for the models selected; thus, in future research, additional nonlinear combination methods (e.g., deep learning) should be applied to verify that there is no nonlinear relationship between the model prediction results. Third, we only combine univariate time-series models in the two-stage forecasting system, but economic variables and big data variables could be incorporated in the future to improve forecasting performance.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Guangxi Key Research and Development Plan under Grant [Guike-AB20297040], the Guangxi Natural Science Foundation under Grant [2022JJA180046], the National Social Science Fund of China under Grant (21BJY193), the Guangxi Development Strategy Research Institute, the Guangdong Basic and Applied Basic Research Foundation (2020B1515020031), and the National Natural Science Foundation of China (72374226).
