Abstract
Due to the limitations of existing tourism demand forecasting models, data with frequencies lower than that of tourism demand must be processed in advance and cannot be used directly in a model, which reduces the timeliness and accuracy of tourism demand forecasts. Taking inbound tourism to the United States prior to and during the COVID-19 pandemic as an example, this study systematically examines the impact of data frequency processing on tourism demand modeling and forecasting through the construction and comparison of three categories of models, with a particular focus on the first multiple mixed-frequency specification of the reverse mixed-data sampling (RMIDAS) model. The results confirm that multiple mixed-frequency models, which can directly use data at their various original frequencies, improve the accuracy of tourism demand forecasting. This study also provides important guidance for future research on forecasting high-frequency tourism variables.
Introduction
The importance of timely and accurate tourism demand forecasting has been illustrated by many researchers (Athanasopoulos et al., 2018; Chatziantoniou et al., 2016; Song & Hyndman, 2011; Uysal & Crompton, 1985; Witt & Witt, 1995). A considerable number of studies on this topic have been published during the past few decades, reflecting an increasing interest in tourism demand forecasting (Bangwayo-Skeete & Skeete, 2015; G. Li et al., 2019; Song & Li, 2008; Song et al., 2019). These studies have strived to improve forecasting performance by utilizing new variables and developing novel methods because improvements in tourism forecast accuracy were mainly derived from these two aspects (Peng et al., 2014). The selection of variables in the process of tourism demand forecasting has changed with the development of tourism studies, and innovations in tourism demand forecasting models are also closely related to the data used.
When more variables are considered in tourism demand forecasting, the data frequencies become richer. The introduction of the mixed-data sampling (MIDAS) model provides a new perspective on the problem of frequency mismatch in the tourism literature (Bangwayo-Skeete & Skeete, 2015). However, in prior studies, MIDAS-type models have only been capable of using explanatory variables with frequencies that are the same as or higher than that of the target variable, tourism demand. Few researchers have paid attention to solving the frequency mismatch problem for variables whose frequencies are lower than that of tourism demand in the modeling process. For example, as one of the most commonly used explanatory variables in tourism demand forecasting, Gross Domestic Product (GDP) data are traditionally published quarterly, whereas tourism demand data are usually observed monthly. Due to the lack of suitable models that can accommodate these data at their original frequency, existing studies have instead had to deal with the mismatch problem in the data preparation process. Interpolating quarterly GDP into monthly GDP (Chow & Lin, 1971), taking the monthly industrial production index (IP) as a proxy for GDP (Chatziantoniou et al., 2016; H. Liu et al., 2021), and aggregating the monthly tourism demand into quarterly data (G. Li et al., 2019; Wan & Song, 2018) are the three main methods used to solve this problem. However, none of these methods can directly use the original low-frequency data (i.e., quarterly GDP) when forecasting monthly tourism demand. Instead, the data need to be processed or replaced in advance, which often means it is impossible to make the most effective use of the information contained within the series.
For example, aggregation will lead to information loss and publication lags, while interpolation may introduce additional biased information (by assuming the same value per month within a quarter) into the original series (Guérin & Marcellino, 2013; H. Liu et al., 2021).
To alleviate this shortcoming of existing models, this study offers a new perspective on forecasting tourism demand by introducing a novel method based on the reverse MIDAS (hereafter, RMIDAS) model proposed by Foroni et al. (2018), which allows the direct use of low-frequency explanatory data to forecast a high-frequency dependent variable. We further extend the method to enable direct prediction of tourism demand using multiple mixed-frequency data, including data with the same, higher, and especially lower frequencies, which means that it does not require pre-processing of mixed-frequency data before modeling. To the best of our knowledge, this study is the first attempt to apply a multiple mixed-frequency RMIDAS model and assess its performance in the broad forecasting literature, not just tourism forecasting.
Specifically, by comparing several models according to multiple evaluation indicators, this study first aims to develop the most general form of the MIDAS model to accommodate original explanatory variables of tourism demand at different frequencies, without any pre-processing or loss of information contained in the original series. In other words, this study systematically explores whether the direct use of variables with original frequencies, especially those with lower frequencies than that of the dependent variable, performs better than those using frequency-processed ones in tourism demand forecasting. Second, we aim to improve the forecast accuracy by introducing data with multiple mixed frequencies into the forecasting process, especially data with lower frequencies than tourism demand. Third, given the importance of predicting tourism recovery from the COVID-19 pandemic, our research includes two stages corresponding to periods prior to and during the pandemic, respectively. By taking account of both normal and crisis situations in the assessment of forecasting performance, the conclusions of our research will be more generalizable. Additionally, our research also provides important practical implications to tourism stakeholders looking to adjust their policies and business strategies to recover from the pandemic, as well as in the post-pandemic era. More specifically, the improved forecasting performance of the newly developed models during COVID-19 has direct relevance to the recovery phase of the pandemic, while the performance before the pandemic provides a valuable reference for the post-pandemic era, as both periods can be regarded as “normal” periods.
The remainder of this paper is organized as follows. Section 2 reviews the related literature on tourism demand forecasting variables and models. Section 3 introduces the models used in our study (both the RMIDAS and benchmark models). Section 4 presents the empirical study, including the data, research design, and empirical results. The last section concludes the study and provides some implications and limitations.
Literature Review
Influencing Factors of Tourism Demand
Researchers have considered factors of various types and frequencies in tourism demand forecasting, aiming at improving the forecast accuracy (Khaidi et al., 2019). Most early studies used traditional variables, mainly macroeconomic indicators, as explanatory variables in their forecasting models. It has been recognized that tourists’ income (usually represented by GDP), the exchange rate, the relative tourism price between the destination and the origin markets, and the tourism prices at competing destinations are predominant factors influencing tourism demand (Song et al., 2019). In addition, such indicators as unemployment (Gounopoulos et al., 2012), infrastructure (Naudé & Saayman, 2005), and trade (Tsui & Fung, 2016) have been added to tourism demand forecasting models. These data are often observed at a lower frequency, either quarterly or monthly.
With the development of tourism demand forecasting research, especially since the popularization of the internet, multiple sources of big data have been incorporated into the process of tourism demand forecasting, including user-generated content data, device data, and transactional data (J. Li et al., 2018; X. Li et al., 2021). As an effective supplement to traditional data sources, internet data have been proven valuable in improving the forecast accuracy of tourism demand to a certain extent (Song et al., 2019). More than half of the research using internet data focused on search engine data, followed by social media data and web traffic data (X. Li et al., 2021). The preference for search engine data may be derived from their structural characteristics and accessibility. Generated in real time, the internet data are often obtained at higher frequencies, such as weekly, daily and even hourly.
The effectiveness of the above influencing factors in improving the forecast accuracy of tourism demand in a stable period has been confirmed by many studies, but most of them cannot capture the uncertainty in the economic and social environment. Given the non-negligible negative effect of the ongoing COVID-19 pandemic when forecasting future tourism demand, some researchers have considered indicators that can depict changes in the external environment in their analyses, such as the Economic Policy Uncertainty (EPU) index (H. Liu et al., 2020) and the Infectious Disease Equity Market Volatility (EMVID) tracker (Cepni et al., 2023).
The research on the influencing factors of tourism demand contributes to improving the prediction of tourism demand, and it also helps tourism stakeholders to act more effectively to promote the development of the tourism industry. These influencing factors are associated with a wide range of data frequencies.
Tourism Demand Forecasting With Same-Frequency Models
Noncausal time series and causal econometric models are the two categories most widely used in the field of tourism demand forecasting (Hirashima et al., 2017). Univariate time series models give future predictions only based on the historical and current information in existing tourism demand data series. Plenty of empirical evidence shows that some univariate time series models can still perform well in tourism demand forecasting even without including explanatory variables, including the Naïve, ARIMA, and exponential smoothing (ETS) models. Meanwhile, augmenting the time series models with additional explanatory variables has become another trend (A. Liu et al., 2022). By incorporating appropriate exogenous variables, the multivariate time series models seem to have better forecasting performance than the univariate ones (Athanasopoulos & Hyndman, 2008; Jiao & Chen, 2019; Pan & Yang, 2017).
However, time series models fail to explore the causal relationships between exogenous variables and tourism demand, which may ignore the critical role of influencing factors in tourism forecasting (H. Li et al., 2020). When it is necessary to consider the important role of tourism influencing factors, econometric models, which utilize information from a set of exogenous variables, have become another vital model type of tourism demand forecasting. For example, Witt and Martin (1987) used seven explanatory variables with the autoregressive with exogenous inputs (ARX) model to forecast tourism demand; Song et al. (2003) used the autoregressive distributed lag (ARDL) model to identify the elements contributing to the demand for Hong Kong tourism and forecast its international tourist arrivals. Moreover, Assaf et al. (2019) compared tourism demand forecasting abilities among a series of vector autoregressive (VAR) models.
With such abundant exogenous variables available for tourism demand forecasting, a shortcoming of these conventional models becomes apparent: they can only operate at a single frequency. Directly using two data sets at different frequencies in a single model would lead to a data frequency mismatch problem. Thus, most variables are selected at (or processed into) the same frequency in early tourism demand studies (Song & Li, 2008; Wu et al., 2017).
Tourism Demand Forecasting With Mixed-Frequency Models
The problem of frequency mismatch has become inevitable when so many influencing factors are considered in tourism demand forecasting. For example, as the most popular internet data source, search engine data were first applied by Pan et al. (2012) in their research to forecast weekly hotel room demand using Google Trends data of the same frequency. Google Trends data are available monthly, weekly, daily, or more frequently, while the variables of tourism demand are usually offered monthly or at a lower frequency. As a result of the limitations arising from the conventional econometric models in dealing with mixed-frequency variables, most variables are converted into the same frequency as tourism demand before estimation in early studies, with low-frequency processing as one way to convert the data frequency (Choi & Varian, 2012; X. Li & Law, 2020; Sun et al., 2019; Wen et al., 2019). However, directly aggregating high-frequency data into low-frequency data will lose the rich information contained in the original data and lead to a lag in tourism demand forecasting (Bangwayo-Skeete & Skeete, 2015; Havranek & Zeynalov, 2021).
The MIDAS model proposed by Ghysels et al. (2004) provides a new way of solving the frequency mismatch problem. Before being introduced into the tourism literature by Bangwayo-Skeete and Skeete (2015), the MIDAS model or its extended forms had been primarily used in the macroeconomic and financial fields (Clements & Galvão, 2008; Ghysels et al., 2006, 2007; Guérin & Marcellino, 2013). In terms of applications in the tourism field, an increasing number of researchers have been applying the MIDAS model in tourism demand forecasting (Bangwayo-Skeete & Skeete, 2015; Gunter et al., 2019; Havranek & Zeynalov, 2021; H. Liu et al., 2021; Volchek et al., 2019; Wen et al., 2021). The MIDAS-type models demonstrate great potential in addressing the problem of explanatory variables with frequencies higher than those of the tourism demand.
For variables with lower frequencies than that of tourism demand, previous studies have often dealt with the frequency mismatch problem during data preparation. For example, when using low-frequency quarterly GDP to forecast high-frequency monthly tourism demand, quarterly GDP can be interpolated into a monthly series with the method of Chow and Lin (1971), while H. Liu et al. (2021) used the monthly industrial production index (IP) as a proxy, and G. Li et al. (2019) aggregated the monthly tourism demand into quarterly data. The RMIDAS model proposed by Foroni et al. (2018) provides the possibility of directly using low-frequency data to forecast a high-frequency target variable. In their study, they used quarterly survey data to forecast monthly macroeconomic indicators to test the validity of this model. Unlike the conventional MIDAS model, in which the frequency of the dependent variable cannot be higher than the lowest frequency of the explanatory variables, the RMIDAS method can directly utilize explanatory variables with a lower frequency than the dependent variable. Other studies have also confirmed the usefulness of the RMIDAS model in several fields: Xu et al. (2021) used quarterly GDP and monthly inflation to predict US daily interest rates through the RMIDAS model, Wang et al. (2022) forecasted the weekly market value of technology-listed companies using monthly indicators by combining support vector regression (SVR) and the unrestricted RMIDAS (RUMIDAS) model, and Foroni et al. (2023) introduced a Bayesian approach to the RUMIDAS model and forecasted daily electricity prices using monthly macroeconomic information. Although it has been adopted in several fields, the RMIDAS model has not been used in tourism demand forecasting, and its performance in this field has not yet been investigated. The present study therefore aims to bridge this research gap.
Methodology
This section introduces the newly extended RMIDAS model and benchmark models selected in our comparison framework.
The RMIDAS Model
The RMIDAS model, including the unrestricted RMIDAS (RUMIDAS) and the restricted RMIDAS, is established based on the conventional MIDAS model but employs a reversed frequency alignment process. The MIDAS model uses the high-frequency explanatory variables on the right side of the equation to model the low-frequency dependent variable on the left side. Conversely, the RMIDAS model employs a high-frequency dependent variable and low-frequency explanatory variables (Foroni et al., 2018; Xu et al., 2021). In this study, the RMIDAS model is further extended into a multiple mixed-frequency version that allows explanatory variables with various frequencies, enabling the direct regression of monthly tourism demand on a wide range of explanatory variables with the same, higher, and lower frequencies.
In the course of frequency alignment, the high-frequency dependent variable of the RMIDAS model is divided into several periods, depending on the frequency mismatch, namely the periodic process. Ultimately, the observed frequency of the high-frequency dependent variable is aligned with the lowest frequency on the right side of the equation. This study uses several variables, including quarterly GDP, to forecast monthly visitor arrivals (VA), with a frequency mismatch of three. The frequency alignment for the dependent variable
where
After the frequency alignment process is complete, the monthly VA is divided into three columns (
where
When adding other variables to the model, such as the monthly lag terms of tourism demand and the high-frequency search engine (SE) data, the increasing number of variables, lag orders, and frequency mismatches may result in difficulties regarding the model estimation, similar to the UMIDAS model. Therefore, we impose the exponential Almon lag polynomial
where J and K denote the maximum lag orders of VA and SE, respectively, and mSE is the frequency mismatch between the highest-frequency variable (SE) and the lowest-frequency variable (GDP). Equation 3 highlights the most prominent advantage of the multiple mixed-frequency RMIDAS model: it can directly use a low-frequency variable at its original frequency, without pre-processing.
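The two mechanical steps described in this subsection can be illustrated with a minimal sketch. It assumes a frequency mismatch of three (monthly VA against quarterly GDP) and the standard two-parameter exponential Almon form, in which the weight on lag k is proportional to exp(θ1·k + θ2·k²) and the weights are normalized to sum to one. The function names and parameter values are hypothetical and are not taken from the estimated models.

```python
import math


def align_monthly_to_quarterly(monthly, mismatch=3):
    """Split a monthly series into one row per quarter.

    Each row holds the `mismatch` monthly observations of that quarter, so
    column j collects the j-th month of every quarter (the "three columns"
    of the periodic process). Assumes the series starts at the first month
    of a quarter and covers whole quarters only.
    """
    if len(monthly) % mismatch != 0:
        raise ValueError("series length must be a multiple of the mismatch")
    return [monthly[i:i + mismatch] for i in range(0, len(monthly), mismatch)]


def exp_almon_weights(theta1, theta2, n_lags):
    """Normalized exponential Almon lag weights for lags 1..n_lags."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical example: two quarters of monthly arrivals.
rows = align_monthly_to_quarterly([10, 11, 12, 20, 21, 22])
first_month_column = [row[0] for row in rows]   # [10, 20]

# theta1 = theta2 = 0 gives equal weights; a negative theta2 makes the
# weights decay as the lag increases.
flat = exp_almon_weights(0.0, 0.0, 4)
decaying = exp_almon_weights(0.0, -0.1, 4)
```

Restricting the lag coefficients to this two-parameter shape is what keeps the number of estimated parameters small when many high-frequency lags enter the equation.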
Benchmark Models
To compare the effectiveness of the direct use of variables with the original frequencies and those based on frequency processing in tourism demand forecasting, we estimate a series of models for each country in the comparison framework.
The competing models are divided into different categories to verify their validity, along with the development of tourism demand forecasting models. First, the time series models, including the seasonal Naïve (SNaïve) model, the seasonal autoregressive integrated moving average (SARIMA) model, and the exponential smoothing (ETS) model, which contain only historical tourism information, are used as essential benchmark models. It should be noted that since the seasonality of tourism demand data was severely disrupted during the early stages of COVID-19, the non-seasonal version of the Naïve model is more appropriate and is therefore used as a benchmark model at this stage. Second, the widely used conventional econometric model, the autoregressive distributed lag (ARDL) model, which can only accommodate variables of the same frequency in the modeling process, is applied as another benchmark model to test whether the added exogenous variables could help improve the accuracy of tourism demand forecasting. Last, given the increasing diversity of explanatory variables for tourism demand forecasting and the inevitability of the frequency mismatch problem, this study also considers the original MIDAS model as a benchmark for the direct use of high-frequency internet data in the model.
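As a point of reference for the simplest benchmarks, the Naïve and seasonal Naïve forecasts can be sketched as follows. This is a minimal illustration that assumes monthly data with a seasonal period of 12; the function names are hypothetical.

```python
def naive_forecast(history, h):
    """Non-seasonal naive: every horizon repeats the last observed value."""
    if h < 1:
        raise ValueError("h must be at least 1")
    return history[-1]


def seasonal_naive_forecast(history, h, m=12):
    """Seasonal naive: repeat the latest observation from the same season.

    `m` is the seasonal period (assumed to be 12 for monthly data) and the
    history must cover at least one full season.
    """
    if h < 1 or len(history) < m:
        raise ValueError("need h >= 1 and at least one full season of data")
    return history[len(history) - m + (h - 1) % m]


# Hypothetical two years of monthly values 1..24.
history = list(range(1, 25))
one_step = seasonal_naive_forecast(history, 1)     # value observed 12 months before the target
twelve_step = seasonal_naive_forecast(history, 12) # the last observed value
```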
Empirical Study
Data
Variables Selection
To increase the generalizability of this study and avoid the contingency of results with only a single series, we forecast the monthly inbound tourism demand for the United States from its six key source countries: Mexico, Canada, the UK, Japan, Korea, and Brazil. The tourism demand data are obtained from the Pacific Asia Travel Association (PATA, 2022), measured by monthly inbound VA from each source country.
For each source country, the explanatory variables cover the critical influencing elements: GDP, taken as a proxy for tourists’ income, together with the relative price (own price, OP) and the substitute price (SP), both adjusted by the exchange rate (ER) between the source country and the USA. In light of prior studies by Song et al. (2003) and G. Li et al. (2019), the OP of the ith country (OP_i) is calculated from the consumer price index of the destination, the USA (CPI_USA), and that of the ith source country (CPI_i), adjusted by the exchange rate between the country’s currency and USD (ER_i). We select Canada as the competing destination based on geographic and cultural proximity and calculate the SP from the ER-adjusted CPI of Canada and the ith source country (since Canada is both a source and a competing destination, we remove the SP variable in the case of Canada). All these data are downloaded from the OECD databases (https://data.oecd.org/). The OP and SP are calculated as follows:
For the widely used high-frequency search engine data, we focus on the weekly Google Trends (GT) data in our models. We do not include the Chinese mainland (one of the most important source markets for the USA before the COVID-19 pandemic) in our study for two reasons: (1) China had exceptionally low volumes of outbound travel during the pandemic due to severe travel restrictions; (2) unlike in the other six source countries, the leading search engine used in the Chinese mainland is Baidu, for which the available time series are much shorter than those from Google. The GT data are sourced from the Google Trends website (https://trends.google.com/). In addition, we add the lag terms of VA as explanatory variables and seasonal dummies to capture the seasonality of tourism demand.
In the forecasting exercise at the recovery stage of COVID-19, we add one-off event dummies and the Infectious Disease Equity Market Volatility (EMVID) tracker to track the outbreak and recovery related to the pandemic. The EMVID tracker data are downloaded from the Economic Policy Uncertainty website (EPU, 2022).
Data Processing
To obtain the Google Trends data, we perform the following process. First, given the different languages of the source countries, we use the word for “America” in each country’s native language as the initial keyword. In this way, we avoid the translation bias caused by translating multiple initial keywords selected based on domain knowledge. We search for each initial keyword under the category “Travel” and obtain its most relevant search keywords for the next round of searches according to the “related queries” provided by Google Trends. By expanding keywords and eliminating duplicates, we build a dataset containing many keywords for each country. Second, we download the monthly search series for each keyword and calculate the Pearson correlation coefficients between the search data and the number of VA. Third, Google Trends provides an index between 0 and 100 rather than the actual search volume, and returns monthly (not weekly) data when the query period exceeds 5 years. To obtain weekly data for more than 5 years, we adjust each keyword through the method proposed by Xu et al. (2019). Fourth, because the number of weeks varies across calendar months, we use the method of Smith (2016) to ensure that each month contains 4 weeks of data. Additionally, we apply a natural logarithm transformation to all variables except the dummies before model estimation.
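The second step, computing Pearson correlations between each keyword’s monthly search series and visitor arrivals, can be sketched as below. The keyword names and numbers are invented for illustration, and ranking keywords by their correlation is an assumption of this sketch; the text does not state how the coefficients are subsequently used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def rank_keywords(search_series, arrivals):
    """Sort keywords by their correlation with arrivals, highest first."""
    scores = {kw: pearson(s, arrivals) for kw, s in search_series.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical monthly data (not from the study).
arrivals = [100, 120, 150, 130, 160, 180]
search_series = {
    "keyword_a": [40, 48, 61, 50, 66, 70],   # moves with arrivals
    "keyword_b": [70, 65, 50, 60, 45, 40],   # moves against arrivals
}
ranking = rank_keywords(search_series, arrivals)
```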
Research Design
The models used in our comparison framework can be divided into three categories according to the variables in use: first, the time series (TS) models using only visitor arrivals (M1 to M3); second, the same and mixed frequency (SMF) models with interpolated monthly GDP and other explanatory variables (M4 to M6); third, the reverse mixed frequency (RMF) models containing original quarterly GDP data (M7 to M9). Within each category, the models are differentiated by the type of explanatory variables used. For more specific details, please refer to Table 1.
Table 1. The Details of Each Model.
Note. TS = time series; SMF = same and mixed frequency; RMF = reverse mixed frequency; q = quarterly; m = monthly; w = weekly.
There are four groups of comparisons in the empirical study. Figure 1 shows the specific contents and the corresponding models for each comparison. Accordingly, Comparison 1 compares the TS and SMF types, which denotes the comparison between time series models and econometric models. It tests the role of crucial influencing elements in tourism demand forecasting. The comparison between M4 and M5 (Comparison 2) examines the usefulness of search engine data for tourism forecasting. Comparison 3 tests whether the direct use of high-frequency variables in the MIDAS model can improve the forecasting accuracy of VA.

Figure 1. Forecasting performance comparisons in the research design.
Comparison 4 is the focus of our research, containing three sets of comparative models: M7/M4, M8/M5, and M9/M6. The only difference between the variables used in each group of the contrasting models lies in whether the GDP data are processed for frequency in advance. The RMIDAS models (M7, M8, M9) use the original quarterly GDP, while the corresponding data used in the other competing models are monthly, obtained from the interpolation of quarterly data. Comparisons 3 and 4 explore whether the direct use of the original data is more helpful in forecasting tourism demand—in other words, whether the processing of data frequency leads to larger prediction errors.
The empirical study consists of two parts: the “normal” stage prior to the COVID-19 pandemic and the recovery stage of the COVID-19 pandemic. The forecasting period spans from January 2018 to December 2019 (24 months) in the normal stage and from January 2021 to December 2021 (12 months) in the COVID-19 recovery stage. The forecasting is carried out recursively, with the in-sample window lengthening as the forecast origin moves forward. Since the purpose of this study is to evaluate the forecasting performance of the developed model against several benchmarks, we conduct ex-post forecasting in line with the majority of the past literature (Wu et al., 2022). We adopt the expanding-window recursive forecasting strategy, under which longer forecasting horizons yield fewer forecasts to evaluate. Taking the normal stage as an example, we have 24 forecasts to evaluate for h = 1, but only seven for h = 18.
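The relationship between horizon and number of evaluable forecasts can be made explicit with a small sketch. It assumes that the first forecast origin is the last month before the evaluation window and that only targets inside the window are scored, which reproduces the counts quoted above (24 for h = 1 and seven for h = 18). The function names are hypothetical.

```python
def forecast_targets(n_eval, h):
    """1-based positions of evaluation months that receive an h-step forecast.

    Assumes the first origin is the month just before the evaluation window,
    so its h-step forecast lands on evaluation month h; later origins move
    forward one month at a time until the target reaches month n_eval.
    """
    return list(range(h, n_eval + 1))


def n_forecasts(n_eval, h):
    """Number of h-step-ahead forecasts available in an n_eval-month window."""
    return max(n_eval - h + 1, 0)


# Normal stage: 24 evaluation months (2018M1 to 2019M12).
counts = {h: n_forecasts(24, h) for h in (1, 6, 12, 18)}
```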
This study uses the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the mean absolute scaled error (MASE) to compare the forecast accuracy of each model. We also use the Diebold-Mariano (DM) test to compare any two models from a statistical perspective in the empirical study (Diebold & Mariano, 1995).
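For readers who want to reproduce the accuracy measures, a minimal sketch of the three metrics follows. The MASE scaling shown here divides by the in-sample mean absolute error of a naive forecast with lag m; whether a seasonal (m = 12) or non-seasonal (m = 1) scaling was used is not stated in the text, so m is left as a parameter and its default is an assumption.

```python
import math


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train, m=1):
    """Mean absolute scaled error.

    The out-of-sample MAE is scaled by the in-sample MAE of a naive forecast
    that repeats the value observed m periods earlier (m is an assumption).
    """
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Hypothetical values for illustration only.
example_rmse = rmse([0, 0], [3, 3])
example_mape = mape([100, 200], [110, 180])
example_mase = mase([5, 6], [5, 8], train=[1, 2, 3, 4])
```

A MASE below one means the model's errors are smaller than those of the in-sample naive forecast used for scaling.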
Results and Discussion
The Normal Phase
Figure 2 shows the box plots of the average MAPE for each model during the normal phase. The values for each model are calculated from the average of the six origin countries for all 18 horizons, representing the models’ overall forecasting performance. In Figure 2, the econometric models (M4 to M9) with the addition of crucial influencing elements perform better than the time series models. The comparison between the ARDLX models without and with search engine data (M4 and M5, respectively) shows the role search engine data play in tourism demand forecasting. The comparison between M5 and M6 indicates that the MIDAS model using high-frequency search engine data can further improve forecast accuracy. According to the results of the RMIDAS-type models (M7 to M9) and the other models with monthly GDP (M4 to M6), the direct use of quarterly GDP is better than the use of processed monthly data, which suggests that the interpolation of data tends to have a negative impact on forecast accuracy. Notably, the multiple mixed-frequency model RMIDASX-wGT (M9) performs the best among all models. All variables contained in the model are original data without frequency processing, including the same-frequency monthly VA, OP, and SP, the higher-frequency weekly search engine data, and the lower-frequency quarterly GDP.

Figure 2. Box plots of the average MAPE for each model during the normal phase.
We present both the overall and country-by-country results. The specific forecasting results of each country during the normal phase can be found in Table 2. Due to space constraints, we only show the results for h = 1, 6, 12, and 18, and detailed results are available from the authors upon request. As can be seen in Table 2, the results based on MAPE and MASE are highly consistent, while the results based on RMSE are slightly different. In the one-step-ahead forecasting, M2 performs best in some countries, in line with other studies’ conclusions about the SARIMA’s ability to forecast tourism demand. However, with the increase in the forecasting horizon, the prediction advantage of the RMIDAS models becomes gradually evident. From the average results of all horizons, the RMIDASX-wGT model has the smallest forecast errors across almost all countries, suggesting that tourism demand can be better forecasted by directly using data without frequency processing.
Table 2. The Forecast Accuracy of the Models in Each Country During the Normal Phase (2018M1:2019M12).
Note. M1 = SNaïve; M2 = SARIMA; M3 = ETS; M4 = ARDLX; M5 = ARDLX-mGT; M6 = MIDASX-wGT; M7 = RMIDASX; M8 = RMIDASX-mGT; M9 = RMIDASX-wGT. The best model in each country and horizon is in bold.
Figure 3 illustrates the average MAPE of the three categories of models for all horizons in each country. As the horizon extends, the forecast errors of the TS models increase gradually, while the econometric models maintain a better and relatively stable forecasting ability across all horizons. This stability stems from the stable relationship between the exogenous variables and tourism demand, which also ensures the validity of these variables in forecasting tourism demand in the long term. It is noteworthy that the RMF models perform the best in almost all countries, indicating the role of the RMIDAS models in improving tourism demand forecasting.

Figure 3. The average MAPE of three categories of models for each country.
To clarify the performance of the RMIDAS models against the models based on monthly GDP, we calculate the relative MAPE (rMAPE) between M7/M4, M8/M5, and M9/M6 (see Figure 4). The variables included in each pair of competing models differ only in the frequency of the GDP. The red, green, and blue parts denote rMAPE values of less than one between M7/M4, M8/M5, and M9/M6, respectively, which means that the RMIDAS-type model produces more accurate forecasts than its counterpart. The large proportion of colored areas in Figure 4 suggests that the RMIDAS models using quarterly GDP without frequency processing can further improve the accuracy of tourism demand forecasting to some extent.

Figure 4. Relative MAPE during the normal phase.
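The relative MAPE used in this comparison is a simple ratio, sketched below with the RMIDAS-type model in the numerator so that values below one favor it, as described in the text. The MAPE values are invented for illustration and are not results from the study.

```python
def relative_mape(mape_rmidas, mape_counterpart):
    """rMAPE of an RMIDAS-type model against its monthly-GDP counterpart.

    Values below one mean the RMIDAS-type model has the smaller MAPE.
    """
    if mape_counterpart <= 0:
        raise ValueError("counterpart MAPE must be positive")
    return mape_rmidas / mape_counterpart


def share_below_one(ratios):
    """Fraction of rMAPE values that favor the RMIDAS-type model."""
    return sum(1 for r in ratios if r < 1) / len(ratios)


# Hypothetical MAPE pairs (RMIDAS-type, counterpart) across four horizons.
pairs = [(4.0, 5.0), (6.0, 5.0), (3.0, 4.0), (2.0, 4.0)]
ratios = [relative_mape(a, b) for a, b in pairs]
favorable = share_below_one(ratios)
```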
The Recovery Stage of COVID-19
In the context of COVID-19, inbound tourism demand has declined sharply and may change at any time due to the aggravation of the pandemic or changes in travel restriction policies. Therefore, we further add the monthly EMVID tracker and some dummy variables to the existing models to track outbreaks and recovery related to the pandemic at this stage.
Even after the EMVID tracker and dummy variables are added to all econometric models, these models still seem unable to respond quickly and accurately to the sudden changes in tourism demand at the beginning of the outbreak. Therefore, we shift our focus to the recovery period of COVID-19 (January 2021 to December 2021). The overall results of each model at this stage, shown in Figure 5, are similar to those from the normal stage. One difference is that the models with high-frequency search engine data (M6 and M9) perform worse than those with monthly data (M5 and M8). One of the main reasons for this difference is that high-frequency data fluctuate wildly and may contain more noise during this stage. Table 3 shows the models’ forecast accuracy for h = 1, 2, 3, and 6 in each country during the recovery stage of COVID-19. The results indicate that the advantages of econometric models over time series models that only use tourism demand data become more pronounced during the recovery stage of COVID-19, especially in multi-step-ahead forecasting.

Figure 5. Box plots of the average MAPE for each model during the recovery stage of COVID-19.
Table 3. The Forecast Accuracy of the Models in Each Country During the Recovery Stage of COVID-19 (2021M1:2021M12).
Note. M1 = Naïve; M2 = SARIMA; M3 = ETS; M4 = ARDLX; M5 = ARDLX-mGT; M6 = MIDASX-wGT; M7 = RMIDASX; M8 = RMIDASX-mGT; M9 = RMIDASX-wGT. The best model in each country and horizon is in bold.
In the results for some countries, such as Mexico, Canada, and Korea, the RMIDASX model without search engine data performs the best, indicating that the ability of search engine data to improve tourism demand forecasting was weakened during this stage. This weakening may be due to the decreased correlation between search engine data (a representative of travel intention) and tourism demand during the pandemic. In the normal stage, travel intention can easily be transformed into actual travel behavior, but in the pandemic stage, the conversion of travel intention may be affected by many obstacles, such as the outbreak of the pandemic and travel restriction policies. In this case, higher travel intentions are not necessarily transformed into higher actual travel demand, so the ability of search engine data to improve tourism demand forecasting declines. Nevertheless, it can still be seen from the smaller proportion of gray parts in Figure 6 that the direct use of quarterly GDP can reduce errors in tourism demand forecasting.

Figure 6. Relative MAPE during the recovery stage of COVID-19.
Table 4 reports the DM test results for each pair of compared models during both periods. The null hypothesis of equal forecast accuracy is rejected in about half of the cases, indicating statistically significant differences in forecast accuracy between the paired models. A negative value for the DM statistic indicates that the latter model has better forecast accuracy. The predominance of negative values (about 80%) in Table 4 therefore supports the conclusions drawn from the study’s several sets of comparisons. At the COVID-19 stage, more than 80% of the test statistics for M5 against M6 are positive, indicating that forecasts based on high-frequency search engine data are sometimes less accurate than those based on monthly data at this stage. This difference may arise because high-frequency data fluctuate significantly during the pandemic stage and may contain more noise. In addition, during the recovery stage of COVID-19, the comparative advantage of the econometric models over the time series models is more evident.
Table 4. The DM Test Results of Comparisons in Each Country.
Note. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively. M1 = (S) Naïve; M2 = SARIMA; M3 = ETS; M4 = ARDLX; M5 = ARDLX-mGT; M6 = MIDASX-wGT; M7 = RMIDASX; M8 = RMIDASX-mGT; M9 = RMIDASX-wGT. CP 1 to 4 correspond to Comparisons 1 to 4, where CP 1 compares TS (M1, M2, M3) and SMF (M4, M5, M6) models (only the results of M2/M4 are displayed here); CP 2 (M4/M5) assesses the efficacy of search engine data; CP 3 (M5/M6) evaluates the impact of directly incorporating high-frequency variables in MIDAS models; CP 4 encompasses three comparisons (M4/M7, M5/M8, M6/M9) to ascertain the utility of directly incorporating low-frequency variables in RMIDAS models.
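The logic of the DM comparisons can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact procedure used for Table 4: the function name `dm_test` is hypothetical, the loss is assumed to be the absolute error, the loss differential is defined as the latter model's loss minus the former's (so that a negative statistic favors the latter model, matching the sign convention described in the text), the long-run variance uses autocovariances up to lag h − 1, and the p-value relies on the standard normal approximation without any small-sample correction.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def dm_test(
    actual: Sequence[float],
    first: Sequence[float],
    second: Sequence[float],
    h: int = 1,
) -> Tuple[float, float]:
    """Diebold-Mariano statistic and two-sided p-value.

    Assumed convention: d_t = loss(second) - loss(first), so a negative
    statistic means the second (latter) model is more accurate.
    """
    if not (len(actual) == len(first) == len(second)) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    # Loss differential under absolute-error loss (an assumption).
    d = [abs(a - s) - abs(a - f) for a, f, s in zip(actual, first, second)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k: int) -> float:
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance with autocovariances up to lag h - 1.
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value
```

Calling `dm_test(actual, forecasts_m4, forecasts_m7, h=1)` with hypothetical forecast series would return a negative statistic when the M7 forecasts have the smaller average absolute error; swapping the two forecast arguments flips the sign.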
Conclusion and Implications
The main objective of our research is to systematically investigate whether models that directly use variables at their original frequencies, especially those with lower frequencies than that of the dependent variable, perform better than models that use frequency-processed data in tourism demand forecasting. To achieve this objective, we develop a series of comparison models and evaluate their forecasting performance based on an empirical case study of the monthly inbound tourism demand for the United States from its six key source countries, both in the “normal” period and in the recovery stage of the COVID-19 pandemic. The comparison framework comprises nine models in three categories: TS, SMF, and RMF. The first two comparisons indicate the roles of relative price, substitute price, tourists’ income, and search engine data in tourism demand forecasting. Comparisons 3 and 4 reveal that models that directly use high-frequency and low-frequency variables forecast VA better than models that use pre-processed same-frequency data. Moreover, the multiple mixed-frequency model developed in this study, RMIDASX-wGT, performs the best among the competing models.
This study makes several contributions to the existing scholarly literature. First, it extends the RMIDAS model into a novel multiple mixed-frequency specification for the first time, not only in the tourism forecasting literature but also in the broader forecasting literature. This specification accommodates original variables related to tourism demand at different frequencies, including those with the same, higher, and especially lower frequencies, without any pre-processing or loss of the information contained in the original series. Second, this study is the first to directly use explanatory variables of a lower frequency than the dependent variable in tourism demand forecasting, which supplements the existing literature on mixed-frequency forecasting. Third, this study conducts the most comprehensive assessment in the tourism forecasting literature of forecasting performance based on explanatory variables with a full range of data frequencies. Finally, this study provides a full assessment of the performance of the RMIDAS model in both “normal” periods and a crisis situation. It is one of the very few tourism demand forecasting studies that cover the periods prior to and during the COVID-19 pandemic and evaluate forecast accuracy in both. Therefore, the results of this study provide convincing evidence of the superior performance and broad applicability of the RMIDAS model in both normal and crisis situations.
The study also has important implications for future tourism forecasting research and management practices. First, it systematically investigates and demonstrates the negative impact of data frequency processing on tourism demand forecasting, focusing on low-frequency data that were ignored in previous literature. The findings provide a new perspective for improving the mixed-frequency forecasting of tourism demand in the future. Second, the multiple mixed-frequency models developed in this paper provide important guidance for future tourism forecasting research and practice, allowing more variables to be used directly, especially variables with lower frequencies than the dependent variable. This implies that more monthly, quarterly, or even annual macroeconomic and tourism data can be considered for predicting high-frequency tourism data, such as daily tourist traffic or weekly hotel occupancy rates. Therefore, future tourism forecasting exercises should adopt the more advanced RMIDAS models to accommodate data of varying frequencies instead of pre-processing the data, which may lead to information loss and reduce forecast accuracy.
Last, our research also has policy and management implications, particularly in relation to recovery from the pandemic. In the current recovery stage of COVID-19, tourism demand largely depends on the severity of the pandemic and on pandemic prevention and control policies, and it fluctuates more strongly than during the normal period. The importance of timely and accurate tourism demand forecasting has become more apparent in times of high uncertainty. Therefore, when destination governments, destination management organizations, and tourism businesses predict future market demand, they should consider the advanced RMIDAS model. It can take account of a broad range of useful information at different data frequencies, so various pandemic-related influencing factors of tourism demand can be considered to produce more accurate and reliable predictions of market trends in the near future. For example, this study has shown that incorporating the EMVID tracker information into a tourism demand forecasting model allows the recovery of tourism demand from the pandemic to be predicted more accurately. This can assist tourism stakeholders in developing effective tourism recovery or crisis management plans in advance. For policymakers, accurate forecasts derived from the advanced RMIDAS model allow response measures to be adjusted promptly according to the specific pandemic situation at hand. This helps to avoid a major blow to the tourism industry and obstacles to its recovery caused by inappropriate travel restrictions. For stakeholders in the tourism industry, it is necessary to strengthen the real-time tracking of pandemic-related information and take countermeasures in advance to mitigate the negative impact of the pandemic on the tourism industry. To achieve this, the more advanced forecasting tools developed in this study should be applied. These reliable tools can also help the tourism industry develop effective strategic plans to better prepare for the market recovery from the pandemic, including resource allocation, capacity building, and investment injection.
We also acknowledge some limitations of the study. First, we use only one example, forecasting monthly tourism demand with quarterly GDP, to explore the advantages of using low-frequency data directly. In fact, our model is even more suitable for predicting high-frequency tourism demand, such as daily tourist traffic, because such studies are more likely to use explanatory variables with a lower frequency than the target tourism demand. We strongly encourage future studies to estimate and forecast high-frequency tourism demand using multiple mixed-frequency models, which can help improve the accuracy of tourism demand forecasting by taking full advantage of the information contained in data of various frequencies. Second, we conduct only ex-post point forecasting to evaluate the models’ performance in this study; ex-ante forecasting and interval forecasting are crucial directions for future research to provide more effective forecasting for tourism research and management practices. Last, future research should further verify the forecasting performance of our developed method in other empirical contexts.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to acknowledge the financial support from the “Natural Science Foundation of China” (Grant No. 72374083; 72004077; 72004106), the “Humanities and Social Science Fund of the Ministry of Education” (Grant No. 20YJC79007), and the “Fundamental Research Funds for the Central Universities” (Grant No. 2023QNTD05).
