Empirically Assessing the Predictivity of Tourism Demand Data

Abstract

Accurate forecasting of tourism demand is critical for policy and business planning yet remains challenging due to the inherent complexity and vulnerability of tourism demand to external shocks. This study introduces a novel predictivity metric based on Weighted Permutation Entropy (WPE) for assessing the intrinsic predictivity of tourism demand data. Building on the limitations of existing entropy measures, particularly Sample Entropy (SampEn) and Multiscale SampEn, WPE is proposed for its effectiveness in capturing both ordinal and amplitude dynamics of the tourism demand, especially under external shocks, such as the COVID-19 pandemic. Using monthly tourist arrival data from Australia, the study evaluates the predictivity of tourism demand across different temporal scales and lengths. The study provides actionable insights for enhancing tourism demand forecasting by optimizing data aggregation scales and adapting the predictivity metric during volatile periods.

Keywords

tourism demand, predictivity, predictability, weighted permutation entropy, Australia, COVID-19, sample entropy

Introduction

The tourism industry is a vital driver of economic growth for many nations (Vu et al., 2025). Accurate tourism demand forecasting is essential in providing critical insights into emerging trends, evolving tourist preferences, and market dynamics, enabling policy-makers and businesses to make informed decisions (Li et al., 2023). The increasing availability of data and advanced forecasting techniques has fueled efforts to refine these techniques and push the boundaries of predictive potential in tourism data (Zhang et al., 2020).

However, despite these advancements, accurately predicting tourism demand remains a complex challenge (Zhang et al., 2021a, 2021b). Tourism demand is influenced by a multitude of factors, including seasonality, economic cycles, geopolitical events, and unexpected disruptions such as pandemics or natural disasters, all of which introduce significant variability and uncertainty into international tourist flows (Song et al., 2019). These complexities limit the effectiveness of traditional forecasting models, highlighting the need for a deeper understanding of the intrinsic predictivity of tourism demand data, that is, the theoretical upper bound of forecasting accuracy determined by the inherent structure of the data itself (Pennekamp et al., 2019). In the context of tourism, intrinsic predictivity refers to the theoretical maximum accuracy with which future tourist arrivals can be predicted from historical data patterns (Zhang et al., 2021a, 2021b).

This is particularly relevant, as a recent data-driven meta-method, the RobustSTL-ARIMA-LSTM model proposed by Li et al. (2023), has demonstrated high forecasting accuracy (e.g., up to 98% for Hong Kong tourism data). These results raise fundamental questions about the true predictivity limits of tourism demand data and the reliability of existing predictivity metrics. If forecasting models achieve high accuracy in some cases but fail in others, understanding which predictivity metrics provide the most robust insights into the intrinsic predictivity of tourism demand data is essential for future research and practical applications.

Several studies have proposed approaches to evaluate time series predictivity, primarily grounded in Information Theory and entropy-based measures (Zhang et al., 2021a, 2021b). For example, Song et al. (2010) employed variations of Shannon entropy to estimate the upper bound of human mobility predictability, while Pennekamp et al. (2019) quantified the intrinsic predictivity of ecological time series using Weighted Permutation Entropy. Despite the significance of predictivity in tourism demand forecasting, this topic remains underexplored. Zhang et al. (2021a, 2021b) examined the predictivity of Hong Kong’s tourist arrival data by applying Sample Entropy (SampEn) and Multiscale Sample Entropy (MulSampEn) to measure time series complexity and derive maximum predictivity through Fano’s inequality. However, a key limitation of their study lies in the use of these entropy measures, which are not well-suited for accurately capturing the complexity of non-stationary time series (Richman et al., 2004).

The limitation of these measures lies in their methodology, particularly the tolerance distance $r$, which is defined as a proportion of the time series’ standard deviation. As a result, time series affected by external shocks exhibit inflated standard deviations, leading to a larger $r$ value (Richman et al., 2004). This broader threshold increases the likelihood of detecting matching patterns (Richman et al., 2004), resulting in lower entropy and misleadingly high predictivity estimates, even though the actual predictivity of the series has decreased. This limitation raises concerns about the reliability of predictivity assessments on time series with significant disruptions.

Addressing this challenge is the main focus of this research and is essential for developing a more robust and reliable metric for assessing the predictivity of tourism demand, especially in the face of unforeseen disruptions (Polyzos et al., 2020). This study proposes a novel Weighted Permutation Entropy (WPE)-based metric for assessing the predictivity of tourism demand data. The motivation for using WPE lies in its sensitivity to temporal dynamics and structural complexity within time series, which other entropy measures often fail to capture (Fadlallah et al., 2013). Thus, this work aims to answer the following research question: How to define a metric that effectively captures the intrinsic predictivity of tourism demand data, particularly under conditions of volatility and disruption?

Building on the work of Zhang et al. (2021a, 2021b) and other related studies, this study assesses the predictivity of tourism demand data and offers the following key contributions. First, the study proposes a novel and robust metric based on WPE for assessing the predictivity of tourism demand, which is effective in capturing the impacts of external shocks, such as the COVID-19 pandemic. Second, the research identifies the effect of data characteristics, specifically the length and temporal scale of time series data, on predictivity. This helps ensure that forecasting models are built using data with characteristics that offer the highest theoretical potential for accurate prediction. Third, by understanding the relationship between predictivity and actual forecasting performance, practitioners can better select forecasting models that fully leverage the predictive potential of data. These contributions are particularly valuable for risk mitigation strategies within the tourism sector. For instance, understanding how predictivity declines during external shocks and how quickly it recovers provides an evaluation of the reliability of forecasting models. As a result, governments and businesses can formulate more resilient and adaptive strategic plans, minimizing potential losses.

The remainder of this paper is structured as follows. The section "Literature Review" evaluates potential metrics to assess the predictivity of tourism demand data. The section "The Predictivity of Tourism Demand Data" presents the calculation and parameter tuning of the predictivity metric based on Weighted Permutation Entropy (WPE). The section "The Empirical Validation of the Predictivity Metric" empirically evaluates and compares predictivity metrics on Australia’s tourist arrival statistics, across time series before and after the COVID-19 pandemic, and explores the relationship between forecasting performance, data characteristics, and these predictivity metrics. The sections on discussion and implications and on the conclusion discuss the implications and present the conclusion of the study.

Literature Review

This section reviews general predictivity measures from the perspective of time series complexity, which are rooted in two main theoretical frameworks: the Dynamical Systems Theory and Information Theory (Yao et al., 2004). Subsequently, it will focus on entropy measures as a means to quantify intrinsic predictivity, discussing their theoretical foundation and applicability to tourism demand forecasting. Finally, it will examine existing applications of these measures in tourism and highlight the research gap this study aims to address.

General Predictivity Measures

Predictivity in time series forecasting can be categorized into two main types: realized predictivity and intrinsic predictivity. Among them, realized predictivity refers to the actual accuracy achieved by a specific forecasting model when applied to a dataset. For example, the Root Mean Squared Error (RMSE) of a time series forecast quantifies its realized predictivity. In contrast, intrinsic predictivity represents the theoretical upper limit on how well any model can forecast a time series based on the inherent deterministic and stochastic components in the data (Pennekamp et al., 2019). The realized predictivity of forecasting models can then be used to indicate how effectively these models capture the underlying intrinsic predictivity.

This study focuses on assessing the intrinsic predictivity of tourism demand, particularly in univariate time series. While multivariate models often offer improved forecasting potential, they involve complex predictor selection issues that are beyond the scope of this work (Kovantsev & Gladilin, 2020).

Entropy measures offer a practical approximation of intrinsic predictivity by capturing the underlying structure and randomness of time series data. In general, time series that are highly regular (e.g., periodic) are more predictable, while those that are chaotic or purely stochastic exhibit reduced predictivity (Boffetta et al., 2002). Measures of complexity aim to quantify this spectrum and distinguish between deterministic and random behavior (Bandt & Pompe, 2002).

Yao et al. (2004) identified two foundational approaches to quantifying complexity:

the dynamical systems approach, which relies on system trajectories and initial conditions; and

the information theory approach, which assesses the randomness and structural order in data.

These frameworks form the basis for entropy measures of predictivity.

Dynamical Systems Perspective

From the dynamical systems perspective, predictivity is linked to the rate of error growth and the amount of information produced by the system over time (Boffetta et al., 2002). The key measures in this context are the Lyapunov exponent (LE), which quantifies the time interval $T_{p}$ over which the system is predictable (Boffetta et al., 2002), and the Kolmogorov-Sinai (KS) entropy, which measures the rate of information loss due to unpredictability (Adelyanov et al., 2024). However, Boffetta et al. (2002) noted several limitations of LE in real-world systems, particularly those with finite time or noisy observations. These challenges have motivated the use of modified methods such as the finite-size Lyapunov exponent (FSLE) for empirical applications (Boffetta et al., 2002).

Kolmogorov-Sinai entropy, while theoretically sound, is difficult to compute for empirical datasets, especially those with small data volumes or structural breaks, both of which are common in tourism demand data. These limitations have led researchers to exploit the statistical description of the data generated by the system to study its dynamical behavior using entropy measures derived from information theory (Barà et al., 2024).

Information Theory Perspective

In information theory, predictivity is inversely related to randomness. A time series with a well-defined structure exhibits lower entropy and therefore higher predictivity, while a fully random sequence has maximum entropy and is inherently unpredictable (Adelyanov et al., 2024). Entropy quantifies the average uncertainty in a sequence of outcomes and serves as a foundation for several complexity measures (Adelyanov et al., 2024).

The most widely known entropy measures for time series include Shannon Entropy, Permutation Entropy (PE), Sample Entropy (SampEn), and their variants. These measures differ in the way they encode information, handle noise, and account for data attributes (Adelyanov et al., 2024; Bandt & Pompe, 2002) that are especially relevant in tourism demand data.

Shannon Entropy

Introduced by Claude Shannon in 1948, Shannon Entropy provides a foundational measure of uncertainty by calculating the probability distribution of time series values (Adelyanov et al., 2024). While it is simple to compute, it assumes stationarity and does not account for temporal or ordinal relationships, making it less suitable for non-stationary tourism data characterized by seasonality and structural shifts.
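As a minimal sketch of the value-distribution estimate described above, the snippet below bins a series into equal-width bins and computes Shannon entropy in bits. The function name `shannon_entropy`, the bin count, and the synthetic series are illustrative assumptions and are not taken from the study. Because the estimate depends only on how often each value range occurs, shuffling the series leaves it unchanged, which is one way to see that temporal structure is ignored.

```python
import math
import random
from collections import Counter


def shannon_entropy(series, bins=8):
    """Shannon entropy (bits) of a series after equal-width binning.

    `bins` is an illustrative choice, not a value taken from the study.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0  # a constant series carries no uncertainty
    width = (hi - lo) / bins
    # Map each value to a bin index; the maximum falls into the last bin.
    labels = [min(int((v - lo) / width), bins - 1) for v in series]
    counts = Counter(labels)
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Synthetic seasonal series (hypothetical data, for illustration only).
seasonal = [100 + 30 * math.sin(2 * math.pi * t / 12) for t in range(120)]
shuffled = seasonal[:]
random.Random(0).shuffle(shuffled)

h_ordered = shannon_entropy(seasonal)
h_shuffled = shannon_entropy(shuffled)  # same value distribution, order destroyed
```

The two values agree up to floating-point rounding even though the shuffled series has lost its seasonal ordering.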

Permutation Entropy (PE)

PE extends Shannon Entropy by incorporating ordinal structure and was proposed by Bandt and Pompe (Bandt & Pompe, 2002). It analyzes the frequency of different order patterns within a time series and is robust to noise (Bandt & Pompe, 2002). However, PE ignores amplitude information and may produce misleading results in time series with low resolution (Adelyanov et al., 2024), which are typical challenges in tourism datasets.
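The ordinal-pattern idea can be sketched as follows, assuming an unnormalized entropy in bits and ties broken by time order. The helper names and the toy series are hypothetical. The example contrasts two series with identical up-and-down ordering but very different swing sizes to illustrate the amplitude blindness noted above.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Indices that sort the window in increasing order (ties keep time order)."""
    return tuple(sorted(range(len(window)), key=lambda k: window[k]))


def permutation_entropy(series, m=3, tau=1):
    """Permutation entropy in bits (Bandt & Pompe, 2002), unnormalized."""
    n_vectors = len(series) - tau * (m - 1)
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    counts = Counter(
        ordinal_pattern([series[i + j * tau] for j in range(m)])
        for i in range(n_vectors)
    )
    return -sum((c / n_vectors) * math.log2(c / n_vectors) for c in counts.values())


# Same ordering of ups and downs, very different amplitudes (toy data).
small = [0, 1, 0, 1, 0, 1, 0, 1]
large = [0, 100, 0, 100, 0, 100, 0, 100]
pe_small = permutation_entropy(small, m=2)
pe_large = permutation_entropy(large, m=2)

# A monotone ramp produces a single pattern and therefore zero entropy.
pe_ramp = permutation_entropy(list(range(20)), m=3)
```

Both toy series yield the same entropy because only the relative ordering enters the calculation.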

To overcome this, several PE variants have been developed. For example, Weighted Permutation Entropy (WPE) incorporates amplitude information by weighting patterns based on local variability (Fadlallah et al., 2013). Other enhancements include Multiscale WPE (MSWPE) (Xia et al., 2015) and Fine-Grained PE (FGPE) (Xu et al., 2019), which capture dynamics at different scales or levels of detail.

Sample Entropy (SampEn)

SampEn assesses the regularity of patterns by computing the probability that similar sequences remain similar when extended (Richman & Moorman, 2000). It is less sensitive to short time series than Approximate Entropy (AppEn) but relies heavily on the tolerance parameter $r$ , which is typically set as a fraction of the standard deviation (Adelyanov et al., 2024). In volatile conditions, such as during a pandemic, this leads to inflated $r$ values and unreliable predictivity assessments.
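A compact sketch of SampEn in the spirit of Richman and Moorman (2000) is given below, assuming template length `m = 2`, a tolerance of 0.2 times the population standard deviation, and Chebyshev distance with self-matches excluded. These parameter values and the synthetic series are illustrative assumptions, not the settings of any cited study.

```python
import math
import random
import statistics


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy with tolerance r = r_factor * standard deviation."""
    n = len(series)
    r = r_factor * statistics.pstdev(series)

    def count_matches(length):
        # Use the same N - m templates for both lengths; skip self-matches.
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    matches += 1
        return matches

    b = count_matches(m)       # pairs similar over m points
    a = count_matches(m + 1)   # pairs still similar over m + 1 points
    if a == 0 or b == 0:
        return float("inf")    # no matches: entropy is undefined, reported as inf
    return -math.log(a / b)


# Hypothetical data: a clean seasonal cycle versus Gaussian noise.
periodic = [math.sin(2 * math.pi * t / 12) for t in range(120)]
rng = random.Random(1)
noisy = [rng.gauss(0.0, 1.0) for _ in range(120)]

se_periodic = sample_entropy(periodic)
se_noisy = sample_entropy(noisy)
```

The regular series receives the lower entropy. Note that `r` is recomputed from each series' own standard deviation, which is the dependence discussed in the paragraph above.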

Given these properties, entropy measures have been increasingly applied to domains such as tourism, where complex and non-linear dynamics prevail (Law et al., 2019).

Entropy Measures for Tourism Demand Predictivity

The application of entropy measures to tourism demand forecasting requires careful consideration of the data’s specific characteristics; successful applications in related fields provide a basis for their use. For example, Song et al. (2010) effectively employed entropy to quantify predictability in human mobility, a domain closely related to tourism. This demonstrates the potential of entropy to capture underlying regularities and constraints that influence demand patterns, justifying further exploration of entropy measures in the context of tourism demand.

The suitability of entropy measures for tourism demand data depends on their ability to address its complex characteristics, including strong seasonality, non-stationarity, and vulnerability to external shocks (Song et al., 2019). These features make selecting an appropriate entropy measure particularly challenging.

Sample Entropy has been applied in tourism demand predictivity studies due to its robustness to noise and effectiveness in detecting recurring patterns (Zhang et al., 2021a, 2021b). However, its dependence on the tolerance parameter $r$ can reduce reliability when structural breaks occur, such as those triggered by the COVID-19 pandemic.

Shannon Entropy and Permutation Entropy offer simpler alternatives but do not capture magnitude variations or higher-order dynamics (Adelyanov et al., 2024). In contrast, Weighted Permutation Entropy (WPE) provides a more balanced approach by incorporating both ordinal and amplitude information, making it well-suited for tourism demand data characterized by sudden volatility or long-term structural changes. Table 1 compares the entropy measures discussed, summarizing their main features, strengths, limitations, and applicability to tourism demand analysis.

Table 1.

Summary of Shannon Entropy, Permutation Entropy, WPE, and SampEn

Entropy measure	Basis of entropy measure	Strengths	Weaknesses	Applicability to tourism demand	Practical challenges
Shannon entropy	Probabilistic distribution of data values	Simple to compute and interpret	Sensitive to noise and outliers; ignores temporal structure	Applicable to stationary demand	Requires sufficient data to estimate reliable probability distributions
Permutation entropy (PE)	Ordinal patterns: analyzes relative ordering of data points within sequences	Robust to noise; applicable to deterministic and stochastic systems	Ignores amplitude variations; limited to ordinal patterns	Suitable for detecting structural changes and seasonal patterns in tourism trends	Choice of the embedding dimension can affect results
Weighted permutation entropy (WPE)	Ordinal patterns with a weighting function to capture local variations	Improves on PE by using the weighting function to capture the magnitude of fluctuations	Weighting function design can introduce bias; requires parameter tuning and sufficient data length for seasonal time series	Evaluating structural and magnitude-based predictivity trends	Sensitive to the choice of the parameters ( $m$ , $τ$ )
Sample entropy (SampEn)	Pattern regularity: evaluates similarity between subsequences within a tolerance ( $r$ )	Ignores minor noise fluctuations; uses relative matching between patterns	Unreliable in the presence of structural breaks due to inflated tolerance parameter ( $r$ )	Useful for assessing regularities in tourism demand	Requires adoption of a resampling approach for non-stationary data

Zhang et al. (2021a, 2021b) provided a methodological framework for assessing the maximum predictivity of tourism data using entropy measures. Their approach involved calculating SampEn (MulSampEn) and then relating these entropy values to maximum predictivity using Fano’s inequality. This approach demonstrates that entropy measures can be adapted to quantify the theoretical limits in the predictivity of tourism demand data.
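To illustrate how an entropy value can be turned into a maximum predictability through Fano's inequality, the sketch below uses the relation from Song et al. (2010), $S = H(Π) + (1 - Π) \log_{2}(N - 1)$, where $H$ is the binary entropy and $N$ is the number of distinct states. Whether Zhang et al. use exactly this variant, and how $N$ is chosen for tourism data, are assumptions here. The function names are hypothetical.

```python
import math


def fano_function(p, n_states):
    """Right-hand side of the Fano relation: H(p) + (1 - p) * log2(N - 1)."""
    binary_entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            binary_entropy -= q * math.log2(q)
    return binary_entropy + (1.0 - p) * math.log2(n_states - 1)


def max_predictability(entropy, n_states, tol=1e-10):
    """Solve fano_function(p) = entropy for p in [1/N, 1] by bisection."""
    if not 0.0 <= entropy <= math.log2(n_states):
        raise ValueError("entropy must lie in [0, log2(n_states)]")
    lo, hi = 1.0 / n_states, 1.0  # the function decreases on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_function(mid, n_states) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Zero entropy gives predictability 1; maximal entropy gives 1/N.
p_certain = max_predictability(0.0, 8)
p_random = max_predictability(3.0, 8)
p_middle = max_predictability(1.5, 8)
```

Bisection is used because the relation has no closed-form inverse; the function is monotonically decreasing for $Π \geq 1/N$, so the root is unique.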

The methodological framework proposed by Zhang et al. (2021a, 2021b), while offering valuable insights towards quantifying the predictivity of tourism demand using entropy measures, exhibits several limitations that require further investigation. First, their study lacks a robust rationale for the selection of SampEn as a suitable measure for assessing the intrinsic predictivity of tourism demand data. This lack of clear justification raises concerns about the fundamental validity of using SampEn to accurately reflect the inherent characteristics of tourism demand data. Second, the scope of their analysis regarding external shocks is limited to the SARS outbreak in 2003. While significant, the impact of SARS was relatively contained and short-lived compared to the profound and prolonged impact caused by the COVID-19 pandemic (Song et al., 2022). This limited consideration of external shocks raises questions about the generalizability and robustness of their findings, particularly in the context of more severe and prolonged global crises that significantly alter tourism demand patterns.

Summary

Although entropy measures have shown promise in evaluating the predictivity of time series, their application to tourism demand forecasting remains limited and fragmented. Most existing studies emphasize realized predictivity through model-dependent evaluations (Zhang et al., 2021a, 2021b), with insufficient attention to intrinsic data characteristics.

The work of Zhang et al. (2021a, 2021b) has made notable progress using SampEn (MulSampEn) to quantify the intrinsic predictivity of tourism demand data. However, their approach overlooks a key methodological limitation inherent to this entropy measure, which leads to incorrect evaluation of predictivity under external shocks, such as the COVID-19 pandemic.

This limitation arises from the use of the tolerance parameter $r$, which is defined relative to the standard deviation ( $σ$ ) of the time series. During periods of large-scale fluctuations, this dependence causes $r$ to increase, making more embedding vectors appear similar when compared using Chebyshev distance (Richman et al., 2004). Consequently, the computed entropy artificially decreases, resulting in a higher predictivity estimate when, in fact, the underlying dynamics have become more uncertain. To overcome this issue, the present study proposes WPE as a more robust alternative for capturing the intrinsic predictivity of tourism demand data.
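The mechanism can be illustrated with a small sketch, assuming a tolerance of 0.2 times the standard deviation and a made-up arrivals series that collapses to a low constant level for its final 24 months. All numbers are hypothetical. The same pre-shock templates are compared twice: once with the tolerance computed before the shock and once with the tolerance inflated by the shock.

```python
import math
import statistics


def count_similar_pairs(series, m, r):
    """Count template pairs of length m within Chebyshev distance r."""
    templates = [series[i:i + m] for i in range(len(series) - m + 1)]
    return sum(
        1
        for i in range(len(templates))
        for j in range(i + 1, len(templates))
        if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r
    )


# Hypothetical arrivals: seasonal pattern with a mild trend, then a collapse.
pre_shock = [1000 + 2 * t + 200 * math.sin(2 * math.pi * t / 12) for t in range(96)]
with_shock = pre_shock + [50.0] * 24

r_pre = 0.2 * statistics.pstdev(pre_shock)
r_shock = 0.2 * statistics.pstdev(with_shock)  # inflated by the collapse

# Identical pre-shock templates, judged with the two tolerances.
pairs_pre = count_similar_pairs(pre_shock, 2, r_pre)
pairs_inflated = count_similar_pairs(pre_shock, 2, r_shock)
```

With the inflated tolerance, more of the unchanged pre-shock templates count as similar, which is the effect that pushes the computed entropy down.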

The Predictivity of Tourism Demand Data

The research gap outlined previously motivates the present study to propose Weighted Permutation Entropy (WPE) as a superior alternative. WPE’s enhanced ability to capture temporal dynamics, coupled with its robustness and efficiency, offers a more reliable and practical approach to assessing the intrinsic predictivity of tourism demand, particularly in the case of disruptive events where existing measures fall short. Additionally, its simplified parameter tuning and lower computational complexity compared to other entropy measures enhance its practicality for real-world applications (Adelyanov et al., 2024). This section presents the calculation and parameter tuning of WPE for assessing the intrinsic predictivity of tourism demand data.

Formalization of Tourism Demand Forecasting

In this study, the tourism demand forecasting task is defined as predicting the number of future tourist arrivals at destinations using past tourism data. The forecasting relies exclusively on historical data presented as a univariate time series (Zhang et al., 2020).

Let the time series be represented by the vector $Y^{T} = \{y^{(1)}, y^{(2)}, \ldots, y^{(T)}\}$ with length $T$, where $\{y^{(i)}\}_{i = 1}^{k}$ serves as the input, and $\{y^{(i)}\}_{i = k + 1}^{k + δ}$ represents the forecasted values, with $δ$ indicating the number of steps ahead to predict (Zhang et al., 2021a, 2021b).

The forecasting model, denoted as $F$, uses the input segment $\{y^{(i)}\}_{i = 1}^{k}$ to estimate the future values $\{y^{(i)}\}_{i = k + 1}^{k + δ}$, and the accuracy of the model is assessed by metrics such as the Root Mean Square Error (RMSE) (Zhang et al., 2021a, 2021b).
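The formalization above can be sketched in a few lines. Here `split_series` separates the input segment from the $δ$ values to be forecast, and a seasonal-naive rule stands in for the model $F$. The stand-in model, the function names, and the synthetic series are assumptions made purely for illustration; they are not the models evaluated in this study.

```python
import math


def split_series(series, k, delta):
    """Return the input segment y^(1..k) and the target y^(k+1..k+delta)."""
    if k + delta > len(series):
        raise ValueError("k + delta exceeds the series length")
    return series[:k], series[k:k + delta]


def seasonal_naive(history, delta, period=12):
    """Hypothetical stand-in for F: repeat the value from one period earlier."""
    extended = list(history)
    for _ in range(delta):
        extended.append(extended[-period])
    return extended[len(history):]


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Perfectly periodic toy series: five years of monthly values.
series = [100 + 20 * math.sin(2 * math.pi * t / 12) for t in range(60)]
inputs, targets = split_series(series, k=48, delta=12)
forecast = seasonal_naive(inputs, delta=12)
error = rmse(targets, forecast)
```

On this noise-free periodic series the seasonal-naive forecast reproduces the targets up to rounding, so the RMSE is essentially zero; real arrival data would not behave this way.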

Predictivity Based on Weighted Permutation Entropy (WPE)

The WPE extends the standard Permutation Entropy (PE) by capturing the dynamic changes in the amplitude of the time series (Pennekamp et al., 2019). The embedding procedure forms $T - τ (m - 1)$ vectors $Y(1), Y(2), \ldots, Y(T - τ (m - 1))$, with $Y(i) = [y(i), y(i + τ), \ldots, y(i + τ (m - 1))]$ for $i = 1, \ldots, T - τ (m - 1)$. Then the $m$ real values of each vector $Y(i)$ are sorted in increasing order: $y(i + τ (k_{1} - 1)) \leq y(i + τ (k_{2} - 1)) \leq \ldots \leq y(i + τ (k_{m} - 1))$, with $1 \leq k_{1}, k_{2}, \ldots, k_{m} \leq m$. As a result, any vector $Y(i)$ is mapped into a vector $K_{m} = [k_{1}, k_{2}, \ldots, k_{m}]$, where each original value is replaced by its ordinal index (equivalently written as $\{0, 1, \ldots, m - 1\}$ under zero-based indexing). The contribution of each pattern $K_{m}$ to the probability mass function is weighted by the variance of the vector it was obtained from, $w(K_{m}) \equiv \mathrm{var}(Y(i))$, and the weighted probability of each permutation $π$ is estimated by:

$$p_{w}(π) = \frac{\sum_{i \leq T - τ (m - 1)} w(K_{m}) \cdot δ(ϕ(K_{m}), π)}{\sum_{i \leq T - τ (m - 1)} w(K_{m})} \tag{1}$$

where $δ(k_{i}, k_{j}) = 1$ if $k_{i} = k_{j}$ and $δ(k_{i}, k_{j}) = 0$ otherwise (Pennekamp et al., 2019). The weighted permutation entropy of order $m \geq 2$ is defined as:

$$h_{w}(m) = - \sum_{π \in K_{m}} p_{w}(π) \log_{2}(p_{w}(π)) \tag{2}$$

Then the WPE is normalized by the maximum possible PE, and the predictivity (WPE) is calculated as:

$$Ψ_{h_{w}(m)} = 1 - \frac{h_{w}(m)}{h_{\max}(m)} \tag{3}$$

where $h_{\max}(m) = \log_{2}(m!)$ and $0 \leq h_{w}(m) \leq h_{\max}(m)$. Thus, the predictivity value is within the range $[0, 1]$, where $0$ represents a completely random and unpredictable time series, and $1$ indicates a fully deterministic and perfectly predictable time series. This approach to estimating predictivity is conceptually different from that of Zhang et al. (2021a, 2021b), as it expresses predictivity relative to the bound of complete randomness or unpredictability in a time series.
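The calculation in Equations (1) to (3) can be sketched as below. This is an illustrative reading of the formulas rather than the study's actual code: the function name `wpe_predictivity`, the tie-breaking rule, the use of the population variance, and the handling of a constant series are all assumptions.

```python
import math
import random
from collections import defaultdict


def wpe_predictivity(series, m=3, tau=1):
    """Predictivity 1 - h_w(m) / log2(m!) following Eqs. (1)-(3)."""
    n_vectors = len(series) - tau * (m - 1)
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    weight_by_pattern = defaultdict(float)
    for i in range(n_vectors):
        vector = [series[i + j * tau] for j in range(m)]
        # Ordinal pattern: indices that sort the vector (ties keep time order).
        pattern = tuple(sorted(range(m), key=lambda k: vector[k]))
        mean = sum(vector) / m
        weight = sum((v - mean) ** 2 for v in vector) / m  # variance of Y(i)
        weight_by_pattern[pattern] += weight
    total = sum(weight_by_pattern.values())
    if total == 0.0:
        return 1.0  # constant series: treated as fully predictable (assumed convention)
    h_w = -sum(
        (w / total) * math.log2(w / total)
        for w in weight_by_pattern.values()
        if w > 0.0
    )
    return 1.0 - h_w / math.log2(math.factorial(m))


# Toy check: a monotone ramp versus uniform noise (synthetic data).
ramp = list(range(50))
rng = random.Random(0)
noise = [rng.random() for _ in range(500)]

p_ramp = wpe_predictivity(ramp)
p_noise = wpe_predictivity(noise)
```

The ramp produces a single ordinal pattern and therefore a predictivity of 1, while the noise series spreads its weight across all $m!$ patterns and lands close to 0.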

Parameter Tuning for WPE

The calculation of WPE relies on the selection of key parameters: the embedding dimension (m) and the time delay ( $τ$ ). These parameters significantly influence the resulting entropy value and, consequently, the predictivity metric.

Choosing appropriate values for $m$ and $τ$ is essential for obtaining meaningful results. Riedl, Oertel, and Wessel (2013) provide some heuristics, but there is no single rule of thumb that fits all cases. The optimal parameter selection depends on the characteristics of the time series being analyzed, such as its length, sampling frequency, and underlying dynamics (Riedl, Oertel, and Wessel, 2013).

For measuring the complexity of the main oscillation (seasonal cycle), Riedl, Oertel, and Wessel (2013) suggest setting the time delay $τ$ equal to the period length of the main oscillation, for instance, $τ = 12$ for monthly data with annual seasonality. To measure processes that possess an inherent cycle represented by triggering events, Riedl, Oertel, and Wessel (2013) recommend $τ = 1$. As for the embedding dimension $m$, it is recommended to choose the maximum value that satisfies $N > 5 m!$, where $N$ is the length of the time series.
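The length rule $N > 5 m!$ acts as an upper bound on $m$ and is easy to evaluate directly. The sketch below computes that bound and the number of delay vectors $T - τ (m - 1)$ left for the statistics; the helper names are hypothetical. For a series of 406 monthly observations, as used in this study, the rule admits values up to $m = 4$, so smaller choices such as $m = 3$ also satisfy it.

```python
import math


def max_embedding_dimension(n, factor=5, m_min=2):
    """Largest m such that n > factor * m! (the N > 5 m! rule of thumb)."""
    m = m_min
    if n <= factor * math.factorial(m):
        raise ValueError("series too short even for the smallest m")
    while n > factor * math.factorial(m + 1):
        m += 1
    return m


def n_embedding_vectors(n, m, tau):
    """Number of delay vectors T - tau * (m - 1) available for the statistics."""
    return n - tau * (m - 1)


upper_bound = max_embedding_dimension(406)         # 5 * 4! = 120 < 406 < 600 = 5 * 5!
vectors_m3_tau12 = n_embedding_vectors(406, 3, 12)  # 406 - 12 * 2
```

A larger delay reduces the number of vectors available, which is one reason to stay below the upper bound when $τ$ is set to the seasonal period.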

Figure 1(a) and (b) illustrate an example of WPE parameter tuning for a monthly time series of length $N = 406$. The optimal delay parameter within the range 1–20 is observed at $τ = 12$, corresponding to a local minimum in WPE (Riedl, Oertel, and Wessel, 2013). Given the length of the time series, an embedding dimension of $m = 3$ is appropriate. As the embedding dimension should reveal forbidden permutations and ensure reliable statistics, it typically requires an average of 100 counts per permutation (Garland et al., 2014). Therefore, for this monthly time series, the optimal delay parameter is $τ = 12$ and the embedding dimension is $m = 3$.

Figure 1.

WPE With $m$ = 3–11 and Time Delay $τ$ = 2–23

To optimally select the embedding dimension m and time delay $τ$ simultaneously, a generic optimization method such as grid search is employed. In this approach, m varies from 3 to 11, as values below 3 do not provide sufficient permutation space. The time delay $τ$ depends not only on the length of the time series but also on its seasonality. As shown in Figure 1, the optimal embedding dimension is m = 3 and the optimal time delay is $τ = 12$ , corresponding to the minimum Weighted Permutation Entropy (WPE). The results also highlight a prominent half-year pattern for the time delay, suggesting that selecting samples spaced six months apart is preferable for capturing the underlying dynamics.
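A grid search of this kind might look like the following sketch. It runs on a synthetic seasonal series, not on the Australian data, and the skip rule for parameter pairs with too few delay vectors is an added assumption rather than part of the described procedure. No particular winning pair is implied for real data.

```python
import math
import random
from collections import defaultdict


def normalized_wpe(series, m, tau):
    """Weighted permutation entropy divided by log2(m!), in [0, 1]."""
    n_vectors = len(series) - tau * (m - 1)
    weights = defaultdict(float)
    for i in range(n_vectors):
        vector = [series[i + j * tau] for j in range(m)]
        pattern = tuple(sorted(range(m), key=lambda k: vector[k]))
        mean = sum(vector) / m
        weights[pattern] += sum((v - mean) ** 2 for v in vector) / m
    total = sum(weights.values())
    if total == 0.0:
        return 0.0
    h_w = -sum((w / total) * math.log2(w / total) for w in weights.values() if w > 0.0)
    return h_w / math.log2(math.factorial(m))


def grid_search(series, m_values, tau_values):
    """Return the (wpe, m, tau) tuple with the lowest WPE, plus all results."""
    results = []
    for m in m_values:
        for tau in tau_values:
            # Assumed guard: skip settings with too few delay vectors.
            if len(series) - tau * (m - 1) < 5 * math.factorial(m):
                continue
            results.append((normalized_wpe(series, m, tau), m, tau))
    return min(results), results


# Synthetic monthly series with trend, annual cycle, and noise (hypothetical).
rng = random.Random(42)
series = [
    1000 + 3 * t + 300 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 40)
    for t in range(406)
]
best, results = grid_search(series, range(3, 12), range(1, 24))
best_wpe, best_m, best_tau = best
```

Because WPE is normalized by $\log_{2}(m!)$, values for different $m$ are on a common scale and can be compared within the same grid.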

The Empirical Validation of the Predictivity Metric

This section empirically assesses the predictivity of tourism demand data by addressing the core research question: How to define a metric that effectively captures the intrinsic predictivity of tourism demand data, particularly under conditions of volatility and disruption?

The section begins by presenting the validation framework and methods used to validate the metric. This is followed by a detailed rationale and description of the data used in the analysis. The predictivity of Australia’s tourism demand is then assessed using the proposed predictivity metric and compared against the baseline metric based on SampEn (MulSampEn). Subsequently, the empirical validation of the proposed predictivity metric is carried out within the established framework. All experiments are implemented in Python.

The Predictivity Metric Validation Framework

The framework, illustrated in Figure 2, assesses the effectiveness of the predictivity metric by measuring its correlation with the realized predictivity of tourism demand data.

Figure 2.

The Predictivity Metric Validation Framework

The validation framework involves the following steps: (1) Calculate the predictivity for a diverse set of tourism demand datasets. (2) Apply established time series forecasting models (e.g., ARIMA, ARIMAX) to these datasets and quantify their realized predictivity with forecasting error (e.g., RMSPE). (3) Analyze the correlation between the predictivity metric and forecasting error.

A strong negative correlation would indicate that higher predictivity (lower complexity) corresponds to lower forecasting errors (higher realized predictivity), thus validating the proposed metric.

Methods Used in the Predictivity Validation Framework

The methods used in the predictivity metric validation framework include predictivity metrics, forecasting models, forecasting error, and correlation coefficients.

Because the code of CIR# (Bufalo & Orlando, 2023b) is not available, the Autoregressive Integrated Moving Average (ARIMA) model, its variations, Prophet, and CIR (Bufalo & Orlando, 2023a) are adopted as baseline approaches for forecasting tourist arrivals. Forecast results are evaluated using the Root Mean Squared Percentage Error (RMSPE), Mean Absolute Percentage Error (MAPE), and Normalized Root Mean Squared Error (NRMSE), as defined in equation (4).

$$RMSPE = \sqrt{\frac{1}{T} \sum_{t = 1}^{T} \left(\frac{| y_{t} - y_{t}^{'} |}{y_{t}}\right)^{2}}$$

$$MAPE = \frac{1}{T} \sum_{t = 1}^{T} \left(\frac{| y_{t} - y_{t}^{'} |}{y_{t}}\right)$$

$$NRMSE = \frac{\sqrt{\frac{1}{T} \sum_{t = 1}^{T} \left(| y_{t} - y_{t}^{'} |\right)^{2}}}{σ_{y}} \tag{4}$$

where $T$ is the total number of observations, $y_{t}^{'}$ is the predicted tourist arrivals, and $y_{t}$ is the actual tourist arrivals. The term $σ_{y}$ is the mean value of all actual tourist arrivals.
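The three error measures in equation (4) translate directly into code. The sketch below follows the definitions as stated, including the normalization of NRMSE by the mean of the actual values, and assumes all actual values are non-zero. The sample numbers are made up for illustration.

```python
import math


def rmspe(actual, predicted):
    """Root Mean Squared Percentage Error (actual values must be non-zero)."""
    t = len(actual)
    return math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / t)


def mape(actual, predicted):
    """Mean Absolute Percentage Error (actual values must be non-zero)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """RMSE divided by the mean of the actual values, as the text describes."""
    t = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / t)
    return rmse / (sum(actual) / t)


# Hypothetical arrivals and forecasts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

e_rmspe = rmspe(actual, predicted)  # relative errors: 0.1, 0.1, 0.0
e_mape = mape(actual, predicted)
e_nrmse = nrmse(actual, predicted)
```

RMSPE and MAPE are scale-free because each error is divided by its own actual value, whereas NRMSE scales the overall RMSE by a single summary of the series.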

Two rank-based non-parametric correlation coefficients, Kendall’s and Spearman’s, are employed to assess the relationship between the intrinsic and realized predictivity. Non-parametric methods are suitable when the underlying data distribution is unknown or when dealing with small sample sizes (Denœux et al., 2005), conditions that are common for tourism demand data. These coefficients assess the strength and direction of the monotonic association between predictivity metric rankings and the corresponding RMSPE rankings.

Kendall’s coefficient is a conjoint unweighted rank measure, which reflects the agreement between the rankings and provides a direct probabilistic interpretation in terms of whether two items $i, j$ are ranked in the same order in both rankings (Webber et al., 2010). The formula for Kendall’s coefficient is given below:

$$τ = \frac{C - D}{\binom{n}{2}} \tag{5}$$

where $C$ is the number of concordant pairs, $D$ is the number of discordant pairs, and $\binom{n}{2} = \frac{n (n - 1)}{2}$ is the total number of possible pairs for $n$ datasets.
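Equation (5) can be evaluated by counting concordant and discordant pairs directly, as in the sketch below. Tied pairs are counted as neither, which is an assumption since the text does not discuss ties. The predictivity and RMSPE values are invented to illustrate the expected negative association.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau as (C - D) / (n choose 2); tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical values: predictivity of five datasets and their RMSPE.
predictivity = [0.10, 0.25, 0.40, 0.55, 0.70]
rmspe = [0.30, 0.22, 0.25, 0.12, 0.08]

tau = kendall_tau(predictivity, rmspe)  # 1 concordant pair, 9 discordant pairs
```

With one concordant and nine discordant pairs out of ten, the coefficient is -0.8. Note that this $τ$ is Kendall's coefficient and is unrelated to the time delay used for WPE.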

Spearman’s correlation coefficient measures the strength and direction of a monotonic relationship between two ranked variables. It is computed by first determining the differences between the ranks of the corresponding values and then applying the formula:

$$ρ = 1 - \frac{6 \sum d_{i}^{2}}{n (n^{2} - 1)} \tag{6}$$

where $d_{i}$ represents the difference between ranks, and $n$ is the total number of observations (Pan & Takefuji, 2025).
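A direct implementation of equation (6) is sketched below. It assumes there are no tied values, since the rank-difference formula is exact only in that case. The input values are hypothetical.

```python
def ranks(values):
    """Rank from 1 (smallest) to n; assumes no ties, as Eq. (6) requires."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho from squared rank differences."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical values: predictivity of five datasets and their RMSPE.
predictivity = [0.10, 0.25, 0.40, 0.55, 0.70]
rmspe = [0.30, 0.22, 0.25, 0.12, 0.08]

rho = spearman_rho(predictivity, rmspe)  # rank differences: -4, -1, -1, 2, 4
```

The squared rank differences sum to 38, giving a coefficient of -0.9 for these made-up values.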

Both correlation coefficients range from $- 1$ to $1$ . If a coefficient value is negative, this implies that the predictivity values are inversely correlated with RMSPE. A value of $- 1$ indicates a perfect negative monotonic relationship, where higher predictivity values correspond to lower RMSPE values consistently across all datasets. In this context, a negative correlation coefficient suggests that as predictivity increases (or complexity decreases), the RMSPE tends to decrease, which is a desirable outcome indicating stronger potential of the predictivity metric. A positive correlation value, or close to zero, on the other hand, implies little to no meaningful relationship between the predictivity metric and forecasting error. This would suggest that the metric fails to capture the intrinsic predictivity of the data.

Rationale and Description of the Australian Tourism Demand Data

This study uses monthly statistics of short-term international visitor arrivals to Australia, sourced from the Australian Bureau of Statistics (ABS, 2024), spanning multiple years and including both national and state-level data, which allows for analysis across different temporal and spatial scales. Australia is chosen for three main reasons. First, tourism is one of Australia’s fastest-growing industries and a major export sector, and during the COVID-19 pandemic, the country experienced losses of nearly A$9 billion per month in tourism revenue and over 300,000 job losses (Pham et al., 2021), making it economically significant for assessing tourism demand predictability. Second, the richness of the data enables robust testing of the Weighted Permutation Entropy (WPE) predictivity metric across multiple datasets with varying lengths and seasonal patterns. Third, no previous study has specifically applied WPE to Australia’s tourism demand, making this research novel and providing insights into a diverse tourism market, which can support the development of more accurate forecasting models. By leveraging this comprehensive dataset, the study assesses WPE’s ability to capture intrinsic predictability in a real-world, economically important setting. The data consist of 406 monthly records of international tourist arrivals in the Australian states and territories from January 1991 to October 2024 and include nine distinct datasets that are presented in Figure 3. It is observed that the tourist flow increased from 1991 to February 2020, followed by a sharp decline due to the COVID-19 travel restrictions introduced in March 2020. The international border restrictions were subsequently lifted on July 6, 2022 (The Department of Home Affairs, 2022).

Figure 3.

The Graph Shows Monthly International Tourist Arrivals in Australia and Its States and Territories From January 1991 to October 2024

All analyses are conducted using the complete historical datasets. Each dataset is divided into sub-series of varying lengths using a sliding-window approach, which moves a window of a specific length sequentially along the full time series to generate multiple overlapping sub-series. During the validation with forecasting models, each sub-series is split into training and test sets. Forecasting models are trained on the training sets, and forecasts are generated for the test periods. Forecast accuracy is then assessed by calculating the forecasting error over these test sets. The analysis is conducted across three periods: the full period (January 1991 to October 2024), the pre-COVID period (up to February 2020), and the post-COVID period (from February 2020 onward). The monthly time series are also aggregated into corresponding quarterly (3-month) and half-yearly (6-month) series to examine the impact of data characteristics on predictivity. The analysis is conducted with Python 3.1, NumPy 2.0, and statsmodels 0.13.
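The data preparation described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function names are hypothetical, the window is assumed to advance one month at a time, aggregation is assumed to sum arrivals over complete blocks, the 12-observation hold-out mirrors the 12-month horizon used later in the validation, and the input is a placeholder sequence instead of real arrival counts.

```python
def sliding_windows(series, length):
    """Return every overlapping sub-series of the given length (step of one)."""
    return [series[start:start + length]
            for start in range(len(series) - length + 1)]


def aggregate(series, factor):
    """Sum consecutive blocks of `factor` months (3 = quarterly, 6 = half-yearly).

    Assumption: arrivals are summed and an incomplete trailing block is dropped.
    """
    usable = len(series) - len(series) % factor
    return [sum(series[i:i + factor]) for i in range(0, usable, factor)]


def train_test_split(sub_series, horizon=12):
    """Hold out the last `horizon` observations as the test set."""
    return sub_series[:-horizon], sub_series[-horizon:]


# Placeholder data standing in for the 406 monthly records (not real arrivals).
monthly = list(range(1, 407))

windows = sliding_windows(monthly, 288)       # 24-year sub-series
train, test = train_test_split(windows[0])    # 276 training months, 12 test months
quarterly = aggregate(monthly, 3)             # 135 complete quarters
half_yearly = aggregate(monthly, 6)           # 67 complete half-years
print(len(windows), len(train), len(test), len(quarterly), len(half_yearly))
```

With 406 monthly records, a 288-month window yields 406 - 288 + 1 = 119 overlapping sub-series over the full period.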

Predictivity and Complexity Assessment Using WPE and SampEn (MulSampEn)

This section compares intrinsic predictivity and complexity measured by WPE and SampEn (MulSampEn) across time series of varying lengths and scales (granularity). The assessment is conducted on the time series of the “Total” dataset, with data lengths ranging from 5 to 32 years and temporal scales of monthly, quarterly and half-yearly, separately on the full and the pre-COVID periods. Mean predictivity and entropy values are calculated using a sliding-window approach: for each sub-series, the entropy measures and predictivity metrics are calculated, and the average value is taken across all sub-series of a given length.

Predictivity Assessment With WPE

The complexity of the time series is measured using WPE, with the embedding dimension set to $m = 3$ and the delay parameter set to $τ = 12$ for monthly, 4 for quarterly, and 2 for half-yearly series. The corresponding predictivity metric is calculated using equation (3). According to the guideline by Riedl et al. (2013), the sub-series length must be at least $5 m!$ to ensure reliable entropy estimation. Thus, the minimum lengths for quarterly and half-yearly time series are 8 and 16 years, respectively.
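To make the role of $m$ and $τ$ concrete, here is a minimal sketch of normalized WPE using the commonly used variance-weighted formulation. It is assumed, not confirmed, that this matches the definition given earlier in the paper; the function names are hypothetical, and the mapping from WPE to predictivity in equation (3) is not reproduced here. The synthetic series is a linear trend plus a 12-month cycle, not tourism data.

```python
import math
from collections import defaultdict


def weighted_permutation_entropy(series, m=3, tau=12):
    """Normalized WPE in [0, 1] for embedding dimension m and delay tau."""
    pattern_weight = defaultdict(float)
    for start in range(len(series) - (m - 1) * tau):
        vector = [series[start + k * tau] for k in range(m)]
        # Ordinal pattern: the index order that sorts the vector.
        pattern = tuple(sorted(range(m), key=lambda k: vector[k]))
        # Weight: variance of the vector, so large-amplitude patterns count more.
        mean = sum(vector) / m
        pattern_weight[pattern] += sum((v - mean) ** 2 for v in vector) / m
    total = sum(pattern_weight.values())
    if total == 0:  # constant series: no amplitude information at all
        return 0.0
    entropy = 0.0
    for weight in pattern_weight.values():
        if weight > 0:
            p = weight / total
            entropy -= p * math.log(p)
    return entropy / math.log(math.factorial(m))


def min_required_length(m):
    """Minimum number of observations under the 5 * m! guideline."""
    return 5 * math.factorial(m)


# Synthetic example: upward trend plus a 12-month seasonal cycle.
seasonal = [100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / 12) for t in range(144)]

wpe_seasonal_delay = weighted_permutation_entropy(seasonal, m=3, tau=12)
wpe_unit_delay = weighted_permutation_entropy(seasonal, m=3, tau=1)
print(min_required_length(3), wpe_seasonal_delay, wpe_unit_delay)
```

With the delay matched to the seasonal period, every embedding vector of this synthetic series compares the same calendar month across three consecutive years, so only one ordinal pattern occurs and the WPE is zero. With a delay of 1, the within-year rises and falls produce several patterns and a positive WPE.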

Figure 4(a) and (b) present the complexity and predictivity assessed with WPE for the full and pre-COVID time series. For the pre-COVID series, where external shocks are absent, complexity tends to decrease and predictivity (WPE) improves at larger scales, reaching its maximum at the largest scale (half-yearly) and the longest series (29 years). However, the impact of COVID disrupts this trend in the full series, leading to an increase in complexity and a decrease in predictivity. This suggests that COVID increases volatility and thereby reduces predictivity, which makes the series more difficult to forecast.

Figure 4.

The WPE and Predictivity of the Full and Pre-COVID Series by Time Scales: Monthly, Quarterly, and Half-Yearly. The COVID Period Introduces Additional Volatility, Resulting in a Decrease in Predictivity, Which Makes Predictions Less Reliable and More Complex to Model. This Requires Adjustments to Forecasting Models to Account for Structural Breaks and Changing Dynamics

The results indicate that larger data scales can enhance predictivity during stable periods, whereas significant disruptions can alter the underlying structure and predictivity of the data. This demonstrates that WPE effectively captures changes in predictivity caused by external shocks.

Predictivity Assessment With SampEn, MulSampEn, Shannon, and Permutation Entropy

The complexity of the monthly time series is assessed using SampEn, and the quarterly and half-yearly series are assessed using MulSampEn. MulSampEn extends SampEn by capturing the complexity of the time series at multiple scaling levels (Zhang et al., 2021a, 2021b). The resulting predictivity metric based on these entropy measures, denoted as $Ψ_{\max}$, is calculated following the approach outlined by Zhang et al. (2021a, 2021b).

For SampEn and MulSampEn, the following parameters are set: the embedding dimension $m = 3$, delay $τ = 1$, and tolerance distance $r = 0.65 \times σ$. As shown in Figure 5(a) and (b), the SampEn, MulSampEn, and $Ψ_{\max}$ of the full and pre-COVID time series exhibit almost no difference in trend, suggesting that the effects of COVID are not strongly reflected by these entropy measures. This limitation arises because the standard deviation ($σ$) increases when there is a significant change in the trend, which expands the tolerance distance $r$. As a result, a larger number of embedding vectors, compared by Chebyshev distance, fall within $r$ and are counted as similar, leading to a decrease in measured complexity and an increase in $Ψ_{\max}$. This effect persists even when the coefficient of $r$ is reduced, exposing a methodological limitation in SampEn and MulSampEn.
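The tolerance mechanism can be illustrated with a small sketch of SampEn using $m = 3$, delay 1, Chebyshev distance, and $r = 0.65 \times σ$. The function name, the use of the population standard deviation, the inclusive comparison with $r$, and both synthetic series are assumptions made for illustration. The second series copies the first but collapses to 5% of its level for the last 30 observations, a crude stand-in for a border closure.

```python
import math
from statistics import pstdev


def sample_entropy(series, m=3, r=None, coeff=0.65):
    """SampEn with delay 1, Chebyshev distance and tolerance r = coeff * sigma."""
    if r is None:
        r = coeff * pstdev(series)  # assumption: population standard deviation
    n = len(series)

    def count_matches(length):
        # Use the same n - m template start points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    return math.inf if a == 0 or b == 0 else -math.log(a / b)


# Synthetic seasonal series, and a copy whose level collapses near the end.
stable = [100 + 10 * math.sin(2 * math.pi * t / 12) + 2 * math.sin(1.7 * t)
          for t in range(120)]
shocked = stable[:90] + [0.05 * v for v in stable[90:]]

r_stable = 0.65 * pstdev(stable)
r_shocked = 0.65 * pstdev(shocked)
print(round(r_stable, 2), round(r_shocked, 2))
print(sample_entropy(stable), sample_entropy(shocked))
```

Because the level shift dominates the standard deviation of the shocked series, its tolerance becomes wider than the entire seasonal swing. Almost all templates within the same regime are then counted as matches, which is the effect described above.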

Figure 5.

SampEn and MulSampEn Are Unable to Detect Disruptions Like COVID-19 in Tourism Demand Data, Due to a Methodological Limitation Rooted in the Tolerance Parameter r

The limitation of SampEn and MulSampEn highlights the potential of WPE in assessing the intrinsic predictivity of tourism demand data, especially under external shocks.

Similarly, Shannon and Permutation Entropy are evaluated on both the full and pre-COVID time series, based on the corresponding predictivity measures. In this experiment, the embedding dimension is set to $m = 3$ for both methods, and the time delay is $τ = 12$ for Permutation Entropy. Given the clear monthly patterns in the data, only the monthly time series is used for simplicity. The maximum predictivity $Ψ_{\max}$ for the full and pre-COVID time series is shown in Figure 6. Neither Shannon nor Permutation Entropy is able to detect disruptions such as COVID-19 in tourism demand data: the $Ψ_{\max}$ patterns for the full and pre-COVID time series are similar, and predictivity increases with longer data lengths in both cases. These results indicate that Shannon and Permutation Entropy are not suitable for assessing predictivity in tourism demand forecasting under realistic scenarios that are prone to disruptions.

Figure 6.

Shannon and Permutation Entropy Are Unable to Detect Disruptions Like COVID-19 in Tourism Demand Data

Predictivity Assessment Over External Shock

The COVID-19 pandemic caused global shifts in all industries, including tourism. Understanding how this structural shock affected the predictivity of tourism data is crucial for tourism analysts in developing robust forecasting models that incorporate structural shifts and external disruptions. This section examines how the impact of COVID-19 on predictivity varies across different data lengths. These insights are valuable for practitioners aiming to identify the optimal data length for forecasting models.

As shown in Figure 7, the impact of COVID-19 is evident in the widening gap between the mean predictivity (WPE) of the full and pre-COVID time series. This widening occurs due to the growing proportion of COVID-affected sub-series as the data length increases.

Figure 7.

Mean Predictivity (WPE) of the Full and Pre-COVID Monthly Time Series
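The growing share of COVID-affected sub-series follows directly from the window arithmetic. In the sketch below, a sub-series is assumed to count as COVID-affected if it contains at least one month after February 2020; the constant and function names are illustrative. The month counts follow from the stated data span (406 months in total, of which 350 fall in January 1991 to February 2020).

```python
TOTAL_MONTHS = 406       # January 1991 to October 2024
PRE_COVID_MONTHS = 350   # January 1991 to February 2020


def covid_affected_share(length):
    """Share of sliding windows containing at least one month after February 2020."""
    total_windows = TOTAL_MONTHS - length + 1
    # Windows lying entirely inside the pre-COVID months are unaffected.
    clean_windows = max(PRE_COVID_MONTHS - length + 1, 0)
    return (total_windows - clean_windows) / total_windows


for length in (144, 288):
    print(length, round(covid_affected_share(length), 3))
# 144 0.213
# 288 0.471
```

The number of affected windows stays at 56 for both lengths, while the total number of windows falls from 263 to 119, so the affected share more than doubles.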

To illustrate this effect, the predictivity distributions are compared for two different data lengths: 144 months (12 years), where the gap is minimal, and 288 months (24 years), where the gap becomes more pronounced.

In Figure 8(a), the predictivity distributions of the full and pre-COVID series are nearly identical for a data length of 144 months (12 years), as the proportion of COVID sub-series is low and has an insignificant impact on the mean predictivity. However, for a data length of 288 months (24 years) in Figure 8(b), the proportion of COVID-affected sub-series with low predictivity, ranging between 0.0 and 0.3, becomes substantial, leading to a decrease in mean predictivity (WPE). It is also observed that some sub-series within the COVID period exhibit higher predictivity values compared to those from the pre-COVID period. To further investigate the effect of COVID-19, the trend of sub-series predictivity for the two data lengths is illustrated in Figure 9(a) and (b).

Figure 8.

The Predictivity Distribution of the Pre-COVID Series Becomes Narrower With Increasing Series Length, as More Recurring Patterns Are Captured. Therefore, Longer Time Series Can Enhance the Reliability of Forecasting Models in the Absence of External Shocks. In Contrast, the COVID-Affected Sub-series Cause a Broader Distribution in Longer Series, Reflecting Increased Volatility and Reduced Pattern Consistency

Figure 9.

The Predictivity (WPE) Trend of Sub-series With Lengths of 144 Months (12 Years) and 288 Months (24 Years) for the Full Series. The Effect of COVID-19 on the Predictivity Is Evident With the First Sub-series Including the Initial Month Affected by the Pandemic. As the Pandemic Disrupted the Cyclical Pattern of the Time Series, the Assessment of Predictivity (WPE) on the COVID-Affected Periods Must Be Conducted With an Adjusted Delay Parameter $τ$

The red dot in Figure 9(a) and (b) represents the predictivity of the last pre-COVID sub-series. The effect of COVID-19 is observed within the length of the oscillation period, or the delay parameter ( $τ = 12$ ), of WPE for monthly time series. During the initial three months, COVID has a negative impact on predictivity, followed by a positive effect over the subsequent nine months, reaching the maximum point. These effects, occurring within the oscillation period, are due to the increased weight (standard deviation) of similar patterns that include the COVID-19 months. This shows a limitation of WPE: it can be influenced by short-term high-volatility events if those events create new, regularly repeating patterns, such as the year-on-year drop in tourist arrivals caused by COVID-19. As a result, it may hide a longer-term decrease in actual predictivity caused by increasing randomness in the data. This decrease becomes evident after the peak point, when the impact of COVID-19 leads to a steady decline in predictivity (WPE), caused by the emergence of increasingly varied and irregular patterns in the data. Therefore, to accurately assess the predictivity of time series impacted by external shocks that alter their cyclical patterns, the delay parameter $τ$ needs to be adjusted.

As Figure 9(a) and (b) demonstrates, during the pre-COVID period, the volatility in predictivity of the shorter series (12 years) is considerably higher compared to the longer series (24 years). As the data length increases, the range of the predictivity (WPE) distribution of the pre-COVID series narrows, indicating that longer time series better capture the underlying structure and repeating patterns. The longer 24-year series also better illustrate the extent of COVID-19’s impact on predictivity: while the pre-COVID predictivity was around 0.4, the stabilized post-COVID predictivity levels, observed in sub-series with an index of 100 or higher, drop to approximately 0.2. Therefore, for cyclical data like tourism demand, longer time series offer more stable predictivity assessments than shorter ones. This suggests that tourism forecasters can benefit from using longer historical data, as it helps capture stable and repeatable trends, leading to more reliable predictions.

The Validation of the Predictivity Metric

This section validates and compares forecasting performance across different models to identify which approach most effectively reflects the intrinsic predictability of the time series. Following Zhang et al. (2021a, 2021b), four forecasting models (ARIMA, SARIMA, Prophet, and CIR) are applied to 12 years of tourism demand data, and predictivity is evaluated using Weighted Permutation Entropy (WPE).

Tables 2–4 report the forecasting performance (RMSPE, MAPE, and NRMSE) and the WPE predictivity for each destination, as well as for total pre-COVID tourism demand. Table 5 reports the p-values of the Diebold-Mariano (DM) significance test between the four models: CIR differs significantly from both ARIMA (p = 0.0282) and SARIMA (p = 0.0472), while the remaining pairs do not differ significantly. Because the proposed WPE metric is model agnostic, the results in Tables 2–4 are used only to indicate how forecasting performance trends relative to WPE. Overall, CIR, SARIMA, and Prophet outperform ARIMA in terms of RMSPE, MAPE, and NRMSE, and destinations with higher WPE predictivity tend to be forecast more accurately. CIR performs particularly well and is therefore also used in the later COVID-period evaluation (Bufalo & Orlando, 2023a, 2023b). Despite differences in forecasting accuracy, all four models display highly similar trends relative to WPE, indicating that WPE predictivity remains stable across modelling approaches. The consistency between WPE values and empirical forecasting performance further demonstrates the robustness of WPE as a predictivity metric.

Given this alignment, and noting that the objective of subsequent experiments is not forecasting accuracy itself but the evaluation of predictivity, the simplest baseline model, ARIMA, is selected for further analysis. ARIMA is defined by three parameters: $p$ (autoregressive order), $d$ (degree of differencing), and $q$ (moving-average order). Optimal configurations are selected through grid search over $p, d, q \in \{0, 1, 2\}$, with the best model determined by the minimum Akaike Information Criterion (AIC), consistent with Zhang et al. (2021a, 2021b). The monthly pre-COVID time series of 12 and 24 years are used to generate a 12-month forecast. This forecast horizon is selected for its alignment with the oscillation (main) period. For each destination dataset, the mean RMSPE, $Ψ_{\max}$, and predictivity (WPE) are calculated to validate the predictivity metric against actual forecasting performance.
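The AIC-based grid search over the 27 candidate orders can be sketched as below. The helper names `statsmodels_aic` and `select_order` are hypothetical. The sketch assumes statsmodels' `ARIMA` class with default fitting options, which may differ from the settings used in the study, and it skips any order whose fit raises an error.

```python
from itertools import product


def statsmodels_aic(series, order):
    """AIC of an ARIMA(p, d, q) fit; requires statsmodels to be installed."""
    from statsmodels.tsa.arima.model import ARIMA  # imported lazily
    return ARIMA(series, order=order).fit().aic


def select_order(series, score=statsmodels_aic, grid=(0, 1, 2)):
    """Return the (p, d, q) with the lowest score, skipping fits that fail."""
    best_order, best_score = None, float("inf")
    for order in product(grid, repeat=3):
        try:
            value = score(series, order)
        except Exception:  # some orders may fail to fit on a given series
            continue
        if value < best_score:
            best_order, best_score = order, value
    return best_order, best_score
```

Calling `select_order(train)` on the training part of a sub-series returns the selected order and its AIC. The `score` argument is pluggable so that another criterion, or a stub in a test, can be substituted without changing the search loop.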

Table 2.

Mean RMSPE and Predictivity (WPE) With Different Forecasting Models on Pre-COVID

Destinations	ARIMA	SARIMA	Prophet	CIR	Predictivity (WPE)
Total	0.1477	0.1361	0.1425	0.1355	0.3376
NSW	0.1764	0.1827	0.1683	0.1577	0.2175
Vic	0.2029	0.1983	0.2298	0.1783	0.5475
Qld	0.1345	0.1565	0.1273	0.1582	0.1610
SA	0.2477	0.2673	0.2313	0.2635	0.2120
WA	0.2294	0.2198	0.2652	0.2109	0.2696
Tas	0.5429	0.5169	0.5073	0.5092	0.1265
NT	0.3489	0.3387	0.3152	0.3210	0.1121
ACT	0.2355	0.2151	0.2277	0.2380	0.1751

Table 3.

Mean MAPE and Predictivity (WPE) With Different Forecasting Models on Pre-COVID

Destinations	ARIMA	SARIMA	Prophet	CIR	Predictivity (WPE)
Total	0.1121	0.1006	0.1045	0.0819	0.3376
NSW	0.1211	0.1331	0.0982	0.0873	0.2175
Vic	0.1512	0.1415	0.1536	0.1293	0.5475
Qld	0.0728	0.0938	0.0763	0.1022	0.1610
SA	0.1203	0.1311	0.1097	0.1289	0.2120
WA	0.1315	0.1283	0.1387	0.1029	0.2696
Tas	0.2133	0.1892	0.1763	0.1773	0.1265
NT	0.1918	0.1835	0.1672	0.1781	0.1121
ACT	0.1226	0.1317	0.1045	0.1262	0.1751

Table 4.

Mean NRMSE and Predictivity (WPE) With Different Forecasting Models on Pre-COVID

Destinations	ARIMA	SARIMA	Prophet	CIR	Predictivity (WPE)
Total	0.0982	0.0826	0.0917	0.0822	0.3376
NSW	0.1011	0.0983	0.1002	0.0978	0.2175
Vic	0.1172	0.1102	0.1157	0.1098	0.5475
Qld	0.0793	0.0766	0.0782	0.0761	0.1610
SA	0.1226	0.1298	0.1312	0.1280	0.2120
WA	0.1283	0.1277	0.1279	0.1265	0.2696
Tas	0.1820	0.1893	0.1912	0.1877	0.1265
NT	0.1512	0.1483	0.1498	0.1476	0.1121
ACT	0.1220	0.1192	0.1209	0.1187	0.1751

Table 5.

Diebold-Mariano (DM) Test p-values for Different Forecasting Models on Pre-COVID

Models	ARIMA vs SARIMA	ARIMA vs Prophet	ARIMA vs CIR	SARIMA vs Prophet	SARIMA vs CIR	Prophet vs CIR
p-value	0.5301	0.3396	0.0282	0.6422	0.0472	0.3299

Table 6 summarizes the mean RMSPE, MAPE, NRMSE, $Ψ_{\max}$, and predictivity (WPE) across different destinations and data lengths. Table 7 shows the correlation of the three error metrics with predictivity (WPE) and with $Ψ_{\max}$. Predictivity (WPE) is negatively correlated with all three error metrics. Correlation is employed here to assess both the strength and direction (positive or negative) of the relationship between forecasting performance and WPE predictivity. The negative sign supports the expectation that lower forecasting error is associated with higher predictivity, indicating that more structured and less random time series tend to yield more accurate forecasts. The stronger negative correlation for the 24-year time series aligns with the earlier findings, confirming that longer time series offer more consistent predictivity, as reflected in better aligned forecasting performance. In contrast, the positive correlation observed for $Ψ_{\max}$ highlights its limited effectiveness in assessing the intrinsic predictivity of tourism data. In particular, the mean RMSPE is higher for Tasmania and the Northern Territory, possibly because these destinations are less popular than others, and their data characteristics affect the predictivity of tourism demand for these regions.

Table 6.

Mean RMSPE, MAPE, NRMSE, $Ψ_{\max}$ , and Predictivity (WPE) by Destination and Data Length

Destinations	Data length 12 years					Data length 24 years
Destinations	RMSPE	MAPE	NRMSE	$Ψ_{\max}$	Predictivity (WPE)	RMSPE	MAPE	NRMSE	$Ψ_{\max}$	Predictivity (WPE)
Total	0.1477	0.1121	0.0982	0.9101	0.3376	0.1506	0.1172	0.1055	0.9452	0.4490
NSW	0.1764	0.1211	0.1011	0.9139	0.2175	0.1680	0.1203	0.0981	0.9375	0.2785
Vic	0.2029	0.1512	0.1172	0.9226	0.5475	0.1805	0.1452	0.1152	0.9682	0.6496
Qld	0.1345	0.0728	0.0793	0.8970	0.1610	0.1323	0.7209	0.0803	0.9282	0.2142
SA	0.2477	0.1203	0.1226	0.9068	0.2120	0.2374	0.1189	0.1209	0.9412	0.2451
WA	0.2294	0.1315	0.1283	0.9255	0.2696	0.2079	0.1289	0.1191	0.9497	0.3019
Tas	0.5429	0.2133	0.1820	0.9508	0.1265	0.3982	0.1763	0.1560	0.9547	0.1340
NT	0.3489	0.1918	0.1512	0.9212	0.1121	0.4107	0.2161	0.1721	0.9429	0.0984
ACT	0.2355	0.1226	0.1220	0.8920	0.1751	0.2533	0.1270	0.1289	0.9394	0.1802

Table 7.

Correlation Results Between Forecasting Results and Predictivity (WPE), and Between Forecasting Results and $Ψ_{\max}$ Across 12- and 24-Year Pre-COVID Time Series

	Kendall’s $τ$	Kendall’s p-value	Spearman’s $ρ$	Spearman’s p-value
RMSPE vs predictivity (WPE)
12 years	−0.3889	0.1802	−0.5333	0.1392
24 years	−0.5556	0.0446	−0.6667	0.0499
MAPE vs predictivity (WPE)
12 years	−0.1111	0.7614	−0.2667	0.4897
24 years	−0.3333	0.2595	−0.5000	0.1705
NRMSE vs predictivity (WPE)
12 years	−0.2778	0.3585	−0.4500	0.2242
24 years	−0.5000	0.0752	−0.6333	0.0671
RMSPE vs $Ψ_{\max}$
12 years	0.3889	0.0446	0.4000	0.2861
24 years	0.2222	0.4767	0.3167	0.4064
MAPE vs $Ψ_{\max}$
12 years	0.5556	0.1802	0.7500	0.0199
24 years	0.0031	0.9991	0.0167	0.5627
NRMSE vs $Ψ_{\max}$
12 years	0.5550	0.0752	0.6000	0.0876
24 years	0.2778	0.3585	0.3833	0.3085

Predictivity Validation Over Varied Data Scale

This section validates the predictivity (WPE) by examining its relationship with forecasting error across different scales of temporal aggregation, thereby assessing how data scale influences predictivity in relation to underlying data characteristics. With the data length fixed at 12 years, Tables 8–10 present the mean RMSPE, MAPE, and NRMSE of ARIMA models and the predictivity (WPE) for the pre-COVID series. The results clearly indicate that the data scale has a significant impact on forecasting performance. The time series aggregated at quarterly and half-yearly scales exhibit higher predictivity, resulting in improved forecasting accuracy compared to monthly data. This improvement is supported by a strong negative relationship between error metrics and predictivity at larger data scales, as indicated by both Kendall’s and Spearman’s correlation coefficients in Table 11. Note, however, that many real-world decisions are made at the monthly level, and that the conclusion here rests purely on measured forecasting performance. With that caveat, these findings suggest that, in the absence of external shocks, tourism analysts can improve forecasting performance by using more coarsely scaled time series.

Table 8.

Mean Predictivity (WPE) and RMSPE by Destination and Scale Level (Data Length 12 Years)

Destinations	Mean RMSPE			Mean predictivity (WPE)
Destinations	Monthly	Quarterly	Half-yearly	Monthly	Quarterly	Half-yearly
Total	0.1477	0.0546	0.0446	0.3376	0.5031	0.5419
NSW	0.1764	0.0687	0.0509	0.2175	0.3752	0.3858
Vic	0.2029	0.0675	0.0478	0.5475	0.6751	0.7242
Qld	0.1345	0.0646	0.0622	0.1610	0.2979	0.3920
SA	0.2477	0.0894	0.0640	0.2120	0.3810	0.4889
WA	0.2294	0.0571	0.0448	0.2696	0.4654	0.6628
Tas	0.5429	0.1459	0.0970	0.1265	0.2320	0.3388
NT	0.3489	0.1906	0.1533	0.1121	0.1875	0.1943
ACT	0.2355	0.1160	0.0862	0.1751	0.2781	0.4009

Table 9.

Mean Predictivity (WPE) and MAPE by Destination and Scale Level (Data Length 12 Years)

Destinations	Mean MAPE			Mean predictivity (WPE)
Destinations	Monthly	Quarterly	Half-yearly	Monthly	Quarterly	Half-yearly
Total	0.1121	0.0315	0.0276	0.3376	0.5031	0.5419
NSW	0.1211	0.0463	0.0422	0.2175	0.3752	0.3858
Vic	0.1512	0.0455	0.0325	0.5475	0.6751	0.7242
Qld	0.0728	0.0496	0.0452	0.1610	0.2979	0.3920
SA	0.1203	0.0711	0.0520	0.2120	0.3810	0.4889
WA	0.1315	0.0491	0.0398	0.2696	0.4654	0.6628
Tas	0.2133	0.1363	0.0793	0.1265	0.2320	0.3388
NT	0.1918	0.1618	0.1231	0.1121	0.1875	0.1943
ACT	0.1226	0.1002	0.0735	0.1751	0.2781	0.4009

Table 10.

Mean Predictivity (WPE) and NRMSE by Destination and Scale Level (Data Length 12 Years)

Destinations	Mean NRMSE			Mean predictivity (WPE)
Destinations	Monthly	Quarterly	Half-yearly	Monthly	Quarterly	Half-yearly
Total	0.0982	0.0261	0.0115	0.3376	0.5031	0.5419
NSW	0.1011	0.0336	0.0262	0.2175	0.3752	0.3858
Vic	0.1172	0.0325	0.0255	0.5475	0.6751	0.7242
Qld	0.0793	0.0311	0.0331	0.1610	0.2979	0.3920
SA	0.1226	0.0557	0.0360	0.2120	0.3810	0.4889
WA	0.1283	0.0310	0.0118	0.2696	0.4654	0.6628
Tas	0.1820	0.1165	0.0562	0.1265	0.2320	0.3388
NT	0.1512	0.1421	0.0981	0.1121	0.1875	0.1943
ACT	0.1220	0.0983	0.0765	0.1751	0.2781	0.4009

Table 11.

Correlation Results Between Predictivity (WPE) and Error Metrics Across Different Data Scales for Pre-COVID Time Series of 12 and 24 Years in Length

	Kendall’s $τ$	Kendall’s p-value	Spearman’s $ρ$	Spearman’s p-value
RMSPE
Monthly	−0.3889	0.1802	−0.5333	0.1392
Quarterly	−0.6667	0.0127	−0.8000	0.0096
Half-yearly	−0.5556	0.0446	−0.7833	0.0125
MAPE
Monthly	−0.1111	0.7614	−0.2667	0.4879
Quarterly	−0.7778	0.0024	−0.9000	0.0009
Half-yearly	−0.6111	0.0247	−0.8000	0.0096
NRMSE
Monthly	−0.2778	0.3585	−0.4500	0.0001
Quarterly	−0.6667	0.0127	−0.8000	0.0001
Half-yearly	−0.5000	0.0752	−0.7333	0.0001

Predictivity Validation Over External Shock

In this section, the predictivity metric is evaluated using COVID-affected time series. To account for the structural shift introduced by the pandemic, ARIMAX and CIR models are applied alongside ARIMA. ARIMAX extends the standard ARIMA framework by incorporating an exogenous dummy variable representing the COVID-19 period, allowing the model to adjust for external shocks and improve forecast robustness under regime changes. Similarly, the CIR model has been reported to be resilient to disruptive events (Bufalo & Orlando, 2023b) and is therefore included as a comparative benchmark to assess predictivity under external shocks. RMSPE is the error metric used throughout this section.
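A minimal sketch of the ARIMAX setup follows. Its assumptions are labeled here because the exact specification is not stated: the dummy is taken to equal 1 from March 2020 onward (it could equally be switched off once borders reopened), all function names are hypothetical, and the fit relies on statsmodels' `ARIMA` class with its `exog` argument.

```python
def month_range(start, count):
    """List `count` consecutive (year, month) tuples beginning at `start`."""
    year, month = start
    periods = []
    for _ in range(count):
        periods.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return periods


def covid_dummy(periods, shock_start=(2020, 3)):
    """1 for months at or after `shock_start`, 0 before (assumed definition)."""
    return [1 if period >= shock_start else 0 for period in periods]


def arimax_forecast(train, train_dummy, test_dummy, order):
    """Fit ARIMA with the dummy as exogenous regressor and forecast the test span."""
    from statsmodels.tsa.arima.model import ARIMA  # imported lazily
    fitted = ARIMA(train, exog=train_dummy, order=order).fit()
    return fitted.forecast(steps=len(test_dummy), exog=test_dummy)


periods = month_range((1991, 1), 406)   # January 1991 to October 2024
dummy = covid_dummy(periods)
print(periods[-1], sum(dummy))          # (2024, 10) 56
```

A training window that ends before March 2020 would carry an all-zero dummy and gives the regressor nothing to estimate, so in this sketch the exogenous term is only informative once the window includes pandemic months.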

Given the diverse impact of COVID-19 across Australian states, the “Total” level is selected to evaluate the overall effect of the pandemic on the predictivity. To ensure an accurate assessment, the delay parameter $τ$ is adjusted from 12 to 1, as the pandemic is considered as an external shock or triggering event that disrupted the original cyclical pattern.

Figure 10(a) and (b) illustrate the trend in predictivity metrics and RMSPE across 45 sub-series of length 288 months (24 years) with the ARIMAX model. The first sub-series spans from March 1996 to February 2020, with the latter marking the beginning of the COVID-affected period. The results show that forecast performance, indicated by a decrease in RMSPE, improves in parallel with increases in predictivity (WPE). In contrast, $Ψ_{\max}$ increases only gradually and exhibits weaker alignment with changes in RMSPE, suggesting that it is less responsive to structural variations in the data. This limited responsiveness reflects the insensitivity of SampEn to short-term disruptions and evolving temporal dynamics, reinforcing the conclusion that WPE provides a more effective and adaptive metric for assessing the predictivity of tourism demand data with rapidly changing structures. For the CIR model, shown in Figure 10(c) and (d), the trends in predictivity metrics and RMSPE follow patterns similar to those in Figure 10(a) and (b). Notably, the CIR model yields a lower RMSPE for the first sub-series, which spans from March 1996 to February 2020.

Figure 10.

RMSPE (Red Line) and Predictivity Metrics (Blue Line) for 45 Monthly Sub-series of the “Total” Dataset, Each With a Length of 24 Years. The Plot Demonstrates That the Model’s Adjustment to the COVID-19 Impact Corresponds With an Initial Increase in the Predictivity (WPE), Followed by Stabilization in Both Predictivity and Forecasting Performance. This Pattern Suggests That, After a Short-Term Adaptation Period, the Forecasting Model Becomes More Resilient to Pandemic-Related Volatility. Such Insights Can Be Utilized to Inform the Selection of Training Windows That Account for External Shocks, Enhancing Model Robustness and Accuracy During Periods of Structural Change

The stronger negative correlation between RMSPE and predictivity (WPE), as shown in Table 12, further supports the superiority of WPE in capturing the intrinsic predictivity of tourism demand data. Overall, the results confirm that the metric based on WPE serves as a robust and reliable measure for assessing the predictivity of tourism demand data across both stable and volatile periods.

Table 12.

Correlation Results Between RMSPE and the Predictivity Metrics

RMSPE	Kendall’s $τ$	Kendall’s p-value	Spearman’s $ρ$	Spearman’s p-value
Predictivity (WPE)	−0.5879	0.0000	−0.7611	0.0000
$Ψ_{\max}$	−0.3636	0.0004	−0.5646	0.0001

Discussion and Implications

The empirical findings of this study offer insights into the assessment of tourism demand predictivity, particularly under volatile conditions induced by external shocks. This work demonstrates the effectiveness of WPE in capturing the intrinsic predictivity of tourism demand data and validates its superior robustness compared to SampEn (MulSampEn).

Theoretical Implications

This research advances the theoretical understanding of time series predictivity, especially within complex, non-stationary systems like tourism demand, by proposing WPE as a more robust alternative for estimating the intrinsic predictivity of tourism demand data. This contribution matters because it challenges the existing approach, in which entropy measures are applied without sufficient consideration of their limitations.

By exposing a key flaw in SampEn (MulSampEn), its reliance on the tolerance parameter ( $r$ ), which inflates with increased standard deviation during structural breaks and leads to artificially high predictivity estimates, this work advances the Information Theory perspective on the robustness and reliability of entropy measures.

The research contributes to theoretical development by showing how intrinsic predictivity varies with data length and temporal scale. Theoretical frameworks in Information Theory and Dynamical Systems Theory assume that predictivity is tightly linked to structural order (Boffetta et al., 2002). This study confirms that predictivity tends to stabilize and improve with longer time series and coarser aggregation (e.g., quarterly or half-yearly), thereby implying that longer and coarser data enhance the underlying structural order.

The study shows that the effectiveness of entropy measures depends on the system’s context, thereby deepening our understanding of their appropriate application and reinforcing their theoretical importance as means for assessing intrinsic predictivity. This contributes to the theoretical role of entropy measures not only as a descriptor of complexity but also as a predictor of forecasting potential, making them much more useful for real-world planning and decision-making in complex environments.

Practical Implications

The practical implications of this study are particularly relevant for tourism analysts, policy-makers, and business strategists who rely on accurate and resilient forecasting to support planning, investment, and operational decisions. One of the most persistent challenges in tourism demand forecasting lies in determining the appropriate data length and temporal scale for model input (Zhang et al., 2021a, 2021b), a decision that significantly affects predictive performance.

The study’s findings provide concrete guidance on this issue. By empirically demonstrating that longer time series and coarser temporal scales (e.g., quarterly or half-yearly) yield higher intrinsic predictivity, the results suggest that practitioners can improve forecast accuracy by aggregating the data to a coarser scale. In practice, this insight helps streamline the forecasting pipeline by enabling more informed decisions about how to prepare input data to maximize predictive potential.

In volatile environments, such as during a global pandemic, the study demonstrates that WPE responds effectively to structural breaks. By adjusting WPE parameters, particularly the delay parameter, to reflect shifts in the data’s underlying structure, practitioners can better capture the changes in the intrinsic predictivity. This is especially critical in real-world forecasting contexts, where abrupt changes in travel restrictions, health policies, or geopolitical events can drastically alter tourism demand patterns.

The validation of the WPE-based predictivity metric against realized predictivity (RMSPE) confirms its effectiveness. Unlike SampEn (MulSampEn), which may misleadingly indicate high predictivity during volatile periods, WPE provides a more truthful reflection of forecasting difficulty. This capability is valuable for risk management and model selection. Forecasting models that are guided by WPE-based predictivity assessments are more likely to deliver robust performance, especially in high-uncertainty scenarios. Additionally, WPE’s computational efficiency and relatively simple parameter tuning, compared to other entropy measures, facilitate easier adoption by industry practitioners.

Overall, this research introduces a metric that assesses the intrinsic predictivity of tourism demand more reliably than SampEn-based alternatives. Because the metric is both theoretically grounded and empirically validated, the study advances the understanding of tourism demand predictivity and equips practitioners with a more adaptive and robust tool for navigating uncertainty in a rapidly evolving global landscape.

Conclusion

Forecasting tourism demand plays a pivotal role in guiding strategic planning, resource allocation, and risk mitigation for governments and tourism-related businesses (Li et al., 2023). However, the volatility and complexity of tourism data, especially during external shocks like the COVID-19 pandemic, make it difficult to assess the theoretical limits of forecasting accuracy. Accurately quantifying intrinsic predictivity is therefore essential to improve the reliability of forecasting models under both stable and disrupted conditions.

This research was motivated by the limitations of existing entropy measures, particularly SampEn and MulSampEn. There was a clear need for a more robust, interpretable, and sensitive metric capable of accurately assessing the intrinsic predictivity of tourism demand data, especially under conditions of volatility and sudden change.

The key contribution of this study is the introduction of a WPE-based predictivity metric that quantifies the intrinsic predictivity of tourism demand with greater reliability. Empirical results demonstrate that the proposed metric is more sensitive to structural changes and more strongly correlated with realized predictivity than the metric based on SampEn (MulSampEn). The study also reveals the impact of data characteristics, specifically verifying that coarser temporal scales increase predictive potential. These findings validate the effectiveness of WPE as a robust entropy measure for assessing the intrinsic predictivity of tourism demand data.

Despite these contributions, this study has several limitations:

1. The analysis is restricted to the COVID-19 pandemic as a representative external shock. While this event provides a relevant and impactful case study, broader generalizations to other types of disruptions require further investigation to assess the robustness of the proposed metric.

2. The empirical analysis is based solely on tourism demand data from a single country (Australia). Although Australia’s state-level tourism demand presents a diverse context, the geographic specificity may limit the generalizability of the findings.

3. The forecasting component focuses on univariate time series, potentially overlooking multivariate dependencies. Therefore, the maximum achievable forecasting accuracy may not have been reached, and the relationship between intrinsic and realized predictivity remains only partially verified.

Future research could therefore extend the analysis to other types of disruptions to assess how well the WPE-based predictivity metric captures structural volatility beyond the COVID-19 context, and could apply it to a wider range of tourism-related time series beyond international tourist arrivals. Expanding the study to include tourism demand data from multiple countries would enhance the external validity of the findings and uncover potential regional differences in predictivity behavior. Incorporating forecasting models such as CIR# could offer a more complete understanding of the link between intrinsic and realized predictivity.

Footnotes

ORCID iD

Gang Li

Ethical Considerations

This article does not contain any studies with human or animal participants.

Author Contributions

• Rauan Kyrykbayev: Conceptualization, Methodology, Writing—Original Draft, Formal analysis, Visualization, Validation

• Yishuo Zhang: Methodology, Writing—Original Draft, Formal Analysis, Validation

• Xin Li: Writing—Reviewing and Editing

• Tianqing Zhu: Writing—Reviewing and Editing

• Gang Li: Supervision, Writing—Review and Editing, Investigation, Data Curation, Project administration.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was partly supported by a research grant funded by Deakin University.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Author Biographies

Rauan Kyrykbayev. His research interests are tourism and hospitality management, and data science.

Yishuo Zhang, Ph.D. His research interests are tourism and hospitality management, and data science and AI.

Xin Li, Ph.D. Her research interests are big data analytics, econometric modelling, data mining and forecasting.

Tianqing Zhu, Ph.D. Her research interests are data privacy and technology applications.

Gang Li, Ph.D. His research interests are technology applications to tourism and hospitality, data science, and business intelligence.

References

ABS, Australian Bureau of Statistics. (2024). ‘Table 11: Short-term movement, visitors arriving - intended State of stay: Original’ [time series spreadsheet]. Australian Bureau of Statistics.

Adelyanov, A. M., Generalov, E. A., Zhen, & Yakovenko, L. V. (2024). Using entropy in time series analysis. Moscow University Physics Bulletin, 79(4), 415–425. https://doi.org/10.3103/s0027134924700607

Bandt & Pompe (2002). Permutation entropy: A natural complexity measure for time series. Physical Review Letters, 88(17), 174102. https://doi.org/10.1103/physrevlett.88.174102

Barà, Pernice, Catania, C. A., Hilal, Porta, Humeau-Heurtier, & Faes (2024). Comparison of entropy rate measures for the evaluation of time series complexity: Simulations and application to heart rate and respiratory variability. Journal of Applied Biomedicine, 44(2), 380–392. https://doi.org/10.1016/j.bbe.2024.04.004

Boffetta, Cencini, & Falcioni (2002). Predictability: A way to characterize complexity. Physics Reports, 356(6), 367–474. https://doi.org/10.1016/s0370-1573(01)00025-4

Bufalo & Orlando (2023a). Improved tourism demand forecasting with CIR# model: A case study of disrupted data patterns in Italy. Tourism Review, 79(2), 445–464. https://doi.org/10.1108/TR-09-2022-0467

Bufalo & Orlando (2023b). Time series forecasting with the CIR# model: From hectic markets sentiments to regular seasonal tourism. Technological and Economic Development of Economy, 29(4), 1216–1238. https://doi.org/10.3846/tede.2023.19337

Denœux, Masson, & Hébert (2005). Nonparametric rank-based statistics and significance tests for fuzzy data. Fuzzy Sets and Systems, 153(1), 1–28. https://doi.org/10.1016/j.fss.2005.01.008

Fadlallah, Chen, Keil, & Príncipe (2013). Weighted-permutation entropy: A complexity measure for time series incorporating amplitude information. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 87(2), 022911. https://doi.org/10.1103/physreve.87.022911

Garland, James, & Bradley (2014). Model-free quantification of time-series predictability. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 90(5), 052910. https://doi.org/10.1103/physreve.90.052910

Kovantsev & Gladilin (2020). Analysis of multivariate time series predictability based on their features. In 2021 International Conference on Data Mining Workshops (ICDMW) (pp. 348–355). IEEE. https://doi.org/10.1109/icdmw51313.2020.00055

Law, Fong, D. K. C., & Han (2019). Tourism demand forecasting: A deep learning approach. Annals of Tourism Research, 75, 410–423. https://doi.org/10.1016/j.annals.2019.01.014

Zhang & Wang (2023). Forecasting tourism demand with a novel robust decomposition and ensemble framework. Expert Systems with Applications, 236, 121388. https://doi.org/10.1016/j.eswa.2023.121388

Pan & Takefuji (2025). Enhancing heart disease feature analysis with Spearman’s correlation with p-values. International Journal of Cardiology, 430, 133207. https://doi.org/10.1016/j.ijcard.2025.133207

Pennekamp, Iles, A. C., Garland, Brennan, Brose, Gaedke, Jacob, Kratina, Matthews, Munch, Novak, Palamara, G. M., Rall, B. C., Rosenbaum, Tabi, Ward, Williams, & Petchey, O. L. (2019). The intrinsic predictability of ecological time series and its potential to guide forecasting. Ecological Monographs, 89(2), Article e01359. https://doi.org/10.1002/ecm.1359

Pham, T. D., Dwyer, & Ngo (2021). COVID-19 impacts of inbound tourism on Australian economy. Annals of Tourism Research, 88, 103179. https://doi.org/10.1016/j.annals.2021.103179

Polyzos, Samitas, & Spyridou, A. E. (2020). Tourism demand and the COVID-19 pandemic: An LSTM approach. Tourism Recreation Research, 46(2), 175–187. https://doi.org/10.1080/02508281.2020.1777053

Richman, J. S., Lake, D. E., & Moorman, J. R. (2004). Sample entropy. Methods in Enzymology, 384, 172–184. https://doi.org/10.1016/s0076-6879(04)84011-4

Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology - Heart and Circulatory Physiology, 278(6), H2039–H2049. https://doi.org/10.1152/ajpheart.2000.278.6.h2039

Riedl, Müller, & Wessel (2013). Practical considerations of permutation entropy. European Physical Journal: Special Topics, 222(2), 249–262. https://doi.org/10.1140/epjst/e2013-01862-7

Song, Blumm, & Barabási (2010). Limits of predictability in human mobility. Science, 327(5968), 1018–1021. https://doi.org/10.1126/science.1177170

Song & Cai (2022). Tourism forecasting competition in the time of COVID-19: An assessment of ex ante forecasts. Annals of Tourism Research, 96, 103445. https://doi.org/10.1016/j.annals.2022.103445

Song, Qiu, R. T., & Park (2019). A review of research on tourism demand forecasting: Launching the annals of tourism research curated collection on tourism demand forecasting. Annals of Tourism Research, 75, 338–362. https://doi.org/10.1016/j.annals.2018.12.001

The Department of Home Affairs. (2022). All COVID-19 border restrictions to be lifted. https://minister.homeaffairs.gov.au/ClareONeil/Pages/covid-border-restrictions-to-be-lifted.aspx

Vu, H. Q., Song, & Law (2025). Exploring emotional aspects of travel concepts via travel photos based on contrastive language-image pretraining. Tourism Management, 108, 105117. https://doi.org/10.1016/j.tourman.2024.105117

Webber, Moffat, & Zobel (2010). A similarity measure for indefinite rankings. ACM Transactions on Information Systems, 28(4), 1–38. https://doi.org/10.1145/1852102.1852106

Xia, Shang, Wang, & Shi (2015). Permutation and weighted-permutation entropy analysis for the complexity of nonlinear time series. Communications in Nonlinear Science and Numerical Simulation, 31(1–3), 60–68. https://doi.org/10.1016/j.cnsns.2015.07.011

Yin, Yue, & Zhou (2019). On predictability of time series. Physica A: Statistical Mechanics and Its Applications, 523, 345–351. https://doi.org/10.1016/j.physa.2019.02.006

Yao, Essex, & Davison (2004). Measure of predictability. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 69(6), 066121. https://doi.org/10.1103/physreve.69.066121

Zhang, Muskat, Law, & Yang (2020). Group pooling for deep tourism demand forecasting. Annals of Tourism Research, 82, 102899. https://doi.org/10.1016/j.annals.2020.102899

Zhang, Muskat, Vu, H. Q., & Law (2021a). Predictivity of tourism demand data. Annals of Tourism Research, 89, 103234. https://doi.org/10.1016/j.annals.2021.103234

Zhang, Muskat, & Law (2021b). Tourism demand forecasting: A decomposed deep learning approach. Journal of Travel Research, 60(5), 981–997. https://doi.org/10.1177/0047287520919522