Abstract
We investigate whether and how climate indicators and web-traffic data can improve the estimates of demand-function parameters for specific origins and destinations. Overall, augmented demand functions show a better fit and more reliable price and income elasticities, whether demand is measured by arrivals or by overnight stays. However, heterogeneity stemming from the main type of tourism (business vs. cultural vs. sea and sun) affects which web-based and climate indicators best describe tourist demand, as well as their optimal lags. Our findings highlight the usefulness of such timely and territorially detailed information for local policymakers, while also showing how sensitive different demand segments are to policy intervention.
Introduction
Fragmentation of the tourism product (Haugland et al., 2011) and heterogeneity of local suppliers (Sainaghi and Baggio, 2014) are two relevant characteristics of the tourism industry, especially in countries like Italy where the diversification of tourism experiences is very high, as the local communities and their territories (physically and culturally) are usually an essential part of the visitor experience (Goffi and Cucculelli, 2018; OECD, 2011; Presenza et al., 2013). This fragmentation mirrors a strong dependence of the tourism industry on territorial-specific natural and cultural characteristics, attractions and products (Sainaghi and Mauri, 2018), often leading to fierce competition between ‘small’ destinations (Buhalis, 2000). Thus, local destination management organizations – which are in charge of coordination of promotional activities as well as the maintenance and management of cultural and natural attractions (Andergassen et al., 2013; Andergassen et al., 2017) – often call for prompt and spatially detailed statistical information needed to monitor the dynamics of tourism demand (Bornhorst et al., 2010; Sainaghi, 2006).
Unfortunately, official statistical systems rarely produce tourism information that can be exploited for policymaking purposes. The current directive for tourism statistics (UNWTO, 2010) recommends the collection of monthly data on inbound and outbound tourism at regional (NUTS2) detail. More granular information (i.e. at provincial or municipal level) is published annually – monthly only in some countries – with a time delay that in the most recent Italian official survey ranges from 11 to 23 months (ISTAT, 2018). This ‘information gap’ also concerns tourism determinants like income and price levels: gross domestic product (GDP) is available for NUTS2 areas at quarterly frequency, while price indices are available as a prompt monthly index number whose representativeness regarding tourists’ purchasing power, however, is often questioned (Dwyer and Forsyth, 2011).
Many scholars have recently worked on bridging this gap by using Big Data (BD), retrievable from the web almost in real time and at the level of granularity needed by local policymakers (Mariani et al., 2018; Smith, 2016; Song and Liu, 2017). Specifically, web search queries and climate statistics have been exploited to augment the micro-founded econometric specifications of tourism demand models that embody prices and income as main explanatory variables (Li et al., 2017b; Zhang and Kulendran, 2017). The main goal of these and other recent papers is the enhancement of forecasting accuracy, with notable results (Pan et al., 2012). However, we expect that the inclusion of non-economic data retrieved from the web might also affect the estimates of price and income elasticities, as well as the dynamics of the demand function. Assessing whether these augmented models are robust in terms of economic interpretation could thereby increase their appeal among scholars and practitioners.
The present article focuses precisely on this aspect. Accordingly, we aim to shed light on the effects of timely and territorially detailed information (mainly coming from Google Trends (GT) and Composite Climate Indices (CCIs)) on tourism demand models’ dynamic features, lag dependence structure and elasticities to price and income. We acknowledge that it is hard to derive a general conclusion, as many factors influence the econometric relationship between tourism demand and exogenous variables (Peng et al., 2015). However, in this article, we control for some of the factors at play by modelling monthly tourism demand (measured in terms of arrivals or overnight stays) in the Italian cities of Catania, Florence and Milan (micro-destinations, representative, respectively, of the leisure, cultural and business tourism segments). We also distinguish between two origin markets (Germany and the United Kingdom) that differ in terms of currency used. Moreover, we test different hypotheses about dynamics (in terms of seasonality and trend–cycle patterns) and lag/dependence structure. In this way, local decision makers are provided with robust empirical evidence about the possibility of exploiting web-based and climate information to estimate the responsiveness of tourism demand and to better understand how sensitive different segments are to policy intervention.
Within such a framework, we pose three research questions: Are elasticities to price and income robust to the inclusion in the model of complementary variables related to Internet searches and climate conditions? Are dynamics and lag/dependence structures of the augmented models robust across destinations, origin markets and measures of tourism flows? Is there a ‘best’ Google indicator to be included in the estimation or do tourists in each specific origin/destination combination leave a different fingerprint on the web?
To achieve our research goals, we structure the article as follows. The second section reviews the recent literature on the estimation of tourism demand and the use of BD. The third and fourth sections introduce the model specification and the data, respectively. Results are presented in the fifth section, while concluding remarks with suggestions for local policymakers are sketched in the last section.
Literature review
The literature on tourism demand rarely considered high territorial granularity and high-frequency temporal data (Song et al., 2009), although tourism is a complex phenomenon where seasonality and specific territorial characteristics (i.e. local attractions) have a dominant role as demand determinants (Smeral, 2014; Gunter and Onder, 2015; Vu and Turner, 2006). Among others, Smeral (2017) lamented the unavailability of exogenous information at the desired disaggregated level as one of the main limitations for advancement of quantitative analysis in the tourism field. Since the 1970s, the modelling of tourism demand has been characterized by time series analysis dominated by seasonally integrated autoregressive moving-average (SARIMA) models, sometimes with the inclusion of generalized autoregressive conditional heteroskedasticity (GARCH) effects or long memory features (Gil-Alana, 2005; Song and Li, 2008).
However, studies where time series models are augmented with economic determinants (mainly income and relative prices) are also popular. In this context, findings generally state that tourism is a luxury good (Crouch, 1994; Munoz, 2007; Peng et al., 2015; Smeral, 2017). Long-haul tourism displays a relatively higher elasticity to income because of the more exotic and unique features and the lack of available substitutes (Peng et al., 2015; Schiff and Becken, 2011). However, income elasticity less than one is also present in the literature and might be explained by some ‘necessary’ short-haul international trips (Fuleky et al., 2014). Income elasticity may also vary over time in line with changes in the macroeconomic environment and/or with structural changes in consumer behaviour, leading Smeral (2017) to suggest that tourism could no longer be considered a luxury good. On the contrary, negative income elasticities for ‘inferior’ destinations are rarely reported (Crouch, 1996).
With regard to price, own-price elasticity is, as expected, often found to be negative, but its magnitude varies considerably depending on the type of tourism and the time span under consideration (Song et al., 2010). Peng et al. (2015) pointed out that in destinations with fewer substitutes, price competition tends to be less intense, resulting in a lower sensitivity to price. Theoretical exceptions to the law of demand, leading to a positive price elasticity, are possible if the change in a good’s price has such a strong impact on purchasing power that it causes a radical change in people’s whole pattern of consumption (Crouch, 1992). Sensitivity of demand to price has increased over time due to major reductions in transport costs (air fares) and to increased competition between destinations (Crouch, 1994), although some market segments, such as business tourism, are less sensitive. Moreover, a prolonged period of low inflation, like the one experienced by European countries in recent years, is likely to boost price elasticity. It is also found that tourists tend to be more aware of exchange rate changes before they travel than of inflationary effects in the destination they plan to visit (Peng et al., 2015). As researchers can easily access exchange rate data, this variable is sometimes recognized as the best proxy for price dynamics in tourism demand models (Song and Li, 2008).
The impact of economic variables on tourism demand is moderated by other factors. Song et al. (2010) and Martins et al. (2017) found that tourism arrivals are mainly influenced by income, while tourism expenditure and overnight stays are more affected by the real exchange rate. Crouch (1996) highlighted that estimated elasticities increase with data frequency. Moreover, the lag length of the dependent variable is likely to be associated with a larger absolute value of own-price elasticity (Peng et al., 2015), while no significant difference appears in estimated income elasticities. Peng et al. (2015) also show that the inclusion or omission of other explanatory variables (and the way they are measured) significantly affects the results.
The measurement bias in income and price is often recalled as a limit to the possibility of setting up a reliable intertemporal relationship with the dependent variable (Song and Li, 2008). Moreover, the potential interdependence between income and price generates a bias in the estimates of income and price elasticities (Peng et al., 2015; Seetaram et al., 2016). From an econometric perspective, both measurement error and collinearity between variables generate an omitted variable bias that is theoretically and empirically recognized as a source of non-spherical models’ residuals (Lim, 1997) and low forecasting accuracy (Athanasopoulos et al., 2010). This bias increases with the spatial and temporal detail of the analysis, as scholars are forced to proxy local dynamics with national data (Gunter and Onder, 2015). The picture looks even worse when possible measurement errors in the dependent variable (arrivals or overnights) are considered (Guizzardi and Bernini, 2012).
To tackle the problems mentioned above, researchers have attempted to introduce non-economic variables in economic models. Ettredge et al. (2005) were the first authors to use a Google index to analyse the dynamics of the unemployment rate. In the field of tourism, BD are used to improve knowledge about (potential) demand and tourism businesses’ target markets (Song and Liu, 2017). However, a large part of the literature uses web-based information to improve forecasting accuracy. Choi and Varian (2012), in their seminal paper, improved the forecasting accuracy of ARIMA models using travel-related Google search data. Pan et al. (2012) obtained the same results with multivariate ARMA, while Bangwayo-Skeete and Skeete (2015) implemented an autoregressive mixed-data sampling (AR-MIDAS) regression. Rivera (2016) used GT data in a dynamic linear model to forecast arrivals in Puerto Rico, while Gunter and Onder (2016) were among the few authors focussing on micro areas (the city of Vienna).
Another strand of literature focuses on the most effective ways to include BD in the econometric model to limit issues of overfitting and multicollinearity. We recognize three main statistical approaches (Li et al., 2017a): the principal component analysis (Li et al., 2015), data shift and summation of different types of search query data, paying attention to the indicators’ lag orders (Yang et al., 2015), and the generalized dynamic factor models (Li et al., 2017b).
Finally, a different source of information largely considered by researchers is represented by climatic factors and weather perception (Jeuring, 2017). Both indicators are included either as single factors (e.g. Falk, 2013) or as composite indicators. A notable example is the Tourism Climatic Index (TCI) developed by Mieczkowski (1985), which considers temperature, humidity, rainfall, hours of sunshine and wind speed. Goh (2012) introduced the TCI to analyse international tourism demand in Hong Kong, reporting a stronger impact of climate conditions when the distance between origin and destination countries increases. Similarly, Li et al. (2017a) defined a relative climate index, based on the comparison of TCI between the destination and the origin countries. Zhang and Kulendran (2017) introduced a CCI where climatic variables in the destination and in substitute countries are weighted by the impact and the volatility of each component.
To the best of our knowledge, climate and search queries data have never been considered together to increase the consistency of income and price elasticities estimation in baseline tourism demand models.
The model(s) specification
We start from the following general specification of the demand model:
where D is the log-difference in tourism demand between same months in subsequent years and
As a proxy for price competitiveness, we use the ratio between CPIs in the destination and in the origin countries. To improve the fit, CPIs at the regional level are preferred to the national CPI. We discard the CPIs for the accommodation sector as they provide less significant estimates and a worse fit. Price levels for substitute destinations are not included in the model because of the theoretical and practical difficulties in defining competing destinations for ‘micro’ destinations (Dogru et al., 2017).
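The two transformations described above, the year-on-year log-difference of demand and the destination/origin CPI ratio, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' routine: the function names and the numbers are hypothetical, and any exchange-rate adjustment of the price ratio is left out because the text does not specify it.

```python
import math

def seasonal_log_diff(series, period=12):
    """Return ln(y_t) - ln(y_{t-period}) for every t with a value one period earlier."""
    return [math.log(series[t]) - math.log(series[t - period])
            for t in range(period, len(series))]

def relative_price(cpi_destination, cpi_origin):
    """Month-by-month ratio of the destination CPI to the origin CPI."""
    return [d / o for d, o in zip(cpi_destination, cpi_origin)]

# Hypothetical data: 24 months of arrivals, the second year 10% above the first.
first_year = [100.0 + m for m in range(12)]
arrivals = first_year + [1.1 * a for a in first_year]

D = seasonal_log_diff(arrivals)                        # 12 values, each equal to ln(1.1)
P = relative_price([102.0, 104.0], [100.0, 100.0])     # hypothetical CPI levels
```

Differencing against the same month of the previous year removes a stable seasonal pattern, which is why the resulting series can be read as an annual growth rate.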
The estimation of equation (1) is repeated for different market segments, seasonality patterns and time-dependence structures, as follows.
Different demands (tourism products)
It is well known (Witt and Witt, 1995; Crouch, 1996) that elasticities are strongly influenced by market segments. Thus, we disaggregate the analysis by considering three destinations and two origin markets. The three cities are representative of different mixes of tourism: Florence, a world-famous art city, mainly attracts cultural tourists (Melotti, 2018); Milan is another art city that is also the Italian business and fashion capital (Sainaghi et al., 2018); Catania is a southern destination offering the typical mix of art and Sea & Sun that characterizes most of Southern Italy (Cuccia and Rizzo, 2011). Demand from Germany and the United Kingdom is studied: these markets were chosen as they are among the top five incoming markets for all three destinations; moreover, they adopt different currencies (Euro and British Pound), allowing us to test a possible ‘exchange rate effect’. We consider both monthly overnight stays and arrivals as alternative dependent variables. We cannot include tourism expenditure in the analysis because this information is unavailable at such a fine-grained territorial level (NUTS3).
The non-economic determinants: Climate indices and search queries
Information about climate in the destination is included as a CCI following Zhang and Kulendran (2017):
where w_l is the correlation of each climate indicator
Search query data are obtained from GT as in Bangwayo-Skeete and Skeete (2015). Specifically, searches were defined using the GT filters ‘Category’, ‘Region’, ‘Time Interval’ and ‘Web Search’ (see the guide at https://support.google.com/). We considered the relative volume of searches containing the name of each of the three cities (i.e. ‘Catania’, ‘Florence’ and ‘Milan’) in the category ‘Travels’ and in the subcategories ‘Hotels & Accommodation’, ‘Beaches & Sea’ and ‘Historical Sites & Buildings’ (hereafter: Travel, Hotel, Beaches and Art). Queries were filtered by origin country (looking specifically at German and UK online searches through the filter ‘Region’). The collected monthly data include searches returned on all of Google’s search tools (e.g. Google Images, Google News, etc., defined by the filter ‘Web Search’). We expect this indicator to be positively correlated with tourism demand.
Different endogenous and seasonal dynamics
In line with the most common approach in the literature (see Athanasopoulos et al., 2010), we model the residual temporal dynamics – that is, after seasonal differencing – through two autoregressive components at lags 1 and 12. As the literature does not offer clear evidence that seasonality is always stochastic (Coshall, 2005; Sainaghi and Baggio, 2017), we also introduce a deterministic component given by a set of monthly dummies.
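To make the dynamic structure concrete, the sketch below builds the regressor rows for a seasonally differenced series with autoregressive terms at lags 1 and 12 and optional monthly dummies. It is only an illustration: the function name, the inclusion of a constant and the choice of January as the reference month are assumptions, not details taken from the article.

```python
def build_design(d, months, ar_lags=(1, 12), use_dummies=True):
    """Build regressor rows and targets for the seasonally differenced series d.

    d      : seasonally differenced demand (list of floats)
    months : calendar month (1..12) of each observation in d
    Each row holds a constant, d at the requested lags and, optionally,
    eleven monthly dummies (January is the omitted reference month).
    """
    start = max(ar_lags)          # first t for which every lag is available
    X, y = [], []
    for t in range(start, len(d)):
        row = [1.0] + [d[t - lag] for lag in ar_lags]
        if use_dummies:
            row += [1.0 if months[t] == m else 0.0 for m in range(2, 13)]
        X.append(row)
        y.append(d[t])
    return X, y

# Hypothetical input: 36 months of differenced data starting in January.
d = [0.01 * i for i in range(36)]
months = [(i % 12) + 1 for i in range(36)]
X, y = build_design(d, months)
```

Dropping one month avoids perfect collinearity between the dummies and the constant; calling the function with `use_dummies=False` gives the purely stochastic-seasonality variant.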
Different lag structures for the exogenous variables
For both dependent variables (arrivals and overnights, each measured for the six combinations of origins/destinations studied), we estimate baseline models (i.e. without GT and CCI) including income and price variables at the same lag (0, 1, 2 or 12 months) and augmented models which include the CCI (at lag 0, 1, 2, 3 or 12 months) and one of the four GT indices (at lag 0, 1 or 2 months). Model selection is guided by the Akaike information criterion (AIC).
The wide set of alternatives described above is represented in the following generalization of equation (1).
where
We estimate augmented and baseline (i.e. with
The data
Monthly data on arrivals and overnight stays for each destination, disaggregated by origin market, for the period January 2004 to December 2014 were provided by ISTAT, the Italian Statistics Office, in May 2017, for a fee. Monthly demand is characterized by a positive trend and seasonal dynamics for each combination of origin/destination under consideration (see Figure 1). British tourists show a single summer peak, while Germans display two peaks in May (around Pentecost holidays) and September. Milan has the largest number of arrivals, Florence the largest number of overnights, while Catania has much smaller tourism flows.

Arrivals (upper panels) and overnight stays (bottom panels) for Germany (lhs) and the United Kingdom (rhs). lhs: left-hand side; rhs: right-hand side.
Other descriptive statistics are reported in Table 1. Catania records the largest coefficient of variation (CV), indicating a much stronger seasonality than Milan and Florence: the ratio between high and low monthly peaks is 1283% and 815% for German arrivals and overnights, respectively, and 623% and 583% for UK arrivals and overnights. The same ratios are much lower in Milan (126% and 133% for Germans and 135% and 145% for Britons), consistent with its business vocation (Sainaghi et al., 2018). Florence lies in between.
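As a rough illustration of the two seasonality measures mentioned above, the sketch below computes a coefficient of variation and a peak-to-trough ratio in percent. The exact definitions behind Table 1 are not given in the text, so the use of the population standard deviation and of the raw maximum and minimum are assumptions, and the monthly profile is invented.

```python
import statistics

def coefficient_of_variation(series):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(series) / statistics.mean(series)

def peak_ratio_percent(series):
    """Highest value expressed as a percentage of the lowest value."""
    return 100.0 * max(series) / min(series)

# Hypothetical monthly arrivals for a strongly seasonal destination.
profile = [200, 250, 400, 900, 1600, 1800, 2000, 2400, 1900, 1000, 350, 220]
cv = coefficient_of_variation(profile)
ratio = peak_ratio_percent(profile)   # 2400 against 200, i.e. 1200%
```

A flat, business-driven profile would give a ratio close to 100% and a CV close to zero, which is the contrast the text draws between Milan and Catania.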
Descriptive statistics.
Note: CV: coefficient of variation; D-H: Doornik–Hansen.
** statistical significance at 5%.
* statistical significance at 10%.
We tested normality with the Doornik–Hansen test, while seasonal and non-seasonal non-stationarity was analysed through Hylleberg, Engle, Granger and Yoo (HEGY) statistics, as in Gunter and Onder (2015). Monthly tourism flows display at least one unit root at frequencies: 0,
Information on income and price levels was retrieved from Datastream. Monthly GDP was computed as one third of the corresponding quarterly figure. Income was deflated by the CPI of the related source market
Climate data were obtained from the Operative Centre for Meteorology of the Italian Aeronautical Military Force. Hourly data for each destination were averaged over each month. Monthly rainfall (mm), temperature (°C) and humidity (%) were combined according to equation (2). As the climatic variables are composite indicators weighted by correlation indices, their summary statistics are not straightforward to interpret.
GT monthly data were available from January 2004: data were summarized as a (relative) index, ranging from 0 to 100, where the upper limit corresponds to the month when the largest number of queries was recorded. Thus, the CVs reported in Table 1 can be linked to product seasonality. In fact, a large CV (hence a large variance and/or a low mean) implies that there are many months in which the indicator is very low – that is, the interest of web users is low – and a few spikes. The existence of GT time series with some zero values (corresponding to months where web searches are relatively low) prevents the calculation of unit-root statistics. For the remaining indices, the HEGY test results show non-stationarity at least at one frequency.
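The 0 to 100 scaling described above can be mimicked with a simple rescaling against the busiest month. The sketch is a simplification made for illustration only: GT's actual sampling and normalization are more involved, and the function name and the counts are hypothetical.

```python
def to_relative_index(raw_counts):
    """Rescale so that the busiest month equals 100, rounding to integers."""
    peak = max(raw_counts)
    return [round(100 * c / peak) for c in raw_counts]

# Hypothetical monthly query counts with one strong summer spike.
raw = [1, 50, 400, 100]
index = to_relative_index(raw)   # [0, 12, 100, 25]
```

Months with very few queries relative to the peak round down to zero, which is how zero values can appear in a series even when some searches were made.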
Results
For each combination of origin/destination (2 × 3 = 6) and for both indices of demand (arrivals and overnight stays), we estimated and compared 24 baseline models (6 model variants – see equation (4) – each with the economic variables specified at lag 0, 1, 2 or 12). Then, for each of the 12 combinations of origin/destination and demand measure, we estimated 1440 augmented specifications: the 24 baseline models augmented by considering five different lags for CCI (at 0, 1, 2, 3 or 12 months) and three different lags for each of the four GTs (at 0, 1 or 2 months). Ordinary least squares (OLS) estimation was performed through a Matlab routine which is available on request.
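The counts above follow from a plain Cartesian product, which the sketch below enumerates. It is an illustration rather than the authors' Matlab routine: the six model variants are represented by placeholder labels because their exact definition is not reproduced here, and all names are hypothetical.

```python
from itertools import product

MODEL_VARIANTS = range(6)              # placeholder labels for the 6 model variants
ECON_LAGS = (0, 1, 2, 12)              # common lag of income and price
CCI_LAGS = (0, 1, 2, 3, 12)
GT_INDICES = ("Travel", "Hotel", "Beaches", "Art")
GT_LAGS = (0, 1, 2)

ORIGINS = ("Germany", "United Kingdom")
DESTINATIONS = ("Catania", "Florence", "Milan")
MEASURES = ("arrivals", "overnights")

baseline = list(product(MODEL_VARIANTS, ECON_LAGS))
augmented = list(product(baseline, CCI_LAGS, GT_INDICES, GT_LAGS))
series = list(product(ORIGINS, DESTINATIONS, MEASURES))

print(len(baseline), len(augmented), len(series))   # 24 1440 12
```

Each augmented specification carries exactly one GT index, so the four indices multiply the count instead of entering the same regression together.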
Given the large number of estimated parameters, we present the distribution of price and income elasticities in the baseline and augmented models, always setting non-significant coefficients equal to zero. Detailed estimates are provided only for the 12 best baseline and the 12 best augmented specifications, selected by a penalized fit criterion (AIC). The best augmented models highlight some dynamic features of tourists’ decision process for different combinations of origin/destination and arrivals/overnights.
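The two reporting rules above, zeroing non-significant coefficients and ranking specifications by AIC, can be sketched as follows. The Gaussian AIC formula for least squares is standard up to an additive constant, but the 10% threshold, the function names and the numbers are assumptions made for illustration; the article does not state which AIC variant or cut-off was used.

```python
import math

def aic_ols(rss, n_obs, n_params):
    """Gaussian AIC for a least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

def zero_if_insignificant(coef, p_value, alpha=0.10):
    """Keep the estimate only if its p-value is below alpha (assumed 10%)."""
    return coef if p_value < alpha else 0.0

def best_by_aic(candidates):
    """Pick the candidate with the lowest AIC; each candidate is a dict
    with the keys 'name', 'rss', 'n_obs' and 'n_params'."""
    return min(candidates,
               key=lambda c: aic_ols(c["rss"], c["n_obs"], c["n_params"]))

# Hypothetical fits on the same sample of 100 observations.
candidates = [
    {"name": "baseline", "rss": 2.0, "n_obs": 100, "n_params": 4},
    {"name": "augmented", "rss": 1.5, "n_obs": 100, "n_params": 6},
]
winner = best_by_aic(candidates)
```

AIC values are comparable only across fits on the same observations, so specifications with different lags would need to be estimated on a common sample before being ranked this way.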
Following Song et al. (2010), our benchmark is as follows: for income elasticity, the range from 0.54 to 5.4 for German tourists and from 0.48 to 6.02 for UK tourists; for price elasticity, the range from −7.4 to −0.18 for German tourists and from −9.9 to 0.95 for UK tourists. In what follows, results are presented for the three destinations separately and then some general conclusions are outlined.
Catania
Catania is a cultural city on the seaside and, considering its whole province, it can be considered as a “Sea & Sun” destination, cheaper (in terms of both cost of living and tourism prices) than Florence and Milan.
Figure 2 (left-hand side – lhs) shows the distribution of the elasticities in the 24 baseline models. Income (measured on the x-axis) has a positive coefficient, which is almost always statistically significant. Elasticities are within the range 2–4 for Germans and 2–5 for UK tourists, indicating that tourists perceive a holiday in Catania as a kind of luxury product: as Catania is not one of the best-known destinations in Italy, inbound tourists are likely to travel there when they can afford an ‘extra’ holiday. As in Song et al. (2010), we find that arrivals are more sensitive to income than overnights.

Estimated elasticities for Catania. Germany (upper panels) and United Kingdom (lower panels). Baseline models (lhs) and augmented models (centre and rhs). lhs: left-hand side; rhs: right-hand side.
Both source markets analysed in this study are wealthy countries, where international travel is a common activity and thus tourists are expected to be less sensitive to price changes. Accordingly, the estimated price elasticities are very low in absolute figures (Figure 2, lhs, y-axis) compared to the values reported in the literature. Findings seem to confirm that tourists are more aware of exchange rate changes than inflation, as we find that price elasticities are significant only for UK arrivals.
The comparison between the baseline (left) and the augmented (centre) models highlights that spatially and temporally detailed BD do not substantially alter the sign and the significance of the parameters estimated in the baseline models. There are only a few specifications (e.g. UK overnight stays) where positive and significant estimates of price elasticity appear. These specifications, however, have the worst statistical fit.
The best fit is found when the Beach GT is considered: its coefficient is almost always significant and is higher for UK tourists (Figure 2, right-hand side – rhs, y-axis). The CCI has a significant and positive coefficient only when arrivals are modelled: the straightforward interpretation is that climate is important for the decision to undertake the trip, not for the decision about the length of stay. Comparing Catania with the other destinations, we observe the lowest coefficients for GT and the highest for CCI, consistent with the idea that climate and weather expectations drive consumers’ choice in this “Sea & Sun” destination.
The comparison between the best baseline and augmented models (reported in Table 2) shows that the introduction of non-economic variables always improves the overall fit and diminishes the omitted variable bias, as signalled by better residual statistics. The dynamic structure envisages an AR(12) component with a negative sign, indicating that the annual adjustment of the demand to the long-term trend is ‘oscillating’. In the case of the German market, the positive AR(1) component represents a cyclical (infra-annual) correction in the return to equilibrium that could be read as the tendency of Germans to take holiday decisions in the short term. Accordingly, the economic variables in the model for Germany enter at lag 0, while in the case of the United Kingdom, the optimal lag is 2, suggesting that the decision to buy the holiday takes place around 2 months before departure.
Best baseline and augmented models for Catania.
Note: CCI: Composite Climate Index; AIC: Akaike information criterion. Optimal lags in parentheses.
** statistical significance at 5%.
* statistical significance at 10%.
‘Beaches’ is the most informative GT index for both markets, confirming Catania’s Sea & Sun vocation. Its coefficient is always significant and positive, with very short optimal lags. The best models for Germany include an optimal lag equal to 0 for GT, implying that Google is mostly used to search for information on the destination just before departure. Similarly, the optimal lag in the models for UK arrivals is 1, indicating that this GT index can be considered an important short-term leading indicator for both arrivals and overnights.
The coefficients of the CCI are positive but significant only when demand is measured through arrivals. Consistently with Zhang and Kulendran (2017), the CCI most correlated to the dynamics of demand has a short lag (
Florence
Florence is a worldwide cultural superstar. It is the capital of Tuscany, a region renowned for its countryside, medieval villages and Etruscan sites. Figure 3 (lhs) shows that income (x-axis) almost always has a positive and significant coefficient in the 24 baseline specifications. Elasticities range from 1 to 2.5 for Germany and from 0.9 to 3.5 for the United Kingdom. Their values are lower than the corresponding benchmarks in the literature; due to the lack of similar substitutes, a holiday in Florence seems to be perceived as a ‘must’ by tourists from wealthy countries like Germany and the United Kingdom. However, while German arrivals and overnight stays have similar elasticities, British arrivals are much more inelastic than overnights. Arguably, income affects the length of British holidays more than the decision to travel to such a unique destination.

Estimated elasticities for Florence. Germany (upper panels) and United Kingdom (lower panels). Baseline models (lhs) and augmented models (centre and rhs). lhs: left-hand side; rhs: right-hand side.
Price elasticities (y-axis) are consistent with the law of demand for the UK market only; their low absolute values, compared to the benchmark, support the idea that Florence is perceived as a ‘necessary’ destination, worth visiting regardless of the price. Concerning the German market, some specifications return positive price elasticities, a result not new for Italy (Konovalova and Vidishcheva, 2013) that we believe is driven by the inadequacy of the CPI in representing the relative price of a holiday. This hypothesis is reinforced by the results obtained for the UK market (negative price elasticity), where the exchange rate is likely to increase the effectiveness in measuring relative prices between countries. A positive elasticity to price can also be connected to the Veblen effect, as reported in Crouch (1992), and the possibility that the income effect might outweigh the substitution effect cannot be excluded as a further explanation.
The comparison between baseline models (lhs) and augmented models (centre) signals that coefficients in the latter are distributed around their respective coefficients in the baseline models, although the estimated elasticities are slightly lower in absolute value, consistent with the increase in the number of regressors.
GT indices are important to explain demand dynamics also in Florence. In particular, the optimal GT is ‘Travel’ for Germans and ‘Art’ for the United Kingdom (Figure 3, rhs, y-axis). Coefficients are always significant and larger for Germans, especially when demand is measured by overnight stays. The CCI variable is significant only when arrivals are considered, supporting the idea that the climate is not a strong determinant factor for a cultural holiday.
The introduction of non-economic variables improves the overall fit of the models (see Table 3), as the AICs always decrease moving to the augmented specification. Autocorrelation and heteroscedasticity of the residuals cast doubts about the reliability of elasticity estimates and confirm the difficulties of CPI in summarizing the role of the relative cost of living for German tourists. On the contrary, the UK models (where the exchange rate brings additional information) show much better residuals statistics.
Best baseline and augmented models for Florence.
Note: CCI: Composite Climate Index; AIC: Akaike information criterion. Optimal lags in parentheses.
** statistical significance at 5%.
* statistical significance at 10%.
The dynamic structure presents an AR(12) component with a negative sign, indicating that the annual adjustment of the demand to the long-term trend is counter cyclical. However, in the case of British tourists, the short-term adjustment to equilibrium is also driven by an infra-annual cycle, probably in association with a greater importance given to a ‘short-term’ word-of-mouth effect. Turning to the economic variables, both indices of demand are better explained when a lag of 2 months is considered, and the inclusion of web-based variables does not modify the optimal lag.
The best lag for the GT indicators is always 0, supporting the hypothesis that tourists leave significant fingerprints on the web close to departure. The relationship between web searches and tourism demand is particularly evident for German arrivals. The fact that ‘Travel’ is the category associated with the better specification confirms that the web is used for general information retrieval. On the contrary, British tourists seem more aware of things to do and see in Florence, as the sub-indices ‘Art’ and ‘Hotel’ are associated with the best specifications.
The CCI is only significant in modelling the arrivals, not the length of stay. The optimal lag is 12 months for Germany and 3 months for the United Kingdom. Therefore, German seem rational, as they form their expectations based on the climate 1 year before, while British show an intermediate lag (3 months, similar to the case of Catania), that is difficult to explain, without assuming that tourists are more influenced by current news about climatological events when they book the journey than by climate statistics.
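The optimal lags discussed above can be read as the outcome of a search over candidate lags that keeps the specification with the lowest AIC. The following is a minimal sketch of such a search, not the estimation procedure used in this article: it assumes a single regressor and plain OLS (the models here also include AR terms and further variables), and the names `ols_aic` and `best_lag`, as well as the simulated series, are purely illustrative.

```python
import numpy as np

def ols_aic(y, X):
    """Fit OLS by least squares and return the Gaussian AIC (up to a constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    return n * np.log(rss / n) + 2 * k

def best_lag(y, x, max_lag):
    """Return (lag, aic) minimizing AIC for y_t = a + b * x_{t-lag} + e_t."""
    n = len(y)
    aics = {}
    for lag in range(max_lag + 1):
        # Common estimation sample (t >= max_lag) keeps AICs comparable across lags.
        y_s = y[max_lag:]
        x_s = x[max_lag - lag : n - lag]
        X = np.column_stack([np.ones_like(x_s), x_s])
        aics[lag] = ols_aic(y_s, X)
    lag = min(aics, key=aics.get)
    return lag, aics[lag]

# Simulated monthly data (hypothetical): demand reacts to the indicator after 2 months.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[2:] = 1.0 + 2.0 * x[:-2] + 0.1 * rng.normal(size=198)
print(best_lag(y, x, max_lag=12))
```

Estimating every candidate on the same sample matters: if shorter lags were allowed to use more observations, their AICs would not be comparable with those of longer lags.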
Milan
Milan is the economic capital of Italy, mainly a business and fashion destination. With direct flights to all the major cities in Europe, it is also less than a 4-h drive from Southern Germany, for which it is certainly a short-haul destination.
Accordingly, Figure 4 (baseline model, lhs) shows insignificant elasticity to income (x-axis). The estimated price elasticities (y-axis) are only significant for the UK market, where the exchange rate variations allow a better description of the relative cost of living, but their range (from −0.3 to −0.7) confirms the low sensitivity of demand to price that is typical of business tourism.

Figure 4. Estimated elasticities for Milan. Germany (upper panels) and United Kingdom (lower panels). Baseline models (lhs) and augmented models (centre and rhs). lhs: left-hand side; rhs: right-hand side.
When spatially detailed and timely BD are included in the augmented models (Figure 4, centre), price elasticities for Germany sometimes become positive and significant. Since Milan is a business tourism destination, this evidence may relate to the increased profitability of doing business in a destination, which pushes prices up (Blake and Cortes-Jiménez, 2007). However, as both baseline and augmented specifications present residual autocorrelation (see Table 4), we believe that the significant and positive price elasticities do not rule out the possibility that tourists adjust their purchasing behaviour upon arrival at the destination to account for unexpected price levels (Seetaram et al., 2016).
Table 4. Best baseline and augmented models for Milan.
Note: CCI: Composite Climate Index; AIC: Akaike information criterion. Optimal lags in parentheses.
** statistical significance at 5%.
* statistical significance at 10%.
The comparison between the best models (Table 4) confirms that the introduction of GT and CCI improves the fit (the AIC decreases). Residual statistics for the German models highlight a misspecification problem, which is much less serious in the UK case. Accordingly, UK price elasticities are negative and highly significant, confirming that the exchange rate is a better proxy for tourism price dynamics, especially in a low-inflation regime. The optimal lag suggests that the decision to visit Milan is taken at short notice, about 2 months in advance.
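Residual autocorrelation of the kind flagged above is commonly checked with a portmanteau statistic. The text does not name the diagnostic behind the reported residual statistics, so the sketch below uses the Ljung-Box Q statistic purely as an illustrative assumption; the function name `ljung_box_q` and the simulated series are hypothetical. Under the null of no autocorrelation, Q is approximately chi-squared distributed, with degrees of freedom usually reduced by the number of estimated ARMA parameters when the input is a set of model residuals.

```python
import numpy as np

def ljung_box_q(resid, max_lag):
    """Ljung-Box Q statistic for autocorrelation up to max_lag."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = float(e @ e)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = float(e[k:] @ e[:-k]) / denom  # sample autocorrelation at lag k
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# Simulated residuals (hypothetical): white noise versus a persistent AR(1) series.
rng = np.random.default_rng(0)
white = rng.normal(size=300)
ar1 = np.zeros(300)
for t in range(1, 300):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]
print(ljung_box_q(white, 12), ljung_box_q(ar1, 12))
```

A large Q for the residuals of a fitted demand model would point to leftover dynamics, which is the sense in which the German specifications are described as misspecified.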
The dynamic structure presents an AR(12) component with a negative sign, indicating countercyclical adjustment to the long-term trend. In the case of German tourism, the positive AR(1) component represents a month-to-month cyclical adjustment that could be associated with the need for repeat visits during, for example, the execution of a contract, something that is typical for a business destination.
The GT index associated with the best fit is the subcategory ‘Travel’. Coefficients are almost always significant and larger than in other destinations, especially for the UK segment (Figure 4, rhs). The optimal lag is 0, indicating that tourists mostly search for Milan close to departure. The CCI variable is rarely significant and the coefficients are generally lower than in Catania and Florence. GT indices are hence the only informative variable for policymakers wishing to monitor demand dynamics in a business tourism setting which is, as expected, not affected by climate conditions.
Conclusions
The present work aims to assess if and when spatially detailed and timely non-economic BD, such as GT data and the CCI, embody useful information about tourism demand dynamics in small areas (local destinations), where accurate and timely official statistics are seldom available. As previous literature employs climate and search query variables separately, focusing on the enhancement of forecasting accuracy, the novelty of our approach is to consider these variables jointly and to pay attention to the robustness of the estimated elasticities and to their economic interpretation.
Our main conclusion is that augmenting the demand functions with geographically detailed and timely non-economic data does not substantially change the economic interpretation of demand models. Moreover, the penalized fit improves, while an (albeit marginal) improvement in residual autocorrelation and heteroscedasticity indicates more reliable estimates of price and income elasticities.
Additionally, previous literature mainly studies whole countries, regions or small groups of countries, thus ignoring the issue of heterogeneity between (micro) destinations. Therefore, should different tourism segments have different elasticities, such studies would estimate averages that would mislead policymaking in ‘micro’ destinations. In our work, we tackle the issue of heterogeneity by disentangling the lag/dependence structure and the GT and CCI indicators that best capture specific origin/destination combinations.
Our findings suggest that there are indeed relevant differences between local destinations, likely stemming from the main type of tourism that is hosted (business vs. cultural vs. sea and sun) and from the intrinsic characteristics of areas. Similar considerations also apply to the differences found when alternative origin markets are considered. This result is key from both the methodological and policy perspectives, as it suggests developing and using tailor-made models for each specific destination and rejecting general models with common indicators and lag/dependency structures. The bottom line is that policymakers and researchers should adapt the baseline models to the specific combination of tourism type, source market and destination under consideration, to capture the idiosyncratic characteristics of local tourism demand.
We now summarize the more specific results stemming from our case studies. First, we find positive and significant income elasticities, smaller in a ‘must see’ cultural destination such as Florence than in a Sea & Sun destination like Catania, which is perceived as more of a luxury ‘extra’ holiday. As expected, estimates of income elasticity are not significant for Milan, a business destination. Generally, the optimal lag for income is 2 months (United Kingdom in Catania and Florence, Germany in Florence), which is consistent with a choice pattern where the decision to travel to such short-haul destinations is made with a short lag. The optimal lag for Germany in Catania is 0, but the models with a lag of 2 months show very similar performance.
Second, relative prices, proxied by the ratio between CPI in origin and destination regions, are generally found insignificant in Catania, the Sea & Sun destination where the cost of living is relatively low compared to European standards. The estimated price elasticities are instead significant (and negative, as expected) in more expensive destinations like Florence and Milan. Coefficients are more significant for UK than for German demand: this confirms that, at least in ‘micro’ destinations, tourists tend to be more aware of exchange rate changes than of inflationary effects in the destination they plan to visit. No relevant difference is found between the models for arrivals and for overnight stays. Theoretically, the price level should be more relevant in explaining the length of stay than the arrivals, but our case studies are short-haul destinations, where the majority of trips are weekend breaks or short business trips.
Third, the inclusion of non-economic variables (GT and CCI) keeps the sign and the significance of the economic coefficients (income and price) quite stable. Nevertheless, moving from baseline to augmented specifications, we observe better overall fit and residual statistics, suggesting a more reliable estimation of the demand function’s parameters. This feature is very important when policymakers’ goal is to understand how sensitive different segments are to policy intervention. This is particularly true in business travel destinations, where web search volume is often the only significant information, as climate indicators and price or income elasticities are rarely significant.
Fourth, in most of the specifications, the sign of the GT variable is positive and significant at very short lags. The optimal lag is 0 (or 1, in the case of UK tourists in Catania), highlighting that tourists mainly search for information about the destination and its attractions close to departure. The result does not exclude that tourists may use the Internet to decide among alternative destinations but underlines the importance, for local stakeholders, of web marketing activity in obtaining and sustaining competitive advantage.
Fifth, segment heterogeneity also plays a key role that policymakers cannot underestimate. The best GT index to describe tourist demand dynamics changes with the origin/destination combination. Specifically, the category ‘Travel’ is the most informative for Milan, a business destination, while the best indicator for Catania, a Sea & Sun destination, is ‘Beaches’. The case of Florence, a cultural destination, is emblematic of this heterogeneity, as the fingerprints of the two origin markets are better detected by the subcategories ‘Travel’ for German tourists and ‘Hotel’ and ‘Art’ for British tourists. It is hence straightforward to conclude that the high granularity of web-based information should be exploited in econometric modelling and by decision makers. On the one hand, no simple recipe can be easily adapted to all markets but, on the other hand, the gain in terms of efficient use of information can be substantial.
Sixth, we find that the CCI is significant (and with the expected positive sign) only in a few cases. In particular, it is never significant in Milan (as business tourism is less sensitive to climate), while in Florence and in Catania, it is significant only when demand is measured by arrivals. This is consistent with the assumption that climate conditions are linked to the decision-making process and to the trip motivations rather than to the length of stay. The optimal lag for the CCI is seldom 12 months (only for German tourists in Florence), which is the expected lag if tourists build their predictions on climate by looking at climatological statistics of the previous year. By contrast, but consistently with Zhang and Kulendran (2017), the optimal lags for the CCI often range from 1 month to 3 months. The substantial coincidence with the lags of the income and price variables leads us to assume that tourists are less concerned with the comparison of temperature, precipitation and humidity across the seasons and more influenced by climate conditions or by climatological news when they book the journey.
This article has many limitations, as the topic it tackles is ever evolving. First, continuous improvements in the availability and accuracy of geographically detailed and timely web-based data call for further refinements in the model specifications that can be applied to study tourism demand in micro-destinations. For example, patterns of micro-seasonality due to demand variations across the days of the week have not been addressed, although their relevance for local tourism policy is high. The use of daily GT and CCI data might shed further light on this issue, possibly when linked to other data sources (e.g. accommodation prices provided by online search engines and occupancy rates provided by STR). Second, as there are many possible combinations of individual destination characteristics, origin regions and tourism products, our article analyses only three destinations in one country. Applying our methodology to other case studies might lead to the identification of robust general findings and shed further light on the role played by search data and climate indicators, which could be useful for both forecasting and policymaking. Finally, as a further extension of our approach, innovation in data collection and organization might imply that, in the near future, official economic data (e.g. income and prices) will also be published by National Statistical Offices with increased granularity, making it possible to include them in augmented models.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
