Abstract
This study uses geotagged photos from Instagram to identify differences between the popular places in Vienna for residents and visitors. Moreover, we explore whether geotagged data can be useful in determining tourism demand in Vienna. The spatial analysis of 627,632 geotagged photos reveals the top-50 locations in Vienna for all-, local-, and visiting-Instagram users based on three popularity indicators (numbers of likes, comments, and photos). The results show that the top locations unique to local users are closely linked to activities residents usually pursue or location types they usually visit at their place of dwelling. In using geotagged photos to predict actual tourist arrivals to Vienna, we conclude that only the popularity indicators number of likes and number of comments based on the location ID “Vienna, Austria” for visitors to Vienna should be used and not the number of photos, since this indicator does not automatically generate engagement.
Keywords
Introduction
Whenever an individual uses the internet or any digital device for any purpose, they leave electronic traces behind. These traces are called digital footprints, which can be collected and analyzed for reasons such as improving services or products. Today, these types of data are also used to understand and predict consumer behavior. For instance, analyzing customer clickstream data logs from web analytics tools reveals trails of customers’ online activities and purchasing patterns, which can then be used for product placement optimization, customer transaction analysis, and market structure analysis (H. Chen et al., 2012). Digital footprints form part of big data, which consist of large datasets that are impossible to analyze using traditional computer processing features or that just take a very long time to analyze. However, they are an invaluable source for understanding consumer behavior, such as shopping patterns or purchasing behavior (Song & Liu, 2017). Social media posts from Twitter, Facebook, or Flickr, which can all be considered big data, are used in tourism research as well.
Previous research, for instance, has established the importance of social media in decision making, information search, and promotion (Zeng & Gerritsen, 2014). On the other hand, destination management organizations are also interested in tracking visitors to see where they go, how long they stay, and what they think of the destination’s attractions (Miah et al., 2017). This type of georeferenced information contained in social media posts needs to be extracted and analyzed so that it can be used for decision making and predicting future visitor behavior by the supply side of the tourism industry. Georeferenced information is typically available in the form of geotags, which are attached as metadata to photos, videos, or other types of social media posts, and provide the geographical location in the form of latitudinal and longitudinal coordinates. This information gives researchers the opportunity to identify the geographical location of the user at the time a photo was taken. Although geotagged photos have been previously used in research, it is only recently that geotags in Twitter have been employed to identify the tourist flows in southern Italy (Chua et al., 2016).
Instagram is one of the most popular photo-sharing mobile applications in the world. There are estimated to be 500 million daily active Instagram users, who upload a combined total of more than 100 million photos and videos each day, thereby receiving 4.2 billion Instagram likes per day (Omnicore, 2018). Instagram photos belong to the category of geotagged photos, which include as metadata the aforementioned geographical location in the form of latitudinal and longitudinal coordinates, as well as location IDs, user IDs, time stamps, the numbers of likes and comments received, and other characteristics such as the users’ hashtags. This information can, for instance, be used to track where users were located at a given time. The reasons that the photo sharing platforms Flickr and Panoramio have been used for research include their popularity as well as the openly available data, which can be retrieved free of charge via the platforms’ application program interfaces (APIs).
Compared with Flickr and Panoramio, Instagram has become more popular due to its mobile application, which gives users easy access to upload photos directly from their mobile phones. Flickr has become a platform for professional photographers and for semiprofessional users to archive their photos and exhibit their work online, whereas Instagram is used by everyone. Moreover, Panoramio is not available anymore. Companies such as Picodash (https://www.picodash.com/) can also retrieve these data from the Instagram API for those users who have not set their profiles to “private.” Consequently, data cannot be retrieved for the entire population of Instagram users, but for a quite large sample. This was a primary source of data for the present study.
The spatial analysis of 627,632 individual geotagged photos purchased from Picodash reveals the top-50 locations in Vienna for all-, local-, and visiting-Instagram users based on three popularity indicators (numbers of likes, comments, and photos). When comparing the ranking of the top-10 locations (all users and in terms of likes) with the official data on visitor numbers to Viennese attractions for 2017, for five of the top-10 locations in the present sample (notably “Schloß Schönbrunn,” “Belvedere, Vienna,” “Wiener Staatsoper,” “Kunsthistorisches Museum Vienna,” and “Albertina Museum”) it can be concluded that these locations also appear among the most visited attractions of Vienna according to official statistics (Statistics Austria, 2017; official statistics were available only for these top-10 locations). This also corroborates the accuracy and, thus, the usefulness of the employed Instagram metadata. The differences in popular locations uncovered between the user categories were few, but quite remarkable. Based on global indices of spatial autocorrelation, it is concluded that Vienna’s city authorities may still want to improve their public transportation routes and schedules linking the large and distant top-50 locations in order to ensure sustainable mobility. The temporal analysis, in turn, shows that the numbers of likes and comments associated with the posts of users visiting the location ID “Vienna, Austria” indeed constitute a predictor of actual tourist arrivals to Vienna. Thus, considering this information a priori potentially leads to more accurate out-of-sample tourism demand forecasts.
The remainder of this study is structured as follows: the Literature Review section provides an overview of the literature on the underlying economic theory serving as the study’s conceptual framework, on web-based leading indicators in tourism demand forecasting, as well as on geotagged photos in tourism research, leading to the main contributions of the present study; The Data section presents and describes the data set; the Spatial Dimension of the Data: Top-50 Instagram Locations in Vienna for All, Residents, and Visitors section explores the spatial dimension of the data in more detail; the Temporal Dimension of the Data: Usability of Instagram Data to Forecast Actual Tourist Arrivals section explores the temporal dimension of the data in more detail; and finally, the Conclusion and Policy Recommendations section draws overall conclusions and provides some policy recommendations.
Literature Review
Conceptual Framework
From a microeconomic theoretical perspective, possessing information on the current and predicted future behavior of local- and visiting-Instagram users could help alleviate asymmetric information on the part of both the city authorities and the destination management organization of Vienna (Mas-Colell et al., 1995). Knowing and acknowledging what Instagram users are interested in could assist in the selection of more appropriate policy measures, thus mitigating losses of efficiency and economic welfare, ranging from a unique Pareto inefficient market equilibrium, over multiple Pareto inefficient market equilibria, to the possibility of complete market failure without reaching any equilibrium at all (Mas-Colell et al., 1995). This would be the case with the presence of information not symmetrically distributed between local- and visiting-Instagram users on the one hand and the city authorities and the destination management organization of Vienna on the other. In other words, possessing relevant information and being able to analyze and to interpret it correctly by applying an appropriate, structured big data analytics method for social media would support the destination management organization of Vienna (and also Vienna’s city authorities) in their strategic decision making (Miah et al., 2017).
Pareto efficiency (or Pareto optimality), in turn, would mean the ideal situation in which it is impossible to make any market participants better off without making other market participants worse off (Mas-Colell et al., 1995). This is an important characteristic of market structures since when they are not perfectly competitive due to the presence of asymmetric information, external effects, or other market distortions such as monopoly power (i.e., one or more market participants are price makers) or inappropriate taxation, they are not Pareto efficient. In more technical and compact terms, this relationship has been formulated as the First Fundamental Theorem of Welfare Economics. It reads as follows: If every relevant good is traded in a market at publicly known prices (i.e., if there is a complete set of markets), and if households and firms act perfectly competitively (i.e., as price takers), then the market outcome is Pareto optimal. That is, when markets are complete, any competitive equilibrium is necessarily Pareto optimal. (Mas-Colell et al., 1995, p. 308, italics as in original)
While ensuring a better access to decision-relevant information for the city authorities and the destination management organization of Vienna will not necessarily lead to the ideal state of Pareto optimality, it will certainly result in a Pareto improvement compared with a situation characterized by asymmetric information, as has also been shown in the literature (e.g., Arnott et al., 1994; Filipova-Neumann & Welzel, 2010).
From an empirical methodological perspective, the present study applies Stages 2 (i.e., Geographical Data Clustering; see Spatial Dimension of the Data: Top-50 Instagram Locations in Vienna for All, Residents, and Visitors section) and 4 (i.e., Time Series Modeling; see Temporal Dimension of the Data: Usability of Instagram Data to Forecast Actual Tourist Arrivals section) of the Geotagged Photo Analytics Artefact proposed by Miah et al. (2017). Using a design science approach, these authors develop a formal method for the analysis of big data obtained from social media consisting of four distinct stages. However, the applied stages had to be adapted according to the structure of the present data, as well as to the research objectives of this study.
Pertaining to the predicted future behavior of local- and visiting-Instagram users in particular, it has been noted by Song et al. (2009) that tourism demand is the basis of essentially all business decisions in tourism. Consequently, accurate tourism demand forecasts are indispensable in order to mitigate the risks inherent to the decision-making process of any market participant in the tourism industry (Frechtling, 2001). This includes Vienna’s destination management organization since accurate demand forecasts are crucial for its short-term operational and long-term strategic decisions. If the digital footprint of local- and visiting-Instagram users indeed constitutes relevant information by including it in the form of web-based leading indicators in the prediction of actual tourism demand (Chatfield, 2001), (potentially) resulting in more accurate tourism demand forecasts, both the city authorities and the destination management organization of Vienna are advised not to neglect this outcome in order to avoid selecting inappropriate policy measures and thus precluding the aforementioned negative implications on efficiency and economic welfare.
Web-Based Leading Indicators in Tourism Demand Forecasting
Following the notation by Chatfield (2001), an
One objective of this study (see Temporal Dimension of the Data: Usability of Instagram Data to Forecast Actual Tourist Arrivals section) is therefore to identify if the aforementioned three popularity indicators (numbers of likes, comments, and photos) are relevant elements of the information set
In economic forecasting, leading indicators are economic factors that change before the economy as a whole changes direction. This is why they are also helpful for predicting changes in the economy. As a result of the ubiquity of the Internet and the emergence of and access to Big Data, many web-based leading indicators such as Google Trends, Facebook likes, Twitter (re)tweets, Instagram shares, etc. can potentially be used to predict the future of the economy as a whole or of a specific economic sector. For instance, Askitas and Zimmermann (2009) indicate that there is a strong correlation between keyword searches in online search engines and unemployment rates; thus, search queries are useful for forecasting macroeconomic indicators.
In tourism demand forecasting, similar types of web-based leading indicators have been used in previous studies. One such indicator is web search data from online search engines such as Google or Baidu, which have been shown to be useful web-based leading indicators for predicting future tourism and hospitality demand (e.g., Bangwayo-Skeete & Skeete, 2015; Önder & Gunter, 2016; Yang et al., 2014). One example from social media is Facebook likes: Gunter et al. (2019) show that including Facebook likes, as well as the combination of Facebook likes and Google Trends, improves the forecast accuracy of tourist arrivals to the two Austrian cities of Salzburg and Vienna across forecast horizons and forecast accuracy measures. Although the present study is not a pure tourism demand forecasting study, the temporal dimension of the metadata of geotagged photos from Instagram is exploited to investigate if the present popularity indicators can serve as web-based leading indicators.
Geotagged Photos in the Tourism Literature
Geotagged photos have previously been investigated in tourism research for various reasons. These include creating a recommendation system to suggest places to visit to first time visitors (Mamei et al., 2010), identifying tourist movements (Girardin et al., 2008; Önder et al., 2016), classifying multi destination trips (Önder, 2017), identifying tourist attractions in Budapest (Kadar & Gede, 2013), identifying places visited, duration of stay, and panoramic spots within destinations (Popescu et al., 2009), identifying points of interests (Kisilevich et al., 2010), and creating dynamic maps based on user locations (W. Chen et al., 2009), as well as creating automated travel itineraries (De Choudhury et al., 2010).
Social media posts that include geotagged photos are relevant for tourism research in order to understand the movement patterns of individuals at a destination. According to Wong et al. (2017), tourism organizations should use geotagged photographs to understand tourist behavior and preferences since it is less costly than traditional data collection methods such as surveys and since more data can be collected using geotagged photos than using traditional methods. Due to the fact that geotagged photos have a time stamp indicating when the photo was taken, as well as the user ID of the person who uploads the photo online, the spatio-temporal movements of users can be identified. For instance, Zheng et al. (2012) retrieve data from various websites to piece together the travel patterns of different tourists using a Markov chain model, which proved successful in identifying travel patterns among the study sample. Hu et al. (2015) use Flickr photos to identify the points of interest in urban areas for 10 different cities over a 10-year period, thus showing how the interest in some places changes over time.
García-Palomares et al. (2015) look at geotagged photos from various European cities on Panoramio and analyze the differences between residents and tourists in terms of their spatial distribution in each city. These authors find that some cities, such as Barcelona and Rome, show a high concentration in some areas, whereas Paris and London show a more spatially dispersed distribution. Vu et al. (2015) investigate the movement patterns of domestic tourists in Hong Kong using geotagged photos from Flickr to identify areas of interest, travel patterns, and daily activities. In another study by Yuan and Medel (2016), geotagged photos are used to model international travel behavior and interregional travel flows. Recommendation systems based on geotagged photos have also been examined, for example by Jiang et al. (2013), who investigate how to identify tourist attractions and estimate the popularity of these attractions based on the number of geotagged photos.
It is the ready online availability of geotagged photos, such as those from Instagram, which yields advantages for research compared to data collected based on questionnaires. Not only is the data collection process faster, but it also results in more observations than traditional methods. This type of big data therefore has significant potential to improve knowledge in the tourism research domain, particularly by enabling new insights and understanding related to travelers’ behaviors.
Contributions of This Study
Thus, research using geotagged photos to discover tourist flows already exists and has demonstrated the efficacy of this approach, yet still remains somewhat scarce such that more research is necessary to explore its full potential. For instance, none of the existing studies have employed Instagram data to date, despite the fact that this platform has become more popular than Flickr or Panoramio. Thus, the use of Instagram data in the present research implies potentially greater coverage of the overall daytime population at the study location, including both residents and visitors. Geographically, the study focusses on Vienna, the capital city of Austria, which has been one of the top-15 cities in Europe in terms of overnights for several consecutive years with over 16 million overnights in 2017 (Rank 9 out of 15 in 2017), thereby being characterized by an average annual growth of 4.9% between 2013 and 2017 (European Cities Marketing & MODUL University Vienna, 2018), and therefore constitutes an important study object for tourism researchers. The spatial and temporal investigation of geotagged photos from Instagram for the city of Vienna is therefore one of the major contributions of this study, thereby filling a gap in the existing literature. Uncovering the differences between residents and visitors could also potentially help city authorities and the destination management organization to moderate crowds between attractions in hot-spot areas, as well as allowing them to rethink public transportation routes and schedules, especially for Vienna’s large and distant top-50 locations.
The main theoretical contribution of this study consists of taking the consequences of asymmetric information on current and predicted future behavior of residents and visitors into account as its conceptual framework, while applying an appropriate, structured big data analytics method for social media similar to the one suggested by Miah et al. (2017). The main methodological contribution of this research pertains to the application of the mixed data sampling (MIDAS) model class when assessing the data’s temporal dimension (see Temporal Dimension of the Data: Usability of Instagram Data to Forecast Actual Tourist Arrivals section), which still has not been widely used in tourism demand modeling and forecasting to date (see, e.g., Bangwayo-Skeete & Skeete, 2015; Gunter et al., 2019; Volchek et al., 2019, for some notable examples), thereby also going beyond the simple univariate time-series techniques proposed by Miah et al. (2017).
One further contribution of this study is that by using geotagged photos, the data on visitors cover not only information on tourists but also on excursionists, which are typically difficult and costly to measure in practice (e.g., by conducting surveys). This costly and difficult measurement of both residents and visitors also pertains to the attraction level in general since those attractions that do not sell tickets, such as parks, squares, viewpoints, and so on, typically do not provide any official statistics. In the present sample, this is relevant, for instance, for the top-10 locations (all users and in terms of likes) “Stephansplatz” and “Volksgarten,” but also for many others among the top-50.
The Data
Metadata on geotagged photos from Instagram were purchased on February 8th, 2018 from the company Picodash (https://www.picodash.com/). The sample employed in the present study includes observations for 627,632 individual geotagged photos with time stamps running from October 30th, 2011 (4:15:51 p.m.) until February 7th, 2018 (6:16:23 a.m.), which were extracted using Instagram’s API based on the hashtag “#Vienna” within the geographical boundaries of Austria’s capital city. Only metadata from those Instagram users whose profiles had not been set to “private” could be obtained (https://www.picodash.com/export-instagram-data).
While the original metadata also include qualitative information such as the photo’s caption text, all hashtags, linked users, as well as links to the photo (and/or video) in various resolutions, to the user’s (full) name, to their profile picture, and so on, only the information necessary for the quantitative analyses that follow was retained for this study. In particular, this includes the photo ID, the user ID, the location ID, the location latitude and longitude, the time stamp, the number of likes, the number of comments, and the number of photos from a given location and/or by a certain user, which can be calculated using the photo IDs. The latter three variables serve as popularity indicators for the various locations visited. Unfortunately, the structure of the data (large spatial dimension and very limited temporal dimension, which can be considered only at monthly intervals because other time series crucial to the analysis are available only at a monthly frequency or lower) does not allow for the analysis of both dimensions jointly.
To descriptively explore the spatial dimension of the metadata, aggregates over the whole sample period per location ID were calculated for the numbers of likes, comments, and photos, while differentiating between all users, local users, and visiting users (i.e., excursionists and tourists). The following heuristic as suggested by Girardin et al. (2008) and Önder (2017) was employed to make this distinction: if the period between the first and last photo taken at the destination by a user, who is uniquely identifiable via their user ID, is up to 30 days, they are considered as visitors; if they upload photos from the destination to Instagram during a time span of more than 30 days they are considered as residents. This procedure resulted in the classification of approximately 17.85% of all user IDs as local users, thus visiting users dominate the study sample. These and the other calculations and illustrations in this study have been created with EViews Version 10 (mostly temporal analysis), Microsoft Excel 2013 (mostly data cleaning and organization), and Stata Version 11 (mostly spatial analysis).
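The resident/visitor heuristic described above can be illustrated with a minimal sketch. It is not the authors' implementation (the study's calculations were carried out in EViews, Excel, and Stata); the function name `classify_users`, the input layout of (user ID, time stamp) pairs, and the sample records are hypothetical. The sketch also assumes that the 30-day span is measured as exact elapsed time between a user's first and last photo, which the text does not specify.

```python
from datetime import datetime, timedelta


def classify_users(photos, max_visit_days=30):
    """Label each user as "visitor" or "resident".

    photos: iterable of (user_id, timestamp) pairs (hypothetical layout).
    A user whose first and last photo lie at most `max_visit_days` apart
    is treated as a visitor; a longer span marks the user as a resident.
    """
    first_last = {}
    for user_id, ts in photos:
        earliest, latest = first_last.get(user_id, (ts, ts))
        first_last[user_id] = (min(earliest, ts), max(latest, ts))

    limit = timedelta(days=max_visit_days)
    return {
        user: "visitor" if (latest - earliest) <= limit else "resident"
        for user, (earliest, latest) in first_last.items()
    }


# Made-up example records, not taken from the study's data.
photos = [
    ("u1", datetime(2016, 5, 1, 10, 0)),
    ("u1", datetime(2016, 5, 4, 18, 30)),   # 3-day span
    ("u2", datetime(2015, 3, 10, 9, 0)),
    ("u2", datetime(2016, 1, 20, 12, 0)),   # span of many months
    ("u3", datetime(2016, 6, 2, 8, 0)),     # single photo, span of zero
]
labels = classify_users(photos)
```

Under this rule, a user with a single photo has a span of zero and is therefore labeled a visitor, which is one reason why visiting users can be expected to dominate a sample classified in this way.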
Figure 1 and Supplement Figures 1 and 2 (available online) show the geographical dispersion of the number of likes (Figure 1), the number of comments (Supplement Figure 1, available online), as well as the number of photos (Supplement Figure 2, available online) for all users aggregated over the whole sample period. Since the somewhat generic location ID “Vienna, Austria” is the most frequently used location ID, it has been excluded from these figures in order to better visualize the differences in popularity between particular locations within Vienna. Each column in Figure 1 and Supplement Figures 1 and 2 (available online) represents a different location (ID), while the height of the columns corresponds to the number of likes, comments, or photos per location, respectively for each figure. While it can be seen from these figures that there are similar geographical patterns across variables, Figures 2 to 4 (to be explained in more detail below) show that “likes” exhibit higher frequencies per location than the other popularity indicators. All three indicators reveal that the top-5 Viennese locations (i.e., the five location IDs characterized by the highest columns) correspond to the well-known Viennese attractions (in descending order of popularity and using the original location IDs from Instagram): “Schloß Schönbrunn,” “St. Stephen’s Cathedral, Vienna,” “Stephansplatz,” “Belvedere, Vienna,” and “Rathaus, Vienna” (apart from the generic location ID “Vienna, Austria” as aforementioned).

Figure 1. Number of Likes (All Users) per Location ID Aggregated Over the Whole Sample Period (Each Column Corresponds to One Location ID)

Figure 2. Top-50 Location IDs in Vienna for All Users Aggregated Over the Whole Sample Period

Figure 3. Top-50 Location IDs in Vienna for Local Users Aggregated Over the Whole Sample Period

Figure 4. Top-50 Location IDs in Vienna for Visiting Users Aggregated Over the Whole Sample Period
To also descriptively explore the temporal dimension of the metadata in more detail, the individual time stamps were aggregated to daily values for the period March 2015 until June 2016 to generate daily values of number of likes, number of comments, and number of photos. Since the objective of the temporal component of this study is to explore if any of the three popularity indicators possess information valuable for forecasting future visitor numbers to Vienna, only the visitors, as defined above, were retained for this analysis. Some form of aggregation to a variable with a regular frequency is necessary for time-series analysis and forecasting, since typical multivariate time-series models cannot be properly employed for unequally spaced and sparse raw data.
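As an illustration of this aggregation step, the following sketch sums likes and comments and counts photos per calendar day. It is a sketch under stated assumptions rather than the procedure actually used in the study: the function name `aggregate`, the record layout (time stamp, likes, comments), and the sample values are hypothetical. The same helper is also applied with a (year, month) key, since the tourist arrivals series is only available at a monthly frequency.

```python
from collections import defaultdict
from datetime import datetime


def aggregate(records, key):
    """Sum likes and comments and count photos per time bucket.

    records: iterable of (timestamp, likes, comments), one per photo
             (hypothetical layout).
    key:     function mapping a timestamp to its bucket, e.g. a date.
    """
    totals = defaultdict(lambda: {"likes": 0, "comments": 0, "photos": 0})
    for ts, likes, comments in records:
        bucket = totals[key(ts)]
        bucket["likes"] += likes
        bucket["comments"] += comments
        bucket["photos"] += 1  # each record is one photo
    return dict(totals)


# Made-up, irregularly spaced example records.
records = [
    (datetime(2015, 3, 1, 9, 15), 40, 2),
    (datetime(2015, 3, 1, 21, 5), 10, 0),
    (datetime(2015, 3, 2, 12, 0), 25, 1),
    (datetime(2015, 4, 7, 8, 30), 5, 3),
]
daily = aggregate(records, key=lambda ts: ts.date())
monthly = aggregate(records, key=lambda ts: (ts.year, ts.month))
```

Days without any photo do not appear in the result of this sketch; for a regularly spaced series they would have to be filled in explicitly (e.g., with zeros).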
While it would have been interesting to also perform the same type of analysis at the location level, official data on visitor numbers to attractions—if at all—is unfortunately only available at an annual frequency for Viennese attractions (Statistics Austria, 2017), which precludes the investigation of whether the popularity indicators (number of likes, comments, and/or photos) are indeed predictors of actual visitor numbers within multivariate time-series models. The limitation to the aforementioned period resulted from a visual inspection of the data, which showed that all three time series exhibit non-negligible structural breaks before and after this period, which are most likely due to frequent and continuing changes in the terms of use of Instagram’s API (https://www.instagram.com/developer/). In order to investigate the predictive capacity of the three popularity indicators for estimating visitor numbers to Vienna, these data are also needed. Total tourist arrivals to Vienna (domestic and foreign tourist arrivals to all paid forms of accommodation) were obtained from the TourMIS database (www.tourmis.info) for the period March 2015 until June 2016, and are only available at a monthly frequency. A differentiation per source market (e.g., domestic vs. foreign) is unfortunately not possible for the present data.
Supplement Figure 3 (available online) presents time-series graphs for all four variables (total tourist arrivals, likes, comments, and photos) for the period March 2015 until June 2016, whereby the time series for the popularity indicators are differentiated according to all location IDs (top lines) and the location ID “Vienna, Austria” only (bottom lines). This differentiation is included in order to investigate whether one of the two time series possesses more information relevant for forecasting than the other. Both the time-series graphs based on all location IDs and those based only on the location ID “Vienna, Austria” show quite a similar pattern over time. The dominance of all location IDs (top lines) over the location ID “Vienna, Austria” (bottom lines) is explained by the fact that the “Vienna, Austria” location ID only partly represents Instagram activity for Vienna as a whole. For legibility reasons, only data for the subsample March 2015 until June 2016 are displayed in Supplement Figure 3 (available online), but graphs for the entire sample (i.e., from October 2011 until February 2018, where the aforementioned structural breaks are clearly visible) are available from the first author on request.
Visual inspection of the monthly total tourist arrivals shows about 1.5 seasonal cycles characterized by summer peaks and winter troughs, but not a particularly pronounced trending behavior. On the other hand, quite pronounced upward trending behavior can be observed for all three popularity indicators, which (at least partially) reflects the increasing popularity of the Instagram platform. Distinct seasonal patterns are, however, not really detectable at the aggregate level. Since typical seasonal adjustment procedures such as moving average filters, Census X-12, Census X-13, or TRAMO/SEATS (EViews, 2017) are not feasible for variables with a higher than monthly frequency and require at least three to four years’ worth of monthly observations, none of the variables has been seasonally adjusted. What can also be seen are the aforementioned structural breaks in the data at the beginning of March 2015 and at the end of June 2016, which is why only this subsample can be used for further time-series analysis, thus leaving only 16 monthly observations.
This small monthly sample substantially restricts the applicability of typical time-series techniques (in particular out-of-sample forecasting) and requires highly parsimonious time-series modeling. While Augmented Dickey–Fuller tests reject their null hypothesis of the presence of a nonseasonal unit root for all popularity indicators, at least at the 5% significance level, the same null hypothesis cannot be rejected for total tourist arrivals, although based on a very low number of observations (the detailed test results are not presented here but are available from the authors on request). In studies using the same tourist arrivals data for Vienna over a longer period of time (e.g., Gunter et al., 2019), the null hypothesis of the Augmented Dickey–Fuller test is typically also rejected for this variable at a quite strict significance level. Consequently, also in light of an already quite small monthly sample and in order to preclude information loss due to overdifferencing (Hyndman & Khandakar, 2008; J. Smith & Yadav, 1994), all variables are still employed in levels in the further time-series analysis. Last, natural logarithms have not been taken of any of the variables since the purpose of the time-series analysis undertaken in this study is to see if any of the three popularity indicators possess information valuable for forecasting future visitor numbers to Vienna, but not, for instance, to make the estimated coefficients interpretable as demand elasticities.
Spatial Dimension of the Data: Top-50 Instagram Locations in Vienna for All, Residents, and Visitors
The possibility to differentiate between all users, local users, and visiting users (as mentioned in Literature Review) enables the exploration of the most popular locations in Vienna according to the number of likes, number of comments, and number of photos by user group (i.e., Stage 2 of the Geotagged Photo Analytics Artefact proposed by Miah et al., 2017). This procedure not only permits the identification of commonalities and differences in the popularity of Viennese attractions between residents and visitors, but also the discovery of so-called “hidden gems” beyond well-known public places, that is, places that may not yet have been on the (commercial) radar of the destination management organization (in the case of visitors), or the city authorities and the local business sector (in the case of both groups).
Figure 2 shows the top-50 locations (or more precisely: location IDs) ranked from 1 (most popular) to 50 (least popular) in terms of number of likes (in blue), number of comments (in green), and number of photos (in red) of all users, whereas Figure 3 shows the same content for local users only, and Figure 4 for visiting users only. The generic location ID “Vienna, Austria” is also given in the first rows of Figures 2 to 4 as “top-0.” For readability reasons only the top-50 locations are given, yet this list can still be considered comprehensive, since the popularity of locations decreases exponentially as one moves down the list. In total, the number of different location IDs in Vienna comprise 20,447, whereby residents frequent only 9,687 of these, whereas visitors attend 18,083, which already indicates some differences between the groups. Location IDs appearing in terms of comments or photos but not in terms of likes are highlighted in the second and third columns of Figures 2 to 4. Location IDs ranking in the top 50 in terms of likes that appear only for local users or for visiting users but not for all users all highlighted in the first columns of Figures 3 and 4.
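The ranking logic behind Figures 2 to 4 can be illustrated with a minimal sketch. The record layout (keys `location`, `group`, `likes`, `comments`), the function name `top_locations`, and the toy values are all hypothetical and are not taken from the study data; each post is assumed to count as one photo.

```python
from collections import defaultdict

def top_locations(posts, indicator, user_group=None, n=50):
    """Rank location IDs by a summed popularity indicator.

    posts: iterable of dicts with the hypothetical keys
           'location', 'group' ('local' or 'visitor'), 'likes', 'comments'.
    indicator: 'likes', 'comments', or 'photos' (each post counts as one photo).
    user_group: None for all users, otherwise 'local' or 'visitor'.
    """
    totals = defaultdict(int)
    for post in posts:
        if user_group is not None and post["group"] != user_group:
            continue
        totals[post["location"]] += 1 if indicator == "photos" else post[indicator]
    # Sort by descending total; ties are broken alphabetically for reproducibility.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]

# Toy data, invented purely for illustration (not from the study sample).
posts = [
    {"location": "Schloß Schönbrunn", "group": "visitor", "likes": 120, "comments": 4},
    {"location": "Schloß Schönbrunn", "group": "local", "likes": 30, "comments": 1},
    {"location": "Donauinsel", "group": "local", "likes": 80, "comments": 6},
    {"location": "Prater", "group": "visitor", "likes": 60, "comments": 2},
    {"location": "Prater", "group": "visitor", "likes": 50, "comments": 3},
]

all_by_likes = top_locations(posts, "likes")
local_by_likes = top_locations(posts, "likes", user_group="local")
visitor_by_photos = top_locations(posts, "photos", user_group="visitor")

for name, ranking in [("all/likes", all_by_likes),
                      ("local/likes", local_by_likes),
                      ("visitor/photos", visitor_by_photos)]:
    print(name, ranking)
```

Running the same function once per user group and indicator yields the nine rankings that the three figures summarize; location IDs that appear in one ranking but not another are the ones highlighted there.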
Taking a closer look at Figure 2 reveals that the top-50 locations in Vienna for all users according to the number of likes mostly comprise its traditional attractions (e.g., churches, markets, museums, parks, secular historical buildings, squares, streets, viewpoints, zoos), one café/restaurant (“Café Central Wien”), one event location (“Wiener Stadthalle”), several hotels, but also some larger public recreational areas such as “Donauinsel,” “Donaukanal,” or “Prater.” All of these locations are well-known and are therefore not classified as “hidden gems.” Potentially, the larger public recreational areas could be of interest for marketing activities geared toward visitors on the part of Vienna’s destination management organization to better moderate the visitor crowds between attractions, since those public places are either mostly frequented only by residents at times without any specific events (in particular: “Donauinsel” and “Donaukanal,” as is also evident in the present sample) or are not explored in their entirety by visitors (notably: “Prater”). Therefore, these public places are typically recommended to visitors who still want to visit places in Vienna “off the beaten path” (see, e.g., Just a Pack, 2019; Quora, 2015; Spotted by Locals, 2016).
It is further worth noting that some locations possess several location IDs, such as “Schloß Schönbrunn” or “Schönbrunn Palace.” Potentially, Instagram users choose their location IDs according to their language preferences. While the rankings of the top-50 individual locations differ slightly according to the popularity indicator (number of comments or number of photos), this difference is not substantial, especially for the top-5 to top-10 locations, as already mentioned in Literature Review. Only a few new locations appear when assessing these popularity indicators separately, mostly falling into the categories of traditional attractions, events (“Wiener Christkindlmarkt”), event locations, and hotels.
Apart from that, Vienna’s city authorities could also rethink public transportation routes and schedules in order to support the sustainable mobility of visitors between these popular places within the city, especially between (a) those places located within walking distance of or very close to Vienna’s city center and (b) those that are not, especially when they are large. In particular, two public recreational areas (“Donauinsel” and “Prater”/“Wiener Prater”/“Prater, Wien, Austria”) are well connected to the city center by public transportation at large, but the existing connections are limited to certain spots, so that these areas remain largely unconnected internally. Given their size (the “Donauinsel” is approximately 21 kilometers long, City of Vienna, 2019a; the “Prater” comprises a surface of approximately 6,000,000 square meters, City of Vienna, 2019b), this could impair sustainable mobility within these recreational areas, in particular for elderly and physically impaired visitors (and residents). It should also be noted that this recommendation is not extended to every location not in walking distance from Vienna’s city center as, for instance, “Schloß Schönbrunn” or “Kahlenberg, Vienna, Austria” are both easily reachable by public transportation and “Schloß Schönbrunn,” while large, is internally connected by a panorama train (Tiergarten Schönbrunn, 2019).
The impact of the distance of a nonnegligible number of the top-50 locations from Vienna’s city center becomes particularly evident when calculating two global indices of spatial autocorrelation (Pfeiffer et al., 2008) based on the top-50 locations according to the numbers of likes, comments, and photos of all users (without the generic location ID “Vienna, Austria”), namely Moran’s I (Moran, 1948) and Geary’s C (Geary, 1954). According to Tobler’s First Law of Geography (Tobler, 1970), closer locations are supposed to be more closely related than more distant locations, which would, in turn, manifest in statistically significant global indices of spatial autocorrelation. To evaluate Moran’s I and Geary’s C, row-standardized inverse-distance spatial weights matrices based on the haversine distance measure are calculated between the latitudes and longitudes of the top-50 location IDs for the three popularity indicators (Drukker et al., 2013; Pisati, 2010). As can be seen from Supplement Table 1 (available online), none of the global indices of spatial autocorrelation is statistically significantly different from its expected value for any of the three popularity indicators; thus, the null hypothesis of no spatial autocorrelation cannot be rejected for any of the test statistics. Consequently, visitors (and residents) cannot be expected to simply walk from one top-50 location to another and within the location if large, which suggests the need for Vienna’s city authorities to continue to improve their public transportation routes and schedules, in particular within their large and distant top-50 locations that are still largely unconnected internally to date.
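The calculation described above can be sketched in a few lines of plain Python: haversine distances, a row-standardized inverse-distance weights matrix, and the two global indices. This is an illustrative sketch and not the authors' implementation; the coordinates are rough approximations, the like counts are invented, and the significance tests reported in Supplement Table 1 are not reproduced here.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def inverse_distance_weights(coords):
    """Row-standardized inverse-distance weights; the diagonal is zero."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / haversine_km(*coords[i], *coords[j])
        row_sum = sum(w[i])
        w[i] = [v / row_sum for v in w[i]]  # each row now sums to 1
    return w

def morans_i(x, w):
    """Global Moran's I; its expected value under no autocorrelation is -1/(n-1)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)

def gearys_c(x, w):
    """Global Geary's C; its expected value under no autocorrelation is 1."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    return ((n - 1) / (2 * s0)) * num / sum((v - mean) ** 2 for v in x)

# Approximate (lat, lon) pairs for five Viennese locations; illustrative only.
coords = [
    (48.2085, 16.3731),  # Stephansdom
    (48.1845, 16.3122),  # Schloß Schönbrunn
    (48.2167, 16.3958),  # Prater
    (48.2750, 16.3333),  # Kahlenberg
    (48.2280, 16.4130),  # Donauinsel
]
likes = [900, 1500, 700, 300, 400]  # invented like counts

w = inverse_distance_weights(coords)
print("Moran's I:", morans_i(likes, w), "expected under H0:", -1 / (len(likes) - 1))
print("Geary's C:", gearys_c(likes, w), "expected under H0:", 1.0)
```

Values of Moran's I close to -1/(n-1) and of Geary's C close to 1 correspond to the case reported in the text, in which the null hypothesis of no spatial autocorrelation cannot be rejected.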
Pertaining to the top-50 locations for local users, some differences are evident compared with the case of all users, as shown in Figure 3. While the top-5 to top-10 locations look quite similar to those in Figure 2, there are more deviations in the lower ranks. Typically, the top locations unique to local users are much more closely linked to activities residents usually pursue or location types they usually visit at their place of dwelling. While quite heterogeneous, these location IDs are also generally not the ones advertised to, or known by, visitors to Vienna. In particular, these include stores from the sub-categories art gallery, bakery, clothing store, chocolate store, furniture store, jewelry store, and tattoo studio, as well as one night club, and the location IDs of the various Viennese districts, probably reflecting local identity. Although it could be argued that the number of photos for (some of) the top-50 locations has been influenced by the locations themselves through their own photo postings on their commercial Instagram profiles, it is much less likely that this has also been the case for the two remaining popularity indicators. While these locations can be considered “hidden gems,” due to the stationary nature of many businesses the potential utilization of these findings is probably limited to small businesses positioning their mobile offers (e.g., mobile coffee and drinks stands or food trucks) close to those locations during peak times.
Not surprisingly, Figure 4 reveals that, across popularity indicators, the top-50 locations for visiting users present a very similar picture to the popular locations for all users (see Figure 2), since visiting users dominate the study sample with approximately 82.15% of all user IDs (see Literature Review). Across Figures 2 and 4, and within Figure 4 (i.e., across popularity indicators), the few differences belong to the categories of traditional attractions, cafés/restaurants, events, and hotels.
Temporal Dimension of the Data: Usability of Instagram Data to Forecast Actual Tourist Arrivals
While out-of-sample forecasting as performed, for instance, in Gunter and Önder (2015, 2016) or Gunter et al. (2019) is not feasible for the present small sample (see Literature Review), it can at least be investigated whether any of the three popularity indicators possess informative content potentially leading to more accurate forecasts (i.e., Stage 4 of the Geotagged Photo Analytics Artefact proposed by Miah et al., 2017). In other words, it is investigated if the (daily aggregates of the) number of likes, comments, and/or photos by visitors to Vienna can be employed as predictors for actual tourist arrivals to the city. It should be noted that possessing informative content does not imply that any of the popularity indicators is causal to actual tourist arrivals beyond Granger (or predictive) causality (Granger, 1969). In other words, the present study does not assume that one’s individual likes, comments, and/or photos are causal to their own or to other Instagram users’ travel behavior. Including the information contained in these popularity indicators in their role as web-based leading indicators at the forecast origin may only lead to more accurate tourism demand forecasts (see Literature Review). This type of time-series analysis is performed separately for the complete set of all location IDs and for the location ID “Vienna, Austria” only, in order to determine which aggregate a priori appears to be the more suitable predictor. Since only two popularity indicators can be investigated within a single multivariate time-series model at a time due to the small sample, this results in six different estimated models.
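The count of six models follows from combining the two location aggregates with the three possible pairs of popularity indicators. A minimal sketch of this enumeration, with variable names chosen for illustration only:

```python
from itertools import combinations, product

indicators = ["likes", "comments", "photos"]
location_sets = ["all location IDs", 'location ID "Vienna, Austria"']

# Two indicators per model (small-sample restriction), for each location aggregate.
model_specs = [
    {"locations": loc, "predictors": pair}
    for loc, pair in product(location_sets, combinations(indicators, 2))
]

for number, spec in enumerate(model_specs, start=1):
    print(number, spec["locations"], "+".join(spec["predictors"]))
```

Three indicator pairs times two location aggregates gives the six specifications referred to above.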
As the frequency of the forecast variable (monthly total tourist arrivals) differs from the frequency of the potential predictors (daily number of likes, comments, and photos), the MIDAS model class (Andreou et al., 2010; Ghysels et al., 2005, 2006) becomes a viable option. It has been successfully applied in various fields, including tourism demand modeling and forecasting (e.g., Bangwayo-Skeete & Skeete, 2015; Gunter et al., 2019; Volchek et al., 2019). A generic MIDAS model for the low-frequency forecast variable
More precisely, Equation (2) describes an R-MIDAS-AR model, that is, a Restricted MIDAS model (due to the small sample requiring a parsimonious model specification, which can be ensured by employing a functional lag polynomial for temporal aggregation of the high-frequency lags; Armesto et al., 2010) with an autoregressive term
For the present data, preliminary estimation has shown that employing nonexponential Almon (1965) lag polynomials with four shape parameters (i.e.,
The estimation results obtained using the popularity indicators based on all location IDs were, in two out of three cases, characterized by statistically and economically implausible coefficient estimates (notably coefficient estimates of the autoregressive term
With automatically selected different lag orders for the high-frequency explanatory variables, all three combinations of the popularity indicators are, in principle, useful, as each features a satisfactory overall in-sample model fit in terms of adjusted coefficients of determination (adjusted R²). The estimated coefficient of the autoregressive term, which is substantially smaller than 1, also indicates that the presence of a nonseasonal unit root in total tourist arrivals to Vienna is very unlikely (see Literature Review for further details). However, when taking a closer look at other model selection criteria and the statistical significance of the individual coefficient estimates, the model specification with the popularity indicators number of likes and number of comments, as given in Supplement Table 2 (available online), stands out positively: not only are all coefficient estimates statistically significant at the very strict 0.1% significance level but also several further model selection criteria consistently feature the best values (i.e., highest in the case of log likelihood; lowest in the case of two information criteria: Akaike information criterion and Schwarz or Bayesian information criterion; see, e.g., Lütkepohl, 2005, for more details).
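The model selection criteria named above can be computed from a model's log likelihood, its number of estimated parameters, and the number of observations. The sketch below uses the textbook definitions; the log likelihoods and parameter counts are invented placeholders and not the values from the supplement tables, and only the sample size of 13 is taken from the text. Some econometrics packages report these criteria divided by the number of observations, which does not change the ranking of models estimated on the same sample.

```python
import math

def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared for a model with an intercept and n_regressors slopes."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

def aic(log_lik, n_params):
    """Akaike information criterion (lower is better)."""
    return -2 * log_lik + 2 * n_params

def bic(log_lik, n_params, n_obs):
    """Schwarz or Bayesian information criterion (lower is better)."""
    return -2 * log_lik + n_params * math.log(n_obs)

n_obs = 13  # monthly observations used in estimation, as stated in the text

# Hypothetical candidates: name -> (log likelihood, number of parameters).
candidates = {
    "model_A": (-150.0, 7),
    "model_B": (-155.0, 7),
    "model_C": (-156.5, 8),
}

scores = {
    name: {"aic": aic(ll, k), "bic": bic(ll, k, n_obs)}
    for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name]["aic"])
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])
print(scores)
print("preferred by AIC:", best_by_aic, "| preferred by BIC:", best_by_bic)
```

With so few observations, the penalty terms are large relative to the sample, which is one reason why the comparison in the text is read together with the significance of the individual coefficients.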
Consequently, in order to potentially produce more accurate out-of-sample forecasts for total tourist arrivals to Vienna, the joint use of the number of likes and the number of comments based on the location ID “Vienna, Austria” is a priori recommended based on the in-sample model fit. One tentative explanation for why the number of photos as such is not an “optimal” predictor for actual tourist arrivals to Vienna is that uploading a photo to Instagram does not necessarily trigger user engagement, as indicated by the generation of likes and/or comments (e.g., the owner of a business could upload numerous photos on the commercial Instagram profile of their undertaking, while failing to generate any engagement on the part of the users). However, it should be noted that this recommendation is based on a very short sample period comprising only 13 monthly observations employed in the estimations (see Supplement Tables 2 to 4, available online) and therefore has to be taken with a grain of salt (the automatic reduction from the original 16 observations took place due to the inclusion of lags of both the dependent and the explanatory variables). Previous research has shown that geotagged photos from Flickr were useful to determine the number of tourists in Austria, especially in urban areas (Önder et al., 2016). Thus, the present results are also consistent with existing research that employs geotagged photos in estimating tourism demand.
Conclusion and Policy Recommendations
In tourism research, traditional methods include primary data collection with surveys, interviews, or focus groups. These types of data collection are expensive and time consuming, especially if researchers want to approximate visitor numbers to those attractions that do not sell tickets, such as parks, squares, viewpoints, and so on. Moreover, the study sample needs to be sufficiently large to produce meaningful results. On the other hand, using geotagged photos that are already available on social media websites such as Flickr or Instagram creates the possibility of analyzing much more data than do traditional methods.
Although geotagged photos are invaluable for tourism research, they also have their own limitations. One of these is the possibility of device or human error while geotagging the photo (Girardin et al., 2008). In that case, the geotags may not indicate the exact location where the photo was taken. Another potential limitation is that the data could be biased since they contain users of specific websites or applications only: in the present case only Instagram users with public profiles. Wong et al. (2017) also mention that the studies in this area mostly cover geotagged photos with textual information in English, while excluding other languages. Also, not every resident or visitor uploads data to social media, or the Internet in general, which means that any such data set will inevitably exclude certain individuals. For instance, more than 75% of Instagram users are between 18 and 24 years old (K. Smith, 2019), thus overrepresenting this particular age group (but also explaining the use of English hashtags along with German ones on the part of residents).
Concerning the present study in particular, a longer consistently available temporal dimension of the data would have been highly desirable. Only then would out-of-sample forecasting have been possible. Last, the provision of data on visitor numbers to more attractions and at a higher than annual frequency by the official statistical authorities would have been much welcomed for jointly investigating the spatial and the temporal dimensions of the data (e.g., by employing spatial panel-data techniques, also for out-of-sample forecasting).
One implication of the results from the in-depth temporal analysis (see Temporal Dimension of the Data: Usability of Instagram Data to Forecast Actual Tourist Arrivals section) is that there is the potential to obtain more accurate out-of-sample forecasting of actual tourist arrivals to Vienna when employing Instagram likes and comments based on the location ID “Vienna, Austria.” This is particularly relevant for Vienna’s destination management organization since accurate demand forecasts are crucial for the short-term operational and long-term strategic decisions made by destination management organizations (Frechtling, 2001; Song et al., 2009). Acknowledging the relevance of the two popularity indicators—Instagram likes and comments—in the information set available at the forecast origin (Chatfield, 2001) should therefore also mitigate the adverse implications of selecting inadequate policy measures as laid out by Mas-Colell et al. (1995; see Literature Review), which could occur if the destination management organization bases its decisions on inaccurate tourism demand forecasts, thereby potentially leading to Pareto inefficient market outcomes. However, since visiting Instagram users can be considered at most partially representative of all visitors to Vienna (see above), Instagram likes and comments should not be employed as the only web-based leading indicators for actual tourist arrivals.
The in-depth spatial analysis (see Spatial Dimension of the Data: Top-50 Instagram Locations in Vienna for All, Residents, and Visitors section) revealed that while the top-10 locations for residents and visitors according to the various popularity indicators are virtually identical, there are also some locations that are quite popular among visitors that may not yet have appeared on the radar of the destination management organization. Although all of them are well-known public places among residents, they could perhaps be better marketed from a visitors’ perspective at times without any specific events, and this could be supported by the city authorities. In the future, the locations popular among visitors but as yet unknown to the destination management organization and the city authorities should also be appropriately integrated into city guides, in particular into their mobile application versions. Concerning the residents, on the other hand, a few “hidden gems” have been identified. Since these consist of local stationary businesses, a potential commercial utilization of these findings is probably limited to small businesses positioning their mobile offers (e.g., mobile coffee and drinks stands or food trucks) close to those locations during peak times. However, these could also be included in tourist maps to identify the places where visitors can obtain the same authentic experiences as residents. Tourism is an experience economy in which ever more individuals are becoming interested in authentic experiences. Thus, this information can be helpful for the destination management organization’s marketing efforts. Based on global indices of spatial autocorrelation, it is concluded that Vienna’s city authorities could still improve their public transportation routes and schedules linking the large and distant top-50 locations—in particular internally—to ensure sustainable mobility. 
Finally, the in-depth temporal and spatial analyses of this study can also be interpreted as an application of adapted versions of two of the four stages of the Geotagged Photo Analytics Artefact proposed by Miah et al. (2017).
The results of this study are based on Instagram data that were tagged with “Vienna, Austria.” While there were not too many differences between the locations preferred by residents and visitors, Vienna is not a big city in terms of area dispersion, which could be one reason for the similarity of popular locations between the two groups. While the authors did not have the funding to purchase data for cities other than Vienna, the design and the methodology of the study could be easily applied to Instagram data for other cities as well. Despite the applicability of the same design and methodology, the results could be different if the same study were conducted in a larger city such as New York City. Future research could also, for instance, focus on including more data in order to predict tourism demand and explore different indicators stemming from the metadata of geotagged photos, such as textual information.
Supplemental Material
Supplemental material, Supplemental_Material for An Exploratory Analysis Of Geotagged Photos From Instagram For Residents Of And Visitors To Vienna by Ulrich Gunter and Irem Önder in Journal of Hospitality & Tourism Research
Footnotes
Supplemental Material
Supplemental material for this article is available online.