Abstract
Data from Volunteered Geographic Information (VGI) are a valuable source of knowledge, especially when official statistics are difficult to produce at a detailed level. However, to be used effectively as a supporting source, Volunteered Geographic Information must meet rigorous quality standards. In this work, the quality of OpenStreetMap (OSM), in terms of completeness and positional and semantic accuracy, is evaluated in the cultural sector with reference to the official survey of Italian museums. This study offers novel insights into the quality assessment of OpenStreetMap points of interest and provides a useful benchmark for the use of unconventional information in cultural analysis and policy. The results show that the number of museums mapped in OpenStreetMap equals 86% of the official total, while OpenStreetMap coverage (the share of official museums also mapped in OpenStreetMap) is 39% overall. The distance between matched museums is less than 150 metres in 77.7% of cases, and the similarity index between museum names is higher than 0.9 for more than half of them. OpenStreetMap cultural information appears to be quantitatively rich as well as positionally and semantically accurate. However, some concerns arise about the reliability and consistency of tags and metadata.
Keywords
Introduction
Culture – in a broad sense – is commonly recognized as one of the determinants of societal progress and economic development (Duxbury et al., 2015), favouring the creation of human capital and the employment of a large share of qualified and creative people (OECD, 2005). Cultural endowments of territories have a great impact on places and generate an attractive eco-system for tourists and visitors (Santagata and Signorello, 1998), a fertile environment for innovative products (Katz, 1999), as well as social capital, networks and social integration (Putnam, 1995).
To quantify the social and economic value of culture, reliable cultural statistics are crucial, especially at the micro or local level. However, detailed and up-to-date cultural data are not easy to produce. First, many heterogeneous elements contribute to defining the cultural sector. 1 Second, the effectiveness of conventional data collection approaches is undermined by difficulties peculiar to the sector, such as non-homogeneous industries, non-profit organisations and intangible assets (Eurostat, 2016). Moreover, standard economic theories cannot be applied tout court to culture because of issues such as market failures, network effects, social externalities and imitation among peers (see Balducci, 2009; Candela and Scorcu, 2004; Frey, 2000). Consequently, there is potential for non-traditional sources (e.g. social networks, search engines, crowdsourced data, administrative archives, cultural observatories) to supplement the scarce micro- or point-level data on museums, events, shows, art venues, creative enterprises, intangible assets, creativity and diversity. 2
Among these ‘alternatives’, Volunteered Geographic Information (VGI), that is, geographic data provided voluntarily by individuals (Goodchild, 2007), offers readily available and up-to-date point data, especially in the cultural domain, where official information is difficult to produce. However, to supplement official statistics, VGI must be reliable and meet quality standards (Antoniou and Skopeliti, 2015). Yet compliance with these standards cannot be imposed ex ante on a voluntary and collaborative source; it must be verified against existing data sets. OpenStreetMap (OSM, www.openstreetmap.org) is the most popular crowdsourced and collaborative mapping project, and it has coexisted with proprietary products (such as Google Maps, Microsoft Bing and others) for cartography and spatial data production since the early 2000s. OSM consists of geographic data created, modified and updated by registered members, who can freely map or edit data by drawing paths and points, uploading content, using GPS tools and so on. 3 Beyond its practical applications, the movement behind OSM and other VGI has attracted interest from various areas of scientific research, most of it devoted to assessing data quality (see ‘VGI quality assessment’ section).
In light of this, this work aims to exploit the potential of OSM for quantitative representations of the cultural sector, with the intention of encouraging the use of non-traditional but information-rich data. To this end, the data extracted from OSM are compared with the data on Italian museums officially surveyed by the Italian National Institute of Statistics (Istat) (‘Data and methodologies’ section).
The Italian case study is of particular interest for culture, as the country is well known for the unusually widespread distribution of cultural heritage across its territory. Museums were chosen among the cultural items because accurate, up-to-date and publicly accessible official reference micro-data are available for them.
The quality assessment is based on completeness and on positional and semantic accuracy, highlighting geographical differences across the country (‘Results’ section). The semantic analysis, the nationwide coverage and the specific application to the cultural field distinguish this work from previous studies. Its results contribute to the scientific literature by providing a useful benchmark for the quantitative analysis of culture via open and crowdsourced data. This study also offers useful insights for cultural policy and leaves room for several extensions (‘Discussion and conclusions’ section).
VGI quality assessment
During the last decade, the emergence of VGI has made it possible to create new data sets and to enrich or complement official sources of data (Antoniou, 2011). User-generated geographic data typically have advantages in terms of free use, timely creation, high detail in local areas and site-specific crisis management (Craglia et al., 2008; Haklay et al., 2014). A large amount of research has focused on VGI quality assessment, classifying methods into two major strands: comparisons with authoritative/official reference data sets and internal quality assessment based on indicators of VGI quality (Antoniou and Skopeliti, 2015).
In the first strand, the reference data set is treated as a ‘gold standard’, and quality criteria are derived directly from ISO standards (Kresse and Fadaie, 2004). Quality criteria for geographical products include completeness (coverage of the VGI data with respect to official sources; Haklay, 2010; Jackson et al., 2013), positional accuracy (Arsanjani et al., 2015; Chrisman, 1991; Goodchild and Hunter, 1997), logical consistency (semantic accuracy, data integrity; Ali et al., 2014; Devillers et al., 2007; Girres and Touya, 2010), thematic accuracy (percentage of correct classifications and tags; Arsanjani et al., 2015), temporal accuracy (evolution of VGI and timely creation of data; Antoniou, 2011; Girres and Touya, 2010) and usability (fitness for use according to the specific aims of end users; Barron et al., 2014). Despite all this research, no single set of metrics or criteria is optimal for all possible users and applications (Mooney and Minghini, 2017).
Even where quality standards exist, comparing VGI with reference data is not always possible, primarily because valid reference sources may be unavailable (Mooney et al., 2010). A second strand of research has therefore checked the consistency of VGI by developing internal quality indicators. Direct indicators refer to the intrinsic quality of the data (e.g. changes and corrections; Ciepłuch et al., 2011; Van Exel et al., 2010), while indirect proxies are derived from the socioeconomic and demographic contexts in which contributors operate (Antoniou and Skopeliti, 2015).
Significant correlations have been found between the completeness and positional accuracy of VGI and the demographics (e.g. population density, rural versus urban; Zielstra et al., 2013) or socioeconomic status of an area (Haklay, 2010). Results have also varied with the characteristics and locations of local OSM communities: larger amounts of geodata were created in European cities (Neis et al., 2013). The total number of contributors, their IT expertise and their local knowledge have also influenced the quality of VGI (Napolitano and Mooney, 2012; Van Exel et al., 2010). However, beyond five contributors per area, additional contributors did not significantly improve quality, owing to marked participation inequality (Haklay, 2010): only 1% of contributors were responsible for the majority of the content (Neis and Zipf, 2012).
The quality tests of OSM have shown surprisingly fast database development (Barron et al., 2014) and excellent positional accuracy of the road network (Girres and Touya, 2010; Mooney et al., 2010; Neis et al., 2013; Zielstra et al., 2013). In the UK, 80% of the areas have shown an overlap within a 20 metre buffer (Haklay, 2010).
Results regarding completeness have been more varied. The OSM street network has been estimated to be about 80% complete worldwide (Barrington-Leigh and Millard-Ball, 2017). Road network coverage was significantly lower in suburban and rural areas (Haklay, 2010) and in areas with fewer contributors (Girres and Touya, 2010). In densely populated areas of Germany and the US, the total length of OSM roads even exceeded that of the reference data, owing to the mapping of secondary roads and pedestrian and cycle paths (Craun, 2014; Neis et al., 2013).
While most studies were based on the road network, only a few focused on points of interest (POIs) or other elements. Referring to schools in a small area of the USA, Jackson et al. (2013) found a completeness of 90% and optimal positional accuracy. Other tested features include pedestrian and cycle paths (Hochmair et al., 2015), waterways and coastlines (Girres and Touya, 2010), buildings (Brovelli et al., 2016; Fan et al., 2014), parks (Kalantari and La, 2015) and gardens (Ali and Schmid, 2014; Ali et al., 2014). OSM data have also been used to refine population estimates (Bakillah et al., 2014) and to identify parcel geometries (Liu and Long, 2015).
The semantic quality of these features has been studied in relation to the metadata (Devillers et al., 2007) and to tags assigned to natural elements (Mooney and Corcoran, 2012; Mooney et al., 2010). Although coordination efforts have been made by the OSM community, the taxonomies are not formally codified, because for VGI they cannot be imposed from the top (Davidovic et al., 2016). Instead, they have arisen from the choices of the contributors (hence the name folksonomies, Mocnik et al., 2017).
Data and methodologies
Reference data
Official reference data were obtained from the Survey on Museums and Cultural Institutes, conducted by Istat and the Ministry of Culture and Tourism (MiBACT). The March 2017 data release referred to institutions that were open to the public in 2015. Eligible institutions were those that permanently acquire, preserve and exhibit cultural heritage for the purposes of education and study and that were actually open and equipped with services for visitors. The data contained information on the typology of the exhibited goods, museum ownership and management, human and financial resources, ancillary cultural activities and services, yearly number of visitors and territorial networks.
According to the official survey, the regions with the highest number of museums were Toscana and Emilia-Romagna. The top 20 institutes, less than 1% of the total, attracted almost a third of all visitors. More than half of Italian museums were free to enter and generated no income from ticket sales. Half of the museums had a website and 40.5% had a social media account, but only 6.6% offered online ticketing; 37.5% had assistance services and/or facilities for disabled visitors (Istat, 2016).
OSM data
OSM data are currently used for various purposes, including commercial, educational and research applications, and a great deal of software relies on OSM for data download, editing and visualization, location search and GIS (Mooney and Minghini, 2017). 4 As an open data provider, OSM allows users to freely download and reuse its database, which is licensed under the Open Data Commons Open Database License (ODbL). 5 OSM has grown exponentially and now keeps pace with commercial products (Antoniou and Skopeliti, 2015; Arsanjani et al., 2015). 6 In some cases, governments and institutions have contributed to the project by allowing their archives to be bulk-imported into OSM (e.g. TIGER in the US, CORINE land cover in France and Istat data for Italian national boundaries 7).
The OSM database is composed of nodes (points in space with coordinates), ways (polylines used to define linear or polygonal objects) and relations (a multi-purpose data structure that documents a relationship between two or more data elements). Each object carries tags in key=value form that specify its features. 8 The reference values for museums fall under the tourism key, which is used for elements of specific interest to tourists, such as places to visit, POIs and information centres. The tags selected for the analysis were tourism=museum and tourism=gallery. 9 To account for elements coded differently, other tags were also used, such as amenity=museum, museum=yes and museum=*. 10 The OSM database was filtered using the Overpass API, 11 through which the chosen tags were extracted within the Italian borders. Data were downloaded on 27 July 2018.
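The extraction step can be sketched as an Overpass QL query. The Python below only assembles the query text (plus an uncalled helper that would POST it); the country area filter (`ISO3166-1`=`IT` with `admin_level=2`), the `out geom;` output mode, the timeout and the public endpoint URL are illustrative assumptions, not the query actually used in the study.

```python
import urllib.parse
import urllib.request

# Tag filters named in the text; museum=* (any value) already covers museum=yes.
TAG_FILTERS = [
    '["tourism"="museum"]',
    '["tourism"="gallery"]',
    '["amenity"="museum"]',
    '["museum"]',
]

# Assumed public endpoint; other Overpass instances accept the same query.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_museum_query(country_code="IT", timeout=180):
    """Overpass QL for museum-like nodes, ways and relations in one country.

    `out geom;` returns full coordinates for ways and relations, so that
    polygons can be deduplicated and reduced to centroids locally.
    """
    lines = [
        f"[out:json][timeout:{timeout}];",
        f'area["ISO3166-1"="{country_code}"][admin_level=2]->.country;',
        "(",
    ]
    for tag in TAG_FILTERS:
        lines.append(f"  nwr{tag}(area.country);")
    lines += [");", "out geom;"]
    return "\n".join(lines)


def fetch(query):
    """POST the query to the Overpass API and return the raw JSON text."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(build_museum_query())
```

The overlapping filters are harmless: an Overpass union returns each matching element only once.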
OSM contributors can map museums as point elements (nodes) or as polygons (ways). Sometimes, however, the polygons duplicated information already present in point form (see Figure A.2, supplemental material). 12 To avoid double-counting, 59 such polygons were identified (using a 30 metre buffer) and excluded from the analysis. In other cases, several polygons referred to a single museum complex; these were merged into one feature (fusion procedure). Finally, centroids were computed for the remaining museum polygons. Overall, 3573 museum points were retained for the analysis (see Figure A.6 in the Supplemental Material).
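The duplicate-removal and centroid steps can be sketched in plain Python, assuming all coordinates have already been projected to a metric CRS so that distances are in metres. Treating a polygon as a duplicate whenever an OSM museum node lies inside it or within 30 metres of its boundary is one interpretation of the 30 metre buffer; the fusion of multi-part complexes is not shown.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices (not closed)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_segment_distance(px, py, x1, y1, x2, y2):
    """Euclidean distance from (px, py) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(px - x1, py - y1)
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def distance_to_polygon(x, y, ring):
    """0 inside the polygon, otherwise the distance to its nearest edge."""
    if point_in_polygon(x, y, ring):
        return 0.0
    n = len(ring)
    return min(point_segment_distance(x, y, *ring[i], *ring[(i + 1) % n]) for i in range(n))


def centroid(ring):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    area *= 0.5
    return cx / (6 * area), cy / (6 * area)


def polygons_to_points(points, polygons, buffer_m=30.0):
    """Drop polygons with a museum node within buffer_m; keep centroids of the rest."""
    kept = []
    for ring in polygons:
        if not any(distance_to_polygon(px, py, ring) <= buffer_m for px, py in points):
            kept.append(centroid(ring))
    return points + kept
```

For example, with a node at (0, 0), a 10 m square whose nearest corner is about 14 m away is discarded as a duplicate, while a distant square is kept as its centroid.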
Each museum object carries at least one tag, but since OSM places no limit on the number of tags an object can have, the museum points were associated with many keys (180 in total) containing additional information and properties. 13 However, these data were missing for most museums. Apart from the name (present in 95% of cases 14 ), the Wikidata 15 and Wikipedia keys were each available for 22% of the museums. Twenty per cent of the museums had website information, around 12% had phone numbers and opening_hours, and 7.6% had wheelchair accessibility information. Other useful keys unfortunately pertained to only a minority of cases; for example, note and description sometimes contained relevant information such as actual opening to the public, news or updates, but they were available for only 2% of the museums (see Table A.2 in the Supplemental Material).
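Availability shares like those above can be computed by counting, for each key of interest, the objects that carry a non-empty value. This is a minimal sketch; the input format (one dict of tags per museum, as in the `tags` member of Overpass JSON elements) is an assumption, and the example records are hypothetical.

```python
from collections import Counter


def key_availability(objects, keys):
    """Share of objects (0-1) whose tag dict has a non-empty value for each key."""
    counts = Counter()
    for tags in objects:
        for key in keys:
            if str(tags.get(key, "")).strip():
                counts[key] += 1
    n = len(objects)
    return {key: (counts[key] / n if n else 0.0) for key in keys}


if __name__ == "__main__":
    # Hypothetical records, not data from the study.
    museums = [
        {"tourism": "museum", "name": "Museo A", "website": "https://example.org"},
        {"tourism": "museum", "name": "Museo B", "wheelchair": "yes"},
        {"tourism": "gallery"},
    ]
    print(key_availability(museums, ["name", "website", "wheelchair", "opening_hours"]))
```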
Methodologies
To assess completeness and accuracy, it was necessary to identify the set of museums present in both sources. A series of GIS operations (buffers, joins and point-in-polygon counts) made it possible to identify the Istat museums with a nearby OSM counterpart and to compute a matrix of distances between points.
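A distance matrix of this kind can be sketched with the haversine great-circle formula on WGS84 latitude/longitude pairs, stored in long format (one row per candidate pair). The 500 metre search radius is a hypothetical value chosen for illustration, since the text does not state the buffer size; with several thousand points per source, the brute-force double loop would normally be replaced by a spatial index.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def candidate_pairs(istat, osm, radius_m=500.0):
    """Long-format distance matrix limited to pairs closer than radius_m.

    istat, osm: {id: (lat, lon)}. Returns [(istat_id, osm_id, distance_m)],
    sorted by Istat id and then by increasing distance.
    """
    pairs = []
    for i_id, (i_lat, i_lon) in istat.items():
        for o_id, (o_lat, o_lon) in osm.items():
            d = haversine_m(i_lat, i_lon, o_lat, o_lon)
            if d <= radius_m:
                pairs.append((i_id, o_id, d))
    pairs.sort(key=lambda p: (p[0], p[2]))
    return pairs
```

A difference of 0.001 degrees of latitude corresponds to roughly 111 metres, which gives a quick sanity check on the output.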
The museums were then matched by means of a semi-automatic, step-by-step procedure. The Jaro–Winkler (JW) similarity index (Jaro, 1989; Winkler, 1990) between museum names was calculated for each record of the distance matrix, which allowed counterparts to be ranked by distance and by name similarity. 16 The threshold values for JW and distance were refined through experimental tests in sample regions and with reference to the scientific literature (Jackson et al., 2013). The criteria adopted were modifiable, replicable and restrictive, with the aim of extracting successive sets of corresponding museums while minimizing false positives.
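For reference, the standard Jaro–Winkler similarity fits in a few lines. This sketch uses the usual defaults (scaling factor p = 0.1, common prefix capped at four characters); the `normalise` pre-processing (accent stripping, case folding, whitespace collapsing) is an assumption, since the text does not say how names were cleaned before comparison.

```python
import unicodedata


def normalise(name):
    """Assumed cleaning: strip accents, case-fold and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.casefold().split())


def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Boost the Jaro score according to the length of the common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

For instance, `jaro_winkler("MARTHA", "MARHTA")` gives about 0.961, the classic textbook value; names would be passed through `normalise` first.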
First, a subset of matching museums was extracted by requiring a JW similarity of at least 0.85. To expand the set of matched museums, a second, less restrictive criterion was applied to the residual set: JW ≥ 0.7 with a distance between 50 and 150 metres, or JW ≥ 0.6 with a distance below 50 metres. The automated results were revised manually, and 22 incorrect matches generated in the second round were fixed. In total, 1046 museums were matched automatically in the first phase and 345 in the second, and a further 242 were added manually (1633 matches overall).
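The two-round rule can be expressed as a small predicate plus a one-to-one assignment over candidate pairs. The text does not specify how competing candidates were resolved, so the greedy ordering below (highest JW first, then shortest distance) and the inclusive 50 to 150 metre interval are assumptions; the manual revision step is not reproduced.

```python
def accept_pair(jw, distance_m, round_no):
    """Thresholds of the two automatic matching rounds described in the text."""
    if round_no == 1:
        return jw >= 0.85
    if round_no == 2:
        return (jw >= 0.7 and 50 <= distance_m <= 150) or (jw >= 0.6 and distance_m < 50)
    raise ValueError("round_no must be 1 or 2")


def match(candidates):
    """One-to-one matching over (istat_id, osm_id, jw, distance_m) candidates.

    Returns {istat_id: (osm_id, round_no)}. Round 2 only considers museums
    that round 1 left unmatched in both sources.
    """
    # Assumed ordering: most similar names first, then shortest distance.
    ranked = sorted(candidates, key=lambda c: (-c[2], c[3]))
    used_istat, used_osm, result = set(), set(), {}
    for round_no in (1, 2):
        for istat_id, osm_id, jw, dist in ranked:
            if istat_id in used_istat or osm_id in used_osm:
                continue
            if accept_pair(jw, dist, round_no):
                used_istat.add(istat_id)
                used_osm.add(osm_id)
                result[istat_id] = (osm_id, round_no)
    return result
```

Pairs rejected in both rounds would remain candidates for the manual review step.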
Despite manual revision, a minority of OSM museums still could not be matched to an Istat counterpart. This could happen when the names differed too much to establish that the museums were actually the same, or when the museum points were geographically very close but the OSM name was missing (JW index = 0). The final number of matching museums is therefore likely to be underestimated.
A hotspot analysis was also performed using the Getis–Ord Gi* local statistic (Getis and Ord, 1992). Hotspots (coldspots) are locations with unusually high (low) values, where the number of OSM museums is high (low) compared with the official count, surrounded by other high-value (low-value) locations. The algorithm performed 999 permutations at a 5% significance level, using a queen contiguity matrix (Anselin, 2018; O’Sullivan and Unwin, 2003; Oxoli et al., 2017).
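A compact, pure-Python version of the Getis–Ord Gi* statistic on a regular grid shows how the statistic and its permutation test fit together. It assumes binary queen-contiguity weights that include the cell itself, and it derives pseudo p-values from conditional permutations (the cell's own value is held fixed while its neighbours' values are drawn from the remaining cells). Using the difference between OSM and Istat counts as the cell variable is a hypothetical choice; GIS implementations may differ in details such as weighting and the handling of islands.

```python
import math
import random


def queen_neighbours(rows, cols, r, c):
    """Row-major indices of the cells sharing an edge or a corner with (r, c)."""
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                out.append(rr * cols + cc)
    return out


def gi_star(local_sum, w_sum, n, mean, sd):
    """Gi* z-score for binary weights (so the sum of squared weights equals w_sum)."""
    num = local_sum - mean * w_sum
    den = sd * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
    return num / den


def hotspots(values, rows, cols, permutations=999, alpha=0.05, seed=0):
    """Label each cell of a rows x cols grid 'hot', 'cold' or 'ns'.

    values: row-major list of the cell variable (assumed not constant).
    """
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    labels = []
    for idx in range(n):
        neigh = queen_neighbours(rows, cols, *divmod(idx, cols))
        k = len(neigh)
        w_sum = k + 1  # neighbours plus the cell itself (the "star" in Gi*)
        observed = gi_star(values[idx] + sum(values[j] for j in neigh), w_sum, n, mean, sd)
        # Conditional permutation: keep the cell's value, resample its neighbours.
        others = values[:idx] + values[idx + 1:]
        as_extreme = 0
        for _ in range(permutations):
            simulated = gi_star(values[idx] + sum(rng.sample(others, k)), w_sum, n, mean, sd)
            if (observed >= 0 and simulated >= observed) or (observed < 0 and simulated <= observed):
                as_extreme += 1
        p_value = (as_extreme + 1) / (permutations + 1)
        if p_value < alpha:
            labels.append("hot" if observed > 0 else "cold")
        else:
            labels.append("ns")
    return labels
```

On a 5 × 5 grid with a block of high values in one corner, the cell at the centre of that block comes out as a hotspot, while a cell surrounded by zeros far from the block is not significant.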
Results
Completeness analysis
Of the 4158 museums officially surveyed, the OSM database contained 3573 (86% of the Istat total). OSM counts varied remarkably among Italian regions. In the Marche region, for example, only 90 museums were mapped (30% of the 303 in the Istat total). Similarly low shares occurred in Calabria, Valle d’Aosta and Abruzzo. In contrast, OSM outnumbered the official data in other regions (Trentino, Lombardia, Sicilia, Campania and Sardegna) (see Table 1, and Table A.1 and Figure A.1 in the Supplemental Material). Geographically, no north–south polarisation was evident. The spatial density of museums was similar in both sources, with fewer OSM museums in the regions of Central Italy (cf. maps in supplemental Figure A.3).
Table 1. Completeness analysis.
OSM: OpenStreetMap.
Note: Column d. is the total share of OSM museums (a + c) with respect to the Istat total (a + b). Column e. represents the share of museums mapped in both sources (a) with respect to the Istat total (a + b). Column f. is the share of museums present in OSM only (c) with respect to the OSM total (a + c).
Data source: Istat and OSM (©OpenStreetMap contributors).
The total number of objects gives a general indication of the diffusion of OSM items across the country. To assess completeness, however, it was necessary to determine how many museums were present in both sources rather than only in Istat or only in OSM. Columns e. and f. of Table 1 show, respectively, the OSM coverage (the share of Istat museums that were also mapped in OSM) and the share of OSM museums that were OSM-specific (not included in the Istat survey). In total, 1633 museums were mapped in both sources (39.2% for Italy as a whole). OSM coverage varied substantially among regions: it was well above average in Trentino (58.9%) and Umbria (52.1%) and markedly lower in Calabria (18.1%) and Marche (19.8%) (cf. Table 1 and Figure A.1).
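Written out from the column definitions in the note to Table 1 (a = museums in both sources, b = museums only in Istat, c = museums only in OSM), the three shares are simple ratios. Column d. can exceed 100% in regions where OSM objects outnumber the official count. The counts in the usage line are hypothetical.

```python
def completeness_shares(both, istat_only, osm_only):
    """Columns d., e. and f. of Table 1 from the counts a, b and c."""
    a, b, c = both, istat_only, osm_only
    return {
        "d_osm_total_share": (a + c) / (a + b),  # OSM total vs Istat total
        "e_osm_coverage": a / (a + b),           # Istat museums also mapped in OSM
        "f_osm_only_share": c / (a + c),         # OSM museums absent from Istat
    }


# Hypothetical region: 60 museums in both sources, 40 only in Istat, 20 only in OSM.
print(completeness_shares(60, 40, 20))
```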
A larger total number of OSM objects did not imply higher OSM coverage. For example, Sicilia had 45 more museums in OSM than in the official survey, but its OSM coverage was below the national average, because many Sicilian museums were OSM-specific (the OSM-only share was 71%). Similar results were found for Lombardia and Campania. Conversely, Trentino-Alto Adige was a good example of both numerical richness and adherence to the official data. Overall, 54% of the OSM museums (1940) were not included in the official survey.
At the urban level, Roma and Firenze had the highest number of museums (Table 2). With the exception of Siena, OSM coverage was above the national average in the top 10 cities, especially in Napoli (76%), Firenze (62%) and Venezia (60%). Napoli also had a very high number of additional OSM-only museums. Notably, 971 municipalities (33% of those containing a museum) had 100% OSM coverage (i.e. all of their official museums were also mapped in OSM); these municipalities were mostly located in Piemonte and Lombardia (for municipal figures, see Figure A.5).
Table 2. First 10 cities by official number of museums and urban OSM coverage.
OSM: OpenStreetMap.
Data source: Istat and OSM (©OpenStreetMap contributors).
OSM coverage generally increased with the resident population of the cities (Table 3): it was particularly high for major cities (56%, 111 cities) and for cities in the top quartile of the population distribution. 17
Table 3. OSM coverage by population distribution, major cities and other municipalities.
OSM: OpenStreetMap.
Data source: Istat and OSM (©OpenStreetMap contributors).
Spatial differences in OSM coverage can also be explored independently of administrative boundaries by using a 5 kilometre square grid and hotspot analysis. The cells in which the number of OSM museums exceeded the official number (40% of those containing a museum) were located around the main metropolitan areas and in a horizontal band across northern Italy (see supplemental Figure A.4). The hotspots coincided with dense urban areas and closely followed the geography of the major Italian cities (Figure 1). By contrast, there was a large and significant cluster of low values (coldspots) in Marche and in scattered peripheral areas.
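Assigning museums to 5 kilometre cells only requires floor division of projected coordinates. The sketch below assumes coordinates in a metric projected CRS and a grid anchored at the CRS origin (both assumptions). For each cell containing at least one museum, it returns the OSM count minus the Istat count, which is the kind of per-cell variable a hotspot analysis can operate on, together with the share of occupied cells where OSM exceeds the official count.

```python
from collections import Counter

CELL_SIZE_M = 5_000  # 5 kilometre square cells


def cell_of(x, y, size=CELL_SIZE_M):
    """(column, row) index of the grid cell containing projected point (x, y)."""
    return int(x // size), int(y // size)


def cell_differences(osm_xy, istat_xy, size=CELL_SIZE_M):
    """OSM count minus Istat count for every cell containing at least one museum."""
    osm = Counter(cell_of(x, y, size) for x, y in osm_xy)
    istat = Counter(cell_of(x, y, size) for x, y in istat_xy)
    return {cell: osm[cell] - istat[cell] for cell in osm.keys() | istat.keys()}


def share_osm_higher(differences):
    """Share of occupied cells where OSM has more museums than the official source."""
    return sum(d > 0 for d in differences.values()) / len(differences)
```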

Figure 1. Hotspot analysis: clusters of high values (hotspots) and low values (coldspots) of OSM data (5 kilometre square grid). Note: ‘High’ (‘Low’) denotes cells where the number of OSM museums is higher (lower) than the official one, surrounded by other cells where the number of OSM museums is also higher (lower) than the official one. ‘Neighbourless’ polygons are islands without contiguous polygons. 999 permutations, significance level 5%. Data source: Istat and OSM (©OpenStreetMap contributors).
Positional and semantic accuracies
Positional accuracy for the set of museums present in both sources was tested against three thresholds (30, 50 and 150 metres; Jackson et al., 2013). As illustrated in Figure 2, which depicts central Rome, the OSM nodes (squares) can lie within 30 metres of the Istat coordinates (dots) (solid line, e.g. Galleria Nazionale Antica in Palazzo Corsini), within 50 metres (dashed line, Galleria Spada), within 150 metres (dotted line, Museo di Roma in Trastevere), 18 or farther away.

Figure 2. Official (dots) and OSM (squares) coordinates, with radii of 30 metres (solid line), 50 metres (dashed line) and 150 metres (dotted line) (zoom area: central Rome). Data source: Istat and OSM (©OpenStreetMap contributors).
Of the comparable museums, 31.8% were located within 30 metres (the ‘building accuracy’ level) and 46.6% within 50 metres (Table 4). Overall, 77.7% of the OSM museums were located within 150 metres of their counterparts. The median distance between matched museums was 56 metres, with a standard deviation of 465 metres, indicating a right-skewed distribution with a few large discrepancies.
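Figures of this kind (shares within each threshold, median and standard deviation) follow directly from the list of distances between matched pairs. In this sketch, thresholds are treated as inclusive and the sample standard deviation is used; both are assumptions. Reporting the median is sensible here because a standard deviation far larger than the median points to a long right tail.

```python
import statistics


def positional_summary(distances_m, thresholds=(30, 50, 150)):
    """Share of matched pairs within each threshold, plus median and SD in metres."""
    n = len(distances_m)
    summary = {f"within_{t}m": sum(d <= t for d in distances_m) / n for t in thresholds}
    summary["median_m"] = statistics.median(distances_m)
    summary["stdev_m"] = statistics.stdev(distances_m)  # sample SD, needs n >= 2
    return summary
```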
Table 4. Positional accuracy: distance between matched museum points.
Data source: Istat and OSM (©OpenStreetMap contributors).
Positional accuracy also varied markedly across the country. Against a national median of 56 metres, the median distance between points was lowest in Trentino-Alto Adige (27 metres) and highest in southern regions such as Campania (92 metres) and Calabria (91 metres).
In terms of semantic accuracy (i.e. agreement between museum names), the JW similarity index exceeded 0.9 in more than half of the cases, indicating near-identical strings. The names were identical in 25% of cases (410 museums), while differences were substantial in 10% of cases (JW < 0.67). The JW index also showed a weak inverse correlation with the distance between the museums’ coordinates: JW tended to be higher for museums with more accurate positions.
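Summary statistics of this kind can be sketched as follows. The text does not name the correlation coefficient used, so Spearman’s rank correlation (Pearson correlation computed on average ranks) is an assumption here; a negative value corresponds to the inverse relationship described, with higher JW at shorter distances.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def jw_summary(jw, distance_m):
    """Distribution of JW scores and their rank correlation with distance."""
    n = len(jw)
    return {
        "share_jw_above_0.9": sum(v > 0.9 for v in jw) / n,
        "share_identical": sum(v == 1.0 for v in jw) / n,  # identical strings score 1
        "share_jw_below_0.67": sum(v < 0.67 for v in jw) / n,
        # Spearman correlation = Pearson correlation of the ranks (assumed measure).
        "spearman_jw_distance": pearson(average_ranks(jw), average_ranks(distance_m)),
    }
```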
Comparing the museum names directly made it possible to investigate the reasons for the varying degrees of OSM coverage. This is illustrated by the case of Bergamo, where one of the largest differences between OSM and Istat was observed (Table 5).
Table 5. Comparison among denominations (example of Bergamo).
OSM: OpenStreetMap.
Data source: Istat and OSM (©OpenStreetMap contributors).
Museums 1–5 were consistently mapped in OSM, despite slight differences in their names. Museum 6 was correctly located, but its name differed notably. Museum 7 was a duplicate of Museum 6: the city’s historical museum was also registered, with the same tag, as the Palazzo Storico del Podestà (the heritage building that houses the historical museum).
Museums 8–11 actually existed in the city but were not included in the official survey because they did not meet all the eligibility criteria (e.g. they were not open to the public at the time of the survey). For example, Container Art and Galleria Michelangelo were new, independent art spaces (the former itinerant) that were not officially surveyed but nevertheless helped to shape the city’s cultural scene. Conversely, Museum 20 was officially surveyed but was not mapped in OSM.
Museums 12–19 were private, commercial art shops. These cases constitute incorrect meta-information, because such features should not be tagged with values such as museum or gallery but with a specific tag (e.g. shop=art).
Similar results were found when the test was repeated in other cities (e.g. Roma and Bologna). These experiments highlighted some recurrent inconsistencies: (a) tagging the container building, villa or garden as a museum; (b) tagging archaeological areas as museums; (c) counting parts of the same museum complex separately; (d) including museums that do not meet the eligibility criteria (e.g. not open to the public, temporary exhibitions); (e) mapping art shops and private galleries that are not actual museums (for a summary of results, see Table A.3).
Discussion and conclusions
To evaluate the suitability of VGI for cultural statistics, this study compared OSM data with the official survey of Italian museums, examining the information content, completeness, and positional and semantic accuracy of OSM.
The OSM source was comprehensive in terms of the total number of objects, which amounted to 86% of the official count. As with road network length in other countries (Neis et al., 2013), OSM contained more museums than the official data in several Italian regions and fewer in others. These differences were generally related to the distribution of urbanised areas (cf. Craun, 2014; Haklay, 2010), and no north–south polarisation was evident in OSM coverage. In terms of completeness, OSM coverage was nearly 40% for Italy as a whole, again with marked regional differences, ranging from 18% to 59%.
The national level of completeness was largely driven by larger cities and major towns. At the same time, the widespread presence of museums in the small towns of Central Italy (especially in Marche, Toscana and Emilia-Romagna) was not adequately captured by OSM. A plausible explanation is that the largest museums, located in larger cities and attracting more tourists, are more likely to be mapped by OSM contributors.
The number of museums present only in OSM and not in the official survey was particularly high in some regions, a result with both positive and negative aspects. It is desirable in terms of information richness when the points represent actual museums that were excluded from the official survey because of eligibility criteria or reference periods. VGI fulfils its primary raison d’être when it records museums, often new, independent or itinerant, that are ‘invisible’ to official censuses but are known to local expert contributors. On the other hand, when the number of OSM objects is inflated by duplicates or incorrectly tagged items, OSM data give a biased and misleading picture of reality.
The positional attributes of the museums were very accurate, as previously observed in the literature (see ‘VGI quality assessment’ section). This suggests that OSM contributors are more committed to mapping activities than to consistency of content. However, as there is progressively less empty space on the OSM map, the role of OSM contributors should shift from providing new geodata to maintaining, updating and fixing existing features. Sustaining contributors’ motivation is, in fact, a major challenge for VGI (McConchie, 2016; Sinton, 2016).
According to the results of this application, the positional accuracy of the museums varied geographically. Unlike OSM coverage, which showed no north–south polarisation, the median distance between museum points was generally shorter (more accurate) in northern regions and longer in southern ones. Although this may be an Italy-specific result, it adds to the literature: in Italy, the positional accuracy of buildings was previously shown to be almost constant across subareas, mostly because of the good quality of satellite imagery (Brovelli et al., 2016). For museums, however, Istat coordinates referred to street entrances, whereas OSM coordinates were sometimes centred on building perimeters or placed nearby; the discrepancies in point positioning therefore appear to stem from the absence of a standardised mapping rule for POIs in OSM and, ultimately, from the choices of contributors. The presence and activity of local communities, and in many cases of single OSM contributors, thus appear to be the main drivers of the spatial variation in quality. A promising extension would be to test contributors’ mapping behaviour by quantitatively analysing the discrepancies in point positioning.
The OSM data analysed were potentially rich in information content. Unfortunately, many of the relevant fields associated with museum points were available only for a minority of cases. This is a consequence of the very nature of volunteered information, which cannot be elicited on demand, and it limits statistical quality. For example, the available keys do not reveal whether all OSM museums were actually open to the public or had ceased their activity, yet both are eligibility criteria in the official survey. In terms of currency, however, while official data are updated only in survey years, OSM data are, at least potentially, updated continuously.
This study has both limitations and strengths. Despite automated matching and manual revision, some OSM museums could still potentially be matched to an Istat counterpart. Since the matching criteria were restrictive, the actual OSM coverage figures are slightly underestimated. Moreover, a point-by-point investigation of metadata quality and thematic accuracy was beyond the scope of this work (Ali et al., 2014; Antoniou, 2011; Girres and Touya, 2010).
A major strength of this study is its large, nationwide scope, which distinguishes it from other contributions that generally covered only local areas (Antoniou, 2011; Mooney et al., 2010; Zielstra et al., 2013). A comparison with other studies is, therefore, difficult. For example, Jackson et al. (2013) found better positional quality for school POIs, but their study referred only to a small area in the US where a collaborative validation process of OSM data had also taken place. It should also be remembered that the figures presented here are national averages, while many cities (33% of the municipalities containing a museum) show perfect (100%) completeness.
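The contrast between a national average and the per-municipality distribution can be sketched as follows. Completeness is taken here as the share of Istat museums that have an OSM match; the input structures (a mapping from municipality to Istat museum identifiers and a set of matched identifiers) and the function names are hypothetical.

```python
def completeness_by_municipality(istat_ids_by_muni, matched_ids):
    """Per-municipality completeness: matched Istat museums / all Istat museums."""
    matched = set(matched_ids)
    return {
        muni: sum(i in matched for i in ids) / len(ids)
        for muni, ids in istat_ids_by_muni.items()
        if ids
    }


def overall_completeness(istat_ids_by_muni, matched_ids):
    """Pooled (national) completeness over all Istat museums."""
    matched = set(matched_ids)
    total = sum(len(ids) for ids in istat_ids_by_muni.values())
    hits = sum(i in matched for ids in istat_ids_by_muni.values() for i in ids)
    return hits / total if total else 0.0


def share_fully_complete(completeness):
    """Share of municipalities whose completeness is exactly 100%."""
    if not completeness:
        return 0.0
    return sum(c == 1.0 for c in completeness.values()) / len(completeness)
```

A modest pooled value can coexist with a sizeable share of fully complete municipalities, which is the point made in the paragraph above.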
In light of these results, the major drawbacks of OSM for cultural statistics appear to be the reliability of content and the correct attribution of tags to the corresponding elements, rather than geographic accuracy or completeness. It is worth noting, however, that in the majority of inconsistent cases (as listed in the previous section), OSM information is actually broader than the official survey and is not really 'mistaken', even when it is incongruent with the official criteria. Although eligibility criteria are unavoidable requisites for national statistics, compliance with them is not necessarily an advantage for tourists and visitors in terms of information content. At the same time, OSM contributors may understandably be unaware of all the official criteria for the features they map. This also explains the poor matches that sometimes arose between Istat and OSM data.
The issue of thematic coherence is constantly monitored by the OSM community, which has developed automated methods for reviewing the consistency of tags and checking metadata. 19 Semantic quality can also be improved by using ontologies such as OSMonto in data tagging (Codescu et al., 2014) or a tag recommendation system such as OSMantic (Vandecasteele and Devillers, 2015), which automatically suggests relevant tags to contributors during editing. 20 However, these procedures were designed mostly with the road network in mind and could be developed further for the POI and tourism domains.
This study can be the starting point for several extensions and future work, such as a comparison with museum data in other countries or the inclusion of other cultural items (e.g. heritage buildings, churches, monumental gardens, entertainment venues). OSM data on museums may also be used as a proxy for estimating local cultural endowments or tourist attractiveness, as has already been done for population estimates (Bakillah et al., 2014). Further research may analyse what determines the different degrees of OSM coverage. It has been shown that deprivation and other socioeconomic conditions affect completeness and positional accuracy (Antoniou, 2011; Haklay, 2010) and that high income and a young population result in a higher number of contributions (Girres and Touya, 2010). Analogously, several variables could be tested against OSM coverage of museums using multivariate regression modelling: at the urban level, city characteristics such as population, mean age, education, broadband diffusion and number of tourists; at the museum level, characteristics such as type of collection, ownership and number of visitors. These extensions may also provide a class of indicators for VGI internal quality assessment (Antoniou and Skopeliti, 2015).
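As a starting point for the regression extension proposed above, the sketch below fits an ordinary least squares model of municipal OSM coverage on a matrix of covariates using `numpy.linalg.lstsq`. The covariates and the linear specification are assumptions: because coverage is a proportion, a fractional-response or binomial model may be more appropriate in practice, and this sketch only illustrates the basic design-matrix setup.

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept via least squares.

    X: (n, k) array of hypothetical covariates (e.g. population, mean age).
    y: (n,) array of OSM coverage values.
    Returns k + 1 coefficients, intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef
```

In a real application the covariates would need to be standardised or logged as appropriate, and standard errors would come from a dedicated statistics package rather than from this bare fit.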
In conclusion, the results of this study are relevant to cultural policy. A freely accessible, accurate and complete mapping of cultural endowments is essential for promoting the cultural sector at both the national and local level. Misleading information, which often leads to underestimating the actual potential of the sector, results in limited investment and lower public participation. Policy makers in peripheral areas (such as small villages in central Italy) should reflect on this point when they fail to make their cultural assets visible: small but valuable realities spread across their territories.
Data
The data that support the findings of this study are available from OSM (https://www.openstreetmap.org), free to download and reuse under the ODbL, and from Istat micro-data (https://www.istat.it/en/microdata), available for research purposes after registering with the web service. Data are also attached as accompanying material and are available from the author upon request.
Supplemental Material
Supplemental material, EPB876949 Supplemental Material for Is OpenStreetMap a good source of information for cultural statistics? The case of Italian museums by Francesco Balducci in EPB: Urban Analytics and City Science
Footnotes
Acknowledgements
The findings and conclusions in this article are those of the author and do not necessarily represent the official position of the Italian National Institute of Statistics (Istat). This research was conducted during a fellowship at Istat.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
References
