Abstract
In line with Eurostat and other National Statistical Institutes, Istat has been publishing experimental statistics since April 2018. Experimental statistics inform users on topics not fully exploited by official statistics, and differ from them because they are not yet entirely developed. This expansion of Istat statistical output was driven by the aim of satisfying users’ needs and by the increased availability of new data sources.
An internal procedure was set up to select, evaluate and disseminate experimental statistics before their publication on a dedicated area of the Istat website.
At Istat, the primary purpose of experimental statistics is to improve relevance. Indeed, they are new statistics or improved existing outputs, which have a value added for the users in terms of “new” or “additional” information available. Important features of experimental statistics are the use of non-traditional data sources, the use of innovative methodologies, the geo-spatial reference or other types of data visualisation, the integration of multiple sources. Up to now, improving timeliness seems to be a less important motivational factor for developing experimental statistics.
Recently, the transition from experimental to official statistics was tackled, leading to the definition of a set of criteria to be satisfied.
Introduction
Istat first published experimental statistics (ES) in April 2018, as part of a dedicated section (
So far, there is no standard definition of ES. According to the UK Office for National Statistics, experimental statistics are series of statistics that are in the testing phase and not yet fully developed (
The introduction of ES was part of the Istat modernisation programme, which aimed at a more functional Institute where research was deemed one of Istat core businesses. The ensuing organisational model has made research activity, methodological support and users’ satisfaction key drivers, which require strategic investments.
Istat use of ES is in line with other National Statistical Institutes (NSIs), which have already developed such kind of statistics, and Eurostat as well (
To have a full overview of ES production at a EU level, the European Statistical System hub on ES gathers NSIs websites sections about ES (
Destatis (Germany) defines the ES they produce as “feasibility studies which are currently being developed using new data and methods, and conducted with the aim to integrate the successful results into the continuous calculations. At present, the majority of the results of these studies are still of an experimental nature. Both the degree of maturity and quality of experimental data differ from those of official statistics (e.g. regarding harmonisation, coverage and methodology)”. The ES presented on the Destatis website are of an experimental nature for a limited time. Eventually, some of them will be classified as “experiments”, while others will enhance official statistics. (
Other EU NSIs disseminating ES are Statbel (Belgium), Statistics Estonia, INE-Spain, Central Statistical Bureau of Latvia, Statistics Netherlands, Statistics Poland, Statistics Portugal, Statistics Finland and Statistics Iceland.
In the European region, the Swiss Federal Statistical Office (
Outside Europe, Statistics New Zealand has developed a large amount of experimental data series, tools, and methods underlining the importance of users’ feedback to figure out what initiatives are useful, and how to develop them further (
In general, the need for new data to face the recent crisis caused by the COVID-19 pandemic has further enhanced the production of experimental statistics.
The paper summarises Istat two-year experience in disseminating ES, commenting on strengths, weaknesses and lessons learnt. Section 2 gives a definition of ES and describes how Istat classifies them. Section 3 summarises the procedure for evaluating and disseminating ES. Section 4 provides examples of ES published. Section 5 illustrates how the ES meet relevance and other quality criteria, and reports on users’ feedback on the ES published so far. Section 6 concludes with future developments, also taking into account the increasing need for ES due to the COVID-19 pandemic.
What does Istat mean with “experimental statistics”?
As already said, there is no standard definition of ES. By producing ES, Istat tests the use of new data sources and the application of innovative methods in producing data. The resulting statistics are defined as experimental as they have not reached full maturity and are subject to further development or improvement. This means that ES do not yet meet all the rigorous requirements of official statistics. Reasons might be:
the volatility of the data source; new methods that are still subject to modification or further evaluation; a production process that is not stable enough to ensure regular outputs; only partial coverage (of units or geographical areas); statistics that do not fully comply with the quality standards of official statistics [2].
For instance, several statistics based on Big Data are still experimental. Indeed, many problems must be faced before Big Data can be fully exploited, despite their increasing usefulness as an information source. These problems range from quality aspects (e.g. selectivity) to legal considerations (e.g. privacy). Producing ES based on Big Data may help to better analyse these sources and solve some issues. In this way, the community of official statisticians can better benefit from Big Data sources and use them in a methodologically sound manner [3].
Despite the limitations, ES have a high potential, as they are able to fill information gaps, are of immediate value for the users, and serve as a driving force for new analyses and indicators.
By contrast, official statistics released by Istat, as well as by other NSIs in Europe, comply with the principles and standards of the European Statistics Code of Practice [2]. In many cases, they are also subject to EU Regulations. Thus, it becomes important to distinguish ES from official statistics to safeguard users from misuses of ES. On the Istat website, the dedicated section on ES is clearly marked with a specific logo and graphic standards. Other NSIs and Eurostat have made the same choice. According to users’ feedback, it seems to work properly.
Istat experimental statistics products with launch date, grouped by category
On the Istat website, ES are classified according to four different categories:
Non-standard classifications derived from official taxonomies currently used by Istat (or from international standards) or new experimental classifications derived from analysis and research activities for microdata processing (e.g. “Households by social groups”, a classification resulting from a multidimensional approach, see Section 4). They can be divided into two main sub-categories:
Non-standard classifications in a strict sense, obtained from the official classifications (those defined at an international level and, more generally, those currently used by Istat in publishing official data) by aggregating the classification items in a different way. Users can apply these classifications using the transcoding tables provided when the data are disseminated.
New classifications, proposed as experimental within specific analysis and research activities, generally based on microdata processing. For this reason, they cannot be easily reproduced by the user, except with the assistance of Istat researchers and access to microdata.
New indicators, produced by integrating many official and non-official data sources (e.g. “Wage inequality indices in small enterprises”, by a plurality of analysis dimensions and distribution indicators, see Section 4). The focus is on the phenomena under investigation rather than on the data sources used to describe them. The sources can be all official, combined in an experimental way (e.g. using new methods), or can include non-official sources. The introduction of new indicators is conditional on the assessment of their benefits, that is, the opportunity for a better, more in-depth or more comprehensive description of the phenomena under investigation. Such benefits should outweigh the drawbacks related to their experimental nature, such as limitations with regard to the theoretical framework, quality indicators or other limitations mentioned in the methodological note.
Interpretation frameworks and analyses of complex phenomena obtained through the integration of official data sources (e.g. “Daily population for study and work reasons”, an experimental approach using administrative data, see Section 4). The objective is to represent phenomena, multidimensional by their nature, as completely as possible. The key concept, therefore, is “integration”, that is, the overall representation of phenomena rather than a representation at the level of the single indicator.
Results of experiments on Big Data, characterised by the use of non-official data sources which replace the traditional ones. This includes web scraping, scanner data, geospatial data, mobile phone data and other types of Big Data, some of which are becoming more and more frequently used by NSIs.
Section 4 provides further examples of the cited categories. Table 1 lists Istat ES products.
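As an illustration of how a user might apply a transcoding table to regroup data published under an official classification, the following is a minimal sketch. The item codes, group labels and figures are hypothetical and do not correspond to any Istat classification; the sketch also assumes that the published values are additive totals (e.g. counts), so that summing the items of a group is meaningful.

```python
from collections import defaultdict

# Hypothetical transcoding table: official item code -> non-standard group.
# Codes and groups are invented for illustration only.
TRANSCODING = {
    "A01": "G1",
    "A02": "G1",
    "B01": "G2",
}


def aggregate(values_by_item, transcoding):
    """Sum additive values of official items into the non-standard groups."""
    totals = defaultdict(float)
    for item, value in values_by_item.items():
        if item not in transcoding:
            # Fail loudly rather than silently dropping an unmapped item.
            raise KeyError(f"no transcoding entry for official item {item!r}")
        totals[transcoding[item]] += value
    return dict(totals)


# Hypothetical figures published under the official classification.
official = {"A01": 10.0, "A02": 5.0, "B01": 7.5}
regrouped = aggregate(official, TRANSCODING)
```

Non-additive measures (averages, ratios, indices) could not be regrouped this way and would require the underlying data.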
Istat has set up an internal procedure to select and evaluate the ES proposals to be published through the website. The proposals have to be accompanied by supporting documentation, drafted according to guidelines and a template. The guidelines are intended to help researchers describe the ES and highlight the most important and innovative aspects. The procedure, the guidelines and the template are made available to Istat staff through a dedicated page on the Intranet. The Director responsible for the product, who has to make a first evaluation, can submit proposals at any time of the year.
Then the Research Committee plays a key role. It was set up to orient and provide for a consistency check of Istat research activities as well as to propose policies in specific domains, including a contribution to the strategic planning and monitoring of research activity. With regard to ES, this Committee assesses the proposals with the support of its scientific secretariat and the experts – by subject-area as well as methodologists – who are identified considering their expertise and specialisation.
As foreseen by the procedure, the supporting standard documentation is required to allow for a proper assessment with regard to the soundness, innovative aspects, and relevance of the ES to the users. The main document is a methodological note that describes the purpose of the ES, the achieved results, the information gap filled, the innovative aspects, the applied methodology and references to the literature or other documents. In addition to the methodological note, the ES can be accompanied by other explanatory documents. The methodological note is the key document the experts use to assess the soundness and the relevance of the proposals for new ES. One further aspect that the evaluators consider is the sustainability of the new ES, that is, to what extent the ES can be regularly updated (e.g. workload, data availability, organisational problems). Colleagues from the legal office are also involved to assess compliance with privacy requirements. The ensuing assessment is then reported in a standard form and discussed within the Research Committee.
The proposals can be amended and integrated taking into account the recommendations provided for by the evaluators. The proposals that are positively evaluated by the Research Committee are submitted to the Board of Directors, which takes the final decision. The approved ES is then published with the support of the colleagues in charge of the website. The whole process is coordinated and monitored by a senior technologist who works to make it as smooth as possible, managing all the process steps with regard to the relations with the actors involved in assessing the ES proposals.
The first consideration that can be made is that the participation of internal colleagues in the whole process for publishing ES is of a cross-sectional kind. This represents added value, as many people with different skills take part in the process and give their contribution. In this sense, it could be seen as a collaborative process where everyone contributes in a flexible way, without a real formal mandate.
The second reflection relates to the attempt to speed up the process in order to ensure the timeliness of ES. Timeliness is a key requirement for most users in general, and this is true for ES as well (see also Section 5). Indeed, the length of the process for the approval and publication of ES can surely be improved.
One aspect that contributes to speeding up the process is that proposals can be submitted only if the statistical product is already available. This means that research projects are not allowed for submission, as they require a development time that might not be known in advance, might end in non-timely results, and do not assure valuable outputs.
Key information on Istat experimental statistics
By their nature, ES have great added value for users. They provide new or additional information, as they cover fields not yet fully exploited by official statistics. In this sense, ES can be seen as a prompt reply of NSIs to users’ demand, increasingly characterised by great complexity, new fields of analysis and the need for a quick response. The most important features of ES are: i) the use of non-traditional data sources (e.g. Big Data); ii) the use of innovative methodologies (e.g. economic models); iii) the geo-spatial reference or other types of data visualisations (e.g. maps, routes); iv) the integration of multiple sources (e.g. surveys and administrative data sources; or official and non-official data sources).
Since April 2018, Istat has published 16 ES on the website. Some of them have been updated whenever more recent data were available. Moreover, two additional ES will soon be published and one of the ES already published has been shifted from experimental to official statistics, having reached the required quality standards and having been strongly demanded by users and stakeholders.
According to the classification provided for in Section 2, Istat has published the following ES.
Non-standard classifications
“Households by social groups”, the proposed classification results from a multidimensional approach which takes into account economic, cultural and social aspects. “Taxonomy of the internationalisation models”, a taxonomy of Italian firms’ internationalisation models composed of six mutually exclusive classes, representing different modes of operating on foreign markets. “Classification of generations”, this was the main theme of 2016 Istat Annual Report and it is a non-conventional classification. “Classifications of local systems”, within the 2015 Istat Annual Report, new classifications of local systems have been presented. “Classification of Municipalities based on Italian ecoregions”, this new classification of Municipalities according to the ecoregions is based on homogeneity with respect to climatic, biogeographical, physiographic and hydrographic factors.
New indicators
“Enterprises classified by use of ICT and economic indicators”, an integration between data from register and sample surveys. “Businesses behaviours and sustainable development”, new indicators on sustainability spreading and orientation by the Italian business. “Twenty years of employment and work qualification”, a tool that groups the basic units of the classifications of occupations into a different conceptual structure. “Wage inequality indices in small enterprises”, the inequality and wage differentials of the private sector by a plurality of analysis dimensions and distribution indicators. “Municipality-tailored indicators”, a multi-source information system which fosters experimental sources and other more consolidated ones as well.
Interpretation frameworks and analyses
“Museum routes in Italy”, identification of “museum itineraries” and “thematic paths” included in a cultural network. “Daily population for study and work reasons”, this represents an experimental approach using administrative data. “Integrated economic and environmental accounts for tourism”, the integrated account extends the macroeconomic description of tourism to its environmental sustainability.
Experiments on Big Data
“Social Mood on Economy Index”, a daily measure of the Italian sentiment on the economy based on Twitter data. “Modalities of use of websites by enterprises”, the estimates are obtained directly from Internet data. “Use of the Open Street Map to calculate indicators for road accidents”, a calculation of new indicators on road accidents.
Istat is going to publish two new ES: one concerns demographic projections at a Municipality level and the other a credit crunch indicator for Italian economy based on business surveys.
Table 1 lists Istat ES products by category, together with the launch date.
Addressing users’ needs
Quality aspects of ES
One of the main reasons for disseminating ES is to address users’ needs. As mentioned before, the dissemination of ES occurs when quality targets required by official statistics cannot be fully guaranteed. Nevertheless, it might be useful to investigate some quality aspects of ES. According to the Code of Practice [2], statistical outputs should meet the following principles (also referred to as quality dimensions): relevance; accuracy and reliability; timeliness and punctuality; accessibility and clarity; coherence and comparability.
The easiest quality dimension to address is relevance. Indeed, ES usually represent improvements in this dimension if compared to official statistics. The ES are a response to “the need for National Statistics and other official statistics to remain relevant for use, to provide a dynamic public service” [4].
At Istat, improving relevance is the key aspect in developing ES. Researchers are producing ES to cover topics that are not fully exploited by official statistics. To fill information gaps or to explore new topics, researchers look for new data sources and their potential from a thematic viewpoint. In this respect, the implementation of new methodologies also seems to be driven by information needs rather than by other considerations more related to the production process (e.g. efficiency gains or improving timeliness).
In general, ES do not reach accuracy and reliability of official statistics, as they have “a potentially wider degree of uncertainty in the resulting estimates” [4].
Furthermore, it is more difficult to assess accuracy of ES compared to official statistics, due to the use of non-traditional data sources or to the integration of different sources. In both cases, errors that reduce accuracy might affect the sources, or be introduced at different stages of the production process, particularly during the linkage [5, 6]. Quality frameworks developed for Big Data sources usually consider three macro-phases of the statistical process: input (when the source is acquired or in the process of being acquired), throughput (when the source is processed), and output (referred to the output data derived from Big Data sources) [7]. The framework allows for a full assessment of the Big Data source and the resulting outputs. Moreover, it represents the starting point for user-oriented quality reporting. Guidelines on how to report to researchers and users on the quality and limitations of linked data (e.g. surveys and administrative data) are provided in [8]. They can be used for ES based on the integration of multiple sources as well. However, a more comprehensive approach for reporting to users on ES has not been developed yet.
Despite the difficulties in assessing accuracy, this dimension becomes one of the most important elements in the shift from experimental to official statistics.
Timeliness might be a driver for developing ES. New methods or new data sources can be used for increasing timeliness of existing statistics. Timeliness is vital to users, particularly for economic indicators [9]. However, it does not seem to be a key motivational factor for Istat researchers, at least so far. Even when new methods or methodologies are implemented, the primary aim is not to make the production of statistics faster. In addition, other factors affect the timeliness of ES. As is well known, timeliness refers to the lag between the publication date and the period to which the statistics refer [2]. The length of the process for publishing ES therefore has a negative effect on the timeliness of the statistics. Up to now, the process for publishing ES takes about 8 to 10 months on average, given the many different actors involved in the evaluation process (as described in Section 3). Besides, possible delays in disseminating some ES could depend on the person responsible for the product, who can decide to postpone the publication to update the analysis using more recent data that have become available in the meantime. On other occasions, the product could have a strategic importance for users and therefore it might deserve more in-depth analyses and a proper communication strategy before being published.
Istat has recently started to plan the submission of ES proposals, by identifying the willingness of internal Divisions to present proposals and their fields of interest. In this way, the time needed to evaluate ES is expected to be shorter in the near future.
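The definition of timeliness recalled above (the lag between the period to which the statistics refer and the publication date) can be illustrated with a minimal sketch. The function name and the dates are hypothetical, and measuring the lag from the end of the reference period, in calendar days, is an assumption made here for illustration rather than a rule stated in the Code of Practice.

```python
from datetime import date


def timeliness_lag_days(reference_period_end: date, publication_date: date) -> int:
    """Calendar days between the end of the reference period and publication.

    Assumption: the lag is measured from the last day of the reference period.
    """
    if publication_date < reference_period_end:
        raise ValueError("publication date precedes the end of the reference period")
    return (publication_date - reference_period_end).days


# Hypothetical example: statistics referring to the year 2019,
# published on 30 September 2020.
lag = timeliness_lag_days(date(2019, 12, 31), date(2020, 9, 30))
```

Under this definition, any time spent in the evaluation and approval process before publication adds directly to the lag.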
With regard to accessibility and clarity, the Istat website has a specific section marked with a specific logo for ES, as already mentioned. Moreover, ES are published together with methodological documents aimed to support users in using them and understanding potential limitations. Indeed, ES can provide useful information for users as long as their nature is well explained and understood (
Total number of accesses to the available ES in the period April 2018–September 2020.
A problem that Istat is facing with the dissemination of ES concerns the publication of particular ES that might have a high impact on users and media as well. This might be the case of ES dealing with very important themes from a social or economic point of view. In these cases, planning a proper communication strategy might help to prevent the media reactions and to clearly distinguish the ES from related official statistics. To cite an example, when the ES on “Social mood on economy index” (
Moreover, the publication date is an important element too. It is carefully selected to avoid overlapping and misinterpretation by users with regard to similar official statistics released in the same period.
Based on Istat experience, key elements to consider for a proper dissemination policy of those ES that are expected to have a high impact on the media, are:
publishing a notice on the website homepage to stress the importance of the ES released; launching an advertising campaign involving stakeholders or selected users (if known); defining the right title of the ES published to avoid misunderstanding.
Coherence and comparability of ES with other statistics, particularly with official statistics already disseminated, are key requirements for the users. Incoherent information would be useless and disappointing for the users. Therefore, coherence and comparability are assessed during the evaluation process of the proposals at Istat.
So far, the focus has been on how Istat is meeting users’ needs when it comes to the dissemination of ES. However, in order to have elements to evaluate how successful Istat is with the dissemination of ES, it is important to collect and analyse users’ feedback.
To survey the users’ interest in the ES published, there is a specific section on the Istat website where users are invited to leave their observations, comments and suggestions.
As expected, users’ feedback mainly focuses on generic data requests, both for experimental and official data. In some cases, users require additional data related to the ES published or to their updating.
In other cases, requests are about the information systems functionality, clarifications or suggestions, or else methods used to produce indicators, as in the case of the ES “Municipality-tailored indicators” information system (
Finally, further feedback from users regards proposals of collaboration based on a specific ES (e.g. ES on “Social mood on Economy index”) or requests for reference documents for experimental data not published yet (e.g. experiments on Big Data from mobile phones).
In order to have an outline of the users’ interest in the ES, the accesses to the website section devoted to ES have been analysed.
Figure 1 shows the total number of accesses to the specific ES since their publication on the Istat website in the period April 2018–September 2020. The average number of accesses per ES is almost 15,000, with the most visited ES (“Social mood on Economy Index” and “Households by social groups”) being accessed around 25,000 times.
In Fig. 2, the monthly dynamics of users’ accesses is shown together with the number of available ES (period April 2018–August 2020). A positive response from users to the increasing ES supply can be noted.
Monthly number of available ES (solid line, left vertical axis) and users’ accesses (dashed line, right vertical axis). The latter are plotted in log scale for readability.
As already underlined, the ES future developments have a high potential to be exploited to assure prompt reactions by NSIs to an increasingly fast and detailed users’ demand.
To outline a comparison among the NSIs’ approaches to ES (mentioned in Section 1), the great majority have a dedicated section for ES on their website, even though not all of them have a clearly visible logo developed ad hoc for ES. Moreover, there is a great heterogeneity in the definitions of ES. In some cases, ES are considered as mere provisional estimates or, in other words, still under development. However, ES are considered as an important innovation for statistical production, and the feedback from users and stakeholders is deemed crucial to better develop and consolidate this kind of statistics.
With regard to Istat experience, there are three scenarios for an ES after an average period of three years from the first publication: 1) the ES is regularly updated with the most recent data available and it remains experimental; 2) the ES moves to official statistics; 3) the ES is stored, i.e. it remains a one-shot statistic.
The ES evolution, at least for some of them, envisages their transition to official statistics. This issue has recently been handled by the Research Committee, which stated that some ES may become official after a careful evaluation which involves different actors within Istat.
Therefore, a set of criteria to be satisfied has been defined, where relevance to the users plays a preeminent, even though not exclusive, role. The criteria try to assess to what extent possible limitations of the ES have been overcome, thus justifying the transition to official statistics. The criteria are the following:
Strengthened and/or improved methodologies, and the stability of the estimates (reliability); Strengthened production process and its security, including IT services; Modified and/or improved data sources; Increased geographical coverage (e.g. a regional analysis extended to all Regions); Ensured relevance and timeliness; Positive feedback from users.
Of course, the above-mentioned criteria imply the need to evaluate the risk that the new statistics might have an impact in terms of competitiveness or overlapping with regard to official ones.
Moreover, users may play an important role in pushing for the shifting from experimental to official statistics. For instance, users may be particularly interested in a specific ES, as they need it for research activity or decision-making. This is the case of the ES on “Classification of Municipalities based on Italian ecoregions” (
The very recent health, social and economic crisis caused by the COVID-19 pandemic has had an unprecedented impact on the production of official statistics, which has to meet users’ needs quickly, in particular those of policy makers who have to take prompt decisions related to recovery.
To meet these needs, among other statistics, Istat has recently released the results from the “Survey on the COVID-19 emergency impact on Italian businesses” that can be considered as experimental with regard to data source integration, as the survey has been integrated with census data. This shows that timeliness is becoming another key element to take into account when processing and evaluating ES, given the current situation.
