Realisation of ‘administrative data first’ in quarterly business statistics

Abstract

Stats NZ has moved to an ‘administrative data first’ approach for quarterly financial statistics. This paper gives an overview of the transformation process, describes some of the methods and techniques used, and outlines the benefits we have realised in adopting this approach, such as increased flexibility, improved quality, and reduced respondent burden. It also discusses some of the challenges of this significant paradigm shift, particularly the changes in thinking needed, as well as the ongoing commitment required to maintain an ‘administrative data first’ approach.

Keywords

Administrative data, business statistics, transformation

1. Introduction

Statistics New Zealand (Stats NZ) aims to maximise its use of administrative data. For economic statistics we are adopting an ‘administrative data first’ paradigm, in which administrative data is the primary source of business information and further data is collected (for example, by survey) only when necessary. This shift was driven by increased demand for business data and by pressure to reduce respondent burden and create statistical efficiencies. The development and implementation of an ‘administrative data first’ approach for quarterly business financial statistics is one of our recent achievements in making greater use of administrative data in economic statistics.

Section 2 of this paper gives an overview of the design and implementation of the ‘administrative data first’ approach in quarterly business financial statistics: how we started small, then gradually broadened and extended the approach once it was proven. Section 3 discusses the benefits of the approach and how these have been realised in practice. Section 4 outlines how we did it, identifying the key components of the ‘administrative data first’ approach and how we developed, used, and implemented them. The final section reviews some of the key challenges we faced and emphasises that the ‘administrative data first’ approach requires ongoing commitment, a key consideration being close work with the suppliers of the administrative data.

Figure 1. History of quarterly financial statistical designs.

2. Overview of the transformation

2.1 Where we began

Stats NZ’s use of administrative data in quarterly business financial statistics is not new, but it was limited before 2015. In 2001, administrative data from Goods and Services Tax (GST) filings was added to supplement the survey data. GST is a value-added tax levied on almost all goods and services sold in New Zealand, and is collected by our national taxation office, Inland Revenue. Businesses supply their gross sales and purchases as part of their regular GST filing. Stats NZ receives a weekly supply (fortnightly until 2017) of unit-record GST data from Inland Revenue.

Before 2001, Stats NZ’s quarterly business collections, such as the Economic Survey of Manufacturing and the Retail Trade Survey, used a sample survey design. The GST component added in 2001 was limited, contributing no more than 15 percent of sales values for the manufacturing and wholesale trade industries, and 10 percent for retail trade. This relatively cautious approach was taken because of questions over the conceptual fit of the administrative data, quality concerns, and issues with the timeliness of the data.

Although the use of administrative data was limited, Stats NZ continued to consider how we might make more use of this potentially rich source of data. Thought-leaders such as McKenzie [1], and others [2], explored how we might extend our use of GST data and encouraged the organisation to challenge the existing sample survey paradigm. To progress further, we needed a consistent and reliable methodology, and a resolution of some of the issues (both real and perceived) with administrative data.

2.2 Dedicated investigation

A small dedicated project team was established in 2013 to advance the administrative data work. This cross-functional team consisted of subject-matter experts, researchers, data specialists, and methodologists. The team developed and refined methods for assessing, transforming, and using the administrative data (mainly GST), so it could be used more fully as part of Stats NZ’s data collection and output processes. These methods have been discussed in more detail in a previous paper [3], and are summarised in Section 4 of this paper.

A key part of the team’s work was ensuring that key users were confident that this radical transformation was both necessary and worthwhile. This involved extensive consultation with our key internal user, National Accounts, who needed to have confidence that the new approach could provide robust and timely results on an ongoing basis. The quarterly business financial statistics are outputs in their own right, but also contribute to National Accounts calculations of Gross Domestic Product (GDP). External users tended to be less concerned about how we collect the data – data quality and timeliness were the important considerations for them.

2.3 First steps – transforming existing outputs

By early 2015 the methods were ready to implement. The team had taken a wider view of how the administrative data could be used, so the assessment methods and transformation techniques were viewed as being applicable to most of the economy. However, we wanted to start relatively slowly and cautiously with existing outputs so that the ‘administrative data first’ approach could be proven before looking to extend it to more industries and potentially more variables.

The new approach was successfully implemented in the quarterly manufacturing and wholesale trade outputs from the September 2015 quarter onwards. GST sales data (direct or modified by modelling) is used wherever possible, supplemented by a managed collection of large and complex businesses where the use of GST is not suitable. Figure 1 shows the overall transition from a stratified sample survey model to the current ‘administrative data first’ model.

The approach was also used to create sales indicators for some selected service industries which had not previously been covered by our quarterly financial outputs. Adding the new service industries was a “quick win” – the extra industries had few large and complex units and most of the value was provided by the administrative data component (GST sales used directly or modified by modelling). These new indicators were quickly adopted by National Accounts as part of their quarterly GDP processes.

The new statistical design and methods were migrated onto our statistical production system, so we could efficiently expand the design to cover additional industries and variables. Stats NZ has moved to a single system for the processing and dissemination of most of its economic statistics. A key feature was the inclusion of survey and administrative data in the same system, along with the Stats NZ Business Register (a list of economically significant businesses and organisations engaged in the production of goods and services in New Zealand), which provides a population frame for business statistics and acts as a ‘spine’ to integrate many of our economic outputs.

2.4 Extension – new variables and industries

The success of applying the ‘administrative data first’ approach to the manufacturing, wholesale trade, and selected services industries left us well positioned for further enhancements. The new approach let us collect data at least cost from the maximum number of businesses, while also producing a wealth of business data. Initial plans centred on extending the approach to the remaining service industries, for which National Accounts wanted to improve their indicators.

Impetus for an even greater expansion came from other needs within National Accounts. Stats NZ has embarked on a significant upgrade to the scope and quality of our National Accounts, including the development of a quarterly income measure of GDP (GDPI) and balance sheets, to improve our range of economic statistics. New variables and wider industry coverage were needed to support these developments. We believed that the standardised approach already in place could be efficiently scaled up to meet these new demands.

We needed to extend our managed collection of large and complex businesses to additional industries in the economy. For the remaining businesses, we investigated the use of GST sales and purchases in combination with salaries and wages from the Employee Monthly Schedule (EMS) administrative data to derive profit estimates. EMS is a monthly payroll return covering all employees, capturing taxable and non-taxable earnings. We used existing methods and systems for this extension. Instead of creating a separate new collection, we combined it with what we were using already and produced a complete and coherent collection including both administrative and managed collection data.

The expanded collection was implemented from the June 2016 quarter onwards. This involved the collection of additional variables (purchases, salaries and wages, and profit), as well as extending industry coverage to almost the entire economy. The expanded collection is referred to as the Business Data Collection (BDC). Retail trade, and accommodation and food services were initially excluded from this expansion as the Retail Trade Survey was at that time produced using a stand-alone process and system, and had a different design and production schedule from the other quarterly outputs.

2.5 Retail trade – adding the geographic dimension

The retail trade, and accommodation and food services industries were the last industries to which we applied the ‘administrative data first’ approach. All industries except some very specific sectors such as farming, banking and finance were now included within the collection. As noted above, the Retail Trade Survey had been produced using a separate system and schedule to the rest of the quarterly outputs. However, it was important for the GDPI work and the overall consistency and coherence of the Stats NZ financial data collections that retail trade was included within the BDC. The Retail Trade Survey would also benefit from transition to a more modern production environment and the additional advantages of moving from a sample survey design to the ‘administrative data first’ approach.

In migrating Retail Trade, our key aim was to continue the ‘administrative data first’ approach and use the same systems, methods, and processes as the rest of the collection, with as little change as possible. We already knew that GST data would work very well for retail trade, as it had been assessed using the methods discussed in Section 4 of this paper. The key differences we needed to resolve were the earlier release timing for Retail Trade Survey data and the requirement for a geographic (store-based) dimension to the collection.

Retail Trade moved to the new approach from the September 2017 quarter onwards. We solved the timing issues by incorporating an earlier supply of the administrative data (GST sales), compressing our production processes slightly, and delaying the release timing by one week. This was acceptable to our users and the Retail Trade Survey results are still released ahead of the other quarterly financial outputs and in plenty of time for National Accounts to use the results in their quarterly GDP processes. We also incorporated a location (retail store for retailers) into our processing and analysis system to meet the needs of producing area-based retail statistics. These geographic locations were already a part of the Stats NZ Business Register. Instead of applying these to retail only, we included locations for all businesses within the collection. This work required additional modelling as the administrative data is at the business level rather than location-based. This data will provide greater flexibility in the future and the ability to look at area-based statistics for industries outside of retail.

2.6 What we have now – nirvana?

Stats NZ is now in the fortunate position of having a comprehensive quarterly financial business collection. It is powerful and highly flexible in that it contains quarterly data on almost every economically significant business in the economy, across the key financial performance variables (sales, purchases, and salaries and wages).

The collection uses a combination of administrative data and survey data collected directly to provide its quarterly values. The surveyed data is from a relatively small number of large and complex businesses (the managed collection), with the bulk of the data being administrative data – from GST sales and purchases, and the EMS which provides salaries and wages data. The collection contains administrative data for all units including the managed collection units. Table 1 shows the sales value contributions of managed collection units versus administrative data for the collection.

Table 1
Sales value for managed collection versus administrative data (December 2017 quarter)

Industry	Sales value from managed collection	Sales value from administrative data
Manufacturing	67.5%	32.5%
Wholesale	56.6%	43.4%
Retail	46.4%	53.6%
Selected services	14.1%	85.9%
Other industries	34.5%	65.5%
Total	46.1%	53.9%

Although the bulk of the collection began in June 2016, administrative data for all units was available well before then, so the collection includes back-dated quarterly administrative data (and surveyed data in industries where available) back to June 2011. This gives us an extended time series to work with, and will greatly aid National Accounts with their GDPI work. Although data for expanded industries has been released on an experimental basis, we are continuing to refine and enhance the modelling and output processes as more data is collected.

The collection sits in a single processing and analysis system, with standardised selection, editing, imputation, analysis, and output processes. Previous systems had been much more industry-based, with separate designs and different production processes. The integration of surveyed data and administrative data into one system is extremely useful for research and evaluation purposes. The collection is also available to internal users for their own analysis work and for incorporating into their output processes. Because of the census nature of the collection and the metadata available on each business, the data can be queried and organised on many different dimensions and at various levels within each dimension. Key dimensions used include industry, sector, and location.
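Because every unit record carries its own metadata, an output for any combination of dimensions reduces to filtering and summing unit records. The Python sketch below illustrates this idea only; the record layout, field names, and figures are hypothetical and are not Stats NZ’s production schema.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical unit-record layout: one dict per business for a single quarter.
Record = dict


def aggregate(records: Iterable[Record], dims: tuple, value: str,
              keep: Callable[[Record], bool] = lambda r: True) -> dict:
    """Sum `value` over the records passing `keep`, grouped by the fields in `dims`."""
    totals = defaultdict(float)
    for r in records:
        if keep(r):
            totals[tuple(r[d] for d in dims)] += r[value]
    return dict(totals)


# Made-up records for illustration only.
records = [
    {"industry": "Retail", "location": "Kaikoura", "sales": 120.0},
    {"industry": "Retail", "location": "Christchurch", "sales": 900.0},
    {"industry": "Accommodation", "location": "Kaikoura", "sales": 80.0},
]

by_industry = aggregate(records, ("industry",), "sales")
kaikoura = aggregate(records, ("location", "industry"), "sales",
                     keep=lambda r: r["location"] == "Kaikoura")
```

A district-level series such as the Kaikoura example in Section 3.3 is, in this framing, the same aggregation with a location filter applied to each quarter.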

3. Benefits of ‘administrative data first’

There are several key benefits to the ‘administrative data first’ approach that Stats NZ has implemented for its quarterly business financial statistics. These benefits have been articulated in a previous paper [3], but are now being fully realised.

Figure 2. Kaikoura district retail and tourism-related sales (quarterly).

3.1 Improvement in statistical quality

Previously, the quarterly business collections used stratified random sampling. We found that the sales series produced by the new approach are of better quality because we now have data for all units in the population. These findings were part of the assessment process discussed in Section 4 of this paper. Removing the sample designs means we no longer have weighted units within our sales series (we still use some sampling for inventories). Despite the extensive use of administrative data, many industries still have significant proportions of their value provided by managed collection units.

3.2 Reduction in respondent burden

Transitioning to the ‘administrative data first’ approach has resulted in a sizeable reduction in respondent burden. In addition, we could extend our industry coverage significantly without a large increase in respondent burden. Some of the results achieved at the various stages were:

We achieved a 50 percent reduction in respondent burden across the combined Manufacturing, Wholesale Trade, and Selected Services outputs. In doing so we also increased the coverage of service industries.

The June 2016 extension, which widened the collection to almost the entire economy, was achieved with the addition of only around 750 businesses into the managed collection.

The managed collection for the retail trade portion of the BDC consists of around 375 businesses, as compared with close to 3000 businesses (including many small and medium enterprises) surveyed as part of the former Retail Trade Survey.

To put the overall transformation to ‘administrative data first’ in context, and to illustrate both the reduction in respondent load and the collection gains achieved, it is worth comparing the start and end points. In June 2015 (the last quarter before we began the transition), Stats NZ was directly surveying close to 5000 businesses to cover Manufacturing, Wholesale Trade, Retail Trade, and two service industries. By September 2017 (when retail trade was migrated), we were covering almost the entire economy quarterly by directly surveying just over 2000 businesses and using administrative data for all other businesses. Table 2 shows, by industry, the number of units directly collected from compared with the units for which we use ‘administrative data first’ across the whole collection.

Table 2
Units directly collected versus modelled units – Business Data Collection (March 2018 quarter)

Industry	Managed collection	Stocks-only survey	Modelled units	Population
Manufacturing	295	75	20,760	21,130
Wholesale	260	230	16,590	17,080
Retail	380	0	48,440	48,820
Selected services	75	0	43,165	43,240
Other industries	730	0	394,780	395,510
Total	1740	305	523,735	525,780

Figure 3. Administrative data assessment model.

3.3 Flexibility in statistical production

The comprehensive unit-based coverage of the GST data has given us more flexibility. This has allowed us to meet emerging customer needs as we can provide greater detail and more varied “cuts” of the data. This flexibility has been shown recently in the ability to produce regionally-based results for specific industries. Stats NZ produced a sales series for retail and tourism-related industries to show the economic impacts on a relatively small area (Kaikoura district) after a major earthquake in late 2016. Figure 2 shows a graph of the time series we produced for the Kaikoura district using the new approach. This would not have been possible with a sample-based survey.

3.4 Scalability in statistical production

We developed methods that allowed us to easily expand our quarterly collections. Early results from this were realised in providing new statistics for industries that had not previously been covered by Stats NZ, such as the sales indicators for some service industries. The new statistics have improved the quality of our GDP statistics and provided new information for our other customers. The ability to expand further into almost all industries of the economy and increase the variables collected within the BDC fully demonstrated this scalability.

3.5 Production of unit-record data

Microdata is a powerful tool for analysis, and is a strong customer need. The new approach has provided unit-record data for sales and the other variables that were added as part of the BDC expansion. This also allows us to more easily integrate data from different sources. The addition of location information as part of the retail expansion enhanced the value of this unit-record data and will make it usable for spatially-based statistics. As mentioned in Section 2.6, we populated this quarterly unit-record dataset back to June 2011 using a combination of surveyed data (where available) and modelled administrative data.

4. How we did it – key components of the ‘administrative data first’ approach

This section presents some of the key components of the ‘administrative data first’ approach, and describes how they were utilised at various stages of design and implementation. These methods have been more fully described in other papers, for example [3].

4.1 Assessment model for use of administrative data

A key initial step in testing the viability of administrative data was to set up and use an assessment model. Initially this was applied to GST sales, but was extended to the other administrative data sources (GST purchases, and EMS for salaries and wages), which we are now also using in the collection. The assessment model helped us to make decisions in two areas:

The suitability of the administrative data – where the data can be used.

The options available for using the administrative data – how the data should be used.

Various aspects of the administrative data were assessed within the model, including conceptual alignment, timeliness, business reporting structures, and reporting frequency. Figure 3 shows the model, with the aspects assessed shown as decision boxes. Depending on the outcome of the assessment, the most appropriate method was chosen for each part of the business population.

4.1.1 Conceptual alignment

The conceptual alignment work utilised our Annual Enterprise Survey (AES) data, which was compared with annualised GST or EMS on a unit basis. We were easily able to identify industries with good conceptual alignment, and establish which industries would need further methodological work to enable us to use the administrative data successfully.
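A unit-level alignment check of this kind could be sketched as follows. It is a minimal illustration, assuming each unit is represented by its industry, its annualised GST value, and its AES value; the 10 percent tolerance and the two summary statistics (median ratio and share of units within tolerance) are illustrative choices, not the criteria Stats NZ applied.

```python
from collections import defaultdict
from statistics import median


def alignment_by_industry(units, tolerance=0.10):
    """Summarise how well annualised GST matches AES, industry by industry.

    units: iterable of (industry, annualised_gst, aes_value) tuples, one per unit.
    Returns {industry: (median GST/AES ratio, share of units within tolerance)}.
    """
    ratios = defaultdict(list)
    for industry, gst, aes in units:
        if aes > 0:  # units without a usable AES benchmark are skipped
            ratios[industry].append(gst / aes)
    summary = {}
    for industry, rs in ratios.items():
        within = sum(1 for r in rs if abs(r - 1.0) <= tolerance)
        summary[industry] = (median(rs), within / len(rs))
    return summary
```

An industry with a median ratio near 1 and a high share of units within tolerance would be a candidate for direct use; a low share would flag the need for further methodological work.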

GST data can include sales and purchases of large capital items. These transactions do not fit within the concepts we are measuring. We developed a GST outlier strategy with rules to detect and enable us to remove large capital items from units in all industries. A similar outlier strategy was deployed for the EMS (salaries and wages) data when the June 2016 industry and variable extension was introduced. Outliers are less common in the salaries and wages information and tend to be explainable by real world events, such as large scale hiring for specific projects, or redundancy payments resulting from business closures or restructures.
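The outlier rules themselves are not specified here, so the following sketch uses a hypothetical screening rule: a quarter is flagged when it exceeds a multiple (3 by default) of the median of the unit’s other quarters, and the flagged value is replaced by that median. It only illustrates the general idea of detecting and removing one-off capital transactions from a unit’s series.

```python
from statistics import median


def flag_capital_spikes(series, factor=3.0):
    """Indices of quarters whose value exceeds `factor` times the median of the
    unit's other quarters (a hypothetical screening rule)."""
    flagged = []
    for i, value in enumerate(series):
        others = series[:i] + series[i + 1:]
        if others and value > factor * median(others):
            flagged.append(i)
    return flagged


def remove_capital_spikes(series, factor=3.0):
    """Copy of `series` with each flagged quarter replaced by the median of the
    unit's other quarters, i.e. the suspected capital item is taken out."""
    series = list(series)
    cleaned = list(series)
    for i in flag_capital_spikes(series, factor):
        cleaned[i] = median(series[:i] + series[i + 1:])
    return cleaned
```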

4.1.2 Timeliness

Timeliness of administrative data was a key consideration, and concerns about it had previously led to a more cautious approach. To correctly measure turning points and changes in the economy, we need data that applies to the period being measured; data forecast from previous periods is not suitable. We found that most of the GST data by value was received within our existing production timeframes for the Manufacturing and Wholesale Trade outputs. As a result, the level of imputation needed was not as high as initially feared, and is in fact very low, at around 3 percent of the overall tax contribution.

The transition of retail trade for the September 2017 quarter required further assessment of timeliness because of the earlier release date of the Retail Trade Survey. We found that the administrative data supply received a week earlier than the one used for our other quarterly financial outputs still contains enough processed data to produce robust statistical outputs. We impute around 5 percent of the value for tax units within the Retail Trade Survey.
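The method used to impute late GST filers is not described in this paper. As an illustration only, the sketch below applies a common ratio imputation, assuming that units not yet filed move in line with units in the same industry that have filed for both quarters, and reports the share of total value that was imputed (the kind of measure quoted above as 3 and 5 percent).

```python
def impute_late_filers(prev, curr):
    """Ratio imputation for one industry (illustrative).

    prev: {unit: value last quarter} for every unit in the industry.
    curr: {unit: value this quarter} for units whose GST return has arrived.
    Units missing from `curr` get last quarter's value times the movement of
    units that reported in both quarters.
    Returns (completed values, share of the industry total that was imputed).
    """
    matched = [u for u in curr if u in prev]
    base = sum(prev[u] for u in matched)
    movement = sum(curr[u] for u in matched) / base if base else 1.0
    completed = dict(curr)
    imputed = 0.0
    for unit, last in prev.items():
        if unit not in curr:
            completed[unit] = last * movement
            imputed += completed[unit]
    total = sum(completed.values())
    return completed, (imputed / total if total else 0.0)
```

Using the current-quarter movement of filed units, rather than projecting a unit’s own history forward, keeps the imputed values tied to the period being measured.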

4.1.3 Business reporting structures

Reporting structures for administrative data do not always directly align with how we might ideally wish to collect data. For example, GST data can be filed as a group, which may contain combined data for several businesses, sometimes spread across multiple industries. We analysed GST filing across the economy and found that 95 percent of businesses had a simple reporting structure, so their data could be used without apportioning it over multiple units. However, a different approach was needed for the larger units and those with more complex structures.

As described in Section 2.5, the introduction of the Retail Trade industry into the Business Data Collection required further development of the assessment model to apportion the administrative data to the geographic location level.

4.1.4 Business reporting frequency

Filing frequency for the administrative data was another aspect of our assessment model. GST in New Zealand is filed monthly, two-monthly, or six-monthly, depending on business size, so in producing quarterly statistics we needed to deal with these different filing frequencies. EMS data is supplied monthly regardless of business size.

4.2 Transformation methodology

For monthly GST filers with a simple structure, we could use what we called a direct unmodified approach, taking their GST data as reported. The sales and purchases measures are obtained by simply adding up the three monthly GST values of these units. This approach was also applied to transform the EMS salaries and wages data for almost all businesses.

Two-monthly and six-monthly GST filers were handled using a direct transformed approach, where we use the GST data from each unit but transform it to a quarterly basis using modelling (with some forecasting for six-monthly filers). The modelling is based on data for monthly filers within the same industries, and we carefully check the monitoring factors each quarter. Unusual monthly filers are detected and excluded from the modelling. Six-monthly filers require the most modelling; however, their contribution to any industry is minimal (less than 5 percent by value, and well under this for most industries).
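For monthly filers the quarterly value is just the sum of three monthly returns. Two-monthly returns are harder because their periods can straddle quarter boundaries. The sketch below assumes one plausible model, not necessarily the one Stats NZ uses: each two-month return is split between its months in proportion to the totals reported by monthly filers in the same industry, and the months falling in the quarter are summed. Function names and month labels are hypothetical.

```python
def split_two_monthly(value, first, second, pattern):
    """Split one two-monthly GST value between its two months in proportion to
    the monthly-filer totals for the same industry (hypothetical model)."""
    a, b = pattern[first], pattern[second]
    share_first = a / (a + b) if (a + b) else 0.5  # equal split if no pattern
    return {first: value * share_first, second: value * (1.0 - share_first)}


def quarter_from_two_monthly(returns, quarter, pattern):
    """Quarterly value for one two-monthly filer.

    returns: list of ((first_month, second_month), value) GST returns.
    quarter: set of month labels in the target quarter.
    pattern: {month label: industry total reported by monthly filers}.
    """
    total = 0.0
    for (first, second), value in returns:
        for month, part in split_two_monthly(value, first, second, pattern).items():
            if month in quarter:
                total += part
    return total
```

For example, a filer whose periods end in odd months has a December–January return that only partly belongs to the March quarter; the industry’s monthly pattern decides how much of it is January.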

The combined sources approach uses the GST and EMS data in conjunction with data from other sources (administrative or directly collected), which can provide reliable benchmarks or useful auxiliary information. For example, this method is used for small and medium-sized GST groups, where employment data is used to derive ratios to apportion GST data to the individual group members.
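The group apportionment described above amounts to pro-rating the group’s GST value by each member’s share of group employment. A minimal sketch, assuming employment counts are available for every member; the equal-split fallback when no member reports employment is an assumption of this sketch, not part of the described method.

```python
def apportion_group(group_value, member_employment):
    """Apportion a GST group's value to its members in proportion to employment.

    member_employment: {member: employment count}. If no member reports any
    employment, fall back to an equal split (an assumption of this sketch).
    """
    if not member_employment:
        return {}
    total = sum(member_employment.values())
    if total == 0:
        n = len(member_employment)
        return {m: group_value / n for m in member_employment}
    return {m: group_value * e / total for m, e in member_employment.items()}
```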

The other sources approach uses other sources to produce statistical output when the administrative data is deemed unsuitable for our use. Our managed collection strategy, described in the next section, is an example.
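The four approaches can be summarised as a simple decision rule. The sketch below is a simplification of the assessment model: the `Unit` attributes are hypothetical and collapse several assessment outcomes (conceptual alignment, reporting structure, filing frequency) into three flags, and the order of the checks is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical flags summarising the assessment-model outcome for one unit.
    in_managed_collection: bool  # selected for direct collection (Section 4.3)
    filing_frequency: str        # "monthly", "two-monthly" or "six-monthly"
    in_gst_group: bool           # GST is filed as part of a group return


def choose_approach(unit: Unit) -> str:
    """Map a unit to one of the four transformation approaches (simplified)."""
    if unit.in_managed_collection:
        return "other sources"       # administrative data unsuitable; collect directly
    if unit.in_gst_group:
        return "combined sources"    # apportion group GST using auxiliary data
    if unit.filing_frequency == "monthly":
        return "direct unmodified"   # sum the three monthly returns
    return "direct transformed"      # model two- or six-monthly returns to a quarter
```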

4.3 Managed collection

The managed collection strategy is a key component of the ‘administrative data first’ approach within the quarterly financial collection. It identifies businesses for which administrative data is unsuitable for statistical use. We include these businesses in our quarterly managed collection and directly collect data for the variables we need. We set up a standard procedure for selecting this managed collection that could be applied across the whole economy.

The collection was established using three guiding principles: significance, dominance, and complexity. We wanted to include all units of significant size, those that had a level of dominance within their industry, and those with structural complexity (particularly those active in multiple industries). In most cases, these units coincided with those that were problematic because of their GST reporting structure. We developed trial business rules and tested several different rules using simulations. The business rules that proved to be the most suitable were as follows:

A $100 million significance rule – if an enterprise, or a group of enterprises linked by ownership, has an annual GST sales turnover of more than $100 million.

A 3 percent industry dominance rule – if an enterprise makes more than a 3 percent contribution to annual total income for an industry.

A structure complexity rule – all enterprises that have a significant level of activity across more than one industry.
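The three selection rules can be expressed directly in code. In this sketch, the significance and dominance thresholds come from the rules above, but the complexity test is an assumption: the paper does not define ‘significant level of activity’, so we treat an enterprise as complex when at least two industries each account for 20 percent or more of its income. The `coverage_share` helper illustrates the kind of check used to confirm that selected units cover a large share of a variable such as profit; using absolute values, because profits can be negative, is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Enterprise:
    name: str
    group_gst_sales: float  # annual GST sales of the enterprise or its ownership group ($)
    income_by_industry: dict = field(default_factory=dict)  # {industry: annual income ($)}


def select_managed_collection(enterprises, industry_totals, significance=100e6,
                              dominance=0.03, complexity_share=0.20):
    """Names of enterprises meeting any of the three rules.

    industry_totals: {industry: annual total income ($)}.
    complexity_share is a hypothetical reading of 'significant activity'.
    """
    selected = set()
    for e in enterprises:
        own_total = sum(e.income_by_industry.values())
        significant = e.group_gst_sales > significance
        dominant = any(income > dominance * industry_totals[ind]
                       for ind, income in e.income_by_industry.items())
        active = [ind for ind, income in e.income_by_industry.items()
                  if own_total > 0 and income / own_total >= complexity_share]
        complex_structure = len(active) > 1
        if significant or dominant or complex_structure:
            selected.add(e.name)
    return selected


def coverage_share(values, selected):
    """Share of an industry total (e.g. profit) contributed by selected units.
    Absolute values are used because profits can be negative (an assumption)."""
    total = sum(abs(v) for v in values.values())
    covered = sum(abs(v) for name, v in values.items() if name in selected)
    return covered / total if total else 0.0
```

The `significance` parameter can be lowered for industries that need deeper direct collection, such as the retail threshold described later in this section.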

When we initially moved to the ‘administrative data first’ approach in September 2015 with the migration of Manufacturing, Wholesale Trade and selected services, we did not just select a managed collection for those industries but ran a selection for the whole economy. This meant we could test the methodology on other industries and allowed for the expansion that was to come.

Expansion of the collection in June 2016 required further testing of the managed collection. The existing rules were largely based on turnover measures; however, with business profit being a key variable of interest, we needed to ensure that the surveyed units would provide a significant proportion of this variable. We assessed all industries and found that, for most, our existing turnover basis was suitable for covering the businesses that would make the greatest contributions to the profit data. A few industries had erratic and volatile profits, and for these we added some businesses that would ordinarily be below the turnover threshold; mining was one industry where we added surveyed units to the managed collection.

The addition of the retail trade industries in September 2017 required a modification to the managed collection strategy. To give us locational data, administrative data would need to be modelled for any units with multiple locations. We investigated increasing the size of the managed collection to include more of these multi-location businesses so that we could minimise the modelling required. In practice this meant testing a range of lower thresholds for the managed collection. A retail threshold of $50 million GST sales per year (as opposed to $100 million for the balance of the BDC) was eventually chosen. This lowered threshold also had the added advantage of providing a higher level of directly collected retail inventories.

4.4 Measuring variables not in administrative data

A challenge that Stats NZ had in moving to an ‘administrative data first’ approach was the requirement for variables that are not included in the administrative data available. The main challenge we had initially was with inventories. Quarterly inventory changes are required by National Accounts in producing GDP, but the GST data does not include values for inventories.

We did extensive analysis and established several methods for producing the inventory estimates. These include:

Benchmark to annual approach – estimates are obtained by ‘rating up’ the aggregate managed collection inventory series using annual financial data. This method is used where the managed collection captures quarterly changes and makes up a significant portion of overall inventory levels.

Model from annuals approach – estimates are obtained by models using the relationship between GST and inventories in our annual financial collection. This method is suitable for some smaller industries where the inventory levels remain relatively consistent over time.

Sample survey approach – this is used where neither of the preferred methods above is suitable. The sample collects only inventories data. We use this in a few larger industries where the managed collection inventory contribution is low.
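The benchmark to annual approach can be sketched as scaling the managed collection series so that it matches the annual industry total at the annual balance date, then differencing the scaled levels to get quarterly inventory changes. This assumes a single constant rating factor and quarter labels that sort chronologically (such as "2017Q3"); in practice the factor and benchmark timing may be handled differently.

```python
def benchmark_to_annual(mc_levels, benchmark_quarter, annual_level):
    """Rate up managed-collection inventory levels to an industry estimate.

    mc_levels: {quarter: aggregate managed-collection inventory level}.
    benchmark_quarter: quarter matching the annual data's balance date.
    annual_level: industry inventory level from the annual financial data.
    A single constant rating factor is assumed.
    """
    factor = annual_level / mc_levels[benchmark_quarter]
    return {q: level * factor for q, level in mc_levels.items()}


def quarterly_changes(levels):
    """Quarter-on-quarter change in inventory levels; labels must sort chronologically."""
    quarters = sorted(levels)
    return {q: levels[q] - levels[p] for p, q in zip(quarters, quarters[1:])}
```

Because the quarterly movements come entirely from the managed collection, this only makes sense where, as stated above, those units capture the quarterly changes well.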

Profit was another variable not directly captured by administrative data, and we had to consider how to model it as part of the June 2016 expansion of the collection. We therefore needed to capture sales, purchases, and salaries and wages in that expansion. This lets us analyse and model the relationship of those three variables to operating profit, which we collect within our managed collection. We also have comprehensive annual data, which we can use to assist us in this work.
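The profit model itself is not specified in this paper. One simple possibility, shown here purely as an illustration, is a ratio estimator: fit profit ≈ k × (sales − purchases − salaries and wages) on managed collection units within an industry, then apply k to the administrative data of the remaining units. The functions and the single-ratio form are assumptions, not Stats NZ’s method.

```python
def fit_profit_ratio(managed_units):
    """Estimate k in profit ≈ k * (sales - purchases - wages) from managed units.

    managed_units: list of (sales, purchases, wages, operating_profit) tuples
    for one industry. Ratio of totals, so larger units carry more weight.
    """
    surplus = sum(s - p - w for s, p, w, _ in managed_units)
    profit = sum(pr for _, _, _, pr in managed_units)
    if surplus == 0:
        raise ValueError("ratio undefined: total surplus is zero")
    return profit / surplus


def model_profit(sales, purchases, wages, k):
    """Modelled operating profit for a unit covered only by administrative data."""
    return k * (sales - purchases - wages)
```

A ratio of totals is robust when individual units have small or negative surpluses, which is one reason it is a common starting point before moving to regression models that use the annual data.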

5. ‘Administrative data first’ – challenges and changes required

Moving to the ‘administrative data first’ approach has been a significant change for Stats NZ. It has transformed the way that we collect and process quarterly financial data.

5.1 Paradigm shift

The new approach has meant a paradigm shift in how we produce statistics. That change in thinking has involved convincing staff and data-users (both internal and external) that this change is necessary to fully capture the benefits of an ‘administrative data first’ design.

Part of the difficulty with this change is that surveying and sampling techniques are so well known and understood. They tend to be ingrained as the way to think about collecting data from, and producing statistics about, large populations. The cost and burden of collecting directly from the entire population (a census) is almost always prohibitive, so random sampling, or stratified random sampling in the case of businesses, has tended to be the option of first resort. Even where administrative data exists, it is sometimes assumed to be of insufficient quality or seen as too problematic to use fully. We therefore opted for a comprehensive data-driven assessment to test the ‘administrative data first’ approach; we wanted to allay fears and dispel some of the myths surrounding administrative data use.

In addition, data collected directly from businesses/respondents is often seen as the gold standard, with administrative data used only if directly collected data cannot be obtained. Although administrative data is not reported for statistical purposes, our experience is that the data we use is of high quality. This is perhaps unsurprising given the tax obligations involved and the financial penalties that can apply if mistakes are made.

Another change in thinking concerned the ability to query data. In the case of tax data, we cannot query the businesses, as the data was not provided directly to us, nor can we ask Inland Revenue about unit-specific data. This is very different from directly collected data, where respondents can be contacted about the values they have provided. It requires a change in mindset: learning to trust the administrative data and the processes and checks applied to it.

5.2 Shift from methodological challenges to managing the supply of data

At least for the collection and processing of our quarterly financial data, most of the major methodological challenges have been overcome. Some fine-tuning is still required, but by and large the ‘administrative data first’ approach has been bedded in and is part of business-as-usual. This does mean, however, that the administrative data is now crucial to the Business Data Collection (BDC) and to the various outputs within Stats NZ that rely on it, including the production of quarterly GDP.

Ensuring stability and continuity of supply is now vitally important given the contribution of administrative data to the collection. In a few short years, the tax data provided by Inland Revenue has gone from a supplemental and limited source of quarterly data across a relatively small number of industries to a data source that is now underpinning a collection that spans almost the whole economy. In several industries, the contribution of administrative data is close to 100 percent, while in all but a few industries the contribution is significant.

Maintaining a good working relationship with Inland Revenue is now more vital than ever. Alongside its primary role of collecting tax revenue from businesses and individuals, Inland Revenue now, in a secondary sense, provides a rich data source used to measure the performance and progress of the economy in which those businesses and individuals operate. This requires cooperation and close collaboration between Stats NZ and Inland Revenue, and an increasing recognition of how linked and interdependent we are.

Inland Revenue is currently undertaking a major transformation programme. This has meant that we needed to work closely with them to understand the changes they are making and to ensure that the data flowing through to our systems was consistent with what we expected. The transformation presented us with an opportunity to receive more timely data – for example we now receive a weekly supply of GST data, when it had previously been fortnightly.

This increased collaboration is not limited to Inland Revenue. There are many administrative datasets, and as Stats NZ looks to make more use of them, relationships with other government agencies and other organisations will become increasingly important.

5.3 Establishment of quality framework for ongoing statistical maintenance

Following the introduction of an ‘administrative data first’ design, it is important to create a quality framework to ensure the quarterly financial series continue to reflect economic reality. For example, changes in legislation could alter the conceptual definitions in the administrative data, and a natural disaster could affect the supply and timeliness of data from part of the business population. Ongoing assessment of units is also required to ensure they continue to report in the administrative system as expected.

Our quality framework uses data from our annual financial collection to confirm that the administrative data continues to be fit for statistical production. We compare annualised GST sales and purchases, and EMS salaries and wages, with the annual data at the unit-record level to determine whether previously held assumptions still hold. Simple correlation analysis by industry is predominantly used. This process also identifies influential units to be forced into our managed collection because of large conceptual differences between the administrative data and the annual survey data, which is controlled to meet our definitional requirements. Additional large inventory holders outside the managed collection are also identified annually.
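A minimal sketch of this annual check is shown below in Python. It computes a Pearson correlation by industry between annualised GST sales and annual survey sales, and flags large units whose two values differ markedly. The record layout, the 20 percent relative-difference threshold, the minimum size, and the minimum number of units per industry are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlations_by_industry(records, min_units=3):
    """Group unit records by industry and correlate GST with annual data."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        gst, annual = groups[r["industry"]]
        gst.append(r["gst_annualised"])
        annual.append(r["annual_survey"])
    return {ind: pearson(g, a) for ind, (g, a) in groups.items()
            if len(g) >= min_units}


def flag_influential(records, rel_threshold=0.2, min_size=100.0):
    """Large units whose annualised GST and annual survey values diverge."""
    flagged = []
    for r in records:
        annual = r["annual_survey"]
        gap = abs(r["gst_annualised"] - annual)
        if annual >= min_size and gap / annual > rel_threshold:
            flagged.append(r["unit"])
    return flagged


# Illustrative records only.
records = [
    {"unit": "a1", "industry": "A", "gst_annualised": 100.0, "annual_survey": 102.0},
    {"unit": "a2", "industry": "A", "gst_annualised": 200.0, "annual_survey": 198.0},
    {"unit": "a3", "industry": "A", "gst_annualised": 300.0, "annual_survey": 305.0},
    {"unit": "b1", "industry": "B", "gst_annualised": 50.0, "annual_survey": 52.0},
    {"unit": "b2", "industry": "B", "gst_annualised": 500.0, "annual_survey": 800.0},
    {"unit": "b3", "industry": "B", "gst_annualised": 60.0, "annual_survey": 61.0},
]
print(correlations_by_industry(records))
print(flag_influential(records))  # ['b2']
```

Note that industry B can still show a high correlation even though unit b2 diverges, which is one reason a unit-level flag is useful alongside the industry summary.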

The parameters and bias in the statistical models (for example transforming GST two-monthly and six-monthly values to a quarterly basis) are monitored quarterly to ensure the modelled data is suitable for statistical use. The quarterly changes in the modelling parameters are compared with previous values, with large changes investigated in more detail to ensure they meet our economic expectations.
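The quarterly monitoring step might look like the following Python sketch, which compares each model parameter with the average of its previous estimates and reports the mean relative bias of modelled against reported values. The parameter names, the use of a historical average as the comparison point, and the 10 percent tolerance are hypothetical assumptions, not thresholds taken from the paper.

```python
def parameter_shifts(history, current, rel_tol=0.10):
    """Return parameters whose current estimate moved more than rel_tol
    (relative) from the mean of their previous estimates."""
    flagged = {}
    for name, value in current.items():
        previous = history.get(name)
        if not previous:
            continue  # no history yet, so nothing to compare against
        baseline = sum(previous) / len(previous)
        if baseline == 0:
            if value != 0:
                flagged[name] = float("inf")  # any move away from zero
            continue
        change = (value - baseline) / abs(baseline)
        if abs(change) > rel_tol:
            flagged[name] = change
    return flagged


def relative_bias(modelled, reported):
    """Mean relative difference of modelled values from reported values."""
    diffs = [(m - r) / r for m, r in zip(modelled, reported)]
    return sum(diffs) / len(diffs)


# Hypothetical parameter names and values.
history = {"two_monthly_coef": [0.50, 0.51, 0.49],
           "six_monthly_coef": [0.24, 0.25, 0.26]}
current = {"two_monthly_coef": 0.505, "six_monthly_coef": 0.31}
print(parameter_shifts(history, current))  # flags six_monthly_coef, about +24%
print(relative_bias([110.0, 90.0, 102.0], [100.0, 100.0, 100.0]))
```

Flagged parameters are candidates for the more detailed investigation described above, not automatic rejections.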

6. Conclusion

This paper has presented a broad overview of Stats NZ’s successful transformation to an ‘administrative data first’ approach for quarterly financial statistics. It has also described some of the methods and techniques used, and outlined the benefits, such as increased flexibility, improved quality, and reduced respondent burden, that we have realised in adopting this approach. These benefits will continue to accrue, particularly as we gain more data quarter by quarter in the expanded collection. Some of the challenges faced with this significant paradigm shift, particularly the changes in thinking needed, were discussed, as well as the ongoing commitment required to maintain ‘administrative data first’.

The specific data transformation methods we use may not apply directly to other administrative datasets. However, we consider that the overall methodology and approach would be of great value to other agencies seeking to make more extensive use of administrative data. Adopting a data-driven assessment approach has been essential, as has testing the methods on existing outputs before extending them. The ‘managed collection’ is a key component of our approach, as is creating a quality framework for the ongoing maintenance of the collection. A vital ingredient in the success of our transformation was taking a wider view from the beginning and maintaining that view as we worked through the specific changes needed.


Acknowledgments

Mathew Page for writing significant portions of the text and reviewing the paper. John Stewart for the structure of the paper, and reviewing and providing valuable feedback. Chen Chen for methods assistance and previous work on which parts of the paper are based. Sue Chapman for managing the writing process and reviewing the paper. Allyson Seyb, Kate Jackett, and Madeline Cooper for reviewing and editing the paper.

References

McKenzie. Statistical architecture. Statistics New Zealand, Christchurch (internal unpublished paper). 2008.

Stewart, Costa, Page, and Chen. Maximising the use of administrative data in sub-annual business collections. International Conference on Establishment Surveys, Montreal. 2012.

Chen,