Abstract
Big Data changed the paradigm of how the private sector serves its clients. The adroit use of Program Administrative Data (PAD) collected as part of the normal delivery of government services, when linked to records in other datasets, can change the cost and empirical paradigm of how the government learns what works and what does not, and can serve as the foundation for evidence-based policymaking. The linking of individual records is best achieved when records are associated with a Unique Identifier. The use of these linked datasets requires a robust data-sharing and data-use infrastructure of laws and technologies that includes regulations and procedures on (1) Privacy, that outline which data are collected, (2) Confidentiality, that outline the allowable users and uses of these data, and (3) Security, that outline the excluded users and uses of these data.
PAD is used for Performance Measurements to assess who was served, when they were served, how intensively they were served, and at what cost. These measurements, however, do not capture the change in recipient behavior, or how that change in behavior impacted a community or a region. Tracing and evaluating a program’s impacts on recipients or communities requires linking PAD to micro data found across government and possibly private entities. There are several challenges with this learning process. The most common among them is learning from insignificant, null or negative findings. Similar challenges arise when evaluating new programs that may not have been in existence long enough to gestate impacts, or when evaluating the impacts of small programs that may not have generated sufficient service-provision observations.
The Value Proposition for Using Linked Micro-Level Datasets for Evaluating SME Policies and Programmes
During the 2000–2016 period, Big Data changed the paradigm of how the private sector delivers its goods and services to its clients.1
Big Data comprise a large number of micro-level data records in a dataset that is often linked to other large micro-level datasets and analyzed with complex algorithms. The resultant data can triangulate otherwise obscure information.
The cost paradigm shift is largely achieved by leveraging already existing datasets, thus reducing or eliminating the need to gather additional data. This approach often entails significant fixed or first-time costs, and more often than not the cost paradigm shift, and indeed the empirical one, occurs only with the use of large datasets. It should be noted that since PAD are generated prior to programme evaluations, there is often a need to supplement PAD with additional data.2
The need to collect additional data can be minimized by implementing a data gathering effort as early as possible. These efforts are best implemented during the programme design stage, and certainly by the time a programme is launched. Gathering data after service delivery tends to be costly, and is often not possible because of legal restrictions or the difficulty of gathering data from participants who are no longer part of the programme. The data gathering effort should be preceded by the formulation of a set of hypotheses on how the programme is expected to achieve its intended change or impacts. This process, often referred to as building a programme Theory of Change, in effect outlines the causal linkages between programme inputs, activities, outcomes and final impacts. For further details, see Gramigna, Giuseppe, et al., ‘Building Smarter Data for Evaluating Business Assistance Programs: A Guide for Practitioners and Evaluators’, U.S. Department of Commerce and U.S. Small Business Administration, forthcoming.
The Prerequisites for Linking and Using Government Micro-level Datasets
The linking of individual data records is best achieved when each record is associated with a Unique Identifier.3
A Unique Identifier is an alphanumeric code that identifies the source of each data record in a dataset. In most economies, the Unique Personal Identifier is a person’s Tax Identification Number (TIN), and in the case of a firm, it often is the firm’s TIN.
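The linking step described above can be sketched in a few lines. This is a minimal illustration only, assuming two hypothetical datasets that share a `tin` field; the record layouts, field names and the `link_records` helper are invented for the example and do not describe any agency's actual systems.

```python
# Hypothetical programme administrative data (PAD): one row per service delivered.
pad_records = [
    {"tin": "A001", "service": "counselling", "hours": 3},
    {"tin": "A002", "service": "training", "hours": 8},
    {"tin": "A003", "service": "counselling", "hours": 2},
]

# Hypothetical secondary dataset held elsewhere, keyed by the same identifier.
employment_records = {
    "A001": {"employees": 12},
    "A002": {"employees": 4},
}


def link_records(pad, secondary, key="tin"):
    """Match each PAD row to the secondary dataset on a shared unique identifier.

    Returns the merged rows and the identifiers that found no match.
    """
    linked, unmatched = [], []
    for row in pad:
        match = secondary.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            # Combine the fields of both records into one linked record.
            linked.append({**row, **match})
    return linked, unmatched


linked, unmatched = link_records(pad_records, employment_records)
print(linked)
print(unmatched)
```

Reporting the unmatched identifiers alongside the linked rows is a deliberate choice in this sketch: the share of records that fail to link is itself a measure of the quality of the linked dataset.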
The intended uses of these linked datasets require a robust data-sharing and data-use infrastructure of laws and technologies that includes regulations and procedures on (i) privacy that outline which data are collected, (ii) confidentiality that outline the allowable users and uses of these data and (iii) security that outline the excluded users and uses of these data. Any effort that gets any of these three critical elements wrong will fail under its own weight, measured by the quality and quantity of the data collected, stored and linked.
The Analytical Toolkit for Evaluating Government SME Programmes
These large micro-level datasets comprise only the raw material for evaluating government entrepreneurial and SME assistance policies and programmes. There is a myriad of analytical tools available for using micro-level data to assess the efficiency and effectiveness of government programmes. The most commonly used tool is a set of performance measurements that provide data, analytics and findings on a programme’s objectives, inputs and outputs in order to assess how closely the delivered services align with the programme’s stated objectives. A typical set of performance measurements includes who was served, when they were served, how intensively they were served, how quickly they were served and at what cost. These data are all internally generated by the programme and comprise PAD.
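As an illustration of the performance measurements listed above, the sketch below derives a few of them from a small service log. The log layout, the figures and the `performance_measures` helper are hypothetical, chosen only to show how such measures follow directly from PAD.

```python
from collections import defaultdict
from datetime import date

# Hypothetical PAD rows: (client_id, service_date, hours, cost).
service_log = [
    ("A001", date(2016, 1, 12), 3.0, 150.0),
    ("A001", date(2016, 2, 3), 2.0, 100.0),
    ("A002", date(2016, 1, 20), 8.0, 400.0),
    ("A003", date(2016, 3, 5), 1.0, 50.0),
]


def performance_measures(log):
    """Summarize who was served, when, how intensively and at what cost."""
    hours_by_client = defaultdict(float)
    total_cost = 0.0
    for client_id, _service_date, hours, cost in log:
        hours_by_client[client_id] += hours
        total_cost += cost
    clients = len(hours_by_client)
    total_hours = sum(hours_by_client.values())
    dates = [service_date for _, service_date, _, _ in log]
    return {
        "clients_served": clients,               # who was served
        "first_service": min(dates),             # when they were served
        "last_service": max(dates),
        "hours_per_client": total_hours / clients,  # how intensively
        "cost_per_client": total_cost / clients,    # at what cost
        "cost_per_hour": total_cost / total_hours,
    }


measures = performance_measures(service_log)
for name, value in measures.items():
    print(name, value)
```

Every figure here comes from the programme's own records, which is why these measures describe service delivery but say nothing about what recipients did afterwards.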
These efficiency-related data and analytics, however, do not provide any information on the change in recipient behaviour, nor do they provide any information on how the change in recipient behaviour in turn impacted the recipient or a community or a region. Tracing recipients’ behavioural changes, especially behaviours that are internal to the recipient, such as the acquisition of knowledge or the development of a business plan, may only be possible via surveys. Tracing and evaluating a programme’s impacts traditionally requires linking PAD to micro-level data found across government and possibly private entities. Common impact variables include employment variables such as net new jobs created or supported and change in payroll; value generation variables such as receipts and profits; market expansion variables such as new domestic or international markets or new establishments; innovation variables such as new patents, new brands or new trademarks; and business dynamic variables such as businesses purchased, businesses sold, businesses closed or business bankruptcies.
Some evaluations simply compare recipients’ pre-treatment behaviours or impacts to their post-treatment behaviours or impacts. The linking of PAD to secondary data across the government and possibly the private sector also allows for the augmentation of a simple pre-post treatment analysis to include experimental sample designs where a control or comparison sample is introduced in the analysis. This addition allows for a more accurate estimation of the impacts that are attributable to the programme.
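The contrast between a simple pre-post analysis and one that adds a comparison sample can be made concrete with a small worked example. The sketch below uses a difference-in-differences calculation, which is one common way of using a comparison group and is offered here only as an illustration, not as the method prescribed by this paper. The employment figures are invented.

```python
from statistics import mean

# Hypothetical employment counts per firm, before and after the programme period.
recipients = {"pre": [10, 12, 8, 15], "post": [13, 15, 10, 18]}
comparison = {"pre": [11, 9, 14, 10], "post": [12, 10, 15, 11]}


def pre_post_change(group):
    """Simple pre-post estimate: change in the group's average outcome."""
    return mean(group["post"]) - mean(group["pre"])


def difference_in_differences(treated, control):
    """Change among recipients net of the change in the comparison group."""
    return pre_post_change(treated) - pre_post_change(control)


naive_estimate = pre_post_change(recipients)
did_estimate = difference_in_differences(recipients, comparison)
print("pre-post change among recipients:", naive_estimate)
print("change net of comparison group:", did_estimate)
```

In this invented data the comparison firms also grew over the period, so the pre-post figure alone overstates what can be attributed to the programme. The net estimate is only as credible as the assumption that the two groups would have followed similar trends without the programme.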
Cultural and Institutional Factors for Evaluating Government Programmes
The usefulness of impact evaluations is greatly influenced by numerous cultural and institutional factors. Primary among these is the agency’s ability to build strong working relationships between a programme’s implementation staff and the analytical staff from performance offices and impact evaluation offices. It is through these relationships that accurate performance and impact data are created, meaningful research questions are generated and actionable conclusions are conveyed.
In addition, it is through these strong working relationships that actionable conclusions are not simply reduced to up or down votes on a programme, but are rather leveraged as part of a continuous learning process. Up or down votes on a programme may not allow for the possibility of improving the programme. At times, learning about a programme may lead to difficult or unpopular change. These working relationships, with time, allow for the development of a common language and common understanding (common institutional knowledge) between programme implementation staff and analytical staff. While this common understanding does not necessarily lead to common agreement, it does provide the common conceptual framework for dialogue on how to resolve difficult and unpopular change.
There are several challenges with this learning process. The most common among them is learning from insignificant, null or negative findings. Evaluation findings that have a low statistical significance may not provide sufficient statistical ‘confidence’ that the findings accurately describe programme impacts. At times, low statistical significance may not allow for the generalization of specific findings to other programmes, services or alternative conditions. In general, the informational value of findings with low statistical power is very difficult to convey or to convert into actionable conclusions. For example, insignificant findings are often erroneously interpreted to mean that the programme has no impact, when in fact the programme may have been too small relative to its intended objectives for an impact to be detected. This problem is often referred to as the challenge of ‘finding a needle in a haystack’. Similar difficulties arise with evaluations that find no programme impacts at all. Converting these hard-to-interpret evaluations into meaningful learning and actionable conclusions is best achieved via strong institutional knowledge frameworks that value data, analytics and critical findings as an integral part of programme design, delivery and evaluation.
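The ‘needle in a haystack’ problem can be illustrated with a rough power calculation. The sketch below uses the textbook normal approximation for a two-sided comparison of two group means with equal group sizes and a known standard deviation; the effect size, standard deviation and sample sizes are invented, and the `approx_power` helper is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes.

    effect: assumed true difference in mean outcomes between the groups
    sd: assumed common standard deviation of the outcome
    """
    z = NormalDist()
    standard_error = sd * sqrt(2.0 / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / standard_error
    # Probability that the test statistic falls in either rejection region
    # when the true difference equals `effect`.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Assumed true impact of one additional job per firm, standard deviation of five.
small_programme = approx_power(effect=1.0, sd=5.0, n_per_group=30)
large_programme = approx_power(effect=1.0, sd=5.0, n_per_group=1000)
print("power with 30 firms per group:", round(small_programme, 3))
print("power with 1000 firms per group:", round(large_programme, 3))
```

Under these assumptions the same true impact would usually go undetected in the small programme and would almost always be detected in the large one, which is why an insignificant result from a small sample is weak evidence that a programme has no impact.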
Similar challenges arise when using the linked micro-level data process to evaluate small or young programmes. The linked micro-level data process requires a large number of observations. New programmes, however, may not have been in existence long enough to gestate impacts, and small programmes may not have generated sufficient service-provision observations. The findings of these types of studies need to be carefully interpreted and used: they may provide more evidence than focused case studies, or even small surveys, but they often provide only preliminary and very circumscribed evidence of what worked in specific conditions, and even more preliminary and circumscribed evidence of what might work in a fully developed programme or in different environments.4
It should be noted that case studies and surveys are often the only way to gather preference and satisfaction data from prospective or actual recipients.
Other Applications of Micro-level Programme Data
Potential additional topics to develop: developing policy and programme objectives: ex-ante market gaps, national, regional and international benchmarking, market/population penetration ratios, end-use market analysis, unmet needs of target population, learning about potential recipient population.
Also note that the appendix table is for illustration purposes only as there are some details that would need to be changed.
Notes
1. The statements, findings, conclusions and recommendations in this study are those of the author, and do not necessarily reflect his position as the Chief Economist at the United States Small Business Administration. © 2017 Giuseppe Gramigna.
Data Items of Interest for a Typical Entrepreneurial Development Programme
Some
