Abstract
Record systems used to administer programs often contain information useful for evaluating the effectiveness of a program. Administrative records are most often designed to facilitate processes key to the mission of the program. Data structures, quality assurance, quality control, and updating processes are generally defined by the needs of the program. Statistical uses of administrative data, common to evaluation studies, face a predictable set of benefits and challenges. This article reviews these issues.
Evidence-based policymaking or administration generally requires formal evaluation of the design, execution, and outcomes of a given organization’s activities. Over time, leaders of organizations assess the good and bad features of their work, and customers are asked whether the organization’s product or service was useful to them. The “evidence” for evaluations can come from many sources. In the commercial sector, the measurement of production processes, service provision, and consumer proclivities has inspired continuous improvement feedback loops between the designers of products and services and consumers in an attempt to optimize outcomes.
The development of formal evaluation methods (Campbell and Stanley 1963), the growth of professional evaluation associations (e.g., American Educational Research Association, founded in 1916), and, more recently, the fascination of economics with randomized controlled trials all originated in attempts to boost crop production in agriculture (Fisher 1935). These approaches use empirical data to assess the value of a program or service. The empirical data used in such evaluations can come from self-reports in a staff or customer survey or from recordkeeping by the program providers. The empirical data are used to determine whether those who experienced the service have better or worse outcomes than similar people who did not. Often, evaluations also ask whether any of the program’s beneficial effects are worth its costs.
One example of an evaluation built on administrative records is Chetty’s use of de-identified federal income tax data (Chetty et al. 2014), which showed how residential differences in income inequality and racial segregation are linked to lifetime social and economic outcomes of individuals. Another is Hawaii’s evaluation of its Opportunity Probation with Enforcement program, which uses administrative criminal records data to see how likely someone is to commit a new crime (Hawken and Kleiman 2009). Such evidence-based policy studies often use multiple data sources. The Next-Generation Data Platform for the U.S. Department of Agriculture Food Assistance Program Research project, for example, uses both administrative records and survey data (Bohman 2016).
Administrative records can also alert researchers to problems that might otherwise go unnoticed. Case and Deaton (2015) used individual death records to show a rise in mortality rates among non-Hispanic whites between the ages of 45 and 54. By using administrative records, they alerted the country to a problem. Currie and Schwandt (2016) used Census Bureau administrative records to examine the problem further and discovered that poverty levels were critically important to differences in mortality among the middle-aged and elderly. This article discusses the benefits and challenges of using administrative data for evidence-based policymaking.
Administrative records are data collected for the purpose of carrying out various non-statistical programs. As such, the records are collected with a specific decision-making purpose in mind, and so the identity of the unit corresponding to a given record is crucial. (Statistics Canada 2009)
In the United States, common examples of federal administrative data relevant to evidence-based policymaking include income tax records, Social Security data, and Medicare records. Others come from state-based and local-government systems: unemployment insurance, the Supplemental Nutrition Assistance Program (SNAP), Medicaid records, police reports, birth and death certificates, transportation flows, public health service records, educational record systems, and housing data. And many administrative record systems exist outside of government: retail point-of-sale system data, credit card transaction data, real estate data, GPS travel data, payroll and human resources records, data from the Internet of Things, utility records, bank records, keystroke and click data, and electronic profile data used by retailers. All these are used to document and assist processes that fulfill the key mission of the organizations creating them.
Common Features of Evaluation Studies for Evidence-Based Policymaking
Potential roles of administrative data in evidence building vary by what type of evaluative approach is used. A common question is whether a program is achieving its aims. Here, “aims” must be specified. For an evaluation study to be feasible, the aims need to be measurable in some way. Do the participants in the program or customers of a product or service achieve a state after the experience that is different from that before the experience? Are those differences the ones desired by the designers of the product or program? Do the differences themselves vary across subgroups of individuals?
The logic of randomized controlled trials is that inferences about the effects of a program can be gleaned by comparing two identical groups both before and after the program. The groups are made (statistically) identical by randomly assigning individuals either to experience the product or service or not. Those who experience the program are compared to those who do not, with the statistical assurance that, except for experiencing the program, the two groups should have identical outcomes. Here, administrative data are often available for program participants, but other data must be collected for nonparticipants.
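The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the people are simulated, the outcome scale is arbitrary, and the built-in program effect of 2.0 is a made-up value chosen so the estimate can be checked against it.

```python
import random
from statistics import mean

def simulate_trial(n=10_000, true_effect=2.0, seed=42):
    """Randomly assign n hypothetical people and return the difference in mean outcomes."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Outcome this person would have without the program (assumed distribution).
        baseline = rng.gauss(50.0, 10.0)
        # Coin-flip assignment, unrelated to the person's baseline.
        if rng.random() < 0.5:
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return mean(treated) - mean(control)

estimate = simulate_trial()
print(f"estimated effect: {estimate:.2f}")
```

Because assignment ignores every attribute of the person, the difference in means should land near the assumed effect, apart from sampling noise.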
Many scholars have identified challenges in using randomized designs to produce correct causal inferences (Rosenbaum 2002; Rubin 2006). One is the lack of the same preexperience and postexperience data on the two groups. A second is the tendency of some individuals to drop out of one of the groups in a way that makes the groups less comparable on attributes related to the outcome (e.g., clients who perceive little value in a program may tend to quit and are then lost to administrative data). A third is events external to both groups that affect their behavior and that administrative data do not capture (e.g., the closing of a long-standing food bank may affect the performance of a free lunch program in schools). A fourth is the groups’ awareness that they are being examined, which can affect behavior relevant to the outcome (the so-called “Hawthorne effect” [Landsberger 1958]).
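The dropout problem can be made concrete with a small worked example. The six clients, their gains, and the rule that clients with no gain quit are all hypothetical; the sketch shows only how records limited to those who stay can overstate a program's effect.

```python
from statistics import mean

# Hypothetical gains that six randomly assigned clients would get from the program.
# Clients who gain nothing are assumed to quit and vanish from the program's records.
gains = [0.0, 0.0, 1.0, 1.0, 4.0, 6.0]
baseline = 10.0  # assumed outcome without the program, the same for everyone

assigned_outcomes = [baseline + g for g in gains]
completer_outcomes = [baseline + g for g in gains if g > 0]  # what the records retain

true_effect = mean(assigned_outcomes) - baseline       # effect among everyone assigned
apparent_effect = mean(completer_outcomes) - baseline  # effect visible in the records

print(true_effect, apparent_effect)  # prints 2.0 3.0
```

In this made-up case the records suggest an effect half again as large as the one experienced by the full assigned group.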
Quasi-experiments (based on comparisons of groups without a formal randomized assignment) can be used when the researcher cannot randomly assign a subject to a control or treatment group. For example, many studies of teaching innovation compare classrooms exposed to a new method with others not exposed, without randomly assigning students to classrooms. In quasi-experiments, the researcher must make all threats to internal validity clear. Do we possess strong measures of learning under both the new and old methods? Can we interpret differences as solely due to the treatment (e.g., does performance differ between classrooms solely because of the teaching innovation)? The researcher’s final causal inference will not be as strong as if a randomized experiment had been implemented. Researchers use quasi-experiments instead of randomized experimental designs because of cost, time constraints, inability to withhold treatment from a control group, and the fact that randomized experiments cannot be designed to answer questions about certain kinds of possible causal variables (Cook and Campbell 1979). Administrative data sometimes contain attributes of the subject population that are helpful in statistical analyses that attempt to repair the weaknesses of groups compared without randomized assignment to the program being evaluated.
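One simple form such a repair can take is stratification on an attribute recorded in the administrative file. The sketch below is an illustration, not a method prescribed by the studies cited: the student records are invented, and prior achievement stands in for whatever attribute the records happen to contain.

```python
from statistics import mean

# Hypothetical student records: (prior_achievement, got_new_method, test_score).
students = [
    ("high", True, 80), ("high", True, 82), ("high", True, 84), ("high", False, 78),
    ("low", True, 64), ("low", False, 60), ("low", False, 59), ("low", False, 61),
]

def mean_score(rows, treated):
    """Average test score among rows with the given treatment status."""
    return mean([score for _, t, score in rows if t == treated])

# Naive comparison ignores that the new method was mostly used with stronger students.
naive_diff = mean_score(students, True) - mean_score(students, False)

# Stratified comparison: difference within each prior-achievement group,
# averaged with weights equal to each group's share of all students.
adjusted_diff = 0.0
for level in ("high", "low"):
    stratum = [row for row in students if row[0] == level]
    within = mean_score(stratum, True) - mean_score(stratum, False)
    adjusted_diff += within * len(stratum) / len(students)

print(naive_diff, adjusted_diff)  # prints 13.0 4.0
```

The adjustment helps only for the attributes that are measured; unrecorded differences between the groups remain a threat to internal validity.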
Common Features of Administrative Data and Their Use in Evidence-Based Policymaking
Evaluation studies can be greatly enhanced by combining administrative data with survey and other data. Administrative data, however, are not problem-free. A common feature of such data is that they were not envisioned to be used for statistical purposes. Instead, they were designed to help individuals to participate in programs or obtain services. Because administrative data are used to manage a set of organizational processes, they tend to have common properties.
Administrative data are often limited to the population experiencing the program
Not all datasets include the entire population of interest to an evaluator. Most administrative records from a government program describe only those participating in the program. For example, SNAP data contain information on those who request food assistance. Federal income tax data contain information on those who submit tax forms. Thus, administrative data themselves do not permit a crisp answer to questions of what happened to those eligible for the program who chose not to participate, or what would have happened to the participants if they had not participated. For such questions, other datasets containing nonparticipants are necessary.
Administrative data often describe statuses only during the time that participants experienced the program
Administrative data generally have little historical information about an individual. They generally begin with the start of participation and end when participation does. Unfortunately, sometimes a program’s effects are realized long after it has ended, when the data of program participants are no longer being recorded.
Individual items in administrative data vary in their importance to the program
When an attribute of an administrative data record is crucial to the administration of the program, it is likely to be more accurate than items that are not actively used. As a result, data elements within an administrative dataset vary in quality for evaluation purposes.
Administrative record systems vary widely in their practices for keeping attributes up to date
Some administrative record systems update data fields with each interaction with a participant. But some of these updates destroy the original value of a data item, making it impossible to know its former value. For example, SNAP overwrites old addresses as individuals move (Iwig et al. 2013).
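The difference between an overwriting update and one that preserves history can be sketched as follows. The two stores, the function names, and the addresses are hypothetical and do not describe any actual program's system.

```python
from datetime import date

current_address = {}   # overwriting store: keeps only the latest value
address_history = {}   # append-only store: keeps every value with its effective date

def record_move(person_id, new_address, effective):
    """Apply one address update to both stores."""
    current_address[person_id] = new_address  # the former value is lost here
    address_history.setdefault(person_id, []).append((effective, new_address))

def address_on(person_id, when):
    """Latest address effective on or before `when`, or None if none is known."""
    known = [(d, a) for d, a in address_history.get(person_id, []) if d <= when]
    return max(known, key=lambda pair: pair[0])[1] if known else None

record_move("p1", "12 Elm St", date(2015, 1, 10))
record_move("p1", "9 Oak Ave", date(2016, 6, 1))

print(current_address["p1"])                # prints 9 Oak Ave
print(address_on("p1", date(2015, 12, 31)))  # prints 12 Elm St
```

An evaluator who needs a participant's location at the time of enrollment can answer that question only from the second store.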
Administrative data sometimes lack metadata important to secondary users
Metadata, which describe the meaning of values in a dataset, are necessary to understand and properly use data. When centralized software platforms control administrative data content and use, metadata are driven by the program’s needs. Later statistical analysis of the data may require more complete documentation of each field in the dataset. Poor metadata management often means that values in a data field have unknown or ambiguous meaning. The administrative agency can often continue to use such data, but for a secondary user, the lack of metadata can prevent intelligent statistical analysis.
The temporal extent of administrative data may not be well documented
Since administrative record systems are designed to aid the delivery of products and services to individuals eligible for them, it is not necessary to purge the records of those who are no longer eligible. For example, voter data systems often contain records of people who have died. Ideally, these records would be purged from the file before evaluating the program. Of course, sometimes these records might reflect “dropouts” from the program. Those who quit the program complicate the assessment of its effects. Such cases may require that evaluation studies be adjusted for selection bias.
Administrative data systems are not necessarily well structured for statistical analysis
Administrative records vary in their structure. They come in three forms: structured, semistructured, and unstructured (National Academies of Sciences, Engineering, and Medicine 2017). Although structured data like SNAP’s can be very useful, unstructured data like text-based medical records complicate statistical analysis for program evaluation. Transforming the nonempirical data into empirical data becomes an important step before statistical analysis. How the data are transformed can affect the evaluation’s conclusions. Hence, full transparency requires careful documentation of the transformation process.
Administrative data often combine multiple data collection methods with different properties
Sometimes the customer or client provides data in response to a request for information (e.g., a Social Security disability eligibility interview). Sometimes a device provides data (e.g., a traffic sensor for transportation data). Sometimes a service provider observes attributes (e.g., a housing inspector’s observation of a structure’s quality). Each of these data sources may fail to create a data item in different ways. For example, Statistics Netherlands uses road sensor data to produce traffic statistics. However, information may be missing either because of packet loss between a sensor and the central database, or because a sensor has broken (Puts et al. 2016). If the probability of a sensor failing is a function of precipitation, and if precipitation increases the probability of traffic accidents, then studies of the effect of precipitation on accidents can be biased.
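A stylized calculation shows one way weather-dependent sensor failure can distort results. Every rate below is a made-up value, and treating each hour of road data as a single record is a simplifying assumption; the sketch shows only that surviving records under-represent rainy hours, so a pooled accident rate computed from them is too low.

```python
# All rates below are made-up values chosen only to illustrate the mechanism.
share = {"dry": 0.7, "rain": 0.3}            # share of hours by weather
accident_rate = {"dry": 0.01, "rain": 0.03}  # assumed accidents per hour
failure_prob = {"dry": 0.02, "rain": 0.40}   # assumed chance the hour's record is lost

# Rate that a complete set of records would show.
true_rate = sum(share[w] * accident_rate[w] for w in share)

# Only hours whose sensor record survived reach the database.
surviving = {w: share[w] * (1 - failure_prob[w]) for w in share}
total_surviving = sum(surviving.values())
observed_rate = sum(surviving[w] * accident_rate[w] for w in share) / total_surviving
observed_rain_share = surviving["rain"] / total_surviving

print(f"true rate {true_rate:.4f}, observed rate {observed_rate:.4f}")
print(f"rain share: actual {share['rain']:.2f}, in records {observed_rain_share:.2f}")
```

How much a particular analysis is affected depends on what else drives the failures, which is why documenting the missing-data mechanism matters to secondary users.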
Administrative data are collected on different units
Sometimes the direct user or beneficiary of a product or service is measured by the record; sometimes a related aggregate is measured. While Medicare data are collected on an individual, income tax data are often collected on a family or household unit; credit card data consist of transactions made by a cardholder for any purpose (i.e., whether or not the cardholder benefits directly from the transaction). Problems arise when the administrative dataset measures units that are different from the unit participating in the program; researchers must exercise great care to ensure that the conclusions of the analysis reflect that difference.
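When the record unit is finer than the program unit, a common first step is to aggregate records to the unit of interest. The sketch below assumes a hypothetical crosswalk from cards to households; in practice such a crosswalk may be unavailable or imperfect, which is part of the care the analysis requires.

```python
from collections import defaultdict

# Hypothetical transaction-level records: (card_id, amount).
transactions = [("c1", 20.0), ("c1", 35.5), ("c2", 12.0), ("c3", 8.0), ("c3", 4.5)]
# Hypothetical crosswalk from cards to the households enrolled in a program.
card_to_household = {"c1": "h1", "c2": "h1", "c3": "h2"}

def spending_by_household(rows, crosswalk):
    """Sum transaction amounts up to the household that holds each card."""
    totals = defaultdict(float)
    for card_id, amount in rows:
        totals[crosswalk[card_id]] += amount
    return dict(totals)

household_spending = spending_by_household(transactions, card_to_household)
print(household_spending)  # prints {'h1': 67.5, 'h2': 12.5}
```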
Privacy Protections for Administrative Data and Access Issues for Evaluation Studies
The privacy protections of administrative record systems are generally defined by the organization that owns the records (National Academies of Sciences, Engineering, and Medicine 2017). The Privacy Act (5 U.S. Code § 552a) is a fundamental legal protection regarding federal records systems that contain data on individuals. Under the Privacy Act, statistical records must be used only to “support any research or statistical project, the specific data of which may not be used to make decisions concerning the rights, benefits, or privileges of specific individuals.”
Statistical analyses of person-level data have been treated differently under the law than other uses of individual records. Statistical uses require that individual records be used only as an input to some type of aggregation over a set of records. Statistical records are defined by the Privacy Act as “a record in a system of records maintained for statistical research or reporting purposes only and not used in whole or in part in making any determination about an identifiable individual.” The output is a number that describes features of a group of records in the file. Because those uses are fundamentally “uninterested” in the attributes of any individual, actions of a statistical record system face less stringent restrictions than those required for administrative uses. From another perspective, when federal statistical agencies analyze administrative data, the data are brought under the protective coverage of the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) (U.S. Office of Management and Budget 2014).
The use of tax records is limited by the U.S. Tax Code, which imposes requirements that significantly limit the scope of permissible research. The data may be used only for tax administration purposes (which include tax policy–relevant research and analysis) and only by employees and contractors. Tax administration “includes assessment, collection, enforcement, litigation, publication, and statistical gathering functions under such laws, statutes, or conventions” (U.S. Office of Management and Budget 2016).
It is important to note that extracting statistical information from individual data can sometimes lead to the inadvertent reidentification of individual records. If, for example, percentages are displayed in a table, and the revealed numbers of persons allow one to determine that the percentage is based on a unique individual, then it can be possible to infer from the statistical information some characteristics of individuals. That is because there are mathematical limits on “how much” information can be analyzed while maintaining a reasonable notion of privacy (Dinur and Nissim 2003). For the same reasons, if many tabulations are performed and disseminated from the same dataset, it can become easier to make such inferences if the tabulations overlap on one or more attributes. These simple examples yield the Fundamental Law of Information Recovery in a developing area labeled “differential privacy” (Abowd 2016; National Academies of Sciences, Engineering, and Medicine 2017). This field is progressing rapidly and affecting the research and development agendas of statistical agencies. It has not yet, to our knowledge, impacted the analysis of administrative data. It could imply, however, that use of data be restricted by what might be labeled a “privacy budget,” which would limit the probability of reidentifying individuals whose data contribute to statistics.
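Both ideas in this paragraph, inference from overlapping tabulations and a privacy budget, can be illustrated with a short sketch. The five-person file, the class name `BudgetedCounter`, and the epsilon values are hypothetical. The noise step uses the Laplace mechanism, a standard technique in the differential privacy literature, which is an illustrative choice here and not a description of any agency's system or a production-grade implementation.

```python
import random

# Hypothetical microdata: one row per person in a small area.
people = [
    {"age_group": "under_65", "enrolled": True},
    {"age_group": "under_65", "enrolled": True},
    {"age_group": "under_65", "enrolled": False},
    {"age_group": "under_65", "enrolled": False},
    {"age_group": "65_plus", "enrolled": True},
]

def count(rows, **conditions):
    """Exact tabulation: number of rows matching every condition."""
    return sum(all(r[k] == v for k, v in conditions.items()) for r in rows)

# Two published tables that overlap on the "enrolled" attribute.
enrolled_total = count(people, enrolled=True)
enrolled_under_65 = count(people, enrolled=True, age_group="under_65")
# Subtracting them isolates the 65-plus cell ...
enrolled_65_plus = enrolled_total - enrolled_under_65
# ... and a third table shows that the cell describes a single person.
residents_65_plus = count(people, age_group="65_plus")
disclosed = enrolled_65_plus == residents_65_plus == 1

class BudgetedCounter:
    """Answers counts with Laplace noise until a total epsilon is spent."""

    def __init__(self, rows, total_epsilon, seed=0):
        self.rows = rows
        self.remaining = total_epsilon
        self.rng = random.Random(seed)

    def noisy_count(self, epsilon, **conditions):
        if epsilon <= 0 or epsilon > self.remaining:
            raise ValueError("privacy budget exhausted or invalid epsilon")
        self.remaining -= epsilon
        # A count changes by at most 1 when one person is added or removed,
        # so Laplace noise with scale 1/epsilon is used. The difference of two
        # exponential draws with rate epsilon has that Laplace distribution.
        noise = self.rng.expovariate(epsilon) - self.rng.expovariate(epsilon)
        return count(self.rows, **conditions) + noise

counter = BudgetedCounter(people, total_epsilon=1.0)
first = counter.noisy_count(0.5, enrolled=True)
second = counter.noisy_count(0.5, age_group="65_plus")
# A third query would raise ValueError because the budget is spent.
```

With exact counts, the three tables together reveal that the only resident aged 65 or older is enrolled. With a budget, each answer is blurred and the number of answers is capped, at the cost of less precise statistics.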
Privacy issues also arise when administrative data are merged with other data systems. For example, linking administrative data to survey data can greatly enhance the survey data’s value for assessing a policy. When multiple datasets are linked, however, the probability that new attributes will identify unique individuals in the population can increase.
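A small example shows how linkage can create unique records where none existed. Both files and all attribute values are invented for illustration; the measure used, the number of people whose combination of attributes is shared with no one else, is one simple indicator of reidentification risk.

```python
from collections import Counter

# Hypothetical administrative file: person_id -> (area code, birth year).
admin = {
    1: ("A", 1970), 2: ("A", 1970), 3: ("A", 1970),
    4: ("B", 1970), 5: ("B", 1970),
}
# Hypothetical survey file: person_id -> occupation.
survey = {1: "nurse", 2: "nurse", 3: "teacher", 4: "teacher", 5: "nurse"}

def unique_count(profiles):
    """Number of people whose attribute combination is shared with no one else."""
    freq = Counter(profiles.values())
    return sum(1 for profile in profiles.values() if freq[profile] == 1)

# Link the two files on person_id and combine their attributes.
linked = {pid: admin[pid] + (survey[pid],) for pid in admin}

print(unique_count(admin), unique_count(survey), unique_count(linked))  # prints 0 0 3
```

Neither file alone singles anyone out, but three of the five linked records are unique on the combined attributes.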
Because of privacy concerns and the existing legal framework to protect privacy, access to administrative records for evaluative studies is an ongoing issue in the field. To create the Longitudinal Employer-Household Dynamics (LEHD) program, which combines administrative data on business establishments and workers with household and business survey data, the Census Bureau had to obtain separate memoranda of understanding with every state to obtain their administrative records, a process that took 10 years (Abowd, Haltiwanger, and Lane 2004).
Over the years, a variety of practices have been used for third parties to access administrative data for evaluation study purposes. Some of the most prominent forms are
The third party becomes an employee of the organization owning the administrative data. For example, researchers have used the Intergovernmental Personnel Act to become employees of the Internal Revenue Service.
The agency places the data in a protected enclave. Examples include the Federal Statistical Research Data Centers (FSRDC) and the data enclave of the National Opinion Research Center at the University of Chicago.
The agency permits access with a binding contract protecting the confidentiality pledges. The National Center for Education Statistics in the Institute of Education Sciences (IES) permits access to “restricted-use data” under well-defined limits with application through an organization such as a university or research institution.
Synthetic data are created to protect the identity of individuals by using real data to create other datasets based on statistical modeling, producing “plausible” other datasets that have many of the same statistical properties of the original. For example, the Synthetic Longitudinal Business Database describes changes in the size and status of businesses. The Survey of Income and Program Participation Synthetic Beta (SSB) at the Census Bureau is a dataset that “integrates person-level micro-data from a household survey with administrative tax and benefit data.”
Specified statistical analyses are conducted within the agency and the results are given to the third-party evaluator. The third party relies on the agency’s interpretation of the analytic goals of the project, proper preanalysis handling of the data (e.g., recoding of fields for analytic purposes), proper use of cases with missing data, and proper implementation of the statistical procedures.
The agency creates a “public use” version of the data, stripped of individual identifiers for secondary use. Some agencies prepare a version of the data that limits the inadvertent reidentification of individuals in the dataset. One example of this type of public use file is the Supplemental Security Income (SSI) Public-Use Microdata File, which contains data drawn from the Supplemental Security Record file (SSR) created from a 5 percent random, representative sample of individuals who received SSI payments.
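Of the access modes listed above, the synthetic data approach lends itself to a brief sketch. The version below is deliberately crude: it assumes a single variable, invented earnings values, and a normal model, whereas the synthetic files named above rest on far richer statistical models. It shows only the basic idea of fitting a model to real records and releasing draws from the model in their place.

```python
import random
from statistics import mean, stdev

# Hypothetical confidential values (e.g., annual earnings); not real data.
confidential = [31000, 42000, 38500, 55000, 47250, 29900, 61000, 44000]

def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal model fitted to `values`."""
    rng = random.Random(seed)
    mu, sigma = mean(values), stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]

synthetic = synthesize(confidential, n=5000)
print(f"confidential mean {mean(confidential):.0f}, synthetic mean {mean(synthetic):.0f}")
```

The synthetic values preserve the mean and spread the model captures, but any feature the model omits, such as skewness or relationships with other variables, is lost, which is why analyses on synthetic files are often validated against the confidential data.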
When expert panels and government commissions have examined how administrative records might be used to improve evaluation of government programs and statistics more broadly, they have acknowledged that legal and regulatory changes must be made (National Academies of Sciences, Engineering, and Medicine 2017; Commission on Evidence-Based Policymaking 2017). Further, modern privacy-enhancing technologies could be used to simultaneously increase privacy protections for administrative data and make them usable for statistical purposes.
Summary
Record systems used to administer government programs often contain data that are quite valuable for assessing the efficacy of the programs. Though they often document the program’s actions on the recipients, they sometimes also measure program outcomes for those recipients. Because of that, they can be valuable evaluation tools. But they routinely fail to include those who were eligible for the program but did not take advantage of it. Outside of a formalized randomized experimental design, administrative data have limitations for determining the value of a program for those for whom it was intended. Further, statistical users of administrative records must attend to a host of discrepancies between the needs of a program administrator and the needs of a third-party evaluator.
Despite the challenges present in using administrative records for program evaluation, these data can enhance social and economic statistics and reduce the burden of data collection on the American public. Yet in many cases, laws restricting their use outside the administrative agency that collects them impede access to such data for statistical evaluation studies. The future of administrative data in evidence-based policymaking, therefore, depends heavily on changes in the legal and regulatory structure. With the enhanced techniques for protecting privacy that now exist, such legal changes might be made with assurance that public trust could be maintained or even enhanced.
Notes
Robert M. Groves, Gerard Campbell SJ Professor in the Department of Mathematics and Statistics and the Department of Sociology, is the provost of Georgetown University. His research has focused on influences on survey participation, the use of adaptive research designs to improve statistics, and privacy-related concerns affecting statistical agencies.
George J. Schoeffel is a researcher for the Committee on National Statistics at the National Academies of Sciences, Engineering, and Medicine. His current research focuses on combining multiple data sources for better public policy.
