Abstract
Administrative experiments are increasingly available to public programs with high-quality administrative data as a way to identify which changes make programs and services more effective. Program administrators can run short-term experiments to test improvements in programs and have causally valid impact estimates within a year. Administrative experiments can also be used to better understand what works for whom by testing program improvements on key subpopulations of program participants. This article provides an overview of rapid cycle evaluation, describes its use in identifying what works best for whom, and provides an illustrative example of how the techniques could be applied to veterans’ employment services.
Introduction
Administrative experiments are an approach to program improvement in which changes to program operations and services are formally tested through short-term experiments. The approach lets program administrators know whether changes to ongoing programs actually improve outcomes, and even whether effective changes have unintended consequences (e.g., an employment program for youth intended to increase job entry decreases school attendance). Administrators can use this approach to quickly identify changes that make programs more effective and to quickly discard those changes that do not.
Administrative experiments also can help program administrators better identify what works for whom. Most public programs do not have a single, uniform effect on all participants. Instead, some people are likely to benefit significantly from services, others benefit a little, and still others benefit not at all. Administrative experiments can be used to test different approaches for working with the groups less likely to benefit, with the goal of finding changes that make the program more effective for these groups. The approach can be employed when the groups least likely to benefit are known (e.g., homeless people may be least likely to benefit from work-first employment programs when their basic needs remain unmet), and in cases where the groups are unknown.
The private sector has used experimentation to test program changes for years. For example, Capital One runs thousands of experiments each year to test the effects of changes to their services and offerings, such as whether letting individuals transfer other credit card balances to Capital One cards for free would increase card usage (Anderson & Simester, 2011; Davenport & Harris, 2007). Kohl’s, the department store company, has also used rigorous experimentation to test new initiatives before implementing them enterprise wide. For example, when considering opening their stores 1 hour later, the company was concerned that sales revenues might decrease. Instead of blindly implementing the adjustment, Kohl’s tested the idea by conducting an experiment in 100 stores, which produced evidence that the change would not decrease sales (Thomke & Manzi, 2014).
The public sector has recently begun to embrace this approach. In particular, several federal agencies now use administrative experiments to test behavioral interventions, that is, program changes intended to influence how participants engage with the program and respond to program services. In 2014, the White House Office of Science and Technology Policy launched the Social and Behavioral Science Team (SBST). Staff from SBST have been working with numerous federal agencies to identify and launch administrative experiments testing behavioral interventions (Office of Science & Technology Policy, 2015). Also in 2014, the U.S. Department of Labor (DOL) launched the project on Behavioral Interventions for Labor Related Programs, which aims to improve the performance of DOL programs, such as employment and training, unemployment insurance, occupational health and safety, workers’ rights protection, and the enforcement of labor law (Mathematica Policy Research, 2015). In 2010, the Department of Health and Human Services launched the Behavioral Interventions to Advance Self-Sufficiency project, which aims to use behavioral insights to improve programs affecting low-income children and their families (Richburg-Hayes et al., 2014). All of these efforts have been demonstrating the promise of using rigorous methods to formally test changes to public programs.
To be good candidates for such experimentation, programs must have high-quality, robust administrative data and a large number of participants to ensure that impacts can be measured quickly with minimal investment. At the same time, it is important to recognize that administrative experiments are not appropriate in all contexts or for all purposes. Rather, they should be viewed as an additional tool in the toolkit of program administrators to innovate and optimize around the delivery of existing program services to their target populations. They are especially well suited to testing modest changes to existing programs but are not a replacement for implementation studies, rigorous studies of overall effectiveness, cost-benefit studies, and other evaluation approaches (Rossi, Lipsey, & Freeman, 2004; Shadish, Cook, & Campbell, 2002; Wholey, Hatry, & Newcomer, 2004).
In this article, we describe the key elements of administrative experiments. We then explain how the approach can be used to better understand what works for whom. We discuss experiments conducted when we know in advance which types of participants are more and less likely to benefit from current program services as well as experiments conducted when principal beneficiaries are not known but successful outcomes can be anticipated through predictive modeling. We then discuss how these approaches could be deployed in a specific example, providing employment services to veterans.
Administrative Experiments
Overview
Administrative experiments use rigorous research methods to determine quickly whether a change in ongoing program operations or services improves outcomes. Often, administrators assess changes simply by implementing them—either system wide or in ad hoc pilot locations. However, such assessments may yield misleading information. If outcomes improve after a change is implemented, it may or may not be that the change caused the improvements. Without a reliable source of information about what would have happened to program participants without the program (the counterfactual), administrators cannot determine causation. Prevailing trends, or other factors changing at the same time, might actually explain the improvements. Moreover, if the changes were shown effective through an initial pilot, it may not be that the same level of effectiveness will occur when the changes are implemented system wide. As a result, these assessments often lead administrators to adopt changes that, in the end, have no long-term impact on program outcomes (and to abandon changes that are actually effective).
Administrative experiments can provide administrators with compelling evidence that a program change does or does not have the intended impact. The approach is designed to generate quick feedback, allowing administrators to test and then implement effective changes with confidence, and to avoid implementing ineffective changes. In some cases, the experiment can be designed to not only measure impacts on desired outcomes but also look for unintended impacts on other outcomes. For example, a program may implement changes to streamline its eligibility determination process to increase program access. The administrative experiment can measure the impact of the change on the number of new program participants. However, if administrators are worried that the change could have adverse consequences, such as increasing operating costs or reducing data quality, the evaluation can measure impacts on those outcomes as well.
There are four defining characteristics of administrative experiments:
They are focused on measuring the impact of changes to existing program operations and services
Administrative experiments in the public sector can test whether improvements or enhancements to current program operations result in better outcomes. Instead of testing an entirely new program model, administrative experiments let administrators test incremental changes to their existing approach, whether the changes are in how the program communicates with participants, when and where services are provided, how services are provided, or even what services are provided. The evaluation can examine whether these changes improve outcomes for participants. Participant outcomes can reflect the program’s primary goals (such as impacts on participants’ employment status), and they can also examine impacts on participant satisfaction. Program administrators also can use administrative experiments to determine how these changes may impact program operations, such as operating costs.
They use experimental or quasi-experimental methods to identify a causal relationship
Administrative experiments use rigorous techniques that generate confidence that observed changes in outcomes are due to the change in program services and not to other factors (such as differences between the group that received the changed services and the group that did not). Common techniques include (1) randomized controlled trials that create randomly formed treatment and control groups, with the treatment group receiving changed program services and (2) quasi-experimental designs, in which treatment and control groups are formed in a nonrandom way, such as matched comparison or regression discontinuity research designs.
They rely predominantly on administrative data to measure impacts
Administrative experiments rely predominantly on existing administrative data, which expedite analysis and keep costs low. Individual-level administrative data provide the best foundation for administrative experiments. These data can be obtained through internal operational systems maintained by the program or through integrated data systems that combine administrative data across programs. These data must be of high quality and must contain accurate, valid, and reliable measures of the outcomes of interest.
To the extent that administrators have the ability to track participant activity electronically, this information can be used to supplement existing administrative data at relatively low cost. For example, programs could track click throughs to a landing web page, expressions of interest, or information requests, or use scannable UPC codes to track follow through among people receiving different offers in the mail.
Results can be observed quickly (within 1 year)
Administrative experiments focus on those scenarios where results can be observed quickly. This means the experiments should be focused on operational and procedural changes that can be implemented quickly. It also means that the evaluation must monitor short-term outcomes. In some cases, the short-term outcomes that can be observed reflect the primary goals of the program (for example, monitoring whether changes to employment programs affect whether a job seeker finds a job). In other cases, the analysis must focus on proximal outcomes. These could include important program performance or participant outcomes, such as program take-up, engagement or persistence, and program completion. The availability of high-quality administrative data supports the rapid analysis of these outcomes. For example, administrators could rapidly assess whether an enhancement to a 3-month job training program yields increased job placements, but it would not be possible to assess within that time frame whether the program enhancements influence long-term earnings gains. However, administrative experiments do not preclude administrators from conducting long-term follow-up to examine longer term outcomes such as earnings growth, job retention, and career advancement.
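As a minimal sketch of the random assignment behind the randomized controlled trials described above, the following Python example splits a list of participant identifiers into treatment and control groups. The function name, the 50/50 split, the seed, and the participant identifiers are all assumptions made for illustration; they are not drawn from any particular program's procedures.

```python
import random


def assign_groups(participant_ids, treatment_share=0.5, seed=2015):
    """Randomly split participants into treatment and control groups.

    treatment_share and seed are illustrative defaults, not recommendations.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    n_treatment = round(len(shuffled) * treatment_share)
    return {
        "treatment": sorted(shuffled[:n_treatment]),  # receive the changed services
        "control": sorted(shuffled[n_treatment:]),    # receive existing services
    }


# Hypothetical participant identifiers 1 through 1000.
groups = assign_groups(range(1, 1001))
```

Because chance alone determines group membership, the two groups should be similar on average in both observed and unobserved characteristics, which is what allows later differences in outcomes to be attributed to the program change.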
Rapid cycle evaluation is another term commonly used for administrative experiments (see, e.g., Cody & Asher, 2014). Typically, rapid cycle evaluations refer to a series of consecutive administrative experiments conducted as part of an agency’s continuous quality improvement process. Rapid cycle evaluations can support a formative, continuous improvement model in which a program change is tested, the results are examined, the changes are modified if needed, and the modified program is tested again. The approach is ideal for testing those changes (and subsequent modifications) that can be rapidly implemented and whose impacts can be observed relatively quickly.
Identifying what works for whom
Administrative experiments can also be used to identify what works for whom, a critical issue for public program administration. Programs designed to alleviate important societal problems, including unemployment, poverty, and obesity, cannot be “one size fits all.” The populations these programs target are diverse, as are the barriers they face and the communities they live in. Therefore, it is not hard to imagine that the available programs will not be equally effective for all participants. Some people may benefit significantly from a given program, while others may experience no benefit at all.
Variation in program effectiveness—whether it be variation by participant characteristics or variation by the context in which participants are served—is a concern for two main reasons. First, there are participants whose needs have gone unmet, despite engaging in a government-sponsored program. These people will likely continue to need support and services in the future. Second, the government has spent limited resources inefficiently by providing services with no benefit.
If administrators were able to better target their programs by identifying what works for whom, programs would be more efficient and effective, and a greater number of participants would experience improved outcomes. Rapid experimentation is one rigorous tool administrators can employ to do so, by allowing them to test enhancements to existing programs with different groups that may experience the least benefit from current services.
The key to this approach is to correctly identify those groups of participants that are unlikely to benefit from the existing programs and services. Once identified, these groups form the basis for testing enhancements to current program services. There are two ways to form these groups (see Figure 1).

Figure 1. Two approaches to identifying subgroups in administrative experiments.
The first strategy uses observable characteristics to identify a subgroup up front and then tests program improvements tailored to that group. In this approach, program administrators can analyze data on program participants to determine which types of participants tend to have different outcomes. They can then segment the population based on these characteristics at intake or referral and test an enhancement to the program with the group less likely to benefit.
For example, people referred to a program through a certain path may be known to have unique barriers that can affect their likelihood of success. Temporary Assistance for Needy Families (TANF) participants referred from a homeless shelter may have housing and stabilization issues that prevent them from fully engaging in and benefiting from the work components of the program. This group could be identified at referral, and administrators could use an experiment to test an enhanced program tailored to their needs that could be quickly evaluated. Alternatively, individuals could be identified through initial, routine assessments as potentially needing different services, such as job training participants who score below a certain threshold on academic or skills assessments. Administrators could use an experiment to test a program enhancement that includes remedial education or other services for this subpopulation.
The second strategy uses predictive analytics to segment the population. When program administrators cannot observe a pattern based on individual characteristics or are not confident in the pattern they observe, they can run predictive models that look at multiple characteristics of the population using existing data to predict who is most or least likely to benefit from the current programs and services.
Predictive modeling leverages the fact that key outcomes for program participants are often correlated with the participant’s prior behaviors, circumstances, and characteristics, as well as those of the participant’s family, associates, service providers, and surroundings. Predictive models can be “supervised” or “unsupervised.” In supervised models, data on the outcome of interest (found a job or did not find a job) are related to potential predictors of that outcome (such as level of education and extent of job search effort). In unsupervised learning models, potential predictors are used to first classify individuals and then make predictions based on how different an individual is from the group into which he or she is classified. These predictors can be based on attributes (such as age, gender, and years of work experience) or measures of relationships or connection (such as whether the client’s five closest friends and same-generation family members are employed). These approaches can be used to identify correlations in the data that predict future outcomes. It should be noted, however, that predictive correlations do not always exist, and researchers should be careful to assess the predictive validity of their models.
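One common way to assess predictive validity, offered here as an illustrative sketch and not as the approach any particular agency uses, is to compute the area under the ROC curve (AUC) on data held out from model fitting. The AUC is the share of (positive, negative) pairs in which the person who achieved the outcome received the higher predicted score; a value near 0.5 suggests the model predicts no better than chance. The scores and outcomes below are hypothetical.

```python
def auc(scores, outcomes):
    """Area under the ROC curve, computed by comparing every
    positive case with every negative case; ties count as half."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative outcome")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical holdout data: predicted probability of finding a job,
# and whether each person actually found one (1) or not (0).
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
found_job = [1, 1, 0, 1, 1, 0, 0, 0]
validity = auc(predicted, found_job)
```

The pairwise comparison is simple to read but slow for large files; statistical packages typically offer faster equivalents.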
Using models with strong predictive validity, researchers can rank program participants based on the likelihood that an outcome, whether positive or negative, will occur. Once program participants are ranked, administrators can identify a subpopulation that is least likely to benefit from the program. This determination can be made by looking for a natural cut point in the data or by simply selecting a certain percentage of the distribution. Once the participants least likely to benefit are identified, they can be further separated into treatment and control groups, and a test of different strategies can be conducted, all while the program continues to serve those who benefit under standard procedures. Administrators could also test new approaches on people likely to benefit from the program. These tests could help administrators determine whether enhanced services would yield even greater improvements for people best positioned to benefit from existing services. Administrators may view this approach as an effective way to achieve further gains for participants and improve the program’s overall success.
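The ranking and segmentation steps described above can be sketched as follows. The example assumes that a predicted likelihood of benefiting is already available for each participant; the 20% cutoff, the seed, the function name, and the participant identifiers are hypothetical choices made only for illustration.

```python
import random


def select_and_assign(predicted_benefit, bottom_share=0.2, seed=7):
    """Rank participants by predicted score, take the lowest-scoring
    share as the target subgroup, and randomize within that subgroup."""
    # Sort participant IDs from lowest to highest predicted score.
    ranked = sorted(predicted_benefit, key=predicted_benefit.get)
    n_target = max(2, round(len(ranked) * bottom_share))
    target = ranked[:n_target]

    rng = random.Random(seed)
    rng.shuffle(target)
    half = len(target) // 2
    return {
        "treatment": target[:half],             # offered the enhanced services
        "control": target[half:],               # offered existing services
        "standard_services": ranked[n_target:]  # outside the experiment
    }


# Hypothetical predicted scores for 50 participants.
scores = {f"P{i:03d}": i / 50 for i in range(50)}
plan = select_and_assign(scores)
```

A fixed percentage is used here only because it is easy to state; where the ranked scores show a natural cut point, that threshold could be substituted for `bottom_share`.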
Whether using observed characteristics or predictive models, administrators need to be cautious and conscientious in identifying which participants are more and less likely to benefit. Racial, ethnic, and gender profiling in particular should not be the basis for forming groups of individuals. While there may be some cases where the underlying factors explaining why a group benefits (or fails to benefit) are correlated with race, ethnicity, or gender, administrators should work to better understand what those underlying factors are. Administrators should also be mindful of false stereotypes about participants referred from certain paths (for instance, the criminal justice system) to ensure that those perceptions are also not playing a role in grouping participants.
Illustrative Example Focused on Veterans’ Employment Services
Veterans’ employment services are a good example of an area in which administrative experiments could be used to identify what works for whom and help to improve employment outcomes for veterans who recently separated from military service. Improving employment outcomes for veterans is a growing area of interest, as 1 million people are expected to separate from military service in coming years (Shinseki, 2013).
Despite being highly trained and possessing extensive experience that may be attractive to the private sector, veterans face many challenges finding employment. As of October 2014, the unemployment rate for veterans serving since September 2001 was 7.2%, while the rate for nonveterans was 5.4% and the rate for all veterans groups was 4.5% (Bureau of Labor Statistics, 2014b). Veterans surveyed in 2012 identified the bad economy and difficulties translating their military experience to marketable skills in the civilian labor force as major challenges to finding employment (Prudential, 2012). Veterans and transitioning service members may also lack familiarity and prior experience with the job search process in the civilian labor market (Crenshaw & Wright, 2013).
An extensive infrastructure is in place and willing providers are ready to support former service members in their transition to civilian life. The U.S. DOL’s Veterans’ Employment and Training Service (VETS) funds employment services for eligible veterans through Jobs for Veterans State Grants, including the hiring of Local Veterans’ Employment Representatives and Disabled Veterans’ Outreach Program specialists (U.S. DOL, 2014). The departments of Defense, Veterans Affairs, and Labor administer the Transition Assistance Program (TAP) that separating service members have been required to attend since November 2012. TAP provides preseparation counseling, individual transition planning, and employment workshops (U.S. Department of Veterans Affairs, 2014).
Despite the availability of these programs, take-up rates could be improved. According to Prudential’s 2012 survey, while 55% of veterans reported attending a TAP seminar, only 6% reported using the VETS programs (Prudential, 2012). The percentage of veterans who attend TAP seminars will likely increase with the addition of the participation mandate, but strategies to increase participation rates in VETS and other programs are also necessary.
While recent veterans as a group have higher unemployment rates and face unique challenges to becoming employed, the population is also heterogeneous. Veterans are male and female, come from across the country and from different service branches and occupations, and have varying education and skill levels. Given this diversity, no single approach to employment services will work for all veterans. To best address veterans’ unemployment and improve veteran employment services, administrators need to understand what works for whom.
In developing an administrative experiment to answer this question, program administrators can undertake a five-step process that begins with diagnosing the problem (Figure 2). Note that this process requires close partnership between program administrators, analytical specialists, and researchers to allow the approach to become feasible and not burden those running the programs.

Figure 2. Administrative experiment design and implementation steps.
Step 1: Diagnose the Problem
The first step in implementing an administrative experiment is to diagnose the problem to be addressed. For example, what are the various reasons why veterans struggle in their transition to civilian employment? This step includes understanding the existing programs and service infrastructure, including the programs and funding streams, eligibility criteria, service providers, outreach and engagement strategies, service content, performance goals, and outcomes.
To identify program changes to test and groups to target, it is essential to explore diverse perspectives on veterans’ responses to the available programs. To gain insights into strengths and weaknesses of the existing programs, administrators could engage with direct service providers, advocacy groups, and former service members themselves, including those who have and have not received services. These conversations will help administrators better understand the various reasons why veterans may struggle in their transition to the civilian labor market (e.g., excessive optimism, lack of information, procrastination, negative perceptions of the workforce system, competing concerns such as physical and mental health issues, and difficulties translating military credentials and skills to the civilian labor market) and whether these reasons are more or less relevant for particular subgroups.
Step 2: Identify the Population of Interest
Once the problem is diagnosed, administrators can home in on a programmatic subgroup of interest for which to test program enhancements. As described, there are two strategies to help identify the target subgroups. The first is to use observable characteristics to identify a group of interest up front. From prior experience, program administrators may have an idea of participant subgroups that are less likely to engage with or benefit from current services. In the veteran context, potential subgroups could include recipients of unemployment compensation for ex-service members compared with all separating veterans or TAP participants, different branches and ranks, different functions and responsibilities based on military occupational codes (MOCs), and combat veterans compared with noncombat veterans.
As an example of this strategy, program administrators could use veterans’ MOCs to segment the population. The Department of Defense has grouped enlisted personnel occupations into 15 broad occupational groups across the military branches, including combat specialty, support service, and construction (Bureau of Labor Statistics, 2014a). Occupations that are specific to the military, such as snipers and nuclear defense officers, can be harder to translate to the civilian workforce. Administrators could use MOCs to identify service members in hard-to-employ occupations and offer them different services tailored to their training and employment needs. Administrators could then test these enhanced services using rapid experimentation and incorporate the findings into ongoing service delivery.
The second strategy is to use predictive modeling to look at multiple characteristics of the veteran population to predict who is most or least likely to engage in or benefit from a program. As an example, this strategy could be used to improve employment services program outreach efforts and take-up rates. If rich historical data on individuals leaving military service could be linked to administrative data on employment services, administrators, supported by analytical specialists, could estimate a predictive model on who does or does not access employment services. They could then use the model to predict who among the newly separated service members are unlikely to enter employment services and test different outreach strategies to increase engagement.
Step 3: Define the Intervention
Program administrators could decide how to enhance current services based in part on the targeted subgroup and their unique needs and barriers to employment. Types of programmatic changes and enhancements administrators could test include:
Outreach
Outreach can be used to encourage engagement in employment services as well as engagement in the labor market. Possible dimensions for experimentation include the mode of outreach (letters, e-mails, text messages, tweets, digital ads, etc.), content and framing (e.g., appealing to veterans’ patriotism or military social norms, priming different aspects of the veterans’ identity, or using reward vs. entitlement messages), timing (e.g., soon after TAP completion vs. right after separation), and simple versus compound approaches (e.g., text messages only, letters and text messages, or e-mails and robocalls).
Service enhancements to the status quo
Program administrators could consider testing add-ons to the existing programming such as self-guided web-based job search workbooks, modularized (in-person) job search assistance programs, group workshops, one-on-one reemployment assistance evaluations, transferrable skills analysis and career planning and coaching, volunteering and job shadowing, links to employer-sponsored apprenticeships and on-the-job training, and stackable credentials.
Wraparound supports
Administrators could consider offering new wraparound support services to participants that would be in addition to the existing service provision, such as motivational text messages, one-on-one follow-up, e-mail and phone reminders, links to support groups, and motivational incentives such as lotteries or chances to win (small) prizes. These services are designed to provide additional supports that may be needed by participants to fully engage in a program.
Step 4: Plan and Conduct the Administrative Experiment
After identifying a subpopulation of interest and a programmatic enhancement, administrators, working with researchers, would need to determine how to set up and deploy a test within the administrative experiment framework. They may need to create a prototype of the enhanced program that they want to test. Simultaneously, they would need to develop and refine an administrative experiment strategy, whose elements could include a mechanism to identify, randomly assign, engage, and track individuals as well as an overall evaluation design. The evaluation design would include elements such as sample size targets, catchment areas, a timeline, outcome measures, and administrative data sources. In some cases, especially if the program is particularly complex or very different from existing services, administrators might want to include pilot testing of the model program on a small scale before randomly assigning participants. Once the prototype and evaluation strategy are finalized, administrators can begin implementing and testing the enhanced program.
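To illustrate how a sample size target might be set, the sketch below applies the standard normal-approximation formula for comparing two proportions, n = (z_{1-α/2} + z_{power})² [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)², to a hypothetical outreach test. The 6% baseline echoes the VETS take-up figure cited earlier; the 9% target, the 5% significance level, and the 80% power are assumptions chosen only for the example, and a real design would also need to account for factors such as clustering and attrition.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate number of participants needed in each group to detect
    a difference between two proportions with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # value for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical target: raise take-up from 6% to 9%.
n_per_group = sample_size_per_group(0.06, 0.09)

# Detecting a smaller improvement (6% to 7%) requires far more participants.
n_small_effect = sample_size_per_group(0.06, 0.07)
```

Calculations of this kind make concrete why administrative experiments are best suited to programs with large numbers of participants: modest improvements in a low-frequency outcome can only be detected with sizable groups.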
Step 5: Evaluate Results and Decide Next Steps
At the end of the administrative experiment, administrators, researchers, and analytical specialists will collect and analyze data, report on the results, and determine the next steps. Depending on the results, administrators may decide to continue with multiple rounds of experimentation, continuing to refine and evaluate program enhancements.
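For a binary outcome such as program take-up or job entry, the core of the analysis in a randomized experiment can be as simple as comparing outcome rates between the two groups. The sketch below estimates the impact as a difference in proportions and tests it with a two-proportion z-test. The function name and the counts are hypothetical, and the sketch assumes simple individual-level random assignment; it is not a substitute for a full analysis plan.

```python
import math
from statistics import NormalDist


def estimate_impact(treat_successes, treat_n, control_successes, control_n):
    """Difference in outcome rates between treatment and control groups,
    with a two-sided p-value from a pooled two-proportion z-test."""
    p_treat = treat_successes / treat_n
    p_control = control_successes / control_n
    impact = p_treat - p_control

    # Pooled rate under the null hypothesis of no difference.
    pooled = (treat_successes + control_successes) / (treat_n + control_n)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / treat_n + 1 / control_n))

    z = impact / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"impact": impact, "z": z, "p_value": p_value}


# Hypothetical results: 120 of 1,200 treatment group members and
# 78 of 1,200 control group members enrolled in services.
result = estimate_impact(120, 1200, 78, 1200)
```

A small p-value suggests the observed difference is unlikely to be due to chance alone; administrators would still need to judge whether the size of the impact justifies adopting the change.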
Conclusion
Administrative experiments are a tool for continuous improvement of programs with high-quality administrative data. After agencies have built the capacity to conduct administrative experiments on their own following the steps described here, the technique can be used routinely to continuously test and improve their programs. Agencies can test program changes (Steps 1 through 4), examine the results (Step 5), modify the changes if needed (Steps 1 through 3), and then test and evaluate the modified program again (Steps 4 and 5).
Administrative experiments also can be used for testing different strategies for different types of program participants, helping program administrators better understand what works for whom. While this article provides examples for veterans’ employment services, the approach can be used across a wide array of programs: other employment and training programs, education programs, programs aimed at reducing risky behaviors, home visiting programs for first-time parents, programs providing financial counseling for low-income individuals, programs aimed at reducing recidivism in the criminal justice system, and so on.
Although administrative experiments can be applied to a variety of programs, the programs must have one thing in common, that is, high-quality administrative data. To ensure that administrators draw the right conclusions from administrative experiment analysis, program data need to contain accurate, valid, and reliable measures of the outcomes of interest. Moreover, to facilitate an analysis of what works for whom—whether determining subgroups through observational data or predictive modeling—programs require high-quality data at the individual participant level. With high-quality data in hand, administrative experiments can provide a powerful tool to help determine what works better and what works for whom.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
