Abstract
Quantitative social work researchers are often interested in estimating the causal effects of various types of interventions. Randomized controlled trials (RCTs) are typically thought to be the “gold standard” when it comes to estimating such effects. RCTs, however, are sometimes neither ethical nor feasible, which means researchers will need alternatives if they’re still interested in estimating causal effects. Economists have dealt with this issue as well and have developed econometric tools for estimating causal effects in the absence of data from RCTs. Many social work researchers may not be familiar with these methods, and the purpose of this article is to provide a “bird’s eye” overview of them. Readers should obtain enough here to understand the issues involved in using these methods as well as which ones might be useful for their research and, therefore, worth pursuing further.
Quantitative social work researchers are often interested in estimating the causal or treatment effects of various types of interventions. We tend to view randomized controlled trials (RCTs) as the “gold standard” when it comes to estimating such effects, and there’s a key reason for this. RCTs are based on randomly assigning units to at least two different treatments, and such assignment tends to result in the groups being balanced on variables related to treatment, which may also causally affect the outcome of interest. “Balanced” means that the average values of these other variables are equal across the different groups. The beauty of randomization is that it tends to generate this balance not just for observed variables (those we have data on) but for unobserved ones as well (those we don’t have data on).
Observed and unobserved variables related to treatment which also affect an outcome of interest are often called confounding variables. The fact that randomization tends to generate balance on potentially confounding variables means that if a researcher observes a difference (or differences) on the outcome of interest across treatment groups, they can be confident this difference is due to the intervention being evaluated instead of one or more confounding variables.
The problem many social work researchers face is that RCTs are sometimes neither ethical nor feasible, yet these researchers are still interested in estimating causal effects. A group of researchers who frequently find themselves in the same situation is economists. Drawing on developments in statistics, as well as inventing methods of their own, economists have created a subfield of economics called econometrics.
Econometrics was developed largely to address estimation of causal effects in relation to economic issues, such as the causal effect of price on quantity demanded, income on consumption, marginal tax rates on labor supply, and so on. Yet these techniques are also used in neighboring social sciences, such as sociology and political science. Although there are certainly social work researchers who are familiar with them (Guo & Fraser, 2015; Rose & Stone, 2011), these techniques don’t appear to have entered the mainstream of social work research. Perhaps this is because econometric methods aren’t typically covered in the standard statistics sequence social workers are required to take during their doctoral training but are left to specialists in quantitative methodology.
The purpose of this article is to provide an accessible overview of state-of-the-art econometric techniques that can be used to estimate the causal effects of various types of interventions. By “overview” I mean a “bird’s eye” view that neglects many of the details one would need to be familiar with to actually apply these methods. Instead of such details, I’ll focus only on the “logic” of these methods and the fundamental issues involved in using them to estimate causal effects in the absence of data from RCTs. That is, the article is intended for quantitative social work researchers who are interested in estimating causal effects but haven’t received much training in the econometric techniques involved. After reading it, such researchers will, hopefully, have enough of a background to know which methods might be useful for their work and, therefore, which ones they might want to learn more about.
Treatment or Intervention Effects
The place to begin is with the conception of causality that has come to dominate econometric approaches to estimating causal effects. This view is similar to the one found in statistics and is called the potential outcomes (PO; or sometimes counterfactual) model of causal effects (Murnane & Willett, 2011).
Let i represent a given unit of analysis (a person, organization, city, or whatever is being studied), let outcome represent some dependent variable of interest, and let intervention be some independent variable of interest. Intervention equals 1 if the case has received treatment and intervention equals 0 if it hasn’t. According to the PO model, the treatment effect of the intervention in question is:
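Written out in the notation just defined (a reconstruction from the surrounding definitions rather than a verbatim formula), Equation 1 can be stated as:

$$
\text{outcome}_i \mid (\text{intervention} = 1) \;-\; \text{outcome}_i \mid (\text{intervention} = 0) \tag{1}
$$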
The “|” in the above-mentioned expression stands for “given that.” So, according to the PO model, the treatment effect of the intervention is the difference between the value of the outcome for case i given receipt of the intervention and the value of the outcome for that same case i given nonreceipt of the intervention.
The example mentioned earlier is pretty abstract. To make things more concrete, much of this article will use the following hypothetical intervention to illustrate many of the concepts discussed. Suppose a social service agency is providing reentry services to a formerly imprisoned population. The agency hires social workers and other professionals who focus on providing these clients with services that will, hopefully, make them more employable as well as address any trauma or other experiences in prison that might make successful reentry into society more difficult. The outcome variable in this scenario is hourly earnings on a job in the formal labor market (earnings). The causal effect of interest is the impact of having received reentry services on earnings. We’ll call this variable reentry and it equals 1 for those who received such services and 0 for those who did not. Given this setup, we can write the treatment effect of interest as follows:
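Substituting earnings and reentry into the general expression (again a reconstruction from the definitions in the text), Equation 2 can be stated as:

$$
\text{earnings}_i \mid (\text{reentry} = 1) \;-\; \text{earnings}_i \mid (\text{reentry} = 0) \tag{2}
$$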
Reentry = 1 refers to having received reentry services, while reentry = 0 refers to not having received them. Equation 2 is the difference between earnings, after having left prison, given receipt of the intervention for a specific person and earnings given that the person did not receive the intervention.
Of course, for a specific individual or case, we don’t simultaneously have data on the outcome given receipt of the intervention or treatment and the outcome given no treatment. That is, for a specific case, we either have their outcome under treatment or their outcome under no treatment but not both. But, according to proponents of the PO model, it still makes sense to think about outcomes we could potentially observe if a case were assigned the intervention or not. This is the reasoning behind the name PO model (Murnane & Willett, 2011).
If an individual actually receives treatment and, therefore, we can observe the value of their outcome under treatment, then their outcome under no treatment is unobserved and is called the counterfactual outcome or, more simply, the counterfactual. If we have a case’s outcome under no treatment, then their outcome under treatment is the counterfactual.
Now suppose there are 50 former prisoners and we’re interested in the causal effect of reentry on earnings for all 50 of them. According to the PO model, there would be 50 differences like Equation 2. And it is possible that these differences could themselves differ. In other words, the causal effect of an intervention may vary among those exposed to it. Let’s call such variation treatment effect heterogeneity (TEH).
Given the possibility of TEH, we might be interested in the average treatment effect (ATE) of reentry. In theory (because all the quantities needed to actually do this aren’t observed), we could calculate this by adding up the 50 individual treatment effects and dividing by 50. More generally, the ATE could be obtained by applying the following formula:
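In the general notation, averaging the n individual treatment effects gives the following (a reconstruction; the equation number is inferred from the numbering of the later equations):

$$
\text{ATE} = \frac{1}{n}\sum_{i=1}^{n}\Big[\text{outcome}_i \mid (\text{intervention} = 1) \;-\; \text{outcome}_i \mid (\text{intervention} = 0)\Big] \tag{3}
$$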
The summation above is over the total number of treatment effects n, which is also the number of cases.
One might reasonably ask what’s the point of a model that relies on calculations involving unobserved quantities? As proponents of the PO model see it, the answer is this: The model provides a rigorous definition of treatment effect and can, therefore, shed light on what we’re attempting to do when we conduct RCTs or apply other methods in an effort to estimate the causal effects of interventions.
Consider how the PO model can help us see why RCTs can be regarded as the gold standard. The treatment effect in Equations 1 and 2 could be obtained if we could clone our cases of interest. Then, we could subject one member of a pair to the treatment and the clone of that member to the nontreatment condition. What random assignment does is approximate this by creating what might be called “an approximate clone” for each member of the treatment group. That approximate clone is a given member of the control group. Members of the control group can be regarded as clones of members of the treatment group in the sense that, thanks to random assignment, control group members have, on average, the same values as treatment group members on variables that are related to treatment and affect the outcome of interest. This means that approximations of Equations 1 and 2 can be obtained with Equations 4 and 5, respectively,
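Stated in the general outcome and intervention notation (a reconstruction from the definitions given here, with Equation 5 written as a difference in group averages), the two approximations are:

$$
\text{outcome}_i \mid (\text{intervention} = 1) \;-\; \text{outcome}_j \mid (\text{intervention} = 0) \tag{4}
$$

$$
\frac{1}{n_i}\sum_{i}\text{outcome}_i \mid (\text{intervention} = 1) \;-\; \frac{1}{n_j}\sum_{j}\text{outcome}_j \mid (\text{intervention} = 0) \tag{5}
$$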
In Equations 4 and 5, the subscript “i” denotes a given case from the treatment group and “j” denotes one from the control group. In Equation 5, n_i is the number of cases in the treatment group and n_j is the number in the control group. Thinking about RCTs with respect to the PO model makes it clear that even the gold standard is just an approximation to what we really want: the difference between the outcome for the same case under two different conditions or the average of those differences. Morgan and Winship (2010) discuss how the PO model can be extended to more than two conditions, but keeping the focus on only two serves the purpose of this article. When RCTs are well designed, participants follow study protocols, attrition isn’t a major problem, and a few other conditions hold, they provide the basis for the best approximations to ATEs we can come up with. As I said earlier, however, RCTs are sometimes neither ethical nor feasible. The rest of this article will provide an overview of approaches for estimating causal effects in the absence of RCTs.
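The logic of the PO model and of random assignment can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the sample size, the earnings distribution, and the size of the individual effects are all hypothetical, and in real data only one of the two potential outcomes is ever observed for each case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical number of former prisoners

# Both potential outcomes for every case (never jointly observed in practice).
earnings_if_untreated = rng.normal(12.0, 3.0, size=n)
individual_effect = rng.normal(2.0, 1.0, size=n)  # effects vary across cases (TEH)
earnings_if_treated = earnings_if_untreated + individual_effect

# The ATE is the average of the individual treatment effects.
true_ate = individual_effect.mean()

# Random assignment reveals only one potential outcome per case.
reentry = rng.integers(0, 2, size=n)
observed = np.where(reentry == 1, earnings_if_treated, earnings_if_untreated)

# Difference in group averages: the RCT approximation to the ATE.
rct_estimate = observed[reentry == 1].mean() - observed[reentry == 0].mean()

print(f"true ATE: {true_ate:.2f}, RCT estimate: {rct_estimate:.2f}")
```

The two numbers should be close but not identical, which is the sense in which an RCT approximates, rather than directly computes, the ATE.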
Structural Equations
Anyone who’s read the econometrics literature will have seen terms such as “multiple regression,” “regression model,” and similar ones quite frequently. Yet as Pearl (2000) and Freedman (2005) have shown, when economists use terms like multiple regression or regression models, they’re often referring to structural equations or response schedules.
A regression model is an equation that indicates an association between the average values of a given variable and the values of some other variable of interest. A structural equation or response schedule represents a modeler’s assumptions regarding a causal relationship. That is, such a model is meant to represent how the value of a given variable would change, holding all else constant, as a result of a change in another variable of interest. The phrase “as a result of” is meant to highlight the assumption that changes in the value of one variable cause changes in the value of the other, holding constant all other variables that affect this second variable. That is, if we intervened to change the value of one variable, a change in the value of the other would follow as a consequence. An example might help readers see this distinction more clearly.
Suppose we have a set of people, the amount of hours they spend working outside the home, and the amount of government income support they receive. A regression model would represent the association between average hours worked and amount of income support. A structural equation would represent the modeler’s assumption that if amount of income support were changed, hours worked would change as a result.
Economists tend to use the term regression model to refer both to regression models proper and to structural equations. And as shown by Chen and Pearl (2013), this is an unfortunate and potentially confusing practice. In an effort not to repeat it, I’ll use the term structural equation, instead of regression model, when, in fact, that’s what I’m talking about.
In social work and the social sciences, we’re often taught that “correlation doesn’t imply causality” and that we ought to be skeptical when it comes to inferring causal effects in the absence of RCTs. Economists tend to be more daring when it comes to trying to say something about causality in the absence of RCTs and structural equations play a big role in their attempts to do so (Stock & Watson, 2011).
Consider the following two equations:
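Based on the variables and symbols named in the discussion that follows, the two equations can be reconstructed as shown here. That β1 and b1 belong to reentry comes from the text; the assignment of subscripts 2 and 3 to education and work is an assumption.

$$
\text{earnings} = \alpha + \beta_1\,\text{reentry} + \beta_2\,\text{education} + \beta_3\,\text{work} + \varepsilon \tag{6}
$$

$$
\text{earnings} = a + b_1\,\text{reentry} + b_2\,\text{education} + b_3\,\text{work} + e \tag{7}
$$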
Equation 6 is a structural equation that is assumed to represent the causal relationship between reentry (the causal variable) and earnings (the effect variable), holding constant the effect of education (highest grade completed in formal schooling) and work (work experience) on earnings. In econometricians’ view, ε represents the net causal effect of all those variables which affect Earnings but which aren’t explicitly included in the model. An example of such a variable might be motivation to do well in life (Motive).
Recall that reentry is a dummy variable where 1 refers to those who received the intervention and 0 refers to those who did not. This means that β1 is the difference between average earnings for those who received the intervention versus those who didn’t, holding education and work constant. That is, β1 = (average earnings | reentry = 1) − (average earnings | reentry = 0).
Equation 7 displays estimators of the βs (the bs) and α (a) in Equation 6. An estimator can be thought of as a rule or formula that can be used for obtaining numerical estimates of certain quantities. Under a set of specific conditions discussed in standard econometrics texts, ordinary least squares regression (OLS), combined with sample data, can be used to obtain the estimators in Equation 7. And these estimators can be used to obtain numerical estimates of the βs and α in Equation 6. The e in Equation 7 is called the residual and is just the difference between the actual value of the outcome variable and the value predicted by that equation for a given case.
If we focus specifically on b1, it represents the estimator used to obtain an estimate of the causal effect of reentry on earnings. That is, b1 can be used to obtain an estimate of β1. What we want is for b1 to be a good estimator. Much work in econometrics is centered on what it means for an estimator to be good. Given the task of this article, I’ll focus on only one criterion of goodness: unbiasedness.
The first thing to be said about unbiasedness is that it isn’t a property of estimates but of estimators, that is, of the formulas or rules used to obtain estimates. An estimator is unbiased if the average value of estimates obtained with that estimator, in repeated sampling, equals the true value of the population parameter one is trying to estimate, in this case β1; it is biased if this is not the case. Unbiasedness is a desired quality of estimators because it tells us that in repeated sampling we won’t systematically overestimate or underestimate the true population parameter we’re interested in. This means we can have more confidence in the value generated by b1, as an estimate of β1, obtained in the one sample we typically have on hand. It turns out, however, that under certain conditions OLS can be a biased estimator.
The key condition, for our purposes, which must hold if OLS is to be an unbiased estimator of β1 is that Equation 7 must explicitly include any independent variable in the model which (1) has a causal effect on earnings and (2) is correlated with any other independent variable explicitly included in the model. To see what’s meant here, look at (a slightly modified) Equation 7 again:
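Following the description in the next paragraph, the modified equation simply drops the work term (a reconstruction; the symbols are carried over from Equation 7):

$$
\text{earnings} = a + b_1\,\text{reentry} + b_2\,\text{education} + e
$$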
I’ve now dropped work from the model. We said earlier (in Equation 6) that work has a causal effect on earnings. So in our attempt to estimate the causal effect of reentry on earnings, we’ve neglected to explicitly include an independent variable in the model which also has a causal effect on earnings. Assume that work and education are correlated. Then the two conditions for a biased estimator of the causal effect of reentry on earnings would be met. That is, OLS would no longer provide a way of obtaining good estimates of β1. When this situation arises from leaving an independent variable out of a model which should include it, economists call it omitted variable bias (OVB).
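A small simulation can make OVB visible. The sketch below is hypothetical throughout: the coefficients, sample size, and the way reentry depends on work are my assumptions for illustration. In particular, I assume that people with less work experience are more likely to receive reentry services, so that the omitted variable is correlated with reentry itself as well as with education.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

education = rng.normal(12.0, 2.0, size=n)
work = 0.5 * education + rng.normal(0.0, 2.0, size=n)  # work correlated with education

# Assumption: less work experience -> higher chance of receiving services.
p_reentry = 1.0 / (1.0 + np.exp(0.5 * (work - work.mean())))
reentry = (rng.random(n) < p_reentry).astype(float)

true_beta1 = 2.0  # hypothetical causal effect of reentry on hourly earnings
earnings = (5.0 + true_beta1 * reentry + 0.5 * education + 1.0 * work
            + rng.normal(0.0, 1.0, size=n))


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b1_full = ols(earnings, reentry, education, work)[1]  # work included
b1_short = ols(earnings, reentry, education)[1]       # work omitted

print(f"true beta1: {true_beta1:.2f}")
print(f"b1 with work included: {b1_full:.2f}")
print(f"b1 with work omitted:  {b1_short:.2f}")
```

With work included, b1 should land near the true value; with work omitted, b1 absorbs part of the effect of work and falls well below it.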
It might appear that OVB is easy to deal with—just explicitly include any independent variable that should be in a model. But things aren’t so simple. First, we might know which independent variables ought to be included in a model but don’t have data on them. This might be the case with motivation. That is, we might know that motivation has a causal effect on Earnings but the data set we’re working with doesn’t contain measures of motivation levels. This would render Motive an unobserved variable, assuming people differ in their levels of motivation.
Second, there may be independent variables that causally affect the outcome of interest yet we aren’t aware of this. With the social sciences being what they are, we don’t yet fully understand human behavior and the social structures humans construct and, perhaps, never will. Thus, there may be variables acting on outcomes which we’re simply totally ignorant of. If these are correlated with any of the variables explicitly included in a given model, OVB would result.
There are statistical tests, such as the Ramsey Test and the Lagrange Multiplier Test (Gujarati, 2015), which were developed to test for OVB. I won’t go into the details of these tests, however, since that isn’t the purpose of this article. Its purpose, again, is merely to raise the fundamental issues involved in using econometric methods to estimate causal effects with nonexperimental data.
Instrumental Variables
Recall the conditions stated earlier for OVB to occur: (1) an independent variable, not explicitly included in the specified model, has a causal effect on the outcome of interest and (2) that independent variable is correlated with an independent variable that is included in the model. Because the effect of the omitted variable is absorbed into the error term, the included variable ends up correlated with the error term. One method econometricians have developed to address this problem is called the instrumental variables estimator (IVE). It relies on a special variable which, following Stock and Watson (2011), we’ll call Z. Understanding the role Z plays in IVE requires one to understand two concepts used widely in econometrics: (1) endogenous variables and (2) exogenous variables.
Endogenous variables are those which are correlated with the error term in the structural equation of interest. Exogenous variables are those which are not so correlated. The IVE works by finding a variable Z, called an instrumental variable (IV), which meets two conditions.
The first is that Z must be correlated with the independent variable whose causal effect we’re trying to estimate. In the reentry services example, Z would have to be correlated with the dummy variable for treatment or intervention (1 = received reentry services and 0 = didn’t receive them). The second condition is that Z must be uncorrelated with the error term in the structural equation of interest. Recall that the error term represents the net impact of those variables not explicitly included in or omitted from the structural equation but which affect the outcome. So Z having no correlation with the error term also means no correlation with omitted variables.
The first condition is required because Z is designed to serve as a “stand in” for the variable whose causal effect we’re trying to estimate (reentry in our case). If Z is going to replace a variable whose causal effect we’re trying to estimate, then it must “behave” like that variable. That is, it must be related to the causal variable of interest.
The second condition is necessary because if Z is to replace the causal variable of interest it can’t have the same problem that variable does. That problem is correlation between it and the error term. Notice that the second condition Z must meet to serve as an IV is that it is uncorrelated with the error term; that is, it is uncorrelated with any omitted variables in the structural equation model of interest. Thus, using it, instead of the causal variable of interest, doesn’t present an OVB problem.
Assuming an IV has been found, an estimator called two-stage least squares (2SLS) can be used to estimate the effect of a causal variable of interest. In the first stage, the causal variable of interest is regressed on the IV as well as any covariates that would be included in the structural equation for the treatment effect of interest. In other words, the variable whose causal effect we’re interested in estimating becomes a dependent variable in the first-stage model. In the second stage, the actual dependent variable of interest is regressed on the predicted values of the causal variable from the first-stage model as well as covariates thought to have an impact on the original outcome of interest. Applying all this to our prison reentry example may make it easier to see what’s going on.
Suppose whether or not an inmate ends up getting reentry services depends on their birthdate. Those born on a day of the month that’s an even number get to receive such services. Those born on an odd day of the month do not. Suppose we created a variable called Birth which equals 1 for those born on an even day of the month and 0 for those born on an odd day. In this case, Birth would be correlated with reentry, our treatment variable. The first condition of an IV would be met. If Birth were uncorrelated with the error term in the structural equation for the treatment effect of reentry on earnings, it would also meet the second condition of an IV.
Applying 2SLS, step one would be:
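A first-stage equation consistent with this description is given here as a reconstruction. The symbols c, d1, and u are my own labels, chosen so they don’t clash with a, b1, and e in the earnings equations.

$$
\text{reentry} = c + d_1\,\text{Birth} + \text{covariates} + u \tag{9}
$$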
In Equation 9, covariates refers to terms representing the effects of observed variables thought to affect earnings other than reentry. In other words, covariates refers to those variables that will be explicitly included in the model of the effect of reentry on earnings. The key is that they must be included in the first-stage model as well.
Step two in 2SLS would be:
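A second-stage equation consistent with the discussion that follows is reconstructed here, with reentrypred written with “pred” as a subscript:

$$
\text{earnings} = a + b_1\,\text{reentry}_{\text{pred}} + \text{covariates} + e \tag{10}
$$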
In Equation 10, reentrypred refers to the predicted values of reentry from the Stage 1 model. Since reentry is a binary variable, logistic regression could be used at the first stage and predicted probabilities could be saved from that model. Those predicted probabilities would be the reentrypred values at Stage 2. Reentrypred takes the place of reentry at Stage 2; it carries only the variation in reentry associated with our IV, Birth, and the covariates. Covariates at Stage 2 would be the same ones used at Stage 1.
By using reentrypred at Stage 2, the numerical value obtained from the estimator b1 would be an unbiased estimate of the effect of reentry on earnings because reentrypred has been “purged” of any association between reentry and omitted variables. That is, reentrypred can be used to estimate the effect of treatment on earnings because it uses only that part of treatment associated with Birth and the covariates from the original treatment model. It is not “contaminated” by unobserved or omitted variables that affect earnings.
Astute readers have probably noticed that as a predicted probability, reentrypred is no longer a binary or dummy variable. This “problem” can be fixed as follows. We could recode reentrypred by stipulating that anyone with a value less than 0.5 gets a 0 on our recoded version. Anyone with 0.5 or greater would get a recoded value of 1. So our recoded reentrypred would now be a dummy variable. We could then run the model in Equation 10 with this recoded version of reentrypred, and this would make the numerical value for b1 an estimate of the difference in average earnings between cases with a recoded value of 1 and cases with a recoded value of 0, holding the covariates constant.
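The two stages can be sketched in a few lines of code. Everything in this example is hypothetical: the data are simulated, Birth only raises the chance of receiving services rather than fully determining it, and an unobserved Motive variable affects both reentry and earnings. I also use a linear (OLS) first stage, which is the textbook form of 2SLS, rather than a logistic one. The standard errors from a hand-rolled second stage like this one would not be correct, so the sketch reports point estimates only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

birth = rng.integers(0, 2, size=n).astype(float)  # instrument: 1 = even birth day
motive = rng.normal(0.0, 1.0, size=n)             # unobserved omitted variable
education = rng.normal(12.0, 2.0, size=n)         # observed covariate

# Assumption: Birth and Motive both raise the chance of receiving services.
latent = -1.0 + 2.0 * birth + 1.0 * motive + rng.normal(0.0, 1.0, size=n)
reentry = (latent > 0).astype(float)

true_beta1 = 2.0  # hypothetical causal effect of reentry on hourly earnings
earnings = (5.0 + true_beta1 * reentry + 0.5 * education + 1.5 * motive
            + rng.normal(0.0, 1.0, size=n))


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Naive OLS: Motive is omitted and correlated with reentry, so b1 is biased.
b1_ols = ols(earnings, reentry, education)[1]

# Stage 1: regress reentry on the instrument and the covariate.
c, d1, d2 = ols(reentry, birth, education)
reentry_pred = c + d1 * birth + d2 * education

# Stage 2: regress earnings on the predicted values and the same covariate.
b1_2sls = ols(earnings, reentry_pred, education)[1]

print(f"true beta1: {true_beta1:.2f}")
print(f"naive OLS b1: {b1_ols:.2f}")
print(f"2SLS b1:      {b1_2sls:.2f}")
```

In this setup the naive estimate should overstate the effect, since motivated people both receive services more often and earn more, while the 2SLS estimate should land near the true value.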
So far, I’ve focused on the case of one endogenous variable whose causal effect we’re trying to estimate. However, 2SLS can be used with more than one endogenous variable, and Stock and Watson (2011) provide an accessible treatment regarding how such an extension can be carried out. They also discuss other key issues involved in the use of IVs, such as those having to do with how instruments meeting the two conditions spelled out above are found as well as the problem of weak instruments. A weak instrument is one which isn’t strongly associated with the variable whose causal effect we want to estimate. In our case, Birth would be a weak instrument if it weren’t strongly associated with reentry. See Stock and Watson (2011) for further discussion.
Difference in Differences
Readers of this journal are likely to be familiar with pretest/posttest research designs that employ comparison groups when RCTs aren’t an option. Relating such a design to our reentry example, we could gather average earnings data for men who’ll later receive reentry services at a point before they’ve received them and gather average earnings data at the same point in time for men who won’t receive such services. This would be pretest data. Next, one group of men would get reentry services while the other would not, and we’d then gather average earnings data for both groups after the intervention group has received reentry services. This would be posttest data. The difference in differences (DID) strategy is ideal for estimating causal effects in such situations.
To understand how it works, some notation is required. Let posttest outcome_treat be the average value of the outcome variable for those who received the intervention, measured after they received it. Pretest outcome_treat refers to the average value of the outcome for those who’ll receive the intervention, measured before they’ve actually received it. Posttest outcome_comparison and pretest outcome_comparison are defined similarly for the comparison group.
DID estimates the causal effect of a treatment by taking the following difference:
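Using the four averages just defined, the difference can be written as follows (a reconstruction from those definitions):

$$
\text{DID} = \big(\text{posttest outcome}_{\text{treat}} - \text{pretest outcome}_{\text{treat}}\big) - \big(\text{posttest outcome}_{\text{comparison}} - \text{pretest outcome}_{\text{comparison}}\big)
$$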
or, using our specific example,
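With average earnings as the outcome, the same difference becomes the following (again a reconstruction; the subscript labels “reentry” and “comparison” are my own):

$$
\text{DID} = \big(\text{posttest earnings}_{\text{reentry}} - \text{pretest earnings}_{\text{reentry}}\big) - \big(\text{posttest earnings}_{\text{comparison}} - \text{pretest earnings}_{\text{comparison}}\big)
$$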
As you can see, this is a difference between two differences, which is what gives the approach its name. If an intervention causes an increase in the average value of the outcome over time, then the value of the first difference should be significantly larger than the value of the second one. If an intervention causes a decrease in the average value of the outcome, then the value of the first difference should be significantly smaller than the value of the second one.
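The arithmetic is simple enough to show directly. The four average hourly earnings figures below are invented purely for illustration, and the function name is my own.

```python
def did(post_treat, pre_treat, post_comparison, pre_comparison):
    """Difference in differences of four group averages."""
    change_treat = post_treat - pre_treat                  # first difference
    change_comparison = post_comparison - pre_comparison   # second difference
    return change_treat - change_comparison


# Hypothetical average hourly earnings (dollars).
estimate = did(post_treat=15.0, pre_treat=11.0,
               post_comparison=12.5, pre_comparison=11.5)
print(estimate)  # (15.0 - 11.0) - (12.5 - 11.5) = 3.0
```

Here earnings rose by 4 dollars in the reentry group and by 1 dollar in the comparison group, so the DID estimate of the effect of reentry services is 3 dollars per hour.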
The way I’ve presented the DID approach so far facilitates explaining how the approach works. Econometricians usually employ the approach in equation form but there’s no need to do so here for our purposes. Readers interested in seeing this equation form should consult Stock and Watson (2011).
A key strength of the DID approach is that it controls for the effects of unobserved omitted variables on the outcome of interest as long as these variables don’t change over time. Its key limitation then is that it doesn’t control for the effects of time-varying unobserved variables. See Stock and Watson (2011) or Angrist and Pischke (2015) for further details.
Regression Discontinuity
Imagine the following hypothetical situation. A researcher is interested in evaluating the impact of a nutrition education program. The goal of the program is to increase the health status of those who participate in it as measured by some valid and reliable indicator of how healthy a person is. Suppose, due to scarce resources, only those whose incomes fall below the poverty line qualify for the nutrition education program. This type of scenario is ideal for application of the regression discontinuity (RD) approach or design.
The logic of RD is relatively simple. Instead of randomly assigning participants to treatment and control groups, some rule or other mechanism does the assigning. This assignment is done on the basis of a variable called the forcing variable (Murnane & Willett, 2011). In our example, income is the forcing variable since eligibility for the nutrition education program is determined by whether one’s income qualifies them as poor.
The fundamental assumption on which RD is based is that cases on either side of the forcing variable cutoff, but very close to it, are similar in all other respects that might affect the outcome in question other than the intervention of interest. In our example, that assumption amounts to believing that those right above the poverty line and those right below it are similar in all ways that might affect health status besides exposure to the nutrition education program. To be more specific, some of these similarities might have to do with whether one smokes, exercise habits, exposure to stress, and so on.
If this assumption is correct, we can think of assignment to treatment according to value on the forcing variable as analogous to random assignment. So if we calculate the difference between average outcomes for cases at or right above the cutoff, compared to those right below it, this can be viewed as an estimate of the effect of the intervention in question.
Applying all this to our nutrition education example, assigning people to receive nutrition education based on whether they are poor is akin to random assignment, as long as we compare cases close to the poverty line. Thus, if we calculated the difference between the average health status of those at or right above the poverty line and the average health status of those right below it, we’d have an estimate of the effect of the nutrition education program on health status. All of this might be clearer if it’s shown graphically.
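The comparison of cases just below and just above the cutoff can be sketched with simulated data. All of the numbers here (the poverty line, the income range, the size of the program effect, and the bandwidth around the cutoff) are hypothetical choices of mine. I compute the difference as below minus above, so a beneficial program shows up as a positive number.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
poverty_line = 25_000.0  # hypothetical cutoff on the forcing variable

income = rng.uniform(5_000.0, 45_000.0, size=n)
nutrition = (income < poverty_line).astype(float)  # sharp rule: below the line -> program

true_effect = 5.0  # hypothetical program effect on the health status score
health = (40.0 + 0.0005 * income + true_effect * nutrition
          + rng.normal(0.0, 4.0, size=n))

# Keep only cases within a narrow window around the cutoff.
bandwidth = 1_000.0
just_below = (income >= poverty_line - bandwidth) & (income < poverty_line)
just_above = (income >= poverty_line) & (income < poverty_line + bandwidth)

rd_estimate = health[just_below].mean() - health[just_above].mean()

print(f"true effect: {true_effect:.2f}, RD estimate: {rd_estimate:.2f}")
```

Because health status also rises with income in this simulation, the simple comparison of averages is slightly off even inside the window; a narrower bandwidth shrinks that gap at the cost of fewer cases.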
Figure 1 is a display of the RD approach. The x-axis displays values of the forcing variable, in this case Income. The term “Poverty” is pointing to the poverty line on the x-axis. The y-axis displays values of Health Status. On the left side of the line running down the middle of the graph would be cases who’ve received nutrition education, and the trend in their health status is the positively sloping line on the left side of that middle line. Cases on that middle line and above it would not receive such education, and the positively sloping line on the right of the middle line captures the trend in their health status. Notice that there’s a jump or “discontinuity” at the poverty line. This indicates that those at or above poverty tend to have lower health status than those below poverty, and granting the assumptions on which RD is based, this is due to the nutrition education program. In other words, this jump is a graphical indication of the treatment effect. The RD approach can be represented algebraically, but, for our purposes, there’s no need to do so.

Figure 1. Regression discontinuity of nutrition education on health status.
Recall that the assumption of RD is that cases right around the threshold value of the forcing variable are alike in all respects that might affect the outcome of interest other than the treatment or intervention. This raises the question of how far we can move away from the threshold before this assumption is no longer the case, granting that it is near the threshold. There’s a literature which addresses this issue in some detail, and Murnane and Willett (2011) would be a good place to start.
So far, we’ve been discussing what’s called a sharp RD approach or design. This applies when cases actually comply with their assignment. But, of course, human beings don’t always comply with the wishes of researchers and sometimes refuse to accept treatment when assigned to it or refuse to accept the comparison condition when assigned to it. The fuzzy RD approach or design is intended to address this situation. It is called fuzzy because whether or not someone actually receives the intervention isn’t strictly determined by their value on the forcing variable. See Imbens and Lemieux (2008) for further discussion of the distinction between sharp and fuzzy RD approaches as well as how to estimate causal effects with fuzzy designs.
Conclusion
Social work researchers are frequently interested in estimating the causal effects of interventions. The ideal way to do so is to conduct RCTs, yet there are times when these are neither ethical nor feasible. This article has provided a bird’s eye view of some ideas from econometrics that social work researchers might find helpful when trying to estimate causal effects using nonexperimental data. None of these ideas are a “magic bullet.” Causal inference is hard even when one has conducted an RCT, given that research participants sometimes fail to follow treatment protocols, they’re sometimes lost to follow-up, it is often a challenge to design RCTs with both internal and external validity, and so on. When RCTs can’t be conducted, causal inference becomes that much harder. However, this should not lead researchers to despair. It should, instead, lead us to recognize the limitations of the approaches we employ, be clear about the assumptions on which various methods are based as well as their plausibility in specific situations, test those assumptions when possible, and retain a fair dose of humility.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
