Abstract
The widely used Oaxaca decomposition applies to linear models. Extending it to commonly used nonlinear models such as duration models is not straightforward. This paper shows that the original decomposition that uses a linear model can also be obtained by an application of the mean value theorem. By extension, this basis provides a means of obtaining a decomposition formula which applies to nonlinear models which are continuous functions. The detailed decomposition of the explained component is expressed in terms of what are usually referred to as marginal effects. Explicit formulae are provided for the decomposition of some nonlinear models commonly used in applied econometrics including binary choice, duration and Box-Cox models.
Introduction
Much applied work in economics is devoted to analyzing the sources of differences between individuals and groups. The Oaxaca decomposition [15] is a method of expressing the difference between the mean values of a variable – usually the logarithm of earnings – for two groups based on the coefficients obtained from two group-specific linear regressions.1 The difference is expressed in terms of two components that contribute to the divergence in group means: an explained part or ‘composition effect’ due to differences in the mean characteristics of the two groups, and an unexplained component or ‘structure effect’ due to differences in the estimated coefficients in the group equations. A very similar decomposition was proposed by Blinder [8] in the same year but after the publication of Oaxaca’s article.2 The technique was originally developed in order to establish the existence and extent of wage and other forms of discrimination and is widely used in labour economics and to some extent in other areas. It can also be applied to analyze group differences in general. Surveys of this and other decomposition methods are provided by Beblo et al. [7] and Fortin et al. [12].
Attempts have been made to use the Oaxaca approach to decompose group differences using specific nonlinear models, such as the logit and probit models [14, 22, 10, 16], hazard or duration models [20, 17] and Tobit-type models [13, 24, 21]. More recently, Bauer and Sinning [5], Schwiebert [19] and Kaiser [11] have proposed a generalization of the Oaxaca approach based on the sample means of estimated functions for nonlinear specifications. This method will be shown to be problematic for the identification of certain components of interest defined in the Oaxaca-linear approach. In particular, in the original version, the existence of discrimination is based on assuming that two groups have the same mean characteristics. The approaches mentioned above are not formulated on this kind of counterfactual basis.
This paper proposes an Oaxaca-type decomposition for any continuous nonlinear model. It uses as a basis the difference between two fitted values, which is decomposed into a composition and a structure effect. It is obtained through an application of the mean value theorem and the resulting decomposition is exact in the sense that there is no remainder even though the model is nonlinear. The paper begins in Section 2 with a brief examination of the basis of the Oaxaca decomposition and then explores some of the difficulties encountered when seeking to generalize this approach to nonlinear relations in Section 3. In the following section, it is shown that the Oaxaca decomposition can be obtained by the application of the mean value theorem to the estimated relation for one of the groups being compared. The application of the theorem is then used as a means of obtaining a decomposition technique which can be used with any continuous nonlinear function. In Section 5, explicit forms are derived for the decomposition of some widely used nonlinear models for binary choice and duration analysis, along with a model using the Box-Cox transformation. Empirical examples of each of these are presented.
Interpretations of the Oaxaca decomposition
The original Oaxaca decomposition has a certain number of features which are inextricably linked to the linear regression model, and which limit the extent to which the method can be directly generalized. It applies to an explicitly linear framework in which the dependent variable for member
The Oaxaca decomposition is obtained by first estimating the parameters using ordinary least squares (OLS) to obtain
and adding and subtracting this counterfactual term, results in the following additive decomposition:
The first term on the right hand side is the unexplained component or structure effect – that is, what the person with mean characteristics in group
The decomposition is model-based. A model is specified to determine the value of The original focus was on the decomposition of differences in sample means,
The presence of a constant ensures that the sum and therefore the mean of the estimated OLS residuals,
It is this equality that permits the decomposition of the difference in means,
Although it was not presented in this form originally, it is common nowadays to express the decomposition in terms of the expectations of variables for the population relationships (for example, Fortin et al. [12], and Rothe [18]). The decomposition is based on the parameters of a linear specification Eq. (1). The Oaxaca decomposition at the population level is:
since, by assumption, Properties (ii) and (iii) differ since the sample mean of the estimated residual, The Oaxaca decomposition is subject to an index number problem. If the difference is calculated around
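The linear decomposition and its index number problem can be illustrated numerically. The following is a minimal sketch on simulated data: the group labels A and B, the data-generating values and the choice of which group's means enter the counterfactual are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, mean_x, beta):
    """Simulate one group: a constant plus one regressor, linear outcome."""
    x = rng.normal(mean_x, 1.0, size=n)
    X = np.column_stack([np.ones(n), x])
    y = X @ beta + rng.normal(0.0, 0.5, size=n)
    return X, y

# Hypothetical groups A and B (all values are illustrative).
X_a, y_a = simulate(500, 12.0, np.array([1.0, 0.10]))
X_b, y_b = simulate(400, 11.0, np.array([0.8, 0.08]))

# Group-specific OLS estimates.
b_a = np.linalg.lstsq(X_a, y_a, rcond=None)[0]
b_b = np.linalg.lstsq(X_b, y_b, rcond=None)[0]

xbar_a, xbar_b = X_a.mean(axis=0), X_b.mean(axis=0)
gap = y_a.mean() - y_b.mean()

# Counterfactual 1: group B's mean characteristics valued at A's coefficients.
structure = xbar_b @ (b_a - b_b)        # unexplained component
composition = (xbar_a - xbar_b) @ b_a   # explained component

# Counterfactual 2 (index number problem): the other reference point.
structure_alt = xbar_a @ (b_a - b_b)
composition_alt = (xbar_a - xbar_b) @ b_b

print(gap, structure + composition, structure_alt + composition_alt)
```

Both versions sum exactly to the gap in sample means because each regression contains a constant, but the split between the two components generally differs between them.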
Extending the Oaxaca (linear) approach to nonlinear relations is not straightforward. First, OLS cannot generally be applied due to the presence of nonlinearities in the relation. The decomposition will not have the original Oaxaca form. Furthermore, as has been pointed out above, the decomposition has certain properties that are related explicitly to the numerical properties of least squares estimation and these will no longer apply. Second, and more importantly, when applied to nonlinear models, an Oaxaca-type decomposition of differences in either sample means or expectations of the left hand side variable will not be exact, and so neither of properties (i) and (ii) carries over to nonlinear functions. This is due to Jensen’s inequality, a consequence of which is that, in general, for a nonlinear function
Even an exact Oaxaca-type decomposition at the population level in terms of expectations, as in Eq. (5), will not be obtained in general.3 Due to the (near4) impossibility of obtaining an exact decomposition of the group difference in sample means for nonlinear models in terms of the group means of the explanatory variables, the basis for a decomposition using a nonlinear model needs to be rigorously specified.
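The consequence of Jensen's inequality can be seen in a small numerical sketch. The exponential function and the simulated regressor below are illustrative assumptions, chosen only because the exponential is strictly convex.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, size=10_000)   # hypothetical explanatory variable

# Nonlinear (strictly convex) function: mean of F(x_i) versus F(mean of x_i).
mean_of_function = np.exp(x).mean()
function_of_mean = np.exp(x.mean())

# Affine function: the two quantities coincide.
def affine(z):
    return 0.3 + 0.2 * z

affine_mean_of_function = affine(x).mean()
affine_function_of_mean = affine(x.mean())

print(mean_of_function, function_of_mean)
print(affine_mean_of_function, affine_function_of_mean)
```

For the convex function the sample mean of the function values exceeds the function evaluated at the sample mean, so a decomposition written in terms of group means of the explanatory variables cannot reproduce the difference in sample means exactly.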
Call the estimated functions or fitted values for each group
This implies that the group difference in any of these means can be used as basis for a decomposition in the affine case. When the function is nonlinear these three quantities are not identical. Thus when extending the Oaxaca approach to nonlinear relations, the possible candidates as a basis are the decomposition of the difference in:
(a) the sample means of the left hand side variable;
(b) the sample means of the fitted values of the estimated functional relationship;
(c) the values of the group estimated functions (or fitted values) evaluated at the means of the right hand side variables for that group.
These different bases will not be equal and therefore a choice has to be made. In view of Jensen’s inequality, basis (a) is unlikely to prove fruitful for a generalization. Even in the linear regression case, (a) is appropriate only when the relation contains a constant. The earlier approaches of Nielsen [14] and Yun [23] and more recently Bauer and Sinning [5], Schwiebert [19] and Kaiser [11] propose using basis (b). This produces a decomposition of the differences in the sample means of the fitted values (or equivalently the sample means of the estimated function):
where
The logic of this choice is clear in that in population terms, this corresponds to a decomposition of the following difference
However, there are at least two reasons why Eq. (6) may be unsatisfactory as a generalization of the Oaxaca method. Firstly, if the functions
Using (b) as a basis therefore entails disconnecting the decomposition from the mean vectors
A decomposition using basis (b) is therefore not generally expressed in terms of the means of the variables
An alternative approach can be derived from the original “Oaxaca-linear” method which is applicable to both linear and nonlinear functions using basis (c),
Applying the mean value theorem to one of the functions over the intervals between
where
The first component on the right hand side is immediately recognizable as the structure effect – for identical mean characteristics, what is the model’s prediction of the difference in
In the case of one explanatory variable, this form of the decomposition can be presented graphically as in Fig. 1. The segment BE is parallel to the tangent representing
The decomposition obtained by applying the mean value theorem permits an exact detailed decomposition of the composition effect once the vector
An alternative approach is to calculate the vector of derivatives of the function,
Graphical representation of the decomposition for a nonlinear function.
Graphical representation of the Oaxaca decomposition for a linear model.
where
Proof Define the function:
The derivative of the function
Integrating this derivative across the range of
In this formulation, the individual contribution of each variable (
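The integral form of the detailed decomposition can be checked numerically. The sketch below assumes a hypothetical two-variable function that is not defined on a linear index, with made-up vectors standing in for the group means; the partial derivatives are averaged along the straight line joining the two vectors using Gauss-Legendre quadrature.

```python
import numpy as np

def F(x):
    """Hypothetical nonlinear function, not defined on a single linear index."""
    return np.exp(0.2 * x[0]) + x[0] * x[1] ** 2

def grad_F(x):
    """Analytical partial derivatives (marginal effects) of F."""
    return np.array([
        0.2 * np.exp(0.2 * x[0]) + x[1] ** 2,
        2.0 * x[0] * x[1],
    ])

x_a = np.array([2.0, 1.5])   # stand-in for one group's mean vector
x_b = np.array([1.0, 1.0])   # stand-in for the other group's mean vector
dx = x_a - x_b

# Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
nodes, weights = np.polynomial.legendre.leggauss(20)
t = 0.5 * (nodes + 1.0)
w = 0.5 * weights

# Average marginal effect of each variable along the path from x_b to x_a.
avg_grad = sum(w_i * grad_F(x_b + t_i * dx) for t_i, w_i in zip(t, w))

# Contribution of each variable to the composition effect.
contributions = dx * avg_grad

print(contributions, contributions.sum(), F(x_a) - F(x_b))
```

The contributions sum to the difference in function values up to quadrature error, which is the sense in which the decomposition has no remainder.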
In the linear model, the marginal effect is constant and the Oaxaca approach is a special case of the decomposition presented here.
Proof If the model is linear,
The method can be simplified when a nonlinear function is defined on a linear index, so that
where
Given this tautology, there is no need to determine the elements of the vector
The weights in the detailed decomposition in fact resemble those in Yun [23], but are applied to a different basis.7 This common feature is a consequence of the function being defined on a single linear index.
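For a function defined on a single linear index, the proportional weighting can be sketched as follows. The logistic function, the coefficient values, the group means and the choice of counterfactual (one group's means combined with the other group's coefficients) are all assumptions for illustration.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical estimates and group means; the first element is the constant.
beta_a = np.array([-1.0, 0.15, -0.30])
beta_b = np.array([-1.4, 0.10, -0.20])
xbar_a = np.array([1.0, 12.5, 1.0])
xbar_b = np.array([1.0, 11.0, 2.0])

# Difference in fitted values evaluated at the group means.
gap = logistic(xbar_a @ beta_a) - logistic(xbar_b @ beta_b)

# Counterfactual: group B's mean characteristics with group A's coefficients.
counterfactual = logistic(xbar_b @ beta_a)
structure = counterfactual - logistic(xbar_b @ beta_b)
composition = logistic(xbar_a @ beta_a) - counterfactual

# Proportional weights: each variable's share of the change in the index.
index_change = (xbar_a - xbar_b) * beta_a
weights = index_change / index_change.sum()
detailed = weights * composition

print(gap, structure, composition, detailed)
```

Because the function depends on the variables only through the index, no integral has to be evaluated: each variable's contribution is its share of the change in the index multiplied by the composition effect.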
The proposed decomposition has the advantage of having a coherent basis – it compares a model-based estimate of an actual situation with a model-based estimate of a counterfactual one, where both are specified in terms of a parametrically defined function and the vectors of group means (
Decomposing augmented linear models such as the sample selection model has already been addressed by Neumann and Oaxaca [13] based on mean characteristics, which has some similarities with the approach proposed here, although they stress the importance of how one interprets the selectivity term.8 Other functions of interest in applied work are probability models (in which the population rate is decomposed) and hazard models (which involve either the hazard itself or the average duration of a spell). In this section, we derive explicit formulae for the decomposition of these types of model using Eq. (10). Hereafter, any parameter covered by a hat (for example,
Logit and probit models
Logit and probit models have the same generic form for each of the groups:
where
where
where
Unlike the probit model, the function to be decomposed in the logit model has a closed form:
For maximum likelihood estimates of
where
where, tautologically,
Various authors have attempted to decompose the difference in sample means using logit and probit models [14, 22, 23, 10]. In fact for a logit model containing a constant term, when the parameters are estimated by maximum likelihood, the sample mean is related to the estimated function in the following way:10
This mean property has been used to obtain a decomposition for the logit model given by:
Note that this decomposition contains the sample means of the dependent variable but not the means of the right hand side variables,
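The mean property of the logit model can be verified on simulated data. The sketch below fits a logit with a constant by Newton-Raphson; the data-generating values are hypothetical.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])            # illustrative values
y = (rng.uniform(size=n) < logistic(X @ true_beta)).astype(float)

# Newton-Raphson iterations for the logit log-likelihood.
beta = np.zeros(2)
for _ in range(25):
    p = logistic(X @ beta)
    score = X.T @ (y - p)
    hessian = -(X * (p * (1.0 - p))[:, None]).T @ X
    beta = beta - np.linalg.solve(hessian, score)

p_hat = logistic(X @ beta)

mean_y = y.mean()                             # sample mean of the outcome
mean_fitted = p_hat.mean()                    # sample mean of fitted values
fitted_at_means = logistic(X.mean(axis=0) @ beta)

print(mean_y, mean_fitted, fitted_at_means)
```

The first two quantities agree to numerical precision because the first order condition for the constant sets the sum of the residuals to zero. The fitted value at the mean of the regressors is a different number, which is why a decomposition built on sample means does not involve the means of the right hand side variables.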
Hazard functions and duration models
One of the key differences with duration models is that in most data sets, durations are censored at the time of the survey. This is the case, for example, with unemployment durations in the Labour Force Surveys used to estimate the unemployment rate according to the ILO definition. In order to analyse differences in unemployment duration or hazard rates between groups, using the difference in sample means as the basis for an Oaxaca-type decomposition is not appropriate because of censoring. Most econometric analyses take account of censoring in the estimation of models, but there is an issue of which quantity is to be decomposed. It is in this context that the approach proposed here is particularly relevant. By using the fitted value corresponding to mean characteristics as the basis, the decomposition can be straightforwardly obtained.
Using the same notation as above, where
where
where the survivor function, and thus the hazard function, is linked to the average completed spell duration through the following equality (see for example, Baker and Trivedi [3]):
In what follows, the link between a parametric hazard specification and the corresponding formula for the expected duration is used to obtain decompositions for two popular hazard specifications – the Weibull and loglogistic.
One of the more widely used parametric specifications of the function is the Weibull hazard given by:
In this case, the expected duration of a completed spell (
The first term on the right hand side is the gamma function and is independent of both
where the fixed scalar
Tautologically, this is equal to:
This specification contains the exponential specification as a special case when
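The link between the hazard and the expected duration, and the resulting decomposition, can be sketched as follows. The parameterization of the Weibull hazard used here, alpha * t**(alpha - 1) * exp(x'beta), is an assumption and may differ from the one in Eq. (11); under it the expected duration is Gamma(1 + 1/alpha) * exp(-x'beta / alpha). All parameter values and group means are hypothetical.

```python
import numpy as np
from math import gamma

def expected_duration(x, beta, alpha):
    """E[T] under the assumed hazard alpha * t**(alpha - 1) * exp(x @ beta)."""
    return gamma(1.0 + 1.0 / alpha) * np.exp(-(x @ beta) / alpha)

def survivor(t, x, beta, alpha):
    """Survivor function implied by the same assumed hazard."""
    return np.exp(-(t ** alpha) * np.exp(x @ beta))

# Hypothetical estimates; columns: constant, years of education, education lag.
alpha_a, beta_a = 1.2, np.array([-4.5, 0.10, -0.15])
alpha_b, beta_b = 1.1, np.array([-4.0, 0.08, -0.10])
xbar_a = np.array([1.0, 12.0, 2.0])
xbar_b = np.array([1.0, 13.0, 1.5])

# Check the closed form against the integral of the survivor function
# (trapezoidal rule on a fine grid).
t = np.linspace(0.0, 2000.0, 200_001)
s = survivor(t, xbar_a, beta_a, alpha_a)
numeric = np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(t))

d_a = expected_duration(xbar_a, beta_a, alpha_a)
d_b = expected_duration(xbar_b, beta_b, alpha_b)

# Counterfactual: group B's mean characteristics with group A's parameters.
counterfactual = expected_duration(xbar_b, beta_a, alpha_a)
structure = counterfactual - d_b
composition = d_a - counterfactual

# Detailed decomposition using proportional weights on the linear index.
index_change = (xbar_a - xbar_b) * beta_a
detailed = index_change / index_change.sum() * composition

print(numeric, d_a, structure, composition, detailed)
```

With the shape parameter held fixed within a group, the expected duration is a function of a single linear index, so the composition effect is allocated across variables by proportional weights.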
The Weibull specification applies only to cases where the hazard rate is monotonic – it is either increasing,
When
where
The decomposition will thus have a similar form to the Weibull specification.
Uncertainty over the nature of duration dependence in the presence of unobserved heterogeneity has led to the use of specifications involving mixtures of distributions. A final example of a widely used specification is the mixture Weibull hazard function with gamma heterogeneity. The hazard specification is:
where
and the decomposition will have the same form as the two earlier specifications with
The data used come from the French Generation 2004 survey, which follows a cohort of individuals leaving the education system in 2004. The age of the person in that year is obviously related to the number of years spent in the education system. However, in France the correspondence between educational attainment in terms of the highest diploma obtained and the age at which the person leaves the system is clouded by the widespread phenomenon of spending more than one year in a particular grade. For example, many university students take their first year twice over. The same occurs lower down the education ladder, where a pupil may spend two years in a particular grade (some pupils even skip a grade). When analyzing access to permanent employment, this lag acts as a signal to employers. The average education lag in the sample is more than two years (see Table 1). The duration until finding a permanent job is modelled as a function of two education variables: educational attainment, measured as the theoretical number of years necessary to obtain a given diploma, and the education lag. In addition, the overall unemployment rate in the geographical locality of the person’s domicile in 2004 is used to measure the influence of the state of the labour market. The duration variable used is the number of months following exit from the education system.
Means and standard deviations of variables used in duration analysis
A second phenomenon often associated with difficulty finding a permanent job among young persons is cultural and ethnic origin, and specifically whether the person has parents who are immigrants. In the sample used, 16% have parents who are not of French origin. There are differences in educational attainment and education lag that also suggest that children of immigrants are likely to fare less well in the labour market. In addition to these factors there may also be discrimination in the recruitment of young persons which favours those whose parents are not immigrants. We therefore use the proposed decomposition to quantify the different components of the difference in durations between the two groups of young persons.
The decomposition uses a model-based estimate of the mean duration for each group and decomposes the difference between these. In the current case, we assume that the hazard function is of the Weibull form – Eq. (11) – and the corresponding expected duration is given above in Eq. (12). The parameters are in fact obtained using an accelerated failure time model which is estimated separately for the two groups, and the results are presented in Table 2.12 The estimated Weibull shape parameters indicate that the hazard function is increasing with duration. The other estimated coefficients suggest that more education, a shorter education lag and a lower local unemployment rate all reduce the duration, and more so for children of immigrants than for their French counterparts. There is however a large difference between the estimated constant terms for the two groups, which suggests that there is discrimination in access to employment in favour of those of French origin.
Accelerated failure time model estimates for duration analysis
The decomposition of the estimated expected duration (in months and not logarithms) is undertaken using the average French origin characteristics in the counterfactual. The difference to be decomposed is the difference between two model-based estimates of the average duration corresponding to the mean characteristics of the respective groups. This is 15.2 months (see Table 3). The structural component of this gap is 13.2 months – or 87%. The composition effect, the part due to differences in characteristics, therefore accounts for only a minor part of the gap (2 months). The detailed decomposition of the composition effect suggests that improving the educational performance (on both fronts) of children whose parents are immigrants will reduce the expected duration to obtaining a permanent job. However, this is limited to the extent that for identical characteristics, those with immigrant parents are at a disadvantage.
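The shares quoted above follow directly from the two reported figures. A short arithmetic check, using only the numbers given in the text:

```python
gap = 15.2        # months: difference in model-based expected durations
structure = 13.2  # months: structural component

composition = gap - structure        # part due to differences in characteristics
structure_share = structure / gap    # share of the gap that is structural

print(round(composition, 1), round(100 * structure_share))
```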
A decomposition of group difference in duration (in months)
The presence of a linear index in a nonlinear function enables the proportional weighting factor
In the case of both dependent and independent variables being transformed, the model to be estimated with only one explanatory variable is:
In order to obtain the conditional expectation of
Following Abrevaya and Hausman [1], a J-th term Taylor expansion around
where
This simplifies in the case when
and for
In both cases the decomposition will not simplify in the same way as for nonlinear models defined on a linear index and so the integral in Proposition 1 will have to be explicitly evaluated.
Means and standard deviations of variables used in earnings equations
Data from the 2005 French Labour Force Survey are used to examine earnings differences between males and females. The sample includes individuals aged 20 to 54 who declare earnings and hours worked enabling an hourly wage to be calculated. The earnings equations are of the form:
where
Male and female earnings equations with Box-Cox transformations
For both sexes, the Box-Cox transformations are significant (
Decomposition of gender differences in hourly earnings (2
By recognizing that the Oaxaca technique can be obtained by an application of the mean value theorem, a new decomposition method applicable to nonlinear models is proposed. For continuous nonlinear functions, the decomposition proposed here is based on counterfactuals using the means of the right hand side variables and group differences in fitted values of the estimated functions. In this decomposition the unexplained component or ‘structure effect’ is completely defined, and a detailed decomposition of the ‘composition effect’ is possible. There is no remainder. Explicit formulae for decompositions of the binary logit and probit models, duration models based on the Weibull and log-logistic hazard specifications and a Box-Cox model are presented, although the method applies to any model which is specified as a continuous function of the explanatory variables.
It differs from other approaches in two key respects. First the proposed decomposition is based on the mean characteristics of each group and the fitted value of the model evaluated at the vector of means. It is therefore based on the same clearly defined counterfactual as the original Oaxaca approach. It does not decompose group differences in the observed mean of the dependent variable; but in general this cannot be done in a nonlinear model in an exact manner. Furthermore, in the presence of censoring, as in duration models, the sample mean of the left hand side variable is not a very useful measure. Second, its application is not restricted to the class of nonlinear models defined on a linear index as is the case in most other proposed decomposition techniques. This type of model is used in applied work for understandable reasons, but there are cases when the nonlinear aspect of the relation between economic variables goes beyond this form of model as in the Box-Cox case.
Footnotes
It is possible to obtain the same estimates in a pooled regression with group specific coefficients and dummy variables.
In private correspondence with these authors, it emerges that the two papers were prepared independently but the authors had met and discussed their research beforehand.
The equality
An equality could occur in certain situations since the function here is nonlinear but not necessarily monotonic.
The law of iterated expectations is that
An earlier version of this paper was entitled ‘The MV decomposition’. Since writing that version, an article by Schwiebert [19] appeared which uses the mean value theorem to obtain a decomposition and uses that name. The decomposition here is quite different from that presented in Schwiebert [19], which uses a different basis and applies only to nonlinear models defined on a linear index.
Yun [24] and Wolff [ ] provide alternative decomposition procedures for the sample selection model.
The variance is normalised equal to one.
This is a consequence of the first order conditions for obtaining a maximum likelihood estimate of the constant term.
Note that the effect of a variable on the duration of a completed spell is of opposite sign to its effect on the hazard rate.
All estimates were obtained using SAS version 9.3 with a Windows 7 64-bit Professional operating system on a Lenovo Thinkpad S440 laptop with 8 GB of memory.
Acknowledgments
We are grateful to a referee, Habiba Djebbari and seminar participants at GREQAM, Marseille, IZA, Bonn and the University of Dijon for comments on an earlier version of this paper.
