Abstract
This article deals with a model for describing a sequence of events. For example, education is typically attained through a set of transitions from one level of education to the next. In particular, this article tries to reconcile measures describing the effect of a variable on each of these transitions with measures describing the effect of this variable on the final outcome of that process. Such a relationship has been known to exist within a sequential logit model, but it has hardly been used in empirical research, mainly because of the absence of a practical way of giving it a substantive interpretation. This article tries to provide such an interpretation by showing that the effect on the final outcome is a weighted sum of the effects on each transition, such that a transition gets more weight if more people are at risk of passing that transition, passing the transition is more differentiating, and people gain more from passing.
Introduction
Some processes are best described as a sequence of steps or choices. For example, the choice to have many children is not a single event but rather a sequence of choices to have yet another child (ignoring twins, triplets, etc.). Similarly, the process of attaining education can be seen as a sequence of transitions between one school type and the next. Yet another example would be a process where someone first enters the military or not, and given that one has entered the military, one is exposed to combat or not. In order to describe such a process, it is often useful to look not only at these individual steps or transitions but also at the end result of that process. The purpose of this article is to present a technique that relates these two. This technique will be based on a sequential model and especially, but not exclusively, a sequential logit model (Fienberg 1977; Fullerton 2009). 1 The sequential logit model estimates the relationship between explanatory variables and the odds of passing each transition. This model will then be used to also derive the effect of an explanatory variable on the final outcome, which turns out to be a weighted sum of the effects on passing transitions.
This was already established by Mare (1981). However, he framed this result as indicating that models focusing solely on the final outcome—for example, a linear regression model explaining years of education—conflate changes in the marginal distribution of the outcome and changes in effects on the different transitions (Mare 1981, 83). As a consequence, this result has mainly been understood, at least within the discipline of social stratification research, as an argument against models that try to study effects on the final outcome and not as a possibility to study the relationship between the effects on transitions and effects on the final outcome.
However, Mare did make two attempts at interpreting this relationship. First, he derived how changes in the transition probabilities influence the effect of an explanatory variable on the final outcome, but the result turned out to be “too complex” to be of practical use (Mare 1981, 78). Consequently, this method has not been picked up at all by the discipline. Second, he compared a set of scenarios that showed how the effects on the final outcome changed when the marginal distribution changes (Mare 1981, 82). This method has only been used twice (Nieuwbeerta and Rijken 1996; Smith and Cheung 1986). A reason for this may be that it is hard to choose which scenarios to compare: It is hard to find the right balance between scenarios different enough to be interesting but not so different that the amount of extrapolation becomes uncomfortable.
This article tries to add to this literature by showing that one can interpret the relationship between effects on passing transitions and effects on the final outcome. For example, if one studied the influence of family background on educational attainment, then the log-odds ratios of passing transitions returned by a sequential logit give a good description of the influence of family background on the process of attaining education. One could also assign (pseudo-)years of education to each outcome. These are the number of years a “standard student” would need in order to attain a level of education. For example, if one studied the U.S. educational system, one could assign 12 years to someone who only finished high school and 16 to college graduates. Then one can use the results from the sequential logit model to also predict the final outcome, the expected number of pseudo-years of education, and look at how family background influences that outcome. This effect of family background on the highest attained level of education can be decomposed into a weighted sum of the effects of family background on passing transitions. The aim of this article is to show that this decomposition has a substantive interpretation. It makes intuitive sense that the effect on the final outcome is a weighted sum of the effects on each transition. What has been overlooked is that the weights also make substantive sense. In this decomposition, a transition receives more weight when (1) more people are at risk of passing that transition; (2) passing that transition is more differentiating, that is, the weight is largest when the probability of passing is 50 percent, which means that neither passing nor failing that transition is virtually universal; and (3) people gain more from passing that transition, that is, the difference in expected outcome (in the abovementioned example, pseudo-years of education) between those that pass and those that fail is larger.
Rather than having separate models for effects on the final outcome and effects on passing different transitions (e.g., Shavit and Blossfeld 1993), this decomposition allows for an integrated discussion of these two related processes. Moreover, this decomposition will allow one to study how changes in the distribution of the outcome variable influence the effect of an explanatory variable on the final outcome. For example, a major development in the educational systems of almost every country in the last century is that more recent cohorts tend to obtain more education (Hout and DiPrete 2006), a process called educational expansion. This is a change in the marginal distribution, and the method discussed in this article allows one to study the consequences of this development.
The decomposition in this interpretable form exists for the marginal effect for continuous explanatory variables and for discrete changes for categorical explanatory variables. A limitation of this method is that it provides a description of the patterns of association, but these have no causal interpretation.
This article will start with a discussion of the sequential logit model and then derive the decomposition. It will illustrate its use with two examples: the effect of father’s occupational status on sons’ education in the Netherlands and the effect in West Germany of women’s education on the number of children they have.
Decomposing the Effect on the Final Outcome
The Sequential Logit Model
The decomposition of the effect of an explanatory variable on the final outcome into a weighted sum of effects on the odds of passing each transition starts with a sequential logit model. Consider a hypothetical process consisting of four states A, B, C, and D, as represented in Figure 1. These states could be educational levels, such that A is no education, B is primary education, C is secondary education, and D is tertiary education. Figure 1 shows how respondents face three transitions: They can attend primary education or opt for no education at all; if they opt for primary education, they can choose to leave the system once they have completed primary education, or go on to secondary education; and if they opt for secondary education, they can then either choose to leave once they have completed this level or go on to tertiary education. Alternatively, these states could be the number of children such that A is no children, B is one child, C is two children, and D is three or more children. An implication of this model is that if someone’s highest-achieved level is B, then that person was “at risk” of passing the first two transitions but not the third. Furthermore, it implies that the person passed the first transition but failed the second.
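The implication for a person whose highest-achieved level is B can be made concrete with a small sketch. The function name and the tuple layout below are hypothetical and chosen only for illustration; the states A to D are those of the hypothetical process described above.

```python
# Hypothetical coding of the four ordered states of the process.
LEVELS = ["A", "B", "C", "D"]


def transition_records(highest_level):
    """Return one (transition, at_risk, passed) tuple per transition.

    Transition k (1-based) is the step from LEVELS[k - 1] to LEVELS[k].
    A person is at risk of transition k if all earlier transitions were
    passed; `passed` is None when the person was never at risk.
    """
    reached = LEVELS.index(highest_level)  # number of transitions passed
    records = []
    for k in range(1, len(LEVELS)):
        at_risk = reached >= k - 1
        passed = (reached >= k) if at_risk else None
        records.append((k, at_risk, passed))
    return records


for level in LEVELS:
    print(level, transition_records(level))
```

Each transition would then be analyzed only on the records with `at_risk` equal to `True`, which is the "appropriate subsample" for that transition.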

Figure 1. Hypothetical process.
The model assumes that one has to be at risk of passing a transition—that is, to have passed through all lower transitions—in order to make a decision at that transition about whether to continue or to leave the system. Aside from this, these decisions are assumed to be completely independent. As a result, one can estimate the effects on passing each transition by running separate logistic regressions for each transition on the appropriate subsample (Mare 1980). This model is shown in equation (1):
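For the three transitions of the hypothetical process in Figure 1, a model of this kind can be sketched as follows. The notation is assumed here for illustration and may differ from the article's own: p_ki is the probability that respondent i passes transition k, x_i is a single explanatory variable, beta_0k and beta_1k are transition-specific coefficients, and pass_ki indicates whether respondent i passed transition k.

```latex
\begin{aligned}
p_{1i} &= \Lambda(\beta_{01} + \beta_{11} x_i) \\
p_{2i} &= \Lambda(\beta_{02} + \beta_{12} x_i) \quad \text{if } \mathit{pass}_{1i} = 1 \\
p_{3i} &= \Lambda(\beta_{03} + \beta_{13} x_i) \quad \text{if } \mathit{pass}_{2i} = 1
\end{aligned}
```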
The function Λ() is the standard logistic function, meaning that Λ(z) = exp(z) / (1 + exp(z)).
Cameron and Heckman (1998) have given an important critique of the sequential logit model in which they question whether the results of a sequential logit model can be given a causal interpretation. They give two reasons for that. 2
First, they argue that the scale of the dependent variable cannot be compared across transitions, and that as a consequence the effects can also not be compared across transitions. This problem has its origin in the latent propensity representation of the logistic regression model. In this representation of the logistic regression model, there is a latent propensity of success which depends linearly on a set of explanatory variables and an error term. A success will occur when the latent propensity is larger than zero, and it is these successes and failures that are actually observed. In order to identify the scale of the latent propensity, and thus the scale of the effects, the variance of the error terms is fixed to a constant (π²/3 in the case of the logit model). Because this normalization is applied separately at each transition, the coefficients at each transition are expressed relative to the residual variation at that transition, which need not be the same across transitions.
The second reason why the results of a sequential logit cannot be given a causal interpretation is that the selection of students at earlier transitions will result in a correlation between the error term and the observed variables that will influence the estimated coefficients (Cameron and Heckman 1998; Mare 1980). For example, one might suspect that mainly high-ability children from less privileged backgrounds pass transitions, while low-ability children from privileged backgrounds also have a fair chance of passing. In that case, less privileged children in higher transitions will on average have higher ability than more privileged children in that same transition. So the selection process created a correlation at higher transitions between family background and ability. If ability was not observed, then the effect of family background would be underestimated at higher transitions. However, if the aim of the study is describing the patterns in the data, then a possible correlation between unobserved variables and observed variables is not relevant. So, one can use sequential logit models in a purely descriptive manner (Mare 2011; Xie 2011).
The Decomposition for Marginal Effects
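In order to make a link between the effects 3 of a variable on passing transitions (the coefficients in equation (1)) and the effect on the final outcome, one needs to assign a value to each outcome. The expected final outcome, shown in equation (2), is then the sum of these values, each weighted by the probability of ending up in that outcome.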
For some applications, it is hard to find meaningful values for each outcome. For example, what would the values for each outcome be if one studied a process where the first transition consists of whether or not someone ever served in the military and the second transition if they were ever exposed to combat or not (MacLean 2011)? Similarly, what would the values for each outcome be for a process where potential respondents are first successfully contacted or not, and given that contact was established they decide whether to participate in a study or not (Hoogendoorn and Daalmans 2009)? Since in both examples the main interest is in one outcome, combat exposure and participation in the survey, a solution is to assign the value 1 to “combat exposure” and “study participation” and the value 0 to the outcomes “no military service” and “military service without combat exposure” for the first example and “unsuccessful contact” and “contact but refuse to participate” for the second example. That way the expected final outcome is the probability of having experienced combat or participating in a survey, and the effect of explanatory variables on the final outcome is the marginal effect on those probabilities. If one wants to apply this decomposition to a research project in which all outcomes are of interest, one can repeat the decomposition such that the value 1 is assigned to a different outcome in each repetition.
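A minimal sketch of the expected final outcome for the four-state process of Figure 1 may help here. The passing probabilities and the pseudo-years below are hypothetical numbers chosen only for illustration. The second call shows the indicator coding described above: with the value 1 on a single outcome and 0 elsewhere, the expected final outcome is the probability of that outcome.

```python
def outcome_probabilities(p1, p2, p3):
    """Probabilities of ending in states A, B, C, and D of Figure 1."""
    return [1 - p1, p1 * (1 - p2), p1 * p2 * (1 - p3), p1 * p2 * p3]


def expected_outcome(probs, values):
    """Sum of outcome values weighted by the outcome probabilities."""
    return sum(pr * v for pr, v in zip(probs, values))


# Hypothetical passing probabilities for the three transitions.
probs = outcome_probabilities(0.9, 0.6, 0.4)

# Hypothetical pseudo-years of education for A, B, C, and D.
years = [0, 6, 12, 16]
print(expected_outcome(probs, years))

# Indicator values: 1 for the outcome of interest (D), 0 otherwise.
indicator = [0, 0, 0, 1]
print(expected_outcome(probs, indicator))  # equals p1 * p2 * p3
```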
The explanatory variables are part of equation (2) through the probabilities of passing each transition.
One way to quantify the effect of an explanatory variable on the final outcome is the first derivative of equation (2) with respect to this explanatory variable. This derivative is shown in equation (3). A step-by-step derivation is set out in Online Appendix A.
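Equation (3) shows that the effect on the final outcome is a weighted sum of the effects on passing each transition, where the weight of each transition is the product of three elements:
(1) The predicted proportion of people at risk of passing a transition. For the first transition, this proportion is 1; for the second, it is the proportion of respondents who complete the first transition; and so on.
(2) The differentiating capacity of the kth transition, as captured by the product of the probability of passing and the probability of failing that transition. This product is largest when the probability of passing is 50 percent.
(3) The differences between the expected outcome of those who pass the transitions and those who do not. These are the parts in the square brackets. For instance, the expected outcome of those who pass the first transition is the average of the values of the outcomes they can still reach, weighted by the probabilities of reaching each of them, while the expected outcome of those who fail the first transition is the value assigned to outcome A.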
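The three components of the weights can be checked numerically for the four-state process of Figure 1. The sketch below is an illustration under assumed values: the constants, log-odds ratios, and outcome values are hypothetical, and the names are not taken from the article. It builds each weight as the product of the proportion at risk, the differentiating capacity, and the gain from passing, and compares the resulting weighted sum with a numerical derivative of the expected outcome.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical constants, log-odds ratios, and outcome values (A, B, C, D).
ALPHA = [1.5, 0.2, -0.5]
BETA = [0.6, 0.4, 0.3]
VALUES = [0.0, 6.0, 12.0, 16.0]


def pass_probs(x):
    return [logistic(a + b * x) for a, b in zip(ALPHA, BETA)]


def expected_outcome(x):
    p1, p2, p3 = pass_probs(x)
    lA, lB, lC, lD = VALUES
    return (1 - p1) * lA + p1 * ((1 - p2) * lB + p2 * ((1 - p3) * lC + p3 * lD))


def weights(x):
    """Weight per transition: at risk * differentiating capacity * gain."""
    p1, p2, p3 = pass_probs(x)
    lA, lB, lC, lD = VALUES
    at_risk = [1.0, p1, p1 * p2]
    differentiation = [p * (1 - p) for p in (p1, p2, p3)]
    pass2 = (1 - p3) * lC + p3 * lD      # expected outcome after passing transition 2
    pass1 = (1 - p2) * lB + p2 * pass2   # expected outcome after passing transition 1
    gain = [pass1 - lA, pass2 - lB, lD - lC]
    return [r * d * g for r, d, g in zip(at_risk, differentiation, gain)]


def total_effect(x):
    """Weighted sum of the log-odds ratios."""
    return sum(w * b for w, b in zip(weights(x), BETA))


x0, h = 0.0, 1e-6
numeric = (expected_outcome(x0 + h) - expected_outcome(x0 - h)) / (2 * h)
print(total_effect(x0), numeric)
```

The two printed numbers should agree up to the error of the finite-difference approximation.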
As discussed in Online Appendix A, equation (3) can also be written as equation (4), which can be a useful variant. The marginal effect of xj on the final outcome is a weighted sum of the marginal effects of xj on passing the individual transitions, and this weight consists of two of the three elements in equation (3): the proportion at risk and the expected gain from passing that transition. One advantage of this version of the decomposition is that it is true regardless of the link function used, so it also applies to sequential probits, sequential cloglog, and so on. Moreover, many people find the marginal effect on passing a transition used in equation (4) easier to interpret than the log-odds ratio of passing a transition used in equation (3). However, this marginal effect is usually 4 also a function of the proportion passing that transition, so this decomposition no longer clearly separates the distribution from the effect. If this separation is important, then that would be an argument for using the decomposition in equation (3) instead. The Educational Expansion and the Effect of Father’s Occupation on the Son’s Education section discusses an example where this separation is important.
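That this second form does not depend on the link function can be illustrated with a sequential probit. The sketch below uses hypothetical coefficients and outcome values. The weights contain only the proportion at risk and the gain from passing, and the effects are the marginal effects on the passing probabilities, which for a probit are the normal density times the coefficient.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, used here as the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Hypothetical probit constants, coefficients, and outcome values (A, B, C, D).
ALPHA = [1.0, 0.1, -0.3]
BETA = [0.4, 0.3, 0.2]
VALUES = [0.0, 6.0, 12.0, 16.0]


def pass_probs(x):
    return [normal_cdf(a + b * x) for a, b in zip(ALPHA, BETA)]


def expected_outcome(x):
    p1, p2, p3 = pass_probs(x)
    lA, lB, lC, lD = VALUES
    return (1 - p1) * lA + p1 * ((1 - p2) * lB + p2 * ((1 - p3) * lC + p3 * lD))


def contributions(x):
    """Per transition: (at risk * gain) * marginal effect on passing."""
    p1, p2, p3 = pass_probs(x)
    lA, lB, lC, lD = VALUES
    marginal = [normal_pdf(a + b * x) * b for a, b in zip(ALPHA, BETA)]
    at_risk = [1.0, p1, p1 * p2]
    pass2 = (1 - p3) * lC + p3 * lD
    pass1 = (1 - p2) * lB + p2 * pass2
    gain = [pass1 - lA, pass2 - lB, lD - lC]
    return [r * g * m for r, g, m in zip(at_risk, gain, marginal)]


x0, h = 0.5, 1e-6
numeric = (expected_outcome(x0 + h) - expected_outcome(x0 - h)) / (2 * h)
total = sum(contributions(x0))
print(total, numeric)
```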
What is being decomposed in both decompositions is the marginal effect of an explanatory variable on the expected value of the final outcome at typical values of all explanatory variables. One limitation of this marginal effect is that if the relationship between the explanatory variable and the expected outcome is nonlinear, then each observation has its own marginal effect depending on where on the curve each observation is located. This limitation becomes more pressing, the more pronounced the degree of nonlinearity is. The example in the Educational Expansion and the Effect of Father’s Occupation on the Son’s Education section illustrates this point. Moreover, what is typical at the first transition may not be typical at later transitions. For example, a father with a skilled manual job may be typical during the first transition in an educational system but is probably less typical among the students at risk of entering university. For both these reasons, the marginal effect at typical values of the explanatory variables is not necessarily the best summary measure of the effect. To deal with this problem, some have proposed to look at average marginal effects—that is, compute the marginal effect for each observation and average those—as a way of summarizing the effect. This averaged effect of an explanatory variable on the expected final outcome can also be decomposed into a weighted sum of effects on each transition, where the weights are now the average of the weights predicted for each individual. However, the point of the decomposition is that the weights also have a substantive interpretation as the product of three substantively interesting components. This point is lost by averaging the weights, as the average of the product of three components is not the same as the product of the averages of these three components.
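The last point can be seen in a toy calculation. The three tuples below are hypothetical weight components for a single transition, one tuple per individual. The average of the individual weights differs from the product of the averaged components, so an averaged weight can no longer be read as a product of an average proportion at risk, an average differentiating capacity, and an average gain.

```python
from statistics import mean

# Hypothetical components of one transition's weight for three individuals:
# (proportion at risk, differentiating capacity, gain from passing)
components = [(0.95, 0.09, 5.0), (0.80, 0.21, 7.0), (0.60, 0.25, 9.0)]

# Average of the individual weights (each weight is a product of three terms).
avg_of_products = mean(r * d * g for r, d, g in components)

# Product of the three averaged components.
product_of_avgs = (
    mean(c[0] for c in components)
    * mean(c[1] for c in components)
    * mean(c[2] for c in components)
)

print(avg_of_products, product_of_avgs)
```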
The Decomposition for Discrete Changes 5
Another potential problem with marginal effects is that in nonlinear relationships, the marginal effect is somewhat artificial: It is the effect of a unit change if the rate of change remains unchanged, but the whole point of a nonlinear model is that the rate of change is not constant. This is especially problematic when the explanatory variable is categorical, as the instantaneous rate of change (the marginal effect) is not defined for such a variable. To deal with this problem, some (e.g., Long 1997) have proposed looking at discrete changes, that is, looking at an actual unit change instead of marginal effects. One can derive a decomposition of the effect of a unit change in a similar way as for the marginal effect. The discrete change in the final outcome is the difference between the expected final outcomes at two values of the explanatory variable, and equation (6) expresses it in terms of the discrete changes in the probabilities of passing each transition.
The structure of equation (6) is very similar to equation (4). As a consequence, the interpretation is also similar: The discrete change in final outcome is a weighted sum of discrete changes in each transition, such that a transition receives more weight when more people are at risk of passing that transition and the difference in expected outcome between those that pass and those that fail that transition is larger. Notice however that the proportions used in this interpretation are rather artificial: They assume that during each transition, the number of observations in each category of the explanatory variable is equal. Even if, by accident, this happens to be (almost) true for the first transition, selection will make sure that this is no longer the case in later transitions.
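One exact way of writing such a decomposition for the four-state process of Figure 1 follows from the identity a1*b1 - a0*b0 = mean(a)*(b1 - b0) + mean(b)*(a1 - a0), applied transition by transition. The sketch below is derived from that identity under hypothetical coefficients and outcome values; it is consistent with the description above, including the equal weighting of the two categories at each transition, but it is not a transcription of equation (6).

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical constants, log-odds ratios, and outcome values (A, B, C, D).
ALPHA = [1.5, 0.2, -0.5]
BETA = [0.6, 0.4, 0.3]
VALUES = [0.0, 6.0, 12.0, 16.0]


def pass_probs(x):
    return [logistic(a + b * x) for a, b in zip(ALPHA, BETA)]


def expected_outcome(x):
    p1, p2, p3 = pass_probs(x)
    lA, lB, lC, lD = VALUES
    return (1 - p1) * lA + p1 * ((1 - p2) * lB + p2 * ((1 - p3) * lC + p3 * lD))


def gains(p):
    """Expected gain from passing each transition, given passing probabilities."""
    lA, lB, lC, lD = VALUES
    pass2 = (1 - p[2]) * lC + p[2] * lD
    pass1 = (1 - p[1]) * lB + p[1] * pass2
    return [pass1 - lA, pass2 - lB, lD - lC]


def discrete_contributions(x_from, x_to):
    """Per transition: averaged at-risk share * averaged gain * change in p."""
    p_from, p_to = pass_probs(x_from), pass_probs(x_to)
    p_bar = [(a + b) / 2 for a, b in zip(p_from, p_to)]
    g_bar = [(a + b) / 2 for a, b in zip(gains(p_from), gains(p_to))]
    at_risk = [1.0, p_bar[0], p_bar[0] * p_bar[1]]
    delta_p = [b - a for a, b in zip(p_from, p_to)]
    return [r * g * d for r, g, d in zip(at_risk, g_bar, delta_p)]


parts = discrete_contributions(0.0, 1.0)
print(sum(parts), expected_outcome(1.0) - expected_outcome(0.0))
```

The averaged at-risk share for the third transition is the product of the two averaged passing probabilities, not the average of their product, which is one way to see why these proportions are artificial.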
Examples
The purpose of the following two examples is twofold: On the one hand, the examples will be used to illustrate how this decomposition can be used. On the other hand, they will be used to show some extensions to this decomposition. The first example will be used to show how one can apply this decomposition to more complex decision trees and emphasize the way this method can be used to study the effect of changes in the distribution of the final outcome. The second example illustrates a special case where the decomposition of discrete differences is nonparametric.
Educational Expansion and the Effect of Father’s Occupation on the Son’s Education
The first example considers the influence of family background on educational attainment in the Netherlands and how this process changed over cohorts. This is an application with a more complex decision tree, as the Dutch education system is a tracked system. So rather than a hierarchical set of “continue or stop” decisions, this decision tree will contain a branching point, as is illustrated in Figure 2. It is also an application where changes in weights are of substantive interest. Changes in weights are driven by changes in the probabilities of passing the different transitions. As in most countries (Hout and DiPrete 2006), these probabilities have changed considerably over cohorts in the sense that more recent cohorts are much more likely to attain higher levels of education than older cohorts, as illustrated in Figure 3. This process is called educational expansion. Much of the effort in studying these educational systems is aimed at getting estimates of effects of parental background on educational attainment while controlling for educational expansion (Breen and Jonsson 2005; Mare 1981). However, since educational expansion is such a common and far-reaching change in most educational systems, one might want to know how educational expansion has changed the influence of family background on educational attainment. The decomposition proposed in this article provides a useful way to answer such a question (e.g., Ballarino, Panichella, and Triventi 2014). 6 This example will only focus on the sons. Substantively, comparing the decomposition of male and female respondents would be interesting, as the timing of educational expansion was different for men and women. However, the mechanics of the decomposition, which is what this example is supposed to illustrate, would be the same for both men and women.

Figure 2. Simplified model of the Dutch education system.

Figure 3. Distribution of highest achieved level of education for men over cohorts.
The simplified representation of the Dutch educational system in Figure 2 assumes that all respondents complete primary education. After this, they face a choice between leaving the schooling system and continuing. 7 If they opt for the latter choice, they have to choose between the higher secondary and lower secondary schools. Once they have finished their second diploma in either track, they can choose whether or not to get a third diploma, continuing with vocational education if they are in the low track, or tertiary education if they are in the high track. The values assigned to each level of education are estimated in such a way that the effect of education on income is maximized while controlling for the father’s occupational status, as described in Buis (2010, chapter 3). 8 The values are standardized to have a mean of 0 and standard deviation of 1 for the cohort that were 12 in 1950. These values together with the Dutch names for the different educational levels are presented in Table 1.
Table 1. Description of the Dutch Education System.
Sequential logit models can also be used for tracked systems like the one in Figure 2 (Breen and Jonsson 2000; Lucas 2001). Online Appendix C shows that the decomposition discussed in this article also applies to such tracked systems, and it has exactly the same interpretation. Notice that this simplified representation of the Dutch education system reduces it to a system of binary choices. This is convenient and proved to be enough to reveal the main interesting trends, but it is not necessary. As discussed in Online Appendix A.1, the decomposition can also be computed for systems with multinomial choices. Assuming one has data on actual transitions made by respondents instead of only the highest achieved level of education used in this example, one can extend this model and the decomposition further to account for the possibility that some respondents changed track and that there is thus more than one way in which one can obtain a given diploma.
In this example, it makes sense to represent the effects on transitions as log-odds ratios, so one can more clearly separate the influence of educational expansion from other changes. Educational expansion is captured in a sequential logit model by changes in the baseline odds of passing a transition. An odds ratio controls for changing baseline odds by being a relative effect. So the argument is that a doubling of the odds is considered to be the same effect regardless of how large the baseline odds is, just as two percentages are sometimes compared using the argument that a hundredth of some total is comparable to a hundredth of some other total. 9 This is how changes in odds ratios capture changes net of educational expansion, while the weights represent the influence of educational expansion.
The data were obtained from the International Stratification and Mobility File (ISMF; Ganzeboom and Treiman 2009). The ISMF contains 54 surveys 10 on the Netherlands carried out between 1958 and 2006. The surveys were post-harmonized and merged to increase the time period covered and the number of respondents and to lessen the effect of individual surveys’ idiosyncrasies. Time was measured by the year in which the respondent was 12, scaled in decades since 1950. Information is available for the cohorts born between 1905 and 1991. Cohort is allowed to have a nonlinear effect by representing it as a cubic B-spline parameterized such that the coefficient of each variable represents the difference (in this case, ratio) between its year and a reference year (Newson 2012). So, these parameters can be interpreted in a similar way as a set of indicator or dummy variables, except that these points are linked by a smooth curve rather than a step function. The knots are 1910, 1930, 1950, 1970, and 1990 and the reference year is 1950. A family’s socioeconomic status was measured according to the father’s score on the International Socio-economic Index (ISEI) of occupational status (Ganzeboom and Treiman 2003), as information on the father’s occupation was available for the largest number of cohorts. The original ISEI score is a continuous variable ranging from 10 to 90, but it was standardized to have the value 0 for those fathers who are skilled manual workers (close to the mean in 1950) and a standard deviation of 1 for the cohort that were 12 in 1950. The effect of father’s occupational status was allowed to change linearly with cohort by adding an interaction term. Survey weights were used where available. After removing respondents with missing observations on any of the variables, 43,768 male respondents remained.
The distribution of the highest achieved level of education for different cohorts is shown in Figure 3. The proportion of pupils who only achieved primary education dropped dramatically, while the proportion attaining tertiary education and vocational education strongly increased. Figure 3 also shows that vocational education is a relatively recent level of education for the Netherlands. Whereas no one from the earlier cohorts completed this level of education, vocational education completion has rapidly grown to about 40 percent.
The sequential logit model is presented in Table 2. The main effects of the father’s occupational status (fisei) represent the effect for the cohort that were 12 in 1950. Thus, for men from that cohort, a standard deviation increase in father’s occupational status is associated with an 88 percent increase in the odds of passing the first and second transitions, a 29 percent increase in the odds of continuing within the lower track, and a 36 percent increase in the odds of continuing in the higher track. This effect of father’s occupational status has significantly declined by 6.4 percent per decade for the first transition, 1.7 percent per decade for the second transition, and 3.3 percent per decade for the last transition within the upper track. The trend in the effect of father’s occupational status was nonsignificant in the final transition within the lower track. Educational expansion is captured by the baseline odds plus the four cohort variables. So, for men who were 12 in 1950 and whose father was a skilled manual worker, one would expect to find 5.04 individuals that passed the first transition for each individual that failed the first transition. For similar men from the 1910 cohort, these odds would only be a tenth of the odds of the 1950 cohort, that is, one would expect to find only half a person that passed this transition for every person that failed it. These estimates show a picture similar to Figure 3.
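The arithmetic behind these statements can be retraced in a few lines. The inputs are the numbers reported above for the first transition. Two things in the sketch are assumptions rather than reported results: that "a tenth" is exactly 0.1, and that the 6.4 percent decline applies multiplicatively to the odds ratio in each decade, as is usual for an exponentiated interaction term.

```python
def odds_to_probability(odds):
    return odds / (1.0 + odds)


baseline_odds_1950 = 5.04   # reported baseline odds, first transition
odds_ratio_fisei = 1.88     # reported: 88 percent increase in the odds
decline_per_decade = 0.064  # reported: 6.4 percent decline per decade

# Probability of passing the first transition for the 1950 cohort when the
# father is a skilled manual worker (fisei = 0).
p_1950 = odds_to_probability(baseline_odds_1950)

# The 1910 odds are described as a tenth of the 1950 odds (assumed exact here).
odds_1910 = baseline_odds_1950 * 0.1
p_1910 = odds_to_probability(odds_1910)

# Odds for a father one standard deviation higher, 1950 cohort.
odds_high_1950 = baseline_odds_1950 * odds_ratio_fisei

# Assumed multiplicative decline: odds ratio two decades after 1950.
odds_ratio_1970 = odds_ratio_fisei * (1 - decline_per_decade) ** 2

print(p_1950, odds_1910, p_1910, odds_high_1950, odds_ratio_1970)
```

Odds of about 0.5 correspond to the "half a person" passing for every person failing that is mentioned above.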
Table 2. Sequential Logit Model for Men.
Note: Exponentiated coefficients. z statistics within parentheses.
a. Cohort is entered linearly in the interaction.
b. Reference splines; parameters can be interpreted as dummy variables, with 1950 as the reference.
With these estimates and the values that were attached to each level of education, one can compute the expected level of education for people with different father’s occupational status and born in different years. These are shown in Figure 4. The effect of father’s occupational status on the highest achieved level of education is given by the slopes of these curves. As was discussed in The Decomposition for Marginal Effects section, the slope will differ depending on the value of father’s occupational status at which it is computed because these curves are nonlinear. For example, I chose the value 0 to be “typical” (the occupational status of a skilled manual worker, which is close to the mean for the cohort that was 12 in 1950), so it makes sense to look at the slopes at that value. At that value, the slopes initially become steeper from 1910 to 1930 and then become flatter after that. If we look instead at the value 2 standard deviations above 0 (the occupational status of a teacher at a secondary school), we don’t see that initial increase in the effect of father’s occupational status. Instead the slope is steadily declining over time. Mechanically, this difference is due to the fact that for the oldest cohort the lower bound forces the slope to be flatter at a father’s occupational status of 0 compared with a father’s occupational status of 2. This also makes substantive sense. Around 1910, a large portion of the population were basically equal because they all had little education. As a consequence, if the father got a bit more occupational status, then that did not do much when he started as a skilled manual worker. However, a small increase in father’s occupational status was more consequential when the father started as a teacher, as all educational outcomes were for those children more or less realistic options. So it is important to carefully consider at which point you evaluate the effects.

Figure 4. Expected highest achieved level of education. Years refer to the year in which the respondent was 12.
Figure 5 shows the effects of father’s occupational status for sons of skilled manual workers. Both the highest achieved level of education and the father’s occupational status are scaled in such a way that the standard deviation is 1. So, this measure of the total effect is similar to a standardized regression coefficient. Figure 5 confirms the pattern found in Figure 4: The effect of father’s occupational status initially increased until around 1945 and then decreased. 11

Figure 5. Contribution of different transitions to the effect of father’s occupation.
The first step in explaining this pattern is to investigate the contributions of each transition to the total effect. A striking feature is that the final two transitions (continuing in the lower track and continuing in the higher track) contribute negligible amounts to the total effect. Moreover, there has been a shift between the first and the second transitions as to the dominant source of the total effect. The contribution of the first transition initially increased and then decreased, while the contribution of the second transition increased throughout the period being studied. The initial increase in the contributions of the first and second transitions more or less coincides, thus reinforcing one another. The later increase in the contribution of the second transition was not enough to compensate for the decrease in the contribution of the first transition, thus leading to the decrease in the effect of father’s occupational status on the highest achieved level of education.
The second step of this explanation consists of breaking up each transition’s contribution into its two parts: the weight and the effect on each transition. Since the contribution is the product of these two terms, it can be visualized as the area of a rectangle, with a height equal to the effect and a width equal to the weight. This is shown in Figure 6. The horizontal axis shows the weights and the vertical axis the effects, while the columns represent the cohorts and the rows represent the transitions. These figures show that the initial increase in the contribution of the first transition is due to an increase in its weight, while the later decrease in this transition is due to both a decrease in the weight and a decrease in the effects on the transitions. The increase in importance of the second transition is entirely due to the increase in the weight of this transition. This increase in weight actually more than offsets a decrease in the effects. The low contributions of both higher transitions are due to both low effects and low weight.

Figure 6. Decomposition of the effect of father’s occupational status for men into effects on passing transitions and weights.
The third step breaks the weights down into three components, which is done in Table 3. This table shows that the initial increase and the later decline in the first transition’s influence are primarily due to the differentiating capacity of that transition. Initially, any inequality at the first transition affected few people because only a low proportion of students passed. As the proportion of students passing increased, the transition received more weight, until half of the students passed, after which inequality at this transition affected fewer people again because only a few students failed. The increase in importance of the second transition is partly due to the differentiating capacity but also to a strong increase in the number of students who are at risk of making this transition. The last two transitions receive relatively small weights because relatively few people are at risk of passing these transitions and those who pass gain relatively little. Those who pass the first two transitions gain both the immediate increase in level of education and the possibility of gaining an extra level of education, while in the third and fourth transitions, people gain only the immediate increase in the level of education. These developments provide a mechanism through which educational expansion influences the effect of father’s occupational status on highest achieved level of education.
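The three components of the weight can be made concrete in a short sketch. The code below assumes a simple pass/fail sequence rather than the tracked Dutch decision tree, uses hypothetical coefficients and outcome values (none are taken from Table 3), and assumes that the weight is the product of the proportion at risk, the variance p(1 − p) of passing, and the expected gain from passing, which is what the chain rule gives when each transition is a logit. All function and variable names are illustrative.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def decompose(x, intercepts, slopes, values):
    """Decompose the marginal effect of x on the expected final outcome.

    Assumes a simple pass/fail sequence: values[k] is the outcome of someone
    who fails transition k, and values[-1] the outcome of passing them all.
    Returns the expected outcome and one row of components per transition.
    """
    probs = [logistic(a + b * x) for a, b in zip(intercepts, slopes)]
    n = len(probs)

    # expected_from[k]: expected outcome of someone at risk of transition k.
    expected_from = [0.0] * (n + 1)
    expected_from[n] = values[n]
    for k in reversed(range(n)):
        expected_from[k] = (1 - probs[k]) * values[k] + probs[k] * expected_from[k + 1]

    rows, at_risk = [], 1.0
    for k in range(n):
        variance = probs[k] * (1 - probs[k])      # differentiating capacity
        gain = expected_from[k + 1] - values[k]   # expected gain from passing
        rows.append({
            "at_risk": at_risk,
            "variance": variance,
            "gain": gain,
            "weight": at_risk * variance * gain,
            "effect": slopes[k],                  # log-odds ratio for x
        })
        at_risk *= probs[k]
    return expected_from[0], rows


# Hypothetical parameters, chosen only for illustration.
INTERCEPTS = [1.0, 0.0, -1.0]
SLOPES = [0.5, 0.3, 0.1]
VALUES = [0, 1, 2, 3]

expected, rows = decompose(0.0, INTERCEPTS, SLOPES, VALUES)
total_effect = sum(row["weight"] * row["effect"] for row in rows)
```

Under these assumptions, `total_effect` should agree with a numerical derivative of the expected outcome with respect to `x`, which is a convenient check on any implementation of the decomposition.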
Table 3. Detailed Decomposition of the Effect of Father’s Occupational Status on Men’s Educational Attainment.
It is informative to go through all the elements of the first cohort in Table 3 in order to illustrate how the decomposition works. One can see that the effect on the final outcome is the weighted sum of the effects on passing transitions:
Women’s Education and the Probability of Having a Large Family
In this example, I will study the influence of women’s education on the process leading to “large” families for women born between 1934 and 1936 in West Germany. The main purpose of this example is to illustrate the decomposition in terms of discrete differences. For illustrative purposes, it is convenient to have a fairly small decision tree. So in this example, the decision tree stops at “three or more children” rather than continuing for all the children a woman could have. One can imagine a research project in which the interest is in “large” families rather than the number of children per se. For this cohort in West Germany, the modal number of children is clearly 2, so I will consider a family with three or more children as large. The cohorts are interesting because they are to a large extent responsible for the “baby boom” in Germany (Buis, Mönkediek, and Hillmert 2012). The process is described in Figure 7.

Figure 7. Process leading to “big” families (three or more children).
The next step is to assign values to each outcome. One solution would be to not censor the number of children at three or more, but estimate a model for the complete distribution of completed fertility. In that case, it would make sense to assign the value 0 to no children, 1 to 1 child, and so on. The final outcome would in that case be the expected number of children. However, since the interest is in large families rather than the number of children, I assign the value 1 to “three or more children” and the value 0 to all other outcomes. That way the expected final outcome is the probability of having a large family, and the effect of mother’s education on the final outcome is the marginal effect on the probability of having a large family. By choosing these values, the decomposition in equation (6) simplifies to equation (7).
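The consequence of the two ways of assigning values can be sketched in a few lines. The transition probabilities below are hypothetical and not taken from the article’s tables, and the function name is illustrative. With the 0/1 coding, the expected outcome reduces to the product of the three transition probabilities, which is the probability of having a large family.

```python
def expected_outcome(pass_probs, values):
    """Expected final outcome of a pass/fail sequence.

    pass_probs[k] is the probability of passing transition k given that one
    is at risk; values[k] is the value of the outcome reached by failing
    transition k, and values[-1] the value of passing every transition.
    """
    expected, at_risk = 0.0, 1.0
    for p, value in zip(pass_probs, values):
        expected += at_risk * (1.0 - p) * value
        at_risk *= p
    return expected + at_risk * values[-1]


# Hypothetical probabilities of a first, second, and third child.
p = [0.9, 0.8, 0.5]

# Value 1 for "three or more children", 0 otherwise: Pr(large family).
prob_large_family = expected_outcome(p, [0, 0, 0, 1])

# Alternative coding 0, 1, 2, 3: expected number of children, censored at 3.
expected_children = expected_outcome(p, [0, 1, 2, 3])
```

The same function serves both codings, which illustrates that only the assigned values change, not the underlying sequence of transitions.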
The data are from the scientific use file of the Mikrozensus of 2008. The file contains 6,261 women born between 1934 and 1936, with valid information on the number of children they had and their education. Education is measured by three educational categories, namely, basic education, which I define as only general lower secondary education (Hauptschule or Realschule) or less, lower secondary education plus vocational education, and higher general secondary or more (Abitur or Abitur followed by vocational education or tertiary education). The distribution of the number of children born to women conditional on their education is described in Table 4. In this table, one can see that women in this cohort were not well educated, and 46 percent of the women end up with only basic education. One can also see that having a large family is not uncommon, as 32 percent of the women in this cohort had three or more children. Finally, one can see that higher educated women tended to have fewer children. It is this pattern that this example aims to describe in more detail.
Table 4. Distribution of Number of Children Conditional on Mother’s Education.
One could estimate a sequential logit model using these data and then compute the decomposition. However, that is not necessary in this case. Notice that the decomposition only depends on the predicted probabilities of passing transitions and that the model would be fully saturated. A characteristic of fully saturated models is that predictions from that model do not differ from predictions one could directly compute from Table 4. For example, the probability of passing the first transition, that is getting one or more children, for a woman with basic education is
The decomposition of discrete changes is shown in Table 5. Women with vocational education tend to have a 12.3 percentage point lower chance of getting a large family compared to women with basic education. All transitions contribute to this negative effect, but the largest contribution comes from the third transition, which has both the largest effect and the largest weight. The weight for the final transition is highest because the gain is high (everybody who passes the third transition by definition has a large family) and there is still a substantial proportion of women at risk of passing that transition. For the first transition, everybody is at risk, but only a small proportion of those who pass the first transition actually get a large family.
Table 5. Decomposition of the Effect of Education on the Probability of Having a Big Family.
Women with Abitur tend to have only a 0.4 percentage point lower chance of getting a large family compared to women with vocational education. This small effect is the result of two opposing effects: women with an Abitur are less likely to pass the first transition, that is, they are more likely to remain childless compared to women with vocational education. However, once a woman with Abitur has two children, she is more likely to get a third child. The negative effect on passing the first transition is larger than the positive effect at the third transition, but the weight of the first transition is lower than the weight of the third transition.
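How a larger negative change at the first transition can be nearly offset by a smaller positive change at the third can be illustrated with a sketch. The code below uses a telescoping identity for the difference between two products of transition probabilities; this is one exact way of splitting the difference and is an assumption of this sketch, not necessarily the weighting used in Table 5, which relies on “average” transition probabilities. The probabilities are hypothetical and the function name is illustrative.

```python
def telescoping_decomposition(p_ref, p_cmp):
    """Split Pr(pass all | comparison) - Pr(pass all | reference) by transition.

    The weight of transition k is the product of the comparison group's
    probabilities for earlier transitions and the reference group's
    probabilities for later transitions; the contributions sum exactly
    to the total difference.
    """
    contributions = []
    passed_cmp = 1.0  # comparison-group probability of reaching transition k
    for k in range(len(p_ref)):
        later_ref = 1.0
        for p in p_ref[k + 1:]:
            later_ref *= p
        weight = passed_cmp * later_ref
        contributions.append(weight * (p_cmp[k] - p_ref[k]))
        passed_cmp *= p_cmp[k]
    return contributions


# Hypothetical transition probabilities (first, second, third child).
reference = [0.90, 0.80, 0.40]   # e.g., a reference education group
comparison = [0.80, 0.80, 0.44]  # lower at transition 1, higher at transition 3

contributions = telescoping_decomposition(reference, comparison)
total_difference = sum(contributions)
```

In this hypothetical case the change at the first transition (−0.10) is larger in absolute value than the change at the third (+0.04), but the first transition carries the smaller weight, so the two contributions nearly cancel.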
It is informative to show how the elements in Table 5 are related. The total effect is a weighted sum of the effects on passing the transitions. For example, the total effect for the comparison of vocational education versus basic education is
It is also informative to show how Table 5 can be derived from Table 4. The total effects can be computed directly from Table 4: for example, the probability of getting three or more children is 0.259 when a woman has vocational education and 0.382 when a woman has basic education, so the difference is −0.123, which is the total effect reported in Table 5. In order to compute the differences in the transition probabilities, one needs to compute those transition probabilities from Table 4. To repeat the example above, the probability of passing the first transition for a woman with basic education is
Transition Probabilities Conditional on Mother’s Education.
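The step from a distribution of the number of children to transition probabilities can be sketched as follows. The counts are hypothetical and are not the figures from Table 4; the function name is illustrative, and the sketch assumes that at least one woman is at risk at every transition.

```python
def transition_probabilities(counts):
    """Turn counts per final outcome into conditional pass probabilities.

    counts[k] is the number of women who end up with k children, with the
    last entry covering "three or more". The probability of passing
    transition k is the share of women at risk who go on to have at least
    one more child.
    """
    probs = []
    at_risk = sum(counts)
    for k in range(len(counts) - 1):
        passed = at_risk - counts[k]
        probs.append(passed / at_risk)
        at_risk = passed
    return probs


# Hypothetical counts for 0, 1, 2, and 3 or more children in one group.
hypothetical_counts = [100, 200, 400, 300]
probs = transition_probabilities(hypothetical_counts)

# The product of the transition probabilities recovers the share of women
# with three or more children.
share_large = probs[0] * probs[1] * probs[2]
```

Repeating this for each education category gives the transition probabilities whose differences enter the decomposition.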
To compute the weights, we need the “average” transition probabilities (
“Average” Transition Probabilities (
Conclusion
The purpose of this article was to discuss a method for studying processes consisting of a sequence of steps and especially relating effects of a variable on passing each transition to the effect of that variable on the final outcome. The idea behind this is that effects on passing each transition and effects on the final outcome are not competing descriptions of the process being studied but natural complements. Treating these effects as complementary creates the challenge to move beyond a separate discussion of these two estimates to an integrated discussion.
This challenge was met by showing that the sequential logit model and other sequential models also imply an estimate for the effect on the final outcome. In the case of a sequential logit model, this estimate is a weighted sum of effects (log-odds ratios) on passing transitions such that the effect receives more weight if more people are at risk of passing that transition, if the transition is more differentiating (i.e., if the proportion of respondents who pass is closer to 50 percent), and if there is a larger difference in the expected outcome between people who pass and fail that transition. This method decomposes the marginal effect of an explanatory variable on the final outcome and thus applies to continuous explanatory variables. A variation of this decomposition was developed that decomposes the discrete change in the final outcome into a weighted sum of discrete changes in the probabilities of passing transitions. In this decomposition, a transition receives more weight when more people are at risk and the expected gain from passing a transition is larger. A limitation of this decomposition is that it is only descriptive and does not have a causal interpretation.
Apart from enabling an integrated discussion of effects on each transition and effects on the outcome, this decomposition also allows for studying the impact of changes in the distribution of the outcome on the effects of explanatory variables on that outcome. For example, if one is studying the effect of parental background on educational attainment over cohorts, it is interesting to also study the impact of the general increase in the highest achieved level of education over cohorts on the effect of parental background. This can be especially interesting since the increase in average educational attainment across cohorts is one of the most universal and far-reaching changes in educational systems across countries during the twentieth century (Hout and DiPrete 2006).
This method was illustrated using two examples, which were also used to introduce several extensions. The first example considered the effect of father’s occupational status on the process of educational attainment in the Netherlands between 1905 and 1991. The Dutch educational system is a tracked system in that children are sorted into different tracks at about the age of 12. This example was used to illustrate that the decomposition can also be applied to decision trees that are more complex than a set of hierarchical pass or fail decisions. It showed that the composition of the effect of father’s occupation on the offspring’s highest achieved level of education shifted from being primarily determined by the first transition (whether or not to continue after primary education) to being primarily determined by the second transition (the choice between the vocational and the academic track). The last two transitions (whether to finish vocational or tertiary education) contributed relatively little throughout the period being studied. Differences in the distribution of education across cohorts (educational expansion) were shown to explain both this shift in importance between the first and second transitions and a striking feature of the trend in the effect on the final educational outcome: an initial increase followed by a decrease. The initial increase can be explained by the increase in the proportion of students who pass the first two transitions from less than 50 percent to around 50 percent, thus initially increasing the weights for both transitions. The weight for the second transition also increased as more students became at risk of passing that transition. The subsequent decrease happened because the weight of the first transition decreased sharply once passing that transition became nearly universal.
The second example considered the effect of women’s education on the process through which they get a “large” (three or more) number of children for West German women born between 1934 and 1936. This example was used to illustrate the decomposition in terms of discrete differences instead of marginal effects. Women with vocational education have a 12 percentage point lower chance of having a large family compared to women with only lower secondary education. Most of this effect is due to the last transition (whether to get a third child after having two children). Women with higher secondary education or more have about the same chance of getting a large family as women with vocational education. However, this is due to two effects canceling each other out: compared to women with vocational education, women with higher secondary education are less likely to get a first child but more likely to get a third child if they already have two children.
This example was also used to show what can be done when it is not possible to assign a value to each outcome state. Such cases are problematic because the final outcome of the process is defined as the average outcome, and the average can only be computed if each outcome state has a value. However, in those cases, there is often one state that is of particular interest, and by assigning the value 1 to that state and the value 0 to all other states, the outcome becomes the probability of attaining the state of interest. This way the effect on the final outcome becomes a marginal effect on the probability of attaining the state of interest.
In conclusion, this article has shown how the study of a sequential process, like attaining education or getting children, can be enriched by studying both effects on passing each transition within that process and effects on the final outcome of that process as complementary pieces of information.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.