Abstract
Because optimal matching (OM) distance is not very sensitive to differences in the order of states, we introduce a subsequence-based distance measure that can be adapted to subsequence length, to subsequence duration, and to soft-matching of states. Using a simulation technique developed by Studer, we investigate the sensitivity, relative to OM, of several variants of this metric to variations in order, timing, and duration of states. The results show that the metric behaves as intended. Furthermore, we use family formation data from the Swiss Household Panel to compare a few variants of the new metric to OM. The new metrics have been implemented in the freely available TraMineR package.
Introduction
Sequence analysis is the generic name for a variety of methods that serve the analysis of state sequences such as life courses and job careers. Today, sequence analysis has become one of the standard toolboxes for those who analyze sequence data, and sophisticated, user-friendly software for such methods is freely available (e.g., Brzinsky-Fay, Kohler, and Luniak 2006; Elzinga 2009; Gabadinho et al. 2011).
To compare sequences, one needs a measure of distance or similarity between pairs of sequences and by far the most frequently used metric to generate such distances is the so-called optimal matching (OM) metric.
The OM metric expresses distances in terms of the minimum cost of a sequence of edit operations that turns one sequence into an exact copy of the other sequence. In the sequel, we will write
Ample descriptions of the metric and the associated algorithm can be found in numerous sources, for example, in Clote and Backofen (2000), in Martin and Wiggins (2009), and in Sankoff and Kruskal (1983). Largely motivated by the problem of determining the weight or cost of the edit operations involved, many variants of the metric have been proposed, some quite general (e.g., Gauthier et al. 2009; Halpin 2010; Moen 2000), others more application-specific, as in Lesnard (2008). For a comprehensive overview of OM variants, the reader is referred to Studer (2012).
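To make the edit-based definition concrete, the following minimal sketch computes a minimum edit cost by dynamic programming. It is an illustration only, not the implementation used in any of the software cited above; the names `om_distance` and `constant_cost`, the indel cost of 1, the constant substitution cost of 2, and the toy sequences are all assumptions chosen for the example.

```python
def om_distance(x, y, sub_cost, indel=1.0):
    """Minimum total cost of insertions, deletions and substitutions turning x into y."""
    n, m = len(x), len(y)
    # d[i][j] = cheapest way to turn the first i states of x into the first j states of y
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete the first i states of x
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert the first j states of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = x[i - 1], y[j - 1]
            substitute = d[i - 1][j - 1] + (0.0 if a == b else sub_cost(a, b))
            delete = d[i - 1][j] + indel
            insert = d[i][j - 1] + indel
            d[i][j] = min(substitute, delete, insert)
    return d[n][m]


def constant_cost(a, b):
    """Hypothetical substitution cost: every pair of distinct states costs 2."""
    return 2.0


# Hypothetical toy sequences over the states S, U and M
example = om_distance("SSUM", "SUMM", constant_cost)
```

With these particular costs, a substitution is never cheaper than a deletion followed by an insertion, so the result equals the sum of the two sequence lengths minus twice the length of a longest common subsequence.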
For various reasons, the use of OM in the social sciences has been widely criticized, most notably in Settersten and Mayer (1997), Dijkstra and Taris (1995), Wu (2000), Levine (2000), Elzinga (2003), and Lesnard (2008). The first major point of critique has been that, in the social sciences, edit operations have no interpretation; they cannot be interpreted as spontaneous or selection-driven mutations as in microbiology. However, OM can be interpreted in a way that does not involve edit operations at all but instead refers to the concept of a longest common subsequence (
wherein
The second major criticism of OM pertains to the fact that often there is no objective way of establishing the edit costs of the edit operations. Various, more or less sophisticated, ways of deriving edit cost from state-transition frequencies have been devised (see, e.g., Gauthier et al. 2009), but these methods do not resolve the basic issue: establishing the proximity or similarity of states and how this similarity can be derived and operationalized from social science theory. This matter has not been resolved and cannot be resolved within the framework of sequence analysis (Studer 2012). On the other hand, Hollister (2009) and Studer (2012) propose promising strategies to establish state proximities through scaling strategies that are independent of sequence analysis. Despite the trouble we have in finding acceptable ways of establishing state proximities or state similarities, we cannot do without them. To illustrate this point, we consider three toy sequences from the domain of family formation, using the states Single, Unmarried cohabitation, Married, and Married with Children:
Whatever metric we use, we should find that x is closer to y than it is to z, simply because the state U is more similar to the state M than to the state S. The second reason to consider state similarities is that they are a key ingredient in computing multichannel distances (Pollock 2007; Gauthier et al. 2010).
According to some authors, it is more convenient to invest in defining OM costs than to move to a more analytical definition of the dissimilarities. That might be a viable strategy if establishing edit costs were the only challenge for OM in particular or for sequence analysis in general.
However, not so well known or ignored is the fact (Elzinga 2003; Studer 2012) that OM is not very sensitive to differences in the order of the states of a pair of sequences. As an example, we consider the three toy sequences below representing careers, using state
According to the OM metric, sequences x and y are the closest pair because they share a long “common narrative”: the seven time units spent in the
This lack of sensitivity to ordering is problematic since sequence analysis is about differences between categorical time series in which event orderings or state orderings define the sequences. Moreover, the ordering of the states reflects the internal dynamics of a trajectory, one of the important aspects that sequence analysis claims to take into account. Therefore, Elzinga (2003, 2005) proposed a distance metric that is based on a subsequence-based vector space. Subsequences allow analyzing states in their contexts and thus the dynamics of the trajectory. In our example, the subsequence
However, Elzinga’s metric does not allow for different state proximities: All states are considered as equally different (see also Hollister 2009). Therefore, we propose a very flexible generalization of Elzinga’s subsequence-based metric that does handle such state proximities. The metric has a number of interesting properties that are best explained through representing sequences as vectors in a vector space with a Euclidean norm.
As said before, one limitation of OM is the way it handles time. This limitation is caused by the fact that OM counts edits applied to symbols in a string and has no inherent mechanism to deal with quantities like duration. Therefore, the observation that someone was unemployed (
that is, as a sequence of 40 observations of states. This mapping of durations to strings of states severely limits the way in which time or duration can be handled (see, e.g., Halpin 2010). However, representing the observations as
that is, as two states, one with a duration of 10 months and one with a duration of 30 months, may seem more natural. Choosing a different time scale, say years, would then invite us to write
Formally, a state sequence like for example,
that is, a state sequence and a numerical sequence. Using a metric that can handle time as a quantity that can be separated from the states would allow for a more sophisticated treatment of the time dimension (see, e.g., Abbott and Hrycak 1990; Halpin 2010).
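The conversion from a string of repeated observations to a spell representation, and its separation into a state sequence and a duration sequence, can be sketched as follows. The state labels "U" and "E" and the function names are hypothetical and serve only to mirror the 10-month and 30-month spells of the example above.

```python
from itertools import groupby


def to_spells(states):
    """Collapse runs of identical consecutive states into (state, duration) pairs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(states)]


def split_spells(spells):
    """Separate a spell sequence into its state sequence and its duration sequence."""
    states = [state for state, _ in spells]
    durations = [duration for _, duration in spells]
    return states, durations


# Hypothetical record: 10 monthly observations of "U" followed by 30 of "E"
monthly = ["U"] * 10 + ["E"] * 30
spells = to_spells(monthly)
state_sequence, duration_sequence = split_spells(spells)
```

Keeping the durations in a separate numerical sequence is what makes it possible to rescale or transform time without touching the order of the states.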
The purpose of this article is to discuss a family of distance measures that is quite sensitive to differences in the sequencing of the states or events, that allows for proper duration handling and time transforms, and that uses state proximities. We will discuss the metrics and we will demonstrate their sensitivity, relative to OM. Finally, we will demonstrate their practical use in an application to family formation.
Thereto, the next section introduces the representation of sequences through feature vectors, the features being subsequences. The third section then discusses soft-matching, the use of state proximities, and the required transform of the vector space. The fourth section discusses spell sequences: sequences where duration is treated as a property of the states. In the fifth section, we discuss the unifying framework of a feature vector representation and in the sixth section, we assess the sensitivity of the metrics to differences in sequencing, timing, and duration of the pertaining states. Finally, in the seventh section, we apply the newly introduced metrics to family formation data and compare the results with those obtained using OM. In the eighth section, we discuss our findings and the merits thereof.
Sequences as Vectors
Vectors and Distances
What makes vector representations so interesting? Vector representations are interesting because once we have vectors, there is a whole family of distance measures that are proper metrics in the sense that these distance measures satisfy the axioms of a metric:
D1 states that an object has one location only and D2 states that two distinct objects cannot be in the same location. D3, the symmetry axiom, states that direction does not affect distance, and D4, the so-called triangle inequality, states that “a detour takes at least as much time” or, put differently, that when two objects (x and z) are close to a third object (y), they cannot be remote from each other.
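For reference, the four axioms can be written in their standard textbook form for a distance function d on a set of objects. The assignment of the labels D1 to D4 to the individual lines below is an assumption inferred from the verbal description above, not a transcription of the original display.

```latex
\begin{align*}
\text{D1:}\quad & d(x, x) = 0, \\
\text{D2:}\quad & d(x, y) > 0 \quad \text{if } x \neq y, \\
\text{D3:}\quad & d(x, y) = d(y, x), \\
\text{D4:}\quad & d(x, z) \le d(x, y) + d(y, z).
\end{align*}
```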
The triangle inequality is not only important because it formulates an intuitive property of a quantified space. The triangle inequality also ensures that objects can be located with respect to each other without other objects being involved. If the triangle inequality did not hold, the observation of
Finally, the triangle inequality ensures that the space exhibits a certain regularity or smoothness in the sense that at least some of its properties are invariant in all directions. If this were not true, the space, the representation of the sequences in a distance matrix, would not be very meaningful. Let us illustrate this remark: Imagine that we have observed a set of sequences
for all pairs
holds for all pairs
So the triangle inequality severely limits the location of the new sequence p in sequence space: The distances to
We know (see, e.g., Clote and Backofen 2000) that the OM distance
Illustration of the Metric Properties of the OM Standard Edit Cost Matrix.
Note. OM = optimal matching. The reader verifies that, in the left hand matrix,
Once vectors are available, it is easy to calculate Euclidean distance
In this article, we will only be concerned with Euclidean distances since they can be evaluated without even “knowing” or constructing the vectors explicitly: From equation (5), we see that we can evaluate the distances, provided that we have access to the values of the inner products
Finally, vector spaces are very attractive to work with because they have been amply studied in linear algebra (see, e.g., Meyer 2000) and much of this knowledge is exploited in the standard multivariate statistical models.
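The remark that Euclidean distances can be evaluated from inner products alone rests on the identity that the squared distance equals the inner product of x with itself, plus that of y with itself, minus twice the inner product of x and y. The sketch below assumes that this is the relation meant by equation (5); the function names and the small explicit vectors are hypothetical and are used only to check the identity.

```python
import math


def distance_from_inner_products(k_xx, k_yy, k_xy):
    """Euclidean distance computed from three inner products only."""
    squared = k_xx + k_yy - 2.0 * k_xy
    # Guard against tiny negative values caused by rounding
    return math.sqrt(max(squared, 0.0))


def dot(u, v):
    """Standard inner product of two explicit vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical explicit vectors, used only to verify the identity
x = (1.0, 2.0, 0.0)
y = (0.0, 2.0, 2.0)
via_inner_products = distance_from_inner_products(dot(x, x), dot(y, y), dot(x, y))
direct = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

In the sequence setting the three inner products would be supplied by a kernel function, so that the vectors themselves never have to be built.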
In the next two subsections, we will discuss how to construct vectors from sequences. Essentially, this is a new presentation of Elzinga’s proposals as discussed in Elzinga (2003, 2005). This new presentation allows us to discuss vector representations without referring to algorithms for the evaluation of vector products. In the section Mapping Embedding Frequency, we discuss an easy extension, and in the sections State Matching and Inner Product Spaces and Spell Sequences: Handling Durations, we exploit the representation to discuss more advanced issues like the handling of time and state matching.
The Basic Representation
We will construct vectors from sequences using the concept of “subsequence,” so we begin by elaborating on this concept. For a more formal treatment, the reader is referred to, for example, Apostolico and Cunial (2009); Crochemore, Hancart, and Lecroq (2007); or Elzinga, Rahmann, and Wang (2008).
Consider the toy sequence
We may take any nonnegative number of states from x and we will then be left with a subsequence of x: a subsequence u of states that have the same order in
We will now use the concept of subsequence to construct a vector representation
Formally, from
This construction characterizes strings by their subsequences and the resulting vectors are also called “feature vectors,” the subsequences being treated as features of the sequence.
The inner product
In practice, this is a very appealing feature. It means that, using a cluster analysis, sequences grouped together will share the same subsequences. In a discrepancy analysis (Studer et al. 2011), a test would be significant if the subsequences of one group are significantly different from those of the other one. This would be similar to using multivariate analysis of variance (MANOVA) in the subsequence space.
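The presence/absence representation can be illustrated by brute force on short sequences. The sketch below enumerates the distinct subsequences of a sequence and takes the inner product to be the number of distinct subsequences that two sequences have in common. The toy sequences "abac" and "abc" and the function names are hypothetical, and the empty subsequence is left out by assumption; including it would add 1 to every inner product and therefore leave all distances unchanged. Practical implementations rely on efficient kernel algorithms instead of enumeration.

```python
from itertools import combinations


def distinct_subsequences(x):
    """Set of all distinct non-empty subsequences of x (exponential, short inputs only)."""
    found = set()
    for k in range(1, len(x) + 1):
        for positions in combinations(range(len(x)), k):
            # positions are increasing, so the original order of states is preserved
            found.add(tuple(x[i] for i in positions))
    return found


def inner_product(x, y):
    """Number of distinct subsequences shared by x and y."""
    return len(distinct_subsequences(x) & distinct_subsequences(y))


def squared_distance(x, y):
    """Squared Euclidean distance between the presence/absence vectors of x and y."""
    return inner_product(x, x) + inner_product(y, y) - 2 * inner_product(x, y)


# Hypothetical toy sequences
shared = inner_product("abac", "abc")
```

The squared distance counts the subsequences that occur in exactly one of the two sequences, which is why sequences that end up close to each other share most of their subsequences.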
The vectors so constructed have a countably infinite dimension since the index function
The representation in equation (6) is very simple in the sense that it just uses the presence or absence of subsequences to represent the sequences and thus it is tempting to use substantially more interesting properties of the subsequences (provided that kernel functions exist that evaluate inner products of the resulting vectors). This is exactly what we will do in the following subsections: Define more sophisticated properties of the subsequences and use these to modify the distance measure according to its application.
Mapping Embedding Frequency
Returning to our toy sequence
Unfortunately, the sequence “Imprisoned, Probation, Convicted” is a subsequence that is embedded more than once in many a criminal career and we know that frequency of embedding of such subsequences is a relevant feature when comparing criminal careers. Similarly, the embedding frequency of the subsequence “Unemployed, Vocational Training, Employed” is an interesting feature of labor market careers.
From the previous examples, we conclude that taking embedding frequency into account when comparing sequences may be a sensible thing to do and it is easily accomplished by constructing vectors through defining coordinates according to
To interpret the meaning of
for if
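Embedding frequency, the number of distinct ways in which a subsequence occurs in a sequence, can be counted without enumerating the embeddings. The following is a minimal dynamic-programming sketch; the function name and the toy career, built from the states Unemployed (U), Vocational Training (V) and Employed (E) mentioned above, are hypothetical.

```python
def count_embeddings(u, x):
    """Number of distinct position sets at which u occurs as a subsequence of x."""
    # ways[j] = number of embeddings of the first j states of u
    #           in the part of x that has been read so far
    ways = [1] + [0] * len(u)
    for state in x:
        # Go from right to left so that the current state of x
        # extends only embeddings that ended before it
        for j in range(len(u), 0, -1):
            if u[j - 1] == state:
                ways[j] += ways[j - 1]
    return ways[len(u)]


# Hypothetical labor market career in which the pattern U, V, E recurs
career = ["U", "V", "E", "U", "V", "E"]
pattern = ["U", "V", "E"]
frequency = count_embeddings(pattern, career)
```

A coordinate that stores this count instead of mere presence distinguishes a career that passes through the pattern once from a career that passes through it repeatedly.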
Mapping Subsequence Lengths
Most people share, in most kinds of careers, a lot of single states. For example, when studying family formation careers, we know that most people start out living with their parents, later live together with a partner, and then become parents themselves. Similarly, most people go to school before starting to work, and so on. So, we may expect that many careers share the same short subsequences. Therefore, when counting the number of common or matching subsequences, that is, when using representations (6) or (7), it might be interesting to weight the counts according to the length of the subsequences by some convex function
The square root arises because we are interested in evaluating an inner product
For example, by setting
State Matching and Inner Product Spaces
In this section, we will extend the methods dealt with to using nonperfect matchings between states and, consequently, matchings between subsequences. All of the methods discussed so far construct vectors from sequences in order that the inner product of such vectors is equivalent to a weighted count of the common subsequences. Such counts are then used to construct distances and similarities. Incorporating subsequence matchings will allow us to also count nonperfect matches and weight these appropriately.
We first have to define such matchings and this is the subject of the first subsection. Once defined, we will have to investigate how we can use them. This is nontrivial since the standard inner product counts the common subsequences that are perfect matches: The vector coordinates are indexed by the set of distinct subsequences and hence the inner product
Matchings
We already argued that generating meaningful distances between sequences is hardly possible without assessing the similarity or substitutability of the states or events involved. However, the actual assessment of such quantities is highly dependent upon the subject matter of the sequences so, in a methodological essay, it is not possible to detail the evaluation of state similarity. Several authors (e.g., Chen, Ma, and Zhang 2009; Elzinga 2014; Elzinga et al. 2011; Emms and Franco-Penya 2012; Gower 1971; Gower and Legendre 1986; Tversky 1977; Wang 2006) have dealt with the general issue of similarity measures and their properties, but a detailed account of their ideas is far beyond the scope of this article. Here, it suffices to state that we assume that we have somehow defined or constructed similarities between the states of the alphabet. With an alphabet
For example, for the alphabet of living arrangements
implying that being Married is very similar to living in Unmarried cohabitation and that this similarity even increases when there are children in the household. We not only compare states, we also compare sequences and we express the degree of matching
For sequences
Most importantly, we observe that given the matrix

Structure of the matrix
The reader observes that, due to the multiplicative structure of the matchings, the matrix has a very regular structure. The reader also notes that the submatrix containing the single-state matchings regularly reoccurs, in Figure 1 as
Generalizing the Standard Inner Product
So far, we have discussed a basic vector representation of sequences that utilizes more or less sophisticated properties of the subsequences. The distance
wherein
The reader easily verifies that the standard function
But this trivial extension invites us to exchange
In Figure 2, we demonstrate that using an inner-product

Unit distance plots in
Just like actually constructing vectors from sequences is hardly practically feasible, it is not feasible either to generate the full matrix of matchings
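On very short sequences, the idea of soft-matching can nevertheless be illustrated by brute force. The sketch below rests on several assumptions that are not taken from the displays of this article: the matching of two subsequences is taken to be the product of the similarities of their states at corresponding positions, subsequences of unequal length are taken to have a matching of zero, and coordinates are presence/absence indicators. The alphabet, the similarity of 0.5 between the states b and c, and all function names are hypothetical.

```python
from itertools import combinations


def distinct_subsequences(x):
    """Set of all distinct non-empty subsequences of x (short inputs only)."""
    found = set()
    for k in range(1, len(x) + 1):
        for positions in combinations(range(len(x)), k):
            found.add(tuple(x[i] for i in positions))
    return found


def match(u, v, similarity):
    """Multiplicative matching of two subsequences; zero if lengths differ (assumption)."""
    if len(u) != len(v):
        return 0.0
    degree = 1.0
    for a, b in zip(u, v):
        degree *= similarity[a][b]
    return degree


def soft_inner_product(x, y, similarity):
    """Sum of the matchings over all pairs of distinct subsequences of x and y."""
    return sum(match(u, v, similarity)
               for u in distinct_subsequences(x)
               for v in distinct_subsequences(y))


def similarity_table(alphabet, soft_pairs):
    """Symmetric state similarities: 1 on the diagonal, 0 elsewhere unless listed."""
    table = {a: {b: 1.0 if a == b else 0.0 for b in alphabet} for a in alphabet}
    for (a, b), value in soft_pairs.items():
        table[a][b] = value
        table[b][a] = value
    return table


hard = similarity_table("abc", {})                  # perfect matches only
soft = similarity_table("abc", {("b", "c"): 0.5})   # hypothetical proximity
```

With the identity table the soft inner product reduces to the count of common distinct subsequences, while any positive off-diagonal similarity adds weighted contributions from imperfect matches.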
Spell Sequences: Handling Durations
Durations of Subsequences
At first sight, handling durations in the context of vector representations is easy to conceptualize. Let
leading to an inner product of the form
that is, as the sum of the lengths of the spells.
When multiple embeddings do exist, we could set
but this will rarely be an appealing option since it could imply mapping quite different sequences onto the same vector. Alternatively, we might use the durations of all embeddings. So, we define
the sum of all durations of all embeddings of a particular subsequence
which can be interpreted as mapping embeddings, weighted for duration. The inner product resulting from this construction will then have the form
Equation (24) hints at an easy interpretation of the representation: Vector coordinates are averages of the durations of subsequences, weighted by their embedding frequencies.
Some authors (e.g., Abbott and Hrycak 1990; Halpin 2010) have suggested transforming time through a convex or concave function. This can be incorporated in Equation (23) by writing
Let us consider the function
Practical Considerations
All of the metrics discussed previously have been implemented in the freely available software package TraMineR (Gabadinho et al. 2011) and the required algorithms have been amply described in Elzinga et al. (2008) and in Elzinga and Wang (2013); here we will not deal with algorithmic issues.
TraMineR imposes no practical limitations on the size of the alphabet or the number of sequences in the data set to analyze. However, with
Let
McVicar and Anyadike-Danes (2002) published a data set consisting of 712 sequences of school-to-work transitions, each covering 72 months. Calculating a full distance matrix using TraMineR for this data set requires only
Converting equally long state sequences to spell sequences will normally generate spell sequences of unequal length. However, contrary to OM, working with vector representations does not require the sequences to be equally long. The reason is that the vector representing the shorter of the sequences will have zero-valued coordinates for all subsequences that are longer than the sequence itself. Therefore, multiplying vectors representing sequences of unequal length will only result in zero-valued products of coordinates referring to longer subsequences. Hence, there is no theoretical or practical objection whatsoever to calculating distances between sequences of unequal lengths.
The General Framework: Feature Vectors
So far, we presented several examples of a very general model for representing sequences as vectors, the coordinates indexed, given the state alphabet, by the sequences that are constructible from this alphabet. Given a sequence
wherein
Weighted Functions for Feature Vectors.
Note. The middle column shows the evaluation of
In the first entry of Table 2,
In the last two entries, we mention two kinds of weighting not dealt with in this article. Weighting according to gap width is relevant when one considers common subsequences with big time gaps between the states as less relevant. Weighting according to the state composition of the subsequences might be relevant when the occurrence of particular states is more salient than the occurrence of other states. The point here is that any kind of weighting can be accommodated in the general representation and it can be applied as long as we can find algorithms that allow us to evaluate the inner products of the vectors. Furthermore, it is important to stress the fact that any number of these weightings may be applied simultaneously, again provided suitable algorithms are available.
What is not shown in Table 2 is that each of these weightings can be applied with or without soft-matching of states, that is, with either an inner product of the form
So, relying on a subsequence vector representation (SVR for short) allows for an enormous versatility in weighting features, warping time, applying soft-matching, and dealing with sequences of unequal lengths. Furthermore, the interpretation of the results of well-known methods in sequence analysis is made easier. For instance, using “Ward” clustering with such a metric is equivalent to finding clusters minimizing the residual variance of the features, that is, minimizing the variability of the subsequences. Using discrepancy analysis is equivalent to running a MANOVA in which the dependent variables are the features (i.e., the subsequences).
In the next two sections, we will compare SVR metrics with OM. In particular, we will use weighting of subsequence lengths by varying the parameter
Metrics Used in Assessing Sensitivity; All of Them Weighted for Embedding Frequency.
Note. Emb = embedding frequency; OM = optimal matching; SVR = subsequence vector representation.
Assessing Metric Sensitivity
Common order of states is the basic property that defines similarity between sequences as temporal successions of states or events (Elzinga 2003). However, common order is not the only angle from which to look at sequence similarity. Another important aspect is duration. For example,
To evaluate these sensitivities, we proceed as follows. We generate two groups of sequences that differ in only one of the facets: in ordering, in timing or in duration. We then evaluate the ability of each distance measure to discriminate between these two groups using a Discrepancy Analysis. This analysis evaluates the strength of the association between the sequences as described by a distance measure and a partition (here, our two groups).
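The association measure of such a discrepancy analysis can be sketched as the share of the total discrepancy that is accounted for by the partition. The version below is unweighted and is not a transcription of the formula used in this article: it sums the dissimilarities exactly as they are passed in, so whether they should be squared beforehand is left to the caller. The function names and the four-point example are hypothetical.

```python
def sum_of_squares(dist, members):
    """Discrepancy of a group: sum of pairwise dissimilarities divided by group size."""
    n = len(members)
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            total += dist[members[a]][members[b]]
    return total / n


def pseudo_r2(dist, groups):
    """Share of the total discrepancy explained by the partition given in groups."""
    everyone = list(range(len(groups)))
    ss_total = sum_of_squares(dist, everyone)
    ss_within = 0.0
    for g in set(groups):
        members = [i for i in everyone if groups[i] == g]
        ss_within += sum_of_squares(dist, members)
    return 1.0 - ss_within / ss_total


# Hypothetical example: four objects located at 0, 0, 1 and 1 on a line,
# with squared Euclidean distances between them
points = [0.0, 0.0, 1.0, 1.0]
dist = [[(p - q) ** 2 for q in points] for p in points]
separated = pseudo_r2(dist, [0, 0, 1, 1])
mixed = pseudo_r2(dist, [0, 1, 0, 1])
```

A partition that coincides with the structure in the distances yields a value close to 1, whereas a partition unrelated to the distances yields a value close to 0.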
This association is measured using a pseudo-
wherein
In order to get stable results, one million sequences were generated in each group of sequences. Each simulation is repeated one thousand times and the results reported here show the average pseudo-
In the present context, we ran three different types of simulations as summarized in Table 4. Each of these separately evaluates the sensitivity of the metric to perturbations of one of the facets previously introduced: ordering, timing, or duration. Subsequently, we present the details of each of these simulations and discuss the results.
Patterns, Onset, and Duration Variations Used in Assessing the Sensitivity to Perturbations of Ordering, Timing and Duration. Total Duration of all Patterns Is Restricted to 20 Units of Time.
Ordering and State Proximities
For each simulation, we created two groups of spell sequences:
A metric that is very sensitive to differences in order or pattern will easily separate the two groups by generating a high value of
State proximities strongly affect the ordering. As a first example, consider again the spell sequences
Summarizing: we generate a set of sequences with two generators, calculate distances with one of the metrics from Table 3, calculate

Plots of discrepancy analysis’
The results when
The shapes of the curves convincingly show the effect of soft-matching. In a qualitative sense, the SVR metrics show the same behavior as the OM metric and they all behave as expected. In all cases, the
Timing
Timing simulations follow the same logic as the one for ordering. Patterns and durations are random in both groups, but the spell in the state
The first panel of Figure 4 presents the evolution of the

Evolution of the
Duration
We used the same strategy again for the duration simulations. Patterns and timing are random, while the duration of the second spell (in the state
The second panel of Figure 4 presents the results for duration-related simulations. By far, OM is the most sensitive to duration. NMS and SVR (spell, a = 0, b = 2) again take an intermediate position. Regarding SVR (spell), we note that
Conclusion
According to our simulations, the distance measures are sensitive to different facets. SVR (spell) variants are most sensitive to differences in ordering, NMS is most sensitive to timing, and OM to duration. This means that the choice of a distance measure always has to be justified in the context of the application at hand.
These simulations allowed us to measure the effects of the SVR (spell) parameters and to demonstrate that they behave as expected. The
We now turn to an application of these SVR metrics to real data in order to highlight the contributions of the newly introduced distance measures.
An Application to Family Formation
Data and Distances
In this section, we apply the different configurations of the SVR metrics to well-known data and compare the results with those obtained when applying OM and NMS to the same data. The data were first presented in Müller et al. (2008).
Briefly, these data represent family formation trajectories of Swiss individuals who were at least 30 years old at the time of the survey. One of the goals of this study was to highlight the change of the social norms constraining these trajectories. The states in the sequences were built using a combination of four distinct events: Leaving home, Marriage, having a first Child, and Divorce. For the sake of simplicity, some very rare states were merged, resulting in eight possible states. An individual is in the state “P” (living with Parent) if no event has occurred, in the state “L” if the event “Left parental home” occurred, in the state “LM” for “Left and Married,” and “LMC” for “Left, Married and with a first Child.” Similarly, state “M” is for an individual who just Married (without leaving parental home), and so on. Finally, state “D” is for all individuals who have married and divorced (regardless of whether they have left the parental home or have children).
To determine the substitution costs needed for the calculation of an OM distance matrix, we proceeded as follows. First, we created a four-dimensional vector for each state; the coordinates corresponding to the events shown in Table 5 by assigning 0 to “no,” 1 to “yes,” and 0.5 to “yes/no.” Then, we calculated the Manhattan distance between all pairs of states and normalized these distances to the maximum distance found (
State Definitions of Family Formation Trajectories From the Swiss Household Panel.
Note. C = child; D = individuals who have married and divorced; L = Left parental home; P = living with Parent; LM = left and married; LMC = left, married and with a first child; NMS = number of matching subsequences; sp = spell; SVR = subsequence vector representation.
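The construction of the substitution costs can be sketched as follows. The event vectors below are reconstructed from the verbal description of the states rather than copied from Table 5, they cover only six of the eight states, and the coding of state D (0.5 for the two events that may or may not have occurred) is an assumption; the names `STATE_VECTORS`, `manhattan`, and `normalized_costs` are likewise hypothetical.

```python
# Coordinates: (left home, married, child, divorced); 0 = no, 1 = yes, 0.5 = yes/no
STATE_VECTORS = {
    "P":   (0.0, 0.0, 0.0, 0.0),
    "L":   (1.0, 0.0, 0.0, 0.0),
    "M":   (0.0, 1.0, 0.0, 0.0),
    "LM":  (1.0, 1.0, 0.0, 0.0),
    "LMC": (1.0, 1.0, 1.0, 0.0),
    "D":   (0.5, 1.0, 0.5, 1.0),   # assumed coding of the merged divorce state
}


def manhattan(a, b):
    """Sum of absolute coordinate differences between two event vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def normalized_costs(vectors):
    """Manhattan distances between all pairs of states, divided by the largest one."""
    raw = {(s, t): manhattan(vectors[s], vectors[t])
           for s in vectors for t in vectors}
    largest = max(raw.values())
    return {pair: value / largest for pair, value in raw.items()}


costs = normalized_costs(STATE_VECTORS)
```

After normalization every cost lies between 0 and 1, so the same table can be read as a set of state dissimilarities that is comparable across metrics.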
In order to compare the results obtained by using different metrics, we calculated the different distance matrices using the proximities (or costs) defined previously: SVR (sp, b = 1; SVR based on spells), SVR (sp, b = 2; SVR based on spells, squared durations), SVR (sp, a = 1; SVR based on spells with subsequence weighting), and the OM distance. In order to highlight the effect of proximities, we also added the distance SVR (sp, b = 1, c), the SVR (sp, b = 1) distance computed using constant differences (i.e., a similarity of zero between all states). Finally, we included the NMS distance as defined by Elzinga, that is, with constant cost, in order to highlight the distinctive features of the newly proposed metrics. These are the same metrics as used in the simulations (see Table 3), but with shortened names in order to generate useful plots.
Distance Disagreements
To investigate the differences between the various metrics, we started by looking for pairs of sequences where the different metrics generate very different distances. Thereto, we first standardized the metrics in order to get rid of different distance units; for each metric, say metric
with the effect of creating dimensionless or unit-free distances
Analysis of Biggest Standardized Distance Differences.
Note. D = individuals who have married and divorced; L = Left parental home; P = living with Parent; LM = left and married; LMC = left, married and with a first child; NMS = number of matching subsequences; sp = spell; SVR = subsequence vector representation. Table 6 specifies the distances and the pertaining sequence pairs.
Let us discuss an example by looking at the strongest disagreement between standardized OM and standardized SVR (sp, b = 1). In the first column fourth row, we have “OM – SVR (sp, b = 1) = 4.74” for the comparison of the sequences
Finally, let us look at the difference between the SVR metrics and NMS (Elzinga 2003). Since NMS only accepts constant state proximities, we will compare the distances SVR (sp, b = 1, c) and NMS.
We can also identify the contribution of SVR distance parameters such as the time transform by looking at the differences between SVR (sp, b = 1) and SVR (sp, b = 2). As expected and confirming our simulation results, SVR (sp, b = 2) is more sensitive to time spent in each state whereas SVR (sp, b = 1) is more sensitive to ordering. Subsequence length weighting, SVR (sp, a = 1), has the effect of weighting the comparison of states in sequences containing many different spells. As a result,
We can identify the contribution of soft matching coefficients by looking for the differences between SVR (sp, b = 1) and SVR (sp, b = 1, c) (SVR (sp, b = 1) with or without soft matching coefficients). Using state proximities, the distance between
According to NMS, sequences
The analysis of distance disagreement confirms the results of the simulations. SVR (sp) variants are the most sensitive to ordering while OM distance is strongly linked with the time spent in a state. This analysis also highlights more precisely the effect of the SVR parameters. While
Clustering
We used the partitioning around medoids (PAM) algorithm (Kaufman and Rousseeuw 1990) to cluster the sequences on the basis of the four distance matrices using the sampling weights. To get an indication of the optimal number of clusters, we calculated all solutions with the number of clusters varying between 3 and 20. Table 7 summarizes the results. Both the ASW and the HC index are dimensionless measures, each depending on a ratio (of differences) of distances, and therefore, these indices can be used to compare partitions based upon different distance matrices. They can be interpreted as the capacity of a clustering method to match the structure of the data, the structure being defined by the features of each metric. The computations were carried out with the WeightedCluster library (Studer 2013). SVR-based clusterings usually identify more clusters and the best clustering quality is found with SVR (sp, a = 1).
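For readers unfamiliar with the first of these quality indices, the average silhouette width can be computed directly from a distance matrix and a cluster assignment. The sketch below is unweighted, whereas the analysis reported here uses sampling weights, and it is not the WeightedCluster implementation; the function name and the four-point example are hypothetical. It assumes that there are at least two clusters.

```python
def average_silhouette_width(dist, labels):
    """Unweighted average silhouette width for a distance matrix and cluster labels."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # the silhouette of a singleton cluster is taken to be 0
        # a: average distance to the other members of the own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest average distance to the members of any other cluster
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        widest = max(a, b)
        if widest > 0:
            total += (b - a) / widest
    return total / len(labels)


# Hypothetical example: four objects located at 0, 0, 1 and 1 on a line
points = [0.0, 0.0, 1.0, 1.0]
dist = [[abs(p - q) for q in points] for p in points]
good = average_silhouette_width(dist, [0, 0, 1, 1])
poor = average_silhouette_width(dist, [0, 1, 0, 1])
```

Because each silhouette is a ratio of distances, the index is free of the distance unit, which is what allows comparisons across partitions obtained from different metrics.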
Clustering Quality Measured Through Average Silhoutte Width (ASW, to be maximized, Kaufman and Rousseeuw [1990]) and the HC index (HC, to be minimized, Hubert and Levin (1976)) with Various Metrics as Indicated Subsequently.
Note. ASW = average silhouette width; HC = Hubert’s C; NMS = number of matching subsequences; OM = optimal matching; SVR = subsequence vector representation. The marked values denote the optimal number of clusters for each of the metrics used.
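To make the two quality indices concrete, the following minimal sketch computes unweighted versions of the ASW and of Hubert's C from a distance matrix and a given partition. It is an illustration under stated assumptions, not the computation used for Table 7: the analysis above relies on sampling weights and on the WeightedCluster library, whereas the function names, the toy distance matrix, and the two partitions below are hypothetical.

```python
from itertools import combinations


def average_silhouette_width(dist, labels):
    """Unweighted ASW of a partition, given a full symmetric distance matrix."""
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention: a singleton cluster gets width 0
            continue
        # a: mean distance to the other members of the object's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / n


def hubert_c(dist, labels):
    """Unweighted Hubert's C: 0 is the best attainable value, 1 the worst."""
    pairs = list(combinations(range(len(labels)), 2))
    within = [dist[i][j] for i, j in pairs if labels[i] == labels[j]]
    all_sorted = sorted(dist[i][j] for i, j in pairs)
    s = sum(within)
    # Smallest and largest sums obtainable with the same number of pairs
    s_min = sum(all_sorted[: len(within)])
    s_max = sum(all_sorted[len(all_sorted) - len(within):])
    if s_max == s_min:
        return 0.0  # convention for the degenerate case
    return (s - s_min) / (s_max - s_min)


# Hypothetical toy data: two tight pairs of sequences, far from each other.
toy_dist = [
    [0, 1, 5, 5],
    [1, 0, 5, 5],
    [5, 5, 0, 1],
    [5, 5, 1, 0],
]
good_partition = [0, 0, 1, 1]  # keeps the tight pairs together
bad_partition = [0, 1, 0, 1]   # splits both pairs

for name, labels in [("good", good_partition), ("bad", bad_partition)]:
    asw = average_silhouette_width(toy_dist, labels)
    hc = hubert_c(toy_dist, labels)
    print(name, round(asw, 3), round(hc, 3))
```

On this toy matrix, the partition that keeps the two tight pairs together yields the higher ASW and the lower HC, in line with the directions in which the two indices are to be optimized. Because both indices are ratios of distances, the same functions could be applied to partitions obtained from different distance matrices.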
Table 8 presents the medoids of the clusters obtained using this optimal number of groups. SVR-metrics provide very similar clustering (Cramer’s
Table 8. Medoids of the Clusters Found With Different Distance Metrics With Varying State Similarities.
Note. D = individuals who have married and divorced; L = Left parental home; P = living with Parent; LM = left and married; LMC = left, married, and with a first child; sp = spell; SVR = subsequence vector representation. Durations of states are indicated as superscripts of the state acronyms. Clusters have been placed on the same row when their state-orders match. Relative cluster sizes (
Confirming the sensitivity to timing highlighted by the simulations, NMS makes several distinctions according to the timing of transitions. However, confirming the results presented by Aisenbrey and Fasang (2010), all complex sequences are regrouped in a big, quite heterogeneous “residual” cluster (
Next, we closely scrutinize the difference between the clustering results for OM and SVR (sp, b = 1) through visually rendering the clusters.
Cluster Visualization
To visually render the clusters, we will use parallel coordinate plots, chronograms, and sequence index plots. As many readers may not be familiar with parallel coordinate plots (for short: PC plots), we first spend a few lines on them (see also Bürgin and Ritschard 2014; Bürgin, Ritschard, and Rousseaux 2012; Inselberg 2009).
A PC plot renders multivariate objects on a flat surface by first drawing as many vertical lines as there are variables or dimensions, each of which may have a different scale. Individual objects are depicted as a line, drawn in left-to-right direction, crossing the vertical (parallel) lines at the appropriate height. Often, the thickness of the object-representing lines is proportional to the number of objects that share the same coordinates. A toy example of a PC plot is shown in Figure 5.

Figure 5. Parallel-coordinate plot of a multivariate object.
Here, we use the PC plots to render the sequences by the order of the events, ignoring durations. To attain this, we use as many identical parallel scales as there are events (states) in the individual sequences. Hence, an individual’s position on the first of the scales corresponds to the first event, her position on the second parallel scale corresponds to the second event, and so on.
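The construction just described can be sketched in a few lines. The sketch below is only an approximation under stated assumptions: it works on the order of distinct successive states rather than on the underlying events, the state acronyms are those used in the note to Table 8, and the function names, the vertical scale, and the three toy sequences are hypothetical.

```python
from collections import Counter
from itertools import groupby

# Hypothetical vertical scale shared by all parallel axes.
ALPHABET = ["P", "L", "LM", "LMC", "D"]


def distinct_successive_states(sequence):
    """Collapse runs of identical states, so that durations are ignored."""
    return tuple(state for state, _ in groupby(sequence))


def pc_plot_lines(sequences):
    """One line per ordering pattern; thickness is the number of sequences sharing it."""
    patterns = Counter(distinct_successive_states(s) for s in sequences)
    lines = []
    for pattern, count in patterns.items():
        # Axis k carries the k-th state; the height is the state's index on the scale.
        coords = [(axis, ALPHABET.index(state)) for axis, state in enumerate(pattern)]
        lines.append({"pattern": pattern, "coords": coords, "thickness": count})
    return lines


# Hypothetical toy sequences, one state per year.
toy_sequences = [
    ["P", "P", "P", "L", "L", "LM", "LMC"],
    ["P", "L", "L", "L", "LM", "LM", "LMC"],
    ["P", "P", "LM", "LM", "LMC", "LMC", "LMC"],
]

for line in pc_plot_lines(toy_sequences):
    print(line["pattern"], line["coords"], line["thickness"])
```

The first two toy sequences differ in timing and durations but share the same ordering, so they collapse onto a single line of thickness 2, while the third sequence, in which leaving home and marriage coincide, is drawn as a separate line.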
Figure 6 presents the PC plots of the sequences plotted according to the SVR (sp, b = 1) clustering. Let us discuss some examples in order to illustrate the interpretation of these plots. In the plot called “P-LM,” the brown line indicates one of the patterns of the four events. It starts at position

Figure 6. Parallel-coordinate plots of the 11 clusters found from the SVR (sp, b = 1)-distances.
Using these plots, we can see that the SVR (sp, b = 1) clustering is very homogeneous with respect to the ordering of the underlying events. Only the clusters leading to divorce group different patterns, but they all end with divorce. The clusters distinguish the sequences according to the synchronization of events, notably marriage and leaving home. In a first set of clusters, leaving home is experienced before marriage, while in another set these events occur simultaneously. These are important distinctions; Billari, Philipov, and Baizán (2001) argued that the simultaneity of marriage and leaving home should be interpreted as one distinct state.
Figure 7 presents the chronograms of the six clusters found from the OM distances. From these chronograms, the clusters seem easy to interpret. Indeed they are, but only on the basis of the time spent in the states and not on the basis of the orderings of the underlying events. Figure 8 presents the PC plots of the same clustering. The underlying orderings of the events are very diverse in each cluster. For instance, looking at the cluster called “Late LMC,” at least four patterns can be identified (events in parentheses occur simultaneously): P-(LM)-C (in rose), P-(LMC) (dark blue), P-L-M-C (yellow), and P-L-(MC) (green). Using the chronogram, we are tempted to call the first group “Staying with parents,” because the mean time spent in state “P” is large. However, the PC plots show that many distinct patterns are grouped here.

Figure 7. Chronograms of the six clusters found from the OM distances.

Figure 8. Parallel coordinate plots of the six clusters found from the OM distances.
The wide use of chronograms and index plots may be one of the reasons for the popularity of OM. As we have shown with our simulations and through this example, OM is strongly linked with duration differences. This is shown in chronograms and index plots too, because the area plotted in a single color depends on the total time spent in the associated state.
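The link between a chronogram and state durations can be checked with a small sketch. It assumes equal-length, unweighted sequences; the function names and the toy data are hypothetical. A chronogram is taken here to be the cross-sectional distribution of states at each time point, and the relative area of a state is the mean of its cross-sectional shares.

```python
from collections import Counter


def chronogram(sequences, alphabet):
    """Share of each state at each time point (equal-length sequences assumed)."""
    n = len(sequences)
    rows = []
    for t in range(len(sequences[0])):
        counts = Counter(seq[t] for seq in sequences)
        rows.append({state: counts[state] / n for state in alphabet})
    return rows


def chronogram_area(sequences, alphabet):
    """Relative area occupied by each state in the chronogram."""
    rows = chronogram(sequences, alphabet)
    return {state: sum(row[state] for row in rows) / len(rows) for state in alphabet}


def total_time_share(sequences, alphabet):
    """Share of all observed time units spent in each state."""
    counts = Counter(state for seq in sequences for state in seq)
    total = sum(len(seq) for seq in sequences)
    return {state: counts[state] / total for state in alphabet}


# Hypothetical toy data.
toy_alphabet = ["P", "L", "LM"]
toy_sequences = [
    ["P", "P", "L", "LM"],
    ["P", "L", "L", "LM"],
    ["P", "P", "P", "L"],
]

areas = chronogram_area(toy_sequences, toy_alphabet)
shares = total_time_share(toy_sequences, toy_alphabet)
for state in toy_alphabet:
    print(state, round(areas[state], 3), round(shares[state], 3))
```

For every state, the two quantities coincide, which is why a chronogram conveys durations well but carries no information about the order in which any individual passed through the states.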
Comparing the two clustering solutions, OM leads to distinctions according to the time spent in each state, whereas SVR (sp, b = 1) is strongly linked with the ordering of the underlying events.
Metrics and the Evolution of Family Trajectories
If family trajectories have evolved, we would expect to see the size of clusters change over time in a systematic way (see, e.g., Elzinga and Liefbroer 2007). Here, we evaluate these changes as revealed by the clusterings based on OM distances and on SVR (sp, b = 1) distances. In Tables 9 and 10, we present the relative distributions of cluster membership per cohort, for OM and SVR distances, respectively. The association is highly significant in both cases but stronger for SVR-based clustering (Cramer’s
Table 9. Distributions of Relative Cluster Frequencies (%) per Cohort for OM-Based Clusters.
Note. Cells are colored in blue if the standardized Pearson residual is higher than 1.96 and in red if it is lower than −1.96. Clusters are characterized by their medoids. Cramer’s V = .147. Color version of the table is available online at smr.sagepub.com.
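The two statistics used in Tables 9 and 10 can be illustrated with a short sketch. It assumes an unweighted cohort-by-cluster table of counts, whereas the analysis above uses sampling weights; it also assumes that "standardized Pearson residual" refers to the usual adjusted residual. The function name and the counts are hypothetical.

```python
import math


def cramers_v_and_residuals(table):
    """Cramer's V and standardized Pearson residuals of a table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Adjusted residual: approximately standard normal under independence
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            res_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(res_row)
    v = math.sqrt(chi2 / (n * (min(len(table), len(table[0])) - 1)))
    return v, residuals


# Hypothetical counts: rows are two cohorts, columns are two clusters.
toy_table = [
    [30, 10],
    [10, 30],
]

v, residuals = cramers_v_and_residuals(toy_table)
print(round(v, 3))
for row in residuals:
    # Flag cells as in the table notes: above 1.96 or below -1.96
    print(["+" if r > 1.96 else "-" if r < -1.96 else "." for r in row])
```

A cell flagged "+" would be colored blue and a cell flagged "-" red under the convention of the table notes; Cramer's V summarizes the overall strength of the cohort-cluster association on a scale from 0 to 1.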
More interesting is the question of whether, and which, qualitative differences show up when we study the evolution of cohorts through OM or through an SVR-based metric.
Using OM, the evolution seems to be dominated by state duration changes. Older cohorts were staying longer with their parents (clusters
Clustering SVR distances provides an alternative view on this evolution by highlighting changes in the ordering of the events. Older cohorts stand out through the synchronicity of leaving the parental home and marriage. These two events frequently occurred simultaneously, but this is much less frequent in the youngest cohorts. This “de-synchronization” has been interpreted as the result of the rise of nonmarital unions in Switzerland and the introduction of a new intermediary stage of “partial independence” on the road toward autonomy (Thomsin et al. 2004). Contrary to OM, here the latest cohort is not distinguished by not marrying or not having children, but by different patterns leading to these situations.
Clearly, in this analysis of Swiss family formation sequences, the SVR (sp, b = 1)-based metric has provided new insights through revealing the underlying ordering of the events. OM leads to interesting results when we are interested in the durations spent in each state.
Conclusion and Discussion
We motivated this article by pointing at the poor performance of the OM metric with respect to a basic property of sequences: the order of the states or events involved. OM is not very sensitive to differences in the sequencing of the states concerned. This lack of sensitivity is nicely demonstrated in the application described in the previous section by the PC plots, which show very different orderings of the underlying events within clusters. This is not to say that OM cannot be a useful metric: It is useful when state durations are more important than state ordering. This too is shown in the chronograms of the previous section.
Table 10. Distributions of Relative Cluster Frequencies (%) per Cohort for SVR (sp, b = 1)-Based Clusters.
Note. SVR = subsequence vector representation. Cells are colored in blue if the standardized Pearson residual is higher than 1.96 and in red if it is lower than −1.96. Clusters are characterized by their medoids. Cramer’s V = .193. Color version of the table is available online at smr.sagepub.com.
According to our simulations, the NMS-based metric is mostly sensitive to differences in timing. However, in the application presented, one of the big NMS clusters groups together all “complex” sequences, which is not very meaningful. Such a phenomenon was already noted by Aisenbrey and Fasang (2010).
We presented a very flexible, versatile metric that does well when ordering of states is the key issue. This too was demonstrated in the previous section and in the simulations presented in the sixth section. Contrary to OM, the SVR-based metrics are less sensitive to duration and more sensitive to the sequencing, the ordering of the states. The exact behavior of the metric can be adjusted using two parameters. The exponential transformation of time (the
Our simulations and application have highlighted the difference between OM and SVR metrics. This can be used to justify the use of one or the other metric and to further interpret differences in results produced by different metrics. Finally, it also helps to interpret the structure of the data. If SVR-based distances produce better results, it might be because the data are more structured according to ordering than according to state durations. Therefore, we believe that the SVR family is a useful alternative to alignment-based methods.
Footnotes
Authors’ Note
This publication is part of the research works conducted within the Swiss National Centre of Competence in Research LIVES—Overcoming vulnerability: Life course perspectives, which is financed by the Swiss National Science Foundation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
