Abstract

Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise.
The publication of our article in Sociological Methodology was the successful conclusion of a long “sequence” involving one journal’s refusal to referee it (for not being in its field), a presentation at the RC33 conference of the International Sociological Association in 2012, and submission to Sociological Methodology, followed by rounds of revisions. The symposium concerning the article has thus gone much further than we would have hoped when we started the work in 2009. Indeed, we are grateful for the opportunity we have had to interact with specialists in sequence analysis (and life-course analysis) and in this way construct a necessarily partial and temporary status report on the progress made using this family of techniques. We do not have the space in this rejoinder to discuss all the criticisms and observations we have received. However, certain patterns emerged, and we shall try to address the most “robust.” We must first point out and rectify a few misunderstandings.
1. It All Depends
The first misunderstanding concerns the contrast between local and global interdependence, which we explain in section 2 of the article. This contrast as we see it is a “conceptual” one, in the sense that it concerns the way the interdependence between the dimensions of the sequences is “grasped” and recorded by statistical techniques. It therefore precedes the chain of analysis. For example, to study the life-courses of two spouses after they formed a couple, it is appropriate to consider these life-courses as being simultaneous (because they develop jointly within each couple) and to compare the couples from point to point. This is a case of local interdependence, and multichannel sequence analysis (MCSA) is particularly appropriate. One of us has indeed used MCSA in exactly this sort of case (Pailhé, Robette, and Solaz 2013). But if we now turn to the study of homogamy on the basis of the two spouses’ past life-courses leading to their forming a couple, their alignment from point to point makes little sense, and this is a case of global interdependence, to be analyzed with globally interdependent multiple sequence analysis (GIMSA).
Fasang, in her commentary (this volume, pp. 56–70), appears to understand this distinction between local and global interdependence in a different sense, concerning the interpretation of results. That is, subsequent to the analysis chain, a question arises: are the dimensions of the sequences substantively linked in a general manner or at certain specific points in their development? This recalls the event/sequence dichotomy or, in Billari’s (2001) terms, that between the atomistic and the holistic approach.
Under our definition of the distinction between local and global interdependence, the comparison between MCSA and GIMSA is less relevant because the two techniques do not address the same problem. That is why, in the article, we compare GIMSA and strategy 4, both of which address global interdependence. 1
The fact that Fasang’s comparison of MCSA and GIMSA on the basis of our application leads to similar results does not imply that the two techniques are interchangeable and that one might reasonably choose the simpler one. This comparison reveals rather the existence of deeper structural patterns in the analyzed data (as is often the case with empirical data in the social sciences).
2. Inflexibility Goes before a Fall
The difference between MCSA and GIMSA, therefore, is “conceptual.” But it is also practical: GIMSA analyzes dimensions of multidimensional sequences of varying length, with different time windows (e.g., age vs. calendar years) and time units (years, months, etc.), and uses different metrics for each dimension so as to emphasize a particular aspect of time (order, duration, date). 2 MCSA can just about cobble together the data formatting, aligning dimensions that differ in length, time window, and time unit, for example by using missing value states (cf. Fasang, this volume, pp. 56–70), but the sociological significance of this “forced” alignment remains questionable. 3 Last but not least, MCSA uses a single metric.
GIMSA’s practical flexibility, pointed out by a number of commentators—especially by Pavalko (this volume, pp. 73–76) and by Fan and Moen (this volume, pp. 51–56)—is one of the elements in its added value (disputed in other commentaries). It involves a series of choices, seen by some as a weakness, in the sense that users are no longer perfectly “controlling” what they are doing, and the “robustness” of the method is allegedly weakened by this. We cover robustness issues in section 4, but here we shall merely note that this concerns a debate between flexibility and simplicity similar to that about optimal matching (OM) analysis some years ago, a debate in which it is not for us to take sides. 4
Each of GIMSA’s steps has an unmistakable and indispensable role, with a varying latitude of decision:
Choosing a dissimilarity measure is indeed necessary when tracking patterns, because these emerge from similarity groups. Can methods that allow only one dissimilarity measure be viewed as more robust because they preclude choice? It all depends on whether this dissimilarity measure is indeed unique for theoretical reasons. Failing that, the fact that GIMSA can support various possible choices of dissimilarity measure should be viewed as an asset, not as a drawback: it should lead researchers to justify their choices of one measure over another or to use several measures and compare their outputs, looking for discrepancies as well as invariants across them.
Multidimensional scaling (MDS) involves no real decision. It simply translates the dissimilarity into the closest Euclidean distance and outputs the corresponding coordinates of units.
Canonical partial least squares (PLS) searches both spaces for principal “directions of matching.” This also involves no choice other than the number of retained directions. This choice is a necessary compromise between the richness of the description of the matching (in terms of dimensions) and its quality: the more dimensions we retain, the less strong the matching. Now, any method concerned with matching should have the following two concerns: (1) providing the ability to tune the demanded level of matching quality and (2) keeping the dimensions of noise (i.e., dimensions carrying structurally weak information) away from those considered in the matching. PLS is one of the simplest ways to achieve that, because it involves no tuning parameter. 5 Any regularized type of canonical correlation analysis could also be used here, 6 on the condition that the regularization be based on the structural strength of the components so as not to find correlations between noisy (i.e., non-information-bearing) features. This is the value of PLS.
The clustering step seems to us the one that involves questionable choices. It is also the only noncompulsory step in GIMSA: after identifying the structural “dimensions of matching” (previous step), we could analyze them in terms of life-history events by correlating them with all kinds of life-history descriptors and thus without having to perform clustering. Clustering is rightly famous for the many arbitrary choices it demands. This echoes the fuzziness of its root question: what is similar to what, how, and in what respect? But here, the final clustering is but one of the many ways to interpret the dimensions of matching. Ideally, these dimensions should be analyzed in a number of alternative ways, to extract the maximum amount of the information they capture.
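The first three steps described above can be illustrated with a minimal sketch. It is not the implementation used in the article: the toy sequences, the Hamming dissimilarity, the function names, and the use of the PLS-SVD variant (leading singular vectors of the cross-covariance matrix) are all assumptions made for the sake of a short, self-contained example.

```python
import numpy as np

def hamming_matrix(seqs):
    """Pairwise number of positions at which equal-length sequences differ.
    (Assumed metric; any dissimilarity could be substituted per dimension.)"""
    n = len(seqs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d[i, j] = sum(a != b for a, b in zip(seqs[i], seqs[j]))
    return d

def classical_mds(d, k):
    """Torgerson scaling: coordinates of the units on the k leading axes."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]        # indices of the k largest
    # Negative eigenvalues (the non-Euclidean part) are clipped to zero.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def pls_svd(x, y, k):
    """Paired directions maximizing the covariance between x- and y-scores."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    u, s, vt = np.linalg.svd(x.T @ y, full_matrices=False)
    return x @ u[:, :k], y @ vt[:k].T, s[:k]

# Hypothetical dyads: dimension A has 4 time points, dimension B has 6,
# so the two dimensions need not share a length or a time unit.
dim_a = ["AAAA", "AAAB", "AABB", "BBBB", "BBBA", "BBAA"]
dim_b = ["XXXXXX", "XXXXXY", "XXXXYY", "YYYYYY", "YYYYYX", "YYYYXX"]

coords_a = classical_mds(hamming_matrix(dim_a), k=2)
coords_b = classical_mds(hamming_matrix(dim_b), k=2)
scores_a, scores_b, strengths = pls_svd(coords_a, coords_b, k=2)
```

In this sketch `strengths` holds the covariance carried by each retained pair of directions, in non-increasing order, which is one way to see the compromise between the number of dimensions retained and the strength of the matching. The paired scores could then be clustered or correlated with life-history descriptors.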
GIMSA’s flexibility means that it is a particularly suitable instrument for studying linked lives, but like sequence analysis in general, its potential field of application goes beyond life-course analysis. Consequently, we would invite colleagues to disinhibit their “sociological imagination” as Mills (1959) recommended, and include data that are perhaps richer than they habitually use.
3. Guilty by Association?
The second misunderstanding concerns the aims of GIMSA and the analysis of multidimensional sequences generally. As Studer (this volume, pp. 81–88) astutely points out, the cluster analyses we habitually use are not designed to analyze the degree of association between dimensions and are not suited to do so. We obviously agree with this: GIMSA, like MCSA, is a pattern search technique—no more, no less. We plead guilty to the sloppy use of vocabulary (noted by Studer), particularly in the description of the results of the application, in which we tended to overuse “link” terminology. The clustering step cannot, and therefore should not, be interpreted as a way of finding connections, but rather it should be seen as a way of broadly summarizing the connections teased out by the PLS components submitted to clustering. This clustering step is only secondary anyway: GIMSA is mainly the combination of the first three steps (see above). Here too, we make no claim to be doing any more than fishing for patterns of dyads of sequences. 7 This remark may disarm some of the criticisms made of GIMSA, for it can easily be seen that they are indeed expressed in terms of the degree of association between dimensions.
4. What Is “Pattern Searching” in Social Sciences About?
This misunderstanding evokes more serious differences of opinion about how to envisage the use of statistics in social sciences. When Andrew Abbott introduced OM into the world of social sciences in the 1980s, this took its place within a broader discussion of what he called “general linear reality” (Abbott 2001b; Robette forthcoming). He saw the “methodological framework” of the social sciences as being structured by a set of dichotomies: quantitative versus qualitative, positivism versus interpretation, and so on (Abbott 2001a:28). These dichotomies possess “elective affinities,” of which the most profound associates positivism with analysis and narrative with interpretation. Abbott sought to break down these affinities by reintroducing a narrative dimension into positivism. This meant proposing an alternative to the “paradigm of variables” that dominates quantitative empiricism and its implicit presuppositions (Abbott 2001b; Fabiani 2003). The analysis of sequences provides a set of tools for developing this alternative, among which Abbott singled out OM. In 2000, an article by Abbott and Tsay in Sociological Methods and Research was followed by comments by Levine and by Wu. Levine (2000) took up a firm position in favor of general linear reality, reproaching OM mainly for not meeting the standards of stochastic models. Wu (2000), a specialist in event-history analysis, adopted the same point of view and also formulated more targeted criticisms of particular aspects of the method, such as the sociological meaning of the operations of substitution, insertion, and deletion of elements within sequences and the inclusion of the order of the events in the sequences. 
In response to all these criticisms, Abbott corrected what he saw as miscomprehensions about the workings of OM and more particularly resituated the method within the dichotomy of general linear reality versus narrative-descriptive methods: any assessment of OM against the bases of mainstream statistical methods de facto invalidates most of the criticisms (Abbott 2000):
OM algorithms are not models, nor are they premised on models. That is the foundation of their difference from standard methodologies. They simply look for patterns or regularities. The type of regularity they seek can be varied by varying the structure and parameters of the algorithm. But the algorithms do not rest, ultimately, on an idea of how the data are generated. (p. 67)
And yet in this symposium, just as more broadly in the assessments of research on the basis of sequence analyses, the criticisms have often been founded on principles close to criteria of scientificness calqued on those of the experimental sciences—on an “instrumental positivism” as defined by Bryant (1989), who called it
“instrumental” insofar as it is the available research instruments that mark out the object of research, and “positivist” because this self-imposed constraint of sociologists reflects their desire to submit to an analytical rigor similar to that they attribute to the natural sciences. (p. 64, retranslated)
For example, in his comment, Elzinga (this volume, pp. 45–51) considers that a degree of agreement of .65 between two clusterings is not satisfactory, contrary to what we state in our article. According to his view, one cannot settle for a value below .9. He illustrates this with some amusing and revealing examples: the allocation of children to one educational program or another and of patients to one therapy or another. But that is precisely the point: we are social scientists, not policymakers or doctors; decision making is far beyond our scope. In quantitative sociological research, it is common practice to use a significance threshold of 5 percent. This is merely a statistical habit: who would undergo vision correction surgery if medical engineering tolerated a similar degree of error? Many commonly accepted rules for statistical choices in our disciplines are social constructs, traditions based on no real theoretical foundations. These choices can be only contextual and often empirical, and any normative aspiration is founded on a poor understanding of the particular epistemology of the social sciences (Passeron 1991). The general problem of thresholds is easy to understand: just try to answer the question “How many grains of sand make a sand pile?”
This “instrumental positivism” recurs in the matter of the number of classes of typology produced by sequence analysis. Again and again, the referees of articles we have submitted to various journals (and here Sociological Methodology is no exception) have come back with remarks such as “there is no ‘numerical’ or ‘statistical’ criterion mentioned to motivate the choice of a cluster solution.” Lurking in the background is the idea that there is a “true” solution, or at least a “best” solution, which statistical tools are intended to reveal.
However, any automatic classification procedure will place all the individuals in a study population into mutually exclusive groups. So any of the possible solutions is “true.” As for which is the “best,” no general answer can be given, even for a single set of data: it all depends on the research question, the interpretability of the results and their value for advancing current sociological themes, the use to be made of the typology, and so on. 8
As Williams and Lance (1965), cited in our article, asserted, a typology is not true or false; it is profitable or unprofitable. They added,
To define an optimum method we should have to formalize the situation sufficiently to estimate, and thence to maximize, the expected profitability. The purpose of such methods is not to displace the intuitive taxonomist, but to suggest to him potentially fruitful lines of investigation. (p. 160)
To base the choice of number of classes on a statistical criterion is less a guarantee of scientificness on the researcher’s part than an abdication of responsibility.
But our view does not appear to be widely shared: as Aisenbrey and Fasang (2010) noted, the “validation” of sequence analysis results is repeatedly criticized. They suggested that a remedy might be to use cutoff criteria based on the dispersion of within- and between-cluster distances and to take the best solution to be the number of classes at the point at which the ratio of within- to between-cluster distances falls below .5 for the first time. 9 But what is the theoretical basis for this threshold? It is merely a heuristic. Furthermore, there are many cutoff criteria, and they do not necessarily lead to the same conclusions, so it is easy for cunning researchers to choose the criteria that suit them best so as to satisfy their peers while preserving their own choices. The whole apparatus of validity tests, robustness checks, sensitivity tests, and “noise models” may well have some use, but mainly for improving one’s chances of being published in the leading journals by aping the experimental sciences’ criteria of scientificness. The wisest thing to do when taking an exploratory, heuristic, and nonconfirmatory approach would be to (1) use as many instruments as possible that seem to be technically suited to identifying the patterns one wishes to discover (e.g., correlations, partitions), with a wide range of values for their tuning parameters, and (2) compile and critically interpret the similarities and differences between the results obtained, so as to sort out the more robust patterns from the weaker ones (those depending most on the observation instrument), or even from pure artifacts via meta-analysis.
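The dependence of such a cutoff rule on its threshold can be shown with a small sketch. The definition of the ratio used here (mean within-cluster distance divided by mean between-cluster distance), the toy data, and the function names are assumptions for illustration; the criterion discussed by Aisenbrey and Fasang (2010) may be defined differently in detail.

```python
from itertools import combinations

def within_between_ratio(dist, labels):
    """Mean within-cluster distance divided by mean between-cluster distance.
    Assumes at least one within-cluster pair and one between-cluster pair."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        target = within if labels[i] == labels[j] else between
        target.append(dist[i][j])
    return (sum(within) / len(within)) / (sum(between) / len(between))

def first_k_below(dist, partitions, threshold):
    """Smallest number of classes whose ratio falls below the threshold."""
    for k in sorted(partitions):
        if within_between_ratio(dist, partitions[k]) < threshold:
            return k
    return None

# Hypothetical units placed on a line; distance is the absolute difference.
points = [0, 1, 3, 4, 8, 9]
dist = [[abs(a - b) for b in points] for a in points]

# Two candidate partitions of the same units (labels are cluster ids).
partitions = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}

k_at_half = first_k_below(dist, partitions, 0.5)
k_at_third = first_k_below(dist, partitions, 0.3)
```

On these toy data the rule retains two classes with a threshold of .5 and three classes with a threshold of .3: the “best” solution moves with a number that has no theoretical justification.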
When Benzécri (1973) developed correspondence analysis from 1962 to 1965, he was hoping to “discover the hidden properties, higher in the natural hierarchy of causes than those that are obvious, which control the obvious ones” (p. 48). In his view, therefore, “since the realities of this world are things created by God, the statistician’s work is to work back from the facts to the essence of things, the shape the Creator gave them” (Cibois 1981:339). Those using this technique immediately set aside these philosophical foundations. But one may well wonder whether, driven out by the door, these ideas have not slipped back in through the open window of mainstream statistics in its quest for the “true” or the “best” solution. 10
With correspondence analysis, Benzécri also intended to introduce into France a way of doing and seeing statistics similar to the data analysis practiced by English-speaking researchers (Cibois 1981), which Rouanet and Lépine (1976) described as follows:
It designates not really a set of techniques, let alone an “established doctrine,” but rather “a certain idea of statistics” whereby it is legitimate in principle (even if in practice problems arise) to examine the data in order to interpret them, whatever the intentions and procedures of their collection may have been, without the need to confine oneself to a model or restrictive hypotheses. (pp. 137–38)
It is within this legacy, we believe, that pattern search techniques such as sequence analysis should be placed.
5. What Are You Going to Do for Us Presently?
Now that these misunderstandings have been cleared up, we may attempt to summarize the encouraging prospects for research into sequence analysis outlined by the comments in this symposium.
First, as has been argued, the automatic classification of multidimensional sequences is not a tool for examining the degree of association between dimensions. However, the question of the association between dimensions is a central one, and there are already some ideas for research in that direction. Elzinga (this volume, pp. 45–51) suggests using distance matrices of the various dimensions, analyzing their association from Mantel, Kendall, or Rv coefficients and another coefficient based on the notion of “local monotonicity” (see also Piccarreta and Elzinga 2013). Studer (this volume, pp. 81–88) mentions Cramer’s V and standardized Pearson residuals (to analyze the contingency table of typologies for each dimension), discrepancy analysis (see also Studer et al. 2011), and “sequences of typical states” based on implicative statistics (see also Studer 2012). Taken together, these techniques already provide a copious toolbox, which we should use and test more widely.
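As an illustration of the first of these tools, a Mantel-type association between the distance matrices of two dimensions might be sketched as follows. This is a generic permutation version of the Mantel correlation, not the procedure of any commentator; the toy matrices, the number of permutations, and the function name are assumptions.

```python
import numpy as np

def mantel(d1, d2, n_perm=999, seed=0):
    """Pearson correlation between the off-diagonal entries of two distance
    matrices, with a one-sided permutation p-value (units are relabeled
    jointly on the rows and columns of d1)."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    upper = np.triu_indices(n, k=1)
    observed = np.corrcoef(d1[upper], d2[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        permuted = d1[p][:, p]
        if np.corrcoef(permuted[upper], d2[upper])[0, 1] >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: the second dimension's distances are a monotone
# transformation of the first dimension's distances.
positions = np.array([0.0, 1.0, 3.0, 7.0, 12.0, 20.0, 33.0, 54.0])
dist_a = np.abs(positions[:, None] - positions[None, :])
dist_b = np.sqrt(dist_a)

r, p_value = mantel(dist_a, dist_b)
```

Such a coefficient addresses the degree of association between dimensions directly, which is a different question from the pattern search performed by clustering.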
Nearly 30 years after OM analysis was introduced into the social sciences, the question of comparing metrics remains open. A number of studies of systematic comparison have shown that many existing metrics gave closely similar results, although some metrics do stand out (Robette and Bry 2012; Studer and Ritschard 2014). Indeed, the recent subsequence vector representation metrics seem particularly effective when focusing on the order of elements within sequences (Elzinga and Studer 2015; Elzinga and Wang 2013). We should bear in mind that the choice of metric, although it certainly does not fundamentally alter the results, is no trivial matter, and it may be instructive to test a number of metrics on one set of data before proceeding with analyses. 11
Piccarreta’s point is also important: “Can sequences be so easily substituted by the MDS scores?” (this volume, p. 80). Abbott and De Viney (1992) appear to say yes in their article on policy adoption sequences (see also Halpin and Chan 1998). MDS applied to the matrix of distances between national sequences enables them to identify two main structuring factors, interpreted as the timing of pension program adoption and the timing of health insurance adoption. These two factors are then analyzed separately as dependent variables. However, MDS provides only a Euclidean approximation of a dissimilarity that is not necessarily Euclidean. Any Euclidean metric is perfectly rendered by the full set of MDS components, 12 whereas a non-Euclidean metric is rendered only approximately. So the question is, what information is lost by substituting MDS components for the distance matrix originally chosen; that is, what is the “non-Euclidean share” of this distance? Thorough research would be needed into ways of finding the Euclidean within the non-Euclidean. The use of MDS for sequence analysis probably deserves wider investigation (Piccarreta and Lior 2010) before we adopt it as a matter of routine.
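One possible way of putting a number on this “non-Euclidean share,” offered only as an assumption and not as an established answer to the question, is the proportion of absolute eigenvalue mass carried by the negative eigenvalues of the double-centered matrix on which classical MDS operates. The sketch below uses hypothetical data and a hypothetical function name.

```python
import numpy as np

def negative_eigenfraction(d):
    """Share of absolute eigenvalue mass carried by negative eigenvalues of
    the double-centered squared-dissimilarity matrix (0 for a Euclidean
    distance, up to rounding error)."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    vals = np.linalg.eigvalsh(b)
    return np.abs(vals[vals < 0]).sum() / np.abs(vals).sum()

# Euclidean case: distances between random points in three dimensions.
rng = np.random.default_rng(0)
pts = rng.normal(size=(8, 3))
euclidean = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

# Non-Euclidean case: shortest-path metric on a star with three leaves
# (center at distance 1 from each leaf, leaves at distance 2 from each other).
star = np.array([
    [0, 1, 1, 1],
    [1, 0, 2, 2],
    [1, 2, 0, 2],
    [1, 2, 2, 0],
])

share_euclidean = negative_eigenfraction(euclidean)
share_star = negative_eigenfraction(star)
```

The star metric cannot be embedded in any Euclidean space, so its share is strictly positive, whereas the share of the Euclidean distance is zero up to rounding. Whether such an index captures what matters sociologically in a sequence dissimilarity remains an open question.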
Finally, one last prospect for research is the connection between the local and global (here in Fasang’s sense), that is, event and sequence. As Fan and Moen point out in their comment, one might, for example, ask “how a given transition in one person’s life is tied to temporal patterns in another’s.” The path toward combining the standard tools of event-history analysis and those of sequence analysis appears at first blush to be a stony one in both technical and epistemological terms, but it may not be totally impassable. Studer’s (2012) “sequences of typical states” may well supply another line of enquiry. Let us bet that this is the direction that will be taken by the most stimulating innovations in sequence analysis in the years ahead.
