Copula Models for Sociology: Measures of Dependence and Probabilities for Joint Distributions

Abstract

Often in sociology, researchers are confronted with nonnormal variables whose joint distribution they wish to explore. Yet, assumptions of common measures of dependence can fail or estimating such dependence is computationally intensive. This article presents the copula method for modeling the joint distribution of two random variables, including descriptions of the method, the most common copula distributions, and the nonparametric measures of association derived from the models. Copula models, which are estimated by standard maximum likelihood techniques, make no assumption about the form of the marginal distributions, allowing consideration of a variety of models and distributions in the margins and various shapes for the joint distribution. The modeling procedure is demonstrated via a simulated example of spousal mortality and empirical examples of (1) the association between unemployment and suicide rates with time series models and (2) the dependence between a count variable (days drinking alcohol) and a skewed, continuous variable (grade point average) while controlling for predictors of each using the National Longitudinal Survey of Youth 1997. Other uses for copulas in sociology are also described.

Keywords

copula modeling, joint distributions, measures of association, nonnormal distributions, marginal distributions

Introduction

Often in sociology, researchers are confronted with nonnormal variables whose joint distribution they wish to explore. Yet, assumptions of common measures of dependence can fail or estimating such dependence is computationally intensive. For these reasons, joint distributions, which define the probability of events in terms of two or more variables (i.e., the probability of two variables simultaneously having particular values), receive little attention in sociology. Rather, sociologists tend to focus on conditional distributions as given through methods based upon generalized linear models, which describe the probability across a univariate distribution when the value of another variable is known (i.e., the probability of a dependent variable given that an independent variable(s) has a certain value). This article presents the copula method for modeling the joint distribution of two random variables. Put most simply, copulas are joint distribution functions that “couple” one-dimensional marginal distribution functions together, defining the probability space across the values of both marginals (Nelsen [1998] 2006:1). Copulas accomplish the modeling of joint distributions by making the cumulative distribution functions (CDFs) of each marginal the object of analysis.

While techniques for joint multivariate normal distributions are well developed, copula models provide a technique for analyzing joint nonlinear distributions of nonnormal data, which arises frequently in sociology. This is advantageous for several reasons (Joe 1997; Frees and Valdez 1998; Nelsen 2006; Trivedi and Zimmer 2007). Sociologists have techniques for modeling nonlinear marginal distributions (e.g., via various generalized linear models), but lack straightforward methods for deriving joint distributions of those marginal distributions, which copulas provide. Copula modeling affords the researcher the ability to model each marginal differently, providing much flexibility in model specification and a valuable feature given the variety of empirical distributions encountered in sociological data. Estimating the joint distributions of nonlinear outcomes is often computationally demanding and requires the use of simulation, whereas copula models are fit via standard maximum likelihood procedures. Following this modeling, copulas provide a unit-free, nonparametric measure of dependence (alternatively called association or mixture) that is general and free from influence of the specific marginal distributions, going beyond correlation or linear association. The dependence parameter, along with the parameters of the marginal distributions, is then used to provide an estimate of the joint probability, which can exhibit different strengths across the joint distribution.

This technique has demonstrated utility in fields such as actuarial science (e.g., Frees and Valdez 1998, 2008; Frees and Wang 2005; Frees, Carriere, and Valdez 1996), finance (e.g., Cherubini, Luciano, and Vecchiato 2004; McNeil, Frey, and Embrechts 2005), economics (e.g., Cameron et al. 2004; Smith 2005; Trivedi and Zimmer 2007), marketing (e.g., Danaher and Smith 2011), medicine (e.g., Escarela and Carriere 2003; Wang and Wells 2000), and engineering (e.g., Genest and Favre 2006; Salvadori et al. 2007).

Copulas are particularly useful for sociologists in several situations. First, much data in sociology is nonnormal and often highly skewed. After finding the appropriate distributions for the two outcomes of interest, one can use a copula to understand the association between these two variables. A simulated example examining the dependent mortality of husbands and wives is used subsequently to illustrate this point. Given the skewness of these variables, copulas are particularly useful in examining behavior of heavy-tailed data. As such, sociologists could use these methods for the joint probability of events that may be rare in occurrence. The copula model, however, is much more general. In sociology, we often do not know the causal order of outcomes, including among complex time series data, yet still wish to accurately model a multivariate relationship while controlling for predictors of interest for each outcome. In modeling each marginal separately and applying a copula to understand the multivariate distribution, this becomes a possibility. An example of time series marginals for unemployment and suicide rates follows subsequently. Researchers can also construct separate marginal distributions based upon generalized regression models and examine the relationship between various phenomena using a copula. In an example using National Longitudinal Survey of Youth 1997 (NLSY97) data, the relationship between a skewed continuous measure and count data with covariates in the margins is shown in order to attest to the generality of the method. Additional potential uses for copulas within sociology are also described.

In what follows, I first describe the copula methodology, providing the definition of a copula and examples of several types of copula distributions. Then, as mentioned earlier, I show examples of the modeling procedure. First, I generate simulated data as a simple example of model fitting, model selection, association, and joint probabilities. Second, I present an empirical example with time series in the margins. Third, I show an example with generalized linear models in the margins. Then, I outline additional possible uses for sociologists. Finally, I revisit the usefulness of copulas, while also pointing to some of the limitations.

Introduction to Copula Models

Definition

The mathematics for copulas is presented elsewhere, but the main concepts are summarized here. For simplicity, only the bivariate case is discussed (for a complete treatment of the mathematics, see Joe 1997 and Nelsen 2006; for other discipline-specific introductions, see Frees and Valdez 1998; Genest and Favre 2006; Trivedi and Zimmer 2007). In the bivariate case, suppose we have two variables of interest, y₁ and y₂. Their CDFs are denoted F₁(y₁) and F₂(y₂), and their joint CDF is denoted F(y₁,y₂). A copula function takes these CDFs as its inputs, making them the object of analysis. The definition of the copula is derived from the work of Sklar (1959; see also Schweizer and Sklar 1974). According to Sklar’s (1959) Theorem, for any joint distribution F(y₁,y₂) with marginal cumulative distributions F₁(y₁) and F₂(y₂), there exists a copula function $C_{θ}$ such that

$F(y_{1}, y_{2}) = C_{θ}(F_{1}(y_{1}), F_{2}(y_{2})).$

In other words, the joint distribution of any two outcomes y₁ and y₂ can be expressed as a copula function that is determined only by the individual marginal CDFs F₁(y₁) and F₂(y₂) and an association parameter θ that binds them together. Given that CDFs are bounded between 0 and 1 by definition, the function takes a point on the unit square as its input and produces a value on the unit interval.¹ To summarize, we input the values for each of the marginals (y₁ and y₂) into their respective CDFs (F₁ and F₂) to produce values on the unit interval (denoted u and v, respectively), which, when entered into a copula function, yield their joint distribution.

For any two continuous marginals, each CDF evaluated at its own variable is uniformly distributed from 0 to 1, or $F_{1}(y_{1}), F_{2}(y_{2}) \sim U(0, 1)$, and the copula representation is unique (Nelsen 2006:18-21). While Sklar’s Theorem demonstrates that existence still holds and the association is still estimable in the discrete case, the copula may no longer possess uniqueness (Genest and Nešlehová 2007:478; Sklar 1959). For discrete margins, the copula is uniquely determined only on the range of F₁(y₁) and F₂(y₂) within the unit interval (Nelsen 2006:21). Because the copula remains well defined on that range, discrete margins should not inhibit empirical applications (Trivedi and Zimmer 2007:11-12), though care must be taken in their application (Denuit and Lambert 2005; Genest and Nešlehová 2007). The example of count data given subsequently using the number of days drinking alcohol will illustrate one approach to modeling discrete data with copulas, with additional cautions and caveats regarding discrete margins provided in note 10.

Schweizer and Wolff (1981) showed that the copula captures all of the dependence between two random variables via two standard nonparametric measures of association, namely Spearman’s ρ and Kendall’s τ. This surprising conclusion results from the fact that (1) the joint distribution can be decomposed into the marginal distributions and the copula, and (2) each of these measures of association can be expressed solely in terms of the copula function. They further showed that Pearson’s correlation coefficient does not satisfy the latter condition: because it also contains information from each of the marginals, it is not a measure of dependence alone. Thus, Spearman’s ρ and Kendall’s τ can be applied to the joint distribution regardless of the specific form of the marginal distributions and have been shown to take all values in the desired interval [−1,1], whereas these two conditions are not always satisfied for Pearson’s correlation² (Embrechts, McNeil, and Straumann 2002). Among the most useful results, the copula allows the strength of the relationship between the two marginals to change across the joint distribution, as can be seen in the several examples of types of copulas. For the association parameters, the square of Spearman’s ρ can be interpreted similarly to the square of Pearson’s correlation as the percentage of shared or explained variance, while Kendall’s τ can be interpreted as the difference between the probability of the two variables being in the same order and the probability of their not being in the same order. For the case of two discrete marginal distributions, the occurrence of ties could decrease the attainable value of the measure of association (see Denuit and Lambert 2005; Trivedi and Zimmer 2007:24-25). Thus, it is crucial that the marginal distributions are first modeled accurately. With a better fit of the margins, one can model the dependence structure more precisely (Trivedi and Zimmer 2007:62). Luckily, this step is one with which sociologists are quite familiar, as it encompasses the modeling techniques that are standard to the discipline.
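The claim that Kendall’s τ is free of the marginal forms while Pearson’s correlation is not can be illustrated with a minimal sketch. It uses only the Python standard library rather than the R tools used in this article, and the simulated data, seed, and transformations are hypothetical choices made for illustration.

```python
import math
import random

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) pairs over all pairs (O(n^2))."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)                      # hypothetical seed
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.7 * a + rng.gauss(0, 1) for a in x]

# Strictly increasing transformations change the margins but not the ranks.
x_skew = [math.exp(a) for a in x]
y_skew = [math.exp(3 * b) for b in y]

tau_raw, tau_skew = kendall_tau(x, y), kendall_tau(x_skew, y_skew)
r_raw, r_skew = pearson_r(x, y), pearson_r(x_skew, y_skew)
print(f"Kendall tau: {tau_raw:.3f} -> {tau_skew:.3f}")
print(f"Pearson r:   {r_raw:.3f} -> {r_skew:.3f}")
```

Because the transformations preserve the ordering of every pair, τ is unchanged, whereas the linear correlation depends on the skewed margins.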

Copula modeling allows researchers to consider the marginal modeling and dependence modeling as separate but related issues through several straightforward steps. First, choose the appropriate marginal distributions, which need not be the same for each marginal distribution, and estimate the distributional parameters via maximum likelihood. Importantly, there is no restriction on the distribution of the margins. This step is akin to model selection for each marginal and can be a simple chosen distribution that appears to fit the data well or an actual generalized regression model.

Second, we need to transform the observed data to its CDF. This conversion is known as the probability integral transformation, though the procedure is more straightforward than the name suggests.³ The transformation applies a given distribution to data in order to produce the probability of observing an equal or lower value. In essence then, this procedure is the same as finding a p value (technically 1 − p value, or the “complement” of the p value; that is, the probability up to that point, rather than beyond it). The uniform distribution of the CDFs on the unit line is only guaranteed if the marginal distribution is correct. In the case of empirical data, researchers must check this uniformity, which may not exist if the marginal distribution is incorrectly specified, attesting to the importance of fitting the margins correctly. The following example with grade point average (GPA) will demonstrate this importance.
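A minimal sketch of the probability integral transformation and the uniformity check follows. It is written with the Python standard library under stated assumptions: the Weibull data (scale 65, shape 8), the seed, and the deliberately misspecified exponential margin are hypothetical, and the Kolmogorov-Smirnov distance is only one of several ways to check uniformity.

```python
import math
import random

def weibull_cdf(y, shape, scale):
    """P(Y <= y) for a Weibull distribution."""
    return 1.0 - math.exp(-((y / scale) ** shape)) if y > 0 else 0.0

def exponential_cdf(y, mean):
    """P(Y <= y) for an exponential distribution with the given mean."""
    return 1.0 - math.exp(-y / mean) if y > 0 else 0.0

def ks_distance_from_uniform(u):
    """Kolmogorov-Smirnov distance between the sample u and U(0, 1)."""
    s = sorted(u)
    n = len(s)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(s))

rng = random.Random(42)                                  # hypothetical seed
# random.weibullvariate takes (scale, shape) in that order.
ages = [rng.weibullvariate(65, 8) for _ in range(2000)]

# Correct margin: the transformed values should look uniform on (0, 1).
u_right = [weibull_cdf(a, shape=8, scale=65) for a in ages]
# Misspecified margin: an exponential with the same mean.
mean_age = sum(ages) / len(ages)
u_wrong = [exponential_cdf(a, mean_age) for a in ages]

ks_right = ks_distance_from_uniform(u_right)
ks_wrong = ks_distance_from_uniform(u_wrong)
print(f"KS distance, correct margin:      {ks_right:.3f}")
print(f"KS distance, misspecified margin: {ks_wrong:.3f}")
```

A small distance under the correct margin and a large one under the wrong margin is the numerical counterpart of inspecting a histogram of the transformed values.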

Third, using the starting values from the estimation of the parameters of the margins once an appropriately fitting distribution is selected, estimate the copula model, including the association parameter and all the marginal parameters, by maximum likelihood. There are several types of copulas discussed subsequently and researchers should fit many using model selection techniques to choose the best fitting copula.

Finally, measures of dependence are computed from the best fitting copula model. We can then convert the dependence parameter into the well-known measures of association of Kendall’s τ and Spearman’s ρ, providing a measure of the strength of the joint relationship, which, together with the appropriate copula function, provides the joint distribution free of influence from the marginal distributional forms. Since the copula is a probability distribution, we can input the cumulative probabilities calculated from each marginal distribution into the best fitting copula model to produce joint probabilities.
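The last step, turning marginal cumulative probabilities into a joint probability, can be sketched as follows. The sketch uses the Python standard library; the Weibull parameters and θ = 2 match the simulated spousal example described later in the article, while the ages 60 and 70 are hypothetical inputs chosen for illustration.

```python
import math

def weibull_cdf(y, shape, scale):
    """Marginal CDF: P(Y <= y) for a Weibull distribution."""
    return 1.0 - math.exp(-((y / scale) ** shape))

def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v) for theta >= 1 and 0 < u, v < 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

# Step 1: marginal cumulative probabilities (the inputs u and v).
u = weibull_cdf(60, shape=8, scale=65)   # husband dies at or before age 60
v = weibull_cdf(70, shape=8, scale=80)   # wife dies at or before age 70

# Step 2: feed them to the copula to obtain the joint probability.
joint_dependent = gumbel_copula(u, v, theta=2.0)
joint_independent = u * v                # product copula as a baseline

print(f"u = {u:.3f}, v = {v:.3f}")
print(f"P(both) under Gumbel-Hougaard (theta = 2): {joint_dependent:.3f}")
print(f"P(both) under independence:                {joint_independent:.3f}")
```

With positive dependence the joint probability exceeds the product of the margins but cannot exceed the smaller of the two marginal probabilities.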

Types of Copulas

Although the distributional forms of copulas may appear unfamiliar to sociologists, they are analogous to any other parametric joint distribution, while conforming to the above-mentioned definition and properties. As with the distributions that are familiar, there are many distributions that satisfy those assumptions. In this section, I describe the most common copula distributions within three distributional families. Families share a common underlying formula, with differences dictated by a “generator” function that is plugged into that formula. For simplicity, the discussion is restricted to the bivariate case.

The first family is the elliptical family of copulas (see, e.g., Durante and Sempi 2010:14-15). The members of the elliptical family are derived from the density function of an elliptical distribution with mean zero and correlation matrix Σ, which is given, for every $x \in \mathbb{R}^{2}$, by:

$h_{g}(x) = |Σ|^{-1/2} \, g(x^{\top} Σ^{-1} x),$

where g is a generator function. The two most common elliptical copulas are the Normal (or Gaussian) copula and the t-copula. Table 1 gives several examples of bivariate copulas. The third column of the table shows the generator function g(t). For example, the generator function for the bivariate Normal copula is $g(t) = \frac{1}{\sqrt{2π}} e^{-t/2}$. When inserted above, the Gaussian copula simplifies to the form given in the table, or $Φ_{θ}[Φ^{-1}(u), Φ^{-1}(v)]$, where $Φ_{θ}$ is the CDF of the bivariate standard normal distribution with correlation θ, and $Φ^{-1}$ is the quantile function of the univariate standard normal distribution.

Table 1.

Select Common Bivariate Copulas (Nelsen 2006).

Copula	Bivariate copula $C(u, v)$	Generator $g(t)$	Parameter bounds	Kendall’s τ	Spearman’s ρ
Elliptical family
Normal/Gaussian	$Φ_{θ}[Φ^{-1}(u), Φ^{-1}(v)]$	$\frac{1}{\sqrt{2π}} e^{-t/2}$	$[-1, 1]$	$\frac{2}{π} \arcsin θ$	$\frac{6}{π} \arcsin \frac{θ}{2}$
t	$T_{θ,ν}[T_{ν}^{-1}(u), T_{ν}^{-1}(v)]$	$\frac{Γ(\frac{ν+1}{2})}{Γ(\frac{ν}{2}) \sqrt{πν}} (1 + \frac{t}{ν})^{-\frac{2+ν}{2}}$	$θ \in [-1, 1]; ν > 0$	$\frac{2}{π} \arcsin θ$	No closed form
Archimedean family
Independence/Product	$uv$	$-\ln t$	N.A.	0	0
Clayton	$[\max(u^{-θ} + v^{-θ} - 1, 0)]^{-1/θ}$	$\frac{1}{θ}(t^{-θ} - 1)$	$[-1, 0) \cup (0, \infty)$	$\frac{θ}{θ+2}$	No closed form
Gumbel–Hougaard	$\exp\{-[(-\ln u)^{θ} + (-\ln v)^{θ}]^{1/θ}\}$	$(-\ln t)^{θ}$	$[1, \infty)$	$\frac{θ-1}{θ}$	No closed form
Frank	$-\frac{1}{θ} \ln[1 + \frac{(e^{-θu} - 1)(e^{-θv} - 1)}{e^{-θ} - 1}]$	$-\ln(\frac{e^{-θt} - 1}{e^{-θ} - 1})$	$(-\infty, 0) \cup (0, \infty)$	$1 - \frac{4}{θ}[1 - D_{1}(θ)]$	$1 - \frac{12}{θ}[D_{1}(θ) - D_{2}(θ)]$
Farlie–Gumbel–Morgenstern family
FGM	$uv[1 + θ(1-u)(1-v)]$		$[-1, 1]$	$\frac{2θ}{9}$	$\frac{θ}{3}$

Note: $D_{k}(x)$ is the Debye function, which is defined for any positive integer k by $D_{k}(x) = \frac{k}{x^{k}} \int_{0}^{x} \frac{t^{k}}{e^{t} - 1} dt$. $Φ_{θ}$ is the cumulative distribution function (CDF) of the bivariate standard normal distribution with correlation θ, and $Φ^{-1}$ is the quantile function of the univariate standard normal distribution. $T_{θ,ν}$ is the CDF of the bivariate t-distribution with correlation θ and ν degrees of freedom, and $T_{ν}^{-1}$ is the quantile function of the univariate t-distribution with ν degrees of freedom. Γ represents the gamma function. While there is no closed form for several of the values of Spearman’s ρ, the copula package in R can still compute these via numerical approximation techniques. FGM = Farlie–Gumbel–Morgenstern.
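The Kendall’s τ column of Table 1 can be evaluated directly. The following is a minimal sketch with the Python standard library, not the routine used by the R copula package; the function names are hypothetical, and the Debye function is approximated here with Simpson’s rule as one possible numerical choice.

```python
import math

def debye1(x, steps=2000):
    """First Debye function D_1(x) = (1/x) * int_0^x t / (e^t - 1) dt.

    Approximated with Simpson's rule (steps must be even); x must be nonzero.
    A negative x simply gives a negative step size, so the same code applies.
    """
    def f(t):
        return t / math.expm1(t) if t != 0.0 else 1.0   # limit at t = 0 is 1
    h = x / steps
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0 / x

def tau_normal(theta):          # the same formula applies to the t-copula
    return 2.0 / math.pi * math.asin(theta)

def tau_clayton(theta):
    return theta / (theta + 2.0)

def tau_gumbel(theta):
    return (theta - 1.0) / theta

def tau_frank(theta):
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))

def tau_fgm(theta):
    return 2.0 * theta / 9.0

for name, fn, theta in [("Normal", tau_normal, 0.707),
                        ("Clayton", tau_clayton, 2.0),
                        ("Gumbel-Hougaard", tau_gumbel, 2.0),
                        ("Frank", tau_frank, 5.991),
                        ("FGM", tau_fgm, 1.0)]:
    print(f"{name:16s} theta = {theta:6.3f}  tau = {fn(theta):.3f}")
```

The parameter values in the loop are illustrative inputs; any value within the bounds listed in Table 1 can be substituted.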

The Normal copula is an important starting point since normal marginal distributions result in the Normal copula reducing to the familiar multivariate normal distribution. In the case of elliptical marginal distributions (as would arise with normal margins), we can interpret the association parameter for elliptical copulas as the linear correlation. With more diverse margins of interest, there is a need for other measures of association because the correlation is not free of influence from the form of the marginal distributions. As described earlier, Kendall’s τ and Spearman’s ρ provide such measures and are given in Table 1. In the case of the Normal copula, Kendall’s τ provides a nonparametric, robust, and efficient estimator of association for both elliptical and non-elliptical margins (Embrechts, Lindskog, and McNeil 2003:357-60). Figure 1 depicts the CDF and probability density function (PDF) for the Normal copula with various association parameters. At the lowest depicted relationship (τ = 0.16), the probability is more evenly distributed over all the values of both margins. While the relationship is strongest among the lowest and highest values of each marginal, there is room for considerable discordance. As the value of Kendall’s τ increases, the concordance increases, reserving the most probability along the main diagonal. A negative association would put the highest probabilities in the opposite corners (i.e., discordance).
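How the Normal copula moves probability toward the main diagonal as τ grows can be sketched numerically. The code below uses the standard closed form of the bivariate Normal copula density and the Python standard library; the evaluation points (0.9, 0.9) and (0.9, 0.1) and the τ values other than 0.16 are hypothetical choices, not the values plotted in Figure 1.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def theta_from_tau(tau):
    """Invert tau = (2 / pi) * arcsin(theta) from Table 1."""
    return math.sin(math.pi * tau / 2.0)

def gaussian_copula_density(u, v, theta):
    """Density c(u, v) of the Normal copula with correlation theta, |theta| < 1."""
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    det = 1.0 - theta ** 2
    exponent = -(theta ** 2 * (x ** 2 + y ** 2) - 2.0 * theta * x * y) / (2.0 * det)
    return math.exp(exponent) / math.sqrt(det)

for tau in (0.16, 0.5, 0.8):
    theta = theta_from_tau(tau)
    concordant = gaussian_copula_density(0.9, 0.9, theta)   # both margins high
    discordant = gaussian_copula_density(0.9, 0.1, theta)   # one high, one low
    print(f"tau = {tau:.2f}  theta = {theta:.3f}  "
          f"c(0.9, 0.9) = {concordant:.3f}  c(0.9, 0.1) = {discordant:.3f}")
```

At θ = 0 the density equals 1 everywhere, which is the independence case; larger τ raises the density at concordant points and lowers it at discordant ones.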

Figure 1.

Cumulative distribution function (CDF) and probability density function (PDF) of Normal copula with various association parameters.

The second family of copulas in Table 1 is the Archimedean copulas (Nelsen 2006:109-55). These copulas have a relatively simple form for their construction and, thus, there is a great variety of copulas that fall under the family. A bivariate Archimedean copula is defined as:

$C(u, v) = g^{[-1]}(g(u) + g(v)),$

where $g^{[-1]}$ denotes the pseudo-inverse of the generator (i.e., equal to the ordinary inverse $g^{-1}(t)$ for 0 ≤ t ≤ g(0) and equal to zero for t > g(0)). As mentioned earlier, plugging a generator function into this formula produces a copula, as shown in Table 1.
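The construction can be sketched by passing a generator and its inverse to one generic function and comparing the result with the closed forms in Table 1. This is a minimal illustration with the Python standard library; the helper names are hypothetical, and the Clayton case is restricted to θ > 0 so that the ordinary inverse suffices and no truncation at zero is needed.

```python
import math

def archimedean(generator, inverse):
    """Build C(u, v) = g^{-1}(g(u) + g(v)) from a generator g and its inverse."""
    def copula(u, v):
        return inverse(generator(u) + generator(v))
    return copula

def clayton(theta):
    """Clayton copula for theta > 0 (g(0) is infinite, so no truncation)."""
    return archimedean(lambda t: (t ** -theta - 1.0) / theta,
                       lambda s: (1.0 + theta * s) ** (-1.0 / theta))

def gumbel(theta):
    """Gumbel-Hougaard copula for theta >= 1."""
    return archimedean(lambda t: (-math.log(t)) ** theta,
                       lambda s: math.exp(-(s ** (1.0 / theta))))

# The generator -ln t gives back the product (independence) copula.
product = archimedean(lambda t: -math.log(t), lambda s: math.exp(-s))

u, v, theta = 0.3, 0.6, 2.0
clayton_closed = (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)
gumbel_closed = math.exp(-(((-math.log(u)) ** theta
                            + (-math.log(v)) ** theta) ** (1.0 / theta)))

print(f"product: {product(u, v):.4f} (uv = {u * v:.4f})")
print(f"Clayton: {clayton(theta)(u, v):.4f} (closed form {clayton_closed:.4f})")
print(f"Gumbel:  {gumbel(theta)(u, v):.4f} (closed form {gumbel_closed:.4f})")
```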

Although a great variety of Archimedean copulas have been described in the literature (see Nelsen 2006:116-19), Table 1 depicts four of the most common Archimedean copulas: the product (or independence), Clayton (Clayton 1978; Genest and Rivest 1993), Gumbel–Hougaard (Gumbel 1960; Hougaard 1986), and Frank (Frank 1979; Genest 1987; Nelsen 1986) copulas. Each of the latter three provides an instructive example because of where the weight of the joint probability distribution is heaviest. Figure 2 shows each of these three copulas at the value of their association parameter corresponding to a Kendall’s τ of 0.5. Each copula is quite different in its representation of the dependence structure over the joint distribution. The Clayton copula and Gumbel–Hougaard copula are most useful for dependence that is high in the lower and upper tails, respectively. The Frank copula, on the other hand, allows for both positive and negative association and places its dependence throughout the distribution wherever the data are concordant, rather than in one tail. Although it resembles the Normal copula, the Frank copula clearly assigns less probability to the most discordant areas of the distribution for this particular parameter (the exact shape is dependent on the parameter value). As with the example of the Normal copula, increasing or decreasing the association parameter of any of the Archimedean copulas increases the figure’s concordance or discordance, respectively. The product copula is worth noting as a baseline for measuring independence. As the formula in Table 1 shows, the copula does not depend on an association parameter and the Kendall’s τ and Spearman’s ρ are both zero.
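The difference between lower-tail and upper-tail dependence can be made concrete by comparing conditional probabilities near the corners of the unit square. The sketch below uses the Python standard library and the closed forms in Table 1; the 1 percent and 99 percent cutoffs are hypothetical choices, and both copulas are set to θ = 2, which corresponds to τ = 0.5 for each by the Table 1 formulas.

```python
import math

def clayton_cdf(u, v, theta):
    """Clayton copula for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, v, theta):
    """Gumbel-Hougaard copula for theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def lower_tail(cdf, q, theta):
    """P(V <= q | U <= q) = C(q, q) / q."""
    return cdf(q, q, theta) / q

def upper_tail(cdf, q, theta):
    """P(V > q | U > q) = (1 - 2q + C(q, q)) / (1 - q)."""
    return (1.0 - 2.0 * q + cdf(q, q, theta)) / (1.0 - q)

THETA = 2.0   # Kendall's tau = 0.5 for both families
for name, cdf in [("Clayton", clayton_cdf), ("Gumbel-Hougaard", gumbel_cdf)]:
    low = lower_tail(cdf, 0.01, THETA)
    high = upper_tail(cdf, 0.99, THETA)
    print(f"{name:16s} P(V<=.01 | U<=.01) = {low:.3f}   "
          f"P(V>.99 | U>.99) = {high:.3f}")
```

Although both copulas share the same overall τ, the Clayton copula concentrates its dependence among the lowest values and the Gumbel–Hougaard copula among the highest.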

Figure 2.

Cumulative distribution function (CDF) and probability density function (PDF) of select Archimedean copulas with Kendall’s τ of 0.5.

The final copula family in Table 1 is the Farlie–Gumbel–Morgenstern (FGM) family (Farlie 1960; Gumbel 1958; Morgenstern 1956; Nelsen 2006:77-78). In the table, only a general form for the family is given, as is most typical. FGM copulas can only model weak dependence. Consider the extreme values of the FGM association parameter (−1 and 1) when plugged into the formula for Kendall’s τ in Table 1: the highest and lowest attainable values of τ are 2/9 and −2/9, respectively. Given that such weak associations can occur in sociological data, the FGM copula is worth noting, though it is not considered in the following examples.

Fitting Copulas

One of the major advantages of copula modeling is its computational ease relative to other methods of examining joint distributions. Log likelihoods for copulas are easily derived and, thus, standard maximum likelihood procedures are applicable.⁴ As for the actual task of computation, the “copula” package (Kojadinovic and Yan 2010; Yan 2007) for the R Statistical Software Environment (R Core Development Team 2011) provides the necessary commands to estimate copula models. All models presented were estimated in R using the “copula” package and other standard commands as described in Yan (2007). The development of this package represents a major step forward in accessibility for those interested in copula models. The package includes, among other things, commands for fitting copula models, drawing random data from a particular copula model with user-defined marginal distributions, and model fit statistics. While the arithmetic for several calculations is shown subsequently, the copula package can compute these values for the user, such as Kendall’s τ, Spearman’s ρ, and joint probabilities.
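To show what maximizing a copula log likelihood involves, the following is a minimal sketch in Python’s standard library rather than the R copula package. It simplifies the procedure described in this article by treating the margins as already transformed to the unit interval, so only the Clayton association parameter is estimated. The simulated data, the seed, the search interval, and the golden-section optimizer are all assumptions of the sketch.

```python
import math
import random

def clayton_log_density(u, v, theta):
    """log c(u, v) for the Clayton copula with theta > 0."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta) - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))

def log_likelihood(theta, data):
    return sum(clayton_log_density(u, v, theta) for u, v in data)

def sample_clayton(n, theta, rng):
    """Draw (u, v) pairs by inverting the conditional CDF of V given U."""
    pairs = []
    for _ in range(n):
        u, w = 1.0 - rng.random(), 1.0 - rng.random()      # values in (0, 1]
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs

def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

rng = random.Random(7)                                   # hypothetical seed
data = sample_clayton(1000, theta=2.0, rng=rng)          # true theta = 2
theta_hat = golden_section_max(lambda t: log_likelihood(t, data), 0.01, 10.0)
print(f"estimated theta = {theta_hat:.3f}, "
      f"log likelihood = {log_likelihood(theta_hat, data):.2f}")
```

A full analysis would add the marginal parameters to the likelihood and use a general-purpose optimizer; the one-parameter search here only illustrates the principle.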

Several model fit measures are presented. As described by Joe (2015), we can use the Akaike information criterion (AIC) and Bayesian information criterion (BIC) for model selection, where they are interpreted as penalized log-likelihood functions. They take the typical forms of $AIC = -2 \ln \hat{l} + 2p$ and $BIC = -2 \ln \hat{l} + p \ln n$, where $\hat{l}$ is the maximized likelihood, p is the number of parameters, and n is the sample size (for more on goodness-of-fit, see Junker, Szimayer, and Wagner 2006; Genest, Rémillard, and Beaudoin 2009). Grønneberg and Hjort (2014) describe the scenarios where the AIC and BIC provide accurate measures of copula model fit, with suggestions for additional model fit statistics. Given this, we also present model fit statistics that compare the observed and estimated joint distributions. The root mean square error (RMSE) for the model is presented, while probability–probability (PP) plots are used to visually demonstrate the comparison where relevant. The RMSE is calculated as is typical: $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (C(u_{i}, v_{i}) - C_{n}(u_{i}, v_{i}))^{2}}$, where $C_{n}$ is the empirical, observed copula (Vandenberghe, Verhoest, and De Baets 2010). The R code for all analyses that follow is available on the author’s website.
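The three fit measures can be written out in a few lines. This is a minimal sketch in Python’s standard library; the log likelihood passed to the AIC and BIC functions is an illustrative placeholder, the perfectly concordant data set is hypothetical, and the empirical copula is taken here as the share of observations at or below each point, which is one common definition.

```python
import math

def aic(log_lik, n_params):
    """AIC = -2 ln(l) + 2p."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """BIC = -2 ln(l) + p ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def empirical_copula(data):
    """Return C_n(u, v): the share of observations with u_i <= u and v_i <= v."""
    n = len(data)
    def c_n(u, v):
        return sum(1 for a, b in data if a <= u and b <= v) / n
    return c_n

def rmse(copula, data):
    """Root mean square gap between a candidate copula and the empirical copula."""
    c_n = empirical_copula(data)
    squared = [(copula(u, v) - c_n(u, v)) ** 2 for u, v in data]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical, perfectly concordant pseudo-observations on the unit square.
n = 50
data = [(i / (n + 1), i / (n + 1)) for i in range(1, n + 1)]

rmse_comonotone = rmse(min, data)                 # C(u, v) = min(u, v)
rmse_product = rmse(lambda u, v: u * v, data)     # independence copula

print(f"AIC = {aic(100.0, 1):.2f}, BIC = {bic(100.0, 1, 1000):.2f}")
print(f"RMSE, min(u, v) copula: {rmse_comonotone:.4f}")
print(f"RMSE, product copula:   {rmse_product:.4f}")
```

For perfectly concordant data the copula min(u, v) tracks the empirical copula closely, while the independence copula does not, so its RMSE is much larger.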

Copula Modeling with Sociological Data

Example 1: Known Marginal Distributions and Copula with Simulated Data

In order to gain an understanding of copula models and their application, we first begin with an example with known marginal distributions and a known copula dependency structure connecting them. This example uses randomly generated data meant to resemble the age of death of husbands and wives and the dependency between them, as the “widowhood effect” is a well-established finding in sociology (see, e.g., Elwert and Christakis 2006). Using the copula package in R, margins for each spouse’s age are each fit with the Weibull distribution and a dependency structure from the Gumbel–Hougaard copula. The association parameter for the copula is set to θ = 2, which inserted into the formula for Kendall’s τ in Table 1 produces an association of τ = 0.5. In other words, we are setting the difference in the probability of both marginals being in the same order versus not in the same order at 0.5. For husbands, the Weibull distribution parameters are set to a scale of 65 and a shape of 8. For wives, the Weibull distribution parameters are set to a scale of 80 and a shape of 8. Figure 3 shows the produced margins with a random draw of N = 1,000. As the figure shows, the scale parameter sets the peak in the age of death and the Weibull distribution assigns much more weight to the older ages, with more skewness to the left. The dependency between them is not apparent just by examining the margins.
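A data-generating process of this kind can be sketched without the R copula package. The code below uses the Python standard library and swaps in a different, well-known technique: the Marshall-Olkin frailty method for Archimedean copulas, with Kanter’s representation of a positive stable variable. The seed and function names are hypothetical, so the draws will not reproduce the figures in this article.

```python
import math
import random

def positive_stable(alpha, rng):
    """Positive stable draw with Laplace transform exp(-t**alpha), 0 < alpha < 1
    (Kanter's representation)."""
    angle = math.pi * (1.0 - rng.random())          # uniform on (0, pi]
    w = rng.expovariate(1.0)
    first = math.sin(alpha * angle) / math.sin(angle) ** (1.0 / alpha)
    second = (math.sin((1.0 - alpha) * angle) / w) ** ((1.0 - alpha) / alpha)
    return first * second

def sample_gumbel(theta, rng):
    """One (u, v) draw from the Gumbel-Hougaard copula (Marshall-Olkin method)."""
    s = positive_stable(1.0 / theta, rng)
    e1, e2 = rng.expovariate(1.0), rng.expovariate(1.0)
    u = math.exp(-((e1 / s) ** (1.0 / theta)))
    v = math.exp(-((e2 / s) ** (1.0 / theta)))
    return u, v

def weibull_quantile(p, shape, scale):
    """Inverse Weibull CDF: maps a probability to an age."""
    return scale * (-math.log1p(-p)) ** (1.0 / shape)

def kendall_tau(x, y):
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

rng = random.Random(2024)                                  # hypothetical seed
draws = [sample_gumbel(2.0, rng) for _ in range(1000)]     # theta = 2
husbands = [weibull_quantile(u, shape=8, scale=65) for u, _ in draws]
wives = [weibull_quantile(v, shape=8, scale=80) for _, v in draws]

tau_hat = kendall_tau(husbands, wives)
print(f"mean age at death: husbands {sum(husbands) / 1000:.1f}, "
      f"wives {sum(wives) / 1000:.1f}")
print(f"sample Kendall's tau = {tau_hat:.3f} (population value 0.5)")
```

The copula draws supply the dependence, and the Weibull quantile functions then impose the margins, which mirrors the separation of dependence and marginal modeling described above.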

Figure 3.

Histograms of randomly generated husband and wife age of death with Weibull marginals and Gumbel–Hougaard copula with association parameter 2.

Yet, the margins are linked by the Gumbel–Hougaard copula and the joint distribution between the margins demonstrates the dependency. Going back to Figure 2, the middle panels depict this exact copula. It was chosen for this example because it assigns high dependency in the upper tail, thus assuming the mortality of spouses is more related as they age. Figure 4 shows how the joint distribution between husbands’ and wives’ ages of death changes according to the level of dependency assigned between them (each is a random draw of 1,000). As the parameter is increased, the joint distributions on the left become tighter. As is apparent from the contour plots on the right, the dependency is always stronger in older age. That is, spouses’ ages of death are closer when they are older. When a death is experienced at a young age, the surviving spouse is not as likely to die at a similar age compared to the upper part of the marginal distributions. The middle panel of Figure 4 shows the example used of θ = 2.

Figure 4.

Randomly generated data from Gumbel–Hougaard copulas for various association levels between age of death of husband and wife (each with Weibull marginal distributions).

Demonstrating how copula models allow flexibility in the shape of the dependence structure, Figure 5 gives a comparison of what the joint distribution would look like assuming three alternative copula structures, each with the same association of τ = 0.5 (again each with 1,000 random draws). As was described earlier, the Clayton copula exhibits higher dependency in the left portions of the marginal distributions, and we can see this in the contour plot and joint probability plot. For spouses’ deaths, this copula is inappropriate if we assume the relationship is stronger when the age of death is older. While the joint distribution from the Frank copula assigns some weight to the higher ages, the distribution is fairly even throughout. Finally, the Normal copula comes the closest, yet still does not exhibit as much dependency at higher ages as we know exists in the data. The fit of various copula models will demonstrate this.

Figure 5.

Other potential copula models with association 0.5 for randomly generated age of death of husband and wife (each with Weibull marginal distributions).

As stated earlier, we must properly specify the margins before the copula model is fit. While in this case they are known, it is still instructive to show the example of checking for uniformity. Before this step, a researcher should fit several different feasible marginal distributions to each margin. Plausible marginal distributions should be garnered from typical examinations of the data, such as histograms, quantile-quantile (QQ) plots, and plots of observed versus predicted values from a distribution. Once a plausible list of marginal distributions is gathered, researchers can then fit those distributions to the margins and use the log likelihood to select the best fitting, through the R command “fitdistr,” for example. Seeing the margins in Figure 3, a researcher may think to fit distributions such as the Weibull, Normal, t, and Cauchy distributions, among others. The values of the log likelihoods (not shown) indicate the best fit is the Weibull, not surprisingly. Since this is a random draw, the shape and scale parameters for the Weibull are not exact. Thus, fitting a Weibull distribution to each marginal separately produces a shape parameter and scale parameter of 8.26 and 64.87, respectively, for the husbands, and 8.23 and 79.54, respectively, for the wives.
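Choosing a margin by comparing log likelihoods can be sketched as follows. The code uses the Python standard library and is not the implementation behind the R command “fitdistr”: the Weibull shape is found by bisection on the profile likelihood equation, only a Normal alternative is compared, and the simulated ages and seed are hypothetical.

```python
import math
import random

def fit_weibull(data, lo=0.05, hi=200.0, tol=1e-9):
    """Maximum likelihood estimates (shape, scale) for a Weibull sample.

    The shape k solves sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    mean_log, max_log = sum(logs) / n, max(logs)

    def score(k):
        weights = [math.exp(k * (lx - max_log)) for lx in logs]   # (x / max)^k
        return (sum(w * lx for w, lx in zip(weights, logs)) / sum(weights)
                - 1.0 / k - mean_log)

    while hi - lo > tol:                  # bisection: score is increasing in k
        mid = (lo + hi) / 2.0
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2.0
    mean_pow = sum(math.exp(shape * (lx - max_log)) for lx in logs) / n
    scale = math.exp(max_log + math.log(mean_pow) / shape)
    return shape, scale

def weibull_loglik(data, shape, scale):
    return sum(math.log(shape / scale) + (shape - 1.0) * math.log(x / scale)
               - (x / scale) ** shape for x in data)

def normal_loglik(data):
    """Log likelihood of a Normal fit evaluated at its own ML estimates."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

rng = random.Random(3)                                    # hypothetical seed
ages = [rng.weibullvariate(65, 8) for _ in range(1000)]   # (scale, shape)

shape_hat, scale_hat = fit_weibull(ages)
ll_weibull = weibull_loglik(ages, shape_hat, scale_hat)
ll_normal = normal_loglik(ages)
print(f"Weibull fit: shape = {shape_hat:.2f}, scale = {scale_hat:.2f}")
print(f"log likelihood: Weibull {ll_weibull:.1f}, Normal {ll_normal:.1f}")
```

Both candidates have two parameters, so the higher log likelihood identifies the better fitting margin directly.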

The next step is to apply the probability integral transformation to each marginal, such that the values are now those of the CDF and bounded on the unit interval. This step is simply a matter of computing the probability for the values of the marginal given the best fitting distribution, akin to determining the complement of a p value (e.g., what is the probability of a husband dying at or below age 60 from a Weibull distribution with a shape of 8.26 and a scale of 64.87?).⁵ Figure 6 shows the probability integral transformations. The left side shows the CDF for each marginal. The straight line clearly shows that the CDF is uniform on the unit interval. The histograms on the right also show considerable uniformity across the unit interval.
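As a worked illustration of the question posed in the parenthetical example, the Weibull CDF can be evaluated by hand with the husbands’ fitted parameters reported above (shape 8.26, scale 64.87). The rounding below is approximate.

```latex
F(60) = 1 - \exp\!\left[-\left(\frac{60}{64.87}\right)^{8.26}\right]
      \approx 1 - \exp(-0.525)
      \approx 0.41
```

An age of 60 is therefore transformed to a value of roughly 0.41 on the unit interval, and that value is what enters the copula function.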

Figure 6.

Probability integral transformation of the marginal distributions of randomly generated husband and wife age of death.

The next step is to fit several different copula models. Table 2 shows each of the discussed Archimedean and elliptical copulas fit to the randomly generated data of spouses’ ages of death with the Gumbel–Hougaard association parameter θ = 2 (i.e., τ = 0.5). In each case, the copula parameter is significantly different from zero (p < .001). As shown in Figures 4 and 5, some copulas were more plausible than others. The high RMSE of the independence copula demonstrates that the assumption of no joint relationship has poor fit. Given its weight to dependency at lower values, the Clayton copula is also a particularly poor fit, with the highest AIC, BIC, and RMSE by far among the fitted copulas and a quite incorrect Kendall’s τ at τ = 0.323. For the other four copulas, the association parameter implies a value of Kendall’s τ close to 0.5. Yet, the AIC, BIC, and RMSE clearly show that the Gumbel–Hougaard copula from which the data were drawn fits best, as the shape of the copula function better captures the dependence across the entire joint distribution. As a point of comparison, Table 3 shows model fits when data are randomly drawn from the same two marginal distributions but with no copula structure specified. Thus, any association is purely random. Indeed, the copula parameters are each close to the value that translates into a Kendall’s τ of practically zero.

Table 2.

Copula Model Fits to Randomly Generated Husband and Wife Age of Death with Weibull Marginals and Gumbel–Hougaard Copula with Association Parameter 2.

Copula	Parameter	Kendall’s τ	Log likelihood	AIC	BIC	RMSE
Gumbel–Hougaard	2.072*** (0.054)^a	0.517	−405.600	−809.181	−804.273	0.1667
Clayton	0.953*** (0.061)	0.323	−175.936	−349.873	−344.965	1.2244
Frank	5.991*** (0.249)	0.513	−331.980	−661.961	−657.053	0.4088
Normal	0.707*** (0.013)	0.500	−347.899	−693.798	−688.891	0.3833
t	0.711*** (0.015)^b	0.503	−353.975	−703.950	−694.135	0.3769
Independent	—	0.000	—	—	—	2.5694

Note: Standard errors are in parentheses. AIC = Akaike information criterion; BIC = Bayesian information criterion; RMSE = root mean square error.

^aWhereas a parameter of zero would imply an association of zero for the other four copulas in the table, a parameter value of 1 is the lowest attainable for the Gumbel–Hougaard and implies an association of zero. Therefore, this coefficient is tested against a value of 1 rather than zero.

^bThe t-copula also has a degrees of freedom parameter, estimated at 8.158.

*p < .05, **p < .01, ***p < .001.
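The Kendall’s τ column of Table 2 can be recovered from the estimated parameters. The sketch below assumes the closed-form relationships commonly given for these families (τ = 1 − 1/θ for the Gumbel–Hougaard, τ = θ/(θ + 2) for the Clayton, and τ = (2/π) arcsin θ for the Normal and t); whether these match the exact expressions in Table 1 is an assumption, since that table is not reproduced here. The Frank copula is left out because its τ has no closed form. Function names are illustrative.

```python
import math


def tau_gumbel(theta):
    """Kendall's tau for the Gumbel-Hougaard copula (theta >= 1)."""
    return 1.0 - 1.0 / theta


def tau_clayton(theta):
    """Kendall's tau for the Clayton copula."""
    return theta / (theta + 2.0)


def tau_elliptical(theta):
    """Kendall's tau for the Normal and t copulas (|theta| <= 1)."""
    return (2.0 / math.pi) * math.asin(theta)


# Parameter estimates as reported in Table 2.
estimates = [
    ("Gumbel-Hougaard", tau_gumbel, 2.072),
    ("Clayton", tau_clayton, 0.953),
    ("Normal", tau_elliptical, 0.707),
    ("t", tau_elliptical, 0.711),
]

taus = {name: convert(theta) for name, convert, theta in estimates}
for name, tau in taus.items():
    print(f"{name}: tau = {tau:.3f}")
```

Because the parameters in the table are themselves rounded, the third decimal of a recomputed τ can differ by one unit from the tabled value.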

Table 3.

Copula Model Fits to Randomly Generated Husband and Wife Age of Death with Weibull Marginals and No Dependence Specification.

Copula	Parameter	Kendall’s τ	Log likelihood	AIC	BIC	RMSE
Gumbel–Hougaard	1.000 (0.021)^a	0.000	^c	^c	^c	0.1235
Clayton	−0.028 (0.030)	−0.014	−0.396	1.207	6.115	0.1066
Frank	−0.081 (0.190)	−0.009	−0.090	1.819	6.727	0.1141
Normal	−0.011 (0.032)	−0.007	−0.061	1.878	6.785	0.1151
t	−0.011 (0.032)^b	−0.007	−0.016	3.967	13.783	0.1150
Independent	—	0.000	—	—	—	0.1235

Note: Standard errors are in parentheses. AIC = Akaike information criterion; BIC = Bayesian information criterion; RMSE = root mean square error.

^bThe t-copula also has a degrees of freedom parameter, estimated at 383.051.

^cDue to the closeness of the parameter to the boundary (see Table 1), the maximum likelihood method could not be used, as it produced errors regardless of the starting value. The parameters shown are from an alternative method, inversion of Kendall’s τ (Yan 2007), which does not produce a log-likelihood.

*p < .05, **p < .01, ***p < .001.

Next, we can calculate probabilities by plugging values into the copula function given in Table 1. Recall that by the definition of a distribution function and the copula:

F (y_{1}, y_{2}) = C (F_{1} (y_{1}), F_{2} (y_{2})) = C (P (Y_{1} \leq y_{1}), P (Y_{2} \leq y_{2})) = C (u, v) .

First then, an examination of the marginal probabilities is in order. Here, F₁ and F₂ are the Weibull distributions defined earlier. Taking the means from the simulated data as an example, we find that a wife and a husband who die at the respective means of 75.29 and 61.84 each have a value of about u = v = 0.461 on the marginal CDFs (i.e., 46.1 percent of wives and husbands die at or below those ages). To find the joint probability for a couple both dying at or below those respective means, we can plug the values into the Gumbel–Hougaard distribution given in Table 1 (or simply use the “pCopula” function in R):

\begin{aligned} C(u, v) &= \exp \{- [(- \ln u)^{\theta} + (- \ln v)^{\theta}]^{1 / \theta}\} \\ &= \exp \{- [(- \ln 0.461)^{2.072} + (- \ln 0.461)^{2.072}]^{1 / 2.072}\} \\ &= 0.339 . \end{aligned}

Thus, given an association of τ = 0.517, 33.9 percent of couples have both members die at or below the mean on each marginal distribution. If we change the husband’s age of death to 45, that individual now has a value of 0.054 on the marginal distribution function, while the couple is located at 0.049 on the joint cumulative distribution. That is, most couples survive to a higher combination of ages. Finally, keeping the husband at the mean but changing the wife’s age of death to 90, we compute a wife-specific cumulative distribution value of 0.928 and a couple joint probability of 0.459. Of course, this example is one of known distributions and dependency. Thus, we move on to an example with empirical data.
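These three joint probabilities can be checked with a direct implementation of the Gumbel–Hougaard copula function. The following is a minimal Python sketch rather than the R routine mentioned above; the function and variable names are illustrative, and the inputs are the marginal CDF values and the fitted θ = 2.072 reported in the text.

```python
import math


def gumbel_hougaard_cdf(u, v, theta):
    """C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta)), theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


THETA = 2.072  # fitted Gumbel-Hougaard parameter from Table 2

# Both spouses at their respective means (u = v = 0.461).
both_at_mean = gumbel_hougaard_cdf(0.461, 0.461, THETA)

# Husband's age of death changed to 45 (marginal value 0.054).
husband_45 = gumbel_hougaard_cdf(0.461, 0.054, THETA)

# Wife's age of death changed to 90 (marginal value 0.928).
wife_90 = gumbel_hougaard_cdf(0.928, 0.461, THETA)

print(f"{both_at_mean:.3f} {husband_45:.3f} {wife_90:.3f}")
```

Setting θ = 1 reduces the function to the product u × v, the independence case, which offers a quick sanity check on any implementation.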

Example 2: Time Series in the Margins with Empirical Data

Arguably, the most common application of copula models is that of understanding the relationship between various time series. In finance in particular, a common goal is to model dependencies between stocks or other financial instruments in order to create a portfolio with decreased risk. In this case, copula models can provide the dependence between the time series in a manner that is not a function of the distribution of the margins. Thus, in the margins, we can model the time series in a manner that accounts for autocorrelation (i.e., association of the current value with past values) and heteroskedasticity of both series separately. In this example, we examine a classic sociological outcome, suicide (for a review, see Wray, Colen, and Pescosolido 2011), in a simple bivariate time series example. The relationship between the monthly suicide rate per 100,000 and unemployment rate from January 1999 to December 2011 is examined. The suicide rate comes from mortality data from the Centers for Disease Control and Prevention. Unemployment data come from the Bureau of Labor Statistics, with the seasonally unadjusted rate used since such seasonal fluctuations occur in suicide as well and we are interested in their association. Figure 7 presents both series. If we take these data as the objects of analysis, the Pearson’s correlation is 0.624 (p < .001).

Figure 7.

Time series plot of monthly unemployment rate and suicide rate.

In this example, we are interested in the dependence between unemployment and suicide, but we have reason to hypothesize that there is both an autoregressive (AR) and seasonal (SAR) component to each rate. In time series analysis, the goal is to model the dependence of current values on past values in order to remove any time trend. Differencing the data, such that the current value is the change from the previous year (also called a return), is also typical in order to satisfy the time series model assumption of stationarity (i.e., the effect of past values on the current value is constant across time). Once an appropriate model is selected, the residuals become the object of analysis, as these provide data that are stationary and devoid of a time trend (for more on time series modeling, see Shumway and Stoffer 2011). Analyses of both series demonstrated that the best fitting AR model had a lag of two and an SAR component for the 12-month cycle, and was differenced to achieve stationarity. For unemployment, AR(1) = 0.13 (p > .05), AR(2) = 0.27 (p < .001), and SAR(12) = 0.83 (p < .001). For suicide, AR(1) = −0.57 (p < .001), AR(2) = −0.31 (p < .001), and SAR(12) = 0.84 (p < .001).

As with the previous example, the first step to copula modeling is to calculate the probability integral transformation and check the uniformity of the marginal CDFs. As with any linear regression, the residuals from an AR model are assumed normal with mean zero and some variance. Thus, we fit the normal distribution respectively to both series of residuals. As expected, the fitted mean for both is essentially zero (unemployment: mean = 0.00098, variance = 0.04357; suicide: mean = 0.00007, variance = 0.00096). Then, these distributions are used to compute the respective complements of the p values for each marginal. Figure 8 demonstrates that both marginals appear to satisfy the uniformity assumption.

Figure 8.

Probability integral transformation of residuals from autoregressive (AR) model.
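For normally distributed residuals, the transformation amounts to evaluating the fitted normal CDF at each residual. The Python sketch below uses the fitted mean and variance reported for the unemployment residuals, but the residuals themselves are simulated stand-ins, since the actual series is not reproduced here. The Kolmogorov–Smirnov distance is an added numerical summary that this article does not use (uniformity is judged graphically in Figure 8), and the 1.36/√n cutoff is only a rough guide when the parameters were estimated from the same data.

```python
import math
import random
from statistics import NormalDist


def ks_distance_from_uniform(u_values):
    """Largest gap between the empirical CDF of u_values and the Uniform(0, 1) CDF."""
    ordered = sorted(u_values)
    n = len(ordered)
    gap = 0.0
    for i, u in enumerate(ordered):
        gap = max(gap, (i + 1) / n - u, u - i / n)
    return gap


# Fitted mean and variance reported in the text for the unemployment residuals.
MEAN, VARIANCE = 0.00098, 0.04357
fitted = NormalDist(mu=MEAN, sigma=math.sqrt(VARIANCE))

# Hypothetical residuals standing in for the real AR residuals (156 months).
random.seed(7)  # arbitrary seed, for reproducibility only
residuals = [random.gauss(MEAN, math.sqrt(VARIANCE)) for _ in range(156)]

# Probability integral transformation: evaluate the fitted CDF at each residual.
u = [fitted.cdf(r) for r in residuals]

d = ks_distance_from_uniform(u)
rough_cutoff = 1.36 / math.sqrt(len(u))  # approximate 5 percent critical value
print(f"KS distance = {d:.3f}, rough cutoff = {rough_cutoff:.3f}")
```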

As mentioned earlier, it is wise to fit many possible copula models when applied to empirical data (Trivedi and Zimmer 2007). Thus, each of the elliptical and Archimedean copulas was fit to the marginal distributions, with the results shown in Table 4. Although the Frank copula fits the data best according to each of the model fit statistics, all the copulas lead to the same conclusion: We cannot reject the null hypothesis that the association between unemployment and the suicide rate is zero. The linear correlation reported earlier would lead to the conclusion that the shared variance between unemployment and suicide is 39.0 percent. Once the margins are modeled appropriately in a manner that accounts for autocorrelation and a measure of dependence taken that is free from influence of the marginal distributions, the square of Spearman’s ρ from the best fitting Frank copula results in a shared variance of only 0.9 percent. According to Kendall’s τ, the probability of suicide and unemployment being in the same order differs from the probability of their being in a different order by only −0.064. Indeed, the 95 percent confidence interval (CI) computed from the standard error of the association parameter includes zero for both measures of association (Kendall’s τ: [−0.163, 0.037]; Spearman’s ρ: [−0.242, 0.056]). Figure 9 shows the contour plot of the joint distribution for the fitted Frank copula. As the neat, elliptical shape demonstrates, there is virtually no relationship throughout the distribution (cf. Figures 3 and 4).

Table 4.

Copula Model Fits to Unemployment Rate and Suicide Rate Time Series.

Copula	Parameter	Kendall’s τ	Spearman’s ρ	Log likelihood	AIC	BIC	RMSE
Gumbel–Hougaard	1.000 (0.054)^a	0.000	0.000	^c	^c	^c	0.0155
Clayton	−0.041 (0.072)	−0.021	−0.032	−0.142	1.717	4.767	0.0137
Frank	−0.579 (0.467)	−0.064	−0.096	−0.079	0.459	3.509	0.0116
Normal	−0.080 (0.079)	−0.051	−0.076	−0.495	1.010	4.059	0.0123
t	−0.080 (0.080)^b	−0.051	−0.077	−0.442	3.117	9.217	0.0122
Independent	—	0.000	0.000	—	—	—	0.0155

Note: Standard errors are in parentheses. AIC = Akaike information criterion; BIC = Bayesian information criterion; RMSE = root mean square error.

^bThe t-copula also has a degrees of freedom parameter, estimated at 155.005.

*p < .05, **p < .01, ***p < .001.

Figure 9.

Contour plot for fitted joint distribution of unemployment and suicide rates.
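Neither measure of association has a closed form for the Frank copula; both are usually written with Debye functions, τ = 1 − (4/θ)[1 − D₁(θ)] and ρ = 1 − (12/θ)[D₁(θ) − D₂(θ)]. The Python sketch below assumes those standard expressions and evaluates the Debye functions by simple numerical integration. Transforming the endpoints of θ ± 1.96 × SE is an assumption about how the reported intervals were obtained, though it gives values close to them. All function names are illustrative.

```python
import math


def debye(theta, k, steps=2000):
    """Debye function D_k(theta) = (k / theta**k) * integral from 0 to theta of
    t**k / (exp(t) - 1) dt, by the midpoint rule (so t = 0 is never evaluated).
    Works for negative theta; theta must be nonzero."""
    h = theta / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** k / math.expm1(t)
    return k * total * h / theta ** k


def frank_tau(theta):
    """Kendall's tau implied by a Frank copula parameter."""
    return 1.0 - (4.0 / theta) * (1.0 - debye(theta, 1))


def frank_rho(theta):
    """Spearman's rho implied by a Frank copula parameter."""
    return 1.0 - (12.0 / theta) * (debye(theta, 1) - debye(theta, 2))


THETA, SE = -0.579, 0.467  # Frank estimate and standard error from Table 4
lower, upper = THETA - 1.96 * SE, THETA + 1.96 * SE

tau, rho = frank_tau(THETA), frank_rho(THETA)
tau_ci = (frank_tau(lower), frank_tau(upper))
rho_ci = (frank_rho(lower), frank_rho(upper))

print(f"tau = {tau:.3f}, CI = [{tau_ci[0]:.3f}, {tau_ci[1]:.3f}]")
print(f"rho = {rho:.3f}, CI = [{rho_ci[0]:.3f}, {rho_ci[1]:.3f}]")
```

Because the endpoints of the interval for θ fall on opposite sides of zero, so do the endpoints of both transformed intervals.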

This example demonstrates the utility of copula models for time series data. In particular, the method enables the researcher to take into account autocorrelation in both variables of interest prior to fitting the dependence. For sociologists, such modeling has further applications. For example, a researcher could use the method to verify whether two different measures of the same phenomenon are associated as expected. In this vein, the method could assess whether the United States’ two measures of crime, the Uniform Crime Report and National Crime Victimization Survey, are indeed moving in unison as might be expected, a long-standing question for those who study crime (Biderman and Lynch 1991; Lynch and Addington 2007). Of course, the preceding model was not meant as the definitive analysis of suicide and unemployment, but was merely used to provide an example. Instead, researchers would most likely be interested in a more detailed analysis, such as including covariates in the margins, to which we now turn.

Example 3: Copula with Predictors from Empirical Data

Here, we consider the relationship of education and alcohol use, modeled with several predictors in the margin (see, e.g., Crosnoe and Riegle-Crumb 2007). The data come from the NLSY97. The NLSY97 is an annual nationally representative survey of youth who were aged 12–17 in 1997. A subset of the 1998 cross-sectional survey that includes only those aged 17 at the time of the survey is analyzed for a sample size of N = 1,298. The outcomes of interest are GPA and a count response to, “During the last 30 days, on how many days did you have one or more drinks of an alcoholic beverage?” GPA is measured on the traditional 0–4 range and comes from the students’ school transcripts.⁶ Several predictor variables are included in the modeling of these two marginal distributions. Indicator variables are included for gender, community type (urban vs. rural), and whether any paid work was reported. A series of dummy variables also measure race (white, black, and Hispanic) and household (HH) parental structure (both biological parents, other two parent, single parent, and other structure). Finally, total HH income is reported by the parent of the respondent and logged due to skewness.⁷

In this particular example, there is reason to believe that either GPA or alcohol use could affect the other. Yet, we are still interested in their association and would like to control for other potential influences for both variables. More importantly, commonly used measures of association are inappropriate to understand the relationship between a continuous, potentially skewed measure and a count variable. The assumptions behind linear correlation are not met, despite such relationships appearing often in both correlation tables and regression analyses. This scenario is one in which copula models are useful.

Unlike the preceding examples, we are interested in including covariates in the margin. While the NLSY97 is expansive in scope such that many predictors could have been tested, for simplicity, only a small group of predictors is included in the models and these are kept the same for both margins. In practice, however, sociologists should consider all the model selection, model fit, and model diagnostics techniques that are typically employed when fitting each of the margins separately, and the predictors need not be the same.⁸ Here, care is still taken in modeling the margins in a manner that satisfies the uniformity assumption demonstrated in the previous sections via an appropriate regression model.

The first step is to appropriately model the margins and confirm their uniformity. Beginning with GPA, Figure 10 depicts a histogram, which shows some degree of skew. An ordinary least squares (OLS) regression was fit with each of the predictors described earlier (not shown). We then use the probability integral transformation to compute the CDF, which again is akin to computing p values on the appropriate distribution. Given that this is OLS regression, we use the normal distribution function to compute the probability of each respondent’s observed value on the cumulative distribution of GPA using the model variance and their respondent-specific fitted value as the mean. Figure 11 attests to the importance of fitting the marginal distribution correctly. The top panel shows the probability integral transformation of an OLS regression with GPA in its original scale, while the bottom panel shows an OLS regression of GPA with the Box–Cox transformation for normality of the residuals applied with a best fitting power of 1.7 (Box and Cox 1964). Clearly, the former is not uniform on the unit interval, while the latter is.⁹ Given the assumption of uniformity, the transformation of GPA is used in the copula model.

Figure 10.

Histogram of grade point average (GPA; National Longitudinal Survey of Youth 1997 [NLSY97]).

Figure 11.

Probability integral transformation from ordinary least squares (OLS) regression models for grade point average (GPA; National Longitudinal Survey of Youth 1997 [NLSY97]).

A different generalized linear model was then applied to the count of days using alcohol. Given the nature of this count measure, a Poisson or negative binomial regression is most appropriate, with the latter used given the presence of overdispersion (mean = 1.91; variance = 17.8). Here for the probability integral transformation, we compute the complement of the p values associated with the respondent’s observed alcohol use from the negative binomial distribution using the model dispersion and the exponentiated respondent-specific fitted value (so that it is in the correct units) as the mean, shown in Figure 12. The figure depicts uniformity, though the range is not the unit interval. From the above mentioned definition, the copula is still unique on the range of the distribution function as long as it is uniform (i.e., jumps do not appear). In these particular data, no obvious horizontal plateaus appear, there are virtually no ties across the margins, and the models presented subsequently converge. Given several cautions regarding discrete data in the margins, these assumptions should always be checked carefully.¹⁰

Figure 12.

Probability integral transformation from negative binomial regression models for number of days drinking alcohol in last month (National Longitudinal Survey of Youth 1997 [NLSY97]).
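The transformation for the count margin can be sketched as follows in Python. The mean of 1.71 and the observed count of 2 days come from the first worked example later in this section, and the dispersion of 0.252 from Table 5. Treating the reported dispersion as the "size" parameter, so that the variance equals mean + mean²/size, is an assumption; under it, the sketch gives a value close to the 0.798 reported for that example. The function name is illustrative.

```python
def negbin_cdf(y, mean, size):
    """P(Y <= y) for a negative binomial with the given mean and size,
    parameterized so that Var(Y) = mean + mean**2 / size (assumed here)."""
    if y < 0:
        return 0.0
    p = size / (size + mean)
    prob = p ** size  # P(Y = 0)
    total = prob
    for k in range(1, int(y) + 1):
        # Recursion: P(Y = k) = P(Y = k - 1) * (k - 1 + size) / k * (1 - p)
        prob *= (k - 1 + size) / k * (1.0 - p)
        total += prob
    return total


MEAN_DAYS = 1.71   # respondent-specific fitted mean (exponentiated linear predictor)
SIZE = 0.252       # dispersion estimate from Table 5, read as the size parameter

# Value on the marginal CDF for an observed count of 2 days drinking.
u_obs = negbin_cdf(2, MEAN_DAYS, SIZE)

# The CDF of a count variable moves in jumps: compare the value one count lower.
u_below = negbin_cdf(1, MEAN_DAYS, SIZE)

print(f"F(2) = {u_obs:.3f}, F(1) = {u_below:.3f}")
```

The gap between F(1) and F(2) is the kind of jump that makes discrete margins require the additional checks described above.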

Once the researcher is certain that they have correctly modeled the margins and that the margins satisfy these assumptions for discrete data, they can use the values of the regression coefficients as starting values for estimating a copula model. That is, the copula parameter and each of the regression coefficients, as well as any other model parameters such as the residual standard error in the OLS regression and the overdispersion parameter in the negative binomial regression, are estimated simultaneously via maximum likelihood.¹¹ Table 5 shows the best fitting model for each of the elliptical (Normal) and Archimedean (Frank) families, as well as the Product/Independence copula as a point of comparison. Both a model χ² test and the significance of the copula parameter provide evidence to reject the null hypothesis that GPA and days drinking alcohol are not associated, controlling for the predictors in the model. Beginning with the former, a likelihood ratio test between the Independence copula model and either the Normal copula (χ² = 30.607, df = 1, p < .001) or Frank copula (χ² = 23.712, df = 1, p < .001) demonstrates that the model with an association parameter is a better fit. The AIC and BIC suggest that the Normal copula (AIC = 8,189.59, BIC = 8,308.46) is the better model relative to the Frank copula (AIC = 8,196.48, BIC = 8,315.36), and the RMSE is lower for the Normal.

Table 5.

Copula Models for GPA and Number of Days Drinking Alcohol in Last Month (NLSY97).

Copula	Independence	Normal	Frank
Copula parameter	—	−0.286*** (0.045)	−1.627*** (0.048)
GPA model (OLS regression)
Intercept	2.955*** (0.157)	3.227*** (0.157)	3.251*** (0.157)
Paid work	0.222** (0.069)	0.214** (0.069)	0.224** (0.069)
Income (log)	0.022* (0.011)	0.022* (0.011)	0.022* (0.011)
Race: Hispanic versus black	0.056 (0.104)	−0.025 (0.104)	−0.024 (0.104)
Race: white versus black	0.341*** (0.085)	0.271** (0.085)	0.247** (0.085)
HH: Other two parents versus both biological	−0.585*** (0.101)	−0.600*** (0.101)	−0.608*** (0.101)
HH: Single parent versus both biological	−0.413*** (0.079)	−0.418*** (0.079)	−0.420*** (0.079)
HH: Other versus both biological	−0.726*** (0.133)	−0.750*** (0.133)	−0.757*** (0.133)
Gender: Male versus female	−0.519*** (0.065)	−0.554*** (0.065)	−0.559*** (0.065)
Community: Urban versus rural	−0.083 (0.074)	−0.070 (0.074)	−0.076 (0.074)
(Residual standard error)	1.132*** (0.027)	1.168*** (0.027)	1.164*** (0.027)
Alcohol model (negative binomial)
Intercept	−0.778** (0.277)	−0.749** (0.277)	−0.777** (0.277)
Paid work	0.179 (0.127)	0.197 (0.127)	0.170 (0.127)
Income (log)	0.013 (0.019)	0.010 (0.019)	0.015 (0.019)
Race: Hispanic versus black	0.925*** (0.198)	0.971*** (0.198)	0.911*** (0.198)
Race: white versus black	1.091*** (0.160)	1.064*** (0.160)	1.101*** (0.160)
HH: Other two parents versus both biological	0.251 (0.186)	0.236 (0.186)	0.243 (0.186)
HH: Single parent versus both biological	0.206 (0.147)	0.203 (0.147)	0.203 (0.147)
HH: Other versus both biological	0.383 (0.242)	0.381 (0.242)	0.374 (0.242)
Gender: Male versus female	0.475*** (0.121)	0.474*** (0.121)	0.466*** (0.121)
Community: Urban versus rural	−0.097 (0.134)	−0.095 (0.134)	−0.098 (0.134)
(Dispersion parameter)	0.252*** (0.016)	0.252*** (0.016)	0.258*** (0.016)
Log likelihood	−4,087.097	−4,071.793	−4,075.241
AIC	8,220.195	8,189.586	8,196.482
BIC	8,339.072	8,308.463	8,315.359
RMSE	0.0125	0.0096	0.0099

Note: Standard errors are in parentheses. Grade point average (GPA) model uses best-fitting Box–Cox Transformation of 1.7. AIC = Akaike information criterion; BIC = Bayesian information criterion; HH = household; NLSY97 = National Longitudinal Survey of Youth 1997; OLS = ordinary least squares; RMSE = root mean square error.

*p < .05, **p < .01, ***p < .001.
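The likelihood ratio tests reported in the text follow directly from the log likelihoods in Table 5. The Python sketch below recomputes them; it relies on the fact that, for one degree of freedom, the upper-tail probability of a χ² variable equals erfc(√(x/2)). The function names are illustrative, and small differences in the last decimal arise because the tabled log likelihoods are rounded.

```python
import math


def lr_statistic(ll_restricted, ll_full):
    """Likelihood ratio statistic: twice the gain in log likelihood."""
    return 2.0 * (ll_full - ll_restricted)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Log likelihoods from Table 5.
LL_INDEPENDENCE = -4087.097
LL_NORMAL = -4071.793
LL_FRANK = -4075.241

lr_normal = lr_statistic(LL_INDEPENDENCE, LL_NORMAL)
lr_frank = lr_statistic(LL_INDEPENDENCE, LL_FRANK)
p_normal = chi2_sf_df1(lr_normal)
p_frank = chi2_sf_df1(lr_frank)

print(f"Normal vs. Independence: chi2 = {lr_normal:.3f}, p = {p_normal:.2e}")
print(f"Frank vs. Independence:  chi2 = {lr_frank:.3f}, p = {p_frank:.2e}")
```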

Thus, we can use the copula parameter from the Normal model to determine the association between GPA and days drinking alcohol. The association parameter is θ = −0.286. We use the formulas from Table 1 to compute either of the nonparametric measures of association. As described earlier, the linear correlation can only capture the dependence free of influence from the marginal distribution in the case of a Normal copula and Normal margins. Kendall’s τ and Spearman’s ρ, however, have the advantage of making no assumption about the distribution of either marginal or the relationship between them (Schweizer and Wolff 1981) and providing the association after controlling for important predictors of both phenomena. Spearman’s ρ is −0.274 ( $ρ = (6 / π) arcsin (- 0.286 / 2) = - 0.274$ ) with a 95 percent CI of [−0.359,−0.189]. Squaring this estimate gives an estimate of shared variance between alcohol use and GPA of 7.5 percent. If we ignored the marginal distributions and simply used the linear correlation, we would conclude that the shared variance resulting from the linear correlation of −0.116 is 1.3 percent.¹² The calculation for Kendall’s τ results in a value of −0.185 ( $τ = (2 / π) arcsin (- 0.286) = - 0.185$ ; 95 percent CI: [−0.244,−0.127]). That is, the probability of the two margins being in discordant order is significantly higher than the probability of their being in concordant order.
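Both measures and their intervals can be reproduced from the Normal copula parameter and its standard error in Table 5. The Python sketch below uses the standard relationships for the Normal copula, ρ = (6/π) arcsin(θ/2) and τ = (2/π) arcsin(θ). Obtaining each interval by transforming the endpoints of θ ± 1.96 × SE is an assumption about the procedure, although it matches the reported limits to three decimals. Function names are illustrative.

```python
import math


def spearman_normal(theta):
    """Spearman's rho implied by a Normal copula parameter."""
    return (6.0 / math.pi) * math.asin(theta / 2.0)


def kendall_normal(theta):
    """Kendall's tau implied by a Normal copula parameter."""
    return (2.0 / math.pi) * math.asin(theta)


THETA, SE = -0.286, 0.045  # Normal copula estimate and standard error, Table 5
lower, upper = THETA - 1.96 * SE, THETA + 1.96 * SE

rho = spearman_normal(THETA)
rho_ci = (spearman_normal(lower), spearman_normal(upper))
tau = kendall_normal(THETA)
tau_ci = (kendall_normal(lower), kendall_normal(upper))
shared_variance = rho ** 2

print(f"rho = {rho:.3f}, CI = [{rho_ci[0]:.3f}, {rho_ci[1]:.3f}]")
print(f"tau = {tau:.3f}, CI = [{tau_ci[0]:.3f}, {tau_ci[1]:.3f}]")
print(f"shared variance = {100 * shared_variance:.1f} percent")
```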

Although the strength of the association is not remarkably different between linear correlation and the other two measures of association, there is a further advantage of the copula modeling procedure. Namely, the dependence varies over the joint distribution, as opposed to a similar effect of a one unit increase in the case of linear correlation or even a linear model. Figure 13 shows the fitted copula probability distribution for the estimated parameter. A PP plot confirms a good fit between the empirical copula and the fitted copula with this association, as depicted in Figure 14. Thus, we are able to model a more detailed joint relationship, and the results make intuitive sense. Given the negative association, the highest probability is in the discordant sections, such that GPA and alcohol use have higher discordant dependency at high and low values of each. In the middle of the distribution, alcohol use and GPA are not as strongly associated. As the nonzero weight in the opposite corners shows, even possible concordance is not completely discounted, albeit rare by comparison. The various shapes of the copula models presented, together with the association parameter, allow for a variety of dependencies.

Figure 13.

Parametric normal copula fit from National Longitudinal Survey of Youth 1997 [NLSY97] copula models.

Figure 14.

Probability–probability (PP) plot of [NLSY97] empirical and fitted copula.

The importance of the ability to control for predictors in the margins becomes further apparent when examining probabilities, as one’s place in the joint distribution is no longer determined solely by values on the two outcomes of interest, but also on the values of the predictors. Within the margins, we see that higher GPA is predicted by paid work, higher income, white race, female gender, and living with both biological parents (relative to all other HH structures). Higher number of days drinking alcohol is predicted by white and Hispanic race (relative to black), and male gender.¹³ For given values of the predictors and outcomes, we can then compute the joint probability of GPA and alcohol use. The first step is to compute the associated complement of the p value on the CDF given those values, using the normal for GPA and negative binomial for alcohol use with the estimated model variance and dispersion, respectively. As with the simulated example, these values are then inserted into the best-fitting copula to determine the joint cumulative probability.

A few examples will demonstrate this computation.¹⁴ If we assume a white, urban, working female teen from a HH with both biological parents and an income of US$68,971, this individual has a predicted mean of 1.71 days drinking and 3.68 GPA. Now suppose that individual has an observed value of 2 days drinking and a GPA of 4.00. Using the model dispersion and variance parameters, respectively, and the respondent-specific predicted means, these observed values translate to a value on the respective marginal CDFs of 0.798 for drinking and 0.933 for GPA conditional upon the covariates. To compute the joint cumulative probability, we use the best-fitting Normal copula formula in Table 1:

Φ_{θ} [Φ^{- 1} (u), Φ^{- 1} (v)] = Φ_{- 0.286} [Φ^{- 1} (0.798), Φ^{- 1} (0.933)] = 0.736.

That is, the joint probability of having an equal or lower GPA and drinking on the same number of days or fewer, given the values on the covariates, is 0.736. Supposing instead a Hispanic, urban, nonworking female teen from a HH with both biological parents and an income of US$10,615 and observed days drinking of 0 and GPA 3.6, this individual lies at 0.645 for the drinking marginal distribution function, 0.872 for the GPA marginal distribution function, and 0.541 on the joint distribution function. Finally, a white, rural, nonworking male teen from a HH with both biological parents and income of US$19,029 and observed days drinking of 1 and GPA 1.41 lies at 0.673 for the drinking marginal distribution function and 0.025 for the GPA marginal distribution function, and only 1.0 percent of 17-year-olds are at or lower on the joint distribution of drinking and GPA given the observed covariates.
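The Normal copula has no closed form, but it can be evaluated by one-dimensional numerical integration, using the fact that a standard bivariate normal pair (X, Y) with correlation θ satisfies Y | X = x ~ N(θx, 1 − θ²). The Python sketch below is one such approach under that identity, not the routine used for the analyses. The marginal values and θ = −0.286 are taken from the text, while the function name, the lower integration limit of −8, and the number of steps are illustrative choices.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_copula_cdf(u, v, theta, steps=4000):
    """C(u, v) = Phi_theta(Phi^-1(u), Phi^-1(v)) for 0 < u, v < 1 and |theta| < 1,
    computed as the integral from -inf to a of
    phi(x) * Phi((b - theta * x) / sqrt(1 - theta**2)) dx."""
    a = STD_NORMAL.inv_cdf(u)
    b = STD_NORMAL.inv_cdf(v)
    lower = -8.0  # the normal density is negligible below this point
    if a <= lower:
        return 0.0
    h = (a - lower) / steps
    scale = math.sqrt(1.0 - theta * theta)
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h  # midpoint rule
        total += STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - theta * x) / scale)
    return total * h


THETA = -0.286  # Normal copula estimate from Table 5

# (drinking CDF value, GPA CDF value) for the three hypothetical teens in the text.
first = normal_copula_cdf(0.798, 0.933, THETA)
second = normal_copula_cdf(0.645, 0.872, THETA)
third = normal_copula_cdf(0.673, 0.025, THETA)

print(f"{first:.3f} {second:.3f} {third:.3f}")
```

With a negative θ, each result lies below the product of its two marginal values, which is what the independence copula would return.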

While the association parameter was taken as the object of interest in this example, sociologists may also use the copula method with a focus on the covariates. In such a scenario, the joint probability between two variables might be considered a nuisance parameter that needs to be controlled to garner more accurate covariate estimates. In the aforementioned example, we might be hesitant to include alcohol use as a predictor of GPA because we are unsure this is the correct temporal order, yet still wish to account for the relationship. One approach to such a scenario is structural equation modeling. For noncontinuous variables, however, structural equation modeling requires computationally intensive methods such as generalized latent variable models (see, e.g., Skrondal and Rabe-Hesketh 2005). Instead, we could fit a copula model that accounts for the dependence (with or without covariates for alcohol use), with flexibility in the margins and computational advantages from the maximum likelihood and simultaneous estimation.

Other Applications

While this introductory article to copulas prohibits a detailed discussion of every aspect of the methodology, it is worth noting several applications not demonstrated earlier that are potentially quite useful to sociologists. First, any copula can be rewritten in a survival function form, connecting the joint survival function to its univariate margins in a manner completely analogous to the coupling in a copula (Nelsen 2006:32-34). In applications, researchers may be interested in the probability of surviving beyond a particular point, rather than the probability of surviving until that point, a case where the survival copula is appropriate. In the above mentioned simulated example of spousal ages of death, it is easy to see how the survival copula is useful, and we could redo the exercise with the survival version of the Weibull distribution. With empirical data, issues arise in survival analysis, such as censoring, that are still applicable and researchers should consult methods to deal with such concerns (see Hougaard 2000; Shemyakin and Youn 2006).
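As an illustration of the survival form, the joint probability that both spouses survive beyond given ages follows from P(Y₁ > y₁, Y₂ > y₂) = 1 − u − v + C(u, v), where u and v are the marginal CDF values. The Python sketch below applies this identity to the simulated example, reusing the Gumbel–Hougaard parameter of 2.072 and the marginal values of 0.461 at the means. The resulting number is computed here for illustration and is not reported elsewhere in this article; function names are illustrative.

```python
import math


def gumbel_hougaard_cdf(u, v, theta):
    """Gumbel-Hougaard copula function, theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def joint_survival(u, v, theta):
    """P(Y1 > y1, Y2 > y2) = 1 - u - v + C(u, v), with u = F1(y1) and v = F2(y2)."""
    return 1.0 - u - v + gumbel_hougaard_cdf(u, v, theta)


def survival_copula(a, b, theta):
    """Survival copula evaluated at the marginal survival probabilities a and b:
    C_hat(a, b) = a + b - 1 + C(1 - a, 1 - b)."""
    return a + b - 1.0 + gumbel_hougaard_cdf(1.0 - a, 1.0 - b, theta)


THETA = 2.072
U = V = 0.461  # marginal CDF values at the respective mean ages of death

both_survive = joint_survival(U, V, THETA)
same_via_survival_copula = survival_copula(1.0 - U, 1.0 - V, THETA)

print(f"P(both die after their mean ages) = {both_survive:.3f}")
```

The two routes give the same number because the survival copula is simply the original copula rewritten in terms of the marginal survival probabilities.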

Second, copulas are particularly adept at modeling “tail dependence,” or associations of relatively rare, but related events or values occurring in the tails of each of the margins. In examining some of the copulas in the above mentioned figures, I alluded to the various copulas’ ability to model joint behavior in the tails. We can explicitly compute the nonparametric tail dependence parameters (denoted λ_L and λ_U for lower and upper), which are interpreted as the limiting probability of one marginal surpassing (or, for the lower tail, falling below) a particular value on the marginal distribution function given that the other marginal has already passed that same value (McNeil et al. 2005; Nelsen 2006:214-16; Trivedi and Zimmer 2007:25-26). Some copulas are better suited for modeling tail dependence, such that researchers should carefully choose which copulas to model depending on whether joint rare values are of interest and where they are expected to occur. In Figure 2, we saw that the Clayton and Gumbel–Hougaard were useful for modeling relationships in the lower and upper tail, respectively. The lower and upper tail dependence parameters are $λ_{L} = 2^{- 1 / θ}$ and $λ_{U} = 0$ for the Clayton, and $λ_{L} = 0$ and $λ_{U} = 2 - 2^{1 / θ}$ for the Gumbel–Hougaard, confirming this dependency in one tail and lack of dependency in the other. On the other hand, the t-copula can model tail dependency in both tails. For the Frank and the Normal, both $λ_{L} = 0$ and $λ_{U} = 0$ , demonstrating that while each can model dependency in cases of concordance (or discordance) near the tails, the copula is not well suited for the relationship between rare joint values far out in the tails. Those in finance have used tail dependency to understand phenomena such as interest rate and stock market shocks (Ane, Ureche-Rangau, and Labidi-Makni 2008; Junker et al. 2006) or costs of insurance claims (Frees and Valdez 2008). With methods to model tail dependency, sociologists may find a useful avenue for modeling rare events or values and heavy-tailed distributions. For example, one could model the relationship between time to college degree completion and student debt, where we might expect a strong relationship in the upper tail (see, e.g., Dwyer, McCloud, and Hodson 2012).
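The tail dependence parameters are simple to compute. The Python sketch below uses the standard expressions, 2^(−1/θ) for the lower tail of the Clayton and 2 − 2^(1/θ) for the upper tail of the Gumbel–Hougaard, evaluated at the Table 2 estimates purely for illustration. It also checks the Gumbel–Hougaard value against the definition, the conditional probability P(U > q | V > q) at a q close to 1. Function names are illustrative.

```python
def clayton_lower_tail(theta):
    """Lower tail dependence of the Clayton copula (theta > 0)."""
    return 2.0 ** (-1.0 / theta)


def gumbel_upper_tail(theta):
    """Upper tail dependence of the Gumbel-Hougaard copula (theta >= 1)."""
    return 2.0 - 2.0 ** (1.0 / theta)


def gumbel_diagonal(q, theta):
    """C(q, q) for the Gumbel-Hougaard copula, which simplifies to q ** (2 ** (1 / theta))."""
    return q ** (2.0 ** (1.0 / theta))


def conditional_upper_tail(q, theta):
    """P(U > q | V > q) = (1 - 2q + C(q, q)) / (1 - q) under the Gumbel-Hougaard copula."""
    return (1.0 - 2.0 * q + gumbel_diagonal(q, theta)) / (1.0 - q)


lambda_l_clayton = clayton_lower_tail(0.953)   # Clayton estimate from Table 2
lambda_u_gumbel = gumbel_upper_tail(2.072)     # Gumbel-Hougaard estimate from Table 2
near_limit = conditional_upper_tail(0.9999, 2.072)

print(f"Clayton lambda_L = {lambda_l_clayton:.3f}")
print(f"Gumbel-Hougaard lambda_U = {lambda_u_gumbel:.3f} (at q = 0.9999: {near_limit:.3f})")
```

At θ = 1, the independence case, the Gumbel–Hougaard expression returns exactly zero, as a tail dependence parameter should.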

Third, while this introduction focused exclusively on the bivariate case, copulas are theoretically extendable to higher dimensions. That is, Sklar’s Theorem in n-dimensions (Nelsen 2006:46-47) is:

F (x_{1}, x_{2}, \dots, x_{n}) = C (F_{1} (x_{1}), F_{2} (x_{2}), \dots, F_{n} (x_{n})) .

We can extend the other formulas mentioned earlier as well to the multidimensional case, though at high enough dimensions, the current ability to easily estimate the models is hampered. Frees and Valdez (2008) and Zimmer and Trivedi (2006) provide excellent examples of three-dimensional applications. In the NLSY97 analysis, we could, for example, add a depression scale to the copula, furthering past research on GPA, depression, and drinking (see, e.g., Shippee and Owens 2011). The ability to model the margins separately allows researchers a very flexible method to understand the association between phenomena in ways that might prove difficult otherwise, including relationships between rare events, mixed effects models, and highly skewed data.

Discussion

Copulas provide a flexible methodology for understanding associations between related phenomena and their joint probabilities. Thus, copula models ask a fundamentally different question than typical techniques modeling conditional values. Rather than how does variable X influence variable Y, copulas ask: how do two variables move together in unison and how strong is that concurrent movement at various points in the distribution? The substantive outcome of interest then becomes the joint probability of two particular values of each marginal distribution occurring together, and the association is the hypothesis tested, while allowing for considerable flexibility in the modeling of each marginal. The copula function “couples” two univariate marginal distributions together in a way that preserves the margins and provides a measure of dependence that does not require the specification of an a priori joint distribution. Instead, the probability integral transform is used to convert the margins to a uniform distribution, whose joint relationship is then modeled via a copula function. It is often difficult to define, and even more difficult to estimate, joint distributions for margins in situations where one or more of the margins are nonnormal. This approach allows for consideration of a plethora of marginal distributions in a maximum likelihood estimation setting and numerous shapes of joint distributions that capture different strengths of local dependence.

Three examples were provided to demonstrate the method. The first used randomly generated data meant to simulate the joint distribution of spousal mortality. The value of this example was to provide a short introduction to the basics of the method. The real benefit for sociologists no doubt lies in approaches similar to the latter two examples, which used empirical data. In the first empirical example, time series models were used to adjust for autocorrelation and ensure stationarity before modeling the joint distribution of suicide and unemployment. In the second empirical example, of 17-year-olds’ GPA and days using alcohol, the modeling procedure showed how copulas can be used to understand the association and joint distribution between two nonnormal variables while controlling for important predictors of each. While this example shares similarities with structural equation modeling, the flexibility in the margins opens possibilities that go beyond what is feasible in structural equation modeling. With models similar to the given examples, as well as the other avenues described in the previous section, the applications within sociology are numerous, as we are often confronted with nonnormal data whose joint relationship we seek to understand.

To close, it is worth noting that any method can produce erroneous results when improperly applied. Copulas are no exception, and their misuse as a cause of the Great Recession (through violated statistical assumptions and the use of a particular distribution on data that it did not fit) has been well documented, both in qualitative sociological studies of financial market employees (MacKenzie 2011) and in quantitative economic studies of the models used in the lead-up to the financial crisis (Brigo, Pallavicini, and Torresetti 2010; Donnelly and Embrechts 2010; Zimmer 2012). Thus, sociologists should exercise caution and ensure that each marginal distribution, as well as the copula used to bind them, is fit using the most appropriate method and best-fitting model. Several additional cautions were given throughout, such as in the case of discrete marginal distributions or in choosing a copula whose tail dependence matches the data (see Hays and Kachi 2009 for other instances). Further, there are alternative methods for creating or assessing bivariate distributions and associations that researchers might consider, such as seemingly unrelated regression, inversion methods, and mixture models. MacKenzie’s (2011) findings concerning copulas and the sociology of knowledge can be applied much more broadly, as any method can take hold as the “correct” method such that it goes unquestioned for a period. His qualitative study reminds those conducting quantitative analysis to take great care in checking the assumptions of their models. While copulas, like any method, can be abused, the goal of this article was to help sociologists take the first step toward using them correctly within our discipline.
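
The caution about tail dependence can be made concrete with a small numerical sketch. Assuming two copulas matched on the same Kendall's tau (0.5 here, an arbitrary choice), the Python code below compares the probability that both variables fall in their lowest 5 percent under a Clayton copula (closed form) and under a Gaussian copula (Monte Carlo estimate). The function names, the value of tau, the 5 percent threshold, and the simulation size are all illustrative assumptions; the sketch is not drawn from the analyses in this article.

```python
import math
import random
from statistics import NormalDist


def clayton_joint_tail(q, theta):
    """P(U <= q, V <= q) under a Clayton copula, from its closed-form CDF."""
    return (2.0 * q ** (-theta) - 1.0) ** (-1.0 / theta)


def gaussian_joint_tail(q, rho, n=100_000, seed=0):
    """Monte Carlo estimate of P(U <= q, V <= q) under a Gaussian copula."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(q)  # U <= q exactly when the latent Z <= cutoff
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if z1 <= cutoff and z2 <= cutoff:
            hits += 1
    return hits / n


tau = 0.5                              # common Kendall's tau (assumed)
rho = math.sin(math.pi * tau / 2.0)    # Gaussian copula: tau = (2 / pi) * asin(rho)
theta = 2.0 * tau / (1.0 - tau)        # Clayton copula: tau = theta / (theta + 2)
q = 0.05                               # lower-tail threshold (assumed)

clayton_prob = clayton_joint_tail(q, theta)
gaussian_prob = gaussian_joint_tail(q, rho)
print("Clayton: ", round(clayton_prob, 4))
print("Gaussian:", round(gaussian_prob, 4))
print("Ratio:   ", round(clayton_prob / gaussian_prob, 2))
```

Although both copulas imply the same overall rank association, the Clayton copula assigns noticeably more probability to joint extreme low values, so choosing a copula on overall association alone can misstate the likelihood of joint extremes.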

Footnotes

Acknowledgments

I am grateful to Ken Ferraro and Sarah Mustillo for their valuable feedback on this manuscript. I would also like to thank Brian Kelly, Michael Light, Chris Uggen, Scott Feld, and Joy Kadowaki for input on various aspects of the paper, as well as Arkady Shemyakin and John Dodson of the School of Mathematics at the University of Minnesota who introduced me to the method in their classes.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Ane, Thierry, Loredana Ureche-Rangau, and Chiraz Labidi-Makni. 2008. “Time-varying Conditional Dependence in Chinese Stock Markets.” Applied Financial Economics 18:895–916.

Biderman, Albert D. and James P. Lynch. 1991. Understanding Crime Incidence Statistics: Why the UCR Diverges from the NCS. New York: Springer-Verlag.

Box, George E. P. and D. R. Cox. 1964. “An Analysis of Transformations.” Journal of the Royal Statistical Society, Series B 26:211–52.

Brigo, Damiano, Andrea Pallavicini, and Roberto Torresetti. 2010. Credit Models and the Crisis: A Journey into CDOs, Copulas, Correlations and Dynamic Models. New York: Wiley.

Cameron, A. Colin, Tong, Pravin K. Trivedi, and David M. Zimmer. 2004. “Modeling the Differences in Counted Outcomes Using Bivariate Copula Models: With Application to Mismeasured Counts.” Econometrics Journal 7:566–84.

Cherubini, Umberto, Elisa Luciano, and Walter Vecchiato. 2004. Copula Methods in Finance. New York: Wiley.

Clayton, D. G. 1978. “A Model for Association in Bivariate Life Tables and Its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence.” Biometrika 65:141–51.

Crosnoe, Robert and Catherine Riegle-Crumb. 2007. “A Life Course Model of Education and Alcohol Use.” Journal of Health and Social Behavior 48:267–82.

Danaher, Peter J. and Michael S. Smith. 2011. “Modeling Multivariate Distribution with Copulas: Applications in Marketing.” Marketing Science 30:4–21.

Denuit, Michel and Philippe Lambert. 2005. “Constraints on Concordance Measures in Bivariate Discrete Data.” Journal of Multivariate Analysis 93:40–57.

Donnelly, Catherine and Paul Embrechts. 2010. “The Devil Is in the Details: Actuarial Mathematics and the Subprime Mortgage Crisis.” ASTIN Bulletin 40:1–33.

Durante, Fabrizio and Carlo Sempi. 2010. “Copula Theory: An Introduction.” Pp. 3–32 in Copula Theory and Its Applications: Proceedings of the Workshop Held in Warsaw, 25-26 September 2009, edited by Jaworski, Durante, Hardle, and Rychlik. New York: Springer.

Dwyer, Rachel E., Laura McCloud, and Randy Hodson. 2012. “Debt and Graduation from American Universities.” Social Forces 90:1133–55.

Elwert, Felix and Nicholas A. Christakis. 2006. “Widowhood and Race.” American Sociological Review 71:16–41.

Embrechts, Paul, Filip Lindskog, and Alexander McNeil. 2003. “Modeling Dependence with Copulas and Applications to Risk Management.” Pp. 329–84 in Handbook of Heavy Tailed Distributions in Finance, edited by S. T. Rachev. Maryland Heights, MO: Elsevier.

Embrechts, Paul, Alexander McNeil, and Daniel Straumann. 2002. “Correlation and Dependence in Risk Management: Properties and Pitfalls.” Pp. 176–223 in Risk Management: Value at Risk and Beyond, edited by M. A. H. Dempster. New York: Cambridge University Press.

Erhardt, Vinzenz and Claudia Czado. 2012. “Modeling Dependent Yearly Claim Totals Including Zero Claims in Private Health Insurance.” Scandinavian Actuarial Journal 2:106–29.

Escarela, Gabriel and Jacques F. Carriere. 2003. “Fitting Competing Risks with an Assumed Copula.” Statistical Methods in Medical Research 12:333–49.

Farlie, D. J. G. 1960. “The Performance of Some Correlation Coefficients for a General Bivariate Distribution.” Biometrika 47:307–23.

Frank, M. J. 1979. “On the Simultaneous Associativity of F(x,y) and x + y – F(x,y).” Aequationes Mathematicae 18:266–67.

Fredricks, Gregory A. and Roger B. Nelsen. 2007. “On the Relationship between Spearman’s Rho and Kendall’s Tau for Pairs of Continuous Random Variables.” Journal of Statistical Planning and Inference 137:2143–50.

Frees, Edward W., Jacques Carriere, and Emiliano A. Valdez. 1996. “Annuity Valuation with Dependent Mortality.” Journal of Risk and Insurance 63:229–61.

Frees, Edward W. and Emiliano A. Valdez. 1998. “Understanding Relationships Using Copulas.” North American Actuarial Journal 2:1–25.

Frees, Edward W. and Emiliano A. Valdez. 2008. “Hierarchical Insurance Claims Modeling.” Journal of the American Statistical Association 103:1457–69.

Frees, Edward W. and Ping Wang. 2005. “Credibility Using Copulas.” North American Actuarial Journal 9:31–48.

Genest, Christian. 1987. “Frank’s Family of Bivariate Distributions.” Biometrika 74:549–55.

Genest, Christian and Anne-Catherine Favre. 2006. “Everything You Wanted to Know about Copula Modeling But Were Afraid to Ask.” Journal of Hydrologic Engineering 11:347–68.

Genest, Christian, Bruno Rémillard, and David Beaudoin. 2009. “Goodness-of-fit Tests for Copulas: A Review and Power Study.” Insurance: Mathematics and Economics 44:199–213.

Genest, Christian and Johanna Nešlehová. 2007. “A Primer on Copulas for Count Data.” ASTIN Bulletin 37(2):475–515.

Genest, Christian and Louis-Paul Rivest. 1993. “Statistical Inference Procedures for Bivariate Archimedean Copulas.” Journal of the American Statistical Association 88:1034–43.

Grønneberg, Steffen and Nils Lid Hjort. 2014. “The Copula Information Criteria.” Scandinavian Journal of Statistics 41:436–59.

Gumbel, E. J. 1958. “Distributions a plusieurs variables dont les marges sont donnees.” Comptes Rendus de l’Academie des Sciences Paris 246:2717–19.

Gumbel, E. J. 1960. “Distributions des Valeurs Extremes en Plusieurs Dimensions.” Publications de l’Institute de Statistique Paris 9:171–73.

Hays, Jude C. and Aya Kachi. 2009. “Independent Duration Models in Political Science.” Paper presented at the annual meeting of the American Political Science Association, Toronto, Canada.

Heinen, Andréas and Erick Rengifo. 2007. “Multivariate Autoregressive Modeling of Time Series Count Data Using Copulas.” Journal of Empirical Finance 14:564–83.

Heinen, Andréas and Erick Rengifo. 2008. “Multivariate Reduced Rank Regression in Non-Gaussian Contexts Using Copulas.” Computational Statistics & Data Analysis 52:2931–44.

Hougaard, Philip. 1986. “A Class of Multivariate Failure Time Distributions.” Biometrika 73:671–78.

Hougaard, Philip. 2000. Analysis of Multivariate Survival Data. New York: Springer.

Joe, Harry. 1997. Multivariate Models and Dependence Concepts. London, UK: Chapman & Hall.

Joe, Harry. 2015. Dependence Modeling with Copulas. Boca Raton, FL: Taylor & Francis.

Junker, Markus, Alex Szimayer, and Niklas Wagner. 2006. “Nonlinear Term Structure Dependence: Copula Functions, Empirics, and Risk Implications.” Journal of Banking & Finance 30:1171–99.

Kojadinovic, Ivan and Jun Yan. 2010. “Modeling Multivariate Distributions with Continuous Margins Using the Copula R Package.” Journal of Statistical Software 34:1–20.

Lynch, James P. and Lynn A. Addington. 2007. Understanding Crime Statistics: Revisiting the Divergence of the NCVS and UCR. Cambridge, UK: Cambridge University Press.

MacKenzie, Donald. 2011. “The Credit Crisis as a Problem in the Sociology of Knowledge.” American Journal of Sociology 116:1778–841.

Madsen. 2009. “Maximum Likelihood Estimation of Regression Parameters with Spatially Dependent Discrete Data.” Journal of Agricultural, Biological, and Environmental Statistics 14:375–91.

Madsen and Fang. 2011. “Joint Regression Analysis for Discrete Longitudinal Data.” Biometrics 67:1171–76.

McNeil, Alexander J., Rudiger Frey, and Paul Embrechts. 2005. Quantitative Risk Management: Concepts, Techniques, Tools. Princeton, NJ: Princeton University Press.

Morgenstern. 1956. “Einfache Beispiele Zweidimensionaler Verteilungen.” Mitteilungsblatt fur Mathematische Statistik 8:234–35.

Nelsen, Roger B. 1986. “Properties of a One-parameter Family of Bivariate Distributions with Specified Marginals.” Communications in Statistics–Theory and Methods 15:3277–85.

Nelsen, Roger B. [1998] 2006. Introduction to Copulas. 2nd ed. New York: Springer.

Nikoloulopoulos, Aristidis K. 2013. “On the Estimation of Normal Copula Discrete Regression Models Using the Continuous Extension and Simulated Likelihood.” Journal of Statistical Planning and Inference 143:1923–37.

Nikoloulopoulos, Aristidis K. and Dimitris Karlis. 2009. “Finite Normal Mixture Copulas for Multivariate Discrete Data Modeling.” Journal of Statistical Planning and Inference 139:3878–90.

R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. Retrieved (http://www.R-project.org).

Salvadori, Gianfausto, Carlo De Michele, Nathabandu B. Kottegoda, and Renzo Rosso. 2007. Extremes in Nature: An Approach Using Copulas. New York: Springer.

Schweizer and Sklar. 1974. “Operations on Distribution Functions Not Derivable from Operations on Random Variables.” Studia Mathematica 52:43–52.

Schweizer and E. F. Wolff. 1981. “On Nonparametric Measures of Dependence for Random Variables.” The Annals of Statistics 9:879–85.

Shemyakin, Arkady E. and Heekyung Youn. 2006. “Copula Models of Joint Last Survivor Analysis.” Applied Stochastic Models in Business and Industry 22:211–24.

Shippee, Nathan D. and Timothy J. Owens. 2011. “GPA, Depression, and Drinking: A Longitudinal Comparison of High School Boys and Girls.” Sociological Perspectives 54:351–76.

Shumway, Robert H. and David S. Stoffer. 2011. Time Series Analysis and its Applications: With R Examples. 3rd ed. New York: Springer.

Sklar. 1959. “Fonctions de Repartition a n Dimensions et leurs marges.” Publications de l’Institute de Statistique Paris 8:229–31.

Skrondal, Anders and Sophia Rabe-Hesketh. 2005. Structural Equation Modeling: Categorical Variables. Encyclopedia of Statistics in Behavioral Science. Hoboken, NJ: Wiley.

Smith, Murray D. 2005. “Using Copulas to Model Switching Regimes with an Application to Child Labour.” Economic Record 81:S47–57.

Trivedi, Pravin K. and David M. Zimmer. 2007. “Copula Modeling: An Introduction for Practitioners.” Foundations and Trends in Economics 1:1–111.

Vandenberghe, N. E. C. Verhoest, and De Baets. 2010. “Fitting Bivariate Copulas to the Dependence Structure between Storm Characteristics: A Detailed Analysis Based on 105 Year 10 Min Rainfall.” Water Resources Journal 46:W01512.

Wang, Weijing and Martin T. Wells. 2000. “Model Selection and Semiparametric Inference for Bivariate Failure-time Data.” Journal of the American Statistical Association 95:62–72.

Wray, Matt, Cynthia Colen, and Bernice Pescosolido. 2011. “The Sociology of Suicide.” Annual Review of Sociology 37:505–28.

Yan, Jun. 2007. “Enjoy the Joy of Copulas: With a Package Copula.” Journal of Statistical Software 21:1–21.

Zimmer, David M. 2012. “The Role of Copulas in the Housing Crisis.” The Review of Economics and Statistics 94:607–20.

Zimmer, David M. and Pravin K. Trivedi. 2006. “Using Trivariate Copulas to Model Sample Selection and Treatment Effects: Application to Family Health Care Demand.” Journal of Business and Economic Statistics 24:63–76.