Abstract
Split-ballot data are often used to study double standards. The key problem of this design is that individual double standards cannot be identified. I propose a simple two-step approach based on a matching pre-processing of the data to estimate individual double standards. Once this preliminary first step is completed, any statistical technique (e.g., regression models) can be applied on the new data. I apply the method to gender double standards on attitudes toward the age one leaves home by using data from the third round of the European Social Survey. The proposed method simplifies regression analyses of the effects of covariates on double standards and offers new opportunities for research on double standards.
In this article, I propose a simple two-step approach for the analysis of double standards using split-ballot survey data. A double standard refers to the use of different criteria for judging the same trait, behavior, or performance of two different groups of people (Foschi 1996).
There have been many studies in social psychology and other disciplines on gender double standards, specifically on the differential evaluation of men and women, which have used a variety of methodological approaches ranging from qualitative (e.g., Reid, Elliott, and Webber 2011) to experimental methods (e.g., Foschi 1996; see Crawford and Popp 2003 for a review).
Quantitative nonexperimental studies, to which the technique I propose in this article can be applied, have been used to collect data on large samples, especially in cross-national surveys and when double standards items are part of a broader set of questions. These studies have used two different types of survey designs to analyze gender-based double standards. In the first type of design, used by Reiss (1964), all respondents answer the same questions on the acceptance of given behaviors with reference to women and men separately. In this approach, which I label the repeated measures design, asking the same question twice for the two gender targets may introduce a bias. Counterbalancing, which consists of rotating across subsamples which gender target is asked first, can be used to avoid systematic bias due to question order effects, but other disadvantages of the method remain: longer and more costly interviews, a higher probability of nonresponse, and within-respondent dependence of observations.
The second approach, named split-ballot design, consists of randomly drawing two subsamples to be assigned to items regarding women and men, respectively (see e.g., Rijken and Merz 2014). The split-ballot design is a classic way to integrate experimental design and the survey of a representative population sample, combining the distinctive external validity advantages of representative surveys with the decisive internal validity strengths of the fully randomized experiment (Sniderman and Grob 1996).
The main limitation of the split-ballot design is that it does not allow one to identify double standards at the individual level, because each respondent answers the question regarding either men or women, but not both.
The goal of this article is not to compare approaches based on split-ballot data with alternative ones. The focus is on how to impute the missing answer in split-ballot data if the investigator has to rely on them. The simplest approach to impute the missing answer uses regression models where a dummy for the gender of target is included among the covariates (see e.g., Rijken and Merz 2014). This approach has some limitations. First, a methodological drawback is that the imputation relies on the validity of the assumptions of the specified regression model (e.g., linearity). Second, it makes it difficult to study the full heterogeneity that can characterize double standards: one would have to include all possible interactions between the dummy for the gender of target and all other covariates. This becomes unfeasible, or makes the results difficult to interpret, when several covariates are included, especially in complex models such as multilevel and/or nonlinear models.
I propose a simple two-step approach based on pre-processing the data by matching before multivariate analyses are implemented. In the first step, I impute the missing answer by using matching: for individuals assigned to the male target, I look for one or more similar individuals who were assigned to the female target, and vice versa. Similarity is defined on the basis of covariates thought to be important predictors of the outcome under study. Individual-level double standards are then estimated as the difference between the observed and imputed responses for each individual. These individual-level double standards can be used in a second step for further analysis, such as regression analyses to assess whether specific individual characteristics influence double standards. Since matching pre-processing makes the two groups under study more similar with respect to covariate distributions, regression analyses implemented on the matched data set will be less sensitive to parametric assumptions.
The proposed approach has several advantages. The imputation is made transparent by the matching procedure, and parametric assumptions are avoided. However, the most relevant advantages are for the substantive research questions that the approach helps to answer. The imputation step allows the estimation of double standards for each individual, and these can be directly analyzed. Any technique can be used in the second step. With this approach, the study of the (possibly complex) heterogeneity characterizing double standards formation is simplified. I apply this approach to data from the third round of the European Social Survey.
Methods
Consider a split-ballot design questionnaire where, for ease of use and analysis, a single question on attitudes related to the acceptance of a given behavior is included. Assume that we have a random sample of N individuals. I indicate with T the gender of target indicator; the value 1 is for respondents assigned to the male question and 0 for those who answer the female question. Y indicates the observed attitude, and X indicates a set of covariates. With m, I indicate the generic individual assigned to the male target, m = 1, … , M, and f is the generic individual assigned to the female target, f = 1, … , F. Usually, M≈F≈½N, namely, about half of the respondents are randomly assigned to the male target, and the rest are assigned to the female target.
Adapting the concept of potential outcomes, independently of the split-ballot assignment, two (potential) attitudes exist for each individual, Y(1) and Y(0), that, respectively, indicate the attitude of each respondent toward the behavior of men and women. The observed attitudes, Y, will be equal to Y(1) for individuals assigned to the male target and equal to Y(0) otherwise: Y = T × Y(1) + (1 – T) × Y(0).
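The observation rule can be made concrete with a small simulation. The following Python sketch is illustrative only: the function name, the value ranges, and the distribution of D are assumptions invented for the example, not features of any survey. It shows that each simulated respondent reveals exactly one of the two potential attitudes, so the individual difference cannot be computed from the observed columns alone.

```python
import random

def simulate_split_ballot(n, seed=0):
    """Simulate n respondents with two potential attitudes and a random target.

    All numbers are invented for illustration; nothing here comes from ESS data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y0 = rng.randint(20, 35)        # Y(0): attitude toward the female target
        d = rng.choice([-2, 0, 0, 2])   # individual double standard D (hypothetical)
        y1 = y0 + d                     # Y(1): attitude toward the male target
        t = rng.randint(0, 1)           # T: 1 = male question, 0 = female question
        y = t * y1 + (1 - t) * y0       # observed attitude: Y = T*Y(1) + (1-T)*Y(0)
        rows.append({"t": t, "y": y, "y1": y1, "y0": y0})
    return rows

rows = simulate_split_ballot(1000)
# In a real survey only "t" and "y" are available; "y1" and "y0" are kept here
# solely to check that the observed answer matches the assigned potential attitude.
assert all(r["y"] == (r["y1"] if r["t"] == 1 else r["y0"]) for r in rows)
```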
The difference (or any other comparison) between the two potential attitudes, D = Y(1) – Y(0), is a measure of double standard. We say that an individual i holds a double standard if d_i ≠ 0. Since each respondent answers the question for either males or females, the individual double standard is not observed. I propose to impute the missing attitude using a matching approach. The idea is to use covariates X to find for each individual, m, assigned to the male target an “identical” individual, f, assigned to the female target. Then the estimated individual double standard is the difference between the observed answer of individual m and the answer of the matched individual f:
$$\hat{d}_m = y_m - y_f.$$
To find such matches, I use coarsened exact matching (CEM; Iacus et al. 2011). The CEM algorithm involves three steps: (1) coarsen each control variable in X as much as the researcher is willing, for the purposes of matching; (2) sort all units into strata, each of which has the same values of the coarsened X; (3) discard from the data set the units in any stratum that does not include at least one unit of both targets. Coarsening means that continuous covariates are transformed into categorical variables and that categories of qualitative variables are possibly grouped. Note that the quality of matching depends critically on the choice of covariates, which must be selected on the basis of theoretical arguments.
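The three steps, followed by the imputation of individual double standards, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the article: the function names, the cutpoints, the toy records, and in particular the rule of imputing the missing answer with the stratum mean of the other target are all choices made for the example.

```python
from collections import defaultdict

def coarsen(value, cutpoints):
    """Step 1: return the index of the interval that contains value."""
    return sum(value > c for c in cutpoints)

def cem_strata(units, cutpoints):
    """Steps 2 and 3: group units by coarsened covariates, drop one-sided strata.

    units: list of dicts with keys "t" (target), "y" (answer), and covariates.
    cutpoints: dict mapping covariate name -> sorted list of cutpoints;
               an empty list leaves a categorical covariate as it is.
    """
    strata = defaultdict(list)
    for u in units:
        key = tuple(
            coarsen(u[name], cuts) if cuts else u[name]
            for name, cuts in sorted(cutpoints.items())
        )
        strata[key].append(u)
    # Keep only strata that contain respondents of both targets.
    return {k: v for k, v in strata.items() if {u["t"] for u in v} == {0, 1}}

def impute_double_standards(strata):
    """Estimate d for each retained unit as male-target minus female-target answer.

    The missing answer is imputed with the stratum mean of the other target;
    this averaging rule is an assumption of the sketch.
    """
    out = []
    for members in strata.values():
        mean = {
            t: sum(u["y"] for u in members if u["t"] == t)
            / sum(1 for u in members if u["t"] == t)
            for t in (0, 1)
        }
        for u in members:
            d = u["y"] - mean[0] if u["t"] == 1 else mean[1] - u["y"]
            out.append({**u, "d": d})
    return out

# Hypothetical toy records (not ESS data).
units = [
    {"t": 1, "y": 30, "age": 25, "female": 1},
    {"t": 0, "y": 28, "age": 27, "female": 1},
    {"t": 0, "y": 26, "age": 29, "female": 1},
    {"t": 1, "y": 24, "age": 60, "female": 0},  # no female-target counterpart
]
strata = cem_strata(units, {"age": [35, 50, 65], "female": []})
matched = impute_double_standards(strata)
```

In this toy example the last record is discarded because its stratum contains no respondent assigned to the female target, and the three retained records receive an estimated double standard.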
Once we have estimated the variable D for each subject, we can directly use this new variable in a multivariate analysis. It is worth noting that my approach based on pre-processing the data via CEM is similar to those used in several sociological works for the estimation of causal effects (e.g., Liao et al. 2013).
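As a minimal illustration of the second step, the sketch below regresses hypothetical estimated double standards on a single binary covariate using closed-form least squares. The data values and the function name `ols_simple` are assumptions made for the example; an actual analysis would include several covariates and would typically rely on standard statistical software.

```python
from statistics import mean

def ols_simple(x, d):
    """Least-squares intercept and slope of d on a single covariate x."""
    x_bar, d_bar = mean(x), mean(d)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxd = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    slope = sxd / sxx
    return d_bar - slope * x_bar, slope

# Hypothetical inputs: a female dummy and estimated double standards.
female = [1, 1, 1, 0, 0, 0]
d_hat = [2.0, 3.0, 4.0, -1.0, 0.0, 1.0]
intercept, slope = ols_simple(female, d_hat)
```

With a single dummy regressor, the slope equals the difference between the group means of the estimated double standards, and the intercept equals the mean in the reference group.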
Alternative matching methods are available, such as propensity score matching (PSM), which is widely used in the treatment effects literature (see e.g., Arpino and Aassve 2013). The propensity score is defined as the conditional probability of being treated given the observed covariates, and it is usually estimated via logistic regression. PSM consists of matching on this single variable in order to create groups of treated and control units that have similar distributions of the covariates. As argued by Iacus et al. (2011), one of the key advantages of CEM over PSM is that by specifying a priori the maximum imbalance (the intervals of each coarsened covariate), it avoids the potentially laborious process of finding a specification of the propensity score model that guarantees a good balance of each covariate. Moreover, CEM guarantees that increasing the balance on one variable cannot increase imbalance on another, while this can happen in PSM. CEM is less sensitive to measurement error, has greater computational efficiency, and is more intuitive than PSM. However, results on the comparative performance of CEM versus PSM are not yet well established, and existing findings are mixed (see e.g., King et al. 2011; Thompson 2014).
Instead of matching, an alternative way to impute the missing target response is model-based imputation (Allison 2001). There are no specific results available on split-ballot data, but several studies in the treatment effects literature point to the superiority of matching over model-based approaches. The key advantage of matching is that extrapolation is avoided by retaining in the final data set only units that successfully found at least one match in the other target group. Model-based imputation, instead, relies on a parametric model and may be biased especially in the presence of highly unbalanced distributions of covariates across treatment groups and in the presence of spaces of nonoverlap in their distributions (see e.g., Drake 1993). See also Imbens and Rubin (2015) for a review of different methods used in causal inference and Allison (2001) for a discussion of different methods used for imputation of missing data.
In the following, I will focus on CEM as a method to impute the missing target response.
Data and Variables
To illustrate the proposed approach, I use data from the third round of the European Social Survey (ESS-3) conducted in 2006 in 25 European countries.
A key innovation of the ESS-3 is that it includes a module with a split-ballot design (named “Timing of the life course”), which contains questions on individuals’ attitudes toward life-course choices. A set of ESS-3 questions useful to study double standards relates to the transition to adulthood. Specifically, the questions pertain to individuals’ perceptions of when life course events should or should not occur. I focus in particular on what an individual believes is the appropriate age for leaving home. To be specific, the question is posed as follows: “After what age would you say a man [woman] is generally too old to still be living with her or his parents?” One problem with this variable is that respondents are allowed to answer, “one is never too old to live with his parents,” and the percentage of “never” responses is quite high in some countries. Ignoring the “never” responses or including them in the category representing high values of the appropriate age for leaving home leads to similar results (for such an analysis, see Aassve, Arpino, and Billari 2013).
Results
In this section, I apply the matching pre-processing approach discussed previously to the question related to the appropriate age for leaving home. For illustrative purposes, the analysis is limited to the United Kingdom, which had one of the largest sample sizes (N = 1,588) among the ESS-3 countries.
I tested for evidence of self-selection for those answering “one is never too old to live with her or his parents” by using a Heckman selection model. I did not find evidence for self-selection in the UK sample. Therefore, I ignored the never category and considered the variable as numerical.
I apply CEM to find strata of individuals with similar characteristics using the R package cem (Iacus, King, and Porro 2009). I selected covariates thought to influence attitudes about the age at leaving the parental home: respondent’s gender (1 = female, 0 = male), age, education (1 = primary or less, 2 = secondary, 3 = postsecondary), and religiosity (0 = not at all religious to 10 = very religious). The numerical covariates (age and religiosity) were coarsened into four equal-frequency intervals. Then all units were assigned to one of the strata formed by combining the values of the considered variables. Finally, the algorithm discarded from the data set the units in any stratum that did not include at least one individual assigned to each of the male and female versions of the questionnaire. Given that the gender of the target was randomly assigned, I had very few unmatched units (5 with T = 1 and 14 with T = 0). The final sample size was therefore 1,569. Again, given randomization of T, I do not expect strong imbalance in the covariate distributions between the two groups. In fact, the L1 index, which is a measure of imbalance ranging from 0 (maximum balance) to 1 (maximum imbalance) (Iacus et al. 2011), is very small (<.05) for all the covariates. However, the multivariate L1 calculated for some combinations of covariates (e.g., female and education) showed higher values (.21).
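For intuition about the L1 index, the following Python sketch computes it as half the sum of absolute differences between the relative cell frequencies of the two target groups. The function name and the toy inputs are assumptions made for illustration, and the binning is taken as given here, whereas dedicated software chooses the bins itself.

```python
from collections import Counter

def l1_imbalance(cells_t1, cells_t0):
    """Half the sum of absolute differences between relative cell frequencies.

    Each argument lists the (coarsened) cell of every unit in one target group;
    a cell can be a single category or a tuple of categories (multivariate case).
    """
    f = Counter(cells_t1)
    g = Counter(cells_t0)
    n1, n0 = len(cells_t1), len(cells_t0)
    # Counter returns 0 for cells that are absent from one of the groups.
    return 0.5 * sum(abs(f[c] / n1 - g[c] / n0) for c in set(f) | set(g))

# Hypothetical education categories for the two target groups.
balanced = l1_imbalance([1, 2, 3, 3], [3, 1, 3, 2])
separated = l1_imbalance([1, 1], [3, 3])
```

Identical distributions give 0 and completely non-overlapping distributions give 1; passing tuples such as `(female, education)` as cells yields the multivariate version.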
The estimated difference between the average age evaluation for the two targets was equal to .30 and was not statistically significant (95 percent confidence interval of –.41 to 1.01); this indicates no evidence of a double standard in the United Kingdom.
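As a back-of-the-envelope check, and assuming that the reported interval is a symmetric normal-approximation interval (an assumption, since the way the interval was computed is not stated), the implied standard error and test statistic are:

```latex
\[
\widehat{SE} \approx \frac{1.01 - (-0.41)}{2 \times 1.96} = \frac{1.42}{3.92} \approx 0.36,
\qquad
z \approx \frac{0.30}{0.36} \approx 0.83 < 1.96 .
\]
```

Under this assumption the estimate lies well within two standard errors of zero, which is consistent with the interval containing zero.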
The fact that overall there is no evidence of a double standard does not imply that nobody in the UK population holds a double standard. In causal inference studies, researchers often explore heterogeneity of treatment effects to assess whether they vary according to individual characteristics (see e.g., Xie, Brand, and Jann 2012). This can give important insights into the variability of treatment effects around the mean.
In the context of my application, it is possible that the small average difference between gender targets found in the United Kingdom could mask considerable heterogeneity. In other words, it may be that some subgroups do hold a double standard while others do not. Different approaches can be followed to define such groups. Groups can be specified a priori on the basis of theoretical arguments, and analyses can be stratified according to one or more variables. CEM could be implemented separately for men and women if we hypothesize, for example, that the likelihood of holding a double standard is higher for men. Alternatively, and this is the way I proceed in the analysis shown in Figure 1, an exploratory approach can be used. CEM offers a natural way to investigate subgroup heterogeneity by analyzing double standards in each of the strata created by combining all coarsened covariates. It is worth noting that this subgroup analysis is intended to be an exploratory step to suggest the existence of individual factors that may influence double standards, which could then be further tested with additional analyses or future studies. This analysis can be implemented irrespective of the result of the analysis on the pooled sample. For example, in the case of an overall significant positive effect, it could still be interesting to explore whether some subgroups hold a stronger, weaker, or null double standard.

Figure 1. Strata-Level Estimates of Double Standards toward Age at Leaving Home Ordered by Magnitude and Classified by Sign and Statistical Significance
Figure 1 shows double standard estimates by stratum, sorted by magnitude and classified into three groups: negative (solid circle), not significantly different from zero (X), or positive (hollow circle). Negative estimates indicate that the average declared age deadline for women is higher than for men, and vice versa for positive estimates. The number of non-empty strata was 62, and the number of respondents per stratum ranged from 5 to 69, so some stratum-specific estimates are quite unreliable. The sample size of each stratum is automatically determined by the algorithm according to the way covariates have been coarsened. This guarantees the level of balance in the distribution of covariates between treated and control units fixed a priori by the investigator, but it may imply, as in my application, that some strata have small sample sizes. To increase strata sample sizes, coarser intervals should be imposed on the covariates (Iacus et al. 2011). However, since the primary goal of matching is to obtain a good balance of covariates, this is not advisable.
An alternative approach would be to require the same sample size in each stratum (k2k in the CEM package). In this case, weights would not be used in the following analyses, but more units would be discarded. Discarding units is, in general, problematic because it may imply changing the estimand, namely, the average difference between the two groups under study would strictly refer only to the subpopulation of units that hold characteristics represented in the matched data set. In other words, discarding units with specific characteristics may compromise the external validity of the study. This is not a problem in my study where only 19 units (out of 1,588) were discarded.
From Figure 1, we can see that there is evidence of considerable heterogeneity in the UK sample: several strata show positive and significant estimates, while others show negative and significant estimates. The fact that overall there is no significant difference between the two targets is due to compensation between these two groups, which contain about 30 percent and 24 percent of the matched units, respectively. The remaining 46 percent belong to the “not significant” group, indicating that a large part of the UK population does not hold a double standard with respect to age deadlines related to leaving home.
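The classification of strata into negative, not significant, and positive estimates can be sketched as follows. This Python example is an illustration under stated assumptions: the function name `stratum_estimate`, the unequal-variance normal approximation, and the 1.96 cutoff are choices made here, and the test actually used for Figure 1 is not stated in the text. The answers in the usage lines are invented.

```python
import math
from statistics import mean, variance

def stratum_estimate(y_male, y_female, z=1.96):
    """Difference in mean answers (male minus female target) with a rough test.

    Returns (estimate, label). The normal approximation is an assumption of
    this sketch and is unreliable in small strata. Each list needs at least
    two answers so that a sample variance can be computed.
    """
    est = mean(y_male) - mean(y_female)
    se = math.sqrt(
        variance(y_male) / len(y_male) + variance(y_female) / len(y_female)
    )
    if abs(est) <= z * se:
        return est, "not significant"
    return est, "positive" if est > 0 else "negative"

# Hypothetical answers within three strata.
print(stratum_estimate([30, 31, 32, 30], [24, 25, 26, 25]))
print(stratum_estimate([25, 30, 20, 28], [26, 29, 21, 27]))
print(stratum_estimate([24, 25, 26, 25], [30, 31, 32, 30]))
```

The shares of matched units falling into each label could then be tabulated by weighting each stratum by its number of respondents.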
Having found an indication of heterogeneity in the population, it is important to characterize the different groups. Table 1 reports descriptive statistics for the three groups previously identified. Among those who tend to accept men staying at home longer, there is a prevalence of young females with low education. In contrast, among those who are more “tolerant” toward women, we notice higher age and education. Religiosity does not seem to differ much across groups. Without the matching pre-processing step, this analysis would not have been possible.
Table 1. Covariate Profile of Three Groups with Different Double Standards toward Age at Leaving Home
Note: Age and religiosity categories are binary variables roughly corresponding to inter-quartile ranges; education categories correspond to primary or less, secondary, or postsecondary education levels.
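To assess the statistical significance of these heterogeneities of double standards, I implement a regression analysis where the outcome is the estimated individual double standard from the matching procedure described previously. Formally, I estimate a model of the form:
$$\hat{d}_i = \beta_0 + \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i,$$
where $\hat{d}_i$ is the estimated double standard of individual i, $\mathbf{x}_i$ is the vector of covariates, and $\varepsilon_i$ is an error term.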
Table 2 reports the estimates of Models 1 and 2. Standard errors are clustered at the CEM strata level, and weights proportional to strata sizes are applied. In Model 1, all the covariates but religiosity are significantly associated with double standards. Women tend to be more “tolerant” on the age at leaving home for men than for women (positive double standard), especially if they are young, as indicated by the negative interaction between the variables female and age in Model 2.
Table 2. Double Standards toward Age at Leaving Home: Regression Results after Matching Pre-Processing
Note: Standard errors are clustered at the CEM strata level. Religiosity categories are binary variables roughly corresponding to interquartile ranges; education categories correspond to primary or less, secondary, or postsecondary education levels.
*p < .10. **p < .05. ***p < .01.
As an alternative to this two-step approach, it is possible to run a regression analysis directly on the raw data (I label this the one-step approach). In this case, I estimate a regression model on the observed attitude answers and include a dummy variable for gender of target T among the covariates. Moreover, to account for heterogeneity in double standards, as Model 1 does, a full set of interactions between T and the other covariates has to be included. Formally, I estimate a model of the form (with $\mathbf{x}_i$ the vector of covariates and $u_i$ an error term):
$$Y_i = \alpha_0 + \alpha_1 T_i + \mathbf{x}_i'\boldsymbol{\gamma} + T_i\,\mathbf{x}_i'\boldsymbol{\delta} + u_i,$$
where Y represents the observed attitude. Estimates of Model 3 are reported in Table 3. First of all, standard errors are now much higher than in Model 1 of Table 2, and there are only significant differences by gender of the respondent. Moreover, Model 3 compared to Model 1 is more complex (16 instead of 8 parameters), and its interpretation is more involved. For example, the effect of gender on double standards in Model 3 is captured by the interaction Target × Female instead of being represented directly by the coefficient of the variable female as in Model 1.
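To see why the interaction terms carry the double standard in the one-step specification, consider a generic linear version of the model. The coefficient symbols below are illustrative assumptions rather than the article's own notation, with $\mathbf{x}$ denoting the vector of covariate terms.

```latex
\begin{align*}
E[Y \mid T, \mathbf{x}] &= \alpha_0 + \alpha_1 T + \mathbf{x}'\boldsymbol{\gamma} + T\,\mathbf{x}'\boldsymbol{\delta}, \\
E[Y \mid T = 1, \mathbf{x}] - E[Y \mid T = 0, \mathbf{x}] &= \alpha_1 + \mathbf{x}'\boldsymbol{\delta}.
\end{align*}
```

Under this form the main effects $\boldsymbol{\gamma}$ cancel in the difference, so the element of $\boldsymbol{\delta}$ attached to female plays the role that the coefficient of female plays in a regression of the estimated double standard on the covariates. With k covariate terms, such a model has 2(k + 1) parameters against k + 1 in the two-step regression, which agrees with the 16 versus 8 parameters reported for Models 3 and 1.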
Table 3. Double Standards toward Age at Leaving Home: Regression Results without Matching Pre-Processing
Note: Religiosity categories are binary variables roughly corresponding to interquartile ranges; education categories correspond to primary or less, secondary, or postsecondary education levels.
*p < .10. ***p < .01.
Finally, note that in order to test for interaction effects on double standards (e.g., by gender and education), as done in Model 2 of Table 2, we would also need to add three-way interactions to Model 3, making the model and its interpretation even more complicated.
In summary, introducing a matching pre-processing step simplifies the analysis of the effects of covariates on double standards and offers additional opportunities for research on double standards.
Conclusions
In this article, I discussed how to analyze double standards using data collected with a split-ballot survey. The key problem of this design is that individual double standards cannot be identified. I proposed a two-step method that simplifies the analysis of split-ballot data and offers new opportunities for research on double standards. The approach consists of a first step in which units assigned to one target are matched with units assigned to the other target to estimate individual double standards. Once this matching pre-processing of the data is completed, any statistical technique (e.g., regression model) can be applied on the new data. I illustrated the method using data on attitudes toward age at leaving home from the third round of the European Social Survey.
The proposed method simplifies regression analysis of the effect of covariates on double standards. In fact, full interactions between the indicator for the target and covariates are not necessary. Having estimated individual double standards, these can be directly modeled using only the main effects of covariates. If interaction effects have to be tested, three-way interactions that should be included in the traditional approach are substituted by two-way interactions.
The proposed matching pre-processing approach also offers the possibility to implement analyses that otherwise could not be realized with split-ballot data. For example, having obtained estimates of individual double standards, one can calculate the proportion of individuals that hold (different types of) double standards. As another example, in case of analyzing different dimensions, a cluster or factor analysis could be implemented on the estimated double standards for different items.
While the focus of this article has been on gender double standards, the proposed methodological approach can be applied more generally to evaluate double standards with reference to race, age, economic status, and so on. Moreover, the proposed two-step approach can be generally applied to pre-process split-ballot data, and so it is not restricted to analyses of double standards.
Several interesting avenues of future research can further develop the work presented in this article. I used coarsened exact matching (CEM) to impute missing target responses in a split-ballot survey, but alternative methods exist, including propensity score matching (PSM) and model-based imputation. An interesting avenue for future research is to implement simulation studies tailored to split-ballot survey data to compare the performance of CEM, PSM, and model-based imputation. Moreover, it would be interesting to compare split-ballot versus repeated measures designs (with and without counterbalancing). Again, simulation experiments could be used. Alternatively, one could use a repeated measures data set, randomly derive a constructed split-ballot data set, and then compare the results from different approaches.
One limitation of the approach proposed here is that uncertainty due to imputation of the missing response is not taken into account in the second step analyses. Multiple imputation can be applied in future studies to address this issue. Also, measurement error may bias results if people who hold and those who do not hold double standards tend to answer the survey questions in a different way.
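One standard way to propagate imputation uncertainty is to repeat the imputation several times and pool the results with Rubin's rules. The Python sketch below assumes that the second-step estimate and its squared standard error are available from each of m imputed data sets. How the repeated imputations would be generated (for example, by drawing a different match at random within each stratum) is an assumption and is not specified in this article; the function name and the numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m >= 2 point estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical coefficient estimates and variances from three imputations.
pooled_estimate, pooled_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The pooled variance exceeds the average within-imputation variance whenever the estimates differ across imputations, which is how the additional uncertainty enters the second-step standard errors.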
Qualitative, experimental, and survey-based approaches have been used in studies on double standards. Each approach has its own pros and cons (for a review, see Crawford and Popp 2003). Qualitative and experimental studies, for example, allow for a better understanding of the causal mechanisms behind the emergence of double standards but are often limited to small nonrepresentative samples (Foschi 1996). A valuable topic for future research is to conduct a meta-analysis to assess the potential role of the employed technique on the results found in double standard studies.
