The Factorial Survey

Abstract

The factorial survey is an experimental design consisting of varying situations (vignettes) that have to be judged by respondents. For more complex research questions, it quickly becomes impossible for an individual respondent to judge all vignettes. To overcome this problem, random designs are recommended most of the time, whereas quota designs are not discussed at all. First comparisons of random designs with fractional factorial and D-efficient designs are based on fictitious data; first comparisons with fractional factorial and confounded factorial designs are restricted to theoretical considerations. The aim of this contribution is to compare different designs regarding their reliability and their internal validity. The benchmark for the empirical comparison is established by the estimators from a parsimonious full factorial design, each answered by a sample of 132 students (real instead of fictitious data). Multilevel analyses confirm that, if they exist, balanced confounded factorial designs are ideal. A confounded D-efficient design, as proposed for the first time in this article, is also superior to simple random designs.

Keywords

factorial survey, reliability, internal validity, random design, quota design, multilevel analysis

Introduction

The factorial survey (vignette analysis) is an experimental design in which the researcher combines varying descriptions of persons or situations (vignettes) which will be judged by respondents from a particular point of view. If each respondent is presented with a sufficiently large number of vignettes, then it becomes possible to estimate the weight assigned by each individual to the different vignette characteristics indirectly via respondent-specific regression analysis. An advantage of such decompositional procedures consists in the fact that the respondent has to judge concrete vignette descriptions as a whole without being forced to indicate the influence of each individual vignette characteristic explicitly. Judging concrete vignettes is also much closer to real judgment in daily life than answering comparably general and most of the time rather abstract questions, as is usual for survey research (cf. also Beck and Opp 2001:304). For this reason, the factorial survey allows a respondent’s opinion to be ascertained with higher reliability and higher validity than is possible with more general single questions (cf. Alexander and Becker 1978:93). However, as the complexity of the research question increases (more vignette factors, i.e., more vignette dimensions, and/or more levels for the vignette factors), so does the number of possible vignette combinations (vignette universe). As a consequence, the number of vignettes to be judged by an individual respondent has to be restricted to an increasingly lower percentage of the whole vignette universe. If only a vignette sample can be judged by each respondent, then the question of an optimal sample becomes important.

Most introductions to vignette analysis are restricted to random designs (cf. Jasso 2006:343; Rossi 1979:179; Rossi and Anderson 1982:40-41), that is, to designs where the vignette sample (one sample for all respondents) or samples are drawn randomly from the vignette population.¹ If and as long as, for substantive reasons, multiple ratings per vignette are not required for a research question (examples are given, for instance, by Jasso 2006:343, 379-80), it is most common to randomly draw a vignette sample (vignette set) of the same set size for each participant. The vignette sets are assigned randomly to the participants. Each individual vignette set as well as the combined sample of vignette sets filled out by different respondents has, within the limits of sampling error, the same features as the fully crossed vignette universe: The vignette universe is orthogonal, that is, all main effects and interaction effects can be estimated uncorrelated, which implies that all effects can be estimated independently of all other effects (orthogonality per se includes all effects except the intercept), and the vignette universe is balanced, that is, “each level occurs equally often within each vignette factor, which means that the intercept is orthogonal to each effect” (cf. Kuhfeld 1997:2; Kuhfeld, Tobias, and Garratt 1994:545-46). Even if a single vignette set might strongly deviate from the central features (orthogonality, balance) of the fully crossed vignette universe, with an increasing number of respondents the combined sample of vignette sets nonetheless asymptotically approximates these features within the limits of a decreasing sampling error. Hence, the combined sample is a representative sample of the fully crossed vignette universe. Quota designs offer a possible alternative to random designs. 
The basic idea of quota designs is to represent, as far as possible, the central features of the fully crossed vignette universe by constructing only one or comparably few different vignette sets. As with random designs, the vignette samples are assigned randomly to the respondents. By using quota designs instead of random designs it is intended to optimize the efficiency (i.e., optimizing the precision by preserving the unbiasedness) with which the unstandardized regression coefficients of main and included interaction effects can be estimated.

A first systematic comparison between random and quota designs was carried out by Dülmer (2007), who also illustrated the expected differences by analyzing fictitious data. The comparison covered two different kinds of quota designs, namely, fractional factorial designs (cf. for instance, Gunst and Mason 1991) that had already been used by Alexander and Becker (1978) for vignette analysis and D-efficient designs (cf. Kuhfeld et al. 1994) that have been used for years in conjoint analysis.² A common property of both quota designs is that all participants have to judge the same vignette sample. Whereas fractional factorial designs are always orthogonal but not necessarily balanced, D-efficient designs try to optimize both of these features simultaneously. By somewhat relaxing the classical requirement of orthogonality, a D-efficient design can be found for every set size (which is not true for fractional factorial designs). A further quota design, first discussed for vignette analysis by Steiner and Atzmüller (2006:132-33), is the confounded factorial design. In this case, a number of different fractional factorial designs are used for data collection. Besides the confounded factorial design, the theoretical comparison by Steiner and Atzmüller (2006, cf. also Atzmüller and Steiner 2010) also includes random designs and fractional factorial designs. Their empirical analyses, which were carried out with a small respondent sample, are confined to the confounded factorial design. Hence, the behavior of the estimates (unstandardized regression coefficients and their t values) under different design conditions could not be compared empirically. A further way to generate a vignette design, up to now not discussed in the literature, consists in constructing a confounded D-efficient design. 
The aim of this procedure is to improve the efficiency for estimating unstandardized regression coefficients in comparison to random designs and simple D-efficient designs, in cases where no suitable confounded factorial design exists.³

The aim of this contribution is to outline the basic ideas behind the main variants of random and quota designs and to shed some light on the expected differences regarding their reliability and internal validity. The conclusions drawn from these considerations will be illustrated afterward on the basis of real data. The main focus of this article is on research questions to assess the impact of vignette characteristics on respondents’ answer behavior.⁴ The analyses will be carried out using a multilevel program that has become standard for analyzing hierarchical data. The use of a simple research question allowed each respondent to be presented with the completely crossed vignette universe. After the data for the completely crossed vignette universe were collected, a fixed number of reduced subsamples (fractions) of the completely crossed vignette universe were generated for each respondent, either randomly or by using a quota design. The answers for vignettes of the reduced subsamples were copied from the fully crossed vignette universe. The intended statistical comparison of random and quota designs is based on these fractions (vignettes plus corresponding respondent-specific answers) of the completely crossed vignette universe. An advantage of having the complete information from the full factorial design is that its estimators establish an empirical yardstick for evaluating the reliability and the internal validity of the designs that are included in the comparison. This is the first time that such a comparison has been carried out.

Basic Ideas, Main Variants, and Applicability of Random and of Quota Designs

Random Designs

In the literature, basically two proposals for generating simple random designs can be distinguished: In their introduction to the factorial survey, Rossi and Anderson (1982:40-41) recommend using a computer routine that generates the needed random samples from a variable list. The computer routine picks out a value (level) at random from the first vignette variable (characteristic), then a value of the second vignette variable, and so on through the last vignette variable. Each such cycle produces an additional vignette. The cycle is repeated until the number of vignettes for the specified set size has been reached. Thereafter, the whole routine has to be repeated until a vignette sample of the same set size has been produced for each respondent (simple random design with replacement). In a more recent introduction to the factorial survey, Jasso (2006:342-43) recommends drawing each vignette set randomly out of the fully crossed vignette universe, which makes it easy to draw simple random designs without replacement. Drawing without replacement might be preferable, especially for small vignette populations, since it guarantees that no respondent has to judge a vignette more than once.
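The two sampling routines described above can be sketched as follows. This is a minimal illustration, not the routine used by the cited authors: the factor names, the levels, the set size of 6, and the function names are all hypothetical.

```python
import itertools
import random

# Hypothetical example universe: three vignette factors with 2, 3, and 2 levels.
LEVELS = {
    "gender": ["female", "male"],
    "income": ["low", "medium", "high"],
    "children": ["no", "yes"],
}

def draw_with_replacement(levels, set_size, rng):
    """Pick one level per factor at random; repeat until set_size vignettes exist."""
    return [tuple(rng.choice(values) for values in levels.values())
            for _ in range(set_size)]

def draw_without_replacement(levels, set_size, rng):
    """Sample distinct vignettes from the fully crossed vignette universe."""
    universe = list(itertools.product(*levels.values()))
    return rng.sample(universe, set_size)

rng = random.Random(42)  # fixed seed only to make the sketch reproducible
# One vignette set per (hypothetical) respondent, five respondents each.
sets_with = [draw_with_replacement(LEVELS, 6, rng) for _ in range(5)]
sets_without = [draw_without_replacement(LEVELS, 6, rng) for _ in range(5)]
```

Only the second routine guarantees that a respondent never sees the same vignette twice, which matters most when the universe (here 12 vignettes) is small relative to the set size.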

The basic idea behind using simple random designs is to represent the vignette universe as accurately as possible with different vignette samples of the same set size. Each vignette set is, within the limits of sampling error, a reduced representative random sample of the complete vignette universe. Merging such random vignette samples again produces a random sample of the whole vignette universe. The combined vignette sample, however, has a much smaller sampling error than each individual vignette set (cf. Rossi and Anderson 1982:29-30). The higher the number of vignette sets, the closer the combined vignette sample approximates the central features of the fully crossed vignette universe: All main effects and interaction effects within the limits of a decreasing sampling error become asymptotically more orthogonal, and the levels within each vignette factor become asymptotically more balanced.
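The difference between a single vignette set and the combined sample can be made concrete with a small simulation. The sketch below assumes a hypothetical universe of four effect-coded dichotomous vignette variables, a set size of 8, and 200 vignette sets; the helper names are illustrative only. A constant column inside a set is scored as the worst case (correlation of 1), which is a convention chosen for this sketch.

```python
import itertools
import random

def pearson(x, y):
    """Pearson correlation; a constant column is scored as 1.0 (worst case here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def max_abs_corr(vignettes):
    """Largest absolute correlation between any two vignette variables."""
    cols = list(zip(*vignettes))
    return max(abs(pearson(a, b)) for a, b in itertools.combinations(cols, 2))

universe = list(itertools.product([-1, 1], repeat=4))  # fully crossed, 16 vignettes
rng = random.Random(1)
sets = [rng.sample(universe, 8) for _ in range(200)]   # one random set per respondent

mean_single = sum(max_abs_corr(s) for s in sets) / len(sets)  # typical single set
pooled = max_abs_corr([v for s in sets for v in s])           # combined sample
```

In runs of this kind, `pooled` should lie close to the value of the fully crossed universe (zero), while `mean_single` stays clearly above it.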

To what extent the estimators of a factorial survey can benefit from the smaller sampling error of the combined vignette sample, however, depends on the heterogeneity of the respondents’ answer behavior. Since respondents in general have to answer more than one vignette, the collected data are nested hierarchically: The answer behavior is embedded in the personal context of each individual respondent. For such hierarchically structured data, multilevel regression analysis (cf. Hox 2010; Raudenbush and Bryk 2002; Snijders and Bosker 2012) is the recommended choice. The mathematical equation system for a multilevel model with four vignette variables, no interaction effects, and completely heterogeneous answer behavior is given in the following (cf. also Hox 2010:11-13; Raudenbush and Bryk 2002:35; Snijders and Bosker 2012:74-75):

Vignette level (Level 1):

Y_{i j} = β_{0 j} + β_{1 j} X_{1 i j} + β_{2 j} X_{2 i j} + β_{3 j} X_{3 i j} + β_{4 j} X_{4 i j} + r_{i j}

Respondent level (Level 2):

β_{0 j} = γ_{00} + u_{0 j},

β_{1 j} = γ_{10} + u_{1 j},

β_{2 j} = γ_{20} + u_{2 j},

β_{3 j} = γ_{30} + u_{3 j},

β_{4 j} = γ_{40} + u_{4 j},

where

$i$ denotes a vignette,

$j$ denotes a respondent,

$Y_{ij}$ is the answer of the jth respondent about the ith vignette,

$X_{1ij}$ to $X_{4ij}$ are the vignette variables 1 to 4,

$β_{0j}$ to $β_{4j}$ denote a respondent’s unstandardized regression coefficient for the intercept ( $β_{0j}$ ) and the slopes ( $β_{1j}$ to $β_{4j}$ ) of the predictor variables at the vignette level,

$γ_{00}$ to $γ_{40}$ denote the grand mean (average unstandardized regression coefficient) for the intercept ( $γ_{00}$ ) and the slopes ( $γ_{10}$ to $γ_{40}$ ) of the predictor variables at the vignette level,

$u_{0j}$ to $u_{4j}$ denote the residuals (random terms) at the respondent level, that is, respondent-specific deviations from a respective grand mean, and

$r_{ij}$ denotes the residual at the vignette level.
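Substituting the five respondent-level equations into the vignette-level equation gives the combined form of the same model. It is written out here only as a reading aid and adds no assumptions beyond the two levels stated above. The first line contains the fixed part (grand means), the second line the random part (respondent-specific deviations and the vignette-level residual):

```latex
\begin{aligned}
Y_{ij} &= \gamma_{00} + \gamma_{10} X_{1ij} + \gamma_{20} X_{2ij} + \gamma_{30} X_{3ij} + \gamma_{40} X_{4ij} \\
       &\quad + u_{0j} + u_{1j} X_{1ij} + u_{2j} X_{2ij} + u_{3j} X_{3ij} + u_{4j} X_{4ij} + r_{ij}
\end{aligned}
```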

From the equation for the respondent level (level 2), it becomes clear that by including the u-terms in the regression equation each respondent can have his or her own regression equation, that is, an individual β for the intercept as well as for each of the four slopes. These respondent-specific βs deviate by a respondent-specific u-term from their respective grand mean (the reported “grand mean” $\hat{γ}$ is a so-called empirical Bayes estimate or posterior mean).⁵ Whether or not the estimated variance, denoted by $\hat{τ}$ , of an estimated u-term becomes significant is an empirical question (cf. also Snijders and Bosker 2012:43-44). The lower the differences between the estimators for the respondent-specific βs (i.e., the $\hat{β}$ s) and the estimator for a respective grand mean (i.e., the $\hat{γ}$ ) as captured by an estimated u-term, the lower the likelihood that the estimated variance component will become significant.⁶ Now, the fewer variance components become significant, that is, the more homogeneous the answer behavior of the respondents, the more the estimators for a simple random design can benefit from the fact that the design is becoming asymptotically more orthogonal and more balanced within the limits of an increasingly smaller sampling error (cf. also Dülmer 2007:393-94). Only in the extreme case that none of the estimated variance components becomes significant and all u-terms can be dropped (fixed), that is, in cases where the multilevel regression is reduced to a simple ordinary least squares (OLS) regression, can the estimators for the simple random design fully benefit from the fact that the design is becoming asymptotically more orthogonal and more balanced (in contrast to multilevel analyses, OLS regression is estimated exclusively on the basis of the pooled data matrix).
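The two-level structure can be illustrated by simulating data from the model and estimating a separate regression for each respondent. The sketch below is a deliberate simplification: all parameter values are hypothetical, each simulated respondent answers the fully crossed universe of four effect-coded dichotomous variables, and the grand means are approximated by the plain average of the respondent-specific OLS estimates rather than by the empirical Bayes estimates a multilevel program would report.

```python
import itertools
import random

GAMMA = [3.0, 0.8, -0.5, 0.3, 0.0]  # hypothetical grand means: intercept + 4 slopes
TAU_SD, SIGMA = 0.5, 1.0            # hypothetical SDs of the u-terms and of r_ij

# Fully crossed universe of four effect-coded dichotomous vignette variables.
universe = list(itertools.product([-1, 1], repeat=4))
rows = [(1,) + v for v in universe]  # leading 1 carries the intercept

def simulate_respondent(rng):
    """Level 2: draw respondent-specific betas. Level 1: answer all 16 vignettes."""
    beta = [g + rng.gauss(0, TAU_SD) for g in GAMMA]
    return [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0, SIGMA)
            for row in rows]

def ols_full_factorial(y):
    """For this orthogonal, balanced design OLS reduces to x_k'y / n."""
    n = len(rows)
    return [sum(row[k] * yi for row, yi in zip(rows, y)) / n for k in range(5)]

rng = random.Random(7)
beta_hats = [ols_full_factorial(simulate_respondent(rng)) for _ in range(200)]
# Simple average of respondent-specific estimates (not an empirical Bayes estimate).
gamma_hat = [sum(b[k] for b in beta_hats) / len(beta_hats) for k in range(5)]
```

The spread of the entries in `beta_hats` around `gamma_hat` mixes true heterogeneity (the u-terms) with estimation error from each respondent's own vignette set, which is why the quality of the individual sets matters once answer behavior is heterogeneous.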

At the other extreme, that is, in cases where the answer behavior of the respondents is completely heterogeneous as in equation (1), the grand mean $\hat{γ}$ (posterior mean) of the intercept as well as of all slopes is (strongly) based on the average of the respective βs estimated for each individual respondent on the basis of individual vignette sets (separate respondent-specific data matrices). Based on these considerations, it becomes clear that the statistical characteristics of each individual vignette set cannot be ignored. Randomly sampled vignette sets of relatively small sample size in particular, within the limits of the sampling error, sometimes not only show very high mutual correlations between vignette variables, but the levels of each vignette variable can also be very unbalanced. Based on such vignette sets, respondent-specific βs can only be estimated with a high standard error. In extreme cases, a vignette set might even contain a constant or a vignette variable (or an interaction term), which is a linear combination of other vignette variables. Such unsuitable vignette sets have to be excluded before the survey is carried out since otherwise it would no longer be at all possible to estimate all theoretically relevant βs for each individual respondent. Based on such considerations, it would seem to be desirable to search for designs that are restricted to vignette sets that try to optimize the precision (i.e., minimize the standard error) with which respondent-specific βs can be estimated. Quota designs offer such an alternative, compared with simple random designs.
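Screening out unsuitable vignette sets before fielding can be sketched as a rank check of each set's design matrix. This is one possible implementation under stated assumptions: effect-coded variables, a model with intercept and main effects only, and hypothetical helper names. A set fails if any column is constant or a linear combination of the others.

```python
from fractions import Fraction
import itertools

def rank(matrix):
    """Rank of a matrix via exact Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_estimable(vignette_set):
    """True if the intercept and all main effects can be estimated from this set."""
    design = [(1,) + tuple(v) for v in vignette_set]
    return rank(design) == len(design[0])

universe = list(itertools.product([-1, 1], repeat=4))
# Half fraction with A*B*C*D == +1: intercept and main effects remain separable.
good_set = [v for v in universe if v[0] * v[1] * v[2] * v[3] == 1]
# A set in which the first vignette variable never varies (a constant).
bad_set = [v for v in universe if v[0] == 1][:6]
```

Applied to randomly drawn sets, a filter such as `[s for s in candidate_sets if is_estimable(s)]` would keep only sets from which every respondent-specific β in the assumed model can be estimated. It does not guard against sets that are estimable but very imprecise.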

Quota Designs

The basic idea behind quota designs is to represent the whole vignette universe with only one or relatively few different vignette samples which cover the central features of the vignette universe as accurately as possible. For this purpose, a complete knowledge of the statistical properties of the vignette universe is needed. In general, quota designs can be divided into the classical fractional factorial designs and the less well-known D-efficient designs (cf. Dülmer 2007:386). Both approaches are restricted to using the same vignette set for each participant. Fractional factorial designs are constructed by aliasing (cf. Alexander and Becker 1978:96-97; Gunst and Mason 1991:48; Steiner and Atzmüller 2006:126), that is, confounding main effects with higher order interaction effects within one vignette set. Since a perfectly aliased higher order interaction effect can no longer be separated statistically from the main effect (the correlation between the vignette variable and the interaction term is 1 or −1), one has to assume that the aliased interaction effect is negligible. If this is not the case, then the estimator for the main effect will be systematically biased. One of the main differences between fractional factorial design and D-efficient designs consists in the fact that by somewhat relaxing the classical requirement of orthogonality of vignette factors (including interaction terms that are assumed to have a nonnegligible impact), a D-efficient design can be found for every set size. Relaxing the requirement for orthogonality frequently allows the balance of a design to be improved, an aspect that is sometimes sacrificed by fractional factorial designs in order to preserve orthogonality (mutual uncorrelatedness between vignette variables of different vignette factors including nonnegligible interaction terms). Balance and orthogonality, however, are both important for optimizing the precision with which βs can be estimated (cf. Kuhfeld et al. 1994:545).

Further possibilities arise if different quota designs are combined. A confounded factorial design (cf. Atzmüller and Steiner 2010:132; Kirk 1995:587-664; Steiner and Atzmüller 2006:132-33) is a design where different fractional factorial designs are used for a survey. An advantage of using a confounded factorial design over a simple fractional factorial design is that by using more than one fractional factorial design, confounding interaction effects with set effects becomes possible. A higher order interaction term that has no variation within a vignette set of a fractional factorial design is perfectly aliased with a set effect. A set effect is the specific influence which can be caused by the context of the vignette set as a whole. The common effect of the higher order interaction effect and the set effect is captured by the intercept of a regression model estimated for the fractional factorial design. If the higher order interaction effect, for instance, of dichotomous effect-coded vignette variables is a positive constant (i.e., 1) within one fractional factorial design and a negative constant (i.e., −1) within another fractional factorial design, then including a set variable allows the common effect of the higher order interaction effect and the set effect to be captured across the two fractional factorial designs. Disentangling both effects statistically, however, is impossible because in this case the higher order interaction effect is perfectly confounded with the set effect. 
Furthermore, if a main effect of a vignette variable is perfectly positively aliased with an interaction effect between different vignette variables within one fractional factorial design, and perfectly negatively aliased with the same interaction effect within another fractional factorial design, then the common impact of the perfectly confounded interaction effect and the set effect can be captured by estimating a two-way interaction effect between the respective vignette variable on the one hand and the set variable on the other hand. By controlling for the set variable and the two-way interaction term between the vignette variable and the set variable, the estimated intercept, as well as the estimated main effect of the vignette variable are protected against potential systematic bias that otherwise might result from a chosen design. Finally, one might also opt for using a confounded D-efficient design. Choosing a simple D-efficient design over a simple random design is done with the intention to optimize the precision (by preserving the unbiasedness) with which respondent-specific βs of the vignette variables (including nonnegligible interaction effects) can be estimated. Combining the vignette sets of a simple D-efficient design, in contrast to a simple random design however, neither improves the balance of the levels of each vignette variable nor does it reduce mutual correlations between vignette variables (cf. also Dülmer 2007:393). This weakness of simple D-efficient designs can be overcome by using a confounded D-efficient design, which consists of different simple D-efficient designs of the same D-efficiency. Existing imbalances as well as correlations among vignette variables (including nonnegligible interaction terms) are leveled off for the confounded D-efficient design across different vignette sets. 
In this way, the combined confounded D-efficient design covers the central features of the fully crossed vignette universe better than the simple D-efficient design: The D-efficiency of the combined confounded D-efficient design is higher than the D-efficiency of a comparable, simple D-efficient design and it is at least as D-efficient as a combined simple random design. To what extent the estimators of a confounded D-efficient design can be improved by the comparably higher D-efficiency of the combined design, however, depends again on the heterogeneity of the respondents’ answer behavior: The higher the differences between respondent-specific $\hat{β}$ s, that is, the more heterogeneous the answer behavior of the respondents (the more estimated u-terms become significant), the less the estimators of a confounded D-efficient design benefit from the fact that the combined confounded D-efficient design better covers the central features of the fully crossed vignette universe than a simple D-efficient design. Heterogeneous answer behavior, on the other hand, affects the estimators of the simple random design most, since in this case the individual vignette set has on average the lowest D-efficiency of all designs.

Fractional factorial designs

Vignette universes can be symmetrical or asymmetrical (cf. Addelman 1962:21): Symmetrical vignette universes involve factors which all occur with the same number of levels (for instance, three factors each having two levels), and asymmetrical vignette universes include factors with different numbers of levels (for instance, two factors with two levels and one factor with three levels). A central property of all vignette universes, independent of whether they are symmetrical or asymmetrical, is that each level of a vignette variable (factor) occurs within the fully crossed vignette universe equally frequently (balanced) and that the variables of different vignette characteristics are mutually uncorrelated (orthogonal). For symmetrical vignette universes, fractional factorial designs can sometimes be found that fulfill both criteria (balanced orthogonal designs; cf. Addelman 1962:23) within the limits of a reasonable set size. In other cases, one might also opt for a design with an unbalanced number of variable levels. A necessary and sufficient condition for orthogonality within a chosen fractional factorial design is that the levels of one vignette variable occur with each level of the other variables with proportional frequency (unbalanced orthogonal designs; cf. Addelman 1962:23). All fractional factorial designs guarantee that at least the main effects of each vignette variable can be estimated mutually uncorrelated for each individual respondent.
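The proportional-frequency condition can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the example design is built by collapsing levels of a nine-run orthogonal array, which is one way to obtain an unbalanced orthogonal design for an asymmetrical universe (here two factors with two levels and one factor with three levels).

```python
from collections import Counter
from fractions import Fraction
import itertools

def proportional_frequencies(design):
    """Check n_ab == n_a * n_b / N for every level pair of every two factors."""
    n = len(design)
    cols = list(zip(*design))
    for x, y in itertools.combinations(cols, 2):
        cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
        for a in cx:
            for b in cy:
                if Fraction(cx[a] * cy[b], n) != cxy[(a, b)]:
                    return False
    return True

def is_balanced(design):
    """True if every level occurs equally often within each factor."""
    return all(len(set(Counter(col).values())) == 1 for col in zip(*design))

# Nine-run array for three 3-level factors; columns 1 and 3 are collapsed to
# two levels (level 2 is mapped onto level 0), giving a 2 x 3 x 2 design.
collapse = {0: 0, 1: 1, 2: 0}
design = [(collapse[a], b, collapse[(a + b) % 3])
          for a in range(3) for b in range(3)]
```

For this example the condition holds although the two collapsed factors are unbalanced (level 0 occurs six times, level 1 three times), so the main effects remain mutually uncorrelated. Dropping a single vignette from the set breaks the condition.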

Sometimes it is possible to generate fractional factorial designs where at least each main effect is completely aliased (cf. Alexander and Becker 1978:96; Atzmüller and Steiner 2010:131; Gunst and Mason 1991:48; Steiner and Atzmüller 2006:126) with a higher order interaction effect (no aliasing of main effects, for instance, with two-way interaction effects). Among such designs are primarily designs where the number of levels is the same for all vignette variables. The main effects of other fractional factorial designs are frequently only partially aliased with higher order interactions. This means at the same time that they are aliased with several instead of only one higher order interaction. Most frequently affected are designs with a different number of levels per vignette variable (mixed fractional factorial designs, cf. Lawson 2002:228; Ryan 2007:264, 274-75).

Complete aliasing means that, because the affected variables are mutually perfectly correlated, their respective effect can no longer be separated statistically. An estimated main effect therefore also captures the whole influence of the interaction effect with which the main effect is perfectly aliased. A partially aliased main effect, corresponding to the correlation between the vignette variable and a respective interaction term, captures only a part of the influence of the affected interaction effect. This, however, holds for all interaction effects with which a main effect is partially aliased. To retain the interpretability of an estimated main effect, aliasing is only permissible under the condition that the aliased interaction effects are at least negligible relative to the affected main effect (cf. Gunst and Mason 1991:41-42).
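Complete aliasing can be demonstrated with the smallest possible case. The sketch assumes three effect-coded dichotomous vignette variables A, B, and C and a half fraction that keeps the vignettes with A·B·C = +1; the effect sizes 2.0 and 1.5 are hypothetical and the answers are noise-free to keep the arithmetic transparent.

```python
import itertools

def pearson(x, y):
    """Plain Pearson correlation for two non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

universe = list(itertools.product([-1, 1], repeat=3))
# Half fraction defined by keeping the vignettes with A*B*C == +1.
fraction = [v for v in universe if v[0] * v[1] * v[2] == 1]

def a_and_bc(vignettes):
    """Column of A and column of the B x C interaction term."""
    return [v[0] for v in vignettes], [v[1] * v[2] for v in vignettes]

r_fraction = pearson(*a_and_bc(fraction))  # complete aliasing within the fraction
r_universe = pearson(*a_and_bc(universe))  # no aliasing in the full universe

# Hypothetical answers: true main effect of A = 2.0, true B x C effect = 1.5.
y = [2.0 * v[0] + 1.5 * v[1] * v[2] for v in fraction]
# Main effects are orthogonal within the fraction, so OLS reduces to x'y / n.
estimated_main_effect_a = sum(v[0] * yi for v, yi in zip(fraction, y)) / len(fraction)
```

Within the fraction the correlation between A and B×C is 1, and the estimated main effect of A equals 3.5, the sum of both true effects. The estimate is interpretable as a main effect only if the aliased interaction is negligible.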

In order to choose between different fractional factorial designs of the same set size, knowledge of a design’s resolution might be helpful: For Resolution III designs only main effects can be estimated uncorrelated, for Resolution IV designs main effects are also uncorrelated with first-order interaction effects (i.e., two-way interactions between different vignette variables), whereby some two-way interactions are aliased with each other. A Resolution V design allows estimating all main effects as well as all first-order interaction effects mutually uncorrelated (cf. Kuhfeld 2010:58; McLean and Anderson 1984:40; Ryan 2007:169-70). If for a given set size fractional factorial designs of different resolutions are available, then the higher resolution design should be preferred in cases where first-order interaction effects cannot be excluded on the basis of a priori knowledge. Interaction effects of higher order will be found very rarely in social science (Kirk 1995:627; Louviere 1988:40) and in practice are mostly seen as negligible.

Constructing balanced fractional factorial designs for symmetrical vignette universes can be done on the basis of modular arithmetic (division with integer remainder, whereby the remainder determines the assignment to a vignette set; cf. for instance, Kirk 1995:590-94; McLean and Anderson 1984) applied to equation systems for the vignette variables. How to produce an unbalanced fractional factorial design for asymmetrical vignette universes is described by Addelman (1962, cf. also Backhaus, Erichson, Plinke, and Weiber 2000:575-76). However, fractional factorial designs are usually generated in practice via computer programs like SPSS (“orthogonal design”) or SAS (PROC FACTEX, Macro %MktOrth or %MktEx), or via ready-made construction plans to be found in the literature (for instance, Gunst and Mason 1991, whose construction plans also inform about the interaction effects that can be estimated by using a particular design).⁷
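The modular-arithmetic construction can be sketched for a small symmetrical universe. The example assumes three vignette factors with three levels each (coded 0, 1, 2) and uses the remainder of the level sum divided by 3 as the defining equation; this particular equation is chosen for illustration and is not taken from the cited sources.

```python
import itertools
from collections import Counter

universe = list(itertools.product(range(3), repeat=3))  # 27 vignettes

# The remainder of (a + b + c) / 3 assigns each vignette to one of three sets.
sets = {k: [v for v in universe if sum(v) % 3 == k] for k in range(3)}

def level_counts(vignettes):
    """How often each level occurs, per factor."""
    return [Counter(col) for col in zip(*vignettes)]

def pair_counts(vignettes):
    """How often each level combination occurs, per pair of factors."""
    cols = list(zip(*vignettes))
    return [Counter(zip(x, y)) for x, y in itertools.combinations(cols, 2)]
```

Each of the three resulting sets contains nine vignettes, every level occurs three times per factor (balance), and every level combination of any two factors occurs exactly once, so the main effects are mutually uncorrelated within each set.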

Confounded factorial designs

A confounded factorial design (cf. Atzmüller and Steiner 2010:132; Kirk 1995:587-664; Steiner and Atzmüller 2006:132-33) can be constructed by dividing the whole vignette universe into distinct fractional factorial designs of the same set size (cf. also Gunst and Mason 1991:42-44, 49-50).⁸ The resulting fractional factorial designs each have to be replicated by the same factor. Thereafter, the sets have to be assigned randomly to the respondents. Whereas aliasing is concerned with effects within a fractional factorial design, confounding interaction effects with set effects is done across different fractional factorial designs (each distinct fractional factorial design is called a set). A higher order interaction effect which cannot be estimated within a fractional factorial design now becomes perfectly confounded across distinct fractional factorial designs with a set effect. Since perfect confounding means that both effects cannot be statistically separated, only those higher order interaction effects that are assumed to be negligible should be confounded with a set effect. Including the set variable in the multilevel model, however, allows the common effect of the confounded higher order interaction term and the set variable, with which the higher order interaction effect is perfectly confounded, to be estimated. Furthermore, if, and as long as, the confounded higher order interaction effect not anticipated by the researcher is at least plausible, and the researcher has no plausible theoretical explanation for a set effect, he or she has good reasons to assume that the set effect is indeed negligible compared to the confounded interaction effect. In this case, the common effect of the confounded interaction effect and the set effect as captured by the set variable gives at least a rough hint about the magnitude of the confounded interaction effect. This information is useful for improving the design of further factorial surveys. 
If, on the other hand, the confounded interaction effect is not plausible and the researcher finds a good substantive argument for a set effect, the latter interpretation might be chosen as plausible explanation. Finally, if no plausible explanation can be found, the researcher should leave the question open for further research. Another limitation of confounding an interaction effect between different vignette variables with a set effect across distinct fractional factorial designs is that the set variable has no variability within a vignette set. Therefore, even confounding does not allow the affected respondent-specific interaction effects to be estimated (even if the set variable has no, or only a negligible effect). Instead, including such set variables (and interaction terms with set variables) serves the purpose of estimating all theoretically relevant main and interaction effects of interest without any aliasing (cf. Steiner and Atzmüller 2006:126).
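The relation between aliasing within a set and confounding across sets can be traced in a minimal example. The sketch assumes three effect-coded dichotomous vignette variables A, B, and C whose universe is split into two sets by the sign of A·B·C; the variable names are illustrative.

```python
import itertools

universe = list(itertools.product([-1, 1], repeat=3))  # effect-coded A, B, C

# Set variable coded -1 / +1: the sign of A*B*C decides the set membership.
set_var = {v: v[0] * v[1] * v[2] for v in universe}
sets = {s: [v for v in universe if set_var[v] == s] for s in (-1, 1)}

# Within each set the three-way interaction takes a single value (no variation),
# so across sets it is perfectly confounded with the set variable.
abc_within = {s: {v[0] * v[1] * v[2] for v in vs} for s, vs in sets.items()}

# A is aliased with B x C positively in one set and negatively in the other,
# so the product of A and the set variable reproduces the B x C term exactly.
a_times_set = [v[0] * set_var[v] for v in universe]
b_times_c = [v[1] * v[2] for v in universe]
```

Because `a_times_set` and `b_times_c` are identical, a model that includes the set variable and the A × set interaction absorbs the three-way and the B × C effects in those terms instead of letting them bias the intercept or the main effect of A. The set variable itself never varies within a respondent's set, which is why the confounded interaction cannot be estimated at the respondent level.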

For asymmetrical vignette universes especially, where usually no ready-made designs exist (cf. Atzmüller and Steiner 2010:134), it quickly becomes extremely difficult or even impossible to find a suitable confounded fractional factorial design where the set size is not too large to be judged by respondents. As a consequence, partial confounding of interaction effects with set effects is sometimes unavoidable (cf. Atzmüller and Steiner 2010:133-34).⁹ As with constructing confounded factorial designs, constructing partially confounded factorial designs always requires that theoretically relevant main and interaction effects will be confounded/partially confounded only with those effects that are assumed to be negligible (cf. Steiner and Atzmüller 2006:125, 132-33).

For constructing relatively simple confounded fractional designs, Kirk (1995:587-659) is recommended reading. The designs found there, however, are not always suitable for vignette analysis, because some of them have too few degrees of freedom for estimating all main effects, including the intercept, via respondent-specific regression analysis. For analyzing factorial surveys via multilevel analysis, this is necessary in order to test whether all respondent-specific $\hat{β}$s deviate by a respondent-specific $\hat{u}$-term from their respective grand mean $\hat{γ}$ (a further degree of freedom is required for estimating the variance of the residuals $r_{ij}$ at the vignette level). Frequently, the only way to construct a complex design is to systematically modify an existing but unsatisfactory design (cf. Atzmüller and Steiner 2010:134; Steiner and Atzmüller 2006:137). This task, however, is in general demanding and time-consuming.

D-efficient designs

The use of the quota designs discussed so far is restricted by mathematical rules of divisibility applied to the number of levels per vignette variable. Sometimes no fractional factorial design exists for a reasonable maximum number of vignettes per respondent. By relaxing the classical requirement of perfect orthogonality, a D-efficient design can be generated for every set size.

The reason for modifying the classical criteria for constructing quota designs is, according to Kuhfeld et al. (1994:545), that orthogonality is only a secondary goal that has to be subordinated to the primary goal of minimizing the standard error of the parameter estimates. Since balanced orthogonal designs allow all effects (intercept, main effects, and nonnegligible interaction effects) to be estimated uncorrelated, they represent the vignette universe most adequately. With such designs as reference, D-efficiency is a standardized measure that takes into account orthogonality and balance. The formula for D-efficiency (cf. also Kuhfeld et al. 1994:547) is given by:

$$\text{D-efficiency} = 100 \cdot \frac{1}{N_{D} \cdot |(X^{'} \cdot X)^{-1}|^{\frac{1}{p}}} = 100 \cdot \left(\frac{1}{N_{D}} \cdot |X^{'} \cdot X|^{\frac{1}{p}}\right), \tag{2}$$

where N_D denotes a design’s set size, $X^{'} \cdot X$ denotes the information matrix of the vignette variables including the intercept, and p denotes the number of βs including the intercept that will be estimated. If all vignette variables are standardized orthogonally contrast coded (cf. Kuhfeld 2010:74), then D-efficiency is in the range from 0 to 100 (cf. Kuhfeld et al. 1994:547, 549). Every deviation from balance or from orthogonality produces values lower than 100. Classical fractional factorial designs sacrifice balance to preserve orthogonality in cases where no balanced orthogonal design exists. Under the same conditions, search algorithms for D-efficient designs try to find an optimally efficient solution between perfect orthogonality and balance. For this reason, D-efficient designs most of the time deviate at least slightly from orthogonality.
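As a minimal illustrative sketch (not part of the original analysis), the formula above can be evaluated numerically. The function names `d_efficiency` and `design_matrix` are hypothetical, and the sketch assumes two-level vignette variables coded -1/+1, the coding under which a balanced orthogonal design reaches exactly 100. The six-vignette subset is an arbitrary choice made only to show a value below 100.

```python
import itertools
import numpy as np

def design_matrix(levels):
    """Prepend an intercept column to -1/+1 coded vignette variables."""
    levels = np.asarray(levels, dtype=float)
    return np.column_stack([np.ones(len(levels)), levels])

def d_efficiency(X):
    """100 * |X'X|^(1/p) / N_D for a coded design matrix X with intercept."""
    n_d, p = X.shape
    det = max(np.linalg.det(X.T @ X), 0.0)  # guard against tiny negative round-off
    return 100.0 * det ** (1.0 / p) / n_d

# Fully crossed universe of four two-level variables (balanced and orthogonal).
universe = np.array(list(itertools.product([-1, 1], repeat=4)))
full_efficiency = d_efficiency(design_matrix(universe))

# Arbitrary, unbalanced and correlated subset of six vignettes (assumption).
subset = universe[[0, 3, 5, 6, 9, 15]]
subset_efficiency = d_efficiency(design_matrix(subset))

print(round(full_efficiency, 2), round(subset_efficiency, 2))
```

Under these assumptions the full universe attains the maximum, whereas the unbalanced subset falls below it, in line with the statement that every deviation from balance or orthogonality lowers D-efficiency.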

D-efficiency measures the goodness of a selected design relative to a balanced orthogonal design. For designs where only qualitative variables are included, a D-efficiency of 100 at least provides a rough reference for the generated design, even if such a design may be far from being possible for a given research question. If quantitative variables that are not standardized orthogonally contrast coded have to be included, then D-efficiency is no longer restricted to a maximum of 100. A design comparison, however, is possible as long as the same coding is used (cf. Kuhfeld et al. 1994:548-49). Since increasing the set size does not necessarily result in a higher or at least equal D-efficiency (i.e., D-efficiency is not a monotonically increasing function of the set size), it is recommended to compare designs with different set sizes (cf. Dülmer 2007:394).

Generating D-efficient designs requires computer programs like SAS (ADX “Optimal Design,” PROC OPTEX, or the Macro %MktEx) or the conjoint value analysis (CVA) module of Sawtooth Software.¹⁰ Since nonexhaustive search algorithms are used, a computer program may fail to find the optimal design, even if the search algorithm is carried out several times (cf. Kuhfeld et al. 1994:547; Sawtooth Software 1997–2002:7-10). Repeated searches are therefore recommended, especially for complex designs. D-efficiency will be computed automatically by the programs mentioned above.

Confounded D-efficient designs

Constructing confounded D-efficient designs is mainly motivated by the intention to improve the precision (i.e., to minimize the standard error) with which unstandardized regression coefficients are estimated, in comparison to simple random as well as simple D-efficient designs, in cases where no suitable confounded factorial design exists or at least cannot be found. Generating confounded D-efficient designs is a two-step procedure. In the first step, it is recommended to generate a number of D-efficient designs and to select one to be used for the survey. Using a D-efficient design as a starting point ensures that, compared to simple random designs, the precision with which respondent-specific βs can be estimated is optimized. The purpose of the second step is to further optimize the D-efficiency, this time across different vignette sets, by adding D-efficient designs of the same D-efficiency that are constructed manually on the basis of the selected simple D-efficient design.

Such additional designs can be constructed by permuting the assignment of the levels of the vignette variables to their respective characteristics. Adding such D-efficient designs of the same efficiency across different vignette sets not only increases the balance of the vignette variables but at the same time also reduces the correlations between the vignette variables. If sufficiently many different simple D-efficient designs of the same D-efficiency are constructed, then the maximal possible D-efficiency will be reached across the combined sample: The levels of each vignette variable appear exactly equally frequently (balanced), and the variables of different vignette characteristics are uncorrelated (if interaction terms were included when the D-efficient design was generated, then the same applies to the interaction terms too). By further increasing the number of different D-efficient designs, the confounded D-efficient design will even show the same statistical characteristics as the fully crossed vignette universe (i.e., vignette variables and all interaction terms are balanced and orthogonal). Since balance and orthogonality are only reached across different D-efficient vignette sets, the combined confounded D-efficient design only reaches exactly the maximal possible D-efficiency under the condition that each simple D-efficient design of which the confounded D-efficient design consists is answered by the same number of respondents (N_D in equation (2) denotes in this case the total number of vignettes judged by the respondents).
How much the estimators of a confounded D-efficient design can benefit from the fact that the combined confounded design possesses a higher D-efficiency than a simple D-efficient design depends again on the heterogeneity of the respondents’ answer behavior: The more homogeneous the answer behavior of different respondents (i.e., the more u-terms can be fixed), the more estimators of a confounded D-efficient design benefit from the fact that the pooled data matrix of the confounded D-efficient design is (at least nearly) balanced and orthogonal.
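The effect of the second step can be sketched numerically. The following is a hedged illustration only: the base set is an arbitrary unbalanced set of six vignettes standing in for a D-efficient design produced by a search algorithm, the variables are assumed to be two-level and coded -1/+1, only main effects are considered, and each permuted set is assumed to be judged by the same number of respondents (here, once). For two-level variables, permuting the assignment between levels and characteristics amounts to flipping the sign of a column.

```python
import itertools
import numpy as np

def d_efficiency(levels):
    """D-efficiency of a main-effects design with intercept (-1/+1 coding)."""
    X = np.column_stack([np.ones(len(levels)), levels])
    n_d, p = X.shape
    det = max(np.linalg.det(X.T @ X), 0.0)
    return 100.0 * det ** (1.0 / p) / n_d

universe = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)
base_set = universe[[0, 3, 5, 6, 9, 15]]  # arbitrary stand-in design (assumption)

# All 16 ways of swapping the two levels for any subset of the four variables.
flips = np.array(list(itertools.product([1, -1], repeat=4)), dtype=float)
permuted_sets = [base_set * flip for flip in flips]

# Every permuted set keeps the D-efficiency of the base set ...
single_efficiencies = [d_efficiency(s) for s in permuted_sets]
# ... while the pooled sample becomes balanced and orthogonal.
pooled_efficiency = d_efficiency(np.vstack(permuted_sets))

print(round(single_efficiencies[0], 2), round(pooled_efficiency, 2))
```

In this sketch N_D for the pooled sample is the total number of vignettes (16 sets of 6), matching the remark on equation (2). If the sets were used unequally often, the pooled sample would no longer be exactly balanced.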

Applicability

After introducing the basic ideas and main variants of random and quota designs, it is important to discuss, at least in short, the applicability of quota designs in cases where the vignette universe includes logically impossible combinations of vignette characteristics. Logically impossible combinations, for instance, will generally exist when education and occupation are important characteristics of a described fictitious vignette person (cf. Alves and Rossi 1978:545; Jasso and Rossi 1977:642; Nock 1982:104). Since a minimum level of education is required to be qualified for certain occupations, one has to remove vignettes with logically impossible combinations from the fully crossed vignette universe (cf. Jasso 2006:343). Excluding such vignettes from simple random designs in general only increases the correlations between affected variables, but by doing so, fractional factorial designs generally also lose orthogonality between affected variables as well as frequently between affected and unaffected variables (cf. Dülmer 2007:390). Hence, fractional factorial designs and confounded fractional factorial designs would not be applicable any longer. Choosing a D-efficient design might under such conditions be a viable alternative. Although excluding logically impossible combinations will always reduce the D-efficiency of a chosen design, the loss will sometimes be very small for D-efficient designs (cf. Kuhfeld et al. 1994:551). Furthermore, the existence of logically impossible combinations might frequently also prevent a suitable confounded D-efficient design from being found.

Reliability and Internal Validity

Reliability

Designs that allow unstandardized regression coefficients to be estimated with a lower standard error, and for that reason with higher precision, produce ceteris paribus more reliable results than other designs. Therefore, the key to understanding the reliability of the results is the formula for estimating the standard error of $\hat{β}$s in multiple OLS regression analysis (cf. Fox 1991:7-8; Thome 1990:166-67):

$$\hat{σ}(\hat{β}_{1}) = \sqrt{\frac{\sum_{i=1}^{n} e_{i}^{2} / (n - k - 1)}{\sum_{i=1}^{n} (X_{1i} - \bar{X}_{1})^{2} \cdot \left(1 - R_{X_{1}; X_{2}, X_{3} \dots X_{k}}^{2}\right)}}, \tag{3}$$

where

$\hat{σ}(\hat{β}_{1})$ is the estimated standard error of the estimated unstandardized regression coefficient of X₁,

$\sum_{i=1}^{n} e_{i}^{2} / (n - k - 1)$ is the estimated error variance, that is, the quotient of observed error variance and the remaining degrees of freedom (n refers to the set size, k to the number of estimated β-coefficients for the included predictor variables),

$\sum_{i=1}^{n} (X_{1i} - \bar{X}_{1})^{2}$ is the variation of X₁ across the vignettes 1 to n, and

$R_{X_{1}; X_{2}, X_{3} \dots X_{k}}^{2}$ is the coefficient of multiple determination of the predictor variables X₂ to X_k on X₁.

Besides the reliability of a respondent’s answer behavior (observed error variance), the estimated standard error of a $\hat{β}$ depends on three components that are under the control of the researcher: the number of remaining degrees of freedom for estimating the respondent-specific error variance, the variation of the vignette variables, and the coefficient of multiple determination among the vignette variables.
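To make the three components concrete, the following sketch computes the standard error of the first coefficient once from the components of the formula above and once from the usual matrix expression of OLS, and checks that both agree. The design (eight vignettes, three variables, deliberately unbalanced) and the ratings are invented for illustration, and the function names are hypothetical.

```python
import numpy as np

def ols(Z, y):
    """Least-squares coefficients and residuals for a design matrix Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, y - Z @ coef

def se_beta1_formula(X, y):
    """Standard error of beta_1 built from the three design components."""
    n, k = X.shape
    ones = np.ones((n, 1))
    _, e = ols(np.hstack([ones, X]), y)
    error_variance = e @ e / (n - k - 1)           # residual SS / degrees of freedom
    x1 = X[:, 0]
    variation = np.sum((x1 - x1.mean()) ** 2)      # variation of X_1
    _, x1_resid = ols(np.hstack([ones, X[:, 1:]]), x1)
    r_squared = 1.0 - (x1_resid @ x1_resid) / variation  # R^2 of X_1 on X_2..X_k
    return np.sqrt(error_variance / (variation * (1.0 - r_squared)))

def se_beta1_matrix(X, y):
    """Same quantity from error_variance * (Z'Z)^-1."""
    n, k = X.shape
    Z = np.hstack([np.ones((n, 1)), X])
    _, e = ols(Z, y)
    error_variance = e @ e / (n - k - 1)
    return np.sqrt(error_variance * np.linalg.inv(Z.T @ Z)[1, 1])

# Invented example: a 2^3 design in which one vignette was replaced by a duplicate.
X = np.array([[-1, -1, -1], [-1, -1, 1], [-1, 1, -1], [-1, 1, 1],
              [1, -1, -1], [1, -1, 1], [1, 1, 1], [1, 1, 1]], dtype=float)
y = np.array([3, 5, 4, 6, 6, 8, 9, 10], dtype=float)  # invented ratings

se_a = se_beta1_formula(X, y)
se_b = se_beta1_matrix(X, y)
print(se_a, se_b)
```

Both routes give the same value, which shows that the variation of X₁ and its multiple determination by the other vignette variables are the only design-specific ingredients of the standard error once the error variance is given.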

The larger the set size, the higher the number of degrees of freedom for estimating the error variance for a given number of vignette variables and the lower the estimated standard error of a respondent-specific $\hat{β}$. So a larger set size contributes to increased reliability. On the other hand, it might also contribute to fatigue in respondents (cf. also Jasso 2006:349), which reduces the reliability of the answers and as a result increases the estimated error variance. This trade-off has to be taken into account before generating the vignette samples. An advantage of simple random designs and D-efficient designs over fractional factorial designs and confounded factorial designs is that they exist for every set size.¹¹

The remaining two components correspond to the two factors in the denominator of equation (3): The higher the variation of the vignette variables and the lower the coefficient of multiple determination among the vignette variables, the lower ceteris paribus the estimated standard error of the respective $\hat{β}$. For the simple random design, the variation of the vignette variables as well as the coefficient of multiple determination differ from vignette set to vignette set. Hence, the β of a vignette variable is estimated with a different standard error, and for that reason with different reliability, in each vignette set. As a consequence, part of the observed variation of the $\hat{β}$s across respondents results exclusively from using different vignette sets (sampling variation) and cannot be traced back to substantively different answer behavior (cf. Hox, Kreft, and Hermkens 1991:501). Standardizing the vignette samples, as in fractional factorial designs or D-efficient designs, has the advantage that it eliminates this source of random error. This is especially important for small set sizes, where the βs will generally be estimated with high standard errors.

If for a given set size no balanced orthogonal design exists, then D-efficiency will be maximized for fractional factorial designs exclusively with respect to orthogonality. Perfect orthogonality reduces the coefficient of multiple determination among the vignette variables to zero, whereby the right-hand factor in the denominator of equation (3) reaches its maximum of 1. Under the same conditions, search algorithms for D-efficient designs try to optimize both factors of the denominator simultaneously, whereby the requirement of perfect orthogonality is relaxed. Consequently, for a given set size, fractional factorial designs as well as D-efficient designs allow the individual βs to be estimated with higher reliability than an average set of a simple random design.

The comparisons up to now have focused on individual vignette samples and not on the combined vignette sample. Adding the same fractional factorial designs or the same D-efficient designs changes neither the degree of balance nor the correlations among the vignette variables. Hence, each quota set will show the same D-efficiency as the combined sample of all used vignette sets. The situation changes if different unbalanced fractional factorial designs, different D-efficient designs, or different vignette sets of a simple random design are added. If for the unbalanced variables (factors) of an unbalanced fractional factorial design the assignment between a vignette characteristic and its numerical representation (indicator variables for the nominal scaled variables) is permuted in order to generate further designs, then the combined vignette sample can also reach the maximal D-efficiency of 100 (for designs exclusively including qualitative variables). This requires, however, that each vignette set is judged by the same number of respondents. The same procedure can also be applied to D-efficient designs, although the correlations among the vignette variables make the task in this case somewhat more demanding. Furthermore, adding different vignette sets of a simple random design at least asymptotically increases the balance and uncorrelatedness of the combined sample. Therefore, with an increasing respondent number, the D-efficiency of the combined vignette sets asymptotically approaches 100 within the limits of the sampling error. The only design that for each individual vignette set, as well as for the combined sample of all completely judged vignette sets, possesses the maximal D-efficiency of 100 remains, however, the balanced fractional factorial design (the same, of course, also applies to a confounded factorial design consisting of different balanced fractional factorial designs). 
From this point of view it is an ideal design that produces the most reliable results.

How much the estimators of a factorial survey can benefit from the fact that the combined vignette sample (pooled data matrix) of a random or a confounded design has a higher D-efficiency than a single vignette set depends on the heterogeneity of the respondents’ answer behavior. Heterogeneity means that the respondent-specific estimated βs differ significantly across the respondents. Such unexplained context effects are modeled in multilevel analysis by including random terms that capture respondent-specific deviations from the grand mean of the intercept or the slope of a vignette variable. If no significant context effects exist, then multilevel analysis reduces to conventional OLS regression. Under this condition, the unstandardized regression coefficients are estimated on the basis of the pooled data matrix, where the D-efficiency of the factorial survey corresponds exclusively to the D-efficiency of the combined sample of all vignette sets. In this case, simple random designs with an increasing number of vignette sets, within the limits of a decreasing sampling error, easily produce more reliable results than unbalanced fractional factorial designs and D-efficient designs. This especially applies to factorial surveys where a relatively low set size is used. If, on the other hand, the answer behavior of different respondents is very heterogeneous, then the estimated γs (posterior means) are much more strongly based on the D-efficiency of the single vignette sets. Since unexplained context effects caused by unmeasured respondent-level variables are very likely to occur in factorial surveys, the estimators of simple random designs can only partly benefit from the fact that the design is becoming asymptotically more orthogonal and more balanced.
Especially for small set sizes with a high D-efficiency and a relatively heterogeneous answer behavior, both fractional factorial designs and D-efficient designs will allow respondent-specific βs to be estimated with higher reliability than simple random designs (cf. Dülmer 2007:395-96, 405). The same also applies to a comparison between confounded factorial designs or confounded D-efficient designs on the one hand and simple random designs on the other.

Internal Validity

High reliability is necessary but not sufficient for high validity (cf. for instance, Neuman 2006:196-97). This at least applies to the reliability of the measuring instrument (i.e., the chosen design), the focus of the present article, and to internal validity. Accordingly, even a highly reliable estimator might suffer from a high systematic bias and for that reason might have low internal validity. Although each individual vignette set of a quota sample reaches a higher D-efficiency and therefore the respondent-specific βs can be estimated with a higher reliability than an average set of a random vignette sample, both fractional factorial designs and D-efficient designs have in general a somewhat higher susceptibility to systematic biases caused by interaction effects between vignette variables that were not expected in theory and so were not included when the design was generated. If for a fractional factorial design a main effect is perfectly aliased with a higher order interaction effect, then the estimated main effect captures the whole influence of the interaction effect. So, if an unexpected interaction effect should have a nonnegligible effect (due to perfect aliasing this cannot be tested afterward), then the main effect with which the interaction effect is perfectly aliased will suffer from a corresponding bias. The internal validity of the results is endangered. For a D-efficient design, the influence of a not-included but nonnegligible interaction effect will be distributed corresponding to its correlation with the vignette variables across several main effects. 
Therefore, the bias that results for an individual main effect is less severe for a D-efficient design than for a fractional factorial design.¹² The fact that at least subsequent testing for unexpected higher order interaction effects is frequently impossible for fractional factorial (perfect aliasing) and D-efficient designs probably explains why such designs are sometimes seen as having a somewhat lower internal validity than random designs. An advantage of simple random designs and confounded designs is that they at least allow for the impact of such interaction effects to be tested across different vignette sets. But such tests—although at least possible—also have certain limitations.
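The bias caused by perfect aliasing can be illustrated with a small noise-free example. This is a sketch under stated assumptions: four two-level variables coded -1/+1, invented coefficients (intercept 1.0, main effects 0.5, and an unanticipated three-way interaction X₁X₂X₃ of 0.3), and a main-effects-only regression fitted to each half fraction of the universe.

```python
import itertools
import numpy as np

universe = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)
product = universe.prod(axis=1)
set1 = universe[product == 1]    # here X1*X2*X3 equals  X4
set2 = universe[product == -1]   # here X1*X2*X3 equals -X4

def true_rating(v, b_main=0.5, b_123=0.3):
    """Invented noise-free judgments with an unanticipated three-way interaction."""
    return 1.0 + b_main * v.sum(axis=1) + b_123 * v[:, 0] * v[:, 1] * v[:, 2]

def main_effects_fit(v):
    """OLS fit of the main-effects-only model (intercept + X1..X4)."""
    Z = np.column_stack([np.ones(len(v)), v])
    coef, *_ = np.linalg.lstsq(Z, true_rating(v), rcond=None)
    return coef

b4_set1 = main_effects_fit(set1)[4]      # about 0.8: main effect plus interaction
b4_set2 = main_effects_fit(set2)[4]      # about 0.2: main effect minus interaction
b4_full = main_effects_fit(universe)[4]  # about 0.5: unbiased in the full factorial
print(b4_set1, b4_set2, b4_full)
```

Within a single half fraction the estimate for X₄ absorbs the whole interaction effect, and nothing in that set alone reveals the bias. The opposite signs in the two sets indicate why a design that uses both fractions can detect the interaction across sets.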

For simple random designs, all main and interaction effects of a vignette set are randomly aliased. If a nonnegligible interaction term is not included when the vignette sets are generated, then the respondent-specific β of a vignette variable will, depending on the set-specific correlation with the interaction term, be either under- or overestimated, or almost correctly estimated. Since the aliasing structure of each vignette set of simple random designs is generated randomly, with an increasing number of respondents, the set-specific error across all included vignette sets and within the limits of the sampling error will asymptotically approach zero. Therefore, a simple random design allows an unbiased estimation of the γs of all included vignette variables. So, although the differences in the aliasing structure of the vignette sets of a simple random design contribute to an increase in random errors, they will not cause a systematic bias of the estimated γs.¹³
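The argument that random aliasing produces set-specific errors but no systematic bias can be checked by enumeration rather than simulation. The sketch below is an illustration under assumptions not taken from the study: the same invented noise-free model as a stand-in for respondents' judgments (main effects 0.5, omitted interaction X₁X₂X₃ of 0.3), a set size of 8, sets drawn without replacement with equal probability, and sets in which the main-effects model is not estimable being discarded.

```python
import itertools
import numpy as np

universe = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)
# Invented noise-free ratings containing an omitted three-way interaction.
ratings = (1.0 + 0.5 * universe.sum(axis=1)
           + 0.3 * universe[:, 0] * universe[:, 1] * universe[:, 2])
Z_full = np.column_stack([np.ones(16), universe])

estimates = []
for rows in itertools.combinations(range(16), 8):   # every possible set of size 8
    idx = list(rows)
    Z = Z_full[idx]
    if np.linalg.matrix_rank(Z) < Z.shape[1]:
        continue                                    # main effects not estimable
    coef = np.linalg.lstsq(Z, ratings[idx], rcond=None)[0]
    estimates.append(coef[4])                       # main-effects estimate for X4

estimates = np.array(estimates)
mean_b4 = estimates.mean()
print(round(mean_b4, 6), round(estimates.min(), 3), round(estimates.max(), 3))
```

Individual sets over- or underestimate the coefficient of X₄, but the average over all estimable sets equals the true value of 0.5, because every set has a mirror image (obtained by swapping the levels of X₁) whose error has the opposite sign.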

However, if an unanticipated interaction effect is nonnegligible in comparison to a respective main effect, then it becomes impossible to interpret conditional effects (a conditional effect is the main effect of one vignette variable under the condition that the other vignette variable of the interaction term reaches a specific value that the researcher is interested in, cf. Friedrich 1982; Thome 1991). In contrast to fractional factorial designs and D-efficient designs, with simple random designs it is possible, at least in principle, to test afterward for virtually all interaction effects. This, however, strictly speaking only applies to situations where the answer behavior of the respondents empirically justifies fixing the respective random components. The test of whether such a random component is needed or not (cf. Hox 2010:47) is based on the number of individual respondents for which the randomly selected vignette set allows a respective interaction effect to be estimated. For simple random designs, this is generally only possible for a part of all respondents. It follows that, the lower the number of such respondents, the less reliable the significance test will be for all between-respondent variance components included (listwise exclusion). If none of the individual vignette sets allows an interaction effect to be estimated, then the γ can only be estimated without a variance component, that is, across vignette sets, by assuming that the $\hat{β}$ is the same for all respondents. As a consequence, the t value for the common $\hat{γ}$ is based on the number of vignettes instead of on the much lower number of respondents (in the former case minus the degrees of freedom needed for estimating the γs of the whole multilevel model, in the latter case minus the degrees of freedom needed for estimating the γs that predict a random $\hat{β}$ ). 
Although the $\hat{γ}$ of an unanticipated interaction term (as well as the conditional effects) will not suffer from systematic bias, the t value will be too high in cases where the variance component would be needed but cannot be estimated. This is a limitation of simple random designs.

From the start, the confounded factorial design guarantees that the estimated γs of the vignette variables (including the intercept) are protected against a potential systematic bias that otherwise might be caused by a chosen design. If a suitable confounded factorial design exists which allows all important main and interaction effects to be estimated without any confounding, then respondent-specific $\hat{β}$ s will show no design-related systematic biases. This applies as long as all set variables are controlled for. Hence, the protection against potential biases, which might result in a higher internal validity, is an advantage of confounded fractional designs over fractional factorial designs and D-efficient designs.

However, if an unanticipated interaction effect captured across vignette sets via a set variable turns out to have a nonnegligible size in comparison to a respective main effect, then the unanticipated interaction effect¹⁴ (as well as the conditional effects) will not be biased (provided that no or only a negligible set effect exists) but the t value again might be misleading. Depending on whether the γ of a main effect is estimated with a random component or not, the t value of the $\hat{γ}$ of the set variable that captures the impact of a respective interaction effect is based on the number of respondents or on the much higher number of vignettes (minus the degrees of freedom needed for estimating the γs that predict a random $\hat{β}$ or for estimating the γs of the whole multilevel model, respectively). An independent test of whether or not the γ of the interaction term has to be estimated with its own variance component cannot be carried out (set variables have no within respondent variability). These considerations show the limitations of confounded fractional factorial designs.

The gain in internal validity of confounded factorial designs over simple random designs consists in the higher D-efficiency of each individual fractional factorial design of which a confounded factorial design consists (this applies to each individual set comparison except the one where a fractional factorial design or sometimes even a D-efficient design is part of the simple random design)¹⁵; at the same time, including all set variables needed to protect the estimated γs of the vignette variables of a confounded fractional factorial design against design-related potential systematic biases also reduces the variance components of the respondent-specific $\hat{β}$ s. Therefore, respondent-specific βs can be estimated with higher reliability and for this reason also with higher internal validity for the confounded factorial design than for a simple random design.

If no suitable confounded factorial design can be found, then a confounded D-efficient design may be used instead. The individual vignette sets of such designs show, like D-efficient designs, fractional factorial designs, and confounded factorial designs, a higher D-efficiency than an average vignette set of simple random designs. The gain in internal validity of the confounded D-efficient design over D-efficient designs and fractional factorial designs is that the risk of design-related systematic biases of the reported $\hat{γ}$s across the vignette sets is zero. This requires, however, that the number of respondents is the same for each D-efficient design of which the confounded D-efficient design consists (this requirement is not needed for confounded factorial designs as long as all set variables are included in the estimated multilevel model). The less this requirement is fulfilled and the higher the correlations between vignette variables and unanticipated nonnegligible interaction terms not included in a chosen D-efficient design, the higher the design-related systematic bias of the reported $\hat{γ}$s will be. As with simple random designs, unanticipated higher order interaction effects can in general only be estimated without a variance component, that is, across different D-efficient designs, by assuming at the same time that the respondent-specific $\hat{β}$s of an interaction effect do not differ significantly (homogeneous answer behavior). Here again, the t value will be too high in cases where a variance component would be needed but cannot be estimated. All in all, these considerations underline again how important it is that nonnegligible interaction effects can be estimated for each individual vignette set.

Empirical Design Comparison Conducted Using the Example of the Four Inglehart Items

Operationalizations and Data

In order to illustrate the expected differences in reliability and internal validity between different designs, a parsimonious design is most suitable, since each respondent can then be presented with the fully crossed vignette population (full factorial design). One of the main advantages of this procedure is that the estimates from the full factorial design can be used as the benchmark for the intended design comparison. Based on the four items that are used to measure Inglehart’s materialist and postmaterialist value orientations, such a parsimonious full factorial design can easily be generated.

According to Inglehart (1994:290-91), materialist and postmaterialist value priorities can only be measured adequately via a ranking procedure. This assumption has been criticized by authors like Bürklin, Klein, and Ruß (1996) as theoretically inadequate. If a political system is not generally assumed to have limited capacities for problem solving, then there is no trade-off relationship between the two items “protecting freedom of speech” and “fighting rising prices” that would justify a ranking procedure. The factorial survey makes it possible to avoid such restrictions by presenting respondents with different vignettes with a list of the four Inglehart items. The task of the respondents could be to indicate how much they would like to be governed by a party for which the listed items are either not so important or very important. The factorial survey also reduces the problem of response sets (in many cases response sets can be easily identified), frequently observed in simple rating procedures, which have been criticized also for this reason by Inglehart (1997:116-17) as an unsuitable alternative for measuring both value orientations.¹⁶ For illustration purposes, an example vignette with an introduction to the task is given in Table 1.

Table 1.

Example Vignette for Measuring Inglehart’s Value Orientations with Introduction.

The vignette universe of four items, each having two levels (“important” and “not so important”), consists of 16 (= 2⁴) vignettes. To prevent possible order effects from systematically biasing the estimators, the vignette order of all questionnaires was randomly selected (cf. also Jasso 2006:343; Rossi and Anderson 1982:33). The paper-and-pencil interviews with the full factorial survey (completely crossed vignette universe), which was presented to each participant, were carried out on October 16, 2006, during the first session of two identical methodological courses designed for students of the Faculty of Economic and Social Sciences at the University of Cologne. In total, 137 students participated in the interviews. Five questionnaires were not sufficiently filled out and had to be excluded from the analysis. Of the remaining 132 participants, 72 were female and 60 male.

The empirical comparison includes, besides the full factorial design, two random and two quota designs. The required random and quota designs were produced by dropping vignettes from the full factorial design. For analyzing random and quota designs, these subsamples/fractions of the vignette universe, including the corresponding fraction of respondent-specific answers from the full factorial design, were used. Therefore, the statistical comparison of random and quota designs is based on these fractions of the complete data set. The complete knowledge about the answers from the full factorial survey has the advantage that it allows an empirically based yardstick to be established for the empirical comparison of the reliability and internal validity of different designs. By using this setting, all possible other sources of disturbances not related to a selected design itself are eliminated.

Since Inglehart’s (1977:28-29) value types are assigned on the basis of the two items that a respondent ranked highest out of the four items, it is assumed that interaction effects are negligible. Therefore, all reduced designs included in the comparison are generated as main-effects-only designs. The D-efficiency reported in the following refers to this model. Nonetheless, only those designs that should be especially robust against the influence of omitted interaction effects between vignette variables will be compared. In order to reduce the likelihood of comparing outliers and at the same time to obtain a measure of the stability of the results, each of the four reduced designs was produced 50 times, whereby the required vignette sets were each time randomly assigned to the respondents. Therefore, the reported results for each reduced design will be based on 50 multilevel regressions.

The first quota design included in the comparison is a confounded factorial design. This design was constructed by dividing the vignette universe along the value of the highest interaction term (X ₁·X ₂·X ₃·X ₄) into two sets of eight vignettes. Since, with regard to the main effects, each of the resulting half-fractional factorial designs is orthogonal and balanced, both set 1 and set 2 already possess a D-efficiency of 100. Dividing the vignette universe along the product term of the four vignette variables ensures that both vignette sets are of Resolution IV (cf. Ryan 2007:170). Table 2 gives an overview of the confounding pattern of the confounded factorial design.

Table 2.

Overview of the Aliasing and Confounding Pattern of the Confounded Factorial Design.

	Independent Variables				Aliasing/Confounding Pattern
Vignette Number	X ₁	X ₂	X ₃	X ₄	X ₁ X ₂ = X ₃ X ₄	X ₁ X ₃ = X ₂ X ₄	X ₁ X ₄ = X ₂ X ₃	X ₁ X ₂ X ₃ = X ₄	X ₁ X ₂ X ₄ = X ₃	X ₁ X ₃ X ₄ = X ₂	X ₂ X ₃ X ₄ = X ₁	X ₁ X ₂ X ₃ X ₄ = I ^a
Set 1
1	−	−	−	−	+	+	+	−	−	−	−	+
2	−	−	+	+	+	−	−	+	+	−	−	+
3	−	+	−	+	−	+	−	+	−	+	−	+
4	−	+	+	−	−	−	+	−	+	+	−	+
5	+	−	−	+	−	−	+	+	−	−	+	+
6	+	−	+	−	−	+	−	−	+	−	+	+
7	+	+	−	−	+	−	−	−	−	+	+	+
8	+	+	+	+	+	+	+	+	+	+	+	+
Vignette Number	X ₁	X ₂	X ₃	X ₄	X ₁ X ₂ = −X ₃ X ₄	X ₁ X ₃ = −X ₂ X ₄	X ₁ X ₄ = −X ₂ X ₃	X ₁ X ₂ X ₃ = −X ₄	X ₁ X ₂ X ₄ = −X ₃	X ₁ X ₃ X ₄ = −X ₂	X ₂ X ₃ X ₄ = −X ₁	X ₁ X ₂ X ₃ X ₄ = −I ^a
Set 2
1	−	−	−	+	+	+	−	−	+	+	+	−
2	−	−	+	−	+	−	+	+	−	+	+	−
3	−	+	−	−	−	+	+	+	+	−	+	−
4	−	+	+	+	−	−	−	−	−	−	+	−
5	+	−	−	−	−	−	−	+	+	+	−	−
6	+	−	+	+	−	+	+	−	−	+	−	−
7	+	+	−	+	+	−	+	−	+	−	−	−
8	+	+	+	−	+	+	−	+	−	−	−	−

Note: The equals sign in the table indicates which interaction terms are perfectly aliased with an X variable or another interaction term within a respective vignette set. If an interaction term is perfectly positively aliased with an X variable or another interaction term within vignette set 1 (for instance, $X_{1} \cdot X_{2} \cdot X_{3}$ is perfectly positively aliased with X ₄ within set 1) and perfectly negatively aliased with the same X variable or interaction term within vignette set 2 ( $X_{1} \cdot X_{2} \cdot X_{3}$ is perfectly negatively aliased with X ₄ within set 2), then it is possible to capture the common influence of the perfectly confounded interaction term and the set variable across the two different vignette sets by including a respective set variable at the respondent level (i.e., the interaction term between X ₄ and the set variable; including this two-way interaction term allows the common influence of the interaction term $X_{1} \cdot X_{2} \cdot X_{3}$ and the set variable with which the interaction effect is perfectly confounded to be captured across both vignette sets). The same applies to a perfectly aliased interaction term that is at the same time also perfectly confounded with a set effect ( $X_{1} \cdot X_{2} \cdot X_{3} \cdot X_{4}$ is perfectly aliased with the intercept within a vignette set and perfectly confounded with the set effect). As long as the set effect is negligible, the magnitude of the estimated γ for the two-way interaction term captures the estimated magnitude of the three-way interaction effect with which the main effect is perfectly aliased (in our case, set effects are implausible because every respondent judged the same full factorial design).

^a I = intercept; “−” stands for “−1,” “+” stands for “+1.” In order to construct both half-fractional factorial designs (set 1 and set 2), the vignette universe has to be divided along the highest order interaction term $X_{1} \cdot X_{2} \cdot X_{3} \cdot X_{4}$ . Since the vignette universe is divided along a four-way interaction term, the resulting design is of Resolution IV (by choosing a three-way interaction term, one would produce a half-fractional factorial design of Resolution III). The formula for generating both vignette sets (only applicable to effect-coded dichotomous variables) is:

$X_{1} \cdot X_{2} \cdot X_{3} \cdot X_{4} = +1$ (set 1) and $X_{1} \cdot X_{2} \cdot X_{3} \cdot X_{4} = -1$ (set 2). The same equation system can be written for 0–1 dummy coding (division with integer remainder) as:

$X_{1} + X_{2} + X_{3} + X_{4} \equiv 0 \pmod{2}$ (set 1) and $X_{1} + X_{2} + X_{3} + X_{4} \equiv 1 \pmod{2}$ (set 2).
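The splitting rule described in this note can be sketched in a few lines of Python. This is only an illustration under the assumption that the 16 vignettes are enumerated in the row order of Table 2; the helper names (`is_balanced_and_orthogonal`, `dummy_parity`) are hypothetical and not part of the original study.

```python
from itertools import product
from math import prod

# All 2^4 = 16 effect-coded vignettes (-1 / +1), enumerated in the row order of Table 2.
universe = list(product([-1, 1], repeat=4))

# Split the universe along the four-way interaction X1*X2*X3*X4.
set1 = [v for v in universe if prod(v) == 1]    # X1*X2*X3*X4 = +1
set2 = [v for v in universe if prod(v) == -1]   # X1*X2*X3*X4 = -1


def is_balanced_and_orthogonal(vignettes):
    """True if every main-effect column sums to zero and all column pairs are uncorrelated."""
    cols = list(zip(*vignettes))
    balanced = all(sum(c) == 0 for c in cols)
    orthogonal = all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i in range(4)
        for j in range(i + 1, 4)
    )
    return balanced and orthogonal


# Aliasing pattern: X1*X2*X3 equals X4 within set 1 and -X4 within set 2.
alias_set1 = all(x1 * x2 * x3 == x4 for x1, x2, x3, x4 in set1)
alias_set2 = all(x1 * x2 * x3 == -x4 for x1, x2, x3, x4 in set2)


def dummy_parity(vignette):
    """Recode -1/+1 to 0/1 and return the sum of the four dummies modulo 2."""
    return sum((x + 1) // 2 for x in vignette) % 2


print(len(set1), len(set2))
print(is_balanced_and_orthogonal(set1), is_balanced_and_orthogonal(set2))
print(alias_set1, alias_set2)
```

Under the dummy-coded rule, every vignette of set 1 has parity 0 and every vignette of set 2 has parity 1, which mirrors the product rule for effect coding.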

The first vignette set was assigned randomly to half of the 132 respondents; the second vignette set was assigned to the remaining respondents.¹⁷ To get different assignments, this procedure was repeated 49 times. Since for a half-fractional factorial design only 8 instead of 16 vignettes are needed, the remaining 8 vignettes were excluded from the respective analysis. In this way, the number of vignettes included in a single multilevel analysis was reduced to 1,056 (132 respondents × 8 vignettes).

If for a given set size no suitable confounded factorial designs can be found, then reliability and internal validity can be optimized instead via a confounded D-efficient design. This condition is met for a given set size of 10 vignettes. For this set size, the maximal D-efficiency for a D-efficient design is 97.032 (cf. also Dülmer 2007:394, 398). Two different examples of such designs are presented in Table 3.

Table 3.

Balance and Correlations of Two Possible D-efficient Designs of Set Size 10.

		D-efficient Design 1					D-efficient Design 2
Variable		X ₁	X ₂	X ₃	X ₄		X ₁	X ₂	X ₃	X ₄
Balance		4/6	4/6	5/5	5/5		4/6	5/5	5/5	5/5
Correlations	X ₁		0.167	0.000	0.000	X ₁		0.000	0.000	0.000
	X ₂			0.000	0.000	X ₂			0.200	0.200
	X ₃				−0.200	X ₃				0.200

Note: The D-efficiency of both designs is 97.032 (search algorithm: modified Fedorov, which is the most reliable; cf. Kuhfeld et al. 1994:548). Balance refers to the ratio of the levels of the dichotomous vignette variables (i.e., the ratio of “−1” to “1”). Design 1 consists of the vignettes 1, 2, 3, 4, 5, 6, 7, 8 of set 1 and 7, 8 of set 2. Design 2 consists of the vignettes 1, 4, 6, 7, 8 of set 1 and 1, 4, 5, 6, 7 of set 2 (the numbering refers to Table 2, first column). The confounded D-efficient design (D1–D8) used for design comparison consists of the following vignettes (numbering again refers to Table 2, first column):

D1: Set 1: vignettes 1–8, Set 2: vignettes 7 and 8;

D2: Set 1: vignettes 1–8, Set 2: vignettes 3 and 4;

D3: Set 1: vignettes 1–8, Set 2: vignettes 5 and 6;

D4: Set 1: vignettes 1–8, Set 2: vignettes 1 and 2;

D5: Set 1: vignettes 3 and 4, Set 2: vignettes 1–8;

D6: Set 1: vignettes 5 and 6, Set 2: vignettes 1–8;

D7: Set 1: vignettes 7 and 8, Set 2: vignettes 1–8;

D8: Set 1: vignettes 1 and 2, Set 2: vignettes 1–8.

A main effect only model for a confounded D-efficient design consisting of D1–D4 (or alternatively of D5–D8) would already reach a D-efficiency of 100 across the four selected designs. By permuting the assignment between the four Inglehart items and the variables X ₁ to X ₄, one could further increase the number of designs used for a confounded D-efficient design.
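The D-efficiencies reported for Table 3 and for the pooled designs D1–D4 can be checked with a short calculation. The sketch below assumes the common definition $100 \cdot |X'X|^{1/p} / N$ for an effect-coded main effect model with intercept (p = 5 parameters, N vignettes); the function names and the small pure-Python determinant routine are illustrative choices, not the software used in the study.

```python
from collections import Counter
from itertools import product
from math import prod

universe = list(product([-1, 1], repeat=4))      # row order of Table 2
set1 = [v for v in universe if prod(v) == 1]
set2 = [v for v in universe if prod(v) == -1]


def pick(vignettes, numbers):
    """Select vignettes by their 1-based number in Table 2."""
    return [vignettes[n - 1] for n in numbers]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n, det = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def d_efficiency(vignettes):
    """Assumed definition: 100 * |X'X|^(1/p) / N for intercept plus four main effects."""
    X = [(1,) + tuple(v) for v in vignettes]
    p, n = len(X[0]), len(X)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    return 100 * determinant(xtx) ** (1 / p) / n


design1 = set1 + pick(set2, [7, 8])
design2 = pick(set1, [1, 4, 6, 7, 8]) + pick(set2, [1, 4, 5, 6, 7])

# D1-D4 add two vignettes of set 2 to set 1; D5-D8 add two vignettes of set 1 to set 2.
d1_to_d4 = [set1 + pick(set2, extra) for extra in ([7, 8], [3, 4], [5, 6], [1, 2])]
d5_to_d8 = [pick(set1, extra) + set2 for extra in ([3, 4], [5, 6], [7, 8], [1, 2])]
pooled_d1_to_d4 = [v for design in d1_to_d4 for v in design]
frequency = Counter(v for design in d1_to_d4 + d5_to_d8 for v in design)

print(round(d_efficiency(design1), 3), round(d_efficiency(design2), 3))
print(round(d_efficiency(pooled_d1_to_d4), 3))
print(set(frequency.values()))
```

Under these assumptions, both designs of Table 3 come out at about 97.03, the pooled designs D1–D4 reach 100 for the main effect model, and each of the 16 vignettes occurs equally often across D1–D8.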

D-efficient design 1 is less balanced than D-efficient design 2 (X ₁ is unbalanced for both designs, X ₂ only for design 1) but compensates for the higher imbalance by having lower correlations between the vignette variables. In accordance with classical fractional factorial designs, the D-efficient design with the lower correlations was given priority over the more balanced design. Based on D-efficient design 1, seven further D-efficient designs of the same efficiency were constructed by switching the coding for the two levels of one or more vignette variables. Therefore, the combined confounded D-efficient design consists of eight D-efficient designs of the same D-efficiency. For the combined confounded D-efficient design consisting of eight D-efficient designs, not only does the main effect only model have a D-efficiency of 100 but so does the model that includes all possible interaction effects besides the main effects. Therefore, the confounded D-efficient design consisting of eight D-efficient designs is, like the vignette universe, fully balanced and orthogonal. The same applies to the collected data only if each of the eight designs is answered equally often. To distribute the eight D-efficient designs as evenly as possible across the 132 respondents, four of the eight D-efficient designs were used 16 times, and the remaining four 17 times ( $4 \cdot 16 + 4 \cdot 17 = 132$ vignette samples). Which of the D-efficient designs were included 17 times was determined randomly. This data set (ordered by the design and vignette number) was thereafter merged with the data set of the respondents, which was itself first ordered randomly by identification number (the order of the vignettes remained unchanged). Since for a set size of 10 only the answers for these 10 vignettes are needed, the answers for the six spare vignettes were replaced by a missing value and excluded from analyses.
In this way, the number of vignettes used for a single multilevel analysis was reduced to 1,320 (132 respondents × 10 vignettes). Since for simulation purposes it was decided to have 50 multilevel regressions, the described procedure was repeated 49 times. This ensures that both the decision about which D-efficient designs will be used 17 times and the assignment between respondents and D-efficient designs vary randomly. Finally, for both set sizes (8 and 10 vignettes), a simple random design without replacement was also generated 50 times. This enables a comparison of quota and random designs on the basis of the same number of respondents.
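The allocation procedure described above can be sketched as follows. The function names, the use of Python's `random` module, and the seeds are assumptions made for illustration; the original study does not report how the random draws were implemented.

```python
import random
from collections import Counter


def allocate_designs(n_respondents=132, n_designs=8, seed=None):
    """Assign designs 1..n_designs to respondents as evenly as possible.

    With 132 respondents and 8 designs, four randomly chosen designs are
    used 17 times and the remaining four 16 times.
    """
    rng = random.Random(seed)
    base, extra = divmod(n_respondents, n_designs)       # 132 = 8 * 16 + 4
    used_once_more = set(rng.sample(range(1, n_designs + 1), extra))
    pool = [
        design
        for design in range(1, n_designs + 1)
        for _ in range(base + (design in used_once_more))
    ]
    rng.shuffle(pool)            # random assignment between respondents and designs
    return pool                  # pool[i] is the design answered by respondent i


def simple_random_set(set_size, rng, universe_size=16):
    """Draw one respondent's vignette numbers without replacement."""
    return sorted(rng.sample(range(1, universe_size + 1), set_size))


# 50 replications, each with its own (hypothetical) seed.
replications = [allocate_designs(seed=s) for s in range(50)]
usage = Counter(replications[0])
print(sorted(usage.values()))
print(simple_random_set(10, random.Random(1)))
```

Each replication would then feed one multilevel regression, so that the standard deviation across the 50 runs reflects the variation caused by the random assignment.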

The only respondent characteristic that is included in the multilevel analysis is gender. The decision to include a respondent characteristic essentially goes back to an assumption of Steiner and Atzmüller (2006:123-24): Confounding set effects (caused by differences between vignette sets) with respondent effects, as is the case for random designs, would have a negative impact on the estimators for respondent characteristics. Since vignette sets are randomly assigned to respondents, this fear seems to be unwarranted for the reason that, with an increasing number of respondents, both effects will, within the limits of sampling error, be asymptotically uncorrelated. Hence, the estimators for the respondent level should at least not suffer from a systematic bias.

All predictor variables included in the analyses were effect coded.¹⁸ Code −1 was used if a political goal was “not so important” for a fictitious governing party, code 1 if the goal was “very important” for the party. A respondent’s gender was coded −1 for males and 1 for females. The rescaled answer scale from the vignettes ranges from 0 (not at all) to 8 (very strongly).¹⁹ The set variable included in the confounded factorial design was coded −1 for the first half-fractional factorial design and 1 for the second half-fractional factorial design. In our case, systematic set effects can be excluded because all respondents answered the fully crossed vignette universe. Hence, the effect that is captured by a respective set variable exclusively measures the impact of the interaction effects between vignette variables that are perfectly confounded with this set effect. All multilevel analyses were carried out with the multilevel program HLM 6.

Empirical Results of the Example Comparison

Each multilevel regression model is based on equation (4). Except for the predictor variables, the same terminology is used as in equation (1). The set variable (Set) was included only in the multilevel regression model for the confounded factorial design. Since all five estimated variance components turned out to be highly significant, none of them could be fixed (i.e., dropped from the equation for the multilevel model).

Vignette level (Level 1):

$Y_{ij} = β_{0j} + β_{1j}\,\text{Order}_{ij} + β_{2j}\,\text{Government}_{ij} + β_{3j}\,\text{Price}_{ij} + β_{4j}\,\text{Speech}_{ij} + r_{ij}.$

Respondent level (Level 2):

$β_{0j} = γ_{00} + γ_{01}\,\text{Gender}_{j} + γ_{02}\,\text{Set}_{j} + u_{0j},$

$β_{1j} = γ_{10} + γ_{11}\,\text{Set}_{j} + u_{1j},$

$β_{2j} = γ_{20} + γ_{21}\,\text{Set}_{j} + u_{2j},$

$β_{3j} = γ_{30} + γ_{31}\,\text{Set}_{j} + u_{3j},$

$β_{4j} = γ_{40} + γ_{41}\,\text{Set}_{j} + u_{4j}.$
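Substituting the respondent-level equations into the vignette-level equation gives the single-equation form of the model. The following is only a rearrangement of the two levels above and introduces no new terms; for designs without a set variable, all terms containing Set drop out.

```latex
\begin{align*}
Y_{ij} ={}& \gamma_{00} + \gamma_{01}\,\text{Gender}_{j} + \gamma_{02}\,\text{Set}_{j} \\
 &+ (\gamma_{10} + \gamma_{11}\,\text{Set}_{j})\,\text{Order}_{ij}
  + (\gamma_{20} + \gamma_{21}\,\text{Set}_{j})\,\text{Government}_{ij} \\
 &+ (\gamma_{30} + \gamma_{31}\,\text{Set}_{j})\,\text{Price}_{ij}
  + (\gamma_{40} + \gamma_{41}\,\text{Set}_{j})\,\text{Speech}_{ij} \\
 &+ u_{0j} + u_{1j}\,\text{Order}_{ij} + u_{2j}\,\text{Government}_{ij}
  + u_{3j}\,\text{Price}_{ij} + u_{4j}\,\text{Speech}_{ij} + r_{ij}.
\end{align*}
```

Written this way, the coefficients $\gamma_{11}$ to $\gamma_{41}$ appear as cross-level interactions between the set variable and the respective vignette variable, and the five random terms $u_{0j}$ to $u_{4j}$ correspond to the five variance components mentioned above.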

The empirical results of the design comparison are presented in Table 4. The first column contains the estimators for the main effect model of the full factorial design. Since 50 separate multilevel analyses were estimated for each of the two random designs and for each of the two quota designs, the reported results for these designs also include the respective standard deviations. Two further columns were needed for the confounded factorial design in order to document the estimators of the main effect of the set variable and of its interaction effect with each of the four vignette variables (mean effect: set variable) as well as their respective standard deviations.

Table 4.

Empirical Comparison of the Estimated Coefficients of Different Designs.

	Full Factorial Design	Simple Random Design (50 Samples)		Confounded Factorial Design (50 Samples)					Simple Random Design (50 Samples)		Confounded D-efficient Design (50 Samples)
Set Size (n)	16	8		8					10		10
Respondents (n)	132	132		132					132		132
Vignettes (n)	2,112	1,056		1,056					1,320		1,320
		Mean	Standard Deviation	Mean	Standard Deviation		Mean Effect: Set Variable	Standard Deviation	Mean	Standard Deviation	Mean	Standard Deviation
$\hat{γ}$ Intercept	3.515	3.514	0.029	3.514	0.022		−0.110	0.082	3.511	0.025	3.519	0.014
$\hat{γ}$ Gender	0.105	0.109	0.038	0.114	0.026				0.113	0.023	0.109	0.023
$\hat{γ}$ Order	0.428	0.430	0.031	0.430	0.023		−0.095	0.065	0.429	0.032	0.427	0.022
$\hat{γ}$ Government	0.668	0.671	0.027	0.674	0.026		−0.132	0.043	0.671	0.024	0.668	0.020
$\hat{γ}$ Price	0.320	0.321	0.031	0.321	0.018		−0.078	0.038	0.320	0.028	0.317	0.019
$\hat{γ}$ Speech	1.107	1.113	0.033	1.106	0.025		−0.045	0.058	1.102	0.023	1.104	0.019
$\hat{σ}$ Intercept	0.078	0.085	0.004	0.081	0.002		0.081	0.002	0.082	0.003	0.080	0.002
$\hat{σ}$ Gender	0.077	0.083	0.004	0.080	0.003				0.080	0.003	0.079	0.002
$\hat{σ}$ Order	0.068	0.074	0.003	0.072	0.002		0.072	0.002	0.072	0.002	0.072	0.002
$\hat{σ}$ Government	0.049	0.058	0.002	0.055	0.003		0.055	0.003	0.054	0.002	0.053	0.002
$\hat{σ}$ Price	0.044	0.054	0.002	0.051	0.002		0.051	0.002	0.049	0.002	0.049	0.002
$\hat{σ}$ Speech	0.054	0.062	0.003	0.058	0.002		0.058	0.002	0.059	0.002	0.058	0.002
t Intercept	44.924	41.624	1.885	43.423	1.254		−1.357	1.016	43.027	1.322	43.861	1.137
t Gender	1.372	1.322	0.482	1.424	0.328				1.416	0.297	1.384	0.300
t Order	6.307	5.800	0.518	5.960	0.376		−1.318	0.906	5.959	0.490	5.971	0.357
t Government	13.767	11.671	0.554	12.327	0.908		−2.419	0.786	12.341	0.675	12.561	0.648
t Price	7.313	5.935	0.589	6.362	0.381		−1.552	0.778	6.478	0.579	6.547	0.433
t Speech	20.646	18.038	1.118	19.021	0.758		−0.762	0.991	18.777	0.789	19.188	0.732
${\hat{τ}}_{00}$ Intercept	0.708	0.715	0.088	0.661	0.049				0.707	0.056	0.683	0.045
${\hat{τ}}_{11}$ Order	0.513	0.504	0.059	0.491	0.040				0.517	0.038	0.515	0.042
${\hat{τ}}_{22}$ Government	0.218	0.209	0.031	0.200	0.038				0.222	0.030	0.212	0.029
${\hat{τ}}_{33}$ Price	0.159	0.157	0.033	0.139	0.026				0.152	0.023	0.154	0.023
${\hat{τ}}_{44}$ Speech	0.286	0.278	0.046	0.250	0.027				0.287	0.034	0.282	0.027
${\hat{σ}}_{R}^{2}$	1.492	1.494	0.117	1.590	0.091				1.484	0.059	1.537	0.060
$χ_{U 00}^{2}$ Intercept	1,117.399	480.779	62.105	561.470	46.023				631.731	58.515	670.856	44.576
$χ_{U 11}^{2}$ Order	852.117	372.845	35.095	452.881	33.059				496.000	35.679	541.872	40.957
$χ_{U 22}^{2}$ Government	436.589	228.891	18.550	261.856	27.865				289.293	27.840	300.424	26.114
$χ_{U 33}^{2}$ Price	354.125	207.711	22.790	221.675	18.980				240.638	20.305	254.795	21.400
$χ_{U 44}^{2}$ Speech	532.550	267.545	33.554	294.477	22.641				333.976	34.769	358.390	25.550
$R_{1}^{2}$	.368	.372	.018	.378	.014				.368	.016	.366	.011
$R_{2}^{2}$	.013	.132	.050	.025	.024				.097	.035	.033	.028
Iterations	5.000	26.880	11.508	6.320	2.938				16.820	4.241	13.080	2.656

Note: Coding of gender: male = −1, female = 1. The effect of the set variable for the intercept is the main effect of the set variable. The effect of the set variable of the vignette variables is the interaction effect between a respective vignette variable and the set variable. The (Pseudo-) R ² for each level has been calculated according to the simplified formula of Snijders and Bosker (1994:350-54). All multilevel models were estimated via restricted maximum likelihood.

The estimated γs are documented in the first block (rows 1–6) of the table. Since for the full factorial design all correlations between the vignette variables and their interaction terms are exactly zero and since only respondents who judged all 16 vignettes were included, the reported $\hat{γ}$ s of the vignette variables will remain unchanged even if interaction terms are entered into the multilevel regression analyses. Therefore, the $\hat{γ}$ s of the full factorial design offer a suitable yardstick for identifying possible systematic biases of the estimators of the reduced designs. The biggest average deviation between the $\hat{γ}$ of a vignette characteristic of one of the four reduced designs on the one hand, and the respective coefficient for the full factorial design on the other amounts to .006 scale points. This deviation is observed for the simple random design of set size 8 ( $\hat{γ}$ Speech) and for the confounded factorial design ( $\hat{γ}$ Government). Furthermore, the average $\hat{γ}$ s for the intercept as well as for the respondents’ gender also show only negligible deviations from their respective yardstick. Hence, all four reduced designs allow the γs to be estimated without systematic bias.
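The comparison with the yardstick can be retraced directly from the means reported in Table 4. The sketch below only re-enters those published values; the dictionary layout and variable names are illustrative.

```python
# Mean estimated gammas of the vignette variables, taken from Table 4.
benchmark = {"Order": 0.428, "Government": 0.668, "Price": 0.320, "Speech": 1.107}
means = {
    "simple random (8)": {"Order": 0.430, "Government": 0.671, "Price": 0.321, "Speech": 1.113},
    "confounded factorial (8)": {"Order": 0.430, "Government": 0.674, "Price": 0.321, "Speech": 1.106},
    "simple random (10)": {"Order": 0.429, "Government": 0.671, "Price": 0.320, "Speech": 1.102},
    "confounded D-efficient (10)": {"Order": 0.427, "Government": 0.668, "Price": 0.317, "Speech": 1.104},
}

# Absolute deviation of each reduced design from the full factorial benchmark.
deviations = {
    (design, item): round(abs(estimate - benchmark[item]), 3)
    for design, estimates in means.items()
    for item, estimate in estimates.items()
}

largest = max(deviations.values())
where = sorted(key for key, value in deviations.items() if value == largest)
print(largest, where)
```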

The confounded factorial design is the only reduced design for which the magnitude of the interaction effects that are perfectly aliased with the main effects can be estimated via respective set variables across respondents (this applies as long as no or only a negligible set effect exists). Such set variables prevent the estimated main effects from being systematically biased by potentially nonnegligible interaction effects in cases where the response rate for different vignette sets (i.e., different fractional factorial designs) somewhat differs.

According to Table 2, each of the four Inglehart items is perfectly aliased within a vignette set with a three-way interaction term between the remaining three Inglehart items, in vignette set 1 positively (for instance, $X_{1} = X_{2} \cdot X_{3} \cdot X_{4}$ ) and in vignette set 2 negatively ( $- X_{1} = X_{2} \cdot X_{3} \cdot X_{4}$ ). This difference in the aliasing pattern allows the common magnitude of the perfectly confounded three-way interaction effect and the set effect to be estimated across respondents via a set variable (more precisely, via a two-way interaction term between the vignette variable and the set variable). Since in our case systematic set effects can be excluded, the magnitude of the estimated γ for the two-way interaction term between the vignette variable and the set variable (to be found in the column “mean effect: set variable”) measures the estimated magnitude of the three-way interaction effect with which the main effect is perfectly aliased within separate vignette sets. Under this condition, the estimated two-way interaction effect between the set variable and the item “order” means that the γ of this item is, due to its perfect aliasing with the three-way interaction between the items “government,” “price,” and “speech,” overestimated on average by .095 scale points for the first half-fractional factorial design and underestimated on average by the same amount for the second half-fractional factorial design (the polarity of the systematic bias is given by the coding of the set variable, i.e., a −1 for set 1 and a +1 for set 2). Hence, the $\hat{γ}$ for the three-way interaction between “government,” “price,” and “speech” amounts to .095 scale points. The γ for the items “government,” “price,” and “speech” is on average overestimated by .132, .078, and .045 scale points, respectively, for the first half-fractional factorial design and underestimated by the same amount for the second half-fractional factorial design. 
The average bias of the intercept, which is perfectly aliased within each vignette set with the four-way interaction effect between all four Inglehart items, is .110 for the first set and −.110 for the second one. Therefore, due to its aliasing pattern, using a single half-fractional factorial design instead of a confounded factorial design would have resulted on average in slightly biased estimators for the γs. For the confounded factorial design, this systematic bias is captured by including set variables. The results in Table 4 furthermore show that both simple random designs also allow the γs to be estimated without systematic biases, despite the fact that small interaction effects are present in the student sample.²⁰
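The size of the set-specific bias can be made concrete with the mean estimates for the confounded factorial design in Table 4. The following sketch only evaluates "main effect + set effect × set code" for each vignette set; the function name `set_specific` is hypothetical.

```python
# Mean estimates for the confounded factorial design (Table 4).
gamma_main = {"Intercept": 3.514, "Order": 0.430, "Government": 0.674, "Price": 0.321, "Speech": 1.106}
gamma_set = {"Intercept": -0.110, "Order": -0.095, "Government": -0.132, "Price": -0.078, "Speech": -0.045}

# Coding of the set variable as described in the text.
SET_CODE = {"set 1": -1, "set 2": +1}


def set_specific(term, vignette_set):
    """Coefficient that a single half-fractional factorial design would yield on average."""
    return gamma_main[term] + gamma_set[term] * SET_CODE[vignette_set]


for term in gamma_main:
    print(term, round(set_specific(term, "set 1"), 3), round(set_specific(term, "set 2"), 3))
```

For "order", for example, the value for set 1 lies .095 above and the value for set 2 lies .095 below the mean estimate of .430, which is the over- and underestimation described above.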

The first systematic differences between the four reduced designs can be observed with respect to the standard deviations for the $\hat{γ}$ s which show the average distance between the estimated γs and their mean: The lower the standard deviation, the higher the reliability of a respective design. The standard deviation for the $\hat{γ}$ s of quota designs, because of the higher D-efficiency of both quota designs in comparison to simple random designs, is almost always smaller than the standard deviation of the simple random design of the same set size (the only exception is the $\hat{γ}$ for gender which has a standard deviation of .023 for both designs of set size 10). The factor by which the standard deviation for a $\hat{γ}$ of the simple random designs is higher than that of the quota design of the same set size reaches a maximal value of 1.72 for the comparison between the confounded factorial design and the respective simple random design (.018 vs. .031 for “price”) and a maximal value of 1.79 for the comparison between the confounded D-efficient design and the simple random design of set size 10 (.014 vs. .025 for the intercept). Therefore, both quota designs, for the same set size, show the higher reliability. Although the γs can be estimated for all four reduced designs without systematic bias, simple random designs, due to their lower reliability, also possess a lower internal validity than the respective quota design. This result is also corroborated by the estimated standard errors, $\hat{σ}$ , of the $\hat{γ}$ s (cf. block 2), although the observed pattern in this case is somewhat less pronounced.
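The factors quoted in this paragraph follow from the standard deviations in Table 4. A minimal sketch that re-enters those values and computes the ratio of the random design to the quota design of the same set size (the names `sd` and `largest_ratio` are illustrative):

```python
# Standard deviations of the estimated gammas across 50 replications (Table 4).
sd = {
    8: {
        "random": {"Intercept": 0.029, "Gender": 0.038, "Order": 0.031,
                   "Government": 0.027, "Price": 0.031, "Speech": 0.033},
        "quota": {"Intercept": 0.022, "Gender": 0.026, "Order": 0.023,
                  "Government": 0.026, "Price": 0.018, "Speech": 0.025},
    },
    10: {
        "random": {"Intercept": 0.025, "Gender": 0.023, "Order": 0.032,
                   "Government": 0.024, "Price": 0.028, "Speech": 0.023},
        "quota": {"Intercept": 0.014, "Gender": 0.023, "Order": 0.022,
                  "Government": 0.020, "Price": 0.019, "Speech": 0.019},
    },
}


def largest_ratio(set_size):
    """Return the term with the largest random/quota ratio and that ratio (2 decimals)."""
    ratios = {
        term: sd[set_size]["random"][term] / sd[set_size]["quota"][term]
        for term in sd[set_size]["quota"]
    }
    term = max(ratios, key=ratios.get)
    return term, round(ratios[term], 2)


print(largest_ratio(8))
print(largest_ratio(10))
```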

If the same γ is estimated with a lower standard error, then the t value as the quotient of both estimators will increase. As a consequence, a $\hat{γ}$ will become significant more easily. The comparison of the means of the t values (block 3) for single vignette characteristics documents that with one exception (t value of gender for the confounded D-efficient design) they are higher for quota designs than for simple random designs of the same set size. In our case, not until the set size of the simple random design is increased by two vignettes do its t values reach nearly the same level as the respective t values of the confounded factorial design.

In multilevel analysis, the estimated sum of the squared deviations of the $\hat{β}$ of each context (here each respondent) from that unstandardized regression coefficient’s grand mean $\hat{γ}$ is called variance component $\hat{τ}$ ; the estimated variance of the estimated residuals $r_{ij}$ at level 1 (here the vignette level) is called ${\hat{σ}}^{2}$ . These estimators can be found in the second part of Table 4 (first block). The mean of the variance components of the confounded factorial design is always somewhat smaller than the respective mean of the variance components of both simple random designs. The respective χ² values (block 2) are, however, on average somewhat smaller for the simple random design than for the quota design of the same set size. Whether an estimated variance component τ is significant is tested in HLM via a χ² test. A χ² value is computed by summing up across all respondents the squared deviation of a respondent-specific estimated β from its overall estimator γ computed across all respondents, divided by the respondent-specific estimated sampling variance of that $\hat{β}$ (i.e., by the square of the respondent-specific estimated standard error of that $\hat{β}$ ; cf. also Hox 2010:47). As long as a β is estimated ceteris paribus with a smaller standard error (higher reliability) for a quota design than for a simple random design, the respective χ² value will be higher. Therefore, for a given set size, quota designs are better able to detect unexplained heterogeneity between respondents. Increasing the set size reduces the estimated standard error of the $\hat{β}$ s, so that the respective χ² value increases. This consideration also explains why the simple random design of set size 8 shows the lowest χ² values and the confounded D-efficient design the highest ones.
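The logic of this χ² statistic can be illustrated with a small sketch. It implements only the verbal description given above (squared deviation of each respondent-specific estimate from the overall estimate, divided by its sampling variance) and ignores respondent-level predictors; the numbers are purely hypothetical and are not taken from the study.

```python
def chi_square(betas, gamma, sampling_variances):
    """Sum over respondents of (beta_j - gamma)^2 / Var(beta_j)."""
    return sum((b - gamma) ** 2 / v for b, v in zip(betas, sampling_variances))


# Hypothetical respondent-specific slopes and their overall estimate.
betas = [0.2, 0.5, 0.8]
gamma = 0.5

# Same slopes, but estimated with a larger and a smaller sampling variance.
less_precise = chi_square(betas, gamma, [0.09, 0.09, 0.09])
more_precise = chi_square(betas, gamma, [0.045, 0.045, 0.045])

print(round(less_precise, 3), round(more_precise, 3))
```

Halving the sampling variance doubles the statistic in this toy example, which is the mechanism behind the higher χ² values of the more efficient designs.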

The average coefficients of multiple determination computed according to the simplified formulas proposed by Snijders and Bosker (1994:350-54) can be found immediately below the χ² values. Including the four vignette variables in the multilevel model explains 36.8 percent of the vignette level variance of the full factorial design. All reduced designs show very similar values. Big differences between the designs become visible for the respondent level: According to the full factorial design, gender accounts for 1.3 percent of the level 2 variance. With an average of 2.5 percent explained variance, the confounded factorial design comes closest to this value. This small difference, however, can be traced back to the additional inclusion of the set variables at level 2. The remaining respondent level R ²s are on average 3.3 percent for the confounded D-efficient design, 9.7 percent for the simple random design of set size 10, and 13.2 percent for the simple random design of set size 8. The reason for the higher percentage of explained variance is that differences between the means of the vignette variables of different vignette sets lead to variation in the mean of the dependent variable between respondents. Including the vignette variables in the regression analysis explains these differences between respondents; as a result, the level 2 coefficient of multiple determination increases. Since both half-fractional factorial designs of the confounded factorial design are balanced, no differences between the means of the vignette variables exist between the two sets. Although such mean differences also exist for the confounded D-efficient design, optimizing D-efficiency reduces these differences in comparison to simple random designs. Hence, including all vignette variables in multilevel analyses of quota designs either cannot increase the R ² values of the respondent level at all or increases them only to a lower degree than for a simple random design.

A final look at the number of iterations shows that due to their more complex error structure where unique set effects are confounded with unique respondent effects, random designs need on average more iterations than quota designs until the maximum likelihood function for estimating multilevel regression converges.

Conclusion

The purpose of this contribution was to compare random designs and quota designs with respect to their reliability and their internal validity. The reliability of a design depends on its D-efficiency and therefore on the precision with which a respondent-specific β can be estimated. One strength of quota designs is that for a given set size they yield more reliable estimators for the respondent-specific βs than an average set of a simple random design. However, to what extent the estimators of a quota design can benefit from this characteristic, for hierarchically structured data like those of factorial surveys, depends on the heterogeneity of the respondents’ answer behavior. If no significant differences exist between the respondent-specific $\hat{β}$ s, then the unstandardized regression coefficients are exclusively estimated on the basis of the pooled data matrix where the D-efficiency of the factorial survey corresponds to the D-efficiency computed across all vignettes of the combined vignette sample (OLS regression based on the pooled data matrix). Accordingly, simple random designs with an increasing number of vignette sets within the limits of a decreasing sampling error easily produce more reliable results than unbalanced fractional factorial designs and D-efficient designs. This especially applies to factorial surveys where a relatively low set size is used. The estimators of a simple random design, however, insufficiently benefit from the fact that the design is becoming asymptotically more orthogonal and more balanced in cases where a comparably small set size is used and the answer behavior of respondents turns out to be very heterogeneous. Under such conditions, the $\hat{γ}$ s (posterior means) are strongly based on the average of the respective βs estimated for each individual respondent on the basis of individual vignette sets (separate respondent-specific data matrices). 
As a consequence, in comparison to unbalanced fractional factorial designs and D-efficient designs, the γs of a simple random design will be estimated with lower efficiency. The only design that allows the γs to be estimated with a D-efficiency of 100, independent of the heterogeneity of the respondents’ answer behavior, is the balanced fractional factorial design (and, of course, also the confounded factorial design consisting of different balanced fractional factorial designs). Therefore, from the point of view of reliability, it is an ideal design.

However, even a highly reliable estimator might be heavily biased and for that reason may be highly invalid. One strength of simple random designs is their low susceptibility to systematic biases as might be caused for fractional factorial designs and for D-efficient designs by their aliasing structure. Since the aliasing structure of simple random designs varies randomly across different vignette sets, respondent-specific βs will—depending on the set-specific correlations between the predictor variable and not included but nonnegligible interaction terms—either be systematically under- or overestimated, or almost correctly estimated. However, the estimated βs will show within the limits of the sampling error no systematic bias across all respondents.

The confounded factorial design is the only design that provides real protection against systematic biases that otherwise might be caused by the aliasing structure of a design. For a given set size, it also yields a higher D-efficiency than the simple random design. All in all, the balanced confounded factorial design is the most D-efficient design with the highest internal validity and for that reason represents an ideal reduced design. For complex research questions, and/or in cases where vignette variables with different numbers of levels have to be included, a confounded factorial design will rarely be found (cf. also Steiner and Atzmüller 2006:144). For such situations, one may instead opt for a confounded D-efficient design, which can be generated for every set size. Since generating a confounded D-efficient design is rather time consuming for very complex research questions with a high number of vignette variables and/or a high number of levels per vignette variable, for such cases one might in practice continue using simple random designs. The same applies to situations where impossible vignette combinations exist that prevent a suitable quota design from being found. Finally, using fractional factorial designs or D-efficient designs may only be recommended in cases where aliased interaction effects (as well as set effects that are aliased with these interaction effects within the vignette sets of the chosen design) can be assumed to be negligible relative to the affected βs that will be estimated in the multilevel model. This recommendation, however, is restricted to situations where relatively small vignette sets have to be used and where, besides the intercept, a high number of slopes have to be estimated with their own random component (cf. also Dülmer 2007:406).

Footnotes

Acknowledgements

I thank the anonymous SMR reviewers for their constructive comments and suggestions.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

The online appendix is available at

References

Addelman, Sidney. 1962. “Orthogonal Main-effect Plans for Asymmetrical Factorial Experiments.” Technometrics 4:21–46.

Alexander, Cheryl S. and Henry J. Becker. 1978. “The Use of Vignettes in Survey Research.” Public Opinion Quarterly 42:93–104.

Alves, Wayne M. and Peter H. Rossi. 1978. “Who Should Get What? Fairness Judgments of the Distribution of Earnings.” American Journal of Sociology 84:541–64.

Atzmüller, Christiane and Peter M. Steiner. 2010. “Experimental Vignette Studies in Survey Research.” Methodology 6:128–38.

Backhaus, Klaus, Bernd Erichson, Wulff Plinke, and Rolf Weiber. 2000. Multivariate Analysemethoden. Eine anwendungsorientierte Einführung. Berlin, Germany: Springer.

Beck, Michael and Karl-Dieter Opp. 2001. “Der faktorielle Survey und die Messung von Normen.” Kölner Zeitschrift für Soziologie und Sozialpsychologie 53:283–306.

Bürklin, Wilhelm, Markus Klein, and Achim Ruß. 1996. “Postmaterialistischer oder anthropozentrischer Wertewandel? Eine Erwiderung auf Ronald Inglehart und Hans-Dieter Klingemann.” Politische Vierteljahresschrift 37:517–36.

Dülmer, Hermann. 2007. “Experimental Plans in Factorial Surveys: Random or Quota Design?” Sociological Methods & Research 35:382–409.

Fox, John. 1991. Regression Diagnostics: An Introduction. Newbury Park, CA: Sage.

Friedrich, Robert J. 1982. “In Defense of Multiplicative Terms in Multiple Regression Equations.” American Journal of Political Science 26:797–833.

Gunst, Richard and Robert L. Mason. 1991. How to Construct Fractional Factorial Experiments. Vol. 14. The Basic References in Quality Control: Statistical Techniques. Milwaukee, WI: ASQC Quality Press.

Hox, Joop J. 2010. Multilevel Analysis. Techniques and Applications. 2nd ed. New York: Routledge.

Hox, Joop J., Ita G. G. Kreft, and Piet L. J. Hermkens. 1991. “The Analysis of Factorial Surveys.” Sociological Methods & Research 19:493–510.

Inglehart, Ronald. 1977. The Silent Revolution. Changing Values and Political Styles Among Western Publics. Princeton, NJ: Princeton University Press.

Inglehart, Ronald. 1994. “‘Polarized Priorities or Flexible Alternatives? Dimensionality in Inglehart’s Materialism-postmaterialism Scale’: A Comment.” International Journal of Public Opinion Research 6:289–92.

Inglehart, Ronald. 1997. Modernization and Postmodernization. Cultural, Economic, and Political Change in 43 Societies. Princeton, NJ: Princeton University Press.

Jasso, Guillermina. 2006. “Factorial Survey Methods for Studying Beliefs and Judgments.” Sociological Methods & Research 34:334–423.

Jasso, Guillermina and Peter H. Rossi. 1977. “Distributive Justice and Earned Income.” American Sociological Review 42:639–51.

Kirk, Roger E. 1995. Experimental Design: Procedures for the Behavioral Sciences. 3rd ed. Pacific Grove, CA: Brooks/Cole Publishing Company.

Klein, Markus, Hermann Dülmer, Dieter Ohr, Markus Quandt, and Ulrich Rosar. 2004. “Response Sets in the Measurement of Values: A Comparison of Rating and Ranking Procedures.” International Journal of Public Opinion Research 16:474–83.

Kmenta, Jan. 1971. Elements of Econometrics. New York: Macmillan.

Kuhfeld, Warren F. 1997. “Efficient Experimental Designs Using Computerized Searches.” Pp. 1–14 in Sawtooth Software. Research Paper Series, edited by Sawtooth Software. Sequim, WA. Retrieved October 8, 2013 (http://homepage.stat.uiowa.edu/~gwoodwor/AdvancedDesign/KuhfeldTobiasGarratt.pdf).

Kuhfeld, Warren F. 2010. “Experimental Design: Efficiency, Coding, and Choice Designs.” Pp. 53–241 in Marketing Research Methods in SAS. Experimental Design, Choice, Conjoint, and Graphical Techniques, edited by W. F. Kuhfeld. Retrieved October 8, 2013 (http://support.sas.com/resources/papers/tnote/tnote_marketresearch.html).

Kuhfeld, Warren F., Randall D. Tobias, and Mark Garratt. 1994. “Efficient Experimental Design with Marketing Research Applications.” Journal of Marketing Research 31:545–57. Retrieved October 8, 2013 (http://support.sas.com/resources/papers/tnote/tnote_marketresearch.html).

Lawson, John. 2002. “Regression Analysis of Experiments with Complex Confounding Pattern Guided by the Alias Matrix.” Computational Statistics & Data Analysis 39:227–41.

Louviere, Jordan J. 1988. Analyzing Decision Making. Metric Conjoint Analysis. Sage University Paper Series on Quantitative Applications in Social Sciences, 07-067. Newbury Park, CA: Sage.

McLean, Robert A. and Virgil A. Anderson. 1984. Applied Factorial and Fractional Designs. New York: Marcel Dekker.

Neuman, W. Lawrence. 2006. Social Research Methods. Qualitative and Quantitative Approaches. 6th ed. Boston, MA: Pearson.

Nock, Steven L. 1982. “Family Social Status: Consensus on Characteristics.” Pp. 95–118 in Measuring Social Judgments. The Factorial Survey Approach, edited by P. H. Rossi and St. L. Nock. Beverly Hills, CA: Sage.

Raudenbush, Stephen W. and Anthony S. Bryk. 2002. Hierarchical Linear Models. Applications and Data Analysis Methods. 2nd ed. Thousand Oaks, CA: Sage.

Rossi, Peter H. 1979. “Vignette Analysis. Uncovering the Normative Structure of Complex Judgments.” Pp. 176–86 in Qualitative and Quantitative Social Research: Papers in Honor of Paul F. Lazarsfeld, edited by R. K. Merton, J. S. Coleman, and P. H. Rossi. New York: The Free Press.

Rossi, Peter H. and Andy B. Anderson. 1982. “The Factorial Survey Approach: An Introduction.” Pp. 15–67 in Measuring Social Judgments. The Factorial Survey Approach, edited by P. H. Rossi and St. L. Nock. Beverly Hills, CA: Sage.

Ryan, Thomas P. 2007. Modern Experimental Design. Hoboken, NJ: John Wiley & Sons.

Sahai, Hardeo and Mohammed I. Ageel. 2000. The Analysis of Variance. Fixed, Random and Mixed Models. Boston, MA: Birkhäuser.

Sawtooth Software. 1997–2002. CVA: A Full-profile Conjoint Analysis System. Version 3. Sequim, WA: Sawtooth Software. Retrieved December 2, 2005 (http://www.sawtoothsoftware.com/download/techpap/cva3tech.pdf).

Snijders, Tom A. B. and Roel J. Bosker. 1994. “Modeled Variance in Two-level Models.” Sociological Methods & Research 22:342–63.

Snijders, Tom A. B. and Roel J. Bosker. 2012. Multilevel Analysis. An Introduction to Basic and Advanced Multilevel Modeling. 2nd ed. Los Angeles, CA: Sage.

Steenbergen, Marco R. and Bradford S. Jones. 2002. “Modeling Multilevel Data Structures.” American Journal of Political Science 46:218–37.

Steiner, Peter M. and Christiane Atzmüller. 2006. “Experimentelle Vignettendesigns in faktoriellen Surveys.” Kölner Zeitschrift für Soziologie und Sozialpsychologie 58:117–46.

Stevens, Stanley S. 1975. Psychophysics. Introduction to its Perceptual, Neural, and Social Prospects. New York: John Wiley & Sons.

Thome, Helmut. 1990. “Grundkurs Statistik für Historiker. Teil II: Induktive Statistik und Regressionsanalyse.” Historical Social Research, Supplement No. 3. Cologne: University of Cologne.

Thome, Helmut. 1991. “Modelling and Testing Interactive Relationships within Regression Analysis.” Historical Social Research 16:21–50.

Wallander, Lisa. 2009. “25 Years of Factorial Surveys in Sociology: A Review.” Social Science Research 38:505–20.
